Use cases

This section describes the following Dell EMC Unity monitoring use cases:

  • Storage Capacity Monitoring and Optimization
  • Performance Optimization
  • Traffic Management

Storage Capacity Monitoring and Optimization

Making sure that a storage system has enough remaining disk space available is critical for several reasons:

  • SAN administrators need to be able to provision disk space for new servers as quickly as possible when requested.
  • The storage system itself may need additional disk space for specific features to work properly, like automatic snapshots, mirroring, etc.
  • If thin provisioning is used, the remaining disk space becomes even more critical: the inability to allocate additional space to a LUN when requested by the subscriber host will lead to catastrophic data loss and corruption.

Reporting on Disk Space Consumption

The main SEN_UNITY_STORAGESYSTEM application class reports various metrics regarding the disk space in the storage system:

  • The size of the storage system (available in the InfoBox)
  • The total subscribed capacity, i.e. the total amount of disk space exposed to the servers with the SubscribedCapacity parameter
  • The total amount of free disk space with the AvailableCapacity parameter

For a more granular view of the disk space usage in the disk array, analyze the parameters of the SEN_UNITY_STORAGEPOOL application class:

  • The SubscribedCapacity parameter of the SEN_UNITY_STORAGEPOOL application class represents the amount of disk space that has been made available to the subscriber hosts, or in other words, the amount of disk space that is seen by the servers connected to the storage system.
  • The ConsumedCapacity parameter of the SEN_UNITY_STORAGEPOOL application class represents the actual space usage in the storage pool. For “thin” pools (when thin provisioning is enabled on the storage system), this value is normally lower than the SubscribedCapacity, which is the main purpose of thin provisioning. For traditional pools, the ConsumedCapacity parameter has the same value as the SubscribedCapacity parameter, because the subscribed disk space is fully allocated in the storage pool.
  • The ConsumedCapacityPercentage parameter is the most critical one, even for non-thin storage pools, because storage pool usage nearing 100% means that SAN administrators will not be able to create new LUNs in the storage pool. By default, no alarm or warning threshold is set on this parameter because a full storage pool may be a normal situation.
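The relationship between these parameters can be sketched in a few lines of Python. This is an illustration only: the PoolCapacity record, its field names and the formula (consumed space divided by the total size of the pool) are assumptions made for the example, not the KM's documented calculation.

```python
from dataclasses import dataclass


@dataclass
class PoolCapacity:
    """Hypothetical snapshot of one storage pool, all sizes in GB."""
    name: str
    total_gb: float       # assumed: physical size of the pool
    subscribed_gb: float  # space exposed to the hosts (SubscribedCapacity)
    consumed_gb: float    # space actually allocated (ConsumedCapacity)


def consumed_percentage(pool: PoolCapacity) -> float:
    """Assumed formula: consumed space as a share of the pool's total size."""
    if pool.total_gb <= 0:
        raise ValueError("total_gb must be positive")
    return 100.0 * pool.consumed_gb / pool.total_gb


# A thin pool consumes less than it exposes; a traditional pool consumes all of it.
thin = PoolCapacity("thin-pool", total_gb=1000, subscribed_gb=1500, consumed_gb=400)
thick = PoolCapacity("thick-pool", total_gb=1000, subscribed_gb=600, consumed_gb=600)

for pool in (thin, thick):
    print(f"{pool.name}: {consumed_percentage(pool):.1f}% consumed")
```

The thin pool exposes more space than it physically holds, yet only its consumed share counts toward the percentage.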

To verify the available disk space in several storage pools:

  1. Create a PATROL Query in the PATROL Console to show the value of the ConsumedCapacityPercentage parameter of the SEN_UNITY_STORAGEPOOL application class: in the main menu bar, click Action > New Query…

    Graph – PATROL Query – General Tab

  2. Enter the Query name (example: Disk Space Consumption)

  3. Enter the Query description (optional)
  4. In the Query Results Filter section, select Show Selected Objects and check the Parameters box
  5. In the Additional Filtering section, select the Enable Application Class level filtering and the Enable Parameter level filtering options
  6. Open the Application Class tab
  7. In the Pattern Matching section, select Like and type SEN_UNITY_STORAGEPOOL

    Graph – PATROL Query – Application Class Tab

  8. Open the Parameter tab

  9. In the Pattern Matching section, select Like and type ConsumedCapacityPercentage

    Graph – PATROL Query – Parameter Tab

  10. Click OK to display a table showing the percentage of capacity actually consumed in each storage pool of your monitored Dell EMC Unity storage systems.

    Graph – PATROL Query – Results

Detecting Oversubscription Situations (Thin Provisioning)

An oversubscription situation occurs when all of the following conditions are met:

  • The storage pool is configured for thin provisioning (“thin storage pool”)
  • The storage pool is oversubscribed, i.e. the total disk space visible to the hosts (subscribers) is greater than its actual capacity (this situation is normal for a thin pool since it is its very purpose)
  • The storage pool's actual consumed capacity is higher than 75%

Such a situation is highly critical because the inability to allocate additional space to a LUN when requested by the subscriber host will lead to catastrophic data loss and corruption.

The OversubscriptionSituation parameter will alert you to an oversubscription situation by triggering an alarm. When such an alarm is issued by the KM, it is highly recommended that the SAN administrators add capacity to the storage pool as soon as possible.
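The three conditions listed above can be expressed as a single check. The sketch below is not the KM's implementation: the function name, its arguments and the assumption that the consumed percentage is measured against the pool's actual capacity are hypothetical.

```python
def is_oversubscription_situation(
    is_thin: bool,
    total_gb: float,
    subscribed_gb: float,
    consumed_gb: float,
    threshold_pct: float = 75.0,  # threshold quoted in the text
) -> bool:
    """Return True only when all three conditions hold at the same time."""
    if total_gb <= 0:
        raise ValueError("total_gb must be positive")
    oversubscribed = subscribed_gb > total_gb          # hosts see more than exists
    consumed_pct = 100.0 * consumed_gb / total_gb      # assumed formula
    return is_thin and oversubscribed and consumed_pct > threshold_pct


# Thin pool, 1.5x oversubscribed, 80% consumed: alarm condition.
print(is_oversubscription_situation(True, 1000, 1500, 800))
# Same pool at 70% consumed: oversubscribed, but not yet critical.
print(is_oversubscription_situation(True, 1000, 1500, 700))
```

Oversubscription alone does not trigger the condition, since it is normal for a thin pool. It only matters once the consumed capacity also passes the threshold.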

Reclaiming Space of Unused LUNs

Identifying Unmapped (Orphan) LUNs

Over time, as servers connected to a SAN get decommissioned, administrators find an increasing number of unmapped LUNs, or volumes that are no longer used by any server. These LUNs, while unused, still occupy disk space in the storage system. Being able to identify such unmapped LUNs and reclaim the disk space uselessly consumed by these LUNs will help administrators avoid unnecessary upgrades and extensions of their storage systems.

To list the LUNs in a storage system that are not mapped to any server and are therefore safe to remove, right-click the KM main icon or the storage system icon > KM Commands > Reporting > LUNs Mapping Table…

Whether a LUN is actually mapped or not is also shown in the InfoBox of each volume instance.

Graph – LUN Properties - InfoBox tab

Identifying Unused LUNs

When a server is decommissioned or reconfigured, its associated LUNs can stay mapped, which prevents storage administrators from accurately identifying unused LUNs. Since the KM continuously monitors the traffic on each LUN, it becomes easy to detect LUNs on which no activity is recorded.

  1. Create a PATROL Query in the PATROL Console to show the value of the TimeSinceLastActivity parameter of the SEN_UNITY_VOLUME application class: in the main menu bar, click Action > New Query…

    Graph – PATROL Query – General Tab

  2. Enter the Query name (example: Unused LUNs)

  3. Enter the Query description (optional)
  4. In the Query Results Filter section, select Show Selected Objects and check the Parameters box
  5. In the Additional Filtering section, select the Enable Application Class level filtering and the Enable Parameter level filtering options
  6. Open the Application Class tab

    Graph – PATROL Query – Application Class Tab

  7. In the Pattern Matching section, select Like and type SEN_UNITY_VOLUME

  8. Open the Parameter tab

    Graph – PATROL Query – Parameter Tab

  9. In the Pattern Matching section, select Like and type TimeSinceLastActivity

  10. Click OK to display a list of the monitored LUNs and, for each of them, the number of days during which the KM has not recorded any activity.

    Graph – PATROL Query – Results

The value collected for this parameter at the first collection only covers the period observed by the KM. This first value might therefore not reflect how long the volume has actually been inactive.
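Once the query results are available, shortlisting candidates is a simple filter on the number of idle days. The sketch below assumes the results have been copied into a dictionary by hand. The function name, the LUN names and the 30-day cut-off are hypothetical and should be adapted to your own retention policy.

```python
def unused_luns(
    idle_days_by_lun: dict[str, float],
    min_idle_days: float = 30.0,  # assumed cut-off, not a KM default
) -> list[tuple[str, float]]:
    """Return (LUN, idle days) pairs at or above the cut-off, longest idle first."""
    candidates = [
        (name, days)
        for name, days in idle_days_by_lun.items()
        if days >= min_idle_days
    ]
    return sorted(candidates, key=lambda item: item[1], reverse=True)


# Illustrative TimeSinceLastActivity values, in days.
observed = {"lun_archive": 90, "lun_db": 0, "lun_old_web": 45, "lun_test": 2}

for name, days in unused_luns(observed):
    print(f"{name}: no activity recorded for {days} days")
```

Because the first collected value only covers the period observed by the KM, treat the result as a list of candidates to investigate rather than a list of LUNs to delete.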

Performance Optimization

Diagnosing a Bad Physical Disk Layout

A non-optimal physical disk layout can cause one single physical disk to become the bottleneck of a SAN. To verify that the I/Os are well-balanced across all physical disks, you can check the ReadByteRate and WriteByteRate parameters of each physical disk and make sure they have similar average values:

  1. Right-click the KM main icon > KM Commands > Reporting > Physical Disks Activity.
  2. Select the parameter for which you wish to generate a report (read/write bytes traffic).
  3. Select the report range and interval.
  4. Click Show Report.

    Graph – Diagnosing a Bad Physical Disk Layout
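Comparing the averages by eye works for a handful of disks. For larger arrays, the same check can be sketched as code. The function below is a hypothetical helper, not part of the KM: it takes the average byte rate of each disk, copied from the report, and flags the disks that deviate from the median by more than an assumed 50% tolerance. The median is used as the baseline so that a single hot disk does not distort the comparison.

```python
from statistics import median


def unbalanced_disks(
    avg_rate_by_disk: dict[str, float],
    tolerance: float = 0.5,  # assumed: 50% deviation from the median
) -> dict[str, float]:
    """Return the disks whose average rate deviates too far from the median."""
    if not avg_rate_by_disk:
        return {}
    baseline = median(avg_rate_by_disk.values())
    if baseline == 0:
        # No typical load to compare against: any active disk stands out.
        return {d: r for d, r in avg_rate_by_disk.items() if r > 0}
    return {
        disk: rate
        for disk, rate in avg_rate_by_disk.items()
        if abs(rate - baseline) / baseline > tolerance
    }


# Illustrative average WriteByteRate values in MB/sec.
rates = {"disk0": 50.0, "disk1": 52.0, "disk2": 48.0, "disk3": 250.0}
print(unbalanced_disks(rates))
```

Run the check separately for ReadByteRate and WriteByteRate, since a layout can be balanced for one type of traffic and not for the other.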

Detecting High Processor Utilization

Detecting high processor utilization is important to prevent controller overload, which can lead to unpredictable performance degradation. To prevent such problems, administrators need to identify the controller that has become a bottleneck by checking the ProcessorUtilization parameter:

  1. In the PATROL console, double-click the ProcessorUtilization parameter of the SEN_UNITY_CONTROLLER application class
  2. A graph is automatically displayed in the console’s graph pane

    Viewing a Controller’s Processor Utilization as a graph

A processor utilization over 80% means that this controller is overloaded and that the controller constitutes a bottleneck for the disk array.

Detecting Unbalanced Workload Distribution on Controllers

A storage controller manages the flow of information between the servers and the data, and provides two paths in case one of them fails or is overloaded. For the best levels of performance and availability, administrators must ensure that the workload is balanced across the controllers by comparing their ProcessorUtilization values:

  1. In the PATROL console, double-click the ProcessorUtilization parameter of the first controller for which you need to compare the activity. A graph is automatically displayed in the graph pane.
  2. Select the ProcessorUtilization parameter of the second controller and drag it from the tree view of the Operator tab to the graph. The second parameter is automatically added to the first one to facilitate the comparison. Compare the values to evaluate the workload distribution.

For example, if the processor utilization on one controller goes above 80% while the other controller stays almost idle, it indicates that one of the controllers constitutes a bottleneck for the storage system that could be alleviated by better sharing the load between the controllers.
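The comparison described above can be sketched as follows. The 80% overload threshold comes from the text. The 20% "almost idle" threshold, the function name and the controller names are assumptions made for the example.

```python
OVERLOAD_PCT = 80.0  # threshold quoted in the text
IDLE_PCT = 20.0      # assumed meaning of "almost idle"; tune to your environment


def assess_controllers(utilization: dict[str, float]) -> dict:
    """Classify controllers from their ProcessorUtilization values (in %)."""
    overloaded = sorted(n for n, pct in utilization.items() if pct > OVERLOAD_PCT)
    idle = sorted(n for n, pct in utilization.items() if pct < IDLE_PCT)
    return {
        "overloaded": overloaded,
        # Unbalanced: at least one controller is overloaded while another idles.
        "unbalanced": bool(overloaded) and bool(idle),
    }


# One controller saturated while the other is nearly idle.
print(assess_controllers({"SPA": 91.0, "SPB": 7.0}))
# Both controllers busy: overloaded, but the load is evenly shared.
print(assess_controllers({"SPA": 85.0, "SPB": 88.0}))
```

The two cases call for different remedies: an unbalanced pair can be relieved by moving load to the idle controller, whereas two overloaded controllers indicate that the storage system as a whole is undersized for the workload.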

Administrators should pay close attention to which controller handles which logical drive, and to the activity of each logical drive, so that they can redistribute the drive I/O activity between controllers and prevent either controller from being overloaded.

Diagnosing Slow LUNs

If a system administrator complains that their servers are experiencing slow I/O performance and that it is caused by the SAN, you may want to verify the actual response time of the LUNs the servers rely on.

The ResponseTime parameter of the SEN_UNITY_VOLUME application class represents the average time it took to complete the read and write operations on the LUN during the collection interval. Typically, the average response time is below 10 milliseconds. You may also want to compare this value to the response time of the other LUNs to see whether one server is really getting worse I/O performance than another.

Graph – Diagnosing Slow LUNs
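Both checks, against the 10-millisecond guideline and against the other LUNs, can be combined in a short script. This is a sketch under assumptions: the function name and the LUN names are hypothetical, and the median of all LUNs is used as the point of comparison.

```python
from statistics import median


def slow_luns(response_ms_by_lun: dict[str, float], limit_ms: float = 10.0) -> list:
    """Return (LUN, response time, ratio to median) for LUNs above limit_ms."""
    if not response_ms_by_lun:
        return []
    baseline = median(response_ms_by_lun.values())
    slow = []
    for name, ms in response_ms_by_lun.items():
        if ms > limit_ms:
            # The ratio shows how much worse this LUN is than a typical one.
            ratio = ms / baseline if baseline > 0 else None
            slow.append((name, ms, ratio))
    return sorted(slow, key=lambda item: item[1], reverse=True)


# Illustrative ResponseTime values in milliseconds.
observed = {"lun_db": 24.0, "lun_web": 4.0, "lun_mail": 6.0}

for name, ms, ratio in slow_luns(observed):
    print(f"{name}: {ms} ms ({ratio}x the median)")
```

A LUN that exceeds the limit while the others stay well below it points to a problem specific to that LUN. If every LUN is slow, the cause is more likely to be at the controller or storage system level.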

Traffic Management

Identifying Busiest LUNs

To identify the LUNs that generate the most traffic on the disk array, you can use the ReadByteRate and WriteByteRate parameters of the SEN_UNITY_VOLUME class. Dell EMC Unity KM for PATROL offers two methods to visually represent the traffic of a LUN:

Method 1: Creating a Multi-Parameter Graph

  1. In the PATROL console, double-click the ReadByteRate parameter of the LUN you are interested in. A graph is automatically displayed in the graph pane.
  2. Drag and drop the WriteByteRate parameter into the graph window.

Graph – Read Byte Rate on a LUN

Method 2: Using the Volume Activity… Command

  1. Right-click the volume for which you want to create a daily or hourly report of the total amount of data, in GB, that was read from or written to the LUN, and select Volume Activity…
  2. Define the report settings:

    Graph – Setting Report Parameters

    • Select the data for which you wish to generate a report: read bytes traffic, write bytes traffic, transfer bytes traffic or read/write bytes
    • Select the period that you wish the report to cover: number of days or hours
    • Select the interval to apply to the report data: hourly or daily
  3. Click the Show Report button to display the graph.

Once you have identified the busiest LUNs, check the InfoBox of the suspected LUNs to find their storage groups and the hosts that generate such traffic.
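When many LUNs are involved, ranking them by combined traffic can be quicker than comparing graphs one by one. The sketch below assumes the ReadByteRate and WriteByteRate values have been collected by hand into a dictionary. The function and LUN names are hypothetical.

```python
def busiest_luns(
    rates_by_lun: dict[str, tuple[float, float]],
    top: int = 3,
) -> list[tuple[str, float]]:
    """Rank LUNs by read + write byte rate, busiest first."""
    totals = {
        name: read_rate + write_rate
        for name, (read_rate, write_rate) in rates_by_lun.items()
    }
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top]


# Illustrative (ReadByteRate, WriteByteRate) pairs in MB/sec.
observed = {
    "lun_a": (10.0, 5.0),
    "lun_b": (80.0, 40.0),
    "lun_c": (1.0, 0.5),
}

for name, total in busiest_luns(observed, top=2):
    print(f"{name}: {total} MB/sec")
```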

Reporting the Total Traffic on an Hourly or Daily Basis

Dell EMC Unity KM for PATROL monitors the traffic and activity of the storage system, LUNs and physical disks not only in MB/sec, but also in GB per hour and per day. The exact amount of data that was read from or written to the storage system, LUN or physical disk is calculated for each hour of the day and each day of the week.

The hourly report graph represents the amount of data in GB from 12:00am to 12:59am, from 1:00am to 1:59am, from 2:00am to 2:59am, etc., while the daily report graph represents the amount of data in GB for Monday, for Tuesday, for Wednesday, etc.
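The conversion from a rate in MB/sec to a volume in GB per hour is a matter of multiplying each sample by the time it covers and grouping the results by clock hour. The sketch below illustrates the arithmetic only. It assumes fixed-length collection intervals and 1024 MB per GB, and does not claim to reproduce the KM's internal calculation.

```python
from collections import defaultdict
from datetime import datetime


def gb_per_hour(samples, interval_s: float = 60.0, mb_per_gb: float = 1024.0) -> dict:
    """Sum (timestamp, MB/sec) samples into GB totals per clock hour.

    Each sample is assumed to represent interval_s seconds of traffic.
    """
    totals = defaultdict(float)
    for timestamp, rate_mb_s in samples:
        # Truncate to the start of the hour: 00:30 falls in the 00:00 bucket.
        bucket = timestamp.replace(minute=0, second=0, microsecond=0)
        totals[bucket] += rate_mb_s * interval_s / mb_per_gb
    return dict(totals)


# Illustrative samples: two in the first hour, one in the second.
samples = [
    (datetime(2024, 1, 1, 0, 0), 1024.0),
    (datetime(2024, 1, 1, 0, 30), 1024.0),
    (datetime(2024, 1, 1, 1, 0), 512.0),
]

for hour, gigabytes in sorted(gb_per_hour(samples).items()):
    print(f"{hour:%H:%M}: {gigabytes} GB")
```

A daily total follows the same pattern, with the timestamp truncated to the start of the day instead of the hour.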

This report is particularly helpful for SAN administrators who need to understand the impact of nightly backups, or the amount of data a specific application writes to a LUN and how this evolves (with upgrades, for example). In general, it helps administrators analyze the long-term impact of various features of the storage system.

Generating a Storage System Activity Report

  1. In the PATROL console, right-click the KM main icon > KM Commands > Reporting > Storage Systems Activity…

    Configuring the Report Settings

  2. Define the report settings

    • Select the data you wish to generate a report for: read bytes traffic, transfer bytes traffic, write bytes traffic, or read/write bytes
    • Select the period that you wish the report to cover: number of days or hours
    • Select the interval to apply to the report data: hourly or daily
  3. Click the Storage System Selection button and select the specific storage system(s) you wish to include in the report
  4. Click the Show Report button to display the graph

The ability of the product to report on a given period of time depends on the history retention period of the PATROL Agent.