Use cases

This section describes common hardware monitoring use cases.

Checking Disks Health

Most manufacturers use the “mean time to failure” (MTTF) to indicate the operational reliability of their products. But the advertised MTTF of 1,000,000 hours (or even more!) is misleading. Recent studies show that the average annual replacement rate for hard disks is typically between 3% and 15%. This means that an organization with just 100 servers and approximately 300 hard disks will experience between 9 and 45 disk failures every year which, even if they do not impact the availability of the system, will degrade the overall performance. An organization with 1000 servers may experience a disk failure almost every day of the year. Given their relatively short lifetime and the amount of data they store, disks are among the most critical devices to monitor.
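The arithmetic behind these figures can be sketched as follows. This is a minimal illustration and not part of Hardware Sentry: the function names are hypothetical, and converting an MTTF into an annual rate assumes a constant failure rate and continuous (24x7) operation.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours, assuming continuous operation


def annual_failure_rate_from_mttf(mttf_hours: float) -> float:
    """Approximate annual failure rate implied by an advertised MTTF."""
    return HOURS_PER_YEAR / mttf_hours


def expected_failures(disk_count: int, annual_rate: float) -> float:
    """Expected number of disk failures per year for a given replacement rate."""
    return disk_count * annual_rate


if __name__ == "__main__":
    implied = annual_failure_rate_from_mttf(1_000_000)
    print(f"Rate implied by a 1,000,000-hour MTTF: {implied:.2%}")
    for rate in (0.03, 0.15):
        failures = expected_failures(300, rate)
        print(f"{rate:.0%} of 300 disks: about {failures:.0f} failures per year")
```

Under these assumptions, the rate implied by the advertised MTTF is below 1% per year, well under the 3% to 15% observed in practice.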

Monitoring disk health consists of closely monitoring three types of components: disk controllers, physical disks, and logical disks.

Monitoring Disk Controllers

A disk controller is a card inside a computer that connects one or several physical disk drives to the computer and usually provides a write cache. To keep the cached writes from being lost if power is interrupted, the card must be equipped with a battery. It is thus recommended to closely monitor this battery.

  1. In the operator console, double-click the Graph icon inline of the Battery Monitor Type to verify that the disk controller battery is able to support the controller in the event of a power failure.
  2. It is also recommended to verify the controller health. Double-click the Graph icon inline of the Controller Monitor Type to make sure the controller is neither degraded nor failed.

    Monitoring Disk Controllers and Batteries

Monitoring Physical Disks

Physical disks must be monitored to avoid loss of data, unavailability and performance degradation. Contrary to other solutions, Hardware Sentry monitors the actual physical disks (Hardware Physical Disk) behind the controller and not only the disks as seen by the operating system.

  1. In the operator console, double-click the Graph icon inline of the Physical Disk Monitor Type to verify that the physical disk is neither failed nor degraded.
  2. In case a replacement is required, you may need to look up the physical disk hardware information (manufacturer, model, serial number, etc.): Double-click the Graph icon inline of the Physical Disk Monitor Type and click the Monitor Information tab:

    Physical Disks Monitor Information

Monitoring Logical Disks

RAID or advanced disk controllers expose several physical disks as a single logical disk to the operating system. The information required by administrators is mainly the logical disk’s status, its RAID type and size. To get that information:

  1. First, click the Graph icon inline of the Logical Disk Monitor Type. A graph is displayed in the graph pane indicating the status of the RAID array (OK, Degraded, Failed).

    Logical Disks Graph

  2. Then, click the Monitor Information tab. It contains all the relevant information about the selected logical disk.

    Logical Disks Monitor Information

Adopting a predictive approach to monitoring a datacenter includes closely monitoring the state and performance of key components such as processors and disk drives. The valuable indicators provided by Hardware Sentry help IT administrators implement and maintain a proactive monitoring strategy.

Diagnosing Datacenter Electrical Issues

Understanding the basics of the electrical distribution system can help IT administrators diagnose data center electrical issues. Power is delivered to a data center by the local utility company. Once inside the building, the utility power goes to the Automatic Transfer Switch and to the uninterruptible power supply (UPS) units. These units clean the incoming utility power before passing it to power distribution units (PDUs) for conversion. Power is finally distributed to electrical outlets and servers. During the distribution, power loss or instability can occur. It can be caused by voltage or AC/DC conversion, hence the importance of monitoring voltage and power supplies.

To monitor voltage

Monitoring voltage helps verify the quality of power supplies. In fact, if the power supply is weak, the voltage level on the motherboard will not be steady, which could lead to random crashes or to errors at the processor or memory levels.

  1. Log in to your TrueSight console.
  2. Select Dashboards from the navigation pane.
  3. In the Dashboards page, click Add Dashboard or select Add Dashboard from the dashboard action menu inline.

    Adding a Dashboard

  4. Enter a Title for your dashboard and configure a Global Filter, if needed.

  5. Click Add Dashlet to open the dashlet library for the first column of the row.
  6. From the dashlet library, select the Device Performance template, and then click Close.

    Selecting the Device Performance Template

  7. Select Configure Dashlet by clicking the inline button.

  8. In the panel of input fields and options that opens below the dashboard, enter a Title for the dashlet and specify a Refresh Rate (default is 5 minutes).
  9. Select the device for which you want to monitor potential voltage issues.
  10. Select the Voltage Status parameter of the component you wish to monitor and click Apply.

    Selecting the Voltage Status Parameter

    Tip: To quickly retrieve a component or a parameter, enter its name in the Search Parameters field and click Search.

  11. Repeat this operation in the second column with the Voltage parameter of the same component. Click Apply.

  12. The dashboard is completed.

    Dashboard - Monitoring Voltage

A steady voltage with few fluctuations generally indicates a healthy power supply and a more efficient system. If you notice voltage fluctuations, verify your electrical connections and wiring.
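One way to quantify voltage fluctuations from collected values is to compare each reading with the nominal voltage of the rail. The sketch below is illustrative only: the function names, the 5% tolerance, and the sample readings are assumptions, not Hardware Sentry thresholds.

```python
def max_deviation_percent(samples, nominal):
    """Largest absolute deviation from the nominal voltage, in percent of nominal."""
    return max(abs(value - nominal) for value in samples) / nominal * 100


def is_steady(samples, nominal, tolerance_percent=5.0):
    """True if every reading stays within the (assumed) tolerance around nominal."""
    return max_deviation_percent(samples, nominal) <= tolerance_percent


if __name__ == "__main__":
    readings = [12.0, 12.1, 11.9, 12.05]  # hypothetical 12 V rail samples
    print(f"Max deviation: {max_deviation_percent(readings, 12.0):.2f}%")
    print("Steady" if is_steady(readings, 12.0) else "Fluctuating")
```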

To monitor power supplies

After hard drives, the power supply is the device that is most likely to fail. The proper functioning of this device highly depends on the quality of the data center electrical distribution. Indeed, voltage fluctuations are detrimental to power supplies: they can shorten their life span or lead to severe malfunction.

  1. Log in to your TrueSight console.
  2. Select Dashboards from the navigation pane.
  3. In the Dashboards page, click Add Dashboard or select Add Dashboard from the dashboard action menu inline.

    Adding a Dashboard

  4. Enter a Title for your dashboard and configure a Global Filter, if needed.

  5. Click Add Dashlet to open the dashlet library for the first column of the row.
  6. From the dashlet library, select the Device Performance template, and then click Close.

    Selecting the Device Performance Template

  7. Select Configure Dashlet by clicking the inline button.

  8. In the panel of input fields and options that opens below the dashboard, enter a Title for the dashlet and specify a Refresh Rate (default is 5 minutes).
  9. Select the device for which you want to monitor potential power supply issues.
  10. Select the Present parameter of the power supply you wish to monitor and click Apply.

    Selecting the Present Parameter

    Tip: To quickly retrieve a component or a parameter, enter its name in the Search Parameters field and click Search.

  11. Repeat this operation in the second column with the Status parameter of the same power supply. This parameter indicates whether the power supply is performing properly. Several power supply failures may reveal an issue with the data center electrical distribution.

  12. Click Apply.
  13. The dashboard is completed.

    Dashboard - Monitoring Power Supplies

Monitoring the Used Capacity parameter can also help ensure that the maximum power output of the power supply is not reached.

Managing Datacenter Heating and Cooling Issues

Even though datacenters and servers are cooled down with air conditioning and fans, computing systems may overheat. Because overheating leads to general instability, Hardware Sentry monitors the fans, when present, and all the temperature sensors. Automatic thresholds are set according to the manufacturers’ recommendations and the location of the temperature sensor.

The temperature thresholds set by Hardware Sentry should not be customized or modified.

To monitor the datacenter temperature

  1. Log in to your TrueSight console.
  2. Select Dashboards from the navigation pane.
  3. In the Dashboards page, click Add Dashboard or select Add Dashboard from the dashboard action menu inline.

    Adding a Dashboard

  4. Enter a Title for your dashboard and configure a Global Filter, if needed.

  5. Click Add Dashlet to open the dashlet library for the first column of the row.
  6. From the dashlet library, select the Device Performance template, and then click Close.

    Selecting the Device Performance Template

  7. Select Configure Dashlet by clicking the inline button.

  8. In the panel of input fields and options that opens below the dashboard, enter a Title for the dashlet and specify a Refresh Rate (default is 5 minutes).
  9. Select the first device for which you want to monitor potential temperature issues.
  10. Select the Degrees Below Warning parameter of the device and click Apply.

    Monitoring this parameter can greatly help optimize the data center temperature and reduce the risk of unplanned downtime by providing information about the temperature of a component before it reaches the closest warning threshold.

    Selecting the Degrees Below Warning Parameter

    Tip: To quickly retrieve a component or a parameter, enter its name in the Search Parameters field and click Search.

  11. Repeat this operation in the second and third column of the dashboard for other devices. Click Apply.

  12. The dashboard is completed and provides critical information about the temperature level for each monitored device.

    Dashboard - Monitoring the Datacenter Temperature
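The idea behind the Degrees Below Warning parameter can be illustrated as the margin between the current reading and the closest warning threshold. The sketch below is an assumption about how such a margin could be computed and compared across sensors, not Hardware Sentry's implementation; the sensor names, temperatures, and thresholds are made up.

```python
def degrees_below_warning(temperature_c, warning_threshold_c):
    """Margin left before the sensor reaches its warning threshold."""
    return warning_threshold_c - temperature_c


def closest_to_warning(sensors):
    """Return the name of the sensor with the smallest remaining margin.

    `sensors` maps a sensor name to a (temperature, warning threshold) pair.
    """
    return min(sensors, key=lambda name: degrees_below_warning(*sensors[name]))


if __name__ == "__main__":
    sensors = {  # hypothetical readings, in degrees Celsius
        "cpu": (62.0, 80.0),
        "ambient": (27.0, 35.0),
    }
    for name, (temperature, threshold) in sensors.items():
        margin = degrees_below_warning(temperature, threshold)
        print(f"{name}: {margin:.1f} degrees below warning")
    print("Closest to warning:", closest_to_warning(sensors))
```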

To monitor the fan performance of servers

The temperature inside a server case is controlled with fans. To prevent the internal temperature from getting too high, verify that the fan is operating properly.

  1. Log in to your TrueSight console.
  2. Select Dashboards from the navigation pane.
  3. In the Dashboards page, click Add Dashboard or select Add Dashboard from the dashboard action menu inline.

    Adding a Dashboard

  4. Enter a Title for your dashboard and configure a Global Filter, if needed.

  5. Click Add Dashlet to open the dashlet library for the first column of the row.
  6. From the dashlet library, select the Device Performance template, and then click Close.

    Selecting the Device Performance Template

  7. Select Configure Dashlet by clicking the inline button.

  8. In the panel of input fields and options that opens below the dashboard, enter a Title for the dashlet and specify a Refresh Rate (default is 5 minutes).
  9. Select the device for which you want to monitor potential temperature issues.
  10. Select the Present parameter of the fan you wish to monitor. Click Apply.
  11. Repeat this operation in the second column of the dashboard for the Speed parameter of the fan of the same device. This parameter can help you determine if the fan speed is too fast or too slow. Click Apply.

    Selecting the Present Parameter

    Tip: To quickly retrieve a component or a parameter, enter its name in the Search Parameters field and click Search.

  12. Repeat this operation in the third column of the dashboard for the Speed Percent parameter of the fan of the same device. This parameter indicates if the fan has reached its maximum speed.

  13. Click Apply.
  14. The dashboard is completed. Cooling might not be sufficient if the fan speed is too low or if maximum speed is reached.

    Dashboard - Monitoring the fan performance of servers

    A fan which is no longer spinning or is turning too slowly should be replaced immediately.
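The relationship between the Speed and Speed Percent parameters can be sketched as a simple ratio. The example below is only an illustration: the function names, the maximum speed, and the 20% "too slow" limit are assumptions, not values used by Hardware Sentry.

```python
def speed_percent(speed_rpm, max_speed_rpm):
    """Fan speed as a percentage of its maximum speed."""
    return speed_rpm / max_speed_rpm * 100


def cooling_concern(speed_rpm, max_speed_rpm, low_percent=20.0):
    """Return a short description of the concern, or None if the fan looks fine."""
    if speed_rpm == 0:
        return "stopped"
    percent = speed_percent(speed_rpm, max_speed_rpm)
    if percent >= 100:
        return "at maximum speed"
    if percent < low_percent:
        return "too slow"
    return None


if __name__ == "__main__":
    for rpm in (0, 600, 3000, 6000):  # hypothetical readings for a 6000 RPM fan
        print(rpm, "RPM ->", cooling_concern(rpm, 6000) or "ok")
```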

To monitor the status of devices’ temperature sensors

Monitoring temperature sensors helps identify which device is properly operating and which is in poor or critical condition.

  1. Log in to your TrueSight console.
  2. Select Dashboards from the navigation pane.
  3. In the Dashboards page, click Add Dashboard or select Add Dashboard from the dashboard action menu inline.

    Adding a Dashboard

  4. Enter a Title for your dashboard and configure a Global Filter, if needed.

  5. Click Add Dashlet to open the dashlet library for the first column of the row.
  6. From the dashlet library, select the Device Performance template, and then click Close.

    Selecting the Device Performance Template

  7. Select Configure Dashlet by clicking the inline button.

  8. In the panel of input fields and options that opens below the dashboard, enter a Title for the dashlet and specify a Refresh Rate (default is 5 minutes).
  9. Select the device for which you want to monitor potential temperature issues.
  10. Select the Temperature Status parameter of the component you wish to monitor. Click Apply.
  11. Repeat this operation in the second and third column of the dashboard for another component of the same device or for another device.

    Selecting the Temperature Status Parameter

    Tip: To quickly retrieve a component or a parameter, enter its name in the Search Parameters field and click Search.

  12. Click Apply.

  13. The dashboard is completed.

    Dashboard - Monitoring the Status of Devices’ Temperature Sensors

  14. To get even more precise information about the temperature of a device component, create a similar dashboard with its Temperature parameter. This parameter reports the actual temperature as registered by the sensor.

    Dashboard - Reporting the Actual Temperature of the Sensor

Monitoring Network Traffic & Preventing Bottlenecks

Applications rely on the network, whose bandwidth and latency have a dramatic impact on the overall measured and perceived IT performance. Hardware Sentry monitors the connectivity and the quality of the network connections. The incoming and outgoing traffic is also constantly measured against the available bandwidth to give system administrators short-term and long-term visibility into the network capacity utilization.

To verify a network connection

  1. Log in to your TrueSight console.
  2. Select Dashboards from the navigation pane.
  3. In the Dashboards page, click Add Dashboard or select Add Dashboard from the dashboard action menu inline.

    Adding a Dashboard

  4. Enter a Title for your dashboard and configure a Global Filter, if needed.

  5. Click Add Dashlet to open the dashlet library for the first column of the row.
  6. From the dashlet library, select the Device Performance template, and then click Close.

    Selecting the Device Performance Template

  7. Select Configure Dashlet by clicking the inline button.

  8. In the panel of input fields and options that opens below the dashboard, enter a Title for the dashlet and specify a Refresh Rate (default is 5 minutes).
  9. Select the device for which you want to monitor potential network issues.
  10. Select the Link Status parameter of the Network Interface you wish to monitor and click Apply.

    Selecting the Link Status Parameter

    Tip: To quickly retrieve a component or a parameter, enter its name in the Search Parameters field and click Search.

  11. Repeat this operation in the second column with the Link Speed parameter of the same Network Interface. For Ethernet or fiber adapters, any variation of this parameter means the quality of the connection is poor and needs to be fixed. By default, a warning event is triggered when the link speed downgrades from its current value to a lower value (from 1 Gb/s to 100 Mb/s for example).

  12. Click Apply.
  13. The dashboard is completed.

    Dashboard - Monitoring a Network Connection

To get a full picture of your network connections, you can also create a dashboard to monitor:

  • the Duplex Mode parameter that indicates whether the Ethernet connection is working in full duplex or half-duplex mode.
  • the Error Percent parameter to monitor the percentage of transmitted and received packets that were in error. A frequent high error percentage may indicate a serious problem with the cable or the interface.
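The Error Percent described above is the share of packets in error among all transmitted and received packets. A minimal sketch of that ratio follows; the function name and the packet counts are hypothetical and only serve to illustrate the calculation.

```python
def error_percent(error_packets, total_packets):
    """Percentage of transmitted and received packets that were in error."""
    if total_packets == 0:
        return 0.0  # no traffic during the interval, so nothing in error
    return error_packets / total_packets * 100


if __name__ == "__main__":
    # Hypothetical counts collected over one polling interval
    print(f"{error_percent(5, 1000):.2f}% of packets in error")
```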

To monitor the transmission rates

Monitoring transmission rates provides administrators with valuable information about the incoming and outgoing data managed by servers and switches and helps identify the traffic demands and peak periods.

  1. Log in to your TrueSight console.
  2. Select Dashboards from the navigation pane.
  3. In the Dashboards page, click Add Dashboard or select Add Dashboard from the dashboard action menu inline.

    Adding a Dashboard

  4. Enter a Title for your dashboard and configure a Global Filter, if needed.

  5. Click Add Dashlet to open the dashlet library for the first column of the row.
  6. From the dashlet library, select the Device Performance template, and then click Close.

    Selecting the Device Performance Template

  7. Select Configure Dashlet by clicking the inline button.

  8. In the panel of input fields and options that opens below the dashboard, enter a Title for the dashlet and specify a Refresh Rate (default is 5 minutes).
  9. Select the device for which you want to monitor transmission rates.
  10. Select the Received Bytes Rate parameter of the Network Interface you wish to monitor and click Apply.

    Selecting the Received Bytes Rate Parameter

    Tip: To quickly retrieve a component or a parameter, enter its name in the Search Parameters field and click Search.

  11. Repeat this operation in the second column with the Transmitted Bytes Rate parameter of the same Network Interface. Click Apply.

  12. In the second column of the dashboard, repeat this operation for the Received Packets Rate and the Transmitted Packets Rate parameters of the same Network Interface. Click Apply.
  13. The dashboard is completed.

    Dashboard - Monitoring Transmission Rates
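Rates such as Received Bytes Rate are typically derived from cumulative counters sampled at two points in time. The sketch below shows that general technique under stated assumptions: the function name, the counter values, and the 5-minute interval are hypothetical, and counter resets or wrap-arounds are not handled.

```python
def rate_per_second(previous_count, current_count, interval_seconds):
    """Average rate between two samples of a cumulative counter."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return (current_count - previous_count) / interval_seconds


if __name__ == "__main__":
    # Hypothetical received-bytes counter sampled 300 seconds apart
    rate = rate_per_second(1_000_000, 4_000_000, 300)
    print(f"Received bytes rate: {rate:.0f} bytes/s")
```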

To monitor the bandwidth utilization

Monitoring the bandwidth utilization of network interfaces can help identify unexpected and random peaks in the network activity, which could hide business critical issues, such as a network attack or unauthorized transfer of data.

The Bandwidth Utilization parameter can ONLY be collected if Link Speed, Duplex Mode, Received Bytes Rate and Transmitted Bytes Rate are all properly collected.

  1. Log in to your TrueSight console.
  2. Select Dashboards from the navigation pane.
  3. In the Dashboards page, click Add Dashboard or select Add Dashboard from the dashboard action menu inline.

    Adding a Dashboard

  4. Enter a Title for your dashboard and configure a Global Filter, if needed.

  5. Click Add Dashlet to open the dashlet library for the first column of the row.
  6. From the dashlet library, select the Device Performance template, and then click Close.

    Selecting the Device Performance Template

  7. Select Configure Dashlet by clicking the inline button.

  8. In the panel of input fields and options that opens below the dashboard, enter a Title for the dashlet and specify a Refresh Rate (default is 5 minutes).
  9. Select the device for which you want to monitor bandwidth utilization.
  10. Select the Bandwidth Utilization parameter of the Network Interface you wish to monitor and click Apply.

    Selecting the Bandwidth Utilization Parameter

    Tip: To quickly retrieve a component or a parameter, enter its name in the Search Parameters field and click Search.

  11. Repeat this operation in the second column with another Network Interface. Click Apply.

  12. The dashboard is completed.

    Dashboard - Monitoring the Bandwidth Utilization
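The dependency of Bandwidth Utilization on Link Speed, Duplex Mode, Received Bytes Rate and Transmitted Bytes Rate can be illustrated with a common way of computing utilization. This is a sketch under assumptions, not the formula documented for Hardware Sentry: it assumes the link speed is expressed in bits per second, that a full-duplex link is limited by its busiest direction, and that a half-duplex link shares its capacity between both directions.

```python
def bandwidth_utilization_percent(link_speed_bps, full_duplex,
                                  rx_bytes_per_s, tx_bytes_per_s):
    """Estimate the utilization of a network interface, in percent."""
    if link_speed_bps <= 0:
        raise ValueError("link_speed_bps must be positive")
    rx_bits = rx_bytes_per_s * 8
    tx_bits = tx_bytes_per_s * 8
    # Full duplex: each direction has the full link speed available.
    # Half duplex: both directions share the same capacity.
    load_bits = max(rx_bits, tx_bits) if full_duplex else rx_bits + tx_bits
    return load_bits / link_speed_bps * 100


if __name__ == "__main__":
    # Hypothetical 1 Gb/s full-duplex interface
    usage = bandwidth_utilization_percent(1_000_000_000, True,
                                          50_000_000, 25_000_000)
    print(f"Bandwidth utilization: {usage:.1f}%")
```

If any of the four inputs is missing, the calculation cannot be performed, which is consistent with the requirement stated at the beginning of this procedure.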

Predicting Hardware Failures

Even though end-users expect the IT environment they rely on to be flawless, it is common knowledge that hardware components are inherently prone to failure. In most cases, electronic components work as expected or fail completely and it is rare to be able to observe such components degrade slowly over time. That is the reason why Hardware Sentry only reports the overall status for most object classes as simply “OK” or “failed”.

However, some components are able to report their own degradation and warn the administrator of an imminent failure. Such components include:

  • the processors (the more computation errors they detect and correct automatically, the more likely they will fail soon).
  • the memory modules (an increasing number of fixed ECC errors means the module is nearing its end of life).
  • the hard disks (many internal metrics are constantly analyzed by the disk itself to assess its own health and predict an imminent failure – this technology is standard and is called S.M.A.R.T.).

When such information is properly reported by the component or the instrumentation layer of the system itself, Hardware Sentry will trigger an event to warn the administrators that an imminent failure of a processor, a memory module or a physical disk is likely to occur.
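The reasoning above, where a growing number of corrected errors suggests an imminent failure, can be sketched as a simple trend check. This is only an illustration: the function name, the sample counters, and the `min_increase` limit are hypothetical, and Hardware Sentry relies on what the component or the instrumentation layer reports rather than on this logic.

```python
def corrected_errors_increasing(counts, min_increase=1):
    """True if a cumulative corrected-error counter grew between the first and last sample."""
    if len(counts) < 2:
        return False  # not enough samples to observe a trend
    return counts[-1] - counts[0] >= min_increase


if __name__ == "__main__":
    stable_module = [3, 3, 3]    # hypothetical ECC corrected-error counts
    degrading_module = [3, 5, 9]
    print("stable:", corrected_errors_increasing(stable_module))
    print("degrading:", corrected_errors_increasing(degrading_module))
```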

To monitor potential hardware failures

  1. Log in to your TrueSight console.
  2. Select Dashboards from the navigation pane.
  3. In the Dashboards page, click Add Dashboard or select Add Dashboard from the dashboard action menu inline.

    Adding a Dashboard

  4. Enter a Title for your dashboard and configure a Global Filter, if needed.

  5. Click Add Dashlet to open the dashlet library for the first column of the row.
  6. From the dashlet library, select the Device Performance template, and then click Close.

    Selecting the Device Performance Template

  7. Select Configure Dashlet by clicking the inline button.

  8. In the panel of input fields and options that opens below the dashboard, enter a Title for the dashlet and specify a Refresh Rate (default is 5 minutes).
  9. Select the device for which you want to monitor potential hardware failures.
  10. Select the Predicted Failure parameter(s) of the processor(s) you wish to monitor and click Apply.

    Selecting the Predicted Failure Parameter

    Tip: To quickly retrieve a component or a parameter, enter its name in the Search Parameters field and click Search.

  11. Click Apply.

  12. The dashboard is completed.

    Dashboard - Monitoring Potential Hardware Failures

If this parameter is equal to 1 and goes into alarm, the faulty hardware should be replaced.