Monitoring the Environment: Fans, Temperatures, Power Supplies and Voltages


BPM Express for Hardware automatically detects the information sources available on the monitored computer and displays the hardware information they provide in the Portal interface. The Environment icon groups the icons created for each sensor found for Fans, Temperatures, Power Supplies and Voltages. These icons are created automatically.

In the left pane, click an Element > Hardware (<platform>) > Computer Type > Environment to see the discovered hardware components in detail in the right pane. Similarly, click any hardware component in the left pane to see its details in the right pane. For each monitored element, graphs and text reports are built by polling the parameter instances every five minutes. To view these graphs or text reports from the Status tab:

Monitoring the Environment: fans, temperatures and voltages

1. Click the parameter, then click the corresponding History icon that appears in the right pane.

2. For parameters with numeric or Boolean values, such as Temperature, Voltage, Speed, Speed Percent and Used Capacity, you can view the results in either Chart view (graph) or Table view. Parameters with text values are shown in Table view.

Alert Thresholds: Depending on the type of platform and sensors, BPM Express for Hardware sets alert thresholds automatically whenever possible. When a parameter value breaches these thresholds, it is the Status parameter, a text parameter that displays the overall status of every instance, that triggers the alert. A notification is then sent according to the options configured in the Portal.

If a device appears to be missing, the Status parameter will trigger an alert. Alert conditions for Status describe in symbolic terms what occurs in the Status parameter when thresholds are breached: one exclamation mark triggers a warning; two exclamation marks raise an alarm.
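The symbolic convention above can be illustrated with a short Python sketch. It is not part of BPM Express for Hardware: the function name severity_from_status is hypothetical, and the sketch assumes only that the Status text contains “!!” for an alarm and a single “!” for a warning.

```python
def severity_from_status(status_text: str) -> str:
    """Map a Status text value to a severity using the "!" / "!!" convention."""
    # Check the double exclamation mark first: any text containing "!!"
    # also contains "!", so the order of the tests matters.
    if "!!" in status_text:
        return "ALARM"
    if "!" in status_text:
        return "WARNING"
    return "OK"


for text in (
    "OK",
    "WARNING! The fan speed is too low",
    "ALARM!! This fan is not detected anymore",
):
    print(severity_from_status(text), "<-", text)
```

Testing for “!!” before “!” is the one design point worth noting, since an alarm text would otherwise be classified as a warning.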

Example

If BPM Express for Hardware detects that the manufacturer-specified thresholds for the device have been breached, the Status parameter reports, for example, “WARNING! The fan speed is too low” or “ALARM!! This fan has stopped working”. The history graph shows the exact details of the problem, its consequences and recommended actions.

Fans

To avoid temperatures that are too high, system manufacturers install fans on critical devices (processors, power supplies, etc.). Monitoring fans is important as they ensure a proper temperature for the system to work efficiently. Depending on the available information, the Speed and/or Speed Percent and/or Status parameters will be displayed for each detected fan device:

The Speed parameter represents the speed of the corresponding fan in rotations/minute. An alert is triggered if the fan speed is too low for proper functioning.

The Speed Percent parameter represents the speed of the corresponding fan as a percentage of its maximum speed.

The Status parameter represents the overall status of the fan. An alert is triggered if any of the parameters breaches its threshold. Only Status triggers and displays the alerts. When all is fine, Status shows “OK”; when there is a problem, it shows “WARNING!” or “ALARM!!” with a detailed description of the issue, its consequences and recommended actions. The alert conditions for Status are: “!”=WARNING; “!!”=ALARM. Example: “OK” or “ALARM!! This fan is not detected anymore”.
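As an illustration of how Speed and Speed Percent relate, the following Python sketch converts a speed in rotations/minute into a percentage of the maximum speed. The function and parameter names are hypothetical, and the sketch assumes the maximum speed of the fan is known; how the product itself obtains the value depends on the available information source.

```python
def speed_percent(speed_rpm: float, max_speed_rpm: float) -> float:
    """Return the fan speed as a percentage of its maximum speed."""
    if max_speed_rpm <= 0:
        raise ValueError("max_speed_rpm must be positive")
    return 100.0 * speed_rpm / max_speed_rpm


# Hypothetical fan rated for 6000 rotations/minute, currently at 3000.
print(speed_percent(3000, 6000))
```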

Temperatures

As with any electronic device, chips and other components of a computer stop working properly when the temperature rises too high, which can cause unrecoverable errors, crashes and even hardware damage. Temperatures may rise too high when the device is abnormally overloaded, when a fan is not working properly or when the ambient temperature is too high. Monitoring the temperatures of the critical devices in your system allows you to take action before a crash occurs.

Depending on the available information, the Temperature and/or Status parameters will be displayed for each detected temperature sensor:

The Temperature parameter represents the current temperature reading in degrees Celsius (°C).

The Status parameter represents the overall status of the temperature. An alert is triggered if the temperature rises too high, that is, if a monitored parameter breaches its threshold. Only Status triggers and displays the alerts. When all is fine, Status shows “OK”; when there is a problem, it shows “WARNING!” or “ALARM!!” with a detailed description of the issue, its consequences and recommended actions. The alert conditions for Status are: “!”=WARNING; “!!”=ALARM. Example: “OK” or “ALARM!! The temperature is critically high”.
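The relationship between a temperature reading, its thresholds and the Status text can be sketched in Python as follows. This is a minimal illustration, not the product's logic: the function name, the threshold values and the wording of the warning message are assumptions; only the “OK”, “WARNING!” and “ALARM!!” prefixes and the alarm text come from this page.

```python
def temperature_status(temp_c: float, warn_c: float, alarm_c: float) -> str:
    """Build a Status-like text from a reading in degrees Celsius."""
    if temp_c >= alarm_c:
        return "ALARM!! The temperature is critically high"
    if temp_c >= warn_c:
        # Hypothetical wording for the warning case.
        return "WARNING! The temperature is abnormally high"
    return "OK"


# Hypothetical thresholds: warning at 60 °C, alarm at 75 °C.
for reading in (42.0, 63.5, 80.0):
    print(reading, "->", temperature_status(reading, warn_c=60.0, alarm_c=75.0))
```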

Power Supplies

The power supply is the component that converts AC line power into the electric power needed by the computer. It is therefore a highly critical device that should never fail, which is why many vendors build servers with redundant power supplies. Monitoring power supplies alerts operators when a power supply fails or, in some cases, when it is overloaded.

Depending on the available information, the Used Capacity and/or Status parameters will be displayed for each power supply or power unit device:

The Used Capacity parameter represents the current power usage as a percentage. The Status parameter triggers an alert when the power supply’s maximum power output is reached.

The Status parameter represents the overall status of the power supply. An alert is triggered if the power output goes out of range, that is, if a monitored parameter breaches its thresholds. Only Status triggers and displays the alerts. When all is fine, Status shows “OK”; when there is a problem, it shows “WARNING!” or “ALARM!!” with a detailed description of the issue, its consequences and recommended actions. The alert conditions for Status are: “!”=WARNING; “!!”=ALARM. Example: “WARNING! Problem: This power supply is in degraded state, or about to fail” or “ALARM!! The power consumed by the system is out of the supported range”.

Voltages

Power supplies convert the AC line power into the voltages and currents needed by the motherboard of the computer. The stability of the motherboard (and therefore that of the overall computer) strongly depends on this power conversion. Voltages that are too low or too high may lead to unpredictable system crashes. Monitoring the different voltages needed by the motherboard helps detect system instability.

Depending on the available information, the Voltage and/or Status parameters are displayed for each voltage sensor on the motherboard:

The Voltage parameter represents the voltage output in millivolts (mV). The Status parameter triggers an alert if the voltage goes out of the proper range.

The Status parameter represents the overall status of the voltage. It triggers an alert if the voltage output is too low for proper functioning or otherwise goes out of the proper range. Only Status triggers and displays the alerts. When all is fine, Status shows “OK”; when there is a problem, it shows “WARNING!” or “ALARM!!” with a detailed description of the issue, its consequences and recommended actions. The alert conditions for Status are: “!”=WARNING; “!!”=ALARM. Example: “OK” or “ALARM!! This voltage sensor is no longer detected”.
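Because the Voltage parameter is expressed in millivolts, a range check is easiest to reason about in the same unit. The Python sketch below shows such a check together with a conversion to volts for display. The function names and the range limits are hypothetical; actual thresholds depend on the platform and sensor.

```python
def millivolts_to_volts(voltage_mv: float) -> float:
    """Convert a reading in millivolts (mV) to volts (V)."""
    return voltage_mv / 1000.0


def is_voltage_in_range(voltage_mv: float, low_mv: float, high_mv: float) -> bool:
    """Return True when the reading lies within the inclusive range."""
    return low_mv <= voltage_mv <= high_mv


# Illustrative limits for a hypothetical 12 V rail, in millivolts.
LOW_MV, HIGH_MV = 11_400, 12_600

for reading_mv in (12_050, 11_200):
    in_range = is_voltage_in_range(reading_mv, LOW_MV, HIGH_MV)
    print(f"{millivolts_to_volts(reading_mv):.2f} V in range: {in_range}")
```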


Related Topics

Modifying Parameters Thresholds
