Monitoring Processors

Processors (also called CPUs, for Central Processing Units) are the most critical devices within a computer. Although a processor fault often crashes the system before a monitoring tool has a chance to catch the error, it is still very useful to monitor a server’s processors.

When a processor fault crashes the system, the system reboots automatically. The reboot is triggered either by the operating system or by the motherboard itself. If a processor is no longer working, the BIOS disables it automatically and, if at least one other processor remains, the operating system starts with one fewer processor. Hardware Sentry monitors each processor and verifies that it is present and running. If a processor is missing after a reboot, Hardware Sentry triggers an alert.
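The missing-processor check described above can be pictured as a comparison between the processors known before the reboot and those discovered afterwards. The following Python sketch only illustrates that idea; it is not Hardware Sentry's implementation, and the function name and the processor identifiers ("CPU0", "CPU1") are hypothetical.

```python
def find_missing_processors(expected_ids, discovered_ids):
    """Return, in sorted order, the processors that were known before
    the reboot but were not discovered afterwards.

    Both arguments are iterables of processor identifiers. The identifiers
    are hypothetical; a real tool would use whatever the hardware reports.
    """
    return sorted(set(expected_ids) - set(discovered_ids))


# Example: a two-processor server comes back up with only one processor.
before_reboot = ["CPU0", "CPU1"]
after_reboot = ["CPU0"]

for cpu in find_missing_processors(before_reboot, after_reboot):
    print(f"ALERT: processor {cpu} is missing after reboot")
```

A set difference is used here because only presence matters, not the order in which processors are discovered.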

On some recent or high-end servers, processors are able to correct certain operation errors by themselves (much as ECC memory does). If this information is available, Hardware Sentry shows it in the PATROL Console. In addition, if the processor is able to predict its own failure, Hardware Sentry monitors this information and shows it in the PATROL Console.

Depending on the available information, some or all of the following parameters are displayed for each discovered processor (CPU):

The Status parameter represents the current status of the processor. An alert is triggered if the processor is unavailable for proper operation (missing, disabled by the BIOS due to a POST error, etc.).
The PredictedFailure parameter reports the predicted failure analysis performed by the processor itself. This information is based on the rate of corrected errors.
The CorrectedErrorCount parameter represents the number of errors that have been automatically corrected by the processor. This information can be very useful to predict a failure in the near future.
The Speed parameter reports the current speed of the processor in megahertz. The speed of a processor can vary depending on the workload of the server.
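Because CorrectedErrorCount is a running count, what usually matters when anticipating a failure is how fast it grows. The Python sketch below shows one way an administrator might derive an error rate from two successive readings. It is an illustration only: the ErrorSample type, the function names, and the threshold of 10 errors per hour are assumptions, and this is not how the processor or Hardware Sentry computes PredictedFailure.

```python
from dataclasses import dataclass


@dataclass
class ErrorSample:
    """One reading of a processor's corrected-error counter (hypothetical type)."""
    timestamp_s: float          # collection time, in seconds
    corrected_error_count: int  # cumulative count at that time


def corrected_error_rate(previous, current):
    """Return corrected errors per hour between two samples.

    Returns None when no rate can be computed: the samples are not in
    chronological order, or the counter went down (for example because
    it was reset by a reboot).
    """
    elapsed_s = current.timestamp_s - previous.timestamp_s
    delta = current.corrected_error_count - previous.corrected_error_count
    if elapsed_s <= 0 or delta < 0:
        return None
    return delta * 3600.0 / elapsed_s


def failure_suspected(previous, current, max_errors_per_hour=10.0):
    """Return True when the rate exceeds the (assumed) threshold."""
    rate = corrected_error_rate(previous, current)
    return rate is not None and rate > max_errors_per_hour


# Example: 12 new corrected errors in 30 minutes.
earlier = ErrorSample(timestamp_s=0.0, corrected_error_count=5)
later = ErrorSample(timestamp_s=1800.0, corrected_error_count=17)
print(corrected_error_rate(earlier, later))
print(failure_suspected(earlier, later))
```

Treating a decreasing counter as "no rate available" avoids raising a false alert right after a reboot, when the count may start again from zero.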

See Also

Component Monitoring