Type
  • Best Practice
Description
  • This article provides detailed instructions for configuring Hardware Sentry KM for PATROL to properly monitor IBM AIX servers.
Additional Keywords
  • AIX, IBM

Monitoring IBM AIX Servers with Hardware Sentry

KB1049 - Sep 20, 2010 - Last reviewed on Sep 13, 2018

This document covers the IBM AIX servers running on the PowerPC processor architecture. This includes:

  • IBM RS/6000
  • IBM pSeries
  • IBM eServer p5
  • IBM eServer p6
  • IBM eServer p7

This guide partially covers the IBM Regatta product line, as well as the IBM pSeries p690 and eServer p5 595, also known as IBM UNIX mainframes.

About IBM AIX servers

Since 1990, AIX has served as the primary operating system for the RS/6000 series (later renamed IBM eServer pSeries, then IBM System p, and now IBM Power Systems). Hardware Sentry KM and BMC Performance Manager Express for Hardware support AIX versions from 4.2.x.

The internal parts of IBM AIX servers have always been a basic set of:

  • PowerPC processors
  • Memory modules
  • Ethernet cards
  • Fiber cards
  • SCSI disk controllers

A few of these servers also included a RAID adapter (IBM SSA RAID controllers).

Hardware instrumentation

In-band: AIX system utilities (lsdev, dd, entstat, machstat, etc.)

The IBM AIX operating system comes with several command line utilities that provide useful information about the underlying hardware. However, it is important to note that AIX doesn’t offer any way to retrieve the actual value of environment sensors in the system.

The following utilities are used by Hardware Sentry KM and BMC Performance Manager Express for Hardware to discover, monitor, and process the various hardware components of IBM AIX systems:

  • uname, prtconf, lsdev, lscfg (general device discovery, status of processors and memory modules)
  • entstat (network card discovery and status)
  • fcstat (HBA discovery and status)
  • uesensor (environment on a few IBM pSeries systems)
  • bootinfo, machstat (environment on CHRP systems)
  • dd, errpt, lspv (disk monitoring)
  • awk, tail, head

bootinfo, machstat and dd require sudo / root access.
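As a quick sanity check before configuring the product, the sketch below reports which of the utilities listed above cannot be found on the current PATH. The function name missing_utilities is hypothetical and only used for illustration; it is not part of Hardware Sentry.

```shell
#!/bin/sh
# Print the name of every utility passed as an argument that cannot be
# found on the PATH of the account running this script.
missing_utilities() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Utilities named in the list above.
missing_utilities uname prtconf lsdev lscfg entstat fcstat uesensor \
    bootinfo machstat dd errpt lspv awk tail head
```

A utility reported as missing is not necessarily absent from the system: it may simply be installed in a directory that is not on this account's PATH, and uesensor is only expected on a few pSeries systems.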

Setting up Hardware Sentry on IBM AIX servers

Pre-requisites

The server must be running IBM AIX 4.2.x or later.

Configuration

Some system utilities used by Hardware Sentry require root privileges. To ensure that Hardware Sentry can use these utilities to discover and monitor the hardware components of an IBM AIX server, you can either configure Hardware Sentry to execute all of its external commands as root or configure it to use the sudo utility for a specified list of commands.

The following commands require root privileges:

  • /usr/sbin/bootinfo (on CHRP systems, i.e. most AIX 5.x and 6.x systems)
  • /usr/sbin/machstat (on CHRP systems)
  • /usr/bin/dd

Please note that the sudo utility must have been installed on the system and configured to allow the PATROL Agent’s default account to execute the selected commands as root. This can be done in the /etc/sudoers file.
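The sketch below prints one possible sudoers rule covering the three commands listed above. The account name patrol is a placeholder for the PATROL Agent's default account, and the NOPASSWD tag is an assumption made here because the commands are run non-interactively; adapt both to your own security policy.

```shell
#!/bin/sh
# Print a sudoers rule that lets one account run the three root-only
# commands listed above as root.
# ASSUMPTION: "patrol" stands in for the PATROL Agent's default account.
# ASSUMPTION: NOPASSWD is used because no one is there to type a password.
sudoers_rule() {
    account="${1:-patrol}"
    printf '%s ALL=(root) NOPASSWD: %s, %s, %s\n' \
        "$account" /usr/sbin/bootinfo /usr/sbin/machstat /usr/bin/dd
}

sudoers_rule patrol
```

The printed line is meant to be added to /etc/sudoers with visudo, which checks the syntax before saving, rather than appended to the file directly.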

For PATROL

To add a new AIX system in PATROL:

  1. Right-click the main Hardware icon > KM Commands > Add a Remote System or External Device.
  2. Provide the Hostname or IP Address, and specify the Device Type as IBM AIX.
  3. Choose SSH (OS Commands) as the protocol to use.
  4. For Connector Selection Mode, choose to automatically detect the suitable connectors.
  5. Enter the root login and password.

Alternatively, to configure Hardware Sentry to use sudo, follow the same procedure but click on the Sudo Options in the wizard when prompted for Credentials, then select the commands for which Hardware Sentry will use sudo.

For TrueSight

To add a new AIX system in TrueSight, specify the following in the infrastructure policy’s monitoring configuration:

  1. The Device Type should be set to IBM AIX.
  2. Credentials for the device should be entered in the SSH section of the configuration.
  3. If root credentials have not been specified, check the box under Sudo Options labelled “Use When Root Privileges are Needed”.

TrueSight Device Configuration

Troubleshooting

If Hardware Sentry KM or BMC Performance Manager Express for Hardware does not seem to monitor the power supplies and fans of IBM AIX 5.x or later systems, it probably means that you haven’t configured the product with the root account or sudo as explained above.
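If sudo is the chosen method, one way to check it from the command line is to ask sudo, without prompting, whether the current account may run each root-only command. The sketch below is meant to be run as the monitoring account; the function name sudo_allows and the SUDO variable are hypothetical helpers introduced for this example.

```shell
#!/bin/sh
# SUDO can be overridden to point at a sudo binary in a non-standard location.
SUDO="${SUDO:-sudo}"

# Succeed only if sudo, without prompting for a password (-n), reports that
# the current account is allowed to run the given command (-l).
sudo_allows() {
    "$SUDO" -n -l "$1" >/dev/null 2>&1
}

for cmd in /usr/sbin/bootinfo /usr/sbin/machstat /usr/bin/dd; do
    if sudo_allows "$cmd"; then
        echo "$cmd: sudo permitted"
    else
        echo "$cmd: sudo NOT permitted without a password"
    fi
done
```

A failure does not always mean that the rule is missing: it can also mean that sudo is not installed or that it would have asked for a password.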

It is normal not to have a distinct instance for each sensor, power supply and fan. IBM AIX systems are not able to report the status of the environment with per-sensor granularity. You only get a general “System cooling” instance and a general “System power” instance. These objects will trigger a warning or an alarm when a fan or a power supply fails or when the temperature goes too high.

If Hardware Sentry reports the status of the disks as “Unknown”, it is likely that the access rights on the /dev/hdiskN device files don’t grant read access to the PATROL Agent’s default account, and that Hardware Sentry hasn’t been configured to execute external commands as root or to use the sudo utility for the dd command.
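To check the read access described above, the sketch below tries to read a single block from each disk device with dd. It is meant to be run as the monitoring account. The exact dd invocation used by Hardware Sentry is not documented here, so the options shown are an assumption chosen for illustration, and can_read_block is a hypothetical helper name.

```shell
#!/bin/sh
# Try to read one 512-byte block from a device file and discard it.
# Succeeds only if the current account has read access to the device.
can_read_block() {
    dd if="$1" of=/dev/null bs=512 count=1 >/dev/null 2>&1
}

# The /dev/hdisk* pattern follows the /dev/hdiskN naming mentioned above.
for dev in /dev/hdisk*; do
    [ -e "$dev" ] || continue   # pattern did not match: no hdisk devices here
    if can_read_block "$dev"; then
        echo "$dev: readable"
    else
        echo "$dev: NOT readable (check permissions or use sudo for dd)"
    fi
done
```

If every disk is reported as not readable, either adjust the access rights on the device files or configure root or sudo access for dd as described in the Configuration section.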

Discovered components and monitored parameters

When configured properly, the following connectors should be automatically selected by Hardware Sentry in order to monitor an IBM AIX server:

  • IBM AIX - Common
  • IBM AIX - CHRP Environment
  • IBM AIX - SCSI disks
  • IBM AIX - Environment (uesensor) (only on a few pSeries servers)

In turn, the following components and parameters are discovered and monitored:

  • Server model
  • Overall cooling status
  • Overall powering status
  • Memory modules, size, status, error count
  • Processors, type and frequency, status
  • Physical disks, vendor, size, serial number, error count and status
  • Network cards, vendor, model, status, link status, speed and duplex, input and output (bytes, packets and error percentage), bandwidth utilization
  • HBA, model, WWN, serial number, device type, bandwidth, link status, error count, total packets

VIO Servers

Principles

On IBM pSeries systems partitioned in several LPARs, the monitoring of the hardware needs to be configured in a specific way. Typically, one of the LPARs is dedicated to the processing of the I/Os and is called the VIO Server.

The monitoring of CPU, Overall Cooling Status, Overall Power Status and Memory is done at the LPAR level. As each LPAR is only able to see components that have been exclusively dedicated to it or those components that are being shared with another LPAR, you will need to monitor several LPARs to be able to see all components.

The monitoring of Network Cards, Physical disks and HBAs requires access to the VIO Server. The LPARs are only able to see the virtual versions of these components, so Hardware Sentry needs access to the VIO Servers to get their real status.

Configuration

Because server components and resources can be dynamically allocated to and de-allocated from LPARs, we recommend turning off missing device detection when monitoring LPARs. For PATROL, this setting is located under Hardware icon > KM Commands > KM Settings > Missing Device Detection.

In TrueSight, this setting is located under Monitoring Policy > Monitoring Configuration > Missing Device Detection. Missing device detection can be left on for the VIO Server.

System utilities used by Hardware Sentry when monitoring the VIO require padmin privileges. To ensure that Hardware Sentry can use these utilities to discover and monitor the hardware components of an IBM AIX server, you can either configure Hardware Sentry to execute all external commands as padmin, or use a user account with equal permission settings.

Using Hardware Management Console

If an IBM Hardware Management Console is available, AIX systems can instead be monitored through the HMC. The status of the System Attention LED of all IBM AIX servers connected to the Hardware Management Console will be reported. An ALARM alert will be triggered for a hardware problem based on the LED status. These alarms are automatically cleared at midnight of the day they occur.

This is a less detailed form of monitoring for these systems, as it reports only LED status and system presence, but it may be easier to configure because it requires monitoring only one device, the Hardware Management Console. The HMC itself is monitored as a Linux system.