Monitoring Sun Solaris Systems (sun4u) with Hardware Sentry
KB1043 - Sep 17, 2010
Type: Best Practice
Description: Monitoring Sun Solaris Systems (sun4u) with Hardware Sentry KM. Supported communication protocols and discovered hardware components.
Additional Keywords: ALOM, in-band, kstat, out-of-band, prtdiag, psrinfo, Solaris, Sun, sun4u, Sun Blade 100, Sun Fire V100, Sun Fire V210, Sun Fire 480r, Sun Fire V880, Sun Fire V1280, Sun Enterprise 3500
- The Hardware Instrumentation section covers the basic Sun Solaris Systems (sun4u) hardware components and the supported communication protocols.
- The Setting up... section provides detailed instructions for installing and configuring Hardware Sentry KM for PATROL to properly monitor Sun Solaris Systems (sun4u).
- The Discovered Components and Monitored Parameters section lists the components automatically discovered and monitored by Hardware Sentry KM for PATROL, as well as the connectors required to ensure proper monitoring.
- The Troubleshooting section gathers the most frequently asked questions about problems that stem from installation or configuration issues.
This section covers the Sun Solaris systems running on an UltraSPARC processor (UltraSPARC II, III, IV), including Sun Blade 100, Sun Fire V100, Sun Fire V210, Sun Fire 480r, Sun Fire V880, Sun Fire V1280, Sun Enterprise 3500 and all of their evolutions, also called “sun4u” systems.
About Sun Solaris systems (sun4u)
Sun Microsystems Inc. has been building its own UNIX systems from the ground up for almost two decades, including the processor (SPARC, UltraSPARC) and the operating system (SunOS, now called Solaris).
These Solaris/SPARC servers are built like any other server: a given number of SPARC processors, a given amount of RAM, some SCSI controllers (OEM) with a few internal SCSI disks (OEM’ed from Seagate or Fujitsu), as well as a couple of network interfaces.
Built to last, these servers nonetheless include few hardware redundancy mechanisms. There is no hardware RAID controller in sun4u systems for example. Disk redundancy is done with software RAID or is handled by an external storage array.
In-band: Solaris system utilities (prtdiag, psrinfo, kstat, etc.)
The Sun Solaris operating system ships with a set of command-line utilities that provide detailed information about the underlying hardware, which is an exception on the market.
The following utilities are used by Hardware Sentry to discover and monitor the various hardware components of sun4u systems:
- prtdiag, prtpicl and lom for the environment (temperatures, fans, voltages and power supplies)
- cediag (on pre-Solaris 10 systems) or prtdiag for the memory modules (when the information is available)
- psrinfo for the processors
- ifconfig, kstat and ndd (if required) for the network cards
- iostat and dd for the internal disks
Some of these utilities require root privileges for proper execution: lom, cediag, ndd and dd (in some cases).
Note about memory modules:
cediag is a utility that comes with the optional SUNWcest package. Hardware Sentry requires the SUNWcest package to be installed in order to use cediag to gather information about the memory modules. However, the SUNWcest package is only available for Solaris versions 8 and 9. It is not available for Solaris 10. Also, some Sun Solaris servers report information about their memory modules through the prtdiag utility. In such cases, Hardware Sentry will rely on prtdiag and will require neither cediag nor root privileges to discover and monitor memory modules.
Note about environment monitoring:
On Sun Fire V100, V110, V120 (and similar models), Hardware Sentry will use the lom command to gather environment information (temperatures, voltages, fans and power supplies). The lom utility requires root privileges. On other systems, Hardware Sentry will rely on prtdiag and prtpicl. It will actually rely on prtpicl when it is available (Solaris 9 and higher) for the temperature sensors, fans and voltage sensors. It will rely on prtdiag to discover the model of the server, discover the power supplies and the memory modules. It will also rely on prtdiag for temperatures, voltages and fans if prtpicl is not available.
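The selection rules above can be summarized as a small decision function. This is only a sketch of the logic as described in this note, not Hardware Sentry's actual code: the function name, the model-string matching and the use of the Solaris major version as a stand-in for "prtpicl is available" are all assumptions made for illustration.

```python
def environment_sources(model, solaris_version):
    """Return the utility expected to supply each kind of environment data.

    model: server model string such as "Sun Fire V120" (hypothetical format).
    solaris_version: Solaris major version as an integer (7, 8, 9 or 10).
    """
    # Only the three models named in the text are listed here; the
    # "similar models" it mentions are not enumerated, so they are omitted.
    lom_models = ("V100", "V110", "V120")
    if any(name in model for name in lom_models):
        # lom supplies all environment data on these models (requires root).
        return {
            "temperatures": "lom",
            "fans": "lom",
            "voltages": "lom",
            "power_supplies": "lom",
        }
    # Assumption: prtpicl is treated as available on Solaris 9 and higher.
    sensor_tool = "prtpicl" if solaris_version >= 9 else "prtdiag"
    return {
        "temperatures": sensor_tool,
        "fans": sensor_tool,
        "voltages": sensor_tool,
        # Power supplies are always discovered through prtdiag.
        "power_supplies": "prtdiag",
    }


print(environment_sources("Sun Fire V880", 9))
```

A real check would test whether prtpicl is actually present on the system rather than infer it from the version number.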
Note about network card monitoring:
In order to determine the link status of a network card and the link speed, Hardware Sentry needs to read some internal values in the network card driver. Sometimes these values are properly exposed by the driver through the kstat interface. In this case, Hardware Sentry will use kstat to monitor the network cards. Otherwise, Hardware Sentry uses ndd to directly interrogate the driver but this requires root privileges.
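The kstat-first, ndd-as-fallback behavior can be sketched as follows. The two reader callables are hypothetical placeholders: the article does not name the statistics or ndd parameters that Hardware Sentry reads, so none are filled in here.

```python
def link_status(read_kstat, read_ndd, can_run_as_root):
    """Return (value, source) for the link status of a network card.

    read_kstat, read_ndd: hypothetical callables that return the link
    status as a string, or None when the driver does not expose it.
    can_run_as_root: True when root or sudo access has been configured.
    """
    value = read_kstat()
    if value is not None:
        # The driver exposes the value through kstat: no root needed.
        return value, "kstat"
    if not can_run_as_root:
        # ndd needs root privileges, so the value cannot be collected.
        return None, "unavailable"
    return read_ndd(), "ndd"


# Example: a driver that exposes nothing through kstat.
print(link_status(lambda: None, lambda: "up", can_run_as_root=True))
```

The "unavailable" outcome corresponds to the case where the LinkStatus parameter cannot be collected without root privileges.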
Note about disk monitoring:
Hardware Sentry mainly relies on the iostat utility to retrieve information about the disks seen by the system. It will do its best to distinguish internal and external disks and show internal disks only. To do so, by default, it will only take into account disks that are marked as products from Sun (SUN36G, SUN72G, etc.). Based on the iostat report, it will collect the ErrorCount parameter, which reports the number of hardware and transport errors that occurred on the disk since the last initialization of Hardware Sentry or the last manual acknowledgment. The ErrorCount parameter also reports predicted failure errors. In addition to the iostat utility, Hardware Sentry will also perform a read test on each disk with the dd utility and feed the Status parameter with the result of the read test. The dd utility requires “sys” privileges (i.e. being a member of the sys group) on Solaris 10. It requires “root” privileges on Solaris 7, 8 and 9 (this actually depends on the access rights set on the /dev/rdsk/cXtYdZsN device files).
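As an illustration of the Sun-disk filter and the error count, here is a minimal sketch that parses a report in the style of `iostat -En`. The article does not state which iostat options Hardware Sentry uses, so the report format, the `SUN<size>G` pattern and the sample data (including the fictitious ACME disk and the placeholder serial number) are assumptions. Subtracting the counters recorded at the last initialization or acknowledgment is left out.

```python
import re

# Made-up sample in the style of "iostat -En"; values are illustrative only.
SAMPLE = """\
c0t0d0 Soft Errors: 0 Hard Errors: 2 Transport Errors: 1
Vendor: SEAGATE Product: ST336607LSUN36G Revision: 0307 Serial No: 0000000000
Size: 36.42GB <36418595328 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0
c2t1d0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: ACME Product: ARRAYLUN Revision: 0001 Serial No:
Size: 107.37GB <107374182400 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0
"""

HEADER = re.compile(
    r"^(\S+)\s+Soft Errors:\s*(\d+)\s+Hard Errors:\s*(\d+)"
    r"\s+Transport Errors:\s*(\d+)")
PRODUCT = re.compile(r"Product:\s*(.*?)\s+Revision:")
PFA = re.compile(r"Predictive Failure Analysis:\s*(\d+)")


def parse_disks(report):
    """Turn the report into a list of per-disk dictionaries."""
    disks = []
    for line in report.splitlines():
        match = HEADER.match(line)
        if match:
            disks.append({"name": match.group(1),
                          "hard": int(match.group(3)),
                          "transport": int(match.group(4)),
                          "product": "", "predicted": 0})
            continue
        if not disks:
            continue  # ignore anything before the first disk header
        match = PRODUCT.search(line)
        if match:
            disks[-1]["product"] = match.group(1)
        match = PFA.search(line)
        if match:
            disks[-1]["predicted"] = int(match.group(1))
    return disks


def sun_internal_disks(disks):
    """Keep only disks whose product string looks like SUN36G, SUN72G, etc."""
    return [d for d in disks if re.search(r"SUN\d+G", d["product"])]


def error_count(disk):
    """Hard + transport + predicted failure errors, as raw counters."""
    return disk["hard"] + disk["transport"] + disk["predicted"]


for disk in sun_internal_disks(parse_disks(SAMPLE)):
    print(disk["name"], error_count(disk))
```

In this sample the second disk is dropped because its product string carries no Sun marking, which mirrors the default exclusion of non-Sun disks.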
In-band: Sun Management Center (SMC)
The Sun Management Center agent (“SunMC” or “SMC” agent) actually uses Solaris system commands and logs to monitor the health of the server. As such, it adds no value beyond what the existing system utilities already provide. Moreover, installing the SMC environment requires deep knowledge of the SMC architecture and internals.
Hardware Sentry does not need and thus does not use the SMC agent to discover and monitor the hardware components of Sun Solaris systems (sun4u).
Out-of-band: Advanced Lights-Out Management card (ALOM)
Some Sun servers are equipped with an out-of-band management card (ALOM) that allows administrators to manage and monitor their servers remotely even when no operating system is running.
Hardware Sentry does not use the ALOM card to discover and monitor the hardware components of Sun Solaris (sun4u) systems.
- The server must be running Sun Solaris 7, 8, 9 or 10
- There is no need for the SMC agent to be installed
- On pre-Solaris 10 systems, the SUNWcest package must have been installed in order to monitor memory module errors.
- Install the PATROL Agent on the server (versions 3.5.00 and upward are supported, version 3.7.00 minimum is recommended) if this has not already been done.
- Install Hardware Sentry KM for PATROL on the server (this can be done at the same time as the PATROL Agent). Please follow the instructions of the Installation Guide of Hardware Sentry.
Some system utilities used by Hardware Sentry require root privileges. To ensure that Hardware Sentry can use these utilities to discover and monitor the hardware components of a Sun server (sun4u), you can either configure Hardware Sentry to execute all of its external commands as root or configure it to use the sudo utility for a specified list of commands.
To configure Hardware Sentry to run all of its external commands as root, right-click the main “Hardware” icon, select KM Commands > This System’s Settings > Connection, Credentials and Connectors… and enter the root login and password in the first step of the wizard.
To configure Hardware Sentry to use sudo, follow the same procedure but click “Sudo options” in the first step of the wizard. Then select which commands Hardware Sentry will use sudo for.
The list of commands that will require root privileges depends on the platform being monitored:
- lom will be used on Sun Fire V100, V110, V120 and similar models
- cediag will be used on pre-Solaris 10 systems only if the SUNWcest package has been installed (cediag and cestat need to be added to the sudoers file)
- dd will be used on all Solaris systems. Solaris 10 machines require no special configuration as all users are allowed to run the read-only dd tests. Solaris 9 machines will require either root (actually “sys”) privileges for dd, or the /dev/rdsk devices to be configured to be readable by the PATROL Agent account. Solaris 8 machines will only work with root privileges.
- ndd will be used on some models of Solaris systems equipped with network cards whose driver is dmfe, bge or e1000g. This list is not exhaustive and also depends on the version of Solaris. The best way to know for sure whether ndd will be used on a Solaris system is to check whether Hardware Sentry is able to collect the LinkStatus parameter without root privileges.
Please note that the sudo utility must have been installed on the system and configured to allow the PATROL Agent’s default account to execute the selected commands as root. This can be done in the /etc/sudoers file.
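For illustration, the sudoers entry for a given set of commands can be generated with a short helper. This is a sketch under several assumptions that are not in the article: the account name `patrol`, the command paths (verify them on the target system), and the use of `NOPASSWD`, which may or may not match how your site lets the agent invoke sudo.

```python
# Assumed install locations; confirm each path on the target system.
COMMAND_PATHS = {
    "lom": "/usr/sbin/lom",
    "cediag": "/opt/SUNWcest/bin/cediag",
    "cestat": "/opt/SUNWcest/bin/cestat",
    "ndd": "/usr/sbin/ndd",
    "dd": "/usr/bin/dd",
}


def sudoers_line(account, commands):
    """Build one sudoers rule letting `account` run `commands` as root."""
    paths = [COMMAND_PATHS[name] for name in commands]
    return "%s ALL=(root) NOPASSWD: %s" % (account, ", ".join(paths))


# Example for a pre-Solaris 10 server with SUNWcest and an ndd-only driver.
# "patrol" is a hypothetical name for the PATROL Agent's default account.
print(sudoers_line("patrol", ["cediag", "cestat", "ndd"]))
```

The resulting line would be added to /etc/sudoers, preferably with visudo so that syntax errors are caught before the file is saved.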
When configured properly, the following connectors should be automatically selected by Hardware Sentry in order to monitor a Sun Solaris system (sun4u):
- Sun Solaris - Environment (prtdiag, lom)
- Sun Solaris - Environment (prtpicl)
- Sun Solaris - Processors (psrinfo)
- Sun Solaris - Sun Disks
- Sun Solaris - Memory Modules (cediag)
- Sun Solaris - Network
Components and monitored parameters
In turn, the following components and parameters are discovered and monitored:
- Server model
- Temperature sensors, actual temperature if available and status
- Fans, speed or speed percent if available and status of each fan
- Voltage sensors, actual voltage if available and status
- Power supplies, status
- Memory modules, size, status (when the information is available)
- Overall memory status and predicted failure (if the SUNWcest package is available)
- Processors, type and frequency, status
- Physical disks, vendor, size, serial number, error count and status
- Network cards, vendor, model, connection speed, status, link status and error percentage
Memory modules cannot be seen
If you cannot see the memory modules (or at least a general “Overall” instance), it means one of the following:
- The SUNWcest package hasn’t been installed (on pre-Solaris 10 systems)
- Hardware Sentry hasn’t been configured to use the root account or use sudo for the cediag command
- The server is running Solaris 10 and doesn’t report the status of memory modules individually. In this case, there is nothing that can be done to fix the problem. Sentry Software however plans to release a new connector based on the fmstat Solaris utility for Solaris 10 systems.
Internal disks cannot be seen
If you cannot see internal disks, it may be that your server has non-Sun disks, which are excluded by default (this is done to ensure that Hardware Sentry only keeps internal disks). You can force Hardware Sentry to discover non-Sun disks by manually selecting the “Sun Solaris - Non-Sun Disks” connector through the graphical user interface of the KM. The potential drawback is that Hardware Sentry could take some external disks (in an external SAN array for example) as real physical disks and thus create dozens of instances, one for each of these disks.
Hardware Sentry reports an "Unknown" disk status
If Hardware Sentry reports the status of the disks as “Unknown”, it probably means that the access rights on the /dev/rdsk/cXtYdZsN device files don’t grant read access to the PATROL Agent’s default account, and Hardware Sentry hasn’t been configured to execute external commands as root or to use the sudo utility for the dd command.
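To check this by hand, a quick diagnostic can be run under the PATROL Agent's account. The sketch below is an assumption-laden approximation: the article does not give the exact dd arguments Hardware Sentry uses, so a single 512-byte read is used here, and the device path in the example is hypothetical.

```python
import os
import subprocess


def read_test(device):
    """Read one 512-byte block from `device` with dd; True if dd succeeds."""
    result = subprocess.run(
        ["dd", "if=" + device, "of=/dev/null", "bs=512", "count=1"],
        stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return result.returncode == 0


def diagnose(device):
    """Explain why a read test on `device` would or would not work."""
    if not os.path.exists(device):
        return "missing"
    if not os.access(device, os.R_OK):
        # Matches the "Unknown" status case: the account cannot read it.
        return "no read access"
    return "ok" if read_test(device) else "read failed"


# Hypothetical device path following the cXtYdZsN naming pattern.
print(diagnose("/dev/rdsk/c0t0d0s2"))
```

A result of "no read access" suggests either adjusting the rights on the device files or configuring root or sudo access for dd.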
Sun Solaris servers do not report any environment information
Some Sun Solaris servers do not report any environment information, notably all of the workstations (Sun Ultra-5, Ultra-10, etc.) as well as some server models (420r for example). On such systems, Hardware Sentry will only discover and monitor the processors, network cards and internal disks, as well as the memory modules (if this information is available, see above).
- Hardware Monitoring: Abnormally High CPU consumption on SUN platforms
- Monitoring Cisco UCS Servers Through Their IMC
- Monitoring Sun Fire F12K, F15K, E20K and E25K Servers with Hardware Sentry
- Monitoring Sun Fire M4000, M5000, M8000 and M9000 Servers with Hardware Sentry
- Monitoring Sun SPARC T1 and T2 systems (sun4v) with Hardware Sentry
- Monitoring Sun StorageTek Disk Arrays with Hardware Sentry
- Monitoring Sun X86 and X64 Systems with Hardware Sentry
- No Environment Information on Sun T1000/T2000 Servers