Hardware Sentry KM for PATROL

Release Notes for v1.9.50

What's New

Functionality

Hardware Sentry fully supports SNMP v2c and SNMP v3.

Two new configuration variables are available: removeAllThresholds deletes all existing thresholds from the configuration, and trimFromDisplayName specifies the characters to be removed from object display names.

New macros are available:

%{SYSTEM_METAFQDN}: MetaFQDN of the monitored system.

%{SYSTEM_IP}: IP address of the monitored system.

%{SYSTEM_DOMAIN}: name of the domain the monitored system belongs to.
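
As an illustration of how these macros might be used, the sketch below expands %{...} placeholders in a message template. The expand_macros helper, the sample values, and the choice to leave unknown macros untouched are assumptions made for this example; they do not describe the KM's internal implementation.

```python
import re

# Hypothetical values for one monitored system (illustrative only).
SAMPLE_MACROS = {
    "SYSTEM_METAFQDN": "server01.meta.example.com",
    "SYSTEM_IP": "192.0.2.10",
    "SYSTEM_DOMAIN": "example.com",
}

# Matches placeholders written as %{NAME}.
MACRO_PATTERN = re.compile(r"%\{([A-Z_]+)\}")


def expand_macros(template, values):
    """Replace each %{NAME} with its value; unknown macros are left as-is."""
    def substitute(match):
        return values.get(match.group(1), match.group(0))
    return MACRO_PATTERN.sub(substitute, template)


message = expand_macros(
    "Alert on %{SYSTEM_IP} (domain: %{SYSTEM_DOMAIN})", SAMPLE_MACROS
)
print(message)
```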

Supported Platforms

New Supported Platforms

EMC CLARiiON and VNX systems (monitored through Navisphere CLI).

EMC Isilon Clusters (monitored via SSH).

Hitachi (HDS) AMS and HUS Storage Systems (monitored through the Hitachi Storage Navigator Modular 2 CLI).

HP P2000 G3 systems (monitored via WBEM).

Huawei Storage Systems (OceanStor) (monitored via SNMP)

Huawei Servers (monitored via their management cards).

IBM DataPower Appliances (monitored via SNMP)

IBM PureFlex Chassis (monitored via SNMP)

IBM Storwize v3700 systems (monitored via SSH)

Lenovo servers (monitored via their IMM)

MacroSan Storage Systems (monitored via SNMP)

Oracle/Sun InfiniBand DCS Switches (monitored via SNMP)

Oracle/Sun servers (monitored through the Oracle Hardware Management Agent - Recommended method)

Oracle/Sun ZFS Storage Appliances (monitored via SSH)

Pure Storage FA Series (monitored via SSH)

QLogic HBA adapters on ESX servers (monitored via WBEM)

VMware ESX6 (monitored via WBEM)

Improved Platforms

Cisco

Non-port components for Cisco Ethernet Switches (power supplies, fans, temperature sensors, and voltages) are supported. They are monitored via SNMP.

IBM AIX servers

Hardware Sentry reports the status of the System Attention LED and triggers an alert if a hardware problem has been reported on the system since midnight (0:00).

Logical disks (hdisks), physical disks (pdisks) and batteries managed by sissasraidmgr are supported.

Oracle/Sun

Hardware monitoring is available for any Oracle/Sun system with an ILOM.

MPIO LUNs managed with the mpathadm utility are supported.

Oracle/Sun Xsigo switches are supported (monitored via SNMP).

The overall CPU and memory status are supported for Oracle/Sun Solaris 11 servers.

SPARC Enterprise Mx000 (XSCF): Negative voltage sensors, such as -12V, are discovered.

Network port statistics are now available for Oracle/Sun - Xsigo Switches.

VMware

The path count of LUNs is monitored for VMware ESX5i / vSphere 5 host servers.

Individual memory sensors listed under "Other" sensors in the vCenter/vSphere configuration tab are supported.

Monitored Components

The PowerState parameter is available to indicate whether the blade is currently on or off for Dell Blade Servers, Hitachi BladeSymphony Chassis, HP BladeSystem rack, and IBM BladeCenter chassis.

Changes and Improvements

Functionality

Improved scalability: The KM better handles large numbers of monitored hosts.

Improved stability: The KM better handles error conditions on a large number of hosts.

On Windows systems, the KM could take a very long time to initialize upon startup and after a reinitialization.

Hardware Sentry can be configured to automatically delete missing instances after a certain time.

Connectors can be excluded from the detection process.

The way Hardware Sentry determines whether the PATROL Agent is configured via CMA or PATROL Classic has been improved.

Hardware Sentry no longer uses commas in object labels, so that fields are properly parsed when the Event Management KM sends information to BMC Event Management (BEM) or BMC ProactiveNet Performance Management (BPPM). Other characters can also be removed using the new configuration variable trimFromDisplayName.

A discovery can be triggered for a specific host using the new KM command Rediscover. This KM command will delete the list of current devices (including the missing ones), remove the host instance and trigger a discovery. This KM command replaces the Trigger a Discovery KM command.

Additional information (BIOS Version, Driver Version, Manufacturer, etc.) is provided in the Instant Hardware Health Report and events triggered by the Alert Actions.

Specific PATROL Events triggered upon Hardware and Connector failures also indicate the alert origin (application class name).

The %{SYSTEM_FQDN} macro now provides the Fully Qualified Domain Name instead of the System MetaFQDN.

The %{SYSTEM_NAME} macro now provides the name of the monitored host or hostname specified while configuring the server monitoring instead of the monitored system identifier.

Hardware Sentry debug output file now includes a full hardware inventory.

The debug file and the Hardware Inventory now report the total number of instances for each host for every application class.

The maximum number of concurrent collection threads is now set per host through the maxConcurrentCollectThreadsPerHost global configuration variable. The maxConcurrentCollectThreads variable is therefore no longer supported.
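
The label clean-up described in this section (commas removed, plus any characters listed in trimFromDisplayName) can be pictured with the following sketch. The clean_display_name function, and the assumption that the variable holds a plain string of single characters to strip, are hypothetical; the actual value format is defined by the product documentation.

```python
def clean_display_name(label, trim_chars=""):
    """Remove commas and any extra characters from an object label.

    trim_chars stands in for the trimFromDisplayName value; treating it as a
    plain string of single characters is an assumption of this sketch.
    """
    unwanted = set(",") | set(trim_chars)
    cleaned = "".join(ch for ch in label if ch not in unwanted)
    # Collapse any repeated spaces left behind by the removal.
    return " ".join(cleaned.split())


print(clean_display_name("Disk 0, Bay 1 (SAS)", trim_chars="()"))
```

With no extra characters configured, only the commas are removed.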

Platforms

Dell Blade Servers:

The blade's hostname was added to each blade display ID.

Timeouts for all commands have been increased.

The Status of the Hitachi HDS USP/VSP Storage Systems is more accurately reported.

HP ProLiant Servers running VMware ESX: Hardware Sentry collects a real-time power consumption value from either the VMware ESX CIM agent or HP Insight Management Agent for VMware ESX.

When HP servers did not return an overall power consumption, the KM disabled the corresponding parameter and made an estimate in the capacity report. When this information is not available, the HP Insight Management Agent - Server connector will now try to sum up the power consumption of all power supplies before falling back on the estimate.

IBM BladeCenter Chassis: Embedded switches, passthroughs, and management modules are now monitored.

On IBM AIX and VIO servers, the system is now fully identified with its hardware ID, LPAR ID, system ID and model name. More details have also been added to several components (disks, network cards, FC ports, and CPUs) to facilitate their identification in case of a failure.

IBM AIX Servers:

Additional information is available for LUNs (WWN, Array Name, Hardware Location Code, and Expected Number of Paths).

The Status of the System Attention LED is now reported and an alert is triggered when this LED is turned on. The alert remains until an administrator manually acknowledges the status of the System Attention LED.

IBM VIO Server systems are better identified with their model name, code, IDs, etc.

IBM Storwize (SSH): Additional information is available for physical disks (vendor, model, serial number, firmware, etc.).

Quantum Scalar i2000 and i6000: The visible identifiers of components are now based on sensor names, locations, etc.

SUN SPARC Enterprise Mx000 (XSCF): False voltage alerts were triggered due to incorrect thresholds.

SUN SPARC Servers (Prtpicl): A smaller version of the device ID is used for the display ID to enable easier sensor identification.

VMware ESX: The monitoring of power supplies has been improved. Both VMware ESX health and availability status are used to determine the health of the power supply.
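
The HP power consumption fallback described earlier in this section (overall reading first, then the sum of the power supplies, then the estimate) can be sketched as follows. The function and argument names are hypothetical, and using None to mean "not reported" is an assumption of this example rather than the connector's actual logic.

```python
def resolve_power_consumption(overall_watts, supply_watts, estimate_watts):
    """Return (value, source) following the fallback order described above.

    None means "not reported"; all argument names are hypothetical.
    """
    if overall_watts is not None:
        return overall_watts, "overall"
    # Keep only the power supplies that actually reported a reading.
    reported = [watts for watts in supply_watts if watts is not None]
    if reported:
        return sum(reported), "sum of power supplies"
    return estimate_watts, "estimate"


print(resolve_power_consumption(None, [210.0, 195.0], 450.0))
```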

Connectors

Cisco UCS Manager (Blade, Fabric Interconnect Switch): This connector activates when a Cisco UCS Interconnect Switch is detected. A B-Series UCS Chassis was previously required.

Disk Monitoring on VMware ESX servers (IPMI): Because some servers use the same IPMI Monitored Device ID for all physical disks, the connector uses the IPMI Device ID to group sensors for each physical disk. The physical disk's caption is now used as the Display ID.

EMC Disk Arrays: The connector no longer verifies the presence of disks. This reduces the workload on the EMC SMI-S Provider and reduces the risk of the connector failing to query the EMC SMI-S Provider.

HP Insight Management Agent - Server - WBEM: The connector removes all temperature thresholds of zero to avoid unwanted temperature alerts.

IBM Storwize Disk Arrays – SSH: The connector's display name has been changed to "IBM Storwize (SSH)" as it also works on IBM v3700 / v5000 systems.

MIB-2 Standard SNMP Agent - Network Interfaces connectors now use Counter64 OIDs when available.

Because multiple instances of the same LSI RAID Controller could appear in the tree view, the LsiLogic MegaRAID SAS, LSI MegaCli, and LSI Logic - LsiUtil - RAID connectors are now disabled if the SMI-S Compliant RAID Controller Linux/Windows connector is activated.

SMI-S Compliant RAID Controller: Some servers (typically HP ProLiant) do not report sizes of physical disks. When this situation occurs, the connector queries the associated storage extents to retrieve the actual disk size.

SMI-S HBA Connector:

The connector has been modified to report on link speed and statistics.

The SMI-S HBA Connector no longer monitors Logical Disks (LUNs) to avoid duplicate LUN instances.

WMI - HBA Connector:

The connector uses the LUN's naa.ID to identify LUNs. Using the naa.ID helps link LUNs to the Storage System's volume as they share this unique identifier code.

The connector displays the disk's Windows MPIO ID as well as the drive letters and partition names of any volumes on that LUN. A typical LUN ID will therefore read: naa.60616043312F05A4308DC65F111 (MPIO Disk0 - C:(OS) D:(Data))
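
To make the LUN ID layout above concrete, the sketch below assembles a label in the same shape from its parts. The build_lun_display_id function and its parameters are hypothetical and only reproduce the format of the example given in these notes; they are not the connector's code.

```python
def build_lun_display_id(naa_id, mpio_disk, volumes):
    """Compose a label such as 'naa.<id> (MPIO Disk0 - C:(OS) D:(Data))'.

    volumes is a list of (drive_letter, partition_name) pairs.
    """
    volume_part = " ".join(f"{letter}:({name})" for letter, name in volumes)
    details = f"MPIO Disk{mpio_disk}"
    if volume_part:
        details += f" - {volume_part}"
    return f"{naa_id} ({details})"


print(build_lun_display_id(
    "naa.60616043312F05A4308DC65F111", 0, [("C", "OS"), ("D", "Data")]
))
```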

Monitored Components

Logical Disks, Temperature, Voltage and LED instances will be automatically deleted in the PATROL Console as soon as they are detected as "missing".

A more reliable method is now used to associate batteries with their related disk controllers.

LUNs:

A warning is triggered when the number of available paths is one path lower than the initial number of available paths.

The Present parameter is no longer available.

The problem type, consequences, and recommended actions are provided when an alert is triggered on the Status and AvailablePathCount parameters.

Network Links:

Network monitoring through SNMP MIB-2 for Windows and network switches was improved.

Default thresholds are set on the ErrorPercent parameter of the MS_HW_NETWORK application class (≥ 10% = warning, ≥ 30% = alarm).
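
The default ErrorPercent thresholds above can be read as the following classification. This is a minimal sketch: the function name and the "ok" label for values below the warning threshold are assumptions for illustration.

```python
WARNING_THRESHOLD = 10.0  # percent, default stated in these notes
ALARM_THRESHOLD = 30.0    # percent, default stated in these notes


def classify_error_percent(error_percent):
    """Map an ErrorPercent value to 'ok', 'warning' or 'alarm'."""
    if error_percent >= ALARM_THRESHOLD:
        return "alarm"
    if error_percent >= WARNING_THRESHOLD:
        return "warning"
    return "ok"


for value in (2.5, 10.0, 29.9, 30.0):
    print(value, classify_error_percent(value))
```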

Fixed Issues

Functionality

Java Settings: When the User Selection option was used, the username entered was not properly saved in the configuration.

The KM could freeze or stop working in case of repeated discovery timeouts, when reinitializing the KM in large environments, or when too many connectors failed at the same time.

In some situations, Hardware Sentry would not activate the BandwidthUtilization parameter even if the current connector could collect the network’s bandwidth utilization.

Platforms

The port link speed was not available for Cisco MDS9000 Series FC switches.

Data Domain Storage Systems: Due to the structure of the Data Domain MIB, specific strings in a Physical Hard Drive's serial number could cause a disk to report an unknown status.

Dell PowerEdge Servers: the physical disk instances were not attached to the proper disk controller instance.

Dell Servers with non-RAID disks could activate both the WMI - Disks and Dell OpenManage Storage Manager connectors, resulting in double monitoring.

Dell TL2000/4000 and IBM TS3100/3200 Tape Libraries: Tape drive mounts were incorrectly reported as errors, which triggered false alerts on the ErrorCount parameters.

Hitachi HDS AMS/HUS Storage Systems:

Authentication issues were encountered.

The execution of all commands is fully serialized to prevent conflicts. All temporary files used by the batch files/shell scripts use randomized file names to prevent file locks and missing files.

Logical disk information was missing.

Logical Disk Status was not collected for some systems when the command output format was not supported by the KM.

The Linux connector did not work properly.

HP-UX System: In some cases, the value of the ErrorPercent parameter of the MS_HW_NETWORK application class was not reported correctly.

HP Servers Running VMware:

Disk controllers and their batteries are now properly discovered even when no information on their model or serial number is available.

Thresholds labeled as "Critical" were often only "Warning" temperatures in the HP Insight Management Agent. The HP Insight Management Agent - Server - WBEM connector now detects this problem and sets the right thresholds.

Display IDs have been improved to facilitate the identification of components.

HP Servers Running Windows: Disk controllers and their batteries are now properly discovered even when no information on their model or serial number is available.

IBM AIX Servers:

Network statistics were not collected for physical ports that were part of SEA Virtual Adapters.

Power failures were not detected.

Network delays were observed when the entstat command, used to collect Ethernet port statistics on IBM AIX servers, was run on disabled ports.

HBA ports were only considered active if a tape drive or hard disk was attached to them. HBA ports are now considered active if an enabled path is associated with them.

Ports that were used as failover ports by MPIO were considered disabled. This caused false link down alerts and stopped the monitoring of ports that were in fact active.

Child devices attached to FC ports were not monitored.

IBM Storwize (SSH): LEDs were not reporting all faults on both v3700 and v7000 systems.

IBM x Series Servers: On rare occasions, duplicate processor instances could appear in your monitoring environment because the IBM Director Agent reported each processor twice.

Oracle/Sun Solaris:

Disks branded as Sun and larger than 1TB were excluded from the discovery because the expected product tag was "SUNxxxG" (e.g. the connector would consider "SUN450G" but exclude "SUN2T").

Due to a recent modification in the psrinfo command output, cores were reported as full processors. They are now grouped under a single physical CPU.

Too many CPU cores were detected for Sun Solaris MultiCore Processors.
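
The product tag issue described above for Sun-branded disks can be illustrated with two regular expressions. Both patterns are assumptions made for this sketch (these notes only give the "SUNxxxG" shape and the two sample tags); they are not the connector's actual matching rules.

```python
import re

# Assumed shape of the old check: digits followed by "G" only.
OLD_TAG = re.compile(r"^SUN\d+G$")
# Assumed relaxed check: also accepts a "T" (terabyte) suffix.
NEW_TAG = re.compile(r"^SUN\d+(?:\.\d+)?[GT]$")

for tag in ("SUN450G", "SUN2T"):
    # Print whether each tag passes the old and the relaxed check.
    print(tag, bool(OLD_TAG.match(tag)), bool(NEW_TAG.match(tag)))
```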

SUN SPARC Servers:

(Prtpicl): No thresholds appeared for fan sensors when LowWarningThreshold did not exist for fan instances. LowPowerOffThreshold will now be used whenever this situation occurs.

(Running Solaris): False alarms were triggered on LED instances.

(Running Solaris): Invalid values were reported or false alarms were triggered for temperature and voltage sensors.

Quantum Scalar i2000 and i6000: The StatusInformation parameter showed the “Unknown Cleaning Status” message even though the "Cleaning Status" was not available.

VMware:

Authentication failures for some ESXi servers could occur when monitoring VMware ESXi servers using vCenter as a multi-tier authentication server.

The port status for link down ports had been modified in VMware ESX 5.5, which caused the MS_HW_VMwareESX4i.hdf connector to falsely report port failures.

Error messages mentioning esxcli appeared in the System Output Window on PATROL Agents running on Linux systems.

Connectors

The Dell OpenManage Storage Manager connector failed to interpret the status of Non-RAID disks (reported as Unknown instead of OK).

On Windows systems, the DiskPart connector was causing a thread/handle leak in the VDS.EXE process (Virtual Disk Service) which could cause the corresponding service to crash.

Emulex HBA monitoring failed when hbacmd was not installed in /usr/sbin/hbanyware/hbacmd. The Emulex HBAs (hbacmd) connector will now run the command without the full path. Please note that this modification requires hbacmd to be added to the PATH environment variable of the user used to monitor the server.

The status code returned by the IBM Director Agent for the controller’s battery was occasionally misinterpreted as degraded.

The IPMI connector:

recognizes Oracle-specific SPARC power supply sensors.

excludes the SEL Fullness sensor to avoid getting SEL Fullness alerts when monitoring an IBM server using IPMI.

discards the monitoring of the BIST_FAIL sensor that reports false CPU alerts on Cisco UCS Blades.

Monitored Components

Devices classified as “Other Devices” (CP Modules, etc.) are now attached to their respective enclosures.

The StatusInformation parameter was reporting devices as present when these devices were missing.

The link status for the HP-UX network cards was always "Unknown".

The KM monitors LSI sas2ircu-Managed RAID Controllers even though the manufacturer's agent does not report its status or the agent is not installed.

The current speed of the CPU core is reported through the CurrentSpeed parameter.

Disks:

Hardware Sentry rounded the size of logical disks bigger than 1 TB to a whole number (e.g. the size of a 1.4 TB logical disk was displayed as 1 TB). The size of disks bigger than 1 TB is now rounded to one decimal place.

For better readability, the physical disk size is expressed in the most relevant unit (GB, TB, etc.).
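
The two disk size changes above (one decimal place from 1 TB upward, and the most relevant unit) can be sketched as a small formatting helper. The format_disk_size function is hypothetical, and the use of decimal (1000-based) units and whole-number rounding below 1 TB are assumptions, since these notes do not specify them.

```python
UNITS = ["B", "KB", "MB", "GB", "TB", "PB"]


def format_disk_size(size_bytes):
    """Express a size in the largest unit that keeps the value >= 1.

    Sizes of 1 TB and above keep one decimal place; smaller sizes are
    rounded to a whole number. Decimal (1000-based) units are an assumption.
    """
    value = float(size_bytes)
    unit_index = 0
    while value >= 1000 and unit_index < len(UNITS) - 1:
        value /= 1000
        unit_index += 1
    if unit_index >= UNITS.index("TB"):
        return f"{value:.1f} {UNITS[unit_index]}"
    return f"{value:.0f} {UNITS[unit_index]}"


print(format_disk_size(1_400_000_000_000))  # the 1.4 TB logical disk example
print(format_disk_size(450_000_000_000))    # a 450 GB physical disk
```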

The availability of the enclosure is reported through the Present parameter.

Redundant fans sometimes reported a speed/speed percent reading of zero, which triggered an alert even if no thresholds were set. The KM now disables the speed/speed percent parameters if a valid status is collected to avoid this issue while maintaining full monitoring.

The Status parameter units for LEDs have been changed to be more meaningful.

Windows MPIO LUNs Monitoring: Because Windows regularly changes the unique identifiers of LUNs and physical disks, false missing/present alerts could occur for LUNs and duplicate instances could appear.

LUNs Monitoring on WMI Disks:

If a Windows server had both LUNs and local non-RAID physical disks, then the WMI - Disks connector monitored both as local physical disks. LUNs will now be excluded from the monitoring.

EMC PowerPath LUNs were monitored when local physical disks were missing.

Logical Disks Monitoring (Windows Environment): "No Collect Value" messages were displayed in the System Output Window when the "Virtual Disk" service failed. To solve this issue, the Windows - DiskPart connector (MS_HW_DiskPart.hdf) no longer activates on non-English versions of Windows.

Some physical disks were missing when monitoring Linux / Solaris servers with Adaptec StorMan managed RAID cards.