BMC TrueSight Operations Management - Hardware

Release Notes for v1.9.50


What's New

Functionality

TrueSight OM - Hardware fully supports SNMP v2c and SNMP v3.
Two new configuration variables are available: removeAllThresholds deletes all existing thresholds from the configuration, and trimFromDisplayName specifies the characters to be removed from object display names.

Supported Platforms

New Supported Platforms

EMC CLARiiON and VNX systems (monitored through Navisphere CLI).
EMC Isilon Clusters (monitored via SSH).
Hitachi (HDS) AMS and HUS Storage Systems (monitored through the Hitachi Storage Navigator Modular 2 CLI).
HP P2000 G3 systems (monitored via WBEM).
Huawei Storage Systems (OceanStor) (monitored via SNMP).
Huawei Servers (monitored via their management cards).
IBM DataPower Appliances (monitored via SNMP).
IBM PureFlex Chassis (monitored via SNMP).
IBM Storwize v3700 systems (monitored via SSH).
Lenovo servers (monitored via their IMM).
MacroSan Storage Systems (monitored via SNMP).
Oracle/Sun InfiniBand DCS Switches (monitored via SNMP).
Oracle/Sun servers (monitored through the Oracle Hardware Management Agent, which is the recommended method).
Oracle/Sun ZFS Storage Appliances (monitored via SSH).
Pure Storage FA Series (monitored via SSH).
QLogic HBA adapters on ESX servers (monitored via WBEM).
VMware ESX6 (monitored via WBEM).

Improved Platforms

Cisco

Non-port components for Cisco Ethernet Switches (power supplies, fans, temperature sensors, and voltages) are supported. They are monitored via SNMP.

IBM AIX servers

TrueSight OM - Hardware reports the status of the System Attention LED and triggers an alert if a hardware problem has been reported on the system since midnight (0:00).
Logical disks (hdisks), physical disks (pdisks) and batteries managed by sissasraidmgr are supported.

Oracle/Sun

Hardware monitoring is available for any Oracle/Sun system with an ILOM.
MPIO LUNs managed with the mpathadm utility are supported.
Oracle/Sun Xsigo switches are supported (monitored via SNMP).
Overall CPU and memory status is monitored for Oracle/Sun Solaris 11 servers.
Network port statistics are now available for Oracle/Sun Xsigo switches.
SPARC Enterprise Mx000 (XSCF): Negative voltage sensors, such as -12V, are discovered.

VMware

The path count of LUNs is monitored for VMware ESX5i / vSphere 5 host servers.
Individual memory sensors listed under "Other" sensors in the vCenter/vSphere configuration tab are supported.

Monitored Components

For Dell Blade Servers, Hitachi BladeSymphony Chassis, HP BladeSystem racks, and IBM BladeCenter chassis, the Power State attribute is available to indicate whether each blade is currently on or off.

Changes and Improvements

Functionality

Improved scalability: The product better handles a large number of monitored hosts.
Improved stability: The product better handles error conditions on a large number of hosts.
Improved initialization on Windows systems: the monitoring solution could previously take a very long time to initialize upon startup and after a reinitialization.
When the PATROL Agent is managed from CMA, the default thresholds mechanism is now set to "tuning" to automatically generate alarms/events.
TrueSight OM - Hardware can be configured to automatically delete missing instances after a certain time.
Additional information (BIOS Version, Driver Version, Manufacturer, etc.) is provided in the Hardware Health Report and in events triggered by the Alert Actions.
Specific PATROL Events triggered upon Hardware and Connector failures also indicate the alert origin (monitor type name).
By default, TrueSight OM - Hardware will no longer trigger an event when an internal issue occurs.
The TrueSight OM - Hardware debug output file now includes a full hardware inventory.
The debug file and the Hardware Inventory now report the total number of instances of every monitor type for each host.
The maximum number of concurrent collection threads is now set per host through the maxConcurrentCollectThreadsPerHost global configuration variable. The maxConcurrentCollectThreads variable is therefore no longer supported.
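The move from a global thread limit (maxConcurrentCollectThreads) to a per-host limit (maxConcurrentCollectThreadsPerHost) can be illustrated with a minimal Python sketch. It is not the product's implementation: the PerHostLimiter class, the fake_collect function, and the host names are hypothetical. The sketch simply assumes that each host gets its own counting semaphore, so that the number of simultaneous collects is capped per host rather than across all hosts.

```python
import threading
import time
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict

# Hypothetical value standing in for maxConcurrentCollectThreadsPerHost.
MAX_CONCURRENT_COLLECT_THREADS_PER_HOST = 2


class PerHostLimiter:
    """Caps the number of collect jobs running at once for each host."""

    def __init__(self, per_host_limit: int) -> None:
        self._per_host_limit = per_host_limit
        self._lock = threading.Lock()
        self._semaphores: Dict[str, threading.BoundedSemaphore] = {}

    def _semaphore_for(self, host: str) -> threading.BoundedSemaphore:
        # One semaphore per host, created lazily under a lock.
        with self._lock:
            if host not in self._semaphores:
                self._semaphores[host] = threading.BoundedSemaphore(self._per_host_limit)
            return self._semaphores[host]

    def run(self, host: str, collect: Callable[[str], str]) -> str:
        # Blocks while this host already has per_host_limit jobs running;
        # jobs for other hosts are not affected.
        with self._semaphore_for(host):
            return collect(host)


# Demo: record the peak number of simultaneous collects per host.
active: Dict[str, int] = defaultdict(int)
peak: Dict[str, int] = defaultdict(int)
stats_lock = threading.Lock()


def fake_collect(host: str) -> str:
    with stats_lock:
        active[host] += 1
        peak[host] = max(peak[host], active[host])
    time.sleep(0.01)  # simulate a slow collect
    with stats_lock:
        active[host] -= 1
    return host


limiter = PerHostLimiter(MAX_CONCURRENT_COLLECT_THREADS_PER_HOST)
hosts = ["host-a"] * 6 + ["host-b"] * 6
with ThreadPoolExecutor(max_workers=len(hosts)) as pool:
    results = list(pool.map(lambda h: limiter.run(h, fake_collect), hosts))

print(dict(peak))
```

Even though the pool has enough workers to run all twelve jobs at once, no host ever has more than two collects in flight.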

Supported Platforms

The monitoring of the Cisco UCS Interconnect Switches has been improved.
Dell Blade Servers:
The blade's hostname was added to each blade display ID.
Timeouts for all commands have been increased.
Fujitsu Eternus: TrueSight OM - Hardware provides more detailed information about Fujitsu Eternus systems and uses significantly fewer resources.
The Status of the Hitachi HDS USP/VSP Storage Systems is more accurately reported.
HP ProLiant Servers running VMware ESX:
TrueSight OM - Hardware collects a real time power consumption value from either the VMware ESX CIM agent or HP Insight Management Agent for VMware ESX.
All temperature thresholds of zero have been removed to avoid unwanted temperature alerts.
When HP servers did not report an overall power consumption, the monitoring solution disabled the corresponding attribute and used an estimate in the capacity report. In that situation, the monitoring solution now tries to sum the power consumption of all power supplies before falling back on the estimate.
IBM BladeCenter Chassis: Embedded switches, passthroughs, and management modules are now monitored.
IBM AIX Servers:
Additional information is available for LUNs (WWN, Array Name, Hardware Location Code, and Expected Number of Paths).
The Status of the System Attention LED is now reported and an alert is triggered when this LED is turned on.
On IBM AIX and VIO servers, the system is now fully identified with its hardware ID, LPAR ID, system ID, and model name. More details have also been added to several components (disks, network cards, FC ports, and CPUs) to facilitate their identification in case of a failure.
IBM VIO Server systems are better identified with their model name, code, IDs, etc.
IBM Storwize (SSH):
Additional information is available for physical disks (vendor, model, serial number, firmware, etc.).
The instance name has been changed to "IBM Storwize (SSH)".
Quantum Scalar i2000 and i6000: The visible identifiers of components are now based on sensor names, locations, etc.
SUN SPARC Enterprise Mx000 (XSCF): False voltage alerts were triggered due to incorrect thresholds.
SUN SPARC Servers (Prtpicl): A smaller version of the device ID is used for the display ID to enable easier sensor identification.
VMware ESX: The monitoring of power supplies has been improved. Both VMware ESX health and availability status are used to determine the health of the power supply.
Disk Monitoring on VMware ESX servers (IPMI): Because some servers use the same IPMI Monitored Device ID for all physical disks, the monitoring solution uses the IPMI Device ID to group sensors for each physical disk. The physical disk's caption is now used as the Display ID.
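The new power consumption fallback order for HP servers (reported overall value, then the sum of the power supplies, then the estimate) can be sketched as follows. The function name is hypothetical, and so is the rule that the sum is used only when every power supply reports a value; the release notes do not say how partial readings are handled.

```python
from typing import Optional, Sequence, Tuple


def power_consumption_watts(
    reported_total: Optional[float],
    supply_readings: Sequence[Optional[float]],
    estimate: float,
) -> Tuple[float, str]:
    """Return (watts, source) following the fallback order described above."""
    # 1. Use the server's own overall power consumption when it is reported.
    if reported_total is not None:
        return reported_total, "reported"
    # 2. Otherwise sum the power supplies (assumption: only if all of them report).
    readings = [r for r in supply_readings if r is not None]
    if readings and len(readings) == len(supply_readings):
        return sum(readings), "sum_of_power_supplies"
    # 3. Fall back on the capacity-report estimate.
    return estimate, "estimate"


print(power_consumption_watts(None, [210.0, 195.5], 500.0))
```

Returning the source alongside the value makes it easy to tell a measured figure from an estimate.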

Monitored Components

Logical Disks, Temperature, Voltage and LED instances will be automatically deleted in the BMC TrueSight OM Console as soon as they are detected as "missing".
A more reliable method is being used to associate batteries to their related disk controllers.
Hardware Sentry now uses Counter64 OIDs for Ethernet switches equipped with a MIB-2 standard SNMP Agent.
HBA Cards Monitoring on all Windows-based systems: The LUN's naa.ID is now used to identify LUNs. Because a LUN and the corresponding Storage System volume share this unique identifier, the naa.ID makes it possible to link them. The disk's Windows MPIO ID, as well as the drive letters and partition names of any volumes on that LUN, are now also provided. A typical LUN ID will therefore read: naa.60616043312F05A4308DC65F111 (MPIO Disk0 - C:(OS) D:(Data))
LUNs:
A warning is triggered when the number of available paths drops one below the initial number of available paths.
The Present attribute is no longer available.
The problem type, consequences, and recommended actions are provided when an alert is triggered on the Status and Available Path Count attributes.
Windows MPIO LUNs Monitoring: Because Windows regularly changes the unique identifiers of LUNs and physical disks, false missing/present alerts could occur for LUNs and duplicate instances could appear. To solve this issue, the monitoring solution now uses the LUN's naa.ID, which is unique and does not change.
Network Links:
Improved network monitoring through SNMP MIB-2 for Windows and network switches.
Default thresholds are set on the Error Percent attribute of the Hardware Network Interface monitor type (≥ 10% = warning, ≥ 30% = alarm).
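The default Error Percent thresholds and the LUN path-count warning described above can be expressed as simple classification rules. This Python sketch is illustrative only: the Severity enum and the function names are hypothetical. The path-count rule assumes that a warning is raised as soon as the available path count falls below its initial value, and the sketch defines no alarm level for paths because the notes do not describe one.

```python
from enum import Enum


class Severity(Enum):
    OK = "ok"
    WARNING = "warning"
    ALARM = "alarm"


def error_percent_severity(error_percent: float) -> Severity:
    # Default thresholds on the Hardware Network Interface Error Percent
    # attribute: >= 30% is an alarm, >= 10% is a warning.
    if error_percent >= 30.0:
        return Severity.ALARM
    if error_percent >= 10.0:
        return Severity.WARNING
    return Severity.OK


def lun_path_severity(available_paths: int, initial_paths: int) -> Severity:
    # Assumption: losing one or more paths relative to the initial count warns.
    if available_paths <= initial_paths - 1:
        return Severity.WARNING
    return Severity.OK


print(error_percent_severity(12.5), lun_path_severity(3, 4))
```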

Fixed Issues

Functionality

The product could freeze or stop working in case of repeated discovery timeouts, when reinitializing the KM in large environments, or when too many connectors failed at the same time.
In some situations, TrueSight OM - Hardware would not activate the Bandwidth Utilization attribute even if it could collect the network’s bandwidth utilization.
When using a version of the PATROL Agent older than 9.0, the Monitor Type for the Hardware LUNs was missing.

Supported Platforms

The port link speed was not available for Cisco MDS9000 Series FC switches.
Data Domain Storage Systems: Due to the structure of the Data Domain MIB, specific strings in a Physical Hard Drive's serial number could cause a disk to report an unknown status.
Dell PowerEdge Servers: the physical disk instances were not attached to the proper disk controller instance.
Dell Servers with non-RAID disks could appear twice in BMC TrueSight OM.
Dell TL2000/4000 and IBM TS3100/3200 Tape Libraries: Tape drive mounts were incorrectly reported as errors, which caused false alerts to be triggered on the Error Count attributes.
EMC Isilon Systems monitoring: Timestamped log files could fill up the file system. To solve this issue, these log files are now sent to /tmp/MS_HW_isi_hw_check without a timestamp.
Hitachi HDS AMS/HUS Storage Systems:
Authentication issues were encountered.
The execution of all commands is fully serialized to prevent conflicts. All temporary files used by the batch files/shell scripts use randomized file names to prevent file locks and missing files.
Logical disk information was missing.
Logical Disk Status was not collected for some systems when the command output format was not supported by the monitoring solution.
The Linux connector did not work properly.
HP Servers Running VMware:
Disk controllers and their batteries are now properly discovered even when no information on their model or serial number is available.
Thresholds labeled as "Critical" in HP's Insight Management Agent often corresponded only to "Warning" temperatures. The monitoring solution now detects this problem and sets the correct thresholds.
Display IDs have been improved to facilitate the identification of components.
HP Servers Running Windows: Disk controllers and their batteries are now properly discovered even when no information on their model or serial number is available.
HP-UX Systems: In some cases, the value of the Error Percent attribute of the Network monitor type was not reported correctly.
IBM AIX Servers:
Network statistics were not collected for physical ports that were part of SEA Virtual Adapters.
Power failures were not detected.
Network delays had been observed when the entstat command, used to collect Ethernet port statistics on IBM AIX servers, was run on disabled ports.
HBA ports were only considered active if a tape drive or hard disk was attached to them. HBA ports are now considered active if an enabled path is associated with them.
Ports that were used as failover ports by MPIO were considered disabled. This caused false link down alerts and stopped the monitoring of ports that were in fact active.
Child devices attached to FC ports were not monitored.
IBM Storwize (SSH): LEDs were not reporting all faults on both v3700 and v7000 systems.
IBM x Series Servers: On rare occasions, duplicate processor instances could appear in your monitoring environment because the IBM Director Agent reported each processor twice.
IPMI Monitoring:
Oracle-specific SPARC power supply sensors are now recognized.
The SEL Fullness sensor is now excluded to avoid getting SEL Fullness alerts when monitoring an IBM server using IPMI.
The BIST_FAIL sensor is now excluded to avoid getting false CPU alerts when monitoring a Cisco UCS Blade.
Oracle/Sun Solaris:
Disks branded as Sun and larger than 1TB were excluded from the discovery because the expected product tag was "SUNxxxG".
Due to a recent modification in the psrinfo command output, cores were reported as full processors. They are now grouped under a single physical CPU.
Too many CPU cores were detected for Sun Solaris MultiCore Processors.
SUN SPARC Servers (Prtpicl): No thresholds appeared for fan sensors when LowWarningThreshold did not exist for fan instances. LowPowerOffThreshold is now used whenever this situation occurs.
Sun SPARC servers (Running Solaris): False alarms were triggered on LED instances.
Sun SPARC servers (Running Solaris): Invalid values were reported or false alarms were triggered for temperature and voltage sensors.
VMware:
Authentication failures for some ESXi servers could occur when monitoring VMware ESXi servers using vCenter as a multi-tier authentication server.
VMware ESX 5.5 changed the port status reported for link-down ports, which caused the VMware ESXi 4.x connector to falsely report port failures.
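The fan threshold fix for SUN SPARC servers monitored through prtpicl amounts to a fallback from LowWarningThreshold to LowPowerOffThreshold. The sketch below assumes the sensor properties are available as a Python mapping keyed by those property names; the function name is hypothetical.

```python
from typing import Mapping, Optional


def fan_low_threshold(sensor: Mapping[str, float]) -> Optional[float]:
    """Pick the low threshold for a prtpicl fan sensor.

    Uses LowWarningThreshold when present, otherwise LowPowerOffThreshold,
    otherwise None (no threshold can be set).
    """
    for key in ("LowWarningThreshold", "LowPowerOffThreshold"):
        if key in sensor:
            return sensor[key]
    return None


print(fan_low_threshold({"LowPowerOffThreshold": 1500.0}))
```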

Monitored Components

Devices classified as “Other Devices” (CP Modules, etc.) are now attached to their respective enclosures.
Emulex HBA monitoring failed when hbacmd was not installed in /usr/sbin/hbanyware/hbacmd. The monitoring solution now runs the command without its full path. Note that this change requires the directory containing hbacmd to be included in the PATH environment variable of the user account used to monitor the server.
The link status for the HP-UX network cards was always "Unknown".
The solution now monitors LSI sas2ircu-managed RAID controllers even when the manufacturer's agent does not report their status or is not installed.
Multiple instances of the same LSI RAID Controller could appear in the monitoring environment.
The current speed of the CPU core is reported through the Current Speed attribute.
Disks:
TrueSight OM - Hardware rounded the size of logical disks larger than 1 TB to a whole number of terabytes (e.g., a 1.4 TB logical disk was displayed as 1 TB). The size of disks larger than 1 TB is now rounded to one decimal place.
A thread/handle leak could occur in the VDS.EXE process (Virtual Disk Service) when monitoring logical disks in a Microsoft Windows system and could cause the corresponding service to crash.
When servers (typically HP ProLiant) do not report sizes of physical disks, the monitoring solution queries the associated storage extents to find the actual disk size.
The monitoring solution failed to interpret the status of Non-RAID disks (reported as Unknown instead of OK).
The availability of the enclosure is reported through the Present attribute.
Redundant fans sometimes reported a speed/speed percent reading of zero, which triggered an alert even if no thresholds were set. The monitoring solution now disables the speed/speed percent attributes if a valid status is collected to avoid this issue while maintaining full monitoring.
The units for the LED Status attribute have been changed to be more meaningful.
Windows MPIO LUNs Monitoring: Because Windows regularly changes the unique identifiers of LUNs and physical disks, false missing/present alerts could occur for LUNs and duplicate instances could appear.
LUNs Monitoring on WMI Disks:
If a Windows server had both LUNs and local non-RAID physical disks, then TrueSight OM - Hardware monitored both as local physical disks. LUNs will now be excluded from the monitoring.
EMC PowerPath LUNs were monitored when local physical disks were missing.
Because the monitoring solution was unable to report problems on logical disks in Windows environments, logical volumes are no longer displayed for non-English versions of Windows.
Some physical disks were missing when monitoring Linux / Solaris servers with Adaptec StorMan managed RAID cards.
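The logical disk size fix changes how sizes above 1 TB are displayed, for example 1.4 TB instead of 1 TB. This is a minimal sketch, assuming decimal units (1 TB = 10^12 bytes) and assuming that sizes up to 1 TB are shown as whole gigabytes; neither assumption comes from the release notes, and format_disk_size is a hypothetical name.

```python
BYTES_PER_GB = 1000 ** 3  # assumption: decimal units
BYTES_PER_TB = 1000 ** 4


def format_disk_size(size_bytes: int) -> str:
    """Format a disk size, keeping one decimal place above 1 TB."""
    terabytes = size_bytes / BYTES_PER_TB
    if terabytes > 1:
        # Above 1 TB: one decimal place, so 1.4 TB is no longer shown as 1 TB.
        return f"{terabytes:.1f} TB"
    # Up to 1 TB (assumption): whole gigabytes.
    return f"{size_bytes / BYTES_PER_GB:.0f} GB"


print(format_disk_size(1_400_000_000_000))
```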