Configure the Hardware Sentry Agent

The Hardware Sentry Agent collects the hardware health of the monitored systems and pushes the collected data to the OTLP receiver. Hardware Sentry OpenTelemetry Collector then processes the hardware observability and sustainability metrics and exposes them in the backend platform of your choice (Datadog, BMC Helix, Prometheus, Grafana, etc.).

To ensure this process runs smoothly, you need to configure a few settings in the config/hws-config.yaml file to allow Hardware Sentry OpenTelemetry Collector to:

  • identify which site is monitored with this agent
  • calculate the electricity costs and the carbon footprint of this site
  • monitor the systems in this site.

Note that all changes made to the config/hws-config.yaml file are taken into account immediately. There is therefore no need to restart the OpenTelemetry Collector.

Configure a site

A site represents the data center or the server room in which all the systems to be monitored are located. Configure your site in the extraLabels section of the config/hws-config.yaml file as shown in the example below:

extraLabels:
  site: boston 

Configure the sustainability settings

To obtain the electricity costs and carbon footprint of your site, configure the extraMetrics section of the config/hws-config.yaml file as follows:

extraMetrics:
  hw.site.carbon_density_grams: 350 # in g/kWh
  hw.site.electricity_cost_dollars: 0.12 # in $/kWh
  hw.site.pue_ratio: 1.8

where:

  • hw.site.carbon_density_grams is the carbon density in grams per kilowatt-hour (g/kWh). This information is required to calculate the carbon emissions of your site. The carbon density corresponds to the amount of CO₂ emitted per kWh of electricity and varies depending on the country and region where the data center is located. See the electricityMap website for reference.
  • hw.site.electricity_cost_dollars is the electricity price in dollars per kilowatt-hour ($/kWh). This information is required to calculate the energy cost of your site. Refer to your energy contract for the price per kilowatt-hour charged by your supplier, or refer to the GlobalPetrolPrices website.
  • hw.site.pue_ratio is the Power Usage Effectiveness (PUE) of your site. By default, sites are set with a PUE of 1.8, which is the average value for typical data centers.
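To get a feel for how these three values interact, the following Python sketch estimates site-level figures from a measured IT energy consumption. It assumes the usual definition of PUE (total facility energy divided by IT equipment energy); the function name and the formula are illustrative assumptions, not the calculation confirmed to be used by Hardware Sentry.

```python
def site_estimates(it_energy_kwh, carbon_density_grams=350.0,
                   electricity_cost_dollars=0.12, pue_ratio=1.8):
    """Estimate site energy, cost and CO2 from IT energy use (assumed formula)."""
    # PUE = total facility energy / IT equipment energy,
    # so the site as a whole consumes it_energy_kwh * pue_ratio.
    site_energy_kwh = it_energy_kwh * pue_ratio
    return {
        "energy_kwh": site_energy_kwh,
        "cost_dollars": site_energy_kwh * electricity_cost_dollars,
        "co2_grams": site_energy_kwh * carbon_density_grams,
    }


# Hypothetical input: the monitored systems consumed 100 kWh.
estimate = site_estimates(it_energy_kwh=100.0)
print(estimate)
```

With the sample values from the configuration above, a lower PUE or a lower carbon density reduces the estimates proportionally.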

Configure the monitored hosts

To collect metrics from your hosts, you must provide the following information in the config/hws-config.yaml file:

  • the hostname of the host to be monitored
  • its type
  • the protocol to be used.

Important: Because a typo or incorrect indentation in the hws-config.yaml file could cause your hardware monitoring to fail, it is highly recommended to install the vscode-yaml extension in your editor to benefit from tooltips and autocompletion suggested by the Hardware Sentry Configuration JSON Schema.

Monitored hosts

Systems to monitor are defined under hosts using the following syntax:

hosts:

- host:
    hostname: <hostname>
    type: <host-type>
  <protocol-configuration>

where:

  • <hostname> is the name of the host, or its IP address

  • <host-type> is the type of the host to be monitored. Possible values are:

    • win for Microsoft Windows systems
    • linux for Linux systems
    • network for network devices
    • oob for Out-of-band management cards
    • storage for storage systems
    • aix for IBM AIX systems
    • hpux for HP-UX systems
    • solaris for Oracle Solaris systems
    • tru64 for HP Tru64 systems
    • vms for HP OpenVMS systems

    Refer to Monitored Systems for more details.

  • <protocol-configuration> specifies the protocol(s) that Hardware Sentry OpenTelemetry Collector uses to communicate with the hosts: http, ipmi, osCommand, ssh, snmp, wmi, wbem or winrm. Refer to Protocols and credentials for more details.
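Because a typo in a host entry can break the monitoring, it can help to sanity-check entries before saving the file. The sketch below works on the dictionary a YAML parser would produce for one item of the hosts list. The check_host_entry helper is hypothetical, and treating the type values and protocol keys as case-sensitive is an assumption; the JSON Schema shipped with the product remains the reference.

```python
# Values taken from the lists above.
HOST_TYPES = {"win", "linux", "network", "oob", "storage",
              "aix", "hpux", "solaris", "tru64", "vms"}
PROTOCOLS = {"http", "ipmi", "osCommand", "ssh", "snmp", "wmi", "wbem", "winrm"}


def check_host_entry(entry):
    """Return a list of problems found in one item of the hosts list."""
    problems = []
    host = entry.get("host") or {}
    if not host.get("hostname"):
        problems.append("missing hostname")
    if host.get("type") not in HOST_TYPES:
        problems.append(f"unknown type: {host.get('type')!r}")
    # At least one protocol section must sit next to the host section.
    if not PROTOCOLS.intersection(entry):
        problems.append("no protocol configured")
    return problems


good = {"host": {"hostname": "myhost-01", "type": "storage"},
        "http": {"https": True, "port": 443}}
bad = {"host": {"hostname": "myhost-02", "type": "Linux"}}

print(check_host_entry(good))
print(check_host_entry(bad))
```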

Protocols and credentials

HTTP

Use the parameters below to configure the HTTP protocol:

Parameter  Description
http       Protocol used to access the host.
port       The HTTPS port number used to perform HTTP requests (Default: 443).
username   Name used to establish the connection with the host via the HTTP protocol.
password   Password used to establish the connection with the host via the HTTP protocol.

Example

hosts:

  - host:
      hostname: myhost-01
      type: storage
    http:
      https: true
      port: 443
      username: myusername
      password: mypwd

IPMI

Use the parameters below to configure the IPMI protocol:

Parameter  Description
ipmi       Protocol used to access the host.
username   Name used to establish the connection with the host via the IPMI protocol.
password   Password used to establish the connection with the host via the IPMI protocol.

Example

hosts:

- host:
    hostname: myhost-01
    type: oob
  ipmi:
    username: myusername
    password: mypwd

OS commands

Use the parameters below to configure OS Commands that are executed locally:

Parameter        Description
osCommand        Protocol used to access the host.
timeout          How long until the local OS Commands time out (Default: 120s).
useSudo          Whether sudo is used or not for the local OS Command: true or false (Default: false).
useSudoCommands  List of commands for which sudo is required.
sudoCommand      Sudo command to be used (Default: sudo).

Example

hosts:
  - host:
      hostname: myhost-01
      type: linux
    osCommand:
      timeout: 120s
      useSudo: true
      useSudoCommands: [ cmd1, cmd2 ]
      sudoCommand: sudo

SSH

Use the parameters below to configure the SSH protocol:

Parameter        Description
ssh              Protocol used to access the host.
timeout          How long until the command times out (Default: 120s).
useSudo          Whether sudo is used or not for the SSH Command (true or false).
useSudoCommands  List of commands for which sudo is required.
sudoCommand      Sudo command to be used (Default: sudo).
username         Name to use for performing the SSH query.
password         Password to use for performing the SSH query.
privateKey       Private Key File to use to establish the connection to the host through the SSH protocol.

Example

hosts:
  - host:
      hostname: myhost-01
      type: linux
    ssh:
      timeout: 120s
      useSudo: true
      useSudoCommands: [ cmd1, cmd2 ]
      sudoCommand: sudo
      username: myusername
      password: mypwd
      privateKey: /tmp/ssh-key.txt

SNMP

Use the parameters below to configure the SNMP protocol:

Parameter        Description
snmp             Protocol used to access the host.
version          The version of the SNMP protocol (v1, v2c, v3-no-auth, v3-md5, v3-sha).
community        The SNMP community string to use to perform SNMP v1 and v2c queries (Default: public).
port             The SNMP port number used to perform SNMP queries (Default: 161).
timeout          How long until the SNMP request times out (Default: 120s).
privacy          SNMP v3 only - The type of encryption protocol (none, aes, des).
privacyPassword  SNMP v3 only - Password associated with the privacy protocol.
username         SNMP v3 only - Name to use for performing the SNMP query.
password         SNMP v3 only - Password to use for performing the SNMP query.

Example

hosts:

- host:
    hostname: myhost-01
    type: linux
  snmp:
    version: v1
    community: public
    port: 161
    timeout: 120s

- host:
    hostname: myhost-01
    type: linux
  snmp:
    version: v2c
    community: public
    port: 161
    timeout: 120s

- host:
    hostname: myhost-01
    type: linux
  snmp:
    version: v3-md5
    community: public
    port: 161
    timeout: 120s
    privacy: des
    privacyPassword: myprivacypwd
    username: myusername
    password: mypwd

WBEM

Use the parameters below to configure the WBEM protocol:

Parameter  Description
wbem       Protocol used to access the host.
protocol   The protocol used to access the host: http or https.
port       The port number used to perform WBEM queries (Default: 5989 for HTTPS or 5988 for HTTP).
timeout    How long until the WBEM request times out (Default: 120s).
username   Name used to establish the connection with the host via the WBEM protocol.
password   Password used to establish the connection with the host via the WBEM protocol.

Example

hosts:

  - host:
      hostname: myhost-01
      type: storage
    wbem:
      protocol: https
      port: 5989
      timeout: 120s
      username: myusername
      password: mypwd

WMI

Use the parameters below to configure the WMI protocol:

Parameter  Description
wmi        Protocol used to access the host.
timeout    How long until the WMI request times out (Default: 120s).
username   Name used to establish the connection with the host via the WMI protocol.
password   Password used to establish the connection with the host via the WMI protocol.

Example

hosts:

  - host:
      hostname: myhost-01
      type: win
    wmi:
      timeout: 120s
      username: myusername
      password: mypwd

WinRM

Use the parameters below to configure the WinRM protocol:

Parameter        Description
winrm            Protocol used to access the host.
timeout          How long until the WinRM request times out (Default: 120s).
username         Name used to establish the connection with the host via the WinRM protocol.
password         Password used to establish the connection with the host via the WinRM protocol.
protocol         The protocol used to access the host: HTTP or HTTPS (Default: HTTP).
port             The port number used to perform WQL queries and commands (Default: 5985 for HTTP or 5986 for HTTPS).
authentications  Ordered list of authentication schemes: NTLM, KERBEROS (Default: NTLM).

Example

hosts:

  - host:
      hostname: server-11
      type: win
    winrm:
      protocol: http
      port: 5985
      username: myusername
      password: mypwd
      timeout: 120s
      authentications: [ntlm]

Additional settings (Optional)

Alert Settings

Disabling Alerts (Not Recommended)

To disable Hardware Sentry OpenTelemetry Collector's alerts:

  • for all your hosts, set the disableAlerts parameter to true just before the hosts section:

    disableAlerts: true
    
    hosts: # ...
    
  • for a specific host, set the disableAlerts parameter to true in the relevant host section:

    hosts:
    
    - host:
        hostname: myhost
        type: linux
      snmp:
        version: v1
        community: public
        port: 161
        timeout: 120s
      disableAlerts: true
    

Hardware Problem template

When it detects a hardware problem, Hardware Sentry OpenTelemetry Collector triggers an alert as an OpenTelemetry log. The alert body is built from the following template:

Hardware problem on ${FQDN} with ${MONITOR_NAME}.${NEWLINE}${NEWLINE}${ALERT_DETAILS}${NEWLINE}${NEWLINE}${FULLREPORT}

To change this default hardware problem template:

  • for all your hosts, configure the hardwareProblemTemplate parameter just before the hosts section:

    hardwareProblemTemplate: Custom hardware problem on ${FQDN} with ${MONITOR_NAME}.
    
    hosts: # ...
    
  • for a specific host, configure the hardwareProblemTemplate parameter in the relevant host section:

    hosts:
    
    - host:
        hostname: myhost
        type: linux
      snmp:
        version: v1
        community: public
        port: 161
        timeout: 120s
      hardwareProblemTemplate: Custom hardware problem on myhost with ${MONITOR_NAME}.
    

In both cases, set the parameter to the template to use when building alert messages.

For more information about the alert mechanism and the macros to use, refer to the Alerts page.
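The ${...} macros used in the hardware problem template follow the same placeholder syntax as Python's string.Template, so a custom template can be previewed before it is added to hws-config.yaml. This is only a sketch: the preview helper and the sample values are hypothetical, and the agent's own expansion logic may differ.

```python
from string import Template

# Default template, as documented above.
DEFAULT_TEMPLATE = (
    "Hardware problem on ${FQDN} with ${MONITOR_NAME}."
    "${NEWLINE}${NEWLINE}${ALERT_DETAILS}${NEWLINE}${NEWLINE}${FULLREPORT}"
)


def preview(template, **values):
    """Expand the macros of a template with sample values."""
    values.setdefault("NEWLINE", "\n")
    # safe_substitute leaves macros without a sample value untouched.
    return Template(template).safe_substitute(values)


custom = "Custom hardware problem on ${FQDN} with ${MONITOR_NAME}."
print(preview(custom, FQDN="myhost.example.com", MONITOR_NAME="Fan 1"))
print(preview(DEFAULT_TEMPLATE, FQDN="myhost.example.com", MONITOR_NAME="Fan 1"))
```

In the second call, ${ALERT_DETAILS} and ${FULLREPORT} stay unexpanded because no sample value was supplied for them.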

Authentication Settings

Basic authentication header

The Hardware Sentry OpenTelemetry Collector's internal OTLP Exporter authenticates itself with the OTLP gRPC Receiver by including the HTTP Authorization request header with the credentials. A predefined Basic Authentication Header value is stored internally and included in each request when sending telemetry data.

To override the default value of the Basic Authentication Header, add a new Authorization header under the exporter:otlp:headers section:

exporter:
  otlp:
    headers:
      Authorization: Basic <credentials>

hosts: # ...

where <credentials> are built by first joining your username and password with a colon (myUsername:myPassword) and then encoding the value in base64.
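The encoding step described above can be reproduced with Python's standard base64 module. The basic_auth_header helper is a hypothetical name used for illustration; any base64 tool produces the same value.

```python
import base64


def basic_auth_header(username, password):
    """Build the value of the Authorization header for Basic authentication."""
    # Join the username and password with a colon, then base64-encode the result.
    token = base64.b64encode(f"{username}:{password}".encode("utf-8"))
    return "Basic " + token.decode("ascii")


# Sample credentials from the text above.
print(basic_auth_header("myUsername", "myPassword"))
```

The printed string is what goes after Authorization: in the headers section.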

For more security, encrypt the Basic <credentials> value. See Encrypting Passwords for more details.

Warning: If you update the Basic Authentication Header, you must generate a new .htpasswd file for the OpenTelemetry Collector Basic Authenticator.

Monitoring Settings

Collect period

By default, Hardware Sentry OpenTelemetry Collector collects metrics from the monitored hosts every minute. To change the default collect period:

  • for all your hosts, add the collectPeriod parameter just before the hosts section:

    collectPeriod: 2m
    
    hosts: # ...
    
  • for a specific host, add the collectPeriod parameter in the relevant host section:

    hosts:
    
    - host:
        hostname: myhost
        type: linux
      snmp:
        version: v1
        community: public
        port: 161
        timeout: 120s
      collectPeriod: 1m30s # Customized
    

Warning: Collecting metrics too frequently can cause CPU-intensive workloads.

Connectors

The Hardware Sentry OpenTelemetry Collector comes with the Hardware Connector Library, a library that consists of hundreds of hardware connectors that describe how to discover hardware components and detect failures. When running Hardware Sentry OpenTelemetry Collector, the connectors are automatically selected based on the device type provided and the enabled protocols. You can however indicate to Hardware Sentry OpenTelemetry Collector which connectors should be used or excluded.

Use the parameters below to select or exclude connectors:

Parameter           Description
selectedConnectors  Connector(s) to use to monitor the host. No automatic detection will be performed.
excludedConnectors  Connector(s) that must be excluded from the automatic detection.

Connector names must be comma-separated, as shown in the example below:

hosts:

  - host:
      hostname: myhost-01
      type: win
    wmi:
      timeout: 120s
      username: myusername
      password: mypwd
    selectedConnectors: [ VMwareESX4i, VMwareESXi ]
    excludedConnectors: [ VMwareESXiDisksStorage ]

Note: Any misspelled connector will be ignored.

To find out which connectors are available, refer to Monitored Systems or run the following command:

$ hws -l

For more information about the hws command, refer to Hardware Sentry CLI (hws).

Discovery cycle

Hardware Sentry OpenTelemetry Collector periodically performs discoveries to detect new components in your monitored environment. By default, Hardware Sentry OpenTelemetry Collector runs a discovery after 30 collects. To change this default discovery cycle:

  • for all your hosts, add the discoveryCycle parameter just before the hosts section:

    discoveryCycle: 15
    
    hosts: # ...
    
  • for a specific host, add the discoveryCycle parameter in the relevant host section:

    hosts:
    
    - host:
        hostname: myhost
        type: linux
      snmp:
        version: v1
        community: public
        port: 161
        timeout: 120s
      discoveryCycle: 5 # Customized
    

In both cases, set the parameter to the number of collects after which a discovery will be performed.

Warning: Running discoveries too frequently can cause CPU-intensive workloads.
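Since the discovery cycle is expressed in collects, the time between two discoveries depends on the collect period. The short sketch below makes that relationship explicit; the helper name is hypothetical, and the result ignores the time the collects themselves take.

```python
def discovery_interval_seconds(collect_period_seconds=60, discovery_cycle=30):
    """Approximate time between two discoveries, in seconds."""
    # A discovery runs once every `discovery_cycle` collects.
    return collect_period_seconds * discovery_cycle


# Defaults: one collect per minute, one discovery every 30 collects.
print(discovery_interval_seconds() / 60, "minutes")

# collectPeriod: 1m30s combined with discoveryCycle: 5
print(discovery_interval_seconds(90, 5) / 60, "minutes")
```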

Extra labels

Add labels in the extraLabels section to override the data collected by the Hardware Sentry Agent or add additional attributes to the Host Resource. These attributes are added to each metric of that Resource when exported to time series platforms like Prometheus.

In the example below, we override the host.name attribute resolved by Hardware Sentry OpenTelemetry Collector with host01.internal.domain.net and indicate that it is the Jenkins app:

hosts:

- host:
    hostname: host01
    type: linux
  snmp:
    version: v1
    port: 161
    timeout: 120s
  extraLabels:
    host.name: host01.internal.domain.net
    app: Jenkins

Hostname resolution

By default, Hardware Sentry OpenTelemetry Collector resolves the hostname of the host to a Fully Qualified Domain Name (FQDN) and displays this value in the Host Resource attribute host.name. To display the configured hostname instead, set resolveHostnameToFqdn to false:

resolveHostnameToFqdn: false

hosts:

- host:
    hostname: host01
    type: linux

Job pool size

By default, Hardware Sentry OpenTelemetry Collector runs up to 20 discovery and collect jobs in parallel. To increase or decrease the number of jobs Hardware Sentry OpenTelemetry Collector can run simultaneously, add the jobPoolSize parameter just before the hosts section:

jobPoolSize: 20

hosts: # ...

Set the value to the maximum number of jobs to run in parallel.

Warning: Running too many jobs in parallel can lead to an OutOfMemory error.

Sequential mode

By default, Hardware Sentry OpenTelemetry Collector sends the queries to the host in parallel. Although the parallel mode is faster than the sequential one, too many requests at the same time can lead to the failure of the targeted system.

To force all the network calls to be executed in sequential order:

  • for all your hosts, enable the sequential option just before the hosts section (NOT RECOMMENDED):

    sequential: true
    
    hosts: # ...
    
  • for a specific host, enable the sequential option in the relevant host section:

    hosts:
    
    - host:
        hostname: myhost
        type: linux
      snmp:
        version: v1
        community: public
        port: 161
        timeout: 120s
      sequential: true # Customized
    

Warning: Sending requests in sequential mode slows down the monitoring significantly. Instead of using the sequential mode, you could increase the maximum number of allowed concurrent requests in the monitored system, if the manufacturer allows it.

Timeout, Duration and Period Format

Timeouts, durations and periods are specified using the following format:

Unit  Description                      Examples
s     seconds                          120s
m     minutes                          90m, 1m15s
h     hours                            1h, 1h30m
d     days (based on a 24-hour day)    1d
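To check what a given value amounts to, the format above can be converted to seconds with a few lines of Python. This parser is a sketch built only from the table: the parse_duration name is hypothetical, and rejecting values without a unit is an assumption that may be stricter than the agent itself.

```python
import re

UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}
_PART = re.compile(r"(\d+)([smhd])")


def parse_duration(text):
    """Convert a value such as '1m15s' to a number of seconds."""
    parts = _PART.findall(text)
    # Reject inputs with leftover characters, such as '120' or '5x'.
    if not parts or "".join(number + unit for number, unit in parts) != text:
        raise ValueError(f"invalid duration: {text!r}")
    return sum(int(number) * UNIT_SECONDS[unit] for number, unit in parts)


for value in ("120s", "90m", "1m15s", "1h30m", "1d"):
    print(value, "=", parse_duration(value), "seconds")
```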

Security settings

Trusted certificates file

A TLS handshake takes place when the Hardware Sentry Agent's OTLP Exporter initiates communication with the OTLP gRPC Receiver. By default, the internal OTLP Exporter client is configured to trust the OTLP gRPC Receiver's certificate, security/otel.crt.

If you generate a new server certificate for the OTLP gRPC Receiver, you must configure the trustedCertificatesFile parameter under the exporter:otlp section:

exporter:
  otlp:
    trustedCertificatesFile: security/new-server-cert.crt

hosts: # ...

The file should be stored in the security folder and should contain one or more X.509 certificates in PEM format.
