Overview
When cOS Stream is running on dedicated Clavister hardware (and not in a virtual environment), various sensors in the hardware provide cOS Stream with information about the status of the hardware components. This information ranges from temperatures and fan speeds to whether components, such as power supply units, are operational.
The Sensor Polling Interval
By default, cOS Stream polls all hardware sensors every 5 seconds and stores the last retrieved sensor information until the next poll overwrites it. This 5-second interval can be changed globally by the administrator to another value between 1 and 30 seconds. For example, to change the polling interval to 20 seconds, the following CLI command would be used:
System:/>
set Settings HWMONSettings SensorRefreshInterval=20
Type Of Hardware Monitoring
The Clavister NetShield Firewall provides two types of hardware monitoring that make use of the information collected by sensor polling:
System hardware monitoring.
User hardware monitoring.
These types are explained in the sections that follow.
In contrast to System Hardware Monitoring, the feature called User Hardware Monitoring is used to generate log event messages for specific sensors and conditions that are defined by the administrator.
Enabling User Hardware Monitoring
To enable user hardware monitoring for a sensor, an HWMONMonitor object must be created in the configuration and associated with the sensor. Each HWMONMonitor object can be associated with only one sensor.
Assume that monitoring of the CPU temperature is required. If the sensor name is CPU_Temp and the new object is to be called mon1, user monitoring for this sensor can be enabled with the following CLI command:
System:/>
add HWMONMonitor mon1 SensorID=CPU_Temp
The HWMONMonitor object created will generate an event message when its sensor has a value that moves outside the thresholds defined for the object (in the case of temperatures and fan speeds) or when the status changes (in the case of a failed power supply).
Another log message is generated when the sensor value moves back into the normal range, which includes the value becoming equal to the threshold value. In the above command, no minimum or maximum threshold values are specified, so cOS Stream will use the default thresholds for this sensor. All sensors have default thresholds already defined. Note that the SensorID property must always be specified when creating an HWMONMonitor object.
By default, a severity of Warning is used for all log messages generated. The severity can be explicitly set by changing the Severity property of an HWMONMonitor object. For example, to set the severity to Emergency for the mon1 object, the CLI command would be:
System:/>
set HWMONMonitor mon1 Severity=Emergency
Sensor Types
Sensors are of two types:
Numeric sensors - These provide numeric values. For example, temperature or fan speed.
Binary sensors - These provide a value that is either zero or one. For example, a PSU being operational or not operational.
Understanding the thresholds of a numeric sensor is straightforward but binary sensors require some additional explanation. Binary sensors have a value which is either 1 or 0. However, it depends on the sensor which value indicates a "normal" condition. For example, one sensor could indicate that a PSU is installed and its normal value is 1 (the PSU is present). Another sensor might indicate that a PSU has failed and its normal value is 0 (the PSU is operational and has not failed).
The following should be noted about HWMONMonitor object thresholds for binary sensors:
A binary sensor that has a normal value of 1 will have low and high thresholds that are both 1. When the sensor's value becomes 0, this will indicate an abnormal condition.
A binary sensor that has a normal value of 0 will have low and high thresholds that are both 0. When the sensor's value becomes 1, this will indicate an abnormal condition.
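The threshold rules for numeric and binary sensors can be illustrated with a short Python sketch. This is not how cOS Stream is implemented; the function name `classify` and the labels it returns are hypothetical and chosen only for illustration. It assumes that a value equal to a threshold counts as normal, in line with the statement above that a value returning to the threshold is treated as back to normal.

```python
def classify(value, low, high):
    """Classify a sensor reading against its low and high thresholds.

    Hypothetical helper: a reading equal to a threshold is treated as normal.
    """
    if value < low:
        return "LOW"
    if value > high:
        return "HIGH"
    return "NORMAL"


# Numeric sensor: CPU temperature with thresholds 0 and 98.
print(classify(58, 0, 98))

# Binary sensor with a normal value of 1 (both thresholds are 1),
# for example "PSU present". A value of 0 falls below the low threshold.
print(classify(0, 1, 1))

# Binary sensor with a normal value of 0 (both thresholds are 0),
# for example "PSU failed". A value of 1 rises above the high threshold.
print(classify(1, 0, 0))
```

Because both thresholds of a binary sensor equal its normal value, the same comparison used for numeric sensors also detects a change of state.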
Changing Threshold Values
The lower and upper thresholds for an HWMONMonitor object associated with a numeric sensor can be changed from the default, as shown in the following example:
System:/>
set HWMONMonitor mon1 LowThresh=-5 HighThresh=100
Note that negative values can be used for temperatures.
Important: Do not change thresholds of binary sensors
The thresholds associated with a binary sensor, such as PSU failure, should not be changed by the administrator. The default threshold values should always be used.
Changing the Monitoring Interval for the HWMONMonitor Object
The Interval property of an HWMONMonitor object determines how often, in seconds, the data from the object's associated sensor is examined. A higher value for this property will cause fewer log messages to be sent. For example, to examine the CPU temperature every 30 seconds, the CLI command would be as follows:
System:/>
set HWMONMonitor mon1 Interval=30
Note that this interval does not affect the rate at which the sensors are polled, which is controlled by the global setting SensorRefreshInterval (described at the beginning of this topic). The Interval property only affects the frequency with which the currently stored sensor values are examined by the HWMONMonitor object to determine if a log message should be generated.
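The difference between the two intervals can be seen in a small Python simulation. This is only a sketch under stated assumptions: the function `examined_values` is hypothetical, time advances in whole seconds, and when a poll and an examination fall on the same second the poll is assumed to happen first.

```python
def examined_values(read_sensor, refresh_interval, monitor_interval, duration):
    """Return the (time, value) pairs that a monitor would examine.

    read_sensor maps a time in seconds to the true sensor value at that time.
    The stored value is overwritten every refresh_interval seconds (the role
    of SensorRefreshInterval), and the monitor looks at the stored value every
    monitor_interval seconds (the role of the Interval property).
    """
    stored = None
    seen = []
    for t in range(duration + 1):
        if t % refresh_interval == 0:
            stored = read_sensor(t)      # global poll overwrites the stored value
        if t % monitor_interval == 0:
            seen.append((t, stored))     # monitor examines the stored value
    return seen


# Hypothetical sensor that starts at 50 and rises by one unit per second.
result = examined_values(lambda t: 50 + t, refresh_interval=20,
                         monitor_interval=30, duration=60)
print(result)
```

With these example numbers, the examination at 30 seconds sees the value stored by the poll at 20 seconds, which shows that a monitor always works with the most recently polled value and not a fresh reading.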
Log Messages
A log message is generated when the sensor value passes outside the thresholds specified. A second log message is generated when the sensor value passes back inside the thresholds specified. Similarly, for a binary sensor, a log message is generated when the state of the sensor changes in either direction.
The log messages generated by HWMONMonitor objects all belong to the HWMON log message category. As described previously, the severity of the message is Warning by default but this can be changed by setting the Severity property of the HWMONMonitor object generating the message. Below are some examples of the log messages created by user hardware monitoring.
A sensor value above a maximum threshold:
SYSTEM,HWMON: prio=warning id=1081 event=sensor_value_above_monitor_threshold sensorid="CPU_Temp" description="CPU Temperature" name="n2" value=81 threshold=80 action=none
A sensor value below a minimum threshold:
SYSTEM,HWMON: prio=warning id=1079 event=sensor_value_below_monitor_threshold sensorid="CPU_Temp" description="CPU Temperature" name="n2" value=29 threshold=30 action=none
A sensor value crossing a threshold back to a normal value:
SYSTEM,HWMON: prio=warning id=1082 event=sensor_returned_to_normal sensorid="CPU_Temp" description="CPU Temperature" name="n2" value=80 action=none
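A log receiver that needs to act on these messages could split them into fields. The following Python sketch assumes the messages always consist of a category prefix followed by space-separated key=value pairs, as in the examples above. The function name `parse_hwmon_log` is hypothetical and is not part of any Clavister tool.

```python
import re

# A key=value pair where the value is either double-quoted or a bare token.
_PAIR = re.compile(r'(\w+)=(?:"([^"]*)"|(\S+))')


def parse_hwmon_log(line):
    """Split a HWMON log line into its category and key/value fields."""
    category, _, rest = line.partition(": ")
    fields = {"category": category}
    for key, quoted, bare in _PAIR.findall(rest):
        # Exactly one of the two capture groups is non-empty for a match.
        fields[key] = quoted if quoted else bare
    return fields


line = ('SYSTEM,HWMON: prio=warning id=1081 '
        'event=sensor_value_above_monitor_threshold sensorid="CPU_Temp" '
        'description="CPU Temperature" name="n2" value=81 threshold=80 '
        'action=none')
fields = parse_hwmon_log(line)
print(fields["event"], fields["sensorid"], fields["value"], fields["threshold"])
```

All values are returned as strings, so numeric fields such as value and threshold need converting before they are compared.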
The Log Repeat Interval
While the sensor value remains outside of its thresholds, an HWMONMonitor object will regenerate the log message, by default, every 6 hours so that the administrator continues to be reminded that the abnormal condition exists.
The default log repeat interval value of 6 hours can be changed by assigning a new value to the LogRepeatInterval property of the HWMONMonitor object. The value is specified as an integer number of seconds. For example, the following command would change the log repeat interval for the monitor called mon1 to 12 hours:
System:/>
set HWMONMonitor mon1 LogRepeatInterval=43200
Note that the minimum allowed value for the LogRepeatInterval property is 30 seconds and the maximum is 86,400 seconds (24 hours).
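Since the property is given in seconds, it can be convenient to convert from hours and check the allowed range before entering the command. The Python helper below is a sketch only; the function name `log_repeat_command` and the constants are hypothetical, while the command syntax and the 30 to 86,400 second range are taken from the text above.

```python
MIN_REPEAT_SECONDS = 30        # documented minimum
MAX_REPEAT_SECONDS = 86_400    # documented maximum (24 hours)


def log_repeat_command(monitor, hours):
    """Build the CLI command that sets LogRepeatInterval from a value in hours."""
    seconds = round(hours * 3600)
    if not MIN_REPEAT_SECONDS <= seconds <= MAX_REPEAT_SECONDS:
        raise ValueError(
            f"LogRepeatInterval must be between {MIN_REPEAT_SECONDS} "
            f"and {MAX_REPEAT_SECONDS} seconds, got {seconds}"
        )
    return f"set HWMONMonitor {monitor} LogRepeatInterval={seconds}"


print(log_repeat_command("mon1", 12))
```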
Displaying the Available Sensors
To see all the currently available sensors, the hwmon -sensorlist command can be used. The following shows some typical output (the last two right hand columns showing lowest and highest have been truncated to fit the page width):
System:/>
hwmon -sensorlist
Sensor ID       Description              Unit Value Monitor Min   Max
--------------- ------------------------ ---- ----- ------- ----- -----
CPU_Temp        CPU Temperature          C    58    yes     0     98
System_Temp     System Temperature       C    24    no      0     55
System_Power    System Power Consumption W    224   no      0     500
System_12V      System Internal Power    mV   12126 no      11500 13000
FAN1_RPM        System FAN1 Speed        RPM  4200  no      1500  6800
FAN2_RPM        System FAN2 Speed        RPM  4000  no      1500  6800
FAN3_RPM        System FAN3 Speed        RPM  4100  no      1500  6800
PSU1_Avail      PSU1 Available           -    1     no      1     1
PSU1_Fail       PSU1 Failure Detected    -    0     no      0     0
PSU1_Input_Lost PSU1 Power Input Lost    -    0     no      0     0
PSU2_Avail      PSU2 Available           -    1     no      1     1
PSU2_Fail       PSU2 Failure Detected    -    0     no      0     0
PSU2_Input_Lost PSU2 Power Input Lost    -    0     no      0     0
The following should be noted about the above output:
The Value column indicates the value that the sensor returned when it was last polled. Here, the CPU temperature was 58 degrees when that sensor was last polled.
The Min and Max columns show the default thresholds that will be used if none are explicitly specified when an HWMONMonitor object is created. These default thresholds are fixed and cannot be changed by the administrator.
The value of no under the Monitor column means that no HWMONMonitor object is currently associated with that sensor. If there is at least one HWMONMonitor object associated with the sensor, the value in the column becomes yes. Disabling an HWMONMonitor object will not change the yes value shown for its sensor in this column.
Displaying the Current HWMONMonitor List
To see all the current HWMONMonitor objects, the hwmon command is used with no parameters. The following shows some typical output for two configured HWMONMonitor objects called mon1 and mon2 that are monitoring CPU temperature:
System:/>
hwmon
Name Sensor   Description      Value Low High Status #Low #High
---- -------- ---------------- ----- --- ---- ------ ---- -----
mon1 CPU_Temp CPU Temperature  59    0   98   NORMAL 0    0
mon2 CPU_Temp CPU Temperature  59    20  70   NORMAL 0    0
Here, the Low and High columns are the currently defined thresholds for these HWMONMonitor objects. The #Low and #High columns are the total number of alarms triggered when the low and high thresholds are passed. Alarms are explained later in this section.
Note that in the above list, there are two HWMONMonitor objects defined for the same sensor. This is permissible and means that separate log messages can be generated for different sensor ranges.
The Status column in the above output indicates the status of the sensor value when it was last examined by the HWMONMonitor object and can show the following values:
NORMAL - The value is within the thresholds.
ALARM:LOW - The value has fallen below the minimum threshold and has not yet returned to normal.
ALARM:HIGH - The value has exceeded the maximum threshold and has not yet returned to normal.
WARNING: RETURN LOW - The value has just returned from being outside the minimum threshold to the normal range. The column will show NORMAL after the next reading of sensor values.
WARNING: RETURN HIGH - The value has just returned from being outside the maximum threshold to the normal range. The column will show NORMAL after the next reading of sensor values.
UNKNOWN STATUS - The possible reasons for this column entry are the following:
cOS Stream is in the process of restarting and no values are yet available.
The SensorID property of the HWMONMonitor object does not correspond to any sensor in the hardware.
The sensor has a fault and is not returning values.
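The way the Status column moves between these values can be sketched as a small state function in Python. This is an illustration of the descriptions above and not cOS Stream code. The function name `next_status` is hypothetical, a missing reading is represented by `None`, and a value equal to a threshold is assumed to be within the thresholds.

```python
def next_status(previous, value, low, high):
    """Return the status after examining one stored sensor value."""
    if value is None:
        return "UNKNOWN STATUS"          # no value available from the sensor
    if value < low:
        return "ALARM:LOW"
    if value > high:
        return "ALARM:HIGH"
    # The value is within the thresholds. Report a return first, then NORMAL
    # on the following examination.
    if previous == "ALARM:LOW":
        return "WARNING: RETURN LOW"
    if previous == "ALARM:HIGH":
        return "WARNING: RETURN HIGH"
    return "NORMAL"


status = "NORMAL"
history = []
for reading in [59, 99, 101, 97, 60]:
    status = next_status(status, reading, low=0, high=98)
    history.append(status)
print(history)
```

For the example readings and thresholds of 0 and 98, the sequence is NORMAL, ALARM:HIGH, ALARM:HIGH, WARNING: RETURN HIGH and then NORMAL again.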
Displaying HWMONMonitor Properties
To see all the properties for an HWMONMonitor object, the hwmon <monitor-name> command can be used. For example, to see all the values for the HWMONMonitor called mon1, the command would be:
System:/>
hwmon mon1
Name : mon1
Sensor id : CPU_Temp
Log severity setting : warning
Description : CPU Temperature
Value : 58
Low Threshold : 0
Recommended low threshold : 0
High Threshold : 98
Recommended high threshold : 98
Action interval (seconds) : 5
ALARM:LOW counter : 1
ALARM:HIGH counter : 3
Last status : NORMAL
Note that the Recommended low threshold and the Recommended high threshold in the above output are the default threshold values for the associated sensor.
Sensor Alarms
As shown in the output above, each HWMONMonitor object has two alarm counters associated with it, the ALARM:LOW counter and the ALARM:HIGH counter. These properties of the object can only be read and cannot be manipulated by the administrator.
The following should be noted about the alarm counters:
Alarm counters start at zero after starting cOS Stream and are incremented each time the sensor value passes one of the thresholds.
For binary sensors, such as a PSU fail/not fail sensor, the alarm is incremented when the sensor value changes from 1 to 0, or 0 to 1. It is a change of state from the normal that increments the counter. However, only ALARM:HIGH is incremented if a normal value of 0 changes to 1 and only ALARM:LOW is incremented if a normal value of 1 changes to 0.
Alarm counters are not decremented. This allows the administrator to be able to see the total number of times the alarm has been triggered. The counters are only zeroed when cOS Stream is restarted or when the hardware monitoring is disabled globally and then re-enabled.
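The counter behavior described above can be modeled with a short Python class. This is a sketch under stated assumptions: the class name `AlarmCounters` and its attributes are hypothetical, and a counter is assumed to increase only when the value moves into an alarm state, not on every examination while it stays there.

```python
class AlarmCounters:
    """Model of the ALARM:LOW and ALARM:HIGH counters of one monitor."""

    def __init__(self, low, high):
        self.low = low
        self.high = high
        self.low_count = 0     # counters start at zero
        self.high_count = 0
        self._state = "NORMAL"

    def examine(self, value):
        if value < self.low:
            state = "LOW"
        elif value > self.high:
            state = "HIGH"
        else:
            state = "NORMAL"
        # Count only the transition into an alarm state. Counters never decrease.
        if state != self._state:
            if state == "LOW":
                self.low_count += 1
            elif state == "HIGH":
                self.high_count += 1
        self._state = state


# Binary "PSU failed" sensor with a normal value of 0 (both thresholds 0).
psu_fail = AlarmCounters(low=0, high=0)
for reading in [0, 1, 1, 0, 1]:
    psu_fail.examine(reading)
print(psu_fail.low_count, psu_fail.high_count)
```

In this example the sensor changes from 0 to 1 twice, so only the ALARM:HIGH counter is incremented, ending at 2, and the ALARM:LOW counter stays at 0.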
The sensor information that is gathered by cOS Stream is available to an SNMP client using the SNMP protocol. This type of monitoring is known as System Monitoring.
System monitoring only has one option available that can be set by the administrator, and that is if it is enabled or disabled. All hardware monitoring is enabled by default, but this can be disabled by the following CLI command:
System:/>
set Settings HWMONSettings MonitorEnable=No
This will disable both system and user monitoring. Both can be re-enabled with the following command:
System:/>
set Settings HWMONSettings MonitorEnable=Yes
This setting affects both system monitoring and user monitoring.
The following are the available sensors for the current range of Clavister hardware products running cOS Stream. All fan speeds are given in RPM and temperatures are in degrees centigrade.
Sensor Name | Sensor Type | Sensor Number | Minimum Limit | Maximum Limit |
---|---|---|---|---|
CPUTemp | TEMP | 0 | 0 | 80 |
Sensor Name | Sensor Type | Sensor Number | Minimum Limit | Maximum Limit |
---|---|---|---|---|
Left_PSU | GPIO | 0 | 1 | |
Right_PSU | GPIO | 0 | 1 | |
SysTemp1 | TEMP | 256 | 0 | 70 |
SysTemp2 | TEMP | 257 | 0 | 70 |
SysFan1 | FANRPM | 256 | 1500 | 12800 |
SysFan2 | FANRPM | 258 | 1500 | 12800 |
SysFan3 | FANRPM | 260 | 1500 | 12800 |
SysFan4 | FANRPM | 260 | 1500 | 12800 |
CPUTemp1 | TEMP | 512 | 0 | 80 |
Sensor Name | Sensor Type | Sensor Number | Minimum Limit | Maximum Limit |
---|---|---|---|---|
System Vcore Internal | VOLT | 0 | 0.494 | 1.744 |
System 12V Internal | VOLT | 1 | 11.4 | 13.9 |
System 5V Internal | VOLT | 2 | 4.8 | 5.8 |
System 3.3V Internal | VOLT | 3 | 2.976 | 3.632 |
System CMOS Battery | VOLT | 4 | 2.704 | 3.632 |
System FAN Speed | FANRPM | 5 | 1400 | 14000 |
System Temperature 1 | TEMP | 6 | 0 | 71 |
System Temperature 2 | TEMP | 7 | 0 | 75 |
System Temperature 3 | TEMP | 8 | 0 | 85 |
CPU Core Temperature | TEMP | 9 | 0 | 85 |
Sensor Name | Sensor Type | Sensor Number | Minimum Limit | Maximum Limit |
---|---|---|---|---|
System Vcore Internal | VOLT | 0 | 0.494 | 1.302 |
System 3.3V Internal | VOLT | 1 | 3.135 | 3.465 |
System 12V Internal | VOLT | 2 | 11.4 | 12.6 |
System CMOS Battery | VOLT | 3 | 1.9 | 3.465 |
System 5V Internal | VOLT | 4 | 4.75 | 5.25 |
System FAN Speed | FANRPM | 5 | 1800 | 14000 |
System Temperature 1 | TEMP | 6 | 0 | 80 |
System Temperature 2 | TEMP | 7 | 0 | 80 |
CPU Socket Temperature | TEMP | 8 | 0 | 95 |
Sensor Name | Sensor Type | Sensor Number | Minimum Limit | Maximum Limit |
---|---|---|---|---|
PSU1 Installed | GPIO | 0 | 0 | 1 |
PSU1 Power OK | GPIO | 256 | 0 | 1 |
PSU1 Temperature | TEMP | 512 | 0 | 0 |
PSU1 Fan Speed | FANRPM | 768 | 5000 | 14000 |
PSU1 Output Voltage | VOLT | 1024 | 0 | 0 |
PSU1 Output Current | CURR | 1280 | 0 | 0 |
PSU1 Input Power | POWER | 1536 | 0 | 0 |
PSU1 Output Power | POWER | 1792 | 0 | 0 |
PSU2 Installed | GPIO | 2048 | 0 | 1 |
PSU2 Power OK | GPIO | 2304 | 0 | 1 |
PSU2 Temperature | TEMP | 2560 | 0 | 0 |
PSU2 Fan Speed | FANRPM | 2816 | 5000 | 14000 |
PSU2 Output Voltage | VOLT | 3072 | 0 | 0 |
PSU2 Output Current | CURR | 3328 | 0 | 0 |
PSU2 Input Power | POWER | 3584 | 0 | 0 |
PSU2 Output Power | POWER | 3840 | 0 | 0 |
System Vcore Internal | VOLT | 4096 | 0 | 1.744 |
System 3.3V Internal | VOLT | 4097 | 0 | 0 |
System 12V Internal | VOLT | 4098 | 11.5 | 13.0 |
System CMOS Battery | VOLT | 4099 | 2.9 | 3.2 |
System 5V Internal | VOLT | 4100 | 0 | 0 |
System FAN1 Speed | FANRPM | 4101 | 6000 | 14000 |
System FAN2 Speed | FANRPM | 4102 | 6000 | 14000 |
System FAN3 Speed | FANRPM | 4103 | 6000 | 14000 |
System Temperature 1 | TEMP | 4104 | 0 | 80 |
System Temperature 2 | TEMP | 4105 | 0 | 80 |
Air Intake Temp | TEMP | 4105 | 0 | 50 |
CPU Temperature | TEMP | 4352 | 0 | 95 |