4.1. How Monitoring Variables are Stored

Variables are kept in memory. When the UI requests data to build a graph, the server looks in the memory buffer first. If the graph requires data beyond what is held in memory, the server queries tsdb to pull older data. Before either step, the server must find the variable, which it does by comparing the “triplet” from the query with the triplets of the variables held in the memory buffers (no tsdb queries are made at this stage). The “triplet” consists of three components separated by dots:

variableName . DeviceId . ComponentId

For example:

cpuUtil.5.32

Here the variable name is “cpuUtil”, the device id is 5 and the component id is 32. The component id is also referred to as the index. For variables related to interfaces, the index is equal to the ifIndex of the interface. In other cases the index is taken from the SNMP OID used to collect data about the component. The index is always unique within a device: no two components of the same device have equal index values, although components of different devices can have the same index. The combination of device id and component index therefore uniquely identifies the component.
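The triplet format can be illustrated with a short Python sketch. The names Triplet, parse_triplet and component_key are hypothetical and used only for illustration; they are not part of the NSG API. The sketch assumes that both ids are integers and splits from the right so that the last two fields are always treated as the ids.

```python
from typing import NamedTuple


class Triplet(NamedTuple):
    """Hypothetical container for the three parts of a triplet."""
    variable_name: str
    device_id: int
    component_id: int


def parse_triplet(text: str) -> Triplet:
    """Split 'variableName.DeviceId.ComponentId' into its parts."""
    # Split from the right so the last two fields are always the ids.
    parts = text.rsplit(".", 2)
    if len(parts) != 3:
        raise ValueError(f"not a triplet: {text!r}")
    name, device_id, component_id = parts
    return Triplet(name, int(device_id), int(component_id))


def component_key(triplet: Triplet) -> tuple:
    """Return (device id, component index), which identifies a component."""
    return (triplet.device_id, triplet.component_id)
```

For the example above, parse_triplet("cpuUtil.5.32") yields the variable name “cpuUtil”, device id 5 and component id 32, and component_key returns the pair (5, 32).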

Each monitoring variable is a complex object that has the following important attributes (among others):

  • triplet
  • set of tags
  • time series buffer

Tags that belong to the variable are “borrowed” from the device and component it refers to. The variable can also have its own tags. These tags do not belong to either the device or the component; they are added directly to the variable’s set of tags.

The time series buffer is a list of (timestamp, value) pairs. The length of this list corresponds to the duration of the time series kept in memory for fast access. On every polling cycle NSG appends a new (timestamp, value) pair to one end of the list and, once the list is full, drops the oldest pair from the other end. As a result, the list does not grow beyond a fixed size determined by one of the configuration parameters.
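The behavior of such a bounded buffer can be sketched in Python with collections.deque. This is an illustration only, not the NSG implementation: the class name TimeSeriesBuffer and the parameter max_points are hypothetical, and how NSG derives the buffer size from its configuration is not shown here.

```python
from collections import deque


class TimeSeriesBuffer:
    """Hypothetical bounded list of (timestamp, value) pairs."""

    def __init__(self, max_points: int):
        # A deque with maxlen discards the oldest item automatically
        # when a new item is appended to a full buffer.
        self._points = deque(maxlen=max_points)

    def append(self, timestamp: float, value: float) -> None:
        """Add the newest data point; the oldest is dropped if full."""
        self._points.append((timestamp, value))

    def points(self) -> list:
        """Return the buffered pairs ordered from oldest to newest."""
        return list(self._points)
```

With max_points set to 3, appending five data points leaves only the three most recent pairs in the buffer.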

As mentioned above, monitoring variables are stored in memory for fast access and in the Time Series Database (tsdb) for long-term storage. Aggregation and alerting Python apps can work only with variables present in memory, because access to tsdb is too slow.

There are two configuration parameters at play:

monitor.memoryPoolDurationHrs
monitor.retentionHrs

memoryPoolDurationHrs determines how long data is kept in memory while new data keeps coming in; this is the maximum length of the time series buffer. As new data points arrive from the monitor, they are added to the head of the buffer, and the oldest data points are removed from the back of the buffer and discarded. The memory buffer never holds more than memoryPoolDurationHrs hours of data (and holds less if the server has just started).

Parameter retentionHrs determines the behavior of the system when data points for a given monitoring variable stop coming in. There can be many reasons for that: for example, the device has stopped responding to queries for the component the variable corresponds to, the device has been decommissioned and removed altogether, or the operator has reconfigured NSG to stop polling for the corresponding parameter. When this happens, the monitoring variable and its time series buffer are still there, but no new data points are added to the buffer and old ones are not popped from the back. The server periodically checks the timestamp of the latest data point in every variable, and if it finds that the timestamp is older than ( now - retentionHrs ), it removes the variable from memory.

The purpose of this is to drop variables that track things that don’t exist anymore.
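The periodic retention check can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the server's actual code: the function purge_stale_variables is hypothetical, variables are modeled as a plain dict that maps a triplet string to a list of (timestamp, value) pairs ordered from oldest to newest, and timestamps are assumed to be in seconds.

```python
import time

SECONDS_PER_HOUR = 3600


def purge_stale_variables(variables, retention_hrs, now=None):
    """Drop variables whose latest data point is older than the cutoff.

    `variables` is a hypothetical dict: triplet -> list of
    (timestamp, value) pairs, oldest first. Returns the dropped triplets.
    """
    if now is None:
        now = time.time()
    cutoff = now - retention_hrs * SECONDS_PER_HOUR
    # Collect stale triplets first; deleting while iterating a dict fails.
    # Variables with an empty buffer are left alone in this sketch.
    stale = [
        triplet
        for triplet, points in variables.items()
        if points and points[-1][0] < cutoff
    ]
    for triplet in stale:
        del variables[triplet]
    return stale
```

With retention_hrs set to 4, a variable whose last data point arrived five hours ago is dropped, while a variable updated a minute ago is kept.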

Looking at this the opposite way, the purpose of retention is to keep data in memory for a while even when interfaces or whole devices go down and stop responding. This way, you can still get graphs for them, even though the corresponding line in the graph will stop at some point in time. However, this works only for a limited time, which is no longer than the smaller of the values of the parameters memoryPoolDurationHrs and retentionHrs.