2.7. Time Series Database

NetSpyGlass can be configured to keep monitoring data in different types of storage. We call this storage the Time Series Database (TSDB). At this time NetSpyGlass supports the following time series databases:

  • RRD
  • Graphite
  • InfluxDb
  • Hbase

Each of these has its own properties and the choice is mostly dictated by the scale of your installation.

NetSpyGlass has an embedded time series database that is used by default if you don’t make any changes to the configuration file. This simple storage is based on RRD (Round Robin Database) and keeps monitoring data in binary files on the machine where NetSpyGlass is running. It is similar to the popular utility rrdtool, but it is not the same, and the rrd files it creates are incompatible with rrdtool. Like rrdtool, this storage automatically averages data after a certain period of time.

                                 RRD                  Graphite                 InfluxDb                    Hbase
external vs embedded             embedded             external                 external                    external
ease of install and maintenance  trivial              easy                     easy                        complex
throughput                       poor                 ok                       very fast                   very fast
redundancy                       no                   no                       yes                         yes
distributed operation            no                   no                       supported                   supported
recommended size                 up to 10k variables  up to 20k variables      tested with 500k variables  tested with 1M variables
external access to the data      not possible         built-in and ext. tools  external tools              difficult

The choice of the TSDB storage is determined by the parameter monitor.storage.protocol in the configuration file. Prototype configuration files that ship with the NetSpyGlass distribution package list basic configuration parameters for all four TSDB variants. RRD storage is turned on by default. To switch to another one, change the value of the parameter monitor.storage.protocol.

2.7.1. Configuration

TSDB configuration is located inside the configuration block monitor.storage. Multiple TSDB backends can be configured simultaneously, but only one is used at a time. Each TSDB backend is defined by a dictionary that has at least one item named type. The name of the dictionary itself can be anything; the TSDB type it describes is defined by the value of the item type inside it. The parameter protocol selects the backend to use by the name of its dictionary. The following example defines TSDB backends with names rrd_basic, hbase_cluster_0 and influxdb_server_1, of which NetSpyGlass is actually going to use rrd_basic:

monitor {
    storage {

       protocol = rrd_basic

       rrd_basic {
           type = rrd
           # other RRD configuration parameters go here
       }

       hbase_cluster_0 {
           type = hbase
           # other hbase configuration parameters go here
       }

       influxdb_server_1 {
           type = influxdb
           # other InfluxDb configuration parameters go here
       }
    }
}

Note

The built-in “base” configuration file defines TSDB backends with names that are the same as their types (rrd, graphite, influxdb and hbase) and sets rrd as the working TSDB. If this is what you need, you don’t need to override any of these parameters in your nw2.conf file. If your nw2.conf file was created from the prototype configuration file nw2.conf installed by the rpm and deb packages, then these parameters are the same as the defaults and you don’t need to change them.

2.7.2. RRD

This is the default. RRD is the simplest time series storage that requires minimal configuration and maintenance, however its performance is limited and we can only recommend it for small installations of NetSpyGlass (up to 10,000 variables or so).

2.7.2.1. Configuration

To enable RRD storage, set parameter protocol to “rrd” as shown below:

monitor {
    storage {

# ==== RRDB ==================================================================
       # We can store data in RRD (Round Robin Database) storage with
       # automatic averaging
       #
       protocol = rrd

       rrd {
           type = rrd

           # Directory where RRD archives should be located
           dir = ${home}"/rrd/"

           # configuration of RRA (Round Robin Archives). The first item
           # in this list must define the archive for the "original" data
           # points, i.e. with no averaging. Parameters "steps" and "rows"
           # are the same as in rrdtool and should be computed taking into
           # account polling interval in seconds defined above as the value of
           # the parameter pollingIntervalSec
           #

           archives = [
               # original data points, keep for 48 hours at 1 min polling interval ( 48 * 3600 / 60 = 2880)
               {
                   steps: 1,
                   rows: 2880
               },
               # 5 min averages, keep for 7 days ( 7 * 24 * 3600 / 300 = 2016 )
               {
                   steps: 5,
                   rows: 2016
               },
               # 1 hour averages, keep for 90 days
               {
                   steps: 60,
                   rows: 2160
               },
               # 1 day averages, keep for 1 year
               {
                   steps: 1440,
                   rows: 365
               },
           ]

           # flush interval, min
           flushIntervalMin = 5

           flushThreads = 2
       }
   }
}

The configuration quoted above is the default; you do not need to copy it into your own configuration file if it suits you. Just make sure you have the protocol line, and perhaps dir if you want to put the files in a different directory:

monitor {
    storage {

# ==== RRDB ==================================================================
       # We can store data in RRD (Round Robin Database) storage with
       # automatic averaging
       #
       protocol = rrd

       # Directory where RRD archives should be located
       rrd.dir = ${home}"/rrd/"
    }
}

Parameters inside of the rrd.archives block have the usual meaning, the same as in rrdtool configuration.
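The relationship between retention time, polling interval, "steps" and "rows" can be checked with a few lines of Python. This is only a sketch of the arithmetic used in the comments of the default configuration; the function name rra_rows is made up for illustration, and a 1 minute polling interval is assumed.

```python
def rra_rows(retention_sec, polling_interval_sec, steps):
    """Rows needed to keep `retention_sec` of data in an archive where
    each row consolidates `steps` polling intervals."""
    return retention_sec // (polling_interval_sec * steps)


HOUR = 3600
DAY = 24 * HOUR
POLLING_INTERVAL_SEC = 60  # assumed 1 min polling interval

# (description, retention in seconds, steps) for the four default archives
archives = [
    ("original data points, 48 hours", 48 * HOUR, 1),
    ("5 min averages, 7 days", 7 * DAY, 5),
    ("1 hour averages, 90 days", 90 * DAY, 60),
    ("1 day averages, 1 year", 365 * DAY, 1440),
]

for description, retention, steps in archives:
    rows = rra_rows(retention, POLLING_INTERVAL_SEC, steps)
    print("%-32s steps=%-5d rows=%d" % (description, steps, rows))
```

If you change the polling interval, recompute both "steps" and "rows" the same way so that each archive still covers the intended period.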

Parameters flushThreads and flushIntervalMin define how NetSpyGlass flushes accumulated monitoring data to RRD files on disk. Doing this at the end of each monitoring cycle leads to performance degradation because opening and closing several thousand files can be expensive. Instead, the server keeps a certain amount of data in memory and updates RRD files on disk at the interval defined by parameter flushIntervalMin (in minutes). This operation can also be done in parallel if parameter flushThreads has a value greater than 1.

2.7.3. Graphite

We inject data into Graphite via Carbon plain-text injector port and read it back via web API.

2.7.3.1. Configuration

Set parameter protocol to “graphite” as shown below:

monitor {
   storage {
# ==== Graphite ==============================================================
#
       protocol = graphite

       graphite {

           type = graphite

           collector = "carbon_line_collector.domain.com"
           # Carbon plain text interface port
           carbonPort = 2003

           server = "graphite_server.domain.com"
           webPort = 9000

           nameSpace = "netspyglass."${network.name}

           # data upload is going to be spread over this interval of time, defined as fraction of polling interval
           uploadSpreadTime = 0.5

       }
   }
}

Note that parameter carbonPort should refer to the plain text injector port (rather than pickled port).
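For reference, the Carbon plain text protocol expects one metric per line in the form "path value timestamp", terminated by a newline. The following Python sketch formats such a line and shows how it could be sent; it can be handy for checking that the port in carbonPort accepts plain text data. The function names, the metric path and the host name are hypothetical, and this is not how NetSpyGlass itself is implemented.

```python
import socket
import time


def carbon_line(path, value, timestamp=None):
    """Format one metric in Carbon plain text form: 'path value timestamp\\n'."""
    ts = int(time.time()) if timestamp is None else int(timestamp)
    return "%s %s %d\n" % (path, value, ts)


def send_to_carbon(host, port, lines):
    """Send already formatted lines to the Carbon plain text port."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall("".join(lines).encode("ascii"))


# Hypothetical metric path, used only for illustration
line = carbon_line("netspyglass.mynet.test.metric", 42.5, 1452388080)
print(repr(line))

# To actually send it (requires a reachable Carbon server):
# send_to_carbon("carbon_line_collector.domain.com", 2003, [line])
```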

Parameter monitor.storage.graphite.nameSpace defines the top level name space for Graphite variable paths. By default path starts with “netspyglass” and the next component of the path is the network name defined by the parameter network.name.

Parameter monitor.storage.graphite.uploadSpreadTime controls the time interval over which the Graphite connector spreads data upload. The interval is defined as a fraction of the polling interval, and the value of this parameter is a floating-point number between 0 and 1.

Note

Since the network name is used as part of the Graphite path, it should comply with Graphite requirements for path components. That is, it should not include dots or other special symbols.

2.7.3.2. Schema

Graphite path is constructed from the following components:

<namespace>.<networkName>.<variableCategory>.<varName>.<deviceName>.<componentName>

where namespace is “happygears”

2.7.3.3. Variable, device and interface names

In Graphite, the metric path is used both as an internal id (it is part of the corresponding whisper database file path on the filesystem) and as the name visible in the Graphite web UI. This means it should be both user-friendly AND acceptable as part of a filesystem path. This rules out certain characters, such as “/”, that can’t be part of the variable, device or interface name. NetSpyGlass replaces “/” with an underscore if it appears in one of these names.

Device names often include a “.”, which is used as the component separator in the Graphite path. This character is also replaced with “_” when we generate the Graphite path for the given variable.

Other characters that cause problems with Graphite, because they are used either as part of the Whisper database file path or as part of the data returned by the web UI in the “raw” format, are “|”, “,” and “ ” (space). These are also replaced with an underscore “_” at the cost of making metric names less readable in the Graphite UI.
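The substitutions described in this section can be sketched in Python as follows. This only illustrates the rule as documented here (the five listed characters become an underscore); the function name is hypothetical and NetSpyGlass may apply additional rules that are not covered by this sketch.

```python
import re

# Characters named in this section: "/", ".", "|", "," and space
_UNSAFE_CHARS = re.compile(r"[/.|, ]")


def graphite_component(name):
    """Replace characters that are unsafe in a Graphite path component with '_'."""
    return _UNSAFE_CHARS.sub("_", name)


for name in ["Gi0/10", "c3560g-1.example.com", "Port 1,2|uplink"]:
    print("%-24s -> %s" % (name, graphite_component(name)))
```

Note that the mapping is not reversible: “Gi0/10” and “Gi0.10” both become “Gi0_10”, so distinct names can collide after the substitution.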

2.7.4. InfluxDb

See http://influxdb.com/

2.7.4.1. Versions

NetSpyGlass works with InfluxDb versions 0.8, 0.9 and 0.10. InfluxDb changed its internal schema and API when it switched from 0.8 to 0.9. To accommodate this, NetSpyGlass first tries to connect with the v0.9 API (this works for both 0.9 and 0.10) and, if that attempt fails, falls back to the v0.8 API.

NetSpyGlass does not require any configuration change when the user switches from InfluxDb v0.8 to v0.9 or 0.10; however, data migration is the responsibility of the user.

The InfluxDb team publishes the current state of their work on the upgrade path from 0.8 to 0.9 here: https://influxdb.com/docs/v0.9/administration/upgrading.html and here: https://github.com/influxdb/influxdb/pull/3477

2.7.4.2. Configuration

Set parameter protocol to “influxdb” as shown below:

monitor {
    storage {

# ==== InfluxDb ==============================================================
#
       protocol = influxdb

       influxdb {
            type = influxdb

            url = "http://127.0.0.1:8086"
            user = "root"
            password = "root"
            database = "netspyglass_"${network.name}

            # list of tag facets that should not be sent to Influxdb. There are usually
            # many tags in these facets and, since we send each observation multiple times
            # to pick up all combinations of tags from all facets, including these greatly
            # increases amount of data sent to influxDb. Typically, tags in these
            # facets are never used to build graphs and dashboards with tools such
            # as Grafana, so these tags can be skipped to reduce amount of redundant
            # data sent to the database
            dropTags = ["ifRole", "SupportedMIBs"]
       }
    }
}

2.7.4.3. Schema

Monitoring variables in NetSpyGlass are referred to by “triplet” (see How Monitoring Variables are Stored):

cpuUtil.5.32

Triplet consists of the variable name “cpuUtil”, device id “5” and component (interface or other hardware component) index “32”.

When data is sent to InfluxDb v0.9, the following schema is used:

  • measurement name is variable name (“cpuUtil”)
  • each point has the following tags:
    • “device” : value of device id, as a string (“5” in this example)
    • “index” : value of the index part of the triplet as a string (“32” in this example)
    • “deviceName” : device name
    • “component” : component name (interface name if variable relates to a network interface)
    • all tags from the monitoring variable (see below)
  • field name is fixed string “value” for all measurements
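To make the mapping concrete, here is a small Python sketch that turns a triplet and an observation into a dictionary with the measurement, tags and field described in the list above. The dictionary layout and the function name are illustrative assumptions, not the wire format NetSpyGlass uses, and the device and component names are made up.

```python
def triplet_to_point(triplet, value, device_name, component_name, var_tags=None):
    """Build an illustrative point structure from a 'name.deviceId.index' triplet."""
    # Split from the right so that only the last two dots are treated as separators
    var_name, device_id, index = triplet.rsplit(".", 2)
    tags = {
        "device": device_id,        # device id, as a string
        "index": index,             # component index, as a string
        "deviceName": device_name,
        "component": component_name,
    }
    tags.update(var_tags or {})     # tags copied from the monitoring variable
    return {
        "measurement": var_name,    # measurement name is the variable name
        "tags": tags,
        "fields": {"value": value}, # field name is always "value"
    }


point = triplet_to_point("cpuUtil.5.32", 12.5, "router1", "RE0", {"Role.Router": "1"})
print(point)
```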

NetSpyGlass uses the database with the name defined by the configuration parameter monitor.storage.influxdb.database (default value is “netspyglass_”${network.name}). The InfluxDb connector tries to create this database when it connects for the first time, so the user account used to connect to InfluxDb must have sufficient permissions to do this.

Note

Previously the default database name was “netspyglass-“${network.name} (with a dash “-”), but InfluxDb v0.9 does not allow “-” in the database name, so we had to change the default to use an underscore. Check your database name if you migrate from InfluxDb v0.8 to v0.9.

2.7.4.4. Tags

When NetSpyGlass sends observations to InfluxDb, it attaches a set of tags to each data point. Information about the device and component is passed in the form of tags “device”, “deviceName”, “component” and “index” (see above). In addition to this, all tags that belong to the monitoring variable saved to the database are added as well. See Tags for more information on tags in NetSpyGlass.

In NetSpyGlass all tags are organized in groups, or facets. Tags are originally assigned to devices, interfaces and other components and are copied into monitoring variables. There can be multiple tags in some facets; for instance, a device or an interface can have multiple roles. For example, monitoring variable ifInRate that tracks interface inbound traffic for device “ex2200”, interface “ae0”, can have the following set of tags:

Role.Switch, Role.Router
ifAdminStatus.Up
ifRole.Aggregator, ifRole.BroadcastTypeInterface, ifRole.PhysicalPort, ifRole.UntaggedSwtichPort
ifSpeed.2G
Aggregator.ae0
Link.c3560g-1:Po1

I have grouped these tags by facet to emphasize that there are multiple tags in some facets.

InfluxDb treats tags as database columns, and each data point can have one value in each column. InfluxDb is “schema-less” when it comes to tags; this means it picks up tags and creates “columns” automatically when data is sent to the database. Each data point represents a row in the database, while the timestamp, value and all tags represent columns. The InfluxDb query language provides operators to match by tags. It looks like this:

> SELECT time,value,Role,Vendor FROM ifInRate WHERE deviceName = 'ex2200' AND component = 'ae0' AND time > 1452388050000000000

Here we match tag deviceName with value ex2200 and tag component with value ae0. The ability to put multiple “values” in the same tag facet in NetSpyGlass presents a problem because this is not possible in InfluxDb. If a variable has tags Role.Switch and Role.Router (as in the example above), we cannot send these to InfluxDb as tag Role with two different values because InfluxDb will only accept one of them. To work around this difference, NetSpyGlass sends each full tag string, such as Role.Switch, as a separate InfluxDb tag with value “1”. Therefore observations from the time series of the variable in the example above are sent with the following combination of tags:

Role.Switch = 1
Role.Router = 1
ifAdminStatus.Up = 1
ifRole.Aggregator = 1
ifRole.BroadcastTypeInterface = 1
ifRole.PhysicalPort = 1
ifRole.UntaggedSwtichPort = 1
ifSpeed.2G = 1
Aggregator.ae0 = 1
Link.c3560g-1:Po1 = 1
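The conversion shown above is simple enough to express in one line of Python. The sketch below is only an illustration of the idea (the function name is hypothetical): each full NetSpyGlass tag string becomes its own InfluxDb tag key with the constant value “1”, so several tags from the same facet can coexist on one data point.

```python
def to_influx_tags(nsg_tags):
    """Map each 'Facet.Word' tag string to its own tag key with value '1'."""
    return {tag: "1" for tag in nsg_tags}


nsg_tags = ["Role.Switch", "Role.Router", "ifAdminStatus.Up", "Aggregator.ae0"]
for key, value in to_influx_tags(nsg_tags).items():
    print("%s = %s" % (key, value))
```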

NetSpyGlass uses tags extensively to pass meta-data about devices and components, which means each monitoring variable may have dozens of tags. To avoid flooding InfluxDb with data that is not going to be useful, NetSpyGlass offers configuration parameters acceptTags and dropTags (both are part of the dictionary that describes the InfluxDb TSDB connection, see above). These parameters can be used to control which tags should be passed to the database.

Only tags that match items in the list acceptTags will be sent to the database. Items in this list can be either tag facet names (e.g. “Explicit”), full tag strings (e.g. “Explicit.Transit”) or the string “*”. If any item in this list is “*”, then all tags are sent to the database (but some may be suppressed if parameter dropTags is also configured). If the list acceptTags is empty, no tags are sent at all.

Filters acceptTags and dropTags are applied in order. NetSpyGlass takes set of tags from the monitoring variable it is about to send to InfluxDb and applies filter defined by acceptTags, so that only tags that match items in this list remain. Then, it applies filter dropTags by removing tags that match it. Only tags that remain are sent to the database.
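The two-stage filtering can be sketched in Python like this. The sketch follows the rules as described in this section (facet name, full tag string, or “*” in acceptTags only); the function names are hypothetical and the real matching code in NetSpyGlass may differ in details.

```python
def matches(tag, items, allow_wildcard=False):
    """True if `tag` matches an item given as facet name, full tag string or '*'."""
    facet = tag.split(".", 1)[0]
    for item in items:
        if item == tag or item == facet:
            return True
        if allow_wildcard and item == "*":
            return True
    return False


def filter_tags(tags, accept_tags, drop_tags):
    """Apply acceptTags first, then remove whatever matches dropTags."""
    accepted = [t for t in tags if matches(t, accept_tags, allow_wildcard=True)]
    return [t for t in accepted if not matches(t, drop_tags)]


tags = ["Role.Switch", "Role.Router", "Role.Cluster",
        "ifRole.PhysicalPort", "ifSpeed.2G", "Link.c3560g-1:Po1"]
accept = ["Explicit", "Role", "Link"]
drop = ["ifRole", "Role.Switch", "Role.Router"]

print(filter_tags(tags, accept, drop))
```

In this example the standard roles are dropped, while Role.Cluster and the Link tag pass through because they match acceptTags and are not listed in dropTags.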

The value of each parameter is a list of strings where each string is the name of a tag facet or a full tag string. Items in acceptTags select the tags that may be sent to InfluxDb, while items in dropTags name the tags that should NOT be sent. The default for acceptTags is ["Explicit", "Role"]. The default value of parameter dropTags is a list of all standard roles defined in NetSpyGlass, which allows the user to add their own roles and pass them to InfluxDb. It is easy to change these defaults: just add the corresponding parameter to the configuration of the influxdb connector in your nw2.conf file. Here is what this looks like (I have added tag facet “Link” to acceptTags and removed some specific tags from dropTags to let them pass to the database):

influxdb {
    type = influxdb
    url = "http://127.0.0.1:8086"
    database = "netspyglass_"${network.name}
    user = "root"
    password = "root"

    acceptTags = [ "Explicit", "Role", "Link" ]

    dropTags = [
        "ifRole",
        "SupportedMIBs",
        "Role.Unknown",
        "Role.SimulatedNode",
        "Role.SimulatedBridge",
        "Role.Router",
        "Role.Switch",
        "Role.Server",
        "Role.LoadBalancer",
        "Role.WirelessClient",
        "Role.WirelessAP",
        "Role.Firewall",
        "Role.eBgpPeer",
        "Role.iBgpPeer",
        "Role.VMServer",
        "Role.VM",
        "Role.VirtualSwitch",
        "Role.EphemeralNode",
#        "Role.NetSpyGlassServer",
#        "Role.Cluster",
        "Role.PDU",
        "Role.VPNConcentrator"
    ]

}

The rationale behind suppressing these particular facets and tags is that they are usually not used to build graphs and dashboards with Grafana.

2.7.4.5. Queries using tags

Here are a few examples of InfluxDb queries that match or group by tags.

Match by deviceName and component (device name and component name) and display time and value:

> SELECT time,value FROM ifInRate WHERE deviceName = 'ex2200' AND component = 'ae0' AND time > 1452388050000000000
name: ifInRate
--------------
time                    value
1452388080000000000     14349.066666666668

Use GROUP BY with tags (I am using SLIMIT to limit output to only two series in this example):

> SELECT time,value FROM ifInRate WHERE time = 1452388050000000000 GROUP BY deviceName, component SLIMIT 2
name: ifInRate
tags: component=Eth0, deviceName=printer1
time                    value
----                    -----
1452388050000000000     4225.066666666667


name: ifInRate
tags: component=Gi0/10, deviceName=c3560g-1
time                    value
----                    -----
1452388050000000000     434.5333333333333

Queries used to build graphs in Grafana look the same but are built using their interactive UI editor.

2.7.4.6. Retention Policies

A retention policy in InfluxDb is a configuration of the database that describes how long time series data is going to be held in the database, as well as the number of replicas to be created in a cluster configuration. Each database can have multiple retention policies, and each policy is referred to by its name. One policy can be set to be the “default”, which means InfluxDb API queries executed without explicit mention of a retention policy by name will use the default one. When a new database is created, it gets a retention policy with name “default” configured to hold data indefinitely.

NetSpyGlass uses the default retention policy when it writes to and reads from the InfluxDb database. You can change the policy by adding a new one and marking it as the default, or by changing the duration of the policy with name “default”.

Retention policies can be manipulated using InfluxDb query language via InfluxDb web admin console or their cli. See InfluxDb documentation for more information and the syntax of the queries used to create, alter and drop retention policies: https://influxdb.com/docs/v0.9/query_language/database_management.html#retention-policy-management

2.7.5. hbase

As of 01/2015 we have tested with hbase 0.94.15 and 0.96.0

2.7.5.1. Configuration

To configure NetSpyGlass, set parameter protocol to “hbase” as shown below:

monitor {
    storage {

# ==== HBASE =================================================================
        protocol = hbase

        hbase {

            type = hbase

            # If multiple ZooKeeper instances make up your ZooKeeper ensemble,
            # they may be specified in a comma-separated list .
            # If parameter zookeeperAddress is missing, we assume "localhost"
            zookeeperAddress = "zookeeper_host.domain.com"
            # This is the value of HBase configuration parameter `zookeeper.znode.parent`.
            # The default is "/hbase". Change this only if you know it is different in your
            # hbase configuration.
            zookeeperPath = "/hbase"
            timeoutMs = 2000
            table = "netspyglass-"${network.name}
        }
    }
}

and comment out the parameter protocol in the other similar blocks inside the monitor.storage block.

NetSpyGlass stores monitoring data in hbase table identified by the value of the configuration parameter monitor.storage.hbase.table (default value is “netspyglass-“${network.name} as shown in the example above), column family “d”.

Note

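The table and the column family must exist before the program connects to the hbase cluster. Use the following command in hbase shell to create the table (the example uses table name nw2mon; substitute the value of your parameter monitor.storage.hbase.table):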

hbase(main):002:0> create 'nw2mon', {NAME => 'd'}
0 row(s) in 1.4080 seconds

=> Hbase::Table - nw2mon
hbase(main):003:0>

2.7.5.2. Schema

Monitoring variables have a name and several other attributes, such as the name of the device, interface ifIndex and a set of tags. Each monitoring variable has an array of (timestamp, value) pairs we call “Observations”. To store data in hbase, we group observations by their timestamp into blocks that correspond to 2 hours of data. Observations that belong to one block are stored in one row.

Row key is built from the device ID (numeric), variable name, component index and reverse timestamp of the start time of the block the observations belong to. Block start time is calculated using a simple formula:

block_time_start = int(timestamp / block_time_length)

That is, each row holds observations that belong to a time range that starts on a 2 hour boundary, e.g. 00:00:00, 02:00:00, 04:00:00 and so on.

We use column family “d”. Column qualifiers are computed using formula:

column_qualifier = timestamp % block_time_length
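The two formulas can be tried out with a short Python sketch. It assumes timestamps in seconds and a block length of 2 hours (7200 seconds); this section does not state which units NetSpyGlass uses internally, and the function names are made up for illustration.

```python
BLOCK_TIME_LENGTH = 2 * 3600  # 2 hours, in seconds (assumed unit)


def block_time_start(timestamp):
    """Block number per the formula above: int(timestamp / block_time_length)."""
    return timestamp // BLOCK_TIME_LENGTH


def column_qualifier(timestamp):
    """Offset of the observation inside its block."""
    return timestamp % BLOCK_TIME_LENGTH


ts = 1452388080  # an example Unix timestamp, in seconds
block = block_time_start(ts)
qualifier = column_qualifier(ts)
print("block:", block, "qualifier:", qualifier)

# The original timestamp can be restored from the two values
assert block * BLOCK_TIME_LENGTH + qualifier == ts
```

All observations taken within the same 2 hour window share the same block value and differ only in the column qualifier, which is why they end up in one row.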

Since the row key is built using device id, variable name and component index, all measurements in the row represent one variable. For example, this can be inbound interface utilization for one interface of one device.

The size of the row depends on polling interval:

Polling interval    Row size (observations)
30 sec              240
1 min               120
5 min               24
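The numbers in this table follow directly from the 2 hour block length. A minimal Python check, assuming the block length and the polling interval are both expressed in seconds (the function name is hypothetical):

```python
BLOCK_SEC = 2 * 3600  # one row covers 2 hours of data


def observations_per_row(polling_interval_sec, block_sec=BLOCK_SEC):
    """Number of observations that fit into one row at the given polling interval."""
    return block_sec // polling_interval_sec


for label, interval in [("30 sec", 30), ("1 min", 60), ("5 min", 300)]:
    print("%-8s %d" % (label, observations_per_row(interval)))
```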

NetSpyGlass can retrieve data from HBase even if polling interval changes.

Example of row key:

13-ifOutErrorRate-16-9223372036854309705

This key is built for the device with id=13, variable ‘ifOutErrorRate’ and component index 16. The reverse time stamp of the first observation in this row is 9223372036854309705 and was computed as follows:

MAX_LONG - timestamp_ms = 9223372036854309705
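The arithmetic can be checked in a few lines of Python. MAX_LONG is taken here to be the largest signed 64-bit integer (Long.MAX_VALUE in Java), and the key layout "device-variable-index-reverseTimestamp" is inferred from the single example above. The helper names are hypothetical, and the sketch says nothing about the units of the time value NetSpyGlass subtracts.

```python
MAX_LONG = 2**63 - 1  # 9223372036854775807, largest signed 64-bit integer


def reverse_timestamp(timestamp):
    """Reverse timestamp: larger (newer) times produce smaller numbers."""
    return MAX_LONG - timestamp


def build_row_key(device_id, var_name, component_index, timestamp):
    """Assemble a key in the layout of the example above (inferred, not authoritative)."""
    return "%d-%s-%d-%d" % (device_id, var_name, component_index,
                            reverse_timestamp(timestamp))


# Recover the time value that was subtracted in the example key
example_reverse = 9223372036854309705
original = MAX_LONG - example_reverse
print("value subtracted in the example:", original)

# Rebuilding the key from that value reproduces the example
print(build_row_key(13, "ifOutErrorRate", 16, original))
```

Because the reverse timestamp decreases as time advances, rows with newer blocks sort before older ones for the same device, variable and component.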