1.53. Release Notes 0.95

1.53.1. Highlights

  • Graphing Workbench: this is a major new addition to the UI. Graphing Workbench is a tool that allows you to browse and graph monitoring variables. See below for more details.
  • This version adds support for the discovery of Arista devices.
  • This version also adds support for the discovery of interface bundles (802.3ad aggregate links and aggregation ports), along with new monitoring variables that track the LACP state bits of aggregation ports and the percentage of aggregation ports in fully forwarding state for each aggregation interface.
  • A new tag is added to interfaces after discovery; it carries the “device:interface” pair of the device and interface on the other side of the network link terminating on the interface. This tag belongs to the tag facet “Link”.
  • The rule that decides which interfaces we monitor has changed: by default, only interfaces that appear in network maps are monitored. This helps reduce polling load on devices.
  • The Python script that processes collected monitoring data has a better structure. The default code is placed in the Python class Nw2Rules; users can create a new class, derived from Nw2Rules, to add their own rules.
  • There is a new way to run the application: you can now use the shell script start.sh, which starts both the UI backend and the monitor. The UI backend starts and “manages” the monitor; if you kill the UI backend with SIGINT or SIGTERM, it gracefully shuts down the monitor as well. This makes it easier to run NetSpyGlass under supervisor scripts.

1.53.2. New features and improvements in this release

1.53.3. UI

  • Graphing Workbench: this is a major new addition to the UI. Graphing Workbench is a tool that allows you to browse and graph monitoring variables. You can filter by device name, interface or hardware component name, or tags. Once a filter has been applied, the panel shows all matching monitoring variables. Clicking a variable in the table expands the corresponding row and shows a mini-graph of its time series along with its tags. Click the [+] icon to add the variable to the graph. You can add any number of variables to the graph, even if they refer to different devices; you can graph any combination of variables together (as long as it makes sense to you). To open Graphing Workbench, click the “gear” icon in the upper right corner of the UI on any page.
  • The colored badge drawn over the device icon (in the lower right corner) in the device details panel gets its color from the chassisAlarms monitoring variables. Its color is grey if the device has no alarms, yellow if it has one or more minor alarms, and red if it has one or more major alarms. This works for devices where we can discover and monitor chassis alarms (currently only Cisco and Juniper). If we have no information about chassis alarms for the device, the badge is not drawn at all.
  • Users can filter interfaces by their operational status in the device details panel using the tags ifOperStatus.Up and ifOperStatus.Down.
  • Network links in operational state “down” are now drawn in a distinct color on maps. The color is defined as the item with ColorLevel index “100” in the dictionary “monitor.display.colors” in the configuration file. The default color is black.

1.53.4. Monitoring and Processing of Collected Data

  • The Python script has a better structure: the code lives in the class Nw2Rules, which takes the input monitoring variables via a constructor parameter. Users can create their own class, derived from Nw2Rules, to add more rules or change the defaults. See the documentation file rules.md for more information.

  • The standard rules file nw2rules.py has been cleaned up to conform to PEP 8. Note that monitoring variable names, which appear in the UI and configuration files, follow a different naming convention that stems from the naming convention of the standard SNMP MIBs. These names also appear in the Python module nw2rules.py and do not follow PEP 8.

  • Users can set tags in any facet via Python rules. The default Python script comes with a rule that adds tags to reflect interface operational status: interfaces with status “up” get the tag “ifOperStatus.Up”, while those with status “down” get “ifOperStatus.Down”. These tags propagate to the UI and appear in the tag filter on the device details page, making it possible to filter interfaces by operational status. Since the tags are reset every time the monitor runs the Python script, they follow the actual status of the interface learned via SNMP polling. These tags are “transient”, that is, they are not saved to the database and will be lost and reset when the monitor restarts. Here is what the Python code that does this looks like:

    class Nw2Rules(object):

        def setInterfaceOpStatusTag(self, var):
            """
            Set a tag to reflect the interface oper status. Interfaces with
            status "up" get the tag "ifOperStatus.Up", those with status
            "down" get "ifOperStatus.Down". If the variable does not have
            enough time series data, skip it and do nothing.

            :param var: MonitoringVariable instance
            """
            if len(var.timeseries) > 1:
                if self.isInterfaceUp(var):
                    var.tags.add("ifOperStatus.Up")
                else:
                    var.tags.add("ifOperStatus.Down")

        def execute(self):
            self.log.info("===  Set Tags")
            # apply the rule to every ifOperStatus variable
            for var in self.vars['ifOperStatus']:
                self.setInterfaceOpStatusTag(var)
    
  • Users can also set tags in the facet “ColorLevel” to assign a specific color to links on network maps and to variable values on the device details page (the latter appear as colored “dots” next to the variable values in the table cells). The default rules use this facility to make network links in state “down” appear black on maps.
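
For illustration, here is a sketch of a user-defined class derived from Nw2Rules that adds such a rule. This is not code shipped with the product: it assumes Nw2Rules can be imported from the module nw2rules, reuses the isInterfaceUp() helper seen in the snippet above, and assumes that a tag of the form “ColorLevel.100” selects the color with index “100” from “monitor.display.colors”:

    from nw2rules import Nw2Rules


    class MyRules(Nw2Rules):

        def colorDownLinks(self, var):
            # Sketch only: "ColorLevel.100" is an assumed tag name that is
            # meant to match the ColorLevel index "100" used for links in
            # operational state "down".
            if len(var.timeseries) > 1 and not self.isInterfaceUp(var):
                var.tags.add("ColorLevel.100")

        def execute(self):
            # run the standard rules first, then apply the extra rule
            super(MyRules, self).execute()
            for var in self.vars['ifOperStatus']:
                self.colorDownLinks(var)

Calling the parent execute() first keeps the standard rules (and the ifOperStatus.* tags they set) in place before the extra rule runs.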

1.53.5. Core Technology

  • Switched to the asynchronous HBase client ( https://github.com/OpenTSDB/asynchbase ). This client has much better performance and can recover from ZooKeeper restarts. This does not affect operation of the RRD monitoring storage backend. The client has been tested with HBase 0.94.6, 0.94.15 and 0.96.0.

    IMPORTANT: the HBase table must be created manually. Use the following hbase shell command:

    create "nw2mon", "d"
    
  • This version implements a new way to run the application: you can now use the shell script start.sh, which starts both the UI backend and the monitor. The UI backend starts and “manages” the monitor; if you kill the UI backend with SIGINT or SIGTERM, it gracefully shuts down the monitor as well. This makes it easier to run NetSpyGlass under supervisor scripts. The ability to run the UI backend as a daemon is coming in a future release.

1.53.6. Improvements in Device Discovery and Monitoring

  • This version implements discovery and monitoring for Arista devices. This includes discovery of interfaces, the switch forwarding database, VLANs, network topology (using FDB, LLDP and ARP data), hardware components (CPU and temperature sensors), and optical transceivers.
  • This version implements discovery of aggregation links (port bundles, per the 802.3ad protocol) using IEEE8023-LAG-MIB, LLDP-EXT-DOT3-MIB (part of the LLDP protocol) and, for some older devices, the ifStack table of the IF-MIB.
  • Information about aggregation ports and aggregate interfaces is used to set up monitoring: the program automatically starts monitoring the LACP state bits of aggregation ports and computes the percentage of aggregation ports in the fully forwarding state for each aggregation interface. The user has access to new monitoring variables (a simple sketch of the ‘portAggregatorBandwidth’ computation appears at the end of this section):
    • ‘portAggregatorBandwidth’ : percentage of aggregation ports that have both LACP state bits “collecting” and “distributing” set (range 0% - 100%)
    • ‘lacpStateCollecting’ : this variable has value “1” if the bit “collecting” is set, or “0” otherwise
    • ‘lacpStateDistributing’ : this variable has value “1” if the bit “distributing” is set, or “0” otherwise
  • Added discovery and monitoring of the 1-minute average of CPU core utilization for servers running Net-SNMP.
  • The system assigns interface tags in the facet “Link” after discovery. These tags carry the “device:interface” pair of the device and interface on the other side of the network link attached to the interface.
  • Tags in the facet “Link” are used by the default selection rules to pick interfaces to monitor. This greatly reduces the number of interfaces we monitor by default, which in turn helps reduce the load on the device CPU. The user can change this by overriding the function “select” in the class Nw2Rules. The default function monitors every interface that satisfies the following requirements:
    • is in admin state “Up”
    • is not a loopback or out-of-band management interface
    • is not an “internal” interface; internal interfaces are those used by the device for all sorts of internal communications
    • is not a “simulated” interface; “simulated” interfaces are those added by NetSpyGlass when it creates objects to represent devices that do not speak SNMP but for which the system knows IP and MAC addresses
    • is connected to another device that appears in maps, or is an aggregation port (i.e. is part of an Ethernet bundle).

To check whether an interface is connected somewhere, the program looks for a tag in the facet “Link”. These tags look like “Link.router1:Te1/2” or “Link.10.10.12.100:Port23” (if we do not know the device name).

To check whether an interface is an aggregation port, the program looks for the tag ifRole.AggregationPort.

You can find the actual code of the selector function in the class Nw2Rules in the Python module nw2rules.py.
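
For illustration only, a selection check along these lines might look roughly like this. It is a sketch, not the shipping code: the real function in nw2rules.py may have a different signature and use different helpers, and every tag name here other than the “Link” facet prefix and ifRole.AggregationPort (for example “ifAdminStatus.Up” and the other ifRole.* names) is an assumption:

    # Sketch only: assumes each interface is represented by a set of
    # "Facet.Value" tag strings; all ifAdminStatus.* and ifRole.* names
    # except ifRole.AggregationPort are hypothetical.
    EXCLUDED_ROLES = (
        "ifRole.Loopback",        # loopback interfaces
        "ifRole.Management",      # out-of-band management interfaces
        "ifRole.Internal",        # device-internal interfaces
        "ifRole.Simulated",       # objects created for non-SNMP devices
    )


    def select(intf_tags):
        """Return True if an interface with these tags should be monitored."""
        if "ifAdminStatus.Up" not in intf_tags:              # assumed tag name
            return False
        if any(role in intf_tags for role in EXCLUDED_ROLES):
            return False
        # connected to a device that appears in maps, e.g. "Link.router1:Te1/2"
        connected = any(t.startswith("Link.") for t in intf_tags)
        aggregation_port = "ifRole.AggregationPort" in intf_tags
        return connected or aggregation_port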

  • This version implements discovery and monitoring of hardware components and sensors for HP ProCurve switches.
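
Here is the sketch of the ‘portAggregatorBandwidth’ computation mentioned earlier in this section. It is not product code; it merely restates the definition of the variable, i.e. the percentage of aggregation ports that have both the “collecting” and “distributing” LACP bits set:

    def port_aggregator_bandwidth(collecting, distributing):
        """
        collecting / distributing: lists of 0/1 values, one per aggregation
        port of a bundle (lacpStateCollecting / lacpStateDistributing).
        Returns the percentage of ports that are fully forwarding.
        """
        ports = list(zip(collecting, distributing))
        if not ports:
            return 0.0
        forwarding = sum(1 for c, d in ports if c and d)
        return 100.0 * forwarding / len(ports)


    # Example: a 4-port bundle where one port is not yet distributing -> 75.0
    print(port_aggregator_bandwidth([1, 1, 1, 1], [1, 1, 1, 0]))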

1.53.7. Notable bugs fixed in this release

  • Fixed a bug that prevented links on maps from being colored according to the threshold configuration when the range for the in- and outbound interface utilization variables was configured as “auto” (the default since 0.941).