1.38. Release Notes 0.98

1.38.1. Important changes in this release

  • Because of the change in the database schema, device and interface tags need to be regenerated after the upgrade. After upgrading, open the UI, make sure view “all” loads correctly (note that other views might not load at this time) and start discovery by clicking the “Discover Now” button.
  • Startup scripts start.sh and ui.sh have been deprecated and replaced with script netspyglass.sh. Use this script to start the UI backend and the monitor detached from the shell session (as “daemons”).
  • Device definitions in the configuration file can use either an IP address or a host name. If a host name is used, it is resolved to an address via a DNS lookup. The name entered in the config is used everywhere in the UI and the monitor, even if it does not match the host name configured on the device itself.
  • This version introduces support for SSL in the UI backend server and the ability to authenticate users. User accounts are configured in the local file /etc/users.properties (support for LDAP and Radius is coming in the future). See documentation files ssl.md and authentication.md for more details.
  • This version introduces a reporting framework. See document doc/reports.md for more details.

1.38.2. Improvements in this release

1.38.2.1. Startup scripts

  • The administrator can now use script netspyglass.sh (provided in the release tar archive) to launch the NetSpyGlass UI server as a “daemon”. The script relies on the command line tool jsvc, which needs to be installed as an external dependency. Use package “jsvc” on Ubuntu and “jakarta-commons-daemon-jsvc” on RedHat and CentOS Linux. Script netspyglass.sh accepts one of the following commands as a single command line parameter:

    • start starts both UI backend and the monitor and exits
    • stop stops both daemons
    • status checks if the system is running and prints the UI backend server process id

    Note that if you start NetSpyGlass using this method, you are going to see three processes: two running Java class Nw2UI and one running Nw2Monitor. One of the Nw2UI processes owns and controls the other. The pid reported by the “netspyglass.sh status” command corresponds to the child (controlled) process. If you kill the child process using this pid, the owner process restarts it automatically.

    You can start NetSpyGlass without detaching (similar to how the old script start.sh operated) if you add option “--nodetach”, like so:

    netspyglass.sh start --nodetach
    

    See documentation file Installation_and_configuration.md for more information about startup scripts netspyglass.sh and monitor.sh.

1.38.2.2. Tags

  • Starting with this version, we add tags to variables that track OSPF and BGP protocol counters. These tags reflect the BGP peer AS number and the BGP “role”, which can be iBGP or eBGP. These tags appear in the Device Details page and in the Graphing Workbench.

1.38.2.3. Network Discovery and Monitoring

  • Improved performance of network discovery.

  • This release introduces discovery and monitoring of IGP metrics: ISIS circuit metrics and OSPF interface metrics. NetSpyGlass polls the corresponding OIDs and displays the values it receives in the Device Details panel (tab “IGP Metrics”) and in the Graphing Workbench, under category “IGP Metrics”.

  • Collected values of IGP metrics also appear in maps. Variable names are: isisCircLevel1Metrics, isisCircLevel2Metrics, ospfIfMetricValue. Caveat: in maps, devices are connected via their physical ports. For Juniper, this means “parent” interfaces such as “xe-0/0/0”. However, IGP metrics are reported for subinterfaces, such as “xe-0/0/0.0”. In order to show IGP metrics in maps, NetSpyGlass copies the value from subinterfaces to the corresponding parent interfaces. This leads to the following effects:

    • Both the subinterface and its parent interface appear in the Device Details panel, tab “IGP Metrics”, even though only the subinterface actually has the metric.
    • If a physical interface has two subinterfaces (for example, because it is a member of two vlans) and these subinterfaces have different IGP metrics, only one of the two will appear in the map. Which one of the two will be visible is undefined.
    • In the case of Layer 3 switches where the IGP protocol runs over vlan interfaces and their subinterfaces, IGP metrics do not appear in maps. This is because the L2 topology is built with their physical ports (e.g. GigabitEthernet0/1 for Cisco), but IGP runs over their vlan interfaces.
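
The copying behavior described above can be illustrated with a minimal sketch. This is not the NetSpyGlass implementation: the function names, the dictionary layout and the rule "strip the numeric unit after the last dot" are assumptions made for illustration, based on the Juniper naming example in the text.

```python
def parent_name(ifname):
    """Return the parent name of a subinterface such as "xe-0/0/0.0", else None."""
    base, dot, unit = ifname.rpartition(".")
    return base if dot and unit.isdigit() else None


def copy_metrics_to_parents(metrics):
    """Return a copy of metrics with each subinterface value also set on its parent."""
    result = dict(metrics)
    for ifname, value in metrics.items():
        parent = parent_name(ifname)
        if parent is not None:
            # A later subinterface overwrites an earlier one, so with several
            # subinterfaces the value left on the parent depends on iteration order.
            result[parent] = value
    return result


# Hypothetical metric values keyed by interface name.
metrics = {"xe-0/0/0.0": 10, "xe-0/0/1.100": 20, "xe-0/0/1.200": 30}
with_parents = copy_metrics_to_parents(metrics)
print(with_parents["xe-0/0/0"])
print(with_parents["xe-0/0/1"])
```

The parent "xe-0/0/1" ends up with the metric of only one of its two subinterfaces, which mirrors the second caveat in the list: a map can show a single value per physical link.
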
  • This release adds support for discovery and monitoring of APC PDUs. We can discover and monitor phases, outlet banks and individual outlets. For the individual outlets, we monitor outlet status (on or off) and load if the PDU has monitored outlets.

  • This version adds support for monitoring of the routing table size. Variable name is ipv4CidrRouteNumber for ipv4 routing table and ipv6CidrRouteNumber for ipv6 routing table. This variable appears under category “Routing Table” in the Graphing Workbench and in the tab of the same name in the Device Details panel.

  • Added support for discovery and monitoring of tunnels.

  • This version adds discovery and monitoring of BGP state and admin status of BGP peers (variables bgpPeerState and bgpPeerAdminStatus).

1.38.2.4. User Interface

  • Popup graph that opens when user clicks link label in a map now has UI control to let the user change monitoring variable shown in the graph. Each popup graph can show its own different variable.

  • Width of links on maps is now a function of interface speed. This is optional and can be controlled by the user with configuration parameters “network.display.map.strokeWidth” and “network.display.map.strokeWidthBySpeed”. See prototype config nw2.conf.prototype for an example of the configuration.

  • When NetSpyGlass is configured with nested views, the system computes aggregated values of monitoring variables that correspond to the links that connect such views to other nested views and devices in maps. The user can use these aggregated variables in graphs and Nagios alerts.

  • We have improved the layout of the Graphing Workbench to make it more usable on laptops.

  • Graphing Workbench now supports two different ways to build the graph:

    1. user adds variables to the graph manually, selecting them in the table
    2. user manipulates the filter and all variables that match it appear on the graph automatically

    Graph obtained using method (2) automatically updates itself whenever new devices or components that match the filter are deployed on the network.

  • Graphing Workbench allows user to sort variables by columns “Device”, “Component”, “Current”, “Min”, “Max”

  • New filter option in the Graphing Workbench: “Top N”. When applied, this option limits the number of variables in the data table to the first 5, 10 or 20 items. This option is most useful when data in the table is sorted by current or maximum value, because it then automatically picks the “top N” components by current or maximum value of the variable.

  • Both the sort order and the setting of the “Top N” option are part of the filter rule used to generate a graph using method (2) (“automatic”). This allows the user to build a graph that always shows the top N components, such as the 10 most loaded interfaces or the 10 hottest components across all devices. Since the graph is driven by the filter rule, it updates itself whenever the set of interfaces or components changes, either because different components begin matching the filter or because new devices have been deployed on the network.
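
As a rough illustration of how sorting combines with “Top N”, the sketch below sorts table rows by a column in descending order and keeps the first N. The row layout, the sample values and the function name are hypothetical and are not taken from the NetSpyGlass code.

```python
def top_n(rows, n, key="current"):
    """Sort rows by the given column, largest first, and keep the first n."""
    return sorted(rows, key=lambda row: row[key], reverse=True)[:n]


# Hypothetical contents of the Graphing Workbench data table.
rows = [
    {"device": "r1", "component": "xe-0/0/0", "current": 120.0, "max": 400.0},
    {"device": "r1", "component": "xe-0/0/1", "current": 900.0, "max": 950.0},
    {"device": "r2", "component": "ge-0/0/0", "current": 450.0, "max": 980.0},
    {"device": "r2", "component": "ge-0/0/1", "current": 30.0, "max": 60.0},
]

by_current = [row["component"] for row in top_n(rows, 2)]
by_max = [row["component"] for row in top_n(rows, 2, key="max")]
print(by_current)
print(by_max)
```

Because the selection is recomputed from the full row set each time, a newly deployed component with a high value would displace one of the current entries, which is the self-updating behavior described above.
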

  • Added the ability to interpolate “gaps” in the data in graphs. This is controlled by configuration file parameter network.monitor.display.graphs.bridgeGapWidthMin, which specifies the maximum gap width in minutes. Gaps wider than this will appear in graphs, but shorter ones will be interpolated (only linear interpolation is currently supported). Default is “0”, which means we do not interpolate any gaps.
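
The gap-bridging rule can be sketched as follows. This is an illustration under stated assumptions, not the actual graphing code: samples are assumed to be (timestamp in seconds, value) pairs on a regular polling step, a gap is assumed to be measured between the two samples that surround it, and the function name and arguments are hypothetical.

```python
def bridge_gaps(samples, step_sec, max_gap_min):
    """Fill gaps no wider than max_gap_min minutes with linearly interpolated points."""
    max_gap_sec = max_gap_min * 60
    out = []
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        out.append((t0, v0))
        gap = t1 - t0
        # Only intervals longer than one step are gaps; wider gaps are left alone.
        if step_sec < gap <= max_gap_sec:
            t = t0 + step_sec
            while t < t1:
                out.append((t, v0 + (v1 - v0) * (t - t0) / gap))
                t += step_sec
    if samples:
        out.append(samples[-1])
    return out


# One-minute polling with samples missing at t=120 and t=180 (a 3-minute gap).
samples = [(0, 10.0), (60, 20.0), (240, 50.0), (300, 60.0)]
print(bridge_gaps(samples, 60, 5))  # gap is bridged
print(bridge_gaps(samples, 60, 0))  # a value of 0 leaves all gaps in place
```
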

  • Added link “Console…” to the Device Details panel. This link is built using the template defined by configuration file parameter network.monitor.display.deviceConsoleAccessUrl. The template can, for example, define a URL using protocol “ssh://” or “telnet://”, which means a click on the link should open a terminal window with the corresponding ssh or telnet client. The user can use macro “@address@” in the url template; this macro is replaced with the device’s IP address. Example (this is the default):

    network.monitor.display {
        #
        # template for the url that is used to build "Console..." link in the Device Details
        # panel in the UI. Macro "@address@" is replaced with device's ip address. At this
        # time only one macro "@address@" is supported. The same url template is used for
        # all devices.
        #
        deviceConsoleAccessUrl = "ssh://@address@"
    }
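
The macro substitution amounts to a plain string replacement. The helper below is a hypothetical sketch of that step, not the UI code, and the address is a documentation-range example.

```python
def console_url(template, address):
    """Replace the "@address@" macro in the template with the device address."""
    return template.replace("@address@", address)


print(console_url("ssh://@address@", "192.0.2.1"))
print(console_url("telnet://@address@", "192.0.2.1"))
```
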
    
  • Added support for different time zones in UI displays. This is controlled by configuration clauses network.display.time and network.display.tz. Clause network.display.time controls the display format for all time-related information in the UI; the value follows the rules described in http://momentjs.com/docs/#/displaying/format/ . Clause network.display.tz allows the user to choose between UTC and the local time zone for time display in the UI. The value can be “utc” or “local”.

1.38.2.5. Infrastructure

  • This version adds support for Graphite:

    http://graphite.readthedocs.org/en/latest/overview.html
    

    NetSpyGlass uses Graphite as a time series database and uses plain text method to send the data to it. See documentation file graphite.md for more information.
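
For readers unfamiliar with the Graphite plain text method, each data point is one line of the form "metric.path value timestamp" terminated by a newline, and Carbon listens for such lines on TCP port 2003 by default. The sketch below shows the format only. The metric path naming is hypothetical and does not reflect the names NetSpyGlass generates (see graphite.md for those), and send_lines is defined here only to show the shape of the exchange.

```python
import socket
import time


def graphite_line(path, value, timestamp=None):
    """Format one data point for the Graphite plain text protocol."""
    ts = int(time.time()) if timestamp is None else int(timestamp)
    return f"{path} {value} {ts}\n"


def send_lines(lines, host, port=2003, timeout=5.0):
    """Send preformatted lines to a Carbon plain text listener over TCP."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall("".join(lines).encode("ascii"))


# Hypothetical metric path; only formatting is exercised, nothing is sent.
line = graphite_line("example.r1.xe-0_0_0.ifInRate", 1250000.0, 1400000000)
print(line, end="")
```
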

    As of v0.975, you have three choices of the database to keep time series data:

    • embedded RRD storage
    • hbase
    • Graphite

    Only one database can be used at a time. Configuration is done in monitor.storage of the configuration file. See file nw2.conf.prototype for examples of the configuration parameters and file graphite.md included in the distribution package for more details.

  • This version gives the user ability to control which variables should be saved in the timeseries database. Use configuration file parameter monitor.storage.variables to do this. The value of this parameter should be a list of variable names or reference to another list defined elsewhere in the same configuration file. This filter works with all types of time series databases (hbase, rrd, graphite).

    For example:

    monitor.storage.variables = [
        ifOperStatus,
        ifInRate,
        ifOutRate,
        ifInErrorRate,
        ifOutErrorRate,
        ifInDiscardRate,
        ifOutDiscardRate,
        ifSpeed
    ]
    

    or a reference to the list of variables that appear in the Graphing Workbench (this is the default):

    variables = ${network.monitor.display.graphingWorkbench.variables}
    
  • This version includes multiple optimizations in the time series database store based on RRD.

1.38.2.6. Configuration file format changes

  • The configuration clause used to define variable thresholds and corresponding color levels has changed. Threshold levels defined as percentages are no longer supported. Instead, create a new variable with a value equal to the percentage or fraction of 1.0 and define thresholds for it using its absolute value. If you wish to use a color level computed this way with another variable, add it to the configuration clause network.monitor.copyColorLevels. In the following example we use variable ifInUtilization, which is calculated in nw2rules.py as (ifInRate / ifSpeed), define thresholds and color levels for it, and then copy color levels into ifInRate:

    network.monitor.thresholds {
    
        # Set color level to "100" when value is >= 2
        ifOperStatus : [
            { value = "1", colorLevel = 0},
            { value = "2", colorLevel = 100},
        ],
    
        ifInUtilization: [
           { value = "0",    colorLevel = 0},
           { value = "0.2",  colorLevel = 1},
           { value = "0.6",  colorLevel = 2},
           { value = "0.9",  colorLevel = 3},
        ],
    }
    
    network.monitor.copyColorLevels {
        ifInRate: [ifInUtilization, ifOperStatus],
    }
    

    The function that copies color levels copies from right to left, and only if the level on the right is higher. In the example above, variable ifInRate acquires the color level tag from ifInUtilization or ifOperStatus, whichever is higher. Since ifOperStatus gets color level “100” when the interface is down, it overrides any other color level and makes the corresponding link appear black in maps.
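
The interaction between thresholds and copyColorLevels can be sketched in a few lines. This is an illustration, not the NetSpyGlass implementation. It assumes that a threshold matches when the value is greater than or equal to it (as the comment in the configuration example suggests) and that the highest matching threshold determines the level. The function names and data layout are hypothetical.

```python
# Mirrors the configuration example: (threshold value, color level) pairs, ascending.
THRESHOLDS = {
    "ifOperStatus": [(1, 0), (2, 100)],
    "ifInUtilization": [(0.0, 0), (0.2, 1), (0.6, 2), (0.9, 3)],
}
COPY_COLOR_LEVELS = {"ifInRate": ["ifInUtilization", "ifOperStatus"]}


def color_level(variable, value):
    """Return the level of the highest threshold the value reaches, or None."""
    level = None
    for threshold, threshold_level in THRESHOLDS.get(variable, []):
        if value >= threshold:
            level = threshold_level
    return level


def copy_color_levels(levels):
    """Copy levels right to left, only when the source level is higher."""
    result = dict(levels)
    for target, sources in COPY_COLOR_LEVELS.items():
        for source in sources:
            src, cur = levels.get(source), result.get(target)
            if src is not None and (cur is None or src > cur):
                result[target] = src
    return result


def link_levels(if_in_rate, if_speed, if_oper_status):
    """Compute color levels for one interface from hypothetical observed values."""
    levels = {
        "ifInUtilization": color_level("ifInUtilization", if_in_rate / if_speed),
        "ifOperStatus": color_level("ifOperStatus", if_oper_status),
    }
    return copy_color_levels(levels)


print(link_levels(700e6, 1e9, 1)["ifInRate"])  # interface up, 70% utilized
print(link_levels(700e6, 1e9, 2)["ifInRate"])  # interface down
```

With the interface up, ifInRate takes the utilization level. With the interface down, level 100 from ifOperStatus wins because it is higher than any utilization level.
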