1.35. Release Notes 0.98.3

1.35.1. Important changes in this release

  • This version greatly improves the scalability of the server: its memory footprint has been reduced by about 50%, and the time it takes to process many monitoring variables has also been reduced. The server has been tested with up to 500,000 monitoring variables.
  • We have redesigned the UI used to control the time interval for graphs in the Device Details panel and in the Graphing Workbench. It is now possible to choose an arbitrary time interval for a graph.
  • This version introduces support for discovery and monitoring of firewall counters on Juniper devices.
  • This version adds support for InfluxDB as a time series database.
  • This version comes with built-in documentation.

1.35.2. Code changes and other internal changes

We have changed the internal implementation of the time series buffers to reduce object churn and avoid excessive garbage collection. This allowed us to reduce the memory footprint of the server and the number of full garbage collection runs. This, in turn, makes data processing times much more stable and improves the performance of the server in large deployments with many monitoring variables.
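One common way to cut per-observation allocations is a fixed-capacity ring buffer backed by preallocated arrays that are overwritten in place. The Python sketch below illustrates that general idea only. It is not NetSpyGlass' actual Java implementation, and the class name RingBuffer and its methods are hypothetical.

```python
from array import array


class RingBuffer(object):
    """Hypothetical fixed-capacity time series buffer.

    Timestamps and values live in two preallocated arrays of doubles;
    once the buffer is full, the oldest slot is overwritten instead of
    allocating new storage for each observation.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.times = array('d', [0.0] * capacity)
        self.values = array('d', [0.0] * capacity)
        self.start = 0  # index of the oldest observation
        self.size = 0   # number of valid observations

    def __len__(self):
        return self.size

    def append(self, timestamp, value):
        idx = (self.start + self.size) % self.capacity
        if self.size < self.capacity:
            self.size += 1
        else:
            # Full: idx points at the oldest slot; advance start past it.
            self.start = (self.start + 1) % self.capacity
        self.times[idx] = timestamp
        self.values[idx] = value

    def items(self):
        """Yield (timestamp, value) pairs from oldest to newest."""
        for i in range(self.size):
            j = (self.start + i) % self.capacity
            yield self.times[j], self.values[j]
```

With a capacity of 3, appending five observations keeps only the last three, and no new storage is allocated after construction.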

1.35.3. Start/stop shell script netspyglass.sh

The script can now find the backend jar file even when it is started via an absolute path, that is, when the jar is not located in the current directory. The jar is assumed to be located in the same directory as the script netspyglass.sh, because both of them are part of the distribution package.

1.35.4. Configuration

  • An administrator can specify a list of common domain names that should be stripped from fully qualified host names for presentation. The configuration parameter is:

    network.display.commonDomains
    

    Note that domain names are removed from fully qualified host names only for presentation in maps and in the Graphing Workbench. Python hook scripts that build views and tags still “see” the full names.

    Example:

    network {
        display {
            commonDomains = ["foo\\.com$", "company\\.foo\\.com$"]
        }
    }

    Domain names are specified as regular expression patterns. NetSpyGlass finds an occurrence of the pattern in the device name and replaces it with an empty string. A pattern can match any part of the device name; the $ anchor in the example restricts the match to the end of the name.
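The stripping rule can be sketched with Python's re module. This is a minimal sketch, assuming the patterns are tried in list order and only the first matching one is applied; the server's exact order of evaluation is not documented here. The function name and the patterns (which escape the leading dot so no trailing "." is left behind) are hypothetical.

```python
import re


def strip_common_domains(host_name, patterns):
    """Return host_name with the first matching domain pattern removed.

    Assumption: patterns are tried in list order and only the first
    match is stripped.
    """
    for pattern in patterns:
        # re.search looks for the pattern anywhere in the name;
        # a trailing '$' anchors it to the end of the name.
        if re.search(pattern, host_name):
            return re.sub(pattern, "", host_name)
    return host_name


# Hypothetical patterns: the more specific domain is listed first.
PATTERNS = [r"\.company\.foo\.com$", r"\.foo\.com$"]
```

For example, strip_common_domains("r1.company.foo.com", PATTERNS) yields "r1", while a name that matches no pattern is returned unchanged.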

1.35.5. User Interface

  • We have redesigned the Device Details panel to improve performance for devices with a large number of interfaces and hardware components. The page now loads several times faster.

  • The Device Details panel now has a “Summary” tab, which is generated using our reporting framework. The contents of this tab are actually a report controlled by the following configuration parameters:

    network.monitor.device_summary_report = {
        class: "device_summary.DeviceSummary",
        template: "device_summary.vm"
    }
    

    This report is generated using the Python class referred to by the parameter network.monitor.device_summary_report.class and the Velocity template defined by the parameter network.monitor.device_summary_report.template. You can replace either of them to change what appears in the “Summary” tab.

  • The number of rows in the data table in the Graphing Workbench is now limited. The maximum number of rows is defined by the configuration parameter graphingWorkbench.maxDataTableSize (default: 500). Use the filters by device, component, description or tags to find the variables you want to see.

  • We have redesigned the UI used to select the time interval for graphs in the Device Details panel and in the Graphing Workbench. The user can select the time interval as:

    • a fixed interval ending “now” (3h, 6h, 12h, 24h, 7d, 30d)
    • a fixed interval ending at the same time of day as “now”, but yesterday or a given number of days ago. This is useful for comparing patterns between today and the same interval of time yesterday or several days ago
    • an arbitrary interval of time specified using a pair of standard calendar/time picker widgets.

    It is also possible to turn graph auto-update on and off.
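The three ways of choosing the graph interval described above reduce to simple (start, end) computations. This Python sketch is illustrative only. The function names and the PRESETS table are hypothetical and not part of NetSpyGlass, and times are naive datetime values.

```python
from datetime import datetime, timedelta

# Preset interval lengths offered by the UI.
PRESETS = {
    "3h": timedelta(hours=3),
    "6h": timedelta(hours=6),
    "12h": timedelta(hours=12),
    "24h": timedelta(hours=24),
    "7d": timedelta(days=7),
    "30d": timedelta(days=30),
}


def interval_ending_now(now, preset):
    """Fixed interval ending "now"."""
    return now - PRESETS[preset], now


def interval_days_ago(now, preset, days_ago):
    """Same-length interval ending at the same time of day, days_ago days earlier."""
    end = now - timedelta(days=days_ago)
    return end - PRESETS[preset], end


def arbitrary_interval(start, end):
    """Interval picked with two calendar/time widgets; reject empty or inverted ranges."""
    if end <= start:
        raise ValueError("end must be after start")
    return start, end
```

Comparing today with yesterday then amounts to plotting interval_ending_now(now, "6h") next to interval_days_ago(now, "6h", 1).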

1.35.6. Documentation

To access the documentation served by the NetSpyGlass server, click the (?) icon in the upper right corner of the UI.

1.35.7. Device discovery

  • This version implements discovery and monitoring of disk I/O variables, CPU statistics, system memory and swap for servers running Net-SNMP.

  • If the unit of a monitoring variable is set to “B” (an exact string match is required), the variable is automatically scaled using binary prefixes, that is, the prefix “kilo-” means 1024 instead of 1000. This is used for variables that track system memory and swap.

  • This version implements discovery and monitoring of system memory for Cisco, Juniper, Arista and VMware devices. We introduced the following new variables:

    • memTotal - total amount of memory, bytes
    • memAvail - amount of available (unused) memory of some type, bytes
    • memUtil - memory utilization, percent

    Different types of memory are discovered and monitored if the device reports them separately. For example, Cisco routers can report “Processor” and “I/O” memory separately, and Juniper devices report Routing Engine and FPC memory separately.

  • This version implements discovery and monitoring of firewall counters on Juniper devices. The collected data is presented via new monitoring variables:

    • fwCntrPacketRate - packet rate, packets/sec
    • fwCntrByteRate - data rate, bytes/sec

    The component name of these variables is the name of the filter counter, such as “counter_1-em0.0-i”. This name appears in the “name” column of the Device Details panel. These variables carry tags in the tag facets FilterName, FilterInterface and FilterDirection. The FilterName tag value is the filter name, e.g. “FilterName.filter_1”. The FilterInterface tag value is the interface name, e.g. “FilterInterface.em0.0”. Finally, the FilterDirection tag indicates the direction of the filter and is either “i” (input) or “o” (output), e.g. “FilterDirection.i”.
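The binary scaling rule for variables whose unit is exactly “B” can be sketched as follows, together with one way memUtil could relate to memTotal and memAvail. Both are a minimal sketch: the function names and the displayed prefix strings are hypothetical, and the formula memUtil = (memTotal - memAvail) / memTotal * 100 is an assumption, since these notes do not state how memUtil is computed.

```python
BINARY_PREFIXES = ["", "k", "M", "G", "T", "P"]


def scale_binary(value):
    """Scale a byte count for display, treating "kilo-" as 1024 (unit "B")."""
    index = 0
    while abs(value) >= 1024 and index < len(BINARY_PREFIXES) - 1:
        value = value / 1024.0
        index += 1
    return value, BINARY_PREFIXES[index] + "B"


def mem_util(mem_total, mem_avail):
    """Memory utilization in percent (assumed formula, see above).

    Returns None when the total is unknown or zero.
    """
    if mem_total <= 0:
        return None
    return 100.0 * (mem_total - mem_avail) / mem_total
```

For instance, scale_binary(2048) gives (2.0, "kB") rather than the 2.048 kB that decimal prefixes would produce.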

1.35.8. Monitoring and data processing

  • The NetSpyGlass server can run the Python script that processes monitoring data in multiple threads in parallel. This can speed up data processing and improve scalability in large installations. The number of threads is set by the configuration parameter network.monitor.ruleRunnerThreads; the default is 1. A server restart is not required when the value of this parameter changes.
  • Starting with this version, the server can expire stale monitoring variables that are still held in the memory pool. A variable is considered stale if it has not received any new monitoring data for a while. The exact amount of time a variable can go without updates is defined by the configuration parameter monitor.storage.retentionHrs, a floating point number that specifies the time in hours. The default value is 1 hour.
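The expiry rule amounts to comparing each variable's last update time with a cutoff derived from the retention period. This is a minimal sketch, assuming the server tracks the Unix timestamp of each variable's latest observation. The dictionary layout and the function name are hypothetical, and retention_hrs simply mirrors monitor.storage.retentionHrs.

```python
import time

SECONDS_PER_HOUR = 3600.0


def expire_stale(last_update, retention_hrs=1.0, now=None):
    """Return a new dict without variables older than retention_hrs hours.

    last_update maps a variable name to the Unix timestamp of its most
    recent observation. The input dict is not modified.
    """
    if now is None:
        now = time.time()
    # Fractional hours are allowed, e.g. 0.5 means 30 minutes.
    cutoff = now - retention_hrs * SECONDS_PER_HOUR
    return dict((name, ts) for name, ts in last_update.items() if ts >= cutoff)
```

With the default of 1 hour, a variable last updated 3601 seconds ago is dropped, while one updated 3599 seconds ago is kept.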

1.35.9. Changes in the Data Processing Python Scripts

  • Due to a number of optimizations that helped us improve the performance of Python scripts, parts of the Python API have changed:

    • The function execute no longer takes the parameter vars:

      import nw2rules

      class UserRules(nw2rules.Nw2Rules):
          def execute(self):  # previously also took a 'vars' argument
              pass  # rule logic goes here
      
    • Use the new call import_var to import variables from the Java environment into a Python script:

      from nw2functions import *
      
      op_status = import_var('ifOperStatus')
      
    • Use the new call export_var to export variables back to Java:

      from nw2functions import *
      
      export_var('memAvail', cisco_pool_free)
      
    • If any of your Python scripts relied on the following sequence to update the latest value of a monitoring variable:

      last = mvar.last()
      last.setValue(40000000)
      

      this should be changed to:

      mvar.timeseries.updateLastValue(40000000)
      

      The reason is that a call to mvar.timeseries.last() now returns a copy of the latest observation rather than a reference to the object stored in the time series. All updates to a time series must go through the methods provided by the time series object (Java class TimeSeries), which you can access as the attribute timeseries of the monitoring variable object.

    • There is no longer any need to call the function copy() explicitly to make a copy of a monitoring variable in Python scripts; this happens automatically when needed. Functions provided by the Python module nw2functions do not modify their source variables. A call that used to look like this:

      ifInErrorRate = rate(copy(ifInErrors))
      

      becomes:

      ifInErrorRate = rate(ifInErrors)
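The copy-on-read behavior of last() explains why the old last()/setValue() sequence silently stops having any effect. The toy Python model below illustrates this. ToyTimeSeries, Observation and append are hypothetical stand-ins; only the method name updateLastValue mirrors the Java TimeSeries API named in these notes.

```python
import copy


class Observation(object):
    def __init__(self, timestamp, value):
        self.timestamp = timestamp
        self.value = value


class ToyTimeSeries(object):
    """Toy stand-in for the Java TimeSeries class (hypothetical)."""

    def __init__(self):
        self._observations = []

    def append(self, timestamp, value):
        self._observations.append(Observation(timestamp, value))

    def last(self):
        # Returns a copy, so changes to it never reach the stored data.
        return copy.copy(self._observations[-1])

    def updateLastValue(self, value):
        # The only way to change the stored latest observation.
        self._observations[-1].value = value
```

Modifying the object returned by last() changes only the copy, whereas updateLastValue() changes the value held by the series.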
      

1.35.10. Scalability

The code used to run Python rules has been redesigned to improve its scalability.