Release Notes 1.5.4
===================

NetSpyGlass v1.5.4

Important
---------

This version of NetSpyGlass (1.5.4) can work with both Java 7 and Java 8;
however, performance and memory handling are better when running on Java 8.
The next release of NetSpyGlass (1.6.0) will require Java 8. Please upgrade
the machines you use to run NetSpyGlass.

Support for InfluxDB 0.8 has been deprecated. This version still works with
InfluxDB 0.8, but support for it will be removed in the next release of
NetSpyGlass.

Improvements and New Features
-----------------------------

- This release introduces a distributed repository of device objects based on
  ZooKeeper. Secondary servers can now create pseudo-devices too, for example
  when they compute aggregate variables.

- Numerous improvements have been made to reduce Java heap usage and garbage
  collection pauses in large installations (several thousand devices and up to
  1 million variables in the primary server). See :ref:`performance_tuning`
  for more details.

- The following fields have been removed or modified to reduce memory
  footprint:

  * `DataSource.constraints` has been removed
  * `DataSource.oid` now has type String (NET-1224)
  * `MonitoringVariable.statistics` is now created on demand and is not stored
    permanently with the monitoring variable object

- Several improvements in the standard Python rules script significantly
  reduce the time needed to process monitoring data in large NetSpyGlass
  installations. For example, the operation that copies the latest value of
  the `ifAlias` variable to the `description` field of other interface-related
  variables is now implemented in Java instead of Python. Another operation
  that has moved to Java is the function that copies the tags
  `ifOperStatus.Up` and `ifOperStatus.Down` from the variable `ifOperStatus`
  to other interface-related variables.

- The Graphite connector no longer tries to maintain a persistent connection
  to the Carbon collector server; instead, it opens a connection to upload
  data and closes it when done.

- Configuration parameter `monitor.storage.expireVariablesForOneDeviceAtATime`
  has been deprecated.

- New configuration parameter `monitor.storage.graphite.uploadSpreadTime` can
  be used to control the time interval over which the Graphite connector
  spreads data upload. The interval is defined as a fraction of the polling
  interval; the value of this parameter is a floating-point number between
  0 and 1.

- Beginning with this version it is possible to run NSG monitors in an
  active-active configuration. Two or more monitors can be configured with
  identical or overlapping `allocation` lists in the `cluster.conf`
  configuration file; in this case NetSpyGlass assigns devices that match the
  `allocation` specification evenly between these monitors to spread the work.
  When one monitor goes offline, its devices are automatically reallocated to
  other monitors with a matching `allocation` configuration. Here is an
  example of a `cluster.conf` file using this feature. Monitors `mon1` and
  `mon2` have identical values for their `allocation` parameter, which means
  the server is going to distribute devices that fall into the subnets
  `${SUBNETS}` between these two monitors.

  Each monitor pushes collected variables to its respective secondary server
  (`leaf1` and `leaf2`), which in turn push to the primary server::

    PUSH_VARS = ${graphingWorkbench.variables}
    SUBNETS = [ "10.1.1.0/24", "10.1.2.0/24", "10.1.3.0/24" ]

    cluster {
        members = [
            {
                name = PrimaryServer
                role = primary
            },
            {
                name = leaf1
                role = secondary,
                push = [
                    { server = PrimaryServer, variables = ${PUSH_VARS} }
                ]
            },
            {
                name = leaf2
                role = secondary,
                push = [
                    { server = PrimaryServer, variables = ${PUSH_VARS} }
                ]
            },
            {
                name = mon1
                role = monitor,
                allocation = ${SUBNETS},
                push = [
                    { server = leaf1, variables = ["*"] }
                ]
            },
            {
                name = mon2
                role = monitor,
                allocation = ${SUBNETS},
                push = [
                    { server = leaf2, variables = ["*"] }
                ]
            },
        ]
    }

- This version introduces a new mechanism for data push from secondary servers
  to the primary server, based on subscription. To activate it, add the
  parameter ``subscribe`` to the definition of the primary server in the file
  `cluster.conf`::

    cluster {
        members = [
            {
                name = PrimaryServer
                role = primary
                subscribe = ${graphingWorkbench.variables}
            },
            {
                name = leaf1
                role = secondary,
            },
            {
                name = leaf2
                role = secondary,
            },
        ]
    }

  The value of the parameter `subscribe` is a list of variable names, e.g.
  ``subscribe = [ ifInRate, ifOutRate ]``. The value shown in the example is a
  copy of the variables that appear in the Graphing Workbench, which is a
  reasonable default for the primary server. Note how in the example above the
  secondary servers no longer have the parameter `push`. This parameter is
  unnecessary because the primary can find variables automatically and ask
  their "owners" (servers "leaf1" and "leaf2") to push them.

  Subscription differs from the regular data push configured via the parameter
  `push` in the secondary server configuration in that secondary servers do
  not push variables to the primary unless these variables are used by UI or
  JSON API queries. In this case the primary starts with no variables and
  subscribes to them whenever the UI tries to access them. This helps reduce
  the number of variables in the data pool of the primary server, which is
  useful when a NetSpyGlass cluster works with several million monitoring
  variables.

  Subscription-based push is activated only when the parameter `subscribe` is
  present and its value is not an empty list. If this parameter is absent, the
  primary server relies on the static data push configuration in the secondary
  servers. If the parameter `subscribe` is missing and at the same time the
  secondary servers are not configured to push to the primary, the primary
  server will not have access to monitoring variables at all and the UI will
  appear broken. You need to restart the server when you add or remove the
  parameter `subscribe`, but changes to its value after it has been added do
  not require a restart.

  .. note::

     At the time of this release (v1.5.4) only the primary server can use
     subscription-based push.

- NET-1241: the monitor compares the number returned by OID RFC1213:ifNumber
  with the number of interfaces it actually discovered by walking various
  tables in the RFC1213 MIB, and retries discovery if the numbers do not
  match. This helps work around a corner-case failure when a device reports
  the number of interfaces via ifNumber but then silently fails without a
  timeout when we walk the RFC1213 MIB tables.

- Added configuration parameter `push.segmentSize`. Its value sets the maximum
  number of monitoring variables that can be pushed in one RPC call from
  monitor to server and from server to server. Pushing many variables in one
  call requires very large data structures to be created on both sides and
  increases the memory footprint of the server. Recommended values are in the
  range between 10 and 400. Changes to the value of this parameter require a
  server restart.

- Added configuration parameter `push.threads`. This parameter sets the number
  of threads used to make data push RPC calls in parallel. It can be used in
  combination with `push.segmentSize` to tune data push so that it does not
  require a lot of memory while all data can still be transferred within a
  time interval shorter than the polling interval (see the sketch after this
  list). Changes to the value of this parameter require a server restart.

- The size of the batch "put" operation for InfluxDB 0.9 is configurable and
  can be changed using configuration parameter
  `monitor.storage.influxdb.putBlockSize`. The default value is 1000; changes
  to this parameter require a server restart.
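
The push and storage parameters described above can be tuned together in the
configuration file. The following sketch shows one possible combination; it
assumes the parameters may be written as dotted paths at the top level of the
configuration file, and the values other than the documented default and
recommended range are illustrative assumptions, not recommendations::

    # Illustrative tuning sketch; adjust values for your installation.
    push.segmentSize = 200                            # recommended range is 10-400
    push.threads = 4                                  # assumed value; parallel push RPC calls
    monitor.storage.influxdb.putBlockSize = 1000      # documented default for InfluxDB 0.9
    monitor.storage.graphite.uploadSpreadTime = 0.5   # fraction of the polling interval (0..1)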

Bug fixes
---------

- The primary server now makes all device objects available to all cluster
  members rather than only those allocated to them. This includes
  pseudo-devices created when Python hook scripts create aggregate variables.
  When these pseudo-devices were not pushed to secondary servers, the
  corresponding aggregate variables were not accepted by them.

- Fixed a bug that sometimes made self-monitoring variables disappear.

- NSGDB-72: fixed autoscaling and prefix display for QoS-related variables.

- Fixed a bug that caused the server to throw ConcurrentModificationException
  in Python code that called ``filter_by_tags(import_var('someVariable'), tags)``,
  then created a new device by calling `new_var()`, and after that called
  `aggregate()` to iterate over the variables returned by `filter_by_tags()`
  (the sequence is sketched below). The exception was thrown only when this
  sequence ran for the very first time, when the call to `new_var()` actually
  created a new device; all subsequent calls worked as expected. The bug was
  introduced with the new feature that allows running Python code in multiple
  threads.
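
For reference, a minimal sketch of the rule script sequence that used to
trigger this exception is shown below. It is meant to run inside the
NetSpyGlass Python rules environment, which provides `import_var()`,
`filter_by_tags()`, `new_var()` and `aggregate()`; the variable names, the tag
list and the exact `aggregate()` call shown here are illustrative assumptions,
not copies of any shipped script::

    # Illustrative sketch of the formerly failing call sequence. The helper
    # functions are provided by the NetSpyGlass rules environment; the names,
    # tags and aggregate() signature used here are assumptions.

    tags = ['ifOperStatus.Up']                           # example tag selection
    selected = filter_by_tags(import_var('someVariable'), tags)

    # On the very first run this call also created a new pseudo-device.
    total = new_var('someAggregateVariable')             # example variable name

    # aggregate() iterates over the variables returned by filter_by_tags();
    # before this fix, the first run of this sequence could raise
    # ConcurrentModificationException.
    aggregate(total, selected)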