1.15. Release Notes 1.5.4¶
NetSpyGlass v1.5.4
1.15.1. Important¶
This version of NetSpyGlass (1.5.4) works with both Java 7 and Java 8; however, performance and memory handling are better when running on Java 8.
The next release of NetSpyGlass (1.6.0) will require Java 8. Please upgrade the machines you use to run NetSpyGlass.
Support for InfluxDb 0.8 has been deprecated. This version still works with InfluxDb 0.8, but support for it will be removed in the next release of NetSpyGlass.
1.15.2. Improvements and New Features¶
This release introduces a distributed repository of device objects based on ZooKeeper. Secondary servers can now create pseudo-devices too, for example when they compute aggregate variables.
Numerous improvements have been made to reduce Java heap usage and garbage collection pauses in large installations (several thousand devices and up to 1 million variables in the primary server). See Performance tuning for more details.
The following fields have been removed or modified to improve memory footprint:
- DataSource.constraints has been removed
- DataSource.oid now has type String (NET-1224)
- MonitoringVariable.statistics is now created on demand and is not stored permanently with the monitoring variable object
Several improvements in the standard Python rules script significantly reduced the time needed to process monitoring data in large NetSpyGlass installations. For example, the operation that copies the latest value of the ifAlias variable to the description field of other interface-related variables has been reimplemented in Java instead of Python. Another operation that has moved to Java code is the function that copies tags ifOperStatus.Up and ifOperStatus.Down from variable ifOperStatus to other interface-related variables.
The graphite connector no longer tries to maintain a persistent connection to the Carbon collector server; instead, it opens a connection to upload data and closes it when done.
Configuration parameter monitor.storage.expireVariablesForOneDeviceAtATime has been deprecated.
A new configuration parameter, monitor.storage.graphite.uploadSpreadTime, can be used to control the time interval over which the graphite connector spreads data upload. The interval is defined as a fraction of the polling interval, and the value of this parameter is a floating-point number between 0 and 1.
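For example, the following fragment (illustrative value only) spreads the graphite upload over half of the polling interval:

```
# spread each graphite upload over 50% of the polling interval
monitor.storage.graphite.uploadSpreadTime = 0.5
```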
Beginning with this version it is possible to run NSG monitors in an active-active configuration. When two or more monitors are configured with identical or overlapping allocation lists in the cluster.conf configuration file, NetSpyGlass assigns devices that match the allocation specification evenly between these monitors to spread the work. When one monitor goes offline, its devices are automatically reallocated to other monitors with a matching allocation configuration. Here is an example of a cluster.conf file using this feature. Monitors mon1 and mon2 have identical values for their allocation parameter, which means the server is going to distribute devices that fall into subnets ${SUBNETS} between these two monitors. Each monitor pushes collected variables to its respective secondary server (leaf1 and leaf2), which in turn pushes to the primary server:
```
PUSH_VARS = ${graphingWorkbench.variables}

SUBNETS = [ "10.1.1.0/24", "10.1.2.0/24", "10.1.3.0/24" ]

cluster {
  members = [
    { name = PrimaryServer role = primary },
    { name = leaf1 role = secondary,
      push = [ { server = PrimaryServer, variables = ${PUSH_VARS} } ] },
    { name = leaf2 role = secondary,
      push = [ { server = PrimaryServer, variables = ${PUSH_VARS} } ] },
    { name = mon1 role = monitor, allocation = ${SUBNETS},
      push = [ { server = leaf1, variables = ["*"] } ] },
    { name = mon2 role = monitor, allocation = ${SUBNETS},
      push = [ { server = leaf2, variables = ["*"] } ] },
  ]
}
```

This version introduces a new mechanism for the data push from secondary servers to the primary server, based on subscription. To activate it, add parameter subscribe to the definition of the primary server in the file cluster.conf:

```
cluster {
  members = [
    { name = PrimaryServer role = primary
      subscribe = ${graphingWorkbench.variables} },
    { name = leaf1 role = secondary, },
    { name = leaf2 role = secondary, },
  ]
}
```

The value of parameter subscribe is a list of variable names, e.g. subscribe = [ ifInRate, ifOutRate ]. The value shown in the example is a copy of the variables that appear in the Graphing Workbench, which is a reasonable default for the primary server.
Note how in the example above the secondary servers no longer have parameter push. This parameter is unnecessary because the primary can find variables automatically and ask their "owners" (servers "leaf1" and "leaf2") to push them.
Subscription differs from the regular data push configured via parameter push in the secondary server configuration in that secondary servers do not push variables to the primary unless these variables are used by some UI or JSON API queries. In this case the primary starts with no variables and subscribes to them whenever the UI tries to access them. This helps reduce the number of variables in the data pool of the primary server, which is useful when a NetSpyGlass cluster works with several million monitoring variables.
Subscription-based push is activated only when parameter subscribe is present and its value is not an empty list. If this parameter is absent, the primary server relies on the static data push configuration in the secondary servers. If parameter subscribe is missing and at the same time the secondary servers are not configured to push to the primary, the primary server will not have access to monitoring variables at all and the UI will appear broken.
You need to restart the server when you add or remove parameter subscribe, but changes to its value after it has been added do not require a restart.
Note
At the time of this release (v1.5.4) only the primary server can use subscription-based push.
NET-1241: the monitor compares the number returned by OID RFC1213:ifNumber with the number of interfaces it actually discovered by walking various tables in the RFC1213 MIB, and retries discovery if the numbers do not match. This helps work around a corner-case failure where a device reports the number of interfaces via ifNumber but then silently fails without a timeout when we walk the RFC1213 MIB tables.
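The retry logic can be sketched roughly as follows. This is an illustrative Python sketch only, not the actual NetSpyGlass implementation; the function names snmp_get_if_number and walk_interface_tables are hypothetical stand-ins for the SNMP operations described above.

```python
def discover_interfaces(snmp_get_if_number, walk_interface_tables, max_retries=3):
    """Retry interface discovery until the MIB walk agrees with ifNumber.

    snmp_get_if_number: callable returning the value of RFC1213:ifNumber.
    walk_interface_tables: callable returning the list of interfaces
    discovered by walking the RFC1213 MIB tables.
    """
    interfaces = []
    for attempt in range(max_retries):
        expected = snmp_get_if_number()       # what the device claims
        interfaces = walk_interface_tables()  # what the walk actually found
        if len(interfaces) == expected:
            return interfaces                 # counts agree, discovery is good
    # counts never matched; return the last walk result anyway
    return interfaces
```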
Added configuration parameter push.segmentSize. Its value sets the maximum number of monitoring variables that can be pushed in one RPC call from monitor to server and from server to server. Pushing many variables in one call requires very large data structures to be created on both sides and worsens the memory footprint of the server. Recommended values are in the range between 10 and 400. Changes to the value of this parameter require a server restart.
Added configuration parameter push.threads. This parameter sets the number of threads used to make data push RPC calls in parallel. It can be used in combination with push.segmentSize to tune data push so that it does not require a lot of memory while all data can still be transferred within a time interval shorter than the polling interval. Changes to the value of this parameter require a server restart.
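A configuration fragment combining these two parameters might look like this (the values are illustrative; tune them for your installation):

```
# push at most 200 variables per RPC call (recommended range: 10-400)
push.segmentSize = 200
# use 4 threads to make push RPC calls in parallel
push.threads = 4
```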
The size of the batch "put" operation for InfluxDb 0.9 is now configurable and can be changed using configuration parameter monitor.storage.influxdb.putBlockSize. The default value is 1000; changes to this parameter require a server restart.
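For example, to raise the batch size from the default of 1000 (an illustrative value):

```
# write monitoring data to InfluxDb 0.9 in batches of 5000 points
monitor.storage.influxdb.putBlockSize = 5000
```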
1.15.3. Bug fixes¶
- the primary server now makes all device objects available to all cluster members rather than only those allocated to them. This includes pseudo-devices created when Python hook scripts create aggregate variables. When these pseudo-devices were not pushed to secondary servers, the corresponding aggregate variables were not accepted by them.
- fixed a bug that sometimes made self-monitoring variables disappear
- NSGDB-72 fix autoscaling and prefix display for QoS related variables
- fixed a bug that caused the server to throw ConcurrentModificationException in Python code that called filter_by_tags(import_var('someVariable'), tags), then created a new device by calling new_var(), and after this called aggregate(), which iterated over the variables returned by filter_by_tags(). The exception was thrown only when this sequence was called for the very first time, when the call to new_var() actually created a new device; all subsequent calls worked as expected. The bug was introduced with the new feature that allows running Python code in multiple threads.