6.4. How does this work

The function nw2functions.import_var() returns a generator that yields copies of the net.happygears.nw2.py.MonitoringVariable objects stored in the internal data pool. Copies are returned to make it safer and easier to manipulate their time series. If the call to nw2functions.import_var() returned the original objects stored in the internal data pool, every function that operates on their time series would need to be careful to modify only the last observation and not touch any others, because these observations are used by the server for graphing, alerts and reports, and an error in the Python data processing rules script could easily cause irreversible data loss. Returning copies and then merging them back into the data pool when the script calls nw2functions.export_var() makes this safer.

To reduce the total number of large objects created and destroyed while the Python data processing script runs, the copies returned by nw2functions.import_var() are taken from a separate pool of net.happygears.nw2.py.MonitoringVariable objects so they can be reused. These objects are called anonymous monitoring variables. The size of this shared pool is monitored using the variable numAnonVars in the category Monitor.

Using pooled anonymous variables greatly reduces the number of objects that need to be created and then destroyed on every run of the data processing script. Consider a NetSpyGlass server that monitors 1,000 devices with a total of 100,000 interfaces. To calculate inbound utilization for each interface, it executes the following code:

if_hc_in_octets = import_var('ifHCInOctets')
# rate() yields octets/sec; mul(..., 8) converts the result to bits/sec
export_var('ifInRate', mul(rate(if_hc_in_octets), 8))

The call to import_var() returns 100,000 objects (via the generator). These objects are manipulated by rate() and mul() and finally returned to the system by export_var(). The functions rate() and mul() do not create new objects; they modify those passed to them and return the same ones. The function export_var() takes the last observation from the time series buffer of each input variable and merges it into the time series buffer of the corresponding “actual” monitoring variable object in the main data pool in the server. This makes the calculated value available for graphing, reports etc. The same process has to be repeated for all other monitoring variables that describe interface-related metrics. If we have 10 variables like that, every run of the script will use one million anonymous variables. Pooling these objects allows us to avoid creating and destroying a million objects on every script run. In reality the number is greater, because the total number of interface-related monitoring variables is greater and the same thing happens with variables that monitor parameters of hardware components, protocols etc.
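The same pattern repeats for every such counter. As a sketch, the outbound names ifHCOutOctets and ifOutRate below follow the naming used elsewhere in this section:

for counter, result in (('ifHCInOctets', 'ifInRate'),
                        ('ifHCOutOctets', 'ifOutRate')):
    # each pass imports anonymous copies, computes the rate in bits/sec
    # and merges the result back into the main data pool
    export_var(result, mul(rate(import_var(counter)), 8))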

To reuse shared anonymous variables effectively, they need to be returned to the pool as soon as the script is done with them. This is not an easy task, though. The simplest approach is to collect all allocated anonymous variables at the end of the run of the data processing script, but it is not very efficient because it means each call to import_var() takes a new set of anonymous variables from the pool, even if they are the same as those taken before. Also, code sequences like the one shown above form “chains”: the call to import_var() only returns a generator; it does not actually perform any calculations or data manipulations, and it does not take anonymous variables from the pool yet, either. What happens is that the function export_var() starts calling the chain of generators mul -> rate -> import_var, which takes one anonymous object from the pool, fills it with data, calculates the rate, then multiplies it and finally merges the result back into the main data pool. At this point export_var() releases the anonymous variable, returning it to the pool of anonymous variables so that the next iteration done by import_var() can take the same object from the pool. Now, instead of creating 100,000 anonymous variables we only need one. This dramatically reduces the total number of objects that exist in the system at the same time, reducing the total memory footprint of the server.
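The following self-contained sketch, written in plain Python and not using the NetSpyGlass API, illustrates this mechanism: a chain of generators pulls one pooled object through at a time, and the consumer recycles each object as soon as its value has been merged.

class Pool:
    """Toy object pool standing in for the anonymous variable pool."""
    def __init__(self):
        self.free = []
    def take(self):
        return self.free.pop() if self.free else {}
    def recycle(self, obj):
        obj.clear()
        self.free.append(obj)

pool = Pool()

def import_values(raw_values):
    # lazy: takes one pooled object per iteration, not all at once
    for v in raw_values:
        obj = pool.take()
        obj['value'] = v
        yield obj

def scale(objs, factor):
    # modifies objects in place and yields the same objects
    for obj in objs:
        obj['value'] *= factor
        yield obj

def export_values(objs, sink):
    # drives the whole chain; recycles each object right after merging it
    for obj in objs:
        sink.append(obj['value'])
        pool.recycle(obj)

results = []
export_values(scale(import_values(range(100000)), 8), results)
assert len(pool.free) == 1   # only one pooled object was ever in flight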

However, this creates certain caveats. For example, doing this would not work:

if_hc_in_octets = list(import_var('ifHCInOctets'))
export_var('ifInRate', mul(rate(if_hc_in_octets), 8))
# do some other calculations with if_hc_in_octets here:
# WRONG, these objects were recycled by the call to export_var() above

Normally, the generator returned by the call to import_var() can only be used once. All objects that it yields are consumed by the call to rate(), and after that, passing the generator object to other functions would not work. Wrapping it in list() may seem like a solution: list() consumes all objects that the generator can produce and holds references to them in a Python list. However, since the call to export_var() returns anonymous variables to the pool, the items in that list cannot be used again; they have been recycled and are likely to hold data from some other “original” monitoring variables after the call to export_var().

Another problem with wrapping the result of import_var() in a list is that it defeats the purpose of returning anonymous variables to the pool as soon as possible. As shown above, this code sequence normally needs only one anonymous variable object; with the list, however, all 100,000 objects are held at the same time, so 100,000 anonymous variable objects are needed. All of these anonymous variables are recycled at the end of the run of the data processing script, so they do not leak, but the server has to hold them all in memory for some time.

Important

Do not wrap the result of the call to import_var() in a Python list in order to use the same anonymous variables more than once. Instances in this list become invalid, and operations with them may yield unexpected and unpredictable results if they are used after the call to export_var() or some other functions from the module nw2functions.

A call to import_var() is cheap: it only creates an iterator. Filling the returned anonymous variables with data is also very efficient and does not take a whole lot of time. Holding on to the prepared anonymous variables does not speed up script execution and is unnecessary. Just call import_var() again if you need to perform different operations with the same monitoring variables.
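For example, a safe version of the earlier snippet simply imports the variable a second time; the second output name here, ifInOctetsRate, is only an illustration:

if_hc_in_octets = import_var('ifHCInOctets')
export_var('ifInRate', mul(rate(if_hc_in_octets), 8))

# import the same variable again instead of reusing the exhausted generator;
# the output name 'ifInOctetsRate' is hypothetical
if_hc_in_octets = import_var('ifHCInOctets')
export_var('ifInOctetsRate', rate(if_hc_in_octets))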

Another interesting case involving recycling of anonymous variables is related to filtering functions such as nw2functions.filter_by_tags() or nw2functions.skip_nans(). Consider the following code snippet (taken from one of the examples later in this section):

# keep only instances tagged 'ifBGP4Peer.AS174' that do not carry
# the 'VariableTags.Aggregate' tag
if_out_rate = filter_by_tags(import_var('ifOutRate'), ['ifBGP4Peer.AS174', '!VariableTags.Aggregate'])
aggr = new_var('Cogent', 'peering')
aggregate(aggr, skip_nans(if_out_rate))
aggr.addTag('VariableTags.Aggregate')
export_var('ifOutRate', aggr)

Here we import the variable ifOutRate but only want instances that have the tag ‘ifBGP4Peer.AS174’ and not ‘VariableTags.Aggregate’. In other words, out of 100,000 instances of the variable ifOutRate we want to do something with only a few. The call to import_var() will still use 100,000 anonymous variables (sequentially, not all at the same time), but the call to filter_by_tags() will pass through only a few. What happens to those it does not pass through? If nothing, they would linger in the state “checked out from the pool but not returned” until the end of the run of the data processing script, and we would be back to the situation where we need 100,000 anonymous variables. To avoid this problem, the function nw2functions.filter_by_tags() actually recycles the variables it does not pass through.

The same happens with the function nw2functions.skip_nans(), which skips monitoring variables whose last value in the time series buffer is a NaN. This function also recycles the variables it does not pass through.
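Building on the plain-Python sketch above (again, not the NetSpyGlass API), a recycling filter can be written as a generator that yields matching objects and immediately returns rejected ones to the pool:

def filter_values(objs, predicate):
    # yield matching objects; recycle rejected ones right away instead of
    # leaving them checked out until the end of the script run
    for obj in objs:
        if predicate(obj):
            yield obj
        else:
            pool.recycle(obj)

results = []
export_values(filter_values(import_values(range(100000)),
                            lambda o: o['value'] % 1000 == 0),
              results)
assert len(results) == 100   # only every 1000th value passed through
assert len(pool.free) == 1   # rejected objects went straight back to the pool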