13. nw2functions — Operations with monitoring variables

13.1. Summary

This module provides functions that can be used to analyze or manipulate monitoring variables in NetSpyGlass Python scripts.

13.2. Classes and Functions

nw2functions.add(mvlist1, mvlist2)
Parameters:
  • mvlist1 – list of MonitoringVariable objects or generator that yields MonitoringVariable objects
  • mvlist2 – list of MonitoringVariable objects or generator that yields MonitoringVariable objects or a number
Returns:

generator that yields MonitoringVariable objects

Add the last values of the time series of corresponding items from two lists of MonitoringVariable objects, or add a constant to the last data point of each time series in the list passed as the first argument.

This function modifies members of mvlist1 and returns generator that yields them.

When both mvlist1 and mvlist2 are lists of MonitoringVariable objects, this function “aligns” items from these lists by matching their device and index attributes and performs the calculation using the latest values from the time series of matching variable pairs. Then it clears the time series of the item from mvlist1 and puts the result back into the same time series. When this function yields items from mvlist1, they have only one observation in their time series. If the matching monitoring variable from mvlist2 has an empty time series, the result also has an empty time series.

When mvlist2 is a number, this function performs the operation on the last observation from the time series of each item in mvlist1 and this number. It also returns the items of mvlist1, with their time series holding just one observation.
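
The alignment described above can be sketched in plain Python. This is only an illustration: `Var` is a hypothetical stand-in for MonitoringVariable (it keeps just `device`, `index` and a list of values), `add_sketch` is not part of nw2functions, and skipping items of mvlist1 that have no match in mvlist2 is an assumption the text above does not confirm.

```python
class Var(object):
    """Hypothetical stand-in for MonitoringVariable (illustration only)."""
    def __init__(self, device, index, values):
        self.device = device
        self.index = index
        self.values = list(values)   # time series values, oldest first


def add_sketch(mvlist1, mvlist2):
    """Yield items of mvlist1 whose time series holds only the computed sum."""
    if isinstance(mvlist2, (int, float)):
        # Second argument is a constant: add it to the last observation.
        for mv in mvlist1:
            if mv.values:
                mv.values = [mv.values[-1] + mvlist2]
            yield mv
        return
    # Index the second list by (device, index) to "align" the pairs.
    by_key = {(mv.device, mv.index): mv for mv in mvlist2}
    for mv in mvlist1:
        other = by_key.get((mv.device, mv.index))
        if other is None:
            continue   # assumption: unmatched items are skipped
        if mv.values and other.values:
            mv.values = [mv.values[-1] + other.values[-1]]
        else:
            mv.values = []   # empty input gives an empty result
        yield mv
```

The order of items in the second list does not matter in this sketch, because pairs are matched by key rather than by position.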

nw2functions.aggregate(aggr_var, mvlist)
Parameters:
  • aggr_var (MonitoringVariable) – MonitoringVariable instance: this is the variable where we save the result
  • mvlist (list of MonitoringVariable) – source data
Returns:

first argument aggr_var

This function adds values of the latest observations of variables in mvlist and adds observation with value equal to the calculated sum to aggr_var.

If input list mvlist is empty, this function does not add any observations to aggr_var.

If last observations in all matching input variables are NaN, this function does not add any observations to aggr_var.

The monitoring variable passed as the first argument gets the additional tag ‘VariableTags.Aggregate’. This function uses this tag to avoid adding aggregated values to themselves.
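
The rules for empty and NaN input can be illustrated with a small sketch that works on plain lists of values instead of MonitoringVariable objects. The name `aggregate_sketch` is hypothetical, and skipping individual NaN values when other inputs are valid is an assumption; the text above only states what happens when all of them are NaN.

```python
import math


def aggregate_sketch(aggr_series, series_list):
    """Append the sum of the latest values in series_list to aggr_series.

    aggr_series: list of values of the aggregate variable
    series_list: list of lists of values, one list per input variable
    """
    last_values = [s[-1] for s in series_list if s]
    usable = [v for v in last_values if not math.isnan(v)]
    if usable:
        # Nothing is appended for empty input or when every value is NaN.
        aggr_series.append(sum(usable))
    return aggr_series
```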

Example:

# Compute sum of outbound traffic through interfaces of all devices that
# have tag 'ifBGP4Peer.AS174'. This tag is added automatically to all interfaces
# that carry BGP peering sessions with AS174 (COGENT). Assign result to
# the new monitoring variable 'ifOutRateCogent'

if_out_rate = filter_by_tags(import_var('ifOutRate'), ['ifBGP4Peer.AS174', '!VariableTags.Aggregate'])
# Create new empty variable with given device and component names
aggr = new_var('Cogent', 'peering')
# Calculate aggregate value and assign it to aggr
aggregate(aggr, if_out_rate)
aggr.addTag('VariableTags.Aggregate')
export_var('ifOutRate', [aggr])

nw2functions.alert(name, input, condition, description, details=None, tags=None, duration=0, percent_duration=100, notification_time=0, streams=None, fan_out=True, on_active=None, on_clear=None)
Parameters:
  • name – alert name
  • input – list or generator of MonitoringVariable objects
  • condition – a function of two arguments that should return boolean
  • description – one-line alert description
  • details (dictionary) – multi-line detailed information about this alert. Data should be stored in the form of key-value pairs in this dictionary. The system will merge this dictionary with dictionary it creates using device and component ids and monitoring variable values that triggered the alert. Pass {} if no additional details information is needed.
  • tags (iterable (list, set or generator)) – list of strings: the tags that will be added to the alert object and the corresponding monitoring variable. Each element in this list should have the form TagFacet.word
  • duration (number) – specifies interval of time, in seconds, during which the value of the input variable must satisfy condition() to trigger the alert
  • percent_duration (number) – alert will be triggered if input value satisfies condition() at least this percentage of the duration time
  • notification_time (number) – minimum notification interval, seconds
  • streams (list of strings) – list of alert notification stream names. Notifications for this alert will be sent there. Default is equivalent to [‘log’], that is, alert will be logged but not sent to any of the outgoing streams such as Pager Duty, Slack, etc.
  • fan_out (boolean) – if True, a separate alert is triggered for each MonitoringVariable instance in input (i.e. a separate alert for each device+component pair). If this parameter is False, then only one alert is generated and information about devices and components is placed in its details dictionary
  • on_active – a name of the external script called when alert becomes active. Information is passed to the script through environment variables. Default value of this argument is None.
  • on_clear – a name of the external script called when alert goes from the state active to state cleared. Information is passed to the script through environment variables. Default value of this argument is None.

This function triggers an alert with the given name if the condition defined by the function condition() is met for all or part of the values of the input variable collected during the latest duration seconds.

This function iterates over MonitoringVariable instances in input, takes observations from their time series that were collected during the latest duration seconds and applies the function condition() with the monitoring variable object and the corresponding observation value as two arguments. If parameter percent_duration is 100 (default value), then the alert is triggered if all non-NaN observations collected during the latest duration seconds satisfy condition(). If parameter percent_duration has another non-zero value, it is interpreted as the percentage of the non-NaN observations collected inside the duration interval that must satisfy the condition. Observations with a value of NaN are always skipped and do not count towards the percentage. If all observations within the specified duration are NaN, this function returns False.
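
The percentage rule can be sketched as follows. This is a simplified illustration, not the server implementation: `condition_met` is a hypothetical name, it receives the observation values that fall inside the duration interval as a plain list, and its condition takes only the value (the real condition() also receives the monitoring variable).

```python
import math


def condition_met(values, condition, percent_duration=100):
    """Return True if enough non-NaN values satisfy the condition."""
    usable = [v for v in values if not math.isnan(v)]
    if not usable:
        return False   # all observations are NaN
    matching = sum(1 for v in usable if condition(v))
    # Compare the share of matching observations with the requested percentage.
    return 100.0 * matching / len(usable) >= percent_duration
```

With the default of 100, a single non-NaN observation that fails the condition is enough to keep the alert from triggering.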

Parameter notification_time specifies the minimal interval between notifications (seconds). If the value is 0 (which is the default), a notification is sent on each cycle when the alert is triggered. If the value is negative, a notification is never sent. This can be useful to create an alert that is always “silenced” but still creates an alerting monitoring variable, which in turn can be used to create other alerts or viewed in dashboards. Finally, a positive number makes the system send notifications not more often than the value of this parameter specifies. If the alert stops firing sooner than this amount of time since the last notification has passed, the timer is reset. In this case, a notification is going to be sent as soon as the alert starts firing again.
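
One way to picture these rules is a small state machine that is consulted once per monitoring cycle. The class below is a hypothetical sketch written for this explanation; its name, its method and the use of seconds for `now` are assumptions, and the actual server logic may differ.

```python
class NotificationThrottle(object):
    """Hypothetical sketch of the notification_time rules."""

    def __init__(self, notification_time):
        self.notification_time = notification_time
        self.last_sent = None   # time of the last notification, seconds

    def should_notify(self, now, firing):
        if not firing:
            self.last_sent = None   # alert stopped firing: reset the timer
            return False
        if self.notification_time < 0:
            return False            # negative value: never notify
        if self.last_sent is None or now - self.last_sent >= self.notification_time:
            self.last_sent = now
            return True
        return False
```

With `notification_time=0` the elapsed-time check always passes, so a notification goes out on every cycle while the alert is firing.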

Parameter streams is a list of strings which are interpreted as names of outbound alert notification streams configured in the main configuration file block alerts.streams. Default is equivalent to [‘log’], that is, alerts are logged but not sent to Pager Duty, Slack or any other services.

Typical usage:

alert(
    name='cpu_load_high',
    input=import_var('cpuUtil'),
    condition=lambda mvar, value: value > 75,
    description='CPU utilization is over 75% for 50% of time for the last 10 min',
    details={},
    duration=600,
    percent_duration=50,
    notification_time=300,    # send notification once in 5 min
    fan_out=True
)

Here is an example of alert that fires when BGP session goes into a state that is not “established” while corresponding interface is “up”:

def bgp_state_for_intf_up(self):
    '''
    this function returns instances of the variable `bgpPeerState` that correspond to
    peering interfaces in op state "up".
    '''
    if_oper_up = filter(lambda x: x == 1, import_var('ifOperStatus'))
    bgp_peer_state = import_var('bgpPeerState')
    for pair in join_by_tags(if_oper_up, bgp_peer_state, ['BGP4PeerAddress']):
        op_status, bgp_state = pair
        yield bgp_state

def execute(self):

    # take filtered instances of variable `bgpPeerState` (only those that correspond to
    # peering interfaces in state "up") and trigger alert if the value is not 6 ("established").
    # It is assumed that "interface down" condition is tracked by another alert somewhere.
    alert(
        name='BGPDown',
        input=self.bgp_state_for_intf_up(),
        condition=lambda mvar, value: value < 6,
        description='BGP Session is down but interface is up',
        details={},
        notification_time=300,
        streams=['log', 'slack'],
        fan_out=True
    )

You can use parameter tags to supply list of tags that will be added to the alert and monitoring variable created for this alert. Each tag should be in the usual format of TagFacet.word.

You can use macros in details and description fields of the alert. We use Velocity engine to expand macros internally, so the syntax is the same as in device and report templates (basically, variable name prepended with a “$”). You can access alert object while it is under construction to insert values of its other fields into description and details:

  • $alert.deviceId device Id for the device that triggered alert. This is valid only for fan-out alerts
  • $alert.deviceName device name, also valid only for the fan-out alerts
  • $alert.componentIndex component or interface index, valid only for fan-out alerts
  • $alert.componentName component or interface name, valid only for fan-out alerts
  • $alert.variable identifier of the corresponding alerting variable, this can be used to construct urls for graphs
  • $alert.inputVariable identifier of the corresponding input variable, this can be used to construct urls for graphs. This may not be available, for example when alerting rule used temporary variable.
  • $alert.value the value of the input variable that triggered alert
  • $alert.key unique deduplication key for the alert
  • $alert.fanout true or false, indicates whether this is a fan-out alert
  • $alert.tags a set of strings. For fan-out alerts this is a copy of tags from the input variable that triggered alert. For non fan-out alert this is a common set of tags from all input variable instances that contributed to the alert.
  • $alert.activeSince a time stamp (time in milliseconds) when alert entered state “active”
  • $alert.activeSinceStr time when alert entered state “active” as a string in time zone specified by the main configuration file parameter network.display.tz
  • $alert.getTags("facet") returns tags in given facet (both facet and tag word, e.g. “BGP4Peer.AS1000”)
  • $alert.getTagWords("facet") returns tags in given facet, but unlike $alert.getTags("facet"), returns only tag words.

Here is a practical example:

alert(
    name='BGPDown',
    input=self.bgp_state_for_intf_up(),
    condition=lambda mvar, value: value < 6,
    description='BGP Session is down but interface is up: $alert.getTagWords("BGP4Peer"), $alert.getTagWords("BGP4PeerAddress")',
    details={},
    notification_time=300,
    fan_out=True,
    streams=['log', 'slack']
)

Parameter on_active (if provided and if the value is not None) should be a string that consists of the full path and, optionally, command line arguments (separated by spaces) for an external script or program that NetSpyGlass should call when the alert goes from state ‘cleared’ to state ‘active’. Information is passed to this script via environment variables with names composed from alert_ and the alert object field name, such as alert_name, alert_deviceName and so on. The set of fields, and therefore the set of environment variables, is the same as the set of variables you can access via macros (see above). Here is an example of a possible set of environment variables (all values are strings):

alert_timeLastNotificationSent=1435620481208
alert_componentIndex=4294968065
alert_activeSince=1435620360000
alert_matchingSilenceId=0
alert_activeSinceStr=2015-06-29 16:26:00 PDT
alert_deviceId=91
alert_notificationSent=true
alert_duration=600.0
alert_tags=[Explicit.datanodes_hbase0, Explicit.datanodes_hbase0_foo, Explicit.hbase0, Explicit.hbase0_foo, Model.linux, Role.Server, Role.datanodes_hbase0, Role.hbase0, Vendor.NetSnmp]
alert_inputVariable=cpuUtil.91.4294968065
alert_variable=busyCpuAlert.91.4294968065
alert_deviceName=img6
alert_streams=['pagerduty', 'log']
alert_name=busyCpuAlert
alert_updatedAt=1435620360000
alert_key=910a2493f8ef37e47cd37d80748a4f75
alert_silenced=false
alert_active=true
alert_fanout=true
alert_value=80.7270970106975
alert_details={deviceId=91, index=4294968065, slack_channel=#netspyglass, variable=busyCpuAlert.91.4294968065}
alert_notificationTimeMs=60000.0
alert_percentage=20.0
alert_description=img6:cpu2 : CPU utilization is over 75% for 20% of time for the last 10 min
alert_componentName=cpu2

NetSpyGlass creates a new thread that launches the script and waits for it to complete. There is a hard timeout for the script, equal to the duration of the monitoring interval in NetSpyGlass. The script must terminate before this timeout expires or it will be killed by NetSpyGlass. No information can be passed from the script back to NetSpyGlass.

Parameter on_clear works in a way similar to on_active, except the script is called when alert clears (goes from the state active to state clear).
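
A script referenced by on_active or on_clear could read these variables as in the sketch below. The script itself and the function name `summarize_alert` are hypothetical; only the environment variable names come from the listing above, and what the script does with the data (here it just prints one line) is up to you.

```python
#!/usr/bin/env python
import os
import sys


def summarize_alert(env):
    """Build a one-line summary from the alert_* environment variables."""
    name = env.get('alert_name', 'unknown')
    device = env.get('alert_deviceName', '')
    component = env.get('alert_componentName', '')
    value = env.get('alert_value', '')   # all values arrive as strings
    return '%s on %s:%s value=%s' % (name, device, component, value)


if __name__ == '__main__':
    # Keep the work short: the script is killed if it exceeds the timeout.
    sys.stdout.write(summarize_alert(os.environ) + '\n')
```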

nw2functions.average(mvlist)
Parameters:mvlist (list or generator) – list of monitoring variables
Returns:yields members of the same list

For each monitoring variable object in mvlist, compute average of values of observations, then clear the time series and put computed value back. This function yields the same monitoring variable instances, but their time series have only one observation. The time stamp of these observations is that of the last observation of the input time series.

nw2functions.compute_sum_for_clusters(name)
Parameters:name – monitoring variable name

Compute aggregated values for monitoring variables tracking aggregated links terminating on clusters. This function does not modify variables referred to by name and returns nothing, but it creates new or updates existing variables as needed. Calculated values are added to variables with the same name as name. Variables referred to by name must have been already exported before the call to this function. Example:

in_octets = import_var('ifHCInOctets')
in_rate = mul(median(rate(in_octets, POINTS_FOR_MEDIAN)), 8)
export_var('ifInRate', in_rate)
compute_sum_for_clusters('ifInRate')

nw2functions.concat(mvlist1, mvlist2)
Parameters:
  • mvlist1 – list of MonitoringVariable objects or generator that yields MonitoringVariable objects
  • mvlist2 – list of MonitoringVariable objects or generator that yields MonitoringVariable objects

Generator that concatenates monitoring variables from two lists or generators. This function does not modify variables or their time series, it just concatenates two lists or generators.

nw2functions.condition_with_duration(mvar, condition, duration_sec, percent_duration=100)
Parameters:
  • mvar – MonitoringVariable instance (not a list or generator!)
  • condition – a function of one argument that returns boolean
  • duration_sec – interval of time, seconds
  • percent_duration (number) – alert will be triggered if input value satisfies condition() at least this percentage of the duration time
Returns:

True or False

Iterate over values in the time series of mvar for the latest duration_sec seconds and check if all values satisfy condition (a function of a single argument that returns boolean). If any observation during this interval has a value of NaN, it is skipped. If at least one observation does not satisfy the condition, this function returns False. It returns True only if all non-NaN observations satisfy the condition. If all observations that fall within the interval have a value of NaN, this function returns False. If duration_sec is less than the polling interval, only the latest observation in the time series is tested for the condition. If this observation has a value of NaN, this function returns False.

If the length of the time series in mvar is shorter than the specified duration, this function returns False.

This function is used internally to generate alerts that trigger when the value of the monitoring variable meets a certain condition for a specified period of time, such as “CPU utilization is over 50% for 5 min”.

nw2functions.copy(mv)
Parameters:mv (MonitoringVariable) – original variable
Returns:new variable that is a copy of mv

Make a deep copy of the monitoring variable, including its tags and time series buffer. Original variables are not modified in any way.

nw2functions.copy_attrs(mv_dst, mv_src)
Parameters:
  • mv_dst – MonitoringVariable object
  • mv_src – MonitoringVariable object, list or generator that yields MonitoringVariable objects

Copy attributes from mv_src to mv_dst.

nw2functions.copy_if_oper_status_tag(variable_names)
Parameters:variable_names – list of variable names to copy ifOperStatus tag from variable ifOperStatus to

This function speeds up the typical operation of copying the current tag ifOperStatus.Up / ifOperStatus.Down from the variable ifOperStatus to other interface-related variables.

nw2functions.copy_ifalias(variable_names)
Parameters:variable_names – list of variable names to copy value of ifAlias to

Copy latest value of ifAlias to “description” field of variables identified by names. Note that this does not affect the value of these variables in any way, this function updates their “description” field that appears in the data table in Graphing Workbench. This function is implemented in Java for speed because it can take significant time in NetSpyGlass servers that monitor many interfaces (hundreds of thousands)

nw2functions.copy_list(mvlist)
Parameters:mvlist – list of MonitoringVariable objects or generator that yields MonitoringVariable objects

Generator that yields copies of MonitoringVariable instances from the list mvlist.

nw2functions.copy_tags(source_variable, tag_facet_name, variable_names)
Parameters:
  • source_variable – (string) the name of the variable to take tags from
  • tag_facet_name – (string) copy all tags in this facet
  • variable_names – (list of strings) list of variable names to copy tags to

This function speeds up the typical operation of copying tags in one facet from one variable to many.

nw2functions.current_timestamp()
Returns:(a number) timestamp for all new observations in the current cycle, ms

Returns timestamp value of new observations created in the current monitoring cycle, in milliseconds. The value is suitable for use in net.happygears.nw2.time_series_buffer.TimeSeriesBuffer.put()

nw2functions.dbscan(mvar, n_points, eps)
Find outliers in the observations in the time series of the monitoring variable mvar using the DBSCAN algorithm. See https://en.wikipedia.org/wiki/DBSCAN
Parameters:
  • mvar – MonitoringVariable
  • n_points – number of observations in the time series to analyse, counting backwards from the last
  • eps – maximum radius of the neighborhood to be considered. This value affects sensitivity of the algorithm and is measured in units of the values in the time series. Make this smaller to make the algorithm find outliers with values closer to the average.
Returns:

True if one or several observations in the time series are outliers

This function can be used with nw2functions.alert() as alert condition. Normally, nw2functions.alert() calls its condition function several times, passing the value of a different observation from the time series each time. This assumes that the condition function processes one observation value at a time. Function dbscan() is different, it scans several observations in the time series to find an outlier. There is no need to call dbscan() several times because this will lead to it making the same calculations multiple times. Instead, configure the alert with parameter duration=0 (or equal to polling interval in seconds) to make it call its condition function only one time and pass number of observations you want to take into account as a second argument to dbscan(). You can skip parameter percent_duration, too.

Here is an example:

alert(
    name='cpu_load_high',
    input=import_var('cpuUtil'),
    condition=lambda mvar, _: dbscan(mvar, 10, 50),   # this looks at last 10 observations
    description='Found spike in the CPU utilization during the last 10 min',
    details={},
    duration=0,
    notification_time=300,    # send notification once in 5 min
    fan_out=True
)

Assuming a polling interval of 1 min, the call to dbscan(mvar, 10, 50) will analyse the last 10 observations, or the latest 10 min of data, and will detect observations with values that differ from the others by more than the given threshold. In this case, the threshold was chosen to be “50”. The value of the threshold is in the units of the variable. In the case of cpuUtil it is percentages, but this does not mean the condition triggers when CPU utilization goes over the fixed threshold of 50%. Function dbscan() returns True if the value deviates from the average by approximately 50%. For example, if the value is close to 20% most of the time but sometimes spikes above 70%, then the call to dbscan() will return True.
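
To see why a spike from about 20 to above 70 is flagged with eps=50, here is a minimal one-dimensional DBSCAN sketch in plain Python. It is not the implementation used by nw2functions.dbscan(): the function name is hypothetical, and the `min_pts` parameter (the minimum number of neighbors that makes a point a "core" point) is an assumption, since dbscan() exposes only n_points and eps.

```python
def dbscan_outliers(values, eps, min_pts=3):
    """Return the values that a 1-D DBSCAN classifies as noise."""
    def neighbors(i):
        # Indexes of all points within eps of point i (including i itself).
        return [j for j in range(len(values))
                if abs(values[i] - values[j]) <= eps]

    # Core points have at least min_pts neighbors.
    core = set(i for i in range(len(values)) if len(neighbors(i)) >= min_pts)
    noise = []
    for i in range(len(values)):
        if i in core:
            continue
        # A non-core point with no core point nearby is noise (an outlier).
        if not any(j in core for j in neighbors(i)):
            noise.append(values[i])
    return noise


cpu = [20, 22, 19, 21, 75, 20, 23, 18, 21, 20]
spikes = dbscan_outliers(cpu, 50)
```

In this data the value 75 is more than 50 units away from every other observation, so it has no neighbors and ends up as the only outlier. Raising eps to 60 would put it within reach of the rest of the points and nothing would be reported.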

nw2functions.derivative(mvlist, limit=1)
Parameters:
  • mvlist – monitoring variable list
  • limit – number of observations at the tail end of the time series of mvlist to process
Returns:

yields new modified MonitoringVariable objects

This function calculates the derivative of values from the time series of items of mvlist. Time series of monitoring variables from mvlist are modified in place in such a way that upon return they hold no more than limit observations. These observations have values equal to the derivative of the input values and are calculated using observations taken from the end of the input time series. If the input has NaNs mixed with non-NaN values, NaNs are skipped. This function attempts to find the necessary number of non-NaN values in the input to produce limit observations for the output.

Unlike rate(), this function does not try to compensate for the counter roll-over and can produce negative results.
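
The calculation can be sketched on a plain list of (timestamp, value) pairs. This is an illustration under assumptions: the name `derivative_sketch` is hypothetical, timestamps are taken to be in seconds, and each output observation is given the time stamp of the later observation in the pair.

```python
import math


def derivative_sketch(observations, limit=1):
    """observations: list of (timestamp_sec, value) pairs, oldest first."""
    # NaN observations are skipped before pairs are formed.
    clean = [(t, v) for t, v in observations if not math.isnan(v)]
    result = []
    for (t0, v0), (t1, v1) in zip(clean, clean[1:]):
        result.append((t1, (v1 - v0) / float(t1 - t0)))
    # Keep no more than `limit` observations from the tail.
    return result[max(0, len(result) - limit):]
```

A decreasing input produces a negative value here, which matches the note above about the absence of counter roll-over compensation.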

nw2functions.div(mvlist1, mvlist2)
Parameters:
  • mvlist1 – list of MonitoringVariable objects or generator that yields MonitoringVariable objects
  • mvlist2 – list of MonitoringVariable objects or generator that yields MonitoringVariable objects or a number
Returns:

generator that yields MonitoringVariable objects

Divide two lists of MonitoringVariables or list and a constant

This function modifies members of mvlist1 and returns generator that yields them.

When both mvlist1 and mvlist2 are lists of MonitoringVariable objects, this function “aligns” items from these lists by matching their device and index attributes and performs the calculation using the latest values from the time series of matching variable pairs. Then it clears the time series of the item from mvlist1 and puts the result back into the same time series. When this function yields items from mvlist1, they have only one observation in their time series. If the matching monitoring variable from mvlist2 has an empty time series, the result also has an empty time series.

When mvlist2 is a number, this function performs the operation on the last observation from the time series of each item in mvlist1 and this number. It also returns the items of mvlist1, with their time series holding just one observation.

nw2functions.ema(input_mv, avg_mv, n_periods)
Parameters:
  • input_mv – list or generator of MonitoringVariable instances: input variable
  • avg_mv – list or generator of MonitoringVariable instances: aggregate variables
  • n_periods – number of periods (observations) used to calculate smoothing factor for EMA
Returns:

calculated average

For each monitoring variable object in input_mv, compute exponential moving average of values of observations, then clear the time series and put computed value back. This function yields the same monitoring variable instances provided as second argument (avg_mv), but their time series have only one observation.

See http://en.wikipedia.org/wiki/Moving_average (Exponential moving average):

workingAverage = (newValue * smoothingFactor) + (workingAverage * (1.0 - smoothingFactor))
smoothingFactor = 2 / (1 + n_periods)
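
As a worked example, the formula can be written as a single Python function. The name `ema_update` is hypothetical and the function operates on plain numbers rather than on monitoring variables.

```python
def ema_update(working_average, new_value, n_periods):
    """Apply one step of the exponential moving average formula."""
    smoothing_factor = 2.0 / (1 + n_periods)
    return (new_value * smoothing_factor
            + working_average * (1.0 - smoothing_factor))


# n_periods = 3 gives a smoothing factor of 0.5, so the result lies
# halfway between the previous average (10.0) and the new value (20.0).
updated = ema_update(10.0, 20.0, 3)
```

A larger n_periods gives a smaller smoothing factor, so each new observation moves the average less.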

nw2functions.export_var(name, mvar)
Parameters:
  • name – monitoring variable name
  • mvar – list or generator of MonitoringVariable objects

“Export” monitoring variable to Java environment. Example:

in_octets = import_var('ifHCInOctets')
in_rate = mul(median(rate(in_octets, POINTS_FOR_MEDIAN)), 8)
export_var('ifInRate', in_rate)

Important

Monitoring variable instances passed to export_var() via its second argument are processed by the server, added to the internal data pool and then recycled. Their contents may change after the call to export_var() at any moment. Do not use these monitoring variables even if you pass them to export_var() wrapped in Python list.

nw2functions.filter_by_tags(mvlist1, tags)
Parameters:
  • mvlist1 – a list or generator that yields MonitoringVariable instances
  • tags – list of strings - tags to match MonitoringVariable objects from mvlist1 (see below)
Returns:

yields MonitoringVariable instances from mvlist1

Yield list of MonitoringVariable instances filtered by combination of tags

This function does not modify the list passed as argument

Tags in tags have the following format: ‘TagFacet.word’. A tag string can be preceded with ‘!’ to indicate negation, that is, that this tag must not be present: ‘!TagFacet.word’. Tags in tags are combined using the logical AND operation. That is, for the tags list:

['Facet1.word', 'Facet2.word', '!Facet3.word', '!Facet4.word']

the expression is:

('Facet1.word' in mvar.tags and 'Facet2.word' in mvar.tags and
    'Facet3.word' not in mvar.tags and 'Facet4.word' not in mvar.tags)
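
The same rule for an arbitrary tag list can be sketched as a helper that works on a plain set of tag strings. The name `matches_tags` is hypothetical and is used only to illustrate the AND and negation semantics.

```python
def matches_tags(var_tags, tags):
    """Return True if var_tags satisfies every entry in tags."""
    for tag in tags:
        if tag.startswith('!'):
            if tag[1:] in var_tags:
                return False   # negated tag is present
        elif tag not in var_tags:
            return False       # required tag is missing
    return True
```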

Example:

if_out_rate = import_var('ifOutRate')
filtered_var = filter_by_tags(if_out_rate, ['ifBGP4Peer.AS174'])
aggr_var = reduce(lambda x, y: x + y, filtered_var)

The call to filter_by_tags() acts as a filter and returns only instances of if_out_rate that have the tag ‘ifBGP4Peer.AS174’. The call to reduce() uses the provided function to accumulate the sum of values of the last observation in each instance and assigns it to the newly created monitoring variable object. This works because the Java class MonitoringVariable has the “magic” function __radd__() with semantics suitable for this kind of operation.

See also

join_by_tags()

nw2functions.find_step(mvar, threshold1, threshold2)
Parameters:
  • mvar (MonitoringVariable) – MonitoringVariable instance
  • threshold1 – threshold “before”
  • threshold2 – threshold “after” the change
Returns:

list of Observation objects

Scan the time series of mvar to find all consecutive pairs of observations with values changing from threshold1 to threshold2 and return the list of second observations in such pairs. Picked values must satisfy the following criteria:

  • observation1 must have value >= threshold2
  • observation2 must have value <= threshold1

Observations 1 and 2 in the pair can be separated by zero or more NaNs

nw2functions.get_devices()
Returns:list of device ids allocated to this thread. When server is configured to run Python rules hook script in one thread (which is the default), this function returns complete list of ids of all devices in the system. When server is configured to run the script in multiple threads in parallel, this function returns list of ids allocated to the current thread. Worker threads get portion of all devices, while aggregator thread gets all of them

nw2functions.get_or_create(var_name, device, component, description='', initial_value=None)
Parameters:
  • var_name – monitoring variable name
  • device – device name
  • component – component name
  • description – optional component description
  • initial_value – if no instances of this variable exist and we create new one, add observation with this value to its time series
Returns:

monitoring variable instance

Find the variable with the given name, device and component, or create it if it does not exist. Here device and component are referred to by name. This function can be used to create persistent aggregate variables with a value that survives server restart. A variable like this can be used as an “accumulator”.

The following example calculates monthly inbound traffic volume in bits for the purpose of checking the data cap on the link to MyIsp. The input variable is ifInRate, its instances are picked by matching the tag Link.MyIsp, and the result is stored in the variable ifInMonthlyTrafficBit. The call to nw2functions.get_or_create() finds the existing instance of this variable for the device with name MyIspName and component traffic, or creates a new one if it does not exist. Since data caps are usually expressed in bytes rather than bits, the variable ifInMonthlyTrafficBit is later divided by 8 to get ifInMonthlyTrafficByte. Function beginning_of_month() (not shown here) returns True if the time stamps of the two latest observations in the time series of the variable passed as argument correspond to different months.

Example:

def calculate_monthly_traffic(self, input_var, aggregate_mvar):
    '''
    Takes values of the latest observations from variables in input_var and if they are not NaN,
    multiplies it by polling interval and adds the result to the value of the last observation in
    time series of the accumulator variable `aggregate_mvar`

    The output variable value is reset on the first minute of the first day
    of each month.

    This basically integrates values of the input and stores result in the output.
    At any given time, the latest value of the output variable is equal to the accumulated
    integrated value of the input from the beginning of the month up to that moment.

    @param input_var:      input variable (iterable - list or generator)
    @param aggregate_mvar: aggregate variable
    @return: MonitoringVariable instance with the result
    '''
    assert isinstance(aggregate_mvar, MonitoringVariable)

    input_var_list = list(skip_nans(input_var))

    agg_val = aggregate_mvar.timeseries.getLastNonNaNValue()
    # guard against an empty input list before looking at its first element
    if math.isnan(agg_val) or (input_var_list and self.beginning_of_month(input_var_list[0])):
        agg_val = 0.0

    # Add new observation with the same value as the currently last one but new
    # time stamp - reduce() will use this as starting value
    aggregate_mvar.timeseries.put(current_timestamp(), agg_val)

    # add up values of latest observations in monitoring variable instances
    # input_var_list and aggregation variable itself
    return reduce(lambda x, y: x + y * polling_interval(), input_var_list, aggregate_mvar)

def execute(self):
    super(MyRules, self).execute()

    traffic_mvar = get_or_create('ifInMonthlyTrafficBit', 'MyIspName', 'traffic', '', 0.0)
    traffic_mvar = self.calculate_monthly_traffic(filter_by_tags(import_var('ifInRate'), ['Link.MyIsp']), traffic_mvar)
    export_var('ifInMonthlyTrafficBit', [traffic_mvar])
    export_var('ifInMonthlyTrafficByte', div(import_var('ifInMonthlyTrafficBit'), 8))
nw2functions.get_scaler(name, mvar)
Parameters:
Returns:

DataScaler object instance

Create and return DataScaler object that can be used to scale values to human-friendly range.

nw2functions.group_by_device(mvlist1)
Parameters: mvlist1 – a list or generator of MonitoringVariable instances
Returns: yields lists of MonitoringVariable instances grouped by their “device” attribute.

This function is a generator: it yields lists of MonitoringVariable instances grouped by their “device” attribute.
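
The grouping can be pictured with ordinary Python containers. This is a minimal sketch under stated assumptions: Var is a hypothetical stand-in for MonitoringVariable that carries only the attributes needed here, and the order in which groups are produced (first appearance of each device) is an assumption, not documented behavior.

```python
from collections import OrderedDict, namedtuple

# Hypothetical stand-in for MonitoringVariable; only "device" matters for grouping
Var = namedtuple('Var', ['device', 'component'])


def group_by_device_sketch(mvlist):
    """Yield lists of variables that share the same device attribute."""
    groups = OrderedDict()
    for mvar in mvlist:
        groups.setdefault(mvar.device, []).append(mvar)
    for group in groups.values():
        yield group


variables = [Var('r1', 'eth0'), Var('r2', 'eth0'), Var('r1', 'eth1')]
for group in group_by_device_sketch(variables):
    # each group holds all variables that belong to one device
    print(group[0].device, [v.component for v in group])
```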

nw2functions.iff(func, cond, then)
Parameters:
  • func – function to be applied to variables in the list cond
  • cond – “condition” list
  • then – variables from this list will be added to the list this method returns if func() returns true
Returns:

new list of monitoring variable instances taken from the list then

Using list cond as a condition, yield only those unchanged MonitoringVariable instances from then for which function func() returns true when called with corresponding MonitoringVariable instance from cond as an argument.

This function does not modify either list.

Use case:

Take ifHCInOctets only for monitoring variables that correspond to interfaces in operational state Up and call function report_traffic():

in_octets = import_var('ifHCInOctets')
op_status = import_var('ifOperStatus')
map(report_traffic, iff(lambda x: x < 2.0, op_status, in_octets))
nw2functions.import_var(name)

“Import” monitoring variable from Java environment. This function makes and returns copies of the MonitoringVariable instances. Example:

in_octets = import_var('ifHCInOctets')
Parameters: name – monitoring variable name
Returns: iterator that returns MonitoringVariable objects. If name is unknown, the iterator returns nothing.
nw2functions.is_aggregator_thread()
Returns: True if this is the aggregator thread, False otherwise
nw2functions.is_outlier(mvar, value, n_sigmas)
Parameters:
  • mvar (MonitoringVariable) – MonitoringVariable instance
  • value (number) – observation value to analyze
  • n_sigmas – result will be True if value deviates from the mean by n_sigmas or greater number of sigmas (standard deviations)
Returns:

(boolean) returns True if value is an outlier. Returns False if this condition is not satisfied or the time series is empty

Scan timeseries of mvar, compute mean and standard deviation and compare the deviation of value with n_sigmas. Return True if value deviates by n_sigmas or more standard deviations from the mean.
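
As a rough illustration of the test described above, the following sketch works on a plain list of numbers instead of a MonitoringVariable. The function name, the use of the population standard deviation, the skipping of NaN values, and the handling of a constant series are all assumptions made for this example; the actual implementation may differ.

```python
import math


def is_outlier_sketch(values, value, n_sigmas):
    """Return True if `value` is n_sigmas or more standard deviations from the mean."""
    data = [v for v in values if not math.isnan(v)]
    if not data:
        return False  # empty time series: never report an outlier
    mean = sum(data) / float(len(data))
    # population standard deviation (assumption; a sample form is equally plausible)
    sigma = math.sqrt(sum((v - mean) ** 2 for v in data) / len(data))
    if sigma == 0.0:
        # assumption: in a constant series any different value counts as an outlier
        return value != mean
    return abs(value - mean) >= n_sigmas * sigma
```

For the series 10, 12, 8, 10 the mean is 10 and the standard deviation is about 1.41, so with n_sigmas set to 3 a new value of 20 is reported as an outlier while 13 is not.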

nw2functions.is_worker_thread()
Returns: True if this is a worker thread, False otherwise
nw2functions.join(mvlist1, mvlist2)
Parameters:
  • mvlist1 (list or generator) – list of MonitoringVariable objects
  • mvlist2 (list or generator) – list of MonitoringVariable objects
Returns:

yields tuples of two MonitoringVariable instances

This function is similar to the standard Python zip() except it matches monitoring variables from two lists by their device and h/w component attributes rather than picking them up sequentially like Python’s zip().
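
The difference from zip() is easiest to see on lists that are ordered differently. A minimal sketch, assuming a hypothetical Var stand-in for MonitoringVariable with device and component attributes; it illustrates the matching rule only and is not the library's implementation.

```python
from collections import namedtuple

# Hypothetical stand-in for MonitoringVariable with only the matching attributes
Var = namedtuple('Var', ['name', 'device', 'component'])


def join_sketch(mvlist1, mvlist2):
    """Yield (a, b) pairs that share the same device and component."""
    index = dict(((b.device, b.component), b) for b in mvlist2)
    for a in mvlist1:
        b = index.get((a.device, a.component))
        if b is not None:
            yield a, b


in_rate = [Var('ifInRate', 'r1', 'eth0'), Var('ifInRate', 'r2', 'eth0')]
out_rate = [Var('ifOutRate', 'r2', 'eth0'), Var('ifOutRate', 'r1', 'eth0'),
            Var('ifOutRate', 'r3', 'eth1')]

# zip(in_rate, out_rate) would pair r1 with r2 and r2 with r1;
# the join pairs each variable with its counterpart on the same device
pairs = list(join_sketch(in_rate, out_rate))
```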

nw2functions.join_by_tags(mvlist1, mvlist2, tags)
Parameters:
  • mvlist1 (list or generator) – list of MonitoringVariable objects
  • mvlist2 (list or generator) – list of MonitoringVariable objects
  • tags – list of strings - tags to match MonitoringVariable objects from mvlist1 (see below)
Returns:

yields tuples of two MonitoringVariable instances

This function is similar to the standard Python zip() except it matches monitoring variables from two lists or generators by their tags. Tags in the list tags can be prefixed with “!” to indicate that the corresponding tag must not be present (similar to how filter_by_tags() works). Tags can be either full, e.g. ‘Facet.word’, or just a facet ‘Facet’. In the latter case this function matches monitoring variable instances that have any tags in the corresponding facet. If an item in the list tags is only a facet, variables in the pairs this function returns will always have the same tag from this facet. If input variables have multiple tags in this facet, this function returns multiple matching pairs. Tags in the returned monitoring variable objects are not modified and no tags are added or removed.

Examples:

if_oper_status = import_var('ifOperStatus')
bgp_peer_state = import_var('bgpPeerState')
for pair in join_by_tags(if_oper_status, bgp_peer_state, ['BGP4PeerAddress.10.0.0.4', '!Explicit.foo']):
    op_status, bgp_state = pair

In this example, we import variables ifOperStatus and bgpPeerState and match them by tag BGP4PeerAddress.10.0.0.4. This picks up the variable that tracks operational status of the peering interface and the variable that tracks status of the BGP4 session. This is useful if we want to build an alert that should fire when BGP4 session state is not “6” (“established”) while interface operational status is “1” (“up”). Once we have found matching variables, we’ll need to match the value of the op_status variable and pass the bgp_state variable to the function alert().

In the example above we match by tag ‘BGP4PeerAddress.10.0.0.4’ which means we are only interested in this particular BGP4 peer. This can be modified to get pairs of monitoring variables for all BGP peers that exist in the system. To do that, we’ll match by tag facet BGP4PeerAddress:

if_oper_status = import_var('ifOperStatus')
bgp_peer_state = import_var('bgpPeerState')
for pair in join_by_tags(if_oper_status, bgp_peer_state, ['BGP4PeerAddress', '!Explicit.foo']):
    op_status, bgp_state = pair

The match ‘!Explicit.foo’ is included to demonstrate how to use negative tag matching to filter some variables out.

See example in nw2functions.alert() that demonstrates how join_by_tags() can be used to establish dependencies between monitoring variables while building an alert.

See also

filter_by_tags()

nw2functions.left_join(mvlist1, mvlist2)
Parameters:
  • mvlist1 (list or generator) – list of MonitoringVariable objects
  • mvlist2 (list or generator) – list of MonitoringVariable objects
Returns:

yields tuples of two MonitoringVariable instances

This function is similar to the standard Python zip() except it matches monitoring variables from two lists or generators by their device and h/w component attributes rather than picking them up sequentially like Python’s zip(). Unlike join(), this function also returns items from the list mvlist1 that have no matching item in mvlist2. In cases like this, returned tuple has None as a second item.
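
Code that consumes left_join() has to be prepared for None in the second position. The sketch below imitates that behavior with a hypothetical Var stand-in for MonitoringVariable; the function name and the attribute names are assumptions made for this example.

```python
from collections import namedtuple

# Hypothetical stand-in for MonitoringVariable with only the matching attributes
Var = namedtuple('Var', ['name', 'device', 'component'])


def left_join_sketch(mvlist1, mvlist2):
    """Yield (a, b) for every a in mvlist1; b is None when nothing matches."""
    index = dict(((b.device, b.component), b) for b in mvlist2)
    for a in mvlist1:
        yield a, index.get((a.device, a.component))


status = [Var('ifOperStatus', 'r1', 'eth0'), Var('ifOperStatus', 'r1', 'eth1')]
rate = [Var('ifInRate', 'r1', 'eth0')]

unmatched = []
for op_status, in_rate in left_join_sketch(status, rate):
    if in_rate is None:
        # no rate variable exists for this interface
        unmatched.append(op_status.component)
```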

See also

join(), join_by_tags()

nw2functions.linear_regression(mvlist, limit=2)
Parameters:
  • mvlist – list of monitoring variables
  • limit – number of observations at the tail end of the time series of mvlist to process
Returns:

yields new modified MonitoringVariable objects

This function applies linear regression to limit observations at the tail end of the time series in each monitoring variable in mvlist and replaces these observations with values calculated using the computed regression parameters. This can be useful in a few special cases. For example, it helps “fill the gaps” when data has NaNs. Another case is when NetSpyGlass collects a counter metric from a device that updates it internally at an interval that is greater than the polling interval in NetSpyGlass. Since NetSpyGlass polls faster than the counter update interval, it gets the same value in consecutive polling cycles. The counter looks like it does not change, then it “jumps” and does not change again, and so on. It is difficult to calculate rate() of a counter like that because in general it is unknown what time interval to use in that calculation. One of the ways to compensate for this problem is to use linear regression to build a model that fits the last few observations and calculate the rate of the result using this model.

The number of observations used to compute regression is the lesser of limit and the size of the time series. This function returns original monitoring variable unchanged if its time series is empty or contains only one observation.

Upon completion, the time series contains calculated observations for the time interval covered by last limit observations in the original time series. This function calculates new value for each polling interval.
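
The fitting step can be sketched with an ordinary least-squares line. The code below is an illustration under stated assumptions, not the NetSpyGlass implementation: the time series is a plain list of (timestamp, value) pairs, regress_tail() is a hypothetical name, and fitted values are computed at the original time stamps rather than on a polling-interval grid.

```python
def regress_tail(observations, limit=2):
    """Replace the last `limit` (timestamp, value) pairs with least-squares fitted values."""
    n = min(limit, len(observations))
    if n < 2:
        return list(observations)  # nothing to fit, return the data unchanged
    head, tail = observations[:-n], observations[-n:]
    mean_t = sum(t for t, _ in tail) / float(n)
    mean_v = sum(v for _, v in tail) / float(n)
    sxx = sum((t - mean_t) ** 2 for t, _ in tail)
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in tail)
    slope = sxy / sxx if sxx else 0.0
    intercept = mean_v - slope * mean_t
    # keep the time stamps, substitute values predicted by the fitted line
    return head + [(t, intercept + slope * t) for t, _ in tail]
```

For a counter that reads 100, 100, 160, 160 at 60-second intervals, fitting all four points gives the values 94, 118, 142 and 166, so the rate computed from any two neighbouring points is a steady 0.4 per second instead of alternating between zero and a spike.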

Note

Approximation leads to loss of accuracy and should be used as a last resort when other methods do not work.
nw2functions.log(mvlist)
Parameters: mvlist – list or generator of monitoring variables
Returns: yields new modified MonitoringVariable objects

This function calculates the natural logarithm (base e) of the last value in the time series of each item in mvlist. Time series of monitoring variables from mvlist are modified in place in such a way that upon return they hold only one observation.

nw2functions.log10(mvlist)
Parameters: mvlist – list or generator of monitoring variables
Returns: yields new modified MonitoringVariable objects

This function calculates the base 10 logarithm of the last value in the time series of each item in mvlist. Time series of monitoring variables from mvlist are modified in place in such a way that upon return they hold only one observation.

nw2functions.max_observation(mvar)
Parameters: mvar – input monitoring variable
Returns: Observation object (that can be 0.0 if timeseries is empty or all observations in the timeseries are NaNs)

Find the observation with maximum value in the timeseries of the input monitoring variable. This is equivalent to writing in Python:

max(mvar.timeseries)

but uses a Java implementation, which improves performance by a factor of 10.

nw2functions.median(mvlist)
Parameters: mvlist (list or generator) – list of monitoring variables
Returns: yields new MonitoringVariable objects

For each monitoring variable object in mvlist, compute median of values of observations, then clear the time series and put computed value back. This function yields the same monitoring variable instances, but their time series have only one observation. The time stamp of these observations is that of the last observation of the input time series.
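
The effect on a single time series can be sketched as follows. This is an illustration only: the time series is represented as a plain list of (timestamp, value) pairs, median_sketch() is a hypothetical name, and averaging the two middle values for an even number of observations is an assumption about how the median is defined here.

```python
def median_sketch(observations):
    """Collapse a list of (timestamp, value) pairs to one pair holding the median value."""
    if not observations:
        return []
    values = sorted(v for _, v in observations)
    mid = len(values) // 2
    if len(values) % 2:
        med = values[mid]
    else:
        # even count: average the two middle values (assumption)
        med = (values[mid - 1] + values[mid]) / 2.0
    # the single remaining observation carries the time stamp of the last input
    last_timestamp = observations[-1][0]
    return [(last_timestamp, med)]
```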

nw2functions.mul(mvlist1, mvlist2)
Parameters:
  • mvlist1 – list of MonitoringVariable objects or generator that yields MonitoringVariable objects
  • mvlist2 – list of MonitoringVariable objects or generator that yields MonitoringVariable objects or a number
Returns:

generator that yields MonitoringVariable objects

Multiply two lists of MonitoringVariables, or a list and a constant.

This function modifies members of mvlist1 and returns generator that yields them.

When both mvlist1 and mvlist2 are lists of MonitoringVariable objects, this function “aligns” items from these lists by matching their device and index attributes and performs the calculation using the latest values from the time series of matching variable pairs. Then, it clears the time series of the item from mvlist1 and puts the result back into the same time series. When this function yields items from mvlist1, they have only one observation in their time series. If the matching monitoring variable from mvlist2 has an empty time series, the result also has an empty time series.

When mvlist2 is a number, this function performs the operation on the last observation from the time series of items in mvlist1 and this number and also returns items of mvlist1, with their time series holding just one observation.
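
The alignment and the "one observation left" behavior can be sketched with a small stand-in class. Everything in this block is hypothetical: Var imitates only the attributes mentioned above, the time series is a list of (timestamp, value) pairs, and items of mvlist1 without a match in mvlist2 are skipped, which is an assumption because the text does not say what happens to them.

```python
class Var(object):
    """Hypothetical stand-in for MonitoringVariable."""
    def __init__(self, device, index, series):
        self.device = device
        self.index = index
        self.series = list(series)  # list of (timestamp, value) pairs


def mul_sketch(mvlist1, other):
    """Multiply last values of matching variables, or of each variable and a number."""
    by_key = None
    if not isinstance(other, (int, float)):
        by_key = dict(((v.device, v.index), v) for v in other)
    for a in mvlist1:
        if by_key is None:
            factor = other
        else:
            b = by_key.get((a.device, a.index))
            if b is None:
                continue  # assumption: unmatched items are skipped
            factor = b.series[-1][1] if b.series else None
        if a.series and factor is not None:
            ts, value = a.series[-1]
            a.series = [(ts, value * factor)]  # modified in place, one observation left
        else:
            a.series = []  # empty input yields empty result
        yield a
```

Because the items of the first list are modified in place, a caller that still needs the original values would have to work on copies.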

nw2functions.mvar_abs(mvlist)

abs(mvlist)

Parameters: mvlist (list or generator) – list of monitoring variables
Returns: yields members of the same list

For each monitoring variable object in mvlist, compute absolute value of the latest value in its time series, then clear the time series and put computed value back. This function yields the same monitoring variable instances, but their time series have only one observation. The time stamp of these observations is that of the last observation of the input time series.

nw2functions.mvar_zip(mvlist1, mvlist2)

An alias for nw2functions.join(), maintained for backward compatibility.

nw2functions.new_var(device_name, component_name, description='')
Parameters:
  • device_name – new device name for this variable
  • component_name – new component name
  • description – optional component description
Returns:

new MonitoringVariable instance

Create new monitoring variable object with given device and component names and optional component description. Description can be passed as optional third argument to new_var(). Description appears in the Graphing Workbench.

Example:

out_rate = filter_by_tags(import_var('ifOutRate'), ['ifBGP4Peer.AS174', '!VariableTags.Aggregate'])
if out_rate:
    # Create new empty variable with given device and component names
    aggr = new_var('Cogent', 'peering')
    # Calculate aggregate value and assign it to aggr
    aggregate(aggr, out_rate)
    # set tag used to avoid adding this variable to itself
    aggr.addTag('VariableTags.Aggregate')
    # finally export calculated aggregate variable under the same variable name.
    # NetSpyGlass will merge this new variable instance into the list of
    # other variables with this name
    export_var('ifOutRate', [aggr])
nw2functions.percentile(mvar, percentage)
Calculate percentile value of observations in monitoring variable mvar
Parameters:
  • mvar (MonitoringVariable) – MonitoringVariable instance
  • percentage – threshold for percentile calculation, as percentage of the total number of observations
Returns:

a number, calculated percentile value or NaN if time series of mvar is empty
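
The text above does not say which percentile definition is used. The sketch below assumes the nearest-rank method and operates on a plain list of values; percentile_sketch() is a hypothetical name, and skipping NaN values is also an assumption.

```python
import math


def percentile_sketch(values, percentage):
    """Nearest-rank percentile of `values`; NaN if there is no usable data."""
    data = sorted(v for v in values if not math.isnan(v))
    if not data:
        return float('nan')
    # rank of the smallest value that has at least `percentage` percent
    # of all observations at or below it
    rank = int(math.ceil(percentage * len(data) / 100.0))
    return data[max(rank, 1) - 1]
```

Under this definition the 50th percentile of 10, 20, 30, 40, 50 is 30 and the 100th percentile is the maximum, 50.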

nw2functions.polling_interval()
Returns: (a number) polling interval value, sec

Returns polling interval value, in seconds. This can be used in rules to scale rates to polling interval.

nw2functions.query_tsdb(triplet, start_time, n_points=1)
Parameters:
  • triplet (string) – Triplet that describes monitoring variable (e.g. “cpuUtil.1.2”)
  • start_time – start time, timestamp
  • n_points – how many observations to retrieve from tsdb
Returns:

list of observations

Query the Time Series Database to get n_points observations for the monitoring variable identified by triplet, beginning at time start_time. Start time is given as a timestamp. Data retrieved from TSDB is returned as a list of Observation objects.

This function is blocking; it only returns when database query completes.

Note

Calls to this function are expensive (potentially may take a long time to complete) and therefore it should be used sparingly. Always watch monitoring variable freeTime (category Monitor in the Graphing Workbench) to make sure the server does not run out of time.

Example 1: Run query using known triplet and store results in the list observations:

import time

now = time.time()
# retrieve one observation, starting 24 hours ago
observations = query_tsdb('cpuUtil.1.2', now - 24*3600, 1)

In the following more meaningful example we prepare an aggregation monitoring variable with name ifInMonthlyTrafficBit that is used to calculate monthly traffic cap, that is, the sum of all traffic that crosses the interface to MyIsp over a period of one month. You can find the rest of the code that does it in Calculating total monthly traffic value (data cap). One of the problems with a calculation like that is that right after a server restart the aggregation variable does not exist and needs to be initialized with the value it had before the restart. Normally the NetSpyGlass server does this automatically by saving values of all variables to a set of files on the file system before it shuts down and loading these values back when it restarts. This mechanism may break down if the server crashes completely, or when an administrator moves the server to a new machine but does not copy the files of this “frozen state”. In a case like that the server starts up with an empty data pool and cannot automatically recover the value of the aggregation variable. A call to nw2functions.query_tsdb() can help to fill the gap.

First, we call nw2functions.get_or_create() to get or create a new variable with name ifInMonthlyTrafficBit, device name MyIsp and “component” name traffic. Since the call to nw2functions.get_or_create() does not include parameter initial_value, the time series of this variable is not initialized if the variable does not exist and a new one is created. In this case, its timeseries buffer is empty and we call query_tsdb() to try to find time series data in the database. The triplet for the query is constructed using the variable name that is known here, as well as device id and index taken from the variable object. If the call to nw2functions.query_tsdb() returns a non-empty list, we take the last observation and put it into the time series of the variable, otherwise we initialize it with “0”. We call nw2functions.query_tsdb() to get 24hrs of data because we do not know for sure how long ago the last observation occurred.

Example:

traffic_mvar = get_or_create('ifInMonthlyTrafficBit', 'MyIsp', 'traffic')
assert isinstance(traffic_mvar, MonitoringVariable)
if not traffic_mvar.timeseries:
    # get latest 24hr of data from tsdb
    triplet = 'ifInMonthlyTrafficBit.{0}.{1}'.format(traffic_mvar.ds.deviceId, traffic_mvar.ds.index)
    observations = query_tsdb(triplet, time.time() - 24*3600, 24*3600 / polling_interval())
    if observations:
        # I only need the last observation
        traffic_mvar.timeseries.put(observations[-1])
    else:
        # there was nothing in the database
        traffic_mvar.timeseries.put(current_timestamp(), 0.0)
nw2functions.rate(mvlist, limit=1)
Parameters:
  • mvlist – list of monitoring variables
  • limit – number of observations at the tail end of the time series of mvlist to process
Returns:

yields new modified MonitoringVariable objects

This function calculates the rate of change of values from the time series of items of mvlist. Time series of monitoring variables from mvlist are modified in place in such a way that upon return they hold no more than limit observations. These observations have values equal to the rate of change of the input values and are calculated using observations taken from the end of the input time series. If the input has NaNs mixed with non-NaN values, NaNs are skipped. This function attempts to find the necessary number of non-NaN values in the input to produce limit observations for the output.

This function assumes input variable is a counter and tries to compensate for the counter roll-over.
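
A rate calculation with roll-over compensation can be sketched as follows. This is not the NetSpyGlass implementation: counter_rate() and its counter_bits parameter are hypothetical, the time series is a plain list of (timestamp, value) pairs, and a negative difference between two readings is assumed to mean that the counter wrapped exactly once.

```python
import math


def counter_rate(observations, limit=1, counter_bits=64):
    """Return up to `limit` (timestamp, rate) pairs from the tail of a counter series."""
    clean = [(t, v) for t, v in observations if not math.isnan(v)]
    # producing `limit` rates requires `limit` + 1 counter readings
    tail = clean[-(limit + 1):]
    modulus = 2 ** counter_bits
    rates = []
    for (t0, v0), (t1, v1) in zip(tail, tail[1:]):
        delta = v1 - v0
        if delta < 0:
            delta += modulus  # compensate for a single counter roll-over
        rates.append((t1, delta / float(t1 - t0)))
    return rates
```

For a 32-bit counter that reads 4294967290 and then 50 one minute later, the compensated difference is 56, which gives a rate of 56/60 per second rather than a large negative number.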

nw2functions.recycle_var(mvar)

Return anonymous MonitoringVariable object back to the pool.

Parameters: mvar – single MonitoringVariable object
nw2functions.set_ctx(ctx)
nw2functions.skip_nans(mvlist)

This function acts as a filter and returns only those instances of MonitoringVariable objects from the input mvlist that have last observation in their time series with a value that is not NaN. This function is implemented like this:

return filter(lambda x: not x.timeseries.isLastNaN(), mvlist)

Note that this function skips monitoring variable instances with last value NaN and those with empty time series.

This filter can be used in combination with aggregate(). If some instances of the input variable passed to aggregate() have last value NaN, the result of the calculation is also NaN because addition of a NaN and a number yields NaN, which breaks the aggregate value. To avoid this, pre-filter variables using skip_nans() before sending them to aggregate():

if_out_rate = filter_by_tags(import_var('ifOutRate'), ['ifBGP4Peer.AS174', '!VariableTags.Aggregate'])
# Create new empty variable with given device and component names
aggr = new_var('Cogent', 'peering')
# Calculate aggregate value and assign it to aggr. Input variable is filtered through skip_nans().
aggregate(aggr, skip_nans(if_out_rate))
aggr.addTag('VariableTags.Aggregate')
export_var('ifOutRate', [aggr])
Parameters:mvlist
Returns:
nw2functions.sub(mvlist1, mvlist2)
Parameters:
  • mvlist1 – list of MonitoringVariable objects or generator that yields MonitoringVariable objects
  • mvlist2 – list of MonitoringVariable objects or generator that yields MonitoringVariable objects or a number
Returns:

generator that yields MonitoringVariable objects

Subtract last values of the time series of the corresponding items from two lists of MonitoringVariables, or subtract a constant from the last data point in the time series in the list passed as first argument.

This function modifies members of mvlist1 and returns generator that yields them.

When both mvlist1 and mvlist2 are lists of MonitoringVariable objects, this function “aligns” items from these lists by matching their device and index attributes and performs the calculation using the latest values from the time series of matching variable pairs. Then, it clears the time series of the item from mvlist1 and puts the result back into the same time series. When this function yields items from mvlist1, they have only one observation in their time series. If the matching monitoring variable from mvlist2 has an empty time series, the result also has an empty time series.

When mvlist2 is a number, this function performs the operation on the last observation from the time series of items in mvlist1 and this number and also returns items of mvlist1, with their time series holding just one observation.

nw2functions.tail(mvlist, limit)
Parameters:
  • mvlist – list of monitoring variables
  • limit – number of observations at the tail end of the time series of mvlist to process
Returns:

yields new modified MonitoringVariable objects

Trim time series of monitoring variables in mvlist, leaving no more than limit observations at the end.

nw2functions.wait_at_barrier()

When the Python rules hook is executed in several parallel threads, the script must call this function before performing any calculations that involve variables from different devices.