10. nw2functions — Operations with monitoring variables

10.1. Summary

This module provides functions that can be used to analyze or manipulate monitoring variables in NetSpyGlass Python scripts.

10.2. Classes and Functions

nw2functions.aggregate(aggr_var, mvlist)
Parameters:
  • aggr_var (MonitoringVariable) – MonitoringVariable instance: this is the variable where we save the result
  • mvlist (list of MonitoringVariable) – source data
Returns:

first argument aggr_var

This function sums the values of the latest observations of the variables in mvlist and adds an observation with that sum to aggr_var.

If input list mvlist is empty, this function does not add any observations to aggr_var.

If last observations in all matching input variables are NaN, this function does not add any observations to aggr_var.

The monitoring variable passed as the first argument gets the additional tag ‘VariableTags.Aggregate’. This function uses this tag to avoid adding aggregated values to themselves.
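The rules above can be sketched in plain Python. This is only an illustration of the documented behavior, not the actual implementation: the helper name aggregate_latest is hypothetical, and it operates on a list of numbers standing in for the latest observation of each variable in mvlist.

```python
import math

def aggregate_latest(latest_values):
    """Return the sum to record, or None when no observation should be added.

    latest_values is a stand-in for the latest observation of each variable
    in mvlist (hypothetical helper, not part of nw2functions).
    """
    if not latest_values:
        return None   # empty input: nothing is added to aggr_var
    if all(math.isnan(v) for v in latest_values):
        return None   # every latest observation is NaN: nothing is added
    # A mix of numbers and NaN propagates NaN through the sum, which is
    # why the documentation recommends pre-filtering with skip_nans().
    return sum(latest_values)

total = aggregate_latest([10.0, 20.0, 12.5])
```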

Example:

# Compute sum of outbound traffic through interfaces of all devices that
# have tag 'ifBGP4Peer.AS174'. This tag is added automatically to all interfaces
# that carry BGP peering sessions with AS174 (COGENT). Assign result to
# the new monitoring variable 'ifOutRateCogent'

if_out_rate = query('FROM ifOutRate WHERE ifBGP4Peer=AS174 AND VariableTags!=Aggregate')
# Create new empty variable with given device and component names
aggr = new_var('Cogent', 'peering')
# Calculate aggregate value and assign it to aggr
aggregate(aggr, if_out_rate)
aggr.addTag('VariableTags.Aggregate')
export_var('ifOutRate', [aggr])

nw2functions.aggregation_query(nsgql_query, duration=0)

Beginning with NSG v2021.x, this function is synonymous with query(). These two functions will be merged into one in the future.

nw2functions.alert(name, input, condition, description, ignore_if=None, details=None, links=None, tags=None, duration=0, percent_duration=100, notification_time=0, streams=None, fan_out=True, action_on_clear=0)
Parameters:
  • name – alert name. Only letters (upper and lowercase), numbers and underscore are permitted. The name must start with a letter.
  • input – iterable or generator of MonitoringVariable objects, typically result of a call to query()
  • condition – a function of two arguments that should return boolean
  • description – one-line alert description
  • ignore_if – a function of one argument that returns boolean
  • details (dictionary) – multi-line detailed information about this alert. Data should be stored in the form of key-value pairs in this dictionary. The system will merge this dictionary with dictionary it creates using device and component ids and monitoring variable values that triggered the alert. Pass {} if no additional details information is needed.
  • links – a list of dictionaries with three keys each: ‘label’, ‘icon’, ‘url’. These dictionaries describe external urls that will be used to render as links in the alerting UI in NSG. Key ‘label’ is used to render the label, key ‘icon’ is the file name of the icon that will appear next to the label and ‘url’ is the url it will link to. We expand macros in the label and url
  • tags (iterable (list, set or generator)) – list of strings - these are the tags that will be added to the alert object and corresponding monitoring variable. Each element in this list should have the form TagFacet.word
  • duration (number) – specifies interval of time, in seconds, during which the value of the input variable must satisfy condition() to trigger the alert
  • percent_duration (number) – alert will be triggered if input value satisfies condition() at least this percentage of the duration time
  • notification_time (number) – minimum notification interval, seconds
  • streams (list of strings) – list of alert notification stream names. Notifications for this alert will be sent there. Default is equivalent to [‘log’], that is, alert will be logged but not sent to any of the outgoing streams such as Pager Duty, Slack, etc.
  • fan_out (boolean) – if True, a separate alert is triggered for each MonitoringVariable instance in input (i.e. a separate alert for each device+component pair). If this parameter is False, then only one alert is generated and information about devices and components is placed in its details dictionary
  • action_on_clear (number) – specifies the action to be taken when alert is cleared. Possible values: 0 - do nothing 1 - close corresponding ticket/incident immediately

This function triggers alert with name if condition defined by the function condition() is met for all or part of the values of the input variable collected during the latest duration seconds, and function ignore_if() is either undefined, or returns False.

This function iterates over MonitoringVariable instances in input, takes observations from their time series that were collected during the latest duration seconds and applies function condition() with the monitoring variable object and the corresponding observation value as two arguments. If parameter percent_duration is 100 (default value), then the alert is triggered if all non-NaN observations collected during the latest duration seconds satisfy condition(). If parameter percent_duration has another non-zero value, it is interpreted as the percentage of the non-NaN observations collected inside the duration interval that must satisfy the condition. Observations with value of NaN are always skipped and do not count towards the percentage. If all observations within the specified duration are NaN, this function returns False.
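The percentage rule can be sketched as follows. This is a minimal illustration under stated assumptions: the helper name condition_met is hypothetical, the observations are plain numbers, and None is passed in place of the MonitoringVariable object.

```python
import math

def condition_met(values, condition, percent_duration=100):
    """Decide whether enough non-NaN observations satisfy condition().

    Hypothetical helper; values stands in for the observations collected
    during the latest `duration` seconds.
    """
    usable = [v for v in values if not math.isnan(v)]   # NaN is always skipped
    if not usable:
        return False                                    # all observations are NaN
    matching = sum(1 for v in usable if condition(None, v))
    return 100.0 * matching / len(usable) >= percent_duration

observations = [80.0, float('nan'), 70.0, 90.0, 60.0]
over_75 = lambda mvar, value: value > 75
half = condition_met(observations, over_75, percent_duration=50)
full = condition_met(observations, over_75, percent_duration=100)
```

Here two of the four usable observations exceed 75, so the 50% rule is satisfied while the default 100% rule is not.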

If parameter ignore_if is provided, it should be a lambda or a function of one argument. When it is called, a reference to the same MonitoringVariable that was passed to condition is passed to it. The alert is ignored, even if the main condition function indicates that it should be active, if this function is present and returns True. This function is called only once when it has been determined that the condition is met and the alert should be active. Function ignore_if acts as a suppression condition. It can be used to build alerts with dependencies, that is, when an alert activates only when another alert it depends on is not active. Another use case for ignore_if is when alert should be suppressed when certain tag is present or missing in the set of tags of the input monitoring variable.

Parameter ignore_if is processed differently for non-fanout alerts (those with fan_out=False). Such alerts process many input variables before it is determined whether the alert should activate, and ignore_if is called only once, after that decision has been made, so there is no single input variable to pass to it. For this reason, the argument is always None when ignore_if is called for non-fanout alerts.

Parameter notification_time specifies the minimal interval between notifications (seconds). If the value is 0 (which is the default), a notification is sent on each cycle while the alert is triggered. If the value is negative, a notification is never sent. This can be useful to create an alert that is always “silenced” but still creates an alerting monitoring variable, which in turn can be used to create other alerts or viewed in dashboards. Finally, a positive number makes the system send notifications no more often than the value of this parameter specifies. If the alert stops firing before this amount of time has passed since the last notification, the timer is reset. In this case, a notification is sent as soon as the alert starts firing again.
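The three cases can be modeled with a small state machine. This is a sketch of the behavior described above, not NSG code: the class NotificationThrottle and its method are hypothetical, and time is passed in explicitly as seconds.

```python
class NotificationThrottle:
    """Hypothetical model of the notification_time rules."""

    def __init__(self, notification_time):
        self.notification_time = notification_time
        self.last_sent = None   # None means the timer is reset

    def should_notify(self, firing, now):
        if not firing:
            self.last_sent = None       # alert stopped firing: reset the timer
            return False
        if self.notification_time < 0:
            return False                # negative value: never notify
        if (self.notification_time == 0
                or self.last_sent is None
                or now - self.last_sent >= self.notification_time):
            self.last_sent = now
            return True
        return False                    # too soon since the last notification

throttle = NotificationThrottle(300)
sent = [throttle.should_notify(firing, now) for firing, now in
        [(True, 0), (True, 60), (True, 300), (False, 360), (True, 420)]]
```

The last entry shows the reset: the alert cleared at 360 seconds, so a notification goes out at 420 seconds even though fewer than 300 seconds have passed since the previous one.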

Parameter streams is a list of strings which are interpreted as names of outbound alert notification streams configured in the main configuration file block alerts.streams. Default is equivalent to [‘log’], that is, alerts are logged but not sent to Pager Duty, Slack or any other services.

Parameter action_on_clear defines the action to be taken when alert gets cleared. There are two possible values: [0, 1]. When it is set to 0 or not specified at all, NSG does nothing. If it is set to 1, the PagerDuty and ServiceNow notification streams make API call to respective service to resolve incident corresponding to the alert.

Typical usage:

alert(
    name='cpu_load_high',
    input=import_var('cpuUtil'),
    condition=lambda mvar, value: value > 75,
    description='CPU utilization is over 75% for 50% of time for the last 10 min',
    details={},
    duration=600,
    percent_duration=50,
    notification_time=300,    # send notification once in 5 min
    fan_out=True
)

Here is an example of alert that fires when BGP session goes into a state that is not “established” while corresponding interface is “up”:

def intf_down(mvar):
    # The MonitoringVariable instance passed as an argument is an instance of the `bgpPeerState`
    # variable. Find the corresponding `ifOperState` variable using device id and tag `BGP4PeerAddress`
    for tag in mvar.getTagsInFacet('BGP4PeerAddress'):
        nsgql = 'FROM ifOperState WHERE deviceId={0} AND BGP4PeerAddress={1}'.format(mvar.ds.deviceId, tag)
        for oper_state in query(nsgql):
            # Assuming the usual IF-MIB encoding where 1 means "up",
            # any greater value means the interface is not up
            return oper_state.timeseries.getLastNonNaNValue() > 1
    return False

def alert_bgp_down(log):

    # take instances of variable `bgpPeerState` and check if the value
    # is not 6 ("established"). If this condition is satisfied, check if the interface
    # that is responsible for this BGP session is in operational state "up"

    alert(
        name='BGPDown',
        input=query('FROM bgpPeerState'),
        condition=lambda mvar, value: value < 6,
        ignore_if=lambda mvar: intf_down(mvar),
        description='BGP Session is down but interface is up',
        details={},
        notification_time=300,
        streams=['log', 'slack'],
        fan_out=True
    )

You can use parameter tags to supply list of tags that will be added to the alert and monitoring variable created for this alert. Each tag should be in the usual format of TagFacet.word.

You can use macros in details and description fields of the alert. We use Velocity engine to expand macros internally, so the syntax is the same as in device and report templates (basically, variable name prepended with a “$”). You can access alert object while it is under construction to insert values of its other fields into description and details:

  • $alert.deviceId device Id for the device that triggered alert. This is valid only for fan-out alerts
  • $alert.deviceName device name, also valid only for the fan-out alerts
  • $alert.componentIndex component or interface index, valid only for fan-out alerts
  • $alert.componentName component or interface name, valid only for fan-out alerts
  • $alert.variable identifier of the corresponding alerting variable, this can be used to construct urls for graphs
  • $alert.inputVariable identifier of the corresponding input variable, this can be used to construct urls for graphs. This may not be available, for example when alerting rule used temporary variable.
  • $alert.value the value of the input variable that triggered alert
  • $alert.key unique deduplication key for the alert
  • $alert.fanout true or false, indicates whether this is a fan-out alert
  • $alert.tags a set of strings. For fan-out alerts this is a copy of tags from the input variable that triggered alert. For non fan-out alert this is a common set of tags from all input variable instances that contributed to the alert.
  • $alert.activeSince a time stamp (time in milliseconds) when alert entered state “active”
  • $alert.activeSinceStr time when alert entered state “active” as a string in time zone specified by the main configuration file parameter network.display.tz
  • $alert.getTags("facet") returns tags in given facet (both facet and tag word, e.g. “BGP4Peer.AS1000”)
  • $alert.getTagWords("facet") returns tags in given facet, but unlike $alert.getTags("facet"), returns only tag words.

Here is a practical example:

alert(
    name='BGPDown',
    input=self.bgp_state_for_intf_up(),
    condition=lambda mvar, value: value < 6,
    description='BGP Session is down but interface is up: $alert.getTagWords("BGP4Peer"), ' +
                '$alert.getTagWords("BGP4PeerAddress")',
    details={},
    notification_time=300,
    fan_out=True,
    streams=['log', 'slack']
)
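The links, tags and details parameters accept plain Python data, so they can be prepared before the call to alert(). The following is only a sketch: the host name, the URL layout, the icon file name and the tag 'Team.network' are all placeholders rather than values defined by NSG.

```python
# All concrete values below are hypothetical placeholders.
links = [
    {
        'label': 'Graph for $alert.deviceName',    # macros are expanded in the label
        'icon': 'graph.png',                       # icon file name (placeholder)
        'url': 'https://nsg.example.com/graph?var=$alert.variable',   # and in the url
    },
]

# Each tag has the form TagFacet.word
tags = ['Team.network']

# Extra key-value pairs that are merged into the alert details
details = {'peer': '$alert.getTagWords("BGP4Peer")'}
```

These values would then be passed to alert() as links=links, tags=tags and details=details.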

nw2functions.copy(mv)
Parameters: mv (MonitoringVariable) – original variable
Returns: new variable that is a copy of mv

Make a deep copy of the monitoring variable, including its tags and time series buffer. Original variables are not modified in any way.

nw2functions.copy_list(mvlist)
Parameters: mvlist – list of MonitoringVariable objects or generator that yields MonitoringVariable objects

Generator that yields copies of MonitoringVariable instances from the list mvlist.

nw2functions.current_cycle_number()
Returns: (a number) current cycle number
nw2functions.current_timestamp()
Returns: (a number) timestamp for all new observations in the current cycle, ms

Returns timestamp value of new observations created in the current monitoring cycle, in milliseconds. The value is suitable for use in net.happygears.nw2.time_series_buffer.TimeSeriesBuffer.put()

nw2functions.execution_id()
Returns: (a number) unique id generated internally for each call to the Python script
nw2functions.export_alert(alert)

“Export” the generated alert by making a gRPC call to the alert manager

Parameters: alert – Alert object (protobuffer NSG.Alert)
nw2functions.export_var(name, mvar)
Parameters:
  • name – monitoring variable name
  • mvar – list or generator of MonitoringVariable objects

“Export” monitoring variable to Java environment. Example:

in_octets = import_var('ifHCInOctets')
in_rate = mul(median(rate(in_octets, POINTS_FOR_MEDIAN)), 8)
export_var('ifInRate', in_rate)

Important

Monitoring variable instances passed to export_var() via its second argument are processed by the server, added to the internal data pool and then recycled. Their contents may change after the call to export_var() at any moment. Do not use these monitoring variables even if you pass them to export_var() wrapped in Python list.

nw2functions.filter_by_tags(mvlist1, tags)
Parameters:
  • mvlist1 – a Query, list, or generator that yields MonitoringVariable instances
  • tags – list of strings - tags to match MonitoringVariable objects from mvlist1 (see below)
Returns:

Query or a generator that yields MonitoringVariable instances from mvlist1

This is a legacy function that will be deprecated in the future. Only the Dropbox script rules.py uses it, for historical reasons. The first argument always comes from a previous call to import_var() and therefore must always be an instance of QueryBuilder.

Returns Query or a generator that yields list of MonitoringVariable instances filtered by combination of tags

This function does not modify the list passed as argument

Tags in tags have the following format: ‘TagFacet.word’. A tag string can be preceded with ‘!’ to indicate negation, that is, that this tag must not be present: ‘!TagFacet.word’. Tags in tags are combined using the logical AND operation. That is, for the tags list:

['Facet1.word', 'Facet2.word', '!Facet3.word', '!Facet4.word']

the expression is:

('Facet1.word' in mvar.tags and 'Facet2.word' in mvar.tags and
    'Facet3.word' not in mvar.tags and 'Facet4.word' not in mvar.tags)
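The same matching rule can be written for an arbitrary tags list. This is a sketch only: the helper name matches_tags is hypothetical, and a plain Python set stands in for the tags of a monitoring variable.

```python
def matches_tags(var_tags, tag_filters):
    """Return True if var_tags satisfies every entry in tag_filters.

    Hypothetical helper; entries starting with '!' must be absent,
    all other entries must be present (logical AND).
    """
    for tag in tag_filters:
        if tag.startswith('!'):
            if tag[1:] in var_tags:
                return False    # negated tag is present
        elif tag not in var_tags:
            return False        # required tag is missing
    return True

filters = ['ifBGP4Peer.AS174', '!VariableTags.Aggregate']
plain = matches_tags({'ifBGP4Peer.AS174', 'ifRole.peering'}, filters)
aggregated = matches_tags({'ifBGP4Peer.AS174', 'VariableTags.Aggregate'}, filters)
```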

Example:

if_out_rate = import_var('ifOutRate')
filtered_var = filter_by_tags(if_out_rate, ['ifBGP4Peer.AS174'])
aggr_var = reduce(lambda x, y: x + y, filtered_var)

The call to filter_by_tags() acts as a filter and returns only instances of if_out_rate that have the tag ‘ifBGP4Peer.AS174’. The call to reduce() uses the provided function to accumulate the sum of the values of the last observation in each instance and assigns it to the newly created monitoring variable object. This works because the Java class MonitoringVariable has the “magic” function __radd__() with semantics suitable for this kind of operation.

nw2functions.import_var(name)

“Import” monitoring variable from Java environment. This function makes and returns copies of the MonitoringVariable instances. Example:

in_octets = import_var('ifHCInOctets')
Parameters: name – monitoring variable name
Returns: Iterator that returns MonitoringVariable objects. If name is unknown, the iterator returns nothing.
nw2functions.join(mvlist1, mvlist2)
Parameters:
  • mvlist1 (list or generator) – list of MonitoringVariable objects
  • mvlist2 (list or generator) – list of MonitoringVariable objects
Returns:

yields tuples of two MonitoringVariable instances

This function is similar to the standard Python zip() except it matches monitoring variables from two lists by their device and h/w component attributes rather than picking them up sequentially like Python’s zip().

See also

left_join()

nw2functions.left_join(mvlist1, mvlist2)
Parameters:
  • mvlist1 (list or generator) – list of MonitoringVariable objects
  • mvlist2 (list or generator) – list of MonitoringVariable objects
Returns:

yields tuples of two MonitoringVariable instances

This function is similar to the standard Python zip() except it matches monitoring variables from two lists or generators by their device and h/w component attributes rather than picking them up sequentially like Python’s zip(). Unlike join(), this function also returns items from the list mvlist1 that have no matching item in mvlist2. In cases like this, returned tuple has None as a second item.

See also

join()
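The difference between join() and left_join() can be illustrated with stand-in objects. This sketch makes several assumptions: the Var tuple and both helper functions are hypothetical, and matching is done on a (device, component) pair rather than on real MonitoringVariable attributes.

```python
from collections import namedtuple

# Stand-in for MonitoringVariable (hypothetical)
Var = namedtuple('Var', 'device component value')

def join_sketch(list1, list2):
    """Yield pairs only for items that have a match in list2."""
    index = {(v.device, v.component): v for v in list2}
    for v in list1:
        match = index.get((v.device, v.component))
        if match is not None:
            yield v, match

def left_join_sketch(list1, list2):
    """Yield a pair for every item in list1; the second item may be None."""
    index = {(v.device, v.component): v for v in list2}
    for v in list1:
        yield v, index.get((v.device, v.component))

rates = [Var('r1', 'eth0', 10), Var('r1', 'eth1', 20)]
speeds = [Var('r1', 'eth0', 1000)]
inner = list(join_sketch(rates, speeds))
outer = list(left_join_sketch(rates, speeds))
```

Here inner holds one pair for eth0, while outer holds two pairs, the second of which has None in place of the missing eth1 entry.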

nw2functions.new_var(device_name, component_name, description='')
Parameters:
  • device_name – new device name for this variable
  • component_name – new component name
  • description – optional component description
Returns:

new MonitoringVariable instance

Find variable with given name, device and component, or create it if it does not exist. Here device and component are referred by name. This function can be used to create persistent aggregate variables with value that survives server restart.

nw2functions.polling_interval()
Returns: (a number) polling interval value, sec

Returns polling interval value, in seconds. This can be used in rules to scale rates to polling interval.

nw2functions.query(nsgql_query, duration=0)

Import monitoring variables from the Java environment by running an NsgQL query that begins with the clause “FROM” and optionally has the clause “WHERE”. The variable name should follow the “FROM” keyword (it acts as an SQL “table”). This function returns MonitoringVariable instances constructed on the fly, with time series data downloaded from TSDB. This makes it possible to use this function in NSG servers to compute something using input variables that are not present in the memory pool of the server that runs the script. This is typically the case with aggregate calculations.

IMPORTANT: NsgQL queries used with this method are subject to the following limitations: operators ‘DISTINCT’, ‘GROUP BY’, ‘LIMIT’, ‘OFFSET’ are not supported.

This function has access to the complete global set of monitoring variables rather than only those stored in the data pool in memory of the server that executes it.

This function is useful for the aggregation calculations.

Behavior of this function depends on the execution context. In the Python app that performs aggregation calculations this function executes NsgQL query and returns iterable that returns MonitoringVariable objects. In the context of the Python app that generates alerts, this function simply returns its argument and lets internal alerting framework execute the query later. However, the call format is identical in both cases and the Python script designer does not have to worry about this difference.

Examples:

in_octets = aggregation_query('FROM ifHCInOctets')
in_octets = aggregation_query('FROM ifHCInOctets WHERE ifDescription=TP')
Parameters:
  • nsgql_query – NsgQL query
  • duration – time series duration in seconds (this function always retrieves the most recent data points). If not provided or value is 0, then only the most recent point is returned.
Returns:

Iterator that returns MonitoringVariable objects. If the variable name is unknown, the iterator returns nothing.

nw2functions.set_ctx(ctx)
nw2functions.skip_nans(mvlist)

This function acts as a filter and returns only those instances of MonitoringVariable objects from the input mvlist that have last observation in their time series with a value that is not NaN. This function is implemented like this:

return filter(lambda x: not x.timeseries.isLastNaN(), mvlist)

Note that this function skips monitoring variable instances with last value NaN and those with empty time series.
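The filter can be exercised outside of NSG with stand-in objects. In this sketch the classes FakeSeries and FakeVar are hypothetical, and it is an assumption that isLastNaN() reports an empty time series as NaN, which is what the note above implies.

```python
import math

class FakeSeries:
    """Stand-in for a time series buffer (hypothetical)."""
    def __init__(self, values):
        self.values = values

    def isLastNaN(self):
        # Assumption: an empty series is treated the same as a trailing NaN
        return not self.values or math.isnan(self.values[-1])

class FakeVar:
    """Stand-in for MonitoringVariable (hypothetical)."""
    def __init__(self, name, values):
        self.name = name
        self.timeseries = FakeSeries(values)

def skip_nans_sketch(mvlist):
    # Same expression as the documented implementation
    return filter(lambda x: not x.timeseries.isLastNaN(), mvlist)

variables = [
    FakeVar('a', [1.0, 2.0]),
    FakeVar('b', [3.0, float('nan')]),
    FakeVar('c', []),
]
kept = [v.name for v in skip_nans_sketch(variables)]
```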

This filter can be used in combination with aggregate(). If some instances of the input variable passed to aggregate() have last value NaN, the result of the calculation is also NaN because addition of a NaN and a number yields NaN, which breaks the aggregate value. To avoid this, pre-filter variables using skip_nans() before sending them to aggregate():

if_out_rate = filter_by_tags(import_var('ifOutRate'), ['ifBGP4Peer.AS174', '!VariableTags.Aggregate'])
# Create new empty variable with given device and component names
aggr = new_var('Cogent', 'peering')
# Calculate aggregate value and assign it to aggr. Input variable is filtered through skip_nans().
aggregate(aggr, skip_nans(if_out_rate))
aggr.addTag('VariableTags.Aggregate')
export_var('ifOutRate', [aggr])
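Parameters: mvlist – list or generator of MonitoringVariable objects
Returns: MonitoringVariable instances from mvlist whose last observation is not NaN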
nw2functions.tsmax(mvar)

Calculates maximum value from the time series buffer of mvar

Example:

from nw2functions import tsmax

assert isinstance(mv, MonitoringVariable)
value = tsmax(mv)
Parameters: mvar – monitoring variable instance
Returns: (number) calculated value
nw2functions.tsmean(mvar)

Calculates the average value across all observations in the time series buffer of mvar

Example:

from nw2functions import tsmean

assert isinstance(mv, MonitoringVariable)
value = tsmean(mv)
Parameters: mvar – monitoring variable instance
Returns: (number) calculated value