8.7. Alert Notifications

Notifications are sent to notification streams according to the following rules:

  1. When an alert’s state changes from cleared to active, or the alert is created for the first time and is immediately in the state active, it sends a notification to each stream listed in the parameter streams. Before the notification is sent, NetSpyGlass checks existing silence objects to see if any of them matches the alert. If a match is found, the notification is suppressed and only a log record of the active but silenced alert is made. If no matching silence is found, the notification is delivered to all streams.
  2. If the input variable continues to satisfy the alert condition on subsequent calls to alert(), the alert remains in the state active. However, it does not send a notification on every monitoring cycle. As long as the condition persists, new notifications are sent at the interval defined by the parameter notification_time (in seconds). In the example above this parameter has the value 300 sec, which means the alert sends a notification every 5 min as long as it remains in the state active.
  3. If on the next call to alert() it is determined that the input variable no longer satisfies the condition, the alert goes into the state cleared and a log record is made. Also, for PagerDuty and ServiceNow streams, when the alert’s parameter action_on_clear is set to 1, NetSpyGlass sends API requests to resolve the corresponding incidents. If action_on_clear is not specified or is set to 0, no attempt to resolve the incident is made.
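The three rules above can be modelled as a small state machine. The following Python sketch is illustrative only: the class AlertState, the function evaluate() and the action names are hypothetical, and it assumes (the text does not say so) that silences are also checked for repeat notifications and that a repeat is due once notification_time seconds have elapsed.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AlertState:
    active: bool = False
    last_notified: Optional[float] = None  # seconds; None while cleared


def evaluate(state: AlertState, condition_met: bool, now: float,
             notification_time: float, silenced: bool,
             action_on_clear: int) -> List[str]:
    """Return the actions for one call to alert() (hypothetical model)."""
    actions: List[str] = []
    if condition_met:
        # Rule 1: cleared -> active; rule 2: repeat after notification_time.
        due = (not state.active
               or now - state.last_notified >= notification_time)
        state.active = True
        if due:
            state.last_notified = now
            # A matching silence suppresses delivery; only a log record is made.
            actions.append("log_silenced" if silenced else "notify")
    elif state.active:
        # Rule 3: active -> cleared, always logged.
        state.active = False
        state.last_notified = None
        actions.append("log_cleared")
        if action_on_clear:
            # Only meaningful for PagerDuty and ServiceNow streams.
            actions.append("resolve_incident")
    return actions
```

With notification_time=300, an alert that stays active notifies at t=0 and again at t=300, and emits nothing in between.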

When NetSpyGlass creates an alert, it generates a unique key and stores it with the alert. The key is unique to the alert object. If the alert is “fan-out”, that is, a separate alert object has been created for each device and component, then the key is unique to the combination of alert name, device and component. If the alert is not “fan-out”, that is, only one alert object has been created for the input monitoring variable and information about matching devices and components is passed via the alert field details, then the key is based only on the alert name and does not include device and component. In either case, the key can be passed to the notification service for deduplication. Of the services we currently support, only PagerDuty uses it. You can always use the macro $alert.key to expose the key in log records, email or Slack messages.
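A minimal sketch of how such a key could be composed. The real format of $alert.key is internal to NetSpyGlass; alert_key() is a hypothetical helper and the colon-separated format is an assumption used only to show which parts participate in the key.

```python
def alert_key(name, fan_out, device_id=None, component_index=None):
    """Hypothetical deduplication key: fan-out keys include device and component."""
    if fan_out:
        parts = [name, str(device_id), str(component_index)]
    else:
        # Non fan-out alerts: one object per input variable, key uses the name only.
        parts = [name]
    return ":".join(parts)
```

Two components of the same device therefore get different keys for a fan-out alert, while a non fan-out alert always has a single key.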

8.8. Notification streams

Notification streams are configured in the top-level parameter alerts in the main configuration file nw2.conf. The default configuration includes a working configuration of the logging stream and templates for other streams, and looks like this:

graph24hr = ${ui.url}"emb_graph.html?update=true&networkId=1&intervalHr=24&width=500&height=300&vars=$alert.inputVariable"
graph12hr = ${ui.url}"emb_graph.html?update=true&networkId=1&intervalHr=12&width=500&height=300&vars=$alert.inputVariable"
graph6hr  = ${ui.url}"emb_graph.html?update=true&networkId=1&intervalHr=6&width=500&height=300&vars=$alert.inputVariable"

alerts {

    # the size of the internal queue used to accumulate alerts when alerting
    # service is not available. changes to this parameter require server restart
    queueSize = 1000

    # list of "services" we send alert notifications to. Multiple services can
    # be enabled at the same time. You need to restart the server when you add
    # or remove a service, however restart is not required
    # for changes to service configuration (even if you change web hook url)

    streams {

        #--------------------------------------------------------------------
        # PagerDuty
        # Change your service id
        # You can use macros in `description` and `details`

        pagerduty {
            type = pagerduty

            # See PagerDuty developers documentation under Integration / Events
            triggerUrl = "https://events.pagerduty.com/generic/2010-04-15/create_event.json"

            # API service key. Find it in PagerDuty under Configuration / Services
            service = "xxxxxxxxxxxxxxxxxxxxxxxxxxx"

            # Name of the API client, this appears in PagerDuty dashboards and incidents lists
            client = "NetSpyGlass"

            # PagerDuty constructs "View in NetSpyGlass" link using parameters `client` and `clientUrl`
            clientUrl = ${ui.url}

            # this appears in the incident view under "Details"
            details {
                "device": "$alert.deviceName"
                "component": "$alert.componentName"
                "value": "$alert.value"
            }

            # Contexts to be included with the incident trigger such as links to graphs or images. See
            # PagerDuty developers docs.
            contexts = [
                {
                    type: link
                    href: ${ui.url}
                },
                {
                    type: image
                    src: ${graph24hr}
                }
            ]
        },

        #--------------------------------------------------------------------
        # Slack:
        # you can use macros in fields `channel` and `template`. This makes it
        # possible to pass Slack channel name from the alert definition via
        # `$alert.details.slack_channel`

        slack : {
            type = slack
            channel = "#netspyglass"
            username = netspyglass
            # replace this with your webhook url
            webHookUrl = "https://hooks.slack.com/services/TTTTTTTTT/BBBBBBBBB/01234567890"
            template = """
*$alert.name* : $alert.deviceName : $alert.componentName
$alert.description | latest value: $alert.value
active since: $alert.activeSinceStr | """ "<"${graph24hr}"|graph(24hr)> | <"${graph12hr}"|graph(12hr)> | <"${graph6hr}"|graph(6hr)>"
        },

        #--------------------------------------------------------------------
        # email notifications
        # change `hostName`, `smtpAuth` (if applicable) and other parameters.
        # You can use macros in `subject` and `message`

        email {
            type = email,
            hostName = "your_smtp_gateway",
            port = 587,
            smtpAuth {
                user = "mailsender",
                password = "enter_password_here",
            },
            startTLSEnabled = true,
            startTLSRequired = false,
            from = "netspyglass@company.com",
            to = "alerts@company.com",
            subject = "$alert.variable | $alert.deviceName | $alert.componentName | active since: $alert.activeSinceStr",
            message = """
$alert.name : $alert.deviceName : $alert.componentName
$alert.description
latest value: $alert.value
active since: $alert.activeSinceStr

"""${graph24hr}"""
"""${graph12hr}"""
"""${graph6hr}

        },

        #--------------------------------------------------------------------
        # log (File logs/alerts.log)
        # use macros in `template`

        log : {
            type = logger
            path = ${home}"/logs/alerts.log"
            template = "$alert.variable | $alert.deviceName | $alert.componentName | active since: $alert.activeSinceStr"
        }
    }
}

You can find this in the copy of the default configuration provided as part of the NetSpyGlass distribution package, see the file /opt/netspyglass/current/doc/default_config/netspyglass.conf

Parameter alerts.queueSize defines the capacity of the internal message queue maintained by each notification stream (except log). A notification is added to the queue when an alert triggers and sends it. Each stream takes messages from this queue and delivers them to the respective service asynchronously in a background thread, with a timeout and exponential back-off if it detects problems with the service.
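The queueing behaviour described above can be sketched in Python as follows. This is not NetSpyGlass code: the class, its parameters and the back-off constants are assumptions, and it also assumes that a new notification is dropped when the queue is full. The send callable is expected to enforce its own network timeout.

```python
import queue
import threading
import time


class NotificationStream:
    """Hypothetical model of one stream's bounded queue and background sender."""

    def __init__(self, send, queue_size=1000, base_delay=1.0,
                 max_delay=60.0, max_attempts=5, sleep=time.sleep):
        self._queue = queue.Queue(maxsize=queue_size)  # like alerts.queueSize
        self._send = send  # callable(message); raises on failure
        self._base_delay = base_delay
        self._max_delay = max_delay
        self._max_attempts = max_attempts
        self._sleep = sleep
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def enqueue(self, message):
        """Queue a notification without blocking; return False if the queue is full."""
        try:
            self._queue.put_nowait(message)
            return True
        except queue.Full:
            return False  # assumption: the new notification is dropped

    def _run(self):
        while True:
            message = self._queue.get()
            if message is None:  # sentinel put by close()
                return
            delay = self._base_delay
            for attempt in range(self._max_attempts):
                try:
                    self._send(message)
                    break
                except Exception:
                    if attempt < self._max_attempts - 1:
                        self._sleep(delay)
                        delay = min(delay * 2, self._max_delay)  # exponential back-off

    def close(self):
        """Stop the worker after it has drained the messages queued so far."""
        self._queue.put(None)
        self._worker.join()
```

Injecting sleep keeps the back-off testable; a service that fails twice and then recovers is retried after waits of 1 s and 2 s.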

The actual notification streams are configured inside the dictionary alerts.streams. Each stream is described by an item in this dictionary, where the key is the stream name and the value is the stream’s configuration. It is this stream name that should be placed in the argument streams in the call to the function alert() that creates the alert. The stream type (e.g. “logger”, “email” etc.) is defined by the attribute with the key type in the corresponding stream dictionary. You can have multiple items inside alerts.streams with the same type but different names.

Note

This structure of configuration parameters allows you to have several streams of the same type but with different names and different configurations. For example, you could have several streams of type “email” that send email to different destination addresses, or separate streams of type “slack” that post to different Slack channels. Each of these streams has a distinct name, which allows you to route notifications created by different alerts to different email recipients or Slack channels.
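To illustrate how stream names are resolved, here is a hypothetical Python sketch. The stream names email-noc and email-dba and their addresses are made up, and raising an error for an undefined stream is an assumption about behaviour that NetSpyGlass may handle differently (for example, by logging).

```python
# Hypothetical contents of alerts.streams: two streams of the same type.
STREAMS = {
    "log": {"type": "logger"},
    "email-noc": {"type": "email", "to": "noc@company.com"},
    "email-dba": {"type": "email", "to": "dba@company.com"},
}


def resolve_streams(stream_names, streams_config):
    """Map the names listed in alert(streams=[...]) to (name, type, config) tuples."""
    resolved = []
    for name in stream_names:
        config = streams_config.get(name)
        if config is None:
            raise KeyError(f"notification stream {name!r} is not defined in alerts.streams")
        resolved.append((name, config["type"], config))
    return resolved
```

With this configuration, an alert created with streams=['email-dba', 'log'] would be emailed to the DBA address and also written to the log, while alerts listing email-noc reach a different recipient through a stream of the same type.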

8.8.1. Properties

Some parameters of notification streams can be constructed using the same macros used to construct alert attributes (see also nw2functions.alert()). These macros refer to properties of the object that encapsulates the alert and of the corresponding device (if any).

  • $alert.deviceId device Id of the device that triggered the alert. This is valid only for fan-out alerts
  • $alert.deviceName device name, also valid only for fan-out alerts
  • $alert.componentIndex component or interface index, valid only for fan-out alerts
  • $alert.componentName component or interface name, valid only for fan-out alerts
  • $alert.variable identifier of the corresponding alerting variable, this can be used to construct urls for graphs
  • $alert.inputVariable identifier of the corresponding input variable, this can be used to construct urls for graphs. This may not be available, for example when the alerting rule used a temporary variable.
  • $alert.value the value of the input variable that triggered the alert
  • $alert.key unique deduplication key for the alert
  • $alert.fanout true or false, indicates whether this is a fan-out alert
  • $alert.tags a set of strings. For fan-out alerts this is a copy of the tags of the input variable that triggered the alert. For non fan-out alerts this is the common set of tags of all input variable instances that contributed to the alert.
  • $alert.tagsMap a dictionary where the key is a tag facet and the value is a list of tags in that facet. You can use it to get tags only in a specific facet, e.g. $alert.tagsMap.Role returns the list of roles of the device that triggered the alert.
  • $alert.activeSince a timestamp (time in milliseconds) when the alert entered the state “active”
  • $alert.activeSinceStr time when the alert entered the state “active”, as a string in the time zone specified by the main configuration file parameter network.display.tz
  • $alert.getTags("facet") returns tags in the given facet (both facet and tag word, e.g. “BGP4Peer.AS1000”)
  • $alert.getTagWords("facet") returns tags in the given facet, but unlike $alert.getTags("facet"), returns only tag words.
  • $alert.details returns the dictionary passed as the argument details to the function nw2functions.alert()
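The difference between $alert.tagsMap, $alert.getTags() and $alert.getTagWords() can be sketched with plain Python helpers. These functions are hypothetical stand-ins that assume tags have the form “Facet.word”; in particular, whether the values of tagsMap hold full tags or only tag words is an assumption here (this sketch stores tag words).

```python
from collections import defaultdict


def tags_map(tags):
    """Group 'Facet.word' tags by facet (stand-in for $alert.tagsMap)."""
    result = defaultdict(list)
    for tag in sorted(tags):
        facet, _, word = tag.partition(".")
        result[facet].append(word)
    return dict(result)


def get_tags(tags, facet):
    """Full tags in one facet (stand-in for $alert.getTags("facet"))."""
    return sorted(t for t in tags if t.startswith(facet + "."))


def get_tag_words(tags, facet):
    """Only the tag words (stand-in for $alert.getTagWords("facet"))."""
    return [t.split(".", 1)[1] for t in get_tags(tags, facet)]
```

For the tag BGP4Peer.AS1000, get_tags(tags, "BGP4Peer") returns ["BGP4Peer.AS1000"] while get_tag_words(tags, "BGP4Peer") returns ["AS1000"].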

Properties of the object returned by $device (note that some of the properties have two names):

  • $device.name device name
  • $device.reverse_dns reverse DNS (PTR) record for the device’s management address
  • $device.sys_name system name defined in SYSTEM MIB
  • $device.location location defined in the SYSTEM MIB
  • $device.contact contact name defined in the SYSTEM MIB
  • $device.description “system description” returned by SYSTEM:sysDescr OID
  • $device.sys_descr “system description” returned by SYSTEM:sysDescr OID
  • $device.box_descr “box description”: usually model name and number
  • $device.boxdescr “box description”: usually model name and number
  • $device.sw_rev software revision
  • $device.hostname host name as defined on the device
  • $device.address management address of the device
  • $device.deviceaddress management address of the device
  • $device.tags set of tags of this device
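A simplified model of macro expansion, assuming the alert and device properties are available as nested dictionaries. expand_macros() is a hypothetical helper: it handles dotted property access such as $alert.details.slack_channel but not method calls like $alert.getTags("facet"), and leaving unknown macros untouched is an assumption.

```python
import re

# $alert or $device followed by one or more ".name" parts.
MACRO = re.compile(r"\$(alert|device)((?:\.\w+)+)")


def expand_macros(template, alert, device):
    """Replace $alert.x.y / $device.x with values from nested dicts (sketch)."""
    roots = {"alert": alert, "device": device}

    def lookup(match):
        value = roots[match.group(1)]
        for key in match.group(2).lstrip(".").split("."):
            if not isinstance(value, dict) or key not in value:
                return match.group(0)  # leave unknown macros untouched
            value = value[key]
        return str(value)

    return MACRO.sub(lookup, template)
```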

8.8.2. Mixing Macros and configuration parameter expansion

The NetSpyGlass configuration file supports parameter expansion using the syntax ${path.to.variable}. This looks similar to the macros you can use in alert and notification fields, but it is not the same. Configuration file parameter expansion refers to other parameters in the configuration file and only works for unquoted parameters: configuration parameters are not expanded inside quoted strings. Alert and notification macros, on the other hand, are only expanded if they are part of particular configuration parameters and are inside quoted strings. Consider the template used to build email notifications in the default configuration. It looks like this:

message = """
$alert.name : $alert.deviceName : $alert.componentName
$alert.description
latest value: $alert.value
active since: $alert.activeSinceStr

"""${graph24hr}"""
"""${graph12hr}"""
"""${graph6hr}

This configuration mixes configuration parameter expansion (${graph24hr}, ${graph12hr} and ${graph6hr}) with macros ($alert.name, $alert.description etc.). The configuration parameters graph24hr, graph12hr and graph6hr are defined in the same configuration file and are used to construct URLs that open a graph of the variable that triggered the alert. You can find their definitions in the configuration example above.

Note how this template uses multi-line strings in triple quotes (""") and how the alert field macros are located inside the triple-quoted text, while configuration parameter expansions such as ${graph24hr} are outside of it. We also had to use triple-quoted text again to put the links to the 24hr, 12hr and 6hr graphs on separate lines.

Configuration parameters ${graph24hr}, ${graph12hr} and ${graph6hr} are expanded early, when the configuration file is loaded. At this time the configuration parameter message is constructed by concatenating all the strings it is composed of, including the results of expanding ${graph24hr} and the others. As a result, the parameter message is multi-line text with some alert and notification macros inside (these macros get there both because they are entered directly as part of the message value and because they are part of the graph24hr value). The value of the parameter message is used as a template to build the email message body when a notification is about to be sent, at which time the alert and notification macros are expanded.
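The two expansion phases can be sketched in Python. This ignores the quoting rules of the real configuration parser (which expands ${...} only outside quotes) and treats the concatenated message as one string; both helper names and the host nsg.example.com are hypothetical.

```python
import re

CONFIG_REF = re.compile(r"\$\{([\w.]+)\}")  # ${path.to.param}
ALERT_MACRO = re.compile(r"\$alert\.(\w+)")  # $alert.field


def load_time_expand(raw, config):
    """Phase 1 (config load): substitute ${...} with other config parameters."""
    return CONFIG_REF.sub(lambda m: config[m.group(1)], raw)


def send_time_expand(template, alert):
    """Phase 2 (notification): substitute $alert.* macros from the alert."""
    return ALERT_MACRO.sub(lambda m: str(alert.get(m.group(1), m.group(0))), template)
```

After phase 1 the message still contains $alert.inputVariable, which came from the graph24hr value; only phase 2 replaces it with the real variable name.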

8.8.3. Log

This is the simplest notification stream. By default it writes to the file /opt/netspyglass/var/logs/alerts.log. Each log record is constructed by expanding macros in the string defined by the parameter template in the configuration of this stream.

8.8.4. Email

Email stream requires the following configuration parameters:

  • type = email
  • hostName : this is an address or host name of the mail gateway to be used. This parameter is mandatory.
  • port : port number to use. This parameter is mandatory.
  • smtpAuth.user : if the mail gateway requires authentication, this should be the user name to use. This parameter is optional.
  • smtpAuth.password : if authentication is required, put the plain text password here. This parameter is optional.
  • startTLSEnabled : enable TLS. This parameter is optional.
  • startTLSRequired : require TLS in communication with the gateway. This parameter is optional.
  • from : the address to use in the “From” header.
  • to : destination address to send email to. This parameter is mandatory.
  • subject : mail subject. You can use macros in this field.
  • message : mail body. You can also use macros in this field. Note that the configuration file used by NetSpyGlass supports multi-line text enclosed in triple quotes (see the default configuration above for an example).
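A sketch of how these parameters could map onto Python’s standard smtplib and email modules. This is not how NetSpyGlass sends mail; the helper names are hypothetical, and the interpretation of startTLSRequired (fail if the gateway does not offer STARTTLS) is an assumption.

```python
import smtplib
from email.message import EmailMessage


def build_message(cfg, subject, body):
    """Build the message from the stream config and already-expanded subject/body."""
    msg = EmailMessage()
    msg["From"] = cfg["from"]
    msg["To"] = cfg["to"]
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


def send(cfg, msg):
    """Deliver through the gateway named by hostName/port (sketch)."""
    with smtplib.SMTP(cfg["hostName"], cfg["port"], timeout=30) as smtp:
        if cfg.get("startTLSEnabled"):
            smtp.ehlo()
            if smtp.has_extn("starttls"):
                smtp.starttls()
                smtp.ehlo()
            elif cfg.get("startTLSRequired"):
                raise RuntimeError("gateway does not offer STARTTLS")
        auth = cfg.get("smtpAuth")
        if auth:  # smtpAuth is optional
            smtp.login(auth["user"], auth["password"])
        smtp.send_message(msg)
```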

8.8.5. Slack

Slack stream requires the following configuration parameters:

  • type = slack

  • channel : channel name to post to. This overrides the channel configured for the web hook in Slack. Note that you can override this again by adding the channel name to the parameter details in the call to the function alert() like this:

    alert(
      name='bigChangeInVariables',
      input=import_var('numVars'),
      condition=lambda mvar, x: compare_to_mean(mvar, x),
      description='Big change in the number of monitoring variables',
      details={'slack_channel': '#netspyglass'},
      notification_time=300,
      streams=['slack', 'log'],
      fan_out=True
    )
    

    Passing a dictionary item with the key ‘slack_channel’ as part of the parameter details makes the Slack connector use this channel instead of the one configured in the notification stream. The Slack channel must exist before the first message can be posted to it.

  • username : this is the name of the user that will appear to be the author of messages posted to Slack channel.

  • webHookUrl : web hook url. It looks similar to this: https://hooks.slack.com/services/TTTTTTTTT/BBBBBBBBB/01234567890

  • template : this is the template for the message to be posted to Slack. This can be a multi-line text with macros.
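A minimal sketch of the JSON body a Slack incoming webhook accepts, with the channel override from details applied. slack_payload() and post() are hypothetical helpers; note that, depending on how the webhook was created, Slack may ignore the channel and username overrides.

```python
import json
import urllib.request


def slack_payload(stream_cfg, text, details=None):
    """Build the webhook JSON; details['slack_channel'] overrides the stream channel."""
    channel = (details or {}).get("slack_channel", stream_cfg["channel"])
    return json.dumps({
        "channel": channel,
        "username": stream_cfg["username"],
        "text": text,  # the already macro-expanded `template`
    })


def post(web_hook_url, body):
    """POST the JSON body to the incoming webhook and return the HTTP status."""
    request = urllib.request.Request(
        web_hook_url, data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"}, method="POST")
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status
```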

8.8.6. PagerDuty

Note

The PagerDuty notification stream using PagerDuty APIv1 has been deprecated.

The PagerDuty stream uses PagerDuty APIv2 (https://developer.pagerduty.com/api-reference/docs/CONCEPTS.md). To create this notification stream, add a configuration dictionary with a unique name and type pagerduty under alerts.streams. It supports the following configuration parameters:

  • type = pagerduty
  • apiVersion : (optional) PagerDuty API version to use. The value can be 1 or 2 (numeric). If this parameter is missing, NetSpyGlass uses APIv2 by default
  • apiKey : API key. Create it in PagerDuty under Configuration / API Access Keys. If this parameter is not defined in the configuration of the notification stream, NetSpyGlass can still submit alerts to PagerDuty but cannot query for incident number and state (see below)
  • verbose : supported values are 0 and 1; setting this to 1 turns on additional logging to help with troubleshooting
  • integrationKey : the default integration key used to route triggered alerts to a PagerDuty service. You can override it in each alert to route alerts to different services, so the key set in the notification stream configuration acts only as a default or “catch all”. The integration key appears on the PagerDuty service’s detail page under “Integrations”.
  • deduplicationKey : the value of this parameter can accept macros and defaults to $alert.key. It is used to set incident key in PagerDuty and is used for deduplication of alerts when they are assigned to an incident
  • summary : the value of this parameter is used to generate Summary field of the PagerDuty alert. You can use macros in this field.
  • details : data provided in this dictionary appears in the incident view under “Details”. Entries defined in this dictionary are merged with those provided as the value of the parameter details in the call to alert() function that creates the alert. You can use details in the notification stream configuration as the default, letting details dictionary defined with each alert extend or update the default.
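A sketch of a PagerDuty Events API v2 trigger event built from the stream and alert parameters. The event fields (routing_key, event_action, dedup_key, payload.summary, payload.source, payload.severity, payload.custom_details) are defined by PagerDuty; how NetSpyGlass fills them is assumed here: alert details override stream details, integrationKey is removed from the details and used as the routing key, and severity defaults to error.

```python
def pagerduty_event(stream_cfg, alert_details, summary, dedup_key, source):
    """Hypothetical mapping of stream + alert settings to an Events API v2 body."""
    details = dict(stream_cfg.get("details", {}))
    details.update(alert_details)  # alert-level entries extend or override defaults
    routing_key = details.pop("integrationKey", stream_cfg["integrationKey"])
    severity = details.get("severity", "error")  # critical, error, warning or info
    return {
        "routing_key": routing_key,
        "event_action": "trigger",
        "dedup_key": dedup_key,  # expanded deduplicationKey
        "payload": {
            "summary": summary,  # expanded `summary` template
            "source": source,
            "severity": severity,
            "custom_details": details,
        },
    }
```

The resulting dictionary would be POSTed as JSON to https://events.pagerduty.com/v2/enqueue.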

8.8.6.1. Deduplication

NetSpyGlass automatically sends a unique key to PagerDuty to ensure that repeated notifications generated for the same NSG alert do not open new incidents in PagerDuty. This facility is always on and does not require any additional configuration in NetSpyGlass or PagerDuty; however, you can change the way this key is composed using the configuration parameter deduplicationKey. It defaults to ‘$alert.key’, which is a combination of the alert name, device id and component index that triggered it. For example, if NetSpyGlass generates several alerts for the same device, such as one for high CPU load and another for high memory utilization, by default these alerts create separate incidents in PagerDuty. If you want to merge them into one incident associated with the device, set the configuration parameter deduplicationKey to the device name using a macro:

deduplicationKey = "$alert.deviceName"

Note

If you want some alerts to create incidents based only on the device name, while others use full alert key, you need to create two PagerDuty notification streams with different names and different configurations.
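The effect of the deduplicationKey template can be illustrated with a hypothetical Python helper that groups alerts by their expanded key; each distinct key corresponds to one PagerDuty incident. The alert names, device and key values are made up.

```python
import re
from collections import defaultdict

MACRO = re.compile(r"\$alert\.(\w+)")


def group_incidents(alerts, dedup_template):
    """Group alert names by expanded deduplication key (one group per incident)."""
    incidents = defaultdict(list)
    for alert in alerts:
        key = MACRO.sub(lambda m: str(alert[m.group(1)]), dedup_template)
        incidents[key].append(alert["name"])
    return dict(incidents)
```

With "$alert.key" two alerts on the same device open two incidents; with "$alert.deviceName" they collapse into one.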

8.8.6.2. Example

The following example describes notification stream that uses PagerDuty APIv2:

alerts {
    streams {
        pagerduty-v2 {
            type = pagerduty
            apiVersion = 2  # 2 is the default, this parameter is optional
            # API key. Create it in PagerDuty under Configuration / API Access Keys
            apiKey = "arRvZ4sd4B1kHiW2Rjvk"
            verbose = 1

            # "Integration Key" appears on a Generic API's service detail page under "Integrations".
            # integration key defined here is the default, you can override it using key
            # `integrationKey` in the dictionary `details` of an individual alert
            integrationKey = "fd334bv340014a5addb334d348140763"  # "Test Service"

            deduplicationKey = "$alert.key"   # the default

            # "summary" field for the PD API call.
            summary = "$alert.name [$alert.deviceName] : $alert.description"

            # this appears in the incident view under "Details"
            details {
                "alert_name": "$alert.name",
                "device": "$alert.deviceName"
                "component": "$alert.componentName"
                "value": "$alert.value"
                "detected_by" : "${alert.server}"
            }
        }
    }
}

To set the alert severity and provide a routing key (integration key), use the parameter details in the call to the function alert(). The alert defined in this example is routed to the PD service “SJC Datacenter” because it provides its own integration key:

alert(
    name='memoryUtilizationCritical',
    input=query('FROM memUtil WHERE Location = "SJC" AND Vendor = "Cisco"'),
    condition=lambda _, value: value >= 90,
    description='memory utilization is over 90% for 10 min',
    duration=600,
    percent_duration=100,
    notification_time=3600,
    streams=['log', 'pagerduty-v2'],
    fan_out=True,
    details={
        'severity': 'critical',
        'integrationKey': "112bf4aea5844b0a9d8e5cc0c38fc9b6",  # integration key for the service "SJC Datacenter"
        'team': 'SJC Datacenter'
    },
    action_on_clear=1
)

Note that you can add any entries to the dictionary details provided with the alert definition, as well as to the default one defined in the notification stream configuration. The example above demonstrates how you could add a team name. These entries appear in the alert and incident details in the PagerDuty UI.

8.8.6.3. Query

You can query for PagerDuty incidents and their status using NsgQL. Incident number, state, id, url and deduplication key are supported and appear as columns in the table alerts. The column name consists of the stream name, a dot, and the column name. Here is an example of a query that reads the alert name, device, component and associated PagerDuty incident information, if any. We assume here that the stream name is pagerduty-v2, as in the example above:

nsgql "select name, device, component, pagerduty-v2.incident, pagerduty-v2.state, pagerduty-v2.id, pagerduty-v2.url from alerts "
--------------------------------+----------+----------------------------+-----------------------+--------------------+-----------------+--------------------------------------------
name                            | device   | component                  | pagerduty-v2.incident | pagerduty-v2.state | pagerduty-v2.id | pagerduty-v2.url
--------------------------------+----------+----------------------------+-----------------------+--------------------+-----------------+--------------------------------------------
busy_cpu_alert                  | gw-colo  | Routing Engine             | NULL                  | NULL               | NULL            | NULL
busy_cpu_alert                  | carrier  | cpu1                       | NULL                  | NULL               | NULL            | NULL
busy_cpu_alert                  | carrier  | cpu2                       | NULL                  | NULL               | NULL            | NULL
busy_cpu_alert                  | carrier  | cpu3                       | NULL                  | NULL               | NULL            | NULL
noisy_interface_alert_ifOutRate | c3560g-1 | Gi0/12                     | 200                   | resolved           | PMTAWFC         | https://api.pagerduty.com/incidents/PMTAWFC
noisy_interface_alert_ifOutRate | c3560g-1 | Po1                        | 198                   | resolved           | PRSOYZY         | https://api.pagerduty.com/incidents/PRSOYZY
noisy_interface_alert_ifOutRate | trigger  | ens3                       | 212                   | resolved           | PZUJCJH         | https://api.pagerduty.com/incidents/PZUJCJH
noisy_interface_alert_ifOutRate | carrier  | docker0                    | 203                   | resolved           | PXNKNK7         | https://api.pagerduty.com/incidents/PXNKNK7
noisy_interface_alert_ifOutRate | carrier  | enp13s0                    | 202                   | resolved           | PUQ4T9J         | https://api.pagerduty.com/incidents/PUQ4T9J
noisy_interface_alert_ifOutRate | carrier  | enp14s0                    | 215                   | resolved           | P5LGBYI         | https://api.pagerduty.com/incidents/P5LGBYI
noisy_interface_alert_ifOutRate | carrier  | ovsbr0p1                   | 201                   | resolved           | PBYDUBZ         | https://api.pagerduty.com/incidents/PBYDUBZ
noisy_interface_alert_ifOutRate | ex2200   | ge-0/0/0                   | 206                   | resolved           | P8EV2E2         | https://api.pagerduty.com/incidents/P8EV2E2
noisy_interface_alert_ifOutRate | ex2200   | ge-0/0/10                  | 213                   | resolved           | POZMAFE         | https://api.pagerduty.com/incidents/POZMAFE
noisy_interface_alert_ifOutRate | ex2200   | ge-0/0/5                   | 208                   | resolved           | PVKJLG3         | https://api.pagerduty.com/incidents/PVKJLG3
test_alert_new                  | ex2200   | FPC: EX2200-48T-4G @ 0/*/* | 216                   | resolved           | P1O0Y2T         | https://api.pagerduty.com/incidents/P1O0Y2T
test_alert_new                  | ex2200   | Routing Engine 0           | 217                   | resolved           | PCLAWKK         | https://api.pagerduty.com/incidents/PCLAWKK
--------------------------------+----------+----------------------------+-----------------------+--------------------+-----------------+--------------------------------------------

Alerts that do not use PagerDuty notification or haven’t opened any incidents yet return NULL for all PagerDuty columns.

Incident parameters that are supported at this time:

  • id (incident id)
  • incident (incident number)
  • state (incident state)
  • url (incident url in PagerDuty UI)
  • dedupKey (incident deduplication key)
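A small Python sketch that builds such a query for any PagerDuty stream name and parses the pipe-separated table printed by nsgql into dictionaries. Both helpers are hypothetical, and the parser assumes the exact output layout shown above.

```python
def pagerduty_query(stream, columns=("incident", "state", "id", "url")):
    """Build an NsgQL query that reads PagerDuty columns of the given stream."""
    cols = ", ".join(f"{stream}.{c}" for c in columns)
    return f"select name, device, component, {cols} from alerts"


def parse_table(text):
    """Parse the pipe-separated nsgql table into a list of dicts (header = first row)."""
    rows = [line for line in text.splitlines() if "|" in line]  # skip ---+--- separators
    header = [h.strip() for h in rows[0].split("|")]
    return [dict(zip(header, (c.strip() for c in row.split("|")))) for row in rows[1:]]
```

The query string could be passed as the single argument to the nsgql command, as in the example above, e.g. with subprocess.run(["nsgql", query], capture_output=True, text=True); pass columns=("incident", "dedupKey") to read the deduplication key instead.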

8.8.7. Jira

NetSpyGlass alert notifications can open Jira issues if notification stream with type jira is used in the alert definition. Jira stream requires the following configuration parameters:

  • type = jira
  • webHookUrl this should be the host part of the URL of your Jira instance, without any path. For example, this can be https://happygears.atlassian.net
  • auth.user and auth.password user name and password of the user account in your Jira that NetSpyGlass will use to make Jira API calls. The user should have permissions to create issues in the project. We recommend using a descriptive user name (e.g. “netspyglass”) because it is visible in the list of Jira issues.
  • projectKey Jira project key where issues will be created. Note that Jira has two ways to refer to projects, issue types and other items: there is a “key” and there is an internal “id”. The key is what you see in the UI. The project key appears in the UI in the list of projects and is used as a prefix in issue keys. For example, if the issue key is NSGA-1234, then “NSGA” is the project key. The value of the parameter projectKey should be the project key string.
  • issueTypeName the name of the issue type NetSpyGlass should assign to issues it creates. This must be one of the valid types in the Jira schema you are using. The value of this parameter is case-sensitive: use “Task” rather than “task”.
  • deduplicationField the name of the custom field in the Jira schema that NetSpyGlass uses to deduplicate issues. See below for more details.
  • openedStates the list of Jira issue statuses in which an issue is considered “open”. The value of this parameter is a list of words that represent Jira issue states (case-insensitive). The default value of this parameter is [“open”, “reopened”, “in progress” ]. See below for more details.
  • summary this parameter defines the template for the Jira issue summary field. You can use macros in this field.
  • description this parameter defines the template for Jira issue description. You can also use macros here.

8.8.7.1. Deduplication

Some alerts can be “busy”, that is, they can potentially generate many notifications while the alert condition exists. Even though you can control this by tuning the parameter notification_time in the call to nw2functions.alert(), it is still possible to get “bursts” of notifications because of misconfiguration or when a new alert with yet unknown operational parameters is added to the system. To avoid “spamming”, NetSpyGlass can be configured not to create a new Jira issue for each notification sent by the same alert. Parameters deduplicationField and openedStates are used to configure this.

Whenever a new alert notification is generated, NetSpyGlass makes a string (a key) that uniquely identifies the alert that sent it. This key is added to newly created Jira issues in the field named by the configuration parameter deduplicationField. Later on, when a notification from the same alert arrives again, NetSpyGlass uses this key in a Jira API call to search for an existing issue. The search query matches the deduplication field value and the issue status; only issues in one of the “open” states match. If an issue matching this search query exists, NetSpyGlass does not create a new issue.

Since NetSpyGlass matches the issue status, it becomes part of the Jira issue workflow. If an issue with a matching alert deduplication key exists and is “open”, there is no need to create a new one, since presumably people are working on it (or at least should be aware of it). The default list of states is [“open”, “reopened”, “in progress” ] (words in the list are case-insensitive). Once an issue has been closed, we assume the problem has been resolved. If the alert then triggers again and sends a notification, a new issue should be opened even though the old one with the same alert deduplication key exists.
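The decision described above can be sketched as follows. should_create_issue() works on issues already fetched from Jira, and dedup_jql() shows one possible JQL search; both are hypothetical, and the exact query NetSpyGlass sends (including the text-search operator ~ used here for the custom field) is an assumption.

```python
DEFAULT_OPENED_STATES = ("open", "reopened", "in progress")


def should_create_issue(dedup_key, existing_issues, opened_states=DEFAULT_OPENED_STATES):
    """Return True unless an issue with the same key is still in an 'open' state."""
    opened = {s.lower() for s in opened_states}  # states are case-insensitive
    return not any(
        issue["dedup"] == dedup_key and issue["status"].lower() in opened
        for issue in existing_issues
    )


def dedup_jql(field, dedup_key, opened_states=DEFAULT_OPENED_STATES):
    """One possible JQL search for an open issue carrying this deduplication key."""
    states = ", ".join(f'"{s}"' for s in opened_states)
    return f'"{field}" ~ "{dedup_key}" AND status in ({states})'
```

A closed issue with the same key does not block a new one; an issue “In Progress” does.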

8.8.7.1.1. Setting Values of Jira Issue Fields

NetSpyGlass can set certain fields in the Jira issue it creates if the call to alert() has the parameter details. The value of this parameter is a Python dictionary where keys are interpreted as Jira field names or ids; the corresponding values become values of those fields in the created issue. NetSpyGlass downloads the Jira schema and validates field names and ids. It also checks whether the corresponding field has allowedValues in the schema and validates the value provided in the dictionary details.
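A sketch of that validation against the fields dictionary returned by Jira’s createmeta call (the JSON response is shown later in this section). validate_fields() is hypothetical; it assumes allowed values are matched case-sensitively by name and that a field with allowed values is set as {"name": value}, which is the form Jira uses for priority but not necessarily for every field type.

```python
def validate_fields(details, fields_schema):
    """Resolve detail keys (field name or id) and check values against allowedValues."""
    by_name = {f.get("name"): fid for fid, f in fields_schema.items()}
    resolved = {}
    for key, value in details.items():
        field_id = key if key in fields_schema else by_name.get(key)
        if field_id is None:
            raise ValueError(f"unknown Jira field: {key!r}")
        allowed = fields_schema[field_id].get("allowedValues")
        if allowed is not None:
            names = {v.get("name") for v in allowed}
            if value not in names:
                raise ValueError(f"{value!r} is not an allowed value for {key!r}")
            resolved[field_id] = {"name": value}  # form used by e.g. priority
        else:
            resolved[field_id] = value
    return resolved
```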

For example:

alert(
    name='busyCpuAlert',
    input=import_var('cpuUtil'),
    condition=lambda _, value: value > 50,
    description='CPU utilization is over 50% for 20% of time for the last 10 min',
    details={'Priority': 'Critical'},
    tags=['Explicit.alert_group_2'],
    duration=600,
    percent_duration=20,
    notification_time=3600,
    streams=['log', 'jira'],
    fan_out=True,
)

In this call, the parameter details has the value:

{'Priority': 'Critical'}

The created Jira issue will have its priority set to “Critical”. Note that in the default Jira schema the priority field has name Priority, id priority and a list of allowed values. You can inspect these using the Jira admin interface or with the following Jira API call:

https://happygears.atlassian.net/rest/api/2/issue/createmeta?projectKeys=:projectKey&issuetypeNames=Task&expand=projects.issuetypes.fields

(replace :projectKey with your Jira project key to try this).

This call returns the schema in JSON format; it looks something like this (truncated):

{
  "expand": "projects",
  "projects": [
    {
      "expand": "issuetypes",
      "self": "https://happygears.atlassian.net/rest/api/2/project/10200",
      "id": "10200",
      "key": "NSGA",
      "name": "NetSpyGlass Alerts Testing",
      "avatarUrls": {},
      "issuetypes": [
        {
          "self": "https://happygears.atlassian.net/rest/api/2/issuetype/3",
          "id": "3",
          "description": "A task that needs to be done.",
          "iconUrl": "https://happygears.atlassian.net/secure/viewavatar?size=xsmall&avatarId=10418&avatarType=issuetype",
          "name": "Task",
          "subtask": false,
          "expand": "fields",
          "fields": {
            "priority": {
              "required": false,
              "schema": {},
              "name": "Priority",
              "hasDefaultValue": true,
              "operations": [
                "set"
              ],
              "allowedValues": [
                {
                  "self": "https://happygears.atlassian.net/rest/api/2/priority/1",
                  "iconUrl": "https://happygears.atlassian.net/images/icons/priorities/blocker.svg",
                  "name": "Blocker",
                  "id": "1"
                },
                {
                  "self": "https://happygears.atlassian.net/rest/api/2/priority/2",
                  "iconUrl": "https://happygears.atlassian.net/images/icons/priorities/critical.svg",
                  "name": "Critical",
                  "id": "2"
                },

Note how the field “priority” has name “Priority” and a list allowedValues. Each allowed value has its own name. The value provided in the details dictionary passed to alert() must match one of the allowed value names.

If the field passed via the details dictionary has no allowedValues in the schema, its value is passed to Jira without validation.
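The validation rules above can be sketched against the createmeta JSON shown earlier. This is a hypothetical helper, not NetSpyGlass code: it indexes each field of the chosen issue type by both id and name, then checks that every key in details is known and that its value matches an allowed value name when the schema lists any. Exact, case-sensitive matching of value names is an assumption.

```python
def field_schema(createmeta, issue_type="Task"):
    """Index the fields of one issue type by both field id and field name."""
    by_key = {}
    for project in createmeta["projects"]:
        for itype in project["issuetypes"]:
            if itype["name"] != issue_type:
                continue
            for field_id, spec in itype["fields"].items():
                by_key[field_id] = spec        # e.g. "priority"
                by_key[spec["name"]] = spec    # e.g. "Priority"
    return by_key


def validate_details(details, schema):
    """Return a list of problems; an empty list means details look valid."""
    problems = []
    for key, value in details.items():
        spec = schema.get(key)
        if spec is None:
            problems.append(f"unknown field {key!r}")
            continue
        allowed = spec.get("allowedValues")
        if allowed is None:
            continue  # no allowedValues: value goes to Jira unvalidated
        names = [v.get("name") for v in allowed]
        if value not in names:
            problems.append(f"{key!r}: {value!r} not in {names}")
    return problems
```

With the schema excerpt above, {'Priority': 'Critical'} and {'priority': 'Critical'} both validate, while a value such as 'Urgent' would be reported.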

Note

NetSpyGlass caches the downloaded Jira schema internally for 1 hour.
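Fetching the schema and keeping it for an hour could look like the sketch below. It is an illustration under assumptions, not NetSpyGlass internals: createmeta_url() rebuilds the createmeta URL shown earlier for any project key, and SchemaCache takes a hypothetical fetch(url) callable (for example an HTTP GET that parses JSON) plus an injectable clock so the one-hour expiry is easy to follow.

```python
import time
from urllib.parse import urlencode


def createmeta_url(base_url, project_key, issue_type="Task"):
    """Build the createmeta URL shown earlier for a given Jira project key."""
    query = urlencode({
        "projectKeys": project_key,
        "issuetypeNames": issue_type,
        "expand": "projects.issuetypes.fields",
    })
    return base_url.rstrip("/") + "/rest/api/2/issue/createmeta?" + query


class SchemaCache:
    """Keep a downloaded schema for ttl seconds before fetching it again."""

    def __init__(self, fetch, ttl=3600.0, clock=time.monotonic):
        self.fetch = fetch      # hypothetical callable(url) -> parsed JSON
        self.ttl = ttl
        self.clock = clock
        self._entries = {}      # url -> (fetched_at, schema)

    def get(self, url):
        now = self.clock()
        entry = self._entries.get(url)
        if entry is None or now - entry[0] >= self.ttl:
            entry = (now, self.fetch(url))  # expired or missing: download again
            self._entries[url] = entry
        return entry[1]
```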

8.8.7.2. Example

Here is an example of fully configured Jira notification stream:

alerts {
    streams {
        jira_nsga {
            type = jira

            auth {
                user = netspyglass
                password = jira_user_password
            }

            # web hook url without "/rest/api/2/" part
            #
            webHookUrl = "https://happygears.atlassian.net"

            projectKey = NSGA

            # issue type name. The value of this parameter is case sensitive. Valid values (for the default
            # JIRA schema):  Bug, Epic, Improvement, "New Feature", Story, "Sub-task", "Task", "Technical task".
            # These can be different if your JIRA schema is different.

            issueTypeName = Task

            # custom field name that should be used to deduplicate issues (alert notifications that have
            # the same NetSpyGlass-generated alert key do not create new issues)
            #
            # If this parameter is missing or its value is an empty string, NetSpyGlass is going to
            # open new Jira issue for each alert notification without any attempt at deduplication.
            #
            # The field must already exist in Jira schema.

            deduplicationField = "alert key"

            # NetSpyGlass will update existing issue with matching deduplication field only if it is
            # in one of the following states. If the matching issue exists but is not in one of these
            # states, new issue will be opened. The value of this parameter is a list of words - Jira
            # issue status names (case insensitive). If this parameter is missing, NetSpyGlass
            # uses list ["open", "reopened", "in progress" ] as the default. If this parameter has
            # value of an empty list or includes invalid state names, no Jira issue will ever match
            # it and therefore NetSpyGlass will always open new issue for every alert notification.

            openedStates = ["open", "reopened", "in progress" ]

            # Issue summary (you can use template macros here)

            summary = "$alert.name : $alert.deviceName : $alert.componentName $alert.description"

            # Issue description template (you can use template macros here).

            description = """
*$alert.deviceName : $alert.componentName*

Values:
{code}
$alert.values
{code}
~active since: $alert.activeSinceStr~
"""
        }
    }
}

Here we use Jira custom field with name “alert key” for deduplication.

8.8.7.3. How to add custom Jira field

Click the “gear” icon in the upper right, choose JIRA Administration -> Issues, then click “Custom fields” in the panel on the left.

[Screenshot: Jira custom fields panel]

Click the “Add custom field” button.

Follow the wizard to add a Text Field (single line).

[Screenshot: selecting the field type]

Give it a name and a description.

[Screenshot: field name and description]

Jira will then ask you to add the field to one or more screens. Users should not modify this field, since NetSpyGlass uses it to deduplicate alerts, but it must be added to the default screen; otherwise it cannot be set when the issue is created.

Use the name of the newly created Jira field, “alert key”, as the value of the configuration parameter deduplicationField:

deduplicationField = "alert key"

Note

The deduplication field name is set in the Jira notification stream configuration in the configuration file nw2.conf, not via the dictionary details passed to alert(). NetSpyGlass sets the deduplication field automatically and uses it to find existing Jira issues before it creates a new one. You could set the same field via the details dictionary, but then it won’t be used for deduplication.

8.8.8. ServiceNow

NetSpyGlass alerts can open ServiceNow incidents if a notification stream of type servicenow is used in the alert definition. A ServiceNow stream requires the following configuration parameters:

  • type = servicenow
  • verbose: a number (default 0) that sets the “verbosity level”. Set it to 1 to make NetSpyGlass log all API calls to ServiceNow, both requests and responses. The information appears in the UI panel “Logs / System Events”.
  • webHookUrl: the host part of the API URL of your ServiceNow instance, for example https://dev54202.service-now.com/. The URL _must_ end with “/”.
  • auth.user and auth.password: user name and password of the ServiceNow user account that NetSpyGlass uses to make API calls.
  • timeout: API call timeout. Since this is a duration, the value needs an appropriate suffix, for example timeout = 30s
  • fields: a dictionary of fields and their values that NetSpyGlass sends when it makes the API call to create an incident. Values can be either fixed strings or macros referring to various fields of the alert object.
  • incidents.table: the name of the table NetSpyGlass uses in the API call that creates a new incident
  • deduplication.table: the name of the table NetSpyGlass uses in the API call that finds an existing incident
  • deduplication.deduplicationField: the name of the deduplication field
  • logic.createIncident: either createNewIfClosed (the default) or alwaysCreateNew; controls whether NetSpyGlass looks for an existing incident before creating a new one
  • cmdb.enable: ‘true’ or ‘false’. If set to true, NetSpyGlass tries to find the device that triggered the alert in the ServiceNow CMDB to set the “affected CI” field in the created incident
  • cmdb.table: the name of the table NetSpyGlass uses in the API call that finds the CI to associate with the incident

Here is an example:

    servicenowSev1 {

        type = servicenow

        webHookUrl = "https://dev54202.service-now.com/"

        verbose = 0

        auth {
            user = "admin"
            password = "XXXXXXXXXXXX"
        }

        incidents {
            table = "incident"
        }

        # this parameter defines the logic used to decide when to open a new incident.
        # If the value is "createNewIfClosed", then NetSpyGlass makes a call to
        # find an existing incident using the deduplication key field and creates a new
        # incident if it does not exist or is in a state "Resolved", "Closed" or
        # "Cancelled". If the value is "alwaysCreateNew", then it does not attempt
        # to find an existing incident and always creates a new one

        logic {
            createIncident = createNewIfClosed  # alwaysCreateNew or createNewIfClosed
        }

        # the following parameters define the custom field used to store the unique
        # key that identifies the alert, and the API request NetSpyGlass makes
        # to check whether an incident with this key already exists.
        # See documentation for more details
        #
        deduplication {
            deduplicationField = "u_nsg_deduplication_key"
            table = "incident"
        }

        # parameters that describe operations with CMDB.
        cmdb {
            table = "cmdb_ci"
            # set to false to disable cmdb call that tries to find affected CI (NET-3368)
            enable = true
        }

        # parameter `timeout` is a duration, use suffix 's' to indicate the value is in seconds
        timeout = 30s

        fields = {
            "sys_class_name": "incident",
            "sys_class_name": "incident",
            "short_description": "$alert.description",
            "description": """
Alert name:              $alert.name
Device:                  $alert.deviceName
Component:               $alert.componentName
Latest measured value:   $alert.values
active since:            $alert.activeSinceStr

$device.description

$device.boxdescr


Tags:

$alert.tagsMap.location
$alert.tagsMap.Vendor
$alert.tagsMap.Model
$alert.tagsMap.SoftwareRev
$alert.tagsMap.Role

"""
            "severity" : 1,
            "category": "network",
            "correlation_id": "$alert.deviceName"
        }
    }

Several full examples of ServiceNow configurations are shown in the Examples section below.

8.8.8.1. Tables

NetSpyGlass works with two or three tables in the ServiceNow API (the third one only when the CMDB lookup is enabled):
  • a table used to create new incidents (defined using configuration parameter incidents.table)
  • a table used to find existing incidents using deduplication key (parameter deduplication.table)
  • a table used to find a CI to associate it with the incident (parameter cmdb.table)

The example above shows how these configuration parameters can be defined.

The table name is used as part of the ServiceNow API call. For example, according to the documentation at https://developer.servicenow.com/app.do#!/rest_api_doc?v=london&id=c_TableAPI , the call to create an incident uses the URI path /api/now/table/incident. Here, the table name is incident; if your ServiceNow instance uses a different name, configure NetSpyGlass to use it by setting the parameter incidents.table accordingly. The same applies to the table cmdb_ci (parameter cmdb.table) used to find the “affected CI” (that is, the device). The values shown in the example above are the defaults and match the ServiceNow documentation.
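How a table name turns into an API path can be sketched as follows. The helper name table_url is hypothetical; it only assumes the /api/now/table/<table> pattern quoted above and the rule that webHookUrl ends with “/”.

```python
from urllib.parse import quote


def table_url(webhook_url, table):
    """Build a ServiceNow Table API URL such as .../api/now/table/incident.

    webhook_url is the host part configured in the stream; it must end with '/'.
    """
    if not webhook_url.endswith("/"):
        raise ValueError("webHookUrl must end with '/'")
    return f"{webhook_url}api/now/table/{quote(table)}"


create_url = table_url("https://dev54202.service-now.com/", "incident")
cmdb_url = table_url("https://dev54202.service-now.com/", "cmdb_ci")
```

Because the path is built by plain concatenation, the trailing “/” requirement on webHookUrl matters: without it the host and the api path would run together.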

8.8.8.2. ServiceNow Incident Fields

NetSpyGlass makes a POST API call against the incident table when it creates a new incident, as described here: https://developer.servicenow.com/app.do#!/rest_api_doc?v=london&id=r_TableAPI-POST

Here is an example of the incident fields dictionary:

        fields = {
            "sys_class_name": "incident",
            "short_description": "$alert.description",
            "description": """
Alert name:              $alert.name
Device:                  $alert.deviceName
Component:               $alert.componentName
Latest measured value:   $alert.values
active since:            $alert.activeSinceStr

$device.description

$device.boxdescr


Tags:

$alert.tagsMap.SoftwareRev
$alert.tagsMap.Role

"""
            "severity" : 1,
            "category": "network",
            "correlation_id": "$alert.deviceName"
        }

You can add any fields recognized by ServiceNow API to this dictionary and use macros recognized by NetSpyGlass.
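To illustrate how a fields dictionary with macros becomes a request body, here is a sketch that uses Python's string.Template as a stand-in for NetSpyGlass's macro expansion, which this document does not describe in detail. The macro values are made-up examples; unknown macros are left untouched, and non-string values such as severity pass through unchanged before the result is serialized as JSON.

```python
import json
from string import Template


class MacroTemplate(Template):
    # Allow dotted macro names such as $alert.name or $alert.tagsMap.Role.
    idpattern = r"(?a:[_a-z][_a-z0-9]*(?:\.[_a-z][_a-z0-9]*)*)"


def render_fields(fields, macros):
    """Expand $macros in string values; other values pass through as-is."""
    out = {}
    for name, value in fields.items():
        if isinstance(value, str):
            value = MacroTemplate(value).safe_substitute(macros)
        out[name] = value
    return out


payload = json.dumps(render_fields(
    {"short_description": "$alert.description", "severity": 1, "category": "network"},
    {"alert.description": "CPU utilization is high"},
))
```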

8.8.8.3. Incident Deduplication

Just like with the Jira integration, NetSpyGlass uses a special field to avoid creating duplicate incidents when an alert sends repeated notifications because of a persisting problem. NetSpyGlass generates a key that is unique to the alert and sends it to ServiceNow when it creates an incident; ServiceNow saves this key in the incident's deduplication field. Every time NetSpyGlass needs to create a new incident, it first makes an API call to ServiceNow to check whether an incident with the same deduplication key already exists and, if it does, whether that incident is open. If it exists and is open, no new incident is created.

Configuration of the deduplication field consists of two parts: 1) adding the field in ServiceNow and 2) configuration in NetSpyGlass.

To add the field in ServiceNow, open an existing incident, click the context menu in the upper left corner and choose “Configure -> Form Layout”:

[Screenshot: the “Configure -> Form Layout” menu]

This opens a form where you can add new field:

[Screenshot: adding a new field in the form layout]

Recommended name of the field is “NSG deduplication key” (all examples in this document assume this is the name of the field). The field length can be “Small (40)”.

The second part of the configuration is to add the name of the field to the alert stream configuration in NetSpyGlass. The name used in the NetSpyGlass configuration is not the human-friendly label but the name ServiceNow uses in its API. It is constructed from the label shown in the form layout dialog by replacing whitespace with underscores, converting it to lower case and adding the prefix u_. In our example, the name becomes “u_nsg_deduplication_key”. Here is how it looks (only the parameters relevant to deduplication are shown for brevity; see the full configuration examples below):

alerts {
    streams {
        servicenowSev1 {
            type = servicenow
            webHookUrl = "https://dev54202.service-now.com/"

            incidents {
                table = "incident"
            }

            # the following parameters define the custom field used to store the unique
            # key that identifies the alert, and the API request NetSpyGlass makes
            # to check whether an incident with this key already exists.
            # See documentation for more details
            #
            deduplication {
                deduplicationField = "u_nsg_deduplication_key"
                table = "incident"
            }

        }
    }
}
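The naming rule for the API field name can be written as a one-line helper. This sketch follows the rule as stated (whitespace to underscores, lower case, prefix u_); collapsing runs of whitespace into a single underscore is an assumption about how ServiceNow treats repeated spaces.

```python
import re


def servicenow_column_name(label):
    """Derive the API column name from a human-friendly field label."""
    return "u_" + re.sub(r"\s+", "_", label.strip()).lower()


column = servicenow_column_name("NSG deduplication key")
```

For the recommended label this yields u_nsg_deduplication_key, the value used for deduplicationField in the configuration above.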

NetSpyGlass makes the following API call to check for an existing incident with the given deduplication key:

GET https://dev54202.service-now.com/api/now/table/incident?u_nsg_deduplication_key=c42c7764

Here, the table name (incident) was taken from the parameter deduplication.table.
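The lookup request above combines webHookUrl, deduplication.table, deduplication.deduplicationField and the alert key. A minimal sketch, with a hypothetical helper name, that reproduces the URL shown:

```python
from urllib.parse import urlencode


def dedup_lookup_url(webhook_url, table, dedup_field, dedup_key):
    """URL of the GET request used to look for an existing incident."""
    query = urlencode({dedup_field: dedup_key})
    return f"{webhook_url}api/now/table/{table}?{query}"


url = dedup_lookup_url("https://dev54202.service-now.com/", "incident",
                       "u_nsg_deduplication_key", "c42c7764")
```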

The response is expected to contain at least two fields: number and incident_state. An empty or error response is interpreted as if the incident does not exist, so NetSpyGlass attempts to create it. Field number provides the existing incident's number and field incident_state its state. Normally, ServiceNow returns the incident state as a number with the following meaning:

  • New: 1
  • In Progress: 2
  • On Hold: 3
  • Resolved: 4
  • Closed: 5
  • Cancelled: 6

NetSpyGlass does not create a new incident if the existing one is in state New, In Progress or On Hold. In any other state, NetSpyGlass opens a new incident.

Incident deduplication is optional and is controlled by the configuration parameter logic.createIncident, which can have the value “alwaysCreateNew” or “createNewIfClosed”. If the value is createNewIfClosed, NetSpyGlass makes a call to find an existing incident using the deduplication key field and creates a new incident only if none exists or the existing one is in state “Resolved”, “Closed” or “Cancelled”. If the value is alwaysCreateNew, it does not attempt to find an existing incident and always creates a new one.

The default setting is createNewIfClosed, that is, to make API call to try to find existing incident before an attempt to create new one.
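The decision described above can be sketched as a small function. It assumes the lookup response has the usual Table API shape {"result": [...]} and that incident_state may arrive as a string; treating any open record in the result as a match is an assumption for the case where several incidents share the key. A None response stands for an empty or failed call.

```python
OPEN_STATES = {1, 2, 3}  # New, In Progress, On Hold


def should_create_incident(logic, lookup_result):
    """Decide whether to open a new ServiceNow incident.

    logic: "alwaysCreateNew" or "createNewIfClosed".
    lookup_result: parsed JSON of the deduplication GET call, or None on error.
    """
    if logic == "alwaysCreateNew":
        return True
    records = (lookup_result or {}).get("result") or []
    for record in records:
        try:
            state = int(record.get("incident_state"))
        except (TypeError, ValueError):
            continue  # missing or unparsable state: ignore this record
        if state in OPEN_STATES:
            return False
    return True
```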

8.8.8.4. Affected CI

To correlate a newly created incident with a device in the ServiceNow CMDB, NetSpyGlass makes an additional API call before it creates the incident. This call looks up the device in the CMDB by its IP address. If the lookup succeeds, the returned sys_id is sent together with the rest of the data used to create the incident, which makes the device appear under “Affected CI” in the incident form.

Parameter cmdb.enable can be used to turn this on and off.
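A sketch of the lookup and of attaching the result follows. Two names in it are assumptions, not documented NetSpyGlass behavior: that the CI is matched on the ip_address column of the CMDB table, and that the sys_id is sent in the incident's cmdb_ci field. sysparm_fields is a standard Table API parameter that limits the returned columns.

```python
from urllib.parse import urlencode


def cmdb_lookup_url(webhook_url, table, ip_address):
    # ASSUMPTION: the CI is matched on the ip_address column.
    query = urlencode({"ip_address": ip_address, "sysparm_fields": "sys_id"})
    return f"{webhook_url}api/now/table/{table}?{query}"


def attach_ci(payload, cmdb_response):
    # ASSUMPTION: sys_id goes into the incident's cmdb_ci field.
    records = (cmdb_response or {}).get("result") or []
    if records:
        payload = dict(payload, cmdb_ci=records[0]["sys_id"])
    return payload  # unchanged when the device was not found
```

When cmdb.enable is false, the lookup would simply be skipped and the payload sent as-is.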

8.8.8.5. Examples

The full notification stream configuration for ServiceNow looks like this:

alerts {
    streams {


        servicenowSev1 {
            type = servicenow
            webHookUrl = "https://dev54202.service-now.com/"

            auth {
                user = "admin"
                password = "XXXXXXXX"
            }

            incidents {
                table = "incident"
            }

            # this parameter defines the logic used to decide when to open a new incident.
            # If the value is "createNewIfClosed", then NetSpyGlass makes a call to
            # find an existing incident using the deduplication key field and creates a new
            # incident if it does not exist or is in a state "Resolved", "Closed" or
            # "Cancelled". If the value is "alwaysCreateNew", then it does not attempt
            # to find an existing incident and always creates a new one

            logic {
                createIncident = createNewIfClosed  # alwaysCreateNew or createNewIfClosed
            }

            # the following parameters define the custom field used to store the unique
            # key that identifies the alert, and the API request NetSpyGlass makes
            # to check whether an incident with this key already exists.
            # See documentation for more details
            #
            deduplication {
                deduplicationField = "u_nsg_deduplication_key"
                table = "incident"
            }

            # parameters that describe operations with CMDB.
            cmdb {
                table = "cmdb_ci"
            }

            # parameter `timeout` is a duration, use suffix 's' to indicate the value is in seconds
            timeout = 30s

            fields = {
                "sys_class_name": "incident",
                "short_description": "$alert.description",
                "description": """
Alert name:              $alert.name
Device:                  $alert.deviceName
Component:               $alert.componentName
Latest measured value:   $alert.values
active since:            $alert.activeSinceStr

$device.description

$device.boxdescr


Tags:

$alert.tagsMap.location
$alert.tagsMap.Vendor
$alert.tagsMap.Model
$alert.tagsMap.SoftwareRev
$alert.tagsMap.Role

"""
                "severity" : 1,
                "category": "network",
            }
        }
    }
}

In the example shown above, the incident is always created with severity “1” and the default priority. To make different alerts create incidents with different severity and priority levels, create several configuration blocks with different names but the same value of the type parameter, and list the appropriate stream name in each alert’s streams parameter. For example:

alerts {
    streams {

        servicenowSev1 {
            type = servicenow
            webHookUrl = "https://dev54202.service-now.com/"
            auth {
                . . . .
            }
            fields = {

                . . . .

                "severity" : 1,
                "category": "network",
            }
        }

        servicenowSev3 {
            type = servicenow
            webHookUrl = "https://dev54202.service-now.com/"
            auth {
                . . . .
            }
            fields = {

                . . . .

                "severity" : 3,
                "category": "network",
            }
        }

    }
}

8.8.8.6. Query

You can query ServiceNow incidents and their status using NsgQL (see the NsgQL section of this documentation). Incident number and state are supported and appear as columns in the table alerts. The column name consists of the stream name, a dot, and the column name. Here is an example of a query that reads the alert name, device, component and associated ServiceNow incident information, if any. We assume here that the stream name is servicenowSev1, as in the example above:

nsgql "select name, device, component, servicenowSev1.incident, servicenowSev1.state from alerts "

Alerts that do not use ServiceNow notification or haven’t opened any incidents yet return NULL for all ServiceNow columns.

Incident parameters that are supported at this time:

  • incident (incident number)
  • state (incident state)
  • dedupKey (incident deduplication key)
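The stream-name-dot-column naming can be captured in a small query builder. This is a hypothetical convenience for scripts that generate NsgQL text; it only produces the query string and does not run it.

```python
SUPPORTED = ("incident", "state", "dedupKey")


def incident_query(stream, columns=SUPPORTED):
    """Build an NsgQL query reading ServiceNow columns of one stream."""
    for col in columns:
        if col not in SUPPORTED:
            raise ValueError(f"unsupported ServiceNow column: {col}")
    cols = ["name", "device", "component"] + [f"{stream}.{c}" for c in columns]
    return f"select {', '.join(cols)} from alerts"


query = incident_query("servicenowSev1", ("incident", "state"))
```

The generated text is the same query as in the nsgql example above.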

8.9. Notification streams - UI

Notification streams are configured through the NetSpyGlass UI at:

https://{INSTANCE_NAME}/config/notifications/streams

[Screenshot: notification streams configuration page]

8.9.1. ServiceNow

NetSpyGlass alerts can open ServiceNow incidents if a notification stream of type servicenow is used in the alert definition. To create a ServiceNow stream, select “ServiceNow” as shown in the screenshot below:

[Screenshot: adding a ServiceNow stream]

The ServiceNow stream form has the following sections:

8.9.1.1. Basic Info

  • Enter stream name
  • Enter stream description

8.9.1.2. Notification Content

This section contains 4 main categories:

  • Class Name
  • Summary : The incident short_description
  • Description : Incident body
  • Fields

You can add any fields recognized by the ServiceNow API and use macros recognized by NetSpyGlass. For example, you can add custom fields such as priority, correlation_id, affected_ci, app_or_infra or location.

8.9.1.3. ServiceNow Integration

  • Webhook URL : This should be the host part of the API url of your ServiceNow instance. For example, this can be https://dev114771.service-now.com/. The URL _must_ end with “/”.
  • Incidents Path : Table used to create new incidents.
  • Incident Field : Name of the field ServiceNow uses for the incident number.
  • Response Fields : Fields NetSpyGlass expects in the response to the query.
  • Max Incidents/hour : Max number of new incidents NetSpyGlass can create per clock hour (the counter resets at the top of each hour).
  • Timeout : How long NetSpyGlass waits for an API call to complete before it times out.
  • Debug Log Level : How much detail to log for troubleshooting purposes, from 0 (minimal) to 2 (verbose).
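The Max Incidents/hour setting describes a counter that resets at the top of each clock hour rather than a sliding window. A minimal sketch of that behavior follows; the class name is hypothetical, and what NetSpyGlass does with notifications over the limit is not covered here.

```python
from datetime import datetime


class HourlyIncidentLimit:
    """Allow at most max_per_hour new incidents per clock hour."""

    def __init__(self, max_per_hour):
        self.max_per_hour = max_per_hour
        self.hour = None
        self.count = 0

    def try_acquire(self, now: datetime) -> bool:
        # Truncate to the clock hour; the counter resets when the hour changes.
        hour = now.replace(minute=0, second=0, microsecond=0)
        if hour != self.hour:
            self.hour = hour
            self.count = 0
        if self.count >= self.max_per_hour:
            return False
        self.count += 1
        return True
```

With a limit of 2, a third incident at 10:59 is refused, but one at 11:00 is allowed because a new clock hour has started.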

8.9.1.4. De-Duplication

  • Incident Deduplication

NetSpyGlass uses a special field to avoid creating duplicate incidents when an alert sends repeated notifications because of a persisting problem. NetSpyGlass generates a key that is unique to the alert and sends it to ServiceNow when it creates an incident; ServiceNow saves this key in the incident's deduplication field. Every time NetSpyGlass needs to create a new incident, it first makes an API call to ServiceNow to check whether an incident with the same deduplication key already exists and, if it does, whether that incident is open. If it exists and is open, no new incident is created.

Configuration of the deduplication field consists of two parts: 1) adding the field in ServiceNow and 2) configuration in NetSpyGlass.

To add the field in ServiceNow, open an existing incident, click the context menu in the upper left corner and choose “Configure -> Table”:

[Screenshot: the “Configure -> Table” menu]

This opens a form where you can add new field:

[Screenshot: adding a new table field]

Recommended name of the field is “NSG deduplication key” (all examples in this document assume this is the name of the field). The field length can be “String (40)”.

The second part of the configuration is to add the name of the field to the alert stream configuration in NetSpyGlass. The name used in the NetSpyGlass configuration is not the human-friendly label but the name ServiceNow uses in its API. It is constructed from the label by replacing whitespace with underscores, converting it to lower case and adding the prefix u_. In our example, the name becomes “u_nsg_deduplication_key”. The resulting dictionary entry in ServiceNow looks like this:

[Screenshot: dictionary entry for the deduplication field]

  • Finding an Existing Incident

NetSpyGlass makes the following API call to check for an existing incident with the given deduplication key:

GET https://dev54202.service-now.com/api/now/table/incident?u_nsg_deduplication_key=c42c7764

Here, the table name (incident) was taken from the parameter deduplication.table.

The response is expected to contain at least two fields: number and incident_state. An empty or error response is interpreted as if the incident does not exist, so NetSpyGlass attempts to create it. Field number provides the existing incident's number and field incident_state its state. Normally, ServiceNow returns the incident state as a number with the following meaning:

  • New: 1
  • In Progress: 2
  • On Hold: 3
  • Resolved: 4
  • Closed: 5
  • Cancelled: 6

NetSpyGlass does not create a new incident if the existing one is in state New, In Progress or On Hold. In any other state, NetSpyGlass opens a new incident.

Incident deduplication is optional and is controlled by the configuration parameter logic.createIncident, which can have the value “alwaysCreateNew” or “createNewIfClosed”. If the value is createNewIfClosed, NetSpyGlass makes a call to find an existing incident using the deduplication key field and creates a new incident only if none exists or the existing one is in state “Resolved”, “Closed” or “Cancelled”. If the value is alwaysCreateNew, it does not attempt to find an existing incident and always creates a new one.

The default setting is createNewIfClosed, that is, to make API call to try to find existing incident before an attempt to create new one.

  • Path : Table used to find existing incidents using deduplication key. NOTE: The deduplication key must be configured separately in ServiceNow, and the same key name (we recommend “u_nsg_deduplication_key”) must be used in the Query below.
  • Incident No. Field : Name of the field ServiceNow uses for the incident number.
  • Deduplication Field : The name of the deduplication field
  • Query : Query that should match existing incident using deduplication key.
    value = u_nsg_deduplication_key=$dedupKey^state!=6^state!=7^ORDERDESCnumber
  • Response Fields : Fields NetSpyGlass expects in the response to the query.
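The Query value above contains the macro $dedupKey. The sketch below substitutes it with Python's string.Template and sends the result as the Table API's sysparm_query parameter; both the substitution mechanism and the use of sysparm_query are assumptions about how NetSpyGlass issues this request, and Path is treated as a plain table name.

```python
from string import Template
from urllib.parse import urlencode

QUERY = "u_nsg_deduplication_key=$dedupKey^state!=6^state!=7^ORDERDESCnumber"


def dedup_query_url(webhook_url, path, query_template, dedup_key):
    # Fill in $dedupKey, then URL-encode the whole encoded query.
    query = Template(query_template).substitute(dedupKey=dedup_key)
    return f"{webhook_url}api/now/table/{path}?" + urlencode({"sysparm_query": query})


url = dedup_query_url("https://dev54202.service-now.com/", "incident", QUERY, "c42c7764")
```

The characters =, ^ and ! are percent-encoded in the resulting URL; ServiceNow decodes them back into the encoded query.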

8.9.1.5. CMDB

  • Affected CI

To correlate a newly created incident with a device in the ServiceNow CMDB, NetSpyGlass makes an additional API call before it creates the incident. This call looks up the device in the CMDB by its IP address. If the lookup succeeds, the returned sys_id is sent together with the rest of the data used to create the incident, which makes the device appear under “Affected CI” in the incident form.

  • Duplicate Checking : If enabled, NetSpyGlass tries to find the device that triggered the alert in the ServiceNow CMDB to set the “affected CI” field in the created incident.
  • Path : Table used to find ServiceNow configuration items (that is, devices) to associate with incident.

8.9.1.6. Authentication

  • Username & Password : user name and password of the user account in your ServiceNow that NetSpyGlass will use to make API calls.

8.9.1.7. History/Changelog

  • Shows when and by whom the stream was created, and every subsequent change, each with a timestamp, user and description.

8.9.1.8. Sample Screenshots

[Screenshots: sample ServiceNow stream configuration pages]