.. _alert_notifications:

Notifications
=============

Notifications are sent to notification streams according to the following rules:

#. When the alert's state changes from *cleared* to *active*, or when the alert has just been
   created and is in the state *active* right away, it sends a notification to each stream
   listed in the parameter `streams`. Before the notification is sent to the streams,
   NetSpyGlass checks existing silence objects to see if any of them matches the alert.
   If a match is found, the notification is suppressed and only a log record of the
   active but silenced alert is made. If no matching silence is found, the notification is
   delivered to all streams.

#. If the input variable satisfies the alert condition on subsequent calls to `alert()`, the
   alert remains in the state *active*. However, it does not send a notification on every
   monitor cycle. As long as the condition persists, new notifications are sent at the
   interval defined by the parameter `notification_time` (in seconds). In the example
   above this parameter has the value 300 sec, which means the alert sends a notification
   every 5 min as long as it remains in the state *active*.

#. If on the next call to `alert()` it is determined that the input variable no longer
   satisfies the condition, the alert goes into the state *cleared*. A log record is made at
   this time but no special notification is sent to notification streams.

When NetSpyGlass creates an alert, it generates a unique key and stores it with the alert.
The key is unique to the alert object. If the alert is "fan-out", that is, a separate alert
object has been created for each device and component, then the key is unique to the
combination of alert name, device and component.
If the alert is not "fan-out", that is, only one alert object has been created for the input
monitoring variable and information about matching devices and components is passed via the
alert field `details`, then the key is based only on the alert name and does not include
device and component. In any case, the key can be passed to the notification service for
deduplication. Of the services we currently support, only PagerDuty uses this. You can
always use the macro `$alert.key` to expose the key in log records, email or Slack messages.

Notification streams
====================

Notification streams are configured in the top level parameter `alerts` in the main
configuration file `nw2.conf`. The default config includes a working logging stream
configuration and templates for other streams and looks like this:

.. code-block:: none

    graph24hr = ${ui.url}"emb_graph.html?update=true&networkId=1&intervalHr=24&width=500&height=300&vars=$alert.inputVariable"
    graph12hr = ${ui.url}"emb_graph.html?update=true&networkId=1&intervalHr=12&width=500&height=300&vars=$alert.inputVariable"
    graph6hr = ${ui.url}"emb_graph.html?update=true&networkId=1&intervalHr=6&width=500&height=300&vars=$alert.inputVariable"

    alerts {

        # the size of the internal queue used to accumulate alerts when alerting
        # service is not available. changes to this parameter require server restart
        queueSize = 1000

        # list of "services" we send alert notifications to. Multiple services can
        # be enabled at the same time. You need to restart the server when you add
        # or remove a service, however restart is not required
        # for changes to service configuration (even if you change web hook url)
        streams {

            #--------------------------------------------------------------------
            # PagerDuty
            # Change your service id
            # You can use macros in `description` and `details`
            pagerduty {
                type = pagerduty
                # See PagerDuty developers documentation under Integration / Events
                triggerUrl = "https://events.pagerduty.com/generic/2010-04-15/create_event.json"
                # API service key. Find it in PagerDuty under Configuration / Services
                service = "xxxxxxxxxxxxxxxxxxxxxxxxxxx"
                # Name of the API client, this appears in PagerDuty dashboards and incidents lists
                client = "NetSpyGlass"
                # PagerDuty constructs "View in NetSpyGlass" link using parameters `client` and `clientUrl`
                clientUrl = ${ui.url}
                # this appears in the incident view under "Details"
                details {
                    "device": "$alert.deviceName"
                    "component": "$alert.componentName"
                    "value": "$alert.value"
                }
                # Contexts to be included with the incident trigger such as links to graphs or images. See
                # PagerDuty developers docs.
                contexts = [
                    {
                        type: link
                        href: ${ui.url}
                    },
                    {
                        type: image
                        src: ${graph24hr}
                    }
                ]
            },

            #--------------------------------------------------------------------
            # Slack:
            # you can use macros in fields `channel` and `template`. This makes it
            # possible to pass Slack channel name from the alert definition via
            # `$alert.details.slack_channel`
            slack : {
                type = slack
                channel = "#netspyglass"
                username = netspyglass
                # replace this with your webhook url
                webHookUrl = "https://hooks.slack.com/services/TTTTTTTTT/BBBBBBBBB/01234567890"
                template = """
                *$alert.name* : $alert.deviceName : $alert.componentName
                $alert.description | latest value: $alert.value   active since: $alert.activeSinceStr | """ "<"${graph24hr}"|graph(24hr)> | <"${graph12hr}"|graph(12hr)> | <"${graph6hr}"|graph(6hr)>"
            },

            #--------------------------------------------------------------------
            # Hipchat: you can use macros in the `template`. Do not forget to
            # change room name/id and authentication token in `webHookUrl`.
            # To create token, log in to Hipchat, navigate to "Rooms", select
            # your room, then navigate to "Tokens" and create new token there.
            # Note that the name of the token appears in hipchat room in place
            # of the user name, so give it some descriptive name.
            # Use macros in `template`. It is possible to set message background
            # color in the alert definition using dictionary `details` (key "color").
            # To use this, set parameter `color` below to `$alert.details.color`.
            # This can be used as a visual indicator of the alert severity.
            #
            hipchat {
                type = hipchat
                # Change room name (or use its id) and authentication token
                webHookUrl = "https://api.hipchat.com/v2/room/{room_name}/notification?auth_token={authentication_token}"
                # background color for the message. Valid values: yellow, green,
                # red, purple, gray, random. Default: gray
                color = "$alert.details.color"
                # should this message trigger user notification?
                notify = false
                # message body template. Message is sent to hipchat with message_format=text
                template = """
                *$alert.name* : $alert.deviceName : $alert.componentName
                $alert.description | latest value: $alert.value   active since: $alert.activeSinceStr
                """
            }

            #--------------------------------------------------------------------
            # email notifications
            # change `hostName`, `smtpAuth` (if applicable) and other parameters.
            # You can use macros in `subject` and `message`
            email {
                type = email,
                hostName = "your_smtp_gateway",
                port = 587,
                smtpAuth {
                    user = "mailsender",
                    password = "enter_password_here",
                },
                startTLSEnabled = true,
                startTLSRequired = false,
                from = "netspyglass@company.com",
                to = "alerts@company.com",
                subject = "$alert.variable | $alert.deviceName | $alert.componentName | active since: $alert.activeSinceStr",
                message = """
                $alert.name : $alert.deviceName : $alert.componentName
                $alert.description
                latest value: $alert.value
                active since: $alert.activeSinceStr

                """${graph24hr}"""
                """${graph12hr}"""
                """${graph6hr}
            },

            #--------------------------------------------------------------------
            # log (File logs/alerts.log)
            # use macros in `template`
            log : {
                type = logger
                path = ${home}"/logs/alerts.log"
                template = "$alert.variable | $alert.deviceName | $alert.componentName | active since: $alert.activeSinceStr"
            }
        }
    }

You can find this in the copy of the default configuration provided as part of the
NetSpyGlass distribution package, see the file
`/opt/netspyglass/current/doc/default_config/netspyglass.conf`.

Parameter `alerts.queueSize` defines the capacity of the internal message queue
maintained by each notification stream (except `log`). Alerts add notifications to this
queue when they trigger. Each stream takes messages from the queue and delivers them to
the respective service asynchronously in a background thread, with a timeout and
exponential back-off if it detects problems with the service.

Actual notification streams are configured inside of the dictionary `alerts.streams`.
Each stream is described by an item in this dictionary, where the key is the stream name and
the content is its configuration. The word used as a key in the dictionary `alerts.streams`
is the stream name; it is this word that should be placed in the argument `streams` in the
call to the function `alert()` that creates the alert. Stream type (e.g. "log", "email" etc.)
is defined by the attribute with key `type` in the corresponding stream dictionary. You can
have multiple items inside of `alerts.streams` with the same type but different names.

.. note::

    This structure of the configuration parameters allows you to have several streams
    of the same type but with different names and different configuration. For example,
    you could have different streams of the type "email" that send email to different
    destination addresses, or separate streams of type "slack" that post to different
    Slack channels. Each of these streams would have a distinct name, which allows you to
    route notifications created by different alerts to different email recipients or
    different Slack channels.

Macros
------

Some parameters of notification streams can be constructed using the same macros
used to construct alert attributes (see also :func:`nw2functions.alert()`). These
macros are:

- :obj:`$alert.deviceId` device Id for the device that triggered the alert. This is valid only for fan-out alerts
- :obj:`$alert.deviceName` device name, also valid only for fan-out alerts
- :obj:`$alert.componentIndex` component or interface index, valid only for fan-out alerts
- :obj:`$alert.componentName` component or interface name, valid only for fan-out alerts
- :obj:`$alert.variable` identifier of the corresponding alerting variable, this can be used to construct urls for graphs
- :obj:`$alert.inputVariable` identifier of the corresponding input variable, this can be used to construct
  urls for graphs. This may not be available, for example when the alerting rule used a temporary variable.
- :obj:`$alert.value` the value of the input variable that triggered the alert
- :obj:`$alert.key` unique deduplication key for the alert
- :obj:`$alert.fanout` true or false, indicates whether this is a fan-out alert
- :obj:`$alert.tags` a set of strings. For fan-out alerts this is a copy of tags from the input variable that
  triggered the alert. For non fan-out alerts this is a common set of tags from all input variable instances
  that contributed to the alert.
- :obj:`$alert.activeSince` a time stamp (time in milliseconds) when the alert entered the state "active"
- :obj:`$alert.activeSinceStr` time when the alert entered the state "active" as a string in the time zone specified
  by the main configuration file parameter `network.display.tz`
- :obj:`$alert.getTags("facet")` returns tags in the given facet (both facet and tag word, e.g. "BGP4Peer.AS1000")
- :obj:`$alert.getTagWords("facet")` returns tags in the given facet, but unlike :obj:`$alert.getTags("facet")`,
  returns only tag words.
- :obj:`$alert.details` returns the dictionary passed as argument `details` to function :func:`nw2functions.alert()`

Mixing Macros and configuration parameter expansion
---------------------------------------------------

The NetSpyGlass configuration file supports parameter expansion that uses the syntax
`${path.to.variable}`. This looks similar to the macros you can use in the alert and
notification fields, but is not the same. Configuration file parameter expansion refers to
other parameters in the configuration file and only works for unquoted parameters.
Configuration parameters are *not* expanded inside of quoted strings. Alert and
notification macros, on the other hand, are only expanded if they are part of particular
configuration parameters and are inside of quoted strings.

Consider the template used to build the email notification in the default configuration.
It looks like this:

.. code-block:: none

    message = """
    $alert.name : $alert.deviceName : $alert.componentName
    $alert.description
    latest value: $alert.value
    active since: $alert.activeSinceStr

    """${graph24hr}"""
    """${graph12hr}"""
    """${graph6hr}

This configuration is an example of a mix of both configuration parameter expansion
(`${graph24hr}`, `${graph12hr}` and `${graph6hr}`) and macros (`$alert.name`,
`$alert.description` etc). Configuration parameters `graph24hr`, `graph12hr` and
`graph6hr` are defined in the same configuration file and are used to construct URLs that
open a graph for the variable that triggered the alert. You can find the definition of these
parameters in the same configuration example above.

Note how this template uses multi-line strings with triple quotes '"""' and how alert
field macros are located inside of the triple-quoted text, but config file parameter
expansions such as `${graph24hr}` are outside. Also, we had to use triple-quoted text
again to put links to 24h, 12h and 6h graphs on separate lines.

Configuration parameters `${graph24hr}`, `${graph12hr}` and `${graph6hr}` are expanded
early, when the configuration file is loaded. At this time the configuration parameter
`message` is constructed by concatenating all strings it is composed of, including the
result of config. parameter expansion for `${graph24hr}` and others. As the result,
parameter `message` is a multi-line text with some alert and notification macros inside
(these macros got there both because they are directly entered as part of the `message`
value and because they are part of the `graph24hr` value). The value of the parameter
`message` is used as a template to build the email message body when a notification is
about to be sent, at which time alert and notification macros are expanded.

Log
---

This is the simplest notification stream. By default it writes log to the file
`/opt/netspyglass/home/logs/alerts.log`.
The log record is constructed by expanding macros in the string defined by the parameter
`template` in the configuration of this stream.

Email
-----

Email stream requires the following configuration parameters:

- `type` = email
- `hostName` : this is an address or host name of the mail gateway to be used. This parameter is mandatory.
- `port` : port number to use. This parameter is mandatory.
- `smtpAuth.user` : if the mail gateway requires authentication, this should be the user name to use.
  This parameter is optional.
- `smtpAuth.password` : if authentication is required, put the plain text password here.
  This parameter is optional.
- `startTLSEnabled` : enable TLS. This parameter is optional.
- `startTLSRequired` : require TLS in communication with the gateway. This parameter is optional.
- `from` : the address to use in the "From" field
- `to` : destination address to send email to. This parameter is mandatory.
- `subject` : mail subject. You can use macros in this field.
- `message` : mail body. You can also use macros in this field. Note that the configuration file
  used by NetSpyGlass supports multi-line text if it is included in triple quotes (see the default
  configuration above for an example).

Slack
-----

Slack stream requires the following configuration parameters:

- `type` = slack
- `channel` : channel name to post to. This overrides the channel configured in the web hook in Slack.
  Note that you can override this again by adding the channel name to the parameter `details`
  in the call to function `alert()` like this::

      alert(
          name='bigChangeInVariables',
          input=import_var('numVars'),
          condition=lambda mvar, x: compare_to_mean(mvar, x),
          description='Big change in the number of monitoring variables',
          details={'slack_channel': '#netspyglass'},
          notification_time=300,
          streams=['slack', 'log'],
          fan_out=True
      )

  Passing a dictionary item with key 'slack_channel' as part of the parameter `details` makes
  the Slack connector use this channel instead of the one configured in the notification stream.
  The Slack channel should exist before the first message can be posted to it.
- `username` : this is the name of the user that will appear to be the author of messages posted
  to the Slack channel.
- `webHookUrl` : web hook url. It looks similar to this:
  `https://hooks.slack.com/services/TTTTTTTTT/BBBBBBBBB/01234567890`
- `template` : this is the template for the message to be posted to Slack. This can be a
  multi-line text with macros.

HipChat
-------

HipChat notification stream requires the following parameters:

- `type` = hipchat
- `webHookUrl` the web hook url for HipChat. The url includes room name (or its id) and authentication token.
- `color` message background color. Allowed values are: yellow, green, red, purple, gray, random.
  You can set the color statically, by setting the value to one of the valid colors, or
  pass it via the alert definition. See below.
- `notify` the value can be `true` or `false`. This is used to trigger user notification in HipChat.
- `template` HipChat message body template. You can use macros in this template to include alert fields.

To find the url for the HipChat room, click "Configure Integrations" in the HipChat UI:

.. image:: ../images/hipchat/hipchat_integration.png

This will take you to the HipChat web site where it should open the Integrations dialog for this room:

.. image:: ../images/hipchat/hipchat_room_integration_step_1.png

Click "Build your own integration":

.. image:: ../images/hipchat/hipchat_room_integration_step_2.png

Enter integration name "NetSpyGlass" and click "Create":

.. image:: ../images/hipchat/hipchat_room_integration_step_3.png

Here you get the url for the room. Copy it into the value of the parameter `webHookUrl`
in your HipChat notification stream configuration in NetSpyGlass.

.. note::

    Room name can contain spaces or any special characters that HipChat allows. This
    does not matter because `webHookUrl` refers to the room by its id.

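
The two ways of setting the `color` parameter (a static value, or the macro
`$alert.details.color`) can be illustrated with a minimal Python sketch. This is not
NetSpyGlass code: the function name `resolve_color` is hypothetical, and falling back to
the documented default `gray` when the expanded value is missing or not in the list of
valid colors is an assumption made for the example.

```python
# Hypothetical illustration of how a HipChat `color` value could be resolved.
# Valid colors and the default are taken from the parameter description above;
# the fallback behaviour is an assumption, not documented NetSpyGlass behaviour.
VALID_COLORS = {"yellow", "green", "red", "purple", "gray", "random"}
DEFAULT_COLOR = "gray"
COLOR_MACRO = "$alert.details.color"


def resolve_color(configured, details):
    """Return the color to use for a message.

    configured -- value of the stream parameter `color`
    details    -- the `details` dictionary passed to alert()
    """
    value = configured
    if configured == COLOR_MACRO:
        # the macro refers to the key "color" in the alert's `details`
        value = details.get("color", "")
    value = value.strip().lower()
    return value if value in VALID_COLORS else DEFAULT_COLOR


print(resolve_color("$alert.details.color", {"color": "red"}))
print(resolve_color("yellow", {}))
```

With these inputs the first call uses the color supplied by the alert and the second
uses the statically configured one.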

It is possible to set the HipChat message color in the alert definition as a way to express
its "severity" and visualize it in HipChat using color. The alert definition might look like
this::

    alert(
        name='busyCpuAlert',
        input=import_var('cpuUtil'),
        condition=lambda _, value: value > 75,
        description='CPU utilization is over 75% for 20% of time for the last 10 min',
        details={'color': 'red'},
        duration=600,
        percent_duration=20,
        notification_time=60,
        streams=['hipchat', 'log'],
        fan_out=True,
    )

Note the parameter `details`, which is a dictionary with one key `color`. Notification
stream `hipchat` then sets the color using a macro:

.. code-block:: none

    hipchat {
        type = hipchat
        webHookUrl = "https://api.hipchat.com/v2/room/{room_name}/notification?auth_token={authentication_token}"
        # background color for the message. Valid values: yellow, green,
        # red, purple, gray, random. Default: gray
        color = "$alert.details.color"
        notify = false
        template = """
        *$alert.name* : $alert.deviceName : $alert.componentName
        $alert.description | latest value: $alert.value   active since: $alert.activeSinceStr
        """
    }

PagerDuty
---------

PagerDuty stream requires the following configuration parameters:

- `type` = pagerduty
- `triggerUrl` : this is the PagerDuty web API URL. See PagerDuty developers documentation
  under Integration / Events. At this time it should be
  `https://events.pagerduty.com/generic/2010-04-15/create_event.json`
- `service` : API service key. Find it in PagerDuty under Configuration / Services
- `client` : Name of the API client, this appears in PagerDuty dashboards and incidents lists.
  Default value is `NetSpyGlass`.
- `clientUrl` : the URL used by PagerDuty to construct links for "View in NetSpyGlass".
  Default is `${ui.url}`
- `details` : data provided in this dictionary appears in the incident view under "Details".
  Default is:

  .. code-block:: none

      {
          "device": "$alert.deviceName"
          "component": "$alert.componentName"
          "value": "$alert.value"
      }

- `contexts` : Contexts to be included with the incident trigger such as links to graphs or
  images. See PagerDuty developers docs. Default is:

  .. code-block:: none

      {
          type: link
          href: ${ui.url}
      },
      {
          type: image
          src: ${graph24hr}
      }

Jira
----

NetSpyGlass alert notifications can open Jira issues if a notification stream with type
`jira` is used in the alert definition.

Jira stream requires the following configuration parameters:

- `type` = jira
- `webHookUrl` this should be the host part of the url of your Jira without any path part.
  For example, this can be `https://happygears.atlassian.net`
- `auth.user` and `auth.password` user name and password of the user account in your Jira
  that NetSpyGlass will use to make Jira API calls. The user should have permissions to create
  issues in the project. We recommend using a descriptive user name (e.g. "netspyglass") because
  it is visible in the list of Jira issues.
- `projectKey` Jira project key where issues will be created. Note that Jira has two ways
  to refer to projects, issue types and other items: there is "key" and there is internal "id".
  The key is what you see in the UI. Project key appears in the UI under the list of projects
  and is used as a prefix in issue keys. For example, if the issue key is `NSGA-1234`, then "NSGA"
  is the project key. The value of the parameter `projectKey` should be the project key string.
- `issueTypeName` the name of the issue type NetSpyGlass should assign to issues it creates.
  This must be one of the valid types in the Jira schema you are using. The value of this
  parameter is case-sensitive: use "Task" rather than "task".
- `deduplicationField` a name of the custom field in the Jira schema that NetSpyGlass uses to
  deduplicate issues. See below for more details.
- `openedStates` The list of Jira issue statuses the issue can have to be considered "open".
  The value of this parameter is a list of words that represent Jira issue states
  (case insensitive). Default value of this parameter is `["open", "reopened", "in progress" ]`.
  See below for more details.
- `summary` this parameter defines the template for the Jira issue summary field. You can use
  macros in this field.
- `description` this parameter defines the template for the Jira issue description. You can also
  use macros here.

Deduplication
^^^^^^^^^^^^^

Some alerts can be "busy", that is, they can potentially generate many notifications
while the alert condition exists. Even though you can control this by tuning the
parameter `notification_time` in the call to :func:`nw2functions.alert()`, it is still
possible to get "bursts" of notifications because of misconfiguration or when a new alert
with yet unknown operational parameters is added to the system. To avoid "spamming",
NetSpyGlass can be configured to not create a new Jira issue for each notification sent by
the same alert. Parameters `deduplicationField` and `openedStates` are used to configure this.

Whenever a new alert notification is generated, NetSpyGlass makes a string (a key) that
can be used to uniquely identify the alert that sent it. This key is added to the newly
created Jira issues in the field with the name provided by the configuration parameter
`deduplicationField`. Later on, when a notification from the same alert arrives again,
NetSpyGlass uses this key to perform a Jira API call to search for an existing issue. This
search query matches the deduplication field value and issue status. Only issues in one of
the "open" states will match. If an issue matching this search query exists, NetSpyGlass
does not create a new issue.

Since NetSpyGlass matches issue status, it becomes part of the Jira issue workflow.
If an issue with a matching alert deduplication key exists and is "open", there is no need
to create a new one since presumably people work on it (or at least should be aware of it).
The default list of states is `["open", "reopened", "in progress" ]` (words in the list are
case-insensitive). Once the issue has been closed, we assume the problem is considered to
have been resolved. However, if the alert triggers again and we get its notification, a new
issue should be opened even though the old one with the same alert deduplication key exists.

Setting Values of Jira Issue Fields
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

NetSpyGlass can set certain fields in the Jira issue it creates if the call to
:func:`alert()` has the parameter `details`. The value of this parameter is a Python
dictionary where keys are interpreted as Jira field names or ids; corresponding values
become values of those fields in the created issue. NetSpyGlass downloads the Jira schema
and validates field names and ids. It also checks if the corresponding field has
`allowedValues` in the schema and validates the value provided in the dictionary
`details`. For example::

    alert(
        name='busyCpuAlert',
        input=import_var('cpuUtil'),
        condition=lambda _, value: value > 25,
        description='CPU utilization is over 25% for 20% of time for the last 1 min',
        details={'Priority': 'Critical'},
        tags=['Explicit.alert_group_2'],
        duration=60,
        percent_duration=20,
        notification_time=3600,
        streams=['log', 'jira'],
        fan_out=True,
    )

In this call the parameter `details` has the value:

.. code-block:: python

    {'Priority': 'Critical'}

The created Jira issue will have its priority set to "Critical". Note that in the default
Jira schema the field `priority` has name `Priority`, id `priority` and allowed values. You
can inspect these using the Jira admin interface or with a Jira API call:

.. code-block:: none

    https://happygears.atlassian.net/rest/api/2/issue/createmeta?projectKeys=:projectKey&issuetypeNames=Task&expand=projects.issuetypes.fields

(replace `:projectKey` with your Jira project key to try this). This call returns the schema
in JSON format. It looks something like this:

.. code-block:: none

    {
      "expand": "projects",
      "projects": [
        {
          "expand": "issuetypes",
          "self": "https://happygears.atlassian.net/rest/api/2/project/10200",
          "id": "10200",
          "key": "NSGA",
          "name": "NetSpyGlass Alerts Testing",
          "avatarUrls": {},
          "issuetypes": [
            {
              "self": "https://happygears.atlassian.net/rest/api/2/issuetype/3",
              "id": "3",
              "description": "A task that needs to be done.",
              "iconUrl": "https://happygears.atlassian.net/secure/viewavatar?size=xsmall&avatarId=10418&avatarType=issuetype",
              "name": "Task",
              "subtask": false,
              "expand": "fields",
              "fields": {
                "priority": {
                  "required": false,
                  "schema": {},
                  "name": "Priority",
                  "hasDefaultValue": true,
                  "operations": [
                    "set"
                  ],
                  "allowedValues": [
                    {
                      "self": "https://happygears.atlassian.net/rest/api/2/priority/1",
                      "iconUrl": "https://happygears.atlassian.net/images/icons/priorities/blocker.svg",
                      "name": "Blocker",
                      "id": "1"
                    },
                    {
                      "self": "https://happygears.atlassian.net/rest/api/2/priority/2",
                      "iconUrl": "https://happygears.atlassian.net/images/icons/priorities/critical.svg",
                      "name": "Critical",
                      "id": "2"
                    },

Note how the field "priority" has name "Priority" and a list `allowedValues`. Each allowed
value has its own name. The value provided in the dictionary passed via the `details`
parameter in the call to :func:`alert()` must match one of the allowed value names. If the
field passed via the `details` dictionary does not have allowed values in the schema, its
value is passed to Jira without validation.

.. note::

    NetSpyGlass caches the downloaded Jira schema internally for 1 hour.

Example
^^^^^^^

Here is an example of a fully configured Jira notification stream:

.. code-block:: none

    alerts {
        streams {
            jira_nsga {
                type = jira

                auth {
                    user = netspyglass
                    password = jira_user_password
                }

                # web hook url without "/rest/api/2/" part
                #
                webHookUrl = "https://happygears.atlassian.net"

                projectKey = NSGA

                # issue type name. The value of this parameter is case sensitive. Valid values (for the default
                # JIRA schema): Bug, Epic, Improvement, "New Feature", Story, "Sub-task", "Task", "Technical task".
                # These can be different if your JIRA schema is different.
                issueTypeName = Task

                # custom field name that should be used to deduplicate issues (alert notifications that have
                # the same NetSpyGlass-generated alert key do not create new issues)
                #
                # If this parameter is missing or its value is an empty string, NetSpyGlass is going to
                # open new Jira issue for each alert notification without any attempt at deduplication.
                #
                # The field must already exist in Jira schema.
                deduplicationField = "alert key"

                # NetSpyGlass will update existing issue with matching deduplication field only if it is
                # in one of the following states. If the matching issue exists but is not in one of these
                # states, new issue will be opened. The value of this parameter is a list of words - Jira
                # issue status names (case insensitive). If this parameter is missing, NetSpyGlass
                # uses list ["open", "reopened", "in progress" ] as the default. If this parameter has
                # value of an empty list or includes invalid state names, no Jira issue will ever match
                # it and therefore NetSpyGlass will always open new issue for every alert notification.
                openedStates = ["open", "reopened", "in progress" ]

                # Issue summary (you can use template macros here)
                summary = "$alert.name : $alert.deviceName : $alert.componentName $alert.description"

                # Issue description template (you can use template macros here).
                description = """
                *$alert.deviceName : $alert.componentName*
                Values:
                {code}
                $alert.values
                {code}
                ~active since: $alert.activeSinceStr~
                """
            }
        }
    }

Here we use a Jira custom field with the name "alert key" for deduplication.

How to add custom Jira field
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Click the “gear” icon in the upper right, choose JIRA Administration, then Issues.
Click “Custom fields” in the panel on the left:

.. image:: ../images/jira/adding_custom_field.png

Click the “Add custom field” button and follow the wizard to add a Text Field (single line):

.. image:: ../images/jira/select_field_type.png

Give it a name and description:

.. image:: ../images/jira/field_name_and_description.png

JIRA will ask you to add this field to some screens. The field is used by NetSpyGlass
to deduplicate alerts and should not be modified by the user, but it should be
added to the default screen, otherwise it can not be used when an issue is created.

Use the newly created Jira issue field name “alert key” as the value of the configuration
parameter:

.. code-block:: none

    deduplicationField = "alert key"

.. note::

    The deduplication field name is provided in the Jira notification stream configuration in the
    configuration file `nw2.conf` rather than via the dictionary `details` that is passed
    as a parameter in the call to :func:`alert()`. The deduplication field will be set
    automatically and used to find existing Jira issues before a new one is created. You could
    set its value via the dictionary `details` but then it won't be used for deduplication.
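
The deduplication decision described in the section "Deduplication" can be summarized
with a small Python sketch. It is an illustration only, not the NetSpyGlass
implementation: the function `should_create_issue` is hypothetical, and existing issues
are modelled as plain dictionaries instead of results of a Jira search API call.

```python
# Hypothetical model of the Jira deduplication rule; names are illustrative.
DEFAULT_OPENED_STATES = ["open", "reopened", "in progress"]


def should_create_issue(alert_key, existing_issues,
                        dedup_field="alert key", opened_states=None):
    """Return True if a new Jira issue should be opened for this notification.

    alert_key       -- deduplication key generated for the alert
    existing_issues -- list of dicts with the dedup field and a "status" entry
    dedup_field     -- value of `deduplicationField`; empty disables deduplication
    opened_states   -- value of `openedStates`; None means "use the default list"
    """
    if not dedup_field:
        # no deduplication field configured: every notification opens an issue
        return True
    states = DEFAULT_OPENED_STATES if opened_states is None else opened_states
    allowed = {state.lower() for state in states}  # status names are case-insensitive
    for issue in existing_issues:
        same_key = issue.get(dedup_field) == alert_key
        is_open = issue.get("status", "").lower() in allowed
        if same_key and is_open:
            return False  # an "open" issue for this alert already exists
    return True


issues = [
    {"alert key": "k1", "status": "Closed"},
    {"alert key": "k1", "status": "In Progress"},
]
print(should_create_issue("k1", issues))      # an open issue matches the key
print(should_create_issue("k1", issues[:1]))  # only a closed issue exists
```

Passing an empty list as `opened_states` makes the function always return `True`, which
mirrors the configuration comment that an empty `openedStates` list never matches any issue.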