.. _performance_tuning:

Performance tuning
******************

Java Command line
=================

Large installations (over 1,000 devices, over 500,000 variables) require Java command line
parameters different from the defaults to accommodate the large number of objects the server
needs to create in memory. These parameters include options to set the maximum memory heap
size and to tune the garbage collector. They are set in the file ``/etc/default/netspyglass``.
Here are recommended settings, tested on a server running with 2000 devices and
500,000 variables::

    # /etc/default/netspyglass
    # startup configuration variables for NetSpyGlass server and monitor
    #
    # start both server and monitor on this machine
    COMPONENTS="server monitor"

    # directory where the package is installed
    INSTALL_DIR="/opt/netspyglass/current"

    # User to run the server as
    USER=nw2

    # NetSpyGlass server home directory
    HOME="/opt/netspyglass/home"

    #----------------------------------------------------------
    # Server command line parameters
    #
    # JVM command line parameters. Garbage collector parameters go here, among other things
    JVM_CLI="-Xmx32g -XX:+UseG1GC -XX:MaxPermSize=256m -XX:G1HeapRegionSize=32M -XX:+ParallelRefProcEnabled"

    SERVER_CLI="${JVM_CLI} -DZK=embedded -DNAME=PrimaryServer -DROLE=primary"
    SERVER_CLI="$SERVER_CLI -DCONFIG=${HOME}/nw2.conf -DLOG_DIR=${HOME}/logs"

This configuration sets the maximum heap size to 32G, which may be excessive if you have fewer
devices and variables. Watch the variables `jvmMemTotal` and `jvmMemFree` in Graphing Workbench
(category `Monitor`) to get an idea of how much memory your server really uses. Usually, right
after a restart, the server does not use the maximum allowed amount of memory and `jvmMemTotal`
is less than the value specified with the `-Xmx` parameter. Memory usage grows during network
discovery runs and after each reconfiguration, when devices or python hook scripts change. Once
the value of `jvmMemTotal` reaches the maximum, it should stay there. At this point watch
`jvmMemFree`. If the server still runs with a lot of free memory after many discovery and
reconfiguration cycles, you can reduce the maximum in the `-Xmx` parameter.

.. note::

    If you experiment with different Java garbage collector configuration parameters, make sure
    to keep the parameters `-XX:+ParallelRefProcEnabled` and `-XX:G1HeapRegionSize=32M`. Keep
    `-XX:MaxPermSize=256m` as well, unless you run NetSpyGlass with Java 8, which removed the
    permanent generation and ignores this parameter.
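If you prefer to watch heap usage directly on the host, in addition to the Graphing Workbench
variables, standard JDK tooling can be used. The following is a minimal sketch, not part of the
NetSpyGlass distribution: it assumes the server runs on a JDK that provides `jstat`, and the
`pgrep` pattern used to find the server process is only an example that may need to be adjusted
for your installation::

    # sample JVM heap and GC statistics every 10 seconds
    # (the pgrep pattern below is only an example; adjust it for your installation)
    NSG_PID=$(pgrep -f nw2.conf | head -1)

    # in the -gcutil output, column O shows old-generation occupancy in percent,
    # and FGC/FGCT show the full GC count and total time; an old generation that
    # stays close to 100% with frequent full GCs means the heap is too small
    jstat -gcutil "$NSG_PID" 10000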
Data Push Tuning
================

Monitoring data collected by monitors or secondary servers is transmitted to other servers via
data push. Which server the data is sent to is determined by the parameter `push` in the
configuration file `cluster.conf`. You can find more information about data push operation in
:ref:`data_flow`.

Data is transmitted in blocks, several monitoring variables at a time. Since all NetSpyGlass
cluster members operate on a strict schedule, it is important to make sure all data can be
transmitted on time. Each cluster member must complete its data push in less time than one
monitoring cycle. Even though the push usually starts with some delay after the beginning of the
cycle, the same is true for every subsequent push, so all cluster members transmit data to each
other in an orchestrated, synchronised manner.

If a cluster member takes a long time to transmit all the data it has accumulated, it falls
behind, and its subsequent pushes happen with progressively greater delay after the beginning of
the corresponding monitoring cycles. Servers wait for some time for their downstreams to complete
the push (usually a couple of cycles), but then they time out and will not process the data even
if the downstreams actually complete their push late.

Sometimes you may need to tune a couple of parameters to make sure data push can keep up. These
parameters are located in the dictionary `push` at the top level of the configuration file
`nw2.conf`::

    # these parameters are used to fine tune timeouts in the data push accumulator.
    # Most likely you do not need to change these.
    push {
        monitorPushEndWaitTimeoutPollingCycles = 1.5
        serverPushEndWaitTimeoutPollingCycles = 4

        # number of threads to use to make data push calls in parallel. Changes to
        # this parameter require server restart
        threads = 12
    }

Parameters `monitorPushEndWaitTimeoutPollingCycles` and `serverPushEndWaitTimeoutPollingCycles`
tell the server how long it should wait for all downstreams from which it expects a data push to
complete it. Both parameters define the time in units of the polling cycle interval. Most likely
you do not need to change the defaults.

Parameter `threads` tells the sender how many parallel threads to use to transmit the data. The
set of variables that need to be pushed is divided between these threads so that they can be
transmitted in parallel. This helps speed up data push over links with high latency.

To verify that data push is able to keep up, inspect the log file
`/opt/netspyglass/home/logs/info.log` on the sender's side. Look for lines that include the words
"PUSH DONE" (here the long log record line has been folded for readability)::

    2016-05-18 23:49:42,430 INFO pool-17-thread-1 [rocessor.ServerDataPusher]:
        PUSH DONE (23:49:00); cycle 24; to PrimaryServer; variables: 253857;
        calls: 635; took 4303 ms in 12 threads

The record reports how many variables the server pushed (253857), how many API calls it made
(635), and how much time the push took (4303 ms). Since it was able to complete the push in just
over 4 seconds, this server is doing ok.
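A quick way to keep an eye on push duration over time is to extract it from these log records.
The following sketch assumes the log record format shown above, where the duration in
milliseconds follows the word "took"; adjust the path and the field handling if your log format
differs::

    # print the duration of the last 20 data pushes, in milliseconds
    # (assumes the "PUSH DONE ... took <N> ms ..." record format shown above)
    grep 'PUSH DONE' /opt/netspyglass/home/logs/info.log | tail -20 | \
        awk '{ for (i = 1; i <= NF; i++) if ($i == "took") print $(i+1), "ms" }'

If these durations approach the length of your polling cycle, consider increasing the `threads`
parameter described above or reducing latency between cluster members.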