2.12. Performance tuning

2.12.1. Java Command Line

Large installations (over 1,000 devices, over 500,000 variables) need Java command line parameters that differ from the defaults, because the server has to create many large objects in memory. These parameters include options that set the maximum heap size and tune the garbage collector. They are set in the file /etc/default/netspyglass. Here are the recommended settings, tested on a server running with 2000 devices and 500,000 variables:

# /etc/default/netspyglass
# startup configuration variables for NetSpyGlass server and monitor
#
# start both server and monitor on this machine
COMPONENTS="server monitor"

# directory where the package is installed
INSTALL_DIR="/opt/netspyglass/current"

# User to run the server as
USER=nw2

# NetSpyGlass server home directory
HOME="/opt/netspyglass/home"

#----------------------------------------------------------
#   Server command line parameters
#
# JVM command line parameters. Garbage collector parameters go here, among other things
JVM_CLI="-Xmx32g -XX:+UseG1GC -XX:MaxPermSize=256m -XX:G1HeapRegionSize=32M -XX:+ParallelRefProcEnabled"
SERVER_CLI="${JVM_CLI} -DZK=embedded -DNAME=PrimaryServer -DROLE=primary"
SERVER_CLI="$SERVER_CLI -DCONFIG=${HOME}/nw2.conf -DLOG_DIR=${HOME}/logs"

This configuration sets the maximum heap size to 32G, which may be excessive if you have fewer devices and variables. Watch the variables jvmMemTotal and jvmMemFree in Graphing Workbench (category Monitor) to get an idea of how much memory your server really uses.

Right after a restart, the server usually does not use the maximum allowed amount of memory, and jvmMemTotal is less than the value specified with the -Xmx parameter. Memory use grows during a network discovery run and after each reconfiguration, that is, whenever devices or Python hook scripts change. Once jvmMemTotal reaches the maximum, it should stay there. From that point on, watch jvmMemFree. If the server still runs with a lot of free memory after many discovery and reconfiguration cycles, you can reduce the maximum in the -Xmx parameter if necessary.
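The sizing reasoning above can be turned into a rough calculation. The following Python sketch is only an illustration and is not part of NetSpyGlass: the function name suggest_xmx_gb, the 25% headroom, and the assumption that jvmMemTotal and jvmMemFree are reported in bytes are all hypothetical, so check the units shown in Graphing Workbench before relying on the result. It takes samples collected after many discovery and reconfiguration cycles, finds the peak of used memory (total minus free), and rounds up to whole gigabytes.

```python
import math

GIB = 1024 ** 3  # bytes in one gigabyte (binary), assumed unit of the samples


def suggest_xmx_gb(samples, headroom=0.25):
    """Suggest a value for -Xmx in whole gigabytes.

    samples  -- iterable of (jvmMemTotal, jvmMemFree) pairs, assumed to be in bytes
    headroom -- extra fraction kept on top of the observed peak (arbitrary choice)
    """
    # Used memory at each sample is total minus free; size for the worst case.
    peak_used = max(total - free for total, free in samples)
    needed = peak_used * (1 + headroom)
    return max(1, math.ceil(needed / GIB))


if __name__ == "__main__":
    # Hypothetical samples: heap fully grown to 32G, 14G to 20G of it free.
    observed = [(32 * GIB, 20 * GIB), (32 * GIB, 14 * GIB)]
    print("-Xmx%dg" % suggest_xmx_gb(observed))
```

Treat the result as a starting point only; keep watching jvmMemFree after lowering -Xmx.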

Note

If you experiment with different Java garbage collector settings, make sure to keep the parameters -XX:MaxPermSize=256m, -XX:+ParallelRefProcEnabled and -XX:G1HeapRegionSize=32M. You can, however, drop -XX:MaxPermSize=256m if you run NetSpyGlass with Java 8, which no longer has a permanent generation and ignores this option.

2.12.2. Data Push Tuning

Monitoring data collected by monitors or secondary servers is transmitted to other servers via data push. The parameter push in the configuration file cluster.conf determines which server the data is sent to. You can find more information about data push operation in Data Flow. Data is transmitted in blocks, several monitoring variables at a time.

Since all NetSpyGlass cluster members operate on a strict schedule, it is important that all data is transmitted on time. Each cluster member must complete its data push in less time than one monitoring cycle. The push usually starts some time after the beginning of the cycle, but the same delay applies to every subsequent push, so all cluster members transmit data to each other in an orchestrated, synchronized manner. If a cluster member takes too long to transmit the data it has accumulated, it falls behind, and its subsequent pushes start with progressively greater delay after the beginning of the corresponding monitoring cycles. Servers wait some time for their downstreams to complete the push (usually a couple of cycles), but then they time out and do not process the data even if the downstreams eventually complete their push.

Sometimes you may need to tune a couple of parameters to make sure data push can keep up. These parameters are located in the dictionary push at the top level of the configuration file nw2.conf:

# these parameters are used to fine tune timeouts in the data push accumulator.
# Most likely you do not need to change these.
push {
    monitorPushEndWaitTimeoutPollingCycles = 1.5
    serverPushEndWaitTimeoutPollingCycles = 4

    # number of threads to use to make data push calls in parallel. Changes to
    # this parameter require server restart
    threads = 12
}

The parameters monitorPushEndWaitTimeoutPollingCycles and serverPushEndWaitTimeoutPollingCycles tell the server how long to wait for all downstreams it expects a data push from to complete that push. Both parameters are expressed in units of the polling cycle interval. Most likely you don’t need to change the defaults.
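Because both timeouts are expressed in polling cycles, the wall-clock wait depends on the polling interval. The short Python sketch below shows the arithmetic. The function name push_wait_timeout_sec and the 60-second polling interval are assumptions made for illustration; substitute the interval your installation actually uses.

```python
def push_wait_timeout_sec(polling_cycles, polling_interval_sec):
    """Convert a timeout given in polling cycles to seconds."""
    return polling_cycles * polling_interval_sec


# Assumed polling interval of 60 seconds (hypothetical; use your own value).
POLLING_INTERVAL_SEC = 60

# monitorPushEndWaitTimeoutPollingCycles = 1.5
print(push_wait_timeout_sec(1.5, POLLING_INTERVAL_SEC))  # 90.0

# serverPushEndWaitTimeoutPollingCycles = 4
print(push_wait_timeout_sec(4, POLLING_INTERVAL_SEC))    # 240
```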

Parameter threads tells the sender how many parallel threads to use to transmit the data. The set of variables that need to be pushed is divided between these threads so that they can be transmitted in parallel. This helps speed up data push over links with high latency.
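To illustrate the idea of dividing variables between threads, here is a minimal Python sketch. It does not reproduce the actual NetSpyGlass implementation, whose partitioning scheme is not described here. The helper names split_round_robin and push_in_parallel, the round-robin split, and the send callback are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def split_round_robin(items, parts):
    """Divide items into `parts` lists of nearly equal size."""
    return [items[i::parts] for i in range(parts)]


def push_in_parallel(variables, threads, send):
    """Call send(chunk) for each chunk of variables, using `threads` workers.

    `send` stands in for whatever call transmits one chunk to the receiver;
    it is a placeholder, not a NetSpyGlass API.
    """
    # Skip empty chunks, which occur when there are fewer variables than threads.
    chunks = [c for c in split_round_robin(variables, threads) if c]
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # Each send waits on the network, so overlapping them hides latency.
        return list(pool.map(send, chunks))


if __name__ == "__main__":
    fake_variables = list(range(10))
    # Use len() as a stand-in for the network call: report each chunk's size.
    print(push_in_parallel(fake_variables, threads=3, send=len))
```

With a slow link, each call spends most of its time waiting for the remote side, which is why running several calls concurrently shortens the total push time.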

To verify that data push is able to keep up, inspect the log file /opt/netspyglass/home/logs/info.log on the sender’s side. Look for lines that include the words “PUSH DONE” (here the long log record line has been folded for readability):

2016-05-18 23:49:42,430 INFO  pool-17-thread-1 [rocessor.ServerDataPusher]: PUSH DONE   (23:49:00); cycle 24;
     to PrimaryServer; variables: 253857; calls: 635; took 4303 ms in 12 threads

The record reports how many variables were pushed (253857), how many API calls were made (635) and how long the push took (4303 ms). Since the push completed in just over 4 seconds, this server is keeping up.
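This check can be scripted. The Python sketch below parses “PUSH DONE” records and compares the push time with the monitoring cycle length. The regular expression is derived only from the single sample record shown above, with the record on one line as it appears in the actual log file, so it may need adjusting if your log format differs. The function names and the 60-second cycle are assumptions made for illustration.

```python
import re

# Pattern built from the sample record above; other layouts are not covered.
PUSH_DONE = re.compile(
    r"PUSH DONE\s+\((?P<cycle_time>\d{2}:\d{2}:\d{2})\);\s*cycle (?P<cycle>\d+);\s*"
    r"to (?P<target>[^;]+);\s*variables: (?P<variables>\d+);\s*"
    r"calls: (?P<calls>\d+);\s*took (?P<took_ms>\d+) ms in (?P<threads>\d+) threads"
)


def parse_push_done(line):
    """Return a dict of fields from a PUSH DONE record, or None if no match."""
    match = PUSH_DONE.search(line)
    if match is None:
        return None
    record = match.groupdict()
    for key in ("cycle", "variables", "calls", "took_ms", "threads"):
        record[key] = int(record[key])
    return record


def keeps_up(record, cycle_sec):
    """True if the push finished in less time than one monitoring cycle."""
    return record["took_ms"] < cycle_sec * 1000


if __name__ == "__main__":
    sample = (
        "2016-05-18 23:49:42,430 INFO  pool-17-thread-1 "
        "[rocessor.ServerDataPusher]: PUSH DONE   (23:49:00); cycle 24; "
        "to PrimaryServer; variables: 253857; calls: 635; "
        "took 4303 ms in 12 threads"
    )
    rec = parse_push_done(sample)
    # 60 seconds is an assumed cycle length; use your configured interval.
    print(rec["target"], rec["took_ms"], keeps_up(rec, cycle_sec=60))
```

To scan a whole log, apply parse_push_done to each line of info.log and flag the records for which keeps_up returns False.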