.. _performance_tuning:

Performance tuning
******************

Java Command line
=================

Large installations (over 1,000 devices, over 500,000 variables) require Java command line
parameters different from the defaults to accommodate the large number of objects the server
needs to create in memory. These parameters include options to set the maximum memory heap
size and to tune the garbage collector. They are set in the file ``/etc/default/netspyglass``.
Here are recommended settings, tested on a server running with 2000 devices and
500,000 variables::

    # /etc/default/netspyglass
    # startup configuration variables for NetSpyGlass server and monitor
    #
    # start both server and monitor on this machine
    COMPONENTS="server monitor"

    # directory where the package is installed
    INSTALL_DIR="/opt/netspyglass/current"

    # User to run the server as
    USER=nw2

    # NetSpyGlass server home directory
    HOME="/opt/netspyglass/home"

    #----------------------------------------------------------
    # Server command line parameters
    #
    # JVM command line parameters. Garbage collector parameters go here, among other things
    JVM_CLI="-Xmx32g -XX:+UseG1GC -XX:MaxPermSize=256m -XX:G1HeapRegionSize=32M -XX:+ParallelRefProcEnabled"

    SERVER_CLI="${JVM_CLI} -DZK=embedded -DNAME=PrimaryServer -DROLE=primary"
    SERVER_CLI="$SERVER_CLI -DCONFIG=${HOME}/nw2.conf -DLOG_DIR=${HOME}/logs"

This configuration sets the maximum heap size to 32G, which may be excessive if you have fewer
devices and variables. Watch the variables `jvmMemTotal` and `jvmMemFree` in Graphing Workbench
(category `Monitor`) to get an idea of how much memory your server really uses. Usually, right
after a restart, the server does not use the maximum allowed amount of memory and `jvmMemTotal`
is less than the value specified with the `-Xmx` parameter. Memory usage grows during network
discovery runs and after each reconfiguration, when devices or python hook scripts change. Once
the value of `jvmMemTotal` reaches the maximum, it should stay there. At this point watch
`jvmMemFree`. If the server still runs with a lot of free memory after many discovery and
reconfiguration cycles, you can reduce the maximum in the `-Xmx` parameter.

.. note::

    If you experiment with different Java garbage collector configuration parameters, make sure
    to keep the parameters `-XX:+ParallelRefProcEnabled` and `-XX:G1HeapRegionSize=32M`. Keep
    `-XX:MaxPermSize=256m` as well, unless you run NetSpyGlass with Java 8, which removed the
    permanent generation and ignores this parameter.
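If you prefer to watch heap usage directly on the host, in addition to the Graphing Workbench
variables, standard JDK tooling can be used. The following is a minimal sketch, not part of the
NetSpyGlass distribution: it assumes the server runs on a JDK that provides `jstat`, and the
`pgrep` pattern used to find the server process is only an example that may need to be adjusted
for your installation::

    # sample JVM heap and GC statistics every 10 seconds
    # (the pgrep pattern below is only an example; adjust it for your installation)
    NSG_PID=$(pgrep -f nw2.conf | head -1)

    # in the -gcutil output, column O shows old-generation occupancy in percent,
    # and FGC/FGCT show the full GC count and total time; an old generation that
    # stays close to 100% with frequent full GCs means the heap is too small
    jstat -gcutil "$NSG_PID" 10000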
Data Push Tuning
================

Monitoring data collected by monitors or secondary servers is transmitted to other servers via
data push. Which server the data is sent to is determined by the parameter `push` in the
configuration file `cluster.conf`. You can find more information about data push operation in
:ref:`data_flow`.

Data is transmitted in blocks, several monitoring variables at a time. Since all NetSpyGlass
cluster members operate on a strict schedule, it is important to make sure all data can be
transmitted on time. Each cluster member must complete its data push in less time than one
monitoring cycle. Even though the push usually starts with some delay after the beginning of the
cycle, the same is true for every subsequent push, so all cluster members transmit data to each
other in an orchestrated, synchronised manner.

If a cluster member takes a long time to transmit all the data it has accumulated, it falls
behind, and its subsequent pushes happen with progressively greater delay after the beginning of
the corresponding monitoring cycles. Servers wait for some time for their downstreams to complete
the push (usually a couple of cycles), but then they time out and will not process the data even
if the downstreams actually complete their push late.

Sometimes you may need to tune a couple of parameters to make sure data push can keep up. These
parameters are located in the dictionary `push` at the top level of the configuration file
`nw2.conf`::

    # these parameters are used to fine tune timeouts in the data push accumulator.
    # Most likely you do not need to change these.
    push {
        monitorPushEndWaitTimeoutPollingCycles = 1.5
        serverPushEndWaitTimeoutPollingCycles = 4

        # number of threads to use to make data push calls in parallel. Changes to
        # this parameter require server restart
        threads = 12
    }

Parameters `monitorPushEndWaitTimeoutPollingCycles` and `serverPushEndWaitTimeoutPollingCycles`
tell the server how long it should wait for all downstreams from which it expects a data push to
complete it. Both parameters define the time in units of the polling cycle interval. Most likely
you do not need to change the defaults.

Parameter `threads` tells the sender how many parallel threads to use to transmit the data. The
set of variables that need to be pushed is divided between these threads so that they can be
transmitted in parallel. This helps speed up data push over links with high latency.

To verify that data push is able to keep up, inspect the log file
`/opt/netspyglass/home/logs/info.log` on the sender's side. Look for lines that include the words
"PUSH DONE" (here the long log record line has been folded for readability)::

    2016-05-18 23:49:42,430 INFO pool-17-thread-1 [rocessor.ServerDataPusher]:
        PUSH DONE (23:49:00); cycle 24; to PrimaryServer; variables: 253857;
        calls: 635; took 4303 ms in 12 threads

The record reports how many variables the server pushed (253857), how many API calls it made
(635), and how much time the push took (4303 ms). Since it was able to complete the push in just
over 4 seconds, this server is doing ok.
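A quick way to keep an eye on push duration over time is to extract it from these log records.
The following sketch assumes the log record format shown above, where the duration in
milliseconds follows the word "took"; adjust the path and the field handling if your log format
differs::

    # print the duration of the last 20 data pushes, in milliseconds
    # (assumes the "PUSH DONE ... took <N> ms ..." record format shown above)
    grep 'PUSH DONE' /opt/netspyglass/home/logs/info.log | tail -20 | \
        awk '{ for (i = 1; i <= NF; i++) if ($i == "took") print $(i+1), "ms" }'

If these durations approach the length of your polling cycle, consider increasing the `threads`
parameter described above or reducing latency between cluster members.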