Controlling query worker heap size

Remote query dispatcher parameters

Each remote query dispatcher instance can be configured as to the resources it allows workers to consume. Dispatchers running on larger servers may allow more resources to be consumed, while smaller dispatchers (e.g., on test nodes) may have less resources available.

Note

Each parameter below is shown with its default value.

Some of these properties apply only to non-interactive workers. Most workers are considered interactive, including embedded consoles and persistent queries. Non-interactive workers are created through the use of specific Deephaven APIs.

  • The following parameter defines the total available heap for all worker usage. The combined heap requests for all workers cannot exceed this value in MB, and the default value is 354304:
    • RemoteQueryDispatcher.maxTotalQueryProcessorHeapMB
  • The following parameter defines the maximum heap size allowed for any worker on this dispatcher. If not defined, it defaults to the total available heap size for the dispatcher: RemoteQueryDispatcher.maxPerWorkerHeapMB

Note

If a remote query dispatcher belongs to a restricted class such as Merge then a user must be able to start workers on a server of that class to be able to start consoles on that dispatcher, based on their ACLs. This can be done by either:

  1. Ensuring that the user is in a group that can start at least one type of persistent query on a server of that class (see Query Types). For example, iris-dataimporters can create data-import persistent queries, so members of that group are allowed to start workers on merge servers.
  2. Adding one or more servers and users to a console server class and the associated ACL group (see Console Server Classes). For example, add a console group for the Merge class with the allowed-group of merge-console-group: ConsoleServerClass.Merge.allowedGroups=merge-console-group. With the ACL editor, add any users that should be allowed to start consoles to the specified ACL group, in this case merge-console-group.

Configuring Worker Heap Overhead

The Deephaven system creates a new JVM for each worker, specifying the maximum allowed heap through the -Xmx parameter based on the user's specification. The bulk of a JVM's memory usage is for the heap, which is where most user-defined objects are allocated. However, the JVM also uses off-heap native memory for several reasons, including direct memory allocations, garbage collection information, meta-space, classes, compiler caches, and more. Particularly for dispatchers that run many small workers, this can result in more memory usage than the configured values, causing the dispatcher to under-calculate actual memory usage for the RemoteQueryDispatcher.maxTotalQueryProcessorHeapMB restriction. If this happens and the server runs out of memory, the operating system can kill random processes.

Deephaven provides the RemoteQueryDispatcher.memoryOverheadMB and RemoteQueryDispatcher.memoryOverheadMultiplier properties to change the way the dispatcher calculates memory usage, increasing every worker's assumed memory usage if these are set. For example:

  • RemoteQueryDispatcher.memoryOverheadMB=300: Adds 300MB to the memory-used calculation for every worker.
  • RemoteQueryDispatcher.memoryOverheadMultiplier=0.05: Adds 5% of the requested heap size to the memory-used calculation for every worker.

With the above example, if a heap of 1GB (1024MB) is requested, the dispatcher assumes the worker process uses 1,375MB of memory and subtracts it from the available heap. For a 32GB worker (32768MB), the dispatcher would account for 34,706MB of memory.

If workers are being unexpectedly OOM killed, then these properties should be increased. The disadvantage of increasing them beyond what is necessary is that the dispatcher can create fewer workers.

Persistent Query Controller and Console

When creating a persistent query, a user cannot set its heap size to a larger value than allowed for the chosen server based on that server's configured active query processing heap. If that dispatcher is not running, then a basic maximum is based off the following property. If the property is not defined, a maximum of 1024 GB is enforced:

PersistentQueryController.defaultMaxHeapSizeGB