Automated server selection

This guide discusses the Persistent Query Controller's automated server selection feature, which balances the load of Persistent Queries (PQs) and consoles across multiple servers. The controller uses a Server Selection Provider (SSP) to determine which server each worker or PQ runs on, for load balancing and resource management. Deephaven offers a simple implementation, as well as the ability to implement your own.

  • Server groups consist of one or more dispatchers where PQs can be started. Server groups found in nearly every Deephaven installation include AutoQuery, which includes all query servers, and AutoMerge, which includes all merge servers.
  • The SSP provides these group names to the controller, allowing users to select the server group on which to run a PQ or console. When a user selects a group, the worker may be started on any dispatcher within that group.

Server selection providers can implement any algorithm of your choice, but they must implement the com.illumon.iris.controller.IServerSelectionProvider interface, including the required constructor. The com.illumon.iris.controller.SimpleServerSelectionProvider class is Deephaven's implementation and is described in detail below.

The provider is specified by the PersistentQueryController.ServerSelectionProvider property. For example:

PersistentQueryController.ServerSelectionProvider=com.illumon.iris.controller.SimpleServerSelectionProvider

Dispatcher configuration

Server selection logic uses two dispatcher configuration properties to determine how much heap is available on each dispatcher and how much a single worker may request:

  • RemoteQueryDispatcher.maxTotalQueryProcessorHeapMB specifies the total heap, in MB, that all of a dispatcher's workers combined are allowed to use. For installations where each dispatcher runs on its own dedicated server, this is typically most of that server's memory; for installations where multiple dispatchers run on a single server, each dispatcher should be configured with a reasonable share of that memory.
  • RemoteQueryDispatcher.maxPerWorkerHeapMB specifies the maximum heap, in MB, that each worker is allowed to request.

For example, the following stanza specifies that the merge servers should be allocated 40GB of heap for workers, but each worker is only allowed 4GB of heap:

[service.name=dbmerge] {
  RemoteQueryDispatcher.maxTotalQueryProcessorHeapMB=40960
  RemoteQueryDispatcher.maxPerWorkerHeapMB=4096
}
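As a rough illustration of how these two limits interact, the sketch below models an admission check in Python. The function name `can_admit` and its bookkeeping are hypothetical and are not part of Deephaven; only the two limit values come from the stanza above, and the real dispatcher's accounting may differ.

```python
def can_admit(requested_mb, used_mb, max_total_mb=40960, max_per_worker_mb=4096):
    """Return True if a worker requesting `requested_mb` of heap fits both limits.

    Hypothetical model: `used_mb` is the heap already granted to running workers.
    """
    if requested_mb > max_per_worker_mb:
        return False  # exceeds the per-worker ceiling
    # The request must also fit in the heap that is still unallocated.
    return used_mb + requested_mb <= max_total_mb


print(can_admit(4096, used_mb=0))      # True
print(can_admit(8192, used_mb=0))      # False: over the per-worker limit
print(can_admit(4096, used_mb=38912))  # False: only 2048 MB of total heap left
```

Under this model, a merge server configured as above could host at most ten workers that each request the full 4GB.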

Simple server selection provider

The com.illumon.iris.controller.SimpleServerSelectionProvider class is a simple implementation of the com.illumon.iris.controller.IServerSelectionProvider interface. It performs round-robin-based selection of servers within a group based on server availability and current usage.

The Simple Server Selection Provider uses a basic resource-comparison algorithm. It compares the percentage of heap utilization on each server and, in case of a tie, uses the number of running workers. When a PQ has replicas or spares, the provider first compares the number of workers for that specific PQ on each server. This prevents all replicas from starting on a single, lightly used server. The Simple Server Selection Provider is configured using the following properties.
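The comparison order described above can be sketched in Python as a sort key. This is an illustrative model only, not the provider's actual code: the `ServerState` fields, the `pick_server` function, and the assumption that servers without enough free heap are filtered out first are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ServerState:
    # Hypothetical snapshot of one server; field names are illustrative only.
    name: str
    total_heap_mb: int
    used_heap_mb: int
    worker_count: int
    workers_per_pq: dict = field(default_factory=dict)  # PQ id -> workers running here


def pick_server(servers, requested_mb, pq_id=None):
    """Pick the least-loaded server that can fit the request, or None."""
    # Assumption: a server is a candidate only if the request fits in its free heap.
    candidates = [s for s in servers
                  if s.used_heap_mb + requested_mb <= s.total_heap_mb]
    if not candidates:
        return None

    def sort_key(s):
        same_pq = s.workers_per_pq.get(pq_id, 0) if pq_id is not None else 0
        heap_pct = s.used_heap_mb / s.total_heap_mb
        # Fewest workers of this PQ first, then heap utilization, then worker count.
        return (same_pq, heap_pct, s.worker_count)

    return min(candidates, key=sort_key)


servers = [
    ServerState("query-1", 65536, 8192, 2, {"pq-42": 1}),
    ServerState("query-2", 65536, 16384, 3),
]
print(pick_server(servers, 4096).name)           # query-1 (lower heap utilization)
print(pick_server(servers, 4096, "pq-42").name)  # query-2 (no pq-42 replica there yet)
```

Placing the per-PQ count first in the key is what keeps a second replica off the server that already hosts the first, even when that server is otherwise the least loaded.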

The property below defines the groups for the provider. It is a comma-delimited list of group names, which are used to derive the rest of the properties:

SimpleServerSelectionProvider.ActiveGroups

The following example defines two groups called AutoQuery and AutoMerge:

SimpleServerSelectionProvider.ActiveGroups=AutoQuery,AutoMerge

The Simple Server Selection Provider then uses these group names to find the other properties:

SimpleServerSelectionProvider.Group.<group name>.<property suffix>=<value>
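To make the naming pattern concrete, the following Python sketch expands an ActiveGroups value into the per-group property names for one suffix. The helper `group_property_names` is hypothetical, and trimming whitespace around group names is an assumption rather than documented behavior.

```python
PREFIX = "SimpleServerSelectionProvider"


def group_property_names(active_groups, suffix):
    """Expand a comma-delimited ActiveGroups value into property names for `suffix`."""
    # Assumption: surrounding whitespace and empty entries are ignored.
    groups = [g.strip() for g in active_groups.split(",") if g.strip()]
    return [f"{PREFIX}.Group.{g}.{suffix}" for g in groups]


print(group_property_names("AutoQuery,AutoMerge", "ServerClass"))
# ['SimpleServerSelectionProvider.Group.AutoQuery.ServerClass',
#  'SimpleServerSelectionProvider.Group.AutoMerge.ServerClass']
```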

The following table defines the property suffixes.

| Property Suffix | Meaning |
| --- | --- |
| ServerClass | Specifies the server class for this group. Only servers of this class will be chosen when a server is requested. Typical values are Query and Merge. If this is not supplied, then the value specified by the property iris.db.defaultServerClass is used, which is typically Query. |
| Servers | Specifies the servers in this group. If not specified, all servers of the specified class are in this group. |
| MaxHeapMBPerWorker | Specifies the maximum heap in MB allowed for a worker for this group. If not specified or 0, then the value specified in this group's DefaultHeapMBPerServer property is used. |
| DefaultHeapMBPerServer | Specifies the default total heap in MB for each server in this group. The controller replaces this with the dispatcher's real configured value after it connects to each dispatcher. |
| ConsoleGroups | If defined, specifies a list of ACL groups. A user must belong to one of these groups to specify this server group when starting a console or Code Studio. This property is used by the UI to restrict the options shown to a user, while dispatcher group restrictions cover dispatcher enforcement. |
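The fallback rules for these suffixes can be sketched in Python as follows. The function `effective_group_settings` and the returned dictionary keys are hypothetical; the sketch also assumes that "Query" is used when neither ServerClass nor iris.db.defaultServerClass is set, and that a missing DefaultHeapMBPerServer is treated as 0.

```python
def effective_group_settings(props, group):
    """Resolve one group's settings from a dict of property name -> string value."""
    prefix = f"SimpleServerSelectionProvider.Group.{group}."
    # ServerClass falls back to iris.db.defaultServerClass (assumed "Query" if unset).
    server_class = props.get(prefix + "ServerClass") or props.get(
        "iris.db.defaultServerClass", "Query")
    default_heap_mb = int(props.get(prefix + "DefaultHeapMBPerServer", "0"))
    max_worker_mb = int(props.get(prefix + "MaxHeapMBPerWorker", "0"))
    if max_worker_mb == 0:
        # Unset or 0 falls back to the group's per-server default.
        max_worker_mb = default_heap_mb
    servers_raw = props.get(prefix + "Servers", "")
    # None stands for "all servers of the group's class".
    servers = [s.strip() for s in servers_raw.split(",") if s.strip()] or None
    return {
        "server_class": server_class,
        "servers": servers,
        "max_worker_mb": max_worker_mb,
        "default_heap_mb": default_heap_mb,
    }


props = {
    "SimpleServerSelectionProvider.Group.AutoMerge.ServerClass": "Merge",
    "SimpleServerSelectionProvider.Group.AutoMerge.DefaultHeapMBPerServer": "65536",
}
settings = effective_group_settings(props, "AutoMerge")
print(settings["max_worker_mb"])  # 65536, because MaxHeapMBPerWorker is unset
print(settings["servers"])        # None, meaning every server of class Merge
```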

Here is an example configuration:

# Use the SimpleServerSelectionProvider
PersistentQueryController.ServerSelectionProvider=com.illumon.iris.controller.SimpleServerSelectionProvider

# Specify two groups, named AutoQuery and AutoMerge
SimpleServerSelectionProvider.ActiveGroups=AutoQuery,AutoMerge

# Properties for the AutoQuery group:
# It will use all available servers of the Query class
# The default total heap MB per server is 65536
# No workers can use this group if they request over 32768M in heap
SimpleServerSelectionProvider.Group.AutoQuery.ServerClass=Query
SimpleServerSelectionProvider.Group.AutoQuery.MaxHeapMBPerWorker=32768
SimpleServerSelectionProvider.Group.AutoQuery.DefaultHeapMBPerServer=65536

# Properties for the AutoMerge group:
# It will only use the servers named Merge_Server_1 and Merge_Server_2
# The default total heap MB per server is 65536
# No workers can use this group if they request over 32768M in heap
SimpleServerSelectionProvider.Group.AutoMerge.ServerClass=Merge
SimpleServerSelectionProvider.Group.AutoMerge.Servers=Merge_Server_1,Merge_Server_2
SimpleServerSelectionProvider.Group.AutoMerge.MaxHeapMBPerWorker=32768
SimpleServerSelectionProvider.Group.AutoMerge.DefaultHeapMBPerServer=65536

Mark a server administratively down

You can temporarily mark a server as unavailable using the dhconfig pq selection-provider command. This will exclude the server from the list of available servers for the Simple Server Selection Provider. This prevents new queries or consoles from being automatically assigned to that query server. However, the server can still be selected manually, and existing running queries will not be evicted. This is useful if a server is malfunctioning, allowing you to remove it from the rotation while retaining access for debugging purposes.

You can also use dhconfig pq selection-provider to add a server back to the list of available servers after it's been removed.

The down state of a server is not persisted; on controller restart, all servers are marked as up. To permanently remove a server, re-run the Deephaven installer after updating the appropriate properties.
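The semantics described in this section can be modeled with a small Python sketch. It is not how the controller or the dhconfig command is implemented; the `Availability` class and its method names are hypothetical and only illustrate that the down state affects automatic selection, leaves manual selection alone, and lives in memory.

```python
class Availability:
    """Hypothetical in-memory model of administrative up/down state."""

    def __init__(self, servers):
        self.servers = list(servers)
        # Not persisted: a fresh instance (a controller restart) has every server up.
        self.down = set()

    def mark_down(self, name):
        self.down.add(name)

    def mark_up(self, name):
        self.down.discard(name)

    def auto_candidates(self):
        # Servers marked down are skipped by automatic selection only.
        return [s for s in self.servers if s not in self.down]

    def can_select_manually(self, name):
        # Manual selection ignores the administrative down state.
        return name in self.servers


state = Availability(["query-1", "query-2"])
state.mark_down("query-2")
print(state.auto_candidates())               # ['query-1']
print(state.can_select_manually("query-2"))  # True
```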

Failure backoff policy

Sometimes a server fails to start workers as directed by the dispatcher. Because workers that never start add no load, that server often becomes the least-loaded server. Consequently, the Simple Server Selection Provider may try to assign all new workers to it.

To prevent this, a backoff policy is available. This policy avoids immediately assigning workers to a server that recently failed to start a worker due to a non-script-related issue. Before assigning another worker to that server, the selection provider ensures that each of the other servers has had a worker successfully assigned to it. The backoff policy state is cleared as soon as a worker is successfully assigned.

This backoff policy prevents the Simple Server Selection Provider algorithm from repeatedly assigning workers to a server that cannot start them, while still periodically retrying that server so that it can recover from transient failures or misconfigured queries. The backoff policy is configured using the following property:

SimpleServerSelectionProvider.FailureBackoffPolicy=ALL_OTHERS_AFTER_ACQUISITION_FAILURE

The default (and only other valid value) for this property is NONE.
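One way to picture the ALL_OTHERS_AFTER_ACQUISITION_FAILURE behavior is the Python sketch below. It is a simplified model under stated assumptions, not the provider's implementation: the `AllOthersBackoff` class and its methods are hypothetical, and the sketch does not model what happens when every server is backing off at once.

```python
class AllOthersBackoff:
    """Hypothetical model: a failed server waits until all others have succeeded."""

    def __init__(self, servers):
        self.servers = set(servers)
        # Failed server -> other servers that have not yet had a successful assignment.
        self.pending = {}

    def record_failure(self, server):
        self.pending[server] = self.servers - {server}

    def record_success(self, server):
        # A success on the server itself clears its backoff state.
        self.pending.pop(server, None)
        # It also counts toward every other failed server's waiting list.
        for waiting in self.pending.values():
            waiting.discard(server)

    def is_eligible(self, server):
        # Eligible if it never failed, or if all other servers have succeeded since.
        return not self.pending.get(server)


backoff = AllOthersBackoff(["a", "b", "c"])
backoff.record_failure("a")
print(backoff.is_eligible("a"))  # False
backoff.record_success("b")
backoff.record_success("c")
print(backoff.is_eligible("a"))  # True: every other server has had a success
```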

Reload the controller configuration

While the server selection provider type cannot be dynamically reloaded (i.e., you can't dynamically change the PersistentQueryController.ServerSelectionProvider property), the properties used by the SimpleServerSelectionProvider are reloadable. Providers should automatically adjust to any changes in the controller server when dynamically reloaded using the controller tool's reload capability. For example, a newly added server should become available to the algorithm as soon as it is running. Additionally, new groups and changes to the servers allowed within a class are updated when the controller reloads its configuration.