Automated server selection
This guide discusses the Persistent Query Controller's automated server selection feature, which balances the load of Persistent Queries (PQs) and consoles across multiple servers. For load balancing and resource management, the controller uses a Server Selection Provider (SSP) to choose the server (dispatcher) on which each PQ or console worker runs. Deephaven offers a simple implementation and also lets you implement your own.
- Server groups consist of one or more dispatchers where PQs can be started. Server groups found in nearly every Deephaven installation include AutoQuery, which includes all query servers, and AutoMerge, which includes all merge servers.
- The SSP provides these group names to the controller, allowing users to select the server group on which to run a PQ or console. When a user selects a group, the worker may be started on any dispatcher within that group.
A server selection provider can use any selection algorithm, but it must implement the com.illumon.iris.controller.IServerSelectionProvider interface, including its required constructor. Deephaven provides one implementation, the com.illumon.iris.controller.SimpleServerSelectionProvider class, which is described in detail below.
The provider is specified by the PersistentQueryController.ServerSelectionProvider property. For example:
PersistentQueryController.ServerSelectionProvider=com.illumon.iris.controller.SimpleServerSelectionProvider
Dispatcher configuration
Server selection logic uses two dispatcher configuration properties to determine how much heap workers may be given:
- RemoteQueryDispatcher.maxTotalQueryProcessorHeapMB specifies the total amount of heap that all of the dispatcher's workers combined are allowed to use. For installations where each dispatcher runs on its own dedicated server, this is typically most of that server's memory; for installations where multiple dispatchers run on a single server, each dispatcher should be given a portion of the server's memory so that the dispatchers together do not exceed it.
- RemoteQueryDispatcher.maxPerWorkerHeapMB specifies the maximum heap each worker is allowed to request.
For example, the following stanza specifies that the merge servers should be allocated 40GB of heap for workers, but each worker is only allowed 4GB of heap:
[service.name=dbmerge] {
RemoteQueryDispatcher.maxTotalQueryProcessorHeapMB=40960
RemoteQueryDispatcher.maxPerWorkerHeapMB=4096
}
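To make the interaction between the two limits concrete, here is a minimal Java sketch. It assumes that a dispatcher refuses a worker whose requested heap exceeds maxPerWorkerHeapMB, or whose request would push the combined heap of its running workers past maxTotalQueryProcessorHeapMB. The DispatcherHeapLimits class and its canStart method are hypothetical illustrations, not Deephaven APIs.

```java
/**
 * Hypothetical model of a dispatcher's two heap limits (not a Deephaven API).
 * Assumption: a worker is refused if it would break either limit.
 */
final class DispatcherHeapLimits {
    private final long maxTotalQueryProcessorHeapMB; // combined heap of all workers
    private final long maxPerWorkerHeapMB;           // largest heap one worker may request

    DispatcherHeapLimits(long maxTotalQueryProcessorHeapMB, long maxPerWorkerHeapMB) {
        this.maxTotalQueryProcessorHeapMB = maxTotalQueryProcessorHeapMB;
        this.maxPerWorkerHeapMB = maxPerWorkerHeapMB;
    }

    /**
     * @param requestedMB heap requested by the new worker
     * @param allocatedMB heap already held by this dispatcher's running workers
     * @return true if the new worker fits within both limits
     */
    boolean canStart(long requestedMB, long allocatedMB) {
        return requestedMB <= maxPerWorkerHeapMB
                && allocatedMB + requestedMB <= maxTotalQueryProcessorHeapMB;
    }
}
```

With the dbmerge values above, ten 4096 MB workers fit exactly within 40960 MB, an eleventh would be refused, and a single 5120 MB request would be refused regardless of how much total heap is free.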
Simple server selection provider
The com.illumon.iris.controller.SimpleServerSelectionProvider class is a simple implementation of the com.illumon.iris.controller.IServerSelectionProvider interface. It performs round-robin-style selection among the servers in a group, taking each server's availability and current usage into account.
The Simple Server Selection Provider uses a basic resource-comparison algorithm. It compares the percentage of heap utilization on each server and, in case of a tie, uses the number of running workers. When a PQ has replicas or spares, the provider first compares the number of workers for that specific PQ on each server, which prevents all replicas from starting on a single, lightly used server. The provider is configured through the SimpleServerSelectionProvider properties described in the rest of this section.
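The comparison order described above can be expressed as a Java Comparator. This is a minimal sketch, not Deephaven's implementation: the ServerLoad record, its fields, and the LeastLoadedSelector class are hypothetical, and treating a server with no configured heap as fully utilized is an assumption. It requires Java 16 or later for records.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

/** Hypothetical snapshot of one server's load; field names are illustrative. */
record ServerLoad(String name, long usedHeapMB, long totalHeapMB,
                  int runningWorkers, int workersForThisPq) {
    /** Fraction of the server's worker heap in use (assumed 1.0 if no heap is configured). */
    double heapUtilization() {
        return totalHeapMB <= 0 ? 1.0 : (double) usedHeapMB / totalHeapMB;
    }
}

final class LeastLoadedSelector {
    // Fewest workers for this PQ first (matters for replicas and spares),
    // then lowest heap utilization, then fewest running workers as the tie-breaker.
    static final Comparator<ServerLoad> ORDER =
            Comparator.comparingInt(ServerLoad::workersForThisPq)
                      .thenComparingDouble(ServerLoad::heapUtilization)
                      .thenComparingInt(ServerLoad::runningWorkers);

    /** Returns the preferred server, or empty if there are no candidates. */
    static Optional<ServerLoad> choose(List<ServerLoad> candidates) {
        return candidates.stream().min(ORDER);
    }
}
```

For a PQ without replicas or spares, workersForThisPq is zero on every server, so the ordering reduces to heap utilization with the running-worker count as the tie-breaker.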
The property below defines the groups for the provider. It is a comma-delimited list of group names, which are used to derive the rest of the properties:
SimpleServerSelectionProvider.ActiveGroups
The following example defines two groups called AutoQuery and AutoMerge:
SimpleServerSelectionProvider.ActiveGroups=AutoQuery,AutoMerge
The Simple Server Selection Provider then uses these group names to find the other properties:
SimpleServerSelectionProvider.Group.<group name>.<property suffix>=<value>
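As a sketch of how the group names map to property keys, the following Java helper reads ActiveGroups from a java.util.Properties object and builds keys in the pattern above. The GroupPropertyKeys class is hypothetical, and trimming whitespace around group names is an assumption about how the list is parsed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

/** Hypothetical helper that derives per-group property keys from ActiveGroups. */
final class GroupPropertyKeys {
    static final String PREFIX = "SimpleServerSelectionProvider.";

    /** Splits the comma-delimited ActiveGroups value into trimmed, non-empty group names. */
    static List<String> activeGroups(Properties props) {
        String raw = props.getProperty(PREFIX + "ActiveGroups", "");
        List<String> groups = new ArrayList<>();
        for (String part : raw.split(",")) {
            String name = part.trim(); // assumption: surrounding whitespace is ignored
            if (!name.isEmpty()) {
                groups.add(name);
            }
        }
        return groups;
    }

    /** Builds SimpleServerSelectionProvider.Group.<group name>.<property suffix>. */
    static String key(String group, String suffix) {
        return PREFIX + "Group." + group + "." + suffix;
    }
}
```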
The following table defines the property suffixes.
| Property suffix | Meaning |
|---|---|
| ServerClass | Specifies the server class for this group. Only servers of this class are chosen when a server is requested. Typical values are Query and Merge. If this is not supplied, the value of the iris.db.defaultServerClass property is used, which is typically Query. |
| Servers | Specifies the servers in this group as a comma-delimited list. If not specified, all servers of the group's server class are in this group. |
| MaxHeapMBPerWorker | Specifies the maximum heap, in MB, allowed for a worker in this group. If not specified or 0, the value of this group's DefaultHeapMBPerServer property is used. |
| DefaultHeapMBPerServer | Specifies the default total heap, in MB, for each server in this group. Once the controller connects to a dispatcher, it replaces this default with the value actually configured on that dispatcher. |
| ConsoleGroups | If defined, specifies a list of ACL groups. A user must belong to one of these groups to select this server group when starting a console or Code Studio. This property only restricts the options the UI shows to a user; enforcement at the dispatcher is handled by dispatcher group restrictions. |
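The fallback rules in the table can be summarized in code. The following Java sketch resolves one group's effective settings from a java.util.Properties object and is illustrative only. The GroupSettings class is hypothetical, and two details are assumptions not stated in the table: Query is used if iris.db.defaultServerClass is also unset, and a missing DefaultHeapMBPerServer is read as 0. It requires Java 16 or later.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Properties;

/** Hypothetical view of one group's effective settings, applying the documented defaults. */
final class GroupSettings {
    final String serverClass;
    final List<String> servers;          // empty means "all servers of serverClass"
    final long maxHeapMBPerWorker;
    final long defaultHeapMBPerServer;

    GroupSettings(Properties props, String group) {
        String prefix = "SimpleServerSelectionProvider.Group." + group + ".";
        // ServerClass falls back to iris.db.defaultServerClass ("Query" if unset: assumption)
        this.serverClass = props.getProperty(prefix + "ServerClass",
                props.getProperty("iris.db.defaultServerClass", "Query")).trim();
        // Servers is a comma-delimited list; if unset, every server of the class qualifies
        String rawServers = props.getProperty(prefix + "Servers", "").trim();
        this.servers = rawServers.isEmpty()
                ? List.of()
                : Arrays.stream(rawServers.split(",")).map(String::trim).toList();
        // A missing DefaultHeapMBPerServer is read as 0 (assumption)
        this.defaultHeapMBPerServer =
                Long.parseLong(props.getProperty(prefix + "DefaultHeapMBPerServer", "0").trim());
        // MaxHeapMBPerWorker that is unset or 0 falls back to DefaultHeapMBPerServer
        long configuredMax =
                Long.parseLong(props.getProperty(prefix + "MaxHeapMBPerWorker", "0").trim());
        this.maxHeapMBPerWorker = configuredMax == 0 ? defaultHeapMBPerServer : configuredMax;
    }

    /** True if a server with this name and class belongs to the group. */
    boolean includes(String serverName, String classOfServer) {
        return serverClass.equals(classOfServer)
                && (servers.isEmpty() || servers.contains(serverName));
    }
}
```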
Here is an example configuration:
# Use the SimpleServerSelectionProvider
PersistentQueryController.ServerSelectionProvider=com.illumon.iris.controller.SimpleServerSelectionProvider
# Specify two groups, named AutoQuery and AutoMerge
SimpleServerSelectionProvider.ActiveGroups=AutoQuery,AutoMerge
# Properties for the AutoQuery group:
# It will use all available servers of the Query class
# The default total heap MB per server is 65536
# No workers can use this group if they request over 32768M in heap
SimpleServerSelectionProvider.Group.AutoQuery.ServerClass=Query
SimpleServerSelectionProvider.Group.AutoQuery.MaxHeapMBPerWorker=32768
SimpleServerSelectionProvider.Group.AutoQuery.DefaultHeapMBPerServer=65536
# Properties for the AutoMerge group:
# It will only use the servers named Merge_Server_1 and Merge_Server_2
# The default total heap MB per server is 65536
# No workers can use this group if they request over 32768M in heap
SimpleServerSelectionProvider.Group.AutoMerge.ServerClass=Merge
SimpleServerSelectionProvider.Group.AutoMerge.Servers=Merge_Server_1,Merge_Server_2
SimpleServerSelectionProvider.Group.AutoMerge.MaxHeapMBPerWorker=32768
SimpleServerSelectionProvider.Group.AutoMerge.DefaultHeapMBPerServer=65536
Mark a server administratively down
You can temporarily mark a server as unavailable using the dhconfig pq selection-provider command. This excludes the server from the list of available servers for the Simple Server Selection Provider, so new queries and consoles are not automatically assigned to it. However, the server can still be selected manually, and existing running queries are not evicted. This is useful if a server is malfunctioning: you can remove it from the rotation while retaining access to it for debugging.
You can also use dhconfig pq selection-provider to add a server back to the list of available servers after it has been removed.
The down state of a server is not persisted; when the controller restarts, all servers are marked as up. To permanently remove a server, update the appropriate properties and re-run the Deephaven installer.
Failure backoff policy
Sometimes a server's dispatcher fails to start a worker. In these cases, that server often becomes the least-loaded server, so the Simple Server Selection Provider may try to assign all new workers to it.
To prevent this, a backoff policy is available. The policy avoids immediately assigning workers to a server that recently failed to start a worker because of an issue unrelated to the query script. Before assigning another worker to that server, the selection provider ensures that each of the other servers has had a worker successfully assigned to it. The backoff state is cleared as soon as a worker is successfully assigned.
This keeps the algorithm from repeatedly assigning workers to a server that cannot start them. Because the failing server is still retried periodically, the policy also accommodates transient failures and failures caused by misconfigured queries. The backoff policy is configured using the following property:
SimpleServerSelectionProvider.FailureBackoffPolicy=ALL_OTHERS_AFTER_ACQUISITION_FAILURE
The default value for this property is NONE, which is also the only other valid value.
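The ALL_OTHERS_AFTER_ACQUISITION_FAILURE behavior can be sketched in Java as a small piece of bookkeeping: a server that fails to start a worker is skipped until every other server has had a worker assigned successfully. The FailureBackoff class is hypothetical and is not Deephaven's implementation. In particular, returning all candidates when every one of them is backed off is an assumed fallback that avoids a deadlock between two failing servers; how the real provider handles that case is not described here.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of an "all others after acquisition failure" backoff. */
final class FailureBackoff {
    private final Set<String> servers;
    // Backed-off server -> other servers that still need a successful assignment
    private final Map<String, Set<String>> waitingOn = new HashMap<>();

    FailureBackoff(Collection<String> servers) {
        this.servers = new LinkedHashSet<>(servers);
    }

    /** Called when a worker fails to start on this server for a non-script reason. */
    void recordFailure(String server) {
        Set<String> others = new HashSet<>(servers);
        others.remove(server);
        if (!others.isEmpty()) {
            waitingOn.put(server, others);
        }
    }

    /** Called when a worker is successfully assigned to this server. */
    void recordSuccess(String server) {
        waitingOn.remove(server); // a success on the backed-off server itself clears its state
        for (Set<String> others : waitingOn.values()) {
            others.remove(server); // counts toward every backed-off server's requirement
        }
        // Servers whose "all others" requirement is now met become eligible again
        waitingOn.entrySet().removeIf(e -> e.getValue().isEmpty());
    }

    /** Drops backed-off servers; if none remain, returns all candidates (assumed fallback). */
    List<String> eligible(List<String> candidates) {
        List<String> result = new ArrayList<>();
        for (String candidate : candidates) {
            if (!waitingOn.containsKey(candidate)) {
                result.add(candidate);
            }
        }
        return result.isEmpty() ? candidates : result;
    }
}
```

In a provider, eligible would be applied to a group's servers before the load comparison, so a failed server simply drops out of consideration until the other servers have each received a worker.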
Reload the controller configuration
While the server selection provider type cannot be dynamically reloaded (i.e., you can't dynamically change the PersistentQueryController.ServerSelectionProvider property), the properties used by the SimpleServerSelectionProvider are reloadable. When the controller configuration is reloaded using the controller tool's reload capability, providers should automatically adjust to the changes. For example, a newly added server should become available to the algorithm as soon as it is running, and new groups and changes to the servers allowed within a class take effect when the controller reloads its configuration.