Persistent Query Controller configuration
Users define Persistent Queries (PQs) to perform operations with a Deephaven database. A PQ configuration includes:
- The PQ's owner and name.
- The type of PQ (including live query, import, merge, and validate).
- Details on which users are allowed to view or administer the PQ.
- Scheduling to define when the PQ runs.
- Some PQs include a script to perform database operations.
- Some PQs include other details specific to the type of PQ, such as a table to be merged.
PQ details are fully defined in the protocol documentation.
PQs are defined through the Query Monitor and then stored for future use, and are under the control of the Persistent Query Controller (controller) process. This process stores all PQs, and starts and stops them at the scheduled times.
The Persistent Query Controller runbook contains further details on the controller.
Several additional pages relate to controller configuration:
- The controller uses automated server selection to determine where workers should be started.
- Remote processing profiles specify Java parameters such as garbage collection that can be defined by an administrator.
- Dispatcher usage restrictions can be configured to restrict where workers are started.
Configuration
The controller's configuration exists in both property files and XML files. As with all Deephaven processes, property files are retrieved from the Configuration Server. The XML files are read directly from disk.
Reloadable configuration
Some aspects of the controller's configuration can be dynamically reloaded without the need to restart the controller. These include:
- The list of query and merge servers.
- The list of temporary queues.
- JVM profiles.
- Configuration types.
- Automated server selection properties.
The controller reload command is issued by a privileged user (typically irisadmin) running the dhconfig pq reload command. By default, only superusers can issue reload commands. Additional users can be authorized to issue reload commands by adding them to a group defined by the property configuration.reload.userGroups with the ACL editor or the command-line dhconfig acls tool.
Controller restart and failover
Multiple instances of the Persistent Query Controller can be configured by the installer, but only one controller is ever active at a time (the leader). On startup, the controller participates in a leader election using etcd. The leader writes its address to an etcd key that clients use to connect to the active controller. Leadership can change in several ways.
- When the active controller terminates, it resigns leadership.
- If a controller detects that it has lost leadership, then it terminates. For example, this may happen if it loses contact with etcd.
- If a controller terminates unexpectedly (including the loss of the node where the controller is running), the loss of heartbeats will cause a leader election.
The active controller stores the state of Persistent Query workers in etcd. When a controller is elected leader and made active, it reads back the state stored in etcd, using it to handle any running PQs.
- Core+ workers in the Running state are restored by re-establishing communication with them.
  - A running Core+ worker is terminated if the controller cannot promptly reconnect to it.
- All other workers are terminated to release system resources. This includes all legacy PQs and any PQs not in Running state.
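The restoration rules above can be summarized as a small decision function. This is an illustrative Python sketch only, not the controller's implementation; the function name, the string labels, and the reconnected flag are assumptions made for the example.

```python
def failover_action(worker_kind: str, status: str, reconnected: bool) -> str:
    """Return "restore" or "terminate" for a worker found in the stored state.

    worker_kind, status, and reconnected are hypothetical inputs chosen for
    this sketch; they do not correspond to a documented Deephaven API.
    """
    if worker_kind == "Core+" and status == "Running":
        # A running Core+ worker is kept only if the newly elected leader
        # can re-establish communication with it promptly.
        return "restore" if reconnected else "terminate"
    # Legacy workers, and any worker not in Running state, are terminated
    # to release system resources.
    return "terminate"
```

Under these assumptions, a legacy worker is terminated even if it is still running, which is why a failover can restart legacy PQs but leave healthy Core+ PQs untouched.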
When elected leader, the controller refreshes its reloadable configuration. Many other configuration properties are loaded statically and keep the values from when the controller process was originally started, not from when it became active. Some additional properties are re-read when the controller is elected leader.
Several configuration properties control the timeouts used for leader election and worker restoration.
Caution
Do not change these property values unless you have consulted with Deephaven support as they interact with other configuration values.
Property | Description | Default |
---|---|---|
PersistentQueryController.etcdStateLeaseTtlSeconds | How long the etcd lease for PQ state lasts, in seconds. If the lease is not renewed, etcd automatically revokes it, which will effectively kill this controller. | 60 |
PersistentQueryController.etcdStateLeaseRenewSeconds | How often the controller renews its etcd lease for PQ State KVs, in seconds. This must be less than PersistentQueryController.etcdStateLeaseTtlSeconds; for safety, it should be no more than 50% of the TTL value. | 15 |
PersistentQueryController.etcdStateLeaseRetryDelaySeconds | The maximum frequency at which the controller tries to renew a PQ State lease. If a lease renewal fails, the controller retries faster than the standard PersistentQueryController.etcdStateLeaseRenewSeconds. | 1 |
PersistentQueryController.etcdStateLeaseTimeoutMS | How long, in milliseconds, the controller waits for its etcd PQ State lease to be granted or renewed before treating the attempt as a failure. Defaults to PersistentQueryController.etcdStateLeaseRenewSeconds * 1000. | See description. |
PersistentQueryController.etcdElectionLeaseTtlSeconds | How long the controller's etcd lease for leader election lasts, in seconds. If the lease is not renewed for this many seconds, etcd revokes it, the controller loses leadership, and its resolver registration is removed. | 15 |
PersistentQueryController.etcdElectionLeaseRenewSeconds | How often, in seconds, the controller renews its etcd lease for leader election. | 7 |
PersistentQueryController.etcdElectionLeaseRetryDelaySeconds | The maximum frequency at which the controller tries to renew an election lease. If a lease renewal fails, it retries faster than the standard PersistentQueryController.etcdElectionLeaseRenewSeconds. | 1 |
PersistentQueryController.etcdElectionLeaseTimeoutMS | How long, in milliseconds, the controller waits for its etcd election lease to be granted or renewed before treating the attempt as a failure. Defaults to PersistentQueryController.etcdElectionLeaseRenewSeconds * 1000. | See description. |
PersistentQueryController.etcdStateKvsLoadTimeoutSeconds | How long in seconds the controller waits to load existing worker state. | 60 |
PersistentQueryController.etcdStateKvsLoadBatchSize | How many worker states the controller loads in a single batch at startup. | 2048 |
PersistentQueryController.electionRpcTimeoutMillis | The timeout used for election-related RPCs. | 15 |
PersistentQueryController.electionObserveTimeoutMillis | How long the controller waits after being elected leader to observe itself as leader before timing out. | 15 |
PersistentQueryController.restoreTimeoutMillis | How long the controller waits to restore existing workers on startup (after winning the election). If this is set to too small a value, then a worker that is truly live may not be restored in time. If this is set too large, then a single hung worker may cause the controller to not recover after failover for longer than is desirable. | 40,000 |
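To make the relationships in the table concrete, the following Python sketch derives the default lease timeout and checks the renew interval against the TTL. It is only a reading aid, not a tuning tool (the caution above still applies). The function name and message strings are hypothetical, and applying the 50% guideline to the election lease as well as the state lease is an assumption.

```python
def check_lease(ttl_seconds, renew_seconds, timeout_ms=None):
    """Return (effective_timeout_ms, problems) for one lease configuration."""
    if timeout_ms is None:
        # Documented default: the renew interval converted to milliseconds.
        timeout_ms = renew_seconds * 1000
    problems = []
    if renew_seconds >= ttl_seconds:
        problems.append("renew interval must be less than the TTL")
    elif renew_seconds * 2 > ttl_seconds:
        problems.append("renew interval is more than 50% of the TTL")
    return timeout_ms, problems


# Default values from the table above.
state_lease = check_lease(ttl_seconds=60, renew_seconds=15)
election_lease = check_lease(ttl_seconds=15, renew_seconds=7)
```

With the defaults, both leases pass the checks, and the derived timeouts are 15000 ms for the state lease and 7000 ms for the election lease.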
Determining the active controller
On a Deephaven server, run the command dhconfig pq leader to find the active (leader) controller.
Query and merge servers
All Persistent Queries are started by a Remote Query Dispatcher, also known as a dispatcher. The controller has a view of all the dispatchers in a Deephaven installation. Each dispatcher listens on a specific port for connections from the controller. The controller uses the query type and the defined server classes, as well as any defined dispatcher usage restrictions, to direct specific PQs to start on specific dispatchers.
The most common types of dispatchers are query servers and merge servers, which are included in a default Deephaven installation.
- Query servers run most workers that do not require the ability to write data. There is at least one query server in a Deephaven installation. Additional query servers can be configured to increase capacity, typically by running an additional query server on each additional machine.
- Merge servers run privileged workers that are allowed to write to (and delete) disk-backed tables, both intraday and historical. Only users with specific privileges are allowed to start merge workers. Many installations only need one merge server, although more can be added if necessary.
Server definitions
Servers (dispatchers) are defined by a set of properties that the controller reads on startup. This list of available servers may need to be updated - for example, to add a new server or to change the address of a failed server - and can be dynamically reloaded.
The properties that define the dispatchers are automatically added to iris-endpoints.prop during installation, and should be changed by updating the installation properties and rerunning the installer.
- If custom changes are required that are not supported by the installer (such as the name, port, and consoleGroups properties described below), they should be added to a [service.name=iris_controller|controller_tool|configuration_server] stanza in iris-common.prop.
- If it is not possible to run the installer, iris-endpoints.prop can be updated manually, but the installer properties must be updated before the next upgrade or the changes will be lost.
- See viewing and changing configuration for details on how to change configuration files.
The server list always begins with a property defining the number of available servers:
iris.db.nservers=<N>
This is followed by a list of server properties in the format:
iris.db.<server number>.<property>=<value>
Server numbers start at 1 and increment up to N, the value defined by iris.db.nservers. The following properties can be defined for each server:
Property | Description | Default |
---|---|---|
iris.db.<server number>.host | The host name or IP for the server. | None |
iris.db.<server number>.port | The port on which the dispatcher listens for client connections. | The default port from the property RemoteQueryDispatcherParameters.queryPort, usually 22013 for query servers and 30002 for merge servers. |
iris.db.<server number>.class | The server class, usually Query or Merge. | The value from the iris.db.defaultServerClass property, which defaults to Query. |
iris.db.<server number>.name | A name for the server, displayed in the Code Studio screen and stored with PQs. | Generated by using <server class>_<number>, where the number starts with 1 and increments by 1 for each server of a given class. |
iris.db.<server number>.consoleGroups | List of ACL groups to which a user must belong for this server to be visible in the server list (see dispatcher usage restrictions for details). | None |
The dispatcher usage restrictions documentation includes scenarios where some of these properties are changed.
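As an illustration of how the numbered properties and their defaults fit together, the Python sketch below expands a property map into a server list. It is not the controller's code: the function name, the sample host names, and the choice to count explicitly named servers when generating default names are assumptions. The default port is left unresolved because it comes from a separate property.

```python
def parse_servers(props):
    """Expand iris.db.* properties (a dict of strings) into a list of servers."""
    default_class = props.get("iris.db.defaultServerClass", "Query")
    count = int(props["iris.db.nservers"])
    seen_per_class = {}
    servers = []
    for number in range(1, count + 1):  # server numbers run from 1 to N
        prefix = f"iris.db.{number}."
        server_class = props.get(prefix + "class", default_class)
        # Default names are <server class>_<number>, numbered per class.
        seen_per_class[server_class] = seen_per_class.get(server_class, 0) + 1
        default_name = f"{server_class}_{seen_per_class[server_class]}"
        port = props.get(prefix + "port")
        servers.append({
            "host": props[prefix + "host"],  # host has no default
            "port": int(port) if port is not None else None,
            "class": server_class,
            "name": props.get(prefix + "name", default_name),
        })
    return servers


# Hypothetical installation: two query servers and one merge server.
SAMPLE_SERVERS = {
    "iris.db.nservers": "3",
    "iris.db.1.host": "query1.example.com",
    "iris.db.2.host": "query2.example.com",
    "iris.db.2.class": "Query",
    "iris.db.3.host": "merge1.example.com",
    "iris.db.3.class": "Merge",
    "iris.db.3.port": "30002",
}
server_names = [s["name"] for s in parse_servers(SAMPLE_SERVERS)]
```

Under these assumptions, the sample yields the names Query_1, Query_2, and Merge_1, which match the server names used by the helper query defaults later on this page.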
Temporary query queues
Temporary queries are used for PQs that run once and then get deleted, such as batch imports of historical data. The controller uses temporary query queues to run these temporary queries when resources are available. These queues are defined through controller properties, which can be dynamically reloaded.
As many temporary query queues can be defined as needed, and each one will have its own properties based on the queue's name. These properties define the resources that the temporary queue is allowed to consume. Both of the following properties are required for each temporary query queue:
Property | Description |
---|---|
PersistentQueryController.temporaryQueryQueue.<queue_name>.maxConcurrentQueries | The maximum number of concurrent queries allowed to run on the named temporary query queue. |
PersistentQueryController.temporaryQueryQueue.<queue_name>.maxHeapMB | The maximum heap in MB that the temporary queries are allowed to use while running on the temporary query queue. |
These resource restrictions determine when the next temporary query can run on a queue. A query will only run if it does not cause the maximum concurrent queries or the maximum heap to be exceeded. If either limit is exceeded, the query will wait until sufficient resources are available. Queries are run in the order they were submitted to the queue.
The property PersistentQueryController.defaultTemporaryQueryQueue specifies the default temporary query queue used when a user selects temporary scheduling.
Below is a simple default configuration that defines a single queue, allowing one query to run at a time with a maximum heap of 20,000 MB.
PersistentQueryController.temporaryQueryQueue.DefaultTemporaryQueue.maxConcurrentQueries=1
PersistentQueryController.temporaryQueryQueue.DefaultTemporaryQueue.maxHeapMB=20000
PersistentQueryController.defaultTemporaryQueryQueue=DefaultTemporaryQueue
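The admission rule described above (start the next query only if neither limit would be exceeded, in submission order) can be sketched in Python as follows. This is a simplified model, not the controller's implementation. The class and method names are hypothetical, and treating the queue as strictly first-in, first-out, so that a waiting query at the head blocks later ones, is an assumption based on the statement that queries run in the order they were submitted.

```python
from collections import deque


class TemporaryQueue:
    """Toy model of one temporary query queue and its two limits."""

    def __init__(self, max_concurrent_queries, max_heap_mb):
        self.max_concurrent_queries = max_concurrent_queries
        self.max_heap_mb = max_heap_mb
        self.running = {}        # query name -> heap in MB
        self.waiting = deque()   # (query name, heap in MB), oldest first

    def submit(self, name, heap_mb):
        self.waiting.append((name, heap_mb))
        return self._start_eligible()

    def finish(self, name):
        del self.running[name]
        return self._start_eligible()

    def _start_eligible(self):
        """Start waiting queries in order; return the names that started."""
        started = []
        while self.waiting:
            name, heap_mb = self.waiting[0]
            too_many = len(self.running) + 1 > self.max_concurrent_queries
            too_much_heap = sum(self.running.values()) + heap_mb > self.max_heap_mb
            if too_many or too_much_heap:
                break  # the head of the queue waits, and so does everything behind it
            self.waiting.popleft()
            self.running[name] = heap_mb
            started.append(name)
        return started


# Limits taken from the default configuration shown above.
default_queue = TemporaryQueue(max_concurrent_queries=1, max_heap_mb=20000)
first = default_queue.submit("import_a", 8000)    # starts immediately
second = default_queue.submit("import_b", 4000)   # waits: one query already running
after_finish = default_queue.finish("import_a")   # import_b starts now
```

One consequence of this model is that a query requesting more heap than maxHeapMB would never start and would block the queries behind it; whether the controller rejects such a query up front is not stated here.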
Persistent Query startup pool
The Persistent Query Controller may need to start a large number of PQs simultaneously at the beginning of a business day. It uses a thread pool to manage this process. Although extra threads are added as needed, it can be beneficial to increase the thread pool size on systems where the controller is expected to start and stop many PQs at once. The following properties control this thread pool:
Property | Description | Default |
---|---|---|
PersistentQueryController.queryStartThreadPoolCoreSize | The minimum number of threads maintained for Persistent Query startup; the number of available threads will never drop below this value. | 20 |
PersistentQueryController.queryStartThreadPoolKeepAliveMinutes | Extra threads added for query startup will be removed if they are idle for this number of minutes. | 10 |
For example, on a system with a lot of PQs that run every hour, PersistentQueryController.queryStartThreadPoolKeepAliveMinutes can be updated to ensure that threads remain available for an hour or more.
Query types
Query configuration types (such as Live Query (Script), Batch Query (RunAndDone), and Data Merge) are defined in an XML configuration file, which is used by the controller and web interface to understand how to handle each type of query. New query types can be added.
Warning
Modification of the existing Deephaven query types is not recommended.
The property iris.controller.configurationTypesXml defines a comma-delimited list of XML files that contain the query configuration type definitions. The default value is PersistentQueryConfigurationTypes.xml. This default file is overwritten during each installation, so it should not be edited unless instructed by Deephaven support.
Query type attributes
Each query type is defined in an XML ConfigurationType element, defining the following attributes:
Attribute | Description | Default |
---|---|---|
allowedGroups | A comma-delimited list that restricts owners of the query to the specified user groups. If a user is not a member of one of the specified groups or a superuser, the user will not be able to create a query of this type. | None |
displayable | Whether or not the query type should be displayed in a console. If false, this query type is only displayed to superusers. | true |
enabled | If false, this query type is disabled. A disabled query type is not available to users. | true |
hasScript | Defines whether the query has a script. If a query does not have a script (hasScript="false"), then a script panel is not displayed when editing a query of this type. | true |
name | The name of the query type, such as Live Query (Script). | None |
serverTypes | An optional comma-delimited list which restricts the server types on which a query can run. | The default server types from the property iris.db.defaultServerClass, which defaults to Query. |
stopTimeRequired | Whether scheduling of the query requires a stop time. Query types such as Live Query (Script) that run continuously require stop times, while query types such as Batch Query (RunAndDone) do not. | true |
supportsReplicas | Whether the query type supports replicas. | false |
Some pre-defined groups are used for allowedGroups in standard Deephaven PQ types:
- iris-superusers - The most privileged group; has all privileges.
- iris-datamanagers - Can import, merge, and delete data.
- iris-dataimporters - Can import data, but not merge or delete it.
- iris-datavalidators - Can run data validation PQs.
Query sub-elements
Each ConfigurationType element can define the following sub-elements to further define behavior. The classes defined within these elements are dynamically created during the creation of workers:
- <SetupQuery name="Java setup class" /> - This required element defines a Java class that will be used to create an instance of the query type. This class must extend the com.illumon.iris.db.tables.remotequery.ContextAwareRemoteQuery<com.illumon.iris.controller.PersistentQueryState> class. A query type is not valid without this setup class.
- <ConfigChecker class="Java configuration checker class" /> - An optional Java configuration checker class that will be run to validate data before a query of this type can be saved. This class must implement the com.illumon.iris.controller.ConfigChecker interface. If it is not provided, no extra validation is performed on a query of this type before it is saved.
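The attributes and sub-elements described above can be pictured with a small Python sketch that reads one ConfigurationType element and applies the documented defaults. Everything in the sample XML is hypothetical: the query type name, the com.example class names, and the fact that the element is parsed on its own (the enclosing structure of the real XML file is not described here). The may_create helper is also an assumption; in particular, treating an empty allowedGroups as unrestricted and blocking disabled types for everyone are guesses, not documented behavior.

```python
import xml.etree.ElementTree as ET

# Hypothetical query type; names and classes are invented for illustration.
SAMPLE_TYPE = """
<ConfigurationType name="Example Import" serverTypes="Merge" hasScript="false"
                   stopTimeRequired="false"
                   allowedGroups="iris-datamanagers,iris-dataimporters">
  <SetupQuery name="com.example.ExampleImportSetupQuery" />
  <ConfigChecker class="com.example.ExampleImportConfigChecker" />
</ConfigurationType>
"""

# Defaults for the boolean attributes, taken from the attribute table.
BOOLEAN_DEFAULTS = {
    "displayable": True,
    "enabled": True,
    "hasScript": True,
    "stopTimeRequired": True,
    "supportsReplicas": False,
}


def _split(value):
    return [item.strip() for item in value.split(",") if item.strip()]


def read_configuration_type(xml_text, default_server_class="Query"):
    element = ET.fromstring(xml_text.strip())
    if element.tag != "ConfigurationType":
        raise ValueError("expected a ConfigurationType element")
    setup = element.find("SetupQuery")
    if setup is None or not setup.get("name"):
        raise ValueError("a query type is not valid without a SetupQuery class")
    checker = element.find("ConfigChecker")  # optional sub-element
    result = {
        "name": element.get("name"),
        "allowedGroups": _split(element.get("allowedGroups", "")),
        "serverTypes": _split(element.get("serverTypes", default_server_class)),
        "setupClass": setup.get("name"),
        "configChecker": checker.get("class") if checker is not None else None,
    }
    for attribute, default in BOOLEAN_DEFAULTS.items():
        raw = element.get(attribute)
        result[attribute] = default if raw is None else raw.strip().lower() == "true"
    return result


def may_create(user_groups, config_type):
    """Assumed rule: superusers or members of an allowed group may create."""
    if not config_type["enabled"]:
        return False
    if not config_type["allowedGroups"]:
        return True
    groups = set(user_groups)
    return "iris-superusers" in groups or bool(groups & set(config_type["allowedGroups"]))


example_type = read_configuration_type(SAMPLE_TYPE)
```

In this sketch, attributes omitted from the element (enabled, displayable, supportsReplicas) fall back to the defaults in the table, while the explicitly set hasScript and stopTimeRequired override them.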
Persistent Query storage
All Persistent Queries created by users are stored by the controller in etcd. See the Persistent Query Backup and Restore Process runbook documentation for details on how to back up and restore PQs.
If the controller does not start up because of an error reading Persistent Queries, see the controller etcd access troubleshooting guide.
Git configuration
By default, the source code for a Persistent Query in the Script Editor tab is stored by the Persistent Query Controller as part of the PQ's configuration. However, with Git integration, a PQ's source code is optionally not stored in Deephaven but instead can be loaded directly from its associated Git repository.
Deephaven's use of Git for maintaining PQ scripts is entirely as a consumer. When a PQ is configured to use Git for the script source, the controller reads the script file from the Git repository when the PQ starts, and when the script is displayed in the UI while creating or editing the PQ configuration. Since Deephaven is a consumer of the Git-managed content, the script itself cannot be edited in the Deephaven UI. To create or edit Git-managed PQ scripts, save them to Groovy or Python (.groovy or .py) files, and push them to the Git repository using a Git client such as the git command line tool or a Git-integrated IDE.
Warning
Controller Git repositories should not be stored on NFS.
Note
For more information on configuring PQs to use Git for their scripts, see the Query Monitor documentation.
By default, Deephaven uses a local Git repository. To enable repository updates (i.e., git pull), you must configure the Persistent Query Controller to not use a local Git repository.
To change the default behavior, add the PersistentQueryController.useLocalGit setting. Set it to true to use a local repository as the script source. This will disable repository updates globally, regardless of each repository's updateEnabled setting. The default value is true, meaning Deephaven uses a local repository and does not check out a configured branch from the remote. Set it to false to allow the Deephaven Controller to clone and fetch from a remote Git server.
To enable Git integration, several properties must be set in the Deephaven controller's configuration. One global property must be set, followed by several properties for each repository.
The global property is:
- iris.scripts.repos - a comma-separated list of Git repositories the controller should use.
The additional properties for each repository are listed below:
- iris.scripts.repo.<repo_name>.groups - a comma-separated list of the Deephaven groups who may access the repository. Group access is configured with the ACL editor or the command-line dhconfig acls tool.
- iris.scripts.repo.<repo_name>.updateEnabled - Set to true to automatically update the repository (i.e., run a git pull) once per minute. This helps ensure that when a query runs, it uses the most recent version of the script available in the repository's remote origin.
- iris.scripts.repo.<repo_name>.branch - the Git branch to check out; if this is not set, the controller's PersistentQueryController.defaultBranch property value is used, or master if this is not defined.
- iris.scripts.repo.<repo_name>.prefixDisplayPathsWithRepoName - If true, the "Choose Script" dialog of the Persistent Query Configuration Editor will include the repository's name next to each script path. This helps disambiguate scripts for users who have access to multiple repositories.
- iris.scripts.repo.<repo_name>.root - the directory on the filesystem into which Deephaven will clone the Git repository. Each repository must have a distinct root directory. If a relative path is used, the path will be relative to the workspace directory of the Controller process. On Deephaven servers, this will normally be /db/TempFiles/irisadmin/iris_controller.
- iris.scripts.repo.<repo_name>.paths - the paths to include, relative to the repository's root directory. Files in all other paths will not be available to PQs.
- iris.scripts.repo.<repo_name>.uri - the SSH URI used to access the Git repository, such as git@github.com:my-git-org/myrepo.git. This can be empty if updateEnabled is false.
- iris.scripts.repo.<repo_name>.remote - sets the name of the remote alias. Defaults to origin.
- iris.scripts.repo.<repo_name>.resetGitLockFiles - whether the controller should reset if it finds Git locks when it starts a sync. Defaults to true. Only the controller should be running Git commands in the irisadmin git directory path, so lock files should only be left over if the controller is stopped during a Git operation, and it is beneficial to allow the controller to clear these locks automatically.
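The way these properties interact (branch fallback, the global useLocalGit override, distinct roots, and the URI requirement) can be sketched in Python as follows. This is an illustration under stated assumptions, not the controller's logic: the function names are hypothetical, treating a missing updateEnabled as false is an assumption, and roots are compared as raw strings without resolving relative paths.

```python
def _is_true(value):
    return str(value).strip().lower() == "true"


def resolve_repos(props):
    """Resolve per-repository Git settings from a dict of property strings."""
    # useLocalGit defaults to true, which disables updates for every repository.
    use_local_git = _is_true(props.get("PersistentQueryController.useLocalGit", "true"))
    # Branch fallback: repository branch, then defaultBranch, then master.
    default_branch = props.get("PersistentQueryController.defaultBranch", "master")
    names = [n.strip() for n in props.get("iris.scripts.repos", "").split(",") if n.strip()]

    repos = {}
    root_owner = {}
    for name in names:
        prefix = f"iris.scripts.repo.{name}."
        root = props[prefix + "root"]
        if root in root_owner:
            raise ValueError(f"{name} and {root_owner[root]} share the root {root}")
        root_owner[root] = name

        # Assumption: a missing updateEnabled is treated as false.
        updates = _is_true(props.get(prefix + "updateEnabled", "false")) and not use_local_git
        uri = props.get(prefix + "uri", "")
        if updates and not uri:
            raise ValueError(f"{name}: a uri is needed when updates are enabled")

        repos[name] = {
            "root": root,
            "branch": props.get(prefix + "branch", default_branch),
            "remote": props.get(prefix + "remote", "origin"),
            "updates": updates,
        }
    return repos


# Hypothetical two-repository configuration.
SAMPLE_GIT = {
    "PersistentQueryController.useLocalGit": "false",
    "iris.scripts.repos": "shared,team1",
    "iris.scripts.repo.shared.root": "../git/shared",
    "iris.scripts.repo.shared.updateEnabled": "true",
    "iris.scripts.repo.shared.uri": "git@git.mycompany.net:common-libs/shared.git",
    "iris.scripts.repo.team1.root": "../git/team1",
    "iris.scripts.repo.team1.branch": "release",
}
resolved = resolve_repos(SAMPLE_GIT)
```

In this sample, shared falls back to the master branch and is updated from its remote, while team1 uses its own branch and is never updated. Setting PersistentQueryController.useLocalGit back to true would turn updates off for shared as well.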
Git garbage collection is controlled by the following property:
- PersistentQueryController.gitGcEnabled: If defined and set to false, this disables Git garbage collection.
Git authentication
If a keypair is being used to authenticate with the Git server, a common approach is to create the keypair in the .ssh directory under the irisadmin home directory, typically located at /db/TempFiles/irisadmin. Additionally, the Git server's host must be added to the known_hosts file for irisadmin. By default, the irisadmin user does not have an .ssh directory. Therefore, the complete process to set up irisadmin for SSH authentication to Git is as follows:
1. sudo su - irisadmin to switch to the irisadmin user context (or any custom name being used).
2. mkdir .ssh
3. chmod 700 .ssh
4. ssh-keygen -t rsa (accept all defaults - press Enter each time - to create a new key pair with no passphrase.)
5. ssh-keyscan -t rsa <fqdn_of_git_server> >> .ssh/known_hosts
6. cat .ssh/known_hosts to verify that the host from step 5 was added; if not, try ssh-keyscan -H <fqdn_of_git_server> >> .ssh/known_hosts.
If the .ssh directory already exists, then steps 2 and 3 (creating the directory and setting its permissions) are not needed.
Note
Since .ssh starts with a . for its name, it is a hidden directory which will not be displayed by default by ls. Use ls -a or ls -la to display all directories.
The newly created public key for the irisadmin account can then be added to the Git server to allow SSH authentication for Git commands:
- cat .ssh/id_rsa.pub to get the public key that needs to be provided to the Git server. This may be through a UI, for tools like GitLab, or, if the server is a simple Linux server, the public key can be appended to the .ssh/authorized_keys file for the Git user on the Git server.
Warning
Since the controller's Git key is accessible to anybody with access to the irisadmin account, it should be set up on the Git server to be read-only.
Example configuration
The example configuration below configures Deephaven to read scripts from three repositories: team1, team2, and shared. All users can access scripts in the shared repository, but the team1 and team2 repositories are restricted to specific users. All three repositories use a branch called master. Also, the shared repository uses a different Git server than the other two.
PersistentQueryController.useLocalGit=false
iris.scripts.repos=shared,team1,team2
iris.scripts.repo.shared.groups=*
iris.scripts.repo.shared.updateEnabled=true
iris.scripts.repo.shared.branch=master
iris.scripts.repo.shared.prefixDisplayPathsWithRepoName=false
iris.scripts.repo.shared.root=../git/shared
iris.scripts.repo.shared.paths=IrisQueries/groovy,IrisUtils/groovy
iris.scripts.repo.shared.uri=git@git.mycompany.net:common-libs/shared.git
iris.scripts.repo.team1.groups=user1,user2,user3
iris.scripts.repo.team1.updateEnabled=true
iris.scripts.repo.team1.branch=master
iris.scripts.repo.team1.prefixDisplayPathsWithRepoName=false
iris.scripts.repo.team1.root=../git/team1
iris.scripts.repo.team1.paths=IrisQueries/groovy
iris.scripts.repo.team1.uri=git@gitlab.mycompany.net:team1/team1.git
iris.scripts.repo.team2.groups=user1,user4,user5
iris.scripts.repo.team2.updateEnabled=true
iris.scripts.repo.team2.branch=master
iris.scripts.repo.team2.prefixDisplayPathsWithRepoName=false
iris.scripts.repo.team2.root=../git/team2
iris.scripts.repo.team2.paths=IrisQueries/groovy
iris.scripts.repo.team2.uri=git@gitlab.mycompany.net:team2/team2.git
Troubleshooting
In most cases, if updates are enabled and there is a Git-related problem, the Deephaven controller process fails to start or shuts down shortly after starting. Details of such issues will be logged to the process startup log: /var/log/deephaven/iris_controller/iris_controller.log.<yyyy-mm-dd>.
If a non-fatal Git error occurs, this is generally logged to the controller process log: /var/log/deephaven/iris_controller/PersistentQueryController.log.<yyyy-mm-dd-hhmmss.mmm+/-hhmm>.
Note that SSH is used for authentication, but the URI should not include ssh:// as a prefix. https is not currently supported.
ssh -v <user>@<fqdn_of_git_server> may provide additional diagnostic information about the connection and authentication processes.
Other properties
Other properties related to the controller's operation follow. These parameters are not reloadable.
Property | Description | Default |
---|---|---|
critEmail | The email distribution list to which critical alerts will be sent (currently, the only critical alert is for a hung script update job, which is used to refresh scripts from Git). | None |
iris.authentication.keyfile | The keyfile used to authenticate the controller to the dispatchers. | /etc/sysconfig/deephaven/auth/priv-iris.base64.txt |
PersistentQueryController.binaryLogTimeZone | The time zone that determines column partition values for the controller's data (PersistentQueryStateLog and PersistentQueryConfigurationLogV2). | The server's default time zone. |
PersistentQueryController.defaultMaxHeapSizeGB | The maximum heap the controller will allow for a query, in GB; further information on controlling worker heap size can be found in the worker heap size documentation. | 1024 |
PersistentQueryController.keyPairFile | The keypair file used to encrypt sensitive information for the controller's use - this file should not be visible to users of the system. | /etc/sysconfig/illumon.d/auth/priv-controllerConsole.base64.txt |
PersistentQueryController.port | The port on which the controller listens for client connections, from user clients or configuration tools. | 20126 |
queryScheduler.restartWhenRunning | This defines the default value populated in the Persistent Query scheduler’s “Restart when running” option; either Yes or No. | Yes |
In addition, the controller's logging behavior can be changed with the standard logging parameters. See the Deephaven Operations Guide Log-Related Properties section for further details.
Initial startup configuration
When the controller starts for the first time on any Deephaven installation, it must create helper queries to assist with Deephaven operations. Properties are used to configure these queries. Changing these properties is not recommended unless directed by Deephaven support.
Revert Helper Query
The Revert Helper query assists Deephaven when a query is reverted to a previous version. The following parameters are used when creating the initial revert helper query (the first time the controller is run), along with their default values:
- revertHelper.queryOwner=<superuser>: The owner of the revert helper query; this must be a superuser.
- revertHelper.queryName=RevertHelperQuery: The name of the query.
- revertHelper.dbServer=Query_1: The server on which the revert helper will run. It should be a server with a type of Query; if custom-named servers are used, this should reflect a named query server.
- revertHelper.heapSize=1: The heap size in GB of the helper query.
The following parameter defines how far back the revert helper looks when a user requests to revert a query to a previous version:
revertHelper.lookbackDays=180
The following parameter defines the number of seconds a request to revert a query will wait for a response from the helper before displaying an error:
revertHelper.waitQuerySeconds=30
Import Helper Query
The Import Helper query assists with import, merge, and validation queries. The initial query-creation parameters have the same meanings as for the revert helper:
- importHelper.queryOwner=<username>
- importHelper.queryName=ImportHelperQuery
- importHelper.dbServer=Merge_1: The import helper should run on a server with a type of Merge.
- importHelper.heapSize=1
Web Client Data Query
The Web Client Data query is used by the web_api_service to retrieve information related to web sessions. The initial query-creation parameters have the same meanings as for the revert helper:
webClientData.queryOwner=<username>
webClientData.queryName=WebClientData
webClientData.dbServer=Query_1
webClientData.heapSize=1