Persistent Query Controller configuration

Users define Persistent Queries (PQs) to perform operations with a Deephaven database. A PQ configuration includes:

  • The PQ's owner and name.
  • The type of PQ (including live query, import, merge, and validate).
  • Details on which users are allowed to view or administer the PQ.
  • Scheduling to define when the PQ runs.
  • For some PQs, a script to perform database operations.
  • For some PQs, other details specific to the type of PQ, such as a table to be merged.

PQ details are fully defined in the protocol documentation.

PQs are defined through the Query Monitor, stored for future use, and managed by the Persistent Query Controller (controller) process. This process stores all PQs and starts and stops them at the scheduled times.

The Persistent Query Controller runbook contains further details on the controller.

Several additional pages relate to controller configuration:

Configuration

The controller's configuration exists in both property files and XML files. Like all Deephaven processes, the controller retrieves its property files from the Configuration Server. The XML files are read directly from disk.

Reloadable configuration

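Some aspects of the controller's configuration can be dynamically reloaded without the need to restart the controller. These include the server (dispatcher) definitions and the temporary query queue properties described later on this page.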

A privileged user (typically irisadmin) issues the controller reload command by running dhconfig pq reload. By default, only superusers can issue reload commands. Additional users can be authorized by adding them to a group named in the property configuration.reload.userGroups, using the ACL editor or the command-line dhconfig acls tool.
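As a minimal sketch of the authorization step, the property below names a group whose members may issue reload commands. The group name controller-reloaders is a hypothetical placeholder, and the single-group value is an assumption about the property's format. The group and its membership are still managed with the ACL editor or the dhconfig acls tool.

```properties
# Hypothetical group name; members of this group may run "dhconfig pq reload".
configuration.reload.userGroups=controller-reloaders
```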

Controller restart and failover

Multiple instances of the Persistent Query Controller can be configured by the installer, but only one controller (the leader) is active at a time. On startup, the controller participates in a leader election using etcd. The leader writes its address to an etcd key that clients use to connect to the active controller. Leadership can change in several ways:

  • When the active controller terminates, it resigns leadership.
  • If a controller detects that it has lost leadership, then it terminates. For example, this may happen if it loses contact with etcd.
  • If a controller terminates unexpectedly (including the loss of the node where the controller is running), the loss of heartbeats will cause a leader election.

The active controller stores the state of Persistent Query workers in etcd. When a controller is elected leader and made active, it reads back the state stored in etcd, using it to handle any running PQs.

  • Core+ workers in the Running state are restored by re-establishing communication with them.
  • A running Core+ worker is terminated if the controller cannot promptly reconnect to it.
  • All other workers are terminated to release system resources. This includes all legacy PQs and any PQs not in Running state.

When elected leader, the controller refreshes its reloadable configuration. Some additional properties are also re-read at the time the controller is elected leader. Many other configuration properties are loaded statically: they keep the values read when the controller process originally started, not the values in effect when it became active.

Several configuration properties control the timeouts used for leader election and worker restoration.

Caution

Do not change these property values without first consulting Deephaven support, because they interact with other configuration values.

Property | Description | Default
PersistentQueryController.etcdStateLeaseTtlSeconds | How long the etcd lease for PQ state lasts, in seconds. If the lease is not renewed, etcd automatically revokes it, which effectively kills this controller. | 60
PersistentQueryController.etcdStateLeaseRenewSeconds | How often the controller renews its etcd lease for PQ State KVs, in seconds. This must be less than PersistentQueryController.etcdStateLeaseTtlSeconds; for safety, it should be no more than 50% of the TTL value. | 15
PersistentQueryController.etcdStateLeaseRetryDelaySeconds | The maximum frequency at which the controller tries to renew a PQ State lease, expressed as a delay in seconds between attempts. If a lease renewal fails, the controller retries faster than the standard PersistentQueryController.etcdStateLeaseRenewSeconds. | 1
PersistentQueryController.etcdStateLeaseTimeoutMS | How long, in milliseconds, the controller waits for its etcd PQ State lease to be granted or renewed before treating the attempt as a failure. | PersistentQueryController.etcdStateLeaseRenewSeconds * 1000
PersistentQueryController.etcdElectionLeaseTtlSeconds | How long the controller's etcd lease for leader election lasts, in seconds. If the lease is not renewed for this many seconds, etcd revokes it, the controller loses leadership, and its resolver registration is removed. | 15
PersistentQueryController.etcdElectionLeaseRenewSeconds | How often, in seconds, the controller renews its etcd lease for leader election. | 7
PersistentQueryController.etcdElectionLeaseRetryDelaySeconds | The maximum frequency at which the controller tries to renew an election lease, expressed as a delay in seconds between attempts. If a lease renewal fails, the controller retries faster than the standard PersistentQueryController.etcdElectionLeaseRenewSeconds. | 1
PersistentQueryController.etcdElectionLeaseTimeoutMS | How long, in milliseconds, the controller waits for its etcd election lease to be granted or renewed before treating the attempt as a failure. | PersistentQueryController.etcdElectionLeaseRenewSeconds * 1000
PersistentQueryController.etcdStateKvsLoadTimeoutSeconds | How long, in seconds, the controller waits to load existing worker state. | 60
PersistentQueryController.etcdStateKvsLoadBatchSize | How many worker states the controller loads in a single batch at startup. | 2048
PersistentQueryController.electionRpcTimeoutMillis | The timeout used for election-related RPCs. | 15
PersistentQueryController.electionObserveTimeoutMillis | How long the controller waits after being elected leader to observe itself as leader before timing out. | 15
PersistentQueryController.restoreTimeoutMillis | How long the controller waits to restore existing workers on startup (after winning the election). If this value is too small, a worker that is truly live may not be restored in time. If it is too large, a single hung worker may keep the controller from recovering after failover for longer than is desirable. | 40,000
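For reference, the lease defaults from the table can be written out as properties to make their relationships visible. This only illustrates the documented defaults and is not a suggested change. The two TimeoutMS values are derived here from the stated rule (renew seconds * 1000) rather than taken from the table.

```properties
# PQ state lease: renew every 15 s against a 60 s TTL (25% of the TTL,
# within the "no more than 50%" guidance).
PersistentQueryController.etcdStateLeaseTtlSeconds=60
PersistentQueryController.etcdStateLeaseRenewSeconds=15
PersistentQueryController.etcdStateLeaseRetryDelaySeconds=1
# Derived default: etcdStateLeaseRenewSeconds * 1000 = 15 * 1000
PersistentQueryController.etcdStateLeaseTimeoutMS=15000

# Leader election lease: renew every 7 s against a 15 s TTL.
PersistentQueryController.etcdElectionLeaseTtlSeconds=15
PersistentQueryController.etcdElectionLeaseRenewSeconds=7
PersistentQueryController.etcdElectionLeaseRetryDelaySeconds=1
# Derived default: etcdElectionLeaseRenewSeconds * 1000 = 7 * 1000
PersistentQueryController.etcdElectionLeaseTimeoutMS=7000
```

Because these are the defaults, none of these lines needs to appear in a real configuration.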

Determining the active controller

On a Deephaven server, run the command dhconfig pq leader to find the active (leader) controller.

Query and merge servers

All Persistent Queries are started by a Remote Query Dispatcher, also known as a dispatcher. The controller has a view of all the dispatchers in a Deephaven installation. Each dispatcher listens on a specific port for connections from the controller. The controller uses the query type and the defined server classes, as well as any defined dispatcher usage restrictions, to direct specific PQs to start on specific dispatchers.

The most common types of dispatchers are query servers and merge servers, which are included in a default Deephaven installation.

  • Query servers run most workers that do not require the ability to write data. There is at least one query server in a Deephaven installation. Additional query servers can be configured to increase capacity, typically by running an additional query server on each additional machine.
  • Merge servers run privileged workers that are allowed to write to (and delete) disk-backed tables, both intraday and historical. Only users with specific privileges are allowed to start merge workers. Many installations only need one merge server, although more can be added if necessary.

Server definitions

Servers (dispatchers) are defined by a set of properties that the controller reads on startup. This list of available servers may need to be updated (for example, to add a new server or to change the address of a failed server) and can be dynamically reloaded.

The properties that define the dispatchers are automatically added to iris-endpoints.prop during installation, and should be changed by updating the installation properties and rerunning the installer.

  • If custom changes are required that are not supported by the installer (such as the name, port, and consoleGroups properties described below), they should be added to a [service.name=iris_controller|controller_tool|configuration_server] stanza in iris-common.prop.
  • If it is not possible to run the installer, iris-endpoints.prop can be updated manually, but the installer properties must be updated before the next upgrade or the changes will be lost.
  • See viewing and changing configuration for details on how to change configuration files.

The server list always begins with a property that defines the number of available servers:

iris.db.nservers=<N>

This is followed by a list of server properties in the format:

iris.db.<server number>.<property>=<value>

Server numbers start at 1 and increment up to N, the value defined by iris.db.nservers. The following properties can be defined for each server:

Property | Description | Default
iris.db.<server number>.host | The host name or IP for the server. | None
iris.db.<server number>.port | The port on which the dispatcher listens for client connections. | The default port from the property RemoteQueryDispatcherParameters.queryPort, usually 22013 for query servers and 30002 for merge servers.
iris.db.<server number>.class | The server class, usually Query or Merge. | The value from the iris.db.defaultServerClass property, which defaults to Query.
iris.db.<server number>.name | A name for the server, displayed in the Code Studio screen and stored with PQs. | Generated by using <server class>_<number>, where the number starts with 1 and increments by 1 for each server of a given class.
iris.db.<server number>.consoleGroups | List of ACL groups to which a user must belong for this server to be visible in the server list (see dispatcher usage restrictions for details). | None

The dispatcher usage restrictions documentation includes scenarios where some of these properties are changed.
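As a rough sketch of the format described above, the fragment below defines two query servers and one merge server. The host names are hypothetical placeholders, and the fragment assumes iris.db.defaultServerClass is left at its default of Query.

```properties
# Three dispatchers in total.
iris.db.nservers=3

# Server 1: a query server; port and class fall back to their defaults.
iris.db.1.host=query-1.example.com

# Server 2: a second query server with an explicit class.
iris.db.2.host=query-2.example.com
iris.db.2.class=Query

# Server 3: a merge server with its port set explicitly.
iris.db.3.host=merge-1.example.com
iris.db.3.class=Merge
iris.db.3.port=30002
```

With no name properties set, the naming rule in the table would produce the names Query_1, Query_2, and Merge_1.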

Temporary query queues

Temporary queries are used for PQs that run once and then get deleted, such as batch imports of historical data. The controller uses temporary query queues to run these temporary queries when resources are available. These queues are defined through controller properties, which can be dynamically reloaded.

Any number of temporary query queues can be defined. Each queue has its own properties, keyed by the queue's name, that define the resources the queue is allowed to consume. Both of the following properties are required for each temporary query queue:

Property | Description
PersistentQueryController.temporaryQueryQueue.<queue_name>.maxConcurrentQueries | The maximum number of concurrent queries allowed to run on the named temporary query queue.
PersistentQueryController.temporaryQueryQueue.<queue_name>.maxHeapMB | The maximum heap in MB that the temporary queries are allowed to use while running on the temporary query queue.

These resource restrictions determine when the next temporary query can run on a queue. A query only runs if starting it does not cause the maximum concurrent queries or the maximum heap to be exceeded. If either limit would be exceeded, the query waits until sufficient resources are available. Queries run in the order they were submitted to the queue.

The property PersistentQueryController.defaultTemporaryQueryQueue specifies the default temporary query queue used when a user selects temporary scheduling.

Below is a simple default configuration that defines a single queue, allowing one query to run at a time with a maximum heap of 20,000 MB.

PersistentQueryController.temporaryQueryQueue.DefaultTemporaryQueue.maxConcurrentQueries=1
PersistentQueryController.temporaryQueryQueue.DefaultTemporaryQueue.maxHeapMB=20000
PersistentQueryController.defaultTemporaryQueryQueue=DefaultTemporaryQueue
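Because any number of queues can be defined, a configuration might separate small jobs from large ones. The sketch below is illustrative only: the queue names SmallJobs and LargeImports and all of the limits are hypothetical values.

```properties
# Queue for small jobs: up to 4 at once, sharing at most 16000 MB of heap.
PersistentQueryController.temporaryQueryQueue.SmallJobs.maxConcurrentQueries=4
PersistentQueryController.temporaryQueryQueue.SmallJobs.maxHeapMB=16000

# Queue for large imports: one at a time, up to 64000 MB of heap.
PersistentQueryController.temporaryQueryQueue.LargeImports.maxConcurrentQueries=1
PersistentQueryController.temporaryQueryQueue.LargeImports.maxHeapMB=64000

# Queue used when a user selects temporary scheduling.
PersistentQueryController.defaultTemporaryQueryQueue=SmallJobs
```

Under the rules described earlier, if SmallJobs is already running three queries that together use 12000 MB, a fourth query needing 4000 MB could start (4 queries, 16000 MB), while one needing 6000 MB would wait until heap is released.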

Persistent Query startup pool

The Persistent Query Controller may need to start a large number of PQs simultaneously at the beginning of a business day. It uses a thread pool to manage this process. Although extra threads are added as needed, it can be beneficial to increase the thread pool size on systems where the controller is expected to start and stop many PQs at once. The following properties control this thread pool:

Property | Description | Default
PersistentQueryController.queryStartThreadPoolCoreSize | The minimum number of threads maintained for Persistent Query startup; the number of available threads will never drop below this value. | 20
PersistentQueryController.queryStartThreadPoolKeepAliveMinutes | Extra threads added for query startup will be removed if they are idle for this number of minutes. | 10

For example, on a system with a lot of PQs that run every hour, PersistentQueryController.queryStartThreadPoolKeepAliveMinutes can be updated to ensure that threads remain available for an hour or more.
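A sketch of that hourly scenario might look like the following. Both values are hypothetical and would need tuning for a real system.

```properties
# Keep more startup threads permanently available (the default is 20).
PersistentQueryController.queryStartThreadPoolCoreSize=40
# Keep extra threads for 90 minutes so they survive between hourly runs (the default is 10).
PersistentQueryController.queryStartThreadPoolKeepAliveMinutes=90
```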

Query types

Query configuration types (such as Live Query (Script), Batch Query (RunAndDone), and Data Merge) are defined in an XML configuration file, which is used by the controller and web interface to understand how to handle each type of query. New query types can be added.

Warning

Modification of the existing Deephaven query types is not recommended.

The property iris.controller.configurationTypesXml defines a comma-delimited list of XML files that contain the query configuration type definitions. The default value is PersistentQueryConfigurationTypes.xml. This default file is overwritten during each installation, so it should not be edited unless instructed by Deephaven support.
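Since the default file is overwritten on installation, new query types would presumably be kept in a separate file that is appended to the list. The fragment below sketches that idea. CustomQueryTypes.xml is a hypothetical file name, and where such a file must be placed on disk is not covered here.

```properties
# Default file first, then a site-specific file with additional query types.
# CustomQueryTypes.xml is a hypothetical file name.
iris.controller.configurationTypesXml=PersistentQueryConfigurationTypes.xml,CustomQueryTypes.xml
```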

Query type attributes

Each query type is defined in an XML ConfigurationType element with the following attributes:

Attribute | Description | Default
allowedGroups | A comma-delimited list that restricts owners of the query to the specified user groups. If a user is not a member of one of the specified groups or a superuser, the user will not be able to create a query of this type. | None
displayable | Whether or not the query type should be displayed in a console. If false, this query type is only displayed to superusers. | true
enabled | If false, this query type is disabled. A disabled query type is not available to users. | true
hasScript | Defines whether the query has a script. If a query does not have a script (hasScript="false"), then a script panel is not displayed when editing a query of this type. | true
name | The name of the query type, such as Live Query (Script). | None
serverTypes | An optional comma-delimited list which restricts the server types on which a query can run. | The default server types from the property iris.db.defaultServerClass, which defaults to Query.
stopTimeRequired | Whether scheduling of the query requires a stop time. Query types such as Live Query (Script) that run continuously require stop times, while query types such as Batch Query (RunAndDone) do not. | true
supportsReplicas | Whether the query type supports replicas. | false

Some pre-defined groups are used for allowedGroups in standard Deephaven PQ types:

  • iris-superusers - The most privileged group; has all privileges.
  • iris-datamanagers - Can import, merge, and delete data.
  • iris-dataimporters - Can import data, but not merge or delete it.
  • iris-datavalidators - Can run data validation PQs.

Query sub-elements

Each ConfigurationType element can define the following sub-elements to further define behavior. The classes defined within these elements are dynamically created during the creation of workers:

  • <SetupQuery name="Java setup class" /> - This required element defines a Java class that will be used to create an instance of the query type. This class must extend the com.illumon.iris.db.tables.remotequery.ContextAwareRemoteQuery<com.illumon.iris.controller.PersistentQueryState> class. A query type is not valid without this setup class.
  • <ConfigChecker class="Java configuration checker class" /> - An optional Java configuration checker class that will be run to validate data before a query of this type can be saved. This class must implement the com.illumon.iris.controller.ConfigChecker interface. If it is not provided, no extra validation is performed on a query of this type before it is saved.

Persistent Query storage

All Persistent Queries created by users are stored by the controller in etcd. See the Persistent Query Backup and Restore Process runbook documentation for details on how to back up and restore PQs.

If the controller does not start up because of an error reading Persistent Queries, see the controller etcd access troubleshooting guide.

Git configuration

By default, the source code for a Persistent Query in the Script Editor tab is stored by the Persistent Query Controller as part of the PQ's configuration. With Git integration, however, a PQ's source code can instead be loaded directly from its associated Git repository rather than being stored in Deephaven.

Deephaven's use of Git for maintaining PQ scripts is entirely as a consumer. When a PQ is configured to use Git for the script source, the Controller reads the script file from the Git repository when the PQ starts, and when the script is displayed in the UI while creating or editing the PQ configuration. Since Deephaven is a consumer of the Git-managed content, the script itself cannot be edited in the Deephaven UI. To create or edit Git-managed PQ scripts, save them to Groovy or Python (.groovy or .py) files, and push them to the Git repository using a Git client such as the git command line tool or a Git-integrated IDE.

Warning

Controller Git repositories should not be stored on NFS.

Note

For more information on configuring PQs to use Git for their scripts, see the Query Monitor documentation.

By default, Deephaven uses a local Git repository. To enable repository updates (i.e., git pull), you must configure the Persistent Query Controller to not use a local Git repository.

The PersistentQueryController.useLocalGit property controls this behavior. Its default value is true, meaning Deephaven uses a local repository as the script source and does not check out a configured branch from the remote. This disables repository updates globally, regardless of each repository's updateEnabled setting. Set the property to false to allow the Deephaven Controller to clone and fetch from a remote Git server.

To enable Git integration, several properties must be set in the Deephaven controller's configuration. One global property must be set, followed by several properties for each repository.

The global property is:

iris.scripts.repos: a comma-separated list of Git repositories the controller should use.

The additional properties for each repository are listed below:

  • iris.scripts.repo.<repo_name>.groups: a comma-separated list of the Deephaven groups who may access the repository. Group access is configured with the ACL editor or the command-line dhconfig acls tool.
  • iris.scripts.repo.<repo_name>.updateEnabled: set to true to automatically update the repository (i.e., run a git pull) once per minute. This helps ensure that when a query runs, it uses the most recent version of the script available in the repository's remote origin.
  • iris.scripts.repo.<repo_name>.branch: the Git branch to check out; if this is not set, the controller's PersistentQueryController.defaultBranch property value is used, or master if this is not defined.
  • iris.scripts.repo.<repo_name>.prefixDisplayPathsWithRepoName: if true, the "Choose Script" dialog of the Persistent Query Configuration Editor will include the repository's name next to each script path. This helps disambiguate scripts for users who have access to multiple repositories.
  • iris.scripts.repo.<repo_name>.root: the directory on the filesystem into which Deephaven will clone the Git repository. Each repository must have a distinct root directory. If a relative path is used, the path will be relative to the workspace directory of the Controller process. On Deephaven servers, this will normally be /db/TempFiles/irisadmin/iris_controller.
  • iris.scripts.repo.<repo_name>.paths: the paths to include, relative to the repository's root directory. Files in all other paths will not be available to PQs.
  • iris.scripts.repo.<repo_name>.uri: the SSH URI used to access the Git repository, such as git@github.com:my-git-org/myrepo.git. This can be empty if updateEnabled is false.
  • iris.scripts.repo.<repo_name>.remote: sets the name of the remote alias. Defaults to origin.
  • iris.scripts.repo.<repo_name>.resetGitLockFiles: whether the controller should clear any Git lock files it finds when it starts a sync. Defaults to true. Only the controller should be running Git commands in the irisadmin git directory path, so lock files should only be left over if the controller is stopped during a Git operation, and it is beneficial to allow the controller to clear these locks automatically.

Git garbage collection is controlled by the following property:

  • PersistentQueryController.gitGcEnabled: If defined and set to false, this disables Git garbage collection.

Git authentication

If a keypair is being used to authenticate with the Git server, a common approach is to create the keypair in the .ssh directory under the irisadmin home directory, typically located at /db/TempFiles/irisadmin. Additionally, the Git server's host must be added to the known_hosts file for irisadmin. By default, the irisadmin user does not have an .ssh directory. Therefore, the complete process to set up irisadmin for SSH authentication to Git is as follows:

  1. sudo su - irisadmin to switch to the irisadmin user context (or any custom name being used).
  2. mkdir .ssh
  3. chmod 700 .ssh
  4. ssh-keygen -t rsa (accept all defaults - press Enter each time - to create a new key pair with no passphrase.)
  5. ssh-keyscan -t rsa <fqdn_of_git_server> >> .ssh/known_hosts
  6. cat .ssh/known_hosts to verify that the host from step 5 was added; if not, try ssh-keyscan -H <fqdn_of_git_server> >> .ssh/known_hosts.

If the .ssh directory already exists, steps 2 and 3 are not needed.

Note

Because the name .ssh starts with a dot, it is a hidden directory that ls does not display by default. Use ls -a or ls -la to display hidden entries.

The newly created public key for the irisadmin account can then be added to the Git server to allow ssh authentication for Git commands:

  • cat .ssh/id_rsa.pub to get the public key that needs to be provided to the Git server. This may be through a UI, for tools like GitLab, or, if the server is a simple Linux server, the public key can be appended to the .ssh/authorized_keys file for the Git user on the Git server.

Warning

Since the controller's Git key is accessible to anybody with access to the irisadmin account, it should be set up on the Git server to be read-only.

Example configuration

The example configuration below configures Deephaven to read scripts from three repositories: team1, team2, and shared. All users can access scripts in the shared repository, but the team1 and team2 repositories are restricted to specific users. All three repositories use a branch called master. Also, the shared repository uses a different Git server than the other two.

PersistentQueryController.useLocalGit=false

iris.scripts.repos=shared,team1,team2

iris.scripts.repo.shared.groups=*
iris.scripts.repo.shared.updateEnabled=true
iris.scripts.repo.shared.branch=master
iris.scripts.repo.shared.prefixDisplayPathsWithRepoName=false
iris.scripts.repo.shared.root=../git/shared
iris.scripts.repo.shared.paths=IrisQueries/groovy,IrisUtils/groovy
iris.scripts.repo.shared.uri=git@git.mycompany.net:common-libs/shared.git

iris.scripts.repo.team1.groups=user1,user2,user3
iris.scripts.repo.team1.updateEnabled=true
iris.scripts.repo.team1.branch=master
iris.scripts.repo.team1.prefixDisplayPathsWithRepoName=false
iris.scripts.repo.team1.root=../git/team1
iris.scripts.repo.team1.paths=IrisQueries/groovy
iris.scripts.repo.team1.uri=git@gitlab.mycompany.net:team1/team1.git

iris.scripts.repo.team2.groups=user1,user4,user5
iris.scripts.repo.team2.updateEnabled=true
iris.scripts.repo.team2.branch=master
iris.scripts.repo.team2.prefixDisplayPathsWithRepoName=false
iris.scripts.repo.team2.root=../git/team2
iris.scripts.repo.team2.paths=IrisQueries/groovy
iris.scripts.repo.team2.uri=git@gitlab.mycompany.net:team2/team2.git
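For contrast, a configuration that keeps the default local-repository behavior might look like the sketch below. The repository name local and its root directory are hypothetical. The uri property is omitted because updates are disabled.

```properties
# Use local repositories only; repository updates are disabled globally (this is the default).
PersistentQueryController.useLocalGit=true

iris.scripts.repos=local

iris.scripts.repo.local.groups=*
iris.scripts.repo.local.updateEnabled=false
iris.scripts.repo.local.prefixDisplayPathsWithRepoName=false
iris.scripts.repo.local.root=../git/local
iris.scripts.repo.local.paths=IrisQueries/groovy
# No uri property: it can be empty when updateEnabled is false.
```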

Troubleshooting

In most cases, if updates are enabled and there is a Git-related problem, the Deephaven controller process fails to start or shuts down shortly after starting. Details of such issues will be logged to the process startup log: /var/log/deephaven/iris_controller/iris_controller.log.<yyyy-mm-dd>.

If a non-fatal Git error occurs, this is generally logged to the controller process log: /var/log/deephaven/iris_controller/PersistentQueryController.log.<yyyy-mm-dd-hhmmss.mmm+/-hhmm>.

Note that SSH is used for authentication, but the URI should not include ssh:// as a prefix. HTTPS is not currently supported.

ssh -v <user>@<fqdn_of_git_server> may provide additional diagnostic information about the connection and authentication processes.

Other properties

Other properties related to the controller's operation follow. These parameters are not reloadable.

Property | Description | Default
critEmail | The email distribution list to which critical alerts will be sent (currently, the only critical alert is for a hung script update job, which is used to refresh scripts from Git). | None
iris.authentication.keyfile | The keyfile used to authenticate the controller to the dispatchers. | /etc/sysconfig/deephaven/auth/priv-iris.base64.txt
PersistentQueryController.binaryLogTimeZone | The time zone that determines column partition values for the controller's data (PersistentQueryStateLog and PersistentQueryConfigurationLogV2). | The server's default time zone.
PersistentQueryController.defaultMaxHeapSizeGB | The maximum heap the controller will allow for a query, in GB; further information on controlling worker heap size can be found in the worker heap size documentation. | 1024
PersistentQueryController.keyPairFile | The keypair file used to encrypt sensitive information for the controller's use. This file should not be visible to users of the system. | /etc/sysconfig/illumon.d/auth/priv-controllerConsole.base64.txt
PersistentQueryController.port | The port on which the controller listens for client connections, from user clients or configuration tools. | 20126
queryScheduler.restartWhenRunning | This defines the default value populated in the Persistent Query scheduler’s “Restart when running” option; either Yes or No. | Yes
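As an illustration, a site might override a few of these properties as sketched below. All values are hypothetical. Because these properties are not reloadable, a change would only take effect after the controller is restarted.

```properties
# Cap per-query heap at 64 GB instead of the default 1024 GB (hypothetical limit).
PersistentQueryController.defaultMaxHeapSizeGB=64
# Default the scheduler's "Restart when running" option to No.
queryScheduler.restartWhenRunning=No
# Hypothetical distribution list for critical alerts.
critEmail=deephaven-ops@example.com
```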

In addition, the controller's logging behavior can be changed with the standard logging parameters. See the Deephaven Operations Guide Log-Related Properties section for further details.

Initial startup configuration

When the controller starts for the first time on any Deephaven installation, it must create helper queries to assist with Deephaven operations. Properties are used to configure these queries. Changing these properties is not recommended unless directed by Deephaven support.

Revert Helper Query

The Revert Helper query assists Deephaven when a query is reverted to a previous version. The following parameters are used when creating the initial revert helper query (the first time the controller is run), along with their default values:

  • revertHelper.queryOwner=<superuser>: The owner of the revert helper query; this must be a superuser.
  • revertHelper.queryName=RevertHelperQuery: The name of the query.
  • revertHelper.dbServer=Query_1: The server on which the revert helper will run. It should be a server with a type of Query; if custom-named servers are used, this should reflect a named query server.
  • revertHelper.heapSize=1: The heap size in GB of the helper query.

The following parameter defines how far back the revert helper looks when a user requests to revert a query to a previous version:

  • revertHelper.lookbackDays=180

The following parameter defines the number of seconds a request to revert a query will wait for a response from the helper before displaying an error:

  • revertHelper.waitQuerySeconds=30

Import Helper Query

The Import Helper query assists with import, merge, and validation queries. The initial query-creation parameters have the same meanings as for the revert helper:

  • importHelper.queryOwner=<username>
  • importHelper.queryName=ImportHelperQuery
  • importHelper.dbServer=Merge_1: The import helper should run on a server with a type of Merge.
  • importHelper.heapSize=1

Web Client Data Query

The Web Client Data query is used by the web_api_service to retrieve information related to web sessions. The initial query-creation parameters have the same meanings as for the revert helper:

  • webClientData.queryOwner=<username>
  • webClientData.queryName=WebClientData
  • webClientData.dbServer=Query_1
  • webClientData.heapSize=1
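The dbServer values above assume the generated server names Query_1 and Merge_1. If custom server names are used, the helper properties would need to refer to those names instead. The sketch below assumes server 1 is a Query server and server 2 is a Merge server, and the names ResearchQuery and NightlyMerge are hypothetical. These properties are only used when the helper queries are first created, and changes should be made only as directed by Deephaven support.

```properties
# Custom server names (hypothetical) replacing the generated Query_1 and Merge_1.
iris.db.1.name=ResearchQuery
iris.db.2.name=NightlyMerge

# Point each helper at a server of the required type.
revertHelper.dbServer=ResearchQuery
webClientData.dbServer=ResearchQuery
importHelper.dbServer=NightlyMerge
```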