Deephaven status dashboard

Deephaven includes a status dashboard process that exposes a Prometheus interface, providing data that a Prometheus installation can scrape. Installation of Prometheus is not detailed here, but you can refer to the Prometheus GitHub page for instructions.

Application Prometheus Configuration

The following properties (with the defaults shown, specified in iris-defaults.prop) control the basic operation of the Deephaven status dashboard process. The default address on which the dashboard provides data is https://<server's fqdn>:8112/.

Property Name                        | Property Meaning                                  | Default Value
StatusDashboard.prometheus.port      | The port on which the Prometheus data is exposed. | 8112
StatusDashboard.prometheus.namespace | The Prometheus namespace to be used for the data. | Deephaven

SSL and Authentication

By default, the Prometheus web interface uses SSL and requires authentication. The user must log in as a valid Deephaven user who is either a superuser or a member of a group specified by the StatusDashboard.allowedGroups property. If authentication is required, then SSL must be used.

Property Name                     | Property Meaning                                                                               | Default Value
StatusDashboard.useSsl            | If true, the Prometheus interface uses https.                                                  | true
StatusDashboard.useAuthentication | If true, the Prometheus interface requires authentication.                                     | true
StatusDashboard.allowedGroups     | If authentication is enabled, the user must be a superuser or a member of one of these groups. | dashboard
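These properties can be overridden in a customer property file. A minimal sketch, assuming iris-environment.prop is in use (the prometheus-readers group name is an example, not a default):

```properties
StatusDashboard.prometheus.port=8112
StatusDashboard.useSsl=true
StatusDashboard.useAuthentication=true
StatusDashboard.allowedGroups=dashboard,prometheus-readers
```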

Prometheus Node Exporter

The Prometheus node exporter provides status on various aspects of the host's health, such as available disk space and CPU utilization. Installation and configuration of the node exporter are beyond the scope of Deephaven documentation; a good starting point is the Prometheus node exporter GitHub page. The example dashboard discussed below assumes that a node exporter is configured and running on the server. If the Deephaven installation contains more than one server, then a node exporter should be run on each one.

Prometheus Server

To access the Prometheus interface effectively, configure a Prometheus server. A Prometheus server is usually set up on a system other than the one being monitored. It should be configured to scrape the Deephaven status dashboard's data and any node exporters that have been configured. Instructions for installing and running Prometheus can be found at the Prometheus documentation.

An example Prometheus configuration file is provided in the Deephaven installation at /usr/illumon/latest/etc/prometheus.yml. Copy it to the system where Prometheus runs, to a location where it won't be overwritten on each upgrade. This isn't needed if an existing Prometheus installation is being used.

The Prometheus configuration YAML must be edited to point to the appropriate locations. Alternatively, if an existing Prometheus installation is being modified, the file can serve as an example of options to add to the existing file.

  • Update the targets to point to your server.
  • Update the username to match the user you're using for dashboard authentication. Choose a user created explicitly for the dashboard process, not a default superuser.

Put the password into a file and update password_file in prometheus.yml to point to that file. This file should be owned by the user that will run Prometheus.
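Put together, the scrape configuration in prometheus.yml ends up looking something like the sketch below. The hostnames, user name, and password file path are placeholders; the shipped /usr/illumon/latest/etc/prometheus.yml is the authoritative starting point.

```yaml
scrape_configs:
  # The Deephaven status dashboard (SSL and authentication enabled by default).
  - job_name: "deephaven_status_dashboard"
    scheme: https
    basic_auth:
      username: "dashboarduser"                       # a user created for the dashboard
      password_file: /etc/prometheus/dashboard_password
    static_configs:
      - targets: ["deephaven-server.example.com:8112"]

  # One node exporter entry per Deephaven server.
  - job_name: "node_exporter"
    static_configs:
      - targets: ["deephaven-server.example.com:9100"]
```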

Ensure that the file is readable only by the user running Prometheus. For example, as the user that runs Prometheus:

chmod 600 <location of password file>

Run Prometheus. For example, if you're in the directory where Prometheus is installed and logged in as the user under which it was installed:

./prometheus --config.file=<location of Prometheus configuration yaml file>

Grafana

Grafana is often used to provide visual representations of Prometheus data. Further information, including installation details, can be found at the Grafana website. Once it's installed and running, it can be accessed through a web browser, typically at http://<fqdn of grafana server>:3000.

A simple example dashboard is provided in /usr/illumon/latest/etc/grafanaDashboard.json, which relies on the node exporter, as well as the Persistent Query data example below. To use it:

  1. Set up your data source using the Grafana web GUI. The data source is the location of the Prometheus server, and will be something like http://<fqdn of prometheus server>:9090.
  2. Find the uid of that data source. You should see this in the URL of the Grafana page where you are editing the data source. For example, http://localhost:3000/datasources/edit/35EH6Z84k indicates that the uid is 35EH6Z84k.
  3. Update all the uid fields in the example Grafana JSON file to be the uid of your data source.
  4. In the Grafana Dashboards menu, select Import and use the edited JSON file.
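Step 3 can be tedious by hand if the dashboard JSON contains many panels. A small script along the following lines rewrites every "uid" field in one pass; the inline JSON here is a tiny stand-in for the shipped file, and the uid value is the example from step 2.

```python
import json

def replace_uids(node, uid):
    """Recursively set every 'uid' field in a Grafana dashboard JSON tree."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "uid":
                node[key] = uid
            else:
                replace_uids(value, uid)
    elif isinstance(node, list):
        for item in node:
            replace_uids(item, uid)

# A tiny stand-in for the contents of grafanaDashboard.json:
dashboard = json.loads("""
{
  "panels": [
    {"title": "CPU",       "datasource": {"type": "prometheus", "uid": "OLD"}},
    {"title": "PQ status", "datasource": {"type": "prometheus", "uid": "OLD"}}
  ]
}
""")

replace_uids(dashboard, "35EH6Z84k")  # your data source's uid
print(dashboard["panels"][0]["datasource"]["uid"])
```

To apply this to the real file, `json.load` the dashboard JSON, call `replace_uids`, and `json.dump` the result back out before importing it into Grafana.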

Configuration

The status dashboard uses JSON configuration files to determine which certificates, persistent queries, and tables to monitor. These files are provided in a comma-delimited list with the property StatusDashboard.configuration.files. For example:

StatusDashboard.configuration.files=status-dashboard-defaults.json,status-dashboard-custom.json

status-dashboard-defaults.json is provided by Deephaven and should not be edited. Additional files can be imported like any other property file with the dhconfig utility.

Each JSON file uses the following format. The format of each monitor entry type is described below.

{
  "DashboardMonitors": {
    <monitor entries>
  }
}

As many entries as needed can be placed in a single file as long as the JSON format is maintained.
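For example, a complete custom configuration file containing a single monitor entry (the owner and prefix are illustrative) would look like:

```json
{
  "DashboardMonitors": {
    "PQMonitors": [
      {
        "Name": "User1Queries",
        "PqOwners": ["user1"],
        "PrometheusPublisherPrefix": "USER1_"
      }
    ]
  }
}
```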

Monitored Processes

Persistent Query Controller

The status dashboard provides automatic monitoring of the Persistent Query controller, which should always be running. Once it's connected to the controller, it will start monitoring any remote query dispatchers the controller knows about.

Remote Query Dispatchers

Once the status dashboard has connected to the Persistent Query controller, it monitors all the dispatchers configured in the controller.

Persistent Query Status

The dashboard can monitor the state of any Persistent Query (i.e., determine whether it's running/completed or not). This is controlled by JSON configuration using the following properties in a PQMonitors block.

Property Name             | Property Meaning
Name                      | The name of this monitor. This is only used for logging.
PqOwners                  | An array containing the owners of the persistent queries to be monitored.
PqNames                   | An array containing the names of the persistent queries to be monitored.
PqNameMatches             | An array containing regular expressions used to match persistent queries to be monitored.
PrometheusPublisherPrefix | An optional prefix added to the Prometheus gauge name.

The following example (from the default configuration) monitors the helper queries with a gauge prefix of PQ_. The trailing comma is present because an additional entry follows this PQMonitors entry.

    "PQMonitors": [
      {
        "Name": "HelperQueries",
        "PrometheusPublisherPrefix": "PQ_",
        "PqNames": ["WebClientData", "RevertHelperQuery", "ImportHelperQuery", "TelemetryHelperQuery"]
      }
    ],

The following example adds two monitors.

  • The first monitors all persistent queries owned by user1 with the gauge name prefix USER1_.
  • The second monitors all persistent queries that have names ending in DataGenerator.

    "PQMonitors": [
      {
        "Name": "User1",
        "PqOwners": ["user1"],
        "PrometheusPublisherPrefix": "USER1_"
      },
      {
        "Name": "AllDataGenerators",
        "PqNameMatches": ["^.*DataGenerator$"]
      }
    ]

Persistent Query Data

The status dashboard can monitor data latency by subscribing to a Persistent Query's data. The Persistent Query must publish a timestamp column containing the time of the table's most recent update. The following example script does this for the Process Event Log and Audit Event Log tables.

Core+

import io.deephaven.engine.util.SortedBy

PELWatch = SortedBy.sortedLastBy(db.liveTable("DbInternal", "ProcessEventLog").where("Date=today()").view("Timestamp"), "Timestamp")
AELWatch = SortedBy.sortedLastBy(db.liveTable("DbInternal", "AuditEventLog").where("Date=today()").view("Timestamp"), "Timestamp")

Enterprise

import com.illumon.iris.db.util.SortedBy

PELWatchBase=SortedBy.sortedLastBy(db.i("DbInternal", "ProcessEventLog").where("Date=currentDateNy()").view("Timestamp"), "Timestamp")
AELWatchBase=SortedBy.sortedLastBy(db.i("DbInternal", "AuditEventLog").where("Date=currentDateNy()").view("Timestamp"), "Timestamp")

PELWatch=PELWatchBase.preemptiveUpdatesTable(1000)
AELWatch=AELWatchBase.preemptiveUpdatesTable(1000)

The status dashboard uses JSON configuration files to determine what data to publish for Prometheus to scrape. Several options are available, and at least one Persistent Query restriction (Persistent Query owners, names, or name-matches) must be provided, as well as at least one table restriction (table names or name-matches).

Property Name             | Property Meaning
Name                      | The name of this monitor. This is only used for logging.
PqOwners                  | An array containing the owners of the persistent queries to be monitored.
PqNames                   | An array containing the names of the persistent queries to be monitored.
PqNameMatches             | An array containing regular expressions used to match persistent queries to be monitored.
TableNames                | An array containing the names of the tables to be monitored.
TableNameMatches          | An array containing regular expressions used to match table names to be monitored.
TimestampColumnName       | The name of the column containing the timestamp. If this isn't provided, the column name Timestamp will be used.
JobIntervalMillis         | The number of milliseconds between examining the published data. If this isn't provided, a default of 30 seconds is used.
PrometheusPublisherPrefix | An optional prefix added to the Prometheus gauge name.

The following example monitors any iris-owned persistent queries with names beginning with DataLagWatcher, looking for table names ending in Watch, and publishes the data every five seconds. If the example scripts were saved with appropriate names, this would monitor their tables and create gauges for them.

    "DataMonitors": [
      {
        "Name": "InternalTables",
        "PqOwners": ["iris"],
        "PqNameMatches": ["^DataLagWatcher.*"],
        "TableNameMatches": [".*Watch$"],
        "JobIntervalMillis": "5000"
      }
    ]

Certificate Expiration

The status dashboard monitors the number of days until certificates expire. This is controlled by JSON configuration: a simple property list in which each entry contains the gauge name and the certificate property prefix. The interval between certificate checks can also be defined globally. For example:

    "CertificateMonitors": {
      "MonitoredCertificates": {
        "webcert": "StatusDashboard.tls",
        "authservercert": "authserver.tls",
        "configservercert": "configuration.server"
      }
    },
    "CertificateJobIntervalHours": "1"

Standard Deephaven properties are then used to retrieve the certificate information. For the webcert example, the expected p12 file and passphrase file are defined by the StatusDashboard.tls value.

    StatusDashboard.tls.keystore=/etc/sysconfig/illumon.d/auth-user/webServices-keystore.p12
    StatusDashboard.tls.passphrase.file=/etc/sysconfig/deephaven/auth-user/.webapi_passphrase

Similar keystore and passphrase files are provided for the authservercert and configservercert entries. See public and private keys for more information on these files and properties.

The status dashboard also monitors the root certificate as defined by the following properties (shown with their default values):

tls.truststore=/etc/sysconfig/illumon.d/resources/truststore-iris.p12
tls.truststore.passphrase.file=/etc/sysconfig/illumon.d/resources/truststore_passphrase
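To verify by hand what these gauges will report, openssl can read the expiration date out of a p12 keystore. The sketch below creates a throwaway self-signed keystore purely for illustration; for a real check, point the final command at one of the keystore files above and use its passphrase file instead.

```shell
# Create a throwaway keystore (illustration only -- skip this for a real check).
openssl req -x509 -newkey rsa:2048 -days 365 -nodes \
    -keyout /tmp/demo-key.pem -out /tmp/demo-cert.pem -subj "/CN=demo"
openssl pkcs12 -export -out /tmp/demo-keystore.p12 \
    -inkey /tmp/demo-key.pem -in /tmp/demo-cert.pem -passout pass:changeit

# Print the expiration date of the certificate in the keystore.
openssl pkcs12 -in /tmp/demo-keystore.p12 -passin pass:changeit -nodes -nokeys \
    | openssl x509 -noout -enddate
```

The final command prints a line beginning with notAfter=, which is the date the dashboard's day-count gauges are derived from.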

Envoy Configuration

If you're using Envoy, the installer should set up any required properties for the status dashboard. Properties for the node exporter are not automatically added, but here's an example that can be added to iris-environment.prop. Each node in an Envoy cluster running a node exporter will need its own set of properties using its own FQDN.

[service.name=configuration_server] {
    envoy.xds.extra.routes.node_exporter1.host=<host's FQDN>
    envoy.xds.extra.routes.node_exporter1.port=9100
    envoy.xds.extra.routes.node_exporter1.prefix=/node_exporter/
    envoy.xds.extra.routes.node_exporter1.prefixRewrite=/
    envoy.xds.extra.routes.node_exporter1.tls=false
    envoy.xds.extra.routes.node_exporter1.exactPrefix=false
}