Scaling to multiple servers

Deephaven is designed with horizontal scaling in mind. This means adding capacity typically involves adding compute resources in the form of servers or virtual machines.

When considering how to scale Deephaven, determine which types of capacity are needed before adding resources.

Scaling capacities

  1. Read capacity
  • Queries requiring access to very large data sets might require more IO bandwidth or substantial SAN caching.
  • Add more db_query_server processes with added IO bandwidth.
  2. Write capacity
  • Handling multiple near real-time data streams quickly (through the Deephaven binary log/tailer/db_dis architecture).
  • Expanding IO for batch data imports or merges.
  • Large batch imports of historical data.
  3. Query capacity (memory and CPU)
  • Queries may need large amounts of memory to hold large data sets, significant CPU for intensive manipulation of that data, or both.
  • Support for more end-user queries.
  • Support for analysis of large data sets that would exhaust a smaller server's resources.

Server types

  1. Query servers
  • Increase read and analysis capacity of the database for end users.
  • Large CPU and memory servers.
  • Run the db_query_server process and the worker processes it spawns.
  2. Real-time import servers
  • Increase write capacity for importing near real-time data.
  • Increase write capacity for merging near real-time data to historical partitions.
  • Run the db_dis and db_ltds processes.
  3. Batch import servers
  • Increase write capacity for batch data imports to intraday partitions.
  • Increase write capacity for merging batch data to historical partitions.
  • Separate the batch data import function from the near real-time data imports.
  4. Administrative servers
  • Run administrative Deephaven processes, such as the Deephaven controller, the authentication server, and the ACL write server.
  • May have additional requirements, such as access to MySQL.

A single host may run multiple server types. For example, the administrative processes do not require many resources, and consequently may be co-located with other services.

Single server architecture

The Deephaven Installation Guide illustrates provisioning a server with a network interface and the loopback interface (network layer), local disk storage (storage layer), and the Deephaven software (application layer) to create a single-server Deephaven deployment. The guide then uses this single server to perform basic operations by connecting to Deephaven from a remote client.

With that in mind, this single-server installation looks like the following:

[Figure: single-server Deephaven deployment]

This fully functional installation of Deephaven is limited to the compute resources of the underlying server. As the data import or query capacity requirements of the system grow beyond the server's capabilities, it is necessary to distribute the Deephaven processes across several servers, converting to a multiple server architecture.

Multiple server architecture

The multiple server architecture is built by moving the network and storage layers outside the server, and by adding hardware to run additional data loading or querying processes. The network layer moves to subnets or VLANs. The storage layer becomes a shared filesystem mounted across all servers. The application layer becomes a set of servers with the Deephaven software installed and configured to perform specific tasks.

[Figure: multiple-server Deephaven deployment]

Deploying Deephaven on multiple servers has the following requirements:

  1. Modification of each layer in the three-layer model.
  2. Modification and management of the Deephaven configuration.
  3. Modification and management of the monit Deephaven configuration.

The network layer

The network and necessary services (DNS, NTP, etc.) are supplied by the customer. Therefore, this document does not cover configuration or implementation of the network, except to specify the network services on which Deephaven depends.

Deephaven requires a subnet or VLAN for the servers to communicate. As with all big data deployments, fast network access to the data benefits query and analysis speed. If FQDNs are used in the Deephaven configuration, a robust DNS service is also recommended.

It is recommended that the network layer provide the fastest possible access to the storage layer.

The storage layer

The storage layer changes the most in a multiple-server installation. Deephaven requires a shared filesystem for historical and intraday data that can be mounted by each server in the Deephaven environment.

Typically, these disk mounts are provided via the NFS protocol, exported from a highly available storage system. Other types of distributed or clustered filesystems, such as GlusterFS or HDFS, should work but have not been extensively tested.

As mentioned, Deephaven relies on a shared filesystem architecture for access to its data. Currently, the larger installations use ZFS filers to manage the storage mounts, which are exported to the query and data import servers via NFS over 10Gb network interfaces.
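As a concrete illustration, the historical NFS mounts on each server might look like the following /etc/fstab entries. The filer name, export paths, namespace, and NFS options are placeholders for this sketch, not requirements:

    # Hypothetical /etc/fstab entries: two historical partitions for a MarketData namespace,
    # exported from a ZFS filer and mounted on the servers that need them.
    filer01:/export/deephaven/marketdata-0  /db/Systems/MarketData/Partitions/0  nfs  rw,hard,nfsvers=4  0 0
    filer01:/export/deephaven/marketdata-1  /db/Systems/MarketData/Partitions/1  nfs  rw,hard,nfsvers=4  0 0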

Deephaven divides data into two categories:

  • Intraday data is any data that hasn't been merged to historical storage partitions, including near real-time as well as batch-imported data. Depending on the data size and rate, the underlying storage should typically be high-throughput (SSD) or contain sufficient caching layers to allow fast writes. Intraday data is written to the database via the Data Import Server or other import processes onto disks mounted on /db/Intraday/<namespace>/.
  • Historical data is populated by performing a merge of the Intraday data to the historical file systems. As storage needs grow, further storage can be easily added in the form of writable partitions without the need to reorganize existing directory or file structures, as Deephaven queries will automatically search additional historical partitions as they are added. Intraday data is typically merged to Historical data on the Data Import Servers during off hours by scheduled processes. The historical disk volumes are mounted into two locations on the servers:
    1. For writing: /db/Systems/<databaseNamespace>/WritablePartitions/[0..N]
    2. For reading: /db/Systems/<databaseNamespace>/Partitions/[0..N]
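For example, adding one more historical partition to a hypothetical MarketData namespace might look roughly like the sketch below. The filer export and partition number are placeholders, and the symlink for the writable location is one common convention; some sites instead mount the same export at both paths:

    # Hedged sketch: add historical partition 2 for the MarketData namespace.
    mkdir -p /db/Systems/MarketData/Partitions/2
    mount -t nfs filer01:/export/deephaven/marketdata-2 /db/Systems/MarketData/Partitions/2
    # Expose the same volume for merges: queries read via Partitions/, merges write via WritablePartitions/.
    ln -s /db/Systems/MarketData/Partitions/2 /db/Systems/MarketData/WritablePartitions/2
    # Ownership so the merge user can write; adjust to your own conventions.
    chown dbmerge:dbmergegrp /db/Systems/MarketData/Partitions/2

Queries automatically include the new partition; no reorganization of existing directories is required.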

The Deephaven application layer

The software installation procedure for each server is the same as in the Deephaven installation guide. Once the servers are deployed, the network storage should be mounted on all of them. The main Deephaven configuration must then be managed centrally, and the Monit configuration on each host modified so that it runs only the processes appropriate to that server type.
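As an illustration, on a host intended to act only as a query server, the Monit configuration might be trimmed roughly as follows. The exact set of files under /etc/sysconfig/deephaven/monit/ depends on the installation, so treat this as a sketch rather than a prescription:

    # Hedged sketch: disable processes that a query-only server should not run.
    # A process is disabled by renaming its .conf file to .disabled and reloading monit.
    cd /etc/sysconfig/deephaven/monit
    mv db_dis.conf db_dis.disabled       # no real-time ingestion on this host
    mv db_ltds.conf db_ltds.disabled     # no local intraday serving on this host
    /usr/illumon/latest/bin/dh_monit reload     # re-read the monit configuration
    /usr/illumon/latest/bin/dh_monit summary    # confirm only the intended processes remain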

Users and groups

Deephaven requires three Unix users and five Unix groups, which are created at install time by the iris-config RPM.

The UIDs and GIDs associated with these users and groups must also exist on the shared storage system so that the Deephaven processes can read and write the mounted file systems; a sketch follows the tables below.

Group Name    gid    Members
dbquery       9000   dbquery
dbmerge       9001   dbmerge
irisadmin     9002   irisadmin
dbmergegrp    9003   dbquery, dbmerge, irisadmin
dbquerygrp    9004   dbquery, dbmerge, irisadmin

User Name     uid
dbquery       9000
dbmerge       9001
irisadmin     9002
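If the storage system (or any other host that mounts these filesystems) needs matching accounts, a sketch of creating them with the IDs above follows; on the Deephaven servers themselves, the iris-config RPM creates these accounts for you:

    # Hedged sketch: recreate the Deephaven groups and users with matching GIDs/UIDs so that
    # file ownership is consistent across the shared mounts. Group membership follows the
    # tables above; adjust the primary/supplementary split to match your Deephaven hosts.
    groupadd -g 9000 dbquery
    groupadd -g 9001 dbmerge
    groupadd -g 9002 irisadmin
    groupadd -g 9003 dbmergegrp
    groupadd -g 9004 dbquerygrp
    useradd -u 9000 -g dbquery   -G dbquerygrp,dbmergegrp dbquery
    useradd -u 9001 -g dbmerge   -G dbquerygrp,dbmergegrp dbmerge
    useradd -u 9002 -g irisadmin -G dbquerygrp,dbmergegrp irisadmin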

Deephaven server types

The server types section above described four server types: Query Server, Real-time Import Server, Batch Import Server, and Administrative Server. Each server type is defined by the Deephaven processes it runs and the hardware resources it requires. Server types can also be combined on a single host to provide both sets of functionality, given sufficient hardware resources. For instance, it is common to combine the administrative processes with a data import server.

For example, a query server requires considerably more CPU and memory than an administrative server.

Example server types and associated resources are outlined below.

Administrative server

Server Hardware and Operating System

  • 6 cores (2 cores minimum) @ 3.0 GHz Intel
  • 10GB RAM
  • Linux - RedHat 7 or derivative
  • Network: 1Gb
  • Storage:
    • Root - sufficient local disk space

Processes

  • MySQL - ACL database
  • authentication_server - Securely authenticates users on login
  • db_acl_write_server - Serves as a single writer for ACL information stored within the ACL database
  • iris_controller - Specialized client that manages persistent query lifecycles and provides discovery services to other clients for persistent queries

Real-time import server

Server Hardware and Operating System

  • 24 cores (6 cores minimum) @ 3.0 GHz Intel
  • 256GB RAM (32GB minimum)
  • Linux - RedHat 7 or derivative
  • Network: 10Gb, MTU 8192
  • Storage:
    • Root - sufficient local disk space
    • Database mount - 250GB SSD mirror local
    • Intraday data - SSD, either local or iSCSI
    • Historical data - NFS mounts for all historical data

Processes

  • db_dis - receives binary table data from user processes and writes it to user namespaces, while simultaneously serving read requests for that data
  • db_ltds - serves read requests for local data, used for serving intraday data not managed by db_dis
  • Tailer1
    • The tailer1 application sends data from the processes in the Deephaven stack to the db_dis for the DbInternal namespace.
    • A tailer application reads binary log files and sends their data to a db_dis process.
    • Tailer configuration is covered in Importing Data.

Batch import server

Server Hardware and Operating System

  • 24 cores (6 cores minimum) @ 3.0 GHz Intel
  • 256GB RAM (32GB minimum)
  • Linux - RedHat 7 or derivative
  • Network: 10Gb, MTU 8192
  • Storage:
    • Root - sufficient local disk space
    • Database mount - 250GB SSD mirror local
    • Intraday data - SSD, either local or iSCSI
    • Historical data - NFS mounts for all historical data

Processes

  • Batch import processes run via cron or manually to load intraday data from external data sources such as CSV files
  • Validation and merge processes run via cron or manually to validate intraday data and add it to the historical database
  • Scripts to delete intraday data after it has been successfully merged
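As an illustration, the scheduling on such a server might look like the hypothetical crontab below. The wrapper scripts are site-specific placeholders for the import, validate/merge, and cleanup jobs described above; they are not shipped with Deephaven:

    # Hypothetical crontab for the dbmerge user on a batch import server.
    # Script names and paths are placeholders for site-specific jobs.
    30 1 * * *  /db/TempFiles/dbmerge/bin/import_daily_csv.sh       >> /db/TempFiles/dbmerge/logs/import.log  2>&1
    30 3 * * *  /db/TempFiles/dbmerge/bin/validate_and_merge.sh     >> /db/TempFiles/dbmerge/logs/merge.log   2>&1
    30 5 * * *  /db/TempFiles/dbmerge/bin/delete_merged_intraday.sh >> /db/TempFiles/dbmerge/logs/cleanup.log 2>&1
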
Query server

Server Hardware and Operating System

  • 24 cores (6 cores minimum) @ 3.0 GHz Intel
  • 512GB RAM (32GB minimum)
  • Linux - RedHat 7 or derivative
  • Network: 10Gb, MTU 8192
  • Storage:
    • Root - sufficient local disk space
    • Database mount - 250GB SSD mirror local
    • Historical data - 16 NFS mounts @ 35TB each

Processes

  • db_query_server - manages query requests from the client and forks db_query_worker processes to do the actual work of a query

Adding a dedicated DIS for Centrally Managed User Tables

One option Deephaven provides for user tables is intraday user tables written by a Data Import Server (Centrally Managed User Tables in Legacy, or Live Partitioned User Tables in Core+). While historical user tables are written directly to disk by the worker, DIS-written user tables are written by the worker passing binary log rows to the Log Aggregation Server (LAS). The LAS then writes them to disk, where a tailer picks them up and streams them to a DIS. Besides processing the binary log rows into table rows on disk, the DIS also serves these updates to subscribers, which allows for ticking user tables and for multiple appends to user tables.

In environments where there is significant use of DIS-written user tables, it may be desirable to add a dedicated Data Import Server to avoid contention for DIS resources between system table log file processing and intraday user table operations. Note that this is only a concern for Centrally Managed User Tables (which use appendCentral and related commands) or Live Partitioned User Tables (which use appendLiveTable and related commands).

Adding the DIS process

The assumption is that the dedicated DIS server is some non-"infra" server in the Deephaven cluster -- most likely a query server with sufficient local storage for Centrally Managed User Tables. Like the regular system DIS, the dedicated server requires locally-attached high-speed storage for storing user table data; NFS should not be used for this.

The first step in setting up this dedicated server is to install or activate a DIS process on it. During a cluster installation or upgrade, this can be done by adding DH_NODE_<node_number>_ROLE_DIS=true to the cluster.cnf file for the node that will run the dedicated DIS. This activates a DIS on the selected server when the installer's master_install.sh is run.
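For example, if the dedicated DIS will run on node 4 of the cluster (the node number here is purely illustrative), the corresponding cluster.cnf entry would be:

    # cluster.cnf excerpt: grant node 4 the DIS role so master_install.sh activates a DIS there
    DH_NODE_4_ROLE_DIS=true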

If the dedicated DIS server is an already existing Deephaven server, activate the DIS process by:

  1. Renaming db_dis.disabled to db_dis.conf on the dedicated server, under /etc/sysconfig/deephaven/monit/.
  2. Running /usr/illumon/latest/bin/dh_monit reload.

/usr/illumon/latest/bin/dh_monit summary should now show a db_dis process.

Configuring the DIS process

  1. Edit the routing YAML to:
  • Change the dh-rta anchor to point to the new node:
- &dh-rta <FQDN of dedicated server for DIS-written user tables>
  • Add a filter for System tables to the default DIS.
  • Add an entry for the dedicated DIS, including a filter for User tables.

See Edit Routing Configuration for details on editing the routing configuration.

  dataImportServers:
    # The primary data import server
    db_dis:
      storage: default
      endpoint:
        host: *dh-import   # reference the address defined above for "dh-import"
        tailerPort: *default-tailerPort
        tableDataPort: *default-tableDataPort
      userIntradayDirectoryName: "IntradayUser"
      webServerParameters:
        enabled: true
        port: 8086
        authenticationRequired: false
        sslRequired: false
      properties:
        # re-add the default properties (omit the <<: if you don't want the defaults)
        <<: *DIS-defaultProperties
        ## any additional properties for this DIS
      filters:
        namespaceSet: System

    # The dedicated RTA data import server
    db_rta:
      storage: default
      endpoint:
        host: *dh-rta   # reference the address defined above for "dh-rta"
        tailerPort: *default-tailerPort # reference the port defined above for "default-tailerPort"
        tableDataPort: *default-tableDataPort # reference the port defined above for "default-tableDataPort"
      userIntradayDirectoryName: "IntradayUser"
      # handle Online User tables
      webServerParameters:
        enabled: true
        port: 8086
        authenticationRequired: false
        sslRequired: false
      properties:
        # re-add the default properties (omit the <<: if you don't want the defaults)
        <<: *DIS-defaultProperties
      filters:
        namespaceSet: User

Note the change of port for db_rta to 8086, since 8084 is now used by the db_query_server.

  2. On the dedicated server, edit illumon.iris.hostconfig (located in /etc/sysconfig/illumon.confs/) and add the EXTRA_ARGS details below. (This is so the dedicated DIS still gets config properties that apply to DISes and routing for db_rta.)
    db_dis)
        EXTRA_ARGS="$EXTRA_ARGS -j -Dprocess.name=db_rta"
        ;;
  3. On the dedicated server, edit /etc/sysconfig/deephaven/monit/db_dis.conf and change the first line as below (so monit can find the right PID file):
check process db_dis with pidfile /etc/deephaven/run/db_rta.pid
  4. Run /usr/illumon/latest/bin/dh_monit reload to update monit's configuration.

  5. Run /usr/illumon/latest/bin/dh_monit restart all on all servers.
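After the restart, a quick check on the dedicated server (a hedged sketch; output formatting varies by version) is to confirm that the DIS process is running under its new name:

    /usr/illumon/latest/bin/dh_monit summary    # the db_dis entry should show as running
    ls -l /etc/deephaven/run/db_rta.pid         # PID file is written under the db_rta process name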