Develop an R Client Query

Welcome to the Client APIs section of the crash course! So far, you've learned how to use Deephaven from the server itself.

This section of the crash course covers Deephaven Enterprise's client APIs. This guide discusses the R client, which allows you to:

  • Create new workers
  • Interact with tables and objects on the server
  • Connect to existing Persistent Queries (PQs)
  • Create PQs
  • Run queries server-side
  • And more!

The Enterprise R client is built on top of the Community R client, giving you access to its rich feature set.

Installation

Download the R client artifact

Note

To follow these steps, you will need a Deephaven JFrog login. If you do not have a login, contact someone in your organization who can perform the download for you.

First, download the R client artifact for your version of Deephaven Enterprise from jfrog.io. Navigate to:

https://illumon.jfrog.io/ui/repos/tree/General/libs-customer/io/deephaven/enterprise/dhe-r-src/<dh_version>/dhe-r-src-<dh_version>.tgz

dh_version is the Deephaven Enterprise version you are running. For example, with Deephaven Enterprise version 1.20240517.298, you will navigate to https://illumon.jfrog.io/ui/repos/tree/General/libs-customer/io/deephaven/enterprise/dhe-r-src/1.20240517.298/dhe-r-src-1.20240517.298.tgz to download the artifact.
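
The substitution can also be done programmatically. The following is a minimal sketch in base R; `artifact_url` is a hypothetical helper name, not part of the Deephaven tooling, and it only assembles the URL pattern shown above.

```r
# Hypothetical helper: fill the <dh_version> placeholder in the artifact URL pattern above.
artifact_url <- function(dh_version) {
  base <- "https://illumon.jfrog.io/ui/repos/tree/General/libs-customer/io/deephaven/enterprise/dhe-r-src"
  sprintf("%s/%s/dhe-r-src-%s.tgz", base, dh_version, dh_version)
}

# Example with the version used in this guide.
url <- artifact_url("1.20240517.298")
print(url)
```

The download itself still requires a JFrog login, so this sketch only helps you build the address to open in a browser.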

After navigating to the site, look in the upper right corner for the download icon. It can be easy to miss.

Build the R Client

The first step in building the client is unpacking the source code:

mkdir build_r_client
cd build_r_client
cp /path/to/dhe-r-src-1.20240517.298.tgz .
tar -zxvf dhe-r-src-1.20240517.298.tgz

Next, follow the detailed instructions for building your version at ./coreplus/R/rdnd/README.md.

Create a session

The first and most important step when using the R client is to create a session. A session is a connection to the Deephaven server that allows you to authenticate with the server, create tables, PQs, and more. A SessionManager is an R6 object that creates a session.

The SessionManager constructor takes two arguments:

  1. A descriptive name that will be used on the server and client sides to log information about operations performed by this client.
  2. A URL pointing to a file in JSON format containing information about connectivity parameters for the Deephaven installation. A Deephaven Enterprise installation provides such a file under iris/connection.json.

The descriptive name can, in principle, be any string; however, it is important to select a value that allows the client to be distinguished in client logs. One way to do this is to include the process ID where the client is created. Below, this is done with Sys.getpid().

Note

The client will append information about the host where this client is running, so there is no need to include host information in descriptive_name.

library("rdnd")
library("rdeephaven")

# Port 8000 is the port for Deephaven servers with Envoy, but 8123 is the default for those without.
connection_json <- "https://host.mydomain.com:8000/iris/connection.json"

# sm is an R6 SessionManager object.
sm <- SessionManager$new(
        descriptive_name=paste0("R session manager pid=", Sys.getpid()),
        source=connection_json)

Tip

To see all of the methods on SessionManager, run ?SessionManager.
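
If you connect to several installations, the port choice mentioned in the example above can be captured in a small helper. This is only a sketch: `connection_json_url` is a hypothetical function, and it assumes the ports named in the comment above (8000 with Envoy, 8123 without) apply to your installation.

```r
# Hypothetical helper: build the connection.json URL for a Deephaven host.
# Assumes port 8000 for installations with Envoy and 8123 for those without.
connection_json_url <- function(host, envoy = TRUE) {
  port <- if (envoy) 8000L else 8123L
  sprintf("https://%s:%d/iris/connection.json", host, port)
}

connection_json <- connection_json_url("host.mydomain.com")
print(connection_json)
```

The resulting string can be passed as the source argument of SessionManager$new, as in the example above.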

Authenticate

There are two ways to authenticate with the server.

First, you can use your Deephaven username and password.

user <- "username"
password <- "password"

auth_result <- sm$password_authentication(user = user, password = password, operate_as = user)

Note

operate_as is only relevant for system administrators. It allows the administrator to operate as another user. Under normal circumstances user and operate_as are provided the same value.
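
Hard-coding a password in a script is rarely desirable. One possible approach, sketched below, reads the credentials from environment variables. The variable names DH_USER and DH_PASSWORD and the helper `read_credentials` are assumptions made for illustration; they are not Deephaven conventions.

```r
# Hypothetical helper: read credentials from environment variables.
# DH_USER and DH_PASSWORD are illustrative names, not Deephaven conventions.
read_credentials <- function() {
  user <- Sys.getenv("DH_USER", unset = NA)
  password <- Sys.getenv("DH_PASSWORD", unset = NA)
  if (is.na(user) || is.na(password)) {
    stop("Set DH_USER and DH_PASSWORD before authenticating.")
  }
  list(user = user, password = password)
}

# With the SessionManager `sm` created earlier, usage might look like:
# creds <- read_credentials()
# auth_result <- sm$password_authentication(
#   user = creds$user, password = creds$password, operate_as = creds$user)
```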

Second, you can use a private key stored in a file.

private_key_filename <- "/path/to/private_key_file"  # placeholder; use the path to your key file
auth_result <- sm$private_key_authentication(private_key_filename)

Now that you're logged in, you can start doing things with Deephaven!

Use the client

Create a new worker

A common use case for the client is to create a new worker and run queries on it. The following code creates a new temporary worker with 4GB of memory and connects to it:

new_pq_name <- "my_new_pq"

# pq_config_builder is an R6 PqConfigBuilder object
pq_config_builder <- sm$make_temp_pq_config_builder(new_pq_name)
pq_config_builder$set_max_heap_size_gb(4)

session <- sm$add_query_and_connect(pq_config_builder$build())

Tip

  • To see all of the options for configuring a persistent query, run ?PqConfigBuilder.
  • To see all of the methods on session, run ls(session).

Run queries on the worker

Tables in the R client are handles that reference tables that live on the server. As such, the following operations happen on the server:

  • Load and create tables.
  • Perform calculations on tables.

In the following example, t_static and t_ticking are merely references, so the tables and operations happen server-side.

t_static <- session$historical_table("LearnDeephaven", "StockTrades")

t_ticking <- session$
   live_table("DbInternal", "ProcessEventLog")$
   tail(100)$
   reverse()

print(paste("Static table:", t_static))
print(paste("Ticking table:", t_ticking))

These tables can be queried, and the resultant table can be pulled into R as a dataframe.

t_query <- t_static$where(c("Date = `2017-08-25`"))$view(c("USym", "Dollars = Last * Size"))$avg_by("USym")

df <- t_query$as_data_frame()
print(df)

Output:

   USym   Dollars
1  AAPL  33786.52
2   AXP  13724.41
3   BAC  12419.70
4  CSCO   8825.76
5  GOOG  48514.16
6   IBM  17168.40
7  INTC  10852.20
8  MSFT  13295.24
9   PFE  12523.75
10  XOM  15053.68

Warning

R dataframes do not support ticking data. as_data_frame must be called every time you want a new snapshot of a Deephaven server table in R.
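
Because each snapshot is a separate call, periodic refreshes have to be driven from the R side. The sketch below shows one way to do that with a generic polling helper. `poll_snapshots` is a hypothetical function, and a stand-in snapshot function replaces the live table so that the example runs without a server.

```r
# Hypothetical helper: call `take_snapshot` n times, pausing between calls,
# and return the snapshots as a list.
poll_snapshots <- function(take_snapshot, n = 3, interval_sec = 1) {
  snapshots <- vector("list", n)
  for (i in seq_len(n)) {
    snapshots[[i]] <- take_snapshot()
    if (i < n) Sys.sleep(interval_sec)
  }
  snapshots
}

# Stand-in for a ticking table: each call returns a data frame with one more row count.
counter <- 0
fake_snapshot <- function() {
  counter <<- counter + 1
  data.frame(Rows = counter)
}

result <- poll_snapshots(fake_snapshot, n = 3, interval_sec = 0)
print(result[[3]])

# With a live session, the snapshot function could instead be:
# poll_snapshots(function() t_ticking$as_data_frame(), n = 3, interval_sec = 5)
```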

The client session can also run queries on the server:

my_query <- paste(
  "from deephaven import time_table",
  "from deephaven import empty_table",
  "static_table = db.historical_table(namespace='LearnDeephaven', table_name='StockTrades')",
  "ticking_table = db.live_table(namespace='DbInternal', table_name='ProcessEventLog').tail(100).reverse()",
  sep = "\n"
)

session$run_script(my_query)

Warning

The code provided to run_script must match the script language configured for the PQ at creation. To set the script language for a new PQ, use the set_script_language method of PqConfigBuilder. This method accepts a string argument of either "groovy" or "python".

The client session can get a handle to a server table by using the table name:

t_by_name <- session$open_table("ticking_table")
print(paste("Ticking table has", t_by_name$size, "rows."))

Tip

To see all of the options for queries on a table, run ?rdeephaven::TableHandle.

Create a new table on the server

You may have an R dataframe that you want to use in a Deephaven query.
To do this, you will create a table on the Deephaven server from the local R dataframe.

Let's start by creating a dataframe:

   rows <- 10

   df_local <- data.frame(
     # A time column
     T = seq.POSIXt(as.POSIXct(Sys.Date()), as.POSIXct(Sys.Date() + 30), by = "1 sec")[rows],
     # A boolean (logical) column
     B = sample(c(TRUE, FALSE), rows, TRUE),
     # An int (integer) column
     I1 = sample(1:rows),
     # Another int (integer) column
     I2 = sample(1:rows)
   )

   print(df_local)

Output:

                        T     B I1 I2
   1  2023-09-23 00:00:09 FALSE  7  7
   2  2023-09-23 00:00:09 FALSE  8  4
   3  2023-09-23 00:00:09  TRUE  6  1
   4  2023-09-23 00:00:09 FALSE  9  6
   5  2023-09-23 00:00:09  TRUE  5  8
   6  2023-09-23 00:00:09  TRUE 10  2
   7  2023-09-23 00:00:09 FALSE  2 10
   8  2023-09-23 00:00:09 FALSE  1  3
   9  2023-09-23 00:00:09  TRUE  4  9
   10 2023-09-23 00:00:09  TRUE  3  5
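
Every row in the output above shows the same timestamp because `[rows]` selects only the tenth element of the sequence, and data.frame recycles that single value across all rows. If distinct timestamps are wanted, indexing with `1:rows` is one option. The base R sketch below illustrates the difference; it uses a fixed start time, chosen to match the sample output, so that it is reproducible.

```r
rows <- 10
start <- as.POSIXct("2023-09-23 00:00:00", tz = "UTC")
ticks <- seq(start, by = "1 sec", length.out = 30)

single <- ticks[rows]       # one value: the 10th element of the sequence
first_n <- ticks[1:rows]    # the first ten elements, all distinct

# data.frame recycles the length-one vector to match the other column.
df_recycled <- data.frame(T = single, I1 = 1:rows)
df_distinct <- data.frame(T = first_n, I1 = 1:rows)

print(length(unique(df_recycled$T)))
print(length(unique(df_distinct$T)))
```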

Now the dataframe can be used to create a table on the Deephaven server.

t_new <- session$import_table(df_local)

t_new is a table handle that references the new table on the server and can be used in queries.

   t_query_2 <- t_new$update("S = I1 + I2")$group_by("S")
   df2 <- t_query_2$as_data_frame()
   print(df2)

Output:

      S                                  T                  B       I1       I2
   1 14                         1695427209              FALSE        7        7
   2 12 1695427209, 1695427209, 1695427209 FALSE, TRUE, FALSE 8, 10, 2 4, 2, 10
   3  7                         1695427209               TRUE        6        1
   4 15                         1695427209              FALSE        9        6
   5 13             1695427209, 1695427209         TRUE, TRUE     5, 4     8, 9
   6  4                         1695427209              FALSE        1        3
   7  8                         1695427209               TRUE        3        5
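
In this output, the grouped T values appear as plain numbers rather than formatted timestamps. They look like seconds since the Unix epoch. Under that assumption, and assuming UTC, a value can be converted back with base R as sketched here:

```r
# Assumption: the grouped T values are seconds since 1970-01-01 in UTC.
epoch_seconds <- 1695427209
ts <- as.POSIXct(epoch_seconds, origin = "1970-01-01", tz = "UTC")
formatted <- format(ts, "%Y-%m-%d %H:%M:%S")
print(formatted)
```

The result, 2023-09-23 00:00:09, matches the T column of df_local shown earlier, which supports the epoch-seconds reading for this sample.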

Connect to an existing PQ

Instead of creating a new temporary worker, you can also connect to an existing PQ.
In the Create a new worker section, you created a new worker. Let's create a new client session to connect to that worker and get a table created by the script executed on the worker:

pq_session <- sm$connect_to_pq_by_name(new_pq_name)

my_ticking_table <- pq_session$open_table("ticking_table")
print(paste("Ticking table has", my_ticking_table$size, "rows."))

Alternatively, you can connect to an existing PQ via its serial number:

pq_serial <- "9876543210"
client2 <- sm$connect_to_pq_by_serial(pq_serial)

Query the Deephaven Controller

The Deephaven controller manages all PQs. Using the R client, you can query the controller to learn about all PQs on the system.

Let's get a list of all PQs running on the system. To do this, you must first subscribe to controller updates and then poll the subscription for the controller's current state.

  # Create a new subscription to the controller and store it in the `controller_subscription` variable
  controller_subscription <- sm$subscribe()
  # Poll the controller's current state. The output is TRUE for a successful poll,
  # and the controller state is stored in the `controller_subscription` object.
  controller_subscription$poll()

Output:

[1] TRUE

Once the controller state has been updated, it can be retrieved as a map:

  controller_state <- controller_subscription$snapshot_map()
  print(controller_state)

Output:

[[1]]
NULL

[[2]]
NULL

[[3]]
NULL

[[4]]
NULL

[[5]]
NULL

[[6]]
NULL

$`1725469718910000000`
C++ object <0x55efce6b1ee0> of class 'INTERNAL_PqInfoView' <0x55efcc144d00>

$`1725469718910000001`
C++ object <0x55efce6f1020> of class 'INTERNAL_PqInfoView' <0x55efcc144d00>

$`1725469718910000002`
C++ object <0x55efce6f1040> of class 'INTERNAL_PqInfoView' <0x55efcc144d00>

$`1725469718910000003`
C++ object <0x55efce6f1060> of class 'INTERNAL_PqInfoView' <0x55efcc144d00>

$`1725469718910000004`
C++ object <0x55efce6f11a0> of class 'INTERNAL_PqInfoView' <0x55efcc144d00>

$`1725469718910000029`
C++ object <0x55efce6f12e0> of class 'INTERNAL_PqInfoView' <0x55efcc144d00>

controller_state maps each PQ serial number, as a string, to an INTERNAL_PqInfoView object containing information about that PQ.
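
The printed map above also contains unnamed NULL entries. To work only with the named PQ entries, the NULL values can be filtered out with base R. The sketch below uses a stand-in list in place of the real controller state, with character strings standing in for the INTERNAL_PqInfoView objects, so it runs without a server.

```r
# Stand-in for controller_state: two unnamed NULL entries followed by
# two entries keyed by PQ serial number.
controller_state_mock <- c(
  list(NULL, NULL),
  list(`1725469718910000000` = "pq-info-a", `1725469718910000029` = "pq-info-b")
)

# Keep only the non-NULL entries, then list their serial numbers.
pq_entries <- Filter(Negate(is.null), controller_state_mock)
pq_serials <- names(pq_entries)
print(pq_serials)
```

The same two lines could be applied to the real controller_state, assuming it behaves like an ordinary R list as its printed form suggests.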

Run the following to get the current running status for a PQ by serial number:

   pq_status <- controller_state[['1725469718910000029']]$state()$status()
   print(pq_status)
   print(pq_status == PqStatusEnumValues$PQS_RUNNING)

Output:

[1] 6
[1] TRUE

Note

INTERNAL_PqInfoView mirrors PersistentQueryInfoMessage in the Core+ gRPC API. See the Core+ gRPC PersistentQueryInfoMessage documentation for details on what data is available.

Built-in R documentation

You can get more information about the Deephaven R client by consulting the built-in R documentation.

In R, you can use the ? operator to access documentation for any function or package. To do this, simply type ? followed by the name of the function or package you want to learn about.

For example, to get documentation on the mean function, type ?mean in the R console. This command will open the help page for the mean function, providing detailed information on its usage, arguments, and examples.

Similarly, to access documentation for a specific package, type ?package_name.

This method is a quick and efficient way to access R's extensive built-in documentation and learn more about the functions and packages you are using.

Try this:

library("rdnd")
library("rdeephaven")

?rdnd
?rdeephaven
?SessionManager
?PqConfigBuilder
?DndClient
?Client
?Table

Additionally, you can use the ls function to list all of the methods and fields on an object:

ls(sm)
ls(session)
ls(t_static)

Cleanup

Once you are done with a client session or session manager, call the respective close method. This will ensure that all network connections are closed and all server-side resources associated with the object are released.

  session$close()
  pq_session$close()
  sm$close()

Caution

Once the close method is called on a client session or session manager, any subsequent operation results in an error.
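
To make sure close is called even when a query fails partway through, the work can be wrapped in a function that registers the cleanup with on.exit. The following is a sketch only: `with_closing` is a hypothetical helper, and a stand-in object with a close method replaces a real session so that the example runs without a server.

```r
# Hypothetical helper: run `work` on a resource and always close it afterwards.
with_closing <- function(resource, work) {
  on.exit(resource$close(), add = TRUE)
  work(resource)
}

# Stand-in for a client session: an environment that records whether it was closed.
make_fake_session <- function() {
  self <- new.env()
  self$closed <- FALSE
  self$close <- function() self$closed <- TRUE
  self
}

fake <- make_fake_session()
result <- tryCatch(
  with_closing(fake, function(s) stop("query failed")),
  error = function(e) "error handled"
)
print(fake$closed)
```

With a real session, the call might look like with_closing(session, function(s) s$open_table("ticking_table")$as_data_frame()), so the data frame is returned and the session is closed in one step.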