Develop an R Client Query
Welcome to the Client APIs section of the crash course! So far, you've learned how to use Deephaven from the server itself.
This section of the crash course covers Deephaven Enterprise's client APIs. This guide discusses the R client, which allows you to:
- Create new workers
- Interact with tables and objects on the server
- Connect to existing Persistent Queries (PQs)
- Create PQs
- Run queries server-side
- And more!
The Enterprise R client is built on top of the Community R client, giving you access to its rich feature set.
Installation
Download the R client artifact
Note
To follow these steps, you will need a Deephaven JFrog login. If you do not have a login, contact someone in your organization who can perform the download for you.
First, download the R client artifact for your version of Deephaven Enterprise from jfrog.io. Navigate to:
https://illumon.jfrog.io/ui/repos/tree/General/libs-customer/io/deephaven/enterprise/dhe-r-src/<dh_version>/dhe-r-src-<dh_version>.tgz
Here, <dh_version> is the Deephaven Enterprise version you are running. For example, with Deephaven Enterprise version 1.20240517.298, navigate to https://illumon.jfrog.io/ui/repos/tree/General/libs-customer/io/deephaven/enterprise/dhe-r-src/1.20240517.298/dhe-r-src-1.20240517.298.tgz to download the artifact.
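The URL pattern above can be filled in programmatically. The following is a minimal sketch in base R; artifact_url is a hypothetical helper written for this guide, not part of any Deephaven package.

```r
# Build the artifact URL for a given Deephaven Enterprise version.
# `artifact_url` is a hypothetical helper, not a Deephaven function.
artifact_url <- function(dh_version) {
  base <- paste0(
    "https://illumon.jfrog.io/ui/repos/tree/General/libs-customer/",
    "io/deephaven/enterprise/dhe-r-src"
  )
  # The version appears twice: once as a directory and once in the file name.
  sprintf("%s/%s/dhe-r-src-%s.tgz", base, dh_version, dh_version)
}

print(artifact_url("1.20240517.298"))
```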
After navigating to the site, look in the upper right corner for the download icon, which can be easy to miss.
Build the R Client
The first step in building the client is unpacking the source code:
mkdir build_r_client
cd build_r_client
cp /path/to/dhe-r-src-1.20240517.298.tgz .
tar -zxvf dhe-r-src-1.20240517.298.tgz
Next, follow the detailed instructions for building your version at ./coreplus/R/rdnd/README.md.
Create a session
The first and most important step when using the R client is to create a session. A session is a connection to the Deephaven server that allows you to authenticate with the server, create tables, PQs, and more. A SessionManager is an R6 object that creates a session.
The SessionManager constructor takes two arguments:
- A descriptive name that will be used on the server and client sides to log information about operations performed by this client.
- A URL pointing to a file in JSON format containing information about connectivity parameters for the Deephaven installation. A Deephaven Enterprise installation provides such a file under iris/connection.json.
The descriptive name can, in principle, be any string; however, it is important to select a value that allows the client to be distinguished in client logs. One way to do this is to include the ID of the process where the client is created. Below, this is done with Sys.getpid().
Note
The client will append information about the host where this client is running, so there is no need to include host information in descriptive_name.
library("rdnd")
library("rdeephaven")
# Port 8000 is the port for Deephaven servers with Envoy, but 8123 is the default for those without.
connection_json <- "https://host.mydomain.com:8000/iris/connection.json"
# sm is an R6 SessionManager object.
sm <- SessionManager$new(
  descriptive_name = paste0("R session manager pid=", Sys.getpid()),
  source = connection_json
)
Tip
To see all of the methods on SessionManager, run ?SessionManager.
Authenticate
There are two ways to authenticate with the server.
First, you can use your Deephaven username and password.
user <- "username"
password <- "password"
auth_result <- sm$password_authentication(user = user, password = password, operate_as = user)
Note
operate_as is only relevant for system administrators. It allows the administrator to operate as another user.
Under normal circumstances, user and operate_as are given the same value.
Second, you can use a private key stored in a file.
private_key_filename <- "/path/to/private-key-file"  # placeholder; replace with your key file
auth_result <- sm$private_key_authentication(private_key_filename)
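To keep credentials out of scripts, the password flow above can read them from the environment. This is a minimal sketch: authenticate_from_env and the variable names DH_USER and DH_PASSWORD are assumptions made for illustration, not Deephaven conventions. It only relies on the password_authentication call shown above.

```r
# Hypothetical helper: read credentials from environment variables and
# authenticate with an existing SessionManager `sm`.
# DH_USER and DH_PASSWORD are assumed names, not Deephaven conventions.
authenticate_from_env <- function(sm) {
  user <- Sys.getenv("DH_USER", unset = NA)
  password <- Sys.getenv("DH_PASSWORD", unset = NA)
  if (is.na(user) || is.na(password)) {
    stop("Set DH_USER and DH_PASSWORD before authenticating.")
  }
  # Under normal circumstances, user and operate_as are the same.
  sm$password_authentication(user = user, password = password, operate_as = user)
}

# Usage, given a SessionManager `sm` created as shown earlier:
# auth_result <- authenticate_from_env(sm)
```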
Now that you're logged in, you can start doing things with Deephaven!
Use the client
Create a new worker
A common use case for the client is to create a new worker and run queries on it. The following code creates a new temporary worker with 4GB of memory and connects to it:
new_pq_name <- "my_new_pq"
# pq_config_builder is an R6 PqConfigBuilder object
pq_config_builder <- sm$make_temp_pq_config_builder(new_pq_name)
pq_config_builder$set_max_heap_size_gb(4)
session <- sm$add_query_and_connect(pq_config_builder$build())
Tip
- To see all of the options for configuring a persistent query, run ?PqConfigBuilder.
- To see all of the methods on session, run ls(session).
Run queries on the worker
Tables in the R client are handles that reference tables that live on the server. As such, the following operations happen on the server:
- Load and create tables.
- Perform calculations on tables.
In the following example, t_static and t_ticking are merely references, so the tables and operations happen server-side.
t_static <- session$historical_table("LearnDeephaven", "StockTrades")
t_ticking <- session$
  live_table("DbInternal", "ProcessEventLog")$
  tail(100)$
  reverse()
print(paste("Static table:", t_static))
print(paste("Ticking table:", t_ticking))
These tables can be queried, and the resultant table can be pulled into R as a dataframe.
t_query <- t_static$where(c("Date = `2017-08-25`"))$view(c("USym", "Dollars = Last * Size"))$avg_by("USym")
df <- t_query$as_data_frame()
print(df)
Output:
USym Dollars
1 AAPL 33786.52
2 AXP 13724.41
3 BAC 12419.70
4 CSCO 8825.76
5 GOOG 48514.16
6 IBM 17168.40
7 INTC 10852.20
8 MSFT 13295.24
9 PFE 12523.75
10 XOM 15053.68
Warning
R dataframes do not support ticking data. as_data_frame must be called every time you want a new snapshot of a Deephaven server table in R.
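Because each call to as_data_frame returns a fixed snapshot, following a ticking table from R means calling it repeatedly. The sketch below shows one way to do that; take_snapshots is a hypothetical helper, and the only client call it assumes is the as_data_frame method used above.

```r
# Hypothetical helper: take `n` snapshots of a table handle, waiting
# `interval_sec` seconds between them. `table_handle` is any object with an
# as_data_frame() method, such as t_ticking.
take_snapshots <- function(table_handle, n = 3, interval_sec = 1) {
  snapshots <- vector("list", n)
  for (i in seq_len(n)) {
    # Each call pulls the table's current contents into a new R dataframe.
    snapshots[[i]] <- table_handle$as_data_frame()
    if (i < n) Sys.sleep(interval_sec)
  }
  snapshots
}

# Usage, given the t_ticking handle from above:
# snapshots <- take_snapshots(t_ticking, n = 5, interval_sec = 2)
```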
The client session can also run queries on the server:
my_query <- paste(
  "from deephaven import time_table",
  "from deephaven import empty_table",
  "static_table = db.historical_table(namespace='LearnDeephaven', table_name='StockTrades')",
  "ticking_table = db.live_table(namespace='DbInternal', table_name='ProcessEventLog').tail(100).reverse()",
  sep = "\n"
)
session$run_script(my_query)
Warning
The code provided to run_script
must match the script language configured for the PQ at creation. To set the script language for a new PQ, use the set_script_language
method of PqConfigBuilder
. This method accepts a string argument of either "groovy"
or "python"
.
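As a minimal sketch of the point above, the script language can be checked on the client before it is passed to the builder. set_language_checked is a hypothetical wrapper; the only Deephaven call it assumes is the set_script_language method described above.

```r
# Hypothetical wrapper: validate the language before configuring the builder.
# `builder` is a PqConfigBuilder, e.g. from sm$make_temp_pq_config_builder().
set_language_checked <- function(builder, language) {
  allowed <- c("groovy", "python")
  if (!is.character(language) || length(language) != 1 || !language %in% allowed) {
    stop("language must be one of: ", paste(allowed, collapse = ", "))
  }
  builder$set_script_language(language)
  invisible(builder)
}

# Usage, before building the PQ config:
# set_language_checked(pq_config_builder, "python")
```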
The client session can get a handle to a server table by using the table name:
t_by_name <- session$open_table("ticking_table")
print(paste("Ticking table has", t_by_name$size, "rows."))
Tip
To see all of the options for queries on a table, run ?rdeephaven::TableHandle.
Create a new table on the server
You may have an R dataframe that you want to use in a Deephaven query.
To do this, you will create a table on the Deephaven server from the local R dataframe.
Let's start by creating a dataframe:
rows <- 10
df_local <- data.frame(
  # A time column; `[rows]` selects a single timestamp, which data.frame() recycles across all rows
  T = seq.POSIXt(as.POSIXct(Sys.Date()), as.POSIXct(Sys.Date() + 30), by = "1 sec")[rows],
  # A boolean (logical) column
  B = sample(c(TRUE, FALSE), rows, TRUE),
  # An int (integer) column
  I1 = sample(1:rows),
  # Another int (integer) column
  I2 = sample(1:rows)
)
print(df_local)
Output:
T B I1 I2
1 2023-09-23 00:00:09 FALSE 7 7
2 2023-09-23 00:00:09 FALSE 8 4
3 2023-09-23 00:00:09 TRUE 6 1
4 2023-09-23 00:00:09 FALSE 9 6
5 2023-09-23 00:00:09 TRUE 5 8
6 2023-09-23 00:00:09 TRUE 10 2
7 2023-09-23 00:00:09 FALSE 2 10
8 2023-09-23 00:00:09 FALSE 1 3
9 2023-09-23 00:00:09 TRUE 4 9
10 2023-09-23 00:00:09 TRUE 3 5
Now the dataframe can be used to create a table on the Deephaven server.
t_new = session$import_table(df_local)
t_new is a table handle that references the new table on the server and can be used in queries.
t_query_2 = t_new$update("S = I1 + I2")$group_by("S")
df2 = t_query_2$as_data_frame()
print(df2)
Output:
S T B I1 I2
1 14 1695427209 FALSE 7 7
2 12 1695427209, 1695427209, 1695427209 FALSE, TRUE, FALSE 8, 10, 2 4, 2, 10
3 7 1695427209 TRUE 6 1
4 15 1695427209 FALSE 9 6
5 13 1695427209, 1695427209 TRUE, TRUE 5, 4 8, 9
6 4 1695427209 FALSE 1 3
7 8 1695427209 TRUE 3 5
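To make the grouped output easier to read, here is a local base R sketch of the same idea: add a sum column, then collect the other columns into one vector per group. It uses a small made-up dataframe (df_example) and runs entirely in R, so it only approximates the server-side update and group_by. In particular, split orders groups by sorted key, which differs from the order shown in the output above.

```r
# A small, made-up dataframe for illustration only.
df_example <- data.frame(I1 = c(7, 8, 10, 2), I2 = c(7, 4, 2, 10))

# Local analogue of update("S = I1 + I2").
df_example$S <- df_example$I1 + df_example$I2

# Local analogue of group_by("S"): one entry per distinct S, where each
# remaining column becomes a vector of that group's values.
grouped <- lapply(split(df_example[c("I1", "I2")], df_example$S), as.list)

# The group with S == 12 holds three rows' worth of values.
print(grouped[["12"]])
```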
Connect to an existing PQ
Instead of creating a new temporary worker, you can also connect to an existing PQ.
In the Create a new worker section, you created a new worker. Let's create a new client session to connect to that worker and get a table created by the script executed on the worker:
pq_session <- sm$connect_to_pq_by_name(new_pq_name)
my_ticking_table <- pq_session$open_table("ticking_table")
print(paste("Ticking table has", my_ticking_table$size, "rows."))
Alternatively, you can connect to an existing PQ via its serial number:
pq_serial <- "9876543210"
client2 <- sm$connect_to_pq_by_serial(pq_serial)
Query the Deephaven Controller
The Deephaven controller manages all PQs. Using the R client, you can query the controller to learn about all PQs on the system.
Let's get a list of all PQs running on the system. To do this, you must first subscribe to controller updates and then poll the subscription for the controller's current state.
# Create a new subscription to the controller and store it in the `controller_subscription` variable
controller_subscription <- sm$subscribe()
# Poll the controller's current state; the output is TRUE for a successful poll.
# The controller state is stored in the `controller_subscription` object.
controller_subscription$poll()
Output:
[1] TRUE
Once the controller state has been updated, it can be retrieved as a map:
controller_state = controller_subscription$snapshot_map()
print(controller_state)
Output:
[[1]]
NULL
[[2]]
NULL
[[3]]
NULL
[[4]]
NULL
[[5]]
NULL
[[6]]
NULL
$`1725469718910000000`
C++ object <0x55efce6b1ee0> of class 'INTERNAL_PqInfoView' <0x55efcc144d00>
$`1725469718910000001`
C++ object <0x55efce6f1020> of class 'INTERNAL_PqInfoView' <0x55efcc144d00>
$`1725469718910000002`
C++ object <0x55efce6f1040> of class 'INTERNAL_PqInfoView' <0x55efcc144d00>
$`1725469718910000003`
C++ object <0x55efce6f1060> of class 'INTERNAL_PqInfoView' <0x55efcc144d00>
$`1725469718910000004`
C++ object <0x55efce6f11a0> of class 'INTERNAL_PqInfoView' <0x55efcc144d00>
$`1725469718910000029`
C++ object <0x55efce6f12e0> of class 'INTERNAL_PqInfoView' <0x55efcc144d00>
controller_state maps each PQ serial number, as a string, to an INTERNAL_PqInfoView object containing information about that PQ.
Run the following to get the current running status for a PQ by serial number:
pq_status = controller_state[['1725469718910000029']]$state()$status()
print(pq_status)
print(pq_status == PqStatusEnumValues$PQS_RUNNING)
Output:
[1] 6
[1] TRUE
Note
INTERNAL_PqInfoView mirrors PersistentQueryInfoMessage in the Core+ gRPC API. See the Core+ gRPC PersistentQueryInfoMessage documentation for details on what data is available.
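The single-PQ lookup above extends naturally to every PQ in the snapshot. The following sketch collects each status code into a named vector; pq_statuses is a hypothetical helper, and it assumes only the state() and status() accessors used above. It skips the NULL entries that appear in the printed map.

```r
# Hypothetical helper: collect the status code of every PQ in a snapshot map.
# NULL entries, like those in the printed output above, are skipped.
pq_statuses <- function(controller_state) {
  entries <- Filter(Negate(is.null), controller_state)
  # The result is a named integer vector keyed by PQ serial number.
  vapply(entries, function(info) as.integer(info$state()$status()), integer(1))
}

# Usage, given the controller_state map from above:
# statuses <- pq_statuses(controller_state)
# running_serials <- names(statuses)[statuses == PqStatusEnumValues$PQS_RUNNING]
```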
Built-in R documentation
You can get more information about the Deephaven R client by consulting the built-in R documentation.
In R, the ? operator opens the documentation for a function or other help topic. Type ? followed by the name of the topic you want to learn about.
For example, typing ?mean in the R console opens the help page for the mean function, which describes its usage, arguments, and examples.
Many packages also provide a package-level help page, which you can open with ?package_name.
Try this:
library("rdnd")
library("rdeephaven")
?rdnd
?rdeephaven
?SessionManager
?PqConfigBuilder
?DndClient
?Client
?Table
Additionally, you can use the ls function to list all of the methods and fields on an object:
ls(sm)
ls(session)
ls(t_static)
Cleanup
Once you are done with a client session or session manager, call its close method. This ensures that all network connections are closed and all server-side resources associated with the object are released.
session$close()
pq_session$close()
sm$close()
Caution
Once the close method is called on a client session or session manager, any subsequent operation on that object results in an error.
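To make sure close is called even when a query fails partway through, the work can be wrapped in a function that registers the cleanup with on.exit. This is a minimal sketch; with_session is a hypothetical helper, and it assumes only the close method described above.

```r
# Hypothetical helper: run `fn(session)` and always close the session
# afterwards, whether `fn` returns normally or raises an error.
with_session <- function(session, fn) {
  on.exit(session$close(), add = TRUE)
  fn(session)
}

# Usage, given a client session created as shown earlier:
# df <- with_session(session, function(s) {
#   s$historical_table("LearnDeephaven", "StockTrades")$as_data_frame()
# })
```

Because the session is closed when with_session returns, pull any data you need into R inside fn rather than returning a table handle.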