R integration
R is an open-source programming language and software environment that is commonly used for statistical analysis, graphical representation and reporting.
Before you can integrate R with Deephaven, install and run the Deephaven Launcher, create the Deephaven instance and workspace required by your enterprise, and then connect to the instance. Connecting to the instance causes the client to download the resource files from the server that the R integration requires.
Note
Expand the R setup notebook
Integrating R with Deephaven Example Notebook
This notebook demonstrates how to use Deephaven's R integration.
When you save the notebook, an HTML file containing the code and output will be saved alongside the notebook.
Click the Preview button to view that HTML file. Preview does not run any R code chunks; it displays the output of each chunk from the last time it was run in the editor.
Configuration
Configure variables needed to connect to the system.
- home: Your home directory
- system: Deephaven system to connect to (as configured in the launcher)
- keyfile: Key file used to authenticate when connecting to the Deephaven system
- workerHeapGB: Gigabytes of heap for the Deephaven query worker
- jvmHeapGB: Gigabytes of heap for the local Java Virtual Machine (JVM)
- workerHost: Host to run the Deephaven query worker on
home <- "/Users/userName"
system <- "dh-demo"
keyfile <- sprintf("%s/.priv.dh-demo.base64.txt",home)
workerHeapGB <- 4
jvmHeapGB <- 2
workerHost <- "dh-demo-query4.int.illumon.com"
Connect
Connect to the Deephaven system. The connection process creates a query worker on the Deephaven system for this session. All future queries for the session are executed in this worker.
To determine the proper value for JAVA_HOME, run R CMD javareconf from the command line.
# Run the following to get java details: R CMD javareconf
Sys.setenv(JAVA_HOME = '/Library/Java/JavaVirtualMachines/jdk1.8.0_121.jdk/Contents/Home/')
source(sprintf("%s/iris/.programfiles/%s/integrations/r/irisdb.R",home,system))
idb.init(devroot = sprintf("%s/iris/.programfiles/%s/",home,system),
         workspace = sprintf("%s/iris/workspaces/%s/workspaces/r/",home,system),
         propfile = "iris-common.prop",
         userHome = home,
         keyfile = keyfile,
         librarypath = sprintf("%s/iris/.programfiles/%s/java_lib",home,system),
         log4jconffile = NULL,
         workerHeapGB = workerHeapGB,
         jvmHeapGB = jvmHeapGB,
         workerHost = workerHost,
         verbose = FALSE,
         jvmArgs = c("-Dservice.name=iris_console",sprintf("-Ddh.config.client.bootstrap=%s/iris/.programfiles/%s/dh-config/clients",home,system)),
         classpathAdditions = c(sprintf("%s/iris/.programfiles/%s/resources",home,system),sprintf("%s/iris/.programfiles/%s/java_lib",home,system)),
         jvmForceInit = FALSE)
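The call above repeats the same base directory in several sprintf expressions. As a sketch only, the paths could be derived once with a small helper. The name dh_paths is hypothetical and not part of irisdb.R, and the directory layout simply mirrors the example paths shown above; your installation may differ.

```r
# Hypothetical helper (not part of irisdb.R): derive the client paths used
# in the idb.init call from the home directory and the system name.
dh_paths <- function(home, system) {
  programfiles <- sprintf("%s/iris/.programfiles/%s", home, system)
  list(
    script      = sprintf("%s/integrations/r/irisdb.R", programfiles),
    devroot     = sprintf("%s/", programfiles),  # trailing slash, as in the example
    workspace   = sprintf("%s/iris/workspaces/%s/workspaces/r/", home, system),
    librarypath = sprintf("%s/java_lib", programfiles),
    resources   = sprintf("%s/resources", programfiles),
    bootstrap   = sprintf("%s/dh-config/clients", programfiles)
  )
}

paths <- dh_paths("/Users/userName", "dh-demo")
print(paths$devroot)
```

The resulting list elements could then be passed to source() and idb.init in place of the individual sprintf calls.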
Execute a Command on the Deephaven Query Worker
Compute x=1+2 on the Deephaven query worker, then pull the value back to the local R session.
idb.execute("x=1+2")
x <- idb.get("x")
print(x)
Pull a Deephaven Query Result to a Local R Dataframe
Compute the number of stock trades for each date from the LearnDeephaven/StockTrades table. The result is stored as t1 on the Deephaven query worker. t1 is copied from the server to the local R session.
idb.execute('t = db.t("LearnDeephaven","StockTrades"); t1=t.countBy("Count","Date")')
t1 <- idb.get.df("t1")
print(t1)
Push a Local R Dataframe to the Deephaven Query Worker
Create a new table t2 in the local R session and copy the table to the Deephaven query worker.
t2 <- t1[2:3,]
print(t2)
idb.push.df("t2",t2)
Use the Pushed R Dataframe in a Deephaven Query
Table t is the result of t = db.t("LearnDeephaven","StockTrades"), computed earlier in the session. Table t2 is the R dataframe pushed to the Deephaven query worker in the previous step. These tables are used to compute the total dollars traded per underlying security for the dates specified in t2. This result is then pulled to the local R session.
idb.execute('t3 = t.whereIn(t2,"Date").view("USym","Dollars=Last*Size").sumBy("USym").sortDescending("Dollars")')
t3 <- idb.get.df("t3")
print(t3)
Close the connection
To terminate the remote worker, call idb.close().
idb.close()
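If a script fails partway through, the remote worker may be left running. One possible pattern, shown here as a sketch, is to wrap the work in a function that always closes the connection on exit. The wrapper name with_dh_session is hypothetical; only idb.close comes from the integration.

```r
# Hypothetical wrapper: run `work` and always call `close_fn` afterwards,
# even if `work` raises an error. `close_fn` defaults to idb.close, which is
# only looked up when the wrapper exits.
with_dh_session <- function(work, close_fn = idb.close) {
  on.exit(close_fn(), add = TRUE)
  work()
}

# Demonstration with a stand-in close function, so the sketch runs without a server.
result <- with_dh_session(function() 1 + 2,
                          close_fn = function() message("closed"))
print(result)
```

In a real session, the work function would contain the idb.execute and idb.get calls, and the default close_fn would be used.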
Setup: R, rJava, and Auth Key
Windows setup
The following steps are required to install R and integrate R with Deephaven from a Windows machine.
Note
The steps below show the default paths used on Windows-based PCs. Your paths may differ depending on your Deephaven version, and the paths where you chose to install Deephaven components.
- If the machine does not already have Deephaven client components installed, configure the machine as a Deephaven client. This can be done by installing the Deephaven Launcher and running either the interactive Launcher or, for systems without a GUI or where scripted executions are needed, the Deephaven Updater.
- Add the JVM shared library path to the PATH environment variable.
- If you installed the version of the Deephaven launcher that included the JDK, the complete path should be similar to the following:
C:\Users\<userName>\AppData\Local\Illumon\jdk\jre\bin\server\
- If you installed the JDK separately, the complete path should be similar to the following:
C:\Program Files\Java\jdk_version\jre\bin\server
- If R is not installed, install R. See: https://cran.r-project.org/
- If RStudio is not installed, consider installing RStudio. RStudio is not required, but it does provide a nice IDE. See: https://www.rstudio.com/products/rstudio/download/
- Start an R console through the standard R installation or through RStudio.
- Install rJava by running one of the following in the R console:
- If you installed the version of the Deephaven Launcher that included the JDK, run the following in the R console:
Sys.setenv(JAVA_HOME="C:\\Users\\<userName>\\AppData\\Local\\Illumon\\jdk\\")
install.packages("rJava")
- If you installed the JDK separately and your system does not have a default value for JAVA_HOME, run the following in the R console:
Sys.setenv(JAVA_HOME="C:\\Program Files\\Java\\jdk_version\\")
install.packages("rJava")
- (Optionally) Test the rJava installation by running the following in the R console:
library(rJava)
.jinit() # this starts the JVM
s <- .jnew("java/lang/String", "Hello World!")
print(s)
Note
If you explicitly call .jinit(), you must restart the R session before integrating with Deephaven. The integration library takes care of JVM initialization for you.
- Set up Deephaven authorization keys on the Deephaven server by executing the following on the server from your home directory:
Important
Steps 8 and 9 must be run by someone with administrative access to the Deephaven server.
/usr/illumon/latest/bin/generate-iris-keys <irisUserName>
Back up the current dsakeys first; e.g.:
sudo cp /etc/sysconfig/illumon.d/resources/dsakeys.txt /etc/sysconfig/illumon.d/resources/dsakeys.txt.`date +"%Y-%m-%d"`
cat pub-<irisUserName>.base64.txt | sudo -u irisadmin tee -a /etc/sysconfig/deephaven/auth/dsakeys.txt
- Reload the auth server (to pick up the change to dsakeys):
sudo -u irisadmin /usr/illumon/latest/bin/iris auth_server_reload_tool
- Copy the user's private auth key (priv-<irisUserName>.base64.txt) to the user's home directory on their client machine.
Note
After it has been created, the priv-<irisUserName>.base64.txt file should be accessible only to its user. The copy that was created on the server should be deleted from the server, or moved to a section of the filesystem where only the user has permissions.
A more complete topic covering key-based authentication in Deephaven is available in Instructions to make a key.
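The step above that adds the JVM shared library path to the path can also be approximated from within R for the current session. The sketch below uses a hypothetical helper, prepend_to_path. Whether a session-level change is sufficient for rJava to locate the JVM library on your machine is an assumption; a system-level setting is what the steps describe.

```r
# Hypothetical helper: prepend a directory to PATH for the current R session
# unless it is already listed. Returns the new PATH value invisibly.
prepend_to_path <- function(dir, sep = .Platform$path.sep) {
  current <- strsplit(Sys.getenv("PATH"), sep, fixed = TRUE)[[1]]
  if (!(dir %in% current)) {
    Sys.setenv(PATH = paste(c(dir, current), collapse = sep))
  }
  invisible(Sys.getenv("PATH"))
}

# Example call (left commented out because the path contains a placeholder):
# prepend_to_path("C:\\Users\\<userName>\\AppData\\Local\\Illumon\\jdk\\jre\\bin\\server")
```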
Connecting to Deephaven from R on Windows
- For non-Envoy (default) installations, you can use the configuration bootstrap to allow the integration to load configuration directly from the configuration server:
jvmLocalArgs <- c(
'-Ddh.config.client.bootstrap=C:\\Users\\<userName>\\AppData\\Local\\Illumon\\<instance>\\dh-config\\clients',
'-Dservice.name=iris_console'
)
If you are using Envoy, you will need to use the PropertyInputStreamLoaderTraditional to read configuration from local files. Set up the jvmArgs as follows:
jvmLocalArgs <- c(
'-Dcom.fishlib.configuration.PropertyInputStreamLoader.override=com.fishlib.configuration.PropertyInputStreamLoaderTraditional',
'-Dservice.name=iris_console'
)
Setting service.name to iris_console allows the integration to use some properties that are set for the launcher console, rather than having to set them explicitly.
- Load the Deephaven (Iris) R library:
source("C:\\Users\\<userName>\\AppData\\Local\\Illumon\\<instance>\\integrations\\r\\irisdb.R")
- Initialize the Deephaven integration:
The following options are available for idb.init:
idb.init(devroot, workspace, propfile, userHome, keyfile, librarypath, log4jconffile, workerHeapGB, jvmHeapGB, verbose, jvmArgs, remote)
Only the devroot, workspace, propfile, and keyfile arguments are required:
idb.init(devroot, workspace, propfile, NULL, keyfile, jvmArgs=jvmLocalArgs)
Probable values would look something like the following:
devroot <- 'C:\\Users\\<userName>\\AppData\\Local\\Illumon\\<instance>\\'
workspace <- 'C:\\Users\\<userName>\\Documents\\Iris\\<instance>\\workspaces\\<workspace_name>'
keyfile <- 'C:\\Users\\<userName>\\priv-<username>.base64.txt'
propfile <- 'iris-common.prop'
Setting remote=TRUE causes idb.execute to send Groovy commands to the server and execute them remotely. The default of remote=FALSE executes Groovy commands locally. Executing Groovy commands remotely provides an environment that is more similar to the Web Code Studio or Swing Console, and eliminates a potential class of serialization errors present when executing local Groovy closures against server-side objects.
Note
Note the assignment of jvmLocalArgs to jvmArgs in the init call. At a minimum, the two JVM arguments shown are needed to indicate how to load configuration and where to get some other settings (from the iris_console service stanza). Other JVM arguments, for example to specifically set a query server, can be added to jvmLocalArgs to be used when launching the Deephaven worker.
Warning
If you need to re-run idb.init() to capture any changed inputs, you will need to restart the R session.
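Because a second call to idb.init() in the same session is not supported, a script that may be sourced more than once could guard the call. The following is a sketch under stated assumptions: init_once and the option name dh.r.initialized are hypothetical and are not provided by the integration.

```r
# Hypothetical guard: remember in an R option that initialization already
# happened, and refuse a second call with a clear message.
init_once <- function(init_fn, ...) {
  if (isTRUE(getOption("dh.r.initialized"))) {
    stop("Deephaven is already initialized in this R session; ",
         "restart R before initializing again.")
  }
  result <- init_fn(...)
  options(dh.r.initialized = TRUE)
  invisible(result)
}

# Intended use once irisdb.R has been sourced (commented out here):
# init_once(idb.init, devroot, workspace, propfile, NULL, keyfile, jvmArgs = jvmLocalArgs)
```

The flag lives only in the current R session, so restarting R clears it along with the JVM.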
Linux or OSX setup
The following steps are required to install R and integrate R with Deephaven from a Linux or OSX machine.
Note
The steps below show the default paths used on Linux machines. Your paths may differ depending on your operating system, Deephaven version, and the paths where you chose to install Deephaven components.
- Configure the machine as a Deephaven client. This can be done by installing the Deephaven Launcher tar file and running the interactive Launcher or, for systems without a GUI or where scripted executions are needed, the Deephaven Updater.
- If R is not installed, install R. See: https://cran.r-project.org/
- If RStudio is not installed, consider installing RStudio. RStudio is not required, but it does provide a nice IDE. See: https://www.rstudio.com/products/rstudio/download/
- Start an R console through the standard R installation or through RStudio.
- Install rJava by running the following in the R console:
install.packages("rJava")
- (Optionally) Test the rJava installation by running the following in the R console:
library(rJava)
.jinit() # this starts the JVM
s <- .jnew("java/lang/String", "Hello World!")
print(s)
Note
If you explicitly call .jinit(), you must restart the R session before integrating with Deephaven. The integration library takes care of JVM initialization for you.
- Set up Deephaven authorization keys on the Deephaven server by executing the following on the server from your home directory:
Important
Steps 7 and 8 must be run by someone with administrative access to the Deephaven server.
/usr/illumon/latest/bin/generate-iris-keys <irisUserName>
Back up the current dsakeys first; e.g.:
sudo cp /etc/sysconfig/illumon.d/resources/dsakeys.txt /etc/sysconfig/illumon.d/resources/dsakeys.txt.`date +"%Y-%m-%d"`
cat pub-<irisUserName>.base64.txt | sudo -u irisadmin tee -a /etc/sysconfig/deephaven/auth/dsakeys.txt
- Reload the auth server (to pick up the change to dsakeys):
sudo -u irisadmin /usr/illumon/latest/bin/iris auth_server_reload_tool
- Copy the user's private auth key (priv-<irisUserName>.base64.txt) to the user's home directory on their client machine.
Note
After it has been created, the priv-<irisUserName>.base64.txt file should be accessible only to its user. The copy that was created on the server should be deleted from the server, or moved to a section of the filesystem where only the user has permissions.
A more complete topic covering key-based authentication in Deephaven is available in Instructions to make a key.
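On Linux or OSX, the requirement that the private key be accessible only to its user can be applied from R on the client machine. This is a minimal sketch: restrict_keyfile is a hypothetical helper name, and mode 0600 (owner read and write only) is one reasonable reading of "accessible only to its user".

```r
# Hypothetical helper: make a private key file readable and writable by its
# owner only (mode 0600), then report the resulting mode as a string.
restrict_keyfile <- function(keyfile) {
  stopifnot(file.exists(keyfile))
  Sys.chmod(keyfile, mode = "0600", use_umask = FALSE)
  format(file.info(keyfile)$mode)
}

# Example call (left commented out because the path contains placeholders):
# restrict_keyfile("/home/<username>/priv-<username>.base64.txt")
```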
Connecting to Deephaven from R on Linux or OSX
- For non-Envoy (default) installations, you can use the configuration bootstrap to allow the integration to load configuration directly from the configuration server:
jvmLocalArgs <- c(
'-Ddh.config.client.bootstrap=/home/<username>/iris/.programfiles/<instance_name>/dh-config/clients',
'-Dservice.name=iris_console'
)
If you are using Envoy, you will need to use the PropertyInputStreamLoaderTraditional to read configuration from local files. Set up the jvmArgs as follows:
jvmLocalArgs <- c(
'-Dcom.fishlib.configuration.PropertyInputStreamLoader.override=com.fishlib.configuration.PropertyInputStreamLoaderTraditional',
'-Dservice.name=iris_console'
)
Setting service.name to iris_console allows the integration to use some properties that are set for the launcher console, rather than having to set them explicitly.
- Load the Deephaven (Iris) R library:
source("/home/<username>/iris/.programfiles/<instance_name>/integrations/r/irisdb.R")
- Initialize the Deephaven integration:
The following options are available for idb.init:
idb.init(devroot, workspace, propfile, userHome, keyfile, librarypath, log4jconffile, workerHeapGB, jvmHeapGB, verbose, jvmArgs)
Only the devroot, workspace, propfile, and keyfile arguments are required:
idb.init(devroot, workspace, propfile, NULL, keyfile, jvmArgs=jvmLocalArgs)
Probable values would look something like the following (a base path of /Users/<username> is typical on OSX, whereas it will normally be /home/<username> on Linux):
devroot <- '/home/<username>/iris/.programfiles/<instance>/'
workspace <- '/home/<username>/iris/workspaces/<instance>/workspaces/<workspace_name>/'
keyfile <- '/home/<username>/priv-<username>.base64.txt'
propfile <- 'iris-common.prop'
Note
Note the assignment of jvmLocalArgs to jvmArgs in the init call. At a minimum, the two JVM arguments shown are needed to indicate how to load configuration and where to get some other settings (from the iris_console service stanza). Other JVM arguments, for example to specifically set a query server, can be added to jvmLocalArgs to be used when launching the Deephaven worker.
Warning
If you need to re-run idb.init() to capture any changed inputs, you will need to restart the R session.
Using the idb Deephaven R interface
- Execute Groovy code:
idb.execute('groovy')
Tip
Because Deephaven queries make heavy use of double quotes, wrap the Groovy code in single quotes on the R side.
- Execute Groovy code contained in a file:
idb.executeFile('filepath')
- Get a variable from the Groovy shell:
idb.get('variable')
- Get a variable from the Groovy shell as an R data frame, converting a Deephaven table to an R data frame:
idb.get.df('variable')
- Get a Deephaven database object. (Because the syntax is more cumbersome, use this only if you have a good reason not to use the Groovy functionality.)
idb.db()
- Create the table variable "name" in the Groovy shell with the data in the data frame df:
idb.push.df('name', df)
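When parts of a Groovy command come from R variables, the single-quote convention from the tip above combines naturally with sprintf. The helper below is a sketch only: count_by_command is a hypothetical name, it performs no escaping, and it assumes the values passed in are trusted identifiers. The generated command chains the db.t and countBy calls used earlier on this page.

```r
# Hypothetical helper: build a Groovy command that counts the rows of a
# table by one column. The R string uses single quotes so that the Groovy
# double quotes need no escaping.
count_by_command <- function(result, namespace, table, column) {
  sprintf('%s = db.t("%s","%s").countBy("Count","%s")',
          result, namespace, table, column)
}

cmd <- count_by_command("t1", "LearnDeephaven", "StockTrades", "Date")
print(cmd)
# With an initialized session, the command would be run as: idb.execute(cmd)
```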
Examples
Initialize Deephaven
Set the environment variable for the JDK:
Sys.setenv(JAVA_HOME="C:\\Users\\<userName>\\AppData\\Local\\Illumon\\jdk\\")
# If the JDK was installed separately from the Iris Launcher, use instead: Sys.setenv(JAVA_HOME="C:\\Program Files\\Java\\jdk_version\\")
install.packages("rJava")
library(rJava)
devroot = "C:\\Users\\<userName>\\AppData\\Local\\Illumon\\<instance>\\" # devroot requires trailing slash
workspace = "C:\\Users\\<userName>\\Documents\\Iris\\<instance>\\workspaces\\<workspace>\\"
propfile = "iris-common.prop"
workerHeapGB <- 2
jvmHeapGB <- 2
keyfile <- "C:\\Users\\<userName>\\priv-<irisUserName>.base64.txt"
verbose <- TRUE
source("C:\\Users\\<userName>\\AppData\\Local\\Illumon\\<instance>\\integrations\\r\\irisdb.R")
idb.init(devroot, workspace, propfile, workerHeapGB=workerHeapGB, jvmHeapGB=jvmHeapGB, keyfile=keyfile, verbose=verbose)
Note
Appropriate file paths can also be found in the bottom right corner of the Launcher. You can also supply the devroot, workspace, and propfile properties for idb.init through environment variables rather than string arguments.
For example:
Sys.setenv(ILLUMON_DEVROOT = "C:\\Users\\<userName>\\AppData\\Local\\Illumon\\<instance>\\")
Sys.setenv(ILLUMON_WORKSPACE = "C:\\Users\\<userName>\\Documents\\Iris\\<instance>\\workspaces\\<workspace>\\")
Sys.setenv(ILLUMON_PROPFILE = "iris-common.prop")
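How idb.init resolves these variables internally is not shown here. The sketch below only illustrates the same fallback idea in user code: prefer an explicit value, otherwise read the environment variable, and fail clearly if neither is available. The helper name setting_or_env is hypothetical.

```r
# Hypothetical helper: prefer an explicit argument, otherwise fall back to
# an environment variable, and fail if neither is available.
setting_or_env <- function(value = NULL, envvar) {
  if (!is.null(value)) {
    return(value)
  }
  from_env <- Sys.getenv(envvar, unset = NA)
  if (is.na(from_env) || !nzchar(from_env)) {
    stop(sprintf("Neither an argument nor %s is set", envvar))
  }
  from_env
}

Sys.setenv(ILLUMON_PROPFILE = "iris-common.prop")
propfile <- setting_or_env(NULL, "ILLUMON_PROPFILE")
print(propfile)
```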
Execute Simple Groovy Commands
# Set and get a variable
idb.execute('b=4')
print(idb.get('b'))
Execute a Groovy File
idb.executeFile('test.groovy')
print(idb.get('c'))
print(idb.get('d'))
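The contents of test.groovy are not shown. As an illustration only, the sketch below writes a file that defines the variables c and d with made-up values, then runs it if the integration is available. The assignments and the temporary file location are assumptions.

```r
# Assumed content for test.groovy: the original file is not shown, so the
# two assignments below are illustrative only.
groovy_file <- file.path(tempdir(), "test.groovy")
writeLines(c("c = 10", "d = c * 2"), groovy_file)

# Run it only when the integration has been loaded and initialized.
if (exists("idb.executeFile")) {
  idb.executeFile(groovy_file)
  print(idb.get("c"))
  print(idb.get("d"))
}
```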
Execute a Groovy Query to get an R Data Frame
idb.execute('
t1 = emptyTable(100).update("Type= i%2==0 ? `A` : `B`","X=i","Y=X*X");
t2 = t1.sumBy("Type")
')
t1df <- idb.get.df('t1')
t2df <- idb.get.df('t2')
show(t2df)
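To sanity-check what the query returns, the same computation can be reproduced locally in plain R. This sketch assumes that i in the query is the zero-based row index (0 through 99); it does not contact Deephaven, and the names local_t1 and local_t2 are illustrative.

```r
# Local re-computation of the query above, assuming `i` is the zero-based
# row index (0..99). Nothing here talks to Deephaven.
i <- 0:99
local_t1 <- data.frame(
  Type = ifelse(i %% 2 == 0, "A", "B"),
  X = i,
  Y = i * i,
  stringsAsFactors = FALSE
)

# Sum X and Y within each Type, mirroring sumBy("Type").
local_t2 <- aggregate(cbind(X, Y) ~ Type, data = local_t1, FUN = sum)
print(local_t2)
```

Under that assumption, the per-type sums in local_t2 can be compared with the rows of t2df.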
Execute a Query Without Using Groovy
The Groovy interface is much nicer. You should have a good reason not to use it.
t <- tryCatch( idb.db()$getTable("SystemEQ","Trades"), Exception = function(e){e$jobj$printStackTrace()} )
t2 <- t$where(.jarray('Date=`2014-04-11`'))
print(t2$size())
print(t2$getColumn("Price")$getDouble(2L))
tt <- t$selectDistinct(.jarray("Date"))$sortDescending(.jarray("Date"))
print(tt$getColumn('Date')$get(0L))
Print Information on Available Java Methods
Print class info:
.jmethods('com.illumon.iris.db.tables.Table')
.jmethods('com.illumon.integrations.common.IrisIntegrationGroovySession')
Print Java Classpath:
print(.jclassPath())
Export Data Frames from R and Import to Deephaven
To export the R data frame df as an in-memory Deephaven Groovy table named myRTable, use the following:
idb.push.df('myRTable', df)
The in-memory Deephaven Groovy table is not permanently stored and cannot be accessed outside of the process in which it was created. You can use myRTable in R as you would any other Deephaven Groovy variable in the R API. For example:
idb.execute('t = myRTable.updateView("n2 = n*n")')
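The query above assumes that df has a numeric column named n. A minimal sketch of such a data frame follows, with a local preview of the values the new column would hold. The data is illustrative, and the Deephaven calls run only if the integration has been loaded.

```r
# Illustrative data frame with the numeric column `n` that the query expects.
df <- data.frame(n = 1:5)

# Local preview of the column the updateView call would add.
expected_n2 <- df$n * df$n
print(expected_n2)

# Push and query only when the integration has been loaded and initialized.
if (exists("idb.push.df")) {
  idb.push.df('myRTable', df)
  idb.execute('t = myRTable.updateView("n2 = n*n")')
  print(idb.get.df('t'))
}
```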
Like any other Deephaven table, myRTable can be saved for later use by using the following:
db.addTable(namespace,tablename,table)
The example below saves myRTable as MyDeephavenTableFromR in the MyNameSpace namespace.
idb.execute('db.addTable("MyNameSpace","MyDeephavenTableFromR",myRTable)')
To load the saved R table from the console, use:
table = db.t("MyNameSpace","MyDeephavenTableFromR")
Best Practices
- Do as much work as possible in Deephaven. Use R for the final analysis of a small, distilled data set.
- Be conscious of the size of tables converted to in-memory R tables. The R session must have enough RAM to store the table.
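One way to act on the second point is to check a table's row count on the worker before pulling it. The sketch below is an assumption-laden illustration: get_df_if_small and the temporary Groovy variable name are hypothetical, and it assumes that idb.get returns the result of size() as a value R can convert to a number. The three idb functions are passed as arguments so the helper can be tried with stand-ins.

```r
# Hypothetical helper: fetch a table only if its row count is under a limit.
# `execute`, `get_value`, and `get_df` default to the idb functions but can
# be replaced with stand-ins for experimentation.
get_df_if_small <- function(table, max_rows = 1e6,
                            execute = idb.execute,
                            get_value = idb.get,
                            get_df = idb.get.df) {
  size_var <- sprintf("%s_size", table)  # hypothetical temporary variable name
  execute(sprintf('%s = %s.size()', size_var, table))
  n <- as.numeric(get_value(size_var))
  if (n > max_rows) {
    stop(sprintf("Table %s has %.0f rows; limit is %.0f", table, n, max_rows))
  }
  get_df(table)
}

# Intended use in an initialized session (commented out here):
# t1 <- get_df_if_small("t1", max_rows = 100000)
```

The row limit is only a proxy for memory use; wide tables or large string columns may need a lower threshold.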