R integration

R is an open-source programming language and software environment that is commonly used for statistical analysis, graphical representation and reporting.

Before you can integrate R with Deephaven, you should install and run the Deephaven Launcher, create the appropriate Deephaven instance and workspace as required by your enterprise, and then connect to the instance. Connecting to the appropriate instance causes the client to download the resource files from the server that the R integration requires.


Integrating R with Deephaven Example Notebook

This notebook demonstrates how to use Deephaven's R integration.

When you save the notebook, an HTML file containing the code and output will be saved alongside the notebook.

Click the Preview button. Preview does not run any R code chunks. It displays the output of the chunk when it was last run in the editor.

Configuration

Configure variables needed to connect to the system.

  • home: Your home directory
  • system: Deephaven system to connect to (as configured in the launcher)
  • keyfile: Key file used to authenticate when connecting to the Deephaven system
  • workerHeapGB: Gigabytes of heap for the Deephaven query worker
  • jvmHeapGB: Gigabytes of heap for the local Java Virtual Machine (JVM)
  • workerHost: Host on which to run the Deephaven query worker
home <- "/Users/userName"
system <- "dh-demo"
keyfile <- sprintf("%s/.priv.dh-demo.base64.txt",home)
workerHeapGB <- 4
jvmHeapGB <- 2
workerHost <- "dh-demo-query4.int.illumon.com"
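Before connecting, it can help to sanity-check these values locally. The following is a minimal sketch in base R, not part of the Deephaven integration: check_dh_config is a hypothetical helper that only verifies that the key file exists and that the heap sizes are single positive numbers.

```r
# Hypothetical helper (not part of irisdb.R): validate connection settings
# before calling idb.init(). Uses base R only.
check_dh_config <- function(keyfile, workerHeapGB, jvmHeapGB) {
  problems <- character(0)
  if (!file.exists(keyfile)) {
    problems <- c(problems, sprintf("key file not found: %s", keyfile))
  }
  heaps <- list(workerHeapGB = workerHeapGB, jvmHeapGB = jvmHeapGB)
  for (name in names(heaps)) {
    value <- heaps[[name]]
    if (!is.numeric(value) || length(value) != 1 || is.na(value) || value <= 0) {
      problems <- c(problems, sprintf("%s must be a single positive number", name))
    }
  }
  problems  # an empty character vector means no problems were found
}
```

Calling check_dh_config(keyfile, workerHeapGB, jvmHeapGB) with the variables defined above returns a character vector of problems, which is empty when the settings pass these basic checks.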

Connect

Connect to the Deephaven system. The connection process creates a query worker on the Deephaven system for this session. All future queries for the session are executed in this worker.

To determine the proper value for JAVA_HOME, run R CMD javareconf from the command line.

# Run the following to get java details:  R CMD javareconf
Sys.setenv(JAVA_HOME = '/Library/Java/JavaVirtualMachines/jdk1.8.0_121.jdk/Contents/Home/')
source(sprintf("%s/iris/.programfiles/%s/integrations/r/irisdb.R",home,system))

idb.init(devroot = sprintf("%s/iris/.programfiles/%s/",home,system),
         workspace = sprintf("%s/iris/workspaces/%s/workspaces/r/",home,system),
         propfile = "iris-common.prop",
         userHome = home,
         keyfile = keyfile,
         librarypath = sprintf("%s/iris/.programfiles/%s/java_lib",home,system),
         log4jconffile = NULL,
         workerHeapGB = workerHeapGB,
         jvmHeapGB = jvmHeapGB,
         workerHost = workerHost,
         verbose = FALSE,
         jvmArgs = c("-Dservice.name=iris_console",sprintf("-Ddh.config.client.bootstrap=%s/iris/.programfiles/%s/dh-config/clients",home,system)),
         classpathAdditions = c(sprintf("%s/iris/.programfiles/%s/resources",home,system),sprintf("%s/iris/.programfiles/%s/java_lib",home,system)),
         jvmForceInit = FALSE)
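The call above repeats the same base directory in several sprintf calls. As an optional sketch, the paths can be derived once and reused. dh_paths is a hypothetical helper, not part of the integration, and it assumes the launcher layout shown in this notebook (home/iris/.programfiles/system).

```r
# Hypothetical helper: derive the launcher-installed paths used by idb.init()
# from the home directory and the system (instance) name.
dh_paths <- function(home, system) {
  programfiles <- file.path(home, "iris", ".programfiles", system)
  list(
    devroot     = paste0(programfiles, "/"),  # devroot keeps its trailing slash
    workspace   = paste0(file.path(home, "iris", "workspaces", system, "workspaces", "r"), "/"),
    librarypath = file.path(programfiles, "java_lib"),
    resources   = file.path(programfiles, "resources"),
    bootstrap   = file.path(programfiles, "dh-config", "clients"),
    rlibrary    = file.path(programfiles, "integrations", "r", "irisdb.R")
  )
}

paths <- dh_paths("/Users/userName", "dh-demo")
print(paths$devroot)
```

The resulting list elements can be passed to source() and idb.init() in place of the individual sprintf expressions.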

Execute a Command on the Deephaven Query Worker

Compute x=1+2 on the Deephaven query worker, then pull the value back to the local R session.

idb.execute("x=1+2")
x <- idb.get("x")
print(x)

Pull a Deephaven Query Result to a Local R Dataframe

Compute the number of stock trades for each date from the LearnDeephaven/StockTrades table. The result is stored as t1 on the Deephaven query worker. t1 is copied from the server to the local R session.

idb.execute('t = db.t("LearnDeephaven","StockTrades"); t1=t.countBy("Count","Date")')
t1 <- idb.get.df("t1")
print(t1)

Push a Local R Dataframe to the Deephaven Query Worker

Create a new table t2 in the local R session and copy the table to the Deephaven query worker.

t2 <- t1[2:3,]
print(t2)
idb.push.df("t2",t2)

Use the Pushed R Dataframe in a Deephaven Query

Table t is the result of t = db.t("LearnDeephaven","StockTrades"), computed earlier in the session. Table t2 is the R dataframe pushed to the Deephaven query worker in the previous step. These tables are used to compute the total dollars traded per underlying security for the dates specified in t2. This result is then pulled to the local R session.

idb.execute('t3 = t.whereIn(t2,"Date").view("USym","Dollars=Last*Size").sumBy("USym").sortDescending("Dollars")')
t3 <- idb.get.df("t3")
print(t3)

Close the connection

To terminate the remote worker, call idb.close().

idb.close()
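If a query fails partway through a script, the worker may be left running. One way to guard against this is to wrap the work in a function that always closes the connection. The sketch below is an assumption about how you might structure this, not a feature of the integration: with_dh_session is a hypothetical wrapper built on base R's on.exit.

```r
# Hypothetical wrapper: run `body` and always call `close_fn` afterwards,
# even if `body` raises an error. `close_fn` defaults to idb.close, which is
# only looked up when the default is actually used.
with_dh_session <- function(body, close_fn = idb.close) {
  on.exit(close_fn(), add = TRUE)
  body()
}
```

In a connected session, with_dh_session(function() { idb.execute("x=1+2"); idb.get("x") }) would return the value of x and then close the connection.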

Setup: R, rJava, and Auth Key

Windows setup

The following steps are required to install R and integrate R with Deephaven from a Windows machine.

Note

The steps below show the default paths used on Windows-based PCs. Your paths may differ depending on your Deephaven version, and the paths where you chose to install Deephaven components.

  1. If the machine does not already have Deephaven client components installed, configure the machine as a Deephaven client. This can be done by installing the Deephaven Launcher and running either the interactive Launcher, or, for systems without a GUI, or where scripted executions are needed, the Deephaven Updater.

  2. Add the JVM shared library path to the PATH environment variable.

  • If you installed the version of the Deephaven launcher that included the JDK, the complete path should be similar to the following: C:\Users\<userName>\AppData\Local\Illumon\jdk\jre\bin\server\
  • If you installed the JDK separately, the complete path should be similar to the following: C:\Program Files\Java\jdk_version\jre\bin\server
  3. If R is not installed, install R. See: https://cran.r-project.org/
  4. If RStudio is not installed, consider installing RStudio. RStudio is not required, but it does provide a nice IDE. See: https://www.rstudio.com/products/rstudio/download/
  5. Start an R console through the standard R installation or through RStudio.
  6. Install rJava by running one of the following in the R console:
  • If you installed the version of the Deephaven Launcher that included the JDK, run the following in the R console:

    Sys.setenv(JAVA_HOME="C:\\Users\\<userName>\\AppData\\Local\\Illumon\\jdk\\")
    
    install.packages("rJava")
    
  • If you installed the JDK separately and your system does not have a default value for JAVA_HOME, run the following in the R console:

    Sys.setenv(JAVA_HOME="C:\\Program Files\\Java\\jdk_version\\")
    
    install.packages("rJava")
    
  7. (Optional) Test the rJava installation by running the following in the R console:

    library(rJava)
    .jinit() # this starts the JVM
    s <- .jnew("java/lang/String", "Hello World!")
    print(s)
    

    Note

    You must restart the R session before integrating with Deephaven if you explicitly call .jinit(). The integration library takes care of JVM initialization for you.

  8. Set up Deephaven authorization keys on the Deephaven server by executing the following on the server from your home directory:

Important

Steps 8 and 9 must be run by someone with administrative access to the Deephaven server.

/usr/illumon/latest/bin/generate-iris-keys <irisUserName>

Back up the current dsakeys.txt file before appending the new public key; e.g.:

sudo cp /etc/sysconfig/illumon.d/resources/dsakeys.txt /etc/sysconfig/illumon.d/resources/dsakeys.txt.`date +"%Y-%m-%d"`

cat pub-<irisUserName>.base64.txt | sudo -u irisadmin tee -a /etc/sysconfig/deephaven/auth/dsakeys.txt
  9. Reload the auth server (to pick up the change to dsakeys):
sudo -u irisadmin /usr/illumon/latest/bin/iris auth_server_reload_tool
  10. Copy the user's private auth key (priv-<irisUserName>.base64.txt) to the user's home directory on their client machine.

Note

After it has been created, the priv-<irisUserName>.base64.txt file should be accessible only to its user. The copy that was created on the server should be deleted from the server, or moved to a section of the filesystem where only the user has permissions.

A more complete topic covering key-based authentication in Deephaven is available in Instructions to make a key.

Connecting to Deephaven from R on Windows

  1. For non-Envoy (default) installations, you can use the configuration bootstrap to allow the integration to load configuration directly from the configuration server:

    jvmLocalArgs <- c(
      '-Ddh.config.client.bootstrap=C:\\Users\\<userName>\\AppData\\Local\\Illumon\\<instance>\\dh-config\\clients',
      '-Dservice.name=iris_console'
    )

If you are using Envoy, you will need to use the PropertyInputStreamLoaderTraditional to read configuration from local files. Set up the jvmArgs as follows:

    jvmLocalArgs <- c(
      '-Dcom.fishlib.configuration.PropertyInputStreamLoader.override=com.fishlib.configuration.PropertyInputStreamLoaderTraditional',
      '-Dservice.name=iris_console'
    )

Setting service.name to iris_console allows the integration to use some properties that are set for the launcher console, rather than having to set them explicitly.

  2. Load the Deephaven (Iris) R library:

    source("C:\\Users\\<userName>\\AppData\\Local\\Illumon\\<instance>\\integrations\\r\\irisdb.R")
    
  3. Initialize the Deephaven integration:

The following options are available for idb.init:

idb.init(devroot, workspace, propfile, userHome, keyfile, librarypath, log4jconffile, workerHeapGB, jvmHeapGB, verbose, jvmArgs, remote)

Only the devroot, workspace, propfile, and keyfile arguments are required:

idb.init(devroot, workspace, propfile, NULL, keyfile, jvmArgs = jvmLocalArgs)

Probable values would look something like the following:

devroot <- 'C:\\Users\\<userName>\\AppData\\Local\\Illumon\\<instance>\\'
workspace <- 'C:\\Users\\<userName>\\Documents\\Iris\\<instance>\\workspaces\\<workspace_name>'
keyfile <- 'C:\\Users\\<userName>\\priv-<username>.base64.txt'
propfile <- 'iris-common.prop'

Setting remote=TRUE causes idb.execute to send Groovy commands to the server and execute them remotely. The default of remote=FALSE executes Groovy commands locally. Executing Groovy commands remotely provides an environment that is more similar to the Web Code Studio or Swing Console, and eliminates a potential class of serialization errors present when executing local Groovy closures against server-side objects.

Note

Note the assignment of jvmLocalArgs to jvmArgs in the init call. At least these two JVM arguments are needed: one to indicate how to load configuration, and one to indicate where to get some other settings (from the iris_console service stanza). Other JVM arguments, for example to specifically set a query server, can be added to jvmLocalArgs to be used when launching the Deephaven worker.

Warning

If you need to re-run idb.init() to capture any changed inputs, you will need to restart the R session.

Linux or OSX setup

The following steps are required to install R and integrate R with Deephaven from a Linux or OSX machine.

Note

The steps below show the default paths used on Linux machines. Your paths may differ depending on your operating system, Deephaven version, and the paths where you chose to install Deephaven components.

  1. Configure the machine as a Deephaven client. This can be done by installing the Deephaven Launcher tar file and running the interactive Launcher, or, for systems without a GUI, or where scripted executions are needed, the Deephaven Updater.

  2. If R is not installed, install R. See: https://cran.r-project.org/

  3. If RStudio is not installed, consider installing RStudio. RStudio is not required, but it does provide a nice IDE. See: https://www.rstudio.com/products/rstudio/download/

  4. Start an R console through the standard R installation or through RStudio.

  5. Install rJava by running the following in the R console: install.packages("rJava")

  6. (Optionally) Test the rJava installation by running the following in the R console:

    library(rJava)
    .jinit() # this starts the JVM
    s <- .jnew("java/lang/String", "Hello World!")
    print(s)
    

    Note

    You must restart the R session before integrating with Deephaven if you explicitly call .jinit(). The integration library takes care of JVM initialization for you.

  7. Set up Deephaven Authorization keys on the Deephaven server by executing the following on the server from your home directory:

Important

Steps 7 and 8 must be run by someone with administrative access to the Deephaven server.

/usr/illumon/latest/bin/generate-iris-keys <irisUserName>

Back up the current dsakeys.txt file before appending the new public key; e.g.:

sudo cp /etc/sysconfig/illumon.d/resources/dsakeys.txt /etc/sysconfig/illumon.d/resources/dsakeys.txt.`date +"%Y-%m-%d"`

cat pub-<irisUserName>.base64.txt | sudo -u irisadmin tee -a /etc/sysconfig/deephaven/auth/dsakeys.txt
  8. Reload the auth server (to pick up the change to dsakeys):
sudo -u irisadmin /usr/illumon/latest/bin/iris auth_server_reload_tool
  9. Copy the user's private auth key (priv-<irisUserName>.base64.txt) to the user's home directory on their client machine.

Note

After it has been created, the priv-<irisUserName>.base64.txt file should be accessible only to its user. The copy that was created on the server should be deleted from the server, or moved to a section of the filesystem where only the user has permissions.

A more complete topic covering key-based authentication in Deephaven is available in Instructions to make a key.
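On the client machine, the permissions on the private key file can be tightened from R itself. The sketch below uses only base R (Sys.chmod and file.mode) and assumes a Linux or OSX file system. restrict_keyfile is a hypothetical helper name, and the key file path is whatever location you copied the key to.

```r
# Restrict a private key file to its owner (read/write only) and confirm
# the resulting mode. Base R only; meaningful on Linux and OSX file systems.
restrict_keyfile <- function(keyfile) {
  ok <- Sys.chmod(keyfile, mode = "0600", use_umask = FALSE)
  if (!all(ok)) stop(sprintf("could not change permissions on %s", keyfile))
  file.mode(keyfile)  # octmode value describing the new permissions
}
```

For example, restrict_keyfile('/home/<username>/priv-<username>.base64.txt') would leave the file readable and writable by its owner only.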

Connecting to Deephaven from R on Linux or OSX

  1. For non-Envoy (default) installations, you can use the configuration bootstrap to allow the integration to load configuration directly from the configuration server:

    jvmLocalArgs <- c(
      '-Ddh.config.client.bootstrap=/home/<username>/iris/.programfiles/<instance_name>/dh-config/clients',
      '-Dservice.name=iris_console'
    )

If you are using Envoy, you will need to use the PropertyInputStreamLoaderTraditional to read configuration from local files. Set up the jvmArgs as follows:

    jvmLocalArgs <- c(
      '-Dcom.fishlib.configuration.PropertyInputStreamLoader.override=com.fishlib.configuration.PropertyInputStreamLoaderTraditional',
      '-Dservice.name=iris_console'
    )

Setting service.name to iris_console allows the integration to use some properties that are set for the launcher console, rather than having to set them explicitly.

  2. Load the Deephaven (Iris) R library:

    source("/home/<username>/iris/.programfiles/<instance_name>/integrations/r/irisdb.R")
    
  3. Initialize the Deephaven integration:

The following options are available for idb.init:

idb.init(devroot, workspace, propfile, userHome, keyfile, librarypath, log4jconffile, workerHeapGB, jvmHeapGB, verbose, jvmArgs)

Only the devroot, workspace, propfile, and keyfile arguments are required:

idb.init(devroot, workspace, propfile, NULL, keyfile, jvmArgs = jvmLocalArgs)

Probable values would look something like the following (a base path of /Users/<username> on OSX is typical, whereas this will normally be /home/<username> on Linux):

devroot <- '/home/<username>/iris/.programfiles/<instance>/'
workspace <- '/home/<username>/iris/workspaces/<instance>/workspaces/<workspace_name>/'
keyfile <- '/home/<username>/priv-<username>.base64.txt'
propfile <- 'iris-common.prop'

Note

Note the assignment of jvmLocalArgs to jvmArgs in the init call. At least these two JVM arguments are needed: one to indicate how to load configuration, and one to indicate where to get some other settings (from the iris_console service stanza). Other JVM arguments, for example to specifically set a query server, can be added to jvmLocalArgs to be used when launching the Deephaven worker.

Warning

If you need to re-run idb.init() to capture any changed inputs, you will need to restart the R session.

Using the idb Deephaven R interface

  1. Execute Groovy code:

    idb.execute('groovy')
    

    Tip

    Because Deephaven queries heavily utilize double quotes, use single quotes to encapsulate string literals.

  2. Execute Groovy code contained in a file:

    idb.executeFile('filepath')
    
  3. Get a variable from the Groovy shell:

    idb.get('variable')
    
  4. Get a variable from the Groovy shell as an R data frame. This converts a Deephaven table to an R data frame.

    idb.get.df('variable')
    
  5. Get a Deephaven database object. (Because the syntax is more cumbersome, you should only use this if you have a good reason not to use the Groovy functionality.)

    idb.db()
    
  6. Create the table variable "name" in the Groovy shell with the data in the data frame df.

    idb.push.df('name', df)
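The quoting tip above can be illustrated with plain string handling. This sketch builds a Groovy command in R without a Deephaven connection. The namespace and table name are taken from the notebook example earlier in this document, and the variable names are illustrative.

```r
# Build a Groovy query string in R. Single quotes delimit the R string so the
# double quotes required by the Deephaven query need no escaping.
namespace <- "LearnDeephaven"
table_name <- "StockTrades"
query <- sprintf('t = db.t("%s","%s")', namespace, table_name)
cat(query, "\n")
# The string can then be passed to idb.execute(query) in a connected session.
```

Building the command as a string first also makes it easy to print and inspect before sending it to the worker.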
    

Examples

Initialize Deephaven

Set the environment variable for the JDK:

Sys.setenv(JAVA_HOME="C:\\Users\\<userName>\\AppData\\Local\\Illumon\\jdk\\")

If the JDK was installed separately from the Deephaven Launcher, use the following instead:

Sys.setenv(JAVA_HOME="C:\\Program Files\\Java\\jdk_version\\")

install.packages("rJava")
library(rJava)

devroot = "C:\\Users\\<userName>\\AppData\\Local\\Illumon\\<instance>\\" # devroot requires trailing slash
workspace = "C:\\Users\\<userName>\\Documents\\Iris\\<instance>\\workspaces\\<workspace>\\"
propfile = "iris-common.prop"
workerHeapGB <- 2
jvmHeapGB <- 2
keyfile <- "C:\\Users\\<userName>\\priv-<irisUserName>.base64.txt"
verbose <- TRUE
source("C:\\Users\\<userName>\\AppData\\Local\\Illumon\\<instance>\\integrations\\r\\irisdb.R")
idb.init(devroot, workspace, propfile, workerHeapGB=workerHeapGB, jvmHeapGB=jvmHeapGB, keyfile=keyfile, verbose=verbose)

Note

Appropriate file paths can also be found in the bottom right corner of the Launcher. You can also supply the devroot, workspace, and propfile properties for idb.init through environment variables rather than string arguments.

For example:

Sys.setenv(ILLUMON_DEVROOT = "C:\\Users\\<userName>\\AppData\\Local\\Illumon\\<instance>\\")
Sys.setenv(ILLUMON_WORKSPACE = "C:\\Users\\<userName>\\Documents\\Iris\\<instance>\\workspaces\\<workspace>\\")
Sys.setenv(ILLUMON_PROPFILE = "iris-common.prop")
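To confirm that the variables are visible to the current R session before initializing, a check along the following lines may help. It uses only base R's Sys.getenv. missing_dh_env is a hypothetical helper, not part of the integration.

```r
# Report which of the three variables are unset in the current R session.
required_dh_env <- c("ILLUMON_DEVROOT", "ILLUMON_WORKSPACE", "ILLUMON_PROPFILE")

missing_dh_env <- function(vars = required_dh_env) {
  values <- Sys.getenv(vars, unset = NA)  # NA marks a variable that is not set
  vars[is.na(values)]
}

print(missing_dh_env())
```

An empty result means all three variables are set. Any names returned still need a Sys.setenv call or an explicit argument to idb.init.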

Execute Simple Groovy Commands

# Set and get a variable
idb.execute('b=4')
print(idb.get('b'))

Execute a Groovy File

idb.executeFile('test.groovy')
print(idb.get('c'))
print(idb.get('d'))

Execute a Groovy Query to get an R Data Frame

idb.execute('
   t1 = emptyTable(100).update("Type= i%2==0 ? `A` : `B`","X=i","Y=X*X");
   t2 = t1.sumBy("Type")
    ')
t1df <- idb.get.df('t1')
t2df <- idb.get.df('t2')
show(t2df)
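To check what t2df should contain, the same computation can be reproduced locally in base R. This is only a sketch for verification. It assumes that the Deephaven row index i starts at 0 for emptyTable(100), and the names t1_local and t2_local are illustrative.

```r
# Reproduce the query locally in base R, assuming the Deephaven row index i
# starts at 0 for emptyTable(100).
i <- 0:99
t1_local <- data.frame(
  Type = ifelse(i %% 2 == 0, "A", "B"),
  X = i,
  Y = i * i
)
# Sum X and Y within each Type, mirroring sumBy("Type").
t2_local <- aggregate(cbind(X, Y) ~ Type, data = t1_local, FUN = sum)
print(t2_local)
```

Under that assumption, type A sums to X = 2450 and Y = 161700, and type B sums to X = 2500 and Y = 166650.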

Execute a Query Without Using Groovy

The Groovy interface is much easier to use. Work with the database object directly only if you have a good reason not to use Groovy.

t <- tryCatch( idb.db()$getTable("SystemEQ","Trades"), Exception = function(e){e$jobj$printStackTrace()} )
t2 <- t$where(.jarray('Date=`2014-04-11`'))

print(t2$size())
print(t2$getColumn("Price")$getDouble(2L))

tt <- t$selectDistinct(.jarray("Date"))$sortDescending(.jarray("Date"))
print(tt$getColumn('Date')$get(0L))

Print class info:

.jmethods('com.illumon.iris.db.tables.Table')
.jmethods('com.illumon.integrations.common.IrisIntegrationGroovySession')

Print Java Classpath:

print(.jclassPath())

Export Data Frames from R and Import to Deephaven

To export the R data frame df as an in-memory Deephaven Groovy table named myRTable, use the following:

idb.push.df('myRTable', df)

The in-memory Deephaven Groovy table is not permanently stored and cannot be accessed outside of the process in which it was created. You can use myRTable in R as you would any other Deephaven Groovy variable in the R API. For example:

idb.execute('t = myRTable.updateView("n2 = n*n")')

Like any other Deephaven table, myRTable can be saved for later use by using the following:

db.addTable(namespace,tablename,table)

The example below saves myRTable as MyDeephavenTableFromR in the MyNameSpace namespace.

idb.execute('db.addTable("MyNameSpace","MyDeephavenTableFromR",myRTable)')

To load the saved R table from the console, use:

table = db.t("MyNameSpace","MyDeephavenTableFromR")

Best Practices

  • Do as much work as possible in Deephaven. Use R for the final analysis of a small, distilled data set.
  • Be conscious of the size of tables converted to in-memory R tables. The R session must have enough RAM to store the table.
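As a rough guide for the second point, the memory needed for a numeric table can be estimated before pulling it into R. The sketch below assumes 8 bytes per double value and ignores per-column overhead and non-numeric columns. estimate_df_bytes is a hypothetical helper, and the row and column counts are inputs you would obtain from your own query.

```r
# Rough estimate of the memory a numeric data frame needs in R:
# 8 bytes per double value, ignoring per-column overhead.
estimate_df_bytes <- function(n_rows, n_numeric_cols) {
  n_rows * n_numeric_cols * 8
}

# Compare the estimate with the measured size of a real data frame.
df <- data.frame(a = runif(1e5), b = runif(1e5))
estimated <- estimate_df_bytes(nrow(df), ncol(df))
measured <- as.numeric(object.size(df))
cat(sprintf("estimated: %.0f bytes, measured: %.0f bytes\n", estimated, measured))
```

If the estimate approaches the RAM available to the R session, reduce the table further on the Deephaven worker before calling idb.get.df.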