Think like a Deephaven ninja
How to write clean, efficient, and concise queries that dynamically update
As Deephaven’s Chief Data Scientist and as a co-inventor of the Deephaven system, I have had the opportunity to bring many new users onto the Deephaven platform. These onboarding experiences have demonstrated that one of the largest hurdles for new users is learning to “think in Deephaven.” To achieve exceptional power and usability, Deephaven is different from other technologies. Just as Pandas, SQL, R, etc. encourage a certain approach, so too does Deephaven. Like other users, I hope you find that "thinking in Deephaven" is intuitive and natural, because the system's capabilities can uniquely empower your use cases, particularly for updating or large data. To realize Deephaven's potential, I encourage you to consider the tips below.
In this document, I present major concepts that will allow you to think in Deephaven. I make no effort to be comprehensive, and I assume that you have a basic understanding of how to write a Deephaven query. Such details are documented elsewhere. Here, I provide a very high-level view of how Deephaven works and how to use this knowledge to produce powerful, clean, efficient, and concise queries.
Let’s get started.
-- Chip Kent
Chief Data Scientist, Deephaven Data Labs
Query Language
To achieve maximum performance and real-time query results, you must use the Deephaven Query Language (DQL) to interact with Deephaven.
Let’s begin by dissecting a simple Python query.
from deephaven.TableTools import emptyTable

t1 = emptyTable(10).update("X = i", "Y = X*X")
First, `emptyTable(10)` creates a table with 10 rows and no columns. Next, `update("X = i", "Y = X*X")` adds two columns, `X` and `Y`, to the table. Here `"X = i"` and `"Y = X*X"` are known as query strings. Query strings are special: when the query is evaluated, they are dynamically converted into code and compiled, yielding very concise syntax for users and high-performance execution.
Now, let’s make the query a little more complex by adding a table, `t2`, with a new column, `Z`.
from deephaven.TableTools import emptyTable
from deephaven import QueryScope

t1 = emptyTable(10).update("X = i", "Y = X*X")

def f(x):
    return x*x*x

QueryScope.addParam("a", 7)
t2 = t1.update("Z = X + 2*Y + 2 + a + (int)f.call(Y)")
Column `Z` is computed with the query string `"Z = X + 2*Y + 2 + a + (int)f.call(Y)"`. When this query string is compiled, Deephaven must determine what each of the variables is. These variables can be:

- columns in the table, such as `X` and `Y`,
- constants, such as `2` or `3`, or
- values present in the query scope, such as the variable `a`.
For convenience, top-level Python variables and Groovy binding variables are automatically added to the query scope by Deephaven. This would allow our query to be shortened to:
from deephaven.TableTools import emptyTable

t1 = emptyTable(10).update("X = i", "Y = X*X")

def f(x):
    return x*x*x

a = 7
t2 = t1.update("Z = X + 2*Y + 2 + a + (int)f.call(Y)")
Because the Deephaven query engine is written in Java, it is also possible to call Java functions from query strings.
from deephaven.TableTools import emptyTable

t1 = emptyTable(10).update("X = i", "Y = X*X")
t3 = t1.update("A = 4 + java.lang.Math.sin(X)")
Here I have explicitly provided the fully qualified Java class name so that the function can be resolved when the query string is compiled. The syntax can be made more concise with `db.importClass(...)` or `db.importStatic(...)`. These methods make either the class or the static methods from the class available to the query string. This allows us to make the query string short and readable.
from deephaven.TableTools import emptyTable
from deephaven import jpy

t1 = emptyTable(10).update("X = i", "Y = X*X")
Math = jpy.get_type("java.lang.Math")
db.importStatic(Math.jclass)
t3 = t1.update("A = sin(X)")
Because the query strings are just strings, you can do some very cool meta-programming.
from deephaven.TableTools import emptyTable

t4 = emptyTable(10)
for i in range(5):
    t4 = t4.update(f"C{i} = 2*{i}")
This simple loop and format string creates five new columns (C0, C1, C2, C3, C4) with values equal to 0, 2, 4, 6, and 8. Carefully applying this technique can make messy code clean and concise.
Query As A Graph
Deephaven’s query syntax is very natural and readable. Under the hood, the queries are converted into directed acyclic graphs (DAGs) for efficient real-time processing. Let’s look at an example to understand DAGs.
from deephaven.TableTools import timeTable

t1 = timeTable("00:00:01").update("Label=(i%2)")
t2 = t1.lastBy("Label")
t3 = t1.naturalJoin(t2, "Label", "T2=Timestamp")
Here, table `t1` ticks every second and has a Timestamp column as well as a Label column, which alternates between zero and one. Table `t2` contains the most recent row for each Label value, and `t3` joins the most recent Timestamp for a Label, from `t2`, onto `t1`.
Represented as a DAG, this query looks like:
Here you can see each part of the query as connected components, aka a “graph”. The graph starts with the data sources, `timeTable(...)` in this case. In real time, data flows from the data sources through the graph, dynamically updating the tables. Because this data flow is in one direction, the graph has no loops. This is where the “directed acyclic” part of the directed acyclic graph (DAG) name comes from.
The variables `t1`, `t2`, and `t3` are simply references to tables within the DAG. These variables allow the tables to be displayed as well as used in further query operations. If a table is not associated with a variable, it is still part of the DAG, but it is not accessible to users. There is one exception: tables that are not used in the calculation of any variables are garbage collected and removed from the DAG. For example, if the variable `t3` is set to `None`, there are now no references to the result of the `naturalJoin(...)`, so that table is removed from the DAG.
The Deephaven query engine is smart. When data flows through the DAG, instead of recomputing entire tables, only necessary changes are recomputed. For example, if only one row changes, only one row is recomputed. If 11 rows change, only 11 rows are recomputed. If you use static data tables obtained from `db.t(...)`, large sections of the DAG may never recompute. This is one reason Deephaven is so fast.
When processing real-time data changes, Deephaven batches the changes together every 1000 ms (this is configurable). All of the changes within a batch are processed together. These processing events are called Live Table Monitor (LTM) cycles, and the change messages which propagate through the DAG are called Add Modify Delete Reindex (AMDR) messages. Just as it sounds, AMDR messages indicate which rows have been added, modified, deleted, or reindexed (reordered).
Note
See also: Controlling Table Update Frequency
Most users never interact directly with AMDR messages, but it is possible to write custom listeners, which execute code when a table changes. Custom listeners are non-table components of the DAG that listen to AMDR messages describing table changes. For example, if you have a query that monitors how full disks are on a cluster, you can write a custom listener to send an email or Slack message every time a table in your monitor query gets a new row, indicating that disks are starting to fill up.
Note
See also: Listeners in Deephaven Schemas
Thinking in terms of DAGs, LTM cycles, and AMDR messages can be insightful when trying to understand the real-time performance of a Deephaven query. Each source table change creates a cascade of changes (AMDRs), which must be processed before the next LTM cycle can begin. If the source table changes trigger many AMDRs, AMDRs with many changed rows, or slow-to-compute AMDRs, it may be impossible to calculate all of the necessary updates before the next LTM cycle begins. Now your query is unresponsive. Yuck.
Deephaven’s performance analysis tools can help you dig into an unresponsive query to locate which operations are causing slow LTM cycles. Once you understand what operations are slow, you can rearrange your query so that fewer changes are being processed, add table snapshotting to reduce the frequency of changes, etc.
Finally, DAGs are not limited to one query. Preemptive tables allow tables (and AMDRs) to be shared between queries. You may have `Query1` perform a difficult or secret calculation. `Query2` can use the shared results of `Query1` without having to recompute and without being able to see the secret sauce that went into `Query1`’s calculation.
Data Layout
To achieve maximum performance on massive datasets, Deephaven uses (1) column-oriented tables, and (2) a tree-like table representation. Understanding these two concepts will allow you to write the fastest possible queries.
First, let’s look at column-orientation. Most tabular databases are row-oriented. This means that all of the data for a row is stored together. Such storage can be convenient if rows are frequently changed, but it results in slow queries because the whole row must be processed by the database, even if only one element is needed.
Deephaven is column-oriented. This means that all of the data for a column is stored together. As a result, Deephaven needs to read less data to perform a query. Only the required data is read. Also, because data does not need to be moved when adding a new column, adding columns to a Deephaven table, for example by using `update(...)`, is very fast.
This figure illustrates a table in both the row-oriented and column-oriented format. If we filter the table to the rows where column B is odd (`t.where("B%2==1")`), the column-oriented representation only reads the 3 values from the B column, while the row-oriented representation must read almost the entire table in order to access the 3 values it needs to perform the query.
Now that you understand how column-orientation works, let’s look at a simple performance tuning example.
from deephaven.TableTools import emptyTable

t1 = emptyTable(10).update("X = i", "Y = X*X")
t2 = t1.where("X > 2", "X < 6")
t3 = t1.where("X > 2 && X < 6")
Here, tables `t2` and `t3` contain the same values. Which is faster to compute? Because Deephaven is column-oriented, neither `t2` nor `t3` must read the values in column `Y`. Both `t2` and `t3` must read the values in column `X`, but because `t2` breaks the `X` filter into two, it reads column `X` twice. `t3` is faster!
Note
Simple `X>2`-type filters do not need to be compiled, but `X>2&&X<6`-type filters do. This could potentially add a few milliseconds to the run time. This will only matter for very small data sets where the compilation time is large relative to the run time.

If the column being filtered on is a partition column or a grouping column, the more complex filter (e.g., `X>2&&X<6`) may throw out all of the partition/grouping information, resulting in a slower query.
Next, let’s look at Deephaven’s tree-like table representation. A single Deephaven table can contain billions of rows and petabytes of data. Loading all of this data, when your query only needs a tiny bit, would be inefficient. To minimize the data a query needs to access, Deephaven uses a tree-like representation.
This figure shows the tree. The top branches of the tree are known as partitioning columns, the lower branches of the tree are known as grouping columns, and finally, at the bottom, is the actual data. Partition and grouping columns may not be present in all tables. For example, real-time tables frequently have partition columns but no grouping columns.
To make this abstract concept concrete, let’s look at the "StockTrades" table in the "LearnDeephaven" namespace. Every table has metadata associated with it. This metadata describes all of a table’s columns, including whether a column is partitioning, grouping, or normal.
m = db.t("LearnDeephaven", "StockTrades").getMeta()
Here you can see that Date is a partitioning column, USym and Sym are grouping columns, and the other columns are normal columns. As a tree, this table looks like:
To produce efficient queries, you want to read as little data as possible as well as perform as few calculations as possible. The tree lets you do this.
Important
Just order your `where` clauses as follows:
- partitioning
- grouping
- normal
Here are five queries that produce the same result. Which is the fastest? Which is the slowest?
t = db.t("LearnDeephaven", "StockTrades")
t1 = t.where("Date = `2017-08-25`", "Sym = `AAPL`", "Last > 159.0")
t2 = t.where("Sym = `AAPL`", "Date = `2017-08-25`", "Last > 159.0")
t3 = t.where("Sym = `AAPL`", "Last > 159.0", "Date = `2017-08-25`")
t4 = t.where("Last > 159.0", "Date = `2017-08-25`", "Sym = `AAPL`")
t5 = t.where("Last > 159.0", "Sym = `AAPL`", "Date = `2017-08-25`")
They are actually ordered from fastest to slowest. The filters in `t1` are ordered with (1) partitioning, (2) grouping, and (3) normal. By ordering the clauses like this, massive sections are removed from the tree with each consecutive filter, and very little data needs to be read. On the other hand, the least efficient query, `t5`, orders the filters with normal first, grouping second, and partitioning third. This order causes the entire table to be read. Ouch.
Like filters, joins can be sensitive to ordering. By ordering your join columns in partitioning, grouping, normal order, your queries will be fastest.
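For example, here is a sketch of an as-of join keyed in that order. (I am assuming the LearnDeephaven namespace also contains a StockQuotes table with Timestamp, Bid, and Ask columns; substitute your own right table.)

trades = db.t("LearnDeephaven", "StockTrades").where("Date=`2017-08-25`")
quotes = db.t("LearnDeephaven", "StockQuotes").where("Date=`2017-08-25`")

# Key order: Date (partitioning), Sym (grouping), then Timestamp (normal),
# which is the column matched as-of.
withQuotes = trades.aj(quotes, "Date,Sym,Timestamp", "Bid,Ask")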
By understanding Deephaven’s tree-like representation and using `getMeta()` to find column types, you can easily order your filters and joins for optimal performance. Also, by thinking in terms of trees, you can determine the most efficient way to represent your data as a tree.
Deephaven is constantly improving the system; column ordering may become less important in the future, but if you always order your columns in partitioning, grouping, normal order, you will always have the fastest performance.
Choose The Right Quotes
The Deephaven Query Language (DQL) uses three types of quotes: ' (single quote), " (double quote), and ` (backtick). Choosing the right quote can be confusing. Let’s explore where each should be used.
DQL query strings are strings. In Python, strings can be defined with either a single quote or a double quote. Both yield valid query strings, as illustrated in this example.
from deephaven.TableTools import emptyTable
t1 = emptyTable(10).update("X = i", 'Y = X*X')
Backticks surround strings within a query string. This is easier to understand with an example.
from deephaven.TableTools import emptyTable
t1 = emptyTable(10).update("X = `A`",'Y = (i%2==0) ? `P` : `Q`')
t2 = db.t("LearnDeephaven","StockTrades").where("USym in `SPY`,`AAPL`,`AMZN`")
In table `t1`, column X is created and set equal to the string “A”, and column Y is set equal to either the string “P” or the string “Q”. In table `t2`, the StockTrades table is filtered to only include the rows where the USym column is equal to the strings “SPY”, “AAPL”, or “AMZN”.
Times, time periods, and characters are the most complex quoting case. Similar to strings, single quotes surround times, time periods, or characters within a query string.
from deephaven.TableTools import emptyTable
t1 = emptyTable(10).update("X = '2019-01-23T12:15 NY'", "Y = X + '00:10:00'", "Z=X + '1y'", "A='C'")
In this example, column X is created and is set equal to a date-time. Column Y is also a date-time and is set equal to 10 minutes after the date-time stored in X. Column Z is a date-time and is set equal to one year after the date-time stored in X. Column A is a character. Single quotes are used to surround the date-time, the time difference, the time period, and the character. (Note: Date columns are frequently stored as strings, which use backticks.)
There is one catch. You need to use double quotes for your query strings if you are going to use single quotes within the query strings.
For example:
s1 = "This is 'OK'"
s2 = 'This is 'NOT''   # broken: the inner single quotes end the string early
There is no downside to using double quotes for query strings. When in doubt, use double quotes around query strings. You can use single quotes with escapes, but they are more difficult to read.
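If you must use single quotes around a query string that itself contains single quotes, escape the inner quotes:

s3 = 'This is \'OK\', but harder to read'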
Let’s wrap up with an advanced problem. Here, all three types of quotes are used in one query string. What does the query do?
from deephaven.TableTools import emptyTable
t1 = emptyTable(10).update("X = '2019-01-23T12:15 NY' - convertDateTime(`2019-01-21T06:15 NY`)")
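(Hint: the double quotes delimit the query string, the backticks delimit the string handed to `convertDateTime`, and the single quotes delimit the date-time literal. Column X is the difference between the two date-times.)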
Looping: Don’t Do It!
Common programming languages, such as Python, Java, C, C++, and Go, have instructions that operate upon a single piece of data at a time. In order to operate upon multiple pieces of data, users must write loops. Deephaven is similar to NumPy, TensorFlow, or PyTorch, where a single command operates upon multiple pieces of data.
Because looping is so common in standard programming languages, new Deephaven users rely on loops to perform calculations. Their code often looks like:
from deephaven.TableTools import merge

t1 = db.t("LearnDeephaven", "StockTrades")

result = None
for date in t1.selectDistinct("Date").getColumn("Date").getDirect():
    for sym in t1.selectDistinct("Sym").getColumn("Sym").getDirect():
        t2 = t1.where(f"Date=`{date}`", f"Sym=`{sym}`")
        result = merge(result, t2.maxBy())
This code is fairly complex. It has multiple loops, multiple queries, and multiple table merges. It is slow. It also does not work correctly as a dynamic query. The loops iterate over the values in the Date and Sym columns. If a new value is added to the Sym column, the loop does not handle the dynamic data change, and the loops no longer iterate over the correct data. Oops.
A more experienced Deephaven user would write the same query as:
t = db.t("LearnDeephaven", "StockTrades").maxBy("Date","Sym")
This piece of code is:
- much easier to read ✅
- faster ✅
- dynamically updates ✅
Win, Win, Win.
Every time you write a loop or call `getColumn(...)` in your Deephaven code, ask yourself how you can accomplish the same goal without the loop and without directly accessing the contents of the Deephaven table. Most changes will involve using key columns in aggregations and join columns in joins. At first, these changes will seem foreign, but soon they will become natural.
As with most rules, there are a few exceptions to the “don’t use loops” rule. Most commonly, the rule is broken to get around RAM limitations. If you are applying complex calculations to a huge dataset, your calculation may exhaust the RAM available to your job. To avoid RAM limitations, a loop can be used to break the massive dataset into smaller, more manageable datasets, which fit sequentially into RAM. Such exceptions to the rule are rare.
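For illustration, here is a minimal sketch of that pattern. (The date list and the per-date aggregation are placeholders; choose whatever chunking fits your data.)

from deephaven.TableTools import merge

t = db.t("LearnDeephaven", "StockTrades")

# Process one date partition at a time, so only one day of data
# must fit in RAM at once.
result = None
for date in ["2017-08-23", "2017-08-24", "2017-08-25"]:
    partial = t.where(f"Date=`{date}`").view("Sym", "Last").avgBy("Sym")
    result = merge(result, partial)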
Select, Update, View, UpdateView, or LazyUpdate?
`update(...)`, `updateView(...)`, and `lazyUpdate(...)` add new columns to a table. `select(...)` and `view(...)` retain a subset of columns from a table, as well as adding new columns. While it may seem redundant to provide the same functionality multiple ways, each method uses RAM and CPU differently. This allows you to choose the method with the best performance characteristics for your use case.
Note
See also: Select methods
First, let’s look at `update(...)`, `updateView(...)`, and `lazyUpdate(...)`.
from deephaven.TableTools import emptyTable
t = emptyTable(10)
t1 = t.update("X = i%2", "Y = 2*X+3")
t2 = t.updateView("X = i%2", "Y = 2*X+3")
t3 = t.lazyUpdate("X = i%2", "Y = 2*X+3")
In this example, the contents of `t1`, `t2`, and `t3` are identical. All three are tables with two columns, X and Y, plus 10 rows of identical data.

In the case of `t1`, `update(...)` is used. `update(...)` causes RAM for the two columns to be allocated, `ncols * nrows * sizeof(int)` bytes, and then the formulas are evaluated once to populate the RAM. Every time a cell from `X` or `Y` in `t1` is accessed, the value is retrieved from the values already stored in RAM. This can be good if your table is small, your formulas are expensive to calculate, or if your cells are accessed many times. But, you may run out of RAM if your table is too large.
In the case of `t2`, `updateView(...)` is used. `updateView(...)` creates a formula column. No RAM is allocated and no evaluations are performed. But, every time a cell from `X` or `Y` in `t2` is accessed, the formula is recomputed to get the value. This can be good if your table is large or if your cells are fast to compute. But, if your formulas are slow to compute and are accessed many times, your performance will suffer.
Finally, `lazyUpdate(...)` is a compromise between `update(...)` and `updateView(...)`. `lazyUpdate(...)` evaluates the formulas on demand and memoizes the results. If the formula is called again with the same inputs, the memoized value is used, and the formula is not reevaluated. In the example, `X` is computed from `i` values from 0 to 9, so 10 values are computed and memoized. `Y` is computed from `X` values of 0 and 1, so only 2 values are memoized. As you can see, `lazyUpdate(...)` is especially suited for expensive formulas that are repeatedly evaluated with the same inputs, as in the case of `Y`. It is not good for cases where formulas are evaluated with many different inputs, as in the case of `X`. Also, beware of formulas where the inputs do not map to unique outputs.
`select(...)` and `view(...)` are similar to the operations above. `select(...)` allocates RAM and computes values for each cell in the result table. When cells are accessed, the values are retrieved from the already computed values in memory. `view(...)` does not allocate RAM, does not compute cell values, and creates formula columns. Every time a cell is accessed, the formula is reevaluated.
Now, we can add `select(...)` and `view(...)` to the example.
from deephaven.TableTools import emptyTable
t = emptyTable(10)
t1 = t.update("X = i%2", "Y = 2*X+3")
t2 = t.updateView("X = i%2", "Y = 2*X+3")
t3 = t.lazyUpdate("X = i%2", "Y = 2*X+3")
t4 = t1.select("Y")
t5 = t1.view("Y")
t6 = t2.select("Y")
t7 = t2.view("Y")
t8 = t3.select("Y")
t9 = t3.view("Y")
This results in a very rich case. `t1`, `t2`, and `t3` have identical contents. `t4`, `t5`, `t6`, `t7`, `t8`, and `t9` have identical contents. Each is computed differently. For example, `t4` has column Y, which is a copy of the column Y already in memory from `t1`, and `t5` has column Y, which is just a reference to the column Y already in memory from `t1`. Take some time to think through each case. The insight will help you think through your own queries.
High Performance Filters
The secret to computational efficiency is being lazy: doing as little work as possible. For Deephaven, this means operating on the minimum possible amount of data.
Note
See also: Filter
For most use cases, `where(...)` is the most common way to filter a table. But the same logic can be expressed in different ways.
Caution
This query allocates a billion-row table in memory and so requires at least 8GB of heap. To see the difference in each table's evaluation time, run the filter statements separately.
from deephaven.TableTools import emptyTable

t = emptyTable(1_000_000_000).update("X = i", "Y = X*X")
t1 = t.where("X > 2").where("X < 6")
t2 = t.where("X > 2", "X < 6")
t3 = t.where("X > 2 && X < 6")
In this query, `t1`, `t2`, and `t3` produce the same result. However, `t3` is the most efficient, because column X is read only once; `t1` and `t2` read column X twice (see Data Layout). Furthermore, `t2` is better than `t1`, because future improvements in the Deephaven query language may allow reorderings and other optimizations for `t2` that are not possible for `t1`.
When run, `t3` completes in under a second. (Note: the console will not print an elapsed time if a query takes less than 1 second.)
Similarly, filtering performance is dictated by how a table is structured. It is most efficient to filter partition columns before grouping columns and grouping columns before normal columns. See Data Layout for more details.
Many filters can be expressed as either a conditional filter (e.g., `where("X==1 || X==2 || X==3")`) or a match filter (e.g., `where("X in 1,2,3")`). When a filter can be expressed either way, use the match filter. Deephaven can perform more advanced optimizations for match filters, and you should see higher performance.
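For example:

from deephaven.TableTools import emptyTable

t = emptyTable(100).update("X = i % 10")
t1 = t.where("X==1 || X==2 || X==3")   # conditional filter
t2 = t.where("X in 1,2,3")             # equivalent match filter (preferred)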
Finally, your filtering criteria may dynamically change or may be driven by values in a table. For example, an input table may contain a list of values a user cares about. The user may add or remove items from this list during the day. `where(...)` cannot handle this dynamic case. Instead, you will need to filter using `whereIn(...)` or `whereNotIn(...)`. To see this in action, try this query.
from deephaven.TableTools import emptyTable, timeTable

t1 = emptyTable(10).update("X=i%2", "Y=i*i")
t2 = timeTable("00:00:02").update("X=i%2").lastBy()
t3 = t1.whereIn(t2, "X")
Important
To summarize, when performance tuning filtering:

- filter first, to minimize the data being computed upon,
- order your filters in partition, grouping, normal order,
- minimize the number of times a column is read,
- combine filters within a single `where(...)`, instead of chaining multiple `where(...)`s together,
- prefer match filters to conditional filters, and
- use `whereIn()` and `whereNotIn()` for dynamic filtering.
? Is Your Friend
The ternary operator `?` allows an if-else block to be expressed in a single line of code.
from deephaven.TableTools import emptyTable
t = emptyTable(10).update("X=i", "Y=X*X")
t1 = t.update("Z = (X%2==0) ? Y : 42")
The query string "Z = (X%2==0) ? Y : 42"
sets Z
equal to Y
, if X
is even, or equal to 42 otherwise. Using the ternary operator results in concise queries and saves you from writing functions.
Note
See also: Conditional Operations
Aggregations: Group, Compute, Ungroup
When analyzing data, it is common to group related data together and then perform calculations on the groups. For example, when analyzing the weather, you may want to group data by day and then compute the high and low temperatures for each day. In the Deephaven lingo, such operations are called “aggregations.”
The simplest aggregation is an aggregation applied to an entire table.
from deephaven.TableTools import emptyTable
t = emptyTable(100).update("X=(i%2==0)?`A`:`B`", "Y=i%3", "Z=i")
t1 = t.maxBy()
t2 = t.maxBy("X")
t3 = t.maxBy("Y")
t4 = t.maxBy("X", "Y")
This example computes the maximum value for a column using four different aggregations. `t1` computes the maximum value of each column for the whole table. `t2` and `t3` compute the maximum values where the data is grouped by `X` or `Y`, respectively. Finally, you can group by multiple key columns: `t4` computes the maximum values where the data is grouped by both `X` and `Y`. Dedicated aggregations are available for common operations, such as `max`, `min`, `first`, `last`, `count`, etc.
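For instance, continuing with the table above:

from deephaven.TableTools import emptyTable

t = emptyTable(100).update("X=(i%2==0)?`A`:`B`", "Y=i%3", "Z=i")
t5 = t.minBy("X")          # minimum of each column, grouped by X
t6 = t.firstBy("X")        # first row for each value of X
t7 = t.lastBy("X")         # last row for each value of X
t8 = t.countBy("N", "X")   # row count for each value of X, in column N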
Frequently, you may want to perform multiple different types of aggregations on the same table, for example, `max` and `min`. While you could compute `maxBy(...)` and `minBy(...)` tables and join the results, it is more efficient to compute the aggregations all at once using an `AggCombo`.
from deephaven.TableTools import emptyTable
from deephaven.ComboAggregateFactory import AggCombo, AggMin, AggMax, AggAvg, AggCount
t = emptyTable(100).update("X=(i%2==0)?`A`:`B`", "Y=i%3", "Z=Math.random()+Y")
t1 = t.by(AggCombo(AggMin("ZMin=Z"), AggMax("ZMax=Z"), AggAvg("ZAvg=Z"), AggCount("Count")))
t2 = t.by(AggCombo(AggMin("ZMin=Z"), AggMax("ZMax=Z"), AggAvg("ZAvg=Z"), AggCount("Count")), "X")
t3 = t.by(AggCombo(AggMin("ZMin=Z"), AggMax("ZMax=Z"), AggAvg("ZAvg=Z"), AggCount("Count")), "Y")
t4 = t.by(AggCombo(AggMin("ZMin=Z"), AggMax("ZMax=Z"), AggAvg("ZAvg=Z"), AggCount("Count")), "X", "Y")
`AggCombo`s can include simple operations, such as min or max, as well as more complex operations, such as standard deviation, weighted average, or percentiles.
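For example, here is a sketch of a combo using those operations. (I am assuming the `AggStd`, `AggWAvg`, and `AggPct` factory methods here; check the ComboAggregateFactory documentation for the full list.)

from deephaven.TableTools import emptyTable
from deephaven.ComboAggregateFactory import AggCombo, AggStd, AggWAvg, AggPct

t = emptyTable(100).update("X=(i%2==0)?`A`:`B`", "W=i%5+1", "Z=Math.random()")
t1 = t.by(AggCombo(
    AggStd("ZStd=Z"),          # standard deviation of Z
    AggWAvg("W", "ZWAvg=Z"),   # average of Z, weighted by W
    AggPct(0.5, "ZMed=Z")      # 50th percentile (median) of Z
), "X")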
Finally, data can be aggregated into arrays using `by(...)`. Arrays allow more complex and customized calculations.
Note
See also: Arrays In The Query Language
from deephaven.TableTools import emptyTable

t = emptyTable(100).update("X=(i%2==0)?`A`:`B`", "Y=i%3", "Z=i*i")

def custom_calc(x):
    v1 = x.get(1)
    v2 = x.get(x.size()-2)
    return 0.5*(v1+v2)

t1 = t.by("X","Y")
t2 = t1.update("Ave=avg(Z)","CC=(double)custom_calc.call((Object)Z)")
t3 = t2.ungroup()
t4 = t3.update("Deviation=Z-Ave")
This example groups the data using X,Y-keys, computes a custom value, and then ungroups the result. `ungroup(...)` can be thought of as the inverse of `by(...)`: it undoes grouping.
Both the group-compute and group-compute-ungroup design patterns are very powerful. Use them to make your code cleaner and more efficient. Be aware that recomputing custom calculations may be expensive if arrays are large. `AggCombo`s are designed to update incrementally, without reanalyzing the entire dataset, so they are very efficient.
Choose The Right Join
Joins provide a way to combine results from multiple tables into a single resulting table. Many new users gravitate towards the `join(...)` method, because it sounds like what they want to do. However, Deephaven provides many types of joins. While `join(...)` will frequently accomplish the desired objective, there are other, similar joins which are more efficient.
Note
See also: Joining Data
It is worth the effort to read through the documentation on joins. Each join has a purpose. Knowing all of the joins is powerful.
- `join(...)` computes the cartesian join, where every row of one table is joined to every row of another table. `join(...)` is useful if the cartesian product is desired, but it is less efficient than `naturalJoin(...)` or `exactJoin(...)` for adding data to an existing table.
- `naturalJoin(...)` appends values from the right table onto the left table. Nulls are appended for missing values. `naturalJoin(...)` is the most commonly used join.
- `exactJoin(...)` is like `naturalJoin(...)`, but `exactJoin(...)` requires that all values be present. Missing values in the right table result in an error.
- `leftJoin(...)` joins all matching values in the right table onto the left table as an array.
- `aj(...)` and `raj(...)` are timeseries joins. They join on values immediately before or immediately after a key. The key can have multiple items; only the last item is matched approximately (as-of), while all other items are matched exactly.
Each join has particular requirements. It may be that the right table must have only one instance of each key, or maybe the table needs to be sorted a certain way. Before jumping in, make sure you understand the requirements for a join.
Give joins a try:
from deephaven.TableTools import emptyTable
leftTable = emptyTable(1).update("A=new String[]{`A`,`B`,`C`}", "B=new int[]{3,4,5}").ungroup()
rightTable = emptyTable(10).update("X=(i%2==0)?`A`:`B`", "Y=i")
join = leftTable.join(rightTable,"A=X", "Y")
naturalJoin = leftTable.naturalJoin(rightTable.lastBy("X"),"A=X", "Y")
exactJoin = leftTable.where("A!=`C`").exactJoin(rightTable.lastBy("X"),"A=X", "Y")
leftJoin = leftTable.leftJoin(rightTable,"A=X", "Y")
asOfJoin = leftTable.aj(rightTable,"A=X,B=Y")
reverseAsOfJoin = leftTable.raj(rightTable,"A=X,B=Y")
Handling NULL Values
Datasets frequently have missing values. In Deephaven, missing values are represented as null values. For performance reasons, each data type represents nulls differently. Complex data types, such as Objects, are stored as standard “null” references, `None` in Python. For primitive types, such as doubles or integers, a single value from the type’s range is designated to be the null value.
A simple script can be used to see what the null values are:
from deephaven.conversion_utils import NULL_BYTE, NULL_SHORT, NULL_INT, NULL_LONG, NULL_FLOAT, NULL_DOUBLE

nulls = {"byte": NULL_BYTE, "short": NULL_SHORT, "int": NULL_INT, "long": NULL_LONG, "float": NULL_FLOAT, "double": NULL_DOUBLE}
print(nulls)
Type | Value |
---|---|
Byte | -128 |
Short | -32768 |
Int | -2147483648 |
Long | -9223372036854775808 |
Float | -3.4028234663852886e+38 |
Double | -1.7976931348623157e+308 |
from deephaven.TableTools import emptyTable
from deephaven.conversion_utils import NULL_INT

def f(i:int) -> int:
    if i%3 == 0:
        return NULL_INT
    return i

def g(i:int) -> str:
    if i%3 == 1:
        return None
    return "A"

t = emptyTable(10) \
    .update( \
        "X=(i%2==0) ? NULL_INT : i",
        "Y=(double)X",
        "Z=(i%2==1) ? null : `A`",
        "F = (int) f.call(i)",
        "G = (String) g.call(i)")

t1 = t.where("isNull(X)")
t2 = t.where("!isNull(Z)")
t3 = t.where("!isNull(F)")
Here, column `X` contains null values, which result from using `NULL_INT` in the Deephaven Query Language. Column `Y` shows that `NULL_INT` is automatically converted to `NULL_DOUBLE` by a DQL type cast. Column `Z` shows null being used as the null value for a string object. Finally, columns `F` and `G` show user-created functions returning null values. Note that Python’s `None` value is used for null objects in user-created functions. All of these columns can be filtered using the built-in `isNull(...)` function.
When writing your own functions, think carefully about how missing data (nulls) should be treated. Will there be null inputs? Should null values be returned?
from deephaven.TableTools import emptyTable
from deephaven.conversion_utils import NULL_INT

def f(i:int) -> int:
    if i == NULL_INT:
        return 200
    return i

t = emptyTable(10) \
    .update( \
        "X=(i%2==0) ? NULL_INT : i",
        "Y = nullToValue(X,100)",
        "Z = (int) f.call(X)",
    )
This example shows a built-in function, `nullToValue(...)`, that replaces null values with 100. It also shows a user-defined function that replaces null values with 200.
Deephaven’s built-in functions are designed to work with null. Make sure that your functions also handle nulls, or you will get some funky results.
from deephaven.TableTools import emptyTable

def f(i:int) -> int:
    # No null handling: a NULL_INT input yields NULL_INT + 1, a garbage value.
    return i+1

t = emptyTable(10) \
    .update( \
        "X=(i%2==0) ? NULL_INT : i",
        "Y = (int) f.call(X)",
    )
Java Inside
Most user interactions with Deephaven happen via a programming language such as Python or Groovy, yet, under the hood, the Deephaven Query Engine is written in Java. Occasionally, it is beneficial to use the “Java inside.”
from deephaven.TableTools import newTable, intCol, stringCol
t1 = newTable(intCol("X",1,2,3), stringCol("Y","A/B","C/D","E/F"))
t2 = t1.update("A=java.lang.Math.sin(X)","B=Y.split(`/`)[1]")
In this query, Java is used twice in the query strings. First, the `sin(...)` method from the `java.lang.Math` class is used to compute column `A`. If you find a Java library that implements functionality you need, just place it on the classpath, and it is available in the query language. (See Query Language for more details.) Second, Deephaven Query Language strings are `java.lang.String` objects. Here the `split(...)` method plus an array access have been used to retrieve the second token in the string. This inline use of Java String methods eliminates the need to write a custom string-processing function.
Because of the deep integration of Java within the Deephaven environment, it is possible to create Java objects within a Python query. This technique isn’t often used, but it is occasionally helpful when using a Java package.
from deephaven.TableTools import emptyTable
from deephaven import jpy
ArrayList = jpy.get_type("java.util.ArrayList")
al = ArrayList()
al.add(1)
al.add(3)
print(al)
t = emptyTable(10).update("X=i", "Y=X*X")
t1 = t.where("al.contains(X)")
In this Python example, a Java ArrayList is created and used in a query filter. You now have the power of Python, Java, and Deephaven available within a single query!
Arrays In The Query Language
When performing complex analyses, arrays are an invaluable tool. Arrays group related data together, and arrays provide an easy way to access offset data from a time series. Arrays are built into the Deephaven Query Language.
Every column in a table has an associated array variable. For example, the data in column `X` can also be accessed through the `X_` array.
from deephaven.TableTools import emptyTable
t = emptyTable(10).update("X=i")
t1 = t.update("A = X_[i-1]", "B = X_[i+1]", "C = X_.size()")
Here, columns `A` and `B` are offset values of column `X`. Column `C` contains the size of column `X`. Column `i` is a special, built-in column which contains the row number and is useful for accessing arrays. Because column `i` is expensive to compute for ticking tables, it is only available for static tables. As a result, accessing a column’s values through the `_`-operator is used almost exclusively for analyzing static data.
Arrays can also be created when using `by(...)` to group data.
from deephaven.TableTools import emptyTable
from deephaven import jpy

def sum_square(x):
    sum = 0
    for i in range(x.size()):
        sum += x.get(i)*x.get(i)
    return sum

DbIntArray = jpy.get_type("com.illumon.iris.db.tables.dbarrays.DbIntArrayDirect")
db.importClass(DbIntArray.jclass)

def cumsum(x):
    # Cumulative sum: element k is the sum of the first k+1 values.
    vals = []
    sum = 0
    for i in range(x.size()):
        sum += x.get(i)
        vals.append(sum)
    return DbIntArray(vals)

t1 = emptyTable(10).update("X=(i%2==0)?`A`:`B`", "Y=i")
t2 = t1.by("X")
t3 = t2.update( \
    "A=Y[3]",
    "B=Y[4]",
    "C=Y.subArray(2,4)",
    "D=C[1]",
    "E=sum(C)",
    "F=(int) sum_square.call((Object)Y)",
    "G=(DbIntArrayDirect) cumsum.call((Object)Y)")
In this example, table `t2` is grouped by the values contained in column `X`. This results in column `Y` containing arrays. These arrays can be indexed (columns `A` and `B`), sliced (columns `C` and `D`), operated upon by built-in functions (column `E`), or operated upon by user functions (columns `F` and `G`). The user function `cumsum(...)` even returns an array.
Use the power of arrays in the Deephaven Query Language to make your queries more powerful and concise.
Use User Tables
As a Deephaven user, you will want to create your own tables. These tables may contain the results of long-running calculations or imported data. Deephaven User Tables are designed for these cases. User Tables are created by you, the user, and can have access controls to limit which other users can view them.
Note
See also: Managing Data
User Tables can be one of a few types: partitioned, unpartitioned, or real-time. Unpartitioned user tables are for storing simple tables, without partitioning columns. Unpartitioned tables can be useful for simple cases, such as storing data from a CSV as a table. Partitioned user tables are for storing larger tables with a tree-like structure. Partitioned tables are useful for nightly persistent queries that add new data for each day (a partition). Real-time, centrally managed user tables are for persisting and sharing dynamic tables. Real-time user tables are always partitioned.
See Data Layout for more details on partitioning.
from deephaven.TableTools import emptyTable

t = emptyTable(100).update("X=(i%2==0)?`A`:`B`", "Y=i%3", "Z=i*i")

namespace = "ExampleNamespace"
tablename = "ExampleTable"

if db.hasTable(namespace, tablename):
    db.rm(namespace, tablename)

db.replaceTable(namespace, tablename, t)
t2 = db.t(namespace, tablename)
This example creates an unpartitioned user table named ExampleTable in the ExampleNamespace. First the user table is deleted, using `db.rm(...)`, if it already exists, and then a new table is created using `db.replaceTable(...)`. Finally, the saved table is loaded using `db.t(...)`. Because the table is persisted, `db.t(...)` can now be used to access the table in any query with access to the table. For partitioned user tables, `db.replaceTablePartition(...)` is used in place of `db.replaceTable(...)`. Real-time user tables use `db.appendCentral(...)` to write tables and `db.i(...)` to read tables.
Clean Chained Queries
Query operations can be chained together to express complex ideas.
from deephaven.TableTools import emptyTable
t1 = emptyTable(10).update("X = i", "Y = X*X").where("Y > 3", "X > 2")
While powerful, these chained queries can quickly become difficult to read. Apply some formatting to clean things up.
from deephaven.TableTools import emptyTable

t5 = emptyTable(10) \
    .update(
        "X = i",
        "Y = X*X",
    )\
    .where(
        "Y > 3",
        "X > 2",
    )
If you want to include comments between your chained query operations or avoid trailing slashes, surround the code block with parentheses.
from deephaven.TableTools import emptyTable

t5 = (emptyTable(10)
    # Comment 1
    .update(
        "X = i",
        "Y = X*X",
    )
    # Comment 2
    .where(
        "Y > 3",
        "X > 2",
    )
)
With some practice, you will get a feeling for how to format chained query operations for maximum readability.
Spreadsheet On Steroids
Spreadsheets are popular because a user can change a value in a cell, and the whole sheet recomputes to reflect the change. The same is possible with Deephaven. Deephaven Input Tables allow users to add, modify, and delete table values from within the GUI. The resulting table is a dynamic table, which updates with each change. Now you have the power of Deephaven combined with GUI-driven user input. A spreadsheet on steroids.
Note
At the time of this writing, GUI interactions with Input Tables are only possible in Deephaven Classic. The feature is coming soon to Deephaven Web.
Note
See also: Input Tables
from deephaven import jpy, InputTable, LiveInputTableEditor
from deephaven.InputTableTools.TableInputHandler import c, cKey

namespace = "ExampleNamespace"
tablename = "ExampleInputTable"

# Create the input table
if not db.hasTable(namespace, tablename):
    InputTable.newInputTable(db, namespace, tablename,
        cKey("Key", jpy.get_type("java.lang.String").jclass),
        c("Val", jpy.get_type("java.lang.Integer").jclass))

# Load the input table
it = InputTable.inputTable(db, namespace, tablename)

# Input table log
itl = db.i(namespace, tablename)

# LITE GUI table editor widget
lite = LiveInputTableEditor(it)
In this example, an input table is created, if it is not already present. Once the input table has been created, `it` can be loaded by any script with access to the table. Here `it` is the input table. It is a dynamic table that can be used like any other table in queries. `itl` is the input table log, which contains the history of all changes to the input table. For example, the log will show you added, deleted, or modified rows in the table. Like other user-created tables, input tables can be deleted using `InputTable.remove(...)`.
To change values in the input table via the GUI, navigate to the input table or a LiveInputTableEditor (LITE) for the input table. From the table or LITE, you can modify values, add new rows, etc.
Self Help
To be productive, it is important to know where to find help. First, Deephaven has extensive documentation, which can be found at https://deephaven.io/enterprise/docs/. Python documentation can also be found using the built-in `help(...)` and `dir(...)` methods:
import deephaven
help(deephaven)
print(dir(deephaven))
help(deephaven.SmartKey)
Performance Tables
When a query is running, logging and performance information is written to tables in the DbInternal namespace. These dynamic tables are a godsend when trying to determine why a query failed or why a query is not performing as expected.
log = db.i("DbInternal", "ProcessEventLog").where("Date=currentDateNy()")
perf = db.i("DbInternal", "QueryPerformanceLog").where("Date=currentDateNy()")
The ProcessEventLog and the QueryPerformanceLog are examples of useful DbInternal tables.
Note
See also: Internal Tables
Browser Logs
Finally, errors occasionally occur in the browser when using the Deephaven Web user interface. Details on these errors can be obtained from the browser console logs. Browsers have built-in tools to aid web developers. These tools allow you to view the console logs. Google “developer tools” for your browser to find out more.
If you are still unable to solve your problem, reach out to Deephaven Support using web chat, Slack, email, or phone. The team is there to help.
Conclusion
Now that you have made it this far, you will be able to think in Deephaven and write clean and efficient queries. You are ready for more advanced topics, such as parallelization or data replay.