Find and interpret errors in queries
As with any programming language, when writing queries in Deephaven Query Language you will certainly make mistakes. Mistakes will often result in queries failing to start up properly or crashing at some point during their runtime. This guide aims to provide you with a set of best practices to use when interpreting and fixing these issues.
My (Persistent) Query won’t Start!
At this point, you have gone through the process of creating a query through one of the Deephaven frontends (Web and Swing). You have configured the query type and scheduling and written some DQL that should produce some interesting tables and visualizations, but when you attempt to start the query it fails.
At this point you will see something like this:
The first step is to find the error (referred to as an Exception) and isolate its root cause. In the Web UI, simply select the query and click “Show More” in the exception panel. (In the Swing UI, locate the “ExceptionDetails” column, right click and select “Show Exception”.)
This will display a large block of text that describes the exception and a chain of further exceptions that leads to the root of the problem.
For example
com.illumon.iris.db.tables.remotequery.QueryException: Failed to execute setup query for config Example Triage[1599758438077000002.4]
at com.illumon.iris.controller.PersistentQuery$InitializationJob.run(PersistentQuery.java:445)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: com.illumon.iris.db.tables.remotequery.RemoteQueryException: Unexpected error while processing query
at com.illumon.iris.db.tables.remotequery.QueryProcessorConnectionImpl.executeQuery(QueryProcessorConnectionImpl.java:715)
at com.illumon.iris.db.tables.remotequery.QueryProcessorConnectionImpl.executeQuery(QueryProcessorConnectionImpl.java:703)
at com.illumon.iris.db.tables.remotequery.QueryProcessorConnection.executeQuery(QueryProcessorConnection.java:116)
at com.illumon.iris.controller.PersistentQuery$InitializationJob.run(PersistentQuery.java:443)
... 3 more
Caused by: io.deephaven.DeephavenException: remote call or answer conversion had an exception: com.illumon.iris.db.tables.remotequery.RemoteQueryException: Error while executing query: com.illumon.iris.controller.exception.ScriptException: Failed to run script
at com.illumon.iris.db.tables.remotequery.QueryProcessorConnectionImpl$3.handleCommand(QueryProcessorConnectionImpl.java:769)
at com.fishlib.net.impl.nio.RoundTripTracker$AsyncRoundTrip.finish(RoundTripTracker.java:53)
at com.fishlib.net.impl.nio.RoundTripTracker.matchResponse(RoundTripTracker.java:98)
at com.fishlib.net.impl.nio.Client$RemoteConnection.handleIncoming(Client.java:289)
at com.fishlib.net.impl.nio.CommandConnection.handleIncoming(CommandConnection.java:178)
at com.fishlib.io.sched.IOJobImpl.notifyIncoming(IOJobImpl.java:1664)
at com.fishlib.io.sched.IOJobImpl.fillAndHandle(IOJobImpl.java:532)
at com.fishlib.io.sched.IOJobImpl.invoke(IOJobImpl.java:387)
at com.fishlib.io.sched.YASchedulerImpl.dispatch(YASchedulerImpl.java:703)
at com.fishlib.io.sched.YASchedulerImpl.work(YASchedulerImpl.java:789)
at com.fishlib.net.impl.nio.FastNIODriver.run(FastNIODriver.java:190)
... 1 more
Caused by: com.illumon.iris.db.tables.remotequery.RemoteQueryException: Error while executing query: com.illumon.iris.controller.exception.ScriptException: Failed to run script
at com.illumon.iris.db.tables.remotequery.RemoteQueryUtils.checkResponse(RemoteQueryUtils.java:159)
at com.illumon.iris.db.tables.remotequery.RemoteQueryUtils.checkResponse(RemoteQueryUtils.java:120)
at com.illumon.iris.db.tables.remotequery.QueryProcessorConnectionImpl$3.handleCommand(QueryProcessorConnectionImpl.java:761)
... 11 more
Caused by: com.illumon.iris.controller.exception.ScriptException: Failed to run script
at com.illumon.iris.controller.ScriptSetupQuery.execute(ScriptSetupQuery.java:138)
at com.illumon.iris.controller.ScriptSetupQuery.execute(ScriptSetupQuery.java:71)
at com.illumon.iris.controller.ScriptSetupQuery.execute(ScriptSetupQuery.java:37)
at com.illumon.iris.db.tables.remotequery.RemoteQueryProcessor$QueryAction.lambda$execute$0(RemoteQueryProcessor.java:1860)
at com.illumon.util.locks.FunctionalLock.computeLockedInterruptibly(FunctionalLock.java:80)
at com.illumon.iris.db.tables.remotequery.RemoteQueryProcessor$QueryAction.execute(RemoteQueryProcessor.java:1860)
at com.illumon.iris.db.tables.remotequery.RemoteQueryProcessor$ClientConnectionHandler.runSyncQueryAndSendResult(RemoteQueryProcessor.java:1625)
at com.illumon.iris.db.tables.remotequery.RemoteQueryProcessor$ClientConnectionHandler.handleCommandST(RemoteQueryProcessor.java:1525)
at com.illumon.iris.db.tables.remotequery.RemoteQueryProcessor$ClientConnectionHandler$HandleCommandRunnable.run(RemoteQueryProcessor.java:1133)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
... 1 more
Caused by: java.lang.RuntimeException: Error encountered at line 15: evenOnly = annotated.where("even=yes")
at com.illumon.iris.db.util.IrisDbGroovySession.maybeRewriteStackTrace(IrisDbGroovySession.java:280)
at com.illumon.iris.db.util.IrisDbGroovySession.wrapAndRewriteStackTrace(IrisDbGroovySession.java:261)
at com.illumon.iris.db.util.IrisDbGroovySession.evaluate(IrisDbGroovySession.java:253)
at com.illumon.iris.controller.ScriptSetupQuery.execute(ScriptSetupQuery.java:124)
... 13 more
Caused by: com.illumon.iris.db.v2.select.FormulaCompilationException: Formula compilation error for: even=yes
at com.illumon.iris.db.v2.select.AbstractConditionFilter.init(AbstractConditionFilter.java:159)
at com.illumon.iris.db.v2.QueryTable.lambda$where$52(QueryTable.java:1501)
at com.illumon.iris.db.tables.utils.QueryPerformanceRecorder.withNugget(QueryPerformanceRecorder.java:507)
at com.illumon.iris.db.v2.QueryTable.where(QueryTable.java:1473)
at com.illumon.iris.db.tables.Table.where(Table.java:324)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.codehaus.groovy.runtime.callsite.PojoMetaMethodSite$PojoCachedMethodSite.invoke(PojoMetaMethodSite.java:189)
at org.codehaus.groovy.runtime.callsite.PojoMetaMethodSite.call(PojoMetaMethodSite.java:53)
at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:45)
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:108)
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:116)
at com.illumon.iris.db.util.dynamic.qa_treasureplus_cluster_infra_1_c_illumon_eng_170715_internal_worker_364_2.run(qa_treasureplus_cluster_infra_1_c_illumon_eng_170715_internal_worker_364_2.groovy:61)
at groovy.lang.GroovyShell.evaluate(GroovyShell.java:570)
at groovy.lang.GroovyShell.evaluate(GroovyShell.java:608)
at groovy.lang.GroovyShell.evaluate(GroovyShell.java:579)
at com.illumon.iris.db.util.IrisDbGroovySession.evaluateCommand(IrisDbGroovySession.java:219)
at com.illumon.iris.db.util.IrisDbGroovySession.lambda$evaluate$2(IrisDbGroovySession.java:249)
at com.illumon.util.locks.FunctionalLock.doLockedInterruptibly(FunctionalLock.java:45)
at com.illumon.iris.db.util.IrisDbGroovySession.evaluate(IrisDbGroovySession.java:249)
... 14 more
Caused by: com.illumon.iris.db.tables.lang.DBLanguageParser$QueryLanguageParseException:
Having trouble with the following expression:
Full expression : even==yes
Expression having trouble :
Exception message : Cannot find variable or class even
at com.illumon.iris.db.tables.lang.DBLanguageParser.visit(DBLanguageParser.java:846)
at com.illumon.iris.db.tables.lang.DBLanguageParser.visit(DBLanguageParser.java:41)
at io.deephaven.shadow.javaparser.com.github.javaparser.ast.expr.NameExpr.accept(NameExpr.java:48)
at com.illumon.iris.db.tables.lang.DBLanguageParser.getTypeWithCaching(DBLanguageParser.java:653)
at com.illumon.iris.db.tables.lang.DBLanguageParser.visit(DBLanguageParser.java:926)
at com.illumon.iris.db.tables.lang.DBLanguageParser.visit(DBLanguageParser.java:41)
at io.deephaven.shadow.javaparser.com.github.javaparser.ast.expr.BinaryExpr.accept(BinaryExpr.java:78)
at com.illumon.iris.db.tables.lang.DBLanguageParser.<init>(DBLanguageParser.java:125)
at com.illumon.iris.db.v2.select.AbstractConditionFilter.init(AbstractConditionFilter.java:114)
... 35 more
These stack traces tend to be very large but can be broken down into smaller digestible chunks by looking for the “Caused by:” clauses. For example, the exception above can be reduced to this:
com.illumon.iris.db.tables.remotequery.QueryException:
Failed to execute setup query for config Example Triage[1599758438077000002.2]
Caused by: com.illumon.iris.db.tables.remotequery.RemoteQueryException:
Unexpected error while processing query
Caused by: io.deephaven.DeephavenException:
remote call or answer conversion had an exception: com.illumon.iris.db.tables.remotequery.RemoteQueryException:
Error while executing query: com.illumon.iris.controller.exception.ScriptException:
Failed to run script
Caused by: com.illumon.iris.db.tables.remotequery.RemoteQueryException:
Error while executing query: com.illumon.iris.controller.exception.ScriptException:
Failed to run script
Caused by: com.illumon.iris.controller.exception.ScriptException:
Failed to run script
Caused by: java.lang.RuntimeException:
Error encountered at line 15: evenOnly = annotated.where("even=yes")
Caused by: com.illumon.iris.db.v2.select.FormulaCompilationException:
Formula compilation error for: even=yes
Caused by: com.illumon.iris.db.tables.lang.DBLanguageParser\$QueryLanguageParseException:
Having trouble with the following expression:
Full expression : even==yes
Expression having trouble :
Exception message : Cannot find variable or class even
When you interpret these exception chains it is best to start from the last exception in the chain and then work backwards for broader context. So let’s analyze this exception in that order.
The lowest exception is a QueryLanguageParseException
that claims it can’t find the variable even
within the expression evenOnly = annotated.where("even=yes")
. The error occurs on a where()
clause trying to filter a column within a table. The most likely cause for this is that the column does not exist in the table that you are trying to filter, so let’s look at the query:
annotated = tt.update("Even=i%2==0?true:false",
"Diff=(Timestamp-Timestamp*[0])/1000000000 *(Even ? 1 : -1)",
"Dummy=1-i",
"Parity=Diff%2==0?`Even`:`Odd`",
"USym=syms[i % syms.length]",
"Sym=syms[i % syms.length]+`*`+Diff",
"Expiry=Timestamp+HOUR")
evenOnly = annotated.where("even=yes")
Note that the actual column name was “Even” with an uppercase E
. Once that is corrected another exception occurs. Following the same procedure as the first time around, we see this:
Having trouble with the following expression:
Full expression : Even==yes
Expression having trouble :
Exception message : Cannot find variable or class yes
Now it does not understand what yes
is. Looking closer, Even
is a boolean column, which can only be true
or false
. This could be corrected by changing the formula to .where("Even=true")
or simply .where("Even")
.
The example above is a very simple example of triaging an equally simple query. In practice queries will be much more sophisticated and contain many interdependencies between tables. The recipe for triage, however, is the same. See the sections below for examples of some common mistakes.
My Query Crashed!
Once you have fixed the static problems in the query it will transition to the “Running” state and start serving up tables and plots for your workspace. In many cases this is the point where you can declare success, but sometimes the query may crash unexpectedly at some point during its normal runtime. These problems can be trickier to triage because they will often be related to unexpected streaming data patterns.
Your first steps should be the same as for a startup error:
- Find the exception,
- reduce it to it’s set of “Caused By” expressions,
- then start at the last one and work backwards.
The Code Studio in Web UI and Query Console in Swing UI are both critical tools that you can use during this process. Once you have isolated which operation is the one failing you can execute parts of the query bit by bit in a Code Studio to analyze the inputs to the failing method and experiment with changes to solve the problem.
Below we will discuss some of the most common categories of failure. Note that the provided examples will contain the following clause:
.where(new IncrementalReleaseFilter(10,1))
This is simply to make the static (non-ticking) historical example data conform to the ticking patterns required to demonstrate each failure.
Unexpected Null Values
It is easy to forget that columns can contain null values. If your queries do not account for this they will fail with an NullPointerException. The query below exhibits this problem:
import com.illumon.iris.db.v2.select.IncrementalReleaseFilter
trades = db.t("LearnDeephaven", "StockTrades").where("Date=`2017-08-25`")
quotes = db.t("LearnDeephaven", "StockQuotes").where("Date=`2017-08-25`")
quotesBinned = quotes.updateView("Bin=upperBin(Timestamp, MINUTE)")
tradesBinned = trades.updateView("Bin=upperBin(Timestamp, MINUTE)")
// Make this a live ticking table from historical.
bankOfAmericaQuotes = quotesBinned.where("USym=`BAC`")
.where(new IncrementalReleaseFilter(10,1))
bankOfAmericaTrades = tradesBinned.where("USym=`BAC`")
.where(new IncrementalReleaseFilter(10,1))
joined = bankOfAmericaQuotes.raj(bankOfAmericaTrades, "USym,Bin", "TradeExchange=Exchange, SaleCondition,Last,Size")
FTOnly = joined.where("SaleCondition.indexOf(`FT`) >= 0")
This query bins quotes and trades in 1 minute intervals, then attempts to locate the nearest trade that occurred after each binned quote for the BAC
symbol, and narrow the results to only trades that have the “FT” SaleCondition. It eventually fails with the following error:
com.illumon.iris.db.v2.select.FormulaEvaluationException: java.lang.NullPointerException encountered in filter={ SaleCondition.indexOf(`FT`) >= 0 }
Caused by: java.lang.NullPointerException
Why did this occur?
- We are binning quotes and trades by 1 minute intervals.
- We are searching for matches later than the current bin.
- It is entirely possible that there are no trades following a specific quote bin. In this case raj() will produce a null value in the column.
As discussed above, using a Code Studio is a good way we can inspect the tables for this condition. If you open a Code Studio and execute the same query, up to the failing .where()
, you will see this:
You can see the blank values in SaleCondition. Let’s filter this table down to only rows that contain null values.
In this case, we must simply check that the value we are trying to inspect is not null before we inspect it.
joined = bankOfAmericaQuotes.raj(bankOfAmericaTrades, "USym,Bin", "TradeExchange=Exchange, SaleCondition,Last,Size")
.where(“!isNull(SaleCondition)”)
Let's validate that this is works the way we expect in the Code Studio:
Looks good! Note that in this case we put the filter after the reverse as-of-join to ensure that the table ‘joined’ does not contain any null SaleConditions. If you want to exclude nulls for only the FTOnly table, you could have also done:
FTOnly = joined.where("!isNull(SaleCondition) && SaleCondition.indexOf(`FT`) >= 0")
String and Array Access
Another common set of problems relate to accessing strings or arrays. You may need to classify rows based upon the value of a character within a string of expected size; operations such as by()
and leftJoin()
will produce cells that contain arrays of values. In these cases there is a fair chance of trying to access a value in one of these types that does not exist. Below is an example of this:
import com.illumon.iris.db.v2.select.IncrementalReleaseFilter
trades = db.t("LearnDeephaven", "StockTrades").where("Date=`2017-08-25`")
.sort("Timestamp")
.where(new IncrementalReleaseFilter(0,1))
.where("!isNull(SaleCondition) && SaleCondition.charAt(2) == 'I'")
This query is attempting to filter the set of trades down to ones that only have the “I” condition. It will eventually fail with this exception:
com.illumon.iris.db.v2.select.FormulaEvaluationException: java.lang.StringIndexOutOfBoundsException encountered in filter={ !isNull(SaleCondition) && SaleCondition.charAt(2) == 'I' }
Caused by: java.lang.StringIndexOutOfBoundsException: String index out of range: 2
This is saying that you are trying to get the third character in a string that does not have three characters. Why could this have happened?
- Looking closely, the query is trying to grab the third character from SaleCondition.
- The assumption was clearly that the SaleCondition string was of a specific size (at least three characters). Either that assumption was wrong or the source of that data did not honor a guarantee.
- In the case of a data source providing incorrect data, the solution is to fix the source.
- If your assumptions were wrong, you must change the expression to be tolerant of what the real values can be.
- In both of these cases, it is good to exercise “defensive programming”. Try and detect bad inputs and handle them gracefully, even if they are not supposed to ever occur.
Use the Code Studio to inspect the table by adding a new column "SCLength = SaleCondition.length()"
:
Filter that table to only rows where SCLength < 3
using the right-click menu results:
Following the advice above, a good solution to this problem would be to change the query to:
trades = db.t("LearnDeephaven", "StockTrades").where("Date=`2017-08-25`")
.sort("Timestamp")
.where(new IncrementalReleaseFilter(0,1))
.where("!isNull(SaleCondition) && SaleCondition.length >= 3 && SaleCondition.charAt(2) == 'I'")
An even better solution, if you really only care about the presence of the I character in the string, is:
trades = db.t("LearnDeephaven", "StockTrades").where("Date=`2017-08-25`")
.sort("Timestamp")
.where(new IncrementalReleaseFilter(0,1))
.where("!isNull(SaleCondition) && SaleCondition.contains(`I`)")
As always, use the Code Studio to validate that this change works and produces the result you expect!
Join Key Problems
Some of the most powerful features of DQL are the join
operations. They merge tables together based upon matching parameters to produce new tables. Some flavors of join are tolerant to the cardinality of key instances on the left and right hand sides, but some are not. naturalJoin()
, for example, requires that there is no more than one right hand side key mapped to every left hand side key. This can cause issues when the input data to a joined table does not adhere to these requirements. Below is a query that exhibits this behavior.
import com.illumon.iris.db.v2.select.IncrementalReleaseFilter
trades = db.t("LearnDeephaven", "StockTrades").where("Date=`2017-08-25`")
quotes = db.t("LearnDeephaven", "StockQuotes").where("Date=`2017-08-25`")
quotesBinned = quotes.updateView("Bin=upperBin(Timestamp, MINUTE)")
tradesBinned = trades.updateView("Bin=upperBin(Timestamp, MINUTE)")
// Make this a live ticking table from historical.
bankOfAmericaQuotes = quotesBinned.where("USym=`BAC`")
.where(new IncrementalReleaseFilter(0,1))
bankOfAmericaTrades = tradesBinned.where("USym=`BAC`")
joined = bankOfAmericaTrades.naturalJoin(bankOfAmericaQuotes, "USym,Bin", "BidExchange=Exchange,Bid,BidSize")
This query joins the binned trade prices to the trades table to augment it with bid exchange, size, and price by USym and time bin. It eventually fails with the following error:
java.lang.IllegalStateException: Duplicate right key for [BAC, 1503660240000000000]
Why did this happen?
- We are binning quotes and trades in one minute intervals.
- This will produce potentially many matching rows for trades of a single USym.
- We are using a
naturalJoin
to join in quotes, binned in the same manner as trades. - We know that
naturalJoin()
requires there to be no more than one right hand side key mapped to a particular left hand side key. - This means that bankOfAmericaQuotes contains multiple
BAC
mappings for the same time bin.
We could change our binning interval, but that would likely eventually result in the same issue. What we really need to do is guarantee that every USym, Bin pair in quotesBinnedTable is unique. Let's use lastBy()
.
// Make this a live ticking table from historical.
bankOfAmericaQuotes = quotesBinned.where("USym=`BAC`")
.where(new IncrementalReleaseFilter(0,1))
.lastBy("Bin")
Now the input to naturalJoin()
will have at most one entry on the right hand side. The other flavors of join()
may have their own specific input requirements so if you encounter similar errors that reference join keys, search through your queries and validate that you are meeting these requirements.
Unstable Formulas
When you are writing a query there are many cases where you want to derive a new column from data in several other columns. Following Best Practices may have led you to choose .updateView()
for this. updateView()
creates columns that compute their values lazily, and do not store the result of the computation - that is, it is recomputed every time a cell is accessed. Under most circumstances this is exactly what you want; computations are cheap and you can save on memory use. Problems can occur, however, when the result of this derived computation is used in downstream operations and the computation itself is not stable.
We declare that a formula is “stable” if it is guaranteed to compute the same value every time it is executed. If a formula is written that does not provide this guarantee it can create issues in your query that are extremely difficult to detect and even harder to diagnose. In some cases an unstable formula may not even crash the query, but produce incorrect results!
Take the query below, for example:
import com.illumon.iris.db.v2.select.IncrementalReleaseFilter
trades = db.t("LearnDeephaven", "StockTrades").where("Date=`2017-08-25`")
.sort("Timestamp")
.where(new IncrementalReleaseFilter(0,1))
.sort("Size")
.updateView("PreviousExchange=Exchange\_[i-1]")
.tail(10)
unstable = trades.maxBy("USym", "PreviousExchange")
This query tries to find maximums for each “USym, PreviousExchange” pair, where “PreviousExchange” is the value of the exchange column from the row immediately before the current one.
This query eventually produces an error like this:
java.lang.IllegalStateException: Failed to find main aggregation slot for key [INTC, Arca]
This means that a table operation that groups values by keys (a flavor of join()
, or by()
) tried to update the value for a mapping that should have existed, but unexpectedly did not.
- Inspecting the query, there is a
maxBy()
operation that depends on “USym” and “PreviousExchange”. - If you search for where those columns are declared in the “trades” table, you find that “USym” came from the original data source, and “PreviousExchange” came from an
updateView()
. - Also, “PreviousExchange” depends on the value of the Exchange column from the previous row.
- We also know that
updateView()
computes values lazily, and does not store results. - This means that when the maxBy() operation receives an update, “PreviousExchange” evaluates to whatever the value of Exchange was in the previous row at this moment in time.
- Therefore the next time the formula is evaluated, the result is very likely to be different than it was now, especially if the table is sorted.
This can be easily fixed by changing the updateView()
to an update()
. This will compute the value at that instant in time and store it so it will remain consistent.
trades = db.t("LearnDeephaven", "StockTrades").where("Date=`2017-08-25`")
.sort("Timestamp")
.where(new IncrementalReleaseFilter(0,1))
.sort("Size")
.update("PreviousExchange=Exchange\_[i-1]")
.tail(10)
Note that this does not mean you should never use updateView()
. You just need to be careful when using updateView()
to compute something that might change like current time, a hashmap lookup, or a reference to a previous or following row.
The takeaway from this should be:
- Be very suspicious of using "i" in a ticking table
- Be suspicious of calling current time functions in a ticking table or in any way depending on the outside world.
- You may not get the right answer with an update either, it may simply be less likely to crash.
Out of heap
One of the most common problems with a query can be lack of memory, referred to as heap, to handle a query as the data volume grows throughout the day. This occurs if enough heap is not allocated to the query, but can be exacerbated by unexpected data patterns like unusually busy market days. Most often this can be fixed by simply increasing the heap allocated to the query, but an analysis of the query itself may yield improvements to the script that reduces overall heap usage. Below is are two common examples of Out Of Heap failures:
Caused by: java.lang.OutOfMemoryError: Java heap space
In the second example, “Disconnected” can be caused by several other issues as well. You should inspect the Process Event Log for this failed worker to find the root cause.
There are many ways to approach this problem and, in practice, resolution will require some combination of all of the techniques mentioned below.
The first, and simplest, action to take is to increase the heap size of your query (or Code Studio) in the configuration.
For many workloads this will be enough. However, users must be aware that you may not get the results you expect when you increase memory across the 32GB boundary. Java uses address compression techniques to optimize memory usage when maximum heap size is smaller than 32GB. Once you request 32GB or more this feature is disabled so that the program can access its entire memory space. This means that if you start with 30GB of heap and then increase the size to 32GB you will have less heap available to your query. If you must cross the 32GB boundary, it is best to jump directly to 42GB to account for this.
The next step is to review your query code.
- Look for places where you use
.update()
and carefully consider if.updateView()
might be a better choice. Keep in mind the earlier discussion about formula stability while doing this. Also, remember that the choice betweenupdate()
andupdateView()
is a memory vs. CPU tradeoff. You will be paying more in CPU time with updateView() in order to save on memory. For simple computations and transformations this is often a very good tradeoff. - Search for duplicate tables, or tables that only differ in column count. You should try to re-use tables in derivative computations as much as possible to make use of work that the query has already done. In most cases the Deephaven system will automatically recognize computations you have duplicated and use the original tables, however it is best to not rely on this behavior and instead be explicit about what tables you derive from.
- When using DQL expressions that group rows based on keys (
byExternal()
,by()
, andjoin()
operations), pay close attention to the number of unique keys in the tables you apply these operations to. If keys are generally unique (Trade ID, for example) within a table, then an operation likebyExternal()
will end up producing potentially millions of single row tables which all consume some heap space in overhead. In these cases consider if you can use different keys, or even a different set of DQL operations to get to the end goal. - Carefully consider the order you execute DQL expressions. A very simple example is applying a
join()
and awhere()
condition. If you were to executederived = myTable.join(myOtherTable, “JoinColumn”).where(“FilterColumn=
FilterValue)
, thejoin()
operation will end up consuming much more heap than it should be, given that you are going to filter it down later on in the.where()
expression. This query would be better expressed as:derived = myTable.where(“FilterColumn=
FilterValue).join(myOtherTable, “JoinColumn”)
. This will save in heap usage, but also has the benefit of reducing how many rows must be processed in downstream tables (called ticks) when the left hand or right hand tables tick.
There are many other reasons that a query may be using more heap than you expect. This will require a deep analysis of how the query is performing at runtime. Take a look at the monitoring query performance article for an in-depth discussion of how to take this deep dive.
Garbage Collection (GC) timeouts
GC timeouts are another type of heap related problem. These occur when you have allocated enough memory to your query to produce the output tables you want, but your query operations are ordered, or your data ticks in such a way that downstream operations must process a large quantity of rows. This results in lots of temporary heap allocation to process upstream ticks, which is then collected by the JVM in a phase called ‘Garbage Collection’ which consumes CPU time that can halt your normal query processing and cause computations to back up even further.
This behavior is characterized by the following in the Performance Overview tables:
- Very long GC times.
- Large numbers of ParNew GC points.
- A high frequency sawtooth memory usage pattern.
- Periodic, extremely long LTM cycles.
Data Source Death
This problem occurs, as you might deduce, when the system, or query, that is providing data to your query fails, or otherwise stops providing data. This can be caused by:
- Loss of the Data Import Server (DIS).
- Loss of the Table Data Cache Proxy (TDCP).
- Death of a Persistent Query that is providing a Preemptive Updates Table (PUT).
- Administrative deletion of an in-use intraday partition.
In most of these cases it is enough to recognize the error and classify it as one of these and report it to an administrator who can locate and solve the root cause.
If your PQ depends on a Preemptive Updates Table served by another query that you control, then you may be able to solve the problem by triaging that query. Another option may be to take advantage of the ConnectionAwareRemoteTable to gracefully handle this.
None of this helped - I think I found a bug!
If you have gone through all of the steps to identify the cause of a problem, but can’t find a cause in your query, it is always possible that you have uncovered a bug. In this case you should use the “Send to Deephaven Support” feature to request help from the Deephaven Team. Be sure to include the following with your bug report:
- The version of the Deephaven system.
- The complete stack trace, not just the list of “caused by” expressions.
- A description of the query, if you cannot send a code snippet.
- Logs from your user console or your Browser logs if Web IDE.
- Screenshots of any Performance Overview analysis you have done (if possible).
Appendix: Common Errors in Queries
- Is your DQL expression properly quoted?
- Expressions within DQL statements such as
where()
orupdate()
must be surrounded by double quotes (") for example:myTable.where("StringColumn=`TheValue`")
- If you copy-pasted an expression from somewhere (Slack for example), some systems will copy a unicode double quotation (U+201C and U+201D) instead of an ASCII quotation (U+0022).
- Expressions within DQL statements such as
- Do you have matching parentheses?
- Make sure that any open parenthesis that you are using are matched by a close parenthesis.
- If you are referring to a column in a DQL expression:
- Check the spelling and capitalization.
- Is the type of the column what you expect it to be?
- Did you account for null values?
- If you are using strings in your expression, did you quote them properly?
- Strings are quoted with the backtick character not single quotes (‘) or double quotes (")
- Do you have matching close quotes for your open quotes?
- If you are using DBDateTimes in your expressions:
- Did you use single quotes (‘) not double quotes (") or backticks?
- Did you use the proper format? DateTimes are expected as
<yyyyMMDD>T<HH:mm:ss.nnnnnnnnn> <TZ>
- Are all classes that you are trying to use properly imported?
- Groovy will not search for classes that you use; if they are not part of the standard set of DQL imports you must import them yourself.