Version: Java (Groovy)

Deephaven's table types

Deephaven tables are the core data structures supporting Deephaven's static and streaming capabilities. Deephaven implements several specialized table types that differ in how streaming data is stored and processed in the engine, how the UI displays data to the user, and how some downstream operations behave. This document covers static tables, standard streaming tables, and the four specialized streaming table types: append-only, add-only, blink, and ring.

Table type summary

This guide discusses the unique properties of each of Deephaven's table types. The main points are summarized in the following table.

| Table type | Index-based ops, special variables | Consistent inter-cycle row ordering | Bounded memory usage | Inter-cycle data persistence |
| --- | --- | --- | --- | --- |
| Static | ✅ | 🚫 | ❌ | 🚫 |
| Standard | ❌ | ❌ | ❌ | ✅ |
| Append-only | ✅ | ✅ | ❌ | ✅ |
| Add-only | ❌ | ❌ | ❌ | ✅ |
| Blink | ✅ | 🚫 | ✅ | ❌ |
| Ring | ❌ | ❌ | ✅ | ✅ |

The rest of this guide explores each of these table types, their properties, and the consequences of those properties.

Static tables

Static tables are the simplest type of Deephaven table and are analogous to pandas DataFrames, PyArrow Tables, and many other tabular representations of data. They represent data sources that do not update, and therefore do not support any of Deephaven's streaming capabilities.

Because static tables do not update, they have the following characteristics:

  1. Index-based operations are fully supported. The row indices of a static table always range from 0 to N-1, so operations can depend on these values to be stable.
  2. Operations that depend on or modify external state can be used with static tables. Stateful operations can present problems for some types of streaming tables.
  3. The use of special variables is fully supported. Deephaven's special variables i and ii represent row indices of a table as int or long types, respectively. These variables are guaranteed to have consistent values in static tables.

Static tables can be created by reading from a static data source, such as CSV, Iceberg, Parquet, or SQL. They can also be created with Deephaven's table creation functions, like newTable or emptyTable. This example uses emptyTable to construct a static table:

// create a static table with 10 rows and 2 columns
t = emptyTable(10).update("IntIdx = i", "LongIdx = ii")
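
Static tables can be built from explicit column data in a similar way. The following is a minimal sketch using newTable; the column names and values are made up for illustration, and stringCol and intCol are assumed to be available in the Groovy console without an explicit import:

```groovy
// create a static table from explicit column data (illustrative values)
tNew = newTable(
    stringCol("Name", "A", "B", "C"),
    intCol("Value", 1, 2, 3)
)
```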

Check whether a table is static with the isRefreshing method, which returns false for static tables:

println t.isRefreshing()

Any streaming table can be converted to a static table by taking a snapshot. This will produce a static copy of the streaming table at the moment in time the snapshot is taken:

// create a streaming table with timeTable
t = timeTable("PT1s")

// at some point in the future, take a snapshot of the streaming table
tSnapshot = t.snapshot()

Verify that the snapshot is static with isRefreshing:

println t.isRefreshing()
println tSnapshot.isRefreshing()

Standard streaming tables

Most streaming Deephaven tables are "standard" tables. These are the most flexible and least constrained types of tables, with the following key properties:

  • Rows can be added, modified, deleted, or reindexed at any time, at any position in the table.
  • The table's size can grow without bound.

These properties have some important consequences:

  1. Index-based operations, stateful operations, or operations using special variables may yield results that change unexpectedly between update cycles. By default, Deephaven throws an error in these cases.
  2. The rows in standard tables are not guaranteed to maintain their original order of arrival. Operations should not assume anything about the order of data in a standard table.
  3. Standard tables may eventually result in out-of-memory errors in data-intensive applications.

These properties are not ideal for every use case. Deephaven's specialized table types provide alternatives.
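
As an illustration, many derived tables behave this way. The sketch below assumes that the result of lastBy on a ticking source is a standard table, since each group's row is overwritten whenever a newer row for that group arrives. The column names Group and X are hypothetical:

```groovy
// ticking source with three groups (Group and X are illustrative columns)
source = timeTable("PT1s").update("Group = ii % 3", "X = ii")

// one row per group; existing rows are modified as newer data arrives
tLast = source.lastBy("Group")
```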

Specialization 1: Append-only

Append-only tables are a highly constrained type of table. They have the following key properties:

  • Rows can only be added to the end of the table.
  • Once a row is in an append-only table, it cannot be modified, deleted, or reindexed.
  • The table's size can grow without bound.

These properties yield the following consequences:

  1. Append-only tables guarantee that old rows will not change, move, or disappear, so index-based operations, stateful operations, or operations using special variables are guaranteed to yield results that do not change unexpectedly between update cycles.
  2. The rows in append-only tables are guaranteed to maintain their original order of arrival.
  3. Append-only tables may eventually result in out-of-memory errors in data-intensive applications.

Append-only tables are useful when the use case needs a complete and ordered history of every record ingested from a stream. They are safe and predictable under any Deephaven query and are guaranteed to retain all the data they've seen.
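
The sketch below shows what this guarantee allows in practice. It relies on timeTable producing an append-only table, as noted later in this guide, so formulas that use ii are accepted. The derived column names are illustrative:

```groovy
// timeTable only ever appends rows, one per second
t = timeTable("PT1s")

// a row's index never changes once the row exists, so ii is stable
tIndexed = t.update("RowNum = ii", "IsEven = ii % 2 == 0")
```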

Specialization 2: Add-only

Add-only tables are relaxed versions of append-only tables. They have the following key properties:

  • Rows can only be added to the table, but they may be added at any position in the table.
  • Existing rows cannot be deleted or modified, but may be reindexed.
  • The table's size can grow without bound.

These properties yield the following consequences:

  1. Index-based operations, stateful operations, or operations using special variables may yield results that change unexpectedly between update cycles. By default, Deephaven throws an error in these cases.
  2. The rows in add-only tables are not guaranteed to maintain their original order of arrival. Operations should not assume anything about the order of data in an add-only table.
  3. Add-only tables may eventually result in out-of-memory errors in data-intensive applications.

Specialization 3: Blink

Blink tables keep only the set of rows received during the current update cycle. Users can create blink tables when ingesting Kafka streams, creating time tables, or using Table Publishers. They have the following key properties:

  • The table consists only of rows added in the most recent update cycle.
  • No rows persist for more than one update cycle.
  • The table's size is bounded by the size of the largest update it receives.

These properties have the following consequences:

  1. Since a blink table's contents are replaced entirely at every update cycle, index-based operations, stateful operations, or operations using special variables are guaranteed to yield results that do not change unexpectedly between update cycles.
  2. The entire table changes every update cycle, so preserving row order from cycle to cycle is irrelevant.
  3. Blink tables can only cause memory problems if a single update receives more data than fits in available RAM. This is unusual, but not impossible.

Blink tables are the default table type for Kafka ingestion within Deephaven because they use little memory. They are most useful for low-memory aggregations, deriving downstream tables, or using programmatic listeners to react to data.

Check whether a table is a blink table with the isBlink method:

import io.deephaven.engine.table.impl.BlinkTableTools
import io.deephaven.engine.table.impl.TimeTable.Builder

// TimeTable.Builder can be used to create a blink table with blinkTable=true
builder = new Builder().period("PT0.2s").blinkTable(true)

t = builder.build().update("X = ii")

println BlinkTableTools.isBlink(t)

Aggregation operations such as aggBy and countBy operate with special semantics on blink tables, allowing the result to aggregate over the entire observed stream of rows from the time the operation is initiated. That means, for example, that a sumBy on a blink table will contain the resulting sums for each aggregation group over all observed rows since the sumBy was applied, rather than just the sums for the current update cycle. This allows for aggregations over the full history of a stream to be performed with greatly reduced memory costs when compared to the alternative strategy of holding the entirety of the stream as an in-memory table.

Here is an example that demonstrates a blink table's specialized aggregation semantics:

// create blink table with two groups of data to sum
builder = new Builder().period("PT0.1s").blinkTable(true)

t = builder.build().update("X = ii", "Group = ii % 2 == 0 ? `A` : `B`")

// note that the sums continue to grow by including all previous data
tSum = t.view("X", "Group").sumBy("Group")

These special aggregation semantics may not always be desirable. Disable them by calling removeBlink on the blink table:

tNoBlink = t.removeBlink()

// sum is no longer over all data, but only over data in this cycle
tSumNoBlink = tNoBlink.view("X", "Group").sumBy("Group")

Most operations on blink tables behave exactly as they do on other tables (see the exclusions below); that is, added rows are processed as usual. For example, the result of select on a blink table contains only the newly added rows from the current update cycle.

Because Deephaven does not need to keep all the history of rows read from the input stream in memory, table operations on blink tables may require less memory.

Unsupported operations

Attempting to use the following operations on a blink table will raise an error:

It is common to create an append-only table from a blink table to preserve the entire data history. Use blinkToAppendOnly to do this:

import io.deephaven.engine.table.impl.BlinkTableTools
import io.deephaven.engine.table.impl.TimeTable.Builder

builder = new Builder().period("PT1s").blinkTable(true)

t = builder.build().update("X = ii")

// get an append-only table from t
tAppendOnly = BlinkTableTools.blinkToAppendOnly(t)

tip

To disable blink table semantics, use removeBlink, which returns a child table that is identical to the parent blink table in every way, but is no longer marked for special blink table semantics. The resulting table will still exhibit the “blink” table update pattern, removing all previous rows on each cycle, and thus only containing “new” rows.
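
A quick way to see the effect is to check both tables with isBlink. This is a sketch that assumes isBlink reports only whether a table is marked for blink semantics, so the child table is expected to report false even though it still updates like a blink table:

```groovy
import io.deephaven.engine.table.impl.BlinkTableTools
import io.deephaven.engine.table.impl.TimeTable.Builder

// blink table that ticks once per second
t = new Builder().period("PT1s").blinkTable(true).build()

// same update pattern, but no longer marked as a blink table
tNoBlink = t.removeBlink()

// compare the blink marking on the parent and the child
println BlinkTableTools.isBlink(t)
println BlinkTableTools.isBlink(tNoBlink)
```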

Specialization 4: Ring

Ring tables are like standard tables, but are limited in how large they can grow. They have the following key properties:

  • Rows can be added, modified, deleted, or reindexed at any time, at any position in the table.
  • The table's size is strictly limited to the latest N rows, set by the user. As new rows are added, old rows are discarded so as to not exceed the maximum limit.

These properties have the following consequences:

  1. Index-based operations, stateful operations, or operations using special variables may yield results that change unexpectedly between update cycles. By default, Deephaven throws an error in these cases.
  2. The rows in ring tables are not guaranteed to maintain their original order of arrival. Operations should not assume anything about the order of data in a ring table.
  3. Ring tables will not grow without bound and are strictly limited to a maximum number of rows. Once that limit is reached, the oldest rows are discarded and deleted from memory.

Ring tables are semantically the same as standard tables, and they do not get specialized aggregation semantics like blink tables do. However, operations use less memory because ring tables dispose of old data.

It is common to create a ring table from a blink table to preserve some data history, but not all. Use RingTableTools.of to do this.

The following example creates a ring table that holds the five most recent observations from a blink table:

import io.deephaven.engine.table.impl.sources.ring.RingTableTools
import io.deephaven.engine.table.impl.TimeTable.Builder

builder = new Builder().period("PT0.5s").blinkTable(true)

t = builder.build()

// get ring table from t that holds last five rows
tRing = RingTableTools.of(t, 5)

Create a ring table from an append-only table

caution

Creating a ring table from an append-only table does not give the memory savings that ring tables are useful for.

Ring tables can also be created from append-only tables using RingTableTools.of. This is a less common use case because the memory savings that ring tables typically afford are lost. If an append-only table exists anywhere upstream, it can keep growing until it consumes all available memory. A downstream ring table only appears to save memory, and is effectively equivalent to applying a tail operation to the append-only table.

This example creates a ring table with a 3-row capacity from a simple append-only time table:

import io.deephaven.engine.table.impl.sources.ring.RingTableTools

// t is an append-only table
t = timeTable("PT0.5s")

// get ring table from t that holds last three rows
tRing = RingTableTools.of(t, 3)
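
For comparison, the tail operation mentioned above can be applied to the same kind of source. This sketch assumes tail(3) is an acceptable stand-in for the ring table in this situation. Both results show the three most recent rows while the append-only source keeps growing:

```groovy
import io.deephaven.engine.table.impl.sources.ring.RingTableTools

// append-only source that keeps every row it has produced
t = timeTable("PT0.5s")

// both views expose the latest three rows; neither frees memory held by t
tRing = RingTableTools.of(t, 3)
tTail = t.tail(3)
```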

If the source append-only table already has rows in it when RingTableTools.of is called, the resulting ring table will include those rows by default:

import io.deephaven.engine.table.impl.sources.ring.RingTableTools

// create append-only table that starts with five rows
tStatic = emptyTable(5).update("X = ii")
tDynamic = timeTable("PT1s").update("X = ii + 5").dropColumns("Timestamp")
t = merge(tStatic, tDynamic)

// get ring table from t that holds last ten rows
tRingWithInitial = RingTableTools.of(t, 10)

To disable this behavior, set initialize = false:

import io.deephaven.engine.table.impl.sources.ring.RingTableTools

tRingWithoutInitial = RingTableTools.of(t, 10, false)
