
Create Your First Tables

Static tables

The simplest way to create static tables from scratch is with the new_table and empty_table methods.

from deephaven import new_table, empty_table
from deephaven.column import int_col, long_col, double_col

static_table1 = new_table(
    [
        int_col("IntColumn", [0, 1, 2, 3, 4]),
        long_col("LongColumn", [0, 1, 2, 3, 4]),
        double_col("DoubleColumn", [0.12345, 2.12345, 4.12345, 6.12345, 8.12345]),
    ]
)

static_table2 = empty_table(5).update_view(
    [
        "IntColumn = i",
        "LongColumn = ii",
        "DoubleColumn = IntColumn + LongColumn + 0.12345",
    ]
)

NOTE: The variables i and ii correspond to int and long row indices, respectively. They are only supported in append-only tables.

The two tables look identical but are created differently.

  • new_table builds a table directly from column type specifications and raw data.
  • empty_table builds an empty table with no columns and a specified number of rows. The Deephaven Query Language (DQL) can add new columns programmatically.
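To check that the two tables really hold the same values, the formulas can be traced by hand. The following is a plain-Python sketch that does not use Deephaven at all; it assumes only that i and ii both evaluate to the row index (0 through 4) on each row, as the note above describes.

```python
import math

# Row indices for a 5-row table; in DQL, i (int) and ii (long) both hold this value.
row_indices = range(5)

# Columns as the update_view formulas for static_table2 would compute them.
int_column = [i for i in row_indices]
long_column = [ii for ii in row_indices]
double_column = [a + b + 0.12345 for a, b in zip(int_column, long_column)]

# Literal data passed to new_table for static_table1.
expected_doubles = [0.12345, 2.12345, 4.12345, 6.12345, 8.12345]

# Compare doubles with a tolerance, since floating-point sums can differ
# from the literals in the last bit.
columns_match = (
    int_column == [0, 1, 2, 3, 4]
    and long_column == [0, 1, 2, 3, 4]
    and all(math.isclose(x, y) for x, y in zip(double_column, expected_doubles))
)
print(columns_match)
```

The doubles are compared with math.isclose rather than == because the computed sums are not guaranteed to be bit-identical to the literals.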

Ticking tables

You can create ticking tables to get a feel for live data in Deephaven. The time_table method creates a table with a single Timestamp column that ticks at the regular interval specified by its input argument. As with empty_table, you can then use DQL to populate the table with more data.

from deephaven import time_table

ticking_table = time_table("PT1s")


The PT1s argument is an ISO-8601 formatted duration string that indicates the table will tick once every second.
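To make the duration format concrete, here is a simplified sketch that converts time-only ISO-8601 durations into a Python timedelta. It is not Deephaven's parser: the parse_duration helper is hypothetical, it handles only the hours, minutes, and seconds designators, and it matches case-insensitively on the assumption that a lowercase "s", as used in this guide, should be accepted.

```python
import re
from datetime import timedelta

# Simplified pattern: time-only durations such as PT1S, PT0.5S, PT2M, or PT1H30M.
_DURATION = re.compile(
    r"^PT(?:(?P<h>\d+)H)?(?:(?P<m>\d+)M)?(?:(?P<s>\d+(?:\.\d+)?)S)?$",
    re.IGNORECASE,
)


def parse_duration(text: str) -> timedelta:
    """Parse a time-only ISO-8601 duration string into a timedelta."""
    match = _DURATION.match(text)
    # Reject non-matching input and the bare prefix "PT" with no components.
    if match is None or not any(match.groupdict().values()):
        raise ValueError(f"unsupported duration: {text!r}")
    return timedelta(
        hours=int(match["h"] or 0),
        minutes=int(match["m"] or 0),
        seconds=float(match["s"] or 0),
    )


one_second = parse_duration("PT1s")
ninety_minutes = parse_duration("PT1H30M")
```

Here "P" marks the start of the period, "T" introduces the time components, and each number is followed by its unit designator.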

New ticking tables can be derived from existing ones using DQL, just as in the static case.

new_ticking_table = ticking_table.update_view(
    "TimestampPlusOneSecond = Timestamp + 'PT1s'"
)


This exemplifies Deephaven's use of the Directed Acyclic Graph (DAG). The table ticking_table is a root node in the DAG, and new_ticking_table is a downstream node. Because the source table is ticking, the new table is also ticking. Every update to ticking_table propagates down the DAG to new_ticking_table, and the compute-on-deltas model ensures that only the updated rows are re-evaluated with each update cycle.
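The idea of propagating only deltas through a DAG can be illustrated with a toy model in plain Python. This is a sketch of the concept, not Deephaven's engine: the SourceNode and DerivedNode classes are hypothetical, the source is assumed to be append-only, and timestamps are simplified to integers.

```python
class SourceNode:
    """Root node: an append-only list of rows plus its downstream nodes."""

    def __init__(self):
        self.rows = []
        self.children = []

    def append(self, new_rows):
        self.rows.extend(new_rows)
        # Propagate only the delta (the newly added rows) downstream.
        for child in self.children:
            child.on_added(new_rows)


class DerivedNode:
    """Downstream node: applies a formula to each row it is notified about."""

    def __init__(self, parent, formula):
        self.formula = formula
        self.rows = []
        self.evaluations = 0  # counts how many times the formula has run
        parent.children.append(self)
        # Process whatever the parent already holds at creation time.
        self.on_added(parent.rows)

    def on_added(self, added_rows):
        for row in added_rows:
            self.rows.append(self.formula(row))
            self.evaluations += 1


source = SourceNode()
derived = DerivedNode(source, lambda ts: ts + 1)  # analogous to Timestamp + 'PT1s'

# Three update cycles, each adding one row to the source.
for tick in range(3):
    source.append([tick])
```

After three update cycles the formula has run three times, once per new row. Recomputing the whole derived table on every cycle would instead have taken 1 + 2 + 3 = 6 evaluations, and the gap widens as the table grows.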

Ingesting static data

Deephaven supports reading from various common file formats like CSV, Parquet, and Arrow. The following code block reads CSV data from a URL directly into a table.

from deephaven import read_csv

crypto = read_csv(
    "https://media.githubusercontent.com/media/deephaven/examples/main/CryptoCurrencyHistory/CSV/FakeCryptoTrades_20230209.csv"
)

If you're running Deephaven in a Docker container, reading your own files requires that you mount the local directory containing the files to a volume in the Docker container, which you can learn more about in the guide on Docker data volumes.

Real-world ticking data

Real-time data is central to Deephaven. One of the easiest ways to work with realistic ticking data is to replay static data with the TableReplayer. Use it to replay the data ingested above:

from deephaven.replay import TableReplayer

replayer = TableReplayer("2023-02-09T12:09:18 ET", "2023-02-09T12:58:09 ET")
replayed_crypto = replayer.add_table(
    crypto.sort("Timestamp"), "Timestamp"
).sort_descending("Timestamp")
replayer.start()
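Conceptually, replaying a static table means revealing each row once a replay clock reaches that row's timestamp. The sketch below models this in plain Python under that assumption; the rows_visible_at helper and the sample trades are made up for illustration and are not taken from the CSV file or from the TableReplayer implementation.

```python
from datetime import datetime


def rows_visible_at(rows, start, now):
    """Return the rows with start <= Timestamp <= now, oldest first."""
    ordered = sorted(rows, key=lambda row: row["Timestamp"])
    return [row for row in ordered if start <= row["Timestamp"] <= now]


# Illustrative data only; not real values from the crypto table.
trades = [
    {"Timestamp": datetime(2023, 2, 9, 12, 9, 40), "Price": 102.0},
    {"Timestamp": datetime(2023, 2, 9, 12, 9, 20), "Price": 101.0},
    {"Timestamp": datetime(2023, 2, 9, 12, 10, 5), "Price": 103.0},
]

replay_start = datetime(2023, 2, 9, 12, 9, 18)

# As the replay clock advances, more rows become visible.
early = rows_visible_at(trades, replay_start, datetime(2023, 2, 9, 12, 9, 30))
later = rows_visible_at(trades, replay_start, datetime(2023, 2, 9, 12, 10, 0))
```

Sorting by timestamp first plays the same role as the crypto.sort("Timestamp") call above: rows are handled in time order regardless of how the source data is arranged.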

Most real-world use cases for ticking data involve connecting to data streams that are constantly being updated. For this, Deephaven's Apache Kafka integration is first-in-class, and almost any imaginable real-time streaming source can be wrangled with the TablePublisher or DynamicTableWriter. That said, setting up the pipelines for Kafka streams or other real-time data sources can be very complex, and is outside the scope of this guide.