Input tables

Input tables are customizable database tables designed to store data provided by users. Unlike other tables, users can add, remove, and modify data in input tables via the UI. This makes them similar to spreadsheets. Input table data can also be added, removed, and modified programmatically.

Input tables must contain at least one key column and can include any number of value columns. The key column(s) in input tables are used to update rows. If a new row of data is added and the key value does not already exist in the table, a new row will be created. However, if the key value is already present, the row with that key value will be modified.

Create an input table

The following code creates an Input Table with the following columns:

  • Symbol: The sole key column of type String.
  • Exchange: A value column of type String.
  • MyDoubleCol: A value column of type float64 (double).
  • MyBooleanCol: A value column of type bool.
import io.deephaven.engine.util.TableTools
import io.deephaven.enterprise.inputtables.ColumnSpec
import io.deephaven.enterprise.inputtables.InputTableSpec

// Create schema
inputTableSpec = InputTableSpec.builder()
        .addColumnSpec(ColumnSpec.builder().name("Symbol").setDataType(String.class).isKey(true).build())
        .addColumnSpec(ColumnSpec.builder().name("Exchange").setDataType(String.class).isKey(false).build())
        .addColumnSpec(ColumnSpec.builder().name("MyDoubleCol").setDataType(Double.class).isKey(false).build())
        .addColumnSpec(ColumnSpec.builder().name("MyBooleanCol").setDataType(boolean.class).isKey(false).build())
        .build()

// Create an empty input table
db.addInputTableSchema("InputTableNamespace", "InputTableName", inputTableSpec)
it = db.inputTable("InputTableNamespace", "InputTableName")

// Add data
dataTable = TableTools.newTable(
        TableTools.stringCol("Symbol", "GOOG", "APPL"),
        TableTools.stringCol("Exchange", "ARCA", "ARCA"),
        TableTools.doubleCol("MyDoubleCol", 12.5, 12.75),
        TableTools.booleanCol("MyBooleanCol", true, false)
)

inputTableUpdater = db.inputTableUpdater("InputTableNamespace", "InputTableName")
inputTableUpdater.add(dataTable)
from deephaven import dtypes, new_table
from deephaven.column import bool_col, double_col, int_col, string_col
from deephaven_enterprise.input_tables import ColumnSpec, InputTableSpec

# Create schema
input_table_spec = InputTableSpec(
    [
        ColumnSpec(name="Symbol", data_type=dtypes.string, is_key=True),
        ColumnSpec(name="Exchange", data_type=dtypes.string, is_key=False),
        ColumnSpec(name="MyDoubleCol", data_type=dtypes.float64, is_key=False),
        ColumnSpec(name="MyBooleanCol", data_type=dtypes.bool_, is_key=False),
    ]
)

# Create an empty input table
db.add_input_table_schema("InputTableNamespace", "InputTableName", input_table_spec)
it = db.input_table("InputTableNamespace", "InputTableName")

# Add data
data_table = new_table(
    [
        string_col("Symbol", ["GOOG", "APPL"]),
        string_col("Exchange", ["ARCA", "ARCA"]),
        double_col("MyDoubleCol", [12.5, 12.75]),
        bool_col("MyBooleanCol", [True, False]),
    ]
)
it.add(data_table)

You can also manually input the data by clicking on a cell and typing in the values directly, like a spreadsheet.

Create input tables programmatically

This section has some examples of most Input Table methods. For detailed API documentation, please refer to the Javadoc for the WritableDatabase interface or the Pydoc for the Database interface.

Note

The system disallows a mixture of input table and non-input table methods for a given table.

Add input table schema

The above example demonstrated how to add an input table schema by explicitly specifying columns. Alternatively, you can add an input table schema based on another table:

import io.deephaven.engine.util.TableTools

dataTable = TableTools.newTable(
        TableTools.stringCol("Symbol", "GOOG", "APPL"),
        TableTools.stringCol("Exchange", "ARCA", "ARCA"),
        TableTools.doubleCol("MyDoubleCol", 12.5, 12.75),
        TableTools.booleanCol("MyBooleanCol", true, false)
)
// The last param identifies the key columns
db.addInputTableSchema("InputTableNamespace", "InputTableName", dataTable.getDefinition(), "Symbol")
from deephaven import new_table
from deephaven.column import bool_col, double_col, string_col

data_table = new_table(
    [
        string_col("Symbol", ["GOOG", "APPL"]),
        string_col("Exchange", ["ARCA", "ARCA"]),
        double_col("MyDoubleCol", [12.5, 12.75]),
        bool_col("MyBooleanCol", [True, False]),
    ]
)

# The last param identifies the key columns
db.add_input_table_schema_from_table(
    "InputTableNamespace", "InputTableName", data_table, ["Symbol"]
)

You must manually specify the key columns when adding an input table schema based on another table. You can specify more than one key column by passing in more than one column name:

// The last params identify the key columns
db.addInputTableSchema("InputTableNamespace", "InputTableName", dataTable.getDefinition(), "Symbol", "Exchange")
# The last param identifies the key columns
db.add_input_table_schema_from_table(
    "InputTableNamespace", "InputTableName", data_table, ["Symbol", "Exchange"]
)

Input table schemas support the following data types:

PythonGroovy
dtypes.stringString.class
dtypes.InstantInstant.class
dtypes.bool_Boolean.class
dtypes.charchar.class
dtypes.float32float.class
dtypes.float64double.class
dtypes.bytebyte.class
dtypes.shortshort.class
dtypes.int32int.class
dtypes.longlong.class

Get an input table from the Deephaven database

The following example gets an input table called InputTableName from the InputTableNamespace namespace:

it = db.inputTable("InputTableNamespace", "InputTableName")
it = db.input_table("InputTableNamespace", "InputTableName")

Update input table schema

Updating an input table schema follows the same pattern as adding and differs only by the method call. However, it's useful to first retrieve the current input table specification before updating.

Add a column

import io.deephaven.enterprise.inputtables.ColumnSpec

inputTableSpec = db.inputTableSpecFor("InputTableNamespace", "InputTableName")
        .asBuilder()
        .addColumnSpec(ColumnSpec.builder().name("MyIntCol").setDataType(Integer.class).isKey(false).build())
        .build()
hasUpdated = db.updateInputTableSchema("InputTableNamespace", "InputTableName", inputTableSpec)
from deephaven import dtypes
from deephaven_enterprise.input_tables import ColumnSpec

input_table_spec = db.input_table_spec_for(
    "InputTableNamespace", "InputTableName"
).add_column_specs([ColumnSpec(name="MyIntCol", data_type=dtypes.int32, is_key=False)])
has_updated = db.update_input_table_schema(
    "InputTableNamespace", "InputTableName", input_table_spec
)

Remove a column

inputTableSpec = db.inputTableSpecFor("InputTableNamespace", "InputTableName").removeColumnSpecs("MyIntCol").build()
hasUpdated = db.updateInputTableSchema("InputTableNamespace", "InputTableName", inputTableSpec)
input_table_spec = db.input_table_spec_for(
    "InputTableNamespace", "InputTableName"
).remove_column_specs(["MyIntCol"])
has_updated = db.update_input_table_schema(
    "InputTableNamespace", "InputTableName", input_table_spec
)

Caution

After adding or removing columns, ensure that any related input tables currently active in your session are reloaded. Otherwise, you won't be able to add, update, or delete rows from the input table view.

Rules for adds and removes

You can add or remove columns in combination, but some updates are prohibited. You cannot:

  • Add or remove key columns.
  • Change the data type of columns.
  • Re-add columns that were previously removed unless the data type is the same as before.
  • Change from value column to key column or vice versa.

The add and update methods return a boolean indicating whether the operation succeeded.

Delete data from an input table

The first example adds data programmatically as well as from the UI using a table with the same definition as the input table. The following example programmatically deletes data from an input table by specifying a table containing the key values you want to delete:

import io.deephaven.engine.util.TableTools
table = TableTools.newTable(
        TableTools.stringCol("Symbol", "GOOG")
)
inputTableUpdater = db.inputTableUpdater("InputTableNamespace", "InputTableName")
inputTableUpdater.delete(table)
from deephaven import new_table
from deephaven.column import string_col

table = new_table([string_col("Symbol", ["GOOG"])])

it = db.input_table("InputTableNamespace", "InputTableName")

it.delete(table)

You can also do the same using the secondary click in the UI directly.

img

Delete an input table

The following example deletes an input table:

isDeleted = db.deleteInputTable("InputTableNamespace", "InputTableName")
is_deleted = db.delete_input_table("InputTableNamespace", "InputTableName")

The delete method returns a boolean indicating whether the input table was deleted.

Interact with input tables using the GUI

You can interact with input tables on the GUI the same way as Legacy input tables.

Implementation details

This section contains lower-level details about how input tables are implemented and some additional APIs for interacting with them.

Input table schema

Input table specifications are retained within the table's schema in the InputTableSpec element. Following is an example schema for "InputTableNamespace.InputTableName", which was updated to first add the column "MyIntCol" and then updated to remove that same column.

<Table namespace="InputTableNamespace" name="InputTableName" storageType="NestedPartitionedOnDisk" namespaceSet="User">
<InputTableSpec type="Keyed" version="3">
  <KeyColumn name="Symbol" dataType="java.lang.String" />

  <ValueColumn name="Exchange" dataType="java.lang.String" />
  <ValueColumn name="MyDoubleCol" dataType="double" />
  <ValueColumn name="MyBooleanCol" dataType="java.lang.Boolean" />
  <ValueColumn name="MyIntCol" dataType="int" removed="true" />
</InputTableSpec>

<Column name="Date" dataType="String" columnType="Partitioning" />
<Column name="_AuthenticatedUser" dataType="String" columnType="Normal" symbolTable="ColumnLocation" />
<Column name="_EffectiveUser" dataType="String" columnType="Normal" symbolTable="ColumnLocation" />
<Column name="_OperationAuthenticatedUser" dataType="String" columnType="Normal" symbolTable="ColumnLocation" />
<Column name="_OperationEffectiveUser" dataType="String" columnType="Normal" symbolTable="ColumnLocation" />
<Column name="_ProcessInfoId" dataType="String" columnType="Normal" symbolTable="ColumnLocation" />
<Column name="_EngineVersion" dataType="String" columnType="Normal" symbolTable="ColumnLocation" />
<Column name="_BatchId" dataType="int" columnType="Normal" />
<Column name="_ChangeTime" dataType="DateTime" columnType="Normal" />
<Column name="_Deleted" dataType="Boolean" columnType="Normal" />
<Column name="Symbol" dataType="String" columnType="Normal" symbolTable="ColumnLocation" />
<Column name="Exchange" dataType="String" columnType="Normal" symbolTable="ColumnLocation" />
<Column name="MyDoubleCol" dataType="double" columnType="Normal" />
<Column name="MyBooleanCol" dataType="Boolean" columnType="Normal" />
</Table>

Currently, only Keyed input table types are supported for persisted Core+ input tables. You can still create in-memory input tables of keyed or append-only type using the Core API.

version is incremented every time the input table schema is updated.

Note that because "MyIntCol" was removed, it is not listed as a normal Column element, but is retained as a ValueColumn element in the InputTableSpec with the attribute removed="true". This is to prevent users from trying to re-add the column in the future with a different data type, which would cause errors in data interpretation.

You may notice Column elements that are prefixed with an underscore. Every input table has special metadata columns that are available for auditing. The user may not use these special column names:

ColumnTypeDescription
_AuthenticatedUserStringThe authenticated user running the worker.
_EffectiveUserStringThe effective user running the worker.
_OperationAuthenticatedUserStringThe authenticated user running the operation.
_OperationEffectiveUserStringThe effective user running the operation.
_ProcessInfoIdStringThe unique key for the worker.
_EngineVersionStringThe running Core+ version, which contains the Core and Enterprise version.
_BatchIdintThe id associated with an input table operation.
_ChangeTimeInstantThe time associated with an input table operation.
_DeletedBooleanWhether the input table operation was a delete or not.

Underlying user table

Input tables are backed by live partitioned user tables, although its API differs from partitioned tables. Input table column partitions are the dates of add or delete events based on the system's default time zone. The schema shown in the above example is the schema of the underlying live partitioned user table. It shows the column Date as a partitioning column.

To access the underlying live table directly, run:

history = db.liveTable("InputTableNamespace", "InputTableName")
history = db.live_table("InputTableNamespace", "InputTableName")

This will show you all the columns from the above schema as well as the history of all the operations performed on the input table. Queries that view the input table only use the last row for each key (using lastby() on key columns). This filters out the rows marked as deleted (using where()) and filters out the deleted as well as additional metadata columns.

Input table snapshots

Over time, input table usage can result in a lot of partitions and, therefore, a large fetch-time. To reduce the fetch-time, you can snapshot input tables, which effectively takes the current input table view and writes it into a single partition. From then on, retrieving the input table will only fetch this snapped partition and new data since the snapshot.

The following Groovy example demonstrates how to snapshot input tables:

import io.deephaven.enterprise.database.inputtables.InputTableSnapshotter

// InputTable Snapshot by Namespace:
InputTableSnapshotter.create().runByNamespace(db, "InputTableNamespace")

// InputTable Snapshot by Namespace and TableName:
InputTableSnapshotter.create().runByTableName(db, "InputTableNamespace", "InputTableName")

// InputTable Snapshot all Input Tables:
InputTableSnapshotter.create().runAll(db)

The following properties can be modified in the Enterprise configuration to control the number of days or modified rows required for a new snapshot to be created. The default values are:

InputTableSnapshotter.daysSinceLastSnapshotThreshold=7
InputTableSnapshotter.changesToSnapshotSizeRatioThreshold=0.2

Convert input tables from Legacy to Core+

You can convert a Legacy input table into a Core+ input table from a Core+ worker. Out of caution, the process creates a Core+ input table that copies the latest state of the Legacy input table to a new table name and/or namespace, without affecting the Legacy input table itself. Only the current state of the Legacy input table is transferred to the new table; the history of the Legacy input table is not carried over.

The following query performs the conversion (in Groovy only):

import io.deephaven.enterprise.database.inputtables.InputTableConverter

InputTableConverter.create().run(db, "OriginalNamespace", "OriginalTableName", "TargetNamespace", "TargetTableName")

Replace the Legacy input table

After verifying a successful conversion, you may want to delete the original Legacy input table and rename the new Core+ input table to the original name.

To delete the original Legacy input table (removing the data, schema, InputTable configuration, and snapshots), run the following from a Legacy worker in a Groovy console:

InputTable.remove(db, "OriginalNamespace", "OriginalTableName")
if (db.hasTable("OriginalNamespace_Snapshots", "OriginalTableName")) {
  db.removeTable("OriginalNamespace_Snapshots", "OriginalTableName")
}

To rename the new Core+ input to the original namespace and table name of the Legacy input table, run the following from a Core+ worker:

// Copy schema
inputTableSpec = db.inputTableSpecFor("TargetNamespace", "TargetTableName")
db.addInputTableSchema("OriginalNamespace", "OriginalTableName", inputTableSpec)

// Copy data
inputTable = db.inputTable("TargetNamespace", "TargetTableName")
db.inputTableUpdater("OriginalNamespace", "OriginalTableName").add(inputTable)

// Verify copy
copiedInputTable = db.inputTable("OriginalNamespace", "OriginalTableName")
# Copy schema
input_table_spec = db.input_table_spec_for("TargetNamespace", "TargetTableName")
db.add_input_table_schema("OriginalNamespace", "OriginalTableName", input_table_spec)

# Copy data
input_table = db.input_table("TargetNamespace", "TargetTableName")
db.input_table("OriginalNamespace", "OriginalTableName").add(input_table)

# Verify copy
copied_input_table = db.input_table("OriginalNamespace", "OriginalTableName")

Renaming the Core+ input table is actually a copy and delete of said Core+ input table.

After verifying a successful copy, you can delete the first Core+ input table from a Core+ worker via:

db.deleteInputTable("TargetNamespace", "TargetTableName")
db.delete_input_table("TargetNamespace", "TargetTableName")