Manage

When using Deephaven, you may want to store the results of a query for later use. Deephaven supports saving data both within the database and in external formats. Deephaven allows users to store their own tables in the database. This section of the documentation discusses managing data in user tables.

Note

See Table Storage

User tables can be stored in any namespace that does not contain system tables. Namespaces do not need to be configured in advance; they are created automatically when a query creates a user table. Once data is stored in a user table, all users with the appropriate permissions can access it.

There are two types of user tables:

  • Directly manipulated user tables, stored on a locally accessible filesystem. These may be non-partitioned or partitioned (see below).
  • Centrally managed user tables, written by the Remote Table Appender, are always partitioned. User tables can be added, replaced, or deleted at any time.

Caution

User namespaces and tables definitions are stored without regard for how the data is managed. However, each section of this documentation (Non-Partitioned Direct User Tables, Partitioned Direct User Tables, and Centrally Managed User Tables represent distinct APIs and these methods should not be mixed. Note that when writing a ticking table to a Database, you must hold the LTM lock. See: Live Table Monitor Lock.

Partitions

Partitions are distinct segments of data that can be managed independently.

  • A non-partitioned, or splayed table, is not divided into individual sections: the table exists as a whole.
  • With a partitioned table, one column is designated as the "partitioning column." All rows with the same value in this column are placed in the same partition. Each partition constitutes an independent section of the table. Individual partitions can be added, replaced, or removed without altering data from other partitions. The most common partitioning column is Date. Partitioning tables by date simplifies the management of tables containing data over long time periods.
  • To add a partitioned user table to the database, you must first add the table definition. The table definition can be taken from another table, such as the table whose data will be added to the new user table.

Non-Partitioned Direct User Tables

Several methods are used to manage content within directly manipulated, non-partitioned user tables.

Adding a Table

The addTable method is used to create a new, non-partitioned direct user table in a namespace, and simultaneously writes the table and its definition. Arguments to the method include the namespace, the new table name and the variable from which the data is copied.

The following example copies data from queryResults and then uses that data to create a new non-partitioned user table called "NewTableName" in the "ExampleUserNamespace":

db.addTable("ExampleUserNamespace", "NewTableName", queryResults)

Caution

The operation will fail if the definition already exists in an existing table named "NewTableName".

You may also specify a storage format, e.g., DeephavenV1 or Parquet, as follows:

db.addTable("Namespace", "NewTableName", TableName, Database.StorageFormat.DeephavenV1)

or

db.addTable("Namespace", "NewTableName", TableName, Database.StorageFormat.Parquet)

Appending to a Table

The appendTable method is used to append data onto an existing non-partitioned direct user table. The following example copies data from "queryResults" and then appends the content to an existing user table called "ExampleTable" in the "ExampleUserNamespace":

db.appendTable("ExampleUserNamespace", "ExampleTable", queryResults)

Replacing a Table

The replaceTable method is used to replace (not append or modify) data in an existing non-partitioned direct user table. Arguments to the method include the namespace, the name of the table containing the data to be replaced, and the variable from which the new data is copied. The following example copies data from "queryResults" and then replaces the content in an existing user table called "ExampleTable" in the "ExampleUserNamespace":

db.replaceTable("ExampleUserNamespace", "ExampleTable", queryResults)

Note

If no table exists with the namespace and table name provided, replaceTable() behaves identically to addTable().

You may also specify a storage format, e.g., DeephavenV1 or Parquet, as follows:

db.replaceTable("Namespace", "NewTableName", TableName, Database.StorageFormat.DeephavenV1)

or

db.replaceTable("Namespace", "NewTableName", TableName, Database.StorageFormat.Parquet)

Removing a Table

The removeTable method removes an existing non-partitioned user table and its definition from the specified namespace. The following example removes an existing user table called "ExampleTable" from the "ExampleUserNamespace":

db.removeTable("ExampleUserNamespace", "ExampleTable")

Other Methods

  • deleteTables will find all table locations and delete them. This may be necessary if any data is corrupt.

  • deleteMatchingTables will delete all tables with names matching the regular expression. This example regex matches tables beginning with "Test": db.deleteMatchingTables("ExampleUserNamespace" , "Test.\*")

  • deleteNamespace will delete the entire namespace and all tables and data therein, and should be used with caution. Do not use if there are centrally managed user tables in the namespace.

Partitioned Direct User Tables

To add a directly manipulated, partitioned user table to the database, you must designate a partition and create the table definition. The table definition can be taken from another table, such as the table whose data will be added to the new user table.

The examples below demonstrate creating and modifying a partitioned table based on data stored as the "queryResult" variable. The table will be partitioned by date.

Adding a Table Definition for a Partitioned Direct User Table

The addPartitionedTableDefinition method is used create a partitioned table definition in the database. The arguments provided to the method are:

  • namespace: The namespace under which the table is stored.
  • tableName: The name used to save the table.
  • partitioningColumnName: The column by which the data will be partitioned.
  • tableDefinition: A table definition, which specifies the columns to be saved. This can be taken from an existing table with the getDefinition() method.

The code below demonstrates the creation of a table called "MyQueryResults" in a namespace called "ExampleUserNamespace". The table will be partitioned by the Date column, and will use the table definition (thus, the columnset) of the table stored in the "queryResults" variable:

db.addPartitionedTableDefinition("ExampleUserNamespace", "MyQueryResults", "Date", queryResults.getDefinition())

Using addTablePartitionAndDefinition() is identical to calling addPartitionedTableDefinitionIfAbsent() immediately before calling addTablePartition().

Storing Data in a Partitioned Direct User Table

To store data from "queryResults" to the May 1, 2017 partition of the "MyQueryResults" table, use the addTablePartition method:

db.addTablePartition("ExampleUserNamespace", "MyQueryResults", "2017-05-01", queryResults)

The operation will fail if data already exists in the 2017-05-01 partition of the "MyQueryResults" table. It will also fail if the table definition "MyQueryResults" table has not been added yet.

You can also specify a file format, e.g., DeephavenV1 or Parquet, as follows:

db.addTablePartition("UserNamespace", "TableName", "columnPartitionValue", data, Database.StorageFormat.DeephavenV1)

or

db.addTablePartition("UserNamespace", "TableName", "columnPartitionValue", data, Database.StorageFormat.Parquet)

It is possible to automatically add the table definition when storing the data. This can be accomplished with the addTablePartitionAndDefinition method:

db.addTablePartitionAndDefinition ("ExampleUserNamespace", "MyQueryResults", "2017-05-01", queryResults)

Replacing Data in a Partitioned Direct User Table

To replace the existing data stored under the May 1, 2017 partition of the "MyQueryResults", use the replaceTablePartition method:

db.replaceTablePartition("ExampleUserNamespace", "MyQueryResults", "2017-05-01" queryResults)

If no data exists at the partition name provided, replaceTablePartition() behaves identically to addTablePartition().

You can also specify a file format, e.g., DeephavenV1 or Parquet, as follows:

db.replaceTablePartition("UserNamespace", "TableName", "columnPartitionValue", data, Database.StorageFormat.DeephavenV1)

or

db.replaceTablePartition("UserNamespace", "TableName", "columnPartitionValue", data, Database.StorageFormat.Parquet)

The replaceTableDefinition method may also be used.

Removing Data from a Partitioned Direct User Table

To remove the May 5, 2017 partition of the "MyQueryResults" table, use the removeTablePartition method:

db.removeTablePartition("ExampleUserNamespace", "MyQueryResults", "2017-05-01")

The following can be used to remove the entire table:

db.removeTable("ExampleUserNamespace" , "MyQueryResults")

This removes both the table and its definition.

Centrally Managed User Tables

These tables are always partitioned and are written by the Remote Table Appender. Although the methods for specifying definitions are the same as with directly managed user tables, there is no other overlap.

Adding a Table Definition for a Centrally Managed User Table

To create a centrally managed user table, the definition must exist first using the addPartitionedTableDefinition method. The table data is appended to a system-defined internal partition, and the specified column partition value. The arguments provided to the method are:

  • namespace: The namespace under which the table is stored.
  • tableName: The name used to save the table.
  • partitioningColumnName: The column by which the data will be partitioned (e.g., "2019-01-11" for a Date partitioning column).
  • tableDefinition: A table definition, which specifies the columns to be saved. This can be taken from an existing table with the getDefinition() method.

The code below demonstrates the creation of a table called "ExampleQueryResults" in a namespace called "ExampleUserNamespace". The table will be partitioned by the Date column, and will use the table definition (thus, the columnset) of the table stored in the "queryResults2" variable:

db.addPartitionedTableDefinition("ExampleUserNamespace", "ExampleQueryResults", "Date", queryResults2.getDefinition())

Storing Data in a Centrally Managed User Table

To add rows to a centrally managed user table from a table object (i.e., a query table or a memory table), use the appendCentral method.

The method takes the following arguments:

  • namespace: The namespace under which the table is stored.
  • tableName: The name used to save the table.
  • columnPartitionValue: The column partition value (e.g., "2019-01-11" for a Date partitioning column).
  • newData: The table containing the new data to be appended.
  • flush (boolean): true to flush the table to disk; false to wait for flushing (if the implementation supports waiting).

The code below will append rows onto to the partitioned user table, "ExampleQueryResults", created in the previous example:

db.appendCentral("ExampleUserNamespace","ExampleQueryResults", "2018-08-27", queryResults3, true)
db.appendCentral("ExampleUserNamespace","ExampleQueryResults", "2018-08-27", queryResults3, True)

In this example, rows from a different table, "queryResults3", from the specified date will be added to "ExampleQueryResults". The final argument, true, flushes data to the disk immediately. The appendCentral method is useful for persisting data out to disk that is generated in a user script.

Note

The iris.defaultColumnPartitionNamingFunction property determines the default partitioning column for all user tables.

To use the default partitioning column, the appendCentral method takes the following arguments:

  • namespace: The namespace under which the table is stored.
  • tableName: The name used to save the table.
  • newData: The table containing the new data to be appended.
  • flush (boolean): true to flush the table to disk; false to wait for flushing.

Replacing Data in a Centrally Managed User Table

Unlike directly manipulated user tables, there are no distinct methods to replace data. You can delete the data (see below) and append new data to the table. There is no database call to delete a single centrally managed partition. You can create a Remote Table Appender (RTA) and use it directly.

Removing Data from a Centrally Managed User Table

To remove the all the data from the "ExampleQueryResults" table, use the closeAndDeleteCentral method:

db.closeAndDeleteCentral("ExampleUserNamespace", "ExampleQueryResults")

Note that this removes direct partitions in addition to centrally created partitions.

To remove all the data for a single column partition value (e.g., Date), add the column partition value to the closeAndDeleteCentral call:

db.closeAndDeleteCentral("ExampleUserNamespace", "ExampleQueryResults", "1776-07-04")

To delete the table definition itself, use the removeTableDefiniton method:

db.removeTableDefinition("ExampleUserNamespace", "ExampleQueryResults")

In the unlikely event you need to remove a single data location, you will also need to know the internal partition where the data is stored. You can then remove the single location with a script like this:

import com.illumon.iris.db.util.logging.RemoteTableAppender

namespace = "ExampleUserNamespace"
tableName = "ExampleQueryResults"
internalPartition = "RTA"
columnPartition = "2022-04-18"
definition = db.getTableDefinition(namespace, tableName).getWritable()

RemoteTableAppender rta =
        new RemoteTableAppender(log, definition, namespace, tableName, internalPartition, columnPartition)
rta.removePartition()
rta.flush()
rta.close()

Other Methods

The logPartitionTableSnapshot method snapshots a table at a regular interval and writes the snapshot to a partitioned table. The arguments to the method follow:

  • namespace: The namespace under which the table is stored.
  • tableName: The name used to save the table.
  • partitioningColumnName: The column by which the data will be partitioned.
  • columnPartitionValue: The column partition value (e.g., "2019-02-11" for a Date partitioning column).
  • table: The table to snapshot and log (does not contain the partition column).
  • logInterval: The interval, in milliseconds, to log snapshots at.
  • replayer: The data replayer.
  • append (boolean): true to append to an existing partition; false to replace the existing partition.

The logPartitionTableIncremental method writes all changes in an input table to a linked partitioned table. The arguments to the method follow:

  • namespace: The namespace under which the table is stored.
  • tableName: The name used to save the table.
  • partitioningColumnName: The column by which the data will be partitioned.
  • columnPartitionValue: The column partition value (e.g., "2019-02-11" for a Date partitioning column).
  • table: The table to log (does not contain the partition column).
  • append (boolean): true to append to an existing partition; false to replace the existing partition.

Naming Conventions

Please refer to the following rules for naming columns, variables, tables, namespaces and partitions.

Column Names

Column names cannot be reserved words. These include Java-reserved words, such as "double", "null", "int", and others (see full list), as well as the following Deephaven -reserved words: "in", "not", "i" and "k".

Column names must begin with a letter (a-z, A-Z), or the underscore character "_"; and subsequent characters may be letters, digits or underscore characters. Spaces, dollar signs and other special characters are not allowed.

Query Variable Names

Query variable names follow the same rules as column names. See above.

Table and Namespace Names

Table and namespace names must start with a letter (a-z, A-Z) or an underscore "_". Subsequent characters can be letters, numbers, underscores, hyphens "-", the plus sign "+" and the at sign "@".

Partition Names

Partition names follow the same rules as table and namespace names (see above). However, partition names may also start with a number.

Viewing and Loading Data

Deephaven TableTools contains many methods for printing, loading, saving and modifying tables. By default, all methods of Table Tools are statically imported into each Groovy session, making an import unnecessary. In Python, the TableTools methods are in the deephaven.TableTools package and must be imported before they are used.

See also the TableTools JavaDocs and the TableTools PyDocs.

Some of those functions follow.

show

Prints the first few rows of a table to a standard output:

show(Table variable, ...)

t = db.t("LearnDeephaven", "StockTrades").where("Date = `2017-08-22`")
show(t)
from deephaven.TableTools import show

t = db.t("LearnDeephaven", "StockTrades").where("Date = `2017-08-22`")
show(t)

showWithIndex

Prints the first few rows of a table to standard output, and also shows the details of the index used to retrieve the data:

showWithIndex(Table variable, ...)

showWithIndex(t)
from deephaven.TableTools import showWithIndex

showWithIndex(t)

html

Returns an entire table formatted as HTML:

html(Table variable)

t2 = db.t("LearnDeephaven", "StockTrades").where("Date = `2017-08-22`").head(10)
println(html(t2))
from deephaven.TableTools import html

t2 = db.t("LearnDeephaven", "StockTrades").where("Date = `2017-08-22`").head(10)
print(html(t2))

Note

This should only be used on small tables, as the whole string is built in memory.

string

Returns the first few rows of a table as a pipe-delimited string:

string(Table variable, ...)

t3 = db.t("LearnDeephaven", "StockTrades").where("Date = `2017-08-22`").head(20)
println(string(t3))
from deephaven.TableTools import string

t3 = db.t("LearnDeephaven", "StockTrades").where("Date = `2017-08-22`").head(20)
print(string(t3))

diff

Computes the difference of two tables for use in verification and provides a string report. maxDiffLines indicates the number of rows to compare before stopping the comparison.

diff(Table actual, Table expected, maxDiffLines)

stock21 = db.t("LearnDeephaven", "StockTrades").where("Date = `2017-08-21`")
stock22 = db.t("LearnDeephaven", "StockTrades").where("Date = `2017-08-22`")

println(diff(stock21, stock22, 100))
from deephaven.TableTools import diff

stock21 = db.t("LearnDeephaven", "StockTrades").where("Date = `2017-08-21`")
stock22 = db.t("LearnDeephaven", "StockTrades").where("Date = `2017-08-22`")

print(diff(stock21, stock22, 100))

readCsv

Loads a CSV file as an in-memory DB table. This method supports pipe, tab, or any single-character delimited file. Column types are inferred from the data:

readCsv(...)

Sample syntax:

myTable=readCsv("path_to_file")
from deephaven.TableTools import readCsv

myTable=readCsv("path_to_file")
path = "tmp.csv"
delim = "|"
readCsv(path, delim)
from deephaven.TableTools import readCsv

path = "tmp.csv"
delim = "|"
readCsv(path, delim)

Note

The newline character (\n) is not supported.

writeCsv

Writes a DB table out as a CSV file:

writeCsv(...)

Sample syntax:

writeCsv(tableToExport, "/mnt/exportPath/myResults.csv", true)
from deephaven.TableTools import writeCsv

writeCsv(tableToExport, "/mnt/exportPath/myResults.csv", True)

readBin

Loads a binary log file as an in-memory DB table:

readBin(...)

The argument must specify the correct classpath to ensure access to the listener class that matches the file to be loaded.

Managing Tables

TableManagementTools contains many methods for managing tables.

Note

These methods are primarily used by database administrators performing low level work on the system. The TableManagementTools class is not imported by default, and should be manually imported into either a Groovy or Python session.

Some of those functions follow.

readTable

Loads a DB binary table:

readTable(Table t, file location)

writeTable

Saves a table to disk in DB binary format:

writeTable(Table t, destination)

deleteTable

Deletes a DB binary format table from disk:

deleteTable(Table t, file location)

getAllDbDirs

Gets all of the directories for an on-disk database:

getAllDbDirs(...)

import class com.illumon.iris.db.tables.utils.TableManagementTools

dirs = getAllDbDirs(String tableName, File rootDir, int levelsDeep)
from deephaven.TableManagementTools import getAllDbDirs

dirs = getAllDbDirs(String tableName, File rootDir, int levelsDeep)