Manage
When using Deephaven, you may want to store the results of a query for later use. Deephaven supports saving data both within the database and in external formats. Deephaven allows users to store their own tables in the database. This section of the documentation discusses managing data in user tables.
Note
See Table Storage
User tables can be stored in any namespace that does not contain system tables. Namespaces do not need to be configured in advance; they are created automatically when a query creates a user table. Once data is stored in a user table, all users with the appropriate permissions can access it.
There are two types of user tables:
- Directly manipulated user tables, stored on a locally accessible filesystem. These may be non-partitioned or partitioned (see below).
- Centrally managed user tables, written by the Remote Table Appender, are always partitioned. User tables can be added, replaced, or deleted at any time.
Caution
User namespaces and tables definitions are stored without regard for how the data is managed. However, each section of this documentation (Non-Partitioned Direct User Tables, Partitioned Direct User Tables, and Centrally Managed User Tables represent distinct APIs and these methods should not be mixed. Note that when writing a ticking table to a Database, you must hold the LTM lock. See: Live Table Monitor Lock.
Partitions
Partitions are distinct segments of data that can be managed independently.
- A non-partitioned, or splayed table, is not divided into individual sections: the table exists as a whole.
- With a partitioned table, one column is designated as the "partitioning column." All rows with the same value in this column are placed in the same partition. Each partition constitutes an independent section of the table. Individual partitions can be added, replaced, or removed without altering data from other partitions. The most common partitioning column is Date. Partitioning tables by date simplifies the management of tables containing data over long time periods.
- To add a partitioned user table to the database, you must first add the table definition. The table definition can be taken from another table, such as the table whose data will be added to the new user table.
Non-Partitioned Direct User Tables
Several methods are used to manage content within directly manipulated, non-partitioned user tables.
Adding a Table
The addTable
method is used to create a new, non-partitioned direct user table in a namespace, and simultaneously writes the table and its definition. Arguments to the method include the namespace, the new table name and the variable from which the data is copied.
The following example copies data from queryResults and then uses that data to create a new non-partitioned user table called "NewTableName" in the "ExampleUserNamespace":
db.addTable("ExampleUserNamespace", "NewTableName", queryResults)
Caution
The operation will fail if the definition already exists in an existing table named "NewTableName".
You may also specify a storage format, e.g., DeephavenV1 or Parquet, as follows:
db.addTable("Namespace", "NewTableName", TableName, Database.StorageFormat.DeephavenV1)
or
db.addTable("Namespace", "NewTableName", TableName, Database.StorageFormat.Parquet)
Appending to a Table
The appendTable
method is used to append data onto an existing non-partitioned direct user table. The following example copies data from "queryResults" and then appends the content to an existing user table called "ExampleTable" in the "ExampleUserNamespace":
db.appendTable("ExampleUserNamespace", "ExampleTable", queryResults)
Replacing a Table
The replaceTable
method is used to replace (not append or modify) data in an existing non-partitioned direct user table. Arguments to the method include the namespace, the name of the table containing the data to be replaced, and the variable from which the new data is copied. The following example copies data from "queryResults" and then replaces the content in an existing user table called "ExampleTable" in the "ExampleUserNamespace":
db.replaceTable("ExampleUserNamespace", "ExampleTable", queryResults)
Note
If no table exists with the namespace and table name provided, replaceTable()
behaves identically to addTable()
.
You may also specify a storage format, e.g., DeephavenV1 or Parquet, as follows:
db.replaceTable("Namespace", "NewTableName", TableName, Database.StorageFormat.DeephavenV1)
or
db.replaceTable("Namespace", "NewTableName", TableName, Database.StorageFormat.Parquet)
Removing a Table
The removeTable
method removes an existing non-partitioned user table and its definition from the specified namespace. The following example removes an existing user table called "ExampleTable" from the "ExampleUserNamespace":
db.removeTable("ExampleUserNamespace", "ExampleTable")
Other Methods
-
deleteTables
will find all table locations and delete them. This may be necessary if any data is corrupt. -
deleteMatchingTables
will delete all tables with names matching the regular expression. This example regex matches tables beginning with "Test":db.deleteMatchingTables("ExampleUserNamespace" , "Test.\*")
-
deleteNamespace
will delete the entire namespace and all tables and data therein, and should be used with caution. Do not use if there are centrally managed user tables in the namespace.
Partitioned Direct User Tables
To add a directly manipulated, partitioned user table to the database, you must designate a partition and create the table definition. The table definition can be taken from another table, such as the table whose data will be added to the new user table.
The examples below demonstrate creating and modifying a partitioned table based on data stored as the "queryResult" variable. The table will be partitioned by date.
Adding a Table Definition for a Partitioned Direct User Table
The addPartitionedTableDefinition
method is used create a partitioned table definition in the database. The arguments provided to the method are:
namespace
: The namespace under which the table is stored.tableName
: The name used to save the table.partitioningColumnName
: The column by which the data will be partitioned.tableDefinition
: A table definition, which specifies the columns to be saved. This can be taken from an existing table with thegetDefinition()
method.
The code below demonstrates the creation of a table called "MyQueryResults" in a namespace called "ExampleUserNamespace". The table will be partitioned by the Date column, and will use the table definition (thus, the columnset) of the table stored in the "queryResults" variable:
db.addPartitionedTableDefinition("ExampleUserNamespace", "MyQueryResults", "Date", queryResults.getDefinition())
Using addTablePartitionAndDefinition()
is identical to calling addPartitionedTableDefinitionIfAbsent()
immediately before calling addTablePartition()
.
Storing Data in a Partitioned Direct User Table
To store data from "queryResults" to the May 1, 2017 partition of the "MyQueryResults" table, use the addTablePartition
method:
db.addTablePartition("ExampleUserNamespace", "MyQueryResults", "2017-05-01", queryResults)
The operation will fail if data already exists in the 2017-05-01 partition of the "MyQueryResults" table. It will also fail if the table definition "MyQueryResults" table has not been added yet.
You can also specify a file format, e.g., DeephavenV1 or Parquet, as follows:
db.addTablePartition("UserNamespace", "TableName", "columnPartitionValue", data, Database.StorageFormat.DeephavenV1)
or
db.addTablePartition("UserNamespace", "TableName", "columnPartitionValue", data, Database.StorageFormat.Parquet)
It is possible to automatically add the table definition when storing the data. This can be accomplished with the addTablePartitionAndDefinition
method:
db.addTablePartitionAndDefinition ("ExampleUserNamespace", "MyQueryResults", "2017-05-01", queryResults)
Replacing Data in a Partitioned Direct User Table
To replace the existing data stored under the May 1, 2017 partition of the "MyQueryResults", use the replaceTablePartition
method:
db.replaceTablePartition("ExampleUserNamespace", "MyQueryResults", "2017-05-01" queryResults)
If no data exists at the partition name provided, replaceTablePartition()
behaves identically to addTablePartition()
.
You can also specify a file format, e.g., DeephavenV1 or Parquet, as follows:
db.replaceTablePartition("UserNamespace", "TableName", "columnPartitionValue", data, Database.StorageFormat.DeephavenV1)
or
db.replaceTablePartition("UserNamespace", "TableName", "columnPartitionValue", data, Database.StorageFormat.Parquet)
The replaceTableDefinition
method may also be used.
Removing Data from a Partitioned Direct User Table
To remove the May 5, 2017 partition of the "MyQueryResults" table, use the removeTablePartition
method:
db.removeTablePartition("ExampleUserNamespace", "MyQueryResults", "2017-05-01")
The following can be used to remove the entire table:
db.removeTable("ExampleUserNamespace" , "MyQueryResults")
This removes both the table and its definition.
Centrally Managed User Tables
These tables are always partitioned and are written by the Remote Table Appender. Although the methods for specifying definitions are the same as with directly managed user tables, there is no other overlap.
Adding a Table Definition for a Centrally Managed User Table
To create a centrally managed user table, the definition must exist first using the addPartitionedTableDefinition
method. The table data is appended to a system-defined internal partition, and the specified column partition value. The arguments provided to the method are:
namespace
: The namespace under which the table is stored.tableName
: The name used to save the table.partitioningColumnName
: The column by which the data will be partitioned (e.g., "2019-01-11" for a Date partitioning column).tableDefinition
: A table definition, which specifies the columns to be saved. This can be taken from an existing table with thegetDefinition()
method.
The code below demonstrates the creation of a table called "ExampleQueryResults" in a namespace called "ExampleUserNamespace". The table will be partitioned by the Date column, and will use the table definition (thus, the columnset) of the table stored in the "queryResults2" variable:
db.addPartitionedTableDefinition("ExampleUserNamespace", "ExampleQueryResults", "Date", queryResults2.getDefinition())
Storing Data in a Centrally Managed User Table
To add rows to a centrally managed user table from a table object (i.e., a query table or a memory table), use the appendCentral
method.
The method takes the following arguments:
namespace
: The namespace under which the table is stored.tableName
: The name used to save the table.columnPartitionValue
: The column partition value (e.g., "2019-01-11" for a Date partitioning column).newData
: The table containing the new data to be appended.flush
(boolean):true
to flush the table to disk;false
to wait for flushing (if the implementation supports waiting).
The code below will append rows onto to the partitioned user table, "ExampleQueryResults", created in the previous example:
db.appendCentral("ExampleUserNamespace","ExampleQueryResults", "2018-08-27", queryResults3, true)
db.appendCentral("ExampleUserNamespace","ExampleQueryResults", "2018-08-27", queryResults3, True)
In this example, rows from a different table, "queryResults3", from the specified date will be added to "ExampleQueryResults". The final argument, true
, flushes data to the disk immediately. The appendCentral
method is useful for persisting data out to disk that is generated in a user script.
Note
The iris.defaultColumnPartitionNamingFunction
property determines the default partitioning column for all user tables.
To use the default partitioning column, the appendCentral
method takes the following arguments:
namespace
: The namespace under which the table is stored.tableName
: The name used to save the table.newData
: The table containing the new data to be appended.flush
(boolean):true
to flush the table to disk;false
to wait for flushing.
Replacing Data in a Centrally Managed User Table
Unlike directly manipulated user tables, there are no distinct methods to replace data. You can delete the data (see below) and append new data to the table. There is no database call to delete a single centrally managed partition. You can create a Remote Table Appender (RTA) and use it directly.
Removing Data from a Centrally Managed User Table
To remove the all the data from the "ExampleQueryResults" table, use the closeAndDeleteCentral
method:
db.closeAndDeleteCentral("ExampleUserNamespace", "ExampleQueryResults")
Note that this removes direct partitions in addition to centrally created partitions.
To remove all the data for a single column partition value (e.g., Date), add the column partition value to the closeAndDeleteCentral
call:
db.closeAndDeleteCentral("ExampleUserNamespace", "ExampleQueryResults", "1776-07-04")
To delete the table definition itself, use the removeTableDefiniton
method:
db.removeTableDefinition("ExampleUserNamespace", "ExampleQueryResults")
In the unlikely event you need to remove a single data location, you will also need to know the internal partition where the data is stored. You can then remove the single location with a script like this:
import com.illumon.iris.db.util.logging.RemoteTableAppender
namespace = "ExampleUserNamespace"
tableName = "ExampleQueryResults"
internalPartition = "RTA"
columnPartition = "2022-04-18"
definition = db.getTableDefinition(namespace, tableName).getWritable()
RemoteTableAppender rta =
new RemoteTableAppender(log, definition, namespace, tableName, internalPartition, columnPartition)
rta.removePartition()
rta.flush()
rta.close()
Other Methods
The logPartitionTableSnapshot
method snapshots a table at a regular interval and writes the snapshot to a partitioned table. The arguments to the method follow:
namespace
: The namespace under which the table is stored.tableName
: The name used to save the table.partitioningColumnName
: The column by which the data will be partitioned.columnPartitionValue
: The column partition value (e.g., "2019-02-11" for a Date partitioning column).table
: The table to snapshot and log (does not contain the partition column).logInterval
: The interval, in milliseconds, to log snapshots at.replayer
: The data replayer.append
(boolean):true
to append to an existing partition;false
to replace the existing partition.
The logPartitionTableIncremental
method writes all changes in an input table to a linked partitioned table. The arguments to the method follow:
namespace
: The namespace under which the table is stored.tableName
: The name used to save the table.partitioningColumnName
: The column by which the data will be partitioned.columnPartitionValue
: The column partition value (e.g., "2019-02-11" for a Date partitioning column).table
: The table to log (does not contain the partition column).append
(boolean):true
to append to an existing partition;false
to replace the existing partition.
Naming Conventions
Please refer to the following rules for naming columns, variables, tables, namespaces and partitions.
Column Names
Column names cannot be reserved words. These include Java-reserved words, such as "double", "null", "int", and others (see full list), as well as the following Deephaven -reserved words: "in", "not", "i" and "k".
Column names must begin with a letter (a-z, A-Z), or the underscore character "_
"; and subsequent characters may be letters, digits or underscore characters. Spaces, dollar signs and other special characters are not allowed.
Query Variable Names
Query variable names follow the same rules as column names. See above.
Table and Namespace Names
Table and namespace names must start with a letter (a-z, A-Z) or an underscore "_
". Subsequent characters can be letters, numbers, underscores, hyphens "-", the plus sign "+" and the at sign "@".
Partition Names
Partition names follow the same rules as table and namespace names (see above). However, partition names may also start with a number.
Viewing and Loading Data
Deephaven TableTools
contains many methods for printing, loading, saving and modifying tables. By default, all methods of Table Tools are statically imported into each Groovy session, making an import unnecessary. In Python, the TableTools
methods are in the deephaven.TableTools
package and must be imported before they are used.
See also the TableTools JavaDocs and the TableTools PyDocs.
Note
Some of those functions follow.
show
Prints the first few rows of a table to a standard output:
show(Table variable, ...)
t = db.t("LearnDeephaven", "StockTrades").where("Date = `2017-08-22`")
show(t)
from deephaven.TableTools import show
t = db.t("LearnDeephaven", "StockTrades").where("Date = `2017-08-22`")
show(t)
showWithIndex
Prints the first few rows of a table to standard output, and also shows the details of the index used to retrieve the data:
showWithIndex(Table variable, ...)
showWithIndex(t)
from deephaven.TableTools import showWithIndex
showWithIndex(t)
html
Returns an entire table formatted as HTML:
html(Table variable)
t2 = db.t("LearnDeephaven", "StockTrades").where("Date = `2017-08-22`").head(10)
println(html(t2))
from deephaven.TableTools import html
t2 = db.t("LearnDeephaven", "StockTrades").where("Date = `2017-08-22`").head(10)
print(html(t2))
Note
This should only be used on small tables, as the whole string is built in memory.
string
Returns the first few rows of a table as a pipe-delimited string:
string(Table variable, ...)
t3 = db.t("LearnDeephaven", "StockTrades").where("Date = `2017-08-22`").head(20)
println(string(t3))
from deephaven.TableTools import string
t3 = db.t("LearnDeephaven", "StockTrades").where("Date = `2017-08-22`").head(20)
print(string(t3))
diff
Computes the difference of two tables for use in verification and provides a string report. maxDiffLines
indicates the number of rows to compare before stopping the comparison.
diff(Table actual, Table expected, maxDiffLines)
stock21 = db.t("LearnDeephaven", "StockTrades").where("Date = `2017-08-21`")
stock22 = db.t("LearnDeephaven", "StockTrades").where("Date = `2017-08-22`")
println(diff(stock21, stock22, 100))
from deephaven.TableTools import diff
stock21 = db.t("LearnDeephaven", "StockTrades").where("Date = `2017-08-21`")
stock22 = db.t("LearnDeephaven", "StockTrades").where("Date = `2017-08-22`")
print(diff(stock21, stock22, 100))
readCsv
Loads a CSV file as an in-memory DB table. This method supports pipe, tab, or any single-character delimited file. Column types are inferred from the data:
readCsv(...)
Sample syntax:
myTable=readCsv("path_to_file")
from deephaven.TableTools import readCsv
myTable=readCsv("path_to_file")
path = "tmp.csv"
delim = "|"
readCsv(path, delim)
from deephaven.TableTools import readCsv
path = "tmp.csv"
delim = "|"
readCsv(path, delim)
Note
The newline character (\n
) is not supported.
writeCsv
Writes a DB table out as a CSV file:
writeCsv(...)
Sample syntax:
writeCsv(tableToExport, "/mnt/exportPath/myResults.csv", true)
from deephaven.TableTools import writeCsv
writeCsv(tableToExport, "/mnt/exportPath/myResults.csv", True)
readBin
Loads a binary log file as an in-memory DB table:
readBin(...)
The argument must specify the correct classpath to ensure access to the listener class that matches the file to be loaded.
Managing Tables
TableManagementTools
contains many methods for managing tables.
Note
These methods are primarily used by database administrators performing low level work on the system. The TableManagementTools class is not imported by default, and should be manually imported into either a Groovy or Python session.
Note
See TableManagementTools JavaDocs or the TableManagementTools PyDocs
Some of those functions follow.
readTable
Loads a DB binary table:
readTable(Table t, file location)
writeTable
Saves a table to disk in DB binary format:
writeTable(Table t, destination)
deleteTable
Deletes a DB binary format table from disk:
deleteTable(Table t, file location)
getAllDbDirs
Gets all of the directories for an on-disk database:
getAllDbDirs(...)
import class com.illumon.iris.db.tables.utils.TableManagementTools
dirs = getAllDbDirs(String tableName, File rootDir, int levelsDeep)
from deephaven.TableManagementTools import getAllDbDirs
dirs = getAllDbDirs(String tableName, File rootDir, int levelsDeep)