Analysis of data in queries often requires creating new tables from some or all of the columns in existing tables. The Deephaven API offers a variety of methods that can achieve this. In this guide, five are discussed:
The methods in this guide store results in different ways that can have a huge impact on query performance. The purpose of this guide is to show how to use these methods, not how to pick the best one for your needs. See our concept guide Choosing the right selection method for more insight.
The examples in this guide use a table called
students created using
students table contains data on four students in a class. If you are unfamiliar with the method, check out our guide Create a new table.
from deephaven import new_table
from deephaven.column import int_col, double_col, string_col
students = new_table([
string_col("Name", ["Andy", "Claire", "Jane", "Steven"]),
int_col("StudentID", [1, 2, 3, 4]),
int_col("TestGrade", [85, 95, 88, 72]),
int_col("HomeworkGrade", [85, 95, 90, 95]),
double_col("GPA", [3.0, 4.0, 3.7, 2.8])
Create a table from columns of a source table
view methods allow the user to create a new table containing columns derived from columns in a source table. The examples below show how to use these methods to select two columns from the source table and create a third to determine if a student passed the class.
Notice how the tables
studentsPassedView contain the same data. The
view methods return tables that appear identical, but they differ in how they store their results.
selectcomputes and stores the result in memory.
viewsaves formulas that are recomputed from data in the source table every time a cell is accessed.
students_passed_select = students.select(formulas=["Name", "GPA", "Passed = GPA >= 3.0"])
students_passed_view = students.view(formulas=["Name", "GPA", "Passed = GPA >= 3.0"])
Add columns to a table
lazy_update methods allow the user to add one or more columns to a source table. The examples below show how these three methods keep all of the data from the source table and add an additional column.
students_passed_lazy_update contain all columns from the
students table and one additional column. The
lazy_update methods return tables that appear identical, but they differ in how they store their results.
updatecomputes and stores the new columns in memory.
update_viewsaves the new columns as formulas that are recomputed from data in the source table every time a cell is accessed.
lazy_updatecaches new column formula evaluations so that each set of formula inputs is computed at most once.
students_passed_update = students.update(formulas=["Passed = GPA >= 3.0"])
students_passed_update_view = students.update_view(formulas=["Passed = GPA >= 3.0"])
students_passed_lazy_update = students.lazy_update(formulas=["Passed = GPA >= 3.0"])