Deephaven TableHandles
TableHandle.Rd
A TableHandle holds a reference to a Deephaven Table on the server, and provides methods for operating on that table.
Note that TableHandles should not be instantiated directly by user code, but rather by server calls accessible from
the Client
class. See ?Client
for more information.
Naming tables on the server
When a TableHandle is created, it is not automatically bound to a variable name on the server. This means that the TableHandle that gets created is the only reference to the table that's been created. Importantly, the variable name given to the TableHandle is purely a local variable, and has no relationship to that table's name on the server. For this reason, code like the following:
client <- Client$new(...)
df1 <- data.frame(x = 1:10, y = 11:20)
t1 <- client$import_table(df1)
client$run_script("t2 = t1.update('z = x + y')")
will not run, because the table referenced by the local variable t1
is not named on the server at all. To
make the table referenced by t1
accessible by name on the server (e.g., from within query strings), you must
bind it to a variable with the method bind_to_variable()
. We adopt the convention of calling local TableHandles
th1
, th2
, etc., and server-side tables t1
, t2
, etc., to help distinguish between the two. So, the above
code should be written as:
client <- Client$new(...)
df1 <- data.frame(x = 1:10, y = 11:20)
th1 <- client$import_table(df1)
th1$bind_to_variable("t1")
client$run_script("t2 = t1.update('z = x + y')")
You can then create a local TableHandle to reference t2
as follows:
th2 <- client$open_table("t2")
The above code is not best practice; calling update()
directly on t1
would be preferred to running a script.
It is, however, more illustrative of the relationship between local TableHandles and server-side tables. The best
way to accomplish the above would be the following:
client <- Client$new(...)
df1 <- data.frame(x = 1:10, y = 11:20)
th1 <- client$import_table(df1)
th2 <- th1$update("z = x + y")
# this is necessary to access the tables from within query strings
th1$bind_to_variable("t1")
th2$bind_to_variable("t2")
Methods
Method new()
Initializes a new TableHandle from an internal Deephaven TableHandle.
Usage
TableHandle$new(table_handle)
Method bind_to_variable()
Binds the table referenced by this TableHandle to a variable on the server, so that it can be referenced by that name.
Method head()
Creates a new table containing the first n
rows of this table.
Method tail()
Creates a new table containing the last n
rows of this table.
Method nrow()
Gets the number of rows in the table referenced by this TableHandle.
Method ncol()
Gets the number of columns in the table referenced by this TableHandle.
Method dim()
Gets the dimensions of the table referenced by this TableHandle. Equivalent to c(nrow, ncol)
.
Method merge()
Merges several tables into one table on the server. All tables must have the same schema as this table, and can be supplied as a list of TableHandles, any number of TableHandles, or a mix of both.
Method as_record_batch_reader()
Converts the table referenced by this TableHandle to an Arrow RecordBatchStreamReader.
Method select()
Creates a new in-memory table that includes one column for each formula. If no formula is specified, all columns will be included.
Usage
TableHandle$select(formulas = character())
Method view()
Creates a new formula table that includes one column for each formula.
Usage
TableHandle$view(formulas = character())
Method update()
Creates a new table containing a new, in-memory column for each formula.
Usage
TableHandle$update(formulas = character())
Method update_view()
Creates a new table containing a new formula column for each formula.
Usage
TableHandle$update_view(formulas = character())
Method drop_columns()
Creates a new table that has the same number of rows as this table,
but omits the columns specified in cols
.
Usage
TableHandle$drop_columns(cols = character())
Method group_by()
Creates a new table containing grouping columns and grouped data, with column content is grouped into arrays. If no group-by column is given, the content of each column is grouped into its own array.
Usage
TableHandle$group_by(by = character())
Method ungroup()
Creates a new table in which array columns from the source table are unwrapped into separate rows. The ungroup columns should be of array types.
Usage
TableHandle$ungroup(by = character())
Method update_by()
Creates a table with additional columns calculated from window-based aggregations of columns in this table.
The aggregations are defined by the provided operations, which support incremental aggregations over the
corresponding rows in the table. The aggregations will apply position or time-based windowing and compute the
results over the entire table or each row group as identified by the provided key columns.
See more detailed documentation here or run ?UpdateBy
.
Usage
TableHandle$update_by(ops, by = character())
Method agg_by()
Creates a new table containing grouping columns and grouped data. The resulting grouped data is defined by the
aggregation(s) specified. See more detailed documentation here or run ?AggBy
.
Usage
TableHandle$agg_by(aggs, by = character())
Method agg_all_by()
Creates a new table containing grouping columns and grouped data. The resulting grouped data is defined by the
aggregation(s) specified. See more detailed documentation here or run ?AggBy
.
This method applies the aggregation to all non-grouping columns of the table, so it can only
accept one aggregation at a time.
Usage
TableHandle$agg_all_by(agg, by = character())
Method first_by()
Creates a new table containing the first row of each group.
Usage
TableHandle$first_by(by = character())
Method last_by()
Creates a new table containing the last row of each group.
Usage
TableHandle$last_by(by = character())
Method head_by()
Creates a new table containing the first num_rows
rows of each group.
Usage
TableHandle$head_by(num_rows, by = character())
Method tail_by()
Creates a new table containing the last num_rows
rows of each group.
Usage
TableHandle$tail_by(num_rows, by = character())
Method min_by()
Creates a new table containing the column-wise minimum of each group.
Usage
TableHandle$min_by(by = character())
Method max_by()
Creates a new table containing the column-wise maximum of each group.
Usage
TableHandle$max_by(by = character())
Method sum_by()
Creates a new table containing the column-wise sum of each group.
Usage
TableHandle$sum_by(by = character())
Method abs_sum_by()
Creates a new table containing the column-wise absolute sum of each group.
Usage
TableHandle$abs_sum_by(by = character())
Method avg_by()
Creates a new table containing the column-wise average of each group.
Usage
TableHandle$avg_by(by = character())
Method w_avg_by()
Creates a new table containing the column-wise weighted average of each group.
Usage
TableHandle$w_avg_by(wcol, by = character())
Method median_by()
Creates a new table containing the column-wise median of each group.
Usage
TableHandle$median_by(by = character())
Method var_by()
Creates a new table containing the column-wise variance of each group.
Usage
TableHandle$var_by(by = character())
Method std_by()
Creates a new table containing the column-wise standard deviation of each group.
Usage
TableHandle$std_by(by = character())
Method percentile_by()
Creates a new table containing the column-wise percentile of each group.
Usage
TableHandle$percentile_by(percentile, by = character())
Method count_by()
Creates a new table containing the number of rows in each group.
Usage
TableHandle$count_by(col, by = character())
Method join()
Creates a new table containing rows that have matching values in both tables. Rows that do not have matching criteria will not be included in the result. If there are multiple matches between a row from the left table and rows from the right table, all matching combinations will be included. If no columns to match (on) are specified, every combination of left and right table rows is included.
Method natural_join()
Creates a new table containing all the rows and columns of this table, plus additional columns containing data from the right table. For columns appended to the left table (joins), row values equal the row values from the right table where the key values in the left and right tables are equal. If there is no matching key in the right table, appended row values are NULL.
Method exact_join()
Creates a new table containing all the rows and columns of this table, plus additional columns containing data from the right table. For columns appended to the left table (joins), row values equal the row values from the right table where the key values in the left and right tables are equal.
Method sort()
Creates a new table containing all the rows and columns of this table, sorted by the specified columns.
Arguments
order_by
String or list of strings denoting the names of the columns to sort by.
descending
Boolean or list of booleans denoting whether to sort in descending order. If a list is supplied, it must be the same length as
order_by
.abs_sort
Boolean or list of booleans denoting whether to sort by absolute value. If a list is supplied, it must be the same length as
order_by
.
Examples
if (FALSE) { # \dontrun{
library(rdeephaven)
# connecting to Deephaven server
client <- Client$new("localhost:10000", auth_type = "psk", auth_token = "my_secret_token")
# create a data frame, push it to the server, and retrieve a TableHandle referencing the new table
df <- data.frame(
timeCol = seq.POSIXt(as.POSIXct(Sys.Date()), as.POSIXct(Sys.Date() + 0.01), by = "1 sec")[1:50],
boolCol = sample(c(TRUE, FALSE), 50, TRUE),
col1 = sample(1000, size = 50, replace = TRUE),
col2 = sample(1000, size = 50, replace = TRUE),
col3 = 1:50
)
th <- client$import_table(df)
# get the dimension of the table
dim(th)
# get the last 10 rows of the table
th2 <- tail(th, 10)
as.data.frame(th2)
# create several new columns
th3 <- th$update(c("col4 = col1 + col2", "charCol = col3 % 2 == 0 ? `A` : `B`"))
as.data.frame(th3)
# filter based on parity of col3
th4 <- th3$where("charCol == `A`")
as.data.frame(th4)
# select a subset of columns
th5 <- th3$select(c("timeCol", "col1", "col4"))
as.data.frame(th5)
# drop timestamp column and get sum of remaining columns grouped by boolCol and charCol
th6 <- th3$
drop_columns("timeCol")$
sum_by(c("boolCol", "charCol"))
as.data.frame(th6)
client$close()
} # }