Skip to main content
Version: Python

Work with time

This guide discusses working with time in Deephaven. Deephaven is a real-time data platform that handles streaming data of various formats and sizes, offering a unified interface for the ingestion, manipulation, and analysis of this data. A significant part of the Deephaven experience involves working with data that signifies specific moments in time: calendar dates, time-periods or durations, time zones, and more. These can appear as data in Deephaven tables, literals in query strings, or values in Python scripts. So, it is important to learn about the best tools for these different jobs. Additionally, because the Deephaven query engine is implemented in Java and queries are often written in Python, there are some details to consider when crossing between the two languages. This guide will show you how to work with time efficiently in all cases.

note

This document provides references to Javadocs in numerous locations, as they are essential for understanding and working with time in Deephaven. For those new to Javadocs or in need of a refresher, refer to our guide on how to read Javadocs.

Because time-based data types are integral to working with Deephaven, understanding their representations and how to manipulate them is critical. This will be true for both server-side applications and client-side applications. As such, the date-time types natively supported in Deephaven are a good starting point for this discussion.

Natively supported date-time types

The Deephaven query engine is responsible for executing a query and updating the query results as data changes. The engine is implemented in Java. As a result, all of the date-time types that are natively supported by the Deephaven engine are Java types. Java's java.time package provides types for representing time-related data.

The java.time types natively supported by Deephaven are:

TypeDescription
java.time.ZoneIdA ZoneId represents a time zone such as "Europe/Paris" or "CST".
java.time.LocalDateA LocalDate is a date without a time zone in the ISO-8601 system, such as "2007-12-03" or "2057-01-28".
java.time.LocalTimeLocalTime is a timestamp without a date or time zone in the ISO-8601 system, such as "10:15:30", "13:45:30.123456789". LocalTime has nanosecond resolution.
java.time.InstantAn Instant represents an unambiguous specific point on the timeline, such as 2021-04-12T14:13:07 UTC. Instant has nanosecond resolution.
java.time.ZonedDateTimeA ZonedDateTime represents an unambiguous specific point on the timeline with an associated time zone, such as 2021-04-12T14:13:07 America/New_York. ZonedDateTime has nanosecond resolution. A ZonedDateTime is effectively an Instant with a ZoneId.
java.time.DurationA Duration represents a duration of time, such as "5.5 seconds". Duration has nanosecond resolution.
java.time.PeriodA Period is a date-based amount of time in the ISO-8601 system, such as "P2y3m4d" (2 years, 3 months and 4 days).

The Deephaven Query Language

Despite the fact that all of Deephaven's natively-supported types are Java types, the most common pattern for using Deephaven is through the Python API. So, how can these Java types be created or manipulated from Python? For this, Deephaven offers the Deephaven Query Language.

The Deephaven Query Language (DQL) is the primary way of expressing commands directly to the query engine. It is responsible for translating the user's intention into compiled code that the engine can execute. DQL is written with strings - called query strings - that can contain a mixture of Java and Python code. Thus, DQL query strings are the entry point to a universe of powerful built-in tools and Python-Java interoperability. These query strings are often used in the context of table operations, like creating new columns or applying filters. Here's a simple DQL example:

from deephaven import empty_table

t = (
empty_table(10)
.update(
[
"Col1 = ii",
"Col2 = Col1 + 3",
"Col3 = 2 * Math.sin(Col1 + Col2)",
]
)
.where(
[
"Col1 % 2 == 0",
"Col3 > 0",
]
)
)

In this example, the query engine converts these simple query strings to Java, compiles them, and executes them. When query strings include Python functions, the query engine uses both Java and Python to evaluate the expressions.

note

To learn more about the details of the DQL syntax and exactly what these commands do, refer to the DQL section in our User Guide, which includes guides on writing basic formulas, working with strings, using built-in functions, and more.

There are four important tools provided by DQL that are relevant to the discussion on date-time types.

1. Built-in Java functions

Deephaven has a collection of built-in functions that are useful for working with date-time types. For the sake of performance, these functions are implemented in Java. DQL supports calling these functions directly in query strings, opening up all of Deephaven's built-in Java functions to the Python interface. The following example uses the built-in Deephaven function now to get the current system time as a Java Instant:

from deephaven import empty_table

t = empty_table(5).update("CurrentTime = now()")
note

now uses the current clock of the Deephaven engine. This clock is typically the system clock, but it may be set to a simulated clock when replaying tables.

These functions can also be applied to columns, constants, and variables. This slightly more complex example uses the built-in Deephaven function epochDaysToLocalDate to create a LocalDate from a long that represents the number of days since the Unix epoch:

from deephaven import empty_table

t = empty_table(5).update(
["DaysSinceEpoch = ii", "LocalDateColumn = epochDaysToLocalDate(DaysSinceEpoch)"]
)

In addition to functions, Deephaven offers many built-in constants to simplify expressions involving time values. These constants, like SECOND, DAY, and YEAR_365, are equal to the number of nanoseconds in a given time period. Many of Deephaven's built-in functions operate at nanosecond resolution, so these constants provide a simple way to work with nanoseconds:

from deephaven import empty_table

t = empty_table(1).update(
[
"CurrentTime = now()",
"NanosSinceEpoch = epochNanos(CurrentTime)",
"SecondsSinceEpoch = NanosSinceEpoch / SECOND",
"MinutesSinceEpoch = NanosSinceEpoch / MINUTE",
"DaysSinceEpoch = NanosSinceEpoch / DAY",
"YearsSinceEpoch = NanosSinceEpoch / YEAR_AVG",
]
)

2. Java object methods

Java is an object-oriented programming language. As such, all Java objects have associated methods. These methods can be called in query strings. Here is an example that builds upon the previous example, and uses the getDayOfWeek method bound to each LocalDate object to extract the day of the week for each LocalDate:

from deephaven import empty_table

t = empty_table(5).update(
[
"DaysSinceEpoch = ii",
"LocalDateColumn = epochDaysToLocalDate(DaysSinceEpoch)",
"DayOfWeek = LocalDateColumn.getDayOfWeek()",
]
)

To be clear:

DQL enables them all to be used seamlessly from the Python API!

3. Arithmetic and inequality operators

Query strings support syntactic sugar for special operators such as +, -, >, <, >=, etc. for Java time types! For instance, it makes sense to add a Period to an Instant, or a Duration to an Instant, or to multiply a Duration by an integer. This example uses the built-in parsePeriod and parseDuration functions to create period and duration columns from strings. Then, the overloaded addition operator + is used to add them to the Timestamp column, and the overloaded multiplication operator * is used to create a column with timestamps that increment daily:

from deephaven import empty_table

t = empty_table(5).update(
[
"Timestamp = now()",
"PeriodColumn = parsePeriod(`P1D`)",
"DurationColumn = parseDuration(`PT24h`)",
"TimePlusPeriod = Timestamp + PeriodColumn",
"TimePlusDuration = Timestamp + DurationColumn",
"IncreasingTime = Timestamp + PeriodColumn * i",
]
)
note

This example uses backticks to represent strings. For more info, see the guide on working with strings.

4. Date-times using DQL

In Deephaven, date-time values can be expressed using very simple literal syntax. These literal values can be used directly in query strings or as string inputs to built-in functions.

tip

In query strings, time literals are denoted with single quotes.

This example creates Duration columns from a time literal as well as from a string parsed by parseDuration:

from deephaven import empty_table

t = empty_table(5).update(
["DurationFromLiteral = 'PT24h'", "DurationFromString = parseDuration(`PT24h`)"]
)
note

Note the difference in single quotes for the time literal and back quotes for the string.

Using query string time literals can yield more compact and more efficient code. In the prior example, parseDuration(`PT24h`) is evaluated for every single row in the table, but here 'PT24h' is only evaluated once for the entire table. This can lead to massive performance differences for large tables:

from deephaven import empty_table
import time

t1_start = time.time()
t1 = empty_table(100000000).update("DurationColumn = parseDuration(`PT24h`)")
t1_end = time.time()

t2_start = time.time()
t2 = empty_table(100000000).update("DurationColumn = 'PT24h'")
t2_end = time.time()

t1_time = t1_end - t1_start
t2_time = t2_end - t2_start
print("Using built-in parse function: ", t1_time)
print("Using literal: ", t2_time)

Most of the seven key Java types can be created using literals or functions like parseDuration:

from deephaven import empty_table

# ZoneId columns can be created with literals or the parseTimeZone built-in.
# The literal or string argument must be a valid Java time zone.
t1 = empty_table(1).update(
["TimeZone1 = 'GMT'", "TimeZone2 = parseTimeZone(`America/New_York`)"]
)

# LocalDate columns can be created with literals or the parseLocalDate built-in.
# The literal or string argument must use the ISO-8601 date format 'YYYY-MM-DD'.
t2 = empty_table(1).update(
["LocalDate1 = '2022-03-28'", "LocalDate2 = parseLocalDate(`2075-08-08`)"]
)

# LocalTime columns can be created with literals or the parseLocalTime built-in.
# The literal or string argument must use the ISO-8601 time format 'hh:mm:ss[.SSSSSSSSS]'.
t3 = empty_table(1).update(
["LocalTime1 = '10:15:30.46'", "LocalTime2 = parseLocalTime(`12:01:01.4567`)"]
)

# Instant columns can be created with literals or the parseInstant built-in.
# The literal or string arguments must use the ISO-8601 date-time format `yyyy-MM-ddThh:mm:ss[.SSSSSSSSS] TZ`.
t4 = empty_table(1).update(
[
"Instant1 = '2015-01-30T12:34:56Z'",
"Instant2 = parseInstant(`2023-11-21T21:30:45Z`)",
]
)

# ZonedDateTime columns cannot be created with literals, as they are indistinguishable from Instant literals.
# ZonedDateTime columns can be created with the parseZonedDateTime built-in,
# or by localizing the time zone of an Instant with the atZone() method.
# The string arguments must use the ISO-8601 date-time format `yyyy-MM-ddThh:mm:ss[.SSSSSSSSS] TZ`.
t5 = empty_table(1).update(
[
"ZonedDateTime1 = parseZonedDateTime(`2021-11-03T01:02:03 GMT`)",
"ZonedDateTime2 = '2023-11-21T21:30:45 GMT'.atZone('GMT')",
]
)

# Duration columns can be created with literals or the parseDuration built-in.
# The literal or string arguments must use the ISO-8601 duration format 'PnDTnHnMn.nS'.
# Negative durations are represented as 'P-nDT-nH-nM-n.nS' or '-PnDTnHnMn.nS'.
t6 = empty_table(1).update(
["Duration1 = 'PT6H30M30S'", "Duration2 = parseDuration(`PT10H`)"]
)

# Period columns can be created with literals or the parsePeriod built-in.
# The literal or string arguments must use the ISO-8601 period format 'PnYnMnD'.
# Negative periods are represented as 'P-nY-nM-nD' or '-PnYnMnD'.
t7 = empty_table(1).update(
["Period1 = 'P2Y3M10D'", "Period2 = parsePeriod(`P10Y0M3D`)"]
)

Put it all together

To illustrate the power and ease of working with natively supported date times, this example uses time literals, operator-overloaded arithmetic, and Java time methods together to create timestamps, compute time differences, and extract information about those timestamps in Tokyo and New York timezones:

from deephaven import empty_table

# Create reference time and timestamp column using operator overloading and Java method multipliedBy()
t = empty_table(100).update(
[
"Reference = '2025-10-15T13:23 ET'",
"Timestamp = Reference + 'PT1h'.multipliedBy(ii)",
]
)

# Use operator overloading and diffMinutes() built-in to get time since reference time
t = t.update(
[
"DifferenceNanos = Timestamp - Reference",
"DifferenceMin = diffMinutes(Reference, Timestamp)",
]
)

# Finally, use built-in functions and time zone literals to get date and
# day of week in two different time zones
t = t.update(
[
"DateTokyo = toLocalDate(Timestamp, 'Asia/Tokyo')",
"DayTokyo = dayOfWeek(Timestamp, 'Asia/Tokyo')",
"DateNY = toLocalDate(Timestamp, 'America/New_York')",
"DayNY = dayOfWeek(Timestamp, 'America/New_York')",
]
)

# Assess metadata to see that types are as expected
t_meta = t.meta_table

Hopefully, it's apparent that working with date-times in Deephaven queries is a breeze. With all of the built-in time functions, most date-time operations can be accomplished with native Deephaven operations.

Date-times in Python

Like Java, Python has its own ecosystem of types for representing dates, times, and durations. However, unlike Java, date-time types in Python are implemented and supported by Python packages, which can be thought of as extensions to the language that make it easy for users to share complex code at scale. Each package that implements its own date-time types also provides functions and methods for working with those types. This means that in Python, there are often many ways to solve a date-time problem.

Three primary Python packages support date-time types and operations: datetime, numpy, and pandas.

The datetime package

The datetime package is a part of the Python Standard Library, so it comes with every Python installation. The package provides several types to represent various date-time quantities:

TypeRepresents
dateDates with no time or time zone information
timeTimes with no date information, optionally includes time zone information
datetimeDate-times, optionally includes time zone information
timedeltaDurations or distances between specific points in time
tzinfoTime zones, generally used in tandem with another date-time type

Many types in the datetime package can be equipped with a time-zone. An instance of a type with a time zone is called aware, while instances of types with no time zone information are called naive.

The API is simple:

import datetime

some_date = datetime.datetime(2021, 11, 18, 12, 21, 36)
print(some_date)

It's easy to convert to different time zones:

cst_tz = datetime.timezone(offset=-datetime.timedelta(hours=5))
some_date_cst = some_date.astimezone(cst_tz)
print(some_date_cst)

Durations result from date-time arithmetic:

date1 = datetime.datetime.now().date()
date2 = datetime.date(2011, 3, 9)
duration = date2 - date1
print(duration)

And they can be constructed directly:

another_duration = datetime.timedelta(days=13, hours=14, minutes=56)
print(another_duration)

The simplicity of the datetime API makes it very attractive and easy to use. However, array-like operations are not natively supported. This means that loops or list comprehensions are often required to perform time-based operations on arrays of date-time objects, which can hamper performance considerably.

The numpy package

numpy is the leading implementation of arrays and array operations in Python. It's not a part of the Standard Library. numpy supports many different array types, including date-time types. Only the essential date-time types are supported:

TypeRepresents
datetime64Dates, times, and date-times with no time zone information
timedelta64Durations or distances between specific points in time

Unlike the datetime package, numpy does not support any kind of time zone awareness. In the language of the datetime package, all of the numpy types are naive. Time zone offsets can be manually added to any numpy date-time value, but that can get cumbersome.

import numpy as np

datetime1 = np.datetime64("now")
datetime2 = np.datetime64("2019-07-11T21:04:02")
print(datetime1, datetime2)
print(datetime2 - datetime1)

Since numpy is built around arrays, many of its methods for creating and manipulating arrays will work on date-time types:

datetime_array = np.arange(
np.datetime64("2005-01-01"), np.datetime64("2006-01-01"), np.timedelta64(1, "h")
)
print(datetime_array[0:10])

The array-first model means that operations can be performed on all array elements at once:

print(datetime1 - datetime_array)

numpy's performance is hard to argue with. However, the API's support for various common time-based operations is sparse, and the lack of time zone support means that numpy cannot always be the best tool for the job.

The pandas package

pandas is a popular Python package among data scientists, machine learning experts, and more. It provides substantial date-time support, and offers the best of both datetime and numpy: time-zone awareness, a rich API, and array-like operations for maximum performance.

pandas uses different types to represent scalar and array versions of each type:

TypeRepresents
TimestampSingular dates, times, or date-times, optionally includes time zone information
DatetimeIndexSequences of dates, times, or date-times, optionally includes time zone information
TimedeltaSingular durations less than a day in length
TimedeltaIndexSequences of durations, each less than a day in length
PeriodSingular durations lasting a day or more
PeriodIndexSequences of durations, each lasting a day or more

The date-time types also support imposing a time-zone, but no explicit time zone type is provided, as in the datetime package.

Creating date-times (and arrays of date-times) in pandas is easy:

import pandas as pd

datetime_start = pd.Timestamp("2021-05-30T18:30:30")
datetime_array = pd.DatetimeIndex(
[datetime_start + pd.Timedelta(i, unit="day") for i in range(365)]
)
print(datetime_array[0:10])

The API supports methods - like tz_localize - to impose a time zone on entire arrays:

datetime_array_cst_tz = datetime_array.tz_localize("America/Chicago")
print(datetime_array_cst_tz[0:10])

And there are helper attributes and methods for many common date-time operations, all with array support:

print(datetime_array_cst_tz[0:10].day_of_week)
print(datetime_array_cst_tz[0:10].time)
print(datetime_array_cst_tz[0:10].month)

Overall, pandas offers the richest API with array support. It's more complex than datetime or numpy, but can solve a wider range of problems than both of the alternatives.

deephaven.time

The deephaven.time Python library is a small set of functions that enable conversions between Python's common date-time types and their equivalent Java types. This provides a bridge between Python and Java that can be used with DQL. With these functions, the possible workflows are endless.

Python to Java conversions

The to_j_* series of functions from deephaven.time enable converting any of the common Python date-time types from datetime, numpy, or pandas to one of the seven key Java types.

The deephaven.time library includes:

Each of these functions convert Python date-time types to the respective Java analogs.

Here's an example using to_j_instant:

from deephaven import empty_table
import deephaven.time as dhtime
import datetime
import numpy as np
import pandas as pd

instant1 = dhtime.to_j_instant(datetime.datetime(2022, 7, 17, 12, 34, 56))
instant2 = dhtime.to_j_instant(np.datetime64("2022-07-17T12:34:56"))
instant3 = dhtime.to_j_instant(pd.Timestamp("2022-07-17T12:34:56"))

t = empty_table(1).update(
["Instant1 = instant1", "Instant2 = instant2", "Instant3 = instant3"]
)
t_meta = t.meta_table

The metadata table reveals that the types are indeed the Instant types that are expected. In this way, anything that can be created in Python can be directly converted to Java values that the query engine can efficiently use.

This functionality may be used to extract date-time information from an existing dataset and use it in Deephaven queries programmatically:

from deephaven import read_csv
import deephaven.time as dhtime
import numpy as np
import pandas as pd

# Load crypto data into a pandas dataframe and a Deephaven table
crypto_df = pd.read_csv(
"https://media.githubusercontent.com/media/deephaven/examples/main/CryptoCurrencyHistory/CSV/crypto_sept7.csv"
)
crypto_dh = read_csv(
"https://media.githubusercontent.com/media/deephaven/examples/main/CryptoCurrencyHistory/CSV/crypto.csv"
)

# Raw data does not already have TZ information, so we include it manually
crypto_df["dateTime"] = pd.DatetimeIndex(
crypto_df["dateTime"].str.split(" ").str[0]
).tz_localize("UTC")

# Get Python timestamp values from the dataframe
min_timestamp = crypto_df["dateTime"].min()
max_timestamp = crypto_df["dateTime"].max()

# Convert the Python timestamps to Java
j_min = dhtime.to_j_instant(min_timestamp)
j_max = dhtime.to_j_instant(max_timestamp)

# Filter the Deephaven table using the min and max timestamps
trimmed_crypto_dh = crypto_dh.where("dateTime >= j_min && dateTime <= j_max").sort(
"dateTime"
)

Java to Python conversions

In addition to providing functions to convert Python types to their equivalent Java types, deephaven.time provides a suite of functions to convert Java types to Python types. These functions support converting any of the seven key Java types to their Python equivalents.

The Java-to-Python conversion methods included in deephaven.time are:

A common use case for this functionality is in writing Python functions that act on Deephaven columns from within query strings. Here's an example that trains a simple ARIMA model on a toy dataset and uses a Python function to make forecasts with that model in a Deephaven table:

import os

os.system("pip install statsmodels")
from statsmodels.tsa.arima.model import ARIMA

from deephaven import empty_table
import deephaven.pandas as dhpd
import deephaven.time as dhtime

t = empty_table(60 * 24).update(
[
"X = ii",
"Timestamp = '2021-01-01T00:00:00Z' + 'PT1m' * X",
"Y = 20 + X / 3 + Math.sin(X) + randomGaussian(0.0, 1.0)",
]
)

# train on first 18 hours, test on last 6
t_training = t.head(60 * 18)
t_testing = t.tail(60 * 6)

# ARIMA requires that the timestamp column be set as the index
training_df = dhpd.to_pandas(t_training).set_index("Timestamp")
model = ARIMA(
endog=training_df["Y"].astype(float), order=(5, 1, 1), dates=training_df.index
)
res = model.fit()
print(res.summary())


# prediction function to apply to each timestamp.
# type hints enable Deephaven to interpret return type correctly
def predict(timestamp) -> float:
# here, we use deephaven.time to convert the value from DH to something that the model can use
pd_timestamp = dhtime.to_pd_timestamp(timestamp).tz_localize("UTC")
prediction = res.predict(start=pd_timestamp)
return prediction


t_testing_eval = t_testing.update(["YPred = predict(Timestamp)"])

Evaluating the mean squared error of this model shows that it performs well on average:

import deephaven.agg as agg

mse = t_testing_eval.update("SqErr = Math.pow(Y - YPred, 2)").agg_by(
agg.avg("MeanSqErr = SqErr")
)

Current time and the Deephaven Clock

The deephaven.time module includes two functions, dh_now and dh_today that return current date-time information according to the Deephaven clock.

Typically, the Deephaven clock is set to the system clock of the computer the software is running on. However, this is not always the case. In particular, TableReplayer sets the Deephaven clock to a simulated clock that aligns with the timestamps being replayed. Thus, the Deephaven clock will be different from the system clock, and it is important to be aware of cases where this may arise.

Here is simple example of dh_now and dh_today:

import deephaven.time as dhtime
import datetime

print(datetime.datetime.now())
print(dhtime.dh_now())
print(dhtime.dh_today())

Time zones and aliases

Keeping track of time zones is an important task when working with date-time data. deephaven.time offers three functions to facilitate working with time zones in Deephaven:

To get the current time zone of the running Deephaven engine, use dh_time_zone:

import deephaven.time as dhtime

print(dhtime.dh_time_zone())

The Deephaven engine defaults to Coordinated Universal Time (UTC), but this can be changed if needed. Additionally, the time zone displayed by the UI can be changed without modifying the underlying engine time zone.

Deephaven can make use of any valid Java time zone within queries. However, full Java time zone names like America/Chicago can be cumbersome to type out repeatedly. To make query strings shorter, Deephaven supports the use of common time zone aliases that can be used in place of the full Java names:

from deephaven import empty_table

t = empty_table(1).update(
[
"ChicagoTime = '2017-04-29T10:30:00 America/Chicago'",
"ChicagoTimeAlias = '2017-04-29T10:30:00 CT'",
]
)

When Deephaven's default time zone aliases are not enough, the time_zone_alias_add function from deephaven.time can be used to create a custom alias for any time zone. This custom alias can then be removed with time_zone_alias_rm. Here's an example of creating an alias for Indian Standard Time, and then removing it:

from deephaven import empty_table
import deephaven.time as dhtime

dhtime.time_zone_alias_add("IST", "Asia/Calcutta")
t = empty_table(1).update(
[
"IndiaTime = '2017-04-29T10:30:00 IST'",
]
)

dhtime.time_zone_alias_rm("IST")
note

Deephaven only supports using one alias per time zone. To create a custom alias for a time zone that already has a default alias, the default must first be removed with time_zone_alias_rm before the new alias is added with time_zone_alias_add.

Parse and format date-time columns

Date-time values in real datasets often do not come formatted in a way that Deephaven can automatically convert to one of the key Java types. Take this example:

from deephaven import read_csv

gsod = read_csv("/data/examples/GSOD/csv/station_data.csv")

The BEGIN and END columns are dates, formatted as YYYYMMDD. However, Deephaven does not recognize that format as a date-time format, and interprets the columns as integers:

gsod_meta = gsod.meta_table

This means that none of the date-time functionality previously discussed will be available. To remedy this, deephaven.time provides the simple_date_format function, which makes parsing ingested date-time values easy.

To use simple_date_format, pass a string that describes the format of the input date-time values. Since the BEGIN and END columns are formatted as YYYYMMDD, the argument to simple_date_format should be "YYYYMMDD":

from deephaven.time import simple_date_format

# pass the format of the input date-time
input_format = simple_date_format("YYYYMMDD")

Now, input_format is an object that can be used in a query to format the BEGIN and END columns. This is done with the parse method, which accepts the input date-time columns as strings. So, the int values must be converted to strings before calling parse. Finally, call toInstant on the result to get the formatted date-time as a Java Instant:

# use String.valueOf() to get int as string, then pass to input_format.parse(), finally call toInstant()
format_gsod = gsod.update(
[
"BEGIN = input_format.parse(String.valueOf(BEGIN)).toInstant()",
"END = input_format.parse(String.valueOf(END)).toInstant()",
]
)

The resulting columns can then be converted to any of the desired Java date-time types using Deephaven's built-in functions. This example converts BEGIN and END to LocalDate columns using toLocalDate:

format_gsod_local_date = format_gsod.update(
["BEGIN = toLocalDate(BEGIN, 'UTC')", "END = toLocalDate(END, 'UTC')"]
)

To verify that these columns have the correct type, check the metadata table:

format_gsod_meta = format_gsod_local_date.meta_table

Once the data have been converted to a proper Java date-time format, they can be reformatted using format_columns. Note that this reformatting does not affect the internal representation of the data, but rather how it's presented in the UI. Only Instant columns can be formatted, so other types must be converted to an Instant before reformatting them. The desired format must be a valid Java date-time format.

Here's an example that converts the LocalDate columns from above to Instant columns, and reformats them using format_columns:

reformat_gsod_local_date = format_gsod_local_date.update(
[
"BEGIN = toInstant(BEGIN, '00:00:00', 'UTC')",
"END = toInstant(END, '00:00:00', 'UTC')",
]
).format_columns(["BEGIN = Date(`EEE MMM d, y`)", "END = Date(`EEE MMM d, y`)"])

Performance and the Python-Java Boundary

Because Deephaven queries can contain Python and the Deephaven query engine is implemented in Java, performance-conscious users should be aware of Python-Java boundary crossings. Every time a query's execution passes between Python and Java, there is a performance penalty. Fast queries must be written to minimize such boundary crossings.

danger

Avoid deephaven.time functions in query strings. Because deephaven.time provides functions that convert between Python and Java types, every call crosses the Python-Java boundary. Thus, each call incurs a tiny overhead. This overhead can add up when applied millions or billions of times during a query. Instead, use the built-in functions that provide the same functionality and do not cross the Python-Java boundary.

Here is an example that demonstrates the effects that excess boundary crossings can have on performance:

from deephaven import empty_table
import deephaven.time as dhtime
import datetime
import time

# 30 days worth of data, one entry every second
t = empty_table(60 * 60 * 24 * 30).update(
"Timestamp = '2022-01-01T00:00:00 America/New_York' + SECOND * ii"
)


# many boundary crossings -- timestamp is passed in as a Java instant
def shift_five_seconds(timestamp):
# convert a Java Instant (timestamp) to a Python datetime
shifted_datetime = dhtime.to_datetime(timestamp) + datetime.timedelta(seconds=5)
# convert a Python datetime to a Java Instant
return dhtime.to_j_instant(shifted_datetime)


t1_start = time.time()
t1 = t.update("ShiftedTimestamp = (Instant) shift_five_seconds(Timestamp)")
t1_end = time.time()

# no boundary crossings -- pure Java query string
t2_start = time.time()
t2 = t.update("ShiftedTimestamp = Timestamp + 'PT5s'")
t2_end = time.time()

The resulting tables are identical, but their execution times differ enormously:

t1_time = t1_end - t1_start
t2_time = t2_end - t2_start
print("Many boundary crossings: ", t1_time)
print("No boundary crossings: ", t2_time)

Careful inspection of the slower code reveals many boundary crossings. If the table has n rows, there are:

  1. n boundary crossings to call the Python function
  2. n boundary crossings to convert the Java Instant to a Python datetime
  3. n boundary crossings to convert a Python datetime to a Java Instant

Thus, in total, there are 3n boundary crossings, which significantly slows the query.

Now let us consider a case where a sequence of time values is accessed by index in a query. In the following example, this computation is done three different ways. Each way has a different number of boundary crossings and therefore different performance. Comparing the execution times, there is a significant difference. The small overheads from the boundary crossings can add up.

from deephaven import empty_table
import deephaven.time as dhtime
import deephaven.dtypes as dht
import pandas as pd
import time

timestamps = pd.date_range(
start="2023-01-01T00:00:00", end="2023-12-31T23:59:59", freq="min"
)
t_empty = empty_table(60 * 24 * 365)


# Case 1: 2n boundary crossings -- 2 Python function calls per row
def get1(idx):
return timestamps[idx]


start = time.time()
t1 = t_empty.update("Timestamp = (Instant)dhtime.to_j_instant(get1(i))")
print(f"Case 1: {time.time()-start} sec; {(time.time()-start)/t_empty.size} sec/row")


# Case 2: n boundary crossings -- 1 Python function call per row
def get2(idx):
return dhtime.to_j_instant(timestamps[idx])


start = time.time()
t2 = t_empty.update("Timestamp = (Instant)get2(i)")
print(f"Case 2: {time.time()-start} sec; {(time.time()-start)/t_empty.size} sec/row")

# Case 3: 1 boundary crossing -- create a Java array of Java instants
start = time.time()
j_array = dht.array(dht.Instant, timestamps)
t3 = t_empty.update("Timestamp = j_array[i]")
print(f"Case 3: {time.time()-start} sec; {(time.time()-start)/t_empty.size} sec/row")

Finally, let us use Python to get the start of the current day and then use that datetime in a query. In this example, the first case does the time manipulation mostly in Python. The second case converts the Python datetime to a Java Instant one time for use in the query.

import time
import datetime
from deephaven import empty_table
import deephaven.time as dht

t_empty = empty_table(1_000_000)
start_of_day = datetime.datetime.combine(datetime.datetime.now(), datetime.time.min)


# Case 1: Work in terms of Python times
def plus_seconds(i):
return dht.to_j_instant(start_of_day + datetime.timedelta(seconds=i))


start = time.time()
t1 = t_empty.update("Timestamp = (Instant) plus_seconds(ii)")
print(f"Case 1: {time.time()-start} sec; {(time.time()-start)/t_empty.size} sec/row")

# Case 2: Make a single conversion from a Python time to a Java time and then work in terms of Java times
start_of_day_instant = dht.to_j_instant(start_of_day)
start = time.time()
t2 = t_empty.update("Timestamp = start_of_day_instant + ii*SECOND")
print(f"Case 2: {time.time()-start} sec; {(time.time()-start)/t_empty.size} sec/row")

Again, the resulting tables are identical, but the execution times are quite different, and boundary crossings still tell the story. Use deephaven.time methods outside of query strings for the fastest performance.

Appendix

Python TypeJava TypeJava to PythonPython to Java
datetime.dateLocalDateto_dateto_j_local_date
datetime.timeLocalTimeto_timeto_j_local_time
datetime.datetimeInstantto_datetimeto_j_instant
datetime.datetimeZonedDateTimeto_datetimeto_j_zdt
datetime.timedeltaDurationto_timedeltato_j_duration
datetime.timedeltaPeriodto_timedeltato_j_period
datetime.tzinfoZoneIdNAto_j_time_zone
numpy.datetime64LocalDateto_np_datetime64to_j_local_date
numpy.datetime64LocalTimeto_np_datetime64to_j_local_time
numpy.datetime64Instantto_np_datetime64to_j_instant
numpy.datetime64ZonedDateTimeto_np_datetime64to_j_zdt
numpy.timedelta64Durationto_np_timedelta64to_j_duration
numpy.timedelta64Periodto_np_timedelta64to_j_period
pandas.TimestampLocalDateto_pd_timestampto_j_local_date
pandas.TimestampLocalTimeto_pd_timestampto_j_local_time
pandas.TimestampInstantto_pd_timestampto_j_instant
pandas.TimestampZonedDateTimeto_pd_timestampto_j_zdt
pandas.DatetimeIndexNANANA
pandas.TimedeltaDurationto_pd_timedeltato_j_duration
pandas.TimedeltaPeriodto_pd_timedeltato_j_period
pandas.TimedeltaIndexNANANA
pandas.PeriodNANANA
pandas.PeriodIndexNANANA