Replay data from static tables
This guide will show you how to replay historical data as live data in Deephaven.
Deephaven excels at handling live data. Integrating historical data into real-time analysis is common in a multitude of fields including machine learning, validation, modeling, simulation, and forecasting.
For learning, testing, and other purposes, it can be useful to replay pre-recorded data as if it were live.
In this guide, we will take historical data and play it back as real-time data based on timestamps in a table. This example could be easily extended towards a variety of real-world applications.
Get a historical data table
To replay historical data, we need a table with timestamps in DateTime format. Let's grab one from Deephaven's examples repository. We'll use data from a 100 km bike ride in a file called metriccentury.csv.
import static io.deephaven.csv.CsvTools.readCsv
metricCentury = readCsv("https://media.githubusercontent.com/media/deephaven/examples/main/MetricCentury/csv/metriccentury.csv")
Replay the data
The data is now in memory. We can replay it with the following steps:
- Import the Deephaven Replayer object.
- Set a start and end time for data replay. These correspond to the time range covered by the historical table.
- Create the replayer using the set start and end time.
- Call replay to prepare the replayed table. This takes the table to replay and the timestamp column name as input.
- Call start to start replaying data.
import io.deephaven.engine.table.impl.replay.Replayer
startTime = parseInstant("2019-08-25T11:34:56.000 ET")
endTime = parseInstant("2019-08-25T17:10:21.000 ET")
resultReplayer = new Replayer(startTime, endTime)
replayedResult = resultReplayer.replay(metricCentury, "Time")
resultReplayer.start()
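Conceptually, replaying by timestamp means a row becomes visible once the replay clock reaches that row's timestamp. The following is a minimal plain-Groovy sketch of that idea, not Deephaven's Replayer implementation. The rows, the visibleAt closure, and the sample clock offsets are all hypothetical and chosen only for illustration.

```groovy
import java.time.Duration
import java.time.Instant

// Hypothetical historical rows: a timestamp plus a value.
def rows = [
    [time: Instant.parse("2019-08-25T15:34:56Z"), value: 1],
    [time: Instant.parse("2019-08-25T15:34:58Z"), value: 2],
    [time: Instant.parse("2019-08-25T15:35:03Z"), value: 3],
]

// Hypothetical helper: rows whose timestamp is at or before the simulated clock.
def visibleAt = { Instant now -> rows.findAll { !it.time.isAfter(now) } }

def start = rows.first().time

// Advance the simulated clock and count the rows released so far.
[0, 2, 7].each { secs ->
    def now = start.plus(Duration.ofSeconds(secs))
    println "t+${secs}s: ${visibleAt(now).size()} row(s) visible"
}
// t+0s: 1 row(s) visible
// t+2s: 2 row(s) visible
// t+7s: 3 row(s) visible
```

The gaps between timestamps in the source data are what set the pace: rows that were recorded close together appear close together during replay.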
Replay a table with no date-time column
Some historical data tables don't have a date-time column.
import static io.deephaven.csv.CsvTools.readCsv
iris = readCsv("https://media.githubusercontent.com/media/deephaven/examples/main/Iris/csv/iris.csv")
In such a case, a date-time column can be added.
startTime = parseInstant("2022-01-01T00:00:00 ET")
irisWithDatetimes = iris.update("Timestamp = startTime + i * SECOND")
Then, the data can be replayed just as before.
import io.deephaven.engine.table.impl.replay.Replayer
startTime = parseInstant("2022-01-01T00:00:00 ET")
endTime = parseInstant("2022-01-01T00:02:30 ET")
replayer = new Replayer(startTime, endTime)
replayedIris = replayer.replay(irisWithDatetimes, "Timestamp")
replayer.start()
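The end time used here follows from the synthetic timestamps: with one row per second, the last row lands rowCount - 1 seconds after the start. Below is a small sketch of that arithmetic in plain Groovy. It assumes the Iris table has 150 rows (consistent with the 2 minute 30 second window above, though this guide does not state the row count) and that ET resolves to the America/New_York zone.

```groovy
import java.time.ZoneId
import java.time.ZonedDateTime

// Assumption: 150 rows, one synthetic timestamp per second.
int rowCount = 150

// Assumption: "ET" corresponds to America/New_York.
def start = ZonedDateTime.of(2022, 1, 1, 0, 0, 0, 0, ZoneId.of("America/New_York"))

// Row i gets start + i seconds, so the last row (i = rowCount - 1) is at:
def lastRowTime = start.plusSeconds(rowCount - 1)

// A replay window that ends one second after the last row.
def windowEnd = start.plusSeconds(rowCount)

println "last row:   ${lastRowTime.toLocalTime()}"   // 00:02:29
println "window end: ${windowEnd.toLocalTime()}"     // 00:02:30
```

For a table with a different row count or spacing, the same calculation gives an end time that covers every synthetic timestamp.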
Replay multiple tables
Real-time applications in Deephaven commonly involve more than a single ticking table. These tables tick simultaneously. A table replayer can be used to replay multiple tables at the same time, provided that the timestamps overlap.
The following code creates two tables with timestamps that overlap.
source1 = emptyTable(20).update("Timestamp = '2024-01-01T08:00:00 ET' + i * SECOND")
source2 = emptyTable(25).update("Timestamp = '2024-01-01T08:00:00 ET' + i * (long)(0.8 * SECOND)")
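To see why these two tables overlap, compare the time each one spans from the shared start: 20 rows spaced 1 second apart against 25 rows spaced 0.8 seconds apart. The sketch below does this in plain Groovy. The lastOffset closure and the variable names are hypothetical helpers for illustration, not part of Deephaven.

```groovy
import java.time.Duration

long SECOND_NANOS = 1_000_000_000L

// Hypothetical helper: offset of the last row from the start, in nanoseconds.
// Row i sits at i * stepNanos, so the last row is at (rows - 1) * stepNanos.
def lastOffset = { int rows, long stepNanos -> (rows - 1) * stepNanos }

def span1 = Duration.ofNanos(lastOffset(20, SECOND_NANOS))   // 19 s
def span2 = Duration.ofNanos(lastOffset(25, 800_000_000L))   // 19.2 s

// Both tables start at the same instant, so they overlap for the shorter span.
def overlap = [span1, span2].min()

println "source1 spans ${span1}, source2 spans ${span2}, overlap ${overlap}"
```

Both spans fit inside a 20 second replay window that begins at the shared start time.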
To replay multiple tables with the same replayer, simply call replay twice before start.
import io.deephaven.engine.table.impl.replay.Replayer
startTime = parseInstant("2024-01-01T08:00:00 ET")
endTime = parseInstant("2024-01-01T08:00:20 ET")
replayer = new Replayer(startTime, endTime)
replayedSource1 = replayer.replay(source1, "Timestamp")
replayedSource2 = replayer.replay(source2, "Timestamp")
replayer.start()