Using Deephaven's Partitioned Tables

There are two things that I care about the most when it comes to creating programs: the runtime and the overall look of my code. This is even more important to me when it comes to large coding projects that involve massive amounts of data. Fortunately, Deephaven Data Lab’s high-performance database has many tools for speeding up runtime and cleaning up code. One such tool is the partitioned user table, which divides a table's data into multiple partitions that can be managed independently. For example, if a user sets the date column in a table as the partitioning column, then the data is saved in groups by each distinct day. This makes it extremely fast for the user to pull up data for a specific day because the table is already divided appropriately.

Example

I recently used a partitioned user table in a program that saved the end of day data for each day in the stock market. Having the end of day values saved in a table is very valuable and greatly reduces the run time of a program that needs these values. On my first attempt, my program worked properly, but the code was repetitive and the resulting table was not optimized. On my second attempt, utilizing a partitioned user table, I immediately noticed my sorts performed faster. This also cleaned up my code and enhanced my saved data.

Let's look at my code before I implemented the partitioned user table:

def endOfDayTableMaker(options, stocks, start, end):
    QueryScope.addParam("start", start)
    QueryScope.addParam("time", convertDateTime(start + "T16:00:00 NY"))
    stocksEOD = stocks.where("Date = start", "Timestamp < time", "AskSize != 0", "BidSize != 0").lastBy("LocalCodeStr").updateView("UnderlyingStockPrice = (AskSize*Bid+BidSize*Ask)/(AskSize+BidSize)")
optionsEOD = options.where("Date = start", "Timestamp < time", "AskSize != 0", "BidSize != 0").lastBy("LocalCodeStr").naturalJoin(stocksEOD, "USym = LocalCodeStr", "UnderlyingStockPrice").updateView("OptionsPrice = (AskSize*Bid+BidSize*Ask)/(AskSize+BidSize)").view("Date", "LocalCodeStr", "USym", "Expiry", "OptionsPrice", "PutCall", "UnderlyingStockPrice", "Strike")
    for day in calendar.businessDaysInRange(calendar.nextBusinessDay(start, 1), end):
        QueryScope.addParam("day", day)
        QueryScope.addParam("time", convertDateTime(day + "T16:00:00 NY"))
        holder = stocks.where("Date = day", "Timestamp < time", "AskSize != 0", "BidSize != 0").lastBy("LocalCodeStr").updateView("UnderlyingStockPrice = (AskSize*Bid+BidSize*Ask)/(AskSize+BidSize)")
        stocksEOD = ttools.merge(stocksEOD, holder)
        holder = options.where("Date = day", "Timestamp < time", "AskSize != 0", "BidSize != 0").lastBy("LocalCodeStr").naturalJoin(holder, "USym = LocalCodeStr", "UnderlyingStockPrice").updateView("OptionsPrice = (AskSize*Bid+BidSize*Ask)/(AskSize+BidSize)").view("Date", "LocalCodeStr", "USym", "Expiry", "OptionsPrice", "PutCall", "UnderlyingStockPrice", "Strike")
optionsEOD = ttools.merge(optionsEOD, holder)
    db.removeTable("Correlation", "EndOfDayV2")
    db.addTable("Correlation", "EndOfDayV2", optionsEOD)
    db.removeTable("Correlation", "EndOfDayV3")
    db.addTable("Correlation", "EndOfDayV3", stocksEOD)
    return optionsEOD, stocksEOD

There are two main issues with this code: first, there is avoidable repetition including the QueryScope functions, the stocksEOD table, and the optionsEOD table; second is the use of the merge function. The repetition can be removed by not initializing the tables outside of the loop... but in order for the merge function to work properly, the tables have to be initialized outside of the loop. I found a solution by This can be done by initializing the tables each time the loop iterates and then saving that data to a partitioned user table. Normally, a whole table would have to be uploaded at once, but with a partitioned user table, a user can upload a table in parts.

Before I go into detail about how I improved this code, I first want to better explain how to use partitioned user tables. In order to save data to a partitioned user table, a user has to set up the location for the partitioned user table, give an outline of the table that will be stored there, and state the column that will be used to partition the data. Deephaven's addPartitionedTableDefinition method accomplishes this:

db.addPartitionedTableDefinition("ExampleNamespace", "ExampleTableName", "ColumnToPartitionOn", table.getDefinition())

After the initial set-up, users can add data to the table. To do so, they must provide the partition that the table will be saved under, adding one partition at a time using the addTablePartition method:

db.addTablePartition("ExampleNamespace", "ExampleTableName", partition, table)

Now a user can add data to this table at any time and pull out data at any time.

Implementing the partitioned table shortened my code and eliminated the merge function, as shown below:

def endOfDayTableMaker(optons, stocks, start, end):
    for day in calendar.businessDaysInRange(start, end):
        QueryScope.addParam("day", day)
        QueryScope.addParam("time", convertDateTime(day + "T16:00:00 NY"))
        stocksEOD = stocks.where("Date = day", "Timestamp < time", "AskSize != 0", "BidSize != 0") .lastBy("LocalCodeStr")updateView("UnderlyingStockPrice = (AskSize*Bid+BidSize*Ask)/(AskSize+BidSize)")
        db.replaceTablePartition("Correlation", "StocksEndOfDay", day, stocksEOD)
        optionsEOD = options.where("Date = day", "Timestamp < time", "AskSize != 0", "BidSize != 0").lastBy("LocalCodeStr").naturalJoin(stocksEOD, "USym = LocalCodeStr", "UnderlyingStockPrice").updateView("OptionsPrice = (AskSize*Bid+BidSize*Ask)/(AskSize+BidSize)").view("Date", "LocalCodeStr", "USym", "Expiry", "OptionsPrice", "PutCall", "UnderlyingStockPrice", "Strike")
        db.replaceTablePartition("Correlation", "EndOfDay", day, optionsEOD)
    return optionsEOD

This code works by creating the end of day table for the stocks and options and then saving the data for that day in a partitioned user table under the partition of that day. This removes the repeated code before the loop and increases the sorting speed of my user table by basing sorts on the partitioning column.