Use PyTorch in Deephaven

This guide will show you how to use PyTorch in Deephaven queries.

PyTorch is an open-source machine learning library for Python. It is one of the most well-known and widely used artificial intelligence libraries available today. It offers an easy-to-learn, Pythonic API that supports a wide variety of applications and features. Given Deephaven's strength in real-time and big data processing, it is natural to pair the two.

PyTorch does not come stock with Deephaven's base Docker image. To use it within Deephaven, you can install it yourself or choose one of the Deephaven Docker deployments with PyTorch support built in.

The two examples below solve the same problem: classifying the Iris dataset, which can be found in Deephaven's examples repository. The first example uses PyTorch alone, whereas the second integrates Deephaven tables to perform predictions on live data.

Note

In this guide, we read data from Deephaven's examples repository. You can also load files that are in a mounted directory at the base of the Docker container. See Docker data volumes to learn more about the relation between locations in the container and the local file system.

The Iris flower dataset is a popular dataset commonly used in introductory machine learning applications. Introduced by R.A. Fisher in his 1936 paper, The use of multiple measurements in taxonomic problems, it contains 150 measurements spanning three Iris subspecies. The following values are measured in centimeters for Iris-setosa, Iris-virginica, and Iris-versicolor flowers:

  • Petal length
  • Petal width
  • Sepal length
  • Sepal width

This is a classification problem suitable for a supervised machine learning algorithm. We'll define and create a feed-forward neural network, train it, and test it. We'll determine its accuracy as a percentage based on the number of correct predictions compared to the known values.

Classify the Iris dataset

This first example shows how to use PyTorch to classify Iris flowers from measurements.

Let's first import all the packages we'll need.
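A minimal sketch of those imports, assuming torch, pandas, numpy, and scikit-learn are available in the Deephaven Python environment:

```python
# PyTorch for the model, pandas/numpy for data handling, scikit-learn for the train/test split
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.optim import Adam

from sklearn.model_selection import train_test_split
import pandas as pd
import numpy as np
```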

Next, we'll import our data, quantize the targets we want to predict, and split the data into training and testing sets.
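A sketch of that step. The CSV URL is illustrative of the examples repository layout, and get_class_number is a hypothetical helper that maps each class name to an integer:

```python
# Read the Iris CSV into a pandas DataFrame (URL shown is illustrative)
iris = pd.read_csv(
    "https://media.githubusercontent.com/media/deephaven/examples/main/Iris/csv/iris.csv"
)

# Quantize the string class labels into integers 0, 1, 2
classes = {}
def get_class_number(c) -> int:
    return classes.setdefault(c, len(classes))

iris["Class"] = iris["Class"].apply(get_class_number)

# Split features and targets into training and testing sets
features = iris[["SepalLengthCM", "SepalWidthCM", "PetalLengthCM", "PetalWidthCM"]].values
targets = iris["Class"].values
X_train, X_test, Y_train, Y_test = train_test_split(features, targets, test_size=0.25, random_state=42)

# Convert everything to torch tensors
X_train = torch.tensor(X_train, dtype=torch.float32)
X_test = torch.tensor(X_test, dtype=torch.float32)
Y_train = torch.tensor(Y_train, dtype=torch.long)
Y_test = torch.tensor(Y_test, dtype=torch.long)
```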

With that done, we can define and create our neural network. We'll employ a simple feed-forward neural network with two hidden layers. Both hidden layers will use the ReLU activation function.
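A sketch of such a network. The hidden layer widths (16 and 12 neurons) are arbitrary choices for illustration; the 4 inputs and 3 outputs match the measurements and subspecies:

```python
# A simple feed-forward network: 4 measurements in, 3 class scores out
class IrisANN(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(in_features=4, out_features=16)
        self.fc2 = nn.Linear(in_features=16, out_features=12)
        self.output = nn.Linear(in_features=12, out_features=3)

    def forward(self, x):
        x = F.relu(self.fc1(x))   # first hidden layer with ReLU activation
        x = F.relu(self.fc2(x))   # second hidden layer with ReLU activation
        return self.output(x)     # raw class scores (logits)

model = IrisANN()
```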

Now comes the fun bit: training our network. We'll calculate loss using cross entropy, and we'll optimize the network with the Adam algorithm.
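A sketch of the training loop, using the tensors and model defined above. The learning rate and epoch count are illustrative values:

```python
# Cross-entropy loss and the Adam optimizer
criterion = nn.CrossEntropyLoss()
optimizer = Adam(model.parameters(), lr=0.01)

epochs = 100
for epoch in range(epochs):
    optimizer.zero_grad()
    y_hat = model(X_train)            # forward pass
    loss = criterion(y_hat, Y_train)  # compare predictions to known classes
    loss.backward()                   # backpropagate
    optimizer.step()                  # update the weights
```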

Lastly, we'll check how well our model worked.
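One way to do that is to compare the predicted classes on the test set with the known values:

```python
# Accuracy on the held-out test set, as a percentage of correct predictions
with torch.no_grad():
    predictions = torch.argmax(model(X_test), dim=1)
    accuracy = (predictions == Y_test).float().mean().item() * 100
    print(f"Test accuracy: {accuracy:.1f}%")
```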

This performance is pretty good for such a simple model.

This example follows a basic formula for using a neural network to solve a supervised learning problem:

  • Import data from an external source, quantize the targets, and split the data into training and testing sets.
  • Define the machine learning model.
  • Set the loss calculation and optimization routines.
  • Train the model over a set number of training epochs.
  • Calculate the model's accuracy.

In this case, a simple neural network is suitable for a simple dataset.

Classify the Iris dataset with Deephaven tables

So we just classified the Iris dataset using a simple feed-forward neural network. That's kind of cool, but it would be cooler to classify a live feed of incoming data. Let's do that with Deephaven!

To start, we extend our previous example to train our model on data in a Deephaven table. This requires some additional code.

First, we import everything we need.
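A sketch of the imports, assuming the deephaven and deephaven.learn packages that ship with the server:

```python
# Deephaven table I/O and the learn/gather integration
from deephaven import read_csv
from deephaven import learn
from deephaven.learn import gather

# PyTorch and numpy, as in the previous example
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.optim import Adam
import numpy as np
```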

Just as we did before, we next import our data. This time, we'll import it into a Deephaven table.
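A sketch using read_csv. The URL is illustrative, get_class_number is the same hypothetical quantization helper as before, and the inline Python call in the update formula follows Deephaven's query-string conventions:

```python
# Read the Iris CSV directly into a Deephaven table (URL shown is illustrative)
iris_raw = read_csv(
    "https://media.githubusercontent.com/media/deephaven/examples/main/Iris/csv/iris.csv"
)

# Quantize the Class column so the targets are integers
classes = {}
def get_class_number(c) -> int:
    return classes.setdefault(c, len(classes))

iris = iris_raw.update(formulas=["Class = (int)get_class_number(Class)"])
```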

Next, we define and create the neural network.
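The architecture can stay exactly the same as in the static example:

```python
# Same feed-forward architecture as before: two hidden ReLU layers
class IrisANN(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(in_features=4, out_features=16)
        self.fc2 = nn.Linear(in_features=16, out_features=12)
        self.output = nn.Linear(in_features=12, out_features=3)

    def forward(self, x):
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        return self.output(x)

model = IrisANN()
```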

This time, we'll create a function to train the neural network.
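A sketch of that training function, along with a hypothetical predict_with_model function we'll use later for inference. Both names are illustrative; what matters is that their signatures match the inputs that learn will gather:

```python
# Train the model on gathered features (x) and quantized classes (y)
def train_model(x_train, y_train):
    criterion = nn.CrossEntropyLoss()
    optimizer = Adam(model.parameters(), lr=0.01)

    epochs = 100
    for epoch in range(epochs):
        optimizer.zero_grad()
        y_hat = model(x_train.float())
        loss = criterion(y_hat, y_train.long())
        loss.backward()
        optimizer.step()

# Use the trained model to predict a class for every gathered row
def predict_with_model(features):
    if features.dim() == 1:
        features = torch.unsqueeze(features, 0)
    predictions = torch.argmax(model(features.float()), dim=1)
    return [int(p) for p in predictions]
```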

Now we need some extra functions. The first two gather data from a Deephaven table into torch tensors of doubles and integers, respectively, while the third extracts a value from the predictions. The extracted values will be used to create a new column in the output table.
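A sketch of those helpers, assuming the table_to_numpy_2d function from deephaven.learn.gather:

```python
# Gather table columns into a 2-D torch tensor of doubles (the features)
def table_to_tensor_double(rows, cols):
    return torch.from_numpy(gather.table_to_numpy_2d(rows, cols, np_type=np.double))

# Gather a table column into a 1-D torch tensor of integers (the quantized classes)
def table_to_tensor_int(rows, cols):
    return torch.from_numpy(np.squeeze(gather.table_to_numpy_2d(rows, cols, np_type=np.intc)))

# Extract a single prediction so it can be written into the output column
def get_predicted_class(data, idx):
    return data[idx]
```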

Now, we can predict the Iris subspecies from these measurements. This time, we'll do it in a Deephaven table using the learn function. The first time we call learn, we train the model. Then, we use the trained model to predict the values of the Iris subspecies in the Class column.
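A sketch of both calls, wired up with the hypothetical helpers above. The first call has no outputs because it only trains; the second scatters predictions into a new PredictedClass column:

```python
# Train the model on the static Iris table
learn.learn(
    table=iris,
    model_func=train_model,
    inputs=[
        learn.Input(
            ["SepalLengthCM", "SepalWidthCM", "PetalLengthCM", "PetalWidthCM"],
            table_to_tensor_double,
        ),
        learn.Input("Class", table_to_tensor_int),
    ],
    outputs=None,
    batch_size=150,
)

# Use the trained model to add a PredictedClass column to the table
iris_predicted = learn.learn(
    table=iris,
    model_func=predict_with_model,
    inputs=[
        learn.Input(
            ["SepalLengthCM", "SepalWidthCM", "PetalLengthCM", "PetalWidthCM"],
            table_to_tensor_double,
        )
    ],
    outputs=[learn.Output("PredictedClass", get_predicted_class, "int")],
    batch_size=150,
)
```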

We've done the same thing as before: we created static predictions on static data. Only this time, we used a Deephaven table. That's not that exciting. We really want to perform the prediction stage on a live data feed. To demonstrate this, we'll create some fake Iris measurements.

We can create an in-memory real-time table using DynamicTableWriter. To create semi-realistic measurements, we'll use some known quantities from the Iris dataset:

| Column | Minimum (cm) | Maximum (cm) |
|---|---|---|
| PetalLengthCM | 1.0 | 6.9 |
| PetalWidthCM | 0.1 | 2.5 |
| SepalLengthCM | 4.3 | 7.9 |
| SepalWidthCM | 2.0 | 4.4 |

These quantities will be fed to our table writer, and faux measurements will be written to the table once per second for 30 seconds. We'll apply our model to those measurements as they arrive and predict which Iris subspecies they belong to.

First, we create a live table and write data to it.
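A sketch of that step, assuming the dict-based DynamicTableWriter constructor and deephaven.dtypes; the writer thread and one-second sleep are illustrative:

```python
from deephaven import DynamicTableWriter
import deephaven.dtypes as dht
import random, threading, time

# A ticking table with the four measurement columns
table_writer = DynamicTableWriter({
    "SepalLengthCM": dht.double,
    "SepalWidthCM": dht.double,
    "PetalLengthCM": dht.double,
    "PetalWidthCM": dht.double,
})
live_iris = table_writer.table

# Write one faux measurement per second for 30 seconds, within the observed ranges
def write_faux_iris_measurements():
    for _ in range(30):
        table_writer.write_row(
            random.uniform(4.3, 7.9),  # SepalLengthCM
            random.uniform(2.0, 4.4),  # SepalWidthCM
            random.uniform(1.0, 6.9),  # PetalLengthCM
            random.uniform(0.1, 2.5),  # PetalWidthCM
        )
        time.sleep(1)

threading.Thread(target=write_faux_iris_measurements).start()
```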

Now we use learn on the ticking table. All it takes to change from static to live data is to change the table!
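A sketch of that call; compared to the static version, only the table (and a smaller batch size) changes:

```python
# Predict the subspecies of each faux measurement as it arrives
live_predicted = learn.learn(
    table=live_iris,
    model_func=predict_with_model,
    inputs=[
        learn.Input(
            ["SepalLengthCM", "SepalWidthCM", "PetalLengthCM", "PetalWidthCM"],
            table_to_tensor_double,
        )
    ],
    outputs=[learn.Output("PredictedClass", get_predicted_class, "int")],
    batch_size=5,
)
```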

That's all it takes to create and use our trained model on the live, ticking table.