Too often I've been told, "Python is not an option for processing real-time data feeds." Too often I've Googled "Python for real-time data" and found answers limited to a single application far too specific for my needs. Too often the question "what programming language should I use for this real-time data problem?" yields suggestions that exclude Python.
If you're a data scientist who works in Python, then Deephaven's Learn library should be your new favorite tool. It has been engineered specifically to bring Deephaven's "streaming table" capabilities (that is, tables that update, append, or otherwise change as new data arrives) to Python data science, seamlessly, intuitively, and with good performance.
The Learn library lets you interact with data in Deephaven tables using a gather-compute-scatter paradigm. Gather data from a Deephaven table into a NumPy array, compute anything you want using the gathered data, and then scatter the results back into a table. These steps work identically whether the Deephaven table is static or dynamic. Neither your mental model nor your code needs to change as you move between static and real-time use cases.
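To make the paradigm concrete, here is a minimal sketch in plain NumPy. It is not the deephaven.learn API: the "table" is modeled as a dict of equally long NumPy column arrays, and the names gather, compute, scatter, and process_rows are hypothetical, chosen only for illustration. The point is that the same three functions handle both an initial static batch and rows that arrive later.

```python
import numpy as np

# Assumption: a "table" is just a dict of equally long NumPy column arrays.
# This stands in for a Deephaven table; it is not the deephaven.learn API.

def gather(table, rows, columns):
    """Gather the selected rows of the named columns into a 2-D array."""
    return np.column_stack([table[c][rows] for c in columns]).astype(np.double)

def compute(features):
    """Stand-in for a model: sum each row of gathered features."""
    return features.sum(axis=1)

def scatter(table, rows, column, values):
    """Scatter results back into an output column at the same row positions."""
    table[column][rows] = values

def process_rows(table, rows):
    """Run gather -> compute -> scatter on the given row indices only."""
    scatter(table, rows, "Sum", compute(gather(table, rows, ["X", "Y"])))

# Static case: process every row that exists up front.
table = {
    "X": np.array([1.0, 2.0, 3.0]),
    "Y": np.array([10.0, 20.0, 30.0]),
    "Sum": np.full(3, np.nan),
}
process_rows(table, np.arange(3))

# "Real-time" case: two rows arrive later; only the new rows are processed.
n_old = len(table["X"])
table["X"] = np.append(table["X"], [4.0, 5.0])
table["Y"] = np.append(table["Y"], [40.0, 50.0])
table["Sum"] = np.append(table["Sum"], [np.nan, np.nan])
process_rows(table, np.arange(n_old, len(table["X"])))
```

Nothing in gather, compute, or scatter knows whether the rows were there from the start or just arrived. Only the set of row indices passed to process_rows changes.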
Python and real-time data
Python isn't widely considered the strongest choice of language for real-time data processing applications. Despite this, TIOBE ranks Python as its #1 programming language, saying, "There are no signs that Python's triumphal march will stop soon." If you've read this far, you probably have some good reasons for using Python. Python is known for its amazing open-source community. PyPI has nearly 350,000 packages freely available and easy to download and use.
Some Python modules can pull real-time data from various sources. Notable examples include urllib for pulling data from website URLs, yfinance for pulling stock data, and pycoingecko for pulling live cryptocurrency values. Most code examples you'll find online that use these APIs don't handle real-time data, even though doing so is simple.
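As a rough illustration of how simple it can be, the sketch below turns any request-style fetch into a small feed by polling it at a fixed interval. The names poll and fake_fetch are hypothetical. In practice, fetch would wrap a call into a library such as urllib, yfinance, or pycoingecko; those calls are deliberately left out here, and a canned list of prices stands in for a live source.

```python
import time
from datetime import datetime, timezone

def poll(fetch, interval_seconds, max_polls):
    """Call fetch() repeatedly, yielding (UTC timestamp, value) pairs."""
    for i in range(max_polls):
        yield datetime.now(timezone.utc), fetch()
        # Wait between polls, but not after the final one.
        if i < max_polls - 1:
            time.sleep(interval_seconds)

# Hypothetical stand-in for a live source: returns canned prices in order.
_prices = iter([101.2, 101.5, 100.9])

def fake_fetch():
    return next(_prices)

ticks = list(poll(fake_fetch, interval_seconds=0.01, max_polls=3))
values = [value for _, value in ticks]
```

Each yielded pair can be appended to whatever store the rest of the pipeline reads from. A real feed would also need error handling and rate limiting, which this sketch omits.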
AI/ML packages like PyTorch, SciKit-Learn, and TensorFlow are well known in the Python community. These modules make creating, testing, and deploying AI/ML models significantly easier. However, guides and examples seldom use a real-time data feed. Frustratingly, most guides train a model on some historical data and call it a day. "This would work on real-time data!" isn't enough. I want to see it work, and with Deephaven's platform, that's a real possibility. Making models work on real-time data in Deephaven requires minimal effort beyond setting up the real-time data feed.
There are many Python modules currently available. Deephaven's Learn library is the only one that:
- Works seamlessly with both static and real-time Deephaven tables.
- Works with whatever module you want to use for AI/ML/data processing.
The Deephaven Learn library
deephaven.learn is a new submodule within the deephaven Python module that enables AI/ML models in Python to seamlessly interact with data stored in Deephaven tables. When using deephaven.learn, you don't need to worry about whether the table is static or real-time; it relies on Deephaven's table update model so that your model operates on only the data you want it to operate on.
If you'd like to see deephaven.learn in action, I've already published code in a previous blog post titled Detecting Credit Card Fraud with Deephaven. The post details how a trained ML model can work on a real-time data feed with minimal work.
Check out our Examples repository, which contains numerous data sets that can be used with deephaven.learn, as well as some code examples.
Python isn't widely considered to be a strong solution for real-time data processing despite the wealth of modules that make data processing easier than ever. We at Deephaven hope to change this belief. By using deephaven.learn, you can train and use models on either historical or real-time data, with minimal extra work to go from one to the other.
Got a cool idea for an AI/ML project and think it could work in the real-time domain? Try Deephaven and make it happen faster.
Learn more
Our docs include guides on using our module as well as integrating other modules into Deephaven:
- How to use deephaven.learn - Predict insurance charges and credit card fraud
- How to use PyTorch in Deephaven - Classify Iris flowers in real time
- How to use SciKit-Learn in Deephaven - Classify Iris flowers in real time
- How to use SciPy in Deephaven - Filter a noisy signal, compute nearest neighbors, and compute distances using various metrics
- How to use TensorFlow in Deephaven - Classify Iris flowers in real time