In any data collection or analysis, accuracy is paramount. But what if you needed the metrics yesterday?
The intersection of AI and real-time data is important. PyTorch, TensorFlow, scikit-learn, and NLTK are ubiquitous libraries in data science workflows. To serve real-time use cases, they pair well with Deephaven, a query engine that brings Python to "streaming dataframes" -- essentially "tables that update".
Uniting these two powerhouses should be easy. The deephaven.learn library was released in December, providing a gather-compute-scatter paradigm to Deephaven's streaming tables, in support of AI integrations.
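To make the gather-compute-scatter idea concrete, here is a minimal pure-Python sketch of the pattern. The function names and the toy list-of-dicts "table" are illustrative stand-ins, not the actual deephaven.learn API, which operates on live Deephaven tables:

```python
# Conceptual sketch of the gather-compute-scatter pattern.
# All names here are illustrative, not the real deephaven.learn API.

def gather(rows, columns):
    """Gather: pull the needed columns out of table rows into plain lists."""
    return [[row[c] for c in columns] for row in rows]

def compute(batch):
    """Compute: run the model; here, a stand-in that sums each row's features."""
    return [sum(features) for features in batch]

def scatter(rows, results, out_column):
    """Scatter: write the model outputs back onto the rows as a new column."""
    for row, result in zip(rows, results):
        row[out_column] = result
    return rows

# A toy "table" as a list of dicts
table = [{"x": 1, "y": 2}, {"x": 3, "y": 4}]
batch = gather(table, ["x", "y"])
preds = compute(batch)
table = scatter(table, preds, "pred")
```

In the real library, the same three roles let an arbitrary model consume and produce table columns while the engine keeps the results in sync with incoming ticks.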
Today, the team released an upgrade to the deployment options available to AI-motivated users. With a simple curl script, you can now download a Docker image that combines Python, Deephaven, and the AI library of your choice.
Copy, paste... done.
Reddit RSS feeds and Python AI are a powerful combination. With them, you can track the sentiment trends of the topics you care about.
Deephaven bridges the gap. You can explore not only the history of subreddits but also track the streaming conversation in real-time. In this article, we provide a program that ingests RSS feeds into Deephaven tables and performs simple sentiment analysis with easy-to-customize Python queries. We intend the code to be DIY - since RSS feeds are standardized, these methods can be applied to any RSS feed, such as those that source Wikipedia, Hacker News, CNN, and podcasts.
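The two steps involved - parsing standardized RSS and scoring each item - can be sketched standalone with just the standard library. The sample feed and the tiny word lists below are illustrative stand-ins for a live subreddit feed and a proper sentiment model; the article's program streams the results into Deephaven tables instead:

```python
# Standalone sketch: parse RSS 2.0 items, then score each title's sentiment.
# The sample feed and word lists are illustrative, not real data or a real model.
import xml.etree.ElementTree as ET

SAMPLE_RSS = """<rss version="2.0"><channel>
  <item><title>GME to the moon, great gains</title></item>
  <item><title>Terrible losses, awful day for meme stocks</title></item>
</channel></rss>"""

POSITIVE = {"great", "gains", "moon"}
NEGATIVE = {"terrible", "losses", "awful"}

def parse_titles(rss_text):
    """Extract item titles from RSS 2.0 XML (the standardized part)."""
    root = ET.fromstring(rss_text)
    return [item.findtext("title") for item in root.iter("item")]

def score(text):
    """Naive lexicon score: positive word count minus negative word count."""
    words = [w.strip(",.!?") for w in text.lower().split()]
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

scores = {title: score(title) for title in parse_titles(SAMPLE_RSS)}
```

Because every RSS feed shares the same `<item><title>` structure, swapping in a different feed URL is the only change needed on the ingestion side.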
Read on for a look at WallStreetBets and what people are saying about meme stocks.
New and dynamic data drives business value. Modern real-time systems are increasingly engineered as a combination of a pub-sub system for streams and a query engine that can deliver AI, table operations, and application logic. Dave Menninger, an analyst at Ventana Research, predicts, "In the next 3-5 years, streaming will become the default way in which we deal with data." This way forward must recognize that historical data will also continue to provide value and context to analytics at the leading edge. Any approach must bring together batch and stream.
The combination of Redpanda and Deephaven offers an exciting and empowering solution. Redpanda is a leading streaming platform, and Deephaven is a query engine and interoperable framework built from the ground up to work with real-time data.
Stock analysis is an obvious example. Historical data provides rich training sets for the predictions that real-time data drives. Below, we access live stock market data from dxFeed through their Python API and publish it to Redpanda topics over the Kafka protocol. Those topics are then streamed into Deephaven, where they are rendered as "tables that update", enabling users to use Python and do table operations in familiar ways, magically inheriting the real-time changes.
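The publish side of that pipeline boils down to encoding each quote event and sending it to a topic. The sketch below shows the shape of that step; the quote fields and topic name are illustrative, the dxFeed subscription code is omitted, and the producer call is shown but not executed (a real run needs a broker):

```python
# Sketch of the publish step: encode a quote as JSON bytes and send it to a
# Redpanda topic over the Kafka protocol. Field names and topic are illustrative.
import json

def serialize_quote(symbol, bid, ask, ts):
    """Encode one quote event as JSON bytes for a Kafka message value."""
    return json.dumps(
        {"symbol": symbol, "bid": bid, "ask": ask, "ts": ts}
    ).encode("utf-8")

def publish(producer, topic, quote_bytes):
    """Send the encoded quote. `producer` can be a kafka-python KafkaProducer;
    it works against Redpanda unchanged, since Redpanda speaks Kafka's protocol."""
    producer.send(topic, value=quote_bytes)

msg = serialize_quote("AAPL", 189.90, 189.95, 1700000000)
# A real run would build KafkaProducer(bootstrap_servers="localhost:9092")
# and call publish(producer, "quotes", msg).
```

Deephaven's Kafka ingestion then turns the topic directly into one of those updating tables, so the consumer side is a single query rather than a custom consumer loop.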
If you haven't heard of WORDLE, then you probably haven't been on the internet lately. Chances are your friends are routinely posting their outcomes. Each day, a new WORDLE goes live, challenging players to beat their prior scores and, of course, show their awesomeness on social media. Everyone plays the same word each day, so no spoilers, please!
Twitter is a treasure chest of data for social sentiment. The money made in meme stock trading is a stunning example of the power of community sentiment. Nevertheless, you could spend hours manually scrolling social media without gleaning meaningful insight. Or, you could equip yourself with automation and ML to measure sentiment - in real-time - and produce useful results.
Python and TensorFlow's natural language processing libraries and Deephaven's stream-and-batch table engine together meet this challenge. Below, I share the details of an app I made to do natural language processing of the Twitter feed and marry it to stock price information - all in real-time.
This starter program looks at tweets about cryptocurrency; however, the possibilities are endless and we encourage you to tailor these queries to suit your interests.
Deephaven remains laser-focused on making real-time data easy for everyone – on its own and coupled with static data.
To support Community, we have been working on documentation coverage and established a new deephaven-examples GitHub organization to serve as a central warehouse of illustrative use cases. We encourage the community to contribute examples there as well.
Further, the Deephaven YouTube channel continues to grow. Subscribe to view the new content we plan to drop each week.
The description below follows the organization of development themes presented in our 2022 Roadmap.
Combining your real-time data into a single source of truth - in this case, one table that you can manipulate - makes for easier, more efficient analysis.
In the previous two parts of our Prometheus series, we discussed how to ingest data from both the Prometheus REST API and from Prometheus alert webhooks. Now we have two steady streams of data: one that tracks our metrics, and one that tells us when alerts have been fired and resolved.
In this post, we'll combine these streams of data into a single table, allowing us to track our metrics with the alerts that are fired and resolved.
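The core idea behind combining the two streams is an as-of join: tag each metric reading with the most recent alert state at or before it. In Deephaven this is a single join on live tables; the pure-Python sketch below, with illustrative data, shows the underlying logic:

```python
# Pure-Python sketch of an as-of join: for each metric reading, attach the
# latest alert whose timestamp is <= the metric's. Data is illustrative.
import bisect

alerts = [            # (timestamp, status), sorted by timestamp
    (100, "fired"),
    (250, "resolved"),
]
metrics = [(90, 0.2), (120, 0.9), (300, 0.3)]  # (timestamp, value)

alert_times = [t for t, _ in alerts]

def join_asof(metrics, alerts, alert_times):
    """Return (timestamp, value, alert_status) for each metric reading."""
    out = []
    for ts, value in metrics:
        i = bisect.bisect_right(alert_times, ts) - 1  # last alert at or before ts
        status = alerts[i][1] if i >= 0 else None     # None: no alert yet
        out.append((ts, value, status))
    return out

combined = join_asof(metrics, alerts, alert_times)
```

The streaming version does the same matching continuously: as new alerts or metrics tick in, the joined table updates in place instead of being recomputed.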
To close out 2021, we completed a major update of the Deephaven Community Core documentation to reflect code changes introduced in the latest release. Current users will notice streamlined aggregation queries, new CSV import / export methods, changes to our date-time methods, and several Python-specific improvements. This paves the way for more code enhancements in the new year.
- We updated the Deephaven Community Core documentation with our new plotting tool that automatically renders charts in the docs as it tests the code.
- We added video how-to guides on working with Parquet and Kafka, as well as a walkthrough of launching Deephaven with Docker.
- We published part 2 of our Prometheus blog series, "Collecting one-time event data with Deephaven and Prometheus".
- We expanded our cryptocurrency content with a new article spotlighting Dogecoin, "Crypto made easy".