This month brought a sense of "Community" to the forefront of all our projects. We wrapped up our first Python Madness Tournament, polling the community about their most valuable Python package. Congratulations to pandas! Inspired by user questions, our dev-rel team was particularly prolific, writing several general-interest blogs on topics ranging from Google Cloud as an alternative to replacing your laptop, to how to choose the right file format for your data. We also hosted our first AMA on reddit, answering questions about data science, working with large datasets, Python programming, and even our team's caffeine sources of choice.
Highlights from our blog, YouTube channel, and user guide are discussed below.
Blog
Spotlight on Parquet
One of Deephaven's reasons for being is to make working with massive datasets easy. There are plenty of tutorials in the wild about working with large CSV files, and we felt we could enter the conversation by suggesting Parquet for certain use cases.
- r/place is a social experiment where millions of users cooperate (or compete!) to carve out pixels on a shared artistic canvas. Devin Smith blogs about translating the data from CSV to Parquet. Reducing the 22 GB CSV dataset to a 1.5 GB Parquet file gives your data analysis a significant advantage.
- Do you want to make pandas 60x faster? Parquet can help with that, too; see the minimal sketch after this list.
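To make the CSV-to-Parquet idea concrete, here is a minimal sketch using pandas with a Parquet engine such as pyarrow. The file names and columns are placeholders, not the exact code from the posts above:

```python
# Minimal CSV-to-Parquet sketch (placeholder file and column names).
# Requires pandas plus a Parquet engine such as pyarrow.
import pandas as pd

# One-time conversion: parse the CSV, then write a compressed, columnar copy.
# (A file as large as the 22 GB r/place dump may need chunked conversion.)
df = pd.read_csv("r_place_history.csv")
df.to_parquet("r_place_history.parquet", compression="snappy")

# From here on, load the Parquet file instead of re-parsing the CSV.
# Columnar storage lets you read only the columns you actually need,
# which is where much of the speedup comes from.
pixels = pd.read_parquet("r_place_history.parquet", columns=["timestamp", "pixel_color"])
print(pixels.head())
```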
Plotting
OK, you've got data, but now you want to create slick visualizations.
- We introduce our Matplotlib and Seaborn plugins. Use these familiar and powerful Python tools in the Deephaven IDE.
- Taking those instructions a step further, we show how to customize three lines of code to view YFinance data using Matplotlib; a short sketch follows this list.
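For a taste of that workflow, here is a hedged sketch that pulls prices with the yfinance package and plots them with Matplotlib. The ticker, period, and column choices are illustrative rather than the exact lines from the post:

```python
# Illustrative yfinance + Matplotlib sketch (ticker and period are arbitrary).
import matplotlib.pyplot as plt
import yfinance as yf

# Download three months of daily bars for one symbol.
prices = yf.download("AAPL", period="3mo", interval="1d")

# Build an ordinary Matplotlib figure; the Deephaven Matplotlib plugin
# displays the resulting figure object right in the IDE.
fig, ax = plt.subplots()
ax.plot(prices.index, prices["Close"])
ax.set_title("AAPL daily close")
ax.set_xlabel("Date")
ax.set_ylabel("Close price (USD)")
```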
General tutorials
- The Kaggle community houses nearly any dataset you can imagine. We show you how to automate the process of fetching that data with Python code; a short download sketch follows this list. Get inspired and become productive faster.
- Do big science without big costs: we show you how to get the power of a $3800 laptop for $20/month by using Google Cloud.
- JSON is the go-to format for semi-structured data, but it can give you a headache. Learn how to create a real-time JSON database that immediately supports complex queries in just a few lines of code.
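As a minimal sketch of the Kaggle automation, the snippet below uses the official kaggle package. The dataset slug and local path are placeholders, and it assumes an API token at ~/.kaggle/kaggle.json:

```python
# Minimal Kaggle download sketch (dataset slug and path are placeholders).
# Assumes the `kaggle` package is installed and an API token lives at ~/.kaggle/kaggle.json.
from kaggle.api.kaggle_api_extended import KaggleApi

api = KaggleApi()
api.authenticate()  # picks up the token from ~/.kaggle/kaggle.json

# Download the dataset and unzip it into a local folder, ready for analysis.
api.dataset_download_files("some-owner/some-dataset", path="data", unzip=True)
```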
Guest post
Finally, check out a guest post by Dylan Carter, a high school user who successfully completed his AP Research capstone project using Deephaven. Inspired by some of our example projects, Dylan wrote an AI model for analyzing Solana and Twitter sentiment.
YouTube
If you missed our live stream of the reddit AMA, you can catch up on our YouTube channel.
While you're there, check out our newest Learning Sessions:
- Corey Kosak describes his design for our high-performance CSV reader. Did we mention it's 325% faster than Apache Commons?
- Jianfeng Mao details our Python API redesign, which debuted in March.
- Cristian Ferretti talks about our performance tables, which you can access and work with in the IDE to troubleshoot or fine-tune your queries.
- Devin Smith and Don McKenzie team up to discuss how we actually designed the front and back ends of the Python tournament bracket.
- JJ Brosnan talks about AI/ML workflows and the exciting possibilities of real-time machine learning with Python.
- Isn't Deephaven like X, Y, or Z? Yes, but... If you're curious about how we compare to Materialize in particular, our Debezium performance testing series gives you all the details. We suggest starting with Part 3: after you're blown away by Deephaven's speed and nice IDE, go back and watch the earlier installments for the context and nitty-gritty code details Amanda Martin presents.
We also continue to post our developer demos weekly. See how Deephaven Community Core continually evolves.
User guide
The Deephaven user guide is constantly updated. Of note:
- How to use Matplotlib and Seaborn
- An updated Application Mode how-to, with supporting articles on how to use Application Mode scripts and libraries.