
The MVP trophy goes to...pandas!

DALL·E prompt: A trophy with gold pandas with colorful confetti in a blurred background
JJ Brosnan

The voters have spoken and revealed their favorite Python module:

img

pandas has emerged as the winner of the Python Package Tournament, narrowly beating NumPy in the championship round. Data storage and manipulation turned out to be the community's favorite Python tooling. NumPy, representing numerical data storage and analysis, fought valiantly, with convincing victories all the way to the finals.

This was a battle of two titans in the Python development community. Let's take a quick look at numbers from PyPI to see whether the bracket result matches the actual download statistics.

According to NumPy stats, NumPy has been downloaded nearly 104 million times in the past month (as of 04/11/2022). pandas, on the other hand, has been downloaded just over 81 million times.

Strangely, the community voted more in favor of the less downloaded package. Shown below are images taken directly from the statistics page:

img

img

pandas averages about 3,000,000 downloads per weekday, and NumPy averages about 4,000,000. Both of these packages are immensely popular among the Python development community.
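As a rough sanity check on the figures quoted above, the monthly totals can be turned into a per-day average and a ratio. This is only a sketch: the totals are the rounded numbers from the text ("nearly 104 million" and "just over 81 million"), and the 30-day window is an assumption. Note that a calendar-day average is a different measure from the per-weekday averages quoted above.

```python
# Rounded monthly download totals quoted in the text (as of 04/11/2022).
monthly_downloads = {"numpy": 104_000_000, "pandas": 81_000_000}

# Assumption: the "past month" window covers 30 calendar days.
DAYS_IN_WINDOW = 30

# Average downloads per calendar day for each package.
daily_average = {
    name: total / DAYS_IN_WINDOW for name, total in monthly_downloads.items()
}

# pandas downloads as a fraction of NumPy downloads.
pandas_share = monthly_downloads["pandas"] / monthly_downloads["numpy"]

for name, avg in daily_average.items():
    print(f"{name}: ~{avg:,.0f} downloads per calendar day")
print(f"pandas downloads are about {pandas_share:.0%} of NumPy's")
```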

So, for this year, Wes McKinney and all of the other pandas contributors can revel in their victory. They must be ready to come back strong for another tournament, though, as Travis Oliphant and the NumPy team will use this defeat as motivation to return stronger than ever! For now, pandas will enjoy a champagne shower of data from its virtual championship MVP trophy.

Nevertheless, be sure to use all of your favorite Python packages in celebration of the Python bracket tournament. Use either of the finalists, others in the bracket, or those that didn't make the bracket at all. Regardless of its status in the bracket showdown, every Python package has a place in the development community. The open source Python community continually delivers worthwhile, interesting, and fun packages to incorporate into the development life cycle. It's the community itself that deserves a round of applause for being a part of an ever-growing cornerstone of software as a whole.

Deephaven and pandas

We at Deephaven are big fans of pandas. DataFrames make data storage and manipulation simple and accessible. They are backed by NumPy arrays, so powerful numerical processing is available by default for the data stored in them. DataFrames are particularly effective for data science, AI, and machine learning applications.
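To illustrate the relationship between DataFrames and NumPy, here is a minimal sketch using made-up data. The column names `price` and `size` are hypothetical and chosen only for this example. It shows a DataFrame being exported as a NumPy array, and a NumPy function being applied directly to a column.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, for illustration only.
df = pd.DataFrame(
    {"price": [10.0, 12.5, 11.0, 13.5], "size": [100, 200, 150, 50]}
)

# Export the whole DataFrame as a 2-D NumPy array.
values = df.to_numpy()

# NumPy universal functions accept a pandas column and return a Series.
log_price = np.log(df["price"])

# Vectorized arithmetic across columns, then a reduction.
notional = (df["price"] * df["size"]).sum()

print(values.shape)
print(notional)
```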

pandas DataFrames, Deephaven Tables and AI/ML

The pandas Python library supports operations on data structures that map directly to Deephaven data tables. pandas can be used in conjunction with Deephaven tables for queries in artificial intelligence and machine learning (AI/ML). Models can be trained using pandas DataFrames, and those models can then be applied in real time to Deephaven tables.
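The train-then-apply pattern can be sketched with pandas and NumPy alone. This is not Deephaven's API: the "model" is a simple least-squares line fit, the data is synthetic, and the `incoming` DataFrame merely stands in for rows that would arrive in a live table. The `predict` function is a hypothetical helper of the kind that could be called on each new row.

```python
import numpy as np
import pandas as pd

# Synthetic historical data following y = 2x + 1 (illustration only).
train = pd.DataFrame({"x": [0.0, 1.0, 2.0, 3.0, 4.0]})
train["y"] = 2.0 * train["x"] + 1.0

# "Train" a model: fit a straight line with least squares.
slope, intercept = np.polyfit(
    train["x"].to_numpy(), train["y"].to_numpy(), deg=1
)


def predict(x: float) -> float:
    """Apply the fitted line to a single new observation."""
    return slope * x + intercept


# Stand-in for rows arriving later, e.g. from a real-time source.
incoming = pd.DataFrame({"x": [5.0, 6.0]})
incoming["y_pred"] = incoming["x"].apply(predict)

print(incoming)
```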

Real-time calculations in Python

pandas allows the Deephaven Core query engine to train AI/ML models, among many other practical data applications. In particular, pandas and Deephaven work well together for:

  • Training an AI/ML model on a DataFrame, which the user can then test on a real-time Deephaven table
  • Splitting table data with a pandas DataFrame
  • Moving window operations in pandas and Deephaven
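The splitting and moving-window items above can be sketched on the pandas side as follows. The data and column names are hypothetical, and the 80/20 chronological split and the 3-row window are arbitrary choices made for illustration. The Deephaven side of these operations is not shown here.

```python
import pandas as pd

# Hypothetical time-ordered data, for illustration only.
df = pd.DataFrame(
    {"t": range(10), "value": [float(v) for v in range(1, 11)]}
)

# Chronological split: the first 80% of rows for training, the rest for testing.
split_at = int(len(df) * 0.8)
train, test = df.iloc[:split_at], df.iloc[split_at:]

# 3-row moving average; the first two rows are NaN because the window is not full.
df["rolling_mean"] = df["value"].rolling(window=3).mean()

print(len(train), len(test))
print(df)
```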

Starter projects with pandas

Cool things you can do with pandas + Deephaven in real time:

Talk to us

Which module did you vote for? Let us know how you feel about these results on Twitter, LinkedIn, or Slack!