Writing Python, doing data science, working with real-time data, and thinking about baseball.
That will be my summer. (Well, that and getting absolutely shredded in the weight room.)
Here are the CliffsNotes: My buddy, Joshua, and I landed internships at a data software company, Deephaven Data Labs. We’re both in college STEM programs and were told the roles would involve programming and modeling in a big way and that we could work together. Nice.
This I didn’t expect: the first day on the job we were told to design a data-driven project for the summer. The parameters took a bit of time to digest.
- The project needed to involve data and modeling.
- The data needed to be publicly available so we could tell stories about it.
- We needed to believe that whatever use case we pursued would be better with real-time data.
Joshua's answer: "Baseball." He plays college ball and is pretty into it, so I dismissed his mumblings at first, but he pushed on. He suggested we make a summer project of MLB's Beat-the-Streak fantasy game. Here’s what he had in mind:
- Try to become bona fide modelers of the game – essentially turn ourselves into Sabermetrics people focused on predicting hits in Major League Baseball.
- Use Deephaven to create analytics and experiences that update in real time – pitch-by-pitch. (Such a thing doesn’t exist for the Beat-the-Streak game today.)
- Add fun experiences around the game and convince MLB to democratize access to real-time data.
- Be a part of the movement to help someone win the darn game!
$5.6 million and MLB's Beat-the-Streak
The premise of the fantasy game BTS is compelling. In 1941, Joe DiMaggio got a hit in 56 games in a row. It is universally accepted that this was a freakish feat. As you can see below, no one else has come close.
Maybe pitchers, stadiums, or managers have hurt hitters, because streaks in the last two decades are nowhere close to DiMaggio's.
The object of the BTS game is to simply beat DiMaggio's streak. Here is the twist, though, and it should be a big advantage: Joe's streak was just one player -- Joe! When playing BTS, you can pick a different player day by day. Change players every day if it suits you. That seems like a good deal.
String together one hit per day for 57 days in a row, and Major League Baseball will give you $5.6 million. That’s a nice bag.
The simple math
When I told my dad about the summer project and the prize money, he gave me a “Cool your jets, Rockefeller. Do the math.”
OK, it’s a long shot from an expected-value point of view. Here is a quick analysis:
- The average batting average last year was 0.244.
- However, BA is calculated by dividing the number of hits by the number of at-bats.
- An at-bat ignores a lot of outcomes at the plate: Walk, hit-by-pitch, sacrifice.
- It turns out that 21.8% of appearances at the plate resulted in a hit in 2021.
- And the average number of plate appearances per starting player per game was 3.8.
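Treating each plate appearance as independent, those numbers give an average starter roughly a 61% chance of getting at least one hit in a game. However, the probability of doing that 57 times in a row is small.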
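The arithmetic in the list above can be sketched in a few lines of Python. This is a back-of-the-envelope version, not anything official: it assumes every plate appearance is an independent trial with the same 21.8% hit rate, and it allows a fractional 3.8 plate appearances per game. The constant and function names are made up for illustration.

```python
# Assumed inputs, copied from the 2021 league-wide numbers quoted above.
HIT_RATE_PER_PA = 0.218  # share of plate appearances that end in a hit
PA_PER_GAME = 3.8        # average plate appearances per starter per game
STREAK_TARGET = 57       # one more than DiMaggio's 56


def hit_in_game_probability(hit_rate: float, plate_appearances: float) -> float:
    """Chance of at least one hit in a game, assuming independent plate appearances."""
    return 1 - (1 - hit_rate) ** plate_appearances


def streak_probability(p_game: float, games: int = STREAK_TARGET) -> float:
    """Chance of getting a hit in every one of `games` consecutive games."""
    return p_game ** games


p_game = hit_in_game_probability(HIT_RATE_PER_PA, PA_PER_GAME)
p_streak = streak_probability(p_game)

print(f"Hit in a single game: {p_game:.1%}")
print(f"Hit in {STREAK_TARGET} straight games: {p_streak:.2e}")
```

Under those assumptions, 57 in a row for a league-average starter comes out to less than one in a trillion, which is why being able to pick a better-than-average hitter each day matters so much.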
As you can see from the table below, the number of plate appearances in a game matters a lot. The compounding effect of small differences is incredible to have laid out in front of you.
Looking at the "Odds" column at the right in the table above, the pink cells highlight the win probabilities one can expect if players' hit-per-game probability is in the 70-74% range.
It sure would be nice to move toward the white cells, where we could reasonably expect someone to win the game each MLB season. If you look at the screenshot below, MLB suggests "best-candidate" players in the 70-74% range. It remains open whether we, as a community, can discover a model that improves on MLB's.
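To get a feel for the compounding, here is a small sketch that recomputes streak odds for a few plate-appearance counts and per-game probabilities. It does not reproduce the table itself. The 21.8% hit rate comes from the analysis earlier in the post, the independence assumption is ours, and the function names are hypothetical.

```python
STREAK_TARGET = 57       # games needed to win Beat-the-Streak
HIT_RATE_PER_PA = 0.218  # 2021 hits per plate appearance, quoted earlier


def hit_in_game_probability(plate_appearances: float, hit_rate: float = HIT_RATE_PER_PA) -> float:
    """Chance of at least one hit, assuming independent plate appearances."""
    return 1 - (1 - hit_rate) ** plate_appearances


def one_in_n_odds(p_game: float, games: int = STREAK_TARGET) -> float:
    """Express the probability of a full streak as '1 in N'."""
    return 1 / p_game ** games


# How much do extra trips to the plate help a league-average hitter?
for plate_appearances in (3, 4, 5):
    p_game = hit_in_game_probability(plate_appearances)
    print(f"{plate_appearances} PA: {p_game:.1%} per game, 1 in {one_in_n_odds(p_game):,.0f}")

# How much does a higher per-game probability help?
for p_game in (0.70, 0.72, 0.74, 0.80, 0.85):
    print(f"{p_game:.0%} per game: 1 in {one_in_n_odds(p_game):,.0f}")
```

Because the per-game probability is raised to the 57th power, even a couple of percentage points change the odds by a large factor.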
Is MLB more generous than Warren Buffett?
Here is a screenshot from today's Beat-the-Streak app.
Taking the 74% chance of getting a hit in a game shown in the image at face value, the odds of streaks look like this:
As of the New York Times article last year, there have been more than 100 million attempts at the game. There are no winners yet, so 1-in-28 million is likely the best-case scenario right now.
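As a sanity check on that 1-in-28-million figure, here is a rough sketch of how many winners we would expect from 100 million attempts. It leans on a strong simplifying assumption that is ours, not MLB's: every attempt is treated as an independent shot at a full 57-game streak with a 74% chance of a hit each day. Real attempts surely vary a lot, so treat the output as an order-of-magnitude guide.

```python
import math

P_HIT_PER_GAME = 0.74   # per-game probability taken from the app screenshot
STREAK_TARGET = 57      # games needed to win
ATTEMPTS = 100_000_000  # "more than 100 million attempts," per the article

p_win = P_HIT_PER_GAME ** STREAK_TARGET
expected_winners = ATTEMPTS * p_win
# Probability that none of the attempts wins; log1p keeps precision for tiny p_win.
p_no_winner = math.exp(ATTEMPTS * math.log1p(-p_win))

print(f"Odds per attempt: 1 in {1 / p_win:,.0f}")
print(f"Expected winners: {expected_winners:.1f}")
print(f"Chance of zero winners: {p_no_winner:.1%}")
```

If every attempt really carried 74% daily odds, we would expect a few winners by now, and zero winners would be fairly unlikely. That fits with reading 1-in-28 million as a best case.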
Warren Buffett has famously claimed he'll pay $1 billion to anyone who completes a perfect March Madness bracket. But the odds of doing so are 1 in 9.2 quintillion!
Unequivocally, MLB is being more generous with BTS than Buffett is with his bracket offer. That would be true whether the odds are 1-in-28 million or 1-in-280 million. (You can see who is in the sports business and who runs an insurance company.)
Let's Go
This summer, Joshua and I are going to see if we can push the forecasted hit probability for a selected player convincingly above 74%. We're hoping some accomplished Sabermetricians weigh in and volunteer guidance. We intend to seek them out.
If we can increase the projected probabilities, there may be an opportunity to crowd-source the game and find a way to get someone over the Joe DiMaggio finish line. In a dream scenario, maybe a bunch of influencers could amass 10 million Streakers and the game (and Joe) could be beaten.
While doing data science research, we'll use Deephaven to simulate BTS streaks and create user experiences that emulate a league. Its easy-to-configure replay technology will give us a taste of what real-time data will be like.
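Before wiring anything into Deephaven, the streak simulation itself can be prototyped in plain Python. The sketch below is only that: it uses the standard library rather than any Deephaven API, assumes a fixed 74% daily hit probability and a 180-day season (both placeholders of ours), and uses made-up function names.

```python
import random


def longest_streak(results):
    """Length of the longest run of True values in a sequence of daily results."""
    best = current = 0
    for got_hit in results:
        current = current + 1 if got_hit else 0
        best = max(best, current)
    return best


def simulate_season(p_game, days, rng):
    """One Streaker's season: a list of daily hit/no-hit outcomes."""
    return [rng.random() < p_game for _ in range(days)]


def simulate_league(p_game=0.74, days=180, streakers=10_000, seed=42):
    """Longest streak for each simulated Streaker in the league."""
    rng = random.Random(seed)  # seeded so runs are repeatable
    return [longest_streak(simulate_season(p_game, days, rng)) for _ in range(streakers)]


streaks = simulate_league()
print(f"Best streak in the league: {max(streaks)}")
print(f"Streakers reaching 20+: {sum(s >= 20 for s in streaks)}")
```

Swapping the fixed probability for per-player, per-day model forecasts would be the natural next step.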
It might be fun to watch what other Streakers are doing. (Pardon the phrase.)
What streaks are alive? Who is everyone picking?
We'd like to be able to click on an MLB player and register for notifications before their at-bat.
Losing $5.6 million might be good for Major League Baseball
The average age of an MLB fan is 57 years old. (Ironic number, eh?)
With the right social experiences, BTS might drive renewed interest in baseball. After all, fantasy football seemingly generates more passion for the NFL than traditional fantasy baseball does today. (I understand there is a lot packed in that sentence.) Beat the Streak is the right fantasy game to spin baseball for a broader audience, particularly a younger one.
Shouldn't MLB be doing whatever it can to get someone to win BTS? More offense, more interest in the game, younger audience skew. I'm no marketer, but isn't that a trifecta for (a mere) $5.6 million?
Streaking alone is a lot less fun
If you'd like to swap baseball math thoughts, lend a hand on some models, or learn about Deephaven (this supposedly powerful, real-time analytics and modeling system that we're learning to use), please reach out on Slack, or follow Joshua and me on Twitter.
Beat the Streak has tremendous potential. It'll be a fun summer using Python tools, moving towards real-time data, and nurturing enthusiasm for baseball.
Go Twins!