Data-driven bracket picks: March Madness analytics before tip-off

Use historical tournament data to find value picks and optimize your bracket strategy

March 3, 2026

Margaret Kennedy, Communications Director @Deephaven
Claude, AI Assistant @Anthropic

Your bracket pool has 50 entries. Everyone picks the 1-seeds to the Final Four. The chalk brackets pile up, and when Duke wins it all, you split the pot 20 ways.

The smart money isn't on picking winners — it's on picking different winners. Upsets that others miss. Value picks where the data says the seed is wrong.

Let's use Deephaven to analyze historical tournament data, calculate upset probabilities, and build a bracket strategy that maximizes your edge.

The best bracket isn't the most accurate. It's the most accurate where others are wrong.

Historical upset rates

First question: how often do upsets actually happen? Let's load historical tournament results and find out.

This example uses data from the deephaven/examples repo, sourced from Kaggle: March Madness Data by nishaanamin. Deephaven can read CSVs directly from URLs:

This dataset shows how each seed has performed historically — wins, losses, and advancement rates by round. The original CSV has a WIN% column, but Deephaven sanitizes it to WIN (special characters are removed):

The view operation selects specific columns from a table, and sort orders the rows. Look at the results: 5-seeds win only about 53% of their tournament games, while 12-seeds win about 35%. The 5-seed line is close to a coin flip dressed up as a favorite.

The 12-vs-5 matchup is notorious for a reason: historically, the 12-seed wins it about 35% of the time. That's not a fluke; it's a pattern. The 5-seed is often an overrated power-conference team, while the 12-seed is a hot mid-major.
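To see what a 35% upset rate means across a whole bracket, here's a back-of-the-envelope sketch in plain Python (not a Deephaven query). It assumes the four 12-vs-5 games in a bracket are independent and each carries the same 35% upset chance, which is a simplification.

```python
# Back-of-the-envelope math for 12-vs-5 upsets.
# Assumption: every 12-vs-5 game is independent with the same upset probability.

UPSET_RATE_12_OVER_5 = 0.35  # historical rate quoted in the text
GAMES_PER_BRACKET = 4        # one 12-vs-5 game per region


def expected_upsets(upset_rate: float, games: int) -> float:
    """Average number of upsets across `games` independent games."""
    return upset_rate * games


def prob_at_least_one_upset(upset_rate: float, games: int) -> float:
    """Chance that at least one of `games` independent games is an upset."""
    return 1 - (1 - upset_rate) ** games


avg = expected_upsets(UPSET_RATE_12_OVER_5, GAMES_PER_BRACKET)
at_least_one = prob_at_least_one_upset(UPSET_RATE_12_OVER_5, GAMES_PER_BRACKET)
print(f"Expected 12-over-5 upsets: {avg:.1f}")
print(f"Chance of at least one:    {at_least_one:.0%}")
```

Under those assumptions, a bracket with zero 12-over-5 upsets is the unlikely outcome, at under 20%.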

Finding value picks

Upset rates tell you what happens on average. But this year's bracket has specific teams. We need to identify where the seed doesn't match the talent — teams that are better than their seeding suggests.

The where operation filters rows — here we keep only 2025 tournament teams (ROUND < 68 means they made the 64-team field, excluding play-in losers). We then view just the columns we need. Notice how KADJ EM RANK from the CSV is referenced as KADJ_EM_RANK — Deephaven automatically sanitizes column names.

Tip

CSV column name sanitization: When Deephaven reads CSVs, column names are automatically sanitized to be valid identifiers — spaces become underscores, and special characters are removed. So KADJ EM RANK becomes KADJ_EM_RANK, and WIN% becomes WIN. For other CSV options like custom delimiters or encodings, see CsvSpecs.
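As a rough illustration of the two rules in the tip, here's a plain-Python sketch. The function name sanitize_column_name is hypothetical, and this is only an approximation of the behavior described above; Deephaven's actual sanitizer may well handle more cases (leading digits, reserved words, duplicate names).

```python
import re


def sanitize_column_name(name: str) -> str:
    """Approximate the tip's two rules: spaces become underscores,
    and any remaining special characters are removed."""
    with_underscores = re.sub(r"\s+", "_", name.strip())
    return re.sub(r"[^0-9A-Za-z_]", "", with_underscores)


for raw in ["KADJ EM RANK", "WIN%"]:
    print(f"{raw!r} -> {sanitize_column_name(raw)!r}")
```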

Now let's find teams where the power ratings suggest they're better than their seed:

The update operation adds new calculated columns to each row. Here we're computing how much better each team's KenPom ranking is than their seed suggests — a 12-seed ranked like a typical 6-seed has serious upset potential. Teams at the top of this list are your value picks.
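To make the arithmetic concrete, here's a minimal plain-Python sketch of one way to score "better than the seed suggests". It is not the Deephaven query itself. The seed-implied rank formula is an assumption (four teams per seed line, seeded strictly by rank), the RANK_EDGE and TEAM names are hypothetical, and the rows are made up for illustration.

```python
# Hypothetical rows: team names and ranks are made up for illustration.
teams = [
    {"TEAM": "Team A", "SEED": 12, "KADJ_EM_RANK": 23},
    {"TEAM": "Team B", "SEED": 5, "KADJ_EM_RANK": 31},
    {"TEAM": "Team C", "SEED": 1, "KADJ_EM_RANK": 2},
]


def seed_implied_rank(seed: int) -> float:
    """Midpoint of the ranks a seed line would cover if seeds followed rank exactly.

    Assumption: four teams per seed line, so seed s covers ranks 4s-3 through 4s.
    """
    return 4 * seed - 1.5


def add_rank_edge(rows):
    """Return new rows with RANK_EDGE = seed-implied rank minus actual rank.

    A large positive RANK_EDGE means the team is rated better than its seed.
    """
    return [
        {**row, "RANK_EDGE": seed_implied_rank(row["SEED"]) - row["KADJ_EM_RANK"]}
        for row in rows
    ]


value_picks = sorted(add_rank_edge(teams), key=lambda r: r["RANK_EDGE"], reverse=True)
for row in value_picks:
    print(row["TEAM"], row["SEED"], row["RANK_EDGE"])
```

In this toy data, the 12-seed ranked 23rd sits right around the implied rank of a 6-seed (22.5), so it lands at the top of the list.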

Look for games where:

  • Historical upset rate is high (12-vs-5, 11-vs-6)
  • Efficiency gap is small — the lower seed is better than their seeding suggests
  • Schedule strength differs — a mid-major with a weak schedule but high efficiency often gets underseeded

Pool strategy: contrarian picks

Here's where game theory enters. If everyone in your pool picks Duke, you don't gain anything when Duke wins — you just keep pace. You gain when you're right where others are wrong.
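A rough way to put numbers on this: suppose the pool is decided entirely by the champion pick, and the pot is split evenly among every entry that picked the champion. Both are simplifying assumptions, and the win probabilities below are made up for illustration.

```python
def expected_pot_share(win_prob: float, entries_on_pick: int) -> float:
    """Expected fraction of the pot from one champion pick.

    Assumption: the pot is split evenly among all entries (including yours)
    that picked the eventual champion.
    """
    return win_prob / entries_on_pick


# Hypothetical numbers: a popular favorite vs. a less popular contender.
chalk_share = expected_pot_share(win_prob=0.20, entries_on_pick=20)
contrarian_share = expected_pot_share(win_prob=0.08, entries_on_pick=2)

print(f"Chalk pick:      {chalk_share:.3f} of the pot on average")
print(f"Contrarian pick: {contrarian_share:.3f} of the pot on average")
```

In this toy setup, the less likely champion is worth four times as much on average, because far fewer entries share the payout.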

Now we combine our value picks with public sentiment to find contrarian opportunities. The join operation merges two tables based on a matching column — in this case, team name:

The 12-seed with 35% win probability that only 28% of brackets pick? That's your edge. You're not just betting on an upset. You're betting on an upset that separates you from the field.
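Here's a plain-Python stand-in for that idea, not the Deephaven join itself: merge win probabilities with public pick rates on team name, then rank by the gap between them. The WIN_PROB, PICK_SHARE, and LEVERAGE names are hypothetical, and the rows are made up for illustration.

```python
# Hypothetical inputs: team names and percentages are made up for illustration.
win_probs = [
    {"TEAM": "Team A", "SEED": 12, "WIN_PROB": 0.35},
    {"TEAM": "Team D", "SEED": 11, "WIN_PROB": 0.38},
]
public_picks = [
    {"TEAM": "Team A", "PICK_SHARE": 0.28},
    {"TEAM": "Team D", "PICK_SHARE": 0.45},
]


def join_on_team(left, right):
    """Inner join two lists of rows on the TEAM key."""
    right_by_team = {row["TEAM"]: row for row in right}
    return [
        {**l_row, **right_by_team[l_row["TEAM"]]}
        for l_row in left
        if l_row["TEAM"] in right_by_team
    ]


def add_leverage(rows):
    """LEVERAGE > 0 means the public picks the team less often than it wins."""
    return [{**row, "LEVERAGE": row["WIN_PROB"] - row["PICK_SHARE"]} for row in rows]


ranked = sorted(
    add_leverage(join_on_team(win_probs, public_picks)),
    key=lambda r: r["LEVERAGE"],
    reverse=True,
)
for row in ranked:
    print(f'{row["TEAM"]}: leverage {row["LEVERAGE"]:+.2f}')
```

Team D is the more likely winner of the two, but the public already overpicks it, so only Team A offers positive leverage.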

Building your bracket

Putting it together:

Your data-driven bracket strategy:

  1. Lock in the chalk where efficiency gaps are large (1-vs-16, 2-vs-15).
  2. Take calculated upsets in 12-vs-5, 11-vs-6 where efficiency says the game is closer than the seed.
  3. Maximize contrarian value — pick upsets where the public is underweighting the lower seed.
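The three steps above can be sketched as a single decision rule. This is a minimal plain-Python illustration, not a tuned model: the function name choose_pick, its parameters, and both thresholds are hypothetical.

```python
def choose_pick(rank_gap: int, dog_win_prob: float, dog_pick_share: float,
                max_rank_gap: int = 40, min_leverage: float = 0.05) -> str:
    """Return "favorite" or "underdog" for one game.

    rank_gap: underdog's power-rating rank minus the favorite's
              (bigger means a more lopsided game).
    The two thresholds are hypothetical defaults, not tuned values.
    """
    # Step 1: lock in the chalk when the efficiency gap is large.
    if rank_gap > max_rank_gap:
        return "favorite"
    # Steps 2 and 3: the game is close enough, so take the upset only if
    # the public picks the underdog less often than it is expected to win.
    if dog_win_prob - dog_pick_share >= min_leverage:
        return "underdog"
    return "favorite"


# Lopsided game in the style of a 1-vs-16 matchup.
print(choose_pick(rank_gap=150, dog_win_prob=0.02, dog_pick_share=0.01))
# Close game where the public underpicks the underdog.
print(choose_pick(rank_gap=8, dog_win_prob=0.35, dog_pick_share=0.28))
# Close game where the public overpicks the underdog.
print(choose_pick(rank_gap=8, dog_win_prob=0.35, dog_pick_share=0.40))
```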

What data can't tell you

Deephaven gives you tools to analyze data — not a crystal ball. We're not promising you'll win your bracket pool. March Madness is called "madness" for a reason.

A few caveats:

  • Injuries matter — a last-minute injury to a star player changes everything.
  • Hot streaks are real — a team peaking at the right time can outperform their season metrics.
  • Experience counts — tournament experience (coaches and players) doesn't show up in efficiency ratings.
  • Luck is real — a bouncing ball, a bad call, a cold shooting night. The best team doesn't always win.

Use the data as a starting point, then layer in your basketball knowledge. Watch the games. Trust your gut on the close calls. And remember: even the best models get it wrong.

Data gets you to the final three picks. Intuition picks the winner.

Next steps


Questions about analytics? Find us on Slack.