MLB 2026 · ABS Challenge System

When should a team burn a challenge?

MLB rolled out the Automated Ball-Strike challenge system for the 2026 regular season. Each team gets two challenges per game and keeps a challenge when it succeeds. The hard part is not asking “is this pitch a ball?” It is asking “is this pitch worth burning a challenge on, given how many more I'll see tonight?”

I build a pitch-by-pitch prediction from scratch, layer by layer. Then I turn that prediction into a simple challenge-or-save decision and grade the league's actual teams against it.

The data

Every ABS challenge in MLB since Opening Day.

I pulled every public 2026 game log and found each ABS challenge: who challenged, what the umpire called, where the pitch crossed the plate, the count, the game situation, and whether the call flipped. Every row is one challenged pitch, not a player-season summary.

The first lesson is simple: not every challenger is picking the same spots. Catchers win at 71%, well above the 47% league average. Batters win at just 23%. Pitchers sit in the middle at 39%, still below average. That tells me the system should care a lot about who asked for the review.
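
The role split above is a simple grouped win rate over one-row-per-challenge data. A minimal sketch, assuming a hypothetical row layout (the field names `challenger_role`, `original_call`, and `overturned` are illustrative, not the page's actual schema) and toy rows rather than real 2026 data:

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Challenge:
    # One row = one challenged pitch. All field names are hypothetical.
    challenger_role: str   # "batter", "catcher", or "pitcher"
    original_call: str     # "ball" or "strike", as called by the umpire
    overturned: bool       # True if the review flipped the call


def win_rate_by_role(rows):
    """Share of challenges that flipped the call, per challenger role."""
    wins = defaultdict(int)
    totals = defaultdict(int)
    for row in rows:
        totals[row.challenger_role] += 1
        wins[row.challenger_role] += int(row.overturned)
    return {role: wins[role] / totals[role] for role in totals}


# Toy rows for illustration only, not real 2026 data.
toy = [
    Challenge("catcher", "ball", True),
    Challenge("catcher", "ball", True),
    Challenge("catcher", "ball", False),
    Challenge("batter", "strike", False),
    Challenge("batter", "strike", True),
]
rates = win_rate_by_role(toy)
```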

What I'm going to build

Seven layers. Each one has to explain itself.

I build the prediction one defensible idea at a time. Each level adds exactly one concept, and each level is tested on games it has not seen before. The chart uses average prediction miss: lower is better. The Level 0 to Level 5 climb is the honest prediction story. Level 6 is marked separately because exact ABS pitch location makes many calls almost automatic.

Level 0 · the floor

Start with the dumbest useful forecast.

Level 0 is not strategy. It is the measuring stick. Before I use pitch location, count, challenger, or game state, I ask: what if every May challenge got the exact same chance, learned only from earlier games?

That one-number forecast is intentionally boring. Its job is to make every later improvement visible.
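
As a rough illustration of the floor, here is a sketch that learns one flip rate from earlier games and scores it on later ones. The page does not name its "average prediction miss" metric; this sketch assumes something like a Brier score (mean squared gap between forecast and outcome). The outcomes are toy values, not real challenge results.

```python
def base_rate(train_outcomes):
    """Level 0: one number, the share of earlier challenges that flipped."""
    return sum(train_outcomes) / len(train_outcomes)


def brier_score(predictions, outcomes):
    """Mean squared gap between predicted chance and what happened (0 or 1)."""
    return sum((p - y) ** 2 for p, y in zip(predictions, outcomes)) / len(outcomes)


# Toy outcomes for illustration only (1 = call flipped, 0 = call stood).
train = [1, 0, 1, 0, 0, 1, 1, 0]   # stands in for the earlier games
test = [1, 0, 0, 1]                # stands in for the held-out May games

p0 = base_rate(train)                        # same forecast for every test pitch
floor = brier_score([p0] * len(test), test)  # the number later levels must beat
```

Any later level earns its place only if its score on the held-out games is lower than `floor`.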

Level 1 · the obvious signal

How far is the pitch from the zone edge?

The first thing a catcher checks before signaling a challenge: was the call close? I draw the pitch, find the closest edge of the strike zone, and measure how far the call appears to be from correct.

For a batter who was rung up, “evidence for a flip” means the pitch was outside the zone. For a catcher protesting a ball, it means the pitch was inside. Same picture, opposite complaint.

What Level 1 sees

Reduce the pitch to one geometric question.

Level 1 ignores names, teams, pitch type, and inning. It only asks how far the ball landed from the nearest strike-zone edge, then flips the sign based on the original call.

Called strike outside

More distance outside means more evidence the call will flip.

Called ball inside

More distance inside means more evidence the call will flip.

That one edge-distance measurement is why Level 1 makes the first big jump. It turns “close pitch” into something the model can compare.
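
The edge-distance idea can be sketched as a signed distance to a rectangle. This is an illustration under stated assumptions: coordinates are in feet with `px` measured from the center of the plate, the distance is taken from the center of the ball (the page does not say whether it uses the center or the ball's edge), and all function and parameter names are hypothetical.

```python
import math

PLATE_HALF_WIDTH_FT = 17 / 12 / 2  # home plate is 17 inches wide


def signed_edge_distance(px, pz, zone_bottom, zone_top):
    """Distance in feet from pitch center to the nearest zone edge.

    Positive outside the zone, negative inside.
    """
    dx = abs(px) - PLATE_HALF_WIDTH_FT           # > 0 when off the plate sideways
    dz = max(zone_bottom - pz, pz - zone_top)    # > 0 when above or below the zone
    if dx <= 0 and dz <= 0:
        return max(dx, dz)                       # inside: nearest edge, as a negative
    return math.hypot(max(dx, 0.0), max(dz, 0.0))  # outside: straight-line distance


def flip_evidence(px, pz, zone_bottom, zone_top, original_call):
    """Flip the sign by the original call, so bigger always means 'more likely to flip'."""
    d = signed_edge_distance(px, pz, zone_bottom, zone_top)
    return d if original_call == "strike" else -d
```

A called strike well off the plate and a called ball well inside the zone both come out strongly positive, which is the "same picture, opposite complaint" idea in one number.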

Level 2 · run value

How expensive is the missed call?

MLB's public ABS notes call some pitches more “reasonable” to challenge than others. A borderline miss in a huge count matters more than a borderline miss in a harmless count. The prediction now sees not just is the call close, but is it worth fixing.

What Level 2 adds

Same missed call. Different damage.

Level 1 only knows the pitch was close to the edge. Level 2 asks what the count turns into if the call is fixed. That is why the same miss can be a shrug in one count and a must-challenge in another.

Quiet count

0-0

Annoying, but the at-bat is still early.

bad call: 0-1

fix it: 1-0

Loud count

3-2

One review can flip the whole plate appearance.

bad call: strikeout

fix it: walk

The picture stays simple: close pitch first, game damage second.
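
The count arithmetic in the two cards can be sketched directly. This only maps what the count becomes if the call stands versus if it flips; turning those two states into a run value would need a run-expectancy table by count, which the page does not give, so none is invented here. Function names are hypothetical.

```python
def next_state(balls, strikes, call):
    """Count after one more called pitch, or the plate-appearance result."""
    if call == "strike":
        strikes += 1
    else:
        balls += 1
    if strikes == 3:
        return "strikeout"
    if balls == 4:
        return "walk"
    return f"{balls}-{strikes}"


def stakes(balls, strikes, original_call):
    """(state if the call stands, state if the challenge flips it)."""
    flipped = "ball" if original_call == "strike" else "strike"
    return (
        next_state(balls, strikes, original_call),
        next_state(balls, strikes, flipped),
    )
```

With a called strike, `stakes(0, 0, "strike")` gives the quiet-count pair and `stakes(3, 2, "strike")` gives the loud-count pair.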

Level 3 · the central idea

Some challengers are just better at this.

This is the most important concept once people enter the story. Different catchers, batters, and pitchers have meaningfully different win rates on their challenges. But raw rates are noisy: a catcher with 6 challenges and 5 wins doesn't reliably have an 83% talent rate.

So the system avoids overreacting. Each challenger's record gets pulled toward the normal rate for their job: batter, catcher, or pitcher. Established high-skill challengers still move up, but tiny samples do not trick us.

Why job type matters
Catchers see hundreds of pitches a game and challenge most often, so real skill can show through faster. Batters challenge much less often, so the system stays more cautious before calling one of them elite.

What Level 3 adds

Same win rate on paper. Different trust.

A catcher with 5 wins out of 6 calls is not the same story as one with 53 out of 70. The first might be lucky. The second has shown it for months. Level 3 pulls every challenger's record toward the normal rate for their job, harder when the sample is tiny.

The dashed line is what an average catcher does. The small gray dot is the raw rate. The bright dot is what the system actually believes.

Rookie catcher

5 of 6 calls won

Adjusted

73%

Tiny sample. The big claim gets pulled almost all the way back to the catcher norm.

Veteran catcher

53 of 70 calls won

Adjusted

75%

Lots of evidence at the same rate. The system barely pulls it. Real skill earns it.

Slumping batter

1 of 4 calls won

Adjusted

40%

Tiny sample, low rate. Pulled up toward the batter norm. I do not call them bad on four tries.
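
One common way to get this kind of pull is to add a block of imaginary challenges at the role's normal rate. The sketch below assumes that form; the page does not state its exact formula, and the `prior_strength` of 30 is a hypothetical value, so the outputs will not necessarily match the adjusted percentages shown in the cards.

```python
def shrunk_rate(wins, attempts, role_rate, prior_strength=30.0):
    """Pull a raw win rate toward the role's norm.

    prior_strength acts like that many imaginary challenges at the role norm.
    The default of 30 is a hypothetical value, not one taken from the page.
    """
    return (wins + prior_strength * role_rate) / (attempts + prior_strength)


catcher_norm = 0.71  # catcher win rate quoted earlier on the page

rookie = shrunk_rate(5, 6, catcher_norm)     # high raw rate, tiny sample
veteran = shrunk_rate(53, 70, catcher_norm)  # similar raw rate, large sample
```

The rookie's estimate moves most of the way back to the norm, while the veteran's barely moves, because 70 real challenges outweigh the imaginary ones.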

Level 4 · the learned zone

Umpires don't call the rulebook zone.

The rulebook zone is the plate plus the batter's personal top and bottom. But the called zone on the field does not always behave like a perfect rectangle. Some edges play bigger, others smaller, depending on the batter's stance, the pitch type, and the count.

I learn a smooth map of where umpires actually call strikes. When a real call disagrees with that learned zone, the chance of a successful challenge goes up.
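
A smooth called-strike map can be approximated in many ways. The sketch below uses a plain Gaussian kernel average over past called pitches by location only; it is a swapped-in illustration, not the page's model, which also conditions on stance, pitch type, and count. The `bandwidth` default and all names are hypothetical.

```python
import math


def called_strike_prob(px, pz, history, bandwidth=0.15):
    """Kernel-weighted share of nearby past pitches that were called strikes.

    history: iterable of (px, pz, was_called_strike) tuples.
    bandwidth is in the same units as px/pz and is a hypothetical default.
    """
    num = 0.0
    den = 0.0
    for hx, hz, strike in history:
        dist_sq = (px - hx) ** 2 + (pz - hz) ** 2
        w = math.exp(-dist_sq / (2 * bandwidth ** 2))
        num += w * strike
        den += w
    return num / den if den > 0 else 0.5  # no nearby evidence: stay neutral


def disagreement(px, pz, original_call, history):
    """How surprising the call is under the learned zone (0 expected, 1 very surprising)."""
    p_strike = called_strike_prob(px, pz, history)
    return 1 - p_strike if original_call == "strike" else p_strike
```

A high `disagreement` value is the feature of interest: the umpire made a call that umpires rarely make at that spot.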

Level 5 · pitch deception

Some pitches just look like strikes.

A 78-mph curveball that breaks late looks like a strike longer than a 96-mph fastball that nips the corner. Catchers know this. Umpires get fooled by it. The prediction should know it too.

I bring in pitch family, release speed, and how much the pitch moves. The lift is real but smaller than the geometry levels: the zone signal is enormous; the deception signal is genuinely incremental.

What Level 5 adds

Same final spot. Different illusion.

Two pitches can finish in the exact same low spot just below the zone. ABS only sees the coordinates, so it flips both calls the same way. Human eyes do not. A slow late-breaking curve looks like a strike longer than a 96-mph fastball that nips low.

Level 5 reads the shape of the pitch: family, speed, and how much it moves. That is how the model decides which challenges a catcher can actually trust, and which ones look right but feel wrong.

Curveball

78 mph, big late break

Umpire said

STRIKE

Challenge

harder read

Late break fools the eye. ABS still flips it on geometry alone, but the catcher has to trust the camera over what they just saw. Many do not.

Fastball

96 mph, straight

Umpire said

STRIKE

Challenge

obvious flip

Nothing to be fooled by. The catcher sees the miss the same way ABS does, signals the challenge, and the call gets overturned.

Level 6 · ABS geometry

The final layer finds the rule.

The ABS rule is simple. If any part of the ball touches the zone box, it is a strike. The last level is a cautious pattern-finder anchored to Level 5. Its job is not to prove I invented magic. It shows that once exact pitch location and the original call are available, ABS outcomes are close to a geometry problem.

That makes the tiny average miss real on these May test games, but not a fair “I beat Statcast” claim. The fair story is Level 0 to Level 5: public data, defensible ideas, and a clear climb from league rate to a useful before-the-final-rule version.

What Level 6 adds

With the inches, the call follows.

The ABS rule is simple. If any part of the ball touches the zone box, it is a strike. Once I know exactly where the ball was and what the umpire said, the question stops being a guess.

You can read the rule off the picture. If the ball even nicks the yellow box, ABS calls a strike. Only when the ball is fully off the box does ABS call it a ball.

Pitch A

1.1" off the zone

Outcome

FLIP

Umpire

STRIKE

ABS

BALL

Pitch B

grazes by 0.4"

Outcome

FLIP

Umpire

BALL

ABS

STRIKE

Pitch C

ball 1.2" inside

Outcome

STANDS

Umpire

STRIKE

ABS

STRIKE
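
The touch rule as described here reduces to a circle-against-rectangle check. A minimal sketch, assuming inches throughout, a ball radius of about 1.45 inches, and a flat zone box; the default zone bottom and top (18 and 42 inches) are hypothetical placeholders, since the real values depend on the batter.

```python
import math

BALL_RADIUS_IN = 1.45  # a baseball is roughly 2.9 inches across


def abs_call(x_in, z_in, half_width_in, bottom_in, top_in):
    """ABS verdict for a ball centered at (x_in, z_in), all in inches.

    Strike if any part of the ball touches the zone box.
    """
    dx = max(abs(x_in) - half_width_in, 0.0)
    dz = max(bottom_in - z_in, z_in - top_in, 0.0)
    gap = math.hypot(dx, dz)  # distance from ball center to the box
    return "strike" if gap <= BALL_RADIUS_IN else "ball"


def review(umpire_call, x_in, z_in, half_width_in=8.5, bottom_in=18.0, top_in=42.0):
    """Return 'FLIP' when ABS disagrees with the umpire, else 'STANDS'."""
    verdict = abs_call(x_in, z_in, half_width_in, bottom_in, top_in)
    return "FLIP" if verdict != umpire_call else "STANDS"
```

Reading the three cards as horizontal misses at mid-zone height, Pitch A has its center 8.5 + 1.45 + 1.1 inches from the middle of the plate, Pitch B 8.5 + 1.45 - 0.4, and Pitch C 8.5 - 1.2. That placement is an interpretation of the cards, not something the page specifies.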

The strategy layer

Knowing whether the call will flip is half the answer.
When should you spend the challenge?

Each team starts with 2 challenges. A successful challenge is retained. So the real resource is at most 2 failed challenges per game. The decision rule isn't "is this more likely than not?" It is a resource question: is challenging now worth more than saving the challenge for a better pitch later?

Tom Tango published the single-pitch version of this idea: compare the value of fixing the call against the cost of losing a challenge. I go one step further and ask what happens when there are many future pitches left. At each game state, I find the minimum confidence needed before spending a challenge makes sense.
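
One way to turn "many future pitches left" into minimum confidence lines is backward induction. The sketch below is an illustration under simplifying assumptions, not the page's actual policy code: every corrected call is worth the same `gain`, future close pitches are independent draws from a small pool of flip probabilities, and the pool values are toy numbers.

```python
def min_confidence(pool, max_pitches, max_challenges, gain=1.0):
    """Backward induction over (close pitches left, challenges left).

    pool: flip probabilities a future close pitch might have, each treated
          as equally likely and independent (an assumption).
    gain: value of one corrected call, held constant for simplicity.
    Returns thr[t][c], the minimum flip probability worth challenging when
    t close pitches remain (counting this one) and c challenges are left.
    """
    value = [[0.0] * (max_challenges + 1) for _ in range(max_pitches + 1)]
    thr = [[None] * (max_challenges + 1) for _ in range(max_pitches + 1)]
    for t in range(1, max_pitches + 1):
        for c in range(1, max_challenges + 1):
            keep = value[t - 1][c]             # future value if the challenge is saved
            cost = keep - value[t - 1][c - 1]  # future value lost by a failed challenge
            thr[t][c] = cost / (gain + cost)   # break-even: p * gain = (1 - p) * cost
            total = 0.0
            for p in pool:
                spend = p * (gain + keep) + (1 - p) * value[t - 1][c - 1]
                total += max(keep, spend)      # act optimally on each possible pitch
            value[t][c] = total / len(pool)
    return thr


# Toy pool for illustration only.
thr = min_confidence(pool=[0.2, 0.5, 0.8], max_pitches=10, max_challenges=2)
```

On the last close pitch of the game the threshold drops to zero, because an unused challenge is worth nothing. With more pitches still to come, the bar rises, and it is higher with one challenge left than with two.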

Pitch Lab

Drag a pitch. Watch the strategy think.

The headline interactive: drop a pitch anywhere on the strike zone and see the ABS-geometry flip chance, the recommended strategy line for the configured game state, and the verdict. Try the corners, try late innings with one challenge left.

Try it

Pick a challenge. Watch the levels disagree.

Search any 2026 challenge by player name. The bars show the predicted chance the call flips at each of the seven levels for that exact pitch. The verdict at the top is whether the umpire's call survived.

The new framing

Catchers aren't framing anymore.

ABS killed traditional pitch framing as a difference-maker; if the umpire gets it wrong, the catcher just challenges. But a new catcher skill emerged on the same day: knowing when to challenge. The best at it add real runs every game. The worst burn challenges in the second inning and watch their pitcher walk a guy in the eighth.

How the audit reads the game

The audit gets the game state right.

Before grading anyone, I make sure the audit is looking at the same situation the team was. Two refinements, each driven by real data from the season so far.

Team audit · counterfactual

How much are MLB teams leaving on the table?

In one breath

What I did, why it’s defensible

Data

Public MLB game logs for every Spring Training and regular-season 2026 game. Each challenge tells us who asked for it, which team asked, and whether the umpire's call survived. The challenged pitch is the most recent called ball or strike in that plate appearance.

Model

Seven layered levels. The biggest single-step lift comes from Level 1, because distance from the strike-zone edge is an enormous signal. Level 3 adds challenger judgment. Level 4 adds the zone umpires actually call. Level 6 adds a cautious tree pattern-finder for the edge cases.

Policy

The decision layer treats challenges like a scarce resource. With one challenge left in the second inning, you should be pickier than with one challenge left in the ninth. The system learns those minimum confidence lines from real called pitches.

Validation

I train on Spring Training plus April, then test on May games the system has not seen. The headline chart measures average prediction miss and sorting quality. Public MLB leaderboards are aggregate, not the same pitch-by-pitch benchmark, so I do not present the tiny Level 6 score as “beating Statcast.”
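
The page does not define "sorting quality." One common reading is ROC AUC: the chance that a randomly chosen flipped call was given a higher predicted probability than a randomly chosen call that stood. A minimal sketch under that assumption:

```python
def auc(predictions, outcomes):
    """Chance that a random flipped call was scored above a random stood call.

    Ties count as half. 0.5 means no sorting ability, 1.0 means perfect sorting.
    """
    flipped = [p for p, y in zip(predictions, outcomes) if y == 1]
    stood = [p for p, y in zip(predictions, outcomes) if y == 0]
    if not flipped or not stood:
        raise ValueError("need at least one flipped and one stood call")
    score = 0.0
    for pf in flipped:
        for ps in stood:
            if pf > ps:
                score += 1.0
            elif pf == ps:
                score += 0.5
    return score / (len(flipped) * len(stood))
```

A one-number forecast like Level 0 scores exactly 0.5 here, since it cannot rank any pitch above another.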

Limitations

Catcher attribution is by elimination. ABS only allows batter, catcher, or pitcher to challenge, so any challenge that is not the batter or pitcher I attribute to the catcher. Designated-hitter lineups occasionally produce edge cases.
The policy treats the future close-pitch pool as independent draws from the empirical per-inning distribution. Within-game serial correlation (consecutive close pitches are not strictly independent) is a v2 refinement.
Triple-A 2023-25 has roughly fifty thousand more ABS challenges in a different ump pool and pitch-quality distribution. Concatenating that data naively would inject bias, so it is deferred to v2 with a proper domain-adaptation step.
The model is fit to MLB's 2026 challenge rule. If the rule changes (allotted challenges per game, eligible challengers, touch-zone definition) the policy needs to be refit before trusting any number on this page.

Sources

Savant ABS metrics documentation
Savant ABS challenge leaderboard
Tom Tango, “The math behind a challenge”
Public MLB game logs, with challenge results at the play level

Full reproducible pipeline: data pull to feature building to modeling to strategy simulation to web export. The website is static, so it loads fast and needs no live server.
