## May the best team win ... at least some of the time

By Tom Tippett
October 1, 2002

Baseball cliché: "Anything can happen in a short series."

Observation #1: Since 1990, the team with the best overall regular-season record has won one World Series.

Observation #2: Since the extra round of playoffs was added in 1995, the team with the better regular season record has won 13 division series and lost 13 of them, with two contested by teams with identical records.

Observation #3: In league championship series play since 1990, the team with the better regular season record (among the two teams in the LCS) has won 12 times and lost 9 times, with one LCS involving teams with the same record.

In other words, it's not easy to go all the way. And this isn't a recent phenomenon. Since division play began in 1969, the team with the best regular-season record in baseball has won the World Series only 8 times in 32 tries. (Can you name these eight teams? Answer below.)

To put it another way, it's a fool's game to predict the winner of a postseason tournament, even when one team has dominated the regular season. But baseball is supposed to be fun, so we're going to have a little fun with the numbers to see what we can learn about the chances of each of this year's contenders.

To do that, we'll start by assessing each team's chances to win one game against a given opponent. We'll use that information to estimate each team's chances to win a series against that opponent. And we'll put those figures together to estimate each team's chances of winning three consecutive series.

### Estimating one-game winning percentages

In the 1981 Bill James Baseball Abstract, Bill introduced the log5 method to answer the question, "how often should team A be expected to beat team B?" It took him several pages to describe and justify the method, so we won't take the space to do all of that again here. Instead, we'll just give the formula:

```
           A - A * B
WPct = -----------------
       A + B - 2 * A * B
```

where A is team A's winning percentage and B is team B's winning percentage.

In other words, if you have a .600 team playing a .400 team, this method shows that the better team can be expected to win 69.2% of the games between these two teams:

```
             .600 - .600 * .400           .360
WPct = ----------------------------- = ------ = .692
       .600 + .400 - 2 * .600 * .400    .520
```
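The formula translates directly into a few lines of code. The sketch below is only an illustration; the function name `log5` and its argument names are chosen here, not taken from the original Abstract.

```python
def log5(a, b):
    """Expected winning percentage of team A against team B on a neutral field.

    a and b are the two teams' overall winning percentages, as fractions
    strictly between 0 and 1.
    """
    return (a - a * b) / (a + b - 2 * a * b)


# The worked example from the text: a .600 team against a .400 team.
print(round(log5(0.600, 0.400), 3))
```

Two evenly matched teams come out at .500, as they should, and swapping the arguments gives the underdog's chances, which are one minus the favorite's.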

If you were to take A's winning percentage as a given (say .600) and solve this equation for all possible values of B, you could determine A's chances in games against any conceivable opponent. And if you graphed those values, you'd see a curve, not a straight line.

But it's a gentle curve and the middle portion of that curve is very close to a straight line. That makes it possible to substitute a simpler straight-line formula that gives very similar results in the range of .400 to .600:

`    WPct = .500 + A - B`

For example, if A is .550, the log5 and straight-line methods produce values that differ by no more than .001 whenever B is in the range of .400 to .600. The further A gets away from .500, the bigger the differences, but they are still manageable. If A is .600, the difference is as much as .005 when B is close to .400 but is still within .002 for all values of B from .440 to .630.

In other words, because almost all baseball teams fall into this range of .400 and .600, and because the differences are smallest when A and B are close to each other, the straight-line formula is a handy alternative that works for the vast majority of matchups.

A few years ago, Tom Ruane wrote a program that looked at the result of every AL and NL game from 1901 to 1997. The program placed each team into one of twenty groups based upon its winning percentage for that season. All teams with winning percentages less than .330 went into group A; those with winning percentages between .330 and .350 went into group B, and so on up to the top group, which had all teams with winning percentages greater than .690. For each game, the program figured out what type of matchup it was (e.g. group C vs group F) and then added the game result to the totals for that matchup.
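The grouping step can be sketched roughly as follows. This is not Tom Ruane's actual program; the names `group_for` and `tally_game` are hypothetical, and putting a team that sits exactly on a boundary (say, .330) into the higher group is an assumption, since the description above doesn't say which way those cases went.

```python
from bisect import bisect_right
from collections import defaultdict

# Group boundaries in thousandths: .330, .350, ..., .690 (19 edges, 20 groups).
EDGES = list(range(330, 700, 20))


def group_for(win_pct):
    """Return a group letter from 'A' (below .330) to 'T' (top group)."""
    # Assumption: a team exactly on a boundary goes into the higher group.
    index = bisect_right(EDGES, round(win_pct * 1000))
    return chr(ord("A") + index)


def tally_game(totals, winner_pct, loser_pct):
    """Add one game result to the running totals for its matchup type."""
    winner_group = group_for(winner_pct)
    loser_group = group_for(loser_pct)
    totals[(winner_group, loser_group)]["wins"] += 1
    totals[(loser_group, winner_group)]["losses"] += 1


totals = defaultdict(lambda: {"wins": 0, "losses": 0})
tally_game(totals, 0.600, 0.400)  # a .600 team beats a .400 team
```

Dividing wins by wins plus losses for each pair of groups gives the actual winning percentage for that type of matchup, which can then be compared with what the formulas predict.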

That study showed that these formulas are very accurate predictors of the actual winning percentages in matchups involving these different groups. If you read that article, you'll see that we focused on the straight-line method, but it's not hard to see that the log5 method would have provided an even better fit for the 1901-1997 results that we compiled. We'll use the log5 method for the remainder of this article.

Historically, home teams have consistently won at a .540 pace (see The Hidden Game of Baseball by John Thorn and Pete Palmer), so we need to know who's at home if we're going to get the best read on the expected winning percentage for a single game. This year, the winning percentage for home teams was .542, so we'll give the home team a .042 boost in its projected winning percentage when we extend the log5 estimate to assess each team's chances in a playoff series.

### Estimating winning percentages for a 5-game or 7-game series

We can use these one-game winning percentages to assess the chances to win a series by (a) adding in the home-field adjustment for each game, (b) multiplying the single-game probabilities to get a probability that a team will complete the series with a certain pattern of wins and losses, (c) repeating this step for all possible patterns of wins and losses, and (d) adding up the probabilities for all patterns that produce a series win for the team.

For example, the probability of winning a five-game series is the sum of the chances of sweeping, winning 3-1, or winning 3-2. There are ten ways a team can be first to win three games:

```
Result  Patterns
3-0     WWW
3-1     LWWW, WLWW, WWLW
3-2     LLWWW, LWLWW, LWWLW, WLLWW, WLWLW, WWLLW
```

If a .600 team is playing a .400 team, we've already established that it has a .692 chance to win each game on a neutral field. If games one and two are at home and game three is on the road, its chances to sweep the series in three games are:

`  (.692 + .042) * (.692 + .042) * (.692 - .042) = .350`

or 35%. We can use similar logic to compute the probabilities for the patterns that produce a 3-1 or 3-2 win, add them up, and presto, we have the probability that the favored team will win the series one way or another.
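Here is a minimal sketch of that bookkeeping. The function names are made up for this illustration, and the home schedule for games three through five (two on the road, then a deciding game at home) is an assumption; the text above only specifies that games one and two are at home.

```python
HOME_EDGE = 0.042  # home-field adjustment described earlier in the article


def game_probabilities(neutral_p, home_schedule, home_edge=HOME_EDGE):
    """Per-game win chances: add the edge at home, subtract it on the road."""
    return [neutral_p + home_edge if at_home else neutral_p - home_edge
            for at_home in home_schedule]


def pattern_probability(pattern, game_probs):
    """Chance of one specific pattern of wins and losses, such as 'WWW'."""
    prob = 1.0
    for result, p in zip(pattern, game_probs):
        prob *= p if result == "W" else 1.0 - p
    return prob


def series_win_probability(game_probs, wins_needed):
    """Sum the chances of every pattern in which the team gets there first."""
    def walk(game, wins, losses):
        if wins == wins_needed:
            return 1.0
        if losses == wins_needed:
            return 0.0
        p = game_probs[game]
        return (p * walk(game + 1, wins + 1, losses)
                + (1.0 - p) * walk(game + 1, wins, losses + 1))
    return walk(0, 0, 0)


# Assumed schedule for the favorite: home, home, away, away, home.
probs = game_probabilities(0.692, [True, True, False, False, True])
sweep = pattern_probability("WWW", probs)
series = series_win_probability(probs, wins_needed=3)
print(round(sweep, 3), round(series, 3))
```

The recursive walk visits the same ten winning patterns listed in the table above, so it is just a compact way of doing steps (b) through (d). Passing seven per-game probabilities and `wins_needed=4` covers a seven-game series.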

Using this method, here are the results for this year's division series matchups:

```
Matchup     Favorite
---------   ---------

ANA @ NY    NY   .574
MIN @ OAK   OAK  .617

SF  @ ATL   ATL  .596
SL  @ ARI   ARI  .528
```

In other words, this model says that the Yankees have a 57.4% chance of beating the Angels in a five-game series when New York has the home-field advantage, with Oakland having a bigger edge over the Twins.

We can move on to project the league championship series results. Of course, we don't know yet who will win each of the first-round matchups, so we'll need to do this for all possible outcomes of the first round:

```
Matchup     Favorite
---------   ---------

ANA @ OAK   OAK  .571
ANA @ MIN   ANA  .548
MIN @ NY    NY   .640
OAK @ NY    NY   .523

SF  @ ARI   ARI  .547
SF  @ SL    SL   .534
SL  @ ATL   ATL  .587
ARI @ ATL   ATL  .573
```

Finally, there are sixteen possible matchups for the World Series, with the AL champion having the home-field advantage no matter who makes it that far:

```
        @ NY        @ OAK       @ MIN       @ ANA
        ---------   ---------   ---------   ---------

ATL     NY   .534   OAK  .525   ATL  .594   ATL  .537
ARI     NY   .594   OAK  .585   ARI  .535   ANA  .527
SL      NY   .607   OAK  .598   SL   .521   ANA  .541
SF      NY   .627   OAK  .618   tie  .500   ANA  .561
```

### Going all the way

We now have the series-winning probabilities for every possible matchup, so we can put it all together and project the chances that each team will go all the way, given who it might have to face at each step.

The Yankees, for example, have a probability of .574 to beat Anaheim and advance to the ALCS. There's a .617 chance they'll face Oakland and a .523 chance they would beat the A's in that series, so their chances to go to the World Series through Oakland are .574 * .617 * .523 = .185.

But there's a .383 chance they'll face Minnesota and a .640 chance they'd beat the Twins, so their chances to go to the World Series through Minnesota are .574 * .383 * .640 = .141.

Add these two possibilities together and you get a probability of .326, or about one chance in three, that Yankee Stadium will host game one of the World Series.
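The Yankees arithmetic above can be written as a small helper. This is only a sketch; `advance_probability` is a name invented for this example, and the inputs are the series figures from the tables above.

```python
def advance_probability(p_win_previous_round, possible_opponents):
    """Chance of surviving the previous round and then winning the next one.

    possible_opponents is a list of (chance of facing them, chance of
    beating them) pairs, one pair per team that could come out of the
    other series.
    """
    return p_win_previous_round * sum(
        p_face * p_beat for p_face, p_beat in possible_opponents
    )


ny_pennant = advance_probability(
    0.574,                     # NY beats Anaheim in the division series
    [
        (0.617, 0.523),        # Oakland advances, NY beats Oakland
        (1 - 0.617, 0.640),    # Minnesota advances, NY beats Minnesota
    ],
)
print(round(ny_pennant, 3))
```

Extending this to the World Series works the same way, except that each possible NL champion's chance of getting there has to be worked out first and used as the chance of facing that team.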

We can repeat this process for the other seven teams and then extend it to include the probability of winning the World Series. And when we do that, we come up with the following (drumroll, please):

```
NY    19.0% chance to win World Series
OAK   18.3%
ATL   17.4%
ARI   11.1%
ANA   10.4%
SL     9.3%
SF     7.6%
MIN    6.9%
```

### Aren't we missing something?

Actually, we're missing a lot of things.

This approach doesn't take into account the starting pitchers in each game. If Randy Johnson and Curt Schilling can replicate what they did last year, Arizona's chances increase. Schilling hasn't pitched well lately, but he might be able to turn it on again when things really matter.

We're assuming the home-field advantage is the same for everyone, and Minnesota fans can point to 1987 and 1991 as proof that their home field edge is bigger than most. Then again, all of this year's playoff teams won between 50 and 55 games at home during the regular season, so nobody stands out in this regard.

This method uses regular-season winning percentages as the basis for all matchups. You could argue that other indicators, such as runs scored minus runs allowed, might be a better gauge of team talent. Using run differentials, the chances for Anaheim and San Francisco increase, mostly at the expense of Minnesota. (Of course, if run differentials were paramount, the Red Sox would still be playing, the A's would be booking tee times, and the White Sox and Twins would be in a one-game playoff for the AL Central title.)

The use of regular-season winning percentages also assumes that what happened over a six-month period is indicative of how the teams stack up right now. The unbalanced schedule skews things, with the teams in the two West divisions having battled much harder to achieve their records. Nobody would argue that Arizona is at full strength with Luis Gonzalez on the sidelines for the duration. And anyone who has watched the Yankees dial it up about three notches in almost every October since 1996 has to consider the possibility that they could do that again.

One other thing. This method assumes that the probability of winning one game in a series is independent of anything that has already happened in previous series games. Any baseball fan knows this isn't true. One team may wear out its bullpen more than the other. "Destiny" or "momentum" may somehow favor one side or the other. Underdogs who lose a couple of close games may subconsciously realize they're not going to come back to win the series. Since 1989, there have been quite a few more series sweeps than this model would predict, suggesting that there are real effects that carry over from game to game.

### Let's cut these guys a little slack

Much has been made of the fact that the Atlanta Braves have only one World Series victory despite winning their division in every full season since 1991. They have indeed come up a little short, but their post-season record isn't as bad as you might think.

In seven cracks at the division series, Atlanta has won six times. That includes a 5-for-5 showing when they entered that series with a better record than their opponent, 1-for-1 as the underdog, and one loss (to St. Louis in 2000) when the records were the same.

Atlanta has been in nine of the last ten National League Championship Series. As the favorites, they won four times in seven tries, which is about par for the course. As underdogs, they have one win in two tries. That's not bad.

It's only in the World Series that the Braves have failed to achieve their full potential, going 1-for-5 since 1991. As underdogs, they've won one (over Cleveland in 1995) and lost one (to Minnesota in 1991). As favorites, they were upset by the Blue Jays in 1992, the Yankees in 1996, and the Yankees again in 1999.

Overall, in 21 postseason series against very good teams, the Braves have 12 series wins and 9 losses to their credit. As favorites, they are 9-6. As underdogs, they've gone 3-2. The remaining loss came when the records were the same. That's not bad, not bad at all.

Of course, what stands out are those three World Series losses when they were favored. But all it would take is one more run to the title to erase a lot of those bad memories.

Consider this. If Atlanta does go all the way this year, that would give them a 15-9 record in postseason series since 1991 and two World Series wins in 11 trips to the postseason, with both wins coming since the third round of postseason play made this journey so much more difficult. If that happens, I hope they get the monkeys off their backs once and for all.

But that's a very big IF.

### The bottom line

Every one of the eight qualifiers for the 2002 postseason comes in with some question marks. Oakland and Minnesota rank only 8th and 9th in the AL in scoring, while Anaheim relied on timely hitting all year. All three teams may struggle to score enough runs against good pitching. The Yankees have the best offense by far, but their pitching hasn't been as good or as healthy as they would like.

Atlanta's no-name bullpen must keep doing what it's been doing, and their 10th-ranked offense mustn't break down. Arizona has to make up for the loss of Gonzalez and hope that Schilling gets going again. The Cardinals have had a remarkable season given everything they've had to deal with, and may be this year's team of destiny, but their starting rotation is a very big question mark right now. For San Francisco, Barry Bonds must come up big and he must get some help, while the Giants' pitching staff needs to show that its #2 ranking in the NL isn't just an illusion created by its home park.

The bottom line is that anything can happen this year, especially on the NL side. That's not news, of course. The record over the past 32 years is proof enough of that.

So even though the model described in this article leaves some things out, it's still worth noting that it's much more likely that the top-seeded Yankees won't win the whole thing than that they will. George Steinbrenner may be able to buy enough talent to win the AL East title every year, but it's not nearly as easy to buy three straight series wins against good teams.

Trivia answer: Since divisional play began in 1969, the eight teams that have won the World Series after posting the best regular-season record are the 1970 Orioles, 1975-76 Reds, 1978 Yankees, 1984 Tigers, 1986 Mets, 1989 Athletics, and 1998 Yankees.