
End of Half Clock Management

I'm watching a game right now where there's over a minute left in the 2nd quarter. The ball is at midfield and it's 4th and long. Both teams have all three timeouts, but neither has used one. The punting team is standing around letting the seconds tick away, while the receiving team is patiently waiting for the snap.

I think this is irrational. Football is a zero-sum game. Whatever is good for me is equally bad for you, and vice versa. So if stopping the clock right now is not what you want, then it must be what I want. It can't be possible for both teams to benefit from allowing the clock to run down. One or the other team derives an advantage, however small, from stopping the clock.

The only plausible exception I can think of is when the possibility of either team scoring is so remote that the cost of potential for injury on the remaining plays exceeds the value of whatever advantage could be squeezed from trying to pursue a score. In this sense, the game becomes non-zero-sum.

But I think it's more likely that one or both of the teams are excessively pessimistic. The punting team is worried that the receiving team might have enough time to put together a scoring drive, and the receiving team is worried it might turn the ball over or be forced to punt from deep in its own territory.

Should You Bench Your Fumbling Running Back?

Sam Waters is the Managing Editor of the Harvard Sports Analysis Collective. He is a senior economics major with a minor in psychology. Sam has spent the past eight months as an analytics intern for an NFL team. When he is not busy sounding cryptic, he is daydreaming about how awesome geospatial NFL data would be. He used to be a Jets fan, but everyone has their limits.

When the Pittsburgh Steelers traveled to Cleveland in week 12 of last season, Rashard Mendenhall was the Steelers’ starting running back. Well, he was at first. Mendenhall fumbled on his second carry of the game, and Head Coach Mike Tomlin benched him immediately. On came backups Isaac Redman, Jonathan Dwyer, and Chris Rainey, who all fumbled and joined Mendenhall on the sidelines in quick succession. Out of untainted running backs to sub in, Tomlin looped back around to Mendenhall, who put the ball on the ground again. Mendenhall, of course, went right back to the bench, ceding his snaps to Dwyer and Rainey for the rest of the game. This was one of the more prolific fumble-benching sprees in NFL history, but we see tamer versions of this scenario all the time. Just look back to last season. David Wilson fumbled and Tom Coughlin actually made him cry. Ryan Mathews fumbled away his job to Jackie Battle. Tears and Jackie Battle: does any mistake deserve these consequences?

Unicorns, The Tooth Fairy, the Cowboys, and Field Goal Range

If there is one example that demonstrates the fallacy of field goal range, Sunday's Cowboys-Cardinals game has to be it. With the score tied at 13 and under a minute to play, DAL converted a 3rd and 11 to move the ball to the ARI 31.

There it is. Game over. Field goal range, right? Who needs more time when you're inside the 35?

At the end of the play there were 26 seconds on the game clock, and DAL had 2 timeouts. DAL milked 18 seconds off the clock, then spiked the ball to stop the clock with 8 seconds to go. Then, head coach Jason Garrett called a timeout as if he was icing his own kicker. (The icing effect is greatly overstated, and in many cases, it simply gives the kicker a practice kick.) The kick was missed, and the game entered the dice roll of sudden death overtime.

With 26 seconds left, DAL could have used one of their 2 timeouts immediately. They could have run two plays, even including one run, and still saved a timeout for a final field goal. With the ball at the 31, a field goal attempt is a 49-yarder, which, on average, is a 65% proposition. Moving the ball just 6 yards increases the chances to 75%.
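For the curious, here's a minimal back-of-the-envelope sketch of that tradeoff. The 65% and 75% make rates come from the numbers above; the 5% chance of a sack, fumble, or penalty wiping out the attempt is purely my own placeholder assumption.

    # Kick-now vs. run-another-play, using the make rates cited above.
    # The 5% "disaster" rate is a placeholder assumption, not a measured number.
    P_MAKE_NOW = 0.65      # 49-yard attempt from the ARI 31
    P_MAKE_CLOSER = 0.75   # after gaining ~6 yards
    P_DISASTER = 0.05      # assumed chance a play loses the attempt entirely

    kick_now = P_MAKE_NOW
    run_first = (1 - P_DISASTER) * P_MAKE_CLOSER

    print(f"Kick immediately:      {kick_now:.2f}")   # 0.65
    print(f"Run a play, then kick: {run_first:.2f}")  # 0.71

Even with a generous disaster rate baked in, running a play first comes out ahead under these assumptions.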

Bill Walsh on Randomizing

In his early Stanford days, Bill Walsh had already cracked the code on how un-random football coaches (and almost all people) are. From "Controlling the Ball with the Passing Game":

"We know that if they don't blitz one down, they're going to blitz the next down. Automatically. When you get down in there, every other play. They'll seldom blitz twice in a row, but they'll blitz every other down. If we go a series where there haven't been blitzes on the first two downs, here comes the safety blitz on third down."

Most NFL offenses tend to alternate rather than randomize. Walsh knew defenses were just as predictable decades ago.

Why Free Agent Signings Turn Out So Disappointing

Adam Archuleta became one of the most sought-after NFL free agents in 2006. Several teams were interested in the playmaking strong safety, but the Redskins won the bidding, making him the highest paid safety in history at the time. Owner Dan Snyder signed off on giving Archuleta a 6-year, $30 million contract, with $10 million guaranteed.

To call the Archuleta signing a bust would be an understatement. He started only 7 games the next season and was traded to Chicago for a 6th-round draft pick the following year. Archuleta never returned to his early-career form, and washed out of the league after the 2007 season.

Although Snyder has a well-known, and well-deserved, reputation for overpaying for disappointing free agents, he's not alone. There's a phenomenon of auctions, known as the winner's curse, that makes overpaying for top free agent players all too common.
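The phenomenon is easy to demonstrate with a quick simulation. Here's a sketch, with all the numbers (true value, number of bidders, valuation noise) invented for illustration: even when every team's estimate is unbiased, the winning bid systematically overshoots.

    import random

    # Each team estimates the player's true value with unbiased noise;
    # the most optimistic estimate wins the auction.
    random.seed(1)
    TRUE_VALUE = 5.0   # player's actual worth, $M/yr (assumed)
    NOISE_SD = 1.5     # spread of each team's valuation error (assumed)
    N_TEAMS = 8
    TRIALS = 10_000

    total_overpay = 0.0
    for _ in range(TRIALS):
        bids = [random.gauss(TRUE_VALUE, NOISE_SD) for _ in range(N_TEAMS)]
        total_overpay += max(bids) - TRUE_VALUE

    print(f"Average overpayment by the winner: ${total_overpay / TRIALS:.2f}M")
    # ~$2M: the auction doesn't select the best estimate, it selects
    # the largest error. That's the winner's curse.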

Football Island

Back in 1997 I spent Easter Sunday with a good friend. His brother, who lived in the La Jolla area of San Diego, hosted us for dinner. On the drive up the La Jolla ridge overlooking the Pacific, my friend pointed over to the right side of the road and said, "That's Junior Seau's house." I caught a glimpse of number 55, along with what looked like a dozen family members filing out of a couple vans in front of the house. They were the largest human beings I had ever seen. The women were large, the men were unimaginably large, and even the children seemed enormous. Junior somehow appeared to be the runt of the family. I got the impression Samoans were all giants.

Last season there were 30 NFL players of Samoan descent, and 200 more playing Division I-A college football. (Here's a list from 2008.) That's a lot of players from a group of people whose entire population could easily fit into an NFL stadium. Earlier this year 60 Minutes aired a profile on Samoan football, and if you missed it, it's a great story. (I've embedded the clip at the end of this article. Edit--CBS has killed the link.)

Undoubtedly, the culture and character of the Samoan people are factors in their disproportionate level of success in football. But, as my drive through La Jolla suggested, hereditary traits may also play a role. Still, how can a single small island produce so many top players?

JaMarcus Russell: Concorde of the NFL

With JaMarcus Russell’s recent benching, there’s been a lot of talk about when it’s time for a team to cut its losses on a failed quarterback. I don’t have hard numbers at my fingertips, but I’d be fairly certain that if a QB isn’t playing above-average football, or at least showing steady improvement, by the end of his second year, it’s time to move on. [Edit: Here's a good look at that very question at PFR.] There’s no question teams tend to stick with struggling QBs well beyond their expiration date, even when better alternatives exist. The real question is, why?

Let’s say you’re an out-of-town Bills fan, and before the season began you were understandably optimistic about the team’s prospects. You bought prime tickets to the January 3rd game hosting the Colts, including parking and a hotel room. Altogether the bill comes to $400. In August, this feels like a great deal.

As the season wears on, it becomes clear the Bills aren’t contenders. The coach is fired, and the upcoming Colts game is not looking promising, as the Colts appear likely to be playing for home field advantage in the playoffs. Everything points toward a humiliating blowout. What’s worse, as the game approaches the weather isn’t looking good. Bills fans are always the hardy type, but the forecast is beyond bad—snow, wind, freezing rain, and bitter cold. You’re not exactly excited about the prospect of going to the game.

Irrational Play Calling

As if you need any more evidence of how irrational many coaches can be when facing a 4th down, here’s some more.

In ‘no man’s land,’ the region of the field from the opponent’s 30-35 yd line, punts don’t buy you much and field goals are just above 50/50 propositions. Going for the conversion occurs fairly frequently, particularly on 4th and short. This is the confluence of 4th down decision making where all 3 options are reasonable choices.

But once a coach has decided that going for it is not worth the risk, he can then choose between attempting a FG and punting. Neither of these options is affected in any way by distance to go. Only field position matters. A field goal is just as rewarding and just as risky on 4th and 1 from the 30 as it would be on 4th and 15 from the 30. Same goes for punts. Distance to go should only affect conversion attempts. It would be irrational to base a decision between a FG and a punt on something that only matters when attempting a conversion. But that doesn’t stop NFL coaches from doing exactly that, a fact first noticed by commenter 'Jim A' last season.

Jim Zorn on 4th Down

Zorn is my hero today. On the Redskins’ final drive of their game against the Rams yesterday, head coach Jim Zorn went for it on 4th down not once, but twice. The network commentators were shocked, and the local media coverage has been decidedly critical. Were they good decisions?

The first decision “felt” right to me. Up by 2 points with 3:47 left in the 4th quarter, Washington faced a 4th and 1 at St. Louis’ 20 yd line. A FG attempt from there is an 84% proposition. A kickoff with a 5-point lead gives the Redskins a 0.76 Win Probability (WP). A missed FG attempt gives the ball to the Rams at the 27 and leaves the Redskins with a 0.56 WP. The net WP for the FG attempt is: (0.84)(0.76) + (0.16)(0.56) = 0.73.

Entropy, Let's Make a Deal, and the Verducci Effect

Let’s Make a Deal was a 1970s game show, entropy is the quantity at the heart of the second law of thermodynamics, and the Verducci Effect is an injury phenomenon named for a Sports Illustrated reporter. What do they have in common?

In Let's Make a Deal, host Monty Hall would walk through the costumed audience, picking contestants on the spot to play various challenges for prizes. The central challenge was a simple game where the contestant had to choose one of three doors. Behind one of the doors was a big prize, such as a brand new Plymouth sedan. But behind the two other doors were gag prizes, such as a donkey.

Sounds simple, right? The contestant starts with 1 in 3 chance of picking the correct door. But then Monty would open one of the doors (but never the one with the real prize) and with two closed doors remaining, ask the contestant if she wanted to switch her choice. She would waffle as the audience screamed “switch!...stay!...switch.”

The answer is intuitively obvious. It doesn’t matter. She has a 1 in 3 chance when she first picked the door, and we already know one of the other two doors doesn’t have the real prize. So whether she switches or not is irrelevant. It’s still 1 in 3.

...And that would be completely wrong.

The real answer is she should always switch. If she stays, she has a 1 in 3 chance of winning, but if she switches she has a 2 in 3 chance of winning. I know, I know. This doesn’t make any sense.

Don’t fight it. It’s true. If the contestant originally picks a gag door, which will happen 2 out of 3 times, Monty has to open the only remaining gag door. In this case, switching always wins. And because this is the case 2/3 of the time, always switching wins 2/3 of the time.

(If you don’t believe me, visit this site featuring a simulation of the game. It will tally how many times you win by switching and staying. It’s the only thing that ultimately convinced me. But don’t forget to come back and find out what this has to do with the Verducci Effect.)
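If you'd rather not trust a website, here's a minimal simulation sketch of the game you can run yourself:

    import random

    random.seed(0)
    TRIALS = 100_000
    stay_wins = switch_wins = 0

    for _ in range(TRIALS):
        prize = random.randrange(3)   # door hiding the Plymouth
        pick = random.randrange(3)    # contestant's first choice
        # Monty opens a door that is neither the pick nor the prize
        opened = next(d for d in range(3) if d != pick and d != prize)
        switched = next(d for d in range(3) if d != pick and d != opened)
        stay_wins += (pick == prize)
        switch_wins += (switched == prize)

    print(f"Win rate staying:   {stay_wins / TRIALS:.3f}")   # ~0.333
    print(f"Win rate switching: {switch_wins / TRIALS:.3f}") # ~0.667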

Baseball Prospectus defines the Verducci Effect as the phenomenon where young pitchers who have a large increase in workload compared to a previous year tend to get injured or have a decline in subsequent year performance. The concept was first noted by reporter Tom Verducci and further developed by injury guru Will Carroll.

But I'm not sure there really is an effect. First, consider why a young pitcher would have a large increase in workload. He’s probably pitching very well, and by definition he’s certainly healthy all year. Bad or injured pitchers don’t often pitch large numbers of innings.

Now, consider a 3-year span of any pitcher’s career. He’s going to have an up year, a down year, and a year in between. Pitchers also get injured fairly often. There’s a good chance he’ll suffer an injury at some point in that span.

Injuries in sports are like entropy, the inevitable reality that all matter and energy in the Universe are trending toward deterioration. Players always start out healthy and then are progressively more likely to get injured. Pitchers don’t enter the Major Leagues hurt and gradually get healthier throughout their careers. It just doesn’t work that way. Injuries always tend to be more probable in a subsequent year than any prior year. The second year in a 3-year span will have a greater chance of injury than the first, and the third would have a greater chance than the second.

Back to Let’s Make a Deal. Think of that three year span as the three doors. Without a Verducci Effect, the years would each have an equal chance at being an injury year. For the sake of analogy, say it’s a 1 in 3 chance. Now Monty opens one of the doors and shows you a non-injury year. The remaining doors have a significantly increased chance of being identified as an injury year. In this case, it’s a 1 in 2 chance.

I think that’s essentially what Verducci and Carroll did in their analysis. We already know a high workload season can’t be an injury season, therefore subsequent years will retrospectively appear to have higher injury rates. We would normally expect to see injuries in 1 out of 3 years, but we would actually see them 1 out of 2. It’s an illusion.
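Here's a toy simulation of that illusion, using the simplification above that exactly one year of every 3-year span is an injury year, and assuming no workload effect whatsoever:

    import random

    # One injury year per 3-year span, placed at random; no workload
    # effect exists. Conditioning on a healthy year 0 (the high-workload
    # season) inflates the apparent injury rate in year 1.
    random.seed(0)
    TRIALS = 100_000
    qualifying = next_year_injured = 0

    for _ in range(TRIALS):
        injury_year = random.randrange(3)   # year 0, 1, or 2
        if injury_year != 0:                # year 0 must be healthy
            qualifying += 1
            next_year_injured += (injury_year == 1)

    print(f"Baseline injury rate per year:         {1/3:.3f}")
    print(f"Rate in the year after a healthy year: "
          f"{next_year_injured / qualifying:.3f}")   # ~0.5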

The analogy isn't perfect. Door one is always the open door without the prize, and there's no switching. Also, unlike a single prize behind one door, injuries can be somewhat independent (or more properly described in probability theory as "with replacement"). That is, a pitcher could be injured in more than just one year. But the Verducci Effect only considers two-year spans, and since one year is always a non-injury year, the analogy holds in this respect.

Ultimately, just like in Monty Hall’s game, the underlying probabilities don’t change at all. Only the chance of finding what we're seeking changes. There was always a 1 in 3 chance that one particular door would contain the prize. That never changes throughout the course of the game. But after identifying a non-prize door, we’ve increased our chances of finding the injury…err…I mean Plymouth.

I hereby name this phenomenon the Monty Hall Effect.

(PS Quite frankly, I’m not entirely confident in this. It’s hard to wrap my head around, and I keep second-guessing my logic. If someone out there, like a quantum physicist maybe, understands this stuff well, please add your two cents.)

Edit: See my comment for an alternate explanation of how the Verducci Effect may be an illusion.

Predictability on 2nd and 10

For any down and distance situation, a defensive coordinator wants to know how likely a run or pass play will be. He needs to select the right personnel and the right defensive scheme for the situation. Take the 2nd and 10 situation, the second most common down and distance combo in the NFL (1 in 5 of all 1st downs results in a 2nd and 10). In general, offenses tend to pass 55% of the time and run 45%. So defensive coordinators need to be equally prepared for any kind of play type. Or do they?

For offenses to be most effective, they need to be unpredictable. In the 2nd and 10 situation, this means defenses would have to prepare for the nearly equal chance of a run or pass. Many analysts refer to 'balance' as the key to unpredictability. But balance itself doesn’t matter if the offense is predictable in achieving its balance. Running and passing on every other down would provide perfect balance but would be completely predictable. That’s why randomness is at least as important as balance to keeping the defense on its heels. Anything other than random play selection provides a pattern, however subtle, that an opponent can detect and exploit.

In a recent article, I discussed an interesting pattern in NFL 2nd down plays illustrated below. Note how runs are far more common on 2nd and 10 than either 9 or 11 yards to go. The graph is basically continuous and smooth except for a notable spike in run plays on 2nd and 10.


This struck me as odd because 2nd and 10 is not tactically different than 2nd and 9 or 11. The situations are basically the same. My theory was that offenses were running more frequently on 2nd and 10 because that situation arises most often due to an incomplete pass, and offenses tend to predictably alternate between passes and runs. This would result in the unexpectedly high percentage of run plays on 2nd and 10.

I’ve dug into the data now, and I’ve confirmed my suspicion. The graph below illustrates the relative share of pass and run plays based on what kind of play the preceding 1st down was. On 2nd and 10, teams indeed run more often after a pass than after a run, and vice versa.

(The data consist of 14,384 2nd down plays in the 1st through 3rd quarters of all regular season games from 2000-2007. Fourth quarter plays were excluded to remove the possible biases from 'trash time' and running out the clock.) The graphs should be read as follows: the left column shows 2nd down plays following a pass, and the right column shows plays following a run. The blue portion of each column is the % of pass plays, and the yellow portion is the % of run plays.


This is significant because armed with this information, defensive coordinators can select personnel and plays tilted toward the anticipated play type. They no longer have to be on their heels without an idea of what to expect on 2nd and 10. If the previous play was a run, a coordinator can now be 72% confident the next play will be a pass.

But not all teams have the same tendencies. Compare Brian Billick’s Ravens with Bill Cowher’s Steelers over the 2000-2006 period. The Ravens were far more predictable compared to the Steelers. Cowher’s teams selected 2nd down plays without regard to what kind of play was called on 1st down, but Billick’s teams tended to follow a run with a pass.




(Statistically, the Ravens’ difference in proportions is significant at p<0.001. Typically, a single team’s proportions would need to be within about 8% to be considered non-significant. But that still would not indicate good non-predictable play calling independent of the previous play. The proportions would typically need to be within 3% to be more likely due to randomness than not.)
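For anyone who wants to check this kind of claim themselves, here's a hand-rolled two-proportion z-test sketch. The 72% and 55% pass rates are from above, but the per-column sample sizes are made-up stand-ins:

    from math import sqrt, erf

    def two_prop_p(p1, n1, p2, n2):
        """Two-sided z-test for a difference in proportions."""
        pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
        se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
        z = (p1 - p2) / se
        # two-sided p-value from the normal CDF
        return 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))

    # 72% passes after a 1st-down run vs. 55% passes overall;
    # 700 plays per column is a hypothetical sample size.
    print(two_prop_p(0.72, 700, 0.55, 700))  # tiny p => a real tendency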

One counter-argument to this analysis is that teams are wisely choosing to run after an incomplete pass. If a pass falls incomplete, that would be fresh evidence about an offense’s ability to complete passes against this particular opponent. A run stuffed at the line is similarly an indication of each team’s relative strength. Shouldn’t offenses shy away from unsuccessful tactics? Doesn’t it make sense to try the alternate strategy next?

I would say no for two reasons. First, this would be a classic example of the small-sample fallacy, otherwise known as the hasty generalization. Over 40% of all passes are incomplete in the NFL. The outcome of a single pass should not be the basis of a change in strategy, however slight. The sample size of an entire game of passes would still not be enough to make conclusions about its relative merits as a strategy against a particular opponent. Second, even if the evidence of the single trial were so overwhelming, tending heavily toward the alternate play type on successive plays makes the offense too predictable, as we’ve seen here.

Coaches and coordinators are apparently not immune to the small-sample fallacy. In addition to the inability to simulate true randomness, I think this helps explain the tendency to alternate. I also think this is why the tendency is so easy to spot in the 2nd and 10 situation. It’s the situation that nearly always follows a failure. The impulse to try the alternative, even knowing that a single recent bad outcome is not necessarily representative of overall performance, is very strong.

So recency bias may be playing a role. More recent outcomes loom disproportionately large in our minds compared to past ones. When coaches are weighing how successful various play types have been, they might be subconsciously over-weighting the most recent information—the last play. But regardless of the reasons, coaches are predictable, at least to some degree. Fortunately for offensive coordinators, it seems that most defensive coordinators are not aware of this tendency. If they were, you’d think they would tip off their own offensive counterparts, and we’d see this effect disappear.

In case anyone's interested, here are some other teams’ tendencies on 2nd and 10. I picked these teams because they’ve had the same head coaches over the entire period of the data set.





Blindsided?

Michael Lewis, author of the best-selling baseball book Moneyball, recently followed up with a book on innovation in football. The Blind Side follows the story of the left tackle, the player whose job of protecting the more vulnerable side of right-handed quarterbacks has become increasingly important in the NFL ‘arms race’ of the pass rush vs. passing offense.

The entirety of Lewis’ premise is based on the relative pay of LTs compared to other positions. Lewis cites the fact that LT has become the second highest paid position, behind only the all-important QB. Unfortunately, the comparison of LT salaries with those of other positions is a false comparison, and a fairer comparison reveals a different story.

I was intrigued by Phil Birnbaum’s response to a write-up of Blind Side at the Freakonomics blog. Phil questioned the justifications for the extremely high salaries for LTs. And although I believe there are sound economic reasons based on the scarcity of qualified players and the contribution of the position, my main concern was with the premise that LT salaries are truly any higher than those of other positions.

Like many other positions, offensive tackles are largely 'swappable' in that they can go from left to right pretty easily. Most backups don’t even have a defined side and are available to fill in on either side to spell a starter or replace him in case of injury.

Due to the 'blind side' consideration, the LT is almost always the better of the two starting tackles on each NFL team. And he’s very likely to make a lot more money than the lesser player who is assigned RT. Starting LTs are basically a group of the #1 offensive tackles from each of the 32 teams.

So when we compare average salaries of LTs to those of, say, left cornerbacks or all starting wide receivers, the comparison is not fair. Those positions do not place the better player on a certain side, or they are not defined as left/right positions to begin with. And if a player does always line up on one side, it’s not always the same side for every team.

If we compared the average salaries of LTs to the average salaries of all the best WRs from each team, we might expect to see drastically different results.

A much fairer comparison of position salaries is to compare the average salary of the 32 top paid offensive tackles, whether left or right, with the top 32 salaries of players at WR, CB, or various other positions. So that’s what I did.

I looked at the average of the top 32 salaries of 2007 at OT, QB, WR, CB, and RB. Because a player’s salary is a convoluted mix of regular salary, signing bonuses and other bonuses, I favor salary cap charges as the best measure of salary. A cap charge is basically a player’s base salary plus an amortized amount of bonus salary. I think it’s the best measure because it most realistically reflects the value of the player to the team. Total salary and base salary, the only other plausible measures, can be highly irregular based on the particular timing of bonuses. However, I’ll include all three types of salary below the graph, and you can judge for yourself.

The table below lists the salaries in $millions for the 32 highest paid players at various positions.



Average Salary of Top 32 Players by Position ($ million)

                   QB    OT    WR    CB    RB
    Base Salary    2.9   2.3   3.5   3.3   1.8
    Total Salary   5.8   4.9   5.7   5.7   4.9
    Cap Charge     5.6   4.5   5.2   5.4   3.8


The 32 highest paid offensive tackles, whether left or right, rank only 4th out of 5 in all three measures of salary. I haven’t looked at other positions yet, so there may be others that are higher paid than OT. Further, only 33 of the 100 top paid offensive linemen were tackles, left or right.

While I agree LT is a critically important position and should be highly paid, the comparison of salaries against other left/right positions, or non-“sided” positions is severely biased. A fairer comparison reveals that the top players at other positions are paid even higher salaries.

Drunkards, Light Posts, and the Myth of 370

Running back overuse has been a hot topic in the NFL lately, partly because of Football Outsiders' promotion of their "Curse of 370" theory. In several articles in several outlets, including their annual Prospectus tome, they make the case that there is statistical proof that running backs suffer significant setbacks in the year following a season of very high carries. But a close examination reveals a different story. Is there really a curse of 370? Do running backs really suffer from overuse?


Football Outsiders says:

"A running back with 370 or more carries during the regular season will usually suffer either a major injury or a loss of effectiveness the following year, unless he is named Eric Dickerson.

Terrell Davis, Jamal Anderson, and Edgerrin James all blew out their knees. Earl Campbell, Jamal Lewis, and Eddie George went from legendary powerhouses to plodding, replacement-level players. Shaun Alexander struggled with foot injuries, and Curtis Martin had to retire. This is what happens when a running back is overworked to the point of having at least 370 carries during the regular season."

While it's true that RBs with over 370 carries will probably suffer either an injury or a significant decline in performance the following year, the reason is not connected to overuse. What Football Outsiders calls the 'Curse of 370' is really due to:
  1. Normal RB injury rates
  2. Natural regression to the mean
  3. A statistical trick known as multiple endpoints
  4. (And this should go without saying, but the "unless he is named Eric Dickerson" constraint is silliness.)
Injury Rate Comparison

In the 25 RB seasons consisting of 370 or more carries between the years of 1980 and 2005, several of the RBs suffered injuries the following year. Only 14 of the 25 returned to start 14 or more games the following season. In their high carry year (which I'll call "year Y") the RBs averaged 15.8 game appearances, and 15.8 games started. But in the following year ("year Y+1"), they averaged only 13.0 appearances and 12.2 starts. That must be significant, right?

The question is, significant compared to what? What if that's the normal expected injury rate for all starting RBs? If you think about it, to reach 370+ carries, a RB must be healthy all season. Even without any overuse effect, we would naturally expect to see an increase in injury rates in the following year.

In retrospect, comparing starts or appearances in such a year to any other would distort any evaluation. This is what's known in statistics as a selection bias, and in this case it could be very significant.

We can still perform a valid statistical analysis however. We just need to compare the 370+ carry RBs with a control group. The comparison group was all 31 RBs who had a season of 344-369 carries between 1980 and 2005. (The lower limit of 344 carries was chosen because it produced the same number of cases as the 370+ group as of 2004. Since then there have been several more which were included in this analysis.)

Fortunately there is a statistical test perfectly suited to comparing the observed differences between the two groups of RBs. Based on sample sizes, differences between means, and standard deviation within each sample, the t-test calculates the probability that any apparent differences between two samples are due to random chance. (A t-test results in a p-value which is the probability that the observed difference is just due to chance. A p-value below 0.05 is considered statistically significant while a high p-value indicates the difference is not meaningful.) The table below lists each group's average games, games started, and the resulting p-values in their high-carry year and subsequent year.

Comparison of Games Played and Started for High-Carry RBs

                      G, Year Y   G, Year Y+1   GS, Year Y   GS, Year Y+1
    370+ Group           15.8        13.0          15.8          12.2
    344-369 Group        15.8        14.0          15.4          12.6
    P-Value                          0.62                        0.68


The differences are neither statistically significant nor practically significant. In other words, even if the sample sizes were enlarged and the differences became significant, the difference in games started between the two groups of RBs is only 0.4 starts and 1.0 appearances. RBs with 370 or more carries do not suffer any significant increase in injuries in the following year when compared to other starting RBs.
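As a sketch of what that test looks like in practice, here's a version using scipy. The arrays are illustrative stand-ins drawn to roughly match the group means and sample sizes above, not the actual RB data:

    import numpy as np
    from scipy import stats

    # Stand-in data: ~12.2 vs. ~12.6 starts, 25 vs. 31 seasons.
    rng = np.random.default_rng(0)
    starts_370 = np.clip(rng.normal(12.2, 4.0, 25), 0, 16)
    starts_344 = np.clip(rng.normal(12.6, 4.0, 31), 0, 16)

    t, p = stats.ttest_ind(starts_370, starts_344, equal_var=False)
    print(f"t = {t:.2f}, p = {p:.2f}")  # large p => no detectable difference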

Regression to the Mean

The 370+ carry group of RBs declined in yards per carry (YPC) by an average of 0.5 YPC compared to a decline of 0.2 YPC by the 344-369 group. This is an apparently statistically significant difference, but is it due to overuse?

Consider why a RB is asked to carry the ball over 370 times. It's fairly uncommon, so several factors are probably contributing simultaneously. First, the RB himself was having a career year. He was probably performing at his athletic peak, and coaches were wisely calling his number often. His offensive line was very healthy and stacked with top blockers. Next, his team as a whole, including the defense, was likely having a very good year. Being ahead at the end of games means that running is a very attractive option because there is no risk of interception and it burns time off the clock. Additionally, his team's passing game might not have been one of the best, making running that much more appealing. And lastly, opposing run defenses were likely weaker than average. Many, if not all of these factors may contribute to peak carries and peak yardage by a RB.

What are the chances that those factors would all conspire in consecutive years? Linemen come and go, or get injured. Opponents change. Defenses change. Circumstances change. Why would we expect a RB to sustain two consecutive years of outlier performance? The answer is we shouldn't. Running backs with very high YPC will get lots of carries, but the factors that helped produce their high YPC stats are not permanent, and are far more likely to decline than improve.

If I'm right, we should see a regression to the mean in YPC for all RBs with peak seasons, not just very-high-carry RBs. The higher the peak, the larger the decline the following year. And that's exactly what we see in the data.


The graph above plots RB YPC in the high-carry year against the subsequent change in YPC. The blue points are the high-carry group, and the yellow points are the very-high-carry group. Note that there is in fact a very strong tendency for high YPC RBs to decline the following year, regardless of whether a RB exceeded 370 carries.

Very-high-carry RBs tend to have very high YPC stats, and they naturally suffer bigger declines the following season. 370+ carry RBs decline so much the following year simply because they peaked so high. This phenomenon is purely expected and not caused by overuse.
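A quick simulation makes the point. Assume every back has a fixed true talent plus season-to-season noise, with no overuse effect at all; the talent and noise scales below are my own assumptions:

    import numpy as np

    # Fixed true talent plus year-to-year noise; no overuse effect.
    rng = np.random.default_rng(1)
    N = 5000
    talent = rng.normal(4.0, 0.2, N)          # true YPC ability
    year_y = talent + rng.normal(0, 0.4, N)   # observed YPC, year Y
    year_y1 = talent + rng.normal(0, 0.4, N)  # observed YPC, year Y+1

    peak = year_y > 4.7                       # "career year" seasons
    print(f"Peak-year YPC:      {year_y[peak].mean():.2f}")   # ~4.9
    print(f"Following-year YPC: {year_y1[peak].mean():.2f}")  # ~4.2

The selected group declines sharply even though nothing about the players changed; their year-Y luck simply didn't repeat.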

Statistical Trickery

Why did Football Outsiders pick 370 as the cutoff? I'll show you why in a moment, but for now I'm going to illustrate a common statistical trick sometimes known as multiple endpoints by proving a statistically significant relationship between two completely unrelated things. I picked an NFL stat as obscure and random as I could think of--% of punts out of bounds (%OOB).

Let's say I want to show how alphabetical order is directly related to this stat. I'll call my theory the "Curse of A through C" because punters whose first names start with an A, B, or C tend to kick the ball out of bounds far more often than other punters. In 2007 the A - C punters averaged 15% of their kicks out of bounds compared to only 10% for D - Z punters. In fact, the relationship is statistically significant (at p=0.02) despite the small sample size. So alphabetical order is clearly related to punting out of bounds!

Actually, what I did was sort the list of punters in alphabetical order, and then scanned down the column of %OOB. I picked the spot on the list that was most favorable to my argument, then divided the sample there. This trick is called multiple endpoints because there are any number of places where I could draw the dividing line (endpoints), but chose the most favorable one after looking at the data. Football Outsiders used this very same trick, and I'll show exactly how and why.
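The trick is easy to reproduce. The sketch below generates pure noise, scans every candidate cutoff, and keeps the most 'significant' one; all the numbers are invented:

    import numpy as np
    from scipy import stats

    # Pure noise: carries and YPC change are unrelated by construction.
    rng = np.random.default_rng(2)
    carries = np.sort(rng.integers(300, 420, 60))
    decline = rng.normal(0, 0.5, 60)

    best_p, best_cut = 1.0, None
    for cut in range(310, 411):   # try every possible endpoint
        lo, hi = decline[carries < cut], decline[carries >= cut]
        if len(lo) > 2 and len(hi) > 2:
            p = stats.ttest_ind(lo, hi, equal_var=False).pvalue
            if p < best_p:
                best_p, best_cut = p, cut

    print(f"Most favorable cutoff: {best_cut}, p = {best_p:.3f}")
    # With ~100 candidate cutoffs, a "significant" split usually
    # exists even though the data contain no effect at all.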

The graph below plots the change in yards per carry (YPC) against the number of carries in each RB's high-carry year. You can read it to say, a RB who had X carries improved or declined by Y yards per carry the following year. The vertical line is at the 370 carry mark.


Note the cluster of RBs highlighted in the top ellipse with 368 or 369 carries. They improved the following year. Now note the cluster of RBs highlighted in the bottom ellipse. They had 370-373 carries and declined the next year.

If we moved the dividing line leftward to 368, then the very-high-carry group would improve significantly. And if we moved the line rightward to 373, then the non-high-carry group would decline. Either way, the relationship between high carries and decline in YPC disappears. There is one and only one place to draw the dividing line and have the "Curse" appear to hold water.

To be fair to Football Outsiders, they have recently admitted there is nothing magical about 370. A RB isn't just fine at 369 carries, and then on his 370th his legs will fall off. But unfortunately, that's the only interpretation of the data that supports the overuse hypothesis. If you make it 371 or 369, the relationship between carries and decline crumbles. It's circular to say that 370 proves overuse is real, then claim that 370 is only shorthand for the proven effect of overuse.

As Mark Twain (reportedly) once said, "Beware of those who use statistics like a drunkard uses a light post, for support rather than illumination."


Ideas, data, quotes, and definitions from Doug Drinen, PFR, Maurile Tremblay, and Brian Jaura.

Game Theory Intro and 4th Down Decisions

For a while I've been a proponent of more aggressive decision making on 4th down in the NFL. On this site and others, and in academic research such as the Romer paper, it's been shown that a team is usually better off going for it on 4th down, as long as it's not buried deep within its own territory or facing a very long 'to go.'

There are two possibilities here. Either most research on the subject is wrong, or NFL coaches are timid and more concerned with short-term job security than winning. I've previously suggested the real answer may lie in decision theory, namely that the uncertainty surrounding such tactical decisions leads coaches to choose the option that promises the best worst case scenario rather than the option that provides the best chance of winning in the long run.

Although I still stand by that analysis, it may be only part of the story. Any tactical decision in football has to take into account the opponent. And this is exactly what game theory can do.

There are two kinds of games in game theory: zero-sum and non-zero-sum games. Zero-sum games are typically those in which the winner takes all. Whatever is good for me is equally bad for my opponent. In contrast, non-zero-sum games allow for "win-win" or "lose-lose" scenarios.

Football defines the zero-sum game at every level. At the end of 60 minutes, one team earns 100% of the W and the other eats the whole L. Even ties are zero-sum. Only in the most extremely bizarre scenario would a football game be non-zero sum, such as when a tie would qualify both opponents for the playoffs. And within each game, every single play represents a zero-sum "sub-game." Every yard gained by the offense is a yard given up by the defense. Football screams zero-sum.

But just as winning is a zero-sum proposition, scoring is not. A team could adopt a strategy that allows itself to score more often, but also allows its opponents to score slightly more too, hoping to gain the greater advantage. Mike Martz's Rams may have been an example of such a team. Their relative disregard for turnovers led to aggressive high-scoring outcomes, but it also allowed opponents to score more frequently themselves. The question is, on balance, does the strategy favor the more aggressive team itself or its opponents?

Consider a football game between two equally matched teams, Team A and Team B. Each team has the choice of two strategies: either the conventional punt strategy or an aggressive go-for-it on 4th down strategy. Assuming the research on 4th down strategies is correct, the possible outcomes of the game are listed in the table below. In terms of the probability of winning, when both teams employ the conventional punt strategy they each have a .5 probability of winning. Similarly, if both teams employ the go-for-it strategy they would also have equal chances to win. But if one were to adopt the go-for-it strategy and the other did not, the aggressive team would enjoy a .6 to .4 advantage in its probability of winning. (The table can be read [outcome for Team A, outcome for Team B].)

                              Team B
                         Punt        Go For It
    Team A   Punt       .5, .5        .4, .6
             Go For It  .6, .4        .5, .5


If the research is correct, and coaches were purely rational, they would adopt the more aggressive strategy on 4th down. But they don't, and part of the reason may be related to how coaches think of outcomes.
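To see why purely rational coaches should go for it, check the table for a dominant strategy. A minimal sketch, using the payoffs assumed above:

    # Team A's win probability for each (A strategy, B strategy) pair,
    # straight from the table above.
    payoff_A = {
        ("punt", "punt"): 0.5, ("punt", "go"): 0.4,
        ("go", "punt"): 0.6,   ("go", "go"): 0.5,
    }

    for b in ("punt", "go"):
        best = max(("punt", "go"), key=lambda a: payoff_A[(a, b)])
        print(f"If B plays {b!r}, A's best response is {best!r}")
    # "go" is the best response either way: a dominant strategy,
    # given these assumed payoffs.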

Coaches might sometimes confuse points for the ultimate outcome of interest--winning. There is a paradox at work. A strategy that ultimately allows an opponent to score more points may actually be superior in terms of winning. But because in every other respect football is zero-sum, a coach would naturally be averse to any strategy (more aggressive than the conventional status quo) that allows his opponent to score more, even if it theoretically improved his team's chances to win. After all, if scoring is good for my opponent, how could it be good for me?

Here is the same football game from the table above, but this time the outcome is described in terms of points scored instead of win probability. As you can see, the game now appears to be non-zero-sum.

                              Team B
                         Punt        Go For It
    Team A   Punt       20, 20        24, 27
             Go For It  27, 24        27, 27


The average score in the NFL is about 20 points, so equally matched teams playing the conventional punting strategy could expect to average 20 points per game. The go-for-it strategy, according to the research, would allow the aggressive team to score more often--say 27 points per game. But it would also allow an opponent to score slightly more often--say 24 points per game.

If a coach is thinking in purely zero-sum terms, he would not be able to reconcile the non-zero-sum [27, 24] advantage with everything else he knows about football.

Basketball has accepted this concept for years. NBA teams like the Phoenix Suns or Denver Nuggets employ fast-paced tempos that allow for more total possessions by both themselves and opponents. Consequently, they both score and allow more points per game. The idea is that their team strengths give them an advantage when playing at a quick pace. I think it might be particularly difficult to accept in football because of the zero-sum nature of yards gained and lost on every play.

It's all in how you define your utility. Outcomes in game theory must adhere to strict rules such as linearity and transitivity to be valid. This is why previous research on run/pass balance has missed the mark. I'll illustrate exactly how and why in a forthcoming article. Plus, I'll propose a valid measure of utility in football.

The Ellsberg Paradox and 4th Down

The Romer paper and other research provide fairly conclusive evidence that NFL coaches should go for it on 4th down more often than they currently do. The Ellsberg Paradox might help explain why.

Say there are two jars of 100 balls of which some are red and some are blue. Jar A has 50 red balls and 50 blue balls. Jar B has a random unknown mix of red and blue balls. You'll be given $100 if you pick a red ball from a jar. Which jar would you choose to pick from?

In clinical experiments, people almost universally choose jar A. This is the Ellsberg Paradox, a violation of the utility theory in economics. The expected value of each choice is equal. There is a 50/50 chance of winning $100 from either jar, so we wouldn't expect one option to be significantly preferable to the other.

The Ellsberg Paradox demonstrates the difference between risk and uncertainty. Risk is measurable but uncertainty is not. People almost always prefer a known risk to an unknown uncertainty, even if the expected results are equal.

People prefer Jar A, behaving as if U(Jar A) > U(Jar B), where U() is the utility function, even though the expected payoff from either jar is identical.

Punting seems a lot like Jar A, for which the risks and potential outcomes are known. Going for the first down seems more like Jar B, for which the potential outcomes are vague and hard to measure. So at the equilibrium point between going for it and punting, where each decision provides equal chances of ultimately winning, coaches would be heavily biased toward punting. Even beyond the equilibrium point, where going for it would be favorable, coaches would still be biased toward the relatively certain (but less favorable) outcome of the standard 40-net-yard punt.

In a strict analogy, the $100 would be a win, and the red balls would represent the probability of winning the game. There would actually be some uncertainty in each strategy, but far more uncertainty in the go-for-it strategy--perhaps something like 40 to 60 red balls in the punt jar and 20 to 80 balls in the go-for-it jar. The Ellsberg Paradox suggests coaches would naturally prefer punting, the less uncertain option. Only when the advantage of going for it is beyond obvious would a coach choose to go for the 1st down--say 10 to 20 red balls for punting and 15 to 60 red balls for going for it.

I think NFL coaches typically employ the maximin strategy. In game theory the maximin strategy is one that selects the alternative with the best worst-case-scenario. It maximizes the minimum possible payoff. This is a conservative strategy in comparison to the maximax strategy, which selects the alternative with the greatest maximum payoff.

Continuing the jar and red ball analogy, compare jar X with 10 to 90 red balls and jar Y with 30-40 red balls. Utility theory would suggest the rational option is jar X with a higher overall chance of success. The maximin choice however, would be jar Y because it has a higher minimum chance of success.
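Here's that jar comparison in a few lines of code:

    # Jar X: 10 to 90 red balls (uniform); jar Y: 30 to 40.
    jar_x = range(10, 91)   # possible red-ball counts
    jar_y = range(30, 41)

    for name, jar in (("X", jar_x), ("Y", jar_y)):
        print(f"Jar {name}: expected red balls = {sum(jar) / len(jar):.0f}, "
              f"worst case = {min(jar)}")
    # Jar X: expected 50, worst case 10. Jar Y: expected 35, worst case 30.
    # Expected value says take X; maximin says take Y.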

Calculating the probability distributions of a football game's outcome given the combinations of score, time remaining, field position, etc. is far more complex than being told how many red balls are in a jar. It would be overwhelming for a human brain even to attempt it. In such a situation, coaches, like everyone else, use heuristic shortcuts such as the maximin strategy. Punting on every 4th down is a known risk, especially because coaches can count on opposing coaches to follow the same strategy (which suggests that always punting is a Nash Equilibrium). Punting usually presents the best worst-case-scenario despite being a sub-optimum decision.

Safe Leads in NCAA Basketball


Bill James takes a look at when leads become insurmountable in college basketball. In other words, when should CBS cut away from the UNC-Mt. Saint Mary's game to show us the barn-burner between Vanderbilt and Siena?

James' formula uses the lead in points, who has the ball, and seconds remaining to tell us if the lead is completely insurmountable. Here it is in a nutshell:

  • x = (Lead - 3 +/- .5)^2 [+.5 if the winning team has possession, -.5 if not]
  • If x > time remaining in seconds, the lead is insurmountable
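Here's the rule as a minimal function (my own translation of the formula above):

    def lead_is_safe(lead, has_ball, seconds_left):
        """Bill James' rule as I read it: square (lead - 3 +/- 0.5)
        and compare to the seconds remaining."""
        x = (lead - 3 + (0.5 if has_ball else -0.5)) ** 2
        return x > seconds_left

    # Up 10 with the ball, 50 seconds left: (10 - 3 + 0.5)^2 = 56.25 > 50
    print(lead_is_safe(10, True, 50))   # True: insurmountable by the rule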
Pretty cool. This is the kind of thing James is really good at. Unfortunately, I think he buys into a logical fallacy later in his article. He says that if a team is deemed to be "dead," that is to say too far behind, but it is able to climb back inside the limits of "insurmountability," it doesn't matter. The losing team is still dead.

I'd agree that it is highly unlikely that such a team would win, but I think James has been taken in by the gambler's fallacy. He writes "The theory of a safe lead is that to overcome it requires a series of events so improbable as to be essentially impossible. If the "dead" team pulls back over the safety line, that just means that they got some part of the impossible sequence—not that they have a meaningful chance to run the whole thing."

It seems to me that if a team climbs back into contention, it's in contention. If the events in a sequence are independent, it doesn't matter how lucky or how improbable previous events were. They're water under the bridge. For example (from Wikipedia), the probability of flipping 21 heads in a row with a fair coin is 1 in 2,097,152, but the probability of flipping a head after having already flipped 20 heads in a row is simply 0.5.

The only thing that matters is the current situation. It's like saying, "There's no way they'll hit another 3-pointer. They just hit five in a row. They're due to miss."

What does this have to do with football? It would be interesting to look at something similar in the NFL. When is a lead so safe that a team should stop throwing? Or when is it so safe a team should only throw on 3rd down? And so on. Basically, when should a winning team stop trying to gain a bigger lead and start trying to simply prevent big mistakes?

The Patriots and the Conjunction Fallacy

The conjunction fallacy is when people judge the probability of a series of events to be larger than the probability of any one of its component events. In simple terms, many people mistakenly assign a higher probability to a specific outcome than to a more general one. In football terms, this means that many fans underestimate how difficult it is for a team, even an extremely good one, to win the Super Bowl.

In my very simple poll asking if the Patriots had a better than 50/50 chance to win the Super Bowl, 64% of the (few) respondents said yes. I'd vote no, but let's look at what it would take for NE to win the championship.

The Patriots need to win three consecutive games, two of which are at home and one at a neutral site, against the NFL's top teams. What kind of win probability would they need for each game to arrive at a 50/50 chance to win it all? x * y * z = 0.50. For a rough estimate, let's assume their chance to win each game is roughly equal. Their probability would need to be 0.79 in each game. (0.79^3 = 0.50.)

This seems reasonable, but since they wouldn't have home field advantage in the Super Bowl, they would need slightly higher probabilities for the division and conference rounds of the playoffs.

Let's look at what Vegas thinks. According to a major online gambling site, NE is given 9 to 4 odds (0.69 probability) of winning the AFC championship, and 3 to 2 odds (0.60 probability) of winning the Super Bowl. They are also 13 point favorites to beat JAX this Saturday. With a 49 point over/under, 13 points roughly equates to a 0.78 probability (using this method).

There is something out of whack. A 0.60 probability of winning the Super Bowl and a 0.69 probability of winning the AFC, means the individual Super Bowl game probability must be 0.60/0.69 = 0.87. That's amazingly high. And I suspect that's where the conjunction fallacy may be having an effect. The individual game odds are incongruent with the conjunctive odds of NE winning all three games.
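The arithmetic is simple enough to check in a few lines:

    # "9 to 4" (odds-on) implies 9/13; "3 to 2" implies 3/5.
    p_afc = 9 / 13              # ~0.69, win the AFC championship
    p_sb_overall = 3 / 5        # 0.60, win the Super Bowl

    # If P(win SB) = P(win AFC) * P(win the SB game itself):
    print(f"Implied SB game probability: {p_sb_overall / p_afc:.2f}")  # 0.87

    # versus the per-game probability needed for a 50/50 title shot:
    print(f"Needed per game for 50/50:   {0.5 ** (1 / 3):.2f}")        # 0.79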

I'd guess there is some kind of arbitrage opportunity there for gamblers. Personally, I just think it's interesting how some people intuitively estimate the odds of future events.

My own model estimates NE has a 0.74 probability of winning this weekend. Against IND they get a 0.65 probability. But against SD they would have a 0.84 probability of winning. In total, that gives NE a 0.68 chance at appearing in the Super Bowl. To have a 50/50 chance of winning the Super Bowl, NE would need a 0.74 chance of beating the NFC representative. That would be the same chance they have against the AFC's #5 seed, a (pretty good) Florida team playing in Foxboro in January.

My own sense is that the Patriots have about a 40-45% chance of winning the Super Bowl. I would say 40, but as a dome team, the Colts would have a tougher time in Foxboro due to the January weather.

Beating the Season Over-Under Follow-Up

Before the 2007 season began, I proposed a system that appeared to be able to systematically beat Las Vegas over-under lines on team wins. For the 2007 season, the results are in. The system's record would have been 8 correct, 3 incorrect, with 1 push (73% correct).

The system itself consists of two rules:

1. For teams predicted to win 9.5 games or more, bet on the under.
2. For teams predicted to win 6.5 games or less, bet on the over.

In other words, bet on mediocrity.

Historically, the system would have been 70% correct over the previous two years, and slightly over 58% correct over a 10 year period between 1996 and 2005. By betting 'over' on teams predicted to win 6 or less, instead of 6.5 or less, the overall rate would have improved to 61%.
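For concreteness, the system reduces to a few lines of code. This sketch is my own translation of the two rules, with pushes scored when the actual win total lands exactly on the line:

    def system_bet(line):
        """The two rules above: fade extreme win-total lines."""
        if line >= 9.5:
            return "under"
        if line <= 6.5:
            return "over"
        return None   # no bet on middling teams

    def grade(line, actual):
        bet = system_bet(line)
        if bet is None or actual == line:
            return "push/no bet"
        won = (actual < line) if bet == "under" else (actual > line)
        return "W" if won else "L"

    print(grade(9.5, 5))    # Baltimore 2007: 'W'
    print(grade(11, 11))    # San Diego 2007: 'push/no bet'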

Here is how the system fared this year. Over-under lines were taken from bodog.com on 6/30/07.

    Unders
    Team   Line   Actual   Result
    SD     11     11       Push
    BAL    9.5    5        W
    CHI    10.5   7        W
    IND    11     12       L
    NE     11.5   16       L
    DEN    9.5    7        W
    PHI    9.5    8        W
    CIN    9.5    7        W

    Overs
    Team   Line   Actual   Result
    DET    6      7        W
    CLE    6      10       W
    HOU    6.5    8        W
    BUF    6.5    7        W
    OAK    5      4        L

One thing I learned watching the results develop over the season is that the system is more successful the earlier in the year that the over-under lines are taken. Early in the year, long before training camps open, there is the least amount of information and the more likely it is that over-under lines are set based on the previous season's results. This is when uncertainty would be greatest, and perhaps when overconfidence is accordingly great. When looking for records of the over-under lines, I noticed that the earlier the line, the more confident it was--i.e. the further from 8 wins teams were predicted to win.

As player movements, retirements, injuries, team schedules, and pre-season games come into focus, over-under lines move to reflect the new information. Uncertainty is reduced, and the overconfidence would be mitigated. It might therefore be better to place bets earlier rather than later in the pre-season, capitalizing on maximum uncertainty.

Why the System Works

The system is based on three principles:

1. The NFL season is extremely difficult to predict.
2. Regression to the mean is very strong in the NFL.
3. People are overconfident in their ability to predict team wins.

Humans, including NFL fans and gamblers, are susceptible to cognitive biases, and I believe these biases affect prediction markets. Cognitive (or heuristic) biases are systematic errors in judgment made in certain situations. These biases probably exist because they benefited early humans in their efforts to adapt and survive. For example, the "overconfidence effect" may bias people toward action rather than passivity in the face of a challenge.

But in a prediction situation, overconfidence is counterproductive. People who believe their prediction abilities are better than they truly are would take excessive risks. In one survey 80% of respondents claimed they were in the top 30% in driving skills. Similarly, it's likely that most gamblers also believe that they have some special ability or intuition to predict outcomes--otherwise they wouldn't be gambling.

Hindsight bias, also known as the "I knew it all along" effect, may also be a factor in fooling people into forgetting how poor their predictions really are. Ask yourself how well you thought New England would do this year. Be honest. According to the betting lines, half of all people thought they'd win 11 games or fewer. Remember how Randy Moss was supposed to be a "cancer?" How about the Bears? Half of bettors believed Chicago would win at least 11 games this year. But as the Bears floundered this year, many people forgot just how good they were expected to be--just on their defense alone. Most fans, including tv commentators, forget how uncertain things were before the first snap of the season. They say "it was obvious the Patriots would be unbeatable. They have Randy Moss!" or "obviously the Bears fell flat, Rex Grossman is terrible." I know that's my first instinct.

People also tend to remember correct predictions and forget incorrect ones--both by themselves and others. Today I noticed the cover page of an obscure football "prospectus" book I bought in August. It trumpeted, "Last year, we correctly predicted player X would have a breakout year! We said team Y would return to the playoffs!" That a book of over 300 pages of NFL predictions made four or five correct guesses is not an accomplishment. But it's the correct ones that are remembered.

Many people are also unable to grasp the concepts of randomness or luck. They discount regression to the mean because they expect extreme performances to continue. In the NFL, a team that finishes with a 14-2 record was probably a "fundamentally" 12-4 team that got lucky in a couple games. Fans and bettors may expect much of that 14-2 performance to carry over into the next season, thinking that 11 or 12 wins is a safe bet. But in reality, the team was a "12-win" team the previous season, so the chance of repeating such a successful year would be lower than expected. This effect is compounded when the same phenomenon affects a team's opponents. For example, part of the reason the Ravens and Bengals did not reach their over-under expectations is because division rival Cleveland improved from extremely poor performance.

This field of heuristics and decision-making fascinates me. I'm sure there is a lot of money to be made in not just betting markets, but equity markets. The one thing about sports though, is that it's relatively easy to analyze statistically. And no. I'm sorry to report I did not put my money where my mouth is.