
Using Probabilistic Distributions to Quantify NFL Combine Performance

Casan Scott continues his guest series on evaluating NFL prospects through Principal Component Analysis. By day, Casan is a PhD candidate researching aquatic eco-toxicology at Baylor University.

Many consider Jadeveon Clowney a “once-in-a-decade” or even “once-in-a-generation” pass-rushing talent. Once the top-rated high school talent in the country, Clowney has retained that distinction through 3 years in college football’s most dominant conference. Super-talents like Clowney have traditionally been gambled on in the NFL draft with little statistical sense of what future production to expect. For all of the concerns over his work ethic, dedication, and professionalism, Clowney’s athleticism and potential have never been called into question. But is his athleticism actually that rare? And is his talent worth gambling millions of dollars and the 1st overall pick on? This article aims to quantify exactly how rare Jadeveon Clowney’s athleticism is in a historical sense.

Jadeveon Clowney set the NFL draft world on fire at this year’s combine, delivering one of the most talked-about combine performances in recent memory, driven primarily by his blistering 40-yard dash time of 4.53 seconds. Over the years, however, I recall players like Vernon Gholston, Mario Williams, and even Ziggy Ansah displaying mind-boggling athleticism in drills. But if some player displays previously unseen athleticism at the combine every year, who is really impressive enough for us to deem him “once-in-a-decade”?

Probability Ranking allows me to estimate how likely it is to encounter a given athlete’s measurement. For instance, I probability-ranked NFL combine 40-yard dash times for 341 defensive ends from 1999-2014 (Table 1 shows the top 50). In this case, Jadeveon Clowney’s 40 time of 4.53 had a probability rank of 99.12, meaning his speed is in the 99th percentile of all DEs over this time span.
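The post doesn't spell out how the probability rank is computed, so here is a minimal sketch under two possible assumptions: an empirical percentile rank (the share of the sample a time beats, counting ties as half) and a rank read off a normal curve fitted to the sample. The function names and the ten 40 times are hypothetical, not the 341-player data set. For the 40, a lower time is better, so "beating" a player means running faster.

```python
from statistics import NormalDist, mean, stdev


def empirical_probability_rank(value, sample, lower_is_better=True):
    """Percent of the sample that `value` beats; ties count as half a win."""
    if lower_is_better:
        beaten = sum(1 for x in sample if x > value)
    else:
        beaten = sum(1 for x in sample if x < value)
    ties = sum(1 for x in sample if x == value)
    return 100.0 * (beaten + 0.5 * ties) / len(sample)


def normal_probability_rank(value, sample, lower_is_better=True):
    """Same idea, but using a normal distribution fitted to the sample."""
    dist = NormalDist(mean(sample), stdev(sample))
    share_below = dist.cdf(value)
    return 100.0 * ((1.0 - share_below) if lower_is_better else share_below)


# Hypothetical 40-yard dash times (seconds), NOT the real DE sample.
forty_times = [4.53, 4.60, 4.65, 4.70, 4.75, 4.80, 4.85, 4.90, 4.95, 5.00]
print(empirical_probability_rank(4.53, forty_times))
print(normal_probability_rank(4.53, forty_times))
```

The empirical version can never rank anyone above the best observed time, while the fitted curve can say how rare a time would be even beyond the sample. That difference matters when asking whether a performance is "once-in-a-decade."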

NFL Prospect Evaluation using Quantile Regression

Casan Scott continues his guest series on evaluating NFL prospects through Principal Component Analysis. By day, Casan is a PhD candidate researching aquatic eco-toxicology at Baylor University.

Extraordinary amounts of data go into evaluating an NFL prospect. The NFL combine, pro days, college statistics, game tape breakdown, and even personality tests can all play a role in predicting a player’s future in the NFL. Other than Johnny Manziel, Jadeveon Clowney is arguably the most discussed prospect in the 2014 NFL draft. He is certainly an elite prospect and potentially the best in this year’s draft, but based exclusively on historical combine performances, he doesn’t appear to be a “once-in-a-decade” type of physical specimen. From the research I’ve done, only Mario Williams and JJ Watt can make such a claim. Super-talents like Clowney have traditionally been gambled on in the NFL draft with little statistical sense of what future production to expect. All prospects have a “ceiling” and a “floor,” which represent the highest and lowest potential a prospect could realize, respectively. But what does this “potential” mean, and does it hold any importance for actually predicting a prospect’s success in the NFL? In this article I will show how Quantile Regression, a technique used by quantitative ecologists, can clarify what Clowney’s proverbial “ceiling” and “floor” may be in the NFL.

Athletes are a collection of numerous measured and unmeasured descriptor variables. Figure 1 shows a single predictor (40-yard dash time) vs. prospects’ career NFL sacks plus tackles for loss (TFL) per game.
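With a single predictor like this, the idea behind quantile regression can be sketched in plain Python. Rather than fitting the mean, it fits a line for a chosen quantile tau by minimizing the "pinball" (check) loss. A high tau, such as 0.9, traces something like a ceiling and a low tau, such as 0.1, something like a floor. This is an illustrative brute-force solver, not the author's software, and the data points are invented. It relies on the fact that some optimal fit passes through at least two observations.

```python
from itertools import combinations


def pinball_loss(residual, tau):
    """Check ("pinball") loss: under-predictions weigh tau, over-predictions 1 - tau."""
    return residual * (tau - (1.0 if residual < 0 else 0.0))


def fit_quantile_line(xs, ys, tau):
    """Fit y = intercept + slope * x at quantile tau by brute force.

    Some optimal linear quantile fit always passes through at least two
    observations, so checking every such line finds an exact solution.
    This is O(n^3) and only meant for small illustrative samples.
    """
    best = None
    for i, j in combinations(range(len(xs)), 2):
        if xs[i] == xs[j]:
            continue  # a vertical line is not a valid fit
        slope = (ys[j] - ys[i]) / (xs[j] - xs[i])
        intercept = ys[i] - slope * xs[i]
        loss = sum(pinball_loss(y - (intercept + slope * x), tau)
                   for x, y in zip(xs, ys))
        if best is None or loss < best[0]:
            best = (loss, intercept, slope)
    if best is None:
        raise ValueError("need at least two distinct x values")
    return best[1], best[2]


# Hypothetical data: 40 time (s) vs. career sacks + TFL per game.
forty = [4.53, 4.62, 4.70, 4.75, 4.81, 4.88, 4.95, 5.02]
sacks_tfl = [0.9, 0.7, 0.8, 0.4, 0.5, 0.2, 0.3, 0.1]
for tau in (0.1, 0.5, 0.9):  # "floor", median, "ceiling"
    b0, b1 = fit_quantile_line(forty, sacks_tfl, tau)
    print(tau, round(b0 + b1 * 4.53, 2))  # predicted production at a 4.53 40
```

For real samples of hundreds of players, a linear-programming solver such as the one in statsmodels' `QuantReg` would be the practical choice.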

New Feature on the Draft Model

In my last job I worked with a team of software developers. The interfaces they designed didn't make much sense to me. The interfaces were always, at heart, a giant expanding tree of classes, objects, and properties. Huh? Lots of tiny plus and minus marks everywhere to expand and contract the accordion. Left click to view something. Right click to modify it. If you ever had to deal with the Windows registry, it was like that. Steve Jobs would not have been thrilled.

When I learned a little about object-oriented programming, it all made sense. The software engineers were designing the interface for their own convenience, not for ease of use. It made sense from an efficiency standpoint...a programming efficiency standpoint. But from the perspective of the user, it wasn't so efficient. The least used feature was just as accessible as the most common feature, and all of them were hidden until you expanded the right portion of the tree.

Yesterday I realized I was doing the same thing with the draft model. From my point of view, it's easiest to think in terms of players and their probability to be selected at each pick number, because that's how the software that runs the model works. It goes down the list of prospects, player-by-player, looking at the probability he'll be selected pick#-by-pick#.

For the players and their agents, and for fans of particular players, this is ideal. They want to know where and when they'll go. But the user is probably thinking of things from a team's perspective. Whether the user is a team personnel guy or a fan of a team, he'd rather see things from the perspective of a pick #. Right now, a Vikings fan (or exec) would have to click through a dozen or so of the top players to see who's likely to be available to them at pick #8. And if they were wondering about who'd be available if they trade up or down, that's another few dozen clicks. Scroll, click. Scroll, click...
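Turning the player-centric view into a pick-centric one is essentially a pivot of the same table. The sketch below assumes a hypothetical storage format in which each player maps to a list of availability probabilities indexed by pick. It returns, for any pick, every player sorted by how likely he is to still be on the board. The function names and the three-player table are illustrative, not the model's actual code.

```python
def available_at_pick(availability, pick):
    """Pivot a player-centric table into a pick-centric view.

    availability maps player -> list whose (k-1)th entry is the probability
    he is still on the board at pick k (a hypothetical storage format).
    Returns (player, probability) pairs, most likely available first.
    """
    rows = [(player, probs[pick - 1])
            for player, probs in availability.items()
            if pick <= len(probs)]
    return sorted(rows, key=lambda row: row[1], reverse=True)


def board_for_picks(availability, picks):
    """One pick-centric list per pick, e.g. for a trade-up/trade-down window."""
    return {pick: available_at_pick(availability, pick) for pick in picks}


# Hypothetical three-player table.
table = {"A": [1.0, 0.4, 0.1], "B": [1.0, 0.9, 0.7], "C": [1.0, 1.0, 0.95]}
print(board_for_picks(table, range(2, 4)))
```

A Vikings user looking at pick #8 would call something like `board_for_picks(table, range(6, 11))` once, instead of clicking through each prospect.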

Bayesian Draft Analysis Tool

This tool is intended to help decision-makers better assess the NFL draft market. Specifically, it estimates the probability each prospect will be available at each pick number. The estimates come from a Bayesian inference model built on consensus player rankings and projections from individual experts with a history of accuracy.

For details on how the model works, please refer to these write-ups:

 - A full description of the purpose and capabilities of the model
 - A discussion of the theoretical basis of Bayesian inference as applied to draft modeling
 - More details on the specific methodology

If you want to jump straight to the results, here they are. But I recommend reading a little further for a brief description of what you'll find.


The interface consists of a list of prospects and two primary charts. Selecting a prospect displays the probabilities of when he'll likely be taken. You can filter the selection list by overall ranking or position.

The top chart plots the probabilities the selected prospect will be taken at each pick #. I think this chart is pretty cool because it illustrates the Bayesian inference process. You can actually see the model 'learn' as it refines its estimates with the addition of each new projection. Where there is a firm consensus among experts, the probability distribution is tall and narrow, indicating high confidence. When there is disagreement, the distribution is low and wide, indicating low confidence.

The lower chart is the bottom line. It's the take-away. It depicts the cumulative probability that the selected prospect will remain available at each pick #. For example, there's currently an 82% chance safety Ha Ha Clinton-Dix is available at the #8 pick but only a 26% chance he's available at #14. A team with an eye on a specific player could use this information in deciding whether to trade up or down, and in understanding how far they'd need to trade.
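The relationship between the two charts is simple arithmetic. A player is available at pick k only if he wasn't taken at any earlier pick, so his availability is 1 minus the sum of his selection probabilities at picks 1 through k-1. Here is a minimal sketch with made-up numbers, plus a hypothetical helper that finds the latest pick at which availability still clears a confidence threshold, which is the "how far can we trade down" question.

```python
from itertools import accumulate


def availability_curve(selection_probs):
    """selection_probs[k-1] = P(player is taken with pick k).

    Returns avail[k-1] = P(player is still on the board when pick k is made).
    """
    taken_before = [0.0] + list(accumulate(selection_probs))[:-1]
    return [max(0.0, 1.0 - t) for t in taken_before]  # clamp rounding noise


def latest_pick_with_confidence(avail, threshold):
    """Latest pick number whose availability is still >= threshold, else None."""
    best = None
    for pick, p in enumerate(avail, start=1):
        if p >= threshold:
            best = pick
    return best


# Hypothetical top-chart values for picks 1-4.
avail = availability_curve([0.1, 0.2, 0.3, 0.4])
print([round(a, 2) for a in avail])
print(latest_pick_with_confidence(avail, 0.75))
```

Because availability can only fall as picks go by, scanning for the last pick above the threshold gives the deepest spot a team could trade down to at that confidence level.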



Hovering your cursor over one of the bars on the chart provides some additional context, including which team has that pick and that team's primary needs (according to nfl.com).

The box in the upper right gives you the player's vitals: school, position, height, weight. The expert projections used as inputs to the model are also listed. Currently those include Kiper (ESPN), McShay (Scouts, Inc.), Pat Kirwan (CBS Sports), Daniel Jeremiah (former team scout, NFL Network), and Bucky Brooks (NFL Network). Experts were selected for their reputation, historical accuracy, and independence--that is, they don't all parrot the same projections. Not every prospect has a projection from each expert.

Link to the tool.

Bayesian Draft Model: More Methodology

Boomer, when you think about a guy like Thomas Bayes you think high motor, long arms, quick off the snap. Huge upside in any 3-4 scheme. Gets leverage on those tricky probability theorems right off the block. Game 1 starter for 90% of the teams out there. Writes proofs all the way through the end of the whistle. Definitely like him in the late first, early second round...

The new Bayesian draft model is nearly ready for prime time. Before I launch the full tool publicly, I need to finish describing how it works. Previously, I described its purpose and general approach. And my most recent post described the theoretical underpinnings of Bayesian inference as applied to draft projections. This post will provide more detail on the model's empirical basis.

To review, the purpose of the model is to provide support for decisions. Teams considering trades need the best estimates possible about the likelihood of specific player availability at each pick number. Knowing player availability also plays an important role in deciding which positions to focus on in each round. Plus, it's fun for fans who follow the draft to see which prospects will likely be available to their teams. Hopefully, this tool sits at the intersection of things helpful to teams and things interesting to fans.

Since I went over the math in the previous post, I'll dig right into how the probability distributions that comprise the 'priors' and 'likelihoods' were derived.

I collected three sets of data from the last four drafts--best player rankings, expert draft projections (mock drafts), and actual draft selections. In a nutshell, to produce the prior distribution, I compared how close each player's consensus 'best-player' ranking was to his actual selection. And to produce the likelihood distributions, I compared how close each player's actual selection was to the experts' mock projections.
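One simple way to turn "how close the ranking was to the actual selection" into a prior is to tabulate the historical gaps (actual pick minus consensus rank) and then shift that gap distribution onto a new prospect's rank. The sketch below does exactly that, renormalizing over valid pick numbers. It is a hypothetical simplification: it ignores undrafted players and does no smoothing, and the four history pairs are invented. Applying the same tabulation to (mock projection, actual pick) pairs for each expert would give an analogous likelihood table.

```python
from collections import Counter


def deviation_distribution(history):
    """history: (consensus_rank, actual_pick) pairs from past drafts.

    Returns the empirical probability of each gap (actual_pick - consensus_rank).
    """
    counts = Counter(actual - rank for rank, actual in history)
    total = sum(counts.values())
    return {gap: n / total for gap, n in counts.items()}


def prior_for_rank(rank, deviations, num_picks):
    """Shift the historical gaps onto a new prospect's consensus rank and
    renormalize over the valid pick numbers 1..num_picks."""
    raw = {}
    for gap, p in deviations.items():
        pick = rank + gap
        if 1 <= pick <= num_picks:
            raw[pick] = raw.get(pick, 0.0) + p
    total = sum(raw.values())
    if total == 0:
        raise ValueError("no historical gap lands on a valid pick")
    return {pick: p / total for pick, p in sorted(raw.items())}


# Hypothetical history: (consensus rank, actual pick).
dev = deviation_distribution([(1, 1), (2, 3), (3, 2), (4, 6)])
print(prior_for_rank(1, dev, num_picks=10))
```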

Theoretical Explanation of the Bayesian Draft Model

I recently introduced a model for estimating the probabilities of when prospects will be taken in the draft. This post will provide an overview of the principles that underpin it. A future post will go over some of the deeper details of how the inputs for the model were derived.

First, some terminology. P(A) means the "probability of event A," as in the probability it rains in Seattle tomorrow. Event A is 'it rains in Seattle tomorrow'. Likewise, we can define P(B) as the probability that it rains in Seattle today.

P(A|B) means "the probability of event A given event B occurs," as in the probability that it rains in Seattle tomorrow given that it rained there today. This is known as a conditional probability.

The probability it rains in Seattle today and tomorrow can be calculated by P(A|B) * P(B), which should be fairly intuitive. I hope I haven't lost anyone.

It's also intuitive that "raining in Seattle today and tomorrow" is equivalent to "raining in Seattle tomorrow and today." There's no difference at all between those two things, and so there's no difference in their probabilities.

We can write out that equivalence, like this:
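In symbols, using the same events A and B defined above, both sides give the probability of rain on both days. Dividing through by P(B), assuming P(B) > 0, rearranges the equivalence into the standard form of Bayes' theorem:

```latex
P(A \mid B)\,P(B) = P(B \mid A)\,P(A)
\quad\Longrightarrow\quad
P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}
```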

Bayesian Draft Prediction Model

Let's say you're a GM in need of a safety. You really like Ha Ha Clinton-Dix (FS Ala.) but are unsure if he'll still be on the board when you're on the clock. Do you need to trade up? How far? What if you're a GM with a high pick and would be willing to trade down if you're still assured of getting Clinton-Dix? How far down could you trade and still get your guy?

I've created a tool for predicting when players will come off the board. This isn't a simple average of projections. Instead, it's a complete model based on the concept of Bayesian inference. Bayesian models have an uncanny knack for accurate projections if done properly. I won't go into the details of how Bayesian inference works in this post; I'll save that for another article. This post is intended to illustrate the potential of this decision support tool.

Bayesian models begin with a 'prior' probability distribution, used as a reasonable first guess. Then that guess is refined as we add new information. It works the same way your brain does (hopefully). As more information is added, your prior belief is either confirmed or revised to some degree. The degree to which it is refined is a function of how reliable the new information is. This draft projection model works the same way.
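The refine-as-you-go idea can be sketched with a discrete distribution over pick numbers. Each expert projection multiplies the current distribution by a likelihood, and the result is renormalized. Here the likelihood is assumed to be a normal curve around the true pick, with an expert-specific standard deviation standing in for reliability. That is a hypothetical modeling choice, not the model's actual likelihood tables. A more reliable expert (smaller spread) pulls the estimate harder and narrows it more, which is the tall-and-narrow vs. low-and-wide behavior described for the tool's top chart.

```python
from statistics import NormalDist


def normalize(dist):
    total = sum(dist.values())
    return {k: v / total for k, v in dist.items()}


def update(prior, projected_pick, expert_sd):
    """One Bayesian update: posterior(k) is proportional to
    prior(k) * P(projection | true pick k).

    The likelihood is an assumed normal curve centred on the true pick with
    an expert-specific spread (smaller spread = more reliable expert).
    """
    noise = NormalDist(0.0, expert_sd)
    return normalize({k: p * noise.pdf(projected_pick - k)
                      for k, p in prior.items()})


def spread(dist):
    """Standard deviation of a distribution over pick numbers."""
    mu = sum(k * p for k, p in dist.items())
    return sum((k - mu) ** 2 * p for k, p in dist.items()) ** 0.5


# Hypothetical: flat prior over the first round, then two mock projections.
prior = {k: 1 / 32 for k in range(1, 33)}
post1 = update(prior, projected_pick=10, expert_sd=3.0)
post2 = update(post1, projected_pick=12, expert_sd=2.0)
print(round(spread(prior), 2), round(spread(post1), 2), round(spread(post2), 2))
```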

Draft Prospect Evaluation Using Principal Component Analysis

A guest post by W. Casan Scott, Baylor University.

As different as ecology and the NFL sound, they share quite similar problems. The environment is an infinitely complex system with many known and unknown variables. The NFL is a perpetually changing landscape with a revolving door of players and schemes. Predicting an athlete’s performance pre-draft is complicated by a number of contributing variables, including combine results, college production, intangibles, and how well that player fits a certain NFL scheme. Perhaps techniques that ecologists use to discern confounding trends in nature may be suitable for such challenges as the NFL draft. This article aims to introduce an eco-statistical tool, Principal Component Analysis (PCA), and its potential utility to advanced NFL analytics.

My Ph.D. research area is aquatic eco-toxicology, where I primarily model chemical exposure hazards to fish. So essentially, I use the best available data and methods to quantify how much danger a fish may be in, in a given habitat. Chemical exposures occur in infinitely complex mixtures across many different environments, and distinguishing trends from such dynamic situations is difficult.

Prospective draftees are actually similar (in theory) in that they are always a unique combination of their college team, inherent athleticism, history, intangibles, and even the current landscape in the NFL. The myriad variables present in the environment and the NFL, both static and changing, make it difficult to separate the noise from actual, observable trends.

In environmental science, we sometimes use non-traditional methods to help us visualize what previously could not be observed. Likewise, Advanced NFL Analytics tries to answer questions that traditional methods cannot. The goal of this article is to educate others on the utility of eco-statistical tools, namely Principal Component Analysis (PCA), in assessing NFL draft prospects.
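At its core, PCA finds the direction through a set of correlated measurements that captures the most variation. A minimal, dependency-free sketch of the first component: standardize each combine measurement, build the correlation matrix, and extract its leading eigenvector by power iteration. The combine numbers in the usage lines are invented. The 40 time is negated so that higher is better in every column. A full analysis would typically use a linear-algebra library and look at more than one component.

```python
import math
from statistics import mean, stdev


def standardize(columns):
    """Convert each measurement column to z-scores so units don't dominate."""
    out = []
    for col in columns:
        m, s = mean(col), stdev(col)
        out.append([(v - m) / s for v in col])
    return out


def correlation_matrix(z):
    """Covariance of standardized columns, i.e. the correlation matrix."""
    n = len(z[0])
    return [[sum(a[i] * b[i] for i in range(n)) / (n - 1) for b in z] for a in z]


def first_principal_component(matrix, iterations=1000):
    """Leading eigenvalue and unit eigenvector by power iteration."""
    k = len(matrix)
    v = [1.0 / (i + 1) for i in range(k)]  # arbitrary, uneven starting vector
    for _ in range(iterations):
        w = [sum(matrix[r][c] * v[c] for c in range(k)) for r in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            raise ValueError("start vector is orthogonal to the leading eigenvector")
        v = [x / norm for x in w]
    eigenvalue = sum(v[r] * sum(matrix[r][c] * v[c] for c in range(k))
                     for r in range(k))
    return eigenvalue, v


def pc1_scores(z, loadings):
    """Each athlete's position along the first principal component."""
    n = len(z[0])
    return [sum(load * col[i] for load, col in zip(loadings, z)) for i in range(n)]


# Hypothetical combine results for six prospects.
forty = [4.53, 4.60, 4.72, 4.80, 4.85, 4.95]
vertical = [37.5, 36.0, 34.5, 33.0, 32.5, 30.0]
broad = [124, 122, 118, 115, 116, 110]
bench = [21, 26, 24, 30, 22, 28]
z = standardize([[-t for t in forty], vertical, broad, bench])
eig, loadings = first_principal_component(correlation_matrix(z))
print("share of variance:", round(eig / len(z), 2), "loadings:", loadings)
```

Because the data are standardized, the eigenvalue divided by the number of measurements is the share of total variance the first component explains.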

Wondering About the Wonderlic: Does It Predict Quarterback Performance?

By: Austin Tymins and Andrew Fraga
Published originally at Harvard Sports Analysis

During the 2014 NFL Draft, all 32 NFL teams will be on the clock to invest in the future of their franchises. Decision makers will feel immense pressure to secure a top-notch first round pick, find the next Tom Brady in the sixth round, and, most importantly, avoid selecting a bust. College stats, highlight reels, and NFL Combine results will all be evaluated. The draft, however, isn’t just about physical prowess; in addition to the 6 workouts at the NFL Combine, such as the 40-yard dash and bench press, draft prospects must also complete the Wonderlic Test, an examination designed to gauge mental aptitude.

Cade Massey on Flipping Coins and the NFL Draft

Readers of this site will recall the name Cade Massey. Along with fellow noted economist Richard Thaler, he co-authored the Massey-Thaler draft study titled The Loser's Curse. The paper found that, under the previous CBA, "surplus" draft value peaked with picks in the late first round and early second round. Surplus value was defined as a pick's expected performance value above what a team could expect to get by spending an equivalent amount on a veteran free agent.

Massey has continued research into the draft. His presentation at the 2012 MIT Sloan Sports Analytics Conference outlines his recent findings. (I recommend using IE to view the presentation. Chrome didn't play nice with the video.) The slides from the brief can be viewed here.

If I understand things correctly, Massey has found that:

WP: Moving beyond Grossman

This week's post at the Washington Post's Redskins Insider site takes a look at the Redskins quarterback situation going into the 2012 off-season.


...with so many needs heading into the off-season, it might be tempting for the Redskins to think that Grossman is a medium-term solution at quarterback. But this is a trap...

“Best season ever”, “31st in the league”, and “leads the league in interceptions” is a combination that does not bode well for the future of a franchise that would consider standing pat at quarterback. 


WP: Rebuilding with Moneyball for Football

The Redskins need to restock the cupboard with talent. Here's how a team can build and sustain a winning roster.

Playing Moneyball in the NFL is about jettisoning expensive and under-producing veterans, rejecting the big-splash free agent, and stockpiling draft picks. There are two ways of generating those picks. First, you can trade away soon-to-be free agents to other teams in return for picks or allow restricted free agents to sign elsewhere in return for compensatory picks. For too long, the Redskins have been on the wrong end of those transactions.

The second way is to trade picks for more picks. Overconfidence and urgency run rife in personnel departments around the league, and smart teams can take advantage of this. There are always teams willing to overpay for a pick they are certain will immediately turn them into a Super Bowl winner. A team can sell its first-round pick for a second-round pick this year plus a first-round pick next year. In the next draft, that team will have an additional first-round pick that could be sold for another second-rounder plus another future first-rounder. Presuming there are enough buyers, a team could generate an additional second-round pick in perpetuity by forgoing its first-round pick in only one year.
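The rollover arithmetic can be checked with a short ledger. Each year, the team sells one first-round pick for a second-rounder now plus a first-rounder next year. As the text says, this assumes a willing buyer every single year, and it treats all firsts and all seconds as interchangeable. The function name is hypothetical.

```python
def simulate_pick_rollover(years):
    """Ledger of picks actually used when a team sells one first-round pick
    per year for (a second-rounder this year + a first-rounder next year).

    Returns (year, firsts_used, seconds_used) tuples. Assumes a buyer exists
    every year and that every first (and every second) is worth the same.
    """
    carried_firsts = 0
    ledger = []
    for year in range(years):
        firsts_held = 1 + carried_firsts  # own first + one acquired last year
        firsts_used = firsts_held - 1     # one first is sold this year
        seconds_used = 1 + 1              # own second + the one received
        carried_firsts = 1                # buyer's first arrives next year
        ledger.append((year, firsts_used, seconds_used))
    return ledger


for row in simulate_pick_rollover(4):
    print(row)
```

Compared with simply using its own picks (one first and one second per year), the team gives up a single first-round pick in year 0 and in exchange uses an extra second-rounder every year thereafter.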

There's one team in the league that understands this, and they've been phenomenally successful doing it:

What Happened to the First Round RB?

In the five-year period from 1970 through 1974, running backs made up 20% of all first-round NFL draft picks. That's one out of every five. As recently as the 1985-1989 period, RBs made up 19% of first-rounders. But by the most recent decade, from 2000 through 2010, RB selection was cut in half--down to about 10%. Last night, only 1 of the 32 players chosen (about 3%) was a RB, and he was chosen 28th, near the bottom of the round.

The graph below illustrates the trends in how teams favor each position over the past 41 years. Most positions are fairly stable.

Draft Needs According to 2010 EPA - Defense

Measuring defensive players is trickier than measuring offensive players for a couple of reasons. Most notably, +EPA (Positive Expected Points Added) captures half the story at best. But the idea is that the part of the story we see will correlate well with the part we don't see. In other words, "playmaking" defenders are, in most cases, making unseen impacts about as often as they make prominently visible plays.

The table below lists the 2010 regular season +EPA totals for each team by position. Although it doesn't consider free agent losses or injuries, this might be considered a good starting point for determining defensive draft needs.
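A table like this amounts to a grouped sum in which only positive plays are credited. The sketch below assumes a hypothetical play-level format: each record already names the team and the credited defender's position, and its EPA is expressed from the defense's point of view. That is an illustration of the +EPA idea, not Advanced NFL Stats' actual pipeline.

```python
from collections import defaultdict


def positive_epa_by_team_position(plays):
    """plays: iterable of (team, position, epa) tuples, EPA from the defense's
    point of view (hypothetical format). Only positive-EPA plays are credited."""
    totals = defaultdict(float)
    for team, position, epa in plays:
        if epa > 0:
            totals[(team, position)] += epa
    return dict(totals)


def table_by_position(totals):
    """Reshape {(team, position): total} into {position: {team: total}}."""
    table = defaultdict(dict)
    for (team, position), value in totals.items():
        table[position][team] = value
    return dict(table)


# Hypothetical play records.
plays = [("SF", "LB", 1.5), ("SF", "LB", -0.7), ("SF", "CB", 0.4), ("KC", "LB", 2.0)]
print(table_by_position(positive_epa_by_team_position(plays)))
```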

Unfortunately, defensive positions are not so cut-and-dried. Depending on the base scheme, there are varying numbers of DEs, DTs, and LBs. (The 49ers don't even have a DE position.) Plus, players can sometimes be designated different positions from week to week. Advanced NFL Stats ultimately classifies players according to their most frequent designation in each game's official playbook.

Draft Needs According to 2010 EPA - Offense

Looking at each team's 2010 regular season EPA broken out by position might provide a decent starting point for identifying team needs. EPA (Expected Points Added) is a statistic that measures each play's change in team net scoring potential. Aggregating the EPA production for each team by position can suggest where teams need to look to upgrade or add depth.

For example, consider a team whose production from the offensive line, quarterback, running back, and tight end positions all rank somewhere in the top third of the league. But its wide receiver production ranks in the bottom third. It should be no secret where the team should look to improve.

Although it's doubtful the stats will tell us much we don't already know about team needs, they can confirm, underscore, or possibly refute the common perceptions.

The table below lists each team's EPA rankings by position. Within each position, the top third of the league is shaded in green, and the bottom third is shaded in red. EPA stats for the QB, RB, TE, and WR positions are straightforward aggregations of each player's EPA by team. But offensive line EPA is measured indirectly, using the concept of -EPA.
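The green/red shading is a ranking split into thirds. The sketch below assumes values are oriented so that higher is better; flip signs first for any metric where that isn't true. It also assumes "a third" means the integer part of teams/3, which is 10 of 32, so the middle group holds 12. The cutoff actually used for the table may differ, and the six-team example is hypothetical.

```python
def tercile_shading(epa_by_team):
    """Rank teams by EPA (highest first): top third 'green', bottom third 'red',
    everyone else unshaded (None). Uses len // 3 teams per shaded group."""
    ranked = sorted(epa_by_team, key=epa_by_team.get, reverse=True)
    third = len(ranked) // 3
    labels = {}
    for rank, team in enumerate(ranked):
        if rank < third:
            labels[team] = "green"
        elif rank >= len(ranked) - third:
            labels[team] = "red"
        else:
            labels[team] = None
    return labels


# Hypothetical WR EPA totals for six teams.
print(tercile_shading({"A": 5.0, "B": 4.0, "C": 3.0, "D": 2.0, "E": 1.0, "F": 0.0}))
```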

Steven Pinker vs. Malcolm Gladwell and Drafting QBs

Last season you might recall a dust-up between Harvard evolutionary psychologist Steven Pinker and popular science author Malcolm Gladwell over whether teams really have any ability to predict which college QBs will pan out into good pros. You might be wondering what the heck a psychologist and a pop-science author have to do with NFL football.


In his book What the Dog Saw, Gladwell wrote about how hard it is for school administrators to distinguish the better teacher candidates from the lesser ones. Gladwell used the NFL draft to illustrate how difficult it is for anyone to predict human performance, even in a sport with ample performance metrics, where every step, throw, and catch is videotaped from 12 different angles. Gladwell was referring to what economists Dave Berri and Rob Simmons reported as a "very weak" correlation between draft order and per-play performance by QBs.

In an exchange of letters following Pinker's critical review of What the Dog Saw, Pinker took issue with Gladwell's claim that there was "no connection" between when a QB is taken in the draft and his per-play performance. Pinker wrote that this is "simply not the case."

As has been pointed out previously, the problem with the weak correlation cited by Gladwell is that it excludes players who are not judged good enough by coaches during their development to warrant much, if any, playing time. At its core, the NFL draft is a process of selection, and we should expect that selection bias will taint most attempts at analysis. Gladwell looked at the draft process and (correctly) said:

"Coaches and GMs turn out to be good decision-makers when it comes to drafting quarterbacks when you consider the fact that the quarterbacks who never played aren’t any good. And how do we know that the quarterbacks who never play aren’t any good? Because coaches and GMs are good decision-makers!”

But Gladwell's argument cuts both ways. The only way to see that coaches and GMs aren't any good at drafting QBs is to assume they're no good at choosing which QB on their roster to play in games!

In this post I'll attempt to settle the question of whether NFL scouts really have any ability to identify the better QBs. Do the QBs picked higher in the draft turn out to be better performers on a per-play basis? Is Pinker correct that they do, or is Gladwell correct that they do not?

Are Top Draft Pick QBs Any Better Than Late Round Picks?

Is quarterback performance related to where a passer is taken in the draft? It may seem like a silly question, but the answer is more complicated (and controversial) than it first seems. What if I said that the correlation between a QB’s draft rank and his career adjusted Yards Per Attempt (AdjYPA) is only -0.07? You’d think that’s amazingly small. (The correlation coefficient would be -1 if performance declined in perfect linear lockstep with draft order, and it would be 0 if there were no linear relationship at all.)

What if I told you the correlation coefficient is -0.72? That’s more like it, you’d think. But which correlation coefficient is correct? They both are.
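One way two such different coefficients can both be "correct" is selection: computing a correlation only over the QBs coaches chose to play restricts the range of talent and shrinks the coefficient. The toy simulation below illustrates that mechanism. Every parameter (noise levels, the playing-time cutoff, the sample size) is an invented assumption, and it does not reproduce the -0.07 or -0.72 figures or the author's method. It also pretends per-play performance is observable for benchwarmers, which it isn't in real data.

```python
import random


def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def simulate(n_qbs=3000, seed=7):
    """Toy model: scouts rank on a noisy read of talent; coaches (who see
    players in practice) only give playing time to clearly talented QBs."""
    rng = random.Random(seed)
    talent = [rng.gauss(0, 1) for _ in range(n_qbs)]
    signal = [t + rng.gauss(0, 1) for t in talent]            # scouting read
    order = sorted(range(n_qbs), key=lambda i: signal[i], reverse=True)
    draft_rank = [0] * n_qbs
    for rank, i in enumerate(order, start=1):                 # 1 = first pick
        draft_rank[i] = rank
    perf = [t + rng.gauss(0, 0.5) for t in talent]            # per-play output
    gets_snaps = [t + rng.gauss(0, 0.3) > 1.0 for t in talent]
    played = [i for i in range(n_qbs) if gets_snaps[i]]
    r_all = pearson(draft_rank, perf)
    r_played = pearson([draft_rank[i] for i in played], [perf[i] for i in played])
    return r_all, r_played, len(played)


r_all, r_played, n_played = simulate()
print(f"all QBs: {r_all:.2f}   only QBs who played ({n_played}): {r_played:.2f}")
```

Both coefficients come from the same simulated draft. The one computed only on players who earned snaps is far closer to zero, even though draft order tracks talent identically in both groups.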

Rethinking the Massey-Thaler Draft Study

Economists Richard Thaler and Cade Massey authored a widely read research paper analyzing the value of NFL draft picks, and they've recently published an updated version. The paper's primary finding was that teams are overconfident in their ability to choose the best players. In essence, the very top picks are overvalued relative to later picks, both in terms of what teams are willing to trade to move up in the draft and in terms of salary.

Recently Richard Thaler penned an article for the NYT discussing his paper. But puzzlingly, he goes on to make a claim that clearly contradicts his own research. He writes, "it makes absolutely no sense to be giving so much money to unproven rookies, many of whom turn out to be busts." Further, he writes, "veteran players would probably agree with the principle that eight-figure salaries should be reserved for players who have already proved themselves on the field."

Are Safeties Risky Top Picks?

One of the Chiefs' most dire needs this off-season is a dynamic safety, but GM Scott Pioli is reluctant to take a safety with the 5th pick in the draft. Falcons GM Thomas Dimitroff is apparently on the same wavelength. There seems to be a growing conventional wisdom that safeties are high-risk picks at the top of the draft. As Peter King pointed out recently, the three best safeties of the decade--Ed Reed, Bob Sanders, and Troy Polamalu--have missed 78 games due to injuries in their combined 21 NFL seasons.

The thinking is that safety (ironically) is a fundamentally dangerous position. The nature of the position, launching head-first at high rates of closure toward oncoming ball carriers, may carry a systematically higher risk of injury than most other positions. Reed, Polamalu, and Sanders suggest this may be the case, but a sample size of three is small to say the least. Are Pioli and Dimitroff rightfully concerned?

Are Rookies Overpaid?

I recently looked at what might explain why the top draft picks are paid disproportionately to their expected performance compared to later picks. But that doesn't address the larger issue--are rookies overpaid compared to their veteran counterparts?

A 2005 research paper called The Loser's Curse by economists Cade Massey and Richard Thaler tackled that question. In a nutshell, the paper compares rookie pay to the pay of a 6th-year veteran who could be expected to deliver the same performance as a rookie from each slot in the draft. (Performance is defined by a mix of measures including being on a team roster, starts, and Pro Bowls.)

The conclusion of the paper is that team executives and scouts overpay for the top picks in the draft relative to the later picks, likely due to overconfidence in their ability to identify the best players. But what might surprise some readers is that rookies at every level of the draft are bargains compared to equivalently performing veterans.

This graph from the paper is the study's bottom line. The red 'compensation' line is the average annual pay for each draft pick. The blue 'performance' line is the salary a team would have to pay a 6-year veteran free agent for the same expected performance. The green 'surplus' line is the difference between the two pay levels.


The surplus value peaks shallowly at the bottom of the first round and through the second round. That's where teams get the biggest bang for the buck. But still, the surplus is strongly positive throughout the entire draft. According to Massey and Thaler, rookies are a bargain compared to veterans.
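The surplus line is just the performance line minus the compensation line, pick by pick, and the "sweet spot" is wherever that difference is largest. A minimal sketch with invented values for four picks (not the paper's data):

```python
def surplus_curve(performance_value, compensation):
    """Surplus at each pick = veteran-equivalent value minus rookie pay."""
    return [p - c for p, c in zip(performance_value, compensation)]


def peak_pick(surplus):
    """1-based pick number with the largest surplus."""
    return max(range(len(surplus)), key=surplus.__getitem__) + 1


# Hypothetical values (e.g. $M per year) for picks 1-4.
performance = [5.0, 4.6, 4.3, 4.1]
pay = [4.5, 3.8, 3.2, 3.1]
surplus = surplus_curve(performance, pay)
print([round(s, 2) for s in surplus], "peak at pick", peak_pick(surplus))
```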

There's a good explanation why rookies would be underpaid. Veterans are known quantities while there is a tremendous amount of uncertainty with draft picks. Think of it this way--Peyton Manning has been to nine Pro Bowls and Ryan Leaf to zero, for an average of 4.5 between the two players. Four and a half Pro Bowls--that's not bad. But would a GM pay more for a guaranteed 4.5 Pro-Bowl-type player or for a 50/50 shot between a total bust and Hall of Famer? Just about every modern economic and psychological theory tells us that people will pay a premium for the sure average.
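The "premium for the sure average" can be made concrete with a certainty equivalent: the sure outcome that a risk-averse decision-maker values as much as the gamble. Using a square-root utility over Pro Bowls (purely an illustrative assumption, not something from the paper), the 50/50 Manning-or-Leaf gamble has an expected value of 4.5 Pro Bowls but is worth only as much as a sure 2.25.

```python
import math


def expected_value(outcomes):
    """outcomes: (probability, pro_bowls) pairs."""
    return sum(p * x for p, x in outcomes)


def certainty_equivalent(outcomes, utility=math.sqrt, inverse=lambda u: u * u):
    """Sure outcome with the same expected utility as the gamble.
    Square-root utility (concave = risk-averse) is an illustrative assumption."""
    expected_utility = sum(p * utility(x) for p, x in outcomes)
    return inverse(expected_utility)


manning_or_leaf = [(0.5, 0), (0.5, 9)]
ev = expected_value(manning_or_leaf)
ce = certainty_equivalent(manning_or_leaf)
print(ev, ce, "risk premium:", ev - ce)
```

Under this assumption, a sure 4.5-Pro-Bowl player is worth noticeably more to the GM than the coin flip, which is the gap veterans can command over rookies.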

Unfortunately, that's not an option in the draft. Peyton Leaf just doesn't exist. But 6-year veterans do, and GMs will be willing to pay a premium for the reduced uncertainty in performance.

One note of caution on the paper: the draft years studied were 2000-2002, and rookie salaries have increased substantially since then. But veteran salaries have too. The question is whether rookie pay increases have outpaced veteran pay increases. However, rookie pay would need to have increased more than 15-20% faster than veteran pay to change the conclusions of the paper.