Adam Arians
Regression Candidate
Session: Fall 2006
12/11/10
Introduction
The QB rating is heavily used to value quarterbacks at every level. Often, millions of dollars are tied directly to these ratings. There are 3 variations of the current formula, depending on the league (NCAA, high school, NFL, AFL). However, all 3 use a linear multivariable formula with the same 4 variables: completions per attempt, touchdowns per attempt, interceptions per attempt, and yards per attempt.
The purpose of this project is to develop a new NFL QB rating, called the “QB Win Rating”, using linear regression. The goal is to tie a QB’s rating more closely to winning. Individual QB statistics for NFL teams will be regressed against the winning percentage of their teams.
A strong predictor of wins is not expected since the QB’s performance is only one contributing factor in the game. Defense, special teams and rushing offense are also very important.
Data
Team passing data for all 32 NFL teams was taken from the 2008 and 2009 seasons (source: http://espn.go.com/nfl/statistics/team/_/stat/passing/year), giving 64 team-season win totals, each from a 16-game season. Win% will initially be regressed against these variables:
Response Variable: Win% = Wins in a Season/16
Additional notes:
Initial Model:
Win% = α + β1Comp + β2YSack + β3TDs + β4Yards + β5INTs + β6Sacks
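As a rough illustration of how a model of this form can be fitted, the sketch below solves the least-squares normal equations in plain Python. The helper names (`solve`, `fit_ols`) and the five input rows are hypothetical and are not the ESPN team data; the response is generated from a known line so the recovered coefficients can be checked.

```python
# Minimal ordinary least squares via the normal equations (X'X) b = X'y.
# The rows below are made-up illustrative values, NOT the ESPN team data.

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def fit_ols(rows, y):
    """Return [intercept, slope_1, ..., slope_k] minimizing squared error."""
    X = [[1.0] + list(r) for r in rows]  # prepend the intercept column
    k = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)

# Hypothetical per-attempt inputs: (yards, interceptions)
rows = [(6.0, 0.030), (7.0, 0.025), (8.0, 0.040), (6.5, 0.020), (7.5, 0.035)]
# Response built from a known line so the fit can be verified
y = [-0.2 + 0.1 * yds - 4.0 * ints for yds, ints in rows]
coefs = fit_ols(rows, y)  # approximately [-0.2, 0.1, -4.0]
```

With the real data, each row would hold the six per-team statistics and `y` would hold the 64 Win% values.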
Model #1
Considering a QB’s passing ability is only 1 critical factor in winning a football game, the initial model does a good job of explaining the variability of winning with R2 = 0.541.
Win% = 0.087 + (0.057)Comp + (0.151)YSack + (1.251)TDs + (0.095)Yards + (-5.689)INTs + (-2.461)Sacks
Regression Statistics

| | α | Comp | YSack | TDs | Yards | INTs | Sacks |
|---|---|---|---|---|---|---|---|
| Coefficient | 0.0868 | 0.0567 | 0.1507 | 1.2513 | 0.0948 | -5.6891 | -2.4613 |
| Standard Error | 0.4114 | 0.7320 | 0.3694 | 2.8613 | 0.0446 | 2.5379 | 2.6833 |
| t Stat | 0.2111 | 0.0775 | 0.4079 | 0.4373 | 2.1272 | -2.2417 | -0.9173 |
| P-value | 0.8336 | 0.9385 | 0.6849 | 0.6635 | 0.0377 | 0.0289 | 0.3629 |

| R2 | SEy | F stat | df | SSreg | SSresid |
|---|---|---|---|---|---|
| 0.54075 | 0.144427 | 11.18591 | 57 | 1.399962 | 1.188964 |
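The summary statistics for Model #1 are tied together by standard identities, which gives a quick consistency check on the reported numbers. The short sketch below recomputes R2, the F statistic, and SEy from the reported SSreg, SSresid, and df; the variable names are illustrative only.

```python
# Values as reported in the Model #1 summary statistics
ss_reg = 1.399962    # regression sum of squares
ss_resid = 1.188964  # residual sum of squares
df_resid = 57        # residual degrees of freedom (64 observations - 6 factors - 1)
k = 6                # number of factors in Model #1

# R2 is the share of total variation explained by the regression
r_squared = ss_reg / (ss_reg + ss_resid)

# F compares explained variance per factor with residual variance per df
f_stat = (ss_reg / k) / (ss_resid / df_resid)

# Standard error of the Win% estimate
se_y = (ss_resid / df_resid) ** 0.5
```

These reproduce the reported 0.54075, 11.18591, and 0.144427 to within rounding.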
Based on the p-values, completion % (Comp) is the weakest predictor of winning %, with the highest p-value (0.9385). At first this may seem surprising, because it is one of the variables in the existing QB ratings. However, it is likely to be highly correlated with the yardage variable (also in the existing QB ratings). Yards per attempt (plus sacks) and interceptions per attempt have the lowest p-values (0.0377 and 0.0289) and are thus the best predictors of winning %.
Comp will be removed for the next regression.
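The elimination step used throughout this report can be sketched as follows: each t statistic is the coefficient divided by its standard error, and because every factor in a given model shares the same residual degrees of freedom, the factor with the smallest |t| is also the one with the largest p-value. The dictionary layout is an assumption for illustration; the numbers are the Model #1 coefficients and standard errors.

```python
# (coefficient, standard error) pairs from the Model #1 regression output
model_1 = {
    "Comp": (0.0567, 0.7320),
    "YSack": (0.1507, 0.3694),
    "TDs": (1.2513, 2.8613),
    "Yards": (0.0948, 0.0446),
    "INTs": (-5.6891, 2.5379),
    "Sacks": (-2.4613, 2.6833),
}

# t statistic = coefficient / standard error
t_stats = {name: coef / se for name, (coef, se) in model_1.items()}

# The factor with the smallest |t| has the largest p-value and is dropped next
weakest = min(t_stats, key=lambda name: abs(t_stats[name]))
```

Here `weakest` is `"Comp"`, matching the choice made above. The recomputed t values can differ from the reported ones in the third decimal place because the inputs are rounded to four decimals.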
Model #2
Reducing the model down to 5 factors predicts Winning % just as well as the previous model. R2 = 0.541.
Win% = 0.111 + (0.151)YSack + (1.282)TDs + (0.096)Yards + (-5.756)INTs + (-2.487)Sacks
Regression Statistics

| | α | YSack | TDs | Yards | INTs | Sacks |
|---|---|---|---|---|---|---|
| Coefficient | 0.1114 | 0.1506 | 1.2823 | 0.0964 | -5.7555 | -2.4872 |
| Standard Error | 0.2602 | 0.3663 | 2.8089 | 0.0389 | 2.3682 | 2.6395 |
| t Stat | 0.4282 | 0.4113 | 0.4565 | 2.4814 | -2.4303 | -0.9423 |
| P-value | 0.6701 | 0.6824 | 0.6497 | 0.0160 | 0.0182 | 0.3499 |

| R2 | SEy | F stat | df | SSreg | SSresid |
|---|---|---|---|---|---|
| 0.540702 | 0.143184 | 13.65592 | 58 | 1.399837 | 1.189089 |
Based on the p-values, interceptions per attempt and yards per attempt are still the strongest predictors. Yards lost from sacks (YSack) will be removed next, since it has the highest p-value of the remaining factors (0.6824). This is expected, since it is likely to be highly correlated with sacks per attempt.
Model #3
Reducing the model to 4 factors predicts Winning % just as well as the previous model. R2 = 0.539.
Win% = 0.097 + (1.238)TDs + (0.098)Yards + (-5.616)INTs + (-1.503)Sacks
Regression Statistics

| | α | TDs | Yards | INTs | Sacks |
|---|---|---|---|---|---|
| Coefficient | 0.0965 | 1.2379 | 0.0983 | -5.6160 | -1.5029 |
| Standard Error | 0.2558 | 2.7870 | 0.0383 | 2.3273 | 1.1056 |
| t Stat | 0.3774 | 0.4442 | 2.5646 | -2.4132 | -1.3593 |
| P-value | 0.7072 | 0.6585 | 0.0129 | 0.0189 | 0.1792 |

| R2 | SEy | F stat | df | SSreg | SSresid |
|---|---|---|---|---|---|
| 0.539362 | 0.142172 | 17.27081 | 59 | 1.396369 | 1.192557 |
Yards per attempt and interceptions per attempt remain as the strongest predictors of Winning %. Touchdowns per attempt will be removed next since it is the weakest predictor.
Model #4
Reducing the model to 3 factors predicts Winning % just as well as the previous model. R2 = 0.538.
Win% = 0.065 + (0.11)Yards + (-5.555)INTs + (-1.466)Sacks
Regression Statistics

| | α | Yards | INTs | Sacks |
|---|---|---|---|---|
| Coefficient | 0.0650 | 0.1104 | -5.5549 | -1.4660 |
| Standard Error | 0.2441 | 0.0268 | 2.3076 | 1.0951 |
| t Stat | 0.2661 | 4.1222 | -2.4072 | -1.3387 |
| P-value | 0.7911 | 0.0001 | 0.0192 | 0.1857 |

| R2 | SEy | F stat | df | SSreg | SSresid |
|---|---|---|---|---|---|
| 0.537822 | 0.141218 | 23.27336 | 60 | 1.392381 | 1.196545 |
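Yards per attempt and interceptions per attempt have p-values below 0.05, making them the strongest predictors of winning %. Sacks per attempt has a noticeably lower p-value than any factor removed so far (0.1857), but it is not significant at the 0.05 level. I will remove Sacks per attempt for model 5.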
Model #5
Reducing the model to only 2 factors lowers R2 more than any previous model change: R2 = 0.524.
Win% = -0.162 + (0.131)Yards + (-5.07)INTs
Regression Statistics

| | α | Yards | INTs |
|---|---|---|---|
| Coefficient | -0.1620 | 0.1308 | -5.0703 |
| Standard Error | 0.1767 | 0.0222 | 2.2938 |
| t Stat | -0.9166 | 5.8949 | -2.2105 |
| P-value | 0.3630 | 0.0000 | 0.0308 |

| R2 | SEy | F stat | df | SSreg | SSresid |
|---|---|---|---|---|---|
| 0.524018 | 0.142131 | 33.57806 | 61 | 1.356644 | 1.232282 |
The remaining 2 factors both have p-values below 0.05 (0.0000 for Yards and 0.0308 for INTs), so no more factors will be removed.
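The effect of each elimination step can be summarized from the reported R2 values. The sketch below computes the drop in R2 at each step and, as an extra check that is not part of the original analysis, a partial F statistic for removing Sacks between Model #4 and Model #5. When a single factor is dropped, that partial F should equal the square of the factor's t statistic in the larger model. Variable names are illustrative only.

```python
# R2 as reported for Models #1 through #5
r2 = {1: 0.54075, 2: 0.540702, 3: 0.539362, 4: 0.537822, 5: 0.524018}

# Drop in R2 caused by each elimination step (keyed by the resulting model)
drops = {m: r2[m - 1] - r2[m] for m in range(2, 6)}
largest_drop_model = max(drops, key=drops.get)

# Partial F for removing Sacks: change in SSresid over Model #4's residual variance
ss_resid_4, df_4 = 1.196545, 60
ss_resid_5 = 1.232282
partial_f = (ss_resid_5 - ss_resid_4) / (ss_resid_4 / df_4)

# With one factor removed, partial F should match t squared for Sacks in Model #4
t_sacks_model_4 = -1.3387
t_squared = t_sacks_model_4 ** 2
```

The largest drop occurs at Model #5 (about 0.014, versus less than 0.002 for every earlier step), and `partial_f` and `t_squared` both come out near 1.79.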
Model #4 Predicted Win% vs Actual Win% Plot
Conclusion
If a “QB Win Rating” is to be used as a statistical metric for QBs, I recommend using model #4. This model has only 3 factors (Yards per attempt, Interceptions per attempt, and Sacks per attempt) and still explains 53.8% of the variability in team wins:
QB Win Rating* = [0.065 + (0.11)Yards + (-5.555)INTs + (-1.466)Sacks] x 100
*Winning % response variable is multiplied by 100 to get non-decimal “QB Win Rating”
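As a small worked example, the rating formula above can be written as a function. The function name, parameter names, and the sample inputs are hypothetical; the inputs are assumed to be per-attempt rates defined the same way as in the regression data, which this report does not spell out exactly.

```python
def qb_win_rating(yards_per_att, ints_per_att, sacks_per_att):
    """Model #4 predicted Win%, scaled by 100 (coefficients from the formula above)."""
    win_pct = (
        0.065
        + 0.110 * yards_per_att
        - 5.555 * ints_per_att
        - 1.466 * sacks_per_att
    )
    return 100 * win_pct

# Hypothetical season line: 7.0 yards, 0.03 interceptions, 0.06 sacks per attempt
example = qb_win_rating(7.0, 0.03, 0.06)
```

For these made-up inputs the rating is about 58.0, which corresponds to a predicted winning percentage of roughly 58%.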
An R2 this high is remarkable given that a QB is involved in fewer than 50% of the plays during a football game. It could reflect a correlation between offensive and defensive performance, or the importance of the quarterback position.
This regression works well for NFL quarterbacks, but if it were expanded to high school and college, I expect that the model would not be as good a predictor. In the NFL, most quarterbacks pass much more than they run, so passing factors give a good predictor of winning. In high school and college, however, it is much more common for quarterbacks to be athletic and run for more yards. Running statistics such as “Rush Yards per attempt” are likely to be important factors in a regression model with winning as the response variable.