Adam Arians
Regression Candidate
Session: Fall 2006
12/11/10
Introduction
The QB rating is widely used to value quarterbacks at every level, and millions of dollars are often tied directly to these ratings. There are 3 variations of the current formula, depending on the league (NCAA, high school, NFL, AFL). However, all 3 use a linear multivariable formula with the same 4 variables: completions per attempt, touchdowns per attempt, interceptions per attempt, and yards per attempt.
The purpose of this project is to develop a new NFL QB rating, called the “QB Win Rating”, using linear regression. The goal is to associate a QB’s performance more closely with winning: individual QB statistics for NFL teams will be regressed against the winning percentage of their teams.
A strong predictor of wins is not expected since the QB’s performance is only one contributing factor in the game. Defense, special teams and rushing offense are also very important.
Data
Team-level QB passing data was taken for all 32 NFL teams from the 2008 and 2009 seasons (source: http://espn.go.com/nfl/statistics/team/_/stat/passing/year), giving 64 win totals, each from a 16-game season. Win% will initially be regressed against these variables:
Response Variable: Win% = Wins in a Season/16
Additional notes:
Initial Model:
Win% = α + β_{1}Comp + β_{2}YSack + β_{3}TDs + β_{4}Yards + β_{5}INTs + β_{6}Sacks
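The report does not say which tool produced the fits. As a minimal sketch of what fitting a model of this form involves, the Python below solves the least-squares normal equations directly. The function name fit_ols, the two-predictor setup, and the data are all hypothetical: the data is made up and generated from known coefficients so the fit can be checked. It is not the ESPN data used in this project.

```python
def fit_ols(rows, y):
    """Least-squares fit of y = a + b1*x1 + ... + bk*xk via the normal equations."""
    X = [[1.0] + list(r) for r in rows]  # prepend a column of 1s for the intercept
    p = len(X[0])
    # Build X'X and X'y.
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    # Solve (X'X) b = X'y by Gauss-Jordan elimination with partial pivoting.
    aug = [xtx[i] + [xty[i]] for i in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(p):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][p] for i in range(p)]  # [intercept, b1, ..., bk]


# Made-up rows of (yards per attempt, interceptions per attempt); NOT real team data.
rows = [(6.0, 0.030), (7.0, 0.025), (5.5, 0.040), (8.0, 0.020), (6.5, 0.035)]
# Response generated from known, arbitrary coefficients so the fit is verifiable.
true_coefs = [0.2, 0.05, 2.0]
y = [true_coefs[0] + true_coefs[1] * x1 + true_coefs[2] * x2 for x1, x2 in rows]

coefs = fit_ols(rows, y)
print([round(c, 6) for c in coefs])
```

Because the made-up response contains no noise, the fitted coefficients should match the ones used to generate it, which is a quick sanity check on the solver. With real data, the same call would take one row of predictor values per team-season and the list of Win% values.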
Model #1
Considering that a QB’s passing ability is only 1 critical factor in winning a football game, the initial model does a good job of explaining the variability in winning, with R^{2} = 0.541.
Win% = 0.087 + (0.057)Comp + (0.151)YSack + (1.251)TDs + (0.095)Yards + (5.689)INTs + (2.461)Sacks
Regression Statistics

| | α | Comp | YSack | TDs | Yards | INTs | Sacks |
|---|---|---|---|---|---|---|---|
| Coefficient | 0.0868 | 0.0567 | 0.1507 | 1.2513 | 0.0948 | 5.6891 | 2.4613 |
| Standard Error | 0.4114 | 0.7320 | 0.3694 | 2.8613 | 0.0446 | 2.5379 | 2.6833 |
| t Stat | 0.2111 | 0.0775 | 0.4079 | 0.4373 | 2.1272 | 2.2417 | 0.9173 |
| P-value | 0.8336 | 0.9385 | 0.6849 | 0.6635 | 0.0377 | 0.0289 | 0.3629 |

| R^{2} | SE_{y} | F stat | df | SS_{reg} | SS_{resid} |
|---|---|---|---|---|---|
| 0.54075 | 0.144427 | 11.18591 | 57 | 1.399962 | 1.188964 |

Based on the p-values, Completion % is the weakest predictor of Winning % (p = 0.9385). At first this may seem surprising, because it is one of the variables in the existing QB ratings. However, it is likely to be highly correlated with the yardage variable (also in the existing QB ratings). Yards per attempt (plus sacks) and interceptions per attempt have the lowest p-values (0.0377 and 0.0289), making them the strongest predictors of Winning %.
Comp will be removed for the next regression.
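The removal decision can be reproduced from the Model #1 table. The sketch below recomputes each t Stat as coefficient divided by standard error and picks the factor with the smallest absolute value. Since every factor in a given model shares the same degrees of freedom, the smallest |t| corresponds to the largest p-value. The function names are illustrative; the numbers are the ones reported for Model #1.

```python
# (coefficient, standard error) as reported for Model #1; the intercept is
# left out because it is not a candidate for removal.
model1 = {
    "Comp":  (0.0567, 0.7320),
    "YSack": (0.1507, 0.3694),
    "TDs":   (1.2513, 2.8613),
    "Yards": (0.0948, 0.0446),
    "INTs":  (5.6891, 2.5379),
    "Sacks": (2.4613, 2.6833),
}


def t_stats(table):
    """t Stat for each factor: coefficient / standard error."""
    return {name: coef / se for name, (coef, se) in table.items()}


def weakest_predictor(table):
    """Factor with the smallest |t|, i.e. the largest p-value at a fixed df."""
    stats = t_stats(table)
    return min(stats, key=lambda name: abs(stats[name]))


t = t_stats(model1)
for name, value in t.items():
    print(f"{name:6s} t = {value:.4f}")
print("Remove next:", weakest_predictor(model1))
```

The recomputed t Stats agree with the table to within rounding of the published coefficients and standard errors. The same two functions can be applied to the tables for Models #2 through #4 to follow each later removal.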
Model #2
The model reduced to 5 factors predicts Winning % just as well as the previous model, with R^{2} = 0.541.
Win% = 0.111 + (0.151)YSack + (1.282)TDs + (0.096)Yards + (5.756)INTs + (2.487)Sacks
Regression Statistics

| | α | YSack | TDs | Yards | INTs | Sacks |
|---|---|---|---|---|---|---|
| Coefficient | 0.1114 | 0.1506 | 1.2823 | 0.0964 | 5.7555 | 2.4872 |
| Standard Error | 0.2602 | 0.3663 | 2.8089 | 0.0389 | 2.3682 | 2.6395 |
| t Stat | 0.4282 | 0.4113 | 0.4565 | 2.4814 | 2.4303 | 0.9423 |
| P-value | 0.6701 | 0.6824 | 0.6497 | 0.0160 | 0.0182 | 0.3499 |

| R^{2} | SE_{y} | F stat | df | SS_{reg} | SS_{resid} |
|---|---|---|---|---|---|
| 0.540702 | 0.143184 | 13.65592 | 58 | 1.399837 | 1.189089 |

Based on the p-values, interceptions per attempt and yards per attempt are still the strongest predictors. Yards lost from sacks (YSack) will be removed next, since it has the highest p-value of the remaining factors (0.6824). This is expected, since it is likely to be highly correlated with sacks per attempt.
Model #3
The model reduced to 4 factors predicts Winning % nearly as well as the previous model, with R^{2} = 0.539.
Win% = 0.097 + (1.238)TDs + (0.098)Yards + (5.616)INTs + (1.503)Sacks
Regression Statistics

| | α | TDs | Yards | INTs | Sacks |
|---|---|---|---|---|---|
| Coefficient | 0.0965 | 1.2379 | 0.0983 | 5.6160 | 1.5029 |
| Standard Error | 0.2558 | 2.7870 | 0.0383 | 2.3273 | 1.1056 |
| t Stat | 0.3774 | 0.4442 | 2.5646 | 2.4132 | 1.3593 |
| P-value | 0.7072 | 0.6585 | 0.0129 | 0.0189 | 0.1792 |

| R^{2} | SE_{y} | F stat | df | SS_{reg} | SS_{resid} |
|---|---|---|---|---|---|
| 0.539362 | 0.142172 | 17.27081 | 59 | 1.396369 | 1.192557 |

Yards per attempt and interceptions per attempt remain as the strongest predictors of Winning %. Touchdowns per attempt will be removed next since it is the weakest predictor.
Model #4
The model reduced to 3 factors predicts Winning % nearly as well as the previous model, with R^{2} = 0.538.
Win% = 0.065 + (0.11)Yards + (5.555)INTs + (1.466)Sacks
Regression Statistics

| | α | Yards | INTs | Sacks |
|---|---|---|---|---|
| Coefficient | 0.0650 | 0.1104 | 5.5549 | 1.4660 |
| Standard Error | 0.2441 | 0.0268 | 2.3076 | 1.0951 |
| t Stat | 0.2661 | 4.1222 | 2.4072 | 1.3387 |
| P-value | 0.7911 | 0.0001 | 0.0192 | 0.1857 |

| R^{2} | SE_{y} | F stat | df | SS_{reg} | SS_{resid} |
|---|---|---|---|---|---|
| 0.537822 | 0.141218 | 23.27336 | 60 | 1.392381 | 1.196545 |

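All 3 of the factors now have much lower p-values than the factors removed so far. Yards per attempt and interceptions per attempt are below 0.05, while sacks per attempt has the highest p-value of the three (0.1857). I will remove sacks per attempt for Model #5.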
Model #5
Reducing the model to only 2 factors lowers R^{2} more than any previous model change, to R^{2} = 0.524.
Win% = 0.162 + (0.131)Yards + (5.07)INTs
Regression Statistics

| | α | Yards | INTs |
|---|---|---|---|
| Coefficient | 0.1620 | 0.1308 | 5.0703 |
| Standard Error | 0.1767 | 0.0222 | 2.2938 |
| t Stat | 0.9166 | 5.8949 | 2.2105 |
| P-value | 0.3630 | 0.0000 | 0.0308 |

| R^{2} | SE_{y} | F stat | df | SS_{reg} | SS_{resid} |
|---|---|---|---|---|---|
| 0.524018 | 0.142131 | 33.57806 | 61 | 1.356644 | 1.232282 |

The remaining 2 factors both have p-values below 0.05, so no more factors will be removed.
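The summary statistics for all five models can be recomputed from the reported sums of squares. The sketch below uses the standard relationships R^{2} = SS_{reg} / (SS_{reg} + SS_{resid}), SE_{y} = sqrt(SS_{resid} / df), and F = (SS_{reg} / k) / (SS_{resid} / df), where k is the number of factors in the model. The variable and function names are illustrative; the inputs are the values from the tables above.

```python
# (name, number of factors k, residual df, SS_reg, SS_resid) as reported.
models = [
    ("Model #1", 6, 57, 1.399962, 1.188964),
    ("Model #2", 5, 58, 1.399837, 1.189089),
    ("Model #3", 4, 59, 1.396369, 1.192557),
    ("Model #4", 3, 60, 1.392381, 1.196545),
    ("Model #5", 2, 61, 1.356644, 1.232282),
]


def summarize(k, df, ss_reg, ss_resid):
    """Return (R^2, SE_y, F stat) from the sums of squares."""
    r2 = ss_reg / (ss_reg + ss_resid)
    se_y = (ss_resid / df) ** 0.5
    f_stat = (ss_reg / k) / (ss_resid / df)
    return r2, se_y, f_stat


summaries = {name: summarize(k, df, ss_reg, ss_resid)
             for name, k, df, ss_reg, ss_resid in models}
for name, (r2, se_y, f_stat) in summaries.items():
    print(f"{name}: R^2 = {r2:.4f}, SE_y = {se_y:.4f}, F = {f_stat:.2f}")

# Drop in R^2 caused by each removal step, keyed by the resulting model.
names = [m[0] for m in models]
drops = {names[i + 1]: summaries[names[i]][0] - summaries[names[i + 1]][0]
         for i in range(len(names) - 1)}
largest_drop = max(drops, key=drops.get)
print("Largest drop in R^2 occurs going to", largest_drop)
```

Comparing the step-by-step drops makes the trade-off explicit: the first three removals cost almost nothing in R^{2}, while removing sacks per attempt costs noticeably more.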
Model #4 Predicted Win% vs Actual Win% Plot
Conclusion
If a “QB Win Rating” is to be used as a statistical metric for QBs, I recommend using model #4. This model has only 3 factors (Yards per attempt, Interceptions per attempt, and Sacks per attempt) and still explains 53.8% of the variability in team wins:
QB Win Rating* = [0.065 + (0.11)Yards + (5.555)INTs + (1.466)Sacks] x 100
*Winning % response variable is multiplied by 100 to get nondecimal “QB Win Rating”
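As a small illustration, the rating formula can be written as a function. The coefficients are copied exactly as printed for Model #4. The function name, argument names, and the sample inputs are hypothetical, and the inputs are assumed to be expressed in the same units and sign conventions as the regression variables defined in the Data section.

```python
# Coefficients exactly as printed in the Model #4 equation.
INTERCEPT = 0.065
COEFFS = {"yards": 0.11, "ints": 5.555, "sacks": 1.466}


def qb_win_rating(yards, ints, sacks):
    """QB Win Rating = predicted Win% from Model #4, multiplied by 100.

    Assumption: yards, ints and sacks use the same definitions, units and
    signs as the Yards, INTs and Sacks variables in the regression.
    """
    win_pct = (INTERCEPT
               + COEFFS["yards"] * yards
               + COEFFS["ints"] * ints
               + COEFFS["sacks"] * sacks)
    return 100.0 * win_pct


# Hypothetical inputs, for illustration only (not a real quarterback's line).
example = qb_win_rating(yards=6.5, ints=0.0, sacks=0.0)
print(f"QB Win Rating: {example:.1f}")
```

Keeping the coefficients in one place makes it easy to swap in the Model #5 values if the 2-factor version is preferred.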
An R^{2} this high is remarkable given that a QB is involved in fewer than 50% of the plays in a football game. It may reflect a correlation between offensive and defensive performance, or it may reflect the importance of the quarterback position.
This regression works well for NFL quarterbacks, but I expect the model to be a weaker predictor if it is extended to high school and college. NFL quarterbacks tend to pass much more than they run, so passing factors are good predictors of winning. In high school and college, however, it is much more common for quarterbacks to be athletic and run for more yards, so running statistics such as “Rush Yards per attempt” are likely to be important factors in a regression model with winning as the response variable.