Adam Arians

Regression Candidate

Session: Fall 2006

12/11/10

Introduction

The quarterback (QB) rating is heavily used to evaluate quarterbacks at every level, and millions of dollars are often tied directly to it. There are 3 variations of the current formula, depending on the league (NCAA, high school, NFL, AFL). However, all 3 formulas combine the same 4 variables in a weighted formula: completions per attempt, touchdowns per attempt, interceptions per attempt and yards per attempt.
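For reference, the NFL variant of the existing rating can be written out directly. The sketch below follows the published NFL passer rating formula: each of the four per-attempt components is clamped to the range 0 to 2.375 before the components are averaged, so the formula is linear only between those bounds. The function and argument names are illustrative.

```python
def nfl_passer_rating(comp: int, att: int, yds: int, td: int, ints: int) -> float:
    """NFL passer rating from passing totals (illustrative helper)."""

    def clamp(value: float) -> float:
        # Each component is bounded below by 0 and above by 2.375.
        return max(0.0, min(2.375, value))

    a = clamp((comp / att - 0.3) * 5)   # completions per attempt
    b = clamp((yds / att - 3) * 0.25)   # yards per attempt
    c = clamp(td / att * 20)            # touchdowns per attempt
    d = clamp(2.375 - ints / att * 25)  # interceptions per attempt
    return (a + b + c + d) / 6 * 100


# A stat line that maxes out every component reaches the ceiling of 158.3.
print(round(nfl_passer_rating(20, 20, 300, 5, 0), 1))  # prints 158.3
```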

The purpose of this project is to develop a new NFL QB rating, called the “QB win rating”, using linear regression, with the aim of tying the rating more closely to winning. Each NFL team’s QB (passing) statistics will be regressed against that team’s winning percentage.

A strong predictor of wins is not expected since the QB’s performance is only one contributing factor in the game. Defense, special teams and rushing offense are also very important.

Data

NFL team passing (QB) data was taken from the 2008 and 2009 seasons for all 32 teams (source: http://espn.go.com/nfl/statistics/team/_/stat/passing/year), giving 64 win totals, each from a 16-game season. Win% is initially regressed against these variables:

Comp = Completions/(Pass Attempts + Sacks)
Yards = Yards/(Pass Attempts + Sacks)
TDs = Touchdowns/(Pass Attempts + Sacks)
INTs = Interceptions/(Pass Attempts + Sacks)
Sacks = Sacks/(Pass Attempts + Sacks)
YSack = Yards Lost from Sacks/(Pass Attempts + Sacks)

Response Variable: Win% = Wins in a Season/16
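The six predictors and the response can be computed from a team’s season totals as sketched below. The function names and the sample totals are hypothetical (they are not taken from the ESPN data); the point is only that every predictor shares the same Attempts + Sacks denominator.

```python
def dropback_rates(completions, attempts, yards, touchdowns,
                   interceptions, sacks, sack_yards):
    """Per-(attempt + sack) predictors as defined in the Data section."""
    dropbacks = attempts + sacks  # shared denominator for every predictor
    return {
        "Comp": completions / dropbacks,
        "Yards": yards / dropbacks,
        "TDs": touchdowns / dropbacks,
        "INTs": interceptions / dropbacks,
        "Sacks": sacks / dropbacks,
        "YSack": sack_yards / dropbacks,
    }


def win_pct(wins, games=16):
    """Response variable: wins in a 16-game season divided by 16."""
    return wins / games


# Hypothetical season totals for one team.
rates = dropback_rates(completions=300, attempts=480, yards=3600,
                       touchdowns=24, interceptions=12, sacks=20, sack_yards=140)
print(rates["Yards"], win_pct(10))  # prints 7.2 0.625
```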

Additional notes:

QB rushing data was left out, but could be a significant variable to the QB win rating.
“Attempts + Sacks” was used as the denominator rather than just “Attempts”, which other QB ratings use. In the NFL, an incomplete pass is usually much better for the offense than a sack: the team loses no yardage on an incompletion, but it does on a sack. It is therefore counterintuitive for a QB rating to decrease on an incompletion but stay flat on a sack, which is what an attempts-only denominator does.
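A small numerical check of this argument, using hypothetical totals: with an attempts-only denominator, one extra sack leaves the completion rate unchanged while one extra incompletion lowers it; with Attempts + Sacks, both lower it.

```python
def completion_rates(completions, attempts, sacks):
    """Return (per-attempt, per-(attempt + sack)) completion rates."""
    return completions / attempts, completions / (attempts + sacks)


base = completion_rates(20, 30, 2)                # hypothetical totals
after_incompletion = completion_rates(20, 31, 2)  # one more incomplete pass
after_sack = completion_rates(20, 30, 3)          # one more sack

print(after_incompletion[0] < base[0])  # True: attempts-only rate drops
print(after_sack[0] == base[0])         # True: attempts-only rate ignores the sack
print(after_sack[1] < base[1])          # True: attempts + sacks rate drops
```

Under the Attempts + Sacks denominator, a sack and an incompletion lower Comp by the same amount; the extra cost of the sack is picked up by the other per-dropback terms, such as YSack.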

Initial Model:

Win% = α + β₁Comp + β₂YSack + β₃TDs + β₄Yards + β₅INTs + β₆Sacks
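Each model below is summarized by its coefficients (with standard errors and t statistics) and by R², the standard error of the estimate (SE_y), the F statistic, the residual degrees of freedom and the regression and residual sums of squares. The document does not say which software produced these tables; the sketch below is one way to obtain the same quantities (apart from the p-values) by ordinary least squares with NumPy, assuming the 64 team rows are already loaded into arrays. The helper name `fit_ols` and the synthetic demo data are hypothetical.

```python
import numpy as np


def fit_ols(X, y):
    """OLS with an intercept; returns the statistics reported in the tables."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]                       # single predictor as one column
    y = np.asarray(y, dtype=float)
    n, k = X.shape
    A = np.column_stack([np.ones(n), X])     # first column is the intercept α
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    fitted = A @ coef
    resid = y - fitted
    df_resid = n - k - 1
    ss_resid = float(resid @ resid)
    ss_reg = float(((fitted - y.mean()) ** 2).sum())
    mse = ss_resid / df_resid
    se = np.sqrt(np.diag(mse * np.linalg.inv(A.T @ A)))  # coefficient std. errors
    return {
        "coef": coef, "se": se, "t": coef / se,
        "r2": ss_reg / (ss_reg + ss_resid),
        "se_y": float(np.sqrt(mse)),
        "f": (ss_reg / k) / mse,
        "df": df_resid, "ss_reg": ss_reg, "ss_resid": ss_resid,
    }


# Synthetic demo data (not the NFL data set).
rng = np.random.default_rng(0)
x = rng.normal(size=50)
y = 1.5 + 0.8 * x + rng.normal(scale=0.3, size=50)
fit = fit_ols(x, y)
print(fit["coef"], fit["r2"], fit["df"])
```

For Model #1 the design matrix would be the six predictor columns in the order of the equation above (Comp, YSack, TDs, Yards, INTs, Sacks), with Win% as `y`.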

Model #1

Considering that a QB’s passing ability is only one critical factor in winning a football game, the initial model does a good job of explaining the variability in winning, with R² = 0.541.

Win% = 0.087 + (0.057)Comp + (0.151)YSack + (1.251)TDs + (0.095)Yards + (-5.689)INTs + (-2.461)Sacks

Regression Statistics
	α	Comp	YSack	TDs	Yards	INTs	Sacks
Coefficient	0.0868	0.0567	0.1507	1.2513	0.0948	-5.6891	-2.4613
Standard Error	0.4114	0.7320	0.3694	2.8613	0.0446	2.5379	2.6833
t Stat	0.2111	0.0775	0.4079	0.4373	2.1272	-2.2417	-0.9173
P-value	0.8336	0.9385	0.6849	0.6635	0.0377	0.0289	0.3629

	R²	SE_y	F stat	df	SS_reg	SS_resid
	0.54075	0.144427	11.18591	57	1.399962	1.188964
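The summary row can be cross-checked by hand: with n = 64 teams and k predictors, df = n - k - 1, R² = SS_reg / (SS_reg + SS_resid), SE_y = √(SS_resid / df) and F = (SS_reg / k) / (SS_resid / df). The sketch below recomputes Model #1’s summary values from its two sums of squares; the helper name is hypothetical.

```python
import math


def summary_stats(ss_reg, ss_resid, n, k):
    """Recompute df, R², SE_y and F from the sums of squares (intercept model)."""
    df = n - k - 1                       # residual degrees of freedom
    r2 = ss_reg / (ss_reg + ss_resid)
    se_y = math.sqrt(ss_resid / df)      # standard error of the estimate
    f = (ss_reg / k) / (ss_resid / df)
    return df, r2, se_y, f


df, r2, se_y, f = summary_stats(1.399962, 1.188964, n=64, k=6)
print(df, round(r2, 5), se_y, f)
```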

Based on the p-values, Comp is the weakest predictor of Winning % once the other variables are in the model (p = 0.94). At first this may seem surprising, because completion percentage is one of the variables in the existing QB ratings. However, Comp is likely to be highly correlated with the yardage variable (also in the existing QB ratings), so it adds little information once Yards is included. Yards per attempt (plus sacks) and interceptions per attempt have the lowest p-values and are therefore the strongest predictors of Winning %.

Comp will be removed for the next regression.
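The p-values behind these decisions are consistent with two-sided tests of each t statistic against Student’s t distribution with the residual df. The pure-Python sketch below computes that p-value through the regularized incomplete beta function, evaluated with the standard continued-fraction method (modified Lentz, as in Numerical Recipes); in practice a statistics library such as SciPy (`scipy.stats.t.sf`) would normally be used instead. The function names are hypothetical.

```python
from math import exp, lgamma, log


def _beta_continued_fraction(a, b, x, max_iter=300, eps=3e-14):
    """Continued fraction for the regularized incomplete beta (modified Lentz)."""
    tiny = 1e-300
    c = 1.0
    d = 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the continued fraction.
        aa = m * (b - m) * x / ((a - 1.0 + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step of the continued fraction.
        aa = -(a + m) * (a + b + m) * x / ((a + m2) * (a + 1.0 + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def regularized_incomplete_beta(a, b, x):
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = exp(lgamma(a + b) - lgamma(a) - lgamma(b) + a * log(x) + b * log(1.0 - x))
    # Use the symmetry relation on the side where the fraction converges fastest.
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_continued_fraction(a, b, x) / a
    return 1.0 - front * _beta_continued_fraction(b, a, 1.0 - x) / b


def two_sided_p(t_stat, df):
    """P(|T| >= |t_stat|) for Student's t with df degrees of freedom."""
    return regularized_incomplete_beta(df / 2.0, 0.5, df / (df + t_stat * t_stat))


print(round(two_sided_p(2.1272, 57), 4))  # Yards t stat in Model #1 (df = 57)
```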

Model #2

Reducing the model to 5 factors predicts Winning % just as well as the previous model (R² = 0.541).

Win% = 0.111 + (0.151)YSack + (1.282)TDs + (0.096)Yards + (-5.756)INTs + (-2.487)Sacks

Regression Statistics
	α	YSack	TDs	Yards	INTs	Sacks
Coefficient	0.1114	0.1506	1.2823	0.0964	-5.7555	-2.4872
Standard Error	0.2602	0.3663	2.8089	0.0389	2.3682	2.6395
t Stat	0.4282	0.4113	0.4565	2.4814	-2.4303	-0.9423
P-value	0.6701	0.6824	0.6497	0.0160	0.0182	0.3499

	R²	SE_y	F stat	df	SS_reg	SS_resid
	0.540702	0.143184	13.65592	58	1.399837	1.189089

Based on the p-values, interceptions per attempt and yards per attempt are still the strongest predictors. Yards lost from sacks (YSack) will be removed next, since it has the highest p-value (0.68). This is expected, because YSack is likely to be highly correlated with Sacks per attempt.
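Both removals so far are explained by expected collinearity (Comp with Yards, YSack with Sacks). That expectation can be checked directly by computing the pairwise Pearson correlations of the predictor columns. The NumPy sketch below does this, with toy columns standing in for the 64 team rows; the helper name and the toy data are hypothetical.

```python
import numpy as np


def pairwise_correlations(columns):
    """Pearson correlation for every pair of predictor columns."""
    names = list(columns)
    matrix = np.corrcoef(np.vstack([np.asarray(columns[n], dtype=float) for n in names]))
    return {
        (names[i], names[j]): float(matrix[i, j])
        for i in range(len(names))
        for j in range(i + 1, len(names))
    }


# Toy columns (not the ESPN data), just to show the output shape.
toy = {"x": [1, 2, 3, 4], "y": [3, 5, 7, 9], "z": [4, 3, 2, 1], "w": [1, 0, 0, 1]}
corr = pairwise_correlations(toy)
for pair, r in corr.items():
    print(pair, round(r, 3))
```

With the real data, the pairs of interest would be (Comp, Yards) and (YSack, Sacks); a correlation near ±1 supports dropping one variable of the pair.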

Model #3

Reducing the model to 4 factors predicts Winning % just as well as the previous model. R² = 0.539.

Win% = 0.097 + (1.238)TDs + (0.098)Yards + (-5.616)INTs + (-1.503)Sacks

Regression Statistics
	α	TDs	Yards	INTs	Sacks
Coefficient	0.0965	1.2379	0.0983	-5.6160	-1.5029
Standard Error	0.2558	2.7870	0.0383	2.3273	1.1056
t Stat	0.3774	0.4442	2.5646	-2.4132	-1.3593
P-value	0.7072	0.6585	0.0129	0.0189	0.1792

	R²	SE_y	F stat	df	SS_reg	SS_resid
	0.539362	0.142172	17.27081	59	1.396369	1.192557

Yards per attempt and interceptions per attempt remain the strongest predictors of Winning %. Touchdowns per attempt will be removed next, since it is the weakest predictor (p = 0.66).

Model #4

Reducing the model to 3 factors predicts Winning % just as well as the previous model. R² = 0.538.

Win% = 0.065 + (0.11)Yards + (-5.555)INTs + (-1.466)Sacks

Regression Statistics
	α	Yards	INTs	Sacks
Coefficient	0.0650	0.1104	-5.5549	-1.4660
Standard Error	0.2441	0.0268	2.3076	1.0951
t Stat	0.2661	4.1222	-2.4072	-1.3387
P-value	0.7911	0.0001	0.0192	0.1857

	R²	SE_y	F stat	df	SS_reg	SS_resid
	0.537822	0.141218	23.27336	60	1.392381	1.196545

Yards per attempt and interceptions per attempt have p-values below 0.05, but Sacks per attempt does not (p = 0.186), so Sacks per attempt will be removed for Model #5.
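The removal sequence in Models #1 to #5 appears to follow a backward-elimination rule: refit, drop the predictor with the largest p-value (the intercept is never dropped), and stop once every remaining predictor has p < 0.05. A NumPy sketch of that loop is below. Because all predictors in one fit share the same residual df, the largest p-value belongs to the predictor with the smallest |t|, so the sketch compares |t| with a fixed two-sided 5% critical value; the default of 2.0 is an assumed approximation for the 57 to 61 residual df used here. The function names and the synthetic data are hypothetical.

```python
import numpy as np


def slope_abs_t(X, y):
    """|t| for each predictor column of an OLS fit with an intercept."""
    n, k = X.shape
    A = np.column_stack([np.ones(n), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    mse = (resid @ resid) / (n - k - 1)
    se = np.sqrt(np.diag(mse * np.linalg.inv(A.T @ A)))
    return np.abs(coef[1:] / se[1:])  # skip the intercept, which is never removed


def backward_eliminate(columns, y, t_crit=2.0):
    """Drop the weakest predictor until every remaining |t| >= t_crit.

    t_crit = 2.0 is an assumed approximation of the 5% two-sided critical value.
    """
    names = list(columns)
    y = np.asarray(y, dtype=float)
    while names:
        X = np.column_stack([np.asarray(columns[name], dtype=float) for name in names])
        t = slope_abs_t(X, y)
        weakest = int(np.argmin(t))
        if t[weakest] >= t_crit:
            break
        names.pop(weakest)
    return names


# Synthetic demo: y depends on x1 and x2 only (not the NFL data).
rng = np.random.default_rng(1)
x1, x2, x3 = rng.normal(size=(3, 64))
y = 0.5 + 1.0 * x1 - 0.8 * x2 + rng.normal(scale=0.2, size=64)
print(backward_eliminate({"x1": x1, "x2": x2, "x3": x3}, y))
```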

Model #5

Reducing the model to only 2 factors lowers R² more than any previous model change (R² = 0.524).

Win% = -0.162 + (0.131)Yards + (-5.07)INTs

Regression Statistics
	α	Yards	INTs
Coefficient	-0.1620	0.1308	-5.0703
Standard Error	0.1767	0.0222	2.2938
t Stat	-0.9166	5.8949	-2.2105
P-value	0.3630	0.0000	0.0308

	R²	SE_y	F stat	df	SS_reg	SS_resid
	0.524018	0.142131	33.57806	61	1.356644	1.232282

Both remaining factors have p-values below 0.05, so no more factors will be removed.
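The statement that dropping Sacks per attempt costs more R² than any earlier step can be checked from the reported R² values alone, as in this short sketch (values copied from the five tables).

```python
# R² values copied from the Model #1 to #5 tables.
r2 = {1: 0.54075, 2: 0.540702, 3: 0.539362, 4: 0.537822, 5: 0.524018}

# Drop in R² caused by each removal step (model m versus model m - 1).
drops = {m: r2[m - 1] - r2[m] for m in range(2, 6)}
largest = max(drops, key=drops.get)

for m, drop in drops.items():
    print(f"Model #{m - 1} -> #{m}: R² falls by {drop:.6f}")
print(f"Largest drop: Model #{largest}")
```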

[Figure: Model #4 predicted Win% vs. actual Win%]

Conclusion

If a “QB Win Rating” is to be used as a statistical metric for QBs, I recommend using Model #4. This model has only 3 factors (Yards per attempt, Interceptions per attempt, and Sacks per attempt) and still explains 53.8% of the variability in team Win%:

QB Win Rating* = [0.065 + (0.11)Yards + (-5.555)INTs + (-1.466)Sacks] × 100

*The predicted Win% is multiplied by 100 so that the “QB Win Rating” is on a non-decimal scale.
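As a minimal sketch, the rating can be computed from the three per-(attempt + sack) rates defined in the Data section. The function name and the sample stat line are hypothetical.

```python
def qb_win_rating(yards, ints, sacks):
    """Model #4 QB Win Rating (hypothetical helper).

    yards, ints and sacks are per-(attempt + sack) rates, as defined in the
    Data section; the predicted Win% is scaled by 100.
    """
    predicted_win_pct = 0.065 + 0.11 * yards - 5.555 * ints - 1.466 * sacks
    return 100 * predicted_win_pct


# Hypothetical stat line: 7.0 yards, 0.02 interceptions and 0.06 sacks per dropback.
print(round(qb_win_rating(7.0, 0.02, 0.06), 1))  # prints 63.6
```

Because the model is a straight linear fit, nothing constrains the output to lie between 0 and 100; an extreme stat line could in principle score outside that range.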

An R² this high is remarkable given that a QB is involved in fewer than 50% of the plays during a football game. It may reflect a correlation between offensive and defensive performance, or the importance of the quarterback position.

This regression works well for NFL quarterbacks, but I expect it to be a weaker predictor if extended to high school and college. In the NFL, most quarterbacks pass much more than they run, so passing factors alone predict winning well. In high school and college, however, it is much more common for quarterbacks to be athletic and run for significant yardage. Rushing statistics such as “Rush Yards per attempt” are therefore likely to be important factors in a regression model with winning as the response variable.