One thing that makes sports so much fun to follow is the plethora of statistics associated with every player, every game, every team, and every season. Other than government agencies, you won’t find better sources of data to practice on. It’s a simple matter to go to the website of a professional sport and find some raw data that needs analyzing. In football (the American kind) it is often said that good offense provides excitement but good defense wins games. Fans of the 2006 Indianapolis Colts probably wouldn’t agree. Ranked 3rd in offense but 21st of 32 teams in defense, the Colts had a regular season record of 12 wins and 4 losses and won the Super Bowl. Maybe they were an anomaly. So the question is: are teams that make the post-season playoffs better defensively than the rest of the league as the conventional wisdom claims?
Data for this analysis consisted of 26 variables (i.e., team performance statistics, such as number of plays, penalties, fumbles, 3rd and 4th down conversions, and time of possession) for the 32 NFL teams (thank you nfl.com). Having that many performance variables with comparatively few teams is a flag that factor analysis might be a useful way to proceed (http://statswithcats.wordpress.com/2010/08/27/the-right-tool-for-the-job/).

Factor analysis (FA) is based on the concept that the variation in a set of variables can be rearranged and attributed to new variables, called factors. The use of factors instead of raw variables is sometimes preferable because factors are more efficient (i.e., fewer factors are needed to evaluate almost the same proportion of variability as the original variables). FA requires some intuition to interpret. FA produces equations that define each factor in terms of the original variables:

$$
\begin{aligned}
F_1 &= a_{11}x_1 + a_{12}x_2 + a_{13}x_3 + \dots + a_{1n}x_n \\
F_2 &= a_{21}x_1 + a_{22}x_2 + a_{23}x_3 + \dots + a_{2n}x_n \\
&\;\;\vdots \\
F_m &= a_{m1}x_1 + a_{m2}x_2 + a_{m3}x_3 + \dots + a_{mn}x_n
\end{aligned}
$$

where:
$F_1$ through $F_m$ are the m factors that replace the original n variables
$x_1$ through $x_n$ are the original variables
$a_{11}$ through $a_{mn}$ are the factor analysis weights.

m is always less than or equal to n, but is a lot less if you’re lucky. What you have to do is look at the correlations between the original variables and the factors and guess what each factor might mean. It’s like being given a big box of parts (gears, transistors, tires, fabric, motors, pipes, wires, and lumber) and trying to figure out what they’re supposed to make. Some parts will be integral and others will be left over.
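To make the factor equations concrete, here is a minimal Python sketch of how one factor score is computed once the weights are known. Everything in it is an assumption for illustration: the three variables, the four teams, and the weights are made up, and a real factor analysis would estimate the weights from the correlations among all 26 variables rather than have them typed in by hand.

```python
from statistics import mean, pstdev

def standardize(values):
    """Rescale a list of numbers to mean 0 and standard deviation 1."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]

def factor_scores(columns, weights):
    """Apply F = a1*x1 + a2*x2 + ... + an*xn to every observation.

    columns: n standardized variables, each a list with one value per team.
    weights: the n weights (a1 through an) for a single factor.
    """
    n_obs = len(columns[0])
    return [sum(a * col[i] for a, col in zip(weights, columns))
            for i in range(n_obs)]

# Three made-up variables for four made-up teams (not NFL data).
yards_per_game = [380.0, 340.0, 310.0, 290.0]
points_per_game = [27.0, 22.0, 19.0, 16.0]
penalties_per_game = [6.0, 7.0, 5.0, 8.0]

x = [standardize(v) for v in (yards_per_game, points_per_game, penalties_per_game)]

# Made-up weights: this "factor" leans on yards and points, barely on penalties.
a1 = [0.6, 0.5, 0.05]
f1 = factor_scores(x, a1)

print([round(score, 2) for score in f1])
```

Each team ends up with a single number in place of three. Because the inputs are standardized, the scores are meaningful only relative to one another.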
FA derived two factors from the 26 NFL statistics: an Offense Factor and a Defense Factor. No big surprise there; in fact, that’s what we were hoping for. Each factor accounts for about 20% of the total variation in the original variables. So, we’ve lost 60% of the information contained in the original 26 variables in exchange for the simplicity of having just two variables. That’s a good example of why FA is often referred to as a data reduction technique.

Two Factors that Summarize 26 Team Performance Statistics.

Offense Factor
Most Important: Offensive Yards per Game; Offensive Yards per Play; Points Scored per Game; Offensive First Downs per Game; Offensive Third Down Conversions; Offensive Fumbles per Game; Offensive Plays per Game; Offensive Time of Possession per Game; Defensive Plays per Game; Offensive Yards per Penalty; Defensive Yards per Penalty; Offensive Fourth Down Attempts per Game; Percent of Fumbles Recovered by Defense; Offensive Third Downs per Game
Least Important: Offensive Percent of Fumbles Lost; Defensive Penalties per Game; Offensive Penalties per Game; Offensive Fourth Down Conversions; Fumbles Caused by Defense per Game; Fourth Down Conversions Allowed; Time of Possession Allowed per Game
Yards Allowed per Play; Yards Allowed per Game; First Downs Allowed per Game; Points Allowed per Game; Third Down Conversions Allowed
FA and the associated data reduction techniques of correspondence analysis and multidimensional scaling are like photographs. A photograph conveys only two of three spatial dimensions and usually includes no information about time, odors, sounds, temperature, or other circumstances, yet it still presents enough information so that observers can discern what is happening. So data reduction shouldn’t be taken as a pejorative descriptor. Sometimes simplifying a problem is the best way to solve it; at least that’s what William of Ockham thought. And after all, isn’t that what modeling is about?

Once the number of variables has been reduced to a manageable few factors, you can analyze patterns of relationships much more efficiently. Consider the scatter plot of how the 32 teams scored on the two factors and how far they got in the postseason. The two gray lines represent the averages of the Offense and Defense Factors. The Seattle Seahawks could be considered the average team of the 2006 season because they are located closest to the intersection of these two lines.

Draw an imaginary line through the plot origin and the intersection of the lines (i.e., a 45° angle), and you’ll identify the most balanced teams, the teams with about the same scores for their Offense and Defense Factors. The most balanced teams from best to worst would be the Pittsburgh Steelers, the New York Giants, the Seattle Seahawks, the Tennessee Titans, the Cleveland Browns, and the Houston Texans. Of these, only the Giants and the Seahawks made the playoffs. So much for the importance of balance.
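The two readings of the scatter plot described above (the team nearest the intersection of the averages, and the teams nearest the 45° line) can be sketched in a few lines of Python. The team names and scores below are made up, the 0.25 tolerance for "about the same scores" is an arbitrary assumption, and the sketch assumes that a higher score is better on both factors.

```python
from math import hypot, sqrt
from statistics import mean

# Made-up (offense, defense) factor scores for fictional teams.
scores = {
    "Team A": (1.8, -1.2),
    "Team B": (0.9, 0.7),
    "Team C": (0.1, 0.0),
    "Team D": (-0.7, -0.4),
    "Team E": (-1.1, 1.4),
    "Team F": (-1.0, -0.4),
}

# The two gray lines: the average of each factor.
avg_off = mean(o for o, _ in scores.values())
avg_def = mean(d for _, d in scores.values())

def distance_to_average(team):
    """Straight-line distance from a team to the intersection of the averages."""
    o, d = scores[team]
    return hypot(o - avg_off, d - avg_def)

def distance_to_balance_line(team):
    """Perpendicular distance from a team to the line offense == defense."""
    o, d = scores[team]
    return abs(o - d) / sqrt(2)

# The "average team" is the one closest to where the gray lines cross.
most_average = min(scores, key=distance_to_average)

# Balanced teams sit near the 45-degree line; list them from best to worst
# by combined score. The tolerance is a hypothetical cutoff.
TOLERANCE = 0.25
balanced = sorted(
    (t for t in scores if distance_to_balance_line(t) < TOLERANCE),
    key=lambda t: sum(scores[t]),
    reverse=True,
)

print(most_average)
print(balanced)
```

Changing the tolerance changes which teams count as balanced, which is worth remembering when reading a list like the one above off a plot by eye.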
[Figure: scatter plot of the 32 NFL teams by Offense Factor and Defense Factor scores, divided into four quadrants (offense above or below average, defense above or below average). Symbols show how far each team advanced: played in Super Bowl, played in Conference Championship Game, played in Division Playoff Game, played in Wildcard Game, or did not make playoffs.]
Factor Analysis of National Football League Teams.

[Note: There’s a reason why there are no values on the axes. Some readers who saw this graph were totally baffled by the numbers, so I took them out. No need to compound the confusion (http://statswithcats.wordpress.com/2011/01/16/ockham%E2%80%99s-spatula/). The units of the analysis were normalized and are meaningful only in relative terms. Both axes do have the same scale increments, however. A difference of 1 on the offense scale is analogous to a difference of 1 on the defense scale.]

The 2006 Super Bowl champion Colts had the highest score on the Offense Factor but the lowest score on the Defense Factor of any of the playoff teams. In fact, 63% of teams with an above average Offense Factor score made the playoffs compared to 44% of teams with an above average Defense Factor score. So, is the notion that good defense beats good offense wrong? Not necessarily; but it sure didn’t apply in 2006.

So remember, if there’s no NFL football in 2011 because of contractual problems, you can always fall back on statistics to fill the gap. Then again, there’s always sabermetrics …
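For anyone who wants to try the playoff comparison on their own data, here is a minimal Python sketch of the calculation behind percentages like the ones quoted above: take the teams with an above average score on a factor and find the share that made the playoffs. The teams, scores, and playoff flags are made up for illustration and are not the 2006 results.

```python
def playoff_share(teams, factor):
    """Share of teams with an above-average `factor` score that made the playoffs."""
    average = sum(t[factor] for t in teams) / len(teams)
    above = [t for t in teams if t[factor] > average]
    return sum(t["playoffs"] for t in above) / len(above)

# Made-up factor scores and playoff outcomes for fictional teams.
teams = [
    {"name": "Team A", "offense": 1.8, "defense": -1.2, "playoffs": True},
    {"name": "Team B", "offense": 0.9, "defense": 0.7, "playoffs": True},
    {"name": "Team C", "offense": 0.1, "defense": 0.0, "playoffs": False},
    {"name": "Team D", "offense": -0.7, "defense": -0.4, "playoffs": False},
    {"name": "Team E", "offense": -1.1, "defense": 1.4, "playoffs": False},
    {"name": "Team F", "offense": -1.0, "defense": -0.4, "playoffs": True},
]

off_share = playoff_share(teams, "offense")
def_share = playoff_share(teams, "defense")

print(f"Above-average offense: {off_share:.0%} made the playoffs")
print(f"Above-average defense: {def_share:.0%} made the playoffs")
```

With only one season of data, a gap between the two shares is a description of that season, not evidence of a general rule.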
Join the Stats with Cats group on Facebook at:
Order Stats with Cats: The Domesticated Guide to Statistics, Models, Graphs, and Other Breeds of Data Analysis at:
Stats with Cats is also available at amazon.com, barnesandnoble.com, and other online booksellers.