Thursday, August 13, 2009

The Superlative Teams of the Decade

Part 1 - Scoring Offense

Having done all the work to see which teams had the top Margin of Victory for the decade (which, once again, is not quite finished), I decided to aggregate my data to see which teams truly stood out, for good or for bad.

The data set is a stat geek’s dream: 1054 team-seasons, each with the Scoring Offense, Scoring Defense, Margin of Victory and Win-Loss record of an FBS team from 2000 through 2008. With over a thousand data points per statistic, we can feel pretty confident the results will be reasonably solid.

First let’s look at the averages –

Average Scoring Offense (2000-2008) = 26.6

Average Scoring Defense (2000-2008) = 25.7

Average Margin of Victory (2000-2008) = 0.88

Average Win-Loss Record (2000-2008) = 6.3 - 5.8


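You might ask why the average scoring offense and average scoring defense don’t equal out, or why the average MOV isn’t 0. The answer is lower-division teams. This data includes points scored and wins (or, in the case of Michigan, a loss) against lower-division teams, which are not members of the data set. In a game between two FBS teams, each side’s points count once as offense and once as defense, so they cancel out; in a game against a lower-division team, only the FBS side is counted, and since the FBS team usually wins, offense comes out ahead. Hence the slight inequality.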
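A toy example makes the bookkeeping concrete. The teams and scores below are hypothetical, not taken from the real data; the point is only that games between members of the data set cancel out, while a game against an outside opponent does not.

```python
from collections import defaultdict

def average_off_def(games, members):
    """Average points scored and allowed per team, counting only teams in `members`.

    Each game is (team_a, points_a, team_b, points_b). A non-member opponent
    contributes to its member opponent's totals but has no totals of its own.
    """
    scored = defaultdict(int)
    allowed = defaultdict(int)
    for team_a, pts_a, team_b, pts_b in games:
        if team_a in members:
            scored[team_a] += pts_a
            allowed[team_a] += pts_b
        if team_b in members:
            scored[team_b] += pts_b
            allowed[team_b] += pts_a
    n = len(members)
    return (sum(scored[t] for t in members) / n,
            sum(allowed[t] for t in members) / n)

fbs = {"A", "B", "C"}  # hypothetical members of the data set
games = [("A", 28, "B", 21), ("B", 35, "C", 14), ("C", 24, "A", 17)]
print(average_off_def(games, fbs))  # offense and defense averages are equal

games.append(("A", 45, "LowerDiv", 3))  # one game against a non-member
print(average_off_def(games, fbs))  # offense average now exceeds defense
```

With only member-vs-member games, the two averages match exactly; the single lopsided game against "LowerDiv" pushes average offense 14 points above average defense in this made-up set.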

Having the averages allows us to calculate standard deviations, which in turn help show which teams were truly superlative in each category. If you recall your normal distribution, or “bell curve”, it looks like this:
[Image: the normal distribution, or “bell curve”]
Now, I don’t expect every statistic here to fit a normal distribution exactly, and some will fit better than others. But by looking at the teams that fall beyond 2 or even 3 standard deviations from the mean, we can see which ones truly stood out.
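The mechanics can be sketched in a few lines: standardize each value into a z-score, then label it by band. The first example uses made-up numbers. The second uses the post’s 26.6 ppg mean with a standard deviation of 7.0, which is an assumption backed out of the 47.6 and 40.6 ppg cutoffs used below, not a figure the post states. Population SD (`statistics.pstdev`) is used because the data covers every FBS team rather than a sample; with 1054 values the sample version would differ only slightly.

```python
import statistics

def z_scores(values):
    """Standardize each value: (x - mean) / population standard deviation."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values, mu)  # population SD: every team, not a sample
    return [(x - mu) / sigma for x in values]

def sd_band(z):
    """Label a z-score with the bands used in this post."""
    if z >= 3:
        return "+3 SD"
    if z >= 2:
        return "+2 SD"
    if z <= -3:
        return "-3 SD"
    if z <= -2:
        return "-2 SD"
    return "within 2 SD"

# Hypothetical ppg values, just to show the mechanics
print(z_scores([20.0, 24.0, 26.0, 28.0, 32.0]))  # [-1.5, -0.5, 0.0, 0.5, 1.5]

# Post's decade mean (26.6) with an assumed sigma of 7.0
print(sd_band((51.14 - 26.6) / 7.0))  # 2008 Oklahoma -> +3 SD
print(sd_band((47.21 - 26.6) / 7.0))  # 2008 Tulsa    -> +2 SD
```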

Scoring Offense Standouts

3 Standard Deviations

3 standard deviations above the mean indicates an offense that scored more than 47.6 points per game, which on a normal distribution we would expect only about 0.135% of the time (roughly 1 or 2 teams out of 1054). In our sample it happened 5 times, more often than that.
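The cutoffs in this post are consistent with a standard deviation of about 7.0 ppg around the 26.6 mean (47.6 − 26.6 = 21 = 3 × 7.0, and 40.6 − 26.6 = 14 = 2 × 7.0). That σ is inferred from the cutoffs rather than reported, so treat it as an assumption. The sketch below recovers it and computes the expected share above +3 SD using the standard identity P(Z > k) = ½·erfc(k/√2).

```python
import math

MEAN_PPG = 26.6
cutoffs = {3: 47.6, 2: 40.6}  # ppg cutoffs quoted in the post, keyed by k

# Implied standard deviation from each cutoff: (cutoff - mean) / k
for k, cutoff in cutoffs.items():
    print(k, round((cutoff - MEAN_PPG) / k, 2))

def upper_tail(k):
    """P(Z > k) for a standard normal variable Z."""
    return 0.5 * math.erfc(k / math.sqrt(2))

share = upper_tail(3)
print(f"{share:.3%}")          # expected share above +3 SD
print(round(1054 * share, 1))  # expected number of teams out of 1054
print(f"{5 / 1054:.3%}")       # observed share: 5 teams
```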

The average Scoring Offense for the top group was 49.8 ppg, and the average Win-Loss record 11.8 – 1. The teams were -

2008 Oklahoma 51.14
2005 Texas 50.15
2004 Louisville 49.75
2005 Southern California 49.08
2004 Boise St. 48.92
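As a quick check of the 49.8 ppg figure, the group average is just the mean of the five values listed above:

```python
top_group = {
    "2008 Oklahoma": 51.14,
    "2005 Texas": 50.15,
    "2004 Louisville": 49.75,
    "2005 Southern California": 49.08,
    "2004 Boise St.": 48.92,
}

# Simple (unweighted) mean of the five scoring offenses
avg = sum(top_group.values()) / len(top_group)
print(round(avg, 1))  # 49.8
```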

Only one of these claimed the BCS title (Texas), while two lost in the BCS title game (Oklahoma and USC). Texas was the only undefeated team of the group, and Oklahoma the only two-loss team (the others had one loss each). Notably, that same Texas team was the only one with a Scoring Defense under 17 points per game (16.4).

2 Standard Deviations

2 standard deviations above the mean indicates an offense that scored between 40.6 and 47.6 points per game, which we would expect about 2.1% of the time on a normal distribution. The average Scoring Offense for this group was 43.0 ppg, and the average Win-Loss record 10.5 – 2.3.

Those teams were -

2008 Tulsa 47.21
2006 Hawaii 46.86
2001 Brigham Young 46.77
2002 Boise St. 45.62
2004 Utah 45.33
2000 Boise St. 44.91
2002 Kansas St. 44.77
2004 Bowling Green 44.33
2001 Florida 43.82
2008 Texas Tech 43.77
2008 Florida 43.64
2005 Louisville 43.42
2007 Hawaii 43.38
2001 Miami (Fla.) 43.18
2003 Boise St. 43
2003 Miami (Ohio) 43
2003 Oklahoma 42.93
2007 Kansas 42.77
2000 Miami (Fla.) 42.64
2007 Florida 42.46
2003 Texas Tech 42.46
2000 Florida St. 42.42
2008 Texas 42.38
2007 Boise St. 42.38
2007 Oklahoma 42.29
2008 Missouri 42.21
2008 Oregon 41.92
2000 Nebraska 41.45
2008 Rice 41.31
2007 Tulsa 41.14
2003 Southern California 41.08
2003 Texas 41
2007 Texas Tech 40.92
2002 Bowling Green 40.83
2008 Oklahoma St. 40.77
2008 Houston 40.62

In terms of national success, Florida in ’08, Miami in ’01 and USC in ’03 claimed the top prize (USC’s 2003 title was the AP share; LSU won the BCS title that year). All 3 were among the top 10 defenses in this group. Utah in ’04 was undefeated.

Also noteworthy: Urban Meyer coached several of the teams on this list, namely Bowling Green in ’02, Utah in ’04 and Florida in ’07 and ’08. (But that Spread Offense is just a gimmick.)

The Truly Miserable

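Standard deviation, of course, goes both ways, and several teams in the decade fell 2 standard deviations below the mean, with Scoring Offenses of less than 12.6 points per game (14 points below the 26.6 average, the same 2-standard-deviation distance as the 40.6 cutoff above it). This group of shame is: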

2000 Central Mich. 12.45
2000 Kent St. 11.64
2003 Southern Methodist 11.17
2006 Temple 10.92
2006 Utah St. 10.83
2001 Rutgers 10.82
2006 Stanford 10.58
2005 Buffalo 10
2005 Temple 9.73
2006 Florida Int'l 9.58
2000 La.-Monroe 8.73

As a group they averaged 10.6 points per game and a Win-Loss record of 1 – 10.5.

As I hinted above, defense is pretty important to overall success, and it will be the subject of our next look at this data.

UPDATE:

Commenter Clark makes some points that I should clear up. Taking them in the order of his comment:

1. There are 1054 data points for each statistic. This is the total number of team-seasons over the 9 years, or about 117 teams a year.

2. I ran a quick and dirty histogram and got the following –
[Image: histogram of the data]
Not a perfect normal distribution, but close enough for our purposes.

3. I’m not sure how I would adjust for rule changes, though their existence is duly noted. Many of the highest-scoring offenses came in the most recent season (2008).

4. Thanks, I think.
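One way to act on the rule-change point would be Clark’s own suggestion: measure each team against the mean and standard deviation of its own season instead of the whole decade. A minimal sketch, with hypothetical teams and numbers rather than the real data:

```python
import statistics
from collections import defaultdict

def per_season_z(records):
    """records: list of (season, team, ppg).

    Returns {(season, team): z}, where z is measured against that season's
    own mean and population standard deviation.
    """
    by_season = defaultdict(list)
    for season, _team, ppg in records:
        by_season[season].append(ppg)
    season_stats = {
        s: (statistics.fmean(v), statistics.pstdev(v)) for s, v in by_season.items()
    }
    return {
        (season, team): (ppg - season_stats[season][0]) / season_stats[season][1]
        for season, team, ppg in records
    }

# Hypothetical numbers purely for illustration
records = [
    (2000, "Team A", 20.0), (2000, "Team B", 30.0), (2000, "Team C", 40.0),
    (2008, "Team D", 25.0), (2008, "Team E", 30.0), (2008, "Team F", 35.0),
]
z = per_season_z(records)
print(z[(2000, "Team C")], z[(2008, "Team F")])
```

In this made-up example, 35 ppg in a tightly bunched season rates the same as 40 ppg in a spread-out one, which is the kind of adjustment Clark has in mind.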

Finally, there is this: where is the offense, SEC? The only SEC team to make these lists for the decade is Florida. Hmmmmm.

6 comments:

Clark said...

1: I am assuming that the 1054 data points is for each statistical category (scoring offense, scoring defense, MOV and W/L record). Thus, the data set is 1054 points for this discussion, with another 3100 or so left over for the other discussions.

2: How about a graph of the data so we can see if it really is normal? Standard deviation calculations really only make sense for normal distributions, and the 68-95-99.7 rule certainly requires normality. The data suggests that indeed the distribution isn't normal.

If the distribution were normal we should find 0.135% of the teams to be 3 standard deviations high. In fact, we find 5 out of 1054. (1054 is pretty close to 119x9=1071.) 5/1054 = 0.474%. We see that there are 3.51 times as many teams 3 standard deviations above the mean than we would expect.

The results for the group 3 standard deviations below are more striking, with 11 teams in that group. 11/1054= 1.04% which is an occurrence rate that is 7.73 times too high. A -3-sigma team should be a 1 in a 1000 event, but we're seeing it happen every year (on average).

The +2-sigma group ought to have 2.14% of the teams in it (that is, 2.275% of the teams ought to be +2-sigma, but then I've removed the 0.135% that are in the +3-sigma group.) We find 36/1054 = 3.42% which is 1.6 times too large of a group. The trend seems to be that the extremes of the data set are exaggerated when compared to a true normal distribution.

Although the data is more polarized than a normal set ought to be, the standard deviation is still a meaningful value. Maybe we can find a real statistician to weigh in on how to measure how normal the data is, and how meaningful the standard deviation is.

3. It looks like teams are being compared across years without any adjustment for rule changes. I'm specifically thinking of the clock changes made in the last few years which have shortened games. I'm assuming that this also decreased scoring. Certainly you have the numbers to check that out for sure. Thus, a team scoring 50 points per game this year would be more exceptional than a team scoring 50 points in 2000. It may be enlightening to see how many standard deviations from the mean for that year a team is, which would account for this.

4. Congratulations if you read this whole comment. I didn't mean for it to be so long. (Also don't mean to be obnoxious, though I fear that I came across that way.)
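Clark’s ratios for the two high-scoring groups can be reproduced from the counts in the post. This sketch assumes, as his comment does, that the data were normal, and uses the standard normal tail P(Z > k) = ½·erfc(k/√2).

```python
import math

def upper_tail(k):
    """P(Z > k) for a standard normal variable Z."""
    return 0.5 * math.erfc(k / math.sqrt(2))

N = 1054  # team-seasons in the data set

# (group, observed count from the post, expected share if the data were normal)
groups = [
    ("above +3 SD", 5, upper_tail(3)),
    ("between +2 and +3 SD", 36, upper_tail(2) - upper_tail(3)),
]

for name, observed, expected_share in groups:
    observed_share = observed / N
    ratio = observed_share / expected_share
    print(f"{name}: observed {observed_share:.2%}, "
          f"expected {expected_share:.2%}, ratio {ratio:.2f}")
```

The ratios come out to about 3.51 and 1.60, in line with the figures in the comment.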

Jams said...

oh man, loved this post. very interesting to see everything measured relative to each other, thanks for putting in the effort to calculate it all. can't wait to see the defensive numbers, and especially the ones for margin of victory. should be interesting to see how the Gators bear out in all this.

and, while I see the validity of Clark's suggestions and comments, the raw numbers still bear out some valid and at least interesting points. bravo!

Scotty #13 said...

Any EASY way to take that data and determine the "programs of the decade" by averaging their stats from all 8 years played so far?

I'd be interested to see who was more of a flash in the pan and who was a consistent powerhouse...though I think we probably know the names of the teams that would come out near the top...

JasonC said...

RE: Rule change
I was curious about UGA, so I charted where they fell in the '05 season to see if it really was a possible contender. Then, I looked at '08 to see how off the mark we were. One of the interesting things I found was the Total Defense numbers (yards, not rank) were almost the exact same which I found a bit puzzling. But I guess the clock changes might have affected that.

Amos said...

So, a little math lesson for the curious: standard deviation isn't exclusive to normal distributions. The only thing exclusive to normal distributions is the whole 68-95-99.7 rule, since those percentages are a fixed property of the normal curve. Standard deviation can be calculated much like an average (though not quite as simply), and it measures how spread out the data is around the mean. I suggest reading up on:
http://en.wikipedia.org/wiki/Standard_deviation#Definition

The second square-root formula is probably the easiest to understand; remember that mu is the average of the whole set of data.
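For reference, the population standard deviation in the form Amos describes, with N values x_1, …, x_N and μ their average, is:

```latex
\mu = \frac{1}{N}\sum_{i=1}^{N} x_i,
\qquad
\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N} \left(x_i - \mu\right)^2}
```

With N = 1054 team-seasons, swapping 1/N for the sample version's 1/(N − 1) changes σ by well under 0.1%.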

Mergz said...

Amos is exactly right and I guess I should have pointed that out. Standard Deviation can be calculated using any set of numbers.

The point here is that teams 2, or especially 3, standard deviations from the mean are rare. The actual percentage of rareness will vary depending on the distribution.
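That variation has a known worst case. For any distribution with a finite variance, Chebyshev's inequality guarantees that at most 1/k² of the data lies k or more standard deviations from the mean (both sides combined); the normal curve is much tighter than that. A short comparison, using the standard identity that a normal variable lands more than k SDs from its mean with probability erfc(k/√2):

```python
import math

def chebyshev_bound(k):
    """Maximum share of ANY finite-variance distribution lying k or more SDs from the mean."""
    return 1 / k**2

def normal_two_sided(k):
    """Share of a normal distribution lying more than k SDs from the mean (both tails)."""
    return math.erfc(k / math.sqrt(2))

for k in (2, 3):
    print(k, f"{chebyshev_bound(k):.1%}", f"{normal_two_sided(k):.2%}")
```

At 2 SD the guarantee is 25% versus about 4.55% for a normal curve; at 3 SD it is about 11.1% versus about 0.27%. So "rare" holds for any distribution, but how rare depends heavily on its shape.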