Thursday, January 11, 2007

Responding to the SMQ

The Sunday Morning Quarterback (SMQ), on his excellent site, posted a question about what many regard as the surprising outcome of the BCS Title game. Actually, a series of questions, but the essence seems to be - can we ever really know the “best” team in college football?

Lacking the good sense not to attempt such an undertaking, here is my contribution to the discussion.

As an investment professional, and all around statistics geek, I have been interested in probabilities for decades. My interest ranges from the merely curious (What is the chance my daughter will have my blue eyes?), to the pecuniary (What is the chance I will roll a “7” before I make my mark at craps?).

As an avid college football fan, and occasional college football “investor”, my interests in statistical probabilities and the college game have crossed more than once. Also, at the outset, let me say this - I agree with the SMQ that many times the outcome of any one college game simply defies explanation.

Unlike activities with a known number of outcomes, like the casino game of craps, a college football game has theoretically infinite outcomes. In craps, for instance, there are only 36 ways the dice can fall. Of those 36, 6 are the number “7”. Thus we know each roll has a 1 in 6 chance of being a “7”.
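For readers who want to check the craps arithmetic, here is a minimal Python sketch that simply counts the 36 ways two dice can fall. The function name `sum_probability` is my own label for illustration, not anything from a library.

```python
from fractions import Fraction
from itertools import product


def sum_probability(target):
    """Chance that two fair six-sided dice add up to `target`."""
    # Every ordered pair (first die, second die): 6 x 6 = 36 outcomes.
    rolls = list(product(range(1, 7), repeat=2))
    hits = sum(1 for a, b in rolls if a + b == target)
    return Fraction(hits, len(rolls))


print(len(list(product(range(1, 7), repeat=2))))  # number of ways the dice can fall
print(sum_probability(7))                         # chance of a "7" on one roll
```

The point of the exercise is only that the list of outcomes is finite and can be written down, which is exactly what a football game does not allow.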

In college football, you cannot assign a simple number of outcomes to any given game. Using Florida’s most overmatched opponent as an example (and an example they were), Western Carolina has some chance in theory of beating Florida. Now it may be a chance so remote as to be almost impossible, but the chance exists. And, like the monkey randomly typing for eternity will eventually produce the works of Shakespeare, if Florida and Western Carolina played enough times, Western Carolina would win.

The first question in this case then becomes - how many times would Western Carolina need to play Florida on average to get a single win?

I think the number would be rather large, but for our purposes, let’s say that WCU would beat Florida 1 in every 100 times. But this only answers part of our overall question. When WCU got that one-in-a-hundred win, what would be the score?

Well, I submit, that if we ran the game say, 10,000 times, each of WCU’s 100 wins could be a different score. So, what we have here is a series of probabilities within probabilities. For the point spread of each of the wins WCU got 1 in 100 times, I believe a bell curve analysis would be appropriate. Using the “normal distribution” of the bell curve, I would estimate that WCU would beat Florida by 3 points or less 68.3% of the time, or one standard deviation. However, if you took the analysis out to, say, 7 standard deviations, or a probability of 0.00000000026%, that would be the time that WCU blows Florida out by some extraordinary score (once again, the probabilities here are mere estimates, used for illustrative purposes).
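The two bell curve figures quoted above (68.3% within one standard deviation, and the tiny chance of landing 7 standard deviations out) can be checked with a short Python sketch. It assumes a true normal distribution, which is itself only an illustrative assumption for football scores, and the function names are hypothetical.

```python
import math


def within_k_sigma(k):
    """Share of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


def outside_k_sigma(k):
    """Share of a normal distribution more than k standard deviations from the mean (both tails)."""
    # erfc keeps precision for very small tail probabilities.
    return math.erfc(k / math.sqrt(2))


print(f"within 1 sd:  {within_k_sigma(1):.1%}")
print(f"outside 7 sd: {outside_k_sigma(7):.2e}")
```

Note that the second number comes back as a fraction of 1, not as a percentage, so it needs to be multiplied by 100 to compare with the figure in the text.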

In Western Carolina’s visit to Florida on November 18th, they lost 62-0. Flipping the example above around, Florida had a 99% chance of winning that game. However, Florida’s point spread using a normal distribution on a bell curve would have been much higher than WCU’s would be for their win. I would hazard a guess that the same 1 standard deviation range for a Florida win would be about 40 points. In a very small percentage of times, Florida would win by 1 point.

Now, I am well aware that the entire range of possibilities (combining wins with score, or just merely coming up with a score, which would tell you who wins) can be placed on a single bell curve. In our example above, the chance of WCU winning would be placed to one side of the curve, with the greater win being pushed further out. However, I wanted to separate the win from the score for a reason.

And that reason is that what we are trying to do in analyzing a single game - such as in the instance of Florida versus Ohio State - is to come up with a probability as to what a series of outcomes would be. In other words, to see who is, statistically, the “better team”.

In looking at the Florida - Western Carolina score, there is much evidence to be gathered that Florida would beat WCU the vast majority of times they play. Is there the same evidence from the likewise lopsided margin by which Florida beat Ohio State?

I think our fallacy here - the way we make inferences of overall worth based on a game like the BCS Title game - is that basing a conclusion on a single outcome is wrong. We know from a simple game like craps that there are 36 possible outcomes. If I roll a single time, and don’t roll a “7”, the most probable outcome, does that mean I will never roll a seven? Of course not.

Likewise, in college football, where the possible outcomes are likely infinite, does a single game mean all games would be so similar? No, it cannot.

However, we can most certainly gather inferences from a single game as to what other games might look like. In the case of Florida-Ohio State, conventional wisdom clearly underestimated the strength and speed of Florida’s defense (as did Ohio State). Prior to the game, based on the teams’ records and statistics, I estimated that Ohio State might win this game “7 in 10 times”. Afterward, I think almost the opposite.

However, the problem will always be that our sample sizes are far too small. Without talking about R-Squareds, just know that even a 14 game season is far too small a sample to really gather anything approximating real knowledge about the “goodness” of teams.
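To put a rough number on how little a 14-game season tells us, here is a small Python sketch using the binomial distribution. It assumes every game is an independent coin flip with the same win probability, which real schedules certainly are not, and `prob_at_least` is a name made up for this illustration.

```python
from math import comb


def prob_at_least(wins, games, p_win):
    """Chance of winning at least `wins` of `games` independent games,
    each won with probability `p_win` (binomial model, an assumption)."""
    return sum(
        comb(games, k) * p_win**k * (1 - p_win) ** (games - k)
        for k in range(wins, games + 1)
    )


# A perfectly average team (50% per game) finishing 10-4 or better.
print(f"{prob_at_least(10, 14, 0.5):.3f}")
```

Under these assumptions a dead-average team finishes 10-4 or better roughly 9% of the time, purely by chance. With about 120 teams playing, a handful of impressive-looking records would show up every year even if no team were actually better than any other.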

Take this example - there are about 120 NCAA Division I-A teams. If we assume they each have a 50% chance of winning any game (assume it, we know they do not), after one week 60 teams will be undefeated. After two weeks it should be 30, 3 weeks 15, 4 weeks 8 (round up), 5 weeks 4, 6 weeks 2, and by the 7th week 1 team. Now, can we say that the one undefeated team after 7 weeks is “the best”? Of course not. With the chance of winning being equal, they were the luckiest.
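The halving in the example above can be written out as a few lines of Python. This is only a sketch of the expected count each week, rounding up as the text does; it ignores who actually plays whom, and `undefeated_by_week` is a hypothetical helper name.

```python
import math


def undefeated_by_week(teams, weeks, p_win=0.5):
    """Expected number of unbeaten teams after each week, rounding up,
    assuming every team wins each game with probability `p_win`."""
    counts = []
    remaining = teams
    for _ in range(weeks):
        remaining = math.ceil(remaining * p_win)
        counts.append(remaining)
    return counts


print(undefeated_by_week(120, 7))
```

Run for 120 teams over 7 weeks, the list ends with a single unbeaten team, and nothing in the code knows or cares which team is any good.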

But the chance of winning is not equal. When you have 120 teams, and unequal chances of winning, the probability that after 12 games at least one team is undefeated is very high. If that team had a fundamentally easy schedule (which I suggest Ohio State did), how much of that 12-0 record can be considered skill, and how much just the random probability of luck?
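Here is a rough Python sketch of why unequal chances make an unbeaten team so much more likely. The win probabilities are invented purely for illustration (10 strong teams at 85% per game, the other 110 at 47%), and the sketch treats each team's season as independent of every other team's, which is a simplification since teams play each other.

```python
def prob_unbeaten(p_win, games=12):
    """Chance one team wins all `games`, each with probability `p_win`."""
    return p_win**games


def prob_any_unbeaten(win_probs, games=12):
    """Chance at least one team in the field goes unbeaten,
    treating teams as independent (an assumption)."""
    p_none = 1.0
    for p in win_probs:
        p_none *= 1.0 - prob_unbeaten(p, games)
    return 1.0 - p_none


even_field = [0.5] * 120                  # every game a coin flip
uneven_field = [0.85] * 10 + [0.47] * 110  # hypothetical numbers, not real data

print(f"even field:   {prob_any_unbeaten(even_field):.3f}")
print(f"uneven field: {prob_any_unbeaten(uneven_field):.3f}")
```

With coin-flip teams a 12-0 record almost never happens (about 3% of seasons under these assumptions). Give a handful of teams a real edge, or an easy schedule that amounts to the same thing, and an unbeaten team becomes the expected result in a bit under 80% of seasons.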

A problem I have had with all the post game analyses of Florida - Ohio State is that people seem perplexed Florida could do so well having played Vanderbilt and South Carolina so closely. I think this not only ignores my concerns about sample sizes and probabilities of outcome, it also ignores a very real part of the fundamental analysis people are trying to make in this type of comparison - in other words, what about Ohio State’s close games?

What about, in particular, Ohio State against Illinois? In that game, which Ohio State won by 7, Illinois outgained OSU 233 to 224 yards. In fact, there were many examples of games in which Ohio State gave up big yardage to poor Big Ten teams that went ignored (Mostly, I suppose, because the point totals were large - see this post).

My point is, if you were looking for the evidence that Ohio State could play badly, it was there. But, and this is getting away from this topic and towards another - “we”, collectively, did not want to see it. It did not play into the story of match-ups like “The Game of the Century” (OSU-Michigan).

And if we want to believe anything in college football, it is a good story.

Back to the analysis. Say the “true” probability of either team winning the BCS Title game was 50-50. When either team wins, then the probability of a point spread using the bell curve comes into play. Using this example, Ohio State could have gotten very, very unlucky. They lost the game, and the spread (27 points) fell somewhere outside a single, or likely even a second, standard deviation. In other words, it was unlikely.

But, when you only get to play one game, the unlikely can happen.

However, know well this - if you are willing to believe that Ohio State was merely the victim of extraordinarily “bad luck” in this game, know that there also exists the possibility that their entire regular season and 12-0 record was the result of extraordinarily “good luck”. You can’t have it one way, without the other.

Finally, remember that college football is not a casino, where the house has a multitude of chances for the odds to come their way, no matter how long it takes. The sample size is very, very small, even minute, at 12 to 14 games. And who is really “best” in this small a sample size is unknown and unknowable.

In other words, no - we cannot ever know the “best” team.

Which, in turn, is what makes the game so much fun.


Henry Gomez said...

Great post. Of course the nature of the game dictates that the "best" team won't always win. In a physically brutal sport like football you can't play 3 or 4 times a week like you do in college baseball or 2-3 times a week like in college basketball. So seasons must be short with a lot of time (relatively speaking) between games.

Professional baseball is the sport with the largest sample sizes (probably why statistics geeks like us tend to gravitate towards it). In a 162-game season it usually becomes plainly obvious who the best team is. That doesn't mean the best team wins the World Series. That's because the sample sizes in the playoffs are much, much smaller than in the regular season. A team that finished 20 games ahead of its playoff opponent can see all that work go up in smoke in a first round, 5-game series.

That's because the dynamics of the game change when moving from baseball's regular season into the playoffs. You are willing to pitch pitchers on short rest because "there's no tomorrow" and do other things that you would never do in the regular season.

Case in point. The World Champion St. Louis Cardinals had the 13th best record in the major leagues. It's safe to say that they were not the best team in baseball in 2006. But as a sports culture we have accepted the idea that at some point it comes down to a single 7-game series to decide who is "best" among two teams that had played, to that point, at least 169 games.

So there's a bigger question here. Will a playoff in college football answer the question of who is "best"? The answer is obviously no.

But here's the thing: it will create a legitimate champion that fans can accept.

In fact, I could argue that a playoff in college football will create a more legitimate champion than the World Series does.

The reason is that football is a winner-take-all sport. You can't lose today and know you are going to line up against the same team tomorrow and the day after that with a chance to win 2 out of 3. A single loss for a college team is a set-back that can almost rule you out of the championship picture (even with a playoff). The idea of a single-elimination tournament in football dovetails nicely with the "one and done" culture of the sport. In other words it's a war of attrition rather than the war of accumulation that pro baseball is.

College football teams already play out their seasons on the razor's edge. Playing in a tournament environment is a natural extension of that, whereas a short playoff series is not a natural extension of the way pro baseball is played.

d.tensor said...

I guess that is what I like about the current system - I think most of the time the best team is selected, which to me is the goal of a playoff. If a playoff system is also subject to error, why do it?

But this may be in part an age thing - I suspect that fans over, say, 35 may prefer the old system and that younger fans may prefer a playoff. The legitimate champion is whatever the majority of fans say it is.

On the plus side, a playoff would, I think, reduce the arguments about bias (is this a good thing?).

Regarding the bell curve, I think an important characteristic of college teams is the equivalent of "standard deviation" in defining a Gaussian*, or how consistent they are [perhaps should be called the Michigan State factor].

For example, USC** beat some good teams but lost to some mediocre ones. Boise was fairly consistent, I thought. They soundly beat the teams they were supposed to beat and played even in closer games, mostly. If USC and Boise played each other and both brought their best game, I think USC would win. However, if Boise had played USC's schedule, I am not sure that they would have lost to UCLA. So who is the better team? A strong team that plays erratically or a weaker but more consistent team? (Hopefully, that makes some sense.)

*actually, it is not obvious to me that a Gaussian is a reasonable assumption, central limit theorem notwithstanding - although I have no idea what might be better.

** I'm not a USC fan (rather, a disappointed Buckeye fan).

Henry Gomez said...

"I think most of the time the best team is selected, which to me is the goal of a playoff. If a playoff system is also subject to error, why do it?"

You used the correct word: "selected" and that's the problem with bowls and polls. Teams are selected rather than allowed to duke it out against each other. Imagine boxing (if it were a clean sport) but the best boxers would not fight against each other, only against separate pools of fighters. The belt is given on the basis of a vote. We would argue that so and so looked good against so and so but he could never beat so and so heads up.

Right now the margin for error is greater. Deserving teams (or teams we think may be deserving) aren't even in the conversation. How do you really know that Boise couldn't beat Florida this Saturday unless they played? Or Louisville?

Like I said it wouldn't answer who was "best" but it would answer who was "champion" by a means that most of us think is more fair than so and so said so.

Mergz said...

I agree with Henry's post above.

I think college football is the only sport where we even worry about who the "best" team really is. In the other sports, you merely compete and win championships. There is a difference between being a "champion" and being the "best".

A playoff would end this whole who is "best" debate, or at least relegate it to an afterthought. Because we can talk about who is best forever, but we can never truly know.

jimcaserta said...

The same "who's the best" question can be asked after the basketball final 4, but could any team last year really say they were better than UF? Likewise, what football team is even close to claiming they're better than UF? Another factor is that teams are time-varying systems - UF at the end of the year was clicking a lot better than at the beginning.

One thing almost all fans are guilty of is overestimating the differences between "good" and "bad" teams. Wake Forest was a blocked FG away from losing to Duke in their 2nd game, but ended up winning the ACC. There are events throughout a game that are pretty much toss-ups as to what way they go. Before the NC game most fans were guilty of overestimating OSU's strength and underestimating UF's. From the season, there should have been NO question UF was a top-notch team, and you put 2 top-notch teams against each other and the result is very hard to predict.

While Corso claimed 9/10 win ratio for UF, I'd stick to 7/10. OSU had no answer for Moss/Harvey and no amount of prep will make their tackles faster.

Enjoy the title guys!

Anonymous said...

Hey, that's what I said!

Really, it is, just with math and stuff. The fact the sample size is too small for us to ever know the "best" team was my entire point. Pleased to see it backed up in technical terms I only vaguely understand (that's a knock on my understanding, by the way, not the technical terms).