March Madness: Using Game Theory to Win the Office Pool
Of the tens of millions of people who will exchange about $2.5 billion during March Madness, only a few will know any game theory. But those few will be rewarded with more than their share of the pool.
It's just two weeks until Selection Sunday, the official kickoff to March Madness. Throughout the United States, no office work is performed the following Monday and Tuesday, only fevered activity filling out bracket selections in office pools.
This activity is illegal everywhere except Montana and Vermont, and even in those states many versions violate the letter of state or federal law, not to mention the rules in most employee handbooks.
On the other hand, as far as I know, only Rick Neuheisel has suffered significant penalties for participating in a March Madness bracket pool, and only North Carolina seems seriously interested in stamping them out. The last seems odd since North Carolina schools have won two of the last three and nine of the last 30 tournaments, so you’d expect the state to be good-natured about things.
On Tuesday, March 13, all eyes turn to Blackburn Court in Dayton, Ohio, for the opening games of the NCAA Men's Basketball tournament. The University of Dayton is a friendly Catholic university, in Middle America in all of the good ways and none of the bad. The place is basketball-crazy, and the outsiders who show up are basketball-crazy-squared.
Everything is arranged for the comfort and enjoyment of basketball fans, rather than the separation of those mental deviants from their money and the worship of fake celebrity. The eight teams competing are roughly the 61st to 68th best men's college teams in the country, which means you get high-quality and higher-spirited basketball, far from the corruptions of big-time college and professional sports.
The games are competitive and unpredictable — sport, not the sportainment business. I can't deny the excitement and pageantry of a Final Four game in a big city with the president in a box and a rock star/supermodel couple in the next row having their sightlines blocked by NBA hall-of-famers, with future NBA stars wrestling on the floor under the direction of $10-million-a-year coaches, but over the years I've had more fun, and tons more fun per dollar, at the First Four in Dayton.
Over the 20 days from March 13 to April 2, culminating in the championship game in the New Orleans Superdome, lots of basketball will be played. More important, tens of millions of people will exchange about $2.5 billion, mostly among friends and co-workers. Only a few of these people will know any game theory, but those people as a group will be rewarded with more than their share of the pool.
If none of this makes sense to you, here is a quick primer that contains all you need to know for this article. The NCAA Men's Basketball tournament is a single-elimination affair (one loss and you're out) that selects a champion from the 68 invited teams.
Since each game eliminates one team, that means there are 67 games. The tournament begins with the “First Four,” four games among eight teams to reduce the number of teams to 64. Those 64 teams are divided into four “regionals” of 16 teams each, and the teams are “seeded” from 1 to 16 in each regional (the committee organizing all of this actually ranks the teams from 1 to 68, but only the seed within the regional is made public).
In the first round in each regional, the 1 seed plays 16, the 2 seed plays 15, and so on. In the second round, the winner of the 1/16 seed game plays the winner of the 8/9 seed game, the winner of the 2/15 seed game plays the winner of the 7/10 seed game, and so on until each regional has a single winner (four rounds). These four winners meet in the “Final Four” games, two rounds to determine a champion.
There are many ways to bet on these games, but by far the most popular is to fill out predictions for all 67 games before the first one is played. This is called a “bracket pool,” and there are many online sites to take care of the bookkeeping for you.
There are also many scoring methods. The most popular is to award one point for every correct first round or First Four pick, two points for every correct second round pick, and so on up to six points for correctly picking the overall tournament winner. Another system, which used to be the most popular, is just to count the number of correct picks. I'm going to use the latter for examples in this piece, for simplicity. The same game theory principles apply to all versions.
An important feature is that you must pick the winner for each slot before the tournament begins. For example, suppose Baylor as the No. 3 seed in the East regional plays Oral Roberts as the No. 14 seed in the first round. The winner of this game will play the winner of the game between the No. 6 and No. 11 seeds, perhaps Florida.
Baylor would be a solid favorite in both games, perhaps an 80% chance against Oral Roberts and 70% over Florida, and about a 75% favorite in the second round game if Florida lost to the No. 11 seed, say Long Beach State. Therefore, you might pick Baylor to win both games.
Suppose, however, that the Golden Eagles of Oral Roberts put together a furious offensive performance anchored by Dominique Morrison's 40 points to edge Baylor in the first round. Your second-round pick of Baylor is now worthless, as are picks of Baylor in any later round. By itself, this isn't a big problem because most people will have picked Baylor in these two games, so you may not lose much ground.
But if you had taken a flyer and picked Baylor to win the regional, this single first-round loss would probably eliminate you from contention in any reasonably large pool because you lost not just the one point for the first round, but nine additional points for the second, third, and fourth rounds. On the other hand, someone who correctly picked Oral Roberts in the first round makes only one extra point. Correct picks only help a little, but key incorrect picks can hurt a lot.
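As a rough illustration of how one early miss cascades under the round-weighted scoring described earlier, here is a minimal sketch. The slot labels, the score function, and the later-round results ("Team X", "Team Y") are hypothetical placeholders, not real tournament data.

```python
# Points per round under the round-weighted system: 1 for the first round,
# 2 for the second, and so on up to 6 for the champion.
ROUND_POINTS = {1: 1, 2: 2, 3: 3, 4: 4, 5: 5, 6: 6}


def score(picks, results, weighted=True):
    """Total points for a bracket.

    picks and results map (round_number, slot_label) -> team name.
    With weighted=False every correct pick counts one point.
    """
    total = 0
    for (round_number, slot_label), team in picks.items():
        if results.get((round_number, slot_label)) == team:
            total += ROUND_POINTS[round_number] if weighted else 1
    return total


# Hypothetical slots: Baylor picked to win four straight games (the regional).
baylor_to_win_regional = {
    (1, "East 3/14"): "Baylor",
    (2, "East 3/14/6/11"): "Baylor",
    (3, "East lower half"): "Baylor",
    (4, "East regional"): "Baylor",
}

# Hypothetical results: Oral Roberts upsets Baylor in round one, so Baylor
# cannot occupy any later slot. "Team X" and "Team Y" are placeholders.
results = {
    (1, "East 3/14"): "Oral Roberts",
    (2, "East 3/14/6/11"): "Florida",
    (3, "East lower half"): "Team X",
    (4, "East regional"): "Team Y",
}

points_at_stake = sum(ROUND_POINTS[r] for (r, _) in baylor_to_win_regional)
points_earned = score(baylor_to_win_regional, results)
print(points_at_stake, points_earned)  # 10 at stake (1 + 2 + 3 + 4), 0 earned

# Someone who picked only the round-one upset correctly gains a single point.
upset_picker = {(1, "East 3/14"): "Oral Roberts"}
print(score(upset_picker, results))  # 1
```

The asymmetry is the point: the upset picker gains one point, while the Baylor backer forfeits the first-round point plus the nine from later rounds.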
OK, time for game theory. As you may recall from my earlier articles, game theory is the modeling of uncertainty as choices by rational actors, rather than random events like coin flips, or anything else. Game theory is just a model, as are probability theory and other alternatives. It is not reality. We use it if it helps us win more bracket pools, and not if it doesn't. Whether its assumptions are realistic is irrelevant; what matters is whether it is a useful tool.
The first place we might apply game theory is to predict the outcomes of individual basketball games. Generally, game theory is useful when outcomes are not zero sum. For the most part basketball games are zero sum: one team wins and one team loses. However, that's not entirely true.
Coaches and players make systematic deviations from the strategies they should pursue if all they cared about was maximizing the chance of winning each game. For example, coaches underuse three-point shots, and top players shoot too much.
If a team wins with a lot of three-point shots and a low shooting percentage, people think the coach is lucky and the players are good; while if a team wins through methodical, elaborate plays leading to lay-ups, people think the coach is a genius who has coaxed victory out of a bunch of plodders. Therefore, the coach's career interests are served by insisting on a playing style that maximizes the credit he gets rather than the team's chance of success. Similarly, top players will do better in the professional draft with good total statistics, like total points, than with good efficiency statistics, like field goal percentage; and individual statistics count more than team success.
However, the size of these deviations from optimal play is limited by the zero-sum considerations. A coach has to win, not lose, and that's more important than how he wins. A player has to please his coach to get playing time and the professional scouts to get drafted; and both mean trying to win, not just pile up stats. Thus we see the bias against three-point shots fading every year, and also the emphasis on total production rather than efficiency.
These sorts of game theory considerations matter in microscopic analyses of individual games and especially for proposition bets like the total number of three-point shots or whether a certain player will get a triple double. However, they don't matter enough to affect NCAA bracket picks because, as we shall see, individual game predictions are not the most important aspects to filling out a good bracket sheet.
The next place you might think to use game theory is in figuring out which bracket selections are likely to produce the most points. There are many different scoring systems. Some of them reward selecting upsets; for example, by adding points for the seed of the winning team. In other systems, you have to pay more to select highly seeded teams. Some pools weight picks differently or put constraints on picks. Many pools ignore the First Four games. Some pools allow you to change picks after a team loses, or allow your picks to “inherit” slots (for example, if Oral Roberts beats Baylor in the first round and you picked Baylor in the second round, that pick would switch to a pick of Oral Roberts in the second round).
If the point of the contest were to maximize total expected points, any rule variation would have a big impact on strategy. But in most pools the goal is to have the most points, or to finish among the top scores if there are prizes for runners-up. Both pretty good scores and terrible scores get zero payout. While it's important to understand the scoring rules of your pool, they will not figure deeply in your selection analysis.
This is a difficult point to explain without mathematics, so I am going to try with an analogy. It's a bit involved; you have to pay careful attention. But it's worth it because it is the secret to gaining advantage in March Madness pools.
Suppose we had a contest to guess the location of the first baby born in the continental United States after the ball drops in Times Square at midnight, January 1, 2013. Each entrant guesses a location (latitude and longitude), and the guess closest to the baby's birthplace wins. Level-one thinking ignores what everyone else will do and just picks based on likelihood. In this case, this means picking the location of a large US maternity hospital.
Level-two thinking means assuming everyone else will do level-one thinking. Get a map of the US and mark your guess on it. Now draw a line between your guess and the largest maternity hospital. Draw a second line at right angles to that line intersecting at the midpoint between your guess and the hospital. Paint everything black on the map to the hospital side of that second line. Now do the same thing with other large maternity hospitals you think other people will pick.
When you're finished, the location you picked will be surrounded by an irregular shape made up of line segments, with everything outside of it black. If the first baby is born inside the nonblack region, you win. The key point is the baby need not be born anywhere near the location you picked. You don't care how many babies are born within a short distance of your pick; you care how many babies are born within the large (hopefully large, anyway) area for which your location is closer than any location picked by anyone else. The best level-two choice might be some nearly uninhabited region far from any major metropolitan area. You can win if the baby is born in any rural area, town, or small city within a large geographic area. The level-one thinkers can only win if the baby is born in their hospital, and even then they may have to share the prize with others.
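To make the idea of the region a guess "commands" concrete, here is a toy one-dimensional sketch. The birth locations, their probabilities, and the rivals' guesses are all invented for illustration; the real contest would play out on a two-dimensional map.

```python
def win_probability(my_guess, rival_guesses, births):
    """Chance of winning a closest-guess contest on a line.

    births is a list of (position, probability) pairs that sum to 1.
    A birth counts for me when my guess is strictly closest; exact ties
    are split evenly among everyone at the minimum distance.
    """
    total = 0.0
    for position, probability in births:
        my_distance = abs(my_guess - position)
        rival_distances = [abs(g - position) for g in rival_guesses]
        best_rival = min(rival_distances)
        if my_distance < best_rival:
            total += probability
        elif my_distance == best_rival:
            tied_rivals = sum(1 for d in rival_distances if d == best_rival)
            total += probability / (tied_rivals + 1)
    return total


# Hypothetical births: a big hospital at position 0 and three smaller towns.
births = [(0.0, 0.30), (12.0, 0.25), (20.0, 0.25), (30.0, 0.20)]

# Three level-one thinkers all pick the single most likely spot.
rivals = [0.0, 0.0, 0.0]

print(win_probability(0.0, rivals, births))   # share the hospital four ways
print(win_probability(20.0, rivals, births))  # command everything past 10
```

In this made-up setup, joining the crowd at the hospital is worth a one-in-four share, while the lonely guess at position 20 wins whenever the baby arrives in any of the three smaller towns.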
Game theory goes one step further. Instead of assuming everyone else is a level-one thinker, it assumes everyone else thinks through infinite numbers of levels. Thus everyone will be selecting locations based on the number of babies born in the region that location will command, and everyone will know everyone else is selecting the same way.
The analogy to NCAA bracket picks is that level-one thinkers will make selections that are likely to happen. As a game theory thinker, you realize that the likelihood of your picks being 100% correct is irrelevant. What matters is the total likelihood of all possible outcomes for which your bracket selections are the closest of anyone's in the pool.
The crudest level-one thinker will pick the higher-ranked team in every slot. The higher-ranked team wins about two-thirds of the time in the March Madness tournament, so the probability of getting all 67 games exactly correct this way is about two-thirds to the 67th power, or 1 in 628 billion, about the same as the chance of winning the 59/35 Powerball one week from one ticket and then dealing yourself five cards from a well-shuffled deck and getting a full house.
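The arithmetic behind the 628 billion figure is easy to check. This is a minimal sketch that assumes, as the text does, a flat two-thirds chance for the favorite in every one of the 67 games; the variable names are illustrative.

```python
# Assumption from the text: the favorite wins each game with probability 2/3,
# independently, across all 67 games.
favorite_win_probability = 2 / 3
games = 67

all_favorites = favorite_win_probability ** games
one_in = 1 / all_favorites

print(f"1 in {one_in:,.0f}")  # roughly 1 in 628 billion
```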
Actually, the chance of a zero-upset NCAA tournament is about 1 in 20 trillion due to the variation of individual game odds around two-thirds. That is like winning the Powerball then dealing a straight flush in one attempt. And if you make these selections, and use up many lifetimes of luck getting every game to go your way, you may well share the prize with others who did the same thing.
But unlike the person who picks only favorites, you don't have to get all or almost all of your picks right to win. In fact if everyone else in the pool makes safe selections, you will often win as long as more than half the slots are upsets. The probability of that is about 1 in 100; 34 or more upsets is far, far more likely than zero upsets (remember, we expect about 1/3 of the 67 slots to be upsets, or about 22 upsets, and 34 is closer to 22 than zero is).
“Upset” refers to a slot, not a game. In the example above, Baylor would be the favorite to win the second-round game. But if Oral Roberts beat Baylor in the first round, but then lost to Florida in the second round, the second-round game would not be an upset; Florida would have been the favored team. However, Florida as a second-round winner would still be an upset from a bracket perspective because Baylor was the favorite for the second-round winner slot.
Unfortunately, in pools with more than 100 people in them, there are likely to be enough savvy pickers that selecting all upsets will not be a winning strategy. You're also vulnerable to level-zero thinkers, people who pick lots of upsets not because they're smarter than everyone else, but because they're dumber. Maybe they have no idea who the favorites are, or even what probability means, or don't care about winning, or pick based on loyalties or team names or dreams or anything else.
So far we have figured out that it's not smart to pick all favorites, and it's not smart to pick all upsets. You undoubtedly knew those things before you started reading, but perhaps you had not thought through precisely why they are true. Now that we've discussed reasons, we are in a position to answer the question of how many upsets to select. Clearly it's related to the number of people in the pool. With only one other person, all favorites is the best choice. With a very large number of people you can only win with a lot of luck, meaning an unusual outcome, meaning a lot of upset picks.
There is one other important consideration, however, one that is less susceptible to mathematical analysis. Game theory can derive an exact solution on the assumption that everyone else is a game-theory optimal player. But that will be very far from the truth in most pools. Instead we are better off guessing how most other people will pick, rather than solving for their optimal strategies. We will also assume there are a few other savvy pickers; you know better than I how many there are likely to be in your pool.
But don't be fooled by the basketball experts or even the sports betting experts. They're likely to waste effort on guessing individual game results. Also don't worry about most quantitative experts; they're likely to be modeling the bracket-game rules. Other than the few game theory thinkers in the world, the dangerous players will be the shrewd people with a lot of experience in these pools. They may not know why their strategy works, but it does work.
You are not in competition with these smart players. You are all trying to win money from the rest of the pool. To do it, you need to stay out of each other's way. I'm going to assume there is no collaboration involved. This means you will have to randomize your strategy in a way I will describe later.
Rather than approaching this from the top down, let's do it from the bottom up. Suppose you have filled in your entire form except for one game. That last game will only make a difference if the best other entrant in the pool has either the same score as you, or is up or down by exactly one point. That, in turn, likely means that most of your other picks are correct. This is important.
Suppose, for example, that you had picked Oral Roberts to beat Baylor and Florida to beat Long Beach State in the first round. Now you are considering whether to pick Florida in the second round. Without knowing who won the first round, you might think Florida has about a 40% chance of being the second-round winner. But assuming your first-round picks were correct, Florida has about a 75% chance of beating Oral Roberts. It's the latter probability that's more relevant, since if your first-round picks are incorrect, you're less likely to be in contention anyway.
If you knew you were one game behind the other player, you would want to pick the opposite of her pick, regardless of the odds of the game, since that's the only way to tie, which is the best you can do. If you knew you were one game ahead of the other player, you would want to pick the same as her, since that guarantees victory. Again, the game odds don't matter. If you knew you were tied with the other player, you would want to pick whichever outcome has higher probability, regardless of what she does.
In any reasonably large pool, the chance of you being one game behind the best other player has to be greater than your chance of being tied, and your chance of being tied has to be greater than your chance of being ahead. This is because there are lots of other players, so the best of them is likely better than you. Let's assume the chance is 60% that you're one game behind, 30% that you're tied, and 10% that you're one game ahead. Further, let's assume that you know the other player will pick the favorite for this slot.
If you also pick the favorite, you have a 30% chance of tying and a 10% chance of winning, for a total value of 30%/2 + 10% = 25%. If you pick the underdog, and the probability that the underdog wins is p, you tie if you're behind and the underdog wins, or ahead and the underdog loses. That probability is 60%*p + 10%*(1-p) = 10% + 50%*p. You win the pool outright if you're tied or ahead before the last game and your underdog pick wins. That probability is 30%*p + 10%*p = 40%*p. So your overall value is (10% + 50%*p)/2 + 40%*p = 5% + 65%*p. This is larger than 25% if p > 20%/65% = 31%. So in this case, you pick the underdog if it has at least a 31% chance of winning, and the favorite otherwise.
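The case analysis above can be checked by enumerating every combination of standing (behind, tied, ahead) and game result. This is a sketch under the text's assumptions: the 60/30/10 split, a rival who picks the favorite, and a tie worth half a win. Function and variable names are illustrative. Note that the enumeration credits you with an outright win when you are already ahead and your underdog also wins, which puts the break-even point at about 31%.

```python
# Standing before the last game, relative to the best rival (from the text).
STANDING = {-1: 0.60, 0: 0.30, +1: 0.10}


def payoff(final_margin):
    """1 for an outright win, 0.5 for a tie, 0 for a loss."""
    if final_margin > 0:
        return 1.0
    if final_margin == 0:
        return 0.5
    return 0.0


def value_of_pick(pick_underdog, p_underdog):
    """Expected payoff of your last pick when the rival takes the favorite."""
    total = 0.0
    for margin, chance in STANDING.items():
        for underdog_wins, game_chance in ((True, p_underdog), (False, 1 - p_underdog)):
            my_point = 1 if pick_underdog == underdog_wins else 0
            rival_point = 0 if underdog_wins else 1
            total += chance * game_chance * payoff(margin + my_point - rival_point)
    return total


def break_even(steps=10000):
    """Smallest underdog probability (on a grid) where the upset pick is
    at least as good as copying the rival's favorite pick."""
    for i in range(steps + 1):
        p = i / steps
        if value_of_pick(True, p) >= value_of_pick(False, p):
            return p
    return None


print(round(value_of_pick(False, 0.35), 4))  # 0.25 whatever the odds
print(round(value_of_pick(True, 0.35), 4))   # 0.05 + 0.65 * 0.35 = 0.2775
print(round(break_even(), 3))                # about 0.308
```

Changing the three numbers in STANDING shows how quickly the threshold moves: the more likely you are to be behind, the weaker the underdog you should be willing to back.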
Of course, we don't know all the probabilities above to solve each game exactly, but we don't need to. All we need to know is that in a pool where people heavily pick the favorites, we should pick all the underdogs above some minimum probability of winning. That is, we want the least surprising upsets; we want the underdogs in the closest games. We can set the probability threshold by considering the number of people in the pool. I use a rule of thumb that I want to win one time in the square root of the number of entrants, so one time in 10 in a pool of 100.
I further assume I win if more than half of my upset picks win. That works out to something like picking 40 upset slots, with an average chance for the underdog of about 40%. The sheet might look something like assuming the No. 3 through No. 8 seeds in each regional all lose in the first round, all the No. 2 seeds lose in the second round, but that every other game goes to the higher ranked team among those that compete.
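The rule of thumb can be sanity-checked with a binomial tail. This sketch assumes every upset pick wins independently with the same probability, which is a simplification (slot upsets in a real bracket are linked, and their odds vary); the function names are illustrative.

```python
from math import comb, sqrt


def target_win_rate(entrants):
    """Rule of thumb from the text: win one time in sqrt(number of entrants)."""
    return 1 / sqrt(entrants)


def chance_more_than_half_win(upset_picks, p_underdog):
    """P(more than half of the upset picks win), treating picks as
    independent with a common win probability (a simplifying assumption)."""
    needed = upset_picks // 2 + 1
    return sum(
        comb(upset_picks, k) * p_underdog**k * (1 - p_underdog) ** (upset_picks - k)
        for k in range(needed, upset_picks + 1)
    )


print(target_win_rate(100))                 # 0.1, one time in 10
print(chance_more_than_half_win(40, 0.40))  # in the neighborhood of the target
```

Under these assumptions, adding more upset picks at the same odds lowers the chance that more than half come through, which is why larger pools, with their lower target win rates, call for more upsets.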
However, there is one key assumption above that we have to relax. We assumed that the other player always picked the favorite in each game. In fact, most players pick some upsets, and are much more likely to pick close-game upsets than huge upsets (for example, no No. 16 team has ever beaten a No. 1 seed in 108 chances, so almost no one picks a No. 16 seed). What we really care about is not p, the probability of the underdog winning, but f*p where f is the chance that the other player picks the favorite (we assumed above that f=1, so only p mattered).
While we don't know what f is, we can sort the games into broad categories. In about one-third of games, the underdog has a 40% or better chance of winning, in about a third it has between 30% and 40%, and in a third it has less than 30%. In about one-third of the games, the favorite will be picked over 85% of the time, about a third it will be picked between 70% and 85%, and about a third under 70%. We want to pick our upsets that are first in both categories, but these are rare. If a game is reasonably even, a lot of people will tend to pick the underdog. So most of our upset picks will be top in one category and not bottom in the other.
A good choice is a game in which the favorite is a big, well-known school that has often been successful in the past, and the underdog is obscure but very competitive this year. Also look for favorites that are local or have national followings. Stay away from underdogs that have been tipped as potential Cinderellas, or were Cinderellas in the past few years, or are local or popular, or have a star player. However, you're going to need a lot of these games so you can't be too choosy. As long as the underdog has a decent chance and there is no obvious reason for nongame-theory players to pick it, you should add it to your set of choices.
While we don't know the brackets yet, some good underdog choices in 2012 are likely to be Drexel, Akron and Belmont — assuming that your office is not near any of these schools and it is not populated with their alumni. Avoid schools like New Mexico, Mississippi State, Notre Dame, West Virginia, Florida, and Cincinnati (that is, don't pick them to upset anyone, and pick them to be upset). Some good schools to pick to exit early are Florida State, Indiana, Louisville, Creighton, UNLV, Georgetown, Vanderbilt, Connecticut, and Temple, especially if you work with their fans.
Some sleepers to pick for Sweet Sixteen or Elite Eight are Texas, Murray State, and Long Beach State. Kentucky is probably too good to bet against in any round unless your office is in Kentucky, in which case you might pick it to lose in round one and make every other game a favorite pick. But you need a large pool to make that profitable. Otherwise, there's not much difference between the likely No. 1 and 2 seeds, so picking the other three No. 2 seeds for the Final Four could be good strategy.
If there were no other sophisticated players in your pool, you would just select your upsets in order of attractiveness based on highest f (probability that others will select the favorite in this game) times p (probability that the underdog will win). However, you also want to put some distance between your picks and those of other smart people. Since there aren't many smart people (unless you work in an office full of them, in which case you're all going to get so rich that you don't need the money from winning the March Madness pool), don't work too hard on this. Also, you get some distance due to different estimates of the attractiveness of various games. Nevertheless, it makes sense to randomize a bit, put in a few marginally unattractive upsets and leave out a few marginally attractive ones.
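The selection rule can be sketched as a ranking by f times p, with a little noise added so your sheet drifts away from those of other savvy players. Every matchup name and number below is made up for illustration, and the jitter scheme is just one assumed way to randomize, not a method taken from the text.

```python
import random


def choose_upsets(candidates, how_many, jitter=0.0, rng=None):
    """Pick upset slots by attractiveness f * p.

    candidates: list of (label, p, f) where p is the underdog's chance of
    winning and f is the share of the pool expected to pick the favorite.
    jitter: each score is scaled by a random factor in [1 - jitter, 1 + jitter],
    so marginal picks can swap places. jitter=0 gives the plain ranking.
    """
    rng = rng or random.Random()
    scored = []
    for label, p, f in candidates:
        noise = 1 + rng.uniform(-jitter, jitter)
        scored.append((f * p * noise, label))
    scored.sort(reverse=True)
    return [label for _, label in scored[:how_many]]


# Hypothetical candidates: (label, p, f).
candidates = [
    ("Obscure U over Famous State", 0.42, 0.85),      # f * p = 0.357
    ("Trendy Cinderella over Big Name", 0.45, 0.70),  # f * p = 0.315
    ("Quiet Tech over Local Favorite", 0.38, 0.90),   # f * p = 0.342
    ("Longshot College over Top Seed", 0.10, 0.98),   # f * p = 0.098
]

print(choose_upsets(candidates, 2))  # the two highest f * p values
print(choose_upsets(candidates, 2, jitter=0.15, rng=random.Random(2012)))
```

With jitter set to zero this is the plain ordering you would use if no other sophisticated players were in the pool; a modest jitter lets a marginally unattractive upset occasionally displace a marginally attractive one.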
I described a method above to set the total number of upsets. Pick a target probability of winning (I use one over the square root of the number of entrants) and pick enough upsets so that probability equals the chance of at least half of your upset picks winning. This will generally lead to far more upsets than most people pick, maybe 20 in a small pool (10 to 50 people), 40 in a medium-sized one (50 to 250), and 60 in a very large pool (more than 250). Remember that “upset” refers to slots, not games. So 60 upsets doesn't mean the underdog wins in 60 out of 67 games, but that the team occupying 60 out of the 67 slots is not the highest seed that can get to that slot.
For example, if you pick the No. 2 seed to win a regional, that is an upset pick, even though you may have the No. 2 seed playing the No. 4 seed in the regional final. Also, remember that all probabilities are estimated assuming that your other picks are correct, since unless most of them are correct, you're out of contention anyway.
There is one last secret to winning, the most important of all. Go to some games, if at all possible from where you live, and watch the rest with friends. Don't wait until the Final Four to get interested. Sure there's drama and great play at the end, but it's 10 times the pleasure if you've watched each team in four previous games and have come to know the players. The greatest thing about March Madness is that someone who hasn't watched a minute of college basketball all year, or read a word about it, can experience the full measure of it over three weeks.
Oh, and win a little money from the people you work with, too.