# Statsketball 2019

Below is my submission methodology for the 2019 American Statistical Association Statsketball Draft Challenge. Having won the contest in 2018, I knew that I had to up my game this year to defend my crown.

# Contest Background

The Statsketball Draft Challenge gives participants the chance to select NCAA Tournament teams using a budget of 224 draft points. The cost of each team, in draft points, is based on the team’s seed.

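| seed | cost |
|------|------|
| 1 | 75 |
| 2 | 40 |
| 3 | 25 |
| 4 | 20 |
| 5 | 17 |
| 6 | 15 |
| 7 | 12 |
| 8 | 10 |
| 9 | 9 |
| 10 | 8 |
| 11 | 7 |
| 12 | 6 |
| 13 | 5 |
| 14 | 4 |
| 15 | 3 |
| 16 | 1 |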

The deeper each team goes in the tournament, the more points it accrues, with each round more valuable than the last. Wins in each round are worth 1, 2, 3, 5, 8, and 13 points, respectively. The goal of the contest is to score the most points by selecting a group of teams within the allotted budget.
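As a quick sketch, the rules above can be written down as a pair of lookup vectors in R. The names `seed_cost`, `round_points`, and `entry_cost` are my own, chosen for illustration; they are not part of the contest materials.

```r
# Cost in draft points for seeds 1 through 16 (from the cost table above)
seed_cost <- c(75, 40, 25, 20, 17, 15, 12, 10, 9, 8, 7, 6, 5, 4, 3, 1)
names(seed_cost) <- 1:16

# Points awarded for a win in rounds 1 through 6
round_points <- c(1, 2, 3, 5, 8, 13)

budget <- 224

# Hypothetical helper: total cost of an entry, given the seeds of its teams
entry_cost <- function(seeds) sum(seed_cost[as.character(seeds)])

# One team from each of seeds 1-4 fits comfortably within the budget
entry_cost(c(1, 2, 3, 4))
entry_cost(c(1, 2, 3, 4)) <= budget

# Three 1-seeds do not fit
entry_cost(c(1, 1, 1)) <= budget

# A team that wins the title collects every round's points
sum(round_points)
```

One consequence of these costs is that an entry can hold at most two 1-seeds, so the budget forces a mix of favorites and cheaper teams.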

# Methodology

### NCAA Hoops Model

A key ingredient to my submission is the NCAA men’s basketball model that I have developed over the last 3 years. In very simple terms, the model uses weighted least squares regression to predict score differentials between any two teams, and then uses logistic regression to transform the predicted point spread into a win probability for each team. The specifics of this model are not the primary focus of the Statsketball tournament submission, but for those curious, a detailed write-up of the model methodology can be found here. For the purposes of this contest, all that matters is that, given teams $$A$$ and $$B$$, the model outputs $$P_A$$ and $$P_B$$, the chances that team $$A$$ and team $$B$$ win the game, respectively.
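To make the second stage concrete, here is a minimal sketch of turning a predicted point spread into a win probability with logistic regression. This is not the actual NCAA Hoops Model: the data are synthetic, the noise levels are assumptions, and `win_prob` is a hypothetical helper name.

```r
set.seed(1)

# Synthetic data for illustration only: predicted spreads for team A, and
# outcomes generated with an assumed 10-point game-to-game noise level
pred_spread <- rnorm(2000, mean = 0, sd = 8)
actual_margin <- pred_spread + rnorm(2000, mean = 0, sd = 10)
games <- data.frame(pred_spread = pred_spread,
                    win = as.integer(actual_margin > 0))

# Logistic regression mapping predicted spread to win probability
fit <- glm(win ~ pred_spread, data = games, family = binomial)

# Hypothetical helper: P_A for a given predicted spread in favor of team A
win_prob <- function(spread) {
  unname(predict(fit, newdata = data.frame(pred_spread = spread),
                 type = "response"))
}

p_a <- win_prob(5)   # team A favored by 5 points
p_b <- 1 - p_a       # team B's chances are the complement
c(P_A = p_a, P_B = p_b)
```

The only property the rest of this post relies on is that every matchup yields a pair of probabilities that sum to 1.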

### Simulating the NCAA Tournament

The second step in my submission involved running Monte Carlo simulations of the NCAA tournament. A single simulation of the tournament used the following steps:

1. Use the NCAA Hoops Model to predict the win probabilities for all 32 games in the first round.
2. Draw 32 random numbers from a uniform distribution, one for each game.
3. Let game $$i$$ be played between team $$A_i$$ and team $$B_i$$. If the $$i^{th}$$ random number is $$\leq$$ team $$A_i$$’s chances of winning, advance team $$A_i$$ to the next round. Otherwise, advance team $$B_i$$ to the next round.
4. Repeat step 3 for all games in the given round.
5. Repeat steps 1-4 for subsequent rounds, until a champion is crowned.
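The steps above can be sketched on a toy 8-team bracket. This is an illustration rather than my actual simulation code: the team names and ratings are made up, and `win_prob` is a stand-in for the NCAA Hoops Model.

```r
set.seed(2019)

# Hypothetical 8-team bracket, listed in bracket order so that adjacent
# teams meet. Ratings are made up for illustration.
teams <- c("A", "B", "C", "D", "E", "F", "G", "H")
rating <- c(A = 12, B = -6, C = 3, D = 1, E = 8, F = -2, G = 5, H = 0)

# Stand-in for the model: probability that team a beats team b
win_prob <- function(a, b) plogis((rating[a] - rating[b]) / 6)

# Play one round: pair up adjacent teams and advance one team per game
play_round <- function(field) {
  a <- field[seq(1, length(field), by = 2)]
  b <- field[seq(2, length(field), by = 2)]
  u <- runif(length(a))                  # one uniform draw per game
  unname(ifelse(u <= win_prob(a, b), a, b))
}

# Simulate one tournament; returns the number of wins for each team
sim_tournament <- function(field) {
  wins <- setNames(integer(length(field)), field)
  while (length(field) > 1) {
    field <- play_round(field)
    wins[field] <- wins[field] + 1L
  }
  wins
}

# Repeat many times and estimate P(team wins at least k games)
n_sims <- 2000
wins <- t(replicate(n_sims, sim_tournament(teams)))
reach <- sapply(1:3, function(k) colMeans(wins >= k))
colnames(reach) <- c("semis", "final", "champ")
round(reach, 3)
```

Each column of `reach` plays the same role as a round column in a table of simulated advancement probabilities: it is the share of simulations in which a team got at least that far.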

I ran 10,000 simulations of the NCAA tournament to obtain estimates of the probabilities of each team reaching each round. R code for running these simulations can be found here. The results of my simulations are shown below.

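| team | seed | region | r64 | r32 | s16 | e8 | f4 | ncg | champ |
|------|------|--------|-----|-----|-----|----|----|-----|-------|
| Gonzaga | 1 | West | 1.0000 | 0.9954 | 0.9133 | 0.7932 | 0.6129 | 0.4113 | 0.2653 |
| Duke | 1 | East | 1.0000 | 0.9928 | 0.9036 | 0.7341 | 0.5039 | 0.3367 | 0.1928 |
| Virginia | 1 | South | 1.0000 | 0.9866 | 0.8822 | 0.7261 | 0.5267 | 0.2890 | 0.1682 |
| North Carolina | 1 | Midwest | 1.0000 | 0.9889 | 0.8877 | 0.6631 | 0.4617 | 0.2354 | 0.1081 |
| Michigan St. | 2 | East | 1.0000 | 0.9824 | 0.7774 | 0.6230 | 0.3082 | 0.1867 | 0.0870 |
| Kentucky | 2 | Midwest | 1.0000 | 0.9695 | 0.7446 | 0.5019 | 0.2380 | 0.0954 | 0.0362 |
| Michigan | 2 | West | 1.0000 | 0.9549 | 0.7001 | 0.4205 | 0.1518 | 0.0688 | 0.0276 |
| Tennessee | 2 | South | 1.0000 | 0.9508 | 0.7411 | 0.4412 | 0.1914 | 0.0728 | 0.0275 |
| Purdue | 3 | South | 1.0000 | 0.9398 | 0.6942 | 0.3801 | 0.1453 | 0.0517 | 0.0193 |
| Texas Tech | 3 | West | 1.0000 | 0.9165 | 0.6242 | 0.3324 | 0.1060 | 0.0427 | 0.0160 |
| Auburn | 5 | Midwest | 1.0000 | 0.7955 | 0.5087 | 0.1826 | 0.0935 | 0.0312 | 0.0093 |
| Virginia Tech | 4 | East | 1.0000 | 0.9012 | 0.5951 | 0.1621 | 0.0678 | 0.0277 | 0.0080 |
| Houston | 3 | Midwest | 1.0000 | 0.9020 | 0.5159 | 0.2142 | 0.0721 | 0.0193 | 0.0056 |
| Wisconsin | 5 | South | 1.0000 | 0.7462 | 0.4815 | 0.1321 | 0.0573 | 0.0158 | 0.0042 |
| Florida St. | 4 | West | 1.0000 | 0.8603 | 0.5624 | 0.1190 | 0.0493 | 0.0144 | 0.0040 |
| Buffalo | 6 | West | 1.0000 | 0.7971 | 0.3297 | 0.1366 | 0.0336 | 0.0110 | 0.0033 |
| Louisville | 7 | East | 1.0000 | 0.7231 | 0.1890 | 0.1093 | 0.0288 | 0.0110 | 0.0033 |
| LSU | 3 | East | 1.0000 | 0.8281 | 0.5024 | 0.1455 | 0.0378 | 0.0121 | 0.0030 |
| Kansas | 4 | Midwest | 1.0000 | 0.8524 | 0.3988 | 0.1145 | 0.0476 | 0.0111 | 0.0020 |
| Iowa St. | 6 | Midwest | 1.0000 | 0.6897 | 0.3652 | 0.1494 | 0.0486 | 0.0123 | 0.0017 |
| Kansas St. | 4 | South | 1.0000 | 0.8050 | 0.3783 | 0.0852 | 0.0297 | 0.0061 | 0.0015 |
| Marquette | 5 | West | 1.0000 | 0.6964 | 0.3216 | 0.0455 | 0.0161 | 0.0035 | 0.0013 |
| Mississippi St. | 5 | East | 1.0000 | 0.8290 | 0.3609 | 0.0747 | 0.0233 | 0.0063 | 0.0010 |
| Wofford | 7 | Midwest | 1.0000 | 0.6854 | 0.2001 | 0.0909 | 0.0237 | 0.0056 | 0.0009 |
| Villanova | 6 | South | 1.0000 | 0.6152 | 0.2016 | 0.0725 | 0.0179 | 0.0029 | 0.0006 |
| Maryland | 6 | East | 1.0000 | 0.7421 | 0.3775 | 0.0956 | 0.0215 | 0.0057 | 0.0005 |
| Utah St. | 8 | Midwest | 1.0000 | 0.5534 | 0.0668 | 0.0184 | 0.0041 | 0.0009 | 0.0003 |
| Cincinnati | 7 | South | 1.0000 | 0.5308 | 0.1392 | 0.0442 | 0.0078 | 0.0014 | 0.0002 |
| Yale | 14 | East | 1.0000 | 0.1719 | 0.0457 | 0.0044 | 0.0005 | 0.0002 | 0.0002 |
| Florida | 10 | West | 1.0000 | 0.4472 | 0.1223 | 0.0390 | 0.0064 | 0.0014 | 0.0001 |
| Syracuse | 8 | West | 1.0000 | 0.5412 | 0.0490 | 0.0193 | 0.0063 | 0.0013 | 0.0001 |
| Saint Mary’s (CA) | 11 | South | 1.0000 | 0.3848 | 0.0944 | 0.0259 | 0.0039 | 0.0007 | 0.0001 |
| UCF | 9 | East | 1.0000 | 0.4608 | 0.0406 | 0.0099 | 0.0020 | 0.0005 | 0.0001 |
| Baylor | 9 | West | 1.0000 | 0.4588 | 0.0377 | 0.0130 | 0.0031 | 0.0004 | 0.0001 |
| Iowa | 10 | South | 1.0000 | 0.4692 | 0.1131 | 0.0352 | 0.0064 | 0.0007 | 0.0000 |
| VCU | 8 | East | 1.0000 | 0.5392 | 0.0557 | 0.0165 | 0.0032 | 0.0007 | 0.0000 |
| Oklahoma | 9 | South | 1.0000 | 0.5250 | 0.0609 | 0.0220 | 0.0055 | 0.0006 | 0.0000 |
| Ole Miss | 8 | South | 1.0000 | 0.4750 | 0.0549 | 0.0194 | 0.0054 | 0.0006 | 0.0000 |
| Washington | 9 | Midwest | 1.0000 | 0.4466 | 0.0445 | 0.0108 | 0.0019 | 0.0003 | 0.0000 |
| Minnesota | 10 | East | 1.0000 | 0.2769 | 0.0324 | 0.0128 | 0.0016 | 0.0003 | 0.0000 |
| Ohio St. | 11 | Midwest | 1.0000 | 0.3103 | 0.1036 | 0.0275 | 0.0045 | 0.0002 | 0.0000 |
| Seton Hall | 10 | Midwest | 1.0000 | 0.3146 | 0.0522 | 0.0149 | 0.0021 | 0.0002 | 0.0000 |
| Murray St. | 12 | West | 1.0000 | 0.3036 | 0.0836 | 0.0079 | 0.0011 | 0.0002 | 0.0000 |
| Oregon | 12 | South | 1.0000 | 0.2538 | 0.1059 | 0.0124 | 0.0024 | 0.0001 | 0.0000 |
| New Mexico St. | 12 | Midwest | 1.0000 | 0.2045 | 0.0705 | 0.0090 | 0.0021 | 0.0001 | 0.0000 |
| Temple | 11 | East | 0.4049 | 0.0883 | 0.0199 | 0.0018 | 0.0003 | 0.0001 | 0.0000 |
| Belmont | 11 | East | 0.5951 | 0.1696 | 0.0545 | 0.0074 | 0.0009 | 0.0000 | 0.0000 |
| Vermont | 13 | West | 1.0000 | 0.1397 | 0.0324 | 0.0021 | 0.0005 | 0.0000 | 0.0000 |
| UC Irvine | 13 | South | 1.0000 | 0.1950 | 0.0343 | 0.0024 | 0.0003 | 0.0000 | 0.0000 |
| Arizona St. | 11 | West | 0.5599 | 0.1223 | 0.0192 | 0.0033 | 0.0001 | 0.0000 | 0.0000 |
| Liberty | 12 | East | 1.0000 | 0.1710 | 0.0251 | 0.0018 | 0.0001 | 0.0000 | 0.0000 |
| Northern Ky. | 14 | West | 1.0000 | 0.0835 | 0.0145 | 0.0012 | 0.0001 | 0.0000 | 0.0000 |
| Georgia St. | 14 | Midwest | 1.0000 | 0.0980 | 0.0153 | 0.0010 | 0.0001 | 0.0000 | 0.0000 |
| Saint Louis | 13 | East | 1.0000 | 0.0988 | 0.0189 | 0.0009 | 0.0001 | 0.0000 | 0.0000 |
| Northeastern | 13 | Midwest | 1.0000 | 0.1476 | 0.0220 | 0.0016 | 0.0000 | 0.0000 | 0.0000 |
| St. John’s (NY) | 11 | West | 0.4401 | 0.0806 | 0.0124 | 0.0009 | 0.0000 | 0.0000 | 0.0000 |
| Montana | 15 | West | 1.0000 | 0.0451 | 0.0059 | 0.0009 | 0.0000 | 0.0000 | 0.0000 |
| Old Dominion | 14 | South | 1.0000 | 0.0602 | 0.0098 | 0.0006 | 0.0000 | 0.0000 | 0.0000 |
| Gardner-Webb | 16 | South | 1.0000 | 0.0134 | 0.0020 | 0.0004 | 0.0000 | 0.0000 | 0.0000 |
| Colgate | 15 | South | 1.0000 | 0.0492 | 0.0066 | 0.0003 | 0.0000 | 0.0000 | 0.0000 |
| Abilene Christian | 15 | Midwest | 1.0000 | 0.0305 | 0.0031 | 0.0002 | 0.0000 | 0.0000 | 0.0000 |
| Iona | 16 | Midwest | 1.0000 | 0.0111 | 0.0010 | 0.0000 | 0.0000 | 0.0000 | 0.0000 |
| North Dakota St. | 16 | East | 0.7344 | 0.0061 | 0.0001 | 0.0000 | 0.0000 | 0.0000 | 0.0000 |
| N.C. Central | 16 | East | 0.2656 | 0.0011 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 |
| Fairleigh Dickinson | 16 | West | 0.5314 | 0.0025 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 |
| Prairie View | 16 | West | 0.4686 | 0.0021 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 |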

### Expected Points

Now that we have the probabilities that each team reaches each round of the tournament, we can compute the expected number of points a given team will score. Let $$S_i$$ denote the number of points team $$i$$ will score in the Statsketball tournament. The expected number of points team $$i$$ will score, $$\mathbb{E}(S_i)$$, is computed as follows:

$$\mathbb{E}(S_i) = \sum_{j = 1}^6 P(\text{Team } i \text{ wins in Round } j)\times(\text{Points for win in Round } j)$$

It’s not surprising that teams with better seeds tend to have the best expected points. However, given the increased cost, is it necessarily worth it? Looking at a plot of expected points divided by team cost (to get an estimate of how many points we can expect per draft point we spend), it’s clear that many of the 1-seeds, with the exception of Gonzaga, are overvalued.
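As a worked example, here is the expected points calculation for Gonzaga, using its row from the simulation results. Winning in round $$j$$ is the same event as reaching round $$j + 1$$, so the probabilities needed are the `r32` through `champ` columns. The variable names below are my own.

```r
# Points for a win in rounds 1 through 6
round_points <- c(1, 2, 3, 5, 8, 13)

# Gonzaga's simulated probabilities of reaching each round after the first
gonzaga <- c(r32 = 0.9954, s16 = 0.9133, e8 = 0.7932,
             f4 = 0.6129, ncg = 0.4113, champ = 0.2653)

# Expected points: probability of each win times that round's point value
exp_pts <- sum(gonzaga * round_points)
exp_pts          # 15.0054

# Expected points per draft point, at the 1-seed cost of 75
exp_pts / 75     # about 0.20
```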

### Selecting Teams: The Knapsack Problem

I’d like to select the teams that maximize the number of expected points in my entry, subject to the budget restriction. This is a version of the classic Knapsack Problem. In the Knapsack Problem, there are $$n$$ objects with values $$p_1, ..., p_n$$ and weights $$w_1, ..., w_n$$. The problem asks for the most valuable subset of objects that can fit in the knapsack, which has weight capacity $$C$$. In the context of Statsketball, the objects are the teams, values $$p_i$$ are equal to the expected points scored $$\mathbb{E}(S_i)$$, weights $$w_i$$ are equal to team costs, and the capacity $$C$$ is our budget of 224 draft points. The Knapsack Problem can be solved using dynamic programming, and there is an implementation of a Knapsack solver in the adagio R package. Note that we will have to combine the expected points of the First Four teams, because the contest rules only ask that we select the combination of teams in each play-in game slot.
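Combining the First Four teams might look something like the sketch below, shown for the two 11-seed play-in slots using the simulated probabilities from the results table. The data frame layout and the names `first_four`, `slot`, and `slots` are assumptions for illustration, and I am assuming a win in the First Four game itself scores no points.

```r
round_points <- c(1, 2, 3, 5, 8, 13)

# Two First Four pairings, with probabilities of reaching each later round
first_four <- data.frame(
  team  = c("Belmont", "Temple", "Arizona St.", "St. John's (NY)"),
  slot  = c("11 East", "11 East", "11 West", "11 West"),
  cost  = 7,
  r32   = c(0.1696, 0.0883, 0.1223, 0.0806),
  s16   = c(0.0545, 0.0199, 0.0192, 0.0124),
  e8    = c(0.0074, 0.0018, 0.0033, 0.0009),
  f4    = c(0.0009, 0.0003, 0.0001, 0.0000),
  ncg   = c(0.0000, 0.0001, 0.0000, 0.0000),
  champ = c(0.0000, 0.0000, 0.0000, 0.0000),
  stringsAsFactors = FALSE
)

# Expected points for each individual team
rounds <- c("r32", "s16", "e8", "f4", "ncg", "champ")
first_four$exp_pts <- as.vector(as.matrix(first_four[, rounds]) %*% round_points)

# One row per play-in slot: the pair's expected points add, the cost does not
slots <- aggregate(exp_pts ~ slot + cost, data = first_four, FUN = sum)
slots$team <- sapply(slots$slot, function(s) {
  paste(first_four$team[first_four$slot == s], collapse = "/")
}, USE.NAMES = FALSE)
slots
```

Adding the two teams' expected points is valid because the simulated probabilities are unconditional: only one of the pair can advance in any given simulation, so the two contributions never overlap.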

    library(adagio)
    kn_solved <- knapsack(ncaa_sims$cost, ncaa_sims$exp_pts, 224)
    kn_solved

    ## $capacity
    ## [1] 224
    ##
    ## $profit
    ## [1] 42.297
    ##
    ## $indices
    ## [1]  2 18 21 32 35 48 49 51

    ncaa_sims$team[kn_solved$indices]

    ## [1] "Michigan St." "Kentucky"     "Auburn"       "Iona"         "Purdue"
    ## [6] "Gardner-Webb" "Gonzaga"      "Texas Tech"

We aren’t expecting much from either of the 16 seeds, Gardner-Webb and Iona; they are chosen simply to use up the last 2 draft points rather than waste them. Not surprisingly, the teams chosen are among the best value picks in seeds 1-5 identified above.
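For readers curious about what a dynamic programming solution looks like, here is a minimal base R sketch of the 0-1 knapsack recursion. It is not the adagio implementation, `knapsack_dp` is a hypothetical name, and it assumes integer weights. The example uses six teams with costs from the contest and expected points computed from the simulation results, with a reduced budget of 100 to keep things small.

```r
# 0-1 knapsack by dynamic programming; w must be positive integers
knapsack_dp <- function(w, p, cap) {
  n <- length(w)
  # best[i + 1, b + 1]: best value using the first i items with budget b
  best <- matrix(0, nrow = n + 1, ncol = cap + 1)
  for (i in seq_len(n)) {
    for (b in 0:cap) {
      best[i + 1, b + 1] <- best[i, b + 1]              # skip item i
      if (w[i] <= b) {
        take <- best[i, b - w[i] + 1] + p[i]            # take item i
        if (take > best[i + 1, b + 1]) best[i + 1, b + 1] <- take
      }
    }
  }
  # Trace back through the table to recover which items were taken
  chosen <- integer(0)
  b <- cap
  for (i in n:1) {
    if (best[i + 1, b + 1] != best[i, b + 1]) {
      chosen <- c(i, chosen)
      b <- b - w[i]
    }
  }
  list(capacity = sum(w[chosen]), profit = best[n + 1, cap + 1],
       indices = chosen)
}

team <- c("Gonzaga", "Michigan St.", "Kentucky", "Purdue", "Texas Tech", "Auburn")
cost <- c(75, 40, 40, 25, 25, 17)
exp_pts <- c(15.0054, 8.5718, 6.3882, 4.8595, 4.2417, 3.1987)

res <- knapsack_dp(cost, exp_pts, 100)
team[res$indices]
res$profit
```

With only 100 draft points to spend, the sketch picks Gonzaga and Purdue, which beats the best entry without a 1-seed (Michigan St., Kentucky, and Auburn) by about 1.7 expected points.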

### Limitations

There are two main limitations of maximizing expected points. The first is that we don’t know which teams other entrants to the Statsketball tournament are selecting. Any team that is selected by several entrants becomes relatively less valuable. The second is that maximizing expected points tends to yield fewer teams. Many times, we can obtain 90% of the expected points of 1 team for the same (or lower) cost using some combination of 2 or 3 teams. Having lots of teams sounds great in principle, as we keep more doors open in case of upsets, but at the same time, the lower seeded a team is, the less likely it is to exceed its expected value. Many distributions of team points in Statsketball will be heavily skewed towards 0, as lower seeded teams are likely to be knocked out in round 1 of the tournament. Because we are comparing means and not distributions, there might be some other combination of teams with a lower expected value that is more likely to outscore the group of teams I’ve selected.
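To illustrate the skew, the full distribution of a single team's points can be recovered from its round-by-round probabilities. The sketch below does this for Gonzaga and for 12-seed Murray St., using their rows from the simulation results; `points_dist` is a hypothetical helper name.

```r
# Total points after exactly 0, 1, ..., 6 wins
total_points <- cumsum(c(0, 1, 2, 3, 5, 8, 13))   # 0 1 3 6 11 19 32

# Distribution of a team's total points, given its probabilities of
# reaching r32, s16, e8, f4, ncg, and champ
points_dist <- function(reach) {
  # P(exactly k wins) = P(at least k wins) - P(at least k + 1 wins)
  at_least <- c(1, reach, 0)
  setNames(-diff(at_least), total_points)
}

gonzaga   <- c(0.9954, 0.9133, 0.7932, 0.6129, 0.4113, 0.2653)
murray_st <- c(0.3036, 0.0836, 0.0079, 0.0011, 0.0002, 0.0000)

round(points_dist(gonzaga), 4)
round(points_dist(murray_st), 4)

# The mean of each distribution recovers the team's expected points
sum(total_points * points_dist(gonzaga))
sum(total_points * points_dist(murray_st))
```

Murray St. scores 0 points with probability 0.6964, so its expected value of roughly half a point is carried almost entirely by the minority of outcomes in which it pulls an upset.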

By no means is this method perfect, but I think it does a reasonably good job of identifying and selecting undervalued teams given the cost constraints of the tournament.