📊 Making NBA Picks using ML 🏀

Using an XGBoost model with ChatGPT to simplify picking winners

Oct 29, 2023

I’m into stats and basketball, and so naturally I have spent some time researching projects that use machine learning to pick winners. With the season starting up I’m excited to improve my process, specifically using ChatGPT for displaying results.

NBA-Machine-Learning-Sports-Betting by Kyle Skompinski

This project offers two models for the Money Line (which team wins) and the Over/Under (total points): XGBoost and a TensorFlow one. To keep things simple, I’m focusing only on the Money Line (ML) and XGBoost.

It uses the odds API to calculate Expected Value (EV), and more recently it has added the Kelly Criterion to suggest a wager as a percentage of bankroll, based on the expected value.
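To make those two numbers concrete, here is a minimal sketch of the textbook formulas for expected value per $100 staked and the Kelly fraction, given a win probability and American odds. The function names are my own, and this is an assumption about how such values are typically computed, not necessarily the project's exact implementation.

```python
def profit_per_unit(american_odds: float) -> float:
    """Net profit per 1 unit staked if the bet wins (American odds, nonzero)."""
    if american_odds > 0:
        return american_odds / 100
    return 100 / abs(american_odds)


def expected_value(p_win: float, american_odds: float, stake: float = 100.0) -> float:
    """Expected profit for a given stake: win profit minus expected loss."""
    b = profit_per_unit(american_odds)
    return p_win * b * stake - (1 - p_win) * stake


def kelly_fraction(p_win: float, american_odds: float) -> float:
    """Textbook Kelly fraction f = (b*p - q) / b, floored at 0 (no bet)."""
    b = profit_per_unit(american_odds)
    q = 1 - p_win
    return max((b * p_win - q) / b, 0.0)


if __name__ == "__main__":
    # Hypothetical inputs: a 60% model probability at +100 odds
    print(expected_value(0.60, 100))
    print(kelly_fraction(0.60, 100))
```

With a 60% win probability at +100 odds, this works out to an EV of 20 per $100 staked and a Kelly fraction of 20% of bankroll. A negative-EV side always gets a Kelly fraction of 0, which matches the 0% rows in the output.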

You can run their Python script in a Colab notebook or by checking out the repo locally.

------------Expected Value & Kelly Criterion-----------
Oklahoma City Thunder EV: 60.4 Fraction of Bankroll: 41.94%
Denver Nuggets EV: -45.82 Fraction of Bankroll: 0%
Houston Rockets EV: 8.43 Fraction of Bankroll: 4.48%
Golden State Warriors EV: -9.94 Fraction of Bankroll: 0%
Milwaukee Bucks EV: -10.02 Fraction of Bankroll: 0%
Atlanta Hawks EV: 10.29 Fraction of Bankroll: 4.79%
Philadelphia 76ers EV: -11.69 Fraction of Bankroll: 0%
Portland Trail Blazers EV: 26.33 Fraction of Bankroll: 7.31%
LA Clippers EV: -35.37 Fraction of Bankroll: 0%
San Antonio Spurs EV: 100.75 Fraction of Bankroll: 31.48%
Sacramento Kings EV: 5.95 Fraction of Bankroll: 7.86%
Los Angeles Lakers EV: -15.92 Fraction of Bankroll: 0%
-------------------------------------------------------

Net EV

One thing I notice when monitoring Expected Value is that the numbers can be volatile because of information the model doesn’t include, such as last-minute injuries.

What I’ve done as a workaround for this is calculate the difference in EV between the two teams. This helps normalize the EV, because with Kelly Criterion we wager according to the EV.
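As a rough sketch of that calculation, the snippet below parses lines in the format of the script's output and computes the Net EV per matchup. It assumes that each consecutive pair of lines belongs to the same game; the function names and the regular expression are my own.

```python
import re

# Matches lines such as "Houston Rockets EV: 8.43 Fraction of Bankroll: 4.48%"
LINE_RE = re.compile(
    r"^(?P<team>.+?) EV: (?P<ev>-?\d+(?:\.\d+)?) "
    r"Fraction of Bankroll: (?P<kelly>\d+(?:\.\d+)?)%$"
)


def parse_ev_lines(text):
    """Return (team, ev, kelly_pct) tuples for every line that matches."""
    rows = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            rows.append((match["team"], float(match["ev"]), float(match["kelly"])))
    return rows


def net_ev_picks(rows):
    """Pair consecutive rows (assumed to be one matchup) and compute Net EV."""
    picks = []
    for side_a, side_b in zip(rows[0::2], rows[1::2]):
        best = max(side_a, side_b, key=lambda row: row[1])
        if best[1] <= 0:
            continue  # both sides have non-positive EV, so no bet
        net_ev = round(abs(side_a[1] - side_b[1]), 2)
        picks.append({"pick": best[0], "net_ev": net_ev, "kelly_pct": best[2]})
    return picks


sample = """\
Oklahoma City Thunder EV: 60.4 Fraction of Bankroll: 41.94%
Denver Nuggets EV: -45.82 Fraction of Bankroll: 0%
Houston Rockets EV: 8.43 Fraction of Bankroll: 4.48%
Golden State Warriors EV: -9.94 Fraction of Bankroll: 0%
"""

for pick in net_ev_picks(parse_ev_lines(sample)):
    print(pick)
```

For the first matchup in the output, the Net EV is 60.4 minus -45.82, or 106.22.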

Simplifying

This was my hypothesis last season, and I had some success with it: more than 60% of my picks beat the spread. I had a complicated spreadsheet and scripts to handle all of this, which I’m not as interested in maintaining this year. My plan is to use ChatGPT as the interface for displaying this data.

A system prompt is a simple way to achieve a lot. The trick is having ChatGPT write the system prompt for you (it would know best, right?). Here is my current system prompt:

system_prompt = """
Given the odds data, XGBoost Model Predictions, and Expected Value & Kelly Criterion data for NBA games, generate a betting table considering:
1. For each matchup, select the pick based on the highest positive Expected Value (EV). If both sides have negative EV, no bet will be placed.
2. Calculate the 'Net EV' by taking the difference between the teams' EV for each matchup.
3. The total betting budget is $10 times the number of games.
4. Allocate this budget to each pick based on its Kelly Criterion fraction, ensuring the total does not exceed the maximum budget.
5. Calculate the potential payout for each wager based on the provided odds.
6. Ensure that the sum of the wager amounts does not exceed the total budget.
7. Return the results in a table format that includes the pick, odds, net EV, amount wagered based on the allocated fraction of the total budget, and expected payout.
"""
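For checking the table that ChatGPT returns, the arithmetic in steps 3 to 6 can be sketched in a few lines of Python. This is one possible reading of the prompt (each wager is the Kelly fraction times the total budget, scaled down proportionally if the sum would exceed the budget), and the team names and odds below are hypothetical placeholders.

```python
def allocate_budget(picks, per_game_budget=10.0, n_games=None):
    """Turn Kelly percentages into dollar wagers capped at the total budget.

    picks: list of dicts with 'pick', 'kelly_pct' and 'odds' (American, nonzero).
    Returns (total_budget, table), where table is a list of row dicts.
    """
    if n_games is None:
        n_games = len(picks)
    budget = per_game_budget * n_games

    # Step 4: wager = Kelly fraction of the total budget
    raw = [budget * p["kelly_pct"] / 100 for p in picks]
    total = sum(raw)
    # Step 6: scale everything down if the raw total exceeds the budget
    scale = min(1.0, budget / total) if total > 0 else 0.0

    table = []
    for p, amount in zip(picks, raw):
        wager = round(amount * scale, 2)
        odds = p["odds"]
        profit = odds / 100 if odds > 0 else 100 / abs(odds)
        # Step 5: payout = stake returned plus profit if the pick wins
        payout = round(wager * (1 + profit), 2)
        table.append({"pick": p["pick"], "odds": odds, "wager": wager, "payout": payout})
    return budget, table


# Hypothetical picks; the odds are made up for illustration
example_picks = [
    {"pick": "Team A", "kelly_pct": 41.94, "odds": 150},
    {"pick": "Team B", "kelly_pct": 4.48, "odds": -120},
]
budget, table = allocate_budget(example_picks, n_games=2)
print(budget)
for row in table:
    print(row)
```

Because the numbers are deterministic, a table like this makes it easy to spot when the model's arithmetic drifts.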

Funnily enough, I didn’t even bother forking the existing project and adding the library to it. Instead, I just run the project’s script as a subprocess and pipe its output to the API. GPT-4 has been great at handling moderately structured data.

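import subprocess

import openai  # legacy (pre-1.0) interface of the openai package

# Step 1: Run the project's script as a subprocess and capture its output
command = ['python3', 'main.py', '-xgb', '-odds=fanduel', '-kc']
subprocess_output = subprocess.run(command, capture_output=True, text=True).stdout

# Step 2: Send the system prompt and the captured output to GPT-4
messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": subprocess_output}
]

response = openai.ChatCompletion.create(
    model="gpt-4",
    messages=messages
)

# Step 3: Print the betting table from the model's reply
print(response["choices"][0]["message"]["content"])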
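The `openai.ChatCompletion` call above belongs to the pre-1.0 interface of the `openai` package, which was removed in version 1.0. A minimal sketch of the same request for version 1.0 or later follows; the helper name `request_betting_table` is my own, and the client is passed in so the function can be exercised without a network call.

```python
def request_betting_table(client, system_prompt, model_output, model="gpt-4"):
    """Send the system prompt plus the script's output and return the reply text.

    `client` is expected to be an `openai.OpenAI()` instance (openai >= 1.0).
    """
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": model_output},
        ],
    )
    return response.choices[0].message.content


# Usage (requires openai 1.0 or later and the OPENAI_API_KEY environment variable):
#   from openai import OpenAI
#   print(request_betting_table(OpenAI(), system_prompt, subprocess_output))
```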

The end result is a table showing how much I should wager on each pick, based on my budget of $10 per game.

Update: This has been incredibly inconsistent, so I will continue to tweak the prompt, but hopefully you get the idea.

Further automation

Kyle’s NBA project claims 68% accuracy on Money Line picks, and from what I saw last season it was roughly that good. The Kelly Criterion feature will help maximize value on each ML pick, and hopefully ChatGPT (or any LLM) can make this data consistently easy to view without my having to write any display code.

Automating the actual wagering would be very difficult, and sports gambling is generally not recommended. Still, using statistics and outsourcing your decision making on sports picks can be incredibly valuable. Our own biases will always sneak in, and while these models aren’t perfect, they have a solid track record.

Here are a few other resources if you are interested in the NBA and statistics. Good luck this season!

https://www.balldontlie.io/home.html#introduction (NBA Player API)

https://nbadata.cloud/ (Make your own models, very cool)

