DataBall

Thank you for visiting my website. It explores a project that combines my interest in data science with my love of sports. The discussion that follows details the process I used to predict NBA game winners against betting lines, from acquiring data to evaluating models. The project’s name was inspired by a Grantland article by Kirk Goldsberry. Several pages on this site are converted from Jupyter Notebooks; for those, I link to the original notebook hosted on GitHub. This project continues an earlier one in which I predicted NBA winners straight up using season-averaged stats. What sparked this project was my interest in predicting winners against the spread sequentially, to represent a real-life betting scenario. Full disclosure: I do not recommend running off to Vegas next season and betting on games using the models presented here. Betting against the spread is a difficult problem to model.

My first foray into machine learning in sports came in the form of a Kaggle competition, where competitors were tasked with calculating the probability that one team would beat another for each potential matchup of the NCAA men’s basketball tournament. Models were evaluated on the log loss of their predicted probabilities for the games that actually occurred. Because log loss grows sharply as a predicted probability moves away from the actual outcome, models that are confidently wrong are heavily penalized. Predicting all possible matchups instead of filling out a traditional bracket also allowed submissions to be easily compared against one another. It would otherwise have been difficult to determine who had the best model, since filling out a perfect bracket is nearly impossible. This project is a natural progression of that initial work.
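To make the log loss penalty concrete, here is a minimal sketch in Python. It uses the standard binary log loss formula; the `log_loss` helper, the clipping value `eps`, and the outcomes and probabilities are all made up for illustration and are not taken from the competition or from my models.

```python
import math

def log_loss(outcomes, probs, eps=1e-15):
    """Mean binary log loss. outcomes are 0/1, probs are predicted P(outcome == 1)."""
    total = 0.0
    for y, p in zip(outcomes, probs):
        # Clip so that a prediction of exactly 0 or 1 does not produce log(0).
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(outcomes)

# Hypothetical results for four games (1 = first team won).
outcomes = [1, 1, 0, 1]

# Both models pick the same winners and both miss the last game;
# they differ only in how confident they are.
cautious = [0.60, 0.60, 0.40, 0.40]
overconfident = [0.99, 0.99, 0.01, 0.01]

print("cautious:     ", round(log_loss(outcomes, cautious), 3))
print("overconfident:", round(log_loss(outcomes, overconfident), 3))
```

Both sets of predictions get three of the four games right, but the overconfident set ends up with the higher (worse) log loss because its single miss was made with 99% certainty.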

As usual, there is a relevant xkcd comic. Let’s see what we can learn!