DataBall

Thank you for visiting my website. It explores a project that combines my interest in data science with my love of sports. The discussion that follows details the process I used to predict NBA game winners, from acquiring data to evaluating models. The project’s name was inspired by a Grantland article by Kirk Goldsberry. Several of the pages on this site are converted from Jupyter Notebooks; for those pages, I provide a link to the original notebook hosted on GitHub.

My first foray into machine learning in sports came in the form of a Kaggle competition, where competitors were tasked with calculating the probability that one team would beat another for each potential matchup of the NCAA men’s basketball tournament. Models were evaluated on the log loss of their predicted probabilities for the games that actually occurred, a metric that heavily penalizes models that are confidently wrong. Predicting all possible matchups instead of filling out a traditional bracket also allowed submissions to be easily evaluated. It would otherwise have been difficult to determine who had the best model, since filling out a perfect bracket is nearly impossible. This project is a natural progression of that initial work.
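To make the log loss penalty mentioned above concrete, here is a minimal sketch of the standard binary log loss formula. It is not the competition's scoring code; the function name, the clipping constant `eps`, and the toy outcomes and probabilities are assumptions chosen purely for illustration.

```python
import math


def log_loss(outcomes, probabilities, eps=1e-15):
    """Mean binary log loss (illustrative helper, not the competition's scorer).

    outcomes: 1 if the first team won, 0 otherwise.
    probabilities: predicted probability that the first team wins.
    """
    total = 0.0
    for y, p in zip(outcomes, probabilities):
        # Clip so that a prediction of exactly 0 or 1 does not produce log(0).
        p = min(max(p, eps), 1 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(outcomes)


# Made-up results for four games (hypothetical data).
outcomes = [1, 0, 1, 1]

# A cautious model leans the right way on every game but is never very sure.
cautious = [0.6, 0.4, 0.6, 0.6]

# An overconfident model is nearly certain every time and badly wrong once.
overconfident = [0.99, 0.01, 0.99, 0.01]

print(f"cautious:      {log_loss(outcomes, cautious):.3f}")
print(f"overconfident: {log_loss(outcomes, overconfident):.3f}")
```

In this toy example the overconfident model gets three of the four games almost exactly right, yet it ends up with the worse score because its single confident miss dominates the average.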

As usual, there is a relevant xkcd comic. Let’s see what we can learn!
