Unlike my previous project, the predictions shown here represent a realistic betting scenario, since only data from past games is used to make each prediction. A bettor who wagered on every game of the 2016 season according to the methodology outlined here would have made money. Because the models were only tested against the 2016 season, additional testing on other seasons is needed to see how well this approach generalizes. The window size of the rolling average could also be tuned further. Ten games seemed to perform best in some initial testing, but I did not conduct a formal optimization of the window size. I could also explore reoptimizing hyperparameters during the season to capture in-season trends.
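To make the window-size idea concrete, here is a minimal sketch of a rolling average that only looks at games already played, so the feature would be available before tip-off. The function name, the sample point totals, and the choice to average over fewer games early in the season are my assumptions for illustration, not the exact code used in this project.

```python
from collections import deque


def rolling_pregame_average(values, window):
    """Average a stat over the previous `window` games for each game.

    The current game's value is never included in its own feature.
    Assumption: early in the season, average over however many prior
    games exist; the first game has no history and gets None.
    """
    history = deque(maxlen=window)  # holds only the most recent `window` games
    features = []
    for value in values:
        features.append(sum(history) / len(history) if history else None)
        history.append(value)  # record the game only after its feature is built
    return features


# Hypothetical points scored by one team in consecutive games.
points = [100, 110, 90, 120, 105]
for window in (2, 3, 10):
    print(window, rolling_pregame_average(points, window))
```

A formal search over window sizes could loop over candidate values like this, rebuild the features for each one, and compare the models on held-out games.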
I am also interested in using stats calculated relative to league average to attempt to control for opponent strength and league-wide changes in game strategy. One variable that could potentially improve performance is a dummy variable that indicates whether a team is playing the second game of a back-to-back. Back-to-back games can take a toll on players’ bodies, causing fatigue and reduced performance. The NBA is working to reduce the number of back-to-backs through the new collective bargaining agreement (CBA), but some back-to-backs are inevitable. I also originally collected over/under lines, but have yet to experiment with predicting them. Given the near 50/50 split of games falling above and below the over/under line, it is likely a similar challenge to predicting winners against the spread, but it would be another interesting problem to tackle.
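Both of these features are cheap to derive from a schedule and a table of team stats. The sketch below shows one way they might look. The function names, the sample dates, and the decision to express a league-relative stat as a simple difference from the league mean are assumptions for illustration.

```python
from datetime import date, timedelta
from statistics import mean


def back_to_back_flags(game_dates):
    """Map each game date to 1 if the team also played the day before, else 0."""
    flags = {}
    previous = None
    for current in sorted(game_dates):
        is_second_night = previous is not None and current - previous == timedelta(days=1)
        flags[current] = int(is_second_night)
        previous = current
    return flags


def relative_to_league(team_value, league_values):
    """Express a team stat as its difference from the league average."""
    return team_value - mean(league_values)


# Hypothetical schedule for one team: the second date is a back-to-back.
schedule = [date(2016, 11, 1), date(2016, 11, 2), date(2016, 11, 4)]
print(back_to_back_flags(schedule))

# Hypothetical points per game for three teams; the third team is compared to the league.
print(relative_to_league(108, [100, 104, 108]))
```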
One major potential improvement to this analysis would be to use player stats instead of team stats to train the models. The absence of a team’s best player due to injury or rest generally hurts an NBA team more than it would a team in other sports, and this is reflected in both the betting lines and the team’s chance of winning. Models built using team stats would not pick up on this, but a model built using stats from the players in each game’s lineup would better account for it. This presents the additional difficulty of estimating team strength given game rosters. When players sit, we are forced to estimate how their teams will play in their absence. This involves predicting how the minutes they typically play will be distributed amongst their teammates and how well bench players will perform with added playing time, potentially with little data to aid decisions. In the event of trades and free agency, we also need to predict how well players will fit into a new system, and the fit is not always a good one.
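As a starting point for the roster problem, here is a deliberately naive sketch of estimating team strength from whoever is available. It assumes each player has a typical minutes load and a per-minute rating, and that an absent player's minutes are spread across teammates in proportion to their usual minutes, with no change in per-minute performance. All names, numbers, and the proportional rule itself are hypothetical. A real model would need to cap individual minutes and account for bench players performing differently in larger roles.

```python
def estimate_team_rating(players, absent=()):
    """Estimate a team rating from available players.

    players: dict of name -> (typical_minutes, per_minute_rating).
    absent: names of players sitting out.
    Assumption: missing minutes are redistributed proportionally to each
    available player's typical minutes, and per-minute ratings stay fixed.
    """
    available = {name: stats for name, stats in players.items() if name not in absent}
    total_minutes = sum(minutes for minutes, _ in players.values())
    available_minutes = sum(minutes for minutes, _ in available.values())
    if available_minutes == 0:
        raise ValueError("no available players with minutes to redistribute to")
    scale = total_minutes / available_minutes  # how much each remaining load grows
    return sum(minutes * scale * rating for minutes, rating in available.values())


# Hypothetical three-player rotation: (typical minutes, rating per minute).
roster = {
    "star": (36, 0.5),
    "starter": (24, 0.25),
    "bench": (12, 0.0),
}
print(estimate_team_rating(roster))                   # full roster
print(estimate_team_rating(roster, absent={"star"}))  # star sits out
```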