Covers

I combined the stats with point spreads and over/under lines obtained from covers.com, which provides historical betting data going back to the 1990-91 season. Each team page contains season schedules, such as the one for the 2016-17 season of my hometown Sacramento Kings. In addition to game results, the pages include the betting lines (point spreads), the over/under lines, and the results of both types of bets. Spread results are categorized as W/L/P (win, loss, or push against the spread) and over/under results as O/U/P (over, under, or a push when the combined score equals the over/under line).
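To make the W/L/P and O/U/P categories concrete, here is a minimal sketch of how they follow from a final score. The function names are hypothetical, and the sketch assumes the usual convention that the spread is quoted from the team's own perspective, negative when that team is favored; covers.com already publishes these results, so this is only an illustration of the arithmetic.

```python
def spread_result(team_score, opp_score, spread):
    """Return 'W', 'L', or 'P' for a team against the spread.

    Assumes `spread` is quoted for this team (e.g. -4.5 when favored by 4.5).
    """
    adjusted_margin = (team_score - opp_score) + spread
    if adjusted_margin > 0:
        return "W"
    if adjusted_margin < 0:
        return "L"
    return "P"  # push: the margin landed exactly on the spread


def total_result(team_score, opp_score, over_under):
    """Return 'O', 'U', or 'P' for the combined score against the over/under line."""
    total = team_score + opp_score
    if total > over_under:
        return "O"
    if total < over_under:
        return "U"
    return "P"  # push: the total equals the line
```

For example, a team favored by 5 that wins 100-95 pushes against the spread, while the same score against a line of -4.5 is a win.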

I used the Python web scraping framework Scrapy to collect all the betting data and store it in the same database the stats were written to. The heavy lifting in the Scrapy project is done by what Scrapy calls spiders and pipelines. A spider crawls a web page, extracts the desired data into one or more items, and passes them to every registered pipeline. Pipelines can perform a number of tasks ranging from data cleansing and validation to data storage, which is how I wrote betting information to the database. I only wrote data for games in which the team I was parsing was the home team. This avoids duplicating data, since every game appears on both teams' pages, and makes it easier to set up a machine learning problem similar to the one in my previous project, in which I predict whether the home team wins against the spread.
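The storage step can be sketched as a Scrapy-style item pipeline. This is an illustrative sketch, not the project's actual code: the class name, item keys, table name, and the choice of SQLite are all assumptions. It follows Scrapy's classic pipeline interface (`open_spider`, `close_spider`, `process_item`) and raises `DropItem` for away games so each game is written only once.

```python
import sqlite3

try:
    from scrapy.exceptions import DropItem
except ImportError:
    class DropItem(Exception):
        """Stand-in so the sketch runs without Scrapy installed."""


class HomeGamePipeline:
    """Hypothetical pipeline: store only games where the scraped team was at home."""

    def __init__(self, db_path=":memory:"):
        # Assumption: SQLite; the real project's database is not specified.
        self.db_path = db_path

    def open_spider(self, spider):
        self.conn = sqlite3.connect(self.db_path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS betting ("
            "game_date TEXT, home_team TEXT, away_team TEXT, "
            "spread REAL, over_under REAL)"
        )

    def close_spider(self, spider):
        self.conn.commit()
        self.conn.close()

    def process_item(self, item, spider):
        # Every game appears on both teams' pages, so keep the home copy only.
        if not item["is_home"]:
            raise DropItem("away game; stored from the home team's page")
        self.conn.execute(
            "INSERT INTO betting VALUES (?, ?, ?, ?, ?)",
            (item["game_date"], item["team"], item["opponent"],
             item["spread"], item["over_under"]),
        )
        return item
```

In a real Scrapy project the pipeline would be enabled through the `ITEM_PIPELINES` setting, and the spider would yield one item per schedule row.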

Crawling the website presented a number of challenges, including missing data and data entry errors. The site includes many games with missing betting data, such as two games for the 2000-01 Minnesota Timberwolves. Most of these instances occurred between 1995 and 1999, and none have happened since the 2000-01 season. I store these games with null values for whichever field is missing, because a game might have a point spread or an over/under line, just not both.

Another edge case I had to account for is the rare “pick’em” game, in which the point spread is zero. The website displays the point spread as PK instead of 0, so I just replace it with a zero. A curious error on the website lists a game between the Houston Rockets and Sacramento Kings on April 4, 1995 as being played in Houston, when in fact it was played in Sacramento.

The last thing I had to account for is the confusing history of the Charlotte Hornets. They moved to New Orleans in 2002, and the NBA established the Charlotte Bobcats shortly after, in 2004. In 2013, the New Orleans Hornets rebranded as the Pelicans, which freed up the Hornets name and allowed the Bobcats to adopt it in 2014. The NBA stats database lists old Hornets games as Charlotte, which they technically are, but covers.com lists them as New Orleans. To assign game IDs to the betting data so it can later be joined with the game information in the database, I had to switch the team to Charlotte in the pipeline for “New Orleans” games prior to the 2002-03 season.
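The cleaning rules described above can be sketched as two small helpers. The function names and the exact string formats are assumptions for illustration; the real pipeline may parse the page differently.

```python
def parse_line(raw):
    """Convert a scraped spread or over/under string to a float.

    Returns None when the value is missing and 0.0 for a pick'em ("PK").
    """
    text = (raw or "").strip()
    if not text:
        return None  # stored as NULL in the database
    if text.upper() == "PK":
        return 0.0
    return float(text)


def normalize_team(team, season_start_year):
    """Map 'New Orleans' games before the 2002-03 season back to Charlotte.

    `season_start_year` is the first year of the season, e.g. 2001 for 2001-02.
    """
    if team == "New Orleans" and season_start_year < 2002:
        return "Charlotte"
    return team
```

With these helpers, `parse_line("PK")` gives a zero spread, an empty cell becomes `None`, and a “New Orleans” game from the 2001-02 season is relabeled as Charlotte before game IDs are assigned.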
