NBA Analysis: Coming Soon!

I decided to spend a few hours this weekend writing the R code to scrape the individual statistics of NBA players (2010-11 only). I originally planned to write up a few NBA-related analyses, but a friend was visiting from out of town and, of course, that means less time sitting in front of my computer…which is a good thing! So in between an in-house concert at my place (video posted soon), the Rapids' first game (a win, 3-1 over Portland), brunch, and trivia at Jonesy’s (3rd place), I did write some code. The git repo can be found here on GitHub.

Note that this code is having a little trouble at the moment.  I have no idea why, but it’s throwing an error when it tries to scrape the Bulls’ and the Raptors’ pages.  I’m pretty sure it’s NOT because the Bulls are awesome and the Raptors suck…though I haven’t confirmed that assertion.
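
One way to narrow a problem like this down is to wrap each team's scrape in tryCatch, so a bad page gets logged and skipped instead of stopping the whole run. The snippet below is only a rough sketch of that idea: scrape_team() is a made-up stand-in (it is not the function in the repo) that fails on purpose for two teams, and the team list is just an example.

```r
# Hypothetical stand-in for the real per-team scraper. It fails on purpose
# for two teams so the error handling below has something to catch.
scrape_team <- function(team) {
  if (team %in% c("Bulls", "Raptors")) {
    stop("could not parse page for ", team)
  }
  data.frame(team = team, stringsAsFactors = FALSE)
}

teams <- c("Nuggets", "Bulls", "Raptors")

# Wrap each call in tryCatch so one bad page does not stop the whole loop.
results <- lapply(teams, function(team) {
  tryCatch(
    scrape_team(team),
    error = function(e) {
      message("Skipping ", team, ": ", conditionMessage(e))
      NULL  # mark this team as failed
    }
  )
})
names(results) <- teams

# Teams whose scrape raised an error.
failed <- teams[vapply(results, is.null, logical(1))]

# Combine only the successful results into one data frame.
stats <- do.call(rbind, Filter(Negate(is.null), results))
```

With the real scraper swapped in, failed would list the teams to look at by hand, and the messages would show the actual error text for each one.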

In any case, let me know if you have any ideas about what I should do with this data.  Some of the concepts that I’m toying with at the moment include:

  • Comparing the before and after performances of players who were traded at or near the trading deadline, and/or
  • Examining some of the more holistic player-evaluation metrics w/r/t win-loss records for various teams.

Question: Why didn’t you use BeautifulSoup for your scraping? You seem to be a big proponent of Python, so what’s up?

Answer: I wrote about scraping with R vs Python in a previous post. That little test was pretty conclusive in terms of speed, and R won. I am not totally convinced that I like the R syntax for XML/HTML parsing, but it is fast. And my not liking the syntax is probably a result of me not being an XML expert rather than a shortcoming of the XML package itself.



Filed under R, Scraping Data, Sports

4 responses to “NBA Analysis: Coming Soon!”

  1. I’m glad to see you posting again! I tried to use your scraping code in R, but I keep running into problems. I’m not sure if it’s because you run on a Mac and I don’t or what. I downloaded and installed the XML package and the RCurl package. There wasn’t any code there to load those packages, I suppose because they are preloaded for you. My first error is coming in the second line of code: source(paste(wd.path, “”src/…

    I’ve never really scraped before, but I’m looking forward to learning more about this.

  2. BTW, I saw a blog about a month ago on NBA shooting in specific areas. I couldn’t tell if these charts contained this information, but I think it would be really cool to see if there is any spatial dynamic to shooting percentages for specific players. The url to that blog is The date of the entry was Feb. 25.

  3. Ryan

    Hey Basil! Yeah, skip the part about sourcing the load.R file and just paste the contents of load.R (aside from setwd()) into the d.R file. The setwd, paste, etc. isn’t necessary; it’s just some stuff that I use to keep things straight on my Mac. Make sense?

  4. Pingback: NBA, Logistic Regression, and Mean Substitution | The Log Cabin
