Category Archives: Data Science

Slides from Rocky Mtn SABR Meeting

Last Saturday I had the good fortune to present a talk on finding, gathering, and analyzing some sports-related data on the web at the local SABR group meeting.  In case you’re not familiar with the “SABR” acronym, it stands for “Society for American Baseball Research”; here’s a link to the national organization.  The talk was light on tech and heavy on graphs (predominately made in R and in particular ggplot2).  Good times were had by all.  The slides from the talk are given below.  Most of the slides and code are recycled from previous talks, so I apologize in advance if you’re already familiar with the content.  It was, however, new to the SABR people.

Advertisements

1 Comment

Filed under Data Mining, Data Science, ggplot2, R, Scraping Data, Sports

The RDSTK Presentation at Denver R Users Group

Last night I presented a talk at the DRUG introducing the R wrapper for the Data Science Toolkit.  Lots of good questions, good forking, and good beer afterwards at Freshcraft.  The slides are given below.

Leave a comment

Filed under Data Science, Data Viz, R, Web 2.0

R and the Data Science Toolkit

I recently decided to present a talk to the Denver R Users Group (DRUG) on how to make an R package (May 17). There were only two problems: (1) I’ve never made a package and (2) I had nothing in mind to package up.  At about this same time, Pete Warden and others were blogging about the iPhone tracking issue [1]. How are these two events related? Well, I remembered that a few of my favorite Twitter ‘friends’ posted some things related to Pete Warden’s “The Data Science Toolkit (DSTK)” [2] a while back. And? And at the time I thought that it would be cool to have an R package/wrapper for accessing the DSTK’s API, similar to Drew Conway’s  R wrapper  for the infochimps API.

So I’m happy to announce that after spending a little time on this project in the past week, Version 0.1 of the RDSTK package is available on github. I haven’t submitted this package to CRAN and, hence, you need to install it from source (RDSTK_0.1.tar.gz). In order to do this, use the install.packages() function within R or R CMD INSTALL from the shell prompt. Note that the package depends on the RCurl, plyr, and rjson packages.

The following functions are included in the package:

  • street2coordinates
  • ip2coordinates
  • coordinates2politics
  • text2sentences
  • text2people
  • html2text
  • text2times

They should be easy to use if you are familiar to the DSTK API. If not, RTFM! 🙂

Let me know if you have any comments and/or suggestions. Happy hacking.

Acknowledgements:

I wanted to mention that I received a bit of help with the RCurl package from “Noah” on stackoverflow, Andy Gayton on stackoverflow, and Duncan Temple Lang on the R-Help list.  Thanks!

Footnotes:

  1. To borrow a joke from Asi Behar, “Right after word leaks that the iPhone has been tracking your location at all times, we find Osama. Coincidence? Thanks Apple!”
  2. You may recall that a while back, I tweeted about disliking the phrase “data science”.  My feelings have not changed.

10 Comments

Filed under Data Science, R