R and the Data Science Toolkit

I recently decided to present a talk (May 17) to the Denver R Users Group (DRUG) on how to make an R package. There were only two problems: (1) I’ve never made a package and (2) I had nothing in mind to package up. At about the same time, Pete Warden and others were blogging about the iPhone tracking issue [1]. How are these two events related? Well, I remembered that a few of my favorite Twitter ‘friends’ had posted some things about Pete Warden’s Data Science Toolkit (DSTK) [2] a while back. And? At the time I thought it would be cool to have an R package/wrapper for accessing the DSTK’s API, similar to Drew Conway’s R wrapper for the infochimps API.

So I’m happy to announce that, after spending a little time on this project over the past week, Version 0.1 of the RDSTK package is available on github. I haven’t submitted the package to CRAN, so you need to install it from source (RDSTK_0.1.tar.gz). To do this, use the install.packages() function within R (with repos = NULL and type = "source") or R CMD INSTALL from the shell prompt. Note that the package depends on the RCurl, plyr, and rjson packages.
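
For the install.packages() route, a minimal sketch might look like the following. It assumes the tarball has already been downloaded into the current working directory; the `tarball` and `deps` variable names are only for illustration.

```r
# Sketch: install RDSTK 0.1 from a local source tarball.
# Assumes RDSTK_0.1.tar.gz sits in the current working directory.
tarball <- "RDSTK_0.1.tar.gz"
deps <- c("RCurl", "plyr", "rjson")  # dependencies named in the post

if (file.exists(tarball)) {
  # Install any dependencies that are not already available.
  missing_deps <- setdiff(deps, rownames(installed.packages()))
  if (length(missing_deps) > 0) {
    install.packages(missing_deps)
  }
  # repos = NULL tells R to install from the local file, not a repository.
  install.packages(tarball, repos = NULL, type = "source")
} else {
  message("Download ", tarball, " first, then re-run this snippet.")
}
```

From the shell, `R CMD INSTALL RDSTK_0.1.tar.gz` should do the same job once the dependencies are in place.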

The following functions are included in the package:

  • street2coordinates
  • ip2coordinates
  • coordinates2politics
  • text2sentences
  • text2people
  • html2text
  • text2times

They should be easy to use if you are familiar with the DSTK API. If not, RTFM! :)
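
Here is a rough sketch of what a call might look like. It assumes (check the package help pages to be sure) that each function takes a single character string as its first argument. The address and IP below are made-up sample inputs, and the calls only work with RDSTK installed and a DSTK server reachable over the network.

```r
# Sample inputs, chosen only for illustration.
sample_address <- "1600 Pennsylvania Ave NW, Washington, DC 20500"
sample_ip <- "8.8.8.8"

# Only attempt the calls when RDSTK is installed; try() keeps a network
# failure from stopping the script.
if (requireNamespace("RDSTK", quietly = TRUE)) {
  coords <- try(RDSTK::street2coordinates(sample_address), silent = TRUE)
  ip_info <- try(RDSTK::ip2coordinates(sample_ip), silent = TRUE)
  if (!inherits(coords, "try-error")) print(coords)
  if (!inherits(ip_info, "try-error")) print(ip_info)
} else {
  message("RDSTK is not installed; install it from source first.")
}
```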

Let me know if you have any comments and/or suggestions. Happy hacking.

Acknowledgements:

I received a bit of help with the RCurl package from “Noah” and Andy Gayton on Stack Overflow, and from Duncan Temple Lang on the R-Help list. Thanks!

Footnotes:

  1. To borrow a joke from Asi Behar, “Right after word leaks that the iPhone has been tracking your location at all times, we find Osama. Coincidence? Thanks Apple!”
  2. You may recall that a while back I tweeted about disliking the phrase “data science”. My feelings have not changed.

Filed under Data Science, R

10 responses to “R and the Data Science Toolkit”

  1. Great idea for a package; thanks for creating it.
    It would be great if you could update the package so that the functions take a ... argument to pass to getURL. This means that if you need to use a proxy server (e.g., behind a firewall in a corporate environment), then the functions can still be used. Take a look at the twitteR package source if you need an example of how to do this.

    • Ryan

      For example, using something like curl=getCurlHandle() in the function args? If so, I can do that no problem. Thanks for the comment.

    • Ryan

      This should be working now. Let me know if you are thinking of something else. Cheers.

  2. Hey Relmore, nice job. Maybe I should start using R.

    What word do you prefer over “data science”?

    • Ryan

      Statistics! :) I remember a bit from Moore and McCabe’s book on introductory statistics and they define statistics as (paraphrasing) the science of collecting, organizing, and interpreting data. This, to me, is what Drew Conway’s (in)famous Venn Diagram is saying exactly…though he calls it ‘Data Science’ and everybody treats the term as if it’s a new science. I think a ‘modern’ statistician might need new tools, e.g. ‘hacking skills’, but the heart of data science is organizing and collecting data (hacking skills) and interpreting the results (need math and substantive expertise). In other words, ‘data science’ (IMO) is nothing more than a subset of statistics that we might call ‘modern’ statistics.

