A Whole New World of Data

The word Twitter is about as ubiquitous as health-care reform in today’s media.  Not everybody knows what Twitter is or what you can do with so-called “tweets”, but that does not stop them from posting their thoughts on Michael Jackson’s death in 140 characters or less.  What does this have to do with statistics, you ask?  Well, there is a ton of information floating around in the Twitter databases, and we’re going to look at how you can access it.

Think of a popular website, and it’s almost guaranteed that your choice will have an Application Programming Interface (API).  The Facebook, Google Maps, NY Times, and, of course, Twitter APIs are some of the more popular ones that a data junkie might choose to use.  It is through the API that you have access (albeit limited in some cases) to the wealth of information that the particular website chooses to share.
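
To make the idea a little more concrete: in many cases an API request is just a URL with a few parameters attached. Here is a minimal sketch in Python of building one. The endpoint `https://api.example.com/search` and the parameter names `q` and `count` are placeholders I made up for illustration; they are not the real address or parameters of any of the sites mentioned above.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: a placeholder, not a real service.
BASE_URL = "https://api.example.com/search"


def build_request_url(query, count=10):
    """Return a search URL for the made-up API, with the parameters encoded."""
    # "q" and "count" are assumed parameter names, chosen for illustration.
    params = {"q": query, "count": count}
    return BASE_URL + "?" + urlencode(params)


url = build_request_url("surf report", count=5)
print(url)
```

A real API will document its own endpoint, parameter names, and usage limits, so treat this only as the general shape of a request.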

As a simple example, let us look at my friend Michael Twardos’ website, which is a Twitter-based surf report.  Michael uses the Twitter search API to mine information about various surfing locations.  To see an API in action, consider the following sentence given on Twardos’ website: “Click here to view the latest surf reports”.  By clicking on that link, you are asking the computer to make a request to the Twitter API, which sends back its information related to surf reports; the site then summarizes that information and presents it in an easily digestible way for the user.  In other words, there is a lot going on before that page is rendered in your web browser!
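
The three steps behind that click (request, summarize, present) can be sketched in a few lines of Python. This is not Michael's code, and it does not call the Twitter API: `fetch_reports` is a hypothetical stand-in that returns a canned response, and the response layout (`results`, `location`, `text`) and the sample reports are invented purely for illustration.

```python
import json
from collections import Counter

# Made-up stand-in for what an API might send back; these are not real tweets.
SAMPLE_RESPONSE = json.dumps({
    "results": [
        {"location": "Huntington Beach", "text": "Waves are great this morning"},
        {"location": "Malibu", "text": "Flat and windy, not worth the drive"},
        {"location": "Huntington Beach", "text": "Still great, clean sets rolling in"},
    ]
})


def fetch_reports():
    """Hypothetical request step: returns canned JSON instead of calling an API."""
    return SAMPLE_RESPONSE


def summarize(raw_json):
    """Count how many reports mention each location."""
    reports = json.loads(raw_json)["results"]
    return Counter(report["location"] for report in reports)


def present(summary):
    """Turn the counts into lines a reader can scan quickly."""
    return [f"{place}: {n} report(s)" for place, n in summary.most_common()]


for line in present(summarize(fetch_reports())):
    print(line)
```

Keeping the request, the summary, and the presentation in separate functions means the canned response could later be swapped for a real API call without touching the other two steps.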

You might think that this sounds cool and all, but where is the statistical connection here?  The surf reports page is an example of a text-based data mining tool and is very much about summarizing qualitative information.  In my next blog post, I will consider an application of summarizing Twitter data related to a thunderstorm and/or an application related to football games.  I will also explain how you would go about getting the information given in these examples.

Filed under Web 2.0

One response to “A Whole New World of Data”

  1. Data mining is all around us. My mom used to be a data mining fanatic. Many times she overanalyzed someone’s responses, facial expressions, physical gestures, etc. I think websites may also do some overanalyzing. I believe one of the major faults with the use of this data mining is its appearance on sites and its relative usefulness. My gosh, if I ever follow a link related to a site within a site, I’m in for a never-ending loop of links to useless information. Many of these data mining techniques just produce text for links, losing many of the potential “clickers”. I’m curious as to whether any data mining is used for cursor position on a webpage. Do any sites record cursor position even without a click? There must be some correlation between cursor placement and brain/eye attraction. I recently was paid to complete a grocery store survey that analyzed my eye placement in a virtual grocery store. Cursor placement could easily be used to attract potential buyers… I know it is possible since we already have rollover effects in plain HTML and Flash…
