The word “Twitter” is about as ubiquitous as health-care reform in today’s media. Not everybody knows what Twitter is or what you can do with so-called “tweets”, but that does not stop people from posting their thoughts on Michael Jackson’s death in 140 characters or fewer. What does this have to do with statistics, you ask? Well, there is a ton of information floating around in the Twitter databases, and we’re going to look at how you can access it.
Think of a popular website and it’s almost a guarantee that it will have an Application Programming Interface (API). The Facebook, Google Maps, NY Times, and, of course, Twitter APIs are some of the more popular ones that a data junkie might choose to use. It is through the API that you get access (albeit limited in some cases) to the wealth of information that a particular website chooses to share.
As a simple example, let us look at my friend Michael Twardos’ website, which is a Twitter-based surf report. Michael uses the Twitter search API to mine information about various surfing locations. To see an API in action, consider the following sentence given on Twardos’ website: “Click here to view the latest surf reports”. By clicking on that link, you are asking the computer to do three things: make a request to the Twitter API for all of its information related to surf reports, summarize that information, and present it in a way that is easy for the user to digest. In other words, there is a lot going on before that page is rendered in your web browser!
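To make those three steps a little more concrete, here is a minimal Python sketch of the request, summarize, and present cycle. It is only an illustration and not how Michael's site actually works: the endpoint URL is a placeholder rather than the real Twitter search address, the shape of the JSON response is an assumption, and the sample messages are made up so that the code runs without a network connection.

```python
import json
from collections import Counter
from urllib.parse import urlencode

# Hypothetical placeholder endpoint, NOT the real Twitter search URL.
SEARCH_ENDPOINT = "https://api.example.com/search.json"


def build_search_url(query, count=20):
    """Step 1: build the URL that the 'click here' link would request."""
    return SEARCH_ENDPOINT + "?" + urlencode({"q": query, "count": count})


def summarize(response_text, keywords):
    """Step 2: count how many returned messages mention each keyword."""
    # Assumed response shape: {"results": [{"text": "..."}, ...]}
    messages = json.loads(response_text)["results"]
    counts = Counter()
    for message in messages:
        text = message["text"].lower()
        for keyword in keywords:
            if keyword in text:
                counts[keyword] += 1
    return counts


# Made-up response standing in for what an API might send back.
SAMPLE_RESPONSE = json.dumps({"results": [
    {"text": "Glassy and clean at the pier this morning"},
    {"text": "Choppy, windy, not worth the drive"},
    {"text": "Clean sets rolling in, head high"},
]})

if __name__ == "__main__":
    print(build_search_url("surf report"))
    # Step 3: present the summary in a digestible form.
    summary = summarize(SAMPLE_RESPONSE, ["clean", "choppy", "flat"])
    for keyword, n in summary.most_common():
        print(f"{keyword}: {n} message(s)")
```

In a real application the sample response would be replaced by whatever the API returns for the URL from step 1, and the keyword count would likely give way to a richer summary.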
You might think that this sounds cool and all, but where is the statistical connection here? The surf reports page is an example of a text-based data mining tool and is very much about summarizing qualitative information. In my next blog post, I will consider an application of summarizing Twitter data related to a thunderstorm and/or an application related to football games. I will also show how you would go about getting the information used in these examples.