Goals per Game in MLS

I promised something related to Major League Soccer and here it is.  Caveat:  It’s not much.  Why so sparse?  (1) The data are a bit messy due to teams folding, expansion, name changes, etc.  (2) I was backpacking all weekend and didn’t have time to work on this side project.  Yes, I have a real job, and working on this during the work week is a bit difficult.

My first step was to scrape the “stats” section of the MLS site to get all of their public data.  Or at least all of the data that is relatively easy to find and easy to scrape.  I’ll post the code once I set up the master repository on GitHub.  Needless to say, I think it looks a bit better than my initial foray into BeautifulSoup as posted here.

I decided to look at goals per game by team and year.  Most people who like soccer like goals, so this seems like a good starting point.  Here is the initial figure.

As you can see, there are a lot of blank spaces.  The reason is that a lot of teams changed their name and/or relocated (e.g., San Jose), some teams folded (e.g., Tampa Bay), and MLS added teams over the years (e.g., Chivas USA).  The bottom line is that it makes for an ugly graph.  In an attempt to clean it up a bit, I tried to consolidate some of the names.  Here is the new figure.

It still doesn’t look great, but I do think that you can learn a bit from this figure.  Overall, I would say that goals per game for each team are decreasing over time.  Is it a statistically significant decline?  I dunno.  I’m not writing a paper here; it’s a freaking blog, i.e., speculation reigns supreme!  In any case, this raises more questions.  For example,

  1. Does this apparent decline affect attendance numbers?
  2. What is the cause of this decline?  Better defenders coming into the league?  Um, I doubt it.  I would imagine that quality strikers are being added at about the same rate.

I would hypothesize that it’s just that the quality of the league has improved significantly over the years.  Hence, the teams are holding possession more and not just firing shots whenever they get a chance.  As a result, I will look into attendance numbers, shots, shots on goal, etc. in the upcoming days or possibly weeks.  I believe that some interesting questions can be answered with these data.  However, I am still trying to discover what these questions might be.  If you have any ideas for questions, let me hear about them in the comments section.

The R code for this project isn’t too interesting, so I won’t post it below, though it will be on the GitHub repository in time.  One thing that I did learn about R is that reading in numeric data measured in the thousands (e.g., attendance figures) can be problematic if the numbers have commas.  It took me a while to find the workaround and it’s given below.

# Strip the thousands separators, then convert each attendance column to numeric
mls.reg.dat$h_tot <- as.numeric(gsub(",", "", mls.reg.dat$h_tot))
mls.reg.dat$h_avg <- as.numeric(gsub(",", "", mls.reg.dat$h_avg))
mls.reg.dat$a_tot <- as.numeric(gsub(",", "", mls.reg.dat$a_tot))
mls.reg.dat$a_avg <- as.numeric(gsub(",", "", mls.reg.dat$a_avg))
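
For what it’s worth, the commas matter because as.numeric() returns NA (with a warning) for a string like "1,234".  The same workaround can be written once and applied to several columns at a time.  What follows is only a minimal sketch: the data frame att and the helper strip_commas are made up for illustration, and the values are toy numbers, not real MLS attendance.

```r
# Toy data frame standing in for the scraped attendance table.
# Column names mirror two of the ones above; the values are invented.
att <- data.frame(
  team  = c("A", "B"),
  h_tot = c("245,310", "9,875"),
  h_avg = c("15,332", "617"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: drop the thousands separators, then convert to numeric.
strip_commas <- function(x) as.numeric(gsub(",", "", x, fixed = TRUE))

# Apply the helper to every attendance column in one step.
num_cols <- c("h_tot", "h_avg")
att[num_cols] <- lapply(att[num_cols], strip_commas)

# Inspect the result: h_tot and h_avg should now be numeric columns.
str(att)
```

Listing the columns in num_cols keeps the conversion in one place, so adding a_tot and a_avg would only mean extending that vector.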


Filed under Data Viz, ggplot2, R

6 responses to “Goals per Game in MLS”

  1. That’s a very interesting result. One could postulate that there is more parity in the league which could be a good thing. I’m looking forward to seeing the code. I posted an XML example on my blog recently if you want to compare and contrast methods.

  2. Excellent point Larry. I would be interested also in any correlation between points scored and points allowed. The Los Angeles Galaxy had a fairly good season recently. I wonder if this was related to any poor defense.

  3. Nice job, R! One of the recurring topics of conversation about soccer here in Europe is that the various leagues become less competitive as years go by. As a result, it’s always the same 2-3 teams that compete for the title every year. I don’t follow MLS so I’m not sure if this is the case in the US as well (…is there still a salary cap enforced?). One idea is to look at how the goal difference (instead of the number of goals) in the games has changed over the years. This will allow you to see if individual matches become more/less competitive as years go by. You could also contrast this measure with the variance in the final standings (i.e. the points earned by teams throughout the league).

    • Ryan

      Thanks Marios. I’ll take a look at that stuff this weekend if I get a chance. I should probably do something a bit more interesting and start pulling data from England’s Premiership! Or maybe Spain’s top league.

  4. Pingback: A Rule Change in Major League Soccer? « The Log Cabin

  5. Interesting look at the MLS. We looked at this kind of data a couple of years ago for World Cup soccer, with the goal of being able to predict game winners. We found shots on goal in previous matches to be the strong predictor of future results.
