A Rule Change in Major League Soccer?

I have to admit that working with my Major League Soccer data set has been slow going.  There are a couple of reasons:  (1) I have a full-time job at the National Renewable Energy Lab and (2) the data isn’t quite as “rich” as I initially thought.  As an example, the MLS site doesn’t list the wins and losses for each team by year.  That seems to be a fundamental piece of “sports”-type data, right?  In any case, I did come across a question that I can’t seem to answer.  If you know somebody who works with MLS, send ’em my email address and tell them that I want answers, damnit!

So following up on my previous MLS-related post, I wanted to see if I could pinpoint why goals per game have been decreasing in recent years.  My first thought was that, with MLS expanding, more US-based players transferring overseas, etc., the overall talent level in MLS has suffered a bit in recent years.  One way that this might manifest itself in the data is by having fewer shots “on target” or “on goal”.  Therefore, I looked at the number of shots on goal vs the number of shots and also vs the number of goals over the years.  The two figures are given next.

Based on the first figure, one could argue that the shooters are becoming a little less accurate.  That is, the number of shots on target per shot has decreased by about 10% over the course of the league’s lifetime.  Shots on goal per goal seems relatively steady over this same time period.  This might suggest that the league’s strikers are getting slightly worse whereas the quality of the keepers is holding steady.  That, of course, could contribute to the decline of goals per game.
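For anyone who wants to reproduce the two ratios, here is a minimal sketch of the arithmetic.  It is in Python rather than the R I used for the plots, and the SeasonTotals fields and the numbers in it are invented placeholders, not actual MLS figures.

```python
from dataclasses import dataclass


@dataclass
class SeasonTotals:
    """League-wide totals for one season (hypothetical field names)."""
    season: int
    goals: int
    shots: int
    shots_on_goal: int


def season_ratios(t: SeasonTotals) -> dict:
    """Return the two ratios discussed above for a single season."""
    return {
        "season": t.season,
        # Share of all shots that were on target (shooter accuracy).
        "sog_per_shot": t.shots_on_goal / t.shots,
        # Shots on target needed per goal (a rough keeper-quality proxy).
        "sog_per_goal": t.shots_on_goal / t.goals,
    }


# Placeholder totals, NOT real MLS data.
totals = [
    SeasonTotals(season=1, goals=100, shots=1000, shots_on_goal=500),
    SeasonTotals(season=2, goals=90, shots=1000, shots_on_goal=450),
]

ratios = [season_ratios(t) for t in totals]
for r in ratios:
    print(r["season"], round(r["sog_per_shot"], 3), round(r["sog_per_goal"], 3))
```

In this made-up example the share of shots on target drops while shots on goal per goal stays flat, which is the same pattern I am describing in the figures.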

I also decided to look at the number of assists per goal.  Why?  Well, my logic is that if there are more assists per goal, then there might be better overall team play.  Conversely, a decrease in this number might be a result of teams having one or more stars (hence, more individual brilliance) and less of the quality, build-up-type goals.  Make sense?  C’mon, I’m trying here!  Anyway, here is the resulting graph.

Whoa, what in the hell happened there?  The data look a bit suspicious.  Specifically, there seems to be a serious change between the 2002 and 2003 seasons.  So I made a similar graph, but I separated by the different time periods.  Here ya go.
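A rough way to check a break like this without eyeballing the plot is to find the largest season-to-season change and then compare the averages on either side of it.  The sketch below does that in Python; the function names are my own for illustration, and the assists-per-goal values are made up (they do not reflect the real numbers, or even the real direction of the change).

```python
from statistics import mean


def largest_jump(series):
    """Given (season, value) pairs sorted by season, return
    (previous_season, next_season, change) for the biggest absolute change."""
    steps = (
        (prev[0], nxt[0], nxt[1] - prev[1])
        for prev, nxt in zip(series, series[1:])
    )
    return max(steps, key=lambda step: abs(step[2]))


def era_means(series, first_season_of_new_era):
    """Average the values before and from the given season onward."""
    before = [v for s, v in series if s < first_season_of_new_era]
    after = [v for s, v in series if s >= first_season_of_new_era]
    return mean(before), mean(after)


# Placeholder assists-per-goal values, NOT real MLS data.
assists_per_goal = [
    (2000, 1.5), (2001, 1.5), (2002, 1.5), (2003, 1.0), (2004, 1.0),
]

prev_season, next_season, change = largest_jump(assists_per_goal)
mean_before, mean_after = era_means(assists_per_goal, next_season)
print(prev_season, next_season, change, mean_before, mean_after)
```

If the biggest jump sits between the same two seasons no matter how you slice the data, that points toward a change in how the stat was recorded rather than a gradual change in play.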

What does this mean?  My hypothesis is that there was a fundamental change to the rules in how assists were recorded between the 2002 and 2003 seasons.  Unfortunately, I can’t confirm this.  I’ve searched the web, read the MLS Wikipedia page, read a good amount of the MLS website, and can’t seem to find anything related to a rules change that might result in this sort of phenomenon.  Sooooo, if you have any ideas, send ’em my way!

This will likely be the last MLS-specific post for a while.  Unless I can find some more data, I’m giving up — their data is just not that interesting.  Notice that I didn’t say that this would be my last soccer post.  Hopefully I can scrape some EPL (England’s Premiership) data.  Given that their league has been around for more than 15 years, it should be a bit more interesting than mine.

If you’re interested in taking a look at the data and/or code yourself, I’ve created a GitHub repository for your perusal.  Feel free to pass along your comments and/or questions regarding any code; I have thick skin.

So what’s next?  I am thinking about comparing my current workflow of (a) scraping with Python and (b) analyzing with R against doing everything in R (e.g., using the XML package).  Hopefully, I can post some time comparisons soon!

Addendum:  According to at least one blogger, the recording of “secondary assists” was changed after the 2002 season.  I’m not sure why they record secondary assists in the first place; I guess MLS wanted to appeal to the hockey people in the early years.  Here is the blogger’s take on secondary assists:



Filed under Data Viz, ggplot2, R, Sports

12 responses to “A Rule Change in Major League Soccer?”

  1. disgruntledphd

    dunno about the rule change, but the english premier league has been around for more than 15 years.

    The name was changed to premier from first division in the early 90s, but it’s been around for at least a century.

    • Ryan

      Yep, that’s why I said it’s been around for more than 15 years and that it should be more interesting to look at that data.

  2. I believe a lot of MLS rules are taken from the FIFA sanctioning body. So if there is a rule change to “assists” it will probably be as a result of a FIFA change and not so much an MLS change. MLS does have some rules differences from FIFA’s overarching rules but I’m not sure any of them are scoring rules.

    • Ryan

      I know that the MLS experimented with some rules in the early years to make it like other sports in the US. For example, MLS used to have the clock count down from 45 min and then the half was over at zero, whereas FIFA has the clock running up and the ref has ultimate control. This was fortunately changed to align with FIFA. I think that there was a “second” assist, similar to hockey (Damian mentions this in his comment), but I can’t confirm this.

      I’m going to take a look at the XML package vs scraping using beautifulsoup this weekend or so.

      • Ryan

        I just searched for “secondary assists” + MLS and found this youtube video. According to this guy, they changed how they are recorded after the 2002 season. I think that they should be eliminated altogether.

      • Damian

        Do you know if it is possible for someone to get an assist on their own goal? It would be ridiculous but possible.

      • Ryan

        I would doubt it, but you never know with the dipshits that run MLS as well as US Soccer.

  3. Damian

    I have to imagine that the rule change has to come from the “second” assist. I feel like I remember this being a topic of discussion a few years back. I thought the MLS had discontinued the “second” assist altogether but it appears from your data that they did not. Maybe they made the requirements more stringent after the 2002 season. Of course this is all speculation and I have no actual reference, just a hazy memory of what I may remember.

  4. If you are interested in comparing against the English premiership or various other European leagues then the http://www.football-data.co.uk/ website is a good free resource of results and for some leagues information about shots on target and so on.

  5. Very nice analysis! I just started my own blog modeling slightly after yours. I’m sure you’ll recognize the R output. :0)

  6. Pingback: Using XML package vs. BeautifulSoup « The Log Cabin
