Give a soapbox (blog) to a nonparametric statistician and you’re going to get this post or something very similar. Specifically, the latest edition of The Log Cabin concerns two summary statistics: the mean (average) and the median. Anybody who has taken an introductory statistics course knows the difference between the two statistics, so why waste an entire post about this subject? Glad you asked!
I just started reading The Tipping Point by Malcolm Gladwell and I’ve noticed that he likes to throw around statistics like they’re frisbees. Yes, I know it was published in 2000, it was a NY Times best seller, etc. If you like the book, don’t tune me out because I’m not here to rip the book — I really like this book. It’s just that my blood begins to boil when I read something like “The average score in that class was 20.96, meaning that the average person in the class knew 21 people …” on page 40 of my copy. To me, this doesn’t tell me a particularly compelling story. Why not?
Here’s another example. In the e-commerce world, you will often hear people refer to the average revenue per user (ARPU) as a summary of the company’s paying customers. The logic is that ARPU is enough to describe how much money a typical user will spend on the website. Does this seem reasonable?
Let’s consider the preceding example in a bit more detail. Consider the following figure of revenue per user for a fictitious company. ARPU for this example is $13.50.
Simply reporting ARPU of $13.50 can easily be misinterpreted in a situation like this. I would argue that reporting the median RPU (= $7) is more meaningful because it has an explicit interpretation: approximately 50% of the users will spend at least $7. We know immediately what half of the users are likely to do; by contrast, we can’t make a similar statement when reporting ARPU.
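To make the gap concrete, here is a minimal Python sketch. The ten revenue values are invented for illustration (they are not the data behind the figure); I picked them so that the mean comes out to $13.50 and the median to $7, matching the numbers quoted above.

```python
from statistics import mean, median

# Hypothetical revenue per user (in dollars) for ten paying customers.
# These values are made up for illustration; they are NOT the post's actual data.
revenue_per_user = [1, 2, 3, 5, 6, 8, 9, 11, 20, 70]

arpu = mean(revenue_per_user)          # average revenue per user
median_rpu = median(revenue_per_user)  # middle of the sorted values

# Share of users who spend at least each summary value.
n = len(revenue_per_user)
share_at_least_mean = sum(r >= arpu for r in revenue_per_user) / n
share_at_least_median = sum(r >= median_rpu for r in revenue_per_user) / n

print(f"ARPU:       ${arpu:.2f} (reached by {share_at_least_mean:.0%} of users)")
print(f"Median RPU: ${median_rpu:.2f} (reached by {share_at_least_median:.0%} of users)")
```

In this made-up sample only two of the ten users spend as much as the "average" user, while half of them reach the median. A single big spender is enough to drag the mean well above what most customers actually pay.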
How does this relate to The Tipping Point? Without knowing the shape of the distribution of scores, I am not sure what an average score of 21 people really means. It could be that the median score is 5 people and a few scores near 100 severely inflated the mean. On the other hand, it’s certainly possible that the median score is around 21 people as well. I just don’t know. In any case, I would have preferred to see the median, the quartiles (future post), or the entire distribution.
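Here is a small Python sketch of those two possibilities. Both sets of scores are hypothetical (I have no access to the actual class data from the book); each one averages exactly 21, yet they describe very different classes.

```python
from statistics import mean, median

# Two hypothetical classes of ten students each. The scores are invented
# for illustration and are NOT taken from The Tipping Point.
skewed_class = [2, 3, 4, 4, 5, 5, 6, 7, 80, 94]               # two very well-connected students
symmetric_class = [17, 18, 19, 20, 21, 21, 22, 23, 24, 25]    # everyone is near the middle

for name, scores in [("skewed", skewed_class), ("symmetric", symmetric_class)]:
    print(f"{name:>9}: mean = {mean(scores):.1f}, median = {median(scores):.1f}")
```

The reported average is 21 in both cases, but the median is 5 in the first class and 21 in the second. The mean by itself cannot tell these two stories apart.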
What’s the takeaway message here, folks? Don’t just give me an average anything and expect me to be impressed with your findings! If you choose to report a single measurement (should be avoided at all costs!), use the median. Better yet, show me the entire distribution of values.