Everyone can be above average

All it requires is some selection bias

There were quite a few teachers during my time at IIT Madras who were rumoured to have said the line “I want everyone in class to be above average”. Some people credit a professor of mathematics for saying this. At other times, the quote is ascribed to a lecturer of Engineering Drawing. In the last 20 years I’m sure even some statistics professors would have been credited with this line.

The absurdity in the line is clear. By definition, not everyone can be above average. The average is a measure of central tendency. However you define it (arithmetic mean, geometric mean, harmonic mean, median, mode), the average is by definition a “central value”: it can never be smaller than the smallest number in the data, so at least one data point is at or below it. In the worst case (assuming you are using a mode or median for a highly skewed distribution), there will be a large number of data points EQUAL to the average. Either way, not everyone can be above (strictly greater than) the average.

However, based on some recent incidents, I figured out a way in which everyone can actually be above average. All it takes is some kind of selection bias. Basically you need to be clever in terms of how you count – both when you calculate the average and when you define the “everyone”.

Take one example – you have an exam you need to pass to go from Grade 1 to Grade 2. Let’s say the class average (let’s use the simple mean here) is 41, and you need to have scored at least 40 to pass. Let’s also assume that nobody has scored exactly 40 or 41.

Now, if you come back next month and look at the exam scores of all the Grade 2 students, you will find that all of them would have scored strictly more than 41 – the old “average”. In other words, since the below average students are no longer part of the sample (since they have “not passed”), everyone left is above average! The below average set has simply been eliminated!
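Here is a minimal sketch of the exam example in Python. The individual scores are made up (hypothetical), chosen only so that the class mean is exactly 41 and nobody scored 40 or 41, as assumed above.

```python
from statistics import mean

PASS_MARK = 40

# Hypothetical Grade 1 scores, chosen so that the class mean is exactly 41
# and nobody scored exactly 40 or 41.
grade1_scores = [18, 25, 31, 36, 39, 42, 44, 48, 57, 70]

# The old "average", computed once over the whole Grade 1 class.
class_average = mean(grade1_scores)

# Only students at or above the pass mark move on to Grade 2.
grade2_scores = [s for s in grade1_scores if s >= PASS_MARK]

# Compare the survivors against the OLD average, not their own.
everyone_above_average = all(s > class_average for s in grade2_scores)

print(f"Grade 1 average: {class_average}")
print(f"Old scores of Grade 2 students: {grade2_scores}")
print(f"Everyone in Grade 2 above the old average? {everyone_above_average}")
```

The trick only works because the average is frozen before the filtering happens. Recompute the mean over the Grade 2 students alone and some of them fall below it again.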

Another way is simple relative grading. Let’s say there are 3 sections in the class. Telling one section that “everyone should be above average” is fairly legit – all it says is that this particular section should outperform the others so significantly that everyone in this section will be above the average defined by all sections!
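The relative grading case can be sketched the same way. The section names and scores below are hypothetical; the only point is that the “average” is taken over all three sections while “everyone” refers to just one of them.

```python
from statistics import mean

# Hypothetical scores for three sections of the same class.
sections = {
    "A": [80, 85, 90],
    "B": [40, 50, 60],
    "C": [30, 45, 60],
}

# The average is defined over ALL sections pooled together.
all_scores = [s for scores in sections.values() for s in scores]
overall_average = mean(all_scores)

# For each section, check whether every student beats the pooled average.
for name, scores in sections.items():
    all_above = all(s > overall_average for s in scores)
    print(f"Section {name}: everyone above the overall average? {all_above}")
```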

It is easier to do in code – using some statistical packages, as long as you slip in a few missing values into your dataset, you will find that the average is meaningless, and when you ask your software for how many are above average, the program defaults can mean that everyone can be classified as “above average” (even the ones with missing values).
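Which package and which defaults produce this is not specified here, so the following is only a sketch of one way it can happen, in plain Python. It assumes missing values are stored as NaN, and that “above average” is computed as “not at or below average”, which is a hypothetical but plausible way to write the check. Under IEEE 754 rules every ordered comparison involving NaN is false, so negating one gives true for everybody.

```python
import math

# Hypothetical scores with two missing values stored as NaN.
scores = [35.0, 52.0, math.nan, 47.0, math.nan]

# A naive mean: a single NaN is enough to make the result NaN.
average = sum(scores) / len(scores)

# "Above average" written as "not (at or below average)".
# Any comparison with NaN is False, so the negation is True for everyone,
# including the students whose scores are missing.
above_average = [not (s <= average) for s in scores]

# The direct comparison fails the other way: nobody is above average.
strictly_above = [s > average for s in scores]

print(f"average = {average}")
print(f"'above average' flags: {above_average}")
print(f"direct 's > average' flags: {strictly_above}")
```

The two checks look equivalent on clean data and disagree completely once the average is NaN, which is why it pays to decide explicitly what to do with missing values before averaging.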

I must have recommended this a few times already, but Darrell Huff’s 1954 book How to Lie With Statistics remains a masterpiece.


Average skill and peak skill

One way to describe the complexity of a job is to measure the “average level of skill” and the “peak level of skill” required to do it. The more complex the job, the larger the difference between the two. And sometimes, the frequency at which the peak level of skill is required can determine the quality of people you can expect to attract to the job.

Let us start with one extreme – the classic case of someone turning screws in a Ford factory. The design has been done so perfectly and the assembly line so optimised that the level of skill required by this worker each day is identical. All he/she (much more likely a he) has to do is to show up at the job, stand in the assembly line, and turn the specific screw in every single car (or part thereof) that passes his way.

The delta between the complexity of the average day and the “toughest day” is likely to be very low in this kind of job, given the amount of optimisation already put in place by the engineers at the factory.

Consider a maintenance engineer (let’s say at an oil pipeline) on the other hand. On most days, the complexity required of the job is very close to zero, for there is nothing much to do. The engineer just needs to show up and potter around and make a usual round of checks and all izz well.

On a day when there is an issue however, things are completely different – the engineer now needs to identify the source of the issue, figure out how to fix it and then actually put in the fix. Each of these is an insanely complex process requiring insane skill. This maintenance engineer needs to be prepared for this kind of occasional complexity, and despite the banality of most of his days on the job, maintain the requisite skill to do the job on these peak days.

In fact, if you think of it, a lot of “knowledge” jobs, which are supposed to be quite complex, actually don’t require a very high level of skill on most days. Yet, most of these jobs tend to employ people at a far higher skill level than what is required on most days, and this is because of the level of skill required on “peak days” (however you define “peak”).

The challenge in these cases, though, is to keep these high skilled people excited and motivated enough when the job on most days requires pretty low skill. Some industries, such as oil and gas, resolve this issue by paying well and giving good “benefits” – so even an engineer who might get bored by the lack of work on most days stays on to be able to contribute in times when there is a problem.

The other way to do this is in terms of the frequency of high skill days – if you can somehow engineer your organisation such that the high skilled people have a reasonable frequency of days when high skills are required, then they might find more motivation. For example, you might create an “internal consulting” team of some kind – they are tasked with performing a high skill task across different teams in the org. Each time this particular high skill task is required, the internal consulting team is called for. This way, this team can be kept motivated and (more importantly, perhaps) other teams can be staffed at a lower average skill level (since they can get help on high peak days).

I’m reminded of my first ever real taste of professional life – an internship in an investment bank in London in 2005. That was the classic “high variance in skills” job. Having been tested on fairly extreme maths and logic before I got hired, I found that most of my days were spent just keying numbers into an Excel sheet to call a macro someone else had written to price swaps (interest rate derivatives).

And being fairly young and immature, I decided this job was not worth it for me, and did not take up the full time offer they made me. And off I went on a rather futile “tour” to figure out what kind of job has sufficient high skill work to keep me interested. And then I left it all to start my own consultancy (where others would ONLY call me when there was work of my specialty; else I could chill).

With the benefit of hindsight (and having worked in a somewhat similar job later in life), though, I had completely missed the “skill gap” (delta between peak and average skill days) in my internship, and thus not appreciated why I had been hired for it. Also, that I spent barely two months in the internship meant I didn’t have sufficient data to know the frequency of “interesting days”.

And this is why – most of your time might be spent in writing some fairly ordinary code, but you will still be required to know how to reverse a red-black tree.

Most of your time might be spent in writing SQL queries or pulling some averages, but on the odd day you might need to know that a chi square test is the best way to test your current hypothesis.

Most of your time might be spent in managing people and making sure the metrics are alright, but on the odd day you might have to redesign the process at the facility that you are in charge of.

In most complex jobs, the average day is NOT similar to the most complex day by any means. And thus the average day is NOT representative of the job. The next time someone I’m interviewing asks me what my “average day looks like”, I’ll maybe point that person to this post!

Distribution of political values

Through Baal on Twitter I found this “Political Compass” survey. I took it, and it said this is my “political compass”.

Now, I’m not happy with the result. I mean, I’m okay with the average value where the red dot has been put for me, and I think that represents my political leanings rather well. However, what I’m unhappy about is that my political views have all been reduced to one single average point.

I’m pretty sure that based on all the answers I gave in the survey, my political leaning along each of the two axes follows a distribution, and the red dot here is only the average (mean, I guess, but could also be median) value of that distribution.

However, there are many ways in which people can have a political view that lands right on my dot – some people might have a consistent but mild political view in favour of or against a particular position. Others might have pretty extreme views – for example, some of my answers might lead you to believe that I’m an extreme right winger, and others might make me look like a Marxist (I believe I have a pretty high variance on both axes around my average value).

So what I would have liked instead from the political compass was a sort of heat map, or at least two marginal distributions, showing how I’m distributed along the two axes, rather than all my views being reduced to one average value.
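To see how much a single dot hides, here is a rough sketch with made-up numbers. I don’t know how the Political Compass actually scores answers, so the per-question scores below (on a hypothetical scale from -10 to +10 along one axis) are purely illustrative.

```python
from statistics import mean, pstdev

# Hypothetical per-question scores along ONE axis, on a -10..+10 scale.
# Both respondents end up with the same average, i.e. the same dot.
mild_respondent = [1, 2, 3, 2, 2, 1, 3, 2]         # consistently mild views
mixed_respondent = [-8, 10, 9, -7, 10, -6, 8, 0]   # extreme views on both sides

def crude_histogram(scores, bin_width=5):
    """Print a text histogram: one row per bin, one '#' per answer."""
    for low in range(-10, 10, bin_width):
        high = low + bin_width
        # The last bin is closed on the right so that +10 is counted.
        count = sum(1 for s in scores if low <= s < high or (high == 10 and s == 10))
        print(f"  [{low:>3}, {high:>3}) {'#' * count}")

for label, scores in [("mild", mild_respondent), ("mixed", mixed_respondent)]:
    print(f"{label}: mean = {mean(scores)}, std dev = {pstdev(scores):.2f}")
    crude_histogram(scores)
```

Same dot on the compass, very different people. The histogram rows are the crude one-axis version of the marginal distribution I was asking for.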

A version of this is the main argument of a book I read recently called “The End Of Average”: that when we design for “the average man” or “the average customer”, and do so across several dimensions, we end up designing for nobody, since nobody is average when looked at on many dimensions.

Standard deviation is over

I first learnt about the concept of Standard Deviation sometime in 1999, when we were being taught introductory statistics in class 12. It was classified under the topic of “measures of dispersion”, and after having learnt the concepts of “mean deviation from median” (and learning that “mean deviation from mean” is identically zero) and “mean absolute deviation”, the teacher slipped in the concept of the standard deviation.

I remember being taught the mnemonic of “railway mail service” to remember that the standard deviation was “root mean square” (RMS! get it?). Calculating the standard deviation was simple. You took the difference between each data point and the average, and then it was “root mean square” – you squared the numbers, took the arithmetic mean and then took the square root.
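The “root mean square” recipe translates almost word for word into code. This is a sketch on a small made-up data set. It divides by n (the population form, which is what the recipe above describes), whereas the sample standard deviation would divide by n - 1.

```python
import math
from statistics import pstdev

# Hypothetical data set.
data = [2, 4, 4, 4, 5, 5, 7, 9]

average = sum(data) / len(data)
deviations = [x - average for x in data]

# "Mean deviation from mean": positive and negative deviations cancel out.
mean_deviation = sum(deviations) / len(data)

# Mean absolute deviation: drop the signs before averaging.
mean_abs_deviation = sum(abs(d) for d in deviations) / len(data)

# Root-mean-square: square, take the arithmetic mean, take the square root.
squares = [d * d for d in deviations]
std_dev = math.sqrt(sum(squares) / len(squares))

print(f"mean deviation from mean: {mean_deviation}")
print(f"mean absolute deviation:  {mean_abs_deviation}")
print(f"standard deviation (RMS): {std_dev}")
print(f"statistics.pstdev check:  {pstdev(data)}")
```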

Back then, nobody bothered to tell us why the standard deviation was significant. Later in engineering, someone (wrongly) told us that you square the deviations so that you can account for negative numbers (if that were true, the MAD would be equally serviceable). A few years later, learning statistics at business school, we were told (rightly this time) that the standard deviation was significant because squaring the deviations penalizes outliers disproportionately. A few days later, we learnt hypothesis testing, which used the bell curve. “Two standard deviations includes 95% of the data”, we learnt, and blindly applied this to all data sets – the problems we encountered in examinations only dealt with data sets that were actually normally distributed. It was much later that we figured that the number six in “six sigma” was literally pulled out of thin air, as a dedication to Sigma Six, a precursor of Pink Floyd.

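Somewhere along the way, we learnt that the specialty of the normal distribution is that it can be uniquely described by mean and standard deviation. One look at the formula for its PDF tells you why it is so:

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$

The only parameters that appear are the mean $\mu$ and the standard deviation $\sigma$.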

Most introductory stats lessons are taught from the point of view of using stats to do science. In the natural world, and in science, a lot of things are normally distributed (hence it is the “normal” distribution). Thus, learning statistics using the normal distribution as a framework is helpful if you seek to use it to do science. The problem arises, however, if you assume that everything is normally distributed, as a lot of people do when they have learnt their statistics entirely through the lens of the normal distribution.

When you step outside the realms of natural science, however, you are in trouble if you were to blindly use the standard deviation, and consequently, the normal distribution. For in such realms, the normal distribution is seldom normal. Take, for example, stock markets. Most popular financial models assume that the movement of the stock price is either normal or log-normal (the famous Black-Scholes equation uses the latter assumption). In certain regimes, they might be reasonable assumptions, but pretty much anyone who has reasonably followed the markets knows that stock price movements have “fat tails”, and thus the log-normal assumption is not a great fit.

At least the stock price movement looks somewhat normal (apart from the fat tails). What if you are doing some social science research and are looking at, for example, data on people’s incomes? Do you think it makes sense at all to define standard deviation for income of a sample of people? Going further, do you think it makes sense at all to compare the dispersion in incomes across two populations by measuring the standard deviations of incomes in each?

I was once talking to an organization which was trying to measure and influence salesperson efficiency. In order to do this, again, they were looking at mean and standard deviation. Given that the sales of one salesperson can be an order of magnitude greater than those of another (given the nature of their product), this made absolutely no sense!
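A quick sketch shows how badly this can go. The sales figures below are invented, and the median absolute deviation is my own choice of a more robust yardstick for the comparison (it is not something that organization used). Watch what one star salesperson, selling roughly ten times what the others do, does to each measure.

```python
from statistics import median, pstdev

def median_abs_deviation(values):
    """Median of the absolute deviations from the median (a robust spread measure)."""
    centre = median(values)
    return median(abs(v - centre) for v in values)

# Hypothetical monthly sales (in thousands) for ten salespeople.
sales = [30, 32, 35, 38, 40, 42, 45, 50, 55, 60]

# The same team plus one person selling an order of magnitude more.
sales_with_star = sales + [600]

for label, data in [("without star", sales), ("with star", sales_with_star)]:
    print(
        f"{label:>12}: std dev = {pstdev(data):8.2f}, "
        f"median abs deviation = {median_abs_deviation(data)}"
    )
```

One extra data point blows up the standard deviation many times over, while the median-based measure barely notices. Whether that sensitivity is a bug or a feature depends on the question being asked, but it needs to be a conscious choice.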

The problem with the emphasis on standard deviation in our education is that most people know only one way to measure dispersion. When you know one method to measure something, you are likely to apply it irrespective of whether it is the appropriate method to use given the circumstances. It leads to the proverbial hammer-nail problem.

What we need to understand is that the standard deviation makes sense only for some kinds of data. Yes, it is mathematically defined for any set of numbers, but it makes physical sense only when the data is approximately normally distributed. When data doesn’t fit such a distribution (and more often than not it doesn’t), the standard deviation makes little sense!

For those that noticed, the title of this post is a dedication to Tyler Cowen’s recent book.