On finding the right signal

Not every problem yields a “signal”. It is quite possible that you try to solve a problem using data and are simply unable to find any signal. This does not mean you have failed in your quest – the fact that you have established the absence of a signal is itself valuable information and needs to be appreciated.

Sometimes, however, clients and consumers of analytics fail to appreciate this. In their view, if you fail to find an answer to a particular problem, you as an analyst have failed. They believe that a better analyst, or better analysis, would have produced a superior signal.

This failure by consumers of analytics to appreciate that there need not always be a signal can lead to fudging. Let us say you have a data set with a very weak signal – say all your explanatory variables together explain about 1% of the variance in the dependent variable. In most cases (unless you are trading, in which case a 1% signal has some value), there is little value to be gleaned from this, and you are better off not applying a model at all. However, the fear that the client will not appreciate “no” as an answer can lead you to present this 1% explanatory model as truth.
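A quick sketch of what “explaining 1% of the variance” looks like in practice. The data here is simulated for illustration (it is not the client data discussed in this piece): one explanatory variable with a deliberately tiny effect, an ordinary least squares fit, and the resulting R² – which is the fraction of variance explained.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical data: one explanatory variable with almost no relationship
# to the dependent variable. The coefficient 0.1 against unit noise gives
# a true R^2 of roughly 1%.
n = 1000
x = rng.normal(size=n)
y = 0.1 * x + rng.normal(size=n)

# Fit y = slope * x + intercept by ordinary least squares.
slope, intercept = np.polyfit(x, y, 1)

# R^2 = 1 - (residual variance / total variance)
residuals = y - (slope * x + intercept)
r_squared = 1 - residuals.var() / y.var()

print(f"R^2 = {r_squared:.3f}")
```

An R² this small means the fitted line is barely better than predicting the mean of y for everyone – which is the sense in which such a model adds little value.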

What one needs to recognize is that a bad model can actually subtract value. One of my clients was using a model that had been put in place by an earlier consultant. The model prescribed certain criteria they had to follow in recruitment, and I was asked to take a look at it. What I found was that the model contained absolutely no “signal” – based on my analysis, candidates who scored high on that model were no more likely to perform well than those who scored low!
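The kind of check described above can be sketched simply: compare the outcomes of high scorers and low scorers. The data below is simulated (scores and outcomes are generated independently, standing in for a model with no signal – it is not the client's actual data); a real analysis would add a significance test on the difference.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical records: a screening score from the old model, and a binary
# "performed well" outcome. Here the outcome is generated independently of
# the score, i.e. the model has no signal by construction.
n = 500
score = rng.uniform(0, 100, size=n)
performed_well = rng.random(size=n) < 0.5

# Split candidates at the median score and compare success rates.
cutoff = np.median(score)
high = performed_well[score >= cutoff].mean()
low = performed_well[score < cutoff].mean()

print(f"success rate, high scorers: {high:.2f}")
print(f"success rate, low scorers:  {low:.2f}")
```

When the two rates are statistically indistinguishable, as they will be here, the score is doing no predictive work – which was exactly the finding in the engagement described above.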

You might ask what the problem with such a model is. The problem is that by demanding a certain set of scores on a certain set of parameters, the model was filtering out a large number of candidates without any basis. Using a poor model, the company was recruiting from a much smaller pool, which left the hiring managers with fewer choices and led to suboptimal decisions. I remember closing that case with a recommendation to dismantle the model (since it wasn’t giving much of a signal anyway) and to instead simply empower the hiring manager!

Essentially, companies need to recognize two things. Firstly, not having a model is better than having a poor model, for a poor model can subtract value and lead to suboptimal decision-making. Secondly, not every problem has a quantitative solution. It is entirely possible that there is absolutely no signal in the data. If no signal exists, the analyst is not at fault for failing to find one – in fact, she would be dishonest if she were to report a signal where none existed!

It is important that companies keep these two things in mind while hiring a consultant to solve a problem using data.