forecasting – Pertinent Observations

Confusing with complications

I’m reading this awesome article by Srinivas Bhogle (with Rajeeva Karandikar) on election forecasting. To be fair, not much of the article is new to me – it’s just a far more readable version of Karandikar’s seminal presentation on the topic made at IIT Kanpur all those years back.

However, as with all good retellings, this story also has some nice tidbits. This one has to do with “index of opposition unity”. The voice here is Bhogle’s:

It is easy to understand why the IOU becomes so critical in such situations. But, and here’s the rub, the exact mathematical formula connecting IOU to the seat count prediction is not easy to find. I searched through the big and small print of The Verdict by Dorab Sopariwala and Prannoy Roy, but the formula remained elusive.

Rajeeva suggests that it was likely based on simple heuristics: something like ‘if the IOU is less than 25%, give the first-placed party 75% of the seats.’ It may also have involved intelligent tweaking based on current survey data, historical data, informal feedback, expert opinion, gut feeling, and so on.

I first came across the IOU in Prannoy Roy and Dorab Sopariwala’s book. The way they had presented it in the book, it seemed like a “major concept”. It seems that Bhogle, like me, also looked through the book trying to find a precise formula, and failed to do so.

And then Karandikar’s insight above is crucial – that the IOU may not be a precise mathematical formula, but just an intelligent set of heuristics, involving intelligent tweaking.
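To make the "heuristic, not formula" point concrete, here is a minimal sketch. It assumes the commonly cited definition of the IOU (the largest opposition party's share of the total opposition vote, in percent), which is not stated anywhere in this post, and it encodes only the single "something like" rule quoted above. The function names and the choice to return None when no rule applies are hypothetical.

```python
def index_of_opposition_unity(vote_shares, leading_party):
    """IOU in percent: largest opposition share / total opposition share.

    Assumed definition; the post itself does not give a formula.
    """
    opposition = [share for party, share in vote_shares.items()
                  if party != leading_party]
    if not opposition or sum(opposition) == 0:
        raise ValueError("need at least one opposition party with votes")
    return 100.0 * max(opposition) / sum(opposition)


def heuristic_seat_share(iou):
    """Apply the one illustrative rule from the quote; None means 'no rule'."""
    if iou < 25:
        return 0.75  # 'give the first-placed party 75% of the seats'
    return None  # everything else is left to judgement in this sketch


# Made-up vote shares, purely for illustration.
shares = {"A": 40, "B": 30, "C": 20, "D": 10}
iou = index_of_opposition_unity(shares, leading_party="A")
print(iou, heuristic_seat_share(iou))
```

Everything beyond that one rule (survey tweaks, expert opinion, gut feeling) is exactly the part that resists being written down.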

Sometimes putting a fancy name (or, even better, an acronym) on something can help lend credibility to the concept. For example, IOU is something that has been championed by Roy and Sopariwala for years, and they have done so to a level where it has become a self-fulfilling prophecy, and a respected scientist like Bhogle has gone searching for its formula!

Also, sometimes, telling people that you “used an intelligent heuristic” to come up with a conclusion can lead you to be taken less seriously. Put on a fancy name (even if it is something that you have yourself come up with), and the game changes. You suddenly start to be taken more seriously, like Ganesha assumed when he started sending fan mail under the name “YG Rao”.

And like they say in The Usual Suspects, sometimes the greatest trick that the devil ever pulled was to convince you that he exists. It is the same with “concepts” such as IOU – you THINK they must be sound because they come with a fancy name, when all that they appear to represent is a set of fancy heuristics.

I must say this is excellent marketing.

Communicating binary forecasts

One silver lining in the madness of the US Presidential election counting is that there are some interesting analyses floating around regarding polling and surveying and probabilities and visualisation. Take this post from Andrew Gelman’s blog, for example:

Suppose our forecast in a certain state is that candidate X will win 0.52 of the two-party vote, with a forecast standard deviation of 0.02. Suppose also that the forecast has a normal distribution.[…]

Then your 68% predictive interval for the candidate’s vote share is [0.50, 0.54], and your 95% interval is [0.48, 0.56].

Now suppose the candidate gets exactly half of the vote. Or you could say 0.499, the point being that he lost the election in that state.

This outcome falls on the boundary of the 68% interval, it’s one standard deviation away from the forecast. In no sense would this be called a prediction error or a forecast failure.

But now let’s say it another way. The forecast gave the candidate an 84% chance of winning! And then he lost. That’s pretty damn humiliating. The forecast failed.

It took me a while to appreciate this. In a binary outcome, if your model predicts 52%, with a standard deviation of 2%, you are in effect predicting a “win” (50% or higher) with a probability of 84%! Somehow I had never thought about it that way.

In any case, this tells you how tricky forecasting a binary outcome is. You might think (based on your sample size) that a 2% standard deviation is reasonable. Except that when the mean of your forecast is close to the barrier (50% in this case), the “reasonable standard deviation” lends a much stronger meaning to your forecast.
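The arithmetic behind that 84% is short enough to check. This is a minimal sketch using Python's standard library, under the same normality assumption as the quoted post; the function names are mine, not Gelman's.

```python
from statistics import NormalDist


def win_probability(mean_share, sd, threshold=0.5):
    """P(vote share > threshold) when the forecast is Normal(mean_share, sd)."""
    return 1.0 - NormalDist(mu=mean_share, sigma=sd).cdf(threshold)


def central_interval(mean_share, sd, n_sd):
    """Interval of mean_share +/- n_sd standard deviations."""
    return (mean_share - n_sd * sd, mean_share + n_sd * sd)


low, high = central_interval(0.52, 0.02, n_sd=2)
print(f"approx. 95% interval = [{low:.2f}, {high:.2f}]")
print(f"P(win) = {win_probability(0.52, 0.02):.2f}")  # prints P(win) = 0.84
```

The barrier at 0.50 sits exactly one standard deviation below the mean, so the win probability is simply the normal probability of landing above minus one standard deviation.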

Gelman goes on:

That’s right. A forecast of 0.52 +/- 0.02 gives you an 84% chance of winning.

We want to increase the sd in the above expression so as to send the win probability down to 60%. How much do we need to increase it? Maybe send it from 0.02 to 0.03?
> pnorm(0.52, 0.50, 0.03)
[1] 0.75
Uh, no, that wasn’t enough! 0.04?
> pnorm(0.52, 0.50, 0.04)
[1] 0.69
0.05 won’t do it either. We actually have to go all the way up to . . . 0.08:
> pnorm(0.52, 0.50, 0.08)
[1] 0.60
That’s right. If your best guess is that candidate X will receive 0.52 of the vote, and you want your forecast to give him a 60% chance of winning the election, you’ll have to ramp up the sd to 0.08, so that your 95% forecast interval is a ridiculously wide 0.52 +/- 2*0.08, or [0.36, 0.68].
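Instead of trial and error, the required standard deviation can be solved for directly: the win probability is Φ((mean − 0.5) / sd), so sd = (mean − 0.5) / Φ⁻¹(p). A minimal sketch under the same normal assumption (the function name is hypothetical):

```python
from statistics import NormalDist


def sd_for_win_probability(mean_share, target_prob, threshold=0.5):
    """Standard deviation at which P(share > threshold) equals target_prob."""
    if mean_share <= threshold:
        raise ValueError("mean_share must be above the threshold")
    if not 0.5 < target_prob < 1.0:
        raise ValueError("target_prob must lie strictly between 0.5 and 1")
    z = NormalDist().inv_cdf(target_prob)  # standard normal quantile
    return (mean_share - threshold) / z


sd = sd_for_win_probability(0.52, 0.60)
print(f"sd needed for a 60% win probability: {sd:.3f}")
```

For a 60% target this comes out just under 0.08, in line with the quoted search.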

Who said forecasting an election is easy?

Just Plot It

One of my favourite work stories is from this job I did a long time ago. The task given to me was demand forecasting, and the variable I needed to forecast was so “micro” (this intersected with that intersected with the other) that forecasting was an absolute nightmare.

A side effect of this has been that I find it impossible to believe that it’s possible to forecast anything at all. Several (reasonably successful) forecasting assignments later, I still dread it when the client tells me that the project in question involves forecasting.

Another side effect is that the utter failure of standard textbook methods in that monster forecasting exercise all those years ago means that I find it impossible to believe that textbook methods work with “real life data”. Textbooks and college assignments are filled with problems that when “twisted” in a particular way easily unravel, like a well-tied tie knot. Industry data and problems are never as clean, and elegance doesn’t always work.

Anyway, coming back to the problem at hand, I had struggled for several months with this monster forecasting problem. Most of this time, I had been using one programming language that everyone else in the company used. The code was simultaneously being applied to lots of different sub-problems, so through the months of struggle I had never bothered to really “look at” the data.

I must have told this story before, when I spoke about why “data scientists” should learn MS Excel. For what I did next was to load the data onto a spreadsheet and start looking at it. And “looking at it” involved graphing it. And the solution, or the lack of it, lay right before my eyes. The data was so damn random that it was a wonder that anything had been forecast at all.

It was also a wonder that the people who had built the larger model (into which my forecasting piece was to plug in) had assumed that this data would be forecast-able at all (I mentioned this to the people who had built the model, and we’ll leave that story for another occasion).

In any case, looking at the data, by putting it in a visualisation, completely changed my perspective on how the problem needed to be tackled. And this has been a learning I haven’t let go of since – the first thing I do when presented with data is to graph it out, and visually inspect it. Any statistics (and any forecasting for sure) comes after that.

Yet, I find that a lot of people simply fail to appreciate the benefits of graphing. That it is not intuitive to do with most programming languages doesn’t help. Incredibly, even Python, a favoured tool of a lot of “data scientists”, doesn’t make graphing easy. Last year when I was forced to use it, I found that it was virtually impossible to create a PDF with lots of graphs – something that I do as a matter of routine when working on R (I subsequently figured out a (rather inelegant) hack the next time I was forced to use Python).
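For what it's worth, one way to get lots of graphs into a single PDF from Python is matplotlib's `PdfPages`. This is a minimal sketch, assuming matplotlib is installed; it is not necessarily the hack mentioned above, and the data and file name are made up.

```python
import random

import matplotlib
matplotlib.use("Agg")  # render straight to files; no display needed
import matplotlib.pyplot as plt
from matplotlib.backends.backend_pdf import PdfPages


def plot_series_to_pdf(series_by_name, path):
    """Write one line chart per series into a single multi-page PDF."""
    with PdfPages(path) as pdf:
        for name, values in series_by_name.items():
            fig, ax = plt.subplots(figsize=(8, 4))
            ax.plot(range(len(values)), values, marker="o")
            ax.set_title(name)
            ax.set_xlabel("period")
            ax.set_ylabel("value")
            pdf.savefig(fig)  # each call adds a page
            plt.close(fig)    # free memory when there are many plots
    return len(series_by_name)


if __name__ == "__main__":
    rng = random.Random(0)
    # Made-up noisy series standing in for real data.
    demo = {f"series_{i}": [rng.gauss(100, 20) for _ in range(52)]
            for i in range(3)}
    plot_series_to_pdf(demo, "all_series.pdf")
```

Flipping through a file like this, one page per series, is usually enough to tell whether there is any pattern worth forecasting.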

Maybe when you work on data that doesn’t have meaningful variables – such as images, for example – graphing doesn’t help (since a variable on its own has little information). But when the data remotely has some meaning – sales or production or clicks or words, graphing can be of immense help, and can give you massive insight on how to develop your model!

So go ahead, and plot it. And I won’t mind if you fail to thank me later!

Statistics and machine learning

So a group of statisticians (from Cyprus and Greece) have written an easy-to-read paper comparing statistical and machine learning methods in time series forecasting, and found that statistical methods do better, both in terms of accuracy and computational complexity.

To me, there’s no surprise in the conclusion, since in the statistical methods, there is some human intelligence involved, in terms of removing seasonality, making the time series stationary and then using statistical methods that have been built specifically for time series forecasting (including some incredibly simple stuff like exponential smoothing).
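To show just how simple "incredibly simple" is, here is a minimal sketch of simple exponential smoothing in plain Python. Initialising the level with the first observation is one common choice and an assumption here; the numbers are made up.

```python
def simple_exponential_smoothing(series, alpha):
    """Return one-step-ahead forecasts.

    forecasts[t] is the forecast for period t made before seeing series[t];
    the last element is the forecast for the first unseen period.
    """
    if not series:
        raise ValueError("series must not be empty")
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    level = series[0]  # assumed initialisation: start at the first value
    forecasts = [level]
    for y in series:
        # New level: weighted average of the latest value and the old level.
        level = alpha * y + (1 - alpha) * level
        forecasts.append(level)
    return forecasts


print(simple_exponential_smoothing([10, 20, 30], alpha=0.5))
# prints [10, 10.0, 15.0, 22.5]
```

With `alpha=1` this collapses to the naive "tomorrow equals today" forecast; smaller values average over more history.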

Machine learning methods, on the other hand, are more general purpose – the same neural networks used for forecasting these time series, with changed parameters, can be used for predicting something else.

In a way, using machine learning for time series forecasting is like using that little screwdriver from a Swiss army knife, rather than a proper screwdriver. Yes, it might do the job, but it’s in general inefficient and not an effective use of resources.

Yet, it is important that this paper has been written since the trend in industry nowadays has been that given cheap computing power, machine learning be used for pretty much any problem, irrespective of whether it is the most appropriate method for doing so. You also see the rise of “machine learning purists” who insist that no human intelligence should “contaminate” these models, and machines should do everything.

By pointing out that statistical techniques are superior at time series forecasting compared to general machine learning techniques, the authors bring to attention that using purpose-built techniques can actually do much better, and that we can build better systems by using a combination of human and machine intelligence.

They also helpfully include this nice picture that summarises what machine learning is good for, and I wholeheartedly agree:

The paper also has some other gems. A few samples here:

Knowing that a certain sophisticated method is not as accurate as a much simpler one is upsetting from a scientific point of view as the former requires a great deal of academic expertise and ample computer time to be applied.

[…] the post-sample predictions of simple statistical methods were found to be at least as accurate as the sophisticated ones. This finding was furiously objected to by theoretical statisticians [76], who claimed that a simple method being a special case of e.g. ARIMA models, could not be more accurate than the ARIMA one, refusing to accept the empirical evidence proving the opposite.

A problem with the academic ML forecasting literature is that the majority of published studies provide forecasts and claim satisfactory accuracies without comparing them with simple statistical methods or even naive benchmarks. Doing so raises expectations that ML methods provide accurate predictions, but without any empirical proof that this is the case.
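Checking against a naive benchmark takes only a few lines. The sketch below uses a plain ratio of mean absolute errors, which is my choice for illustration and not necessarily the accuracy measure used in the paper; the numbers are made up.

```python
def mean_absolute_error(actual, predicted):
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def relative_mae(series, candidate_forecasts):
    """MAE of the candidate divided by MAE of the naive forecast.

    candidate_forecasts[i] must be the forecast for series[i + 1].
    The naive forecast for series[i + 1] is simply series[i].
    A value below 1 means the candidate beats the naive benchmark.
    """
    actual = series[1:]
    naive_mae = mean_absolute_error(actual, series[:-1])
    if naive_mae == 0:
        raise ValueError("naive forecast is perfect; ratio is undefined")
    return mean_absolute_error(actual, candidate_forecasts) / naive_mae


series = [10, 12, 11, 13]  # made-up observations
print(relative_mae(series, [11, 11, 12]))
```

A model that cannot push this ratio clearly below 1 on held-out data has not earned its complexity.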

At present, the issue of uncertainty has not been included in the research agenda of the ML field, leaving a huge vacuum that must be filled as estimating the uncertainty in future predictions is as important as the forecasts themselves.

When I missed my moment in the sun

Going through an old piece I’d written for Mint, while conducting research for something I’m planning to write, I realise that I’d come rather close to staking claim as a great election forecaster. As it happened, I just didn’t have the balls to stick my neck out (yes, mixed metaphors and all that) and so I missed the chance to be a hero.

I was writing a piece on election forecasting, and the art of converting vote shares into seat shares, which is tricky business in a first past the post system such as India’s. I was trying to explain how the number of “corners of contests” can have an impact on what seat share a particular vote share can translate to, and I wrote about Uttar Pradesh.

Quoting from my article:

An opinion poll conducted by CNN-IBN and CSDS whose results were published last week predicted that in Uttar Pradesh, the Bharatiya Janata Party is likely to get 38% of the vote. The survey reported that this will translate to about 41-49 seats for the BJP. What does our model above say?

…

If you look at the graph for the four-cornered contest closely (figure 4), you will notice that 38% vote share literally falls off the chart. Only once before has a party secured over 30% of the vote in a four-cornered contest (Congress in relatively tiny Haryana in 2004, with 42%) and on that occasion went on to get 90% of the seats (nine out of 10).

Given that this number (38%) falls outside the range we have noticed historically for a four-cornered contest, it makes it unpredictable. What we can say, however, is that if a party can manage to get 38% of the votes in a four-cornered state such as Uttar Pradesh, it will go on to win a lot of seats.

As it turned out, the BJP did win nearly 90% of all seats in the state (71 out of 80 to be precise), stumping most election forecasters. As you can see, I had it all right there, except that I didn’t put it in that many words – I chickened out by saying “a lot of seats”. And so I’m still known as “the guy who writes on election data for Mint” rather than “that great election forecaster”.
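The effect of the number of corners can be illustrated with a toy simulation. This is not the model from the Mint piece, which was based on historical results. It assumes the rivals split the remaining vote equally and that each party's share varies independently across constituencies with the same standard deviation, and all parameter values are made up.

```python
import random


def simulated_seat_share(lead_share, corners, n_seats=2000,
                         noise_sd=0.05, seed=0):
    """Fraction of seats won by a party with `lead_share` of the overall vote.

    Toy assumptions: the other (corners - 1) parties split the rest equally,
    and every party's local share is its overall share plus Gaussian noise.
    """
    if corners < 2:
        raise ValueError("a contest needs at least two corners")
    rng = random.Random(seed)
    rival_share = (1.0 - lead_share) / (corners - 1)
    wins = 0
    for _ in range(n_seats):
        lead_local = lead_share + rng.gauss(0, noise_sd)
        best_rival = max(rival_share + rng.gauss(0, noise_sd)
                         for _ in range(corners - 1))
        if lead_local > best_rival:
            wins += 1  # first past the post: highest local share takes the seat
    return wins / n_seats


for corners in (2, 3, 4):
    print(corners, simulated_seat_share(0.38, corners))
```

Under these assumptions the same 38% goes from winning almost nothing in a two-cornered contest to sweeping a four-cornered one, which is the qualitative point of the quoted passage.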

Then again, you don’t want to be too visible with the predictions you make, and India’s second largest business newspaper is definitely not an “obscure place”. As I’d written a long time back regarding financial forecasts,

…take your outrageous prediction and outrageous reasons and publish a paper. It should ideally be in a mid-table journal – the top journals will never accept anything this outrageous, and you won’t want too much footage for it also.

…

In all probability your prediction won’t come true. Remember – it was outrageous. No harm with that. Just burn that journal in your safe (I mean take it out of the safe before you burn it). There is a small chance of your prediction coming true. In all likelihood it won’t, but just in case it does, pull that journal out of that safe and call in your journalist friends. You will be the toast of the international press.

So maybe choosing to not take the risk with my forecast was a rational decision after all. Just that it doesn’t appear so in hindsight.

Airline pricing is strange

While planning our holiday to al-Andalus during my wife’s Easter break (starting later this week), we explored different options for flights from different destinations in al-Andalus to Barcelona before we confirmed our itinerary.

As it turned out, it was cheapest (by a long way) to take a flight back from Malaga to Barcelona on Good Friday (meaning we were “wasting” three days of Priyanka’s vacation – which we were okay with), and so we’ve booked that.

Now, Vueling (Iberia’s low cost version where we’ve booked our tickets) sends me an email offering credits of €40 per passenger if we could change our flight from Friday to Saturday (one day later). In other words, it turns out now that the demand for Friday flights is so much more than that for the Saturday flight that Vueling is willing to refund more than half the fare we’ve paid so that we can make the change!

I don’t know what kind of models Vueling uses to predict demand but it seems to me now that their forecasts at the time we made our booking (3 weeks back) were a long way off – that they significantly underestimated their demand for Friday and overestimated demand for Saturday! If this is due to an unexpected bulk booking I wouldn’t blame them, else they have some explaining to do!

And “special occasions” such as long weekends, and especially festivals such as Good Friday, are a bitch when it comes to modelling: normal demand patterns will be upset for the entire period surrounding them, so you might need to hard code some presets.
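A minimal sketch of what such hard-coded presets might look like, layered on top of a weekday baseline. This is purely illustrative and says nothing about what Vueling actually does; the baseline numbers, multipliers, window lengths and the example date are all made up.

```python
from datetime import date, timedelta

# Hypothetical baseline demand by weekday (Monday=0 ... Sunday=6).
BASELINE_BY_WEEKDAY = {0: 120, 1: 110, 2: 115, 3: 130, 4: 160, 5: 140, 6: 150}


def holiday_presets(holiday, before=2, after=2, peak=1.5, shoulder=1.2):
    """Build {date: multiplier} for the holiday and the days around it."""
    presets = {holiday: peak}
    for offset in range(1, before + 1):
        presets[holiday - timedelta(days=offset)] = shoulder
    for offset in range(1, after + 1):
        presets[holiday + timedelta(days=offset)] = shoulder
    return presets


def forecast_demand(day, presets):
    """Weekday baseline, scaled by a preset multiplier if one applies."""
    return BASELINE_BY_WEEKDAY[day.weekday()] * presets.get(day, 1.0)


festival = date(2030, 1, 4)  # arbitrary placeholder date, not a real holiday
presets = holiday_presets(festival)
for offset in range(-3, 4):
    day = festival + timedelta(days=offset)
    print(day.isoformat(), forecast_demand(day, presets))
```

The hard part in practice is choosing the multipliers, since each festival comes around only once a year and there is little history to fit them on.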

PS: Super excited about the upcoming holiday. We’re starting off touristy, with a day each in Granada and Cordoba. Then some days in Sevilla and some in Malaga. If you have any recommendations of things to do/see/eat in these places, please let me know! Thanks in advance.