More on covid testing

There has been a massive jump in the number of covid-19 positive cases in Karnataka over the last couple of days. Today, there were 44 new cases discovered, and yesterday there were 36. This is a big jump from the average of about 15 cases per day in the preceding 4-5 days.

The good news is that not all of this is new infection. A lot of cases that have come out today are clusters of people who have collectively tested positive. However, there is one bit from yesterday’s cases (again a bunch of clusters) that stands out.

Source: covid19india.org

I guess by now everyone knows what “travelled from Delhi” is a euphemism for. The reason they are interesting to me is that they are based on a “repeat test”. In other words, all these people had tested negative the first time they were tested, and then they were tested again yesterday and found positive.

Why did they need a repeat test? That’s because the sensitivity of the Covid-19 test is rather low. Out of every 100 infected people who take the test, only about 70 are found positive (on average) by the test. That also depends upon when the sample is taken.  From the abstract of this paper:

Over the four days of infection prior to the typical time of symptom onset (day 5) the probability of a false negative test in an infected individual falls from 100% on day one (95% CI 69-100%) to 61% on day four (95% CI 18-98%), though there is considerable uncertainty in these numbers. On the day of symptom onset, the median false negative rate was 39% (95% CI 16-77%). This decreased to 26% (95% CI 18-34%) on day 8 (3 days after symptom onset), then began to rise again, from 27% (95% CI 20-34%) on day 9 to 61% (95% CI 54-67%) on day 21.

About one in three infected people (depending upon when you draw the sample) are found by the test to be uninfected. Maybe I should state it again. If you test a covid-19 positive person for covid-19, there is almost a one-third chance that she will be found negative.

The good news (on the face of it) is that the test has “high specificity” of about 97-98% (this is from conversations I’ve had with people in the know; I’m unable to find links to corroborate this), or a false positive rate of 2-3%. That seems rather accurate, except that when the “prior probability” of having the disease is low, even this specificity is not good enough.

Let’s assume that a million Indians are covid-19 positive (the official numbers as of today are a little more than one-hundredth of that number). With one and a third billion people, that represents 0.075% of the population.

Let’s say we were to start “random testing” (as a number of commentators are advocating), and were to pull a random person off the street to test for Covid-19. The “prior” (before testing) likelihood she has Covid-19 is 0.075% (assume we don’t know anything more about her to change this assumption).

If we were to take 20000 such people, 15 of them will have the disease. The other 19985 don’t. Let’s test all 20000 of them.

Of the 15 who have the disease, the test returns “positive” for 10.5 on average (70% sensitivity; round up to 11). Of the 19985 who don’t have the disease, the test returns “positive” for 400 of them (assuming a specificity of 98%, or a false positive rate of 2%, placing more faith in the test)! In other words, if there were a million Covid-19 positive people in India, and a random Indian were to take the test and test positive, the likelihood she actually has the disease is 11/411 = 2.6%.

If there were 10 million covid-19 positive people in India (no harm in supposing), then the “base rate” would be .75%. So out of our sample of 20000, 150 would have the disease. Again testing all 20000, 105 of the 150 who have the disease would test positive. 397 of the 19850 who don’t have the disease will test positive. In other words, if there were ten million Covid-19 positive people in India, and a random Indian were to take the test and test positive, the likelihood she actually has the disease is 105/(397+105) = 21%.
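For anyone who wants to verify the arithmetic, here is a small sketch of the same calculation done directly with Bayes’ Theorem (the 70% sensitivity and 98% specificity are the assumptions stated above):

def posterior_positive(prevalence, sensitivity=0.70, specificity=0.98):
    """P(infected | test positive), straight from Bayes' Theorem."""
    true_positives = prevalence * sensitivity
    false_positives = (1 - prevalence) * (1 - specificity)
    return true_positives / (true_positives + false_positives)

# one million infected out of ~1.33 billion people
print(posterior_positive(0.00075))   # ~0.026, i.e. about 2.6%
# ten million infected
print(posterior_positive(0.0075))    # ~0.21, i.e. about 21%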

If there were ten million Covid-19 positive people in India, only one-fifth of the people who tested positive in a random test would actually have the disease.

Take a sip of water (ok I’m reading The Ken’s Beyond The First Order too much nowadays, it seems).

This is all standard maths stuff, and any self-respecting book or course on probability and Bayes’s Theorem will have at least a reference to AIDS or cancer testing. The story goes that this was a big deal in the 1990s when some people suggested that the AIDS test be used widely. Then, once this problem of false positives and posterior probabilities was pointed out, the strategy of only testing “high risk cases” got accepted.

And with a “low incidence” disease like covid-19, effective testing means you test people with a high prior probability. In India, that has meant testing people who travelled abroad, people who have come in contact with other known infected, healthcare workers, people who attended the Tablighi Jamaat conference in Delhi, and so on.

The advantage with testing people who already have a reasonable chance of having the disease is that once the test returns positive, you can be pretty sure they actually have the disease. It is more effective and efficient. Testing people with a “high prior probability of disease” is not discriminatory, or a “sampling bias” as some commentators alleged. It is prudent statistical practice.

Again, as I found to my own detriment with my tweetstorm on this topic the other day, people are bound to see politics and ascribe political motives to everything nowadays. In that sense, a lot of the commentary is not surprising. It’s also not surprising that when “one wing” heavily retweeted my article, “the other wing” made efforts to find holes in my argument (which, again, is textbook math).

One possibly apolitical criticism of my tweetstorm was that “the purpose of random testing is not to find out who is positive. It is to find out what proportion of the population has the disease”. The costs of this (apart from the monetary cost of actually testing) are threefold. Firstly, a large number of uninfected people will get hospitalised in covid-specific hospitals, clogging hospital capacity and increasing the chances that they get infected while in hospital.

Secondly, getting a truly random sample in this case is tricky, and possibly unethical. When you have limited testing capacity, you would be inclined (possibly morally, even) to use it on people who already have a high prior probability.

Finally, when the incidence is small, we need a really large sample to estimate the true incidence with any precision.

Let’s say 1 in 1000 Indians have the disease (or about 1.35 million people). Using the Chi Square test of proportions, our estimate of the incidence of the disease varies significantly on how many people are tested.

If we test 1000 people and find 1 positive, the true incidence of the disease (95% confidence interval) could be anywhere from 0.01% to 0.65%.

If we test 10000 people and find 10 positive, the true incidence of the disease could be anywhere between 0.05% and 0.2%.

Only if we test 100000 people (a truly massive random sample) and find 100 positive does the estimated incidence narrow to between 0.08% and 0.12%, an acceptable range.
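These intervals are straightforward to reproduce. A sketch using the Wilson score interval from statsmodels (the exact interval method behind the numbers above is my assumption, so the endpoints will differ slightly, but the widths tell the same story):

from statsmodels.stats.proportion import proportion_confint

# 95% confidence intervals for an observed incidence of 1 in 1000,
# at different sample sizes
for n in (1000, 10000, 100000):
    positives = n // 1000
    low, high = proportion_confint(positives, n, alpha=0.05, method="wilson")
    print(f"n = {n}: {low:.3%} to {high:.3%}")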

I admit that we may not be testing enough. A simple rule of thumb is that anyone with more than a 5% prior probability of having the disease needs to be tested. How we determine this prior probability is again dependent on some rules of thumb.

I’ll close by saying that we should NOT be doing random testing. That would be unethical on multiple counts.

Simulating Covid-19 Scenarios

I must warn that this is a super long post. Also I wonder if I should put this on medium in order to get more footage.

Most models of disease spread use what is known as a “SIR” framework. This Numberphile video gives a good primer into this framework.

The problem with the framework is that it’s too simplistic. It depends primarily on one parameter, R0, which is the average number of people that each infected patient infects. When R0 is high, each patient infects a number of other people, and the disease spreads fast. With a low R0, the disease spreads slowly. It was the SIR model that was used to produce all those “flatten the curve” pictures that we were bombarded with a week or two back.

There is a second parameter as well – the recovery or removal rate. Some diseases are so lethal that they have a high removal rate (eg. Ebola), and this puts a natural limit on how much the disease can spread, since infected people die before they can infect too many people.

In any case, such modelling is great for academic studies, and post-facto analyses where R0 can be estimated. As we are currently in the middle of an epidemic, this kind of simplistic modelling can’t take us far. Nobody has a clue yet on what the R0 for covid-19 is. Nobody knows what proportion of total cases are asymptomatic. Nobody knows the mortality rate.

And things are changing well-at-a-faster-rate. Governments are imposing distancing of various forms. First offices were shut down. Then shops were shut down. Now everything is shut down, and many of us have been asked to step out “only to get necessities”. And in such dynamic and fast-changing environments, a simplistic model such as the SIR can only take us so far, and uncertainty in estimating R0 means it can be pretty much useless as well.

In this context, I thought I’ll simulate a few real-life situations, and try to model the spread of the disease in these situations. This can give us an insight into what kind of services are more dangerous than others, and how we could potentially “get back to life” after going through an initial period of lockdown.

The basic assumption I’ve made is that the longer you spend with an infected person, the greater the chance of getting infected yourself. This is not an unreasonable assumption because the spread happens through activities such as sneezing, touching, inadvertently dropping droplets of your saliva on to the other person, and so on, each of which is more likely the longer the time you spend with someone.

Some basic modelling revealed that this can be modelled as a sort of negative exponential curve that looks like this.

p = 1 - e^{-\lambda T}

T is the number of hours you spend with the other person. \lambda is a parameter of transmission – the higher it is, the more likely the disease will transmit (holding the amount of time spent together constant).

The function looks like this: 

We have no clue what \lambda is, but I’ll make an educated guess based on some limited data I’ve seen. I’ll take a conservative estimate and say that if an uninfected person spends 24 hours with an infected person, the former has a 50% chance of getting the disease from the latter.

This gives the value of \lambda to be 0.02888 per hour. We will now use this to model various scenarios.
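A minimal sketch of that calibration (solving 1 - e^{-24\lambda} = 0.5 for \lambda), along with the transmission probability function used in the scenarios below:

import numpy as np

# calibrate lambda so that 24 hours with an infected person => 50% chance of infection
LAMBDA = np.log(2) / 24            # ~0.02888 per hour

def p_transmit(hours, n_infected=1):
    # probability of getting infected after `hours` spent with `n_infected`
    # infected people (exposure assumed to be additive)
    return 1 - np.exp(-LAMBDA * n_infected * hours)

print(LAMBDA)                      # 0.028881...
print(p_transmit(2 / 60))          # a two-minute interaction: ~0.1%
print(p_transmit(2 / 60, 10))      # two minutes with ten infected people: ~1%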

  1. Delivery

This is the simplest model I built. There is one shop, and N customers.  Customers come one at a time and spend a fixed amount of time (1 or 2 or 5 minutes) at the shop, which has one shopkeeper. Initially, a proportion p of the population is infected, and we assume that the shopkeeper is uninfected.

And then we model the transmission – based on our \lambda = 0.02888, for a two minute interaction, the probability of transmission is 1 - e^{-\lambda T} = 1 - e^{-0.02888 \times 2/60} \approx 0.1%.

In hindsight, I realised that this kind of a set up better describes “delivery” than a shop. With a 0.1% probability the delivery person gets infected from an infected customer during a delivery. With the same probability an infected delivery person infects a customer. The only way the disease can spread through this “shop” is for the (initially uninfected) shopkeeper / delivery person to first catch it from a customer and then pass it on.

How does it play out? I simulated 10000 paths where one guy delivers to 1000 homes (maybe over the course of a week? that doesn’t matter as long as the overall infected rate in the population otherwise is constant), and spends exactly two minutes at each delivery, which is made to a single person. Let’s take a few cases, with different base cases of incidence of the disease – 0.1%, 0.2%, 0.5%, 1%, 2%, 5%, 10%, 20% and 50%.
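My actual (unedited) code is linked at the end of this post; a self-contained sketch of this particular simulation, under the same assumptions, looks something like this:

import numpy as np

LAMBDA = np.log(2) / 24

def simulate_delivery(n_homes=1000, base_rate=0.05, minutes=2, seed=None):
    # One delivery person makes `n_homes` deliveries of `minutes` each;
    # a fraction `base_rate` of customers is already infected.
    # Returns the number of NEW infections caused through the deliveries.
    rng = np.random.default_rng(seed)
    p = 1 - np.exp(-LAMBDA * minutes / 60)     # per-visit transmission probability
    customer_infected = rng.random(n_homes) < base_rate
    delivery_person_infected = False           # delivery person starts uninfected
    new_infections = 0
    for infected in customer_infected:
        if infected and not delivery_person_infected:
            # an infected customer may infect the delivery person
            delivery_person_infected = rng.random() < p
        elif delivery_person_infected and not infected:
            # an infected delivery person may infect an uninfected customer
            if rng.random() < p:
                new_infections += 1
    return new_infections + int(delivery_person_infected)

# 10000 Monte Carlo paths at a 5% base rate
paths = [simulate_delivery(base_rate=0.05, seed=i) for i in range(10000)]
print(sum(paths) / len(paths))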

The number of NEW people infected in each case is graphed here (we don’t care how many got the disease otherwise. We’re modelling how many got it from our “shop”). The  right side graph excludes the case of zero new infections, just to show you the scale of the problem.

Notice this – even when 50% of the population is infected, as long as the shopkeeper or delivery person is not initially infected, the chances of additional infections through 2-minute delivery are MINUSCULE. A strong case for policy-makers to enable delivery of all kinds, essential or inessential.

2. Shop

Now, let’s complicate matters a little bit. Instead of a delivery person going to each home, let’s assume a shop. Multiple people can be in the shop at the same time, and there can be more than one shopkeeper.

Let’s use the assumptions of standard queueing theory, and assume that the inter-arrival time for customers is guided by an Exponential distribution, and the time they spend in the shop is also guided by an Exponential distribution.

At the time when customers are in the shop, any infected customer (or shopkeeper) inside can infect any other customer or shopkeeper. So if you spend 2 minutes in a shop where there is 1 infected person, our calculation above tells us that you have a 0.1% chance of being infected yourself. If there are 10 infected people in the shop and you spend 2 minutes there, this is akin to spending 20 minutes with one infected person, and you have a 1% chance of getting infected.

Let’s consider two or three scenarios here. First is the “normal” case where one customer arrives every 5 minutes, and each customer spends 10 minutes in the shop (note that the shop can “serve” multiple customers simultaneously, so the queue doesn’t blow up here). Again let’s take a total of 1000 customers (assume a 24/7 open shop), and one shopkeeper.
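Here is a hedged sketch of how this scenario can be set up (again, not my actual code): walk through the shop minute by minute, and treat being with several infected people as additive exposure, as described above.

import numpy as np

LAMBDA = np.log(2) / 24

def p_transmit(minutes, n_infected):
    # probability of infection after `minutes` in the company of `n_infected`
    # infected people, with exposure assumed to be additive
    return 1 - np.exp(-LAMBDA * n_infected * minutes / 60)

def simulate_shop(n_customers=1000, mean_gap=5, mean_stay=10,
                  base_rate=0.05, n_keepers=1, seed=None):
    rng = np.random.default_rng(seed)
    arrival = np.cumsum(rng.exponential(mean_gap, n_customers))    # minutes
    depart = arrival + rng.exponential(mean_stay, n_customers)
    infected = rng.random(n_customers) < base_rate                 # infected on arrival
    keeper_infected = np.zeros(n_keepers, dtype=bool)              # shopkeepers start uninfected
    new_infections = 0
    for t in np.arange(0, depart.max(), 1.0):                      # one-minute time steps
        inside = (arrival <= t) & (depart > t)
        n_inf = infected[inside].sum() + keeper_infected.sum()
        if n_inf == 0:
            continue
        # every susceptible person inside may catch it this minute
        for i in np.where(inside & ~infected)[0]:
            if rng.random() < p_transmit(1, n_inf):
                infected[i] = True
                new_infections += 1
        for k in np.where(~keeper_infected)[0]:
            if rng.random() < p_transmit(1, n_inf):
                keeper_infected[k] = True
                new_infections += 1
    return new_infections

The “social distancing” scenario discussed below is then just a matter of changing mean_gap (and the number of shopkeepers).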


Notice that there is significant transmission of infection here, even though we started with 5% of the population being infected. On average, another 3% of the population gets infected! Open supermarkets with the usual crowds can result in significant transmission.

Does keeping the shop open with some sort of social distancing (say, only one-fourth as many people arrive) work? So people arrive with an average gap of 20 minutes, and still spend 10 minutes in the shop. There are still 10 shopkeepers. What does it look like when we start with 5% of the people being infected?

The graph is pretty much identical so I’m not bothering to put that here!

3. Office

This scenario simulates for N people who are working together for a certain number of hours. We assume that exactly one person is infected at the beginning of the meeting. We also assume that once a person is infected, she can start infecting others in the very next minute (with our transmission probability).

How does the infection grow in this case? This is an easier simulation than the earlier one so we can run 10000 Monte Carlo paths. Let’s say we have a “meeting” with 40 people (could just be 40 people working in a small room) which lasts 4 hours. If we start with one infected person, this is how the number of infected grows over the 4 hours.
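A compact sketch of this simulation (not my actual code; the per-minute transmission probability follows from the same \lambda as before, and someone infected in one minute can infect others from the next minute onwards):

import numpy as np

LAMBDA = np.log(2) / 24
P_MINUTE = 1 - np.exp(-LAMBDA / 60)   # per-minute probability from one infected person

def simulate_meeting(n_people=40, hours=4, n_initial=1, seed=None):
    rng = np.random.default_rng(seed)
    infected = np.zeros(n_people, dtype=bool)
    infected[:n_initial] = True
    for _ in range(int(hours * 60)):                  # minute by minute
        n_inf = infected.sum()
        # each susceptible person is exposed to all currently infected people
        p = 1 - (1 - P_MINUTE) ** n_inf
        newly = (~infected) & (rng.random(n_people) < p)
        infected |= newly                             # they start infecting next minute
    return infected.sum() - n_initial

# 10000 Monte Carlo paths for a 10-person, 1-hour meeting
paths = [simulate_meeting(10, 1, seed=i) for i in range(10000)]
print(np.mean(paths))                                 # roughly 0.3 new infections on average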


The spread is massive! When you have a large bunch of people in a small closed space over a significant period of time, the infection spreads rapidly among them. Even if you take a 10 person meeting over an hour, one infected person at the start can result in an average of 0.3 other people being infected by the end of the meeting.

10 persons meeting over 8 hours (a small office) with one initially infected means 3.5 others (on average) being infected by the end of the day.

Offices are dangerous places for the infection to spread. Even after the lockdown is lifted, some sort of work from home regulations need to be in place until the infection has been fully brought under control.

4. Conferences

This is another form of “meeting”, except that at each point in time, people don’t engage with the whole room, but only a handful of others. These groups form at random, changing every minute, and infection can spread only within a particular group.

Let’s take a 100 person conference with 1 initially infected person. Let’s assume it lasts 8 hours. Depending upon how many people come together at a time, the spread of the infection rapidly changes, as can be seen in the graph below.

If people talk two at a time, there’s a 63% probability that the infection doesn’t spread at all. If they talk 5 at a time, this probability is cut in half. And if people congregate 10 at a time, there’s only an 11% chance that by the end of the day the infection HASN’T propagated!

One takeaway from this is that even once offices start functioning, they need to impose social distancing measures (until the virus has been completely wiped out). All large-ish meetings by video conference. A certain proportion of workers working from home by rotation.

And I wonder what will happen to the conferences.

I’ve put my (unedited) code here. Feel free to use and play around.

Finally, you might wonder why I’ve made so many Monte Carlo simulations. Well, as the great Matt Levine himself has said, that’s my secret sauce!


Distribution of political values

Through Baal on Twitter I found this “Political Compass” survey. I took it, and it said this is my “political compass”.

Now, I’m not happy with the result. I mean, I’m okay with the average value where the red dot has been put for me, and I think that represents my political leanings rather well. However, what I’m unhappy about is that my political views have all been reduced to one single average point.

I’m pretty sure that based on all the answers I gave in the survey, my political leaning along each of the two axes follows a distribution, and the red dot here is only the average (mean, I guess, but could also be median) value of that distribution.

However, there are many ways in which people can have a political view that lands right on my dot – some people might have a consistent but mild political view in favour of or against a particular position. Others might have pretty extreme views – for example, some of my answers might lead you to believe that I’m an extreme right winger, and others might make me look like a Marxist (I believe I have a pretty high variance on both axes around my average value).

So what I would have liked instead from the political compass was a sort of heat map, or at least two marginal distributions, showing how I’m distributed along the two axes, rather than all my views being reduced to one average value.

A version of this is the main argument of a book I read recently called “The End Of Average”: when we design for “the average man” or “the average customer”, and do so across several dimensions, we end up designing for nobody, since nobody is average when looked at on many dimensions.

Statistical analysis revisited – machine learning edition

Over ten years ago, I wrote this blog post that I had termed as a “lazy post” – it was an email that I’d written to a mailing list, which I’d then copied onto the blog. It was triggered by someone on the group making an off-hand comment of “doing regression analysis”, and I had set off on a rant about why the misuse of statistics was a massive problem.

Ten years on, I find the post to be quite relevant, except that instead of “statistics”, you just need to say “machine learning” or “data science”. So this is a truly lazy post, where I piggyback on my old post, to talk about the problems with indiscriminate use of data and models.

I had written:

there is this popular view that if there is data, then one ought to do statistical analysis, and draw conclusions from that, and make decisions based on these conclusions. unfortunately, in a large number of cases, the analysis ends up being done by someone who is not very proficient with statistics and who is basically applying formulae rather than using a concept. as long as you are using statistics as concepts, and not as formulae, I think you are fine. but you get into the “ok i see a time series here. let me put regression. never mind the significance levels or stationarity or any other such blah blah but i’ll take decisions based on my regression” then you are likely to get into trouble.

The modern version of this is – everybody wants to do “big data” and “data science”. So if there is some data out there, people will want to draw insights from it. And since it is easy to apply machine learning models (thanks to open source toolkits such as the scikit-learn package in Python), people who don’t understand the models indiscriminately apply them to whatever data they have got. So you have people who don’t really understand data or machine learning working with both, and creating models that are dangerous.

As long as people have an idea of the models they are using, the assumptions behind them, and the quality of data that goes into the models, we are fine. However, we are increasingly seeing cases of people using improper or biased data, and applying models they don’t understand on top of it, with impacts that affect the wider world.

So the problem is not with “artificial intelligence” or “machine learning” or “big data” or “data science” or “statistics”. It is with the people who use them incorrectly.


Big Data and Fast Frugal Trees

In his excellent podcast episode with EconTalk’s Russ Roberts, psychologist Gerd Gigerenzer introduces the concept of “fast and frugal trees”. When someone needs to make decisions quickly, Gigerenzer says, they don’t take into account a large number of factors, but instead rely on a small set of thumb rules.

The podcast itself is based on Gigerenzer’s 2009 book Gut Feelings. Based on how awesome the podcast was, I read the book, but found that it didn’t offer too much more than what the podcast itself had to offer.

Coming back to fast and frugal trees..

In recent times, ever since “big data” became a “thing” in the early 2010s, it is popular for companies to tout the complexity of their decision algorithms, and machine learning systems. An easy way for companies to display this complexity is to talk about the number of variables they take into account while making a decision.

For example, you can have “fin-tech” lenders who claim to use “thousands of data points” on their prospective customers’ histories to determine whether to give out a loan. A similar number of data points is used to evaluate resumes and determine if a candidate should be called for an interview.

With cheap data storage and compute power, it has become rather fashionable to “use all the data available” and build complex machine learning models (which aren’t that complex to build) for decisions that were earlier made by humans. The problem with this is that it can sometimes result in over-fitting (the system learning something that it shouldn’t be learning), which can lead to disastrously poor predictive power.

In his podcast, Gigerenzer talks about fast and frugal trees, and says that humans in general don’t use too many data points to make their decisions. Instead, for each decision, they build a quick “fast and frugal tree” and make their decision based on their gut feelings about a small number of data points. What data points to use is determined primarily based on their experience (not cow-like experience), and can vary by person and situation.

The advantage of fast and frugal trees is that the model is simple, and so has little scope for overfitting. Moreover, as the name suggests, the decision process is rather “fast”, and you don’t have to collect all possible data points before you make a decision. The problem with productionising fast and frugal trees, however, is that each user’s decision-making process is different, so the challenge is in learning each person’s process in order to make optimal decisions at a personalised level.

How you can learn someone’s decision-making process (when you’ve assumed it’s a fast and frugal tree) is not trivial, but if you can figure it out, then you can build significantly superior recommender systems.

If you’re Netflix, for example, you might figure that someone makes their movie choices based only on age of movie and its IMDB score. So their screen is customised to show just these two parameters. Someone else might be making their decisions based on who the lead actors are, and they need to be shown that information along with the recommendations.
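To make this concrete, here is a toy version of such a fast and frugal tree for the hypothetical viewer described above (the cues, their order and the thresholds are entirely made up):

from dataclasses import dataclass

@dataclass
class Movie:
    title: str
    year: int
    imdb_rating: float

def will_watch(movie: Movie, current_year: int = 2019) -> bool:
    # Cue 1: is the movie too old? If yes, exit immediately with "no"
    if current_year - movie.year > 20:
        return False
    # Cue 2: is the IMDB score good enough? If yes, exit with "yes"
    if movie.imdb_rating >= 7.5:
        return True
    # Default exit: don't watch
    return False

print(will_watch(Movie("Knives Out", 2019, 7.9)))   # True
print(will_watch(Movie("Sholay", 1975, 8.1)))       # False – fails the age cue

Learning such a tree for each user means learning which cues they use, in what order, and with what exit thresholds – a rather different problem from fitting one large model on everybody’s data.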

Another book I read recently was Todd Rose’s The End of Average. The book makes the powerful point that nobody really is average, especially when you’re looking a large number of dimensions, so designing for average means you’re designing for nobody.

I imagine one reason why a lot of recommender systems (Netflix or Amazon or Tinder) fail is that they model for the average, building one massive machine learning system, rather than learning each person’s fast and frugal tree.

The latter isn’t easy, but if it can be done, it can result in a significantly superior user experience!

Liverpool FC: Mid Season Review

After 20 games played, Liverpool are sitting pretty on top of the Premier League with 58 points (out of a possible 60). The only jitter in the campaign so far came in a draw away at Manchester United.

I made what I think is a cool graph to put this performance in perspective. I looked at Liverpool’s points tally at the end of the first 19 match days in each Premier League season, and looked at the “progress” across those seasons (the data for last night’s win against Sheffield isn’t yet in my dataset, which also doesn’t include data for the 1992-93 season, so those are left out).

Given the strength of this season’s performance, I don’t think there’s that much information in the graph, but here it goes in any case:

I’ve coloured all the seasons where Liverpool were the title contenders. A few things stand out:

  1. This season, while great, isn’t that much better than the last one. Last season, Liverpool had three draws in the first half of the league (Man City at home, Chelsea away and Arsenal away). It was the first month of the second half where the campaign faltered (starting with the loss to Man City).
  2. This possibly went under the radar, but Liverpool had a fantastic start to the 2016-17 season as well, with 43 points at the halfway stage. To put that in perspective, this was one more than the points total at that stage in the title-chasing 2008-9 season.
  3. Liverpool went close in 2013-14, but in terms of points, the halfway performance wasn’t anything to write home about. That was also back in the time when teams didn’t dominate like nowadays, and eighty odd points was enough to win the league.

This is what Liverpool’s full season looked like (note that I’ve used a different kind of graph here. Not sure which one is better).


Finally, what’s the relationship between points at the end of the first half of the season (19 games) and the full season? Let’s run a regression across all teams, across all 38 game EPL seasons.

The regression doesn’t turn out to be all that strong, with an R-squared of 41%. In other words, a team’s points tally at the halfway point in the season explains less than half of the variation in the points tally that the team will get in the second half of the season.

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  9.42967    0.97671   9.655   <2e-16 ***
Midway       0.64126    0.03549  18.070   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 6.992 on 478 degrees of freedom
  (20 observations deleted due to missingness)
Multiple R-squared:  0.4059,    Adjusted R-squared:  0.4046 
F-statistic: 326.5 on 1 and 478 DF,  p-value: < 2.2e-16

The interesting thing is that the coefficient of the midway score is less than 1, which implies that teams’ performances at the end of the season (literally) regress to the mean.

55 points at the end of the first 19 games is projected to translate to 100 at the end of the season. In fact, based on this regression model run on the first 19 games of the season, Liverpool should win the title by a canter.
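The arithmetic behind that projection, using the coefficients from the regression output above (second-half points regressed on the midway tally):

midway = 55
projected_second_half = 9.42967 + 0.64126 * midway     # ~44.7 points
print(midway + projected_second_half)                   # ~99.7, i.e. about 100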

PS: Look at the bottom of this projections table. It seems like for the first time in a very long time, the “magical” 40 points might be necessary to stave off relegation. Then again, it’s regression (pun intended).

This year on Spotify

I’m rather disappointed with my end-of-year Spotify report this year. I mean, I know it’s automated analytics, and no human has really verified it, etc., but there are some basics that the algorithm failed to cover.

The first few slides of my “annual report” told me that my listening changed by seasons. That in January to March, my favourite artists were Black Sabbath and Pink Floyd, and from April to June they were Becky Hill and Meduza. And that from July onwards it was Sigala.

Now, there was a life-changing event that happened in late March which Spotify knows about, but failed to acknowledge in the report – I moved from the UK to India. And in India, Spotify’s inventory is far smaller than it is in the UK. So some of the bands I used to listen to heavily in the UK, like Black Sabbath, went off my playlist in India. My daughter’s lullaby playlist, which is the most consumed music for me, moved from Spotify to Amazon Music (and more recently to Apple Music).

The other thing with my Spotify use-case is that it’s not just me who listens to it. I share the account with my wife and daughter, and while I know that Spotify has an algorithm for filtering out kid stuff, I’m surprised it didn’t figure out that two people are sharing this account (and pitched us a family subscription).

According to the report, these are the most listened to genres in 2019:

Now there are two clear classes of genres here. I’m surprised that Spotify failed to pick it out. Moreover, the devices associated with my account that play Rock or Power Metal are disjoint from the devices that play Pop, EDM or House. It’s almost like Spotify didn’t want to admit that people share accounts.

Then there were some three slides on my podcast listening for the year, when I’ve listened to a grand total of five hours of podcasts on Spotify. If I, a human, were building this report, I would have dropped this section citing insufficient data, rather than wasting three slides on analytics that simply don’t make sense.

I see the importance of this segment in Spotify’s report, since they want to focus more on podcasts (being an “audio company” rather than a “music company”), but maybe something in the report to encourage me to use Spotify for more podcasts (perhaps recommending Spotify’s exclusive podcasts that I might like, even if based on limited data) might have helped.

Finally, take a look at my (rather, our) most played songs in 2019.

It looks like my daughter’s sleeping playlist threaded with my wife’s favourite songs (after a point the latter dominate). “My songs” are nowhere to be found – I have to go all the way down to number 23 to find Judas Priest’s cover of Diamonds and Rust. I mean I know I’ve been diversifying the kind of music that I listen to, while my wife listens to pretty much the same stuff over and over again!

In any case, automated analytics is all fine, but there are some not-so-edge cases where the reports it generates are obviously bad. Hopefully the people at Spotify will figure this out and use more intelligence in producing next year’s report!

Spurs right to sack Pochettino?

A few months back, I built my “football club elo by manager” visualisation. Essentially, we take the week-by-week Premier League Elo ratings from ClubElo and overlay it with managerial tenures.

A clear pattern emerges – a lot of Premier League sackings have been consistent with clubs going down significantly in terms of Elo Ratings. For example, we have seen that Liverpool sacked Rafa Benitez, Kenny Dalglish (in 2012) and Brendan Rodgers all at the right time, and that similarly Manchester United sacked Jose Mourinho when he brought them back to below where he started.

And now the news comes in that Spurs have joined the party, sacking long-time coach Mauricio Pochettino. What I find interesting is the timing of the sacking – while international breaks are usually a popular time to change managers (the two week gap in fixtures gives a club some time to adjust), most sackings happen in the first week of the international break.

The Pochettino sacking is surprising in that it has come towards the end of the international break, giving the club four days before their next fixture (a derby at the struggling West Ham). However, the Guardian reports that Spurs are close to hiring Jose Mourinho, and that might explain the timing of the sacking.

So were Spurs right in sacking Pochettino, barely six months after he took them to a Champions League final? Let’s look at the Spurs story under Pochettino using Elo ratings. 


Pochettino took over in 2014 after an underwhelming 2013-14 season in which the club struggled under Andre Villas Boas and then Tim Sherwood. Initially, results weren’t too promising, as he took them from an 1800 rating down to 1700.

However, chairman Daniel Levy’s patience paid off, and the club mounted a serious challenge to Leicester in the 2015-16 season before falling away towards the end of the season, finishing third behind Arsenal. As the Elo shows, the improvement continued, as the club remained in Champions League places through the course of Pochettino’s reign.

Personally, the “highlight” of Pochettino’s reign was Spurs’ 4-1 demolition of Liverpool at Wembley in October 2017, a game I happened to watch at the stadium. And as per the Elo ratings the club plateaued shortly after that.

If that plateau had continued,  I suppose Pochettino would have remained in his job, giving the team regular Champions League football. This season, however, has been a disaster.

Spurs are 13 points below what they had managed in comparable fixtures last season, and are unlikely even to finish in the top six. Their Elo has also dropped below 1850 for the first time since 2016-17. While that is still higher than where Pochettino started off, the precipitous drop in recent times means that the club has possibly taken the right call in sacking Pochettino.

If Mourinho does replace him (it looks likely, as per the Guardian), it will present a personal problem for me – for over a decade now, Tottenham have been my “second team” in the top half of the Premier League, behind Liverpool. That cannot continue if Mourinho takes over. I’m wondering who to shift my allegiance to – it will have to be either Leicester or (horror of horrors) Chelsea!

Alchemy

Over the last 4-5 days I kinda immersed myself in finishing Rory Sutherland’s excellent book Alchemy.

It all started with a podcast, with Sutherland being the guest on Russ Roberts’ EconTalk last week. I’d barely listened to half the podcast when I knew that I wanted more of Sutherland, and so immediately bought the book on Kindle. The same evening, I finished my previous book and started reading this.

Sometimes I get a bit concerned that I’m agreeing with an author too much. What made this book “interesting” is that Sutherland is an ad-man and a marketer, and keeps talking down on data and economics, and plays up intuition and “feeling”. In other words, at least as far as professional career and leanings go, he is possibly as far from me as it gets. Yet, I found myself silently nodding in agreement as I went through the book.

If I have to summarise the book in one line I would say, “most decisions are made intuitively or based on feeling. Data and logic are mainly used to rationalise decisions rather than to make them”.

And if you think about it, it’s mostly true. For example, you don’t use physics to calculate how much to press down on your car accelerator while driving – you do it essentially by trial and error and using your intuition to gauge the feedback. Similarly, a ball player doesn’t need to know any kinematics or projectile motion to know how to throw or hit or catch a ball.

The other thing that Sutherland repeatedly alludes to is that we tend to try and optimise things that are easy to measure or optimise. Financials are a good example of that. This decade, with the “big data revolution” being followed by the rise of “data science”, the amount of data available to make decisions has been endless, meaning that more and more decisions are being made using data.

The trouble, of course, is availability bias, or what I call the “keys-under-lamppost bias”. We tend to optimise and make decisions on things that are easily measurable (this set, of course, is now much larger than it was a decade ago), and now that we know we are making use of more objective stuff, we have irrational confidence in our decisions.

Sutherland talks about barbell strategies, ergodicity, why big data leads to bullshit, why it is important to look for solutions beyond the scope of the immediate domain and the Dunning-Kruger effect. He makes statements such as “I would rather run a business with no mathematicians than with second-rate mathematicians”, which exactly mirrors my opinion of the “data science industry”.

There is absolutely no doubt as to why I liked the book.

Thinking again, while I said that professionally Sutherland seems as far from me as possible, it’s possibly not so true. While I do use a fair bit of data and economic analysis as part of my consulting work, I find that I make most of my decisions finally on intuition. Data is there to guide me, but the decision-making is always an intuitive process.

In late 2017, when I briefly worked in an ill-fated job in “data science”, I’d made a document about the benefits of combining data analysis with human insight. And if I think about my work, my least favourite assignments have been the ones where I’ve worked with data to help clients make “logical decisions” (as Sutherland puts it).

The work I’ve enjoyed the most has been where I’ve used the data and presented it in ways in which my clients and I have noticed patterns, rationalised them and then taken a (intuitive) leap of faith into what the right course of action may be.

And this also means that over time I’ve been moving away from work that involves building models (the output is too “precise” to interest me), and towards more “strategic” stuff where there is a fair amount of intuition riding on top of the data.

Back to the book, I’m so impressed with it that in case I was still living in London, I would have pestered Sutherland to meet me, and then tried to convince him to let me work for him. Even if at the top level it seems like his work and mine are diametrically opposite..

I leave you with my highlights and notes from the book, and this tweet.

Here’s my book, in case you are interested.


EPL: Mid-Season Review

Going into the November international break, Liverpool are eight points ahead at the top of the Premier League. Defending champions Manchester City have slipped to fourth place following their loss to Liverpool. The question most commentators are asking is if Liverpool can hold on to this lead.

We are two-thirds of the way through the first round robin of the Premier League. The thing with evaluating league standings midway through the round robin is that it doesn’t account for the fixture list. For example, Liverpool have finished playing the rest of the “big six” (or seven, if you include Leicester), but Manchester City have many games to go against the top teams.

So my practice over the years has been to compare team performance to corresponding fixtures in the previous season, and to look at the points difference. Then, assuming the rest of the season goes just like last year, we can project who is likely to end up where.

Now, relegation and promotion introduce a source of complication, but we can “solve” that by replacing last season’s relegated teams with this season’s promoted teams (18th place by the Championship winners, 19th by the Championship runners-up, and 20th by the Championship playoff winners).
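As a sketch of how this projection works (assuming fixture data with columns home, away, home_points and away_points, and with last season’s relegated teams already relabelled as this season’s promoted teams per the mapping above – the column names are my assumption, not the actual dataset):

import pandas as pd

def project_table(this_season: pd.DataFrame, last_season: pd.DataFrame) -> pd.Series:
    # Points won so far this season, plus last season's points from the
    # corresponding fixtures that are yet to be played this season.
    def points_by_team(fixtures: pd.DataFrame) -> pd.Series:
        home = fixtures.groupby("home")["home_points"].sum()
        away = fixtures.groupby("away")["away_points"].sum()
        return home.add(away, fill_value=0)

    played = set(zip(this_season["home"], this_season["away"]))
    remaining = last_season[~last_season.apply(
        lambda r: (r["home"], r["away"]) in played, axis=1)]

    projected = points_by_team(this_season).add(points_by_team(remaining), fill_value=0)
    return projected.sort_values(ascending=False)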

It’s not the first time I’m doing this analysis. I’d done it once in 2013-14, and once in 2014-15. You will notice that the graphs look similar as well – that’s how lazy I am.

Anyways, this is the points differential thus far compared to corresponding fixtures of last season. 


Leicester are the most improved team from last season, having picked up 8 points more than in the corresponding fixtures last season. Sheffield United, albeit starting from a low base, have also done extremely well. And last season’s runners-up Liverpool are on a plus 6.

The team that has done worst relative to last season is Tottenham Hotspur, at minus 13. Key players entering the final years of their contracts without signing extensions, and scanty recruitment over the last 2-3 years, haven’t helped. And then there is Manchester City at minus 9!

So assuming the rest of the season’s fixtures go according to last season’s corresponding fixtures, what will the final table look like at the end of the season?

We see that if Liverpool replicate their results from last season for the rest of the fixtures, they should win the league comfortably.

What is more interesting is the gaps between 1-2, 2-3 and 3-4. Each of the top three positions is likely to be decided “comfortably”, with a fairly congested mid-table.

As mentioned earlier, this kind of analysis is unfair to the promoted teams. It is highly unlikely that Sheffield will get relegated based on the start they’ve had.

We’ll repeat this analysis after a couple of months to see where the league stands!