Tests per positive case

I seem to be becoming a sort of “testing expert”, though the so-called “testing mafia” (OK, I only called them that) may disagree. Nothing external has happened since the last time I wrote about this topic, but here is more “expertise” from my end.

As some of you might be aware, I’ve now created a script that does the daily updates that I’ve been doing on Twitter for the last few weeks. After I went off Twitter last week, I tried for a couple of days to get friends to tweet my graphs. That wasn’t efficient. And I’m not yet over the Twitter addiction enough to log in every day just to post my daily updates.

So I’ve done what anyone who has a degree in computer science, and a reasonable degree of self-respect, should do – I now have a script (running on my server) that generates the graph and some mildly “intelligent” commentary and puts it out at 8 am every day. Today’s update looked like this:
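For the technically curious, here is a minimal sketch of how such a script might be wired up. This is illustrative rather than my actual script: it assumes the case data sits in a local CSV, that matplotlib and tweepy are used for the graph and the tweet, and that Twitter credentials live in environment variables.

# Illustrative sketch of a daily auto-tweeting script (run from cron at 8 am).
# Assumptions: case data in a local CSV with 'date' and 'cases' columns, and
# Twitter API credentials in environment variables. All names are illustrative.
import os

import matplotlib
matplotlib.use("Agg")  # render without a display, since this runs on a server
import matplotlib.pyplot as plt
import pandas as pd
import tweepy


def make_graph(csv_path, png_path):
    df = pd.read_csv(csv_path, parse_dates=["date"])
    df.plot(x="date", y="cases", legend=False)
    plt.title("Daily update")
    plt.tight_layout()
    plt.savefig(png_path)
    return df


def make_commentary(df):
    latest, previous = df["cases"].iloc[-1], df["cases"].iloc[-2]
    trend = "up" if latest > previous else "down"
    return f"Cases are {trend}: {latest} today against {previous} yesterday."


def post_update(text, png_path):
    auth = tweepy.OAuthHandler(os.environ["API_KEY"], os.environ["API_SECRET"])
    auth.set_access_token(os.environ["ACCESS_TOKEN"], os.environ["ACCESS_SECRET"])
    api = tweepy.API(auth)
    media = api.media_upload(png_path)
    api.update_status(status=text, media_ids=[media.media_id])


if __name__ == "__main__":
    data = make_graph("cases.csv", "update.png")
    post_update(make_commentary(data), "update.png")

A cron entry of the form “0 8 * * * python /path/to/update.py” (the path being whatever you choose) then takes care of the 8 am schedule.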

Sometimes I make the mistake of going to Twitter and looking at the replies to these automated tweets (that can be done without logging in). Most replies seem to be from the testing mafia. “All this is fine but we’re not testing enough so can’t trust the data”, they say. And then someone goes off on “tests per million” as if that is some gold standard.

As I discussed in my last post on this topic, random testing is NOT a good thing here. There are several ethical issues with it. The error rates of the tests mean that there is a high chance of false positives, and also of false negatives. So random testing can both “unleash” infected people and unnecessarily clog hospital capacity with uninfected people.
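To see why, consider a quick Bayes’ theorem calculation. The sensitivity, specificity and prevalence figures below are assumptions purely for illustration, not the characteristics of any actual test:

# Illustrative Bayes' theorem calculation: why random testing at a low base
# rate produces mostly false positives. All numbers are assumed.
sensitivity = 0.95   # P(test positive | infected)
specificity = 0.98   # P(test negative | not infected)
prevalence = 0.001   # base rate among randomly chosen people

p_positive = sensitivity * prevalence + (1 - specificity) * (1 - prevalence)
p_infected_given_positive = sensitivity * prevalence / p_positive

print(f"P(positive test)            = {p_positive:.4f}")
print(f"P(infected | positive test) = {p_infected_given_positive:.2%}")
# With these numbers, fewer than 1 in 20 positives from random testing is a
# true positive -- the rest are uninfected people occupying hospital capacity.

With numbers like these, most of the “positives” that a random-testing programme throws up would be false, which is exactly the clogging problem described above.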

So if random testing is not a good metric on how adequately we are testing, what is? One idea comes from this Yahoo report on covid management in Vietnam.

According to data published by Vietnam’s health ministry on Wednesday, Vietnam has carried out 180,067 tests and detected just 268 cases, 83% of whom it says have recovered. There have been no reported deaths.

The figures are equivalent to nearly 672 tests for every one detected case, according to the Our World in Data website. The next highest, Taiwan, has conducted 132.1 tests for every case, the data showed.

Total tests per positive case. Now, that’s an interesting metric. The basic idea is that if most of the people we are testing show positive, then we simply aren’t testing enough. However, if we are testing a lot of people for every positive case, then it means that we are also testing a large number of marginal cases (there is one caveat I’ll come to).

Tests per positive case also takes the “base rate” into account. If a region has been affected massively, then the base rate itself will be high, and the region needs to test more. A less affected region needs less testing (remember, we should only be testing those with a high base rate). And it is likely that in a region with a higher base rate, more positive cases are found (this is a deadly disease, so anyone with more than a mild case is bound to get themselves tested).

The only caveat here is that the tests need to be “of high quality”, i.e. they should be done on people with high base rates of having the disease. Any measure that becomes a target is bound to be gamed, so if tests per positive case becomes a target, it is easy for a region to game it by testing random people (rather than those with high base rates). For now, let’s assume that nobody has made this a target yet, so there isn’t that much gaming.

So how is India faring? Based on data from covid19india.org, as of yesterday (23rd April) India had done about 520,000 tests, of which about 23,000 have come back positive. In other words, India has tested about 23 people for every positive case. Compared to Vietnam (or even Taiwan), that’s a really low number.
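The arithmetic, using the figures quoted above (Taiwan’s number is the ratio reported by Our World in Data, so it is listed separately):

# Tests per positive case, using the figures quoted in this post
countries = {"India": (520_000, 23_000), "Vietnam": (180_067, 268)}
for name, (total_tests, positives) in countries.items():
    print(f"{name}: {total_tests / positives:.0f} tests per positive case")
# India: ~23, Vietnam: ~672; Taiwan is reported at 132.1 tests per case.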

However, different states are testing to different extents by this metric. Again using data from covid19india.org, I created this chart that shows the cumulative “tests per positive case” in each state in India. I drew each state in a separate graph, with different scales, because they were simply not comparable.
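For anyone who wants to reproduce the chart, here is a rough sketch of how it can be put together. The file name and column names are assumptions for illustration, not covid19india.org’s actual format, so the real data will need reshaping into this form first:

# Sketch of the per-state "cumulative tests per positive case" chart.
# Assumes a CSV with columns date, state, tested (cumulative) and
# confirmed (cumulative); the column names are placeholders.
import matplotlib.pyplot as plt
import pandas as pd

df = pd.read_csv("state_testing.csv", parse_dates=["date"])
df["tests_per_positive"] = df["tested"] / df["confirmed"]

states = sorted(df["state"].unique())
fig, axes = plt.subplots(len(states), 1, figsize=(6, 2 * len(states)), sharex=True)
for ax, state in zip(axes, states):
    sub = df[df["state"] == state]
    ax.plot(sub["date"], sub["tests_per_positive"])
    ax.set_title(state)
    # deliberately no shared y-axis: each state gets its own scale, since the
    # numbers are simply not comparable across states
plt.tight_layout()
plt.savefig("tests_per_positive_by_state.png")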

Notice that Maharashtra, our worst affected state, is only testing 14 people for every positive case, and this number is going down over time. Testing capacity in that state (which has, in absolute terms, done the most tests) is sorely stretched, and it is imperative that testing be scaled up massively there. It seems highly likely that testing there has been backlogged, with not enough capacity to test the high base rate cases. Gujarat and Delhi, other badly affected states, are in a similar boat, testing only 16 and 13 people respectively for every positive case.

At the other end, Orissa is doing well, testing 230 people for every positive case (and this number is rising). Karnataka is not bad either, with about 70 tests per case (again increasing; the state massively stepped up testing last Thursday). Andhra Pradesh is doing nearly 60. Haryana is doing 65.

Now I’m waiting for the usual suspects to reply to this (either on Twitter or as a comment on my blog) saying this doesn’t matter because we are “not doing enough tests per million”.

I wonder why some people are proud to show off their innumeracy (OK fine, I understand that it’s a bit harsh to describe someone who doesn’t understand Bayes’s Theorem as “innumerate”).


Airline delays in India

So DNA put out a news report proclaiming “Air India, IndiGo flyers worst hit by flight delays in January: DGCA”. The way the headline has been written, it appears as if Air India and Indigo are equally bad in terms of delayed flights. And an innumerate reader or journalist would actually believe that, since the article states that 96,000 people were inconvenienced by Air India’s delays and 75,000-odd by Indigo’s – both numbers are of the same order of magnitude.

However, by comparing raw numbers thus, this news report misses an important point: Indigo flies twice as many passengers as Air India. For the same period as the above data (January 2015), DGCA data (it’s all in this one big clunky PDF) shows that while about 11.65 lakh passengers flew Air India, about 22.76 lakh passengers flew Indigo – almost twice the number. So on a percentage basis, Indigo is less than half as bad as Air India.
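The arithmetic, using the figures quoted above (1 lakh = 100,000):

# Delayed passengers as a share of passengers flown, January 2015 (figures
# quoted above from the DGCA report)
airlines = {
    "Air India": (96_000, 11.65e5),
    "IndiGo": (75_000, 22.76e5),
}
for name, (delayed, flown) in airlines.items():
    print(f"{name}: {delayed / flown:.1%} of passengers delayed")
# Air India: ~8.2%, IndiGo: ~3.3% -- similar absolute numbers, but a very
# different picture once you divide by passengers flown.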

(Chart: delayed passengers as a proportion of passengers flown, by airline)

The graph above shows the number of passengers delayed as a proportion of the number of passengers flown, and it indicates that Indigo is a clear second-place offender (joined by tiny AirAsia). Yet to bracket it with Air India (by comparing raw numbers rather than proportions) indicates sheer innumeracy on the part of the journalist (unnamed in the article)!

I’m not surprised by the numbers, though. The thing with Indigo (and AirAsia) is that the business model depends upon quick turnaround of planes, and thus there is little slack between flights. In winter, morning flights (especially from North India) get delayed because of fog, and the lack of slack means the delays cascade, leading to massive delays. Hence there is good reason not to fly Indigo in winter (and for Indigo to build slack into its winter schedules). Interestingly, the passenger load factor (passengers carried as a proportion of capacity) for Indigo is 85%, which is lower than the 87% of Jet Airways (a so-called “full service carrier”). And newly launched full-service Vistara operated at only 45% in January!

We are in for interesting times in the Indian aviation industry.