More on Covid-19 prevalence in Karnataka

As the old song went, “when the giver gives, he tears the roof and gives”.

Last week the Government of Karnataka released its report on the covid-19 serosurvey done in the state. You might recall that it had concluded that the number of cases had been undercounted by a factor of 40, but then some things were suspect in terms of the sampling and the weighting.

This week comes another sero-survey, this time a preprint of a paper that has been submitted to a peer reviewed journal. This survey was conducted by the IDFC Institute, a think tank, and involves academics from the University of Chicago and Duke University, and relies on the extensive sampling network of CMIE.

At the broad level, this survey confirms the results of the other survey – it concludes that “Overall seroprevalence in the state implies that by August at least 31.5 million residents had been infected by August”. This is much higher than the overall conclusions of the state-sponsored survey, which had concluded that “about 19 million residents had been infected by mid-September”.

I like seeing two independent assessments of the same quantity. While each may have its own sources of error, and may not independently offer much information, comparing them can offer some really valuable insights. So what do we have here?

The IDFC-Duke-Chicago survey took place between June and August, and concluded that 31.5 million residents of Karnataka (out of a total population of about 70 million) had been infected by covid-19. The state survey, conducted in September, had suggested that 19 million residents had been infected by mid-September.

Clearly, since these surveys measure the number of people “who have ever been infected”, they cannot both be correct. If 31 million people had been infected by end August, then many more than 19 million should have been infected by mid-September. And vice versa. So, as Ravi Shastri would put it, “something’s got to give”. What gives?

Remember that I had thought the state survey numbers might have been an overestimate thanks to inappropriate sampling (“low risk” not being low risk enough, and not weighting samples)? If 19 million by mid-September was an overestimate, what do you say about 31 million by end August? Surely an overestimate? And that is not all.

If you go through the IDFC-Duke-Chicago paper, there are a few figures and tables that don’t make sense at all. For starters, check out this graph, that for different regions in the state, shows the “median date of sampling” and the estimates on the proportion of the population that had antibodies for covid-19.

Check out the red line on the right. The sampling for the urban areas for the Bangalore region was completed by 24th June. And the survey found that more than 50% of respondents in this region had covid-19 antibodies. On 24th June.

Let’s put that in context. As of 24th June, Bangalore Urban had 1700 confirmed cases. The city’s population is north of 10 million. I understand that 24th June was the “median date” of the survey in Bangalore city. Even if the survey ran for another two weeks after that, by 8th July Bangalore Urban had only 12500 confirmed cases.

The state survey had estimated that known cases were 1 in 40. 12500 confirmed cases suggests about 500,000 actual cases. That’s 5% of Bangalore’s population, not 50% as the IDFC-Duke-Chicago survey claimed. Something is really really off. Even if we use the IDFC-Duke-Chicago paper’s estimate that only 1 in 100 cases were reported / known, then 12500 known cases by 8th July translates to 1.25 million actual cases, or 12.5% of the city’s population (well below 50%).
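Here is the same back-of-the-envelope arithmetic as a small sketch. It takes the post's assumption of a Bangalore population of 10 million and multiplies known cases by an assumed undercount factor. The function name is just for illustration.

```python
def implied_prevalence(known_cases, undercount_factor, population):
    """Share of the population implied to be infected if
    true infections = known cases * undercount factor."""
    return known_cases * undercount_factor / population

BANGALORE_POPULATION = 10_000_000   # "north of 10 million", per the post
KNOWN_BY_8_JULY = 12_500            # confirmed cases in Bangalore Urban by 8th July

for factor in (40, 100):            # state survey's factor, then the IDFC-Duke-Chicago one
    share = implied_prevalence(KNOWN_BY_8_JULY, factor, BANGALORE_POPULATION)
    print(f"undercount factor {factor}: {share:.1%} infected")

# Undercount factor that would be needed to reach the 50% the survey reported
needed = 0.5 * BANGALORE_POPULATION / KNOWN_BY_8_JULY
print(f"factor needed for 50%: {needed:.0f}")
```

This prints 5.0% and 12.5%. It also shows that getting to 50% would need an undercount factor of 400 by 8th July.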

My biggest discomfort with the IDFC-Duke-Chicago effort is that it attempts to sample a rather rapidly changing variable over a long period of time. The survey went on from June 15th to August 29th. By June 15th, Karnataka had 7200 known cases (and 87 deaths). By August 29th the state had 327,000 known cases and 5500 deaths. I really don’t understand how the academics who ran the study could reconcile their data from the third week of June with the data from the third week of August, when the nature of the pandemic in the state was very very different.
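This rough sketch shows why pooling samples across such a window is hard to interpret. It assumes, purely for illustration, two things:

* Infections grew exponentially, in proportion to known cases, from 7200 on 15th June to 327,000 on 29th August.
* Samples were spread evenly over those 75 days.

Under these assumptions, the pooled figure corresponds to some date in the middle of the window. Which date it is depends entirely on the sampling schedule.

```python
import math

START_CASES, END_CASES = 7_200, 327_000   # known cases on 15 June and 29 August
DAYS = 75                                 # 15 June to 29 August

# Assumption: cases grew exponentially at a constant daily rate r.
r = math.log(END_CASES / START_CASES) / DAYS

def cases_on(day):
    """Known cases on a given day of the window under the exponential assumption."""
    return START_CASES * math.exp(r * day)

# Average of cases_on(t) for sampling days spread uniformly over [0, DAYS] (closed form).
pooled = (END_CASES - START_CASES) / (r * DAYS)

# Day on which the exponential curve equals that pooled average.
equivalent_day = math.log(pooled / START_CASES) / r

print(f"daily growth rate: {r:.3f}")
print(f"pooled average: {pooled:,.0f} known cases")
print(f"equivalent to roughly day {equivalent_day:.0f} of {DAYS}")
```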

And now, having looked at this paper, I’m more confident of the state survey’s estimations. Yes, it might have sampling issues, but compared to the IDFC-Duke-Chicago paper, the numbers make so much more sense. So yeah, maybe the factor of underestimation of Covid-19 cases in Karnataka is 40.

Putting all this together, I don’t understand one thing. What these surveys have shown is that

  1. More than half of Bangalore has already been infected by covid-19
  2. The true infection fatality rate is somewhere around 0.05% (or lower).

So why do we still have a (partial) lockdown?
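The infection fatality rate claim in point 2 can be checked roughly against the figures quoted in these posts:

* 7536 deaths against 1.93 crore infections by mid-September, from the state survey.
* 5500 deaths against 31.5 million infections by end August, from the IDFC-Duke-Chicago survey.

This is a crude sketch. It ignores the lag between infection and death, and any undercounting of deaths.

```python
def infection_fatality_rate(deaths, estimated_infections):
    """Deaths divided by estimated (not confirmed) infections."""
    return deaths / estimated_infections

estimates = {
    # label: (deaths, estimated infections), as quoted in the posts
    "State survey, mid-September": (7_536, 19_300_000),
    "IDFC-Duke-Chicago, end August": (5_500, 31_500_000),
}

for label, (deaths, infections) in estimates.items():
    print(f"{label}: IFR = {infection_fatality_rate(deaths, infections):.3%}")
```

Both estimates come in below 0.05%.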

PS: The other day on WhatsApp I saw this video of an extremely congested Chickpet area on the last weekend before Diwali. My initial reaction was “these people have lost their minds. Why are they all in such a crowded place?”. Now, after thinking about the surveys, my reaction is “most of these people have most definitely already got covid and recovered. So it’s not THAT crazy”.

Communicating binary forecasts

One silver lining in the madness of the US Presidential election counting is that there are some interesting analyses floating around regarding polling and surveying and probabilities and visualisation. Take this post from Andrew Gelman’s blog, for example:

Suppose our forecast in a certain state is that candidate X will win 0.52 of the two-party vote, with a forecast standard deviation of 0.02. Suppose also that the forecast has a normal distribution.[…]

Then your 68% predictive interval for the candidate’s vote share is [0.50, 0.54], and your 95% interval is [0.48, 0.56].

Now suppose the candidate gets exactly half of the vote. Or you could say 0.499, the point being that he lost the election in that state.

This outcome falls on the boundary of the 68% interval, it’s one standard deviation away from the forecast. In no sense would this be called a prediction error or a forecast failure.

But now let’s say it another way. The forecast gave the candidate an 84% chance of winning! And then he lost. That’s pretty damn humiliating. The forecast failed.

It took me a while to appreciate this. For a binary outcome, if your model predicts 52%, with a standard deviation of 2%, you are in effect predicting a “win” (50% or higher) with a probability of 84%! Somehow I had never thought about it that way.
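Here is a sketch of that calculation in Python, using the standard library's `statistics.NormalDist` (Python 3.8+). By symmetry, P(share > 0.50) under a normal forecast with mean 0.52 is the same number as Gelman's `pnorm(0.52, 0.50, sd)`. The function name is just for illustration.

```python
from statistics import NormalDist

def win_probability(mean_share, sd, threshold=0.5):
    """P(vote share > threshold) when the forecast is Normal(mean_share, sd)."""
    return 1 - NormalDist(mean_share, sd).cdf(threshold)

print(round(win_probability(0.52, 0.02), 2))  # 0.84
```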

In any case, this tells you how tricky forecasting a binary outcome is. You might think (based on your sample size) that a 2% standard deviation is reasonable. Except that when the mean of your forecast is close to the barrier (50% in this case), that “reasonable” standard deviation turns into a fairly confident call on who wins.

Gelman goes on:

That’s right. A forecast of 0.52 +/- 0.02 gives you an 84% chance of winning.

We want to increase the sd in the above expression so as to send the win probability down to 60%. How much do we need to increase it? Maybe send it from 0.02 to 0.03?

> pnorm(0.52, 0.50, 0.03)
[1] 0.75

Uh, no, that wasn’t enough! 0.04?

> pnorm(0.52, 0.50, 0.04)
[1] 0.69

0.05 won’t do it either. We actually have to go all the way up to . . . 0.08:

> pnorm(0.52, 0.50, 0.08)
[1] 0.60

That’s right. If your best guess is that candidate X will receive 0.52 of the vote, and you want your forecast to give him a 60% chance of winning the election, you’ll have to ramp up the sd to 0.08, so that your 95% forecast interval is a ridiculously wide 0.52 +/- 2*0.08, or [0.36, 0.68].
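Instead of trying values of the sd one at a time, you can invert the normal CDF directly. We want P(share > 0.5) = Φ((0.52 − 0.5) / sd) = 0.6. So sd = 0.02 / z, where z is the 60th percentile of the standard normal. This is a sketch assuming a normal forecast, with the mean above 50% and a target probability above 0.5. The function name is hypothetical.

```python
from statistics import NormalDist

def sd_for_win_probability(mean_share, target_prob, threshold=0.5):
    """Forecast sd at which P(share > threshold) equals target_prob,
    for a normal forecast with mean_share > threshold and target_prob > 0.5."""
    z = NormalDist().inv_cdf(target_prob)   # standard-normal quantile
    return (mean_share - threshold) / z

sd = sd_for_win_probability(0.52, 0.60)
print(f"sd needed: {sd:.3f}")                                    # sd needed: 0.079
print(f"95% interval: [{0.52 - 2 * sd:.2f}, {0.52 + 2 * sd:.2f}]")
```

The interval printed is the same [0.36, 0.68] that Gelman arrives at.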

Who said forecasting an election is easy?

 

How do bored investors invest?

Earlier this year, the inimitable Matt Levine (currently on paternity leave) came up with the “boredom markets hypothesis” ($, Bloomberg).

If you like eating at restaurants or bowling or going to movies or going out dancing, now you can’t. If you like watching sports, there are no sports. If you like casinos, they are closed. You’re pretty much stuck inside with your phone. You can trade stocks for free on your phone. That might be fun? It isn’t that fun, compared to either (1) what you’d normally do for fun or (2) trading stocks not in the middle of a recessionary crisis, but those are not the available competition. The available competition is “Animal Crossing” and “Tiger King.” Is trading stocks on your phone more fun than playing “Animal Crossing” or watching “Tiger King”?

The idea was that with the coming of the pandemic, there was a stock market crash and that “normal forms of entertainment” were shut, so people took to trading stocks for fun. Discount brokers such as Robinhood or Zerodha allowed investors to trade in a cheap and easy way.

Until August, a website called Robintrack used to track the number of Robinhood account holders who held each stock (or ETF or index). The service shut down in August after Robinhood cut off access to the data that Robintrack was using.

In any case, the Robintrack archives exist, and just for fun, I decided to download all the data the other day and “do some data mining”. More specifically, I thought I should explore the “boredom markets hypothesis” using Robintrack data: see which stocks investors were investing in, and how their prices moved before and after they bought them.

Now, I’m pretty certain that someone else has done this exact analysis. In fact, in the brief period when I did consider doing a PhD (2002-4), the one part I didn’t like at all was “literature survey”. And since this blog post is not an academic exercise, I’m not going to attempt doing a literature survey here. Anyways.

First up, I thought I would look at what the “most popular stocks” are. By most popular, I mean the stocks held by the most investors on Robinhood. I naively thought it might be something like Amazon or Facebook or Tesla. I even considered SPY (the S&P 500 ETF) or QQQ (the Nasdaq ETF). It was none of those.

The most popular stock on Robinhood turned out to be “ACB” (Aurora Cannabis). It was followed by Ford and GE. Apple came in fourth place, followed by American Airlines (!!) and Microsoft. Again, note that we only have data on the number of Robinhood accounts owning each stock, and don’t know how many shares each account really owned.

In any case, I thought I should also look at how this number changed over time for the top 20 such stocks, and also look at how the stocks did at the same time. This graph is the result. Both the red and blue lines are scaled. Red lines show how many investors held the stock. Blue lines show the closing stock price on each day.

The patterns are rather interesting. For Tesla, for example, you find a very strong correlation between the stock price and the number of Robinhood investors holding it. In other words, the hypothesis that the run up in the Tesla stock price this year was a “retail rally” makes sense. We can possibly say the same thing about some of the other tech stocks such as Apple, Microsoft or even Amazon.
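The “correlation” here can be made concrete. The post does not say how the lines were scaled. This sketch assumes min-max scaling to [0, 1] for plotting, then computes a plain Pearson correlation. The series are made-up toy numbers, not Robintrack data, and `pearson` and `min_max_scale` are hypothetical helper names.

```python
import math

def min_max_scale(xs):
    """Rescale a series to [0, 1], as one might do before plotting two series together."""
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Toy daily series (illustrative only): holders and closing price rising together
holders = [100, 120, 150, 170, 220, 260]
price = [10.0, 10.5, 12.0, 13.5, 16.0, 19.0]

print(f"correlation: {pearson(holders, price):.2f}")
print(f"after scaling: {pearson(min_max_scale(holders), min_max_scale(price)):.2f}")
```

Scaling the lines for display does not change the correlation, because Pearson correlation is unaffected by shifting or stretching either series.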

Not all stocks show this behaviour, though. With Aurora Cannabis, for example, we find that the lower the stock price went, the more investors bought in. And then the company announced quarterly results in May, and the stock rallied. And the Robinhood investors seem to have cashed out en masse! It seems bizarre. I’m sure if you look carefully at each graph in the above set of graphs, you can tell a nice interesting story.

Not satisfied with looking at which stocks most investors were invested in this year, I wanted to find the “true boredom” stocks. For this purpose, I looked at the average number of people who held each stock in January and February, and the maximum number of people who held it from March onwards. The ratio of the latter to the former told me “by how many times the interest in a stock rose”. To avoid obscure names, I only considered stocks held by at least 1000 people (on average) in Jan-Feb.
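The ratio described above could be computed along these lines. This is a sketch, not the code behind the post. It assumes the Robintrack archive has already been parsed into a dictionary of daily holder counts per ticker and month, which is a hypothetical layout. The tickers and numbers are toy values.

```python
from statistics import mean

def boredom_ratios(holders_by_month, min_baseline=1000):
    """holders_by_month: {ticker: {month_number: [daily holder counts]}}.
    Returns {ticker: max holders from March onwards / mean holders in Jan-Feb},
    sorted from highest ratio down, skipping tickers whose Jan-Feb mean is below min_baseline."""
    ratios = {}
    for ticker, months in holders_by_month.items():
        baseline_days = months.get(1, []) + months.get(2, [])
        later_days = [n for m, days in months.items() if m >= 3 for n in days]
        if not baseline_days or not later_days:
            continue
        baseline = mean(baseline_days)
        if baseline < min_baseline:
            continue  # avoid obscure names, as in the post
        ratios[ticker] = max(later_days) / baseline
    return dict(sorted(ratios.items(), key=lambda kv: kv[1], reverse=True))

# Toy input (made-up tickers and numbers, not Robintrack data)
toy = {
    "TOY1": {1: [1000, 1200], 2: [1100], 3: [5000, 9000], 4: [8000]},
    "TOY2": {1: [5000], 2: [5000], 3: [6000], 4: [7500]},
    "TOY3": {1: [50], 2: [60], 3: [6000]},   # excluded: Jan-Feb mean below 1000
}
print(boredom_ratios(toy))
```

Taking the maximum from March onwards, rather than the average, picks up short-lived spikes of interest. An example is the AstraZeneca jump in July mentioned below.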

Unsurprisingly, Hertz, which declared bankruptcy in the course of the pandemic, topped here. The number of people holding the stock increased by a factor of 150 during the lockdown.

And if you go through the list you will see companies that have been significantly adversely affected by the pandemic – cruise companies (Royal Caribbean and Carnival), airlines (United, American, Delta), resorts and entertainment (MGM Resorts, Dave & Buster’s). And then in July, you see a sudden jump in interest in AstraZeneca after the company announced successful (initial rounds of) trials of its Covid vaccine being developed with Oxford University.

And apart from a few companies where retail interest has largely coincided with increasing share price, we see that retail investors are sort of contrarians – picking up bets in companies with falling stock prices. There is a pretty consistent pattern there.

Maybe “boredom investing” is all about optionality? When you are buying a stock at a very low price, you are essentially buying a “real option” (recall that fundamentally, equity is a call option on the assets of a company, with the strike price at the amount of debt the company has).

So when the stock price goes really low, retail investors think that there isn’t much to lose (after all a stock price is floored at zero), and that there is money to be made in case the company rallies. It’s as if they are discounting the money they are actually putting in, and any returns they get out of this are a bonus.
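The “equity is a call option” point is the Merton model of corporate debt. This minimal sketch makes a few assumptions:

* The firm's assets follow Black-Scholes dynamics.
* The firm has a single zero-coupon debt maturity.
* The numbers are purely illustrative.

Even when assets are worth less than the debt, the equity still has positive value. That is because shareholders keep the upside, while their payoff at maturity can never go below zero.

```python
import math

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def equity_payoff(asset_value, debt):
    """Equity at debt maturity: whatever is left after repaying debt, floored at zero."""
    return max(asset_value - debt, 0.0)

def merton_equity_value(assets, debt, sigma, years, rate=0.0):
    """Black-Scholes value of a call on firm assets with strike = face value of debt."""
    d1 = (math.log(assets / debt) + (rate + 0.5 * sigma ** 2) * years) / (sigma * math.sqrt(years))
    d2 = d1 - sigma * math.sqrt(years)
    return assets * norm_cdf(d1) - debt * math.exp(-rate * years) * norm_cdf(d2)

# Illustrative distressed firm: assets of 80 against debt of 100, 50% asset volatility
value_now = merton_equity_value(assets=80, debt=100, sigma=0.5, years=1)
print(f"equity value today: {value_now:.2f}")                # about 9.5, even though assets < debt
print(equity_payoff(60.0, 100.0), equity_payoff(150.0, 100.0))  # 0.0 50.0
```

In this model, higher volatility raises the equity value. That fits the “lottery ticket” view of beaten-down stocks.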

I think that is a fair way to think about investing when you are using it as a cure for boredom. Do you?

Covid-19 Prevalence in Karnataka

Finally, many months after other Indian states had conducted a similar exercise, Karnataka released the results of its first “covid-19 sero survey” earlier this week. The headline number being put out is that about 27% of the state has already suffered from the infection, and has antibodies to show for it. From the press release:

Out of 7.07 crore estimated population in Karnataka, the study estimates that 1.93 crore (27.3%) of the people are either currently infected or already had the infection in the past, as of 16 September 2020.

To put that number in context, as of 16th September, there were a total of 485,000 confirmed cases in Karnataka (official statistics via covid19india.org), and 7536 people had died of the disease in the state.

It had long been estimated that official numbers of covid-19 cases are off by a factor of 10 or 20 – that the actual number of people who have got the disease is actually 10 to 20 times the official number. The serosurvey, assuming it has been done properly, suggests that the factor (as of September) is 40!

If the ratio has continued to hold (and the survey is accurate), nearly one in two people in Karnataka have already got the disease! (As of today, there are 839,000 known cases in Karnataka.)
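The arithmetic behind the factor of 40 and the “one in two” projection uses only the numbers above:

* The press release's 1.93 crore infections and 7.07 crore population.
* 485,000 confirmed cases on 16th September.
* 839,000 confirmed cases today.

The one assumption, as in the text, is that the undercount factor has stayed constant since September.

```python
POPULATION = 70_700_000      # 7.07 crore, per the press release
SERO_INFECTED = 19_300_000   # 1.93 crore estimated infected, as of 16 September
KNOWN_16_SEP = 485_000       # confirmed cases as of 16 September
KNOWN_NOW = 839_000          # confirmed cases at the time of writing

factor = SERO_INFECTED / KNOWN_16_SEP
projected = KNOWN_NOW * factor   # assumes the undercount factor has not changed

print(f"undercount factor: {factor:.1f}")
print(f"projected infections now: {projected / 1e6:.1f} million "
      f"({projected / POPULATION:.0%} of the population)")
```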

Of course, there are regional variations, though I should mention that the smaller the region you take, the less accurate the survey will be (smaller sample size and all that). In Bangalore Urban, for example, the survey estimates that 30% of the population had been infected by mid-September. If the ratio holds, we see that nearly 60% of the population in the city has already got the disease.

The official statistics (separate from the survey) also suggest that the disease has peaked in Karnataka. In fact, it seems to have peaked right around the time the survey was being conducted, in September. In September, it was common to see 7000-10000 new cases confirmed in Karnataka each day. That number has come down to about 3000 per day now.

Now, there are a few questions we need to answer. Firstly – is this factor of 40 (actual cases to known cases) feasible? Based on this data point, it makes sense:

In May, when Karnataka had a very small number of “native cases” and was aggressively testing everyone who had returned to the state from elsewhere, a staggering 93% of currently active cases were asymptomatic. In other words, only 1 in 14 people who was affected was showing any sign of symptoms.

Then, as I might have remarked on Twitter a few times, compulsory quarantining or hospitalisation (which was in force until July IIRC) has been a strong disincentive for people to seek medical help or get tested. This has meant that people get themselves tested only when the symptoms are really clear, or when they need attention. The downside of this, of course, has been that many people have got themselves tested too late to be helped. One statistic I remember is that about 33% of people who died of covid-19 in hospitals died within 24 hours of hospitalisation.

So if only one in 14 show any symptoms, and only those with relatively serious symptoms (or with close relatives who have serious symptoms) get themselves tested, this undercount by a factor of 40 can make sense.
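The reasoning can be made explicit with one division. Take two inputs: a symptomatic share of 1 in 14, and an overall undercount of 40. Then assume, as a simplification, that confirmed cases come only from symptomatic people. Together these imply that only about a third of symptomatic infections were being confirmed.

```python
symptomatic_share = 1 / 14   # from the May data: 93% of active cases asymptomatic
undercount_factor = 40       # true infections per confirmed case (serosurvey)

# Simplifying assumption: confirmed cases come only from symptomatic people.
detected_share_of_all = 1 / undercount_factor
detected_share_of_symptomatic = detected_share_of_all / symptomatic_share

print(f"{detected_share_of_symptomatic:.0%} of symptomatic infections confirmed")  # 35%
```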

Then – does the survey make sense? Are 15000 samples enough for a state of 70 million? For starters, the population of the state doesn’t matter. Rudimentary statistics (I always go to this presentation by Rajeeva Karandikar of CMI) tells us that as long as the sample has been chosen randomly, all that matters for the accuracy of the survey is the size of the sample. And for a binary question (infected / not), 15000 is good enough as long as the sample has been random.
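The standard margin-of-error formula shows both points: the sample size matters, and the population size barely does. This sketch assumes simple random sampling, with p = 0.273 (the headline prevalence) and n = 15000. With the finite population correction applied for either a state-sized or a much smaller population, the margin stays at roughly ±0.7 percentage points. A stratified or non-random design, which is the concern raised next, would need more than this.

```python
import math

def margin_of_error(p, n, population=None, z=1.96):
    """Approximate 95% margin of error for a sample proportion p from n random samples.
    If population is given, apply the finite population correction."""
    se = math.sqrt(p * (1 - p) / n)
    if population is not None:
        se *= math.sqrt((population - n) / (population - 1))
    return z * se

p, n = 0.273, 15_000
print(f"{margin_of_error(p, n):.4f}")                          # effectively infinite population
print(f"{margin_of_error(p, n, population=70_700_000):.4f}")   # Karnataka-sized
print(f"{margin_of_error(p, n, population=1_000_000):.4f}")    # a much smaller population
```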

And that is where the survey raises questions – the survey has used an equal number of low risk, medium risk and high risk samples. “High risk” have been defined as people with comorbidities. Moderate risk are people who interact a lot with a lot of people (shopkeepers, healthcare workers, etc.). Both seem fine. It’s the “low risk” that seems suspect, where they have included pregnant women and people accompanying outpatients to hospital.

I have a few concerns – are the “low risk” low risk enough? Doesn’t the fact that you have accompanied someone to hospital, or gone to hospital yourself (because you are pregnant), put you at higher than average risk? And then – there are an equal number of low risk, medium risk and high risk people in the sample and there doesn’t seem to be any re-weighting. This suggests to me that the medium and high risk people have been overrepresented in the sample.
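This sketch shows how re-weighting (post-stratification) would work. Each stratum's prevalence is weighted by that stratum's share of the population, rather than by its share of the sample. All the numbers are hypothetical and not from the survey. They only illustrate the direction of the effect: if higher-risk strata have higher prevalence and are overrepresented, the unweighted average overstates the state-wide figure.

```python
def weighted_prevalence(stratum_prevalence, weights):
    """Weighted average of stratum prevalences; weights must sum to 1."""
    assert abs(sum(weights.values()) - 1) < 1e-9
    return sum(stratum_prevalence[s] * weights[s] for s in stratum_prevalence)

# Hypothetical numbers, for illustration only (not from the survey).
prevalence = {"low": 0.18, "medium": 0.28, "high": 0.34}
sample_share = {"low": 1 / 3, "medium": 1 / 3, "high": 1 / 3}    # equal allocation, as in the survey design
population_share = {"low": 0.70, "medium": 0.15, "high": 0.15}   # assumed real-world mix

unweighted = weighted_prevalence(prevalence, sample_share)
reweighted = weighted_prevalence(prevalence, population_share)
print(f"unweighted: {unweighted:.1%}, reweighted: {reweighted:.1%}")
```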

Finally, the press release says:

We excluded those already diagnosed with SARS-CoV2 infection, unwilling to provide a sample for the test, or did not agree to provide informed consent

I wonder if this sort of exclusion doesn’t result in a bias in itself.

Putting all this together – that there are equal samples of low, medium and high risk, that the “low risk” sample itself contains people of higher than normal risk, and that people who have refused to participate in the survey have been excluded – I sense that the total prevalence of covid-19 in Karnataka is likely to be overstated. By what factor, it is impossible to say. Maybe our original guess that the incidence of the disease is about 20 times the number of known cases is still valid? We will never know.

Nevertheless, we can be confident that a large section of the state (may not be 50%, but maybe 40%?) has already been infected with covid-19 and unless the ongoing festive season plays havoc, the number of cases is likely to continue dipping.

However, this is no reason to be complacent. I think Nitin Pai is bang on here.

And I know a lot of people who have been aggressively social distancing (not even meeting people who have domestic help coming home, etc.). It is important that when they do relax, they do so in a graded manner.

Wear masks. Avoid crowded closed places. If you are going to get covid-19 anyway (and many of us have already got it, whether we know it or not), it is significantly better for you that you get a small viral load of it.

I wish I’d seen this 12 years back

Have I told you that my wife regularly puts out a lot of great content on relationships? She has a relationship blog. A newsletter on relationship markets (brownie points for guessing the funda of the name). A personal youtube channel. A “professional” youtube channel.

There’s a lot of amazing content she puts out through all these channels, but I must say that I got especially blown away by this one video that she has put up recently. It is a conversation with Urvashi Goverdhan, an actor and model, about “how to get the first date”.

So you might be single and wondering how you can chat up someone of the opposite sex (or in the interest of diversity, should I say “gender that you want to partner with”?). And if you are like what I was until 2009, you have no clue how to do it.

Most of the time you play it overly conservative and miss out on opportunities. Sometimes you might decide to go aggressive, but do it all wrong (as I kept doing through most of the 2000s), and count yourself lucky that your “target” decides that physically harming you or shaming you is not worth their time.

If you are like that (and I was like that till at least August 2009 – if you’re not convinced go read my blog archives. Everything about my life from 2004 onwards is well documented here), then I strongly urge you to listen to this conversation, as these two wonderful ladies talk about what women look out for in men, and what men need to do to get women’s attention in a nice manner.

Watching this now, I so strongly wish that I had seen this 11-15 years back. I would have been able to make very good use of it back then. Then again, if I had seen this video and been able to make good use of it back in the day, then there is a strong likelihood that I may not have met the person who is now my wife at all (some of you might know – we met through this blog, and then Orkut, and then chatted for long enough that when we met, it was a “qualified lead” that I was able to convert).

It is a long video, but completely worth your time. So go ahead and watch it in full. Oh, and you should subscribe to the Marriage Broker Auntie Youtube channel as well, if you haven’t already.

Still not convinced that you should watch the video pasted above? Here are some pointers I gathered, all from the first 10 minutes of the video:

  1. If a boy is in a group that already has girls, then there is a higher chance of other girls wanting to talk to him
  2. Pick up lines don’t work. At all.
  3. When you repeatedly make eye contact with someone, smile. Don’t stare.

Okay, now go off and watch the video!

Election Counting Day

At the outset I must say that I’m deeply disappointed (based on the sources I’ve seen, mostly based on googling) with the reporting around the US presidential elections.

For example, if I google, I get something like “Biden leads Trump 225-213”. At the outset, that seems like useful information. However, the “massive discretisation” of the US electorate means that it actually isn’t. Let me explain.

Unlike India, where each of the 543 constituencies has a separate election, and the result of one doesn’t influence another, the US presidential election is at the state level. In all but a couple of small states, the party that gets the most votes in a state gets all of that state’s electoral votes. So something like California is worth 55 votes. Florida is worth 29 votes. And so on.

And some of these states are “highly red/blue” states, which means that they are extremely likely to vote for one of the two parties. For example, a Democratic victory is all but guaranteed in California and New York, states they had won comprehensively in the 2016 election (their dominance is so massive in these states that a friend who used to live in New York once told me that he “doesn’t know any Republican voters”).

Just stating Biden 225 – Trump 213 obscures all this information. For example, if Biden’s 225 excludes California, the election is as good as over since he is certain to win the state’s 55 seats.

Also – this is related to my rant last week about the reporting of the opinion polls in the US – the front page on Google for US election results shows the number of votes that each candidate has received so far (among votes that have been counted). Once again, this is highly misleading, since the number of votes DOESN’T MATTER – what matters is the number of electoral votes (“seats” in an Indian context) each candidate gets, and that gets decided at the state level.

Maybe I’ve been massively spoilt by Indian electoral reporting, pioneered by the likes of NDTV. Here, it’s common to show the results and leads along with margins. It is common to show what the swing is relative to the previous elections. And some publications even do “live forecasting” of the total number of seats won by each party using a variation of the votes to seats model that I’ve written about.

American reporting lacks all of this. Headline numbers are talked about. “Live reports” on sites such as FiveThirtyEight are flooded with reports of individual Senate seats, which, to me sitting halfway round the world, are noise. All I care about is the likelihood of Trump getting re-elected.

Reports talk about “swing states” and how each party has performed in these, but neglect mentioning which party had won it the last time. So “Biden leading in Arizona” is of no importance to me unless I know how Arizona had voted in 2016, and what the extent of the swing is.

So what would I have liked? 225-213 is fine, but can the publications project it to the full 538 seats? There are several “models” they can use for this. The simplest one is to assume that states that haven’t declared leads yet have voted the same way as they did in 2016. One level of complexity can be using the votes to seats model, by estimating swings from the states that have declared leads, and then applying it to similar states that haven’t given out any information. And then you can get more complicated, but you realise it isn’t THAT complicated.
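The simplest of those models is easy to sketch: count declared leads as they stand, and assume every undeclared state votes as it did last time. The states, electoral votes and results below are hypothetical. The uniform-swing refinement is not shown.

```python
def project_electoral_votes(electoral_votes, previous_winner, current_leader):
    """Project a full electoral-college tally.
    electoral_votes: {state: votes}; previous_winner: {state: party last time};
    current_leader: {state: party} for states that have declared leads so far.
    States without a declared lead are assumed to vote as they did last time."""
    totals = {}
    for state, votes in electoral_votes.items():
        party = current_leader.get(state, previous_winner[state])
        totals[party] = totals.get(party, 0) + votes
    return totals

# Hypothetical states and numbers, for illustration only
ev = {"North": 20, "South": 15, "East": 10, "West": 5}
last_time = {"North": "D", "South": "R", "East": "R", "West": "D"}
leads = {"North": "D", "East": "D"}   # East has flipped; South and West not yet declared

print(project_electoral_votes(ev, last_time, leads))   # {'D': 35, 'R': 15}
```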

All in all, I’m disappointed with the reporting. I wonder if the split of American media down political lines has something to do with this.

Trump, Tamasikate, NED and ADHD

My friend Ravikiran Rao has written a blogpost about how “Trump is Tamasik“. In this, he has used the Tamasik-Rajasik-Satvik framework from ancient India, modelled how most leaders are Rajasik, and how Trump is not, and is actually a Tamasik.

One of the hypotheses in the post is that a lot of commentators make the mistake of analysing Trump through a Rajasik lens (which they are used to since most other leaders are Rajasik), and so get him wrong.

The blogpost, like a lot of Ravi’s blogposts, triggered off a lot of thoughts in my head. My first reaction after starting to read was that “hey, can we compare Tamasikate to NED (noenthuda)”? The idea of Tamasik that I have is that it is about “doing nothing”. And “no enthu da” of course vocalises that philosophy – you don’t have enthu to do anything. And so Tamasikate is like NED.

That was the first point at which I found myself describing myself as a possible Tamasik.

And then Ravi goes on.

Trump, as I was saying, is Tamasik. He is driven by his impulses, and in his case, the impulses are all negative ones. Now, to be fair, all of us struggle with our impulses and emotional drives, but becoming a functional adult involves learning to rein them in, and converting them into higher order goals. We all have sexual desires, for example. The Rajasik nature involves sublimating them into a higher order emotion called love, and pursuit of love involves choosing one person and forgoing others; not giving into the impulse of going after every woman you find sexy. Trump has not made that transition at all. A Clinton may give into his impulse; Trump is his impulse.

I was thinking about the common theories about ADHD, which I’ve been diagnosed with. One theory is that ADHD leads to a “lack of executive functioning”. If we were to describe this using the Tamasik-Rajasik-Satvik framework, we can say that all of us have a “Tamasik base”, which is about our emotions, about our impulses and all that.

And then on top of the Tamasik base is a Rajasik “executive function”. It is this function that allows us to plan, think long-term, suppress our impulses when they are suboptimal, and do all the other things that society expects of functioning adults. However, the thing with ADHD is that this Rajasik executive function is impaired. So you are unable to plan well. You give in to your impulses. You frequently change plans. You are impulsive.

Sometimes I think that a lot of my theories are my attempts to rationalise myself and my own decisions. For example, after I first got diagnosed with ADHD in 2012, I realised that my seminal studs and fighters framework was an attempt to rationalise that.

Now Ravi’s post about Trump’s Tamasikate makes me think – I instinctively associate Tamasikate with NED. And your Tamasikate comes out in fuller light if your Rajasik executive function is impaired, which is what they say happens to someone who has ADHD.

So through this, is NED also a symptom of ADHD?

Opinion polling in India and the US

(Relatively) old-time readers of this blog might recall that in 2013-14 I wrote a column called “Election Metrics” for Mint, where I used data to analyse elections and everything else related to that. This being the election where Narendra Modi suddenly emerged as a spectacular winner, the hype was high. And I think a lot of people did read my writing during that time.

In any case, somewhere during that time, my editor called me “Nate Silver of India”.

I followed that up with an article on why “there can be no Nate Silver in India” (now they seem to have put it behind a sort of limited paywall). In that, I wrote about the polling systems in India and in the US, and about how India is so behind the US when it comes to opinion polling.

Basically, India has fewer opinion polls. Many more political parties. A far more diverse electorate. Less disclosure when it comes to opinion polls. A parliamentary system. And so on and so forth.

Now, seven years later, as we are close to a US presidential election, I’m not sure the American opinion polls are as great as I made them out to be. Sure, all the above still apply. And when these poll results are put in the hands of a skilled analyst like Nate Silver, it is possible to make high quality forecasts based on that.

However, the reporting of these polls in the mainstream media, based on my limited sampling, is possibly not of much higher quality than what we see in India.

Basically I don’t understand why analysts abroad make such a big deal of “vote share” when what really matters is the “seat share”.

Like in 2016, Hillary Clinton won more votes than Donald Trump, but Trump won the election because he got “more seats” (if you think about it, the US presidential election is like a first past the post parliamentary election with MASSIVE constituencies (California giving you 55 seats, etc.) ).

And by looking at the news (and social media), it seems like a lot of Americans just didn’t seem to get it. People alleged that Trump “stole the election” (while all he did was optimise based on the rules of the game). They started questioning the rules. They seemingly forgot the rules themselves in the process.

I think this has to do with the way opinion polls are reported in the US. Check out this graphic, for example, versions of which have been floating around on mainstream and social media for a few months now.

This shows voting intention. It shows what proportion of people surveyed have said they will vote for one of the two candidates (this is across polls. The reason this graph looks so “continuous” is that there are so many polls in the US). However, this shows vote share, and that might have nothing to do with seat share.

The problem with a lot (or most) opinion polls in India is that they give seat share predictions without bothering to mention what the vote share prediction is. Most don’t talk about sample sizes. This makes it incredibly hard to trust these polls.

The US polls (and media reports of them) have the opposite problem: they try to forecast vote share without trying to forecast how many “seats” it will translate to. “Biden has an 8 percentage point lead over Trump” says nothing. What I’m looking for is something like “as things stand, Biden is likely to get 20 (+/- 15) more electoral college votes than Trump”, because electoral college votes are what this election is about. For the purpose of the ultimate result, the vote share (or “popular vote”, as they call it in the US, perhaps giving it a bit more legitimacy than it deserves) doesn’t matter.
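To make that kind of statement concrete, here is a minimal sketch of how state-level polls could be turned into an electoral-college lead with an uncertainty band. Everything in it is an assumption: the six states, their electoral votes and polled margins are made up, and polling errors are modelled as one shared national error plus an independent error per state, both normally distributed with standard deviations I have simply picked. It is not meant to represent any particular forecaster’s model.

```python
import random
import statistics

# Hypothetical states: (electoral votes, polled margin for candidate A in percentage points).
# These numbers are invented for illustration; they are not real polling data.
STATES = {
    "State 1": (55, 25.0),
    "State 2": (38, -6.0),
    "State 3": (29, 1.5),
    "State 4": (20, 2.0),
    "State 5": (16, -1.0),
    "State 6": (10, 4.0),
}


def simulate_ev_leads(states, n_sims=10_000, national_sd=3.0, state_sd=4.0, seed=42):
    """Return n_sims simulated values of (A's electoral votes - B's electoral votes).

    Each simulation draws one national polling error shared by all states, plus an
    independent error per state, adds both to each state's polled margin, and gives
    all of a state's electoral votes to whoever leads there (winner takes all).
    """
    rng = random.Random(seed)
    total_ev = sum(ev for ev, _ in states.values())
    leads = []
    for _ in range(n_sims):
        national_error = rng.gauss(0.0, national_sd)
        a_ev = sum(
            ev
            for ev, margin in states.values()
            if margin + national_error + rng.gauss(0.0, state_sd) > 0
        )
        # Every electoral vote A doesn't win goes to B.
        leads.append(a_ev - (total_ev - a_ev))
    return leads


leads = simulate_ev_leads(STATES)
cuts = statistics.quantiles(leads, n=20)  # 19 cut points: 5th, 10th, ..., 95th percentile
print(f"median electoral-vote lead for A: {statistics.median(leads)}")
print(f"90% interval: {cuts[0]} to {cuts[-1]}")
```

The median and the interval together give exactly the “X (+/- Y) more electoral college votes” form. The shared national error matters: polling misses tend to move many states in the same direction, and leaving it out would make the interval look far narrower than it should.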

In the Indian context, I had written this piece on how to convert votes to seats (again paywalled, it seems). There, I had put up some graphs based on state-wise data from general elections in India before 2014.

An image from my article for Mint in 2014 on converting votes to seats. Look at the bottom left graph

What I had found is that in a two-cornered contest, small differences in vote share can make a massive difference in the number of seats won. This is precisely the situation in the US: a two-cornered contest. And that means opinion polls that predict only vote shares should be taken with a pinch of salt.
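The amplification shows up even in a toy model. This is a minimal sketch under loudly stated assumptions: 100 equal-sized constituencies, each constituency’s lean (its vote share for party A minus A’s national share) spread evenly between -10 and +10 percentage points, and uniform swing, so every constituency moves with the national vote. Real constituencies (or US states, which also differ wildly in size) are messier.

```python
# Hypothetical: 100 equal-sized constituencies in a two-cornered contest.
# Leans are assumed to be spread evenly from -10 to +10 percentage points.
LEANS = [-0.10 + 0.20 * i / 99 for i in range(100)]


def seats_won(national_share, leans=LEANS):
    """Count constituencies where party A's local vote share exceeds 50%.

    Assumes uniform swing: each constituency's share is the national share plus its lean.
    """
    return sum(1 for lean in leans if national_share + lean > 0.5)


for share in (0.48, 0.50, 0.52):
    print(f"vote share {share:.0%} -> {seats_won(share)} of {len(LEANS)} seats")
```

Under these assumptions, going from 48% to 52% of the vote takes party A from 40 to 60 seats, so a 4-point vote lead becomes a 20-seat lead. The more tightly constituencies are bunched around the national figure, the steeper this curve gets.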

Yet another initiation

I’m still reeling from the Merseyside derby. It had been a long time since a game of football so emotionally drained me. In fact, the last time I remember getting a fever (literally) while watching a game of football was in the exact same fixture in 2013, which had ended 3-3 thanks to a Daniel Sturridge equaliser towards the end.

In any case, my fever (which I’ve now recovered from) and emotional exhaustion are not the reason today’s match will be memorable. It also happens to be a sort of initiation of my daughter as a bona fide Liverpool fan.

Initiating @abherikarthik to the Merseyside derby. #ynwa #lfc (an Instagram post by Karthik S, @skthewimp)

It’s been a sort of trend in recent times (at least since the lockdown) that Liverpool games have been scheduled for late evening or late night India time. That has meant that I haven’t been able to involve my daughter, who goes to bed at seven on most days, in the football.

She has seen me watch highlights of Liverpool games. She admires the “Liverpool. We are Champions” poster that I had ordered after last season’s Premier League victory and have since stuck on the wall of our study. She knows I’ve been a Liverpool fan for a long time now (it dates back to more than eleven years before she was born).

However, since she truly started understanding stuff (she is four now), we had never watched a game together. So when it was announced that the Everton-Liverpool game would kick off at 5pm IST, I decided it was time for her initiation.

I had casually mentioned to her on Tuesday (or so) that “on Saturday, we will be watching football together, and we will have drinks and snacks along with it”. Then on Wednesday she asked me what day of the week it was. “So how many days to Saturday?”, she asked. When I asked her what was special about the coming Saturday, she let out a happy scream: “football party!!”. The same day she informed her mother that we both were “going to have a football party on Saturday”, and that her mother was not welcome.

She’s spent the last three days looking forward to today. At four o’clock today, as I was “busy” watching the IPL game, she expressed her disappointment that I had not yet started preparing for the party. I finally swung into action around 4:30 (though a shopping trip in the morning had taken care of most of the prep).

A popcorn packet went into the microwave. The potato chips packet (from a local “Sai hot chips” store) was opened, and part of its contents poured into a bowl. I showed her the bottles of fresh fruit juice I had bought, which a promoter at the local Namdhari’s store had pushed on me. Initially opting for the orange juice, she later said she wanted the “berry smoothie”. I poured it into a small wine glass that she likes. A can of Diet Coke and some Haldiram salted peanuts for me, and we were set.

I was pleasantly surprised that she sat still on the couch with me for pretty much the length of the match (she’s generally the restless type, like me). She tied the Liverpool scarf around herself in many different ways. She gorged on the snacks (popcorn, potato chips and pomegranate in the first half; nachos with ketchup in the second). She kept asking who was winning. She kept asking me “where Liverpool was from” after I told her that “Everton are from Liverpool”.

I explained to her the concept of football, and of goals. Once in the second half she said she was curious to see Adrian in the Liverpool goal, since she “hadn’t seen the Liverpool goal in a long time”. Presently, Dominic Calvert-Lewin equalised to make it 2-2, giving her the glimpse of the goal she had so desired.

At the end of the game, she couldn’t grasp the concept of a draw. “But who won?”, she kept asking. She didn’t grasp the concept of offside either, though it possibly didn’t help that Liverpool seemed to play a far deeper line today than they have this season.

I’m glad that she had such an interesting game for her “football watching debut”. Not technically her debut, of course, since I remember cradling her on my lap when Jose Mourinho parked two Manchester United buses at Anfield (she was a month old then), and that had been a dreadful game.

A friend told me that I should “let her make her own choices” and not foist my club affiliations on her. Let’s see where this goes.

Republic and TV Ratings

I had written this back when Republic had just launched and had got insanely high TV ratings in its first few weeks of broadcast. Rival channels had contested the claim.

I had written this analysis for Mint, but to the best of my knowledge they did not publish it. So, very belatedly, since Republic and TV ratings are in the news, I’m putting it here. This article was originally written on 20th May 2017.

[E]ven if 1 in 500 of Sony Max’s viewers decided to watch Republic out of curiosity, that would have been enough to give Republic a 50% market share among English news channels

There has been much fuss in the media over the last few days regarding the newly launched news channel Republic’s gain in market share. According to data released by the Broadcast Audience Research Council, the channel had a “50% share” among all English language news channels in its first week of launch.

To be more precise, research by BARC, which relies on a small population of households with “listeners”, showed that there were 2.1 million “impressions” for Republic during the week of 6th-12th May (bizarrely, the television week runs from Saturday to Friday). The “impressions” of the next four most watched English news channels (Times Now, NDTV 24×7, India TV and CNN News 18) added up to the same number.

BARC uses a panel of 22,000 households whose viewing habits are continuously tracked and aggregated to produce overall viewership numbers. According to the organisation’s website, this panel was formed based on a comprehensive survey of about 250,000 households, and was selected to cover different states and socio-economic segments in a representative fashion.

“Audio watermarks” are added to the programming of different channels (these are sounds outside the human hearing range, layered on top of the regular programme), and a receiver in a respondent’s home recognises the channel by its watermark when the TV is playing. The receiver then transmits the viewing data in real time to a central server, which computes aggregate viewership numbers.
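The data flow described above can be sketched in a few lines. This is a toy: the watermark codes, channel names and record format are all hypothetical, and it skips the actual signal processing of detecting a watermark in audio, starting instead from codes the receiver has already decoded.

```python
from collections import Counter

# Hypothetical lookup from decoded watermark codes to channel names.
WATERMARK_TO_CHANNEL = {0x1A: "Channel A", 0x2B: "Channel B", 0x3C: "Channel C"}


def aggregate_viewing(records):
    """Sum viewing minutes per channel from receiver records.

    Each record is (household_id, watermark_code, minutes), as a receiver might
    report after recognising a watermark. Codes not in the lookup are skipped.
    """
    minutes = Counter()
    for _household, code, mins in records:
        channel = WATERMARK_TO_CHANNEL.get(code)
        if channel is not None:
            minutes[channel] += mins
    return minutes


records = [(1, 0x1A, 30), (2, 0x2B, 15), (1, 0x1A, 10), (3, 0x99, 5)]
print(aggregate_viewing(records))  # Counter({'Channel A': 40, 'Channel B': 15})
```

Note that these are raw panel totals; nothing here yet corrects for which kinds of households happen to be in the panel.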

The computation of aggregates is not a simple process, since the geographic and socio-economic distribution of the sample households doesn’t necessarily reflect that of the population. Hence, results from the receivers need to be weighted appropriately before BARC produces the overall viewership numbers.
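One common way of doing such weighting is post-stratification: each panel household is weighted by its stratum’s share of the population divided by that stratum’s share of the panel. I don’t know BARC’s exact procedure, so treat this as a generic sketch with invented strata and numbers.

```python
# Hypothetical strata; all numbers are invented for illustration.
POPULATION_SHARE = {"urban_high": 0.20, "urban_low": 0.30, "rural": 0.50}
PANEL_HOUSEHOLDS = {"urban_high": 500, "urban_low": 300, "rural": 200}
# Panel households in each stratum that watched a given channel.
WATCHERS = {"urban_high": 100, "urban_low": 30, "rural": 10}


def stratum_weights(population_share, panel_households):
    """Weight per panel household = stratum's population share / stratum's panel share."""
    total = sum(panel_households.values())
    return {s: population_share[s] / (panel_households[s] / total) for s in panel_households}


def weighted_reach(watchers, panel_households, population_share):
    """Estimated share of all households watching, after re-weighting strata.

    The weights average to 1 across the panel, so dividing the weighted count
    by the panel size gives a population-level proportion.
    """
    weights = stratum_weights(population_share, panel_households)
    total = sum(panel_households.values())
    return sum(watchers[s] * weights[s] for s in watchers) / total


raw = sum(WATCHERS.values()) / sum(PANEL_HOUSEHOLDS.values())
weighted = weighted_reach(WATCHERS, PANEL_HOUSEHOLDS, POPULATION_SHARE)
print(f"raw: {raw:.1%}, weighted: {weighted:.1%}")
```

Here the panel over-represents urban high-income homes, where the channel happens to be popular, so the raw figure (14%) overstates the weighted estimate (9.5%).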

With this as the background, there are a few reasons why we should not get too excited by the fact that Republic got a “50% market share” in its first week of broadcast. Firstly, the 50% figure is misleading because it is 50% among the top 5 channels only (the BARC website weirdly doesn’t give data beyond the top 5 in a category). While the remaining news channels may not individually have many impressions, their total need not be insignificant.

Secondly, while the 2.1 million impressions for Republic in its first week sounds impressive, we must note that the overall market share for English news channels in India is rather minuscule.

To put this in context, Table 2 has the total impressions of the top 10 channels in India. The most watched channel, Sony MAX, had a billion impressions, about 500 times as many as Republic. And as the table shows, the numbers don’t fall too drastically down the list. Republic’s overall market share is tiny indeed.

In fact, to get a better perspective of how tiny the segment of English news channels is, it is instructive to compare it to Hindi news channels.

The top 5 Hindi News channels each have at least 30 times as many impressions as Republic.

In this context, Republic’s 50% market share among English news channels in its first week is nothing much to write about, given how small the genre is. To put it simply, even if 1 in 500 of Sony Max’s viewers decided to watch Republic out of curiosity, that would have been enough to give it a 50% market share in English news channels.
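The arithmetic behind these comparisons is simple enough to check. The sketch below uses only the figures quoted above, except for the impressions of English news channels outside the top 5, which BARC doesn’t report and which are purely hypothetical values.

```python
# Figures quoted in the article (BARC, week of 6th-12th May 2017), in impressions.
REPUBLIC = 2.1e6
NEXT_FOUR_COMBINED = 2.1e6  # Times Now, NDTV 24x7, India TV and CNN News 18 together
SONY_MAX = 1e9              # the most watched channel overall


def republic_share(others=0.0):
    """Republic's share of English news impressions.

    `others` is a hypothetical total for English news channels outside the
    top 5, which BARC's website does not report.
    """
    return REPUBLIC / (REPUBLIC + NEXT_FOUR_COMBINED + others)


print(f"share among the top 5: {republic_share():.0%}")
print(f"Sony MAX / Republic: {SONY_MAX / REPUBLIC:.0f}x")

for others in (0.5e6, 1e6):  # hypothetical values
    print(f"with {others / 1e6:.1f}M more impressions elsewhere: {republic_share(others):.0%}")
```

Sony MAX’s billion impressions are about 476 times Republic’s 2.1 million, which is where the rounded “1 in 500” comes from. And every million impressions hiding outside the top 5 pulls the 50% down further (to about 45% and 40% for the two hypothetical values above).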

We should also account for errors in BARC’s methodology, something that rival news channels have mentioned in their complaint. While the data collection method using audio watermarks is sound (since there is no manual intervention), there can be significant sampling errors. At first glance, 22,000 seems like a large enough sample. However, given that BARC tracks more than 400 channels, this sample size is possibly inadequate. Also, since this is a stratified sample chosen at the state, city size and socio-economic segment level, there is an assumption that all households of a certain socio-economic class in a certain region have homogeneous TV watching habits. With 400 channels to choose from, this is not a great assumption.
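To see why a 22,000-household panel can be thin for small channels, consider the sampling error of an estimated audience share. This sketch assumes simple random sampling of independent households, which BARC’s stratified panel is not, so read the numbers as a rough guide only; the 220-household stratum is a hypothetical example.

```python
import math


def relative_standard_error(p, n):
    """Standard error of an estimated proportion p from n households, relative to p.

    Assumes simple random sampling; a stratified panel will differ.
    """
    return math.sqrt(p * (1 - p) / n) / p


PANEL = 22_000
for share in (0.05, 0.01, 0.001):
    rse = relative_standard_error(share, PANEL)
    print(f"share {share:.1%}: relative SE {rse:.0%} (whole panel)")

# Hypothetical: a 1-in-1,000 channel measured within a single 220-household stratum.
print(f"share 0.1%, 220 households: relative SE {relative_standard_error(0.001, 220):.0%}")
```

For a channel watched by 1 in 1,000 households, one standard error is about 21% of the estimate across the whole panel, and more than twice the estimate itself within a 220-household stratum.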

Finally, it remains to be seen whether Republic manages to retain its viewership in the coming weeks. Once the novelty of the new channel wears off, its viewership might decline. If Republic manages to hold on to, or increase, its viewership, it can be seen as a positive for the otherwise struggling English TV news industry.

Please remember that this article was written more than three years ago. All opinions and information in this blog post reflect things as I knew them at that point in time, and all numbers are “current” as of May 2017.

Postscript: Monday’s Times of India had a great article on this topic. Refer to that for a more contemporary analysis.