Legacy Metrics

Yesterday (or was it the day before? I’ve lost track of time with full time WFH now) the Times of India Bangalore edition had two headlines.

One was the Karnataka education minister BC Nagesh talking about deciding on school closures on a taluk (sub-district) wise basis. “We don’t want to take a decision for the whole state. However, in taluks where test positivity is more than 5%, we will shut schools”, he said.

That was on page one.

And then somewhere inside the newspaper, there was another article. The Indian Council of Medical Research has recommended that “only symptomatic patients should be tested for Covid-19”. However, for whatever reason, Karnataka had decided to not go by this recommendation, and instead decided to ramp up testing.

These two articles are correlated, though the paper didn’t say they were.

I should remind you of one tweet, that I elaborated about a few days back:

 

The reason why Karnataka has decided to ramp up testing despite the advisory to the contrary is that changing policy at this point in time will mess with metrics. Yes, I stand by my tweet that test positivity ratio is a shit metric. However, with the government having accepted over the last two years that it is a good metric, it has become “conventional wisdom”. Everyone uses it because everyone else uses it.

And so you have policies on school shutdowns and other restrictive measures being dictated by this metric – because everyone else uses the same metric, using this “cannot be wrong”. It’s like the old adage that “nobody got fired for hiring IBM”.

ICMR’s message to cut testing of asymptomatic individuals is a laudable one – given that an overwhelming number of people infected by the incumbent Omicron variant of covid-19 have no symptoms at all. The reason it has not been accepted is that it will mess with the well-accepted metric.

If you stop testing asymptomatic people, the total number of tests will drop sharply. The people who are ill will get themselves tested anyways, and so the numerator (number of positive reports) won’t drop. This means that the ratio will suddenly jump up.
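To make the arithmetic concrete, here is a minimal sketch in Python. All the numbers are made up for illustration (they are not from the newspaper or from any official data), and I assume that nearly all positive reports come from symptomatic people, so the numerator barely moves when asymptomatic testing stops.

```python
def positivity(positives: int, tests: int) -> float:
    """Test positivity ratio: positive reports divided by tests conducted."""
    if tests <= 0:
        raise ValueError("tests must be positive")
    return positives / tests


# Hypothetical day in one taluk (assumed numbers, for illustration only).
symptomatic_tests = 2_000
asymptomatic_tests = 8_000
positives_symptomatic = 300
positives_asymptomatic = 80

# Current policy: everyone gets tested.
before = positivity(
    positives_symptomatic + positives_asymptomatic,
    symptomatic_tests + asymptomatic_tests,
)

# Recommended policy: only symptomatic people get tested.
after = positivity(positives_symptomatic, symptomatic_tests)

print(f"before: {before:.1%}, after: {after:.1%}")
```

With these assumed numbers the same taluk moves from below the 5% cutoff to well above it, without anything changing on the ground.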

And that needs new measures – while 5% is some sort of a “critical number” now (like it is with p-values), the “critical number” will be something else. Moreover, if only symptomatic people are to be tested, the number of tests a day will vary even more – and so the positivity ratio may not be as stable as it is now.

All kinds of currently carefully curated metrics will get messed up. And that is a big problem for everyone who uses these metrics. And so there will be pushback.

Over a period of time, I expect the government and its departments to come up with alternate metrics (like how banks have now come up with an alternative to LIBOR), after which the policy to cut testing for asymptomatic people will get implemented. Until then, we should bow to the “legacy metric”.

And if you didn’t figure out already, legacy metrics are everywhere. You might be the cleverest data scientist going around and you might come up with what you think might be a totally stellar metric. However, irrespective of how stellar it is, the fact that people have to change their way of thinking and their processes to use it means that it won’t get much acceptance.

The strategy I’ve come to is to either change the metric slowly, in stages (change it little by little), or to publish the new metric along with the old one. Depending on how clever the new metric is, one of the metrics will die away.

Metrics

Over the weekend, I wrote this on twitter:

 

Surprisingly (at the time of writing this at least), I haven’t got that much abuse for this tweet, considering how “test positivity” has been held as the gold standard in terms of tracking the pandemic by governments and commentators.

The reason why I say this is a “shit metric” is simple – it doesn’t give that much information. Let’s think about it.

For a (ratio) metric to make sense, both the numerator and the denominator need to be clearly defined, and there needs to be clear information content in the ratio. In this particular case, both the numerator and the denominator are clear – the latter is the number of people who got Covid tests taken, and the former is the number of these people who returned a positive test.

So far so good. Apart from being an objective measure, test positivity ratio is also a “ratio”, and thus normalised (unlike absolute number of positive tests).

So why do I say it doesn’t give much information? Because of the information content.

The problem with test positivity ratio is the composition of the denominator (now we’re getting into complicated territory). Essentially, there are many reasons why people get tested for Covid-19. The most obvious reason to get tested is that you are ill. Then, you might get tested when a family member is ill. You might get tested because your employer mandates random tests. You might get tested because you have to travel somewhere and the airline requires it. And so on and so forth.

Now, for each of these reasons for getting tested, we can define a sort of “prior probability of testing positive” (based on historical averages, etc). And the positivity ratio needs to be seen in relation to this prior probability. For example, in “peaceful times” (eg. Bangalore between August and November 2021), a large proportion of the tests would be “random” – people travelling or employer-mandated. And this would necessarily mean a low test positivity.

The other extreme is when the disease is spreading rapidly – few people are travelling or going physically to work. Most of the people who get tested are getting tested because they are ill. And so the test positivity ratio will be rather high.

Basically – rather than the ratio telling you how bad the covid situation is in a region, it is influenced by how bad the covid situation is. You can think of it as some sort of a Schrödinger-ian measurement.

That wasn’t an offhand comment. Because government policy is an important input into test positivity ratio. For example, take “contact tracing”, where contacts of people who have tested positive are hunted down and also tested. The prior probability of a contact of a covid patient testing positive is far higher than the prior probability of a random person testing positive.

And so, as and when the government steps up contact tracing (as it does in the early days of each new wave), test positivity ratio goes up, as more “high prior probability” people get tested. Similarly, whether other states require a negative test to travel affects positivity ratio – the more the likelihood that you need a test to travel, the more likely that “low prior probability” people will take the test, and the lower the ratio will be. Or when governments decide to “randomly test” people (pulling them off the streets or whatever), the ratio will come down.
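One way to see this is to treat the overall ratio as a weighted average of the “prior probabilities”, weighted by the share of tests taken for each reason. The sketch below is only an illustration: the reasons, the priors and the test mixes are all numbers I have assumed, not measured values. The priors are deliberately held fixed, so that only the composition of the denominator changes.

```python
def blended_positivity(mix: dict, prior: dict) -> float:
    """Overall positivity as a weighted average of per-reason priors.

    mix[reason] is the share of tests taken for that reason (shares sum to 1).
    prior[reason] is the assumed probability that such a test comes back positive.
    """
    if abs(sum(mix.values()) - 1.0) > 1e-9:
        raise ValueError("test shares must sum to 1")
    return sum(share * prior[reason] for reason, share in mix.items())


# Assumed priors, held constant across both scenarios.
PRIOR = {"ill": 0.30, "contact": 0.15, "travel": 0.01, "employer": 0.01}

# "Peaceful times": most tests are for travel or employer mandates.
peaceful_mix = {"ill": 0.05, "contact": 0.05, "travel": 0.50, "employer": 0.40}

# Early in a wave: contact tracing is stepped up and few people travel.
wave_mix = {"ill": 0.60, "contact": 0.30, "travel": 0.05, "employer": 0.05}

peaceful = blended_positivity(peaceful_mix, PRIOR)
wave = blended_positivity(wave_mix, PRIOR)

print(f"peaceful: {peaceful:.2%}, wave: {wave:.2%}")
```

Even with identical priors, the ratio moves several-fold purely because of who is getting tested, which is the part that policy controls.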

In other words – the ratio can be easily gamed by governments, apart from just being influenced by government policy.

So what do we do now? How do we know whether the Covid-19 situation is serious enough to merit clamping down on people’s liberties? If test positivity ratio is a “shit metric” what can be a better one?

In this particular case (writing this on 3rd Jan 2022), absolute number of positive cases is as bad a metric as test positivity – over the last 3 months, the number of tests conducted in Bangalore has been rather steady. Moreover, the theory so far has been that Omicron is far less deadly than earlier versions of Covid-19, and the vaccination rate is rather high in Bangalore.

While defining metrics, sometimes it is useful to go back to first principles, and think about why we need the metric in the first place and what we are trying to optimise. In this particular case, we are trying to see when it makes sense to cut down economic activity to prevent the spread of the disease.

And why do we need lockdowns? To prevent hospitals from getting overwhelmed. You might remember the chaos of April-May 2021, when it was near impossible to get a hospital bed in Bangalore (even crematoriums had long queues). This is a situation we need to avoid – and the only one that merits lockdowns.

One simple measure we can use is to see how many hospital beds are actually full with covid patients, and if that might become a problem soon. Basically – if you can measure something “close to the problem”, measure it and use that as the metric. Rather than using proxies such as test positivity.
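A rough sketch of what such a measure could look like, in Python. The bed counts are hypothetical, and the straight-line projection is my own simplifying assumption (a real wave grows faster than linearly in its early days), so treat this as an illustration of “measuring close to the problem” rather than as a forecasting method.

```python
from typing import Optional, Sequence


def occupancy(occupied: int, total_beds: int) -> float:
    """Share of covid-designated beds currently occupied."""
    if total_beds <= 0:
        raise ValueError("total_beds must be positive")
    return occupied / total_beds


def days_until_full(occupied_by_day: Sequence[int], total_beds: int) -> Optional[float]:
    """Naive linear projection of when beds run out.

    Uses the average daily increase over the supplied history.
    Returns None if occupancy is flat or falling.
    """
    if len(occupied_by_day) < 2:
        raise ValueError("need at least two days of data")
    daily_change = (occupied_by_day[-1] - occupied_by_day[0]) / (len(occupied_by_day) - 1)
    if daily_change <= 0:
        return None
    remaining = max(total_beds - occupied_by_day[-1], 0)
    return remaining / daily_change


# Hypothetical city: 1,000 covid beds, occupancy over the last five days.
history = [100, 120, 150, 170, 200]
current = occupancy(history[-1], 1_000)
runway = days_until_full(history, 1_000)

print(f"occupancy: {current:.0%}, days until full (linear): {runway}")
```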

Because test positivity depends on too many factors, including government action. Because we are dealing with a new variant here, which is supposedly less severe. Because most of us have been vaccinated now, our response to getting the disease will be different. The change in situation means the old metrics don’t work.

It’s interesting that the Mumbai municipal corporation has started including bed availability in its daily reports.

Profit and politics

Earlier today I came across this article about data scientists on LinkedIn that I agreed with so much that I started wondering if it was simply a case of confirmation bias.

A few sentences (possibly taken out of context) from there that I agree with:

  • Many large companies have fallen into the trap that you need a PhD to do data science, you don’t.
  • There are some smart people who know a lot about a very narrow field, but data science is a very broad discipline. When these PhD’s are put in charge, they quickly find they are out of their league.
  • Often companies put a strong technical person in charge when they really need a strong business person in charge.
  • I always found the academic world more political than the corporate world and when your drive is profits and customer satisfaction, that academic mindset is more of a liability than an asset.

Back to the topic, which is the last of these sentences. This is something I’ve intended to write for 5-6 years now, since the time I started off as an independent management consultant.

During the early days I took on assignments from both for-profit and not-for-profit organisations, and soon it was very clear that I enjoyed working with for-profit organisations a lot more. It wasn’t about money – I was fairly careful in my negotiations to never underprice myself. It was more to do with processes, and interactions.

The thing in for-profit companies is that objectives are clear. While not everyone in the company has an incentive to increase the bottom-line, it is not hard to understand what they want based on what they do.

For example, in most cases a sales manager optimises for maximum sales. Financial controllers want to keep a check on costs. And so on. So as part of a consulting assignment, it’s rather easy to know who wants what, and how you should pitch your solution to different people in order to get buy-in.

With a not-for-profit it’s not that clear. While each person may have their own metrics and objectives, because the company is not for profit, these objectives and metrics need not be everything they’re optimising for.

Moreover, in the not-for-profit world, the lack of money or profit as an objective means you cannot differentiate yourself with efficiency or quantity. Take the example of an organisation which, for whatever reason, gets to advise a ministry on a particular subject, and does so without a fee or only for a nominal fee.

How can a competitor who possibly has a better solution to the same problem “displace” the original organisation? In the business world, this can be done by showing superior metrics and efficiency and offering to do the job at a lower cost and stuff like that. In the not-for-profit setup, you can’t differentiate on things like cost or efficiency, so the only thing you can do is to somehow provide your services in parallel and hope that the client gets it.

And then there is access. If you’re a not-for-profit consultant who has a juicy project, it is in your interest to become a gatekeeper and prevent other potential consultants from getting the same kind of access you have – for you never know if someone else who might get access through you might end up elbowing you out.

Shoes and metrics

The best metric to measure the age of a pair of shoes is the distance walked in them

My latest pair of “belt chappli” (sandals with a belt going around the heels) is only ten months old, but has started wearing. Walking long distances in the said sandals has become a pain. The top is nice, the sole is fantastic, but the inner sole has gotten FUBARed. Maybe it was a stone that got stuck under my feet which I didn’t notice. Maybe it was several such small stones. But with the inner sole “gone”, time is nigh to possibly retire the chappal.

But then a good pair of sandals is supposed to last much longer (and I did 2 longish foreign trips in this period where this chappal didn’t travel with me). Historically, good sandals have lasted two years or more. And it is not that this one is cheap. I paid close to Rs. 2000 for it, and it’s branded, too (Lee Cooper), and I had found it after a lot of difficulty (three months of searching). That it has lasted less than a year is not fair.

But then the question arises as to whether I have the right metrics in place. The number of months or years that a pair of shoes lasts is an intuitive metric of its quality, but it is not the right one. For, a pair of shoes doesn’t wear when it is not worn! Of course there might be mild wear and tear due to weather conditions, but for a pair of shoes made of good leather, that can be ignored.

So maybe the best metric for a pair of shoes is the amount of time it is worn? Then again, while a shoe might wear while it’s worn, it doesn’t wear too much when it’s at rest – I mean its shape changes to fit the wearer’s foot (over the medium term) and that might cause some wear and tear, but in the long run, there is unlikely to be much wear and tear at rest.

From that perspective, I hereby declare that the best metric to measure a shoe’s performance is the number of kilometres walked or run in it (the latter causes significantly more wear and tear, but let’s assume that walking shoes and running shoes are mutually exclusive (which they’re not)). This is an excellent metric because it takes care of a number of features that correlate with the wear and tear, and is not hard to fathom.

Going by this metric, my current pair of “belt chappli” has put in considerable service. Over the last ten months, the frequency of going on “beats” in Jayanagar has gone up, and the distance covered in each beat, too. Having pretty much stopped driving, I walk more than I used to, and this is my default shoe for such perambulations.

The problem now is the search cost – good belt chapplis that fit my feet are hard to find. It’s a liquidity problem, I think (:P). Maybe I should just consider getting the inner sole replaced and get on with this one.

Volatility of Human Body Weight

Ever since I shed roughly 20 kilos over the course of the second half of last year, I’ve become extremely weight-conscious. Given how quickly I shed so much weight, I’m paranoid that I might gain back so much again as quickly. This means I monitor my weight as closely as I can, limit myself in terms of “sin foods” and check my weight as often as possible, typically whenever I manage to make it to the gym (about twice a week on average).

Having been used to analog scales lifelong (there’s one at home, but it is wrongly calibrated I think), the digital scales (with 7-segment display) that are there at a gym provide me with a bit of a problem. I think they are too precise – they show my weight up to 1 place of decimal (in kilograms), and thinking about it, I think that much detail is unwarranted.

The reason is that, given its normal cycles, I think the weight of the human body is highly volatile, and measuring a volatile commodity at a scale finer than the volatility (when all you are interested in is the long-term average) is fraught with danger and inaccuracy. For example, every time you drink two glasses of water, your weight shoots up by half a kilo. Every time you pee, your weight correspondingly comes down. Every time you eat, up the weight goes, and every time you defecate, down go the scales.

Given this, I find the digital weighing machine at my gym a bit of a pain, but then I’m trying to figure out what the normal volatility of the human body weight is, so that I can quickly catch on to any upward trend and make amends as soon as I can help it. Over the last couple of months, the machine has shown up various numbers between 73.8 and 75.5 and I have currently made a mental note that I’m not going to panic unless I go past 76.
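One crude way to put a number on this is to take the standard deviation of the readings as the “volatility”, and set the panic limit a couple of standard deviations above the average. The sketch below does that in Python. The readings are hypothetical values in the range the gym scale has shown me (I haven’t kept a log), and the two-standard-deviation rule is an assumption borrowed from how one might treat any noisy series, not an established rule for body weight.

```python
import statistics
from typing import Sequence, Tuple


def panic_limit(readings: Sequence[float], k: float = 2.0) -> Tuple[float, float, float]:
    """Return (mean, volatility, limit) for a series of weight readings.

    volatility is the sample standard deviation of the readings;
    limit is mean + k * volatility.
    """
    if len(readings) < 2:
        raise ValueError("need at least two readings")
    mean = statistics.mean(readings)
    volatility = statistics.stdev(readings)
    return mean, volatility, mean + k * volatility


# Hypothetical gym readings in kilograms, roughly twice a week.
readings = [74.6, 73.8, 75.1, 74.2, 75.5, 74.9, 74.0, 74.7]
mean, volatility, limit = panic_limit(readings)

print(f"mean {mean:.1f} kg, volatility {volatility:.2f} kg, panic above {limit:.1f} kg")
```

Readings taken at the same time of day would make the estimate cleaner, since much of the variation described above comes from food and water.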

I wonder if I’m making enough allowances for the volatility of my own body weight, and if I should reset my panic limits. I have other metrics to track my weight also. Though my various trousers are all calibrated as “size 34”, some have smaller waists than the others. My algo every morning is to try on my pants starting from the smallest available, and go to work in the first one that fits. And when I’m having trouble buttoning up my black chinos, that’s another alarm button.

Yeah sometimes I do think I’m too paranoid about my weight, but again it’s due to the speed at which I reduced that I’m anxious to make sure I don’t go back up at the same rate!

Update

Economist Ajay Shah sends me (and other members of a mailing list we belong to) this wonderful piece he has put together on weight management. Do read. But my question remains – how do you measure your body’s weight volatility?

Arranged Scissors 15: Stud and Fighter Beauty

Ok so here we come to the holy grail. The grand unification. Kunal Sawardekar can scream even more loudly now. Two concepts that I’ve much used and abused over the last year or so come together. In a post that will probably be the end of both these concepts in the blogging format. I think I want to write books. I want to write two books – one about each of these concepts. And after thinking about it, I don’t think a blook makes sense. Too many readers will find it stale. So, this post signals the end of these two concepts in blog format. They’ll meet you soon, at a bookstore near you.

So this post is basically about how the aunties (basically women of my mother’s generation) evaluate a girl’s beauty and about how it significantly differs from the way most others evaluate it. For most people, beauty is a subjective thing. It is, as the proverb goes, in the eyes of the beholder. You look at the thing of beauty (not necessarily a joy forever) as a complete package. And decide whether the package is on the whole beautiful. It is likely that different people have different metrics, but they are never explicit. Thus, different people find different people beautiful, and everyone has his/her share of beauty.

So I would like to call that as the “stud” way of evaluating beauty. It is instinctive. It is about insights hitting your head (about whether someone is beautiful or not). It is not a “process”. And it is “quick”. And “easy” – you don’t sweat much to decide whether someone is beautiful or not. It is the stud way of doing it. It is the way things are meant to be. Unfortunately, women of my mother’s generation (and maybe earlier generations) have decided to “fighterize” this aspect also.

So this is how my mother (just to take an example) goes about evaluating a girl. The girl is first split into components. Eyes, nose, hair, mouth, lips, cheeks, symmetry, etc. etc. Each of these components has its own weightage (different women use different weightages for evaluation. However, for a particular woman, the weightage set is the same irrespective of who she is evaluating). And each gets marked on a 5-point Likert scale (that’s what my mother uses; others might use scales of different lengths).

There are both subject-wise cutoffs and aggregate cutoff (this is based on the weighted average of scores for each component). So for a girl to qualify as a “CMP daughter-in-law”, she has to clear each of the subject cutoffs and also the total. Again – different women use different sets of cutoffs, but a particular woman uses only one set. And so forth.
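Stripped of its subject matter, this is a standard “weighted score with per-component and aggregate cutoffs” rule, and it can be sketched in a few lines of Python. The component names are taken from the list above, but the weights, cutoffs and scores are all invented for illustration; I have no idea what numbers any particular evaluator actually uses.

```python
def qualifies(scores: dict, weights: dict, component_cutoffs: dict, aggregate_cutoff: float) -> bool:
    """Apply per-component cutoffs, then a cutoff on the weighted average.

    scores, weights and component_cutoffs are keyed by component name.
    Scores are on a 5-point scale (1 to 5).
    """
    if set(scores) != set(weights) or set(scores) != set(component_cutoffs):
        raise ValueError("scores, weights and cutoffs must cover the same components")
    # Subject-wise cutoffs: failing any single component disqualifies.
    for component, score in scores.items():
        if score < component_cutoffs[component]:
            return False
    # Aggregate cutoff on the weighted average of component scores.
    total_weight = sum(weights.values())
    weighted_average = sum(scores[c] * weights[c] for c in scores) / total_weight
    return weighted_average >= aggregate_cutoff


# One evaluator's fixed (hypothetical) weights and cutoffs.
WEIGHTS = {"eyes": 3, "nose": 1, "hair": 2, "symmetry": 4}
CUTOFFS = {"eyes": 3, "nose": 2, "hair": 2, "symmetry": 3}
AGGREGATE = 3.5

a = qualifies({"eyes": 4, "nose": 3, "hair": 3, "symmetry": 4}, WEIGHTS, CUTOFFS, AGGREGATE)
b = qualifies({"eyes": 5, "nose": 1, "hair": 5, "symmetry": 5}, WEIGHTS, CUTOFFS, AGGREGATE)
c = qualifies({"eyes": 3, "nose": 2, "hair": 2, "symmetry": 3}, WEIGHTS, CUTOFFS, AGGREGATE)

print(a, b, c)
```

The second case fails on a single component despite the highest weighted average, and the third clears every component but fails the total. That is about as far from the “stud” way of doing it as you can get.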

I wonder when this system came into being, and why. I wonder if people stopped trusting their own judgment on “overall beauty” because of which they evolved this scale. I wonder if it was societal pressure that led women to look for a CMP daughter-in-law, for which purpose they adopted this scale. It’s not “natural” so I can’t give a “selfish gene” argument in support of it. But I still wonder. And my mother still uses scales such as this to evaluate my potential bladees. Such are life.