Stable Diffusion and Chat GPT and Logistic Regression

For a long time I have had a shibboleth for telling whether someone is a “statistics person” or a “machine learning person”: what they call a regression where the dependent variable is binary. Statisticians simply call it “logit” (there is also a “probit”); machine learning people call it “logistic regression”.

Now, in terms of implementation as well, there is one big difference between the way “logit” is modelled and the way “logistic regression” is. For a logit model (if you are using Python, you need the “statsmodels” package for this, not scikit-learn), the number of observations needs to far exceed the number of independent variables.

Else, a matrix that needs to be inverted as part of the solution will turn out to be singular, and there will be no solution. I guess I betrayed my greater background in statistics than in Machine Learning when, in 2018, I wrote this blogpost on machine learning being a “process to tie down coefficients in maths models“.

“Logistic regression” (as opposed to “logit”), on the other hand, imposes no such constraint – the matrix does not need to be invertible. Instead of actually inverting the matrix, machine learning approaches simply learn the coefficients iteratively using gradient descent (basically the opposite of hill climbing), so mathematical inconveniences such as matrices that cannot be inverted are moot there.

And so you have logistic regression models with thousands of variables, often calibrated with fewer data points than there are variables. To be honest, I can’t understand this fully – without sufficient information (data points) to calibrate the coefficients, there will always be a sense of randomness in the output. The model has too many degrees of freedom, and so there is additional information the model is supplying (apart from what was supplied in the training data!).
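To make the contrast concrete, here is a minimal sketch with made-up data, where the number of variables far exceeds the number of observations. The statsmodels “logit” route, which relies on matrix inversion, typically errors out or fails to converge on such data, while scikit-learn’s regularised, iteratively fitted “logistic regression” returns coefficients anyway.

import numpy as np
import statsmodels.api as sm
from sklearn.linear_model import LogisticRegression

# Made-up data: 50 observations, 200 independent variables, so the matrix
# that needs to be inverted is necessarily singular.
rng = np.random.default_rng(42)
X = rng.normal(size=(50, 200))
y = (rng.random(50) > 0.5).astype(int)

# The "statistics" route: Newton-type iterations that need a matrix inverse.
try:
    res = sm.Logit(y, sm.add_constant(X)).fit()
    print("statsmodels converged?", res.mle_retvals["converged"])
except Exception as e:
    print("statsmodels logit failed:", type(e).__name__)

# The "machine learning" route: iterative, regularised fitting; no inversion needed.
clf = LogisticRegression(max_iter=1000).fit(X, y)
print("scikit-learn fit", clf.coef_.size, "coefficients anyway")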

Of late I have been playing a fair bit with generative AI (primarily ChatGPT and Stable Diffusion). The other day, my daughter and I were alone in my in-laws’ house, and I told her “look I’ve brought my personal laptop along, if you want we can play with it”. And she had demanded that she “play with stable diffusion”. This is the image she got for “tiger chasing deer”.

I have written earlier here about how the likes of ChatGPT and Stable Diffusion in a way redefine “information content“.

[image: Stable Diffusion output for “tiger chasing deer”]

And if you think about it, almost by definition, “generative AI” creates information (and hallucinates, like in the above pic). Traditionally speaking, a “picture is worth a thousand words”, but if you can generate a picture with just a few words of prompt, the information content in it is far less than a thousand words.

In some sense, this reminds me of “logistic regression” once again. By definition (because it is generative), there is insufficient “tying down of coefficients”, because of which the AI inevitably ends up “adding value of its own”, which by definition is random.

So, you will end up getting arbitrary results. ChatGPT often gives you wrong answers to questions. Dall-E and Midjourney and Stable Diffusion will return nonsense images such as the above. Because a “generative AI” needs to create information, by definition, all the coefficients of the model cannot be well calibrated. 

And the consequence of this is that however good these AIs get, however much data is used to train them, there will always be an element of randomness to them. There will always be test cases where they give funny results.

No, AGI is not here yet.

More on CRM

On Friday afternoon, I got a call on my phone. It was a “+91 9818…” number, and my first instinct was that it was someone at work (my company is headquartered in Gurgaon), so I mentally prepared a “don’t you know I’m on vacation? can you call me on Monday instead” as I picked up the call.

It turned out to be Baninder Singh, founder of Savorworks Coffee. I had placed an order on his website on Thursday, and I half expected him to tell me that some of the things I had ordered were out of stock.

“Karthik, for your order of the Piñanas, you have asked for an Aeropress grind. Are you sure of this? I’m asking you because you usually order whole beans”, Baninder said. This was a remarkably pertinent observation, and an appropriate question from a seller. I confirmed to him that this was indeed deliberate (this smaller package is to take to office along with my Aeropress Go), and thanked him for asking. He went on to point out that one of the other coffees I had ordered had very limited stock, and that I should consider stocking up on it.

Some people might find this creepy (that the seller knows exactly what you order, and notices changes in your order), but from a more conventional retail perspective, this is brilliant. It is great that the seller has accurate information on your profile, and is able to detect any anomalies and alert you before something goes wrong.

Now, Savorworks is a small business (a Delhi based independent roastery), and having ordered from them at least a dozen times, I guess I’m one of their more regular customers. So it’s easy for them to keep track and take care of me.

It is similar with small “mom-and-pop” stores. A limited, high-repeat clientele means it’s easy for them to keep track of their customers and look after them. The challenge, though, is how do you scale this? Now, I’m by no means the only person thinking about this problem. Thousands of business people and data scientists and retailers and technology people and what not have pondered this question for over a decade now. Yet, what you find is that at scale you are simply unable to provide the sort of service you can at small scale.

In theory it should be possible for an AI to profile customers based on their purchases, cart additions, etc., and then provide them customised experiences. I’m sure tonnes of companies are already trying to do this. However, based on my experience, I don’t think anyone is doing this well.
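To make that concrete, here is a minimal sketch of the simplest version of such a profile (with made-up order data and column names), doing programmatically what Baninder did over the phone – learn each customer’s usual choice and flag an order that deviates from it.

import pandas as pd

# Hypothetical order history (made-up customers and columns), one row per order.
orders = pd.DataFrame({
    "customer": ["karthik"] * 12 + ["someone_else"] * 3,
    "grind":    ["whole beans"] * 11 + ["aeropress"] + ["espresso"] * 3,
})

# Profile: each customer's most frequent grind so far.
usual = orders.groupby("customer")["grind"].agg(lambda g: g.mode().iloc[0])

def flag_anomaly(customer: str, grind: str) -> bool:
    """True if this order deviates from the customer's usual choice."""
    return customer in usual.index and grind != usual[customer]

# A new order for an Aeropress grind from someone who usually buys whole beans:
print(flag_anomaly("karthik", "aeropress"))   # True -> worth a phone call, not an auto-ship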

I might sound like a broken record here, but my sense is that this is because the people who are building the algos are not the ones who are thinking of solving the business problems. The algos exist. In theory, if I look at stuff like stable diffusion or Chat GPT (both of which I’ve been playing around with extensively in the last 2 days), algorithms for stuff like customer profiling shouldn’t be THAT hard. The issue, I suspect, is that people have not been asking the right questions of the algos.

On one hand, you could have business people looking at patterns they have divined themselves and then giving precise instructions to the data scientists on how to detect them – and the detection of these patterns would have been hard coded. On the other, the data scientists would have had a free hand and would have done some unsupervised stuff without much business context. And both approaches lead to easily predictable algos that aren’t particularly intelligent.

Now I’m thinking of this as a “dollar bill on the road” kind of a problem. My instinct tells me that “solution exists”, but my other instinct tells that “if a solution existed someone would have found it given how many companies are working on this kind of thing for so long”.

The other issue with such algos is that the deeper you get into prediction, the harder it is. At the cohort (of hundreds of users) level, it should not be hard to profile. However, at the individual user level (at which the results of the algos are seen by customers) it is much harder to get right. So maybe there are good solutions but we haven’t yet seen them.

Maybe at some point in the near future, I’ll take another stab at solving this kind of problem. Until then, you have human intelligence and random algos.


Alcohol, dinner time and sleep

A couple of months back, I presented what I now realise is a piece of bad data analysis. At the outset, there is nothing special about this – I present bad data analysis all the time at work. In fact, I may even argue that as a head of Data Science and BI, I’m entitled to do this. Anyway, this is not about work.

In that piece, I had looked at some of the data I’ve been diligently collecting about myself for over a year, correlated it with the data collected through my Apple Watch, and found a correlation that on days I drank alcohol, my sleeping heart rate average was higher.

And so I had concluded that alcohol is bad for me. Then again, I’m an experimenter so I didn’t let that stop me from having alcohol altogether. In fact, if I look at my data, the frequency of having alcohol actually went up after my previous blog post, though for a very different reason.

However, having written this blog post, every time I drank, I would check my sleeping heart rate the next day. Most days it seemed “normal”. No spike due to the alcohol. I decided it merited more investigation – which I finished yesterday.

First, the anecdotal evidence – what kind of alcohol I have matters. Wine and scotch have very little impact on my sleep or heart rate (last year with my Ultrahuman patch I’d figured that they had very little impact on blood sugar as well). Beer, on the other hand, has a significant (negative) impact on heart rate (I normally don’t drink anything else).

Unfortunately, this data point (what kind of alcohol I drank, or how much) is not something I capture in my daily log. So it is impossible to analyse it scientifically.

Anecdotally I started noticing another thing – all the big spikes I had reported in my previous blogpost on the topic were on days when I kept drinking (usually with others) and then had dinner very late. Could late dinner be the cause of my elevated heart rate? Again, in the days after my previous blogpost, I would notice that late dinners would lead to elevated sleeping heart rates (even if I hadn’t had alcohol that day). Looking at my nightly heart rate graph, I could see that the heart rate on these days would be elevated in the early part of my sleep.

The good news is this (dinner time) is a data point I regularly capture. So when I finally got down to revisiting the analysis yesterday, I had a LOT of data to work with. I won’t go into the intricacies of the analysis (and all the negative results) here. But here are the key insights.

If I regress my resting heart rate against the binary of whether I had alcohol the previous day, I get a significant regression, with an R^2 of 6.1% (i.e. whether I had alcohol the previous day or not explains 6.1% of the variance in my sleeping heart rate). If I have had alcohol the previous day, my sleeping heart rate is higher by about 2 beats per minute on average.

Call:
lm(formula = HR ~ Alcohol, data = .)

Residuals:
    Min      1Q  Median      3Q     Max 
-9.6523 -2.6349 -0.3849  2.0314 17.5477 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  69.4849     0.3843 180.793  < 2e-16 ***
AlcoholYes    2.1674     0.6234   3.477 0.000645 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 3.957 on 169 degrees of freedom
Multiple R-squared:  0.06676,   Adjusted R-squared:  0.06123 
F-statistic: 12.09 on 1 and 169 DF,  p-value: 0.000645

Then I regressed my resting heart rate on dinner time (expressed in hours) alone. Again a significant regression but with a much higher R^2 of 9.7%. So what time I have dinner explains a lot more of the variance in my resting heart rate than whether I’ve had alcohol. And each hour later I have my dinner, my sleeping heart rate that night goes up by 0.8 bpm.

Call:
lm(formula = HR ~ Dinner, data = .)

Residuals:
    Min      1Q  Median      3Q     Max 
-7.6047 -2.4551 -0.0042  2.0453 16.7891 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  54.7719     3.5540  15.411  < 2e-16 ***
Dinner        0.8018     0.1828   4.387 2.02e-05 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 3.881 on 169 degrees of freedom
Multiple R-squared:  0.1022,    Adjusted R-squared:  0.09693 
F-statistic: 19.25 on 1 and 169 DF,  p-value: 2.017e-05

Finally, for the sake of completeness, I regressed against both. The interesting thing is that the adjusted R^2 pretty much added up – giving me > 16% now (so effectively the two (dinner time and alcohol) are uncorrelated). The coefficients are pretty much the same once again.
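For what it’s worth, here is a minimal sketch of how the same combined regression could be run in Python with statsmodels, assuming a hypothetical daily-log file with the HR, Alcohol and Dinner columns used above.

import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical daily-log file; assumed columns: HR (sleeping heart rate),
# Alcohol ("Yes"/"No") and Dinner (dinner time in hours, e.g. 18.5 for 6:30 pm).
log = pd.read_csv("daily_log.csv")

# Same specification as the R regressions above, with both regressors in one model.
model = smf.ols("HR ~ Alcohol + Dinner", data=log).fit()
print(model.summary())       # coefficients for AlcoholYes and Dinner
print(model.rsquared_adj)    # roughly the two adjusted R^2s added together, if the regressors are uncorrelated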


So the takeaway is simple – alcohol might be okay, but have dinner at my regular time (~ 6pm). Also – if I’m going out drinking, I better finish my dinner and go. And no – having beer won’t work – it is going to be another dinner in itself. So stick to wine or scotch.

I must mention things I analysed against and didn’t find significant – whether I have coffee, what time I sleep, the time gap between dinner time and sleep time – all of these have no impact on my resting heart rate. All that matters is alcohol and when I have dinner.

And the last one is something I should never compromise on.


Mo Salah and Machine Learning

First of all, I’m damn happy that Mo Salah has renewed his Liverpool contract. With Sadio Mane also leaving, the attack was looking a bit thin (I was distinctly unhappy with the Jota-Mane-Diaz forward line we used in the Champions League final. Lacked cohesion). Nunez is still untested in terms of “leadership”, and without Salah that would’ve left Firmino as the only “attacking leader”.

(non-technical readers can skip the section in italics and still make sense of this post)

Now that this is out of the way, I’m interested in seeing one statistic (for which I’m pretty sure I don’t have the data). For each of the chances that Salah has created, I want to look at the xG (expected goals) and whether he scored or not. And then look at a density plot of xG for both categories (scored or not). 

For most players, this is likely to result in two very distinct curves – they are likely to score from a large % of high xG chances, and almost not score at all from low xG chances. For Salah, though, the two density curves are likely to be a lot closer.

What I’m saying is – most strikers score well from easy chances, and fail to score from difficult chances. Salah is not like that. On the one hand, he creates and scores some extraordinary goals out of nothing (low xG). On the other, he tends to miss a lot of seemingly easy chances (high xG).

In fact, it is quite possible to look at a player like Salah, see a few sitters that he has missed (he misses quite a few of them), and think he is a poor forward. And if you look at a small sample of data (or short periods of time) you are likely to come to the same conclusion. Look at the last 3-4 months of the 2021-22 season. The consensus among pundits then was that Salah had become poor (and on Reddit, you could see Liverpool fans arguing that we shouldn’t give him a lucrative contract extension since ‘he has lost it’).

It is well possible that this is exactly the conclusion Jose Mourinho came to back in 2013-14 when he managed Salah at Chelsea (and gave him very few opportunities). The thing with a player like Salah is that he is so unpredictable that it is very possible to look at small samples and think he is useless.

Of late, I’ve been doing (rather, supervising (and there is no pun intended)) a lot of machine learning work. A lot of this has to do with binary classification – classifying something as either a 0 or a 1. Data scientists build models, which give out a probability score that the thing is a 1, and then use some (sometimes arbitrary) cutoff to determine whether the thing is a 0 or a 1.

There are a bunch of metrics in data science on how good a model is, and it all comes down to what the model predicted and what “really” happened. And I’ve seen data scientists work super hard to improve on these accuracy measures. What can be done to predict a little bit better? Why is this model only giving me 77% ROC-AUC when for the other problem I was able to get 90%?

The thing is – if the variable you are trying to predict is something like whether Salah will score from a particular chance, your accuracy metric will be really low indeed. Because he is fundamentally unpredictable. It is the same with some of the machine learning stuff – a lot of models are trying to predict something that is fundamentally unpredictable, so there is a limit on how accurate the model will get.

The problem is that you would have come across several problem statements that are much more predictable, so you think it is a problem with you (or your model) that you can’t predict better. Pundits (or Jose) would have seen so many strikers who predictably score from good chances that they think Salah is not good.

The solution in these cases is to look at aggregates. Looking at each single prediction will not take us anywhere. Instead, can we check whether, over a large set of data, we broadly got it right? In my “research” for this blogpost, I found this.

Last season, on average, Salah scored precisely as many goals as the model would’ve predicted! You might remember stunners like the one against Manchester City at Anfield. So you know where things got averaged out.
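A minimal sketch of that kind of aggregate check, assuming a hypothetical file of chances with xg and scored columns (the same idea carries over to any binary classifier’s predicted probabilities):

import pandas as pd

# Hypothetical data: one row per chance, with the model's xG and whether it was scored.
chances = pd.read_csv("salah_chances.csv")   # assumed columns: xg, scored (0/1)

# Per-chance prediction is hopeless for an unpredictable finisher, so aggregate instead.
print("expected goals:", round(chances["xg"].sum(), 1))
print("actual goals:  ", int(chances["scored"].sum()))

# The same idea in buckets: is the model "broadly right" within each xG band?
chances["band"] = pd.cut(chances["xg"], bins=[0, 0.1, 0.3, 0.6, 1.0])
print(chances.groupby("band", observed=True)[["xg", "scored"]].mean())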

So many numbers! Must be very complicated!

The story dates back to 2007. Fully retrofitting, I was in what can be described as my first ever “data science job”. After having struggled for several months to string together a forecasting model in Java (the bugs kept multiplying and cascading), I’d given up and gone back to the familiarity of MS Excel and VBA (remember that this was just about a year after I’d finished my MBA).

My seat in the office was near a door that led to the balcony, where smokers would gather. People walking to the balcony, with some effort, could see my screen. No doubt most of them would’ve seen me spending 90% (or more) of my time on Google Talk (it’s ironic that I now largely use Google Chat for work). If someone came at an auspicious time, though, they would see me really working, which meant using MS Excel.

I distinctly remember this one time this guy who shared my office cab walked up behind me. I had a full sheet of Excel data and was trying to make sense of it. He took one look at my screen and exclaimed, “oh, so many numbers! Must be very complicated!” (FWIW, he was a software engineer). I gave him a fairly dirty look, wondering what was complicated about a fairly simple dataset on Excel. He moved on, to the balcony. I moved on, with my analysis.

It is funny that, fifteen years down the line, I have built my career in data science. Yet, I just can’t make sense of large sets of numbers. If someone sends me a sheet full of numbers I can’t make head or tail of it. Maybe I’m a victim of my own obsessions, where I spend hours visualising data so I can make some sense of it – I just can’t understand matrices of numbers thrown together.

At the very least, I need the numbers formatted well (in an Excel context, using either the “,” or “%” formats), with all numbers in a single column right aligned and rounded off to the exact same number of decimal places (it annoys me that by default, Excel autocorrects “84.0” (for example) to “84” – that disturbs this formatting. Applying “,” fixes it, though). Sometimes I demand that conditional formatting be applied on the numbers, so I know which numbers stand out (again I have a strong preference for red-white-green (or green-white-red, depending upon whether the quantity is “good” or “bad”) formatting). I might even demand sparklines.
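For what it’s worth, much of that decoration can be scripted before the numbers reach anyone; here is a minimal sketch using pandas’ Styler (the made-up numbers, the file name and the openpyxl/matplotlib dependencies are assumptions for illustration):

import numpy as np
import pandas as pd

# Made-up sheet of raw numbers.
df = pd.DataFrame(np.random.default_rng(0).normal(100, 25, size=(8, 4)),
                  columns=["Q1", "Q2", "Q3", "Q4"])

styled = (
    df.style
      .format("{:,.1f}")                   # the "," format: separators, fixed decimals
      .background_gradient(cmap="RdYlGn")  # red-to-green conditional formatting (needs matplotlib)
)
styled.to_excel("formatted.xlsx", engine="openpyxl")   # needs openpyxl installed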

But send me a sheet full of numbers and without any of the above mentioned decorations, and I’m completely unable to make any sense or draw any insight out of it. I fully empathise now, with the guy who said “oh, so many numbers! must be very complicated!”

And I’m supposed to be a data scientist. In any case, I’d written a long time back about why data scientists ought to be good at Excel.

Alcohol and sleep

A few months back we’d seen this documentary on Netflix (I THINK) on the effects of alcohol on health. Like you would expect from a well-made documentary (rather than a polemic), the results were inconclusive. There were a few mildly positive effects, some negative effects, some indicators on how alcohol can harm your health, etc.

However, the one thing I remember from that documentary is about alcohol’s effect on sleep – that drinking makes you sleep worse (contrary to popular imagination where you can easily pass out if you drink a lot). And I have now managed to validate that for myself using data.

The more perceptive of you might know that I log my life. I have a spreadsheet where every day I record some vital statistics (sleep and meal times, anxiety, quality of work, etc. etc.). For the last three months I’ve also had an Apple Watch, which makes its own recordings of its vital statistics.

Until this morning these two data sets had been disjoint. Then I noticed an interesting pattern in my average sleeping heart rate, and decided to join them and do some analysis. A time series to start:

Notice the three big spikes in recent times. And they only seem to be getting higher (I’ll come to that in a bit).

And then sometimes a time series doesn’t do justice to patterns – absent the three recent big spikes it’s hard to see from this graph if alcohol has an impact on sleep heart rate. This is where a boxplot can help.
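A minimal sketch of that boxplot, assuming a hypothetical joined file with HR and Alcohol columns:

import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical joined dataset: Apple Watch sleeping heart rate plus my daily log.
log = pd.read_csv("daily_log.csv")   # assumed columns: HR, Alcohol ("Yes"/"No")

# One box per group makes the difference visible even when the time series hides it.
log.boxplot(column="HR", by="Alcohol")
plt.show()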

The difference is evident here – when I have alcohol, my heart rate during sleep is much higher, which means I don’t rest as well.

That said, like everything else in the world, it is not binary. Go back to the time series and see – I’ve had alcohol fairly often in this time period but my heart rate hasn’t spiked as much on all days. This is where quantity of alcohol comes in.

Most days when I drink, it’s largely by myself at home. A glass or two of either single malt or wine. And the impact on sleep is only marginal. So far so good.

On the 26th, a few colleagues had come home. We all drank Talisker. I had far more than I normally have. And so my heart rate spiked (79). And then on June 1st, I took my team out to Arbor. Pretty much for the first time in 2022 I was drinking beer. I drank a fair bit. 84.

And then on Saturday I went for a colleague’s birthday party. There were only cocktails. I drank lots of rum and coke (I almost never drink rum). 89.

My usual drinking, if you see, doesn’t impact my health that much. But big drinking is a big problem, especially if it’s a kind of alcohol I don’t normally drink.

Now, in the interest of experimentation, one of these days I need to have lots of wine and see how I sleep!

PS: FWIW Sleeping heart rate is uncorrelated with how much coffee I have

PS2: Another time I wrote about alcohol

PS3: Maybe in my daily log I need to convert the alcohol column from binary to numeric (and record the number of units of alcohol I drink)


Structures of professions and returns to experience

I’ve written here a few times about the concept of “returns to experience“. Basically, in some fields such as finance, the “returns to experience” is rather high. Irrespective of what you have studied or where, how long you have continuously been in the industry and what you have been doing has a bigger impact on your performance than your way of thinking or education.

In other domains, returns to experience is far less. After a few years in the profession, you would have learnt all you had to, and working longer in the job will not necessarily make you better at it. And so you see that the average 15 years experience people are not that much better than the average 10 years experience people, and so you see salaries stagnating as careers progress.

While I have spoken about returns to experience, till date, I hadn’t bothered to figure out why returns to experience is a thing in some, and only some, professions. And then I came across this tweetstorm that seeks to explain it.

Now, normally I have a policy of not reading tweetstorms longer than six tweets, but here it was well worth it.

It draws upon a concept called “cognitive flexibility theory”.

Basically, there are two kinds of professions – well-structured and ill-structured. To quickly summarise the tweetstorm, well-structured professions have the same problems again and again, and there are clear patterns. And in these professions, first principles are good to reason out most things, and solve most problems. And so the way you learn it is by learning concepts and theories and solving a few problems.

In ill-structured domains (eg. business or medicine), the concepts are largely the same but the way the concepts manifest in different cases are vastly different. As a consequence, just knowing the theories or fundamentals is not sufficient in being able to understand most cases, each of which is idiosyncratic.

Instead, study in these professions comes from “studying cases”. Business and medicine schools are classic examples of this. The idea with solving lots of cases is NOT that you will see the same patterns repeating in the new cases you encounter, but that having seen lots of cases, you might be able to reason HOW to approach a new case that comes your way (and the way you approach it is very likely novel).

Picking up from the tweetstorm once again:

[embedded excerpt from the tweetstorm]

It is not hard to see that when the problems are ill-structured or “wicked”, the more the cases you have seen in your life, the better placed you are to attack the problem. Naturally, assuming you continue to learn from each incremental case you see, the returns to experience in such professions is high.

In securities trading, for example, the market takes very many forms, and irrespective of what chartists will tell you, patterns seldom repeat. The concepts are the same, however. Hence, you treat each new trade as a “case” and try to learn from it. So returns to experience are high. And so when I tried to reenter the industry after 5 years away, I found it incredibly hard.

Chess, on the other hand, is well-structured. Yes, AlphaZero might come and go, but a lot of the general principles simply remain.

Having read this tweetstorm, gobbled a large glass of wine and written this blogpost (so far), I’ve been thinking about my own profession – data science. My sense is that data science is an ill-structured profession where most practitioners pretend it is well-structured. And this is possibly because a significant proportion of practitioners come from academia.

I keep telling people about my first brush with what can now be called data science – I was asked to build a model to forecast demand for air cargo (2006-7). The said demand being both intermittent (one order every few days for a particular flight) and lumpy (a single order could fill up a flight, for example), it was an incredibly wicked problem.

Having had a rather unique career path in this “industry” I have, over the years, been exposed to a large number of unique “cases”. In 2012, I’d set about trying to identify patterns so that I could “productise” some of my work, but the ill-structured nature of problems I was taking up meant this simply wasn’t forthcoming. And I realise (after having read the above-linked tweetstorm) that I continue to learn from cases, and that I’m a much better data scientist than I was a year back, and much much better than I was two years back.

On the other hand, because data science attracts a lot of people from pure science and engineering (classically well-structured fields), you see a lot of people trying to apply overly academic or textbook approaches to problems that they see. As they try to divine problem patterns that don’t really exist, they fail to recognise novel “cases”. And so they don’t really learn from their experience.

Maybe this is why I keep saying that “in data science, years of experience and competence are not correlated”. However, fundamentally, that ought NOT to be the case.

This is also perhaps why a lot of data scientists, irrespective of their years of experience, continue to remain “junior” in their thinking.

PS: The last few paragraphs apply equally well to quantitative finance and economics as well. They are ill-structured professions that some practitioners (thanks to well-structured backgrounds) assume are well-structured.

Compression Stereotypes

One of the most mindblowing things I learnt while I was doing my undergrad in Computer Science and Engineering was Lempel-Ziv-Welch (LZW) compression. It’s one of the standard compression algorithms used everywhere nowadays.

The reason I remember this is twofold – firstly, I remember implementing this as part of an assignment (our CSE program at IITM was full of those), and feeling happy to be coding in C rather than in the dreaded Java (which we had to use for most other assignments).

The other is that this is one of those algorithms that I “internalised” while doing something totally different – in this case I was having coffee/ tea with a classmate in our hostel mess.

I won’t go into the algorithm here. However, the basic concept is that as and when we see a new pattern, we give it a code, and every subsequent occurrence of that pattern is replaced by its corresponding code. And the beauty of it is that you don’t need to ship a separate dictionary – the compressed code itself encapsulates it.
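For the curious, here is a minimal sketch of the compression half in Python (a toy version for illustration, not the exact variant from that assignment):

def lzw_compress(data: str) -> list:
    # Start the dictionary with every single-character pattern.
    dictionary = {chr(i): i for i in range(256)}
    next_code = 256
    current = ""
    output = []
    for ch in data:
        candidate = current + ch
        if candidate in dictionary:
            current = candidate                 # keep extending a known pattern
        else:
            output.append(dictionary[current])  # emit the code for the known pattern
            dictionary[candidate] = next_code   # the new pattern gets the next code
            next_code += 1
            current = ch
    if current:
        output.append(dictionary[current])
    return output

data = "TOBEORNOTTOBEORTOBEORNOT"
codes = lzw_compress(data)
print(f"{len(data)} characters -> {len(codes)} codes")   # repetition means fewer codes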

Anyway, in practical terms, the more the same kind of patterns are repeated in the original file, the more the file can be compressed. In some sense, the more the repetition of patterns, the less the overall “information” that the original file can carry – but that discussion is for another day.

I’ve been thinking of compression in general and LZW compression in particular when I think of stereotyping. The whole idea of stereotyping is that we are fundamentally lazy, and want to “classify” or categorise or pigeon-hole people using the fewest number of bits necessary.

And so, we use lazy heuristics – gender, caste, race, degrees, employers, height, even names, etc. to make our assumptions of what people are going to be like. This is fundamentally lazy, but also effective – in a sense, we have evolved to stereotype people (and objects and animals) because that allows our brain to be efficient; to internalise more data by using fewer bits. And for this precise reason, to some extent, stereotyping is rational.

However, the problem with stereotypes is that they can frequently be wrong. We might see a name and assume something about a person, and they might turn out to be completely different. The rational response to this is not to beat oneself for stereotyping in the first place – it is to update one’s priors with the new information that one has learnt about this person.

So, you might have used a combination of pre-known features of a person to categorise him/her. The moment you realise that this categorisation is wrong, you ought to invest additional bits in your brain to classify this person so that the stereotype doesn’t remain any more.

The more idiosyncratic and interesting you are, the more the number of bits that will be required to describe you. You are very very different from any of the stereotypes that can possibly be used to describe you, and this means people will need to make that effort to try and understand you.

One of the downsides of being idiosyncratic, though, is that most people are lazy and won’t make the effort to use the additional bits required to know you, and so will grossly mischaracterise you using one of the standard stereotypes.

On yet another tangential note, getting to know someone is a Bayesian process. You make your first impressions of them based on whatever you find out about them, and go on building a picture of them incrementally based on the information you find out about them. It is like loading a picture on a website using a bad internet connection – first the picture appears grainy, and then the more idiosyncratic features can be seen.

The problem with refusing to use stereotypes, or demonising stereotypes, is that you fail to use the grainy pictures when that is the best available, and instead infinitely wait to get better pictures. On the other hand, failing to see beyond stereotypes means that you end up using grainy pictures when more clear ones are available.

And both of these approaches are suboptimal.

PS: I’ve sometimes wondered why I find it so hard to remember certain people’s faces. And I realise that it’s usually because they are highly idiosyncratic and not easy to stereotype / compress (both are the same thing). And so it takes more effort to remember them, and if I don’t really need to remember them so much, I just don’t bother.

Legacy Metrics

Yesterday (or was it the day before? I’ve lost track of time with full time WFH now) the Times of India Bangalore edition had two headlines.

One was the Karnataka education minister BC Nagesh talking about deciding on school closures on a taluk (sub-district) wise basis. “We don’t want to take a decision for the whole state. However, in taluks where test positivity is more than 5%, we will shut schools”, he said.

That was on page one.

And then somewhere inside the newspaper, there was another article. The Indian Council of Medical Research (ICMR) has recommended that “only symptomatic patients should be tested for Covid-19”. However, for whatever reason, Karnataka had decided not to go by this recommendation, and instead decided to ramp up testing.

These two articles are correlated, though the paper didn’t say they were.

I should remind you of one tweet, which I elaborated on a few days back:

[embedded tweet calling test positivity ratio a “shit metric”]

The reason why Karnataka has decided to ramp up testing despite the advisory to the contrary is that changing policy at this point in time will mess with metrics. Yes, I stand by my tweet that test positivity ratio is a shit metric. However, with the government having accepted over the last two years that it is a good metric, it has become “conventional wisdom”. Everyone uses it because everyone else uses it.

And so you have policies on school shutdowns and other restrictive measures being dictated by this metric – because everyone else uses the same metric, using this “cannot be wrong”. It’s like the old adage that “nobody got fired for hiring IBM”.

ICMR’s message to cut testing of asymptomatic individuals is a laudable one – given that an overwhelming majority of people infected by the currently dominant Omicron variant of covid-19 have no symptoms at all. The reason it has not been accepted is that it will mess with the well-accepted metric.

If you stop testing asymptomatic people, the total number of tests will drop sharply. The people who are ill will get themselves tested anyways, and so the numerator (number of positive reports) won’t drop. This means that the ratio will suddenly jump up.
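To put made-up numbers on it (purely illustrative, not actual Karnataka figures):

# Purely illustrative numbers, not actual data.
positives   = 2_000      # symptomatic people get tested and test positive either way
tests_now   = 100_000    # includes a lot of asymptomatic / mandated / "random" tests
tests_after = 30_000     # only symptomatic people get tested

print(positives / tests_now)     # 0.02  -> 2% positivity, below the 5% threshold
print(positives / tests_after)   # ~0.067 -> suddenly above the threshold, with no change in the disease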

And that needs new measures – while 5% is some sort of a “critical number” now (like it is with p-values), the “critical number” will be something else. Moreover, if only symptomatic people are to be tested, the number of tests a day will vary even more – and so the positivity ratio may not be as stable as it is now.

All kinds of currently carefully curated metrics will get messed up. And that is a big problem for everyone who uses these metrics. And so there will be pushback.

Over a period of time, I expect the government and its departments to come up alternate metrics (like how banks have now come up with an alternative to LIBOR), after which the policy to cut testing for asymptomatic people will get implemented. Until then, we should bow to the “legacy metric”.

And if you didn’t figure it out already, legacy metrics are everywhere. You might be the cleverest data scientist going around, and you might come up with what you think is a totally stellar metric. However, irrespective of how stellar it is, the fact that people have to change their way of thinking and their processes to adopt it means that it won’t get much acceptance.

The strategy I’ve settled on is either to change the metric slowly, in stages, or to publish the new metric alongside the old one. Depending on how clever the new metric is, one of the two will eventually die away.

Metrics

Over the weekend, I wrote this on twitter:

[embedded tweet calling test positivity ratio a “shit metric”]

Surprisingly (at the time of writing this at least), I haven’t got that much abuse for this tweet, considering how “test positivity” has been held up as the gold standard for tracking the pandemic by governments and commentators.

The reason why I say this is a “shit metric” is simple – it doesn’t give that much information. Let’s think about it.

For a (ratio) metric to make sense, both the numerator and the denominator need to be clearly defined, and there needs to be clear information content in the ratio. In this particular case, both are clear – the denominator is the number of people who got tested for Covid, and the numerator is the number of those people who returned a positive test.

So far so good. Apart from being an objective measure, test positivity ratio is also a “ratio”, and thus normalised (unlike the absolute number of positive tests).

So why do I say it doesn’t give much information? Because of the information content.

The problem with test positivity ratio is the composition of the denominator (now we’re getting into complicated territory). Essentially, there are many reasons why people get tested for Covid-19. The most obvious reason to get tested is that you are ill. Then, you might get tested when a family member is ill. You might get tested because your employer mandates random tests. You might get tested because you have to travel somewhere and the airline requires it. And so on and so forth.

Now, for each of these reasons for getting tested, we can define a sort of “prior probability of testing positive” (based on historical averages, etc). And the positivity ratio needs to be seen in relation to this prior probability. For example, in “peaceful times” (eg. Bangalore between August and November 2021), a large proportion of the tests would be “random” – people travelling or employer-mandated. And this would necessarily mean a low test positivity.

The other extreme is when the disease is spreading rapidly – few people are travelling or going physically to work. Most of the people who get tested are getting tested because they are ill. And so the test positivity ratio will be rather high.
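A toy calculation (all the priors and mixes below are made up) shows how much the ratio can move purely because the composition of test-takers changes, even when the per-reason priors stay the same:

# Made-up prior probabilities of testing positive, by reason for getting tested.
priors = {"ill": 0.40, "contact": 0.15, "travel_or_employer": 0.01}

def positivity(mix):
    """Overall test positivity for a given composition of test-takers."""
    return sum(share * priors[reason] for reason, share in mix.items())

peaceful = {"ill": 0.15, "contact": 0.05, "travel_or_employer": 0.80}   # mostly "random" tests
wave     = {"ill": 0.70, "contact": 0.25, "travel_or_employer": 0.05}   # mostly sick people

print(round(positivity(peaceful), 3))   # ~0.076 -> looks "low"
print(round(positivity(wave), 3))       # ~0.318 -> looks "high", purely because the mix changed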

Basically – rather than the ratio simply telling you how bad the covid situation in a region is, the ratio is itself shaped by how bad the situation is (through who chooses, or is required, to get tested). You can think of it as some sort of a Schrödinger-ian measurement.

That wasn’t an offhand comment. Because government policy is an important input into test positivity ratio. For example, take “contact tracing”, where contacts of people who have tested positive are hunted down and also tested. The prior probability of a contact of a covid patient testing positive is far higher than the prior probability of a random person testing positive.

And so, as and when the government steps up contact tracing (as it does in the early days of each new wave), test positivity ratio goes up, as more “high prior probability” people get tested. Similarly, whether other states require a negative test to travel affects the positivity ratio – the more the likelihood that you need a test to travel, the more likely that “low prior probability” people will take the test, and the lower the ratio will be. Or when governments decide to “randomly test” people (pulling them off the streets or whatever), the ratio will come down.

In other words – the ratio can be easily gamed by governments, apart from just being influenced by government policy.

So what do we do now? How do we know whether the Covid-19 situation is serious enough to merit clamping down on people’s liberties? If test positivity ratio is a “shit metric” what can be a better one?

In this particular case (writing this on 3rd Jan 2022), absolute number of positive cases is as bad a metric as test positivity – over the last 3 months, the number of tests conducted in Bangalore has been rather steady. Moreover, the theory so far has been that Omicron is far less deadly than earlier versions of Covid-19, and the vaccination rate is rather high in Bangalore.

While defining metrics, sometimes it is useful to go back to first principles, and think about why we need the metric in the first place and what we are trying to optimise. In this particular case, we are trying to see when it makes sense to cut down economic activity to prevent the spread of the disease.

And why do we need lockdowns? To prevent hospitals from getting overwhelmed. You might remember the chaos of April-May 2021, when it was near impossible to get a hospital bed in Bangalore (even crematoriums had long queues). This is a situation we need to avoid – and the only one that merits lockdowns.

One simple measure we can use is to see how many hospital beds are actually full with covid patients, and if that might become a problem soon. Basically – if you can measure something “close to the problem”, measure it and use that as the metric. Rather than using proxies such as test positivity.

Because test positivity depends on too many factors, including government action. Because we are dealing with a new variant here, which is supposedly less severe. Because most of us have been vaccinated now, our response to getting the disease will be different. The change in situation means the old metrics don’t work.

It’s interesting that the Mumbai municipal corporation has started including bed availability in its daily reports.