Legacy Metrics

Yesterday (or was it the day before? I’ve lost track of time with full time WFH now) the Times of India Bangalore edition had two headlines.

One was the Karnataka education minister BC Nagesh talking about deciding on school closures on a taluk (sub-district) wise basis. “We don’t want to take a decision for the whole state. However, in taluks where test positivity is more than 5%, we will shut schools”, he said.

That was on page one.

And then somewhere inside the newspaper, there was another article. The Indian Council for Medical Research has recommended that “only symptomatic patients should be tested for Covid-19”. However, for whatever reason, Karnataka had decided to not go by this recommendation, and instead decided to ramp up testing.

These two articles are correlated, though the paper didn’t say they were.

I should remind you of one tweet, that I elaborated about a few days back:

 

The reason why Karnataka has decided to ramp up testing despite the advisory to the contrary is that changing policy at this point in time will mess with metrics. Yes, I stand by my tweet that test positivity ratio is a shit metric. However, with the government having accepted over the last two years that it is a good metric, it has become “conventional wisdom”. Everyone uses it because everyone else uses it.

And so you have policies on school shutdowns and other restrictive measures being dictated by this metric – because everyone else uses the same metric, using this “cannot be wrong”. It’s like the old adage that “nobody got fired for hiring IBM”.

ICMR’s message to cut testing of asymptomatic individuals is a laudable one – given that an overwhelming number of people infected by the incumbent Omicron variant of covid-19 have no symptoms at all. The reason it has not been accepted is that it will mess with the well-accepted metric.

If you stop testing asymptomatic people, the total number of tests will drop sharply. The people who are ill will get themselves tested anyways, and so the numerator (number of positive reports) won’t drop. This means that the ratio will suddenly jump up.
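To make the arithmetic concrete, here is a minimal sketch with made-up numbers (the test counts and positives below are assumptions for illustration, not Karnataka's actual figures). In this sketch the numerator falls a little, since a few asymptomatic positives are no longer found, but the denominator falls far more.

```python
def positivity(positives, tests):
    """Test positivity ratio: positive reports / tests conducted."""
    return positives / tests

# Hypothetical day under the current policy: asymptomatic people are tested too.
symptomatic_tests = 20_000
asymptomatic_tests = 80_000
symptomatic_positives = 4_000    # ill people, high hit rate (assumed)
asymptomatic_positives = 1_000   # travellers, random tests, low hit rate (assumed)

before = positivity(
    symptomatic_positives + asymptomatic_positives,
    symptomatic_tests + asymptomatic_tests,
)

# The same day if only symptomatic people are tested.
after = positivity(symptomatic_positives, symptomatic_tests)

print(f"before: {before:.1%}, after: {after:.1%}")
# prints "before: 5.0%, after: 20.0%"
```

Nothing about the disease changed between the two lines; only the testing policy did, and the ratio went from the "critical" 5% to four times that.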

And that needs new measures – while 5% is some sort of a “critical number” now (like it is with p-values), the “critical number” will be something else. Moreover, if only symptomatic people are to be tested, the number of tests a day will vary even more – and so the positivity ratio may not be as stable as it is now.

All kinds of currently carefully curated metrics will get messed up. And that is a big problem for everyone who uses these metrics. And so there will be pushback.

Over a period of time, I expect the government and its departments to come up with alternate metrics (like how banks have now come up with an alternative to LIBOR), after which the policy to cut testing for asymptomatic people will get implemented. Until then, we should bow to the “legacy metric”.

And if you didn’t figure out already, legacy metrics are everywhere. You might be the cleverest data scientist going around and you might come up with what you think might be a totally stellar metric. However, irrespective of how stellar it is, that people have to change their way of thinking and their process to process it means that it won’t get much acceptance.

The strategy I’ve come to is to either change the metric slowly, in stages (change it little by little), or to publish the new metric along with the old one. Depending on how clever the new metric is, one of the metrics will die away.

Metrics

Over the weekend, I wrote this on twitter:

 

Surprisingly (at the time of writing this at least), I haven’t got that much abuse for this tweet, considering how “test positivity” has been held as the gold standard in terms of tracking the pandemic by governments and commentators.

The reason why I say this is a “shit metric” is simple – it doesn’t give that much information. Let’s think about it.

For a (ratio) metric to make sense, both the numerator and the denominator need to be clearly defined, and there needs to be clear information content in the ratio. In this particular case, both the numerator and the denominator are clear – the latter is the number of people who got Covid tests taken, and the former is the number of these people who returned a positive test.

So far so good. Apart from being an objective measure, test positivity ratio is also a “ratio”, and thus normalised (unlike absolute number of positive tests).

So why do I say it doesn’t give much information? Because of the information content.

The problem with test positivity ratio is the composition of the denominator (now we’re getting into complicated territory). Essentially, there are many reasons why people get tested for Covid-19. The most obvious reason to get tested is that you are ill. Then, you might get tested when a family member is ill. You might get tested because your employer mandates random tests. You might get tested because you have to travel somewhere and the airline requires it. And so on and so forth.

Now, for each of these reasons for getting tested, we can define a sort of “prior probability of testing positive” (based on historical averages, etc). And the positivity ratio needs to be seen in relation to this prior probability. For example, in “peaceful times” (eg. Bangalore between August and November 2021), a large proportion of the tests would be “random” – people travelling or employer-mandated. And this would necessarily mean a low test positivity.

The other extreme is when the disease is spreading rapidly – few people are travelling or going physically to work. Most of the people who get tested are getting tested because they are ill. And so the test positivity ratio will be rather high.

Basically – rather than the ratio telling you how bad the covid situation is in a region, it is influenced by how bad the covid situation is. You can think of it as some sort of a Schrödinger-ian measurement.

That wasn’t an offhand comment. Because government policy is an important input into test positivity ratio. For example, take “contact tracing”, where contacts of people who have tested positive are hunted down and also tested. The prior probability of a contact of a covid patient testing positive is far higher than the prior probability of a random person testing positive.

And so, as and when the government steps up contact tracing (as it does in the early days of each new wave), test positivity ratio goes up, as more “high prior probability” people get tested. Similarly, whether other states require a negative test to travel affects positivity ratio – the more the likelihood that you need a test to travel, the more likely that “low prior probability” people will take the test, and the lower the ratio will be. Or when governments decide to “randomly test” people (pulling them off the streets or whatever), the ratio will come down.
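One way to see this is to treat the positivity ratio as a weighted average of the prior probabilities, weighted by how many tests are done for each reason. The sketch below uses invented priors and test counts (all of them assumptions, purely for illustration); the point is only that the ratio moves when the mix of tests moves, even if each prior stays fixed.

```python
# Hypothetical prior probabilities of testing positive, by reason for testing.
PRIORS = {
    "symptomatic": 0.30,
    "contact_of_patient": 0.15,
    "travel_or_employer": 0.01,
}

def expected_positivity(tests_by_reason, priors=PRIORS):
    """Expected positivity: test-weighted average of the per-reason priors."""
    total_tests = sum(tests_by_reason.values())
    expected_positives = sum(
        count * priors[reason] for reason, count in tests_by_reason.items()
    )
    return expected_positives / total_tests

# Assumed test mix in "peaceful times": mostly travel and employer tests.
peaceful = {
    "symptomatic": 1_000,
    "contact_of_patient": 1_000,
    "travel_or_employer": 18_000,
}

# Same situation, but the government steps up contact tracing.
contact_tracing_push = dict(peaceful, contact_of_patient=9_000)

print(f"peaceful mix:         {expected_positivity(peaceful):.2%}")
print(f"contact tracing push: {expected_positivity(contact_tracing_push):.2%}")
```

In this toy setup the ratio roughly doubles purely because more “high prior probability” people were tested, with no change in how the disease is spreading.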

In other words – the ratio can be easily gamed by governments, apart from just being influenced by government policy.

So what do we do now? How do we know whether the Covid-19 situation is serious enough to merit clamping down on people’s liberties? If test positivity ratio is a “shit metric” what can be a better one?

In this particular case (writing this on 3rd Jan 2022), absolute number of positive cases is as bad a metric as test positivity – over the last 3 months, the number of tests conducted in Bangalore has been rather steady. Moreover, the theory so far has been that Omicron is far less deadly than earlier versions of Covid-19, and the vaccination rate is rather high in Bangalore.

While defining metrics, sometimes it is useful to go back to first principles, and think about why we need the metric in the first place and what we are trying to optimise. In this particular case, we are trying to see when it makes sense to cut down economic activity to prevent the spread of the disease.

And why do we need lockdowns? To prevent hospitals from getting overwhelmed. You might remember the chaos of April-May 2021, when it was near impossible to get a hospital bed in Bangalore (even crematoriums had long queues). This is a situation we need to avoid – and the only one that merits lockdowns.

One simple measure we can use is to see how many hospital beds are actually full with covid patients, and if that might become a problem soon. Basically – if you can measure something “close to the problem”, measure it and use that as the metric. Rather than using proxies such as test positivity.

Because test positivity depends on too many factors, including government action. Because we are dealing with a new variant here, which is supposedly less severe. Because most of us have been vaccinated now, our response to getting the disease will be different. The change in situation means the old metrics don’t work.

It’s interesting that the Mumbai municipal corporation has started including bed availability in its daily reports.

Ronald Coase, Scott Adams and Intrapersonal Vertical Integration

I have a new HR policy. I call it “intrapersonal vertical integration”. Read on.

I

Back in the 1930s, economist Ronald Coase wrote an article on “the nature of the firm” (the link is to Wikipedia, not to the actual paper). It was a description of why people form companies and partnerships and so on, rather than all being gig workers negotiating each piece of work.

The key concept here was one of transaction costs – if everyone were to be a freelancer, like I was between 2012 and 2020 (both included), then for every little piece of work there would need to be a piece of negotiation.

“Can you build this dashboard for me?”
“Yes. That would be $10000”
“No, I’ll only pay $2000”
“9000”
“3000 final”
“get lost”

During my long period of freelancing, I internalised this, and came up with a “minimum order value” – a reasonable amount which could account for transaction costs like the above (just as I write this, I’m changing videos on Youtube for my wife, and she’s asking me to put 30 second videos. And I’m refusing saying “too much transaction cost. I need my hands for something else (blogging)” ).

This worked out fine for the projects that I actually got, but transaction costs meant that a lot of the smaller deals never worked out. I lost out on potential revenue from those, and my potential clients lost out on work getting done.

So, instead, if I were to be part of a company, like I am now, transaction costs are far lower. Yes, we might negotiate on exact specifications, or deadlines, but price was a single negotiation at the time I joined the firm. And so a lot more work gets done – better for me and better for the company. And this is why companies exist. It might sound obvious, but Coase put it in a nice and elegant theoretical framework.

II

I’ve written about this several times on my blog – Scott Adams’s theory that there are two ways in which you can be really successful.

1. Become the best at one specific thing.
2. Become very good (top 25%) at two or more things.

This is advice that I have taken seriously, and I’ve followed the second path. Being the best at one specific thing is too hard, and too random as well – “the best” is a sort of a zero sum game. Instead, being very good in a few things is easier to do, and as I’d said in one of my other posts on this, being very good in uncorrelated things is a clear winner.

I will leave this here and come back later on in the post, like how Dasharatha gave some part of the mango to Sumitra (second in line), and then decided to come back to her later on in the distribution.

III

I came up with this random theory the other day on the purpose of product managers. This theory is really random and ill-formed, and I haven’t bothered discussing it with any real product managers.

The need for product managers comes from software engineers’ insistence on specific “system requirement specifications”. 

I learnt software engineering in a formal course back in 2002. Back then, the default workflow for software engineering was the so-called “waterfall model”. It was a linear sequential thing where the first part of the process goes in clearly defining system requirement specifications. Then there would be an unambiguous “design document”. And only then would coding begin.

In that same decade (2000s), “agile” programming became a thing. This meant fast iterations and continuous improvements. Software would be built layer by layer. However, software engineers had traditionally worked only with precise specifications, and “ambiguous business rules” would throw them off. And so the role of the product manager was created – who would manage the software product in a way that they would interface with ambiguous business on one side, and precise software engineers on the other.

Their role was to turn ambiguity to certainty, and get work done. They would never be hands on – instead their job would be to give precise instructions to people who would be hands on.

I have never worked as either a software engineer or a product manager, but I don’t think I’d enjoy either job. On the one hand, I don’t like being given precise instructions, and instead prefer ambiguity. On the other, if I were to give precise instructions, I would rather use C++ or Python to give those instructions than English or Kannada. In other words, if I were to be precise in my communication, I would rather talk to a computer than to another human.

It possibly has to do with my work history. I spent a little over two years as a quant at a top tier investment bank. As part of the job, I was asked to write production code. I used to protest, saying writing C++ code wasn’t the best use of my time or effort. “But think about the effort involved in explaining your model to someone else”, the higher ups in the company would tell me. “Wouldn’t it be far easier to just code it yourself?”

IV

Coase reasoned that transaction costs are the reason why we need a firm. We don’t need frequent negotiations and transaction costs, so if people were to get together in the form of a firm, they could coordinate much better and get a lot more work done, with more value accruing to every party involved.

However, I don’t think Coase went far enough. Just putting people in one firm only eliminates one level of transaction costs – of negotiating conditions and prices. Even when you are in the same firm, coordinating with colleagues implies communication, and unless precise, the communication links can end up being the weak links in how much the firm can achieve.

Henry Ford’s genius was to recognise the assembly line (a literal conveyor belt) as a precise form of communication. The workers in his factories were pretty much automatons, doing their precise job, in the knowledge that everyone else was doing their own. The assembly line made communication simpler, and that allowed greater specialisation to unlock value in the firm – to the extent that each worker could get at least five dollars a day and the firm would still be profitable.

It doesn’t work so neatly in what can be classified as “knowledge industries”. Like with the product manager and the software engineer, there is a communication layer which, if it fails, can bring down the entire process.

And there are other transaction costs implied in this communication – let’s say you are building stuff that I need to build on to make the final product. Every time I think you need to build something slightly different, it involves a process of communication and negotiation. It requires the product manager to write a new section in the document. And when working on complex problems, this can increase the complexity multifold.

So we are back to Scott Adams (finally). Building on what I’d said before – you need to be “very good” at two or more things, and it helps if these things are uncorrelated (in terms of being able to add unique value). However, it is EVEN MORE USEFUL if the supposedly uncorrelated skills you have can be stacked, in a form of vertical integration.

In other words, if you are good at several things that are uncorrelated, where the output of one thing can be the input into another, you are a clear winner.

Adams, for example, is good at understanding business, he is funny and he can draw. The combination of the first two means that he can write funny business stories, and that he can also draw means he has created a masterpiece in the form of Dilbert.

Don’t get me wrong – you can have a genius storyteller and a genius artist come together to make great art (Goscinny and Uderzo, for example). However, it takes a lot of luck for a Goscinny to find his Uderzo, or vice versa. I haven’t read much Asterix but what I’m told by friends is that the quality dropped after Uderzo was forced to be his own Goscinny (after the latter died).

At a completely different level – I have possibly uncorrelated skills in understanding business and getting insight out of data. One dovetails into the other and so I THINK I’m doing well in business intelligence. If I were only good at business, and needed to keep asking someone to churn the data on each iteration, my output would be far far slower and poorer.

So I extend this idea into “intrapersonal vertical integration”. If you are good at two or more things, and one can lead into another, you have a truly special set of skills and can be really successful.

Putting it another way – in knowledge jobs, communication can be so expensive that if you can vertically integrate yourself across multiple jobs, you can add significant value even if you are not the best at each of the individual skills.

Finish

In knowledge work, communication is the weakest link, so the fewer levels of communication you have, the better and faster you can do your job. Even if you get the best for every level in your chain, the strength (or lack of it) of communication between them can mean that they produce suboptimal output.

Instead if you can get people who are just good at two or more things in the chain (rather than being the best at any one), you can add significantly better value.

Putting it another way, yes, I’m batting for bits-and-pieces players rather than genuine batsmen or bowlers. However, the difference between what I’m saying and cricket is that in cricket batting and bowling are not vertically integrated. If they were, bits and pieces players would work far far better.

The Downside

I’ve written about this before. While being good at uncorrelated things that dovetail into one another can be a great winning strategy, liquidity can be your enemy. That you are unique means that there aren’t too many like you. And so organisations may not want to bet too much on you – since you will be hard to replace. And decide to take the slack in communication and get specialists for each position instead.

PS: 

I have written a book on transaction costs and liquidity. As it happens, today it is on display at the Bangalore Literature Festival.

Cross posted on LinkedIn

Why calls are disruptive to work

It is well known in my company that I don’t like phone calls. I mean – they are useful at times, but they have their time and place. For most normal office communication, it is far easier to do it using chat or mail, and less disruptive to your normal work day.

Until recently, I hadn’t been able to really articulate why phone calls (this includes Meet / Zoom / Teams / whatever) are disruptive to work, but recently had an epiphany when I was either drunk or hungover (can’t remember which now) during/after a recent company party.

Earlier that day, during the said party, one colleague (let’s call him C1) had told me about another colleague (let’s call him C2) and his (C2’s) penchant for phone calls. “Sometimes we would have written a long detailed document”, C1 said, “and then C2 will say, ‘I have to make one small point here. Can you please call me?’. He’s just the opposite of you”

I don’t know why after this I started thinking about circuit switching and packet switching. And then I realised why I hate random office calls.

Currently I use a Jio connection for my phone. The thing with Jio (and 4G in general, I think) is that it uses packet switching for phone calls – it uses the same data network for calls as well. This is different from earlier 2G (and 3G as well, if I’m not wrong) networks where calls were made on a different voice (circuit switching) network. Back then, if you got a call, your phone’s data connection would get interrupted – no packets could be sent because your phone was connected through a circuit. It was painful.

Now, with packet switching for phone calls as well, the call “packets” and the browsing “packets” can coexist and co-travel on the “pipes” connecting the phone to the tower and the wide world beyond. So you can take phone calls while still using data.

Phone calls in the middle of work disrupt work in exactly the same way.

The thing with chatting with someone while you’re working is that you can multitask. You send a message and by the time they reply you might have written a line of code, or sent another message to someone else. This means chatting doesn’t really disrupt work – it might slow down work (since you’re also doing work in smaller packets now), but your work goes on. Your other chats go on. You don’t put your life on hold because of this call.

A work phone call (especially if it has to be a video call) completely disrupts this network. Suddenly you have to give one person (or persons) at the end of the line your complete undivided attention. Work gets put on hold. Your other conversations get put on hold. The whole world slows down for you.

And once you hang up, you have the issue of gathering the context again on what you were doing and what you were thinking about and the context of different conversations (this is a serious problem for me). Everything gets disrupted. Sometimes it is even difficult to start working again.

I don’t know if this issue is specific to me because of my ADHD (and hence the issues in restarting work). Actually – ADHD leads to another problem. You might be hyper focussing on one thing at work, and when you get a call you are still hyper focussed on the same thing. And that means you can’t really pay attention to the call you are on, and can end up saying some shit. With chat / email, you don’t need to respond to everything immediately, so you can wait until the hyper focus is over!

In any case, I’m happy that I have the reputation I have, that I don’t like doing calls and prefer to do everything through text. The only downside I can think of is that you have to put everything in writing.

PSA: Google Calendar now allows you to put “focus time” on your own calendar. So far I haven’t used it too much but plan to use it more in the near future.

 

Pipe jobs

Sangeet Paul Choudary, my friend from business school, became a global business guru essentially based on one idea – that businesses can either be “platforms” or “pipes”, and that a business that is a platform can add far more value than a business that is just a pipe.

If I think about it, I currently work for a company that can be best described as a pipe (rather than a platform) and I think it’s doing quite well. From that perspective, though a platform business can be more successful it’s possible to build a good pipe business as well.

All that aside – one random thought I’ve got in recent days is that pipes and platforms don’t apply to businesses alone. Even people can be “pipes”. Rather, certain people’s jobs make them pipes. In other words, they are pipe jobs.

What are pipe jobs? These are jobs where the person’s responsibility is to act as a pipe between two other people. The pair of people they connect can vary over time – but this is the essence of the job. Essentially the job is about acting as a bridge between two people.

The classic pipe job is the translator or interpreter – whose job is to literally ensure that two people who might otherwise find it hard to communicate can communicate.

However there are more such jobs. For example you must have come across people in your company who – irrespective of what you ask them, ask someone else for the answer. And then convey that answer to you. In other words – they are a pipe through which the question and answer flows.

That said, they need not ask the same person for the answer each time. Instead they might decide based on the question who the right person to ask might be. In fact that is a classic way in which they add value – by determining which two ends to connect themselves to.

Spokespersons and envoys, of course, are again classic pipes. They lack independent authority but represent their masters/mistresses, and act as a pipe between them and the rest of the world. Unlike the corporate pipes mentioned above, these people usually don’t add the additional value of figuring out which ends to connect.

So in a corporate context, how do you go from being a pipe to a platform? A risk averse way is to be a connector – to determine which two ends to connect each time you are asked something. I think there are several titles for this kind of role – seen a lot in software companies.

A more risky but much more rewarding way to get out of pipedom is to develop an opinion – you might still connect and represent people but over a period of time you learn and develop an opinion. So not every question needs to be forwarded to the other end of the pipe. However your years as a pipe would have helped you build credibility among the ends of the pipe. And so you can be a better pipe.

I think this theory is generic enough – most of you who work for companies should be able to think of several roles whose jobs essentially involve being a pipe!

What have I missed out on here?

Why online meetings work but not online conferences

Sitting through a “slip fielding meeting” this morning, I had an epiphany – on why office work and meetings have adjusted fairly well to online formats, but not conferences. It has to do with backchannel conversations.

In meetings where everyone is in the same room, there is naturally just one conversation. Everyone is speaking to everyone else at the same time. Unless the meeting is humongously large, it is considered rude for people to “cross talk” in the meeting, and hence there is just one conversation. Of course, in the last decade or so, people have taken to texting at meetings and stuff, but that is still small.

The advantage with moving this kind of a meeting online is that now crosstalk is fully legit, as long as you are doing it using text only. Anyway, everyone is sitting with their computers. All it takes is one simple alt-tab or command-tab, and you can chat away with others present in the meeting. In fact, this makes online meetings MORE efficient by increasing the information flow (since the main channel of large meetings is usually low throughput).

It is the other way round with conferences and events. In conferences and events, the whole point is backchannel conversation. Pretty much nobody is there to listen to the lectures or panel discussions anyways – all that most attendees want to do is to meet other attendees.

And off-line conferences are conveniently structured to enable such interaction. By having multiple parallel sessions, for example, it becomes legit to just stay out and talk to others. There is always a buzz in the corridors (one conference which was single-session-at-a-time only turned out to be bloody boring).

The other thing is that most backchannel and side channel conversations at conferences are between people who don’t yet know each other, and who are there for discovery. So you need to physically bump into someone to talk to them – you can’t randomly start a conversation with someone.

And this translates horribly to online. Online is great for backchannel and side channel conversations with people you already know well – like colleagues. When you don’t know most other people, side channel conversation is awkward. And the main channel content in conferences is largely useless anyway.

This is why it is important that conferences and seminars and other such events move to an offline format asap. For large work meetings we can continue online even after we’re all back at office.

PS: I’m firmly in the DJ D-Sol camp in terms of calling people back to work, at least to make them live in or close to the “home locations”. This way, you have the optionality to meet at short notice without planning, something that fully remote work makes really hard.

Modelling for accuracy

Recently I’ve been remembering the first assignment of my “quantitative methods 2” course at IIMB back in 2004. In the first part of that course, we were learning regression. And so this assignment involved a regression problem. Not too hard at first sight – maybe 3 explanatory variables.

We had been randomly divided into teams of four. I remember working on it in the Computer Centre, in close proximity to some other teams. I remember trying to “do gymnastics” – combining variables, transforming them, all in the hope of trying to get the “best possible R square”. From what I remember, most of the groups went “R square hunting” that day. The assignment had been cleverly chosen such that for an academic exercise, the R Square wasn’t very high.

As an aside – one thing a lot of people take a long time to come to terms with is that in “real life” (industry problems) R squares aren’t usually that high. Forecast accuracy isn’t that high. And that the elegant methods they had learnt back in school / academia may not be as elegant any more in industry. I think I’ve written about this, but I can’t find the link now.

Anyway, back to QM2. I remember the professor telling us that three groups would be chosen at random on the day of the assignment submission, and from each of these three groups one person would be chosen at random who would have to present the group’s solution to the class. I remember that the other three people in my group all decided to bunk class that day! In any case, our group wasn’t called to present.

The whole point of this massive build up is – our approach (and the approach of most other groups) had been all wrong. We had just gone in a mad hunt for R square, not bothering to figure out whether the wild transformations and combinations that we were making made any business sense. Moreover, in our mad hunt for R square, we had all forgotten to consider whether a particular variable was significant, and if the regression itself was significant.

What we learnt was that while R square matters, it is not everything. The “model needs to be good”. The variables need to make sense. In statistics you can’t just go about optimising for one metric – there are several others. And this lesson has stuck with me. And guides how I approach all kinds of data modelling work. And I realise that is in conflict with the way data science is widely practiced nowadays.
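A rough sketch of why R square alone misleads: the standard adjusted R square and overall F statistic for a linear regression with an intercept both penalise extra variables. The two models and their numbers below are hypothetical (not from the actual QM2 assignment), chosen only to show the effect.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R square for n observations and k explanatory variables."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

def f_statistic(r2, n, k):
    """Overall F statistic of the regression (k and n - k - 1 degrees of freedom)."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))

n = 30  # hypothetical number of observations

# Hypothetical models: (name, number of variables, R square).
models = [
    ("3 sensible variables", 3, 0.40),
    ("12 variables after 'gymnastics'", 12, 0.50),
]

for name, k, r2 in models:
    print(
        f"{name}: R2={r2:.2f}, "
        f"adjusted R2={adjusted_r2(r2, n, k):.2f}, "
        f"F={f_statistic(r2, n, k):.2f}"
    )
```

In this made-up case the "gymnastics" model wins on raw R square but has a lower adjusted R square and a much weaker F statistic, which is the kind of check we skipped while R square hunting. Neither number, of course, tells you whether the variables make business sense.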

The way data science is largely practiced in the wild nowadays is precisely a mad hunt for R Square (or area under ROC curve, if you’re doing a classification problem). Whether the variables used make sense doesn’t matter. Whether the transformations are sound doesn’t matter. It doesn’t matter at all whether the model is “good”, or appropriate – the only measure of goodness of the model seems to be the R square!

In a way, contests such as Kaggle have exacerbated this trend. In contests, typically, there is a precise metric (such as R Square) that you are supposed to maximise. With contests being evaluated algorithmically, it is difficult to evaluate on multiple parameters – especially not whether “the model is good”. And since nowadays a lot of data scientists hone their skills by participating in contests such as on Kaggle, they are tuned to simply go R square hunting.

Also, the big difference between Kaggle and real life is that in Kaggle, the model that you build doesn’t matter. It’s just a combination. You get the best R square. You win. You take the prize. You go home.

You don’t need to worry about how the data for the model was collected. The model doesn’t have to be implemented. No business decisions need to be made based on the model. Contest done, model done.

Obviously that is not how things work in real life. Building the model is only one in a long series of steps in solving the business problem. And when you focus too much on just one thing – the model’s accuracy on the data that you have been given – a lot can be lost in the rest of the chain (including application of the model in future situations).

And in this way, by focussing on just a small portion of the entire data science process (model building), I think Kaggle (and other similar competition platforms) has actually done a massive disservice to data science itself.

Tailpiece

This is completely unrelated to the rest of the post, but too small to merit a post of its own.

Suppose you ask a software engineer to sort a few datasets. He goes about applying bubble sort, heap sort, quick sort, insertion sort and a whole host of other techniques. And then picks the one that sorted the given datasets fastest.

That’s precisely how it seems “data science” is practiced nowadays.
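For what it’s worth, here is a rough Python sketch of what that engineer’s workflow would look like (all names here, such as `CANDIDATES` and `pick_fastest`, are made up for illustration, and I’ve kept it to three sorts to keep it short):

```python
import random
import time

def bubble_sort(items):
    a = list(items)
    for end in range(len(a) - 1, 0, -1):
        for i in range(end):
            if a[i] > a[i + 1]:
                a[i], a[i + 1] = a[i + 1], a[i]
    return a

def insertion_sort(items):
    a = list(items)
    for i in range(1, len(a)):
        key, j = a[i], i - 1
        while j >= 0 and a[j] > key:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = key
    return a

def quick_sort(items):
    a = list(items)
    if len(a) <= 1:
        return a
    pivot = a[len(a) // 2]
    return (quick_sort([v for v in a if v < pivot])
            + [v for v in a if v == pivot]
            + quick_sort([v for v in a if v > pivot]))

CANDIDATES = {
    "bubble": bubble_sort,
    "insertion": insertion_sort,
    "quick": quick_sort,
}

def pick_fastest(datasets):
    """Time every candidate on the given datasets and return the quickest one."""
    timings = {}
    for name, sorter in CANDIDATES.items():
        start = time.perf_counter()
        for data in datasets:
            sorter(data)
        timings[name] = time.perf_counter() - start
    return min(timings, key=timings.get), timings

rng = random.Random(0)
datasets = [[rng.random() for _ in range(300)] for _ in range(5)]
winner, timings = pick_fastest(datasets)
print("winner on these datasets:", winner)
```

The “winner” only tells you what was fastest on these particular datasets on this particular machine. It says nothing about why, or about what will happen when the data is ten times larger or already nearly sorted.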

Junior Data Scientists

Since this is a work related post, I need to emphasise that all opinions in this are my own, and don’t reflect those of any organisation / organisations I might be affiliated with.

The last-released episode of my Data Chatter podcast is with Abdul Majed Raja, a data scientist at Atlassian. We mostly spoke about R and Python, the two programming languages / packages most used for data science, and spoke about their relative merits and demerits.

While we mostly spoke about R and Python, Abdul’s most insightful comment, in my opinion, had to do with neither. While talking about online tutorials and training, he spoke about how most tutorials related to data science are aimed at the entry level, for people wanting to become data scientists, and that there was very little readymade material to help people become better data scientists.

And from my vantage point, as someone who has been heavily trying to recruit data scientists through the course of this year, this is spot on. A lot of profiles I get (most candidates who apply to my team get put through an open ended assignment) seem uncorrelated with the stated years of experience on their CVs. Essentially, a lot of them just appear “very junior”.

This “juniority”, in most cases, comes through in the way that people have done their assignments. A telltale sign, for example, is an excessive focus on necessary but nowhere near sufficient things such as data cleaning, variable transformation, etc. Another telltale sign is the simple application of methods without bothering to explain why the method was chosen in the first place.

Apart from the lack of tutorials around, one reason why the quality of data science profiles continues to remain “junior” could be the organisation of teams themselves. To become better at your job, you need to interact with people who are better than you at your job. Unfortunately, the rapid rise in demand for data scientists in the last decade has meant that this peer learning is not always there.

Yes – if you are a bunch of data scientists working together, you can pull each other up. However, if many of you have come in through the same process, it is that much more difficult – there is no benchmark for you.

The other thing is the structure of the teams (I’m saying this with very little data, so call me out if I’m bullshitting) – unlike software engineers, data scientists seldom work in large teams. Sometimes they are scattered across the organisation, largely working with tech or business teams. In any case, companies don’t need that many data scientists. So the number is low to start off with as well.

Another reason is the structure of the market – for the last decade the demand for data scientists has far exceeded the available supply. So that has meant that there is no real reason to upskill – you’ll get a job anyway.

Abdul’s solution, in the absence of tutorials, is for data scientists to look at other people’s code. The R community, for example, has a weekly Tidy Tuesday data challenge, and a lot of people who take that challenge put up their code online. I’m pretty certain similar resources exist for Python (on Kaggle, if not anywhere else).

So for someone who wants to see how other data scientists work and learn from them, there are plenty of resources around.

PS: I want to record a podcast episode on the “pile stirring” epidemic in machine learning (where people simply throw methods at a dataset without really understanding why that should work, or understanding the basic math of different methods). So far I’ve been unable to find a suitable guest. Recommendations welcome.

Formal interactions

Over the last couple of years, as the covid-19 pandemic has hit us and people have been asked to work from home, there has been a raging debate on the utility of office, especially for “knowledge work” (where the only “tool” you need is a computer).

Some companies such as Twitter have announced a “remote work in perpetuity”. Others such as Goldman Sachs have declared that remote work is inefficient and people need to return to offices asap. I probably was closer to the Twitter position not so long ago, but now I think I’m firmly in the GS camp.

If you look at all the articles on remote work (I think Derek Thompson of The Atlantic has written some interesting pieces on this), one of the main arguments in favour of getting people to office is “informal interactions”, “bumping into colleagues”, “water cooler conversations”, etc. These kinds of unstructured interactions can lead to new thoughts, which lead to innovation, which leads to growth, goes the saying.

And in response to this, some companies have been trying to replicate these informal interactions in the zoom world. Instead of bumping into a colleague, you are forced to do a random “coffee chat” with a random colleague. There are online events. The hope here is that they will stand in for offline informal interactions.

Whether these events actually work or not, I don’t know. However, as I come close to a year in my job, it is not the informal interactions that I care about when I think of office vs remote. It’s “formal interactions”.

The lightbulb moment occurred earlier this week. I’m working on a fairly challenging problem with two others in my team. Two of the three of us were in office, and started talking about this problem. We drew some stuff on the whiteboard. Did some handwaving. And soon we had a new idea on how to approach this problem.

Now the task at hand was to explain this to the third guy, who is in another city. We opened Google Meet. We opened a “JamBoard” in that. I tried to replicate the whiteboard drawing, but he couldn’t see my handwaving (you realise that in video calls, video and screen share are two disjoint things!). It took a whole lot of effort to get the idea across.

This is not an isolated incident. In terms of collaborative work, I’ve found on multiple occasions that simply sitting together for a short duration of time can achieve so much more than what you can do in online meetings.

Another thing is that I’ve found that I get exhausted faster in online meetings. Maybe I speak louder. Maybe having to look in one particular direction for the duration of the meeting is stressful. Offline meetings I can keep going and going and going (especially when on methylphenidate). Online, 2-3 meetings and I’m exhausted.

And then you have new colleagues and onboarding. Employees at an early stage require an extremely high degree of collaborative work. You need to “show stuff” to your new colleagues. Sometimes you might just take over their laptop. There are times when they need interventions that take 2 minutes in the offline world, but that you need to schedule a meeting for online.

Notice that none of the stuff I’ve mentioned so far is “informal”. Maybe it’s the nature of the work – involving deep thinking and complicated ideas. Remote work is absolutely brilliant in terms of the ability to shut yourself off without distractions and do deep work. The moment you need to collaborate, though, you need to be in the same physical space as your collaborators.

It’s unlikely I’ll ever want to go back to office full time (as I said, working from home is brilliant for deep work). However, I do look forward to a permanent hybrid model, meeting in office at least once a week. Hopefully the pandemic will allow us to get to this sooner rather than later.

Oh, and informal interactions are only a bonus.

ADHD and the Bhagavad Gita

A couple of weeks back, I stumbled upon an article I had written for Huffington Post India a few years back about what it is like to live with ADHD. Until HuffPost India shut down, if you googled my name, one of the first links that you would find was this article. Now, the public version of the article is lost to posterity.

In any case, the draft lives on in my email outbox, and I have since forwarded it to a few people. This is how I begin that article:

There is a self-referential episode in the Mahabharata where sage Vyasa tries to get Ganesha to scribe the Mahabharata. Ganesha accepts the task, but imposes the condition that if Vyasa stopped dictating, he will stop writing and the epic will remain unfinished for ever.

If you have Attention Deficit Hyperactivity Disorder (ADHD), you would ideally want to work like Ganesha writing the Mahabharata – in long bursts where you are so constantly stimulated that there is no room for distraction. ADHD makes you a bad finisher, and makes you liable to abandon projects. You could be so distracted that it takes incredible effort to get back to the task. Once you are distracted, you might even forget that you were doing this task, and thus leave it unfinished. Moreover, ADHD makes it incredibly hard to do grunt-work, which is essential in finishing tasks or projects.

And earlier today, during one of my random distractions at work, I started thinking that this is not the only instance in the Mahabharata where ADHD makes an appearance. If you look at the Mahabharata in its fullest form, which includes the Bhagavad Gita (which, it appears, is a retrospective addition), ADHD makes yet another appearance.

If you distill the Bhagavad Gita to its bare essentials, the “principal component” will be this shloka:

कर्मण्येवाधिकारस्ते मा फलेषु कदाचन।
मा कर्मफलहेतुर्भूर्मा ते सङ्गोऽस्त्वकर्मणि॥ २-४७

In Roman scripts—

Karmanye vadhikaraste Ma Phaleshu Kadachana,
Ma Karmaphalaheturbhurma Te Sangostvakarmani

Googling threw up this translation (same site as the above quote):

The meaning of the verse is—

You have the right to work only but never to its fruits.
Let not the fruits of action be your motive, nor let your attachment be to inaction.

And I was thinking about it in the context of some work recently – for those of us with ADHD, this is a truism. Because unless we hyper focus on something (and the essence of ADHD is that you can’t choose what you want to hyper focus on), we have no attachments. It is like that “Zen email”.

Assume that there is a gap between the completion of the work and the observation of the “fruits” (results) of the work. By the time the fruits of the work are known, it is highly likely that you have completely forgotten about the work itself and moved on to hyper focus on something else.

In this case, whatever the result of the work, the fact that you have moved on means that you have become detached from the work that you did, and so don’t really care about the result. And that makes it easier for you to appreciate the result in a cold, rational and logical manner – if you happen to care about it at all, that is.

The only exception is if you had continued to hyperfocus on the work even after it was completed. In this kind of a situation, you become excessively attached to the work that you have done (and to an unhealthy level). And in this case you care about the flowers, fruits, seeds and subsequent plants of your work. Not a good state to be in, of course, but it doesn’t happen very often so it’s fine.

The other thing about ADHD and “moving on” is that you don’t get possessive of your past work, and you are more willing to tear down something you had built in the past (which doesn’t make sense any more) and start rebuilding it. Again, this can be both a negative (reinventing your own wheel / wasting time) and a positive (ability to improve).

Random line I just came up with – on average, people with ADHD are exactly the same as people without ADHD. Just that their distributions are different.