Chat GPT and Information Content

For the last week or so I’ve been fooling around with Chat GPT. It’s been a lot of fun, if you can endure the verbosity of its answers. Yesterday, at work, I used it for tech support when some people I was talking to were using too much jargon.

Before I came home from work yesterday, my last conversation was with a couple of recruiters, about a role that I’m hiring for. So when I got home, on a whim, I wondered if I could check whether Chat GPT could write job descriptions for me.

I asked:

write a job description for a senior data scientist at India’s largest logistics company. 4-7 years exp. Needs to have experience in ML, OR and statistics, and be able to work close to business and on ambiguous problems

It answered (pasting in full since the screenshot above is limited):

Job Title: Senior Data Scientist

Company: India’s Largest Logistics Company

Location: India

Experience: 4-7 years

We are seeking an experienced and highly skilled Senior Data Scientist to join our team at India’s largest logistics company. In this role, you will be responsible for using machine learning, operations research, and statistics to drive business insights and decision making.

As a Senior Data Scientist, you will have a deep understanding of data science techniques and technologies, and will be able to apply them to complex and ambiguous problems. You will be able to work closely with business teams to understand their needs and to develop solutions that drive value for the company.

To be successful in this role, you should have at least 4-7 years of experience in data science, and should have a strong background in machine learning, operations research, and statistics. You should be able to work independently and be comfortable working on complex and ambiguous problems.

Responsibilities:

  • Use machine learning, operations research, and statistics to drive business insights and decision making
  • Work closely with business teams to understand their needs and develop solutions
  • Develop and maintain data pipelines and systems
  • Collaborate with other data scientists and cross-functional teams to develop and implement solutions

Qualifications:

  • Bachelor’s or Master’s degree in a quantitative field, such as computer science, statistics, or engineering
  • At least 4-7 years of experience in data science
  • Strong background in machine learning, operations research, and statistics
  • Ability to work independently and on complex and ambiguous problems
  • Excellent communication and collaboration skills
  • Experience with Python, R, or other programming languages is a plus

We offer a competitive salary and benefits package, as well as opportunities for growth and advancement. If you are an experienced data scientist with a passion for driving business insights and decision making, we encourage you to apply.

In the past when I’ve been hiring, I’ve been a bit annoyed by people asking for job descriptions. I’ve largely found them to be fluff and boilerplate, and I’ve tried to get away with a sentence or two. However, due to the insistence of recruiters and candidates alike, I’ve ended up writing elaborate job descriptions. I don’t need to any more.

This post, however, is not about hiring (but yeah – I am hiring, and the above job description is reasonably accurate). It’s about information content in general.

Traditionally, information content has been measured in bits. A bit is the amount of information contained in an equi-probable binary event.
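To make the definition of a bit concrete, here is a minimal sketch of self-information in Python. The function name `information_bits` is my own choice for illustration; the formula (minus log base 2 of the probability) is the standard one.

```python
import math


def information_bits(probability: float) -> float:
    """Self-information, in bits, of an event that occurs with the given probability."""
    if not 0.0 < probability <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return -math.log2(probability)


# An equi-probable binary event (a fair coin flip) carries exactly one bit.
coin_flip = information_bits(0.5)

# A rarer event (one chance in eight) carries more information.
one_in_eight = information_bits(1 / 8)

print(coin_flip, one_in_eight)
```

The rarer the event, the more bits it carries, which is the sense in which boilerplate (highly predictable text) carries very little.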

Sometimes when we find that someone is too verbose and using too many words when fewer would suffice, we say that their bit rate is low. We also use “low bit rate” to describe people such as former Prime Minister Atal Behari Vajpayee, who would speak incredibly slowly.

However, beyond the bit, which is a fairly technical concept, it has been difficult to quantify information content. Sometimes you read an article or a story and find that there is nothing much to it. But given the natural language, and the context of various words, it is impossible to quantify the information content.

Now, with Chat GPT, maybe it becomes a bit easier (though one would need a “reverse chat GPT algo”, to find the set of prompts required for Chat GPT to churn out a particular essay). Above, for example, I’ve shown how much fluff there generally is to the average job description – a fairly short prompt generated this longish description that is fairly accurate.

So you can define the information content of a piece or essay in terms of the number of words in the minimum set of prompts required for Chat GPT (or something like it) to come up with it. If you are a boring stereotypical writer, the set of prompts required will be lower. If you are highly idiosyncratic, then you will need to give a larger number of prompts for Chat GPT to write like you. You know where I’m going.
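A rough sketch of this measure, assuming you already have the prompt that produced a piece (finding the minimum prompt is the hard part, and is not attempted here). The names `word_count` and `prompt_density` are hypothetical, and the strings are made up for illustration.

```python
def word_count(text: str) -> int:
    """Count whitespace-separated words."""
    return len(text.split())


def prompt_density(prompt: str, output: str) -> float:
    """Words of prompt needed per word of generated output.

    A low value suggests the output is mostly fluff; a high value suggests
    the writer had to spell out most of what ended up on the page.
    """
    output_words = word_count(output)
    if output_words == 0:
        raise ValueError("output must contain at least one word")
    return word_count(prompt) / output_words


# Made-up example: a 2-word prompt that yields an 8-word output.
density = prompt_density("job description", "we are seeking a highly skilled data scientist")
print(density)
```

This only compares lengths; it says nothing about whether a shorter prompt would have produced the same text.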

This evening, in office, a colleague commented that now it will be rather easy to generate marketing material. “Even blogs might become dead, since with a few prompts you can get that content”, he said (it can be a legit service to build off the Chat GPT API to take a tweet and convert it into an essay).

I didn’t tell him then but I have decided to take it up as a challenge. I consider myself to be a fairly idiosyncratic writer, which means I THINK there is a fair bit of information content in what I write, and so this blog will stay relevant. Let’s see how it goes.

PS: I still want to train a GAN on my blog (well over a million words, at last count) and see how it goes. If you know of any tools I can use for this, let me know!

 

Heads of departments

Recently I was talking to someone about someone else. “He got an offer to join XXXXXX as CTO”, the guy I was talking to told me, “but I told him not to take it. Problem with CTO role is that you just stop learning and growing. Better to join a bigger place as a VP”.

The discussion meandered for a couple of minutes when I added “I feel the same way about being head of analytics”. I didn’t mention it then (maybe it didn’t flash), but this was one of the reasons why I lobbied for (and got) taking on the head of data science role as well.

I sometimes feel lonely in my job. It is not something anyone in my company can do anything about. The loneliness is external – I sometimes find that I don’t have too many “peers” (across companies). Yes, I know a handful of heads of analytics / data science across companies, but it is just that – a handful. And I can’t claim to empathise with all of them (and I’m sure the feeling is mutual).

Irrespective of the career path you have chosen, there comes a point in your career where your role suddenly becomes “illiquid”. Within your company, you are the only person doing the sort of job that you are doing. Across companies, again, there are few people who do stuff similar to what you do.

The kind of problems they solve might be different. Different companies are structured differently. The same role name might mean very different things in very different places. The challenges you have to face daily to do your job may be different. And more importantly, you might simply be interested in doing different things.

And the danger that you can get into when you get into this kind of a role is that you “stop growing”. Unless you get sufficient “push from below” (team members who are smarter than you, and who are better than you on some dimensions), there is no natural way for you to learn more about the kind of problems you are solving (or the techniques). You find that your current level is more than sufficient to be comfortable in your job. And you “put peace”.

And then one day you find ten years have got behind you
No one told you when to run, you missed the starting gun

(I want you to now imagine the gong sound at the beginning of “Time” playing in your ears at this point in the blogpost)

One thing I tell pretty much everyone I meet is that my networking within my own industry (analytics and data science) is shit. And this is something I need to improve upon. Apart from the “push from below” (which I get), the only way to continue to grow in my job is to network with peers and learn from them.

The other thing is to read. Over the weekend I snatched the new iPad (which my daughter had been using; now she has got my wife’s old Macbook Air) and put all my favourite apps on it. I feel like I’m back in 2007 again, subscribing to random blogs (just that most of them are on substack now, rather than on Blogspot or Livejournal or WordPress), in the hope that I will learn. Let me see where this takes me.

And maybe some people decide that all this pain is simply not worth it, and choose to grow by simply becoming more managerial, and “building an empire”.

George Mallory and Metrics

It is not really known if George Mallory actually summited the Everest in 1924 – he died on that climb, and his body was only found in 1999 or so. It wasn’t his first attempt at scaling the Everest, and at 37, some people thought he was too old to do so.

There is this popular story about Mallory that after one of his earlier attempts at scaling the Everest, someone asked him why he wanted to climb the peak. “Because it’s there”, he replied.

George Mallory (extreme left) and companions

In the sense of adventure sport, that’s a noble intention to have. That you want to do something just because it is possible to do it is awesome, and can inspire others. However, one problem with taking quotes from something like adventure sport, and then translating it to business (it’s rather common to get sportspeople to give “inspirational lectures” to business people) is that the entire context gets lost, and the concept loses relevance.

Take Mallory’s “because it’s there” for example. And think about it in the context of corporate metrics. “Because it’s there” is possibly the worst reason to have a metric in place (or should we say “because it can be measured?”). In fact, if you think about it, a lot of metrics exist simply because it is possible to measure them. And usually, unless there is some strong context to it, the metric itself is meaningless.

For example, let’s say we can measure N features of a particular entity (take N = 4, and the features as length, breadth, height and weight, for example). There will be N! ways in which these metrics can be combined, and if you take all possible arithmetic operations, the number of metrics you can produce from these basic N metrics is insane. And you can keep taking differences and products and ratios ad infinitum, so with a small number of measurements, the number of metrics you can produce is infinite (both literally and figuratively). And most of them don’t make sense.
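To get a feel for how quickly this blows up, here is a small sketch that counts the metrics you get by combining the four measurements with the four basic arithmetic operations. The counting scheme (treating sums and products as unordered, differences and ratios as ordered, and not simplifying algebraically) is my own assumption for illustration.

```python
from itertools import combinations, permutations


def one_round(metrics):
    """Combine every pair of metrics with +, *, - and / and return the new metrics."""
    items = sorted(metrics)
    derived = set()
    # Sums and products do not depend on order, so each pair counts once.
    for a, b in combinations(items, 2):
        derived.add(f"({a} + {b})")
        derived.add(f"({a} * {b})")
    # Differences and ratios depend on order, so each pair counts twice.
    for a, b in permutations(items, 2):
        derived.add(f"({a} - {b})")
        derived.add(f"({a} / {b})")
    return derived


base = ["length", "breadth", "height", "weight"]
first = one_round(base)                # metrics built from two measurements
second = one_round(set(base) | first)  # combine everything once more

print(len(first), len(second))
```

One round gives 36 derived metrics, and a second round over those plus the originals gives 4,680. Most of them, such as "(length + weight)", add quantities that cannot meaningfully be added.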

That doesn’t normally dissuade our corporate “measurer”. That something can be measured, that “it’s there”, is sometimes enough reason to measure something. And soon enough, before you know it, Goodhart’s Law would have taken over, and that metric would have become a target for some poor manager somewhere (and of course, soon ceases to be a metric itself). And circular logic starts from there.

That something can be measured, even if it can be measured highly accurately, doesn’t make it a good metric.

So what do we do about it? If you are in a job that requires you to construct or design or make metrics, how can you avoid the “George Mallory trap”?

Long back when I used to take lectures on logical fallacies, I would have this bit on not mistaking correlation for causation. “Abandon your numbers and look for logic”, I would say. “See if the pattern you are looking at makes intuitive sense”.

I guess it is the same for metrics. It is all well to describe a metric using arithmetic. However, can you simply explain it in natural language, and can the listener easily understand what you are saying? And more importantly, does that make intuitive sense?

It might be fashionable nowadays to come up with complicated metrics (I do that all the time), in the hope that it will offer incremental benefit over something simpler, but more often than not the difficulty in understanding it makes the additional benefit moot. It is like machine learning, actually, where sometimes adding features can improve the apparent accuracy of the model, while you’re making it worse by overfitting.

So, remember that lessons from adventure sport don’t translate well to business. “Because it’s there” / “because it can be measured” is absolutely NO REASON to define a metric.

Speed, Accuracy and Shannon’s Channel Coding Theorem

I was probably the CAT topper in my year (2004) (they don’t give out ranks, only percentiles (to two digits of precision), so this is a stochastic measure). I was also perhaps the only (or one of the very few) person to get into IIMs that year despite getting 20 questions wrong.

It had just happened that I had attempted far more questions than most other people. And so even though my accuracy was rather poor, my speed more than made up for it, and I ended up doing rather well.

I remember this time during my CAT prep, where the guy who was leading my CAT factory once suggested that I was making too many errors so I should possibly slow down and make fewer mistakes. I did that in a few mock exams. I ended up attempting far fewer questions. My error rate (measured as % of answers I got wrong) didn’t change by much. So it was an easy decision to forget about accuracy and focus on speed and that served me well.

However, what serves you well in an entrance exam need not necessarily serve you well in life. An exam is, by definition, an artificial space. It is usually bounded by certain norms (of the format). And so, you can make blanket decisions such as “let me just go for speed”, and you can get away with it. In a way, an exam is a predictable space. It is a caricature of the world. So your learnings from there don’t extend to life.

In real life, you can’t “get away with 20 wrong answers”. If you have done something wrong, you are (most likely) expected to correct it. Which means, in real life, if you are inaccurate in your work, you will end up making further iterations.

Observing myself, and people around me (literally and figuratively at work), I sometimes wonder if there is a sort of efficient frontier in terms of speed and accuracy. For a given level of speed and accuracy, can we determine an “ideal gradient” – on which way a person needs to move in order to make the maximum impact?

Once in a while, I take book recommendations from academics, and end up reading (rather, trying to read) academic books. Recently, someone had recommended a book that combined information theory and machine learning, and I started reading it. Needless to say, within half a chapter, I was lost, and I had abandoned the book. Yet, the little I read performed the useful purpose of reminding me of Shannon’s channel coding theorem.

Paraphrasing, what it states is that irrespective of how noisy a channel is, using the right kind of encoding and redundancy, we will be able to predictably send across information at a certain maximum speed. The noisier the channel, the more the redundancy we will need, and the lower the speed of transmission.
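As a concrete instance, consider the binary symmetric channel, where each bit gets flipped with some probability. This particular channel is my choice for illustration; its capacity is the textbook result 1 minus the binary entropy of the flip probability. A minimal sketch:

```python
import math


def binary_entropy(p: float) -> float:
    """Entropy, in bits, of a binary event with probability p."""
    if p in (0.0, 1.0):
        return 0.0  # by convention, 0 * log(0) is taken as 0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def bsc_capacity(flip_probability: float) -> float:
    """Maximum reliable bits per transmitted bit on a binary symmetric channel."""
    return 1.0 - binary_entropy(flip_probability)


# The noisier the channel, the lower the achievable rate.
for flip_probability in (0.0, 0.01, 0.1, 0.5):
    print(flip_probability, round(bsc_capacity(flip_probability), 3))
```

A noiseless channel carries one bit per bit sent, and a channel that flips bits half the time carries nothing, however much redundancy you add.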

In my opinion (and in the opinions of several others, I’m sure), this is a rather profound observation, and has significant impact on various aspects of life. In fact, I’m prone to abusing it in inexact manners (no wonder I never tried to become an academic).

So while thinking of the tradeoff between speed and accuracy, I started thinking of the channel coding theorem. You can think of a person’s work (or “working mind”) as a communication channel. The speed is the raw speed of transmission. The accuracy (rather, the lack of it) is a measure of noise in the channel.

So the less accurate someone is, the more the redundancy they require in communication (or in work). For example, if you are especially prone to mistakes (like I am sometimes), you might need to redo your work (or at least a part of it) several times. If you are the more accurate types, you need to redo less often.

And different people have different speed-accuracy trade-offs.

I don’t have a perfect way to quantify this, but maybe we can think of “true speed of work” by dividing the actual speed at which someone does a piece of work by the number of iterations they need to get it right. OK it is not so straightforward (there might be other ways to build redundancy – like getting two independent people to do the same thing and then tally the numbers), but I suppose you get the drift.
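Here is that back-of-the-envelope calculation as a sketch. `true_speed` is the division described above. `expected_true_speed` adds an assumption of my own that is not a claim about how work really behaves: each attempt is independently right with a fixed probability, so the expected number of attempts is one over that probability. All the numbers are made up.

```python
def true_speed(raw_speed: float, iterations: int) -> float:
    """Raw speed divided by the number of iterations needed to get the work right."""
    if iterations < 1:
        raise ValueError("at least one iteration is needed")
    return raw_speed / iterations


def expected_true_speed(raw_speed: float, accuracy: float) -> float:
    """True speed if each attempt is right with probability `accuracy` (assumed model).

    The expected number of attempts is 1 / accuracy, so dividing by it
    is the same as multiplying by accuracy.
    """
    if not 0.0 < accuracy <= 1.0:
        raise ValueError("accuracy must be in (0, 1]")
    return raw_speed * accuracy


# Made-up numbers: a fast but sloppy worker versus a slow but careful one.
fast_sloppy = expected_true_speed(raw_speed=10.0, accuracy=0.6)
slow_careful = expected_true_speed(raw_speed=5.0, accuracy=0.95)

print(true_speed(10.0, 2), fast_sloppy, slow_careful)
```

With these particular numbers the fast and sloppy worker still comes out ahead, which is roughly what happened in the CAT; different numbers can easily flip the result.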

The interesting thing here is that the speed and accuracy depend not only on the person but also on the nature of the work itself. For me, a piece of work that on average takes 1 hour has a different speed-accuracy tradeoff compared to a piece of work that on average takes a day (usually, the more complicated and involved a piece of analysis, the more the error rate for me).

In any case, the point to be noted is that the speed-accuracy tradeoff is different for different people, and in different contexts. For some people, in some contexts, there is no point at all in expecting highly accurate work – you know they will make mistakes anyways, so you might as well get the work done quickly (to allow for more time to iterate).

And in a way, figuring out speed-accuracy tradeoffs of the people who work for you is an important step in getting the best out of them.

 

Financial ratio metrics

It’s funny how random things stick in your head a couple of decades later. I don’t even remember which class in IIMB this was. It surely wasn’t an accounting or a finance class. But it was one in which we learnt about some financial ratios.

I don’t even remember what exactly we had learnt that day (possibly return on invested capital?). I think it was three different financial metrics that can be read off a financial statement, and which then telescope very nicely together to give a fourth metric. I’ve forgotten the details, but I remember the basic concepts.

A decade ago, I used to lecture frequently on how NOT to do data analytics. I had this standard lecture that I called “smelling bullshit” that dealt with common statistical fallacies. Things like correlation-causation, or reasoning with small samples, or selection bias. Or stocks and flows.

One set of slides in that lecture was about not comparing stocks and flows. Most people don’t internalise it. It even seems like you cannot get a job as a journalist if you understand the distinction between stocks and flows. Every other week you see comparisons of someone’s net worth to some country’s GDP, for example. Journalists make a living out of this.

In any case, whenever I would come to these slides, there would always be someone in the audience with a training in finance who would ask “but what about financial ratios? Don’t we constantly divide stocks and flows there?”

And then I would go off into how we would divide a stock by a flow (typically) in finance, but we never compared a stock to a flow. For example, you can think of working capital as a ratio – you take the total receivables on the balance sheet and divide it by the sales in a given period from the income statement, to get “days of working capital”. Note that you are only dividing, not comparing the sales to the receivables. And then you take this ratio (which has dimension “days”) and then compare it across companies or across regions to do your financial analysis.

If you look at financial ratios, a lot of them have dimensions, though sometimes you don’t really notice it (I sometimes say “dimensional analysis is among the most powerful tools in data science”). Asset turnover, for example, is sales in a period divided by assets and has the dimension of inverse time. Inventory (total inventory on BS divided by sales in a period) has a dimension of time. Likewise working capital. Profit margins, however, are dimensionless.
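A small sketch of this dimensional bookkeeping, tracking the exponents of money and time through a division. The `Quantity` class, the `stock` and `flow` helpers and all the figures are hypothetical, made up purely to illustrate the idea.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Quantity:
    value: float
    money: int = 0  # exponent of the money dimension
    time: int = 0   # exponent of the time dimension (days)

    def __truediv__(self, other: "Quantity") -> "Quantity":
        # Dividing quantities subtracts the exponents of their dimensions.
        return Quantity(self.value / other.value,
                        self.money - other.money,
                        self.time - other.time)

    @property
    def dimension(self) -> tuple:
        return (self.money, self.time)


def stock(amount: float) -> Quantity:
    """A balance sheet item: money at a point in time."""
    return Quantity(amount, money=1, time=0)


def flow(amount: float, period_days: float) -> Quantity:
    """An income statement item: money per day over a period."""
    return Quantity(amount / period_days, money=1, time=-1)


receivables = stock(300.0)
assets = stock(2000.0)
sales = flow(3650.0, period_days=365)
profit = flow(365.0, period_days=365)

days_of_working_capital = receivables / sales  # stock / flow: time
asset_turnover = sales / assets                # flow / stock: inverse time
profit_margin = profit / sales                 # flow / flow: dimensionless

print(days_of_working_capital, asset_turnover.dimension, profit_margin.dimension)
```

Here 300 of receivables against sales of 10 a day works out to 30 days of working capital, and the dimension falls out of the arithmetic rather than having to be remembered.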

In any case, the other day at work I was trying to come up with a ratio for something. I kept doing gymnastics with numbers on an excel sheet, but without luck. And I had given up.

Nowadays I have started taking afternoon walks at office (whenever I go there), just after I eat lunch (I carry a box of lunch which I eat at my desk, and then go for a walk). And on today’s walk (or was it Tuesday’s?) I realised the shortcomings in my attempts to come up with a metric for whatever I was trying to measure.

I was basically trying too hard to come up with a dimensionless metric and kept coming up with some nonsense or the other. Somewhere during my walk, I thought of finance, and financial metrics. Light bulb lit up.

My mistake had been that I had been trying to come up with something dimensionless. The moment I realised that this metric needs to involve both stocks and flows, I had it. To be honest, I haven’t yet come up with the perfect metric (this is for those colleagues who are reading this and wondering what new metric I’ve come up with), but I’m on my way there.

Since both a stock and a flow need to be measured, the metric is going to be a ratio of both. And it is necessarily going to have dimensions (most likely either time or inverse time).

And if I think about it (again I won’t be able to give specific examples), a lot of metrics in life will follow this pattern – where you take a stock and a flow and divide one by the other. Not just in finance, not just in logistics, not just in data science: everywhere, it is useful to think of metrics that have dimensions, and express them using those dimensions.

Some product manager (I have a lot of friends in that profession) once told me that a major job of being a product manager is to define metrics. Now I’ll say that dimensional analysis is the most fundamental tool for a product manager.

A day at an award function

So I got an award today. It is called “exemplary data scientist”, and was given out by the Analytics India Magazine as part of their MachineCon 2022. I didn’t really do anything to get the award, apart from existing in my current job.

I guess having been out of the corporate world for nearly a decade, I had so far completely missed out on the awards and conferences circuit. I would see old classmates and colleagues put pictures on LinkedIn collecting awards. I wouldn’t know what to make of it when my oldest friend would tell me that whenever he heard “eye of the tiger”, he would mentally prepare to get up and go receive an award (he got so many I think). It was a world alien to me.

Parallelly, I used to crib about how while I’m well networked in India, and especially in Bangalore, my networking within the analytics and data science community is shit. In a way, I was longing for physical events to remedy this, and would lament that the pandemic had killed those.

So I was positively surprised when about a month ago Analytics India Magazine wrote to me saying they wanted to give me this award, and it would be part of this in-person conference. I knew of the magazine, so after asking around a bit on legitimacy of such awards and looking at who had got it the last time round, I happily accepted.

Most of the awardees were people like me – heads of analytics or data science at some company in India. And my hypothesis that my networking in the industry was shit was confirmed when I looked at the list of attendees – of 100 odd people listed on the MachineCon website, I barely knew 5 (of which 2 didn’t turn up at the event today).

Again I might sound like a n00b, but conferences like today are classic two sided markets (read this eminently readable paper on two sided markets and pricing of the same by Jean Tirole of the University of Toulouse). On the one hand are awardees – people like me and 99 others, who are incentivised to attend the event with the carrot of the award. On the other hand are people who want to meet us, who will then pay to attend the event (or sponsor it; the entry fee for paid tickets to the event was a hefty $399).

It is like “ladies’ night” that pubs have, where on particular days of the week, women who go to the pub get a free drink. This attracts women, which in turn attracts men who seek to court the women. And what the pub spends in subsidising the women it makes back in terms of greater revenue from the men on the night.

And so it was at today’s conference. I got courted by at least 10 people, trying to sell me cloud services, “AI services on the cloud”, business intelligence tools, “AI powered business intelligence tools”, recruitment services and the like. Before the conference, I had received LinkedIn requests from a few people seeking to sell me stuff at the conference. In the middle of the conference, I got a call from an organiser asking me to step out of the hall so that a sponsor could sell to me.

I held a poker face with stock replies like “I’m not the person who makes this purchasing decision” or “I prefer open source tools” or “we’re building this in house”.

With full benefit of hindsight, Radisson Blu in Marathahalli is a pretty good conference venue. An entire wing of the ground floor of the hotel is dedicated for events, and the AIM guys had taken over the place. While I had not attended any such event earlier, it had all the markings of a well-funded and well-organised event.

As I entered the conference hall, the first thing that struck me was the number of people in suits. Most people were in suits (though few wore ties; and, as if the conference expected people to turn up in suits, the goodie bag included a tie, a pair of cufflinks and a pocket square). And I’m just not used to that. Half the days I go to office in shorts. When I feel like wearing something more formal, I wear polo T-shirts with chinos.

My colleagues who went to the NSE last month to ring the bell to take us public all turned up in company T-shirts and jeans. And that’s precisely what I wore to the conference today, though I had recently procured a “formal uniform” (polo T-shirt with company logo, rather than my “usual uniform” which is a round neck T-shirt). I was pretty much the only person there in “uniform”. Towards the end of the day, I saw one other guy in his company shirt, but he was wearing a blazer over it!

Pretty soon I met an old acquaintance (who I hadn’t known would be at the conference). He introduced me to a friend, and we went for coffee. I was eating a cookie with the coffee, and had an insight – at conferences, you should eat with your left hand. That way, you don’t touch the food with the same hand you use to touch other people’s hands (surprisingly I couldn’t find sanitiser dispensers at the venue).

The talks, as expected, were nothing much to write about. Most were by sponsors selling their wares. The one talk that wasn’t by a sponsor was delivered by a guy who was introduced as “his greatgrandfather did this. His grandfather did that. And now this guy is here to talk about ethics of AI”. Full Challenge Gopalakrishna feels happened (though, unfortunately, the Kannada fellows I’d hung out with earlier that day hadn’t watched the movie).

I was telling some people over lunch (which was pretty good) that talking about ethics in AI at a conference has become like worshipping Ganesha as part of any elaborate pooja. It has become the de rigueur thing to do. And so you pay obeisance to the concept and move on.

The awards function had three sections. The first section was for “users of AI” (from what I understood). The second (where I was included) was for “exemplary data scientists”. I don’t know what the third was for (my wife is ill today so I came home early as soon as I’d collected my award), except that it would be given by fast bowler and match referee Javagal Srinath. Most of the people I’d hung out with through the day were in the Srinath section of the awards.

Overall it felt good. The drive to Marathahalli took only 45 minutes each way (I drove). A lot of people had travelled from other cities in India to reach the venue. I met a few new people. My networking in data science and analytics is still not great, but far better than it used to be. I hope to go for more such events (though we need to figure out how to do these events without the talks).

PS: Everyone who got the award in my section was made to line up for a group photo. As we posed with our awards, an organiser said “make sure all of you hold the prizes in a way that the Intel (today’s chief sponsor) logo faces the camera”. “I guess they want Intel outside”, I joked. It seemed to be well received by the people standing around me. I didn’t talk to any of them after that, though.

The “intel outside” pic. Courtesy: https://www.linkedin.com/company/analytics-india-magazine/posts/?feedView=all

 

Proof of work

I like to say sometimes that one reason I never really get crypto is that it involves the concept of “proof of work”. That phrase sort of triggers me. It reminds me of all the times when I was in school when I wouldn’t get full marks in maths despite getting all the answers correct because I “didn’t show working”.

In any case, I spent about fifteen minutes early this morning drinking my aeropress and deleting LinkedIn connection requests. Yeah, you read that right. It took that long to refuse all the connection requests I had got since yesterday, when I put a fairly innocuous post saying I’m hiring.

I understand that the market is rather tough nowadays. Companies are laying employees off ($) left right and centre (in fact, this (paywalled) article prompted my post – I’m hoping to find good value in the layoff market). Interest rates are going up. Stock prices are going down. Startup funding has slowed. The job market is not easy. And so you see an innocuous post like this getting such a massive reaction.

In any case, the reason I was thinking about “proof of work” is that the responses to my post reminded me of my own (unsuccessful) job hunts from a few years back. I remember randomly applying through LinkedIn. I remember using easy apply. And I remember pretty much not hearing back from anyone.

Time for a bollywood break:

Yes, the choice of where I’ve started this video is deliberate. As I was spending time this morning refusing all the LinkedIn connection requests (some 500+ people I have no clue about had simply added me without any manner of introduction or purpose), I was thinking of this song.

I followed a simple strategy – I engaged with people who had cared to write a note (or InMail) to me along with the connection request, and I just ignored the rest. As I kept hitting “ignore ignore ignore … ” on my phone (while sipping coffee with the other hand), I realised that I almost hit “ignore” on one of my company HRs who had added me. A few minutes later, I actually hit ignore on a colleague who I’ve actually worked with (I made amends by sending him back a connection request that he accepted).

Given the flood of requests that I had got, I was forced to use a broad brush. I was forced to use simple heuristics rather than evaluating each application on its true merit. I’m pretty sure I’ve made plenty of errors of omission today (that said, my heuristic has thrown up a bunch of fairly promising candidates).

In any case, if you think about it, the heuristic I used can pretty well be described as “proof of work”. And what the proof of work achieved here was to help people stand out in a crowded market. That there was some work showed a certain minimum threshold of interest, and that was sufficient to get my attention, which is all that mattered here. And on a related note, during normal times (when I get a maximum of one or two LinkedIn requests each day), I do take the effort to evaluate each request on its own merit. No proof of work is necessary.

And if you think about it, “proof of work” is rather prevalent in the natural world. A peacock’s feathers are the most commonly quoted example of this one. The beautiful tail comes at a huge cost in terms of agility and ability to fly, and the tail is a way for the peacock to show off to potential mates that “I can carry this thing and yet stay alive so imagine how fit my genes are. Mate with me”.

Anyway, back to the hiring market, you need a way to stand out. Maybe a nicely written cover letter. Maybe a referral (or “influence” as we used to pejoratively call this back in the 90s). Maybe a strong GitHub profile. (Ok the last one is literally a proof of work!)

Else you will just get swept away with the tide.

PS: In general, I was also thinking of the wisdom of writing to someone at a time when you know he/she will be flooded with other messages. The bar for you to stand out is much, much higher. Being contrarian helps, I guess.

So many numbers! Must be very complicated!

The story dates back to 2007. Fully retrofitting, I was in what can be described as my first ever “data science job”. After having struggled for several months to string together a forecasting model in Java (the bugs kept multiplying and cascading), I’d given up and gone back to the familiarity of MS Excel and VBA (remember that this was just about a year after I’d finished my MBA).

My seat in the office was near a door that led to the balcony, where smokers would gather. People walking to the balcony, with some effort, could see my screen. No doubt most of them would’ve seen me spending 90% (or more) of my time on Google Talk (it’s ironical that I now largely use Google Chat for work). If someone came at an auspicious time, though, they would see me really working, which was using MS Excel.

I distinctly remember this one time this guy who shared my office cab walked up behind me. I had a full sheet of Excel data and was trying to make sense of it. He took one look at my screen and exclaimed, “oh, so many numbers! Must be very complicated!” (FWIW, he was a software engineer). I gave him a fairly dirty look, wondering what was complicated about a fairly simple dataset on Excel. He moved on, to the balcony. I moved on, with my analysis.

It is funny that, fifteen years down the line, I have built my career in data science. Yet, I just can’t make sense of large sets of numbers. If someone sends me a sheet full of numbers I can’t make head or tail of it. Maybe I’m a victim of my own obsessions, where I spend hours visualising data so I can make some sense of it – I just can’t understand matrices of numbers thrown together.

At the very least, I need the numbers formatted well (in an Excel context, using either the “,” or “%” formats), with all numbers in a single column right aligned and rounded off to the exact same number of decimal places (it annoys me that by default, Excel autocorrects “84.0” (for example) to “84” – that disturbs this formatting. Applying “,” fixes it, though). Sometimes I demand that conditional formatting be applied on the numbers, so I know which numbers stand out (again I have a strong preference for red-white-green (or green-white-red, depending upon whether the quantity is “good” or “bad”) formatting). I might even demand sparklines.
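Outside Excel, the same basic hygiene can be sketched in a few lines of Python. This is only an illustration: the function name `format_column` and the sample values are made up, and it covers just the thousands separator, the fixed number of decimal places and the right alignment described above (no conditional formatting or sparklines).

```python
def format_column(values, decimals=1):
    """Format numbers with a thousands separator and a fixed number of
    decimal places, then right-align them to a common width."""
    # The fixed-point format keeps trailing zeros, so 84.0 stays "84.0"
    # instead of collapsing to "84".
    texts = [f"{v:,.{decimals}f}" for v in values]
    # Pad every entry on the left to the width of the longest one.
    width = max((len(t) for t in texts), default=0)
    return [t.rjust(width) for t in texts]


# Hypothetical sample values, purely for illustration.
for line in format_column([84.0, 1234.5, 7.31]):
    print(line)
```

With every number padded to the same width and rounded to the same number of decimals, the decimal points line up and the outliers in magnitude become visible at a glance.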

But send me a sheet full of numbers and without any of the above mentioned decorations, and I’m completely unable to make any sense or draw any insight out of it. I fully empathise now, with the guy who said “oh, so many numbers! must be very complicated!”

And I’m supposed to be a data scientist. In any case, I’d written a long time back about why data scientists ought to be good at Excel.

Recruitment and diversity

This post has potential to become controversial and is related to my work, so I need to explicitly state upfront that all opinions here are absolutely my own and do not, in any way, reflect those of my employers or colleagues or anyone else I’m associated with.

I run a rather diverse team. Until my team grew inorganically two months back (I was given more responsibility), there were eight of us in the team. Each of us has a master’s degree (ok we’re not diverse in that respect). Sixteen degrees / diplomas in total, from sixteen different colleges / universities. The team’s master’s degrees are in at least four disjoint disciplines.

I have built this part of my team from the ground up. And I have made absolutely no attempt to explicitly foster diversity in my team. Yet, I have a rather diverse team. You might think it is by accident. You might find weird axes on which the team is not diverse at all (master’s degrees are one). I simply think it is because there was no other way.

I like to think that I have fairly high standards when it comes to hiring. Based on the post-interview conversations I have had with my team members, these standards have percolated to them as well. This means we have a rather tough task hiring. This means very few people even qualify to be hired by my team. Earlier this year I asked for a bigger hiring budget. “Let’s see if you can exhaust what you’ve been given, and then we can talk”, I was told. The person who told me this was not being sarcastic – he was simply aware of my demand-supply imbalance.

Essentially, in terms of hiring I face such a steep demand-supply imbalance that even if I wanted to, it would be absolutely impossible for me to discriminate while hiring, either positively or negatively.

If I want to hire less of a certain kind of profile (whatever that profile is), I would simply be letting go of qualified candidates. Given how long it takes to find each candidate in general, imagine how much longer it would take to find candidates if I were to only look at a subset of applicants (to prefer a category I want more of in my team). Any kind of discrimination (apart from things critical to the job such as knowledge of mathematics and logic and probability and statistics, and communication) would simply mean I’m shooting myself in the foot.
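A back-of-the-envelope sketch makes the cost concrete. It rests on assumptions that are purely illustrative: each applicant clears the bar independently with probability `p_qualified`, only a fraction `share_considered` of applicants is looked at, and being qualified is unrelated to being in that fraction. Under those assumptions, the expected number of applicants per hire is the mean of a geometric distribution, 1 / (`p_qualified` × `share_considered`). The function name and the numbers below are hypothetical.

```python
def expected_applicants_per_hire(p_qualified, share_considered=1.0):
    """Expected number of applicants seen until one is both in the
    considered subset and qualified (mean of a geometric distribution)."""
    if not (0 < p_qualified <= 1 and 0 < share_considered <= 1):
        raise ValueError("both arguments must be in (0, 1]")
    return 1.0 / (p_qualified * share_considered)


# Hypothetical numbers: 1 in 100 applicants is good enough to hire.
everyone = expected_applicants_per_hire(0.01)
half_only = expected_applicants_per_hire(0.01, share_considered=0.5)
print(everyone, half_only)
```

Restricting the search to half the applicants doubles the expected search per hire. When qualified candidates are already rare, that doubling is a very long wait.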

Not all jobs, however, are like this. In fact, a large majority of jobs in the world are of the type where you don’t need a particularly rare combination of skills. This means potential supply (assuming you are paying decently, treating employees decently, etc.) far exceeds demand.

When you’re operating in this kind of a market, the cost of discrimination (either positive or negative) is rather low. If you were to rank all potential candidates, picking up number 25 instead of number 20 is not going to leave you all that worse off. And so you can start discriminating on axes that are orthogonal to what is required to do the job. And that way you can work towards a particular set of “diversity (or lack of it) targets”.

Given that a large number of jobs (not weighted by pay) belong to this category, the general discourse is that if you don’t have a diverse team it is because you are discriminating in a particular manner. What people don’t realise is that it is pretty much impossible to discriminate in some cases.

All that said, I still stand by my 2015 post on “axes on diversity”. Any externally visible axis of diversity – race / colour / gender / sex / sexuality – is likely to diminish diversity in thought. And – again this is my personal opinion – I value diversity in thought and approach much more than the visible sources of diversity.

Structures of professions and returns to experience

I’ve written here a few times about the concept of “returns to experience”. Basically, in some fields such as finance, the “returns to experience” are rather high. Irrespective of what you have studied or where, how long you have continuously been in the industry and what you have been doing has a bigger impact on your performance than your way of thinking or education.

In other domains, returns to experience are far lower. After a few years in the profession, you would have learnt all you had to, and working longer in the job will not necessarily make you better at it. And so you see that the average person with 15 years of experience is not that much better than the average person with 10 years, and so you see salaries stagnating as careers progress.

While I have spoken about returns to experience, till date, I hadn’t bothered to figure out why returns to experience is a thing in some, and only some, professions. And then I came across this tweetstorm that seeks to explain it.

Now, normally I have a policy of not reading tweetstorms longer than six tweets, but here it was well worth it.

It draws upon a concept called “cognitive flexibility theory”.

Basically, there are two kinds of professions – well-structured and ill-structured. To quickly summarise the tweetstorm, well-structured professions have the same problems again and again, and there are clear patterns. And in these professions, first principles are good to reason out most things, and solve most problems. And so the way you learn it is by learning concepts and theories and solving a few problems.

In ill-structured domains (e.g. business or medicine), the concepts are largely the same but the ways the concepts manifest in different cases are vastly different. As a consequence, just knowing the theories or fundamentals is not sufficient to understand most cases, each of which is idiosyncratic.

Instead, study in these professions comes from “studying cases”. Business and medicine schools are classic examples of this. The idea with solving lots of cases is NOT that you can see the same patterns in a new case that you see, but that having seen lots of cases, you might be able to reason HOW to approach a new case that comes your way (and the way you approach it is very likely novel).

Picking up from the tweetstorm once again:

 

It is not hard to see that when the problems are ill-structured or “wicked”, the more cases you have seen in your life, the better placed you are to attack the problem. Naturally, assuming you continue to learn from each incremental case you see, the returns to experience in such professions are high.

In securities trading, for example, the market takes very many forms, and irrespective of what chartists will tell you, patterns seldom repeat. The concepts are the same, however. Hence, you treat each new trade as a “case” and try to learn from it. So returns to experience are high. And so when I tried to reenter the industry after 5 years away, I found it incredibly hard.

Chess, on the other hand, is well-structured. Yes, AlphaZero might come and go, but a lot of the general principles simply remain.

Having read this tweetstorm, gobbled a large glass of wine and written this blogpost (so far), I’ve been thinking about my own profession – data science. My sense is that data science is an ill-structured profession where most practitioners pretend it is well-structured. And this is possibly because a significant proportion of practitioners come from academia.

I keep telling people about my first brush with what can now be called data science – I was asked to build a model to forecast demand for air cargo (2006-7). The said demand being both intermittent (one order every few days for a particular flight) and lumpy (a single order could fill up a flight, for example), it was an incredibly wicked problem.

Having had a rather unique career path in this “industry” I have, over the years, been exposed to a large number of unique “cases”. In 2012, I’d set about trying to identify patterns so that I could “productise” some of my work, but the ill-structured nature of problems I was taking up meant this simply wasn’t forthcoming. And I realise (after having read the above-linked tweetstorm) that I continue to learn from cases, and that I’m a much better data scientist than I was a year back, and much much better than I was two years back.

On the other hand, because data science attracts a lot of people from pure science and engineering (classically well-structured fields), you see a lot of people trying to apply overly academic or textbook approaches to problems that they see. As they try to divine problem patterns that don’t really exist, they fail to recognise novel “cases”. And so they don’t really learn from their experience.

Maybe this is why I keep saying that “in data science, years of experience and competence are not correlated”. However, fundamentally, that ought NOT to be the case.

This is also perhaps why a lot of data scientists, irrespective of their years of experience, continue to remain “junior” in their thinking.

PS: The last few paragraphs apply equally well to quantitative finance and economics. They are ill-structured professions that some practitioners (thanks to well-structured backgrounds) assume are well-structured.