Mo Salah and Machine Learning

First of all, I’m damn happy that Mo Salah has renewed his Liverpool contract. With Sadio Mane also leaving, the attack was looking a bit thin (I was distinctly unhappy with the Jota-Mane-Diaz forward line we used in the Champions League final. Lacked cohesion). Nunez is still untested in terms of “leadership”, and without Salah that would’ve left Firmino as the only “attacking leader”.

(non-technical readers can skip the section in italics and still make sense of this post)

Now that this is out of the way, I’m interested in seeing one statistic (for which I’m pretty sure I don’t have the data). For each of the chances that Salah has created, I want to look at the xG (expected goals) and whether he scored or not. And then look at a density plot of xG for both categories (scored or not).

For most players, this is likely to result in two very distinct curves – they are likely to score from a large % of high xG chances, and almost not score at all from low xG chances. For Salah, though, the two density curves are likely to be a lot closer.

What I’m saying is – most strikers score well from easy chances, and fail to score from difficult chances. Salah is not like that. On the one hand, he creates and scores some extraordinary goals out of nothing (low xG). On the other, he tends to miss a lot of seemingly easy chances (high xG).

In fact, it is quite possible to look at a player like Salah, see a few sitters that he has missed (he misses quite a few of them), and think he is a poor forward. And if you look at a small sample of data (or short periods of time) you are likely to come to the same conclusion. Look at the last 3-4 months of the 2021-22 season. The consensus among pundits then was that Salah had become poor (and on Reddit, you could see Liverpool fans arguing that we shouldn’t give him a lucrative contract extension since ‘he has lost it’).

It is well possible that this is exactly the conclusion Jose Mourinho came to back in 2013-14 when he managed Salah at Chelsea (and gave him very few opportunities). The thing with a player like Salah is that he is so unpredictable that it is very possible to see samples and think he is useless.

Of late, I’ve been doing (rather, supervising (and there is no pun intended) ) a lot of machine learning work. A lot of this has to do with binary classification – classifying something as either a 0 or a 1. Data scientists build models, which give out a probability score that the thing is a 1, and then use some (sometimes arbitrary) cutoff to determine whether the thing is a 0 or a 1.

There are a bunch of metrics in data science on how good a model is, and it all comes down to what the model predicted and what “really” happened. And I’ve seen data scientists work super hard to improve on these accuracy measures. What can be done to predict a little bit better? Why is this model only giving me 77% ROC-AUC when for the other problem I was able to get 90%?

The thing is – if the variable you are trying to predict is something like whether Salah will score from a particular chance, your accuracy metric will be really low indeed. Because he is fundamentally unpredictable. It is the same with some of the machine learning stuff – a lot of models are trying to predict something that is fundamentally unpredictable, so there is a limit on how accurate the model will get.

The problem is that you would have come across several problem statements that are much more predictable that you think it is a problem with you (or your model) that you can’t predict better. Pundits (or Jose) would have seen so many strikers who predictably score from good chances that they think Salah is not good.

The solution in these cases is to look at aggregates. Looking for each single prediction will not take us anywhere. Instead, can we predict over a large set of data whether we broadly got it right? In my “research” for this blogpost, I found this.

Last season, on average, Salah scored precisely as many goals as the model would’ve predicted! You might remember stunners like the one against Manchester City at Anfield. So you know where things got averaged out.

Conductors and CAPM

For a long time I used to wonder why orchestras have conductors. I possibly first noticed the presence of the conductor sometime in the 1990s when Zubin Mehta was in the news. And then I always wondered why this person, who didn’t play anything but stood there waving a stick, needed to exist. Couldn’t the orchestra coordinate itself like rockstars or practitioners of Indian music forms do?

And then i came across this video a year or two back.

And then the computer science training I’d gone through two decades back kicked in – the job of an orchestra conductor is to reduce an O(n^2) problem to an O(n) problem.

For a  group of musicians to make music, they need to coordinate with each other. Yes, they have the staff notation and all that, but still they need to know when to speed up or slow down, when to make what transitions, etc. They may have practiced together but the professional performance needs to be flawless. And so they need to constantly take cues from each other.

When you have $n$ musicians who need to coordinate, you have $\frac{n.(n-1)}{2}$ pairs of people who need to coordinate. When $n$ is small, this is trivial, and so you see that small ensembles or rock bands can easily coordinate. However, as $n$ gets large, $n^2$ grows well-at-a-faster-rate. And that is a problem, and a risk.

Enter the conductor. Rather than taking cues from one another, the musicians now simply need to take cues from this one person. And so there are now only $n$ pairs that need to coordinate – each musician in the band with the conductor. Or an $O(n^2)$ problem has become an $O(n)$ problem!

For whatever reason, while I was thinking about this yesterday, I got reminded of legendary finance professor R Vaidya‘s class on capital asset pricing model (CAPM), or as he put it “Sharpe single index model” (surprisingly all the links I find for this are from Indian test prep sites, so not linking).

We had just learnt portfolio theory, and how using the expected returns, variances and correlations between a set of securities we could construct an “efficient frontier” of securities that could give us the best risk-adjusted return. Seemed very mathematically elegant, except that in case you needed to construct a portfolio of $n$ stocks, you needed $n^2$ correlations. In other word, an $O(n^2)$ problem.

And then Vaidya introduced CAPM, which magically reduced the problem to an $O(n)$ problem. By suddenly introducing the concept of an index, all that mattered for each stock now was its beta – the coefficient of its returns proportional to the index returns. You didn’t need to care about how stocks reacted with each other any more – all you needed was the relationship with the index.

In a sense, if you think about it, the index in CAPM is like the conductor of an orchestra. If only all $O(n^2)$ problems could be reduced to $O(n)$ problems this elegantly!

Management and Verification

For those of you who are new here, my wife and I used to organise “NED Talks” in our home in Bangalore. The first edition happened in 2015 (organised on a whim), and encouraged by its success, we organised 10 more editions until 2019. We have put up snippets of some talks here.

In the second edition of the NED Talks (February 2015), we had a talk by V Vinay (noted computer scientist, former IISc professor, co-inventor of Simputer, co-founder of Strand Life Sciences, Ati Motors, etc. etc.), where he spoke about “computational complexity”.

Now, having studied computer science, “computational complexity” was not a new topic to me, but one thing that Vinay said has stayed with me – it is that verifying an algorithm is far more efficient than actually executing the algorithm.

To take a simple example, factorising a number into prime factors is NP Hard – in other words, it is a really hard problem. However, verifying the prime factorisation of a number is trivial – you can just multiply the factors and see if it gives back the number you started with.

I was thinking about this paradigm the ohter day when I was thinking about professional managers – several times in life I have wondered “how can this person manage this function when he/she has no experience in that function?”. Maybe it is because I had been subjected to two semesters of workshop in the beginning of my engineering, but I have intuitively assumed that you can only manage stuff that you have personally done – especially if it is a non-trivial / specialist role.

But then – if you think about it, at some level, management is basically about “verification”. To see whether you have done your work properly, I don’t need to precisely know how you have done it. All I need to know is whether you have done bullshit – which means, I don’t need to “replicate your algorithm”. I only need to “verify your algorithm”, which computer science tells us can be an order of magnitude simpler than actually building the algorithm.

The corollary of this is that if you have managed X, you need not be good at X, or actually even have done X. All it shows is that you know how to manage X, which can be an order of magnitude simple than actually doing X.

This also (rather belatedly) explains why I have largely been wary of hiring “pure managers” for my team. Unless they have been hands on at their work, I start wondering if they actually know how to do it, or only know how to manage it (and I’m rather hands on, and only hire hands on people).

And yet another corollary is that if you have spent too long just managing teams, you might have gotten so used to just verifying algorithms that you can’t write algorithms any more.

And yet another before I finish – computer science has a lot of lessons to offer life.

Compression Stereotypes

One of the most mindblowing things I learnt while I was doing my undergrad in Computer Science and Engineering was Lempel-Ziv-Welch (LZW) compression. It’s one of the standard compression algorithms used everywhere nowadays.

The reason I remember this is twofold – firstly, I remember implementing this as part of an assignment (our CSE program at IITM was full of those), and feeling happy to be coding in C rather than in the dreaded Java (which we had to use for most other assignments).

The other is that this is one of those algorithms that I “internalised” while doing something totally different – in this case I was having coffee/ tea with a classmate in our hostel mess.

I won’t go into the algorithm here. However, the basic concept is that as and when we see a new pattern, we give it a code, and every subsequent occurrence of that pattern is replaced by its corresponding code. And the beauty of it is that you don’t need to ship a separate dictionary -the compressed code itself encapsulates it.

Anyway, in practical terms, the more the same kind of patterns are repeated in the original file, the more the file can be compressed. In some sense, the more the repetition of patterns, the less the overall “information” that the original file can carry – but that discussion is for another day.

I’ve been thinking of compression in general and LZW compression in particular when I think of stereotyping. The whole idea of stereotyping is that we are fundamentally lazy, and want to “classify” or categorise or pigeon-hole people using the fewest number of bits necessary.

And so, we use lazy heuristics – gender, caste, race, degrees, employers, height, even names, etc. to make our assumptions of what people are going to be like. This is fundamentally lazy, but also effective – in a sense, we have evolved to stereotype people (and objects and animals) because that allows our brain to be efficient; to internalise more data by using fewer bits. And for this precise reason, to some extent, stereotyping is rational.

However, the problem with stereotypes is that they can frequently be wrong. We might see a name and assume something about a person, and they might turn out to be completely different. The rational response to this is not to beat oneself for stereotyping in the first place – it is to update one’s priors with the new information that one has learnt about this person.

So, you might have used a combination of pre-known features of a person to categorise him/her. The moment you realise that this categorisation is wrong, you ought to invest additional bits in your brain to classify this person so that the stereotype doesn’t remain any more.

The more idiosyncratic and interesting you are, the more the number of bits that will be required to describe you. You are very very different from any of the stereotypes that can possibly be used to describe you, and this means people will need to make that effort to try and understand you.

One of the downsides of being idiosyncratic, though, is that most people are lazy and won’t make the effort to use the additional bits required to know you, and so will grossly mischaracterise you using one of the standard stereotypes.

On yet another tangential note, getting to know someone is a Bayesian process. You make your first impressions of them based on whatever you find out about them, and go on building a picture of them incrementally based on the information you find out about them. It is like loading a picture on a website using a bad internet connection – first the picture appears grainy, and then the more idiosyncratic features can be seen.

The problem with refusing to use stereotypes, or demonising stereotypes, is that you fail to use the grainy pictures when that is the best available, and instead infinitely wait to get better pictures. On the other hand, failing to see beyond stereotypes means that you end up using grainy pictures when more clear ones are available.

And both of these approaches is suboptimal.

PS: I’ve sometimes wondered why I find it so hard to remember certain people’s faces. And I realise that it’s usually because they are highly idiosyncratic and not easy to stereotype / compress (both are the same thing). And so it takes more effort to remember them, and if I don’t really need to remember them so much, I just don’t bother.

How Python swallowed R

A week ago, I put a post on LinkedIn saying if someone else working in analytics / data science primarily uses R for their work, I would like to chat.

I got two responses, one of which was from a guy who strictly isn’t in analytics / data science, but needs to analyse large amounts of data for his work. I had a long chat with the other guy today.

Yesterday I put the same post on Twitter, and have got a few more responses from there. However, it is staggering. An overwhelming majority of data people who I know work in Python. One of the reasons I put these posts was to assure myself that I’m not alone in using R, though the response so far hasn’t given me too much of an assurance.

So why do most companies end up using Python for analytics, even when R is clearly better for things like data wrangling, reporting, visualisation, dashboarding, etc.? I have a few theories on this, and I think all of them come together to result in python having its “overwhelming marketshare” (at least among people I know).

Tech people clearly prefer python since it’s easier to integrate. So the tech leaders request the data science leaders to use Python, since it is much easier for the tech people. In a lot of organisations, data science reports into tech, so this request is honoured.

Even if it isn’t, if you recall, “data scientists” are generally tech facing rather than business facing. This means that the models they build need to be codified, and added to the company’s code base. This means necessarily working together with tech, and this means using a programming language that tech is comfortable with.

Then, this spills over. Usually, someone has the bright idea that the firm shouldn’t use two languages for what is essentially the same thing. And so the analytics people are also forced to use python for their analytics, even if it isn’t built for the purpose. And then it spreads.

Next is the “cool factor”. There is this impression that the more technical a solution is, the more superior it is, even if it has no direct business impact (an employer had once  told me, “I have raised money saying we are using machine learning. If our investors see the algorithms you’re proposing, they’ll want their money back”).

So a youngster getting into data wants to do “all the latest stuff”. This means machine learning. Deep learning. Reinforcement learning. And all that. There is an impression that this kind of work is “better work” compared to let’s say generating business insights using data. And in general, the packages for machine learning have traditionally been easier in Python than they are in R (though R is fast catching up, and in general python is far behind R when it comes to user friendliness).

Then, the growth in data and jobs associated with it such as machine learning or data engineering have meant that a lot of formerly tech people have got into data work. Python is fundamentally a programming language, with a package (pandas) added on to do data work. Techies find it far more intuitive than R, which is fundamentally a statistical software. On the other hand, people who are coming from a business / Excel background find it far more comfortable to use R. Python can be intimidating (I fall in this bucket).

So yeah – the tech integration, the number of tech people who are coming into data and the “cool factor” associated with the more techie stuff means that Python is gaining, at R’s expense (in my circle at least).

In any case I’m going to continue to use R. I’m at least 10X faster in R than I am in Python, and having used R for 12 years now, I’m too used to that way of working to change things up.

Jio, Amazon and Information Content

A long long time ago I had installed the Jio Cinema app on my Fire TV Stick. I had perhaps watched two movies on that, and then completely forgotten about it. This evening, I had to look for a movie to watch my the wife, and having exhausted most of the “compatible content” (stuff we can watch together on Netflix) and been exhausted by the user experience on Prime Video, I decided to give this app a try.

I ended up selecting a movie, which I later found out has a 4.5 IMDB rating and doesn’t even have a Wikepedia page. Needless to say, we abandoned the movie midway. That’s when the wife went in to put the daughter to bed and my fun began.

So Jio Cinema follows what I call the “Amazon paradigm for product management”. Since Amazon tries to sell every product (or service) as if it is a physical book, it has one single mantra for product management. “Improve selection and they will come”.

The user experience doesn’t matter. How easy the product is to use, and how pleasing it looks on the eye, and whether it has occasional bugs, is all secondary. All that matters is selection. Given that the company built its business on the back of selling “long tail” books, this is not so surprising, except that it doesn’t necessarily work in other categories.

I’ve written about Amazon’s ineptitude in product management before, in the context of that atrocity of an app called Sony Liv. The funny thing is that the Jio Cinema app (on Fire TV Stick) looks and feels pretty much exactly like Sony Liv. Maybe there is an open source shitty fire TV app that these guys have based themselves on?

In any case, I started browsing the Jio Cinema app, and I found something called “movies in 15 minutes“. Initially I thought it was a parody. The first few movies I noticed there were things I had never heard of. “This is perhaps for bad movies”, I reasoned. I kept scrolling, and more recognisable names popped up.

I decided to watch Deewana, which was released just before the start of my optimal age of movie appreciation, and which, for some reason, we didn’t get home a video cassette of.

It’s basically a collage of scenes from the movie. It’s like someone has put together a “highlights package”, taking all the important scenes and then putting them together.

And for a movie like Deewana it works. The 15 minute version had all the necessary plot elements to fully follow the movie. It is a great movie, for 15 minutes. Maybe at 30 minutes as well it might be a great movie. However, I can’t imagine having watched it in the full version.

That was two hours back. I’ve since gone crazy watching 15 minute versions of many other movies (mostly from the 70s and 80s, though they have movies as recent as Jab We Met). It’s been fantastic.

However, I have one crib. This has to do with information content. Essentially, the premise behind “movies in 15 minutes” is that the information content in these movies is so little that the whole thing can be compressed into 15 minutes.  The problem is that not every movie has the same amount of information.

15 minutes was perfect for Deewana. It was also appropriate for Kasam Paida Karne Waali Ki, which I watched only because it gets referenced in Gangs of Wasseypur. Between these two, I “watched” Namak Halaal, and I didn’t understand the head or tail of it. I had to go to Wikepedia to understand the plot.

Essentially the plot of Namak Halaal is complex enough, I imagine, that compressing it into 15 minutes is impossible without significant information loss. And the loss of information was so much that I couldn’t understand the summary at all. Maybe I’ll watch the movie in full some day.

I’m writing this blogpost after watching the 15 minute version of Don. I guess whoever made the summary realised that the movie is so complex that it can’t really be compressed into 15 minutes – and so they have added a voiceover to narrate the key elements.

In any case, I’m feeling super thrilled. I normally don’t watch movies because the bit rate in most movies is too low. Compression means that I can happily watch the movies without ever getting bored.

I wish they made these 15 minute versions of all movies. Jio, all (your Amazon-style product maangement) is forgiven.

Now on to Amar Akbar Anthony.

The Tube Strike Model For The Pandemic

In 2002, as part of my undergrad in computer science, I took a course in “Artificial Intelligence”. It was a “restricted elective” – you had to either take that or another course called “Artificial Neural Networks”. That Neural Networks was then considered disjoint from AI will tell you how the field of computer science has changed in the 15 years since I graduated.

In any case, as part of our course on AI, we learnt heuristics. These were approximate algorithms to solve a problem – seldom did well in terms of worst case complexity but in most cases got the job done. Back then, the dominant discourse was that you had to tell a computer how to solve a problem, not just show it a large number of positive and negative examples and allow it to learn by itself (though that was the approach taken by the elective I did not elect for).

One such heuristic was Simulated Annealing. The problem with a classic “hill climbing” algorithm is that you can get caught in local optima. And the deterministic hill climbing algorithm doesn’t let you get off your local optima to search for better optima. Hence there are variants. In Simulated Annealing, in the early part of the algorithm you are allowed to take big steps down (assuming you are trying to find the peak). As the algorithm progresses, it “cools down” (hence simulated annealing) and the extent to which you are allowed to climb down is massively reduced.

It is not just in algorithms, or in the case of AI, do we get stuck in local optima. In a recent post, I had made a passing reference to a paper about the tube strikes of 2014.

It is clearly visible from the two panels that far fewer commuters were able to use their modal station during the strike, which implies that a substantial number of individuals were forced to explore alternative routes. The data also suggest that the strike brought about some lasting changes in behaviour, as the fraction of commuters that made use of their modal station seemingly drops after the strike (in the paper we substantiate this claim econometrically).

Screw the paper if you don’t want to read it. Basically the concept is that the strike of 2014 shook things up. People were forced to explore alternatives. And some alternatives stuck. In other words, a lot of people had got stuck in local maxima. And when an external event (the strike) pushed them off their local pedestals (figuratively speaking), they were able to find better maxima.

And that was only the result of a three-day strike. Now, the pandemic has gone on for 5-6 months now (depending on the part of world you are in). During this time, a lot of behaviour otherwise considered normal have been questioned by people behaving thus. My theory is that a lot of these hitherto “normal behaviours” were essentially local optima. And with the pandemic forcing people to rethink their behaviours, they will find better optima.

I can think of a few examples from my own life.

1. I wrote about this the other day. I had gotten used to a schedule of heavy weight lifting for my workouts. I had plateaued in all my lifts, and this meant that my upper body had plateaued at a rather suboptimal level. However much I tried to improve my bench press and shoulder press (using only these movements) the bar refused to budge. And my shoulders refused to get bigger. I couldn’t do a (palms facing away) pull up.
Thanks to the pandemic, the gym shut, and I was forced to do body weight exercises at home. There was a limit on how much I could load my legs and back, so I focussed more on my upper body, especially doing different progressions of the pushup. And back in the gym today, I discovered I could easily do pullups now.

Similarly, the progression of body weight squats I knew forced me to learn to squat deep (hamstrings touching calves). Today for the first time ever I did deep front squats. This means in a few months I can learn to clean.

2. I was used to eating Milky Mist set curd (the one that comes in a 1kg box). It was nice and creamy and I loved eating it. It isn’t widely available and there was one supermarket close to home from where I could get it. As soon as the lockdown happened that supermarket shut. Even when it opened it had long lines, and there were physical barricades between my house and that so I couldn’t drive to it.

In the meantime I figured that the guy who delivers milk to my door in the morning could deliver (Nandini) curd as well. And I started buying from him. Well, it’s not as creamy as Milky Mist, but it’s good enough. And I’m not going back.

3. This was a see-saw. For the first month of the lockdown most bakeries nearby were shut. So I started trying out bread at this supermarket close to home (not where I got Milky Mist from). I loved it. Presently, bakeries reopened and the density of cases in Bangalore meant I became wary of going to supermarkets. So now we’ve shifted back to freshly baked bread from the local bakery
4. I’d tried intermittent fasting several times in life but had never been able to do it on a consistent basis. In the initial part of the lockdown good bread was hard to come by (since the bakeries shut and I hadn’t discovered the supermarket bread yet). There had been a bird flu scare near Bangalore so we weren’t buying eggs either. What do we do for breakfast? Just skip it. Now i have no problem not having breakfast at all

The list goes on. And I’m sure this applies to you as well. Think of all the behavioural changes that the pandemic has forced on you, and think of which all you will go back on once it has passed. There is likely to be a set of behavioural changes that won’t change back.

Like how one in 20 passengers who changed routes following the 2014 tube strikes never went back to their earlier routes. Except that this time it is a 6-month disruption.

What this means is that even when the pandemic is past us, the economy will not look like the economy that was before the pandemic hit us. There will be winners and losers. And since it will take time and effort for people doing “loser jobs” to retrain themselves (if possible) to do “winner jobs”, the economic downturn will be even longer.

I’m calling it the “tube strike mental model” for behavioural change during the pandemic.

Half-watching movies, and why I hate tweetstorms

It has to do with “bit rate”

I don’t like tweetstorm. Up to six tweets is fine, but beyond that I find it incredibly difficult to hold my attention for. I actually find it stressful. So of late, I’ve been making a conscious effort to stop reading tweetstorms when they start stressing me out. The stress isn’t worth any value that the tweetstorms may have.

I remember making the claim on twitter that I refuse to read any more tweetstorms of more than six tweets henceforth. I’m not able to find that tweet now.

Anyways…

Why do I hate tweetstorms? It is for the same reason that I like to “half-watch” movies, something that endlessly irritates my wife. I has to do with “bit rates“.

I use the phrase “bit rate” to refer to the rate of flow of information (remember that bit is a measure of information).

The thing with movies is that some of them have very low bit rate. More importantly, movies have vastly varying bit rates through their lengths. There are some parts in a movie where pretty much nothing happens, and a lot of it is rather predictable. There are other parts where lots happens.

This means that in the course of a movie you find yourself engrossed in some periods and bored in others, and that can be rather irritating. And boredom in the parts where nothing is happening sometimes leads me to want to turn off the movie.

So I deal with this by “half watching”, essentially multi tasking while watching. Usually this means reading, or being on twitter, while watching a movie. This usually works beautifully. When the bit rate from the movie is high, I focus. When it is low, I take my mind off and indulge in the other thing that I’m doing.

It is not just movies that I “half-watch” – a lot of sport also gets the same treatment. Like right now I’m “watching” Watford-Southampton as I’m writing this.

A few years back, my wife expressed disapproval of my half-watching. By also keeping a book or computer, I wasn’t “involved enough” in the movie, she started saying, and that half-watching meant we “weren’t really watching the movie together”. And she started demanding full attention from me when we watched movies together.

The main consequence of this is that I started watching fewer movies. Given that I can rather easily second-guess movie plots, I started finding watching highly predictable stuff rather boring. In any case, I’ve recently received permission to half-watch again, and have watched two movies in the last 24 hours (neither of which I would have been able to sit through had I paid full attention – they had low bit rates).

So what’s the problem with tweetstorms? The problem is that their bit rate is rather high. With “normal paragraph writing” we have come to expect a certain degree of redundancy. This allows us to skim through stuff while getting information from them at the same time. The redundancy means that as long as we get some key words or phrases, we can fill in the rest of the stuff, and reading is rather pleasant.

The thing with a tweetstorm is that each sentence (tweet, basically) has a lot of information packed into it. So skimming is not an option. And the information hitting your head at the rate that tweetstorms generally convey can result in a lot of stress.

The other thing with tweetstorms, of course, is that each tweet is disjoint from the one before and after it. So there is no flow to the reading, and the mind has to expend extra energy to process what’s happening. Combine this with a rather high bit rate, and you know why I can’t stand them.

Cookbooks, Code and College

Why Is This Interesting, a fascinating daily newsletter I subscribe to, has this edition on code and cookbooks. The basic crib here is that most coding books teach you to code as if you were trying to become a professional coder, rather than trying to teach you to code as an additional life skill.

This, the author Noah Brier remarks, is quite unlike how most cookbooks teach you to cook, where there is absolutely no pretence of trying to turn you into a professional cook. Cookbooks know that most people who want to learn to cook simply want to cook for themselves or their families, so professional level learning is not required. This, however, is not the case with books on coding.

In fact, this pretty much explains why I completely fell out of love with coding during my undergrad in Computer Science. I remember being rather excited in 2000, when I got an entrance exam score good enough to get admitted to the Computer Science program at IIT Madras. I had learnt to code only two years before, but I’d taken on to it rather well, and had quickly built a reputation of being one of the better coders in school.

And then the four year program in Computer Science sucked out all the love I had for coding. This cooking-code post reminded me why – basically most professors in my department assumed that all of us wanted to be academics and taught us that way. This wasn’t an unfair assumption, since 17 of the 22 of us who graduated in 2004 either immediately or in a couple of years went to grad school.

However, the approach of teaching that assumed that you would be an expert or an academic meant a paradigm that made it incredibly hard to learn unless you were insanely motivated.

For example, the fourth year B.Tech. project was almost always supposed to be a “work of research” that would turn into a paper (or dozen). There was a lot of theory all round (I didn’t mind parts of it, like some bits of algorithm analysis, but most of it was boring). The course was heavy in terms of assignments, which you can argue was a practical concept, but the way the assignments were done by most people meant that the bar was rather academic.

And that meant that someone like me, who didn’t want to be “an engineer” to begin with, but had entered with a love for coding, quickly fell out of love with the field itself. In hindsight, given the way I was taught, I’m not surprised that my first option upon exit was to go to business school, and it would be at least five years later that I would begin to appreciate that I had an aptitude for code.

(Interestingly, business school was different. Nobody assumed anybody would become an academic, so the teaching was far more palatable.)

At the end of my B.Tech. in Computer Science and Engineering from IIT Madras, I was very clear about one thing – I didn’t want to be an engineer. I didn’t want to pursue a career in Computer Science, either. This was after entering IIT with a reputation of being a “stud programmer”, and being cocky and telling people that my hobby was “programming”.

I must have written about this enough times on this blog that I can’t be bothered about finding links, but my Computer Science degree at IIT Madras made me hate programming. I didn’t mind (some of) the maths, but it was the actual coding bit that I actively came to hate. And when an internship told me that research wasn’t something I was going to be good at, fleeing the field was an obvious decision and I quickly went to business school.

Thinking back about it, I think my problem is that I give up when faced with steep learning curves. I like systems that are easy and intuitive to use, and have a great user experience. The “geeky” products that are difficult to use and geeks take pride in, I have no patience for. I remember learning to code macros in Microsoft Excel in my first post B-school job in 2006 being the time when I started falling in love with Computer Science once again.

The big problem with CSE in IIT Madras was that they made you code. A lot. Which you might think is totally normal for a program in computer science. Except that all the professors there were perhaps like me, and wanted systems that were easy to use, which means that just about anything we needed to build, we needed to build a user interface for. And in 2002, that meant coding in Java, and producing those ugly applets which were interactive but anything but easy to use.

The amount of Java coding I did in those four years is not funny. And Java is a difficult language to code – it’s incredibly verbose and complicated (especially compared to something like Python, for example), and impossible to code without a book or a dictionary of APIs handy. And because it’s so verbose, it’s buggy. And you find it difficult to make things work. And even when you make it work, the UI that it produces is incredibly ugly.

So it amused me to come across this piece of news that my old department has “developed a new framework that could make the programs written in JAVA language more efficient“. I don’t know who uses Java any more (I thought the language of choice among computer scientists nowadays is Python. While it’s infinitely easier than Java, it again produces really ugly graphics), but it’s interesting that people in my old department are still at it. And even going about making things more efficient!

Also, you might find the article itself (this is on the IIT alumni website) amusing. Go ahead and give it a read.

To solve this problem, V Krishna and Manas Thakur tweaked the two compilation procedures. In the first compilation step, more elaborate and time-consuming analysis is performed and wherever the conversion stalls due to unavailability of the library from the computer, a partial result is created. Now, during the second stage of compilation, the just in-time compilers, with available libraries from the computer, work to resolve the partial values to generate final values and finally a more precise result. As the time taken during the first exhaustive compilation does not get included in execution time, the whole procedure still remains time-saving, while leading to highly efficient codes