Mo Salah and Machine Learning

First of all, I’m damn happy that Mo Salah has renewed his Liverpool contract. With Sadio Mane also leaving, the attack was looking a bit thin (I was distinctly unhappy with the Jota-Mane-Diaz forward line we used in the Champions League final. Lacked cohesion). Nunez is still untested in terms of “leadership”, and without Salah that would’ve left Firmino as the only “attacking leader”.

(non-technical readers can skip the section in italics and still make sense of this post)

Now that this is out of the way, I’m interested in seeing one statistic (for which I’m pretty sure I don’t have the data). For each of the chances that Salah has created, I want to look at the xG (expected goals) and whether he scored or not. And then look at a density plot of xG for both categories (scored or not). 

For most players, this is likely to result in two very distinct curves – they are likely to score from a large % of high xG chances, and almost not score at all from low xG chances. For Salah, though, the two density curves are likely to be a lot closer.

What I’m saying is – most strikers score well from easy chances, and fail to score from difficult chances. Salah is not like that. On the one hand, he creates and scores some extraordinary goals out of nothing (low xG). On the other, he tends to miss a lot of seemingly easy chances (high xG).

In fact, it is quite possible to look at a player like Salah, see a few sitters that he has missed (he misses quite a few of them), and think he is a poor forward. And if you look at a small sample of data (or short periods of time) you are likely to come to the same conclusion. Look at the last 3-4 months of the 2021-22 season. The consensus among pundits then was that Salah had become poor (and on Reddit, you could see Liverpool fans arguing that we shouldn’t give him a lucrative contract extension since ‘he has lost it’).

It is well possible that this is exactly the conclusion Jose Mourinho came to back in 2013-14 when he managed Salah at Chelsea (and gave him very few opportunities). The thing with a player like Salah is that he is so unpredictable that it is very possible to see samples and think he is useless.

Of late, I’ve been doing (rather, supervising (and there is no pun intended) ) a lot of machine learning work. A lot of this has to do with binary classification – classifying something as either a 0 or a 1. Data scientists build models, which give out a probability score that the thing is a 1, and then use some (sometimes arbitrary) cutoff to determine whether the thing is a 0 or a 1.

There are a bunch of metrics in data science on how good a model is, and it all comes down to what the model predicted and what “really” happened. And I’ve seen data scientists work super hard to improve on these accuracy measures. What can be done to predict a little bit better? Why is this model only giving me 77% ROC-AUC when for the other problem I was able to get 90%?

The thing is – if the variable you are trying to predict is something like whether Salah will score from a particular chance, your accuracy metric will be really low indeed. Because he is fundamentally unpredictable. It is the same with some of the machine learning stuff – a lot of models are trying to predict something that is fundamentally unpredictable, so there is a limit on how accurate the model will get.

The problem is that you would have come across several problem statements that are much more predictable that you think it is a problem with you (or your model) that you can’t predict better. Pundits (or Jose) would have seen so many strikers who predictably score from good chances that they think Salah is not good.

The solution in these cases is to look at aggregates. Looking for each single prediction will not take us anywhere. Instead, can we predict over a large set of data whether we broadly got it right? In my “research” for this blogpost, I found this.

Last season, on average, Salah scored precisely as many goals as the model would’ve predicted! You might remember stunners like the one against Manchester City at Anfield. So you know where things got averaged out.

Pirate organisations

It’s over 20 years now since I took a “core elective” (yeah, the contradiction!) in IIT on “design and analysis of algorithms”. It was a stellar course, full of highly interesting assignments and quotable quotes. The highlight of the course was a “2 pm onwards” mid term examination, where we could take as much time as we wanted.

Anyway, the relevance of that course to this discussion is one of the problems in our first assignment. It was a puzzle .

It has to do with a large number of pirates who have chanced upon a number of gold coins. There is a strict rank ordering of pirates from most to least powerful (1 to N, with 1 being the most powerful). The problem is about how to distribute the coins among the pirates.

Pirate 1 proposes a split. If at least half the pirates (including himself) vote in favour of the split, the split is accepted and everyone goes home. If (strictly) more than half vote against the split, the pirate is thrown overboard and Pirate 2 proposes a split. This goes on until the split has been accepted. Assuming all the pirates are perfectly rational, how would you split the coins if you were Pirate 1? There is a Wikipedia page on it.

I won’t go into the logic here, but the winning play for Pirate 1 is to give 1 coin to each of the other odd numbered pirates, and keep the rest for himself. If he fails to do so and gets thrown overboard, the optimal solution for Pirate 2 is to give 1 coin to each of the other even numbered pirates, and keep the rest for himself.

So basically you see that this kind of a game structure implies that all odd numbered pirates form a coalition, and all the even numbered pirates form another. It’s like if you were to paint all pirates in one coalition black, you would get a perfectly striped structure.

Now, this kind of a “alternating coalition” can sometimes occur in corporate settings as well. Let us stick to just one path in the org chart, down to the lowest level of employee (so no “uncles” (in a tree sense) in the mix).

Let’s say you are having trouble with your boss and are unable to prevail upon her for some reason. Getting the support of your peers is futile in this effort. So what do you do? You go to your boss’s boss and try to get that person onside, and together you can take on your boss. This can occasionally be winning.

Similarly, let us say you seek to undermine (in the literal sense) one of your underlings who is being troublesome. What do you do? You ally with one of their underlings, to try and prevail upon your underling. Let’s say your boss and your underling have thought similarly to you – they will then ally to try and take you down.

Now see what this looks like – your boss’s boss, you and your underling’s underling are broadly allied. Your boss and your underling (and maybe your underling’s underling’s underling) are broadly allied. So it is like the pirate problem yet again, with people alternate in the hierarchy allying with each other!

Then again, in organisations, alliances and rivalries are never permanent. For each piece of work that you seek to achieve, you do what it takes and ally with the necessary people to finish it. And so, in the broad scheme of all alliances that happen, this “pirate structure” is pretty rare. And so it hasn’t been studied well enough.

PS: I was wondering recently why people don’t offer training programs in “corporate game theory”. The problem, I guess, is that no HR or L&D person will sponsor it – there is no point in having everyone in your org being trained in the same kind of game theory – they will nullify each other and the training will do down the drain.

I suppose this is why you have leadership coaches – who are hired by individual employees to navigate the corporate games.

Management and Verification

For those of you who are new here, my wife and I used to organise “NED Talks” in our home in Bangalore. The first edition happened in 2015 (organised on a whim), and encouraged by its success, we organised 10 more editions until 2019. We have put up snippets of some talks here.

In the second edition of the NED Talks (February 2015), we had a talk by V Vinay (noted computer scientist, former IISc professor, co-inventor of Simputer, co-founder of Strand Life Sciences, Ati Motors, etc. etc.), where he spoke about “computational complexity”.

Now, having studied computer science, “computational complexity” was not a new topic to me, but one thing that Vinay said has stayed with me – it is that verifying an algorithm is far more efficient than actually executing the algorithm.

To take a simple example, factorising a number into prime factors is NP Hard – in other words, it is a really hard problem. However, verifying the prime factorisation of a number is trivial – you can just multiply the factors and see if it gives back the number you started with.

I was thinking about this paradigm the ohter day when I was thinking about professional managers – several times in life I have wondered “how can this person manage this function when he/she has no experience in that function?”. Maybe it is because I had been subjected to two semesters of workshop in the beginning of my engineering, but I have intuitively assumed that you can only manage stuff that you have personally done – especially if it is a non-trivial / specialist role.

But then – if you think about it, at some level, management is basically about “verification”. To see whether you have done your work properly, I don’t need to precisely know how you have done it. All I need to know is whether you have done bullshit – which means, I don’t need to “replicate your algorithm”. I only need to “verify your algorithm”, which computer science tells us can be an order of magnitude simpler than actually building the algorithm.

The corollary of this is that if you have managed X, you need not be good at X, or actually even have done X. All it shows is that you know how to manage X, which can be an order of magnitude simple than actually doing X.

This also (rather belatedly) explains why I have largely been wary of hiring “pure managers” for my team. Unless they have been hands on at their work, I start wondering if they actually know how to do it, or only know how to manage it (and I’m rather hands on, and only hire hands on people).

And yet another corollary is that if you have spent too long just managing teams, you might have gotten so used to just verifying algorithms that you can’t write algorithms any more.

And yet another before I finish – computer science has a lot of lessons to offer life.

 

Jio, Amazon and Information Content

A long long time ago I had installed the Jio Cinema app on my Fire TV Stick. I had perhaps watched two movies on that, and then completely forgotten about it. This evening, I had to look for a movie to watch my the wife, and having exhausted most of the “compatible content” (stuff we can watch together on Netflix) and been exhausted by the user experience on Prime Video, I decided to give this app a try.

I ended up selecting a movie, which I later found out has a 4.5 IMDB rating and doesn’t even have a Wikepedia page. Needless to say, we abandoned the movie midway. That’s when the wife went in to put the daughter to bed and my fun began.

So Jio Cinema follows what I call the “Amazon paradigm for product management”. Since Amazon tries to sell every product (or service) as if it is a physical book, it has one single mantra for product management. “Improve selection and they will come”.

The user experience doesn’t matter. How easy the product is to use, and how pleasing it looks on the eye, and whether it has occasional bugs, is all secondary. All that matters is selection. Given that the company built its business on the back of selling “long tail” books, this is not so surprising, except that it doesn’t necessarily work in other categories.

I’ve written about Amazon’s ineptitude in product management before, in the context of that atrocity of an app called Sony Liv. The funny thing is that the Jio Cinema app (on Fire TV Stick) looks and feels pretty much exactly like Sony Liv. Maybe there is an open source shitty fire TV app that these guys have based themselves on?

In any case, I started browsing the Jio Cinema app, and I found something called “movies in 15 minutes“. Initially I thought it was a parody. The first few movies I noticed there were things I had never heard of. “This is perhaps for bad movies”, I reasoned. I kept scrolling, and more recognisable names popped up.

I decided to watch Deewana, which was released just before the start of my optimal age of movie appreciation, and which, for some reason, we didn’t get home a video cassette of.

It’s basically a collage of scenes from the movie. It’s like someone has put together a “highlights package”, taking all the important scenes and then putting them together.

And for a movie like Deewana it works. The 15 minute version had all the necessary plot elements to fully follow the movie. It is a great movie, for 15 minutes. Maybe at 30 minutes as well it might be a great movie. However, I can’t imagine having watched it in the full version.

That was two hours back. I’ve since gone crazy watching 15 minute versions of many other movies (mostly from the 70s and 80s, though they have movies as recent as Jab We Met). It’s been fantastic.

However, I have one crib. This has to do with information content. Essentially, the premise behind “movies in 15 minutes” is that the information content in these movies is so little that the whole thing can be compressed into 15 minutes.  The problem is that not every movie has the same amount of information.

15 minutes was perfect for Deewana. It was also appropriate for Kasam Paida Karne Waali Ki, which I watched only because it gets referenced in Gangs of Wasseypur. Between these two, I “watched” Namak Halaal, and I didn’t understand the head or tail of it. I had to go to Wikepedia to understand the plot.

Essentially the plot of Namak Halaal is complex enough, I imagine, that compressing it into 15 minutes is impossible without significant information loss. And the loss of information was so much that I couldn’t understand the summary at all. Maybe I’ll watch the movie in full some day.

I’m writing this blogpost after watching the 15 minute version of Don. I guess whoever made the summary realised that the movie is so complex that it can’t really be compressed into 15 minutes – and so they have added a voiceover to narrate the key elements.

In any case, I’m feeling super thrilled. I normally don’t watch movies because the bit rate in most movies is too low. Compression means that I can happily watch the movies without ever getting bored.

I wish they made these 15 minute versions of all movies. Jio, all (your Amazon-style product maangement) is forgiven.

Now on to Amar Akbar Anthony.

Java and IIT Madras

At the end of my B.Tech. in Computer Science and Engineering from IIT Madras, I was very clear about one thing – I didn’t want to be an engineer. I didn’t want to pursue a career in Computer Science, either. This was after entering IIT with a reputation of being a “stud programmer”, and being cocky and telling people that my hobby was “programming”.

I must have written about this enough times on this blog that I can’t be bothered about finding links, but my Computer Science degree at IIT Madras made me hate programming. I didn’t mind (some of) the maths, but it was the actual coding bit that I actively came to hate. And when an internship told me that research wasn’t something I was going to be good at, fleeing the field was an obvious decision and I quickly went to business school.

Thinking back about it, I think my problem is that I give up when faced with steep learning curves. I like systems that are easy and intuitive to use, and have a great user experience. The “geeky” products that are difficult to use and geeks take pride in, I have no patience for. I remember learning to code macros in Microsoft Excel in my first post B-school job in 2006 being the time when I started falling in love with Computer Science once again.

The big problem with CSE in IIT Madras was that they made you code. A lot. Which you might think is totally normal for a program in computer science. Except that all the professors there were perhaps like me, and wanted systems that were easy to use, which means that just about anything we needed to build, we needed to build a user interface for. And in 2002, that meant coding in Java, and producing those ugly applets which were interactive but anything but easy to use.

The amount of Java coding I did in those four years is not funny. And Java is a difficult language to code – it’s incredibly verbose and complicated (especially compared to something like Python, for example), and impossible to code without a book or a dictionary of APIs handy. And because it’s so verbose, it’s buggy. And you find it difficult to make things work. And even when you make it work, the UI that it produces is incredibly ugly.

So it amused me to come across this piece of news that my old department has “developed a new framework that could make the programs written in JAVA language more efficient“. I don’t know who uses Java any more (I thought the language of choice among computer scientists nowadays is Python. While it’s infinitely easier than Java, it again produces really ugly graphics), but it’s interesting that people in my old department are still at it. And even going about making things more efficient!

Also, you might find the article itself (this is on the IIT alumni website) amusing. Go ahead and give it a read.

To solve this problem, V Krishna and Manas Thakur tweaked the two compilation procedures. In the first compilation step, more elaborate and time-consuming analysis is performed and wherever the conversion stalls due to unavailability of the library from the computer, a partial result is created. Now, during the second stage of compilation, the just in-time compilers, with available libraries from the computer, work to resolve the partial values to generate final values and finally a more precise result. As the time taken during the first exhaustive compilation does not get included in execution time, the whole procedure still remains time-saving, while leading to highly efficient codes

Hypothesis Testing in Monte Carlo

I find it incredible, and not in a good way, that I took fourteen years to make the connection between two concepts I learnt barely a year apart.

In August-September 2003, I was auditing an advanced (graduate) course on Advanced Algorithms, where we learnt about randomised algorithms (I soon stopped auditing since the maths got heavy). And one important class of randomised algorithms is what is known as “Monte Carlo Algorithms”. Not to be confused with Monte Carlo Simulations, these are randomised algorithms that give a one way result. So, using the most prominent example of such an algorithm, you can ask “is this number prime?” and the answer to that can be either “maybe” or “no”.

The randomised algorithm can never conclusively answer “yes” to the primality question. If the algorithm can find a prime factor of the number, it answers “no” (this is conclusive). Otherwise it returns “maybe”. So the way you “conclude” that a number is prime is by running the test a large number of times. Each run reduces the probability that it is a “no” (since they’re all independent evaluations of “maybe”), and when the probability of “no” is low enough, you “think” it’s a “yes”. You might like this old post of mine regarding Monte Carlo algorithms in the context of romantic relationships.

Less than a year later, in July 2004, as part of a basic course in statistics, I learnt about hypothesis testing. Now (I’m kicking myself for failing to see the similarity then), the main principle of hypothesis testing is that you can never “accept a hypothesis”. You either reject a hypothesis or “fail to reject” it.  And if you fail to reject a hypothesis with a certain high probability (basically with more data, which implies more independent evaluations that don’t say “reject”), you will start thinking about “accept”.

Basically hypothesis testing is a one-sided  test, where you are trying to reject a hypothesis. And not being able to reject a hypothesis doesn’t mean we necessarily accept it – there is still the chance of going wrong if we were to accept it (this is where we get into messy territory such as p-values). And this is exactly like Monte Carlo algorithms – one-sided algorithms where we can only conclusively take a decision one way.

So I was thinking of these concepts when I came across this headline in ESPNCricinfo yesterday that said “Rahul Johri not found guilty” (not linking since Cricinfo has since changed the headline). The choice, or rather ordering, of words was interesting. “Not found guilty”, it said, rather than the usual “found not guilty”.

This is again a concept of one-sided testing. An investigation can either find someone guilty or it fails to do so, and the heading in this case suggested that the latter had happened. And as a deliberate choice, it became apparent why the headline was constructed this way – later it emerged that the decision to clear Rahul Johri of sexual harassment charges was a contentious one.

In most cases, when someone is “found not guilty” following an investigation, it usually suggests that the evidence on hand was enough to say that the chance of the person being guilty was rather low. The phrase “not found guilty”, on the other hand, says that one test failed to reject the hypothesis, but it didn’t have sufficient confidence to clear the person of guilt.

So due credit to the Cricinfo copywriters, and due debit to the product managers for later changing the headline rather than putting a fresh follow-up piece.

PS: The discussion following my tweet on the topic threw up one very interesting insight – such as Scotland having had a “not proven” verdict in the past for such cases (you can trust DD for coming up with such gems).

Beer and diapers: Netflix edition

When we started using Netflix last May, we created three personas for the three of us in the family – “Karthik”, “Priyanka” and “Berry”. At that time we didn’t realise that there was already a pre-created “kids” (subsequently renamed “children” – don’t know why that happened) persona there.

So while Priyanka and I mostly use our respective personas to consume Netflix (our interests in terms of video content hardly intersect), Berry uses both her profile and the kids profile for her stuff (of course, she’s too young to put it on herself. We do it for her). So over the year, the “Berry” profile has been mostly used to play Peppa Pig, and the occasional wildlife documentary.

Which is why we were shocked the other day to find that “Real life wife swap” had been recommended on her account. Yes, you read that right. We muttered a word of abuse about Netflix’s machine learning algorithms and since then have only used the “kids” profile to play Berry’s stuff.

Since then I’ve been wondering what made Netflix recommend “real life wife swap” to Berry. Surely, it would have been clear to Netflix that while it wasn’t officially classified as one, the Berry persona was a kid’s account? And even if it didn’t, didn’t the fact that the account was used for watching kids’ stuff lead the collaborative filtering algorithms at Netflix to recommend more kids’ stuff? I’ve come up with various hypotheses.

Since I’m not Netflix, and I don’t have their data, I can’t test it, but my favourite hypothesis so far involves what is possibly the most commonly cited example in retail analytics – “beer and diapers“. In this most-likely-apocryphal story, a supermarket chain discovered that beer and diapers were highly likely to appear together in shopping baskets. Correlation led to causation and a hypothesis was made that this was the result of tired fathers buying beer on their diaper shopping trips.

So the Netflix version of beer-and-diapers, which is my hypothesis, goes like this. Harrowed parents are pestered by their kids to play Peppa Pig and other kiddie stuff. The parents are so stressed that they don’t switch to the kid’s persona, and instead play Peppa Pig or whatever from their own accounts. The kid is happy and soon goes to bed. And then the parent decides to unwind by watching some raunchy stuff like “real life wife swap”.

Repeat this story in enough families, and you have a strong enough pattern that accounts not explicitly classified as “kids/children” have strong activity of both kiddie stuff and adult content. And when you use an account not explicitly mentioned as “kids” to watch kiddie stuff, it gets matched to these accounts that have created the pattern – Netflix effectively assumes that watching kid stuff on an adult account indicates that the same account is used to watch adult content as well. And so serves it to Berry!

Machine learning algorithms basically work on identifying patterns in data, and then fitting these patterns on hitherto unseen data. Sometimes the patterns make sense – like Google Photos identifying you even in your kiddie pics. Other times, the patterns are offensive – like the time Google Photos classified a black woman as a “gorilla“.

Thus what is necessary is some level of human oversight, to make sure that the patterns the machine has identified makes some sort of sense (machine learning purists say this is against the spirit of machine learning, since one of the purposes of machine learning is to discover patterns not perceptible to humans).

That kind of oversight at Netflix would have suggested that you can’t tag a profile to a “kiddie content AND adult content” category if the profile has been used to watch ONLY kiddie content (or ONLY adult content). And that kind of oversight would have also led Netflix to investigate issues of users using “general” account for their kids, and coming up with an algorithm to classify such accounts as kids’ accounts, and serve only kids’ content there.

It seems, though, that algorithms run supreme at Netflix, and so my baby daughter gets served “real life wife swap”. Again, this is all a hypothesis (real life wife swap being recommended is a fact, of course)!

Genetic Algorithms

I first learnt about Genetic Algorithms almost exactly thirteen years ago, when it was taught to me by Prof. Deepak Khemani as part of a course on “artificial intelligence”. I remember liking the course a fair bit, and took a liking to the heuristics and “approximate solutions” after the mathematically intensive algorithms course of the previous semester.

The problem with the course, however, was that it didn’t require us to code the algorithms we had learnt (for which we were extremely thankful back then, since in term 5 of Computer Science at IIT Madras, this was pretty much the only course that didn’t involve too many programming assignments).

As a result, while I had learnt all these heuristics (hill climbing, simulated annealing, taboo search, genetic algorithms, etc.) fairly well in theory, I had been at a complete loss as to how to implement any of them. And so far, during the course of my work, I had never had an opportunity to use any of these techniques. Until today that is.

I can’t describe the problem here since this is a client assignment, but when I had to pick a subset from a large set that satisfied certain properties, I knew I needed a method that would reach the best subset quickly. A “fitness function” quickly came to mind and it was obvious that I should use genetic algorithms to solve the problem.

The key with using genetic algorithms is that you need to be able to code the solution in the form of a string, and then define functions such as “crossover” and “mutation”. Given that I was looking for a subset, coding it as a string was rather easy, and since I had unordered subsets, the crossover was also easy – basic random number generation. Within fifteen minutes of deciding I should use GA, the entire algorithm was in my head. It was only a question of implementing it.

As I started writing the code, I started getting fantasies of being able to finish it in an hour and then write a blog post about it. As it happened, it took much longer. The first cut took some three hours (including some breaks), and it wasn’t particularly good, and was slow to converge.  I tweaked things around a bit but things didn’t improve by much.

And that was when I realise that I had done the crossover wrong – when two strings have elements in common and need to be crossed over, I had to take care that elements in common did not repeat into the same “child” (needed the subsets to be of a certain length). So that needed some twist in the code. That done, the code still seemed inefficient.

I had been doing the crossover wrong. If I started off with 10 strings, I would form 5 pairs from them (each participating in exactly one pair) which would result in 10 new strings. And then I would put these 20 (original 10 and new 10) strings through a fitness test and discard the weakest 10. And iterate. The problem was that the strongest strings had as much of a chance of reproducing as the weakest. This was clearly not right.

So I tweaked the code so that the fitter strings had a higher chance of reproducing than the less fit. This required me to put the fitness test at the beginning of each iteration rather than the end. I had to refactor the code a little bit to make sure I didn’t repeat computations. Now I was drawing pairs of strings from the original “basket” and randomly crossing them over. And putting them through the fitness test. And so forth.

I’m extremely happy with the results of the algorithm. I’ve got just the kind of output that I had expected. More importantly, I was extremely happy with the process of coding the whole thing in. I did the entire coding in R, which is what I use for my data analysis (data size meant I didn’t need anything quicker).

The more interesting part is that this only solved a very small part of the problem I’m trying to solve for my client. Tomorrow I’m back to solving a different part of the problem. Genetic algorithms have served their purpose. Back when I started this assignment I had no clue I would be using genetic algorithms. In fact, I had no clue what techniques I might use.

Which is why I get annoyed when people ask me what kind of techniques I use in my problem solving. Given the kind of problems I take on, most will involve a large variety of math, CS and statistics techniques, each of which will only play a small role in the entire solution. This is also the reason I get annoyed when people put methods they are going to use to solve the problem on their pitch-decks. To me, that gives an impression that they are solving a toy version of the problem and not the real problem – or that the consultants are grossly oversimplifying the problem to be solved.

PS: Much as some people might describe it that way, I wouldn’t describe Genetic Algorithms as “machine learning”. I think there’s way too much intervention on the part of the programmer for it to be described thus.

Making coding cool again

I learnt to code back in 1998. My aunt taught me the basics of C++, and I was fascinated by all that I could make my bad old x386 computer to do. Soon enough I was solving complex math problems, and using special ASCII characters to create interesting pattens on screen. It wasn’t long before I wrote the code for two players sitting on the same machine to play Pong. And that made me a star.

I was in a rather stud class back then (the school I went to in class XI had a reputation for attracting toppers), and after a while I think I had used my coding skills to build a reasonable reputation. In other words, coding was cool. And all the other kids also looked up to coding as a special skill.

Somewhere down the line, though I don’t remember when it was, coding became uncool. Despite graduating with a degree in Computer Science from IIT Madras, I didn’t want a “coding job”. I ended up with one, but didn’t want to take it, and so I wrote some MBA entrance exams, and made my escape that way.

By the time I graduated from my MBA, coding had become even more uncool. If you were in a job that required you to code, it was an indication that you were in the lowest rung, and thus not in a “management job”. Perhaps even worse, if your job required you to code, you were probably in an “IT job”, something that was back then considered as being a “dead end” and thus not a very preferred job. Thus, even if you coded in your job, you tended to downplay it. You didn’t want your peers to think you were either in a “bottom rung” job or in an “IT job”. So I wrote fairly studmax code (mostly using VB on Excel) but didn’t particularly talk about it when I met my MBA friends. As I moved jobs (they became progressively studder) my coding actually increased, but I continued to downplay the coding bit.

And I don’t think it’s just me. Thanks to the reasons explained above, coding is considered uncool among most MBA graduates. Even most engineering graduates from good colleges don’t find coding cool, for that is the job that their peers in big-name big-size dead-end-job software services companies do. And if people consider coding uncool, it has a dampening impact on the quality of talent that goes into jobs that involves coding. And that means code becomes less smart. And so forth.

So the question is how we can make coding cool again. I’m not saying it’s totally uncool. There are plenty of talented people who want to code, and who think it’s cool. The problem though is that the marginal potential coder is not taking to coding because he thinks that coding is not cool enough. And making coding cool will make more people want to take it up, which will lead to greater number of people take up this vocation!

Any ideas?

Hedgehogs and foxes: Or, a day in the life of a quant

I must state at the outset that this post is inspired by the second chapter of Nate Silver’s book The Signal and the Noise. In that chapter, which is about election forecasting, Silver draws upon the old Russian parable of the hedgehog and the fox. According to that story, the fox knows several tricks while the hedgehog knows only one – curling up into a ball. The story ends in favour of the hedgehog, as none of the tricks of the unfocused fox can help him evade the predator.

Most political pundits, says Silver, are like hedgehogs. They have just one central idea to their punditry and they tend to analyze all issues through that. A good political forecaster, however, needs to be able to accept and process any new data inputs, and include that in his analysis. With just one technique, this can be hard to achieve and so Silver says that to be a good political forecaster one needs to be a fox. While this might lead to some contradictory statements and thus bad punditry, it leads to good forecasts. Anyway, you can know about election forecasting from Silver’s book.

The world of “quant” and “analytics” which I inhabit is again similarly full of hedgehogs. You have the statisticians, whose solution for every problem is a statistical model. They can wax eloquent about Log Likelihood Estimators but can have trouble explaining why you should use that in the first place. Then you have the banking quants (I used to be one of those), who are proficient in derivatives pricing, stochastic calculus and partial differential equations, but if you ask them why a stock price movement is generally assumed to be lognormal, they don’t have answers. Then you have the coders, who can hack, scrape and write really efficient code, but don’t know much math. And mathematicians who can come up with elegant solutions but who are divorced from reality.

While you might make a career out of falling under any of the above categories, to truly unleash your potential as a quant, you should be able to do all. You should be a fox and should know each of these  tricks. And unlike the fox in the Old Russian fairy tale, the key to being a good fox is to know what trick to use when. Let me illustrate this with an example from my work today (actual problem statement masked since it involves client information).

So there were two possible distributions that a particular data point could have come from and I had to try and analyze which of them it came from (simple Bayesian probability, you might think). However, calculating the probability wasn’t so straightforward, as it wasn’t a standard function. Then I figured I could solve the probability problem using the inclusion-exclusion principle (maths again), and wrote down a mathematical formulation for it.

Now, I was dealing with a rather large data set, so I would have to use the computer, so I turned my mathematical solution into pseudo-code. Then, I realized that the pseudo-code was recursive, and given the size of the problem I would soon run out of memory. I had to figure out a solution using dynamic programming. Then, following some more code optimization, I had the probability. And then I had to go back to do the Bayesian analysis in order to complete the solution. And then present the solution in the form of a “business solution”, with all the above mathematical jugglery being abstracted from the client.

This versatility can come in handy in other places, too. There was a problem for which I figured out that the appropriate solution involved building a classification tree. However, given the nature of the data at hand, none of the off-the-shelf classification tree algorithms for were ideal. So I simply went ahead and wrote my own code for creating such trees. Then, I figured that classification trees are in effect a greedy algorithm, and can lead to getting stuck at local optima. And so I put in a simulated annealing aspect to it.

While I may not have in depth knowledge of any of the above techniques (to gain breadth you have to sacrifice depth), that I’m aware of a wide variety of techniques means I can provide the solution that is best for the problem at hand. And as I go along, I hope to keep learning more and more techniques – even if I don’t use them, being aware of them will lead to better overall problem solving.