Cliquebusting

Last evening we hosted a party at home. Like all parties we host, we used Graph Theory to plan this one. This time, however, we used graph theory in a very different way to how we normally use it – our intent was to avoid large cliques. And, looking back, I think it worked.

First, some back story. For some 3-4 months now we’ve been planning to have a party at home. There has been no real occasion accompanying it – we’ve just wanted to have a party for the heck of it, and to meet a few people.

The moment we started planning, my wife declared “you are the relatively more extrovert among the two of us, so organising this is your responsibility”. I duly put NED. She even wrote a newsletter about it.

The gamechanger was this podcast episode I listened to last month.

The episode, like a lot of podcast episodes, is related to this book that the guest has written. Something went off in my head as I listened to this episode on my way to work one day.

The biggest “bingo” moment was that this was going to be a strictly 2-hour party (well, we did 2.5 hours last night). In other words, “limited liability”!!

One of my biggest issues about having parties at my house is that sometimes guests tend to linger on, and there is no “defined end time”. For someone with limited social skills, this can be far more important than you think.

The next bingo was that this would be a “cocktail” party (meaning, no main course food). Again that massively brought down the cost of hosting – no planning menus, no messy food that would make the floor dirty, no hassles of cleaning up, and (most importantly) you could stick to your 2 / 2.5 hour limit without any “blockers”.

Listen to the whole episode. There are other tips and tricks, some of which I had internalised ahead of yesterday’s party. And then came the matter of the guest list.

I’ve always used graph theory (coincidentally my favourite subject from my undergrad) while planning parties. Typical use cases have been to ensure that the graph is connected (everyone knows at least one other person) and that there are no “cut vertices” (you don’t want the graph to get disconnected if one person doesn’t turn up).

This time we used it in another way – we wanted the graph to be connected but not too connected! The idea was that if there are small groups of guests who know each other too well, then they will spend the entirety of the party hanging out with each other, and not add value to the rest of the group.
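To make this concrete, here is a minimal sketch of the kind of checks I have in mind, in Python using networkx (the names and the "knows each other well" edges are made up purely for illustration):

```python
import networkx as nx

# Hypothetical guest list: an edge means the two people know each other well
G = nx.Graph()
G.add_edges_from([
    ("Asha", "Bharath"), ("Bharath", "Chitra"), ("Chitra", "Deepak"),
    ("Deepak", "Asha"), ("Deepak", "Esha"), ("Esha", "Farhan"),
])

# Connected: everyone knows at least one other person
print(nx.is_connected(G))

# Cut vertices: the party shouldn't fall apart if one person cancels
print(list(nx.articulation_points(G)))

# Large cliques: groups who know each other too well
print([c for c in nx.find_cliques(G) if len(c) >= 4])
```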

Related to this was the fact that we had pre-decided that this party is not going to be a one-off, and we will host regularly. This made it easier to leave out people – we could always invite them the next time. Again, it is important that the party was “occasion-less” – if it is a birthday party or graduation party or wedding party or some such, people might feel offended that you left them out. Here, because we know we are going to do this regularly, we know “everyone’s number will come sometime”.

I remember the day we made the guest list. “If we invite X and Y, we cannot invite Z since she knows both X and Y too well”. “OK let’s leave out Z then”. “Take this guy’s name off the list, else there will be too many people from this hostel”. “I’ve met these two together several times, so we can call exactly one of them”. And so on.

With the benefit of hindsight, it went well. Everyone who said they would turn up turned up. There were fourteen adults (including us), which meant that there were at least three groups of conversation at any point in time – the “anti two pizza rule” I’ve written about. So a lot of people spoke to a lot of other people, and it was easy to move across groups.

I had promised to serve wine and kODbaLe, and kept it – kODbaLe is a fantastic party food in that it is large enough that you don’t eat too many in the course of an evening, and it doesn’t mess up your fingers. So no need of plates, and very little use of tissues. The wine was served in paper cups.

I wasn’t very good at keeping up timelines – maybe I drank too much wine. The party was supposed to end at 7:30, but it was 7:45 when I banged a spoon on a plate to get everyone’s attention and inform them that the party was over. In another ten minutes, everyone had left.

Stable Diffusion and Chat GPT and Logistic Regression

For a long time I have had this shibboleth on whether someone is a “statistics person” or a “machine learning person”. It is based on what they call regressions where the dependent variable is binary. Statisticians simply call it “logit” (there is also a “probit”), while machine learning people call it “logistic regression”.

Now, in terms of implementation as well, there is one big difference between the way “logit” is modelled versus “logistic regression”. For a logit model (if you are using python, you need to use the “statsmodels” package for this, not scikit learn), the number of observations needs to far exceed the number of independent variables.

Else, a matrix that needs to be inverted as part of the solution will turn out to be singular, and there will be no solution. I guess I betrayed my greater background in statistics than in Machine Learning when, in 2018, I wrote this blogpost on machine learning being a “process to tie down coefficients in maths models“.

“Logistic regression” (as opposed to “logit”) puts no such constraint – the regression matrix need not be invertible. Instead of actually inverting a matrix, machine learning approaches simply learn the coefficients iteratively using gradient descent (basically the opposite of hill climbing), so mathematical inconveniences such as matrices that cannot be inverted are moot there.
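Here is a minimal sketch of the contrast, with made-up data that has far more variables than observations – statsmodels’ logit will typically throw a singular-matrix (or perfect-separation) error, while scikit-learn’s logistic regression, which optimises iteratively with regularisation by default, happily returns coefficients:

```python
import numpy as np
import statsmodels.api as sm
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_obs, n_vars = 50, 200               # far more variables than observations
X = rng.normal(size=(n_obs, n_vars))
y = rng.integers(0, 2, size=n_obs)

# The "statistics" way: maximum likelihood, which needs to invert a matrix
try:
    sm.Logit(y, sm.add_constant(X)).fit(disp=0)
    print("statsmodels logit somehow converged")
except Exception as e:
    print("statsmodels logit failed:", type(e).__name__)

# The "machine learning" way: iterative optimisation, no inversion required
clf = LogisticRegression(max_iter=1000).fit(X, y)
print("sklearn fitted", clf.coef_.shape[1], "coefficients")
```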

And so you have logistic regression models with thousands of variables, often calibrated with fewer data points than there are variables. To be honest, I can’t understand this fully – without sufficient information (data points) to calibrate the coefficients, there will always be a sense of randomness in the output. The model has too many degrees of freedom, and so there is additional information the model is supplying (apart from what was supplied in the training data!).

Of late I have been playing a fair bit with generative AI (primarily ChatGPT and Stable Diffusion). The other day, my daughter and I were alone in my in-laws’ house, and I told her “look, I’ve brought my personal laptop along, if you want we can play with it”. She demanded that she “play with Stable Diffusion”. This is the image she got for “tiger chasing deer”.

I have written earlier here about how the likes of ChatGPT and Stable Diffusion in a way redefine “information content“.


And if you think about it, almost by definition, “generative AI” creates information (and hallucinates, like in the above pic). Traditionally speaking, a “picture is worth a thousand words”, but if you can generate a picture with just a few words of prompt, the information content in it is far less than a thousand words.

In some sense, this reminds me of “logistic regression” once again. By definition (because it is generative), there is insufficient “tying down of coefficients”, because of which the AI inevitably ends up “adding value of its own”, which by definition is random.

So, you will end up getting arbitrary results. ChatGPT often gives you wrong answers to questions. Dall-E and Midjourney and Stable Diffusion will return nonsense images such as the above. Because a “generative AI” needs to create information, by definition, all the coefficients of the model cannot be well calibrated. 

And the consequence of this is that however good these AIs get, however much data is used to train them, there will always be an element of randomness to them. There will always be test cases where they give funny results.

No, AGI is not here yet.

Computer science and psychology

This morning, when I got back from the gym, my wife and daughter were playing 20 questions, with my wife having just taught my daughter the game.

Given that this was the first time they were playing, they started with guessing “2 digit numbers”. And when I came in, they were asking questions such as “is this number divisible by 6” etc.

To me this was obviously inefficient. “Binary search is O(log n)“, I realised in my head, and decided this is a good time to teach my daughter binary search.

So for the next game, I volunteered to guess, and started with “is the number \ge 55?”. I went on to “is the number \ge 77?”, and got to the number in my wife’s mind (74) in exactly 7 guesses (you might notice that \lceil \log_2 90 \rceil is 7, 90 being the number of 2-digit numbers).

And so we moved on. Next, I “kept” 41, and my wife went through a rather random series of guesses (including “is it divisible by 4” fairly early on) to get it in 8 tries. By this time I was feeling massively proud of putting my computer science knowledge to good use in real life.

“See, you keep saying that I’m not a good engineer. See how I’m using skills that I learnt in my engineering to do well in this game”, I exclaimed. My wife didn’t react.

It was finally my daughter’s turn to keep a number in mind, and my turn to guess.

“Is the number \ge 55?”
“Yes”

“Is the number \ge 77?”
“Yes”

“Is the number \ge 88?”
“Yes”

My wife started grinning. I ignored it and continued with my “process”, and I got to the right answer (99) in 6 tries. “You are stupid and know nothing”, said my wife. “As soon as she said it’s greater than 88, I knew it is 99. You might be good at computer science but I’m good at psychology”.

She had a point. And then I started thinking – basically the binary search method works under the assumption that the numbers are all uniformly distributed. Clearly, my wife had superior information to mine, which made 99 far more probable than any number between 89 and 98. And so when the answer to “Is the number \ge 88?” turned out to be “yes”, she made an educated guess that it was 99.

And since I’m used to writing algorithms, and teaching dumb computers to solve problems, I used a process that didn’t make use of any educated guesses! And thus took far more steps to get to the answer.

When the numbers don’t follow a uniform distribution, binary search works differently. You don’t start with the middle number – instead, you start with the weighted median of all the numbers! And then go on to the weighted median of whichever half you end up in. And so on and so forth until you find the number in the counterparty’s mind. That is the optimal algo.
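Here is a minimal sketch of that idea (the function and the prior below are entirely my own made-up illustration): with uniform weights it reduces to the ordinary binary search I used, while a prior that puts extra weight on a “psychologically attractive” number like 99 gets there much faster.

```python
import numpy as np

def guess_count(secret, numbers, weights):
    """How many '>= x?' questions to find `secret`, splitting at the
    weighted median of whatever candidates remain."""
    lo, hi = 0, len(numbers)              # candidates are numbers[lo:hi]
    questions = 0
    while hi - lo > 1:
        cum = np.cumsum(weights[lo:hi]) / np.sum(weights[lo:hi])
        split = lo + int(np.searchsorted(cum, 0.5)) + 1
        split = min(max(split, lo + 1), hi - 1)   # keep both halves non-empty
        questions += 1                    # "is the number >= numbers[split]?"
        if secret >= numbers[split]:
            lo = split
        else:
            hi = split
    return questions

numbers = list(range(10, 100))            # the two-digit numbers
uniform = np.ones(len(numbers))
skewed = np.ones(len(numbers))
skewed[numbers.index(99)] = 30            # a made-up prior favouring 99

print(guess_count(74, numbers, uniform))  # 7 questions – plain binary search
print(guess_count(99, numbers, skewed))   # far fewer questions for 99
```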

Then again, how do you figure out what the prior distribution of numbers is? For that, I guess knowing some psychology helps.


Chat GPT and Information Content

For the last week or so I’ve been fooling around with Chat GPT. It’s been a lot of fun, if you can endure the verbosity of its answers. Yesterday, at work, I used it for tech support when some people I was talking to were using too much jargon.

Before I came home from work yesterday, my last conversation was with a couple of recruiters, about a role that I’m hiring for. So when I got home, on a whim, I decided to check if Chat GPT could write job descriptions for me.

I asked:

write a job description for a senior data scientist at India’s largest logistics company. 4-7 years exp. Needs to have experience in ML, OR and statistics, and be able to work close to business and on ambiguous problems

It answered (pasting in full since the screenshot above is limited):

Job Title: Senior Data Scientist

Company: India’s Largest Logistics Company

Location: India

Experience: 4-7 years

We are seeking an experienced and highly skilled Senior Data Scientist to join our team at India’s largest logistics company. In this role, you will be responsible for using machine learning, operations research, and statistics to drive business insights and decision making.

As a Senior Data Scientist, you will have a deep understanding of data science techniques and technologies, and will be able to apply them to complex and ambiguous problems. You will be able to work closely with business teams to understand their needs and to develop solutions that drive value for the company.

To be successful in this role, you should have at least 4-7 years of experience in data science, and should have a strong background in machine learning, operations research, and statistics. You should be able to work independently and be comfortable working on complex and ambiguous problems.

Responsibilities:

  • Use machine learning, operations research, and statistics to drive business insights and decision making
  • Work closely with business teams to understand their needs and develop solutions
  • Develop and maintain data pipelines and systems
  • Collaborate with other data scientists and cross-functional teams to develop and implement solutions

Qualifications:

  • Bachelor’s or Master’s degree in a quantitative field, such as computer science, statistics, or engineering
  • At least 4-7 years of experience in data science
  • Strong background in machine learning, operations research, and statistics
  • Ability to work independently and on complex and ambiguous problems
  • Excellent communication and collaboration skills
  • Experience with Python, R, or other programming languages is a plus

We offer a competitive salary and benefits package, as well as opportunities for growth and advancement. If you are an experienced data scientist with a passion for driving business insights and decision making, we encourage you to apply.

In the past when I’ve been hiring, I’ve been a bit annoyed by people asking for job descriptions. I’ve largely found them to be fluff and boilerplate, and I’ve tried to get away with a sentence or two. However, due to the insistence of recruiters and candidates alike, I’ve ended up writing elaborate job descriptions. I don’t need to any more.

This post, however, is not about hiring (but yeah – I am hiring, and the above job description is reasonably accurate). It’s about information content in general.

Traditionally, information content has been measured in bits. A bit is the amount of information contained in an equi-probable binary event.
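In formula terms (this is the standard Shannon definition), an event with probability p carries -\log_2 p bits of information: a fair coin toss carries exactly -\log_2 \frac{1}{2} = 1 bit, while learning the outcome of a 1-in-8 event gives you 3 bits.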

Sometimes when we find that someone is too verbose and using too many words when fewer would suffice, we say that their bit rate is low. We also use “low bit rate” to describe people such as former Prime Minister Atal Behari Vajpayee, who would speak incredibly slowly.

However, beyond the bit, which is a fairly technical concept, it has been difficult to quantify information content. Sometimes you read an article or a story and find that there is nothing much to it. But given the natural language, and the context of various words, it is impossible to quantify the information content.

Now, with Chat GPT, maybe it becomes a bit easier (though one would need a “reverse chat GPT algo”, to find the set of prompts required for Chat GPT to churn out a particular essay). Above, for example, I’ve shown how much fluff there generally is to the average job description – a fairly short prompt generated this longish description that is fairly accurate.

So you can define the information content of a piece or essay in terms of the number of words in the minimum set of prompts required for Chat GPT (or something like it) to come up with it. If you are a boring, stereotypical writer, the set of prompts required will be small. If you are highly idiosyncratic, then you will need to give a larger number of prompts for Chat GPT to write like you. You know where I’m going.

This evening, in office, a colleague commented that now it will be rather easy to generate marketing material. “Even blogs might become dead, since with a few prompts you can get that content”, he said (it can be a legit service to build off the Chat GPT API to take a tweet and convert it into an essay).

I didn’t tell him then but I have decided to take it up as a challenge. I consider myself to be a fairly idiosyncratic writer, which means I THINK there is a fair bit of information content in what I write, and so this blog will stay relevant. Let’s see how it goes.

PS: I still want to train a GAN on my blog (well over a million words, at last count) and see how it goes. If you know of any tools I can use for this, let me know!


Mo Salah and Machine Learning

First of all, I’m damn happy that Mo Salah has renewed his Liverpool contract. With Sadio Mane also leaving, the attack was looking a bit thin (I was distinctly unhappy with the Jota-Mane-Diaz forward line we used in the Champions League final. Lacked cohesion). Nunez is still untested in terms of “leadership”, and without Salah that would’ve left Firmino as the only “attacking leader”.

(non-technical readers can skip the section in italics and still make sense of this post)

Now that this is out of the way, I’m interested in seeing one statistic (for which I’m pretty sure I don’t have the data). For each of the chances that Salah has created, I want to look at the xG (expected goals) and whether he scored or not. And then look at a density plot of xG for both categories (scored or not). 

For most players, this is likely to result in two very distinct curves – they are likely to score from a large % of high xG chances, and almost not score at all from low xG chances. For Salah, though, the two density curves are likely to be a lot closer.

What I’m saying is – most strikers score well from easy chances, and fail to score from difficult chances. Salah is not like that. On the one hand, he creates and scores some extraordinary goals out of nothing (low xG). On the other, he tends to miss a lot of seemingly easy chances (high xG).
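I don’t have the underlying data, but if I did, the plot I have in mind would look something like this sketch (the numbers below are entirely made up for illustration):

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

rng = np.random.default_rng(11)

# Made-up chances: each row is one chance, with its xG and whether it was scored
xg = rng.beta(2, 5, size=500)        # mostly low-xG chances, a few sitters
scored = rng.binomial(1, xg)         # the outcome is noisy by construction
chances = pd.DataFrame({"xg": xg, "scored": scored})

# Density of xG, separately for scored and missed chances. For a "predictable"
# striker the two curves sit far apart; the claim is that for Salah they
# would sit much closer together.
sns.kdeplot(data=chances, x="xg", hue="scored", common_norm=False)
plt.xlabel("xG of the chance")
plt.show()
```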

In fact, it is quite possible to look at a player like Salah, see a few sitters that he has missed (he misses quite a few of them), and think he is a poor forward. And if you look at a small sample of data (or short periods of time) you are likely to come to the same conclusion. Look at the last 3-4 months of the 2021-22 season. The consensus among pundits then was that Salah had become poor (and on Reddit, you could see Liverpool fans arguing that we shouldn’t give him a lucrative contract extension since ‘he has lost it’).

It is well possible that this is exactly the conclusion Jose Mourinho came to back in 2013-14 when he managed Salah at Chelsea (and gave him very few opportunities). The thing with a player like Salah is that he is so unpredictable that it is very possible to see samples and think he is useless.

Of late, I’ve been doing (rather, supervising (and there is no pun intended)) a lot of machine learning work. A lot of this has to do with binary classification – classifying something as either a 0 or a 1. Data scientists build models, which give out a probability score that the thing is a 1, and then use some (sometimes arbitrary) cutoff to determine whether the thing is a 0 or a 1.

There are a bunch of metrics in data science on how good a model is, and it all comes down to what the model predicted and what “really” happened. And I’ve seen data scientists work super hard to improve on these accuracy measures. What can be done to predict a little bit better? Why is this model only giving me 77% ROC-AUC when for the other problem I was able to get 90%?

The thing is – if the variable you are trying to predict is something like whether Salah will score from a particular chance, your accuracy metric will be really low indeed. Because he is fundamentally unpredictable. It is the same with some of the machine learning stuff – a lot of models are trying to predict something that is fundamentally unpredictable, so there is a limit on how accurate the model will get.
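A small simulation of that point (all the numbers are made up): even a model that knows the true probability of each event cannot get anywhere near a perfect ROC-AUC, simply because the outcomes themselves are random.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(42)
n = 100_000

# True probability of "scoring" each chance -- think of it as the chance's xG
p_true = rng.uniform(0.05, 0.6, size=n)
# The outcome is genuinely random given that probability
y = rng.binomial(1, p_true)

# Score the chances with the true probabilities -- the best any model can do
print(roc_auc_score(y, p_true))      # well short of 1.0, by construction
```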

The problem is that you would have come across several problem statements that are much more predictable, so you think it is a problem with you (or your model) that you can’t predict better. Pundits (or Jose) would have seen so many strikers who predictably score from good chances that they think Salah is not good.

The solution in these cases is to look at aggregates. Looking at each single prediction will not take us anywhere. Instead, can we predict over a large set of data whether we broadly got it right? In my “research” for this blogpost, I found this.

Last season, on average, Salah scored precisely as many goals as the model would’ve predicted! You might remember stunners like the one against Manchester City at Anfield. So you know where things got averaged out.

Conductors and CAPM

For a long time I used to wonder why orchestras have conductors. I possibly first noticed the presence of the conductor sometime in the 1990s when Zubin Mehta was in the news. And then I always wondered why this person, who didn’t play anything but stood there waving a stick, needed to exist. Couldn’t the orchestra coordinate itself like rockstars or practitioners of Indian music forms do?

And then I came across this video a year or two back.

And then the computer science training I’d gone through two decades back kicked in – the job of an orchestra conductor is to reduce an O(n^2) problem to an O(n) problem.

For a group of musicians to make music, they need to coordinate with each other. Yes, they have the staff notation and all that, but they still need to know when to speed up or slow down, when to make what transitions, etc. They may have practised together but the professional performance needs to be flawless. And so they need to constantly take cues from each other.

When you have n musicians who need to coordinate, you have \frac{n(n-1)}{2} pairs of people who need to coordinate. When n is small, this is trivial, and so you see that small ensembles or rock bands can easily coordinate. However, as n gets large, n^2 grows well-at-a-faster-rate. And that is a problem, and a risk.

Enter the conductor. Rather than taking cues from one another, the musicians now simply need to take cues from this one person. And so there are now only n pairs that need to coordinate – each musician in the band with the conductor. Or an O(n^2) problem has become an O(n) problem!
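To put a (made-up but representative) number on it: a 60-piece orchestra has \frac{60 \times 59}{2} = 1770 pairs of musicians who could potentially need to coordinate, but only 60 musician-conductor links.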

For whatever reason, while I was thinking about this yesterday, I got reminded of legendary finance professor R Vaidya‘s class on capital asset pricing model (CAPM), or as he put it “Sharpe single index model” (surprisingly all the links I find for this are from Indian test prep sites, so not linking).

We had just learnt portfolio theory, and how using the expected returns, variances and correlations between a set of securities we could construct an “efficient frontier” of securities that could give us the best risk-adjusted return. Seemed very mathematically elegant, except that to construct a portfolio of n stocks, you needed of the order of n^2 correlations. In other words, an O(n^2) problem.

And then Vaidya introduced CAPM, which magically reduced the problem to an O(n) problem. By suddenly introducing the concept of an index, all that mattered for each stock now was its beta – the coefficient relating its returns to the index returns. You didn’t need to care about how stocks moved with each other any more – all you needed was each stock’s relationship with the index.
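In equations (this is the standard single index model, under the assumption that the stock-specific errors are uncorrelated with each other): each stock’s return is modelled as R_i = \alpha_i + \beta_i R_m + \epsilon_i, where R_m is the index return. The covariance between any two stocks then collapses to \beta_i \beta_j \sigma_m^2, so instead of estimating something of the order of n^2 pairwise relationships, you only need the n betas (plus the variance of the index).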

In a sense, if you think about it, the index in CAPM is like the conductor of an orchestra. If only all O(n^2) problems could be reduced to O(n) problems this elegantly!

Management and Verification

For those of you who are new here, my wife and I used to organise “NED Talks” in our home in Bangalore. The first edition happened in 2015 (organised on a whim), and encouraged by its success, we organised 10 more editions until 2019. We have put up snippets of some talks here.

In the second edition of the NED Talks (February 2015), we had a talk by V Vinay (noted computer scientist, former IISc professor, co-inventor of Simputer, co-founder of Strand Life Sciences, Ati Motors, etc. etc.), where he spoke about “computational complexity”.

Now, having studied computer science, “computational complexity” was not a new topic to me, but one thing that Vinay said has stayed with me – it is that verifying an algorithm is far more efficient than actually executing the algorithm.

To take a simple example, factorising a number into prime factors is computationally hard – no efficient algorithm for it is known (strictly speaking it is not known to be NP-hard, but it is a really hard problem in practice). However, verifying a prime factorisation is trivial – you can just multiply the factors and see if it gives back the number you started with.
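A toy illustration in Python (trial division massively understates how hard factorisation gets at cryptographic sizes, but the asymmetry is the point):

```python
from math import prod

def factorise(n):
    """Find the prime factorisation by trial division --
    the amount of work grows quickly as n gets large."""
    factors, d = [], 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)
    return factors

def verify(n, factors):
    """Verifying a claimed factorisation is just one multiplication
    (strictly, you'd also check each factor is prime, which is also easy)."""
    return prod(factors) == n

print(factorise(9991))             # [97, 103]
print(verify(9991, [97, 103]))     # True
```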

I was thinking about this paradigm the other day when I was thinking about professional managers – several times in life I have wondered “how can this person manage this function when he/she has no experience in that function?”. Maybe it is because I had been subjected to two semesters of workshop at the beginning of my engineering, but I have intuitively assumed that you can only manage stuff that you have personally done – especially if it is a non-trivial / specialist role.

But then – if you think about it, at some level, management is basically about “verification”. To see whether you have done your work properly, I don’t need to precisely know how you have done it. All I need to know is whether you have done bullshit – which means, I don’t need to “replicate your algorithm”. I only need to “verify your algorithm”, which computer science tells us can be an order of magnitude simpler than actually building the algorithm.

The corollary of this is that if you have managed X, you need not be good at X, or even have actually done X. All it shows is that you know how to manage X, which can be an order of magnitude simpler than actually doing X.

This also (rather belatedly) explains why I have largely been wary of hiring “pure managers” for my team. Unless they have been hands on at their work, I start wondering if they actually know how to do it, or only know how to manage it (and I’m rather hands on, and only hire hands on people).

And yet another corollary is that if you have spent too long just managing teams, you might have gotten so used to just verifying algorithms that you can’t write algorithms any more.

And yet another before I finish – computer science has a lot of lessons to offer life.


Compression Stereotypes

One of the most mindblowing things I learnt while I was doing my undergrad in Computer Science and Engineering was Lempel-Ziv-Welch (LZW) compression. It’s one of the standard compression algorithms used everywhere nowadays.

The reason I remember this is twofold – firstly, I remember implementing this as part of an assignment (our CSE program at IITM was full of those), and feeling happy to be coding in C rather than in the dreaded Java (which we had to use for most other assignments).

The other is that this is one of those algorithms that I “internalised” while doing something totally different – in this case I was having coffee/ tea with a classmate in our hostel mess.

I won’t go into the algorithm here. However, the basic concept is that as and when we see a new pattern, we give it a code, and every subsequent occurrence of that pattern is replaced by its corresponding code. And the beauty of it is that you don’t need to ship a separate dictionary – the compressed code itself encapsulates it.
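For the curious, here is a minimal sketch of the compression half in Python (it ignores the bit-packing details and is nowhere near a production implementation):

```python
def lzw_compress(text):
    """Replace patterns with codes, growing the dictionary as we go."""
    dictionary = {chr(i): i for i in range(256)}   # start with single characters
    next_code = 256
    w, codes = "", []
    for c in text:
        wc = w + c
        if wc in dictionary:
            w = wc                      # keep extending a known pattern
        else:
            codes.append(dictionary[w])
            dictionary[wc] = next_code  # a new pattern gets the next code
            next_code += 1
            w = c
    if w:
        codes.append(dictionary[w])
    return codes

# Repetitive input compresses well, and the decompressor can rebuild the same
# dictionary from the codes alone, so nothing extra needs to be shipped.
s = "banana bandana banana bandana"
print(len(s), len(lzw_compress(s)))    # fewer codes than characters
```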

Anyway, in practical terms, the more the same kind of patterns are repeated in the original file, the more the file can be compressed. In some sense, the more the repetition of patterns, the less the overall “information” that the original file can carry – but that discussion is for another day.

I’ve been thinking of compression in general and LZW compression in particular when I think of stereotyping. The whole idea of stereotyping is that we are fundamentally lazy, and want to “classify” or categorise or pigeon-hole people using the fewest number of bits necessary.

And so, we use lazy heuristics – gender, caste, race, degrees, employers, height, even names, etc. to make our assumptions of what people are going to be like. This is fundamentally lazy, but also effective – in a sense, we have evolved to stereotype people (and objects and animals) because that allows our brain to be efficient; to internalise more data by using fewer bits. And for this precise reason, to some extent, stereotyping is rational.

However, the problem with stereotypes is that they can frequently be wrong. We might see a name and assume something about a person, and they might turn out to be completely different. The rational response to this is not to beat oneself for stereotyping in the first place – it is to update one’s priors with the new information that one has learnt about this person.

So, you might have used a combination of pre-known features of a person to categorise him/her. The moment you realise that this categorisation is wrong, you ought to invest additional bits in your brain to classify this person so that the stereotype doesn’t remain any more.

The more idiosyncratic and interesting you are, the more the number of bits that will be required to describe you. You are very very different from any of the stereotypes that can possibly be used to describe you, and this means people will need to make that effort to try and understand you.

One of the downsides of being idiosyncratic, though, is that most people are lazy and won’t make the effort to use the additional bits required to know you, and so will grossly mischaracterise you using one of the standard stereotypes.

On yet another tangential note, getting to know someone is a Bayesian process. You make your first impressions of them based on whatever you find out about them, and go on building a picture of them incrementally based on the information you find out about them. It is like loading a picture on a website using a bad internet connection – first the picture appears grainy, and then the more idiosyncratic features can be seen.

The problem with refusing to use stereotypes, or demonising stereotypes, is that you fail to use the grainy pictures when that is the best available, and instead infinitely wait to get better pictures. On the other hand, failing to see beyond stereotypes means that you end up using grainy pictures when more clear ones are available.

And both of these approaches are suboptimal.

PS: I’ve sometimes wondered why I find it so hard to remember certain people’s faces. And I realise that it’s usually because they are highly idiosyncratic and not easy to stereotype / compress (both are the same thing). And so it takes more effort to remember them, and if I don’t really need to remember them so much, I just don’t bother.

How Python swallowed R

A week ago, I put a post on LinkedIn saying if someone else working in analytics / data science primarily uses R for their work, I would like to chat.

I got two responses, one of which was from a guy who strictly isn’t in analytics / data science, but needs to analyse large amounts of data for his work. I had a long chat with the other guy today.

Yesterday I put the same post on Twitter, and have got a few more responses from there. However, the overall pattern is staggering – an overwhelming majority of the data people I know work in Python. One of the reasons I put up these posts was to assure myself that I’m not alone in using R, though the response so far hasn’t given me much of that assurance.

So why do most companies end up using Python for analytics, even when R is clearly better for things like data wrangling, reporting, visualisation, dashboarding, etc.? I have a few theories on this, and I think all of them come together to result in python having its “overwhelming marketshare” (at least among people I know).

Tech people clearly prefer python since it’s easier to integrate. So the tech leaders request the data science leaders to use Python, since it is much easier for the tech people. In a lot of organisations, data science reports into tech, so this request is honoured.

Even if it isn’t, if you recall, “data scientists” are generally tech facing rather than business facing. This means that the models they build need to be codified, and added to the company’s code base. This means necessarily working together with tech, and this means using a programming language that tech is comfortable with.

Then, this spills over. Usually, someone has the bright idea that the firm shouldn’t use two languages for what is essentially the same thing. And so the analytics people are also forced to use python for their analytics, even if it isn’t built for the purpose. And then it spreads.

Next is the “cool factor”. There is this impression that the more technical a solution is, the more superior it is, even if it has no direct business impact (an employer had once told me, “I have raised money saying we are using machine learning. If our investors see the algorithms you’re proposing, they’ll want their money back”).

So a youngster getting into data wants to do “all the latest stuff”. This means machine learning. Deep learning. Reinforcement learning. And all that. There is an impression that this kind of work is “better work” compared to let’s say generating business insights using data. And in general, the packages for machine learning have traditionally been easier in Python than they are in R (though R is fast catching up, and in general python is far behind R when it comes to user friendliness).

Then, the growth in data and the jobs associated with it, such as machine learning and data engineering, has meant that a lot of formerly tech people have got into data work. Python is fundamentally a programming language, with a package (pandas) added on to do data work. Techies find it far more intuitive than R, which is fundamentally statistical software. On the other hand, people who are coming from a business / Excel background find it far more comfortable to use R, and can find Python intimidating (I fall in this bucket).

So yeah – the tech integration, the number of tech people who are coming into data and the “cool factor” associated with the more techie stuff means that Python is gaining, at R’s expense (in my circle at least).

In any case I’m going to continue to use R. I’m at least 10X faster in R than I am in Python, and having used R for 12 years now, I’m too used to that way of working to change things up.

Jio, Amazon and Information Content

A long long time ago I had installed the Jio Cinema app on my Fire TV Stick. I had perhaps watched two movies on that, and then completely forgotten about it. This evening, I had to look for a movie to watch with the wife, and having exhausted most of the “compatible content” (stuff we can watch together on Netflix) and been exhausted by the user experience on Prime Video, I decided to give this app a try.

I ended up selecting a movie, which I later found out has a 4.5 IMDB rating and doesn’t even have a Wikipedia page. Needless to say, we abandoned the movie midway. That’s when the wife went in to put the daughter to bed and my fun began.

So Jio Cinema follows what I call the “Amazon paradigm for product management”. Since Amazon tries to sell every product (or service) as if it is a physical book, it has one single mantra for product management. “Improve selection and they will come”.

The user experience doesn’t matter. How easy the product is to use, and how pleasing it looks on the eye, and whether it has occasional bugs, is all secondary. All that matters is selection. Given that the company built its business on the back of selling “long tail” books, this is not so surprising, except that it doesn’t necessarily work in other categories.

I’ve written about Amazon’s ineptitude in product management before, in the context of that atrocity of an app called Sony Liv. The funny thing is that the Jio Cinema app (on Fire TV Stick) looks and feels pretty much exactly like Sony Liv. Maybe there is an open source shitty fire TV app that these guys have based themselves on?

In any case, I started browsing the Jio Cinema app, and I found something called “movies in 15 minutes“. Initially I thought it was a parody. The first few movies I noticed there were things I had never heard of. “This is perhaps for bad movies”, I reasoned. I kept scrolling, and more recognisable names popped up.

I decided to watch Deewana, which was released just before the start of my optimal age of movie appreciation, and which, for some reason, we didn’t get home a video cassette of.

It’s basically a collage of scenes from the movie. It’s like someone has put together a “highlights package”, taking all the important scenes and then putting them together.

And for a movie like Deewana it works. The 15 minute version had all the necessary plot elements to fully follow the movie. It is a great movie, for 15 minutes. Maybe at 30 minutes as well it might be a great movie. However, I can’t imagine having watched it in the full version.

That was two hours back. I’ve since gone crazy watching 15 minute versions of many other movies (mostly from the 70s and 80s, though they have movies as recent as Jab We Met). It’s been fantastic.

However, I have one crib. This has to do with information content. Essentially, the premise behind “movies in 15 minutes” is that the information content in these movies is so little that the whole thing can be compressed into 15 minutes. The problem is that not every movie has the same amount of information.

15 minutes was perfect for Deewana. It was also appropriate for Kasam Paida Karne Waali Ki, which I watched only because it gets referenced in Gangs of Wasseypur. Between these two, I “watched” Namak Halaal, and I couldn’t make head or tail of it. I had to go to Wikipedia to understand the plot.

Essentially the plot of Namak Halaal is complex enough, I imagine, that compressing it into 15 minutes is impossible without significant information loss. And the loss of information was so much that I couldn’t understand the summary at all. Maybe I’ll watch the movie in full some day.

I’m writing this blogpost after watching the 15 minute version of Don. I guess whoever made the summary realised that the movie is so complex that it can’t really be compressed into 15 minutes – and so they have added a voiceover to narrate the key elements.

In any case, I’m feeling super thrilled. I normally don’t watch movies because the bit rate in most movies is too low. Compression means that I can happily watch the movies without ever getting bored.

I wish they made these 15 minute versions of all movies. Jio, all (your Amazon-style product management) is forgiven.

Now on to Amar Akbar Anthony.