The Tube Strike Model For The Pandemic

In 2002, as part of my undergrad in computer science, I took a course in “Artificial Intelligence”. It was a “restricted elective” – you had to either take that or another course called “Artificial Neural Networks”. That Neural Networks was then considered disjoint from AI will tell you how the field of computer science has changed in the 15 years since I graduated.

In any case, as part of our course on AI, we learnt heuristics. These were approximate algorithms to solve a problem – they seldom did well in terms of worst-case complexity, but in most cases got the job done. Back then, the dominant discourse was that you had to tell a computer how to solve a problem, not just show it a large number of positive and negative examples and allow it to learn by itself (though that was the approach taken by the elective I did not elect for).

One such heuristic was Simulated Annealing. The problem with a classic “hill climbing” algorithm is that you can get caught in local optima. And the deterministic hill climbing algorithm doesn’t let you get off your local optimum to search for better optima. Hence there are variants. In Simulated Annealing, in the early part of the algorithm you are allowed to take big steps down (assuming you are trying to find the peak). As the algorithm progresses, it “cools down” (hence “simulated annealing”) and the extent to which you are allowed to climb down is massively reduced.
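
To make the idea concrete, here is a minimal sketch – not from the course, and with the function, step size and cooling schedule all made up for illustration – of simulated annealing trying to find the peak of a bumpy one-dimensional function:

```python
import math
import random

def simulated_annealing(f, x0, step=0.1, t0=1.0, cooling=0.99, iters=10000):
    """Maximise f starting from x0, occasionally accepting downhill moves."""
    x, best = x0, x0
    t = t0
    for _ in range(iters):
        candidate = x + random.uniform(-step, step)
        delta = f(candidate) - f(x)
        # Always accept uphill moves; accept downhill moves with a probability
        # that shrinks as the temperature "cools down".
        if delta > 0 or random.random() < math.exp(delta / t):
            x = candidate
            if f(x) > f(best):
                best = x
        t *= cooling  # annealing schedule: big downhill steps early, hardly any later
    return best

# A bumpy (made-up) function with several local peaks; plain hill climbing can get stuck.
bumpy = lambda x: math.sin(5 * x) + 0.5 * math.sin(x) - 0.01 * x * x
print(simulated_annealing(bumpy, x0=0.0))
```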

It is not just in algorithms, or in AI, that we get stuck in local optima. In a recent post, I had made a passing reference to a paper about the tube strikes of 2014.

It is clearly visible from the two panels that far fewer commuters were able to use their modal station during the strike, which implies that a substantial number of individuals were forced to explore alternative routes. The data also suggest that the strike brought about some lasting changes in behaviour, as the fraction of commuters that made use of their modal station seemingly drops after the strike (in the paper we substantiate this claim econometrically).

Screw the paper if you don’t want to read it. Basically the concept is that the strike of 2014 shook things up. People were forced to explore alternatives. And some alternatives stuck. In other words, a lot of people had got stuck in local maxima. And when an external event (the strike) pushed them off their local pedestals (figuratively speaking), they were able to find better maxima.

And that was only the result of a three-day strike. Now, the pandemic has gone on for 5-6 months (depending on the part of the world you are in). During this time, a lot of behaviours otherwise considered normal have been questioned by the very people who had been behaving thus. My theory is that a lot of these hitherto “normal behaviours” were essentially local optima. And with the pandemic forcing people to rethink their behaviours, they will find better optima.

I can think of a few examples from my own life.

  1. I wrote about this the other day. I had gotten used to a schedule of heavy weight lifting for my workouts. I had plateaued in all my lifts, and this meant that my upper body had plateaued at a rather suboptimal level. However much I tried to improve my bench press and shoulder press (using only these movements) the bar refused to budge. And my shoulders refused to get bigger. I couldn’t do a (palms facing away) pull up.
    Thanks to the pandemic, the gym shut, and I was forced to do body weight exercises at home. There was a limit on how much I could load my legs and back, so I focussed more on my upper body, especially doing different progressions of the pushup. And back in the gym today, I discovered I could easily do pullups now.

    Similarly, the progression of body weight squats I knew forced me to learn to squat deep (hamstrings touching calves). Today for the first time ever I did deep front squats. This means in a few months I can learn to clean.

  2. I was used to eating Milky Mist set curd (the one that comes in a 1kg box). It was nice and creamy and I loved eating it. It isn’t widely available, and there was one supermarket close to home from where I could get it. As soon as the lockdown happened, that supermarket shut. Even when it reopened it had long lines, and there were physical barricades between my house and the supermarket, so I couldn’t drive to it.

    In the meantime I figured that the guy who delivers milk to my door in the morning could deliver (Nandini) curd as well. And I started buying from him. Well, it’s not as creamy as Milky Mist, but it’s good enough. And I’m not going back.

  3. This was a see-saw. For the first month of the lockdown most bakeries nearby were shut. So I started trying out bread at this supermarket close to home (not the one where I got Milky Mist from). I loved it. Presently, bakeries reopened and the density of cases in Bangalore meant I became wary of going to supermarkets. So now we’ve shifted back to freshly baked bread from the local bakery.
  4. I’d tried intermittent fasting several times in life but had never been able to do it on a consistent basis. In the initial part of the lockdown good bread was hard to come by (since the bakeries had shut and I hadn’t discovered the supermarket bread yet). There had been a bird flu scare near Bangalore, so we weren’t buying eggs either. What do we do for breakfast? Just skip it. Now I have no problem not having breakfast at all.

The list goes on. And I’m sure this applies to you as well. Think of all the behavioural changes that the pandemic has forced on you, and think of which all you will go back on once it has passed. There is likely to be a set of behavioural changes that won’t change back.

Like how one in 20 passengers who changed routes following the 2014 tube strikes never went back to their earlier routes. Except that this time it is a 6-month disruption.

What this means is that even when the pandemic is past us, the economy will not look like the economy that was before the pandemic hit us. There will be winners and losers. And since it will take time and effort for people doing “loser jobs” to retrain themselves (if possible) to do “winner jobs”, the economic downturn will be even longer.

I’m calling it the “tube strike mental model” for behavioural change during the pandemic.

Half-watching movies, and why I hate tweetstorms

It has to do with “bit rate”

I don’t like tweetstorms. Up to six tweets is fine, but beyond that I find it incredibly difficult to hold my attention. I actually find it stressful. So of late, I’ve been making a conscious effort to stop reading tweetstorms when they start stressing me out. The stress isn’t worth any value that the tweetstorms may have.

I remember making the claim on twitter that I refuse to read any more tweetstorms of more than six tweets henceforth. I’m not able to find that tweet now.

Anyways…

Why do I hate tweetstorms? It is for the same reason that I like to “half-watch” movies, something that endlessly irritates my wife. It has to do with “bit rates”.

I use the phrase “bit rate” to refer to the rate of flow of information (remember that bit is a measure of information).
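
To put a number on it: an event with probability p carries -log2(p) bits of information, so the more predictable something is, the fewer bits it delivers. A quick illustration (the probabilities are, of course, made up):

```python
import math

def bits(p):
    """Shannon self-information: how many bits of "surprise" an event of probability p carries."""
    return -math.log2(p)

print(bits(0.5))   # a fair coin flip: 1 bit
print(bits(0.99))  # a near-certain, predictable scene: ~0.014 bits
print(bits(0.01))  # a genuine surprise: ~6.6 bits
```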

The thing with movies is that some of them have very low bit rate. More importantly, movies have vastly varying bit rates through their lengths. There are some parts in a movie where pretty much nothing happens, and a lot of it is rather predictable. There are other parts where lots happens.

This means that in the course of a movie you find yourself engrossed in some periods and bored in others, and that can be rather irritating. And boredom in the parts where nothing is happening sometimes leads me to want to turn off the movie.

So I deal with this by “half-watching”, essentially multitasking while watching. Usually this means reading, or being on twitter, while watching a movie. This usually works beautifully. When the bit rate from the movie is high, I focus. When it is low, I take my mind off and indulge in the other thing that I’m doing.

It is not just movies that I “half-watch” – a lot of sport also gets the same treatment. Like right now I’m “watching” Watford-Southampton as I’m writing this.

A few years back, my wife expressed disapproval of my half-watching. By also keeping a book or computer, I wasn’t “involved enough” in the movie, she started saying, and that half-watching meant we “weren’t really watching the movie together”. And she started demanding full attention from me when we watched movies together.

The main consequence of this is that I started watching fewer movies. Given that I can rather easily second-guess movie plots, I started finding watching highly predictable stuff rather boring. In any case, I’ve recently received permission to half-watch again, and have watched two movies in the last 24 hours (neither of which I would have been able to sit through had I paid full attention – they had low bit rates).


So what’s the problem with tweetstorms? The problem is that their bit rate is rather high. With “normal paragraph writing” we have come to expect a certain degree of redundancy. This allows us to skim through stuff while getting information from them at the same time. The redundancy means that as long as we get some key words or phrases, we can fill in the rest of the stuff, and reading is rather pleasant.

The thing with a tweetstorm is that each sentence (tweet, basically) has a lot of information packed into it. So skimming is not an option. And the information hitting your head at the rate that tweetstorms generally convey can result in a lot of stress.

The other thing with tweetstorms, of course, is that each tweet is disjoint from the one before and after it. So there is no flow to the reading, and the mind has to expend extra energy to process what’s happening. Combine this with a rather high bit rate, and you know why I can’t stand them.

Cookbooks, Code and College

Why Is This Interesting, a fascinating daily newsletter I subscribe to, has this edition on code and cookbooks. The basic crib here is that most coding books teach you to code as if you were trying to become a professional coder, rather than trying to teach you to code as an additional life skill.

This, the author Noah Brier remarks, is quite unlike how most cookbooks teach you to cook, where there is absolutely no pretence of trying to turn you into a professional cook. Cookbooks know that most people who want to learn to cook simply want to cook for themselves or their families, so professional level learning is not required. This, however, is not the case with books on coding.

In fact, this pretty much explains why I completely fell out of love with coding during my undergrad in Computer Science. I remember being rather excited in 2000, when I got an entrance exam score good enough to get admitted to the Computer Science program at IIT Madras. I had learnt to code only two years before, but I’d taken to it rather well, and had quickly built a reputation of being one of the better coders in school.

And then the four year program in Computer Science sucked out all the love I had for coding. This cooking-code post reminded me why – basically most professors in my department assumed that all of us wanted to be academics and taught us that way. This wasn’t an unfair assumption, since 17 of the 22 of us who graduated in 2004 either immediately or in a couple of years went to grad school.

However, the approach of teaching that assumed that you would be an expert or an academic meant a paradigm that made it incredibly hard to learn unless you were insanely motivated.

For example, the fourth year B.Tech. project was almost always supposed to be a “work of research” that would turn into a paper (or dozen). There was a lot of theory all round (I didn’t mind parts of it, like some bits of algorithm analysis, but most of it was boring). The course was heavy in terms of assignments, which you can argue was a practical concept, but the way the assignments were done by most people meant that the bar was rather academic.

And that meant that someone like me, who didn’t want to be “an engineer” to begin with, but had entered with a love for coding, quickly fell out of love with the field itself. In hindsight, given the way I was taught, I’m not surprised that my first option upon exit was to go to business school, and it would be at least five years later that I would begin to appreciate that I had an aptitude for code.

(Interestingly, business school was different. Nobody assumed anybody would become an academic, so the teaching was far more palatable.)

Java and IIT Madras

At the end of my B.Tech. in Computer Science and Engineering from IIT Madras, I was very clear about one thing – I didn’t want to be an engineer. I didn’t want to pursue a career in Computer Science, either. This was after entering IIT with a reputation of being a “stud programmer”, and being cocky and telling people that my hobby was “programming”.

I must have written about this enough times on this blog that I can’t be bothered about finding links, but my Computer Science degree at IIT Madras made me hate programming. I didn’t mind (some of) the maths, but it was the actual coding bit that I actively came to hate. And when an internship told me that research wasn’t something I was going to be good at, fleeing the field was an obvious decision and I quickly went to business school.

Thinking back about it, I think my problem is that I give up when faced with steep learning curves. I like systems that are easy and intuitive to use, and have a great user experience. The “geeky” products that are difficult to use and that geeks take pride in, I have no patience for. It was when I learnt to code macros in Microsoft Excel, at my first post-B-school job in 2006, that I started falling in love with Computer Science once again.

The big problem with CSE in IIT Madras was that they made you code. A lot. Which you might think is totally normal for a program in computer science. Except that all the professors there were perhaps like me, and wanted systems that were easy to use, which meant that just about anything we needed to build, we needed to build a user interface for. And in 2002, that meant coding in Java, and producing those ugly applets which were interactive but anything but easy to use.

The amount of Java coding I did in those four years is not funny. And Java is a difficult language to code in – it’s incredibly verbose and complicated (especially compared to something like Python), and impossible to code in without a book or a dictionary of APIs handy. And because it’s so verbose, it’s buggy. And you find it difficult to make things work. And even when you make it work, the UI that it produces is incredibly ugly.

So it amused me to come across this piece of news that my old department has “developed a new framework that could make the programs written in JAVA language more efficient“. I don’t know who uses Java any more (I thought the language of choice among computer scientists nowadays is Python. While it’s infinitely easier than Java, it again produces really ugly graphics), but it’s interesting that people in my old department are still at it. And even going about making things more efficient!

Also, you might find the article itself (this is on the IIT alumni website) amusing. Go ahead and give it a read.

To solve this problem, V Krishna and Manas Thakur tweaked the two compilation procedures. In the first compilation step, more elaborate and time-consuming analysis is performed and wherever the conversion stalls due to unavailability of the library from the computer, a partial result is created. Now, during the second stage of compilation, the just in-time compilers, with available libraries from the computer, work to resolve the partial values to generate final values and finally a more precise result. As the time taken during the first exhaustive compilation does not get included in execution time, the whole procedure still remains time-saving, while leading to highly efficient codes

Super Deluxe

In my four years in Madras (2000-4), I learnt just about enough Tamil to watch a Tamil movie with subtitles. Without subtitles is still a bit of a stretch for me, but the fact that streaming sites offer all movies with subtitles means I can watch Tamil movies now.

At the end, I didn’t like Super Deluxe. I thought it was an incredibly weird movie. The last half an hour was beyond bizarre. Rather, the entire movie is weird (which is good in a way we’ll come to in a bit), but there is a point where there is a step-change in the weirdness.

The wife had watched the movie some 2-3 weeks back, and I was watching it on Friday night. Around the time when she finished the movie she was watching and was going to bed, she peered into my laptop and said “it’s going to get super weird now”. “As if it isn’t weird enough already”, I replied. In hindsight, she was right. She had peered into my laptop right at the moment when the weirdness goes to yet another level.

It’s not often that I watch movies, since most movies simply fail to hold my attention. The problem is that most plots are rather predictable, and it is rather easy to second-guess what happens in each scene. It is the information theoretic concept of “surprise”.

Surprise is maximised when the least probable thing happens at every point in time. And when the least probable thing doesn’t happen, there isn’t a story, so filmmakers overindex on surprises, making sure the less probable thing happens. So if you indulge in a small bit of second-order thinking, the surprises aren’t surprising any more, and the movie becomes boring.

Super Deluxe establishes pretty early on that the plot is going to be rather weird. And when you think the scene has been set with sufficient weirdness in each story (there are four intertwined stories in the movie, as per modern fashion), the next time the movie comes back to the story, the story is shown to get weirder. And so you begin to expect weirdness. And this, in a way, makes the movie less predictable.

The reason a weird movie is less predictable is that at each scene it is simply impossible for the viewer to even think of the possibilities. And in a movie that gets progressively weirder like this one, every time you think you have listed out the possibilities and predicted what happens, what follows is something from outside your “consideration set”. And that keeps you engaged, and wanting to see what happens.

The problem with a progressively weird movie is that at some point it needs to end. And it needs to end in a coherent way. Well, it is possible sometimes to leave the viewer hanging, but some filmmakers see the need to provide a coherent ending.

And so what usually happens is that at some point in time the plot gets so remarkably simplified that everything suddenly falls in place (though nowhere as beautifully as things fall in place at the end of a Wodehouse novel). Another thing that can happen is that the weirdness is taken up a notch, so that things fall in place at a “meta level”, at which point the movie can end.

The thing with Super Deluxe is that both these things happen! On one side the weirdness is taken up several notches. And on the other the plots get so oversimplified that things just fall in place. And that makes you finish the movie with a rather bitter taste in the mouth, feeling thoroughly unsatisfied.

That the “ending” of the movie (where things get really weird AND really simplified) lasts half an hour doesn’t help matters.

Good vodka and bad chicken

When I studied Artificial Intelligence, back in 2002, neural networks weren’t a thing. The limited compute capacity and storage available at that point in time meant that most artificial intelligence consisted of what is called “rule based methods”.

And as part of the course we learnt about machine translation, and the difficulty of getting the implicit meaning across. The favourite example among computer scientists at that time was the story of how some scientists translated “the spirit is willing but the flesh is weak” into Russian using an English-Russian translation software, and then converted it back into English using a Russian-English translation software.

The result was “the vodka is excellent but the chicken is not good”.

While this joke may not be valid any more thanks to the advances in machine translation, aided by big data and neural networks, the issue of translation is useful in other contexts.

Firstly, speaking in a language that is not your “technical first language” makes you eschew jargon. If you have been struggling to get rid of jargon from your professional vocabulary, one way to get around it is to speak more in your native language (which, if you’re Indian, is unlikely to be your technical first language). Devoid of the idioms and acronyms that you normally fill your official conversation with, you are forced to think, and this practice of talking technical stuff in a non-usual language will help you cut your jargon.

There is another use case for using non-standard languages – dealing with extremely verbose prose. A number of commentators, a large number of whom are rather well-reputed, have this habit of filling their columns with flowery language, GRE words, repetition and rhetoric. While there is usually some useful content in these columns, it gets lost in the language and idioms and other things that would make the columnist’s high school English teacher happy.

I suggest that these columns be given the spirit-flesh treatment. Translate them into a non-English language, get rid of redundancies in sentences and then translate them back into English. This process, if the translators are good at producing simple language, will remove the bluster and make the column much more readable.

Speaking in a non-standard language can also make you get out of your comfort zone and think harder. Earlier this week, I spent two hours recording a podcast in Hindi on cricket analytics. My Hindi is so bad that I usually think in Kannada or English and then translate the sentence “live” in my head. And as you can hear, I sometimes struggle for words. Anyway here is the thing. Listen to this if you can bear to hear my Hindi for over an hour.

AlphaZero Revisited

It’s been over a year since Google’s DeepMind first made its splash with the reinforcement-learning based chess playing engine AlphaZero. The first anniversary of the story of AlphaZero being released also coincided with the publication of the peer-reviewed paper.

To go with the peer-reviewed paper, DeepMind has released a further 200 games played between AlphaZero and the conventional chess engine StockFish, which is again heavily loaded in favour of wins for AlphaZero, but also contains 6 games where AlphaZero lost. I’ve been following these games on GM Daniel King’s excellent Powerplaychess channel, and want to revise my opinion on AlphaZero.

Back then, I had looked at AlphaZero’s play through my favourite studs and fighters framework, which in hindsight doesn’t do full justice to AlphaZero. From the games that I’ve seen from the set released this season, AlphaZero’s play hasn’t exactly been “stud”. It’s just that it’s much more “human”. And the reason why AlphaZero’s play possibly seems more human is because of the way it “learns”.

Conventional chess engines evaluate a position by considering all possible paths (ok not really, they use an intelligent method called Alpha-Beta Pruning to limit their search size), and then play the move that leads to the best position at the end of the search. These engines use “pre-learnt human concepts” such as point count for different pieces, which are used to evaluate positions. And this leads to a certain kind of play.
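
For illustration, here is a minimal sketch of minimax search with alpha-beta pruning over a toy, made-up game tree. Real engines do vastly more (iterative deepening, transposition tables, sophisticated evaluation), so treat this purely as the core idea:

```python
def alphabeta(node, depth, alpha, beta, maximising, children, evaluate):
    """Minimax with alpha-beta pruning: skip branches that cannot affect the result."""
    kids = children(node)
    if depth == 0 or not kids:
        return evaluate(node)
    if maximising:
        value = float("-inf")
        for child in kids:
            value = max(value, alphabeta(child, depth - 1, alpha, beta, False, children, evaluate))
            alpha = max(alpha, value)
            if alpha >= beta:
                break  # beta cut-off: the minimiser will never allow this line
        return value
    else:
        value = float("inf")
        for child in kids:
            value = min(value, alphabeta(child, depth - 1, alpha, beta, True, children, evaluate))
            beta = min(beta, value)
            if alpha >= beta:
                break  # alpha cut-off
        return value

# Toy "game tree": nested lists are positions, bare numbers are leaf evaluations
# (think of them as the "point count" of the final position).
tree = [[3, 5], [2, [9, 1]], [0, 7]]
children = lambda n: n if isinstance(n, list) else []
evaluate = lambda n: n
print(alphabeta(tree, depth=10, alpha=float("-inf"), beta=float("inf"),
                maximising=True, children=children, evaluate=evaluate))  # -> 3
```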

AlphaZero’s learning process, however, involves playing zillions of games against itself (since I wrote that previous post, I’ve come back up to speed with reinforcement learning). And then based on the results of these games, it evaluates, in hindsight, the positions it reached in the course of play. On top of this, it builds a deep learning model to identify the goodness of positions.

Given my limited knowledge of how deep learning works, this process involves AlphaZero learning about “features” of games that have more often than not enabled it to win. So somewhere in the network there will be a node that represents “control of centre”. Another node deep in the network might represent “safety of king”. Yet another might perhaps involve “open A file”.

Of course, none of these features have been pre-specified to AlphaZero. It has simply learnt them by training its neural network on zillions of games it has played against itself. And while deep learning is hard to “explain”, it is likely that the features of the game that AlphaZero has learnt are remarkably similar to the “features” of the game that human players have learnt over the centuries. And it is because of the commonality in these features that we find AlphaZero’s play so “human”.

Another way to look at it is through the concept of “10000 hours” that Malcolm Gladwell spoke about in his book Outliers. As I had written in my review of the book, the concept of 10000 hours can be thought of as “putting fight until you get enough intuition to become stud”. AlphaZero, thanks to its large number of processors, has effectively spent much more than “10000 hours” playing against itself, with its neural network constantly “learning” from the positions faced and the outcomes of the games reached. And this way, it has “gained intuition” over features of the game that lead to wins, giving it an air of “studness”.

The interesting thing to me about AlphaZero’s play is that thanks to its “independent development” (in a way like the finches of the Galapagos), it has not been burdened by human intuition on what is good or bad, and has learnt its own heuristics. And along the way, it has come up with a bunch of heuristics that have not commonly been used by human players.

Keeping bishops on the back rank (once the rooks have been connected), for example. A stronger preference for bishops over knights than humans have. Suddenly simplifying from a terrifying-looking attack into a winning endgame (machines are generally good at endgames, so this is not that surprising). Temporary pawn and piece sacrifices. And all that.

Thanks to engines such as LeelaZero, we can soon see the results of these learnings being applied to human chess as well. And human chess can only become better!

Hypothesis Testing in Monte Carlo

I find it incredible, and not in a good way, that I took fourteen years to make the connection between two concepts I learnt barely a year apart.

In August-September 2003, I was auditing an advanced (graduate) course on Advanced Algorithms, where we learnt about randomised algorithms (I soon stopped auditing since the maths got heavy). And one important class of randomised algorithms is what are known as “Monte Carlo Algorithms”. Not to be confused with Monte Carlo Simulations, these are randomised algorithms that give a one-way result. So, using the most prominent example of such an algorithm, you can ask “is this number prime?” and the answer to that can be either “maybe” or “no”.

The randomised algorithm can never conclusively answer “yes” to the primality question. If the algorithm can find a factor (or other evidence of compositeness), it answers “no” (this is conclusive). Otherwise it returns “maybe”. So the way you “conclude” that a number is prime is by running the test a large number of times. Each run that returns “maybe” reduces the probability that the true answer is “no” (since they’re all independent evaluations), and when the probability of “no” is low enough, you “think” it’s a “yes”. You might like this old post of mine regarding Monte Carlo algorithms in the context of romantic relationships.
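
As a rough sketch of this one-sided structure, here is a Fermat-style primality test. It is a simpler (and fallible – it can be fooled by Carmichael numbers) cousin of what real implementations such as Miller-Rabin do, so treat it purely as an illustration of the “maybe”/“no” pattern:

```python
import random

def fermat_trial(n):
    """One Monte Carlo trial: returns 'no' conclusively, or 'maybe' (n passed this witness)."""
    a = random.randrange(2, n - 1)
    return "maybe" if pow(a, n - 1, n) == 1 else "no"

def probably_prime(n, trials=20):
    """Never says 'yes' outright; 'maybe' just gets more convincing with each passing trial."""
    if n < 4:
        return n in (2, 3)
    for _ in range(trials):
        if fermat_trial(n) == "no":
            return False       # conclusive: n is composite
    return True                # really "maybe, with high probability"

print(probably_prime(104729))  # the 10000th prime -> True
print(probably_prime(104730))  # composite -> False
```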

Less than a year later, in July 2004, as part of a basic course in statistics, I learnt about hypothesis testing. Now (I’m kicking myself for failing to see the similarity then), the main principle of hypothesis testing is that you can never “accept a hypothesis”. You either reject a hypothesis or “fail to reject” it. And if you fail to reject a hypothesis with a certain high probability (basically with more data, which implies more independent evaluations that don’t say “reject”), you will start thinking about “accept”.

Basically hypothesis testing is a one-sided test, where you are trying to reject a hypothesis. And not being able to reject a hypothesis doesn’t mean we necessarily accept it – there is still the chance of going wrong if we were to accept it (this is where we get into messy territory such as p-values). And this is exactly like Monte Carlo algorithms – one-sided algorithms where we can only conclusively take a decision one way.
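
A small worked illustration of this one-sidedness, using a made-up coin-fairness null hypothesis and the usual 5% threshold:

```python
from math import comb

def binomial_p_value(heads, flips, p=0.5):
    """One-sided p-value: probability of seeing at least this many heads if the coin is fair."""
    return sum(comb(flips, k) * p**k * (1 - p)**(flips - k) for k in range(heads, flips + 1))

# Null hypothesis: the coin is fair. We can reject it, or fail to reject it; we never "accept" it.
for heads, flips in [(7, 10), (70, 100)]:
    p_val = binomial_p_value(heads, flips)
    verdict = "reject the null" if p_val < 0.05 else "fail to reject the null"
    print(f"{heads}/{flips} heads: p = {p_val:.4f} -> {verdict}")
```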

So I was thinking of these concepts when I came across this headline in ESPNCricinfo yesterday that said “Rahul Johri not found guilty” (not linking since Cricinfo has since changed the headline). The choice, or rather ordering, of words was interesting. “Not found guilty”, it said, rather than the usual “found not guilty”.

This is again a concept of one-sided testing. An investigation can either find someone guilty or it fails to do so, and the heading in this case suggested that the latter had happened. And as a deliberate choice, it became apparent why the headline was constructed this way – later it emerged that the decision to clear Rahul Johri of sexual harassment charges was a contentious one.

In most cases, when someone is “found not guilty” following an investigation, it usually suggests that the evidence on hand was enough to say that the chance of the person being guilty was rather low. The phrase “not found guilty”, on the other hand, says that one test failed to reject the hypothesis, but it didn’t have sufficient confidence to clear the person of guilt.

So due credit to the Cricinfo copywriters, and due debit to the product managers for later changing the headline rather than putting a fresh follow-up piece.

PS: The discussion following my tweet on the topic threw up one very interesting insight – that Scotland has had a “not proven” verdict in the past for such cases (you can trust DD to come up with such gems).

Human, Animal and Machine Intelligence

Earlier this week I started watching this series on Netflix called “Terrorism Close Calls”. Each episode is about an instance of attempted terrorism that was foiled in the last two decades. For example, there is one episode on the plot to bomb a set of transatlantic flights from London to North America in 2006 (a consequence of which is that liquids still aren’t allowed on board flights).

So the first episode of the series involves this Afghan guy who drives all the way from Colorado to New York to place a series of bombs in the latter’s subway (metro train system). He is under surveillance through the length of his journey, and just as he is about to enter New York, he is stopped for what seems like a “routine drugs test”.

As the episode explains, “a set of dogs went around his car sniffing”, but “rather than being trained to sniff drugs” (as is routine in such a stop), “these dogs had been trained to sniff explosives”.

This little snippet got me thinking about how machines are “trained” to “learn”. At the most basic level, machine learning involves showing a large number of “positive cases” and “negative cases” based on which the program “learns” the differences between the positive and negative cases, and thus to identify the positive cases.

So if you want to build a system to identify cats in an image, you feed the machine a large number of images with cats in them, and a large(r) number of images without cats in them, each appropriately “labelled” (“cat” or “no cat”), and based on the differences, the system learns to identify cats.

Similarly, if you want to teach a system to detect cancers based on MRIs, you show it a set of MRIs that show malignant tumours, and another set of MRIs without malignant tumours, and sure enough the machine learns to distinguish between the two sets (you might have come across claims of “AI can cure cancer”. This is how it does it).
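
In code, this labelling-and-learning pattern looks something like the sketch below – a toy scikit-learn logistic regression on made-up “cat”/“no cat” feature vectors, nothing like a real image pipeline:

```python
from sklearn.linear_model import LogisticRegression

# Made-up "images" reduced to two features (say, a whiskers-score and a fur-score),
# labelled 1 for "cat" and 0 for "no cat".
X = [[0.9, 0.8], [0.8, 0.9], [0.7, 0.7], [0.1, 0.2], [0.2, 0.1], [0.3, 0.2], [0.1, 0.1]]
y = [1, 1, 1, 0, 0, 0, 0]

model = LogisticRegression()
model.fit(X, y)  # "learns" the difference between the positive and negative cases

print(model.predict([[0.85, 0.75]]))  # an unseen "cat-like" example -> [1]
print(model.predict([[0.15, 0.20]]))  # an unseen "no cat" example   -> [0]
```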

However, AI can sometimes go wrong by learning the wrong things. For example, an algorithm trained to recognise sheep started classifying grass as “sheep” (since most of the positive training samples had sheep in meadows). Another system went crazy in its labelling when an unexpected object (an elephant in a drawing room) was present in the picture.

While machines learn through lots of positive and negative examples, that is not how humans learn, as I’ve been observing as my daughter grows up. When she was very little, we got her a book with one photo each of 100 different animals. And we would sit with her every day pointing at each picture and telling her what each was.

Soon enough, she could recognise cats and dogs and elephants and tigers. All by means of being “trained on” one image of each such animal. Soon enough, she could recognise hitherto unseen pictures of cats and dogs (and elephants and tigers). And then recognise dogs (as dogs) as they passed her on the street. What absolutely astounded me was that she managed to correctly recognise a cartoon cat, when all she had seen thus far were “real cats”.

So where do animals stand, in this spectrum of human to machine learning? Do they recognise from positive examples only (like humans do)? Or do they learn from a combination of positive and negative examples (like machines)? One thing that limits the positive-only learning for animals is the limited range of their communication.

What drives my curiosity is that they get trained for specific things – that you have dogs to identify drugs and dogs to identify explosives. You don’t usually have dogs that can recognise both (specialisation is for insects, as they say – or maybe it’s for all non-human animals).

My suspicion (having never had a pet) is that the way animals learn is closer to how humans learn – based on a large number of positive examples, rather than as the difference between positive and negative examples. Just that the animal’s limited communication means that it is hard to train them for more than one thing (or maybe there’s something to do with their mental bandwidth as well. I don’t know).

What do you think? Interestingly enough, there is a recent paper that talks about how many machine learning systems have “animal-like abilities” rather than coming close to human intelligence.

For millions of years, mankind lived, just like the animals.
And then something happened that unleashed the power of our imagination. We learned to talk
– Stephen Hawking, in the opening of a Roger Waters-less Pink Floyd’s Keep Talking

Programming Languages

About a decade ago, I used to make fun of information technology companies that hired developers based on the language they coded in. My contention was that writing code is a skill that you either have or you don’t, and what a potential employer needs to look for is the ability to think algorithmically, and then render ideas in code.

While I’ve never worked as a software engineer, I find myself writing more and more code over the years as a part of doing data analysis. The primary tool I use is R, where coding doesn’t really feel like coding, since it is a rather high-level language. However, I’m occasionally asked to show code in Python, since some clients are more proficient in that, and the one thing that has done is teach me the value of domain knowledge of a programming language.

I take this opportunity to apologise for my prior belief that all that matters is thinking algorithmically, and language in which the ideas are expressed doesn’t matter. 

This is because the language you usually program in subtly nudges you towards thinking in a particular way. Having mostly used R over the last decade, I think in terms of tables and data frames, and after having learnt tidyverse earlier this year, my way of thinking algorithmically has become in a weird way “object oriented” (no, this has nothing to do with classes). I take an “object” (a data frame) and then manipulate it in various ways, changing it, summarising stuff, calculating things on the fly and aggregating, until the point where the result comes out in an elegant manner. 

And while Pandas allows chaining (in fact, it is from Pandas that I suspect the tidyverse guys got the idea for the “%>%” chaining operator), it is by no means as complete in its treatment of chaining as R, and that makes things tricky.
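
For instance, here is a made-up pandas pipeline chained roughly the way a tidyverse mutate %>% group_by %>% summarise %>% arrange pipeline would be (the data frame is invented for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    "store": ["A", "A", "B", "B", "B"],
    "units": [3, 5, 2, 8, 4],
    "price": [10.0, 10.0, 12.5, 12.5, 12.5],
})

# mutate -> group_by -> summarise -> arrange, chained pandas-style.
summary = (
    df.assign(revenue=lambda d: d["units"] * d["price"])
      .groupby("store", as_index=False)
      .agg(total_revenue=("revenue", "sum"))
      .sort_values("total_revenue", ascending=False)
)
print(summary)
```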

Moreover, being proficient in R makes you think in terms of vectorised operations, and Python doesn’t necessarily offer that – operations that were once simple in R become rather complicated in Python, requiring list comprehensions and whatnot.
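
A small, made-up example of the two styles side by side – computing period-on-period returns from a list of prices, once with a list comprehension and once vectorised with numpy:

```python
import numpy as np

prices = [100.0, 102.5, 101.0, 103.2]

# List-comprehension style: element-by-element thinking.
returns_listcomp = [(b - a) / a for a, b in zip(prices[:-1], prices[1:])]

# Vectorised style, closer to how R (or numpy/pandas) nudges you to think.
p = np.array(prices)
returns_vectorised = np.diff(p) / p[:-1]

print(returns_listcomp)
print(returns_vectorised)
```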

Putting it another way, thinking algorithmically in the framework offered by one programming language makes it rather stressful to express these thoughts in another language where the way of algorithmic thinking is rather different. 

For example, I’ve never got the point of the index in pandas dataframes, and I only find myself “resetting” it constantly so that my way of addressing isn’t mangled. Compared to the intuitive syntax in R, which is first and foremost a data analysis tool, and where the data frame is “native”, the programming language approach of python with its locs and ilocs is again irritating. 
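
A tiny illustration of what I mean, on an invented data frame – after filtering, the original index labels stick around, so loc and iloc stop agreeing until you reset_index:

```python
import pandas as pd

df = pd.DataFrame({"city": ["Mumbai", "Bangalore", "Madras"], "south": [False, True, True]})

southern = df[df["south"]]            # keeps the original index labels 1 and 2
print(southern.loc[1, "city"])        # label-based lookup  -> "Bangalore"
print(southern.iloc[1]["city"])       # position-based lookup -> "Madras"

clean = southern.reset_index(drop=True)  # what I find myself doing constantly
print(clean.loc[1, "city"])              # now labels and positions agree -> "Madras"
```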

I can go on… 

And I’m guessing this feeling is mutual – someone used to doing things the Python way would find R’s syntax and way of doing things rather irritating. R’s machine learning toolkit, for example, is nowhere near as easy as scikit-learn is in Python (this doesn’t affect me since I seldom need to use machine learning. For example, I use regression less than 5% of the time in my work).

The next time I see a job opening for a “java developer” I will not laugh like I used to ten years ago. I know that this posting is looking for a developer who can not only think algorithmically, but also algorithmically in the way that is most convenient to express in Java. And unlearning one way of algorithmic thinking and learning another isn’t particularly easy.