Shrinking deadlines

I’m reminded of this old joke/riddle, which also happened to feature in Gowri Ganesha. “If a 1 metre long sari takes 1 hour to dry in the sun, how long will an 8 metre long sari take to dry?”

The instinctive answer, of course, is 8 hours, while if you think about it (and assume that you have enough clothesline space to not need to fold), the correct answer is likely to be 1 hour.

Now this riddle is completely unconnected to the point of this post, except that both have to do with time.

And then one day you find, ten years have got behind you.
No one told you when to run. You missed the starting gun. 

Ok enough distractions. I’m now home, home again.

Modern workspaces are synonymous with tight deadlines. Even when you give a conservative estimate on how long something will take, you get asked to compress the timelines further. If you protest too much and say that there is a lot to be done, sometimes you might get asked to “put one more person on the job and get it done quickly”.

This might work for routine, or “fighter” jobs – for example, if your job is to enter and copy data for (let’s say) 1000 records, you can easily put another person on the job, and the entire job will be done in about half the time (allowing for a little time for the new person to learn the job and for coordination).

The more complex the job, the harder this gets. For one, the new person needs to spend more time coming up to speed. And as the job gets more complex, it gets harder to divide and conquer, or to “specialise”. This means the new person coming in has less impact.

And as you get closer and closer to the stud end of the spectrum, the advantage of putting more people on the job to get it done faster gets smaller and smaller. There comes a point when the extra person actively becomes a liability. Again – I’m reminded of my childhood, when I would occasionally ask my mother if she needed help with the cooking. “Yes, the best way for you to help is for you to stay out of the kitchen”, she would say.
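
To make this concrete, here is a toy model of the “shrinking returns from extra people” idea (the functional form and all the numbers are assumptions of mine, purely for illustration): the divisible part of the job shrinks with headcount, but ramp-up and coordination costs grow with it.

    def completion_time(work_hours, n_people, ramp_up=2.0, coordination=0.15):
        """Hours to finish a job shared by n_people (toy model, assumed parameters).

        The divisible work is split evenly, but every extra person adds a fixed
        ramp-up cost and a coordination overhead proportional to the job size."""
        parallel = work_hours / n_people
        onboarding = ramp_up * (n_people - 1)
        coordination_cost = coordination * work_hours * (n_people - 1)
        return parallel + onboarding + coordination_cost

    for n in range(1, 6):
        print(n, round(completion_time(100, n), 1))
    # With low coordination overhead (routine "fighter" work), the second person helps a lot;
    # crank the overhead up and the extra person quickly becomes a liability.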

And when the job gets really creative, there is a further limit on compression – a lot of the work is done “offline”. I keep telling people about how I finally discovered the proof of the Ramsey number R(3,3) while playing table tennis in my hostel, or how I solved a tough assignment problem while taking a friend’s new motorcycle for a ride.

When you want to solve problems “offline” (to let the insight come to you rather than going hunting for it – I had once written about this) – there is no way to shorten the process. You need to let the problem stew in your head, and hope that some time it will get solved.

There is nothing that can be done here. The more you hurry, the lower the chance you give yourself of solving the problem. Everything needs to take its natural course.

I was reminded of this when we missed a deadline last Friday, and I decided not to think about it through the weekend. And then, an hour before I got to work on Monday, an idea occurred to me in the shower that fixed the problem. Even if I’d stressed myself (and my team) out on Friday, or done somersaults, the problem would not have been solved.

As I’d said in 2004, quality takes time.

Intelligent and Diligent

For whatever reason, when I was a schoolboy and first learnt the word “diligent”, I assumed that it should be the opposite of intelligent. “Only people who are not intelligent need to be diligent”, the young me had reasoned.

And nearly thirty years later, I came across this stellar 2×2 on intelligence and diligence. I’ve read it in many places now, but will link to the version on the Farnam Street blog. I’m copying this quote from the blog, which is apparently credited to two different military officers.

I divide my officers into four groups. There are clever, diligent, stupid, and lazy officers. Usually two characteristics are combined. Some are clever and diligent — their place is the General Staff. The next lot are stupid and lazy — they make up 90 percent of every army and are suited to routine duties. Anyone who is both clever and lazy is qualified for the highest leadership duties, because he possesses the intellectual clarity and the composure necessary for difficult decisions. One must beware of anyone who is stupid and diligent — he must not be entrusted with any responsibility because he will always cause only mischief.

Maybe I was on to something interesting back in the 1990s, even if it was rather self-serving. And maybe it is this concept that I reprised in the late 2000s when I came up with “studs and fighters”. It was possibly my irritation with the “stupid and diligent” variety.

Now I’m thinking of this intelligence-diligence 2×2 in terms of our schooling and education. Maybe there is a general feeling among parents, teachers and suchlike that intelligence is something you are “born with”, and that you cannot become more intelligent.

So the moment they spot a kid who is stupid and lazy, they decide that the best way to “improve” the kid is to make him/her more diligent, rather than more intelligent. In the short run this might work, since the kid is now able to do better in school exams (which is what most teachers are optimising for). The long-run effect, though, is that the kid, instead of ending up among the numerous but harmless “routine duties” lot (stupid and lazy), ends up in the seemingly more competent but actually “dangerous, and only causing mischief” stupid and diligent quadrant.

In other words, our general schooling makes our adult population much more dangerous!

Speed, Accuracy and Shannon’s Channel Coding Theorem

I was probably the CAT topper in my year (2004) – they don’t give out ranks, only percentiles to two digits of precision, so this is a stochastic claim. I was also perhaps the only person (or one of very few) to get into the IIMs that year despite getting 20 questions wrong.

It had just happened that I had attempted far more questions than most other people. And so even though my accuracy was rather poor, my speed more than made up for it, and I ended up doing rather well.

I remember a time during my CAT prep when the guy who was leading my CAT factory suggested that I was making too many errors, and that I should slow down and make fewer mistakes. I tried that in a few mock exams. I ended up attempting far fewer questions, while my error rate (the percentage of attempted answers I got wrong) didn’t change by much. So it was an easy decision to forget about accuracy and focus on speed, and that served me well.
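
A rough back-of-the-envelope of why that was the right call (the marking scheme here – +1 for a correct answer, -1/3 for a wrong one – and the attempt counts are assumptions for the sake of arithmetic, not the actual numbers from that year):

    def expected_score(attempted, error_rate, plus=1.0, minus=1.0 / 3):
        """Expected marks, assuming +1 per correct answer and -1/3 per wrong one."""
        correct = attempted * (1 - error_rate)
        wrong = attempted * error_rate
        return plus * correct - minus * wrong

    print(expected_score(attempted=60, error_rate=0.25))  # fast and sloppy: 40.0
    print(expected_score(attempted=40, error_rate=0.20))  # slower, barely more accurate: ~29.3
    # Unless slowing down buys a large drop in the error rate, speed wins.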

However, what serves you well in an entrance exam need not necessarily serve you well in life. An exam is, by definition, an artificial space. It is usually bounded by certain norms (of the format). And so, you can make blanket decisions such as “let me just go for speed”, and you can get away with it. In a way, an exam is a predictable space. It is a caricature of the world. So your learnings from there don’t extend to life.

In real life, you can’t “get away with 20 wrong answers”. If you have done something wrong, you are (most likely) expected to correct it. Which means, in real life, if you are inaccurate in your work, you will end up making further iterations.

Observing myself, and people around me (literally and figuratively) at work, I sometimes wonder if there is a sort of efficient frontier in terms of speed and accuracy. For a given level of speed and accuracy, can we determine an “ideal gradient” – which way a person needs to move in order to make the maximum impact?

Once in a while, I take book recommendations from academics, and end up reading (rather, trying to read) academic books. Recently, someone recommended a book that combines information theory and machine learning, and I started reading it. Needless to say, within half a chapter I was lost, and I abandoned the book. Yet the little I read served the useful purpose of reminding me of Shannon’s channel coding theorem.

Paraphrasing, what it states is that irrespective of how noisy a channel is, with the right kind of encoding and redundancy we can reliably send information across it, up to a certain maximum rate. The noisier the channel, the more redundancy we need, and the lower the speed of transmission.
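
For the simplest textbook case – the binary symmetric channel, where each bit gets flipped with probability p – the theorem gives a clean formula: capacity is 1 - H(p), with H the binary entropy. A minimal sketch (the choice of this particular channel is mine, just for illustration):

    from math import log2

    def binary_entropy(p):
        if p in (0.0, 1.0):
            return 0.0
        return -p * log2(p) - (1 - p) * log2(1 - p)

    def bsc_capacity(p):
        """Maximum reliable bits per channel use when each bit flips with probability p."""
        return 1 - binary_entropy(p)

    for p in (0.0, 0.05, 0.11, 0.25, 0.5):
        print(p, round(bsc_capacity(p), 3))
    # The noisier the channel, the lower the rate at which you can transmit reliably;
    # at p = 0.5 the channel is pure noise and the capacity drops to zero.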

In my opinion (and in the opinions of several others, I’m sure), this is a rather profound observation, and has significant impact on various aspects of life. In fact, I’m prone to abusing it in inexact manners (no wonder I never tried to become an academic).

So while thinking of the tradeoff between speed and accuracy, I started thinking of the channel coding theorem. You can think of a person’s work (or “working mind”) as a communication channel. The speed is the raw speed of transmission. The accuracy (rather, the lack of it) is a measure of noise in the channel.

So the less accurate someone is, the more redundancy they require in communication (or in work). For example, if you are especially prone to mistakes (as I sometimes am), you might need to redo your work (or at least a part of it) several times. If you are the more accurate type, you need to redo it less often.

And different people have different speed-accuracy trade-offs.

I don’t have a perfect way to quantify this, but maybe we can think of a “true speed of work”: the raw speed at which someone does a piece of work divided by the number of iterations they need to get it right. OK, it is not quite that straightforward (there are other ways to build redundancy – like getting two independent people to do the same thing and then tallying the numbers), but I suppose you get the drift.
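
Written out as (crude) arithmetic – the formula and the sample numbers below are my own simplification, with the strong assumption that each redo is an independent pass:

    def effective_speed(raw_speed, error_rate):
        """Units of correct work per hour, assuming each pass independently needs
        redoing with probability error_rate, so expected passes = 1 / (1 - error_rate)."""
        expected_iterations = 1 / (1 - error_rate)
        return raw_speed / expected_iterations

    print(effective_speed(raw_speed=10, error_rate=0.4))   # fast but sloppy: 6.0
    print(effective_speed(raw_speed=7, error_rate=0.05))   # slower but accurate: 6.65
    # The slower, more accurate worker can end up with the higher true throughput.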

The interesting thing here is that speed and accuracy depend not only on the person but also on the nature of the work itself. For me, a piece of work that on average takes an hour has a different speed-accuracy tradeoff from a piece of work that on average takes a day (usually, the more complicated and involved a piece of analysis, the higher my error rate).

In any case, the point to be noted is that the speed-accuracy tradeoff is different for different people, and in different contexts. For some people, in some contexts, there is no point at all in expecting highly accurate work – you know they will make mistakes anyways, so you might as well get the work done quickly (to allow for more time to iterate).

And in a way, figuring out speed-accuracy tradeoffs of the people who work for you is an important step in getting the best out of them.

 

Christian Rudder and Corporate Ratings

One of the studdest book chapters I’ve read is from Christian Rudder’s Dataclysm. Rudder is a co-founder of OkCupid, now part of the Match.com portfolio of matchmakers. In the book, he uses OkCupid’s own data to draw insights about human life and behaviour.

It is a typical non-fiction book, with a studmax first chapter, after which it gets progressively weaker. And it is the first chapter (which I’ve written about before) that I’m going to talk about here. There is a nice write-up and extract on Maria Popova’s website (which used to be called Brain Pickings) here.

Quoting Maria Popova:

What Rudder and his team found was that not all averages are created equal in terms of actual romantic opportunities — greater variance means greater opportunity. Based on the data on heterosexual females, women who were rated average overall but arrived there via polarizing rankings — lots of 1’s, lots of 5’s — got exponentially more messages (“the precursor to outcomes like in-depth conversations, the exchange of contact information, and eventually in-person meetings”) than women whom most men rated a 3.

In one-hit markets like love (you only need to love and be loved by one person to be “successful” in this), high volatility is an asset. It is like option pricing if you think about it – higher volatility means greater chance of being in the money, and that is all you care about here. How deep out of the money you are just doesn’t matter.
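
The option analogy, in toy Monte Carlo form (the normal distribution of ratings, the means and the “threshold that matters” are all assumptions purely for illustration): with the same average rating, higher variance means a far higher chance of clearing the bar that actually leads to messages.

    import random
    random.seed(42)

    def chance_of_hit(mean=3.0, sd=0.5, threshold=4.5, trials=100_000):
        """Probability that a single rating clears the threshold, with ratings ~ Normal(mean, sd)."""
        hits = sum(1 for _ in range(trials) if random.gauss(mean, sd) >= threshold)
        return hits / trials

    print(chance_of_hit(sd=0.5))   # the solid, low-variance "3": almost never in the money
    print(chance_of_hit(sd=2.0))   # the polarising profile: in the money far more often
    # How far below the threshold the misses fall doesn't matter -- exactly like an option payoff.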

I was thinking about this in some random context this morning, when I was also thinking of the corporate appraisal process. Now, the difference between dating and appraisals is that on OkCupid you might get several ratings on a 5-point scale, but in your office you only get one rating each year on a 5-point scale. However, if you are a manager, and especially if you are managing a large team, you will GIVE out lots of ratings each year.

And so I was wondering – what does the variance of the ratings you give out say about you as a manager? Assuming HR doesn’t impose any “grading on a curve”, what does it say if you are a manager who gave out an average rating of 3 with a standard deviation of 0.5, versus a manager who gave an average of 3, with all employees receiving 1s and 5s?

From a corporate perspective, would you rather have a team full of 3s, or a team with a few 5s and a few 1s (who, it is likely, will leave)? Once again, if you think about it, it depends on your vega (how much you gain from volatility). In some sense, it depends on whether you are running a stud or a fighter team.

If you are running a fighter team, where there is no real “spectacular performance” but you need your people to grind it out, not make mistakes, pay attention to detail and do their jobs, you want a team full of 3s. The 5s in this team don’t contribute that much more than a 3, and the 1s can seriously hurt your performance.

On the other hand, if you’re running a stud team, you will want high variance. Because by the sheer nature of work, in a stud team, the 5s will add significantly more value than the 1s might cause damage. When you are running a stud team, a team full of 3s doesn’t work – you are running far below potential in that case.
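
A quick sketch of the vega argument (the payoff shapes below are my own assumptions, chosen only to capture “capped upside, painful 1s” for fighter work and “convex upside” for stud work):

    def fighter_value(rating):
        # fighter work: output is capped, and a 1 actively damages the team (assumed shape)
        if rating == 1:
            return -2
        return min(rating, 3)

    def stud_value(rating):
        # stud work: convex payoff, a 5 is worth far more than a 3 (assumed shape)
        return rating ** 2

    team_of_3s = [3, 3, 3, 3]
    polarised_team = [5, 5, 1, 1]

    for team in (team_of_3s, polarised_team):
        print(team, sum(map(fighter_value, team)), sum(map(stud_value, team)))
    # Fighter payoff: 12 vs 2, in favour of the team of 3s.
    # Stud payoff: 36 vs 52, in favour of the high-variance team.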

Assuming that your team has delivered, maybe the distribution of ratings across the team is a function of whether it does more stud or fighter work? Or am I force-fitting my pet theory a bit too much here?

Studs and Fighters and Attack and Defence

The general impression in sport is that attack is “stud” and defence is “Fighter“. This is mainly because defence (in any game, pretty much) is primarily about not making errors, and being disciplined. Flamboyance can pay off in attack, when you only need to strike occasionally, but not in defence, where the real payoff comes from being consistent and excellent.

However, attack need not always be stud, and defence need not always be fighter. This is especially true in team sports such as football, where there can be a fair degree of organisation and coaching to get players to coordinate.

This piece in The Athletic (paywalled) gives an interesting instance of how attacking can be fighter, and how modern football is all about fighter attacking. It takes the instance of this weekend’s game between Tottenham Hotspur and Liverpool F.C., which the latter won.

Jack Pitt-Brooke, the author, talks about how Liverpool is fighter in attack because the players are well-drilled in attacking, and practice combination play, or what are known in football as “automisations”.

But in modern football, the opposite is true. The best football, the type played by Pep Guardiola’s Manchester City or Jurgen Klopp’s Liverpool, is the most rigorously planned, drilled and co-ordinated. Those two managers have spent years teaching their players the complex attacking patterns and synchronised movements that allow them to cut through every team in the country. That is why they can never be frustrated by opponents who just sit in and defend, why they are racking up points totals beyond the reach of anyone else.

Jose Mourinho, on the other hand, might be fighter in the way he sets up his defence, but not so when it comes to attacking. He steadfastly refuses to have his teams train attacking automisations. While his defences are extremely well drilled, and know exactly how to coordinate, his attackers are left to their own devices and creativity. What Mourinho does is identify a handful of attackers (usually the centre forward and the guy just behind him) who are given “free roles” and are expected to use their own creativity to lead their team’s attacks.

As Pitt-Brooke went on to write in his article,

That, more than anything else, explains the difference between Klopp and Mourinho. Klopp wants to plan his way out of the randomness of football. Mourinho is more willing to accept it as a fact and work around it. So while the modern manager — Klopp, Guardiola, Antonio Conte — coaches players in ‘automisations’, pre-planned moves and patterns, Mourinho does not.

Jurgen Klopp the fighter, and Jose Mourinho the stud. It may not be intuitive, but when you think of how their teams attack, it actually makes sense.

Yes, attack is also being fighterised in modern sport.

AlphaZero Revisited

It’s been over a year since Google’s DeepMind first made its splash with AlphaZero, its reinforcement-learning-based chess engine. The first anniversary of AlphaZero’s release coincided with the publication of the peer-reviewed paper.

To go with the peer-reviewed paper, DeepMind has released a further 200 games played between AlphaZero and the conventional chess engine Stockfish. The set is again heavily loaded in favour of AlphaZero wins, but it also contains 6 games that AlphaZero lost. I’ve been following these games on GM Daniel King’s excellent Powerplaychess channel, and want to revise my opinion on AlphaZero.

Back then, I had looked at AlphaZero’s play through my favourite studs and fighters framework, which in hindsight doesn’t do it full justice. From the games I’ve seen in this newly released set, AlphaZero’s play hasn’t exactly been “stud”. It’s just that it’s much more “human”. And the reason AlphaZero’s play seems more human is the way it “learns”.

Conventional chess engines evaluate a position by considering all possible paths (ok, not really – they use an intelligent method called alpha-beta pruning to limit the size of their search), and then play the move that leads to the best position at the end of the search. These engines use “pre-learnt human concepts”, such as point counts for the different pieces, to evaluate positions. And this leads to a certain kind of play.
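
For the curious, here is alpha-beta pruning compressed into a few lines. The “game tree” here is a toy nested list whose leaves stand in for point-count evaluations – a real engine generates and evaluates actual positions – so treat this purely as a sketch of the idea:

    def alphabeta(node, alpha=float("-inf"), beta=float("inf"), maximising=True):
        """Minimax with alpha-beta pruning over a toy tree (leaves are static evaluations)."""
        if not isinstance(node, list):          # leaf: return its evaluation
            return node
        if maximising:
            value = float("-inf")
            for child in node:
                value = max(value, alphabeta(child, alpha, beta, False))
                alpha = max(alpha, value)
                if alpha >= beta:               # the opponent would never allow this line,
                    break                       # so the remaining siblings are pruned
            return value
        value = float("inf")
        for child in node:
            value = min(value, alphabeta(child, alpha, beta, True))
            beta = min(beta, value)
            if alpha >= beta:
                break
        return value

    toy_tree = [[3, 5], [6, [9, 8]], [1, 2]]
    print(alphabeta(toy_tree))  # best achievable evaluation with optimal play from both sides: 6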

AlphaZero’s learning process, however, involves playing zillions of games against itself (since I wrote that previous post, I’ve come back up to speed with reinforcement learning). Based on the results of these games, it evaluates, in hindsight, the positions it reached in the course of play. On top of this, it builds a deep learning model to identify how good a position is.

Given my limited knowledge of how deep learning works, my understanding is that this process involves AlphaZero learning “features” of games that have, more often than not, enabled it to win. So somewhere in the network there will be a node that represents “control of the centre”. Another node deep in the network might represent “safety of the king”. Yet another might perhaps involve “an open A file”.

Of course, none of these features has been pre-specified to AlphaZero. It has simply learnt them by training its neural network on the zillions of games it has played against itself. And while deep learning is hard to “explain”, it is likely that the features of the game AlphaZero has learnt are remarkably similar to the “features” of the game that human players have learnt over the centuries. And it is because of the commonality in these features that we find AlphaZero’s play so “human”.

Another way to look at it is through the concept of “10000 hours” that Malcolm Gladwell spoke about in his book Outliers. As I had written in my review of the book, the concept of 10000 hours can be thought of as “putting fight until you get enough intuition to become stud”. AlphaZero, thanks to its large number of processors, has effectively spent much more than “10000 hours” playing against itself, with its neural network constantly “learning” from the positions faced and the outcomes of the games reached. And this way, it has “gained intuition” about the features of the game that lead to wins, giving it an air of “studness”.

The interesting thing to me about AlphaZero’s play is that thanks to its “independent development” (in a way, like the finches of the Galapagos), it has not been burdened by human intuition on what is good or bad, and has learnt its own heuristics. And along the way, it has come up with a bunch of heuristics that have not commonly been used by human players.

Keeping bishops on the back rank (once the rooks have been connected), for example. A stronger preference for bishops over knights than humans have. Suddenly simplifying from a terrifying-looking attack into a winning endgame (machines are generally good at endgames, so this is not that surprising). Temporary pawn and piece sacrifices. And all that.

Thanks to engines such as LeelaZero, we can soon see the results of these learnings being applied to human chess as well. And human chess can only become better!

Randomness and sample size

I have had a strange relationship with volleyball, as I’ve documented here. Unlike in most other sports I’ve played, I was a rather defensive volleyball player, excelling in backline defence, setting and blocking, rather than spiking.

The one aspect of my game which was out of line with the rest of my volleyball, but in line with my play in most other sports I’ve played competitively, was my serve. I had a big booming serve, which at school level was mostly unreturnable.

The downside of having an unreturnable serve, though, is that you are likely to miss your serve more often than the rest – it might mean hitting it too long, or into the net, or wide. And like in one of the examples I’ve quoted in my earlier post, it might mean not getting a chance to serve at all, as the warm up serve gets returned or goes into the net.

So I was discussing my volleyball non-career with a friend who is now heavily involved in the game, and he thought that I had possibly been extremely unlucky. My own take is that given how little I played, it’s quite likely that things would go spectacularly wrong at some point, purely by chance.

Changing domains a little bit, there was a time when I was building strategies for algorithmic trading, in a class known as “statistical arbitrage”. The deal there is that you have a small “edge” on each trade, but if you do a large enough number of trades, you will make money. As it happened, the guy I was working for at the time got spooked after the first couple of trades went bad, and shut down the strategy at a heavy loss.

Changing domains a little less this time, this is also the reason why you shouldn’t check your portfolio too often if you’re investing for the long term – in the short run, when there have been “fewer plays”, the chances of having a negative return are higher even if you’re in a mostly safe strategy, as I had illustrated in this blog post in 2008 (using the Livejournal URL since the table didn’t port well to wordpress).
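
A small simulation of the “small edge, many plays” point (the 52% win rate, the unit stakes and the trade counts are all made-up numbers):

    import random
    random.seed(7)

    def prob_of_loss(n_trades, p_win=0.52, trials=2_000):
        """Chance of a negative cumulative P&L after n_trades, each winning or losing one unit."""
        losing_runs = 0
        for _ in range(trials):
            pnl = sum(1 if random.random() < p_win else -1 for _ in range(n_trades))
            if pnl < 0:
                losing_runs += 1
        return losing_runs / trials

    for n in (10, 100, 1000):
        print(n, prob_of_loss(n))
    # Even with a genuine edge, you are down a large fraction of the time after a handful of
    # trades, and almost never after a thousand. Shutting down after two bad trades throws
    # the edge away.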

And changing domains once again, the sheer number of “samples” is possibly one reason that the whole idea of quantification of sport, and “sabermetrics”, first took hold in baseball. The Major League Baseball season is typically 162 games long (and this is before the playoffs), which means that any small edge will translate into results over the course of the league. A smaller league would mean fewer games and thus more randomness, and a higher chance that a “better play” wouldn’t work out.

This also explains why, when “Moneyball” took off with the Oakland A’s in the late 1990s and early 2000s, they focussed mainly on league performance and not performance in the playoffs – in the latter, there are simply not enough “samples” for a marginal advantage in team strength to necessarily show up in the results.

And this is the problem with newly appointed managers of elite European football clubs “targeting the Champions League” – a knockout tournament of that format means that the best team need not always win. Targeting the national league, played out over at least 34 games in the season, is a much better bet.

Finally, there is also the issue of variance. A higher variance in performance means that observing a few instances of bad performance is not sufficient to conclude that the player is a bad performer – a great performance need not be too far away. For a player with less randomness in performance – a steadier player, if you will – a few bad performances tell you that they are unlikely to come good. High-risk-high-return players, on the other hand, need to be given a longer rope.
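
To see how much longer a rope the volatile player needs, a small simulation (the score distribution, the “true average” and the cutoff are all assumed numbers):

    import random
    random.seed(11)

    def prob_looks_bad(sd, true_mean=40.0, n_games=5, cutoff=30.0, trials=50_000):
        """Chance that the average over a short run of games falls below the cutoff."""
        below = 0
        for _ in range(trials):
            avg = sum(random.gauss(true_mean, sd) for _ in range(n_games)) / n_games
            if avg < cutoff:
                below += 1
        return below / trials

    print(prob_looks_bad(sd=10))   # steady player: a bad five-game run is strong evidence
    print(prob_looks_bad(sd=40))   # volatile player: bad five-game runs happen all the time
    # Both players have the same true average; only the variance differs.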

I’d put this in a different way in a blog a few years back, about Mitchell Johnson.

AlphaZero defeats Stockfish: Quick thoughts

The big news of the day, as far as I’m concerned, is the victory of Google DeepMind’s AlphaZero over Stockfish, currently the highest-rated chess engine. This comes barely months after DeepMind’s AlphaGo Zero bested the earlier avatar of AlphaGo in the game of Go.

Like its Go version, the chess-playing AlphaZero learnt using reinforcement learning (I remember doing a term paper on the concept back in 2003, but have mostly forgotten it). Basically, it wasn’t given any “training data”; the machine trained itself by continuously playing against itself, with the feedback from each stage of learning helping it learn better.

After only about four hours of “training” (basically playing against itself and discovering moves), AlphaZero managed to record this victory in a 100-game match, winning 28 and losing none (the rest of the games were drawn).

There’s a sample game here on the Chess.com website and while this might be a biased sample (it’s likely that the AlphaZero engineers included the most spectacular games in their paper, from which this is taken), the way AlphaZero plays is vastly different from the way engines such as Stockfish have been playing.

I’m not that much of a chess expert (I “retired” from my playing career back in 1994), but the striking things for me from this game were:

  • The move 7. d5 against the Queen’s Indian
  • The piece sacrifice a few moves later that was hard to see
  • AlphaZero’s consistent attempts until late in the game to avoid trading queens
  • The move Qh1 somewhere in the middle of the game

In a way (and consistent with some of the themes of this blog), AlphaZero can be described as a “stud” chess machine, having taught itself to play based on feedback from the games it has already played. (The way reinforcement learning broadly works is that actions that led to “good rewards” are incentivised in the next iteration, while those that led to “poor rewards” are penalised. The challenge in this case is to set up chess in a way that is conducive to a reinforcement learning system.)
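
To make that parenthetical concrete, here is a bare-bones caricature of the feedback loop (entirely my own toy, nothing like DeepMind’s actual training setup): each “opening move” has a hidden win rate, the learner plays, observes the result, nudges its value estimate for that move, and gradually plays the promising moves more often.

    import random
    random.seed(3)

    true_win_rate = {"d4": 0.55, "e4": 0.52, "a4": 0.35}     # hidden from the learner
    value = {move: 0.5 for move in true_win_rate}             # initial estimates
    learning_rate, exploration = 0.05, 0.1

    for game in range(5_000):
        if random.random() < exploration:
            move = random.choice(list(value))                 # occasionally explore
        else:
            move = max(value, key=value.get)                  # otherwise exploit current belief
        reward = 1.0 if random.random() < true_win_rate[move] else 0.0
        value[move] += learning_rate * (reward - value[move])  # reinforce or penalise

    print({move: round(v, 2) for move, v in value.items()})
    # Moves that keep yielding good rewards drift up in value and get chosen more often; poor
    # moves get abandoned -- the same principle AlphaZero applies at vastly larger scale.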

Engines such as Stockfish, on the other hand, are absolute “fighters”. They get their “power” from brute force, going down nearly all possible paths in the game several moves deep. This is supplemented by analysis of millions of existing games at various levels, from which the engine “learns” – among other things, how to prune and prioritise the paths it searches. Stockfish is also fed a database of chess openings, which it remembers and tries to play.

What is interesting is that AlphaZero has “discovered” some popular chess openings in the course of its self-learning. Some popular openings such as the King’s Indian or the French find little favour with this engine, while others such as the Queen’s Gambit or the Queen’s Indian find favour. This is a very interesting development in terms of opening theory itself.

Frequency of openings over time employed by AlphaZero in its “learning” phase. Image sourced from AlphaZero research paper.

In any case, my immediate concern from this development is how it will affect human chess. Over the last decade or two, engines such as Stockfish have played a profound role in the development of chess, with current top players such as Magnus Carlsen and Sergey Karjakin having trained extensively with these engines.

The way top grandmasters play has seen a steady change over these years as they have ingested ideas from engines such as Stockfish. The game has become far quieter and more positional, as players seek to gain small advantages that they steadily improve over the course of (long) games. This is consistent with the way the engines that players learn from play.

Based on the evidence of the one AlphaZero game I’ve seen, it plays very differently from the existing engines. It will be interesting to see how human players who train with AlphaZero-based engines (or their clones) will change their game.

Maybe chess will turn back to being a bit more tactical than it’s been in the last decade? It’s hard to say right now!

Interview length

When I interviewed for my current job four months back, I was put through over twelve hours of high-quality interviews. This included both telephonic and face-to-face processes (on one day, I was called to the office and grilled from 10.30 am to 6.30 pm), and by “high quality” I’m referring to the standard of the questions I was asked.

All the interviews were extremely enjoyable, and I had fun solving the problems that were thrown at me. I must mention here that the entire process was a “stud interview” – one that tried to evaluate me on my thought process rather than on what I know. I’ve also been through a few “fighter interviews” – ones where the interviewer just spends time finding out your “knowledge” – and I don’t remember taking a single job so far after passing that kind of interview.

So recently I read this post by Seth Godin that someone had shared on Google Reader, where he says that there is just no point in having long interviews, and that interviews should be kept short and to the point. That way, he says, less of people’s time gets wasted and the candidate also doesn’t need to spend much time interviewing. After reading that, I was trying to put my personal experience into perspective.

One thing is that in a “stud interview”, where you throw tough problems at the candidate, one of the key “steps” in the solution process is for an insight to hit the candidate. Even if you give hints, and mark liberally for “steps”, “cracking” the problem usually depends on an insight. And it isn’t fair to expect an insight to hit the candidate on each and every question, so the way to average out this factor is to ask a large number of questions. Which means the interview takes longer.
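
Some rough numbers behind that logic (the per-question “crack” probabilities and the pass bars below are pure assumptions): with only a few questions, a weak candidate can get lucky and a strong one unlucky; with many questions, the outcome tracks ability much more tightly.

    from math import ceil, comb

    def prob_pass(n_questions, pass_fraction, p_crack):
        """Probability of cracking at least pass_fraction of n independent problems."""
        k = ceil(n_questions * pass_fraction)
        return sum(comb(n_questions, i) * p_crack ** i * (1 - p_crack) ** (n_questions - i)
                   for i in range(k, n_questions + 1))

    for label, p in (("strong candidate", 0.7), ("weak candidate", 0.3)):
        print(label, round(prob_pass(3, 0.6, p), 2), round(prob_pass(10, 0.6, p), 2))
    # With three questions the weak candidate sneaks past a 2-out-of-3 bar about a fifth of
    # the time; with ten questions the false-pass rate collapses, and the strong candidate's
    # chances improve.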

The other thing about the length of the interview is signaling. Twelve hours of hardcore problem-solving sends out a signal to the candidate with regard to the quality of the group. It gives an idea to the candidate about what it takes to get into the group. It says that every person working in the group had to go through this kind of a process and hence is likely to be of high quality.

Another thing with the “stud interview” is that it also directly gives the candidate an idea of the quality of the people interviewing. Typically, hard math-puzzle-based interviews are difficult to “take” (for the interviewer). So putting the candidate through a large number of math-problem-solving interviews tells him that the large number of people interviewing him are all good enough to take this kind of an interview. This kind of interview is also ruthless on the interviewer – it is usually not hard for a smart candidate to see through it if he thinks the interviewer has just mugged up the answer to a question without actually solving it.

All put together, when you are recruiting for a job based on “stud interviews”, it makes sense for you to take time, and make the candidate go through several rounds. It also helps that most of these “stud interviews” are fun for the candidate as well. On the other hand, if you are only willing to test what the candidate knows, and are not really interested in the way he thinks, then you might follow Godin’s suggestion and keep the interview short.