Shrinking deadlines

I’m reminded of this old joke/riddle, which also happened to feature in Gowri Ganesha. “If a 1 metre long sari takes 1 hour to dry in the sun, how long will an 8 metre long sari take to dry?”

The instinctive answer, of course, is 8 hours, while if you think about it (and assume that you have enough clothesline space to not need to fold), the correct answer is likely to be 1 hour.

Now this riddle is completely unconnected to the point of the post, except that both have to do with time.

And then one day you find, ten years have got behind you.
No one told you when to run. You missed the starting gun. 

Ok enough distractions. I’m now home, home again.

Modern workspaces are synonymous with tight deadlines. Even when you give a conservative estimate on how long something will take, you get asked to compress the timelines further. If you protest too much and say that there is a lot to be done, sometimes you might get asked to “put one more person on the job and get it done quickly”.

This might work for routine, or “fighter” jobs – for example, if your job is to enter and copy data for (let’s say) 1000 records, you can easily put another person on the job, and the entire job will be done in about half the time (allowing for a little time for the new person to learn the job and for coordination).

The more complex the job, the harder this gets. For one, the new person needs to spend more time learning the job. Then, as the job gets more complex, it gets harder to divide and conquer, or to “specialise”. This means the new person coming in has less of an impact.

And as you get closer and closer to the stud end of the spectrum, the advantage of putting more people on the job to get it done faster gets smaller and smaller. There comes a point when the extra person actively becomes a liability. Again – I’m reminded of my childhood when occasionally I would ask my mother if she needed help in cooking. “Yes, the best way for you to help is for you to stay out of the kitchen”, she would say.
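To make this argument a bit more concrete, here is a toy model. It is entirely my assumption, not a measured law: some fraction of the job can’t be divided at all (in the spirit of Amdahl’s law), and every pair of people on the job costs some coordination time (in the spirit of Brooks’s law, where n people have n(n-1)/2 communication channels). The function name and all parameter values are made up for illustration.

```python
def completion_time(n_people: int, work: float, serial_fraction: float,
                    coordination_cost: float) -> float:
    """Hypothetical time to finish a job with n_people working on it.

    work: time the job takes one person working alone.
    serial_fraction: share of the work that cannot be split up (Amdahl-style).
    coordination_cost: time lost per pair of people who must keep each
        other in sync (Brooks-style n*(n-1)/2 communication channels).
    """
    if n_people < 1:
        raise ValueError("need at least one person")
    serial = work * serial_fraction
    divisible = work * (1 - serial_fraction) / n_people
    pairs = n_people * (n_people - 1) / 2
    return serial + divisible + coordination_cost * pairs


# Routine "fighter" job: fully divisible, cheap to coordinate.
routine = [completion_time(n, 10, 0.0, 0.01) for n in range(1, 5)]
# Complex job: half of it can't be split, and syncing up is expensive.
complex_job = [completion_time(n, 10, 0.5, 1.0) for n in range(1, 5)]

print([round(t, 2) for t in routine])      # [10.0, 5.01, 3.36, 2.56]
print([round(t, 2) for t in complex_job])  # [10.0, 8.5, 9.67, 12.25]
```

With these made-up numbers, the routine job roughly halves when a second person joins, while for the complex job the second person helps a little, the third makes things slower again, and with four people it takes longer than one person working alone.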

And when the job gets really creative, there is a further limit on compression – a lot of the work is done “offline”. I keep telling people about how I finally discovered the proof for the Ramsey number R(3,3) while playing table tennis in my hostel, or how I solved a tough assignment problem while taking a friend’s new motorcycle for a ride.

When you want to solve problems “offline” (to let the insight come to you rather than going hunting for it – I had once written about this) – there is no way to shorten the process. You need to let the problem stew in your head, and hope that some time it will get solved.

There is nothing that can be done here. The more you hurry, the lower your chances of solving the problem. Everything needs to take its natural course.

I was reminded of this when we missed a deadline last Friday, and I decided not to think about it through the weekend. And then, an hour before I got to work on Monday, an idea occurred to me in the shower which fixed the problem. Even if I’d stressed myself (and my team) out on Friday, or done somersaults, the problem would not have been solved.

As I’d said in 2004, quality takes time.

Intelligent and Diligent

For whatever reason, when I was a schoolboy and first learnt the word “diligent”, I assumed that it was the opposite of intelligent. “Only people who are not intelligent need to be diligent”, the young me had reasoned.

And nearly thirty years later, I came across this stellar 2×2 on intelligence and diligence. I’ve read it in many places now, but will link to the version on the Farnam Street blog. I’m copying this quote from the blog; it is apparently credited to two different military officers.

I divide my officers into four groups. There are clever, diligent, stupid, and lazy officers. Usually two characteristics are combined. Some are clever and diligent — their place is the General Staff. The next lot are stupid and lazy — they make up 90 percent of every army and are suited to routine duties. Anyone who is both clever and lazy is qualified for the highest leadership duties, because he possesses the intellectual clarity and the composure necessary for difficult decisions. One must beware of anyone who is stupid and diligent — he must not be entrusted with any responsibility because he will always cause only mischief.

Maybe I was onto something interesting back in the 1990s, even if it was rather self-serving. And maybe it is this concept I reprised in the late 2000s when I came up with “studs and fighters”. It was possibly my irritation with the “stupid and diligent” variety.

Now I’m thinking of this 2×2, and especially its “stupid and diligent” quadrant, in terms of our schooling and education. Maybe there is a general feeling among parents, teachers and suchlike that intelligence is something you are “born with”, and that you cannot become intelligent.

So the moment they spot a kid who is stupid and lazy, they decide that the best way to “improve” this kid is to make him/her more diligent, rather than more intelligent. In the short run this might work, since the kid is now able to do better in school exams (which is what most teachers are optimising for). The long-run effect, though, is that instead of ending up among the numerous but harmless stupid and lazy (suited to “routine duties”), the kid ends up in the seemingly more competent but actually “dangerous, and only causing mischief” stupid and diligent quadrant.

In other words, our general schooling makes our adult population much more dangerous!

Key Person Risk and Creative Professions

I’m coming to the conclusion that creative professions inevitably come with a “key person risk”. And this is due to the way teams in such professions are usually built.

I’ll start with a tweet that I put out today.

(I had NOT planned this post at the time when I put out this tweet)

I’ll not go into defining creative professions here, except to say that you typically know one when you see it.

The thing with teams in such professions is that people who are good and creative are highly unlikely to get along with each other. Going into the animal kingdom for an analogy, we can think of dividing everyone in any such profession into “alphas” and “betas”. Alphas are the massively creative people who usually rise to lead their teams. Betas are the rest.

And given that any kind of creativity is due to some amount of lateral thinking, people good at creative professions are likely to hallucinate a bit (hallucination is basically lateral thinking taken to an extreme). And stretching it a bit more, you can say that people who are good at creative tasks are usually mad in one way or another.

As I had written briefly this morning, it is not usual for mad people (especially of a similar nature of madness) to get along with each other. So if you have a creative alpha leading the team, it is highly unlikely that he/she will have similar alphas in the next line of leadership. It is more likely that the next line of leadership will have people who are good complements to the alpha leader.

For example, in the ongoing World Cup, I’ve seen several tactical videos that have all said one thing – that Rodrigo De Paul’s primary role in the Argentinian team is to “cover for Messi”. Messi doesn’t track back, but De Paul will do the defending for him. Messi largely switches off, but De Paul is industrious enough to cover for Messi. When Messi goes forward, De Paul goes back. When Messi drops deep, De Paul makes a forward run.

This is the most typical creative partnership that you can get – one very obviously alpha creative supported by one or more steady performers who enable the creative person to do the creative work.

The question is – what happens when the creative head (the alpha) leaves? The answers are going to be different in elite sport and in the corporate world (and I’m mostly talking about the latter in this post).

In elite sport, when Messi retires (which he is likely to do after tomorrow’s final, irrespective of the result), it is virtually inconceivable that Argentina will ask De Paul to play in his position. Instead, they will look into others who are already playing in a sort of Messi role, maybe (or likely) at an inferior level and bring them up. De Paul will continue to play his role of central midfielder and continue to support whoever comes into Messi’s role.

In corporate setups, though, when a team’s leader leaves, the obvious thing to do is to promote that person’s second in command. Sometimes there might be a battle for succession among various seconds in command, and the losers also leave the company. For most teams, where seconds in command are usually similar in style to the leader, this kind of succession planning works.

For creative teams, however, this usually leads to a disaster. More often than not, the second in command’s skills will be very different from those of the leader. If the leader had been an alpha creative (that’s the case we’re largely discussing here), the second in command is more likely to be a steady “water carrier” (a pejorative term used to describe France’s current coach Didier Deschamps).

And if this “water carrier” (no offence meant to anyone by this, but it is a convenient description) stays in the job for a long time, it is likely that the creative team will stop being creative. The thing that made it creative in the first place was the alpha’s leadership (this is especially true of small teams), and unless the new boss has recognised this and brings in a new set of alphas (or identifies potential alphas in the org and quickly promotes them), the team will start specialising in what was the new boss’s specialisation – which is to hold things steady and do all the right things and cover for someone who doesn’t exist any more.

So teams in creative professions have a key person risk: if a particularly successful alpha leaves, the team that remains is likely to stagnate and stop being creative. The only potential solutions I can think of are:

  • Bring in a new creative from outside to lead the team. The second in command remains just that
  • Coach the second in command to identify diverse (and creative alpha) talents within the team and recognise that there are alphas and betas. And the second in command basically leads the team but not the creative work
  • Organise the team more like a sports team where each person has a specific role. So if the attacking midfielder leaves, replace them with a new attacking midfielder (or promote a junior attacking midfielder into a senior one). Don’t ask a defensive midfielder to suddenly become an attacking midfielder
  • Put pressure from above for alphas to have a sufficient number of other alphas as the next line of command. Retaining this team is easier said than done, and without betas the team can collapse.

Of course, if you look at all this from the perspective of the beta, there is an obvious question mark about career prospects. Unless you suddenly change your style (easier said than done), you will never be the alpha, and this puts in place a sort of glass ceiling for your career.

Speed, Accuracy and Shannon’s Channel Coding Theorem

I was probably the CAT topper in my year (2004). They don’t give out ranks, only percentiles (to two digits of precision), so this is a stochastic measure. I was also perhaps the only person (or one of very few) to get into IIMs that year despite getting 20 questions wrong.

It so happened that I had attempted far more questions than most other people. So even though my accuracy was rather poor, my speed more than made up for it, and I ended up doing rather well.

I remember this time during my CAT prep when the guy who was leading my CAT factory suggested that I was making too many errors, so I should possibly slow down and make fewer mistakes. I did that in a few mock exams, and ended up attempting far fewer questions. My error rate (the % of attempted answers I got wrong) didn’t change by much. So it was an easy decision to forget about accuracy and focus on speed, and that served me well.
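The arithmetic behind that decision can be sketched as follows. Assume a hypothetical negative-marking scheme of +1 for a right answer and -1/4 for a wrong one (the actual CAT scheme may well have differed). If you attempt a questions with error rate e, your net score is a(1 - e) - a·e/4, which grows linearly with a as long as e stays roughly constant. The attempt counts below are hypothetical; only the 20 wrong answers come from my own experience.

```python
def net_score(attempted: int, wrong: int, penalty: float = 0.25) -> float:
    """Score under a hypothetical +1 / -penalty negative-marking scheme."""
    correct = attempted - wrong
    return correct - penalty * wrong


# Hypothetical numbers: going fast, versus slowing down while the
# error rate barely moves (20% vs about 19%).
fast = net_score(attempted=100, wrong=20)
slow = net_score(attempted=70, wrong=13)
print(fast, slow)  # 75.0 53.75
```

Slowing down only pays if it cuts the error rate substantially; if it doesn’t, you are simply giving up marks.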

However, what serves you well in an entrance exam need not necessarily serve you well in life. An exam is, by definition, an artificial space. It is usually bounded by certain norms (of the format). And so, you can make blanket decisions such as “let me just go for speed”, and you can get away with it. In a way, an exam is a predictable space. It is a caricature of the world. So your learnings from there don’t extend to life.

In real life, you can’t “get away with 20 wrong answers”. If you have done something wrong, you are (most likely) expected to correct it. Which means, in real life, if you are inaccurate in your work, you will end up making further iterations.

Observing myself, and people around me (literally and figuratively at work), I sometimes wonder if there is a sort of efficient frontier in terms of speed and accuracy. For a given level of speed and accuracy, can we determine an “ideal gradient” – on which way a person needs to move in order to make the maximum impact?

Once in a while, I take book recommendations from academics, and end up reading (rather, trying to read) academic books. Recently, someone had recommended a book that combined information theory and machine learning, and I started reading it. Needless to say, within half a chapter, I was lost, and abandoned the book. Yet, the little I read performed the useful purpose of reminding me of Shannon’s channel coding theorem.

Paraphrasing, what it states is that irrespective of how noisy a channel is, using the right kind of encoding and redundancy, we can send information across it with an arbitrarily small error rate, at any speed up to a certain maximum (the channel’s “capacity”). The noisier the channel, the more redundancy we need, and the lower this maximum speed.
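For the simplest textbook case, the binary symmetric channel (each bit gets flipped with probability p), that maximum speed has a closed form: C = 1 - H2(p) bits per use of the channel, where H2 is the binary entropy function. The choice of this particular channel is just mine, for illustration. A minimal sketch:

```python
import math


def binary_entropy(p: float) -> float:
    """H2(p) in bits, using the convention 0 * log2(0) = 0."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability")
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def bsc_capacity(flip_probability: float) -> float:
    """Capacity (bits per channel use) of a binary symmetric channel."""
    return 1.0 - binary_entropy(flip_probability)


for p in (0.0, 0.01, 0.11, 0.25, 0.5):
    print(f"flip probability {p:.2f} -> capacity {bsc_capacity(p):.3f} bits/use")
```

At p = 0 you get the full bit per use; by p = 0.11 only about half a bit per use gets through reliably, and at p = 0.5 the output is pure noise and the capacity drops to zero. That is the “noisier channel, lower speed” statement in numbers.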

In my opinion (and in the opinions of several others, I’m sure), this is a rather profound observation, and has significant impact on various aspects of life. In fact, I’m prone to abusing it in inexact manners (no wonder I never tried to become an academic).

So while thinking of the tradeoff between speed and accuracy, I started thinking of the channel coding theorem. You can think of a person’s work (or “working mind”) as a communication channel. The speed is the raw speed of transmission. The accuracy (rather, the lack of it) is a measure of noise in the channel.

So the less accurate someone is, the more the redundancy they require in communication (or in work). For example, if you are especially prone to mistakes (like I am sometimes), you might need to redo your work (or at least a part of it) several times. If you are the more accurate types, you need to redo less often.

And different people have different speed-accuracy trade-offs.

I don’t have a perfect way to quantify this, but maybe we can think of the “true speed of work” as the actual speed at which someone does a piece of work divided by the number of iterations they need to get it right. OK, it is not so straightforward (there might be other ways to build redundancy – like getting two independent people to do the same thing and then tallying the numbers), but I suppose you get the drift.
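One way to make “true speed” precise, under a strong simplifying assumption of mine: each attempt at a task is independently right with probability equal to the person’s accuracy, and a wrong attempt means redoing the whole thing. The number of attempts is then geometrically distributed with mean 1/accuracy, so true speed is raw speed times accuracy. The two workers and their numbers are hypothetical, and the simulation is just a sanity check on the formula.

```python
import random


def effective_speed(raw_speed: float, accuracy: float) -> float:
    """Expected finished-and-correct tasks per unit time.

    Assumes each attempt is independently right with probability `accuracy`
    and a wrong attempt is redone from scratch, so the expected number of
    attempts is 1 / accuracy (geometric distribution).
    """
    if not 0 < accuracy <= 1:
        raise ValueError("accuracy must be in (0, 1]")
    expected_attempts = 1 / accuracy
    return raw_speed / expected_attempts


def simulated_attempts(accuracy: float, trials: int, seed: int = 0) -> float:
    """Monte Carlo estimate of the mean number of attempts per task."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        attempts = 1
        while rng.random() >= accuracy:  # this attempt was wrong: redo
            attempts += 1
        total += attempts
    return total / trials


# Two hypothetical workers: fast but sloppy vs slower but careful.
print(effective_speed(raw_speed=10, accuracy=0.5))  # 5.0
print(effective_speed(raw_speed=6, accuracy=0.9))
print(simulated_attempts(0.5, trials=100_000))      # close to 2.0
```

Under this model the fast-but-sloppy worker delivers 5 correct tasks per unit time, slightly fewer than the careful one at about 5.4, and quite different speed-accuracy pairs can land on the same true speed.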

The interesting thing here is that speed and accuracy depend not only on the person but also on the nature of the work itself. For me, a piece of work that on average takes 1 hour has a different speed-accuracy tradeoff compared to a piece of work that on average takes a day (usually, the more complicated and involved a piece of analysis, the higher my error rate).

In any case, the point to be noted is that the speed-accuracy tradeoff is different for different people, and in different contexts. For some people, in some contexts, there is no point at all in expecting highly accurate work – you know they will make mistakes anyway, so you might as well get the work done quickly (to allow for more time to iterate).

And in a way, figuring out speed-accuracy tradeoffs of the people who work for you is an important step in getting the best out of them.

 

Christian Rudder and Corporate Ratings

One of the studdest book chapters I’ve read is from Christian Rudder’s Dataclysm. Rudder is a cofounder of OkCupid, now part of the match.com portfolio of matchmakers. In this book, he uses OkCupid’s own data to draw insights about human life and behaviour.

It is a typical non-fiction book, with a studmax first chapter, after which it gets progressively weaker. And it is the first chapter (which I’ve written about before) that I’m going to talk about here. There is a nice write-up and extract in Maria Popova’s website (which used to be called BrainPickings) here.

Quoting Maria Popova:

What Rudder and his team found was that not all averages are created equal in terms of actual romantic opportunities — greater variance means greater opportunity. Based on the data on heterosexual females, women who were rated average overall but arrived there via polarizing rankings — lots of 1’s, lots of 5’s — got exponentially more messages (“the precursor to outcomes like in-depth conversations, the exchange of contact information, and eventually in-person meetings”) than women whom most men rated a 3.

In one-hit markets like love (you only need to love and be loved by one person to be “successful” in this), high volatility is an asset. It is like option pricing if you think about it – higher volatility means greater chance of being in the money, and that is all you care about here. How deep out of the money you are just doesn’t matter.
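The option analogy can be checked with a small calculation. Suppose, purely as an assumption, that how someone gets rated behaves like a normal random variable around an average of 3 (real ratings are whole numbers from 1 to 5, so this is a simplification), and that only ratings above a hypothetical bar of 4.5 lead to a message, which makes 4.5 the “strike”. The chance of ending up above the strike rises with volatility even though the average doesn’t move:

```python
from statistics import NormalDist


def prob_in_the_money(mean: float, volatility: float, strike: float) -> float:
    """P(outcome > strike) when the outcome is Normal(mean, volatility)."""
    return 1 - NormalDist(mu=mean, sigma=volatility).cdf(strike)


# Same average "rating" of 3 each time; only the spread changes.
for vol in (0.5, 1.0, 2.0):
    print(vol, round(prob_in_the_money(mean=3, volatility=vol, strike=4.5), 3))
```

With these numbers, the probability of being “in the money” goes from about 0.1% at a volatility of 0.5 to about 23% at a volatility of 2. How far below the strike the other outcomes land doesn’t enter the calculation at all.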

I was thinking about this in some random context this morning when I was also thinking of the corporate appraisal process. Now, the difference between dating and appraisals is that on OkCupid you might get several ratings on a 5-point scale, but in your office you only get one rating each year on a 5-point scale. However, if you are a manager, and especially if you are managing a large team, you will GIVE out lots of ratings each year.

And so I was wondering – what does the variance of the ratings you give out say about you as a manager? Assuming that HR doesn’t impose any “grading on a curve”, what does it say if you are a manager who gave out an average rating of 3 with a standard deviation of 0.5, versus a manager who gave an average of 3 with all employees receiving 1s and 5s?
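For concreteness, here are two hypothetical teams of eight that fit those two descriptions, with the mean and (population) standard deviation computed using Python’s standard `statistics` module:

```python
from statistics import mean, pstdev

# Hypothetical teams of eight, rated on a 1-5 scale.
steady_team = [2, 3, 3, 3, 3, 3, 3, 4]
polarised_team = [1, 1, 1, 1, 5, 5, 5, 5]

for name, ratings in (("steady", steady_team), ("polarised", polarised_team)):
    print(f"{name}: mean {mean(ratings):.1f}, std {pstdev(ratings):.1f}")
# steady: mean 3.0, std 0.5
# polarised: mean 3.0, std 2.0
```

Both managers hand out the same average rating; the only difference is the spread, which is exactly the thing the average hides.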

From a corporate perspective, would you rather have a team full of 3s, or a team with a few 5s and a few 1s (who, it is likely, will leave)? Once again, if you think about it, it depends on your Vega (returns to volatility). In some sense, it depends on whether you are running a stud or a fighter team.

If you are running a fighter team, where there is no real “spectacular performance” but you need your people to grind it out, not make mistakes, pay attention to detail and do their jobs, you want a team full of 3s. The 5s in this team don’t contribute that much more than a 3. And 1s can seriously hurt your performance.

On the other hand, if you’re running a stud team, you will want high variance. Because by the sheer nature of work, in a stud team, the 5s will add significantly more value than the 1s might cause damage. When you are running a stud team, a team full of 3s doesn’t work – you are running far below potential in that case.
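One way to put the stud/fighter distinction into numbers is to give each rating a (completely hypothetical) value to the team: concave for fighter work, where a 5 adds little over a 3 and a 1 hurts a lot, and convex for stud work, where a 5 adds far more than a 1 destroys. Applying both value tables to the same two hypothetical teams of eight:

```python
# Hypothetical value each rating adds to the team's output.
# Fighter work: concave (increments shrink as the rating rises).
FIGHTER_VALUE = {1: -5, 2: 1, 3: 3, 4: 4, 5: 4}
# Stud work: convex (increments grow as the rating rises).
STUD_VALUE = {1: -1, 2: 0, 3: 1, 4: 3, 5: 10}

steady_team = [2, 3, 3, 3, 3, 3, 3, 4]
polarised_team = [1, 1, 1, 1, 5, 5, 5, 5]


def team_output(ratings: list[int], value: dict[int, int]) -> int:
    """Total value of a team, given how much each rating is worth."""
    return sum(value[r] for r in ratings)


for label, value in (("fighter work", FIGHTER_VALUE), ("stud work", STUD_VALUE)):
    print(label, team_output(steady_team, value), team_output(polarised_team, value))
# fighter work 23 -4
# stud work 9 36
```

The same two rating distributions rank in opposite orders depending only on the shape of the value function, which is the Vega point again: convex payoffs reward variance, concave ones punish it.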

Assuming that your team has delivered, maybe the distribution of ratings across the team is a function of whether it does more stud or fighter work? Or am I force fitting my pet theory a bit too much here?

Studs and Fighters and Attack and Defence

The general impression in sport is that attack is “stud” and defence is “fighter”. This is mainly because defence (in pretty much any game) is primarily about not making errors, and being disciplined. Flamboyance can pay off in attack, when you only need to strike occasionally, but not in defence, where the real payoff comes from being consistent and excellent.

However, attack need not always be stud, and defence need not always be fighter. This is especially true in team sports such as football, where there can be a fair degree of organisation and coaching to get players to coordinate.

This piece in The Athletic (paywalled) gives an interesting instance of how attacking can be fighter, and how modern football is all about fighter attacking. It takes the instance of this weekend’s game between Tottenham Hotspur and Liverpool F.C., which the latter won.

Jack Pitt-Brooke, the author, talks about how Liverpool is fighter in attack because the players are well-drilled in attacking, and practise combination play, or what are known in football as “automisations”.

But in modern football, the opposite is true. The best football, the type played by Pep Guardiola’s Manchester City or Jurgen Klopp’s Liverpool, is the most rigorously planned, drilled and co-ordinated. Those two managers have spent years teaching their players the complex attacking patterns and synchronised movements that allow them to cut through every team in the country. That is why they can never be frustrated by opponents who just sit in and defend, why they are racking up points totals beyond the reach of anyone else.

Jose Mourinho, on the other hand, might be fighter in the way he sets up his defence, but not so when it comes to attacking. He steadfastly refuses to have his teams train attacking automisations. While defences are extremely well drilled, and know exactly how to coordinate, attackers are left to their own devices and creativity. What Mourinho does is to identify a handful of attackers (usually the centre forward and the guy just behind him) who are given “free roles” and are expected to use their own creativity in leading their team’s attacks.

As Pitt-Brooke went on to write in his article,

That, more than anything else, explains the difference between Klopp and Mourinho. Klopp wants to plan his way out of the randomness of football. Mourinho is more willing to accept it as a fact and work around it. So while the modern manager — Klopp, Guardiola, Antonio Conte — coaches players in ‘automisations’, pre-planned moves and patterns, Mourinho does not.

Jurgen Klopp the fighter, and Jose Mourinho the stud. It may not be intuitive, but it makes sense when you think of how their teams attack.

Yes, attack is also being fighterised in modern sport.

Studs and fighters: Origin

As far as this blog is concerned, the concept of studs and fighters began sometime in 2007, when I wrote the canonical blog post on the topic. Since then the topic has been much used and abused.

Recently, though, I remembered when I had first come across the concept of studs and fighters. This goes way back to 1999, and has its origins in a conversation with two people whom I consider among the studdest people I’ve ever met (they’re both now professors at highly reputed universities).

We were on a day-long train journey, and were discussing people we had spent a considerable amount of time with over the previous one month. It was a general gossip session, the sort that was common to train journeys in the days before smartphones made people insular.

While discussing one guy we had met, one of us (it certainly wasn’t me; it was one of the other two, but I now can’t recall which) said “well, he isn’t particularly clever, but he is a very hard worker for sure”.

And so over time this distinction got institutionalised, first in my head and then in the heads of all my readers. There were two ways to be good at something – by either being clever or by being a very hard worker.

Thinking about it now, it seems rather inevitable that the concept that would become studs and fighters came about in the middle of a conversation among studs.

10X Studs and Fighters

Tech twitter, for the last week, has been inundated with unending debate on this tweetstorm by a VC about “10X engineers”. The tweetstorm was engineered by Shekhar Kirani, a Partner at Accel Partners.

I have friends and twitter-followees on both sides of the debate. There isn’t much more to describe about the “paksh” (for) side of the debate. Read Shekhar’s tweetstorm I’ve put above, and you’ll know all there is to this side.

The “vipaksh” (against) side argues that this normalises “toxicity” and “bad behaviour” among engineers (such as “10X engineers” hating meetings and not adhering to processes). Someone I follow went to the extent of saying that this kind of behaviour among engineers is a sign of privilege and lack of empathy.

This is just the gist of the argument. You can just do a search of “10X engineer”, ignore the jokes (most of them are pretty bad) and read people’s actual arguments for and against “10X engineers”.

Regular readers of this blog might be familiar with the “studs and fighters” framework, which I used so often in the 2007-9 period that several people threatened to stop reading me unless I stopped using the framework. I put it on a temporary hiatus and then revived it a couple of years back because I decided it’s too useful a framework to ignore.

One of the fundamental features of the studs and fighters framework is that studs and fighters respectively think that everyone else is like themselves. And this can create problems at the organisational level. I’d spoken about this in the introductory post on the framework.

This debate about 10X engineers, and whether they are good or bad, reminds me of the conflict between studs and fighters. Studs want to work their way. They are really good at what they do best, and absolutely suck at pretty much everything else. So they try to avoid things they’re bad at, can sometimes be individualistic and prefer to work alone, and hope that how good they are at the things they’re good at will compensate for everything they suck at.

Fighters, on the other hand, are process driven, methodical, patient and sticklers for rules. They believe that output is proportional to input, and that it is impossible for anyone to have a 10X impact, even 1/10th of the time (:P). They believe that everyone needs to “come together as a group and go through a process”.

I can go on but won’t.

So should your organisation employ 10X engineers or not? Do you tolerate the odd “10X engineer” who may not follow company policy and all that in return for their superior contributions? There is no easy answer to this but overall I think companies together will follow a “mixed strategy”.

Some companies will be encouraging of 10X behaviour, and you will see 10X people gravitating towards such companies. Others will dissuade such behaviour and the 10X people there, not seeing any upside, will leave to join the 10X companies (again, I’ve written about how you can have “stud organisations” and “fighter organisations”).

Note that it’s difficult to run an organisation with solely 10X people (they’re bad at managing stuff), so organisations that engage 10X people will also employ “fighters” who are cognisant that 10X people exist and know how they should be managed. In fact, being a fighter while recognising and being able to manage 10X behaviour is, I think, an important skill.

As for myself, I don’t like one part of Shekhar Kirani’s definition – that he restricts it to “engineers”. I think the sort of behaviour he describes is present in other fields and skills as well. Some people see the point in that. Others don’t.

Life is a mixed strategy.

AlphaZero Revisited

It’s been over a year since Google’s DeepMind first made its splash with the reinforcement-learning based chess playing engine AlphaZero. The first anniversary of the AlphaZero story also coincided with the publication of the peer-reviewed paper.

To go with the peer-reviewed paper, DeepMind has released a further 200 games played between AlphaZero and the conventional chess engine Stockfish, which are again heavily loaded in favour of wins for AlphaZero, but also include 6 games in which AlphaZero lost. I’ve been following these games on GM Daniel King’s excellent Powerplaychess channel, and want to revise my opinion on AlphaZero.

Back then, I had looked at AlphaZero’s play from my favourite studs and fighters framework, which in hindsight doesn’t do full justice to AlphaZero. From the games that I’ve seen from the set released this season, AlphaZero’s play hasn’t exactly been “stud”. It’s just that it’s much more “human”. And the reason why AlphaZero’s play possibly seems more human is because of the way it “learns”.

Conventional chess engines evaluate a position by considering all possible paths (ok not really, they use an intelligent method called Alpha-Beta Pruning to limit their search size), and then play the move that leads to the best position at the end of the search. These engines use “pre-learnt human concepts” such as point count for different pieces, which are used to evaluate positions. And this leads to a certain kind of play.
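For readers who haven’t come across it, here is a minimal sketch of minimax search with alpha-beta pruning on a toy game tree. The tree and its leaf scores are made up, and real engines add a lot on top (move ordering, transposition tables, elaborate evaluation functions), none of which is shown here. Each leaf’s number stands in for a static evaluation of the kind that point counts feed into.

```python
import math


def alphabeta(node, maximising, alpha=-math.inf, beta=math.inf, evaluated=None):
    """Minimax value of `node` with alpha-beta pruning.

    `node` is either a number (a leaf's static evaluation, from the
    maximising side's point of view) or a list of child nodes.
    `evaluated`, if given, collects the leaves actually looked at.
    """
    if not isinstance(node, list):
        if evaluated is not None:
            evaluated.append(node)
        return node
    if maximising:
        best = -math.inf
        for child in node:
            best = max(best, alphabeta(child, False, alpha, beta, evaluated))
            alpha = max(alpha, best)
            if alpha >= beta:
                break  # the minimiser will never allow this line: prune
        return best
    best = math.inf
    for child in node:
        best = min(best, alphabeta(child, True, alpha, beta, evaluated))
        beta = min(beta, best)
        if alpha >= beta:
            break  # the maximiser already has something better: prune
    return best


# A hypothetical two-ply tree: we pick a move, then the opponent replies.
tree = [[3, 5], [2, 9], [0, 1]]
seen = []
print(alphabeta(tree, maximising=True, evaluated=seen))  # 3
print(seen)  # [3, 5, 2, 0]
```

Plain minimax would look at all six leaves. Here, once the opponent has a reply scoring 2 against our second move, there is no need to look at the 9, because our first move already guarantees 3; the same logic skips the 1 under the third move.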

AlphaZero’s learning process, however, involves playing zillions of games against itself (since I wrote that previous post, I’ve come back up to speed with reinforcement learning). And then based on the results of these games, it evaluates positions it reached in the course of play (in hindsight). On top of this, it builds a deep learning model to identify the goodness of positions.
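Here is a drastically simplified sketch of the “evaluate positions in hindsight” idea: each position visited in a self-play game is scored by the game’s final result, from the point of view of the side to move, and averaged over all the games in which it appeared. The game records and position labels are hypothetical. AlphaZero itself (per DeepMind’s paper) also uses Monte Carlo tree search during self-play, and trains a neural network to predict outcomes (along with move probabilities) rather than keeping a lookup table like this one.

```python
from collections import defaultdict

# Hypothetical self-play records: the positions visited (as labels), each
# tagged with the side to move, plus the final result for White
# (+1 win, 0 draw, -1 loss).
games = [
    ([("start", "white"), ("A", "black"), ("B", "white")], +1),
    ([("start", "white"), ("A", "black"), ("C", "white")], 0),
    ([("start", "white"), ("D", "black")], -1),
]


def hindsight_values(records):
    """Average final result for each position, from the side to move's view."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for positions, white_result in records:
        for position, to_move in positions:
            result = white_result if to_move == "white" else -white_result
            totals[position] += result
            counts[position] += 1
    return {pos: totals[pos] / counts[pos] for pos in totals}


print(hindsight_values(games))
```

Position “A” comes out at -0.5 for Black (one loss, one draw), while “D” looks great for Black because the only game through it was a Black win. With zillions of games instead of three, these averages start to carry real information.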

As far as I understand deep learning (which is not very far), this process involves AlphaZero learning about “features” of games that have more often than not enabled it to win. So somewhere in the network there will be a node that represents “control of centre”. Another node deep in the network might represent “safety of king”. Yet another might perhaps involve “open A file”.

Of course, none of these features have been pre-specified to AlphaZero. It has simply learnt them by training its neural network on zillions of games it has played against itself. And while deep learning is hard to “explain”, it is likely that the features of the game that AlphaZero has learnt are remarkably similar to the “features” of the game that human players have learnt over the centuries. And it is because of the commonality in these features that we find AlphaZero’s play so “human”.

Another way to look at it is through the concept of “10000 hours” that Malcolm Gladwell spoke about in his book Outliers. As I had written in my review of the book, the concept of 10000 hours can be thought of as “putting fight until you get enough intuition to become stud”. AlphaZero, thanks to its large number of processors, has effectively spent much more than “10000 hours” playing against itself, with its neural network constantly “learning” from the positions faced and the outcomes of the game reached. And this way, it has “gained intuition” over features of the game that lead to wins, giving it an air of “studness”.

The interesting thing to me about AlphaZero’s play is that thanks to its “independent development” (somewhat like Darwin’s finches in the Galápagos), it has not been burdened by human intuition on what is good or bad, and has learnt its own heuristics. And along the way, it has come up with a bunch of heuristics that have not commonly been used by human players.

Keeping bishops on the back rank (once the rooks have been connected), for example. A stronger preference for bishops to knights than humans. Suddenly simplifying from a terrifying-looking attack into a winning endgame (machines are generally good at endgames, so this is not that surprising). Temporary pawn and piece sacrifices. And all that.

Thanks to engines such as Leela Chess Zero, we may soon see the results of these learnings being applied to human chess as well. And human chess can only become better!

AlphaZero defeats Stockfish: Quick thoughts

The big news of the day, as far as I’m concerned, is the victory of Google DeepMind’s AlphaZero over Stockfish, currently the highest rated chess engine. This comes barely months after DeepMind’s AlphaGo Zero had bested the earlier avatar of AlphaGo in the game of Go.

Like its Go counterpart, the AlphaZero chess-playing machine learnt using reinforcement learning (I remember doing a term paper on the concept back in 2003 but have mostly forgotten it). Basically, it wasn’t given any “training data”; instead, the machine trained itself by continuously playing against itself, with the feedback at each stage helping it learn better.

After only about four hours of “training” (basically playing against itself and discovering moves), AlphaZero recorded this victory in a 100-game match, winning 28 and losing none (the remaining 72 games were drawn).

There’s a sample game on the Chess.com website, and while this might be a biased sample (the AlphaZero engineers likely included the most spectacular games in their paper, from which this is taken), it shows that AlphaZero plays vastly differently from engines such as Stockfish.

I’m not that much of a chess expert (I “retired” from my playing career back in 1994), but the things that struck me about this game were:

  • The move 7. d5 against the Queen’s Indian
  • The piece sacrifice a few moves later that was hard to see
  • AlphaZero’s consistent attempts until late in the game to avoid trading queens
  • The move Qh1 somewhere in the middle of the game

In a way (and consistent with some of the themes of this blog), AlphaZero can be described as a “stud” chess machine, having taught itself to play based on feedback from games it has already played. (Reinforcement learning broadly works by incentivising, in the next iteration, actions that led to “good rewards”, and penalising those that led to “poor rewards”. The challenge in this case is to set up chess in a way that is conducive to a reinforcement learning system.)
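To make the “reward” idea a little more concrete, here is a minimal Python sketch on a toy problem rather than chess: a hypothetical one-move “game” in which each of three moves wins with some fixed probability that the learner is never told. It uses the textbook gradient-bandit update, so moves that earned better-than-average rewards become more likely and those that earned worse-than-average rewards become less likely. Treat it only as an illustration of the broad principle under these assumptions. AlphaZero itself combines self-play with a deep neural network and Monte Carlo tree search, none of which is modelled here, and the names `train` and `win_probs` are made up for the example.

```python
import math
import random


def softmax(prefs):
    """Turn raw preferences into move probabilities (numerically stable)."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def train(win_probs, episodes=5000, lr=0.1, seed=0):
    """Gradient-bandit learner for a hypothetical one-move 'game'.

    win_probs[i] is the hidden chance that move i wins. The learner never
    sees these numbers, only the reward (1 for a win, 0 otherwise).
    """
    rng = random.Random(seed)
    prefs = [0.0] * len(win_probs)
    baseline = 0.0  # running average reward, used to judge "good" vs "poor"
    for t in range(1, episodes + 1):
        probs = softmax(prefs)
        move = rng.choices(range(len(prefs)), weights=probs)[0]
        reward = 1.0 if rng.random() < win_probs[move] else 0.0
        baseline += (reward - baseline) / t
        advantage = reward - baseline  # positive means better than usual
        for a in range(len(prefs)):
            if a == move:
                # Incentivise the chosen move if it did well, penalise it if not.
                prefs[a] += lr * advantage * (1 - probs[a])
            else:
                # Shift probability the opposite way for the moves not played.
                prefs[a] -= lr * advantage * probs[a]
    return softmax(prefs)


policy = train([0.1, 0.3, 0.9])
print([round(p, 2) for p in policy])
```

With these settings the learner should end up putting most of its probability on the third move, even though it only ever saw wins and losses, never the underlying odds.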

Engines such as Stockfish, on the other hand, are absolute “fighters”. They get their “power” by brute force, going down nearly all possible paths in the game several moves deep. This is supplemented by analysis of millions of existing games of various levels, which the engine “learns” from; among other things, it learns how to prune and prioritise the paths it searches. Stockfish is also fed a database of chess openings, which it remembers and tries to play.
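The “prune and prioritise” part can be illustrated with alpha-beta pruning, the classic way of cutting down a brute-force minimax search. The sketch below is a simplification under stated assumptions: the game tree is a hypothetical nested Python list whose leaves are already-evaluated scores, whereas a real engine like Stockfish generates moves on the fly, orders them carefully, and layers its own evaluation function and many further search tricks on top.

```python
import math


def alphabeta(node, maximising, alpha=-math.inf, beta=math.inf, visited=None):
    """Minimax search with alpha-beta pruning over a toy nested-list game tree.

    Leaves are ints (static evaluations from the maximising side's point of
    view); internal nodes are lists of child positions. Leaves that actually
    get evaluated are appended to `visited`, so the pruning can be observed.
    """
    if visited is None:
        visited = []
    if isinstance(node, int):
        visited.append(node)
        return node
    if maximising:
        value = -math.inf
        for child in node:
            value = max(value, alphabeta(child, False, alpha, beta, visited))
            alpha = max(alpha, value)
            if alpha >= beta:
                break  # the opponent would never allow this line: skip the rest
        return value
    value = math.inf
    for child in node:
        value = min(value, alphabeta(child, True, alpha, beta, visited))
        beta = min(beta, value)
        if alpha >= beta:
            break  # we already have a better option elsewhere: skip the rest
    return value


# Root is our move (maximising); each sub-list holds the opponent's replies (minimising).
tree = [[3, 5], [6, 9], [1, 2], [0, -1]]
seen = []
best = alphabeta(tree, True, visited=seen)
print(best, seen)  # 6 [3, 5, 6, 9, 1, 0]
```

Only six of the eight leaf positions get evaluated: the second branch already guarantees a score of 6, so as soon as the third and fourth branches reveal an opponent reply worth less than that, the rest of each branch is skipped.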

What is interesting is that AlphaZero “discovered” some popular chess openings in the course of its self-learning. Notably, some popular openings such as the King’s Indian or the French find little favour with this engine, while others such as the Queen’s Gambit or the Queen’s Indian find favour. This is a very interesting development for opening theory itself.

Frequency over time of the openings employed by AlphaZero during its “learning” phase. Image sourced from the AlphaZero research paper.

In any case, my immediate concern about this development is how it will affect human chess. Over the last decade or two, engines such as Stockfish have played a profound role in the development of chess, with current top players such as Magnus Carlsen or Sergey Karjakin having trained extensively with these engines.

The way top grandmasters play has changed steadily over these years as they have ingested ideas from engines such as Stockfish. The game has become far quieter and more positional, as players seek to gain small advantages that they steadily build on over the course of (long) games. This is consistent with the way the engines that players learn from play.

Based on the evidence of the one game I’ve seen of AlphaZero, it plays very differently from existing engines. So it will be interesting to see how human players who train with AlphaZero-based engines (or their clones) change their game.

Maybe chess will turn back to being a bit more tactical than it’s been in the last decade? It’s hard to say right now!