10X Studs and Fighters

Tech twitter, for the last week, has been inundated with unending debate on this tweetstorm by a VC about “10X engineers”. The tweetstorm was engineered by Shekhar Kirani, a Partner at Accel Partners.

I have friends and twitter-followees on both sides of the debate. There isn’t much more to describe about the “paksh” side of the debate. Read Shekhar’s tweetstorm I’ve put above, and you’ll know all there is to this side.

The vipaksh side argues that this normalises “toxicity” and “bad behaviour” among engineers (the “10X engineers”’ hatred for meetings, their refusal to adhere to processes, etc.). Someone I follow went to the extent of saying that this kind of behaviour among engineers is a sign of privilege and lack of empathy.

This is just the gist of the argument. Do a search for “10X engineer”, ignore the jokes (most of them are pretty bad) and read people’s actual arguments for and against “10X engineers”.

Regular readers of this blog might be familiar with the “studs and fighters” framework, which I used so often in the 2007-9 period that several people threatened to stop reading me unless I stopped using the framework. I put it on a temporary hiatus and then revived it a couple of years back because I decided it’s too useful a framework to ignore.

One of the fundamental features of the studs and fighters framework is that studs and fighters respectively think that everyone else is like themselves. And this can create problems at the organisational level. I’d spoken about this in the introductory post on the framework.

To me this debate about 10X engineers and whether they are good or bad reminds me of the conflict between studs and fighters. Studs want to work their way. They are really good at what they do, and absolutely suck at pretty much everything else. So they try to avoid things they’re bad at, can sometimes be individualistic and prefer to work alone, and hope that how good they are at their strengths will compensate for everything they suck at elsewhere.

Fighters, on the other hand, are process driven, methodical, patient and sticklers for rules. They believe that output is proportional to input, and that it is impossible for anyone to have a 10X impact, even 1/10th of the time (:P). They believe that everyone needs to “come together as a group and go through a process”.

I can go on but won’t.

So should your organisation employ 10X engineers or not? Do you tolerate the odd “10X engineer” who may not follow company policy and all that in return for their superior contributions? There is no easy answer to this but overall I think companies together will follow a “mixed strategy”.

Some companies will be encouraging of 10X behaviour, and you will see 10X people gravitating towards such companies. Others will dissuade such behaviour and the 10X people there, not seeing any upside, will leave to join the 10X companies (again, I’ve written about how you can have “stud organisations” and “fighter organisations”).

Note that it’s difficult to run an organisation with solely 10X people (they’re bad at managing stuff), so organisations that engage 10X people will also employ “fighters” who are cognisant that 10X people exist and know how they should be managed. In fact, being a fighter while recognising and being able to manage 10X behaviour is, I think, an important skill.

As for myself, I don’t like one part of Shekhar Kirani’s definition – that he restricts it to “engineers”. I think the sort of behaviour he describes is present in other fields and skills as well. Some people see the point in that. Others don’t.

Life is a mixed strategy.

Single Malt Recommendation App

Life is too short to drink whisky you don’t like.

How often have you found yourself in a duty free shop in an airport, wondering which whisky to take back home? Unless you are a pro at this already, you might want something you haven’t tried before, but don’t want to end up buying something you may not like. The names are all grand, as Scottish names usually are. The region might offer some clue, but not so much.

So I started on this work a few years back, when I first discovered this whisky database. I had come up with a set of tables to recommend what whisky is similar to what, and which single malts are the “most unique”. Based on this, I discovered that I might like Ardbeg. And I ended up absolutely loving it.

And ever since, I’ve carried a couple of tables in my Evernote to make sure I have some recommendations handy when I’m at a whisky shop and need to make a decision. But then the tables are not user friendly, and don’t typically tell you what you should buy, and what your next choice should be and so on.

To make things more user-friendly, I have built this app where all you need to enter is your favourite set of single malts, and it gives you a list of other single malts that you might like.

The data set is the same. I once again use cosine similarity to find the similarity of different whiskies. Except that this time I take the average of your favourite whiskies, and then look for the whiskies that are closest to that.
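The approach can be sketched in a few lines. This is an illustration only, not the app’s actual code (which is in R): the whisky names and flavour scores below are made up for the example, but the logic – average your favourites’ flavour vectors, then rank everything else by cosine similarity to that average – is the one described above.

```python
import numpy as np

# Hypothetical flavour profiles; each whisky is scored on a few
# flavour dimensions (smoky, sweet, floral, medicinal).
profiles = {
    "Ardbeg":    np.array([4.0, 1.0, 0.5, 3.5]),
    "Laphroaig": np.array([4.0, 0.5, 0.5, 4.0]),
    "Glenlivet": np.array([0.5, 3.5, 3.0, 0.0]),
    "Macallan":  np.array([1.0, 4.0, 2.0, 0.5]),
}

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def recommend(favourites, profiles, top_n=2):
    # Average the flavour vectors of the user's favourites...
    centroid = np.mean([profiles[w] for w in favourites], axis=0)
    # ...then rank every other whisky by cosine similarity to that average.
    others = [(w, cosine_similarity(centroid, v))
              for w, v in profiles.items() if w not in favourites]
    return sorted(others, key=lambda x: -x[1])[:top_n]

print(recommend(["Ardbeg"], profiles))
```

With these made-up numbers, a fan of Ardbeg gets Laphroaig (another smoky, medicinal whisky) as the top recommendation, which is the behaviour you’d hope for.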

In terms of technologies, I’ve used this R package called Shiny to build the app. It took no more than half an hour of programming effort to build, and most of that went into building the logic, not the UI stuff.

So take it for a spin, and let me know what you think.


Voice assistants and traditional retail

Traditionally, retail was an over-the-counter activity. There was a physical counter between the buyer and the seller, and the buyer would demand what he wanted, and the shopkeeper would hand it over to him. This form of retail gave greater power to the shopkeeper, which meant that brands could practice what can be described as “push marketing”.

Most of the marketing effort would be spent in selling to the shopkeeper and then providing him sufficient incentives to sell it on to the customer. In most cases the customer didn’t have that much of a choice. She would ask for “salt”, for example, and the shopkeeper would give her the brand of salt that benefited him the most to sell.

Sometimes some brands would provide sufficient incentives to the shopkeeper to ensure that similar products from competing brands wouldn’t be stocked at all, ensuring that the customer faced a higher cost of getting those products (going to another shop) if they desired them. Occasionally, such strategies would backfire (a customer with extremely strong brand preferences would eschew the shopkeeper who wouldn’t stock these brands). Mostly they worked.

The invention of the supermarket (sometime in the late 1800s, if I remember my research for my book correctly – it followed the concept of set prices) changed the dynamics a little bit. In this scenario, while the retailer continues to do the “shortlisting”, the ultimate decision is in the hands of the customer, who will pick her favourite among the brands on display.

This increases the significance of branding in the minds of the customer. The strongest incentives to retailers won’t work (unless they result in competing brands being wiped out from the shelves – but that comes with a risk) if the customer has a preference for a competing product. At best the retailer can offer these higher-incentive brands better shelf space (eye level as opposed to ankle level, for example).

However, even in traditional over-the-counter retail, branding matters to an extent when there is choice (as I had detailed in an earlier post written several years ago). This is in the instance where the shopkeeper asks the customer which brand she wants, and the customer has to make the choice “blind” without knowing what exactly is available.

I’m reminded of this issue of branding and traditional retail as I try to navigate the Alexa voice assistant. Nowadays there are two ways in which I play music using Spotify – one is the “direct method” from the phone or computer, where I search for a song, a list gets thrown up and I can select which one to play. The other is through Alexa, where I ask for a song and the assistant immediately starts playing it.

With popular songs where there exists a dominant version, the phone and Alexa give identical results (though there are exceptions to this as well – when I ask Alexa to play Black Sabbath’s Iron Man, it plays the live version, which is a bit off). However, when you are looking for songs that have multiple interpretations, you implicitly let Alexa make the decision for you, like a shopkeeper in traditional retail.

So, for example, most popular nursery rhymes have been covered by several groups. Some do the job well, singing the rhymes in the most dominant tunes, and using the most popular versions of the lyrics. Others mangle the tunes, and even the lyrics (an Indian YouTube channel called Chuchu TV, for example, has changed the story of Jack and Jill to give it a “moral”. I’m sure as a teenager you had changed the lyrics of Jack and Jill as well :P).

And in this situation you want more control over which version is played. For most songs I prefer the Little Baby Bum version, while for some others I prefer the Nursery Rhymes 123 version, but there is no “rule”. And this makes it complicated to order songs via Alexa.

More importantly, if you are a music publisher, the usage of Alexa to play on Spotify means that you might be willing to give Spotify greater incentives so that your version of a song comes up on top when a user searches for it.

And when you factor in advertising and concepts such as “paid search” into the picture, the fact that the voice assistants dictate your choices makes the situation very complicated indeed.

I wonder if there’s a good solution to this problem.

Programming Languages

I take this opportunity to apologise for my prior belief that all that matters is thinking algorithmically, and language in which the ideas are expressed doesn’t matter.

About a decade ago, I used to make fun of information technology companies that hired developers based on the language they coded in. My contention was that writing code is a skill that you either have or you don’t, and what a potential employer needs to look for is the ability to think algorithmically, and then render ideas in code.

While I’ve never worked as a software engineer, I find myself writing more and more code over the years as part of doing data analysis. The primary tool I use is R, where coding doesn’t really feel like coding, since it is a rather high-level language. However, I’m occasionally asked to show code in Python, since some clients are more proficient in that, and the one thing that has done is teach me the value of domain knowledge of a programming language.

This is because the language you usually program in subtly nudges you towards thinking in a particular way. Having mostly used R over the last decade, I think in terms of tables and data frames, and after having learnt tidyverse earlier this year, my way of thinking algorithmically has become in a weird way “object oriented” (no, this has nothing to do with classes). I take an “object” (a data frame) and then manipulate it in various ways, changing it, summarising stuff, calculating things on the fly and aggregating, until the point where the result comes out in an elegant manner. 

And while Pandas allows chaining (in fact, I suspect it is from Pandas that the tidyverse guys got the idea for the “%>%” chaining operator), it is by no means as complete in its treatment of chaining as R, and that makes things tricky.
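To illustrate what that style of thinking looks like in pandas: here is a sketch (on made-up data) of how a tidyverse-style pipeline maps onto a pandas method chain.

```python
import pandas as pd

# Hypothetical sales data, just to illustrate the chaining style.
df = pd.DataFrame({
    "region": ["N", "N", "S", "S"],
    "sales":  [10, 20, 30, 40],
})

# The tidyverse pipeline
#   df %>% mutate(double = sales * 2) %>%
#     group_by(region) %>% summarise(total = sum(double))
# translates to a pandas method chain:
result = (
    df
    .assign(double=lambda d: d["sales"] * 2)   # mutate()
    .groupby("region", as_index=False)         # group_by()
    .agg(total=("double", "sum"))              # summarise()
)
print(result)
```

The chain works, but each step has its own idiom (`assign` with a lambda, named aggregation tuples), whereas in the tidyverse every step reads the same way.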

Moreover, being proficient in R makes you think in terms of vectorised operations, and when you find that Python doesn’t necessarily offer those, operations that were once simple in R become rather complicated in Python, requiring list comprehensions and whatnot.

Putting it another way, thinking algorithmically in the framework offered by one programming language makes it rather stressful to express these thoughts in another language where the way of algorithmic thinking is rather different. 

For example, I’ve never got the point of the index in pandas dataframes, and I only find myself “resetting” it constantly so that my way of addressing isn’t mangled. Compared to the intuitive syntax in R, which is first and foremost a data analysis tool, and where the data frame is “native”, the programming language approach of python with its locs and ilocs is again irritating. 
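A small sketch of that annoyance, on made-up data: filtering preserves the original index labels, so positional (`iloc`) and label-based (`loc`) addressing start disagreeing until you reset the index.

```python
import pandas as pd

df = pd.DataFrame({"x": [10, 20, 30, 40]})

# Filtering keeps the original index labels...
filtered = df[df["x"] > 15]
print(filtered.index.tolist())   # [1, 2, 3]

# ...so positional and label-based addressing now disagree:
print(filtered.iloc[0]["x"])     # 20 (first row by position)
print(filtered.loc[1, "x"])      # 20 (row labelled 1; loc[0] would fail)

# reset_index(drop=True) realigns labels with positions again.
clean = filtered.reset_index(drop=True)
print(clean.loc[0, "x"])         # 20
```

In R, a filtered data frame is just a smaller data frame; there is no separate index to fall out of sync, which is why the constant `reset_index` feels like busywork.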

I can go on… 

And I’m guessing this feeling is mutual – someone used to doing things the python way would find R’s syntax and way of doing things rather irritating. R’s machine learning toolkit for example is nowhere as easy as scikit learn is in python (this doesn’t affect me since I seldom need to use machine learning. For example, I use regression less than 5% of the time in my work). 

The next time I see a job opening for a “java developer” I will not laugh like I used to ten years ago. I know that this posting is looking for a developer who can not only think algorithmically, but think algorithmically in the way that is most convenient to express in Java. And unlearning one way of algorithmic thinking and learning another isn’t particularly easy.

Information Technology and Large Cities

In my book Between the buyer and the seller, officially released exactly a year ago, I have a chapter on cities. In that I explain why industry clusters form, and certain cities or regions become hubs for certain types of industries.

In that, I spoke about the software industry in California’s Silicon Valley, and in Bangalore. I also mentioned how the Industrial Revolution wasn’t evenly distributed around England, and how it was clustered around textile hubs such as Birmingham and Manchester. I also used that chapter to talk about the problem with government-mandated special economic zones (this podcast with Amit Varma can help you understand the last point).

Back when Silicon Valley was still silicon valley (basically a semiconductor and hardware hub), it wasn’t as concentrated a hub as it is today. It was still fairly common for semiconductor companies to base themselves away from the valley. With the “new silicon valley” and the tech startup scene, though, there is no escaping the valley. It is almost an unwritten rule in US Tech startup circles that if you want to be successful with a tech startup, you better be in the valley.

And this is for good reason, as I explain in the book – Silicon Valley is where the ecosystem for successfully running a tech startup already exists, including access to skilled employees, subcontractors and investors, not to speak of a captive market. This, however, has meant that Silicon Valley is now overcrowded in many respects, with rents being sky high (reflected in high salaries), freeways jammed and other infrastructure under stress.

In fact, it is not just the silicon valley that has got crushed under the weight of being a tech hub – other “secondary hubs” such as Seattle (which also have a few tech majors, and where startups put off by the cost of the valley set up) are seeing their quality of life go down. The traffic and infrastructure woes in Bangalore are also rather similar.

So why is it that information technology has led to hubs that are much larger than historical hubs (based on other industries)? The simple answer lies in investment, or the lack of it.

Setting up an information technology company is “cheap” in terms of the investment in capital expenditure. No land needs to be bought, no plants need to be constructed and no machinery needs to be bought. All one needs is an office space (for which rent is paid monthly), and a set of employees (who again get paid a monthly salary). Even IT infrastructure (such as computing power and storage and communication) can be leased, and paid for periodically.

This implies that there is nothing that stops a startup company from locating itself in one of the existing hubs. This way, the company can avail all the benefits of being in the hub (supplier and customer infrastructure, employee pool, quality of life for employees and investors) without a high upfront cost.

Contrast this to “hard” industries that require manufacturing, where the benefits of being located in hubs is similar but the costs are far higher. As a hub develops, land gets expensive, which puts off further investors from locating themselves in the hub. This puts a natural limit on the size of the hubs, and if you think about it, large cities from earlier era were all “multi-purpose cities”, serving as hubs for several unrelated industries.

With information technology, though, the only impediment to the growth of a hub is the decreasing quality of life, information about which gets transmitted through indirect means such as higher rents, longer commute times and poorer health. This indirect transmission of costs to investors results in friction, which means information technology hubs will grow larger before they stop growing. And as they go through this process, the quality of life of the hub’s residents suffers!

Beer and diapers: Netflix edition

When we started using Netflix last May, we created three personas for the three of us in the family – “Karthik”, “Priyanka” and “Berry”. At that time we didn’t realise that there was already a pre-created “kids” (subsequently renamed “children” – don’t know why that happened) persona there.

So while Priyanka and I mostly use our respective personas to consume Netflix (our interests in terms of video content hardly intersect), Berry uses both her profile and the kids profile for her stuff (of course, she’s too young to put it on herself. We do it for her). So over the year, the “Berry” profile has been mostly used to play Peppa Pig, and the occasional wildlife documentary.

Which is why we were shocked the other day to find that “Real life wife swap” had been recommended on her account. Yes, you read that right. We muttered a word of abuse about Netflix’s machine learning algorithms and since then have only used the “kids” profile to play Berry’s stuff.

Since then I’ve been wondering what made Netflix recommend “real life wife swap” to Berry. Surely, it would have been clear to Netflix that while it wasn’t officially classified as one, the Berry persona was a kid’s account? And even if it didn’t, didn’t the fact that the account was used for watching kids’ stuff lead the collaborative filtering algorithms at Netflix to recommend more kids’ stuff? I’ve come up with various hypotheses.

Since I’m not Netflix, and I don’t have their data, I can’t test it, but my favourite hypothesis so far involves what is possibly the most commonly cited example in retail analytics – “beer and diapers”. In this most-likely-apocryphal story, a supermarket chain discovered that beer and diapers were highly likely to appear together in shopping baskets. Correlation led to causation and a hypothesis was made that this was the result of tired fathers buying beer on their diaper shopping trips.

So the Netflix version of beer-and-diapers, which is my hypothesis, goes like this. Harrowed parents are pestered by their kids to play Peppa Pig and other kiddie stuff. The parents are so stressed that they don’t switch to the kid’s persona, and instead play Peppa Pig or whatever from their own accounts. The kid is happy and soon goes to bed. And then the parent decides to unwind by watching some raunchy stuff like “real life wife swap”.

Repeat this story in enough families, and you have a strong enough pattern that accounts not explicitly classified as “kids/children” have strong activity of both kiddie stuff and adult content. And when you use an account not explicitly mentioned as “kids” to watch kiddie stuff, it gets matched to these accounts that have created the pattern – Netflix effectively assumes that watching kid stuff on an adult account indicates that the same account is used to watch adult content as well. And so serves it to Berry!
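The hypothesis is easy to sketch with a toy item-to-item co-occurrence model. To be clear, this is an illustration of the general idea, not Netflix’s actual algorithm, and the profiles and titles below are entirely made up.

```python
from itertools import combinations
from collections import Counter

# Toy viewing histories illustrating the hypothesis: several "general"
# profiles watch both kids' shows and late-night adult fare.
profiles = [
    {"Peppa Pig", "Wife Swap"},
    {"Peppa Pig", "Wife Swap", "Nature Doc"},
    {"Peppa Pig", "Wife Swap"},
    {"Nature Doc", "Drama"},
]

# Count how often each pair of titles appears in the same profile.
co_counts = Counter()
for history in profiles:
    for pair in combinations(sorted(history), 2):
        co_counts[pair] += 1

def recommend(watched, co_counts):
    # Score unseen titles by how often they co-occur with what
    # this profile has already watched.
    scores = Counter()
    for (a, b), n in co_counts.items():
        if a in watched and b not in watched:
            scores[b] += n
        elif b in watched and a not in watched:
            scores[a] += n
    return scores.most_common()

# A profile that has ONLY watched kids' content still gets the adult
# show recommended, because of the pattern the mixed profiles created.
print(recommend({"Peppa Pig"}, co_counts))
```

In this toy world, “Wife Swap” tops the recommendations for a profile that has only ever watched Peppa Pig – exactly the Berry situation.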

Machine learning algorithms basically work on identifying patterns in data, and then fitting these patterns on hitherto unseen data. Sometimes the patterns make sense – like Google Photos identifying you even in your kiddie pics. Other times, the patterns are offensive – like the time Google Photos classified a black woman as a “gorilla”.

Thus what is necessary is some level of human oversight, to make sure that the patterns the machine has identified make some sort of sense (machine learning purists say this is against the spirit of machine learning, since one of its purposes is to discover patterns not perceptible to humans).

That kind of oversight at Netflix would have suggested that you can’t tag a profile to a “kiddie content AND adult content” category if the profile has been used to watch ONLY kiddie content (or ONLY adult content). And that kind of oversight would have also led Netflix to investigate the issue of users using “general” accounts for their kids, and to come up with an algorithm to classify such accounts as kids’ accounts and serve only kids’ content there.

It seems, though, that algorithms run supreme at Netflix, and so my baby daughter gets served “real life wife swap”. Again, this is all a hypothesis (real life wife swap being recommended is a fact, of course)!

More on interactive graphics

So for a while now I’ve been building this cricket visualisation thingy. Basically it’s what I think is a pseudo-innovative way of describing a cricket match, by showing how the game ebbs and flows, and marking off the key events.

Here’s a sample, from the ongoing game between Chennai Super Kings and Kolkata Knight Riders.

As you might appreciate, this is a bit cluttered. One “brilliant” idea I had to declutter this was to create an interactive version, using Plotly and D3.js. It’s the same graphic, but instead of all those annotations appearing, they’ll appear when you hover on those boxes (the boxes are still there). Also, when you hover over the line you can see the score and what happened on that ball.

When I came up with this version two weeks back, I sent it to a few friends. Nobody responded. I checked back with them a few days later. Nobody had seen it. They’d all opened it on their mobile devices, and interactive graphics are ill-defined for mobile!

Because on mobile there’s no concept of “hover”. Even “click” is badly defined because fingers are much fatter than mouse pointers.

And nowadays everyone uses mobile – even in corporate settings. People who spend most time in meetings only have access to their phones while in there, and consume all their information through that.

Yet, you have visualisation “experts” who insist on the joys of tools such as Tableau, or other things that produce nice-looking interactive graphics. People go ga-ga over motion charts (they’re slightly better in that they can communicate more without input from the user).

In my opinion, the lack of use on mobile is the last nail in the coffin of interactive graphics. It is not like they didn’t have their problems already – the biggest problem for me is that it takes too much effort on the part of the user to understand the message that is being sent out. Interactive graphics are also harder to do well, since the users might use them in ways not intended – hovering and clicking on the “wrong” places, making it harder to communicate the message you want to communicate.

As a visualiser, one thing I’m particular about is being in control of the message. As a rule, a good visualisation contains one overarching message, and a good visualisation is one in which the user gets the message as soon as she sees the chart. And in an interactive chart which the user has to control, there is no way for the designer to control the message!

Hopefully this difficulty with seeing interactive charts on mobile will mean that my clients will start demanding them less (at least that’s the direction in which I’ve been educating them all along!). “Controlling the narrative” and “too much work for the consumer” might seem like esoteric problems, but “can’t be consumed on mobile” is surely a winning argument!


FaceTime Baby

My nephew Samvit, born in 2011, doesn’t talk much on the phone. It’s possibly because he didn’t talk much on the phone as a baby, but I’ve never been able to have a decent phone conversation with him (we get along really well when we meet, though). He talks a couple of lines and hands over the phone to his mother and runs off. If it’s a video call, he appears, says hi and disappears.

Berry (born in 2016), on the other hand, seems to have in a way “leapfrogged” the phone. We moved to London when she was five and a half months old, and since then we’ve kept in touch with my in-laws and other relatives primarily through video chat (FaceTime etc.). And so Berry has gotten used to seeing all these people on video, and has become extremely comfortable with the medium.

For example, when we were returning from our last Bangalore trip in December, we were worried that Berry would miss her grandparents tremendously. As it turned out, we landed in London and video called my in-laws, and Berry was babbling away as if there was no change in scene!

Berry has gotten so used to video calling that she doesn’t seem to get the “normal” voice call. Sure enough, she loves picking up the phone and holding it against her ear and saying “hello” and making pretend conversations (apparently she learnt this at her day care). But give her a phone and ask her to talk, and she goes quiet unless there’s another person appearing on screen.

Like there’s this one aunt of mine who is so tech-phobic that she doesn’t use video calls. And every time I call her she wants to hear Berry speak, except that Berry won’t speak because there is nobody on the screen! I’m now trying to figure out how to get this aunt to get comfortable with video calling just so that Berry can talk to her!


In that sense, Berry is a “video call” native. And I wouldn’t be surprised if it turns out that she’ll find it really hard to get comfortable with audio calls later on in life.

I’ll turn into one uncle now and say “kids nowadays… “

More issues with Slack

A long time back I’d written about how Slack in some ways was like the old DBabble messaging and discussion group platform, except for one small difference – Slack didn’t have threaded conversations which meant that it was only possible to hold one thread of thought in a channel, significantly limiting discussion.

Since then, Slack has introduced threaded conversations, but has done it in an atrocious manner. The same linear feed in each channel remains, but there’s now a way to reply to specific messages. However, even in this little implementation Slack has done worse than WhatsApp – by default, unless you check one little checkbox, your reply is only sent to the person who originally posted the message, and doesn’t get posted on the group.

And if you click the checkbox, the message is displayed in the feed, but in a rather ungainly manner. And threads are only one level deep (this was one reason I used to prefer LiveJournal over blogspot back in the day – comments could be nested in the former, allowing for significantly superior discussions).

Anyway, the point of this post is not about threads. It’s about another bug/feature of Slack which makes it an extremely difficult tool to use, especially for people like me.

The problem with Slack is that it nudges you towards sending shorter messages rather than longer ones. In fact, there’s no facility at all to send a long, well-constructed argument unless you keep hitting Shift+Enter every time you need a new line. There is an “insert text snippet” feature, but that lacks richness of any kind – bullet points, for example.

What this does is to force you to use Slack for quick messages only, or only share summaries. It’s possible that this is a design feature, intended to capture the lack of attention span of the “twitter generation”, but it makes it an incredibly hard platform to use to have real discussions.

And when Slack is the primary mode of communication in your company (some organisations have effectively done away with email for internal communications, preferring to put everything on Slack), there is no way at all to communicate nuance.

PS: It’s possible that the metric for someone at Slack is “number of messages sent”. And nudging users towards writing shorter messages can mean more messages are sent!

PS2: DBabble allowed for plenty of nuance, with plenty of space to write your messages and arguments.


Coin change problem with change – Dijkstra’s Algorithm

The coin change problem is a well studied problem in Computer Science, and is a popular example given for teaching students Dynamic Programming. The problem is simple – given an amount and a set of coins, what is the minimum number of coins that can be used to pay that amount?

So, for example, if we have coins for 1,2,5,10,20,50,100 (like we do now in India), the easiest way to pay Rs. 11 is by using two coins – 10 and 1. If you have to pay Rs. 16, you can break it up as 10+5+1 and pay it using three coins.

The problem with the traditional formulation of the coin change problem is that it doesn’t involve “change” – the payer is not allowed to take back coins from the payee. So, for example, if you’ve to pay Rs. 99, you need to use 6 coins (50+20+20+5+2+2). On the other hand, if change is allowed, Rs. 99 can be paid using just 2 coins – pay Rs. 100 and get back Re. 1.
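For reference, the classic no-change version is a short dynamic program. This is a minimal sketch of the standard formulation, building up the answer for each amount from smaller amounts:

```python
def min_coins(amount, coins):
    # dp[a] = minimum number of coins needed to pay exactly a
    # (no change allowed, so amounts only ever build upwards).
    INF = float("inf")
    dp = [0] + [INF] * amount
    for a in range(1, amount + 1):
        for c in coins:
            if c <= a and dp[a - c] + 1 < dp[a]:
                dp[a] = dp[a - c] + 1
    return dp[amount]

coins = [1, 2, 5, 10, 20, 50, 100]
print(min_coins(11, coins))  # 2  (10 + 1)
print(min_coins(99, coins))  # 6  (50 + 20 + 20 + 5 + 2 + 2)
```

The DP works precisely because each amount depends only on smaller amounts, which is the property that breaks once change is allowed.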

So how do you determine the way to pay using the fewest coins when change is allowed? In other words, what happens to the coin change problem when negative coins can be used? (Paying 100 and getting back 1 is the same as paying 100 and (-1).)

Unfortunately, dynamic programming doesn’t work in this case, since we can no longer process amounts in increasing order. For example, the optimal way to pay 9 rupees when negatives are allowed is to break it up as (+10, -1), so the solution for 9 depends on the solution for 10, and calculating from 0 upwards (as we do in the DP) doesn’t work.

For this reason, I’ve used an implementation of Dijkstra’s algorithm to determine the minimum number of coins to be used to pay any amount when cash back is allowed. Each amount is a node in the graph, with an edge between two amounts if the difference in amounts can be paid using a single coin. So there is an edge between 1 and 11 because the difference (10) can be paid using a single coin. Since cash back is allowed, the graph need not be directed.

So all we need to do to determine the way to pay each amount most optimally is to run Dijkstra’s algorithm starting from 0. The breadth-first search has complexity O(M^2 n), where M is the maximum amount we want to pay and n is the number of coins.
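My implementation is in R (linked below), but the idea translates to a short sketch in any language. Since every edge has weight 1 (each edge corresponds to one coin changing hands), Dijkstra here reduces to plain breadth-first search. One simplifying assumption in this sketch: intermediate amounts are capped at the target plus the largest coin, which suffices for typical coin systems but is not a general proof.

```python
from collections import deque

def min_coins_with_change(amount, coins):
    # Amounts are nodes; an edge joins x and y whenever |x - y| is a coin.
    # All edges have weight 1, so Dijkstra reduces to breadth-first search.
    # Intermediate amounts may overshoot the target (pay 100 to settle 99),
    # so we allow nodes up to amount + the largest coin (an assumption).
    limit = amount + max(coins)
    dist = {0: 0}
    queue = deque([0])
    while queue:
        x = queue.popleft()
        if x == amount:
            return dist[x]
        for c in coins:
            for y in (x + c, x - c):   # pay a coin, or receive one as change
                if 0 <= y <= limit and y not in dist:
                    dist[y] = dist[x] + 1
                    queue.append(y)
    return None  # unreachable with a 1-coin in the set

coins = [1, 2, 5, 10, 20, 50, 100]
print(min_coins_with_change(99, coins))  # 2  (pay 100, get 1 back)
```

For Rs. 99 the search finds the two-coin solution (100 paid, 1 returned) that the no-change DP cannot see.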

I’ve implemented this algorithm using R, and the code can be found here. I’ve also used the algorithm to compute the number of coins to be used to pay all numbers between 1 and 10000 under different scenarios, and the results of that can be found here.

You can feel free to use this algorithm or code or results in any of your work, but make sure you provide appropriate credit!

PS: I’ve used “coin” here in a generic sense, in that it can mean “note” as well.