clustering – Pertinent Observations

Dunzo and Urbanclap

I realise that Dunzo and Urbanclap (and many other apps) grew in a particular way. Initially they weren’t sure of the exact problem that they were solving, and instead focussed on a particular “problem class”.

And then over time, based on pattern recognition and segmentation/cluster analysis of the kind of problems that people were using these apps to solve, they started providing more targeted solutions that made better business sense.

Dunzo started off as a “we’ll do anything for you” app. People making fun of the company would talk about a Dunzo executive who would come home, collect your bean bag, get the beans refilled and bring it back to you, and only charge for the beans.

I’m pretty sure that there were many other such weird use cases in which people sort of abused Dunzo in its early days. However, most of the users of the app, I’m guessing, used it for sending packages across town, and to fetch stuff for them from shops and restaurants. And now, four years down the line, Dunzo highlights these specific streamlined use cases in the app, and has figured out a good way of charging for each of them.

It’s similar with Urbanclap. While I didn’t use them in the early days, I used their competitor HouseJoy. I used the app to request “a plumber”. A plumber duly arrived and did all sorts of odd jobs in our apartment building, some of which were dangerous. At the end we paid him in cash, and he told us, “if someone from the app calls, tell them you paid me only 200 rupees” (we had paid him 2000).

Soon, after being a marketplace for all sorts of odd jobs, Urbanclap and its ilk noticed patterns and started offering specific services. So last week we got someone from Urbanclap to “repair our water heater” (this had a fixed fee on the app). This is just one of a set of such specific services that Urbanclap now offers.

I may not have said much new in this post, but it’s basically a crystallisation of some of my thoughts of late – sometimes it’s okay to not have a particularly precise business plan as long as you know what problem class you’re tackling. If you manage to get funded and are willing to burn money, you can learn the best set of problems from the market (within your identified class).

It’s an expensive process for sure, since until you figure this out you’ll be spending a lot of time and money doing random shit, but if you and your investors are willing to bear this kind of expense, it might be worth it.

The worst thing that can happen to you, though, is that after you’ve burnt your company’s money in learning about the market’s precise problem statement, another well-capitalised firm moves faster than you to address this specific market. The question is how well you can put your learnings from the early period to use later on.

Segmentation and machine learning

For best results, use machine learning to do customer segmentation, but then get humans with domain knowledge to validate the segments.

There are two common ways in which people do customer segmentation. The “traditional” method is to manually define the axes through which the customers will get segmented, and then simply look through the data to find the characteristics and size of each segment.
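To make the “traditional” method concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the customer data is made up, and the two axes (orders per month and average order value) and their cutoffs are hand-picked, which is exactly the point of this approach.

```python
from collections import Counter

# Hypothetical customers: (orders per month, average order value).
customers = [
    (18, 2), (20, 3), (17, 2), (19, 1),
    (2, 15), (3, 18), (1, 16), (2, 17),
    (12, 12),
]

def manual_segment(orders, value, orders_cutoff=10, value_cutoff=10):
    """Place a customer on two hand-defined axes (the cutoffs are assumptions)."""
    frequency = "frequent" if orders >= orders_cutoff else "occasional"
    basket = "big-basket" if value >= value_cutoff else "small-basket"
    return f"{frequency}/{basket}"

# Size of each manually defined segment.
segment_sizes = Counter(manual_segment(o, v) for o, v in customers)

for segment, size in sorted(segment_sizes.items()):
    print(segment, size)
```

The segments are easy to explain because a human chose the axes, but they can only ever reflect what that human already believed about the customers.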

Then there is the “data science” way of doing it, which is to ignore all intuition, and simply use some method such as K-means clustering and “do gymnastics” with the data and find the clusters.

A quantitative extreme of this method is to do gymnastics with your data, get segments out of it, and quantitatively “take action” on it without really bothering to figure out what each cluster represents. Loosely speaking, this is how a lot of recommendation systems nowadays work – some algorithm somewhere finds people similar to you based on your behaviour, and recommends to you what they liked.

I usually prefer a sort of middle ground. I like to let the algorithms (k-means easily being my favourite) come up with the segments based on the data, and then have a bunch of humans look at the segments and make sense of them.

Basically whatever segments are thrown up by the algorithm need to be validated by human intuition. Getting counterintuitive clusters is also not a problem – on several occasions, people I’ve validated the clusters with (usually clients) have used the counterintuitive clusters to discover bugs, gaps in the data or patterns that they didn’t know of earlier.
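For readers who haven’t seen k-means, here is a minimal sketch of the standard algorithm (Lloyd’s iterations) in plain Python. The customer data and the two features are made up for illustration. Starting from the first k points is a simplification I’ve assumed to keep the example deterministic; real implementations usually start from random or k-means++ centres, and standardise the features first so that no single one dominates the distance.

```python
def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def nearest(point, centres):
    """Index of the centre closest to the point."""
    return min(range(len(centres)), key=lambda c: squared_distance(point, centres[c]))

def kmeans(points, k, max_iterations=100):
    """Lloyd's algorithm. Returns (centres, labels)."""
    # Simplified, deterministic start: the first k points (an assumption).
    centres = [tuple(float(x) for x in p) for p in points[:k]]
    for _ in range(max_iterations):
        # Assignment step: each point joins its nearest centre.
        labels = [nearest(p, centres) for p in points]
        # Update step: each centre moves to the mean of its members.
        new_centres = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centres.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centres.append(centres[c])  # leave an empty cluster where it is
        if new_centres == centres:
            break  # nothing moved, so we have converged
        centres = new_centres
    labels = [nearest(p, centres) for p in points]
    return centres, labels

# Hypothetical customers: (orders per month, average order value).
customers = [
    (18, 2), (20, 3), (17, 2), (19, 1),
    (2, 15), (3, 18), (1, 16), (2, 17),
]

centres, labels = kmeans(customers, k=2)
print(centres)
print(labels)
```

The algorithm only returns numbers: a centre per cluster and a label per customer. Deciding whether “frequent small orders” and “rare large orders” are meaningful segments is the part that still needs a human.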

Also, it is always useful to get people with domain knowledge to validate the clusters. This means that you need to be able to represent whatever clusters you’ve generated in a human-readable format. The best way of doing that is to use the cluster centres and then represent them somehow in a “physical” manner.

I started writing this post some three days ago and am only getting to finish it now. Unfortunately, in the meantime I’ve forgotten the exact motivation of why I started writing this. If I recall that, I’ll maybe do another post.
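One way of making cluster centres human-readable, sketched below in Python, is to describe each centre relative to the overall customer average. This is only one possible interpretation of a “physical” representation. The feature names, the centres, the overall averages and the 25% tolerance are all assumptions made for illustration, and the sketch assumes every overall average is positive.

```python
def describe_centre(centre, overall_mean, feature_names, tolerance=0.25):
    """Describe a cluster centre as high, low or typical on each feature,
    relative to the overall average (assumed to be positive)."""
    parts = []
    for name, value, mean in zip(feature_names, centre, overall_mean):
        ratio = value / mean
        if ratio > 1 + tolerance:
            level = "high"
        elif ratio < 1 - tolerance:
            level = "low"
        else:
            level = "typical"
        parts.append(f"{level} {name} ({value:g} vs {mean:g} overall)")
    return ", ".join(parts)

# Hypothetical inputs: two cluster centres and the overall customer averages.
feature_names = ["orders per month", "average order value"]
cluster_centres = [(2.0, 16.5), (18.5, 2.0)]
overall_mean = (10.25, 9.25)

descriptions = [
    describe_centre(centre, overall_mean, feature_names)
    for centre in cluster_centres
]

for number, text in enumerate(descriptions, start=1):
    print(f"Segment {number}: {text}")
```

A line such as “low orders per month, high average order value” is something a person with domain knowledge can react to, and either recognise or dispute.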

The missing middle in data science

Over a year back, when I had just moved to London and was job-hunting, I was getting frustrated by the fact that potential employers didn’t recognise my combination of skills of wrangling data and analysing businesses. A few saw me purely as a business guy, and most saw me purely as a data guy, trying to slot me into machine learning roles I was thoroughly unsuited for.

Around this time, I happened to mention this lack of fit to my wife, and she remarked that the reason companies want either pure business people or pure data people is that you can’t scale a business with people with a unique combination of skills. “There are possibly very few people with your combination of skills”, she had said, and hence companies had gotten around the problem by getting some very good business people and some very good data people, and hoping that they can add value together.

More recently, I was talking to her about some of the problems that she was dealing with at work, and recognised one of them as being similar to what I had solved for a client a few years ago. I quickly took her through the fundamentals of K-means clustering, and showed her how to implement it in R (and in the process, taught her the basics of R). As it had with my client many years ago, clustering did its magic, and the results were literally there to see, the business problem solved. My wife, however, was unimpressed. “This requires too much analytical work on my part”, she said, adding that “If I have to do this level of analytical work, I won’t have enough time to execute my managerial duties”.

This made me think about the (yet unanswered) question of who should be solving this kind of a problem – to take a business problem, recognise it can be solved using data, figure out the right technique to apply to it, and then communicate the results in a way that the business can easily understand. And this was a one-time problem, not something you would need to solve repeatedly, and so without the requirement to set up a pipeline and data engineering and IT infrastructure around it.

I admit this is just one data point (my wife), but based on observations from elsewhere, managers are usually loath to get their hands dirty with data, beyond perhaps doing some basic MS Excel work. Data science specialists, on the other hand, will find it hard to quickly get intuition for a one-time problem, get data in a “dirty” manner, apply the right technique to solve it, and communicate the results in a business-friendly manner. Moreover, data scientists are highly likely to be involved in regular repeatable activities, making it an organisational nightmare to “lease” them for such one-time efforts.

This is what I call the “missing middle problem” in data science: problems whose solutions will without doubt add value to the business, but which most businesses are unable to address because they lack the skillset to solve them, and whose one-time nature makes it difficult for businesses to dedicate permanent resources to them.

I guess so far this post has all the makings of a sales pitch, so let me turn it into one – this is precisely the kind of problem that my company Bespoke Data Insights is geared to solving. We specialise in solving problems that lie at the cusp of business and data. We provide end-to-end quantitative solutions for typically one-time business problems.

We come in, understand your business needs, and use a hypothesis-driven approach to model the problem in data terms. We select methods that in our opinion are best suited for the precise problem, not hesitating to build our own models if necessary (hence the Bespoke in the name). And finally, we synthesise the analysis in the form of recommendations that any business person can easily digest and act on.

So – if you’re facing a business problem where you think data might help, but don’t know how to proceed; or if you are curious about all this talk about AI and ML and data science and all that, and want to include it in your business; or you want your business managers to figure out how to use the data teams better, hire us.