Data Science is a Creative Profession

About a month or so back I had a long telephonic conversation with this guy who runs an offshored analytics/data science company in Bangalore. Like most other companies that are being built in the field of analytics, this one follows the software services model – a large team in an offshored location, providing long-term standardised data science solutions to a client in a different “geography”.

As is usual with conversations like this one, we talked about our respective areas of work and the kinds of projects we take on, and soon we got to the usual bit in such conversations where we were trying to “find synergies”. Things were going swimmingly when this guy remarked that it was the first time he was coming across a freelancer in this profession. “I’ve heard of freelance designers and writers, but never freelance data scientists or analytics professionals”, he mentioned.

On a separate occasion I was talking to one old friend about another old friend who has set up a one-man company to provide what are basically freelance consulting services. We reasoned that this guy had set up a company rather than calling himself a freelancer because of the reputation that “freelancers” (irrespective of the work they do) have – if you say you are a freelancer people think of someone smoking pot and working in a coffee shop on a Mac. If you say you are a partner or founder of a company, people imagine someone more corporate.

Now that the digression is out of the way let us get back to my conversation with the guy who runs the offshored shop. During the conversation I didn’t say much beyond things like “what is wrong with being a freelancer in this profession?”. But now that I think more about it, the case for freelancing is simply a function of the profession being a fundamentally creative one.

For a large number of people, data science is simply about statistics, or “machine learning” or predictive modelling – it is about being given a problem expressed in statistical terms and finding the best possible model and model parameters for it. It is about being given a statistical problem and finding a statistical solution. I’m not saying, of course, that statistical modelling is not creative – there is a fair bit of creativity involved in figuring out what kind of model to build, and picking the right model for the right data. But when you have a large team working on the problem, working effectively like an assembly line (with different people handling different parts of the solution), what you get is effectively an “assembly line solution”.

Coming back, let us look at this “a day in the life” post I wrote about a year back about a particular day in office for me. In it I’ve detailed the various kinds of problems I had to solve that day – from hidden Markov models and Bayesian probability to writing code using dynamic programming and implementing it in R, and then translating the solution back to the business context. Notice that when I started off working on the problem it was not known what domain the problem belonged in – it took some poking and prodding around to figure out the nature of the problem and the first step in the solution.

From then on, it was one step leading to another, and there are two important facts to consider about each step. Firstly, at each step, it wasn’t clear what the best class of technique was to get beyond that step – it took exploration to figure that out. Secondly, at no point was it known what the next step was going to be until the current step was solved. You can see that it is hard to do this in an assembly line fashion!

Now, you can talk about it being like a game of chess where you aren’t sure what the opponent will do, but in chess the opponent is a rational human being, while here the “opponent” is basically the data and the patterns it shows, and there is no way to know how the data will react to something until you try it. So it is impossible to list out all the steps beforehand and then solve the problem – the solution is an exploratory process.

And since solving a “data science problem” (as I define it, of course) is an exploratory, and thus creative, process, it is important to work in an atmosphere that fosters creativity and “thinking without thinking” (basically keeping a problem in the back of your mind, then taking your mind off it and distracting yourself so that the solution comes to you). This is best done away from a traditional corporate environment – where you have to attend meetings and are liable to be disturbed by colleagues at all times – and this is why a freelance model is actually ideal! A small partnership also works – while you might find it hard to “assembly line” the problem, having someone to bounce thoughts and ideas off can have a positive impact on the creative process. Anything more like a corporate structure and you are removing the conditions necessary to foster creativity, and are in such situations more likely to come up with cookie-cutter solutions.

So unless your business model deals with doing repeatable and continuous analytical work for a client, you are better off organising yourselves in an environment that fosters creativity and not a traditional office kind of structure if you want to solve problems using data science. Then again, your mileage might vary!

Switching Off

Since last night I’ve been terribly sick. I slept fitfully, if at all, all of last night, and I’ve been totally out of action all day today. It’s nothing particularly serious – just a bad attack of the common cold, and I expect it to take its normal course. Yet, through the day, as I’ve struggled to think, I’ve realized how hard it’s become for me of late to switch off.

When I tell people that I freelance and lead a “portfolio life”, the first question I usually get asked is if I can separate my work and non-work lives. This is especially important since my office is just a room inside my house. Usually I say that I do it pretty well. I have some strict rules, for example – I don’t work beyond 6:30 pm. I don’t work on weekends unless absolutely necessary (this includes Saturdays when my wife goes to work). For the last six months, I have used my iPad for reading, so that I don’t use my work computer for non-work purposes – so of late I don’t even switch on my work computer on weekends and holidays.

Yet, I think I have difficulty switching off, especially on an unplanned basis. I took a vacation in December, and didn’t carry my work with me (for the first time since turning freelancer I even put an Out of Office AutoReply into my email). Yet, when I got back ten days later it seemed like I hadn’t taken a break from work, and could actually continue from where I had left off before I went (this is a good thing).

I have no difficulty taking my mind off work on most weekends, and on holidays. Yesterday, for example, was a general holiday in Bangalore (on account of Makara Sankranti). I had no problem switching off. Yet, despite being terribly sick and unable to work today, it has been really hard.

The downside of a “portfolio life” is that at any point in time there is something pending. It is seldom that all your responsibilities close at the same time, and you can declare yourself to be “free” (which is why it is important to switch off in the evenings, on weekends, etc., and take the occasional vacation irrespective of whether the “work” is “finished”). So it is very rare that you get to your desk some day and realize there is “no work” – there may be no immediate deadlines, but there is always plenty to do.

In this context, today has been hard. I realize today that the common cold not only affects you physically but also mentally – it eats into your mindspace, and doesn’t allow you to think, which doesn’t allow you to work. And when you decide to declare a holiday for yourself and not work, things you do, such as the things you read, remind you of one aspect of work or the other – another downside of a portfolio life – too many non-work activities have a connection with work. And then you feel guilty about not working.

I think I need to figure out a policy of “casual leaves” for myself, where I tell myself that it is okay to not work on certain days, despite all that is there to be done. I’ve done it for myself for scheduled holidays – such as weekends or vacations. I need to convince myself to do this for the occasional unscheduled holiday, too – days like today.