Data Science and Software Engineering

I’m a data scientist. I’m good with numbers, and handling large and medium sized data sets (that doesn’t mean I’m bad at handling small data sets, of course). The work-related thing that gives me most kicks is to take a bunch of data and through a process of simple analysis, extract information out of it. To twist and turn the data, or to use management jargon “slice and dice”, and see things that aren’t visible to too many people. To formulate hypotheses, and use data to prove or disprove them. To represent data in simple but intuitive formats (i.e. graphs) so as to convey the information I want to convey.

I can count my last three jobs (including my current one) as being results of my quest to become better at data science and modeling. Unfortunately, none of these jobs have turned out particularly well (this includes my current one). The problem has been that in all these jobs, data science has been tightly coupled with software engineering, and I suck at software engineering.

Let me stop for a moment and tell you that I don’t mind programming. In fact, I love programming. I love writing code that makes my job easier, and automates things, and gives me data in formats that I desire. But I hate software engineering. Of writing code within a particular system, or framework. Or adhering to standards that someone else sets for “good code”. Of following processes and making my code usable by some dumbfuck somewhere else who wouldn’t get it if I wrote it the way I wanted. As I’d mentioned earlier, I like coding for myself. I don’t like coding for someone else. And so I suck at software engineering.

Now I wonder if it’s possible at all to decouple data science from software engineering. My instinct tells me that it should be possible. That I need not write production-level code in order to turn my data-based insights into commercially viable form. Unfortunately, in my search around the corporatosphere thus far, I haven’t been able to find something of the sort.

Which makes me wonder if I should create my own niche, rather than hoping for someone else to create it for me.

Coding

Back when I was in school (11th/12th) I think I was an awesome coder. I think I was especially good at what they called “logic coding”, i.e. coming up with algos. I used to experiment quite a bit (as much as was possible with TurboC) and had a lot of fun too. I remember doing graphics in TurboC, making a “pong” game, brick breaker, and a lot of other cool stuff. For our 12th standard project, Hareesh and I built this totally awesome cricket scoring program, which we unfortunately didn’t take forward (and went to college instead).

It was my love for coding that meant I fought with my parents (who wanted me to study Electrical) and decided to study Computer Science at IIT Madras. And then I lost it. Somewhere along the way. I didn’t enjoy coding any more. Soon, I began to hate coding. I would love coding when I would write the odd program in “pure” C, or when I would participate in contests such as BITWise. But I’d completely lost it.

So over the last six to seven years (after I graduated from IIT) there have been occasions when I have thought I’ve regained my coding mojo, only to lose it again very soon. I’m still very proud of that Excel+VBA model that I had written in the very first week of my third job. But a couple of months later, I was hating coding again. And so it was while debugging a complicated piece of code at work this morning that I realized why I have this love-hate relationship with coding.

It’s simple – basically I hate coding for others. I hate writing code that others will read or use. I don’t mind writing code that others would use as a black box, of course. But I think writing code that others will read or use puts too many constraints on the way you code. My instinct is always to stop doing something when I’m personally satisfied with it, and with code it seems like I’m satisfied sooner than others would be satisfied with my code.

At a fundamental level, I like coding and I think I’m pretty good at it, so it isn’t something I want to give up. But then the formal processes and endless testing involved with writing code for others really kill the joy (as do GUI and Java). Code saves a lot of time, and helps “studdize” what might be otherwise fighter work, so I like doing it.

In an ideal world, I would be writing code that I alone would be using, AND profiting from it (I never intend to sell code; I intend to sell the results of the said code, however; that would mean no one else would read/use my code per se, so I can write it the way I want). Hopefully I’ll get there, sometime.

Quiet Dissent

Sometimes when someone asks you to do something that you don’t want to do, your instinct would be to not do it. However, in certain situations it might unnecessarily offend the person who asked you to do it, and you may not really enjoy fighting with him/her. On the other hand, doing it would be like simply listening to what that person said, and effectively inviting him/her to run you over (figuratively) at the next opportunity, so you would want to avoid that.

In such situations, I follow this policy of quiet dissent. I do whatever has been asked of me, but make it a point to register my dissent. I ensure that the person who has asked me to do the job knows that I didn’t like doing it; and by doing the job, I also try to communicate that I don’t mean any disrespect to the person who asked me to do the job, but that I’m opposed to that particular idea (me doing that job) of his/hers.

You might find it strange that the usually firebrand me is espousing moderate ideas such as this one, but I think I’m just being pragmatic, and I’ve found this technique to be quite useful in dealing with people you don’t want to piss off – because it’s not profitable for you to piss them off. Of course there is the chance that that person may not understand the subtlety of the action, and might interpret your voice of dissent as disrespect to them. I think if anyone thinks like that, they deserve the disrespect.

You know, I have this condition

My memory cache (talking about my memory, not my computer’s or my laptop) seems to have suddenly diminished. My life seems to have become very Markovian. In fact, a few months back, I used to think that a Markovian existence is the best kind of existence, since in that kind of a situation, you respond to every situation on instinct, don’t make plans, are always on the lookout to optimize, etc. Now that I’ve actually reached close to that state, I don’t know if it’s desirable.

So basically my already weak short-term memory has become weaker. I’ve already talked about one paradox – I’ve traditionally had great long-term memory but awful short-term memory. I remember strange things, dates when those things happened, the colour of the shirt I was wearing when certain things happened, etc. And I typically can’t remember much of what someone told me recently, or what my mom asked me to buy at the market. The explanation I give myself for this is that I’m weak with details – and missing out on details is not as critical when you are talking about long-term stuff as it is in the short term.

Anyways what has been happening to me of late is that days seem very long. Towards the evening of most days, I really can’t remember what I did that morning. Ok, it’s not that bad – I can remember with some effort, but that effort is approximately equal to the effort required to remember what I did a year ago or some such. Once I get into doing a certain activity, I completely forget about everything I was doing prior to that particular activity – it all goes into memory, rather than staying in cache (like it used to earlier).

The most interesting (and scary) part of the deal is that my memory loss seems to be especially bad when it comes to numbers. This evening, I was out shopping for a computer table. I checked out stuff at some four shops, but as soon as I entered one shop, I completely forgot about the prices quoted in the previous shop. So I actually didn’t have a handle on comparative price. Tomorrow, I’ll mostly go buy that table which I liked best, and trust the shopkeeper to remember the price that I’ve bargained.

Considering that I’ve traditionally been a “numbers guy” and have a good eye for numbers, this is extremely scary. I just hope it’s some minor problem caused by something like lack of sleep (I sleep only 8 hours a day) or hunger (I eat at least 6 times a day) and not something more serious. For example, I use a prepaid mobile phone. And each time after a call or a message I see the balance, I don’t know how much I’ve spent because I don’t know the previous balance. I remember that the last time I used a prepaid phone (the same number; back when I was at IIMB) I would meticulously keep track of expenses.

While the condition lasts, I seem to be enjoying myself. Days seem so much longer, so I can relax so much more in the given time. I occasionally feel bored, but quickly find myself something to do, and get engrossed in it. I don’t get easily distracted like I used to. I don’t multitask (earlier I was a compulsive multitasker). I’m able to concentrate again, like I used to during the days of blindfold chess in the backbench. I don’t get worried. I don’t remember a thing from my previous jobs – though I’m sure I can pull it up from secondary memory if absolutely required.

But you know, I have this condition..

Process

A couple of days back, I was debugging some code. And yes, for those of you who didn’t know, coding is a part of my job. I used to have this theory that whatever job you take, there is some part of it that is going to be boring. Or to put it in the immortal words of a brilliant co-intern at JP Morgan “chootiya kaam”. And in my job, the chootiya part of the kaam is coding. That doesn’t mean that I’m not enjoying it, though. In fact, for the first time in nine years (note that this takes me to a time before I’d started my BTech in Computer Science) I’m enjoying coding.

Coming back, I was debugging my code yesterday. It was one of those impossible bugs. One of those cases where you had no clue why things were going wrong. So I started off by looking at the log files. All clean, and no bugs located. While I was going through them, I got this idea that the strategy sheet might offer some clue as to why things aren’t doing well. Half the problem got diagnosed when I was looking at the strategy sheet. Some problem with cash management.

And then I thought looking at the trades might help. The bug was presently discovered. And then it hit me – that I’d unwittingly followed what seems like a “process”. Everything that I did had been guided by insight and instinct. Yet, the steps that I followed – 1. look at the logs; 2. look at the strategy sheet; 3. look at the trades – seemed so much a preset process. It seemed to be like one of those things that I’d just ended up reading in some manual and following.

I realize that most “standard processes” that are followed by various people in various places are stuff that were initially not meant to be processes. They were just an imprint of someone’s train of insights. It was as if someone had a series of insights that led to a solution to what might have been a difficult problem. And then, he realized that this kind of a process could be followed to deal with all such similar problems. And so he wrote down the process in a book and taught a set of people to implement it. The field thus got “fighterized”.

The argument I’m trying to make here is that a large number of so-called “standard processes” are just an imprint of someone’s insight. They just happened to get into place because the inventor noticed this pattern in a bunch of things that he was doing. They need not be the best way of doing what is supposed to be done. Maybe there isn’t even a single best way of doing it that might work every time.

People who worked on processes later in their life cycle are likely to have been process-oriented themselves, and given how these kinds of people work, they would likely have resisted changes that could make the process worse in the short term. They are more likely to have been incremental in their approach. With a succession of such people working on improving the process, the process of refining the process would’ve ended up resembling a hill-climbing algorithm, and is likely to have ended up in a local maximum.

Once again, the large changes to the process would’ve happened when someone who was willing to take a large step backward worked on them, and it is again likely that such a person would be driven more by insight rather than by process.
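
To make the hill-climbing picture concrete, here is a minimal sketch in Python. Everything in it is made up for illustration: the `landscape` numbers stand in for how good a process is at each of eleven possible versions, and `quality` and `hill_climb` are hypothetical names, not anything from a real library. The incremental improver only ever moves to a better neighbouring version, and so stops at the first peak.

```python
# Hypothetical "process quality" for eleven versions of a process.
# There is a low peak at index 2 and a higher peak at index 8.
LANDSCAPE = [0, 2, 4, 3, 1, 0, 3, 6, 9, 5, 2]


def quality(x: int) -> int:
    """Quality of process version x (purely illustrative numbers)."""
    return LANDSCAPE[x]


def hill_climb(start: int) -> int:
    """Greedy hill climbing: move to the better neighbour until none is better."""
    x = start
    while True:
        neighbours = [c for c in (x - 1, x + 1) if 0 <= c < len(LANDSCAPE)]
        best = max(neighbours, key=quality)
        if quality(best) <= quality(x):
            return x  # no neighbour improves on x, so this is a (local) peak
        x = best


incremental = hill_climb(0)    # the purely incremental improver
after_step_back = hill_climb(5)  # someone who first accepted a much worse version

print(incremental, quality(incremental))
print(after_step_back, quality(after_step_back))
```

Starting from version 0, the incremental improver gets stuck at version 2. Someone willing to go all the way down to version 5, which is worse than where the process already was, climbs from there to the higher peak at version 8.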

My hypothesis is that most processes are likely to have been largely defined by people who are themselves not very process-oriented, and who thus will expect a certain level of insight and discretion on the part of the person implementing the process. And one needs to keep this in mind while following processes. That it would be good if one were to take a critical view of every process being used, and not be afraid to take a backward step or two in process development in order to achieve large-scale improvements.

On Large and Small Books

During my last binge at Landmark, I saw a book which I thought I’d like. It was priced at some six hundred rupees – a full fifty percent premium over what I’m usually willing to pay for a book – and was quite thick. My first thought was “ok on a pages-per-rupee basis, this seems to be doing quite well so I should buy it”. Then I had second thoughts.

The question is – should you look at the size of a book as an advantage or as a disadvantage? I think the normal viewpoint (as reflected by my instinct) treats pages as assets. There might be historical backing for this. When books were read for timepass, the amount of value (the time that could be passed) that could be gleaned from the book would be proportional to the number of pages in the book. If the language was difficult to read, then even better – for now it allows one to pass even more time reading the book.

However, when one comes to “funda books”, this argument fails spectacularly. When you read funda books, you don’t read to pass time. You read books in order to get fundaes. And once this happens, volume becomes not a benefit but a cost. When you are reading a book for the fundaes, then you are effectively paying two costs – one is the rupee cost of the book and the other is the time cost. The time that you spend reading the book now becomes a cost. And when time is a cost, then more pages need not be a benefit.
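
A rough back-of-the-envelope version of the two costs, as a small Python sketch. The six hundred rupee price and the four hundred rupee "usual" price come from above; the page counts, the reading speed and the rupee value of an hour are all assumptions I have picked only to illustrate the point, and `total_cost` is a hypothetical helper.

```python
def total_cost(price_rupees: float, pages: int,
               pages_per_hour: float = 40.0,
               rupees_per_hour: float = 200.0) -> float:
    """Rupee price plus reading time converted to rupees (both rates are assumed)."""
    reading_hours = pages / pages_per_hour
    return price_rupees + reading_hours * rupees_per_hour


# Two hypothetical books assumed to offer the same amount of fundaes.
thin = total_cost(price_rupees=400, pages=200)
thick = total_cost(price_rupees=600, pages=400)

print(thin, thick)
```

Under these assumed numbers the thick book looks better on a pages-per-rupee basis, but costs nearly twice as much once the reading time is counted.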

Unfortunately, when you are at the bookstore trying to make a decision about whether to buy a book, there is no way you can figure out how much of fundaes the book is likely to offer. It would have helped if you had read some reviews, which would have allowed you to make an informed decision. If you haven’t, then hard luck. Now, if you have no clue about that book that you have in your hand, and you need to make a decision on whether to buy it, then I won’t blame you for making your decision based on the thickness.

The unfortunate consequence of this is useless padding up of books. Authors and publishers know that a large section of the readers are likely to judge books based on their size. And they make things voluminous. They take 40 pages to tell stories that could’ve been written in 4. They end up saying the same thing time and again, just to increase the number of pages. And overall, end up boring the reader and lowering the net value added by their book.

So you have ideas which could have been communicated in a few blog posts developing into a book – after all, no one would be willing to pay the same amount of money for a 20 page book as they would for a 200 page book, right? Even if it were to offer a comparable amount of fundaes?

I don’t really know if there is a simple solution to this problem. Solving this would involve effecting a major shift in consumer behaviour. It is unlikely that blogging and online publication would become profitable, else we might have expected the disruption to come from there. Still, you can never say. All we can do is to wait and hope. And read reviews before choosing books.

PS: online purchase of books (via Amazon, etc.) might help mitigate this problem a little since you don’t really feel a book when you decide to buy it, and you have reviews available instantly. Nevertheless, I’m sure most buyers would be subconsciously using the “number of pages” field while making their purchase decision.

PS2: I should make my blog posts less verbose