The story dates back to 2007. Fully retrofitting, I was in what can be described as my first ever “data science job”. After having struggled for several months to string together a forecasting model in Java (the bugs kept multiplying and cascading), I’d given up and gone back to the familiarity of MS Excel and VBA (remember that this was just about a year after I’d finished my MBA).
My seat in the office was near a door that led to the balcony, where smokers would gather. People walking to the balcony, with some effort, could see my screen. No doubt most of them would’ve seen my spending 90% (or more) of my time on Google Talk (it’s ironical that I now largely use Google Chat for work). If someone came at an auspicious time, though, they would see me really working, which was using MS Excel.
I distinctly remember this one time this guy who shared my office cab walked up behind me. I had a full sheet of Excel data and was trying to make sense of it. He took one look at my screen and exclaimed, “oh, so many numbers! Must be very complicated!” (FWIW, he was a software engineer). I gave him a fairly dirty look, wondering what was complicated about a fairly simple dataset on Excel. He moved on, to the balcony. I moved on, with my analysis.
It is funny that, fifteen years down the line, I have built my career in data science. Yet, I just can’t make sense of large sets of numbers. If someone sends me a sheet full of numbers I can’t make out the head or tail of it. Maybe I’m a victim of my own obsessions, where I spend hours visualising data so I can make some sense of it – I just can’t understand matrices of numbers thrown together.
At the very least, I need the numbers formatted well (in an Excel context, using either the “,” or “%” formats), with all numbers in a single column right aligned and rounded off to the exact same number of decimal places (it annoys me that by default, Excel autocorrects “84.0” (for example) to “84” – that disturbs this formatting. Applying “,” fixes it, though). Sometimes I demand that conditional formatting be applied on the numbers, so I know which numbers stand out (again I have a strong preference for red-white-green (or green-white-red, depending upon whether the quantity is “good” or “bad”) formatting). I might even demand sparklines.
But send me a sheet full of numbers and without any of the above mentioned decorations, and I’m completely unable to make any sense or draw any insight out of it. I fully empathise now, with the guy who said “oh, so many numbers! must be very complicated!”
And I’m supposed to be a data scientist. In any case, I’d written a long time back about why data scientists ought to be good at Excel.