Most people who call themselves “data scientists” aren’t usually fond of MS Excel. It is slow and clunky, can only handle a million rows of data (and nearly crash your computer if you go anywhere close to that), and despite the best efforts of Visual Basic, is not very easy to program for doing repeatable tasks.
In fact, some data scientists may consider Excel to be “too downmarket” for them to use. At one firm I worked for, I had heard a rumour that using Excel for modelling was a fire-able offence, though I’m glad to report that I flouted this rule without much adverse effect. Yet, in my years as a “data science” and analytics consultant, and having done several modelling jobs before, I think Excel is an extremely necessary tool in a data scientist’s arsenal. There are several reasons for this.
The main one is communication. “Business types” love Excel – they use it for pretty much every official activity (I know of people who write documents in Excel). If you ask for a set of numbers, you are most likely to find it in an Excel sheet. I know of fairly large organisations which use Excel to store and transmit data (admittedly poor usage). And even non-quantitaive business types understand some of the basic quantitative functions thanks to Excel, such as joining (VLookup), pivoting, basic data cleaning (TRIM, VALUE, etc.), averaging, visualisation and sometimes even basic statistics such as correlation and regression.
One of the main problems that organisations face is lack of communication between data scientists and the business side (I mentioned this in a talk I gave last month: video here and slides here). Excel is an excellent middle ground, since it is reasonably quantitative and business people know how to use it.
In fact, in my consulting experience I’ve found that when working with clients, using Excel can make your client (usually a business person) feel more comfortable and involved in the analysis, speeding up the process and significantly improving collaboration. They’ll feel more empowered to intervene, which means they can add value, and they can feel especially happy if you occasionally let them enter some simple quantitative formulae.
The next advantage of Excel is that it puts the numbers out there. A long time back, when I was still doing full time jobs, I was asked to build a forecasting model (using a programming language) and couldn’t get it right for several months. And then on a whim I decided to use Excel, and when I saw the data in front of me, it was clear why the forecasts were so useless – because the data was so random.
Excel also allows you to quickly try things and iterate, again by putting the data and the analysis in front of you. Admittedly, the toolkit available is limited compared to what programming languages or statistical softwares can offer, but through clever usage (especially with Visual Basic), there is a lot you can achieve.
Then, Excel sometimes nudges you towards finding simple solutions. It is possible when you’re using a programming language to veer towards overly complicated solutions, and possibly use the proverbial nuclear weapon against the sparrow.
When I was working on the forecasting work a decade ago, I found that the forecasts would feed into a fairly complicated-looking model that had been developed over several years by several developers. On a whim, I decided to “do more” in Excel and managed to replicate the entire model in Excel (using VB and Solver). The people leading the product weren’t particularly happy, but using Excel was critical in ultimately moving to a simpler solution.
A similar thing occurred recently as well. I had been building a fairly complex optimisation model, which I tried replicating in Excel for communication purposes (so I could work on it together with the client). And it turned out there was a far simpler solution that I had missed all this time, and the simpler solution became apparent only because I used Excel.
I’m sure this is not an exhaustive list. So, if you’re a data scientist, you will do well to be at least conversant with Excel. I know it may only serve limited needs in terms of analysis, but the effort in learning will get more than compensated for in the communication and collaboration and simplicity.
A long time ago, a co-worker passed by my desk and saw me work on Excel. He saw my spreadsheet and remarked, “oh, so many numbers! it must be very complicated” and went on his way. I don’t know if he is a data scientist now.