About a decade ago, I used to make fun of information technology company that hired developers based on the language they coded in. My contention was that writing code is a skill that you either have or you don’t, and what a potential employer needs to look for is the ability to think algorithmically, and then render ideas in code.
While I’ve never worked as a software engineer I find myself writing more and more code over the years as a part of doing data analysis. The primary tool I use is R, where coding doesn’t really feel like coding, since it is a rather high level language. However, I’m occasionally asked to show code in Python, since some clients are more proficient in that, and the one thing that has done is to teach me the value of domain knowledge of a programming language.
I take this opportunity to apologise for my prior belief that all that matters is thinking algorithmically, and language in which the ideas are expressed doesn’t matter.
This is because the language you usually program in subtly nudges you towards thinking in a particular way. Having mostly used R over the last decade, I think in terms of tables and data frames, and after having learnt tidyverse earlier this year, my way of thinking algorithmically has become in a weird way “object oriented” (no, this has nothing to do with classes). I take an “object” (a data frame) and then manipulate it in various ways, changing it, summarising stuff, calculating things on the fly and aggregating, until the point where the result comes out in an elegant manner.
And while Pandas allows chaining (in fact, it is from Pandas that I suspect the tidyverse guys got the idea for the “%>%” chaining operator), it is by no means as complete in its treatment of chaining as R, and that that makes things tricky.
Moreover, being proficient in R makes you think in terms of vectorised operations, and when you see that python doesn’t necessarily offer that, and and operations that were once simple in R are now rather complicated in Python, using list comprehension and what not.
Putting it another way, thinking algorithmically in the framework offered by one programming language makes it rather stressful to express these thoughts in another language where the way of algorithmic thinking is rather different.
For example, I’ve never got the point of the index in pandas dataframes, and I only find myself “resetting” it constantly so that my way of addressing isn’t mangled. Compared to the intuitive syntax in R, which is first and foremost a data analysis tool, and where the data frame is “native”, the programming language approach of python with its locs and ilocs is again irritating.
I can go on…
And I’m guessing this feeling is mutual – someone used to doing things the python way would find R’s syntax and way of doing things rather irritating. R’s machine learning toolkit for example is nowhere as easy as scikit learn is in python (this doesn’t affect me since I seldom need to use machine learning. For example, I use regression less than 5% of the time in my work).
The next time I see a job opening for a “java developer” I will not laugh like I used to ten years ago. I know that this posting is looking for a developer who can not only think algorithmically, but also algorithmically in the way that is most convenient to express in Java. And unlearning one way of algorithmic thinking and learning another isn’t particularly easy.