How Python swallowed R

A week ago, I put a post on LinkedIn saying if someone else working in analytics / data science primarily uses R for their work, I would like to chat.

I got two responses, one of which was from a guy who strictly isn’t in analytics / data science, but needs to analyse large amounts of data for his work. I had a long chat with the other guy today.

Yesterday I put the same post on Twitter, and have got a few more responses from there. However, the skew is staggering: an overwhelming majority of the data people I know work in Python. One of the reasons I put up these posts was to assure myself that I’m not alone in using R, though the response so far hasn’t given me much assurance.

So why do most companies end up using Python for analytics, even when R is clearly better for things like data wrangling, reporting, visualisation, dashboarding, etc.? I have a few theories on this, and I think all of them come together to result in Python having its “overwhelming marketshare” (at least among people I know).

Tech people clearly prefer Python since it’s easier to integrate. So the tech leaders request the data science leaders to use Python, since it is much easier for the tech people. In a lot of organisations, data science reports into tech, so this request is honoured.

Even if it isn’t, if you recall, “data scientists” are generally tech facing rather than business facing. This means that the models they build need to be codified, and added to the company’s code base. This means necessarily working together with tech, and this means using a programming language that tech is comfortable with.

Then, this spills over. Usually, someone has the bright idea that the firm shouldn’t use two languages for what is essentially the same thing. And so the analytics people are also forced to use Python for their analytics, even if it isn’t built for the purpose. And then it spreads.

Next is the “cool factor”. There is this impression that the more technical a solution is, the more superior it is, even if it has no direct business impact (an employer had once told me, “I have raised money saying we are using machine learning. If our investors see the algorithms you’re proposing, they’ll want their money back”).

So a youngster getting into data wants to do “all the latest stuff”. This means machine learning. Deep learning. Reinforcement learning. And all that. There is an impression that this kind of work is “better work” compared to, let’s say, generating business insights using data. And in general, the packages for machine learning have traditionally been easier to use in Python than they are in R (though R is fast catching up, and in general Python is far behind R when it comes to user friendliness).

Then, the growth in data and jobs associated with it, such as machine learning or data engineering, has meant that a lot of formerly tech people have got into data work. Python is fundamentally a programming language, with a package (pandas) added on to do data work. Techies find it far more intuitive than R, which is fundamentally statistical software. On the other hand, people who are coming from a business / Excel background find it far more comfortable to use R. Python can be intimidating (I fall in this bucket).

So yeah – the tech integration, the number of tech people who are coming into data and the “cool factor” associated with the more techie stuff means that Python is gaining, at R’s expense (in my circle at least).

In any case I’m going to continue to use R. I’m at least 10X faster in R than I am in Python, and having used R for 12 years now, I’m too used to that way of working to change things up.

Python and Hindi

So I’ve recently discovered that using Python to analyse data is, to me, like talking in Hindi. Let me explain.

Back in 2008-9 I lived in Delhi, where the only language spoken was Hindi. Now, while I’ve learnt Hindi formally in school (I got 90 out of 100 in my 10th boards!), and watched plenty of Hindi movies, I’ve never been particularly fluent in the language.

The basic problem is that I don’t know the language well enough to think in it. So when I’m talking Hindi, I usually think in Kannada and then translate my thoughts. This means my speech is slow – even Atal Behari Vajpayee can speak Hindi faster than me.

More importantly, thinking in Kannada and translating means that I can get several idioms wrong (can’t think of particular examples now). And I end up using the language in ways that native speakers don’t (again can’t think of examples here).

I recently realised it’s the same with programming languages. For some 7 years now I’ve mostly used R for data analysis, and have grown super comfortable with it. However, at work nowadays I’m required to use Python for my analysis, to ensure consistency with the rest of the firm.

While I’ve grown reasonably comfortable with using Python over the last few months, I realise that I have the same Hindi problem. I simply can’t think in Python. Any analysis I need to do, I think about it in R terms, and then mentally translate the code before performing it in Python.

This results in several inefficiencies. Firstly, the two languages are constructed differently and optimised for different things. When I think in one language and mentally translate the code to the other, I’m exploiting the efficiencies of the thinking language rather than the efficiencies of the coding language.

Then, the translation process itself can be ugly. What might be one line of code in R can sometimes take 15 lines in Python (and vice versa). So I end up writing insanely verbose code that is hard to read.

Such code also looks ugly – a “native user” of the language finds it rather funnily written, and will find it hard to read.
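To make the “translated code” problem concrete, here is a minimal sketch. It is not code from my actual work: the data and the function names are invented for illustration, and it uses only the Python standard library. Both functions compute the same thing (the average order value per city), but the first is written the way one might write it while mentally translating from another language, and the second is closer to what a “native” Python user might write.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical data: (city, order value) pairs, invented for illustration.
orders = [("Bangalore", 120), ("Delhi", 80), ("Bangalore", 60), ("Delhi", 100)]


def mean_by_city_translated(rows):
    """Index-driven version: correct, but verbose and hard to read."""
    cities = []
    totals = []
    counts = []
    for i in range(len(rows)):
        city = rows[i][0]
        value = rows[i][1]
        if city not in cities:
            cities.append(city)
            totals.append(0)
            counts.append(0)
        j = cities.index(city)
        totals[j] = totals[j] + value
        counts[j] = counts[j] + 1
    result = {}
    for j in range(len(cities)):
        result[cities[j]] = totals[j] / counts[j]
    return result


def mean_by_city_native(rows):
    """Same computation, leaning on tuple unpacking and the standard library."""
    groups = defaultdict(list)
    for city, value in rows:
        groups[city].append(value)
    return {city: mean(values) for city, values in groups.items()}


# Both versions agree on the result; only the readability differs.
print(mean_by_city_translated(orders))
print(mean_by_city_native(orders))
```

The point is not that the first version is wrong. It gives the same answer, but it fights the language instead of using it.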

A decade ago, after a year of struggling in Delhi, I packed my bags and moved back to Bangalore, where I could both think and speak in Kannada. Wonder what this implies in a programming context!

Days of the week in Bahasa

Most languages name their days of the week after a single source, and this is usually consistent across languages. For example, the original Latin names for the days of the week came from “planets” – Sun, Moon, Mars, Mercury, Jupiter, Venus and Saturn respectively. And this got copied into various languages.

So the days of the week as we know in English are derived from the names of these planets or Gods representing them (Thor giving Thursday and so on). Indian names for the days of the week are direct translations of the Latin names. And some days have multiple names in Indian languages, all of which mean the same thing.

So you have Ravivara and Bhaanuvara and Adityavara, all of which refer to Sunday, and all of which precisely translate to “Sun day”. The more formal name for Thursday is “bRhaspativara” but it is more commonly referred to as “Guruvara”, with “Guru” being the more common name for bRhaspati. And so forth.

Based on this background, I found the names of the days of the week in Bahasa Indonesia, which I observed from signboards (Bahasa uses the Roman script, so one level of Rosetta stoning can happen from signboards), rather interesting.

The names are (starting with Sunday):
Minggu
Senin
Selasa
Rabu
Kamis
Jumat
Sabtu

Ok I got that from this link as I was writing, but what I got from signboards yesterday was the names of Friday, Saturday and Sunday (Jumat, Sabtu and Minggu respectively). And I found it fascinating since it seems like they come from multiple sources.

So Jumat, it appears, is the day of prayer, or Juma. Considering that Indonesia is a Muslim-majority country (it’s not funny how empty restaurants are during lunch nowadays, since it’s Ramzan), naming Friday as “the day of prayer”, using the Muslim word for prayer, is absolutely logical.

Sabtu for Saturday is obviously derived from “Sabbath” – another day of prayer but for a different religion (Judaism). It looks like it’s derived from European names for Saturday – Saturday in Spanish is Sabado, for instance. So actually, in this case we are seeing a wider adoption of naming the day of week after its religious significance than the associated planet.

And Minggu, it appears, is diminutive for Domingo, the Spanish and Portuguese word for Sunday (and perhaps there are similar names in other European languages). And it appears that “Domingo” has nothing to do with the Sun, but instead is derived from Latin for “God’s day” (since Sunday is the day of the Christian God, who famously took rest on that day).

So it’s interesting that Bahasa has names for three days of the week which are not based on the planets, but on different versions of “God’s day”, with multiple origins among them! Or rather, that Bahasa has three “God’s day”s, with each referring to a different god.

I’m reminded of this store that existed a long time back close to where I currently live. It was called “yellAdEvarakRpe stores” (store with the grace of all gods).

Metric

[Image: dinner at Metric, with the burger in the foreground]

This picture was taken at a restaurant called metric, where we went for dinner tonight. It’s located on the diagonal, an arterial road in Barcelona.

So we were walking, trying to find a place to have dinner. Pinky had a few options in her head but wouldn’t tell me. We passed a number of restaurants, all of which looked decent but not particularly spectacular, and I would wonder if she would take me into one of those. She didn’t.

And then we passed in front of metric. Even before she had indicated that this was part of her shortlist, I was walking inside. I couldn’t do much more though, since I don’t speak the language here.

Some restaurants beckon to you just by the way they look. This one was brightly lit, done up in quirky furniture (we sat at an ordinary table but there were others where you had swings instead of chairs!!), with a great looking bar and the place was full. I didn’t care what kind of food they served, all the Tyler Cowen-esque economic reasoning I’ve been invoking before every single meal on this trip went out of the window, and I just walked in.

When traveling abroad, especially when in a country where they don’t normally speak English, it really helps to have someone around who speaks the local language and who can help you get around. Most times when I’ve been out by myself, apart from the times when I’ve been around touristy areas, I’ve been rather lost. I have no clue of Spanish, except for the odd word, and I’ve struggled.

I once had to go to the post office and get my mobile sim registered (someone told me that was the procedure). I got there, approached the counter gingerly and before I knew it the lady assumed I was there to receive a package from lycamobile!! After a few more minutes of futile attempts at conversation I moved on, defeated.

Given how awful I am at getting languages – I’m usually not bad with words but can never get grammar (and even today get confused between Telugu and Tamil because I learnt to understand the two languages simultaneously) – it’s a marvel how Pinky has picked up enough Spanish to get around, and even get complimented (by the waitress at metric) as to how good her Spanish is. She negotiated with the waitress about the menu, got the drinks menu “orally delivered” and translated it to enable me to make my choice (the passion fruit mojito was wonderful, btw) and even carried out some gossip with the waitress, as I looked on clueless, wondering how one can even learn a new language (I haven’t learnt one fluently ever since I was three).

Coming back to the restaurant, there’s something about places that have a very limited menu. It is generally an indicator that there are a few things they are good at, and that they like to stick to their area of core competency rather than experimenting around. A limited menu also means easier inventory management and the restaurant is likely to have fresh ingredients. While a large menu might be useful in terms of offering variety, it more often than not comes at the cost of quality and reliability.

What you see in the front of the picture above is my burger. That’s how it arrived, and delicious though it was, I had no clue as to how to eat it. The lack of a covering bun meant I couldn’t pick it up and bite it. The side of bread at the bottom meant I couldn’t cut it with my knife! After a few minutes of fumbling (which included dropping a part of the patty on my jeans), I gave up and just separated the patty from the bread, eating the former with knife and fork and the latter with my hands! It’s anyway not like I’m the type who cares what people think about me!!

Though I can’t rule out a stray thought in Pinky’s head on how she’s getting herself an international MBA and learning Spanish and becoming pseud and I’m still the same guy living in Bangalore!!

Tail piece: these Europeans take the metric system way beyond where Indians do. Nutritional information on food packages is in kilojoules, for example!!

Romantic Comedies in Hollywood and Bollywood

Assumption: The median age for marriage in urban India is much lower than the median age of marriage in urban United States of America

Hence, romantic comedies in Hollywood usually end up having characters who are older than those in corresponding comedies made by Bollywood. Thus, Hollywood romantic comedies can be made to be more mature than corresponding Bollywood romantic comedies.

Data point: Serendipity was remade as “Milenge Milenge”. I was watching the latter movie a few days back (couldn’t sit through more than five minutes of it, as I kept comparing each scene to the corresponding scene in the original). In Serendipity the protagonists are around 35, and thus show a maturity that corresponds to that age. You can see that in the way they behave, go about things, etc. And here, in Milenge Milenge you have Shahid Kapur and Kareena Kapoor singing and prancing around like Jackasses. You can’t watch too much of that, can you?

Tailpiece: My all time favourite romantic comedy (across languages) remains Ganeshana Maduve, starring Anant Nag and Vinaya Prasad. I’ll talk about the virtues of the movie in another post but I can’t think of any other movie that even comes close to this one. Meanwhile, if you haven’t watched this movie, get hold of a subtitled copy of it and watch it. Now.

Bangalore Book Festival

So today I made my way to Gayatri Vihar in the Palace Grounds to visit the Bangalore Book Festival, on its last day. It was interesting, though a bit crowded (what would you expect on the last day of an exhibition? and that too, when it’s a Sunday?). I didn’t buy much (just picked up two books) given the massive unread pile that lies at home. However, there was much scope for pertinent observations. Like I always do when I have a large number of unrelated pertinent observations, I’ll write this in bullet point form.

  • There were some 200 stalls. Actually, there might have been more. I didn’t keep count, despite the stalls having been numbered. Yeah, you can say that I wasn’t very observant.
  • All the major bookshops in Bangalore barring the multicity ones had set up shop there. I don’t really know what they were doing there. Or were they just trying to capture the market that only buys in fairs? Or did they set up stall there just to advertise themselves?
  • It seems like a lot of shops were trying to use the fair to get rid of inventory they wanted to discard. All they had to do was to stack all of this on one table and put a common price tag (say Rs. 50) on every book in that collection, and it was enough to draw insane crowds
  • One interesting stall at the fair had been set up by pothi.com, an online self-publishing company. I’ll probably check them out sometime next year when I might want to publish a blook. Seems like an interesting business model they’ve got. Print on demand!
  • I also met the flipkart.com guys at the fair. Once again, they were there for advertising themselves. Need to check them out sometime. Given the kind of books I buy, I think online is the best place to get long tail stuff.
  • There was an incredibly large number of Islamic publishing houses at the fair! And have you guys seen the “want qur an? call 98xxxxxxxx for free copy” hoardings all over the city? Wonder why the Bajrang Dal doesn’t target those
  • There was a large vernacular presence at the fair. I remember reading in the papers that there was a quota for Kannada publishers, but there was reasonable presence for other languages also, like Gult, Tam, Mellu, Hindi
  • A large number of stalls were ideology driven. Publishing houses attached to cults had set up stalls, probably to further the cause of their own cult. So there was an ISKCON stall, a Ramakrishna Mutt stall, a Ramana Maharshi stall, etc.
  • Attendance at most of these niche stalls was quite thin, as people mostly crowded the stalls being run by bookstores in order to hunt for bargains. Attendance was also mostly thin at publisher-run stalls, making me wonder why most of these people had bothered to come to the fair at all.
  • I saw one awesomely funny banner at the place. It was by “Dr Partha Bagchi, the world leader in stammering for last 20 years” or some such thing. Was too lazy to pull out my phone and click pic. But it was a masterpiece of a banner
  • Another interesting ideological publisher there was “Leftword books”. Their two sales reps were in kurtas and carrying jholas (ok I made the latter part up). And they were selling all sorts of left-wing books. Wonder who funds them! And they were also selling posters of Che for 10 bucks each
  • I wonder what impact this fair will have on bookstores in Bangalore in the next few days. Or probably it was mostly the non-regular book buyers who did business at the fair and so the regulars will be back at their favourite shops tomorrow.

I bought two books. Vedam Jaishankar’s Casting A Spell: A history of Karnataka cricket (I got it at Rs. 200, as opposed to a list price of Rs 500) and Ravi Vasudevan’s “Making Meaning in Indian Cinema”.

The Perils of Notes Dictation

Thinking about my history lessons in schools, one picture comes to mind readily. A dark Mallu lady (she taught us history in the formative years between 6th and 8th) looking down at her set of voluminous notes and dictating. And all of us furiously writing so as to not miss a word of what she said. For forty minutes this exercise would continue, and then the bell would ring. Hands weary with all the writing, we would put our notebooks in our bags and look forward to a hopefully less strenuous next “perriod”.

The impact of this kind of “teaching” on schoolchildren’s attitude towards history, and their collective flocking to science in 11th standard is obvious. There are so many things that are so obviously wrong with this mode of “teaching”. I suppose I’ll save that for elsewhere. Right now, I’m trying to talk about the perils of note-making in itself.

Before sixth standard and history, in almost all courses we would be dictated “questions and answers”. The questions that would appear in the exam would typically be a subset of these Q&A dictated in class. In fact, I remember that some of the more enthu teachers would write out the stuff on the board rather than just dictating. I’m still amazed how I used to fairly consistently top the class in those days of “database query” exams.

I’m thinking about this from the point of view of impact on language. Most people who taught me English in that school had fairly good command over the language, and could be trusted to teach us good English. However, I’m not sure if I can say the same about the quality of language of other teachers. All of them were conversant in English, yes, and my school was fairly strict about being “English-medium”. However, the quality of English, especially in terms of grammar and pronunciation, of a fair number of teachers left a lot to be desired.

I can still remember the odd image of me thinking “this is obviously grammatically incorrect” and then proceeding to jot down what the teacher said “in my own words”. I’m sure there were other classmates who did the same. However, I’m also sure that a large number of people in the class just accepted what the teacher said to be right, in terms of language that is.

What this process of “dictation of notes” did was that teachers with horrible accents, grammar, pronunciation or all of the above passed on their bad language skills to the unsuspecting students. All the possible good work that English teachers had done was undone. There is a chance that this bad pronunciation, grammar, etc. would have been passed on even if the teachers didn’t give notes – for the students would just blindly imitate what the teachers would say. However, the amount by which they copied different teachers would not then be weighted by the amount of notes that each teacher dictated, and I think a case can be made that the quality of a teacher is inversely proportional to the amount of notes he/she dictates.

Teachers will not change because dictation is the way that they have been taught to “teach”. The onus needs to go to schools to make sure that the teachers don’t pass on their annoying language habits to the students. And a good place to start would be to stop them from dictating notes. And I still don’t understand the value of writing down notes that you don’t really bother to understand when you have a number of reasonably good text books and guide books available in the market. I agree that for earlier classes, some amount of note-making might be necessary (I think even that can be dispensed with), but in that case the school needs to be more careful regarding the language skills of people it recruits in order to dictate these notes.

Mantras: Songs Fooled By Randomness?

A couple of weeks back, I happened to read Frits Staal’s Discovering the Vedas. I was initially skeptical of the book since it has been blurbed by Romila Thapar, thinking it might be some commie propaganda, but those fears were laid to rest after I read Staal’s interpretation of the so-called “Aryan Invasion Theory” and found it quite logical. I enjoyed the first half of the book, and then lost him. I couldn’t understand anything at all in the second half of the book.

The precise moment where I lost interest in the book was when Staal gave his theory as to why mantras and rituals have no meaning. I found his reasoning of the same quite weak, and since he kept referring back to that later in the book, it became tough to follow. Staal states the following three reasons to claim that mantras precede language, and they are more like bird calls.

  • Mantras are language independent: Anything in language can be translated whereas mantras remain the same in all languages.
  • Mantras, even though they seem to be in a language like Sanskrit, are not used for their meaning.
  • Mantras follow patterns, like refrain, which is not seen in language.

While I find the hypothesis interesting, the proof that Staal gives is hopelessly inadequate. The Beatles might have translated their songs into German, but songs are normally not translated, right? You don’t translate songs, and sing them to the same tune, unless you are doing some MTV Fully Faltoo or some such thing. On the other hand, what if the songs are in a language that is completely alien to you? There is no way you can translate them, but since you like them you sing them anyway. Without bothering to know their meaning. And songs can definitely have refrain, right? It clearly seems like Staal is trying to force-fit something here. Hopefully he is force-fitting this here so as to prove some other theory of his. But you can never say.

As I had expected, Staal’s theory has caught the attention of the right-wing blogosphere. JK at Varnam writes

This athirathram, which was extensively covered in Malayalam newspapers, was highly respectful and the words I heard were not “playful” or “pleasurable.” I can understand singing for pleasure, but am yet to meet a priest who said, “it’s a weekend and raining outside, let’s do a ganapati homam for pleasure.”

Sandeep at sandeepweb goes one step further, and says:

Even a Hindu not well-versed with the nuances of Mantra intuitively senses that something “divine” or “other-worldly” is associated with every Mantra. In a very crude sense, a Mantra is to some people, a cost-benefit equation: you chant the Gayatri Mantra for spiritual upliftment, the Maha Mrityunajaya to ward off the fear of death, the Surya Mantras for health, and so on. Why, you chant just the “primordial sound(sic),” “OM” to get yet another benefit. Whether these benefits really accrue or or not is not the point. What is immediately discernible is that every mantra is associated with some God or principle. In other words, it has a very specific meaning.

I think mantras are simply songs, in an ancient language, fooled by randomness. As I had explained before I quoted JK and Sandeep, going by Staal’s hypothesis, and the precise reasons that he gives, it is not inconceivable that mantras were composed as songs, in a language that hasn’t survived. In fact, Staal’s “proof” can better explain the song hypothesis rather than a no-language hypothesis. I don’t know why those songs were composed, and I definitely won’t rule out the possibility that they were meant to be devotional (after all, a large amount of later Indian music (including all of Carnatic music) is fundamentally devotional). Anyways the exact reasons for composition may not matter.

So what might have happened is this. I suppose chanting of mantras and conducting rituals was a fairly common event in the Vedic age. I believe that we started off with a much larger repository of mantras and rituals compared to what survive today. And the ones that survive are the ones that were lucky enough to have been associated with certain good events. A chieftain happened to do a certain ritual before going to battle, which he happened to win. And this ritual came to become the “pre-war” ritual. Of course it wouldn’t have been one single event that would have established this as “the” pre-war ritual, but after a couple of “successive trials”, this would have become the definitive pre-war ritual.

Once a particular ritual or mantra got associated with a particular event, then reinforcement bias kicked in. Since it was now “established”, any adverse results were seen as being “in spite of”. Suppose a king dutifully did the pre-war ritual before he got thrashed in battle, people would say “poor guy. in spite of religiously doing his rituals he has lost”. The establishment meant that no one would question the supposed effectiveness of the ritual. And so forth for other mantras and rituals.

To summarize, we started off with a significantly larger number of mantras than we have today. Association of certain mantras with certain “good events” meant two things. One, they got instantly associated with such good events, and two, they got preference in propagation – limited bandwidth of oral tradition meant only a certain number could be passed on sustainably, and these “lucky mantras” (notice the pun – they brought luck, and they survived) became the “chosen ones”.
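The “fooled by randomness” part of this argument can be sketched as a small simulation. Everything in it is an assumption made purely for illustration: the number of mantras, the number of trials, and the 50% chance of a good outcome are invented, and the model deliberately gives the mantras no effect whatsoever on the outcome. A mantra “survives” only if every trial it was associated with happened to end well.

```python
import random


def simulate_lucky_mantras(n_mantras=1000, n_trials=5, p_win=0.5, seed=42):
    """Return the ids of mantras whose every trial ended well, purely by chance.

    All parameters are hypothetical. Outcomes are independent coin flips,
    so no mantra has any real influence on whether a battle is won.
    """
    rng = random.Random(seed)
    survivors = []
    for mantra_id in range(n_mantras):
        # Each trial is a battle preceded by this mantra; True means a win.
        outcomes = [rng.random() < p_win for _ in range(n_trials)]
        if all(outcomes):
            survivors.append(mantra_id)
    return survivors


survivors = simulate_lucky_mantras()
# With these numbers we would expect about 1000 * 0.5**5 = 31.25 survivors.
expected = 1000 * 0.5 ** 5
print(f"{len(survivors)} mantras have a perfect record (expected about {expected})")
```

Even though nothing in this toy model works, a few dozen mantras out of a thousand end up with a spotless record. Someone looking only at the survivors would see rituals that “always” preceded victory.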

The sad part in the whole deal is that mantras were taught without explaining the meaning (similarly with rituals). Maybe the oral tradition didn’t permit too much bandwidth, and in their quest to learn the maximum number of mantras possible, people gave short shrift to the meanings. And by the time writing was established, the language had changed and the meaning of the mantras was lost forever. In fact, this practice of mugging up mantras also gets reflected in the way education happens in India today, with an emphasis on knowledge rather than understanding. I suppose I’ll cover that in a separate blog post.