Jordan “visa interview on arrival”

The peak-end rule means that we’ve come back from our trip to Jordan really happy. It was a brilliant and diverse experience, involving Roman History (Jerash, Amman Citadel), Christian Theology (Mount Nebo, Madaba), hill climbing (at Petra – more on that later), wilderness (Wadi Rum) and a resort and floating on water (Dead Sea).

However, preceding all this was an absolutely atrocious “process” that we had to go through at the Amman airport. I waited to return to India to write this.

Nominally Jordan has “visa on arrival” for Indians. This means you don’t need to get a visa before you travel. However, what they don’t really tell you is that it doesn’t work the same way as visas on arrival in other countries – such as Hong Kong or Thailand or Maldives (based on my limited experience), where you enter the passport control, get your passport stamped, maybe pay a fee and move on.

In Jordan that’s not the way it works. We had pre-bought a “Jordan Pass” that includes fees for the visa and to some of the historic attractions in the country. Upon landing at Amman Airport, we encountered a line saying “for jordan pass / visa on arrival”. And that’s where the arbitrariness started.

Firstly, it is the “border police” who man this, unlike in India where it’s bureaucrats from the external affairs ministry. More importantly, there is no “process”. You go to the window, where the person there leafs through the passport looking for active visas – if you have a valid US or UK or Schengen or even Saudi visa, your visa gets printed on a paper and you get waved on to passport control. In the absence of any of these, you are asked to “wait there”, without any further direction.

Then we were asked to go to “police in room 1”, which was some 200m away. This is where we had our first cultural shock of the trip – the room reeked of cigarettes, and we entered to see cops smoking even as they talked to us.

The same process repeated – the cops leafing through the passport to see if there were any other valid visas, and then, not finding anything, asking us to “wait”. Again there was no definite timeline or process. We waited for a bit (during which the cops did namaz, and presumably stopped smoking while doing so), and then went in again and asked. Again we were asked to “wait”.

The cops all had identical uniforms so it was impossible for us to know who was “superior” or to escalate. After a few rounds of such waiting, my wife finally put senti saying we had a small child who was hungry (thankfully our daughter managed to produce a reasonably sad face at that time, though she was unable to cry), and finally they started considering our application.

We had printed out all our hotel reservations (I’d read on some forum that it might be required at “immigration” – though those fora didn’t mention how arbitrary the process is) and handed them over to the police, who went through them. One cop got convinced (I don’t know if it helped that we had booked in a few expensive hotels; he even asked us for our salaries and what work we do, etc.) and we got sent to another one. Yet again, and this was not the first time we were encountering him, he started the process all from the beginning, looking for valid visa stamps in our passports!

And then he started filling out some application. It was the first time I had seen someone actually write right to left, so it was mildly amusing (and it’s interesting that finally he stapled all our documents at the top RIGHT corner). He asked for our return tickets, which we hadn’t printed out, so I showed him on the phone. He took the phone and put it on the xerox machine and took a “copy” of the tickets! And then he stapled everything together and asked us to “wait”. Apparently his “boss” was supposed to call him (this guy took a picture of the application he had written and sent it to someone).

Then five minutes later, he gave us a small chit of paper and asked us to go back to the Visa On Arrival counter. I assumed we were almost through and messaged our driver that “we should be out soon”.

I don’t know if the guy at the visa on arrival counter was incompetent, but it’s not funny how many times he entered details of the same passports. In the middle of this, one lady walked near his counter, and he got busy talking to her while “processing” our stuff. And entered details many more times.

He got thoroughly confused because we had two Jordan Passes, and had to pay for our daughter’s visa (since she didn’t need a ticket to see the monuments this made more sense). In the middle he suddenly picked up all our passports and walked over to the police room. By now I was thoroughly psyched and had already swallowed my panic attack pill.

After yet another inordinate delay, he printed out our visas and sent us to passport control (a few metres away). Again we thought we were done, only to be told he had printed out my visa wrong (remember I said he entered details multiple times). Since the distance there was short, the passport control officer called the visa on arrival guy over and he took my passport YET AGAIN, and started entering details on his computer.

Another ten minutes later, he brought over my passport and visa to the passport control, where my passport was duly stamped and we were sent on our way.

Our bag was there in one corner, and we picked it up and walked out, feeling glad that we had booked a driver for the length of the trip who would be available for any further interfacing with Jordanian cops.

Overall, the whole process was rather bizarre. I’ve waited hours in line at Heathrow to be let in. I’ve visited the US, again waiting for a long time at JFK and even being pulled over for a customs check. None of that was even remotely comparable to our experience at Queen Alia International Airport last Tuesday.

If Jordan wants to outsource its visa process to more developed countries, that is fine, but they need to make it explicit. Turkey, for example, offers visa on arrival to Indians with a valid US or Schengen visa, but everyone else is expected to apply for a visa before travel.

Jordan says no such thing, and instead subjects people to arbitrary waits without any process in a smoky police station in the airport. Which is really really bizarre.


Round Tables

One of the “features” of being in a job is that you get invited to conferences and “industry events”. I’ve written extensively about one of them in the past – the primary purpose of these events is for people to be able to sell their companies’ products, their services and even themselves (job-hunting) to other attendees.

Now, everyone knows that this is the purpose of these events, but it is one of those things that is hard to admit. “I’m going to this hotel to get pitched to by 20 vendors” is not usually a good enough reason to bunk work. So there is always a “front” – an agenda that makes it seemingly worthy for people to attend these events.

The most common one is to have talks. This can help attract people at two levels. There are some people who won’t attend talks unless they have also been asked to talk, and so they get invited to talk. And then there are others who are happy to just attend and try to get “gyaan”, and they get invited as the audience. The other side of the market soon appears, paying generous dollars to hold the event at a nice venue, and to be able to sell to all the speakers and the audience.

Similarly, you have panel discussions. Organisers in general think this is one level better than talks – instead of the audience being bored by ONE person for half an hour, they are bored by about 4-5 people (and one moderator) for an hour. Again there is the hierarchy here – some people won’t want to attend unless they have been put on the panel. And who gets to be on the panel is a function of how desperate one or more sponsors are to sell to the potential panellists.

The one thing most of these events get right is to have sufficient lunch and tea breaks for people to talk to each other. Then again, these are brilliant times for sponsors to be able to sell their wares to the attendees. And it has the positive externality that people can meet and “network” and talk among themselves – which is the best value you can get out of an event like this one.

However, there is one kind of event that I’ve attended a few times, but I can’t understand how they work. This is the “round table”. It is basically a closed room discussion with a large number of invited “panellists”, where everyone just talks past each other.

Now, at one level I understand this – this is a good way to get a large number of people to sell to without necessarily putting a hierarchy in terms of “speakers” / “panellists” and “audience”. The problem is that what they do with these people is beyond my imagination.

I’ve attended two of these events – one online and one offline. The format is the same. There is a moderator who goes around the table (not necessarily in any particular order), with one question to each participant (the better moderators would have prepared well for this). And then the participant gives a long-winded answer to that question, and the answer is not necessarily addressed at any of the other participants.

The average length of each answer and the number of participants means that each participant gets to speak exactly once. And then it is over.

The online version of this was the most underwhelming event I ever attended – I didn’t remember anything that anyone said, and assumed that the feeling was mutual. I didn’t even bother checking out these people on LinkedIn after the event was over.

The offline version I attended was better in that at least we got to talk to each other after the event. But the event itself was rather boring – I’m pretty sure I bored everyone with my monologue when it was my turn, and I don’t remember anything that anyone else said in this event. The funny thing was – the event wasn’t recorded, and there was hardly anyone from the organising team at the discussion. There was just no point to all of us talking for so long. It was like people who organise Satyanarayana Poojes to get an excuse to have a party at home.

I’m wondering how this kind of event can be structured better. I fully appreciate the sponsors and their need to sell to the lot of us. And I fully appreciate that it gives them more bang for the buck to have 20 people of roughly equal standing to sell to – with talks or panels, the “potential high value customers” can be fewer.

However – wouldn’t it be far more profitable to them to be able to spend more time actually talking to the lot of us and selling, rather than getting all of us to waste time talking nonsense to each other? Like – maybe just a party or a “lunch” would be better?

Then again – if you want people to travel inter-city to attend this, a party is not a good enough excuse for people to get their employers to sponsor their time and travel. And so something inane like the “round table” has to be invented.

PS: There is this school of thought that temperatures in offices and events are set at a level that is comfortable for men but not for women. After one recent conference I attended, I have a theory on why this is the case. It is because of what is “acceptable formal wear” for men and women.

Western formal wear for men is mostly the suit, which means dressing up in lots of layers, and maybe even constraining your neck with a tie. And when you are wearing so many clothes, the environment better be cool else you’ll be sweating.

For women, however, formal wear need not be so constraining – it is perfectly acceptable to wear sleeveless tops, or dresses, for formal events. And the temperatures required to “air” the suit-wearers can be too cold for women.

At a recent conference I was wearing a thin cotton shirt and could thus empathise with the women.


Shrinking deadlines

I’m reminded of this old joke/riddle, which also happened to feature in Gowri Ganesha. “If a 1 metre long sari takes 1 hour to dry in the sun, how long will an 8 metre long sari take to dry?”.

The instinctive answer, of course, is 8 hours, while if you think about it (and assume that you have enough clothesline space to not need to fold), the correct answer is likely to be 1 hour.

Now this riddle is completely unconnected to the point of the post, except that both have to do with time.

And then one day you find, ten years have got behind you.
No one told you when to run. You missed the starting gun. 

Ok enough distractions. I’m now home, home again.

Modern workspaces are synonymous with tight deadlines. Even when you give a conservative estimate on how long something will take, you get asked to compress the timelines further. If you protest too much and say that there is a lot to be done, sometimes you might get asked to “put one more person on the job and get it done quickly”.

This might work for routine, or “fighter” jobs – for example, if your job is to enter and copy data for (let’s say) 1000 records, you can easily put another person on the job, and the entire job will be done in about half the time (allowing for a little time for the new person to learn the job and for coordination).

The more complex the job gets, the harder this becomes. At one level, the new person coming into the job needs more time to get up to speed. Then, as the job gets more complex, it gets harder to divide and conquer, or to “specialise”. This means the new person coming in has less impact.

And then when you get closer and closer to the stud end of the spectrum, the advantage of putting more people on the job to get the work done faster gets smaller and smaller. There comes a point when the extra person actively becomes a liability. Again – I’m reminded of my childhood when occasionally I would ask my mother if she needed help in cooking. “Yes, the best way for you to help is for you to stay out of the kitchen”, she would say.
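
A toy model of this (entirely my own construction, with made-up numbers – nothing scientific): assume each extra person takes on an equal share of the work, but also adds some coordination overhead, and that the overhead is what grows as the job gets more complex.

```r
# Toy model: time to finish a job as a function of how many people are on it.
# Each person does an equal share, but every extra person adds coordination cost.
time_taken <- function(people, base_days = 10, coordination = 0.5) {
  base_days / people + coordination * (people - 1)
}

# Routine "fighter" job: coordination is cheap, two people nearly halve the time
time_taken(1:4, base_days = 10, coordination = 0.5)
# roughly 10.0  5.5  4.3  4.0

# Complex job: coordination is expensive, the second person barely helps,
# and the third actively makes things worse
time_taken(1:4, base_days = 10, coordination = 3)
# roughly 10.0  8.0  9.3 11.5
```

The exact numbers are invented – the shape of the curve is the point.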

And then when the job gets really creative, there is a further limit on compression – a lot of the work is done “offline”. I keep telling people about how I finally discovered the proof for the Ramsey number R(3,3) while playing table tennis in my hostel, or how I had solved a tough assignment problem while taking a friend’s new motorcycle for a ride.

When you want to solve problems “offline” (to let the insight come to you rather than going hunting for it – I had once written about this) – there is no way to shorten the process. You need to let the problem stew in your head, and hope that some time it will get solved.

There is nothing that can be done here. The more you hurry, the lower the chances you give yourself of solving the problem. Everything needs to take its natural course.

I was reminded of this when we missed a deadline last Friday, and I decided to not think about it through the weekend. And then, an hour before I got to work on Monday, an idea occurred to me in the shower which fixed the problem. Even if I’d stressed myself (and my team) out on Friday, or done somersaults, the problem would not have been solved.

As I’d said in 2004, quality takes time.

Pre-trained models

On Sunday evening, we were driving to a relative’s place in Mahalakshmi Layout when I almost missed a turn. And then I was about to miss another turn and my wife said “how bad are you with directions? You don’t even know where to turn!”.

“Well, this is your area”, I told her (she grew up in Rajajinagar). “I had very little clue of this part of town till I married you, so it’s no surprise I don’t know how to go to your cousin’s place”.

“But they moved into this house like six months ago, and every time we’ve gone there together. So if I know the route, why can’t you”, she retorted.

This gave me a trigger to go off on a rant on pre-trained models, and I’m going to inflict that on you now.

For a long time, I didn’t understand what the big deal was about pre-trained machine learning models. “If it’s trained on some other data, how will it even work with my data”, I wondered. And then recently I started using GPT-4 and other similar large language models. And I started reading blog posts on how, with very little fine-tuning, these models can do “gymnastics”.

Having grown up in North Bangalore, my wife has a “pre-trained model” of that part of town in her head. This means she has sufficient domain knowledge, even if she doesn’t have any specific knowledge. Now, with a small amount of new specific information (the way to her cousin’s new house, for example), it is easy for her to fit the specific information into her generic knowledge and get a clear idea of how to get there.

(PS: I’m not at all suggesting that my wife’s intelligence is artificial here)

On the other hand, my domain knowledge of North Bangalore is rather weak, despite having lived there for two years. For the longest time, Malleswaram was a Chakravyuha – I would know how to go there, but not how to get back. Given this lack of domain knowledge, the little information on the way to my wife’s cousin’s new house is not sufficient for me to find my way there.

It is similar with machines. LLMs and other pre-trained models have sufficient “generic domain knowledge” in lots of things, thanks to the large amounts of data they’ve been trained on. As a consequence, if you can train them on fairly small samples of specific data, they are able to generalise from this specific data and learn around it.
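
If you want to see the analogy in toy code (and this is entirely my own illustration – it has nothing to do with how GPT-4 or any real LLM is actually trained or fine-tuned), think of “fine-tuning” as shrinking towards coefficients learnt earlier on lots of similar data, rather than estimating everything afresh from a handful of observations:

```r
# A toy illustration of "pre-training + fine-tuning" as shrinking towards
# previously learnt coefficients (my own analogy; all numbers are made up).
set.seed(42)
true_beta <- c(2, -1, 0.5)                        # the "real" relationship
pretrained_beta <- true_beta + rnorm(3, 0, 0.1)   # learnt earlier on lots of similar data

# only five new "specific" observations
X <- matrix(rnorm(15), nrow = 5, ncol = 3)
y <- as.vector(X %*% true_beta) + rnorm(5, 0, 0.2)

# learning from scratch: ordinary least squares on five points is wobbly
scratch_fit <- coef(lm(y ~ X - 1))

# "fine-tuning": shrink the estimate towards the pre-trained coefficients
lambda <- 5
finetuned_fit <- solve(t(X) %*% X + lambda * diag(3),
                       t(X) %*% y + lambda * pretrained_beta)

# compare: the fine-tuned estimate is anchored by what was already learnt
rbind(truth = true_beta,
      scratch = scratch_fit,
      finetuned = as.vector(finetuned_fit))
```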

More pertinently, in real life, depending upon our “generic domain knowledge” of different domains, the amount of information that you and I will need to learn the same thing about a given domain can be very, very different.

Everything is context-sensitive!

Channelling

I’m writing this five minutes after making my wife’s “coffee decoction” using the Bialetti Moka pot. I don’t like chicory coffee early in the morning, and I’m trying to not have coffee soon after I wake up, so I haven’t made mine yet.

While I was filling the coffee into the Moka pot, I was thinking of the concept of channelling. Basically, if you try to pack the Moka pot too tight with coffee powder, then the steam (that goes through the grounds, thus extracting the caffeine) takes the easy way out – it tries to create a coffee-less channel to pass through, rather than do the hard work of extracting coffee as it passes through the layer of coffee.

I’m talking about steam here – water vapour, to be precise. It is as lifeless as it could get. It is the gaseous form of a colourless, odourless, shapeless liquid. Yet, it shows the seeming “intelligence” of taking the easy way out. Fundamentally this is just physics.

This is not an isolated case. Last week, at work, I was wondering why some algorithm was returning a “negative cost” (I’m using local search for that, and after a few iterations, I found that the algorithm was rapidly taking the cost – which is supposed to be strictly positive – into deep negative territory). Upon careful investigation (thankfully it didn’t take too long), it transpired that there was a penalty cost which increased non-linearly with some parameter. And the algo had “figured” that if this parameter went really high, the penalty cost would go negative (basically I hadn’t done a good job of defining the penalty). And so it would take this channel.
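
Here is a minimal sketch of that failure mode, with made-up numbers (my actual work code is, of course, very different):

```r
# A toy reconstruction of the failure mode (made-up numbers, not my work code).
# The penalty was meant to grow with the parameter and keep it in check, but as
# written it flips sign once the parameter gets large enough.
intended_cost <- function(x) 100 - 2 * x        # the bit the algo "wants" to minimise
penalty       <- function(x) 5 * x - 0.1 * x^2  # grows at first, goes negative past x = 50
total_cost    <- function(x) intended_cost(x) + penalty(x)

# a naive local search, starting wherever an earlier iteration left the parameter
x <- 30
for (i in 1:100) {
  candidates <- c(x - 1, x + 1)
  best <- candidates[which.min(total_cost(candidates))]
  if (total_cost(best) < total_cost(x)) x <- best
}

x              # the parameter has run off to a large value...
total_cost(x)  # ...and the supposedly strictly positive "cost" is deep in negative territory
```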

Again, this algorithm has none of the supposedly scary “AI” or “ML” in it. It is a good old rule-based system, where I’ve defined all the parameters and only the hard work of finding the optimal solution is left to the algo. And yet, it “channelled”.

Basically, you don’t need a good reason to take the easy way out. It is not even human, or “animal”, to do that – it is simply a physical fact. When there exists an easier path, you simply take that – whether you are an “AI” or an algorithm or just steam!

I’ll leave you with this algo that decided to recognise sheep by looking for meadows (this is rather old stuff).

Order of guests’ arrival

When I’m visiting someone’s house and they have an accessible bookshelf, one of the things I do is to go check out the books they have. There is no particular motivation, but it’s just become a habit. Sometimes it serves as conversation starters (or digressors). Sometimes it helps me understand them better. Most of the time it’s just entertaining.

So at a friend’s party last night, I found this book on Graph Theory. I just asked my hosts whose book it was, got the answer and put it back.

As many of you know, whenever we host a party, we use graph theory to prepare the guest list. My learning from last night’s party, though, is that you should not only use graph theory to decide WHO to invite, but also to adjust the times you tell people so that the party has the best outcome possible for most people.

With the full benefit of hindsight, the social network at last night’s party looked approximately like this. Rather, this is my interpretation of the social network based on my knowledge of people’s affiliation networks.

This is approximate, and I’ve collapsed each family to one dot. Basically it was one very large clique, and two or three other families (I told you this was approximate) who were largely only known to the hosts. We were one of the families that were not part of the large clique.

This was not the first such party I was attending, btw. I remember this other party from 2018 or so which was almost identical in terms of the social network – one very large clique, and then a handful of families only known to the hosts. In fact, as it happens, the large clique from the 2018 party and from yesterday’s party were from the same affiliation network, but that is only a coincidence.

Thinking about it, we ended up rather enjoying ourselves at last night’s party. I remember getting comfortable fairly quickly, and that mood carrying on through the evening. Conversations were mostly fun, and I found myself connecting adequately with most other guests. There was no need to get drunk. As we drove back peacefully in the night, my wife and daughter echoed my sentiments about the party – they had enjoyed themselves as well.

This was in marked contrast with the 2018 party with the largely similar social network structure (and dominant affiliation network). There we had found ourselves rather disconnected, unable to make conversation with anyone. Again, all three of us had felt similarly. So what was different yesterday compared to the 2018 party?

I think it had to do with the order of arrival. Yesterday, we were the second family to arrive at the party, and from a strict affiliation group perspective, the family that had preceded us at the party wasn’t part of the large clique affiliation network (though they knew most of the clique from beforehand). In that sense, we started the party on an equal footing – us, the hosts and this other family, with no subgroup dominating.

The conversation had already started flowing among the adults (the kids were in a separate room) when the next set of guests (some of them from the large clique) arrived, and the assimilation was seamless. Soon everyone else arrived as well.

The point I’m trying to make here is that because the non-large-clique guests had arrived first, they had had a chance to settle into the party before the clique came in, without letting the party get too cliquey. That worked out brilliantly.

In contrast, in the 2018 party, we had ended up going rather late which meant that the clique was already in action, and a lot of the conversation had been clique-specific. This meant that we had struggled to fit in and never really settled, and just went through the motions and returned.

I’m reminded of another party WE had hosted back in 2012, where there was a large clique and a small clique. The small clique had arrived first, and by the theory in this post, should have assimilated well into the party. However, as the large clique came in, the small clique had sort of ended up withdrawing into itself, and I remember having had to make an effort to balance the conversation between all guests, and it not being particularly stress-free for me.

The difference there was that there were TWO cliques with me as cut-vertex. Yesterday, if you took out the hosts (cut-vertex), you would largely have one large clique and a few isolated nodes. And the isolated nodes coming in first meant they assimilated both with one another and with the party overall, and the party went well!
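
For the graph-theoretically inclined, here is a toy version of the two party networks (made-up labels, with the igraph package doing the work of finding the cut-vertex):

```r
# Toy reconstructions of the two party graphs (made-up families), using igraph
library(igraph)

# Last night: one large clique (A-E), the hosts H, and two families (F, G)
# known only to the hosts
party_now <- graph_from_edgelist(rbind(
  t(combn(c("A", "B", "C", "D", "E"), 2)),          # the large clique
  cbind("H", c("A", "B", "C", "D", "E", "F", "G"))  # the hosts know everyone
), directed = FALSE)
articulation_points(party_now)   # the hosts are the (only) cut-vertex

# 2012: a large clique and a small clique, joined only through me
party_2012 <- graph_from_edgelist(rbind(
  t(combn(c("P", "Q", "R", "S"), 2)),   # large clique
  t(combn(c("X", "Y"), 2)),             # small clique
  cbind("me", c("P", "Q", "R", "S", "X", "Y"))
), directed = FALSE)
articulation_points(party_2012)  # again a single cut-vertex: me
```

In both cases there is a single cut-vertex; the difference lay in what the rest of the graph looked like, and in who arrived when.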

And now that I’ve figured out this principle, I might break my head further at the next party I host – in terms of what time I tell different guests!

Sierpinski Triangles

On Saturday morning, my daughter had made some nice art with sketch pen on an A4 sheet. It was rather “geometric”, consisting of repeating patterns across the page. My wife took one look at it and said, “do you know that you can make such art with computers also? Your father has made some”.

Some drawings I had made using code, back in 2016

“Reallly?”, piped the daughter. I had been intending for a while to start teaching her to code (she is six), and figured this was the perfect trigger, and said I would teach her.

A quick search revealed that there is an “ACS Logo” for Mac (Logo was the first “programming language” I had learnt, when I was nine). I quickly downloaded it on her computer (my wife’s old Macbook Air) and figured I remembered most of the commands.

And then I started typing, and showed her what they had shown me back in a “computer class” behind my house in 1992 – FD for “forward”. RT for right turn. HT for hide turtle. Etc. Etc.

Soon she was engrossed in it. Thankfully she has learnt angles in her school, though it took her some trial and error to figure out how much to turn by for different shapes (later I was thinking this can also serve as a good “angles revision” for her during her ongoing summer holidays).

With my wife having reminded me that I could produce images through code, I realised that as my daughter was engrossed in her “coding”, I should do some “coding art” on my own. All she needed was some occasional input, and for me to sit right next to her.

Last Monday I had got a bit of a scare – at work, I needed to generate randomly distributed points in a regular hexagon. A lookup online told me that I could just get a larger number of randomly distributed points in a bounding rectangle, and then only pick points within the hexagon. And then take a random sample of those.

This meant that I needed to write equations for whether a point lay inside a hexagon. And I realised I’d forgotten ALL my coordinate geometry. It took me over half an hour to get the equations for the sides of the hexagon right – I’m clearly rusty.
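
The rejection-sampling part itself is straightforward – here is a rough sketch (not my work code), assuming a regular hexagon centred at the origin with two of its vertices on the x-axis:

```r
# Uniform points in a regular hexagon via a bounding rectangle and rejection.
# The hexagon is centred at the origin, vertices at distance R, two of them on the x-axis.
in_hexagon <- function(x, y, R = 1) {
  abs(y) <= sqrt(3) / 2 * R & abs(y) <= sqrt(3) * (R - abs(x))
}

sample_hexagon <- function(n, R = 1) {
  # the hexagon fills 75% of its bounding rectangle, so oversample generously
  m <- ceiling(n / 0.75) * 2
  x <- runif(m, -R, R)
  y <- runif(m, -sqrt(3) / 2 * R, sqrt(3) / 2 * R)
  keep <- which(in_hexagon(x, y, R))
  idx <- sample(keep, n)   # then take a random sample of the points that survived
  data.frame(x = x[idx], y = y[idx])
}

pts <- sample_hexagon(1000)
plot(pts$x, pts$y, pch = 16, cex = 0.4, asp = 1)
```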

And on Saturday, as I sat down to make some “computer art”, I decided I would make some fractals. Why don’t I make some Sierpinski Triangles, I thought. I started breaking down what code I needed to write.

First, given an equilateral triangle, I had to return three similar equilateral triangles, each of half the side length of the original triangle.

Then, given the centroid of an equilateral triangle and the length of each side, I had to return the vertices.

Once these two functions had been written, I could just chain them (after running the first one recursively) and then had to just plot to get the Sierpinski triangle.
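
Something along these lines – a rough, unoptimised reconstruction, and not the exact code I wrote that Saturday:

```r
# A triangle is stored as its centroid and side length; all triangles point upwards.

# Given centroid and side, return the three vertices of an equilateral triangle
vertices <- function(cx, cy, side) {
  r <- side / sqrt(3)   # centroid-to-vertex distance
  angles <- pi / 2 + c(0, 2, 4) * pi / 3
  data.frame(x = cx + r * cos(angles), y = cy + r * sin(angles))
}

# Given one triangle, return the three corner triangles of half the side length.
# Each child's centroid is the midpoint of the parent's centroid and one of its vertices.
subdivide <- function(tri) {
  v <- vertices(tri$cx, tri$cy, tri$side)
  data.frame(cx = (tri$cx + v$x) / 2, cy = (tri$cy + v$y) / 2, side = tri$side / 2)
}

# Run the subdivision recursively, then plot the smallest triangles
sierpinski <- function(depth) {
  tri <- data.frame(cx = 0, cy = 0, side = 1)
  for (i in seq_len(depth)) {
    tri <- do.call(rbind, lapply(seq_len(nrow(tri)), function(j) subdivide(tri[j, ])))
  }
  plot(NULL, xlim = c(-0.6, 0.6), ylim = c(-0.4, 0.7), asp = 1,
       axes = FALSE, xlab = "", ylab = "")
  for (j in seq_len(nrow(tri))) {
    v <- vertices(tri$cx[j], tri$cy[j], tri$side[j])
    polygon(v$x, v$y, col = "grey30", border = NA)
  }
}

sierpinski(5)
```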

And then I had my second scare of the week – not only had I forgotten my coordinate geometry, I had forgotten my trigonometry as well. Again I messed up a few times, but the good thing about programming with a computer is that I could do trial and error. Soon I had it right, and started producing Sierpinski triangles.

Then, there was another problem – my code was really inefficient. If I went beyond depth 4 or 5, the figures would take inordinately long to render. Since I was coding in R, I set about vectorising all my code. In R you don’t write loops if you can help it – instead, you apply functions on entire vectors. This again took some time, and then I had the triangles ready. I proudly showed them off to my daughter.

“Appa, why is it that as you increase the number it becomes greyer”, she asked. I explained how with each step, you were taking away more of the filled areas from the triangles. Then I figured this wasn’t that good-looking – maybe I should colour it.

And so I wrote code to colour the triangles. Basically, I started recursively colouring them – the top third green, left third red and right third blue (starting with a red base). This is what I ended up producing:

And this is what my daughter produced at the same time, using Logo:

I forgot to “HT” before taking the screenshot. This is a “lollipop”.

Optimal quality of beer

Last evening I went for drinks with a few colleagues. We didn’t think or do much in terms of where to go – we just minimised transaction costs by going to the microbrewery on the top floor of our office building. This meant that after the session those of us who were able (and willing) to drive could just go down to the basement and drive back. No “intermediate driving”.

Of course, if you want to drive back after you’ve gone for drinks, it means that you need to keep your alcohol consumption in check. And when you know you are going for a longish session, that is tricky. And that’s where the quality of beer matters.

In a place like Arbor, which makes absolutely excellent beer, “one beer” is a hard thing to pull off (though I exercised great willpower in doing just that the last time I’d gone for drinks with colleagues – back in Feb). And after a few recent experiences, I’ve concluded that beer is the best “networking drink” – it offers the optimal amount of “alcohol per unit time” (wine and whisky I tend to consume at a much faster rate, and end up getting too drunk too quickly). So if you go to a place that serves bad beer, that isn’t great either.

This is where the quality of beer at a middling (for a Bangalore microbrewery) place like Bangalore Brewworks works perfectly – it’s decent enough that you are able to drink it (and not something that delivers more ethanol per unit time), but also not so good that you gulp it down (like I do with the Beach Shack at Arbor).

And this means that you can get through a large part of the session (where the counterparties down several drinks) on your one beer – you stay within reasonable alcohol limits and are not buzzed at all and easily able to drive. Then you down a few glasses of iced water and you’re good to go!

Then again, when I think about it, nowadays I go out for drinks so seldom that maybe this strategy is not so optimal at all – next time I might as well go to Arbor and take a taxi home.

The Law Of Comparative Advantage and Priorities

Over a decade ago I had written about two kinds of employees – those who offer “competitive advantage” and those who offer “comparative advantage”.

Quoting myself:

So in a “comparative advantage” job, you keep the job only because you make it easier for one or more colleagues to do more. You are clearly inferior to these colleagues in all the “components” of your job, but you don’t get fired only because you increase their productivity. You become the Friday to their Crusoe.

On the other hand, you can keep a job for “competitive advantage”. You are paid because there are one or more skills that the job demands in which you are better than your colleagues.

Now, one issue with “comparative advantage” jobs is that sometimes it can lead to people being played out of position. And that can reduce the overall productivity of the team, especially when priorities change.

Let’s say you have 2 employees A and B, and 2 high-priority tasks X and Y. A dominates B – she is better and faster than B in both X and Y. In fact, B cannot do X at all, and is inferior to A when it comes to Y. Given these tasks and employees, the theory of comparative advantage says that A should do X and B should do Y. And that’s how you split it.

In this real-world problem, though, there can be a few issues – A might be better at X than B, but she just doesn’t want to do X. Secondly, by putting the slower B on Y, there is a floor on how soon Y can be delivered.

And if for some reason Y becomes high priority for the team, with the current work allocation there is no option other than to wait for B to finish Y, or to get A to work on Y as well (thus leaving X in the lurch, and the otherwise good A unhappy). A sort of no-win situation.
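
To put some made-up numbers on all of this (a toy sketch, not any real team):

```r
# Hypothetical task times in days; Inf means B simply cannot do X
times <- matrix(c(2, 3,      # A is faster at both X and Y
                  Inf, 6),   # B cannot do X, and is slower at Y
                nrow = 2, byrow = TRUE,
                dimnames = list(c("A", "B"), c("X", "Y")))

# The comparative-advantage split: A on X, B on Y.
# The team is "done" only when the slower B finishes Y.
split_done <- max(times["A", "X"], times["B", "Y"])   # 6 days, gated by B

# If Y suddenly becomes the priority, the choices are to wait out B's 6 days,
# or to pull A onto Y as well and leave X in the lurch.
c(split = split_done, A_alone_on_Y = times["A", "Y"])
```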

The whole team ends up depending on the otherwise weak B, a sort of version of this:

A corollary is that if you have been given what seems like a major responsibility, it need not be because you are good at the task you’ve been given responsibility for. It could also be because, relative to your colleagues, you are “less worse” at this particular thing than you are at other things.



Code Density

As many of the regular readers of this blog know, I largely use R for most of my data work. There have been a few occasions when I’ve tried to use Python, but have found that I’m far less efficient in that than I am with R, and so abandoned it, despite the relative ease of putting things into production.

Now in my company, like in most companies, people use both Python and R (the team that reports to me largely uses R, everyone else largely uses Python). And while till recently I used to claim that I’m multilingual in the sense that I can read Python code fairly competently, of late I’m not sure I am. I find it increasingly difficult to parse and read production-grade Python code.

And now, after some experiments with ChatGPT, and exploring other people’s code, I have an idea of why I’m finding it hard to read production-grade Python code. It has to do with “code density”.

Of late I’ve been experimenting with Spark (finally, in this job I do a lot of “big data” work – something I never had to do in my consulting career prior to this). Related to this, I was reading someone’s PySpark code.

And then it hit me – the problem (rather, my problem) with Python is that it is far more verbose than R. The number of characters or lines of code required to do something in Python is far more than what you need in R (especially if you are using the tidyverse family of packages, which I do, including sparklyr for Spark).
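
To give a flavour of what I mean (the table and the column names below are entirely made up), this is the kind of logic that fits into a handful of piped lines in R:

```r
library(dplyr)

# a tiny made-up orders table
orders <- data.frame(
  city       = c("Bangalore", "Bangalore", "Mumbai", "Mumbai"),
  status     = c("delivered", "cancelled", "delivered", "delivered"),
  amount     = c(500, 300, NA, 700),
  created_at = c("2023-06-01", "2023-06-01", "2023-06-02", "2023-06-02")
)

# filter, derive, aggregate and sort, all in one short pipeline
daily_summary <- orders %>%
  filter(status == "delivered", !is.na(amount)) %>%
  mutate(order_date = as.Date(created_at)) %>%
  group_by(city, order_date) %>%
  summarise(n_orders = n(), revenue = sum(amount), .groups = "drop") %>%
  arrange(desc(revenue))

daily_summary
```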

Why does the density of code matter? It has to do with aesthetics, modularity and ease of understanding.

Yesterday I was writing some code that I plan to put into production. After a few hours of coding, I looked at the code and felt disgusted with myself – it was a way-too-long monolithic block of code. It might have been good when I was writing it, but I knew that if I were to revisit it in a week or two, I wouldn’t be able to understand what the hell was happening there.

I’ve never worked as a professional software engineer, but with the amount of coding I’ve done, I’ve worked out what is a “reasonable length for a code block”. It’s like that apocryphal story of Indian public examiners for high school exams who evaluate history answers based on how long they are – “if they were to place an ordinary Reynolds 045 pen vertically on the sheet, the answer should be longer than that for the student to get five marks”.

An answer in a high school history exam needs to be longer than this. A code block or function should be shorter than this

It’s the reverse here. Approximately speaking, if you were to place a Reynolds pen vertically on screen (at your normal font size), no function (or code block) can be longer than the pen.

This easily approximates how much the eye can see on one normal MacBook screen (I use a massive external monitor at work, and a less massive, but equally wide, one at home). If you have to keep scrolling up and down to understand the full logic, there is a higher chance you will make mistakes, and it becomes harder for someone else to understand the code.

Till recently (as in earlier this week) I would crib like crazy that people coding in Python would make their code “too modular”. That I would have to keep switching back and forth between different functions (and classes!!) to make sense of some logic, and about how that would make code hard to debug (I still think there is a limit to how modular you can make your ETL code).

Now, however (I’m writing this on a Saturday – I’m not working today), from the code density perspective, it all makes sense to me.

The advantage of R is that because the code is far denser, you can pack in a far greater amount of logic in a Reynolds pen length of code. So over the years I’ve gotten used to having this much logic being presented to me in one chunk (without having to scroll or switch functions).

The relatively lower density of Python, however, means that the same amount of logic that would be one function in R is now split across several different functions. It is not that the writer of the code is “making this too modular” or “writing functions just for the heck of it”. It is just that their “mental Reynolds pens” don’t allow them to pack in more lines in a chunk or function, and Python’s lower density means there is only so much logic that can go in there.

As part of my undergrad, I did a course on Software Engineering (and the one thing the professor told us then was that we should NOT take up software engineering as a career – “it’s a boring job”, he had said). In that, one of the things we learnt was that in conventional software services contexts, billing would happen as a (nonlinear) function of “kilo lines of code” (this was in early 2003).

Now, looking back, one thing I can say is that the rate per kilo line of R code ought to be much higher than the rate per kilo line of Python code.

Cross posted on my now-largely-dormant Art of Data Science newsletter