Genetic Algorithms

I first learnt about Genetic Algorithms almost exactly thirteen years ago, when they were taught to me by Prof. Deepak Khemani as part of a course on “artificial intelligence”. I remember liking the course a fair bit, and I took a liking to the heuristics and “approximate solutions” after the mathematically intensive algorithms course of the previous semester.

The problem with the course, however, was that it didn’t require us to code the algorithms we had learnt (for which we were extremely thankful back then, since in term 5 of Computer Science at IIT Madras, this was pretty much the only course that didn’t involve too many programming assignments).

As a result, while I had learnt all these heuristics (hill climbing, simulated annealing, tabu search, genetic algorithms, etc.) fairly well in theory, I was at a complete loss as to how to implement any of them. And so far, during the course of my work, I had never had an opportunity to use any of these techniques. Until today, that is.

I can’t describe the problem here since this is a client assignment, but when I had to pick a subset from a large set that satisfied certain properties, I knew I needed a method that would get me to a good subset quickly. A “fitness function” quickly came to mind, and it was obvious that I should use genetic algorithms to solve the problem.

The key to using genetic algorithms is that you need to be able to encode the solution in the form of a string, and then define functions such as “crossover” and “mutation”. Given that I was looking for a subset, encoding it as a string was rather easy, and since I had unordered subsets, the crossover was also easy – basic random number generation. Within fifteen minutes of deciding I should use GA, the entire algorithm was in my head. It was only a question of implementing it.
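To make that concrete, here is a minimal sketch in R (the language the final version ended up in, as noted below) of what such an encoding and a naive crossover might look like. The set sizes and function names are made up for illustration – this is not the client code. And as it turns out, this naive crossover has exactly the bug described a little further down.

    # A candidate solution is just a fixed-length vector of element indices
    universe_size <- 100   # size of the full set (made-up number)
    subset_size <- 10      # size of the subsets we are searching over

    random_candidate <- function() sample(universe_size, subset_size)

    # Naive crossover: cut both parents at a random point and splice.
    # Note that this can put the same element into a child twice!
    naive_crossover <- function(parent1, parent2) {
      cut <- sample(subset_size - 1, 1)
      c(parent1[1:cut], parent2[(cut + 1):subset_size])
    }

    # Mutation: swap one element of the candidate for one from outside it
    mutate <- function(candidate) {
      pos <- sample(subset_size, 1)
      candidate[pos] <- sample(setdiff(1:universe_size, candidate), 1)
      candidate
    }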

As I started writing the code, I started having fantasies of finishing it in an hour and then writing a blog post about it. As it happened, it took much longer. The first cut took some three hours (including some breaks), and it wasn’t particularly good – it was slow to converge. I tweaked things a bit, but that didn’t improve matters much.

And that was when I realised that I had done the crossover wrong – when two strings have elements in common and need to be crossed over, I had to take care that the common elements did not get duplicated in the same “child” (the subsets needed to be of a fixed length). So that needed a tweak in the code. That done, the code still seemed inefficient.

The selection, too, was wrong. If I started off with 10 strings, I would form 5 pairs from them (each string participating in exactly one pair), which would result in 10 new strings. And then I would put these 20 strings (the original 10 and the new 10) through a fitness test and discard the weakest 10. And iterate. The problem was that the strongest strings had as much of a chance of reproducing as the weakest. This was clearly not right.

So I tweaked the code so that the fitter strings had a higher chance of reproducing than the less fit ones. This required me to put the fitness test at the beginning of each iteration rather than at the end, and I had to refactor the code a little to make sure I didn’t repeat computations. Now I was drawing pairs of strings from the original “basket”, fitter ones more often, and randomly crossing them over. And putting the results through the fitness test. And so forth.
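Here is a sketch of what the corrected iteration might look like, reusing the helpers from the earlier snippet and assuming a user-supplied fitness() function (returning a nonnegative score, higher = fitter, so that it can double as a sampling weight). Again, this illustrates the general shape rather than the actual code.

    # Duplicate-safe crossover: common elements go into both children,
    # and the remaining slots are a random split of the rest
    crossover <- function(parent1, parent2) {
      common <- intersect(parent1, parent2)
      k <- subset_size - length(common)
      if (k == 0) return(list(parent1, parent2))  # identical parents
      rest <- sample(setdiff(union(parent1, parent2), common))
      list(c(common, rest[1:k]), c(common, rest[(k + 1):(2 * k)]))
    }

    evolve <- function(population, fitness, generations = 100) {
      for (g in seq_len(generations)) {
        # fitness test at the START of each iteration...
        scores <- vapply(population, fitness, numeric(1))
        # ...so that fitter strings reproduce more often
        probs <- scores / sum(scores)
        children <- list()
        while (length(children) < length(population)) {
          parents <- sample(length(population), 2, prob = probs)
          kids <- crossover(population[[parents[1]]],
                            population[[parents[2]]])
          children <- c(children, lapply(kids, mutate))
        }
        # keep the fittest half of the combined pool
        pool <- c(population, children)
        pool_scores <- vapply(pool, fitness, numeric(1))
        keep <- order(pool_scores, decreasing = TRUE)[1:length(population)]
        population <- pool[keep]
      }
      population[[1]]   # the pool is sorted, so this is the fittest string
    }

    # usage: pop <- replicate(20, random_candidate(), simplify = FALSE)
    #        best <- evolve(pop, fitness = my_fitness_function)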

I’m extremely happy with the results of the algorithm – I’ve got just the kind of output I had expected. More importantly, I was extremely happy with the process of coding the whole thing up. I did the entire coding in R, which is what I use for my data analysis (the data size meant I didn’t need anything faster).

The more interesting part is that this only solved a very small part of the problem I’m trying to solve for my client. Tomorrow I’m back to solving a different part of the problem. Genetic algorithms have served their purpose. Back when I started this assignment I had no clue I would be using genetic algorithms. In fact, I had no clue what techniques I might use.

Which is why I get annoyed when people ask me what kind of techniques I use in my problem solving. Given the kind of problems I take on, most involve a large variety of math, CS and statistics techniques, each of which plays only a small role in the entire solution. This is also why I get annoyed when people put the methods they are going to use on their pitch decks. To me, that gives the impression that they are solving a toy version of the problem and not the real problem – or that the consultants are grossly oversimplifying the problem to be solved.

PS: Much as some people might describe it that way, I wouldn’t describe Genetic Algorithms as “machine learning”. I think there’s way too much intervention on the part of the programmer for it to be described thus.

My tryst with Kannada media

So about a month back, I wrote up an essay on why the much-maligned TenderSURE project is a step in the right direction for the development of Bangalore, and why the Chief Minister’s comments on the issue were misguided and wrong.

Having written it, and considering it worthy of a better forum than NED, I shared it with my Takshashila colleagues. They opined that it should be published in a Kannada newspaper, and Varun Shenoy duly translated the piece into Kannada. And then the story began.

We sent it to PrajaVani (which has published several other op-eds by Takshashila people), but they summarily rejected it without giving reasons. We then sent it to UdayaVani, reaching them after jumping through some hoops, but they raised questions about the content whose answers had been made quite clear within the text itself.

I think Mint has spoilt me, in that I assume it’s okay to write geeky stuff and have it accepted for publication. Rather, it is possible that they recruited me so that they could further bolster their geek quotient. Last week, for example, I sent in a piece on Fractional Brownian Motion, and it got published. A couple of years back I’d sent a formula with Chebyshev’s inequality to be included in a piece on sampling, and they published that too.

When translating my piece, Varun thought it was too geeky and technical, and he made an attempt to tone it down in his translation. The translation wasn’t easy, for we had to find Kannada equivalents for some of the technical terms I’d used. In some cases, Varun expertly found equivalents. In others, we simply toned the language down.

After we had toned down the piece and made an effort to make it “accessible”, UdayaVani’s response was a bit of a dampener – it resulted in a severe bout of NED. And so we sat on the piece. And continued to put NED.

Finally, Varun got out of it and published it on the Takshashila blog (!!). The original piece I’d written is here:

A feature of Bangalore traffic, given the nature of the road network, is that bottlenecks are usually at the intersections, and not at the roads. As a consequence, irrespective of how much we widen the roads, the intersections will continue to constrain the flow of traffic in the city. In other words, making roads narrower will not have a material impact on the throughput of traffic in the city.

And Varun’s translation is here:

(Update: I tried to extract Varun’s piece here but it’s not rendering properly, so please click through and read on the Logos blog)

Read the whole thing, whichever piece you can understand. I think we are on to something here.

And on that note, it might make sense to do a more rigorous network-level analysis of Bangalore’s roads. Designing the graph is simple – each intersection (however small it might be) is a node, each “road segment” is an edge. The graph is both directed (to take care of one-ways) and weighted (to indicate width of roads).

We’ll need data on flows, though. If we can get comprehensive origin-destination data for a large number of people, we should be able to impute the flows on each segment from that.

And then we can rigorously test the hypothesis (I admit that it’s still only a hypothesis) that the bottlenecks on Bangalore’s roads are the intersections and not the roads.
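As a sketch of how such a graph could be set up – in R, with the igraph package, on a made-up toy network (real segment data would have to come from a map source) – comparing node betweenness with edge betweenness is one crude first way of asking whether it is the intersections or the road segments that carry the critical load:

    library(igraph)

    # Toy road network: intersections as nodes, segments as directed edges
    segments <- data.frame(
      from  = c("A", "B", "B", "C", "D", "C"),
      to    = c("B", "C", "D", "D", "A", "A"),
      width = c(4, 2, 2, 3, 4, 1)   # road width, the edge weight
    )
    g <- graph_from_data_frame(segments, directed = TRUE)

    # Treat narrower roads as costlier, so shortest paths prefer wide roads
    cost <- 1 / E(g)$width

    # If intersections are the bottlenecks, the load should concentrate
    # on high-betweenness nodes rather than high-betweenness edges
    betweenness(g, weights = cost)        # intersections
    edge_betweenness(g, weights = cost)   # road segments

Once we have the origin-destination data, flows imputed along shortest paths can replace the betweenness proxy.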

Agile programming and Parliamentary procedures

Supposedly the latest (ok, not that latest – it’s been around for a decade) thing in software engineering is “Agile programming”. It’s basically about breaking away from the old “waterfall model” (which we learnt about in a software engineering course back in 2003) to a more iterative model with quick releases, updates, etc.

I’ve never worked as a software engineer, but it seems to me that Agile programming is the way to go – get something out and keep iterating until you have a good product, rather than thinking endlessly about incorporating all design specifications before writing a single line of code. Requirements change rapidly nowadays, and unless you are “agile” (pun intended) in responding to them, you will not produce good software.

Agile methodologies, however, don’t work for parliamentary procedures, since transaction costs there are very high. Take, for example, the proposed Goods and Services Tax (GST). In its current form the bill is incredibly flawed, with taxes on the movement of goods across states, and certain products excluded from its ambit altogether. Mihir Sharma at Business Standard has a great takedown of the current bill (there is no one quotable paragraph; read the whole thing – I’m mostly in agreement).

So it seems to me that the government is passing the GST in its current half-baked form because it wants some version (like a Minimum Viable Product) off the ground, hoping to rectify the shortcomings in a later iteration. In other words, the government is attempting some sort of agile methodology in passing the GST.

The problem with parliamentary procedures, however, is that transaction costs are high. Once a law has been passed, it is set in stone, and the effort required for any single amendment equals the effort originally required to pass the law itself, since you have to go through the whole procedure again. In other words, our laws need to be developed using the waterfall model, with full system requirement specifications in place before they are passed.

It’s not surprising, since the procedure for passing laws was laid down at a time when hardly any programming existed, let alone agile programming. Yet it raises the question of what can be done to make our laws more agile (pun intended).

PS: I understand that Agile software development has several features and this iterative nature is just one of them. But that is the only one I want to focus on here.

Gossip Propagation Models

More than ten years ago, back when I was at IIT Madras, I considered myself to be a clearinghouse of gossip. Every evening after dinner I would walk across to Sri Gurunath Patisserie, and plonk myself at one of the tables there with a Rs. 5 Nescafe instant coffee. And there I would meet people. Sometimes we would discuss ideas (while these discussions were rare, they were most fulfilling). Other times we would discuss events. Most of the time, and in conversations that would be entertaining if not fulfilling, we discussed people.

Constant participation in such discussions made sure that any gossip generated anywhere on campus would reach me, and to fill time in subsequent conversations I would propagate it. I soon got to know random details about random people on campus whom I hardly cared about. Such information was important purely because someone else might find it interesting. Apart from the joy of learning such gossip, however, I never got remunerated for my services as a clearinghouse.

I was thinking about this topic earlier today while reading this studmax post that the wife has written about gossip distribution models. In it she writes:

This confirmed my earlier hypothesis that gossip follows a power law distribution – very few people hold all the enormous hoards of information while the large majority of people have almost negligible information. Gossip primarily follows a hub and spoke model (eg. when someone shares inappropriate pictures of others on a whatsapp group) and in some rare cases especially in private circles (best friends, etc.), it’s point to point.


For starters, if you plot the amount of gossip propagated by different people (if a particular quantum of gossip is propagated to two different people, we count it twice), it is quite possible that it follows a power law distribution. This follows from the now well-known result that the degree distribution of real-world social networks follows a power law. If, on top of this, you assume that some people are much more likely than others to propagate the quanta of gossip they know, and that such propensity for propagation is usually correlated with a person’s “degree” (number of connections), the above result is not hard to show.

The next question is how gossip actually propagates. The wife looks at the possibilities through two discrete models – hub-and-spoke and peer-to-peer. In the hub-and-spoke model, gossip spreads along the spokes. Let us assume that the high-degree people are the hubs (intuitive); according to this model, these people collect gossip from the spokes (low-degree people) and transmit it onwards. In this model, gossip seldom propagates directly between two low-degree people.

At the other end is the peer-to-peer model, where the likelihood of gossip spreading along an edge (a connection between two people) is independent of the nature of the nodes at either end of the edge. In this kind of model, gossip is equally likely to flow across any edge. However, if you overlay the (scale-free / power law) network structure on this model, it will start to look like a hub-and-spoke model!

In reality, neither of these models is strictly true, since we also need to consider each person’s propensity to propagate gossip. There are some people who are extremely “sadhu” and politically correct, who think it is morally wrong to propagate unsubstantiated stories. They are sinks as far as gossip is concerned. The amount of gossip that reaches them is also lower, because their friends know that they’re not interested in either knowing or propagating it. On the other hand, you have people (like I used to be) who have a higher propensity for propagating gossip. This also results in their receiving more gossip, and they end up propagating more.

So does gossip propagation follow the hub-and-spoke model or the peer-to-peer model? The answer is “somewhere in between”, and it is a function of the correlation between a node’s likelihood of propagating gossip and its degree. If the two are uncorrelated (not unreasonable), the flow will be closer to peer-to-peer (though the power law degree distribution makes it appear hub-and-spoke). If there is a very high positive correlation between likelihood of propagation and node degree, the model is very close to hub-and-spoke, since the likelihood of gossip flowing between two low-degree nodes is then very low, and most of the flow happens through the hubs. And if the correlation is negative, the flow is even more strongly peer-to-peer.

I plan to set up some simulations to actually study the above possibilities and further model how gossip flows!
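As a starting point, here is a rough sketch in R (with igraph) of what such a simulation could look like. The network model, the propensity formula and all the parameters are assumptions to be played with, not claims about how gossip actually works; rho is the knob that controls the correlation between propensity and degree discussed above.

    library(igraph)

    set.seed(42)
    n <- 1000
    g <- sample_pa(n, directed = FALSE)   # scale-free (power law) network

    simulate_gossip <- function(g, rho, steps = 20) {
      deg <- degree(g)
      # propensity to propagate: a blend of degree rank and pure noise,
      # so rho = 1 means hubs gossip the most, rho = 0 means uncorrelated
      propensity <- rho * rank(deg) / length(deg) +
        (1 - rho) * runif(vcount(g))
      knows <- rep(FALSE, vcount(g))
      knows[sample(vcount(g), 1)] <- TRUE  # gossip starts at a random node
      for (s in seq_len(steps)) {
        spreaders <- which(knows & runif(vcount(g)) < propensity)
        for (v in spreaders) {
          knows[as.integer(neighbors(g, v))] <- TRUE
        }
      }
      mean(knows)   # fraction of the network the gossip reached
    }

    simulate_gossip(g, rho = 0)   # closer to peer-to-peer
    simulate_gossip(g, rho = 1)   # closer to hub-and-spoke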

What makes for a successful networking event?

So the second edition of the NED Talks took place last night. And no, this is not a “match report”. The purpose of the NED Talks is to put together a bunch of people who are interesting in different ways, and get them to talk. There are approximately ten talks (the first edition had thirteen, the second eight) followed by interaction, with a round of interaction before the event as well. So in certain ways, NED Talks are networking events.

So the question is what the ideal “network structure” of the attendees is in order to ensure a successful networking session. The idea is that the network at the beginning of the session should be only slightly dense – if too many people know each other at the beginning, there is not much point in the event as a networking event, for the value of a networking event is to bring together people who hitherto didn’t know each other, and to strengthen existing weak ties.

The network not being dense enough is also a problem, for that means people might be lost. If you have a lot of people who have never known each other earlier, and if some of them are introverted (as is likely to happen when you put together intelligent people), the conversation can be a bit of a non-starter. So low density is not a good thing either.

Then there is the issue of cliques – if you have a bunch of people who all know each other from earlier, the others might feel “left out” and not be able to get into the conversation. There are likely to be “in-jokes” and “in-stories” which everyone else finds irrelevant. I remember being at one such gathering where I was the only person not thus “in”, so I got up, announced that I was getting bored, and walked out.

Anyway, so let us represent all attendees to a party or event in the form of an undirected graph. Each vertex represents an attendee, and two attendees are connected by an edge if they know each other from before. Given such a graph, can we construct an algorithm to “verify” whether it is a good set of people to have at the party? Oh, and this is one of the insights from yesterday’s NED Talks – computational complexity can be measured in terms of how “easy” it is to verify a given solution, rather than to generate a new one.

The first thing you can do is find the size of the largest clique – if it exceeds a certain proportion (a third, maybe? a fourth?) of the total number of attendees, it is a bad idea, for that clique might dominate the conversation.

Then you can calculate the “edge density” of the graph – the ratio of the number of edges in the graph to the number of possible edges (NC2 = N(N-1)/2, where N is the number of attendees). For example, the edge density of the first NED Talks was 3/26 (largely due to 5 attendees who were not connected to any other nodes). The edge density of the second NED Talks was 1/3 (it might have been higher, but a not-so-well-connected attendee backed out at the last moment). What range of edge density makes sense? Or should we also look at the variance in edge density (how evenly the edges are spread)?

Then there is the number of “components” in the graph – if the graph is mostly disconnected, the group might split up into small cliques, which might defeat the purpose of the networking event itself and lead to disconnected conversation. Note that nodes of zero degree don’t matter here – it’s components with at least two people that count.

And so forth. So can anyone help me build an algo to “verify” whether a party / networking event is going to be good, given a graph of who knows whom from before?
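Here is a rough sketch of such a verifier in R, with igraph; the thresholds (a third for the largest clique, the density band, the number of non-trivial components) are guesses pulled out of thin air, and tuning them is precisely the open question above.

    library(igraph)

    good_party <- function(g, max_clique_share = 1/3,
                           density_range = c(0.1, 0.4),
                           max_components = 2) {
      dens <- edge_density(g)
      nontrivial <- sum(components(g)$csize >= 2)  # ignore zero-degree nodes
      c(clique_ok     = clique_num(g) <= max_clique_share * vcount(g),
        density_ok    = dens >= density_range[1] && dens <= density_range[2],
        components_ok = nontrivial <= max_components)
    }

    # Example: 8 attendees, a few prior acquaintances, one loner (node 8)
    g <- make_graph(c(1,2, 2,3, 1,3, 4,5, 6,7), n = 8, directed = FALSE)
    good_party(g)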

Deranging groups

Ok so this is a mathematical problem. I plan to give three group assignments to my IIMB class. Let’s assume that there are 60 kids and for each assignment I want them to form groups of four members each. For the first assignment I’ve let them form the groups themselves.

For the second assignment, though, I want to create a “derangement” of these groups – in the sense that I want to form a different set of 15 groups of 4 members each, such that no pair of students who were in the same group for assignment 1 will be in the same group for assignment 2. And I’m looking for an algorithm to thus “derange” my students – and to know whether it is possible at all to derange them thus.

My inclination is that this might have a solution in graph theory – graph colouring, perhaps? Join every pair of students who were in the same group for assignment 1 with an edge. Then colour the nodes of the graph. Pick nodes of the same colour (these are students who haven’t been in a group together) and randomly assign them to new groups. Repeat for all colours.

The question is how many colours we need to colour the graph. If it’s planar, we’ll need only 4 colours! And considering that the first assignment has 4 students per group, the maximum degree of a node is 3. If the maximum degree of a node is 3, does that say anything about the planarity of the graph? (In fact, the conflict graph after assignment 1 is just 15 disjoint copies of K4 – each group of four is fully connected internally – which is planar and needs exactly 4 colours.) If I derange the students once for assignment 2, can I do it again for assignment 3 (now each node has a degree of 6!)? How do I think about this mathematically? Can you help?
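In the meantime, here is a lazier approach than colouring, sketched in R: list all the forbidden pairs from assignment 1 and rejection-sample random partitions until one avoids every such pair. A random partition repeats, on average, about 4.6 of the 90 forbidden pairs (each pair has a 3/59 chance of recurring), so roughly one draw in a hundred is clean and the loop terminates quickly. For assignment 3, add assignment 2’s pairs to the forbidden set and run it again (it takes more draws, but still works at this size).

    n_students <- 60
    group_size <- 4

    # groups as columns of a matrix: a random partition into 15 groups of 4
    random_partition <- function() matrix(sample(n_students), nrow = group_size)

    # all within-group pairs, encoded as "i-j" strings with i < j
    pairs_of <- function(groups) {
      c(apply(groups, 2, function(g)
        combn(sort(g), 2, function(p) paste(p, collapse = "-"))))
    }

    groups1 <- random_partition()   # stand-in for the assignment 1 groups
    forbidden <- pairs_of(groups1)

    repeat {
      groups2 <- random_partition()
      if (!any(pairs_of(groups2) %in% forbidden)) break
    }
    # groups2 is now a valid "derangement" of groups1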

Data Science and Software Engineering

I’m a data scientist. I’m good with numbers, and with handling large and medium-sized data sets (that doesn’t mean I’m bad at handling small data sets, of course). The work-related thing that gives me the most kicks is to take a bunch of data and, through a process of simple analysis, extract information out of it. To twist and turn the data – or, to use management jargon, “slice and dice” – and see things that aren’t visible to too many people. To formulate hypotheses, and use data to prove or disprove them. To represent data in simple but intuitive formats (e.g. graphs) so as to convey the information I want to convey.

I can count my last three jobs (including my current one) as being results of my quest to become better at data science and modeling. Unfortunately, none of these jobs have turned out particularly well (this includes my current one). The problem has been that in all these jobs, data science has been tightly coupled with software engineering, and I suck at software engineering.

Let me stop for a moment and tell you that I don’t mind programming. In fact, I love programming. I love writing code that makes my job easier, automates things, and gives me data in the formats I desire. But I hate software engineering. Writing code within a particular system or framework. Adhering to standards that someone else sets for “good code”. Following processes and making my code usable by some dumbfuck somewhere else who wouldn’t get it if I wrote it the way I wanted. As I’d mentioned earlier, I like coding for myself. I don’t like coding for someone else. And so I suck at software engineering.

Now I wonder if it’s possible at all to decouple data science from software engineering. My instinct tells me that it should be possible. That I need not write production-level code in order to turn my data-based insights into commercially viable form. Unfortunately, in my search around the corporatosphere thus far, I haven’t been able to find something of the sort.

Which makes me wonder if I should create my own niche, rather than hoping for someone else to create it for me.

Why You Should Not Do An Undergrad in Computer Science at IIT Madras

I did my undergrad in Computer Science and Engineering at IIT Madras. My parents wanted me to study Electrical Engineering, but I had liked programming back in school, and my JEE rank “normally” “implied” Computer Science and Engineering. So I just went with the flow and joined the course. In the short term, I liked some subjects, so I was happy with my decision. Moreover there was a certain aura associated with CS students back in IITM, and I was happy to be a part of it. In the medium term too, the computer science degree did open doors to a few jobs, and I’m happy for that. And I still didn’t regret my decision.

Now, a full seven years after I graduated with my Bachelors, I’m not so sure. I think I should’ve gone for a “lighter” course, but then no one told me. The thing with a B.Tech. in Computer Science and Engineering at IIT Madras is that it is extremely assignment intensive. Computer Science is that kind of subject – there is very little you can learn in the classroom. The best way to learn is by actually doing stuff, and “lab” is cheap (all you need is a bunch of computers), so most courses are filled with assignments. From the fourth semester onwards, you spend most of your time doing assignments. Yes, you do end up getting good grades on average, but you will have worked for them. And there’s no choice.

The thing with an undergrad is that you are clueless. You have no clue what you’re interested in, what kind of career you want to pursue, or what excites you. Yes, you have some information from school, and from talking to seniors, but it’s still very difficult to KNOW at seventeen what you want to do in life. From this perspective, it is important for you to keep your options as open as possible.

Unfortunately, most universities in India don’t allow you to switch streams midway through your undergrad (most colleges are siloed into “arts” or “engineering” or “medicine” and the like). IIT Madras is in fact better in this respect, since it allows you to choose a “minor” stream of study, with courses in the pure sciences and the humanities. But it is still impossible to change your stream midway. So how do you signal to the market that you are actually interested in something else?

One way is by doing projects in areas you think you are really interested in. Projects serve two purposes – first, they allow you to do real work in the chosen field, and find out for yourself whether it interests you. And if it does interest you, you have an automatic resume bullet point with which to pursue your career on that axis. Course-related projects are fine, but since they’re forced, you have no way out, and they will be especially unpleasant if you happen to not like the course.

So why is CS@IITM a problem? Because it is so hectic, it doesn’t give you the time to pursue your other interests. It doesn’t offer the kind of time you need to study and take on projects in other subjects (yes, it still offers the 3 + 1 months of vacation per year, when you can do whatever you want, but in the latter stages you’re so occupied with internships and course projects that you’re better off having time during the term). So if you, like me, find out midway through the course that you would rather do something else, there is that much less time for you to explore, study, and do projects in other subjects.

And there is no downside to joining a less hectic course. How hectic a course inherently is only sets a baseline. If you like the course, no one stops you from doing additional projects in the same subject. That way you get to do more of what you like, and you get additional bullet points. All for the good, right?

After I graduated, IIT Madras reduced its credit requirement by one-twelfth. I don’t know how effective that has been in reducing the inherent workload of students but it’s a step in the right direction. Nevertheless, if you are going to get into college now, make sure you get into a less hectic course so that the cost of making a mistake in selection is not high.

Dropping out

Less than a semester into my undergrad (Bachelor of Technology in Computer Science and Engineering at IIT Madras) I wanted to drop out, and start work. I didn’t want to be an “engineer”.

I didn’t know why I had to spend all my Thursday and Friday afternoons filing away at some piece of iron in the “fitting workshop”. I didn’t have the patience to draw three views of a random object in “engineering drawing”.

And I had the reputation of being one of the studdest programmers in my school. Apart from winning competitions here and there and doing well in acads, I had enormous respect from my peers for my programming skills. Given that it was a “high-performance school” (which subjected its own 10th standard students to a test before admitting them to the 11th), I guess this peer respect did carry some weight.

So, being good at math, and having the reputation of being a stud programmer, I didn’t know what I was doing studying “engineering”. I wanted to be a programmer, and I wanted to drop out and take up a job. My JEE rank counted for almost as much as an IIT degree, I thought. But I didn’t have the balls, and I continued.

In hindsight, I’m happy I didn’t drop out. By the end of my second year, I knew for sure that I DIDN’T want to be a programmer. While the theoretical aspects of Computer Science excited me (algo analysis and stuff), I had absolutely no patience for “systems”, or “computer engineering”. I was perhaps alone in my class in my love for Microsoft products (easy to use).

I realized then that I liked only the algorithmic aspect of programming, where one solves a (mostly math) problem and codes it up in a simple program. Huge complicated systems-intensive programming, making GUIs etc. didn’t inspire me at all.

Looking back, all that “major” (i.e. Computer Science and Engineering) stuff that I’ve learnt and internalized was learnt in my first two years of engineering. Of course several concepts that are part of CS&E are taught in the last two years, but I ended up not liking any of that.

Looking back, I do find it positive that I did all those “general engineering” courses. I do find it really positive that we had to do 12 compulsory credits in Humanities and Social Sciences, for that allowed me to discover what I was really interested in, and indirectly led me to doing my MBA.

I have only one regret: that I wasn’t able to switch streams midway. That IIT, being a one-dimensional, technology-oriented university, didn’t allow me to transfer credits to a course I would’ve liked better, simply because it offered undergrad courses only in engineering.

There was a humanities department, where I discovered what I was interested in, but unfortunately it was a “minor” department. This has been partly rectified now, with the setting up of integrated MA courses, in Economics, etc. (if such a course had existed back when I was studying, there’s a good chance I’d’ve transferred to it from CS&E). But it’s not enough.

Kids at 17 have no clue what they want to do. What we need are flexible full-scale universities, which allow you to switch from any branch to any other branch after two years of reasonably generalized study (the earlier branch can then contribute to “minor” credits). We need to stop putting our colleges in silos such as “engineering”, “arts and science”, etc. Only then would our universities be truly world class, even from an undergraduate point of view.

And looking back, I’m really happy I didn’t drop out.

Coding

Back when I was in school (11th/12th), I think I was an awesome coder. I think I was especially good at what they called “logic coding”, i.e. coming up with algos. I used to experiment quite a bit (as much as was possible with Turbo C) and had a lot of fun too. I remember doing graphics in Turbo C, making a “pong” game, a brick breaker, and a lot of other cool stuff. For our 12th standard project, Hareesh and I built this totally awesome cricket scoring program, which we unfortunately didn’t take forward (we went to college instead).

It was my love for coding that meant I fought with my parents (who wanted me to study Electrical) and decided to study Computer Science at IIT Madras. And then I lost it, somewhere along the way. I didn’t enjoy coding any more. Soon, I began to hate coding. I would love it when I wrote the odd program in “pure” C, or when I participated in contests such as BITWise. But I’d completely lost it.

So over the last six to seven years (after I graduated from IIT) there have been occasions when I thought I had regained my coding mojo, only to lose it again very soon. I’m still very proud of that Excel+VBA model I wrote in the very first week of my third job. But a couple of months later, I was hating coding again. And it was while debugging a complicated piece of code at work this morning that I realized why I have this love-hate relationship with coding.

It’s simple – basically, I hate coding for others. I hate writing code that others will read or use. I don’t mind writing code that others would use as a black box, of course. But writing code that others will read or use puts too many constraints on the way you code. My instinct is always to stop doing something when I’m personally satisfied with it, and with code it seems I’m satisfied sooner than others would be.

At a fundamental level, I like coding and I think I’m pretty good at it, so it isn’t something I want to give up. But the formal processes and endless testing involved in writing code for others really kill the joy (as do GUIs, and Java). Code saves a lot of time, and helps “studdize” what might otherwise be fighter work, so I like doing it.

In an ideal world, I would be writing code that I alone would use, AND profiting from it (I never intend to sell code; I intend to sell the results of the said code, however; that would mean no one else would read or use my code per se, so I can write it the way I want). Hopefully I’ll get there sometime.