Maths, machine learning, brute force and elegance

Back when I was at the International Maths Olympiad Training Camp in Mumbai in 1999, the biggest insult one could hurl at a peer was to describe their solution to a problem as a “brute force solution”. Brute force solutions, which were often ungainly, laboured and unintuitive, were supposed to be the last resort, to be used only if one was thoroughly unable to implement an “elegant solution” to the problem.

Mathematicians love and value elegance. While they might be comfortable with esoteric formulae and the Greek alphabet, they are always on the lookout for solutions that are, at least to the trained eye, intuitive to perceive and understand. Among other things, there is the belief that it is much easier to get an intuitive understanding of an elegant solution.

When all the parts of the solution seem to fit so well into each other, with no loose ends, it is far easier to accept the solution as being correct (even if you don’t understand it fully). Brute force solutions, on the other hand, inevitably leave loose ends, and appreciating them can be a fairly massive task, even for trained mathematicians.

In the conventional view, though, non-mathematicians don’t have much fondness for elegance. A solution is a solution, and a problem solved is a problem solved.

With the coming of big data and increased computational power, however, the tables are getting turned. Now it is the more mathematical people, who are more likely to appreciate “machine learning” algorithms, who recommend “leaving it to the system” – unleashing the brute force of computational power at the problem so that the “best model” can be found, and later implemented.

And in this case, it is the “half-blood mathematicians” like me, who are aware of complex algorithms but are unsure of letting the system take over stuff end-to-end, who bat for elegance – to look at data, massage it, analyse it and then find that one simple method or transformation that can throw immense light on the problem, effectively solving it!

The world moves in strange ways.

Schoolkid fights, blockchain and smart contracts

So I’ve been trying to understand the whole blockchain thing better, since people nowadays seem to want to use it for all kinds of contracts (even the investment bankers are taking interest, which suggests there’s some potential out there 😛 ).

One of the things I’ve been doing is to read this book (PDF) on Blockchain by Arvind Narayanan and co at Princeton. It’s an easy-to-read, yet comprehensive, take on bitcoin and cryptocurrency technologies, the maths behind them, and so on.

And as I’ve been reading it, I’ve been developing my own oversimplified model of what blockchain and smart contracts are, and this is my attempt at explaining it.

Imagine that Alice and Bob are two schoolkids and they’ve entered into a contract which states that if Alice manages to climb a particular tree, Bob will give her a bar of chocolate. Alice duly climbs the tree and claims the chocolate, at which point Bob flatly denies that she climbed it and refuses to give her the chocolate. What is Alice to do?

In the conventional “contract world”, all that Alice can do is to take the contract that she and Bob had signed (assume they had formalised it) to a court of law (a schoolteacher, perhaps, in this case), which will do its best to determine whether she actually climbed the tree, and then deliver a judgment.

As you may imagine, in the normal schoolkid world, going to a teacher to adjudicate on whether someone climbed a tree (most likely an “illegal” activity by school rules) is not the greatest way to resolve the fight. Instead, Alice and Bob will either try to resolve it by themselves, or call upon their classmates to do the same. This is where the blockchain comes in.

Simply put, in terms of the blockchain “register”, as long as more than half of Alice and Bob’s classmates agree that she climbed the tree, she is considered to have climbed the tree, and Bob will be liable to give her chocolate. In other words, the central “trusted third party” gets replaced by a decentralised crowd of third parties where the majority decision is taken to be the “truth”.

Smart contracts take this one step further. Bob will give the bar of chocolate to the collective trust of his classmates (the adjudicators). And if a majority of them agree that Alice did climb the tree, the chocolate will be automatically given to her. If not, it will come back to Bob. What blockchain technologies allow for is to write code in a clever manner so that this gets executed automatically.

This might be a gross oversimplification, but it is essentially how the blockchain works. Each transaction is considered “valid” and put into the blockchain if a majority of nodes agree it’s valid. And to ensure that this voting doesn’t get rigged, the nodes (or judges) need to solve a difficult computational puzzle in order to be able to vote. This imposes an artificial cost on voting, which ensures that it’s not possible to rig the polls unless you can take over more than half the nodes – and in a global blockchain with a really large number of nodes, that is not feasible.
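To make the “difficult computational puzzle” bit concrete, here is a toy sketch in R of the kind of proof-of-work puzzle bitcoin uses – keep trying nonces until the hash of the block starts with enough zeroes. (This is my own illustration, not anything from the book; it assumes the digest package, the block contents are made up, and real difficulty targets are far stricter.)

```r
library(digest)  # provides SHA-256 hashing

# find a nonce such that sha256(block_data, nonce) starts with `difficulty` zeroes
mine <- function(block_data, difficulty = 3) {
  target <- strrep("0", difficulty)
  nonce <- 0
  repeat {
    h <- digest(paste0(block_data, nonce), algo = "sha256", serialize = FALSE)
    if (startsWith(h, target)) return(list(nonce = nonce, hash = h))
    nonce <- nonce + 1
  }
}

mine("Alice climbed the tree")  # ~16^3 = 4096 attempts on average for 3 zeroes
```

The point is that verifying the answer is trivial (hash once and look at it), while finding it takes real work – which is what makes each “vote” costly.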

So when you see that someone is building a blockchain based solution for this or that, you might wonder whether it actually makes sense. All you need to do is to come back to this schoolkid problem – for the kind of dispute that is likely to arise from this problem, would the parties prefer to go to a mutually trusted third party, or leave it to the larger peer group to adjudicate? Using the blockchain is a solution if and only if the latter case is true.

Programming back to the 1970s

I learnt to write computer code circa 1998, at a time when resources were plenty. I had a computer of my own – an assembled desktop with a 386 processor and RAM that was measured in MBs. It wasn’t particularly powerful, but it was more than adequate to handle the programs I was trying to write.

I wasn’t trying to process large amounts of data. Even when the algorithms were complex, they weren’t that complex. Most code ran in a matter of minutes, which meant that I didn’t need to bother about getting the code right the first time round – except for examination purposes. I could iterate and slowly get things right.

This was markedly different from how people programmed back in the 1970s, when computing resources were scarce and people mostly had to write code on paper. Time had to be booked at computer terminals, when the code would be copied onto the computers and then run. The amount of time it took for the code to run meant that you had to get it right the first time round. Any mistake meant standing in line at the terminal again, and yet more time waiting for the code to run.

The problem was particularly dire in the USSR, where the planned economy meant that shortages of computer resources were more acute. This has been cited as a reason why Russian programmers who migrated to the US were prized – they had practice writing code that worked the first time.

Anyway, the point of this post is that coding became progressively easier through the second half of the 20th century, when Moore’s Law was in operation, and computers became faster, smaller and significantly more abundant.

This process continues – computers continue to become better and more abundant – smartphones are nothing but computers. On the other side, however, as storage has gotten cheap and data capture has gotten easier, data sources are significantly larger now than they were a decade or two back.

So if you are trying to write code that uses a large amount of data, each run can take a significant amount of time. When the data size reaches big data proportions (when it can’t all be processed on a single computer), the problem is even more complex.

And in that sense, every time you want to run a piece of code, however simple it is, execution takes a long time. This has made bugs much more expensive again – the amount of time programs take to run means that you lose a lot of time in debugging and rewriting your code.

It’s like being in the 1970s all over again!

Dreaming on about machine learning

I don’t know if I’ve written about this before (that might explain how I crossed 2000 blogposts last year – multiple posts about the same thing), but anyway – I’m writing this listening to Aerosmith’s Dream On.

I don’t recall when the first time was that I heard the song, but I somehow decided that it sounded like Led Zeppelin. It was before 2006, so I had no access to services such as Shazam to search effectively. So for a long time I continued to believe it was by Led Zep, and kept going through their archives to locate the song.

And then in 2006, Pandora happened. It became my full-time work-time listening (bless those offshored offices with fast internet and US proxies). I would seed stations with songs I liked (back then there was no option to directly play songs you liked – you could only seed stations). I discovered plenty of awesome music that way.

And then one day I had put on a Led Zeppelin station and started work. The first song was by Led Zeppelin itself. And then came Dream On, and I figured it was a song by Aerosmith. While I chided myself for not having identified the band correctly, I was happy that I hadn’t been that wrong – given that Pandora uses machine learning on song patterns to identify similar songs, the fact that Dream On had appeared in a Led Zep playlist meant that I hadn’t been too far off in identifying it with that band.

Ten years on, I’m not sure why I thought Dream On was by Led Zeppelin – I don’t see any similarities any more. But maybe the algorithms know better!

Coin change problem with change – Dijkstra’s Algorithm

The coin change problem is a well-studied problem in Computer Science, and a popular example used to teach students Dynamic Programming. The problem is simple – given an amount and a set of coins, what is the minimum number of coins that can be used to pay that amount?

So, for example, if we have coins of denominations 1, 2, 5, 10, 20, 50 and 100 (like we do now in India), the easiest way to pay Rs. 11 is by using two coins – 10 and 1. If you have to pay Rs. 16, you can break it up as 10+5+1 and pay it using three coins.
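The standard dynamic programming solution builds up from zero: if $latex C(a)$ is the minimum number of coins needed to pay amount $latex a$, then $latex C(0) = 0$ and $latex C(a) = 1 + \min_{c \leq a} C(a-c)$, where the minimum is taken over all coin denominations $latex c$ not exceeding $latex a$.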

The problem with the traditional formulation of the coin change problem is that it doesn’t involve “change” – the payer is not allowed to take back coins from the payee. So, for example, if you’ve to pay Rs. 99, you need to use 6 coins (50+20+20+5+2+2). On the other hand, if change is allowed, Rs. 99 can be paid using just 2 coins – pay Rs. 100 and get back Re. 1.

So how do you determine the way to pay using the fewest coins when change is allowed? In other words, what happens to the coin change problem when negative coins can be used? (Paying 100 and getting back 1 is the same as paying 100 and (-1).)

Unfortunately, dynamic programming doesn’t work in this case, since we can no longer process amounts in increasing order. For example, the optimal way to pay 9 rupees when negatives are allowed is to break it up as (+10, -1) – we need the solution for 10 before we can compute the solution for 9, so calculating from 0 upwards (as we do in the DP) doesn’t work.

For this reason, I’ve used an implementation of Dijkstra’s algorithm to determine the minimum number of coins to be used to pay any amount when cash back is allowed. Each amount is a node in the graph, with an edge between two amounts if the difference in amounts can be paid using a single coin. So there is an edge between 1 and 11 because the difference (10) can be paid using a single coin. Since cash back is allowed, the graph need not be directed.

So all we need to do to determine the way to pay each amount most optimally is to run Dijkstra’s algorithm starting from 0. And since every edge has the same weight (each represents one coin), Dijkstra’s algorithm here reduces to a breadth first search, which has complexity $latex O(Mn)$, where M is the maximum amount we want to pay and n is the number of coins.

I’ve implemented this algorithm using R, and the code can be found here. I’ve also used the algorithm to compute the number of coins to be used to pay all numbers between 1 and 10000 under different scenarios, and the results of that can be found here.
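In case the links rot, here is a minimal sketch of the idea (a from-scratch reconstruction, not the code linked above). Amounts are capped at the amount to be paid plus the largest coin, which suffices for typical coin systems:

```r
# minimum coins to settle `amount` when change (negative coins) is allowed
min_coins <- function(amount, coins) {
  cap <- amount + max(coins)
  dist <- rep(Inf, cap + 1)   # dist[a + 1] = fewest coins to settle amount a
  dist[1] <- 0                # paying 0 requires no coins
  queue <- 0
  while (length(queue) > 0) {
    cur <- queue[1]; queue <- queue[-1]
    for (nxt in c(cur + coins, cur - coins)) {  # change allowed: edges go both ways
      if (nxt >= 0 && nxt <= cap && is.infinite(dist[nxt + 1])) {
        dist[nxt + 1] <- dist[cur + 1] + 1      # first visit in a BFS is optimal
        queue <- c(queue, nxt)
      }
    }
  }
  dist[amount + 1]
}

min_coins(99, c(1, 2, 5, 10, 20, 50, 100))  # 2: pay 100, get 1 back
```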

Feel free to use this algorithm, code or results in any of your work, but make sure you provide appropriate credit!

PS: I’ve used “coin” here in a generic sense, in that it can mean “note” as well.

The Birthday Party Problem

Next Tuesday is my happy birthday. As of now, I’m not planning to have a party. And based on some deep graph-theoretic analysis that the wife and I just did over the last hour, it’s unlikely I will – for it seems that forming a coherent set of people to invite is an NP-hard problem.

So five birthdays back we had a party, organised by the wife and meant as a surprise to me. On all counts it seemed like a great party. Except that the guests decided to divide themselves into one large clique and one smaller clique (of 2 people), leaving me as the cut vertex trying to bridge these cliques. That meant the onus was on me to make sure the tiny clique felt included in the party, and it wasn’t a lot of fun.

The problem is this – how do you invite a subset of friends for a party so that intervention by the host to keep guests entertained is minimised?

Let’s try and model this. Assume your friends network can be represented by an unweighted undirected graph, with a pair of friends being connected by an edge if they know (and get along with) each other already. Also assume you have full information about this graph (not always necessary).

The problem lies in selecting a subgraph of this graph such that you can be confident that it won’t break into smaller pieces (since that will mean you bonding with each such sub-group), and no guest feels left out (since the onus of making them comfortable will fall on you).


Firstly, the subgraph needs to be connected. Then, we can safely eliminate all guests who have a degree of either zero or one (the former is obvious; the latter because they’ll be too needy of their only friend). In fact, we can impose the condition that each guest should have a minimum degree of two even in the subgraph.

Then we need to impose conditions on a group in the party breaking away. We can assume that for a group of people to break away, they need to be a clique (it is not a robust requirement, since you and someone you find at a party can suddenly decide to find a room, but reasonable enough).

We can also assume that for a group to break away, the strength of their mutual connections should outweigh the strength of their connections to the rest of the party. Since we’re using unweighted graphs here, we can simply assume that a group can break away if the number of edges between this group and the rest of the network is smaller than the size of the group.

So if there is a group of three who, between them, have two connections to the rest of the party, the group can break away. Similarly, a clique of four will break away from the main group if they have three or fewer edges going over. And let’s assume that the host is not a part of any such subgroup of guests.

Given these constraints, and constraints on party size (minimum and maximum number of guests to invite), how can we identify an appropriate subset of friends to invite to the party? I’m assuming this problem is NP-hard (without thinking too much about it) – so can we think of a good heuristic to solve it?
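For what it’s worth, here is one rough heuristic, sketched in R using the igraph package (the function and its thresholds are my own construction, and I haven’t proven anything about it): restrict the graph to its 2-core, so that everyone has degree at least two, and then flag any clique that satisfies the breakaway condition above.

```r
library(igraph)

# flag cliques in the 2-core that could "break away":
# those with fewer edges to the rest of the party than members
risky_cliques <- function(g) {
  core <- induced_subgraph(g, which(coreness(g) >= 2))  # everyone has degree >= 2
  A <- as_adjacency_matrix(core, sparse = FALSE)
  risky <- list()
  for (S in cliques(core, min = 2)) {
    s <- as.integer(S)
    if (length(s) == vcount(core)) next   # the whole party is not a breakaway
    boundary <- sum(A[s, -s])             # edges from the clique to the rest
    if (boundary < length(s)) risky <- c(risky, list(s))
  }
  risky
}
```

An invite list would then pass if its subgraph is connected and risky_cliques() comes back empty; if not, drop a guest and try again. Enumerating all cliques is itself exponential in the worst case, so this is at best a heuristic for small parties.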

Do let me know the answer before next Tuesday, else I may not be able to have a party this time as well!

The one bit machine

My daughter is two weeks old today and she continues to be a “one bit machine”. The extent of her outward communication is restricted to a maximum of one bit of information. There are basically two states her outward communication can fall under – “cry” and “not cry”, and given that the two are not equally probable, the amount of information she gives out is strictly less than one bit.
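To put a number on it: if she cries, say, a fifth of the time (an illustrative figure – I haven’t measured it), the information content works out to $latex -0.2 \log_2 0.2 - 0.8 \log_2 0.8 \approx 0.72$ bits per signal.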

I had planned to write this post two weeks back, the day she was born, and wanted to speculate how long it would take for her to expand her repertoire of communication and provide us with more information on what she wants. Two weeks in, I hereby report that the complexity of communication hasn’t improved.

Soon (I don’t know how soon) I expect her to start providing us more information – maybe there will be one kind of cry when she’s hungry, and another when she wants her diaper changed. Maybe she’ll start displaying other methods of outward communication – using her facial muscles, for example (right now, while she contorts her face in a zillion ways, there is absolutely no information conveyed), and we can figure out with greater certainty what she wants to convey.

I’m thinking of drawing a graph with the age of the person on the X axis, and the complexity of outward information on the Y axis. It starts off with X = 0 and Y = 1 (I haven’t bothered measuring the frequency of cry/no-cry responses, so let’s assume it’s equiprobable and she conveys one bit). It goes on to X = 14 days and Y = 1 (today’s state). And then it increases with time (I’m hoping).

While I’m sure research exists somewhere on the information content per syllable in adult communication, I hope to draw this graph sometime based on personal observation of my specimen (though that would limit it to one data point).

Right now, though, I can only speculate on what shape this graph might take. Considering it has so far failed to take off at all, I hope that it’ll be either an exponential (good in the short term, but who knows in the long term) or a sigmoid (more likely, I’d think).

Let’s wait and see.

Dining philosophers in action

When we learnt semaphores as part of our Operating Systems course during undergrad, one of the illustrations used was the “dining philosophers’ problem”.

The situation is simple – there are six philosophers seated around a table, with six spoons placed between them. Each philosopher (the problem is not sensitive to the profession of the eater) now has to pick up a spoon. It is not specified whether a philosopher should pick up the spoon on his left or the one on his right.

This is now a coordination problem. If all philosophers go the same way, all is good, since each of them gets exactly one spoon. If any two philosophers go the opposite ways, it results in two philosophers fighting over the same spoon while one spoon remains unclaimed. The use of semaphores in the solution to this problem is outside the scope of this blogpost.
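The coordination version is easy to simulate, though. Here is a quick R sketch (my own construction, nothing from the course) of philosophers reaching for spoons at random:

```r
# n philosophers at a round table; spoon i sits between philosophers i and i+1
# each philosopher reaches left or right at random; count contested spoons
contested_spoons <- function(n = 6) {
  choice <- sample(c("L", "R"), n, replace = TRUE)
  claimed <- ifelse(choice == "R", 1:n, c(n, 1:(n - 1)))  # which spoon each grabs
  sum(duplicated(claimed))
}

# everyone eats only if all go the same way: probability 2 / 2^n
mean(replicate(100000, contested_spoons()) == 0)  # ~0.031 for n = 6
```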

Back when I learnt about the dining philosophers, it seemed like a rather esoteric and academic exercise (one minor point of hilarity occurred when this particular chauvinist in my hostel refused to read this book on operating systems by Silberschatz and Galvin because the book referred to philosophers as “she”. “How can philosophers be female, da?” reasoned this guy). It seemed unlikely that it would actually occur in real life.

Until it did tonight. The wife’s graduation ceremony was followed by cocktails and a formal sit-down dinner. The dinner had been arranged around round tables, with large plates set in front of each chair. Bread plates had been placed between the large plates.

So there were eleven of us around the table, with eleven plates in front of us and eleven bread plates between us. The positioning of the bread plates meant that each of us could have picked either the plate on our left or the plate on our right (the placement was symmetric with respect to our seats). As the wines started to be brought in by the bearers, it was time to consume the bread.

Our table consisted of eleven Indians, all of whom were unaware of the convention regarding bread plates. All of us wanted to eat bread, though. Had any of us known the convention, that person would’ve confidently swooped, and the rest of the table would have followed. With all of us being unaware, we stared at one another blankly, waiting for someone to move. A perfect Dining Philosophers Problem had been set up (dining MBAs and families, to be more precise).

We proved to be inefficient philosophers. Some moved left, others moved right. Those that saw people go left followed left. Those who saw people go right followed right. Inevitably there was a conflict, as my mother-in-law and I reached for the same bread plate.

The conflict was resolved by looking around the table to see the side that was in majority – most people had gone for the bread plates on their right. The rest made adjustments and bread was broken. It would turn out later that we were all wrong – the convention is for the bread plate to be placed to the eater’s left.

Ordinarily I might have been disturbed by my lack of knowledge of social conventions, and the resulting faux pas. Seeing the Dining Philosophers’ Problem in action more than compensated for that today! Given my excitement at watching the problem unfold, the others at the table might have thought I’d had a glass of wine too many.

Simulating segregation

Back in the 1970s, economist Thomas Schelling proposed a model to explain why cities are segregated. Individual people choosing to live with others like themselves would have the macroscopic impact of segregating the city, he had explained.

Think of the city as being organised in terms of a grid, where each person has 8 neighbours (including the diagonals). If a person has fewer than 3 neighbours who are like himself (whether that means race, religion, caste or football fandom doesn’t matter), he decides to relocate, and moves to an arbitrary empty spot where at least 3 of the new neighbours are like himself. Repeat this a sufficient number of times and the city will be segregated, he said.

Rediscovering this concept while reading this wonderful book on Networks, Crowds and Markets yesterday, I decided to code it up on a whim. It’s nothing that’s not been done before – all you need to do is to search around and you’ll find plenty of code with the simulations. I just decided to code it myself from first principles as a challenge.

You can find the (rather badly written) code here. Here is some sample output:

Sample output

As you can see, people belong to two types – red and blue. Initially they start out randomly distributed (white spaces show empty areas). Then people start moving based on Schelling’s rule – if fewer than 3 of your neighbours are of your kind, you move to a new empty place (if one is available) that is more friendly to you. Over time, you see that you get a segregated city, with large-ish patches of red and blue.
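The core of such a simulation is small. Here is a compact reconstruction in R (a sketch of the rule as stated, not the code linked to above; the grid and population sizes are the ones described below):

```r
# one run of Schelling's model: 0 = empty, 1 = red, 2 = blue
schelling <- function(side = 50, n_each = 900, threshold = 3, steps = 100000) {
  grid <- matrix(0L, side, side)
  grid[sample(side * side, 2 * n_each)] <- rep(1:2, each = n_each)

  # number of neighbours (8-cell window) sharing the colour of cell (r, c)
  same_neighbours <- function(r, c) {
    rs <- max(1, r - 1):min(side, r + 1)
    cs <- max(1, c - 1):min(side, c + 1)
    sum(grid[rs, cs] == grid[r, c]) - 1   # subtract the cell itself
  }
  idx_rc <- function(i) c((i - 1) %% side + 1, (i - 1) %/% side + 1)

  for (step in seq_len(steps)) {
    cell <- sample(which(grid != 0L), 1)     # pick a random person
    rc <- idx_rc(cell)
    if (same_neighbours(rc[1], rc[2]) >= threshold) next  # happy, stays put
    for (e in sample(which(grid == 0L))) {   # look for a friendlier empty spot
      grid[e] <- grid[cell]                  # try it out
      erc <- idx_rc(e)
      if (same_neighbours(erc[1], erc[2]) >= threshold) { grid[cell] <- 0L; break }
      grid[e] <- 0L                          # not friendly enough; undo
    }
  }
  grid
}

image(schelling(), col = c("white", "red", "blue"))
```

(This is unoptimised and will take a while to run, but it reproduces the qualitative behaviour.)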

The interesting thing to note is that there is no “complete segregation” – there is no one large red patch and one large blue patch. Secondly, segregation seems rather slow at first, but soon picks up pace. You might also notice that the white spaces expand over time.

This is for one specific input, where there are 2500 cells (a 50 by 50 grid), and we start off with 900 red and 900 blue people (meaning 700 cells are empty). If you change these numbers, the pattern of segregation changes. When there are too few empty cells, for example, the city remains mixed – people unhappy with their neighbourhood have nowhere to go. When there are too many empty cells, you’ll see that the city contracts. And so forth.

Play around with the code (I admit I haven’t written sufficient documentation), and you can figure out some more interesting patterns by yourself!

Large sites and universally accessible blocks

Currently reading this paper by Brelsford, Martin, Hand and Bettencourt of the Santa Fe Institute (did I tell you I just got my first MOOC certificate from this institute last week?) on the topology of cities. I have only a rudimentary understanding of topology (thanks to an excellent session with Dr. V Vinay), but considering that I know Graph Theory fairly well, I’ve been able to follow the paper so far.

The paper talks about “universally accessible blocks” in cities, which are basically blocks where each unit can be accessed directly by road without going through other units. Developed cities, the paper argues, have mostly universally accessible blocks, while non-universally accessible blocks are artefacts of less developed countries and old cities.

The problem with non-universally accessible blocks is that the “inner units” in such blocks (those not directly accessible by road) are mostly deprived of public and emergency services, and this affects the quality of life in such blocks. The paper, for example, notes that it is mostly slums that have such architecture, and that newly developed cities usually try to have universally accessible blocks.

When Bangalore was developed in a planned fashion starting in the 1950s (led by the “City Improvement Trust Board” which later morphed into the “Bangalore Development Authority”), a number of new areas were designed for large houses. Large sites were allotted, and regulations framed such that buildings on such sites be sparse (they were called “bungalow sites”). The part of Bangalore I live in, Jayanagar, for example, has a large concentration of such bungalow sites.

While in theory such sites make sense, the fact is that not too many people were enthused about sparse buildings on such sites. So they took advantage of loopholes in regulation (even the best-designed policies have loopholes) to build multiple buildings on the same site. Later on, these sites got partitioned into smaller sites, with at least one building on each smaller site. As a result of this partitioning, a large number of the units thus created were not “accessible”.

Allotting big sites and getting people to build big houses on them in order to “lead development” into a new area might have been a great idea in theory, but the fact that most people could not afford to build such big houses, combined with loopholes in regulation, resulted in non-accessible units! Of course this results in lower infrastructure costs (since the road network is sparser than it would otherwise need to be), but it comes at a price, since not everyone has equal access to infrastructure.

As a wise man once said, #thatzwhy we need strong regulation.