Channel Coding Theorem in Real Life

One of my favourite concepts in Computer Science is Shannon’s Channel Coding Theorem. This theorem is basically about the efficiency of communication over a noisy channel. And as I was thinking a few minutes back, it has interesting implications in real life as well, far away from the theory of communication.

I don’t have that much understanding of the rigorous formulation of the theorem. However, I absolutely love its central idea – that the noisier a channel is, the more redundancy you need in your communication, and thus the slower your communication. A corollary of this is that every channel has a “natural maximum speed” (what information theorists call the channel capacity), and as long as you communicate within that speed, you can communicate reliably.

I won’t go into the technical details here – that involves assuming that the channel loses (or garbles) X% of bits, and then constructing a redundant code that shows that even with this loss, you can communicate effectively.
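For the curious, though, here is a minimal sketch (my own illustration in Python, not anything from Shannon’s proof) of the simplest version of this: the binary symmetric channel, where each bit gets flipped with probability p. The capacity works out to 1 − H(p) bits per use, and a naive repetition code shows the redundancy-speed tradeoff:

```python
import math

def binary_entropy(p):
    """Entropy (in bits) of a binary event with probability p."""
    if p in (0, 1):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def bsc_capacity(p):
    """Capacity of a binary symmetric channel that flips each bit with probability p."""
    return 1 - binary_entropy(p)

def repetition_error(p, n):
    """Probability a bit is decoded wrongly if sent n times (n odd) and majority-decoded."""
    return sum(math.comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(n // 2 + 1, n + 1))

for p in (0.01, 0.1, 0.3):
    print(f"flip probability {p}: capacity {bsc_capacity(p):.3f} bits/use")
    # noisier channel -> more repetitions for the same reliability -> slower communication
    for n in (1, 3, 9):
        print(f"  repeat x{n}: error {repetition_error(p, n):.5f}, rate {1/n:.2f} bits/use")
```

The noisier the channel, the lower its capacity, and the more repetitions you need for a given reliability – exactly the “slower communication” above. Shannon’s result is that clever codes can do far better than naive repetition, but can never beat the capacity.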

Anyway, let’s leave behind the theory of communication and go on to real life.

I’ve found that I communicate badly when I’m not sure what language to talk in. If I’m talking in English with someone who I know knows good English, I communicate rather well (like my writing 😛 ) . However, if I’m not sure about the quality of language of the other person, I hesitate. I try to force myself to find simpler / more obvious words, and that disturbs my flow of thought, and I stammer.

Similarly, when I’m not sure whether to talk in Kannada or English (the two languages I’m very comfortable in), I stammer heavily. Again, because I’m not sure if the words I would naturally use will be understood by the other person (the counterparty’s comprehension being the “noise in the channel” here), I slow down, get jittery, and speak badly.

Then of course, there is the very literal interpretation of the channel coding theorem – when your internet connection (or call quality in general) is bad, you end up having to speak slower. When I was hunting for a job in 2020, I remember doing badly in a few interviews because of the quality (or lack thereof) of the internet connection (this was before I had discovered that Google Meet performs badly on Safari).

Similarly, sometime last month, I thought I had prepared well for what was going to be a key conversation at work. The internet was bad, we couldn’t hear each other and kept repeating ourselves (redundancy is how you overcome the noise in the channel), and that diminished throughput massively. Given the added difficulty in communication, I didn’t bring up the key points I had prepared. It was a damp squib.

Related to this is when you aren’t sure if the person you are speaking to can hear clearly. This again clouds the communication channel, meaning you need to build in redundancy, and thus suffer a reduction in throughput.

When you are uncertain of yourself, or underconfident, you tend to do badly. That is because when you are uncertain, you aren’t sure if the other person will fully understand what you are going to say. Consequently, you end up talking slower, building redundancy into your speech, and so on. You are more doubtful of what you are going to say, and don’t take risks, since your lack of confidence has clouded the “communication channel”, thus depressing your throughput.

Again a lot of this might apply to me alone – I function best when I’m talking / writing at a certain minimum throughput, and operating at anywhere below that makes me jittery and underconfident and a bad communicator. It is no surprise that my writing really took off once I got a computer of my own.

That was in the beginning of July 2004, and within a month, I had started (the predecessor of) this blog. I’ve been blogging for 19 years now.

That aside aside, the channel coding theorem works in non-verbal contexts as well. Back in 2016, before my daughter was born, I remember reading somewhere that tentative mothers lead to cranky babies. The theory was that if the mum was anxious or afraid while handling her baby, the baby wouldn’t perceive the signals of touch sufficiently, and, deprived of communication, would become cranky.

We had seen a few examples of this among relatives and friends (and this possibly applies to me as well – my mother had told me that I was the first newborn she ever handled, and so she was a bit tentative in handling me). This again can be explained using the Channel Coding Theorem.

When the mother’s touch is tentative, it is as if the touch channel between mother and child has some “noise”. The tentativeness of the touch means the baby is not really sure of what the mother is “saying”. With touch, unlike language or bits, redundancy is harder. And so the child grows up insufficiently connected to its mother.

Conversely, later on in life, these tentative mothers tend to bring in redundancy in their communications with their (now jittery) children, and end up holding them too hard, and not letting them go (and some of these children go to therapists, who inevitably blame it on the mothers 😛 ). Ultimately, all of this stems from the noise in the initial communication channel (thanks to the tentativeness of the source).

Ok I’ve rambled on here, so will stop now. However, now that I’ve seeded this thought in you, you too will start seeing the channel coding theorem everywhere (oh – if you think this post is badly written, then that is again like reading this over a noisy channel. And you will get irritated with the lack of throughput and pack up).

Compression Stereotypes

One of the most mindblowing things I learnt while I was doing my undergrad in Computer Science and Engineering was Lempel-Ziv-Welch (LZW) compression. It’s one of the standard compression algorithms used everywhere nowadays.

The reason I remember this is twofold – firstly, I remember implementing this as part of an assignment (our CSE program at IITM was full of those), and feeling happy to be coding in C rather than in the dreaded Java (which we had to use for most other assignments).

The other is that this is one of those algorithms that I “internalised” while doing something totally different – in this case I was having coffee/tea with a classmate in our hostel mess.

I won’t go into the algorithm here. However, the basic concept is that as and when we see a new pattern, we give it a code, and every subsequent occurrence of that pattern is replaced by its corresponding code. And the beauty of it is that you don’t need to ship a separate dictionary – the compressed code itself encapsulates it.
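For the curious, here is a minimal toy sketch of the idea in Python (a simplified character-based version, not the exact variant production formats use). Notice that the decompressor rebuilds the dictionary on the fly, which is why nothing needs to be shipped separately:

```python
def lzw_compress(text):
    """Toy LZW: emit a code whenever a previously unseen pattern appears."""
    dictionary = {chr(i): i for i in range(256)}  # start with single characters
    next_code, current, output = 256, "", []
    for ch in text:
        candidate = current + ch
        if candidate in dictionary:
            current = candidate                # keep extending a known pattern
        else:
            output.append(dictionary[current])
            dictionary[candidate] = next_code  # register the new pattern
            next_code += 1
            current = ch
    if current:
        output.append(dictionary[current])
    return output

def lzw_decompress(codes):
    """Rebuilds the dictionary from the codes themselves - no dictionary shipped."""
    dictionary = {i: chr(i) for i in range(256)}
    next_code = 256
    prev = dictionary[codes[0]]
    result = [prev]
    for code in codes[1:]:
        # the one corner case: a code referring to the pattern just being built
        entry = dictionary[code] if code in dictionary else prev + prev[0]
        result.append(entry)
        dictionary[next_code] = prev + entry[0]
        next_code += 1
        prev = entry
    return "".join(result)

codes = lzw_compress("TOBEORNOTTOBEORTOBEORNOT")
print(len(codes))             # 16 codes for 24 characters
print(lzw_decompress(codes))  # TOBEORNOTTOBEORTOBEORNOT
```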

Anyway, in practical terms, the more the same kind of patterns are repeated in the original file, the more the file can be compressed. In some sense, the more the repetition of patterns, the less the overall “information” that the original file can carry – but that discussion is for another day.

I’ve been thinking of compression in general and LZW compression in particular when I think of stereotyping. The whole idea of stereotyping is that we are fundamentally lazy, and want to “classify” or categorise or pigeon-hole people using the fewest bits necessary.

And so, we use lazy heuristics – gender, caste, race, degrees, employers, height, even names, etc. to make our assumptions of what people are going to be like. This is fundamentally lazy, but also effective – in a sense, we have evolved to stereotype people (and objects and animals) because that allows our brain to be efficient; to internalise more data by using fewer bits. And for this precise reason, to some extent, stereotyping is rational.

However, the problem with stereotypes is that they can frequently be wrong. We might see a name and assume something about a person, and they might turn out to be completely different. The rational response to this is not to beat oneself up for stereotyping in the first place – it is to update one’s priors with the new information that one has learnt about this person.

So, you might have used a combination of pre-known features of a person to categorise him/her. The moment you realise that this categorisation is wrong, you ought to invest additional bits in your brain to classify this person so that the stereotype doesn’t remain any more.

The more idiosyncratic and interesting you are, the more bits will be required to describe you. If you are very different from any of the stereotypes that might be used to describe you, people will need to make that much more effort to try and understand you.

One of the downsides of being idiosyncratic, though, is that most people are lazy and won’t make the effort to use the additional bits required to know you, and so will grossly mischaracterise you using one of the standard stereotypes.

On yet another tangential note, getting to know someone is a Bayesian process. You make your first impressions of them based on whatever you find out about them, and go on building a picture of them incrementally as you learn more about them. It is like loading a picture on a website over a bad internet connection – first the picture appears grainy, and then the more idiosyncratic features become visible.
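A minimal sketch of that updating in Python, with entirely made-up categories and numbers (the hypotheses and likelihoods here are purely illustrative):

```python
def bayes_update(prior, likelihoods, observation):
    """One step of Bayes' rule: reweight each hypothesis by how well it predicts the observation."""
    posterior = {h: prior[h] * likelihoods[h][observation] for h in prior}
    total = sum(posterior.values())
    return {h: p / total for h, p in posterior.items()}

# Hypothetical: a new acquaintance's degree suggests "finance type" (the stereotype)
prior = {"finance type": 0.8, "quant type": 0.2}
likelihoods = {
    "finance type": {"quotes Wodehouse": 0.1, "talks markets": 0.7},
    "quant type":   {"quotes Wodehouse": 0.4, "talks markets": 0.3},
}

belief = prior
for obs in ["quotes Wodehouse", "quotes Wodehouse"]:
    belief = bayes_update(belief, likelihoods, obs)
    print(obs, "->", {h: round(p, 2) for h, p in belief.items()})
# The grainy first impression (the prior) sharpens with each new observation
```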

The problem with refusing to use stereotypes, or demonising stereotypes, is that you fail to use the grainy picture when that is the best available, and instead wait indefinitely for a better picture. On the other hand, failing to see beyond stereotypes means that you end up using grainy pictures when clearer ones are available.

And both of these approaches are suboptimal.

PS: I’ve sometimes wondered why I find it so hard to remember certain people’s faces. And I realise that it’s usually because they are highly idiosyncratic and not easy to stereotype / compress (both are the same thing). And so it takes more effort to remember them, and if I don’t really need to remember them so much, I just don’t bother.

Communicating Numbers

Earlier this week I read this masterful blogpost on Andrew Gelman’s blog (though the post itself is not written by Andrew Gelman – it’s written by Phil Price) about communicating numbers.

Basically the way you communicate a number can give a lot more information “between the lines”. Take the example at the top of the article:

“At the New York Marathon, three of the five fastest runners were wearing our shoes.” I’m sure I’m not the first or last person to have realized that there’s more information there than it seems at first. For one thing, you can be sure that one of those three runners finished fifth: otherwise the ad would have said “three of the four fastest.” Also, it seems almost certain that the two fastest runners were not wearing the shoes, and indeed it probably wasn’t 1-3 or 2-3 either: “The two fastest” and “two of the three fastest” both seem better than “three of the top five.” The principle here is that if you’re trying to make the result sound as impressive as possible, an unintended consequence is that you’re revealing the upper limit.

Incredible. So “three of the five fastest” means one of them certainly finished fifth. And likely one finished fourth as well. Similarly, if you see a company that calls itself a “Fortune 500 company”, it is likely closer to 500 than to 100.
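You can even mechanise this reverse inference. Here is a toy sketch (my own formalisation, with an assumed “impressiveness” ordering, nothing from Price’s post): enumerate every true claim of the form “k of the top n”, pick the most impressive, and note how much the chosen phrasing reveals.

```python
def best_claim(positions, max_n=10):
    """Most impressive true claim "k of the top n" for runners at the given finish positions.
       Impressiveness heuristic (an assumption): highest share k/n, ties broken by larger n."""
    claims = []
    for n in range(1, max_n + 1):
        k = sum(1 for p in positions if p <= n)
        if k:
            claims.append((k / n, n, k))
    _, n, k = max(claims)
    return f"{k} of the top {n}"

print(best_claim({3, 4, 5}))  # "3 of the top 5" -- the ad's phrasing
print(best_claim({1, 2}))     # "2 of the top 2", i.e. "the two fastest"
print(best_claim({2, 3, 9}))  # "2 of the top 3" beats "3 of the top 9"
```

Run it backwards: if “three of the top five” was the most impressive thing they could say, no higher-share phrasing was available, which strongly constrains the actual positions (most likely 3, 4 and 5).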

The other, slightly unrelated, example quoted in the article is about Covid-19 spread in outdoor conditions. There is another article that says that “less than 10% of covid-19 transmission happens outdoors”. This is misleading because if you say “less than 10%”, people will assume it’s 9%! The number, apparently, is closer to 0.1%.

There are many more such examples that we encounter in real life. If you write on LinkedIn that you went to a “top 10 ranked B-school”, it means you DID NOT go to a “top 5 ranked B-school”.

Loosely related to this, I’ve got a bit irritated over the last year and a bit in terms of imprecise numerical reporting by the media (related to covid-19). I won’t provide links or quotes here, since what I can remember are mostly by one person and I don’t want to implicate her here (and it’s a systemic problem, not unique to her).

You see reports saying “20000 new cases in Karnataka. A majority of them are from Bangalore”. I’ve seen this kind of a report even when 90% of the cases have been from Bangalore, and that is misleading – when you say “majority”, you instinctively think of “50% + 1”. Another report said “as many as 10000 cases”. Now, the “as many as” phrasing makes it sound like a very large number, but put in context, this 10000 wasn’t really very high.

Communicating numbers is an art that is not widely practised. Nowadays we see lots of courses on “telling stories with data”, “data visualisation”, graphics, etc., but none on communicating sheer numbers themselves.

Maybe I should record an episode about this in my forthcoming podcast. If you know who might be a good guest for it, AND can make an introduction, let me know.

Risk and data

A while back, a group of <a large number of scientists> wrote an open letter to the Prime Minister demanding greater data sharing with them. I must say that the letter is written in academic language and the effort required to understand it was too much for me, but in the interest of fairness I’ll put a screenshot that was posted on twitter here.

I don’t know about this clinical and academic data. However, the holding back of one kind of data, in my opinion, has massively (and negatively) impacted people’s mental health and risk calculations.

This is data on mortality and risk. The kinds of questions that I expected government data to answer were:

  1. If I get covid-19 (now in the second wave), what is the likelihood that I will die?
  2. If my oxygen level drops to 90 (>= 94 is “normal”), what is the likelihood that I will die?
  3. If I go to hospital, what is the likelihood I will die?
  4. If I go to ICU what is the likelihood I will die?
  5. What is the likelihood of a teenager who contracts the virus (and is otherwise in good health) dying of the virus?

And so on. Simple risk-based questions whose answers can help people calibrate their lives and take calculated enough risks to get on with it without putting themselves and their loved ones at risk.

Instead, what we find from official sources is nothing but aggregates. Total numbers of people infected, dead, recovered and so on. And it is impossible to infer answers to the “risk questions” based on that.

And who fills in the gaps? The media, of course.

I must have discussed “spectacularness bias” on this blog several times before. Basically the idea is that for something to be news, it needs to carry information. And an event carries information if it occurs despite having a low prior probability (or fails to occur despite a high prior probability). As I put it in my lectures, “‘dog bites man’ is not news. ‘man bites dog’ is news”.
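In information-theoretic terms, the “news value” of an event is its surprisal, -log2(probability). A quick sketch, with made-up probabilities:

```python
import math

def surprisal_bits(p):
    """Information content (in bits) of an event that occurs with probability p."""
    return -math.log2(p)

# Made-up probabilities, purely for illustration
print(round(surprisal_bits(0.99), 3))     # "dog bites man": ~0.014 bits, not news
print(round(surprisal_bits(0.00001), 1))  # "man bites dog": ~16.6 bits, news
```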

So when we rely on media reports to fill in our gaps in our risk systems, we end up taking all the wrong kinds of lessons. We learn that one seventeen year old boy died of covid despite being otherwise healthy. In the absence of other information, we assume that teenagers are under grave risk from the disease.

Similarly, cases of children looking for ICU beds get forwarded far more than cases of old people looking for ICU beds. In the absence of risk information, we assume that the situation must be grave among children.

Old people dying from covid goes unreported (unless the person was famous in some way or the other), since the information content in that is low. Young people dying gets amplified.

Based on all the reports that we see in the papers and other media (including social media), we get an entirely warped sense of what the risk profile of the disease is. And panic. When we panic, our health gets worse.

Oh, and I haven’t even spoken about bad risk reporting in the media. I saw a report in the Times of India this morning (unable to find a link to it) that said that “young are facing higher mortality in this wave”. Basically the story said that people under 60 account for a far higher proportion of deaths in the second wave than in the first.

Now there are two problems with that story.

  1. A large proportion of over 60s in India are vaccinated, so mortality is likely to be lower in this cohort.
  2. What we need is the likelihood of a person under 60 dying upon contracting covid, NOT the proportion of deaths accounted for by under 60s. This is the classic “averaging along the wrong axis” that they unleash upon you in the first test of any statistics course (the sketch below works through this with made-up numbers).
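Here is that sketch, with entirely hypothetical numbers, showing how the under-60 share of deaths can rise sharply even when no individual under-60’s risk has risen at all:

```python
# Entirely hypothetical numbers, purely to illustrate the fallacy
cases  = {"under_60": 80_000, "over_60": 20_000}
cfr_w1 = {"under_60": 0.001, "over_60": 0.05}  # case fatality rates, wave 1
cfr_w2 = {"under_60": 0.001, "over_60": 0.01}  # wave 2: over-60s vaccinated, their risk falls

for label, cfr in [("wave 1", cfr_w1), ("wave 2", cfr_w2)]:
    deaths = {g: cases[g] * cfr[g] for g in cases}
    share = deaths["under_60"] / sum(deaths.values())
    print(f"{label}: under-60 share of deaths = {share:.0%}, "
          f"under-60 risk per case = {cfr['under_60']:.1%}")
```

The under-60 share of deaths jumps from about 7% to about 29% between the waves, yet an under-60’s likelihood of dying upon contracting covid hasn’t moved. The share describes the composition of deaths; the risk question lives on the other axis.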

Anyway, so what kind of data would have helped?

  1. Age profile of people testing positive, preferably state wise (any finer will be noise)
  2. Age profile of people dying of covid-19, again state wise

I’m sure the government collects this data. It’s just that they’re not used to releasing this kind of data, so we’re not getting it. And so we have to rely on the media and its spectacularness bias to get our information. And so we panic.

PS: By no means am I stating that covid-19 is not a risk. All I am stating is that the information we have been given doesn’t help us make good risk decisions.

Kneel down

When Colin Kaepernick knelt down during the national anthem, it was cool, and a strong sign of protest against racial violence in the United States. When other athletes, in the US and elsewhere, decided to copy him (and did so of their own volition), it was cool as well.

What I find not so convincing is that after the Floyd murder earlier this year, sports organisations across the world decided to institutionalise the kneel down. When the English Premier League restarted after the covid-19 induced break, it was decided that all players and referees would kneel for a minute at kickoff.

Now it seems like it has been decided that the gesture will continue for the 2020-21 season as well – players and officials will take a knee for a minute at the beginning of each game. Of course, it has also been decided to make it “non-mandatory” – players who choose not to join the protest will be free not to kneel.

The problem with the institutionalisation of the protest is that the protest loses its information content. Prior to the institutionalisation in June, if a player knelt, he/she was making a statement that he/she believed that “black lives matter”. Now that kneeling has become standard practice, there is no way for a player to convey this information.

Alternatively, it is possible now for a player to send out the opposite information (that he/she doesn’t believe in this protest) by refusing to join the protest. However, given the PR repercussions of such a move, it is unlikely that any player is going to take that stance (no pun intended).

Actually – by institutionalising the kneel, the level of the protest has changed, from individual players to leagues. I can see why the protest is going to be continued – it will be a continuing statement by the sporting leagues that they believe in the cause. However, individual players will not have the opportunity to show their protest (or dissent) any more.

I also wonder if and when this protocol will be reversed, since it takes effort for some team or league to “bell the cat”. Even saying that “this is mere symbolism” is bound to attract the wrath of protestors elsewhere, so teams are all caught in a Nash equilibrium where they continue to kneel down in protest.

And the longer this kneeling down protest continues, the more the meaning that it will lose. Rather than serving to make a statement, it will end up as yet another ritual.

Jio, Amazon and Information Content

A long long time ago I had installed the Jio Cinema app on my Fire TV Stick. I had perhaps watched two movies on that, and then completely forgotten about it. This evening, I had to look for a movie to watch with the wife, and having exhausted most of the “compatible content” (stuff we can watch together on Netflix) and been exhausted by the user experience on Prime Video, I decided to give this app a try.

I ended up selecting a movie, which I later found out has a 4.5 IMDB rating and doesn’t even have a Wikipedia page. Needless to say, we abandoned the movie midway. That’s when the wife went in to put the daughter to bed and my fun began.

So Jio Cinema follows what I call the “Amazon paradigm for product management”. Since Amazon tries to sell every product (or service) as if it is a physical book, it has one single mantra for product management. “Improve selection and they will come”.

The user experience doesn’t matter. How easy the product is to use, and how pleasing it looks on the eye, and whether it has occasional bugs, is all secondary. All that matters is selection. Given that the company built its business on the back of selling “long tail” books, this is not so surprising, except that it doesn’t necessarily work in other categories.

I’ve written about Amazon’s ineptitude in product management before, in the context of that atrocity of an app called Sony Liv. The funny thing is that the Jio Cinema app (on Fire TV Stick) looks and feels pretty much exactly like Sony Liv. Maybe there is an open source shitty fire TV app that these guys have based themselves on?

In any case, I started browsing the Jio Cinema app, and I found something called “movies in 15 minutes”. Initially I thought it was a parody. The first few movies I noticed there were things I had never heard of. “This is perhaps for bad movies”, I reasoned. I kept scrolling, and more recognisable names popped up.

I decided to watch Deewana, which was released just before the start of my optimal age of movie appreciation, and which, for some reason, we didn’t get home a video cassette of.

It’s basically a collage of scenes from the movie. It’s like someone has put together a “highlights package”, taking all the important scenes and then putting them together.

And for a movie like Deewana it works. The 15 minute version had all the necessary plot elements to fully follow the movie. It is a great movie, for 15 minutes. Maybe at 30 minutes as well it might be a great movie. However, I can’t imagine having watched it in the full version.

That was two hours back. I’ve since gone crazy watching 15 minute versions of many other movies (mostly from the 70s and 80s, though they have movies as recent as Jab We Met). It’s been fantastic.

However, I have one crib. This has to do with information content. Essentially, the premise behind “movies in 15 minutes” is that the information content in these movies is so little that the whole thing can be compressed into 15 minutes. The problem is that not every movie has the same amount of information.
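A loose analogy in code, using Python’s built-in zlib as a stand-in for any LZW-style compressor: the more predictable the “plot”, the smaller it compresses, and a dense plot simply cannot be squeezed into the same budget without losing something:

```python
import zlib, random, string

random.seed(42)
simple_plot = ("boy meets girl. " * 640).encode()  # repetitive, Deewana-like
complex_plot = "".join(random.choices(string.ascii_lowercase + " ",
                                      k=len(simple_plot))).encode()  # dense, Namak Halaal-like

for name, plot in [("simple plot", simple_plot), ("complex plot", complex_plot)]:
    print(f"{name}: {len(plot)} bytes -> {len(zlib.compress(plot))} bytes")
```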

15 minutes was perfect for Deewana. It was also appropriate for Kasam Paida Karne Waali Ki, which I watched only because it gets referenced in Gangs of Wasseypur. Between these two, I “watched” Namak Halaal, and I didn’t understand the head or tail of it. I had to go to Wikipedia to understand the plot.

Essentially the plot of Namak Halaal is complex enough, I imagine, that compressing it into 15 minutes is impossible without significant information loss. And the loss of information was so much that I couldn’t understand the summary at all. Maybe I’ll watch the movie in full some day.

I’m writing this blogpost after watching the 15 minute version of Don. I guess whoever made the summary realised that the movie is so complex that it can’t really be compressed into 15 minutes – and so they have added a voiceover to narrate the key elements.

In any case, I’m feeling super thrilled. I normally don’t watch movies because the bit rate in most movies is too low. Compression means that I can happily watch the movies without ever getting bored.

I wish they made these 15 minute versions of all movies. Jio, all (your Amazon-style product management) is forgiven.

Now on to Amar Akbar Anthony.

Half-watching movies, and why I hate tweetstorms

It has to do with “bit rate”

I don’t like tweetstorms. Up to six tweets is fine, but beyond that I find it incredibly difficult to hold my attention. I actually find it stressful. So of late, I’ve been making a conscious effort to stop reading tweetstorms when they start stressing me out. The stress isn’t worth any value that the tweetstorms may have.

I remember making the claim on twitter that I refuse to read any more tweetstorms of more than six tweets henceforth. I’m not able to find that tweet now.

Anyways…

Why do I hate tweetstorms? It is for the same reason that I like to “half-watch” movies, something that endlessly irritates my wife. It has to do with “bit rates”.

I use the phrase “bit rate” to refer to the rate of flow of information (remember that bit is a measure of information).

The thing with movies is that some of them have very low bit rate. More importantly, movies have vastly varying bit rates through their lengths. There are some parts in a movie where pretty much nothing happens, and a lot of it is rather predictable. There are other parts where lots happens.

This means that in the course of a movie you find yourself engrossed in some periods and bored in others, and that can be rather irritating. And boredom in the parts where nothing is happening sometimes leads me to want to turn off the movie.

So I deal with this by “half watching”, essentially multi tasking while watching. Usually this means reading, or being on twitter, while watching a movie. This usually works beautifully. When the bit rate from the movie is high, I focus. When it is low, I take my mind off and indulge in the other thing that I’m doing.

It is not just movies that I “half-watch” – a lot of sport also gets the same treatment. Like right now I’m “watching” Watford-Southampton as I’m writing this.

A few years back, my wife expressed disapproval of my half-watching. By also keeping a book or computer, I wasn’t “involved enough” in the movie, she started saying, and that half-watching meant we “weren’t really watching the movie together”. And she started demanding full attention from me when we watched movies together.

The main consequence of this is that I started watching fewer movies. Given that I can rather easily second-guess movie plots, I started finding watching highly predictable stuff rather boring. In any case, I’ve recently received permission to half-watch again, and have watched two movies in the last 24 hours (neither of which I would have been able to sit through had I paid full attention – they had low bit rates).

So what’s the problem with tweetstorms? The problem is that their bit rate is rather high. With “normal paragraph writing” we have come to expect a certain degree of redundancy. This allows us to skim through stuff while getting information from them at the same time. The redundancy means that as long as we get some key words or phrases, we can fill in the rest of the stuff, and reading is rather pleasant.

The thing with a tweetstorm is that each sentence (tweet, basically) has a lot of information packed into it. So skimming is not an option. And the information hitting your head at the rate that tweetstorms generally convey can result in a lot of stress.

The other thing with tweetstorms, of course, is that each tweet is disjoint from the one before and after it. So there is no flow to the reading, and the mind has to expend extra energy to process what’s happening. Combine this with a rather high bit rate, and you know why I can’t stand them.

Super Deluxe

In my four years in Madras (2000-4), I learnt just about enough Tamil to watch a Tamil movie with subtitles. Without subtitles is still a bit of a stretch for me, but the fact that streaming sites offer all movies with subtitles means I can watch Tamil movies now.

In the end, I didn’t like Super Deluxe. I thought it was an incredibly weird movie. The last half an hour was beyond bizarre. Rather, the entire movie is weird (which is good in a way we’ll come to in a bit), but there is a point where there is a step-change in the weirdness.

The wife had watched the movie some 2-3 weeks back, and I was watching it on Friday night. Around the time when she finished the movie she was watching and was going to bed, she peered into my laptop and said “it’s going to get super weird now”. “As if it isn’t weird enough already”, I replied. In hindsight, she was right. She had peered into my laptop right at the moment when the weirdness goes to yet another level.

It’s not often that I watch movies, since most movies simply fail to hold my attention. The problem is that most plots are rather predictable, and it is rather easy to second-guess what happens in each scene. It is the information theoretic concept of “surprise”.

Surprise is maximised when the least probable thing happens at every point in time. And when the least probable thing doesn’t happen, there isn’t a story, so filmmakers over-index on surprises, making sure the less probable thing happens. So if you indulge in a small bit of second order thinking, the surprises aren’t surprising any more, and the movie becomes boring.

Super Deluxe establishes pretty early on that the plot is going to be rather weird. And when you think the scene has been set with sufficient weirdness in each story (there are four intertwined stories in the movie, as per modern fashion), the next time the movie comes back to the story, the story is shown to get weirder. And so you begin to expect weirdness. And this, in a way, makes the movie less predictable.

The reason a weird movie is less predictable is that at each scene it is simply impossible for the viewer to even think of the possibilities. And in a movie that gets progressively weirder like this one, every time you think you have listed out the possibilities and predicted what happens, what follows is something from outside your “consideration set”. And that keeps you engaged, and wanting to see what happens.

The problem with a progressively weird movie is that at some point it needs to end. And it needs to end in a coherent way. Well, it is possible sometimes to leave the viewer hanging, but some filmmakers see the need to provide a coherent ending.

And so what usually happens is that at some point in time the plot gets so remarkably simplified that everything suddenly falls in place (though nowhere as beautifully as things fall in place at the end of a Wodehouse novel). Another thing that can happen is that the weirdness is taken up a notch, so that things fall in place at a “meta level”, at which point the movie can end.

The thing with Super Deluxe is that both these things happen! On one side the weirdness is taken up several notches. And on the other the plots get so oversimplified that things just fall in place. And that makes you finish the movie with a rather bitter taste in the mouth, feeling thoroughly unsatisfied.

That the “ending” of the movie (where things get really weird AND really simplified) lasts half an hour doesn’t help matters.

Why Brits talk so much about the weather

One stereotype about British people is that they are always talking about the weather. In the absence of any other topic to talk about, they fall back on the weather.

Having lived here for a day and a half after moving here yesterday, I can offer one explanation for why Brits talk so much about the weather – the high information content. In the last day and a half, the weather here has been so volatile that the information content in statements about the weather can be rather high.

Most places in the world have rather predictable weather. Delhi has hot summers and cold winters. It almost always rains in the tropical rain forests. It almost always rains in Mumbai during the monsoon. And so on. And when the weather is predictable, the information content in describing it is rather low.

For example, if there is a 90% chance that it will rain in Mumbai on a monsoon day, a statement on the presence or absence of rain contains only 0.47 bits of information/entropy (-0.9 log2 0.9 - 0.1 log2 0.1). If the probability that a summer day in Delhi will be sunny is 99%, then the information content in talking about the weather is just 0.08 bits.
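The arithmetic, if you want to check it (the 50-50 number for London is, of course, my flippant assumption rather than data):

```python
import math

def entropy_bits(p):
    """Entropy (in bits) of a binary event with probability p."""
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

print(round(entropy_bits(0.9), 2))   # Mumbai monsoon rain: 0.47 bits
print(round(entropy_bits(0.99), 2))  # Delhi summer sunshine: 0.08 bits
print(round(entropy_bits(0.5), 2))   # London, if rain is a coin toss: a full 1.0 bit
```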

The thing with London weather, based on my day and a half of observation, is that it is wildly volatile. This afternoon, for example, there was a hailstorm. And only a couple of minutes later there was bright sunshine. And then there was another hailstorm. I can see heavy rain from my window as I write this now.

(Instagram post by @skthewimp: “Crow and fox getting married in London”)

So given how crazy and volatile the weather in London is, the information content in talking about the weather is rather high. As I write, there’s sunshine streaming through my window, and heavy rain outside. And I’m chatting with a friend who lives not very far from here, and whatever I tell her about the weather here is “information” to her, since it’s not the same there.

It’s this craziness and high volatility in weather in Britain that makes it worth talking about. The information content in a statement about the weather is always high. And this is not the case elsewhere in the world. And so people elsewhere get annoyed by Brits talking about the weather.

PS: What does it tell you that I’m blogging about the weather a day and a half after landing in Britain?

The one bit machine

My daughter is two weeks old today and she continues to be a “one bit machine”. The extent of her outward communication is restricted to a maximum of one bit of information. There are basically two states her outward communication can fall under – “cry” and “not cry”, and given that the two are not equally probable, the amount of information she gives out is strictly less than one bit.

I had planned to write this post two weeks back, the day she was born, and wanted to speculate how long it would take for her to expand her repertoire of communication and provide us with more information on what she wants. Two weeks in, I hereby report that the complexity of communication hasn’t improved.

Soon (I don’t know how soon) I expect her to start providing us more information – maybe there will be one kind of cry when she’s hungry, and another when she wants her diaper changed. Maybe she’ll start displaying other methods of outward communication – using her facial muscles, for example (right now, while she contorts her face in a zillion ways, there is absolutely no information conveyed), and we can figure out with greater certainty what she wants to convey.

I’m thinking about drawing a graph with age of the person on the X axis, and the complexity of outward information on the Y axis. It starts off with X = 0 and Y = 1 (I haven’t bothered measuring the frequency of cry/no-cry responses so let’s assume it’s equiprobable and she conveys one bit). It goes on to X = 14 days and Y = 1 (today’s state). And then increases with time (I’m hoping).

While I’m sure research exists somewhere on the information content per syllable in adult communication, I hope to draw this graph sometime based on personal observation of my specimen (though that would limit it to one data point).

Right now, though, I speculate on what kind of shape this graph might take. Considering it has so far failed to take off at all, I hope that it’ll be either an exponential (good in the short term, but I don’t know about the long term) or a sigmoid (more likely, I’d think).

Let’s wait and see.