zipf law – Pertinent Observations

No, this post is not about the distribution of poverty. This is a rather technical post about probability distributions. Just that it has something to add to the poverty debate. And like the previous post, this is a departure from the normal RQ-type posts – there will be no graphs, no tables. Just theorizing.

So in the last week or two a lot of op-ed space in India has been consumed by what is described as the “poverty debate”. A recent survey by the National Sample Survey Organization (NSSO) has revealed that poverty levels in India have declined sharply in the last couple of years. And it only accelerates a sharp decline that started after a similar survey in 2004-05. Now, you have the “growthists” and the “distributionists”. The former claim that it is high economic growth in this time period that has led to the fall in poverty. The latter think it is due to redistributionist policies such as the National Rural Employment Guarantee Act (NREGA). Both sides have their merits. However, I’m not going to step into that debate now.

I ask a more fundamental question – how far can we trust the numbers that the NSSO has put out? My concern is this – the poverty numbers have been gleaned from a survey. I don’t have a problem with surveying – in fact surveying is a rather well-studied science, and I’m sure people at the NSSO are well-versed in it. My concern is that in this particular survey, the results may not have been properly extrapolated.

Most surveys rely on what are known as the “law of large numbers” and the “central limit theorem”, and assume that the quantity being surveyed (people’s consumption expenditure, in this survey) follows a normal distribution. Except that we know that incomes (at least at the upper end of the scale) don’t follow a normal distribution. Instead, it has been shown that they follow what is called a power law distribution.

While I don’t doubt the general quality of scholarship at the NSSO, I want to ask if they have actually studied the real distribution of incomes and used the appropriate one, rather than using a normal distribution. It could be that incomes at the lower end of the scale actually do follow a normal distribution, in which case standard sampling techniques might be used. If not, however, I expect and hope that the NSSO has used a sampling and extrapolation technique appropriate to the distribution incomes actually follow.

Let me illustrate the issue with an extreme example. Let’s say that one of the names drawn as part of the NSSO’s “random sample” for Mumbai is one Mr. Mukesh D Ambani. Assume that there are 99 other persons in Mumbai who are drawn in the same sample, and each of them has an annual household income of Rs. 1 lakh. What will be the mean income of the group? Assuming Mr. Ambani earns Rs. 10 crore a year (number pulled out of thin air), which is Rs. 1,000 lakh, the total comes to Rs. 1,099 lakh, and the mean income of the group of 100 will come out to be Rs. 10.99 lakh, or close to Rs. 11 lakh, even though 99 of the 100 earn just Rs. 1 lakh!
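To make the arithmetic concrete, here is a minimal Python sketch of the same example. The figures are the made-up ones from the paragraph above, and the variable names are purely illustrative.

```python
from statistics import mean

LAKH = 100_000
CRORE = 100 * LAKH

# 99 households earning Rs. 1 lakh each (figures from the example above)
others = [1 * LAKH] * 99

# One outlier earning Rs. 10 crore (a number pulled out of thin air)
outlier = 10 * CRORE

mean_without = mean(others)            # sample that happens to miss the outlier
mean_with = mean(others + [outlier])   # sample that happens to include him

print(f"mean without outlier: Rs. {mean_without / LAKH:.2f} lakh")
print(f"mean with outlier:    Rs. {mean_with / LAKH:.2f} lakh")
```

A single draw moves the sample mean from Rs. 1 lakh to Rs. 10.99 lakh, which is the whole problem in miniature.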

This is the problem with estimating incomes using surveys and standard extrapolation techniques. While the above example might have been extreme, even in smaller population groups there will be “local Mukesh Ambanis” – people whose incomes are much higher than those of their peer group. Inclusion or exclusion of such people in a standard survey can make a massive difference.
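The effect of “local Mukesh Ambanis” can be sketched with a small simulation. This is only an illustration under assumptions of my own: incomes are drawn from a Pareto (power law) distribution with tail exponent 1.5, and compared against a normal distribution with the same theoretical mean. Neither the exponent nor the sample sizes come from NSSO data, and nothing here is meant to describe how the NSSO actually samples.

```python
import random
from statistics import mean, stdev

random.seed(0)  # fixed seed so the run is repeatable

SAMPLE_SIZE = 100   # households per simulated survey (hypothetical)
N_SURVEYS = 1000    # number of repeated surveys (hypothetical)
ALPHA = 1.5         # assumed Pareto tail exponent, not an estimate for India

def survey_means(draw):
    """Return the sample mean from each of N_SURVEYS simulated surveys."""
    return [mean(draw() for _ in range(SAMPLE_SIZE)) for _ in range(N_SURVEYS)]

# Power law incomes: Pareto with minimum 1 and theoretical mean ALPHA / (ALPHA - 1) = 3
pareto_means = survey_means(lambda: random.paretovariate(ALPHA))

# Normal incomes with the same theoretical mean (3) and standard deviation 1
normal_means = survey_means(lambda: random.gauss(3.0, 1.0))

# How much the estimated mean jumps around from one survey to the next
pareto_spread = stdev(pareto_means)
normal_spread = stdev(normal_means)

print(f"spread of survey means, power law: {pareto_spread:.3f}")
print(f"spread of survey means, normal:    {normal_spread:.3f}")
```

Under these assumptions, the survey means from the power law population should scatter far more widely than those from the normal one, because a handful of surveys happen to catch an extreme household and the rest do not.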

I will end with an example and a request. I remember reading that any family in India that earns over Rs. 12 lakh a year (i.e. Rs. 1 lakh a month) is in the top 1% of all families in India! My family (my wife and I) earns more than Rs. 12 lakh. But do we consider ourselves rich? By no means! Why? Because people who are richer than us are much richer than us! That is the problem with quantities that follow a power law distribution.
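The “richer than us are much richer than us” feeling has a neat formula behind it. For a Pareto distribution with tail exponent alpha greater than 1, the average value among everyone above a threshold t is alpha * t / (alpha - 1), so it scales with the threshold itself. The sketch below just evaluates that formula; the exponent of 1.5 is an assumption for illustration, not a measured value for Indian incomes, and `mean_above` is a name I made up.

```python
def mean_above(threshold, alpha):
    """Mean of a Pareto-distributed quantity, given that it exceeds `threshold`.

    Valid only for alpha > 1; for alpha <= 1 the mean is infinite.
    """
    if alpha <= 1:
        raise ValueError("mean is infinite for alpha <= 1")
    return alpha * threshold / (alpha - 1)

ALPHA = 1.5          # assumed tail exponent, for illustration only
threshold_lakh = 12  # the Rs. 12 lakh cut-off mentioned above

average_lakh = mean_above(threshold_lakh, ALPHA)
print(f"average income above Rs. {threshold_lakh} lakh: Rs. {average_lakh:.0f} lakh")
```

With this assumed exponent, the average family above the Rs. 12 lakh line would earn three times the cut-off, and the same ratio holds no matter how high you set the line.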

Now for the request. Can someone instruct me on the easiest way to get the raw data out of the NSSO? Thanks.

Poverty and distributions