Occasionally, this blog takes a break from presenting interesting data to critique data-related journalism in the media. Our object of attention for this post is a report in the Hindustan Times that states that “Maharashtra has highest number of road accidents in the country”. The headline is factually correct, if you go by the data on the website of the Ministry of Road Transport and Highways. The problem, however, is that it is a meaningless statistic.
It might be intuitive to you that one cannot compare the number of accidents in a large state like Maharashtra to that in a small state like Manipur – the former is so much larger than the latter that it is bound to have more accidents. Extending this argument, does it make sense to compare states on the basis of the sheer number of accidents? Does the statistic of “state with highest number of accidents” make sense? If not, what is a good metric to compare road safety in various states?
Comparing values that are measured in ‘absolute numbers’ across geographies makes no sense, for it doesn’t take into account the difference in size of the various geographies. In order to get a good comparison we need to “normalize” the measure in a way that takes into account the relative sizes of the geographies. And it is important that we use the right metric in order to normalize the measures.
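To make the point concrete, here is a minimal Python sketch of how a ranking can flip once a measure is normalized. The region names and all the numbers are made up for illustration and are not actual accident data; `size` is a placeholder for whatever normalizing factor one settles on.

```python
# Hypothetical figures for illustration only; not real accident data.
# "size" stands in for whichever normalizing factor is chosen.
regions = {
    "Large region": {"accidents": 50_000, "size": 1_000},
    "Small region": {"accidents": 2_000, "size": 20},
}

def rank_by(key_fn):
    """Return region names sorted from highest to lowest by key_fn."""
    return sorted(regions, key=lambda name: key_fn(regions[name]), reverse=True)

# Ranking on absolute numbers puts the large region on top.
by_absolute = rank_by(lambda r: r["accidents"])

# Ranking on accidents per unit of size puts the small region on top.
by_normalized = rank_by(lambda r: r["accidents"] / r["size"])

print("Absolute:  ", by_absolute)
print("Normalized:", by_normalized)
```

With these invented numbers, the large region has 25 times as many accidents, yet the small region has twice the rate once size is taken into account.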
So how do we compare the accident rates in Maharashtra and Manipur, given their different sizes? An intuitive normalizing factor is the state population. Population might be a good metric for comparing birth rates or disease incidence rates, but road accidents? Population doesn’t account for people in one state driving more than in another state. We need a better metric.
Going back to the basics, what are we trying to achieve here by comparing accident rates across states? The accident rate is probably going to be used as a proxy for road safety. So how would you compare road safety across two different regions? A good metric, I would argue, is the likelihood of having an accident if you were to drive 1 kilometer. Or the number of accidents per vehicle kilometer. Notice that this at once takes care of both problems we have discussed above – sizes of states as well as propensity of people in various states to drive.
However, whether this is the best metric is debatable. For example, this metric ignores the “vehicle mix” in various states – so would “passenger kilometer” (rather than “vehicle kilometer”) be better? Perhaps. Again, this metric assumes that all kinds of roads are similar, and treats traveling along a kilometer of a highway as equivalent to traveling a kilometer on a village road. There are no “perfect” metrics or “normalizing factors” – so we have to choose one that is “good enough” and go with it.
Now, let us compare states based on their likelihood of accidents. Unfortunately, data on “vehicle kilometers” is hard to come by – in the absence of tolled roads, no one really keeps track of this. So we need to use a proxy. Again, what the best proxy is remains debatable (remember there was already a debate on what the best measure is), but for ease of data capture (if nothing else) let us use “accidents per total road length” as a metric here. The drawback of this metric is that it doesn’t capture how busy these roads are, so it is only a loose proxy for how much people drive.
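For readers who want to try this with their own data, the calculation can be sketched in a few lines of Python. The state names and figures below are placeholders, not the Ministry's data, and the pooled average (total accidents over total road length) is just one assumed way of defining a national benchmark.

```python
# Hypothetical inputs: (accidents per year, total road length in km).
# These numbers are placeholders, not actual Ministry data.
states = {
    "State A": (60_000, 400_000),
    "State B": (500, 20_000),
    "Territory C": (60, 200),
}

def accidents_per_10k_km(accidents, road_km):
    """Accidents per 10,000 km of road."""
    return accidents * 10_000 / road_km

rates = {name: accidents_per_10k_km(a, km) for name, (a, km) in states.items()}

# Assumed definition of the benchmark: pool all accidents and all road length.
total_accidents = sum(a for a, _ in states.values())
total_km = sum(km for _, km in states.values())
pooled_rate = accidents_per_10k_km(total_accidents, total_km)

# Print states from highest to lowest rate, then the pooled benchmark.
for name, rate in sorted(rates.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {rate:.0f} accidents per 10,000 km")
print(f"Pooled: {pooled_rate:.0f} accidents per 10,000 km")
```

In this made-up example, the territory with by far the fewest accidents comes out with the highest rate, because it also has very little road. That is the kind of reversal a per-road-length metric can produce.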
The graph below shows the relative safety of roads in Indian states. Based on accidents per 10,000 kilometers of road, we see that Maharashtra (green) is quite close to the national average (blue). It turns out that it is the union territory of Lakshadweep that is the clear outlier on the number of accidents per kilometer of road.
Based on this, we can say that the article in the Hindustan Times quoted at the beginning of this piece, while factually correct, does not present a correct picture.