WhatsApp Export Chat

There was a tiny controversy on one WhatsApp group I’m part of. This is a “sparse” WhatsApp group, which means there aren’t too many messages sent. Only around 1000 in nearly 5 years (you’ll soon know how I got that number).

And this morning I woke up to find 42 messages (many members of the group are in the US). Some of them I understood and some I didn’t. So, gossip-monger that I am (hey, remember that Yuval Noah Harari thinks gossip is the basis of human civilisation?), I opened up half a dozen backchannel chats.

Like the six blind men of Indostan, these chats helped me construct a picture of what had happened. My domain knowledge had been enhanced. However, one message had made a deep impression on me: it claimed that some people were monopolising whatever little conversation there was on that group.

I HAD to test that hypothesis.

The jobless guy that I am, I figured out how to export a chat from WhatsApp. With iOS, it’s rather easy. Go to the info page of a chat or a group, and near “delete chat/group”, you see “export chat/group”. If you say you don’t want media (like I did), you get a text file (I airdropped mine immediately to my Mac).

The formatting of the WhatsApp export file is rather clean, making it easy to parse. The date is in square brackets. The sender’s name (or number, if they’re not in your contact list) comes after the square brackets and before a colon. A couple of “separate” functions later you are good to go (there are a couple of other nuances; if you can read R code, check mine below).

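library(tidyverse)  # read_lines, tibble, separate, fill, mutate, str_* all come from here

chat <- read_lines('~/Downloads/_chat.txt')
tibble(txt=chat) %>%
  # split "[date] rest" at the first "] "; lines without it are continuation lines
  separate(txt, c("Date", "Content"), '\\] ', extra='merge', fill='right') %>%
  # split "sender: message" at the first ": " only, so colons inside messages survive
  separate(Content, c("Sender", "Content"), ': ', extra='merge', fill='right') %>%
  mutate(
    Content=coalesce(Content, Date),
    Date=str_trim(str_replace_all(Date, '\\[', '')),
    # %I (12-hour clock) is what pairs with %p; with %H the AM/PM marker is ignored
    Date2=as.POSIXct(Date, format='%d/%m/%y, %I:%M:%S %p')
  ) %>%
  # continuation lines of a multi-line message take the date and sender of the line above
  fill(Date2, .direction = 'downup') %>%
  fill(Sender, .direction = 'downup') %>%
  # drop system notices, which end up in the Sender column
  filter(!str_detect(Sender, "changed their phone number to a new number") ) %>%
  filter(!str_detect(Sender, ' added ') & !str_detect(Sender, ' left')) %>%
  filter(!str_detect(Sender, " joined using this group's invite link")) ->
  mychat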

That’s it. You are good to go. You have a nice data frame with sender’s name, message content and date/time of sending. And as one of the teachers at my JEE coaching factory used to say, you can now do “gymnastics”.

And so for the last hour or so I’ve been wasting my time doing such gymnastics. Number of posts sent on each day. Testing the hypothesis that some people talk a lot on the group (I turned out to be far more prolific than I’d imagined). People who start conversations. Whether there are any long bilateral conversations on the group. And so on and so forth (this is how I know there are ~1000 messages on this group).
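For a flavour of the first two gymnastics, here is a minimal sketch. It assumes a data frame shaped like `mychat` above, with `Sender` and `Date2` columns; the toy rows and names are invented so that the snippet runs on its own.

```r
library(dplyr)

# Toy stand-in for `mychat`; the names and timestamps are made up.
mychat <- tibble(
  Sender = c("Asha", "Bala", "Asha", "Chitra", "Asha", "Bala"),
  Date2 = as.POSIXct(c(
    "2019-05-01 09:00:00", "2019-05-01 09:05:00", "2019-05-01 21:30:00",
    "2019-05-03 08:15:00", "2019-05-03 08:20:00", "2019-05-07 18:45:00"
  ), tz = "UTC")
)

# Messages per sender, with each sender's share of all messages
by_sender <- mychat %>%
  count(Sender, sort = TRUE, name = "Messages") %>%
  mutate(Share = Messages / sum(Messages))

# Messages per calendar day
by_day <- mychat %>%
  count(Day = as.Date(Date2), name = "Messages")

print(by_sender)
print(by_day)
```

If one or two senders hold most of the `Share` column, the monopolising hypothesis has some support; where to draw that line is a judgment call.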

Now I want to subject all my conversations to such analysis. For bilaterals it won’t be that much fun – but in case there is some romantic or business interest involved you might find it useful to know who initiates more and who closes more conversations.
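One rough way to find who initiates and who closes is sketched below. The export has no notion of a “conversation”, so the sketch assumes that a silence of more than six hours starts a new one; that cut-off (`gap_hours`) and the toy data are my assumptions, not anything WhatsApp provides.

```r
library(dplyr)

# Toy stand-in for `mychat`; the names and timestamps are made up.
mychat <- tibble(
  Sender = c("Asha", "Bala", "Asha", "Chitra", "Asha", "Bala"),
  Date2 = as.POSIXct(c(
    "2019-05-01 09:00:00", "2019-05-01 09:05:00", "2019-05-01 21:30:00",
    "2019-05-03 08:15:00", "2019-05-03 08:20:00", "2019-05-07 18:45:00"
  ), tz = "UTC")
)

gap_hours <- 6  # assumed: a longer silence than this begins a new conversation

conversations <- mychat %>%
  arrange(Date2) %>%
  mutate(
    # hours since the previous message (NA for the very first one)
    Gap = as.numeric(difftime(Date2, lag(Date2), units = "hours")),
    NewConv = is.na(Gap) | Gap > gap_hours,
    ConvId = cumsum(NewConv)
  )

# First and last sender of each conversation
ends <- conversations %>%
  group_by(ConvId) %>%
  summarise(Starter = first(Sender), Closer = last(Sender), Messages = n())

print(ends)
print(count(ends, Starter, sort = TRUE))
```

Changing `gap_hours` changes who looks like an initiator, so it is worth trying a few values before reading anything into the result.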

You can subject the conversations to natural language processing (with what objective, I don’t know). The possibilities are endless.

And the time wastage can be endless as well. So I’ll stop here.

Evaluating WhatsApp groups

Over time I’ve become a member of several WhatsApp groups. Some of them are temporary, designed simply to coordinate a particular one-off event. Others are more permanent, existing over the long term, but with no particular agenda.

Over this time I’ve also exited several WhatsApp groups, especially those that have gotten a bit annoying. I remember this day last year when I stepped in and out of a meeting, and I found a hundred messages on a family WhatsApp group, most of them being random forwards, and a few of them being over a page long. I quickly exited that group.

Not everyone quickly exits groups they don’t like, though. There is social pressure to remain, since anyone’s exit gets publicly broadcast in the group. Being a member of a WhatsApp group is the latest measure of conformity, and irrespective of how annoying some groups are, one is forced to endure.

Not all WhatsApp groups are annoying, though. Some groups I’m a member of are an absolute joy. There are times when I explicitly choose to initiate a conversation within the group, rather than bilaterally, so that others in the group can pitch in. And the intended counterparty usually doesn’t mind the conversation being taken to the group either.

Thinking about good and bad WhatsApp groups, I was wondering if there is a good and clean metric to determine how “good” or “useful” a WhatsApp group might be. Based on my experience, I have one idea. Do let me know if you know a better way to characterise whether a WhatsApp group is going to be good or bad.

When you have a WhatsApp group with N people, you are essentially bringing together N * (N-1)/2 pairs of people. Now, some of these pairs might get along fantastically well. Other pairs might loathe each other. And yet others are indifferent to each other.
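To see how quickly the number of pairs grows with group size, here is a small sketch of the N * (N-1)/2 arithmetic. The function name `pairs_in_group` is just for illustration.

```r
# Number of distinct pairs in a group of n people: n * (n - 1) / 2
pairs_in_group <- function(n) n * (n - 1) / 2

sizes <- c(2, 5, 10, 50)
pair_table <- data.frame(Members = sizes, Pairs = pairs_in_group(sizes))
print(pair_table)  # Pairs column: 1, 10, 45, 1225

# Same thing via the built-in binomial coefficient
stopifnot(all(pairs_in_group(sizes) == choose(sizes, 2)))
```

A group of 50 has 1225 pairs, which is a lot of relationships that all need to be at least tolerable.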

My hypothesis is that the higher the proportion of pairs in a group that like to talk to each other, the better the group functions (yes it’s a rather simple metric).

Now, this hypothesis is rather simplistic – for example, you can have threesomes of people whose mutual relationship is very different from that of any pair taken together. So this ignores a higher-order correlation term, but improves simplicity. It’s like that benzene ring, where six carbon atoms bond together in a way no two of them as a pair can (I forget the scientific term for such bonding)!

Yet, what we have here is a good measure of cohesion within the group. It also explains why sometimes the addition of a single member can lead to the destruction of the group – for it can increase the proportion of people who don’t like to talk to each other!
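As a rough sketch of this measure, read it as the fraction of pairs who like talking to each other. The group below, the helper names `cohesion` and `add_member`, and the yes/no coding of each pair are all assumptions made for illustration.

```r
# Hypothetical group: TRUE means the pair likes talking to each other.
members <- c("A", "B", "C", "D")
likes <- matrix(TRUE, 4, 4, dimnames = list(members, members))
likes["C", "D"] <- likes["D", "C"] <- FALSE  # C and D don't get along

# Fraction of pairs who like talking (upper triangle = each pair once)
cohesion <- function(m) mean(m[upper.tri(m)])

# Add one person who gets along only with the people named
add_member <- function(m, name, gets_along_with) {
  old <- rownames(m)
  everyone <- c(old, name)
  out <- matrix(FALSE, length(everyone), length(everyone),
                dimnames = list(everyone, everyone))
  out[old, old] <- m
  out[name, gets_along_with] <- TRUE
  out[gets_along_with, name] <- TRUE
  out
}

before <- cohesion(likes)                       # 5 of 6 pairs
after <- cohesion(add_member(likes, "E", "A"))  # 6 of 10 pairs
print(c(before = before, after = after))
```

One new member who clicks with only one person drags the score from about 0.83 down to 0.6, even though nobody already in the group has changed.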

The model is incomplete, though. For now, it doesn’t differentiate between “don’t care conditions” (people in the group who are indifferent to each other) and “don’t get alongs”. If we can incorporate that without making the formula more complex, I think we might be on to something.

Maybe we should form a WhatsApp group to discuss what a good formula might look like!