WhatsApp Export Chat

There was a tiny controversy on one WhatsApp group I’m part of. It is a “sparse” group, meaning not many messages get sent: only around 1,000 in nearly five years (you’ll soon see how I got that number).

This morning I woke up to find 42 new messages (many members of the group are in the US). Some of them I understood and some I didn’t. So, gossip-monger that I am (hey, remember that Yuval Noah Harari thinks gossip is the basis of human civilisation?), I opened up half a dozen backchannel chats.

Like the six blind men of Indostan feeling the elephant, I used these chats to piece together a picture of what had happened, and my domain knowledge improved. However, one message had made a deep impression on me: it claimed that some people were monopolising whatever little conversation there was on the group.

I HAD to test that hypothesis.

Jobless guy that I am, I figured out how to export a chat from WhatsApp. On iOS it’s rather easy: go to the info page of a chat or group, and near “delete chat/group” you’ll see “export chat/group”. If you choose not to include media (like I did), you get a text file (I immediately AirDropped mine to my Mac).

The formatting of the WhatsApp export file is rather clean, which makes it easy to parse. Each message starts with the date and time in square brackets. After the closing bracket comes the sender’s name (or number, if they’re not in your contact list), followed by a colon and then the message. A couple of “separate” calls later you are good to go (there are a few other nuances; if you can read R code, mine is below).

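library(tidyverse)  # readr, tibble, tidyr, dplyr and stringr

chat <- read_lines('~/Downloads/_chat.txt')

mychat <- tibble(txt = chat) %>%
  # "[date, time] rest": split only at the first "] "; any later pieces merge back into Content
  separate(txt, c("Date", "Content"), '\\] ', extra = 'merge', fill = 'right') %>%
  # "Sender: message": split only at the first ": " so colons inside messages survive
  separate(Content, c("Sender", "Content"), ': ', extra = 'merge', fill = 'right') %>%
  mutate(
    # Continuation lines of multi-line messages have no "] ", so their text sits in Date
    Content = coalesce(Content, Date),
    Date = str_trim(str_replace_all(Date, '\\[', '')),
    # %p (AM/PM) is only honoured together with the 12-hour %I, not with %H
    Date2 = as.POSIXct(Date, format = '%d/%m/%y, %I:%M:%S %p')
  ) %>%
  # Continuation lines belong to the message above them, so fill downwards first
  fill(Date2, .direction = 'downup') %>%
  fill(Sender, .direction = 'downup') %>%
  # System notices have no "Sender: " part, so their text lands in Sender; drop them
  filter(!str_detect(Sender, "changed their phone number to a new number")) %>%
  filter(!str_detect(Sender, ' added ') & !str_detect(Sender, ' left')) %>%
  filter(!str_detect(Sender, " joined using this group's invite link"))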

That’s it. You now have a nice data frame with the sender’s name, message content and date/time of sending. And as one of the teachers at my JEE coaching factory used to say, you can now do “gymnastics”.
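To check what each step of the parsing does without exporting a real chat, here is a minimal sketch that wraps the pipeline in a function (`parse_whatsapp` is a hypothetical name, not something from the original code) and runs it on four invented lines. It assumes the 12-hour `dd/mm/yy, h:mm:ss AM/PM` timestamp format; exports from phones set to other locales may need a different `format` string.

```r
library(tidyverse)

# Hypothetical helper: the export-parsing pipeline wrapped in a function
parse_whatsapp <- function(lines) {
  tibble(txt = lines) %>%
    separate(txt, c("Date", "Content"), '\\] ', extra = 'merge', fill = 'right') %>%
    separate(Content, c("Sender", "Content"), ': ', extra = 'merge', fill = 'right') %>%
    mutate(
      Content = coalesce(Content, Date),   # continuation lines keep their text
      Date = str_trim(str_replace_all(Date, '\\[', '')),
      Date2 = as.POSIXct(Date, format = '%d/%m/%y, %I:%M:%S %p')
    ) %>%
    fill(Date2, .direction = 'downup') %>%
    fill(Sender, .direction = 'downup') %>%
    filter(!str_detect(Sender, "changed their phone number to a new number")) %>%
    filter(!str_detect(Sender, ' added ') & !str_detect(Sender, ' left')) %>%
    filter(!str_detect(Sender, " joined using this group's invite link"))
}

# Invented sample: a plain message, a message containing a colon,
# a continuation line of that message, and a system notice
sample_lines <- c(
  "[12/03/19, 9:15:42 PM] Alice: Anyone up for dinner?",
  "[12/03/19, 9:17:05 PM] Bob: Yes: 8pm works",
  "Count me in too",
  "[12/03/19, 9:20:11 PM] Carol added Dave"
)

demo <- parse_whatsapp(sample_lines)
demo %>% select(Sender, Content, Date2)
```

The continuation line is attributed to Bob, the colon inside Bob’s message survives, and the “added” notice is dropped, leaving three rows.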

And so for the last hour or so I’ve been wasting my time doing such gymnastics. Number of posts sent on each day. Testing the hypothesis that some people talk a lot on the group (I turned out to be far more prolific than I’d imagined). People who start conversations. Whether there are any long bilateral conversations on the group. And so on and so forth (this is how I know there are ~1000 messages on this group).
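Most of these gymnastics are one-liners with dplyr. As a minimal sketch, assuming only the `Sender` and `Date2` columns built above, this counts each sender’s share of all messages (the monopolisation hypothesis) and the number of posts per day. The toy data frame stands in for a real export; its senders and timestamps are invented.

```r
library(tidyverse)

# Toy stand-in for `mychat`: invented senders and timestamps
mychat <- tibble(
  Sender = c("Alice", "Alice", "Bob", "Alice", "Carol", "Bob"),
  Date2 = as.POSIXct(c(
    "2019-03-12 21:15:00", "2019-03-12 21:16:00", "2019-03-12 21:20:00",
    "2019-03-14 08:00:00", "2019-03-14 08:05:00", "2019-03-20 19:00:00"
  ), tz = "UTC")
)

# Message count and share of all messages per sender, most prolific first
by_sender <- mychat %>%
  count(Sender, sort = TRUE) %>%
  mutate(share = n / sum(n))

# Number of posts sent on each calendar day (days computed in UTC here)
by_day <- mychat %>%
  count(day = as.Date(Date2, tz = "UTC"))

by_sender
by_day
```

On a real export the same two calls run directly on `mychat`; for the monopolisation question, the `share` column is the one to look at.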

Now I want to subject all my conversations to such analysis. Bilateral chats won’t be as much fun, but if there is some romantic or business interest involved, you might find it useful to know who initiates more conversations and who closes more of them.
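One way to answer the who-initiates question, sketched here under an assumption the post does not spell out: treat a silence longer than some threshold (three hours below, an arbitrary choice) as the end of a conversation, then take the first and last sender of each run. The same grouping also gives conversation lengths, which is one way to look for long exchanges. The toy chat is invented.

```r
library(tidyverse)

# Toy two-person chat: invented senders and timestamps
chat <- tibble(
  Sender = c("Me", "You", "Me", "You", "You", "Me"),
  Date2 = as.POSIXct(c(
    "2019-03-12 21:00:00", "2019-03-12 21:05:00", "2019-03-12 21:06:00",
    "2019-03-13 09:00:00", "2019-03-13 09:01:00", "2019-03-13 09:30:00"
  ), tz = "UTC")
)

gap_hours <- 3  # assumed: a silence longer than this starts a new conversation

convos <- chat %>%
  arrange(Date2) %>%
  mutate(
    gap = as.numeric(difftime(Date2, lag(Date2), units = "hours")),
    new_convo = is.na(gap) | gap > gap_hours,  # the first message always starts one
    convo_id = cumsum(new_convo)
  )

# First sender, last sender and number of messages in each conversation
ends <- convos %>%
  group_by(convo_id) %>%
  summarise(
    initiator = first(Sender),
    closer = last(Sender),
    n_msgs = n(),
    .groups = "drop"
  )

count(ends, initiator)  # who initiates more
count(ends, closer)     # who closes more
```

The threshold matters: too short and one conversation is split into many, too long and separate conversations merge, so it is worth trying a few values on a real chat.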

You can subject the conversations to natural language processing (with what objective, I don’t know). The possibilities are endless.
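As a very small first step in that direction, word frequencies need nothing beyond stringr and tidyr. This sketch uses invented messages and a crude regex tokenisation (lowercase, keep runs of letters and apostrophes, no stopword removal), so it is an illustration rather than real NLP; a dedicated package such as tidytext tokenises more carefully.

```r
library(tidyverse)

# Toy messages: invented senders and text
msgs <- tibble(
  Sender = c("Alice", "Bob", "Alice"),
  Content = c("Dinner at 8?", "Dinner sounds good", "Good, see you at dinner")
)

# Crude tokenisation: lowercase, then extract runs of letters and apostrophes
word_counts <- msgs %>%
  mutate(word = str_extract_all(str_to_lower(Content), "[a-z']+")) %>%
  unnest(word) %>%
  count(word, sort = TRUE)

word_counts
```

Counting by sender as well (`count(Sender, word, sort = TRUE)`) would give each member’s favourite words.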

And the time wastage can be endless as well. So I’ll stop here.
