Ranga and Big Data

Some stories of how you met someone are worth retelling again and again. Sometimes you think they should be included in a movie (or at least a TV show). And you never tire of telling them.

The way I met Ranga can qualify as one such story. At the outset, there was nothing special about it – both of us had joined IIT Madras at the same time, to do a B.Tech. in Computer Science. But the first conversation itself was epic, and something worth telling again and again.

During our orientation, one of the planned events was “a visit to the facilities”, where a professor would take us around to see the library, the workshops, a few prominent labs and other things.

I remember that the gathering point for Computer Science students was right behind the Central Lecture Theatre. This was the second day of orientation and I’d already met a few classmates by then. And that’s where I found Ranga.

The conversation went somewhat like this:

“Hi I’m Karthik. I’m from Bangalore”.
“Hi I’m Ranga. I’m from Madras. What are your hobbies?”
“I play the violin, I play chess…. ”
“Oh, you play chess? Me too. Why don’t we play a blindfold game right now?”
“Er. What? What do you want to do? Now?”
“Yeah. Let’s start. e4”.
(I finally managed to gather my senses) “c5”

And so we played for the next two hours. I clearly remember playing a Sicilian Dragon. It was a hard-fought game until we ended up in an endgame with opposite-coloured bishops. Coincidentally, by that time the tour of the facilities had ended. And we called it a draw.

We kept playing through our B.Techs., mostly blindfold in the backbenches of classrooms. Most of the time I would get soundly thrashed. One time I remember going from our class, with the half-played game in our heads, setting it up on a board in Ranga’s room, and continuing to play.

In any case, chess apart, we’ve also had a lot of nice conversations over the last 21 years. Ranga runs a big data and AI company called TheDataTeam, so I thought it would be good to record one of our conversations and share it with the world.

And so I present to you the second episode of my new “Data Chatter” podcast. Ranga and I talk about all things “big data”, data architectures, warehousing, data engineering and all that.

As usual, the podcast is available on all podcasting platforms. Curiously, though, each episode takes much longer to appear on Google Podcasts after its release, so this second episode is already on Spotify, Apple Podcasts, CastBox, etc. but not on Google yet.

Give it a listen. Share it with whoever you think might like it. Subscribe to my podcast. And let me know what you think of it.

Should this have been my SOP?

I was chatting with a friend yesterday about analytics and “data science” and machine learning and data engineering and all that, and he commented that in his opinion a lot of the work mostly involves gathering and cleaning the data, and that any “analytics” is mostly around averaging and the sort.

This reminded me of an old newsletter I’d written way back in January 2018, soon after I’d read Raphael Honigstein’s Das Reboot. A short discussion ensued, and I sent him the link to that newsletter. Having read the bit about Das Reboot (I was talking about how SAP had helped the German national team win the 2014 FIFA World Cup) and the subsequent section of the newsletter, my friend remarked that I could have used that newsletter edition as a “statement of purpose for my job hunt”.

Now that my job hunt is done, and I’m no longer in the job market, I don’t need an SOP. However, so that I don’t forget this, and keep it in mind the next time I’m applying for a job, I’m reproducing a part of that newsletter here. Even if you subscribed to that newsletter, I recommend that you read it again. It’s been a long time, and this is still relevant.

Das Reboot

This is not normally the kind of book you’d see being recommended in a Data Science newsletter, but I found enough in Raphael Honigstein’s book on the German football renaissance in the last 10 years for it to merit a mention here.

So the story goes that prior to the 2014 edition of the Indian Premier League (cricket), Kolkata Knight Riders had announced a partnership with tech giant SAP, and claimed that they would use “big data insights” from SAP’s HANA system to power their analytics. Back then, I’d scoffed, since I wasn’t sure if the amount of data generated in all cricket matches till then was big enough to merit “big data analytics”.

As it happens, the Knight Riders duly won that edition of the IPL. Perhaps coincidentally, SAP entered into a partnership with another champion team that year – the German national men’s football team, and Honigstein dedicates a chapter of his book to this, and other, partnerships, and the role of analytics in helping the team’s victory in that year’s World Cup.

If you look past all the marketing spiel (“HANA”, “big data”, etc.) what SAP did was to group data, generate insights and present them to the players in an easily consumable format. So in the football case, they developed an app for players where they could see videos of specific opponents doing things. It made it easy for players to review certain kinds of their own mistakes. And so on. Nothing particularly fancy; simply simple data put together in a nice easy-to-consume format.

A couple of money quotes from the book. One on what makes for good analytics systems:

‘It’s not particularly clever,’ says McCormick, ‘but its ease of use made it an effective tool. We didn’t want to bombard coaches or players with numbers. We wanted them to be able to see, literally, whether the data supported their gut feelings and intuition. It was designed to add value for a coach or athlete who isn’t that interested in analytics otherwise. Big data needed to be turned into KPIs that made sense to non-analysts.’

And this one on how good analytics can sometimes invert hierarchies, and empower the people on the front to make their own good decisions rather than always depend on direction from the top:

In its user-friendliness, the technology reversed the traditional top-down flow of tactical information in a football team. Players would pass on their findings to Flick and Löw. Lahm and Mertesacker were also allowed to have some input into Siegenthaler’s and Clemens’ official pre-match briefing, bringing the players’ perspective – and a sense of what was truly relevant on the pitch – to the table.

A lot of business analytics is just about this – presenting the existing data in an easily consumable format. There might be some statistics or machine learning involved somewhere, but ultimately it’s about empowering the analysts and managers with the right kind of data and tools. And what SAP’s experience tells us is that it may not be that bad a thing to tack on some nice marketing on top!

Hiring data scientists

I normally don’t click through on articles in my LinkedIn feed, but this article about the churn in senior data scientists caught my eye enough for me to click through and read the whole thing. I must admit to some degree of confirmation bias – the article reflected my thoughts a fair bit.

Given this confirmation bias, I’ll spare you my commentary and simply put in a few quotes:

Many large companies have fallen into the trap that you need a PhD to do data science, you don’t.

Not to mention, I have yet to see a data science program I would personally endorse. It’s run by people who have never done the job of data science outside of a lab. That’s not what you want for your company.

Doing data science and managing data science are not the same. Just like being an engineer and a product manager are not the same. There is a lot of overlap but overlap does not equal sameness.

Most data scientists are just not ready to lead the teams. This is why the failure rate of data science teams is over 90% right now. Often companies put a strong technical person in charge when they really need a strong business person in charge. I call it a data strategist.

I have worked with companies that demand agile and scrum for data science and then see half their team walk in less than a year. You can’t tell a team they will solve a problem in two sprints. If they don’t have the data or tools it won’t happen.

I’ll end this blog post with what my friend had to say (yesterday) about what I’d written about how SAP helped the German national team. “This is what everyone needs to do first. (All that digital transformation everyone is working on should be this kind of work)”.

I agree with him on this.