baseball – Pertinent Observations

A few months back, at work, a couple of kids in my team taught me this concept called “SHAP”. I won’t go into the technical details here (or maybe I will later on in this post), but it is basically an algo that helps us explain a machine learning model.

It was one of those concepts that I found absolutely mind-blowing, to the extent that after these guys taught this concept to me, it became the proverbial hammer, and I started looking for “nails” all around the company. I’m pretty sure I’ve abused it (SHAP I mean).

Most of the documentation of SHAP is not very good, as you might expect of something so deeply technical. So maybe I’ll give a brief intro here. Or maybe not – it’s been a few months since I started using and abusing it, and so I’ve forgotten the maths.

In any case, this is one of those concepts that made me incredibly happy on the day I learnt about it. To put it briefly, you zero out an explanatory variable and see what the model predicts with the rest of the variables. The difference between this and the actual model output is, approximately speaking, the contribution of this explanatory variable to this particular prediction.
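To make the “zero it out and look at the difference” idea concrete, here is a minimal sketch in Python. It is only the simplified version described above, not the actual SHAP algorithm (which, roughly speaking, averages such differences over many subsets of the other variables). The model, its weights, the feature names and the function `knockout_contribution` are all made up for illustration.

```python
# Toy model: a hand-written linear predictor. The weights are invented
# purely for illustration; a real model would be trained on data.
WEIGHTS = {"age": 0.5, "income": 2.0, "tenure": -1.0}


def predict(row):
    """Return the model output for one observation (dict of feature -> value)."""
    return sum(WEIGHTS[name] * value for name, value in row.items())


def knockout_contribution(row, feature, baseline=0.0):
    """Actual prediction minus the prediction with `feature` set to `baseline`."""
    altered = dict(row)          # copy, so the original observation is untouched
    altered[feature] = baseline  # "zero out" the explanatory variable
    return predict(row) - predict(altered)


row = {"age": 30.0, "income": 4.0, "tenure": 2.0}
contributions = {name: knockout_contribution(row, name) for name in row}
print(contributions)
```

For a linear toy model like this one, the contributions add up to the prediction itself. With a model that has interactions between variables they generally won’t, which is part of why the real thing needs more maths than this.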

The beauty of SHAP is that you can calculate the value for hundreds of explanatory variables and millions of observations in fairly quick time. And that’s what’s led me to use and abuse it.

In any case, I was reading something about American sport recently, and I realised that SHAP is almost identical (in concept, though not in maths) to Wins Above Replacement (WAR).

WAR works the same way – a player is replaced by a hypothetical “average similar player” (the replacement), and the model calculates how many games the team would have won in that case. A player’s WAR is thus the difference between the “actuals” (what the team has actually won) and the hypothetical if this particular player had been replaced by the average replacement.

This, if you think about it, is very similar to zeroing out the idiosyncrasies of a particular player. So – let’s say you had a machine learning model where you had to predict wins based on certain sets of features of each player (think of the features they put on those otherwise horrible spider charts when comparing footballers).

You build this model. And then to find out the contribution of a particular player, you get rid of all of this person’s features (or replace them with the “average” across all data points). And then look at the prediction and how different it is from the “actual prediction”. Depending on how you look at it, it can either be SHAP or WAR.
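Here is a rough sketch of that thought experiment in Python. Everything in it is hypothetical: the squad, the two features, and `predict_wins`, which is a hand-written stand-in for whatever model you would actually train. It follows the “replace with average” simplification used in this post.

```python
# Hypothetical squad: each player is described by two made-up features.
SQUAD = {
    "player_a": {"attack": 9.0, "defence": 3.0},
    "player_b": {"attack": 5.0, "defence": 5.0},
    "player_c": {"attack": 4.0, "defence": 7.0},
}
FEATURES = ("attack", "defence")


def predict_wins(squad):
    """Stand-in for a trained model: wins grow with total attack and defence."""
    total_attack = sum(p["attack"] for p in squad.values())
    total_defence = sum(p["defence"] for p in squad.values())
    return 0.5 * total_attack + 0.25 * total_defence


def average_player(squad):
    """The 'replacement': every feature set to the squad-wide mean."""
    n = len(squad)
    return {f: sum(p[f] for p in squad.values()) / n for f in FEATURES}


def wins_above_replacement(squad, name):
    """Predicted wins with the real player minus wins with the replacement."""
    swapped = dict(squad)                    # shallow copy of the squad
    swapped[name] = average_player(squad)    # swap in the average player
    return predict_wins(squad) - predict_wins(swapped)


war = {name: wins_above_replacement(SQUAD, name) for name in SQUAD}
print(war)
```

Read the output per player and you would call it WAR. Do the same swap one feature at a time instead of one player at a time and you are back to the SHAP-style contribution.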

In other words, the two concepts are pretty much exactly the same!

There was a basketball epidemic when I was in high school. It was probably a result of two things – we used to play basketball regularly in school, and Star Sports (or was it still Prime Sports?) had started showing live games from the NBA. Everyone in school would talk about basketball. Your knowledge of basketball went beyond the Magic Johnsons and Michael Jordans. You learnt about teams with wonderful names such as “Utah Jazz”. And for reasons completely unknown to me, despite having never watched him play (I still haven’t) Patrick Ewing became my favourite player.

So one morning I decided to see what the fuss about the NBA was, and watch a game. It made for horrible viewing. There were great plays, of course. It was a great spectator sport in that sense. But what annoyed me endlessly were the time outs and consequent advertising breaks. Just when I would get settled into the rhythm of the game, someone would call a time out, and for the mid 90s, two minutes of advertising was a really long time!

I still continued to watch, for “pseud value”, so that I could talk about it in school. However, I could never get the kind of engagement that I could get with cricket (then) or football (now). The game was simply way too discontinuous. A game of basketball is supposed to last 40 minutes, but these things would last three times as long. I don’t think I watched more than 2-3 games.

As everyone on my Facebook timeline talks about the Super Bowl, the only thing I can think of is how unwatchable American sport is. I understand that you need the ads to fund the game, and that greater advertising revenue means greater revenue for players and hence greater quality of sport. What irks me, however, is that these ads end up causing much discontinuity in the sport.

So this morning I was thinking about why I get irked so much about ads in American sports (basketball, American football, etc.) while I can still watch cricket, which has a fair share of ads. The answer lies in randomness. I know when a cricket telecast will switch to ads – at the end of every over, at the fall of a wicket, or in an innings break. When an advertisement comes in a cricket broadcast, I’m prepared for it (except of course, when greedy broadcasters cut to ads before the full over is bowled). It is a similar case in tennis, where I expect to switch to advertisements after every two games – there is a rhythm to it.

In American sport, it is not so. Teams can call for a time out at any point in time, and that can completely put you off. The game cuts to advertisements at moments when you least expect it, and that can be a huge challenge for someone not used to it!

A year or so ago, I had attended this lecture on sports analytics in Bangalore, delivered by a University of Chicago professor. He said that the reason football hasn’t taken off in the US is that it is not television friendly. “Split a game into four quarters, introduce two time outs in each quarter, and you will see Major League Soccer taking off”, he said. The problem, however, is that this would simply ruin the continuity of the game – which is what a lot of people love about football. And looking at the funding of the clubs in the major European leagues, it is clear that football is making sufficient money from television in its current form, without any gimmickry.

An American colleague at my last job offered another perspective. “How can you watch a game continuously for 45 minutes”, he asked. “We are so used to breaks in play every few minutes that we can’t watch continuously for so long”. If I can extrapolate from this one data point and take it in conjunction with what the professor said, you know why football is not popular in the US.

When I woke up this morning I wanted to check if the Super Bowl was being telecast in India. Then I remembered my earlier experiences of trying to watch American football, and decided against it. It is too discrete a game for my liking. There are too many breaks in play. I’d any day watch rugby instead! It is a similar game but so much more elegant and continuous!

Tag: baseball

SHAP and WAR

The Problem With American Sport