## SHAP and WAR

A few months back, at work, a couple of kids in my team taught me this concept called “SHAP”. I won’t go into the technical details here (or maybe I will later on in this post), but it is basically an algo that helps us explain a machine learning model.

It was one of those concepts that I found absolutely mind-blowing, to the extent that after these guys taught this concept to me, it became the proverbial hammer, and I started looking for “nails” all around the company. I’m pretty sure I’ve abused it (SHAP I mean).

Most of the documentation of SHAP is not very good, as you might expect of something that is very deeply technical. So maybe I’ll give a brief intro here. Or maybe not – it’s been a few months since I started using and abusing it, and so I’ve forgotten the maths.

In any case, this is one of those concepts that made me incredibly happy on the day I learnt about it. To put it briefly, you zero out an explanatory variable (or replace it with some baseline value) and see what the model predicts with the rest of the variables. The difference between this and the actual model output, approximately speaking, is the contribution of this explanatory variable to this particular prediction.
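Here is a minimal sketch of that “zero it out and compare” idea. It is not the SHAP library, and the toy linear model, its weights and the all-zero baseline are all made up for illustration. Real SHAP goes further and averages this kind of difference over many combinations of the other variables, but for a simple additive model like this one the single-variable version already gives the right flavour.

```python
import numpy as np

# Hypothetical toy model: a linear predictor with made-up weights.
weights = np.array([2.0, -1.0, 0.5])
intercept = 3.0

def predict(x):
    """Toy model output for a single observation x."""
    return float(weights @ x) + intercept

def ablation_contribution(predict, x, baseline, i):
    """Actual prediction minus the prediction with variable i set to its baseline."""
    x_ablated = x.copy()
    x_ablated[i] = baseline[i]
    return predict(x) - predict(x_ablated)

x = np.array([1.0, 4.0, 2.0])   # the observation we want to explain
baseline = np.zeros(3)          # assumption: "zeroing out" means replacing with 0

contributions = [ablation_contribution(predict, x, baseline, i) for i in range(len(x))]
print(contributions)
```

Because this toy model is additive, the contributions add up to the gap between the actual prediction and the all-baseline prediction. With interactions between variables that stops being true for one-at-a-time ablation, which is roughly why SHAP averages over combinations.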

The beauty of SHAP is that you can calculate the value for hundreds of explanatory variables and millions of observations in fairly quick time. And that’s what’s led me to use and abuse it.

In any case, I was reading something about American sport recently, and I realised that SHAP is almost identical (in concept, though not in maths) to Wins Above Replacement (WAR).

WAR works the same way – a player is replaced by a hypothetical “replacement-level” player (roughly, the kind of readily available substitute a team could bring in at minimal cost, usually somewhat below league average), and the model calculates how much the team would have won in that case. A player’s WAR is thus the difference between the “actuals” (what the team has actually won) and the hypothetical if this particular player had been swapped for that replacement.

This, if you think about it, is very similar to zeroing out the idiosyncrasies of a particular player. So – let’s say you had a machine learning model where you had to predict wins based on certain sets of features of each player (think of the features they put on those otherwise horrible spider charts when comparing footballers).

You build this model. And then to find out the contribution of a particular player, you get rid of all of this person’s features (or replace them with the “average” across all data points). And then you look at the prediction and how different it is from the “actual prediction”. Depending on how you look at it, it can either be SHAP or WAR.
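To make the player version concrete, here is a rough sketch. Everything in it is hypothetical: the `predict_wins` model, its weights, the two features per player and the `league_average` row are invented for illustration, and real WAR calculations are far more involved. The only point is that swapping out a whole player is the same “replace and compare” trick applied to a group of features at once.

```python
import numpy as np

# Hypothetical win model: each player has two made-up features (say, attack and defence).
FEATURE_WEIGHTS = np.array([30.0, 20.0])
BASE_WINS = 10.0

def predict_wins(team):
    """Predicted wins for a team given as an (n_players, n_features) array."""
    return float((team @ FEATURE_WEIGHTS).sum()) + BASE_WINS

def wins_above_replacement(predict, team, replacement, player):
    """Predicted wins with the real player minus predicted wins with the stand-in."""
    swapped = team.copy()
    swapped[player] = replacement   # replace all of this player's features at once
    return predict(team) - predict(swapped)

team = np.array([
    [0.9, 0.8],   # star player
    [0.5, 0.5],   # exactly average player
    [0.4, 0.6],   # slightly below average overall
])
league_average = np.array([0.5, 0.5])   # assumption: the stand-in is the "average" player

contributions = [
    wins_above_replacement(predict_wins, team, league_average, p)
    for p in range(len(team))
]
print(contributions)
```

Read the numbers as explanations of a prediction and you have something SHAP-like; read them as a player rating and you have something WAR-like. Using a weaker `replacement` row instead of the average would simply shift every player’s number upwards.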

In other words, the two concepts are pretty much exactly the same!

## The Problem With American Sport

There was a basketball epidemic when I was in high school. It was probably a result of two things – we used to play basketball regularly in school, and Star Sports (or was it still Prime Sports?) had started showing live games from the NBA. Everyone in school would talk about basketball. Your knowledge of basketball went beyond the Magic Johnsons and Michael Jordans. You learnt about teams with wonderful names such as “Utah Jazz”. And for reasons completely unknown to me, despite having never watched him play (I still haven’t) Patrick Ewing became my favourite player.

So one morning I decided to see what the fuss about the NBA was all about, and watched a game. It made for horrible viewing. There were great plays, of course. It was a great spectator sport in that sense. But what annoyed me endlessly were the time outs and consequent advertising breaks. Just when I would get settled into the rhythm of the game, someone would call a time out, and for the mid-90s, two minutes of advertising was a really long time!

I still continued to watch, for “pseud value”, so that I could talk about it in school. However, I could never get the kind of engagement that I could get with cricket (then) or football (now). The game was simply way too discontinuous. An NBA game is supposed to last 48 minutes, but these things would last three times as long. I don’t think I watched more than 2-3 games.

As everyone on my Facebook timeline talks about the Super Bowl, the only thing I can think of is how unwatchable American sport is. I understand that you need the ads to fund the game, and that greater advertising revenue means greater revenue for players and hence greater quality of sport. What irks me, however, is that these ads end up causing so much discontinuity in the sport.