Core quants and desk quants on main street

The more perceptive of you might have realised that I’m in the job market.

Over the last month, my search has mostly been “breadth first” (lots of exploratory conversations with lots of companies), and I’m only now starting to “go deep” into some of them. As part of this process, I need to send out a pitch to a company I’ve been in conversation with, regarding what I can do for them.

So I’ve been thinking of how to craft my mandate while keeping in mind that they have an existing data science team. And while I was thinking about this problem, I realised that I can model it like how investment banks (at least one that I worked for) do – in terms of “core quants” and “desk quants”.

I have written about this on my blog before – most “data scientists” in industry are equivalent to what investment banks call “core quants”. They are usually highly technically accomplished people; in many cases they are people who were on an academic path that they left to turn to industry. They do very well in “researchy” environments.

They’re great at running long-gestation-period assignments, working on well defined technical problems and expressing their ideas in code. In general, though (I know I’m massively generalising), they are not particularly close to the business and struggle to deal with the ambiguities that business throws at them from time to time.

What I had mentioned in my earlier post is that “main street” (the American word for “general industry”) lacks “desk quants”. In investment banks, desk quants are attached to trading desks and work significantly closer to the business. They may work less on firmwide or long term strategic projects, but their strength is in blending the models and the markets, and building and making simple tweaks to models so that they remain relevant to the business.

And this is the sort of role in which I’m planning to pitch myself – to all potential employers. That while I’m rather comfortable technically, and with all sorts of different modelling techniques, I’m not “deep into tech” and like to work close to the markets. I realise that this analogy will be lost on most people, so I need to figure out a better way of marketing myself. Any ideas will be appreciated.

Over the last month or so I’ve been fairly liberal in using my network to get introductions and references. The one thing I’ve struggled with there is how they describe me. Most people end up describing me as a “data scientist”, and I’m not sure that’s an accurate description of what I do. Then again, it’s my responsibility to help them figure out how best to describe me. And that’s another thing I’m struggling with. “Desk quant” doesn’t translate well.

Tautological Claims

Sometimes the media can’t easily reason on what led to something that they consider to be negative. In such cases they resort to tautologies. One version of this was seen in the late 2000s, during the Global Financial Crisis. The crisis “was caused by greed”, claimed many a story. “It is because of the greed of a handful of bankers that we have to suffer”, they said.

Fast forward ten to twelve years, and the global financial crisis is behind us (though many economies aren’t yet doing as well as they were before that crisis). The big problem that a lot of people are facing is addiction – to their smartphones, to apps, to social media, and so on. Once again, media at large seems to have been unable to reason effectively on why this addiction is happening. And so once again, they are resorting to “tautologies”.

“Apps are engineered so that you engage more with them”, they say. If you ask the product manager in charge of the app, you will find out that his metric is to increase user engagement, and make sure people spend more time on the app. “Apps use psychological tools to make you spend more time on them”, the outlets write, as if that is a bad thing.

However, if you are an overstretched product manager hard-pressed to increase engagement, it is no surprise that you would use every possible method, logical and psychological, to do so. And if that means relying on psychological research that talks about how to increase addiction, so be it!

It is tautological that social media companies “want to increase engagement” or “want to increase the amount of time people spend on the platforms”, and that they will try to achieve these goals. So when media agencies talk about these goals as something to be scared about, it’s like they’re bullshitting – there’s absolutely no information that is being added in such headlines.

It is similar to how a decade and a bit ago the same media decided to blame a fundamental human tendency – greed – for the financial crisis.

A banker’s apology

Whenever there is a massive stock market crash, like the one in 1987, or the crisis in 2008, it is common for investment banking quants to talk about how it was a “1 in zillion years” event. This is on account of their models that typically assume that stock prices are lognormal, and that stock price movement is Markovian (today’s movement is uncorrelated with tomorrow’s).

In fact, a cursory look at recent data shows that what models deem a one in zillion years event actually happens every few years, or decades. In other words, while quant models do pretty well in the average case, they have thin “tails” – they underestimate the likelihood of extreme events, leading to a build-up of risk.
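
To see just how extreme the “1 in zillion years” claim is under those assumptions, here is a minimal sketch. The volatility figure is a typical illustrative assumption, not any particular bank’s model:

```python
import math

# A toy calculation, not any bank's actual model: assume i.i.d. normal daily
# returns with ~1% volatility (a typical equity-index figure).
daily_vol = 0.01
crash = -0.20                           # a 1987-style one-day drop of about 20%

z = crash / daily_vol                   # a 20-sigma move
p = 0.5 * math.erfc(-z / math.sqrt(2))  # P(Z <= -20) for a standard normal

print(f"{abs(z):.0f}-sigma move; model probability per day ~ {p:.1e}")
# ~ 3e-89: a "1 in zillion years" event on paper, even though crashes of this
# order have happened within living memory (the thin-tails problem).
```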

When I decided to end my (brief) career as an investment banking quant in 2011, I wanted to take the methods that I’d learnt into other industries. While “data science” might have become a thing in the intervening years, there is still a lot for conventional industry to learn from banking in terms of using maths for management decision-making. And this makes me believe I’m still in business.

And like my former colleagues in investment banking quant, I’m not immune to the fat tail problem either – replicating solutions from one domain in another can replicate the problems as well.

For a while now I’ve been building what I think is a fairly innovative way to represent a cricket match. Basically you look at how the balance of play shifts as the game goes along. So the representation is a line graph that shows where the balance of play was at different points of time in the game.

This way, you have a visualisation that at one shot tells you how the game “flowed”. Consider, for example, last night’s game between Mumbai Indians and Chennai Super Kings. This is what the game looks like in my representation.

What this shows is that Mumbai Indians got a small advantage midway through the innings (after a short blast by Ishan Kishan), which they held through their innings. The game was steady for about 5 overs of the CSK chase, when some tight overs created pressure that resulted in Suresh Raina getting out.

Soon, Ambati Rayudu and MS Dhoni followed him to the pavilion, and MI were in control, with CSK losing 6 wickets in the course of 10 overs. When they lost Mark Wood in the 17th over, Mumbai Indians were almost surely winners – my system reckoning that 48 to win in 21 balls was near-impossible.

And then Bravo got into the act, putting on 39 in 10 balls with Imran Tahir watching at the other end (including taking 20 off a Mitchell McClenaghan over, and 20 again off a Jasprit Bumrah over at the end of which Bravo got out). And then a one-legged Jadhav came, hobbled for 3 balls and then finished off the game.

Now, while the shape of the curve in the above graph is representative of what happened in the game, I think it went too close to the axes. 48 off 21 with 2 wickets in hand is not easy, but it’s not a 1% probability event (as my graph depicts).

And looking into my model, I realise I’ve made the familiar banker’s mistake – of assuming independence and the Markovian property. I calculate the probability of a team winning using a method called “backward induction” (which I’d learnt during my time as an investment banking quant). It’s the same method that the WASP system for evaluating odds (invented by a few Kiwi scientists) uses, and as I’d pointed out in the past, WASP has the thin tails problem as well.
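
To make the method concrete, here is a minimal sketch of backward induction for the chasing side’s win probability. The per-ball outcome distribution is entirely made up for illustration, and the crucial (flawed) assumption is that every ball is independent of the last:

```python
# A minimal sketch of backward-induction win probability, assuming each ball
# is independent -- the very assumption that produces thin tails.
from functools import lru_cache

# (runs scored, wicket falls?, probability) -- hypothetical T20 numbers
OUTCOMES = [(0, False, 0.35), (1, False, 0.35), (2, False, 0.08),
            (3, False, 0.01), (4, False, 0.10), (6, False, 0.06),
            (0, True,  0.05)]

@lru_cache(maxsize=None)
def p_chase(balls_left: int, runs_needed: int, wickets_left: int) -> float:
    """Probability the batting side wins from this state."""
    if runs_needed <= 0:
        return 1.0
    if balls_left == 0 or wickets_left == 0:
        return 0.0
    return sum(p * p_chase(balls_left - 1,
                           runs_needed - runs,
                           wickets_left - (1 if wicket else 0))
               for runs, wicket, p in OUTCOMES)

# e.g. 48 needed off 21 balls with 2 wickets in hand
print(p_chase(21, 48, 2))   # tiny under independence -- exactly the thin-tails issue
```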

As Seamus Hogan, one of the inventors of WASP, had pointed out in a comment on that post, one way of solving this thin tails issue is to control for the pitch or regime, and I’ve incorporated that as well (using a Bayesian system to “learn” the nature of the pitch as the game goes on). Yet, I see I struggle with fat tails.
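
As a minimal sketch of what “learning the pitch” could look like, here is a simple Beta-Binomial update of the per-ball boundary probability. The prior numbers and the class itself are my illustrative assumptions, not the actual WASP model or my own production system:

```python
# Treat the per-ball boundary probability as Beta-distributed and update it
# after every ball; the posterior sharpens as the game goes on.
class PitchEstimator:
    def __init__(self, prior_boundaries: float = 3.0, prior_other: float = 17.0):
        # prior roughly encodes "about 3 boundaries per 20 balls"
        self.a = prior_boundaries
        self.b = prior_other

    def update(self, was_boundary: bool) -> None:
        if was_boundary:
            self.a += 1
        else:
            self.b += 1

    @property
    def boundary_prob(self) -> float:
        return self.a / (self.a + self.b)

est = PitchEstimator()
for ball in [False, False, True, False, True, True]:   # a hypothetical sequence
    est.update(ball)
print(round(est.boundary_prob, 3))   # posterior mean of the boundary probability
```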

I seriously need to find a way to take serial correlation into account in my models!

That said, I must say I’m fairly kicked about the system I’ve built. Do let me know what you think of this!

Upside down pricing in payment services

Some Indian banks charge for services that are cheap to execute, and offer expensive services for free.

Last week I ended up spending some time waiting at a teller counter at a bank. This was due to some mess-up with a cheque I had received. During my time at the teller counter I had the opportunity to observe other people at the same counter.

There were a few people depositing cash into their business accounts. A few others were depositing cheques. What caught my attention, however, was this guy from a nearby business who came to deposit a large number of cheques. 

He had an entire book of challan leaves (banks regularly issue those to business customers), to each of which was stapled a cheque. As I watched, the teller would put a seal on a cheque, its corresponding challan and another seal on the counter foil. This process was repeated for each challan in the book. 

And this process was only to accept the cheques. Later on there would’ve been further effort on the part of the bank to cash the cheques and actually execute the fund transfers. And then add in the effort of writing out all those cheques, writing out all those challans (they’re hard to print) and taking them to the bank.

It was a rather laborious process all round, on the part of all parties involved. Yet, banks mostly execute this function for free for most customers.

On the other hand, they charge for account-to-account transfers, and the amount isn’t particularly small. Like this morning I was moving money from one account to another, a process that took me a minute and that wouldn’t have cost the bank any human minutes. And ICICI Bank decided to charge me for it.

It seems like banks have their pricing and the valuation of their own effort all wrong. For electronic payments the cost is direct – what the banks have to pay the payments systems and any per use software costs. And this makes it easier to value and charge for such services. 

The effort in transacting through cheques, on the other hand, is not directly measurable (though by no means an impossible exercise). There are back offices that do the job whose cost is easy to measure, but several employees who also do other things spend time processing cheques. And this difficulty in measurement means that most banks just don’t charge for cheques. 

Around 2000, when foreign banks expanded their branch networks in India, there was an attempt to charge customers for walking into the branch – customers were encouraged to do their business at ATMs or over the phone instead. This was in recognition of the costs of customer walk-ins into branches.

Banks would do well now to do something similar for cheques as well – despite the cheque truncation system (CTS), the effort involved in organising payments through cheques is massive for the bank. 

There is only one upside to cheques – and this is a downside for customers. Cheques result in money going into limbo. The payer doesn’t know when the funds will leave his account and can’t use the funds. The recipient can’t use it either until he has got it. So for the duration that the amount is “in transit” (and this duration can vary significantly) banks can happily use these funds without them being called. 

It’s possible that the benefit to the banks from this float more than compensates for the pain of processing cheques. If not, cheques have no business existing any more! 

Financial inclusion and cash

Varad Pande and Nirat Bhatnagar have an interesting Op-Ed today in Mint about financial inclusion, and about how financial institutions haven’t been innovative to make products that are suited to the poor, and how better user interface can also drive financial inclusion. I found this example they took rather interesting:

Take, for instance, a daily wager who makes Rs200 on the days she gets work. Work is unpredictable, and expenses too can be volatile, so she has to borrow money for buying vegetables, or to pay the doctor’s fees when her children fall sick. Her real need is for a flexible—small ticket, variable amount, rapid approval—loan product that she can access instantly. Unfortunately, no institutional channel—neither the public sector bank where she has a “no frills” account, nor the MFI that she has previously borrowed from—offers such a product. She ends up borrowing from neighbours, often from the local moneylender.

Now, based on my experience in FinTech, it is not hard to design a loan product for someone whose cash flows are known. The bank statement is nothing but a continuing story of the account holder’s life, and if you can understand the cash flows (both in and out) for a reasonable period of time, it is straightforward to design a loan product that fits that cash flow pattern.
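
As a minimal sketch of what that can look like once a bank statement is available (the function, the EMI-to-surplus fraction and the figures are my illustrative assumptions, not any lender’s actual underwriting model):

```python
# Estimate a borrower's typical monthly surplus from statement transactions
# and cap the EMI at a fraction of it -- a sketch, not a real credit model.
from collections import defaultdict
from datetime import date

def max_affordable_emi(transactions: list[tuple[date, float]],
                       emi_to_surplus: float = 0.5) -> float:
    """transactions: (date, signed amount); inflows positive, outflows negative."""
    monthly = defaultdict(float)
    for d, amount in transactions:
        monthly[(d.year, d.month)] += amount
    surpluses = sorted(monthly.values())
    # be conservative: use the median monthly surplus, floored at zero
    median_surplus = max(0.0, surpluses[len(surpluses) // 2])
    return emi_to_surplus * median_surplus

txns = [(date(2016, 11, 3), 6000.0), (date(2016, 11, 20), -4500.0),
        (date(2016, 12, 5), 6500.0), (date(2016, 12, 22), -5000.0)]
print(max_affordable_emi(txns))   # half the median monthly surplus
```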

The key thing, however, is that you need to have full information on transactions, in terms of when cash comes in and goes out, what the cash outflow is used for, and all that. And that is where the cash economy is a bit of a bummer.

For a banker who is trying to underwrite, and decide the kind of loan product (and interest rate) to offer to a customer, the customer’s cash transactions obscure information; information that could’ve been used by the bank to design/structure/recommend the appropriate product for the customer.

For the case that Pande and Bhatnagar take, if all inflows and outflows are in cash, there is little beyond the potential borrower’s word that can convince bankers of the borrower’s creditworthiness. And so the potential borrower is excluded from the system.

If, on the other hand, the potential borrower were to have used non-cash means for all her transactions, bankers would have had a full picture of her life, and would have been able to give her an appropriate loan!

In this sense, I think so far financial inclusion has been going on ass-backwards, with most microfinance institutions (MFIs) targeting loans rather than deposits. And with little data to base credit on, it’s resulted in wide credit spreads and interest rates that might be seen as usurious.

Instead, if banks and MFIs had gone the other way, first getting customers to deposit, and then use the bank account for as much of their transactions as possible, it would have been possible to design much better financial products, and include more customers!

The current disruption in the cash economy possibly offers banks and MFIs a good chance to rectify their errors so far!

Intermediation and the battle for data

The Financial Times reports ($) that thanks to the rise of Alipay and WeChat’s payment system, China’s banks are losing significantly in terms of access to customer data. This is on top of the $20 billion or so they’re losing directly in terms of fees because of these intermediaries.

But when a consumer uses Alipay or WeChat for payment, banks do not receive data on the merchant’s name and location. Instead, the bank record simply shows the recipient as Alipay or WeChat.

The loss of data poses a challenge to Chinese banks at a time when their traditional lending business is under pressure from interest-rate deregulation, rising defaults, and the need to curb loan growth following the credit binge. Big data are seen as vital to lenders’ ability to expand into new business lines.

I had written earlier on my blog about how intermediaries such as Swiggy or Grofers, by offering a layer between the restaurant or shop and the consumer, now have access to the consumer’s data which earlier resided with the retailer.

What is interesting is that before businesses realised the value of customer data, they had plenty of access to such data and were doing little to leverage and capitalise on it. And now that people are realising the value of data, new intermediaries that are coming in are capturing the data instead.

From this perspective, the Universal Payment Interface (UPI) that launched last week is a key step for Indian banks to hold on to customer data which they could have otherwise lost to payment wallet companies.

Already, some online payments are listed on my credit card statement in the name of the payment gateway rather than in the name of the merchant, denying the credit card issuers data on the customer’s spending patterns. If the UPI can truly take off as a successor to credit cards (rather than wallets), banks can continue to harness customer data.

Banks starting to eat FinTech’s lunch?

I’ve long maintained that the “winner” in the “battle” for payments will be the conventional banking system, rather than one of the new “wallet” or “payment service providers”. This view is driven by the advances being made by the National Payments Corporation of India (NPCI) which is owned by a consortium of banks.

First there was the Immediate Payment System (IMPS), which allows you to make instant inter-bank transfers. While the technology is great, evangelism and product management on the banks’ part have been lacking, thanks to which it has failed to take off. In the meantime NPCI has come up with an even better protocol called the Universal Payment Interface (UPI), which should launch commercially later this year.

There is hope that banks will do a better job of managing this (there are positive signs of that), and if they do, a lot of the payment service providers might have to partner with banks (the BookMyShow wallet is already powered by RBL, the artist formerly known as Ratnakar Bank Limited).

In the meantime, banks have started encroaching on FinTech territory elsewhere. One of the big promises of FinTech (and one I’ve participated in, consulting with two companies in the space) has been to ease the lending process, by cutting through banks’ tedious procedures and making it a much more hassle-free process for borrowers.

A risk in this business, of course, has been that if banks set their eyes on this business, they can eat up the upstarts by doing the same thing cheaper – banks, after all, have access to far cheaper capital, and all that is required is a procedural overhaul. The premise in the FinTech business is that banks are large slow-moving creatures, and it will take time for them to change their processes.

Two recent pieces of news, however, suggest that large banks may be coming at FinTech far sooner than we expected. And both these pieces of news have to do with India’s largest lender State Bank of India (SBI).

One popular method for FinTech to grow has been to finance sellers on e-commerce platforms, using non-traditional data such as rating on the platforms, sales through the platform, etc. And SBI entered this in January this year, forming a partnership with Snapdeal (one of India’s largest e-commerce stores).

Snapdeal, India’s largest online marketplace, today announced an exclusive partnership with State Bank of India to further strengthen its ecosystem for its sellers. With this association, Snapdeal sellers will be able to get approval on loans from financers solely on the basis of a unique credit scoring model. There will be no requirement of any financial statements and collaterals.

Sellers on the marketplace can apply for loans online and get immediate sanction, thereby enabling “loans at the click of a button”. This innovative product moves away from traditional lending based on financial statements like balance sheet and income tax returns. Instead, it uses proprietary platform data and surrogate information from public domain to assess the seller’s credit worthiness for sanctioning of loan.

Another popular method to expand FinTech has been to lend to customers of e-commerce stores. And in a newly announced partnership, SBI is there again, this time financing purchases on the Flipkart platform.

State Bank of India, the country’s largest bank, announced a series of digital initiatives on Friday, including a first of its kind partnership with e-commerce giant Flipkart, to offer bank customers a pre-approved EMI facility to purchase products on the retailer’s website.

The bank, which celebrates its 61st anniversary (State Bank Day) on July 1, said the objective was to provide finance to credit worthy individuals, and not just credit card holders. The EMI facility will be available in tenures of six, nine and 12 months.

Just last evening, I was telling someone that there’s no hurry to get into FinTech since it will take a decade for the industry to mature, so it’s not a problem if one enters late. However, looking at the above moves by SBI, it seems the banks are coming faster!


Bonuses and federalism

I spent a couple of years working for an investment bank, and the way they would distribute (the rather hefty) bonuses in the organization was rather interesting. Each manager in the firm would receive two sums – the first was his own bonus, and the second was the bonus to be distributed among all his subordinates. If any of the said subordinates were managers themselves, they would similarly receive two sums – separately for themselves and for their subordinates.

This is pertinent in relation to the devolution of power between the states and the third level of government. Even though district, taluk and city governments have been empowered by the 73rd and 74th amendments, they don’t have much real power because their finances are controlled by their respective state governments. In banking terms, this is like giving a manager one pot, and asking him to divide it between himself and his subordinates. The incentive is obviously to distribute the minimum amount possible to keep the subordinates happy. And this is exactly what is happening to federalism in India today.

What we need is a strict rule-based formula for distributing central government revenues between the central government, states and the next level (the rule can be based on population, etc.). What we also need is a requirement for states to enact similar rules to divide revenue between the state, districts and sub-districts in a rule-based manner. Until this happens, true federalism will remain a pipe dream.

Provisioning for Non Performing Assets at Banks

K C Chakrabarty, a Deputy Governor at the Reserve Bank of India, recently made a presentation on the credit quality of Indian banks (HT: Deepak Shenoy). In this presentation Dr. Chakrabarty talks about the deteriorating quality of credit in Indian banks, especially public sector banks.

What caught my eye as I went through the presentation, however, was this graph that he presented on “Gross” and “Net” NPAs (Non-Performing Assets). Now, every bank is required to “provision” for NPAs. If I’ve lent out Rs. 100 and I estimate that I can recover Rs. 98 out of this, I need to “provision” for the other Rs. 2 which I expect to become “bad assets”. Essentially even before there is the default of Rs. 2, you account for it in your books, so that when the default does occur, it won’t be a surprise to either you or your investors.

Now, NPAs are measured in two ways – gross and net. Gross NPAs is just the total assets that you’ve lent out that you cannot recover. Net NPAs are gross NPAs less provisioning – for example, if you expected that this year Rs. 2 out of Rs. 100 will not come back, and indeed you manage to collect Rs. 98, then your Net NPA is zero, since you’ve “provisioned” for the Rs. 2 of assets that went bad. If on the other hand, you’ve expected and provisioned for Rs. 2 out of Rs. 100 to be “bad”, and you manage to collect only Rs. 97, your “Net NPA” is Re. 1, since you now have Gross NPA of Rs. 3 of which only Rs. 2 had been provisioned for.
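
In code, the arithmetic is trivial (the figures below are just the ones from the example above, not real bank numbers):

```python
# A toy illustration of the gross/net NPA arithmetic described above.
def net_npa(gross_npa: float, provisions: float) -> float:
    """Net NPA = Gross NPA less provisions held against bad assets."""
    return gross_npa - provisions

# Lent Rs. 100, provisioned Rs. 2:
print(net_npa(gross_npa=2.0, provisions=2.0))   # collected Rs. 98 -> Net NPA = 0
print(net_npa(gross_npa=3.0, provisions=2.0))   # collected only Rs. 97 -> Net NPA = 1
```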

This graph is from Dr. Chakrabarty’s presentation, indicating the movement of total NPAs (across banks, gross and net) over the years:

Source: Presentation by K C Chakrabarty, RBI Dy. Gov., via Capital Mind

What should strike you is that the net NPA number has always been strictly positive. What this means is that our banks, collectively, have never provisioned enough to offset the total quantity of loans that went bad. I’m not saying that they are not forecasting accurately enough – loan defaults are mighty hard to forecast and it is hard for the banks to get it right down to the last rupee. What I’m saying is that there seems to be a consistent bias in the forecast – banks are consistently under-forecasting the proportion of their assets that go bad, and are not provisioning enough for it. This has been a consistent trend over the years.

This fundamentally indicates a failure of regulation, on the part of both the bank regulator (RBI) and the stock market regulator (SEBI). That the banks are not provisioning enough means that they are misleading their investors by telling them that they are going to have fewer bad assets than there actually are (SEBI). That the banks are not provisioning enough also means that they are exposing themselves to a higher chance (small, but positive) of defaulting on their deposit holders (RBI).

How would this graph look if the banks were provisioning properly?

The Gross NPA line would have remained where it is, for it doesn’t depend on provisioning. However, if the banks were provisioning adequately, the Net NPA line should have been hovering around zero, going both positive and negative, but mean-reverting to zero. This is because banks would periodically over- and under-forecast their bad assets, provision accordingly, and then dynamically adjust the model. And so forth.
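
A toy simulation makes the point (all numbers invented for illustration): with unbiased forecast errors, Net NPA oscillates around zero; with a consistent under-forecast, it stays strictly positive:

```python
import random

random.seed(1)
years = 15
gross = [random.uniform(2.0, 4.0) for _ in range(years)]            # actual bad assets (% of advances)

unbiased_provision = [g + random.gauss(0, 0.5) for g in gross]      # forecast errors average to zero
biased_provision = [g - 1.0 + random.gauss(0, 0.5) for g in gross]  # systematically 1% too low

net_unbiased = [g - p for g, p in zip(gross, unbiased_provision)]
net_biased = [g - p for g, p in zip(gross, biased_provision)]

print(sum(net_unbiased) / years)   # close to zero, sign flips year to year
print(sum(net_biased) / years)     # close to +1, positive in almost every year
```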

Read the full post by Deepak to understand more about our bank assets.

Correlations: In Traffic, Mortgages and Everything Else

Getting caught in rather heavy early morning traffic while on my way to a meeting today made me think of the concept of correlation. This was driven by the fact that I noticed a higher proportion of cars than usual this morning. It had rained early this morning, and more people were taking out their cars as a precautionary measure, I reasoned.

Assume you are the facilities manager at a company which is going to move to a new campus. You need to decide how many parking slots to purchase at the new location. You know that all your employees possess both a two wheeler and a car, and use either to travel to work. Car parking space is much more expensive than two wheeler parking space, so you want to optimize on costs. How will you decide how many parking spaces to purchase?

You will correctly reason that not everyone brings their car every day. For a variety of reasons, people might choose to travel to work by scooter. You decide to use data to make your decision on parking space. For three months, you go down to the basement (of the old campus) and count the number of cars, and you diligently tabulate them. At the end of the three months, you calculate that on average (the median), thirty people bring their cars to work every day. You calculate that on ninety-five percent of the days there were forty or fewer cars in the basement, and on no occasion did the total number of cars in the basement cross forty-five.

So you decide to purchase forty car parking spaces in the new facility. It is not the same set of people who bring their cars to work every day. In fact, each employee has brought his/her car to the workplace at least once in the last three months. What you are betting on here, however, is the lack of correlation. You assume that the reason Alice brings her car to office is not related to the reason Bob brings his car to office. To put it statistically, you assume that Alice bringing her car and Bob bringing his car are independent events. Whether Alice brings her car or not has no bearing on Bob’s decision to bring his car, and vice versa. And you know that even on the odd day when more than forty people bring their cars, there are not more than forty-five cars, and you can somehow “adjust” with your neighbours to borrow the additional slots for that day. You get a certificate from the CEO for optimizing on the cost of parking space.

And then one rainy morning things go horribly wrong. Your phone doesn’t stop ringing. Angry staffers are calling you, complaining that they have no place to park. Given the heavy rain that morning, none of the staffers wanted to risk getting wet, and they have all decided to bring their cars. Never before have they faced a problem parking, so they are all confident that there will be no problem once they get to work, only to realize there is not enough parking space. Over a hundred employees have driven to work, and there are only forty slots to park in.

The problem here, as you might discover, is that of correlation. You had assumed that Alice’s reason to get her car was uncorrelated to Bob’s decision. What you had not accounted for was the possibility that there could be an exogenous event that could suddenly drive the correlation from zero to one, thus upsetting all your calculations!
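
A quick simulation illustrates both halves of the story. The employee count and the probabilities are made-up numbers, not data from any real office:

```python
import random
import statistics

random.seed(42)

# Illustrative assumptions: 100 employees, each bringing a car on a given
# day with probability 0.3, independently of everyone else.
daily_counts = [sum(random.random() < 0.3 for _ in range(100)) for _ in range(90)]

print(statistics.median(daily_counts))       # around thirty cars on a typical day
print(sorted(daily_counts)[int(0.95 * 90)])  # 95th percentile, high thirties
print(max(daily_counts))                     # low forties at worst

# The rainy morning: one exogenous event pushes everyone's probability up
# at once, and the carefully estimated percentile is suddenly irrelevant.
rainy_day = sum(random.random() < 0.9 for _ in range(100))
print(rainy_day)                             # around ninety cars for forty slots
```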

This is analogous to what happened during the Financial Crisis of 2008. Normally, Alice defaulting on her home loan is not correlated with Bob defaulting on his. So you take a thousand such loans, all seemingly uncorrelated with each other and put them in a bundle, assuming that 99% of the time not more than five loans will default. You then slice this bundle into tranches, get some of them rated AAA, and sell them on to investors (and keep some for yourself). All this while, you have assumed that the loans are uncorrelated. In fact, the independence was a key assumption in your expectation of the number of loans that will default and in your highest tranche getting a AAA rating.

Now, for reasons beyond your control and understanding, house prices drop. Soon it becomes attractive for home owners to willfully default on their loans – the value of the debt now exceeds the value of their homes. With one such exogenous event, correlations suddenly rise. Fifty loans in your pool of a thousand default (a 1 in gazillion event according to your calculations that assumed zero correlation). Your AAA tranche is forced to pay out less than full value. The lower tranches get wiped out. This and a thousand similar bundles of loans set off what ultimately became the Financial Crisis of 2008.
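
Here is a minimal sketch of the same effect in a loan pool. The default probabilities and the size of the shock are invented for illustration, not calibrated to 2008 data:

```python
import numpy as np

rng = np.random.default_rng(7)
N_LOANS, P_BASE, N_TRIALS = 1000, 0.002, 100_000

# Independent world: each loan defaults on its own, Binomial(1000, 0.2%)
indep = rng.binomial(N_LOANS, P_BASE, size=N_TRIALS)

# Correlated world: with 5% probability an exogenous shock (say, falling
# house prices) lifts every borrower's default probability to 8% at once
shock = rng.random(N_TRIALS) < 0.05
p = np.where(shock, 0.08, P_BASE)
correlated = rng.binomial(N_LOANS, p)

print(indep.max())                  # around ten at most, even over 100,000 trials
print((correlated >= 50).mean())    # roughly 5% of trials see fifty-plus defaults
```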

The point of this post is that you need to be careful about assuming correlations. It is to illustrate that sometimes an exogenous event can upset your calculations of correlations. And when you go wrong with your correlations – especially those among a large number of variables, you can get hurt real bad.

I’ll leave you with a thought: assuming you live in a primarily two wheeler city (like Bangalore, where I live), what will happen to the traffic on a day when 10% more people than usual get out their cars?