Saturday 25 January 2020

animal spirits

I like the book "Animal Spirits" quite a bit and thought it might be worth my while to take this generalist economics text and see if I can do some thinking around its contents.  The book was written by Shiller and Akerlof in the immediate aftermath of the great recession of 2008-2009.

There are no mathematical equations in this book, and it attacks Friedman on a few key points.  Shiller is of course also a Nobel prize-winning economist, whose research into measuring cycles in housing and equity markets, and whose criticism of the efficient markets hypothesis, have earned him justified praise.

I'd like to start by saying a few things about the left-right approach to economics and how this relates to mathematics.  One can trace through Keynes, Hicks, Tobin, Samuelson a strong line of economic reasoning with mathematical modelling.  Likewise with the work of Friedman, and the neo-liberal school.  But ever since Keynes there have been elements of the traditional story which operate beyond mathematical modelling.  Primary among these, of course, was his idea of animal spirits, the subject of Shiller's book.

By analogy, there is the story of the elaboration of the theory of option pricing.  Great advances were made when stochastic modelling was applied to the problem of the price of a call option.  The mathematical model was able to express randomness sufficiently well to give birth to a whole branch of economics, and to enable a host of trading strategies.

In time, modellers of options realised that they could produce better models than the Black-Scholes, with its assumption of constant volatility over the life of the option.  These models included the possibility that the level of volatility itself, during the life of the option, might move around, and perhaps even jump in a discontinuous way.  These later models are preferred by many vol traders.

I can imagine in time a similar broadening of the mathematical models of the macro-economy to allow for modelling of what we now call animal spirits.  The phrase itself points to human psychology as a source of wisdom about the source of certain cycle-inducing phenomena which recur in many modern economies.  Indeed Shiller's book takes inspiration from Kahneman and Tversky.  But before I go through that, I'd like to point out that this does not mean we ought to abandon the possibility of finding and modelling animal spirits mathematically.  There is no inherent contradiction.  Indeed, options trading itself can be driven by psychology and yet the theory of option pricing proposed so brilliantly by Black clearly enhanced our understanding of the subject and also partly structures those markets.

Keynes in his General Theory used the phrase to describe an absence in his model, the absence of a coherent explanation or model for why businesses so frequently and so comprehensively change their mind on the level of desired business investment.  Shiller also widens this analysis to many more actors - to consumers, to central bankers, to economists even, and to market participants.

He also elaborates on the primary dimensions, the pathways of expression of the idea of animal spirits: confidence, fairness, corruption and bad faith, money illusion and stories.  Through these pathways do animal spirits exert themselves.  Through these pathways also, I suggest, can stochastic models be built; though he does not do it in the book.

I'd like to spend some time on categorising and distilling these five animal spirits.  Four of the five are themselves expressed in cyclical ways; only money illusion is, so to speak, cycle agnostic - our tendency to perceive and act upon nominal prices rather than real ones, our inability to cope with inflation, seems to be permanent.  Yet of course this blind spot can hurt us more at certain points in the cycle.  Indeed it could be argued that all five psychological dispositions, or weaknesses, are permanent features of the human psyche, and that would be true; but as a business cycle evolves, we clearly express the other four traits more or less fully, whereas with inflation-blindness, we appear to more or less continually fail to see it.  Clearly, in times of high inflation and high expected inflation, money illusion could have a tendency to do more damage to us, which might paradoxically lead us then to pay attention to it; conversely we may worry less about it in low inflation environments, hence any longer term model of markets might be more susceptible to this blindness.

I see fairness and corruption/bad faith as two sides of the same coin.  I also see stories (our incredible sensitivity to storytelling as communication) as a vector of transmission rather than an animal spirit in itself.

Confidence can these days be somewhat measured.  Central banks can ask questions of the relevant actors, and these responses can be transformed into indices, which can then act as independent variables in more or less complex modelling frameworks for our location on the business cycle.  It also, in essence, underpins all the other animal spirits of Shiller's book.  In a way, one could consider reports of fairness or reports of bad faith as merely adjunct mechanisms to measure confidence, as it is expressed through collective stories.

In other words there seems to be a single primary source of energy, capable of being expressed in an official central bank or governmental confidence measure, capable also of showing up in stories, both directly expressing the ebullience of the age and indirectly, reflecting on scandals of fairness, scandals of corruption and bad faith.

Money illusion, then, acts like the original Keynesian multiplier - a form of leverage.  When we are looking forwards and backwards in a world of cycles, we fail to properly compute the fair value of various modelling elements.  This, like the amplifying effect of the presence or absence of the spending multiplier, will make cycles more extreme, and perhaps also longer lasting.

To a rough approximation, business cycles are determined somewhat in the rear view mirror, and are largely bounded by and reset at the time of a recession.  This starts the clock ticking again, and the economy ventures haltingly higher, until another reset event happens.  One relevant mathematical model for this is to treat recessions as events in a Poisson process, where perhaps the $\lambda$ parameter isn't constant.
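A minimal sketch of that baseline view, with recessions arriving as a homogeneous Poisson process; the hazard rate of roughly one recession per decade is my own illustrative number, not a calibrated one:

import numpy as np

rng = np.random.default_rng(0)

lam = 0.1            # assumed hazard: roughly one recession per decade (illustrative, not calibrated)
horizon_years = 50
n_paths = 10_000

# Number of recessions over the horizon is Poisson(lam * horizon)
counts = rng.poisson(lam * horizon_years, size=n_paths)

# Inter-arrival times (lengths of the expansions) are exponential with mean 1/lam
gaps = rng.exponential(1.0 / lam, size=n_paths)

print("mean recessions per 50 years:", counts.mean())
print("P(no recession in 50 years):", (counts == 0).mean())
print("mean expansion length (years):", gaps.mean())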

This one could call 'ground zero' cycle modelling.  You're modelling the moment that the clock gets reset. I personally like this as it is simple.  However, there are other approaches.  For example, you can model phases of the cycle - for example break up the cycle into four periods, then work out the probability that you are in any one of those periods.

I think the virtue of these models is that it is quite likely that, e.g., factor premia vary by phase of the business cycle.  If you modelled recessions as a Poisson process, then you'd still have to build out some kind of time-based prediction of which phase of the business cycle you were in.

If I were to insist on modelling the cycle as a Poisson process, then there would be some form of cyclical element which controlled the $\lambda$ parameter.  That time-based parameter would in effect be a model of confidence.

Ideally you'd be looking to process news stories, count word frequencies, etc., to build a model of reported confidence, reported fairness, reported corruption and bad faith.  Again, money illusion is presumed to be constant (if anything it ought to be waning as people become more familiar with it, though that seems not to be happening; it certainly isn't getting worse on a secular basis).
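A very crude sketch of that counting idea, assuming we already have a dated archive of headlines; the headlines, word lists and scoring rule below are purely illustrative placeholders, not a validated lexicon:

from collections import Counter

# Hypothetical headlines; in practice these would come from a news archive.
headlines = [
    ("2019-11", "Consumer confidence surges as hiring booms"),
    ("2019-12", "Accounting scandal engulfs major lender"),
    ("2020-01", "Fraud probe widens; executives resign"),
]

POSITIVE = {"confidence", "booms", "surges", "optimism", "expansion"}
NEGATIVE = {"scandal", "fraud", "corruption", "default", "probe", "collapse"}

def score(text):
    # Net count of positive minus negative words in one headline.
    words = Counter(w.strip(".,;").lower() for w in text.split())
    pos = sum(words[w] for w in POSITIVE)
    neg = sum(words[w] for w in NEGATIVE)
    return pos - neg

monthly = {}
for month, text in headlines:
    monthly[month] = monthly.get(month, 0) + score(text)

print(monthly)   # monthly net 'reported confidence' scores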

The political message from the Shiller book, in direct opposition to the neo-liberal school, is that capitalism cannot self-police.  Friedman had a lot more confidence that it could.  And the von Mises-Hayek Austrian school perhaps goes one better - perhaps they're also prepared to admit that cycles are real risks but they think that the periodic devastation is best left to play out, with no or minimal government intervention or massaging of the economy.  This question is the essence of the modern politics of left and right.  And behind it is the practical experience of recessions, their frequency, their solubility, their desirability.  It is in a sense the primary question of macroeconomics and one which economists appear not to have reached agreement on, which means it is not a trivial problem.

Imagine a world where the business cycle was tamed somehow.  Risk would be taken out of the markets.  In the limit, the risk premium of the equity market would drop towards the real rate - investors would not demand to be rewarded as much for taking smaller risks.  If company returns in aggregate were more like a safe bet, then we would be faced collectively with the prospect of a retirement (an increasingly larger fraction of our collective lives) being funded out of an asset return stream which in real terms would be around 2%.

There would of course be a remaining equity risk premium, since some companies will always be more successful in their industry than others and you don't know that a priori, so the premium would be a default-like component.  Perhaps the current equity risk premium could be considered a premium for recessions generally plus a premium for the risk of individual default or failure to compete properly - a competition component.  Here I could imagine a default premium and a dividend payment risk premium being in effect the same factor.

How does an equity premium which can be componentised in terms of recent equity factors (size, value, momentum, quality, profit, carry, default) relate to an equity premium for competition and recession?  Speaking of factors, it always struck me as weird that announced M&A deals aren't treated as special cases in the calculation of market $\beta$ and CAPM generally.  They represent these weird equity moments where the market becomes increasingly certain that the future price of an equity (on deal close date) is known.  This is certainly not best modelled as geometric Brownian motion.  

Shouldn't we model all stocks as driven by two components - the degree to which they are or could be in play from an M&A point of view, and the degree to which they're best modelled as lognormally distributed stochastic processes?  Alternatively, this is just a case of a bifurcated set of market participants - the usual group, plus a new, clearly defined group who, using knowledge of their own and the target's balance sheet, have come up with an alternative valuation for the (target) firm.  This is clearly a battle of valuations, since the acquirer is always paying a hefty premium for the target.  What is this premium telling us?  Is it telling us that some small part of the market thinks the market generally is wrong in its valuation?  Or is the acquirer deliberately overpaying the shareholders of the target in order to encourage them to make the deal happen?  To what degree is there an M&A premium already in each price?  Perhaps the M&A premium is in every stock, to some degree.  And it is a premium paid to shareholders somewhat akin to a special dividend, monetised via the share price.

If the M&A premium is a hurdle cost, a transaction cost, and if it is always present to some degree in all stocks, then clearly it can suddenly grow in magnitude at short notice.  If so then it is akin to the transaction cost associated with buying a house.

Back to Shiller: he believes in bubbles, in surfeits and deficits of confidence, in the tendency for booms and busts.  But he believes governments can do something to tamp down the extremity of these cycles.  Whereas Friedman believes that government is somewhat complicit in creating them in the first place, thanks to blind tinkering.  And the Austrians think that we should honour the regenerative power of booms and busts as a way to keep a pure, thriving capitalism going.

Shiller has an interesting theory of twentieth century economic history.  He thinks that the Keynesian model gradually, internally and externally, had the animal spirits surgically removed from the theory.  Internally, it built its own models which were themselves increasingly mathematical and had less of a role for confidence, whilst externally, Friedman and, to a lesser extent, the Austrians built models which had absolutely no role for animal spirits but which were rational (in the pre-Kahneman sense).  This mathematical approach to Keynesianism, Shiller claims, was strategic, since it allowed classical economists to see themselves more in the Keynesian model and hence to 'get on board'.  Shiller almost paints Minsky as a John the Baptist figure, decrying this banalisation of Keynesian confidence/uncertainty.  Minsky certainly kept that concept close in his own theory.  Shiller makes a great point that this emasculated Keynesianism, with no animal spirits, might have brought on board some classical economists, but it opened itself up to criticism from Friedman et al.  The nadir of the internal critique came in the 70s in the form of the New Classical Economists, who finally expunged animal spirits from the nominally Keynesian approach.  Involuntary unemployment was banished from the model.  It was replaced by a model where unemployment was a choice, determined by the level of inflation one wanted one's economy to bear.

Important economic events are psychological.  Shiller in saying this is not claiming that human psychology itself is cyclic, like circadian rhythms.  Rather he is saying that there may be measurable observable phenomena out there in the world, perhaps phenomena engineered by humans themselves to reflect these psychological traits (confidence indices, house price indices, news stories, etc).  Indeed I'd say that the act of measuring these also externalises these inner psychological states and hence makes them amenable to scientific experiment.  A confidence index is an amazing object in this regard.  Likewise a consumer price index, a housing index, a cyclically adjusted P/E ratio.

There's a lovely symmetric irony in Shiller's message.  Whereas neo-liberals thought that booms and busts were to be laid at the feet of governments, central bankers and economists, Shiller thought that the over-sold rational expectations theory was part of a story which lulled economists, governments and us all with a reassuring but false paradigm of the perfection of the market, so we sleepwalked into the great recession.  Shiller is sounding quite Austrian here, but at the story level, rather than at the policy level.  The Austrians think we need a flimsier safety net underneath us to keep us all competitive.  Shiller is saying that the neo-liberal consensus was itself a gargantuan and in effect useless safety net underneath us.  The monetarist story played out its psychological effect on us.  What would the theoretical Calvinist Minsky have to say?  His story was one of the fatalistic unavoidability of the business cycle.  This was probably already quite appealing in its own way to the neo-liberal school.  And here was Shiller painting neo-liberal orthodoxy as itself an element explaining the mad exuberance and eventual collapse.

This divergent approach strikes me as reminiscent of the religious wars over free will and determinism.  The determinists felt that God had already chosen those to be saved and hence their actions could not change their ultimate destiny.  The counter-reformation position was a lot more tolerant of the power of agency, of the individual, mediated through the church, to make a meaningful difference when faced with the temptations of the devil.  By analogy, Shiller is a pope, and the business cycle is the devil; Friedman and the Austrians are Calvinist fatalists, Minsky is a Manichean.

Through all the weakening of the grip which Keynesian 'animal spirits' had exercised over economists and governments, one psychological factor held on - the money illusion as applied to workers' wages.  The logic was that, through money illusion, workers were unwilling to accept a lower wage, and as a result employers were more often in the habit of sacking workers rather than talking them into settling for a lower wage, when times got tough.  This acted as a form of leverage, rather as the spending multiplier did.  It amplified preexisting (and perhaps random, according to non-Keynesians) distortions from equilibrium.  It added to the volatility of employment.

Another difference in approach which Shiller perceives is his preference for empirical rigour - models have to more strongly agree with reality - and he casts the rational expectations theory as austere, beautiful even, rigorous, but also not well calibrated to observation.

The essence of the Keynesian multiplier effect was the amplificatory power of the feedback loop.  So too, claims Shiller, with confidence.
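As a reminder of the mechanics being borrowed: with a marginal propensity to consume $c$, an initial injection of spending $\Delta G$ is re-spent round after round, so the total effect is the geometric series

$\Delta Y = \Delta G (1 + c + c^2 + \ldots) = \frac{\Delta G}{1-c}$

With $c = 0.8$, one unit of spending ends up as five units of income.  Shiller's suggestion, as I read it, is that a confidence multiplier sits on top of (and can throttle) this chain of re-spending.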

However, the arch empiricist is quite cautious when reporting on economic research which claims to show causal linkages between confidence survey results and subsequent GDP, or between credit spreads and subsequent GDP.  His point here is that these measures could instead be sensitive to a different variable, not confidence, but say expectations of future income.  Likewise he sees the correlation between confidence and expected income itself swinging from high to low as the cycle evolves - as the economy enters a downturn this correlation will increase, and at other times the linkage may be weaker.  However he sees the confidence multiplier as a master switch, affecting other more traditional multipliers like the spending multiplier, and moderating their effectiveness.  Of course, this is a dangerous point, and an honest one, for an interventionist like him to make.  In effect he is claiming that fiscal multipliers are smaller precisely at the time you need them most, when confidence is low and the economy is consequently in the dumps.  Perhaps this argues rather for a 'bazooka' approach to establishing confidence in the markets at the critical stages: central bankers make it powerfully clear that they are determined to grow the economy (that is, to reestablish confidence).  In any event, monetarists and Austrians alike will probably appreciate this story of central banks being least effective just when we need them most.

Stories of fairness bolster our confidence in working with the strangers we work with every day in the economy.  Stories of endemic corruption do precisely the opposite.  Whilst we can ask consumers and purchasing managers and other economic actors about their self-calibrated feelings of confidence, we don't as yet have any established metric for the volume of fairness-stories or corruption-stories to which we are all exposed in the daily news.  But this ought not to be a difficult problem, though it may require machine learning, especially natural language processing.

But what precisely varies in time here when it comes to community attitudes to corruption or fairness?  First of all, notice how fairness builds confidence and corruption tears it down.  And it is of course the downside which everyone cares most about.  Perhaps our innate confidence builds models of fairness (who knows, perhaps the rational expectations model is the pinnacle of this line of thinking) and base reality reveals to us the ways in which people are not as simple as rational expectations and general concepts of fairness would have us believe.  We then lose confidence.

Going back to my first suggestion of a model of the business cycle, as a Poisson process with a time dependent $\lambda$, there may be behind that $\lambda$ a natural growth process which corresponds to an expansive form of confidence growth, and a subtractive model based on the frequency with which scams, corruption, defaults, accounting scandals etc appear.  Such models are called non-homogeneous Poisson processes.
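A sketch of that non-homogeneous idea, simulated by the standard thinning method: propose candidate events at a constant maximum rate and accept each with probability $\lambda(t)/\lambda_{max}$.  The particular intensity function below, mixing a slow confidence build-up with a scandal-driven bump, is entirely made up for illustration:

import numpy as np

rng = np.random.default_rng(1)

def intensity(t):
    # Illustrative recession hazard at time t (in years): a low baseline,
    # damped by steadily rebuilding confidence after t=0, plus a bump
    # representing a cluster of scandals/defaults around t=8.
    baseline = 0.25
    confidence = 0.15 * (1 - np.exp(-t / 3.0))                 # subtracts from the hazard
    scandals = 0.30 * np.exp(-0.5 * ((t - 8.0) / 1.0) ** 2)    # adds to it
    return max(0.0, baseline - confidence + scandals)

def simulate(horizon, lam_max=0.6):
    # Thinning: propose at rate lam_max, keep each proposal with prob intensity(t)/lam_max.
    t, events = 0.0, []
    while True:
        t += rng.exponential(1.0 / lam_max)
        if t > horizon:
            return events
        if rng.uniform() < intensity(t) / lam_max:
            events.append(round(t, 2))

print(simulate(horizon=20))   # recession dates (in years) for one simulated path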

Of course, the linking concept here is 'negative news', whether that is in corporate earnings or corruption scandals.  If we imagine the set of firms as a pyramid of relations, the firms at the leaf nodes and sectors at higher levels, then news stories can enter at lower or higher levels, and would also have a tendency to spread locally (and upwards), based on the virulence of the news item.  Perhaps the sensitivity of spread is also a function of the general health of that pyramid.  A robust pyramid remains exposed only locally whereas a systemically weakened edifice has transmission avenues more widely opened.  Modelling the pyramid would be a more adventurous approach.  You'd need elements not only for firms but also for government actors and central banks.  And you'd need to decide how to handle geographical inter-relationships in this still-globalised economy.
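A toy sketch of the pyramid idea: firms as leaves under sectors, a negative news shock entering at one leaf, and a single 'health' parameter governing how easily it transmits sideways and upwards.  The structure, probabilities and names are all invented:

import random

random.seed(42)

# A tiny, invented firm/sector pyramid: sector -> list of firms.
PYRAMID = {
    "banks":  ["bank_a", "bank_b", "bank_c"],
    "retail": ["shop_a", "shop_b"],
    "energy": ["oil_a", "oil_b", "oil_c"],
}

def spread(initial_firm, health, p_within=0.6, p_across=0.2):
    # Propagate a negative news shock; 'health' in [0, 1] scales down transmission.
    infected = {initial_firm}
    for sector, firms in PYRAMID.items():
        if initial_firm in firms:
            # Within-sector spread: siblings of the shocked firm catch the story.
            for peer in firms:
                if random.random() < p_within * (1 - health):
                    infected.add(peer)
            # Upward spread: the whole sector becomes a story, which can then
            # jump across to other sectors if the system is weak enough.
            if len(infected) > 1:
                for other_sector, other_firms in PYRAMID.items():
                    if other_sector != sector and random.random() < p_across * (1 - health):
                        infected.update(other_firms)
    return infected

print("robust economy :", spread("bank_a", health=0.9))
print("fragile economy:", spread("bank_a", health=0.1))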

Shiller talks about temporal variation in the perceived penalties and implicit rewards for corrupt behaviour - the idea here is a mini decline-and-fall of the Roman empire, set to the music of Nietzsche's eternal recurrence.  He also mentions that new innovations, including new financial innovations, bring with them a period of lax regulation.  Finally, as well as punishment-variability and new technology, Shiller mentions broad cultural changes as having an effect on perceptions of corruption, and gives the example of prohibition era failures to implement the ban on alcohol consumption.

It is entirely possible to imagine a model for confidence, but how to calibrate it?  So much would hinge on this calibration.  If you made it too sensitive, the model would be unstable.  Too robust and it wouldn't predict (or nowcast) recessions well.  I think this latter failure mode is much better than the former.  The default setting for all models ought to be a form of ignorance.  So, for an equity factor model, the default might be a recommendation to invest in the market portfolio without making any differentiation.  Models ought to regress in the face of surprises.  If you genuinely don't know what's coming next, then invest and act like you don't know what's coming next.  Models ought to contain components which are always looking to degrade their specificity to the point of low-information confidence.  The models therefore have to continually prove themselves in order to elevate themselves above their baseline suggestions.  So, for example, a model for confidence which didn't currently have strong readings ought to have automatically regressed to a simpler model which doesn't have an informationally rich confidence component.
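One way to sketch the 'degrade towards ignorance' principle is to blend an information-rich forecast with an uninformative baseline, with the weight on the rich model shrinking as its current signal weakens.  The functional form of the weight below is my own assumption, purely for illustration:

def blended_forecast(rich_forecast, baseline_forecast, signal_strength, k=2.0):
    # signal_strength in [0, 1]: e.g. how strong/recent the confidence reading is.
    # When it is weak, the weight collapses towards the baseline (ignorance) model.
    w = signal_strength ** k          # assumed shrinkage rule
    return w * rich_forecast + (1 - w) * baseline_forecast

# Baseline: just hold the market portfolio's long-run expected return, say 5%.
print(blended_forecast(rich_forecast=-0.02, baseline_forecast=0.05, signal_strength=0.9))
print(blended_forecast(rich_forecast=-0.02, baseline_forecast=0.05, signal_strength=0.2))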

Shiller tells the interesting story of one roaring econometric success, the Phillips curve, which relates the trade-off between inflation and unemployment.  Remember how the last remnant of animal spirits which remained in the late '50s and early '60s was the supposed stickiness of nominal labour wages.  This was driven by inflation ignorance.  Phillips studied how this related to the unemployment level and found a pretty strong relationship, which Friedman ultimately challenged and destroyed.  And the way he destroyed it was to deny money illusion as a causative phenomenon in economics, specifically in labour wage negotiations.

Through what Shiller refers to as a "sleight of hand" Friedman claimed that there is only one non-inflationary/non-deflationary (i.e. inflation-stable) level of unemployment, not a more or less linear relationship as constructed by Phillips.  Friedman suddenly gave the monetary authorities fewer dials to tweak in the economy.  They could no longer decide where to be on the employment/inflation spectrum, but had to, according to Friedman, just hang about at the non-inflationary rate of unemployment.

In theory, you could do what you wanted with inflation, since Friedman thought he'd broken the link with unemployment, as long as unemployment sat at its non-inflationary rate.  In practice, he recommended a low inflation level, since you wouldn't be punished, as Phillips's model claimed, by high unemployment.  Friedman was saying you could have your cake and eat it.  Or rather, utility maximisation theory ate the last animal spirit.

On reflection, you could see why man of the people Friedman liked this move.  It gave more intellectual credit to the person on the street - he allowed them to be undaunted by the so-called money illusion.  They saw through it, making it disappear.  Of course, it was (and perhaps still might be) entirely possible that in so doing, this model helped to educate ordinary people to be less susceptible to money illusion, hence bringing it to life in reality.  So far (2020), this has not happened.

It just so happened that at this time, inflation and unemployment both increased, something not likely to happen in Phillips's linear model.  This evidence was very supportive of the Friedman model.  Shiller cites the fact that wage negotiations still don't have cost of living adjustments built in to any serious degree, nor do loan or bond contracts, nor do corporate accounts.  We all operate with nominal money as our unit of account.

Shiller makes some very thought provoking and true sounding claims about the psychological importance of stories as mechanisms of remembering - we tend to forget stories we don't retell.  And when looking back at historical data, we don't have access to those stories (unless we lived through it ourselves), and hence econometric and quantitative models are using just the price action data they see and are trying to tease out relationships and explain mysteries in an incomplete data-set.  This is a deep point.  He particularly emphasises new-era stories.  

Perhaps the equity risk premium can be decomposed into atomic default and macro recession risks, with the residual implied as confidence.  Then use this to calibrate the Poisson model.  It would be great if this residual confidence measure correlated well with some independent measure of the positivity and negativity of stories found historically.

Shiller sees the 1930s depression as having been made worse by central banks raising rates too much, in various attempts to support the gold standard.  He also blames unions for participating in the very money illusion he claims still remains today.  But is this something he ought to see as actually blameworthy?  Was it ever likely the job of the unions to destroy money illusion?  Is he asking the unions to do a job which Friedman later thought he'd do?  Surely not.  He sees many news stories historically which aren't quantitative in nature and hence are dismissed by modern day economists as anecdotal.

Friday 24 January 2020

Searching for crazy animals in the factor zoo

The equity factor approach coming out of the work of Fama, Ross and Sharpe is largely based on a rational expectations / utility maximisation framework in economics which sees no role for animal spirits or irrational behaviour.  Since the 2008 recession there has been quite a lot of push back against this psychologically naive view of economic behaviour, yet none of that has made its way into academic factor work.  In that factor work, despite the fact that equity markets are micro-efficient yet macro-inefficient (to quote Samuelson), modellers produce bottom-up explanations of the power and value of factors.  Factors, in this sense, are aggregate phenomena, arrived at through linear summations of atomic firms; these factors - the market factor, small size, momentum, profitability, quality, term, carry - are then judged on various perceived qualities of those aggregates: persistence, breadth, robustness, investability, economic plausibility, etc.

I like to think of that approach as a model where the motive power resides solely with leaf firms in a tree of equity relations.  But those leaf nodes are not the only possible entry points.  Motive power could strike at higher levels, and trickle down or across.

Take Shiller's work on the cyclically adjusted PE ratio.  From it, it is clear that there's more volatility in the equity market than changes in fundamentals would suggest.  To me, in the factor world, this suggests we re-partition the equity risk premium in a different way.  We perhaps model 'confidence' as a time-varying fraction of the equity risk premium and model factors in the remnants.  Perhaps we would see momentum vanish.  We ought also to partition history into slices of a universal business cycle, and learn to model conditional factors.
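As a sketch of what conditional factor modelling might look like mechanically, the snippet below labels each month with a cycle phase and estimates a factor premium separately within each phase; the returns are randomly generated stand-ins, with the phase-dependent means chosen purely for illustration:

import numpy as np

rng = np.random.default_rng(7)

n_months = 600
phases = rng.choice(["recovery", "expansion", "slowdown", "recession"], size=n_months)

# Fake monthly returns for a 'momentum' factor whose premium (illustratively)
# depends on the phase of the cycle.
phase_mean = {"recovery": 0.010, "expansion": 0.006, "slowdown": 0.000, "recession": -0.012}
returns = np.array([rng.normal(phase_mean[p], 0.03) for p in phases])

print("unconditional momentum premium:", round(returns.mean(), 4))
for p in phase_mean:
    mask = phases == p
    print(f"{p:10s} premium: {returns[mask].mean(): .4f}  (n={mask.sum()})")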

Most factor research has been academic research, yet investment firms have much more data.  Those firms need a factor architecture which is built around the possibility of multiple parallel factor models.  Why?  Because they will always want to have a reference implementation which agrees with the outside world (the Fama French model, the Bloomberg model, the Barra model).  The degree to which they match is a validation step on the firm's architecture.  This ought to raise the firm's internal confidence in the quality of its in-house model.

I also think the basic CAPM market factor ought to be revised with respect to in-play M&A firms.  Perhaps via a joint M&A/CAPM/factors model where the weighting is usually towards the factors side, drifting towards the M&A side when a firm is in play.  This would make the equity market risk premium the sum of those two disjoint premia.  There is, after all, a completely different set of risks accruing to an in-play M&A target (and acquirer) versus one which is exposed to the arrival of new information in the usual way.

I think a model of information arrival is a way to unite these two.  When a target is in play, then it becomes sensitised to a different realm of information.  It is being informed by the skill of the purchaser's assessment plus the regulatory climate.  A target in a very well-respected deal with a low probability of deal break is a very unusual thing - it represents a firm whose future value at some horizon is known with much higher certainty than other firms' stock values at that time horizon.
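Here is a toy sketch of that two-regime idea: with some probability the deal closes at the agreed price, otherwise the target reverts to a standalone value modelled in the usual lognormal way.  The deal probability, prices and volatility are invented numbers:

import math
import numpy as np

rng = np.random.default_rng(3)

def in_play_target_value(spot_standalone, deal_price, p_close, r, sigma, T, n=100_000):
    # Expected discounted value at deal horizon T, mixing 'deal closes' and 'deal breaks'.
    # Deal breaks: standalone value follows risk-neutral lognormal dynamics.
    z = rng.standard_normal(n)
    standalone_T = spot_standalone * np.exp((r - 0.5 * sigma**2) * T + sigma * math.sqrt(T) * z)
    expected_T = p_close * deal_price + (1 - p_close) * standalone_T.mean()
    return math.exp(-r * T) * expected_T

# A target trading standalone at 80, bid at 100, 85% chance of closing in six months.
print(in_play_target_value(80.0, 100.0, p_close=0.85, r=0.02, sigma=0.35, T=0.5))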

This raises a general point about the meaning of 'market beta' in a multi-factor world.  As you add more factors, the meaning of market beta changes.  A CAPM single factor market premium is not in general the same as the market premium associated with a three factor model.

I think factor models will also rapidly become essential tools for risk managers, as important in time as the greeks are for the volatility space.  In addition to managing trader-specific risk limits, the risk team ought also to be on the lookout for firm-wide risks, cross-cutting currents which may not be visible to any one trader, or even necessarily to the busy CIO.  Equity factor models would help greatly in this respect.  So they should be the common property of the firm, not a trader tool.
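To make the firm-wide idea concrete, here is a toy sketch which pushes each desk's positions through a shared factor loading matrix to reveal aggregate exposures no single trader sees; the tickers, loadings and position sizes are all invented:

import numpy as np

FACTORS = ["market", "size", "value", "momentum"]

# Invented factor loadings per stock (one row of loadings against FACTORS).
loadings = {
    "AAA": np.array([1.1, -0.2,  0.1, 0.6]),
    "BBB": np.array([0.9,  0.5, -0.3, 0.4]),
    "CCC": np.array([1.3, -0.1,  0.2, 0.7]),
}

# Net dollar positions per trading book.
books = {
    "desk_1": {"AAA":  5_000_000, "BBB": -2_000_000},
    "desk_2": {"BBB":  4_000_000, "CCC":  3_000_000},
}

firmwide = np.zeros(len(FACTORS))
for desk, positions in books.items():
    desk_exposure = sum(dollars * loadings[ticker] for ticker, dollars in positions.items())
    print(desk, dict(zip(FACTORS, np.round(desk_exposure / 1e6, 2))))
    firmwide += desk_exposure

# Cross-cutting currents: both desks may be leaning the same way without knowing it.
print("firm-wide ($m):", dict(zip(FACTORS, np.round(firmwide / 1e6, 2))))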

Thursday 23 January 2020

The whole Black-Scholes formula derivation in one post

First some intuition on the journey through.  A call option is a security whose value is somehow dependent on the price of the underlying and it is our job to create a model for how to value it.  It is so strongly dependent on the price of the underlying that I am even prepared to say that there is a single source of uncertainty here which both of these instruments share.  Imagine you're chaperoning your drunk friend home at the end of the night.  He staggers around on the street, giving perhaps too much of his money to some strangers he meets, stealing flowers from pretty girls, asking for cigarettes from smokers.  You chase behind him, apologetic, retrieving the money so foolishly given away, apologetically returning the cigarettes and flowers.  At the end of the night, everyone goes to bed with the items they owned at the start of your friend's stagger home.  Good job.  You just neutralised him, clinically, precisely, completely.  His behaviour was a single source of randomness there, which explained and motivated both his own and your own behaviour.  And so too a portfolio which is long some quantity of a call option and short some quantity of the underlying stock could end the night with no change in financial value (transaction costs, among other things, are assumed to be zero).  You could make matters worse, of course, and copy the drunk friend, leveraging his social inadequacies.  And in a similar way, owning a portfolio of long some number of call options and long some number of stocks would amplify your returns compared to how you might have performed if you just owned the stock.  Imagine further that the net set of objects he collected on his drunken wander had a value, and this value was called his drift.  One Friday, his drift might amount to a gain of 10 dollars and the next Friday, a loss of 3 dollars.  But given this perfect chaperone role you perform week in week out, none of us care what the actual drift is, week in week out, since you do so lovely a job of neutralising it.  This act of not caring what the estimate of the drift value is turns out to be the part of the puzzle which eluded Black and Scholes for years, until it was suggested to them by Merton.

First we need to define how a random time-evolving process might be described.  In fact, following Thiele, Bachelier, Einstein and Wiener, we use the following formula to say, in essence, that we would like to build a model of the return on a stock price as a per-period drift together with a random (normally distributed) piece of noise overlaid on top, and it can be thought of as shaped Brownian motion with a drift overlay:
$\frac{dS}{S} = \mu dt + \sigma dz$

The noise element, $dz$, is the novel element here and it represents a sequence of normally distributed increments which have a mean of zero and a variance of $dt$.
$ dz = \epsilon \sqrt{dt} $
Where $ \epsilon $ represents a draw from a standard normal distribution, with mean zero and unit variance; the cumulative effect of these increments is a Wiener process.  To this we apply a scaling, so the variance becomes $ \sigma^2 $ per unit time, the variance of our stock's returns.  This stock is risky, so it must also have positive expected return (according to CAPM, otherwise why would anyone want to own it).  This is a time process as well as being random in some additional dimension, and we know that variance is additive in time.  A characteristic attribute of $dz$ is that any and all future random steps are independent of any previous step.  So by stating the above, you're deciding that stock returns could be somewhat accurately modelled as a normal distribution with a specific expected drift, $ \mu $, and a variance $ \sigma^2 $ per unit time.  Think of the above equation as saying that stock price returns are distributed normally, that is,
$\frac{\delta S}{S} \sim \phi(\mu \delta t, \sigma \sqrt{\delta t}) $
and that this is operationally achieved by decomposing the process into a pure random element, isolating it, if you like, and a drift component, an instantaneous expected mean value.  So $\mu$ is the gradient of a straight line, that line being a non-deviating march directly from here to there for the stock price.  Imagine a little tin toy mouse is wound up (the degree of winding up corresponds to how long the mouse will move for) and starts in the corner of an empty room.  You pointing the mouse's nose in a particular direction is akin to you determining the drift component.  Then you let it go, and its path is pretty much determined.  You could draw a line on that floor and this would be the line the mouse follows.  But imagine a randomising element being applied to the wheels.  This jigs the mouse momentarily left or right with equal probability.  Now, at the outset you aren't as confident about the precise location of the mouse, but your estimate of its general direction is still the same, if you were forced to guess.  That mouse is still roughly following the same heading.  Also, if you are forced to estimate a location for the mouse, and the experimenter offers you the choice of guessing its location soon after the start of the process or quite near the end of the process, pick the former.  There's clearly less randomness that has accumulated then.  The more that time elapses, the more chances there are for random events to occur, making it much harder for you to guess the location correctly.  Variance is additive, and specifically in this case, variance is additive in time.  You also don't want the wheels to jig the mouse too violently, as this too makes your prediction job harder.  This is just another way of saying that volatility, like time, makes your prediction job harder.  The more volatility and time there is, the harder the life of a predictor is.
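Before moving on, here is a minimal sketch (my own, not from any textbook) of simulating this return process, stepping $\frac{\delta S}{S} = \mu \delta t + \sigma \epsilon \sqrt{\delta t}$ forward over small increments; the drift and volatility numbers are placeholders:

import numpy as np

rng = np.random.default_rng(0)

S0, mu, sigma = 100.0, 0.08, 0.20   # illustrative drift and volatility, annualised
T, n_steps = 1.0, 252               # one year of daily steps
dt = T / n_steps

S = np.empty(n_steps + 1)
S[0] = S0
for i in range(n_steps):
    # delta S / S = mu*dt + sigma*epsilon*sqrt(dt) for this little step
    ret = mu * dt + sigma * np.sqrt(dt) * rng.standard_normal()
    S[i + 1] = S[i] * (1 + ret)

print("final price after one simulated year:", round(S[-1], 2))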

OK so where do we go from here?  Well, we have just set up a model for returns, but going back to the original problem, the relationship isn't between a call price and a stock return, no, it is between a call price and its underlying stock price.   So how do we go from a model which says returns are normally distributed to a model which describes the price itself?  Easy, just multiply both sides by $ S $ to give
$dS = \mu S dt + \sigma S dz$
As a side note, this is now called an Ito process, since the drift and variance are both expressed as functions of the underlying, rather than as absolutely static constants, and it is also sometimes called geometric Brownian motion, since price changes are scaled with the current level of the stock; they are still scalars 'in the present moment' since, at any time, you are expected to know what the stock price or stock volatility is (i.e. they are assumed known or at least estimable and are most certainly not considered to be random over the little chunk of time $dt$ which the model considers).   This subtle move from a Wiener to an Ito process means that the chaperone must be continually alert to changes in his friend's behaviour - if his friend followed a Wiener process, then the chaperone could have a firmer idea at the outset of how busy he was going to be on average.  Now that we have a model for how a stock price changes over a short period of time, we can ask the major question - how can we describe the movement of the call price changes so that it is fully in terms of the stock price changes?

Now, we'd like our chaperone behaviour to be modelled mathematically as being a function of his friend's behaviour.  The so-called chain rule, which many of us learned in school,  shows us how a small change in an underlying function gets geared into a small change in the outer function; it states that:
$ \frac{d}{dt}[C(S(t))] = C'(S(t))\times S'(t) $
However, this won't work directly since the friend's behaviour is random (as are stock price returns, according to our primary model choice) and the chain rule is only for smooth functions.  Random functions are not mathematically differentiable in the same way.  So we have hit a brick wall.  We can't directly use the chain rule to see how our call price (C) might move as a function of the stock price (S).  What next?    The Taylor polynomial.  Why?  Because this approximation helps Ito to create a rather fearsome version of the chain rule for the world of random processes.  And once we have the chain rule, we can progress with our challenge to express the somewhat random movement of the chaperone (the call option) in terms of the somewhat random movement of the drunk friend (the stock).

The Taylor polynomial for some function $C(S,t)$ up to and including second order terms is
$\delta C = \frac{\partial C}{\partial S} \delta S + \frac{\partial C}{\partial t} \delta t + \frac{1}{2} \frac{\partial^2 C}{\partial S^2} \delta S^2  + \frac{1}{2} \frac{\partial^2 C}{\partial t^2} \delta t^2 + \frac{\partial^2 C}{\partial S \partial t} \delta S \delta t + \ldots $

As we take both $\delta S$ and $\delta t$ closer to 0, we are usually happy to drop all second order (including cross) elements, in the familiar world of non-random functions, which would there leave something like the familiar school chain rule
 $ \delta C \simeq \frac{\partial C}{\partial S} \delta S + \frac{\partial C}{\partial t} \delta t  $
But in our case we know that $dS = \mu S dt + \sigma S dz$.  So we need to be careful when disposing of those second order terms.  Any term in which $\delta t$ appears at a power higher than one vanishes in the limit, so the $\delta t^2$ and $\delta t \delta S$ terms can be dropped, leaving a question over what to do with the term in $\delta S^2$.  Ito helpfully points out that $dS = \mu S dt + \sigma S dz$ is really the same, in a small discrete timeframe, as $\delta S = \mu S \delta t + \sigma S \epsilon \sqrt{\delta t}$

Let's square the discretised formula: $\delta S^2 = \mu^2 S^2 \delta t^2 + 2 \mu \sigma S^2 \epsilon \delta t^{3/2} + \sigma^2 S^2 \epsilon^2 \delta t$, and again we can throw away any term where $\delta t$ appears at a power higher than one.  This leaves us with $\delta S^2 \simeq \sigma^2 S^2 \epsilon^2 \delta t$.

What do we know about $\epsilon$?  It has a mean of zero, $E[\epsilon]=0$, and a variance of 1, by construction.  We know the alternative definition of variance as $E[\epsilon^2]-{E[\epsilon]}^2$ and that this equals 1.  So $E[\epsilon^2]$ must also be 1 (1-0=1).  Moreover, the variance of $\epsilon^2 \delta t$ is of order $\delta t^2$, so in the limit the term $\sigma^2 S^2 \epsilon^2 \delta t$ behaves as if it were non-random and equal to its expected value, $\sigma^2 S^2 \delta t$.

What a piece of magic.  We now are armed with a chain rule for functions which have a stochastic element $\delta C = \frac{\partial C}{\partial S} \delta S + \frac{\partial C}{\partial t} \delta t + \frac{1}{2} \frac{\partial^2 C}{\partial S^2} \sigma^2 S^2 \delta t   $

This little step is quite a milestone.  It has 3 terms, the first two of which are just the usual first order terms for the chain rule in two variables; then comes the little monster, a term involving the second derivative with respect to $S$ which nevertheless ends up first order in $\delta t$.

Next we realise we already know that $dS = \mu S dt + \sigma S dz$ so we can do a substitution right away $dC = \frac{\partial C}{\partial S} (\mu S dt + \sigma S dz) + \frac{\partial C}{\partial t} dt + \frac{1}{2} \frac{\partial^2 C}{\partial S^2} \sigma^2 S^2  dt   $

When this is tidied up you get Ito's lemma, which is really the chain rule, with a bit of Wiener process magic: a consequence of variance being additive in time is that the standard deviation grows with the square root of time, which, when it appears in a second order term of a Taylor polynomial, switches allegiance from $dz$ to $dt$, in a way.   This has partly helped us eliminate an element of randomness from the proceedings, but now, thanks to Black and Merton largely, we are going to eliminate the remaining element of randomness.

So we have Ito telling us that $dC =  (\frac{\partial C}{\partial S} \mu S + \frac{\partial C}{\partial t} + \frac{1}{2} \frac{\partial^2 C}{\partial S^2} \sigma^2 S^2) dt  + \frac{\partial C}{\partial S} \sigma S dz$ and now we consider a portfolio consisting of short one call and long some known amount of stock, the amount cleverly chosen to be $\frac{\partial C}{\partial S}$.  We don't as yet know what that instantaneously constant value is, but it is some value which we shall use as a hedge ratio.  So our portfolio can be described as $\Pi = \frac{\partial C}{\partial S} S - C$ and correspondingly the change in the portfolio value is $\delta \Pi = \frac{\partial C}{\partial S} \delta S - \delta C$.  When we substitute in for $\delta S$ and $\delta C$ using that specially chosen ratio, the $dz$ terms cancel out perfectly, leaving $\delta \Pi = (- \frac{\partial C}{\partial t}  - \frac{1}{2} \frac{\partial^2 C}{\partial S^2} \sigma^2 S^2) \delta t$

The next step in this mammoth piece of deductive logic is to realise that, since this change in portfolio value has lost its randomness, the principle of no arbitrage implies that the portfolio can only earn the riskless rate.  If you got more than that, then you would have made money with no risk, a violation of no arbitrage (and of CAPM).  So with an instantaneous risk free rate $r$ and a little bit of time $\delta t$,  $\delta \Pi = (- \frac{\partial C}{\partial t}  - \frac{1}{2} \frac{\partial^2 C}{\partial S^2} \sigma^2 S^2) \delta t = r(\frac{\partial C}{\partial S} S - C) \delta t$.  $\delta t$ cancels on both sides and, rearranging, you get the legendary Black-Scholes partial differential equation

$\frac{\partial C}{\partial t} + rS \frac{\partial C}{\partial S} + \frac{1}{2} \sigma^2 S^2 \frac{\partial^2 C}{\partial S^2} = rC$

All that's left for Black and Scholes to do is to solve it.  It turns out this is a fairly unsurprising heat equation, and it certainly can be solved by numerical methods.  For certain kinds of option it can be solved analytically.  Black got here in June 1969.  Ed Thorp was already in 1967 claiming to have worked this out himself and was making money doing it.  It was in fact Scholes who suggested constructing a 'zero beta' portfolio to Black, who by then had been sitting on the equation, but without having any way to solve it.  It was Merton who suggested a replicating portfolio / no arbitrage approach which didn't rely on CAPM (beta).  The shocking thing is, in constructing a pair of portfolio weights designed to get rid entirely of the randomness, the expected return on the stock also disappears.

At this important juncture it is worth reflecting that no limit or constraint was placed on the interpretation of C above, so the resulting partial differential equation will hold for all options.  But, in order to solve for options, we now have to get specific about which options we would like to solve for, and this will be easier for some than for others.  And, by easier, the ideal end goal is to solve the PDE analytically.  Luckily, for European call options an analytic solution exists.

Now, the consequence of $\mu$ not appearing in the Black-Scholes equation is fully exploited.  We live in a single world, the one where stock prices are risky, and hence where the unknown but estimable $\mu$ is the return a stock has to pay an investor such that they are happy to take on the risk of that stock.  In CAPM, the risk premium, of course, is $\mu -r$, and this, when properly diversified across a sufficient number of stocks, will indicate the equity risk premium.  And in this world, the Black-Scholes equation must hold.  But there are other worlds, and in all of those other worlds, the Black-Scholes equation must hold too.  Since it doesn't need $\mu$ it doesn't care for the expectations of investors.  They could demand 200% return on stocks, or -10% or even the risk free rate.  It doesn't matter.  In all of those worlds of varying investor preferences, the equation must still hold true.  So the thought emerges, why not make life easier, why not assume that the fundamental stock process itself didn't have $\mu$ in it at all, but instead $r$, the risk free rate.  As a further simplifying factor, this risk free rate is assumed to be a constant throughout the life of the option.  Both of these factors assisted in producing an analytical solution to the case of a European call.

This is such an alien world, the risk neutral world.  All assets, in this world, return the same as treasuries;  Amazon, options on Netflix, corporate bonds on Disney.  How strange.  But this world, as one might imagine, is a world which is just that little bit simpler to mathematically process.  The randomness is still there, the disparate volatilities are still there, but the drift element is $r$ for all of those stock processes.  But to step back a little, let's just look at the complexity of the issue of finding a formula for the price of a call option in the absence of these simplifying assumptions which drop out of the Black-Scholes equation.

In general, the problem facing Black at the outset, and which faced Samuelson too, is how to solve $C_t = E_t[e^{- \int_{t}^{T} r_u du} \max(0, S_T - K)]$ where $T$ is the maturity of the call and $K$ is the strike price.  $\max(0, S_T - K)$ is the terminal value of the call in the instant it expires, some time way out in the future, at $T$, and we want to find its expected value (in this risky world).  We also want to discount its expected value back using the risk free rate.

The first simplifying step happens in the Black-Scholes assumptions, namely that the risk free rate is constant for the life of the option.  With this, we can extract the rate from the expectation bracketing, and simplify to $C_t = e^{-r(T-t)}E_t[\max(0,S_T-K)]$.  At this point, Paul Samuelson didn't really progress beyond making assumptions about the required return on the stock and option.


Recall that the drift term didn't appear in the Black-Scholes partial differential equation.  This means you can consider the driving stochastic equation $dS = \mu S dt + \sigma S dz$ for the changes in a stock price to be replaced by $dS = r S dt + \sigma S dz$.  In other words, if we pretended that all stocks drifted like treasuries, whilst still retaining their prior degree of randomness (which, as we have seen, through portfolio construction we can eliminate), we would still end up with the Black-Scholes equation.  But the simplification might make pricing easier.  And indeed it does.

Next comes a discussion about a lognormal process.  I have found the books to be not quite as clear here in discussing what it is and why it is important.   All this time, we have been working with the returns of a stock price, modelling them as normally distributed.   But prices themselves aren't normally distributed - since prices are bounded by 0 at the lower end.  Note this isn't an additional assumption Black is making here on top of the assumption that returns are normally distributed but rather it is an inevitable mathematical consequence of how continuously compounded normal returns work when applied to a (non-negative) starting price.  

However, what justifies the use of the lognormal process mathematically?  The assumed process for the returns of a stock is a multiplicative (geometric) process, whereas the central limit theorem is additive (arithmetic).  The central limit theorem says that the sum (or arithmetic mean) of a large number of independent and identically distributed random variables with finite variance is approximately normally distributed.

But we have been talking about $\frac{dS}{S}$, which can be thought of in discrete form as $\frac{\Delta S}{S}$; the related price relative $\frac{S_{t+\Delta t}}{S_t}$ is then just 1 + a geometric return.  How do we massage it so that the central limit theorem applies?   We can consider it to be a multiplicative process made up of even smaller parts.  If we apply the log function, the product becomes a sum of logs, this sum now being amenable to the central limit theorem.  So $\log(\frac{S_{t+\Delta t}}{S_t})$ can be claimed to be normally distributed.  A lognormal distribution is skewed, so the median (geometric mean) isn't the same as the mean (arithmetic).  To translate between the two you need to apply a correction term.  But what is this correction term?  You can work that out by a second application of Ito's Lemma.

$G = \log S, \frac{\partial G}{\partial S} = \frac{1}{S}, \frac{\partial^2 G}{\partial S^2} = -\frac{1}{S^2}, \frac{\partial G}{\partial t} = 0$ so by Ito's lemma $dG = (\mu - \frac{\sigma^2}{2})dt + \sigma dz$.  As you can see, the arithmetic to geometric adjustment factor turns out to be $-\frac{\sigma^2}{2}$.  Notice that $dG$, in English, is just the change from one moment to the following moment in the logarithm of the price of the stock.  I.e. for times $t$ and later $T$, this difference is $\log S_T - \log S_t$, so really, $\log S_T - \log S_t$ is going to be normally distributed with a mean of $(\mu - \frac{\sigma^2}{2})(T-t)$ and a familiar variance of $\sigma^2(T-t)$.  Just two more steps here: first, $\log S_T$ itself must have a mean of $ \log(S_t) + (\mu - \frac{\sigma^2}{2})(T-t)$; second, $\mu = r$ in a risk neutral world, so we can say the mean is $ \log(S_t) + (r - \frac{\sigma^2}{2})(T-t)$, variance still as before.

So exponentiating both sides, $S_T=S_t e^{(r-\frac{\sigma^2}{2})(T-t)+\sigma \sqrt{(T-t)} \epsilon}$ describes how the stock price is distributed at time horizon $T$ (to correspond to the expiry of the option).  Now let's work through our 'discounted expected value' formula again, knowing what we now know: $C_t = e^{-r(T-t)}E_t[\max(0,S_T-K)]$

First of all, to make it clear this isn't an expectation over the real probability distribution $\mathcal{P}$ but rather one over a risk neutral assumption, call it $\mathcal{Q}$, we write $C_t = e^{-r(T-t)}E^{\mathcal{Q}}_{t}[\max(0,S_T-K)]$.  We now know what the distribution of $S_T$ is, thanks to the lognormal reasoning above, so we insert that in to get:


$C_t = e^{-r(T-t)}E^{\mathcal{Q}}_{t}[\max(0,S_t e^{(r-\frac{\sigma^2}{2})(T-t)+\sigma \sqrt{(T-t)} \epsilon}-K)]$

Yes this looks horrible but it quickly simplifies down.  First, you only get paid on condition that $S_T \ge K$, so we can zero in on that condition.  All the other possible values of the stock price, below the strike, will result in a terminal value of 0, so regardless of their probability, their contribution to the overall expectation will be 0.  In other words, when we come to break out the expectation, we will be looking to split the $\int_{-\infty}^{\infty}$ bounds into the fraction which is valueless and the fraction which is valuable, and concentrate solely on the valuable fraction.  But where is that dividing line?

So  $S_t e^{(r-\frac{\sigma^2}{2})(T-t)+\sigma \sqrt{(T-t)} \epsilon} \ge K$ or  $ e^{(r-\frac{\sigma^2}{2})(T-t)+\sigma \sqrt{(T-t)} \epsilon} \ge \frac{K}{S_t}$ and taking logs, $(r-\frac{\sigma^2}{2})(T-t)+\sigma \sqrt{(T-t)} \epsilon \ge \log(\frac{K}{S_t})$

Now we know what $\epsilon$ is, namely a standard normal variable (for which we have lookup tables for specific values, as we do in one sided statistical tests), and here we have an inequality involving it, so we will continue to rearrange to find the threshold which $\epsilon$ must exceed.  The payoff is non-zero when the normal variable is greater than the cutoff value
$\frac{\log(\frac{K}{S_t})-(r-\frac{\sigma^2}{2})(T-t)}{\sigma \sqrt{T-t}}$

This begins to look familiar.  Take out -1 from the top line and you get $-\frac{\log(\frac{S_t}{K})+(r-\frac{\sigma^2}{2})(T-t)}{\sigma \sqrt{T-t}}$.  We have our cutoff point and now we can return one last time to the 'discounted expected value' formula, using a shorthand of $-d_2$ to refer to this cutoff point.  Why $d_2$?  Well, it is the label for the bound which Black uses in the original paper.

Recall the density function of a standard normal variable is $f(x)  = \frac{1}{\sqrt{2 \pi}} e^{-\frac{1}{2}x^2}$ so we'd like to integrate the payoff against this density for cases where the terminal stock price is greater than the strike, i.e. only for values from $-d_2$ to infinity.

$e^{-r(T-t)}  \frac{1}{\sqrt{2 \pi}} [\int_{-d_2}^{\infty} (S_t e^{(r-\frac{\sigma^2}{2})(T-t) + \sigma \sqrt{T-t}x} - K) e^{-\frac{1}{2}x^2} dx ] $

The final flourish is all down to the convenience of dealing with exponentials in the context of integration.  First, let's break this in two

$e^{-r(T-t)}  \frac{1}{\sqrt{2 \pi}} [ \int_{-d_2}^{\infty} S_t e^{(r-\frac{\sigma^2}{2})(T-t) + \sigma \sqrt{T-t}x}  e^{-\frac{1}{2}x^2} dx  - \int_{-d_2}^{\infty}  K e^{-\frac{1}{2}x^2} dx ] $

and that becomes

$  \frac{1}{\sqrt{2 \pi}} [ S_t \int_{-d_2}^{\infty} e^{-r(T-t)} e^{(r-\frac{\sigma^2}{2})(T-t) + \sigma \sqrt{T-t}x}  e^{-\frac{1}{2}x^2} dx  - e^{-r(T-t)} K \int_{-d_2}^{\infty}  e^{-\frac{1}{2}x^2} dx ] $

The $r(T-t)$ elements cancel in the first term when you bring the discount factor inside the integral; in the second term, using $N(x)$ to represent the cumulative normal distribution (for which we have tables), the integral evaluates to $N(d_2)$, giving

$  \frac{1}{\sqrt{2 \pi}} [ S_t \int_{-d_2}^{\infty}  e^{-\frac{\sigma^2}{2}(T-t) + \sigma \sqrt{T-t}x  - \frac{1}{2}x^2}  dx ] - e^{-r(T-t)} K N(d_2)  $

Now there's a square which can be completed in that first term,

$  \frac{1}{\sqrt{2 \pi}} [ S_t \int_{-d_2}^{\infty}  e^{-\frac{1}{2}(x-\sigma \sqrt{T-t})^2 }  dx ] - e^{-r(T-t)} K N(d_2)  $

OK.  It is change of variable time for that first element.  Let $y = x - \sigma \sqrt{T-t}$.  To make this a proper $dy$ equivalent, we must also tweak the bounds.  Since $x$ previously started at $-d_2$, $y$ now has to start a bit lower than that, at $-d_2 - \sigma \sqrt{T-t}$, also known as $-(d_2 + \sigma \sqrt{T-t})$, which in Black nomenclature is referred to as $-d_1$.  So, that results in

$   S_t [\frac{1}{\sqrt{2 \pi}} \int_{-d_1}^{\infty}  e^{-\frac{1}{2}y^2 }  dy ] - e^{-r(T-t)} K N(d_2)  $

This change of variable trick makes us see that this first term is also just a read from the cumulative normal distribution, only at a slightly different point

$   S_t N(d_1) - e^{-r(T-t)} K N(d_2)  $


And this is the value now of a European call option.  Sweet Jesus.  Roll credits.  Give that man a Nobel prize.
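
For anyone who wants to check the algebra numerically, here is a minimal sketch of the formula just derived, with the cumulative normal playing the role of the lookup table (the example numbers are of course made up).

from math import log, sqrt, exp
from statistics import NormalDist

def black_scholes_call(S, K, r, sigma, tau):
    # tau = T - t in years; sigma is the annualised volatility
    d2 = (log(S / K) + (r - 0.5 * sigma ** 2) * tau) / (sigma * sqrt(tau))
    d1 = d2 + sigma * sqrt(tau)
    N = NormalDist().cdf                        # the cumulative normal 'lookup table'
    return S * N(d1) - K * exp(-r * tau) * N(d2)

# for example, an at-the-money one-year call, 20% vol, 2% rates
print(round(black_scholes_call(100, 100, 0.02, 0.20, 1.0), 2))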

Wednesday 15 January 2020

CAPM and the risk free rate

In a recent post, I was musing about Sharpe and Lintner's decision to treat the risk free rate as an external fact about the world, not endogenous to their model.  I noted that if you do this, then the curvy efficient frontier flattens down into the capital market line.  We have a set of risky assets plus an additional tangency mechanism: the line running from the risk free (a.k.a. zero volatility) rate to the tangent point on the efficient frontier is what gets introduced into the CAPM world.
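
For concreteness, here is a small sketch, with entirely made-up expected returns and covariances, of that tangency mechanism: the tangency portfolio weights come out proportional to $\Sigma^{-1}(\mu - r_f)$, and the capital market line runs from the risk free rate through that point.

import numpy as np

mu = np.array([0.06, 0.09, 0.12])          # assumed expected returns of the risky assets
cov = np.array([[0.04, 0.01, 0.00],
                [0.01, 0.09, 0.02],
                [0.00, 0.02, 0.16]])       # assumed covariance matrix
rf = 0.02                                  # the exogenous risk free rate

w = np.linalg.solve(cov, mu - rf)          # tangency weights: inverse covariance times excess returns
w = w / w.sum()                            # normalise to a fully invested portfolio
mu_t, sigma_t = w @ mu, np.sqrt(w @ cov @ w)
print(w.round(3), "CML slope:", round((mu_t - rf) / sigma_t, 3))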

I found out subsequently, re-reading the excellent Fischer Black biography, that he considered it endogenous.  In his 1969 initial extension of CAPM, Black sees no role for the monetary authority, and models the risk free rate as the equilibrating rate which satisfies those timid investors who prefer to hold larger fractions of their wealth in safe assets and who therefore want to lend all their money (by depositing it with the bank) to leveraged, more aggressive investors, who are happy to borrow that money and climb up past the tangency point on the CML.

In this model, having a monetary authority messing with the risk free rate in order to stabilise the price level or even worse, to manipulate the economy, implied the system was in a non-equilibrium state, and hence had opportunities for profit.

There's something temptingly, beautifully simple in this model of people having two choices for their wealth, with their changing distribution of animal spirits driving not only the price of money but also the price of risk.  It certainly doesn't much correspond to reality, but then again nor did option prices much coalesce around the famous Black-Scholes formula pre-1973.  On the other hand, it says nothing about the origins of those animal spirits which stir up price discovery for the prices of money and risk.  Perhaps the only institution the libertarian radical markets theorist needs is human psychology.

Tuesday 7 January 2020

How a broken machine which banks used to estimate regulatory reserves found new life in Hedge Funds

Timeline

1980s - Arrival on Wall Street of mathematically inclined employees, some of whom had by 1987 arrived at senior positions

1987 Stock Market Crash

Post 1987 - a dilemma: if you were honest about including so-called black swan event statistics in your day to day trading strategies, the resulting fatter-tailed distributions built models which no longer claimed to make money.  If you ignored them, your models prompted you to trade day by day and make money as before, until the next black swan arrived to potentially wipe you out.  These events seemed to be arriving once or twice per decade; they were unpredictable, unknowable, and happening more often than the usual normal distributions predicted.

The thought emerged: allow individual market traders to continue as before, but develop a firm-wide measure which carves out the extreme event space and applies additional firm-wide risk limits to that carved out space.  Unfortunately this was still built on the base assumption of normally distributed (log) returns.

This new focus (and new limits) specifically on the tail risk coming out of each of a bank's trading markets could then be aggregated into a single all-market score.  This single score pulled together risks from all trading activity at a firm, it was hoped, in a way which would flag up trouble, like a canary in a coal mine, prior to black swans.  As a prediction tool for black swans it was always likely to fail (read Taleb for why), but it certainly could be a useful way of getting a handle on (and applying limits to) some measure of the risk taken or loss expected (measured in dollars) for a firm.  In other words, this measure, when tracked by a firm itself, every day, could provide a useful context for how much risk the firm was taking, when that increased, when it decreased, and so on.

Late 1980s - JP Morgan and Bankers Trust trading units were using this measure, and JP Morgan in particular pioneered the rolling up of various unit VaR numbers into a single firmwide VaR number.  Once it arrived at the level of the firm, it could be used as a measure of minimal capital adequacy.  That is to say, the firmwide dollar amount could be interpreted as the minimal amount of liquid capital a firm needed to have on hand in case that modelled-black-swan event (almost an oxymoron, actually) occurred.  Regulators were quick to pick up on this idea.  Note the flaws: tail events are unpredictable; they happen more frequently than the normal-distribution models suggest; and the losses which might be incurred are often greater than the VaR number suggests.
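
As an aside on the roll-up itself: one common way of doing it (I'm not claiming this is exactly what JP Morgan did) is to treat the unit VaRs like volatilities and combine them under an assumed correlation matrix between desks.  A toy sketch:

import numpy as np

unit_var = np.array([4.0, 2.5, 6.0])       # desk-level 99% one-day VaR, $mm (assumed)
corr = np.array([[1.0, 0.3, 0.1],
                 [0.3, 1.0, 0.5],
                 [0.1, 0.5, 1.0]])         # assumed cross-desk correlations

firm_var = np.sqrt(unit_var @ corr @ unit_var)
print(round(firm_var, 2), "versus the simple sum of", unit_var.sum())   # the gap is the diversification benefit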

Basel I - 1988 - G10 Central Banks (plus Luxembourg) publish a set of minimal capital requirements as part of a voluntary regulatory framework.
Given the known limits of this measure, why was it accepted?  Well, it was considered a flawed step in the right direction, and given that it was a broad bank regulation, there would have been pressure to introduce a measure which was conservative - aimed at approaching theoretically perfect capital adequacy from the bottom up, gradually.  However, as Keynes, Minsky, Shiller, Taleb etc all point out, there exist mechanisms, psychological and institutional, which result in variance of animal spirits.  VaR was always going to be one of those devices which lead to a false sense of comfort so essential to the Minskyian story of incrementally greater risk being taken in a world which seemed increasingly, measurably predictable.  In this sense, Taleb is the ignored  peripheral voice shouting "Don't forget your Minsky" to which the Central Banks replied, "Lord, let me get to Minsky, but not quite yet". 

Hedge funds are thought to be both more lightly regulated and more leveraged than banks, but only half of this is typically true (there may of course have been historical exceptions).  They are certainly more lightly regulated, but they are less leveraged than banks.  Banks, on a weighted asset basis, own much safer asset types than hedge funds, and both use hedging techniques to manage their risk.  Banks are usually larger and more systemically important, hence require more intense public scrutiny.  Of course, certain hedge funds individually can fall into this category too and, again, rightly so.  Likewise, the hedge fund industry itself can be considered en masse as a systemically important area of the economy, which is why many funds have to report in quite some detail on their activities.  Thirty percent of hedge fund derivatives risk is concentrated in the top ten hedge funds (according to a January 2019 SEC report), so perhaps regulators can get good results by increasing scrutiny on those giant funds.

The problem of a consistent measure for hedge funds - consistent temporally and across markets and jurisdictions - is a non-trivial (and unsolved) one.  Certainly AIFMD leverage is an attempt, but it reflects termsheet measures, that is to say, ones which are easy to calculate and uncontroversial.

The two places you access leverage at a hedge fund are, first, through asset class choice, particularly in the use of derivatives, and secondly via the ongoing funding leverage which banks' prime brokerage units offer hedge funds.  Those two sources are, of course, related - the more asset leverage a fund takes on, the more careful its PB is likely to be in offering its own funding leverage, ceteris paribus.

Regulators don't ask hedge funds to report VaR for capital adequacy reasons, but hedge funds do calculate it nonetheless.  Why?  Well, it has become known in the industry - banks are aware of it of course, and PB units after all have to feed in their contribution of VaR to the overall bank position.  Investors know about it also.  But VaR is not used by hedge funds to judge how much capital must be set aside to cover tail events.  They usually have instead, in their risk policy, specific liquidity buffer requirements expressed as a fraction of their current AUM.  They also have ongoing negotiations with PBs on an almost deal by deal basis and feel the pain via increased funding costs for particularly burdensome trades (due to their size or their asset leverage or their perceived capital-at-risk).  For those reasons too it is of interest to regulators.

VaR is still useful in hedge funds, and risk policies often express a limit as a fraction of AUM.  For example, the firm might state that it aims to manage risk in a way which usually keeps VaR below 1% of AUM.  This can be tracked over time to give a useful measure of risk taking.  It will indirectly guide the capital allocation function in setting risk capital for traders.
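
A minimal sketch of what that kind of policy check might look like, with simulated daily P&L and an illustrative 1% threshold (neither taken from any real fund):

import numpy as np

rng = np.random.default_rng(0)
daily_pnl = rng.normal(0.0, 1.2e6, size=500)     # simulated daily P&L, in dollars
aum = 2.0e9

var_99 = -np.quantile(daily_pnl, 0.01)           # the loss not exceeded on 99% of days
print(f"99% one-day VaR: ${var_99:,.0f} ({var_99 / aum:.3%} of AUM)")
print("within the illustrative 1% of AUM limit:", var_99 <= 0.01 * aum)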

However, precisely the same risks of complacency apply here - VaR doesn't predict anything, and being regularly under the stated level should give an investor only a passing feeling of comfort that tail risks are being managed better.

Operationally, calculating a firm's VaR is a labour intensive process - who's looking after the effects of all those illiquid and private assets as they're fed into the VaR machine?  Who is modelling and managing all the correlations and volatilities?  If you're using historical implementations of VaR, how are convertibles behaving as you roll back the corporate events?  Who handles the effect of unusual splits on options?  How do you model M&A deal risk?  SPACs?  Etc Etc.

1997 - VaR found a further role as the quantitative measure of choice for making public statements about the degree of firms' derivative usage, following an SEC rule requiring disclosure of quantitative information about market risk.

Basel II, 1999, which is only now (2020) mostly implemented, solidifies VaR as the preferred measure of market risk.  It identifies, and places an onus on the relevant institutions to quantify and incorporate, three components of risk in their capital adequacy calculations, namely market, credit and operational risk.  These three numeric inputs feed into a minimum capital calculation, the threshold of which is set by regulators.  VaR is the preferred market risk measure; there are more procedural and varied options for the other two, based on how large and complex the organisation is.

For advanced (in-house) credit measures, the firm estimates, for each of its significant credit exposures, the probability of default, the loss given default, and the expected exposure at default.  For advanced operational risk measures, you take as a starting point all your business line gross incomes and multiply them by a scaling factor to estimate the risk to each business line.
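
Roughly, and with purely illustrative numbers, those two calculations look something like the following sketch, where the product of the three credit inputs gives an expected loss per exposure and the operational charge is a sum of scaled gross incomes.

# credit: probability of default x loss given default x exposure at default = expected loss per exposure
exposures = [
    (0.020, 0.45, 150.0),     # (PD, LGD, EAD in $mm) - illustrative
    (0.005, 0.60, 400.0),
]
credit_expected_loss = sum(pd * lgd * ead for pd, lgd, ead in exposures)

# operational: business line gross income times an assumed scaling factor, summed
gross_income = {"trading": 300.0, "asset management": 120.0}     # $mm, assumed
scaling = {"trading": 0.18, "asset management": 0.12}            # assumed factors
op_risk_charge = sum(income * scaling[line] for line, income in gross_income.items())

print(round(credit_expected_loss, 2), round(op_risk_charge, 2))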

Conditional VaR, also known as expected shortfall or tail loss, provides a more accurate (and larger) estimate of the expected loss than plain VaR.  The following article spells that out nicely.  The essential point is worth quoting:
A risk measure can be characterised by the weights it assigns to quantiles of the loss distribution. VAR gives a 100% weighting to the Xth quantile and zero to other quantiles. Expected shortfall gives equal weight to all quantiles greater than the Xth quantile and zero weight to all quantiles below the Xth quantile. We can define what is known as a spectral risk measure by making other assumptions about the weights assigned to quantiles.
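
A tiny numerical illustration of that point, using simulated fat-tailed losses rather than anything real:

import numpy as np

rng = np.random.default_rng(1)
losses = rng.standard_t(df=4, size=100_000) * 1e6    # fat-tailed simulated losses, in dollars

var_99 = np.quantile(losses, 0.99)                   # the single 99th quantile
es_99 = losses[losses >= var_99].mean()              # equal weight to everything beyond it
print(f"VaR 99%: {var_99:,.0f}   expected shortfall 99%: {es_99:,.0f}")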

However, even here the hedge fund faces the same problem of feeding a potentially large VaR engine with many recalcitrant asset types in order to produce a meaningful output.  Given that, hedge funds sometimes try to measure firm risk via a collection of scenario models, together with specific hand crafted 'worst case' analyses of major risks or trades.  These worst case analyses, however, often don't have correlations burned into them, so they aren't as useful as firm-wide measures of risk in the way that VaR can be; but they can be used to manage trading, and investors also find them interesting, when aggregated chronologically, as measures of concentration through time.

Basel III - November 2010 (post the 2008 financial crisis), with implementation extended to January 2022.
Its goal was further work on capital requirements and bringing down leverage.  Broadly there were three limits here: again, capital adequacy set as a fraction of risk weighted assets; a non-risk-weighted leverage ratio based on tier 1 capital and total balance sheet exposure; and finally a liquidity requirement whereby the firm has to prove it can cover 30 days of expected outflows with liquid assets.  Capital adequacy and leverage are very closely related, and I think of the leverage addition as a way of further tracking derivative exposure.  Hence capital adequacy and leverage are both size measures and liquidity is a flow measure.  Of course, liquidity also affects risk, and hence the risk weighted capital numbers, so in a way all three cohere.


From this it becomes clear just how much the modern risk manager is a child of the Basel accords, and those in turn are children of that time when, for the first time, trained statisticians and mathematicians faced the aftermath of an unpredictable turn in the business cycle.

Generally Basel says to relevant institutions: (1) you must calculate these items; (2) you must allow us to make sure you're doing it right by showing us what you did; and (3) you must periodically tell the market what those numbers are too.  These are the so-called three pillars of the accord.  Step 1 is done in the Risk and Compliance departments of the firm, step 2 is largely covered by the regulatory reporting that hedge funds have to perform, and step 3 appears in marketing documentation such as offering memoranda, investor newsletters and Open Protocol (OPERA) reports.

But really, on the assumption that tail events (and business cycles generally) are unknowable, regulators are in an impossible position here.  All they can do is make sure the risks don't grow in the normal course of events, which is definitely something.  They cannot now, or ever, I feel, see or predict a black swan in any of this data.  Lastly, expected loss is even more tenuous than VaR.  If so much doubt exists about our ability to measure even VaR at the 99th percentile, imagine how much less certain we ought to be about the whole fat tail from 99 out to 100.  So how can expected loss be in any sense more accurate when it resides even more fully in the unknowable region of black swan possibility?

Those of a statistical bent will immediately recognise the structural similarity between a one-sided hypothesis test and the VaR calculation.  And with that connection it becomes clear that what the originators intended was to carve away the space of unknowability, looking very much from the perspective of 'business as usual' market conditions.  It is very much a statement about the normal world, and perhaps even deliberately says very little about what might transpire beyond that threshold of normal market conditions.  Think of how it is usually expressed: "an amount of loss you aren't likely to exceed over your horizon window, given your certainty parameter".

Two of Taleb's as yet unmentioned criticisms of VaR are, first, that it wasn't born out of trading experience but out of applied quant statistics - which isn't necessarily a bad thing - and second, that traders could exploit its weakness by burying enormous risks in the final 1%.  This is why a modern hedge fund has literally thousands of relevant risk controls in place to practically mitigate this weakness.  However his point on its generation of black swan complacency was already well made at a broader level by Minsky and others.  Perhaps my favourite quote about VaR is from Einhorn, who compared it to an airbag which works beautifully under all circumstances except during crashes.

These are both unfair, since all it takes is for the industry to realise its weakness, and the complacency argument goes away.  Hedge funds have thousands of risk controls.  Banks too.  They are not complacent as a rule, though, according to Minsky, we all vary in time in our level of complacency, which I would agree with.  Paired with other controls, and in the context of the proper level of scepticism, and with its other uses to track firm-wide consumption of risk over time, it is in fact practically useful.

Second order benefits include: first, you won't get good VaR until your volatilities, correlations and prices are good, so it is a fantastic tool for spotting modelling or booking anomalies.  Secondly, you can throw it into reverse and imply out a firm-wide volatility, and compare that to the realised volatility of observed firm-wide returns.  When also calculated at the trader and strategy level, you can put numbers on the diversification contribution each trader or trading style makes.  So-called marginal VaR recovers the prized concept of sub-additivity.  And of course conditional VaR (expected loss) makes pretence of a peek inside the black swan zone.
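
The 'throw it into reverse' trick is simple enough to sketch: under a normal approximation, a 99% one-day VaR of $v$ dollars on a book worth $V$ implies a daily volatility of roughly $v / (2.33 V)$.  Illustrative numbers below.

from statistics import NormalDist

def implied_daily_vol(var_dollars, book_value, confidence=0.99):
    z = NormalDist().inv_cdf(confidence)     # roughly 2.33 at the 99% level
    return var_dollars / (z * book_value)

# illustrative numbers: a $15mm one-day VaR on a $2bn book
print(f"{implied_daily_vol(15e6, 2e9):.3%}")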

To summarise the Minsky point from a risk management perspective: most models are calibrated on historical data, usually recent historical data (perhaps even exponentially weighted).  When your calibration history looks euphoric, your models tell you everything is gonna be alright, which in aggregate grows confidence, which leads to growth, in a feedback loop which could be described as positive-then-very-negative: the surprise event happens, and none of your recent-history calibrated models saw it coming.  If you know this weakness and build it into your models, then your day to day returns will be so far below your competitors' in normal market conditions that chances are good you'll go out of business, eaten by the competition before the next black swan hits.
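
To make the calibration point concrete, here is a small sketch of an exponentially weighted volatility estimate (using the RiskMetrics convention of $\lambda = 0.94$): fed a calm year of simulated returns, it reads reassuringly low right up until the surprise arrives.

import numpy as np

def ewma_vol(returns, lam=0.94):
    # exponentially weighted variance: the most recent data is weighted heaviest
    var = returns[0] ** 2
    for r in returns[1:]:
        var = lam * var + (1 - lam) * r ** 2
    return np.sqrt(var)

rng = np.random.default_rng(2)
calm_year = rng.normal(0.0, 0.005, 250)          # a euphoric, low-volatility year (assumed)
print(f"model's daily vol estimate: {ewma_vol(calm_year):.3%}")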


As an interesting footnote, the Wikipedia page for Dennis Weatherstone states:

JPMorgan invented value-at-risk (VaR) as a tool for measuring exposure to trading losses. The tool emerged in the wake of the 1987 stock market crash when Sir Dennis Weatherstone, JPMorgan's British-born chairman, asked his division chiefs to put together a briefing to answer the question: "How much can we lose on our trading portfolio by tomorrow's close?"

VaR of course didn't quite answer this; it answered the question: with 99% confidence, how much are we sure not to lose more than by tomorrow's close, assuming the recent past is kind of like today?  Weatherstone's question is partly answered by the related conditional VaR, or expected loss, calculation.  The question is beautifully simple to ask and devilishly hard to answer.

Less interesting footnote:  the team lead at JP Morgan for VaR creation, development and implementation was Till Guldimann, who was Swiss, and born in Basel, so VaR is simply coming home. 


VaR led to the spin-out of RiskMetrics, the risk consultancy.  A great legacy of VaR at banks was gaming the system - stuffing risk into the tail.  It is simultaneously the most you could lose 99% of the time and the least you could lose 1% of the time.  That's a fantastic definition.  If you have a lot of historical data, deal in liquid assets and have the resources to man the coal room, VaR is useful.

Saturday 4 January 2020

Advice from a dead investor: distance yourself from financial animus

I just started to read "The Intelligent Investor" by Benjamin Graham and right away I was struck by the essence of the problem he was trying to solve, in this pre-Markowitz book.  His rather practical solution to the problem of volatility of (downside) returns is to burn in the margin of safety, that is to say, to seek out value.  What you're doing there is in essence finding companies where book value is larger (by some margin) than the market price, then hoping this is enough to minimise downside variance.  Secondly, he carves away 'speculator' from his interest, leaving only the investor, who in fact also attacks volatility by virtue of being a long term investor.  

Thirdly, he takes a dim view of momentum trading.  He sees no value whatsoever in buying when the stock price is going up and selling when it is going down.  In fact, this, to him, is the essence of the speculator's problem: too much trading and too much short-termism.  But subsequent work by factor researchers confirms the hard-to-justify (on fundamental grounds) reality of a positive return to the momentum factor.

He gives credit for the idea of 'dollar cost averaging' to Raskob, the DuPont and General Motors executive, who suggested in 1929 that regular monthly investments in the stock market would lead to long term success.  What I didn't realise, and it only hit me now, is that this too is a way of defeating the vol of vol.  By investing over a period of time, you increase the probability of avoiding a large investment just before you get hit by a bout of very high (downside) volatility.  Spreading it over time means you experience something closer to the 'average' volatility of that market.  Subsequent research shows that 'lump sum' investing is (on average) more rewarding than dollar cost averaging.  This may be true on average, but there is no guarantee that you won't be the investor who walks right into a large downturn the week after investing the whole lump sum.
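
A toy Monte Carlo, with all parameters assumed, makes both points at once: in this setup the lump sum wins on average, while spreading the investment over twelve months trims the worst outcomes.

import numpy as np

rng = np.random.default_rng(3)
n_paths, n_months = 20_000, 12
monthly = rng.normal(0.006, 0.045, size=(n_paths, n_months))     # assumed monthly returns
rev_growth = np.cumprod(1 + monthly[:, ::-1], axis=1)[:, ::-1]   # growth from month m to the end

lump_sum = rev_growth[:, 0]                 # the whole amount in at month 0
dca = rev_growth.mean(axis=1)               # one twelfth in at the start of each month (ignores interest on idle cash)

for name, wealth in [("lump sum", lump_sum), ("dollar cost averaging", dca)]:
    print(name, "mean:", round(wealth.mean(), 3), "worst 1%:", round(np.quantile(wealth, 0.01), 3))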

The published version of the book which I'm reading is the 2006 reprint, with commentary provided by Jason Zweig, a personal finance journalist, and I'm finding the commentary interesting and also dated.

Graham already knew and reported on the high likelihood that precious few investors beat market averages.  He also has this strangely resonant view on trading, that enthusiasm usually leads to disaster, which I personally struggle against, but think is correct.  I have met with and worked with many traders in my life and have also made my own investments and I can confirm that many traders, even regularly successful ones, have a rather unenthusiastic persona.  Perhaps that after all is a necessary but not sufficient trait.  I have, on the other hand, also met unenthusiastic and poor traders, so I'm not entirely sure.

Finally, in a way, part of Graham's message here is to remain calm, to quell the emotional volatility which resides in some investors' hearts.  All in all, whilst volatility isn't really explicitly discussed, it is seen as an enemy, to be challenged, avoided, fought.  If one is rich enough, one can afford to accept the low return and low volatility of sovereign bonds, but everyone else must dip their toes in volatility.  The essence of the modern view on good investing is that one only gets paid for accepting volatility.  This emphasis isn't quite the same as in chapter 1 of this book.