Wednesday 30 October 2019

Markowitz 1952, what it does and does not do

Portfolio Selection, the original paper, introduces mean-variance optimisation. It sets quantities as weights, it treats risk as central, and it operationally defines risk as variance as opposed to, say, semi-variance. It gives geometric demonstrations for portfolios of up to four securities. It comes from an intellectual statistical pedigree which is pro-Bayesian (Savage). It briefly connects E-V portfolios with Von Neumann-Morgenstern utility functions. It deals with expected returns and expected correlations. It is neutral on management fees and transaction costs, if you would like it to be, since you can adjust your raw expected returns to factor in expected costs.

It doesn't give a mathematical proof for $n$ securities.  It doesn't generalise to dynamic expectations models $E[r_{i,t}]$ but assumes static probability distributions $E[r_i]$.  It doesn't introduce the tangent portfolio (a.k.a. the market portfolio).  It doesn't treat cash as a distinguished and separate asset class to be bolted on at the end of a 'risky assets only' E-V analysis.  It doesn't postulate what would happen if everyone performed mean-variance optimisation in the same way, i.e. it doesn't perform an equilibrium analysis.  It doesn't draw the risk-free-to-tangent 'capital allocation line' as a mechanism for leverage.  It doesn't assume unlimited borrowing.  It doesn't allow short positions.  It doesn't give techniques for solving the optimisation problem.  It doesn't talk about betas.  It doesn't prove which sets of utility functions in the Von Neumann-Morgenstern space are in fact economically believable and compatible with E-V efficiency.  It doesn't just assume you look at history to derive returns and correlations and you're done.

Taking Sharpe and Markowitz as canonical, I notice that Sharpe seems less enamoured with Bayesian approaches (he critiques some robo-advisors who modify their MPT approach with Black-Litterman Bayesian hooks).  For seemingly different reasons, they both end up not embracing the market portfolio/tangent portfolio idea.  In Markowitz's case it is because he doesn't agree with the CAPM assumptions which theoretically get you to the market portfolio in the first place; with Sharpe, it is because he moved his focus away from the domain he considers already converted to pro-CAPM approaches, namely the professional investment community focused on the accumulation of wealth, towards the individual circumstances of retirees in the decumulation stage.  However, I think, if you strip away why he's allowing more realism and individuality into the investment decisions of retirees, it boils down to Markowitz's point also: realistic model assumptions kill many flavours of pure CAPM.


Markowitz v shareholder value

Isn't it strange that Markowitz taught us that, when it comes to returns, maximising expected value alone is a stupid idea, whereas when it comes to evaluating the behaviour of managers in firms, maximising value still stands alone as the universal goal in US/UK models of capitalism?

Or, spelled out a little: companies are allowed to act as though they have permission to focus exclusively on increasing the share price (and hence the period return on the share price) as their operational definition of maximising shareholder value, as opposed, for example, to maximising risk-adjusted expected returns.

If risk-adjusted returns are the goal for investors in portfolios of stocks, then why aren't they also the goal for owners of individual stocks?

Shiller's advice to oil-heavy central banks

By the way, in the same video, did Robert Shiller really advise Norway and Mexico to take up massive short oil futures positions just to get them onto the efficient frontier?  He forgets to mention that shorting in such size is bound to move the underlying oil market against you, so that cost needs to be written against the benefit of moving closer to a more efficient national portfolio.  A second cost is that all those short futures would widen the basis between oil futures and oil itself, and you'd be paying that price on an ongoing basis as each future rolled.  Thirdly, there's the mark-to-market issue.  Fourth, there's the question of what magnitude to short: the extracted oil only?  The total resource in the country?  Not at all as clear-cut a piece of advice as he makes it sound here.


Portfolios of asset types can contain hidden correlation

The risk of building portfolios out of broad asset classes is that there is hidden correlation.  For example, Shiller in this lecture, around the 55 minute mark, in explaining the virtues of efficient portfolios, claims that holding stocks, bonds and oil in some combination is a good thing, because their mutual correlations are low and so overall portfolio variance is reduced.

Well, to carry that E-V efficiency point further, you end up wanting to disarticulate stocks into factors, since some stocks are more heavily oil-sensitive than others, and some stocks, with stable and predictable dividends, behave more like bonds than others.  Leaving the objects of the portfolio at the asset-class level leaves some hidden correlation off the table.

In a sense, then, factor models are ways of taking an x-ray of a security to see how correlated it is to fundamental economic elements (oil, carry, momentum, etc.).

In the limit, I think a good model also needs an element on the cyclicality of factors.  The most stable, that is, acyclic, factors have already been found and have reasonable stories which persist through business cycles.  But this doesn't mean the rest of the factor zoo is for the dump.  If they can be attached to a meaningful theory of the business or credit cycle, then a factor carousel can be created.  Not all correlations are linear and constant; some can be cyclic, so perhaps linear regression isn't the ideal form for producing and measuring these correlations.

But getting a nowcast or forecast of economic conditions is not easy, nor do I think it properly interacts with factor models.

Portfolios of what?

Markowitz clearly had portfolios of stocks in mind.  It is also possible to see cash as another asset in the mix, and government bonds too.  But why not strategies, or asset classes, or even factors?  I really like the idea of strategies-and-factors.  To make this clear, imagine there was a well represented tradable ETF for each of the major strategies: macro-economic, convertible arbitrage, credit, volatility, distressed, M&A, equity long/short, commodities and the carry trade.  Furthermore, imagine that the equity long/short component was itself a portfolio of factors, perhaps even itself an ETF.

A portfolio of factors from the factor zoo makes for an interesting thought experiment.  I realise just how important it is to understand the correlation between factors.

Also, in the limit, imagine a long stock and a short call option on the same stock.  Can delta be recovered here using the linear programming (or quadratic programming) approach?  Unlikely.  But it highlights one of the main difficulties of the portfolio approach of Markowitz - just how accurate (and stable) can our a priori expected returns and expected covariances be?  

Imagine a system whose expected returns and expected covariances are radically random on a moment by moment basis.  The meaning and informational content of the resulting linearly deduced $x_i$s must be extremely low.  There has to be a temporal stability in there for the $x_i$s to be telling me something.  Another way of phrasing that temporal stability is: the past is (at least a little bit) like the expected future.  Or perhaps, to be more specific, imagine a maximum entropy process producing a high variance uniformly distributed set of returns; the E-V efficient portfolio isn't going to be doing much better than randomly chosen portfolios.

Also, surely there ought to be a pre-filtering step in here, regardless of whether the portfolio element is a security, a factor, an ETF representing a strategy, or perhaps even an explicit factor based not on an ETF but on the hard groundwork of approximating a strategy.  The pre-filtering step would look to classify the zoo in terms of the relatedness of the strategies, on an ongoing basis, as a way of identifying, today, a candidate subset of portfolio constituents for the next period or set of periods.  Index trackers (and ETFs generally) already do this internally, but it ought to be a step in any portfolio analysis.  The key question you're answering here is: find me the cheapest and most minimal way of replicating the desired returns series such that it stays within an acceptable tracking error.
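As a minimal sketch of that replication question (everything here is hypothetical and simulated: the candidate instruments, the target series and the long-only/fully-invested constraints), one could pick weights over a candidate set to minimise tracking error like this:

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical data: T periods of returns for N candidate instruments,
# plus the target return series we would like to replicate cheaply.
rng = np.random.default_rng(0)
T, N = 260, 6
R = rng.normal(0.0004, 0.01, size=(T, N))                      # candidate instrument returns
target = R[:, :3].mean(axis=1) + rng.normal(0, 0.002, size=T)  # target series to track

def tracking_error(w, R, target):
    """Standard deviation of the difference between replica and target returns."""
    return np.std(R @ w - target)

w0 = np.full(N, 1.0 / N)
result = minimize(
    tracking_error, w0, args=(R, target),
    method="SLSQP",
    bounds=[(0.0, 1.0)] * N,                                       # long-only
    constraints=[{"type": "eq", "fun": lambda w: w.sum() - 1.0}],  # fully invested
)
print("weights:", np.round(result.x, 3))
print("tracking error:", tracking_error(result.x, R, target))
```

A real pre-filter would of course add a cost penalty and a cap on the number of instruments, but the shape of the problem is the same.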

Tuesday 29 October 2019

Markowitz the practical

The expected trajectory of the Markowitz story is: Harry gets randomly pointed towards working on portfolio selection by an anonymous broker he met in his supervisor's waiting room.  A couple of years later, just as randomly, William Sharpe turns up and asks Markowitz what he should work on for his own thesis, and out of this CAPM is born.  Both Sharpe and Markowitz get Nobel prizes for this, but fast forward to 2005 and Markowitz publishes a paper which, in effect, blows up pure CAPM, his own baby.  No doubt CAPM has been blown up many, many times in the intervening forty-odd years; nonetheless it is somewhat surprising to see the father of modern portfolio theory, forty years on, criticising CAPM so roundly.

The paper in question is "Market Efficiency: A Theoretical Distinction and So What?".  Such a dismissive-sounding title, unusually so given academic norms, even for a publication like the Financial Analysts Journal.  I read it as an argument which places mean-variance efficient portfolios above CAPM-compliant market portfolios, and the attack is on the assumption of unlimited borrowing (and/or shorting).  He very much assigns this assumption to his Nobel peer, Sharpe (Lintner probably should be in that list too, but he died seven years before the prize was awarded).

He makes this rather bold claim: 
Before the CAPM, conventional wisdom was that some investments were suitable for widows and orphans whereas others were suitable only for those prepared to take on “a businessman’s risk.” The CAPM convinced many that this conventional wisdom was wrong; the market portfolio is the proper mix among risky securities for everyone. The portfolios of the widow and businessman should differ only in the amount of cash or leverage used. As we will see, however, an analysis that takes into account limited borrowing capacity implies that the pre-CAPM conventional wisdom is probably correct.
This in effect completely blows a hole in the primary element of CAPM and CAPM-related models which privilege the market portfolio as most efficient of all, and most universal.

I think Markowitz wants more life to accrue to mean-variance optimisation, for there to be more and varied applications of it, using credible, practical, defensible assumptions, assumptions which in the limit are person-specific.  He makes similar points in his Nobel speech when he says:
Thus, we prefer an approximate method which is computationally feasible to a precise one which cannot be computed. I believe that this is the point at which Kenneth Arrow’s work on the economics of uncertainty diverges from mine. He sought a precise and general solution. I sought as good an approximation as could be implemented. I believe that both lines of inquiry are valuable.

So his claim is that he likes practicality, both in models and in assumptions.  It was at RAND, after all, where Sharpe met him, and where he met mister simplex himself, George Dantzig.  Optimisation research will get you prizes in computer science, but of course not in economics.  It is worth mentioning that Markowitz also made strides in operations research (which I think of as a branch of computer science): he was heavily involved in SIMSCRIPT and invented a related memory-allocation algorithm for it, together with sparse-matrix code.  The buddy allocation system made its way into Linux, and hence into pretty much every phone on the planet.  The very term 'sparse matrix' was in fact coined by Markowitz.  So, as you can see, his interests were very much algorithmic and practical, whether inside or outside of economics.


Sunday 27 October 2019

Markowitz the micro-economist of the investor

In 1990 Markowitz was awarded the Nobel prize, so I had a read of his short acceptance speech, which quite clearly sets the scene for his work.  He describes microeconomics as populated by three types of actor: the firm, the consumer and the investor (that last one being the actor he focuses on).  He then, interestingly, creates a binary division in the work on each of these three actors: first the individual, then the generalised aspect of their ideal behaviour.  How ought a firm best act?  A consumer?  An investor?  After having answered these questions, the generalisation is: how would the economy look if every firm, every consumer and every investor acted in that same way?

It is worth pausing on just this point about generalisation.  Clearly the question of uncertainty must raise its head to our modern ear.  Can one model all firms as following the same basic, so-called rational, template?  If we can, then we may identify an economic equilibrium state.  Likewise with consumers: how does an economy look if everybody is consuming according to the same basic utility function?  In both of these cases, whilst uncertainty is present, and known about by economic modellers, it is given a back seat.  Markowitz accepts this, but shows how it is literally impossible to background uncertainty when it comes to the actions of the rational investor, since doing so leads to a model where every investor picks the single security with the largest expected return.  This does not happen, so any model which treats risk and uncertainty poorly is insufficient.

I think it is probably widely agreed today that models of the firm's behaviour and of consumers' behaviour are best built with uncertainty in the model.  The old linear optimisation models accepted that variability across firms, or consumers, could be averaged away.  That is, it was taken to be a valid approach to assume minimal uncertainty and see how, under those simplifying assumptions, equilibrium models of the economy might be produced.

But fundamentally, portfolio investing in the absence of risk makes no sense at all.  In that case, in the limit, we would find the single security with the best expected return and put all our money into it.  However, not many people actually do that.  So, to the extent that micro-economic models of the investor claim to model actual behaviour, uncertainty must play a more prominent role.

Markowitz also hands off 'the equilibrium model of the investor' to Sharpe and Lintner's CAPM.  He is happy to see basic portfolio theory as the normative element, a model of how an investor ought to act, and leaves the positive, descriptive elements to Sharpe's theory, which I think he does with only partial success.  But I certainly see why he's keen to do so, especially since his mean-variance functions are not in themselves utility functions, and in that sense don't touch base with economic theory as well as Arrow-Pratt does.

Rather, looking back on his achievement, he makes a contrast between Arrow-Pratt and his own, perhaps more lowly contribution and praises his approach as computationally simpler.  This may be true, but it isn't a theoretically powerful defence.  However, I like Markowitz, I like his lineage, Hume, Jimmy Savage and the Bayesian statistical approach.  I'm happy to go along with his approach.

I notice how Markowitz gently chides John Burr Williams for describing the value of an equity as the present value of its future dividends, instead of describing it as the present value of its expected future dividends, that is to say, Markowitz draws out that these dividends ought to be modelled as a probability distribution, with a mean and with a variance.

Markowitz also highlights that early in his career he reckoned downside semi-variance would be a better model of risk, in the win-lose sense, but he notes that he has never seen research showing that semi-variance gives a materially better model than variance.  This is a rather passive backing away from his original insight into semi-variance.  Did he not consider doing any real work on this?  Is it enough for him to note that he hasn't seen any papers on it?  That said, equity index returns are usually close enough to symmetric that I could well believe the distinction matters less than it sounds, though it would be good to know if someone has confirmed it isn't an important enough distinction.
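One could at least check the size of the gap oneself.  A minimal sketch follows; the return series is simulated rather than real index data, so it only illustrates the calculation, not the empirical claim:

```python
import numpy as np

# Simulated daily returns standing in for an equity index series.
rng = np.random.default_rng(1)
returns = rng.normal(0.0003, 0.01, size=2500)

mean = returns.mean()
variance = returns.var()

# Downside semi-variance: average squared deviation below the mean
# (a below-target semi-variance would use a target return instead of the mean).
downside = np.minimum(returns - mean, 0.0)
semi_variance = np.mean(downside ** 2)

print("variance:         ", variance)
print("semi-variance:    ", semi_variance)
print("ratio (semi/var): ", semi_variance / variance)  # ~0.5 for a symmetric distribution
```

Run on a genuinely skewed return series, the ratio drifts away from one half, and that drift is exactly what the variance-only formulation throws away.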

What Markowitz in effect did was replace expected utility maximisation with an approximating function of portfolio mean and portfolio variance, and then he, and others later, tried to reverse this back into particular shapes of utility function.  This is where the computer-science machinery of simplex, together with the ad hoc objective of maximising return while minimising variance, attempts to meet top-quality economic theory as expressed in Von Neumann and Morgenstern.
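The usual way to motivate that approximation (my gloss, not a derivation from the 1952 paper) is a second-order Taylor expansion of a utility function $U$ around the mean portfolio return $\mu$:

$$E[U(R)] \approx E\left[U(\mu) + U'(\mu)(R-\mu) + \tfrac{1}{2}U''(\mu)(R-\mu)^2\right] = U(\mu) + \tfrac{1}{2}U''(\mu)\,\sigma^2$$

so for a concave $U$ (where $U'' < 0$), expected utility rises with the mean $\mu$ and falls with the variance $\sigma^2$, which is exactly the E-V trade-off.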

Markowitz then spends the rest of his lecture showing how strongly correlated mean-variance optimisation is with believable utility functions.

He wraps up, as I'm sure many good Nobel laureates do, by talking about new lines of research.  Here he lists three: applying mean-variance analysis to data other than just returns (he refers to these as state variables; they too could have a mean-variance analysis applied to them); semi-variance, as mentioned already; and finally he mulls over the seemingly arbitrary connection between certain utility functions and his beloved mean-variance approach.  The slightly deflating point here is that all three of these potential lines of investigation were already candidates back in 1959, yet here is Markowitz in 1990 repeating them as open issues still.


Where Portfolio Selection sits

Markowitz (1952) is in effect a connection made between a piece of new computer science (linear programming, techniques such as simplex, and more generally the constrained-optimisation methods which arose out of the second world war) and an application in financial theory.  He tells the admirably random story of how he was waiting to see his professor when he struck up a conversation with another man in the room, waiting for the same professor; the man was a broker, and he suggested that Markowitz apply his algorithmic skills to solving finance problems.

And given this random inspiration, he later finds himself in a library reading a book by John Burr Williams, and he has a moment of revelation: when you consider portfolios, the expected return on the portfolio is just the weighted average of the expected returns of the component securities, and so if this were the only criterion that mattered, your portfolio would just be 100% made up of the single security with the highest expected return.  You might call this the ancestral 'absolute alpha' strategy.  Knowing this single criterion was silly, he drew upon his liberal arts background, his knowledge of the Merchant of Venice, Act 1 Scene 1, as well as his understanding of game theory, particularly the idea of an iterated game and the principle of diversification, to seek out variance as an operational definition of risk.

He now had two dimensions to optimise: maximise return whilst simultaneously minimising variance.  And when he looked at how portfolio variance is calculated, he had his second moment of inspiration, since it is not just a naive weighted sum of constituent variances; the portfolio variance calculation is a different beast.  This feeling, that the behaviour of the atoms is not of the same quality as the behaviour of the mass, is perhaps also what led John Maynard Keynes to posit a macro-economics different in kind from the micro- or classical economics of his education.

With normalised security quantities $x_i$ the portfolio variance is $\sum_i \sum_j x_i x_j \sigma_{i,j}$.
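As a minimal numerical sketch of that quadratic form (the weights and covariance matrix below are made up purely for illustration):

```python
import numpy as np

# Hypothetical example: three securities, normalised weights x, covariance matrix Sigma.
x = np.array([0.5, 0.3, 0.2])
Sigma = np.array([[0.040, 0.006, 0.012],
                  [0.006, 0.090, 0.010],
                  [0.012, 0.010, 0.160]])

# sum_i sum_j x_i x_j sigma_ij, i.e. the quadratic form x' Sigma x
portfolio_variance = x @ Sigma @ x
portfolio_vol = np.sqrt(portfolio_variance)
print(portfolio_variance, portfolio_vol)
```

The cross terms $x_i x_j \sigma_{i,j}$ are where diversification lives; zero them out and you are back to the naive weighted sum of variances.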

His third great moment was in realising that this was a soluble optimisation problem: soluble geometrically for a small number of securities, and soluble in the general case with the constrained-optimisation machinery of the day (linear programming, and quadratic programming for the variance objective).  That machinery also allowed linear constraints to be imposed, indeed demanded that some be present; for example that full investment occur, $\sum_i x_i = 1$, and that you can't short, $\forall i,\ x_i \geq 0$.
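A minimal sketch of that general-case solution, using a modern general-purpose solver rather than anything available in 1952 (the expected returns, covariance matrix and risk-aversion parameter are all hypothetical):

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical inputs: expected returns mu and covariance matrix Sigma.
mu = np.array([0.08, 0.10, 0.07, 0.12])
Sigma = np.array([[0.040, 0.006, 0.012, 0.000],
                  [0.006, 0.090, 0.010, 0.011],
                  [0.012, 0.010, 0.030, 0.005],
                  [0.000, 0.011, 0.005, 0.160]])
risk_aversion = 3.0  # trade-off parameter between mean and variance

def negative_objective(x):
    # Maximise mu'x - (lambda/2) x'Sigma x  <=>  minimise its negative.
    return -(mu @ x - 0.5 * risk_aversion * x @ Sigma @ x)

n = len(mu)
result = minimize(
    negative_objective,
    x0=np.full(n, 1.0 / n),
    method="SLSQP",
    bounds=[(0.0, 1.0)] * n,                                       # no shorting
    constraints=[{"type": "eq", "fun": lambda x: x.sum() - 1.0}],  # full investment
)
print(np.round(result.x, 3))
```

Sweeping the risk-aversion parameter traces out the efficient frontier point by point.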

However, notice the tension.  We humans tend to favour one end of the returns distribution over the other, whereas the mathematics doesn't care.  We cherish, desire even, the right-hand side of the distribution and fear the left-hand side.  So maximising the return on a portfolio makes good sense to us, but variance is not left- or right-handed: minimising variance minimises the upside semi-variance and the downside semi-variance alike.  This is, so to speak, sub-optimal.  We want to avoid downside variance, but we probably feel a lot more positively disposed towards upside variance.  The mathematics of variance is side-neutral, yet we plug straight into that maths.

Wednesday 23 October 2019

Gut, Optimisation, Gut

The way that Markowitz (1952) introduces mean-variance optimisation to the financial world is as a maths sandwich between two slices of gut.  I think, in the end, both of those pieces of gut will prove amenable to maths too.  The first piece of so-called gut is Markowitz's 'step one': the idea that one arrives, through experience and observation, at a set of beliefs (probabilities) concerning the future expected performance (a general term there: think returns, risks) of a set of risky securities.

To me, this sounds like it was already anticipating the Black-Litterman (1990) approach, which was in effect to operationalise experience and observation in a process of Bayesian probabilistic modelling.  That approach is itself a form of constrained optimisation, rather like the techniques of linear programming, for example with Lagrange multipliers.  The Bayesian approach is of course not limited to linear assumptions.
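For reference (my summary, not anything in Markowitz 1952), the Black-Litterman posterior expected returns are usually written as a precision-weighted blend of an equilibrium prior $\Pi$ and a set of investor views $Q$:

$$E[r] = \left[(\tau\Sigma)^{-1} + P^{\top}\Omega^{-1}P\right]^{-1}\left[(\tau\Sigma)^{-1}\Pi + P^{\top}\Omega^{-1}Q\right]$$

where $\Sigma$ is the covariance matrix of returns, $\tau$ scales the uncertainty in the prior, $P$ maps assets into views and $\Omega$ is the covariance of the view errors.  It is 'step one' made explicit: beliefs in, expected returns out, ready for the optimiser.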

Prior to 1952, Kantorovich and then Dantzig had produced solutions to linear programming problems.  Dantzig, famous for mistaking two of his professor Jerzy Neyman's unsolved problems for a homework exercise and solving them, went on to invent the simplex method.

So Markowitz goes into this paper knowing there's a solution to his 'step 2', being an optimisation of both mean and variance in a portfolio.

Finally, the second slice of gut involves investors deciding which level of return they want, given the level of risk they're prepared to bear.  I think that too will, in time, be amenable to a mathematical solution.  That is to say, their level of risk could, to take only a single example, become a function of a macro-economic model.

Tuesday 8 October 2019

Covariance

If $X$ and $Y$ are random variables then their covariance is the expected value of the product of their deviations from their means.  Or in mathematical form, $\sigma_{X,Y}=E[(X-E[X])(Y-E[Y])]$.  There's a lot of juice in this idea, a lot.  But interpreting it can be hard, since the value's meaning depends heavily on the units of $X$ and $Y$.  For example, if $X$ and $Y$ are return streams, then representing the returns as percentages, e.g. 4, 3.5, etc., versus representing them as unit fractions, e.g. 0.04, 0.035, etc., makes the first covariance 10,000 times larger than the second.

You can see that the variance is in fact just the self-covariance: $\sigma_X^2 = \sigma_{X,X} = E[{(X-E[X])}^2]$.  So, going back to the covariance between two random variables, for given variances the covariance of $X$ and $Y$ is at its largest when $Y$ moves exactly in line with $X$, in the limit when $Y$ is, in fact, $X$.

A useful way to normalise covariance was presented by Auguste Bravais, an idea which Pearson championed.  In it, the units of covariance are divided away by the product of the standard deviations of the two variables.  The resulting measure, which ranges from -1 to +1, has become better known as the Pearson correlation coefficient, or simply the correlation, CORREL() in Excel: $\rho_{X,Y} = \frac{\sigma_{X,Y}}{\sigma_X \sigma_Y}$.  This is easier for humans to read and comprehend, and it lets covariances from different contexts be compared and ranked.  And if you are building a square variance-covariance matrix, you now know it is really just a covariance matrix, the variances being the self-covariances on the diagonal.  Furthermore, if you square this normalised covariance, you arrive at the familiar $R^2$ measure, the coefficient of determination, which is also equal to the proportion of the dependent variable's variance explained by the model, $\frac{\sigma_{\hat{Y}}^2}{\sigma_{Y}^2}$.

If $X$ is the return stream of an equity, and $Y$ is the return of the market, then by dividing the covariance by the variance of the market return, $\sigma_Y^2$, we end up with the familiar beta of the stock, $\beta_X = \frac{\sigma_{X,Y}}{\sigma_Y^2}$.  Notice how similar this is to the Pearson correlation coefficient.  In fact $\beta_X = \rho_{X,Y} \times \frac{\sigma_X}{\sigma_Y}$.  That is to say, when you scale the correlation of the security's returns with the market by the ratio of the security's volatility to the market's volatility, you get the beta.  Beta as correlation times volatility ratio: that makes sense for a beta.
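A minimal numerical check of that identity (the two return series below are simulated, purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
market = rng.normal(0.0003, 0.010, size=1000)                # Y: market returns
stock = 0.0001 + 1.3 * market + rng.normal(0, 0.008, 1000)   # X: a stock with true beta ~1.3

cov_xy = np.cov(stock, market)[0, 1]       # sigma_{X,Y}
var_m = market.var(ddof=1)                 # sigma_Y^2
rho = np.corrcoef(stock, market)[0, 1]     # Pearson correlation

beta_from_cov = cov_xy / var_m
beta_from_rho = rho * stock.std(ddof=1) / market.std(ddof=1)
print(beta_from_cov, beta_from_rho)        # the two numbers agree
```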

Finally, 3 rules: 
  1. if $Y =V+W$ then $\sigma_{X,Y} = \sigma_{X,V} + \sigma_{X,W}$
  2. if $Y =b$ then $\sigma_{X,Y} =0$
  3. if $Y=bZ$ then $\sigma_{X,Y} = b \times \sigma_{X,Z}$ 
And of course it is on the basis of rule (1) that Sharpe makes the development from Markowitz.
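To spell that out (my reconstruction of the standard single-index argument, not a passage from either paper): if each security's return is modelled as $r_X = \alpha_X + \beta_X r_M + \epsilon_X$, with residuals uncorrelated with the market and with each other, then rules (1), (2) and (3) give

$$\sigma_{X,Y} = \mathrm{cov}(\alpha_X + \beta_X r_M + \epsilon_X,\ r_Y) = \beta_X\,\mathrm{cov}(r_M, r_Y) = \beta_X \beta_Y\,\sigma_M^2 \quad (X \neq Y)$$

so the full covariance matrix collapses to a handful of betas plus the market variance, which is the computational shortcut at the heart of Sharpe's diagonal model.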