Saturday 7 September 2013

Hiatus Terminando

I largely blame Coursera.org for the drop in the number of postings, as I've been gorging myself on study.  However, things are slowly returning to normal, so I will hopefully start posting again, if I can only remember what I was talking about.  Ah, yes, perhaps I should read my recent blog posts.

Saturday 20 April 2013

The Problem of Points - Symbols

Christian Kramp introduced the symbol ! to represent the factorial.  The idea of $n(n-1)(n-2)\cdots 2 \cdot 1$ as an interesting product, one which counts all the ways that $n$ completely distinct objects can be permuted, dates to at least the 12th century.  To show why this is the case, imagine the following thought experiment.

Imagine that a man walks into one of the world's most sumptuous brothels.  His reward is to have a sexual encounter with every available woman in the brothel.  He enters a room filled with six of the most desirable women he has ever seen.  One has jet black hair, is nearly six feet tall and has a full and firm ass.  Two is smaller, with short black hair and a wicked twinkle in her eye.  Three is central Asian looking, with small, perfectly formed breasts.  Four is from south east Asia, an imperturbable and fiercely independent soul.  Five is from central Europe, innocently cheeky.  Six is from the Caribbean, tall and rangy.  He knows that he will fuck them all.  He picks number one, goes with her, comes back, picks number three this time.  He repeats until he is done.  This is a permutation.  A single permutation.  How many ways could he have chosen?  The answer is 6!, which is 720.  He could have had 720 chronologically different sexual experiences.

Imagine you are him.  Imagine his decision moments.  His first decision was to pick the tall, jet-black-haired girl.  This was a so-called free choice.  A choice of one in six.  Immediately start imagining a total of six parallel worlds.  In each world, he started with one or another of the six girls.  OK.  Focus on his reality again.  In the real world, he now has to choose which one of the five remaining girls to fuck.  Now, in each of the six parallel worlds, he'll similarly need to make a 1 in 5 choice.  So for each and every one of the 6 imagined worlds, a further multiplication by 5 occurs.  There are now 6 times 5, or 30, worlds which capture all the possibilities he has to choose from.  Repeat and you will find 720 different worlds.  In making his particular ordering choice, he in essence picked one of 720 different worlds.

Without knowing the man, his sexual preferences, etc, if we had to guess his selection we could imagine each of the 720 worlds was equally likely to be chosen.  That is, you might say that the likelihood of choosing the fucking order of <tall-dark then short dark then central Asian then SE Asian then central European and finally Caribbean> was $\frac{1}{6!}$. 

Insofar as every object is uniquely distinguishable from every other object, this idea of permutations is quite core to the most basic choosing activity - ordering.  Visually you can see this as lining the objects up in a particular way, touching them in a particular order, placing them one by one on a particular chronology.

Imagine you learn that some kind of selection process has 1,000,000 distinct possibilities, the final result being a collection of 6 uniquely distinguishable objects.  Well, if you didn't actually care about the ordering, your set of possibilities would be reduced to $\frac{1000000}{6!}$.  Imagine you needed to pick 6 women to come on a journey with you.  Someone informs you that the selection process imposed on you allows you to pick any particular order-important set of 6 women with equal chance $\frac{1}{1000000}$.  But you know that it doesn't matter to you in what order they get picked.  They're coming with you all the same.  In this case, you'd divide your million possibilities by 6!.  Thus $\frac{6!}{1000000}$ is the likelihood of ending up with any one particular unordered group of 6 women.  Here you're using the permutation formula to make order irrelevant.  By knowing the number of orderings implied in a strict ordering, you can use that knowledge to throw away the importance of ordering.
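Just to sanity-check the arithmetic above in code (a throwaway Python sketch of my own; the million is the made-up figure from the example, not anything derived):

```python
from math import factorial

orderings_of_six = factorial(6)        # 720 ways to arrange any fixed group of 6
ordered_selections = 1_000_000         # the hypothetical order-important count from above

# If order doesn't matter, the million ordered selections collapse into groups
unordered_groups = ordered_selections / orderings_of_six
chance_of_one_group = orderings_of_six / ordered_selections   # 6!/1000000

print(orderings_of_six)       # 720
print(unordered_groups)       # ~1388.9 (the round million is only illustrative)
print(chance_of_one_group)    # 0.00072
```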

Kramp chose the word 'factorial' because it sounded more French than its rivals.



Sunday 14 April 2013

The road not built

My main criticism of Hayek's book was his weakness in dealing with the possibility of political mixtures.  This also extends to the titular metaphor of the book itself.  The road metaphor - one road leading to serfdom, the other to liberalism - nicely captures the starkness of the choice Hayek wants to present.  A traveller faces a choice of which road to travel down.  But in the realm of ideas, the metaphor seems too restrictive.  Ideas can be freely constructed, built, developed.  It makes no sense for a traveller to build a road which is somehow a mixture of the collectivist road and the liberal road when the destination choices are the city of liberalism and the city of collectivism, but for ideas this kind of constraint is unjustified.  We can and do build as many alternative roads as we like.

His characterisation of the totalitarian destination is a great critique.  But it leaves out the pleasure some might find in becoming infantilised by the state.  I have heard even young Russians talk favourably about Stalin, which I find deeply worrying.  Likewise, his characterisation of the ideal of nineteenth century liberalism is intellectually bracing, but it becomes hard to see how a man with cystic fibrosis, just to take an example, could help feeling, even in a liberal polity, somehow infantilised and dependent on the largesse of others, and also perhaps somehow defeated by that culture.  And finally his refusal to talk about the endless variety of middle ways, those unbuilt roads and unimagined destinations, remains not a logical consequence of some line of reasoning but a rhetorical device in a well-intentioned struggle to place as much distance as possible between the society he respected and the totalitarian regime he despised.

Saturday 13 April 2013

Problem of Points - The Solution

The solution to the problem of points tells you how to divide up the stakes in a fair game (fair in the sense of each step's outcome being equally likely to favour either player) between two players A and B if A needs $n_A$ more wins and B needs $n_B$.  Pascal and Fermat both end up counting the set of all possibilities, comparing the respective counts, and arriving at the ratio of A's share to B's share as

$\sum_{k=0}^{n_B-1}\frac{(n_A+n_B-1)!}{k!(n_A+n_B-1-k)!}$ to $\sum_{k=n_B}^{n_A+n_B-1}\frac{(n_A+n_B-1)!}{k!(n_A+n_B-1-k)!}$.  

This rather ugly looking formulation is something I'll be looking at over the next couple of posts, in a way mathematicians usually don't.  Enamoured of Euclid, they think interesting maths involves proof and concise statement.  That does not work for me.  I want to unpack it and see some examples with real numbers.  Get a feel for it in use.  And after those posts, I'll be doing the same for gambler's ruin, which has prompted me to think a lot more generally than this solution to the problem of points did.

Before I finish this short post, I'd like to say that this solution to the problem of the division of stakes, if you think about it, is the price of your seat if somebody wanted to buy you out of the game.  This is the fair price of your seat at that moment, or the fair price of your position in the game.  And given that the moment in question can be analysed at any point in the game, including the moment before the game starts, it also represents an algorithm for working out the fair price of the game for both players, at all points.  That is, it tells you, at every moment in the game, the expected value of each player's position.

If the stakes have a total value of $S$ then the expected value of player A's position is

$S\frac{\sum_{k=0}^{n_B-1}\frac{(n_A+n_B-1)!}{k!(n_A+n_B-1-k)!}}{\sum_{k=0}^{n_B-1}\frac{(n_A+n_B-1)!}{k!(n_A+n_B-1-k)!} + \sum_{k=n_B}^{n_A+n_B-1}\frac{(n_A+n_B-1)!}{k!(n_A+n_B-1-k)!}}$

and the other player's expected value has the other sum in the numerator.
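Just to see it once with real numbers (my own small worked case, not one from the correspondence): suppose A needs $n_A = 1$ more win and B needs $n_B = 3$.  Then at most $n_A + n_B - 1 = 3$ further rounds settle the matter.  A's count is $\frac{3!}{0!3!} + \frac{3!}{1!2!} + \frac{3!}{2!1!} = 1 + 3 + 3 = 7$ and B's count is $\frac{3!}{3!0!} = 1$, so the stakes divide in the ratio 7 to 1 and A's expected value is $\frac{7}{8}S$.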

Sunday 7 April 2013

Mongrel ascendancy - how democracy can enshadow Hayekian freedom

Some more thoughts on Hayek's position on the extreme desirability of freedom.  Having read the Road to Serfdom I decided that the biggest weakness was in dealing with the possibility of mixed political systems.  Clearly he ought to have been aware that collectivist systems could survive for potentially quite a long time, at the very least.  That is, he accepts that they're not inherently and immediately unstable.  Given this, why is he so weak on the possibilities inherent in mixed styles, somewhat Hayekian, somewhat collectivist?  I pointed out he makes a William Paley-like blunder in alleging the necessary starkness of this choice by the populace, assuming of course the populace have been afforded a voice.

I'd like to take a leaf out of Hayek's book on argumentative style and describe a possible society where collectivism wins every time at the ballot box, and Hayek would need to decide whether democracy is allowed to win in that case, or whether he can find a way for his form of freedom to win.

The set-up actually contains two independent societies, and in each case the electorate always votes for collectivism.  Society one is what I'll call the post-liberal society.  Imagine a society where a large majority of people support a welfare state, where the government runs large fractions of industry, where a persistent and recalcitrant underclass are politically alienated, dulled, and distracted by other parts of their culture.  Where even those who are doing well depend on the state for education, health of poorer family members, support for the many times in their careers that they lose their job.  For the purposes of the argument, I'm agnostic on how the culture got itself into that state (i.e. whether this was the effect, as Hayek might argue, of creeping collectivism's insidious effect on human culture, or whether this was the effect of a cruel and unequal capitalism driven by corporate cabals expropriating increasing fractions of output in the name of capital, leaving labour less secure) or even whether this state is necessarily a bad thing.  That is, despite how I described it, I am happy to remain morally neutral on the desirability of this culture for the purposes of this argument.  Now imagine a range of political parties which vie for the votes of the masses.  One subset of those parties are pro-Hayek and another subset pro-collectivism.  It is entirely possible that for centuries, perhaps even for millennia, the populace would vote consistently for a party from the collectivist subset.  Despite all arguments made by the pro-Hayekian subset.  This was likely on Hayek's mind when he wrote the Road to Serfdom.  Democracy will have defeated that form of nineteenth century freedom Hayek prefers, and if, by hypothesis, those parties fail to persuade, the idea of Hayekian freedom is indefinitely defeated.

OK.  Now society two.  A society of Hayekian tactical voters.  Let's imagine that a majority of them are in a similar position of government dependence to society one, but that they're all politically engaged.  And, by hypothesis, they all thought about it and decided that, in their own heads, they accepted Hayek's point about the corrosive effect on freedom that this kind of collectivism imposes.  With the extra wrinkle that, from a purely self-interested point of view, they continue indefinitely to vote for one or other of the collectivist parties.  The Hayekians made all the right arguments, won over all the hearts and minds, and still they chose to vote for the status quo due to a particular brand of rationalist calculation which weighed the cost of the transition to the new order as too expensive relative to their present state.  Perhaps their utility function has a very steep discount curve which dramatically devalues future worth, who knows.  This is either logically possible to imagine, or not.  I guess the Hayekian would argue that it was logically impossible, but I'm not so sure.  In any case, what else could the committed Hayekian do - he's won all of the arguments but finds the self-interest of the masses producing a series of collectivist variants of political parties which, for generations, retain power and replicate the status quo.
Real Hayekians cannot ban collectivist parties.  Imagine Norway, year 2200.  Several generations of superb sovereign wealth fund investment decisions followed by a series of further discoveries of commodity resources just offshore have effectively allowed Norwegians to idle in relative comfort on the support of the state.  The state in turn has amassed enough capital to ensure a fairly decent standard of living for non-workers.  They lock down their borders.  Those who don't like it either leave or somehow manage to run or work in private enterprises.  The population growth is manageable.  Norwegian industry becomes hopelessly unproductive when compared with other nations.  They become, in effect, a rent-seeking nation living off the diversified capital owned by the state.  Every Norwegian party which gets to power promises to maintain this status quo, among other things.  All the Hayekian Norwegian parties win the populace over, in theory, concerning the long-term benefits of freedom.  But Norwegians become Augustinian in their love of that freedom - endlessly deferring to a time in the future when they feel ready to switch, but never switching.

Another point is missed by Hayek in his insistence that no political regime can be somewhat Hayekian, somewhat collectivist: multi-party democracies in which the current incumbents on occasion get ejected produce a series of striped political regimes - Hayekian, then collectivist, then Hayekian, then collectivist.  This too is, in a sense, a mixed political regime.  The legislature will be similarly striped by both forms of policy approach.  The various government departments likewise will exhibit battle scars indicating the endless flip-flopping into and out of collectivist and minimal-government regimes.  Assume Hayek is right.  Then this regime switching is necessarily less efficient than a permanently Hayekian one.  Assume the social democrats are right.  Likewise this regime switching is necessarily less efficient than a permanently social democratic one.  The desired pattern of behaviour of democracy, namely that incumbents occasionally get defenestrated, would then, no matter who was right, result in less efficient societies.  Unless, of course, the most efficient regimes were mixed regimes.

Where do political parties come from, anyway?  In particular, where does their policy variability come from?  Imagine a Hayek victory.  A party in power which implements Hayekian hegemony.  Collectivist parties lose their voter base.  For democracy to survive, the likely outcome is a series of similar, but somewhat different, generally Hayekian parties.  But I would argue that you can rank all the broadly Hayekian parties which thrive and survive in this hypothetical society with respect to the degree of collectivism implicit in their manifestos.  Hayek himself saw a wide and important role for the state.  A family of Hayekian parties surely would vary in ways which could be characterised as more or less government controlled.  More or less collectivist.  Even in Hayek's wildest dreams, surely he has to face up to mixed regimes and the bare possibility that they may be better?

Thursday 4 April 2013

The road not taken

I've been reading Hayek's Road to Serfdom this Easter and hugely enjoyed it, despite a couple of significant holes in the argumentative structure.

He does a great job bringing out the positive aspects of maintaining a liberal political structure in the face of totalitarian regimes.  And I mean great.  But he is incredibly weak on the possibility of what he refers to as a Middle Way between a liberal and a planned political economy.  He gives a corking definition of a liberal social order as follows: "..in the ordering of our affairs we should make as much use as possible of the spontaneous forces of society, and resort as little as possible to coercion".  This is a great definition and works really well if you are confident in the constancy and benign effects of those spontaneous forces (benign in the sense that their benefits outweigh the benefits of benign planning).

He traces the intellectual roots of this liberal view of his through Cicero, Tacitus, Montaigne, Hobbes (implicitly), Hume, Locke, Henry Sidgwick, John Stuart Mill.  But I also read him as a pessimist, in line with Schopenhauer, who lines up in direct opposition to all things Hegelian.  This is somewhat ironic given his view that liberalism died when it hit Germany.  The other ironic point here is that he tries to come up with a reason why the collectivist socialism which was born, he claims, in Germany should have had such an effect.  He says that the ideas were ".. supported .. by the great material progress of Germany .." among other things.  But he doesn't chase this potential connection any further - perhaps the ideas themselves stimulated a successful form of post-liberal social democratic capitalism which worked quite well.  This is all apposite in the current (2013) European crisis, I think, insofar as that particular form of German capitalism which took such a hit in the 1940s and 1950s is once again demonstrating its ongoing robustness.  At some point, if not now already, this evidence of economic success may justifiably cause a re-valuation of the forms of post-liberal social democracy and stakeholder capitalism they represent.

Whereas Hayek tries his best to demolish the alleged benefits of centralised planning by criticising even idealised forms of it, you can see how Buchanan later comes in to point out the detail of the painful reality of central planning with public choice theory.  But Hayek's generalised arguments against the planning function are quite rich and varied.

He resists a movement in the meaning of 'freedom' away from freedom from "..the arbitrary power of other men" towards "freedom from necessity".  This is the essence of his resistance, and it pits him as Schopenhauer against Bismarckian, Hegelian Prussia.  The social democrats see it as an evolution up the hierarchy of wants, but Hayek sees it as a wrong turn.  For him it is a loss of power for you - the granting to another of the power to order a re-distribution of your wealth for an end decided externally to you.  His is a literal and metaphorical refusal to travel down a Hegelian road which he thinks always (inevitably, I'm tempted to say, only it would sound too Hegelian) leads to a bad destination.

His key line: "..competition ...cannot be combined with planning to any extent we like without ceasing to operate as an effective guide to production".  This is how he argues against the 'middle way'.  Just for now, leave out 'to any extent we like' and you have a statement which is plainly wrong.  How did liberalism evolve in the first place?  And in the dozens of places where it did evolve, surely it managed precisely this combination.  It kind of reminds me of William Paley's famously wrong argument against evolution, with its metaphor of the perfection of the eye's design.  I think Hayek sees liberalism a bit like this.  Now take the phrase I left to one side, 'to any extent we like'.  Well, when you add that back in, the claim becomes considerably weaker, merely stating that some limits must exist for the combination of liberalism and central planning you're now considering.

On that same page as the above quote, he suggests the two approaches are "..poor and inefficient tools if they are incomplete", a classic Paley-like comment.  He goes on: "they are alternative principles used to solve the same problem, and a mixture of the two means that neither will really work and that the result will be worse than if either system had been consistently relied on".  This is, it seems to me, utterly unjustified by him anywhere.

The book draws a wonderful contrast out of this artificially black and white choice, and does so at precisely the moment in recent world history when the blackness and the whiteness seemed most justified, mid-way through the second world war.

In chapter 4, he does a decent job of knocking out collectivist arguments by debunking the so-called inevitability of the breakdown of competition as capitalism evolves through accumulated technological progress.  I think it is one of his strongest chapters.

He's on shakier grounds in dealing with planning and democracy, in chapter 5.  He bemoans the vagueness of collectivist political ends whilst failing to notice precisely the same vagueness of a Sidgwickian utilitarianism.  He thinks he can see a withering or failure of a complete ethical code, something he says the collectivists need, but one argument for this withering is a reduction in the number and generality of ethical rules.  But their number and generality are neither here nor there.  What matters surely is the quantity of their real effect.  His pessimism on our own natural tendency to understand and favour our own kind, our own community leads him to happily abandon hope of any kind of collectivist planning, but he's happy to see the possibility of a liberal-inspired international federated political order.  Though, to be fair to him, he does distinguish between the kind of economic planning collectivists seek and the liberal 'negative' Rule of Law based planning which sees the creation of international federated political institutions.

For him the Rule of Law ".. means that government in all its actions is bound by rules fixed and announced beforehand".  I could imagine middle-way social democrats happily signing up to this with no fear of logical contradiction.  However, when he claims that collectivist planning ".. necessarily involves deliberate discrimination between particular needs of different people, and allowing one man to do what another must be prevented from doing", my first thought was that this is what a market also does.

He's back on form in the chapter on economic control and totalitarianism, which contains a large number of on-target criticisms of some of the collectivists' favourite arguments.

A key point in his chapter 'Who, whom?' is the choice he thinks we have to make between two systems of who does the planning of whom: either a small set of planners, or a combination of our own individual enterprise together with a large dose of randomness.  But the success of individual enterprise is itself a function, partly, of capital, so we're right back to a small set of lucky planners.  And surely, at the macroeconomic level, if you actively choose option 2, namely randomness and enterprise, then by institutionalising randomness like this we're forcing it to remain forever random.  Spelled out, it is like saying: do not try to fathom the business cycle, as any attempt to understand it with a view to mitigating it is necessarily impossible.  This stationary position didn't win him too many friends going into the great depression, it is true, but it chimes with his generally much more pessimistic view of the limits of human reasoning.  He spells out the likely psychological and economic consequences of a planned collectivist society well here too.  However, having argued in favour of randomness and enterprise picking winners, by chapter 9, on security, he's arguing that mitigating business cycles via the good kind of planning is something worth doing.  However he's withering on other forms of economic security or protectionism, and he tackles these by showing their unintended consequences, which are, he believes, self-defeating in the long run.  This Schumpeterian view does strike the modern reader as harsh.  Ignatieff's 'The Needs of Strangers' is an old favourite of mine and makes a decent case for a degree of economic security for citizens.

As I've pointed out, Hayek doesn't deliver knock-out arguments against a third way between liberalism and a splash of collectivist planning, something which seems a priori worth much deeper investigation by him.

Secondly, his anti-Hegelian spirit may have left him with too static a view of the ideal of liberalism.  Surely he ought to be more pragmatic in the Rortian sense of avoiding such a historically rooted final vocabulary?  

Third, he appears to be peculiarly anti-fractal in his implicit raising of the nation state to a level of importance which he doesn't justify.  He clearly loves the individual and sees societies of individuals working best when there's no centralisation or concentration of power.  And this reasoning he extends to nation states, then finally to supra-national federations.  But why those boundaries?  Couldn't he work 'inwards' to fiefdoms and cliques below the level of nation states?  He talks often of collections of 'small nations' and perhaps this is implicit in his position - the deconstruction of the larger states into many more mini-states and state-lets.  But would he prefer his aggregated collections of humans to reflect precisely the same kind of liberalism?  Or variants?  Or other forms, including totalitarianism?  How does the variance of political structure compare to the variance amongst real individuals' motivations?  And on a galactic scale?  What if planet earth were one instance of a form of political life, set against a whole universe of alternative forms?  Wouldn't nation state-hood seem somewhat parochial in this context?  You certainly need to see nineteenth century British liberalism in a historical context.  Seeing it as such of course doesn't force you to abandon liberal principles, but you may need to argue harder for them.

But for all that, Hayek's message is powerful - we are often tempted by the social utopia of collectivism, but it pays us to look beyond our emotional response to the desirability of these utopias to the unintended consequences they may bring.  This scepticism we must keep, but balanced with optimism and what Rorty called solidarity.

Saturday 30 March 2013

The absent-minded psychic

Neo-classical economics makes dramatically different assumptions about how the head of an economic agent works when looking into the future than when looking into the past.  This looks like mathematical expediency to me.  Looking backward, the efficient market hypothesis says that there's no point looking at anything beyond the most recent price, since that market price fully reflects all of the information available to rational agents.  This fits nicely with Markov model mathematics, with memoryless sources of randomness.  On the other hand, when looking into the future, rational expectations theory suggests that the agent has perfect (in the sense of being as good as it could ever be) foresight of the downstream consequences of hypothesised economic choices.  This fits nicely with optimisation mathematics, calculus, iterated game theory.  Insofar as, for such agents, everything is a market within which they make long-term, equilibrium-oriented decisions, they can operate with no memory and ideal foresight.
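A toy illustration of the 'memoryless' half of this, sketched by me rather than taken from any textbook: for a simple random-walk price, the best forecast of tomorrow's price, given the entire history, is just today's price.

```python
import random

random.seed(1)

def random_walk(n_steps, start=100):
    """A toy 'price' that moves up or down by 1 each day with equal chance."""
    prices = [start]
    for _ in range(n_steps):
        prices.append(prices[-1] + random.choice([-1, 1]))
    return prices

# Many independent 51-day histories
paths = [random_walk(51) for _ in range(50_000)]

# Condition on day 50's price sitting at some level and average day 51's price.
# The route taken to get there adds nothing: the average sits close to the level itself.
level = 100
tomorrows = [p[51] for p in paths if p[50] == level]
print(len(tomorrows), sum(tomorrows) / len(tomorrows))
```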

Monday 25 March 2013

The patron saint of quants


Pascal was a competent but by no means brilliant mathematician.  He designed and set into production a mechanical computer.  He leveraged off the back of superior mathematical talent.  He is said to have invented decision theory with his wager, giving birth to a whole industry, via Daniel Bernoulli, Bentham, Samuelson down to Black, which inappropriately attempted to extend the concept of fair value calculation way beyond any practicable remit.  Pascal briefly hung around with wealthy gamblers who used his intellectual horsepower to help enrich themselves.  He quit his game while still relatively young, with more than a few regrets.  Pascal was the patron saint of quants.

Probability preferences : the source of randomness is not the game

Fermat's version of the solution to the problem of points was to create a grid of possibilities reaching fully out to the point beyond which no doubt could exist as to who the winner would be.  This grid of possibilities included parts of the tree which, on one view, would be utterly irrelevant to the game in hand, and on another view, incorrectly modelled the set of possibilities embedded in the game.

Pascal's solution, by way of contrast, was a ragged tree of possibilities stretching out along each branch only as far as was needed to resolve the state of the game in question, and no further.

Pascal additionally made the mistake, in interpreting Fermat's solution, of ignoring order when tossing three dice/coins and in this mis-interpretation came up with an answer in the case of three players which diverged from his own reverse recursive solution based on the principle of fair treatment at each node of his ragged tree.

Because Pascal's wrong-headed idea of Fermat's solution did not match his own, he jumped to the conclusion that what must be wrong in Fermat's method was the extension of the tree of possibilities beyond those parts which the game in hand required.  Pascal consulted Roberval on the likely legitimacy of this fully rolled out tree of possibilities and Roberval seems to have told Pascal that this is where Fermat is going wrong, namely that this 'false assumption' of theoretical play of zombie-games leads to bad results.  It doesn't.

The evolution in time of a source of randomness was seen clearly by Fermat as separate from the rule, game or activity sitting on top of it.  In this case the game was 'first to get N wins'.  Modern derivatives pricing, when tree-based methods are used, applies this same move.  First the random process's set of possibilities is evolved on a lower, supporting layer, then the payoff of the contract is worked out at the terminal time horizon.  Both in De Mere's game and with an option, there's a clearly defined termination point.  With De Mere's game, the point happens when the first player reaches N wins.  With options, the termination point is the expiry of the option.  Gambler's ruin, as I'll discuss later, doesn't have such a straightforward termination point.  So step 1 is to lay out all the possible states from now to the termination point, the tree of possibilities for the stochastic process.  Then you work out the terminal value of the contract or game and use Pascal's fairness criterion to crawl back up the second tree, until you reach the 'now' point, which gives you the fair value of the contract.  This is the essence of the finite difference approach, and it works for path-dependent and path-independent pricings.  An implication of this game's structure is that the tree is re-combinant, which means the binomial coefficients become relevant when working out how many paths lead to each node, and hence the probability of reaching it.
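Here is a minimal sketch of that two-layer move for a modern contract, assuming a standard Cox-Ross-Rubinstein binomial tree and a European call payoff; the numbers at the bottom are purely illustrative and nothing here is specific to the game in the letters.

```python
import math

def crr_call_price(spot, strike, rate, vol, expiry, steps):
    """European call priced on a recombinant binomial (CRR) tree.

    Layer 1: evolve the stock's tree of possibilities out to the horizon.
    Layer 2: apply the payoff at the terminal nodes, then crawl back to
    'now', treating each node as the discounted fair (expected) value of
    its two children.
    """
    dt = expiry / steps
    u = math.exp(vol * math.sqrt(dt))          # up move
    d = 1 / u                                  # down move
    p = (math.exp(rate * dt) - d) / (u - d)    # risk-neutral up probability
    disc = math.exp(-rate * dt)

    # Terminal stock prices and their payoffs
    values = [max(spot * u**j * d**(steps - j) - strike, 0.0)
              for j in range(steps + 1)]

    # Backward induction: fairness at each node, all the way back to 'now'
    for _ in range(steps):
        values = [disc * (p * values[j + 1] + (1 - p) * values[j])
                  for j in range(len(values) - 1)]
    return values[0]

print(crr_call_price(spot=100, strike=100, rate=0.02, vol=0.2, expiry=1.0, steps=200))
```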

Fermat had a clearer and earlier conception of this separation.  But Roberval and Pascal were right to flag the move - what grounds did Fermat give for it?  In modern parlance, we can see that the stochastic process, often a stock price or a spot FX rate or a tradeable rate, is independently observable in the market.  But back then, Pascal was struggling to separate the game from the source of randomness.  F. N. David suggests that Pascal sets Roberval up as the disbeliever as a distancing mechanism for his own failure to grasp this point.  Likewise, David suggests perhaps Pascal only solved his side of the problem after initial prompting from Fermat, in a letter which starts off the correspondence but which unfortunately no longer exists.

Of course, this isn't just a solution of an unfinished game, but the fair value of the game at any point during its life.  Each author I read seems clear in his mind that one or other of the great mathematicians' solutions is to be preferred.  Is this just ignorance, aesthetic preference masquerading as informed opinion?  Yes, largely.  But my own opinion is that both solutions share many similarities - both need to evolve a tree of possibilities, a binary tree, for which the binomial coefficients come in handy as the number of steps increases.  Both then involve evaluating the state of the game at the fixed and known horizon point.  Fermat's tree is a set of possibilities of a stochastic process.  His solution takes place exclusively at that final set of terminal nodes, working out the ratio of the set of nodes in which player A is the winner over the total set of terminal nodes.  Pascal's tree is the tree of game states.  He reasons in a reverse iterative way until he reaches the start point, and the start point gives him his final answer.  The arithmetic triangle could help both these men build their trees as the number of steps increases.
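A small sketch, with illustrative numbers of my own choosing, of the claim that the two approaches agree: Fermat counts favourable terminal nodes of the fully rolled out tree, Pascal applies the fairness rule backwards over game states, and both give player A the same share.

```python
from fractions import Fraction
from functools import lru_cache
from math import comb

def fermat_share(n_a, n_b):
    """A's share by counting terminal nodes of the fully rolled out tree.

    Play out all n_a + n_b - 1 remaining (possibly zombie) rounds; A takes
    the stakes in exactly those sequences where B collects fewer than n_b wins.
    """
    r = n_a + n_b - 1
    favourable = sum(comb(r, k) for k in range(n_b))   # k = number of B wins
    return Fraction(favourable, 2 ** r)

@lru_cache(maxsize=None)
def pascal_share(n_a, n_b):
    """A's share by backward recursion on game states (fair treatment at each node)."""
    if n_a == 0:
        return Fraction(1)   # A has already won
    if n_b == 0:
        return Fraction(0)   # B has already won
    # Each round is a fair step, so a state is worth the average of its two children
    return (pascal_share(n_a - 1, n_b) + pascal_share(n_a, n_b - 1)) / 2

for n_a, n_b in [(1, 3), (2, 3), (4, 2)]:
    assert fermat_share(n_a, n_b) == pascal_share(n_a, n_b)
    print(n_a, n_b, fermat_share(n_a, n_b))
```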

Friday 22 March 2013

Probability preferences : expectation is secondary

I didn't realise counting was so important to the theory of probability.  First you have the simplified sub-case where all $N$ mutually exclusive outcomes are equally likely, in which case you can use combinatorics to calculate probabilities.  Combinatorics being just power tools for counting.  In effect the move is to give each of the $N$ outcomes the same probability $\frac{1}{N}$, so that probability questions map onto the natural numbers.  Then comparing probability areas becomes a question of counting elementary outcomes of the sample space.

Second, even in the case of a general (non equi-probable) distribution, you can look at the set of outcomes themselves and map them to a series of numbers on the real line (or the integers).  So say you have a die with six images on its faces.  You could map those images to six numbers.  In fact, dice normally come with this 1-to-6 mapping additionally etched onto each of the faces.  The move from odds format to ratio-of-unity format that we see in probability theory is crying out for a second number, representing some kind of value, perhaps a fair value, associated with some game or contract or activity.  In other words, now that we've partitioned the sample space into mutually exclusive outcome weights, let's look at finding numerical values associated with the various states.  When it comes to pricing a financial contract which has an element of randomness in it (usually a function of some company's stock price, which serves nicely as such a source), a careful reading of the prospectus of the derived instrument ought to let you cash it out in terms of a future value, given any particular level of the stock.
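A tiny sketch of that pairing of weights and values, with numbers invented for the purpose: map each elementary outcome to a payoff, then weight each payoff by its probability.

```python
from fractions import Fraction

# Probabilities for the six faces (an arbitrary, non equi-probable example)
probs = {1: Fraction(1, 6), 2: Fraction(1, 6), 3: Fraction(1, 6),
         4: Fraction(1, 6), 5: Fraction(1, 12), 6: Fraction(1, 4)}

# A made-up 'contract' that pays out according to which face shows
payoff = {1: 0, 2: 0, 3: 0, 4: 10, 5: 10, 6: 50}

# The fair value of the contract before the roll: weight each payoff by its probability
expected_value = sum(probs[face] * payoff[face] for face in probs)
print(expected_value)   # 15
```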

I've seen Pascal's wager claimed to be the first use of expectation in a founding moment for decision theory.  By the way, that's a poorly constructed wager since it doesn't present value the infinite benefit of God's love. That could make a dramatic difference to the choices made.  Anyway, Huygens himself wrote about expectations in his probability book, but for me, the warm seat problem (the problem of points) represents an attempt to find a mean future value starting from now during a game.  This is an expectation calculation, even though the word may not have been used in this context.

Thursday 21 March 2013

Warm Seat

I am really rather pleased with my reading of the history of the theory of probability.  Four points struck me about it, firstly that Cardano has a much stronger claim than the authors of histories of probability give him credit for.  Second that Pascal was wrong in criticising Fermat's combinatorial approach in the case of more than two players in the problem of points and that his mistake was an equivalence class / ordering misunderstanding about the reading of three thrown dice.  Third, that Pascal's solution is a bit like using dynamic hedging for an exotic option (one which doesn't exist yet, but which I'll call a one-touch upswing option).  And fourth, that Huygens's gambler's ruin can be made into a problem of points by using participant stakes and separately some tokens which are transferred from the loser to the winner after each throw.  On the last three of these points Todhunter and the authors Shafer and Vovk agree with me, variously.

A better name for the problem of points is the warm seat price.  Both the original first-to-six game and gambler's ruin with plastic tokens and stakes can be seen as specific games for which there's a warm seat price - the fair value of the game for a participant who wanted to get out of the game immediately.  Gambler's ruin, though, doesn't have a definite future time at which it will with certainty be known who the winner is.

It is also, amusingly, my warm seat moment, since I didn't discover anything myself, but followed in other people's footsteps, and have experienced the warm seat of discoveries others had made before me.

Wednesday 20 March 2013

Probability preferences: the irrelevance of parallel/sequential distinction

In a sense, whether you throw one die sequentially n times to get a $6^n$ event space, or whether you simultaneously toss n distinguishable dice at one time, it doesn't matter.  As long as you read your results in a way which preserves the identity of the die on which each number appears.  I'll leave off talking about what implication this has for the famous Pascal-Fermat problem of points until a later posting.  For now, consider what this means for the classic repeated experiment in probability theory.  If the events are genuinely independent, then it doesn't matter what relative time it is when you toss each one.  The law of large numbers could equally well be satisfied by a single massively parallel experiment in, say, tossing coins as it is by tossing one coin sequentially n times.
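A throwaway simulation of that equivalence (my own sketch): n sequential tosses of one coin and one parallel batch of n coins produce the same proportion of heads, and indeed the code for the two cases is computationally identical, which is rather the point.

```python
import random

random.seed(0)
n = 100_000

# 'Sequential' experiment: one coin tossed n times, one after the other
sequential_heads = sum(random.randint(0, 1) for _ in range(n))

# 'Parallel' experiment: n distinguishable coins tossed all at once,
# read off in a way which preserves each coin's identity
parallel_heads = sum(random.randint(0, 1) for _ in range(n))

print(sequential_heads / n, parallel_heads / n)   # both hover around 0.5
```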

Likewise in set theory, there's a curious atemporality to Venn diagrams, and to discussions of the joint probability $A \cap B$, which is of course not the same as 'A then B'.  Even with Bayes' theorem it is important to realise that the 'given' in $A|B$ refers to our knowledge of the occurrence of B, not to B happening first and A happening subsequently.

Tuesday 19 March 2013

Probability Preferences : Independence is primary, multiple random sources secondary

I have already talked about the absolute importance of the idea of mutual exclusivity, or disjointness, to probability theory and how it enables the addition of probabilities.  I'd now like to chat about independence.  Remember I said that pairwise disjoint sets are absolutely dependent, in the sense that knowing one happened tells you everything you need to know about whether the other happened.  Note that the converse does not hold.  That is, you can also have absolutely dependent events which are nevertheless not mutually exclusive.  I will give three examples, though of course the classic example of independence is two (or more) separate randomisation machines in operation.

Take a die.  Give the six faces six different colours.  Then give the faces six separate figurative etchings.  Then add six separate signatures to the faces.  When you roll this die and are told it landed red face up, you know with certainty which etching landed face up, and which signature is on that face.  But those three events are not mutually exclusive.

Take another die, with the traditional pips.  Event E1 is the tossing of an even number.  Event E2 is the tossing of a 1, 2, 3 or 4.  $P(E1)=\frac{1}{2}$ and $P(E2)=\frac{2}{3}$.  The event $E1 \cap E2$ is satisfied only by throwing a 2 or a 4, and so $P(E1 \cap E2) = \frac{1}{3}$.  This means, weirdly, that E1 and E2 are considered independent, since knowing that one occurred didn't change your best guess of the likelihood of the other.  The events are independent within the toss of a single randomisation machine.
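A quick brute-force check of that single-die example, sketched in Python:

```python
from fractions import Fraction

faces = range(1, 7)
E1 = {f for f in faces if f % 2 == 0}   # an even number
E2 = {1, 2, 3, 4}                       # a one, two, three or four

def prob(event):
    # Equi-probable faces, so a probability is just a count over 6
    return Fraction(len(event), 6)

print(prob(E1), prob(E2), prob(E1 & E2))        # 1/2, 2/3, 1/3
print(prob(E1 & E2) == prob(E1) * prob(E2))     # True: independent on a single die
```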

In a previous posting, I mentioned having 52 cards strung out with 52 people, and when someone decides, they pick up a card, and in that act, disable that possibility for the 51 others.  This system is mutually exclusive.  You can create independence by splitting the audio link into two channels.  The independence of the channels creates the independent pair of randomisation machines.

As the second example hinted, independence means $P(E1 \cap E2) = P(E1) \times P(E2)$.  The most obvious way in which this can happen over one or more randomisation machines is for it to happen over two machines, where E1 can only happen as an outcome of machine 1 and E2 as an outcome of machine 2.  This is what you might call segregated independence - all the ways E1 can be realised happen to be on randomisation machine 1 and all the ways E2 can be realised are on a second randomisation machine.  The second example above could be called technical independence.

As the single randomisation machine becomes more complex - 12 faces instead of 6, then 24 faces, 1000 faces, a countably large number of faces - it becomes clear that independence of a rich kind is entirely possible with just one source of randomness.  Another way of saying this is that multiple sources of randomness are just one way, albeit the most obvious way, of achieving independence.  Hence my relegating that idea to the second tier of importance.

One gambler wiped out, the other withdraws his interest

In so far as odds are products of a bookmaker, they reflect not true chances but bookie-hedged or risk-neutral odds.  So right at the birth of probability theory you had a move from risk-neutral odds to risk-neutral slices, in the sense of dividing up a pie.  The odds, remember, reflect the betting action, not directly the likelihood of the respective outcomes.  If there's heavy betting in one direction, then the odds (and the corresponding probability distribution) will reflect it, regardless of any participant's own opinion on the real probabilities.  Those subjective assessments of the real likelihood start, at their most general, as a set of prior subjective probability models in each interested party's head.  Ongoing revelation of information may adjust that probability distribution.  If the event being betted on is purely random (that is, with no strategic element, a distinction Cardano made), then one or more participants might correctly model the situation in a way which is as good as they'll ever want, that is, immune to new information.  For example, the relative occurrence, when rolling two dice, of pips summing to 10 versus pips summing to 9 is the basis of a game where an interested party may well hit upon the theoretical outcomes implied by Cardano and others, and would stick with that model.

Another way of putting this is to say that probability theory only coincidentally cares about correspondence to reality.  This extra property of a probability distribution over a sample space is not in any way essential.  In other words, the fair value of these games, or the set of actual likelihoods, is just one probability distribution out of the infinitely many possible for the game.

Yet another way of putting this is to say that the core of the theory of probability didn't need to require the analysis of the fair odds of a game.  The discoverers ought to have been familiar with bookies' odds and how they may differ from likely-outcome odds.  Their move was in switching from hedge odds of "a to b" to hedge probabilities of $\frac{b}{a+b}$.  That they did bind this up with a search for fair odds is no doubt partly due to the history of the idea of a fair price, dating back in the Christian tradition at least as far as Saint Thomas Aquinas.

Imagine two players, Pascal and Fermat, playing a coin tossing game.  They both arrive with equal bags of coins which represent their two wagers.  They hand these wagers to the organisers, who take care of the pair of wagers.  Imagine they each come with 6,000,000 USD.  The organisers hand out six tokens each, made of plastic and otherwise identical looking.  Then the coin is brought out.  Everyone knows that the coin will be very slightly biased, but only the organisers know precisely to what degree, or whether towards heads or tails.  The game is simple.  Player 1 is the heads player, player 2 tails.  Player 1 starts.  He tosses the coin.  If it is heads, he takes one of his opponent's plastic tokens and puts it in his pile.  If that happens, he has 7 tokens to his opponent's 6.  If it is tails, he surrenders one of his tokens to his opponent.  Then the opponent takes his turn, collecting on tails and paying out on heads.  The game ends when the winner has all 12 tokens and the loser has 0.  The winner keeps the 12,000,000 USD, a tidy 100% profit for an afternoon's work.  The loser has just lost 6,000,000 USD.  Each player can quit the game at any point.

Meanwhile this game is televised and on the internet.  There are 15 major independent betting cartels around the world taking bets on the game.  In each of these geographic regions, the betting is radically different, leading to 15 sets of odds on a Pascal or a Fermat victory.

Totally independently of those 15 betting cartels, there are a further 15 betting cartels which have an inside bet on, one which pays out if you guessed who would see 6 wins first, not necessarily in a row.

Now this second game is nested inside the first, since you can't finish the first game unless you have collected at least 6 wins along the way.  Pascal and Fermat don't know or care about the inner game.  They're battling it out for total ownership of the tokens, at which point their game ends.  The second set of betting cartels is guaranteed to see its bet settled in at most 11 tosses every time (at worst both players stand at 5 wins each after 10 tosses, and the 11th toss decides it), and possibly in as few as 6 tosses.

Just by coincidence, Fermat, player 1, gets 4 heads in a row, bringing him to 10 of the 12 tokens needed for total ownership.  He only needs 2 more heads to win.  At this point Pascal decides to quit the game.  To bettors in the first set of cartels it looks like Pascal and Fermat are playing gambler's ruin; to the second set it looks like they're playing 'first to get six wins', which is the game the real Pascal and Fermat analyse in their famous letters.

Soon after, Pascal's religious conversion wipes out his gambling dalliance, and Fermat, only partly engaged with this problem, withdraws his interest.  Both men metaphorically enacting gambler's ruin and the problem of points.

Monday 18 March 2013

Probability Preferences: Conjunction and Disjunction are primary, Disjunction more so

Making addition work with sets is the heart of probability theory.  Sure, a probability was really just a way of re-expressing odds, and that had been known about for ages before Cardano.  Odds of n : m means that the event has probability $\frac{n}{n+m}$, which allows you to work out a weighting.  But apart from nailing those numbers down to the range 0 to 1, the primary basic rule of probability can be thought of as specifying the conditions under which disjunction works at its most simple.  That is, it lays out what must be true about A and B to allow you to say that $P(A \cup B) = P(A) + P(B)$.  Set theory, of course, has its own history and life outside of probability theory, but probability theory becomes parasitic on set theory insofar as the general descriptions of events use the language of set theory.  In set theory, events are said to be disjoint when there's no possibility of overlap.  The events have their own autonomous standalone identities, so applying the union operator allows for no possibility of double counting.  If you define a number of events $A_i$ and you want to state that there's no overlap anywhere, you say they're pairwise disjoint, meaning that each and every pair from that list is disjoint.  We sometimes say the events $A_i$ are mutually exclusive.  What we don't say often enough is that this means they're utterly dependent on each other.  The occurrence of one particular of the $A_i$ disjoint events, say $A_{12}$, tells you certainly that all the other events did not happen.  This is complete dependence and it isn't obvious just by looking at the corresponding Venn diagram.  If you have a full house, so to speak, of events such that $A_1 \cup A_2 \cup \dots = S$, the entire sample space of possibility, then you've fully specified a probability model and can say $P(\bigcup_{i=1}^\infty A_i) = \sum_{i=1}^\infty P(A_i)$.  With this single condition, the addition of probabilities is born.  It is a very constrained sort of addition, to be sure, since no matter how many disjoint events in your experiment, even an infinite number of them, your sum of probabilities (all those additions) will never result in a number greater than 1.
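To make the addition rule concrete with a small example of my own: take a fair die and let A be 'an even number of pips' and B be 'exactly one pip'.  These are disjoint, so $P(A \cup B) = P(A) + P(B) = \frac{1}{2} + \frac{1}{6} = \frac{2}{3}$, and however many further disjoint events you add in, the running total can never exceed 1.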

Examples of these utterly dependent events: rolling a die and getting 'red face up' (with a red-orange-yellow-green-blue-indigo die), tossing a coin and watching it fall heads-up, selecting the six of clubs by picking randomly from a pack of 52 playing cards, rolling a traditional pair of dice and getting a pair of sixes, rolling a die and getting an even (not an odd) number of pips face up.

With dice and coins, notice that it is in spinning them that we rely on their shape to select precisely one of n possibilities.  We initiate the random event and a separate object's physical shape guarantees the one of n result.  With picking a card, we initiate the random event but it is additionally in our act of selecting that we guarantee leaving the remaining 51 cards unturned.  Strictly speaking, the randomising action has already happened with the cards, when they were presumably shuffled thoroughly.  The shuffle, the toss, the flip.  These are the randomising acts.  With the toss and the flip, imagine the viewer closes his eyes on the toss and the flip.  Then he's in the same uncertain state as the person about to pick a card from a shuffled deck.

Notice you can fully specify a probability model with a complete set of pairwise disjoint events even if the events in question aren't elementary - the example I gave above is rolling a die and getting an odd or an even number.

If I gave half the playing cards to one person and the other half to another person, perhaps in a different room, and if there was no form of communication possible between them, then we wouldn't have pairwise disjoint events across all 52 cards.  We'd have a pair of 26-card sets of pairwise disjoint events, each of which was independent of the other.  Imagine if I gave one card each to 52 different people, in different countries.  Imagine further that I told them they could turn over their one card whenever they wanted, and as soon as they did so, to press a buzzer which had the effect of disabling the remaining 51 cards so that they could not be turned over.  Ignoring messy practical reality here, there's no shuffle.  There's no natural sort order which could be applied to the geographical distribution of the people and cards, so no sense of working out whether they were randomly distributed in space.  Still, the buzzer and disabling devices make this a coherent, utterly dependent trial.

This requirement which allows simple addition of probabilities has implications for the randomisation machine - if it is to co-ordinate precisely 1 of n outcomes, then all n outcomes must be co-ordinated or constrained by someone or something.

Conjunction, on the other hand, cannot do any work in a world of mutually exclusive events.  By definition, there is no overlap anywhere.  So the major set-up axiom of probability theory identifies a set of events for which intersection (probability multiplication) can never yield anything but the empty event.

In summary, the basic axioms of probability nail it as a real number in the range 0 to 1, and identify a set of events on which natural addition is absolutely possible and natural multiplication is absolutely impossible.  Finally, when you have a set of mutually exclusive, absolutely dependent events which cover all the outcomes of a trial, then the set of events is called a partition of the sample space.

Friday 15 March 2013

Probability Preferences: Equivalence Classes are primary

Another dimension of relevance, when trying to judge what you might fairly expect of the founder of probability theory, is the idea of an equivalence class.  At its most fundamental, a randomisation device is a piece of technology which has $n \geq 0$ states, possibly infinitely many in the continuous case.  It is said to be in only one of those states, and the likelihood of it being in the $i$th state is $p_i$, the probability.  As mentioned in yesterday's post, there's no requirement that these probabilities follow any pattern whatsoever other than the primary one, namely that their sum is 1.

Start by imagining a traditional die, which has six distinctly marked faces, each with a different number of pips.  The fact that these faces are mutually distinguishable is important, not the fact that the distinction is achieved with pips indicating the numbers 1 to 6.  It could just as easily have been 6 colours or 6 pictures.  We can refer to this randomisation machine's $n$ states as its elementary states, its elementary outcomes.  It will have some particular 6-state discrete probability distribution, that is to say, some set of six probability numbers in the range 0 to 1 with the single additional constraint that $\sum_{i=1}^6 p_i =1$.

Now imagine a different die, one which had on three faces a common colour, and on the other three faces a second colour.  This randomisation machine operates like a coin - it will have some particular 2-state discrete probability distribution.

Now imagine all possible combinations of six different colours written on the faces of dice.  That's a lot of dice, each with its own number of states, with its own discrete probability distribution.

Finally, imagine a die with no face markings.  This represents the minimal machine, with just one indistinguishable outcome, which doesn't technically get to be called a randomisation machine, since there's no uncertainty in rolling it.  But for completeness you can see how it fits in.

Without knowing anything about the particular probability distribution of a die, you can see that the first die I mentioned, the one with 6 distinct faces, somehow provides the most randomness.  That is, if you first build the die (and therefore fix its probability distribution), when you come to the decision of how to label its faces, there's something natural feeling about having all the available slots differently labelled.  It is more efficient, less wasteful of capacity, more generative of randomness.  In terms of information theory, it is the maximum entropy choice, given any particular probability distribution.  The maximum entropy choice would have you use all available slots on this randomisation machine, all other things being equal.  Likewise the faceless die is the minimum entropy configuration.  And in between, each and every labelling can be ranked by entropy.  The die's entropy must therefore be a function of the number of distinguishable states in the randomisation machine.  

Next let's turn our attention to the probability distribution we can construct for a die.  To do that, let's hold the face-painting choice constant and go for the option of 6 distinguishable faces.  There are in theory an infinite number of sets of 6 real numbers which fall between 0 and 1 and which sum to 1.  For this 6 distinct-faced die, we can rank each and every one of them by entropy.  We'll discover that the maximum entropy probability model is the one which has equal probabilities for all faces, namely all faces have a probability of $\frac{1}{6}$.  And, for a 6-faced die, the minimum entropy die would be one where it was impossible to get 5 of the faces, but completely certain you'd get one particular face.  How to build such a die in practice is a different matter, but it isn't relevant here.

Now realise that you can run the same analysis for not just the 'maximum entropy' 6 face-distinguished die, but for all labellings down to the faceless die.  And there's a kind of global maximum and minimum entropy pair of dice in this universe of all possible dice, namely the equi-probable 6-label die and the totally faceless die.  And all couplings in between can get ranked by entropy.  When you tell a randomisation machine to produce for you an observable state (that is, when you roll the die), you get the most information out of it when you're rolling the maximum entropy die.

It is a nice way of characterising a randomisation machine: knowing the maximum number of distinct states it can be in.  That seems a kind of permanent, almost physical aspect of the machine.  Likewise, the probability distribution seems somehow 'built in' to the physical machine.  Of course, the machine doesn't need to be physical at all.  Still, the machine gets built and it is kind of natural to imagine a particular probability distribution burned into the 'maximum entropy' face-painted die.  This is where we started.  Now, imagine we took that particular die - in fact, let's just for the sake of argument give it, off the top of my head, the distribution $\frac{1}{6}, \frac{1}{6}, \frac{1}{6}, \frac{1}{6}, \frac{1}{12}, \frac{1}{4}$.  I could have picked equi-probable but decided not to.  And I paint the colours red, orange, yellow, green, blue, indigo on it, some colours of the rainbow.

It is done, the die is built.  But wait.  I decide on the following rule.  I want to make this particular die behave like a coin flip.  Rather than re-paint, I just decide in my head to count red or orange or yellow as one state, and the other three colours as another.  This is an equivalence class.  It is a transformation or grouping or partition of the elementary outcomes of a randomisation machine.  Likewise I can, just by deciding to do so, interpret my rainbow die castings to reproduce any particular painting.  I just need to remember the equivalence rule in my head.

So the rainbow die, with its specific probability distribution, is set in the middle of a room filled with people, each with their own distinct equivalence class in mind.  Each roll of the die is seen by each person as a different outcome of his own randomisation machine.  By applying an equivalence class, you've got the randomisation machine to perform differently for you.  This is kind of like software, the equivalence class being the program.  With each equivalence class, there's a way of rolling up the built-in probabilities to produce a new probability distribution for that equivalence class.  Say the red face carries $\frac{1}{12}$, the orange $\frac{1}{4}$ and the rest of the colours $\frac{1}{6}$ each.  Then with the red-or-orange-or-yellow versus green-or-blue-or-indigo equivalence class I have simulated a fair coin flip, even though the 'elementary' outcomes were not equi-probable.
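
Here is a minimal sketch of that rollup - my own illustration, using the face probabilities just described:

```python
# Roll up the rainbow die's built-in probabilities through an equivalence class
# to get a new probability distribution - here, a fair coin from a biased die.
rainbow = {"red": 1/12, "orange": 1/4, "yellow": 1/6,
           "green": 1/6, "blue": 1/6, "indigo": 1/6}

coin_classes = {"heads": {"red", "orange", "yellow"},
                "tails": {"green", "blue", "indigo"}}

coin = {state: sum(rainbow[colour] for colour in colours)
        for state, colours in coin_classes.items()}

print(coin)  # both states come out at 0.5 (up to floating point)
```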

So in a sense a so-called n-state randomisation machine describes the maximum entropy, all-states-distinguished equivalence class.  Even though I've been claiming this is natural, efficient, etc., in theory there's nothing special about this maximum entropy state except that it ranks top of the list for information content.  It is as if each of the observers of the machine, through the glasses of his own equivalence class, sees a different reality, but none of them can take the glasses off.  If you do privilege the maximum entropy equivalence class, then call all of its states elementary outcomes, elementary events or the sample space.  If that's what you're going to do, then all the other equivalence classes represent composite events, or simply events, and you can work out the probability of these events by rolling up their constituent probabilities.

Executing or running a randomisation machine can then be said to reduce uncertainty across a disjoint set of n possible outcomes.  That is, a randomisation machine is a chooser.  It picks one of n.  It is an OR-eraser.  The concept of OR is primitive; it exists at the elementary outcome level.  It is a particularly tight kind of OR - one which is exclusive and all-encompassing.  In other words, the OR-eraser which is the random event picks exactly one of the n elementary outcomes.  The act of causing a randomisation machine to run is an act of OR-erasure.  At the level of a single randomisation machine, there's no concept of AND: a single choice, 1-of-n, is made.  At the equivalence class level, the construction of an equivalence class can involve OR-construction (disjunction) and AND-construction (conjunction).

As I mentioned last night, the best a single die can hope to achieve is an uncertainty reduction of about 2.58 bits.  That's its maximum entropy.  The formula is $-\sum_{i=1}^6 p_i \log_2 p_i$.  This quantity is purely a function of the probability distribution, as you can see, but remember that I chose colours as elementary outcomes partly because there's no natural mapping onto a number.  In this sense information is more fundamental than expectation, which I'll say more about in another posting.
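
To spell out where the 2.58 figure comes from (my own arithmetic, for the equi-probable case):

$$-\sum_{i=1}^{6} \tfrac{1}{6}\log_2\tfrac{1}{6} \;=\; \log_2 6 \;\approx\; 2.585 \text{ bits.}$$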

My thought experiment of multiple people looking at the result of a randomisation machine's single run and seeing different (non-elementary) outcomes is clearer in the act of picking a random playing card.  Participant 1 sees a Queen of Hearts, another sees a Queen, another sees a Heart, another sees a Face card, etc.  And those are only the 'semantically coherent' equivalence classes - there are in fact a whole bunch more.
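
A small sketch of that room of observers - my own illustration, not code from the post - where each observer applies a different equivalence class to the same single draw:

```python
# One card is drawn; each observer reports the outcome of 'their' randomisation machine.
import random

RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["hearts", "diamonds", "clubs", "spades"]
DECK = [(r, s) for r in RANKS for s in SUITS]

card = random.choice(DECK)          # one run of the randomisation machine

observers = {
    "exact card":   lambda c: c,
    "rank only":    lambda c: c[0],
    "suit only":    lambda c: c[1],
    "red or black": lambda c: "red" if c[1] in ("hearts", "diamonds") else "black",
    "face card?":   lambda c: c[0] in ("J", "Q", "K"),
}

for name, view in observers.items():
    print(f"{name:12s} -> {view(card)}")
```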

Thursday 14 March 2013

Probability Preferences: Event Space is primary, Equi-probable Event Space is secondary

Technically, probabilities are proportions, fractions of a nominally unitary whole.  Those proportions don't have to be the same size.  When they are, counting tricks - combinatorics - can come into play.  In my four walls metaphor for probability, the first wall is made up of bricks of uneven areas.  This is the primary case in probability theory.  Understanding that you have an event space and that you sum regions of a unitary whole is all that you need.  With equally-sized areas, counting tricks become relevant, since there's a mapping from each area to a whole number, and you arrive at your proportion by scaling down by the total count of elementary outcomes, $\sum_{i=1}^n 1 = n$.
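
To make the contrast explicit (my own notation, not in the original post): in the general, uneven-brick case an event $E$ has probability given by summing its bricks, and only in the equi-probable case does this collapse to counting:

$$P(E) \;=\; \sum_{i \in E} p_i \qquad\longrightarrow\qquad P(E) \;=\; \frac{\sum_{i \in E} 1}{\sum_{i=1}^{n} 1} \;=\; \frac{|E|}{n} \quad\text{when every } p_i = \tfrac{1}{n}.$$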

It is hugely important in my mind to see where and when numbers come into it all, and at what stage.  Unevenly sized elementary outcomes don't map neatly onto the whole number system, and that's OK.  On a related point, the event in question, elementary or otherwise, doesn't have to map onto a number either.  If it does, then you can further talk about expectations, functions of random variables, etc.  But you don't need that either.  What distinguishes an equi-probable random device is that its probability distribution is the maximum entropy one (about 2.58 bits in the case of a die, 1 bit in the case of a coin).  The minimal entropy case for any randomisation device is the one where all elementary outcomes, regardless of how biased or unbiased the device is, map to one event.  In that case the information content is 0 and technically it is no longer a randomisation device; you've effaced its randomness, so to speak.  What makes these proportions of a unitary whole interesting is that, for any given activity, game or contract with randomness, there's a particular configuration of the probabilities in your mathematical analysis which comes close to the results you would expect if you carried out the experiment many times.


Isaac Todhunter's "History of the mathematical theory of probability from the time of Pascal to that of Laplace", 1865, is a key milestone in the history of probability theory.  F.N. David, also often quoted by many of the authors I've read, references Todhunter thus: "[he].. has been and always will be the major work of reference in this subject" (F.N. David, preface, ix).  Ian Hacking, in his amazing "The emergence of probability" says in the first sentence of chapter 1 "[Todhunter]...remains an authoritative survey of nearly all work between 1654 and 1812" (Hacking, p1).  Todhunter's very book title is revealing - he originates probability theory with Pascal.  This choice echoes down through all the probability books I've come across.

Todhunter was a senior wrangler, so his intellectual capacity is beyond doubt (just check out the list of former senior wranglers and the equally stellar top 12's).  He describes Cardano's "On casting the die" as a 15 page gambler's manual where ".. the discussions relating to chances form but a small portion of the treatise" (Todhunter, p2).

Cardano discusses the activity of throwing two dice and summing the number of pips across them.  He lays out the theory of probability as 'proportions of a unitary whole' using the language of 'chances'.  That he chose dice rather than astragali is of merely historical interest; either way, he is surely the first in the western tradition to make this proportions-as-chances analogy.  Cardano also nails the implications of all 36 elementary outcomes for the activity of 'summing the pips', which involves understanding that rolling two dice implicitly maintains a knowledge of which die is which - in a sense, that each die is 'result-reading colour coded'.  In a previous book he also talks about binomial coefficients, for which Pascal usually gets the credit.  He performs the same analysis for three dice.  As I'll mention in a subsequent post (on parallel/sequential irrelevance), this is theoretically equivalent to predicting the future three steps out.  Keith Devlin in "The unfinished game" explicitly (and wrongly) gives Pascal and Fermat credit for this.
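
A minimal sketch of the 36-outcome point - my own code, just to make the 'colour coded' dice concrete:

```python
# Enumerate the 36 ordered pairs from two distinguishable dice and tabulate the
# distribution of the sum of pips.
from collections import Counter
from fractions import Fraction

outcomes = [(a, b) for a in range(1, 7) for b in range(1, 7)]   # 36 ordered pairs
sums = Counter(a + b for a, b in outcomes)

for total in range(2, 13):
    print(total, Fraction(sums[total], 36))
# e.g. the sum 7 turns up in 6 of the 36 pairs (probability 1/6), while 2 and 12
# turn up in only one pair each - a result you only get by treating (1,2) and
# (2,1) as distinct outcomes, i.e. by keeping track of which die is which.
```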

My suspicion is that this senior Wrangler naturally preferred the great mathematicians Pascal and Fermat and that he recoiled in disgust at the unloveable life which Cardano seems to have lived.  

F.N. David upgrades Cardano to ".. a little more achievement than Todhunter allows him but .. not .. much more" (F.N. David, p59).  Hacking ends his chapter on Cardano with this: "Do we not find all the germs of a reflective study of chance in Cardano? Yes indeed" (Hacking, p56).

Did Cardano understand the primacy of the 'variable sized brick' case?  Yes.  Hacking quotes this translated section from Cardano: "I am as able to throw 1,3 or 5 as 2,4 or 6.  The wagers are therefore laid in accordance with this equality if the die is honest, and if not, they are made so much the larger or smaller in proportion to the departure from true equality" (Hacking, p54).  F.N. David is not so sure, since Cardano incorrectly treats astragali as if they were equi-probable, though she admits this may just be due to Cardano's lack of experience with astragali.  Anyway, even if not, surely you're allowed to totally mis-characterise one specific randomisation machine and still be the father of modern probability theory.

Tuesday 12 March 2013

Probability preferences

In order to support my claim that Pascal (and to some extent Fermat) is too highly praised in the history of probability theory, I'd like to set out what I see as important in the constellation of ideas around the birth of the subject.  This is my opinion, and it is based on what I know has happened in probability theory since the time of Cardano, Pascal, Fermat and Huygens.

Concepts of primary importance in probability theory (in the pre-Kolmogorov world of Cardano, Fermat, Pascal)
  1. Event space.
  2. Independence.
  3. Conjunction and disjunction.
  4. Equivalence class.
  5. Parallel/sequential irrelevance of future outcomes.
  6. A relation between historical observed regularities and multiple future possible worlds.
  7. A clear separation between the implementation of the random process(es) and the implementation of the activity, game, contract, etc. which utilises the source of randomness.

Concepts of secondary importance.
  1. Equi-probable event space.
  2. Expectation.
  3. Single versus multiple random sources.
  4. Law of large numbers (though it is of primary importance to the dependent subject of statistics).
  5. i.i.d. (two or more random sources which are independent and identically distributed).
  6. A Bernoulli scheme.
  7. The binomial distribution.
  8. Stirling's approximation for n factorial.
  9. The normal distribution.
  10. Information content of a random device.
  11. Identification of the activity, game, contract, etc., as purely random, or additionally strategic.

I'd like to say something about each of these in turn.

Before I do, I'd like to say this - the Greeks' failure to develop probability theory was not, as Bernstein and also David suggest, due to a preference for theory over experimentation, but perhaps because probabilities are ratios, and base ten positional number notation - which makes manipulating such ratios notationally bearable - was not invented in India until the eighth century A.D.  The early renaissance love of experimentation (Bacon and Galileo) may also have assisted in drawing the parallel between the outcome of a scientific experiment and the outcome of a randomisation machine.

Sunday 10 March 2013

Musical Chairs

Most of the histories of probability trace their facts back to Hacking and David, and I agree these two books are the best of the bunch I have read.  The Hacking book itself references David.  I love the Bernstein book series, but I noticed that his pages 43-44 have some musings on why the Greeks didn't bother working out the odds behind dice games.  I bet they did.  Anyway, he offers an example of the so-called sloppiness of the Greek observations of dice probabilities by mentioning some facts he gleans from David - namely that when using the astragali they valued the Venus throw (1,3,4,6) higher than (6,6,6,6) or (1,1,1,1), which are, he states, "... equally probable".

No, they are not.  David clearly states that the probability of throwing a six is about one in ten; likewise for throwing a one.  And threes and fours are roughly four-in-ten events.  This means that even if order is important, the Venus is indeed more likely.  Second mistake: there's only one permutation of four sixes, and only one permutation of four ones.  But there are many permutations of the four Venus numbers, meaning the probability of (1,3,4,6) in any order is higher again than that of the strictly ordered (1,3,4,6).
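
A quick check of that arithmetic - my own sketch, using the rough astragalus probabilities quoted above (1 and 6 at about 0.1 each, 3 and 4 at about 0.4 each):

```python
# Compare the Venus throw (1,3,4,6) with four sixes, in strict order and in any order.
from math import factorial

p = {1: 0.1, 3: 0.4, 4: 0.4, 6: 0.1}

ordered_venus   = p[1] * p[3] * p[4] * p[6]        # the strict sequence 1,3,4,6
any_order_venus = ordered_venus * factorial(4)     # 24 permutations of four distinct values
four_sixes      = 0.1 ** 4                         # only one permutation exists

print(ordered_venus, any_order_venus, four_sixes)
# roughly 0.0016, 0.0384 and 0.0001 - the Venus is more likely either way
```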

It is this partition/permutation dilemma of probability theory, even today, which is so easy to get wrong.  I just re-read some earlier postings I made on equivalence classes and their information content and key milestones in probability theory, and I still like what I wrote.  Also check out a posting on combinatorics in Cardano and Lull.

It is just a throwaway comment in Bernstein's book and hardly invalidates his wonderful sweeping history of risk, but it nicely illustrates the problems of thinking about event spaces and equivalence classes.

Divorce born

I've been thinking about Cardano, Pascal, Fermat and Huygens a lot recently and hope to make a number of postings.  For now I'd just like to bring some controversy to the usual story found in the literature about these characters and their relative importance.  According to this literature there are three pivotal moments - which I'll call Cardano's circuit, Pascal-Fermat's divorce settlement and Huygens' hope - relating respectively to the problems of complex sample space, the arithmetic triangle and the expected value of an uncertain outcome; or, to simplify even further, to the factorial, binomial coefficients and the average - all fairly contemporaneous mathematical inventions or discoveries in the Western tradition.

The story usually told is one which lays great praise at the workings of Pascal and Fermat and which makes a big deal of the so-called problem of points.  What I'd like to do during this discussion is show how connected the problem of points is to another famous probability exercise, so-called Gambler's ruin.  I'd like to bring these two problems together and show ways in which they're related to many contemporary decision problems.  I'd also like to claim that the solution to Gambler's ruin is more important than the problem of points, and has more resonance today.  I'd also like to claim that Cardano's discussion of event space has the better claim to being the foundation of probability theory.

In all of the postings to come, I base my readings on the following books, plus free online primary sources, where available in an English translation.

One last introductory point - this thread is clearly a biased, Western history-of-ideas discussion.  Many of the commentators below neglect to sufficiently emphasise the great world traditions in mathematics which fed into this, especially the Islamic, Chinese and Indian traditions.  These clearly played into the so-called canonical view of the birth of probability, but that weakness in the line of argument is one for another time and another place.