Thursday, 14 March 2013

Probability Preferences: Event Space is primary, Equi-probable Event Space is secondary

Technically, probabilities are proportions, fractions of a nominally unitary whole.  Those proportions don't have to be the same size.  When they are, counting tricks (combinatorics) can come into play.  In my four walls metaphor for probability, the first wall is made up of bricks of uneven areas.  This is the primary case in probability theory.  All you need is the understanding that you have an event space and that you sum regions of a unitary whole.  With equally sized areas, number theory tricks become relevant, since each area maps to a whole number, and you arrive at your proportion by scaling the count of areas in your event down by the count of all elementary outcomes, $\sum_n 1$.
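To make the distinction concrete, here is a minimal Python sketch.  The three-colour event space and the helper names `prob` and `equiprobable` are my own illustrative assumptions, not anything from the literature.  It shows that summing proportions is all you need in the uneven case, and that counting only becomes a shortcut once the bricks are all the same size.

```python
from fractions import Fraction

# Hypothetical uneven event space: three bricks of different areas
# which together make up the unitary whole.
uneven = {"red": Fraction(1, 2), "green": Fraction(1, 3), "blue": Fraction(1, 6)}

def prob(space, event):
    """Sum the proportions of the elementary outcomes that make up an event."""
    return sum((space[outcome] for outcome in event), Fraction(0))

def equiprobable(outcomes):
    """Build an event space in which every elementary outcome has the same area."""
    outcomes = list(outcomes)
    return {outcome: Fraction(1, len(outcomes)) for outcome in outcomes}

# Primary case: no counting trick is available, you just sum regions.
p_uneven = prob(uneven, {"red", "blue"})    # 1/2 + 1/6

# Secondary case: with equal areas, counting gives the same answer as summing.
die = equiprobable(range(1, 7))
even = {2, 4, 6}
p_by_sum = prob(die, even)                  # 1/6 + 1/6 + 1/6
p_by_count = Fraction(len(even), len(die))  # 3 favourable out of 6
```

Both routes give a half for the even faces of the die, but only the summing route works for the coloured bricks.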

It is hugely important in my mind to see where and when numbers come into it all and at what stage.  Unevenly sized elementary outcomes don't map neatly to the whole number system, and that's OK.  On a related point, the event in question, elementary or otherwise, doesn't have to map on to a number either.  If it does, then you can further talk about expectations, functions of random variables, etc.  But you don't need that either.  What distinguishes an equi-probable random device is that its probability distribution is the maximum entropy one (about 2.58 bits in the case of a die, 1 bit in the case of a coin).  The minimal entropy case for all randomisation devices is the one where all elementary outcomes, regardless of how biased or unbiased the device is, map to one event.  In that case the information content is 0 and technically it is no longer a randomisation device; you've effaced its randomness, so to speak.  What makes these proportions of a unitary whole interesting is that, for any given activity, game or contract with randomness, there's a particular configuration of these probabilities in your mathematical analysis which comes close to the results you would expect if you carried out the experiment multiple times.
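The entropy figures quoted above can be checked with a few lines of Python.  This is a sketch using the standard Shannon formula, $H = -\sum_i p_i \log_2 p_i$; the function name `entropy_bits` and the 90/10 biased coin are assumptions chosen purely for illustration.

```python
from math import log2

def entropy_bits(probs):
    """Shannon entropy in bits; zero-probability outcomes contribute nothing."""
    return sum(-p * log2(p) for p in probs if p > 0)

fair_die = [1 / 6] * 6
fair_coin = [0.5, 0.5]
biased_coin = [0.9, 0.1]   # hypothetical bias, for comparison only
one_event = [1.0]          # every elementary outcome mapped to a single event

h_die = entropy_bits(fair_die)        # log2(6), roughly 2.58 bits
h_coin = entropy_bits(fair_coin)      # exactly 1 bit
h_biased = entropy_bits(biased_coin)  # less than 1 bit
h_none = entropy_bits(one_event)      # 0 bits: the randomness is effaced
```

Any departure from equal proportions lowers the entropy, and collapsing everything into one event takes it all the way to zero.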

Isaac Todhunter's "History of the mathematical theory of probability from the time of Pascal to that of Laplace", 1865, is a key milestone in the history of probability theory.  F.N. David, also often quoted by many of the authors I've read, references Todhunter thus: "[he].. has been and always will be the major work of reference in this subject" (F.N. David, preface, ix).  Ian Hacking, in his amazing "The emergence of probability" says in the first sentence of chapter 1 "[Todhunter]...remains an authoritative survey of nearly all work between 1654 and 1812" (Hacking, p1).  Todhunter's very book title is revealing - he originates probability theory with Pascal.  This choice echoes down through all the probability books I've come across.

Todhunter was a senior wrangler, so his intellectual capacity is beyond doubt (just check out the list of former senior wranglers and the equally stellar top 12's).  He describes Cardano's "On casting the die" as a 15 page gambler's manual where ".. the discussions relating to chances form but a small portion of the treatise" (Todhunter, p2).

Cardano discusses the activity of throwing two dice and summing the number of pips across the two dice.  He lays out the theory of probability as 'proportions of a unitary whole' using the language of 'chances'.  That he chose dice rather than astragali is of merely historical interest, since no doubt he is the first in the western tradition to make this proportions-as-chances analogy.  Cardano also nails the implications of all 36 elementary outcomes on the activity of 'summing the pips', which involves understanding that rolling two dice implicitly maintains a knowledge of which die is which.  In a sense, each die is 'result-reading colour coded'.  In a previous book he also talks about binomial coefficients, for which Pascal usually gets credit.  He performs the same analysis for three dice.  As I'll mention in a subsequent post (on parallel/sequential irrelevance), this is theoretically equivalent to predicting the future three steps out.  Keith Devlin in "The unfinished game" explicitly (and wrongly) gives Pascal and Fermat credit for this.
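The 'summing the pips' analysis is easy to reproduce by brute-force enumeration.  The sketch below is a modern restatement, not Cardano's own procedure, and `sum_distribution` is a name I've made up for it.  It treats each roll as an ordered tuple, which is exactly the 'colour coding' of the dice: (1, 2) and (2, 1) are distinct elementary outcomes.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def sum_distribution(n_dice, faces=6):
    """Chance of each pip total, counting ordered rolls as distinct outcomes."""
    rolls = list(product(range(1, faces + 1), repeat=n_dice))
    counts = Counter(sum(roll) for roll in rolls)
    return {total: Fraction(count, len(rolls))
            for total, count in sorted(counts.items())}

two_dice = sum_distribution(2)     # built from 6 * 6 = 36 elementary outcomes
three_dice = sum_distribution(3)   # built from 6 * 6 * 6 = 216 elementary outcomes

p_seven = two_dice[7]     # 6 of the 36 ordered rolls sum to 7
p_three = two_dice[3]     # (1, 2) and (2, 1): 2 of 36
p_ten = three_dice[10]    # 27 of 216
p_nine = three_dice[9]    # 25 of 216
```

If you forget which die is which and count only unordered combinations, totals of 9 and 10 with three dice each appear to have six ways of arising; the ordered enumeration shows they differ, 25 against 27 out of 216.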

My suspicion is that this senior wrangler naturally preferred the great mathematicians Pascal and Fermat and that he recoiled in disgust at the unlovable life which Cardano seems to have lived.

F.N. David upgrades Cardano to ".. a little more achievement than Todhunter allows him but .. not .. much more" (F.N. David, p59).  Hacking ends his chapter on Cardano with this: "Do we not find all the germs of a reflective study of chance in Cardano?  Yes indeed" (Hacking, p56).

Did Cardano understand the primacy of the 'variable sized brick' case?  Yes.  Hacking quotes this translated section from Cardano: "I am as able to throw 1, 3 or 5 as 2, 4 or 6.  The wagers are therefore laid in accordance with this equality if the die is honest, and if not, they are made so much the larger or smaller in proportion to the departure from true equality" (Hacking, p54).  F.N. David is not so sure, since Cardano incorrectly treats astragali as if they were equi-probable, though David allows that this may just be due to Cardano's lack of experience with astragali.  Anyway, even if Cardano did get astragali wrong, surely you're allowed to totally mis-characterise one specific randomisation machine and still be the father of modern probability theory.
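Cardano's rule about wagers can be read as 'stake in proportion to your chance of winning'.  The sketch below is my own modern reading of that quote, not a reconstruction of his arithmetic; the loaded die and the function names `fair_stakes` and `expected_gain` are hypothetical.

```python
from fractions import Fraction

def fair_stakes(p_win, pot=Fraction(1)):
    """Split a pot so that each side stakes in proportion to its chance of winning."""
    return pot * p_win, pot * (1 - p_win)

def expected_gain(p_win, my_stake, their_stake):
    """Average gain per bet: win their stake with p_win, otherwise lose mine."""
    return p_win * their_stake - (1 - p_win) * my_stake

# An honest die, and a hypothetical loaded die which favours 1 and 3.
honest = {face: Fraction(1, 6) for face in range(1, 7)}
loaded = {1: Fraction(1, 4), 2: Fraction(1, 8), 3: Fraction(1, 4),
          4: Fraction(1, 8), 5: Fraction(1, 8), 6: Fraction(1, 8)}

odd = {1, 3, 5}
p_odd_honest = sum(honest[face] for face in odd)   # 1/2
p_odd_loaded = sum(loaded[face] for face in odd)   # 5/8

stakes_honest = fair_stakes(p_odd_honest)   # equal stakes
stakes_loaded = fair_stakes(p_odd_loaded)   # the backer of 'odd' must put up more

gain_loaded = expected_gain(p_odd_loaded, *stakes_loaded)
```

With the honest die the stakes are equal; with the loaded one the backer of 'odd' stakes 5/8 of the pot against 3/8, and the expected gain on either side is zero.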