## Monday, 18 March 2013

### Probability Preferences: Conjunction and Disjunction are primary, Disjunction more so

Making addition work with sets is the heart of probability theory.  Sure, a probability was really just a way of re-expressing odds, and that had been known about for ages before Cardano.  Odds of n : m in favour of an event mean that the event has probability $\frac{n}{n+m}$, which allows you to work out a weighting.  But apart from nailing those numbers down to the range 0 to 1, the primary rule of probability can be thought of as specifying the conditions under which disjunction works at its simplest.  That is, it lays out what must be true about A and B to allow you to say that $P(A \cup B) = P(A) + P(B)$.

Set theory, of course, has its own history and life outside of probability theory, but probability theory becomes parasitic on set theory insofar as the general descriptions of events use the language of set theory.  In set theory, events are said to be disjoint when there's no possibility of overlap.  The events have their own autonomous standalone identities, so applying the union operator allows for no possibility of double counting.  If you define a number of events $A_i$ and you want to state that there's no overlap anywhere you say they're pairwise disjoint, meaning that each and every pair from that list is disjoint.  We sometimes say the events $A_i$ are mutually exclusive.  What we don't say often enough is that this means they're utterly dependent on each other.  The occurrence of some particular one of the disjoint events $A_i$, say $A_{12}$, tells you with certainty that none of the other events happened.  This is complete dependence and it isn't obvious just by looking at the corresponding Venn diagram.

If you have a full house, so to speak, of events such that $A_1 \cup A_2 \cup \dots = S$, the entire sample space of possibility, then you've fully specified a probability model and can say $P(\bigcup_{i=1}^\infty A_i) = \sum_{i=1}^\infty P(A_i)$.  With this single condition, the addition of probabilities is born.  It is a very constrained sort of addition, to be sure, since no matter how many disjoint events there are in your experiment, even an infinite number of them, your sum of probabilities (all those additions) will never result in a number greater than 1.
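To make the odds conversion and the addition rule concrete, here is a minimal Python sketch.  It assumes a fair six-sided die, and the names `odds_to_probability`, `die` and `prob` are my own illustrative choices rather than anything standard.  It shows simple addition holding for disjoint events and failing as soon as two events overlap.

```python
from fractions import Fraction

def odds_to_probability(n, m):
    """Convert odds of n : m in favour of an event to the probability n / (n + m)."""
    return Fraction(n, n + m)

# Assumption: a fair six-sided die, each face an elementary event of probability 1/6.
die = {face: Fraction(1, 6) for face in range(1, 7)}

def prob(event):
    """Probability of an event, given as a set of faces."""
    return sum((die[face] for face in event), Fraction(0))

a = {1, 2}   # roll a 1 or a 2
b = {5}      # roll a 5; disjoint from a
c = {2, 3}   # overlaps a on the face 2

print(odds_to_probability(1, 5))           # odds of 1 : 5 for any one face -> 1/6
print(prob(a | b) == prob(a) + prob(b))    # disjoint, so simple addition holds -> True
print(prob(a | c) == prob(a) + prob(c))    # overlap, so face 2 is double counted -> False
```

Exact fractions are used instead of floats so that the equality checks are not disturbed by rounding.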

Some examples of these utterly dependent events: rolling a die and getting 'red face up' (with a red-orange-yellow-green-blue-indigo die), tossing a coin and watching it fall heads-up, selecting the six of clubs by picking randomly from a pack of 52 playing cards, rolling a traditional pair of dice and getting a pair of sixes, rolling a die and getting an even (not an odd) number of pips face up.

With dice and coins, notice that it is in spinning them that we rely on their shape to select precisely one of n possibilities.  We initiate the random event and a separate object's physical shape guarantees the one-of-n result.  With picking a card, we initiate the random event but it is additionally in our act of selecting that we guarantee leaving the remaining 51 cards unturned.  Strictly speaking, the randomising action has already happened with the cards, when they were presumably shuffled thoroughly.  The shuffle, the toss, the flip.  These are the randomising acts.  With the toss and the flip, imagine the viewer closes his eyes while the die or coin lands.  Then he's in the same uncertain state as the person about to pick a card from a shuffled deck.

Notice you can fully specify a probability model with a complete set of pairwise disjoint events even if the events in question aren't elementary - the example I gave above is rolling a die and getting an odd or an even number.
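As a rough check of the odd/even case, the sketch below tests whether a list of events is pairwise disjoint and covers the whole sample space.  It assumes a fair six-sided die with equally likely faces, and `is_partition` and `prob` are hypothetical helper names of my own.

```python
from fractions import Fraction
from itertools import combinations

# Assumption: a fair six-sided die, so every face is equally likely.
sample_space = set(range(1, 7))

def prob(event):
    """Probability of an event as (favourable faces) / (all faces)."""
    return Fraction(len(event), len(sample_space))

def is_partition(events, space):
    """True if the events are non-empty, pairwise disjoint and together cover space."""
    non_empty = all(len(e) > 0 for e in events)
    pairwise_disjoint = all(a.isdisjoint(b) for a, b in combinations(events, 2))
    covers = set().union(*events) == space
    return non_empty and pairwise_disjoint and covers

odd, even = {1, 3, 5}, {2, 4, 6}
low, high = {1, 2, 3}, {3, 4, 5, 6}   # these two overlap on the face 3

print(is_partition([odd, even], sample_space), sum(prob(e) for e in [odd, even]))
print(is_partition([low, high], sample_space), sum(prob(e) for e in [low, high]))
```

The odd/even pair passes the check and its probabilities sum to exactly 1, while the overlapping pair fails it and its naive sum overshoots to 7/6, which is the double counting that disjointness rules out.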

If I gave half the playing cards to one person and the other half to another person, perhaps in a different room, and there was no form of communication possible between them, then we wouldn't have pairwise disjoint events across all 52 cards.  We'd have a pair of 26-card sets of pairwise disjoint events, each set independent of the other.  Imagine if I gave one card each to 52 different people, in different countries.  Imagine further that I told them they could turn over their one card whenever they wanted, and as soon as they did so, to press a buzzer which had the effect of disabling the remaining 51 cards so that they could not be turned over.  Ignoring messy practical reality, there's no shuffle here.  There's no natural sort order which could be applied to the geographical distribution of the people and cards, so no sense of working out whether they were randomly distributed in space.  Still, the buzzer and disabling devices make this a coherent utterly dependent trial.

This requirement which allows simple addition of probabilities has implications for the randomisation machine - if it is to co-ordinate precisely 1 of n outcomes, then all n outcomes must be co-ordinated or constrained by someone or something.

Conjunction, on the other hand, cannot work in a world of mutually exclusive events.  By definition, there is no overlap anywhere, so every intersection is empty and $P(A \cap B) = 0$ whatever the values of $P(A)$ and $P(B)$.  So the major set-up axiom of probability theory identifies a set of events on which it is impossible to perform intersection (probability multiplication).
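A small sketch of the contrast, assuming fair dice and equally likely outcomes; the helper names `prob` and `independent` are mine.  On a single die, two different faces are mutually exclusive and the multiplication rule fails.  On a pair of dice, "first die shows 6" and "second die shows 6" do overlap, and multiplication works.

```python
from fractions import Fraction
from itertools import product

def prob(event, space):
    """Probability of an event within an equally likely sample space."""
    return Fraction(len(event & space), len(space))

def independent(a, b, space):
    """True if P(A and B) equals P(A) * P(B), i.e. multiplication works."""
    return prob(a & b, space) == prob(a, space) * prob(b, space)

# One die: the events 'roll a 1' and 'roll a 2' are mutually exclusive.
one_die = set(range(1, 7))
roll_1, roll_2 = {1}, {2}

# Two dice: outcomes are ordered pairs (first die, second die).
two_dice = set(product(range(1, 7), repeat=2))
first_is_6 = {(x, y) for (x, y) in two_dice if x == 6}
second_is_6 = {(x, y) for (x, y) in two_dice if y == 6}

print(prob(roll_1 & roll_2, one_die))                 # empty intersection -> 0
print(independent(roll_1, roll_2, one_die))           # 0 is not 1/36 -> False
print(prob(first_is_6 & second_is_6, two_dice))       # the pair of sixes -> 1/36
print(independent(first_is_6, second_is_6, two_dice)) # 1/36 == 1/6 * 1/6 -> True
```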

In summary, the basic axioms of probability nail it as a real number in the range 0 to 1, and identify a set of events on which natural addition is absolutely possible and natural multiplication is absolutely impossible.  Finally, when you have a set of mutually exclusive, absolutely dependent events which cover all the outcomes of a trial, then the set of events is called a partition of the sample space.