Saturday 16 April 2011

1,2,3 Throw!


Imagine I have three dice: a red one, a blue one and a green one.  I will perform five experiments.  First, note the following.  Imagine an experiment has $n_1$ elementary outcomes.  Another experiment has $n_2$ elementary outcomes, and so on, up to $n_N$ elementary outcomes for the $N$th experiment.  Imagine further that all $N$ experiments are independent of each other - the easiest way to imagine this is to picture all $N$ experiments happening at exactly the same time (for example, all $N$ dice rolled at the same time).  Now switch your attention to the combined outcome possibilities of another experiment which is nothing other than the collection of all $N$ independent experiments just mentioned.  This new super-experiment has $n_1 \times n_2 \times \dots \times n_N$ elementary outcomes.

In the general case, no two experiments need to be the same in any way.  For example, we could combine the tossing of a die with the flipping of a coin: $n_1 = 6$, $n_2 = 2$, so the combined experiment has $n_1 \times n_2 = 6 \times 2 = 12$ elementary outcomes.  Nor do each experiment's elementary outcomes need to be equiprobable.  Imagine a loaded die with probabilities for pips respectively $\left\{ \frac{1}{6}, \frac{1}{6}-\frac{1}{100}, \frac{1}{6}-\frac{2}{100}, \frac{1}{6}-\frac{3}{100}, \frac{1}{6}-\frac{4}{100}, \frac{1}{6}+\frac{1+2+3+4}{100}\right\}$.  If you combine it with a loaded coin, whose probabilities for H and T respectively are $\left\{ \frac{49}{100}, \frac{51}{100}\right\}$, you'd still be entitled to claim that the joint experiment had 12 elementary outcomes.  Of course, when the experiments you're in the process of repeating or parallelising all have the same number of elementary outcomes $n$, then the combined experiment has $n^N$ elementary outcomes.
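To make the counting rule concrete, here's a minimal Python sketch (the variable names are mine, not anything from the post) that builds the combined experiment as a Cartesian product and simply counts its elements:

```python
# A minimal sketch of the counting rule, using only the standard library.
from itertools import product

die = [1, 2, 3, 4, 5, 6]    # n1 = 6 elementary outcomes
coin = ["H", "T"]           # n2 = 2 elementary outcomes

# The combined experiment is the Cartesian product of the two outcome sets.
combined = list(product(die, coin))
print(len(combined))                       # 12 = 6 * 2 = n1 * n2

# N repeats of the same n-outcome experiment give n**N outcomes.
print(len(list(product(die, repeat=3))))   # 216 = 6**3
```

Note the count is 12 regardless of whether the die or coin is loaded; loading changes the probabilities attached to the outcomes, not how many of them there are.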

First up is a batch of three experiments where order/colour is important.  Namely, a red $1$ followed by a blue $2$ does not amount to the same thing as a blue $1$ followed by a red $2$.

Experiment 1
I toss the red die.  This has information content $I = -\sum_{i=1}^{6} \frac{1}{6} \log_2 \frac{1}{6} = \log_2 6 \approx 2.6$ bits

Experiment 2
I toss the red die and the blue one.  Now $I = -\sum_{i=1}^{36} \frac{1}{36} \log_2 \frac{1}{36} = \log_2 36 \approx 5.2$ bits

Experiment 3
I toss the red die and the blue one and finally the green one.  This has information content $I = -\sum_{i=1}^{6^3} \frac{1}{6^3} \log_2 \frac{1}{6^3} = \log_2 6^3 \approx 7.8$ bits

Each new die adds another $2.6$ bits of information.  Next up are repeats of experiments 2 and 3 where I no longer care about colour.  I'll create the equivalence classes manually this time, but in another post I'll do it via combinatorics.
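Before moving on, here's a quick check of experiments 1 to 3, assuming fair dice, using the fact that a uniform distribution over $n$ outcomes carries $\log_2 n$ bits:

```python
from math import log2

def uniform_entropy(n):
    """Bits of information in n equiprobable elementary outcomes."""
    return -sum((1 / n) * log2(1 / n) for _ in range(n))

print(uniform_entropy(6))     # ~2.58 bits: the red die alone
print(uniform_entropy(6**2))  # ~5.17 bits: red and blue
print(uniform_entropy(6**3))  # ~7.75 bits: red, blue and green
```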

Experiment 4
Be very careful here.  There are two analyses of pair/no-pair equivalence classes, both of which are valid in general, but only one of which is valid for my current purposes.  To highlight the danger, I'll show you the wrong analysis first.

  1. Of the 36 elementary outcomes, I can see two equivalence classes: pairs and non-pairs.  There are only 6 pairs.  An example of one member of the non-pairs equivalence class: {Red 1 + Blue 2, Blue 1 + Red 2}.  Each non-pair equivalence class has two members, so its probability must be $\frac{1}{36} + \frac{1}{36} = \frac{1}{18}$.  If each non-pair equivalence class eats up two elementary outcomes, then there must be only $(36-6)/2 = 15$ of them.  So $I = -\left(\frac{6}{36} \log_2 \frac{6}{36} + \frac{15}{18} \log_2 \frac{15}{18}\right) = 0.65$ bits.  Wrong!  The mistake here is that I lumped all the pairs together into a single outcome, and all the non-pairs into another.  Hence I turned my two-die throw into an experiment with only two (highly unbalanced) meaningful outcomes.  Even tossing a fair coin reveals more information than this.
  2. I calculate the information content of a typical member of each category of equivalence class.  Then I multiply each of these two representative information counts by the number of classes in its category.  Mathematically, those class counts never appear inside the log.  So $I = 6 \times I_{\text{PAIR}} + 15 \times I_{\text{NO PAIR}} = -\left(6 \times \frac{1}{36} \log_2 \frac{1}{36} + 15 \times \frac{1}{18} \log_2 \frac{1}{18}\right) \approx 4.3$ bits.
This makes sense.  Throwing away some of the information puts this scenario somewhere between tossing one die and tossing two coloured dice.
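Here's a sketch of analysis 2 in code, assuming fair dice, with the equivalence classes built by sorting each ordered pair so that colour is forgotten (the sorting construction is mine, not from the post):

```python
from collections import Counter
from itertools import product
from math import log2

# Map each ordered roll to its colour-blind equivalence class by sorting.
classes = Counter(tuple(sorted(roll)) for roll in product(range(1, 7), repeat=2))

# 6 pair classes of size 1, 15 non-pair classes of size 2.
entropy = -sum((size / 36) * log2(size / 36) for size in classes.values())
print(entropy)   # ~4.34 bits, between one die (~2.6) and two ordered dice (~5.2)
```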

Experiment 5
I'm already expecting the result of experiment 5 to be somewhere between $5.2$ and $7.8$ bits just by analogy to what I learned in experiment 4.  I'm going to try to work out the equivalence classes without combinatorics, which as you'll see is a pain.  Actually, the pain is not in the arithmetic, but in the mental accounting for the equivalence classes - a pain you still have when you switch the arithmetic to combinatorics.  Anyway, we know there are $6^3=216$ elementary outcomes.  There are three categories of equivalence class: triplets (T), all different (D), and a pair plus a single (P).  Note the lesson learned from 4.1 above.  I'm not saying there are 3 equivalence classes, just 3 categories of equivalence class.

First, I'd like to know how the $216$ elementary outcomes get divided up among the three categories of equivalence class.  The triplets are easy.  I expect 6 elementary outcomes in T.  Also easy is D.  I have a free choice (from 6) for the first die, then 5 choices for the second and finally 4 for the third, making a total of 120 in the pot.  I'm just going to infer the last pot size: P must see $216 - 6 - 120 = 90$.

Next I take typical examples of each category of equivalence class and calculate their information value.
$P(D) = 6 \times \frac{1}{6^3}$, $P(T) = \frac{1}{6^3}$, $P(P) = 3 \times \frac{1}{6^3}$.

I manually worked out the number of permutations for P $\left\{112,121,211\right\}$ and D $\left\{123,132,213,231,312,321\right\}$.


Now, for D: if a typical class accounts for 6 of the 120 elementary outcomes in D, then there must be $\frac{120}{6}=20$ classes like that.  Similarly there must be $\frac{90}{3}=30$ of the Ps.  $I = 6 \times I_T + 20 \times I_D + 30 \times I_P \approx 5.7$ bits.
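The same sorting trick as before, again under the fair-dice assumption, confirms both the pot sizes and the $5.7$ bits:

```python
from collections import Counter
from itertools import product
from math import log2

# Colour-blind equivalence classes for three dice, via sorting.
classes = Counter(tuple(sorted(roll)) for roll in product(range(1, 7), repeat=3))

pots = Counter(classes.values())   # class size -> number of classes
print(pots)                        # {1: 6, 3: 30, 6: 20}: 6 outcomes in T, 90 in P, 120 in D

entropy = -sum((size / 216) * log2(size / 216) for size in classes.values())
print(entropy)                     # ~5.66 bits
```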

Throwing three dice unordered turns out to be not much different from throwing two dice ordered.  Experiment 5 has also demonstrated something else - this manual method of working out permutations is un-scalable.  Combinatorics is just that: an industrial-strength mathematical power tool for coping with larger and larger numbers of combinations and permutations.


The measured explosion of 1933


From the three rather spartan axioms of set-theoretic probability theory, a whole world of results follows by proof of lemmas of increasing complexity.  To help along the way, we can steal some of the basic findings of set theory.  I won't go into detail on them; just take them as read.
  1. $\exists \emptyset$
  2. $\exists E$, the sample space
  3. Set sizes can be finite, countably infinite and uncountably infinite
  4. All subsets of the integers are at most countably infinite
  5. The set of real numbers is uncountably infinite
  6. The set of real numbers in the $\left[0,1\right]$ interval is also uncountably infinite
  7. $A\cup A = A$
  8. $A \cup \emptyset = A$
  9. $A \cup E = E$
  10. $A \cup B = B \cup A$
  11. $A \cup B \cup C = A \cup (B \cup C) = (A \cup B) \cup C$
  12. $A \cap \emptyset = \emptyset$
  13. $A \cap A = A$
  14. $A \cap E = A$
  15. $A \cap B = B \cap A$
  16. $A \cap B \cap C = A \cap (B \cap C) = (A \cap B) \cap C$
  17. $(A^c)^c=A$
  18. $\emptyset^c=E$
  19. $E^c=\emptyset$
  20. $A \cup A^c = E$
  21. $A \cap A^c = \emptyset$
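None of these needs to be taken purely on trust; for finite sets they're easy to spot-check in code.  A quick sketch, with a hypothetical six-element $E$ standing in for the sample space:

```python
# Spot-check a few of the identities above with Python sets.
E = frozenset(range(1, 7))      # a stand-in sample space
A = frozenset({1, 2, 3})
B = frozenset({3, 4})
Ac = E - A                      # complement of A relative to E

assert A | A == A               # 7:  A ∪ A = A
assert A | E == E               # 9:  A ∪ E = E
assert A | B == B | A           # 10: commutativity of ∪
assert A & E == A               # 14: A ∩ E = A
assert E - Ac == A              # 17: (A^c)^c = A
assert A | Ac == E              # 20: A ∪ A^c = E
assert A & Ac == frozenset()    # 21: A ∩ A^c = ∅
print("all identities hold for this example")
```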

Renovation at the basement gambling den



The foundations of probability theory were reset by Kolmogorov in the 1930s.  Until then, they'd rested on the relative frequencies of outcomes of randomisation machines.  The new set of axioms made no mention of randomisation machines or experiments or gambling.  Instead they tied a set of numbers to a second set, of events; no need to say any more than that.  The first two axioms constrain the numbers; the third associates those numbers with the internal relationships of the reference set of events.  For any suitable reference set, each set of numbers which satisfies the axioms can be called a probability distribution with respect to the reference set $E$.

  1. $\forall A \subseteq E, P(A) \geq 0$
  2. $P(E)=1$
  3. $P(\bigcup_{k=1}^{\infty}A_k) = \sum_{k=1}^{\infty}P(A_k)$ for each and every infinite sequence of disjoint events $A_k$
Axiom 3 is doing the work.  The idea of disjoint events is from set theory: two events $A$ and $B$ are disjoint when they share no elements, that is, $A \cap B = \emptyset$.  The following shows two disjoint events.

[Figure: Venn diagram of two disjoint events]

And here's a pair of events which are not disjoint.

[Figure: Venn diagram of two overlapping events]

While this isn't the whole story on independence and mutual exclusivity, I won't go further on it at this point.

Axioms 1 and 2 allow you to synthesise some of the phenomena of relative frequencies without having to mention them: namely that, since a relative frequency is a ratio of one count over a second, larger, all-encompassing count, it will always lie somewhere between 0 and 1.  Kolmogorov, after all, is re-modelling the foundations, not pulling down the entire edifice.

Axiom 3 is all about relationships.  This set of numbers, this probability distribution, clearly can't just be any set of numbers: some of those numbers relate to each other in a very specific way.  Axiom 3 states that there's a summation relationship (actually infinitely many of them) between one number and a bunch of other numbers precisely insofar as the event behind the one number is the $\bigcup$ of the disjoint events behind the others.

Spelling it out for just one case: imagine an infinite sequence of disjoint events $e_1, e_2, \ldots$.  Perform a set union on them all and call the resulting set $e$.  Every $e_i$, and $e$ itself, is already in your reference set of events.  Now that we have identified this set relationship between the $e_i$ and $e$, we can make a corresponding claim about the set of numbers that constitute this particular probability distribution.  Call those numbers $n_1, n_2, \ldots$ and the single number $N$.  Axiom 3 tells us that we can be certain that $n_1 + n_2 + \cdots = N$.
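A finite numeric instance of the same claim, assuming a fair die whose singleton events $\{1\}, \ldots, \{6\}$ each get the number $\frac{1}{6}$:

```python
from fractions import Fraction

# A probability distribution assigning 1/6 to each singleton event.
P = {k: Fraction(1, 6) for k in range(1, 7)}

def prob(event):
    """P of any subset of {1,...,6}, built up by additivity."""
    return sum(P[k] for k in event)

disjoint = [{1}, {2}, {3, 4}]           # pairwise disjoint events
union = set().union(*disjoint)          # their union, {1, 2, 3, 4}

assert prob(union) == sum(prob(e) for e in disjoint)
print(prob(union))                      # 2/3 on both sides of axiom 3
```

Axiom 3 proper quantifies over infinite sequences of disjoint events; the finite version used here follows from it by padding the sequence out with copies of $\emptyset$.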

This isn't a moral point about gambling or even empiricism.  If anything it is motivated by an aesthetic-mathematical impulse.