## Thursday, 14 April 2011

### At last, mathematical formulae on Blogger posts

See http://mnnttl.blogspot.com/2011/02/latex-on-blogger.html - it works for me.

### Sucking the pips off

Around any randomisation machine, humans build rules which allow you to throw away information. How do you analyse this mathematically? Here, somewhat artificially, I'll constrain the analysis to dice.

Imagine one die with no distinguishing marks. What's the information content? Yes, 0. You learn nothing new on each throw. Mathematically, the information content is $I = -\sum_i{p_i \log_2(p_i)}$. You can think of the blank die as a single outcome with probability $p = 1$, so $I = -1 \times \log_2(1) = 0$ bits.

Now re-imagine the pips. This time $I = -6\times{\frac{1}{6} \log_2(\frac{1}{6})} = \log_2 6 \approx 2.58$ bits. You can still throw a pipped die and mentally decide to ignore the number which lands face up. What have you done? You've created in your head an equivalence class. You've said to yourself, 'for my own purposes I will assume an outcome which is a combination of all six elementary outcomes'. Since the elementary outcomes are mutually exclusive, you can find the probability of your all-encompassing equivalence class as the sum of the elementary probabilities. Let $E_c$ be your all-encompassing equivalence class and $e_1$ be the elementary outcome of getting a 'one' face up, etc. Then $E_c = e_1 \cup e_2 \cup e_3 \cup e_4 \cup e_5 \cup e_6$ and, by one of the three basic axioms of probability, $P(E_c) = P(\bigcup_{i=1}^6 e_i) = \sum_{i=1}^6 P(e_i) = \frac{1}{6}\times 6 = 1$.
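Both situations are easy to check numerically. Here is a minimal sketch in Python (my own illustration, not from the original post): a small `entropy` helper computes $-\sum p_i \log_2 p_i$ for any distribution, covering both the blank die and the pipped die.

```python
from math import log2

def entropy(probs):
    """Shannon information content in bits: -sum(p * log2(p))."""
    return sum(-p * log2(p) for p in probs if p > 0)

print(entropy([1.0]))      # blank die, one certain outcome -> 0.0 bits
print(entropy([1/6] * 6))  # pipped die -> log2(6) = 2.584... bits
```

The guard `if p > 0` matters only for distributions containing impossible outcomes, which contribute nothing to the sum.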

So, just by imagining it, you can turn off your randomisation machine. The same trick can be used to turn your randomisation machine into a coin-flipper, which, as you can guess, provides just $1$ bit of information: just imagine two equivalence classes, even-numbered pips and odd-numbered pips. So what you have is a randomisation machine which offers you a maximal amount of information. The rules of your game, your context, determine how you might want to throw some of that information away for the purposes of your game. You've combined elementary outcomes. So one die can deliver a uniform distribution of 6 events of probability $\frac{1}{6}$, or a coin flip. You can see how you could imagine randomisation machines giving two unbalanced equivalence classes, of probability $\left\{ \frac{1}{6}, \frac{5}{6}\right\}$ ($I = 0.65$ bits) or $\left\{ \frac{1}{3}, \frac{2}{3}\right\}$ ($I = 0.92$ bits). You could choose to implement this in a number of different ways. For example, in the $\left\{ \frac{1}{6}, \frac{5}{6}\right\}$ case, by imagining 'one pip' to be the first of your equivalence classes and 'either 2 or 3 or 4 or 5 or 6 pips' to be your second: $E_1$ and $E_{2\cup3\cup4\cup5\cup6}$, if you will. But just as good a job could be achieved by $E_2$ and $E_{1\cup3\cup4\cup5\cup6}$, etc.
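The coin-flip and unbalanced partitions above can be verified with the same entropy calculation, applied to the merged class probabilities rather than the six elementary ones. A quick sketch (mine, not the author's):

```python
from math import log2

def entropy(probs):
    """Shannon information content in bits: -sum(p * log2(p))."""
    return sum(-p * log2(p) for p in probs if p > 0)

print(round(entropy([1/2, 1/2]), 2))  # odd vs even pips: 1.0 bit
print(round(entropy([1/6, 5/6]), 2))  # one pip vs the rest: 0.65 bits
print(round(entropy([1/3, 2/3]), 2))  # two pips vs four: 0.92 bits
```

Note that every way of coarsening the six elementary outcomes gives at most the $\log_2 6 \approx 2.58$ bits of the full die; merging outcomes can only throw information away.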

Equivalence classes are a mathematically formal way of expressing the idea of treating one or more possibilities as coming to the same thing for your current purposes. They map out a geography of interest, and that geography is constrained only by the granularity of the maximal information state of your randomisation machine (playing cards, for example, are more fine-grained, since you can have $52$ distinct elementary outcomes).

In the next post, I'll look at how to interpret multiple repeats of the die-tossing experiment, but I'll end by pointing out that, from an analytical point of view, it doesn't matter whether you consider multiple repeats as happening simultaneously (you roll two differently coloured dice) or serially (you roll the white die and note the result, then roll the blue die and note the result), as long as you are consistent about which of the two parallel-rolled dice you report first. Since these two dice outcomes are genuinely independent, I'll show how the informational additivity of independent random events works mathematically too. This leads into considerations about becoming indifferent to order or retaining order (combinations and permutations respectively).

This reminds me of the Samuel Beckett sucking stones extract from Molloy.


### Tossing away information

Continuing from my initial analysis, I would like to model the consequences of sheep bones (just like all other animals' bones) being white.

The crucial human animation in using sheep heel bones as random event generators is the act of tossing. In the act of tossing, you lose knowledge of the order in which your four bones will lie. This wouldn't be an issue if all four bones were of a different colour. Or perhaps if the bones were marked not with 1, 3, 4, 6 on each of the four bones, but with 16 different numbers across them. If humans had etched 16 different numbers (pips) on their bones, they'd be using the maximum amount of information possible in that act, namely 6.88 bits. But that doesn't happen. Instead we humans make 4 more or less identical sets of markings on the bones. Then, when we toss, we toss away some information. But how much?
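The 6.88-bit figure can be reproduced if we assume the classic landing frequencies for an astragalus: roughly 0.4 for each of the two broad sides and 0.1 for each of the two narrow sides. Those probabilities are my assumption (a commonly quoted empirical estimate), not something stated in this post, but they agree with the quoted figure up to rounding:

```python
from math import log2

def entropy(probs):
    """Shannon information content in bits: -sum(p * log2(p))."""
    return sum(-p * log2(p) for p in probs if p > 0)

# Assumed landing frequencies for one astragalus: broad sides ~0.4 each,
# narrow sides ~0.1 each. These are an empirical estimate, not a measured fact.
bone = [0.4, 0.4, 0.1, 0.1]
print(round(entropy(bone), 2))      # one bone: ≈ 1.72 bits
print(round(4 * entropy(bone), 2))  # four order-tracked bones: ≈ 6.89 bits
```

Because the four throws are independent, the ordered four-bone entropy is simply four times the single-bone entropy.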

To answer this question, consider the die. One die delivers about 2.58 bits per roll. With two dice, if order is important, you have 5.17 bits (imagine each of the dice had a different colour). With three, 7.75 bits (again, imagine each of the three dice a different colour). You can see how, when you run this experiment, information is additive as the sample space size grows multiplicatively. You can also see that, with the addition of colour, your parallel toss does not lose track of which die had which value. This parallel toss is the same as if you had tossed one die two (or three) times, and taken note of each result. It is a kind of 'sampling with replacement' activity in so far as the probability of any single outcome is independent of earlier throws (the throws are independent and identically distributed).
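The additivity claim is easy to confirm numerically: the ordered sample space for $n$ distinguishable dice has $6^n$ equally likely outcomes, so the entropy is $n \log_2 6$. A quick check (my own sketch):

```python
from math import log2

def entropy(probs):
    """Shannon information content in bits: -sum(p * log2(p))."""
    return sum(-p * log2(p) for p in probs if p > 0)

one = entropy([1/6] * 6)        # one die: log2(6) ≈ 2.585 bits
two = entropy([1/36] * 36)      # ordered pair of distinguishable dice
three = entropy([1/216] * 216)  # ordered triple
print(round(two, 2), round(three, 2))  # 5.17 7.75: exactly 2x and 3x one die
```

Multiplying the sample space size multiplies the probabilities, but the logarithm turns that multiplication into addition of bits.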

But bones are white. And there's a pre-existing tradition of making each bone carry the same number and type of markings as each of the others. Most likely early dice were crafted out of the astragalus, and they would have inherited this feature of being practically indistinguishable from each other. That means that, when two or three are tossed, information is lost in comparison to the 'order important' experiment. Of course, the lost information is precisely that of ordering. But how much? For two indistinguishable tossed dice you now have only 4.34 bits of information. When you do the 'order unimportant' analysis on four astragali, the information content drops from 6.88 to around 4.3 bits.
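The two-dice 'order unimportant' figure can be recomputed by brute force: enumerate all 36 ordered pairs, merge each pair with its reversal into one unordered equivalence class, and take the entropy of the resulting class probabilities. A sketch (mine, not the author's):

```python
from math import log2
from itertools import product
from collections import Counter

def entropy(probs):
    """Shannon information content in bits: -sum(p * log2(p))."""
    return sum(-p * log2(p) for p in probs if p > 0)

# Sorting each ordered pair merges (2, 5) and (5, 2) into the same
# unordered outcome, exactly as indistinguishable dice force you to.
counts = Counter(tuple(sorted(pair)) for pair in product(range(1, 7), repeat=2))
probs = [c / 36 for c in counts.values()]
print(len(counts), round(entropy(probs), 2))  # 21 classes, ≈ 4.34 bits
```

The 21 classes are not equally likely (each double has probability $\frac{1}{36}$, each mixed pair $\frac{2}{36}$), which is why the answer is 4.34 bits rather than $\log_2 21 \approx 4.39$.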

Isn't that amazing? Given what we know about the colour of bones, the number of sheep legs and the casual act of tossing collections of indistinguishable objects, we built up a randomisation machine which would conveniently deliver for us approximately 4.3 bits. When we smooth and chip away at a sheep-load of astragali and make them into 6-sided dice, we manufacture a convenient randomisation machine which delivers roughly the same 4.3 bits of information using only two dice. That is impressively close to the original. All those religious-oracular rule books, all those long-forgotten games could still be randomised by a machine of approximately the same informational capacity. One die would not have been enough, three too much. But two would have been just right. Our culture got to keep the innovations built up around the astragali.

My guess as to why each bone (and, later, die) was marked identically is that in the beginning was the single astragalus. It was a much easier step to make a second instance of that same pip-marked bone. And a third, and a fourth.

But why did humans go to the bother of crafting a die if their main practice of four-bone tossing could be equivalently replaced with two-dice tossing? Was it a matter of aesthetics? Did the roughly cuboid astragalus, with only 4 landable sides, present an affront to our sense of symmetry? Surely it couldn't be to make any kind of mathematical analysis more amenable, though it isn't impossible that some people might have worked out the possibilities of dice. My money's on our desire to manufacture a symmetric, crafted, more aesthetically pleasing randomiser machine. The uniform distribution perhaps also made it easier to plan and design new equivalence classes and interpretive games based on that uniform element of randomness.
