
Sunday, 9 June 2019

MISSISSIPPI scramble

How many ways are there, Blitzstein and Hwang ask, to permute the word Mississippi?  There are two approaches.  Broadly, we're in the 'naive probability' world where all possibilities are equally likely, so counting methods are going to be useful.

First, and this is generally a good strategy, we give all 11 letters a unique identity.  That is to say, the first s must be thought of as distinct from the second, third and fourth, and so on.  This is analogous to understanding that when you throw a pair of dice, there is logically always a first die and a second die, each with its own identity.

There are $11!$ ways (39,916,800) to permute the 11 'distinct' letters (multiplication rule, based on sampling without replacement).  That's more ways than there are actual people currently residing in Mississippi.

Looks like it would take another 500 years, at the current rate of population growth, before we'd run out of basic ways of sharing all the permutations of Mississippi with its inhabitants.

But at this point we want to throw away (divide out) certain permutations.  From a schoolteacher's point of view, wherever those four s's ended up, we don't care about their particular order now.  In other words, we're making the conscious decision to consider an s like any other s.  And an i like any other i.  And a p like any other p.  A kind of homogenisation.  The m stands unique and proud at the head of the word.  But in terms of permutations, the m is the unmodelled case, since, when you've decided how to fill the 10 other slots using all your s's, i's and p's, the m just has to go where it is told.  What a comedown - tables turned on the m.

There are four s's, and four i's.  And we need to know how much not to care about their order.  If we did care about their order, we could permute them $4!$ ways.  Similarly there are $2!$ orderings of p.  So in total we want to divide the $11!$ by $4!\,4!\,2!$, meaning that the 'schoolteacher's answer' (where all letters are considered homogeneous) is $\frac{11!}{4!\,4!\,2!}$.

The second solution is to break it down into 3 sections, each with a binomial 'n choose k' element.
Let's start with the s.  We have 11 slots and must choose 4 of them.  So ${11 \choose 4}$.  Using the multiplication rule, you could then drop your 4 i's into any of the remaining 7 slots, ${7 \choose 4}$, and finally your two p's can take up any of the three remaining slots, ${3 \choose 2}$.  Again, m is the unmodelled, bossed-about letter who must take the last slot, wherever that is.

So, it must be the case that $\frac{11!}{4!\,4!\,2!} = {11 \choose 4}{7 \choose 4}{3 \choose 2}$.

What does R have to say about that?
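A minimal sketch, using only the built-in factorial and choose functions:

```r
# Schoolteacher's answer: divide out the internal orderings of the s's, i's and p's
factorial(11) / (factorial(4) * factorial(4) * factorial(2))   # 34650

# The binomial route: place the s's, then the i's, then the p's
choose(11, 4) * choose(7, 4) * choose(3, 2)                    # 34650

# The 'radical identity' count, with all 11 letters kept distinct
factorial(11)                                                  # 39916800
```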




It agrees, and gives us the final (schoolteacher's) answer, 34,650 versus the 'radical identity' answer of 39,916,800.

So perhaps we could give the schoolteacher's permutations just to the Hispanic residents (excluding Puerto Ricans, sorry).

Tuesday, 25 April 2017

Birthdays

Call event $A$ the event that at least 2 people out of $k$ share the same birthday.  The probability of this is one minus that of the complement $A^c$, namely that all $k$ individuals have different birthdays.  This is easier to calculate using permutations and counting methods.  The answer is $1 - \frac{365!}{365^k(365-k)!}$.  To break this apart, you're finding the number of ways in which you can permute $k$ from $365$, which is $365 \times 364 \times \cdots \times (365-k+1)$.  You divide this by the total number of ways each of the $k$ people could conceivably have their birthday, in an unconstrained way.  This is simply $365 \times 365 \times \cdots \times 365$, $k$ times, or $365^k$.  Generalising this away from dates, you get $1 - \frac{k!\binom{n}{k}}{n^k}$, since $k!\binom{n}{k} = \frac{n!}{(n-k)!}$.  Selecting $k$ people like this is at base a kind of sampling without replacement.  How does one solve a problem like this?  The first step is to realise that solving its complement may be more tractable than solving the primary problem.  This trick is as old as the origins of probability theory itself - see how Pascal used it for de Méré's foundational gambling problem.

Secondly, apply the basic law of sampling with replacement: picking $k$ from $n$ with replacement can be done in $n \times n \times \cdots \times n$ ($k$ times), or $n^k$ ways for short.  This is the free and unconstrained selection of $k$ objects, each of which can have one of $n$ attributes.  Then, with this denominator counted, you work out the top line, namely the number of ways in which you can have $k$ people all of whom have different birthdays.  This is sampling without replacement, also known as a permutation, on the numerator.  Perhaps $1-\frac{k!\binom{n}{k}}{n^k}$ is clearer.  Permutation is the act of creating a list from a set.
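As a numerical sanity check, base R happens to ship a pbirthday function in the stats package; here it is alongside the product spelled out by hand, for the classic $k=23$:

```r
k <- 23

# Numerator: permute k distinct birthdays from 365 (sampling without replacement)
p_all_different <- prod(365:(365 - k + 1)) / 365^k
1 - p_all_different            # 0.5072972

# The same thing via the built-in
pbirthday(k, classes = 365)    # 0.5072972
```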

To get to this point mathematically you're simplifying reality.  No twins considered, no increased school prevalence of joint birthday parties, no modelling of the non-uniform distribution of birthdays throughout the year.  The birthday problem appears in most probability textbooks.  But now to find alternative uses for it, and to come to a conclusion about how factorial gets implemented in real systems.
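On the implementation point: you can't evaluate $\frac{365!}{(365-k)!}$ naively, because factorial overflows double precision long before 365.  The usual dodge, sketched here, is to work in log space with lfactorial:

```r
factorial(170)   # 7.257416e+306, the largest factorial a double can hold
factorial(171)   # Inf - overflow
factorial(365)   # Inf, so the textbook formula can't be computed head-on

# Work in log space instead: log(365!) - log((365-k)!) - k*log(365)
k <- 23
1 - exp(lfactorial(365) - lfactorial(365 - k) - k * log(365))  # 0.5072972
```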

There are 2 dimensions to this problem.  First, you take a sample of $k$ objects, presumably from a larger population.  This act of sampling is assumed to be completely uniform - namely, the likelihood of picking any one object is the same as the likelihood of picking any other.  Notice that we don't have (or need) a handle on the size of the population.

The second dimension to the problem implies that every member of your population has been deterministically classified.  And further, that this categorisation has a finite (or countably infinite) range of possibilities and that this categorisation can be mapped onto a counting label.  With birthdays, you can be sure that whoever you pick, they were surely born on one of the 365 dates of the year.  The cardinality of this categorisation is of course 365.  Be clear however that the categorisation itself doesn't have to be inherently numeric, since you're attaching a cardinality to the categorisation.  What I mean by this can be spelled out with another example.  We could create a palette of 50 colours and for every member of the population map their eye colour to one of those 50 colour buckets.  Clearly there's no inherent numerical or ordering aspect to these 50 colours, yet we know that there are 50 of them.  Notice here that with eye colour it is unlikely that the bucketing would be as uniformly distributed as the putative day of year was.  Without the uniformity of the bucketing distribution you wouldn't arrive at the birthday solution but would need a more complex one.

So let's step back.  We need a categorisation which - unlike many real-world categorisations, which are likely to be distributed by some kind of power law - is uniformly distributed over the population.

Condition 1 says we implicitly pick $k$ objects with uniform likelihood from a population. Condition 2 says that the population must be categorised in a perfectly uniform way.  Without condition 2 you don't get a denominator with $n^k$ in it.

What kind of categorisation is so perfectly uniformly distributed?  How could it be so?  Real-world, informationally contentful categorisations are rarely so uniform.  Using the technical definition of information, however, random categorisations are indeed the most informationally rich - they reduce the largest amount of uncertainty when the categorisation is revealed to you.  Even if the categorisation is deterministic for the objects, you still, in selecting $k$ from the population, might choose any number of repeats.  This probability calculation tells you how likely you are to achieve the 'all clear' or not.

I must link these 2 dimensions together - you don't need the sampling process to have an unconditionally uniform likelihood of picking any member.  It only needs to be uniform with respect to the categorisation, that is, conditional on the particular categorisation you've chosen.  In other words, the selector of the sample might be quite non-uniform, provided that the probability of selecting any one object, times the conditional probability of the category given the object, works out as uniformly distributed over the categories.
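A quick simulation illustrates this (a sketch with assumed numbers: a population of 10,000 whose selection weights are exponentially distributed, and birthdays assigned uniformly, independently of the weights):

```r
set.seed(1)
n_pop <- 1e4
w     <- rexp(n_pop)                             # wildly non-uniform selection weights
bday  <- sample.int(365, n_pop, replace = TRUE)  # uniform categorisation of the population

trial <- function() {
  picked <- sample.int(n_pop, 23, prob = w)      # biased choice of *people*...
  any(duplicated(bday[picked]))                  # ...but a uniform induced birthday distribution
}
mean(replicate(5000, trial()))                   # approx. 0.507, matching pbirthday(23)
```

The selector is heavily biased over people, but because the weights carry no information about birthdays, the induced category distribution stays uniform and the standard answer survives.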

OK so with this in mind I'd like to consider other examples of the birthday problem.
Start with a binary classification.  Say boys and girls - probability examples often make the approximating assumption that they're equally likely in the population.  Let's say we pick two people at random.  What are the chances that the two people are of the same sex?  Clearly this ought to be $\frac{1}{2}$.  Let's see:  $1 - \frac{2 \times 1}{(2^2)(0!)} = \frac{1}{2}$.
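Checking with the general machinery:

```r
1 - (2 * 1) / 2^2            # 0.5, the chance the two are the same sex
pbirthday(2, classes = 2)    # 0.5, via the built-in
```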

How about good old dice.  I have a game with a single die.  The play fee is $1$; the player throws 3 times in a row, and if the same number comes up more than once, they win.  If you were going to devise this as a break-even game, what reward ($r$) can you afford to give to the gambler?  The chance of a win is $1- \frac{6 \times 5 \times 4}{6^3}$, or $1-\frac{120}{216}$, which equals $0.444$ or 44.4%.  So $0.444 \times r = 1$.  Hence $r=2.25$ is the break-even level of reward.
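The arithmetic in R:

```r
p_win <- 1 - prod(6:4) / 6^3   # 1 - 120/216 = 0.4444444
1 / p_win                      # 2.25, the break-even reward for a $1 play fee
```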

A ticket machine allocates tickets for passengers to travel on a 20-carriage train such that it allocates each passenger to a carriage uniformly.  Five drunken Irishmen buy tickets.  What are the chances that at least two of them end up in the same carriage, at which point they will start singing sad ballads loudly together?  The answer is $1-\frac{20 \times 19 \times 18 \times 17 \times 16}{20^5}$, or $1-\frac{1,860,480}{3,200,000}$, or $1-0.5814$, or $0.42$ approximately: a 42% chance.
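Again the built-in agrees:

```r
1 - prod(20:16) / 20^5       # 0.4186
pbirthday(5, classes = 20)   # same thing: roughly a 42% chance of ballads
```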

You have in your hands a digital contract.  And a fraudulent one which benefits you.  You also have access to the same hash function used by some cryptographic community - e.g. a Bitcoin or PGP hash.  Now, in a legitimate process, this hashed contract would be signed by the patsy's private key (which you don't have).  So what you do is make a large number of informationally identical but subtly different copies of the legitimate contract, and a large number of similar copies of your putative fraudulent contract.  Keep producing new alternatives on both sides until finally you find a pair such that the legitimate and the illegitimate versions evaluate to precisely the same hash.  Finding the collision is much easier than you might imagine, thanks to the birthday problem.  Once found, you submit the legitimate contract to the patsy.  The patsy signs it.  You scrape away the digital signature and attach it to the same-hash-producing illegitimate contract, benefiting you.  Present this for payment and you will have exploited the patsy.
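The square-root scaling is easy to see in a toy simulation (a sketch: random draws from a hypothetical 16-bit hash space stand in for a real hash function):

```r
set.seed(42)
space <- 2^16   # a toy 16-bit 'hash'

# How many random documents until two of them collide?
first_collision <- function(space) {
  seen <- integer(0)
  repeat {
    h <- sample.int(space, 1)
    if (h %in% seen) return(length(seen) + 1)
    seen <- c(seen, h)
  }
}
mean(replicate(200, first_collision(space)))  # a few hundred, of order sqrt(2^16), not 2^16
```

This is why collision resistance demands hash outputs roughly twice as long as the brute-force security level you're aiming for.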

You have to estimate the population of sparrows in a woodland area but you're on a budget.  You catch 100 sparrows, tag them with 'TPS' for 'The Persistent Struggle' then release them again.  A week later you return and you catch another 200.  Of these 200, 6 had TPS tags (3% or 0.03).  So you estimate the population in this woodland as 100/0.03 or 3,333 sparrows.
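The arithmetic - this is the classic capture-recapture (Lincoln-Petersen) estimate:

```r
tagged <- 100; second_catch <- 200; retagged <- 6
second_catch * tagged / retagged   # 3333.3, the estimated sparrow population
```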

The following link contains some history on the birthday problem.

Thursday, 14 April 2011

Tossing away information



Continuing from my initial analysis, I would like to model the consequences of sheep bones (just like all other animals' bones) being white.

The crucial human animation in using sheep heel bones as random event generators is the act of tossing.  In the act of tossing, you lose knowledge of the order in which your four bones will lie.  This wouldn't be an issue if all four bones were of a different colour.  Or perhaps if the bones were marked not with 1, 3, 4, 6 on each of the four bones, but with 16 different numbers.  If humans had etched 16 different numbers (pips) on their bones, they'd be using the maximum amount of information possible in that act, namely 6.88 bits.  But that doesn't happen.  Instead we humans make 4 more or less similar sets of markings on the bones.  Then, when we toss, we toss away some information.  But how much?

To answer this question, consider the die.  One die gives 2.6 bits per roll.  With two dice, if order is important, you have 5.17 bits (imagine each of the dice had a different colour).  With three, 7.75 bits (again, imagine each of the three dice a different colour).  You can see how, when you run this experiment, information is additive as the sample space size grows multiplicatively.  You can also see that, with the addition of colour, your parallel toss does not lose track of which die had which value.  This parallel toss is the same as if you had tossed one die two (or three) times, and taken note of each result.  It is a kind of 'sampling with replacement' activity, in so far as the probability of any single outcome is independent of earlier throws.  (The Markov property.)
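The bit counts are just logs of the sample space sizes:

```r
log2(6)     # 2.585 bits for one die
log2(6^2)   # 5.17 bits for two colour-distinguishable dice
log2(6^3)   # 7.75 bits for three
```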


But bones are white.  And there's a pre-existing tradition of making each bone contain the same number and type of markings as each of the others.  Most likely early dice were crafted out of the astragalus, and they would have inherited this feature of being practically indistinguishable from each other.  That means, when two or three are tossed, information is lost, in comparison to the 'order important' experiment.  Of course, the lost information is precisely that of ordering.  But how much?  For two indistinguishable tossed dice you now only have 4.3 bits of information. When you do the 'order unimportant' analysis on four astragali, the information content drops from 6.88 to 4.3 bits.
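These figures can be reproduced directly.  For the two indistinguishable dice you pool each unordered outcome's probability mass; for the astragali I assume the often-quoted sheep heel bone odds of roughly 0.4/0.4/0.1/0.1 per side, which is what makes the ordered four-bone toss come out at 6.88 bits:

```r
entropy <- function(p) -sum(p * log2(p))

# Two indistinguishable dice: 6 doubles at 1/36 each, 15 mixed pairs at 2/36 each
entropy(c(rep(1/36, 6), rep(2/36, 15)))   # 4.34 bits, down from log2(36) = 5.17

# One astragalus with the assumed side probabilities, thrown four times, order tracked
p_astragalus <- c(0.4, 0.4, 0.1, 0.1)
4 * entropy(p_astragalus)                 # 6.89 bits
```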

Isn't that amazing?  Given what we know about the colour of bones, the number of sheep legs and the casual act of tossing collections of indistinguishable objects, we built up a randomisation machine which would conveniently deliver for us 4.3 bits.  When we smooth and chip away at a sheep-load of astragali and make them into 6-sided dice, we manufacture a convenient randomisation machine which delivers 4.3 bits of information to us using only two dice.  That is impressively close to the original.  All those religious-oracular rule books, all those long-forgotten games could still be randomised by a machine of approximately the same degree.  One die would not have been enough, three too much.  But two would have been just right.  Our culture got to keep the innovations built up around the astragali.

My guess as to why each bone (and, later, die) was marked identically is that in the beginning was the single astragalus.  It was a much easier step to make a second instance of that same pip-marked bone.  And a third, and a fourth.

But why did humans go to the bother of crafting a die if their main practice of four-bone tossing is equivalently replaced by two-dice tossing?  Was it a matter of aesthetics?  Did the roughly cubic astragalus with only 4 landable sides present an affront to our sense of symmetry?  Surely it couldn't be to make any kind of mathematical analysis more amenable, though it isn't impossible that some people might have worked out the possibilities of dice.  My money's on our desire to manufacture a symmetric, crafted, more aesthetically pleasing randomiser machine.  The uniform distribution perhaps also made the construction of new equivalence classes, and of interpretive games based on that uniform element of randomness, easier to plan and design.