Thursday 27 April 2017

Combinations and permutations

I have always struggled with these words.  They convey too similar a set of ideas for me to ever have bothered to properly distinguish them.  Mathematically prior is set theory.  In sets, the order doesn't matter until you start imposing an order. 

When it is set on set action, you're talking about combinations.  That is to say when you define a subset of some known cardinality out of some other set and want to know all the ways you could pick that new subset, then you're performing a set on set operation and estimating a combination.  That is to say you're estimating the number of different ways you could have extracted the subset from the super set.  When the order of picking is important, you are permuting, that is, making a list from a set. 

You're the  best basketballer in school.  It is lunchtime and you're captain.  You get to pick your team of 5 from your 20 buddies all eager to play basketball.  The combination calculation is going on.  In the end when you're in the team you're in the team.  Order doesn't matter.  Now it is the end of the year and you're calling out the 5 best players in order to give them a prize.  This is a permutation.   Notice that in both cases there was a form of sampling without replacement.  When you're doing sampling without replacement like this, then factorials are in operation (and this generalises to the Gamma function).

We're all familiar with sampling with replacement but in real world situations we most often come across sampling without replacement.  Consider how lottery balls are drawn.  Or school places are selected.  Or subjects of a scientific experiement are ideally picked.  Of course, numerically when the population size is so much larger than the sample size then the effect on probabilities of the choice to replace or not becomes negligible.

Is one operation more basic than the other.  One way to look at this is to realise that the basic rule of combinations with replacement is the multiplication rule (which when the sample space stays the same from choice to choice results in $a^b$ patterns).  The factorial operation caters for non-replacement, and the power operation caters for replacement.

Still, is one operation more basic than the other.  From a human's point of view, the act of chosing is played out in real time, imposing an actual permutation order even if the permutation order is semantically and logically meaningless. From the point of view of mathematics, combinations can be considered the step of effacing this built in permutation order which we want to see as irrelevant.  Combinations don't care as much.  They are smaller numbers than their peer permutation.  By $n!$ in fact.

Selecting end of year basketball winners can be seen as the act of chosing the team (a combination) then looking at this team and further imposing a specific permutation on that team.  That permutation buys you another $n!$ possibilities.  Of coure, this isn't what it feels like when you're picking a list from a set.  It feels like you're going straight from the sample space to the list in a single move.
But when you think of picking a list in this way , then $n!\binom{n}{k}$ makes a bit more sense.
Alternatively, the direct act of selecting (without replacement) $k$ from $n$ in order feels naturally like $\frac{n!}{(n-k)!}$

Tuesday 25 April 2017

Birthdays

Call event $A$ the event that at least 2 people out of $k$ share the same birthday.  The probability of this is one minus the complement  $A^c$, namely that all $k$ individuals have different birthdays.  This is easier to calculate using permutations and counting methods. The answer is $1 - \frac{365!}{365^k(365-k)!}$  To break this apart, you're finding the number of ways in which you can permute $k$ from $365$, which is $365 \times 364 \times ... \times 365 -k$.  You divide this by the total number of days of the year each of $k$ people could conceivably have their birthday, in an unconstrained way.  This is simply $365 \times 365, k$ times, or $365^k$.  Generalising this away from dates, you get $1 - n!\binom{n}{k} \frac{1}{n^k}$.  Selecting $k$ people like this is at base a kind of sampling without replacement.  How does one solve a problem like this.  The first step is to realise that solving its complement may be more tractable than solving the primary problem. This trick is as old as the origins of probability theory itself - see how Pascal used it for DeMere's foundational gambling problem.

 Secondly, the basic law of sampling with replacement, namely the act of picking $k$ from $n$ with replacement can be done in $n \times n \times n ... \times n, k$ times, or $n^k$ for short.  This is the free and unconstrained selection of $k$ objects each of which can have one of $n$ attributes.  Then with this denominator counted, you work out the top line, namely the ways in which you can have $k$ all of whom have different birthdays.  This would be sampling with replacement, also known as a permutation, on the numerator.  Perhaps $1-n!\frac{\binom{n}{k}}{n^k}$ is clearer. Permutation is the act of creating a list from a set.

To get to this point mathematically you're simplifying reality. No twins considered, no increased school prevalence of joint birthday parties, no modelling of the non uniform distribution of birthdays throughout the year. The birthday problem appears in most probability textbooks.  But now to find alternative uses for it and to come to a conclusion about the implementation of factorial in systems.

There are 2 dimensions to this problem - first you take a sample of $k$ objects, presumably from a larger population.   This act of sampling is assumed to be completely uniform - namely that the likelihood of you having picked this object is considered to be the same as the likelihood of any other object. Notice that we don't have (or need) a handle on the size of the population.

The second dimension to the problem implies that every member of your population has been deterministically classified.  And further, that this categorisation has a finite (or countably infinite) range of possibilities and that this categorisation can be mapped onto a a counting label. With birthdays, you can be sure that whoever you pick, they were surely born on one of 365 dates of the year. The cardinality of this categorisation  is of course 365. Be clear however that the categorisation itself doesn't have to be inherently numeric, since you're attaching a cardinality to the categorisation.  What I mean by this can be spelled out with another example. We could create a palette of 50 colours and for every member of the population map their eye colour to one of those 50 colour buckets. Clearly there's no inherent numerical or ordering aspect to these 50 colours yet we know that there are 50 of them. Notice here that with eye colour it is unlikely that the bucketing would be as uniformly distributed as the putative day of year was. Without the uniformity of the bucketing distribution you wouldn't arrive at the birthday solution but would need a more complex solution.

So let's step back. We need a categorisation which - unlike many real world categorisations, which are probably likely to be distributed by some kind of power law distribution - is uniformly distributed over the population.

Condition 1 says we implicitly pick $k$ objects with uniform likelihood from a population. Condition 2 says that the population must be categorised in a perfectly uniform way.  Without condition 2 you don't get a denominator with $n^k$ in it.

What kind of categorisation is so perfectly uniformly distributed? How could it be so? Real world informationally contentful categorisations are rarely so uniform. Using the technical definition of information however, random categorisations are indeed the most informationally rich - they reduce the largest amount of uncertainty when the categorisation is revealed to you.  Even if the categorisation is deterministic for the objects, you still, in selecting $k$ from the population, might chose any number of repeats.  This probability calculation tells you how likely you are to achieve 'all clear' or not.

I must link these 2 dimensions together - you don't need the sampling process to have an unconditionally uniform likelihood of picking any member. It only needs to be uniform with respect to the categorisation, that is conditional on the particular categorisation you've chosen. In other words the selector of the sample might be quite non-uniform provided that the categorisation probability distribution is such that the probability of having selected any one object times the conditional probability of picking a category given the object works out as uniformly distributed.

OK so with this in mind I'd like to consider other examples of the birthday problem.
Start with a binary classification.  Say boys and girls - probability examples often  make the approximating assumption that they're equally likely in the population. Let's say we pick two people at random.  What are the chances that the two people are of the same sex?  Clearly this ought to be $\frac{1}{2}$.  Let's see:  $\frac{2\times 1}{(2^2)(0!)} $.

How about good old dice.  I have a game with a single die.  A play fee of $1$ is associated with the game where the player throws 3 times in a row and if the same number comes up you win.  If you were going to devise this as a break-even profitable game, what reward ($r$) can you afford to give to the gambler?  $1- \frac{6 \times 5 \times 4}{6^3}$ or $1-\frac{120}{216}$ which equals $0.444$ or 44.4%.  So $0.444 \times r = 1$.  Hence $r=2.25$ is a break-even level of reward.

A ticket machine allocates tickets for passengers to travel on a 20 carriage train such that it allocates the passenger to a carriage uniformly.  5 Drunken Irish men buy tickets.  What are the chances that at least two of them end up in the same carriage, at which point they will start singing sad ballads loudly together?  The answer is $1-\frac{20 \times 19 \times 18 \times 17 \times 16}{20^5}$ or $1-\frac{1,860,480}{3,200,000}$ or $1-0.5814$ or $0.42$ approximately, a 42% chance.

You have in your hands a digital contract.  And a fraudulent one which benefits you.  You have also access to the same hash function used by some cryptographic community - e.g. a bitcoin pay hash or a pgp hash.  Now in a legitimate process, this hashed contract would be signed by the patsy's private key (which you don't have).  So what you do is make a large number of informationally identical but subtly different copies of the legitimate contract and a large number of similar copies of your putative fraudulent contract.  Keep producing new alternatives on both sides until finally you find one such that both the legitimate and the illegitimate evaluate to precisely the same hash.  The collision you found is much easier than you might imagine, thanks to the birthday problem.  Once found, you submit the legitimate contract to the patsy.  The patsy signs this.  You scrape away the digital signature and attach it to the same-hash-producing illegitimate contract, benefiting you.  Present this for payment and you will have exploited the patsy.

You have to estimate the population of sparrows in a woodland area but you're on a budget.  You catch 100 sparrows, tag them with 'TPS' for 'The Persistent Struggle' then release them again.  A week later you return and you catch another 200.  Of these 200, 6 had TPS tags (3% or 0.03).  So you estimate the population in this woodland as 100/0.03 or 3,333 sparrows.

The following link contains some history on the birthday problem.