Tuesday 5 July 2011

The Grim Tweetbot

Twitter famously allows 140 characters. There's a lot you can write in 140 characters. Imagine someone died, and you want to broadcast that on a Twitter channel. Clearly you can.
George Best drank himself to death in 2005 after years of heavy drinking and some not inconsiderable womanising

Or perhaps
Pope John XII died while fucking one of his many mistresses.

Or
Connecticut, 2009. Donald Peters, 79, died of an unrelated heart attack two hours after purchasing a winning lottery ticket, worth $10m 

What about
Kung Fu found hanging in a Bangkok closet after an auto-asphyxiation accident  

Tweets are apparently encoded in UTF-8, a variable-width encoding that can represent 1,112,064 distinct code points. But UTF-8 is also backward compatible with ASCII, which captures the basic alphabet and digits plus common punctuation, more than enough for my current purposes. Indeed, of the 128 ASCII characters only 95 are printable.
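A quick way to check those two counts, as a minimal sketch in Python. The variable names are my own, and it leans on Python's `str.isprintable`, which treats the ASCII space as printable.

```python
# Unicode code points run from U+0000 to U+10FFFF. The surrogate range
# U+D800..U+DFFF is reserved and cannot be encoded in UTF-8.
all_code_points = 0x10FFFF + 1
surrogates = 0xDFFF - 0xD800 + 1
encodable = all_code_points - surrogates
print(f"{encodable:,} code points")

# Count the printable characters among the ASCII range (space through tilde).
printable_ascii = [chr(code) for code in range(128) if chr(code).isprintable()]
print(len(printable_ascii), "printable ASCII characters")

# Backward compatibility: pure ASCII text has identical bytes in both encodings.
text = "The Grim Tweetbot"
print(text.encode("utf-8") == text.encode("ascii"))
```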

Imagine I had a machine which generated a random sequence of 140 characters drawn from the 95 printable ones. This is a kind of sampling with replacement. So the probability of any one tweet would be $\frac{1}{95^{140}}$. The sample space, $95^{140}$ tweets, is eminently finite. Yet within its output lies the tweet describing your death. And mine. And anyone else's too. And many other descriptions of your death which turn out not to be true. And the price of Apple stock on 1 Jan 2013 at 9:30 EST.
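Such a machine is easy to sketch. The Python below is only an illustration: `random_tweet` is a name I made up, and it assumes every tweet is exactly 140 characters long, each drawn uniformly from the 95 printable ASCII characters.

```python
import random

# The 95 printable ASCII characters, space (32) through tilde (126).
PRINTABLE = [chr(code) for code in range(32, 127)]
TWEET_LENGTH = 140


def random_tweet(rng: random.Random) -> str:
    """Draw 140 characters independently and uniformly (sampling with replacement)."""
    return "".join(rng.choices(PRINTABLE, k=TWEET_LENGTH))


# Python integers are arbitrary precision, so the sample space can be counted exactly.
sample_space = len(PRINTABLE) ** TWEET_LENGTH
print(f"{len(PRINTABLE)} characters, {len(str(sample_space))}-digit sample space")
print(random_tweet(random.Random()))
```

Each run prints one tweet from that space. Almost all of them are gibberish, which is rather the point.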

The only thing we don't have is time. Not a problem. Let's reduce the search space somewhat. The grim tweetbot program forgoes literary pretension. It will output only your full name, a date, and a death description you might expect on your death certificate, namely a noun phrase made up of no more than five medical words.

Version 3: Generate the names from a putative list of all living humans (pretend we all had birth certificates). That's $6,700,000,000$ names. And imagine only the date of death is predicted, not the cause of death, as an eight-digit string in 20110704 format. With 10 choices for each of the 8 digits, that's $6,700,000,000 \times 10^{8} = 670,000,000,000,000,000$ tweets in totality. You can continue bringing this sample space down by introducing more knowledge into the date format, but you should already be feeling less than impressed with the grim tweetbot. It isn't telling you much. It is merely listing your name with a collection of dates.

I notice in passing there already is a (human-maintained) Grim Tweeter, though Google currently doesn't return any hits for the phrase 'the grim tweetbot'. So I'll lay claim to it here.

Bin Laden's death apparently sparked a record 12.4 million tweets per hour. Going flat out at that rate, we'd come to the end of Version 3 after about 6,168 millennia. But I'm sure this could be done much faster. It would need to be, since by then a lot more people will have been born whose deaths it would need to tweet.

But Moore's Law, a version of which claims a doubling of processor capacity every 2 years, will make Version 3 finish in our own lifetime, within 42 years. Think of it like this. In the first year we can tweet $a = 12,400,000 \times 24 \times 365 = 108,624,000,000$ times. We check back in 2 years' time and we find we have doubled our capacity to $2a$ tweets a year. Two more years and we see $2^2a$. After the first year and 20 two-year iterations, we will have tweeted $a + 2(2a + 2^2a + \dots + 2^{20}a) = (2^{22} - 3)a = 455,601,751,824,000,000$ times, leaving about $2.14 \times 10^{17}$ tweets to go. The next iteration runs at $2^{21}a \approx 2.28 \times 10^{17}$ tweets a year, so it finishes the job in under a year. The first iteration took a year and the 20 after it took 2 years each, so with one final year that gives a total of 42 years to play out, riding on the back of Moore's Law.
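The Moore's Law argument can be checked year by year with a short Python sketch. Its assumptions are mine: `years_to_finish` is a hypothetical helper, capacity is taken to be $a$ in the first year and to double at the start of every two-year stretch after that, and the totals in the loop are illustrative round numbers rather than the Version 3 count itself.

```python
# Tweets in the first year at 12.4 million per hour (the record rate quoted above).
FIRST_YEAR = 12_400_000 * 24 * 365


def years_to_finish(total_tweets: int, first_year_tweets: int = FIRST_YEAR,
                    doubling_period: int = 2) -> int:
    """Whole years needed to emit total_tweets under the assumed doubling model.

    Year one runs at first_year_tweets per year; capacity then doubles at the
    start of each doubling_period-year stretch that follows.
    """
    tweeted = 0
    year = 0
    while tweeted < total_tweets:
        # Number of doublings in effect during this year: ceil(year / period).
        doublings = (year + doubling_period - 1) // doubling_period
        tweeted += first_year_tweets * 2 ** doublings
        year += 1
    return year


for exponent in (17, 18, 19):
    print(f"10^{exponent} tweets: {years_to_finish(10 ** exponent)} years")
```

On this model, a total ten times larger adds only a handful of years, which is why the exact size of the sample space matters far less than the doubling.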