Saturday, 30 January 2016

The Eleven Socks of Karl Broman

On 17th October 2014 Karl Broman, a professor at the University of Wisconsin-Madison, tweeted about his laundry and caused a bit of discussion.


If the first eleven socks you pull out of your washing machine are all unmatched, does that suggest there are a lot more socks still in the machine?  I intend to find out by building a Bayesian model in Python.

Defining a Prior

Bayesian models require a prior.  This is similar to how humans process information.  That is, the conclusions we draw and the predictions we make are based on our prior experience.  Bayesian models make this idea more formal. 

We have several bits of prior information about this problem.
  • That there were eleven unmatched socks suggests that the minimum number of socks in the washing machine is 11.
  • He doesn't have an infinite quantity of socks (who does, right?) so there should be a finite upper bound.  To work out what this might be I weighed my socks (yes, I actually did this) to see how many would fit in the washing machine.  A pair of my socks weighed 100g (50g each) and the capacity of my washing machine is 7kg, so I could (in theory) wash 70 pairs of (or 140 single) socks.  In the photo with Karl Broman's tweet it's clear that there are kids' socks and other items of clothing as well.  It might be reasonable to suggest that the most socks that could be in the machine is 200, but to allow for the additional clothing items I'll set the maximum at 100.
  • I know from experience with kids that unmatched socks often go into the wash.
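We now have our priors.  For clarity let \(X\) be the total number of socks put into the washing machine and let \(Y\) be the proportion of those socks that are unmatched at the beginning of the wash.  \(X\) follows a discrete uniform distribution on the integers 11 to 100 and \(Y\) a continuous uniform distribution on the interval from 0 to 1: \[X\sim \text{DU}\left(11, 100\right)\] \[Y \sim \text{UNIF}\left(0,1\right)\]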
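
To make the priors concrete, here is a minimal sketch of a single draw from them, turned into a number of pairs and a number of singles. The function name `sample_prior` is hypothetical, and the rounding rule (round the matched socks down to whole pairs and treat the remainder as singles) is an assumption about how a proportion maps onto whole socks.

```python
import random

def sample_prior(rng=random):
    """Draw one washing-machine load as (pairs, singles) from the priors on X and Y."""
    total = rng.randint(11, 100)              # X ~ DU(11, 100), both ends inclusive
    prop_odd = rng.random()                   # Y ~ UNIF(0, 1)
    pairs = int(total * (1 - prop_odd) // 2)  # matched socks, rounded down to whole pairs
    singles = total - 2 * pairs               # every remaining sock counts as unmatched
    return pairs, singles

random.seed(0)  # fixed seed only so that the illustration is repeatable
for _ in range(5):
    print(sample_prior())
```

Because of the rounding, the share of singles in a draw is never lower than the sampled proportion and can be slightly higher.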

The Model

Now we simulate loading Karl Broman's washing machine.  We put some socks in, some in pairs and some as singles, and then sample eleven socks.  If the eleven socks are all different we count that as a success and record the number of pairs and number of singles that went into the wash.  If there is at least one matched pair among the eleven socks we discard that draw.  We repeat this until we have recorded 1,000,000 successes.  What we end up with is an array containing the combinations of odd and paired socks that gave us the outcome we're interested in.  If a combination only rarely generates eleven unmatched socks, it does not appear in the array very often.  If it generates the outcome fairly frequently, it appears in the array more often.  Here's my Python code to do this.
 import numpy as np  
 import random  
 from matplotlib.colors import LogNorm  
 import matplotlib.pyplot as plt  
 actual = 11 #The number of unmatched socks observed.  
 simlen = 1000000  
 i = 0  
 outcomes = np.zeros((simlen, 2))  
 while i < simlen:  
   no_socks = random.randint(11, 100) # Prior on the total number of socks, DU(11, 100) with both ends included  
   prop_odd = random.random() # Prior on the proportion of socks that are odd  
   no_pairs = int(np.floor(no_socks * (1 - prop_odd) / 2))  
   no_odd = no_socks - no_pairs * 2  
   socks = list(range(no_pairs)) + list(range(no_pairs + no_odd)) # socks put into the washing machine  
   sample = np.random.choice(socks, 11, replace=False) # Sample eleven socks  
   if len(set(sample)) == actual:  
     outcomes[i, 0] = no_pairs  
     outcomes[i, 1] = no_odd  
     i += 1  
 # Jupyter/IPython magic; omit the next line when running as a plain Python script  
 %matplotlib inline  
 plt.hist2d(outcomes[:,0], outcomes[:,1], bins = 49, norm=LogNorm())  
 plt.colorbar()  
 plt.title("Density of cases where the first eleven socks were odd.")  
 plt.ylabel("Number of odd socks in machine")  
 plt.xlabel("Number of paired socks in machine")  
 plt.axis([0,50,0,100])  
 plt.show()  
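
As a cross-check on the simulation, the chance of drawing eleven distinct socks can also be computed exactly for any given load. This is a minimal sketch and not part of the analysis above; the function name `prob_all_unmatched` is hypothetical. It counts the draws that take one sock from each of \(k\) pairs (two choices per pair) and the remaining \(11-k\) socks from the singles, then divides by the number of ways to draw eleven socks from the whole load.

```python
from math import comb

def prob_all_unmatched(pairs, singles, draws=11):
    """Exact probability that `draws` socks taken without replacement are all distinct."""
    total = 2 * pairs + singles
    if draws > total:
        return 0.0
    # k of the drawn socks come from k different pairs (2 choices each),
    # the other draws - k come from the singles.
    ways = sum(
        comb(pairs, k) * 2**k * comb(singles, draws - k)
        for k in range(min(pairs, draws) + 1)
    )
    return ways / comb(total, draws)

# Example: a load of 30 pairs and 10 singles
print(prob_all_unmatched(30, 10))
```

Combinations with a larger value here should show up proportionally more often in the `outcomes` array, since each simulated load is kept with exactly this probability.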

Results

When the code is run we get the following density plot of the combinations that resulted in eleven odd socks.

The outcome is not too surprising.  What is surprising is that the actual contents of Karl Broman's washing machine were 21 pairs and 3 singletons.  This would be in the pale blue area of the chart, suggesting that it is one to two orders of magnitude less frequent than the combinations we would otherwise expect.

So, my conclusion.  The fact that the first eleven socks were unmatched does not suggest that there are more socks in the machine.  It suggests that there was a high proportion of unmatched socks.  The fact that the real outcome was different does not change the conclusion.  It just means that the real outcome was unlikely based on the information we had at hand.  Sure, with more information we could have come up with a more refined prior.  But in the absence of information you do what you can.
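
One way to check the claim about the proportion of unmatched socks without re-running the simulation is to weight every combination allowed by the priors by its exact chance of producing eleven distinct socks. The sketch below is an illustration under assumptions, not the method used above: the function names are hypothetical, and the continuous prior on the proportion is approximated by an evenly spaced grid of midpoints.

```python
from math import comb

def prob_all_unmatched(pairs, singles, draws=11):
    """Exact probability that `draws` socks taken without replacement are all distinct."""
    total = 2 * pairs + singles
    if draws > total:
        return 0.0
    ways = sum(
        comb(pairs, k) * 2**k * comb(singles, draws - k)
        for k in range(min(pairs, draws) + 1)
    )
    return ways / comb(total, draws)

def odd_share_before_and_after(grid=200):
    """Mean share of unmatched socks under the prior and after seeing eleven odd socks."""
    prior_sum = post_sum = weight_sum = 0.0
    cells = 0
    for total in range(11, 101):              # X ~ DU(11, 100)
        for j in range(grid):
            prop_odd = (j + 0.5) / grid       # midpoint grid standing in for UNIF(0, 1)
            pairs = int(total * (1 - prop_odd) // 2)
            singles = total - 2 * pairs
            share = singles / total
            weight = prob_all_unmatched(pairs, singles)
            prior_sum += share
            cells += 1
            post_sum += weight * share
            weight_sum += weight
    return prior_sum / cells, post_sum / weight_sum

prior_mean, posterior_mean = odd_share_before_and_after()
print(f"mean share of odd socks, prior: {prior_mean:.3f}, posterior: {posterior_mean:.3f}")
```

The posterior mean comes out above the prior mean because, for a fixed number of socks, swapping a pair for two singles can only make eleven distinct socks more likely.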
