poisson approximations for laundry socks problem

Introduction

In a previous post I derived an approximation to the probability of no matches in from \(k\) socks drawn at random from \(n\) paired socks as

$$ \exp(-\frac{k^2}{4n}).$$

The derivation used a normal approximation to binomial probabilities and some algebra. It also required that we get an expression for the exact probability (which wasn't too hard, in this case).

Below, we will derive the same approximation using a Poisson approximation, and extend the result to when we have both singletons and paired socks. We will use arguments from one of my favorite books: "Probability approximations via the Poisson clumping heuristic" by David Aldous. It's amazing how much you can do with some simple rules of thumb.

Paired socks

In Chapter E, Aldous outlines his analysis of the birthday problem. Our analysis is modeled after that. We pretend that socks are being drawn at random at rate 1. In that case, for small \(k\) relative to \(n\), the match process is a non-homogeneous Poisson process with rate \(\lambda(t) = t/(2n)\). If \(T\) is the time for the first match, then the probability of no match in \(k\) draws is the same as the waiting time \(T\) being bigger than \(k\), and that is

$$P(T>k) \simeq \exp(-\int_0^k \lambda(u) du) = \exp(-\frac{k^2}{4n}).$$

Paired and singleton socks

The usefulness of the approximation becomes more apparent for this more complex case. Here, we do have the exact expression (see previous post) for the probability, but it is more laborious to derive an approximation when the number of socks drawn (\(k\)) is small compared to the total number of socks (\(n_1\) singletons and \(n_2\) pairs).

The argument is pretty straightforward. Now the match process has rate

$$\lambda(t) = \frac{t}{n_1+2n_2} \frac{2n_2}{n_1+2n_2}.$$

The first term is basically the number of socks drawn divided by the total number of socks. The second term is the fraction of paired socks out of all socks.

Then if \(T\) is the time for the first match, using the same argument as in the previous section,

$$P(T>k) \simeq \exp\left( \frac{k^2 n_2}{(n_1+2n_2)^2} \right).$$

Here is the function:


poisApprox <- function(n1,n2,k)
    {
        exp(-n2*k^2/(2*n2+n1)^2)
    }

The approximation works pretty well when \(k\) is small compared to \(n_1+2n_2\).


> norepeat2(30,100,20)
[1] 0.4598519
> poisApprox(30,100,20)
[1] 0.4694734

For actual problem, \(n_1=2\), \(n2=21\), and \(k=11\), it does not work so well since \(k\) is not small compared to \(n_1+2n_2\), but the approximation is still decent.


> norepeat2(3,21,11)
[1] 0.2275212
> poisApprox(3,21,11)
[1] 0.2851286