# Peeter Joot's (OLD) Blog.

• ## Archives

 Adam C Scott on avoiding gdb signal noise… Ken on Scotiabank iTrade RESP …… Alan Ball on Oops. Fixing a drill hole in P… Peeter Joot's B… on Stokes theorem in Geometric… Exploring Stokes The… on Stokes theorem in Geometric…

• 298,046

## PHY452H1S Basic Statistical Mechanics. Problem Set 1: Binomial distributions

Posted by peeterjoot on January 20, 2013

# Disclaimer

## Question: Limiting form of the binomial distribution

Starting from the simple case of the binomial distribution

\begin{aligned}P_N(X) = 2^{-N} \frac{N!}{\left(\frac{N + X}{2}\right)!\left(\frac{N - X}{2}\right)!}\end{aligned} \hspace{\stretch{1}}(1.0.1)

derive the Gaussian distribution which results when $N \gg 1$ and ${\left\lvert{X}\right\rvert} \ll N$.

We’ll work with the logarithms of $P_N(X)$.

Note that the logarithm of the Stirling approximation takes the form

\begin{aligned}\ln a! &\approx \ln \sqrt{2\pi} + \frac{1}{{2}} \ln a + a \ln a - a \\ &=\ln \sqrt{2\pi} + \left( a + \frac{1}{{2}} \right) \ln a - a\end{aligned} \hspace{\stretch{1}}(1.0.2)

Using this we have

\begin{aligned}\ln \left((N + X)/2\right)!=\ln \sqrt{2 \pi}+\left(\frac{N + 1 + X}{2} \right)\left(\ln \left(1 + \frac{X}{N}\right)+ \ln \frac{N}{2}\right)- \frac{N + X}{2}\end{aligned} \hspace{\stretch{1}}(1.0.3)

Adding $\ln \left( (N + X)/2 \right)! + \ln \left( (N - X)/2 \right)!$, we have

\begin{aligned}2 \ln \sqrt{2 \pi}-N+\left(\frac{N + 1 + X}{2} \right)\left(\ln \left(1 + \frac{X}{N}\right)+ \ln \frac{N}{2}\right)+\left(\frac{N + 1 - X}{2} \right)\left(\ln \left(1 - \frac{X}{N}\right)+ \ln \frac{N}{2}\right)=2 \ln \sqrt{2 \pi}-N+\left(\frac{N + 1}{2} \right)\left(\ln \left(1 - \frac{X^2}{N^2}\right)+ 2 \ln \frac{N}{2}\right)+\frac{X}{2}\left( \ln \left( 1 + \frac{X}{N} \right)- \ln \left( 1 - \frac{X}{N} \right)\right)\end{aligned} \hspace{\stretch{1}}(1.0.4)

Recall that we can expand the log around $1$ with the slowly converging Taylor series

\begin{aligned}\ln( 1 + x) = x - \frac{x^2}{2} + \frac{x^3}{3} - \frac{x^4}{4}\end{aligned} \hspace{\stretch{1}}(1.0.5a)

\begin{aligned}\ln( 1 - x) = -x - \frac{x^2}{2} - \frac{x^3}{3} - \frac{x^4}{4},\end{aligned} \hspace{\stretch{1}}(1.0.5b)

but if $x \ll 1$ the first order term will dominate, so in this case where we assume $X \ll N$, we can approximate this sum of factorial logs to first order as

\begin{aligned}2 \ln \sqrt{2 \pi} -N+\left(\frac{N + 1}{2} \right)\left(- \frac{X^2}{N^2}+ 2 \ln \frac{N}{2}\right)+\frac{X}{2}\left( \frac{X}{N} + \frac{X}{N}\right) &= 2 \ln \sqrt{2 \pi} -N+ \frac{X^2}{N} \left( - \frac{N + 1}{2N} + 1\right)+ (N + 1) \ln \frac{N}{2} &\approx 2 \ln \sqrt{2 \pi} -N+ \frac{X^2}{2 N} + (N + 1) \ln \frac{N}{2}.\end{aligned} \hspace{\stretch{1}}(1.0.6)

Putting the bits together, we have

\begin{aligned}\ln P_N(X) &\approx - N \ln 2 + \left( N + \frac{1}{{2}}\right) \ln N - \not{{N}} - \ln \sqrt{2 \pi} + \not{{N}} -\frac{X^2}{2N} - (N + 1) \ln \frac{N}{2} \\ &= \left(-\not{{N}} + (\not{{N}} + 1) \ln 2\right)+\left(\not{{N}} + \frac{1}{{2}} - \not{{N}} - 1\right) \ln N- \ln \sqrt{2 \pi} - \frac{X^2}{2N} \\ &= \ln \left(\frac{2}{\sqrt{2 \pi N}}\right)-\frac{X^2}{2 N}\end{aligned} \hspace{\stretch{1}}(1.0.7)

Exponentiating gives us the desired result

\begin{aligned}\boxed{P_N(X) \rightarrow \frac{2}{\sqrt{2 \pi N}} e^{-\frac{X^2}{2 N}}.}\end{aligned} \hspace{\stretch{1}}(1.0.8)

## Question: Binomial distribution for biased coin

Consider the more general case of a binomial distribution where the probability of a head is $r$ and a tail is $(1 - r)$ (a biased coin). With $\text{head} = -1$ and $\text{tail} = +1$, obtain the binomial distribution $P_N(r,X)$ for obtaining a total of $X$ from $N$ coin tosses. What is the limiting form of this distribution when $N \gg 1$ and $\left\langle{X - \left\langle{X}\right\rangle}\right\rangle \ll N$? The latter condition simply means that I need to carry out any Taylor expansions in X about its mean value $\left\langle{{X}}\right\rangle$. The mean $\left\langle{{X}}\right\rangle$ can be easily computed first in terms of “r”.

Let’s consider 1, 2, 3, and N tosses in sequence to understand the pattern.

1 toss

The base case has just two possibilities

1. Heads, $P = r$, $X = -1$
2. Tails, $P = (1 - r)$, $X = 1$

If $k = 0,1$ for $X = -1, 1$ respectively, we have

\begin{aligned}P_1(r, X) = r^{1 - k} (1 - r)^{k}\end{aligned} \hspace{\stretch{1}}(1.0.9)

As a check, when $r = 1/2$ we have $P_1(X) = 1/2$

2 tosses

Our sample space is now a bit bigger

1. $(h,h)$, $P = r^2$, $X = -2$
2. $(h,t)$, $P = r (1 - r)$, $X = 0$
3. $(t,h)$, $P = r (1 - r)$, $X = 0$
4. $(t,t)$, $P = (1 - r)^2$, $X = 2$

Here $P$ is the probability of the ordered sequence, but we are interested only in the probability of each specific value of $X$. For $X = 0$ there are $\binom{2}{1} = 2$ ways of picking a heads, tails combination.

Enumerating the probabilities, as before, with $k = 0, 1, 2$ for $X = -1, 0, 1$ respectively, we have

\begin{aligned}P_2(r, X) = r^{2 - k} (1 - r)^{k} \binom{2}{k}\end{aligned} \hspace{\stretch{1}}(1.0.10)

3 tosses

Increasing our sample space by one more toss our possibilities for all ordered triplets of toss results is

1. $(h,h,h)$, $P = r^3$, $X = -3$
2. $(h,h,t)$, $P = r^2(1 - r)$, $X = -1$
3. $(h,t,h)$, $P = r^2(1 - r)$, $X = -1$
4. $(h,t,t)$, $P = r(1 - r)^2$, $X = 1$
5. $(t,h,h)$, $P = r^2(1 - r)$, $X = -1$
6. $(t,h,t)$, $P = r(1 - r)^2$, $X = 1$
7. $(t,t,h)$, $P = r(1 - r)^2$, $X = 1$
8. $(t,t,t)$, $P = r (1 - r)$, $X = 0$
9. $(t,t,t)$, $P = (1 - r)^3$, $X = 3$

Here $P$ is the probability of the ordered sequence, but we are still interested only in the probability of each specific value of $X$. We see that we have
$\binom{3}{1} = \binom{3}{2} = 3$ ways of picking some ordering of either $(h,h,t)$ or $(t,t,h)$

Now enumerating the possibilities with $k = 0, 1, 2, 3$ for $X = -3, -1, 1, 3$ respectively, we have

\begin{aligned}P_3(r, X) = r^{3 - k} (1 - r)^{k} \binom{3}{k}\end{aligned} \hspace{\stretch{1}}(1.0.11)

n tosses

To generalize we need a mapping between our random variable $X$, and the binomial index $k$, but we know what that is from the fair coin problem, one of $(N-X)/2$ or $(N + X)/2$. To get the signs right, let’s evaluate $(N \pm X)/2$ for $N = 3$ and $X \in \{3, -1, 1, 3\}$

Mapping between $k$ and $(N \pm X)/2$ for $N = 3$:

 X (N-X)/2 (N+X)/2 -3 3 0 -1 2 1 1 1 2 3 0 3

Using this, we see that the generalization to unfair coins of the binomial distribution is

\begin{aligned}\boxed{P_N(r, X) = r^{\frac{N-X}{2}} (1 - r)^{\frac{N+X}{2}} \frac{N!}{\left(\frac{N + X}{2}\right)!\left(\frac{N - X}{2}\right)!}}\end{aligned} \hspace{\stretch{1}}(1.0.12)

Checking against the fair result, we see that we have the $1/2^N$ factor when $r = 1/2$ as expected. Let’s check for $X = -1$ (two heads, one tail) to see if the exponents are right. That is

\begin{aligned}P_3(r, -1) = r^{\frac{3 + 1}{2}} (1 - r)^{\frac{3 - 1}{2}} \frac{3!}{\left(\frac{3 - 1}{2}\right)!\left(\frac{3 + 1}{2}\right)!}=r^2 (1-r) \frac{3!}{1! 2!}= r^2 (1 - r)\end{aligned} \hspace{\stretch{1}}(1.0.13)

Good, we’ve got a $r^2$ (two heads) term as desired.

Limiting form

To determine the limiting behavior, we can utilize the Central limit theorem. We first have to calculate the mean and the variance for the $N=1$ case. The first two moments are

\begin{aligned}\left\langle{{X}}\right\rangle &= -1 r + 1 (1-r) \\ &= 1 - 2 r\end{aligned} \hspace{\stretch{1}}(1.0.14a)

\begin{aligned}\left\langle{{X^2}}\right\rangle &= (-1)^2 r + 1^2 (1-r) \\ &= 1\end{aligned} \hspace{\stretch{1}}(1.0.14b)

and the variance is

\begin{aligned}\left\langle{{X^2}}\right\rangle -\left\langle{{X}}\right\rangle^2 &= 1 - (1 - 2r)^2 \\ &= 1 - ( 1 - 4 r + 4 r^2 ) \\ &= 4 r - 4 r^2 \\ &= 4 r ( 1 - r )\end{aligned} \hspace{\stretch{1}}(1.0.15)

The Central Limit Theorem gives us

\begin{aligned}P_N(r, X) \rightarrow \frac{1}{{ \sqrt{8 \pi N r (1 - r) }}} \exp\left(- \frac{( X - N (1 - 2 r) )^2}{8 N r ( 1 - r )}\right),\end{aligned} \hspace{\stretch{1}}(1.0.16)

however, we saw in [1] that this theorem was derived for continuous random variables. Here we have random variables that only take on either odd or even integer values, with parity depending on whether $N$ is odd or even. We’ll need to double the CLT result to account for this. This gives us

\begin{aligned}\boxed{P_N(r, X) \rightarrow \frac{1}{ \sqrt{2 \pi N r (1 - r) }} \exp\left(- \frac{( X - N (1 - 2 r) )^2}{8 N r ( 1 - r )}\right)}\end{aligned} \hspace{\stretch{1}}(1.0.17)

As a check we note that for $r = 1/2$ we have $r(1-r) = 1/4$ and $1 - 2r = 0$, so we get

\begin{aligned}P_N(1/2, X) \rightarrow \frac{2}{ \sqrt{2 \pi N }} \exp\left(- \frac{ X^2}{2 N }\right).\end{aligned} \hspace{\stretch{1}}(1.0.18)

Observe that both this and 1.0.8 do not integrate to unity, but to $2$. This is expected given the parity of the discrete random variable $X$. An integral normalization check is really only approximating the sum over integral values of our discrete random variable, and here we want to skip half of those values.

# References

[1] Peter Young. Proof of the central limit theorem in statistics, 2009. URL http://physics.ucsc.edu/ peter/116C/clt.pdf. [Online; accessed 13-Jan-2013].