Add Me!Close Menu Navigation

Counting Binary Trees (The Hard Way)

In this post, we prove the closed form of a nonlinear recurrence corresponding to the count of binary trees with n nodes.

Suppose I have a binary tree of n internal nodes, we can count all possible such configurations by looking at the tree like

Rendered by

where each internal subtree with n nodes is a vertex plus two trees whose node-sum is n-1, or

    \[T_n = \sum_{k=0}^{n-1} T_k T_{n-k-1}\]

This is pretty fucking hard to solve. Here’s a way where we do not need to appeal to generating functions.

Let’s begin with a reasonable hypothesis: T_n is a first order recurrence. After groping around for a while, we find that this is actually a nonlinear homogeneous recurrence:

    \[T_n = \lambda_{n-1} T_{n-1}\]

Here’s the first couple of terms:

    \[1, 1, 2, 5, 14, 42, 132, 429, 1430, 4862, 16796, 58786, 208012, 742900, 2674440\]

No obvious pattern yet, but if we look at the sequence T_{n+1}/T_n, we see that it looks like


Notice how the primes in the denominator are exactly the right amount of spacing apart (7 is 2 away from 5, 11 4 away from 7, etc). Furthermore, for the nonprime denominators, they are always a divisor of what the expected value there is supposed to be (for example, the 5 in the number right before the 11), this seems to suggest that there’s always an inverse affine factor between two elements of the sequence. Since the denominator of \lambda_3 is 5, let’s conjecture that

    \[\lambda_n \sim \frac{1}{n+2}\]

In fact, the sequence \frac{T_{n+1}(n+2)}{T_n} looks really nice!


why, this is just the sequence (2 + 4n), so we can pretty accurately guess that

    \[T_0 = 1, ~~~~ T_{n+1} = \underbrace{\frac{2 + 4n}{n + 2}}_{\lambda_n} T_n\]

Warning: This is a really tedious proof; the point is to construct a series that cancels out the (n+2) on the denominator of \lambda_n at each step of the induction.

Proof: By strong induction, for some n+1, suppose that we have already shown that T_n = \lambda_{n-1} T_{n-1}.
Consider the series

    \[S_{n+1} = \sum_{k=0}^n k T_k T_{n-k}\]

In the n=4 case, this looks like

    \[S_5 = 0 T_0 T_4 + 1 T_1 T_3 + 2 T_2 T_2 + 3 T_3 T_1 + 4 T_4 T_0\]

notice that we can flip the order of this series and add with itself to get

    \[2S_{n+1} =  \sum_{k=0}^n (k + n - k) T_k T_{n-k} = n \sum_{k=0}^n T_k T_{n-k} = n T_{n+1}\]

By appealing to the induction hypothesis on T_k, we also have

    \begin{align*} S_{n+1} &=  \sum_{k=0}^n kT_k T_{n-k} \\ &= \sum_{k=1}^n k \lambda_{k-1}T_{k-1} T_{n-k} \\ &= \sum_{k=0}^{n-1} (k+1) \lambda_{k}T_{k} T_{n-1-k} \\ S_{n+1} +T_{n+1} &= T_0T_n + \sum_{k=0}^{n-1} \lambda_k T_k T_{n-1-k} + \sum_{k=0}^{n-1} \pa{k+1}\lambda_{k}T_{k} T_{n-1-k} \\ &= T_n + \sum_{k=0}^{n-1} \underbrace{\pa{k+2}\lambda_{k}}_{\g{cancel the denominator}}T_{k} T_{n-1-k} \\ &= T_n + \sum_{k=0}^{n-1} \pa{2 + 4k}T_{k} T_{n-1-k} \\ \intertext{by a simple substitution of $k = n-1-k$, we get} 2S_{n+1} + 2T_{n+1}  &= 2T_n + \sum_{k=0}^{n-1} \pa{2 + 4k + 2 + 4(n-1-k)}T_{k} T_{n-1-k} \\ &= 2T_n + 4n \sum_{k=0}^{n-1}T_{k} T_{n-1-k} \\ &= (2+4n)T_n \end{align*}

So, above, we’ve derived the following:

    \[2S_{n+1} = nT_{n+1}, ~~~~ 2S_{n+1} + 2T_{n+1} = (2+4n)T_n\]

Substituting the left into the right, we get

    \[(n+2)T_{n+1} - (2+4n)T_n = 0 => T_{n+1} = \frac{2+4n}{n+2} T_n\]

which concludes the proof. ~~~~\Box

This recurrence allows us to write

    \[T_{n+1} = \prod_k^n \lambda_k\]

Notice that this is the same as

    \begin{align*} T_{n+1} &= \frac{\prod_k 2 + 4k}{\prod_k 2+ k} \\ &= \frac{1}{(n+2)!} \prod_k^n 2(1 + 2k) \\ &= \frac{2^{n+1} \cdot 1 \cdot 3 \cdot 5 \cdots (2n+1)}{(n+2)!} \frac{n!}{n!} \\ &= 2\frac{1 \cdot 3 \cdot 5 \cdots (2n+1) \cdot 2 \cdot 4 \cdots 2n}{(n+2)!n!} \\ &= \frac{2}{n+2} \frac{(2n+1)!}{(n+1)!n!} \\ &= \frac{2}{n+2} {2n+1 \choose n+1} \sim \frac{4^{n+1}}{\sqrt{\pi (n+1)^3}} \end{align*}


Posted By Lee


  • Anderson Torres

    What if you have put some interesting restriction on the binary tree? Example: what about AVL trees?