# Neat way of counting OCaml objects satisfying certain types

We use a very natural semantic translation from OCaml styled types (or really any almost algebraic types) to the count of the instances of those types and do some cool stuff with this translation.

Full Disclosure: this isn’t mine. I got the idea from http://www.youtube.com/watch?v=YScIPA8RbVE and the book Analytic Combinatorics from Robert Sedgewick and Philippe Flajolet. I just thought the entire concept is really really neat.

If you’re like me, you spend your flight pondering the deepest and darkest corners of human knowledge, like

1. was the cashier flirting with me? (she wasn’t)
2. which one is better, the integral or the derivative? (definitely the derivative)
3. are we there yet? (no.)
4. are airlines the mortal enemies of tall people? (they are)

Of course, after a day of mentally exhausting travel, I’d like to sit back, relax, and ask myself how, given a specific type declaration in , I can figure out the number of all possible instances of that type.

For example, there is just one instance of the type unit (kind of funny because its name is unit) corresponding to the object in . There are two distinct instances of the type bool corresponding to the set , and about distinct instances of the type int.

### 1. Types in

Now, before we go on any further, let’s first take a look at the syntax of types.

 type suit = Diamond | Club | Heart | Spade 

this type declaration says that we want to create a new type whose instances are either labeled Diamond, Club, Heart, or Spade. The labels are capitalized to mean that they can serve as constructors of the type. So if you type in Diamond into the OCaml, it’ll construct an object of type suit corresponding to the Diamond label. Note that Diamond isn’t itself a type here.

It’s quite obvious that there are 4 distinct instances of the type suit that we can construct, namely .

We can also give each type constructor some extra stuff. For example, if we want types corresponding to the individual cards rather than suits, we can write something like

 type card = Diamond of int | Club of int | Heart of int | Spade of int 

so that we can construct an instance of the card Jack of Clubs as Club 11. Here, there are different instances of the type (we’re only familiar with 52 of these). Alternatively, we can also write each card as a pair of a suit and an integer.

 type card = Card of (suit * int) 

here, the notation suit * int stands for types whose objects are pairs, the first element of which is an instance of a suit and the second an integer; for example, Card (Club, 11) would be the jack of clubs.

One really interesting thing to observe here is that the two different ways of writing the card type above should be equivalent. In each, we are specifying a suit and also a value. Therefore, if there are distinct instances of the first type, there should also be instances of the second type, and it turns out, there are.

Furthermore, if we want to “parameterize” the types to be more generalized, we can do so by introducing type variables (variables that corresponds to types) as 'x that starts with a single apostrophe. So for example, if we want to build a linked list of whatever type we want, we could do something like

 type 'x list = Null | Cons ('x * 'x list) 

where the list of type int list (so we make the substitution ) is constructed from Cons(1, Cons(2, Cons(3, Null))).

A binary tree can be build up like

 type 'x tree = Leaf | Node of ('x tree * 'x * 'x tree) 

where the tree

has type int tree (so we make the substitution ) is constructed from

Node(
Node(Leaf, 2, Leaf),
1,
Node(
Node(Leaf, 4, Leaf),
3,
Node(Leaf, 5, Leaf)
)


### 2. Counting instances

#### One or unit

Now, we get to the interesting part. As discussed previously, we have already established that there’s only one object of the unit type (just ) and two (True and False) of the boolean type. Of course, there’s also only one object of the type

 type one = One 

and in fact, we can rename the constructor to anything (assuming that we don’t care about it being valid ocaml code) including the unit object , so it turns out that is equivalent to all types with a single constructor and equivalence isn’t effected under renaming of the constructor (well, there are certain rules to this renaming/substitution, and we’ll call this William Pitt Fairfield property equivalence). But is this the only property that preserves the count of the number of instances? Well, no, as it turns out, the following all only have one element.

 type one' = One' of unit type one'' = One'' of one' ... 

this is intuitive, if we have only one possible object for the argument of the constructor, and we only have that one constructor, then we can only construct one object for that type. In fact, from now on, we will say that an argument-less type construct C is just a syntax sugar for C of unit because doing so will not change the number of instances for that type.

#### Two or bool

We’ve also established that there are two instances of booleans, and . In type parlance, this would be

 type bool = True | False 

but remember earlier that we said if we give a type constructor not expecting any argument the unit argument, the type’s size is preserved. Therefore, we can rewrite the above as

 type bool = True of unit | False of unit 

and since renaming things doesn’t affect the size of the set of objects satisfying this type, this is also equivalent to

 type two = One of unit | Two of unit 

Neat. Let’s count up one more.

#### Three

There’s no native type in \f{OCaml} that only contains three instances, but it’s not too hard to guess how we’re going to construct this.

 type three = One of unit | Two of unit | Three of unit 

Here, we can either have or as instances of the type three. This gives a natural meaning to the notion of “or”. If I have types and , then the type

can either be one the or one of the , but since the two cases are necessarily distinct (because the label ), then if there are objects of type and objects of type , there must be objects of type .

It’s no coincidence either that the type is called a sum type between and (albeit for different reasons). Now that we’re warming to the notion that we can use these types almost as if we did in highschool algebra, let’s make some simplifications:

Since we do not ever care about what the labels themselves are, only that they are distinct, we can entirely leave out the labels for the type constructors. Therefore, we can write suit (type suit = Diamond of unit | Club of unit | ...) as

and the card type as either

or as

where the means the objects are of a pair type (with both a first and a second element). Note that we can also substitute suit directly in this type as in the second declaration of :

it’s no coincidence that there are also objects of type , but it’s also obvious that the two types are not equal syntactically. That is, .

Of course, it’s useful to construct another equivalence relation whereby type is translationally equivalent to type if both have the same number of type objects. In that case, we’ll say . As you probably have already guessed, we’re going to define some kind of a translation between these types into numbers denoting the number of objects of that type. We’ll go into detail more, but needless to say, we would translate as .

#### Four

Okay, at this point, we kind of already know what we are after. But first, I want to establish some basic laws seen in school algebra.

Recall that

and that

Now, obviously,

but let’s look at something else. Suppose we have a pair type such that

now, there are again four instances of , they are

so it seems as if as well. In fact it’s not hard to show that product types correspond to multiplication in the same that cartesian products corresponds to multiplication of the cardinality. (In fact, this question reduces to the cardinality of sets case).

#### Eight

Just to get to the point, how many functions are there for which the domain is of size and the codomain is of size ? For example, suppose that I have the type

where is taken to mean a function that maps a type with three elements to a type with two. Well, in general, suppose I have a function , then it turns out that there are such functions: reduce this problem to counting in base .

Suppose I have elements in that need to be mapped into . Suppose both and are finite and are order into the sequences and respectively, then I can encode one such function

and a second function

but since we don’t really want to clutter up the notation with all of the braces, the subscripts, and everything, we could just encode each function as a stream of digits where the digit = corresponds to which the element maps to. Then, we can just write

but this is the same as counting up to in base .

Anyways, from this argument, we would expect to have distinct objects, and hence .

### 3. The Translation

Let’s now define a translation between a simplified type declaration and a number denoting the total number of objects with that type declaration. We’ll write this relation . For now, pretend that is the naturals , but this will cause some subtle issues later on because we might not be able to translate certain types (like the list) directly into . Furthermore, is almost a function, but as discussed a few seconds ago, if we let be the naturals, it will not be able to translate certain types, so at best, it is a partial function. In that case, we will use the syntax to mean that .

To define this translation, let’s do it “recursively”. The validity of such a technique is beyond my scope, but convince yourself that because the right hand side is always translating smaller and smaller pieces, then eventually, the translation of any arbitrary type will terminate. Here, I will use to mean some type expression; that is, we will use it as a metavariable over the domain of types. Assume the same precedence level of as and power in school algebra.

Next, we define a notion of translational equivalence.

#### Translational Equivalence

note that is a weaker notion of syntactic equality of , therefore, if , then so too must owing to the fact that is a partial function. Furthermore, because of the construction of the sequence

then is obviously onto . However, because we need to have this notion of , it’s also very obvious that two different type expressions could have the same number of instances and hence is not one-one.

### 4. Examples

We’re going to take some extraordinary risks here (because we’re not mathematicians) and assume that this translation is correct (there are in fact many problems). Let’s do something with this new translation of ours

#### Distributivity

in fact, under translational equality, we can manipulate types algebraically as if in school algebra. More amazingly, that means that we can encode everything the same exact way. Therefore, if we wanted to pass around an object of type , then we can instead pass around an object of type instead!

#### Parameterized Types (Advanced, skip if you want)

Suppose we wanted to translate the list type above:

unfortunately, we don’t have a definition of . Rather, we merely have an equation that is expected to be satisfied in order for an object to be considered an instance of the type. This equation is

it turns out that is the least fixed point of the function . This is going a bit beyond the scope of this post, but that least fixed point can be computed on the ordering of immediate sub-type expression which turns the sequence into a complete partial order (complete because all subsets contain a least upper bound). Therefore, by the fixed point theorem, that least upper bound is the mysterious expression (where means the least upper bound)

now, the translation function is “continuous” in the sense that it preserves least upper bounds on the CPO . (Of course, this is why we can’t have = ), but loosely speaking, the operations can be substituted as

therefore,

unfortunately, if , the above sum diverges to the top element in , .

However, an interesting question arises that this type of analysis can readily answer: how many instances of the list type are translationally equivalent to the -tuple. In this case, it turns out that for whatever base-type , there are exactly lists.

Of course, there’s a slightly easier hack which comes from the above derivation directly (namely because the translation preserves monotonicity, verify this!).

#### Lists – the easier way

Why don’t we just translate the equation directly?

We’ll get

In fact, we could even get this by solving the equation and finding its generating function.

Neat. However, this brings up a good question. Can all such equations be solved? Unfortunately, no.

Let’s define a type

if we translate it, we get

now, if our codomain of the translation is just , then we would be forced to abandon hope. However, what we really did was we packed that codomain to include that you can think of as . What this can be interpreted as is that no matter what type variable becomes, type is going to diverge.

However, using the above trick, we still get the expansion

which says that all instances of are translationally equivalent to .

#### Binary Trees

Let

which under translation gives

where corresponds to the catalan numbers (i.e., we’re generating the sequence that count the number of binary trees translationally equivalent to the -tuple, which is the number of binary trees with nodes).

Woosh