Let’s take a look at an example of a situation where ambiguity occurs during the derivation of subtyping proofs in Java. In general, it’s conjectured that subtyping in Java is undecidable.
Table Of Content
true ? 1 : "Hello"
Take a look at the Java expression on the left hand side. Off of the top of your head, can you tell me if it will type check? If so what will be its type?
If you had asked me this question out of the blue, I would have probably said no; I would reason it as follows: this is a Java expression, so it must be assigned a type, meaning that we’re giving a system of equations saying that the type of
1 and the type of
"Hello" must be equal; this is clearly not the case as one is an Integer and the other a String, so this can’t possibly type check!
Of course, I’m actually just suffering from a common case of overlooking the obvious. It turns out that I’m forgetting about the whole subtyping thing built into Java and that I can actually assign
Object o = true ? 1 : "Hello";, because the type
Object sits at the of the type ordering.
1. Subtyping as Inequalities
Since we’re speaking of subtyping as a type relation anyways, let’s give it a symbolic name to make it easier for us: we’ll let and range over types in Java, and we’ll say
to mean that is a subtype of . Under one interpretation, we can think of this as saying that if we’re expecting a somewhere, we should theoretically be allowed to plug in a in its place and not expect weird or undefined behaviors as a result. Another perhaps more practical view is to claim that there exists some function called the coercion function (which we will denote, for the sake of historical consistency, ) that, given some subtype relation , will be able to translate an object of type into an object of type that behaves “similarly”.
Anyways, It’s obvious that both and (since everything’s a subtype of Object), so we can draw a subtyping hierarchy as a tree:
and we can type check
true?1:"Hello" : Object. This seems to suggest that we can typecheck all instances of to some type if we can typecheck to , to , and . In otherwords, is an upper bound of the types of and .
Of course, upon closer inspection, it turns out that both Integer and String share another common supertype: Serializable.
So just as well, we could claim that the ternary expression above extends Serializable. But if we want to “assign” a type to an expression, we don’t just want its supertypes.
2. Most Precise Supertype
Looking at the type hierarchy above, it’s clear that the lower a type is on that hierarchy, the more “precise” it is. When we say that an expression has a type, we don’t mean that it is bounded above by some set of super types; we actually mean that its most “precise” type is so and so. For example, we could just as well draw the type hierarchy for Integers as
and claim that a is a Serializable; but when I typically talk about the type of the expression , I’m usually talking about Integer, not Serializable.
From this, it’s clear that we also want a way to find the most precise (smallest) type of
true ? 1 : "Hello" such that it is also an upper bound of both and “Hello”; this is called the least upper bound, and we will for historical reasons call this the “join” of two types.
Symbolically, if we want to find the most precise upperbound (join) type of two types and , we will denote it as such that
the reason that the least upper bound is usually denoted as a square union symbol is actually somewhat natural: the union of two sets is the least upper bound of those two sets in the partial order (set inclusion), convince yourself that this is actually the case.
Note that whenever we join two types, the resulting type is “less precise”. We are after all finding an upper bound!
Going back to the problem at hand, the entire point of this exercise is then very simple: we want to find the least upper bound of Integer and String:
We’ve already seen earlier that both Object and Serializable are upper bounds of both Integer and String, but we also know that , so Serializable is a more precise (less) upper bound than Object is, but is Serializable the most precise?
In order to answer this question and find the least upper bound, we’re going to need the complete subtyping hierarchy. Consulting the Java documentation, we find that Integer is a subtype of Numbers, which are Serializable, and are themselves Comparable with other Integers. Furthermore, Strings are also Serializable and Comparable with other Strings! This gives the slightly more convoluted typing hierarchy:
At first first glance, it doesn’t seem possible that there’s a least upper bound: it seems like both Comparable and Serializable are the most precise upper bounds. Furthermore, this isn’t even necessarily true as Comparable itself might be too imprecise.
Wildcards and Bounded Quantifiers
Consider the constraint
now, we know that
so we can rewrite this as
by plugging in into the in . (Wildcards, amiright?) Similarly, we can say the same about integers, so we get
but we , so a more precise upper bound of is . But now, we can construct a sequence
each of which a more precise approximation than the previous, which undoubtedly converges to the most precise type: and here’s the kicker, that type cannot be expressed finitely!
So how does Java handle these types? Let’s try it out. I wanted Java to output the internal type representation, so I need to get a type-mismatch error during compile time.
It seems as if Java is reporting the type of the two as an unbounded quantification: just
Comparable<?>. This is intentional, Java treats the
? extends ... as a constraint (or a system of inequalities) and discards them during type checking and coercion in order to use them later solely for typechecking, so we seemed to have for now solved the problem (So even though , we’re going to treat them as if they weren’t).
No least upper bound?
But we still have another problem, there’s is no least upper bound in the above hierarchy! Again, let’s see what Java does:
It turns out that the least upper bound is just the “intersection” over the three types, which should then be the LEAST upper-bound of all three.