Add Me!Close Menu Navigation

Common set operations

From freshman year textbook, Discrete Mathematics and Its Applications (sixth edition), Chapter 2 asks the reader that given subsets A and B of some superset, implement A union B, A intersect B, A – B, and find the symmetric difference of A and B.

Blatant plagiarism for the definitions of union, intersection, difference, and symmetric differences in set theory.

  • The union of A and B is the set containing those elements that are in at least one of A and B Union
  • The intersection of A and B is the section containing those elements that are in both A and B Intersection
  • The difference of A and B is the set containing those elements that are in A but not in BDifference
  • The symmetric difference of A and B is the set containing those elements in exactly one of A and BSymmetric difference

1. First Approach:

Assuming that sets can be represented by indexed tables in Lua, the union of two sets is the concatenation of set A with the set of everything in B that’s not in A. So we can simply iterate through every element of B and test whether if it is in A. If the test fails, then we append it into A.

function union(a, b)
    local function find(a, tbl)
        for _,a_ in ipairs(tbl) do if a_==a then return true end end
    end
    for _,b_ in ipairs(b) do
       if not find(b_, a) then table.insert(a, b_) end
    end
    return a -- Hmm, I wonder if anything unexpected might happen...
end

The intersection of two sets then is everything that is in A and in B. We iterate through every element of A and test whether if it is also in B. If the test succeeds, then we retain that element. Since the intersection is itself a subset of both A and B, we don’t need to worry about whether we should fix A or B.

-- refactor find from union into the global scope
function intersection(a, b)
	local ret = {}
	for _,b_ in ipairs(b) do
		if find(b_,a) then table.insert(ret, b_) end
	end
	return ret
end

The difference of two sets is everything in A that is not in B, so we only need to negate the condition in intersection.

function difference(a,b)
	local ret = {}
	for _,a_ in ipairs(a) do
		if not find(a_,b) then table.insert(ret, a_) end
	end
	return ret
end

At this point, I got lazy and decided to write the symmetric difference as a composite of the existing functions. If we want everything in sets A and B but without those elements that are in both, we basically want the difference of the union of A and B and their intersection.

function symmetric(a, b)
	return difference(union(a,b), intersection(a,b))
end

If you’re following along with the code so far, you’ll probably get an unexpected result when you try to call symmetric. For example, if given symmetric({1,2,3,4,5}, {3,4,5,6,7}), we would expect to get {1,2,6,7}. But this is not the case, only {1,2} will return. What is going on here?

Lua passes tables and other composite objects by reference rather than value, which explains why {1,2,3} == {1,2,3} fails (this is analogous to comparing the address of the head of the table, which are obviously not the same as Lua allocates two different blocks of memory for each table). Since we pass A by reference, the function union will in turn modify the table referenced by A to be the union of the two sets, after which taking the intersection will return the entirety of set B, and the return value will then be the difference between A and B rather than the symmetric difference.

Because I’m lazy, I resolved this by computing the intersection before the union.

function symmetric(a, b)
	local int = intersection(a,b)
	return difference(union(a,b), int)
end

But for practical purposes (aka: avoiding unexpected errors like this), we should rewrite the union function so that we’re using a copy of set A.

function union(a, b)
	a = {unpack(a)} --unpack a returns a 'tuple' of all elements of a, and by recreating the table we assign it a different address
	for _,b_ in ipairs(b) do
		if not find(b_, a) then table.insert(a, b_) end
	end
	return a
end

2. A little bit of optimization

I don’t really feel comfortable calling this “optimization” as I have no idea how Lua handles dictionary lookups for tables. For all I know, an iterative for loop may actually be faster at finding elements of an indexed list than a dictionary lookup + the initial “hashing” cost (again, no idea how Lua actually hash elements). But assuming that it’s faster to do a simple dictionary lookup in a table and that the sets we’re working with are typically large, we can speed up our functions by replacing the find function.

function hash(tbl)
	for _,v in ipairs(tbl) do tbl[v] = v end
	return tbl
end

By calling hash on a table, we effectively allow direct index so we can do lookup in near constant time (and 1 iteration through the list initially). This technique however will fail for lists of integers however because the value may coincide with the list index so either the element is overwritten or a false index will return a true value.

3. Final Code

Here’s my (ugly?) rendition:

local function find(a, tbl)
	for _,a_ in ipairs(tbl) do if a_==a then return true end end
end

function union(a, b)
	a = {unpack(a)}
	for _,b_ in ipairs(b) do
		if not find(b_, a) then table.insert(a, b_) end
	end
	return a
end

function intersection(a, b)
	local ret = {}
	for _,b_ in ipairs(b) do
		if find(b_,a) then table.insert(ret, b_) end
	end
	return ret
end

function difference(a, b)
	local ret = {}
	for _,a_ in ipairs(a) do
		if not find(a_,b) then table.insert(ret, a_) end
	end
	return ret
end

function symmetric(a, b)
	return difference(union(a,b), intersection(a,b))
end
Posted By Lee

Woosh

Recent Comments