This is the first of two parts. The second will be available in a couple of weeks.

A superpermutation is a string of numbers that contains every permutation of those numbers. For example, ‘123121321’ contains all six permutations of three numbers:

One way to construct a superpermutation is to concatenate all N! permutations together but this is wasteful. We’d end up with a string of length 18 instead of 9:

The *trick* is to try and overlap permutations as much as possible. Most of the time, when we add a character this
also add a permutation, but not always. A ‘greedy’ algorithm works fairly well
to find *short* strings, but they might not be optimal.

So the question is: **how short can these strings be?**

Many people are trying to figure that out, either constructively, by exhaustive search for the shortest string, or non-constructively, by proof of its exact length. Ideally, we’d come up with an efficient process to build the shortest possible string.

I’m certainly no mathematician so I won’t tackle a proof, but I do know some
things about writing efficient search algorithms. In this article I’ll explain
my recent attempt to find **minimal superpermutations** - in this case, those that happen to be
palindromes.

Along the way I wrote some C and Rust code. I’ll focus on the high-level approach in this article and save a detailed walk-through of the code for the next.

In general, minimal superpermutations need not be palindromes. It so happens for N=1 to 4 they *are*, but no one has
proved whether or not this is always true. In fact, for N=5 there are
*eight* shortest strings
tied at 153 characters - one of which is a palindrome.

I decided to limit my search to palindromes because I think there’s fair chance
the shortest superpermutation *is* a
palindrome and this lets us reduce the search space, making the problem more
tractable. **Otherwise, the search might never finish.**

Actually, I wholly expect the search not to finish anyway - even with this limitation. I’m going to run it for a month or two on the N=6 problem to see how far it gets, but I have low expectations. The combinatorial abyss is vast and unforgiving.

There’s one more assumption I’m going to make about minimal superpermutations. That is: they do not contain the same permutation more than once. In other words, each and every permutation appears exactly once in the string, with no repetition.

Again, there’s no proof of this, but there’s mounting evidence this is the case. For the N=6 problem, more than ten thousand strings have been found with the shortest known length of 872. As far as I’m aware none of these repeat a permutation.

With this assumption in place, we can prune strings from our search that contain repetition - reducing the search space. This improves our chances of success.

At a high level, the algorithm works by searching for the shortest string that
contains exactly half the permutations. This string must also have the special
property that its reversal contains precisely the *other half* of the
permutations (the ones it’s missing).

When this string is glued together with its reversal, it forms a palindrome that is a superpermutation - a string that contains all N! permutations.

For example, ‘12312’ contains 123, 231 and 312 which are three of the six
permutations for N=3. Its reversal ‘21321’ contains 213, 132 and 321 which are
precisely the *other three* permutations it’s missing so it has the property we
want.

That means we can glue ‘12312’ and ‘21321’ together to obtain ‘123121321’ which now contains all N! permutations - making a (palindromic) superpermutation.

The search is depth first, starting from the 12...N permutation. This can be thought of as the root node of a search tree. At each step, we ‘expand’ one string by adding a single character to the end of it. This adds a child node to the search tree.

From Wikipedia: “A depth-first search explores as far as possible along each branch before backtracking.” When we translate that definition to this problem, it means we should expand longer and longer strings before backtracking to expand shorter ones:

There’s still the question of when to backtrack. This is where our ‘No repeats’ assumption comes into play. When a string is expanded, we check:

- Is the tail of the string a permutation?
- If so, have we seen it before?

If the answer is *yes* we backtrack. There’s
no point expanding the string further since we’re not interested in
superpermutations that contain repetition.

However, we can go one step further! We know we’re going to glue this string to
its reversal. That means **all permutations seen so far will be seen again** -
but in reverse. Our assumption is each permutation appears once in the
*entirety* of the string.

That means **we can check the tail of the string a second time**. If we’ve seen
its reversal before we can backtrack as well. This pre-empts the repetition
that *would occur* when the string is glued later. This lets us eliminate even
more strings!

Here’s what the code might look like:

I think of this as the *magic* of this approach. It’s this combination of ideas
that initially motivated me to try writing the search algorithm. Whether or not
it makes a significant dent in the problem remains to be seen but it’s worth a
shot.

Finally, there’s one last piece we need to supercharge the search algorithm.

The search incorporates a clever technique devised by Benjamin Chaffin. The basic idea is to break the problem into a sequence of subproblems. We can then use the solution to the first subproblem to speed up solving the second, and so on.

Each subproblem is stated in terms of ‘wasted’ characters. A wasted character is one that doesn’t add a new permutation itself but instead, prepares the string so the next character can add a permutation. Sometimes more than one in a row is needed.

For example, when ‘1’ is appended to ‘12312’
we get ‘123121’. The ‘1’ character is deemed *wasted* because it doesn’t add a
new permutation to the string. It does however prepare the string for a
subsequent ‘3’ which adds the 213 permutation.

Now that we understand wasted characters, subproblems are defined as:

How many permutations can fit into a string that has W wasted characters?

The algorithm starts by solving the W=0 case by depth-first search. It enumerates all strings that do not waste any characters and counts how many permutations appear in them. The maximal count of permutations is the solution of the subproblem.

The algorithm proceeds to solve the W=1 subproblem, using the solution of W=0 to speed up the search. This works by pruning strings that can’t possibly produce a maximal count of permutations because they’ve already wasted too many characters.

For example, when solving W=1, if we’ve seen ‘12312132’ which contains five permutations, there’s no point expanding ‘1232’ because we know from W=0 this string is doomed to failure. It can’t possibly beat the string with five permutations.

We then proceed to solve W=2, W=3 and so on. At each step we refer back to all previous solutions to prune the search. The code looks something like this:

A (contrived) analogy is to imagine climbing a ladder. The search algorithm is working on one of the rungs and is supported by preceding rungs to get its job done.

The addition of Chaffin’s method has a dramatic impact on the effectiveness of
the search. For N=5, the search completes
almost instantly. Without this code, it grinds to a halt. It would likely take
a *very* long time to finish the search.

In part two I’ll walk through the code and go into much more detail about how
the algorithm *actually* works. I haven’t written part two yet so check back
in a couple of weeks. I’ll also announce its publication
on Twitter.

If you can’t wait, the latest iteration of my (scrappy) code is here.