The Target puzzle: searching wordlists with tries

February 25, 2020

I was thinking about the Target puzzle from a certain Australian newspaper the other day: you're given 9 letters, and you have to create as many words as possible from them (length 4+). There's always at least one 9 letter word you can form.

An example Target puzzle

This is going to be a short post, as I basically wanted to check whether you could use a trie to solve this.

Tries are a way of storing strings (or other sequences) so they can be searched quickly: basically, you build a tree where each letter in the string becomes a node, with the next letter in the string becoming a child node. The time to search for a string of length $m$ is $O(m)$.

A simple trie storing a small list of words looks like:

def build_trie(words):
    trie = {}
    for word in words:
        current_dict = trie
        for index, c in enumerate(word):
            if index == (len(word) - 1):
                current_dict[c] = True
            else:
                if c not in current_dict:
                    current_dict[c] = {}
                current_dict = current_dict[c]
    return trie

simple_trie = build_trie(['act', 'bat', 'bag', 'bin'])
print(simple_trie)

{'a': {'c': {'t': True}}, 'b': {'a': {'t': True, 'g': True}, 'i': {'n': True}}}

Note that here, we're just looking for 9 letter words, so I just assume each word is equal length and terminate the branches with a True.

To check if a word occurs in the trie we start at the root, then for each letter in the word, we check if that letter is a child of the current node. If it is, we step down to that node and move to the next letter:

def word_in_trie(word, trie):
    current_dict = trie
    for letter in word:
        if not letter in current_dict:
            return False
        current_dict = current_dict[letter]
    return True

word_in_trie('bat', simple_trie)

True

To solve the Target puzzle, we'll use the standard word list built into the OS, and grab all the 9 letter words. We'll create a trie, as well as a standard word list that we'll use when comparing the trie approach to alternatives.

def generate_words(filename, word_length=9):
    with open(filename) as word_list:
        for line in word_list:
            word = line.strip().lower()
            if len(word) == word_length:
                if word.isalpha():
                    yield word
            
word_trie = build_trie(generate_words('/usr/share/dict/words'))
word_list = list(generate_words('/usr/share/dict/words'))

word_list[:3]

['abernathy', 'abyssinia', 'accenture']

To find a 9-letter word using the trie approach, we just generate permutations of the available letters, and check if any of them form a full word:

from itertools import permutations

def solve_target(letters, trie):
    letters = list(c.lower() for c in letters)
    for comb in permutations(letters, 9):
        if word_in_trie(comb, trie):
            return ''.join(comb)
        
solve_target('WDPARUPNE', word_trie)

'unwrapped'

This seems pretty fast, especially given that it's potentially having to run through a large number of permutations:

sum(1 for comb in permutations('WDPARUPNE'))

If my (admittedly fuzzy) knowledge of big-O notation is right, the runtime for this trie-based search should be something like $O(pm)$ where $p$ is the number of permutations, and $m$ is the length of the words ($m = 9$ here).

Unfortunately when I tested it against a set based implementation, it seems like the set wins out:

def solve_with_set(letters, word_set):
    letters = list(c.lower() for c in letters)
    for comb in permutations(letters, 9):
        if comb in word_set:
            return ''.join(comb)

%%timeit -n 10 -r 1
solve_target('WDPARUPNE', word_trie)

62 ms ± 0 ns per loop (mean ± std. dev. of 1 run, 10 loops each)

%%timeit -n 10 -r 1
solve_with_set('WDPARUPNE', set(tuple(w) for w in word_list))

28.6 ms ± 0 ns per loop (mean ± std. dev. of 1 run, 10 loops each)

The set-based solution should also have a worst-case runtime of $O(pn)$, where $n$ is the total number of words in the list, which should mean our trie wins out as the word list gets longer and longer. However, set lookups are often $O(1)$, and the Python implementation of them is pretty efficient, so we might only see the advantages of tries for this kind of puzzle under specific circumstances. I might have to put this down as an idea that didn't quite pay off.