## The Target puzzle: searching wordlists with tries

I was thinking about the Target puzzle from a certain Australian newspaper the other day: you're given 9 letters, and you have to create as many words as possible from them (length 4+). There's always at least one 9 letter word you can form.

This is going to be a short post, as I basically wanted to check whether you could use a trie to solve this.

Tries are a way of storing strings (or other sequences) so they can be searched quickly: basically, you build a tree where each letter in the string becomes a node, with the next letter in the string becoming a child node. The time to search for a string of length $$m$$ is $$O(m)$$.

A simple trie storing a small list of words looks like:

def build_trie(words):
trie = {}
for word in words:
current_dict = trie
for index, c in enumerate(word):
if index == (len(word) - 1):
current_dict[c] = True
else:
if c not in current_dict:
current_dict[c] = {}
current_dict = current_dict[c]
return trie

simple_trie = build_trie(['act', 'bat', 'bag', 'bin'])
print(simple_trie)

{'a': {'c': {'t': True}}, 'b': {'a': {'t': True, 'g': True}, 'i': {'n': True}}}


Note that here, we're just looking for 9 letter words, so I just assume each word is equal length and terminate the branches with a True.

To check if a word occurs in the trie we start at the root, then for each letter in the word, we check if that letter is a child of the current node. If it is, we step down to that node and move to the next letter:

def word_in_trie(word, trie):
current_dict = trie
for letter in word:
if not letter in current_dict:
return False
current_dict = current_dict[letter]
return True

word_in_trie('bat', simple_trie)

True


To solve the Target puzzle, we'll use the standard word list built into the OS, and grab all the 9 letter words. We'll create a trie, as well as a standard word list that we'll use when comparing the trie approach to alternatives.

def generate_words(filename, word_length=9):
with open(filename) as word_list:
for line in word_list:
word = line.strip().lower()
if len(word) == word_length:
if word.isalpha():
yield word

word_trie = build_trie(generate_words('/usr/share/dict/words'))
word_list = list(generate_words('/usr/share/dict/words'))

word_list[:3]

['abernathy', 'abyssinia', 'accenture']


To find a 9-letter word using the trie approach, we just generate permutations of the available letters, and check if any of them form a full word:

from itertools import permutations

def solve_target(letters, trie):
letters = list(c.lower() for c in letters)
for comb in permutations(letters, 9):
if word_in_trie(comb, trie):
return ''.join(comb)

solve_target('WDPARUPNE', word_trie)

'unwrapped'


This seems pretty fast, especially given that it's potentially having to run through a large number of permutations:

sum(1 for comb in permutations('WDPARUPNE'))

362880


If my (admittedly fuzzy) knowledge of big-O notation is right, the runtime for this trie-based search should be something like $$O(pm)$$, where $$p$$ is the number of permutations, and $$m$$ is the length of the words ($$m = 9$$ here).

Unfortunately when I tested it against a set based implementation, it seems like the set wins out:

def solve_with_set(letters, word_set):
letters = list(c.lower() for c in letters)
for comb in permutations(letters, 9):
if comb in word_set:
return ''.join(comb)

%%timeit -n 10 -r 1
solve_target('WDPARUPNE', word_trie)

62 ms ± 0 ns per loop (mean ± std. dev. of 1 run, 10 loops each)

%%timeit -n 10 -r 1
solve_with_set('WDPARUPNE', set(tuple(w) for w in word_list))

28.6 ms ± 0 ns per loop (mean ± std. dev. of 1 run, 10 loops each)


The set-based solution should also have a worst-case runtime of $$O(pn)$$, where $$n$$ is the total number of words in the list, which should mean our trie wins out as the word list gets longer and longer. However, set lookups are often $$O(1)$$, and the Python implementation of them is pretty efficient, so we might only see the advantages of tries for this kind of puzzle under specific circumstances. I might have to put this down as an idea that didn't quite pay off.