So there’s this bunch of friends who wanted me to play Scrabble with them. I’d like to play with them. Unfortunately, my relationship with words does not involve decomposing them into individual letters: I learn and use them as a whole, not an assembly of their parts. So I can’t “see” words when faced with a set of Scrabble tiles. Therefore, I suck at Scrabble.
But then there are two components to the game: finding candidate words (which I suck at) and then finding good positions to place them on the board and maximize your score (which I suck less at). Maybe if I could get some help on the first part, Scrabble could be fun. And then I figured computers are good at finding things.
So I decided to make a program to help me.
The first step was to realize I did not want to “solve” the entire game using a program. Although I have worked out in my head a program structure that would always find the best placement for words on the board, I would not enjoy writing and then using that program: it’s a complicated program, and it would make Scrabble too easy. I still want some challenge while playing.
So instead I decided to solve only the first part: automate the search of all dictionary words that I can build using the tiles in my hand and some tiles already on the board. I would then decide myself which word to place, based on the score and the overall layout of previous moves.
First I thought about how I wanted my program to look like from outside, as a user. Since I am playing in multiple languages, it should be able to read separate dictionaries for separate languages. Then I want to tell my program which letters I have available in my hand and which letters I want to use on the board. And I also want my program to only tell me which words “fit” at a position I choose. Finally, I want my program to print the list of candidate words alongside their score, sorted, higher score first. So I decided that running my program would look like this:
./find en qnrfjesn n
Where “en” designates which dictionary to use (eg. English), “qnrfjesn” indicates which tiles I have in my hand together with the tiles I want to use on the board (eg. here I have “qnrfjes” in my hand and I want to use a “n” already placed), and “n” designates a pattern of letters that must exist in the resulting words to be valid (eg. here the “n” must appear in the word since I am using it from the board). Supposing the program was completed, I would like to see the following output:
jen 12 ferns 8 fens 7 fern 7
Then it was time to get started. The first step is to figure out the “algorithmic heart” of the problem: given some available tiles and a word from a dictionary, is it possible to build the word using the available tiles? To solve this, I first put my available tiles in a small heap. Then I look at every letter in the word in turn, and I look if I have a tile available for it. If not, I know I definitely cannot build the word since a tile is missing. If I have a tile, then I take this tile out of my heap so that I do not consider it twice, and I move to the next letter in the word. If I can reach the end of the word without running out of tiles, I know I can build the word with the tiles I had. So there it goes:
def canbuild(tiles, word): available = list(tiles) for letter in word: if letter not in available: return False available.remove(letter) return True
This seems about right, but then I realize: there are jokers in Scrabble! That is, blank tiles which can play the role of any letter. I will use the period “.” to represent a joker, and then I modify the test as follows:
def canbuild(tiles, word): available = list(tiles) for letter in word: if letter in available: available.remove(letter) elif '.' in available: # use joker if available available.remove('.') else: return False return True
This seems about right. Let’s use it. I first assume somehow my program has the available tiles and dictionary with scores available as input. Then I can use “canbuild” in a loop like this:
for word, score in dictionary: if canbuild(tiles, word): print word, score
From this point, I now need to fetch the tiles and dictionary as input. For simplicity, I decide that I want to read the words and scores from the program’s standard input stream, and the tiles as a command line-argument. This way, the program does not exactly look like the specification I gave above, but I can fix that later. For the tiles, I work as follows:
import sys tiles = sys.argv
And then for the dictionary, which I assume is a file where each line contains a word and its score separated by a space, I do it as follows:
lines = (s.rstrip() for s in sys.stdin) dictionary = (s.rsplit(' ', 1) for s in lines if len(s) > 0)
(NB: the “if len(s) > 0” is there to ignore blank lines in the dictionary file.)
If I stick the code above together, I obtain a file “find.py” looking as follows:
#! /usr/bin/env python def canbuild(tiles, word): available = list(tiles) for letter in word: if letter in available: available.remove(letter) elif '.' in available: # use joker if available available.remove('.') else: return False return True import sys tiles = sys.argv lines = (s.rstrip() for s in sys.stdin) dictionary = (s.rsplit(' ', 1) for s in lines if len(s) > 0) for word, score in dictionary: if canbuild(tiles, word): print word, score
This Python program gets me a long way towards my solution: it takes a dictionary with weights on its standard input, the available tiles as argument, and prints all candidate words. For example, assuming my dictionary with scores is stored in the file “en”, I can use my program as follows:
python find.py qnrfjesn <en
It’s close to what we wanted, but not quite yet: I stated that I want to filter out all words which do not “fit” on the board. For this, I do not need to program anything; I can simply use “grep” and regular expressions next to my program! For example, as follows:
python find.py qnrfjesn <en | grep n
Here “grep” will only show words produced by “find.py” which actually contain the letter “n”. With “grep” I can make smarter filters too:
# All words starting with "n": python find.py qnrfjesn <en | grep "^n" # All words ending with "n" # (the space delimits the score that follows): python find.py qnrfjesn <en | grep "n " # All words using "v" and "a" already placed on the board, # one empty column apart: python find.py qnrfjesva <en | grep "v.a" # All words containing an n in 4th position: python find.py qnrfjesn <en | grep "^....n"
Since this seems to work well, time has come to “package” it with the user interface I really wanted. For this I create a regular shell script, “find”:
#! /bin/sh python find.py "$2" <"$1" | grep "$3"
Et voila! The program looks like what I want. Let’s try it:
./find en qnrfjesn n sh: no such file or directory: en
Uh-oh. I want to read from a dictionary with scores, but I forgot to prepare the input file! Hopefully, I only have to do that once. There are probably already Scrabble word lists with scores to be found on the Internet, but I was too lazy to search. Instead, I built my word lists really quickly as follows.
First, I knew that there are plenty of word lists for /usr/share/dict on Unix systems. The best place I know to quickly give me a download link is the Debian package manager: fill in the name of the language in the search box (eg. “english”, “dutch” etc) and you get a download link for a Debian package containing that word list. Since a Debian package is simply a .tgz file in disguise, it only takes a couple unarchiving steps to obtain plain word lists:
dutch british-english american-english-insane french
Of course these word lists do not have scores attached. But since I was playing Scrabble already, it was just a matter of looking at my existing games to determine which scores are used in each language. For English it goes as follows:
a 1 b 4 c 4 d 2 e 1 f 4 g 3 h 4 i 1 j 10 k 5 l 1 m 3 n 1 o 1 p 4 q 10 r 1 s 1 t 1 u 2 v 4 w 4 x 8 y 4 z 10
For Dutch like this:
a 1 b 4 c 5 d 2 e 1 f 4 g 3 h 4 i 2 j 4 k 3 l 3 m 3 n 1 o 1 p 4 q 10 r 2 s 2 t 2 u 2 v 4 w 5 x 8 y 8 z 5
And for French:
a 1 b 3 c 3 d 2 e 1 f 4 g 2 h 4 i 1 j 8 k 10 l 2 m 2 n 1 o 1 p 3 q 8 r 1 s 1 t 1 u 1 v 5 w 10 x 10 y 10 z 10
Now I need to annotate my dictionaries with their scores, and sort the words with the maximum scores at the beginning.
First I place the letter scores above in files: “en.scores”, “nl.scores”, etc. Then I make another program, which I call “sort.py”. The program first loads the scores from the file given as first argument:
import sys pairs = (s.split(' ') for s in file(sys.argv)) letterscores = dict(((s, int(s)) for s in pairs))
Then I read the words one by one, compute their score, and place them into a sorted mapping of scores to words:
from collections import defaultdict scores = defaultdict(list) for word in sys.stdin: # remove white spaces at beginning and end word = word.strip() # compute its score according to the (lowercase) letter weights, # ignoring letters not in the score table s = sum((letterscores[l] for l in word.lower() if l in letterscores)) # then add the word to its score bin. scores[s].append(word)
Then I can print the annotated dictionary, higher scores first:
for s in reversed(scores.keys()): for word in scores[s]: print word, s
Et voila, now I can process the dictionaries I have downloaded earlier:
cat american-english-insane british-english \ | python sort.py en.scores >en python sort.py nl.scores <dutch >nl python sort.py fr.scores <french >fr
Which gives me the dictionary files I need to play. From this point, my “find” program from earlier works:
./find en qnrfjesn n jen 12 jnr 12 nj 11 ferns 8 fens 7 fern 7 fren 7 nefs 7 fen 6 nef 6 fn 5
Since I have used the large dictionary “american-english-insane” there are many words which are not recognized by the Scrabble game. I don’t mind, since I can always try out the word and see if the Scrabble game accepts it.
And there I am, with vastly improved fun at playing Scrabble.
- 30s to come up with the idea,
- 30s to decide I did not want to solve the entire game, just search for words,
- 10mn to implement the first version of the program,
- 1h of playing to iron out the bugs,
- 1h to write this article and explain how it works.