
Python Crossword Puzzle Generator

As my next miniature project will be a crossword puzzle maker for teachers (note: the domain
has since been sold to a nice fellow who is maintaining it) that randomly generates
crossword puzzles and word search puzzles, I thought I’d share the code I developed to
create these puzzles on the fly. While I was working on it, I ran across many different
scripts to accomplish this, but none of them were in my most favorite of languages: Python.
Besides, I’d like the code to fit snugly in my web framework of choice, Django; the popular
PHP version just wouldn’t cut it. Anyway, scroll down to see the code, or read on for a little
primer about the process behind it.

Released under a BSD license. Copyright Bryan Helmig 2010.

Puzzles like these:

p u m p e r n i c k e l -      p u m p e r n i c k e l v
a - - - - - - - a - - e -      a w j m p c a y a w r e s
l - s n i c k e r - - a -      l f s n i c k e r b z a x
a - a - - - - - a - - v -      a f a z k e u i a b f v k
d - f - c - - - m - - e -      d x f v c j f d m c n e x
i - f j o r d - e - - n -      i d f j o r d z e j g n z
n - r - d - - - l i p - -      n r r x d j a o l i p d j
- c o r a l - - - - i - -      i c o r a l u s t o i x w
- - n - - i - - - - s - -      m r n u e i i h o t s y w
- - - - - m i s t - t - -      m w e x s m i s t r t u j
p l a g u e - - - - o - -      p l a g u e b n h k o m s
- - - - - - - d a w n - -      f m n v j f p d a w n c q
- - - - - - - - - - - - -      m h j a e d p p r g t p j

Behind the Scenes


This program is actually very simple and creates completely random crosswords on the fly.
Naturally, the more words you have, the better it will be at placing the most possible on a
board. However, increasing the number of words will increase computation time.
Additionally, increasing the board size will severely increase computation time. To
counteract the fact that sometimes it will randomly generate a sub-par board, we will
generate many different boards in an allotted time and only keep the “best” board (in this
case, the board with the most words placed). So, as the board and word list get bigger,
the number of prospective boards created within a fixed time decreases.
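
To make the keep-the-best idea concrete, here is a minimal sketch of a time-boxed search. The
names best_of, generate, score and random_board are made up for this illustration and are not
part of the crossword code; the "board" is just a random list of words standing in for a real
grid.

```python
import random
import time

def best_of(generate, score, seconds=1.0):
    """Call generate() repeatedly for about `seconds` and keep the best result.

    At least one candidate is always produced, even if `seconds` is 0.
    """
    best, best_score, attempts = None, None, 0
    start = time.time()
    while attempts == 0 or time.time() - start < seconds:
        candidate = generate()
        candidate_score = score(candidate)
        if best_score is None or candidate_score > best_score:
            best, best_score = candidate, candidate_score
        attempts += 1
    return best, attempts

# Hypothetical stand-in for a board: a random selection of "placed" words.
words = ['coral', 'dawn', 'fjord', 'lip', 'mist']

def random_board():
    return random.sample(words, random.randint(1, len(words)))

board, attempts = best_of(random_board, len, seconds=0.05)
print('%d words placed after %d attempts' % (len(board), attempts))
```

The bigger the board and word list, the slower each generate() call is likely to be, so fewer
attempts fit in the same time budget.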

The code first randomizes the word list and then sorts by word length. The idea here is that
longer words are more difficult to place, so get them placed when the board is the most
open. Next, we place the longest word on the 1, 1 coordinate of the grid as the seed. In
tests, the placement of the first word at 1, 1 yielded by far the best results on average.
Then we go to the next longest word and loop over its letters and each cell in the grid.
When we find a match, we back up from that cell to where the word would have to start and
suggest that coordinate as a placement for the word. Once we’ve checked every letter against
every cell, we choose the best (the word best is used very loosely here) coordinate and apply
the word to the grid. Now we move on to the next word and so forth. Once we’ve made it through
the list once, we can loop over the unplaced words and look for any lucky chances for a
second placement.
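
The shuffle-then-sort step is easy to try on its own. This is a small sketch with a made-up
helper name (order_words); it relies on Python's list sort being stable, so words of equal
length keep the random order the shuffle gave them.

```python
import random

def order_words(words, rng=random):
    """Shuffle, then sort longest-first; equal-length words keep their shuffled order."""
    ordered = list(words)
    rng.shuffle(ordered)                 # random order first...
    ordered.sort(key=len, reverse=True)  # ...then longest first (the sort is stable)
    return ordered

words = ['coral', 'dawn', 'fjord', 'lip', 'mist', 'plague', 'pumpernickel']
ordered = order_words(words)
print(ordered[0])  # always the longest word: pumpernickel
```

Run it a few times and the five-letter and four-letter words trade places while the overall
longest-to-shortest order stays put.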

This suggested coordinate system allows for a much faster fit than some methods I’ve seen
that will randomly place a word to see if it works. Additionally, it requires that each word
cross other words, which is the point of, well, a crossword puzzle.
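
Here is a stripped-down sketch of the coordinate suggestion idea. The function name
suggest_starts is hypothetical, and this version only checks that the word stays on the grid;
the real code further down uses its own bounds check and then scores each suggestion to
throw out collisions.

```python
def suggest_starts(grid, word):
    """Return 1-based (col, row, vertical) starts where `word` would cross a
    matching letter that is already on the grid."""
    rows, cols = len(grid), len(grid[0])
    starts = []
    for i, letter in enumerate(word):
        for r, row in enumerate(grid, start=1):
            for c, cell in enumerate(row, start=1):
                if cell != letter:
                    continue
                top = r - i   # vertical: back up so letter i lands on row r
                if top >= 1 and top + len(word) - 1 <= rows:
                    starts.append((c, top, 1))
                left = c - i  # horizontal: back up so letter i lands on column c
                if left >= 1 and left + len(word) - 1 <= cols:
                    starts.append((left, r, 0))
    return starts

grid = [list('coral'),
        list('-----'),
        list('-----'),
        list('-----'),
        list('-----')]
print(suggest_starts(grid, 'lip'))  # [(5, 1, 1)]
```

Only cells that already hold a matching letter produce suggestions, so every candidate
placement crosses an existing word by construction.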

Operation

Be mindful when you create a word list to exclude words like “an” or “or” because these
have a tendency to be placed inside other already placed words. This can be confusing.
Simply run the code below.
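
If your word list comes from somewhere you don't control, a quick filter can weed out the
tiny words first. This is only a sketch: drop_short_words is a made-up helper, and the
cutoff of 3 letters is my assumption, not a rule built into the code.

```python
def drop_short_words(entries, min_length=3):
    """Keep only (word, clue) pairs whose word has at least `min_length` letters."""
    return [(word, clue) for word, clue in entries
            if len(word.replace(' ', '')) >= min_length]

entries = [('an', 'Indefinite article.'),
           ('or', 'A conjunction.'),
           ('coral', 'A rock-like deposit of organism skeletons that make up reefs.')]
filtered = drop_short_words(entries)
print([word for word, clue in filtered])  # ['coral']
```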

You can feed the Crossword class a list of Word instances, or a list of tuples or lists with
the word and clue. Either way works.

When you call the compute_crossword(seconds) method, it does all the work of computing
the best crossword it can find in however many seconds you passed. 1 second is probably
enough for crossword grids of less than 20×20, 2 seconds is fine for 25×25, and 3 seconds is
good for 30×30. Additionally, if you have a massive word list, you may want to double the time
allotted. Finally, if you can’t run psyco, quadruple these times for similar quality.
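
Those rules of thumb can be folded into a small helper if you'd rather not think about it.
This is a sketch only: suggested_seconds is a hypothetical function, and how it treats grid
sizes between or beyond the ones mentioned above (20 to 24, or larger than 30) is my own
guess.

```python
def suggested_seconds(cols, rows, massive_word_list=False, have_psyco=True):
    """Rough time budget in seconds to pass to compute_crossword."""
    size = max(cols, rows)
    if size < 20:
        seconds = 1.0
    elif size <= 25:   # assumption: 20 to 25 gets the 25x25 figure
        seconds = 2.0
    else:              # assumption: anything larger gets the 30x30 figure
        seconds = 3.0
    if massive_word_list:
        seconds *= 2   # double for a massive word list
    if not have_psyco:
        seconds *= 4   # quadruple without psyco
    return seconds

print(suggested_seconds(13, 13))                    # 1.0
print(suggested_seconds(30, 30, have_psyco=False))  # 12.0
```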

The Code:
import random, re, time, string
from copy import copy as duplicate

# optional, speeds up by a factor of 4
try:
    import psyco
    psyco.full()
except ImportError:
    pass # psyco is not installed; everything still runs, just slower

class Crossword(object):
    def __init__(self, cols, rows, empty = '-', maxloops = 2000, available_words=[]):
        self.cols = cols
        self.rows = rows
        self.empty = empty
        self.maxloops = maxloops
        self.available_words = available_words
        self.randomize_word_list()
        self.current_word_list = []
        self.debug = 0
        self.clear_grid()

    def clear_grid(self): # initialize grid and fill with empty character
        self.grid = []
        for i in range(self.rows):
            ea_row = []
            for j in range(self.cols):
                ea_row.append(self.empty)
            self.grid.append(ea_row)

    def randomize_word_list(self): # also resets words and sorts by length
        temp_list = []
        for word in self.available_words:
            if isinstance(word, Word):
                temp_list.append(Word(word.word, word.clue))
            else:
                temp_list.append(Word(word[0], word[1]))
        random.shuffle(temp_list) # randomize word list
        temp_list.sort(key=lambda i: len(i.word), reverse=True) # sort by length
        self.available_words = temp_list

    def compute_crossword(self, time_permitted = 1.00, spins=2):
        time_permitted = float(time_permitted)

        count = 0
        copy = Crossword(self.cols, self.rows, self.empty, self.maxloops, self.available_words)

        start_full = float(time.time())
        # only run for x seconds (but always at least once)
        while (float(time.time()) - start_full) < time_permitted or count == 0:
            self.debug += 1
            copy.current_word_list = []
            copy.clear_grid()
            copy.randomize_word_list()
            x = 0
            while x < spins: # spins; 2 seems to be plenty
                for word in copy.available_words:
                    if word not in copy.current_word_list:
                        copy.fit_and_add(word)
                x += 1
            #print copy.solution()
            #print len(copy.current_word_list), len(self.current_word_list), self.debug
            # buffer the best crossword by comparing placed words
            if len(copy.current_word_list) > len(self.current_word_list):
                self.current_word_list = copy.current_word_list
                self.grid = copy.grid
            count += 1
        return

    def suggest_coord(self, word):
        count = 0
        coordlist = []
        glc = -1
        for given_letter in word.word: # cycle through letters in word
            glc += 1
            rowc = 0
            for row in self.grid: # cycle through rows
                rowc += 1
                colc = 0
                for cell in row: # cycle through letters in rows
                    colc += 1
                    if given_letter == cell: # check match letter in word to letters in row
                        try: # suggest vertical placement
                            # make sure we're not suggesting a starting point off the grid
                            if rowc - glc > 0:
                                # make sure word doesn't go off of grid
                                if ((rowc - glc) + word.length) <= self.rows:
                                    coordlist.append([colc, rowc - glc, 1, colc + (rowc - glc), 0])
                        except: pass
                        try: # suggest horizontal placement
                            # make sure we're not suggesting a starting point off the grid
                            if colc - glc > 0:
                                # make sure word doesn't go off of grid
                                if ((colc - glc) + word.length) <= self.cols:
                                    coordlist.append([colc - glc, rowc, 0, rowc + (colc - glc), 0])
                        except: pass
        # example: coordlist[0] = [col, row, vertical, col + row, score]
        #print word.word
        #print coordlist
        new_coordlist = self.sort_coordlist(coordlist, word)
        #print new_coordlist
        return new_coordlist

    def sort_coordlist(self, coordlist, word): # give each coordinate a score, then sort
        new_coordlist = []
        for coord in coordlist:
            col, row, vertical = coord[0], coord[1], coord[2]
            coord[4] = self.check_fit_score(col, row, vertical, word) # checking scores
            if coord[4]: # 0 scores are filtered
                new_coordlist.append(coord)
        random.shuffle(new_coordlist) # randomize coord list; why not?
        new_coordlist.sort(key=lambda i: i[4], reverse=True) # put the best scores first
        return new_coordlist

    # doesn't really check fit except for the first word; otherwise just adds if score is good
    def fit_and_add(self, word):
        fit = False
        count = 0
        coordlist = self.suggest_coord(word)

        while not fit and count < self.maxloops:

            if len(self.current_word_list) == 0: # this is the first word: the seed
                # top left seed of longest word yields best results (maybe override)
                vertical, col, row = random.randrange(0, 2), 1, 1
                '''
                # optional center seed method, slower and less keyword placement
                if vertical:
                    col = int(round((self.cols + 1)/2, 0))
                    row = int(round((self.rows + 1)/2, 0)) - int(round((word.length + 1)/2, 0))
                else:
                    col = int(round((self.cols + 1)/2, 0)) - int(round((word.length + 1)/2, 0))
                    row = int(round((self.rows + 1)/2, 0))
                # completely random seed method
                col = random.randrange(1, self.cols + 1)
                row = random.randrange(1, self.rows + 1)
                '''

                if self.check_fit_score(col, row, vertical, word):
                    fit = True
                    self.set_word(col, row, vertical, word, force=True)
            else: # subsequent words have scores calculated
                try:
                    col, row, vertical = coordlist[count][0], coordlist[count][1], coordlist[count][2]
                except IndexError: return # no more coordinates, stop trying to fit

                if coordlist[count][4]: # already filtered these out, but double check
                    fit = True
                    self.set_word(col, row, vertical, word, force=True)

            count += 1
        return

    def check_fit_score(self, col, row, vertical, word):
        '''
        Check whether the word fits at the given start and return a score
        (0 signifies no fit). 1 means a fit, 2+ means a cross.

        The more crosses the better.
        '''
        if col < 1 or row < 1:
            return 0

        # give score a standard value of 1, will override with 0 if collisions detected
        count, score = 1, 1
        for letter in word.word:
            try:
                active_cell = self.get_cell(col, row)
            except IndexError:
                return 0

            if active_cell == self.empty or active_cell == letter:
                pass
            else:
                return 0

            if active_cell == letter:
                score += 1

            if vertical:
                # check surroundings
                if active_cell != letter: # don't check surroundings if cross point
                    if not self.check_if_cell_clear(col+1, row): # check right cell
                        return 0

                    if not self.check_if_cell_clear(col-1, row): # check left cell
                        return 0

                if count == 1: # check top cell only on first letter
                    if not self.check_if_cell_clear(col, row-1):
                        return 0

                if count == len(word.word): # check bottom cell only on last letter
                    if not self.check_if_cell_clear(col, row+1):
                        return 0
            else: # else horizontal
                # check surroundings
                if active_cell != letter: # don't check surroundings if cross point
                    if not self.check_if_cell_clear(col, row-1): # check top cell
                        return 0

                    if not self.check_if_cell_clear(col, row+1): # check bottom cell
                        return 0

                if count == 1: # check left cell only on first letter
                    if not self.check_if_cell_clear(col-1, row):
                        return 0

                if count == len(word.word): # check right cell only on last letter
                    if not self.check_if_cell_clear(col+1, row):
                        return 0

            if vertical: # progress to next letter and position
                row += 1
            else: # else horizontal
                col += 1

            count += 1

        return score

    def set_word(self, col, row, vertical, word, force=False): # also adds word to word list
        if force:
            word.col = col
            word.row = row
            word.vertical = vertical
            self.current_word_list.append(word)

            for letter in word.word:
                self.set_cell(col, row, letter)
                if vertical:
                    row += 1
                else:
                    col += 1
        return

    def set_cell(self, col, row, value): # col and row are 1-based
        self.grid[row-1][col-1] = value

    def get_cell(self, col, row): # col and row are 1-based
        return self.grid[row-1][col-1]

    def check_if_cell_clear(self, col, row):
        # Cells past the right or bottom edge raise IndexError and count as not
        # clear. A col or row of 0 does not raise: Python's negative indexing
        # wraps it around to the last column or row instead.
        try:
            cell = self.get_cell(col, row)
            if cell == self.empty:
                return True
        except IndexError:
            pass
        return False

    def solution(self): # return solution grid
        outStr = ""
        for r in range(self.rows):
            for c in self.grid[r]:
                outStr += '%s ' % c
            outStr += '\n'
        return outStr

    def word_find(self): # return grid with empty cells filled by random letters
        outStr = ""
        for r in range(self.rows):
            for c in self.grid[r]:
                if c == self.empty:
                    outStr += '%s ' % random.choice(string.ascii_lowercase)
                else:
                    outStr += '%s ' % c
            outStr += '\n'
        return outStr

    def order_number_words(self): # orders words and applies numbering system to them
        self.current_word_list.sort(key=lambda i: (i.col + i.row))
        count, icount = 1, 1
        for word in self.current_word_list:
            word.number = count
            if icount < len(self.current_word_list):
                # words that share a starting cell share a number
                if word.col == self.current_word_list[icount].col and word.row == self.current_word_list[icount].row:
                    pass
                else:
                    count += 1
            icount += 1

    # return (and order/number wordlist) the grid minus the words adding the numbers
    def display(self, order=True):
        outStr = ""
        if order:
            self.order_number_words()

        # work on a copy of the grid so the numbers don't overwrite the solution
        grid = [list(row) for row in self.grid]

        for word in self.current_word_list:
            grid[word.row-1][word.col-1] = word.number

        for row in grid:
            for c in row:
                outStr += '%s ' % c
            outStr += '\n'

        outStr = re.sub(r'[a-z]', ' ', outStr)
        return outStr

    def word_bank(self):
        outStr = ''
        temp_list = duplicate(self.current_word_list)
        random.shuffle(temp_list) # randomize word list
        for word in temp_list:
            outStr += '%s\n' % word.word
        return outStr

    def legend(self): # must order first
        outStr = ''
        for word in self.current_word_list:
            outStr += '%d. (%d,%d) %s: %s\n' % (word.number, word.col, word.row, word.down_across(), word.clue)
        return outStr

class Word(object):
    def __init__(self, word=None, clue=None):
        self.word = re.sub(r'\s', '', word.lower())
        self.clue = clue
        self.length = len(self.word)
        # the below are set when placed on board
        self.row = None
        self.col = None
        self.vertical = None
        self.number = None

    def down_across(self): # return down or across
        if self.vertical:
            return 'down'
        else:
            return 'across'

    def __repr__(self):
        return self.word

### end class, start execution

#start_full = float(time.time())

word_list = [
    ['saffron', 'The dried, orange yellow plant used to as dye and as a cooking spice.'],
    ['pumpernickel', 'Dark, sour bread made from coarse ground rye.'],
    ['leaven', 'An agent, such as yeast, that cause batter or dough to rise..'],
    ['coda', 'Musical conclusion of a movement or composition.'],
    ['paladin', 'A heroic champion or paragon of chivalry.'],
    ['syncopation', 'Shifting the emphasis of a beat to the normally weak beat.'],
    ['albatross', 'A large bird of the ocean having a hooked beek and long, narrow wings.'],
    ['harp', 'Musical instrument with 46 or more open strings played by plucking.'],
    ['piston', 'A solid cylinder or disk that fits snugly in a larger cylinder and moves under pressure as in an engine.'],
    ['caramel', 'A smooth chery candy made from suger, butter, cream or milk with flavoring.'],
    ['coral', 'A rock-like deposit of organism skeletons that make up reefs.'],
    ['dawn', 'The time of each morning at which daylight begins.'],
    ['pitch', 'A resin derived from the sap of various pine trees.'],
    ['fjord', 'A long, narrow, deep inlet of the sea between steep slopes.'],
    ['lip', 'Either of two fleshy folds surrounding the mouth.'],
    ['lime', 'The egg-shaped citrus fruit having a green coloring and acidic juice.'],
    ['mist', 'A mass of fine water droplets in the air near or in contact with the ground.'],
    ['plague', 'A widespread affliction or calamity.'],
    ['yarn', 'A strand of twisted threads or a long elaborate narrative.'],
    ['snicker', 'A snide, slightly stifled laugh.'],
]

a = Crossword(13, 13, '-', 5000, word_list)
a.compute_crossword(2)
# print(...) with a single argument behaves the same on Python 2 and Python 3
print(a.word_bank())
print(a.solution())
print(a.word_find())
print(a.display())
print(a.legend())
print('%d out of %d' % (len(a.current_word_list), len(word_list)))
print(a.debug)
#end_full = float(time.time())
#print end_full - start_full

Sample output:

You should be able to see the associated methods lining up with the output. A side note:
you must run the display() method before the legend() method can be run, because display()
assigns the word numbers that legend() prints.

mist
lime
snicker
paladin
caramel
leaven
pumpernickel
coral
fjord
plague
piston
lip
dawn
saffron
coda

p u m p e r n i c k e l -
a - - - - - - - a - - e -
l - s n i c k e r - - a -
a - a - - - - - a - - v -
d - f - c - - - m - - e -
i - f j o r d - e - - n -
n - r - d - - - l i p - -
- c o r a l - - - - i - -
- - n - - i - - - - s - -
- - - - - m i s t - t - -
p l a g u e - - - - o - -
- - - - - - - d a w n - -
- - - - - - - - - - - - -

p u m p e r n i c k e l v
a w j m p c a y a w r e s
l f s n i c k e r b z a x
a f a z k e u i a b f v k
d x f v c j f d m c n e x
i d f j o r d z e j g n z
n r r x d j a o l i p d j
i c o r a l u s t o i x w
m r n u e i i h o t s y w
m w e x s m i s t r t u j
p l a g u e b n h k o m s
f m n v j f p d a w n c q
m h j a e d p p r g t p j

1               4     8 -
  - - - - - - -   - -   -
  - 2             - -   -
  -   - - - - -   - -   -
  -   - 6 - - -   - -   -
  - 3         -   - -   -
  -   -   - - - 10   12 - -
- 5       9 - - - -   - -
- -   - -   - - - -   - -
- - - - - 11       -   - -
7           - - - -   - -
- - - - - - - 13       - -
- - - - - - - - - - - - -

1. (1,1) across: Dark, sour bread made from coarse ground rye.
1. (1,1) down: A heroic champion or paragon of chivalry.
2. (3,3) across: A snide, slightly stifled laugh.
2. (3,3) down: The dried, orange yellow plant used to as dye and as a
cooking spice.
3. (3,6) across: A long, narrow, deep inlet of the sea between steep
slopes.
4. (9,1) down: A smooth chery candy made from suger, butter, cream or
milk with flavoring.
5. (2,8) across: A rock-like deposit of organism skeletons that make
up reefs.
6. (5,5) down: Musical conclusion of a movement or composition.
7. (1,11) across: A widespread affliction or calamity.
8. (12,1) down: An agent, such as yeast, that cause batter or dough to
rise..
9. (6,8) down: The egg-shaped citrus fruit having a green coloring and
acidic juice.
10. (9,7) across: Either of two fleshy folds surrounding the mouth.
11. (6,10) across: A mass of fine water droplets in the air near or in
contact with the ground.
12. (11,7) down: A solid cylinder or disk that fits snugly in a larger
cylinder and moves under pressure as in an engine.
13. (8,12) across: The time of each morning at which daylight begins.

15 out of 20
811
