Creating 2500 Challenges
The Inside Story of How We Created 2500 Great Rush Hour Challenges.
By ThinkFun CEO Bill Ritchie
This is a tale about how the Rush Hour challenges were created. The story also provides a great demonstration of how much the world has changed from 1996 to 2009.
Nob Yoshigahara and his NOBrain Corps created a total of 160 Rush Hour challenges, which ThinkFun published in four challenge decks. When the last challenge deck went to press, Nob said, “There are no more good challenges.” At the time, he was right. Nob was a puzzle genius with a mind as thorough as it was creative. He and his team made sure that each challenge was developed from an underlying idea: a trick, a trap, or a misdirect. They built challenges by hand, then used computers to analyze the shortest solution path and make sure they hadn’t overlooked anything. For Nob to find 160 different ideas really was a remarkable feat.
Sometime in 2008 I mentioned this to my puzzle expert friend Mark Engelberg, who works freelance as a computer researcher for us. “Oh really?” he said.
Within months we were involved in conversations about how to program the ability to generate challenges to rival hand-built ones. There is a real art to creating a Rush Hour challenge, to make it clever and different and fun to play. Just pushing your way through a thicket of cars isn’t fun… a good Rush Hour challenge needs to have tricks and traps, and your next challenge needs to be at about the same challenge level, with its own tricks and traps.
Computer speed, capacity, and experience have increased by orders of magnitude over the past decade, and Mark relishes challenges like the one Nob had laid down. He wanted to write a program to generate new challenges, then write a filter to sift out the good ones and sort them by difficulty. I was eager to let him do it.
We started by loading the original Rush Hour puzzles into Mark’s Flash version of the game and playing through them, testing his solver program and just having fun. It quickly became apparent that playing a single challenge until you mastered it, meaning you got a perfect score, was just as compelling as playing through several challenges with simply freeing the Red Car as your goal. So we began with a real sense of how Rush Hour should be played online.
From there Mark learned how to analyze challenges. He explained that you can picture all the possible moves in a Rush Hour puzzle as a tree that branches out from the initial position. Modern computers can search this tree exhaustively in a fraction of a second, but humans solve Rush Hour puzzles with a mixture of intuition, logic, and trial and error. So to judge whether a puzzle would be fun and interesting to a human, he needed to make the computer solve puzzles less like a computer and more like a human. The metaphor he used for the first phase of this process was “pruning the tree.” As he described it in an email: “So right now, I’m trying to think up various ways to ‘prune’ the tree so that a pattern emerges. I’ve already had to prune out all the various long-winded and cyclic ways you can get to certain nodes. Did I prune away something interesting? That’s a question I need to think about. Clearly some of these paths lead nowhere useful. Can I prune those away safely? Most importantly, can I prune the tree in such a way that the paths that are left are the ones that humans are likely to find? So I need to think about the psychology of solving these puzzles, and see if I can prune in a way that matches this psychology.”
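To make the idea of searching the move tree concrete, here is a minimal Python sketch of an exhaustive breadth-first search over Rush Hour positions. It is an illustration only, not Mark’s solver: the board encoding, the function names, and the choice to count every one-cell slide as a move are all assumptions made for this example. The set of already-seen positions plays the role of pruning away the “long-winded and cyclic” routes to a position.

```python
from collections import deque

SIZE = 6  # assumed: the standard 6x6 Rush Hour grid

# Assumed encoding: a car is (row, col, length, horizontal), where (row, col)
# is its top-left cell. Car 0 is the red car, which escapes when its right
# end reaches the right edge of the grid.

def cells(car):
    r, c, n, horiz = car
    return [(r, c + i) if horiz else (r + i, c) for i in range(n)]

def moves(state):
    """Yield every position reachable by sliding one car by one cell."""
    occupied = {cell for car in state for cell in cells(car)}
    for i, (r, c, n, horiz) in enumerate(state):
        for step in (-1, 1):
            # `new` is the single cell the car would newly cover.
            if horiz:
                new = (r, c + n) if step == 1 else (r, c - 1)
                moved = (r, c + step, n, horiz)
            else:
                new = (r + n, c) if step == 1 else (r - 1, c)
                moved = (r + step, c, n, horiz)
            if 0 <= new[0] < SIZE and 0 <= new[1] < SIZE and new not in occupied:
                yield state[:i] + (moved,) + state[i + 1:]

def solved(state):
    r, c, n, horiz = state[0]
    return c + n == SIZE

def shortest_solution(start):
    """Breadth-first search: fewest one-cell slides to free the red car, or None."""
    seen = {start}  # positions already reached by a path at least as short
    queue = deque([(start, 0)])
    while queue:
        state, depth = queue.popleft()
        if solved(state):
            return depth
        for nxt in moves(state):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, depth + 1))
    return None

# A toy position: the red car on the left, one vertical truck blocking its row.
puzzle = ((2, 0, 2, True), (0, 3, 3, False))
print(shortest_solution(puzzle))
```

A search like this always finds the shortest path, which is exactly why it says little about how the puzzle feels to a person; that gap is what the pruning work is meant to address.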
I was looking especially for challenges that were simple to solve, but where the best solution was tricky to find. I sent Mark a couple of examples of what I was looking for and asked, “Can you make more puzzles like this?” He studied the sample puzzles and responded, “It seems that what we need to do is look for puzzles that have at least two substantially different solutions, one subtly better than the other.” Mark adjusted his pruning algorithms specifically to look for these “Two Fork” puzzles, and found that it worked. “I applied a graphing technique to your two specimen puzzles. Lo and behold, a pattern emerges! Your two puzzles both have noticeably different paths.” This was a big development! It meant that we could pick out challenges with this specific characteristic from a large batch of computer-generated challenges, and that if we could define other desirable traits, we could sort for those too.
By this time Mark was ready to turn to challenge creation. “I think what I’ve got is sufficiently good that it’s time to move to the next phase, which is to build the part of the program that generates challenges.” And in a message shortly after… “This new spreadsheet contains 43,296 puzzles which I believe to be unique.”
Along with the puzzles themselves, Mark included a set of statistical measures for each puzzle, which were very useful for evaluating them. Every one of these challenges had been through a rough sort… they all had something going for them, they weren’t random. But we wanted to be able to accurately sort the puzzles into difficulty levels that matched the categories established by Nob’s original puzzle set. This Mark did next.
He coded a “virtual player” that simulated a playtesting session by a human player with the ability to see three moves ahead. He then enhanced the player with the ability to seek out new positions and avoid board configurations that it had already seen. In effect, the player has a sense of memory.
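As a rough illustration of what a player with limited lookahead and memory might look like, here is a small Python sketch. It is a guess at the general idea, not Mark’s virtual player: the board encoding, the names `distance_within` and `play`, the one-cell-slide move count, and the tie-breaking rule are all assumptions made for this example. The board helpers are included so the block runs on its own.

```python
SIZE = 6  # assumed: the standard 6x6 Rush Hour grid

# Assumed encoding: a car is (row, col, length, horizontal); car 0 is red.

def cells(car):
    r, c, n, horiz = car
    return [(r, c + i) if horiz else (r + i, c) for i in range(n)]

def moves(state):
    """Yield every position reachable by sliding one car by one cell."""
    occupied = {cell for car in state for cell in cells(car)}
    for i, (r, c, n, horiz) in enumerate(state):
        for step in (-1, 1):
            if horiz:
                new = (r, c + n) if step == 1 else (r, c - 1)
                moved = (r, c + step, n, horiz)
            else:
                new = (r + n, c) if step == 1 else (r - 1, c)
                moved = (r + step, c, n, horiz)
            if 0 <= new[0] < SIZE and 0 <= new[1] < SIZE and new not in occupied:
                yield state[:i] + (moved,) + state[i + 1:]

def solved(state):
    r, c, n, horiz = state[0]
    return c + n == SIZE

def distance_within(state, limit):
    """Slides to the nearest solved position, or None if it is beyond `limit`."""
    frontier, seen = [state], {state}
    for dist in range(limit + 1):
        if any(solved(s) for s in frontier):
            return dist
        nxt = []
        for s in frontier:
            for t in moves(s):
                if t not in seen:
                    seen.add(t)
                    nxt.append(t)
        frontier = nxt
    return None

def play(start, lookahead=3, max_steps=1000):
    """Simulate a player; return the number of slides used, or None if it gives up."""
    state, visited = start, {start}
    for step in range(max_steps):
        if solved(state):
            return step
        options = list(moves(state))
        if not options:
            return None
        # Lookahead: if the exit is in sight, take the move that gets closest.
        scored = [(distance_within(s, lookahead - 1), s) for s in options]
        in_sight = [(d, s) for d, s in scored if d is not None]
        if in_sight:
            state = min(in_sight, key=lambda pair: pair[0])[1]
        else:
            # Memory: otherwise prefer a position not visited before.
            fresh = [s for s in options if s not in visited]
            state = (fresh or options)[0]
        visited.add(state)
    return None

puzzle = ((2, 0, 2, True), (0, 3, 3, False))
print(play(puzzle))
```

Because this player wanders until the exit comes within its horizon, it can never use fewer slides than the true shortest solution and will often use more. The gap between the two counts is one plausible signal of how hard a puzzle feels.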
The simulated playtesting worked. Using these and other techniques, Mark was able to narrow the challenge list to 12,000 puzzles and then to 2,500 challenges specifically selected for the iPhone. The challenges have been tested by computer and sampled by us at ThinkFun and by our friends. We haven’t played them all, but from what we have played, they’re great!
Even though we have finished selecting the puzzles for the iPhone, Mark’s research continues as he studies how people solve Rush Hour puzzles and looks for new and improved ways to model this solving process on the computer. “Next, I want to investigate a ‘backward chaining’ approach which simulates the way people work backwards from the goal by using logic to analyze which cars have to be moved to get the red car out.”
We shall see… stay tuned!