This past weekend was the annual MIT Mystery Hunt, one of the largest of the events known as puzzle hunts. The Mystery Hunt has a tradition where the team that wins the hunt writes the hunt for the next year. As I was on the winning team in 2014 (known as One Fish Two Fish Random Fish Blue Fish), I was part of writing the hunt in 2015. That means that the 2016 hunt was a return to solving after two years away instead of just one. Our hunt last year was won by Luck, I Am Your Father (or Luck, for short), so they were writing this year's.
In addition to being relatively out of practice, our team had gone through large changes. In 2014, our team won in no small part because we had grown to an absurd size. That put a lot of strain on us in writing, both because many of the team members who had helped us win had little interest in writing, and because as teams get bigger in general, the hunt needs to get longer to compensate, which means the task of writing the hunt grows as well. Going into this year's hunt, we decided to split the team in order to not accidentally win again (a lot of us still haven't fully recovered from the stress of writing) and (at least for me) to support the idea that teams should manage their size to prevent so-called "superteams" from defining the nature of the Hunt.
I felt that our split went rather well. We mostly defined the split by having the people who were most active in writing the hunt on one team, and having the people who were more there for the social ties to Random on the other team. The writing "half" ended up a little bit bigger and stronger this year, but since the Random team will grow more over time, that seems like a good outcome. Our team chose the name Hunches in Bunches, keeping with the Dr. Seuss theme from two years earlier.
The hunt this year started off with a kickoff that told us we were at the MIT "Muttstery Hunt" before things started getting weird. I didn't understand much of what happened at kickoff, so for the rest of the weekend I ignored pretty much all of the plot. This was disappointing to me since kickoff is often one of the things I look forward to the most.
The first round of the hunt was themed around a dog show. Once we got most of the way through the puzzles in that round, it was revealed that the hunt was based on Inception, under the name Huntception, and that our goal was to wake a number of sleepers, the first of which was Rip Van Winkle.
Two years ago, I described myself as caring almost entirely about the "mechanics" of hunt, meaning the puzzles themselves. While I still certainly care about the puzzles more than the plot, my experience this year really pointed out to me how much the plot can affect how much I enjoy the overall experience.
This year's plot was based around Inception and seemed to draw the majority of its ideas directly from the movie. I feel that this strategy of taking an existing work of art and matching its plot closely is a mistake. First and foremost, by basing the Hunt so closely on that work, the plot becomes mostly a collection of references to it, and as a result it gets lost on anyone who isn't familiar with the source material. I felt this effect strongly with the 2012 hunt as well, which was based on the movie and musical The Producers, which I had never even heard of until that Hunt.
Another aspect of this year's plot that I disliked was that it felt very monotonous. The main hunt page was simply a page with "HUNTCEPTION" in stylized letters, and each letter pointed to a round of puzzles.
This is in stark contrast to what I feel was one of the best innovations in the hunt that we wrote. Our hunt index page had every single puzzle on it, which meant that hunters didn't need to search through the various round pages in order to find a particular puzzle. Additionally, it provided a strong sense of exploration, as teams started in the workshop at the top, and as they solved more puzzles they gained access to deeper and deeper parts of the ocean.
The round pages themselves had some flavor to them, but again they were mostly a list of puzzles with an image of whatever sleeper happened to be relevant to that round. The presentation of the puzzles themselves differed very little from round to round.
To illustrate, compare a puzzle from the Rip Van Winkle round to a puzzle from the Sleeping Beauty round. These are very different puzzles, but they're presented in the same way. The differences are limited to a different background image that is only visible around the edges, and a different image at the top left of the puzzle.
Now look at the 2013 hunt, where we can compare a puzzle from the Danny Ocean round to a puzzle from the Marty Bishop round. The difference in round here is extremely noticeable. The backgrounds change drastically, the stylization of the puzzle titles is different, and even the font of the text within the puzzles is different.
When we were writing the hunt, one of the proposals was a Lovecraftian Horror and Dr. Seuss mixup. I feel like had we gone with that theme, we would have fallen into many of these same pitfalls. The themes that I argued for were very loose themes that gave us as writers a lot of freedom so that we could build up a compelling world that made sense for our story, rather than trying to mold our story to fit an existing world. I think this was one of the best decisions we made, and I would strongly urge teams in the future to do the same.
Shortly after kickoff, we were informed by Luck that they had encountered significant technical issues with their server, and as a result the beginning of hunt was delayed by about an hour. That on its own wouldn't be too bad, but the tech failure meant that a lot of other aspects of hunt were changed as well. I don't blame Luck for running into last minute tech issues, and they did a good job of being able to run the hunt in spite of them, but I did feel like it lessened how much I enjoyed the hunt, and it's worth exploring why that was.
One result of the technical issues was that the website was not equipped with the ability for teams to log in. The solution to this was two-fold. One, each round would come with all of its puzzles already unlocked. Two, Luck created a number of distinct subdomains that each had a different number of rounds unlocked, so unlocking a round for a team meant that they would send out an email with a link to the new subdomain.
This solution came with a number of downsides.
- In order to see all the rounds you unlocked, you had to make sure you were using the proper subdomain. If you were on an old subdomain and went to the main HUNTCEPTION page, you would see fewer rounds unlocked than your team actually had.
- Answer submission was through a Google form instead of a specialized page on the website, meaning that we had to select our team and enter a "password" in order to verify that the answer was actually from us.
- Solved puzzles did not change in appearance on the various round pages.
- The mechanism for unlocking a round was extremely opaque.
Overall, these were relatively minor annoyances. They were enough, however, for me to try to interact with the hunt website as little as possible. Instead, I focused on the team's solving software, which indexed the puzzles by round and clearly marked which puzzles had been solved, and had links to the relevant puzzle pages.
I feel that the biggest downside to all of this was that each of the rounds came with all of its puzzles unlocked immediately. This is a topic with enough to talk about that it really warrants its own section.
I want to preface this by once again saying that I recognize that the unlock structure of this year's hunt was a side effect of the technical failures, and not a conscious design decision by Luck. However, I think it's a good example to use in exploring what constitutes a fun unlock structure.
One of the first major differences between this year's hunt and previous hunts came right at the start. The dog show round was released with 20 puzzles and 5 metas. It could hardly be more different from the 2014 hunt that started with a mere 3 puzzles.
I felt that 2014 started with too few puzzles and unlocked puzzles too slowly, but if I were to choose between these two options, I would take the "too few" option in a heartbeat. Starting with 20 puzzles is fine for a superteam, where even splitting 20 ways leaves several people per puzzle, but Hunt shouldn't be dominated by these superteams, and so having a more narrow start is preferable.
In designing an unlock structure, there are two metrics that I pay the most attention to: working set size and branching factor. Working set size is the number of puzzles that are simultaneously available to solve: in other words, the number of unlocked puzzles minus the number of solved puzzles. Branching factor is the number of puzzles that become unlocked simultaneously, usually after solving a puzzle.
These two metrics have a significant impact on how it feels to progress. This doesn't apply to just Hunt, but is also apparent in puzzle games. SpaceChem and Snakebird are two puzzle games that I think are both extremely well made, and are known for being very difficult. However, their unlock structures are very different, and as a result they feel very different to play.
SpaceChem's unlock structure starts with a linear sequence of worlds where you need to solve all of the puzzles in a world in order to unlock the next (sometimes there is an optional puzzle that is not required, but players have no choice in which puzzles are required). Within a world, there are usually 5-7 puzzles laid out in graph form, where you'll generally have between 1 and 3 puzzles unlocked. So the working set size stays between 1 and 3, and the branching factor stays between 0 and 2.
This unlock structure makes SpaceChem almost entirely linear, and it means that if you get stuck on a puzzle, often you have no choice but to keep thinking about it until you solve it. I've heard SpaceChem described as an oppressive game, which I think is largely caused by the fact that when you get stuck, there isn't another puzzle to go think about to take a break.
Snakebird's unlock structure is a single connected world, where solving a puzzle unlocks its neighboring puzzles, until the end at which point there are six puzzles on clouds that all need to be solved to unlock the final puzzle. The effect of this structure is that the branching factor goes up as high as 3, and while the working set size starts at 1, it quickly grows to about 10 and stays there for the majority of the game.
The effect of Snakebird's unlock structure is that if you ever get stuck on a puzzle (with the exception of near the end of the game when you're trying to solve the last few), you always have the option of stepping back and taking a look at a different puzzle. Since Snakebird's puzzles can often be quite difficult, this allows the player to effectively take a break without needing to step away from the game entirely. Snakebird also numbers its puzzles, which provides the player a guide if they want to do puzzles in roughly ascending order of difficulty.
This year's hunt was on the complete opposite end of the spectrum, with a working set size that started at 20 and mostly grew from there, with a branching factor that was 0 most of the time, with bursts of 10 or more (whenever a round unlocked). The effect of this is that teams get spread quite thinly, and that whenever there's a new unlock it's difficult to pick the next puzzle that you want to work on, because of how many choices there are. While this is something that can be solved by the teams themselves by having discipline in staying in larger solving groups, starting the working set small and growing it to about 20 is a good way for writing teams to ensure that small teams don't get overwhelmed by the number of puzzles available.
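To make these two metrics concrete, here's a minimal sketch (in Python, with made-up puzzle names and unlock graphs of my own, not from any actual hunt) of how one might track working set size and branching factor over a solve sequence:

```python
def simulate(unlocks, initial, solve_order):
    """Track (working set size, branching factor) after each solve.

    unlocks: dict mapping each puzzle to the puzzles it unlocks on solve
    initial: puzzles available at the start of the hunt
    solve_order: the order in which the team solves puzzles
    """
    open_set = set(initial)
    history = []
    for puzzle in solve_order:
        open_set.discard(puzzle)
        newly = [p for p in unlocks.get(puzzle, []) if p not in open_set]
        open_set.update(newly)
        history.append((len(open_set), len(newly)))
    return history

# A narrow, linear structure (SpaceChem-like): each solve unlocks one puzzle.
linear = {"p1": ["p2"], "p2": ["p3"], "p3": []}
print(simulate(linear, ["p1"], ["p1", "p2", "p3"]))  # [(1, 1), (1, 1), (0, 0)]

# A wide structure (like this year's opening round): everything open at once.
wide = {q: [] for q in ["q1", "q2", "q3"]}
print(simulate(wide, ["q1", "q2", "q3"], ["q1", "q2", "q3"]))  # [(2, 0), (1, 0), (0, 0)]
```

In the linear structure the working set never exceeds 1, so a stuck team has nothing else to look at; in the wide structure the working set starts at the full round size and only shrinks, which is exactly the spread-too-thin feeling described above.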
I thought that the puzzles themselves were overall well-written this year, but after reading the solutions of the puzzles that our team got stuck on, the remaining steps of those puzzles were generally disappointing. One goal that I would recommend to future puzzle writers is to try as hard as possible to make it so that if a solver is stuck, it's on the fun part of the puzzle. In other words, the reaction upon hearing the solution should be "Oh, that's clever!" instead of "Oh, that's dumb." The most disappointing solves are the ones where you stare at a spreadsheet for hours after doing data collection without seeing anything, until someone miraculously comes up with the proper indexing operation to do. I'll touch on that a bit more as I go over the specific puzzles that I worked on. These will have some amount of spoilers in them, so now is a good time to stop reading if you plan to solve the puzzles on your own.
This was a pretty good puzzle to start the hunt with. The only real flaw I saw was that "X factor" sounds like a reference to the TV show, when in fact the puzzle has nothing to do with it. I liked the play on words with *kiss* being X, although the fact that Harvard, Kendall, Porter, Central, etc. are all squares was lost on me until I read the solution (we identified them simply as T stops). The other critique I would give this puzzle is that neither of the two lists appears to be sorted in any way, so after pairing them up it's ambiguous whether to use the order from the first column or from the second. These sorts of ambiguities are the things that get solvers stuck on the un-fun parts of puzzles.
Another great puzzle. We got stuck for a while after identifying the decompositions of the 5x5 grids into words, until someone convinced us to write all the words out and we noticed that they formed sensible phrases. The biggest inelegance I saw in this puzzle was the use of A=1, B=2, ... for indexing purposes. It's not obvious that the puzzle is better in this form than in one where the number is given explicitly, such as "A7" instead of "AG".
I don't think I have any complaints about this puzzle. The first step of decoding as a Vigenère cipher is natural enough that I don't think it needed to be clued any more than it was, and there are existing tools for getting approximate keys. Once one or two of the texts are decoded, the rest of the mechanics become clear. I liked the way everything fit together and the attention to detail, such as texts corresponding to songs in a major key being written in all caps and songs in a minor key in all lowercase.
Our team got stuck on this meta for an extremely long time. This is mostly on us, as we tried to read too hard into the ninepin theme of the round, while the actual mechanics of the meta were essentially all standard puzzle techniques. As one example, we had the idea to write out the puzzle answers in a diamond shape, but we used the order of pins as they are numbered in bowling (alternating sides front to back) rather than simply clockwise in a circle. Once we realized the correct solution, the meta felt pretty disappointing to me.
This was my second favorite puzzle of the hunt. "Identify, Sort, Index, Solve", or ISIS for short, is a type of puzzle that generally consists of four steps. First, you identify some set of objects. Second, you sort them by some piece of data. Third, you index into the names of the objects by some other piece of data. Fourth, you read off the answer. These are generally considered the most basic of Hunt puzzles, and while writing I tried to avoid them as much as possible.
This puzzle took that basic idea and threw in a twist, which is that most of the puzzle is straightforward, except for sorting, which is usually the easiest out of the four steps. Sorting the numbers in this puzzle involves learning about several different notations for writing large numbers, and then doing careful computations when trying to compare them. As an additional bonus, the identities of the images played on the "big number" theme.
This was a fun puzzle that played on the word "rose" a lot, with both "rows" and "rhos". We ended up getting stuck at the clue phrase for several hours because the name we got could have referred to a lot of different people, and we didn't notice the extra message spelled out in the clues. Overall a good puzzle, but one example of many somewhat ambiguous clue phrases present throughout the hunt.
I came in after the first step of this puzzle, where some of my teammates had realized that reading the first letter of each word in the story gave another sequence of words. We then just had to find the next few mechanics before getting the answer. Overall, not a bad puzzle, but nothing particularly amazing either.
This was one of the few puzzles that I actually worked on while in my hotel room rather than on campus. I heard about a Magic: The Gathering puzzle and so of course I wanted to look at it. Most of the data collection had been done by the time I got there and extraction wasn't particularly difficult. Again, not a bad puzzle, but not very exciting. Puzzles in the past have incorporated Magic into them in more interesting ways, such as Turnary Reasoning from the 2013 Hunt.
This was my favorite puzzle in the hunt, and I think the first "reverse engineering" style puzzle that I contributed to solving (50/50 was another puzzle in that style). The writers did a good job of making the puzzle resistant to code analysis, so while we did modify the code a little bit in order to make a programmatic interface and have the page display more digits of precision, we were still playing by all the intended rules.
What made this puzzle great was the journey to understanding exactly what the rules were for Quantum Minesweeper. At first, we thought that the cells were roughly independent, and then we found a pair of cells that caused us to die whenever we clicked on both in either order. Eventually, we found a set of 3 where you needed to click all 3 to die, and then finally arrived at the correct mechanic, at which point we could do the work to extract the answer.
We didn't get very far into solving this puzzle, but we had already made several conjectures that were straight up wrong. We pretty quickly realized that the pie charts were color compositions of flags, and found several of the German state flags, but then we ran into two issues. First, because of the large number of German states that start with B and are very close alphabetically, we were expecting the entire set of flags to be in alphabetical order. Second, we thought that we should only be looking for flags of areas that are specifically called states (due to the flavor text), when in actuality we were supposed to use prefectures, provinces, and so on as well.
If I were editing this puzzle, I would question whether it really adds anything to have all the chains of flags be in a single line. Adding line breaks makes the mechanics more clear and the difficulty added by having them run into each other felt rather artificial.
I had just started working on this when the coin was found and pretty much our entire team left to go to a reunion in Random, but I decided to keep working on it for a little bit longer. This puzzle has the unfortunate property that it's much more difficult to solve by hand than with a SAT solver. I didn't quite get the SAT solver right because I forgot some of the constraints, but the rest of our team spent about 7 hours and only made it through about 40%.
The extraction mechanism was also rather disappointing. I was expecting to slice the cube seven ways (either in the way it's already sliced or a different way) and read off letters. Instead, it ended up being an application of A=1, B=2, something I'll touch on again later.
I felt that this was one of the better puzzles from late in the hunt. It avoided any issues with getting stuck on extraction by having a sample grid with a clear extraction mechanism, and so the focus of the puzzle was entirely on solving the grids, which was decidedly the fun part.
This puzzle was one of the biggest offenders for getting solvers stuck on the unfun part. We relatively quickly found the values for each variable and plugged them into the formula to read off the message COMPOSITESPLUSONE. But then we got stuck for a very long time. We tried taking the composite-valued variables and adding one to them to plug back into the formula, but that didn't give anything. We also tried treating it as a cluephrase and called in answers like NONPRIMES, POSITIVENONPRIMES, and so on. We tried taking the composite-valued variables and the variable with value 1, but that didn't work. We didn't try translating those values to letters because one of the numbers was 4383347, which happens to be divisible by 1163 and 3769.
It turns out in the end that the answer was to take the composite-indexed variables (D, F, H, I, ...) and add one to the values before translating to letters. I talked to multiple teams who got stuck in this exact same spot. Some of them eventually found the intended meaning, but many didn't. It's possible that none of the testsolvers had trouble with this cluephrase, in which case Luck just had some bad luck, but in general I would expect this to show up in testsolving. If I saw this issue arise as an editor, I would suggest changing A, B, C, and so on to x1, x2, x3, and so on, so that the variables themselves are more clearly numbered. While it is true that the variables had numbers next to them (not something that would usually be necessary), it's very easy to dismiss that as redundant information and completely ignore it.
The other disappointing thing about this puzzle is the fact that the extraction was a straightforward A=1, B=2 extraction. The formula felt complicated enough that it could have been used a second time if the writers wanted to, so I'm disappointed that it wasn't.
A note on extraction mechanisms
That concludes the puzzles that I spent a significant amount of time on. Looking at the puzzles that our team got stuck on as a whole, I noticed a pattern: in a lot of cases we got stuck on a step that seemed to be intended as a simple extraction. Most of the time this happens because the solver believes they have tried a mechanic and that it gave gibberish, when it was actually the correct mechanic.
What might cause that? Generally, solvers will stop trying an extraction idea whenever they have a reasonable belief that they aren't extracting English. This will usually be due to a bad bigram or trigram, such as an extraction that starts with the letters MPT. If a solver is trying the correct extraction mechanic and getting bad letters, it is often because of an error somewhere earlier in the puzzle. Rather than forcing the solver to try all of their extraction ideas again each time they correct an error, puzzles should try to have error checking built in before the extraction step.
Another technique to reduce the amount of time that solvers get stuck is to try to write clue phrases that still look reasonable under small perturbations. One (half) clue phrase from this year was UTAHBALLTEAM. If something goes wrong and a team starts by seeing UTBHB, they are likely to stop before getting to the good part of the phrase. If you have the ability, prepending some non-clue-phrase English (even as simple as the word ANSWER) could be a good idea. Our writing team also had this issue. I remember someone telling me that they nearly stopped extracting letters when it started with SMSP (for the answer SMS POMMERN).
I also believe that answer extraction should either be dead simple, or just as elegant as the rest of the puzzle. There is a common technique for turning numbers into letters called A=1, B=2, .... However, I consider this a risky mechanic. In particular, if there are multiple different sets of numbers that solvers are considering, then they are likely to miss which set of numbers actually gives a good answer. That's what happened to us with Trivial Mathematics. My advice to puzzle writers is that you should try to have your letters come out of the puzzle as letters, and reserve A=1, B=2, ... for the cases where there is a single obvious set of numbers, or it is explicitly stated in the puzzle.
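For reference, the mechanic itself is trivial; a minimal sketch in Python (the example numbers here are my own, not from any actual Hunt puzzle):

```python
def numbers_to_letters(nums):
    """Apply the classic A=1, B=2, ..., Z=26 extraction to a list of numbers."""
    # Values outside 1-26 can't be letters -- this is exactly the failure
    # mode described above, where a value like 4383347 rules out the reading.
    assert all(1 <= n <= 26 for n in nums), "value out of A-Z range"
    return "".join(chr(ord("A") + n - 1) for n in nums)

print(numbers_to_letters([16, 21, 26, 26, 12, 5]))  # prints PUZZLE
```

The simplicity is the point: when a puzzle has several candidate sets of numbers, every one of them can be run through this in seconds, which is precisely why solvers are likely to run the wrong set and move on.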
If you look at the list of puzzles using A=1, B=2, ... you'll find only six puzzles from 2015, significantly fewer than in most other years, and something I consider a great victory. I hope to see future puzzle writers try to create extraction mechanisms that fit within their puzzle, instead of reaching for an old standby.
I found this year's hunt to be enjoyable, though not one of my favorites. The biggest weaknesses for me were the plot and the effects of the technology failure, with some puzzles being disappointing as well. I want to end with my ranking of the hunts I've done onsite (starting with 2009). If anyone from one of the writing teams of a lower ranked hunt is reading, don't take this as a knock on your hunt. As more hunts get written, we get more information about what makes a hunt fun, so the main purpose of this list is to help provide that data to future writing teams.
- Escape from Zyzzlvaria (2009)
- Coin Heist (2013)
- Time Travel (2010)
- Video Games (2011)
- Alice in Wonderland (2014)
- Huntception (2016)
- The Producers (2012)