After writing last week’s post on a Bloxorz solution algorithm, I was inspired to revisit the solver to make it a little more useful. In addition to cleaning up a few minor bugs, I preloaded it with all 33 levels of the original puzzle, so that users would no longer have to input a level before finding its solution. I wasn’t entirely happy with the data format I ended up using for the levels; my mistakes may be instructive.
Requirements
My primary requirement for the data format was that it be easily inspected and edited with available UNIX tools. I planned to lay out levels using the solver’s graphical UI, but I wanted to be able to manually verify that the layout process was creating the intended data structures. I also wanted the ability to make changes without using the UI; this would allow me to ignore many corner cases in the graphical layout tool, since I could always edit the map directly.
My secondary requirement was that the data structure be simple, easy to understand, and have straightforward load/save code.
Solution
I settled on JSON for the representation of the level data structures. Since the solver was written in JavaScript, JSON could be effortlessly parsed into objects. JSON, being text-based, could also be handily inspected and edited with Emacs, meeting my primary requirement.
Perhaps thinking of Hacker’s map data, I defined the following (nested) data structures:
- A level is an object, containing (along with some administrative data) a map
- A map is an array of cells; cells are sorted on (row, column) where the upper-left-hand corner is (0, 0)
- A cell is a 2-character string: the first character represents its type, and the second encodes any special properties of the cell
Here is a representative snippet of the level data file:
{name: 'Level 01',
data: ['.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0',
'.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0',
'.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0',
'.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0',
'.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0',
'.0', '.0', '.0', 'N0', 'N0', 'N0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0',
'.0', '.0', '.0', 'N0', 'N4', 'N0', 'N0', 'N0', 'N0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0',
'.0', '.0', '.0', 'N0', 'N0', 'N0', 'N0', 'N0', 'N0', 'N0', 'N0', 'N0', '.0', '.0', '.0', '.0', '.0',
'.0', '.0', '.0', '.0', 'N0', 'N0', 'N0', 'N0', 'N0', 'N0', 'N0', 'N0', 'N0', '.0', '.0', '.0', '.0',
'.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', 'N0', 'N0', '#1', 'N0', 'N0', '.0', '.0', '.0', '.0',
'.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', 'N0', 'N0', 'N0', '.0', '.0', '.0', '.0', '.0',
'.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0',
'.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0',
'.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0',
'.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0',
'.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0',
'.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0', '.0',]}
Note that newlines and whitespace are inserted into the JSON representation so that the on-screen arrangement of strings mirrors the layout of the cells those strings represent.
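To make the row-major indexing concrete, here is a minimal sketch (in the solver’s JavaScript, but not its actual code) of how a cell could be looked up by (row, column); the WIDTH constant and helper names are illustrative assumptions.

const WIDTH = 17; // columns per row in the snippet above

const cellAt    = (level, row, col) => level.data[row * WIDTH + col];
const cellType  = (cell) => cell[0]; // type character, e.g. '.', 'N', '#'
const cellProps = (cell) => cell[1]; // special-property character, e.g. '0', '1', '4'

// With the Level 01 data above loaded into `level01`, the '#1' cell
// sits at row 9, column 10:
//   cellAt(level01, 9, 10)            // -> '#1'
//   cellType(cellAt(level01, 9, 10))  // -> '#'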
Problems
Although this data format met my requirements, it had one major problem: it was verbose. Six bytes are devoted to each cell (the two string characters, two quotes, a comma, and a space), yet only six bits are really required. The cell type can be represented in 3 bits, since there are only 7 possible types, and only one bit is needed for each of the 3 possible special properties a cell can have. This makes my data format approximately 8 times bigger than it needs to be.
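As a sketch of that bit arithmetic (not a format the solver actually uses), a cell could be packed into a single 6-bit value. The type table and the reading of the second character as a 3-flag bitmask are assumptions here; only the bit budget comes from the counts above.

// Hypothetical packing: 3 bits of type index, 3 one-bit property flags.
const TYPES = ['.', 'N', '#' /* ...the remaining 4 of the 7 types... */];

function packCell(cell) {
  const type  = TYPES.indexOf(cell[0]);        // 0..6 fits in 3 bits
  const props = parseInt(cell[1], 10) & 0b111; // assumed: digit encodes 3 flags
  return (type << 3) | props;                  // 6 bits per cell
}

// packCell('.0') -> 0b000000, packCell('N4') -> 0b001100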
Alternatives
One simple solution is to cut out the spaces between strings. This will shrink the map data by ~17%, but (in my opinion) at the expense of some readability:
{name: 'Level 01',
data: ['.0','.0','.0','.0','.0','.0','.0','.0','.0','.0','.0','.0','.0','.0','.0','.0','.0',
'.0','.0','.0','.0','.0','.0','.0','.0','.0','.0','.0','.0','.0','.0','.0','.0','.0',
'.0','.0','.0','.0','.0','.0','.0','.0','.0','.0','.0','.0','.0','.0','.0','.0','.0',
'.0','.0','.0','.0','.0','.0','.0','.0','.0','.0','.0','.0','.0','.0','.0','.0','.0',
'.0','.0','.0','.0','.0','.0','.0','.0','.0','.0','.0','.0','.0','.0','.0','.0','.0',
'.0','.0','.0','N0','N0','N0','.0','.0','.0','.0','.0','.0','.0','.0','.0','.0','.0',
'.0','.0','.0','N0','N4','N0','N0','N0','N0','.0','.0','.0','.0','.0','.0','.0','.0',
'.0','.0','.0','N0','N0','N0','N0','N0','N0','N0','N0','N0','.0','.0','.0','.0','.0',
'.0','.0','.0','.0','N0','N0','N0','N0','N0','N0','N0','N0','N0','.0','.0','.0','.0',
'.0','.0','.0','.0','.0','.0','.0','.0','N0','N0','#1','N0','N0','.0','.0','.0','.0',
'.0','.0','.0','.0','.0','.0','.0','.0','.0','N0','N0','N0','.0','.0','.0','.0','.0',
'.0','.0','.0','.0','.0','.0','.0','.0','.0','.0','.0','.0','.0','.0','.0','.0','.0',
'.0','.0','.0','.0','.0','.0','.0','.0','.0','.0','.0','.0','.0','.0','.0','.0','.0',
'.0','.0','.0','.0','.0','.0','.0','.0','.0','.0','.0','.0','.0','.0','.0','.0','.0',
'.0','.0','.0','.0','.0','.0','.0','.0','.0','.0','.0','.0','.0','.0','.0','.0','.0',
'.0','.0','.0','.0','.0','.0','.0','.0','.0','.0','.0','.0','.0','.0','.0','.0','.0',
'.0','.0','.0','.0','.0','.0','.0','.0','.0','.0','.0','.0','.0','.0','.0','.0','.0',]}
Another approach is to make the data structures slightly more complex, redefining them such that:
- A map is a 1-D array of rows, stored from top to bottom
- A row is a space-delimited string of cells, stored from left to right
- Each cell is stored as a 2-character pair, as above
{name: 'Level 01',
data: ['.0 .0 .0 .0 .0 .0 .0 .0 .0 .0 .0 .0 .0 .0 .0 .0 .0',
'.0 .0 .0 .0 .0 .0 .0 .0 .0 .0 .0 .0 .0 .0 .0 .0 .0',
'.0 .0 .0 .0 .0 .0 .0 .0 .0 .0 .0 .0 .0 .0 .0 .0 .0',
'.0 .0 .0 .0 .0 .0 .0 .0 .0 .0 .0 .0 .0 .0 .0 .0 .0',
'.0 .0 .0 .0 .0 .0 .0 .0 .0 .0 .0 .0 .0 .0 .0 .0 .0',
'.0 .0 .0 N0 N0 N0 .0 .0 .0 .0 .0 .0 .0 .0 .0 .0 .0',
'.0 .0 .0 N0 N4 N0 N0 N0 N0 .0 .0 .0 .0 .0 .0 .0 .0',
'.0 .0 .0 N0 N0 N0 N0 N0 N0 N0 N0 N0 .0 .0 .0 .0 .0',
'.0 .0 .0 .0 N0 N0 N0 N0 N0 N0 N0 N0 N0 .0 .0 .0 .0',
'.0 .0 .0 .0 .0 .0 .0 .0 N0 N0 #1 N0 N0 .0 .0 .0 .0',
'.0 .0 .0 .0 .0 .0 .0 .0 .0 N0 N0 N0 .0 .0 .0 .0 .0',
'.0 .0 .0 .0 .0 .0 .0 .0 .0 .0 .0 .0 .0 .0 .0 .0 .0',
'.0 .0 .0 .0 .0 .0 .0 .0 .0 .0 .0 .0 .0 .0 .0 .0 .0',
'.0 .0 .0 .0 .0 .0 .0 .0 .0 .0 .0 .0 .0 .0 .0 .0 .0',
'.0 .0 .0 .0 .0 .0 .0 .0 .0 .0 .0 .0 .0 .0 .0 .0 .0',
'.0 .0 .0 .0 .0 .0 .0 .0 .0 .0 .0 .0 .0 .0 .0 .0 .0',
'.0 .0 .0 .0 .0 .0 .0 .0 .0 .0 .0 .0 .0 .0 .0 .0 .0',]}
This approach will shrink the map data by ~45%, at no real cost to complexity or legibility. On balance, I wish I’d gone this way.
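For what it’s worth, unpacking the row-string format back into the flat cell array the solver already uses would stay a one-liner; here is a minimal sketch, with an assumed unpackMap name rather than the solver’s actual loader:

// Split each space-delimited row back into 2-character cells and
// concatenate the rows into one flat, row-major array.
function unpackMap(level) {
  return level.data.flatMap((row) => row.split(' '));
}

// unpackMap({data: ['.0 .0 N0', 'N0 #1 N0']})
//   -> ['.0', '.0', 'N0', 'N0', '#1', 'N0']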