|
|
|
In this article Jack Shirazi considers how efficient the rules he uses for Su Doku solving are and, in the process, finds that he falls into the usual benchmarking traps. And he's surprised at how far simple rules get you.Published May 2007, Author Jack Shirazi
I enjoy a Su Doku now and then. Usually it's a time filler for me, when I'm looking for something to occupy my mind while I'm not working and not interested in reading another article. Recently I wondered about my efficiency in solving these puzzles. Specifically, I wondered if the techniques people use to solve the puzzles is more efficient than a brute force search of the possible solutions. It seemed to me that applying the techniques I use to solve the puzzles should be more efficient than just a simple depth-first search to find a solution, though naturally a depth-first search would be much simpler to program. I wondered how many rules it needs to solve these puzzles, and how complex those rules need to be. So I decided to find out.
A depth-first search of the puzzle is easy enough to code: using a 2-dimensional array of int's, i.e. int[][], you set the cells already filled in (the puzzle starting point), and then apply the following algorithm (this is, of course, just one implementation out of many possible ones, I was looking for something simple to code, not necessarily the most efficient implementation I could find):
This is, of course, naturally coded as a recursive implementation, and the intermediate states are maintained on the stack so making it quite efficient to back out of any search node. Step 1 is straightforward - I use a cell containing 0 to be empty:
//find the first empty cell int x = 0; int y = 0; endfindzero: for (; x < 9; x++) for (y = 0; y < 9; y++) if (grid[x][y] == 0) break endfindzero;
Step 2, finding the smallest unused number for that cell, is a little bit more complicated, but not a lot. I elect to use a simple check against every integer starting from 1, e.g. given an integer N, the minimum unused number available for that cell is obtained with the following simple loop - just a simple check that the number is unused, coupled with a loop iteration:
public static int minAbove(int n1, int n2, int n3, int n4, int n5, int n6, int n7, int n8, int n9, int startingN) { int n = startingN; while(n < 10) { if (n1 != n && n2 != n && n3 != n && n4 != n && n5 != n && n6 != n && n7 != n && n8 != n && n9 != n) return n; else n++; } return 0; }
Then given any (0-offset) cell coordinate, the minimums for the enclosing square, row and columns are given by
int r = (x/3)*3; //getting the initial row coordinate for the enclosing square using integer division int c = (y/3)*3; //same for column int squareMin = minAbove(grid[r][c], grid[r][c+1], grid[r][c+2], grid[r+1][c], grid[r+1][c+1], grid[r+1][c+2], grid[r+2][c], grid[r+2][c+1], grid[r+2][c+2], startingN); int rowMin = minAbove(grid[x][0], grid[x][1], grid[x][2], grid[x][3], grid[x][4], grid[x][5], grid[x][6], grid[x][7], grid[x][8], startingN); int columnMin = minAbove(grid[0][y], grid[1][y], grid[2][y], grid[3][y], grid[4][y], grid[5][y], grid[6][y], grid[7][y], grid[8][y], startingN);
And now it's simple enough to check that either columnMin == rowMin == squareMin and they are all not 0, in which case we have that minimum we are looking for, or if not we can simply set startingN to one of the minimums found (ideally the largest) plus one, and then try the same check again until we find the minimum (or none can be found). Then Step 3 or 4 is followed quite easily.
So that is a fairly straightforward implementation, and testing it showed that it was pretty fast, taking microseconds to solve any of the puzzles I tried (all puzzles that I had manually solved in the weeks prior to implementing this). This is not the last word on this though - see the later section on The Data Set.
But I'm not looking for a Su Doku solver, I'm interested in whether the techniques a person uses is more efficient than a depth-first search. So I'll need to implement some of the rules people use to solve Su Doku puzzles. There are a few fairly simple rules, and more extensive set of quite complex ones (see this Times Newspaper article on how to solve super fiendish puzzles if you are interested in the more complex rules).
A set of the simpler rules that are easy to code for are:
Obviously these rules cannot be applied straightforwardly to the 2-dimensional array we used for the depth-first search. For a start each cell would need to have an associated list of numbers which denote the possible values the cell can hold, and that list needs to be a variable size. As you can see, applying the rules will require a more sophisticated set of classes and methods than the much simpler depth-first search already implemented. My own implementation ended up with seven classes: Grid; CellSet and its subclasses Colum, Row and Square; Cell; and NumberPair (helpful for handling that "pairwise" rule, the third rule listed above), but of course there are many possible implementations.
The first interesting thing I found on testing this second implementation is that it completely solves Su Doku puzzles of all levels up to Fiendish (some Fiendish level ones can be solved with this, others can't). Extra Fiendish and Super Fiendish tend to be beyond its capabilities. That was actually quite a surprise to me, that such a simple rule set was sufficient to solve a number of puzzles that I thought needed more sophisticated rules.
The next interesting thing was comparing the performance of the two solvers. Note that the performance of both implementations are perfectly adequate for actually solving a puzzle (completing in under a milliseconds without initialization costs) so the performance comparison is only to see the cost or benefit of the rules. And that performance comparison shows the rules-based solver was several hundred times slower than the simple depth-first solver. Of course some of this is the added complexity and the creation and freeing of objects, but even so that seems excessive. So I profiled the solutions using the simple -Xprof profiler, which is ideal for profiling a tiny program like this.
1. using -Xprof Grid.init shows as 66% of the time Compiled + native Method 66.2% 0 + 1009 Grid.<init> 17.4% 265 + 0 java.util.ArrayList.remove
The results of the profile showed that the time is mostly in creating my Grid objects, which is an artefact of the microbenchmark. To get measurable times and eliminate dynamic compilation overheads I loop several thousand times solving a set of puzzles in the microbenchmark, so its relatively easy to change the microbenchmark and reuse the Grid objects. This changes the timings slightly, but now the bottleneck has moved over to management of the list of numbers associated to each cell, and clearly, for this implementation that management swamps any possible speedup there may be from applying the rules.
Now more obvious that the overhead of the variuos elimination algorithms is the cost 50.9% 248 + 0 java.util.ArrayList.remove 12.5% 61 + 0 CellSet.eliminateSingleNumberExceptFrom 6.4% 31 + 0 java.util.ArrayList.get 5.3% 10 + 16 CellSet.eliminateSimplePairs 3.9% 19 + 0 CellSet.eliminateSingleNumbers 1.6% 8 + 0 java.util.HashMap$HashIterator.<init> 1.6% 8 + 0 java.util.HashMap.put 1.6% 5 + 3 java.util.HashMap.newKeyIterator 1.2% 6 + 0 java.util.HashMap.get 1.0% 2 + 3 SudokuSolver.solveUsingDepthFirstSearch
Of course I could go through multiple more cycles of tuning, starting with an implementation of List.remove() tuned for this program, or I could start again from the int[][] structure and come up with an efficient set of lists for each cell (such as a nine element boolean array for each cell indicating whether a number was available or not for that cell), but frankly I can't be bothered.
The result of all that was exactly the opposite of what I expected. The simplest brute force search implementation was significantly more efficient than the simplest rules based search, a result that goes completely contrary to my previous experience in searching search spaces of this size. My first thought is that maybe the search space is smaller than I initially thought, but a quick analysis of the search space shows that it is indeed large.
After years of performance tuning, my next thought is that there is something wrong with my benchmark. There are two types of common problems with microbenchmarks like this. The first is that the code is not measuring what you think it is measuring. This is typical, you forget to measure the cost of hotspot kicking in, or the harness is causing an issue (as it was until I fixed that this time). But I've eliminated this first issue as a possibility by profiling the slower rules based approach which showed me that this was fine (after fixing the harness), and by verifying the answers that the brute force approach gives me.
The second common problem for benchmarks being wrong is that you don't use appropriate or correct data. This is a very common reason for large applications using databases to have performance issues in production even where the functionality is fully tested in development - the test dataset used is often too small in critical areas. In my case, I've used a few puzzles that I've tried in the last few weeks, and although they seem representative to me, maybe they aren't. And indeed, a quick search of the articles about Su Doku produces this "Near worst case" sudoku puzzle for brute force solver from Wikipedia.
Throwing this at my two solvers produces a complete reversal in the solving times. I'm surprised to see the rules based solver solving this puzzle completely - remember it doesn't solve all puzzles completely, only those that can be solved using the four simple rules listed above (for puzzles it can't solve, it produces the partial solution that it had managed). And it solved it in about 50 milliseconds, and that includes initialization and JIT compilation! On the other hand, the brute force solver takes much much longer. I can improve it's times too, but again I can't be bothered to, I've discovered what I wanted.
So finally, I found the fascinating result that so few rules were all you need to apply to solve all but the most challenging Su Doku puzzles. I was really surprised to see that the simplest set of rules can solve such a range of Su Doku puzzles, I was expecting to have to implement at least one of the more sophisticated rules to get the rules solver to be this capable. And almost as interesting was the way quite a simple benchmark produced all the classic benchmarking traps.
For those of you interested in a more extensive set of rules for a Su Doku solver, there is, of course, an open source Java Su Doku solver.