Knight's Tour - A JVM thread performance benchmark

History

The applicability of the Knight's Tour (KT) problem as an algorithmic programming exercise is long known and many solutions exist. A few  years ago when Java multithreading support was in it's infancy I was looking at architecting a multithreaded solution and was considering Java. I needed something which could very simply evaluate the efficiency of the JVM at running such threads (and also to evaluate the then competing thread models). I chose the KT as it is suitable for solution with a multithreaded algorithm. It need not be dependant upon IO (you don't need to write out the solutions) and hence makes an effective evaluation tool.

The KT problem (just in case you are not familiar with it) involves finding a path by which a knight can visit every square on the chess board exactly once, returning to the square it originally started from. Very many such routes exist but they are difficult to discover by hand.
 

Algorithm

The algorithm is essentially very simple. A 2 dimensional array represents the x by x chess board (we don't have to limit ourselves to 8 by 8). The moves which the knight makes are recorded into this array, it having been being initialized with zeros. As it  moves through the tour we evaluate which jumps the knight can make next to as yet un-visited squares. We record the sequence number of the move into the appropriate element of the array as we jump into a particular square. From any one square there will be a maximum of 8 squares to which the knight can jump, this number will be restricted by the edges of the board and by squares which have previously been visited, (naturally the square from which the knight jumped to reach here will always be already taken). The set of potential moves numbered for the order in which we evaluate them is shown in the matrix below.
 
.
3
.
4
.
1 . . . 6
. . K . .
0 . . . 7
. 2 . 5 .

Knight's 8 potential moves as defined in the algorithm's MoveDef object

We evaluate each potential move in turn. Having recorded a new move into the array we recursively call our Move method to commence evaluation of  the moves now open to us from the new square. If a move turns out to be a dead-end then we simply exit the current Move method and we fall back into the underlying previous move position. The Move method when returned to clears the last (now finished branch) move and continues to search for the next one. In this way we build up and tear down the solution search tree. We know that we have found a possible solution when our search depth (or jumpnum) reaches the number of squares on the board. However we need to confirm that the knight could reach the initial square from the final square so that we can say that it has completed the tour. The Move method then exits so that further possible solutions can be explored from branches below.

Not all solutions are unique as there will always be identical solutions around all the lines of symmetry; horizontal, vertical, diagonal and rotational. We can use this feature to avoid undertaking some unnecessary early search branches. We should always start the search from one corner (0, 0). Now from a corner square there are only two possible initial jumps, labelled 5 and 7 in the diagram above (the solution gets rendered in the SE quadrant). Given that we would only repeat solutions around a diagonal line of symmetry if we search with both move 5 and 7 as an initial move we can reduce the solution time by half if we only ever commence with move 5. An implication of this is that the last move of the tour must always be to the square that we would end up on if we make move 7 from (0,0).  So we can further enhance the algorithm by reserving this square (2, 1) so that solution trees  don't  use this square until  the last ever move. This optimization gives a 5 fold improvement in overall solution time.

Although we make an attempt at avoiding duplication of solutions from diagonal symmetry (because it works in our favour with regards to solution time) we are not protected from accidentally reproducing solutions which are in reality  repeats of previous solutions about a particular line of symmetry; horizontal, vertical, the other diagonal or rotational. It would be possible to store all unique solutions as they are discovered  and to compare with each 'normalized' new one to confirm that it is indeed unique. The algorithm doesn't do this at present it, there isn't really much need.

There is a further technique for pruning search trees at a point at which we make a move that clearly blocks a solution path.  For every un-visited square which we could have reached with this jump, but didn't because we made a different move, we consider if an imaginary knight situated on these squares can now reach another vacant square. If it can not then clearly this un-visited square is now isolated because if the imaginary knight can't get out, none can get in. Hence this square has been isolated from the rest of the board by this move and consequently this particular solution tree is now doomed. We are not going to catch all doomed solution trees this way because we only detect isolated squares not isolated regions. To detect regions would just bloat the solution time too much. In fact this optimization is of marginal value because for each jump it adds an extra 2 levels of jump evaluation (up to 42 extra squares to consider). We have to scan for the other squares we could have reached (which is (8 - the actual square we went to - the square we came in from) * (8 - the square we had been on -> 6x7 = 42). In fact as we only re-discover that the square we had been on is used by inspection we have 6x8 = 48 extra squares to consider. So if the algorithm for pruning found no solutions to deprecate the algorithm would run about 50 times slower. For a 6x6 chess board the following is a list of how many solution branches get pruned at each level with this algorithm (depth and branches pruned).
 

0
0
6
0
12
 595
18
 636593
24
 11649971
30
 2761306
1
0
7
0
13
 4812
19
 2262119
25
 19722400
31
 1257816
2
0
8
0
14
 9429
20
 2634882
26
 12281111
32
 541492
3
0
9
9
15
 57553
21
7338566
27
 15476755
33
 53512
4
0
10
20
16
 97136
22
 6934413
28
 7737518 
34
 48652
5
0
11
266
17
 447541
23
 15166038 
29
 6661734 
35
 0

We therefore need to restrict the range in which we run the pruning test. Early pruning is advantageous because it removes otherwise long branches, however it can be seen that there are not many low level pruning opportunities. There are very many pruning opportunities in the middle depth range, except there are very many iterations run at these levels so the overhead of running the test is greatly multiplied. There are two parameters hard coded in the source Glb.PRUNE_START & Glb.PRUNE_END, adjusting these in conjunction with different javac optimization techniques will give different levels of  performance improvement (not all beneficial).
 
 

Multi-Threading

Java allows for multithreaded applications to be designed. The Knight's Tour problem is ideal for multithreading. Consider that we are evaluating potential moves for a knight at a particular depth. We can kick off a separate thread to evaluate an individual jump for us. When the application is started the number of threads which are to be run by the application can be specified, else the application defaults to running on 8 threads;
                  java -native KnightsTour $Num_Threads > $File_Out

The threads are managed by an object called ThreadManager. The ThreadManager acts as a synchronization object for all thread interactions. To ensure that we don't corrupt the output file by writing two solutions simultaneously even the PrintBoard() method has to be implemented in this object. The assignment of threads to tasks is handled by 2 methods within this object. ReportForWork() is called by threads which have exhausted the solution branch that they had been following and have consequently fallen out of the bottom of the recursive Move() method stack. HireHelp() is called by threads which are exploring new solution branches and have had one or more out of work threads attached to them.

In ReportForWork() a newly redundant thread scans the currentDepth of all threads to find out which one is functioning at the lowest depth. It chooses the thread currently working at the lowest depth because that shall be the most likely able to offer it the longest career opportunity. The thread adds it's self to the linked list of other optimistic helper threads which may have attached themselves to this particular thread, then calls wait. When a thread which is busy evaluating moves discovers that a thread has tagged itself to it then instead of making the next move itself it calls HireHelp() to obtain a thread to do it for it. HireHelp() pulls the first thread off the helperThread list. It copies it's current private view of the chess board over to the new thread's private copy and pre-sets the square it wants the thread to start at. Each thread has a KnightsMove class of it's own for such private state storage. The hiring thread then calls the new hire's inbuilt interrupt() method to wake the thread up. If a thread with tagged helpers hits a roadblock then it will donate all it's own helpers to the thread which it itself nominates to join. When on a call to ReportForWork() it is discovered that all threads are laid up we know that the solution search is finished and that the entire program should terminate.
 
 

Source Code

The source code provided has been configured for a 6x6 chessboard so that initial use will terminate in reasonable time. The pruning parameters have been tweaked such that this part of the algorithm is not run at any depth. This appears the best way for distribution to as yet unknown machine types and specifications. I have left in some diagnostic messages (non-indented lines) to give a feel for how successful the various threads are at finding solutions.

If you are able to make any improvements to this code or find any bugs please let me know. Also I would be interested in any feedback with regards to execution speed and the best pruning parameters for different classes of machine or JVM. If I get a significantly varied set of data points I shall publish them.

The source code is supplied to be used by the individuals who download it from this site and may not be redistributed for profit. The source may be  modified or redistributed for any legal non-profit purpose provided reference to the original authorship is maintained.

As is traditional - ENJOY.

James Radley (July 2002).

[ Home ]