Computes the universal T&E-classification of Sudoku puzzles and their B, BxB or BxBB sub-classifications
The development of the SHC was based on the same references as CSP-Rules or SudoRules (recalled at the end) but the implementations were totally independent.
All the classification results of the SHC and of SudoRules completely coincide on the large collections of puzzles used to compare them.
The SHC is much faster than SudoRules for the functionalities it implements.
As shown by recent discoveries of hard puzzles, the T&E-depth and BxB parts have been very useful in the search for the "hardest" puzzles.
For full details about the concepts and theories underlying the SHC, see [CRT] or [PBCS].
For a much shorter (almost) self-contained introduction to them and for the analysis of some of the SHC results, see [HCCS].
WARNING 1: Due to possible confusions with the BpB rating, what was previously called the BpB classification is now called the BxB classification. Similarly, what was previously called the BpBB classification is now called the BxBB classification. Corresponding name changes have therefore been made in the command line. Nothing else has changed.
WARNING 2: This new version of the command line abstract syntax is significantly different from version 6.1: it is simpler and it allows using sukakus as inputs. All the commands that were previously valid remain valid, provided that they were consistent (using -examples and -puzzle in the same command line is not consistent).
examples/B-input.txt
/B-output.txt
/B-output-expected.txt
/B-messages.txt
/BxB-input.tx
/BxB-output.txt
/BxB-output-expected.txt
/BxB-messages.txt
/BxBB-input.txt
/BxBB-output.txt
/BxBB-output-expected.txt
/BxBB-messages.txt
/TE-input.txt
/TE-output.txt
/TE-output-expected.txt
/TE-messages.txt
input.txt
LICENSE
output.txt
README.txt
SHC.jar
Expand.jar
As SHC is a Java program, you must first make sure you have Java installed on your machine.
The executable version of the SHC is a typical .jar file, SHC.jar, in the SHC root directory. As such, it is launched in that directory, in a standard way, by the following command line:
<java> -jar SHC.jar <CLASSIF> (<PUZZLE-TYPE>) <TECHNICAL-OPTIONS> (<INOUT-SPECS>)
where, as usual in abstract syntax descriptions:
▸ any part within chevrons (i.e. < and > ) must be expanded;
▸ := means the left-hand side must be expanded according to the right-hand side;
▸ parts within parentheses are optional;
▸ a - sign at the start of a symbol means that it is a keyword;
▸ a star at the end of a part means "repeat this part any number of times (possibly 0)";
▸ a vertical bar (|) on the right-hand side of a definition means OR.
Note: a variant is allowed, in which <TECHNICAL-OPTIONS> and (<INOUT-SPECS>) are permuted.
▸ <java> := java | java.exe
(depending on your operating system)
▸ <CLASSIF> := version | TE-depth | B | BxB | BxBB; this is a mandatory choice, where you decide which of the four available classifications will be applied to the selection of puzzles defined by the <INOUT-SPECS>; (version is considered as the "empty classification"; it only outputs the SHC version number);
▸ <PUZZLE-TYPE> := sudoku | sukaku
Default value is sudoku.
▸ <TECHNICAL-OPTIONS> := <technical-option>*
▸ <technical-option> := -erase <erase> | -auto-end <auto-end> | -max-time <max-time> | -max-length <max-length> | -buffer-size <buffer-size>.
▸ <erase> := true | false;
By default, <erase> = true and the output file is emptied before writing new results to it;
you may change this behaviour by specifying "-erase false"; this will allow you to recover your previous calculations in case you forgot to copy them to another file; each time the SHC is launched with this option set to false, a title line recalling which computations follow will be added before the results;
(however, if -examples is selected, <erase> is automatically set to true and can't be changed; the reason is to allow easy comparisons with the expected results).
▸ <auto-end> := true | false;
By default, <auto-end> is true and the process fully terminates at the end of the computations; set <auto-end> to false if you want to recover the behaviour of the first releases (preventing the Windows console from closing at the end of the computations, so that you have time to check the messages in the console); this option can be completely ignored by non-Windows users; having <auto-end> true by default makes it easier to include SHC in scripts (in particular in the search for hard puzzles);
▸ <max-time> is an integer, the maximum time (with default value infinite), in minutes, allocated to the computation of all the puzzles in the input file. Warning to users of old releases: this time limit no longer applies to individual puzzles in the input file.
The following two options should be present only in case <CLASSIF> = B, BxB or BxBB; they will be recalled at the start of the program:
▸ <max-length> is an integer, the maximal length allowed for braids, with default value 8 in case <CLASSIF> = B, 14 in case <CLASSIF> = BxB and 5 in case <CLASSIF> = BxBB; the purpose is to avoid overly long calculations of the B rating for very hard puzzles in T&E(1), while leaving a wide margin for the BxB classification, so that extreme T&E(2) puzzles can be found up to the highest known ones (i.e. up to B14B); similarly, the largest known value for BxBB is 2, so that 5 leaves a large margin for new possibilities; note that pre-checking that the puzzle(s) is (are) in T&E(1), T&E(2) or T&E(3) is the user’s responsibility;
▸ <buffer-size> is an integer defining the maximum number of partial braids the program can store; default value is 1,000,000 if <CLASSIF>= B or BxB and 30,000 if <CLASSIF>= BxBB; change it only if it is too small.
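The technical options above can be assembled programmatically when the SHC is driven from a script. The following is only a minimal Python sketch: the names DEFAULTS and technical_options are hypothetical helpers (not part of the SHC), and rejecting -max-length or -buffer-size for other classifications is a choice of this sketch, based on the remark that these two options should be present only for B, BxB or BxBB.

```python
# Defaults stated in this README: (max-length, buffer-size) for each <CLASSIF>
# that accepts these two options. Hypothetical helper, not part of the SHC.
DEFAULTS = {
    "B": (8, 1_000_000),
    "BxB": (14, 1_000_000),
    "BxBB": (5, 30_000),
}


def technical_options(classif, erase=None, auto_end=None, max_time=None,
                      max_length=None, buffer_size=None):
    """Return the <TECHNICAL-OPTIONS> tokens as a list of strings.

    A value of None means "do not pass the option", so the SHC applies
    its own default.
    """
    braid_option_given = max_length is not None or buffer_size is not None
    if classif not in DEFAULTS and braid_option_given:
        # Assumption of this sketch: refuse rather than silently ignore.
        raise ValueError("-max-length and -buffer-size apply only to B, BxB, BxBB")
    tokens = []
    if erase is not None:
        tokens += ["-erase", "true" if erase else "false"]
    if auto_end is not None:
        tokens += ["-auto-end", "true" if auto_end else "false"]
    if max_time is not None:
        tokens += ["-max-time", str(int(max_time))]  # minutes, whole input file
    if max_length is not None:
        tokens += ["-max-length", str(int(max_length))]
    if buffer_size is not None:
        tokens += ["-buffer-size", str(int(buffer_size))]
    return tokens
```

The returned list can be appended to ["java", "-jar", "SHC.jar", <CLASSIF>, <PUZZLE-TYPE>] and passed to a process launcher.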
▸ <INOUT-SPECS> := -examples | <inout-files> | -puzzle <individual-puzzle>
The absence of any <INOUT-SPECS> means that the default values for <input-file> and <output-file> will be used; they are respectively the “input.txt” and “output.txt” files of the SHC root folder;
▸ -examples: if the <INOUT-SPECS> is -examples, the predefined examples for the classification previously chosen will be run; the input and output files for the examples are predefined and can't be changed: they are the xxx-input and xxx-output files of the "examples" folder, where xxx = <CLASSIF>; adapted specific values are automatically chosen for the technical-options and cannot be changed, except possibly -auto-end;
▸ <inout-files> := <input> <output>;
▸ <input> := -input <input-file>;
▸ <output> := -output <output-file>;
▸ <output-file> := path to a file; it specifies which output file will be used for the results (classifications);
▸ <input-file> := path to a file; it specifies which input file will be used for the data (sudokus or sukakus, depending on <PUZZLE-TYPE>);
Each line of <input-file> must start with a <sudoku-chain> if <PUZZLE-TYPE> is sudoku and a <sukaku-chain> if <PUZZLE-TYPE> is sukaku;
each line may contain additional information (e.g. SER, creator...) after the mandatory chain, as is usual in sudoku puzzle collections; it will not be taken into account;
▸ <sudoku-chain> := sudoku puzzle in the standard 81-character chain format, with each character an element of {1 2 3 4 5 6 7 8 9 0 .};
▸ <sukaku-chain> := sukaku puzzle in the standard 729-character chain format, with each character an element of {1 2 3 4 5 6 7 8 9 0 .}.
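To pre-check an input file before a long run, the chain format described above can be tested line by line. This is a minimal Python sketch, not the SHC's own reader: extract_chain is a hypothetical name, and returning None for a malformed line is a convention of this sketch only.

```python
# Characters allowed in a chain, as listed in this README.
ALLOWED = set("1234567890.")
# Chain length for each <PUZZLE-TYPE>.
CHAIN_LENGTH = {"sudoku": 81, "sukaku": 729}


def extract_chain(line, puzzle_type="sudoku"):
    """Return the leading chain of an input line, or None if it is malformed.

    Anything after the first 81 (or 729) characters is treated as free
    additional information and ignored.
    """
    n = CHAIN_LENGTH[puzzle_type]
    chain = line.rstrip("\r\n")[:n]
    if len(chain) < n or not set(chain) <= ALLOWED:
        return None
    return chain
```

For instance, a line made of 81 valid characters followed by " SER=9.0" yields the 81-character chain, while a line shorter than 81 characters yields None.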
▸ <individual-puzzle> := <sudoku-chain+> | <sukaku-chain+> | <sukaku-file>
▸ <sudoku-chain+> := <sudoku-chain><comments> (with no space between the two);
▸ <sukaku-chain+> := <sukaku-chain><comments> (with no space between the two);
▸ <comments> := any chain with no space and no semi-colon;
- <sudoku-chain+> is allowed only if <PUZZLE-TYPE> is sudoku;
- <sukaku-chain+> or <sukaku-file> are allowed only if <PUZZLE-TYPE> is sukaku;
- in the same way as <sudoku-chain> and <sukaku-chain> may be followed by additional information in each line of input files, additional unused information (<comments>) may appear in the command line between the <sudoku-chain> or <sukaku-chain> and the end of line (or the next option in case the variant of the syntax is used); however, for technical reasons related to the operating system, no space or semi-colon may appear in <comments>; underscores may be used instead; in any case, only the first 81 or 729 characters of <sudoku-chain> or <sukaku-chain> are taken into account.
▸ <sukaku-file> := path to a text file with very specific content; this makes it possible to handle almost all the "markings" appearing in sukakus on Sudoku forums:
- the file must have 9 "useful" lines, each containing 9 "cells", where a "useful" line is defined as a line containing at least one digit from {1 2 3 4 5 6 7 8 9};
- a "useful" line must be composed of 9 "cells" with any number of separators between them;
- a separator is either a space or a member of the 3-element set {| ! :}
- each cell is a chain (without surrounding double quotes) containing at least one digit from {1 2 3 4 5 6 7 8 9} plus any number of "markers" from the sets {a ...z}, {A...Z} and {# & @ $ ( = + - * ^ % ) _ , ; /}; "markers" are widely used on Sudoku forums to refer to patterns.
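The <sukaku-file> rules above can be read as a small parsing procedure. The following Python sketch is one possible interpretation, not the SHC's actual parser: parse_sukaku_text is a hypothetical name, markers are simply discarded, and the candidates of each cell are returned as a sorted string of digits.

```python
import re

# A separator is a space or one of | ! : (any number of them between cells).
SEPARATORS = re.compile(r"[ |!:]+")
DIGITS = set("123456789")


def parse_sukaku_text(text):
    """Return 9 rows of 9 candidate strings (e.g. "259"), or raise ValueError."""
    rows = []
    for line in text.splitlines():
        if not DIGITS & set(line):
            continue  # not a "useful" line, e.g. a border such as +---+---+
        cells = [tok for tok in SEPARATORS.split(line) if tok]
        # Keep only the digits 1..9 of each cell; everything else is a marker.
        candidates = ["".join(sorted(DIGITS & set(tok))) for tok in cells]
        if len(cells) != 9 or not all(candidates):
            raise ValueError(f"expected 9 cells with digits, got: {line!r}")
        rows.append(candidates)
    if len(rows) != 9:
        raise ValueError(f"expected 9 useful lines, got {len(rows)}")
    return rows
```

Running this on a grid copied from a forum post gives a quick check that the file has the 9 x 9 shape described above before it is passed to the SHC with -puzzle.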
For any puzzle, the result of each of the four classifications is an integer, normally positive or null.
However, it may take negative values in the following cases (with corresponding remediations):
-1 this puzzle is not in the right format, or it has no solution, or it has several solutions; check your puzzle;
-2 after the change of meaning for <max-time>, this value will no longer appear;
-3 <buffer-size> is too small for this puzzle; try increasing it;
-4 other problems encountered, such as:
<CLASSIF> is not relevant for this puzzle (e.g. applying the BxB classification to a puzzle in T&E(3)); check the T&E-depth of this puzzle;
<max-length> is too small for this puzzle (in case <CLASSIF> = B); try increasing it; note that if you increase <max-length>, you may also have to increase <buffer-size>.
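When results are post-processed by a script, the negative values listed above can be turned into readable messages. This is a minimal Python sketch: NEGATIVE_RESULTS and describe_result are hypothetical names, the messages only paraphrase the list above, and how result values are laid out in the output file is not assumed here.

```python
# Meaning of the negative result values, paraphrased from this README.
NEGATIVE_RESULTS = {
    -1: "wrong format, no solution or several solutions; check the puzzle",
    -2: "no longer produced since <max-time> changed meaning",
    -3: "<buffer-size> too small; try increasing it",
    -4: "other problem: <CLASSIF> not relevant for this puzzle, "
        "or <max-length> too small (B case)",
}


def describe_result(value):
    """Return a short description of one classification result (an integer)."""
    if value >= 0:
        return f"classification value {value}"
    # Any other negative value is not documented in this README.
    return NEGATIVE_RESULTS.get(value, f"undocumented code {value}")
```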
Notice that, in the <CLASSIF> = B case, what is computed is actually a truncated B rating, with all the values above <max-length> leading to output -4.
This is justified for two reasons:
▸ in order to keep computation times and memory requirements within reasonable bounds when default values are used;
▸ because, in unbiased statistics, puzzles with B rating greater than 8 are rare.
In the SHC view, the main purpose of the B rating is to provide a rough sub-classification of T&E(1). Puzzles in T&E(1) that are beyond the chosen upper bound for braid length are considered "exceptional".
The examples folder contains four collections of puzzles, each adapted to one of the four classifications computed by the SHC.
Each collection is a small part of the large collection of puzzles used by Denis Berthier to compare the SudoRules and SHC results (which also provides a cross validation for both, as they were independently implemented in totally different ways).
They are used to illustrate the results one can obtain with the SHC and to give an idea of the computation times one may expect.
For details about the selection of examples, see [HCCS].
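After running the examples, the xxx-output.txt file can be compared with the corresponding xxx-output-expected.txt file of the examples folder. The following Python sketch assumes that the two files are meant to match line for line, which this README does not state explicitly; differing_lines is a hypothetical helper, and trailing whitespace is ignored as a choice of this sketch.

```python
from pathlib import Path


def differing_lines(actual_path, expected_path):
    """Return the 1-based numbers of the lines that differ between two files."""
    actual = Path(actual_path).read_text().splitlines()
    expected = Path(expected_path).read_text().splitlines()
    diffs = [
        number
        for number, (a, e) in enumerate(zip(actual, expected), start=1)
        if a.rstrip() != e.rstrip()
    ]
    # Lines present in only one of the two files also count as differences.
    shorter = min(len(actual), len(expected))
    longer = max(len(actual), len(expected))
    diffs += range(shorter + 1, longer + 1)
    return diffs
```

For example, differing_lines("examples/B-output.txt", "examples/B-output-expected.txt") returns an empty list when the two files agree.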
A new functionality has been added to the SHC, considering the needs of the search for the hardest puzzles: expansion by Singles. Expand is launched in the SHC root directory and its syntax is as follows:
java -jar Expand.jar (-input <input-file>) (-output <output-file>) (-erase <erase>) (-auto-end <auto-end>) (-puzzle <puzzle>)
with all the options working the same way as in SHC.jar.
[CRT]: BERTHIER D., Constraint Resolution Theories, Lulu.com Publishers, November 2011.
[HCCS1]: BERTHIER D., Hierarchical Classifications in Constraint Satisfaction, Lulu Press, October 2023.
[HCCS2]: BERTHIER D., Hierarchical Classifications in Constraint Satisfaction (Second Edition), Lulu Press, July 2024.
[HCCS]: any of [HCCS1] or [HCCS2].
[PBCS1]: BERTHIER D., Pattern-Based Constraint Satisfaction and Logic Puzzles (First Edition), Lulu Press, November 2012.
[PBCS2]: BERTHIER D., Pattern-Based Constraint Satisfaction and Logic Puzzles (Second Edition), Lulu Press, July 2015.
[PBCS3]: BERTHIER D., Pattern-Based Constraint Satisfaction and Logic Puzzles (Third Edition), Lulu Press, November 2021.
[PBCS]: any of [PBCS1], [PBCS2] or [PBCS3].