Cgen
CGen is a tool for encoding SHA-1 and SHA-256 hash functions into CNF in DIMACS format, also into ANF polynominal system in PolyBoRi output format.
Install / Use
/learn @vsklad/CgenREADME
CGen
CGen is a tool for encoding SHA-1 and SHA-256 hash functions into CNF in DIMACS format, also into ANF polynominal system in PolyBoRi output format. The tool produces compact optimized encodings. The tool allows manipulating CNF encodings further via assignment of variable values and subsequent optimization.
- Project page: https://cgen.sophisticatedways.net.
- Source code is published under MIT license.
- Source code is available on GitHub: https://github.com/vsklad/cgen.
The current published version is 1.2. See Change Log for details.
Description
CGen is built to make it easier to analyse SAT problems through manipulation of their CNF (DIMACS) and ANF (PolyBoRi) representations. SHA-1/SHA-2 (SHA) encodings are chosen as a reference/example implementation. The tool produces SHA encodings with different assignments of both message and hash values. Design of the tool is not limited to these algorithms.
Beyond encoding, CGen implements a set of CNF pre-processing techniques which are applied after assigning of variable values. The techniques include unit propagation, equivalences reasoning, limited binary clauses resolution and removal of subsumed clauses. This way, CGen is a simple SAT preprocessor similar to SatELite and Coprocessor. CGen does not implement any variant of DPLL algorithm and does not make any decisions with respect to variable values.
CGen implements three notable features.
Compact encoding
The encoding is optimized to minimize both number of variables and number of clauses/equations. The assumption is that without redundancies, it may be easier to analyse the problem's complexity and structure. Application of the below techniques results in substantially more compact encodings than those published to date and known to the author. As an example, full SHA-1 is encoded into CNF with 26,156 variables and 127,200 clauses.
- New variables are introduced only when necessary. For example, rotation, negation, operations where all arguments are constants, require no additional variables.
- All boolean algebra operations/primitive functions are encoded in the most efficient possible way.
- For CNF, every combination of bitwise n-nary addition is statically optimized using Espresso Logic Minimizer, as a set of pre-defined clauses. This includes simplifications when one or more operands are constants.
- For boolean algebra operations, any resultant constant values are optimized away during encoding. For example, (x ^ ~x ^ y) is encoded as (y) without any new clauses or variables generated.
Optional simplification can be applied to any CNF formula including one encoded by the tool, particularly after assigning variable values:
- Elimination of tautological and duplicated clauses
- Unit propagation, equivalences reasoning and limited binary clauses resolution executed while assigning variables in an existing encoding
- Removal of subsumed clauses
- Elimination of gaps in variable numbering
Named variables
A set of literals and constants can be grouped and given a name. For example, hash value is a set of 160 literals and constants, depending on the encoding parameters. The value of such "named variable" may change throughout subsequent manipulations. CGen tracks these changes. Further, CGen looks for internal structure when a large number of binary variables is grouped together, to produce the most compact representation. For example, {1/32/1}/16/32 describes a set of 16 32-bit words with 512 variables corresponding sequentially to each bit. These descriptions are stored within the DIMACS file as comments of particular format.
In particular, named variables are specified as a combination of:
- A 1- or 2-dimentional array of CNF literals and constants. Second dimention allows for mapping arrays of words (useful for SHA algorithms).
- Sequences of literals and constants where diference (step) between elements is the same.
- Sequences of binary constants which are grouped into binary and hexadecimal numbers.
For a SHA encoding the tool defines all variables referenced in the algorithm. Out of those M is for message and H is for the hash value.
Assignment of parameters
CGen allows defining and assigning named variables via command line. For a new encoding, SHA message if specified, becomes embedded into it. In all other instances the tool performs basic validation of the input, then applies the implemented pre-processing techniques to optimise the encoding. Alternatively, it is possible to define named variables by editing the DIMACS file manually.
In particular, CGen supports:
- Use of binary, hexadecimal and character-sequence constants (with bits mapped to binary variables).
- Setting of specific bits/binary variables of the named variable.
- Assigning random values to a subset of randomly chosen binary variables.
- Padding of 1-block SHA messages.
- Computing hash value given a constant message, then assigning it along with some (or without) bits of the message fixed.
Furthermore, it is possible to combine or sequence multiple assignments of the same named variable within the same encoding, running the tool several times with different parameters and analysing the results.
Usage
Build
Source code is hosted at GitHub. To obtain and build:
git clone https://github.com/vsklad/cgen
make
CGen has no external dependencies other than C++ STL. C++ 11 is a requirement. The makefile relies on GNU g++ compiler. Verified with GCC 4.8.4, GCC 4.9.2 and Make 3.81.
Run
Launch "./cgen" for OSX/Linux or "cgen.exe" for Windows.
Command Line
The tool supports the following parameters:
cgen encode (SHA1|SHA256) [-r <rounds>] [-v <name> <value>]... [<encoder_options>] [<output_options>] [<output_file_name>]
cgen process [<variable>]... [<output_options>] <input_file_name> [<output_file_name>]
cgen --help
cgen --version
Commands
encode (SHA1|SHA256) - generate the encoding and
save it in the chosen format (-f option) to <output_file_name>;
can generate the encoding for specified number of rounds (-r option);
also, assign the message and hash values fully or partially (-v option);
if <output_file_name> is omitted, a default file name is assumed
process - process and existing CNF/DIMACS or ANF file
read the formula from <input_file_name>,
assign existing variables if specified then simplify the formula and save it to <output_file_name>;
for an existing variable, its spefification is updated or replaced (see -v replace);
otherwise if the specified variable does not exist, its definition is added;
this command accepts any CNF/DIMACS files, not necessarily produced by the tool;
if <output_file_name> is omitted, the resulting CNF is not saved;
this can be used to compute values of named variables
Options
<variable> = -v <variable_name> <variable_value> | -v<variable_name>=<variable_value>
specification of a named variable (template) or a single binary variable
named variable is a sequence of literals and constants represented with bits
its template maps to binary variables and/or constant values
named variable definitions are recorded in custom "var" statements
in comments within the output (e.g. DIMACS) file
<variable_name> is the name of the variable, case sensitive
if a decimal number, treated as a binary variable number
must be a valid binary variable number from the formula
otherwise, treated as a named variable name
may contain characters '0'..'1', 'a'..'z', 'A'..'Z', '_'
with first symbol may not a digit
an option with the particular variable may only appear once
<variable_value> is the variable value or specification
constant binary variables may be grouped into bytes
and further represented as a string, a hexadecinal or binary number
sequential representation can be further optimized as below
if a value includes spaces, it must surrounded with double-quotes
to allow the shell process it as a single argument
supported specifications:
<value> = ( <value_data> | random | ( compute [<compute_options>] )) [<except>]
<value_data> = ( <value_template> | <str_value> ) [<pad>] [ replace ]
<value_template> = (<literal> | <hex_value> | <bin_value> | <non_value>)[<sequence_clause>]
<value_template> = "{" <variable_template> ["," <variable_template>]..."}"[<sequence_clause>]
defines a named variable by describing a set of literals and constants
<sequence_clause> = "/"<count>["/"<step>]
defines a sequence of <count> elements where the first item is the specifie
Related Skills
node-connect
344.4kDiagnose OpenClaw node connection and pairing failures for Android, iOS, and macOS companion apps
frontend-design
99.2kCreate distinctive, production-grade frontend interfaces with high design quality. Use this skill when the user asks to build web components, pages, or applications. Generates creative, polished code that avoids generic AI aesthetics.
openai-whisper-api
344.4kTranscribe audio via OpenAI Audio Transcriptions API (Whisper).
qqbot-media
344.4kQQBot 富媒体收发能力。使用 <qqmedia> 标签,系统根据文件扩展名自动识别类型(图片/语音/视频/文件)。
