SkillAgentSearch skills...

Cgen

CGen is a tool for encoding SHA-1 and SHA-256 hash functions into CNF in DIMACS format, also into ANF polynominal system in PolyBoRi output format.

Install / Use

/learn @vsklad/Cgen
About this skill

Quality Score

0/100

Supported Platforms

Universal

README

CGen

CGen is a tool for encoding SHA-1 and SHA-256 hash functions into CNF in DIMACS format, also into ANF polynominal system in PolyBoRi output format. The tool produces compact optimized encodings. The tool allows manipulating CNF encodings further via assignment of variable values and subsequent optimization.

The current published version is 1.2. See Change Log for details.

Build Status

Description

CGen is built to make it easier to analyse SAT problems through manipulation of their CNF (DIMACS) and ANF (PolyBoRi) representations. SHA-1/SHA-2 (SHA) encodings are chosen as a reference/example implementation. The tool produces SHA encodings with different assignments of both message and hash values. Design of the tool is not limited to these algorithms.

Beyond encoding, CGen implements a set of CNF pre-processing techniques which are applied after assigning of variable values. The techniques include unit propagation, equivalences reasoning, limited binary clauses resolution and removal of subsumed clauses. This way, CGen is a simple SAT preprocessor similar to SatELite and Coprocessor. CGen does not implement any variant of DPLL algorithm and does not make any decisions with respect to variable values.

CGen implements three notable features.

Compact encoding

The encoding is optimized to minimize both number of variables and number of clauses/equations. The assumption is that without redundancies, it may be easier to analyse the problem's complexity and structure. Application of the below techniques results in substantially more compact encodings than those published to date and known to the author. As an example, full SHA-1 is encoded into CNF with 26,156 variables and 127,200 clauses.

  1. New variables are introduced only when necessary. For example, rotation, negation, operations where all arguments are constants, require no additional variables.
  2. All boolean algebra operations/primitive functions are encoded in the most efficient possible way.
  3. For CNF, every combination of bitwise n-nary addition is statically optimized using Espresso Logic Minimizer, as a set of pre-defined clauses. This includes simplifications when one or more operands are constants.
  4. For boolean algebra operations, any resultant constant values are optimized away during encoding. For example, (x ^ ~x ^ y) is encoded as (y) without any new clauses or variables generated.

Optional simplification can be applied to any CNF formula including one encoded by the tool, particularly after assigning variable values:

  1. Elimination of tautological and duplicated clauses
  2. Unit propagation, equivalences reasoning and limited binary clauses resolution executed while assigning variables in an existing encoding
  3. Removal of subsumed clauses
  4. Elimination of gaps in variable numbering

Named variables

A set of literals and constants can be grouped and given a name. For example, hash value is a set of 160 literals and constants, depending on the encoding parameters. The value of such "named variable" may change throughout subsequent manipulations. CGen tracks these changes. Further, CGen looks for internal structure when a large number of binary variables is grouped together, to produce the most compact representation. For example, {1/32/1}/16/32 describes a set of 16 32-bit words with 512 variables corresponding sequentially to each bit. These descriptions are stored within the DIMACS file as comments of particular format.

In particular, named variables are specified as a combination of:

  1. A 1- or 2-dimentional array of CNF literals and constants. Second dimention allows for mapping arrays of words (useful for SHA algorithms).
  2. Sequences of literals and constants where diference (step) between elements is the same.
  3. Sequences of binary constants which are grouped into binary and hexadecimal numbers.

For a SHA encoding the tool defines all variables referenced in the algorithm. Out of those M is for message and H is for the hash value.

Assignment of parameters

CGen allows defining and assigning named variables via command line. For a new encoding, SHA message if specified, becomes embedded into it. In all other instances the tool performs basic validation of the input, then applies the implemented pre-processing techniques to optimise the encoding. Alternatively, it is possible to define named variables by editing the DIMACS file manually.

In particular, CGen supports:

  1. Use of binary, hexadecimal and character-sequence constants (with bits mapped to binary variables).
  2. Setting of specific bits/binary variables of the named variable.
  3. Assigning random values to a subset of randomly chosen binary variables.
  4. Padding of 1-block SHA messages.
  5. Computing hash value given a constant message, then assigning it along with some (or without) bits of the message fixed.

Furthermore, it is possible to combine or sequence multiple assignments of the same named variable within the same encoding, running the tool several times with different parameters and analysing the results.

Usage

Build

Source code is hosted at GitHub. To obtain and build:

git clone https://github.com/vsklad/cgen
make

CGen has no external dependencies other than C++ STL. C++ 11 is a requirement. The makefile relies on GNU g++ compiler. Verified with GCC 4.8.4, GCC 4.9.2 and Make 3.81.

Run

Launch "./cgen" for OSX/Linux or "cgen.exe" for Windows.

Command Line

The tool supports the following parameters:

cgen encode (SHA1|SHA256) [-r <rounds>] [-v <name> <value>]... [<encoder_options>] [<output_options>] [<output_file_name>]
cgen process [<variable>]... [<output_options>] <input_file_name> [<output_file_name>]
cgen --help
cgen --version

Commands

    encode (SHA1|SHA256) - generate the encoding and 
        save it in the chosen format (-f option) to <output_file_name>;
        can generate the encoding for specified number of rounds (-r option);
        also, assign the message and hash values fully or partially (-v option);
        if <output_file_name> is omitted, a default file name is assumed

    process - process and existing CNF/DIMACS or ANF file
        read the formula from <input_file_name>,
        assign existing variables if specified then simplify the formula and save it to <output_file_name>;
        for an existing variable, its spefification is updated or replaced (see -v replace);
        otherwise if the specified variable does not exist, its definition is added;
        this command accepts any CNF/DIMACS files, not necessarily produced by the tool;
        if <output_file_name> is omitted, the resulting CNF is not saved;
            this can be used to compute values of named variables
        

Options

    <variable> = -v <variable_name> <variable_value> | -v<variable_name>=<variable_value>

        specification of a named variable (template) or a single binary variable 
        named variable is a sequence of literals and constants represented with bits
        its template maps to binary variables and/or constant values
        named variable definitions are recorded in custom "var" statements 
        in comments within the output (e.g. DIMACS) file
        
        <variable_name> is the name of the variable, case sensitive
            if a decimal number, treated as a binary variable number
              must be a valid binary variable number from the formula
            otherwise, treated as a named variable name
              may contain characters '0'..'1', 'a'..'z', 'A'..'Z', '_'
              with first symbol may not a digit
            an option with the particular variable may only appear once
        <variable_value> is the variable value or specification
            constant binary variables may be grouped into bytes
            and further represented as a string, a hexadecinal or binary number
            sequential representation can be further optimized as below
            if a value includes spaces, it must surrounded with double-quotes
            to allow the shell process it as a single argument
            
        supported specifications:
            <value> = ( <value_data> | random | ( compute [<compute_options>] )) [<except>]
            <value_data> = ( <value_template> | <str_value> ) [<pad>] [ replace ]
            
            <value_template> = (<literal> | <hex_value> | <bin_value> | <non_value>)[<sequence_clause>]
            <value_template> = "{" <variable_template> ["," <variable_template>]..."}"[<sequence_clause>] 
                defines a named variable by describing a set of literals and constants
            <sequence_clause> = "/"<count>["/"<step>]
                defines a sequence of <count> elements where the first item is the specifie

Related Skills

View on GitHub
GitHub Stars25
CategoryDevelopment
Updated1y ago
Forks6

Languages

C++

Security Score

80/100

Audited on Dec 1, 2024

No findings