RNGeesus

SMT based attacks on non cryptographic PRNGs

Generate Convert Improve

Install / Use

/learn @deut-erium/RNGeesus

About this skill

Quality Score

0/100

README

Project - Analysis, State and Seed recovery of RNGs

| | | | | | :--: | :--: | :--: | :--: | | Himanshu Sheoran | Yash Parmar | Lakshya Kumar | Sahil Jain | | 170050105 | 170050004 | 170050033 | 180050089 |

Abstract

Study of novel methods for seed and state recovery with reduced number of outputs for general purpose random number generators like MT19937, MT19937-64, LCGs & Truncated LCGs, LSFRs, using SMT/SAT solvers. SMT/SAT solvers are used extensively in software-verification, in this paper we demonstrate how SMT solvers can be used as powerful oracles to analyse and break some cryptograpic primitives like random number generators.

Introduction

Given a PRNG algorithm $A$, which is initialised using an initial value aka seed and $x_1, x_2, ..., x_k$ be the generated outputs from the random number generator, we wish to determine the starting seed or the state $S$ required to predict the future outputs of the generator.

Our work involves modelling the seed and state recovery problem of a PRNG by encoding it into a Satisfiability Modulo Theories (SMT) decision problem, which is an extension of SAT with theories in first order logic.
We used the SMT solver Z3Prover, as a sequential program written in theory of BitVectors for solving out SMT model of the PRNGs . After (painfully) modelling the program, we begin a SAT solver search (all-SAT to give all satisfying models for possible seed values) which leads us to a given sequence of outputs (the generated random numbers).

The core idea behind using z3 is that it mixes the program data and the program quite well, eases the modelling a lot.
Since theory of Bitvectors translates very well to algorithms designed and written in low level languages like C, and even assembly, most of them can be analysed very efficiently.
All we need to care about the correct SMTlib implementations to use as the general notion of various operators like bitshifts, comparisons are translated differently based on different scenarios by a compiler.

With our method, we were able to recover the seed of standard mersenne twister (MT19937), which is the most used PRNG across all software systems, using only 3 outputs using SMT solvers in under 5 minutes, whereas all previous work is on state recovery using 624 consecutive outputs.

Application of SMT encoding on truncated LCGs yielded comparable and in some cases way better results than the existing lattice based recovery techniques.
SMT based attack on Geffe generator results was found to be significantly faster than the known correlation attacks.

Extending the discussion, we further study the case of notorious DUAL EC DRBG CSPRNG for presence of kleptographic backdoor to gibe NSA the ability to predict all outputs given 32 bytes of keystream.

Mersenne Twister

Mersenne Twister (MT) is by far the most widely used general-purpose PRNG, which derives its name from the fact that its period is the Mersenne prime $2^{19937} -1$

It is the default PRNG in Dyalog APL, Microsoft Excel, IDL, Julia, PHP, Pyhton, R, SageMath, standard C++, Mathematica and many more!

Algorithmic Details

The Mersenne Twister algorithm is based on a matrix linear recurrence over a finite binary field $F_2$. The algorithm is a twisted generalised feedback shift register (twisted GFSR, or TGFSR) of rational normal form, with state bit reflection and tampering.
The internal state is defined by a sequence of $n=624$, 32-bit registers ($w$)
$$x_{k+n} \to x_{k+m} \oplus (( x_k^{u} | x_{k+1}^{l})A)$$ To compensate for reduced dimensionality of equidistribution, the state is cascaded with tampering transform (to improve the equidistribution) to produce the outputm

$$y \to x \oplus(( x \gg u)&d)$$ $$y \to y \oplus(( y \ll s)&b)$$ $$y \to y \oplus(( y \ll t)&c)$$ $$z \to y \oplus( y \gg l)$$

The computed $z$ is returned by the algorithm where the choice of constants is as follows

    w, n, m, r = 32, 624, 397, 31
    a = 0x9908B0DF
    u, d = 11, 0xFFFFFFFF
    s, b = 7, 0x9D2C5680
    t, c = 15, 0xEFC60000
    l = 18
    f = 1812433253

Initialization

The state needed for a Mersenne Twister implementation is an array of $n$ values of $w$ bits each. To initialize the array, a w-bit seed value is used to supply $x_0$ through $x_{n-1}$ by setting $x_0$ to the seed value and thereafter setting

$$x_i = f \times (x_{i-1} \oplus (x_{i-1} \gg (w-2))) + i$$

for $i$ from 1 to n-1. The first value the algorithm then generates is based on $x_n$. See for details

\pagebreak

While implementing, we need to consider only three things

State initialization i.e. seeding

    def seed_mt(seed):
        MT[0] = seed
        index = n
        for i in range(1, n):
            temp = f * (MT[i - 1] ^ (MT[i - 1] >> (w - 2))) + i
            MT[i] = temp & ((1 << w) - 1)

The twist operation to produce next 624 state registers by "twisting" the current state of 624 registers

    def twist():
        for i in range(n):
            x = (MT[i] & upper_mask) + (MT[(i + 1) % n] & lower_mask)
            xA = x >> 1
            if (x % 2) != 0:
                xA = xA ^ a
            MT[i] = MT[(i + m) % n] ^ xA
        index = 0

The tamper operation to tamper a state register to the produced 32-bit output

    def extract_number():
        """aka tamper state at index"""
        y = MT[index]
        y = y ^ ((y >> u) & d)
        y = y ^ ((y << s) & b)
        y = y ^ ((y << t) & c)
        y = y ^ (y >> l)
        index += 1
        return y & ((1 << w) - 1)

Background

The primary idea of cracking Mersenne twister comes as a part of Matasano's Cryptopals challenges, after which there exist various conference talks for Mersenne twister seed and state recovery for the aid of pentesters at various security conferences e.g

untwister presented at B-Sides Las Vegas 2014, which recovers upto 32 bit seeds by a parallalized bruteforce using a pool of workers or state recovery using 624 consecutive outputs (will be discussed soon).
PRNG Cracker which in addition to parallalized seed bruteforcing, creates a rainbow table of outputs for lookup in seed database.
PHP mt_rand predictor achieves seed recover using two outputs which are 227 apart of each other exploiting the improper implementation of Mersenne twister in PHP in particular. This works only for PHP as it doesnt use the standard MT (Mersenne Twister) algorithm.

The most famous way to recover the state of Mersenne Twister is using any 624 consecutive outputs. We implemented the same for comparison with our SAT-model.

Our Work

We began with the implementation of standard MT19937 from algorithm described on Wikipedia. This involved a lot of debugging and testing against various random number library implementations, reading the source code of the MT implementations in

And figuring out how each of these vary from the standard implementation. More or less, each one of these use the standard MT as an API to extract 32 bit uniformly random values from the underlying state of MT then constructing their own API out of this functionality.
These include improved (and hence more non linear) initialization called init_by_array as proposed in MT2002, translation of equivalent functions from (usually) underlying c implementations to python and testing them rigorously to match the outputs and state. This is a bit challenging due to the fact python treats all integers without bounds and we need to ensure the general assertion of int_32 everywhere is valid.
This gave a very deep idea of how RNGs works in different systems/languages and how non-trivial it is going to be to model them as a SAT problem.

Modelling

After getting all the underlying algorithms and functionalities right, we modelled the seed recovery algorithm as a z3 formula. The tamper state when written for a BitVec(32) y is almost exactly same as we would have written for a python-int

def tamper_state(y):
    y = y ^ (LShR(y, u) & d)
    y = y ^ ((y << s) & b)
    y = y ^ ((y << t) & c)
    y = y ^ (LShR(y, l))
    return y

Note that tamper_state(y) actually returns the bitvector computation [LShR is Logical Shift Right]

     y ^ LShR(y, 11) & 4294967295 ^ 
    (y ^ LShR(y, 11) & 4294967295) << 7 & 2636928640 ^
    (y ^ LShR(y, 11) & 4294967295 ^
    (y ^ LShR(y, 11) & 4294967295) << 7 & 2636928640) << 15 & 
    4022730752 ^ LShR(
     y ^ LShR(y, 11) & 4294967295 ^
    (y ^ LShR(y, 11) & 4294967295) << 7 & 2636928640 ^
    (y ^ LShR(y, 11) & 4294967295 ^
    (

Related Skills

node-connect

339.3k

Diagnose OpenClaw node connection and pairing failures for Android, iOS, and macOS companion apps

frontend-design

83.9k

Create distinctive, production-grade frontend interfaces with high design quality. Use this skill when the user asks to build web components, pages, or applications. Generates creative, polished code that avoids generic AI aesthetics.

openai-whisper-api

339.3k

Transcribe audio via OpenAI Audio Transcriptions API (Whisper).

commit-push-pr

83.9k

Commit, push, and open a PR