SkillAgentSearch skills...

Polyalphabetic

Stochastic, slippery hill climber with backtracking for Vigenere, Porta, Beaufort, Autokey, Quagmire I, II, III, and IV ciphers.

Install / Use

/learn @stblake/Polyalphabetic
About this skill

Quality Score

0/100

Supported Platforms

Universal

README

Polyalphabetic Cipher Solver

A prototype slippery stochastic shotgun-restarted hill climber with backtracking for Vigenere, Beaufort, Porta, Quagmire I, II, III, IV, and Autokey ciphers (including variants.)

This program is inspired by various explanations of Jim Gillogly's cipher solving program (that he used for solving the first three ciphers on Kryptos):

https://groups.google.com/g/sci.crypt/c/hOCNN6L13CM/m/s85aEvsmrl0J

The effort to make this program as efficient as possible was inspired by the homophonic solver in AZDecrypt.

Algorithmic Description of the Polyalphabetic Solver

Stochastic Shotgun Hill Climbing Architecture

The core engine of this program is a Shotgun Hill Climber, a heuristic search algorithm designed to navigate the rugged energy landscapes typical of polyalphabetic substitution ciphers. Unlike brute-force methods, which are computationally infeasible for (poly-)alphabetic permutations, this approach relies on iterative improvement.

The "Shotgun" aspect refers to the initialization strategy: the solver performs $N$ distinct Restarts, each initializing the state with a completely random permutation of keywords and cyclewords. This prevents the solver from becoming permanently trapped in a local maximum—a common pitfall where a solution scores well relative to its neighbors but is not the true plaintext.

During each restart, the algorithm executes a specified number of Hill Climbing Steps. In each step, the current state (the arrangement of the alphabets and the cycleword) is mutated slightly. If the mutation results in a higher fitness score, the new state is adopted. To further mitigate local maxima, the algorithm implements Simulated Annealing-like mechanics, including:

  • Slipping: A defined probability of accepting a state with a lower score to traverse "valleys" in the fitness landscape.
  • Backtracking: A probability of reverting the current state to the absolute best state found so far, ensuring that aggressive exploration does not lose a promising solution.

State Representation and Cipher Modeling

The program models Quagmire I–IV, Beaufort, and Vigenère ciphers by managing three distinct state components:

  1. Plaintext Alphabet ($P_k$): The keyed alphabet mapping plaintext letters.
  2. Ciphertext Alphabet ($C_k$): The keyed alphabet mapping ciphertext letters.
  3. Cycleword (Indicator): The vector of offsets determining the shift for each period position.

Depending on the specific cipher_type selected, the mutator functions constrain these alphabets. For example, in a Quagmire III, the program ensures $P_k$ and $C_k$ remain identical but scrambled; in Quagmire II, $P_k$ is fixed to the standard alphabet while $C_k$ is scrambled.

Hybrid Deterministic Optimization (The "Optimal Cycleword" Method)

A key innovation in this implementation is the Profile-Based Cycleword Derivation (activated via -optimalcycle). Standard hill climbers attempt to stochastically guess both the alphabet structure and the cycleword simultaneously. This creates a search space of magnitude $26! \times 26^N$.

The hybrid approach reduces the dimensionality of the problem to just $26!$. It operates hierarchically:

  1. Stochastic Layer: The hill climber perturbs the Alphabet Keywords only.
  2. Deterministic Layer: For every candidate alphabet, the program performs a columnar frequency analysis. It decomposes the ciphertext into $N$ cosets (where $N$ is the period length).
  3. Cost Minimization: For each coset, the algorithm calculates the Dot Product (or Chi-squared statistic) of the decrypted column against standard English monogram frequencies for all 26 possible shifts.
  4. State Injection: The shift producing the highest correlation is mathematically selected as the cycleword character for that column.

This ensures that every candidate alphabet is evaluated against its theoretical best cycleword, smoothing the scoring gradient and preventing correct alphabets from being discarded due to cycleword misalignment.

Scoring Function (Fitness Metric)

The fitness of a candidate state is evaluated using a weighted sum of four metrics:

  1. N-gram Statistics (Log-Likelihood): The primary driver of the solve. The program sums the log-probabilities of decrypted $N$-grams (typically trigrams or quadgrams) based on a corpus of English text. This penalizes unpronounceable or statistically improbable sequences.
  2. Crib Matching: If a known plaintext string (crib) is provided, the function checks for consistency. In strict mode, a mismatch forces a score of zero; in weighted mode, partial matches contribute to the score.
  3. Index of Coincidence (IoC): Used primarily for period estimation, this measures the unevenness of letter distributions (the "roughness" of the text).
  4. Entropy: A measure of information density, ensuring the decrypted text resembles the low-entropy characteristics of natural language rather than random noise.

Vigenère

The Vigenère cipher is a method of encrypting alphabetic text by using a keyword to shift each letter of the plaintext. Each letter in the keyword determines the shift for the corresponding letter in the plaintext, resulting in a polyalphabetic substitution cipher that is more resistant to frequency analysis than monoalphabetic ciphers.

Here we solve a simple Vigenère cipher:

$ ./polyalphabetic -type vig -cipher cipher_vigenere.txt -ngramsize 5 -ngramfile english_quintgrams.txt -nhillclimbs 500 -nrestarts 100 -backtrackprob 0.15 -slipprob 0.0005 -verbose

0.00	[sec]
8K	[it/sec]
0	[backtracks]
0	[restarts]
0	[slips]
0.00	[contradiction pct]
0.0643	[IOC]
2.8469	[entropy]
0.09	[chi-squared]
11.32	[score]
ABCDEFGHIJKLMNOPQRSTUVWXYZ
ABCDEFGHIJKLMNOPQRSTUVWXYZ
POLYALPHABETIC

ITISATRUTHUNIVERSALLYACKNOWLEDGEDTHATASINGLEMANINPOSSESSIONOFAGOODFORTUNEMUSTBEINWANTOFAWIFEHOWEVERLITTLEKNOWNTHEFEELINGSORVIEWSOFSUCHAMANMAYBEONHISFIRSTENTERINGANEIGHBOURHOODTHISTRUTHISSOWELLFIXEDINTHEMINDSOFTHESURROUNDINGFAMILIESTHATHEISCONSIDEREDTHERIGHTFULPROPERTYOFSOMEONEOROTHEROFTHEIRDAUGHTERS

So the cycleword (indicator) is POLYALPHABETIC.

Beaufort

The Beaufort cipher is a polyalphabetic substitution cipher that encrypts text by pairing each letter of the plaintext with a key and performing modular subtraction. It’s similar to the Vigenère cipher but uses subtraction instead of addition, making it its own reciprocal—encryption and decryption are performed with the same process.

For example, we solve a beaufort cipher which contains the famous opening line from Pride and Prejudice by Jane Austen.

$ ./polyalphabetic -type beaufort -cipher cipher_beaufort.txt -ngramsize 5 -ngramfile english_quintgrams.txt -nhillclimbs 500 -nrestarts 100 -backtrackprob 0.15 -slipprob 0.0005 -cyclewordlen 7 -verbose

0.01	[sec]
68K	[it/sec]
0	[backtracks]
0	[restarts]
389	[iterations]
0	[slips]
0.00	[contradiction pct]
0.0643	[IOC]
2.8469	[entropy]
0.09	[chi-squared]
0.81	[score]
ABCDEFGHIJKLMNOPQRSTUVWXYZ
ABCDEFGHIJKLMNOPQRSTUVWXYZ
REGXYLV
ITISATRUTHUNIVERSALLYACKNOWLEDGEDTHATASINGLEMANINPOSSESSIONOFAGOODFORTUNEMUSTBEINWANTOFAWIFEHOWEVERLITTLEKNOWNTHEFEELINGSORVIEWSOFSUCHAMANMAYBEONHISFIRSTENTERINGANEIGHBOURHOODTHISTRUTHISSOWELLFIXEDINTHEMINDSOFTHESURROUNDINGFAMILIESTHATHEISCONSIDEREDTHERIGHTFULPROPERTYOFSOMEONEOROTHEROFTHEIRDAUGHTERS

Note, there is a small bug here where the cycleword is messed-up, but we successfully extract the plaintext nonetheless.

Porta

The Porta cipher is a reciprocal polyalphabetic substitution cipher. We implement the Porta cipher as defined by the ACA (https://www.cryptogram.org/downloads/aca.info/ciphers/Porta.pdf)

./polyalphabetic -type porta -cipher porta_aca.txt -ngramsize 5 -ngramfile english_quintgrams.txt -nhillclimbs 250 -nrestarts 100 -backtrackprob 0.15 -slipprob 0.05 -cyclewordlen 11 -stochasticcycle -verbose

We quickly obtain the decryption:

0.02	[sec]
1166K	[it/sec]
12	[backtracks]
91	[restarts]
977	[slips]
0.00	[contradiction pct]
0.0650	[IOC]
2.6692	[entropy]
0.52	[chi-squared]
10.55	[score]
OPQSBCJPGEQ

BETWEENSUBTLESHADINGANDTHEABSENCEOFLIGHTLIESTHENUANCEOFIQLUSION

Note that many equivalent cyclewords are possible for Porta ciphers.

Autokey

Below we solve an autokey cipher (using a straight alphabet, or Vigenere tableau.)

$ ./polyalphabetic -type autokey -cipher autokey_len97_wl21.txt -ngramsize 5 -ngramfile english_quintgrams.txt -nhillclimbs 500 -nrestarts 1000 -backtrackprob 0.15 -slipprob 0.0005 -cyclewordlen 21 -verbose

And we obtain the solution:

1.46	[sec]
545K	[it/sec]
138	[backtracks]
794	[restarts]
392	[slips]
0.00	[contradiction pct]
0.0612	[IOC]
2.8111	[entropy]
0.24	[chi-squared]
10.55	[score]
ABCDEFGHIJKLMNOPQRSTUVWXYZ
ABCDEFGHIJKLMNOPQRSTUVWXYZ
JAMESHERBERTSANBORNJR

CIAMARKERONTHEGROUNDSEASTNORTHEASTOFKRYPTOSDECODEUSINGSETTHEORYBERLINCLOCKTHENFOLLOWMARKERDIRECTION

So the primer (or indicator) is JAMESHERBERTSANBORNJR.

We can also solve Autokey ciphers that use a Beaufort tableau (with -type autobeau):

./polyalphabetic -type autobeau -cipher cipher_autokey_beaufort.txt -ngramsize 5 -ngramfile english_quintgrams.txt -nhillclimbs 500 -nrestarts 15000 -backtrackprob 0.15 -slipprob 0.0005 --cyclewordlen 7 -verbose

And almost instantly we get the following solution:

0.01	[sec]
415K	[it/sec]
0	[backtracks]
4	[restarts]
2	[slips]
0.00	[contradiction pct]
0.0599	[IOC]
2.8191	[entropy]
0.25	[chi-squared]
10.58	[score]
ABCDEFGHIJKLMNOPQRSTUVWXYZ
ABCDEFGHIJKLMNOPQRSTUVWXYZ
GIRASOL

CIAMARKERONTHEGROUNDEASTNORTHEASTOFKRYPTOSDECODEUSINGSETTHEORYBERLINCLOCKANDFOLLOWMARKERDIRECTION

Similarly, we can solve an autokey cipher that uses a Porta tableau (with -type autoporta)

0.00	[sec]
399K	[it/sec]
0	[backtracks]
1	[restarts]
1	[slips]
0.00	[contradiction pct]
0.0601	[IOC]
2.8170	[entropy]
0.2
View on GitHub
GitHub Stars11
CategoryDevelopment
Updated12d ago
Forks4

Languages

C

Security Score

95/100

Audited on Mar 16, 2026

No findings