Logical Cryptanalysis

(Or on Econding Crypto Problems into SAT)

by Fabio Massacci

The basic idea of the approach is very simple:
  1. Encode the input/output relation of a cryptographic algorithm into a propositional formula. (bit values of plaintext, ciphertext, and possibly the key (if used) are encoded into propositional variables and the formula is true iff the ciphertext corresponds to the plaintext encrypted with the key).
  2. Use a SAT solver to find out the solution.
This approach easily captures to symmetric and public key cryptography, hash function, stream cipher and the like. The difficult part is finding a good encoding that a modern SAT solver can actually crack. We invented logical cryptanalysis in 1999 (See also the full list of papers farther below) And this was the start of a long path in community as the typical comment from crypto people I got was "for ever useless"... well, in this case it is true that nothing is for ever...

New Development - Logical Cryptanalysis a key component of algorithm for findng SHA-1 collisions

A landmark paper reported a way to generate full collisions with SHA-1, one ofthe most used hash algorithm:. In the paper they used logical cryptanalysis applied to hash function as a key component of the differential path search (my italic):
[...] but all these attempts led to results that not only were unsatisfactory but that even threatened the feasibility of the second near-collision attack.[...] Our final solution was to encode this problem into a satisfiability (SAT) problem and use a SAT solver to find a drop-in replacement differential path over the first eight steps that is solvable.
More specifically, we adapted the SHA-1 SAT system generator from Nossum.
Vegard Nossum was a student at the University of Oslo I co-supervised and the work of his at the basis of the collision is well described in his thesis: Chapeau to Marc, Elie, their team and, last but not least, Vegard.


If you are interested in the history of the field you might continue reading here. We had some follow-up in SAT Community (where CryptoSAT is a well known benchmark - See also Mate Soos' Blog) but the symmetric and public key approach didn't make much progress.

However, some researchers started using it for finding collisions on the MD4 and MD5 hash functions

My Old, Initial Encoding of Crypto into SAT

In what now looks the beginning of a field (but not long ago looked pretty much a dead end) I worked on two problems:

Main References

Data Encryption Standard

The U.S.A. Data Encryption Standard, a Feistel type block-cipher, has been encoded..

If you are not familiar on the work of DES, you can find background information in any good handbook of cryptography. If you are more interested just in the encoding, then our papers give you enough background information on the problem. The main thing you need to know is that this cipher works by ``repeating'' an operation (called round) a suitable number of times. The task of a round is to mix and combine the input bits with the secret key. Loosely speaking, the higher the number of rounds, the better. For instance, in the case of DES, after eight rounds all input bits affect all output bits.

Thus, from the view point of the propositional encoding (for key search), the higher the number of rounds, the harder the formula. In contrast, the higher the number of plaintext and ciphertext blocks which are conjoned the more constrained is the search: after conjoining a suitable number of pairs there is only one model.

The problems can be of two kind:

Key Search with a known or unknown plaintext
The formula encode the input/output relation of one or more blocks of ciphertexts and known plaintext. If you find a model the first N bits will give you the value of the secret key used for encryption
Verification of cryptographic properties
The formula encodes some good property of a cryptographic algorithm (e.g. there is no universal key which can decrypt any message, or there are no two keys that can produce the same plaintext/ciphetext pair). You must prove that the formula is valid.
The original encoder is available as a tar gzipped folder version 0.9. It comes equipped with scripts for automatic data gathering (the script we used for IJCAI/JAR/NMR papers) and supports lots of formats (Human readable, CNF, SMODELs,FOF, TPTP) BUT no longer works with current gcc.

Sami Liedes sliedes AT cc dot hut dot fi a student from Ilkka Niemela HUT TCSlaboratory has provided a patch that also fixes the scripts (patch of version 0.9).

Faking RSA Signatures with Public exponent 3

The RSA algorithm, a cryptographic algorithm based on modular exponentiation, has been encoded.

If you are not familiar with RSA, you can find background information in any good handbook of cryptography. If you are more interested just in the encoding, then our papers give you enough background information on the problem. The main thing you need to know is that this cipher works by suitably generating a triple of numbers $n,d,e$ such that the following equation holds

(MD)E = M (modulo N)

The pair (E,N) is the public key and the pair (D,N) is the private signing key. Basically to sign a message M≤N you elevate M to the D-th power (modulo N) and that is the signature F (from the italian Firma). when you receive the message signature pair (M,F) one simply checks that the e-th power of f is equal to m (modulo n). A PKCS standard recommended 3 as a possible value for E.

Thus, from the view point of the propositional encoding, we have encoded a relation

(FE) = M (modulo N)

where M is given and E=3. So any propositional solution of the encoding yields a fake (but valid) signature of M. Mathematically speaking this is called a cubic-residuosity problem.

The problems can be of two kind:

signature falsification
When the triple (E,D,N) is generated according the RSA algorithm there is always a solution to the equation given above and so we have the problem of finding the solution to problem whose satisfiability status is known (yes)
solution of cubic-residuosity
If the triple (E,D,N) is arbitrary the equation below may not have a solution for all values of M. Then the problem may be satisfiable (M is a cubic residue modulo N) or unsatisfiable (M is not a cubic residue). For example for N=35, M=2 is not a cubic residue, whereas M=6 is a cubic residue. Smodels takes few seconds to prove or disprove the cubic residuosity of all numbers from M<35.
The encoder is available as a tar gzipped folder version 1.0 (also in the version used for the DAM paper here). The good news is that it comes equipped with scripts for automatic data gathering (These are the script we used for the DAM) and (in comparison with DES) that it still works! The bad news is that it the 1.0 (which has some basic English Instructions) only supports the Heerhugo format (basically a human readable formula). Translating it to other formats is easily done via scripts.