# Vigenere Cipher

**Introduction**

The Vigenère cipher is a key-based substitution cipher. Typically, the message and key alphabets would be that of English characters, possibly just using lowercase letters and numbers, but modern versions use ASCII encoded characters instead. The traditional version of the cipherrotates the message characters by the value of the characters in the key, although XORing the message characters with the key characters is also common.

**Example of traditional Vigenère cipher**:

Message: This is a test message that will be encrypted

Key: MyKey

**First,** the key is expanded by copying it repeatedly until it is at least as long as the message.

Message: This is a test message that will be encrypted

Key: MyKeyMyKeyMyKeyMyKeyMyKeyMyKeyMyKeyMyKeyMyKey

**Next**, each letter in the message is rotated by the value of the corresponding key character. This is done by converting the message and key to ASCII values (hex for simplicity), and then adding them modulo 0x100.

Message: 54 68 69 73 20 69 73 20 61 20 74 65 73 74 20 6d

65 73 73 61 67 65 20 74 68 61 74 20 77 69 6c 6c

20 62 65 20 65 6e 63 72 79 70 74 65 64

Key: 4d 79 4b 65 79 4d 79 4b 65 79 4d 79 4b 65 79 4d

79 4b 65 79 4d 79 4b 65 79 4d 79 4b 65 79 4d 79

4b 65 79 4d 79 4b 65 79 4d 79 4b 65 79

For the modulo addition version of the Vigenère cipher, the first byte of the cipher text would be0x54 + 0x4d = 0xa1. The second would be 0x68 + 0x79 = 0xe1. The third would be 0x69 + 0x4b = 0xb4 and so on.

For the XOR version of the cipher, the first byte of the cipher text would be 0x54 XOR 0x4d = 0x19. The second would be 0x68 XOR 0x79 = 0x11. The third would be 0x69 XOR 0x4b = 0x22 and so on.

To decrypt the modulo addition cipher, simply expand the key as before and subtract it modulo 0x100 from the cipher text. To decrypt the XOR version, simply expand the key as before and XOR it with the cipher text.

**Cracking the Vigenère Cipher**

There are many ways to approach cracking the cipher, and each involves understanding the

particular way the key is used to modify the message text. The most efficient method is to use statistical analysis: this is a two-stage process where the first stage calculates the key length and the second recovers the key. You are free to use any approach you like, but this document will only describe this approach.

**Calculating Key Length**

Assuming the unencrypted message is in English, we can predict that the frequency of each of the letters within it will follow the usual distribution of letters in the English language. This

distribution, e.g. the frequency of each letter, can be obtained from Wikipedia. If we were to take any random slice of the unencrypted message, we could predict that this slice would also follow the same distribution. The same cannot be said of the cipher text however.

This is because different letters in the key are used to modify different slices of the message text, upsetting this distribution. That said, if we were to only take the parts of the cipher text that had been generated with the same character from the key (in this example the 1st, 6th, 11th, etc, characters of the message are all encrypted with the first character of the key) then the distribution would still hold, albeit rearranged by character. In other words, for this slice a different character would represent each character of the message and an ‘e’ in the message (the most popular letter) would be represented by a different character (0xb2 in addition, or 0x28 in XOR, for a key character of ‘M’).

We can use this property to calculate the key length. Given a guess at the length of the key, we can slice the cipher text so that each slice would have been encrypted with a different character of our (unknown) key. We can count the frequencies of all characters (from ASCII 0x00 to ASCII 0xFF) within the slice and build a table. We can then square all the frequencies in the table and add them together to obtain a sum-of-squares. We can then repeat this with each of the slices and add all of the sum-of-squares values together. If our guess at the key length is correct, we should end up with a value that is similar to the value we would have obtained had we performed this calculation on the message text, e.g. a value representing the sum-of-squares of the distribution of English. If our guess is wrong, we will have a value similar to that representing random noise. Trying this for all possible (likely) key lengths, we will likely find one value that is greater than all others, and this will relate to the correct key length.

**Recovering the Key**

** **

To recover the key we take a similar approach, but this time we use the actual distribution of

English letters to identify the correct key characters.

First we slice the cipher text such that each slice was encrypted using a different character from the key. For each slice we then calculate that slice’s key character. To do this we enumerate all possible values for this key character (from 0x00 to 0xFF) and calculate a statistical value for each. The highest value will relate to the correct key character.

The way we approach this is to decrypt the slice using the candidate key character and count the frequencies of the resulting characters. We then multiply each of those representing characters ‘a’ to ‘z’ by the frequencies expected in the English language (taken from Wikipedia), and add them together. This gives us the statistical value for this key candidate. After trying all candidates we take the key candidate corresponding to the greatest value (as this represents the closest match to English) and record this as that key character. We repeat this process for all slices to recover the entire key. Decryption is then a simple task of reversing the encryption process.

Find the following:

• The recovered key

• The decrypted message.