This is a mathematical post related to the xkcd 936 comic about password strength. The central question is: What is better for passwords? A password made of a few random characters, or a passphrase made of (even fewer) random words? Here comes a mathematical discussion.
First, here is the comic (in case you do not already know it…):
[Image: xkcd 936, “Password Strength”]
The xkcd comic concludes that it is better to use a passphrase of 4 random words than a single-word password with some well-known substitutions in it. It also presents some statistics about the entropy of the passwords. I wanted to verify this claim about password strength and to calculate the differences between a few password settings. (I say “password” when referring to a random password chosen from characters, and “passphrase” when referring to a password made of words.)
How to Break Passwords
I assume that each character (or word) of the password is chosen completely at random! That is: The passwords/passphrases used in this scenario are generated from a truly random source and not by a human. (E.g., use the password generator in KeePass to generate random passwords, explained here.) The only way to break these passwords is via brute force. That is: In the case of the password, a machine must try every single possible combination of characters. For cracking the passphrase, a brute-force attack in conjunction with a dictionary attack is used. That is: Every combination of words is tested against the passphrase. (In any case: I assume that the attacker actually has the possibility to test the brute-force generated passwords against the real password, e.g., by comparing hash values, since he might know the hash of the real password, or the like.)
The Math Behind
Well, it’s only a bit of math to calculate the strength of a password. This is basically the entropy of the password, since it is chosen completely at random. To calculate the number of possible passwords, the size of the character set is raised to the power of the password length: $N = C^L$, with $C$ the number of characters in the set and $L$ the length of the password. For example, when using 83 different characters for a password with only 4 chars, the calculation would be $83^4 = 47{,}458{,}321$. That is, we would have 47458321 different possibilities for the password, resulting in about 25.5 bits of entropy (roughly 26 bits). (The bits of entropy are calculated with $\log_2 N$. If a calculator has only the $\ln$ or $\log_{10}$ functions, the following conversion can be used: $\log_2 N = \frac{\ln N}{\ln 2} = \frac{\log_{10} N}{\log_{10} 2}$.)
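The calculation above can be sketched in a few lines of Python. This is only an illustration of the formula, not the tool used for the post; the function names `combinations` and `entropy_bits` are made up for this example.

```python
import math


def combinations(set_size: int, length: int) -> int:
    """Number of possible passwords: set size raised to the password length."""
    return set_size ** length


def entropy_bits(set_size: int, length: int) -> float:
    """Entropy in bits of a randomly chosen password: log2(set_size ** length)."""
    return length * math.log2(set_size)


if __name__ == "__main__":
    n = combinations(83, 4)
    print(n)                         # number of 4-char passwords over 83 chars
    print(entropy_bits(83, 4))       # the same value expressed in bits
    # Same result using only the natural logarithm, as on a simple calculator:
    print(math.log(n) / math.log(2))
```

Because $\log_2(C^L) = L \cdot \log_2 C$, the entropy can be computed per character and multiplied by the length, which avoids handling very large numbers.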
I used a character set for the password of 83 different characters (a-z, A-Z, 0-9, as well as the following symbols: !"$%&/()=<>+*#',;.:-_). For the generation of the passphrase I assume that the language has about 200,000 words. This might be true for the German language while the English language might have about 500,000 words (reference1 reference2). However, since the owner of the passphrase should be able to write it correctly, the vocabulary from which the passphrase is chosen should not be too complicated. ;) That is, I think 200,000 words is a good point to start. Anyway, I also calculated the values for a passphrase chosen out of 10,000 words if a really easy set of known words is used.
A “good/strong” password should have about 80 bits of security, i.e., $2^{80}$ different possibilities for the random passwords. That is, a brute-force attack would need to test $2^{80}$ different passwords to do an exhaustive key search. (Yes, I know, one could assume that such a machine only needs $2^{79}$ tries to find the password, since on average it finds the correct one after searching half of the key space. However, this 1 bit does not make a difference here.)
I have calculated the number of possible passwords for the following sets: 10 digits, 83 chars, 94 chars, 10k words, 200k words, and 500k words. Then I calculated the entropy in bits, rounding the bit values down. So, here comes the main graph: The x-axis shows the number of characters (password) or the number of words (passphrase) for the randomly chosen passwords/passphrases, while the y-axis shows the bits of entropy which the password/passphrase actually has. (The raw values for the calculations can be found at the end of this post.)
–> That is: To have 80 bits of security, a password needs about 13 characters while a passphrase only needs about 5 words! However, a passphrase chosen out of 10,000 words needs 7 words to have the same strength. <–
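The minimum lengths quoted here follow directly from dividing the target entropy by the bits gained per character or word. A minimal sketch, assuming an 80-bit target; the function name `min_length` is chosen just for this example:

```python
import math


def min_length(set_size: int, target_bits: int = 80) -> int:
    """Smallest number of symbols (chars or words) reaching target_bits of entropy."""
    bits_per_symbol = math.log2(set_size)
    return math.ceil(target_bits / bits_per_symbol)


if __name__ == "__main__":
    # Set sizes are the ones used in this post.
    for name, size in [("10 digits", 10), ("83 chars", 83), ("94 chars", 94),
                       ("10k words", 10_000), ("200k words", 200_000),
                       ("500k words", 500_000)]:
        print(f"{name}: {min_length(size)} symbols for 80 bits")
```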
Since most passwords today have only 8 chars (51 bits of entropy), a passphrase with only 3 words from the 200k-word set (52 bits of entropy) would be an equivalent replacement!
When moving from the bottom to the top of the lines in the graph we see that passwords chosen out of the 10 digits are not that secure. For example, a password with a length of 4 digits only has an entropy of 13 bits, and even 12 digits only have an entropy of 39 bits. So: don’t ever use only digits for critical passwords!
An interesting fact is the very small difference between passwords that are chosen out of 83 chars and passwords chosen out of 94 chars (blue and red line). I always thought that it is much more secure to use almost all possible characters for the generation of passwords, although I knew that end-users won’t be happy if I assign such passwords to them. But we can see that the entropy does not increase that much when adding all the strange characters to the passwords. For example: A password with 8 characters has an entropy of 51 bits when chosen out of 83 chars, while it has 52 bits (only 1 more!) when chosen out of 94 chars. But if we extend the password to a length of 10, the 83 charset achieves an entropy of 63 bits, which is 12 bits more than before! However, to have at least 80 bits of entropy, you should use no fewer than 13 characters for your passwords.
It is easy to see that the complexity of passphrases increases much faster than that of passwords. In fact, a 4-word passphrase chosen out of 200,000 words has an entropy of 70 bits. Now the same effect appears as in the paragraph before: Increasing the set of words from 200k to 500k (e.g., using two languages) does not increase the security that much: only from 70 to 75 bits in the case of 4 words. But: Going from 4 to 5 words in the set of 200k words increases the entropy from 70 to 88 bits! That is: Use your mother tongue and a length of 5 words and you are secure! ;)
Concerning the passphrases chosen out of 10,000 words: These require at least 7 words to reach an entropy greater than 80 bits. OK, this is possible to remember, but I think it gets a bit annoying if you have to remember such huge sentences for your passphrases. So: Don’t use such small dictionaries for your passphrase generation.
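The reason for this effect can be made explicit with a short derivation. The notation is chosen just for this illustration: $H$ is the entropy in bits, $C$ the size of the character set, and $L$ the password length.

$$
H(C, L) = \log_2\left(C^L\right) = L \cdot \log_2 C
$$

$$
H(94, L) - H(83, L) = L \cdot \log_2\frac{94}{83} \approx 0.18 \cdot L
\qquad\text{versus}\qquad
H(83, L+1) - H(83, L) = \log_2 83 \approx 6.38
$$

So the entropy grows linearly with the length but only logarithmically with the size of the character set: for 8 characters, switching from 83 to 94 chars adds about 1.4 bits, while every additional character from the 83 charset adds about 6.4 bits.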
Increasing the Complexity
What about ideas like “I am additionally using some characters between my randomly chosen words to increase the complexity”? Well, let’s take an example: If you already have 4 random words in your passphrase and add 2 chars out of the 83 charset at random positions between the words, this would give you 20 different positions, while each position would have $83^2 = 6889$ possibilities, resulting in $20 \cdot 83^2 = 137{,}780$ more possible passphrases for each passphrase. The number of passphrases would be $200{,}000^4 \cdot 20 \cdot 83^2 \approx 2.2 \cdot 10^{26}$, which results in an entropy of 87 bits. This actually is a much higher entropy than the 70 bits of the 4-word passphrase!
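The arithmetic of this example can be checked with a few lines of Python. This is a sketch under the assumptions stated in the paragraph (4 words out of 200,000, 2 extra chars out of 83, 20 possible placements); the function name is made up for this illustration.

```python
import math


def padded_passphrase_bits(word_set: int = 200_000, words: int = 4,
                           char_set: int = 83, extra_chars: int = 2,
                           placements: int = 20) -> float:
    """Entropy in bits of a word passphrase with extra random chars inserted."""
    base = word_set ** words                       # plain passphrases
    extra = placements * char_set ** extra_chars   # variants per passphrase
    return math.log2(base * extra)


if __name__ == "__main__":
    print(math.floor(math.log2(200_000 ** 4)))     # bits without extra chars
    print(math.floor(padded_passphrase_bits()))    # bits with extra chars
```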
(Do not confuse this idea with the method of adding the 83 charset to your 200k words for choosing the passphrase with only 4 words. It really does not matter whether you are using 200,000 or 200,083 “words” for your passphrases.)
Problems With Passphrases
To say it one more time: Your passphrases need to be randomly generated! (As well as your passwords, of course.) Do not generate your own “good” passphrase by just looking around in the room you are sitting in and concatenating the things you see to generate a passphrase.
The same applies if you choose your passphrase from a list of random suggestions. That is: If the passphrase generator you are using shows several examples of freshly generated passphrases, you should NOT choose the one that looks easiest to you. The reason is that choosing a passphrase manually decreases the entropy, since an attacker will always start his brute-force attacks with the simplest word sets. This is really important. You should neither generate your passphrases yourself nor choose “easy” passphrases out of randomly generated ones!
Another general problem with long passphrases is the input length of password fields in the applications/services you are using. I have come across many services that limit the input for passwords to 16 (or so) characters. But a passphrase with at least 4 words will usually have more than 16 chars. Hm, bad news… (Hopefully the application tells you that your password is too long and does not simply cut it off after the maximum input size ;))
Coming back to the xkcd comic: Yes, it is more secure to use a passphrase with 4 words than a password. And yes, it would be much easier for humans to remember such a passphrase. However, the complexity bits shown in the comic are not based on the mathematics I have shown in this post, but are estimates for passwords that are not generated completely at random.
If chosen completely at random, a passphrase with 4 words (out of 200,000) has the same complexity as a password with 11 characters (out of 83). Since a passphrase is more practical for an end-user, this method should be preferred. And it should be much easier for the security engineer to motivate the users to learn a passphrase than a random password. ;)
However, if you are using more than 10 different applications and want to assign 10 different passwords/passphrases, you are lost anyway! So my advice is still to use a password safe such as KeePass with a really strong password/passphrase as the master password! (A German KeePass introduction can be found here on my blog.)
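As an illustration of what “randomly generated” means in practice, here is a minimal sketch using Python’s standard `secrets` module, which draws from the operating system’s cryptographically secure random source. The tiny `WORDS` list is only a placeholder for this example; a real generator would load a dictionary of the size discussed above (10,000 to 200,000 words), and the function name `generate_passphrase` is hypothetical.

```python
import secrets

# Placeholder word list for illustration only; far too small for real use.
WORDS = ["window", "orange", "river", "planet", "candle", "forest",
         "hammer", "violin", "pepper", "saddle", "marble", "tunnel"]


def generate_passphrase(words: list[str], count: int = 5,
                        separator: str = " ") -> str:
    """Pick `count` words independently and uniformly at random."""
    return separator.join(secrets.choice(words) for _ in range(count))


if __name__ == "__main__":
    # Take the first result as it is; picking the "nicest" of several
    # outputs would reduce the entropy, as described above.
    print(generate_passphrase(WORDS))
```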
Appendix: Raw Values
The following tables show the raw values used for the figure above. The first one lists the length of the passwords (count of characters for passwords respectively the count of words for passphrases) and the number of different passwords for each character set. The second one shows the corresponding security complexities = password entropies.
| Length | 10 Numbers | 83 Chars | 94 Chars | 10k Words | 200k Words | 500k Words |
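The values for both tables can be recomputed from the formulas in this post. The following is a minimal sketch, assuming the entropy is rounded down as described above; the range of lengths (1 to 14) and the names `SET_SIZES`, `count_row`, and `entropy_row` are choices made for this example only.

```python
# Set sizes in the column order of the table header above.
SET_SIZES = [10, 83, 94, 10_000, 200_000, 500_000]


def count_row(length: int) -> list[int]:
    """Number of different passwords/passphrases per set for a given length."""
    return [size ** length for size in SET_SIZES]


def entropy_row(length: int) -> list[int]:
    """Entropy in bits, rounded down, per set for a given length."""
    # For a positive integer n, floor(log2(n)) equals n.bit_length() - 1,
    # which avoids floating-point rounding for very large numbers.
    return [n.bit_length() - 1 for n in count_row(length)]


if __name__ == "__main__":
    for length in range(1, 15):
        print(length, *entropy_row(length), sep=" | ")
```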
If you are interested in other discussions about password security, refer to the following pages:
- Passphrase Complexity Guidelines from the University of California, Berkeley
- Considerations from the creator of the xkcd comic
- Analyzing the XKCD Passphrase Comic
- Checking Password Complexity with John the Ripper