This is a mathematical post related to the xkcd 936 comic about password strength. The central question is: **What is better for passwords? A password containing a few random characters, or a passphrase containing even fewer random words?** Here comes a mathematical discussion.

At first, here is the comic (if you do not already know it…):

The xkcd comic concludes that it is better to use a **passphrase** of 4 random words rather than a single-word **password** which has some known substitutions in it. It further presents some statistics about the entropy of the passwords. I wanted to verify this claim and to calculate the differences between a few password settings. (I say “password” when referring to a random password chosen from characters, and “passphrase” when referring to a password made of words.)

## How to Break Passwords

I assume that **each character (or word) of the password is chosen completely at random!** That is: The passwords/passphrases used in this scenario are generated from a truly random source and *not* from a human. (E.g., use the password generator in KeePass to generate random passwords, explained here.) The only way to break these passwords is via **brute-force**. That is: A machine must try every single possible combination of characters in the case of the password. For cracking the passphrase, a **brute-force attack in conjunction with a dictionary attack** is used. That is: Every combination of words is tested against the passphrase. (In any case: I assume that the attacker actually has the possibility to test the brute-force generated passwords against the real password, e.g., by comparing hash values, since the attacker might know the hash of the real password, or the like.)
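The attack model described above can be sketched in a few lines. This is only an illustration under assumptions that are not in the post: the attacker knows an unsalted SHA-256 hash of the password, and the function name `brute_force` is made up for this example.

```python
import hashlib
from itertools import product


def brute_force(target_hash, symbols, max_length):
    """Try every combination of `symbols` up to `max_length` and return the
    first candidate whose SHA-256 hash equals `target_hash` (or None)."""
    for length in range(1, max_length + 1):
        for combo in product(symbols, repeat=length):
            candidate = "".join(combo)
            digest = hashlib.sha256(candidate.encode("utf-8")).hexdigest()
            if digest == target_hash:
                return candidate
    return None


# Toy example: a 3-character password over a 3-character set (assumed values).
secret_hash = hashlib.sha256(b"cab").hexdigest()
found = brute_force(secret_hash, "abc", 3)
print(found)
```

The same function covers the dictionary variant: passing a list of words instead of a string of characters makes it try every combination of words.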

### The Math Behind

Well, it’s only a bit of math to calculate the strength of a password. This is basically the entropy of the password since it is chosen completely at random. **To calculate the number of possible passwords (and from it the entropy), the size of the character set is raised to the power of the password length**: $N^L$, with $N$ the size of the character set and $L$ the password length. For example, when using 83 different characters for a password with only 4 chars, the calculation would be $83^4 = 47458321$. That is, we would have 47458321 different possibilities for the password, resulting in about 25.5 bits of entropy (25 bits when rounded down, as in the tables at the end of this post). (The bits of entropy are calculated with $\log_2(N^L)$. If a calculator has only the $\log$ or $\ln$ functions, the following conversion can be used: $\log_2(x) = \log(x) / \log(2)$.)
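As a minimal sketch of this calculation (the function names `keyspace` and `entropy_bits` are chosen here for illustration), the example with 83 characters and a length of 4 looks like this in Python. Note that the tables at the end of the post round the bit values down, so this case appears there as 25 bits.

```python
import math


def keyspace(symbols, length):
    """Number of different passwords: symbol count to the power of the length."""
    return symbols ** length


def entropy_bits(symbols, length):
    """Entropy in bits, i.e. log2(symbols ** length) = length * log2(symbols)."""
    return length * math.log2(symbols)


print(keyspace(83, 4))       # 47458321
print(entropy_bits(83, 4))   # about 25.5
```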

I used a character set for the password of **83 different characters** (a-z, A-Z, 0-9, as well as the following symbols:
!"$%&/()=<>+*#',;.:-_). For the generation of the passphrase I assume that **the language has about 200,000 words**. This might be true for the German language while the English language might have about 500,000 words (reference1 reference2). However, since the owner of the passphrase should be able to write it correctly, the vocabulary from which the passphrase is chosen should not be too complicated. 😉 That is, I think 200,000 words is a good point to start. Anyway, I also calculated the values for a passphrase chosen out of 10,000 words if a really easy set of known words is used.

A “good/strong” password should have about **80 bits of security**, i.e., $2^{80}$ different possibilities for the random passwords. That is, a brute-force attack would need to test $2^{80}$ different passwords to do an exhaustive key search. (Yes, I know, one could assume that such a machine only needs $2^{79}$ tries to find the password, since on average it finds the correct one after searching about half of the key space. However, this 1 bit does not make the difference here.)
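The "half of the key space" remark can be written out. Assuming the password is drawn uniformly from a key space of size $K$ (the symbol $K$ is introduced here only for illustration), every position in the attacker's search order is equally likely to hold the correct password, so the expected number of tries is

$$
E[\text{tries}] = \sum_{i=1}^{K} \frac{i}{K} = \frac{K + 1}{2}, \qquad K = 2^{80} \;\Rightarrow\; E[\text{tries}] \approx 2^{79}.
$$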

## Calculated Values

I have calculated the number of possible passwords for the following charsets: 10 digits, 83 chars, 94 chars, 10k words, 200k words, and 500k words. Then I calculated the entropy in bits, rounding the bit values down. So, here comes the main graph: The x-axis shows the number of characters (password) or the number of words (passphrase) for the randomly chosen passwords/passphrases, while the y-axis shows the bits of entropy which the password/passphrase actually has. (The raw values for the calculations can be found at the end of this post.)

**–> That is: To have 80 bits of security, a password needs about 13 characters while a passphrase only needs about 5 words! However, a passphrase chosen out of 10,000 words needs 7 words to have the same strength. <–**
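These minimum lengths can be reproduced with a small search. The sketch below (the helper name `min_length` is an assumption of this example) uses exact integer arithmetic and looks for the shortest length whose key space reaches $2^{80}$.

```python
def min_length(symbols, target_bits=80):
    """Smallest length whose key space symbols**length reaches 2**target_bits."""
    length = 1
    while symbols ** length < 2 ** target_bits:
        length += 1
    return length


for name, size in [("83 chars", 83), ("200k words", 200_000), ("10k words", 10_000)]:
    print(name, min_length(size))
```

For the three sets above this gives 13 characters, 5 words, and 7 words respectively.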

Since most passwords today have only 8 chars (51 bits of entropy), a passphrase with only 3 words (52 bits of entropy) would fit!

## Some Interpretations

When moving from the bottom to the top of the lines in the graph we see that passwords chosen out of the 10 digits are not that secure. For example, a password with a length of 4 digits only has an entropy of 13 bits, and even 12 digits only have an entropy of 39 bits. So: don’t ever use only digits for critical passwords!

**An interesting fact is the very small difference between passwords that are chosen out of 83 chars compared to passwords chosen out of 94 chars (blue and red line).** I always thought that it is much more secure to use almost all possible characters for the generation of passwords, although I knew that the end-users won’t be happy if I allocate such passwords to them. But we can see that the entropy does *not* increase that much when adding all the strange characters to the passwords. **For example: A password with 8 characters has an entropy of 51 bits when chosen out of 83 chars, while it has 52 bits (only 1 more!) when chosen out of 94 chars. But if we extend the password to a length of 10, the 83 charset achieves an entropy of 63 bits, which is 12 bits more than before!** However, to have at least 80 bits of entropy, you should use no fewer than 13 characters for your passwords.

It is easy to see that the complexity of the passphrases increases much faster than those from passwords. **In fact, a 4-word passphrase chosen out of 200,000 words has an entropy of 70 bits.** Now here comes the same effect as mentioned in the paragraph before: Increasing the set of words from 200k to 500k (e.g., using two languages) does not increase the security that much: only from 70 to 75 bits in the case of 4 words. **But: Going from 4 to 5 words in the set of 200k words increases the entropy from 70 to 88 bits!** That is: Use your mother tongue and a length of 5 words and you are secure! 😉
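Both effects, for passwords and for passphrases, follow from the entropy being the length times $\log_2$ of the set size: a bigger set only adds a little to each symbol, while every extra symbol adds its full share. A short sketch with unrounded values (the post's figures are rounded down, so they differ slightly; the function name is chosen for this example):

```python
import math


def entropy_bits(symbols, length):
    """Entropy in bits of `length` symbols drawn uniformly from `symbols` choices."""
    return length * math.log2(symbols)


# Passwords: larger character set vs. two extra characters.
base_pw = entropy_bits(83, 8)
gain_charset = entropy_bits(94, 8) - base_pw    # about 1.4 bits
gain_length = entropy_bits(83, 10) - base_pw    # about 12.8 bits

# Passphrases: larger dictionary vs. one extra word.
base_pp = entropy_bits(200_000, 4)
gain_dict = entropy_bits(500_000, 4) - base_pp  # about 5.3 bits
gain_word = entropy_bits(200_000, 5) - base_pp  # about 17.6 bits

print(gain_charset, gain_length, gain_dict, gain_word)
```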

Concerning the passphrases chosen out of 10,000 words: These require at least 7 words to reach an entropy greater than 80 bits. Ok, this is possible to remember, but I think it is getting a bit annoying if you have to remember such huge sentences for your passphrases. So: Don’t use such small dictionaries for your passphrase generation.

### Increasing the Complexity

What about ideas like “I am additionally using some characters between my randomly chosen words to increase the complexity”? Well, let’s make an example: If you have 4 random words in your passphrase already and add 2 chars out of the 83 charset in random positions between the words, this would give you 20 different positions while each position would have $83^2$ possibilities, resulting in $20 \cdot 83^2 = 137780$ times as many possible passphrases for each passphrase. **The number of passphrases would be $200000^4 \cdot 20 \cdot 83^2 \approx 2.2 \cdot 10^{26}$, which results in an entropy of 87 bits. This actually is a much higher entropy than the 70 bits of entropy from the 4 word passphrase!**
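The arithmetic for this example can be checked as follows. The sketch simply takes the post's figures (20 positions, 83 characters, 200,000 words) as given; the variable names are made up for illustration.

```python
import math

WORDS = 200_000   # dictionary size used in the post
CHARSET = 83      # character set used in the post
POSITIONS = 20    # number of insert positions as counted in the post

variants_per_phrase = POSITIONS * CHARSET ** 2   # 137780
total = WORDS ** 4 * variants_per_phrase
bits = math.log2(total)                          # about 87.5

print(variants_per_phrase, total, math.floor(bits))
```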

(Do not confuse this idea with the method of adding the 83 charset to your 200k words for choosing the passphrase with only 4 words. It really does not matter whether you are using 200,000 or 200,083 “words” for your passphrases.)

### Problems With Passphrases

To say it one more time: **Your passphrases need to be randomly generated!** (As well as your passwords, of course.) Do not generate your own “good” passphrase by just looking around in the room you are sitting in and concatenating the things you see to generate a passphrase.

The same applies if you *choose* your passphrases out of some random suggestions. That is: If the passphrase generator you are using shows some examples of freshly generated passphrases, you should NOT choose the one which looks easiest to you. The reason is that when you *choose* a passphrase manually, you decrease the entropy, since an attacker will always start a brute-force attack with the simplest word sets. This is really important. **You should not generate your passphrases yourself, nor should you choose “easy” passphrases out of randomly generated ones!**
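To illustrate what "randomly generated" means in practice, here is a minimal sketch using Python's `secrets` module, which draws from the operating system's cryptographic random source. The function name and the tiny word list are placeholders for this example; a real word list would hold thousands of entries, and the first result should be used as is.

```python
import secrets


def generate_passphrase(wordlist, count=5, separator=" "):
    """Pick `count` words independently and uniformly at random from `wordlist`."""
    return separator.join(secrets.choice(wordlist) for _ in range(count))


# Placeholder list for illustration only; far too small for real use.
demo_words = ["apple", "river", "candle", "orbit", "meadow", "piano", "lantern", "copper"]
phrase = generate_passphrase(demo_words, count=5)
print(phrase)
```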

Another general problem when using long passphrases is the input length of password fields in the applications/services you are using. I have had many services that limit the input for passwords to 16 (or the like) characters. But a passphrase with at least 4 words will have more than 16 chars. Hm, bad news… (Hopefully the application tells you that your password is too long and does not simply cut it after the maximum input size ;))

## Conclusion

Coming back to the xkcd comic: **Yes, it is more secure to use a passphrase with 4 words than a password. And yes, it would be much easier for humans to remember such a passphrase.** However, the complexity bits shown in the comic are not based on the mathematics I have shown in this post, but are estimates for passwords that are not randomly generated.

If chosen completely at random, a passphrase with 4 words (out of 200,000) has the same complexity as a password with 11 characters (out of 83). Since a passphrase is more practical for an end-user, this method should be preferred. And it should be much easier for the security engineer to motivate the users to learn a passphrase than a random password. 😉

However, if you are using more than 10 different applications and want to allocate 10 different passwords/passphrases, you are lost anyway! So my advice is still to use a password safe such as KeePass with a really strong password/passphrase as the master password! (A German KeePass introduction can be found here on my blog.)

### Appendix: Raw Values

The following tables show the raw values used for the figure above. The first one lists the length of the passwords (count of characters for passwords respectively the count of words for passphrases) and the number of different passwords for each character set. The second one shows the corresponding security complexities = password entropies.

Length | 10 Numbers | 83 Chars | 94 Chars | 10k Words | 200k Words | 500k Words |
---|---|---|---|---|---|---|
1 | 10 | 83 | 94 | 10000 | 200000 | 500000 |
2 | 100 | 6889 | 8836 | 100000000 | 40000000000 | 2.5E+11 |
3 | 1000 | 571787 | 830584 | 1E+12 | 8E+15 | 1.25E+17 |
4 | 10000 | 47458321 | 78074896 | 1E+16 | 1.6E+21 | 6.25E+22 |
5 | 100000 | 3939040643 | 7339040224 | 1E+20 | 3.2E+26 | 3.125E+28 |
6 | 1000000 | 3.2694E+11 | 6.8987E+11 | 1E+24 | 6.4E+31 | 1.5625E+34 |
7 | 10000000 | 2.71361E+13 | 6.48478E+13 | 1E+28 | 1.28E+37 | 7.8125E+39 |
8 | 100000000 | 2.25229E+15 | 6.09569E+15 | 1E+32 | 2.56E+42 | 3.90625E+45 |
9 | 1000000000 | 1.8694E+17 | 5.72995E+17 | 1E+36 | 5.12E+47 | 1.95313E+51 |
10 | 10000000000 | 1.5516E+19 | 5.38615E+19 | 1E+40 | 1.024E+53 | 9.76563E+56 |
11 | 1E+11 | 1.28783E+21 | 5.06298E+21 | 1E+44 | 2.048E+58 | 4.88281E+62 |
12 | 1E+12 | 1.0689E+23 | 4.7592E+23 | 1E+48 | 4.096E+63 | 2.44141E+68 |
13 | 1E+13 | 8.87187E+24 | 4.47365E+25 | 1E+52 | 8.192E+68 | 1.2207E+74 |
14 | 1E+14 | 7.36365E+26 | 4.20523E+27 | 1E+56 | 1.6384E+74 | 6.10352E+79 |
15 | 1E+15 | 6.11183E+28 | 3.95292E+29 | 1E+60 | 3.2768E+79 | 3.05176E+85 |
16 | 1E+16 | 5.07282E+30 | 3.71574E+31 | 1E+64 | 6.5536E+84 | 1.52588E+91 |
17 | 1E+17 | 4.21044E+32 | 3.4928E+33 | 1E+68 | 1.31072E+90 | 7.62939E+96 |
18 | 1E+18 | 3.49467E+34 | 3.28323E+35 | 1E+72 | 2.62144E+95 | 3.8147E+102 |
19 | 1E+19 | 2.90057E+36 | 3.08624E+37 | 1E+76 | 5.2429E+100 | 1.9073E+108 |
20 | 1E+20 | 2.40748E+38 | 2.90106E+39 | 1E+80 | 1.0486E+106 | 9.5367E+113 |

Length | Entropy 10 Numbers | Entropy 83 Chars | Entropy 94 Chars | Entropy 10k Words | Entropy 200k Words | Entropy 500k Words |
---|---|---|---|---|---|---|
1 | 3 | 6 | 6 | 13 | 17 | 18 |
2 | 6 | 12 | 13 | 26 | 35 | 37 |
3 | 9 | 19 | 19 | 39 | 52 | 56 |
4 | 13 | 25 | 26 | 53 | 70 | 75 |
5 | 16 | 31 | 32 | 66 | 88 | 94 |
6 | 19 | 38 | 39 | 79 | 105 | 113 |
7 | 23 | 44 | 45 | 93 | 123 | 132 |
8 | 26 | 51 | 52 | 106 | 140 | 151 |
9 | 29 | 57 | 58 | 119 | 158 | 170 |
10 | 33 | 63 | 65 | 132 | 176 | 189 |
11 | 36 | 70 | 72 | 146 | 193 | 208 |
12 | 39 | 76 | 78 | 159 | 211 | 227 |
13 | 43 | 82 | 85 | 172 | 228 | 246 |
14 | 46 | 89 | 91 | 186 | 246 | 265 |
15 | 49 | 95 | 98 | 199 | 264 | 283 |
16 | 53 | 102 | 104 | 212 | 281 | 302 |
17 | 56 | 108 | 111 | 225 | 299 | 321 |
18 | 59 | 114 | 117 | 239 | 316 | 340 |
19 | 63 | 121 | 124 | 252 | 334 | 359 |
20 | 66 | 127 | 131 | 265 | 352 | 378 |
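For anyone who wants to regenerate or check the entropy table, the following sketch computes the rounded-down bit values with exact integer arithmetic. It relies on the fact that, for a positive integer $x$, `x.bit_length() - 1` equals $\lfloor \log_2 x \rfloor$. The names `SETS`, `floor_bits`, and `entropy_row` are chosen for this example.

```python
# Set sizes in the same column order as the tables above.
SETS = {
    "10 Numbers": 10,
    "83 Chars": 83,
    "94 Chars": 94,
    "10k Words": 10_000,
    "200k Words": 200_000,
    "500k Words": 500_000,
}


def floor_bits(symbols, length):
    """Rounded-down entropy in bits: floor(log2(symbols ** length))."""
    return (symbols ** length).bit_length() - 1


def entropy_row(length):
    """One table row: the rounded-down entropy for every set at this length."""
    return [floor_bits(size, length) for size in SETS.values()]


for length in range(1, 21):
    print(length, *entropy_row(length), sep=" | ")
```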

### Other References

If you are interested in other discussions about password security, refer to the following pages:

- Passphrase Complexity Guidelines from the University of California, Berkeley
- Considerations from the creator of the xkcd comic
- Analyzing the XKCD Passphrase Comic
- Checking Password Complexity with John the Ripper

Featured image: “altonaer waagenbau” by Martin Schmid is licensed under CC BY-ND 2.0.

“…….If you have 4 random words in your passphrase already and add 2 chars out of the 83 charset in random positions between the words, this would give you 20 different positions……”

Could you explain that in some more detail? I see just 5 positions to insert 2 random characters like A1… A5. In practice only 4 since the position of A1 will probably not be used.

A1 word1 A2 word2 A3 word3 A4 word4 A5

(spaces inserted just for readability)

And if you’d allow splitting the 2 characters, I see just 10=4+3+2+1 positions

A word1 1 word2 word3 word4

A word1 word2 2 word3 word4

A word1 word2 word3 3 word4

A word1 word2 word3 word4 4

word1 A word2 5 word3 word4

etc.

Hi. My idea was to add 2 independent chars, let’s say x and y. Then you could have something like this (say, the 4 words are 1 2 3 4):

xy1234

x1y234

x12y34

x123y4

x1234y

1xy234

1x2y34

1x23y4

1x234y

12xy34

12x3y4

12x34y

123xy4

123x4y

1234xy

= 15 options.

I further thought of the 5 options in which only 1 char is inserted (which gives a total of 20 different options):

x1234

1x234

12x34

123x4

1234x

But here you are right: My calculation about the 83 * 83 is a bit wrong, since this is only true for the 15 options with both chars, and not for the 5 with only one char. Hm. So it must be a bit more precise for the calculation, though it should not be that much away from mine. Something like this:

200000^4 * 15 * 83 * 83 + 200000^4 * 5 * 83

Thanks for the hint. It took me some time to rebuild my thoughts. 😉

Please check all my other calculations, too, and tell me, if there are more mistakes!
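A quick check of how far the refined count is from the original estimate. This is a sketch under one assumption: the passphrase keeps all four words in the single-character case too, so both terms are multiplied by 200000^4. The variable names are made up for illustration.

```python
import math

WORDS = 200_000
CHARSET = 83

two_chars = 15 * CHARSET ** 2   # 15 placements of two independent characters
one_char = 5 * CHARSET          # 5 placements of a single character
refined = WORDS ** 4 * (two_chars + one_char)
original = WORDS ** 4 * 20 * CHARSET ** 2

refined_bits = math.log2(refined)     # about 87.1
original_bits = math.log2(original)   # about 87.5

print(math.floor(refined_bits), math.floor(original_bits))
```

Both values round down to 87 bits, so the original estimate is indeed close.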

Thanks for the details, Sorry I did not add up my 5 and 10 cases, which I should have. I agree that the original estimate is a good one.

Diceware suggests a related strengthening scheme: Add just 1 random character/symbol at a random place in a generated phrase, and some 10 bits of entropy will be gained.

FYI, I have successfully used your raw value table as a reference to check numbers generated by my passphrase generator and tester SimThrow. I checked it up to 8 words of the 10K, 200K and 500K dictionaries.

Much appreciated – And first time i’ve seen secure share buttons =)

Let’s say you mix cases in your dictionary words. That would make a huge difference, would it not? To find a 5-character word in the dictionary, the hacker would have to make 52*52*52*52*52 passes through the dictionary. So easy-to-remember rules like capitalizing all vowels, or the second letter of every word, etc., would appear to make dictionaries too large to be useful.

Yes, correct. It would increase the security if you have further rules such as capitalizing several letters. BUT ONLY if you are doing it randomly! If you always capitalize the vowels (and if the hacker knows that), you have NOT increased anything!

However, it is not necessary at all if you are already using 5 words, since the bits-of-security are already enough. 😉

I usually try to tailor the password security to the function. On a limited access computer where the only remote access is with ssh using RSA and DSA keys, the passwords may be short and relatively simple.

On the other hand, for computers accessible over the internet, I frequently use nonsense phrases of varying lengths up to just less than 100 characters.

I’ve also used directions from one place to another as a passphrase. For example “Jack’s Laundry Washington North 15th Main Ralph’s Bank”. While not random, the choice of source and destination is pretty much random from a limited set of selections but with non official names (from where Jack does his laundry to where Ralph works as a teller) and the directions may not be the most direct or the most obvious. Alternatively, it could just be a couple of places with their address “Smith’s Church 101 North Vermont Sally’s Neighbor 503 Elm”.

And then there are the formula along with a couple of extra words such as “E^2 = m^2 + p^2 gravitational hedgehog”.

One thing that I’ve found is to never use two passphrases that begin with the same word. I did that once and was always confusing the two passphrases.

Based on xkcd and the blog post above, I hacked a little tool which generates a passphrase accordingly. Just to get an impression of how it might feel to use passphrases instead of passwords. The words are randomly taken from a plain-text dictionary (in my case de-en).

On my first attempts I failed to memorize the proposed passphrases. The trick, like it was also mentioned in the xkcd, is to think out a causal relation between the words. But with ‘truly’ random words, that is hard to achieve. Plus, more words than expected that are coming up are unfamiliar to me which increases that problem.

On the other hand, from my experience I do memorize 12 to 16 character passwords in the 94 Chars class, just by training the pattern on how to type that password on my keyboard. Of course, keyboard layout changes are a big hurdle.

Well, the bottom line is, in either case you have to take your time to train and memorize your secret continuously. That is why I do not use keyrings or keyagents [exceptions have a short lifetime for passwords] in order to keep my brain trained to type my passwords over and over again.

Cheers,

Don’t know how they get 28 bits of entropy with the first example, this way this is calculated is for me totally wrong.

You have case senstive alphanum + symbols, so lets say 94 character. So the amount of possible combination is 94^11 as Tr0ub4dor&3 as a length of 11.

Then, to get the entropy, you apply the log base 2 of the total amount of possible combination. So,

log_2(94^11) = 72 bits.

So the entropy is equal to 72. I didn’t take the time to calculate the second with the horse but it also seems to be also totally wrong..

The way*,

characters*,

combinations*

Sorry for the mistake 😛

mynameisnobody wrote on 2016-04-13 at 09:59 :

“Don’t know how they get 28 bits of entropy with the first

example, this way this is calculated is for me totally wrong.”

No, they are right. The base for the calculation is a randomly chosen word from a ~65000 word dictionary, which gives about 16 bits of strength.

Then for every word a number of derived words are taken. For example starting with a capital or not. That results in a twofold of words to be tested, or 1 bit extra. Adding a numerical and punctuation and test all possibilities behind the base word, gives 7 bits. Not knowing the sequence of that doubles that or one extra bit. etc. etc.

Now the reason this assumption works is that the more permutations you apply, the less memorable the password becomes. So one stops after one or two permutations.

My question is: What is the ideal kind of password to use for the password manager?

Let’s assume we use 20 characters. Is it more secure to:

1. Use one with completely random characters which you write down on a paper and keep it safe (or learn to remember it).

or

2. Use a certain number of uncommon words that will make up 20 characters, which you will probably be able to remember in your head.

Hi Jonathan,

choose whatever you like as long as you have at least 80 bits of security. -> “To have 80 bits of security, a password needs about 13 characters while a passphrase only needs about 5 words!”

But do NOT use your own generated passphrase with “uncommon words”. Because what is uncommon? Do you decide it? The values proposed in this post only apply to truly randomly chosen passphrases without any human interaction!

In my opinion, a passphrase is much better for a password manager because, as you already noted, it is easier to remember than 13 or more characters in a randomly chosen password. 😉