Password Strength/Entropy: Characters vs. Words

This is a mathematical post related to the xkcd comic 936 about password strength. The central question is: What is better, a password consisting of a few random characters or a passphrase consisting of even fewer random words? Here comes a mathematical discussion.

First, here is the comic (in case you do not already know it):

XKCD 936: Password Strength

The xkcd comic concludes that it is better to use a passphrase of 4 random words than a single-word password with some well-known substitutions in it. It further presents some statistics about the entropy of the passwords. I wanted to verify this claim about password strength and to calculate the differences between a few password settings. (I say “password” when referring to a random password made of characters and “passphrase” when referring to a password made of words.)

How to Break Passwords

I assume that each character (or word) of the password is chosen completely at random! That is: The passwords/passphrases used in this scenario are generated from a truly random source and not by a human. (E.g., use the password generator in KeePass to generate random passwords, explained here.) The only way to break these passwords is brute force. That is: A machine must try every single possible combination of characters in the case of the password. For cracking the passphrase, a brute-force attack in conjunction with a dictionary attack is used. That is: Every combination of words is tested against the passphrase. (In any case: I assume that the attacker is actually able to test the brute-force generated candidates against the real password, e.g., by comparing hash values because the attacker knows the hash of the real password, or the like.)
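As a minimal sketch of what "generated from a truly random source" can look like in practice, the following Python snippet draws characters or words with the standard-library `secrets` module, which is intended for security-sensitive randomness. The function names and the tiny demo word list are hypothetical and only for illustration; a real word list would contain thousands of entries. The character set is the 83-character set used in this post.

```python
import secrets
import string

# 83-character set from this post: a-z, A-Z, 0-9 and 21 symbols.
CHARSET = string.ascii_letters + string.digits + "!\"$%&/()=<>+*#',;.:-_"


def random_password(length: int, charset: str = CHARSET) -> str:
    """Pick each character independently and uniformly at random."""
    return "".join(secrets.choice(charset) for _ in range(length))


def random_passphrase(num_words: int, wordlist: list[str], sep: str = " ") -> str:
    """Pick each word independently and uniformly at random (repeats allowed)."""
    return sep.join(secrets.choice(wordlist) for _ in range(num_words))


# Hypothetical toy word list, far too small for real use.
DEMO_WORDS = ["correct", "horse", "battery", "staple", "garden", "window"]

print(random_password(13))
print(random_passphrase(5, DEMO_WORDS))
```

Note that the output must be used as generated: re-rolling until a "nice" result appears would undo the uniform randomness the calculations in this post rely on.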

The Math Behind

Well, it takes only a bit of math to calculate the strength of a password. Since the password is chosen completely at random, its strength is simply its entropy. To calculate it, the size of the character set is raised to the power of the password length: CharacterSet^{PasswordLength}. For example, when using 83 different characters for a password with only 4 chars, the calculation is 83^{4} = 47458321 ≈ 2^{25.5}. That is, there are 47458321 different possibilities for the password, resulting in about 25.5 bits of entropy (25 bits when rounded down, as in the tables in this post). (The bits of entropy are calculated with log_2(x). If a calculator has only the log_{10}(x) or ln(x) functions, the following formula can be used: log_{2}(x) = ln(x)/ln(2).)
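The calculation above can be sketched in a few lines of Python. The function name `entropy_bits` is my own choice for illustration; it uses the identity log_2(N^L) = L * log_2(N), which avoids computing huge numbers first.

```python
import math


def entropy_bits(set_size: int, length: int) -> float:
    """Entropy in bits of `length` symbols drawn uniformly from `set_size` options."""
    return length * math.log2(set_size)


combinations = 83 ** 4
print(combinations)                      # 47458321
bits = entropy_bits(83, 4)
print(round(bits, 1))                    # about 25.5
print(math.floor(bits))                  # 25 when rounded down, as in the raw-value table

# Same result via the natural logarithm, for calculators without log2:
print(math.log(combinations) / math.log(2))
```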

I used a character set for the password of 83 different characters (a-z, A-Z, 0-9, as well as the following symbols: !"$%&/()=<>+*#',;.:-_). For the generation of the passphrase I assume that the language has about 200,000 words. This might be true for the German language while the English language might have about 500,000 words (reference1 reference2). However, since the owner of the passphrase should be able to write it correctly, the vocabulary from which the passphrase is chosen should not be too complicated. 😉 That is, I think 200,000 words is a good point to start. Anyway, I also calculated the values for a passphrase chosen out of 10,000 words if a really easy set of known words is used.

A “good/strong” password should have about 80 bits of security, i.e., 2^{80} = 1.2*10^{24} different possibilities for the random passwords. That is, a brute-force attack would need to test 2^{80} different passwords for an exhaustive key search. (Yes, I know, one could assume that such a machine only needs 2^{n-1} tries to find the password since on average it finds the correct one after searching half the key space. However, this 1 bit does not make a difference here.)
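The remark about half the key space can be made precise. Assuming the password is uniformly distributed over N = 2^{n} candidates and the attacker tries each candidate exactly once in some fixed order, the correct one is equally likely to sit at any position k. The symbols N, n and k are shorthand introduced here:

```latex
\begin{align*}
\mathbb{E}[\text{tries}] &= \sum_{k=1}^{N} k \cdot \frac{1}{N} = \frac{N+1}{2} \approx 2^{\,n-1} \\
n = 80: \quad \mathbb{E}[\text{tries}] &\approx 2^{79} \approx 6.0 \cdot 10^{23}
\end{align*}
```

So the average case costs the attacker exactly one bit less than the worst case, regardless of the password length.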

Calculated Values

I have calculated the entropy values for the following sets: 10 digits, 83 chars, 94 chars, 10k words, 200k words, and 500k words. I then calculated the entropy in bits, rounding the bit values down. So, here comes the main graph: The x-axis shows the number of characters (password) or the number of words (passphrase) of the randomly chosen passwords/passphrases, while the y-axis shows the bits of entropy the password/passphrase actually has. (The raw values for the calculations can be found at the end of this post.)

Password Entropy

–> That is: To have 80 bits of security, a password needs about 13 characters while a passphrase only needs about 5 words! However, a passphrase chosen out of 10,000 words needs 7 words to have the same strength. <–
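The minimum lengths quoted above can be reproduced with a small helper. This is a sketch with a hypothetical function name, `min_length`; it simply solves L * log_2(N) >= 80 for the smallest integer L.

```python
import math


def min_length(set_size: int, target_bits: float = 80) -> int:
    """Smallest length L such that L * log2(set_size) >= target_bits."""
    return math.ceil(target_bits / math.log2(set_size))


SETS = [
    ("10 digits", 10),
    ("83 chars", 83),
    ("94 chars", 94),
    ("10k words", 10_000),
    ("200k words", 200_000),
    ("500k words", 500_000),
]

for name, size in SETS:
    print(f"{name:>10}: {min_length(size)}")
```

For the sets in this post this yields 25 digits, 13 characters (for both 83 and 94 chars), 7 words from a 10k dictionary, and 5 words from a 200k or 500k dictionary.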

Since most passwords today have only 8 chars (51 bits of entropy), a passphrase with only 3 words (52 bits of entropy) would fit!

Some Interpretations

When moving from the bottom to the top of the lines in the graph we see that passwords chosen out of the 10 digits are not that secure. For example, a password with a length of 4 digits only has an entropy of 13 bits, and even 12 digits only have an entropy of 39 bits. So: don’t ever use only digits for critical passwords!

An interesting fact is the very small difference between passwords chosen out of 83 chars and passwords chosen out of 94 chars (blue and red line). I always thought that it is much more secure to use almost all possible characters for the generation of passwords, although I knew that end users won’t be happy if I assign such passwords to them. But we can see that the entropy does not increase that much when adding all the unusual characters to the passwords. For example: A password with 8 characters has an entropy of 51 bits when chosen out of 83 chars, while it has 52 bits (only 1 more!) when chosen out of 94 chars. But if we extend the password to a length of 10, the 83 charset achieves an entropy of 63 bits, which is 12 bits more than before! However, to have at least 80 bits of entropy, you should use no fewer than 13 characters for your passwords.

It is easy to see that the complexity of passphrases increases much faster than that of passwords. In fact, a 4-word passphrase chosen out of 200,000 words has an entropy of 70 bits. Here the same effect appears as in the previous paragraph: Increasing the set of words from 200k to 500k (e.g., by using two languages) does not increase the security that much: only from 70 to 75 bits in the case of 4 words. But: Going from 4 to 5 words in the set of 200k words increases the entropy from 70 to 88 bits! That is: Use your mother tongue and a length of 5 words and you are secure! 😉
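Both observations (a larger set helps little, a longer password or passphrase helps a lot) follow directly from the entropy formula. Writing H for the entropy in bits, L for the length and N for the size of the character or word set (symbols introduced here as shorthand), a short worked comparison looks like this:

```latex
\begin{align*}
H(L, N) &= L \cdot \log_2 N \\[4pt]
\text{larger set: } \Delta H_{83 \to 94} &= L \cdot \log_2 \tfrac{94}{83} \approx 0.18 \, L \\
\Delta H_{200\mathrm{k} \to 500\mathrm{k}} &= L \cdot \log_2 2.5 \approx 1.32 \, L \\[4pt]
\text{one more symbol: } \Delta H_{L \to L+1} &= \log_2 N \approx
\begin{cases}
6.4 & N = 83 \text{ (characters)} \\
17.6 & N = 200{,}000 \text{ (words)}
\end{cases}
\end{align*}
```

For 8 characters the larger charset thus adds about 1.4 bits, and for 4 words the larger dictionary adds about 5.3 bits, while a single additional character or word adds more than that on its own.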

Concerning the passphrases chosen out of 10,000 words: These require at least 7 words to reach an entropy greater than 80 bits. OK, this is possible to remember, but I think it gets a bit annoying if you have to remember such long sentences as passphrases. So: Don’t use such small dictionaries for your passphrase generation.

Increasing the Complexity

What about ideas like “I am additionally using some characters between my randomly chosen words to increase the complexity”? Well, let’s look at an example: If you already have 4 random words in your passphrase and add 2 chars out of the 83 charset at random positions before, between, or after the words, I count about 20 different placements (strictly speaking 15 when exactly two chars are inserted, which changes the result by less than half a bit), each with 83*83 = 6889 possibilities, resulting in 6889*20 = 137780 variants of each passphrase. The number of passphrases would be 200000^{4}*20*83*83 = 2.2*10^{26} ≈ 2^{87}, which results in an entropy of about 87 bits. This actually is a much higher entropy than the 70 bits of the plain 4-word passphrase!
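The placement count can be checked with a short sketch. It assumes that the inserted characters may go into any of the five gaps (before, between, and after the four words), that several characters may share a gap, and that collisions between inserted letters and the letters of the words are ignored. The function names are hypothetical.

```python
import math


def placements(num_words: int, num_chars: int) -> int:
    """Ways to distribute num_chars inserted characters over the gaps
    before, between and after num_words words (a gap may hold several)."""
    gaps = num_words + 1
    # Multiset count: choose num_chars gaps with repetition allowed.
    return math.comb(gaps + num_chars - 1, num_chars)


def bits_with_inserted_chars(num_words: int, dict_size: int,
                             num_chars: int, charset_size: int) -> float:
    """Entropy in bits of a passphrase with randomly inserted characters."""
    total = (dict_size ** num_words
             * placements(num_words, num_chars)
             * charset_size ** num_chars)
    return math.log2(total)


print(placements(4, 2))   # 15 placements for two characters
print(placements(4, 1))   # 5 placements for a single character
print(round(bits_with_inserted_chars(4, 200_000, 2, 83), 1))   # about 87.1
print(round(math.log2(200_000 ** 4 * 20 * 83 * 83), 1))        # about 87.5 with 20 placements
```

Whether one counts 15 or 20 placements, the result stays at roughly 87 bits, so the conclusion of the paragraph is unaffected.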

(Do not confuse this idea with the method of adding the 83 charset to your 200k words for choosing the passphrase with only 4 words. It really does not matter whether you are using 200,000 or 200,083 “words” for your passphrases.)

Problems With Passphrases

To say it one more time: Your passphrases need to be randomly generated! (As well as your passwords, of course.) Do not generate your own “good” passphrase by just looking around in the room you are sitting in and concatenating the things you see to generate a passphrase.

The same applies if you choose your passphrase from a set of random suggestions. That is: If the passphrase generator you are using shows several freshly generated passphrases, you should NOT pick the one that looks easiest to you. The reason is that choosing a passphrase manually decreases the entropy, since an attacker will always start a brute-force attack with the simplest word sets. This is really important. You should neither generate your passphrases yourself nor choose “easy” passphrases out of randomly generated ones!

Another general problem with long passphrases is the input length of password fields in the applications/services you are using. I have come across many services that limit passwords to 16 (or so) characters. But a passphrase with at least 4 words will usually have more than 16 chars. Hm, bad news… (Hopefully the application tells you that your password is too long and does not simply cut it off at the maximum input size ;))

Conclusion

Coming back to the xkcd comic: Yes, it is more secure to use a passphrase with 4 words than a password. And yes, it would be much easier for humans to remember such a passphrase. However, the entropy bits shown in the comic are not based on the math I have shown in this post, but are estimates for passwords that are not generated randomly.

If chosen completely at random, a passphrase with 4 words (out of 200,000) has the same complexity as a password with 11 characters (out of 83): about 70 bits each. Since a passphrase is more practical for an end user, this method should be preferred. And it should be much easier for the security engineer to motivate users to learn a passphrase than a random password. 😉

However, if you are using more than 10 different applications and want to assign 10 different passwords/passphrases, you are lost anyway! So my advice is still to use a password safe such as KeePass with a really strong password/passphrase as the master password! (A German KeePass introduction can be found here on my blog.)

Appendix: Raw Values

The following tables show the raw values used for the figure above. The first one lists the length (number of characters for passwords, number of words for passphrases) and the number of different passwords for each set. The second one shows the corresponding entropies in bits.

Number of different passwords: CharacterSet^{PasswordLength}

| Length | 10 Numbers | 83 Chars | 94 Chars | 10k Words | 200k Words | 500k Words |
|---|---|---|---|---|---|---|
| 1 | 10 | 83 | 94 | 10000 | 200000 | 500000 |
| 2 | 100 | 6889 | 8836 | 100000000 | 40000000000 | 2.5E+11 |
| 3 | 1000 | 571787 | 830584 | 1E+12 | 8E+15 | 1.25E+17 |
| 4 | 10000 | 47458321 | 78074896 | 1E+16 | 1.6E+21 | 6.25E+22 |
| 5 | 100000 | 3939040643 | 7339040224 | 1E+20 | 3.2E+26 | 3.125E+28 |
| 6 | 1000000 | 3.2694E+11 | 6.8987E+11 | 1E+24 | 6.4E+31 | 1.5625E+34 |
| 7 | 10000000 | 2.71361E+13 | 6.48478E+13 | 1E+28 | 1.28E+37 | 7.8125E+39 |
| 8 | 100000000 | 2.25229E+15 | 6.09569E+15 | 1E+32 | 2.56E+42 | 3.90625E+45 |
| 9 | 1000000000 | 1.8694E+17 | 5.72995E+17 | 1E+36 | 5.12E+47 | 1.95313E+51 |
| 10 | 10000000000 | 1.5516E+19 | 5.38615E+19 | 1E+40 | 1.024E+53 | 9.76563E+56 |
| 11 | 1E+11 | 1.28783E+21 | 5.06298E+21 | 1E+44 | 2.048E+58 | 4.88281E+62 |
| 12 | 1E+12 | 1.0689E+23 | 4.7592E+23 | 1E+48 | 4.096E+63 | 2.44141E+68 |
| 13 | 1E+13 | 8.87187E+24 | 4.47365E+25 | 1E+52 | 8.192E+68 | 1.2207E+74 |
| 14 | 1E+14 | 7.36365E+26 | 4.20523E+27 | 1E+56 | 1.6384E+74 | 6.10352E+79 |
| 15 | 1E+15 | 6.11183E+28 | 3.95292E+29 | 1E+60 | 3.2768E+79 | 3.05176E+85 |
| 16 | 1E+16 | 5.07282E+30 | 3.71574E+31 | 1E+64 | 6.5536E+84 | 1.52588E+91 |
| 17 | 1E+17 | 4.21044E+32 | 3.4928E+33 | 1E+68 | 1.31072E+90 | 7.62939E+96 |
| 18 | 1E+18 | 3.49467E+34 | 3.28323E+35 | 1E+72 | 2.62144E+95 | 3.8147E+102 |
| 19 | 1E+19 | 2.90057E+36 | 3.08624E+37 | 1E+76 | 5.2429E+100 | 1.9073E+108 |
| 20 | 1E+20 | 2.40748E+38 | 2.90106E+39 | 1E+80 | 1.0486E+106 | 9.5367E+113 |
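
Entropy in bits, rounded down: log_2(NumberOfPasswords)

| Length | Entropy 10 Numbers | Entropy 83 Chars | Entropy 94 Chars | Entropy 10k Words | Entropy 200k Words | Entropy 500k Words |
|---|---|---|---|---|---|---|
| 1 | 3 | 6 | 6 | 13 | 17 | 18 |
| 2 | 6 | 12 | 13 | 26 | 35 | 37 |
| 3 | 9 | 19 | 19 | 39 | 52 | 56 |
| 4 | 13 | 25 | 26 | 53 | 70 | 75 |
| 5 | 16 | 31 | 32 | 66 | 88 | 94 |
| 6 | 19 | 38 | 39 | 79 | 105 | 113 |
| 7 | 23 | 44 | 45 | 93 | 123 | 132 |
| 8 | 26 | 51 | 52 | 106 | 140 | 151 |
| 9 | 29 | 57 | 58 | 119 | 158 | 170 |
| 10 | 33 | 63 | 65 | 132 | 176 | 189 |
| 11 | 36 | 70 | 72 | 146 | 193 | 208 |
| 12 | 39 | 76 | 78 | 159 | 211 | 227 |
| 13 | 43 | 82 | 85 | 172 | 228 | 246 |
| 14 | 46 | 89 | 91 | 186 | 246 | 265 |
| 15 | 49 | 95 | 98 | 199 | 264 | 283 |
| 16 | 53 | 102 | 104 | 212 | 281 | 302 |
| 17 | 56 | 108 | 111 | 225 | 299 | 321 |
| 18 | 59 | 114 | 117 | 239 | 316 | 340 |
| 19 | 63 | 121 | 124 | 252 | 334 | 359 |
| 20 | 66 | 127 | 131 | 265 | 352 | 378 |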
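The rounded-down entropy values can be regenerated exactly with integer arithmetic, which avoids floating-point edge cases (for instance, 83^8 lies only slightly above 2^51). This is a sketch, not the spreadsheet originally used for the table; it relies on the fact that for a positive integer n, `n.bit_length() - 1` equals floor(log_2(n)). The names `SETS`, `floor_entropy_bits` and `entropy_row` are hypothetical.

```python
# Column order matches the entropy table in this appendix.
SETS = {
    "10 Numbers": 10,
    "83 Chars": 83,
    "94 Chars": 94,
    "10k Words": 10_000,
    "200k Words": 200_000,
    "500k Words": 500_000,
}


def floor_entropy_bits(set_size: int, length: int) -> int:
    """floor(log2(set_size ** length)), computed exactly with big integers."""
    return (set_size ** length).bit_length() - 1


def entropy_row(length: int) -> list[int]:
    """One table row: rounded-down entropy for every set at the given length."""
    return [floor_entropy_bits(size, length) for size in SETS.values()]


for length in range(1, 21):
    print(length, entropy_row(length))
```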


19 thoughts on “Password Strength/Entropy: Characters vs. Words”

  1. “…….If you have 4 random words in your passphrase already and add 2 chars out of the 83 charset in random positions between the words, this would give you 20 different positions……”
    Could you explain that in some more detail? I see just 5 positions to insert 2 random characters like A1… A5. In practice only 4 since the position of A1 will probably not be used.
    A1 word1 A2 word2 A3 word3 A4 word4 A5
    (spaces inserted just for readability)

    And if you’d allow splitting the 2 characters, I see just 10=4+3+2+1 positions
    A word1 1 word2 word3 word4
    A word1 word2 2 word3 word4
    A word1 word2 word3 3 word4
    A word1 word2 word3 word4 4

    word1 A word2 5 word3 word4
    etc.

    1. Hi. My idea was to add 2 independent chars, lets say x and y. Then you could have something like this (say, the 4 words are 1 2 3 4):

      xy1234
      x1y234
      x12y34
      x123y4
      x1234y

      1xy234
      1x2y34
      1x23y4
      1x234y

      12xy34
      12x3y4
      12x34y

      123xy4
      123x4y

      1234xy

      = 15 options.

      I further thought of the 5 options in which only 1 char is inserted (which gives a total of 20 different options):
      x1234
      1x234
      12x34
      123x4
      1234x

      But here you are right: My calculation about the 83 * 83 is a bit wrong, since this is only true for the 15 options with both chars, and not for the 5 with only one char. Hm. So it must be a bit more precise for the calculation, though it should not be that much away from mine. Something like this:
      200000^4 * 15 * 83 * 83 + 200000^4 * 5 * 83

      Thanks for the hint. It took me some time to rebuild my thoughts. 😉
      Please check all my other calculations, too, and tell me, if there are more mistakes!

  2. Thanks for the details, Sorry I did not add up my 5 and 10 cases, which I should have. I agree that the original estimate is a good one.

    Diceware suggests a related strengthening scheme: Add just 1 random character/symbol at a random place in a generated phrase, and some 10 bits of entropy will be gained.

    FYI, I have successfully used your raw value table as a reference to check numbers generated by my passphrase generator and tester SimThrow. I checked it up to 8 words of the 10K, 200K and 500K dictionaries.

  3. Let’s say you mix cases in your dictionary words. That would make a huge difference, would it not? To find a 5-character word in the dictionary, the hacker would have to make 52*52*52*52*52 passes through the dictionary. So easy-to-remember rules like capitalizing all vowels, or the second letter of every word, etc., would appear to make dictionaries too large to be useful.

    1. Yes, correct. It would increase the security if you have further rules such as capitalizing several letters. BUT ONLY if you are doing it randomly! If you always capitalize the vowels (and if the hacker knows that), you have NOT increased anything!

      However, it is not necessary at all if you are already using 5 words, since the bits-of-security are already enough. 😉

  4. I usually try to tailor the password security to the function. On a limited access computer where the only remote access is with ssh using RSA and DSA keys, the passwords may be short and relatively simple.

    On the other hand, for computers accessible over the internet, I frequently use nonsense phrases of varying lengths up to just less than 100 characters.

    I’ve also used directions from one place to another as a passphrase. For example “Jack’s Laundry Washington North 15th Main Ralph’s Bank”. While not random, the choice of source and destination is pretty much random from a limited set of selections but with non official names (from where Jack does his laundry to where Ralph works as a teller) and the directions may not be the most direct or the most obvious. Alternatively, it could just be a couple of places with their address “Smith’s Church 101 North Vermont Sally’s Neighbor 503 Elm”.

    And then there are the formula along with a couple of extra words such as “E^2 = m^2 + p^2 gravitational hedgehog”.

    One thing that I’ve found is to never use two passphrases that begin with the same word. I did that once and was always confusing the two passphrases.

  5. Don’t know how they get 28 bits of entropy with the first example; the way this is calculated seems totally wrong to me.

    You have case-sensitive alphanumerics + symbols, so let’s say 94 characters. So the number of possible combinations is 94^11, as Tr0ub4dor&3 has a length of 11.

    Then, to get the entropy, you take the log base 2 of the total number of possible combinations. So,

    log_2(94^11) = 72 bits.

    So the entropy is equal to 72. I didn’t take the time to calculate the second one with the horse, but it also seems to be totally wrong.

    1. mynameisnobody wrote on 2016-04-13 at 09:59 :
      “Don’t know how they get 28 bits of entropy with the first
      example, this way this is calculated is for me totally wrong.”

      No, they are right. The base for the calculation is a randomly chosen word from a ~65000-word dictionary, which gives about 16 bits of strength.
      Then for every word a number of derived words are considered. For example, starting with a capital letter or not. That doubles the number of words to be tested, i.e., 1 extra bit. Adding a numeral and a punctuation character and testing all possibilities after the base word gives 7 bits. Not knowing their order doubles that, i.e., one extra bit. etc. etc.
      Now the reason this assumption works is that the more permutations you apply, the less memorable the password becomes. So one stops after one or two permutations.

  6. My question is: What is the ideal kind of password to use for the password manager?

    Let’s assume we use 20 characters. Is it more secure to:

    1. Use one with completely random characters which you write down on a paper and keep it safe (or learn to remember it).

    or

    2. Use a certain number of uncommon words that will make up 20 characters, which you will probably be able to remember in your head.

    1. Hi Jonathan,

      choose whatever you like as long as you have at least 80 bits of security. -> “To have 80 bits of security, a password needs about 13 characters while a passphrase only needs about 5 words!”

      But do NOT use your own generated passphrase with “uncommon words”. Because what is uncommon? Do you decide it? The values proposed in this post only apply to truly random chosen passphrases without any human interaction!

      In my opinion, a passphrase is much better for a password manager because, as you already noted, it is easier to remember than 13 or more characters in a randomly chosen password. 😉
