GeodSoft

Good and Bad Passwords How-To

Review of the Conclusions and Dictionaries Used in a Password Cracking Study
Password Research

How much do we really know about how users create passwords? There is a lot of anecdotal evidence but little quantitative evidence about real user passwords. In the "standard paper on UNIX password security" [2], titled "Password Security: A Case History" [7], Robert Morris and Ken Thompson describe the characteristics of most of the 3289 passwords they had collected over a period of time: 551 were one to three characters, 477 were four letters, 706 were five single-case characters and 605 were six lower-case characters. "An additional 492 passwords appeared in" various dictionaries and lists. They did not describe the remaining 14%, nor did they say whether the four- through six-character letter strings included words or variations of words.

In 1991, Daniel V. Klein did the only comprehensive password analysis I've seen. The results were published as "Foiling the Cracker: A Survey of, and Improvements to, Password Security" [4]. He obtained a database of 13,797 user accounts from a variety of sources and successfully cracked 3340, or 24%, of them. The most computationally efficient approach was to try 130 variations of the account name, user name and other personal information taken directly from the passwd file. This yielded 368 passwords, or 2.7%.
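As a rough illustration of this first approach, the sketch below derives a handful of guesses from a single passwd entry. It is only a sketch: the function name passwd_candidates, the sample entry and the particular transformations are assumptions made for this example, not the 130 variations Klein actually tried. It relies only on the standard colon-separated passwd layout, with the login name in the first field and the user's full name in the fifth.

```python
def passwd_candidates(passwd_line):
    """Return a small set of password guesses built from one passwd entry.

    Illustrative subset only; the transformations are assumptions,
    not the list used in Klein's study.
    """
    fields = passwd_line.strip().split(":")
    login = fields[0]
    # By convention the full name is the part of field 5 before any comma.
    full_name = fields[4].split(",")[0]
    # Lower-case each name part and drop the dot after an initial.
    parts = [p.strip(".").lower() for p in full_name.split() if p.strip(".")]

    candidates = {login, login[::-1], login + login}  # as-is, reversed, doubled
    candidates.update(parts)                          # each name part alone
    if parts:
        candidates.add("".join(p[0] for p in parts))  # initials
        candidates.add(parts[0][0] + parts[-1])       # first initial + last name
        candidates.add(parts[-1][::-1])               # last name reversed
    return candidates


# Hypothetical entry, made up for this example.
entry = "jdoe:x:1001:1001:Jane Q. Doe:/home/jdoe:/bin/sh"
print(sorted(passwd_candidates(entry)))
```

Because every guess comes from data already in the file, the number of tests per account stays small, which is why this kind of check is so cheap relative to a full dictionary run.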

Several name lists were used as dictionaries. In aggregate they provided 1043 passwords, or 7.6%. The cost/benefit ratio varied dramatically, with "common names" being the most productive. The best single source was the dictionary provided in /usr/dict/words, which yielded 1027 passwords, or 7.4%. The "Phrases and Patterns" list got 253, or 1.8%, with very good efficiency. It is a somewhat diverse collection compiled by Dan Klein and others. Examples are 123abc, 4.2bsd, "get lost", gotohell, ibmpc, itty-bitty and xyz.

"Machine Names" also found a significant number, 132 or 1% but with a very low efficiency. This list was created from an /etc/hosts file. It's worth noting it has a significant number of ordinary words and names in it as well hundreds of demo9999 names. It would be interesting to know how many should have been in another dictionary. Several of the lists were compiled by Dan Klien and associates. Some are surprisingly small. The "Movies and Actors" list is very small (118 entries) and eclectic but resulted 12 passwords. One can only speculate but it seems very likely that larger more comprehensive lists now available would yield significantly more found passwords, perhaps at a poorer cost benefit ratio.

The words from the lists were each manipulated using 14 to 17 methods similar to those described in the previous "don't" list. Additional capitalization variations were also tried.
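The kind of per-word manipulation described above can be sketched as follows. The specific transformations here (case changes, reversal, doubling, a naive plural, look-alike digits, single-digit prefixes and suffixes) are illustrative assumptions, not Klein's exact 14 to 17 methods, and word_variants is a hypothetical helper name.

```python
def word_variants(word):
    """Return a set of simple transformations of one dictionary word.

    The transformations are assumptions chosen for illustration.
    """
    base = word.lower()
    variants = {
        base,               # the word itself, lower case
        base.capitalize(),  # first letter upper case
        base.upper(),       # all upper case
        base[::-1],         # reversed
        base + base,        # doubled
        base + "s",         # naive plural
    }
    # Look-alike digit substitutions (mapping chosen for this sketch).
    lookalikes = str.maketrans({"o": "0", "l": "1", "s": "5"})
    variants.add(base.translate(lookalikes))
    # Single-digit suffixes and prefixes.
    for digit in "0123456789":
        variants.add(base + digit)
        variants.add(digit + base)
    return variants


print(sorted(word_variants("password")))
```

Each extra transformation multiplies the number of tests per word, so the total cost of a dictionary run grows with both the size of the list and the number of methods applied to it.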

An Analysis of Dan Klein's Dictionaries

It's worth looking at the dictionaries used by Dan Klein in some detail. There were two general word dictionaries. One was /usr/dict/words, the standard UNIX dictionary used for spell checking. This is a small general-purpose dictionary, so most of its words are common compared to some of the words in a collegiate dictionary or most of those in an unabridged dictionary. The other was the "junk" dictionary, which contributed 3212 miscellaneous words that did not appear in the other dictionaries used by Dan Klein. Some of these are more obscure words, but others are character sequences that do not appear to be words from any language I've ever seen; the comment in the list admits that it contains many junk words.

The 19,683-word standard dictionary led to 1027 passwords at a cost/benefit ratio of 0.052; the miscellaneous words resulted in 54 passwords at a cost/benefit ratio of 0.017. The results support three conclusions that are consistent with common sense and anecdotal observations on passwords. Many people use ordinary words as the basis for their passwords. Of these, most choose words that quickly come to mind, i.e., common words. A smaller group tries to find "obscure" words on which to base their passwords. It would be interesting to know what results an unabridged dictionary would have produced if used against the same account and password database.

In Dan Klein's paper, "Common names" were identified as the second most productive dictionary, with 2239 names yielding 548 passwords at a 0.245 cost/benefit ratio. This was the fourth best cost/benefit ratio of the 27 dictionaries used. It's by far the largest of the "high yield" dictionaries but still less than one eighth the size of the /usr/dict/words list, which was the only single list to yield more passwords. In short, common first names are, by a significant amount, the most frequent basis for passwords.
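The cost/benefit figures quoted above are consistent with a simple ratio of passwords found to dictionary entries. The short calculation below reproduces them from the counts given in the text, assuming that definition and the 13,797 account total. The function and variable names are illustrative.

```python
TOTAL_ACCOUNTS = 13797  # accounts in Klein's database, from the text

# name: (dictionary entries, passwords found); all figures from the text
DICTIONARIES = {
    "/usr/dict/words": (19683, 1027),
    "junk (miscellaneous words)": (3212, 54),
    "common names": (2239, 548),
}


def cost_benefit(entries, found):
    """Passwords found per dictionary entry (assumed definition)."""
    return found / entries


def share_of_accounts(found, total=TOTAL_ACCOUNTS):
    """Percentage of all accounts cracked by this dictionary."""
    return 100.0 * found / total


for name, (entries, found) in DICTIONARIES.items():
    print(f"{name:28s} ratio={cost_benefit(entries, found):.3f} "
          f"accounts={share_of_accounts(found):.1f}%")
```

On these numbers an entry in the common names list was between four and five times as likely to match a password as an entry in /usr/dict/words.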

The actual contents of this dictionary are very interesting. There is a comment at the beginning of the dictionary: "First names garnered from a number of password files. We get a good hit rate from these. Probably could be culled somewhat. By Daniel Klein." Although the list certainly contains many, perhaps most, common names, on review I see a significant number of names I don't recognize. The "could be culled" comment supports this.

The Census Bureau creates three common name lists of exceptional quality. There is one for male first names, one for female first names and one for last names. Each list is ordered by how frequently the name is used within the U.S. population. Each list includes as many names as necessary so that 90% of the U.S. population has their name listed. There are 1219 male names, 4275 female names and 88,799 last names. 3.3% of the men in the U.S. are named James and nearly as many are named John. 2.6% of the women are named Mary, but this is more than two and a half times as many as Patricia, the second most common female name. 1% of the population is named Smith and 0.8% Johnson.
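The 90% cutoff described above is a cumulative frequency threshold, and the same idea can be used to pick a smaller, high-yield slice of a frequency-ordered list for a cracking dictionary. The sketch below assumes the list is available as (name, percent) pairs already sorted from most to least frequent. The function name and the toy data are made up for illustration and are not Census figures.

```python
def names_for_coverage(ranked, target_percent=90.0):
    """Return the shortest leading slice of a frequency-ordered list whose
    cumulative frequency reaches target_percent, plus the coverage reached.

    ranked: iterable of (name, percent) pairs, most frequent first.
    """
    selected = []
    covered = 0.0
    for name, percent in ranked:
        if covered >= target_percent:
            break
        selected.append(name)
        covered += percent
    return selected, covered


# Toy data with placeholder names; NOT real Census frequencies.
toy_list = [("name_a", 40.0), ("name_b", 30.0),
            ("name_c", 20.0), ("name_d", 10.0)]
selected, covered = names_for_coverage(toy_list)
print(selected, covered)
```

Lowering the target trades coverage for a shorter list, which is the same cost/benefit trade-off seen in the dictionary results above.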

Many of Dan Klein's "common names" do not appear in any Census name list. These odd names may have yielded good results, but if so, it's likely due to a strong local bias. I suspect the success of this list was due to the many really common names it does contain. I would expect a similar-size selection from the Census lists to have gotten better results on Daniel Klein's own data. The Census lists, or selected portions of them, would almost certainly produce better results at sites throughout the U.S. Likewise, similar lists from other countries would most likely have a very high return in the country of origin.
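A comparison like the one above, finding which entries in one name list are missing from a reference list, amounts to a case-insensitive set difference. The sketch below shows one way to do it. The helper name is hypothetical, the reference names are the four mentioned earlier in the text, and the candidate list is made up rather than taken from Klein's dictionary.

```python
def names_missing_from(candidates, reference):
    """Return the candidates that do not appear in the reference list,
    ignoring case and surrounding whitespace, in their original order."""
    known = {name.strip().upper() for name in reference}
    return [name for name in candidates if name.strip().upper() not in known]


# Reference names taken from the text; candidate names invented for the example.
reference = ["JAMES", "JOHN", "MARY", "PATRICIA"]
candidates = ["james", "Mary", "zzyzx"]
print(names_missing_from(candidates, reference))
```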

Dan Klein's 62,727-word dictionary was a good first step in building a password cracking dictionary. Today, larger, more comprehensive and more consistent-quality lists are readily available. Some of the specific dictionaries that he created should be considered for inclusion in any cracking dictionary that is to be built. The comprehensive character sequences and numbers contained in the similarly named dictionaries probably belong in any cracking dictionary.


Copyright © 2000 - 2006 by George Shaffer. This material may be distributed only subject to the terms and conditions set forth on http://GeodSoft.com/terms.htm. These terms are subject to change. Distribution is subject to the then current terms, or at the choice of the distributor, those defined in a verifiably dated printout or electronic copy of http://GeodSoft.com/terms.htm at the time of the distribution. Distribution of substantively modified versions of GeodSoft content is prohibited without the explicit permission of George Shaffer. Distribution of the work or derivatives of the work, in whole or in part, for commercial purposes is prohibited unless prior permission is obtained from George Shaffer. Distribution in accordance with these terms, for private, unrestricted and uncompensated public access, non profit, or internal company use is allowed.

 