Good and Bad Passwords How-To
Review of the Conclusions and Dictionaries Used in
a Password Cracking Study
Password Research
How much do we really know about how users create passwords?
There is a lot of anecdotal evidence but not much quantitative
evidence regarding real user passwords. In the "standard
paper on UNIX password security" [2], titled
"Password Security: A Case History" [7], Robert
Morris and Ken Thompson describe the characteristics of most
of 3289 passwords they had collected over a period of time.
551 were one to three characters, 477 were four letters,
706 were five single case characters and 605 were six lower
case characters. "An additional 492 passwords appeared in"
various dictionaries and lists. They did not describe the
remaining 14% nor did they say whether the four through
six character letter strings included words or variations
of words.
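The arithmetic behind the "remaining 14%" can be checked with a short Python sketch. The counts are the ones quoted above; the group labels and variable names are my own and are only there for illustration.

```python
# Counts reported by Morris and Thompson, as quoted above.
TOTAL = 3289
groups = {
    "one to three characters": 551,
    "four letters": 477,
    "five single-case characters": 706,
    "six lower-case characters": 605,
    "found in dictionaries and lists": 492,
}

described = sum(groups.values())   # passwords the paper characterizes
remaining = TOTAL - described      # passwords left undescribed

for label, count in groups.items():
    print(f"{label:35s} {count:5d}  {100 * count / TOTAL:5.1f}%")
print(f"{'not described':35s} {remaining:5d}  {100 * remaining / TOTAL:5.1f}%")
```

The described groups sum to 2831, which leaves 458 passwords, or roughly 14% of the 3289 collected.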
In 1991, Daniel V. Klein did the only comprehensive password
analysis I've seen. The results were published as
"Foiling the Cracker: A Survey of, and Improvements to, Password
Security" [4]. He obtained
a database of 13,797 user accounts from a variety of sources and
successfully cracked 3340 or 24% of them. The most
computationally efficient approach was 130 variations of the
account name, user name and other personal information taken
directly from the passwd file. This yielded 368 passwords or
2.7%.
Several name lists were used as dictionaries. In aggregate they
provided 1043 passwords or 7.6%. The cost/benefit ratio varied
dramatically with "common names" being the most productive. The
best single source was the dictionary provided in
/usr/dict/words. This yielded 1027 passwords or 7.4%. The list
"Phrases and Patterns" got 253 or 1.8% with a very good
efficiency. This included a somewhat diverse collection compiled
by Dan Klein and others. Examples are 123abc, 4.2bsd, "get
lost", gotohell, ibmpc, itty-bitty, xyz.
"Machine Names" also found a significant number, 132 or 1%, but
with a very low efficiency. This list was created from an
/etc/hosts file. It's worth noting that it contains a significant
number of ordinary words and names as well as hundreds of demo9999
names. It would be interesting to know how many of these should
have been in another dictionary. Several of the lists were compiled
by Dan Klein and associates. Some are surprisingly small. The
"Movies and Actors" list is very small (118 entries) and eclectic
but resulted in 12 passwords. One can only speculate, but it seems
very likely that the larger, more comprehensive lists now available
would yield significantly more found passwords, perhaps at a poorer
cost/benefit ratio.
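The percentages quoted for each source can be reproduced from the raw counts and the 13,797 accounts in the database. The following Python sketch does only that; the dictionary keys and the helper name percent_of_accounts are my own labels, not anything taken from the paper.

```python
ACCOUNTS = 13797  # user accounts in the database described above

# Passwords found per source, as quoted in the text above.
found = {
    "account and personal information": 368,
    "name lists (aggregate)": 1043,
    "/usr/dict/words": 1027,
    "Phrases and Patterns": 253,
    "Machine Names": 132,
}

def percent_of_accounts(count, accounts=ACCOUNTS):
    """Return count as a percentage of all accounts, to one decimal place."""
    return round(100 * count / accounts, 1)

# List the sources from most to least productive.
for source, count in sorted(found.items(), key=lambda item: -item[1]):
    print(f"{source:35s} {count:5d}  {percent_of_accounts(count):4.1f}%")
```

Note that these are shares of all accounts, not of the 3340 cracked passwords, which is why they add up to far less than 100%.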
The words from the lists were each manipulated using 14 to 17
methods similar to those described in the previous "don't" list.
Additional capitalization variations were performed.
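To give a feel for what manipulating each word means in practice, here is a minimal Python sketch. The specific rules shown (case changes, reversal, doubling, an appended digit, look-alike substitutions) are assumptions picked for illustration; they are not the actual 14 to 17 methods used in the study, which are not enumerated here.

```python
def variations(word):
    """Yield a few illustrative variants of a dictionary word.

    These rules are assumptions chosen for illustration; they are not
    the actual methods used in the study.
    """
    base = word.lower()
    candidates = [
        base,                 # plain lower case
        base.capitalize(),    # first letter upper case
        base.upper(),         # all upper case
        base[::-1],           # reversed
        base + base,          # doubled
        base + "1",           # a digit appended
        base.replace("o", "0").replace("l", "1"),  # look-alike substitutions
    ]
    seen = set()
    for candidate in candidates:
        if candidate not in seen:   # skip duplicates, e.g. for palindromes
            seen.add(candidate)
            yield candidate

print(list(variations("Hello")))
```

Even this small rule set multiplies the size of a dictionary several times over, which is why the number of methods applied matters to the cost side of any cost/benefit comparison.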
An Analysis of Dan Klein's Dictionaries
It's worth looking at the dictionaries used by Dan Klein in some
detail. There were two general word dictionaries. One was
/usr/dict/words, the standard UNIX dictionary used for spell
checking. This is a small general purpose dictionary. Thus most
of the words in it will be common compared to some of the words
found in a collegiate dictionary or most in an unabridged
dictionary. There were 3212 miscellaneous words from the "junk"
dictionary that did not appear in the other dictionaries used by
Dan Klein. Some of these were more obscure words but others are
character sequences that do not appear to be words from any
language I've ever seen; the comment admits this list contains
many junk words.
The 19,683-word standard dictionary led to 1027 passwords at a
cost/benefit ratio of 0.052; the miscellaneous words resulted in
54 passwords at a cost/benefit ratio of 0.017. The results
support three conclusions that are consistent with common sense
and anecdotal observations on passwords. Many people use
ordinary words as the basis for their passwords. Of these most
choose words that quickly come to mind, i.e., common words. A
smaller group tries to find "obscure" words on which to base
their password. It would be interesting to know what results
an unabridged dictionary would have produced if used against
the same account and password database.
In Dan Klein's paper, "Common names" were identified as the
second most productive dictionary with 2239 names yielding 548
passwords at a 0.245 cost/benefit ratio. This was the fourth
best cost/benefit ratio of 27 dictionaries used. It's by far the
largest of the "high yield" dictionaries but still less than one eighth
the size of the /usr/dict/words list which was the only single
list to yield more passwords. In short, common first names are,
by a significant margin, the most frequent basis for passwords.
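The cost/benefit figures quoted so far are consistent with a simple ratio of passwords found to dictionary entries. That reading is my inference from the numbers, not a definition taken from the paper, and the function name hit_ratio is hypothetical. A short Python sketch under that assumption:

```python
# (passwords found, dictionary size) as quoted in the text above.
dictionaries = {
    "/usr/dict/words": (1027, 19683),
    "junk (miscellaneous words)": (54, 3212),
    "common names": (548, 2239),
}

def hit_ratio(found, size):
    """Passwords found per dictionary entry (assumed meaning of the ratio)."""
    return found / size

for name, (found, size) in dictionaries.items():
    print(f"{name:28s} {found:5d} / {size:6d} = {hit_ratio(found, size):.3f}")

# Size comparison behind the "less than one eighth" remark.
words_size = dictionaries["/usr/dict/words"][1]
names_size = dictionaries["common names"][1]
print(f"common names is 1/{words_size / names_size:.1f} the size of /usr/dict/words")
```

Under this reading a higher number is better: each common name tried was several times more likely to match a password than each word from /usr/dict/words.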
The actual contents of this dictionary are very interesting.
There is a comment at the beginning of the dictionary: "First
names garnered from a number of password files. We get a good hit
rate from these. Probably could be culled somewhat. By Daniel
Klein." Reviewing these, though the list certainly contains
many, perhaps most common names, I see a significant number of
names I don't recognize. The "could be culled" comment supports
this.
The Census Bureau creates three exceptionally high quality common
name lists. There is one for male first names, one for female first
names and one for last names. Each list is ordered by the
frequency with which the name is used within the U.S. population. Each
list includes as many names as necessary so that 90% of the U.S.
population has their name listed. There are 1219 male names,
4275 female names and 88,799 last names. 3.3% of the men in the
U.S. are named James and nearly as many named John. 2.6% of the
women are named Mary but this is more than two and a half times
as many as Patricia, the second most common female name. 1% of
the population is named Smith and 0.8% Johnson.
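Because the lists are ordered by frequency, a cracking dictionary can take just the top of a list up to whatever population coverage is wanted. The Python sketch below shows the idea. The function name names_for_coverage and the sample data are hypothetical: only the 3.3% share for James comes from the text above, and the other shares are placeholders, not Census figures.

```python
def names_for_coverage(ranked, target_percent):
    """Return the shortest prefix of a frequency-ranked list whose
    cumulative share reaches target_percent, plus that cumulative share."""
    selected = []
    cumulative = 0.0
    for name, percent in ranked:
        if cumulative >= target_percent:
            break
        selected.append(name)
        cumulative += percent
    return selected, cumulative

# Placeholder data: (name, percent of population), most frequent first.
# Only the James figure is taken from the text; the rest are assumed.
sample = [
    ("JAMES", 3.3),
    ("JOHN", 3.2),
    ("ROBERT", 3.1),
    ("MICHAEL", 2.6),
    ("WILLIAM", 2.4),
]

names, covered = names_for_coverage(sample, 9.0)
print(names, f"{covered:.1f}%")
```

The same cut applied at 90% is what produces lists of such different lengths: far fewer names are needed to cover 90% of men than 90% of women, and far more again for last names.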
Many of Dan Klein's "common names" do not appear in any Census
name list. These odd names may have yielded good results but if
so, it's likely due to a strong local bias. I suspect the success
of this list was due to the many really common names it does
contain. I would expect a similar size selection from the
Census lists to have gotten better results on Daniel Klein's own
data. The Census lists or selected portions of them would almost
certainly produce better results at sites throughout the U.S.
Likewise, similar lists from other countries would most likely
have a very high return in the country of origin.
Dan Klein's 62,727 word dictionary was a good first step in
building a password cracking dictionary. Today, larger, more
comprehensive and more consistent quality lists are readily
available. Some of the specific dictionaries that he created
should be considered for inclusion in any cracking dictionary
that is to be built. The comprehensive character sequences and
numbers contained in the similarly named dictionaries probably
belong in any cracking dictionary.
Copyright © 2000 - 2006 by George Shaffer.
This material may be distributed only subject to the
terms and conditions set forth on
http://GeodSoft.com/terms.htm.
These terms are subject to change. Distribution is subject to the then
current terms, or at the choice of the distributor, those defined in a
verifiably dated printout or electronic copy of
http://GeodSoft.com/terms.htm at the time of the distribution.
Distribution of substantively modified versions of GeodSoft content is
prohibited without the explicit permission of George Shaffer.
Distribution of the work or derivatives of the work, in whole or in part,
for commercial purposes is prohibited unless prior permission is
obtained from George Shaffer. Distribution in accordance with these
terms, for private, unrestricted and uncompensated public access, non
profit, or internal company use is allowed.