Winning the password-probability game

Applying a little logic could help professional password crackers figure out which passwords occur most frequently

The things that come to you in the shower …

I was wondering the other morning whether any password-guessing tool uses scientific password frequency analysis. Is there any tool that uses as part of its algorithm the percent chance of a user employing the password Tiger2 vs. Xw3yque? We all inherently know the former will be more likely, but does any tool take that kind of probability into consideration?

Any practical implementation of frequency analysis would significantly improve password guessing. Forget malicious hackers; scientifically derived data could be used to help the good guys -- police, anti-child-pornography enforcement, armed services, and so on -- crack the bad guys' passwords more quickly.

A little background first.

As passwords increase in length, brute-force guesswork becomes harder. This is because the keyspace -- the number of possible passwords given a minimum and maximum password length -- grows exponentially with the password length.

For example, in Windows, a log-on password can use almost any Unicode character, of which there are 65,536, and passwords can be as long as 127 characters. The effective keyspace, then, is 65,536^1 + 65,536^2 + … + 65,536^127.

So if users took advantage of all available symbols to construct very long passwords, there would be more than 4.92 x 10^611 unique passwords in operation. The equivalent crypto key would have 2,032 bits of encryption. The young kid in me wants to say it would be something like a gazillion million billions.

The reality is that most passwords are short, are made up of about 40 different characters or symbols, and are often something found in a dictionary or book of baby or pet names. Password hackers and password-cracking programs understand this.
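The keyspace arithmetic above is easy to check with arbitrary-precision integers. The sketch below is illustrative only: the function name `keyspace` is mine, and the 8-character cap used for the "realistic" case is an assumption, since the column only says most passwords are short and use about 40 symbols.

```python
import math

def keyspace(alphabet_size: int, max_length: int, min_length: int = 1) -> int:
    """Count every possible password from min_length to max_length characters."""
    return sum(alphabet_size ** n for n in range(min_length, max_length + 1))

# Theoretical Windows case from the text: 65,536 symbols, up to 127 characters.
windows_total = keyspace(65_536, 127)
print(len(str(windows_total)))    # number of decimal digits in the total
print(math.log2(windows_total))   # approximate equivalent key size in bits

# "Realistic" case: about 40 symbols, assuming (my assumption) at most 8 characters.
realistic_total = keyspace(40, 8)
print(realistic_total)
print(math.log2(realistic_total))
```

Comparing the two bit counts shows how much smaller the space users actually occupy is than the space they are allowed to use.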

When trying to brute-force a password -- or password hash -- there are four major techniques that crackers employ either manually or using an automated tool:

1. Sequential guessing (a, b, c, …, aa, ab, …)

2. Dictionary (common words, names, nouns, and so on)

3. Random guessing (random rather than sequential guesses; loosely called a birthday attack)

4. Hybrid (combinations of the first three techniques, plus intuitive logic)
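As a rough illustration of the first technique, sequential guessing can be expressed as a generator that walks every string of length 1, then length 2, and so on. The function name and the lowercase-only alphabet are assumptions for the sketch, not taken from any particular tool.

```python
from itertools import count, islice, product
import string

def sequential_guesses(alphabet: str = string.ascii_lowercase):
    """Yield a, b, ..., z, aa, ab, ... in order, without end."""
    for length in count(1):
        for combo in product(alphabet, repeat=length):
            yield "".join(combo)

# Take only the first 28 guesses: the 26 single letters, then "aa" and "ab".
first_guesses = list(islice(sequential_guesses(), 28))
print(first_guesses[:3], first_guesses[-2:])
```

Every candidate is treated as equally likely here, which is exactly the limitation the rest of this column complains about.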

Several password tools successfully use one or more of these techniques when attacking passwords. Hybrid password crackers will often use a password dictionary file and then append and replace characters and symbols in the various combinations, such as fr0g2.
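A minimal sketch of that hybrid idea follows: take a dictionary word, try each character with and without a look-alike replacement, and append a short suffix. The substitution table and suffix list are made up for illustration and are far smaller than what a real tool would carry.

```python
from itertools import product

# Illustrative look-alike replacements; each value lists the original plus its stand-ins.
SUBSTITUTIONS = {"a": "a@", "o": "o0", "l": "l1!", "s": "s5$"}
# Illustrative suffixes, including the empty suffix for the bare word.
SUFFIXES = ["", "1", "2", "9"]

def hybrid_candidates(word: str):
    """Yield every combination of per-character replacement and appended suffix."""
    per_char = [SUBSTITUTIONS.get(ch, ch) for ch in word]
    for chars in product(*per_char):
        base = "".join(chars)
        for suffix in SUFFIXES:
            yield base + suffix

frog_variants = list(hybrid_candidates("frog"))
print(frog_variants)
```

With these tables, frog expands to eight candidates, fr0g2 among them, and all eight are still emitted with no notion of which is more likely.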

But today’s publicly available password crackers are still rather simplistic in their guesswork. No logic is used to find out whether fr0g2 is more likely to be used than a@rdvark2. The vast majority of the guesses made by a password-guessing program have a very low probability of being correct, but a password guesser based on real frequency analysis would know that frog is a more common password than aardvark. Or that the words password or secret are more likely to appear in a password than infrastructure or strategic.

To help the professional password crackers, I’d like to see a password-cracking program with probabilities built in. Its password dictionary wouldn’t list words alphabetically from A to Z, but in order of probability, from most likely to least likely.
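One way such a dictionary could be built, assuming a corpus of previously observed passwords is available, is to count occurrences and sort by relative frequency. The function name and the tiny sample list below are invented for illustration; they are not real measurements.

```python
from collections import Counter

def ranked_dictionary(observed_passwords):
    """Return (password, estimated probability) pairs, most likely first."""
    counts = Counter(observed_passwords)
    total = sum(counts.values())
    # Sort by descending count; break ties alphabetically so the order is stable.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(word, n / total) for word, n in ranked]

# Made-up sample data, purely to show the shape of the output.
sample = ["password", "frog", "password", "secret", "frog", "password", "aardvark"]
for word, probability in ranked_dictionary(sample):
    print(f"{probability:.2f}  {word}")
```

A guesser that consumes this list from the top spends its earliest guesses on the candidates the data says are most common.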

Here are some of my observations and questions about passwords -- based on knowing nothing about particular users or their habits -- that might be used in a probability-based program.

1. If the minimum password size is X, most passwords will be from X to X+3 characters long.

2. Password crackers should spend less time offering up the letters q, x, or z, which are rare in English words.

3. Most users place required capital letters in the first position or near the beginning.

4. When numbers are required, 1, 2, and maybe 9 are the most common, and they are usually positioned at the end. Common substitutions -- the number 1 for lowercase l, 5 for s -- should be taken into account.

5. When symbols are required, !, @, #, and $ seem most common, with ! most likely taking the place of lowercase l, @ substituting for a, and $ taking the place of s.

6. A frequency analysis should be conducted, using dictionary words as the base, on the most commonly used words in passwords. For example, even in the smaller subset of animal words, tiger is used more often than genet, even though the latter isn’t any more complex to spell.
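Until real frequency data exists, the first five observations could be turned into a crude scoring function for ordering candidates. Everything in this sketch is hypothetical: the function name, the character sets, and especially the weights, which are placeholders rather than measured values.

```python
RARE_LETTERS = set("qxz")        # observation 2
COMMON_DIGITS = set("129")       # observation 4
COMMON_SYMBOLS = set("!@#$")     # observation 5

def heuristic_score(candidate: str, min_length: int) -> float:
    """Higher scores mean 'try this guess sooner'. Weights are arbitrary placeholders."""
    score = 0.0
    # Observation 1: lengths from the minimum to the minimum plus three are favored.
    if min_length <= len(candidate) <= min_length + 3:
        score += 1.0
    # Observation 2: each rare letter makes the candidate less attractive.
    score -= 0.5 * sum(ch in RARE_LETTERS for ch in candidate.lower())
    # Observation 3: a capital letter in the first position is typical.
    if candidate[:1].isupper():
        score += 0.5
    # Observation 4: a common digit in the last position is typical.
    if candidate[-1:] in COMMON_DIGITS:
        score += 0.5
    # Observation 5: the most common symbols get a small bonus.
    if any(ch in COMMON_SYMBOLS for ch in candidate):
        score += 0.25
    return score

for guess in sorted(["Xw3yque", "Tiger2"], key=lambda g: -heuristic_score(g, 6)):
    print(guess, heuristic_score(guess, 6))
```

With these placeholder weights and a minimum length of 6, Tiger2 outranks Xw3yque, matching the intuition from the opening of the column. Observation 6 would replace the guessed weights with counts from real password data.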

Can you think of more password observations? There are at least a few papers on the subject, but I haven’t seen any tools that take this type of analysis into consideration. It's too bad -- maybe more security mavens need to do their deep thinking in the shower.

Copyright © 2006 IDG Communications, Inc.