The things that come to you in the shower …
I was wondering the other morning whether any password-guessing tool uses scientific password frequency analysis. Does any tool factor into its algorithm the percent chance of a user choosing the password Tiger2 vs. Xw3yque? We all intuitively know the former is more likely, but does any tool take that kind of probability into consideration?
Any practical implementation of frequency analysis would significantly improve password guessing. Forget malicious hackers; scientifically derived data could be used to help the good guys -- police, anti-child-pornography enforcement, armed services, and so on -- crack the bad guys' passwords more quickly.
A little background first.
As passwords increase in length, brute-force guesswork becomes harder. This is because the keyspace -- the number of possible passwords between a minimum and maximum password length -- grows exponentially with the password length.
For example, in Windows, a log-on password can use almost any Unicode character, of which there are 65,536, and passwords can be as long as 127 characters. The effective keyspace, then, is 65,536^1 + 65,536^2 + … + 65,536^127.
So if users took advantage of all available symbols to construct very long passwords, there would be more than 4.92 x 10^611 unique passwords in operation. The equivalent crypto key would be 2,032 bits long (127 characters x 16 bits each). The young kid in me wants to say it would be something like a gazillion million billions.
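For anyone who wants to check the arithmetic, here is a minimal sketch. It assumes exactly 65,536 possible symbols per position and password lengths from 1 through 127; the function and constant names are my own, for illustration only.

```python
ALPHABET_SIZE = 65_536  # symbols per position, per the figure quoted above
MAX_LENGTH = 127        # longest password allowed


def keyspace(alphabet_size: int, max_length: int) -> int:
    """Count every possible password of length 1 through max_length."""
    return sum(alphabet_size ** n for n in range(1, max_length + 1))


total = keyspace(ALPHABET_SIZE, MAX_LENGTH)

# The longest passwords dominate the sum; on their own they number 65,536^127.
longest_only = ALPHABET_SIZE ** MAX_LENGTH

# Each symbol carries 16 bits, so a full-length password maps to a 2,032-bit key.
bits_per_symbol = (ALPHABET_SIZE - 1).bit_length()
key_bits = bits_per_symbol * MAX_LENGTH

print(f"keyspace has {len(str(total))} decimal digits")
print(f"equivalent key length: {key_bits} bits")
```

Python integers are arbitrary precision, so the sum is exact rather than a floating-point estimate.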
The reality is that most passwords are short, are made up of about 40 different characters or symbols, and are often something found in a dictionary or book of baby or pet names. Password hackers and password-cracking programs understand this.
When trying to brute-force a password -- or password hash -- there are four major techniques that crackers employ either manually or using an automated tool:
1. Sequential guessing (a, b, c, …, aa, ab, …)
2. Dictionary (common words, names, nouns, and so on)
3. Random guessing, sometimes loosely called a birthday attack (random guesses instead of sequential)
4. Hybrid (combinations of the first three techniques, plus intuitive logic)
Several password tools successfully use one or more of these techniques when attacking passwords. Hybrid password crackers will often use a password dictionary file and then append and replace characters and symbols in the various combinations, such as fr0g2.
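To make the first and last techniques concrete, here is a rough sketch of a sequential generator and a hybrid word mangler. The substitution table and the suffix list are assumptions I made up for illustration; real tools ship their own, far larger rule sets.

```python
from itertools import product
import string

# Hypothetical look-alike substitutions; each letter maps to its possible forms.
SUBSTITUTIONS = {"a": "a@4", "e": "e3", "i": "i1", "o": "o0", "s": "s$5"}


def sequential(alphabet=string.ascii_lowercase, max_len=2):
    """Yield a, b, c, ..., then aa, ab, ... up to max_len characters."""
    for length in range(1, max_len + 1):
        for combo in product(alphabet, repeat=length):
            yield "".join(combo)


def mangle(word, suffixes=("", "1", "2", "123")):
    """Yield every substitution variant of word, with each suffix appended."""
    # For each character, list the forms it may take (itself if not in the table).
    choices = [SUBSTITUTIONS.get(ch, ch) for ch in word.lower()]
    for combo in product(*choices):
        base = "".join(combo)
        for suffix in suffixes:
            yield base + suffix


if __name__ == "__main__":
    for candidate in mangle("frog"):
        print(candidate)
```

Note that the mangler treats every variant as equally likely, which is exactly the limitation discussed next.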
But today’s publicly available password crackers are still rather simplistic in their guesswork. No logic is used to find out whether fr0g2 is more likely to be used than a@rdvark2. The vast majority of the guesses made by a password-guessing program have a very low probability of being correct, but a password guesser based on real frequency analysis would know that frog is a more common password than aardvark. Or that the words password or secret are more likely to appear in a password than infrastructure or strategic.
To help the professional password crackers, I’d like to see a password-cracking program with probabilities built in. Its password dictionary wouldn’t list words sequentially from A to Z, but from 99 percent to 0 percent probability.
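Here is a minimal sketch of what such an ordering might look like, assuming you already have a list of passwords observed in the wild. The sample list in the usage section is an invented placeholder, not measured data, and the function names are hypothetical.

```python
from collections import Counter


def ranked_guesses(observed):
    """Return (password, probability) pairs, most likely first."""
    counts = Counter(observed)
    total = sum(counts.values())
    # most_common() sorts by count, highest first.
    return [(pw, n / total) for pw, n in counts.most_common()]


def guesses_needed(target, ranking):
    """Return how many guesses the ranking takes to hit target, or None."""
    for position, (pw, _) in enumerate(ranking, start=1):
        if pw == target:
            return position
    return None


if __name__ == "__main__":
    # Invented placeholder sample, purely to exercise the functions.
    sample = ["password"] * 5 + ["secret"] * 3 + ["frog"] * 2 + ["aardvark"]
    ranking = ranked_guesses(sample)
    for pw, probability in ranking:
        print(f"{probability:6.1%}  {pw}")
    print("guesses to reach 'frog':", guesses_needed("frog", ranking))
```

The same ranking could feed a hybrid mangler, so that variants of likely words are tried before variants of unlikely ones.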