I personally use 'password' for my password on sites like Gawker, where I'm being forced to create an account I don't care about. Using 'password' for my password is my note to myself that this is a junk account that I have no interest in. I just don't care if somebody accesses it, period.
I suspect that others do the same thing, and little weight should be given to the strength of passwords recovered from a site such as this.
The list is basically identical to every "most common passwords" leak that's come out since the beginning of the web. Even "monkey", which the author seems to think is a quirk of the Gawker community, is known to frequently be a top 20 password.
This led me to a question about DES. If no salt is provided, it uses a static or default two-character salt. In the Gawker leak, the first two characters of the stored hash were the default salt. How is that two-character default derived?
In the Gawker leak, the first two characters of the stored hash were not a default salt; they were random salts. As for how they're generated? Well, randomly.
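For anyone following along, here is a minimal sketch of what a two-character salt looks like, assuming the traditional DES crypt(3) format: a 13-character string whose first two characters are the salt, drawn from the 64-symbol alphabet [a-zA-Z0-9./]. The helper names `random_salt` and `split_stored` are made up for illustration, and nothing here claims to be how Gawker actually generated its salts.

```python
import secrets
import string

# Traditional DES crypt(3) salts use two characters from this 64-symbol alphabet.
SALT_ALPHABET = string.ascii_letters + string.digits + "./"


def random_salt():
    """Hypothetical helper: pick a random two-character salt."""
    return "".join(secrets.choice(SALT_ALPHABET) for _ in range(2))


def split_stored(stored):
    """Split a 13-character DES crypt string into (salt, hash)."""
    return stored[:2], stored[2:]


# Number of distinct two-character salts.
print(len(SALT_ALPHABET) ** 2)

# A stored value quoted elsewhere in this thread.
print(split_stored("sV39Fw5at18zo"))
```

With only 64 * 64 = 4096 possible salts, a dump of about a million accounts will necessarily reuse every salt many times.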
The thing is that they don't appear to be random. Everyone with the same password had the same salt. I guess default isn't the right word. Whatever algorithm they used to generate the salt always generated the same two characters given the same input. I'm curious as to whether anyone knows the details of that function.
I misinterpreted what I was looking at and made a bad assumption: because the same hash for a known password appeared several times in my narrow view, I assumed it was the same for every occurrence of that password.
Your comment that the maximum number of a given salt||hash was seven also threw me for a second. Am I correct that that is purely coincidental? Given the limit of only two characters for the salt (in what set? all printable characters?) and the sheer volume of accounts, there is simply some unintended overlap? It just happens that the most it occurred was seven times?
Yeah, seven is just coincidental. But it appears that you are correct in one respect: seven seems a bit high to me. I don't have the time to work out the probability distribution; if someone cares, could they do the calculation and check?
EDIT: The salt 'sV' occurred 215 times. sV39Fw5at18zo occurs seven times. Assuming that there were only 300 possible passwords each of which occurred with probability ~.3% (the probability of '123456'), then the probability of seven passwords hashing to the same value is incredibly low. Less than a thousandth of one percent. Does anyone know why this is? Or was it just the case that Scorpion's assumption that the distribution is very non-random is correct?
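A rough way to check that figure, as a sketch only: treat the 215 accounts sharing the salt 'sV' as independent draws, each picking one particular password with probability 0.003, and compute the binomial tail for seven or more of them agreeing. The independence assumption, the 4,096-salt count (which assumes the standard DES crypt salt alphabet), and the idea that every salt has roughly 215 accounts are all assumptions of this sketch, not facts from the dump.

```python
from math import comb


def binom_tail(n, p, k):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


n_accounts = 215     # accounts observed with salt 'sV'
p_password = 0.003   # assumed frequency of one specific password
k_matches = 7        # times the same salt||hash appeared

# Chance that one given salt has seven or more accounts with that password.
per_salt = binom_tail(n_accounts, p_password, k_matches)
print(f"one salt: {per_salt:.2e}")

# Crude upper bound on seeing this under at least one of 4096 salts,
# assuming each salt has about as many accounts as 'sV' did.
any_salt = min(1.0, 4096 * per_salt)
print(f"any of 4096 salts (upper bound): {any_salt:.2e}")
```

For a single salt the result is a few in a million, which agrees with "less than a thousandth of one percent". Across all salts the bound is several thousand times larger, so the observation is unlikely under these assumptions but less extreme than the single-salt number suggests.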
The takeaway here? If you want to "hack" into sites like these, you're virtually guaranteed to succeed by picking a few random usernames, and trying some combination of "123456", "password", "12345678", site name, and "qwerty" for password.
I think it's time for someone to come up with a radically better authentication mechanism.
There were one million passwords released, and only about 3,000 use '123456'. That's only a 0.3% chance.
Yes, you could have a bot that checked the top 50 passwords against a few thousand or so accounts, but even then, you'd only get one or two matches at most.
What if sites blocked passwords that have been used more than twice already? So, at most, there would be two "123456" passwords. Any secure password is more than likely something that no more than one other person on any given site would be using.
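To put a number on the "few random usernames" scenario, here is a small sketch that assumes each account independently uses '123456' with probability 0.3% (3,000 out of one million, as above). The function name `p_at_least_one_hit` is invented for illustration, and the choice of five usernames for "a few" is likewise an assumption.

```python
def p_at_least_one_hit(p, accounts):
    """Chance that a single guessed password works on at least one of
    `accounts` independently chosen accounts, each matching with probability p."""
    return 1 - (1 - p) ** accounts


p_123456 = 3_000 / 1_000_000  # 0.3%, from the figures above

# "A few random usernames": say five.
print(f"{p_at_least_one_hit(p_123456, 5):.4f}")
```

Under those assumptions, guessing '123456' against five accounts succeeds about 1.5% of the time, which is a long way from "virtually guaranteed".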
Yes, this would suffice. But I think it points out that a duplicate password is just a proxy for a weak password: strong passwords tend to be locally unique.
If you are going to have a strength requirement, then run your strength validation routine and deny weak passwords. That a password is duplicated seems like a special case of weakness that is not worth checking for.
That is, you probably don't want special checks for 'password' or '123456' either, since your strength validation routine should catch these.
If you are generating a per-password salt, that won't work. In order to find prior occurrences of a given password, you would have to hash the password for every salt value that you've ever used. And since you're using BCrypt[1], that will be very slow.
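A minimal sketch of why that is: with a per-password salt, identical passwords produce different stored hashes, so the only way to count prior uses is to re-hash the candidate under every stored salt. This example swaps in PBKDF2 from Python's standard library in place of BCrypt purely so it runs without extra packages, and the iteration count is set deliberately low for the demo. The `store` and `count_prior_uses` helpers are hypothetical.

```python
import hashlib
import hmac
import os

ITERATIONS = 1_000  # deliberately low for a demo; a real deployment would use far more


def hash_password(password, salt):
    """Slow salted hash (PBKDF2 standing in for BCrypt)."""
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)


def store(password):
    """Hypothetical storage step: fresh random salt per password."""
    salt = os.urandom(16)
    return salt, hash_password(password, salt)


def count_prior_uses(candidate, records):
    """Count stored passwords equal to `candidate`.

    Costs one slow hash per stored record, because each record has its own salt.
    """
    return sum(
        1
        for salt, digest in records
        if hmac.compare_digest(hash_password(candidate, salt), digest)
    )


records = [store(p) for p in ["123456", "hunter2", "123456"]]

# The two '123456' entries have different digests, so a simple lookup cannot find them.
print(records[0][1] != records[2][1])
print(count_prior_uses("123456", records))
```

The duplicate check is linear in the number of accounts and multiplied by the cost of the slow hash, which is exactly the cost the slow hash is meant to impose on an attacker.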
I am personally surprised by the number of proper names on there: Jennifer, Jordan, Michelle, Michael. I know these are pretty common names (Jordan?), but when you figure the percentage of the population that would have these names, and then the percentage of those who would use their name as a password (assuming they are using their own name, and not choosing it for some other reason), it's surprising that so many would make a top 50 list.
This is actually a pretty good analysis by the mainstream press. While the information is well-known to the point of being common sense for us, for readers of the WSJ it will likely be a learning experience.