Separate password breaches last week at LinkedIn, eHarmony and Last.fm exposed millions of credentials, and once again raised the question of whether any company can get password security right. To understand more about why companies keep making the same mistakes and what they might do differently to prevent future password debacles, I interviewed Thomas H. Ptacek, a security researcher with Matasano Security.
Ptacek is just one of several extremely smart researchers I’ve been speaking with about this topic. Below are some snippets from a conversation we had last week.
BK: I was just reading an article by Eric Chabrow, which pointed out that LinkedIn — a multi-billion dollar company that holds personal information on some of world’s most important executives — has neither a chief information officer nor a chief information security officer. Is it too much to ask for a company like this to take security seriously enough to do a better job protecting and securing their users’ passwords?
Ptacek: There is no correlation between how much money a company or service has or takes in — or whether it’s free or not free — and how good their password storage practices are. Nobody gets this right. I think it’s a problem of generalist developers writing password storage systems. They may be good developers, but they’re almost never security domain specialists. There are very few good developers who are also security domain specialists. So if you’re a smart and talented developer but not a security domain specialist, and you’ve got to put together a password storage system, even if it’s just MD5, to you that’s a space alien technology. That’s a cryptographic algorithm. You can tell your boss that’s a cryptographic hash. You feel good about the fact that you’re storing passwords in a cryptographic hash. But you have to be a domain expert to know that the term cryptographic hash doesn’t really mean much.
BK: Why doesn’t cryptographic hash mean much? Maybe LinkedIn shouldn’t have been using a plain old SHA-1 cryptographic hash function, but shouldn’t developers be seeking to secure their passwords with a solid cryptographic algorithm?
Ptacek: The basic mechanism by which SHA-1 passwords are cracked, or MD5 or SHA-512 — it doesn’t matter what algorithm you use — hasn’t changed since the early 1990s. As soon as code to implement SHA-1 came out, it was also available to John the Ripper and other password cracking tools. It’s a really common misconception — including among security people — that the problem here is using SHA-1. It would not have mattered at all if they had used SHA-512, they would be no better off at all.
BK: I’ve heard people say, you know this probably would not have happened if LinkedIn and others had salted the passwords — or added some randomness to each of the passwords, thus forcing attackers to expend more resources to crack the password hashes. Do you agree with that?
Ptacek: That’s actually another misconception, the idea that the problem is that the passwords were unsalted. UNIX passwords, and they’ve been salted forever, since the 70s, and they have been cracked forever. The idea of a salt in your password is a 70s solution. Back in the 90s, when people broke into UNIX servers, they would steal the shadow password file and would crack that. Invariably when you lost the server, you lost the passwords on that server.
BK: Okay. So if the weakness isn’t with the strength of the cryptographic algorithm, and not with the lack of salt added to the hashed passwords, what’s the answer?
Ptacek: In LinkedIn’s case, and with many other sites, the problem is they’re using the wrong kind of algorithm. They use a cryptographic hash, when they need to use a password hash.
BK: I’ll bite: What’s the difference?
Ptacek: The difference between a cryptographic hash and a password storage hash is that a cryptographic hash is designed to be very, very fast. And it has to be because it’s designed to be used in things like IP-sec. On a packet-by-packet basis, every time a packet hits an Ethernet card, these are things that have to run fast enough to add no discernible latencies to traffic going through Internet routers and things like that. And so the core design goal for cryptographic hashes is to make them lightning fast.
Well, that’s the opposite of what you want with a password hash. You want a password hash to be very slow. The reason for that is a normal user logs in once or twice a day if that — maybe they mistype their password, and have to log in twice or whatever. But in most cases, there are very few interactions the normal user has with a web site with a password hash. Very little of the overhead in running a Web application comes from your password hashing. But if you think about what an attacker has to do, they have a file full of hashes, and they have to try zillions of password combinations against every one of those hashes. For them, if you make a password hash take longer, that’s murder on them.
So, if you use a modern password hash — even if you are hardware accelerated, even if you designed your own circuits to do password hashing, there are modern, secure password hashes that would take hundreds or thousands of years to test passwords on.
BK: Can you give me an example of the difference between the time and resources it would take to crack, say, a SHA-1 password versus a good password hash?
Ptacek: Let’s say you have a reasonably okay password and you’re using salted, randomized SHA-1 hash. I can, off-the-shelf, build a system that will try many, many tens of thousands of possible passwords per second. I can go into Best Buy or whatever and buy a card off the shelf that will do that, and the software to do it is open source.
If we were instead using something like Bcrypt, which would have just been the difference of using a different [software] library when I built my application, I can set Bcrypt up so that a single attempt — to test whether my password was “password” or “himom” — just that one test could take a hundred milliseconds. So all of a sudden you’re talking about tens of tries per second, instead of tens of thousands.
What you do with a password hash is you design it in the opposite way you would design a standard cryptographic hash. A cryptographic hash wants to do the minimum amount of work possible in order to arrive at a secure result. But a password hash wants to deliberately be designed to do the maximum amount of work.
Can you explain in layman’s terms what it is that makes a password hash like Bcrypt take so much longer to crack?
Ptacek: It’s similar to if you said, let’s take SHA-1 password, but instead of just using SHA-1, let’s run SHA-1 on itself thousands of times. So, we’ll take the output of SHA-1 and feed it back to SHA-1, and we’ll do that thousands and thousands of times, and you’ll only know if your password hash is right, when you look at the result of that 1,000th SHA-1 run. So, in order to make that password hash work, you have to run the algorithm 1,000 times for each guess. That’s roughly the tactic that the modern, secure password hashes take. These are algorithms that are designed so that you can’t arrive at the result without lots and lots of work. I mean, we’re talking about 100 milliseconds [one-tenth of a second] worth of work on modern hardware to get the results of a single password attempt.
BK: So what’s the catch here? Are these password hashing systems any harder or more expensive to deploy?
Ptacek: It’s a knowledge issue. This is easier to deploy that building it yourself. If you build it yourself, it takes a fair amount of effort. Whereas the libraries to secure password hashes, just where you drop them in it’s like one function call. And they’re all free. There are no patents to worry about or anything like that.
The Web application very, very minorly increases its costs. Some imperceptible amount. But cleverly, they’ve done it in a way that drastically explodes the costs for attackers. If a normal password try takes a microsecond [one millionth of a second]– but a bcrypt password trial takes 10 milliseconds [a hundredth of a second] — you might as well give up. It will take years and years.
Let’s say we have a Web application that lost 10 million password hashes, but they’re bcrypted passwords. So, I think in that last dump we saw from the LinkedIn dump, about like three million of those passwords had been cracked. It was really quick and easy to see if your password had been in there. In this case, if we lost 10 million bcrypted passwords, instead of 3 million of them being published and compromised, you might be looking at tens or hundreds of user passwords being compromised. Because you could maybe effectively look for every password that was “password,” but you certainly couldn’t run a dictionary against every password there anymore.
BK: Isn’t it likely that lots of companies that are relying on cryptographic hashes haven’t switched to password hashes because it might be disruptive, expensive or hard to do? Do you agree, or are there other issues getting in the way here?
Ptacek: At a certain point, the cost of migrating that is incredibly expensive. And securing Web applications that are as complex as LinkedIn is an incredibly hard problem. At same time, trying to find talent to make sure they’re building it correctly…there’s just not enough application security talent out there. It’s hard to hire people who know what they’re talking about. There’s so much bulk knowledge that gets substituted for real expertise. You can find a lot of general developers sharing wisdom about how to develop secure applications, but these are by and large just general developers trying to be smart, they’re not people who really know what they’re talking about.
To me, the fundamental problem is passwords. People should use better storage algorithms if they’re going to inflict passwords on their users. They should do a better job of it. But the real answer is things like two-factor authentication with smart phones. Two factor authentication seems like the answer to me. I think ten years from now this is going to be a common approach.
Dynamic password will be ok.
really imformative article, i was wondering how secure is using this
this i mean in php
$input = “something”;
$output = hash(“sha512”, $text);
Nooooooo. Not strong, like bull.
Do this – http://php.ss23.geek.nz/2011/01/12/Using-crypt.html
thank you so much.
thank you again, just to clear up my mind
when user registers with
$Password = ‘SuperSecurePassword123’;
i should do
$Salt = uniqid();$Algo = ‘6’;
$Rounds = ‘5000’;
$CryptSalt = ‘$’ . $Algo . ‘$rounds=’ . $Rounds . ‘$’ . $Salt;
$HashedPassword = crypt($Password, $CryptSalt);
and save only $HashedPassword to database
and check the same in my sql queries
is that it?
If you pull the entire hash from your db for the user and use that hash to hash the value entered by the user – then you can compare the two hashes – the one you just generated and the one that was in your DB.
It is slightly confusing because people normally try to match their hashes in SQL – but that won’t work here.
Basically the old hash is used to configure the new hash generation – because it contains the type of hash function, salt and rounds for when you did it before.
oh yes, by bad, so how should i check it?
thank you for your patience, but it seems only u can help me understand this better.
ok i think i get it,
i should get the saved hash value (corresponding to email or username ) and do
if (crypt($Password1, $HashedPassword) == $HashedPassword)
hmmm… hope i am correct
bluepicaso, you’ve pretty much figured this out, and PHP’s CRYPT_SHA512 is a reasonable choice (PHP 5.3.2+ only, though), but you could want to consider using phpass to hide this kind of detail from you, to have it generate more random salts for you, and to have it use CRYPT_BLOWFISH (where available), which has slight advantage in terms of being somewhat GPU-unfriendly (since you don’t make use of a GPU anyway, but an attacker may):
How to use phpass:
bcrypt (CRYPT_BLOWFISH in PHP terms) vs. SHA-crypt (CRYPT_SHA512):
This article is in some cases right and in some cases wrong. Just consider you make a loop of sha about 1000 times.
An attacker just has to follow the entries in the rainbow tables until he comes to the last one. so instead of looking up the first entry he makes a connection line of 1000 entries. that’s not really hard.
Of course this database would be really really really massive big. exa bytes and more.
but as american government and others showed their resources they are capable to handle such a database.
Integrating salts would make it a little step harder but if you imagine of an attacker with a massive rainbow table it is nothing.
new algorithms directly created for password hashing could be created and not directly based on sha or others. the only thing you get with this new algorithm is time. until the attacker has build a new rainbow table for this algorithm. you could enlarge the hash-length.
but in the end it doesn’t even matter.
we should concentrate more on quantum cryptography not on theorems based on the performance of computers
quantum cryptography – that is definitely a step in positive direction. It will make things a ‘bit’ more difficult for many attackers/hackers.
Firstly hence the multi-vector approach… Lots of time, random salts and hashes that take a long time to generate. If it takes the US government about 1000 years to gen a rainbow table for one hash you’re probably doing it right..
Secondly lend me a quantumn CPU and we’ll talk about it. Oh yeah about that…
To be blunt: you’re awful at math and you don’t understand what you’re talking about.
It is not physically possible to build rainbow tables for a hash that uses a large salt value (i.e. 128-bits). The storage is impossible because your storage device would have to be larger than the Earth.
Stretching makes brute-force and rainbow generation take way longer than with a single iteration of SHA/MD5. You can’t just use existing rainbow tables and jump ahead 1000 places in the chain. Rainbow tables use a reduction function after each step in the chain which changes the input to the next iteration. You need the reduction functions; they are not optional. The other problem with using an existing rainbow table that is built for a single pass of an algorithm is that only every 1000th value would be useful (or worse, depending on the number of iterations used when stretching).
Increasing the hash length does nothing in this case. A 128-bit hash is fine for passwords; the problem isn’t the hash length.
you are right dude.
i just focused on the worst case.
I completely disagree with Ptacek on the topic of salts. Even if you were using bcrypt or another ‘slow’ hash function, when the hashes leak an attacker could simply try looking up the hash 10.000 or so most common passwords and probably have a lot of success.
This becomes impossible with salting each password with a property (preferably a long random string) unique to each user. This also renders rainbow tables useless.
I’d say that using properly salted MD-5 hashes is far, far more secure than running bcrypt 100 times on each password without a salt (or with the same salt being used for everyone), despite MD-5 being very fast and weak.