Not long ago I heard from a reader who wanted advice on how to stop someone from scanning his home network, or at least recommendations about to whom he should report the person doing the scanning. I couldn’t believe that people actually still cared about scanning, and I told him as much: These days there are countless entities — some benign and research-oriented, and some less benign — that are continuously mapping and cataloging virtually every device that’s put online.
One of the more benign is scans.io, a data repository of research findings collected through continuous scans of the public Internet. The project, hosted by the ZMap Team at the University of Michigan, includes huge, regularly updated results grouped around scans for Internet hosts listening on some of the most commonly used “ports” or network entryways, such as Port 443 (think Web sites protected by the lock icon denoting SSL/TLS encryption); Port 21, or file transfer protocol (FTP); and Port 25, or simple mail transfer protocol (SMTP), used by many businesses to send email.
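For readers who have never seen what a single probe looks like, here is a minimal sketch, in Python, of asking whether a host answers on one of those ports. It is only an illustration: the hostname is a placeholder, and Internet-wide scanners such as ZMap use stateless, raw-packet techniques rather than full TCP connections like this one.

```python
# Minimal sketch of a single TCP port probe (hostname is a placeholder).
# Internet-wide scanners like ZMap work very differently under the hood,
# but the question being asked is the same: does anything answer here?
import socket

def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

for port in (443, 21, 25):  # HTTPS, FTP, SMTP -- the ports named above
    print(port, port_is_open("example.com", port))
```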
When I was first getting my feet wet on the security beat roughly 15 years ago, the practice of scanning networks you didn’t own looking for the virtual equivalent of open doors and windows was still fairly frowned upon — if not grounds to get one into legal trouble. These days, complaining about being scanned is about as useful as griping that the top of your home is viewable via Google Earth. Trying to put devices on the Internet and then hoping that someone or something won’t find them is one of the most futile exercises in security-by-obscurity.
To get a gut check on this, I spoke at length last week with University of Michigan researcher Zakir Durumeric (ZD) and Michael D. Bailey at the University of Illinois at Urbana-Champaign (MB) about their ongoing and very public project to scan all the Internet-facing things. I was curious to get their perspective on how public perception of widespread Internet scanning has changed over the years, and how targeted scanning can actually lead to beneficial results for Internet users as a whole.
MB: Because of the historic bias against scanning and this debate between disclosure and security-by-obscurity, we’ve approached this very carefully. We certainly think that the benefits of publishing this information are huge, and that we’re just scratching the surface of what we can learn from it.
ZD: Yes, there are close to two dozen papers published now based on broad, Internet-wide scanning. The people who are more focused on comprehensive scans tend to be the more serious publications, the ones trying to do statistical or large-scale analyses that are complete, versus just finding devices on the Internet. It’s really been in the last year that we’ve started ramping up and adding scans [to the scans.io site] more frequently.
BK: What are your short- and long-term goals with this project?
ZD: I think long-term we do want to add coverage of additional protocols. A lot of what we’re focused on is different aspects of a protocol. For example, if you’re looking at hosts running the “https://” protocol, there are many different ways you can ask questions depending on what perspective you come from. You see different attributes and behavior. So a lot of what we’ve done has revolved around https, which is of course hot right now within the research community.
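As a rough illustration of one such perspective, asking an HTTPS host for its certificate rather than its page content, here is a minimal Python sketch. The hostname is a placeholder, and this is not the researchers’ own tooling.

```python
# Minimal sketch: connect to an HTTPS host and look at a few certificate
# attributes (issuer, subject, expiry). The hostname is a placeholder.
import socket
import ssl

def fetch_cert(host: str, port: int = 443) -> dict:
    """Return the peer certificate of host:port as a dict."""
    ctx = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=5) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()

cert = fetch_cert("example.com")
print("issuer: ", cert.get("issuer"))
print("subject:", cert.get("subject"))
print("expires:", cert.get("notAfter"))
```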
MB: I’m excited to add other protocols. There are a handful of protocols that are critical to the operation of the Internet, and I’m very interested in understanding the deployment of DNS, BGP, and TLS’s interaction with SMTP. Right now, there’s a pretty long tail to all of these protocols, and so that’s where it starts to get interesting. We’d like to start looking at things like programmable logic controllers (PLCs) and other devices that respond from industrial control systems.
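As one small example of what measuring TLS deployment alongside SMTP can look like in practice, the sketch below checks whether a single mail server advertises STARTTLS. The hostname is a placeholder, this is not the researchers’ methodology, and many networks block outbound port 25.

```python
# Sketch: check whether a mail server advertises STARTTLS on port 25,
# one small piece of how TLS and SMTP interact in the wild.
# The hostname is a placeholder.
import smtplib

def advertises_starttls(host: str) -> bool:
    """Return True if the host's EHLO response offers the STARTTLS extension."""
    with smtplib.SMTP(host, 25, timeout=10) as server:
        server.ehlo()
        return server.has_extn("starttls")

print(advertises_starttls("mail.example.com"))
```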
ZD: One of the things we’re trying to pay more attention to is the world of embedded devices, or this ‘Internet of Things’ phenomenon. As Michael said, there are also industrial protocols, and there are different protocols that these embedded devices are supporting, and I think we’ll continue to add protocols around that class of devices as well because from a security perspective it’s incredibly interesting which devices are popping up on the Internet.
BK: What are some of the things you’ve found in your aggregate scanning results that surprised you?
ZD: I think one thing in the “https://” world that really popped out was we have this very large certificate authority ecosystem, and a lot of the attention is focused on a small number of authorities, but actually there is this very long tail — there are hundreds of certificate authorities that we don’t really think about on a daily basis, but that still have permission to sign for any Web site. That’s something we didn’t necessarily expect. We knew there were a lot, but we didn’t really know what would come up until we looked at those.
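To make that long tail concrete, here is a sketch of how one might tally issuer organizations across a pile of certificates gathered from scans. The ./certs directory is hypothetical, and the third-party pyca/cryptography package is an assumption, not part of the scans.io tooling.

```python
# Sketch: count issuer organizations across a folder of PEM certificates to
# surface the long tail of certificate authorities. Assumes the third-party
# "cryptography" package and a hypothetical ./certs directory of .pem files.
from collections import Counter
from pathlib import Path

from cryptography import x509
from cryptography.x509.oid import NameOID

issuers = Counter()
for pem_path in Path("certs").glob("*.pem"):
    cert = x509.load_pem_x509_certificate(pem_path.read_bytes())
    orgs = cert.issuer.get_attributes_for_oid(NameOID.ORGANIZATION_NAME)
    issuers[orgs[0].value if orgs else "unknown"] += 1

for org, count in issuers.most_common():
    print(f"{count:8d}  {org}")
```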
There also was work we did a couple of years ago on cryptographic keys and how those are shared between devices. In one example, primes were being shared between RSA keys, and because of this we were able to factor a large number of keys, but we really wouldn’t have seen that unless we started to dig into that aspect [their research paper on this is available here].
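For readers curious why a shared prime is fatal, here is a toy illustration with tiny numbers. The published research used an efficient all-pairs GCD computation over millions of collected keys, but the core arithmetic is the same.

```python
# Toy illustration: if two RSA moduli share a prime factor, a plain GCD
# recovers that factor and both private keys can be reconstructed.
# Tiny numbers for readability; real keys use primes of 1024 bits or more.
from math import gcd

p, q1, q2 = 61, 53, 67        # toy primes
n1, n2 = p * q1, p * q2       # two moduli that accidentally share p

shared = gcd(n1, n2)          # recovers p with no factoring effort at all
print(shared, n1 // shared, n2 // shared)   # -> 61 53 67
```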
MB: One of the things we’ve been surprised about is that when we measure these things at scale, in a way that hasn’t been done before, these kinds of emergent behaviors often become clear.
BK: Talk about what you hope to do with all this data.
ZD: We were involved a lot in the analysis of the Heartbleed vulnerability. And the surprising development there wasn’t that lots of people were vulnerable; it was seeing who patched, how, and how quickly. What we found was that by taking the data from these scans and actually sending vulnerability notifications to everybody, we were able to increase patching for the Heartbleed bug by 50 percent. So there was an interesting kind of surprise there: not what you learn from looking at the data, but what actions you take from that analysis. And that’s something we’re incredibly interested in: how we can spur progress within the community to improve security, whether that’s through vulnerability notification or helping with configurations.
BK: How do you know your notifications helped speed up patching?
MB: With the Heartbleed vulnerability, we took the known vulnerable population from scans and ran an A/B test. We split the vulnerable population in half, notified one half while not notifying the other, and then measured the difference in patching rates between the two populations. After a week, we did end up notifying the second population, the other half, as well.
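For the methodologically curious, here is a minimal sketch of that kind of split. The host addresses and the rescan results are placeholders, not data from the study.

```python
# Sketch of the A/B split described above: randomly divide the vulnerable
# hosts, notify one half, and later compare patch rates between the groups.
# All hosts and "rescan" results below are placeholders.
import random

def split_in_half(hosts: list[str], seed: int = 0) -> tuple[list[str], list[str]]:
    """Randomly shuffle hosts and cut the list into two halves."""
    shuffled = hosts[:]
    random.Random(seed).shuffle(shuffled)
    mid = len(shuffled) // 2
    return shuffled[:mid], shuffled[mid:]

def patch_rate(hosts: list[str], patched: set[str]) -> float:
    """Fraction of hosts that show up as patched in a later scan."""
    return sum(h in patched for h in hosts) / len(hosts) if hosts else 0.0

vulnerable = ["192.0.2.1", "192.0.2.2", "198.51.100.7", "203.0.113.5"]
notified, control = split_in_half(vulnerable)
rescanned_patched = {"192.0.2.1", "203.0.113.5"}   # placeholder rescan result
print("notified patch rate:", patch_rate(notified, rescanned_patched))
print("control patch rate: ", patch_rate(control, rescanned_patched))
```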