Posts Tagged: Unicode


8
Mar 18

Look-Alike Domains and Visual Confusion

How good are you at telling the difference between domain names you know and trust and impostor or look-alike domains? The answer may depend on how familiar you are with the nuances of internationalized domain names (IDNs), as well as which browser or Web application you’re using.

For example, how does your browser interpret the following domain? I’ll give you a hint: Despite appearances, it is most certainly not the actual domain for software firm CA Technologies (formerly Computer Associates Intl Inc.), which owns the original ca.com domain name:

https://www.са.com/

Go ahead and click on the link above or cut-and-paste it into a browser address bar. If you’re using Google Chrome, Apple’s Safari, or some recent version of Microsoft‘s Internet Explorer or Edge browsers, you should notice that the address converts to “xn--80a7a.com.” This is called “punycode,” and it allows browsers to render domains with non-Latin alphabets like Cyrillic and Ukrainian.

Below is what it looks like in Edge on Windows 10; Google Chrome renders it much the same way. Notice what’s in the address bar (ignore the “fake site” and “Welcome to…” text, which was added as a courtesy by the person who registered this domain):

The domain https://www.са.com/ as rendered by Microsoft Edge on Windows 10. The rest of the text in the image (beginning with “Welcome to a site…”) was added by the person who registered this test domain, not the browser.

IE, Edge, Chrome and Safari all will convert https://www.са.com/ into its punycode output (xn--80a7a.com), in part to warn visitors about any confusion over look-alike domains registered in other languages. But if you load that domain in Mozilla Firefox and look at the address bar, you’ll notice there’s no warning of possible danger ahead. It just looks like it’s loading the real ca.com:

What the fake ca.com domain looks like when loaded in Mozilla Firefox. A browser certificate ordered from Comodo allows it to include the green lock (https://) in the address bar, adding legitimacy to the look-alike domain. The rest of the text in the image (beginning with “Welcome to a site…”) was added by the person who registered this test domain, not the browser. Click to enlarge.

The domain “xn--80a7a.com” pictured in the first screenshot above is punycode for the Ukrainian letters for “s” (which is represented by the character “c” in Russian and Ukrainian), as well as an identical Ukrainian “a”.

It was registered by Alex Holden, founder of Milwaukee, Wis.-based Hold Security Inc. Holden’s been experimenting with how the different browsers handle punycodes in the browser and via email. Holden grew up in what was then the Soviet Union and speaks both Russian and Ukrainian, and he’s been playing with Cyrillic letters to spell English words in domain names.

Letters like A and O look exactly the same and the only difference is their Unicode value. There are more than 136,000 Unicode characters used to represent letters and symbols in 139 modern and historic scripts, so there’s a ton of room for look-alike or malicious/fake domains.

For example, “a” in Latin is the Unicode value “0061” and in Cyrillic is “0430.”  To a human, the graphical representation for both looks the same, but for a computer there is a huge difference. Internationalized domain names (IDNs) allow domain names to be registered in non-Latin letters (RFC 3492), provided the domain is all in the same language; trying to mix two different IDNs in the same name causes the domain registries to reject the registration attempt.

So, in the Cyrillic alphabet (Russian/Ukrainian), we can spell АТТ, УАНОО, ХВОХ, and so on. As you can imagine, the potential opportunity for impersonation and abuse are great with IDNs. Here’s a snippet from a larger chart Holden put together showing some of the more common ways that IDNs can be made to look like established, recognizable domains:

Image: Hold Security.

Holden also was able to register a valid SSL encryption certificate for https://www.са.com from Comodo.com, which would only add legitimacy to the domain were it to be used in phishing attacks against CA customers by bad guys, for example. Continue reading →