How good are you at telling the difference between domain names you know and trust and impostor or look-alike domains? The answer may depend on how familiar you are with the nuances of internationalized domain names (IDNs), as well as which browser or Web application you’re using.
For example, how does your browser interpret the following domain? I’ll give you a hint: Despite appearances, it is most certainly not the actual domain for software firm CA Technologies (formerly Computer Associates Intl Inc.), which owns the original ca.com domain name:
Go ahead and click on the link above or cut-and-paste it into a browser address bar. If you’re using Google Chrome, Apple’s Safari, or some recent version of Microsoft’s Internet Explorer or Edge browsers, you should notice that the address converts to “xn--80a7a.com.” This is called “punycode,” and it allows browsers to render domains with non-Latin alphabets like Cyrillic and Ukrainian.
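To see the conversion for yourself without a browser, here is a minimal sketch using Python’s built-in “idna” codec (which implements the older IDNA 2003 rules). The helper names to_ascii and to_unicode are made up for this illustration; the domain is the look-alike discussed above.

```python
# Convert a domain between its Unicode form and its ASCII "xn--" form
# using Python's built-in "idna" codec (IDNA 2003 rules).

def to_ascii(domain: str) -> str:
    """Return the ASCII-compatible (punycode) form of a domain name."""
    return domain.encode("idna").decode("ascii")


def to_unicode(domain: str) -> str:
    """Return the Unicode form of an ASCII-compatible domain name."""
    return domain.encode("ascii").decode("idna")


# Cyrillic "es" (U+0441) followed by Cyrillic "a" (U+0430), then ".com"
lookalike = "\u0441\u0430.com"

print(to_ascii(lookalike))   # xn--80a7a.com
print(to_ascii("ca.com"))    # ca.com (plain ASCII is left unchanged)
print(to_unicode("xn--80a7a.com") == lookalike)  # True
```

Browsers that show “xn--80a7a.com” in the address bar are simply displaying this ASCII form instead of converting it back to Unicode.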
Below is what it looks like in Edge on Windows 10; Google Chrome renders it much the same way. Notice what’s in the address bar (ignore the “fake site” and “Welcome to…” text, which was added as a courtesy by the person who registered this domain):
IE, Edge, Chrome and Safari all will convert https://www.са.com/ into its punycode output (xn--80a7a.com), in part to warn visitors about any confusion over look-alike domains registered in other languages. But if you load that domain in Mozilla Firefox and look at the address bar, you’ll notice there’s no warning of possible danger ahead. It just looks like it’s loading the real ca.com:
The domain “xn--80a7a.com” pictured in the first screenshot above is punycode for the Ukrainian letters for “s” (which is represented by the character “c” in Russian and Ukrainian), as well as an identical Ukrainian “a”.
It was registered by Alex Holden, founder of Milwaukee, Wis.-based Hold Security Inc. Holden’s been experimenting with how the different browsers handle punycodes in the browser and via email. Holden grew up in what was then the Soviet Union and speaks both Russian and Ukrainian, and he’s been playing with Cyrillic letters to spell English words in domain names.
Letters like A and O look exactly the same in both alphabets, and the only difference is their Unicode value. There are more than 136,000 Unicode characters used to represent letters and symbols in 139 modern and historic scripts, so there’s a ton of room for look-alike or malicious/fake domains.
For example, “a” in Latin is the Unicode value “0061” and in Cyrillic is “0430.” To a human, the graphical representation for both looks the same, but for a computer there is a huge difference. Internationalized domain names (IDNs) allow domain names to be registered in non-Latin letters (RFC 3492), provided the domain is all in the same script; trying to mix two different scripts in the same name causes the domain registries to reject the registration attempt.
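The difference between the two “a” characters is easy to show in code. This is a small sketch using Python’s standard unicodedata module; the describe helper is a name invented for this example.

```python
import unicodedata


def describe(text: str) -> list[tuple[str, str]]:
    """Return (code point, official Unicode name) for each character."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]


latin = "ca"               # Latin c, Latin a
cyrillic = "\u0441\u0430"  # Cyrillic es, Cyrillic a

# The strings may render identically, but they are not equal.
print(latin == cyrillic)   # False

for label in (latin, cyrillic):
    for code_point, name in describe(label):
        print(code_point, name)
```

The first pair prints as LATIN SMALL LETTER C and LATIN SMALL LETTER A, the second as CYRILLIC SMALL LETTER ES and CYRILLIC SMALL LETTER A.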
So, in the Cyrillic alphabet (Russian/Ukrainian), we can spell АТТ, УАНОО, ХВОХ, and so on. As you can imagine, the potential for impersonation and abuse is great with IDNs. Here’s a snippet from a larger chart Holden put together showing some of the more common ways that IDNs can be made to look like established, recognizable domains:
Holden also was able to register a valid SSL encryption certificate for https://www.са.com from Comodo.com, which would only add legitimacy to the domain were it to be used in phishing attacks against CA customers by bad guys, for example.
A SOLUTION TO VISUAL CONFUSION
To be clear, the potential threat highlighted by Holden’s experiment is not new. Security researchers have long warned about the use of look-alike domains that abuse special IDN/Unicode characters. Most of the major browser makers have responded in some way by making their browsers warn users about potential punycode look-alikes.
With the exception of Mozilla, whose Firefox is by most accounts the third most-popular Web browser. And I wanted to know why. I’d read the Mozilla Wiki’s “IDN Display Algorithm FAQ,” so I had an idea of what Mozilla was driving at in their decision not to warn Firefox users about punycode domains: Nobody wanted it to look like Mozilla was somehow treating the non-Western world as second-class citizens.
I wondered why Mozilla doesn’t just have Firefox alert users about punycode domains unless the user has already specified that he or she wants a non-English language keyboard installed. So I asked that in some questions I sent to their media team. They sent the following short statement in reply:
“Visual confusion attacks are not new and are difficult to address while still ensuring that we render everyone’s domain name correctly. We have solved almost all IDN spoofing problems by implementing script mixing restrictions, and we also make use of Safe Browsing technology to protect against phishing attacks. While we continue to investigate better ways to protect our users, we ultimately believe domain name registries are in the best position to address this problem because they have all the necessary information to identify these potential spoofing attacks.”
If you’re a Firefox user and would like Firefox to always render IDNs as their punycode equivalent when displayed in the browser address bar, type “about:config” without the quotes into a Firefox address bar. Then in the “search:” box type “punycode,” and you should see one or two options there. The one you want is called “network.IDN_show_punycode.” By default, it is set to “false”; double-clicking that entry should change that setting to “true.”
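If you would rather script that change than click through about:config, Firefox also reads a user.js file from the profile directory at startup and applies any user_pref lines it finds there. The following is only a sketch: the function names are invented for illustration, and you would need to supply the path to your own profile directory.

```python
import json
from pathlib import Path

PREF = "network.IDN_show_punycode"


def pref_line(name: str, value: bool) -> str:
    """Build a user.js line such as: user_pref("some.pref", true);"""
    return f"user_pref({json.dumps(name)}, {json.dumps(value)});"


def enable_punycode_display(profile_dir: Path) -> Path:
    """Append the preference to user.js in the given profile directory.

    profile_dir is an assumption: pass the path of your own Firefox
    profile. The line is only added if it is not already present.
    """
    user_js = profile_dir / "user.js"
    line = pref_line(PREF, True)
    existing = user_js.read_text(encoding="utf-8") if user_js.exists() else ""
    if line not in existing:
        with user_js.open("a", encoding="utf-8") as fh:
            if existing and not existing.endswith("\n"):
                fh.write("\n")
            fh.write(line + "\n")
    return user_js


print(pref_line(PREF, True))
```

Restart Firefox after editing user.js for the setting to take effect.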
Incidentally, anyone using the Tor Browser to anonymize their surfing online is exposed to IDN spoofing because the Tor Browser is built on Firefox. I could definitely see spoofed IDNs being used in targeted phishing attacks aimed at Tor users, many of whom have significant assets tied up in virtual currencies. Fortunately, the same “about:config” instructions work just as well on Tor to display punycode in lieu of IDNs.
Holden said he’s still in the process of testing how various email clients and Web services handle look-alike IDNs. For example, it’s clear that Twitter sees nothing wrong with sending the look-alike CA.com domain in messages to other users without any context or notice. Skype, on the other hand, seems to truncate the IDN link, sending clickers to a non-existent page.
“I’d say that most email services and clients are either vulnerable or not fully protected,” Holden said.
For a look at how phishers or other scammers might use IDNs to abuse your domain name, check out this domain checker that Hold Security developed. Here’s the first page of results for krebsonsecurity.com, which indicate that someone at one point registered krebsoṇsecurity[dot]com (that domain includes a lowercase “n” with a tiny dot below it, a character used by several dozen scripts). The results in yellow are just possible (unregistered) domains based on common look-alike IDN characters.
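To give a rough idea of how such a list of candidates could be produced, here is a hedged sketch. It is not Hold Security’s method, which isn’t described here; the HOMOGLYPHS table is a tiny hand-picked sample rather than a complete confusables list, and the function name is invented for this example.

```python
# A tiny, hand-picked sample of look-alike characters. A real tool
# would draw on a much larger table of confusable characters.
HOMOGLYPHS = {
    "a": ["\u0430"],  # CYRILLIC SMALL LETTER A
    "c": ["\u0441"],  # CYRILLIC SMALL LETTER ES
    "e": ["\u0435"],  # CYRILLIC SMALL LETTER IE
    "o": ["\u043e"],  # CYRILLIC SMALL LETTER O
    "n": ["\u1e47"],  # LATIN SMALL LETTER N WITH DOT BELOW
}


def single_substitutions(label: str) -> list[str]:
    """Return every variant of label with exactly one character swapped."""
    variants = set()
    for i, ch in enumerate(label):
        for sub in HOMOGLYPHS.get(ch, []):
            variants.add(label[:i] + sub + label[i + 1:])
    return sorted(variants)


candidates = single_substitutions("krebsonsecurity")
print(len(candidates))  # 5
for candidate in candidates:
    print(candidate + ".com")
```

Note that swapping a single Cyrillic letter into a Latin name produces a mixed-script label, which registries are supposed to reject; the dotted “n” works because it is still a Latin-script character.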
I wrote this post mainly because I wanted to learn more about the potential phishing and malware threat from look-alike domains, and I hope the information here has been interesting if not also useful. I don’t think this kind of phishing is a terribly pressing threat (especially given that far less complex phishing attacks seem to succeed just fine for now). But it sure can’t hurt Firefox users to change the default “visual confusion” behavior of the browser so that it always displays punycode in the address bar (see the solution mentioned above).
[Author’s note: I am listed as an adviser to Hold Security on the company’s Web site. However this is not a role for which I have been compensated in any way now or in the past.]
you say “non-Latin alphabets like Cyrillic and Ukrainian.” Cyrillic is an alphabet, in which many languages are written, including Russian and Ukrainian. So I would suggest “non-Latin alphabets like Cyrillic, used for Russian and Ukrainian.”
nice article to raise more awareness about this issue.
This is something Farsight has been warning about recently:
While this is a security issue, it seems that companies are not doing much to protect their users from it and some semi-legitimate businesses are taking advantage of it:
This is fixed in FF quantum – I’ve just tried it.
Mozilla may have read this article!
Are you sure it isn’t because you’ve previously disabled punycode yourself? Because I’m still seeing it in the latest stable Firefox release (58.0.2).
Agreed. Still not fixed in FF nightly. Had to change manually.
Definitely fixed in FF Quantum v58.0.2.
I did not activate any option, but if one hovers over the link in Krebs’ article with the mouse cursor, it will show the actual link in the lower left corner of the FF window frame!
Looks like FF sees the difference but will not show it correctly in the address bar. Am I experiencing this correctly?
Brian, thanks for your work! Really enjoy reading on your investigations, on not so enjoyable subjects!
Yes, you see that correctly. FF will display the actual website when you hover over the link, which is better than clicking on the link and then seeing that it is the wrong website.
If you receive an email with a link to your account for that specific website and you hover over it and it’s a different website, you automatically know it’s fake w/o having to click on the link.
In that regard I like FF better than other browsers, unless other ones do the same.
Good suggestion and very useful post. I use IDN checker for these types of headaches.
This is something of a problem – and specifically, it seems, something that the .com domain has got wrong in this instance.
Using a punycode value makes sense in a world where nobody reads other character sets – i.e. somewhere else. But as a general solution, for people with a genuine use for an IDN, it’s not so clever.
It is important for TLDs to apply sensible rules about not mixing lookalike scripts. (There are actually very few lookalike scripts amongst the many which are out there). So your example of spoofing with cyrillic probably works for combinations of the letters a,c,e,m,o,p,u,x,z.. Adding numerals, 1 *may* look enough like l to give apple – although it doesn’t in the font I am writing this.
On the other hand, in a real world where many people’s grasp of latin characters is shaky compared to their own script, expecting anyone to recognise a faked punycode domain is a stretch – whereas people can look for meaningful “names” (one reason why we use domain names instead of IP numbers) in their own script.
I think Firefox has it *mostly* right, and is more helpful than other browsers for a global community.
It seems a proper solution is much harder, and would involve warning about mixed-lookalike-script domains (such as your example), and putting more pressure on registrars to apply sensible rules in a given TLD.
It turns out that we made mistakes in the past – like allowing .com to be considered a default domain for the commercial world, without considering the internet as a global space. So we will find more of these edge-case problems from time to time as we fix up the big ones.
How are domain registrars to ensure one language or another is designated to a particular TLD, unless they remove the option to use dashes in domain names? Should they voluntarily reduce the pool of potential domain names and potential sales income? Should domain registrars be the name police?
Why not put the burden on DNS lookup providers or ISPs to alert users to potential IDN confusion? Or simply pressure browser creators to fix this issue through some standards?
I like the idea you mentioned. I just don’t see it being fiscally practical for registrars to get involved in limiting their income.
There are only a couple of places where this can be addressed:
1. domain issuing layer
3. “safe browsing”
We can’t do it at the DNS transport level, DNS is supposed to be pure pass-through. Any time someone interferes with DNS, we get *very* upset (and for good reason).
We can’t do it at the certificate level. Certificates are issued based on proof of ownership of a domain. If you shouldn’t be able to own a domain, then the Domain registrars shouldn’t be allowed to sell the domain in the first place.
Mozilla is saying that they’d rather it be done at the domain issuing layer, and that for problems, there’s some support at safe browsing.
Safe browsing of course won’t help you if you’re the first victim, but if you’re in the middle of the herd, you’ll probably be protected.
Personally, I wish that punycode was limited to the portion of https://en.wikipedia.org/wiki/Internationalized_country_code_top-level_domain that are non Latin:
.бг (bg, Bulgaria)
.бел (bel, Belarus)
.ею (eyu, European Union)
.қаз (qaz, Kazakhstan)
.мон (mon, Mongolia)
.мкд (mkd, Macedonia)
.рф (rf, Russia)
.срб (srb, Serbia)
.укр (ukr, Ukraine)
الجزائر. (al-Jazā’ir, Algeria)
مصر. (Misr, Egypt)
بھارت. (Bharat, India)
الاردن. (al-Urdun, Jordan)
فلسطين. (Filastin, Palestine)
پاکستان. (Pākistān, Pakistan)
السعودية. (al-Saudiah, Saudi Arabia)
سوريا. (Sūryā, Syria)
تونس. (Tunis, Tunisia)
امارات. (Emarat, UAE)
عمان. (ʻUmān, Oman)
مليسيا. (Maleesya, Malaysia)
المغرب. (al-Maġrib, Morocco)
سودان. (Sūdān, Sudan)
اليمن. (al-Yaman, Yemen)
.বাংলা (Bangla, Bangladesh)
.ভাৰত (Bharôt, India)
.ভারত (Bharôt, India)
.भारत (Bharat, India)
.భారత్ (Bharat, India)
.ભારત (Bharat, India)
.ਭਾਰਤ (Bharat, India)
.ଭାରତ (Bhārata, India)
.இந்தியா (Inthiyaa, India)
.ලංකා (Lanka, Sri Lanka)
.இலங்கை (Ilangai, Sri Lanka)
.ไทย (Thai, Thailand)
.சிங்கப்பூர் (Cinkappūr, Singapore)
.中国 (Zhōngguó, China)
.中國 (Zhōngguó, China)
.香港 (Xiānggǎng/Hoeng1gong2, Hong Kong)
.澳門 (Àomén/Ou3mun4, Macau)
.澳门 (Àomén/Ou3mun4, Macau)
.新加坡 (Xīnjiāpō, Singapore)
.台灣 (Táiwān, Taiwan)
.台湾 (Táiwān, Taiwan)
.հայ (hay, Armenia)
.გე (ge, Georgia)
.ελ (el, Greece)
.한국 (Han-guk, South Korea)
Note, the following are proposed but not active:
.κπ (kp, Cyprus)
.ישראל (Yisrael, Israel)
.日本 (Nippon, Japan)
.ລາວ (Lao, Laos)
While there is a problem with http, it’s not a significant one (already insecure).
For https the obvious answer seems to be what the Browser “palemoon” does; Specifically the green or blue text beside the padlock shows the name that was verified. For extended validation it’s a company name in green. For normal certificates it’s the “xn--yyyy” name without punycode translation in blue. (NB: A language whitelist probably exists, but mine is English)
A simple mouseover, without clicking, is a very good security tool and I wish more people would use it to verify a web site before they click. Perhaps some security software could do this for you.
Good catch. In Chrome, the mouseover displays the punycode in the lower left corner of the screen.
Hovering over link in Safari also shows the real link.
Not really, see how google is changing target URL upon click. Also some clever js can revert it back after click.
Alex Holden did not discover this flaw.
Xudong Zheng posted about this April 14, 2017: https://www.xudongz.com/blog/2017/idn-phishing/
I never said Holden discovered it. In fact, I said very clearly that he didn’t. And I’m afraid it goes way back before 2017, back more than 16 years.
Apologies — my oversight; the verbiage of “Holden’s been experimenting with how the different browsers handle punycodes in the browser and via email.” is a bit ambiguous and I took it as ‘this is a new flaw he found.’
That sentence is not even “a bit ambiguous.” Your reading comprehension simply failed, and you’re wrong to blame Brian even “a bit.”
Wow! That’s scary and I believe it’s an important topic to not ignore. Thanks for sharing.
For sure most technical people like us have JS disabled but, the usual end user doesn’t.
For security reasons like this, modern browsers generally prevent statusbar manipulation with js.
This is not entirely accurate. Browser status bar “spoofing” can still be done with modern browsers!
I tried google’s href update method with the faux ca.com domain against the latest Firefox, Internet Explorer, and Chrome, and easily reproduced the status bar spoof. The status bar does not display the punycode domain on mouse over and clicking on the link is seamless too!
Unfortunately… as already noted, Mozilla/Palemoon are broken by default and do not show you the punycode (which is unfortunate as PaleMoon use too). Contrary to the above, this includes the security lock bar information unless you drill multiple levels deep to view the actual certificate details (who does that regularly?).
Fortunately… you can fix this broken behavior by changing the default config, change network.IDN_show_punycode to TRUE and the xn-- URL will be shown for mouse-over, URL bar, and lock bar. As I chase phishing sites regularly, this is indispensable in quickly identifying fraud.
A phishing url nearly caused the loss of millions of dollars earlier this week with the cryptocurrency exchange Binance.
Be safe out there…
I quit using Firefox over their PC BS. This is more reason to stay away from it.
Thanks Brian! Changed my FF default. So disappointed lately in FF, use it for privacy/better security but their PC stance or need for $’s is slowly eroding their usefulness as a safer alternative browser. I also use Startpage and the latest edition of FF really is unfriendly to all but Google.
Any suggestions for a better, safer, more secure browser for my PC?
For anyone who is more than a novice user, I’d recommend K-Meleon.
kmeleonbrowser dot org
PROS, having used it for more than two decades:
It is based on Mozilla engines, thus has similarities, and has advanced customization.
CONS:
Lack of the extensions available to Mozilla users.
Occasionally, I use FF as a backup for capabilities that I haven’t figured out with K-Meleon.
At times it’s behind on some website compatibility.
Because it’s a more advanced browser, at times I’ve spent lots of time figuring out some of its features that are deeper than the browser interface–or a PRO many times, for capabilities it may perform that other browsers don’t.
So basically Mozilla’s position is to throw English speakers under the bus in the name of global equality.
They could make this a visible option on their options page, with a reasonable default based on locale.
Actually, Mozilla chooses to endanger most of its users. You can use the same trick in reverse spoofing Cyrillic alphabet with Latin or Greek alphabets, etc.
Mozilla’s position is that it’s a complicated problem that should be solved at another level, specifically at the DNS registry level.
For .com, that would be Verisign (the registry for .com).
I don’t think that’s a terrible perspective.
Sadly, Verisign threw us under a bus by allowing punycode into .com.
There are other tlds (including ccTLDs) where it would make more sense for punycode to be present.
Out of curiosity, how confident are you in your personal ties to this domain owner? Just saying it seems pretty confident on your part to post a link to someone specifically toying around with these kind of browser issues, and that code on their end could change at any time. Don’t get me wrong, I’m all for exposure of this issue and Edge’s failures but in this context the fact that you link to a third party on your popular blog honestly worries me. Anyways feel free to delete this as you have another one of my comments asking about your removal of NSA agent’s info but just saying it would be nice to know the editorial biases on something like this. Thanks
For those interested, there is an ongoing bug on Mozilla’s Bugzilla bug tracker:
Comment 12 posted about a year ago specifies:
> It’s a known and accepted issue that our current system does not suppress whole-script homographs in TLDs where the registry refuses to implement proper anti-spoofing controls.
I think the Tor browser should clearly be set to always show punycode for IDN domains, but having looked at that bug report I’m less convinced for Firefox.
Firefox mobile and esr on linux are affected. Mozilla, get your game together!
I have Firefox 58.0.2, and it doesn’t exhibit this issue in the way the article describes. When I hover over the link, and when I copy and paste the address into the location bar, it shows the punycoded version, in spite of the fact that my network.IDN_show_punycode preference is set to false. It’s only when I finally visit the site that the URL in the address bar changes from punycode to rendered Unicode display.
Use the conkeror browser. It correctly warns against fake domains
I had a thought on this: will this allow duplicate-looking usernames within certain sites, YouTube, Instagram, etc.?
Would it be possible to “imposter” legit names?
That’s an even older trick used for almost two decades. In its heyday, Skype, Jabber, other IMs, even discussion board nicknames were faked to solicit information from victims who didn’t recognize a switch.
Yes. Homoglyphs can be used to create look-alike names, bypass posting filters in forums (like curse words) and a number of other things. There are online tools that help you find the substitute characters.
I am using FF 58.0.2 64-bit with network.IDN_show_punycode set to false. When hovering over the link, the bottom of the screen shows the actual link (“xn--80a7a.com”), and when right-clicking and copying the link, then pasting it into the address bar, it shows the actual link. When simply highlighting and doing a ctrl-C/ctrl-V into the address bar, it does not show the actual link address. Doing the copy-paste after turning network.IDN_show_punycode to true shows a blue drop bar with the actual link displayed but does not change the way it is displayed in the address bar. Hitting enter changes it to the actual address (the one displayed in the blue bar).
Fwiw, here’s what LetsEncrypt has to say in their CPS:
… Regarding Internationalized Domain Names, ISRG will have no objection so long as the domain is resolvable via DNS. It is the CA’s position that homoglyph spoofing should be dealt with by registrars, and Web browsers should have sensible policies for when to display the punycode versions of names.
In theory, ICANN has standardized IDN guidelines.
… A registry will publish one or several lists of Unicode code points that are permitted for registration and will not accept the registration of any name containing an unlisted code point. Each such list will indicate the script or language(s) it is intended to support. If registry policy treats any code point in a list as a variant of any other code point, the nature of that variance and the policies attached to it will be clearly articulated.
… All such code point listings will be placed in the IANA Repository for IDN TLD Practices in tabular format together with any rules applied to the registration of names containing those code points, before any such registration may be accepted.
The IDN TLD practices. Note: .com explicitly allows Cyrillic.
# Script: Cyrillic
# Version Number: 1.2
# Effective Date: October 25th, 2014
# Codepoints allowed from the Cyrillic script.
U+0430 # CYRILLIC SMALL LETTER A
U+0431 # CYRILLIC SMALL LETTER BE
U+0432 # CYRILLIC SMALL LETTER VE
U+A697 # CYRILLIC SMALL LETTER SHWE
Mozilla’s “protection” is that you can’t mix different script sets w/in at least a label. But, as noted, certain scripts can produce homographs for some subset of other scripts.
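A rough sketch of why a per-label script-mixing check does not catch this case: the script of each character is guessed here from the first word of its Unicode name, which is a crude stand-in for the real Unicode Script property and is not how any browser implements it. The function names are invented for this illustration.

```python
import unicodedata


def scripts_in_label(label: str) -> set[str]:
    """Guess the scripts used in a label from Unicode character names.

    This is a heuristic: digits and hyphens are treated as
    script-neutral and skipped.
    """
    scripts = set()
    for ch in label:
        if ch.isdigit() or ch == "-":
            continue
        scripts.add(unicodedata.name(ch, "UNKNOWN").split(" ")[0])
    return scripts


def is_mixed_script(label: str) -> bool:
    """True if the label appears to combine more than one script."""
    return len(scripts_in_label(label)) > 1


mixed = "c\u0430"             # Latin c + Cyrillic a
whole_script = "\u0441\u0430"  # Cyrillic es + Cyrillic a

print(is_mixed_script(mixed))         # True: rejected by a mixing rule
print(is_mixed_script(whole_script))  # False: passes, yet looks like "ca"
```

The all-Cyrillic label passes the check because it really is a single script, which is the whole-script homograph problem described above.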
Thank you for writing about IDNs, Brian.
EURid and UNESCO, in cooperation with Verisign and the regional ccTLD registries, have been studying the deployment of IDNs since 2011, through the IDN World Report. So far, only 3% of the world’s domains are IDNs and uptake is hampered by a lack of so-called ‘universal acceptance’ – meaning, in layperson’s terms that IDNs don’t work very well. You gave an example of inconsistent handling of IDN across browsers (with some displaying punycode and others not). Other examples are difficulties in sending and receiving emails using IDNs – quite a drawback – especially where the sender and receiver use different email providers. It’s also impossible to use your IDN email address as a unique identifier/user name to sign in to most of today’s popular Internet applications.
Overall, application developers, browser operators and email providers need to give more urgency to overcoming the current challenges so that IDNs can be used seamlessly in any environment and in any application that a ‘traditional’ domain name can be used.
As you mentioned, homograph and homoglyph attacks are not new, and the risks are reasonably well understood by domain name registries and other parts of the industry. Deception arising from lookalikes can, of course, also occur under the ‘traditional’ ASCII system, where the majority of phishing and spoofing attacks occur. Domain name registries, including EURid the .eu TLD operator, have implemented a variety of measures to combat risks associated with lookalikes. Many do not allow mixed scripts within a domain name, and others, such as EURid, bundle lookalike domains at the point of registration so that identical-looking domain names will not end up in the hands of different actors.
IDNs have the potential to enable and support linguistic diversity online. Our research has shown that the script of IDN is an accurate signal of multilingual content, and that the geographic patterns of IDN registrations closely match the distribution of world languages (with Arabic domains appearing in the Middle East and North Africa, Han script domains in China, Cyrillic script in Russia and former Soviet Union countries etc).
It is right to highlight the potential security risks associated with IDNs but to do so without also emphasising the benefits of IDNs is only to tell part of the story.
Thanks for all your informative posts.
The Cyrillic spoof seems to be happening in redirections from firmark.com, a tax information website.
On Safari, we were presented a new tab saying “the computer is infected” with a phone number to call …
On Chrome, we got an error message citing a URL that starts out with youtube.com.******* with Cyrillic characters …
Oops that was FAIRMARK.com
He didn’t even have the decency to respond. But more importantly, I don’t even think they’ve touched any of the 13 year old coding that I brought to his attention. So what in the hell has Equifax’s IT and web dev team been up to for 13 FLIPPING YEARS?!?!
I found that Internet Explorer 11 does not show the punycode in the status bar for the lookalike http://www.са.com domain when the user enables certain language packs, even when those language packs are not first in order of preference. A malicious actor could potentially target IE users who enable multiple language packs that contain the lookalike characters required by the attacker.
For example, with the “English (United States) [en-US]” language pack first in order of preference, the punycode of the lookalike http://www.са.com domain does not display in IE’s status bar when any of these other (and similar charset) language packs are present:
Russian (Russia) [ru-RU]
Serbian (Cyrillic) [sr-Cyrl]
Ukrainian (Ukraine) [uk-UA]
I did not reproduce this behavior with the latest Firefox Quantum and Google Chrome releases.
Interesting article. It made me think about some points I missed when I registered my domain. Thank you for the insights.