August 18, 2014

Imagine discovering a secret language spoken only online by a knowledgeable and learned few. Over a period of weeks, as you begin to tease out the meaning of this curious tongue and ponder its purpose, the language appears to shift in subtle but fantastic ways, remaking itself daily before your eyes. And just when you are poised to share your findings with the rest of the world, the entire thing vanishes.

This fairly describes my roller coaster experience of curiosity, wonder and disappointment over the past few weeks, as I’ve worked alongside security researchers in an effort to understand how “lorem ipsum,” common placeholder text on countless Web sites, could be transformed into so many apparently geopolitical and startlingly modern phrases when translated from Latin to English using Google Translate. (If you have no idea what “lorem ipsum” is, skip ahead to a brief primer here.)

Admittedly, this blog post would make more sense if readers could fully replicate the results described below using Google Translate. However, as I’ll explain later, something important changed in Google’s translation system late last week that currently makes the examples I’ll describe impossible to reproduce.

CHINA, NATO, SEXY, SEXY

It all started a few months back when I received a note from Lance James, head of cyber intelligence at Deloitte. James pinged me to share something discovered by FireEye researcher Michael Shoukry and another researcher who wished to be identified only as “Kraeh3n.” They noticed a bizarre pattern in Google Translate: When one typed “lorem ipsum” into Google Translate, the default results (with the system auto-detecting Latin as the language) returned a single word: “China.”

Capitalizing the first letter of each word changed the output to “NATO” — the acronym for the North Atlantic Treaty Organization. Reversing the words in both lower- and uppercase produced “The Internet” and “The Company” (the “Company” with a capital “C” has long been a code word for the U.S. Central Intelligence Agency). Repeating and rearranging the word pair with a mix of capitalization generated even stranger results. For example, “lorem ipsum ipsum ipsum Lorem” generated the phrase “China is very very sexy.”

Until very recently, the words on the left were transformed to the words on the right using Google Translate.


Kraeh3n said she discovered the strange behavior while proofreading a document for a colleague, a document that had the standard lorem ipsum placeholder text. When she began typing “l-o-r..e..” and saw “China” as the result, she knew something was strange.

“I saw words like Internet, China, government, police, and freedom and was curious as to how this was happening,” Kraeh3n said. “I immediately contacted Michael Shoukry and we began looking into it further.”

And so the duo started testing the limits of these two words using a mix of capitalization and repetition. Below is just one of many pages of screenshots taken from their results:

[Screenshot: Google Translate results for capitalization and repetition variants of “lorem ipsum”]

The researchers wondered: What was going on here? Has someone outside of Google figured out how to map certain words to different meanings in Google Translate? Was it a secret or covert communications channel? Perhaps a form of communication meant to bypass the censorship erected by the Chinese government with the Great Firewall of China? Or was this all just some coincidental glitch in the Matrix?

For his part, Shoukry checked in with contacts in the U.S. intelligence industry, quietly inquiring if divulging his findings might in any way jeopardize important secrets. Weeks went by and his sources heard no objection. One thing was for sure: the results were subtly changing from day to day, and it wasn’t clear how long these two common but obscure words would continue to produce the same results.

“While Google translate may be incorrect in the translations of these words, it’s puzzling why these words would be translated to things such as ‘China,’ ‘NATO,’ and ‘The Free Internet,'” Shoukry said. “Could this be a glitch? Is this intentional? Is this a way for people to communicate? What is it?”

When I met Shoukry at the Black Hat security convention in Las Vegas earlier this month, he’d already alerted Google to his findings. Clearly, it was time for some intense testing, and the clock was already ticking: I was convinced (and unfortunately, correct) that much of it would disappear at any moment.

A BRIEF HISTORY OF LOREM IPSUM

Cicero.


Search the Internet for the phrase “lorem ipsum,” and the results reveal why this strange phrase has such a core connection to the lexicon of the Web. Its origins in modernity are murky, but according to multiple sites that have attempted to chronicle the history of this word pair, “lorem ipsum” was taken from a scrambled and altered section of “De finibus bonorum et malorum,” (translated: “Of Good and Evil,”) a 1st-Century B.C. Latin text by the great orator Cicero.

According to Cecil Adams, curator of the Internet trivia site The Straight Dope, the text from that Cicero work was available for many years on adhesive sheets in different sizes and typefaces from a company called Letraset.

“In pre-desktop-publishing days, a designer would cut the stuff out with an X-acto knife and stick it on the page,” Adams wrote. “When computers came along, Aldus included lorem ipsum in its PageMaker publishing software, and you now see it wherever designers are at work, including all over the Web.”

This pair of words is so common that many Web content management systems deploy it as default text. Case in point: Lorem Ipsum even shows up on healthcare.gov. According to a story published Aug. 15 in the Daily Mail, more than a dozen apparently dormant healthcare.gov pages carry the dummy text. (Click here if you skipped ahead to this section).

[Screenshot: lorem ipsum placeholder text on a healthcare.gov page]

FURTHER TESTING

Things began to get even more interesting when the researchers started adding other words from the Cicero text from which the “lorem ipsum” bit was taken, including: “Neque porro quisquam est qui dolorem ipsum quia dolor sit amet, consectetur, adipisci velit . . .”  (“There is no one who loves pain itself, who seeks after it and wants to have it, simply because it is pain …”).

Adding “dolor” and “sit” and “consectetur,” for example, produced even more bizarre results. Translating “consectetur Sit Sit Dolor” from Latin to English produces “Russia May Be Suffering.” “sit sit dolor dolor” translates to “He is a smart consumer.” An example of these sample translations is below:

[Screenshot: sample Latin-to-English translations of combinations of “dolor,” “sit” and “consectetur”]

Latin is often dismissed as a “dead” language, and whether or not that is fair or true, it seems pretty clear that there should not be Latin words for “cell phone,” “Internet” and other mainstays of modern life in the 21st Century. However, this incongruity helps to shed light on one possible explanation for such odd translations: Google Translate simply doesn’t have enough Latin texts available to have thoroughly learned the language.

In an introductory video titled Inside Google Translate, Google explains how the translation engine works, the sources of the engine’s intelligence, and its limitations. According to Google, its Translate service works “by analyzing millions and millions of documents that have already been translated by human translators.” The video continues:

“These translated texts come from books, organizations like the United Nations, and Web sites from all around the world. Our computers scan these texts looking for statistically significant patterns. That is to say, patterns between the translation and the original text that are unlikely to occur by chance. Once the computer finds a pattern, you can use this pattern to translate similar texts in the future. When you repeat this process billions of times, you end up with billions of patterns, and one very smart computer program.”

Here’s the rub:

“For some languages, however, we have fewer translated documents available, and therefore fewer patterns that our software has detected. This is why our translation quality will vary by language and language pair.”

Still, this doesn’t quite explain why Google Translate would include so many references specific to China, the Internet, telecommunications, companies, departments and other odd couplings in translating Latin to English.

In any case, we may never know the real explanation. Just before midnight, Aug. 16, Google Translate abruptly stopped translating the word “lorem” into anything but “lorem” from Latin to English. Google Translate still produces amusing and peculiar results when translating Latin to English in general.

A spokesman for Google said the change was made to fix a bug with the Translate algorithm (aligning ‘lorem ipsum’ Latin boilerplate with unrelated English text) rather than a security vulnerability.
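To make that explanation concrete, here is a toy sketch in Python of how a purely statistical system could end up pairing “lorem” with “China.” It is an illustration only, not Google’s algorithm: the sentence pairs are invented, and the scoring rule (a Dice coefficient over word co-occurrence counts) and the names `train` and `translate_word` are assumptions made for this example. The idea is that if pages full of lorem ipsum boilerplate get treated as translations of unrelated English pages, the placeholder words inherit whatever English words sit opposite them most often.

```python
from collections import Counter, defaultdict

# Hypothetical "parallel" data: (Latin page text, English page text).
# The last three pairs mimic placeholder text wrongly aligned with
# unrelated English content.
PAIRS = [
    ("canis dormit", "the dog sleeps"),
    ("canis currit", "the dog runs"),
    ("feles dormit", "the cat sleeps"),
    ("lorem ipsum", "china news"),
    ("lorem ipsum", "china trade"),
    ("lorem ipsum", "nato summit"),
]


def train(pairs):
    """Count, per document pair, which source and target words co-occur."""
    src_count = Counter()
    tgt_count = Counter()
    cooc = defaultdict(Counter)
    for src, tgt in pairs:
        src_words = set(src.lower().split())
        tgt_words = set(tgt.lower().split())
        src_count.update(src_words)
        tgt_count.update(tgt_words)
        for s in src_words:
            cooc[s].update(tgt_words)
    return src_count, tgt_count, cooc


def translate_word(model, word):
    """Return the target word with the highest Dice score for `word`.

    Words never seen in training are passed through unchanged.
    """
    src_count, tgt_count, cooc = model
    w = word.lower()
    if w not in cooc:
        return word

    def dice(t):
        return 2 * cooc[w][t] / (src_count[w] + tgt_count[t])

    # Sorting first makes ties resolve alphabetically, so results are stable.
    return max(sorted(cooc[w]), key=dice)


model = train(PAIRS)
print(translate_word(model, "canis"))  # dog
print(translate_word(model, "lorem"))  # china

# Retrain without the misaligned placeholder pairs.
clean_model = train([p for p in PAIRS if "lorem" not in p[0]])
print(translate_word(clean_model, "lorem"))  # lorem
```

With the misaligned pairs removed, the toy model passes “lorem” through untouched, which is at least consistent with what Google Translate began doing on Aug. 16.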

Kraeh3n said she’s convinced that the lorem ipsum phenomenon is not an accident or chance occurrence.

“Translate [is] designed to be able to evolve and to learn from crowd-sourced input to reflect adaptations in language use over time,” Kraeh3n said. “Someone out there learned to game that ability and use an obscure piece of text no one in their right mind would ever type in to create totally random alternate meanings that could, potentially, be used to transmit messages covertly.”

Meanwhile, Shoukry says he plans to continue his testing for new language patterns that may be hidden in Google Translate.

“The cleverness of hiding something in plain sight has been around for many years,” he said. “However, this is exceptionally brilliant because these templates are so widely used that people are desensitized to them, and because this text is so widely distributed that no one bothers to question why, how and where it might have come from.”


134 thoughts on “Lorem Ipsum: Of Good & Evil, Google & China”

  1. John the Lawyer

    OK, now they are screwing with us. Semper ubi sub ubi is translated as “Always wear underware.”

    1. Gman

      That makes more sense because the translation is in fact: “Always where under where”. Click to see alternate translations.

  2. Ipchains

    This should have been saved for April Fools, bk. Google uses a scraping engine to feed the machine translation database and what do you think many websites use as placeholders on internationalized pages until translation text is ready? This leads to a constantly changing, constantly wrong translation since it is never used as a literal. What topics are popular internationally? China, NATO, business, tech, sex, etc. Pretty plain, if amusing side effect of semaphores on the hive mind that is the web. Or… Is that what someone would say who wanted to cover something up?

  3. Syats

    I will just leave here a quote from the excellent Library of Babel, by J.L. Borges (1941).

    It is useless to observe that the best volume of the many hexagons under my administration is entitled The Combed Thunderclap and another The Plaster Cramp and another Axaxaxas mlö. These phrases, at first glance incoherent, can no doubt be justified in a cryptographical or allegorical manner; such a justification is verbal and, ex hypothesi, already figures in the Library. I cannot combine some characters

    dhcmrlchtdj

    which the divine Library has not foreseen and which in one of its secret tongues do not contain a terrible meaning. No one can articulate a syllable which is not filled with tenderness and fear, which is not, in one of these languages, the powerful name of a god. To speak is to fall into tautology. This wordy and useless epistle already exists in one of the thirty volumes of the five shelves of one of the innumerable hexagons — and its refutation as well.

  4. mister shirts

    Many years ago, I was one of a small number of people asked to provide training input to a natural language speech recognition system so that templates could be inferred for a set of words. So for a couple of hours I consistently and repeatedly provided totally unrelated spoken words (‘New York’ for ‘No’, for example) in response to the words provided on the screen… I’m happy to report that the system recognized speech rather well despite my input of noise data!

  5. Pnut

    Hi. I came across this myself a short time ago.

    I have a column of the translated text before the changes, and then again after the changes. Interesting to see the differences. What do you make of it?

    Original Text:
    Lorem ipsum dolor sit amet consectetuer velit pretium euismod ipsum enim. Mi cursus at a mollis senectus id arcu gravida quis urna. Sed et felis id tempus Morbi mauris tincidunt enim In mauris. Pede eu risus velit libero natoque enim lorem adipiscing ipsum consequat. In malesuada et sociis tincidunt tempus pellentesque cursus convallis ipsum Suspendisse. Risus In ac quis ut Nunc convallis laoreet ante Suspendisse Nam. Amet amet urna condimentum Vestibulum sem at Curabitur lorem et cursus. Sodales tortor fermentum leo dui habitant Nunc Sed Vestibulum. Ut lorem In penatibus libero id ipsum sagittis nec elit Sed. Condimentum eget Vivamus vel consectetuer lorem molestie turpis amet tellus id. Condimentum vel ridiculus Fusce sed pede Nam nunc sodales eros tempor. Sit lacus magna dictumst Curabitur fringilla auctor id vitae wisi facilisi. Fermentum eget turpis felis velit leo Nunc Proin orci molestie Praesent. Curabitur tellus scelerisque suscipit ut sem amet cursus mi Morbi

    Original Translation:
    This page is designed to explore here’s the price for the mere fact he wishes to. Mi a race but a soft old age this bow designers who can see for yourself. For In a funny way to create a developers heart disease, however, and lucky this time. Pede good laughter for the Internet to achieve financial freedom, you will feel a computer game development. In the very-scale and allies of the time to stop and the kids race to grow strong massage. In fact, before the massage Laughter In the selling and anyone when Now to grow strong. Currently a lot of pressure to push the Internet and look at the game has a lot to improve the course. Members of a lion feel good Now But It’s a warm-up Internet connection. In order that the launch of the arrows nor the meals, however, that this very thing the home of the free. A lot of us live, the development of the Internet or this is the layer of soil employee base sauce. Sauce For now members of the United States or a ridiculous How to Choose the moment, but foot to foot. He is a hollow, it’s getting a great word Curabitur the author of this life wisi easy. The yeast is widely used now a hotel employee a lion It’s the American dream will of the players. It takes a lot of the main problems of urban land to buy my cards

    After August 16th Changes:

    Lorem ipsum dolor sit amet consectetuer to come up is willing to here’s the price for the mere fact. Mi a race but a soft old age this bow pregnant anyone to push. For marketers to create a nice and lucky this time, however, In the place to start. Pede good laughter skirmish-free, you will feel in fact lorem itself according to reason. During the interview itself, and his companions in time to stop and the kids race to grow strong massage. In fact, before the bills to grow strong and anyone when Now sign up in the laughter. Currently a lot of push to improve the speed always at the gas pressure lorem and courses. But now I feel good in and dwell Members of a lion warm-temperature gas. In order to lorem-free in the home of his arrows or the certification, but that this very thing. Diet needs a lot of us live, or a lot of lorem employee base region this. Diet For now members of the United States or a ridiculous Clinical long-but foot to foot. He is a hollow magna word of life’s why I’m the author of this it’s easy. Warm-up lion, Now Consequently the development of implementation detail that the United States is willing to employee erectile dysfunction. It takes a lot of pressure to buy land that’s running my soccer

    Curious!

    1. Candy Man (not in the creepy way)

      That is quite interesting. It seems that user “Ipchains” ‘s comment explains your top paragraph too, all of the words and phrases are quite commonly found throughout the internet as they are hot topics… still very interesting. hmmm…

  6. Ghenghis McCaqnn

    Isn’t it strange how the translations of long pieces in Latin look like the sort of apparently random text you get in junk emails trying to avoid spam filters? Maybe the scammers found this “feature” in Google Translate a long time ago.

  7. ecce caesar

    You guys all have it wrong. This is obviously a mass conspiracy perpetuated by Latin teachers worldwide so that students can’t cheat on their translation homework.

    1. Hayton

      The “lorem ipsum” quirk in Google Translate has indeed been noted before. See the article “Google Translate’s “Lorem Ipsum” Easter Egg” (from 2010) at http://www.xefer.com/2010/10/lorem-ipsum

      Things must be quiet in KrebsWorld, or else Brian has been reading too much Dan Brown on his vacation.

      fwiw, Google Translate is currently translating
      “Lorem ipsum dolor sit amet, consectetur adipiscing elit. Vivamus magna. Cras in mi at felis aliquet congue. Ut a est eget ligula molestie gravida. Curabitur massa. Donec eleifend, libero at sagittis mollis, tellus est malesuada tellus, at luctus turpis elit sit amet quam. Vivamus pretium ornare est” as

      “Lorem ipsum dolor sit amet pretty easy. Bikes are great. Monitoring gas emissions but in my hotel. As a pregnant employee is you need here. Urban mass. Unfortunately, popular culture, with a free but sagittis soft, overnight is the Vikings Earth, at least of mourning, the Lakers how to be successful. The price of computer speakers”.

      Lakers? Vikings? computer speakers? Not a word about NATO or China, just general weirdness. I too would not rely on Google Translate to help me with my Latin homework …

    2. hayton

      I replied to this post with a link to a 2010 blog about Lorem Ipsum. The reply was there, then it vanished. What happened?

      The blog was entitled “Google Translate’s “Lorem Ipsum” Easter Egg”. It noted several quirks in translating ‘lorem ipsum’ placeholder text

      1. hayton

        Oh well, the original reply has come back. Now says “awaiting moderation”. A quirk of WordPress perhaps.

  8. Hibi

    Actually, you can pull up a similar result by translating China to Latin; it will pull up lorem ipsum dolor.

  9. Ramesh

    I can still replicate the results at translate.google.co.in – might be true for other country domains too. More results :

    lorem ipsum ipsum lorem = Free the Internet
    lorem ipsum ipsum lorem lorem = China is the winner of the Internet
    dolor amet sit dolor ipsum lorem = Pain is a very important vehicle

    and

    lorem ipsum dolor sit amet lorem ipsum dolor sit amet = system design, system design, it is pain is pain

  10. Parkperch

    Bored teenagers have been gaming Google Translate for years. Once you figure out how to trick it into believing that two pages are translations of each other, it will very gladly incorporate any “translation” you tell it about. It’s statistically weighted, but it can be gamed just like any webcrawler.

  11. Jason

    This has been that way for at least 6 years. I used to type piña and it would come out “grenade” instead of pineapple. When I would type “estamos en contactos” it would come out “we are in contact with the enemy” instead of “we are in touch.”

  12. Hope

    Just this evening I read about your discovery. I began trying out Google Translate. Lorem Ipsum from Latin to English does nothing any more. But the two “m”s at the end caught my attention, so I played around. Take away the “m”s and you get Thank you. Add the “m”s to the beginning so it reads “mm lore ipsu” brings up “mm NATO”. The alternate translation for NATO? China… very strange

  13. Crystal

    Google is still throwing up the same results for lorem ipsum

  14. pablo

    I just recently went to the translator and put in Ipsum ipsum ipsum dore; the translation of these words was “The game itself may precipitate.”

  15. Noah

    “China” and “NATO” appear to be lorem ipsum starting arguments. If you use search terms from the article, you’ll see that you don’t get the same results (consistently). However, by prepending “China” or “NATO” onto the term(s) it comes back with more lorem ipsum terms. Interestingly, many terms come back with the same result whether you use China or NATO, but not all. For example, using the following terms [auto-detect to latin]:

    China main focus of China
    NATO main focus of China
    China main focus of NATO
    NATO main focus of NATO

    The first three (3) translate to: Lorem ipsum dolor sit amet, consectetur

    Whereas the last one translates to: Lorem Ipsum Lorem ipsum dolor sit amet

  16. JCitizen

    To answer the post by “The Investigator”, which has not appeared yet, Google translate gives this answer to your NATO post:

    “Lorem ipsum dolor sit amet, consectetur adipisicing pain, but because occasionally circumstances occur in which toil and pain can procure him some great pleasure. To take a trivial example, which of us ever undertakes laborious physical exercise, except to obtain some advantage from it. But who has the charms of pleasure by desire, that avoids a pain that produces no resultant pleasure. These cases are they can not foresee, are upon fault quae workshop to leave gently breath this is your troubles.”

    Don’t worry about your confusion, I don’t think these internet phenomena are meant to be understood by anyone in particular.

      1. JCitizen

        A quote from your Wiki link pretty much summarizes my attitude about this modern phenomenon:

        “Nowadays a variety of software, including text editors and plug-ins, can generate semi-random “lorem-like text”, which often has little or nothing in common with the canonical adaptations other than looking like (and often being) jumbled Latin.”

  17. Yellow

    Comment links from main page seem to go to the wrong article.

    Checkout “John Safran vs God – Bible Code”. He runs Milli Vanilli lyrics through software to post-predict 9/11 attacks.

  18. anonymos

    ipsum ipsum lorem lOrEm LoRM produces lorem lorem form of the game

    Maybe it’s still possible to somehow “confuse” the translator…

  19. Geraldine

    No matter if some one searches for his required thing, so he/she needs to be available that in detail, therefore
    that thing is maintained over here.

    Review my homepage website – Geraldine,

  20. mister shirts

    The post from ‘Geraldine’ on Sept 2, 2014 at 12:58 am is actually from a blog spam-bot.

  21. David Williams

    This looks to me like someone gamed the system and not a conspiracy theory. There just doesn’t seem to be enough sentences to convey any real meaning covertly.

    Still a pretty cool phenomenon.

  22. .U_1Ka_ldVe4

    whoah this blog is fantastic i really like studying your posts.
    Stay up the good work! You recognize, a lot of
    persons are hunting round for this info, you can aid them greatly.

  23. credit card generator 2014 online

    I don’t know if it’s just me or if everyone else experiencing problems with your
    site. It looks like some of the written text on your posts
    are running off the screen. Can somebody else please provide feedback and let me
    know if this is happening to them as well? This could be a issue
    with my browser because I’ve had this happen before. Thanks

  24. escher7

    Lorem ipsum is a pseudo-Latin text used in web design, typography, layout, and printing in place of English to emphasise design elements over content. It’s also called placeholder (or filler) text. It’s a convenient tool for mock-ups. It helps to outline the visual elements of a document or presentation, e.g. typography, font, or layout. Lorem ipsum is mostly a part of a Latin text by the classical author and philosopher Cicero. Its words and letters have been changed by addition or removal, so as to deliberately render its content nonsensical; it’s not genuine, correct, or comprehensible Latin anymore. While lorem ipsum still resembles classical Latin, it actually has no meaning whatsoever. As Cicero’s text doesn’t contain the letters K, W, or Z, alien to Latin, these and others are often inserted randomly to mimic the typographic appearance of European languages, as are digraphs not to be found in the original.
