19
Sep 11

Cultural CAPTCHAs


CAPTCHAs, those squiggly and frustrating puzzles that many Web sites require users to solve before registering or leaving comments, are designed to block automated activity and deter spammers. But for some Russian-language forums that cater to spammers and other miscreants, CAPTCHAs may also be part of a vetting process designed to frustrate foreigners and outsiders.

I'm still slogging through Disc 2 of this lengthy Soviet-era spy series.

“Verified,” one of the longest-running Russian-language forums dedicated to Internet scammers of all stripes, uses various methods to check that users aren’t just casual lurkers or law enforcement. It recently began using CAPTCHAs that quiz users about random bits of Russian culture when they register or log in.

Consider this CAPTCHA, from Verified: “Введите пропущенное число ‘… мгнoвeний вeсны.’” That translates to, “Enter the missing number ‘__ moments of spring.’”

But it may not be so simple to decipher “мгнoвeний вeсны,” the “moments of spring” bit. One use of cultural CAPTCHAs is to frustrate non-native speakers who are trying to browse forums using tools like Google Translate. For example, Google translates мгнoвeний вeсны to the transliteration “mgnoveny vesny.” The answer to this CAPTCHA is “17,” as in Seventeen Moments of Spring, a 1973 Russian television mini-series that was enormously popular during the Soviet era, but which is probably unknown to most Westerners.
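To make the mechanics concrete, here is a minimal Python sketch of how a forum might check answers to a question like this one. It is purely illustrative: I have no visibility into Verified's actual code, and the names `PROMPT`, `CHALLENGES`, `normalize` and `check_answer` are hypothetical. The sketch assumes the site stores each question alongside a small set of accepted answers and compares them after trimming whitespace and ignoring letter case.

```python
import unicodedata

# Hypothetical question text and accepted answers (assumed, not taken from the forum).
PROMPT = "Введите пропущенное число: '… мгновений весны'"
CHALLENGES = {
    PROMPT: {"17", "семнадцать"},
}


def normalize(answer: str) -> str:
    """Unify Unicode forms, trim surrounding whitespace, and ignore letter case."""
    return unicodedata.normalize("NFKC", answer).strip().casefold()


def check_answer(prompt: str, answer: str) -> bool:
    """Return True if the normalized answer matches an accepted answer for the prompt."""
    accepted = CHALLENGES.get(prompt, set())
    return normalize(answer) in {normalize(a) for a in accepted}
```

Accepting both the digits and the spelled-out word is a design choice in this sketch; a stricter site could accept only one form. Either way, the check itself is trivial, and the barrier lies entirely in knowing the answer.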

This CAPTCHA asks about the alcohol content of Vodka.

Although these cultural CAPTCHAs may not stop those determined to break them, they are an interesting approach to blocking unwanted users. Most CAPTCHA systems can be trivially broken because they merely require users to repeat numbers and letters. Some CAPTCHAs ask the visitor to solve math or logic puzzles, but these questions can be answered by anyone with a grade school grasp of math.

Spammers tend to rely on commercial, human-powered CAPTCHA solving services, which automate the solving of CAPTCHAs with the help of low-paid workers in China, India and Eastern Europe who earn pennies per hour deciphering the puzzles. CAPTCHAs that bombard workers at these automated facilities with a range of cultural questions might frustrate these low-paid workers, but the challenges likely would be more frustrating (not to mention alienating and offensive) to legitimate users who are unfamiliar with the targeted culture.

In many ways, cultural CAPTCHAs seem to be uniquely suited for small, homogeneous and restricted online communities. I would not be surprised to see their use, variety and complexity increase throughout the criminal underground, which is constantly trying to combat the leakage of forum data that results when authorized members have their passwords lost or stolen.


32 comments

  1. It’s an interesting idea but IMHO a doomed one. One reason is that you can only come up with a very limited number of puzzles. For example, the software Light Alloy is free for Russian users and it verifies that you are Russian by giving you a child riddle. Now it has far too many riddles on its list with a large part not being well-known enough even among Russians. So many people wouldn’t be able to register if they play by the rules.

    But there is a second flaw here. Even without knowing what “мгновений весны” means you could type it into Google and you would immediately know that the missing word is “семнадцать”. You could then either use an auto-translator or switch to the English Wikipedia page and you would know that you need to type in 17. In fact, that’s how I usually “solve” the riddles for Light Alloy.

    • If I search on Google for “Russian moments of spring”, the number “Seventeen” is clearly evident in the top results. While a Jeopardy-winning IBM supercomputer could play fairly well at this game, the impact of cultural CAPTCHAs highlights the need for building trusted working relationships with law enforcement or at least expatriates of any region where authorities want to target cyber-crime.

      In many crime investigations, authorities use “under cover” agents to gain the trust needed to get intelligence on criminal activity. It shouldn’t be different online.

  2. Thanks Brian for another great educational article. I have even gone to recaptcha and spent a few minutes, here and there, filling in the field for no other purpose than to help with book conversions to digital media.
    [ http://www.google.com/recaptcha/learnmore, if anyone is interested, instead of playing AngryBirds or what ever, while on hold for tech support. ;-) ]
    Your experience with the Russian language and culture certainly adds a dimension that most security writers do not have to share. Thanks for all you do to keep us safe.

  3. I saw the same idea used on a Christian forum, where the CAPTCHA involved supplying the missing word from a scripture verse.

  4. This is a well known thing but in case anyone doesn’t know yet:

    If you read the link supplied by Lindy above regarding reCaptcha you’ll note that only one word is required to be correct, as reCaptcha already has that word on record and uses it to check your answer and so fill the role of “determining that you are human” (with reliable automated solving going for $1.39 / 1000 is there even a point … ?)

    Anyhow, what this means for those just trying to legitimately solve the captcha is that one word can be anything at all (this is Google using your free labour to digitise works and make a profit) and one word must be correct (a word that has already been solved to some level that makes reCaptcha think it’s correct).

    So you can enter anything for one word such as “a” and solve the other word. The appearance of the two words is changed quite often but the legitimate word is always clearly different from the unknown word – ATM the word that must be solved is a mix of filled black letters and black outlines of letters. Also the known word will never contain punctuation, non alphabetical chars or numbers. It’s pretty easy to figure out which is which after you’ve solved one.

    People may be aware of at least 2 campaigns to constantly use erm … non-politically correct words to solve the unknown word which means reCaptcha’s project to use your free labour to profit may be somewhat scuppered (doubtful but we can hope).

    Frankly I’m a tad bewildered by Lindy’s enthusiasm for the project even to the point of filling in captchas just for the hell of it (???) but I see it as making my life harder to make someone else, Google – not a struggling company working for good – a profit by making me do a small amount of work for them for free.

    So I take the easier route of putting nonsense for one of the words – hope this makes sense to everyone.

    • Neej, good point.

      I noticed this after lurking some NSFW image boards and their “raids” setup against recaptcha.

      After you’ve gone through a number of these recaptchas, it’s obvious which word is the one that needs to be correct, and which word doesn’t need to be correct.

      For the most part I try to type both the words correctly, maintaining the goal of Google by digitizing these words, but every once in a while, there is a word with strange figures (for example, a math problem) for which I type random junk.

      I’m not sure if this is true – but I believe Google has an idea on how long the word that doesn’t have to be typed is. So for example, if the word that doesn’t have to be correct was ‘cellphone’ , if I was to type ‘sd’ , it would say one of the words is incorrect, but if I was to type cellbooks, it would go through thinking both the words are correct.

      Not sure if I made that clear, or if it’s true, but it’s something I noticed from doing so many of these annoying captchas.

      Firefox needs an add-on for a de-captcha service that auto-fills these annoying captchas.

      Who knows, maybe Wladimir Palant can make one for us to combat these ‘Cultural CAPTCHAs’ :D

  5. I don’t believe this is a new idea at all.

    Wasn’t this idea applied by the GI’s defending against German infiltrators behind their lines during the Battle of the Bulge? I recall reading that suspected spies were asked cultural background questions like ” Who are ‘dem Bums’ ? ” and others.

    Even the most fluent English-speaking Germans got tripped up this way.

    • And in the jungles of the South Pacific GI sentries supposedly used passwords like “lollapalooza” that Japanese scouts couldn’t pronounce

    • A baseball riddle like that plays a role in the classic movie Stalag 17 (with William Holden) about POWs in WWII with an informer in their midst.

  6. I despise CAPTCHA more than Adobe updates.

  7. We are using this during registration on a motorcycle forum I’m an admin on – one question asked what purpose your left foot serves while riding, another asks what the lever on the left bar actuates, etc.

    Before going to this, we asked “What do you ride” as a question and used to get some interesting answers from the potential spammers.

    Cultural CAPTCHA has worked (so far, knocking on wood) 100% on keeping out any spammers.

    • The answer to what foot does what would depend on the year and country-of-origin of the bike in question. A ’73 HD shifted and braked on opposite sides compared to more standardized models. That doesn’t even take into consideration “suicide clutches”.

      • Ya got me there!

        I should have mentioned this is a “sportbike” forum, ours being more heavily trackday-oriented, and thusly, we weren’t concerned about backwards controls (some brit-bikes did it too) or suicide clutches.

    • When I hear “we’ve blocked 100% of spam” the first thing I think is: How many legitimate members of your community have you blocked, confused, aggravated and otherwise driven off? And how much spam are those people worth?

      • Actually, no one has mentioned anything about being annoyed. I’ve spoken with several members in person about it to verify. What did drive members from a different forum, however, was the SPAM. That was actually a big reason we started ours!

  8. Good call Krebs!, The Mex Army has been using this for decades, on detaining illegals here who look like every other beaner and speak spanish, They ask them to complete song lyrics from old Mex songs. Awesome!.

  9. So, basically a digital shibboleth. Or would that be shibboleet? http://xkcd.com/806/

  10. A friend who lives in France says a similar thing happens in French online chatrooms to discover who is a long time foreigner pretending to be French (which is quite common). Apparently they slip into the first few conversations some obscure object and use an incorrect pronoun le la les (masculine , feminine) and watch for the reaction. Apparently it works quite well.

  11. If you want to use a captcha product that is super easy and super secure..Check out this one I have found by Confident Technologies..

    http://www.confidenttechnologies.com/demos/captcha-demo

  12. I have started on another path: simple and useful.

    Simple means the user has to type only a few letters and useful because it can spark new discussion, it can suggest new articles to read.

    Give it a try: RetinaPost.com and comment if you like it or not. Also you can try the WordPress plugin.

    • But in the case of Brian’s Russian forum, how would that keep out an unwanted guest?

      • Hi Rich,

        The solution would be simple: cover a part of the words or only some letters with inputs.

        For example: the user has to guess the beginning and the end of a word based on the context.

        I will consider this solution, until then I will promote my current solution: Retina Post.

  13. At first time I was serious
    But then…. I lol’d

  14. I was married to a Russian lady for nine years and over that time was exposed to content and thought processes that a normal westerner just won’t be able to be exposed to. Russian culture and language is deeply rooted in rich Russian literature which becomes flat when translated out of Russian.

    The simple order of words in a sentence in Russian is very different than other languages. The Russian classics, like Pushkin, Dostoyevsky, etc. read very differently in Russian vs. English because the reader in Russian sees what their imagination bears from the writing. A native English speaker will always be confused because we are trained for literal interpretation. Many Russian words also have ambiguity in connotations. A hand, rootschke (phonetic) is the same as the foot. A westerner will decide it was the practical choice between hand and foot. The Russian will choose either the most humorous connotation or the one that is most detrimental to the protagonist in the story.

    Contrary to what other posts say, I believe the quantity of Russian captchas is infinite.

    • I am a native russian speaker.

      I find this a very interesting concept. I support Mark’s opinion. The number of combinations is infinite.

      And even if you run out, you could always use distorted captcha-like images for questions instead of plaintext.

      • Add songs and poems.

        Every English speaker knows the correct answer for:

        Guess a 4 letter missing word:
        “Give me baby one more ……” (B. Spears).

  15. Sneaky Russians .. looks like they are the only ones who can think straight …

    In russian language words like STRELKI (Стрелки ) got lots of dif meanings ..

    Стрелки — ARROWS like bow arrows
    Стрелки — TRAIN LINES
    Стрелки — Indicators on the CLOCK and so on my list is endless .. all u need to do is ask the captcha question in the CORRECT MANNER