09
Jan 12

Virtual Sweatshops Defeat Bot-or-Not Tests


Jobs in the hi-tech sector can be hard to find, but employers in one corner of the industry are creating hundreds of full-time positions, offering workers on-the-job training and the freedom to work from home. The catch? Employees will likely toil for cybercrooks, and their weekly paychecks may barely cover the cost of a McDonald’s Happy Meal.

Kolotibablo.com home page

The abundance of these low-skilled, low-paying jobs is coming from firms that specialize in the shadowy market of mass-solving CAPTCHAs, those blurry and squiggly words that some websites force you to retype. One big player in this industry is KolotiBablo.com, a service that appeals to spammers and exploits low-cost labor in China, India, Pakistan, and Vietnam.

KolotiBablo, which means “earn money” in transliterated Russian, helps clients automate the solving of puzzles designed to prevent automated activity by bots, such as leaving spammy comments or mass-registering accounts at Webmail providers and social networking sites. The service offers an application programming interface (API) that allows clients to feed kolotibablo.com CAPTCHAs served in real time by various sites, which are then solved by KolotiBablo workers and fed back to the client’s system.

Paying clients interface with the service at antigate.com, a site hosted on the same server as kolotibablo.com. Antigate charges clients 70 cents to $1 for each batch of 1,000 CAPTCHAs solved, with the price influenced heavily by volume. KolotiBablo says employees can expect to earn between $0.35 and $1 for every thousand CAPTCHAs they solve.

The twin operations say they do not condone the use of their services to promote spam, or “all those related things that generate butthurt for the ‘big guys,’” most likely a reference to big free Webmail providers like Google and Microsoft. Still, both services can be found heavily advertised and recommended in several underground forums that cater to spammers and scam artists.

Registered antigate.com users can read more about why customers typically purchase the service, and how KolotiBablo is run. From the description:

“All CAPTCHAs in our service are completely solved by real humans, there are usually 500-1000 (and growing) workers online from all the world. That’s why we can process any CAPTCHAs at any volume for a fixed price $1 per 1000 CAPTCHAs.

You may probably think that using human resource inappropriate or inhumane. However, keep in mind that we pay the most of collected money to our workers who sit in the poorest corners of our planet and this work gives them a stable ability to buy food, clothes for themselves and their families. Most of our staff is from China, India, Pakistan and Vietnam.”

To get started as a CAPTCHA-solving worker at KolotiBablo.com (pictured at left), you’ll need to provide a working account at WebMoney, a virtual currency. After that, the system will start feeding you live CAPTCHAs to solve, prefacing each with a notice about the rate that the client has agreed to pay per batch.

Depending on the demands that clients place on the service, there may be a brief delay between CAPTCHAs, but generally only a few seconds pass between the time a solved puzzle is submitted and when a new one is offered. Each new puzzle is preceded by an audible “beep,” and workers are expected to solve and type each of the CAPTCHAs in less than 10 seconds. During downtime, the system displays workers’ average puzzle solving times, as well as actual and projected weekly earnings.

If this sort of drudgery sounds like easy money, take a moment to work out the math. Assuming that you can solve six CAPTCHAs per minute and work eight hours straight, you’d be able to solve about 2,880 puzzles each day. Even at the highest CAPTCHA solving rate, you’d only make $2.88 daily; at the lowest rate, you’d make just over a dollar a day.
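
For readers who want to check the arithmetic, here is a minimal sketch in Python. The solving speed, shift length, and pay rates come from the figures above; the function and variable names are made up for illustration.

```python
# Back-of-the-envelope daily earnings for one CAPTCHA solver.
# Inputs are the figures quoted in the article; names are illustrative only.

SOLVES_PER_MINUTE = 6        # one CAPTCHA every 10 seconds
HOURS_PER_DAY = 8            # an uninterrupted eight-hour shift
RATE_LOW = 0.35              # dollars per 1,000 CAPTCHAs (lowest worker rate)
RATE_HIGH = 1.00             # dollars per 1,000 CAPTCHAs (highest worker rate)


def daily_solves(per_minute, hours):
    """Number of CAPTCHAs solved in one shift."""
    return per_minute * 60 * hours


def daily_earnings(solves, rate_per_thousand):
    """Dollars earned for a given number of solves at a per-1,000 rate."""
    return solves / 1000 * rate_per_thousand


solves = daily_solves(SOLVES_PER_MINUTE, HOURS_PER_DAY)
print(solves)                                       # 2880
print(round(daily_earnings(solves, RATE_HIGH), 2))  # 2.88
print(round(daily_earnings(solves, RATE_LOW), 2))   # 1.01
```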

No, the real earnings only come when you assemble an army of workers to solve CAPTCHAs for your WebMoney account, as described by this FAQ at KolotiBablo.com.

As long as there is low-cost human labor willing to do this kind of work for pennies per day, CAPTCHAs will continue to be an ineffective way to prevent automated account creation and spammy Web site comments. But at least experts are working on making CAPTCHAs less annoying: Some firms are starting to pitch more user-friendly alternatives to the hard-to-read squiggly CAPTCHAs.

If you’d like to learn more about CAPTCHAs and the semi-automated systems being built to defeat them, I’d suggest reading this paper (PDF) on CAPTCHA-solving services, from researchers at University of California, San Diego. Also, in Nov. 2010, I wrote about CAPTCHABot, another puzzle-solving service with similar rates and practices.


32 comments

  1. Wow, i thought that someone could pull this off, but i am amazed at how they did it.

    I have been to India and other developing countries, and i am not the least surprised of the pay you mention, even at the lowest level, somehow 1$ per day is considered a wage you can live on in those countries…do you instead have any idea how much money the guy running the horde of captchers (?) can make outta this?

    • This is not new at all, human captcha solving services have been around for years such as DeathbyCaptcha, BypassCaptcha etc and so on.

      Any worthwhile SEO app such as SEnuke, , SERobot, Article Marketing Robot, Bookmark Demon, Sick Submitter etc allows the use of these services. For the price of this service I wouldn’t expect a good service. Having said that I’ve never used this particular service and the price has been dropping steadily as more players enter the market.

      By the way canny programmers (seemingly always Russian for some reason – they’re really smart guys basically) have automated captcha solving by applications themselves – ie. no need to pay for outsourcing.

      Xrumer I believe does this (I may be wrong about this as I don’t use it anymore) but even freeware applications like USDownloader break most captchas that cyberlockers use these days (often reCaptcha).

      The author of USDownloader is also responsible for a popular AC3 decoding filter … like I said, exceptionally talented guys.

      • Agreed. I read articles on it 3-4 years ago where they were using porn, including child porn, to do it. They would let people see the porn without charging money. However, they would force the person to periodically solve a few CAPTCHA’s. Every half dozen CAPTCHA’s solved would give them access to a similar number of high res pics or vids.

        There were also people outright paying for CAPTCHA’s. And Amazon’s Mechanical Turk service has been used for purposes like these. The smart ones claim it’s to improve character recognition software or some other nonsense.

      • Automated solvers work fine in isolation, but they tend to be bad business because the CAPTCHA provider can typically change CAPTCHA implementation at low cost and thus undermine the capital investment in the solver. By and large automated solvers only play a role in the market for CAPTCHAs that are not well stewarded or for very short-term runs against a popular service. New solvers typically have success rates of 25-30% and the failure rate provides a great signal that it’s time to update your service.

        As to the issue of service quality, Kolotibablo/Antigate is actually a very high quality service (among the best we’ve measured and certainly, along with captchabot, the best bang for the buck). Not surprisingly Xrumer uses antigate and captchabot out of the box (or did a year ago when we last checked). The dynamics around pricing are complex since they reflect not only the base labor cost (which can go down to 0.40/1k), but the provider’s turnover and cost structure. The higher-priced providers frequently have a kick-back relationship with software providers (basically keeping prices artificially high through artificial monopoly).

  2. Unlimited human brains to solve problems …

    Just setup a porn website and use any unsolved captchas as your website entrance page.

    • You need to have a *very* popular porn website to achieve the speed and quality offered by antigate and similar services.

      Let’s calculate. Assume there’s an incoming queue which gets 10 new captchas per second. To be sure that the input is correct, you need to show the same captcha at least 2 times (and more until there are two exact matches). So the factor of ~ 2.3 seems to have sense. Assuming that a porn site visitor solves one captcha every minute, the number of unique visitors in a minute required is:

      10 * 60 * 2.3 = 1380

      With an assumed average of 5 captchas per unique visitor (e.g. he spends 5 minutes on the site), we need 1380/5*60*24 ~= 400 k uniques daily. Wow. With such a popular website there are really better ways to monetize.
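
The estimate in the reply above can be reproduced with a few lines of Python. This is only a sketch of that commenter's reasoning: the queue rate, the redundancy factor of about 2.3, and the five CAPTCHAs per visitor are the assumptions stated in the comment, and the function name is invented for illustration.

```python
# Reproduces the "how popular must the site be?" estimate from the comment.
# All inputs are the commenter's assumptions, not measured values.

def required_daily_uniques(captchas_per_second, redundancy, captchas_per_visitor):
    """Unique daily visitors needed to keep up with the incoming CAPTCHA queue."""
    # Each CAPTCHA is shown `redundancy` times on average to confirm the answer.
    solves_per_minute = captchas_per_second * 60 * redundancy
    solves_per_day = solves_per_minute * 60 * 24
    # Each visitor contributes only a handful of solves before leaving.
    return solves_per_day / captchas_per_visitor


uniques = required_daily_uniques(
    captchas_per_second=10, redundancy=2.3, captchas_per_visitor=5
)
print(round(uniques))  # 397440, i.e. roughly 400k uniques per day
```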

    • In practice, no one does this. Russian Guy’s analysis is spot on. It’s a cute idea, but a bad business model.

      • Actually, they do. I don’t have the references on-hand right now (devastating HD crash a while back). However, I had good intel that child pornography sites were successfully using this approach to break CAPTCHA’s, presumably how they funded their operation. They used obscure sites in foreign countries, “bulletproof” hosting providers and/or Tor. This was a year or two ago that someone brought this to my attention. He said it had actually been going on for “quite a while,” whatever that means.

        • Documentation would be appreciated. Everyone “has heard” of the practice, but I’ve found no documentation of it being used at any scale (precisely because it has terrible economics when compared with just paying $1/k). Koobface briefly did this kind of opportunistic solving (system warning, etc) but I suspect this was as much an experiment (or to keep everything “in house”) as anything and every social bot I’ve seen since used one of the services.

        • No, NickP they do not. It’s not just a question of math. Anyone who knows anything about how CP operates on the net knows that what you claim is BIZARRE. CP is distributed most often via free P2P. The very very few CP websites that are run to make a profit (mostly out of eastern Europe) are run via CC, webmoney, or something else. I honestly wish people would stop peddling CP=HUGE MONEY. It’s just bullshit.

          • I’m withholding the source intentionally for his protection from any possible litigation (although he’s not American). The guy who gave me the information is extremely reliable. He saw it first hand on the Tor network going through a huge directory list of .onion sites to see which worked & what was on them. One site had a mostly blank page asking to solve CAPTCHA’s to build up credits. He did five of them & was rewarded with something he was definitely not interested in. He cleared his cache & may have reported the site to the authorities.

            Efficient or the best method? Definitely not. Porn for CAPTCHA’s is very rare. The typical situation with cp operators is to use IRC, Freenet, Tor, offshore hosting/proxies, etc. to transfer it. A few years ago an operator published an insider’s view of the industry & claimed he used Truecrypt-protected VM appliances that contained it on a hidden volume, with the functional appliance disguising the nature of the purchase & helping launder it.

            But, since you are sure I’m wrong, my friend was hallucinating, and porn for CAPTCHA’s NEVER happens cause it’s too bizarre or whatever….

            Spammers employ stripper to crack security (NY Times)
            https://www.nytimes.com/idg/IDG_002570DE00740E18002573850052CF9B.html?ref=technology

            It was just a stripper, but they did it & it worked. Q.E.D.

            • CP surely exists out there, but by its very illegal nature it has to be quite expensive. Meanwhile CAPTCHAs are literally less than dime a dozen. How can such a service stay in business for long?

            • The issue is not “has someone ever done this?”. People do all kinds of dumb things that don’t matter.

              The important question is “does opportunistic solving via pornography play a role in practice?” and “does it play a role at scale?” I push back against the porn thing because it’s sensational and people keep repeating it as though it’s a practice that matters. I’ve seen no evidence that it is… again because the economics are terrible.

              Note that the 2007 stripper virus that did this (which incidentally is the very first reference in our paper) operated at a time when the CAPTCHA solving business had first started and the price regime was much more advantageous (up to $10/1k). However these prices were unsustainable since CAPTCHA solving is as unskilled a job as you can get. Once people figured out that this could be aggregated and outsourced (and to labor markets much cheaper than eastern Europe) prices went down by 10x.

              Again, at the market price of $1/1k, a porn-based opportunistic solver would need 1M successful conversions to make $10… Now let’s think about costs. Let’s assume you pull off these 1M solves each day and that you have really eager users who will solve one CAPTCHA for each porn image. At 100KB per image, that’s 100GB/day of bandwidth you need to cover with your $10 in revenue. OK, well maybe these are all compromised hosts, free hosting, etc… but seriously, the value of the traffic and the bandwidth dwarf the CAPTCHA cost by orders of magnitude. Indeed, if you want some empirical numbers let’s use the TU Vienna/UCSB porn study from WEIS which showed a spread of roughly $2-3/1000 on porn traffic. So $2,000/day vs $10/day… Now in principle these might be additive and why not go for an extra $10? Well, because putting an annoying CAPTCHA up is likely to reduce conversions if you’re a paysite and reduce sellable clicks if you’re running a TGP/MGP.

              I’d be happy to be wrong on this because that’d mean there’s some really interesting new factor making opportunistic solving attractive in spite of all its drawbacks. But I’ve yet to see the evidence suggesting that this approach gets used today at any scale that we’d care about.

              • Ooops my bad math… 1M impressions with one solve per impression gets you $1k at $1/1000. A much better deal than my above analysis, but still not as good as just selling the traffic directly. And much worse if your visitors won’t solve one CAPTCHA per pic (e.g., once every five)
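
The corrected numbers in this sub-thread can be checked with a short Python sketch. The $1 per 1,000 solves, the $2 per 1,000 traffic value (the low end of the quoted $2-3 spread), the 100KB image size, and the one-in-five solve ratio all come from the comments above; the function names are hypothetical and the gigabyte conversion assumes decimal units.

```python
# Compares CAPTCHA revenue with the value of simply selling the same traffic.
# Figures are taken from the comment thread; names are illustrative only.

IMPRESSIONS = 1_000_000      # image views per day
IMAGE_KB = 100               # assumed size of each image served


def captcha_revenue(impressions, impressions_per_solve, price_per_thousand):
    """Dollars earned if visitors solve one CAPTCHA every N impressions."""
    solves = impressions / impressions_per_solve
    return solves / 1000 * price_per_thousand


def traffic_revenue(impressions, value_per_thousand):
    """Dollars earned by selling the traffic directly."""
    return impressions / 1000 * value_per_thousand


print(captcha_revenue(IMPRESSIONS, 1, 1.00))   # 1000.0 (one solve per image)
print(captcha_revenue(IMPRESSIONS, 5, 1.00))   # 200.0  (one solve per five images)
print(traffic_revenue(IMPRESSIONS, 2.00))      # 2000.0 (low end of the $2-3 spread)
print(IMPRESSIONS * IMAGE_KB / 1_000_000)      # 100.0 GB of bandwidth per day
```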

              • “The issue is not “has someone ever done this?”. ”

                Actually, that WAS the original issue. You said it was a rumor that occasionally floated around. The other guy said it was bizarre & basically implied it was impossible due to his understanding of cp operations. So, I posted two counterexamples.

                As for economics and such, I largely agree with your stance on it.

                • But just because someone saw something on Tor doesn’t mean it was actually a functioning site. What did they pay in, Bitcoins? That’s a serious question. You’re going to have to show me more than a screencap before I believe you on this topic. (and normally I do believe you Nick P).

                  Anyway, even if I am wrong I would think that such a site would be rare. People involved in CP tend to be security paranoid (and for good reason) and I can’t imagine they would want yet another attack vector to expose their activities.

  3. Very interesting. This is one method of breaking CAPTCHAs that has no obvious countermeasures.

    Or maybe there is one. If the time taken to break a CAPTCHA is increased, operations like these will have a harder time making money. But it has to be done in a way that wouldn’t annoy a legitimate user. Ideas, anyone? Would letting words appear a few letters at a time (maybe even as the user is typing, kinda like Google Instant) work?

    • And it will just take about an hour to implement the antigate API extension for this case. :)

      The Captcha is a method to tell if it’s a human or a robot. To distinguish between a human and another human, you should find some different principle. This is typically solved by requiring another communication channel such as phone/SMS confirmation.

      • Quite so. Besides, the right way to look at CAPTCHAs is not as a perfect defense, but as a filter that increases the transaction cost. This filters out all the bad guys with inefficient business models and lets you focus more expensive screens (e.g., SMS challenges) on a far smaller number of cases. Trying to make CAPTCHAs do everything is unlikely to be an efficient strategy.

  4. > The twin operations say they do not condone the use
    > of their services to promote spam, or “all those related
    > things that generate butthurt for the ‘big guys,’”

    Well.. at least they’re respectful ..and professional!

  5. Capitalism: we brought you the pop-up ad.

  6. Google et al should pay $2.89 per day to these folks to pass computer or agriculture courses or something similar.

    Win-win.

    I remember a (possibly apocryphal) story about the Opium trade almost collapsing when McDonalds started paying a few more pennies per acre of potatoes than the going rate for poppies.

  7. An example from my archives showing how long this stuff has been going on. Prices have dropped since then.

    Inside India’s CAPTCHA-solving economy
    http://www.zdnet.com/blog/security/inside-indias-captcha-solving-economy/1835

  8. Russian social sites and the like have these days moved away from using captchas altogether and towards requiring users to provide a valid mobile phone number, which can be used to send the user a code he then enters to validate his identity. (And I notice that Facebook recently asked me for a mobile number too.)

    What really surprised me was that even this only kept the spammers at bay for about six months or so.

  9. CAPTCHAs are so bloody frustrating for genuine users to ‘interpret’. The amount of times I catch myself staring sideways at some obscure letter trying to figure out what it is….

  10. Way to bring literacy into third-world countries!

  11. The only question that I have is how such absurdly underpaid workers (from $0.35 ~ $1.00 per 1,000 CAPTCHAs) can afford to buy a computer and pay for Internet access to even be able to work for so little.

    TJL

    • I suppose they go to work in some internet café, and maybe pay half of what they earn for this service :|

  12. I use antigate.com in my daily SEO activities; it’s a very handy and cheap service, supported by many useful SEO software products. Go Indians or whoever solves those captchas )

  13. Why not make it a standard that all captchas are valid for say… 30 seconds? Then they can’t be passed on to an API to be solved unless they are very quick.

    It also seems like if one IP is requesting thousands of accounts that require captchas, it would be wise for the application provider to limit how many accounts can be created per ip in a certain time.

    • Both good ideas and both quite ineffective unfortunately.

      Wrt CAPTCHA timeouts: everyone has something like this, but they don’t work against the solving services. Let’s work it through back of the envelope: Suppose the legitimate user population solves a CAPTCHA with a mean of 15 seconds, with a SD of 5. You get 3 sigmas of coverage under your 30 second timeout so not too many legit users will get pissed. Now let’s conservatively assume that the population staffing CAPTCHA services is no better than normal users (i.e., that even though they are doing this all day and are financially motivated, they still don’t do better than average legitimate users). Given this, you need to count on the delay to get the CAPTCHA into their hands to resolve the difference between legit solvers and those staffing offshore solving services. Unfortunately, the typical RTT from the US West Coast to China is perhaps 200ms… completely dwarfed by the legit population response-time variation. In fact, the measured median response time (including network delay) for antigate is under 10 seconds. Timeouts don’t work in general for this reason.

      IP blacklisting isn’t that effective either because those signing up for these accounts are typically using proxy networks (we’ve seen situations where an actor signed into thousands of accounts using a different IP for every signup). Remember that the solving service’s IPs never appear in your logs because they deliver directly to their client who _in turn_ uses the solution via whatever infrastructure they have available.
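
The timeout argument in the reply above can be illustrated with a rough Python sketch. It assumes solve times are normally distributed (the comment only gives a mean of 15 seconds and an SD of 5, so the distribution shape is an assumption) and treats the 200ms round trip as a flat delay added to every outsourced solve. Variable names are invented for illustration.

```python
# How often does a 30-second timeout fire for local users vs. an offshore
# solving service? Assumes normally distributed solve times (an assumption;
# the comment only states the mean and standard deviation).
from statistics import NormalDist

solve_time = NormalDist(mu=15, sigma=5)  # seconds, per the comment
TIMEOUT = 30                             # seconds
NETWORK_DELAY = 0.2                      # seconds, the ~200ms RTT quoted above

# Legitimate users time out only when they take longer than the full window.
legit_timeout_rate = 1 - solve_time.cdf(TIMEOUT)

# Outsourced solvers lose NETWORK_DELAY of the window to transit.
farm_timeout_rate = 1 - solve_time.cdf(TIMEOUT - NETWORK_DELAY)

print(f"legit users timing out:  {legit_timeout_rate:.4%}")
print(f"solving farm timing out: {farm_timeout_rate:.4%}")
```

Under these assumptions both rates stay well below 1 percent, so the timeout barely separates the two populations.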

  14. Hmmm…
    Does anybody know about Pixodrom ( http://anti-captcha.net ) second website name (such as kolotibablo)?