Why Text-based CAPTCHA Cannot Satisfy the Needs of Enterprise

07 Nov 2019 • 10 min read

Email spam has grown steadily since the early days of the Internet, and by 2014 it was estimated to make up around 90% of all email sent [1]. Around 2000, Yahoo was troubled by mailbox bots, which bombarded users’ inboxes with spam and badly hurt the user experience. Yahoo therefore turned to Luis von Ahn, who proposed the text-based CAPTCHA as a solution.

CAPTCHAs are designed to be easy for humans but hard for machines. A CAPTCHA acts as a gateway that keeps bots out and lets only humans through. By integrating a captcha, you can secure access to your websites, mobile apps, or APIs. However, it also puts a barrier in front of legitimate human visitors, which can slow them down or even stop them from visiting the website.

The traditional text-based captcha assumes that the question it presents is hard for bots to answer and can only be answered by humans. Most text-based captchas therefore focus on generating questions or images that are difficult for computers.

With the fast growth of computer vision, machines are becoming more sophisticated at image recognition. To keep pace, designers have added complex noise to the text, which also makes the captcha harder for humans to read. According to a series of studies conducted at Stanford University, distorting, rotating, or collapsing characters, using multiple fonts, and similar techniques can be effective ways to reduce the machine recognition rate [2]. These difficult text-based images hurt not only bots but also humans. By collecting captchas from the 13 most used websites and running experiments with more than 1,100 people, one of these studies found that a text-based captcha took on average 9.8 seconds to view and solve [3]. The more difficult and illegible the captcha, the larger the drop in conversions. Many websites and mobile applications rely on conversions to make money, so every extra second spent on a captcha can raise the bounce rate and potentially hurt your business.

In addition, the text-based captcha is still not secure enough. It relies on a question bank to present a different question or image each time. If an attacker traverses this bank and collects the answers to all of its questions, bypassing the captcha becomes trivial. Mature techniques such as letter binarization, support vector machines, and optical character recognition can also recognize text-based captchas with varying levels of accuracy. Finally, many crowdsourcing companies offer CAPTCHA-solving services. These companies, also called captcha farms, hire an army of low-wage workers to solve captchas manually.
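As a rough illustration of the binarization step mentioned above, the sketch below thresholds a tiny grayscale grid so that dark strokes survive and light background noise disappears. The sample pixel values and the threshold of 128 are assumptions chosen for the example, and a real solver would feed the cleaned image into a classifier or OCR engine afterwards.

```python
from typing import List

Grid = List[List[int]]


def binarize(gray: Grid, threshold: int = 128) -> Grid:
    """Map each grayscale pixel (0-255) to 1 (ink) if darker than threshold, else 0."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]


# Hypothetical 3x5 "image": dark strokes (low values) over light noise (high values).
sample = [
    [20, 200, 180, 210, 30],
    [25, 190, 40, 220, 35],
    [30, 170, 160, 205, 28],
]

for row in binarize(sample):
    print("".join("#" if bit else "." for bit in row))
```

Once the light noise is stripped, only the strokes remain, which is why noise alone adds little security while still making the image harder for people to read.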

Increasing the difficulty of a captcha deepens the drop in conversions and may do nothing to improve security. The traditional text-based captcha has lost its effectiveness against advancing technology and cannot satisfy the needs of enterprises. To address this security dilemma, GeeTest created a new generation of captcha that tells humans and bots apart based on human behavior. When people browse the internet, they automatically generate biometric information (for example, mouse tracks) and environment information (for example, device attributes and browser version). When bots try to crack the captcha, they typically use browser automation tools or attack the API directly, which makes their environment differ from that of legitimate human visitors. Beyond the environment, the behavior patterns of bots, such as mouse click frequency and other biometric signals, also differ significantly from those of humans.
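To make the idea of behavioral signals more concrete, here is a minimal sketch that derives two features from a mouse track: how straight the path is and how much the speed varies. It is not GeeTest's actual method. The feature names, the `(x, y, t_ms)` point format, and the sample tracks are all assumptions for illustration, and the function expects at least two points with strictly increasing timestamps.

```python
import math
from statistics import pstdev
from typing import Dict, List, Tuple

Point = Tuple[float, float, float]  # (x, y, timestamp in milliseconds)


def track_features(track: List[Point]) -> Dict[str, float]:
    """Return path straightness (1.0 = straight line) and speed spread of a track."""
    steps = list(zip(track, track[1:]))
    lengths = [math.hypot(b[0] - a[0], b[1] - a[1]) for a, b in steps]
    speeds = [d / (b[2] - a[2]) for d, (a, b) in zip(lengths, steps)]
    direct = math.hypot(track[-1][0] - track[0][0], track[-1][1] - track[0][1])
    path = sum(lengths)
    return {
        "straightness": direct / path if path else 1.0,
        "speed_stdev": pstdev(speeds),
    }


# Hypothetical tracks: a scripted pointer moves in a straight line at constant
# speed, while a human hand wobbles and speeds up and slows down.
scripted_track = [(0, 0, 0), (10, 0, 100), (20, 0, 200), (30, 0, 300)]
human_track = [(0, 0, 0), (4, 3, 120), (9, 5, 200), (20, 2, 450), (30, 0, 520)]

print(track_features(scripted_track))
print(track_features(human_track))
```

A perfectly straight, constant-speed track is only one possible hint of automation. A production system would presumably combine many such signals with the environment information described above.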

To provide a better user experience, GeeTest captcha first asks the visitor to click the captcha button. The biometric data generated by this simple action is analyzed together with the device attributes to determine a risk level. Only if a risk is detected is the visitor asked to complete a further captcha challenge, chosen according to that risk level. In this way, GeeTest keeps the obstacles for visitors as low as possible. GeeTest conducted a test with 30 people and found that it took on average 2.74 seconds to view and solve the GeeTest captcha, less than a third of the 9.8 seconds reported for text-based captchas [3].
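The risk-based flow described above can be sketched as a simple mapping from a risk score to the next step. The score range, the thresholds of 0.3 and 0.7, and the tier names are hypothetical placeholders. The article does not specify how GeeTest scores risk or which challenges it serves.

```python
def choose_challenge(risk: float) -> str:
    """Pick the next step for a visitor from a risk score in [0.0, 1.0]."""
    if not 0.0 <= risk <= 1.0:
        raise ValueError("risk must be between 0.0 and 1.0")
    if risk < 0.3:  # assumed threshold: low risk, the click alone is enough
        return "pass"
    if risk < 0.7:  # assumed threshold: medium risk, show a light challenge
        return "easy_challenge"
    return "hard_challenge"  # high risk, show a tougher challenge


for score in (0.1, 0.5, 0.9):
    print(score, choose_challenge(score))
```

The design point is that most legitimate visitors fall into the lowest tier and never see a challenge at all, so the cost of stronger checks lands mainly on suspicious traffic.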

Want to learn more about GeeTest captcha? Leave us a message at intermational@geetest.com

References:

[1] The M3AAWG Email Metrics Report. [2019-10-10].

[2] Text-based CAPTCHA Strengths and Weaknesses. [2011-10].

[3] How good are humans at solving CAPTCHAs? A large scale evaluation. [2010-05].
