20 May 2020 • 10 min read
CAPTCHA is an acronym for “Completely Automated Public Turing test to tell Computers and Humans Apart.” It is a module used in websites, mobile apps, and APIs to distinguish automated computer programs from genuine human users.
CAPTCHAs are sometimes described as reverse Turing tests. The Turing test is a method of inquiry in artificial intelligence in which a computer has to convince a human that it is human. A reverse Turing test is therefore a human convincing a computer that they are not a computer. A program that automatically generates such a test on the internet is a CAPTCHA.
The first commercial deployment of a CAPTCHA-like system was done by Andrei Broder and his fellow engineers at the AltaVista search engine. They developed an automated filter system to stop bots from submitting URLs, a type of black hat SEO that skewed AltaVista’s algorithm.
However, the term CAPTCHA was coined in 2003 by Luis von Ahn and his colleagues at Carnegie Mellon University in their publication “CAPTCHA: Using Hard AI Problems for Security.”
For more information, please refer to "History of CAPTCHA - The Origin Story".
CAPTCHAs are deployed at operational gateways such as login, registration, and form submission to prevent automated programs from gaining access and committing fraud and abuse.
The original idea behind CAPTCHA is to present cognitive challenges in the form of distorted texts that humans can recognize and pass easily while computer programs cannot.
However, with the advancement of computer technologies, traditional CAPTCHAs quickly became obsolete. Today, advanced CAPTCHA modules use risk analysis engines based on behavioral detection and artificial intelligence technologies to prevent sophisticated bot threats.
Digital criminals use bots to automate malicious tasks such as credential stuffing attacks, email or comment spam, and online poll fraud. By ensuring there is a real human behind each request, CAPTCHA prevents them from automating these tasks and from scaling their illicit operations.
CAPTCHA prevents threats including:
Without CAPTCHA, spam and abuse would take over most platforms, and the internet ecosystem as we know it would not exist.
Over the past two decades, CAPTCHA has evolved through three generations to defend against increasingly sophisticated bad bots and to meet users' needs for a smoother experience.
The first generation of CAPTCHA rests on simple logic: humans are better than machines at recognizing twisted and warped text. By introducing noise in the form of different widths, heights, background patterns, borders, and so on, the letters became impossible for the OCR (Optical Character Recognition) technology of the time to recognize.
Standard CAPTCHA (also known as text-based CAPTCHA) became obsolete in 2014, when Google pitted one of its machine learning algorithms against humans at recognizing heavily distorted text. Humans solved the challenges with only a 33% success rate, while the algorithm reached 99.8% accuracy. This marked the end of first-generation text-based CAPTCHAs.
The second generation of CAPTCHA left the text-based input approach behind for more innovative challenges that were deemed very difficult for machines to bypass. These challenges included logic puzzles, visual comparisons, movement-based CAPTCHAs, and math challenges.
However, even though the second generation of CAPTCHAs looked very different from the first, the logic behind the challenges stayed much the same: the superiority of humans over machine programs in recognizing images, numbers, or various objects.
As computer technology advanced and bad bots became better at solving such puzzles, the CAPTCHAs had to become increasingly difficult. Eventually the user friction created by difficult CAPTCHAs became too severe, and advanced AI technology rendered the second-generation gamified CAPTCHAs ineffective.
The third generation, the no-knowledge CAPTCHA, has taken human verification into a new dimension by introducing advanced risk analysis. Because they require no thinking from the user, no-knowledge CAPTCHAs cause minimal to no interruption to user operations and provide a much better user experience.
Back-end risk analysis uses behavioral factors within a confined space, as well as environmental factors such as device reputation and hardware specifications, to tell genuine human behavior apart from automated behavior. Because AI-powered bots can mimic human behavior, an AI-powered CAPTCHA is necessary to stop advanced bot threats.
The first and second generations of CAPTCHAs rested on a self-defeating premise: that humans are superior to machines at recognizing images, numbers, or various objects. As computer programs get better at recognizing characters and images, the challenges have to become increasingly difficult to keep programs from bypassing them.
As a result, the difficulty of CAPTCHAs rose sharply with advances in computer and AI technologies. Today, a better solution is the third-generation no-knowledge CAPTCHA. Advanced CAPTCHA solutions are much easier to pass and considerably more secure than traditional CAPTCHA solutions.
A CAPTCHA test is triggered when the risk analysis engine cannot determine whether the user is a human or a bot. The engine then marks the user as suspicious and presents a challenge-response test. Many factors feed the risk analysis, and they are the secret sauce of the system. Some known factors are parameters related to user behavior, such as operating speed, and the IP address.
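The trigger logic described above can be sketched as a small scoring function. This is only an illustration under stated assumptions: the signal names (`actions_per_second`, `ip_reputation`), the equal weights, the thresholds, and the idea of blocking clear bots outright are all hypothetical. Real engines use many more factors and keep them private.

```python
from dataclasses import dataclass


@dataclass
class RequestSignals:
    # Hypothetical signals invented for this sketch.
    actions_per_second: float  # operating speed observed on the page
    ip_reputation: float       # 0.0 (known bad) to 1.0 (known good)


def risk_score(signals: RequestSignals) -> float:
    """Return a risk score between 0.0 (likely human) and 1.0 (likely bot)."""
    # Treat anything at or above 10 actions per second as maximally fast.
    speed_risk = min(signals.actions_per_second / 10.0, 1.0)
    ip_risk = 1.0 - signals.ip_reputation
    # Illustrative equal weighting of the two factors.
    return 0.5 * speed_risk + 0.5 * ip_risk


def decide(signals: RequestSignals, low: float = 0.3, high: float = 0.7) -> str:
    """Pass clear humans, block clear bots, and challenge the uncertain middle."""
    score = risk_score(signals)
    if score < low:
        return "pass"
    if score > high:
        return "block"
    return "challenge"
```

In this sketch, a request with moderate speed and a neutral IP reputation, such as `RequestSignals(5.0, 0.5)`, falls between the two thresholds and is the case that receives a challenge.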
Fraudsters have been exploiting systems with automated attacks since the early days of the Internet. Traditional CAPTCHA saved us from bot threats in the early times.
However, backed by massive financial motivation and advancing computer technologies, fraudsters keep finding ways to bypass or crack CAPTCHA measures, leaving the ecosystem vulnerable to bot attacks. Some bots can get past the text CAPTCHAs on their own. Researchers have demonstrated ways to write a program that beats image recognition CAPTCHAs as well. In addition, attackers can use click farms to beat the tests: thousands of low-paid workers solving CAPTCHAs on behalf of bots.
Industry experts hail machine learning as the most promising solution against automated threats. When behavioral analysis is integrated into a CAPTCHA, the challenge-response becomes a way to collect biometric data instead of a cognitive challenge. This is a dramatic change in the logic of bot defense compared to older generations of CAPTCHAs.
Biometric data is used in the risk analysis engine to determine whether the behavior is human or machine, which allows challenges to be much harder for bots and a lot easier for all humans. This type of captcha is often referred to as “Advanced CAPTCHA”.
Advanced CAPTCHA is an anti-bot program that distinguishes itself from regular CAPTCHAs by using a risk analysis engine (often based on behavioral analysis) over or within its challenge-response mechanism. While legacy CAPTCHA systems relied on the difficulty of the challenge-response for security, advanced CAPTCHAs focus on the behavioral characteristics of the traffic instead, providing higher security with less user friction.
Advanced CAPTCHAs can be used as stand-alone solutions to mitigate up to 98% of automated threats, so the additional benefit of a bot manager over a CAPTCHA is minimal at best. There is no 100% safety in cybersecurity, but in the effort to stop online fraud and abuse caused by automated programs, advanced CAPTCHAs can successfully break the business model of fraudsters.
Because CAPTCHA hacking methods are abundant and hacking tools are easily accessible, traditional CAPTCHAs are easily bypassed, leaving sites vulnerable to malicious automated attacks.
However, with the introduction of GeeTest Adaptive CAPTCHA, the era of CAPTCHA is far from over. When integrated with a back-end engine, the possibilities for this advanced CAPTCHA are far and wide. The most prominent of them fall into three main categories: environment detection, behavioral analysis, and continuous resource updates.
Environment detection refers to the information retrieved from the user’s computer environment, such as the hardware specification, various devices, screen size, browser properties, version, etc.
Using elaborate machine learning models for advanced risk analysis, the environmental information can be used to detect browser automation tools accurately. By removing browser automation tools from hackers' arsenal, an advanced CAPTCHA solution can significantly limit their ability to stay under the radar and scale their fraudulent operations.
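As a rough illustration of how environmental information can expose automation, the sketch below swaps the machine learning models mentioned above for a few hand-written rules. The report format and its keys (`webdriver`, `user_agent`, `screen_width`, `screen_height`) are assumptions made for this example, not any vendor's actual schema.

```python
from typing import List


def automation_indicators(env: dict) -> List[str]:
    """Return the reasons a client environment report looks automated.

    `env` is a hypothetical report collected on the client; its keys are
    invented for this sketch.
    """
    reasons = []
    # Browsers driven through WebDriver expose navigator.webdriver as true.
    if env.get("webdriver") is True:
        reasons.append("webdriver flag set")
    # Headless Chrome has historically advertised itself in the user agent.
    if "HeadlessChrome" in env.get("user_agent", ""):
        reasons.append("headless user agent")
    # A missing or zero-sized screen is unusual for a real device.
    if env.get("screen_width", 0) <= 0 or env.get("screen_height", 0) <= 0:
        reasons.append("implausible screen size")
    return reasons
```

Simple rules like these are easy for an attacker to spoof, which is presumably why production systems combine many more signals in a learned model.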
While strong front-end encryption and dynamic honeypots can mitigate the threat of API hacking, sophisticated origin detection techniques can pinpoint requests from CAPTCHA farms.
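The article does not say how a dynamic honeypot is built, so the following is only one possible reading: a hidden form field whose name changes per session, which a real browser submits empty and which a script calling the API directly tends to omit or fill in. Both function names and the `hp_` prefix are hypothetical.

```python
import secrets
from typing import Mapping


def new_honeypot_field() -> str:
    """Create a per-session field name so scripts cannot hard-code it."""
    return "hp_" + secrets.token_hex(4)


def tripped_honeypot(form: Mapping[str, str], honeypot_field: str) -> bool:
    """Return True when the hidden field is missing or was filled in."""
    # A real browser submits the hidden field with an empty value.
    return form.get(honeypot_field) != ""
```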
The integration of behavioral analysis into captcha allows challenges to be less about the ‘correct’ answer and more about ‘the method’ of acquiring the answer.
Biometric data generated through the user’s interaction with the captcha module is used in the risk analysis engine to determine whether the behavior belongs to a human or a machine. This is a dramatic change for the logic of bot defense compared to older generations of captchas and a crucial feature for any relevant advanced captcha solutions.
A biometric classification model within a CAPTCHA module means that merely using ML and OCR to crack the challenge is not enough. An automated program has to not only crack the challenge but also do so while perfectly mimicking human behavior. Generating biometric data that is genuinely human enough to pass the risk analysis engine, though possible, still introduces enough limitations to prevent a successful bot attack from occurring.
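To make "the method of acquiring the answer" concrete, here is a minimal sketch that extracts two features from a pointer trajectory and applies a fixed rule. It stands in for the biometric classification model described above and is not one: the features, the thresholds, and the function names are assumptions chosen for illustration.

```python
import math
from statistics import mean, pstdev
from typing import List, Tuple

Sample = Tuple[float, float, float]  # (x, y, timestamp in seconds)


def trajectory_features(samples: List[Sample]) -> Tuple[float, float]:
    """Return (straightness, speed_variation) for a pointer path.

    straightness is direct distance / travelled distance (1.0 = straight line).
    speed_variation is the coefficient of variation of per-segment speeds.
    """
    if len(samples) < 3:
        raise ValueError("need at least three samples")
    lengths, speeds = [], []
    for (x0, y0, t0), (x1, y1, t1) in zip(samples, samples[1:]):
        step = math.hypot(x1 - x0, y1 - y0)
        lengths.append(step)
        if t1 > t0:  # skip segments without elapsed time
            speeds.append(step / (t1 - t0))
    travelled = sum(lengths)
    direct = math.hypot(samples[-1][0] - samples[0][0],
                        samples[-1][1] - samples[0][1])
    straightness = direct / travelled if travelled > 0 else 1.0
    avg = mean(speeds) if speeds else 0.0
    speed_variation = pstdev(speeds) / avg if avg > 0 else 0.0
    return straightness, speed_variation


def looks_scripted(samples: List[Sample],
                   max_straightness: float = 0.999,
                   min_variation: float = 0.05) -> bool:
    """Flag paths that are both perfectly straight and constant in speed."""
    straightness, variation = trajectory_features(samples)
    return straightness > max_straightness and variation < min_variation
```

A path such as `[(0, 0, 0.0), (10, 0, 0.1), (20, 0, 0.2), (30, 0, 0.3)]` moves in a straight line at constant speed and is flagged, while a path that curves and changes speed is not. A bot can add jitter to defeat a rule this simple, which is why the text above speaks of a trained classification model.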
Once a CAPTCHA is presented to a user, the image used within the challenge becomes public. This means hackers can use these images to train a machine learning model or use them for reverse library types of attacks. Therefore, images used within the challenges can pose a threat to the security of the captcha.
By continuously updating the resource pool and encrypting the images used within the challenges, advanced CAPTCHAs can prevent reverse-library and brute-force attacks, significantly increasing the cost of attempting an attack.
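One way to picture a continuously updated resource pool is a store that retires each challenge image after a fixed number of servings and replaces it with a fresh one, so a harvested image library goes stale quickly. The class below is a sketch under that assumption. `RotatingImagePool` and the `make_image` callback are hypothetical names, and image encryption is left out.

```python
import secrets
from typing import Callable, Dict, Tuple


class RotatingImagePool:
    """Serve challenge images and retire each one after `max_uses` servings."""

    def __init__(self, make_image: Callable[[], bytes],
                 size: int = 100, max_uses: int = 1) -> None:
        self._make_image = make_image  # hypothetical image generator
        self._max_uses = max_uses
        self._images: Dict[str, bytes] = {}
        self._uses: Dict[str, int] = {}
        for _ in range(size):
            self._add_fresh()

    def _add_fresh(self) -> None:
        # Random identifiers keep image ids unpredictable across rotations.
        image_id = secrets.token_hex(16)
        self._images[image_id] = self._make_image()
        self._uses[image_id] = 0

    def serve(self) -> Tuple[str, bytes]:
        """Pick a random image, count the use, and rotate it out if spent."""
        image_id = secrets.choice(list(self._images))
        image = self._images[image_id]
        self._uses[image_id] += 1
        if self._uses[image_id] >= self._max_uses:
            del self._images[image_id]
            del self._uses[image_id]
            self._add_fresh()
        return image_id, image
```

With `max_uses=1`, every served image is seen exactly once, which trades generation cost for the strongest protection against image reuse.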
Advanced CAPTCHA is a business imperative, not an IT imperative.
If your online operations are valuable to your business, you should choose an enterprise-grade advanced CAPTCHA that is secure and provides a seamless user experience and 24/7 support. Enterprise-grade advanced CAPTCHAs stay up to date with emerging threats and reduce the risk of your business becoming a target of sophisticated bad bots.
There are a lot of industries that can benefit significantly from an advanced CAPTCHA solution, including:
Most websites or mobile apps with critical operations (such as login, registration, and form submission) need an advanced CAPTCHA to prevent automated attacks. With machine learning and cloud computing tools now widely available and affordable, bad actors can reach further and hit harder than ever before. Without a strong deterrent such as an advanced CAPTCHA solution, a bot attack is only a matter of time. Advanced CAPTCHAs sharply increase the cost of an attack and are a necessity for the protection of most websites, mobile apps, and APIs.
With over 12 years of experience in enterprise-grade CAPTCHA services, GeeTest has served 360,000+ enterprises worldwide, including Airbnb, Binance, and Xiaomi, and processes 1,000,000,000+ requests per day. Try GeeTest Adaptive CAPTCHA to protect your website, app, and APIs from bot attacks, or register for a free 30-day trial now!
GeeTest