20 May 2020 • 10 min read
CAPTCHA is a module used in websites, mobile apps, and APIs to distinguish automated computer programs from genuine human users.
CAPTCHA is an acronym for “Completely Automated Public Turing test to tell Computers and Humans Apart.”
The Turing test is a method of inquiry in artificial intelligence in which a computer has to convince a human that it is human. A reverse Turing test, therefore, is a human convincing a computer that they are not a computer. If you write a program that automatically generates such a test on the internet, you have got yourself a CAPTCHA: a Completely Automated Public Turing test to tell Computers and Humans Apart.
A CAPTCHA test is a challenge-response test, often in the form of a cognitive task, that requires user interaction to determine whether or not the user is human.
Automated computer programs (bots) can be used for malicious activities such as spam, brute-force attacks, and competitive data mining. CAPTCHA is a gateway that protects websites, mobile apps, and APIs from these bad bots by presenting a challenge-response test that is hard for computer programs to bypass and easy for human users to pass.
Without CAPTCHA, spam and abuse would take over most platforms, and the internet ecosystem as we know it today would not exist.
Digital criminals use bot programs to commit fraud and abuse by automating malicious tasks such as credential stuffing attacks, email or comment spam, and online poll fraud. CAPTCHA keeps them from automating these tasks by ensuring there is a real human behind each request, which effectively prevents scammers from scaling their illicit operations.
The first and second generations of CAPTCHAs rested on a paradoxical premise: that humans are better than machines at recognizing images, numbers, and various objects. This means that as computer programs get better at recognizing characters and images, CAPTCHA challenges have to become increasingly difficult to prevent programs from bypassing them.
As a result, the difficulty of CAPTCHAs has risen steeply with advances in computing and AI technologies. Today, a better solution is the third-generation, no-knowledge CAPTCHA. Advanced CAPTCHA solutions are much easier to pass and far more secure than traditional CAPTCHA solutions.
CAPTCHAs are deployed at operational gateways such as login, register, submit, etc., to prevent computer programs from accessing and committing fraud and abuse.
The original idea behind CAPTCHA is to present cognitive challenges in the form of distorted texts that humans can recognize and pass easily while computer programs cannot.
However, with the advancement of computer technologies, traditional CAPTCHAs quickly became obsolete. Today, advanced CAPTCHA modules use risk analysis engines based on behavioral detection and artificial intelligence technologies to prevent sophisticated bot threats.
Invisible CAPTCHA is a pre-challenge mechanism that works as a risk analysis system. It is designed to reduce the user friction caused by challenge-response tests: visitors who raise no suspicion pass without interruption, while challenges are enforced only for suspicious users.
When a user clicks the CAPTCHA checkbox, the invisible CAPTCHA module analyzes parameters such as typing speed, cursor movements, and cookies to determine whether the user is a human or a bot.
A CAPTCHA test is triggered if the risk analysis engine cannot determine with confidence whether the user is a human or a bot. In that case, it marks the user as suspicious and presents a challenge-response test. Many factors go into the risk analysis, and they are the secret sauce of the system. Some known factors are parameters related to user behavior, such as operating speed, and the IP address.
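The triage described above can be sketched roughly as follows. This is a minimal illustration, not how any real product scores traffic: the signal names, weights, and thresholds are all assumptions, and a production engine would more likely use a trained model than hand-written rules.

```python
from dataclasses import dataclass


@dataclass
class Signals:
    """Hypothetical per-request signals (names are illustrative only)."""
    seconds_to_click: float   # time from page load to the checkbox click
    cursor_samples: int       # number of cursor positions recorded
    has_clean_cookie: bool    # cookie from an earlier session with no abuse
    ip_recent_failures: int   # failed challenges recently seen from this IP


def risk_score(s: Signals) -> int:
    """Return a score from 0 to 100; higher means more bot-like."""
    score = 0
    if s.seconds_to_click < 0.5:   # faster than a person usually reacts
        score += 40
    if s.cursor_samples < 5:       # almost no cursor movement
        score += 30
    if not s.has_clean_cookie:
        score += 10
    score += 5 * min(s.ip_recent_failures, 4)
    return min(score, 100)


def decide(score: int, allow_below: int = 30, block_from: int = 80) -> str:
    """Pass clear humans, block clear bots, challenge everything in between."""
    if score < allow_below:
        return "allow"
    if score >= block_from:
        return "block"
    return "challenge"
```

With these assumed weights, a visitor who takes a couple of seconds, moves the cursor, and carries a clean cookie is allowed straight through, while a request with no clear verdict either way receives the challenge.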
The purpose of captcha is to stop automation. However, with intelligent hacking tools widely available, traditional captchas (unlike enterprise-grade captchas) can be easily defeated using browser automation tools and captcha farms.
The first commercial deployment of a CAPTCHA-like system was done by Andrei Broder and his fellow engineers at the AltaVista search engine. They developed an automated filter to stop bots from submitting URLs, a type of black hat SEO that skewed AltaVista’s algorithm.
However, the term CAPTCHA was coined in 2003 by Luis von Ahn and his colleagues at Carnegie Mellon University in their publication “CAPTCHA: Using Hard AI Problems for Security.”
There are two aspects of CAPTCHA that benefit AI technology.
Firstly, CAPTCHA is a great way to feed data into machine learning algorithms, and such data is crucial for their effectiveness.
Let’s take image recognition as an example. A stop sign is a red octagon with white letters reading “STOP,” and it can be identified by computer programs reasonably easily. However, the stop sign within a picture can look very different depending on the angle of the photo, the lighting, the weather, etc.
If we can feed millions of real-world pictures of stop signs into a machine learning algorithm, it can become very accurate at identifying stop signs within an image. However, labeling millions of real-world images that include stop signs would require immense amounts of human labor. Alternatively, one could put those images into a 3x3 grid and ask people to select the ones that include a stop sign. Google has been training its image recognition algorithms through reCAPTCHA for many years now. Even though the system has been successful at labeling large numbers of images, it has severe flaws as an anti-bot solution due to the high user friction it creates.
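To make the labeling idea concrete, here is a minimal sketch of how selections from many users could be turned into training labels by majority agreement. It is an assumption for illustration only: the function name, the vote thresholds, and the input format are hypothetical and do not describe how reCAPTCHA actually aggregates answers.

```python
from collections import Counter


def aggregate_labels(responses, min_votes=3, min_agreement=0.7):
    """Turn (image_id, selected) votes into {image_id: contains_stop_sign}.

    An image is labeled only when enough users have seen it and a clear
    majority agrees; ambiguous images are left unlabeled.
    """
    yes = Counter()
    total = Counter()
    for image_id, selected in responses:
        total[image_id] += 1
        if selected:
            yes[image_id] += 1

    labels = {}
    for image_id, votes in total.items():
        if votes < min_votes:
            continue                      # not enough evidence yet
        share = yes[image_id] / votes
        if share >= min_agreement:
            labels[image_id] = True
        elif share <= 1 - min_agreement:
            labels[image_id] = False
    return labels
```

Requiring agreement across several users is one simple way to keep a few careless or malicious answers from polluting the labels.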
The second aspect of CAPTCHA that benefits AI technology is its role within the cybersecurity environment.
CAPTCHA was invented for the cybersecurity field to tell humans and bots apart using hard AI problems such as character or image recognition. Cybersecurity is a cat-and-mouse game between attackers and defenders in which both sides constantly try to best each other. When CAPTCHA is deployed as a tool on the side of the defenders, the only way for attackers to bypass it is through sophisticated solutions that give automated computer programs the ability to be more human-like.
Especially today, when advanced CAPTCHA solutions utilize AI in their defenses in one way or another, every time a CAPTCHA solution is breached, it likely means that a computer program has become more human-like and AI technology has moved a step further.
Over the past two decades, CAPTCHA has evolved through three generations to defend against increasingly sophisticated bad bots and to meet users' need for a smoother experience.
Standard CAPTCHA (also known as text-based CAPTCHA) became obsolete in 2014, when Google pitted one of its machine learning algorithms against humans at recognizing heavily distorted text.
Humans solved the challenges with only a 33% success rate while the computer had an accuracy of 99.8% in recognizing heavily distorted texts. This marked the end for first-generation text-based CAPTCHAs.
As computer technology advanced and bad bots became better at solving such puzzles, CAPTCHAs had to become increasingly difficult. Eventually, the user friction created by difficult CAPTCHAs became too severe, and advanced AI technology rendered second-generation gamified CAPTCHAs ineffective.
In the third generation, back-end risk analysis uses behavioral factors within a confined space, as well as environmental factors such as device reputation and hardware specifications, to tell genuine human behavior apart from automated imitations of it. AI-powered bots can mimic human behavior, so an AI-powered CAPTCHA is a necessity to stop advanced bot threats.
The CAPTCHA solutions that rely on the difficulty of cognitive tasks to ensure security will neither find security nor acceptable usability.
Yes, captcha entry jobs are a thing. There are online platforms as well as physical “captcha farms” that offer $0.5 to $2 per 1000 captchas solved.
Workers can register directly on online earning platforms or be hired through freelancer websites. After downloading the necessary software, they are presented with their work material: captchas.
Captcha solving farms are perceived as a serious cybersecurity threat by captcha and bot management systems as they allow attackers to utilize genuine humans for their operations. Nonetheless, advanced captcha solutions such as GeeTest can defeat captcha-solving farms through sophisticated environment and origin detection techniques.
Although any website with a sensitive login or registration module should use a CAPTCHA, there are six industries that benefit significantly more from an advanced CAPTCHA solution.
Most websites or mobile apps with a critical operation such as login, registration, or form submission need a CAPTCHA to prevent automated attacks. With machine learning and cloud computing tools now available and affordable, bad actors can reach further and hit harder than ever before.
Without a strong deterrent such as an advanced CAPTCHA solution, bot attacks are only a matter of when. Advanced CAPTCHAs sharply increase the cost of an attack and are a necessity for the protection of most websites, mobile apps, and APIs.
Advanced CAPTCHA is a business imperative, not an IT imperative.
Captcha is undeniably a necessity for all online platforms with critical operations such as login, registration, submission, etc., to defend against automated (bot) threats. When implementing a captcha solution, however, there are certain advantages, disadvantages, trade-offs, and precautions that one should be aware of.
Increased User Friction: Captcha uses a challenge-response to tell apart genuine humans from machines, which adds friction to the user experience. The amount of friction varies significantly among different captcha solutions from 1.6 seconds to well over 30 seconds. Moreover, advanced captcha solutions utilize risk analysis engines to limit challenge responses only to suspicious users, therefore effectively reducing the overall user friction further.
Limited Number of Effective Solutions: First-generation text-based captchas and second-generation gamified captchas are obsolete against sophisticated bot threats that are commonplace on the internet today. The effectiveness of a captcha solution mostly depends on its innovative technologies, such as artificial intelligence and behavioral analysis. There are only a handful of advanced captcha providers today that utilize such technologies, so the options are somewhat limited.
Without a captcha, online platforms are open to significant fraud and abuse, but an obsolete captcha can be just as dangerous, since it creates a false sense of security and an atrocious user experience.
When it comes to fighting modern bot threats, the goal for all advanced solutions is the same: to distinguish genuine human behavior from automated imitations of it. While CAPTCHA achieves this goal through interaction, some systems detect bots by analyzing the entirety of the website traffic.
Even though these expensive bot detection systems are deployed network-wide and provide security for the whole website instead of just essential operational gateways, they still encounter large volumes of suspicious traffic. How do detection-based systems deal with it? Blocking it outright risks a high rate of false positives and reduced conversion rates for the website, while allowing it through makes the system vulnerable to bot attacks. This is where advanced CAPTCHA solutions come into play, making the final judgment on the suspicious traffic and reducing the rate of false positives to a minimum. A defence-in-depth approach that layers both can maximize security; however, the security gained over an advanced CAPTCHA alone is negligible at best.
There is no 100% protection in cybersecurity, yet advanced CAPTCHA solutions can get pretty close. When the significant cost benefits compared to the alternatives are accounted for, advanced CAPTCHA is a clear business imperative and a must-have as the first line of defense against automated threats.
There are five major techniques that fraudsters utilize when hacking a CAPTCHA system: machine learning, reverse-library and brute-force attacks, API hacking, captcha-solving farms, and browser automation.
These techniques can be chosen based on the target CAPTCHA system or combined with one another to further increase the sophistication of the bot programs, which can then be used for large-scale fraud operations.
CAPTCHA, a static mechanism by nature, is a perfect problem for an ML algorithm to solve. Cybercriminals can build a dataset from the targeted captcha challenges, which are readily available on the internet, and train a model with a supervised learning algorithm. With enough data and a good model architecture, the model can achieve high enough accuracy.
This type of attack is often directed towards less sophisticated or in-house captcha solutions implemented by smaller websites.
The attackers can collect the entire database of questions or images with a simple script, then have the questions answered or the images labeled through a third-party service. The attackers then have answers to the entire question bank, which renders the captcha useless.
Another method is to let the bot program attempt the captchas with random answers. Every time the program succeeds, it records the correct answer for that question. The program eventually acquires the entire database of questions with their correct answers, rendering the captcha useless.
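A toy simulation shows why a small, static question bank offers so little protection. Everything here is hypothetical and runs purely in memory against a made-up bank; for simplicity it tries the options in order rather than randomly, which gives the same worst-case bound.

```python
def learn_static_bank(bank, options):
    """Simulate guessing against a fixed question bank.

    bank:    {question: correct_answer}, standing in for the captcha's
             static pool (hypothetical data, not a real service).
    options: the answers a guesser can choose from.
    Returns the learned answers and the number of attempts used.
    """
    learned = {}
    attempts = 0
    for question, correct in bank.items():
        for guess in options:
            attempts += 1
            if guess == correct:          # the captcha accepted the guess
                learned[question] = guess
                break
    return learned, attempts


toy_bank = {"q1": "cat", "q2": "dog", "q3": "car"}
learned, attempts = learn_static_bank(toy_bank, ["car", "cat", "dog"])
```

The number of attempts is bounded by the bank size times the number of options, so the whole bank falls quickly. Once the pool is continuously refreshed, answers learned this way stop being useful.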
API hacking in the context of CAPTCHAs refers to returning fake responses to the challenges, often through JavaScript injection, in an attempt to fool the back-end system.
When legitimate users interact with CAPTCHAs in the front end, the data is retrieved and processed based on specific rules. This data passes verification because the back-end programs recognize that it has been processed with the correct rules. The rules are the logic that governs the interaction between the front end and the back end, including the data type, format, and encryption. An attacker who reverse-engineers these rules can forge data that the back end accepts.
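One common way for a back end to check that a response really belongs to a challenge it issued is a signed, short-lived token. The sketch below is an assumption for illustration: it uses an HMAC signature rather than the encryption the text mentions, and the token layout, key, and function names are hypothetical rather than any vendor's actual protocol.

```python
import hashlib
import hmac
import time

# Hypothetical key; in practice it would live only on the server.
SECRET_KEY = b"server-side-secret"


def _sign(payload: str) -> str:
    return hmac.new(SECRET_KEY, payload.encode(), hashlib.sha256).hexdigest()


def issue_token(challenge_id: str, now=None) -> str:
    """Create 'challenge_id:issued_at:signature' when a challenge is served."""
    issued_at = int(time.time() if now is None else now)
    payload = f"{challenge_id}:{issued_at}"
    return f"{payload}:{_sign(payload)}"


def verify_token(token: str, max_age: int = 120, now=None) -> bool:
    """Accept only untampered tokens that are at most max_age seconds old."""
    try:
        challenge_id, issued_at, signature = token.rsplit(":", 2)
        issued = int(issued_at)
    except ValueError:
        return False                      # wrong shape or non-numeric time
    expected = _sign(f"{challenge_id}:{issued_at}")
    # Constant-time comparison avoids leaking how much of the signature matched.
    if not hmac.compare_digest(signature.encode(), expected.encode()):
        return False
    current = time.time() if now is None else now
    return 0 <= current - issued <= max_age
```

Because the key never leaves the server, a script injected into the page cannot mint a valid token for a challenge that was never issued, and the expiry limits how long a captured token can be replayed.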
A captcha-solving farm is a captcha recognition service in which captchas are routed through an API to human workers who solve them remotely.
Since CAPTCHA challenges are designed to determine whether the user behind the request is human, captcha-solving farms are perhaps the most legit way to bypass captcha systems. Today, these services can be acquired for approximately $1.50 per 1000 captchas solved.
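The economics behind that price are simple arithmetic. The short sketch below uses the article's figure of roughly $1.50 per 1000 solves as its default; the function name and the example volume are illustrative assumptions.

```python
def farm_cost_usd(captchas: int, price_per_thousand: float = 1.5) -> float:
    """Cost of having a solving farm handle the given number of captchas."""
    return captchas / 1000 * price_per_thousand


# One million solved captchas at the quoted rate.
million_cost = farm_cost_usd(1_000_000)
```

At that rate each solve costs a small fraction of a cent, so a farm-backed attack pays off whenever one successful request is worth more than that to the attacker. This is why detecting the origin of farm traffic matters more than making the puzzle harder.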
Browser automation tools, also referred to as headless browsers, allow the execution of a full version of the browser while controlling it programmatically. This means that these tools can run without a graphical user interface (GUI).
Browser automation is not a direct captcha hacking technique per se; rather, it is a powerful tool that makes bot programs appear more human-like and extremely difficult for bot detection systems to detect.
CAPTCHA hacking methods are abundant, and tools of hacking are easily accessible. With these readily available tools and techniques, traditional captchas are easily bypassed, making the sites vulnerable to malicious automated attacks.
However, with the introduction of advanced CAPTCHAs, the era of captcha is far from over. When a captcha is integrated with a back-end engine, the possibilities for sophistication are far and wide. The most prominent of them fall under three main categories: environment detection, behavioral analysis, and continuously updated challenge resources.
Environment detection refers to information retrieved from the user's computer environment, such as hardware specifications, connected devices, screen size, and browser properties and version.
Using elaborate machine learning models for advanced risk analysis, this environmental information can be used to detect browser automation tools accurately. By taking browser automation tools out of hackers' arsenal, an advanced captcha solution can significantly limit their ability to stay under the radar and scale their fraudulent operations.
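As a rough illustration of environment detection, the sketch below checks a few well-known automation giveaways with simple rules. It is a stand-in for the machine learning models the text describes, not an implementation of them. The dictionary keys are hypothetical names for values a front-end script might report; the `navigator.webdriver` flag and the `HeadlessChrome` user-agent token are real browser behaviors, but determined attackers can hide both.

```python
def automation_indicators(env: dict) -> list:
    """Return human-readable reasons the environment looks automated.

    env holds values reported by a front-end script; the key names used
    here are illustrative assumptions.
    """
    found = []
    if env.get("webdriver") is True:
        found.append("navigator.webdriver is true")
    if "HeadlessChrome" in env.get("user_agent", ""):
        found.append("headless user agent")
    if env.get("screen_width", 0) == 0 or env.get("screen_height", 0) == 0:
        found.append("zero-sized screen")
    if env.get("plugins_count", 0) == 0 and not env.get("is_mobile", False):
        found.append("no plugins on a desktop browser")
    return found
```

No single indicator is conclusive, which is why such signals are better treated as inputs to a risk score than as grounds for blocking on their own.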
While strong front-end encryption and dynamic honeypots can mitigate the threat of API hacking, sophisticated origin detection techniques can pinpoint requests from captcha farms.
The integration of behavioral analysis into captcha allows challenges to be less about the ‘correct’ answer and more about ‘the method’ of acquiring the answer.
Biometric data generated through the user’s interaction with the captcha module is used in the risk analysis engine to determine whether the behavior belongs to a human or a machine. This is a dramatic change in the logic of bot defense compared to older generations of captchas and a crucial feature of any relevant advanced captcha solution.
A biometric classification model within a captcha module means that merely using ML and OCR to crack the challenge is not enough. An automated program has to not only crack the challenge but also do so while perfectly mimicking human behavior. Generating biometric data that is genuinely human-like enough to pass the risk analysis engine, though possible, still introduces enough limitations to prevent “a successful bot attack” from occurring.
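To give a feel for what "the method of acquiring the answer" can mean in practice, the sketch below derives two simple features from a cursor trajectory. It is a deliberately naive assumption-based example: the feature names and thresholds are hypothetical, and a real engine would feed many more features into a trained classifier.

```python
import math
from statistics import pstdev


def trajectory_features(points):
    """Compute features from (x, y, t_seconds) cursor samples in time order."""
    if len(points) < 2:
        raise ValueError("need at least two cursor samples")
    steps = list(zip(points, points[1:]))
    dists = [math.hypot(b[0] - a[0], b[1] - a[1]) for a, b in steps]
    speeds = [d / (b[2] - a[2]) for d, (a, b) in zip(dists, steps) if b[2] > a[2]]
    path_len = sum(dists)
    direct = math.hypot(points[-1][0] - points[0][0], points[-1][1] - points[0][1])
    return {
        # 1.0 means a perfectly straight line from start to end.
        "straightness": direct / path_len if path_len else 1.0,
        # Humans speed up and slow down; naive scripts move at constant speed.
        "speed_stdev": pstdev(speeds) if speeds else 0.0,
    }


def looks_scripted(features, min_speed_stdev=1.0, min_straightness=0.999):
    """Flag trajectories that are both perfectly straight and constant-speed."""
    return (features["speed_stdev"] < min_speed_stdev
            and features["straightness"] > min_straightness)
```

A script that drags a slider in a straight line at constant speed is flagged, while a hand-drawn path with small detours and uneven speed is not. Bots can add noise to evade such simple checks, which is why the text stresses classification models over fixed rules.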
Once a CAPTCHA is presented to a user, the image used within the challenge becomes public. This means hackers can use these images to train a machine learning model or use them for reverse-library attacks. Therefore, images used within the challenges can pose a threat to the security of the captcha.
By continuously updating the resource pool and encrypting the images used within the challenges, advanced captchas can prevent reverse-library and brute-force attacks, significantly increasing the cost of attempting an attack.
Captchas are very beneficial to secure your website, mobile app, and API against bot fraud and abuse. Still, for adequate protection, you should only choose among the few advanced captcha solutions that are available today.
CAPTCHA is an interactive security approach to detecting and mitigating bad bot threats. User experience is a significant differentiating factor for advanced captcha solutions, because user friction directly affects the success of business operations through conversion rates and revenue.
When it comes to captcha technology, the privacy concern is whether the data collected by the captcha system can tell which specific human you are.
Marcos Perona, AdTruth's lead engineer, has raised such privacy concerns about No CAPTCHA reCAPTCHA. He found that it isn't overtly labeled as a Google service, yet anyone clicking through it "consents" to be tracked by Google's cookies. The combination of first-party cookies and a browser fingerprint can be tied back to an individual. Most individuals merely clicking "I'm not a robot" won't know this is happening behind the scenes.
Because bad bot threats are increasingly sophisticated, the options for mitigating them are somewhat limited, and behavioral analysis is hailed as the most promising solution. Most advanced captchas have introduced a risk analysis engine based on behavioral and environmental data because, at this stage, it is a necessity.
Does the data collected for behavioral analysis threaten privacy? Such data alone is simply insufficient to tell which specific human is behind a request and cannot be used for tracking an individual. Thus, it is safe to say that advanced captcha solutions alone do not pose a threat to users' privacy.
Yes, with advanced OCR (Optical Character Recognition) technology, computers can recognize distorted and warped text better than humans. Text-based captcha is easier for bots and harder for humans.
Yes, advanced machine learning technology allows computer programs to be easily trained to handle cognitive tasks such as those presented in second-generation captcha solutions. The logic that a harder captcha equals more security is inherently paradoxical, and as computer technologies advanced, it became evident that challenging machines on such tasks is ineffective.
Not yet, but it is eventually expected to be right around the singularity. Jokes aside, current machine models can mimic human behavior up to a certain degree; however, they would have to do that 100% flawlessly not to get detected by sophisticated behavioral detection tools.
If your online business operations are valuable for your business, you should go with an enterprise-grade captcha that is secure and provides a seamless user experience and 24/7 support. Enterprise-grade captchas can stay up to date with emerging threats, and mitigate the risk of your business being a target of sophisticated bad bots.
If you do not have a cybersecurity budget for an anti-bot solution and your business doesn’t carry significant value through its website or mobile app, then you should use a free captcha solution provided by Google. Beware, though: hackers can still bypass reCAPTCHA through various means, and if you are explicitly targeted, it will not stop the sophisticated bots coming your way. Also consider the cost of increased user friction and the privacy concerns of some of your users.
Honeypots are often pointed to as the frictionless alternative to captchas; however, honeypots alone are not sufficient to stop spambots. Modern bots are smart enough to overcome such tricks with ease, and stopping them requires more advanced solutions than a simple honeypot.
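For readers unfamiliar with the technique, a honeypot is typically a form field hidden from human visitors, so any value in it suggests an automated submission. The check below is a minimal sketch; the field name is a hypothetical choice, and the hiding itself would be done with CSS in the page.

```python
def is_probable_spambot(form: dict, honeypot_field: str = "website_url") -> bool:
    """Return True when the hidden honeypot field was filled in.

    form is the dict of submitted fields. Humans never see the hidden
    field and leave it empty; naive bots fill in every field they find.
    """
    return bool(str(form.get(honeypot_field, "")).strip())
```

The weakness is visible in the code itself: a bot that renders the page or simply skips hidden fields submits an empty value and passes, which is why a honeypot works only as one signal among several.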
Two-factor authentication is an identity verification system that validates the users’ real identity while a captcha is used to determine whether the user is a human or a bot and cannot answer which specific human is behind the request. Captcha and 2FA/MFA solve different problems; therefore, they are not an alternative to one another.
When it comes to fighting modern bot threats, the challenge for all advanced solutions is the same: to detect non-human behavior within the online traffic. While CAPTCHA stops this non-human traffic with the help of an interactive challenge, bot managers do so by analyzing the entire traffic to understand users’ intent.
Both bot managers and advanced captchas use a risk analysis engine, and if the particular product has any relevance in today’s cybersecurity space, then it is likely to be using a machine learning model that requires a very specific, non-universal dataset. The behavioral data observed by advanced captchas are restricted within a pre-defined space, and this exact same space expands across the entire network of the captcha provider, which results in easy access to relevant data flow, thus a more accurate machine learning model. Moreover, when the risk analysis engine of a bot manager cannot ensure whether the user is an actual human or a bot, the suspicious traffic is directed to a challenge-response of a captcha system.
Advanced CAPTCHAs can be used as stand-alone solutions to mitigate up to 98% of automated threats, and the benefit of bot managers over CAPTCHAs is minimal at best. In the field of cybersecurity, there is no 100% safety and in the endeavor to stop online fraud & abuse caused by automated programs, advanced CAPTCHAs can successfully break the business model of the fraudsters.
Industry experts hail machine learning as the most promising solution against automated threats. When behavioral analysis is integrated into captcha, the challenge-response becomes a way to collect biometric data instead of a cognitive challenge. This is a dramatic change in the logic of bot defense compared to older generations of captchas.
The biometric data is used in the risk analysis engine to determine whether the behavior is human or machine, which allows challenges to be much harder for bots and a lot easier for humans. This type of captcha is often referred to as “Advanced CAPTCHA”.
Advanced CAPTCHA is an anti-bot program that distinguishes itself from regular CAPTCHAs by utilizing a risk analysis engine, often based on behavioral analysis, over or within its challenge-response mechanism.
While legacy captcha systems ensured security through the difficulty of the challenge-response, advanced captchas focus instead on behavioral characteristics of the traffic, providing higher security with less user friction.
Advanced CAPTCHA analyzes the behavioral and environmental factors to find non-human features within the traffic. If the risk analysis engine cannot determine whether the user is a human or a bot, the suspicious traffic goes through a challenge-response mechanism.
The risk analysis engine helps reduce the friction for some users while, at the same time, detecting and blocking automation or captcha hacking tools directly. When there is not enough evidence to make a precise judgement, the challenge-response is used to collect further evidence.