14 May 2020 • 10 min read
Advancing computer technologies allow bad bots to reach farther across the web, and wherever they find value to extract through automation, they strike harder.
CAPTCHA hacking has never been easier, thanks to the intelligent tools widely available today. A security solution may give you a sense of safety, but that sensation is often false and short-lived, endangering your online business along the way.
Is CAPTCHA dead? Not by a long shot: CAPTCHA remains a necessity for stopping malicious automation (bad bots). But to understand what makes a CAPTCHA secure, you must first understand what the CAPTCHA-hacking landscape looks like.
There are five major techniques that fraudsters use when hacking a CAPTCHA system:
These techniques can be used individually, depending on the target CAPTCHA system, or combined to increase the sophistication of bot programs, which can then be deployed in large-scale fraud operations.
Machine learning empowers programmers to find optimized solutions for specific tasks or problems. ML achieves this by using sample data to build a mathematical model, which can then perform the task without explicit instructions.
CAPTCHA, a static mechanism by nature, is a perfect problem for an ML algorithm to solve. Cybercriminals can build a dataset from the targeted captcha challenges, which are readily available on the internet, and train a model with a supervised learning algorithm. With enough data and a good model architecture, the model can reach sufficiently high accuracy.
Researchers from Lancaster University, Northwest University, and Peking University used a GAN (Generative Adversarial Network), a type of deep neural network, to create a captcha solver for text-based captcha schemes. The model required only 500 images to train and could crack text-based captchas with 80% accuracy. The researchers applied their model to 33 text-based CAPTCHA schemes, including those used by 32 of the world's most popular websites, such as Google, Microsoft, eBay, Wikipedia, and Baidu.
It is evident that text-based CAPTCHAs are no longer a viable option on today's internet. ML algorithms can also be used to crack other forms of interactive CAPTCHA; however, ML is not a silver bullet, and there are ways to prevent ML-based threats, which we will discuss shortly.
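To see why a static scheme falls, consider a toy illustration: a hypothetical text captcha rendered from a fixed three-glyph font. The glyph bitmaps and the whole scheme below are invented for the demo, and with a fixed rendering, even plain template matching decodes the challenge; no deep learning is required.

```python
# Toy illustration: a static text-captcha scheme rendered from a fixed
# glyph set can be decoded by simple template matching.
# The 3x4-pixel glyph bitmaps are invented for this demo.

GLYPHS = {
    "A": ("010", "101", "111", "101"),
    "B": ("110", "101", "110", "111"),
    "C": ("011", "100", "100", "011"),
}

def render(text):
    """Render a captcha image by concatenating glyph columns, as a static scheme would."""
    rows = ["" for _ in range(4)]
    for ch in text:
        for i, glyph_row in enumerate(GLYPHS[ch]):
            rows[i] += glyph_row
    return tuple(rows)

def solve(image):
    """Segment the image into 3-column slices and match each slice against the templates."""
    width = len(image[0])
    answer = ""
    for start in range(0, width, 3):
        segment = tuple(row[start:start + 3] for row in image)
        answer += next(ch for ch, glyph in GLYPHS.items() if glyph == segment)
    return answer

print(solve(render("CAB")))  # -> CAB
```

A real attack replaces the exact-match step with an ML classifier that tolerates noise and distortion, but the underlying weakness is the same: the challenge's visual vocabulary never changes.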
This type of attack is often directed at less sophisticated or in-house captcha solutions implemented by smaller websites.
Attackers can collect the entire database of questions or images with a simple script, then have the questions answered or the images labeled by a third-party service. The attackers then hold answers to the entire question bank, rendering the captcha useless.
Another method is simply to let the bot program attempt captchas at random. Each time it succeeds, it records the correct answer for that question; the program eventually acquires the entire question database with correct answers, again rendering the captcha useless.
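The random-guess harvest can be sketched in a few lines. Everything here is a simulation: the five-question bank, the answer options, and the server-side check are all invented stand-ins for a small in-house captcha.

```python
import random

# Sketch of a "reverse library" harvest: the bot answers randomly, records
# which answer passed for each question, and eventually owns the whole bank.

BANK = {f"q{i}": f"answer{i}" for i in range(5)}  # invented question bank

def serve_challenge():
    """Server picks a random question and returns it with a pass/fail check."""
    q = random.choice(list(BANK))
    return q, (lambda guess, q=q: guess == BANK[q])

def harvest(options, max_attempts=10_000):
    """Guess randomly; remember every answer that passed."""
    learned = {}
    for _ in range(max_attempts):
        q, check = serve_challenge()
        guess = learned.get(q, random.choice(options))
        if check(guess):
            learned[q] = guess           # note the winning answer
        if len(learned) == len(BANK):
            break
    return learned

options = [f"answer{i}" for i in range(5)]
print(harvest(options))  # the learned map converges to the full bank
```

With a small, static bank the bot needs only a handful of lucky guesses per question before the captcha stops filtering it at all, which is why this attack is so effective against fixed question pools.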
An API (Application Programming Interface) is a set of functions and procedures that allows applications to access the features or data of an operating system, application, or other service.
API hacking, in the context of CAPTCHAs, refers to returning fake responses to the challenges (often through a JavaScript injection) in an attempt to fool the back-end system.
When legitimate users interact with CAPTCHAs in the front end, the data is retrieved and processed according to specific rules. This data passes verification because the back-end recognizes that it was processed with the correct rules. The rules are the logic that governs the interaction between the front end and the back end, including the data type, format, and encryption.
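A minimal sketch of such a rule, assuming the rule is an HMAC signature over the challenge result: a forged response injected without knowledge of the signing key fails back-end verification. The key and payload names are invented for the illustration, and note the caveat that if the signing logic itself runs in the user's browser, the "secret" is exposed, which is precisely the opening that JavaScript-injection attacks exploit.

```python
import hashlib
import hmac

# Sketch of a front-end/back-end verification rule: responses must carry
# an HMAC signature the attacker cannot produce without the key.

SECRET = b"server-side-secret"  # invented key for the demo

def frontend_sign(payload: bytes) -> str:
    """The 'correct rules': sign the challenge result before submitting it."""
    return hmac.new(SECRET, payload, hashlib.sha256).hexdigest()

def backend_verify(payload: bytes, signature: str) -> bool:
    """Back end recomputes the signature and compares in constant time."""
    expected = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature)

genuine = frontend_sign(b"challenge-passed")
forged = "0" * 64  # a fake response injected without knowing the key

print(backend_verify(b"challenge-passed", genuine))  # True
print(backend_verify(b"challenge-passed", forged))   # False
```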
Captcha-solving farms are automated captcha-recognition services in which captchas are forwarded, through an API, to human workers who solve them remotely.
Since CAPTCHA challenges are designed to determine whether the user behind a request is human, captcha-solving farms are perhaps the most legitimate way to bypass captcha systems. Today, these services can be acquired for approximately $1.5 per 1,000 captchas solved.
At the scale of brute-force attacks, captcha farms may not be profitable; however, when leaked user credentials from a data breach are involved, captcha-solving farms pose a great danger. Detecting them is not impossible, but it requires considerable sophistication on the captcha provider's side.
Browser automation tools, also referred to as headless browsers, allow a full version of the browser to be executed while being controlled programmatically, meaning these tools can run without a graphical user interface (GUI).
Browser automation is not a direct captcha-hacking technique per se; rather, it is a powerful tool that lets bot programs appear more human-like, making them extremely difficult for bot detection systems to spot.
CAPTCHA-hacking methods are abundant, and the tools are easily accessible. With these readily available tools and techniques, traditional captchas are easily bypassed, leaving sites vulnerable to malicious automated attacks.
However, with the introduction of advanced CAPTCHAs, the era of captcha is far from over. When integrated with a back-end engine, the possibilities for captcha sophistication are vast. The most prominent of these fall under three main categories:
Environment detection refers to information retrieved from the user's computer environment, such as hardware specifications, attached devices, screen size, and browser properties and version.
Using elaborate machine learning models for advanced analysis, this environmental information can accurately detect browser automation tools. By removing browser automation tools from the attacker's arsenal, an advanced captcha solution significantly limits hackers' ability to stay under the radar and scale their fraudulent operations.
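A hedged sketch of the idea, reduced to a rule-based scorer: score a fingerprint against known automation "tells". The attribute names mirror real browser properties (the `navigator.webdriver` flag, headless user-agent strings, plugin count), but the weights and the specific rules are invented for illustration; a production system would feed such features into a trained model rather than fixed rules.

```python
# Sketch of environment detection: score a browser fingerprint against
# known automation tells. Weights and threshold logic are invented.

AUTOMATION_TELLS = [
    ("webdriver_flag", lambda env: env.get("webdriver") is True, 0.6),
    ("headless_ua",    lambda env: "HeadlessChrome" in env.get("user_agent", ""), 0.5),
    ("no_plugins",     lambda env: env.get("plugin_count", 0) == 0, 0.2),
    ("zero_screen",    lambda env: env.get("screen_width", 1) == 0, 0.4),
]

def risk_score(env: dict) -> float:
    """Sum the weights of every tell that fires, capped at 1.0."""
    return min(1.0, sum(w for _, fires, w in AUTOMATION_TELLS if fires(env)))

human = {"webdriver": False, "user_agent": "Mozilla/5.0 Chrome/81.0",
         "plugin_count": 3, "screen_width": 1920}
bot = {"webdriver": True, "user_agent": "HeadlessChrome/81.0",
       "plugin_count": 0, "screen_width": 0}

print(risk_score(human), risk_score(bot))  # 0.0 1.0
```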
While strong front-end encryption and dynamic honeypots can mitigate the threat of API hacking, sophisticated origin-detection techniques can pinpoint requests coming from captcha farms.
Sophisticated environment detection is a must-have for any advanced captcha to stay relevant in today's bot detection and mitigation market; however, it cannot ensure security on its own.
The integration of behavioral analysis into captcha allows challenges to be less about the ‘correct’ answer and more about ‘the method’ of acquiring the answer.
Biometric data generated through the user's interaction with the captcha module is fed into a risk analysis engine to determine whether the behavior belongs to a human or a machine. This is a dramatic change in the logic of bot defense compared with older generations of captchas, and a crucial feature for any relevant advanced captcha solution.
A biometric classification model within a captcha means that merely using ML and OCR to crack the challenge is not enough: an automated program must not only crack the challenge but do so while convincingly mimicking human behavior. Generating genuinely human-like biometric data to pass the risk analysis engine, though possible, introduces enough limitations to prevent a successful bot attack.
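One behavioral feature can be sketched concretely: the "straightness" of the cursor path while solving the challenge. Scripted bots tend to move in a near-perfect line (path length roughly equals straight-line distance), while humans wander. The sample trajectories below, and the choice of this single feature, are illustrative only; a real engine combines many such features in a trained classifier.

```python
import math

# Sketch of one behavioral feature: ratio of travelled path length
# to straight-line distance between start and end points.

def path_straightness(points):
    """1.0 for a perfectly straight path; larger for a wandering one."""
    path = sum(math.dist(a, b) for a, b in zip(points, points[1:]))
    direct = math.dist(points[0], points[-1])
    return path / direct

bot_path = [(x, x) for x in range(0, 100, 10)]           # perfect line
human_path = [(0, 0), (12, 5), (20, 18), (35, 14),
              (48, 30), (61, 27), (75, 44), (90, 90)]    # wandering

print(path_straightness(bot_path))    # ~1.0
print(path_straightness(human_path))  # noticeably > 1.0
```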
The amount of information a captcha module can retrieve from the user's interaction and environment is simply not enough to answer the question "who is the person behind this request?" Therefore, an advanced captcha module alone does not pose a threat to the user's privacy.
However, if the data retrieved from the user's environment and interaction with the module is combined with more data (through shared cookies) by the same entity, then it may pose a threat to the user's privacy. As a standalone security solution, advanced captchas do not.
A defense that is static in nature will be breached, no matter how sophisticated it may be.
Once a CAPTCHA is presented to a user, the image used in the challenge becomes public. Hackers can use these images to train a machine learning model or feed reverse-library attacks. The images used within the challenges can therefore pose a threat to the captcha's own security.
By continuously updating the resource pool and encrypting the images used within the challenges, advanced captchas can prevent reverse-library and brute force type of attacks, significantly increasing the cost of attempting an attack.
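A minimal sketch of a rotating pool, under invented assumptions: challenges are drawn from a refreshable set of images and served under one-time opaque tokens, so a harvested challenge cannot be replayed or matched back by a stable identifier. The pool contents and tokenization scheme here are illustrative, not any provider's actual design.

```python
import hashlib
import itertools
import secrets

# Sketch of a rotating challenge pool with one-time tokens:
# harvested challenges cannot be replayed, and images carry no stable handle.

class ChallengePool:
    def __init__(self, images):
        self.cycle = itertools.cycle(images)  # stand-in for a refreshed pool
        self.live_tokens = {}

    def issue(self):
        """Serve the next image under a fresh, opaque, single-use token."""
        image = next(self.cycle)
        token = secrets.token_hex(16)
        self.live_tokens[token] = hashlib.sha256(image).hexdigest()
        return token

    def redeem(self, token):
        """A token is valid exactly once; replaying a harvested one fails."""
        return self.live_tokens.pop(token, None)

pool = ChallengePool([b"img-a", b"img-b", b"img-c"])
t = pool.issue()
print(pool.redeem(t) is not None)  # True  (first use)
print(pool.redeem(t) is None)      # True  (replay rejected)
```

Continuously refreshing the underlying image set on top of this raises the cost of building a reverse library faster than attackers can harvest it.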
There is a wide array of CAPTCHA-hacking methods that are easily accessible and readily available. While most legacy captcha systems are utterly ineffective against bad bots, some solutions, such as Google's reCAPTCHA, can slow down attacks at the cost of very high friction for your legitimate users. These solutions, though sufficient for a small website or a personal blog, fall short when it comes to preventing bot fraud and the financial losses it inflicts on an online business.
For any online business with sensitive login/sign-up pages, various form submissions, or gift-card-style promotions, monetary value can be extracted through automation, which means it will be targeted by maliciously automated programs (a.k.a. bad bots).
To prevent malicious automation from harming your online business, we at Geetest have developed a CAPTCHA solution based on cutting-edge AI technologies and deep learning models. It not only secures your business against malicious automation but also provides an unmatched user experience, maximizing the business value of a bot defense solution.
Geetest is the world’s leading enterprise-grade truly AI-powered CAPTCHA solution that is protecting over 290,000 online businesses worldwide.