09 Apr 2020 • 10 min read
Bad bots are used maliciously to distribute spam, conduct DDoS attacks, operate as malware command-and-control infrastructure, or flood public forums with fraudulent commentary.
Looking at the history of ever-evolving bot threats and anti-bot solutions, three considerations will shape the future development of CAPTCHA technology: security, user experience, and privacy.
In order to evaluate the security of an advanced captcha solution, we must first look at the captcha hacking landscape and understand the various threats and angles of attack.
Know your enemy and know yourself; in a hundred battles, you will never be in peril.
- Sun Tzu, The Art of War
There are many ways to crack or bypass first and second-generation captchas; however, when it comes to the security of an advanced captcha, there are three major threats.
AI has generated a lot of buzz in the cybersecurity industry in recent years, and because machine learning and OCR tools are open source and widely available, cybercriminals have adopted them too. Advanced optical character recognition (OCR) and machine learning (ML) technologies have caused a fundamental shift in the logic of CAPTCHA technology: cognitive challenges have become futile, because these technologies make the captcha we know obsolete.
Any challenge that humans can solve, a machine learning model can be trained to solve better.
Browser automation tools, also referred to as headless browsers, run a full version of the browser while controlling it programmatically, which means they can run without a graphical user interface (GUI).
Browser automation tools allow bot programs to appear more human-like, and they are extremely difficult to detect and prevent.
A captcha-solving farm is an automated captcha recognition service in which captchas are routed through an API to human workers, who solve them remotely. This approach exploits the fundamental logic of captcha, which is to distinguish automated computer programs from genuine humans.
Even though neither captcha-solving farms nor the act of captcha solving is illegal, it is evident that cybercriminals use captcha-solving farms for illicit purposes.
All security solutions have more than one trick in their bag, and advanced captchas are no different; many approaches have been tested and applied with varying degrees of success over the years. However, to counter the most advanced and sophisticated threats, the list comes down to behavioral detection, environment detection, and a dynamic approach to security.
Behavioral detection using machine learning is hailed as one of the most promising solutions against automated threats by industry experts and one of the high-potential use cases for improving cybersecurity.
Integrating behavioral analysis into captcha turns the challenge into a way to collect biometric data rather than a cognitive test. The risk analysis engine uses this biometric data to determine whether the behavior is human or machine, which is a dramatic change in the logic of bot defense compared to older generations of captchas.
This means that merely using ML and OCR to crack the challenge is not enough. An automated program has to not only crack the challenge but also do so while perfectly mimicking human behavior. Generating biometric data that is convincingly human enough to pass the risk analysis engine, though possible, still introduces enough limitations to prevent “a successful bot attack.”
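To make the idea of behavioral signals concrete, here is a minimal Python sketch. It is not GeeTest's actual engine. It assumes a hypothetical input format (a list of `(x, y, t_ms)` pointer samples from a slider drag) and hypothetical thresholds; the feature names `speed_cv` and `straightness` are illustrative only. A real risk analysis engine would learn from far richer data rather than apply fixed rules.

```python
import math
from statistics import mean, pstdev


def trajectory_features(points):
    """Summarize a drag trajectory given as (x, y, t_ms) samples (hypothetical format)."""
    if len(points) < 2:
        raise ValueError("need at least two samples")
    speeds = []
    path_length = 0.0
    for (x0, y0, t0), (x1, y1, t1) in zip(points, points[1:]):
        dist = math.hypot(x1 - x0, y1 - y0)
        path_length += dist
        if t1 > t0:
            speeds.append(dist / (t1 - t0))
    direct = math.hypot(points[-1][0] - points[0][0], points[-1][1] - points[0][1])
    avg_speed = mean(speeds) if speeds else 0.0
    return {
        # Coefficient of variation of speed: humans accelerate and decelerate.
        "speed_cv": pstdev(speeds) / avg_speed if avg_speed > 0 else 0.0,
        # Ratio of straight-line distance to path length: 1.0 is a perfect line.
        "straightness": direct / path_length if path_length > 0 else 1.0,
        "duration_ms": points[-1][2] - points[0][2],
    }


def looks_scripted(features, min_speed_cv=0.05, max_straightness=0.999,
                   min_duration_ms=200):
    """Toy rule with hypothetical thresholds, standing in for a trained model."""
    too_fast = features["duration_ms"] < min_duration_ms
    too_regular = (features["speed_cv"] < min_speed_cv
                   and features["straightness"] > max_straightness)
    return too_fast or too_regular
```

Under these assumed thresholds, a drag that moves in a perfectly straight line at constant speed is flagged, while a trajectory with natural acceleration and slight vertical jitter is not.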
Environment detection refers to the information retrieved from the visitor's computer environment, such as the hardware specification, screen size, and browser type and version.
As environment detection techniques grow more sophisticated, browser automation tools, a vital tool for automated attacks, can be accurately identified. At the same time, front-end encryption and dynamic honeypots allow the effective detection of signs of API hacking. When environment detection is combined with origin detection, it can spot and block captcha-solving farms, securing the potency and integrity of captcha against these powerful hacking tools.
No matter how advanced and sophisticated a security solution may be, if it is static in nature, it will eventually be breached. The future of captcha technology, and of cybersecurity solutions in general, lies in the ability to adapt quickly to emerging threats.
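As a rough illustration of environment detection, the Python sketch below inspects a client-reported environment on the server side. The field names (`webdriver`, `user_agent`, `screen_width`, `screen_height`, `honeypot_field`) are assumptions made for this example, as is the choice of checks: the `navigator.webdriver` flag that WebDriver-controlled browsers expose, the `HeadlessChrome` token that headless Chrome has used in its user agent, and a hidden form field as one simple kind of honeypot. Production systems rely on many more signals, and determined attackers can spoof each of these.

```python
def environment_risk_signals(env):
    """Return the names of suspicious signals found in a client-reported dict.

    All keys are hypothetical; a real collector would define its own schema.
    """
    signals = []
    # navigator.webdriver is true when the browser is driven by WebDriver.
    if env.get("webdriver") is True:
        signals.append("webdriver_flag")
    # Headless Chrome has identified itself with this user-agent token.
    if "HeadlessChrome" in env.get("user_agent", ""):
        signals.append("headless_user_agent")
    # A missing or non-positive screen size suggests no real display.
    if env.get("screen_width", 0) <= 0 or env.get("screen_height", 0) <= 0:
        signals.append("no_screen")
    # A hidden field that humans never see should always come back empty.
    if env.get("honeypot_field"):
        signals.append("honeypot_filled")
    return signals
```

Returning the list of triggered signals, rather than a single yes/no answer, leaves the final decision to a downstream risk engine that can weigh them together with behavioral data.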
A static defense will be breached unless it can adapt itself to emerging threats instantaneously.
AI-powered bad bots can change their attack patterns until they discover weaknesses in the defenses. Even though behavioral and environmental detection systems are the top defenses against sophisticated bad bots, they will prove futile if their back-end engines are not dynamic enough to detect and evolve against rapidly changing attack patterns, especially when facing AI-powered bad bots.
The sophistication of the machine learning models behind the risk analysis engines of advanced captchas makes the difference when it comes to adapting to emerging threats and changing attack patterns.
For example, the Convolutional Graph Neural Network (CGNN) used by GeeTest CAPTCHA is a proven model to detect highly sophisticated bots with excellent accuracy. The CGNN model, powered by massive amounts of biometric data, can accurately identify non-human features within requests. If an emerging threat is detected anywhere across the network, the model evolves itself instantaneously to recognize and block the threats across the entire grid, securing the network under its protection.
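The CGNN itself is not described here, so the following Python sketch swaps in a much simpler stand-in, an online logistic regression, purely to illustrate what it means for a model to update as newly labeled traffic arrives. The class name, feature layout, and learning rate are all hypothetical.

```python
import math


class OnlineLogisticModel:
    """Toy stand-in for an adaptive risk model; NOT the CGNN named above."""

    def __init__(self, n_features, learning_rate=0.5):
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict_proba(self, x):
        """Estimated probability that feature vector x comes from a bot."""
        z = self.bias + sum(w * xi for w, xi in zip(self.weights, x))
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, x, is_bot):
        """One stochastic gradient step on the log loss for a labeled request."""
        error = self.predict_proba(x) - (1.0 if is_bot else 0.0)
        self.weights = [w - self.learning_rate * error * xi
                        for w, xi in zip(self.weights, x)]
        self.bias -= self.learning_rate * error
```

Each call to `update` nudges the model toward the newest labeled example, so a previously unseen attack pattern starts to raise the predicted bot probability after a few confirmed samples, without retraining from scratch.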
CAPTCHA is an interactive security approach to detecting and mitigating bad bot threats. In an era where user friction directly affects business success through conversion rates and revenue, user experience is a significant differentiating factor for advanced captcha solutions.
Focusing on challenge difficulty for security, and confronting users with cognitive tasks such as image recognition or image orientation, is a primitive and high-friction method.
Relying on the difficulty of the challenge-response to ensure security means high user friction, which undermines the business case for the security solution. At the same time, advancing AI technologies make cognitive challenges futile against sophisticated bad bots.
By emphasizing security through the risk analysis engine that relies on behavioral and environmental information, the challenge-response effectively turns into a way to collect biometric data, which is used to detect bot features. Therefore, the challenge-response can be an effortless interaction with minimal user friction, such as sliding a button. While legitimate users face minimal to no friction, malicious automated programs are effectively filtered by the sophisticated risk analysis engine running on the back-end, providing a smooth user experience without sacrificing security.
1.6 seconds on average is all it takes to pass a GeeTest Slide Captcha
Data privacy is defined as understanding, and consenting to, how your sensitive and personally identifiable information is collected, used, stored, and shared. It's the right of an individual to be free from uninvited surveillance, and it is crucial for one to exist safely and express opinions freely.
But what if privacy becomes a trade-off for increased security? The debate over security versus privacy is a decade-old one, yet we can all agree that none of us likes to be intrusively tracked.
When it comes to captcha technology, privacy concerns are raised about whether the data collected by the captcha system can be used to tell which specific human you are.
Such privacy concerns surrounding No CAPTCHA reCAPTCHA have been raised by Marcos Perona, AdTruth's lead engineer, who found that No CAPTCHA reCAPTCHA isn't overtly labeled as a Google service, yet anyone clicking through it "consents" to be tracked by Google's cookies. The combination of first-party cookies and a browser fingerprint can be tied back to an individual, and most individuals simply clicking "I'm not a robot" won't know this is happening behind the scenes.
Because bad bot threats are increasingly sophisticated, the options to mitigate them are somewhat limited, and behavioral analysis is hailed as the most promising solution. Most advanced captchas have introduced a risk analysis engine based on behavioral and environmental data because, at this stage, it is a necessity.
Does the data collected for behavioral analysis threaten privacy? Such data alone is simply insufficient to tell which specific human is behind a request and cannot be used for tracking an individual. Thus, it is safe to say that advanced captcha solutions alone do not pose a threat to users' privacy.
Looking to the future, the endless cat-and-mouse game between attackers and defenders will only intensify as the stakes get higher. At present, with the support of enterprise-grade resources, the utilization of sophisticated security methods, the ingenuity of innovation, and a dedicated mission to keep the internet safe and trusted, the good guys are winning the war.
The key challenge facing advanced captcha solutions is to ensure maximum security with a user-experience-first approach, which requires not only ingenuity in innovation but also a change in the fundamental logic behind captcha technology.
Instead of difficult challenges, establishing security through sophisticated behavioral and environment detection techniques along with a dynamic and adaptive defense approach is at the core of this change.
The recipe for a great advanced captcha solution requires more than a new security approach. At GeeTest, we believe that a truly great captcha solution is possible only when sophisticated security is combined with a seamless user experience and respect for the right to privacy.