
The confrontation between cyber attackers and defenders is growing fiercer as online technology evolves. CAPTCHA plays a crucial role for defenders: it raises the difficulty of online fraud, helps manage bot traffic, and supports risk management.


This article covers the invention and evolution of CAPTCHA, the technologies used to create and solve CAPTCHAs, and the design and implementation of GeeTest CAPTCHA.


1. CAPTCHA's Invention and Evolution



CAPTCHA (Completely Automated Public Turing Test To Tell Computers and Humans Apart) is also known as HIP (Human Interactive Proof). The original form of CAPTCHA, shown below, is text-based and relies on human reading ability. It was first designed and patented in 1997.

1.1 Early-Days CAPTCHA Application

In 1999, Slashdot, a social news website, ran an online poll asking which was the best computer science graduate school in the US. The poll system was not rigorous: it only recorded voters' IP addresses to stop a single user from voting more than once. Students at Carnegie Mellon and MIT wrote programs to stuff the ballot for their schools, which may be one of the earliest cases of automated online vote fraud.



Yahoo and Luis von Ahn, who is usually credited as the inventor of CAPTCHA, developed a word-based CAPTCHA called EZ-Gimpy. It was later deployed on Yahoo's email registration page to stop registration bots from creating countless new accounts.



Apart from the cases mentioned above, CAPTCHA was also used to counter web crawlers, spam, brute-force attacks, abuse of search engines, and similar threats.


1.2 CAPTCHA's Competitor

Soon after text-based CAPTCHA was invented, its first adversary appeared: OCR (optical character recognition). The following paragraphs cover two mainstream OCR approaches, pattern recognition and machine learning.


In 2003, Greg Mori and Jitendra Malik applied an improved version of shape context matching to Yahoo's EZ-Gimpy dataset and identified the word in an EZ-Gimpy image with a success rate of 92%. However, EZ-Gimpy was far easier than today's CAPTCHA challenges, as it drew its words from a small dictionary of only 561 words. That small dictionary is a large part of why the recognition rate was so high.
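To illustrate why a small dictionary helps an attacker, here is a minimal sketch. It is not the shape context method from the paper: it assumes some recognizer has already produced a noisy character-by-character reading, and it simply snaps that reading to the closest word in a known word list. The word list and the `snap_to_dictionary` name are hypothetical.

```python
from difflib import get_close_matches
from typing import Optional, Sequence

# Hypothetical stand-in for a small CAPTCHA word list (not the real EZ-Gimpy dictionary).
DICTIONARY = ["horse", "house", "mouse", "table", "cable", "river"]


def snap_to_dictionary(noisy_reading: str,
                       words: Sequence[str] = DICTIONARY,
                       cutoff: float = 0.6) -> Optional[str]:
    """Return the dictionary word most similar to a noisy reading, or None."""
    matches = get_close_matches(noisy_reading.lower(), words, n=1, cutoff=cutoff)
    return matches[0] if matches else None


if __name__ == "__main__":
    # One misread character is corrected because few words are plausible.
    print(snap_to_dictionary("hqrse"))
    print(snap_to_dictionary("zzzzz"))
```

With only a few hundred candidate words, even a recognizer that misreads one or two characters can still land on the right answer, which is why random character strings are harder to attack than dictionary words.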



In 2005, Kumar Chellapilla and colleagues used a CNN (convolutional neural network) model to recognize CAPTCHAs one character at a time. Through 7 controlled experiments, they found that the CNN model could recognize distorted single characters better than humans.


1.3 CAPTCHA Evolution

Since CAPTCHA was invented and widely adopted, more complex text-based CAPTCHAs and other forms of verification have kept appearing. They follow two trends:

① more challenge types and

② adding behavior data.


The image below shows three versions of Google reCAPTCHA. The first version is a text-based CAPTCHA that uses two distorted words as the challenge. Its key technology came from a CMU project that digitized text through distributed human recognition. Only one of the two words is the real challenge, for which reCAPTCHA knows the answer; the other is a word that reCAPTCHA's own system cannot recognize. The system assumes that a user who correctly identifies the known word will very likely identify the other one correctly too. By collecting many users' answers for the same word, the system can digitize large amounts of printed text that is hard to recognize with OCR technology. This version has not been available since March 31, 2018.
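The two-word scheme can be sketched roughly as follows. This is an illustrative model only, not reCAPTCHA's actual implementation: the class name, the vote thresholds, and the normalization rule are all assumptions.

```python
from collections import Counter
from typing import Dict, Optional


class TwoWordChallenge:
    """Toy model: a known control word gates votes on an unknown word."""

    def __init__(self, min_votes: int = 3, min_agreement: float = 0.75):
        # Both thresholds are hypothetical values chosen for illustration.
        self.min_votes = min_votes
        self.min_agreement = min_agreement
        self.votes: Dict[str, Counter] = {}

    def submit(self, control_answer: str, control_truth: str,
               unknown_id: str, unknown_answer: str) -> bool:
        """Pass or fail the user on the control word; count the vote if passed."""
        passed = control_answer.strip().lower() == control_truth.strip().lower()
        if passed:
            counter = self.votes.setdefault(unknown_id, Counter())
            counter[unknown_answer.strip().lower()] += 1
        return passed

    def consensus(self, unknown_id: str) -> Optional[str]:
        """Return the agreed transcription once enough users agree, else None."""
        counter = self.votes.get(unknown_id)
        if not counter:
            return None
        total = sum(counter.values())
        if total < self.min_votes:
            return None
        word, count = counter.most_common(1)[0]
        return word if count / total >= self.min_agreement else None
```

Only users who pass the control word contribute a vote, so a transcription is accepted for the unknown word only after several independent, apparently human readers agree on it.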



The second version, which is more prevalent, includes two types of challenges: the "I'm not a robot" checkbox and the "select from 9 images" grid. When the user clicks the checkbox, browser data and user behavior data are sent to reCAPTCHA. If that data is inconclusive or the user is judged to be risky, the "select from 9 images" challenge pops up for further verification.


The third version removes every challenge that has an interface or requires interaction. All that remains is an icon in the corner of the page that links to the privacy terms. After a website owner deploys it, its JS code continuously collects user behavior data for risk scoring, returning a floating-point score from 0 to 1 in which higher scores represent lower risk (1.0 is very likely a legitimate interaction, 0.0 is very likely a bot).
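A site that receives such a score still has to decide what to do with it. The sketch below assumes the convention in Google's reCAPTCHA v3 documentation, where 1.0 is very likely a good interaction and 0.0 is very likely a bot. The function name, the three actions, and both thresholds are hypothetical and would need tuning per site.

```python
def decide_action(score: float, allow_at: float = 0.7, block_below: float = 0.3) -> str:
    """Map a 0..1 risk score (higher = more likely human) to an action.

    The thresholds are illustrative assumptions, not recommended values.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    if score >= allow_at:
        return "allow"          # low risk: let the request through silently
    if score < block_below:
        return "block"          # high risk: reject or rate-limit
    return "challenge"          # uncertain: fall back to an interactive CAPTCHA
```

The middle band is where an interactive challenge, such as the image grid of the second version, is still useful as a fallback.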


reCAPTCHA started as a text-based CAPTCHA, then moved to a checkbox and image verification backed by user behavior data, and eventually came to rely on user behavior data alone. The trend over the past 20 years is clear: CAPTCHA has been moving toward more intuitive forms of verification and toward risk management based on multi-dimensional data.


1.4 New CAPTCHA Exploration

Since the security of text-based CAPTCHAs is at risk and users keep complaining about poor user experience, people in industry and academia are actively exploring more user-friendly and secure forms of verification.


Dice Captcha, shown below on the left, was designed by Dice Captcha in 2010. Compared with text-based CAPTCHA, this form is more user-friendly and intuitive, and it even makes the challenge a bit entertaining. However, Dice Captcha offers relatively weak security and can be broken by brute force, so it is not widely used. The CAPTCHA shown on the right exists only in papers; it offers high security but poor user experience.



DotCHA, the CAPTCHA in the figure below, was proposed in 2019 and can only be seen in papers and demos. It uses dynamic dots scattered in 3D space to form the shapes of characters for users to recognize.



Besides the forms above, there are many others, such as sensor-based CAPTCHA and minigame CAPTCHA. Together they show that developers designing a new CAPTCHA need to weigh both ease of use and security.


1.5 Behavior Analysis CAPTCHA

In addition to exploring better forms of verification, CAPTCHA developers are trying to use multi-dimensional data to improve verification accuracy. A good example is behavior analysis CAPTCHA.


The image below on the left shows image or puzzle-based CAPTCHA combined with behavior analysis. The image on the right shows what we call "intelligent CAPTCHA", which offers a better user experience by using the user's behavior data for an initial risk check. This is similar to the technology that the third version of reCAPTCHA adopts.


Apart from user behavior data, most CAPTCHAs currently on the market also collect device data, network environment data, and similar signals to assist verification.


2. Prevent Cybercrime with CAPTCHA


2.1 CAPTCHA's Role in Preventing Cybercrime

Things have changed. We can no longer simply match one CAPTCHA function to one specific scenario. The forms and techniques of cybercrime are becoming more diverse, so CAPTCHA needs to play a more systematic role than before: raising the attacker's cost, managing malicious traffic, and offering risk verification support.


  • Raise the attacker's cost: CAPTCHA, as an essential part of website deployment, can be installed on pages such as sign-in, password reset, order placement, and commenting to prevent threats like brute-force attacks and credential stuffing, thereby raising the attacker's cost and difficulty.


  • Manage malicious traffic: Many Internet companies have built or are building risk control systems based on their actual business situation. Using a preliminary risk judgment from the risk control system, a company can serve easier or harder CAPTCHAs according to the risk, which improves user experience and reduces misjudgment.


  • Risk verification support: The behavior data collected by CAPTCHA adds dimensions to the risk control system and provides more perspectives and evidence for its judgments. For example, slide CAPTCHA collects the user's mouse trajectory, and image CAPTCHA collects the user's mouse click data.
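As a rough illustration of how a mouse trajectory becomes evidence for a risk control system, the sketch below turns a list of sampled points into a few simple features. The feature set and the `(x, y, t_ms)` point format are assumptions made for this example; they are not GeeTest's actual features.

```python
import math
from statistics import mean, pstdev
from typing import Dict, List, Tuple

Point = Tuple[float, float, float]  # (x, y, timestamp in milliseconds)


def trajectory_features(points: List[Point]) -> Dict[str, float]:
    """Summarize a time-ordered mouse trajectory as a few numeric features."""
    if len(points) < 2:
        raise ValueError("need at least two points")

    path_length = 0.0
    speeds = []
    for (x0, y0, t0), (x1, y1, t1) in zip(points, points[1:]):
        step = math.hypot(x1 - x0, y1 - y0)
        path_length += step
        if t1 > t0:  # skip samples with duplicate timestamps
            speeds.append(step / (t1 - t0))

    direct = math.hypot(points[-1][0] - points[0][0], points[-1][1] - points[0][1])
    return {
        "duration_ms": points[-1][2] - points[0][2],
        "path_length": path_length,
        # 1.0 means a perfectly straight path from start to end.
        "straightness": direct / path_length if path_length else 1.0,
        "mean_speed": mean(speeds) if speeds else 0.0,
        # 0.0 means perfectly constant speed.
        "speed_stdev": pstdev(speeds) if speeds else 0.0,
    }
```

Human drags tend to wobble and change speed, so a perfectly straight, constant-speed path may be one hint of automation. In practice such features would only be one input among many, not a verdict on their own.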


2.2 How does CAPTCHA Prevent Cybercrime?

A CAPTCHA product on the market is essentially a web application that relies on the HTTP protocol. The basic operation process is as follows.


① Deploy a CAPTCHA program based on code such as Java or JS in the webpage.


② Once triggered by the page's logic, the page initializes the program, which communicates with the back-end, loads the resources needed to render the CAPTCHA, and then waits for user interaction.


③ After user interaction, CAPTCHA sends the data to the back-end for comprehensive analysis.
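The back-end side of steps ② and ③ can be sketched as a small challenge store. This is a minimal in-memory illustration under stated assumptions: the `ChallengeStore` name, the string answer format, and the time-to-live are hypothetical, and a real slide CAPTCHA would compare positions with a tolerance and analyze behavior data as well.

```python
import secrets
import time
from typing import Dict, Optional, Tuple


class ChallengeStore:
    """Issue single-use challenges and verify the submitted answers."""

    def __init__(self, ttl_seconds: float = 120.0):
        self.ttl = ttl_seconds
        # challenge_id -> (expected_answer, expiry timestamp)
        self._pending: Dict[str, Tuple[str, float]] = {}

    def issue(self, expected_answer: str, now: Optional[float] = None) -> str:
        """Step 2: register a new challenge and return its unguessable id."""
        now = time.time() if now is None else now
        challenge_id = secrets.token_urlsafe(16)
        self._pending[challenge_id] = (expected_answer, now + self.ttl)
        return challenge_id

    def verify(self, challenge_id: str, answer: str, now: Optional[float] = None) -> bool:
        """Step 3: check the answer; each challenge can be used only once."""
        now = time.time() if now is None else now
        entry = self._pending.pop(challenge_id, None)  # pop prevents replay
        if entry is None:
            return False
        expected, expires_at = entry
        if now > expires_at:
            return False
        # Constant-time comparison avoids leaking the answer through timing.
        return secrets.compare_digest(answer.encode(), expected.encode())
```

Removing the challenge on the first verification attempt means a failed guess cannot be retried against the same challenge, and a correct answer cannot be replayed.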


The following figure shows the network requests involved in the GeeTest slide CAPTCHA, including requests for JS, CSS, images, and other file resources.



In the process above, a CAPTCHA can be cracked at two points: obtaining the answer and submitting the answer. Specifically, attackers can obtain the answer with techniques such as computer vision, machine learning algorithms, or lookups in a database of known CAPTCHAs. They then use tools to send the answer to the CAPTCHA's back-end interface as an HTTP request. The tools for submitting answers fall into two broad categories: simulator submission and interface submission. Simulators include browser simulators on PC and phone simulators on mobile.


2.2.1 Types of Cyber Threats

Based on the analysis above, the current threats to CAPTCHA fall into two groups: automatic solvers and manual solvers.


Automatic solvers can be divided into two types according to their team size and whether they make a profit from solving CAPTCHAs:

  • CAPTCHA solving platforms
  • CAPTCHA bypass scripts


According to their purpose, there are two types of CAPTCHA solving platforms:

  • Image recognition platform
  • Automatic solving platform


Users upload CAPTCHA images to an image recognition platform and get back the answer, such as the position of the missing piece in the image or the locations of the characters.


The automatic solving platform, on the other hand, takes over the whole process, and users don't have to do anything or know any technology.



Take the sneaker bot industry as an example. Sneaker bots integrate functions such as registration, login, verification, and order placement, and they solve CAPTCHAs in a variety of ways. The industry is large and has clear classification standards.



2.2.2 CAPTCHA's Security Capabilities


  • Web Infrastructure Security

CAPTCHA is essentially a web application that requires the front-end and back-end to cooperate. Therefore, to ensure the security of CAPTCHA, the website needs to:


  • Protect the client-side business logic by obfuscating and hardening the JavaScript/Java SDK


  • Ensure the security and reliability of data, and reduce the possibility of tampering and forgery, through measures such as encrypted data transmission


  • Ensure the robustness of the front and back-end logic to prevent vulnerability exploitation and SCA, and ensure host security of the back-end server.
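One common way to detect tampering and forgery is to attach a keyed signature and a timestamp to each message. The sketch below uses HMAC-SHA256 from Python's standard library. It demonstrates integrity checking only, not encryption: confidentiality would still depend on something like TLS. The function names, the message layout, and the 60-second age limit are assumptions for illustration.

```python
import hashlib
import hmac
import json
import time
from typing import Optional


def sign_payload(payload: dict, key: bytes, now: Optional[float] = None) -> dict:
    """Serialize a payload and attach a timestamp and an HMAC-SHA256 tag."""
    ts = int(time.time() if now is None else now)
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    mac = hmac.new(key, f"{ts}.{body}".encode(), hashlib.sha256).hexdigest()
    return {"ts": ts, "body": body, "mac": mac}


def verify_payload(message: dict, key: bytes, max_age: int = 60,
                   now: Optional[float] = None) -> Optional[dict]:
    """Return the payload if the tag is valid and the message is fresh, else None."""
    now = time.time() if now is None else now
    signed = f"{message['ts']}.{message['body']}".encode()
    expected = hmac.new(key, signed, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, message["mac"]):
        return None  # body or timestamp was modified, or the key is wrong
    if now - message["ts"] > max_age:
        return None  # too old: likely a replayed message
    return json.loads(message["body"])
```

Because the timestamp is covered by the signature, an attacker cannot extend the lifetime of a captured message by editing it.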


  • CAPTCHA Forms Security

The security of the challenge form is the most direct expression of a CAPTCHA's security, and it is also a focus of academic research.


First, by definition, a CAPTCHA challenge should be built on the gap between human capabilities and current AI capabilities, and CAPTCHA uses that gap to block bot traffic. When bots learn to solve a challenge, that in turn marks an advance in the related AI technology. However, developers designing such AI-hard challenges must also consider how practical they are to use.


Second, security also comes from gaps in information and time: every new form of CAPTCHA is secure at the beginning. As attackers dig deeper and computer technology evolves, the security of any form of CAPTCHA decays and may eventually fail. CAPTCHA security is therefore a process of continuous improvement in a dynamic confrontation.


Finally, against brute-force attacks and recognition models that generalize poorly, a large challenge dataset provides stronger security.
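The effect of size on brute-force and lookup attacks can be made concrete with a little arithmetic. The sketch below rests on simplifying assumptions: guesses are independent and uniformly random, and challenges are drawn uniformly from the pool. Both function names are hypothetical.

```python
def guess_success_probability(answer_space: int, attempts: int) -> float:
    """Chance that at least one of `attempts` independent uniform guesses is right."""
    if answer_space < 1 or attempts < 0:
        raise ValueError("answer_space must be >= 1 and attempts >= 0")
    return 1.0 - (1.0 - 1.0 / answer_space) ** attempts


def lookup_hit_rate(harvested: int, pool_size: int) -> float:
    """Chance that a uniformly drawn challenge is already in the attacker's database."""
    if pool_size < 1 or not 0 <= harvested <= pool_size:
        raise ValueError("need 0 <= harvested <= pool_size and pool_size >= 1")
    return harvested / pool_size


if __name__ == "__main__":
    # Same attacker effort, very different outcomes as the space grows.
    for space in (6, 1_000, 1_000_000):
        print(space, guess_success_probability(space, attempts=100))
```

Under these assumptions, a challenge with only a handful of possible answers falls to a few dozen guesses, while enlarging the answer space or the challenge pool pushes the same attacker effort toward a negligible success rate.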


  • Data Security Strategy

Risks can be detected with data such as device environment data, network data, and behavior data from general or specific challenges. The ability to mine and process data from other dimensions also leads to stronger security.


  • Operation Security Strategy

In developing and deploying CAPTCHA, the real opponent is not the bots but the attackers who control them. It is a battle between humans. The way a CAPTCHA product is operated is therefore also important to its security. During operation, operators can activate functions such as real-time blocking and marking of risky visitors. CAPTCHA can also update its challenge dataset dynamically based on automatic feedback, which raises the attacker's cost through regular development and maintenance.
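Marking risky visitors in real time can be as simple as counting recent failures per visitor. The sketch below is one minimal way to do this with a sliding time window. The `FailureTracker` name, the visitor identifier, and the default limits are assumptions, and a production system would combine this with the other signals described above.

```python
from collections import defaultdict, deque
from typing import Deque, Dict


class FailureTracker:
    """Flag visitors with too many failed verifications in a recent time window."""

    def __init__(self, max_failures: int = 5, window_seconds: float = 60.0):
        # Both limits are hypothetical defaults chosen for illustration.
        self.max_failures = max_failures
        self.window = window_seconds
        self._failures: Dict[str, Deque[float]] = defaultdict(deque)

    def record_failure(self, visitor_id: str, now: float) -> None:
        """Store the timestamp of a failed verification attempt."""
        failures = self._failures[visitor_id]
        failures.append(now)
        self._trim(failures, now)

    def is_risky(self, visitor_id: str, now: float) -> bool:
        """True if the visitor reached the failure limit within the window."""
        failures = self._failures.get(visitor_id)
        if not failures:
            return False
        self._trim(failures, now)
        return len(failures) >= self.max_failures

    def _trim(self, failures: Deque[float], now: float) -> None:
        # Drop timestamps that have fallen out of the sliding window.
        while failures and now - failures[0] > self.window:
            failures.popleft()
```

A visitor flagged this way could be blocked outright or served a harder challenge, and the flag clears by itself once the failures age out of the window.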


3. The Future of CAPTCHA


GeeTest CAPTCHA v4 - Adaptive CAPTCHA


The following part describes the technological factors that may affect the security of CAPTCHA.


  • AI development: In the short run, CAPTCHA challenge design will still focus on the difference between humans and AI. However, AI models now perform far better than humans on many individual tasks, so challenge design may turn in a new, more comprehensive direction, such as [semantic challenge + images] or [semantic challenge + behaviors].


  • Hardware popularization: Many innovative CAPTCHA challenges, such as those using device motion sensors, are not practical enough today, but they will matter more in the future as the hardware develops and becomes widespread.




  • Privacy protection: Privacy protection and risk control based on user data seem to pull in opposite directions. Distinguishing humans from bots with limited data is therefore an important future trend.



  • Social value: CAPTCHA was originally designed as a tool to develop AI technology. As the value of data increases, CAPTCHA may play a crucial role in data annotation and similar tasks.




Visit here to learn more about GeeTest CAPTCHAs.

