29 Dec 2023 • 10 min read
Following our initial discussion of GeeTest's foray into AI-generated content (AIGC), this article looks more closely at the technical progress we have made and its real-world applications. Text-to-image technology is in a transformative phase: turning text into vivid images is no longer a simple conversion but an intricate blend of language and visuals that pushes the boundaries of digital imagery. Our focus today is the Stable Diffusion model and its pivotal role in the evolution of image-based CAPTCHA systems.
Stable Diffusion (SD) is a state-of-the-art generative AI model from the diffusion-model family of deep learning. Like other generative models, it learns to produce new data that closely resembles its training data; in SD's case, that data is images. It is acclaimed for generating and modifying images efficiently, which makes it a standout in text-to-image technology. That efficiency, together with its open-source release, has drawn wide interest from the technology community.
Stable Diffusion, as a latent diffusion model, revolutionizes image processing by compressing images into a significantly smaller latent space, instead of operating in the traditional high-dimensional image space. This approach boosts the model's speed and efficiency.
Stable Diffusion's capabilities are diverse. They include:

- text-to-image generation
- image-to-image translation
- image enhancement tasks such as super-resolution and colorization

It relies on a Variational Autoencoder (VAE) with two parts. An encoder compresses the image into the latent space, and a decoder reconstructs the image from that compressed form. We will show the decoder in action later.
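To put a number on "significantly smaller", the sketch below computes the latent shape for a given image size. It assumes Stable Diffusion v1.x defaults: a VAE that downsamples each side by a factor of 8 and outputs 4 latent channels. Other SD versions and custom VAEs may differ.

```python
# Latent-space size for Stable Diffusion v1.x (assumed defaults:
# 8x spatial downsampling per side, 4 latent channels).
DOWNSAMPLE = 8
LATENT_CHANNELS = 4
RGB_CHANNELS = 3


def latent_shape(height: int, width: int) -> tuple[int, int, int]:
    """Return (channels, height, width) of the VAE latent for an RGB image."""
    if height % DOWNSAMPLE or width % DOWNSAMPLE:
        raise ValueError("image height and width must be multiples of 8")
    return (LATENT_CHANNELS, height // DOWNSAMPLE, width // DOWNSAMPLE)


def compression_ratio(height: int, width: int) -> float:
    """How many pixel values correspond to one latent value."""
    c, h, w = latent_shape(height, width)
    return (RGB_CHANNELS * height * width) / (c * h * w)


print(latent_shape(512, 512))       # (4, 64, 64)
print(compression_ratio(512, 512))  # 48.0
```

Under these assumptions, a 512×512 image has 786,432 values and becomes a 4×64×64 latent of 16,384 values. That is 48 times fewer numbers for the diffusion model to process at every step.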
Stable Diffusion relies on two diffusion processes:

- **Forward diffusion** gradually adds noise to an image until it becomes pure random noise.
- **Reverse diffusion** starts from that noise and removes it step by step to produce an image.

Both processes operate in the latent space. During training, Stable Diffusion does not corrupt an image with noise in pixel space; it corrupts the image's latent representation with latent noise. At generation time, it likewise denoises a latent rather than a full-size image. Because the latent space is much smaller, both steps run faster.
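The forward process has a convenient closed form. A noisy latent at step t can be sampled directly as x_t = sqrt(ᾱ_t) · x_0 + sqrt(1 − ᾱ_t) · ε, where ᾱ_t is the cumulative product of (1 − β) and ε is Gaussian noise. The sketch below uses the "scaled_linear" β schedule (0.00085 to 0.012 over 1,000 steps) found in the scheduler configuration commonly distributed with SD v1.x. A short list of floats stands in for a real latent tensor.

```python
import math
import random

T = 1000
BETA_START, BETA_END = 0.00085, 0.012   # SD v1.x "scaled_linear" schedule


def scaled_linear_betas(t_steps: int = T) -> list[float]:
    """Betas spaced linearly in square-root space, then squared."""
    lo, hi = math.sqrt(BETA_START), math.sqrt(BETA_END)
    return [(lo + (hi - lo) * i / (t_steps - 1)) ** 2 for i in range(t_steps)]


def alpha_bars(betas: list[float]) -> list[float]:
    """Cumulative product of (1 - beta): how much of the original signal survives at step t."""
    out, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        out.append(prod)
    return out


def add_noise(x0: list[float], t: int, abar: list[float], rng: random.Random) -> list[float]:
    """Sample x_t ~ q(x_t | x_0) in one shot on a flattened latent."""
    a = abar[t]
    return [math.sqrt(a) * x + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0) for x in x0]


abar = alpha_bars(scaled_linear_betas())
rng = random.Random(0)
latent = [0.5] * 16                            # toy stand-in for a real 4x64x64 latent
slightly_noisy = add_noise(latent, 10, abar, rng)
almost_pure_noise = add_noise(latent, T - 1, abar, rng)
print(f"signal scale at t=10: {math.sqrt(abar[10]):.3f}, at t=999: {math.sqrt(abar[-1]):.3f}")
```

Training only needs this one-shot sample at a random t, so the model never has to simulate the noising chain step by step.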
Conditioning plays a crucial role in how Stable Diffusion converts text into images. It involves steering the noise predictor to produce the desired outcome based on the text prompt. For example, when given prompts like "paradise," "cosmic," or "beach," the model generates images that visually align with these concepts, creating scenes with elements like clear skies or vast beaches. This innovative process allows Stable Diffusion to interpret and visualize textual descriptions effectively.
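In Stable Diffusion v1.x, the prompt is encoded by a CLIP text encoder and injected into the noise predictor (a U-Net) through cross-attention. Pipelines usually strengthen this steering with classifier-free guidance. The noise is predicted once with the prompt and once with an empty prompt, and the result is pushed away from the unconditional prediction. The sketch below shows only that combination step, on toy lists. The guidance scale of 7.5 is a common default in open-source SD pipelines, not a GeeTest setting.

```python
def guided_noise(eps_uncond: list[float], eps_cond: list[float],
                 guidance_scale: float = 7.5) -> list[float]:
    """Classifier-free guidance: eps_uncond + s * (eps_cond - eps_uncond), element-wise."""
    if len(eps_uncond) != len(eps_cond):
        raise ValueError("noise predictions must have the same shape")
    return [u + guidance_scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


# Toy noise predictions; in a real pipeline both come from the U-Net at the same step.
uncond = [0.10, -0.20, 0.00]   # predicted with an empty prompt
cond = [0.30, -0.10, 0.05]     # predicted with a prompt such as "beach"
print(guided_noise(uncond, cond, guidance_scale=1.0))  # equals cond up to rounding
print(guided_noise(uncond, cond))                       # pushed further toward the prompt
```

The scale controls how strongly the prompt is followed:

- A scale of 0 ignores the prompt.
- A scale of 1 uses the conditional prediction as-is.
- Larger values follow the prompt more literally, at some cost in diversity.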
Here is how Stable Diffusion actually works in the text-to-image process.
As depicted, feeding the Image Decoder first the initial pure-noise vector and then the denoised latent vector produces strikingly different images. The pure-noise vector carries no meaningful content, so it decodes to an image made entirely of noise. The latent vector, by contrast, has gone through 50 denoising iterations and carries semantic information, so it decodes to an image that clearly embodies that content.
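The 50-iteration loop described above can be sketched with a deterministic DDIM sampler (η = 0), one of several samplers used with SD. To keep it runnable, the U-Net is replaced by a hypothetical "oracle" noise predictor that already knows the clean target latent. A real model has to estimate the noise from the noisy latent and the prompt. The final latent is what would be handed to the VAE decoder.

```python
import math
import random
from typing import Callable

T, STEPS = 1000, 50
BETA_START, BETA_END = 0.00085, 0.012   # SD v1.x "scaled_linear" schedule


def alpha_bars() -> list[float]:
    """Cumulative signal fractions for the scaled_linear beta schedule."""
    lo, hi = math.sqrt(BETA_START), math.sqrt(BETA_END)
    out, prod = [], 1.0
    for i in range(T):
        prod *= 1.0 - (lo + (hi - lo) * i / (T - 1)) ** 2
        out.append(prod)
    return out


ABAR = alpha_bars()
NoisePredictor = Callable[[list[float], int], list[float]]


def ddim_sample(predict_noise: NoisePredictor, size: int, seed: int = 0) -> list[float]:
    """Deterministic DDIM (eta = 0): STEPS denoising iterations starting from pure noise."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(size)]      # the initial pure-noise latent
    timesteps = list(range(T - 1, -1, -(T // STEPS)))    # 999, 979, ..., 19 (50 steps)
    for i, t in enumerate(timesteps):
        a_t = ABAR[t]
        a_prev = ABAR[timesteps[i + 1]] if i + 1 < len(timesteps) else 1.0
        eps = predict_noise(x, t)
        # Estimate the clean latent, then re-noise it to the next, less noisy level.
        x0_hat = [(xi - math.sqrt(1.0 - a_t) * e) / math.sqrt(a_t) for xi, e in zip(x, eps)]
        x = [math.sqrt(a_prev) * x0 + math.sqrt(1.0 - a_prev) * e for x0, e in zip(x0_hat, eps)]
    return x


# Hypothetical oracle standing in for the U-Net: it knows the clean target latent
# and returns exactly the noise that separates x_t from it.
target = [0.8, -0.3, 0.1, 0.5]


def oracle(x_t: list[float], t: int) -> list[float]:
    a = ABAR[t]
    return [(xi - math.sqrt(a) * x0) / math.sqrt(1.0 - a) for xi, x0 in zip(x_t, target)]


latent = ddim_sample(oracle, size=len(target))
print([round(v, 4) for v in latent])  # matches target up to rounding; next stop: VAE decoder
```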
The adoption of the SD model for CAPTCHA generation has significantly bolstered the security of these systems. SD's advanced latent diffusion techniques enable the production of complex verification images, overcoming the common vulnerabilities and inefficiencies of traditional CAPTCHAs.
The SD model makes sophisticated visual effects such as shadow text possible. Shadow text is hard for AI recognition systems to read, yet still legible to humans. Using ControlNet, SD manipulates light and shadow to create image-based CAPTCHAs whose characters are deliberately vague and distorted, which confuses automated image recognition models.
For instance, the Chinese characters "冰" (ice), "拿" (take), and "铁" (iron), which together read "冰拿铁" (iced latte), are formed by shadows cast by environmental elements. Human users can still read them clearly, while the atypical way the characters are formed stumps image recognition algorithms.
Similarly, words such as "曲奇" (cookie), "黑森林" (black forest), "果冻" (jelly), and "蓝莓" (blueberry) are easy for people to pick out against noisy backgrounds but are often misread by image recognition models. SD weaves shadow art into the base image and introduces deliberate errors in shadow placement, overlap, and alignment. The result is a high AI recognition failure rate: 99.74% in tests on 5,000 shadow images.
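For clarity, a 99.74% failure rate on 5,000 images means the recognition model misread 4,987 images and read only 13 correctly. The article does not describe the evaluation harness, so the function below is a hypothetical sketch of how such a rate is computed. The samples and the toy recognizer are synthetic and exist only to reproduce that arithmetic.

```python
from typing import Callable, Iterable


def recognition_failure_rate(samples: Iterable[tuple[object, str]],
                             recognize: Callable[[object], str]) -> float:
    """Fraction of CAPTCHA images whose answer the recognizer gets wrong."""
    total = failures = 0
    for image, answer in samples:
        total += 1
        if recognize(image) != answer:
            failures += 1
    if total == 0:
        raise ValueError("no samples to evaluate")
    return failures / total


# Synthetic stand-ins, NOT real test data: 5,000 "images" (just integers), of which
# a toy recognizer reads only the first 13 correctly.
samples = [(i, "answer") for i in range(5000)]


def toy_recognizer(image: int) -> str:
    return "answer" if image < 13 else "wrong"


rate = recognition_failure_rate(samples, toy_recognizer)
print(f"{rate:.2%}")  # 99.74%
```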
This approach not only maintains accuracy for human users but also significantly increases the difficulty for bots, enhancing CAPTCHA security beyond traditional character warping and background interference methods.
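ControlNet conditions SD on an extra image, such as an edge, depth, or brightness map, in addition to the text prompt. The sketch below shows one plausible way to prepare such a control image for shadow text:

1. Upscale a glyph bitmap.
2. Offset it randomly to introduce misalignment.
3. Blur it so its edges are soft.

The glyph, jitter range, and blur radius are illustrative assumptions, not GeeTest's actual parameters. Which ControlNet model consumes the map is left open.

```python
import random

Mask = list[list[int]]


def upscale(mask: Mask, factor: int) -> Mask:
    """Nearest-neighbour upscaling of a 0/1 glyph bitmap."""
    return [[v for v in row for _ in range(factor)] for row in mask for _ in range(factor)]


def shift(mask: Mask, dx: int, dy: int) -> Mask:
    """Translate the bitmap by (dx, dy); uncovered pixels become 0."""
    h, w = len(mask), len(mask[0])
    return [[mask[y - dy][x - dx] if 0 <= y - dy < h and 0 <= x - dx < w else 0
             for x in range(w)] for y in range(h)]


def box_blur(mask: Mask, radius: int) -> list[list[float]]:
    """Mean over a (2*radius+1)^2 window, clipped at the borders, to soften edges."""
    h, w = len(mask), len(mask[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [mask[j][i]
                      for j in range(max(0, y - radius), min(h, y + radius + 1))
                      for i in range(max(0, x - radius), min(w, x + radius + 1))]
            row.append(sum(window) / len(window))
        out.append(row)
    return out


def control_map(glyph: Mask, factor: int = 8, max_jitter: int = 2,
                radius: int = 2, seed: int = 0) -> list[list[int]]:
    """Build an 8-bit grayscale control map from a glyph bitmap (hypothetical recipe)."""
    rng = random.Random(seed)
    big = upscale(glyph, factor)
    dx = rng.randint(-max_jitter, max_jitter)   # deliberate misalignment
    dy = rng.randint(-max_jitter, max_jitter)
    soft = box_blur(shift(big, dx, dy), radius)
    return [[round(255 * v) for v in row] for row in soft]


# Hypothetical 5x5 bitmap of a simple cross-shaped glyph.
glyph = [[0, 0, 1, 0, 0],
         [0, 0, 1, 0, 0],
         [1, 1, 1, 1, 1],
         [0, 0, 1, 0, 0],
         [0, 0, 1, 0, 0]]
cmap = control_map(glyph)
print(len(cmap), len(cmap[0]))  # 40 40
```

In the diffusers library, such a map would typically be converted to an image and passed as the `image` argument of `StableDiffusionControlNetPipeline`, together with the text prompt.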
SD's advanced technology not only fortifies CAPTCHAs against automated attacks but also enhances their realism and aesthetic appeal. These CAPTCHAs, distinguished by their vibrant colors and sharp resolution, substantially improve the user experience.
The example below, from GeeTest's SD integration, shows how shadow text can be blended into engaging images that are finely tuned to balance security and usability.
Wu Yuan, CEO of GeeTest, stresses the central design challenge of CAPTCHAs: they must keep bots out without degrading the user experience. Using SD to process character-based icon CAPTCHAs has proven popular with users. The livelier, clearer images let users complete verification in just three seconds, significantly faster than with traditional CAPTCHAs.
SD integration has revolutionized CAPTCHA design, phasing out the need for manual image creation. Inputting a text prompt into SD quickly produces intricate validation images, greatly reducing time and labor. GeeTest's CAPTCHA V4 introduces an automated image update system, enhancing security against brute-force attacks and improving image generation speed by 30%.
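The article does not detail how CAPTCHA V4's automated update system works. As an assumption, a simple version is a pool in which every image has a time-to-live. Expired images are dropped and replaced with fresh SD output, so an attacker who harvests answers for old images gains little. `ImagePool` and its `generate` callback are hypothetical names; the callback stands in for a call to the SD service.

```python
import time
from collections import deque
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ImagePool:
    """Hypothetical rotating pool: each image is served for at most ttl_s seconds."""

    generate: Callable[[], str]              # stand-in for an SD call; returns an image id
    size: int
    ttl_s: float
    clock: Callable[[], float] = time.monotonic
    _items: deque = field(default_factory=deque, init=False)  # (image_id, created_at)

    def refresh(self) -> int:
        """Drop expired images, top the pool back up, and return how many were generated."""
        now = self.clock()
        while self._items and now - self._items[0][1] >= self.ttl_s:
            self._items.popleft()
        created = 0
        while len(self._items) < self.size:
            self._items.append((self.generate(), now))
            created += 1
        return created

    def ids(self) -> list[str]:
        return [image_id for image_id, _ in self._items]


# Usage with a fake clock so the example is deterministic.
fake_now = [0.0]
counter = iter(range(1_000))
pool = ImagePool(generate=lambda: f"img-{next(counter)}", size=3, ttl_s=60.0,
                 clock=lambda: fake_now[0])
first = pool.refresh()   # fills the empty pool
fake_now[0] = 61.0
second = pool.refresh()  # all three images expired and are replaced
print(first, second, pool.ids())
```

Injecting the clock keeps the rotation logic testable; in production, `time.monotonic` is used by default.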
This integration proves highly effective, with SD surpassing traditional methods in security, efficiency, and user experience. It significantly speeds up CAPTCHA image production, boosting system responsiveness. By combining SD and Generative Adversarial Networks (GANs), the resulting CAPTCHAs are resilient against advanced cracking tactics, marking a leap forward in bot detection and prevention strategies.
The SD model frames image generation as a diffusion process that progressively removes noise. Starting from random Gaussian noise, it applies a denoising network learned during training, step by step, until the noise is gone. The result is a visual that closely matches the text prompt. However, this iterative denoising is resource-intensive, especially for high-resolution images. That makes it hard to allocate computational resources efficiently and to scale GPU utilization.
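A rough calculation shows why resolution matters so much. It assumes SD v1.x, where the latent side is the image side divided by 8 and self-attention runs at the full latent resolution in the U-Net's outermost blocks. Under that assumption:

- Convolution-like work per step grows with the number of latent positions.
- Self-attention work grows with the square of that number.
- Total cost also scales with the number of denoising steps.

```python
# Relative per-step cost of SD denoising at different square resolutions.
# Assumes SD v1.x: latent side = image side / 8, and self-attention over all
# latent positions in the highest-resolution U-Net blocks.
DOWNSAMPLE = 8


def latent_tokens(side: int) -> int:
    """Number of spatial positions in the latent of a square image."""
    return (side // DOWNSAMPLE) ** 2


def relative_step_cost(side: int, base_side: int = 512) -> tuple[float, float]:
    """(linear-scaling, attention-scaling) cost of one step relative to base_side."""
    ratio = latent_tokens(side) / latent_tokens(base_side)
    return ratio, ratio ** 2


for side in (512, 768, 1024):
    linear, attention = relative_step_cost(side)
    print(f"{side}px: {latent_tokens(side)} positions, "
          f"linear x{linear:.2f}, attention x{attention:.2f}")
```

Going from 512 to 1024 pixels per side therefore multiplies the per-step work by roughly 4 to 16, depending on which layers dominate. A 50-step generation pays that cost 50 times.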
To address these challenges, we've identified three strategic objectives:
In response, we've developed a model service architecture utilizing Ray and Kubernetes (K8s):
This framework enables the deployment of a model service with a lean codebase, substantially curtailing both memory usage and computational expenses.
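One standard technique for raising GPU utilization in a model service is dynamic request batching. Concurrent requests are held briefly and run through the model as a single batch. Ray Serve exposes this as the `@serve.batch` decorator, with `max_batch_size` and `batch_wait_timeout_s` parameters. To stay dependency-free, the sketch below reimplements the idea with `asyncio`. `fake_generate` is a hypothetical stand-in for a batched SD call, and whether GeeTest's service batches requests this way is an assumption.

```python
import asyncio
from typing import Awaitable, Callable

BatchFn = Callable[[list[str]], Awaitable[list[str]]]


class MicroBatcher:
    """Group concurrent requests into batches, similar in spirit to Ray Serve's @serve.batch."""

    def __init__(self, batch_fn: BatchFn, max_batch_size: int = 4, wait_timeout_s: float = 0.01):
        self.batch_fn = batch_fn
        self.max_batch_size = max_batch_size
        self.wait_timeout_s = wait_timeout_s
        self.queue = asyncio.Queue()
        self._worker = None

    async def submit(self, prompt: str) -> str:
        """Enqueue one request and wait for its result."""
        if self._worker is None:
            self._worker = asyncio.create_task(self._run())
        future = asyncio.get_running_loop().create_future()
        await self.queue.put((prompt, future))
        return await future

    async def _run(self) -> None:
        loop = asyncio.get_running_loop()
        while True:
            batch = [await self.queue.get()]             # block until a request arrives
            deadline = loop.time() + self.wait_timeout_s
            while len(batch) < self.max_batch_size:      # then gather more, briefly
                remaining = deadline - loop.time()
                if remaining <= 0:
                    break
                try:
                    batch.append(await asyncio.wait_for(self.queue.get(), remaining))
                except asyncio.TimeoutError:
                    break
            try:
                results = await self.batch_fn([prompt for prompt, _ in batch])
            except Exception as exc:                     # fail every caller in the batch
                for _, future in batch:
                    future.set_exception(exc)
                continue
            for (_, future), result in zip(batch, results):
                future.set_result(result)


calls: list[int] = []


async def fake_generate(prompts: list[str]) -> list[str]:
    """Hypothetical stand-in for one batched SD forward pass."""
    calls.append(len(prompts))
    await asyncio.sleep(0)
    return [f"image-for:{p}" for p in prompts]


async def main() -> list[str]:
    batcher = MicroBatcher(fake_generate, max_batch_size=4)
    return await asyncio.gather(*(batcher.submit(f"prompt {i}") for i in range(8)))


results = asyncio.run(main())
print(calls, results[0])
```

The short wait window trades a few milliseconds of latency for fewer, larger GPU calls. This is usually worthwhile when each denoising pass is expensive.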
Additionally, for the collective management and generation of CAPTCHA image sets, we've crafted a suite of functional interfaces around ray.serve and the SD model's framework. These interfaces are dedicated to managing the prompt database and streamlining the automated pipeline production of images.
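As a hypothetical illustration of such interfaces, the sketch below stores prompt fragments in a small SQLite table. It then expands them into generation jobs, one per prompt and seed, that a pipeline could submit to the SD service. The schema, function names, and prompts are assumptions; the prompts simply echo themes mentioned earlier in this article.

```python
import itertools
import sqlite3
from typing import Iterator


def build_prompt_db() -> sqlite3.Connection:
    """Create an in-memory prompt database with a hypothetical schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE prompts (id INTEGER PRIMARY KEY, theme TEXT, subject TEXT, style TEXT)"
    )
    conn.executemany(
        "INSERT INTO prompts (theme, subject, style) VALUES (?, ?, ?)",
        [
            ("dessert", "blueberry tart on a wooden table", "soft window light"),
            ("dessert", "slice of black forest cake", "studio photo"),
            ("travel", "empty beach under a clear sky", "wide-angle photo"),
        ],
    )
    return conn


def generation_jobs(conn: sqlite3.Connection, theme: str, seeds_per_prompt: int) -> Iterator[dict]:
    """Yield one job per (prompt, seed) pair for the given theme."""
    rows = conn.execute(
        "SELECT id, subject, style FROM prompts WHERE theme = ? ORDER BY id", (theme,)
    ).fetchall()
    for (prompt_id, subject, style), seed in itertools.product(rows, range(seeds_per_prompt)):
        yield {"prompt_id": prompt_id, "prompt": f"{subject}, {style}", "seed": seed}


conn = build_prompt_db()
jobs = list(generation_jobs(conn, "dessert", seeds_per_prompt=2))
print(len(jobs), jobs[0])
```

Fixing the seed per job makes each generated image reproducible, which helps when a CAPTCHA image has to be regenerated or audited.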
The open-source Stable Diffusion model, a standout in latent diffusion technology, eclipses competitors like DALL·E and Midjourney with its rapid development and versatility. Its integration across various platforms and access to numerous pre-trained models highlight its adaptability. The community’s active engagement has propelled SD to the forefront of diverse image generation.
SD's innovation extends beyond image creation to revolutionize human-computer interaction. Utilizing latent diffusion and Generative Adversarial Networks, it excels in producing complex, realistic CAPTCHA images, enhancing security and user experience. This advancement positions SD to bring transformative changes in digital security across industries, marking an exciting era of technological evolution.
GeeTest