Web scraping is legal: how do websites limit bot scraping now?


21 Apr 2022 • 10 min read


Scraping publicly accessible data is legal, according to a ruling by the U.S. Court of Appeals for the Ninth Circuit. The landmark ruling immediately raised public concerns about privacy and security, and online businesses are likely to face greater data exposure and potential customer loss in the near future.

“Helpless” Organizations against web scraping

In May 2017, LinkedIn sent hiQ a cease-and-desist letter demanding that it stop unauthorized access and data scraping. hiQ then filed a lawsuit in the U.S. District Court for the Northern District of California, claiming that LinkedIn’s actions violated its freedom of speech. The resulting case, "hiQ Labs Inc v. LinkedIn Corporation", has kept LinkedIn in litigation for five years.

The case reached the U.S. Supreme Court in 2021, which sent it back to the Ninth Circuit Court of Appeals for re-examination. On April 18, 2022, in a second ruling, the court reaffirmed its original decision, saying that scraping publicly available data on the Internet does not violate the U.S. Computer Fraud and Abuse Act (CFAA). LinkedIn spokesman Greg Snapper said in a statement that "We are disappointed by the court's decision, which is only a preliminary ruling and the case is far from over."

Evidently, even LinkedIn, a Silicon Valley giant, cannot rely on the law alone to protect itself against web scraping. More US companies are likely to encounter the same problem in the near future.

Rising web scraping attacks

According to Akamai, crawlers account for nearly 40% of global Internet traffic. In the second quarter of 2021, the number of account abuse attacks worldwide reached 70 billion, a year-on-year increase of 15%. Even more worrying, days on which malicious logins peak at more than 1 billion are becoming more frequent. According to Check Point Research, the number of weekly cyberattacks on businesses increased by 50% in 2021 compared to 2020.


In addition to growing in scale, bot attacks are also becoming more diverse. Credential stuffing, credit card fraud, inventory hoarding, scalper bots, and gift card fraud have all become common. Bot scraping also consumes an enterprise's bandwidth and ties up server resources. If the server has no spare capacity reserved for concurrent requests, normal business suffers: pages slow down, and the server may even crash.
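One common way to keep a single aggressive client from exhausting server resources is per-client rate limiting. The following is a minimal sketch of a sliding-window limiter, not a description of any vendor's product; the class name `SlidingWindowLimiter`, its parameters, and the choice of what identifies a client (an IP address, an account ID, and so on) are assumptions made for illustration.

```python
import time
from collections import defaultdict, deque
from typing import Optional


class SlidingWindowLimiter:
    """Hypothetical helper: allow at most `max_requests` per client
    within any `window_seconds`-long window."""

    def __init__(self, max_requests: int, window_seconds: float) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        # client id -> timestamps of that client's recently accepted requests
        self._hits = defaultdict(deque)

    def allow(self, client_id: str, now: Optional[float] = None) -> bool:
        """Return True if the request may proceed, False if it should be throttled."""
        if now is None:
            now = time.monotonic()
        hits = self._hits[client_id]
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True


if __name__ == "__main__":
    limiter = SlidingWindowLimiter(max_requests=3, window_seconds=10.0)
    for t in (0.0, 1.0, 2.0, 3.0):
        print(t, limiter.allow("203.0.113.7", now=t))
```

A limiter like this only caps request volume per identifier. Bots that rotate IP addresses or accounts can stay under the threshold, which is one reason rate limiting is usually combined with other checks such as CAPTCHA.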

In today's fiercely competitive Internet landscape, bot scraping has become a primary threat to enterprises' online assets.

Anti-scraping technique - CAPTCHA

The fight against web scraping has never stopped, and anti-scraping techniques are everywhere in daily life. For example, when browsing a website or opening an app, we often encounter CAPTCHAs. In most cases, CAPTCHAs appear at registration or login, or while accessing a page, for example as a small window that pops up while watching live videos or playing games. Behind them is the battle between enterprises and scraping bots. As a tool for distinguishing humans from bots, CAPTCHA has become one of the most common solutions.

Today, crawler tools are widely available, which lowers the barrier to entry for attackers. In the face of threats caused by bot scraping, enterprises face three challenges:

Data scraping by competitors

Leakage of users' private data and account theft

Keeping disturbance to users slight, or even zero

Therefore, as bot scraping increases, balancing user experience with data security while keeping the business growing has become a major challenge for enterprises.

GeeTest always pursues a balance between security and user experience. GeeTest Adaptive CAPTCHA adopts 7-layer active dynamic protection and defeats bot attacks through constant change. There are as many as 4,374 changes in a unit cycle, which raises attackers’ costs.

In terms of user experience, the moment a user clicks the CAPTCHA, GeeTest begins processing more than 200 single-dimensional policy checks. On average, users pass verification in just 1.4s. GeeTest also offers an invisible CAPTCHA, which detects bots with zero user friction.


GeeTest Adaptive CAPTCHA helps companies deal with complex and ever-changing bot threats, assists in monitoring web traffic, and effectively mitigates suspicious traffic. With a good user experience and efficient, stable security capabilities, GeeTest CAPTCHA is widely used by leading enterprises in various industries.

Final words

At a time when online competition is becoming increasingly fierce, the competition for online assets is bound to intensify, and conflicts of interest around web scraping will grow.

Commercial interests and fair competition need to be balanced. On the one hand, the data and hard-won resources accumulated by online platforms deserve protection. On the other hand, they should not be protected so heavily that they form a monopoly, and business interests should not override public interests. Platforms should respect users’ choices, open exchange, data sharing, and data security. For enterprises, when the definition of data protection is ambiguous, or the data falls outside the scope of legal protection altogether, a powerful anti-scraping tool becomes the last line of defense.

As a security service provider with about 10 years of experience fighting cyberattacks, GeeTest has become the choice of 320,000 companies around the world, providing more than 1.6 billion security protections every day to protect enterprises' online assets.


Hayley Hong

Content Marketing @ GeeTest
