
https://research.aimultiple.com/web-scraping-ethics/

Introduction
If you are scraping the web, you have probably already seen how it benefits
your business. If your website is being scraped, you may be angry about bots
consuming your bandwidth and about your information benefiting others. Web
scraping often sounds like it only benefits those doing the scraping. So it
raises questions: Is it legal? Can your specific use case violate the rules?
Even if it is legal, is it considered ethical? Would it harm your business's
reputation?
In this article, we give a short summary of major web scraping lawsuits, the
latest legal status by country, and common dos and don’ts for using web
scraping as legally and ethically as possible.
Please note that this article is for informational purposes and should not be
taken as legal advice.

1. First things first: Is web scraping legal?


The short answer is yes. Scraping publicly available information on the web
in an automated way is legal as long as the scraped data is not used for any
harmful purpose or to directly attack the scraped website’s business or
operations. A caveat applies if the scraped data is personally identifiable
information (PII). Many countries have data protection regulations covering
PII, the major ones being GDPR in the EU and CCPA in California. The US does
not yet have a single federal regulation on PII, but a combination of
different laws and state-level regulations often protects it in practice. It
is therefore important not to scrape personally identifiable information or,
if it is scraped, to mask and protect it with privacy-enhancing technologies.
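
As an illustration of the masking mentioned above, here is a minimal Python
sketch that replaces email addresses in scraped text with a salted hash
token. The function name mask_emails, the salt value and the simplified email
pattern are assumptions made for this example only. Hashing of this kind is
pseudonymization rather than full anonymization, so treat it as one possible
safeguard, not a compliance guarantee.

```python
import hashlib
import re

# Simplified email pattern (an assumption for this sketch; real-world
# addresses can be more varied than this expression covers).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def mask_emails(text: str, salt: str = "example-salt") -> str:
    """Replace every email address in text with a short salted hash token."""

    def _token(match: re.Match) -> str:
        # Lowercase first so the same address always maps to the same token.
        value = (salt + match.group(0).lower()).encode("utf-8")
        digest = hashlib.sha256(value).hexdigest()
        return f"<email:{digest[:10]}>"

    return EMAIL_RE.sub(_token, text)


masked = mask_emails("Contact jane.doe@example.com for details")
print(masked)
```

Because the same address always produces the same token, records can still be
joined or counted after masking without keeping the raw address.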

2. History of major web scraping lawsuits


Although web scraping is legal on the surface, companies do not want to be
scraped. If a platform can show that being scraped by a bot damages its
infrastructure or operations, a court may find that activity illegal. Here,
we have collected the most significant lawsuits, in most of which the court
sided with the scraped website. Businesses should keep in mind that, without
an overarching law, cases similar to those below may not end with the same
outcome, since each one is evaluated on a case-by-case basis.

1. eBay vs Bidder’s Edge Case: One of the earliest publicly known web
scraping lawsuits was filed by eBay against Bidder’s Edge, an online
price comparison website for consumers, in 2000. The court order
prevented Bidder’s Edge from scraping eBay content again. eBay’s
winning argument was that Bidder’s Edge was straining its systems and
that others following Bidder’s Edge could cause even more harm to
them.
2. Facebook vs Power Ventures Case: In 2009, Facebook sued
Power Ventures for scraping content that its users had uploaded to its
websites. This set an example of a case where web scraping was
evaluated from an intellectual property standpoint. The court sided with
Facebook and ordered Power Ventures to pay a financial penalty.
3. LinkedIn vs hiQ Labs Case: The most recent and still unsettled web
scraping case involves LinkedIn and hiQ Labs, a data analytics
company that scraped publicly available LinkedIn profiles for
professional skill analysis. In 2019, a federal appeals court ruled in
favor of hiQ Labs, but LinkedIn appealed to the Supreme Court, which
sent the case back for reconsideration in June 2021. This case could be
quite influential for the future of web scraping in the US, because the
initial ruling “legalizing it” did not get straightforward approval
from the most powerful court of the country.

3. Latest regulations of Web Scraping by Country


United States: There are no federal laws against web scraping in the
United States as long as the scraped data is publicly available and the
scraping activity does not harm the website being scraped. One specific act
from 2016, the BOTS Act, targets the use of bots to purchase an excessive
number of tickets at once, in order to prevent black markets.

European Union and the UK: The EU recently passed the Digital Services
Act, which aims to bring all EU countries under a Digital Single Market
sharing the same regulations. According to Articles 3 and 4 of this
regulation, “reproduction of publicly available content” is not illegal. This
regulation approaches the topic more from an intellectual property point of
view, and web scraping that involves personal data would still be illegal
under GDPR. Apart from that, the situation in EU markets and the UK is
similar to the US.

China: Based on English-language sources, China also has no direct
regulation against web scraping. As in other countries, web scraping appears
to be used for business use cases in China, and it is not legal to scrape and
process personal data.

4. Dos and Don’ts of Legal and Ethical Web Scraping


From a legal standpoint, one question businesses should ask themselves is
whether their scraping harms the scraped website. If the scraping activity is
intense enough to interrupt the services of the scraped website, or if the
scraped data is used to duplicate the activity or service of that website,
then the website would have grounds to file a lawsuit against the scraper
even though no specific regulations exist.

From an ethical standpoint, given that web scraping already has many use
cases and professional providers in the market, we can claim that there is
no shame in using web scraping for business purposes. There are technical
best practices that ease the traffic load on the scraped website, such as
using APIs rather than web scraping when they are available. As long as you
find a trusted web crawler to work with, or make sure your technical team
takes these practices into consideration, you can defend your web scraping
as ethical for your business purposes.

Dos:

• Scrape only the data you need by determining the exact business
case and customizing your web crawler for it. This will minimize your
risk of overloading the scraped website with unnecessary traffic.
• Always read the terms of use of the scraped website. Apart from
commercial terms of use, websites also have a technical file called
robots.txt that tells crawlers which parts of the site they may access.
Your web crawling solution or technical experts should help you make
use of it in the process.
• Be transparent about your web scraping if you need to address it in
public. As long as you generate more value out of the scraped data
rather than duplicating it, there is no shame.
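
To illustrate the robots.txt check, here is a short Python sketch based on
the standard library's urllib.robotparser module. The sample rules, the
crawler name example-bot and the example.com URLs are assumptions made for
this illustration. The rules are parsed from a string so that the sketch runs
without network access.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content used only for this illustration.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Build a parser from robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(SAMPLE_ROBOTS)

# Ask whether the hypothetical crawler may fetch each URL.
print(parser.can_fetch("example-bot", "https://example.com/products"))
print(parser.can_fetch("example-bot", "https://example.com/private/report"))

# Delay in seconds requested by the site, or None if it does not set one.
print(parser.crawl_delay("example-bot"))
```

Against a live site, you would instead call parser.set_url() with the address
of the site's robots.txt file and then parser.read() before checking URLs.
Keep in mind that robots.txt complements the terms of use and does not
replace them.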

Don’ts:

• Do not overload the scraped website with overly frequent or extensive
pulls. Doing so will also increase the likelihood that your crawler will
be blocked by the scraped website.
• Do not collect personally identifiable information or, if the site's
robots.txt permits you to collect it, make sure to mask the data to
minimize exposure during processing.
• Do not expose the scraped data to the public. Make sure that it is
stored securely, just like your own company data. You never know for
what purposes it may be used if leaked.
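
One way to avoid overloading a website is to enforce a minimum delay between
consecutive requests. The Python sketch below shows a small throttle for that
purpose. The class name Throttle and the two-second interval are assumptions
for this example, and a suitable delay depends on the website being scraped.

```python
import time
from typing import Callable, Optional


class Throttle:
    """Enforce a minimum interval between consecutive requests."""

    def __init__(
        self,
        min_interval: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None

    def wait(self) -> float:
        """Pause until min_interval has passed since the previous call.

        Returns the number of seconds slept.
        """
        slept = 0.0
        if self._last is not None:
            remaining = self.min_interval - (self._clock() - self._last)
            if remaining > 0:
                self._sleep(remaining)
                slept = remaining
        self._last = self._clock()
        return slept


# Hypothetical usage: call wait() before every request to the same site.
throttle = Throttle(min_interval=2.0)
# for url in urls_to_fetch:
#     throttle.wait()
#     ...fetch url with the HTTP client of your choice...
```

The clock and sleep parameters exist only so that the delay logic can be
tested without actually waiting.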

Sponsored:
If you partner with a service provider for web scraping, make sure to
leverage their technical expertise and legal experience. For example, Bright
Data assigns a dedicated compliance officer to its customers so that they
have no open questions about the legal processes of web scraping along the
way.

This article was drafted by former AIMultiple industry analyst Bengüsu Özcan.
