on
DATA SCRAPING
Submitted by
SUDHANSHU SHEKHAR
In partial fulfillment of the requirements for the award of the degree of Bachelor
of Technology
in
APRIL 2024
DEPARTMENT OF COMPUTER SCIENCE AND TECHNOLOGY
GITA Autonomous College Bhubaneswar
Badaraghunathpur, Madanpur, Bhubaneswar
CERTIFICATE
1.0 INTRODUCTION
3.0 APPLICATIONS
7.0 CONCLUSIONS
Data scraping, also known as web scraping or data extraction, refers to the
automated process of extracting data from websites and other online
sources. It involves using software tools or programming scripts to access
web pages, retrieve specific information, and store it in a structured format
for further analysis or use.
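The three steps named above (access a page, retrieve specific information, store it in a structured format) can be sketched in a few lines of Python using only the standard library. This is a minimal illustration, not a production scraper: the sample HTML, the class name LinkExtractor, and the helper names scrape_links and to_csv are all assumptions made for the example, and the page is embedded as a string so that the sketch runs without network access.

```python
import csv
import io
from html.parser import HTMLParser

# Stand-in for a downloaded page (assumption: a real scraper would fetch
# this over HTTP instead of embedding it).
SAMPLE_HTML = """
<html><body>
  <h1>Products</h1>
  <ul>
    <li class="item"><a href="/p/1">Keyboard</a></li>
    <li class="item"><a href="/p/2">Mouse</a></li>
  </ul>
</body></html>
"""


class LinkExtractor(HTMLParser):
    """Collects (text, href) pairs for every <a> element that has an href."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the <a> currently being read, if any
        self._text = []     # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._text).strip(), self._href))
            self._href = None


def scrape_links(html):
    """Retrieve step: parse the HTML and return the extracted links."""
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.links


def to_csv(rows):
    """Store step: serialise the rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["text", "href"])
    writer.writerows(rows)
    return buffer.getvalue()


links = scrape_links(SAMPLE_HTML)
print(to_csv(links))
```

In a real setting the embedded string would be replaced by the body of an HTTP response, and the CSV text would be written to a file or database rather than printed.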
Data scraping techniques vary depending on the type of data and the
structure of the website. Common methods include:
Data scraping possesses several characteristics that define its nature and
functionality. These characteristics collectively contribute to the
effectiveness and utility of data scraping as a tool for accessing, analysing,
and leveraging data from online sources for various purposes.
1. Web Scraping: This type involves extracting data from web pages by parsing
their HTML. Web scraping tools can navigate through websites, locate specific
elements, and extract desired information such as text, images, and links.
2. API Scraping: Many websites and online platforms provide access to their data
through APIs (Application Programming Interfaces). API scraping involves
making requests to these APIs and retrieving structured data in a standardized
format, such as JSON or XML.
3. Screen Scraping: Screen scraping techniques are used to capture data displayed
within graphical user interfaces (GUIs), such as desktop applications, web
browsers, or mobile apps. Screen scraping tools can simulate user interactions
and extract data from the screen.
4. Text Scraping: Text scraping focuses on extracting textual information from
documents, PDF files, or unstructured text sources. Text scraping techniques
involve parsing through text documents to identify and extract relevant data
based on predefined criteria.
5. Image Scraping: Image scraping involves extracting data from images or
graphics, such as text within images, metadata, or visual patterns. Image scraping
tools can analyse images using techniques like Optical Character Recognition
(OCR) to extract textual information.
6. Social Media Scraping: Social media scraping involves extracting data from
social media platforms, such as Facebook, Twitter, LinkedIn, and Instagram. This
type of scraping can gather various types of data, including user profiles, posts,
comments, likes, and shares.
7. E-commerce Scraping: E-commerce scraping focuses on extracting data from
online retail platforms, such as product listings, prices, descriptions, reviews, and
ratings. E-commerce scraping enables price monitoring, competitive analysis,
and market research.
8. Financial Scraping: Financial scraping involves extracting data from financial
websites, stock exchanges, and market data providers. This type of scraping can
gather financial data, stock prices, market indices, economic indicators, and news
relevant to financial analysis and investment decisions.
9. Geospatial Scraping: Geospatial scraping involves extracting data from
geographic information systems (GIS), maps, and spatial databases. This type of
scraping can gather spatial data, coordinates, locations, addresses, and other
geospatial information.
10. Structured Data Scraping: Structured data scraping focuses on extracting data
from databases, spreadsheets, or other structured data sources. Automating
extraction from these formats enables efficient data retrieval and analysis.
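To make item 2 (API Scraping) more concrete, the sketch below shows how a JSON response might be turned into uniform records once it has been retrieved. It is an illustration under stated assumptions: the payload, its field names (products, id, name, price, rating), and the helpers parse_products and cheapest are hypothetical and do not describe any particular platform's API. The response is embedded as a string so the example runs offline.

```python
import json

# Hypothetical API response body (assumption: a real client would obtain
# this text from an HTTP request to the provider's documented endpoint).
SAMPLE_RESPONSE = """
{
  "products": [
    {"id": 1, "name": "Keyboard", "price": "1499.00", "rating": 4.5},
    {"id": 2, "name": "Mouse", "price": "599.50", "rating": null}
  ]
}
"""


def parse_products(payload):
    """Decode the JSON text and normalise each product into a flat dict."""
    data = json.loads(payload)
    rows = []
    for item in data.get("products", []):
        rows.append({
            "id": item["id"],
            "name": item["name"],
            "price": float(item["price"]),   # prices arrive as strings here
            "rating": item.get("rating"),    # may be missing or null
        })
    return rows


def cheapest(rows):
    """Return the record with the lowest price."""
    return min(rows, key=lambda row: row["price"])


rows = parse_products(SAMPLE_RESPONSE)
print(cheapest(rows)["name"])
```

Because the API already returns structured data, no HTML parsing is needed; the work reduces to decoding, type conversion, and handling missing values, which is the main practical difference from item 1.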
APPLICATIONS OF DATA SCRAPING