This seminar report discusses web scraping. It introduces web scraping as a technique to automatically extract data from websites. It outlines some common uses of web scraping, including price monitoring, market research, news monitoring, and sentiment analysis. The report then describes techniques for web scraping, including DOM parsing and HTML parsing. It provides an overview of the procedure for web scraping using libraries like Requests, Beautiful Soup, and Pandas. Finally, it summarizes that web scraping is a useful technique for extracting data from websites and analyzing extracted information.

Seminar report – 5th Semester

WEB SCRAPING

A Seminar Report

Submitted by

RAVI KUMAR
[20106107028]

in partial fulfilment for the award of the degree

Bachelor of Technology
IN
BRANCH OF STUDY
At

Department of Information Technology

Muzaffarpur Institute of Technology, Muzaffarpur
June 2023

Dept. of IT, MIT Muzaffarpur


ACKNOWLEDGEMENT

I particularly want to thank Sudhir Kumar for his support and encouragement throughout the completion
of this seminar topic and for having faith in us. I also wish to thank Sudhir Kumar for his continuing
support and encouragement.

Ravi Kumar
Roll No.: 20IT31
University Reg. No.: 20106107028
Session: 2020-24
Sem.: 5th


TABLE OF CONTENTS

1. INTRODUCTION

2. USES OF WEB SCRAPING

3. TECHNIQUES

4. PROCEDURE

5. SUMMARY

6. REFERENCES


INTRODUCTION

Web scraping is a technique for fetching data from websites. Many websites do not let the user save
data for personal use. One option is to copy and paste the data manually, which is both tedious and
time-consuming. Web scraping automates the process of extracting data from websites. This is done with
the help of web scraping software known as web scrapers, which automatically load websites and extract
data from them based on user requirements. A scraper can be custom built to work for one site or
configured to work with any website.

USES OF WEB SCRAPING

Web scraping has many uses at both the professional and the personal level. Needs differ between
these levels, but some popular uses of web scraping are:

• Price Monitoring
• Market Research
• News Monitoring
• Sentiment Analysis
• Email Marketing


TECHNIQUES

Web scraping is the process of automatically mining data or collecting information from the World
Wide Web. Some websites use methods to prevent web scraping, such as detecting bots and disallowing
them from crawling (viewing) their pages. In response, some web scraping systems rely on techniques
such as DOM (Document Object Model) parsing, computer vision and natural language processing to
simulate human browsing, so that web page content can be gathered for offline parsing. Current web
scraping solutions range from the ad hoc, requiring human effort, to fully automated systems that can
convert entire websites into structured information, with limitations. Common techniques include:

• Human copy-and-paste

• Text pattern matching

• HTTP programming

• HTML parsing

• DOM parsing
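
To make two of the techniques listed above more concrete, the following is a minimal sketch that contrasts text pattern matching with HTML parsing, using only the Python standard library. The page fragment, the class names `name` and `price`, and the `NameCollector` helper are assumptions made up for illustration; they do not come from any real site or from this report.

```python
import re
from html.parser import HTMLParser

# Hypothetical page fragment, used only for illustration.
SAMPLE_HTML = """
<ul>
  <li class="item"><span class="name">Pen</span> <span class="price">Rs. 10</span></li>
  <li class="item"><span class="name">Notebook</span> <span class="price">Rs. 45</span></li>
</ul>
"""

# Text pattern matching: a regular expression pulls the prices straight
# out of the raw text, without understanding the HTML structure.
PRICE_PATTERN = re.compile(r"Rs\.\s*(\d+)")
prices = [int(value) for value in PRICE_PATTERN.findall(SAMPLE_HTML)]


class NameCollector(HTMLParser):
    """HTML parsing: collect the text of every <span class="name"> element."""

    def __init__(self):
        super().__init__()
        self.names = []
        self._in_name = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (attribute, value) pairs.
        if tag == "span" and ("class", "name") in attrs:
            self._in_name = True

    def handle_endtag(self, tag):
        if tag == "span":
            self._in_name = False

    def handle_data(self, data):
        if self._in_name:
            self.names.append(data.strip())


collector = NameCollector()
collector.feed(SAMPLE_HTML)
names = collector.names

print(prices)
print(names)
```

Pattern matching is quick to write but can break when the page text changes, while a parser follows the tag structure and is usually less fragile when only the surrounding layout changes.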

PROCEDURE

The libraries we can use for this project are:

• Requests Library

• Beautiful Soup Library

• Pandas
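
One way the three libraries listed above might fit together is sketched below: Requests downloads the page, Beautiful Soup parses the HTML, and Pandas holds the result as a table. This is only an illustrative sketch, not the procedure used in the report. The sample HTML, the CSS classes `product`, `title` and `price`, and the helper names `fetch_html`, `parse_products` and `to_table` are all assumptions. The sketch parses an inline sample so that it runs without network access.

```python
import pandas as pd
import requests
from bs4 import BeautifulSoup

# Hypothetical markup standing in for a downloaded product page.
SAMPLE_HTML = """
<div class="product"><h2 class="title">Pen</h2><span class="price">10</span></div>
<div class="product"><h2 class="title">Notebook</h2><span class="price">45</span></div>
"""


def fetch_html(url: str) -> str:
    """Download a page with Requests and return its HTML text."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()  # raise an exception on HTTP error codes
    return response.text


def parse_products(html: str) -> list[dict]:
    """Parse the HTML with Beautiful Soup and return one dict per product."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for product in soup.select("div.product"):
        title = product.select_one("h2.title")
        price = product.select_one("span.price")
        if title is None or price is None:
            continue  # skip entries that lack the expected fields
        rows.append(
            {
                "title": title.get_text(strip=True),
                "price": float(price.get_text(strip=True)),
            }
        )
    return rows


def to_table(rows: list[dict]) -> pd.DataFrame:
    """Load the extracted rows into a Pandas DataFrame for analysis."""
    return pd.DataFrame(rows, columns=["title", "price"])


# Parse the inline sample; no network request is made here.
table = to_table(parse_products(SAMPLE_HTML))
print(table)
```

To scrape a live page, the result of `fetch_html(url)` would be passed to `parse_products` in place of `SAMPLE_HTML`, with the selectors adjusted to that page's actual markup and with the site's terms of use respected.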


SUMMARY

Web scraping is an interesting and extremely popular technique that is quite handy to learn. There
are several other libraries apart from Beautiful Soup. Scrapy is a very popular open-source web
crawling framework that is also written in Python. It is well suited to web scraping and to extracting
data using APIs. Beautiful Soup is used to create a parse tree and extract data from the HTML of a
webpage.

REFERENCES

https://www.google.com
https://www.flipkart.com/
