
Web Scraping using Python

Annexure I
Part A – Plan
“Web Scraping Using Python”

1.0 Aim / Benefits of the Micro-Project:-


 The main aim of the project is to acquire non-tabular or poorly structured data from
websites and convert it into a usable, structured format, such as a .csv file or spreadsheet.
 To learn the uses of web scraping and the techniques used to perform it, which may
improve our data collection and analysis process.
 To learn different Python libraries and how to use these libraries in our project.

2.0 Course Outcomes Addressed:-


 Display a message on screen using a Python script in an IDE.

3.0 Proposed Methodology:-

 Define Project Objectives: Clearly outline the goals and objectives of the web scraping project.
Determine the specific data you need to extract and how you plan to use it.
 Identify Target Websites: Identify the websites from which you want to scrape data. Ensure that
scraping is allowed by reviewing the website's terms of service and robots.txt file.
 Choose Tools and Libraries: Select the appropriate tools and libraries for web scraping.
Commonly used libraries in Python include Beautiful Soup, Scrapy, and Selenium. Choose the one
that best fits your project requirements.
 Inspect Website Structure: Use browser developer tools to inspect the structure of the target
website. Identify the HTML elements containing the data you want to extract and determine their
CSS selectors or XPaths.
 Develop Scraping Code: Write Python code to navigate to the target website, locate the desired
HTML elements, and extract the data. Use the selected library to parse the HTML content and
retrieve the relevant information.
 Handle Pagination and Dynamic Content: If the data is spread across multiple pages or
requires interaction (e.g., clicking buttons), implement logic to handle pagination and dynamic
content. This may involve iterating through pages or simulating user interactions with the Selenium
library.
 Implement Error Handling: Implement robust error handling for exceptions and errors
that may occur during the scraping process. This includes handling network errors, timeouts, and
unexpected changes to the website structure.
 Test and Validate: Test the scraping code on a small scale to ensure it functions as expected.
Validate the extracted data to verify its accuracy and completeness.
 Run the Scraper: Once testing is successful, run the scraper to extract data from the target
website(s). Monitor the scraping process to ensure it runs smoothly and without interruptions.

GOVERNMENT POLYTECHNIC ACHALPUR Page 1



 Data Processing and Storage: Process the extracted data as needed, such as cleaning,
formatting, or transforming it into a usable format. Store the data in a suitable location, such as a
local file, database, or cloud storage service.
 Schedule and Automate: If the scraping process needs to be repeated regularly, consider
scheduling it to run automatically at predefined intervals. Use cron jobs or task schedulers to
automate the scraping process.
 Monitor and Maintain: Monitor the scraping process regularly to ensure it continues to function
properly. Update the scraping code as needed to adapt to changes in the website's structure or
behavior.
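The fetching, error-handling, and extraction steps above can be sketched as follows. This is a minimal illustration, not the project's actual code: the URL and the h1 selector are placeholders standing in for a real target site and the selectors found with browser developer tools.

```python
import requests
from bs4 import BeautifulSoup

URL = "https://example.com"  # placeholder target site


def fetch_page(url):
    """Fetch a page, returning its HTML or None on network errors/timeouts."""
    try:
        resp = requests.get(url, timeout=10)
        resp.raise_for_status()  # raise on 4xx/5xx status codes
        return resp.text
    except requests.RequestException as exc:
        print(f"Request failed: {exc}")
        return None


html = fetch_page(URL)
if html is not None:
    soup = BeautifulSoup(html, "html.parser")
    # Locate elements with a CSS selector identified via browser dev tools
    for heading in soup.select("h1"):
        print(heading.get_text(strip=True))
```

Because `fetch_page` catches `requests.RequestException`, the sketch degrades gracefully when the site is unreachable instead of crashing, which matches the error-handling step above.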

4.0 Action Plan:

Sr.  Details of Activity                          Planned Start  Planned Finish  Name of Responsible
No.                                               Date           Date            Team Member
1    Selection of the topic [1 hr]                15/02/2024     15/10/2023      Shaikh Altamash
2    Research about the resources [2 hrs]         16/02/2024     16/02/2024      Syed Kamran
3    Collection of resources [2 hrs]              17/02/2024     17/02/24        Mohammad Kashif
4    Creating logic to make the program [2 hrs]   18/02/2024     18/02/2024      Arpit Tatte
5    Writing the program [2 hrs]                  19/02/2024     19/02/2024      Shaikh Altamash
6    Executed the program [2 hrs]                 20/02/2024     20/02/2024      Mohammad Kashif
7    Fixed bugs and errors [2 hrs]                22/02/2023     22/02/2024      Afifullah Khan
8    Wrote the report [2 hrs]                     23/02/2024     23/02/2024      Shaikh Altamash
9    Submitted the micro-project [2 hrs]          06/04/24       06/04/24        Syed Kamran, Shaikh Altamash,
                                                                                 Mohammad Kashif, Afifullah Khan,
                                                                                 Arpit Tatte

5.0 Resources Used:-

Sr.  Resources Required   Specifications              Qty  Remarks
No.
1.   Computer System      8 GB RAM and i5 processor   1
2.   MS Word              Latest version              1
3.   IDE                  VS Code                     1
4.   Browser              Chrome                      1

 NAMES OF THE TEAM MEMBERS WITH ROLL NUMBERS:

Syed Kamran (62)
Mohammad Kashif Faraz (59)
Shaikh Altamash (68)
Afifullah Khan (61)
Arpit Tatte (51)
Ayush Khajone (66)


Annexure-II
Part B – Plan
“Web Scraping Using Python”

0.1 Rationale:-

Web scraping is an automated method used to extract large amounts of data from websites. The
Python community has come up with some pretty powerful web scraping tools. The Internet hosts
perhaps the greatest source of information on the planet. Many disciplines, such as data science,
business intelligence, and investigative reporting, can benefit enormously from collecting and
analyzing data from websites. Python is a powerful programming language, and its efficient
high-level data structures are useful for data scraping.

1.0 Aim / Benefits of the Micro-Project:-


 The main aim of the project is to acquire non-tabular or poorly structured data from
websites and convert it into a usable, structured format, such as a .csv file or spreadsheet.
 To learn the uses of web scraping and the techniques used to perform it, which may
improve our data collection and analysis process.
 To learn different Python libraries and how to use these libraries in our project.

2.0 Course Outcome Achieved:-

 Display a message on screen using a Python script in an IDE.

3.0 Proposed Methodology:-

1. Define Project Objectives


2. Identify Target Website
3. Choose Tools and Libraries
4. Inspect Website Structure
5. Develop Scraping Code
6. Handle Pagination and Dynamic Content
7. Run the Scraper
8. Data Processing and Storage
9. Schedule and Automate
10. Monitor and Maintain
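Step 6 above (handling pagination) can be sketched as a loop over numbered pages. The URL pattern and the `.item-title` selector are hypothetical, chosen only to illustrate the technique:

```python
import requests
from bs4 import BeautifulSoup

# Hypothetical paginated URL pattern; real sites vary
BASE = "https://example.com/listing?page={}"


def scrape_pages(max_pages=3):
    """Iterate over numbered pages and collect item titles."""
    titles = []
    for page in range(1, max_pages + 1):
        try:
            resp = requests.get(BASE.format(page), timeout=10)
            resp.raise_for_status()
        except requests.RequestException:
            break  # stop on network errors or missing pages
        soup = BeautifulSoup(resp.text, "html.parser")
        items = soup.select(".item-title")  # assumed selector
        if not items:
            break  # no matching items: last page reached
        titles.extend(t.get_text(strip=True) for t in items)
    return titles
```

For sites that load results with JavaScript or require clicking a "next" button, this loop would instead drive a browser through Selenium, as noted in the methodology.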

4.0 Referenced Websites:-


 https://nanonets.com/blog/web-scraping-with-python-tutorial/


5.0 Actual Resources Used:-

Sr.  Resources Required   Specifications              Qty  Remarks
No.
1.   Computer System      8 GB RAM and i5 processor   1
2.   MS Word              Latest version              1
3.   IDE                  VS Code                     1
4.   Browser              Chrome                      1

6.0 Output:-


7.0 Skills Developed / Learning Outcomes of the Micro-Project:-

 We used this study to improve our web scraping process, and we discovered that most
web scrapers are quite similar and general in nature, designed to carry out generic and
simple jobs.
 We are able to extract website data to an Excel sheet.
 We learned how to apply Python basics in our project.
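Writing the extracted data to a spreadsheet-compatible .csv file, as mentioned above, can be done with Python's standard csv module. The rows below are sample data standing in for scraped records:

```python
import csv

# Sample rows standing in for scraped data
rows = [
    {"title": "Post A", "link": "https://example.com/a"},
    {"title": "Post B", "link": "https://example.com/b"},
]

# newline="" prevents blank lines between rows on Windows
with open("scraped.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["title", "link"])
    writer.writeheader()    # column headers for the spreadsheet
    writer.writerows(rows)  # one row per scraped record
```

The resulting file opens directly in Excel or any spreadsheet application.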

8.0 Applications of this Micro-Project:-


 Web scraping is widely utilized for a variety of purposes, including comparing prices online,
observing changes in weather data, website change detection, research, integrating data from
multiple sources, extracting offers and discounts, scraping job posting information from job
portals, brand monitoring, and market analysis.


9.0 Source Code:-


import requests
from bs4 import BeautifulSoup

# Fetch the page and parse its HTML content
req = requests.get("https://www.geeksforgeeks.org/")
soup = BeautifulSoup(req.content, 'html.parser')

# prettify() returns the parse tree as a nicely indented string
print(soup.prettify())
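As a small extension of the source code above, specific elements can be extracted instead of printing the whole document. The offline fallback snippet is only there so the sketch still runs without a network connection:

```python
import requests
from bs4 import BeautifulSoup

# Fall back to a tiny HTML snippet if the network is unavailable
try:
    html = requests.get("https://www.geeksforgeeks.org/", timeout=10).content
except requests.RequestException:
    html = "<title>offline</title>"

soup = BeautifulSoup(html, "html.parser")

# Print the page title and the first few link targets instead of the raw HTML
print(soup.title.get_text(strip=True) if soup.title else "no <title>")
for a in soup.find_all("a", href=True)[:5]:
    print(a["href"])
```

`find_all("a", href=True)` matches only anchor tags that actually carry an href attribute, which avoids KeyError on bare `<a>` tags.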

