
MANGO DETAILS WEB SCRAPING

Project

Group Members:
1. Imran Ahsanullah
2. Sheeraz Ali
SCOPE

With more and more data being added to the world of the internet, the importance of web
scraping keeps increasing. Many companies now offer customized web scraping tools to their
clients, gathering data from all over the internet and arranging it into useful and easily
understandable form. This saves the precious man-power otherwise needed to manually visit each
website and collect its data. Web scrapers are designed and coded for each individual website,
while crawlers do broad scraping. If a website has a complicated structure, more coding is required
to scrape its data than for a simple one. The future of web scraping is indeed bright, and it
will become more and more essential for every business with the passage of time.

Ten different Mango websites will be scraped, and from them we will extract all the required
information, such as name, description, SKU, ID, images, features, and options, and save it to
files (CSV, XML, JSON, Excel). The goal is a solution that extracts content from the Mango
websites. The following Mango fields will be extracted (a small code sketch follows the field list):

name, sku, price, description

quantity or availability: If quantity is accessible, we extract it as is; if not, we determine
availability and, if the item is available, set quantity = 5, otherwise 0.

all images: All images will be scraped and we will save them as URLs.

features: Each feature will be extracted separately and saved to the appropriate columns or tags.

options (size, color, etc.): Each combination with a specific set of size or color will be saved
correctly, and all related images will be saved for such a combination.

categories with structure: The full category path will be extracted for each item to get the full
hierarchy of the source catalog.
WORKFLOW OF THE PROJECT

1. First of all, we will select the scraping category: whether we will scrape e-commerce
website information or information about businesses, schools, etc.
2. Select the different websites from which the required information will be extracted.
3. Analyze each website's interface to determine how it should be scraped.
4. Write the code for extracting the information from the different websites (a rough sketch
follows this list)
   a. There will be a different way/logic to extract the data from each website
5. Select the data storage type
   a. Create a database and the required tables
      i. Add the required columns and assign them proper data types based on the type
      of data
      ii. Insert the data into the tables
   b. Create a JSON file
   c. Create a CSV file
6. Create reports to display the stored/scraped data in an informative way
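
As a rough sketch of steps 4 and 5, assuming Python with the requests and BeautifulSoup libraries,
the snippet below shows one possible way to extract a product and save the results to JSON and CSV
files. The URL argument, CSS selectors, and file names are placeholders invented for illustration;
each real website will need its own selectors after the analysis in step 3:

import csv
import json

import requests
from bs4 import BeautifulSoup

def scrape_product(url: str) -> dict:
    # Step 4a: each website needs its own extraction logic; these selectors are placeholders.
    html = requests.get(url, timeout=30).text
    soup = BeautifulSoup(html, "html.parser")
    return {
        "name": soup.select_one("h1.product-name").get_text(strip=True),
        "price": soup.select_one("span.price").get_text(strip=True),
        "description": soup.select_one("div.description").get_text(strip=True),
        "images": ";".join(img["src"] for img in soup.select("img.product-image")),
    }

def save_records(records: list) -> None:
    # Steps 5b and 5c: write the same records to a JSON file and a CSV file.
    if not records:
        return
    with open("products.json", "w", encoding="utf-8") as f:
        json.dump(records, f, indent=2)
    with open("products.csv", "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=records[0].keys())
        writer.writeheader()
        writer.writerows(records)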

DISTRIBUTION OF CLASSES PLUS ALL MAIN FEATURES OF OOP


1. The categories and subcategories will be distributed into classes and subclasses: the main
categories will be treated as base classes, while the subcategories will be treated as
subclasses derived from those main classes.
2. Abstraction will be used to present the required, informative information in a well-structured
and easily understandable format built from the unstructured data.
3. Polymorphism will also be used: the updated data scraped from the websites will
override/overload the previously stored data in the required formats (see the sketch after this list).
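
A minimal Python sketch of this class distribution is given below; the Category, Clothing, and
MensClothing names are hypothetical examples chosen only to illustrate inheritance, abstraction,
and polymorphism, not actual categories from the project:

from abc import ABC, abstractmethod

class Category(ABC):
    # Abstraction: every main category exposes scrape() but hides how it works.
    @abstractmethod
    def scrape(self) -> list:
        ...

class Clothing(Category):
    # A main category treated as a class.
    def scrape(self) -> list:
        return [{"name": "example item"}]          # placeholder data

class MensClothing(Clothing):
    # A subcategory treated as a subclass derived from the main class.
    def scrape(self) -> list:
        # Overriding (polymorphism): subcategory-specific extraction logic.
        return [{"name": "example men's item"}]    # placeholder data

def refresh(categories: list) -> list:
    # Polymorphic call: the correct scrape() runs for each concrete category,
    # and the freshly scraped data replaces the previously stored data.
    records = []
    for category in categories:
        records.extend(category.scrape())
    return records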

ROLE OF GROUP MEMBERS


1. Sheeraz Ali
   a. Will select and analyze the websites
2. Imran Ahsanullah
   a. Will write the code and scrape the websites into the required formats
