
Web scraping and Content Summarising

In scope requirements

User Roles
1. Super Admin
2. Content Producer

Hemant> Need to have three roles: 1. Super Admin; 2. Content Admin, who will also have the ability to add new sites; 3. Content Producer/Writer. Besides, there will be multiple users, and some users can have roles 2 and 3 combined.
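
The three roles in the comment above, including one user holding roles 2 and 3 together, could be modelled as combinable flags. This is a minimal sketch only: the names Role, User, can_add_sites and can_summarize are hypothetical, and the permission rules are assumptions read from this document, not an agreed design.

```python
from dataclasses import dataclass
from enum import Flag, auto


class Role(Flag):
    """Roles from the requirements; a Flag lets one user hold several."""
    SUPER_ADMIN = auto()
    CONTENT_ADMIN = auto()
    CONTENT_PRODUCER = auto()


@dataclass
class User:
    username: str
    roles: Role


def can_add_sites(user: User) -> bool:
    # Assumption: Super Admin and Content Admin may both add new sites.
    return bool(user.roles & (Role.SUPER_ADMIN | Role.CONTENT_ADMIN))


def can_summarize(user: User) -> bool:
    # Assumption: only the Content Producer/Writer role runs summaries.
    return bool(user.roles & Role.CONTENT_PRODUCER)


# A user with roles 2 and 3 combined.
combined = User("writer_admin", Role.CONTENT_ADMIN | Role.CONTENT_PRODUCER)
writer_only = User("writer", Role.CONTENT_PRODUCER)
```

Here combined passes both checks, while writer_only can summarize but cannot add sites.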

Landing Page
1. User will be shown the login page on opening the website
Hemant> Needs to have SSL

Login Page
1. Login page will have Username and Password fields
2. A "Remember me" option will be provided
3. A "Reset password" option will be provided

Logged-in Users Landing Page

Hemant> Will there be an accuracy/quality indicator displayed on keywords, to help the user choose which keywords to input? Assuming he can change them on the fly.
1. Content Producer Landing Page
a. After login, the user will be redirected to the Home page
b. The Home page will have a search bar for keyword search, where the user will perform web scraping and text summarization
Hemant> How about exporting to Notepad, Word and PDF?
c. Option to visit the User Profile page to update their basic data
d. Link to see their previously summarized text and the keywords used

2. Super Admin
a. After login, the user will be redirected to the Dashboard
b. Link to add new users to the system
Hemant> Deactivate users too, with active-from date, deactivation date, remarks, etc.
c. Link to add new websites and data extraction configurations
d. Link to add new text summarization APIs
e. Text Summarization History & Report
f. Data extraction monitoring
i. API Responses
ii. Error Reporting
g. API configurations

Searching and Summarising the News for Content Producer

1. Search bar with option to Search by keywords


a. Search bar and a Submit button

2. Search Results
a. Results will be based on web scraping of the websites configured on the Super Admin page that are enabled
b. Results will be shown with title and description content, along with a source URL link
c. Results will have multi-select checkboxes, so users can select any combination of scraped content to be summarized
d. A Summarize button will be placed at the end of the page
e. On clicking the Summarize button after selecting the content to be summarized, the user will be taken to the Summarizing page
Hemant> Export to Notepad, Word and PDF to be there
Hemant> Can we check the final summary for plagiarism?
Summarizing Page

Hemant> We also need to have unique identifiers for outputs so that they can be referred to

1. Page will show summarized content from different APIs; which APIs are used can be set through backend configuration
2. Users will have the option to save the content as a draft and review it later
3. A View link will be provided to view previously saved content
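
One way to cover both the save-as-draft requirement and the request for unique output identifiers is to give every saved summary a generated ID. This is a minimal in-memory sketch; SummaryRecord, DraftStore and all field names are hypothetical, and a real system would persist the records in a database.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class SummaryRecord:
    keywords: list[str]
    source_urls: list[str]
    text: str
    status: str = "draft"  # assumption: "draft" until the user finalizes it
    # Random UUID so each output can be referred to unambiguously.
    summary_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    created_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )


class DraftStore:
    """In-memory stand-in for the storage behind the View link."""

    def __init__(self) -> None:
        self._records: dict[str, SummaryRecord] = {}

    def save(self, record: SummaryRecord) -> str:
        self._records[record.summary_id] = record
        return record.summary_id

    def get(self, summary_id: str) -> SummaryRecord:
        return self._records[summary_id]

    def list_saved(self) -> list[SummaryRecord]:
        # Oldest first, as a plausible order for a history view.
        return sorted(self._records.values(), key=lambda r: r.created_at)


store = DraftStore()
draft = SummaryRecord(["budget"], ["https://example.com/a"], "Summary text")
draft_id = store.save(draft)
```

The returned draft_id is the value a user or report would quote to refer back to this output.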

User Profile
1. User can reset their password here
2. User can add basic details like First name and Last name

Super Admin - Add users


1. Super Admin will have ability to add new users to the system
2. Fields such as username, first name, last name, password and email address will be provided to the Super Admin
3. Super Admin can edit users to change their password
4. Super Admin can delete users as well as block them
Hemant> No physical deletion. Super Admin should also have reports on links used, usage, number of summaries done, etc., per user.
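
The "no physical deletion" comment, together with the earlier request for active-from date, deactivation date and remarks, suggests a soft-delete record where a user is marked inactive rather than removed. A minimal sketch, assuming hypothetical field names and an assumed rule that the account is inactive from the deactivation date onward:

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class UserAccount:
    username: str
    email: str
    active_from: date
    deactivated_on: Optional[date] = None  # None means still active
    remarks: str = ""

    def deactivate(self, on: date, remarks: str) -> None:
        # The row is kept, so usage history stays available for reports.
        self.deactivated_on = on
        self.remarks = remarks

    def is_active(self, today: date) -> bool:
        if today < self.active_from:
            return False
        return self.deactivated_on is None or today < self.deactivated_on


account = UserAccount("jdoe", "jdoe@example.com", date(2024, 1, 1))
account.deactivate(date(2024, 7, 1), "Left the team")
```

Because the record is never removed, per-user reports on links used and summaries done can still include deactivated users.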

Super Admin - Add new websites and XPath configurations


1. This page will list all the available websites with edit/delete/disable options
2. Add link will be provided for users to add new websites

3. Add Website - Form


a. Website Name
b. Website URL
c. Title XPath Configuration
d. Description XPath Configuration
e. Status - Enabled/Blocked

Hemant> Content summarisation research: the best option needs to be identified and mentioned as days go by. All have to agree on it.
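
The form fields above map naturally onto a per-site configuration record that drives extraction. The sketch below is illustrative only: WebsiteConfig, extract and enabled_sites are hypothetical names, the URL and markup are made up, and it uses Python's standard-library ElementTree, which supports only a limited XPath subset and needs well-formed markup. Real web pages would most likely need a dedicated HTML parser.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class WebsiteConfig:
    name: str
    url: str
    title_xpath: str
    description_xpath: str
    enabled: bool = True  # Status: Enabled/Blocked


def extract(config: WebsiteConfig, markup: str) -> dict[str, str]:
    """Apply the configured XPath expressions to one page."""
    root = ET.fromstring(markup)
    title = root.findtext(config.title_xpath, default="")
    description = root.findtext(config.description_xpath, default="")
    return {
        "title": title.strip(),
        "description": description.strip(),
        "source_url": config.url,
    }


def enabled_sites(configs: list[WebsiteConfig]) -> list[WebsiteConfig]:
    # Only enabled sites should contribute search results.
    return [c for c in configs if c.enabled]


site = WebsiteConfig(
    name="Example News",
    url="https://example.com/news",
    title_xpath=".//h1",
    description_xpath=".//p[@class='summary']",
)
page = (
    "<html><body><h1>Sample headline</h1>"
    "<p class='summary'>Short description.</p></body></html>"
)
result = extract(site, page)
```

The result dictionary carries the title, description and source URL that the search results page is expected to show.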

Super Admin - API Configurations


1. Ability to change the API URL and secret key (fields might vary by vendor)
Hemant> Not clear
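
Since this requirement was flagged as unclear, one possible reading is a per-vendor settings record with two fixed fields (URL and secret key) plus a free-form map for whatever extra fields a vendor needs. Everything here is an assumption for discussion: SummarizerApiConfig, its fields, the vendor name and the URLs are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class SummarizerApiConfig:
    vendor: str
    api_url: str
    secret_key: str
    # Vendor-specific settings that do not fit the fixed fields.
    extra_fields: dict[str, str] = field(default_factory=dict)

    def update(self, **changes: str) -> None:
        """Change the fixed fields, or store anything else as an extra."""
        for key, value in changes.items():
            if key in ("api_url", "secret_key"):
                setattr(self, key, value)
            else:
                self.extra_fields[key] = value

    def masked_key(self) -> str:
        # Show only the last 4 characters on the admin screen.
        return "*" * max(len(self.secret_key) - 4, 0) + self.secret_key[-4:]


config = SummarizerApiConfig(
    vendor="ExampleVendor",
    api_url="https://api.example.com/v1/summarize",
    secret_key="abcd1234efgh5678",
)
config.update(api_url="https://api.example.com/v2/summarize", region="eu")
```

After the update, the URL is replaced in place and the vendor-specific region value lands in extra_fields.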

Hemant> Small additions such as export and plagiarism check to be added
