Abhishek S. 199278080
Anugu Sathya Sai 199278016
Hemina Joy 199278035
Monisha K. 199278020
Sohini De 199278062
NATURAL LANGUAGE PROCESSING
Natural language processing (NLP) is a subfield of linguistics, computer science, and artificial intelligence concerned with the interactions between computers and human language. It deals with programming computers to process and analyze large amounts of natural language data.
VISION
To predict the probability of customer churn on OTT platforms using NLP
MISSION
To explore advanced NLP feature engineering techniques, with a specific focus on their application to OTT platforms
OBJECTIVES
• To reduce the churn rate of customers leaving OTT platforms
• To identify the number of customers who are likely to subscribe again or leave the platform, based on their usage
• To understand how sentiment analysis works and where it can be applied
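To make the sentiment-analysis objective concrete, here is a minimal lexicon-based sketch that labels a review as positive, negative or neutral. The word lists and the function name `classify_sentiment` are hypothetical and chosen for illustration only; this is not the project's method, and a real system would use a curated lexicon or a trained classifier.

```python
import re

# Hypothetical toy lexicons (not from the project); a real system would use a
# curated lexicon or a trained classifier instead.
POSITIVE = {"good", "great", "love", "excellent", "enjoy", "amazing"}
NEGATIVE = {"bad", "poor", "hate", "boring", "cancel", "expensive"}


def classify_sentiment(review: str) -> str:
    """Label a review positive, negative or neutral from lexicon hit counts."""
    tokens = re.findall(r"[a-z']+", review.lower())
    # Net score: +1 per positive word, -1 per negative word.
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    for text in [
        "I love the new series, great picks!",
        "Too expensive and the catalogue is boring.",
        "I watched two episodes yesterday.",
    ]:
        print(classify_sentiment(text), "|", text)
```

Counting lexicon hits ignores negation and sarcasm ("not great" scores as positive), which is exactly the kind of ambiguity a trained model is meant to handle.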
SCOPE STATEMENT
To predict the probability of customer retention and customer churn on the basis of sentiment analysis performed using natural language processing.
PROJECT JUSTIFICATION
• OTT (over-the-top) platforms such as Netflix and Amazon Prime depend on subscriptions, and on customer feedback as a source of reviews.
• Consumer feedback can be used to gauge consumer satisfaction. Combined with other factors such as digital screen time, the number of missed hits and how often a customer visits, it can be used to predict whether a customer is likely to be retained or to churn.
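One way to combine feedback with usage factors is to flatten them into a single numeric row per customer, which a churn classifier could then consume. The sketch below assumes a hypothetical `CustomerSnapshot` record and a -1/0/+1 encoding of sentiment labels averaged over a customer's reviews; both are illustrative choices, not part of the original design.

```python
from dataclasses import dataclass
from typing import List

# Numeric encoding of the three sentiment classes (an assumption of this sketch).
SENTIMENT_CODE = {"negative": -1, "neutral": 0, "positive": 1}


@dataclass
class CustomerSnapshot:
    """Hypothetical per-customer record for one observation window."""
    customer_id: str
    screen_time_hours: float        # total digital screen time in the window
    missed_hits: int                # number of missed hits, as logged by the platform
    visits: int                     # number of visits in the window
    review_sentiments: List[str]    # labels: "positive", "neutral" or "negative"


def to_feature_row(c: CustomerSnapshot) -> List[float]:
    """Flatten one snapshot into the numeric features a churn classifier would consume."""
    codes = [SENTIMENT_CODE[s] for s in c.review_sentiments]
    # Average sentiment over the window; 0.0 (neutral) when the customer left no reviews.
    mean_sentiment = sum(codes) / len(codes) if codes else 0.0
    return [c.screen_time_hours, float(c.missed_hits), float(c.visits), mean_sentiment]


if __name__ == "__main__":
    snap = CustomerSnapshot("u42", 31.5, 3, 12, ["positive", "neutral", "negative", "negative"])
    print(to_feature_row(snap))  # [31.5, 3.0, 12.0, -0.25]
```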
PROJECT SCOPE
• The technology can be used to predict satisfaction levels, the likelihood of customer attrition and the growth of OTT platforms.
• Data shared by customers can further be used to predict customer interest accurately, for example for social media advertisements.
• It can also be used to judge creditworthiness and insurance premiums through data on patient records.
SOFTWARE REQUIREMENTS SPECIFICATION
Deliverables
• Spam filtering and identification of authentic reviews
• Sentiment analysis: positive, negative or neutral
• Text classification into specific groups for comparative evaluation
• Sentence similarity to identify similar customers and improve prediction
• Generation of a training dataset for machine learning for predictive analytics
Infrastructure Requirements
• 16 GB RAM
• Processor: i7 5th generation or above
• OS: Windows / Linux
• Storage: 1 TB or above
• GPU (to perform parallelized matrix operations): GeForce GTX 1070
• Python 3.0, NLTK
• NLP libraries for Python
• Hadoop
• Amazon Cloud
FUNCTIONAL STAGES OF PROJECT
• Text Pre-Processing: text cleaning, text standardization, text normalization
• Feature Engineering: syntactic parsing, POS tagging, TF-IDF, word embedding
• Topic Modelling: latent semantic analysis, latent Dirichlet allocation, non-negative matrix factorization, hierarchical Dirichlet process
• Predictive Modelling: feature engineering techniques, basic feature engineering, advanced feature engineering
• Model Tuning: Random Forest, k-fold cross-validation
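The k-fold cross-validation listed under model tuning splits the data into k folds and uses each fold once for validation while training on the rest. The stdlib-only sketch below generates those index splits; `k_fold_indices` is a hypothetical helper (in practice scikit-learn's `KFold` in `sklearn.model_selection` does the same job).

```python
import random
from typing import Iterator, List, Tuple


def k_fold_indices(n_samples: int, k: int = 5, seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, validation_indices) for each of k folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)   # shuffle once: random but reproducible folds
    # Spread the remainder so fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, val
        start += size


if __name__ == "__main__":
    for train, val in k_fold_indices(10, k=3):
        print(len(train), len(val))  # 6 4, then 7 3, then 7 3
```

A model (for example the Random Forest) would be fitted on each `train` split and scored on the matching `val` split; the average score across folds estimates how a given hyperparameter setting generalizes.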
SOFTWARE REQUIREMENTS SPECIFICATION
NON-FUNCTIONAL REQUIREMENTS
Scalability
• The NLP model should be able to handle ever-growing volumes of data, as usage data will keep flowing in at a rapid rate.
Security
• Customer usage data is very sensitive, so the infrastructure used to store it should be highly secure and free of known vulnerabilities.
Capacity
• The database and cloud storage used must be large enough to accommodate all the data required for the project.
Maintainability
• Once the models are built and deployed, maintaining and running them should be a simple process that does not require an ML expert every time.
STATEMENT OF WORK
ABOUT PROJECT
• Head of Innovation at the firm, with expertise in AI and ML
• Development Team
• Content Management Team
• User Database Management Team
• Sales and Marketing Team
Work Breakdown Structure
PHASES OF THE PROJECT
• Data Collection: word cloud
• Cleaning: text cleaning, stemming and lemmatisation, normalization
• Feature Enhancement: TF-IDF, latent semantic analysis, syntactic parsing, text parsing
• Model Building: Random Forest, LightGBM
• Results Interpretation and Visualization
Phase durations: 7, 4, 6, 3 and 5 weeks (25 weeks in total)
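The cleaning and normalization phases can be illustrated with a short pipeline that lowercases text, expands contractions, strips URLs and special characters, tokenizes on whitespace and maps slang to standard words. The lookup tables and the name `clean_text` are hypothetical; stemming and lemmatisation are left out because they would need a library such as NLTK. Note that the character filter keeps only ASCII letters and digits, so accented characters are dropped.

```python
import re
from typing import List

# Hypothetical lookup tables for illustration; real ones would be far larger.
CONTRACTIONS = {"can't": "cannot", "won't": "will not", "don't": "do not", "it's": "it is"}
SLANG = {"u": "you", "gr8": "great", "pls": "please", "thx": "thanks"}


def clean_text(text: str) -> List[str]:
    """Lowercase, expand contractions, strip URLs and special characters, tokenize, map slang."""
    text = text.lower().replace("\u2019", "'")        # lowercase; unify curly apostrophes
    for short, full in CONTRACTIONS.items():          # expand contractions as whole words
        text = re.sub(r"\b" + re.escape(short) + r"\b", full, text)
    text = re.sub(r"https?://\S+", " ", text)         # drop URLs
    text = re.sub(r"[^a-z0-9\s]", " ", text)          # remove special characters
    tokens = text.split()                             # simple whitespace tokenization
    return [SLANG.get(tok, tok) for tok in tokens]    # slang lookup


if __name__ == "__main__":
    print(clean_text("Thx! The new show is gr8, but I can't find S2 :("))
    # ['thanks', 'the', 'new', 'show', 'is', 'great', 'but', 'i', 'cannot', 'find', 's2']
```

Contractions are expanded before special characters are removed; in the opposite order the apostrophe would already be gone and "can't" would split into "can" and "t".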
PROJECT COST BASELINE
Miscellaneous 2,00
Cleansing 1 5
Remove special characters 1 4
Decoding Data 1 1 2 1 8
Split Attached Words 1 2
Slang Lookup 1 3
Conversion to Lowercase 1 1 1 2 4
Normalization 1 3 1 7
Integer to word conversion and expanding contractions 1 2 2
Spell check 1 1 1
Grammar Check/ Language check 1 3 4
Lemmatization 1 1 2 4
Stemming 1 3 2 6
Word Standardization 1 4 4
Tokenization 1 4 3 4 9
N - Grams 1 5
Term Frequency 1 2
Inverse Document Frequency 1 3
Term Frequency-Inverse Document Frequency (TF-IDF) 1 3 7
Bag of Words 1 1 2 5
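Several of the tasks above (bag of words, n-grams, term frequency, inverse document frequency, TF-IDF) fit together, and TF-IDF vectors also give a simple measure of sentence similarity. The sketch below uses one common textbook variant, tf(t, d) = count(t, d) / |d| and idf(t) = log(N / df(t)), and compares vectors with cosine similarity. It is an assumption for illustration; libraries such as scikit-learn use a smoothed idf by default, so their weights differ.

```python
import math
from collections import Counter
from typing import Dict, List


def ngrams(tokens: List[str], n: int) -> List[str]:
    """Contiguous n-grams of a token list, joined with spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tf_idf(docs: List[List[str]]) -> List[Dict[str, float]]:
    """One sparse TF-IDF vector (term -> weight) per tokenized document."""
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))        # document frequency
    idf = {term: math.log(n_docs / count) for term, count in df.items()}
    vectors = []
    for doc in docs:
        counts = Counter(doc)                                       # bag of words
        vectors.append({t: (c / len(doc)) * idf[t] for t, c in counts.items()})
    return vectors


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


if __name__ == "__main__":
    reviews = [r.split() for r in ["great show great cast", "great cast but slow show", "app keeps crashing"]]
    vecs = tf_idf(reviews)
    print(cosine(vecs[0], vecs[1]), cosine(vecs[0], vecs[2]))  # second value is 0.0: no shared terms
    print(ngrams(reviews[1], 2))  # ['great cast', 'cast but', 'but slow', 'slow show']
```

With this idf variant, a term that appears in every document gets weight 0, so words common to all reviews do not contribute to similarity.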
USE CASES
Use Case ID: UC-1
Use Case Name: Customer Sentiment
Rationale: Data is the new oil. To gain a competitive advantage, the organization needs to make use of the data available to it. Analyzing usage metrics such as the number of visits in a month, screen time and the choice of offerings watched can help offer better services to users and retain them.
Preconditions: All required users are registered in the database, and their usage is logged every time they use the services. These details are stored quickly and efficiently.
Basic course of events:
1. The user accesses the OTT platform via the mobile application or web application.
2. The visit is logged in the user database.
3. The user watches content of their choice.
4. The corresponding activity is recorded in the database against the user.
5. The user records are analyzed, and insights are extracted that help in understanding the users better.
Alternative paths: Look for paid third-party research and surveys available on websites such as Capital line, Accent DATA, IVC Research, etc.
Postconditions: The user is logged in to the account and doesn't share the account credentials with others.
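Steps 2 to 5 of this use case log each visit and then analyze the records. A minimal sketch of that analysis is to aggregate raw log rows into per-user, per-month metrics such as visits, screen time and distinct titles watched. The row layout `(user_id, date, minutes, title)` and the name `monthly_usage` are hypothetical, and the sketch assumes each logged row counts as one visit.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical log row: (user_id, ISO date "YYYY-MM-DD", minutes watched, title).
Event = Tuple[str, str, float, str]


def monthly_usage(events: Iterable[Event]) -> Dict[Tuple[str, str], dict]:
    """Aggregate raw viewing events into per-user, per-month usage metrics."""
    stats = defaultdict(lambda: {"visits": 0, "screen_minutes": 0.0, "titles": set()})
    for user_id, date, minutes, title in events:
        s = stats[(user_id, date[:7])]   # key by user and "YYYY-MM"
        s["visits"] += 1                 # assumption: one logged row = one visit
        s["screen_minutes"] += minutes
        s["titles"].add(title)
    return dict(stats)


if __name__ == "__main__":
    log = [
        ("u1", "2021-03-02", 45, "Show A"),
        ("u1", "2021-03-09", 30, "Show B"),
        ("u1", "2021-04-01", 60, "Show A"),
        ("u2", "2021-03-05", 20, "Film C"),
    ]
    for key, metrics in sorted(monthly_usage(log).items()):
        print(key, metrics["visits"], metrics["screen_minutes"], sorted(metrics["titles"]))
```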
USE CASES
Use Case ID: UC-3
Use Case Name: Upgrading Opportunity
Alternative paths: Randomly target customers in batches, without spending resources on this analysis phase.
Postconditions: The user is logged in to the account, doesn't share the account credentials with others, and provides regular feedback.
USE CASES
Use Case ID: UC-4
Use Case Name: Content Addition
Rationale: Customers search for content on the platform, and sometimes the content they search for is not available. When several customers look for the same unavailable content, it can be added to the platform.
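The content-addition rationale amounts to counting how many distinct customers searched for each title that is missing from the catalogue and flagging titles above a threshold. In the sketch below, the input format, the lowercase title matching and the `min_customers` threshold are all hypothetical choices; a real system would also need fuzzy matching of queries to titles.

```python
from collections import defaultdict
from typing import Iterable, List, Set, Tuple


def content_gaps(searches: Iterable[Tuple[str, str]], catalogue: Set[str],
                 min_customers: int = 3) -> List[Tuple[str, int]]:
    """Titles absent from the catalogue that at least `min_customers` distinct customers searched for."""
    seekers = defaultdict(set)
    for customer_id, query in searches:
        title = query.strip().lower()            # naive normalization of the search text
        if title not in catalogue:
            seekers[title].add(customer_id)      # a set counts each customer once
    gaps = [(title, len(ids)) for title, ids in seekers.items() if len(ids) >= min_customers]
    return sorted(gaps, key=lambda item: (-item[1], item[0]))   # most-requested first


if __name__ == "__main__":
    searches = [("c1", "Dark"), ("c2", "dark "), ("c3", "DARK"), ("c1", "dark"),
                ("c4", "Friends"), ("c5", "Friends"), ("c2", "The Crown")]
    print(content_gaps(searches, catalogue={"the crown"}, min_customers=2))
    # [('dark', 3), ('friends', 2)]
```

Counting distinct customers rather than raw searches keeps one persistent user from inflating the demand for a title.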
AMBIGUITY AND OTHER CHALLENGES
• Machines can fail to comprehend the context of a text unless they are properly and carefully trained.
• We should develop systems that read and understand text the way a person does.
Problem Identification: Generate an NLP algorithm to detect user sentiments, usage trends, consensus and missing content on the OTT platform.
Factors:
• Requirement gathering
• Team selection
• Data storage

• Create a vision to achieve the required outcome of the project, i.e. designing an NLP algorithm
• Identify possible resistance and determine a plan to resolve it
• Create a project deployment plan

• Regularly review the milestones
• Execute the deployment plan
• Determine the phase-wise deployment and execute it

• Provide feedback continuously
• Validate by checking the churn rate and whether user sentiments are moving towards the positive side
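The validation step can be made measurable. A common definition of churn rate for a period is the number of customers lost during the period divided by the number active at its start; sentiment "moving towards the positive side" can be approximated by comparing the share of positive labels in the latest and earliest periods. Both the function names and the comparison rule below are hypothetical choices for this sketch.

```python
from typing import Dict, List, Set


def churn_rate(active_at_start: Set[str], active_at_end: Set[str]) -> float:
    """Customers lost during a period divided by customers active at its start."""
    if not active_at_start:
        return 0.0
    lost = active_at_start - active_at_end   # new sign-ups in active_at_end are ignored
    return len(lost) / len(active_at_start)


def positive_share(labels: List[str]) -> float:
    """Fraction of sentiment labels in one period that are 'positive'."""
    return labels.count("positive") / len(labels) if labels else 0.0


def sentiment_moving_positive(labels_by_period: Dict[str, List[str]]) -> bool:
    """True if the latest period's positive share exceeds the earliest period's.

    Period keys are assumed to sort chronologically, e.g. "2021-01", "2021-02".
    """
    periods = sorted(labels_by_period)
    if len(periods) < 2:
        return False
    latest = positive_share(labels_by_period[periods[-1]])
    earliest = positive_share(labels_by_period[periods[0]])
    return latest > earliest


if __name__ == "__main__":
    print(churn_rate({"a", "b", "c", "d"}, {"a", "c", "e"}))  # 0.5
    print(sentiment_moving_positive({
        "2021-01": ["positive", "negative", "neutral", "negative"],
        "2021-02": ["positive", "positive", "neutral"],
    }))  # True
```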
ACTION ROLL-OUT PLAN
• Requirement Gathering (2 weeks)
• Design & Infrastructure (2 weeks)
• Development (4 weeks)
• Testing & Implementation (3 weeks)
• Go-Live (1 week)
• After Go-Live (2 weeks): model tuning
THANK YOU
Group 11