
PREDICTION OF CHURN PROBABILITY IN OTT PLATFORM USING NLP


GROUP 11

Abhishek S. 199278080
Anugu Sathya Sai 199278016
Hemina Joy 199278035
Monisha K. 199278020
Sohini De 199278062
NATURAL LANGUAGE PROCESSING
NATURAL LANGUAGE PROCESSING (NLP) IS A SUBFIELD OF LINGUISTICS, COMPUTER SCIENCE, AND ARTIFICIAL INTELLIGENCE CONCERNED WITH THE INTERACTIONS BETWEEN COMPUTERS AND HUMAN LANGUAGE. IT DEALS WITH PROGRAMMING COMPUTERS TO PROCESS AND ANALYZE LARGE AMOUNTS OF NATURAL LANGUAGE DATA.
VISION

To predict the customer churn probability in OTT platforms using NLP

MISSION
To explore advanced NLP feature engineering techniques with specific focus on their application in OTT platforms

OBJECTIVES
• To reduce the churn rate of customers leaving OTT platforms
• To identify the number of customers who are likely to resubscribe to or leave the platform based on their usage
• To understand the working of sentiment analysis and its applications
SCOPE STATEMENT

To predict the probability of customer retention and customer churn on the basis of sentiment analysis performed using Natural Language Processing.
PROJECT JUSTIFICATION

• OTT (over-the-top) platforms like Netflix, Amazon Prime, etc. depend on subscriptions, and on customer feedback as a source of reviews.

• Consumer feedback can be used to gauge consumer satisfaction. Coupled with other factors like digital screen time, number of missed hits and frequency of customer visits, it lets us predict the possibility of customer retention or customer churn.

PROJECT SCOPE

• JUDGE CREDIT WORTHINESS, SATISFACTION LEVEL, LIKELIHOOD OF CUSTOMER ATTRITION AND GROWTH OF OTT PLATFORMS
• CUSTOMER DATA SHARED BY CUSTOMERS CAN BE FURTHER USED TO CORRECTLY MAKE PREDICTIONS FOR THE CUSTOMER
• TECHNOLOGY CAN BE USED FOR PREDICTION OF CUSTOMER INTEREST FOR SOCIAL MEDIA ADVERTISEMENTS, INSURANCE PREMIUMS THROUGH DATA ON PATIENT RECORDS

SOFTWARE REQUIREMENTS SPECIFICATION

Deliverables
• Spam filtering and identification of authentic reviews
• Sentiment Analysis – Positive, negative or neutral
• Text classification into specific groups for comparative evaluation
• Sentence similarity to identify similar customers & improve prediction
• Generate training dataset for machine learning for predictive analytics

Infrastructure Requirement
• 16 GB RAM
• Processor: i7 5th generation or above
• OS: Windows / Linux
• Storage: 1TB or above
• GPU (to perform parallelized matrix operations): GeForce GTX 1070
• Python 3.0LTK
• NLP libraries of Python
• Hadoop
• Amazon Cloud

FUNCTIONAL STAGES OF PROJECT

Text Pre Processing
• Text Cleaning
• Text Standardization
• Text Normalization

Feature Engineering
• Syntactic Parsing
• POS Tagging
• TF-IDF
• Word Embedding

Topic Modelling
• Latent Semantic Analysis
• Latent Dirichlet Allocation
• Non-negative Matrix Factorization
• Hierarchical Dirichlet Process

Predictive Modelling
• Feature Engineering Techniques
• Basic Feature Engineering
• Advanced Feature Engineering

Model Tuning
• Random Forest
• K-Fold cross validation
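To make the TF-IDF stage concrete, here is a minimal sketch in plain Python. It assumes whitespace tokenization and the basic weighting tf × log(N / df); the function name `tf_idf` and the sample reviews are illustrative only, and a real pipeline would more likely use an NLP library.

```python
import math
from collections import Counter

def tf_idf(documents):
    """Return one {term: weight} dict per document using tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights

# Made-up reviews, for illustration only.
reviews = [
    "great shows great price",
    "buffering ruins shows",
    "price too high",
]
scores = tf_idf(reviews)
```

Terms that are frequent in one review but rare across the others (here "great") get the highest weight, which is why TF-IDF is useful for separating distinctive words from common ones.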
SOFTWARE REQUIREMENTS SPECIFICATION

NON-FUNCTIONAL REQUIREMENTS

Scalability
• The NLP model should be able to handle more and more data over time, as usage data will keep flowing in at a rapid rate

Security
• Customer usage data is very sensitive, so the infrastructure used for storing it should be highly secure and free of vulnerabilities

Capacity
• The database and the cloud storage used must be large enough to accommodate all the data required for the project

Maintainability
• Once the models are built and deployed, maintaining and running them should be a simple process that does not require the presence of an ML expert every time

STATEMENT OF WORK
ABOUT PROJECT

• Assessing the user sentiment of the OTT platform through NLP
• Analyzing the usage trends of the customers to create business insights
• Identifying opportunities to upgrade subscriptions and to prevent downgrades from one plan to another
• Identifying the missing content that was most searched for, in order to add it to the platform
• Using cutting-edge technologies like NLP and data analytics to gain an advantage over competitors
• Providing the best-suited content according to the requirements of the user base

STAKEHOLDERS

• Head of Innovation at the firm
• Development Team with expertise in AI and ML
• Content Management team
• User Database Management Team
• Sales and Marketing team
Work Breakdown Structure
PHASES OF THE PROJECT

1. Data collection
2. Pre-processing
   • Tokenization
   • Stemming and lemmatisation
   • Removing stop words
3. Feature engineering
   • Bag of words
   • TF-IDF
   • Word embedding
4. Model building
5. Model Evaluation
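The pre-processing and bag-of-words phases can be sketched as follows. This is a minimal illustration, assuming a tiny hand-written stop word list and a crude suffix stripper in place of a real stemmer or lemmatiser; all names here are hypothetical.

```python
import re
from collections import Counter

# Tiny illustrative stop word list; real lists are much longer.
STOP_WORDS = {"the", "is", "a", "an", "and", "to", "of", "it", "i", "was"}

def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())

def remove_stop_words(tokens):
    return [t for t in tokens if t not in STOP_WORDS]

def naive_stem(token):
    """Crude suffix stripping, a stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def bag_of_words(text):
    """Tokenize, drop stop words, stem, and count the remaining terms."""
    tokens = remove_stop_words(tokenize(text))
    return Counter(naive_stem(t) for t in tokens)

bow = bag_of_words("The app is buffering and the shows stopped loading")
```

The resulting counter maps each stemmed term to its count, which is the input that TF-IDF weighting or a classifier would consume.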
Phase Based - Work Breakdown Structure
NLP based Predictive Modeling

Exploratory Data Analysis
• Word Cloud
• Price Distribution Analysis
• Category Analysis
• Item Condition Analysis

Text Preprocessing
• Text Cleaning
• Text Standardization
• Text Normalization

Feature Engineering
• Syntactic Parsing
• POS Tagging
• TF-IDF
• Word Embedding

Topic Modeling
• Latent Semantic Analysis
• Latent Dirichlet Allocation
• Non-negative Matrix Factorization
• Hierarchical Dirichlet Process

Predictive Modeling
• Random Forest: Document Term Matrix Creation, K-Fold Cross Validation
• Light GBM: Document Term Matrix Creation, K-Fold Cross Validation
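K-fold cross validation, listed under both models, splits the data into k parts and validates on each part in turn. A minimal sketch of the index splitting, assuming no shuffling or stratification (the helper `k_fold_indices` is illustrative, not part of any library):

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for each of k folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    # Spread the remainder so fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size

folds = list(k_fold_indices(10, 3))
```

Each sample lands in exactly one validation fold, so averaging the k validation scores uses every review once for evaluation.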
PART I WBS - Schedule
PART II WBS - Schedule
Critical Path
START → END
Activities shown: Cleaning, Text Parsing, Results Interpretation and Visualization, Feature Enhancement, Normalization
Durations shown: 7 weeks, 4 weeks, 6 weeks, 3 weeks, 5 weeks
Total: 25 WEEKS
PROJECT COST BASELINE

Task | Resources | Effort Estimates (Days) | Cost (₹ ‘000)
Requirement Gathering | Shared Resources from all functions | 20 | 20
Data Collection & Storage Cost | DB Servers, Cloud Storage | 10 | 5,00
Documentation | Computer Resources | 5 | 30
Software cost | Licenses | - | 10,00
Hardware Cost | Laptops, PCs | - | 12,00
Network cost | Servers, Routers | - | 4,00
Employee cost | NLP Engineers, Data Engineers | 60 | 10,00
Miscellaneous | - | - | 2,00
Total cost | - | - | 43,50
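As a quick check on the baseline, the line items can be summed. This sketch assumes the amounts use Indian digit grouping (so "5,00" is 500 thousand rupees) and that the 4,00 figure belongs to the network cost row.

```python
# Costs in thousands of rupees; "5,00" in the table is read as 500.
cost_items = {
    "Requirement Gathering": 20,
    "Data Collection & Storage Cost": 500,
    "Documentation": 30,
    "Software cost": 1000,
    "Hardware Cost": 1200,
    "Network cost": 400,  # assumed row for the 4,00 figure
    "Employee cost": 1000,
    "Miscellaneous": 200,
}
total_cost = sum(cost_items.values())
print(total_cost)  # 4350, i.e. 43,50 in the table's notation
```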


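RESOURCE ALLOCATION
The resource allocation matrix structure has become the primary organizational means for maintaining an efficient flow of resources in multi-project environments.

Task Name | Project Manager | Technical Excellence Group Member | Developer | Testers | Business Testers | Implementation Team | Total People required
Cleansing | 1 | - | - | - | - | - | 5
Remove special characters | 1 | - | - | - | - | - | 4
Decoding Data | 1 | 1 | 2 | 1 | - | - | 8
Split Attached Words | 1 | - | - | - | - | - | 2
Slang Lookup | 1 | - | - | - | - | - | 3
Conversion to Lowercase | 1 | 1 | 1 | 2 | - | - | 4
Normalization | 1 | 3 | - | 1 | - | - | 7
Integer to word conversion and expanding contractions | 1 | - | 2 | - | - | - | 2
Spell check | 1 | - | - | 1 | - | - | 1
Grammar Check / Language check | 1 | - | 3 | - | - | - | 4
Lemmatization | 1 | - | 1 | 2 | - | - | 4
Stemming | 1 | - | 3 | 2 | - | - | 6
Word Standardization | 1 | - | - | - | 4 | - | 4
Tokenization | 1 | - | 4 | 3 | - | 4 | 9
N-Grams | 1 | - | - | - | - | - | 5
Term Frequency | 1 | - | - | - | - | - | 2
Inverse Document Frequency | 1 | - | - | - | - | - | 3
Term Frequency-Inverse Document Frequency (TF-IDF) | 1 | 3 | - | - | - | - | 7
Bag of Words | 1 | - | 1 | 2 | - | - | 5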
USE CASES
Use Case ID: UC-1
Use Case Name: Customer Sentiment

Name: UC-1: Customer Sentiment
Summary: Collecting user reviews and analyzing the sentiment
Rationale: It is almost impossible to reach out to each customer through calls and ask about their experience and satisfaction levels with the service provided by the OTT platform. Customer reviews are the next best way to assess the sentiment of the customers.
Users: All users
Preconditions: All the required users are registered in the database and all the reviews provided by the customers are accessible. The user is requested to provide reviews regularly.
Basic course of events:
1. The user accesses the OTT via mobile application or web application
2. A user account is created upon buying a subscription to the OTT service
3. The user watches the content and forms an opinion about the service provided by the platform
4. The customer management team sends out a request for feedback on the user experience
5. The reviews are stored and processed so that they can be fed to the NLP models
6. The NLP models process the reviews and extract the underlying sentiment
Alternative paths:
• Look for third-party data such as reviews posted on social media platforms like Twitter, Quora, etc.
• Learn from the competitors' offerings and analyze what is being done better
Post-conditions:
• The user is logged in to the account and doesn’t share it with others
• The user regularly acknowledges the customer management team’s request and provides feedback
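Step 6 of UC-1 can be illustrated with a toy lexicon-based classifier that labels each review positive, negative or neutral. The word lists below are hypothetical placeholders; a real system would use a trained model or a full sentiment lexicon.

```python
import re

# Hypothetical mini-lexicon, for illustration only.
POSITIVE = {"great", "love", "good", "excellent", "smooth"}
NEGATIVE = {"bad", "buffering", "expensive", "poor", "cancel"}

def review_sentiment(review):
    """Classify a review as 'positive', 'negative' or 'neutral' by word counts."""
    words = re.findall(r"[a-z]+", review.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

# Made-up reviews, for illustration only.
labels = [review_sentiment(r) for r in [
    "Love the originals, great picture quality",
    "Constant buffering and too expensive",
    "I watched two films last week",
]]
```

A word-count approach like this cannot handle negation or sarcasm, which is one reason the project lists ambiguity among its risks.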
USE CASES
Use Case ID: UC-2
Use Case Name: Usage Analytics

Name: UC-2: Usage Analytics
Summary: Collecting users’ platform usage data and extracting insights
Rationale: Data is the new oil. To gain competitive advantage, it becomes necessary to make use of the data available with the organization. Analyzing usage metrics like the number of visits in a month, screen time and choice of offerings watched can help in offering better services to users and retaining them.
Users: All users
Preconditions: All the required users are registered in the database. Their usage is logged every time they use the services. These details are stored quickly and efficiently.
Basic course of events:
1. The user accesses the OTT via mobile application or web application
2. The visit is logged in the user database
3. The user watches the content of their choice
4. The corresponding activity is recorded in the database against the user
5. The user records are analyzed and insights are extracted which can help in understanding the users better
Alternative paths:
• Look for paid third-party research and surveys available on websites like Capital line, Accent DATA, IVC Research, etc.
Post-conditions:
• The user is logged in to the account and doesn’t share the account credentials with others
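The usage metrics named in UC-2 (visits per month and screen time) can be derived from a visit log with a simple aggregation. The log format `(user_id, month, minutes_watched)` and the sample rows are assumptions made for this sketch.

```python
from collections import defaultdict

# Hypothetical visit log: (user_id, month, minutes_watched)
visit_log = [
    ("u1", "2021-01", 90),
    ("u1", "2021-01", 45),
    ("u2", "2021-01", 20),
    ("u1", "2021-02", 30),
]

def monthly_usage(log):
    """Return {(user_id, month): {'visits': n, 'screen_minutes': m}}."""
    usage = defaultdict(lambda: {"visits": 0, "screen_minutes": 0})
    for user_id, month, minutes in log:
        entry = usage[(user_id, month)]
        entry["visits"] += 1
        entry["screen_minutes"] += minutes
    return dict(usage)

usage = monthly_usage(visit_log)
```

A drop in a user's visits or screen minutes from one month to the next is the kind of signal that could feed the churn prediction alongside review sentiment.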
USE CASES
Use Case ID: UC-3
Use Case Name: Upgrading Opportunity

Name: UC-3: Upgrading Opportunity
Summary: Using customer review sentiment analysis and usage data to predict plan upgrade opportunities
Rationale: Using the sentiment extracted from the customer reviews along with the usage data to see if the customer will upgrade to the next (more expensive) plan
Users: All users
Preconditions: All the required users are registered in the database. Their usage is logged every time they use the services. The reviews provided by them are also available in the database.
Basic course of events:
1. The user’s sentiment is assessed from the reviews
2. The user’s usage data is analysed to generate insights
3. The usage insights and sentiment are combined to predict whether the customer is a good target for a plan upgrade
4. The corresponding customer is approached by the sales team about upgrading, either by direct contact or through inline communication
5. This helps greatly in increasing profits: the extra revenue comes from existing customers, so no acquisition cost is involved
Alternative paths:
• Randomly target customers in batches without spending resources on this analysis phase
Post-conditions:
• The user is logged in to the account, doesn’t share the account credentials with others, and provides regular feedback
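Step 3 of UC-3 combines sentiment with usage. The sketch below uses a simple rule in place of a trained predictive model; the thresholds `min_visits` and `min_hours` and the sample users are placeholders, not values from the project.

```python
def is_upgrade_candidate(sentiment, monthly_visits, screen_hours,
                         min_visits=12, min_hours=20):
    """Flag users who are both satisfied and heavy users (thresholds are placeholders)."""
    heavy_user = monthly_visits >= min_visits and screen_hours >= min_hours
    return sentiment == "positive" and heavy_user

# Hypothetical users: (sentiment label, visits per month, screen hours per month)
users = {
    "u1": ("positive", 20, 35.0),
    "u2": ("positive", 3, 4.5),
    "u3": ("negative", 25, 40.0),
}
targets = [uid for uid, features in users.items() if is_upgrade_candidate(*features)]
```

In the project as described, a model such as Random Forest would learn this decision from data; the rule only shows how the two inputs fit together.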
USE CASES
Use Case ID: UC-4
Use Case Name: Content Addition

Name: UC-4: Content Addition
Summary: Using failed customer searches to learn which content is missing from the platform
Rationale: Customers search for content on the platform, and sometimes the content they search for is not available. When several customers look for the same unavailable content, it can be added to the platform.
Users: All users
Preconditions: All the required users are registered in the database. Their usage is logged every time they use the services.
Basic course of events:
1. The user accesses the OTT via mobile application or web application
2. The visit is logged in the user database
3. The user looks for content by entering the title in the search bar
4. When the content is not available, the searched title is logged in the database
5. The failed search data of all the customers is pooled together and the most common titles are shortlisted
6. This shortlist of titles is provided to the content team to add to the offerings
Alternative paths:
• Add the most Googled titles to the content offered
Post-conditions:
• The user is logged in to the account and doesn’t share the account credentials with others
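Steps 4 to 6 of UC-4 amount to counting failed search titles and keeping the most frequent ones. A minimal sketch, assuming titles are matched after trimming and lowercasing; the parameters `top_n` and `min_requests` are illustrative.

```python
from collections import Counter

def shortlist_missing_titles(failed_searches, top_n=3, min_requests=2):
    """Return the most frequently searched unavailable titles with their counts."""
    normalized = (title.strip().lower() for title in failed_searches)
    counts = Counter(normalized)
    return [(title, n) for title, n in counts.most_common(top_n) if n >= min_requests]

# Placeholder titles pooled from all users' failed searches.
failed = ["Title A", "title a ", "Title B", "Title C", "TITLE B", "Title A"]
shortlist = shortlist_missing_titles(failed)
```

Real search logs would also need fuzzy matching for misspelled titles, which simple normalization does not cover.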
UML DIAGRAM

UML diagram for use case between Platform and different stakeholders
RISKS
TRAINING

• Machines can fail to comprehend the context of text unless properly and carefully trained.
• We should develop systems that read and understand text the way a person does.

AMBIGUITY

• A critical problem is ambiguity, which is of four types:
• Lexical Ambiguity: one word with different meanings
• Syntactic Ambiguity: a sentence that can be parsed in more than one way
• Referential Ambiguity: unclear reference of pronouns
• Pragmatic Ambiguity: the meaning of a sentence may differ depending on the intentions of the speaker

OTHER CHALLENGES

• Analyzing and storing large volumes of unstructured data
• Not all users will provide reviews and feedback on the OTT platform
CHANGE MANAGEMENT

Prepare, Communication & Planning, Executive and Review, Transition Plans

Prepare
Problem Identification: Generate an NLP algorithm to detect user sentiments, usage trends and missing content of the OTT platform
Factors:
• Requirement Gathering
• Team selection
• Data Storage

Design
• Create a vision to achieve our required outcome from the project, i.e. designing an NLP algorithm
• Identifying possible resistance and determining a plan to resolve resistance
• Creation of a project deployment plan

Execute
• Regular review of the milestones
• Execution of the deployment plan
• Determining the phase-wise deployment and executing the same

Sustain
• Providing feedback and consensus continuously
• Validating by checking the churn rate and whether user sentiments are moving towards the positive side
ACTION ROLL-OUT PLAN

Requirement Gathering (2 weeks)
• Scope Identification workshop
• Functional Design Workshop
• Creation of FAQ Database
• Pain Point Identification

Design & Infrastructure (2 weeks)
• Logical Design Document
• System Design Document
• Infrastructure Design
• WBS

Development (4 weeks)
• Text PreProcessing
• Feature Engineering
• Topic Modelling
• Predictive Modelling
• Model Tuning

Testing & Implementation (3 weeks)
• Unit Testing
• Integration Testing
• Beta Implementation
• Employees Training & Responses
• Documentation

Go-Live (1 week)
• Milestone Go-Live
• Setting Live
• Project Support Contract

After Go-Live (2 weeks)
• Hand Holding
• After Go-live training
• Efficiency of Model Identification

Total Duration: 14 weeks
THANK YOU

Group 11
