BAAI

Recommender system
Reinforcement learning

Framework
 Identify the business problem - reactive process or proactive process
 Acquire data -
o What source of information we are dealing with
 Primary data - we collect the data ourselves
 Secondary data - the requirement is passed on to a third party for data collection
o What are the different data types
 Quantitative and qualitative data
 Under qualitative data - Text, image, audio, video
 Under quantitative data
 Continuous data and discrete data
 Ratio and interval for continuous data
 Categorical, ordinal, and binary for discrete data
 No machine learning algorithm understands qualitative data; it must first be transformed into numbers

 Clean - Pre-processing the data
o If the cleaning steps are the same or repetitive, automate them.
o Missing values and outliers are the two most important issues to address
o Missing data
 Remove the data, either by rows or by columns
 Imputation -
 Mean, median, and mode
 Cross-sectional data (ML-based, k-NN, scikit-learn KNNImputer)
 Time-series data (interpolation techniques)
o Outlier data
 Detection
 Decision

 For detecting outliers -
 Univariate method - box plot, Z-score
 Bivariate method - scatter plot
 Multivariate method - if the residual score is close to zero there are no outliers; if the residual score is clearly greater than zero, outliers are present
 Clusters for unstructured data for multivariate method
 DBSCAN - density-based clustering
 Decide what is to be done
 Error records - data entry problem
 Random chance - outlier capping
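The cleaning steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not a reference implementation: the function names `impute_mean`, `interpolate_linear`, and `cap_outliers_iqr` are hypothetical, missing values are represented as `None`, and the capping rule uses the box-plot convention of 1.5 times the interquartile range, which the notes do not specify.

```python
from statistics import mean, quantiles


def impute_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def interpolate_linear(series):
    """Fill interior None gaps in a time series by linear interpolation.

    Leading or trailing None values are left untouched.
    """
    filled = list(series)
    known = [i for i, v in enumerate(filled) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (filled[right] - filled[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = filled[left] + step * (i - left)
    return filled


def cap_outliers_iqr(values, k=1.5):
    """Cap values outside [Q1 - k*IQR, Q3 + k*IQR] (assumed box-plot rule)."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [min(max(v, low), high) for v in values]


imputed = impute_mean([1, None, 3])
interpolated = interpolate_linear([10, None, None, 40])
capped = cap_outliers_iqr([1, 2, 3, 4, 5, 6, 7, 8, 100])
```

Capping keeps the record but pulls the extreme value back to the fence, which suits outliers that arise by random chance; records that are data entry errors are usually better corrected or removed.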

 Transform
o Data type transformation
 Continuous to discrete (binning)
 Qualitative to quantitative
 Discrete to continuous (one-hot encoding, label encoding)
o Model requirements
 Distance-based models - clustering; min-max scaling
 Weight-based models - neural network, deep learning
 Feature engineering
 PCA - dimension reduction
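The data type transformations listed above can be sketched as follows. The helper names `bin_values`, `one_hot`, and `min_max_scale`, and the age bins used in the example, are assumptions made for illustration only.

```python
from bisect import bisect_left


def bin_values(values, edges, names):
    """Continuous to discrete: map each value to a named bin.

    A value v falls in bin i when edges[i-1] < v <= edges[i].
    """
    return [names[bisect_left(edges, v)] for v in values]


def one_hot(labels):
    """Encode each label as a 0/1 vector over the sorted categories."""
    categories = sorted(set(labels))
    return [[1 if label == c else 0 for c in categories] for label in labels]


def min_max_scale(values):
    """Rescale values to the range [0, 1] for distance-based models."""
    lo, hi = min(values), max(values)
    span = hi - lo
    return [(v - lo) / span if span else 0.0 for v in values]


ages = [5, 18, 30, 75]
age_groups = bin_values(ages, edges=[18, 60], names=["child", "adult", "senior"])
colour_vectors = one_hot(["red", "blue", "red"])
scaled = min_max_scale([10, 20, 30])
```

Scaling matters for distance-based models because a feature with a large numeric range would otherwise dominate the distance calculation.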

 EDA - Exploratory Data Analysis
o Understand the central tendencies and dispersion
o Better understanding of data
o To identify the patterns
o To convince the stakeholders
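As a small illustration of the first EDA point, central tendency and dispersion can be summarised with Python's standard library. The function name `summarize` and the sample values are hypothetical; the population standard deviation is used here as one possible measure of dispersion.

```python
from statistics import mean, median, mode, pstdev


def summarize(values):
    """Return basic measures of central tendency and dispersion."""
    return {
        "mean": mean(values),
        "median": median(values),
        "mode": mode(values),
        "range": max(values) - min(values),
        "std": pstdev(values),  # population standard deviation
    }


summary = summarize([2, 4, 4, 6, 9])
```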

 Model
o Challenge - bombarded with models and algorithms
o If the model has a null and an alternative hypothesis, then it is an inferential model
o Three steps involved
 Model selection
 Descriptive - sample analysis
 Inferential - hypothesis testing, chi-square test
 Predictive
 Supervised learning - the model knows what it predicts; it has a training and a testing phase, and is also called a task-based model
 Classification - yes-or-no kind of model
 Logistic
 Discriminant
 Linear
 Quadratic
 Naive Bayes
 Regression
 Linear
 Non-linear models
 Time series forecasting
 Parametric models - moving average techniques, exponential smoothing, ARIMA models
 Non-parametric models - autoregressive, conditional heteroscedastic
 KNN, SVM, decision tree.
 Ensemble methods
 Bagging
 Boosting
 Stacking
 Neural network

 Unsupervised learning - data-oriented approach
 Clustering
 Hierarchical
 Dec

 Non-hierarchical
 K-means
 PAM
 CLARA
 Fuzzy clustering
 Location based clustering
 Model based clustering
 Density based clustering
 DBSCAN
 Dimension reduction
 PCA - principal component analysis
 FA - factor analysis
 MCA - multiple correspondence analysis
 t-SNE
 Autoencoders
 Topic modelling
 LDA - latent Dirichlet allocation
 NMF - non-negative matrix factorization
 Recommender system (special case)
 Popularity-based recommendation - non-personalized
 Association analysis (market basket analysis) - used on transactional data
 Content-based recommendations - distance matrix
All three mentioned above are non-personalized recommendations
 Collaborative filtering
 The data necessary for collaborative filtering are users, items, and the interactions between the users and the items.
 User based CF
 Item based CF
 SVD - Singular value decomposition - it works on the concept of dimension reduction
 Hybrid recommendations - combinations of above recommendations

 Semi-supervised learning - the data is unlabelled and the company has enough manpower to enter labels manually
 Reinforcement learning - the science of decision making
 Prescriptive - combination of descriptive and predictive
 Model build
 Model evaluation or fine-tuning

 Insights
Second type of recommender system

Market basket analysis

If two items are frequently bought together, then the support is high

Popularity based - users, items and interaction between users and items
Market basket analysis - transactions data

Support = number of transactions containing both A and B / total number of transactions

Confidence = number of transactions containing both A and B / number of transactions containing A

1. Only item sets with high support are considered for market basket analysis
2. Generally, confidence is used for recommending, whereas support is used as a filter
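
The two formulas above translate directly into code. The sketch below assumes each transaction is a Python set of item names; the function names and the sample baskets are made up for illustration.

```python
def support(transactions, a, b):
    """Share of all transactions that contain both a and b."""
    both = sum(1 for t in transactions if a in t and b in t)
    return both / len(transactions)


def confidence(transactions, a, b):
    """Share of the transactions containing a that also contain b."""
    has_a = sum(1 for t in transactions if a in t)
    both = sum(1 for t in transactions if a in t and b in t)
    return both / has_a if has_a else 0.0


baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

bread_milk_support = support(baskets, "bread", "milk")        # 2 of 4 baskets
bread_milk_confidence = confidence(baskets, "bread", "milk")  # 2 of 3 bread baskets
```

Support is symmetric in A and B, while confidence is not: here the confidence of milk given bread and of bread given milk happen to be equal only because bread and milk each appear in three baskets.
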
Techniques applied to convert text to numbers

 Bag of words
 GloVe
 Word2Vec
 Embedding

Bag of words
Count vectorizer method
TF-IDF vectorizer method

Both are broadly called document-term matrices

Term frequency is different from document frequency
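
To make the difference between the two vectorizers concrete, here is a minimal bag-of-words sketch in plain Python. It assumes whitespace tokenization and the textbook weighting tf * log(N / df); library implementations such as scikit-learn's `TfidfVectorizer` use a smoothed variant by default, so their numbers will differ. The function names and sample documents are hypothetical.

```python
import math
from collections import Counter


def count_vectorize(docs):
    """Build a document-term matrix of raw term counts."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({word for tokens in tokenized for word in tokens})
    matrix = []
    for tokens in tokenized:
        counts = Counter(tokens)
        matrix.append([counts[word] for word in vocab])
    return vocab, matrix


def tfidf(matrix):
    """Weight each count by log(N / document frequency) of its term."""
    n_docs = len(matrix)
    n_terms = len(matrix[0])
    # Document frequency: in how many documents each term appears.
    df = [sum(1 for row in matrix if row[j] > 0) for j in range(n_terms)]
    return [
        [row[j] * math.log(n_docs / df[j]) for j in range(n_terms)]
        for row in matrix
    ]


docs = ["the cat sat", "the dog sat", "the cat ran"]
vocab, counts = count_vectorize(docs)
weights = tfidf(counts)
```

Term frequency counts how often a word occurs within one document; document frequency counts how many documents contain it. A word such as "the" that appears in every document gets a TF-IDF weight of zero under this formula, while a rarer word such as "dog" is weighted up.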
Recommender system
Collaborative recommender system

Two broad types of collaborative filters
o Item-based CF
o User-based CF

User IDs are listed in rows and items are listed in columns

From the interaction data, the insights obtained will be

IBCF - Similarity matrix (Item-Item)
Correlation
Similarity is based upon the co-occurrence of the purchases
UBCF - similarity (User - User)
Cosine similarity
Similarity is based upon purchase behaviour

The packages for all these calculations are scikit-learn and Surprise (scikit-surprise)
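
The two similarity matrices can be sketched without any library. The example below is a simplification: it uses cosine similarity for both the user-user and the item-item matrix (the notes mention correlation for IBCF), and the small interaction matrix is invented for illustration, with 1 meaning the user purchased the item.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def similarity_matrix(rows):
    """Pairwise cosine similarity between all rows."""
    return [[cosine(r1, r2) for r2 in rows] for r1 in rows]


# Rows are users, columns are items.
interactions = [
    [1, 1, 0],  # user 0
    [1, 1, 0],  # user 1
    [0, 0, 1],  # user 2
]

# UBCF: compare users by their rows (purchase behaviour).
user_similarity = similarity_matrix(interactions)

# IBCF: compare items by their columns (co-occurrence of purchases).
item_columns = [list(col) for col in zip(*interactions)]
item_similarity = similarity_matrix(item_columns)
```

Users 0 and 1 bought exactly the same items, so their similarity is close to 1, while user 2 shares nothing with them and scores 0. Transposing the matrix gives the item view: items 0 and 1 are always bought together, and item 2 never co-occurs with either.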
