
Recommender system

Reinforcement learning
 
 
Framework
 Identify the business problem - reactive process or proactive process
 Acquire data -
o What sources of information are we dealing with
 Primary data - data we collect ourselves
 Secondary data - data collected by a third party to whom the requirement is passed on
o What are the different data types
 Quantitative and qualitative data
 Under qualitative data - text, image, audio, video
 Under quantitative data
 Continuous data and discrete data
 Ratio and interval for continuous data
 Categorical, ordinal, and binary for discrete data

 No machine learning algorithm understands qualitative data directly; it must first be transformed into numbers


 Clean - Pre-processing the data
o If the cleaning steps are the same or repetitive, automate them.
o Missing values and outliers are the two most important issues to address
o Missing data
 Remove the data either by rows or by columns
 Imputation -
 Mean, median, and mode
 Cross-sectional data (ML-based imputation, k-NN, scikit-learn KNNImputer)
 Time-series data (interpolation techniques)
o Outlier data
 Detection
 Decision
 
 For detecting outliers -
 Univariate methods - box plot, Z-score
 Bivariate method - scatter plot
 Multivariate method - if the residual score is close to zero there are no outliers; if it is clearly greater than zero, outliers are present
 Clustering for unstructured data as a multivariate method
 DBSCAN - density-based spatial clustering of applications with noise
 Decide what is to be done
 Error records - data entry problem
 Random chance - outlier capping
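
The cleaning steps above can be sketched roughly as follows. This is only an illustration on a made-up table (the column names `age` and `income` and all values are assumptions, not from these notes); it uses pandas for median imputation and interpolation, scikit-learn's `KNNImputer` for the k-NN approach, and the box-plot (IQR) rule for univariate detection and capping.

```python
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer

# Hypothetical toy data: one missing age and one extreme income.
df = pd.DataFrame({
    "age": [25.0, 32.0, np.nan, 41.0, 38.0, 29.0],
    "income": [30.0, 42.0, 39.0, 45.0, 400.0, 35.0],
})

# Simple imputation: replace the missing age with the column median.
median_filled = df["age"].fillna(df["age"].median())

# Cross-sectional imputation: fill from the 2 most similar rows.
knn_filled = pd.DataFrame(
    KNNImputer(n_neighbors=2).fit_transform(df), columns=df.columns
)

# Time-series style imputation: linear interpolation between neighbours.
interpolated = df["age"].interpolate(method="linear")

# Univariate outlier detection with the box-plot (IQR) rule.
q1, q3 = df["income"].quantile([0.25, 0.75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
is_outlier = (df["income"] < lower) | (df["income"] > upper)

# Outlier capping: clip values to the IQR fences instead of dropping rows.
capped = df["income"].clip(lower=lower, upper=upper)
```

Whether to cap, remove, or correct a flagged value is still the "Decision" step: an error record would normally be fixed at source, while a genuine extreme value is a candidate for capping.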
 
 Transform
o Data type transformation
 Continuous to discrete (binning)
 Qualitative to quantitative
 Discrete to continuous (one-hot encoding, label encoding)
o Model requirements
 Distance-based models - clustering; need scaling such as min-max
 Weight-based models - neural networks, deep learning
 Feature engineering
 PCA - dimension reduction
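
A minimal sketch of the transformations listed above, assuming a small made-up table (the columns `age`, `income`, `city` and the bin edges are illustrative choices, not from these notes). It shows binning, one-hot and label encoding, min-max scaling, and PCA with pandas and scikit-learn.

```python
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import MinMaxScaler

# Hypothetical toy data.
df = pd.DataFrame({
    "age": [22, 35, 47, 58, 63],
    "income": [28.0, 52.0, 61.0, 75.0, 40.0],
    "city": ["Pune", "Delhi", "Pune", "Mumbai", "Delhi"],
})

# Continuous to discrete: bin age into three labelled ranges.
df["age_band"] = pd.cut(
    df["age"], bins=[0, 30, 50, 120], labels=["young", "middle", "senior"]
)

# One-hot encoding: one indicator column per city.
one_hot = pd.get_dummies(df["city"], prefix="city")

# Label encoding: map each category to an integer code (alphabetical order).
df["city_code"] = df["city"].astype("category").cat.codes

# Min-max scaling for distance-based models: rescale each column to [0, 1].
scaled = MinMaxScaler().fit_transform(df[["age", "income"]])

# PCA: project the two scaled features onto one principal component.
reduced = PCA(n_components=1).fit_transform(scaled)
```

Label codes impose an arbitrary order on the categories, so one-hot encoding is usually the safer choice for nominal data fed to distance-based models.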
 
 EDA - Exploratory Data Analysis
o Understand the central tendencies and dispersion
o Better understanding of data
o To identify the patterns
o To convince the stakeholders
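
As a small illustration of central tendency and dispersion, the sketch below summarises a made-up series with pandas (the name `order_value` and the numbers are assumptions for the example only).

```python
import pandas as pd

# Hypothetical order values with one large value at the end.
values = pd.Series([12, 15, 15, 18, 20, 22, 95], name="order_value")

# Central tendency.
mean = values.mean()
median = values.median()
mode = values.mode().iloc[0]

# Dispersion.
value_range = values.max() - values.min()
std = values.std()  # sample standard deviation (ddof=1)
iqr = values.quantile(0.75) - values.quantile(0.25)
```

Here the mean is pulled well above the median by the single large value, which is the kind of pattern EDA is meant to surface before modelling.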
 
 Model
o Challenge - bombarded with models and algorithms
o If the model has a null hypothesis and an alternative hypothesis, then it is an inferential model
o Three steps involved
 Model selection
 Descriptive - sample analysis
 Inferential - hypothesis testing, chi-square test
 Predictive
 Supervised learning - the model knows what it predicts and has a learning (training) phase and a testing phase; also called a task-based model
 Classification - yes or no kind of model
 Logistic
 Discriminant
 Linear
 Quadratic
 Naïve Bayes
 Regression
 Linear
 Non-linear models
 Time series forecasting
 Parametric models - moving average techniques, exponential smoothing, ARIMA models
 Non-parametric models - autoregressive conditional heteroscedastic (ARCH)
 KNN, SVM, decision tree.
 Ensemble methods
 Bagging
 Boosting
 Stacking
 Neural network
 
 Unsupervised learning - data-oriented approach
 Clustering
 Hierarchical
 Dec
 
 Non-hierarchical
 K-means
 PAM
 CLARA
 Fuzzy clustering
 Location-based clustering
 Model-based clustering
 Density-based clustering
 DBSCAN
 Dimension reduction
 PCA - principal component analysis
 FA - factor analysis
 MCA - multiple correspondence analysis
 t-SNE
 Autoencoders
 Topic modelling
 LDA - Latent Dirichlet allocation
 NMF - non-negative matrix factorization
 Recommender system (special case)
 Popularity-based recommendation - non-personalized
 Association analysis (market basket analysis) - used on transactional data
 Content-based recommendations - distance matrix
All three approaches mentioned above are non-personalized recommendations
 Collaborative filtering
 The data necessary for collaborative filtering are users, items, and the interactions between users and items.
 User-based CF
 Item-based CF
 SVD - singular value decomposition - works on the concept of dimension reduction
 Hybrid recommendations - combinations of the above recommendations
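
To make the dimension-reduction idea behind SVD concrete, here is a rough sketch with NumPy on a made-up user-item rating matrix (the matrix and the choice `k = 2` are assumptions). It treats 0 as an ordinary rating for simplicity, whereas recommender libraries normally treat unrated cells as missing.

```python
import numpy as np

# Hypothetical ratings: rows = users, columns = items.
R = np.array([
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [1, 0, 5, 4],
    [0, 1, 4, 5],
], dtype=float)

# Full decomposition R = U * diag(s) * Vt, singular values sorted descending.
U, s, Vt = np.linalg.svd(R, full_matrices=False)

# Keep only the k strongest latent factors (the dimension-reduction step).
k = 2
R_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Reconstruction error of the rank-k approximation (Frobenius norm).
error = np.linalg.norm(R - R_k)
```

The entries of `R_k` in positions where the user gave no rating can be read as predicted scores based on the two retained latent factors.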
 
 
 
 Semi-supervised learning - used when the data is unlabelled and the company has enough manpower to label part of it manually
 Reinforcement learning - the science of decision making
 Prescriptive - combination of descriptive and predictive
 Model building
 Model evaluation or fine-tuning
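
The build, evaluate, and fine-tune steps might look like the sketch below for a supervised classifier. Everything specific here is an assumption for illustration: the synthetic data from `make_classification`, logistic regression as the selected model, and the grid of `C` values.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic binary classification data standing in for a real problem.
X, y = make_classification(n_samples=300, n_features=6, random_state=0)

# Learning and testing sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Model building plus fine-tuning: cross-validated search over the
# regularisation strength C.
search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid={"C": [0.01, 0.1, 1.0, 10.0]},
    cv=5,
)
search.fit(X_train, y_train)

# Model evaluation on data the model has not seen.
accuracy = accuracy_score(y_test, search.predict(X_test))
```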
 
 Insights

Second type of recommender system


 
Market basket analysis
 
If two items are frequently bought together, then the support is high
 
Popularity-based - users, items, and interactions between users and items
Market basket analysis - transactional data
 
 
Support = number of transactions containing both A and B / total number of transactions
 
Confidence = number of transactions containing both A and B / number of transactions containing A
 
1. Only item sets with high support are considered for market basket analysis
2. Generally, confidence is used for recommending, whereas support is used as a filter
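
The two formulas above can be checked on a handful of made-up transactions. The sketch below uses plain Python; the item names, the helper names `count`, `support`, `confidence`, and the 0.4 support threshold are all illustrative assumptions.

```python
# Hypothetical transactions, one set of items per basket.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk", "eggs"},
    {"bread", "milk", "eggs"},
]

def count(items, transactions):
    """Number of transactions that contain every item in `items`."""
    items = set(items)
    return sum(1 for basket in transactions if items <= basket)

def support(items, transactions):
    """Share of all transactions that contain every item in `items`."""
    return count(items, transactions) / len(transactions)

def confidence(a, b, transactions):
    """Of the transactions containing A, the share that also contain B."""
    return count(set(a) | set(b), transactions) / count(a, transactions)

# Support acts as the filter, confidence ranks the surviving rule.
min_support = 0.4
pair = {"bread", "milk"}
if support(pair, transactions) >= min_support:
    rule_confidence = confidence({"bread"}, {"milk"}, transactions)
```

With this data, bread and milk appear together in 3 of 5 baskets (support 0.6), and 3 of the 4 bread baskets also contain milk (confidence 0.75).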

Techniques applied to convert text to numbers


 Bag of words
 GloVe
 Word2Vec
 Embeddings
 
Bag of words
Count vectorizer method
TF-IDF vectorizer method
 
Both produce what is broadly called a document-term matrix
 
Term frequency (how often a term occurs within one document) is different from document frequency (how many documents contain the term)

Recommender system
Collaborative recommender system
 
Two broad types of collaborative filters
o Item-based CF
o User-based CF
 
Users are listed in rows and items are listed in columns
 
From the interaction data, the insights obtained will be
 
IBCF - similarity matrix (item-item)
Correlation
Similarity is based upon the co-occurrence of the purchases
UBCF - similarity matrix (user-user)
Cosine similarity
Similarity is based upon purchase behaviour
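
Both similarity matrices can be sketched with NumPy from a small made-up interaction matrix (the purchase data and the helper name `cosine_similarity_rows` are assumptions). Users are in rows and items in columns, as described above.

```python
import numpy as np

# Hypothetical interactions: rows = users, columns = items, 1 = purchased.
R = np.array([
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
    [1, 0, 0, 1],
], dtype=float)

def cosine_similarity_rows(M):
    """Cosine similarity between every pair of rows of M."""
    norms = np.linalg.norm(M, axis=1, keepdims=True)
    unit = M / norms
    return unit @ unit.T

# UBCF: user-user cosine similarity on purchase behaviour.
user_sim = cosine_similarity_rows(R)

# IBCF: item-item correlation, computed across the item columns.
item_sim = np.corrcoef(R.T)
```

Users 0 and 2 share no purchases, so their cosine similarity is 0, while users 0 and 1 share two items and score about 0.82.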
 
 
Packages for all these calculations are scikit-learn and Surprise
