Instance
Supervised learning
• Labeled data
• You know what you’re looking for
• Classification: predict categorical labels
• Regression: predict continuous target variables
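To make the regression case concrete, here is a minimal stdlib-only Python sketch that fits a straight line to labeled (x, y) pairs by ordinary least squares. The slides do not prescribe a regression method; simple linear regression, the function names `fit_line` and `predict_line`, and the toy data are illustrative assumptions.

```python
from typing import Sequence, Tuple

Line = Tuple[float, float]  # (slope, intercept)


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Line:
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); the 1/n factors cancel.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict_line(model: Line, x: float) -> float:
    """Predict a continuous target value for a new input x."""
    slope, intercept = model
    return slope * x + intercept


# Labeled training data: the target y is continuous, so this is regression.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]
model = fit_line(xs, ys)
print(predict_line(model, 5.0))
```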
Classification
• The target is a categorical variable (a class label)
• Learns the relationship between an instance's features and its label
Classification
• Naïve Bayes classifier
• Assumes features are independent of one another given the class
• Fast to train and to predict
• Often a decent classifier despite that simplification
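The independence assumption is what keeps Naïve Bayes fast: training is just counting, and prediction adds one log-probability per feature. Below is a minimal sketch for categorical features with add-one (Laplace) smoothing, assuming stdlib-only Python; the class name `CategoricalNaiveBayes` and the toy weather data are hypothetical, not taken from the slides or from any library.

```python
import math
from collections import Counter
from typing import Hashable, Sequence, Tuple

Row = Tuple[Hashable, ...]


class CategoricalNaiveBayes:
    """Naïve Bayes for categorical features with add-one (Laplace) smoothing."""

    def fit(self, rows: Sequence[Row], labels: Sequence[Hashable]) -> "CategoricalNaiveBayes":
        if not rows or len(rows) != len(labels):
            raise ValueError("need a non-empty set of rows with one label each")
        self.class_counts = Counter(labels)
        self.n = len(labels)
        n_features = len(rows[0])
        # counts[c][j][v] = how often feature j takes value v within class c
        self.counts = {c: [Counter() for _ in range(n_features)] for c in self.class_counts}
        # Distinct values seen per feature, used in the smoothing denominator.
        self.values = [set() for _ in range(n_features)]
        for row, c in zip(rows, labels):
            for j, v in enumerate(row):
                self.counts[c][j][v] += 1
                self.values[j].add(v)
        return self

    def predict(self, row: Row) -> Hashable:
        best_class, best_score = None, -math.inf
        for c, c_count in self.class_counts.items():
            # log P(c) + sum over j of log P(x_j | c); summing per-feature
            # terms is exactly where the independence assumption is used.
            score = math.log(c_count / self.n)
            for j, v in enumerate(row):
                numer = self.counts[c][j][v] + 1  # Counter returns 0 for unseen values
                denom = c_count + len(self.values[j])
                score += math.log(numer / denom)
            if score > best_score:
                best_class, best_score = c, score
        return best_class


# Hypothetical toy data: (outlook, temperature) -> play?
rows = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"),
        ("rainy", "cool"), ("overcast", "hot")]
labels = ["no", "no", "yes", "yes", "yes"]
nb = CategoricalNaiveBayes().fit(rows, labels)
print(nb.predict(("rainy", "mild")))  # prints "yes"
```

Working in log space avoids underflow when many small probabilities are multiplied, and the smoothing keeps a single unseen feature value from ruling out a whole class.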
Unsupervised algorithms
• Unlabeled data
• You might have no idea what you’re looking for
• Clustering: splitting observations into groups
• Dimensionality reduction: project data onto fewer dimensions
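The slides do not name a dimensionality-reduction method. As one common example, principal component analysis (PCA) projects data onto the direction of greatest variance. The sketch below is an illustration under stated assumptions: it keeps only the first component, finds it by power iteration on the covariance matrix, and uses the hypothetical function name `pca_1d`.

```python
import math
from typing import List, Sequence


def pca_1d(points: Sequence[Sequence[float]], iterations: int = 100) -> List[float]:
    """Project d-dimensional points onto their first principal component."""
    n, d = len(points), len(points[0])
    if n < 2:
        raise ValueError("need at least two points")
    means = [sum(p[j] for p in points) / n for j in range(d)]
    centered = [[p[j] - means[j] for j in range(d)] for p in points]
    # Sample covariance matrix (d x d).
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Power iteration: repeatedly multiply by cov and renormalise. This converges
    # to the eigenvector with the largest eigenvalue (direction of most variance),
    # assuming the start vector is not orthogonal to it.
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break  # no variance at all
        v = [x / norm for x in w]
    # Each point is reduced to a single coordinate along that direction.
    return [sum(r[j] * v[j] for j in range(d)) for r in centered]


# Points lying on the line y = x collapse to one dimension without losing spread.
z = pca_1d([(0.0, 0.0), (1.0, 1.0), (2.0, 2.0), (3.0, 3.0)])
print(z)
```

Power iteration also assumes the largest eigenvalue is unique; in practice a linear-algebra library would normally compute the components instead.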
Clustering
• Exploring the data
• Groups similar objects together
• Similarity is measured by the distance between data points
Clustering
• K-means clustering
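• Three steps, with the last two repeated until the assignments stop changing:
• Choose initial cluster centers
• Assign each data instance to the nearest cluster center
• Recalculate each cluster center as the mean of the instances assigned to it
• Efficient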
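The three k-means steps can be sketched in plain Python as follows. How initial centers are chosen is not specified in the slides; this sketch assumes a deterministic farthest-first heuristic (start from the first point, then repeatedly take the point farthest from all centers chosen so far), whereas many implementations use random or k-means++ initialization. Distances are squared Euclidean, and `kmeans` is a hypothetical name.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def _dist2(a: Sequence[float], b: Sequence[float]) -> float:
    # Squared Euclidean distance: enough for deciding which center is nearest.
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: Sequence[Point], k: int, max_iter: int = 100) -> Tuple[List[Point], List[int]]:
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    # Step 1: choose initial centers (farthest-first heuristic, an assumption).
    centers: List[Point] = [tuple(points[0])]
    while len(centers) < k:
        farthest = max(points, key=lambda p: min(_dist2(p, c) for c in centers))
        centers.append(tuple(farthest))
    labels: List[int] = []
    for _ in range(max_iter):
        # Step 2: assign each point to its nearest center.
        new_labels = [min(range(k), key=lambda i: _dist2(p, centers[i])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Step 3: recalculate each center as the mean of its assigned points.
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:  # keep the old center if a cluster ends up empty
                centers[i] = tuple(sum(coord) / len(members) for coord in zip(*members))
    return centers, labels


points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0)]
centers, labels = kmeans(points, k=2)
print(centers)  # [(0.5, 0.5), (10.5, 10.5)]
print(labels)   # [0, 0, 0, 0, 1, 1, 1, 1]
```

Because the result depends on the initial centers, k-means only reaches a local optimum; running it from several starting points and keeping the solution with the lowest within-cluster sum of squares is a common remedy.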
Outline
Typical steps in a data mining/modelling project:
Load and generate datasets
Split a dataset for cross-validation
Use some learning algorithms
  • Naïve Bayes
  • SVM
  • Random forest
Evaluate the performance of the algorithms
  • Accuracy
  • F1-score
  • ROC AUC
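The splitting and evaluation steps of the outline can be sketched without any ML library. Below, `k_fold_indices`, `accuracy`, `f1_score` and `roc_auc` are hypothetical helper names, not a particular library's API. The sketch assumes contiguous, unshuffled folds and binary labels with `1` as the positive class, and it computes ROC AUC as the probability that a randomly chosen positive example is scored higher than a randomly chosen negative one, with ties counting half.

```python
from typing import Hashable, List, Sequence, Tuple


def k_fold_indices(n: int, k: int) -> List[Tuple[List[int], List[int]]]:
    """Split indices 0..n-1 into k contiguous folds; return (train, test) pairs."""
    if not 1 < k <= n:
        raise ValueError("need 1 < k <= n")
    # The first n % k folds get one extra index so every index is used once.
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    splits, start = [], 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        splits.append((train, test))
        start += size
    return splits


def accuracy(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true: Sequence[Hashable], y_pred: Sequence[Hashable], positive: Hashable = 1) -> float:
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true: Sequence[Hashable], scores: Sequence[float], positive: Hashable = 1) -> float:
    """Probability that a random positive outscores a random negative (ties = 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("need both positive and negative examples")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


print(k_fold_indices(5, 2))  # [([3, 4], [0, 1, 2]), ([0, 1, 2], [3, 4])]
y_true = [1, 0, 1, 1, 0]
y_pred = [1, 0, 0, 1, 1]
scores = [0.9, 0.2, 0.4, 0.8, 0.5]
print(accuracy(y_true, y_pred), f1_score(y_true, y_pred), roc_auc(y_true, scores))
```

The pairwise AUC loop costs one comparison per (positive, negative) pair, which is fine for illustration but slow on large test sets; sorting-based versions compute the same value faster. In practice the data would usually be shuffled before splitting so that each fold has a similar class mix.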
Sc