
Data Mining

What is machine learning?


• Application of algorithms that learn from examples
• Representation and generalization
Why should we care?
• Useful in everyday life
• Email spam filtering, handwriting analysis, stock market analysis, Netflix recommendations

• Especially useful in data analysis


• Feature extraction, linear regression, classification, clustering
Machine Learning Vocab
• Feature
• Class
• Instance
Supervised learning
• Labeled data
• You know what you’re looking for
• Classification: predict categorical labels
• Regression: predict continuous target variables
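The difference between the two supervised tasks can be sketched on a toy labeled dataset. Everything here is illustrative and not taken from the slides: the data (hours studied, exam score, pass/fail) is made up, and the helper names `fit_line` and `nearest_neighbor_label` are hypothetical. Regression is shown with a least-squares line, classification with a 1-nearest-neighbour rule.

```python
# Toy labeled data (made up for illustration): one feature per instance.
hours = [1.0, 2.0, 3.0, 4.0, 5.0]
scores = [52.0, 54.0, 56.0, 58.0, 60.0]      # continuous target -> regression
passed = ["no", "no", "yes", "yes", "yes"]   # categorical label -> classification


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def nearest_neighbor_label(xs, labels, query):
    """1-nearest-neighbour: return the label of the closest training instance."""
    best = min(range(len(xs)), key=lambda i: abs(xs[i] - query))
    return labels[best]


slope, intercept = fit_line(hours, scores)
predicted_score = slope * 6.0 + intercept                       # a number
predicted_label = nearest_neighbor_label(hours, passed, 4.6)    # a category
print(predicted_score, predicted_label)
```

The regression output is a real number, while the classification output is always one of the labels seen in training.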
Classification
• Categorical variables
• Relationship between instance and feature
Classification
• Naïve Bayes classifier
• Assumes features are conditionally independent given the class
• Fast performance
• Decent classifier
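A minimal sketch of a Naïve Bayes classifier for categorical features, assuming add-one (Laplace) smoothing, which the slides do not mention. The function names and the tiny spam dataset are hypothetical. The independence assumption shows up as a plain sum of per-feature log-probabilities.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(instances, labels):
    """Count class frequencies and per-class feature-value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)   # (class, feature index) -> value counts
    feature_values = defaultdict(set)     # feature index -> set of seen values
    for features, label in zip(instances, labels):
        for i, value in enumerate(features):
            value_counts[(label, i)][value] += 1
            feature_values[i].add(value)
    return class_counts, value_counts, feature_values


def predict_naive_bayes(model, features):
    """Return the class with the highest log posterior (up to a constant)."""
    class_counts, value_counts, feature_values = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total)   # log prior
        for i, value in enumerate(features):
            # Add-one smoothing keeps unseen values from zeroing the product.
            numerator = value_counts[(label, i)][value] + 1
            denominator = count + len(feature_values[i])
            score += math.log(numerator / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical features: (contains "offer", contains a link)
emails = [("yes", "yes"), ("yes", "no"), ("no", "yes"),
          ("no", "no"), ("no", "no"), ("yes", "yes")]
labels = ["spam", "spam", "ham", "ham", "ham", "spam"]

model = train_naive_bayes(emails, labels)
print(predict_naive_bayes(model, ("yes", "yes")))
print(predict_naive_bayes(model, ("no", "no")))
```

Training is a single counting pass over the data, which is one reason the method is fast.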
Unsupervised algorithms
• Unlabeled data
• You might have no idea what you’re looking for
• Clustering: splitting observations into groups
• Dimensionality reduction: flatten data to fewer dimensions
Clustering
• Exploring the data
• Similar objects in the same group
• Distance between data points
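As an illustration of "distance between data points", the sketch below uses Euclidean distance. This is an assumption: the slides do not say which distance measure is meant, and other choices (Manhattan, cosine) are common. The points are made up.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Made-up 2-D instances: a and b are close together, c is far away.
points = {"a": (1.0, 1.0), "b": (1.5, 2.0), "c": (8.0, 9.0)}

d_ab = euclidean_distance(points["a"], points["b"])
d_ac = euclidean_distance(points["a"], points["c"])
print(d_ab < d_ac)  # a is more similar to b than to c
```

A clustering algorithm would tend to put `a` and `b` in the same group and `c` in another, because the first distance is much smaller.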
Clustering
• K-means clustering
• Three steps
• Chooses initial cluster centers
• Assigns each data instance to its nearest cluster center
• Recalculates each cluster center as the mean of its assigned instances
• Efficient
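The three steps can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not a reference implementation: initial centers are drawn at random from the data, distance is squared Euclidean, and the function name `kmeans`, its parameters, and the sample points are all hypothetical.

```python
import random


def kmeans(points, k, iterations=100, seed=0):
    """Cluster points (tuples of floats) into k groups."""
    rng = random.Random(seed)
    # Step 1: choose initial cluster centers (here: k random data points).
    centers = rng.sample(points, k)
    assignment = None
    for _ in range(iterations):
        # Step 2: assign each instance to its nearest center.
        new_assignment = [
            min(range(k),
                key=lambda c: sum((p - q) ** 2 for p, q in zip(point, centers[c])))
            for point in points
        ]
        if new_assignment == assignment:
            break  # assignments stopped changing: converged
        assignment = new_assignment
        # Step 3: recalculate each center as the mean of its members.
        for c in range(k):
            members = [pt for pt, a in zip(points, assignment) if a == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return centers, assignment


# Made-up data with two visibly separated groups.
data = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
        (8.0, 8.0), (9.0, 9.0), (8.5, 9.5)]
centers, assignment = kmeans(data, k=2)
print(assignment)
```

Steps 2 and 3 repeat until the assignments no longer change. Which group receives index 0 or 1 depends on the random initialization, so only the grouping is meaningful, not the label values.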
Outline
Typical steps in a Data Mining/Modelling Project:
Load and generate datasets
Split a dataset for cross-validation
Use some learning algorithms
• Naive Bayes
• SVM
• Random forest
Evaluate the performance of the algorithms
• Accuracy
• F1-score
• AUC ROC
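To make the splitting and evaluation steps concrete, here is a small stdlib-only sketch of k-fold index generation and the three metrics for a binary problem. The function names, labels, predictions, and scores are hypothetical, and the AUC is computed from its pairwise-ranking definition rather than by tracing the ROC curve.

```python
def k_fold_indices(n_instances, k):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    folds = [list(range(i, n_instances, k)) for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true, scores, positive=1):
    """Probability that a random positive outscores a random negative (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up labels, hard predictions, and classifier scores.
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1]
scores = [0.9, 0.2, 0.4, 0.8, 0.3, 0.6]

splits = list(k_fold_indices(len(y_true), k=3))
print(splits[0])
print(accuracy(y_true, y_pred), f1_score(y_true, y_pred), roc_auc(y_true, scores))
```

Accuracy and F1 use hard predictions, whereas AUC ROC needs a score or probability per instance. In a full run, each algorithm would be trained on the `train` indices and scored on the `test` indices of every fold, then the metrics averaged.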

Sc
