Professional Documents
Culture Documents
Unit-05
Data Wrangling
Predicting a Predicting a No
category? Quantity? Yes
Not working
Do you have
Labeled
data?
Clustering Regression <100K
<100K
no. of samples Randomized
category No PCA
samples known
Algo SGD Few
Linear Regressor features are <10K
SGD <10K <10K
SVC samples samples important samples
Classifier
Supervised learning algorithms are trained using labeled data. Unsupervised learning algorithms are trained using unlabeled
data.
Supervised learning model takes direct feedback to check if it is Unsupervised learning model does not take any feedback.
predicting correct output or not.
The goal of supervised learning is to train the model so that it The goal of unsupervised learning is to find the hidden patterns
can predict the output when it is given new data. and useful insights from the unknown dataset.
Supervised learning can be categorized Unsupervised Learning can be classified
in Classification and Regression problems. in Clustering and Associations problems.
Supervised learning model produces an accurate result. Unsupervised learning model may give less accurate result as
compared to supervised learning.
It includes various algorithms such as Linear Regression, It includes various algorithms such as Clustering, KNN, and
Logistic Regression, Support Vector Machine, Multi-class Apriori algorithm.
Classification, Decision tree, Bayesian Logic, etc.
Test Model
Data Testing
Train Model
Data Training
We also can use another pandas built-in function describe() to obtain insights of the data.
valueCount.py Output
1 print(df.describe())
covCorr.py
1 print(df.cov())
2 print(df.corr())