Debugging a learning algorithm:

Suppose you have implemented regularized linear regression to predict housing prices. However, when you test your hypothesis on a new set of houses, you find that it makes unacceptably large errors in its predictions. What should you try next?

Get more training examples
Try smaller sets of features
Try getting additional features
Try adding polynomial features
Try decreasing λ
Try increasing λ
Dayalbagh Educational Institute
CSM 042 Machine Intelligence
1
Machine learning diagnostic:

Diagnostic: a test that you can run to gain insight into what is or isn't working with a learning algorithm, and to gain guidance as to how best to improve its performance. Diagnostics can take time to implement, but doing so can be a very good use of your time.
Evaluating your hypothesis

A hypothesis with low training error can still fail to generalize to new examples not in the training set. (Figure: price vs. size, with a high-order polynomial passing through every training point.)

Example features for predicting house price: size of house, no. of bedrooms, no. of floors, age of house, average income in neighborhood, kitchen size.
Evaluating your hypothesis

Dataset, split into a training set (e.g. the first ~70% of the examples) and a test set (the remaining ~30%):

Size (feet²)   Price ($1000s)
2104           400
1600           330
2400           369
1416           232
3000           540
1985           300
1534           315
1427           199
1380           212
1494           243
Training/testing procedure for linear regression

1. Learn parameter θ from training data (minimizing training error J(θ)).
2. Compute test set error:

   J_test(θ) = (1/(2·m_test)) · Σ_{i=1..m_test} ( h_θ(x_test^(i)) − y_test^(i) )²
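As a sketch of this procedure (the dataset above, split ~70/30 by hand; a closed-form univariate least-squares fit stands in for gradient descent):

```python
# Sketch of the training/testing procedure for univariate linear regression.
# theta is fit with the closed-form least-squares solution rather than
# gradient descent.

def fit_linear(xs, ys):
    """Return (theta0, theta1) minimizing the training error."""
    m = len(xs)
    mean_x = sum(xs) / m
    mean_y = sum(ys) / m
    theta1 = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
             sum((x - mean_x) ** 2 for x in xs)
    theta0 = mean_y - theta1 * mean_x
    return theta0, theta1

def squared_error(theta, xs, ys):
    """J(theta) = (1/2m) * sum of squared prediction errors."""
    theta0, theta1 = theta
    m = len(xs)
    return sum((theta0 + theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)

# first ~70% of the dataset for training, the rest for testing
x_train = [2104, 1600, 2400, 1416, 3000, 1985, 1534]
y_train = [400, 330, 369, 232, 540, 300, 315]
x_test, y_test = [1427, 1380, 1494], [199, 212, 243]

theta = fit_linear(x_train, y_train)
print("training error:", squared_error(theta, x_train, y_train))
print("test error:    ", squared_error(theta, x_test, y_test))
```

The test error, not the training error, is the honest estimate of how the hypothesis performs on new houses.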
Training/testing procedure for logistic regression

1. Learn parameter θ from training data.
2. Compute test set error:

   J_test(θ) = −(1/m_test) · Σ_{i=1..m_test} [ y_test^(i) · log h_θ(x_test^(i)) + (1 − y_test^(i)) · log(1 − h_θ(x_test^(i))) ]

Alternatively, misclassification error (0/1 misclassification error):

   err(h_θ(x), y) = 1 if (h_θ(x) ≥ 0.5 and y = 0) or (h_θ(x) < 0.5 and y = 1), else 0

   Test error = (1/m_test) · Σ_{i=1..m_test} err(h_θ(x_test^(i)), y_test^(i))
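The 0/1 misclassification error can be sketched directly (the predicted probabilities and labels below are hypothetical):

```python
# 0/1 misclassification error for a logistic-regression-style classifier.
# h is the predicted probability that y = 1.

def misclassification_error(h_vals, y_vals):
    """Fraction of examples where thresholding h at 0.5 disagrees with y."""
    errors = sum(1 for h, y in zip(h_vals, y_vals)
                 if (h >= 0.5 and y == 0) or (h < 0.5 and y == 1))
    return errors / len(y_vals)

h_test = [0.9, 0.2, 0.6, 0.4]   # hypothetical h_theta(x) on the test set
y_test = [1,   0,   0,   1]     # true labels
print(misclassification_error(h_test, y_test))  # 2 of 4 misclassified -> 0.5
```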
Overfitting example

(Figure: price vs. size, with a high-order polynomial fit.)

Once parameters were fit to some set of data (the training set), the error of the parameters as measured on that data (the training error) is likely to be lower than the actual generalization error.
Model selection

1.  h_θ(x) = θ₀ + θ₁x                    (d = 1)
2.  h_θ(x) = θ₀ + θ₁x + θ₂x²             (d = 2)
3.  h_θ(x) = θ₀ + θ₁x + … + θ₃x³         (d = 3)
    ⋮
10. h_θ(x) = θ₀ + θ₁x + … + θ₁₀x¹⁰       (d = 10)

Choose the degree d whose fitted parameters give the lowest test set error. How well does the model generalize? Report the test set error J_test(θ^(d)).

Problem: J_test(θ^(d)) is likely to be an optimistic estimate of generalization error, i.e. our extra parameter (d = degree of polynomial) is fit to the test set.
Evaluating your hypothesis

The same dataset, now split three ways: a training set (~60%), a cross validation (cv) set (~20%), and a test set (~20%):

Size (feet²)   Price ($1000s)
2104           400
1600           330
2400           369
1416           232
3000           540
1985           300
1534           315
1427           199
1380           212
1494           243
Train/validation/test error

Training error:          J_train(θ) = (1/(2m)) · Σ_{i=1..m} ( h_θ(x^(i)) − y^(i) )²
Cross validation error:  J_cv(θ) = (1/(2·m_cv)) · Σ_{i=1..m_cv} ( h_θ(x_cv^(i)) − y_cv^(i) )²
Test error:              J_test(θ) = (1/(2·m_test)) · Σ_{i=1..m_test} ( h_θ(x_test^(i)) − y_test^(i) )²
Model selection

1.  h_θ(x) = θ₀ + θ₁x                    → θ^(1) → J_cv(θ^(1))
2.  h_θ(x) = θ₀ + θ₁x + θ₂x²             → θ^(2) → J_cv(θ^(2))
3.  h_θ(x) = θ₀ + θ₁x + … + θ₃x³         → θ^(3) → J_cv(θ^(3))
    ⋮
10. h_θ(x) = θ₀ + θ₁x + … + θ₁₀x¹⁰       → θ^(10) → J_cv(θ^(10))

Pick the degree d with the lowest cross validation error J_cv(θ^(d)). Estimate generalization error for the test set as J_test(θ^(d)).
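A minimal sketch of selection by cross validation error, assuming hypothetical 1-D data and only two candidate models (degree 0, a constant, and degree 1, a line), each fit by least squares:

```python
# Model selection sketch: fit each candidate on the training set, pick the one
# with the lowest cross validation error, and report test error only for the
# chosen model.

def fit(xs, ys, degree):
    """Least-squares fit; returns (theta0, theta1), with theta1 = 0 for degree 0."""
    if degree == 0:
        return (sum(ys) / len(ys), 0.0)
    m = len(xs)
    mx, my = sum(xs) / m, sum(ys) / m
    t1 = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
         sum((x - mx) ** 2 for x in xs)
    return (my - t1 * mx, t1)

def cost(theta, xs, ys):
    t0, t1 = theta
    return sum((t0 + t1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * len(xs))

# Hypothetical data roughly following y = 2x + 1: a line should beat a constant.
x_train, y_train = [0, 1, 2, 3], [1.1, 2.9, 5.2, 6.8]
x_cv,    y_cv    = [4, 5], [9.1, 11.2]
x_test,  y_test  = [6], [13.0]

models = {d: fit(x_train, y_train, d) for d in (0, 1)}
best_d = min(models, key=lambda d: cost(models[d], x_cv, y_cv))
print("chosen degree:", best_d)
print("test error:", cost(models[best_d], x_test, y_test))
```

The test set is touched exactly once, for the already-chosen model, so the reported test error is not optimistically biased by the search.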
Bias/variance

(Three figures, each price vs. size:)

High bias (underfit) · “Just right” · High variance (overfit)
Bias/variance

(Figure: error vs. degree of polynomial d.)

Training error: J_train(θ) decreases as d increases.
Cross validation error: J_cv(θ) is high for small d (underfit), reaches a minimum, and rises again for large d (overfit).
Diagnosing bias vs. variance

Suppose your learning algorithm is performing less well than you were hoping (J_cv(θ) or J_test(θ) is high). Is it a bias problem or a variance problem?

Bias (underfit): J_train(θ) is high, and J_cv(θ) ≈ J_train(θ).
Variance (overfit): J_train(θ) is low, but J_cv(θ) ≫ J_train(θ).

(Figure: error vs. degree of polynomial d, with the cross validation error lying above the training error.)
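The decision rule can be written as a small helper. The thresholds `high` and `gap` are illustrative assumptions, since what counts as "high" error or a "large" gap is problem-specific:

```python
# Diagnose bias vs. variance from measured training and cross validation
# errors. Thresholds are hypothetical defaults, not universal constants.

def diagnose(j_train, j_cv, high=1.0, gap=0.5):
    if j_train > high and j_cv - j_train < gap:
        return "high bias (underfit)"
    if j_train <= high and j_cv - j_train >= gap:
        return "high variance (overfit)"
    return "neither clearly"

print(diagnose(j_train=2.0, j_cv=2.2))   # both errors high, small gap
print(diagnose(j_train=0.1, j_cv=1.8))   # low training error, big gap
```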
Linear regression with regularization

Model: h_θ(x) = θ₀ + θ₁x + θ₂x² + θ₃x³ + θ₄x⁴, fit by minimizing J(θ) with a regularization term (λ/(2m)) · Σ_j θ_j².

(Three figures, each price vs. size:)

Large λ → high bias (underfit) · Intermediate λ → “just right” · Small λ → high variance (overfit)
Choosing the regularization parameter λ

(The regularized cost J(θ) is used for training, while J_train(θ), J_cv(θ), and J_test(θ) are measured without the regularization term.)
Choosing the regularization parameter λ

Model: h_θ(x) = θ₀ + θ₁x + θ₂x² + θ₃x³ + θ₄x⁴, with regularized cost J(θ).

1.  Try λ = 0
2.  Try λ = 0.01
3.  Try λ = 0.02
4.  Try λ = 0.04
5.  Try λ = 0.08
    ⋮   (doubling each step)
12. Try λ = 10

For each candidate, minimize J(θ) to get θ^(i). Pick the candidate (say θ^(5)) with the lowest cross validation error J_cv(θ^(i)); report its test error.
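A sketch of the λ search, assuming hypothetical data and a one-parameter model θ·x whose ridge solution has a closed form (standing in for full regularized linear regression). Candidates are compared by *unregularized* cross validation error:

```python
# For each lambda, fit regularized parameters on the training set, then pick
# the lambda whose parameters give the lowest (unregularized) cv error.

def fit_ridge(xs, ys, lam):
    """Closed-form minimizer of sum (theta*x - y)^2 + lam * theta^2."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)

def cv_error(theta, xs, ys):
    return sum((theta * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * len(xs))

x_train, y_train = [1, 2, 3, 4], [1.2, 1.9, 3.1, 4.1]
x_cv,    y_cv    = [5, 6], [5.0, 5.9]

# candidate values roughly doubling each step, as in the slide
lambdas = [0, 0.01, 0.02, 0.04, 0.08, 0.16, 0.32, 0.64, 1.28, 2.56, 5.12, 10.24]
thetas = {lam: fit_ridge(x_train, y_train, lam) for lam in lambdas}
best_lam = min(lambdas, key=lambda lam: cv_error(thetas[lam], x_cv, y_cv))
print("best lambda:", best_lam, "theta:", thetas[best_lam])
```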
Bias/variance as a function of the regularization parameter λ

(Figure: error vs. λ. J_train(θ) increases with λ; J_cv(θ) is high for very small λ (overfit) and for very large λ (underfit), with a minimum in between.)
Learning curves

(Figure: error vs. m, the training set size. J_train(θ) grows as m increases, since it becomes harder to fit every example perfectly, while J_cv(θ) shrinks as m increases.)
High bias

(Figures: price vs. size with a straight-line fit; error vs. training set size, where J_train(θ) and J_cv(θ) quickly converge to a similar, high error.)

If a learning algorithm is suffering from high bias, getting more training data will not (by itself) help much.
High variance (and small λ)

(Figures: price vs. size with a high-order polynomial fit; error vs. training set size, where there is a large gap between J_train(θ) and J_cv(θ) that keeps narrowing as m grows.)

If a learning algorithm is suffering from high variance, getting more training data is likely to help.
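A learning-curve sketch under hypothetical assumptions (1-D data, least-squares line fit): train on the first m examples for increasing m, tracking training error on those m examples and cross validation error on a fixed held-out set.

```python
# Learning curves: J_train on the first m examples vs. J_cv on a fixed cv set.

def fit_line(xs, ys):
    m = len(xs)
    mx, my = sum(xs) / m, sum(ys) / m
    denom = sum((x - mx) ** 2 for x in xs)
    t1 = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / denom if denom else 0.0
    return my - t1 * mx, t1

def cost(theta, xs, ys):
    t0, t1 = theta
    return sum((t0 + t1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * len(xs))

x_all = [0, 1, 2, 3, 4, 5, 6, 7]
y_all = [0.2, 1.1, 1.8, 3.3, 3.9, 5.2, 5.8, 7.1]   # hypothetical, roughly y = x
x_cv, y_cv = [8, 9], [8.1, 8.9]

for m in range(2, len(x_all) + 1):
    theta = fit_line(x_all[:m], y_all[:m])
    print(m,
          round(cost(theta, x_all[:m], y_all[:m]), 4),   # training error
          round(cost(theta, x_cv, y_cv), 4))             # cv error
```

With m = 2 the line passes through both points (zero training error) but generalizes poorly; as m grows, training error rises and cv error falls, as in the figure.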
Debugging a learning algorithm:

Suppose you have implemented regularized linear regression to predict housing prices. However, when you test your hypothesis on a new set of houses, you find that it makes unacceptably large errors in its predictions. What should you try next?

Get more training examples       → fixes high variance
Try smaller sets of features     → fixes high variance
Try getting additional features  → fixes high bias
Try adding polynomial features   → fixes high bias
Try decreasing λ                 → fixes high bias
Try increasing λ                 → fixes high variance
Supervised learning

Training set: {(x^(1), y^(1)), (x^(2), y^(2)), …, (x^(m), y^(m))} — each example comes with a label y.
Unsupervised learning

Training set: {x^(1), x^(2), …, x^(m)} — no labels y; the algorithm must find structure in the data on its own.
Applications of clustering

Market segmentation · Social network analysis · Organize computing clusters · Astronomical data analysis

Image credit: NASA/JPL-Caltech/E. Churchwell (Univ. of Wisconsin, Madison)
K-means algorithm

Input:
K (number of clusters)
Training set {x^(1), x^(2), …, x^(m)}, with x^(i) ∈ ℝⁿ (drop the x₀ = 1 convention)
K-means algorithm

Randomly initialize K cluster centroids μ₁, μ₂, …, μ_K ∈ ℝⁿ
Repeat {
    for i = 1 to m:
        c^(i) := index (from 1 to K) of the cluster centroid closest to x^(i)    (cluster assignment step)
    for k = 1 to K:
        μ_k := average (mean) of the points assigned to cluster k                (move centroid step)
}
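The loop above can be sketched in pure Python for 1-D points; the centroids are seeded by hand here (rather than randomly) so the run is reproducible:

```python
# Minimal K-means for 1-D points.

def kmeans(points, centroids, iters=10):
    centroids = list(centroids)
    k = len(centroids)
    for _ in range(iters):
        # cluster assignment step: c[i] = index of closest centroid
        c = [min(range(k), key=lambda j: (p - centroids[j]) ** 2)
             for p in points]
        # move centroid step: mean of the points assigned to each cluster
        for j in range(k):
            members = [p for p, ci in zip(points, c) if ci == j]
            if members:                     # keep old centroid if cluster empty
                centroids[j] = sum(members) / len(members)
    return c, centroids

points = [1.0, 1.2, 0.8, 9.0, 9.4, 8.6]    # two obvious groups
assignments, centroids = kmeans(points, centroids=[0.0, 5.0])
print(assignments)   # -> [0, 0, 0, 1, 1, 1]
print(centroids)     # close to [1.0, 9.0]
```

The empty-cluster guard is one common choice; another is to re-initialize or drop an empty cluster.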
K-means for non-separated clusters

T-shirt sizing. (Figure: weight vs. height, partitioned into size regions even though the data form no obvious separate clusters.)
K-means optimization objective

c^(i) = index of cluster (1, 2, …, K) to which example x^(i) is currently assigned
μ_k = cluster centroid k (μ_k ∈ ℝⁿ)
μ_{c^(i)} = cluster centroid of the cluster to which example x^(i) has been assigned

Optimization objective (distortion):

    J(c^(1), …, c^(m), μ₁, …, μ_K) = (1/m) · Σ_{i=1..m} ‖x^(i) − μ_{c^(i)}‖²

    min over c^(1), …, c^(m), μ₁, …, μ_K of J
Random initialization

Should have K < m. Randomly pick K training examples. Set μ₁, …, μ_K equal to these K examples.
Local optima

(Figures: the same data clustered in different ways — K-means can converge to poor local optima of J from unlucky initializations.)
Random initialization

For i = 1 to 100 {
    Randomly initialize K-means.
    Run K-means. Get c^(1), …, c^(m), μ₁, …, μ_K.
    Compute the cost function (distortion) J(c^(1), …, c^(m), μ₁, …, μ_K).
}
Pick the clustering that gave the lowest cost J.
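The restart loop can be sketched as follows (1-D points, distortion J as defined above; a small self-contained K-means run per initialization, with a fixed seed so the result is reproducible):

```python
# Run K-means many times from random initializations; keep the lowest-J run.
import random

def kmeans_once(points, k, rng, iters=10):
    centroids = rng.sample(points, k)       # initialize from K training examples
    for _ in range(iters):
        c = [min(range(k), key=lambda j: (p - centroids[j]) ** 2)
             for p in points]
        for j in range(k):
            members = [p for p, ci in zip(points, c) if ci == j]
            if members:
                centroids[j] = sum(members) / len(members)
    # distortion: average squared distance of each point to its centroid
    cost = sum((p - centroids[ci]) ** 2 for p, ci in zip(points, c)) / len(points)
    return c, centroids, cost

points = [1.0, 1.2, 0.8, 5.0, 5.2, 9.0, 9.4, 8.6]   # hypothetical, three groups
rng = random.Random(0)
best = min((kmeans_once(points, k=3, rng=rng) for _ in range(100)),
           key=lambda result: result[2])
print("lowest distortion:", best[2])
```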
What is the right value of K?
Choosing the value of K

Sometimes, you're running K-means to get clusters to use for some later/downstream purpose. Evaluate K-means based on a metric for how well it performs for that later purpose.

E.g. T-shirt sizing (figures: weight vs. height): K = 3 gives sizes S, M, L; K = 5 gives XS, S, M, L, XL.
Principal Component Analysis (PCA) problem formulation
Principal Component Analysis (PCA) problem formulation

Reduce from 2 dimensions to 1 dimension: find a direction (a vector u^(1) ∈ ℝⁿ) onto which to project the data so as to minimize the projection error.

Reduce from n dimensions to k dimensions: find k vectors u^(1), u^(2), …, u^(k) onto which to project the data, so as to minimize the projection error.
PCA is not linear regression

(Figures: linear regression minimizes the vertical distances from the points to the fitted line; PCA minimizes the orthogonal (shortest) distances from the points to the line.)
Data preprocessing

Training set: x^(1), x^(2), …, x^(m)

Preprocessing (feature scaling/mean normalization):
    μ_j = (1/m) · Σ_{i=1..m} x_j^(i)
    Replace each x_j^(i) with x_j^(i) − μ_j.
If different features are on different scales (e.g. x₁ = size of house, x₂ = number of bedrooms), also scale features to have a comparable range of values (e.g. divide by s_j, the standard deviation or the range of feature j).
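A sketch of mean normalization plus scaling by the standard deviation (one illustrative choice of s_j), using hypothetical house-size and bedroom columns:

```python
# Mean normalization and feature scaling: replace each x_j with
# (x_j - mu_j) / s_j, where s_j here is the standard deviation of feature j.

def scale_params(column):
    m = len(column)
    mu = sum(column) / m
    s = (sum((v - mu) ** 2 for v in column) / m) ** 0.5
    return mu, s if s else 1.0          # guard against a constant feature

sizes    = [2104, 1600, 2400, 1416, 3000]   # feature 1: size of house
bedrooms = [3, 3, 3, 2, 4]                  # feature 2: number of bedrooms

for column in (sizes, bedrooms):
    mu, s = scale_params(column)
    scaled = [(v - mu) / s for v in column]
    print([round(v, 3) for v in scaled])
```

After scaling, both features have zero mean and unit variance, so neither dominates the covariance computation simply because of its units.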
Principal Component Analysis (PCA) algorithm

(Figures: reduce data from 2D to 1D; reduce data from 3D to 2D.)
Principal Component Analysis (PCA) algorithm

Reduce data from n dimensions to k dimensions.

Compute the “covariance matrix”:  Sigma = (1/m) · Σ_{i=1..m} (x^(i)) (x^(i))ᵀ
Compute the “eigenvectors” of matrix Sigma:  [U,S,V] = svd(Sigma)
From [U,S,V] = svd(Sigma), we get U = [u^(1) u^(2) … u^(n)] ∈ ℝ^{n×n}. Taking the first k columns of U and projecting gives z ∈ ℝᵏ.
Principal Component Analysis (PCA) algorithm summary

After mean normalization (ensure every feature has zero mean) and optionally feature scaling:

Sigma = (1/m) * X' * X;     % covariance matrix (X has one example per row)
[U,S,V] = svd(Sigma);
Ureduce = U(:,1:k);
z = Ureduce' * x;
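A pure-Python sketch of the 2-D → 1-D case; power iteration on the covariance matrix stands in for the `svd(Sigma)` call, and the data are hypothetical:

```python
# PCA sketch: mean-normalize, build the 2x2 covariance matrix, find its top
# eigenvector by power iteration, and project onto it.

def pca_1d(data, iters=200):
    m = len(data)
    # mean normalization
    mx = sum(x for x, _ in data) / m
    my = sum(y for _, y in data) / m
    pts = [(x - mx, y - my) for x, y in data]
    # covariance matrix Sigma = (1/m) * sum of x^(i) x^(i)^T (2x2 here)
    a = sum(x * x for x, _ in pts) / m
    b = sum(x * y for x, y in pts) / m
    d = sum(y * y for _, y in pts) / m
    # power iteration for the principal eigenvector u of [[a, b], [b, d]]
    u = (1.0, 0.5)
    for _ in range(iters):
        v = (a * u[0] + b * u[1], b * u[0] + d * u[1])
        norm = (v[0] ** 2 + v[1] ** 2) ** 0.5
        u = (v[0] / norm, v[1] / norm)
    # projection: z = u^T x
    z = [u[0] * x + u[1] * y for x, y in pts]
    return u, z

data = [(1, 1.1), (2, 1.9), (3, 3.2), (4, 3.9)]   # points near the line y = x
u, z = pca_1d(data)
print("principal direction:", u)   # roughly (0.72, 0.70)
```

Because the points lie near y = x, the principal direction comes out close to (1, 1)/√2, and each z value is the 1-D coordinate along that line.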
Reconstruction from compressed representation

Given z = Ureduceᵀ·x, the approximate reconstruction is x_approx = Ureduce·z ≈ x.
Choosing k (number of principal components)

Average squared projection error:  (1/m) · Σ_{i=1..m} ‖x^(i) − x_approx^(i)‖²
Total variation in the data:       (1/m) · Σ_{i=1..m} ‖x^(i)‖²

Typically, choose k to be the smallest value so that

    [ (1/m) Σ ‖x^(i) − x_approx^(i)‖² ] / [ (1/m) Σ ‖x^(i)‖² ] ≤ 0.01    (1%)

“99% of variance is retained.”
Choosing k (number of principal components)

Algorithm: try PCA with k = 1, 2, 3, …; for each k, compute the ratio above and check whether it is ≤ 0.01. This is wasteful, since it reruns PCA for every k; instead, call [U,S,V] = svd(Sigma) once and use the singular values in S.
Choosing k (number of principal components)

With [U,S,V] = svd(Sigma), pick the smallest value of k for which

    ( Σ_{i=1..k} S_ii ) / ( Σ_{i=1..n} S_ii ) ≥ 0.99    (99% of variance retained)
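The check can be sketched for a 2×2 covariance matrix, where the top eigenvalue (found by power iteration, standing in for the svd call) divided by the trace gives the variance retained by k = 1. The covariance entries below are hypothetical:

```python
# Variance retained by k = 1 for a symmetric 2x2 covariance matrix
# [[a, b], [b, d]]: top eigenvalue / trace, since the trace is the sum of
# both eigenvalues (i.e. the total variance).

def top_eigenvalue(a, b, d, iters=200):
    """Largest eigenvalue, via power iteration and a Rayleigh quotient."""
    u = (1.0, 0.5)
    for _ in range(iters):
        v = (a * u[0] + b * u[1], b * u[0] + d * u[1])
        norm = (v[0] ** 2 + v[1] ** 2) ** 0.5
        u = (v[0] / norm, v[1] / norm)
    return u[0] * (a * u[0] + b * u[1]) + u[1] * (b * u[0] + d * u[1])

a, b, d = 1.25, 1.2125, 1.191875              # hypothetical covariance entries
retained = top_eigenvalue(a, b, d) / (a + d)  # trace = sum of eigenvalues
print("variance retained by k=1:", retained)
print("k=1 is enough (99%):", retained >= 0.99)
```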
Supervised learning speedup

Extract inputs from the labeled dataset to get an unlabeled dataset x^(1), …, x^(m); run PCA to get z^(1), …, z^(m); form the new training set {(z^(1), y^(1)), …, (z^(m), y^(m))}.

Note: the mapping x^(i) → z^(i) should be defined by running PCA only on the training set. This mapping can then be applied as well to the examples x_cv^(i) and x_test^(i) in the cross validation and test sets.
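The train-only rule can be illustrated with just the mean-normalization step of the pipeline (hypothetical numbers): the statistics are computed once on the training set and reused, unchanged, on held-out examples.

```python
# Define the mapping (here: mean normalization, the first step before PCA)
# on the training set only, then apply the same mapping to test examples.

x_train = [2104, 1600, 2400, 1416, 3000]
x_test  = [1985, 1534]

mu = sum(x_train) / len(x_train)          # computed on the training set only
train_mapped = [x - mu for x in x_train]
test_mapped  = [x - mu for x in x_test]   # reuse mu; do NOT recompute on test
print(mu)             # -> 2104.0
print(test_mapped)    # -> [-119.0, -570.0]
```

Recomputing μ (or Ureduce) on the cv/test sets would leak information from held-out data into the mapping and make error estimates unreliable.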