Machine Learning Basics (4 hours)

Today, data-driven modules are essential for products to reach the next level of performance. Using Honeywell use cases, this session will help the audience gain an understanding of the concepts and tools for building data-driven intelligent systems. The session will cover:

1. The Machine Learning Landscape
a. What Is Machine Learning?
b. Why Use Machine Learning?
c. Types of Machine Learning Systems
i. Supervised/Unsupervised Learning
ii. Batch and Online Learning
iii. Instance-Based Versus Model-Based Learning
d. Main Challenges of Machine Learning
i. Insufficient Quantity of Training Data
ii. Non-representative Training Data
iii. Poor-Quality Data
iv. Irrelevant Features
v. Overfitting the Training Data
vi. Underfitting the Training Data
vii. Stepping Back
e. Testing and Validating
i. Hyperparameter Tuning and Model Selection
ii. Data Mismatch
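
As a concrete taste of these ideas, the sketch below trains a supervised classifier and checks for overfitting by comparing training accuracy against accuracy on a held-out test set (the bundled Iris dataset and the unrestricted tree are illustrative choices, not the session's Honeywell data):

```python
# Detecting overfitting: compare accuracy on the training set with
# accuracy on a held-out test set.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# An unrestricted tree can memorize the training set.
clf = DecisionTreeClassifier(random_state=42)
clf.fit(X_train, y_train)

train_acc = clf.score(X_train, y_train)
test_acc = clf.score(X_test, y_test)
# A large gap between train_acc and test_acc signals overfitting.
```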

2. End-to-End Machine Learning Project
a. Working with Real Data
b. Look at the Big Picture
i. Frame the Problem
ii. Select a Performance Measure
iii. Check the Assumptions
c. Get the Data
i. Create the Workspace
ii. Download the Data
iii. Take a Quick Look at the Data Structure
iv. Create a Test Set
d. Discover and Visualize the Data to Gain Insights
i. Visualizing Geographical Data
ii. Looking for Correlations
iii. Experimenting with Attribute Combinations
e. Prepare the Data for Machine Learning Algorithms
i. Data Cleaning
ii. Handling Text and Categorical Attributes
iii. Custom Transformers
iv. Feature Scaling
v. Transformation Pipelines
f. Select and Train a Model
i. Training and Evaluating on the Training Set
ii. Better Evaluation Using Cross-Validation
g. Fine-Tune Your Model
i. Grid Search
ii. Randomized Search
iii. Ensemble Methods
iv. Analyze the Best Models and Their Errors
v. Evaluate Your System on the Test Set
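
The workflow above can be compressed into a few lines: a preparation pipeline with feature scaling, a model, and grid search with cross-validation (the bundled diabetes dataset, Ridge model, and alpha grid are stand-ins chosen for illustration):

```python
# End-to-end in miniature: pipeline -> grid search -> test-set evaluation.
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

pipe = Pipeline([
    ("scaler", StandardScaler()),  # feature scaling
    ("model", Ridge()),
])

# Fine-tune the regularization strength via cross-validated grid search.
param_grid = {"model__alpha": [0.01, 0.1, 1.0, 10.0]}
search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X_train, y_train)

best_alpha = search.best_params_["model__alpha"]
test_score = search.score(X_test, y_test)  # R^2 on the held-out test set
```

Note that the test set is touched only once, at the very end, after model selection is finished.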

3. Classification
a. Training a Binary Classifier
b. Performance Measures
i. Measuring Accuracy Using Cross-Validation
ii. Confusion Matrix
iii. Precision and Recall
iv. Precision/Recall Tradeoff
v. The ROC Curve
c. Multiclass Classification
d. Error Analysis
e. Multilabel Classification
f. Multioutput Classification
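
A minimal sketch of the performance measures above, on a binary task (the breast-cancer toy dataset and logistic-regression baseline are illustrative choices):

```python
# Binary classification with confusion matrix, precision, and recall.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

cm = confusion_matrix(y_test, y_pred)        # rows: actual, columns: predicted
precision = precision_score(y_test, y_pred)  # TP / (TP + FP)
recall = recall_score(y_test, y_pred)        # TP / (TP + FN)
```

Raising the decision threshold trades recall for precision; that tradeoff is the subject of item b.iv.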

4. Training Models
a. Linear Regression
i. The Normal Equation
ii. Computational Complexity
b. Gradient Descent
i. Batch Gradient Descent
ii. Stochastic Gradient Descent
iii. Mini-batch Gradient Descent
c. Polynomial Regression
d. Learning Curves
e. Regularized Linear Models
i. Ridge Regression
ii. Lasso Regression
iii. Elastic Net
iv. Early Stopping
f. Logistic Regression
i. Estimating Probabilities
ii. Training and Cost Function
iii. Decision Boundaries
iv. Softmax Regression
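
The Normal Equation item above has a direct closed-form implementation. A small sketch on synthetic data (the true intercept 4 and slope 3 are made up for illustration):

```python
# Normal equation: theta = (X^T X)^{-1} X^T y, solved in closed form.
import numpy as np

rng = np.random.default_rng(42)
X = 2 * rng.random((100, 1))
y = 4 + 3 * X[:, 0] + rng.normal(0, 0.1, 100)  # true intercept 4, slope 3

X_b = np.c_[np.ones((100, 1)), X]               # prepend a bias column
theta = np.linalg.inv(X_b.T @ X_b) @ X_b.T @ y  # the normal equation
intercept, slope = theta
```

Inverting X^T X costs roughly O(n^3) in the number of features, which is why gradient descent (items b.i-b.iii) is preferred when features are numerous.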

5. Support Vector Machines
a. Linear SVM Classification
i. Soft Margin Classification
b. Nonlinear SVM Classification
i. Polynomial Kernel
ii. Adding Similarity Features
iii. Gaussian RBF Kernel
iv. Computational Complexity
c. SVM Regression
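
A short example of nonlinear SVM classification with the Gaussian RBF kernel, on data that no straight line can separate (the two-moons dataset and the gamma/C values are illustrative):

```python
# RBF-kernel SVM on the non-linearly-separable two-moons dataset.
from sklearn.datasets import make_moons
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_moons(n_samples=200, noise=0.1, random_state=42)

# SVMs are sensitive to feature scales, so scale before fitting.
svm_clf = make_pipeline(StandardScaler(),
                        SVC(kernel="rbf", gamma=1.0, C=10.0))
svm_clf.fit(X, y)
train_acc = svm_clf.score(X, y)
```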

6. Decision Trees
a. Training and Visualizing a Decision Tree
b. Making Predictions
c. Estimating Class Probabilities
d. The CART Training Algorithm
e. Computational Complexity
f. Gini Impurity or Entropy?
g. Regularization Hyperparameters
h. Regression
i. Instability
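
The items above map to a few lines of code: train a tree whose growth is limited by a regularization hyperparameter, then estimate class probabilities for a new instance (the Iris data and `max_depth=2` are illustrative):

```python
# A regularized decision tree estimating class probabilities.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# max_depth is a regularization hyperparameter: it caps tree growth.
tree_clf = DecisionTreeClassifier(max_depth=2, random_state=42)
tree_clf.fit(X, y)

# Probability estimates come from class frequencies in the reached leaf.
proba = tree_clf.predict_proba([[5.0, 3.0, 1.5, 0.3]])  # one flower
```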

7. Ensemble Learning and Random Forests
a. Voting Classifiers
b. Bagging and Pasting
i. Bagging and Pasting in Scikit-Learn
ii. Out-of-Bag Evaluation
c. Random Patches and Random Subspaces
d. Random Forests
i. Extra-Trees
ii. Feature Importance
e. Boosting
i. AdaBoost
ii. Gradient Boosting
f. Stacking
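
Several of these ideas come together in a Random Forest: bagged trees, out-of-bag evaluation, and feature importances (the Iris data and 200 estimators are illustrative choices):

```python
# Random Forest with out-of-bag evaluation and feature importances.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)

forest = RandomForestClassifier(
    n_estimators=200, oob_score=True, random_state=42)
forest.fit(X, y)

oob_acc = forest.oob_score_                # accuracy on out-of-bag samples
importances = forest.feature_importances_  # one score per feature, sums to 1
```

Out-of-bag evaluation gives a free validation estimate: each tree is scored on the training instances it never saw during bagging.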

8. Dimensionality Reduction
a. Main Approaches for Dimensionality Reduction
i. Projection
ii. Manifold Learning
b. PCA
i. Preserving the Variance
ii. Principal Components
iii. Projecting Down to d Dimensions
iv. Explained Variance Ratio
v. Choosing the Right Number of Dimensions
vi. PCA for Compression
vii. Randomized PCA
viii. Incremental PCA
c. Kernel PCA
i. Selecting a Kernel and Tuning Hyperparameters
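
Projecting down to d dimensions and reading off the explained variance ratio takes only a few lines (the Iris data and d = 2 are illustrative):

```python
# PCA: project to 2 dimensions, then check how much variance survives.
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)

pca = PCA(n_components=2)
X2d = pca.fit_transform(X)

# Fraction of the dataset's variance carried by each principal component.
ratio = pca.explained_variance_ratio_
total = ratio.sum()
```

Alternatively, passing a float such as `PCA(n_components=0.95)` asks Scikit-Learn to choose the smallest d preserving 95% of the variance.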

9. Unsupervised Learning Techniques
a. Clustering
i. K-Means
ii. Using Clustering for Image Segmentation
iii. Using Clustering for Preprocessing
iv. Using Clustering for Semi-Supervised Learning
v. DBSCAN
b. Gaussian Mixtures
i. Anomaly Detection using Gaussian Mixtures
ii. Selecting the Number of Clusters
iii. Bayesian Gaussian Mixture Models
iv. Other Anomaly Detection and Novelty Detection Algorithms
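
A minimal K-Means example closes the loop on the clustering items above (the synthetic blob data and k = 3 are illustrative; item b.ii covers principled ways to select k):

```python
# K-Means: partition unlabeled points into k clusters.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, random_state=42)

kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
labels = kmeans.fit_predict(X)      # cluster index per instance
centers = kmeans.cluster_centers_   # one centroid per cluster
```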
