FAI Lecture - 4-10-2023 PDF
Classification:
Steps to Implement a Decision Tree Model:
1. Data Collection:
- Example: Gather a dataset of customer behaviour, including age, income, and purchase
history, along with labels indicating customer churn (yes or no).
2. Data Preprocessing:
- Example: Handle missing values and convert categorical variables into numerical ones.
3. Splitting Data:
- Example: Divide the dataset into 70% training data and 30% test data.
4. Feature Selection:
- Example: Use metrics like Gini impurity or information gain to select the most important
features, such as income and age.
5. Tree Construction:
- Example: Build the decision tree based on the selected features, creating decision nodes
and leaves.
6. Pruning:
- Example: Remove branches that have little power in prediction, reducing the complexity of
the model.
7. Model Training:
- Example: Fit the tree on the training data. In practice, tree construction and pruning
(steps 5 and 6) are carried out as part of this training step.
8. Model Testing:
- Example: Use the decision tree to classify the test data into churn or not churn categories.
9. Evaluation:
- Example: Use metrics like accuracy and F1-score to evaluate the model's performance.
10. Deployment:
- Example: Once the model is evaluated and fine-tuned, integrate it into a customer
relationship management system for real-time churn prediction.
Key Concepts:
- Decision Nodes: These are the nodes where the decision-making process occurs based on
certain attributes.
- Pruning: This is the process of removing the less useful branches of the tree. It reduces the
complexity and prevents overfitting.
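As a rough illustration of the Gini impurity metric mentioned in step 4, the sketch below scores candidate split thresholds on a single numeric feature. The tiny income/churn dataset and the function names `gini` and `split_gini` are made up for illustration; they are not taken from the lecture.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def split_gini(values, labels, threshold):
    """Weighted Gini impurity after splitting on value <= threshold."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

# Hypothetical toy data: income (in thousands) and churn label.
income = [20, 25, 30, 60, 65, 70]
churn = ["yes", "yes", "yes", "no", "no", "yes"]

# Try each observed value (except the largest) as a threshold
# and keep the one that gives the purest split.
candidates = sorted(set(income))[:-1]
best = min(candidates, key=lambda t: split_gini(income, churn, t))
print(best, round(split_gini(income, churn, best), 3))  # 30 0.222
```

A lower weighted impurity means the split separates the classes more cleanly, which is why the minimum is taken.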
2. Data Preprocessing:
- Example: Handle missing values and convert categorical variables into numerical ones.
3. Splitting Data:
- Example: Divide the dataset into 80% training data and 20% test data.
4. Bootstrap Sampling:
- Example: Create multiple subsets of the original training data through random sampling with
replacement.
8. Model Aggregation:
- Example: Combine the predictions from all the individual trees through majority voting.
9. Model Testing:
- Example: Use the aggregated model to classify the test data into positive or negative
customer response categories.
10. Evaluation:
- Example: Use metrics like accuracy and F1-score to evaluate the model's performance.
11. Deployment:
- Example: Once the model is evaluated and fine-tuned, integrate it into a marketing system
for real-time customer response prediction.
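The bootstrap sampling and majority voting steps above can be sketched in a few lines. This is only an outline under stated assumptions: the "training rows" are placeholder integers, the three votes are invented, and no actual trees are fitted.

```python
import random
from collections import Counter

def bootstrap_sample(rows, rng):
    """Draw len(rows) items from rows at random, with replacement."""
    return [rng.choice(rows) for _ in rows]

def majority_vote(predictions):
    """Return the most common label among the individual predictions."""
    return Counter(predictions).most_common(1)[0][0]

rng = random.Random(0)            # fixed seed so the sketch is repeatable
training_rows = list(range(10))   # stand-in for real training records
samples = [bootstrap_sample(training_rows, rng) for _ in range(3)]

# Pretend three trees (one per bootstrap sample) voted on one customer.
votes = ["positive", "negative", "positive"]
print(majority_vote(votes))  # positive
```

Because sampling is done with replacement, each subset has the same size as the original data but usually contains some rows more than once and omits others.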
Explanation:
1. Bias/Variance:
- Overfitting: Low bias but high variance. The model fits the training data very well but is
sensitive to fluctuations in it.
- Underfitting: High bias but low variance. The model is too simplistic to capture the underlying
patterns, so it performs poorly.
2. Model Complexity:
- Overfitting: A complex model like a deep neural network can easily overfit if not properly
regularised.
- Underfitting: A simple model like a shallow neural network may not have the capacity to learn
complex relationships in the data.
3. Data Sensitivity:
- Overfitting: The model memorises the noise in the data rather than learning the actual
pattern.
- Underfitting: The model fails to capture essential patterns, effectively missing the point of the
data.
4. Examples:
- Overfitting: Using a polynomial regression model with a very high degree can lead to
overfitting.
- Underfitting: Applying linear regression to a dataset that has a non-linear relationship will
result in underfitting.
Overfitting:
1. Training Data: In the case of overfitting, the model performs exceptionally well on the training
data. It learns the training data to such an extent that it even captures the noise or random
fluctuations.
2. Test Data: When exposed to new, unseen data (test data), an overfitted model performs
poorly. This is because it has learned the training data too well, including its noise, which
doesn't generalise to new data.
Underfitting:
1. Training Data: An underfitted model performs poorly even on the training data. This is
because the model is too simplistic to capture the underlying patterns in the data.
2. Test Data: Similar to its performance on the training data, an underfitted model also performs
poorly on the test data. It lacks the complexity needed to understand the data's structure,
whether it's training or test data.
Summary:
- Overfitting: High accuracy on training data but low accuracy on test data.
- Underfitting: Low accuracy on both training and test data.
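The summary above can be reproduced on a toy problem. In this sketch the data, the deliberately flipped labels, and both "models" are hypothetical: a memoriser that copies the nearest training label stands in for an overfitted model, and a constant predictor stands in for an underfitted one.

```python
def accuracy(model, xs, ys):
    """Fraction of inputs for which the model predicts the right label."""
    return sum(model(x) == y for x, y in zip(xs, ys)) / len(ys)

# Hypothetical 1-D data. True rule: label 1 when x > 4.5.
# Two training labels (x=2 and x=7) are deliberately flipped as noise.
train_x = [1, 2, 3, 4, 5, 6, 7, 8]
train_y = [0, 1, 0, 0, 1, 1, 0, 1]
test_x = [1.8, 2.2, 3.5, 5.5, 6.8, 7.2]
test_y = [0, 0, 0, 1, 1, 1]

def memoriser(x):
    """Overfits: copies the label of the single nearest training point."""
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[nearest]

def constant(x):
    """Underfits: ignores the input and always predicts class 0."""
    return 0

for name, model in [("memoriser", memoriser), ("constant", constant)]:
    print(name,
          accuracy(model, train_x, train_y),
          round(accuracy(model, test_x, test_y), 2))
```

The memoriser scores perfectly on the training set because it has also memorised the two noisy labels, and those are exactly what mislead it on the nearby test points.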
Multiple Perspectives:
1. Statistical Perspective: Overfitting and underfitting can be understood as errors due to high
variance and high bias, respectively.
2. Computational Perspective: Overfitting can often be mitigated by techniques like
regularization, while underfitting may require more complex architectures or feature engineering.
3. Practical Perspective: In real-world applications, you often aim for a balance between bias
and variance, known as the Bias-Variance tradeoff, to ensure good performance on both training
and test data.
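The tradeoff is often written as a decomposition of the expected squared error. The notation here is an assumption, not taken from the lecture: the data are modelled as y = f(x) + ε, where the noise ε has mean zero and variance σ², and f̂ is the fitted model.

```latex
\mathbb{E}\!\left[\bigl(y - \hat{f}(x)\bigr)^2\right]
  = \underbrace{\bigl(\mathbb{E}[\hat{f}(x)] - f(x)\bigr)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\!\left[\bigl(\hat{f}(x) - \mathbb{E}[\hat{f}(x)]\bigr)^2\right]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible noise}}
```

An underfitted model is dominated by the bias term and an overfitted model by the variance term. The noise term cannot be reduced by any choice of model.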
--------------------------------------------------------------------------------
K-Nearest Neighbors (KNN):
2. Data Preprocessing:
- Example: Handle missing values by imputation and normalize features like age and BMI to
bring them to a similar scale.
3. Splitting Data:
- Example: Divide the dataset into 80% training data and 20% test data.
5. Distance Calculation:
- Example: Use Euclidean distance to find the distance between patients based on their age,
BMI, and blood sugar levels.
6. Model Training:
- Example: KNN has no explicit training phase. The model simply stores the training data and
uses it to find neighbors when classifying new points.
7. Model Testing:
- Example: For each patient in the test set, find the 3 nearest patients from the training set
and classify the test patient based on the majority class among these nearest neighbors.
8. Evaluation:
- Example: Use metrics like accuracy and F1-score to evaluate how well the model predicts
diabetes risk.
9. Deployment:
- Example: Once the model is evaluated and fine-tuned, it can be deployed in a healthcare
system to assist doctors in early diagnosis of diabetes.
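The distance calculation and testing steps above can be sketched as follows. The patient records are invented and assumed to be normalised already, and `knn_predict` is a hypothetical helper name; with k = 3 it mirrors the "3 nearest patients" example.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Classify query by majority vote among its k nearest training points."""
    # Rank training points by Euclidean distance to the query.
    order = sorted(range(len(train_points)),
                   key=lambda i: math.dist(train_points[i], query))
    nearest_labels = [train_labels[i] for i in order[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Hypothetical, already-normalised features: (age, BMI, blood sugar).
train_points = [(0.2, 0.3, 0.2), (0.3, 0.2, 0.3), (0.25, 0.35, 0.25),
                (0.8, 0.7, 0.9), (0.7, 0.8, 0.8), (0.9, 0.9, 0.7)]
train_labels = ["low", "low", "low", "high", "high", "high"]

print(knn_predict(train_points, train_labels, (0.75, 0.8, 0.85)))  # high
```

Normalisation matters here because Euclidean distance would otherwise be dominated by whichever feature has the largest numeric range.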
--------------------------------------------------------------------------------
Naive Bayes algorithm
Steps to Implement a Naive Bayes Model:
1. Data Collection:
- Example: Gather a dataset of emails, each labelled as either "spam" or "not spam".
2. Data Preprocessing:
- Example: Convert the email text into a bag-of-words or term frequency-inverse document
frequency (TF-IDF) representation.
3. Splitting Data:
- Example: Divide the dataset into 70% training data and 30% test data.
4. Feature Selection:
- Example: Choose the most relevant words or phrases that commonly appear in spam
emails, such as "win", "free", "urgent", etc.
5. Probability Calculation:
- Example: Calculate the prior probabilities of spam and not-spam classes, as well as the
conditional probabilities of each word given a class.
6. Model Training:
- Example: Use the training data to calculate the probabilities needed for the Naive Bayes
formula.
7. Model Testing:
- Example: For each email in the test set, calculate the posterior probabilities of it being spam
or not spam and classify based on the higher probability.
8. Evaluation:
- Example: Use metrics like accuracy, precision, and recall to evaluate the model's
performance in classifying emails.
9. Deployment:
- Example: Once the model is evaluated and fine-tuned, integrate it into an email system to
automatically filter out spam emails.
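A minimal sketch of the probability calculation, training, and testing steps on a bag-of-words representation is given below. The four emails are invented, the function names are hypothetical, and add-one (Laplace) smoothing is an assumption of this sketch rather than something stated in the lecture.

```python
import math
from collections import Counter, defaultdict

def train_nb(docs, labels):
    """Count class frequencies and per-class word frequencies."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    for words, label in zip(docs, labels):
        word_counts[label].update(words)
    vocab = {w for words in docs for w in words}
    return class_counts, word_counts, vocab

def predict_nb(model, words):
    """Return the class with the highest log-posterior score."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    scores = {}
    for label, n_docs in class_counts.items():
        total_words = sum(word_counts[label].values())
        score = math.log(n_docs / total_docs)  # log prior
        for w in words:
            if w not in vocab:
                continue  # ignore words never seen in training
            # Add-one (Laplace) smoothing: an assumption of this sketch.
            p = (word_counts[label][w] + 1) / (total_words + len(vocab))
            score += math.log(p)  # log conditional probability
        scores[label] = score
    return max(scores, key=scores.get)

docs = [["win", "free", "prize"], ["urgent", "free", "offer"],
        ["meeting", "tomorrow"], ["project", "report", "meeting"]]
labels = ["spam", "spam", "not spam", "not spam"]
model = train_nb(docs, labels)
print(predict_nb(model, ["free", "prize"]))  # spam
```

Working with log probabilities avoids numerical underflow when many small conditional probabilities are multiplied together.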
Types of Naive Bayes:
Gaussian Naive Bayes
1. Assumption: Assumes that the continuous features follow a Gaussian distribution.
2. Use-case: Suitable for classification problems where feature vectors are made up of
continuous attributes.
3. Example: Predicting whether a person is healthy or not based on features like height, weight,
and age.
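Under the Gaussian assumption, the likelihood of a continuous feature value given a class is usually written as below. The symbols are assumptions of this note: μ and σ² denote the mean and variance of feature i estimated from the training examples of class y.

```latex
P(x_i \mid y) = \frac{1}{\sqrt{2\pi\sigma_{y,i}^{2}}}
  \exp\!\left(-\frac{(x_i - \mu_{y,i})^{2}}{2\sigma_{y,i}^{2}}\right)
```

For instance, the model would estimate a separate mean and variance of height for the "healthy" and "not healthy" classes and plug a new person's height into each.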
2. Data Preprocessing:
- Example: Normalize the features like income and credit score to bring them to a similar
scale.
3. Splitting Data:
- Example: Divide the dataset into 75% training data and 25% test data.
4. Choosing Kernel:
- Example: Decide between linear and non-linear kernels based on the nature of the data. For
credit risk, a linear kernel might suffice.
5. Hyperplane Calculation:
- Example: Find the optimal hyperplane that separates the classes in the training data.
6. Model Training:
- Example: Use the training data to find the support vectors that define the hyperplane.
7. Model Testing:
- Example: Use the hyperplane to classify the test data into high or low credit risk categories.
8. Evaluation:
- Example: Use metrics like accuracy and precision to evaluate the model's performance.
9. Deployment:
- Example: Once the model is evaluated and fine-tuned, integrate it into a financial system for
real-time credit risk assessment.
SVM Model Implementation Diagram
Key Concepts:
- Hyperplane: A hyperplane is a decision boundary that separates data points of different
classes. In a 2D space, it's a line; in a 3D space, it's a plane, and so on.
- Support Vectors: These are the data points that are closest to the hyperplane and are used to
define the optimal hyperplane.
Types of SVM:
1. Linear SVM: Used when the data is linearly separable. The hyperplane is a straight line (or
plane in higher dimensions).
2. Non-linear SVM: Used when the data is not linearly separable. This involves mapping the
input into a higher-dimensional space, typically through a kernel function, and finding a
separating hyperplane there.
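For the linear case, classifying a point comes down to checking which side of the hyperplane it falls on. The sketch below assumes the weights `w` and bias `b` have already been learned; the values used here are invented, as is the choice of which side counts as "high risk".

```python
def decision_value(w, b, x):
    """Signed score w . x + b; its sign says which side of the hyperplane x is on."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def classify(w, b, x):
    """Map the sign of the decision value to a credit-risk label."""
    return "high risk" if decision_value(w, b, x) >= 0 else "low risk"

# Hypothetical weights and bias, standing in for a trained linear SVM.
# Features are assumed normalised: (income, credit score).
w = [-1.0, -1.0]
b = 1.0

print(classify(w, b, [0.2, 0.3]))  # high risk (low income and credit score)
print(classify(w, b, [0.9, 0.8]))  # low risk
```

Training consists of choosing `w` and `b` so that the margin to the nearest points, the support vectors, is as large as possible; that optimisation is not shown here.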
--------------------------------------------------------------------------------
Wrapper Method for Feature Selection
1. Initial State:
- Start with either an empty set or a full set of features, depending on the specific wrapper
technique (e.g., Forward Selection, Backward Elimination).
2. Iteration:
- Add or remove features based on the specific wrapper technique being used.
3. Evaluation:
- Fit the model using the selected features and evaluate its performance using a chosen
metric like accuracy, F1-score, or cross-validation score.
4. Selection:
- Choose the feature set that gives the best performance according to the evaluation metric.
5. Termination:
- The process ends when adding or removing features does not result in a significant
improvement or when a pre-defined stopping criterion is met.
Example:
- Step 1: Start with an empty set of features.
- Step 2: Add 'Income' and evaluate the model's accuracy.
- Step 3: Add 'Age' and re-evaluate. If the accuracy improves, keep 'Age' in the feature set.
- Step 4: Continue this process, adding and removing features to maximise accuracy.
- Step 5: Stop when no further improvement is observed.
Forward Selection
1. Initial State: Start with no features in the model.
2. Addition: In each iteration, add the feature that provides the most significant improvement in
model performance.
3. Evaluation: After adding a new feature, evaluate the model using a chosen metric (e.g.,
R-squared, AIC).
4. Termination: Continue this process until adding new features does not improve the model
significantly.
Example:
- Step 1: Start with an empty model.
- Step 2: Add 'Income' as it improves the model's R-squared the most.
- Step 3: Add 'Age' as the next feature that provides the most significant improvement.
- Step 4: Stop if adding more features doesn't improve R-squared significantly.
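The forward procedure (adding one feature at a time) can be sketched as a greedy loop. Everything model-specific is a stand-in here: `toy_r_squared` replaces "fit a model and return R-squared" with a made-up lookup table, and the `min_gain` threshold is an arbitrary choice for what counts as a significant improvement.

```python
def forward_selection(features, score, min_gain=0.01):
    """Greedily add the feature that raises score() the most."""
    selected = []
    best_score = score(selected)
    remaining = list(features)
    while remaining:
        # Pick the feature whose addition gives the highest score.
        candidate = max(remaining, key=lambda f: score(selected + [f]))
        gain = score(selected + [candidate]) - best_score
        if gain < min_gain:
            break  # no remaining feature improves the model enough
        selected.append(candidate)
        remaining.remove(candidate)
        best_score += gain
    return selected, best_score

# Hypothetical stand-in for "fit a model and return R-squared".
CONTRIBUTION = {"Income": 0.50, "Age": 0.20, "Gender": 0.001}

def toy_r_squared(subset):
    return sum(CONTRIBUTION[f] for f in subset)

selected, r2 = forward_selection(["Age", "Gender", "Income"], toy_r_squared)
print(selected)  # ['Income', 'Age']
```

In a real setting the score would normally come from cross-validation rather than from the training data, so that added features are not rewarded merely for fitting noise.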
Backward Elimination
1. Initial State: Start with all features in the model.
2. Removal: In each iteration, remove the feature that is least significant in improving model
performance.
3. Evaluation: After removing a feature, evaluate the model using a chosen metric (e.g.,
R-squared, AIC).
4. Termination: Continue this process until removing more features deteriorates the model
significantly.
Example:
- Step 1: Start with a model including 'Income', 'Age', and 'Gender'.
- Step 2: Remove 'Gender' as it is the least significant.
- Step 3: Evaluate the model; if performance remains stable or improves, keep the reduced
feature set as a candidate for the final model.
- Step 4: Stop if removing more features worsens the model significantly.
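Backward elimination can be sketched as the mirror image of the greedy forward loop. As before, the scoring function is a made-up lookup table standing in for a fitted model's R-squared, and `max_loss` is an arbitrary threshold for a significant deterioration.

```python
def backward_elimination(features, score, max_loss=0.01):
    """Repeatedly drop the feature whose removal hurts score() the least."""
    selected = list(features)
    best_score = score(selected)
    while len(selected) > 1:
        # For each feature, score the model with that feature left out.
        without = {f: score([g for g in selected if g != f]) for f in selected}
        weakest = max(without, key=without.get)
        if best_score - without[weakest] > max_loss:
            break  # every possible removal now costs too much
        selected.remove(weakest)
        best_score = without[weakest]
    return selected

# Hypothetical stand-in for "fit a model and return R-squared".
CONTRIBUTION = {"Income": 0.50, "Age": 0.20, "Gender": 0.001}

def toy_r_squared(subset):
    return sum(CONTRIBUTION[f] for f in subset)

print(backward_elimination(["Income", "Age", "Gender"], toy_r_squared))
# ['Income', 'Age']
```

Here 'Gender' is dropped because removing it barely changes the score, while removing 'Age' or 'Income' would cost far more than the threshold allows.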
2. Iteration:
- Forward: Add the best feature.
- Backward: Remove the worst feature.
- Bidirectional: Either add or remove based on evaluation.
3. Evaluation:
- Use a metric like R-squared, AIC, or cross-validation score to evaluate the model after each
addition or removal.
4. Termination:
- Stop when adding or removing features does not result in a significant improvement or when
a pre-defined number of features is reached.
Example:
- Step 1: Start with an empty model.
- Step 2: Add 'Income' as it improves the model's R-squared the most.
- Step 3: Evaluate and consider removing 'Age' if it doesn't contribute significantly.
- Step 4: Stop if a pre-defined number of features is reached or if changes don't improve the
model significantly.
--------------------------------------------------------------------------------
Exhaustive Feature Selection
1. Initial State:
- Start with the full set of candidate features; no subset has been evaluated yet.
2. Iteration:
- Generate all possible combinations of features, from a single feature to all features.
3. Evaluation:
- For each combination, fit the model and evaluate it using a chosen metric like R-squared,
AIC, or cross-validation score.
4. Selection:
- Choose the feature subset that gives the best performance according to the evaluation
metric.
5. Termination:
- The process ends once all combinations have been evaluated.
Example:
- Step 1: Start with the full set of candidate features, e.g. 'Income', 'Age', and 'Gender'.
- Step 2: Generate combinations: ('Income'), ('Age'), ('Gender'), ('Income', 'Age'), ('Income',
'Gender'), ('Age', 'Gender'), ('Income', 'Age', 'Gender').
- Step 3: Evaluate each combination using R-squared.
- Step 4: Choose the combination, say ('Income', 'Age'), that gives the highest R-squared.
- Step 5: The process ends as all combinations have been evaluated.
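The enumeration in the example can be sketched with `itertools.combinations`. The scoring function is hypothetical: it is a made-up lookup with a small per-feature penalty, standing in for a metric such as a cross-validation score.

```python
from itertools import combinations

def exhaustive_selection(features, score):
    """Score every non-empty subset and return the best one and the count."""
    subsets = [combo
               for size in range(1, len(features) + 1)
               for combo in combinations(features, size)]
    best = max(subsets, key=score)
    return best, len(subsets)

# Hypothetical scorer: invented contributions minus a small penalty per feature.
CONTRIBUTION = {"Income": 0.50, "Age": 0.20, "Gender": 0.001}

def toy_score(subset):
    return sum(CONTRIBUTION[f] for f in subset) - 0.01 * len(subset)

best, n_evaluated = exhaustive_selection(["Income", "Age", "Gender"], toy_score)
print(best, n_evaluated)  # ('Income', 'Age') 7
```

With n features there are 2^n - 1 non-empty subsets (7 for n = 3), so the number of models to fit doubles with every added feature and the method is only practical for small feature sets.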
--------------------------------------------------------------------------------
Regression:
Key Terminologies
1. Dependent Variable (Y): The variable you are trying to predict.
2. Independent Variables (X): The variables used to predict the dependent variable.
3. Coefficients: The weights assigned to the independent variables.
4. Intercept: The constant term in the regression equation.
5. R-squared: The proportion of the variability in the dependent variable that the model
explains.
6. Residuals: The difference between the observed and predicted values.
7. Overfitting: When the model performs well on the training data but poorly on new data.
8. Underfitting: When the model performs poorly on both the training and new data.
9. Regularisation: Techniques like Ridge and Lasso to prevent overfitting.
10. Multicollinearity: When independent variables are highly correlated, making it difficult to
isolate the effect of each variable.
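Several of these terms (coefficient, intercept, residuals, R-squared) can be seen together in a one-variable least-squares fit. The data points below are invented, and the function names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + coefficient * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    coefficient = sxy / sxx
    intercept = mean_y - coefficient * mean_x
    return intercept, coefficient

def r_squared(ys, predictions):
    """1 - (residual sum of squares / total sum of squares)."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, predictions))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

# Hypothetical data: X is the independent variable, Y the dependent one.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.0]

intercept, coefficient = fit_line(xs, ys)
predictions = [intercept + coefficient * x for x in xs]
residuals = [y - p for y, p in zip(ys, predictions)]  # observed minus predicted
print(round(intercept, 2), round(coefficient, 2))     # 0.09 1.97
print(round(r_squared(ys, predictions), 4))
```

An R-squared close to 1 means the line accounts for almost all of the variation in Y. This is measured on the data used for fitting, so on its own it says nothing about overfitting.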
3. Support Vector Regression (SVR): Uses support vectors to perform linear or nonlinear
regression.
Application: Effective in high-dimensional spaces and versatile enough to capture both linear
and nonlinear relationships (e.g., financial forecasting).
Limitation: Computationally intensive and sensitive to the choice of hyperparameters.
5. Random Forest Regression: An ensemble of decision trees for more robust predictions.
Application: An ensemble method that enhances the robustness and accuracy of decision trees,
useful in various fields like healthcare and finance.
Limitation: Loses some interpretability and can be computationally expensive for large datasets.
8. Logistic Regression: Despite the name, it's used for binary classification but is algorithmically
a regression model.
Application: Widely used for binary classification tasks like spam detection or customer churn
prediction.
Limitation: Assumes a linear relationship between the logit of the outcome and the predictors,
which may not always hold true.
9. Multiple Linear Regression: Linear regression with more than one independent variable.
- Application: Useful for predicting a continuous outcome based on multiple predictors (e.g.,
predicting sales based on advertising spend across multiple channels).
- Limitation: Assumes a linear relationship between the predictors and the outcome and may
suffer from multicollinearity if predictors are highly correlated.
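For the logistic regression entry in the list above, the "regression" part is a linear model for the logit, which the sigmoid function then turns into a probability. The coefficients in this sketch are invented for illustration, not fitted to any data.

```python
import math

def predict_probability(intercept, coefficients, x):
    """Logistic regression: sigmoid of the linear logit."""
    logit = intercept + sum(c * xi for c, xi in zip(coefficients, x))
    return 1.0 / (1.0 + math.exp(-logit))

def predict_class(intercept, coefficients, x, threshold=0.5):
    """Turn the probability into a binary label using a cut-off."""
    return int(predict_probability(intercept, coefficients, x) >= threshold)

# Hypothetical churn model with two normalised predictors.
intercept = -1.0
coefficients = [2.0, 1.0]

print(round(predict_probability(intercept, coefficients, [0.5, 0.0]), 2))  # 0.5
print(predict_class(intercept, coefficients, [1.0, 1.0]))                  # 1
```

The 0.5 threshold is a common default and also an assumption here; in applications such as churn prediction it is often tuned to trade precision against recall.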
Comparison Table: