
Contents:
- Classification
- Steps to Implement a Decision Tree Model
- Steps to Implement a Random Forest Model
- Overfitting vs Underfitting
- K-Nearest Neighbors (KNN)
- Naive Bayes algorithm
- Types of Naive Bayes
- Support Vector Machines (SVM)
- Wrapper Method for Feature Selection
- Forward Elimination
- Backward Elimination
- Sequential Feature Selection (SFS)
- Exhaustive Feature Selection
- Regression
- Key Terminologies
- Types of Regression Models

Classification:
Steps to Implement a Decision Tree Model:
1. Data Collection:
- Example: Gather a dataset of customer behaviour, including age, income, and purchase
history, along with labels indicating customer churn (yes or no).

2. Data Preprocessing:
- Example: Handle missing values and convert categorical variables into numerical ones.

3. Splitting Data:
- Example: Divide the dataset into 70% training data and 30% test data.

4. Feature Selection:
- Example: Use metrics like Gini impurity or information gain to select the most important
features, such as income and age.

5. Tree Construction:
- Example: Build the decision tree based on the selected features, creating decision nodes
and leaves.

6. Pruning:
- Example: Remove branches with little predictive power to reduce the complexity of the model.
7. Model Training:
- Example: Use the pruned tree to train the model on the training data.

8. Model Testing:
- Example: Use the decision tree to classify the test data into churn or not churn categories.

9. Evaluation:
- Example: Use metrics like accuracy and F1-score to evaluate the model's performance.

10. Deployment:
- Example: Once the model is evaluated and fine-tuned, integrate it into a customer
relationship management system for real-time churn prediction.

Key Concepts:
- Decision Nodes: These are the nodes where the decision-making process occurs based on
certain attributes.
- Pruning: This is the process of removing the less useful branches of the tree. It reduces the
complexity and prevents overfitting.

Decision Tree Model Implementation Diagram
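
A minimal sketch of steps 3 to 9 using scikit-learn. The churn data here is synthetic and the feature names (age, income, purchase history) are only placeholders for the example above:

```python
# Sketch of the decision tree workflow with scikit-learn; data is synthetic.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, f1_score

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))              # stand-in for age, income, purchase history
y = (X[:, 0] + X[:, 1] > 0).astype(int)    # stand-in churn labels (yes=1, no=0)

# Step 3: 70% training data, 30% test data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Steps 4-7: tree construction using Gini impurity; max_depth acts as a simple form of pruning
clf = DecisionTreeClassifier(criterion="gini", max_depth=4, random_state=0)
clf.fit(X_train, y_train)

# Steps 8-9: classify the test data and evaluate
pred = clf.predict(X_test)
print("accuracy:", accuracy_score(y_test, pred))
print("F1-score:", f1_score(y_test, pred))
```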


—----------------------------------------------------------------------------------
Steps to Implement a Random Forest Model:
1. Data Collection:
- Example: Gather a dataset of customer behaviour, including age, income, and past
purchase history, along with labels indicating customer response (positive or negative).

2. Data Preprocessing:
- Example: Handle missing values and convert categorical variables into numerical ones.

3. Splitting Data:
- Example: Divide the dataset into 80% training data and 20% test data.

4. Bootstrap Sampling:
- Example: Create multiple subsets of the original training data through random sampling with
replacement.

5. Decision Tree Construction:
- Example: Build individual decision trees for each bootstrap sample.

6. Random Feature Selection:
- Example: For each tree, randomly select a subset of features at each split.

7. Individual Tree Training:
- Example: Train each tree using its respective bootstrap sample and feature subset.

8. Model Aggregation:
- Example: Combine the predictions from all the individual trees through majority voting.

9. Model Testing:
- Example: Use the aggregated model to classify the test data into positive or negative
customer response categories.

10. Evaluation:
- Example: Use metrics like accuracy and F1-score to evaluate the model's performance.

11. Deployment:
- Example: Once the model is evaluated and fine-tuned, integrate it into a marketing system
for real-time customer response prediction.

Random Forest Model Implementation Diagram


Key Concepts:
- Bootstrap Sampling: This involves creating multiple subsets of the original dataset through
random sampling.
- Random Feature Selection: This involves selecting a random subset of features for each
decision tree in the forest.
- Model Aggregation: This involves combining the predictions from all individual trees to make a
final prediction.
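
A short sketch of the same workflow with scikit-learn's RandomForestClassifier; the customer-response data is synthetic and the parameter choices (200 trees, "sqrt" features per split) are illustrative assumptions:

```python
# Sketch of the random forest workflow with scikit-learn; data is synthetic.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score

rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 4))                   # stand-in for age, income, purchase history, ...
y = (X[:, 0] - 0.5 * X[:, 2] > 0).astype(int)    # stand-in response labels (positive=1)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Each of the n_estimators trees is trained on a bootstrap sample (bootstrap=True),
# and max_features limits the random feature subset considered at each split.
forest = RandomForestClassifier(n_estimators=200, max_features="sqrt",
                                bootstrap=True, random_state=0)
forest.fit(X_train, y_train)

pred = forest.predict(X_test)                    # majority vote over all trees
print("accuracy:", accuracy_score(y_test, pred))
print("F1-score:", f1_score(y_test, pred))
```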

Overfitting in Random Forest


1. Complex Trees: If the individual decision trees are too complex, they may capture noise
rather than the underlying pattern.
2. Limited Randomness: Insufficient randomness in feature selection can lead to trees that are
highly correlated and over-specialised.
3. High Depth: Allowing trees to grow too deep can result in capturing outliers or noise.
4. Low Regularisation: Lack of tree pruning or regularisation techniques can contribute to
overfitting.
Example:
- In a Random Forest model predicting customer churn, if trees are too deep and capture every
minute behaviour of a small group of customers, the model may perform poorly on new data.

Underfitting in Random Forest


1. Oversimplification: If the trees are too simple, they may fail to capture important patterns.
2. High Bias: Setting too many constraints on the model can lead to high bias and poor
generalisation.
3. Insufficient Trees: Too few trees can result in a model that doesn't capture the complexity of
the data.
4. Poor Feature Selection: If irrelevant features are given importance, the model may
underperform.
Example:
- In the same customer churn model, if the trees are too shallow and only use one or two
features like 'age' or 'income', they may not capture the complexity of customer behaviour,
leading to underfitting.
—----------------------------------------------------------------------------------
Overfitting vs Underfitting:

Explanation:
1. Bias/Variance:
- Overfitting: Low bias means the model fits the training data very well but has high variance,
making it sensitive to fluctuations in the data.
- Underfitting: High bias means the model is too simplistic to capture the underlying patterns,
resulting in low variance but poor performance.

2. Model Complexity:
- Overfitting: A complex model like a deep neural network can easily overfit if not properly
regularised.
- Underfitting: A simple model like a shallow neural network may not have the capacity to learn
complex relationships in the data.

3. Data Sensitivity:
- Overfitting: The model memorises the noise in the data rather than learning the actual
pattern.
- Underfitting: The model fails to capture essential patterns, effectively missing the point of the
data.

4. Examples:
- Overfitting: Using a polynomial regression model with a very high degree can lead to
overfitting.
- Underfitting: Applying linear regression to a dataset that has a non-linear relationship will
result in underfitting.
Overfitting:
1. Training Data: In the case of overfitting, the model performs exceptionally well on the training
data. It learns the training data to such an extent that it even captures the noise or random
fluctuations.

2. Test Data: When exposed to new, unseen data (test data), an overfitted model performs
poorly. This is because it has learned the training data too well, including its noise, which
doesn't generalise to new data.

Underfitting:
1. Training Data: An underfitted model performs poorly even on the training data. This is
because the model is too simplistic to capture the underlying patterns in the data.

2. Test Data: Similar to its performance on the training data, an underfitted model also performs
poorly on the test data. It lacks the complexity needed to understand the data's structure,
whether it's training or test data.

Summary:
- Overfitting: High accuracy on training data but low accuracy on test data.
- Underfitting: Low accuracy on both training and test data.

Multiple Perspectives:
1. Statistical Perspective: Overfitting and underfitting can be understood as errors due to high
variance and high bias, respectively. High variance leads to overfitting, and high bias leads to
underfitting.
2. Computational Perspective: Overfitting can often be mitigated by techniques like
regularization, while underfitting may require more complex architectures or feature engineering.

3. Practical Perspective: In real-world applications, you often aim for a balance between bias
and variance, known as the Bias-Variance tradeoff, to ensure good performance on both training
and test data.
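
The polynomial example above can be made concrete with a small sketch: fitting the same noisy, non-linear synthetic data with polynomial degrees 1, 4 and 15 typically shows underfitting, a reasonable fit, and overfitting respectively (exact scores depend on the random data):

```python
# Illustrative sketch: low-degree fits underfit and high-degree fits overfit
# a noisy non-linear relationship; the data is synthetic.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=200)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

for degree in (1, 4, 15):                       # underfit, reasonable, overfit
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    print(degree,
          "train R^2:", round(r2_score(y_train, model.predict(X_train)), 3),
          "test R^2:", round(r2_score(y_test, model.predict(X_test)), 3))
```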

—----------------------------------------------------------------------------------
K-Nearest Neighbors (KNN):

Steps to Implement a KNN Model:


1. Data Collection:
- Example: Collect a dataset of patient records, including age, BMI, blood sugar levels, and
whether they have diabetes or not.

2. Data Preprocessing:
- Example: Handle missing values by imputation and normalize features like age and BMI to
bring them to a similar scale.

3. Splitting Data:
- Example: Divide the dataset into 80% training data and 20% test data.

4. Choosing 'k' Value:


- Example: Use techniques like cross-validation to find an optimal 'k' value, say k=3 for our
diabetes prediction model.

5. Distance Calculation:
- Example: Use Euclidean distance to find the distance between patients based on their age,
BMI, and blood sugar levels.

6. Model Training:
- Example: In KNN, the training phase is implicit. The model uses the training data during the
testing phase.

7. Model Testing:
- Example: For each patient in the test set, find the 3 nearest patients from the training set
and classify the test patient based on the majority class among these nearest neighbors.

8. Evaluation:
- Example: Use metrics like accuracy and F1-score to evaluate how well the model predicts
diabetes risk.

9. Deployment:
- Example: Once the model is evaluated and fine-tuned, it can be deployed in a healthcare
system to assist doctors in early diagnosis of diabetes.

Euclidean Distance Calculation:


The Euclidean distance is used to find the 'k' nearest neighbors to a given data point. In a 2D
plane, the distance between two points \( A(x_1, y_1) \) and \( B(x_2, y_2) \) is calculated as:

\[ d(A, B) = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2} \]

In our example, this could be the distance between two patients based on their normalized age
and BMI values.
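
A compact sketch of the KNN workflow with k=3 using scikit-learn; the patient data is synthetic and the feature names are placeholders:

```python
# Sketch of the KNN workflow (k=3) with scikit-learn; the patient data is synthetic.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, f1_score

rng = np.random.default_rng(2)
X = rng.normal(size=(400, 3))                    # stand-in for age, BMI, blood sugar
y = (X[:, 1] + X[:, 2] > 0).astype(int)          # stand-in diabetes labels

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Scaling matters for KNN because it relies on Euclidean distances between features.
knn = make_pipeline(StandardScaler(),
                    KNeighborsClassifier(n_neighbors=3, metric="euclidean"))
knn.fit(X_train, y_train)

pred = knn.predict(X_test)
print("accuracy:", accuracy_score(y_test, pred))
print("F1-score:", f1_score(y_test, pred))
```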

—----------------------------------------------------------------------------------
Naive Bayes algorithm
Steps to Implement a Naive Bayes Model:
1. Data Collection:
- Example: Gather a dataset of emails, each labelled as either "spam" or "not spam".

2. Data Preprocessing:
- Example: Convert the email text into a bag-of-words or term frequency-inverse document
frequency (TF-IDF) representation.

3. Splitting Data:
- Example: Divide the dataset into 70% training data and 30% test data.

4. Feature Selection:
- Example: Choose the most relevant words or phrases that commonly appear in spam
emails, such as "win", "free", "urgent", etc.

5. Probability Calculation:
- Example: Calculate the prior probabilities of spam and not-spam classes, as well as the
conditional probabilities of each word given a class.

6. Model Training:
- Example: Use the training data to calculate the probabilities needed for the Naive Bayes
formula.

7. Model Testing:
- Example: For each email in the test set, calculate the posterior probabilities of it being spam
or not spam and classify based on the higher probability.

8. Evaluation:
- Example: Use metrics like accuracy, precision, and recall to evaluate the model's
performance in classifying emails.

9. Deployment:
- Example: Once the model is evaluated and fine-tuned, integrate it into an email system to
automatically filter out spam emails.
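
A minimal sketch of such a spam filter using a bag-of-words representation and scikit-learn's MultinomialNB; the four example emails are made up for illustration:

```python
# Sketch of a Naive Bayes spam filter using a bag-of-words representation.
# The tiny email corpus below is invented for illustration.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

emails = ["win a free prize now", "urgent free offer win cash",
          "meeting agenda for tomorrow", "lunch at noon with the team"]
labels = ["spam", "spam", "not spam", "not spam"]

model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(emails, labels)

print(model.predict(["free cash prize urgent"]))   # expected: ['spam']
print(model.predict(["team meeting tomorrow"]))    # expected: ['not spam']
```
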
Types of Naive Bayes:
Gaussian Naive Bayes
1. Assumption: Assumes that the continuous features follow a Gaussian distribution.
2. Use-case: Suitable for classification problems where feature vectors are made up of
continuous attributes.
3. Example: Predicting whether a person is healthy or not based on features like height, weight,
and age.

Multinomial Naive Bayes


1. Assumption: Assumes that the features follow a Multinomial distribution.
2. Use-case: Often used in text classification problems where the features can be interpreted as
word frequencies or counts.
3. Example: Sentiment analysis of customer reviews, where features could be the frequency of
positive or negative words.

Bernoulli Naive Bayes


1. Assumption: Assumes that all features are binary, taking only two values (0 and 1).
2. Use-case: Suitable for binary feature vectors, often used in text classification problems where
a feature's presence or absence matters.
3. Example: Email spam filtering, where the presence or absence of certain keywords is a
feature.
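
A brief sketch contrasting Gaussian and Bernoulli Naive Bayes on the kinds of features each assumes; both datasets are synthetic:

```python
# Sketch: GaussianNB for continuous features, BernoulliNB for binary features.
import numpy as np
from sklearn.naive_bayes import GaussianNB, BernoulliNB

rng = np.random.default_rng(3)

# Continuous features (e.g. height, weight, age) -> Gaussian Naive Bayes
X_cont = rng.normal(size=(200, 3))
y_cont = (X_cont[:, 0] > 0).astype(int)
print(GaussianNB().fit(X_cont, y_cont).predict(X_cont[:2]))

# Binary features (e.g. keyword present / absent) -> Bernoulli Naive Bayes
X_bin = rng.integers(0, 2, size=(200, 5))
y_bin = (X_bin[:, 0] | X_bin[:, 1]).astype(int)
print(BernoulliNB().fit(X_bin, y_bin).predict(X_bin[:2]))
```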
—----------------------------------------------------------------------------------
Support Vector Machines (SVM)
Steps to Implement an SVM Model:
1. Data Collection:
- Example: Gather a dataset of financial records, including income, credit score, and loan
amount, along with labels indicating credit risk (high or low).

2. Data Preprocessing:
- Example: Normalize the features like income and credit score to bring them to a similar
scale.

3. Splitting Data:
- Example: Divide the dataset into 75% training data and 25% test data.

4. Choosing Kernel:
- Example: Decide between linear and non-linear kernels based on the nature of the data. For
credit risk, a linear kernel might suffice.

5. Hyperplane Calculation:
- Example: Find the optimal hyperplane that separates the classes in the training data.

6. Model Training:
- Example: Use the training data to find the support vectors that define the hyperplane.

7. Model Testing:
- Example: Use the hyperplane to classify the test data into high or low credit risk categories.

8. Evaluation:
- Example: Use metrics like accuracy and precision to evaluate the model's performance.

9. Deployment:
- Example: Once the model is evaluated and fine-tuned, integrate it into a financial system for
real-time credit risk assessment.
SVM Model Implementation Diagram
Key Concepts:
- Hyperplane: A hyperplane is a decision boundary that separates data points of different
classes. In a 2D space, it's a line; in a 3D space, it's a plane, and so on.

- Support Vectors: These are the data points that are closest to the hyperplane and are used to
define the optimal hyperplane.

Types of SVM:
1. Linear SVM: Used when the data is linearly separable. The hyperplane is a straight line (or
plane in higher dimensions).

2. Non-linear SVM: Used when the data is not linearly separable. This involves transforming the
input space into a higher dimension and finding a hyperplane there.
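
A minimal sketch of these steps with scikit-learn's SVC using a linear kernel; the credit-risk data is synthetic and the feature names are placeholders:

```python
# Sketch of the SVM workflow with a linear kernel; the credit-risk data is synthetic.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, precision_score

rng = np.random.default_rng(4)
X = rng.normal(size=(600, 3))                        # stand-in for income, credit score, loan amount
y = (X[:, 0] + X[:, 1] - X[:, 2] > 0).astype(int)    # stand-in risk labels (high=1, low=0)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Normalisation, then a linear-kernel SVM; swap kernel="rbf" for non-linearly separable data.
svm = make_pipeline(StandardScaler(), SVC(kernel="linear", C=1.0))
svm.fit(X_train, y_train)

pred = svm.predict(X_test)
print("accuracy:", accuracy_score(y_test, pred))
print("precision:", precision_score(y_test, pred))
```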

—----------------------------------------------------------------------------------------------------------------
Wrapper Method for Feature Selection
1. Initial State:
- Start with either an empty set or a full set of features, depending on the specific wrapper
technique (e.g., Forward Selection, Backward Elimination).

2. Iteration:
- Add or remove features based on the specific wrapper technique being used.

3. Evaluation:
- Fit the model using the selected features and evaluate its performance using a chosen
metric like accuracy, F1-score, or cross-validation score.

4. Selection:
- Choose the feature set that gives the best performance according to the evaluation metric.

5. Termination:
- The process ends when adding or removing features does not result in a significant
improvement or when a pre-defined stopping criterion is met.
Example:
- Step 1: Start with an empty set of features.
- Step 2: Add 'Income' and evaluate the model's accuracy.
- Step 3: Add 'Age' and re-evaluate. If the accuracy improves, keep 'Age' in the feature set.
- Step 4: Continue this process, adding and removing features to maximise accuracy.
- Step 5: Stop when no further improvement is observed.
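
A hand-rolled sketch of this greedy forward wrapper loop, using cross-validated accuracy of a logistic regression as the evaluation metric; the feature names and data are hypothetical:

```python
# Sketch of a greedy forward wrapper: each round adds the feature that most
# improves cross-validated accuracy, stopping when nothing improves the score.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(5)
names = ["Income", "Age", "Gender", "Tenure"]        # hypothetical feature names
X = rng.normal(size=(300, 4))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

selected, best_score = [], 0.0
remaining = list(range(X.shape[1]))
while remaining:
    scores = {j: cross_val_score(LogisticRegression(), X[:, selected + [j]], y, cv=5).mean()
              for j in remaining}
    j_best = max(scores, key=scores.get)
    if scores[j_best] <= best_score:                 # stop when no feature improves the score
        break
    selected.append(j_best)
    remaining.remove(j_best)
    best_score = scores[j_best]

print("selected features:", [names[j] for j in selected], "score:", round(best_score, 3))
```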

Forward Elimination
1. Initial State: Start with no features in the model.
2. Addition: In each iteration, add the feature that provides the most significant improvement in
model performance.
3. Evaluation: After adding a new feature, evaluate the model using a chosen metric (e.g.,
R-squared, AIC).
4. Termination: Continue this process until adding new features does not improve the model
significantly.

Example:
- Step 1: Start with an empty model.
- Step 2: Add 'Income' as it improves the model's R-squared the most.
- Step 3: Add 'Age' as the next feature that provides the most significant improvement.
- Step 4: Stop if adding more features doesn't improve R-squared significantly.

Backward Elimination
1. Initial State: Start with all features in the model.
2. Removal: In each iteration, remove the feature that is least significant in improving model
performance.
3. Evaluation: After removing a feature, evaluate the model using a chosen metric (e.g.,
R-squared, AIC).
4. Termination: Continue this process until removing more features deteriorates the model
significantly.

Example:
- Step 1: Start with a model including 'Income', 'Age', and 'Gender'.
- Step 2: Remove 'Gender' as it is the least significant.
- Step 3: Evaluate the model; if performance remains stable or improves, consider it for the final
model.
- Step 4: Stop if removing more features worsens the model significantly.

Sequential Feature Selection (SFS)


1. Initial State:
- Forward: Start with no features.
- Backward: Start with all features.
- Bidirectional: Start with a pre-defined set or none.

2. Iteration:
- Forward: Add the best feature.
- Backward: Remove the worst feature.
- Bidirectional: Either add or remove based on evaluation.

3. Evaluation:
- Use a metric like R-squared, AIC, or cross-validation score to evaluate the model after each
addition or removal.

4. Termination:
- Stop when adding or removing features does not result in a significant improvement or when
a pre-defined number of features is reached.

Example:
- Step 1: Start with an empty model.
- Step 2: Add 'Income' as it improves the model's R-squared the most.
- Step 3: Evaluate and consider removing 'Age' if it doesn't contribute significantly.
- Step 4: Stop if a pre-defined number of features is reached or if changes don't improve the
model significantly.
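
A sketch using scikit-learn's SequentialFeatureSelector, which implements the forward and backward procedures described in the last three sections; the data and the choice of two selected features are illustrative assumptions:

```python
# Sketch of sequential feature selection in both directions on synthetic data.
import numpy as np
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(6)
X = rng.normal(size=(300, 5))                        # stand-ins for Income, Age, Gender, ...
y = 2 * X[:, 0] + X[:, 1] + rng.normal(scale=0.1, size=300)

for direction in ("forward", "backward"):
    sfs = SequentialFeatureSelector(LinearRegression(), n_features_to_select=2,
                                    direction=direction, cv=5)
    sfs.fit(X, y)
    print(direction, "selected columns:", np.flatnonzero(sfs.get_support()))
```
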
—----------------------------------------------------------------------------------
Exhaustive Feature Selection
1. Initial State:
- Start with an empty set and a full set of all features.

2. Iteration:
- Generate all possible combinations of features, from a single feature to all features.

3. Evaluation:
- For each combination, fit the model and evaluate it using a chosen metric like R-squared,
AIC, or cross-validation score.

4. Selection:
- Choose the feature subset that gives the best performance according to the evaluation
metric.

5. Termination:
- The process ends once all combinations have been evaluated.

Example:
- Step 1: Start with an empty set and a full set of features like 'Income', 'Age', and 'Gender'.
- Step 2: Generate combinations: ('Income'), ('Age'), ('Gender'), ('Income', 'Age'), ('Income',
'Gender'), ('Age', 'Gender'), ('Income', 'Age', 'Gender').
- Step 3: Evaluate each combination using R-squared.
- Step 4: Choose the combination, say ('Income', 'Age'), that gives the highest R-squared.
- Step 5: The process ends as all combinations have been evaluated.
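
A sketch of exhaustive selection using itertools to enumerate every feature subset and cross-validated R-squared to score each one; the data and feature names are hypothetical:

```python
# Sketch of exhaustive feature selection: score every subset and keep the best.
import itertools
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(7)
names = ["Income", "Age", "Gender"]                  # hypothetical feature names
X = rng.normal(size=(200, 3))
y = 3 * X[:, 0] + X[:, 1] + rng.normal(scale=0.1, size=200)

best_subset, best_score = None, -np.inf
for r in range(1, len(names) + 1):
    for cols in itertools.combinations(range(len(names)), r):
        score = cross_val_score(LinearRegression(), X[:, list(cols)], y,
                                cv=5, scoring="r2").mean()
        if score > best_score:
            best_subset, best_score = cols, score

print("best subset:", [names[c] for c in best_subset], "R^2:", round(best_score, 3))
```
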
—----------------------------------------------------------------------------------

Regression:
Key Terminologies
1. Dependent Variable (Y): The variable you are trying to predict.
2. Independent Variables (X): The variables used to predict the dependent variable.
3. Coefficients: The weights assigned to the independent variables.
4. Intercept: The constant term in the regression equation.
5. R-squared: A statistical measure of how well the model explains the variability in the
dependent variable.
6. Residuals: The difference between the observed and predicted values.
7. Overfitting: When the model performs well on the training data but poorly on new data.
8. Underfitting: When the model performs poorly on both the training and new data.
9. Regularisation: Techniques like Ridge and Lasso to prevent overfitting.
10. Multicollinearity: When independent variables are highly correlated, making it difficult to
isolate the effect of each variable.
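
A small sketch tying several of these terms (coefficients, intercept, R-squared, residuals) to code, using synthetic data and scikit-learn's LinearRegression:

```python
# Sketch: coefficients, intercept, R-squared, and residuals of a fitted linear regression.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

rng = np.random.default_rng(8)
X = rng.normal(size=(100, 2))                                                # independent variables
y = 4.0 + 2.5 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.2, size=100)   # dependent variable

model = LinearRegression().fit(X, y)
print("coefficients:", model.coef_)                  # weights on the independent variables
print("intercept:", model.intercept_)                # constant term
print("R-squared:", r2_score(y, model.predict(X)))   # explained variability
residuals = y - model.predict(X)                     # observed minus predicted values
print("mean residual:", residuals.mean())
```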

Types of Regression Models


1. Linear Regression: Predicts a continuous outcome based on a linear relationship.
Application: Ideal for predicting a continuous outcome when the relationship between variables
is linear (e.g., salary based on years of experience).
Limitation: Limited to capturing linear relationships, making it less versatile for complex data
patterns.
2. Polynomial Regression: Extends linear regression to model nonlinear relationships.
Application: Suitable for modelling nonlinear relationships, such as the growth rate of a
population.
Limitation: Risk of overfitting increases with the degree of the polynomial, requiring careful
selection.

3. Support Vector Regression (SVR): Uses support vectors to perform linear or nonlinear
regression.
Application: Effective in high-dimensional spaces and versatile enough to capture both linear
and nonlinear relationships (e.g., financial forecasting).
Limitation: Computationally intensive and sensitive to the choice of hyperparameters.

4. Decision Tree Regression: Utilises a tree-like model for making predictions.


Application: Excellent for capturing complex, nonlinear relationships and offers high
interpretability (e.g., predicting energy consumption).
Limitation: Susceptible to overfitting, especially with deep trees, and sensitive to noisy data.

5. Random Forest Regression: An ensemble of decision trees for more robust predictions.
Application: An ensemble method that enhances the robustness and accuracy of decision trees,
useful in various fields like healthcare and finance.
Limitation: Loses some interpretability and can be computationally expensive for large datasets.

6. Ridge Regression: Linear regression with L2 regularisation.


Application: Particularly useful when predictors are highly correlated, as it regularises the
coefficients (e.g., in bioinformatics).
Limitation: Introduces bias into the estimates, as it shrinks coefficients towards zero.
7. Lasso Regression: Linear regression with L1 regularisation.
Application: Effective for feature selection, as it can reduce some coefficients to zero, making it
useful in high-dimensional datasets (e.g., text classification).
Limitation: May discard important variables if they are highly correlated with others.

8. Logistic Regression: Despite the name, it is used for binary classification; it fits a linear
regression-style model to the log-odds (logit) of the outcome.
Application: Widely used for binary classification tasks like spam detection or customer churn
prediction.
Limitation: Assumes a linear relationship between the logit of the outcome and the predictors,
which may not always hold true.
9. Multiple Linear Regression: Linear regression with more than one independent variable.
- Application: Useful for predicting a continuous outcome based on multiple predictors (e.g.,
predicting sales based on advertising spend across multiple channels).
- Limitation: Assumes a linear relationship among all variables and may suffer from
multicollinearity if predictors are highly correlated.
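
A short sketch comparing plain linear regression with Ridge (L2) and Lasso (L1) regularisation on the same synthetic data; note how Lasso can drive uninformative coefficients to exactly zero (the alpha values are arbitrary choices):

```python
# Sketch: plain, Ridge (L2) and Lasso (L1) regression fitted to the same data.
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge, Lasso

rng = np.random.default_rng(9)
X = rng.normal(size=(200, 5))
y = 3 * X[:, 0] + 2 * X[:, 1] + rng.normal(scale=0.5, size=200)  # only 2 informative features

for name, model in [("linear", LinearRegression()),
                    ("ridge", Ridge(alpha=1.0)),
                    ("lasso", Lasso(alpha=0.5))]:
    model.fit(X, y)
    print(name, np.round(model.coef_, 2))            # Lasso zeros out the noise features
```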

Comparison Table:
