
www.justintodata.com/machine-learning-algorithm-types-for-beginners-overview/

Machine Learning for Beginners: Overview of Algorithm Types

Start learning Machine Learning from here

22/5/2020

In this beginners’ tutorial, we’ll explain the machine learning algorithm types and some popular algorithms.

Machine learning is a critical skill for data science. According to our analysis, 64% of the Indeed job postings
for data scientists require machine learning skills.

Following this guide, you can break into machine learning by understanding:

What is machine learning, in simple words.
What are supervised learning, unsupervised learning, and reinforcement learning, the three main
types of machine learning algorithms.
10 commonly used machine learning algorithms, with some step-by-step example projects.

In the end, you’ll gain an overview of machine learning (ML) and know when to use these algorithms when
practicing ML.

Let’s get started!

Machine Learning for Beginners: What is machine learning?


In simple words, machine learning is when computers are able to learn and perform certain tasks without being
explicitly programmed to do so.

Like humans learning from experience, computers can learn from the “training” dataset provided to them.

The computer applies machine learning algorithms to create mathematical models. After this process,
machines can make predictions or decisions on a new dataset.

What are the relationships: machine learning vs. data science vs. AI vs. deep learning?

This is a question that often confuses beginners. Here’s a quick explanation.

Machine learning is a subset of data science, which also contains other data-related processes. It is also a
fundamental concept within Artificial Intelligence (AI). Deep learning, in turn, is a subset of machine
learning based on neural networks with “deep” or multiple hidden layers.

What are the different Types of Machine Learning Algorithms?


Based on the problems they solve, we can divide machine learning algorithms into three main types:

Supervised Learning – learn based on existing labels/target to make better predictions.
Unsupervised Learning – learn without labels/target to identify insights/clusters.
Reinforcement Learning – learn based on trial and error to maximize rewards.

Each of these three machine learning algorithm types also has a breakdown of sub-categories. Here is a chart
showing the ML types.

Let’s look at them one by one!

Supervised Learning

Within supervised learning problems, the machines are provided with a labeled training dataset, which contains
both input variables (X) and an output variable (y).

The objective of the problem is to find a suitable mapping function f from X to y. Then when new inputs come
in, we can apply f to predict the corresponding output.

y = f(X) or Output = f(Input)

The machines learn and improve the function f through iterative optimization. This process involves minimizing
the difference between the estimated and the actual output.

The process is called “supervised” learning since it is “supervised” by the human-labeled output variable.
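To make the idea of iterative optimization concrete, here is a minimal sketch in plain Python. It assumes f is a straight line with two parameters (w and b), uses made-up training data, and uses gradient descent with an assumed learning rate; real projects would normally rely on a library rather than a hand-written loop.

```python
# Toy labeled training data (made up): y follows 2 * x + 1.
X = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.0, 3.0, 5.0, 7.0, 9.0]

def f(x, w, b):
    """Candidate mapping function: a straight line (an assumption of this sketch)."""
    return w * x + b

def mse(w, b):
    """Mean squared difference between the estimated and the actual output."""
    return sum((f(xi, w, b) - yi) ** 2 for xi, yi in zip(X, y)) / len(X)

w, b = 0.0, 0.0            # start from an arbitrary guess
learning_rate = 0.05       # assumed step size
initial_error = mse(w, b)

for _ in range(2000):
    # Gradients of the mean squared error with respect to w and b.
    grad_w = sum(2 * (f(xi, w, b) - yi) * xi for xi, yi in zip(X, y)) / len(X)
    grad_b = sum(2 * (f(xi, w, b) - yi) for xi, yi in zip(X, y)) / len(X)
    # Move the parameters a small step in the direction that lowers the error.
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

final_error = mse(w, b)
prediction = f(5.0, w, b)  # apply the learned f to a new input
```

After the loop, w and b should be close to 2 and 1, so the prediction for the new input 5.0 should be close to 11.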

We can divide supervised learning into regression and classification based on whether the output
variable/target is numerical or categorical.

Categorical data are divided into categories, such as gender (male/female) or competition levels
(low/medium/high), while numerical data are represented by numbers, such as body weight or the number of
dogs.

Classification

When the target is a categorical variable, we use classification.

Based on the training observations with known labeled categories, classification is the problem of predicting
the category a new observation belongs to. It is about learning the patterns among observations based on
experience.

Below are some examples of classification applications:

Classify customers of banks into different credit profiles based on existing customers’ status.
– Input variables include income, age, gender, location, expenses, etc.
– Output variable: whether a customer defaults on the payment to banks.
Identify fraud among bank transactions based on fraudulent historical tags.
– Input variables include transaction amount, date/time, customer history, merchants, etc.
– Output variable: whether a transaction is a fraud or not.
Apply sentiment analysis on Yelp review data based on historical sentiment labels.
– Input variables include the text of reviews.
– Output variable: whether the review is positive, negative, or neutral.

Related article: How to do Sentiment Analysis with Deep Learning (LSTM Keras)
A tutorial showing an example of sentiment analysis: learn how to build a deep learning model to classify the
Yelp review data in Python step-by-step.

Regression

When the output variable is numerical, we have a regression problem.

It is about estimating the relationship between the target (dependent variable) and at least one input
variable (independent variable/feature). It is often used to predict or forecast based on experience.

Below are some examples of regression problems:

Predict housing prices based on historical sales.
– Input variables may include the size and age of the property, number of bathrooms, property tax, etc.
– Output variable: the price of the house.
Time series forecasting.
– Input variables are the values of the time series (the historical data).
– Output variable: the value of the time series at a later time.
Predict customer spending based on a sample of existing customers.
– Input variables may include customer demographics, income, etc.
– Output variable: how much a customer spends.

Unsupervised Learning

When we only have input variables (X) but no corresponding output variable, the training dataset is unlabeled.
The machine can still extract structures within the dataset, such as grouping or clustering of data points.

This becomes an unsupervised learning problem. You can also think of it as “data mining” without any existing
knowledge/labels.

Association Rule Learning

Association rules are used to analyze transactions to find interesting relationships among different variables. It
is mostly used in market basket analysis.

What other products are likely to be purchased when a customer buys both a laptop and a mouse?

From the algorithms, we may find a rule that there’s a 90% probability the customer will also buy a laptop
cover.

We can then design marketing strategies to make it convenient for the customers to shop. If it’s a physical
store, we can place these items closer together. If it’s an online store, we could recommend a laptop cover to
a customer who has both a laptop and a mouse in the shopping cart.
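As a rough illustration of where such a probability comes from, the sketch below computes the support and confidence of the rule {laptop, mouse} → {laptop cover} on a handful of made-up transactions. The function names are hypothetical, and real association rule algorithms search over many candidate rules rather than scoring a single one.

```python
# Made-up shopping baskets; each transaction is a set of purchased items.
transactions = [
    {"laptop", "mouse", "laptop cover"},
    {"laptop", "mouse", "laptop cover", "keyboard"},
    {"laptop", "mouse"},
    {"laptop", "keyboard"},
    {"mouse", "laptop cover"},
    {"laptop", "mouse", "laptop cover"},
]

def support(itemset):
    """Share of all transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(antecedent, consequent):
    """Among transactions containing `antecedent`, the share that also contain `consequent`."""
    return support(antecedent | consequent) / support(antecedent)

rule_confidence = confidence({"laptop", "mouse"}, {"laptop cover"})
print(rule_confidence)
```

In this toy data, four baskets contain both a laptop and a mouse, and three of those also contain a laptop cover, so the confidence of the rule is 0.75.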

Clustering

Clustering is the unsupervised counterpart of classification. When the training dataset doesn’t
have labeled categories, we use clustering. The algorithm groups observations into categories based
on some measure of inherent similarity or distance.

In other words, clustering is about separating a set of data objects into clusters. The observations in the same
group are more like each other than the observations in different groups.

It is used to recognize patterns among clusters in different fields such as biology, marketing, etc.

For example, we can segment customers into groups such that customers in the same group behave similarly.
Then we can target specific groups with customized marketing campaigns to generate more sales.

Dimensionality Reduction

This unsupervised problem is about reducing the number of input variables while retaining most information in
the dataset.

3 dimensions/variables to 2 dimensions

There are several advantages of reducing the features:

less processing time/storage space is needed.
the potential of removing multi-collinearity.
the data is easier to visualize.
the curse of dimensionality can be avoided.

For example, say we have a dataset with 100 features/input variables.

If we can reduce 100 to 5 while retaining 90% of the original dataset’s valuable information, it’ll be easier to
perform other tasks. We could, for instance, use the reduced dataset for clustering without losing much accuracy.

Reinforcement Learning

When, instead of a labeled dataset, we have an environment with a set of rules, and we want the machine/agent
to maximize the rewards within this environment, this is reinforcement learning.

We want the machines to learn by trial and error, which is common in games.

In this case, the machines are not bound by any experience, and they learn based on each trial’s feedback.
This gives the machines the potential to become better than humans. That’s how AlphaGo beat our
best human players!
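A very small example of learning by trial and error is the multi-armed bandit problem. The sketch below is an illustration only, not how AlphaGo works: it assumes three slot machines with made-up win rates and uses an epsilon-greedy strategy, where the agent mostly picks the machine that has paid best so far and occasionally tries a random one.

```python
import random

def run_bandit(true_win_rates, steps=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy agent; the agent never sees `true_win_rates` directly."""
    rng = random.Random(seed)
    n_arms = len(true_win_rates)
    pulls = [0] * n_arms          # how often each arm was tried
    estimates = [0.0] * n_arms    # running average reward per arm
    total_reward = 0
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)                            # explore
        else:
            arm = max(range(n_arms), key=lambda a: estimates[a])   # exploit
        # Feedback from the environment: reward 1 for a win, 0 otherwise.
        reward = 1 if rng.random() < true_win_rates[arm] else 0
        pulls[arm] += 1
        # Update the running average with the new trial's feedback.
        estimates[arm] += (reward - estimates[arm]) / pulls[arm]
        total_reward += reward
    return estimates, pulls, total_reward

estimates, pulls, total_reward = run_bandit([0.2, 0.5, 0.8])
best_arm = max(range(len(estimates)), key=lambda a: estimates[a])
```

The agent starts with no knowledge of the machines and ends up favoring the one with the highest win rate purely from the rewards it observed.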

With feedback/labels from the environment, reinforcement learning is somewhat similar to supervised learning.

Although unsupervised learning and reinforcement learning sound more advanced, supervised learning is
more common and should be the main focus of your study.

After learning the types of machine learning problems, let’s look at 10 commonly used algorithms.

10 Popular Machine Learning Algorithms


In this section, we’ll introduce some of the most practical machine learning algorithms. It is hard to classify
them according to the above categories since one algorithm could be used for multiple machine learning types.

Linear Regression

Linear Regression is one of the fundamental algorithms that every data analyst or scientist should know. It is
often the first predictive model that machine learning beginners learn.

It is called “linear” regression since the relationship between the output (y) and the input variables
(X) is assumed to be linear. In other words, the mapping function f is a linear predictor function.

This algorithm is widely used in both industry and academia since it’s simple and interpretable, with
well-studied theory.

The goal is to fit the linear model while minimizing a cost function, which determines how well the linear
equation fits the training data. The most used cost function is Mean Squared Error (MSE).
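As a small worked example, the sketch below fits a line with a single input variable using the closed-form least-squares solution, which minimizes the MSE on the training data. The house sizes and prices are made up, and the function names are hypothetical.

```python
def fit_simple_linear_regression(xs, ys):
    """Least-squares slope and intercept for one input variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept

def mean_squared_error(xs, ys, slope, intercept):
    """Cost function: average squared gap between predictions and targets."""
    return sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

sizes = [50.0, 60.0, 80.0, 100.0, 120.0]       # made-up house sizes
prices = [150.0, 180.0, 240.0, 310.0, 350.0]   # made-up prices (thousands)

slope, intercept = fit_simple_linear_regression(sizes, prices)
fit_mse = mean_squared_error(sizes, prices, slope, intercept)
predicted_price = slope * 90.0 + intercept     # prediction for a new house
```

Any other slope or intercept gives a higher MSE on this training data, which is what "best fit" means here.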

Further Reading: Linear Regression in Machine Learning: Practical Python Tutorial

Logistic Regression

Logistic regression is an algorithm used for classification problems that gives the probability of a particular
class. The class could be binary such as fraud/legit or multiple classes such as win/tie/loss.

Similar to linear regression, it is called “logistic” regression since a logistic function is used to model the
probability of the output class.

It is often the first predictive classification model that machine learning beginners learn. Logistic regression is
also widely used in both industry and academia due to its simplicity and interpretability.

Like linear regression, logistic regression also uses a cost function, which is usually cross entropy.
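The sketch below shows the two ingredients for a binary problem with one input variable: the logistic (sigmoid) function that turns a linear score into a probability, and the cross entropy cost. The transaction amounts, labels, and the two sets of parameters are made up; no training is performed here, the code only compares the cost of a sensible model with that of a poor one.

```python
import math

def sigmoid(z):
    """Logistic function: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(x, w, b):
    """Probability of the positive class for input x."""
    return sigmoid(w * x + b)

def cross_entropy(xs, labels, w, b):
    """Average binary cross entropy between predicted probabilities and labels."""
    total = 0.0
    for x, label in zip(xs, labels):
        p = predict_proba(x, w, b)
        total += -(label * math.log(p) + (1 - label) * math.log(1 - p))
    return total / len(xs)

amounts = [1.0, 2.0, 3.0, 6.0, 7.0, 8.0]   # made-up transaction amounts
is_fraud = [0, 0, 0, 1, 1, 1]              # made-up labels

# Parameters that rank large amounts as likely fraud versus the opposite.
good_loss = cross_entropy(amounts, is_fraud, w=1.0, b=-4.5)
bad_loss = cross_entropy(amounts, is_fraud, w=-1.0, b=4.5)
```

The model whose probabilities agree with the labels gets the lower cross entropy, which is why minimizing this cost leads to better classifiers.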

Further Reading: Logistic Regression for Machine Learning: complete Tutorial

Lasso and Ridge Regression

Next, let’s look at two variations of linear/logistic regression: Lasso and Ridge regression. They are linear or
logistic regression with added penalty terms/regularization in the cost function.

Regularization is used to reduce model complexity to deal with overfitting and multicollinearity problems. With
the penalty term, the model should provide better predictions on new data inputs.

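Lasso and Ridge regression differ in the penalty term they add: Lasso penalizes the sum of the absolute values
of the coefficients (L1), while Ridge penalizes the sum of their squares (L2).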
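To illustrate the two penalty terms, here is a minimal sketch. Lasso uses the sum of the absolute values of the coefficients (L1) and Ridge uses the sum of their squares (L2). The strength parameter `alpha`, the coefficient values, and the function names are assumptions for this example, and libraries differ in how they scale these terms.

```python
def l1_penalty(coefficients, alpha):
    """Lasso-style penalty: alpha times the sum of absolute coefficient values."""
    return alpha * sum(abs(c) for c in coefficients)

def l2_penalty(coefficients, alpha):
    """Ridge-style penalty: alpha times the sum of squared coefficient values."""
    return alpha * sum(c ** 2 for c in coefficients)

def penalized_cost(base_cost, coefficients, alpha, kind):
    """Cost function with the chosen regularization term added."""
    if kind == "lasso":
        return base_cost + l1_penalty(coefficients, alpha)
    if kind == "ridge":
        return base_cost + l2_penalty(coefficients, alpha)
    raise ValueError("kind must be 'lasso' or 'ridge'")

coefficients = [3.0, -0.5, 0.0]   # made-up model coefficients
lasso_cost = penalized_cost(1.0, coefficients, alpha=0.1, kind="lasso")
ridge_cost = penalized_cost(1.0, coefficients, alpha=0.1, kind="ridge")
```

Because the squared term grows quickly, Ridge punishes the large coefficient 3.0 much harder than Lasso does in this example.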

Note: the same regularization techniques can be applied to other machine learning algorithms as well.

Related article: How to Improve Sports Betting Odds — Step by Step Guide in Python
Sports betting could be more than using your gut feeling. This guide shows you, step by step, how to use ridge
regression to bet on sports smarter using Python.

Decision Tree

Decision tree learning is an algorithm that consists of nodes in a tree-like structure. Within each internal node,
there is a decision function to determine the next path to take. For each observation, the prediction of the
output or decision is made at a terminal node/leaf.
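The sketch below shows how a prediction travels through such a structure. The tree is written by hand with made-up features and thresholds (it is not learned from data), and the dictionary layout is just one hypothetical way to represent nodes.

```python
# Internal nodes hold a decision function (feature and threshold);
# terminal nodes/leaves hold the prediction.
tree = {
    "feature": "income",
    "threshold": 40000,
    "left": {                      # income < 40000
        "feature": "age",
        "threshold": 30,
        "left": {"prediction": "high risk"},     # age < 30
        "right": {"prediction": "medium risk"},  # age >= 30
    },
    "right": {"prediction": "low risk"},         # income >= 40000
}

def predict(node, observation):
    """Follow the decisions from the root until a leaf is reached."""
    while "prediction" not in node:
        if observation[node["feature"]] < node["threshold"]:
            node = node["left"]
        else:
            node = node["right"]
    return node["prediction"]

print(predict(tree, {"income": 30000, "age": 45}))
```

Training a decision tree means choosing these features and thresholds automatically from the data, which this sketch leaves out.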

This algorithm can be used for both regression and classification. It is non-parametric and easy to visualize,
which makes the model easy to interpret and implement.

Unlike many other models, it doesn’t require us to scale the input data, which makes it easier to use.

Further Reading: Decision Tree Model in Machine Learning: Practical Tutorial with Python

Random Forest

Random forest consists of an ensemble of decision trees. It is one of the most popular and powerful machine
learning algorithms.

Multiple decision trees are trained independently within the algorithm, with each one randomly distinct from the
others. Each tree makes its own prediction for a data point. Then these predictions are aggregated using
specific methods (e.g., averaging for regression or majority vote for classification) to form a final prediction.
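The two pieces that make a forest out of trees can be sketched as follows: drawing a random sample of the training rows for each tree (bootstrap sampling is one common way to make trees differ), and aggregating the trees' predictions. The per-tree predictions below are made up, and the function names are hypothetical.

```python
import random
from collections import Counter
from statistics import mean

def bootstrap_sample(rows, rng):
    """Sample rows with replacement so each tree sees a slightly different dataset."""
    return [rng.choice(rows) for _ in rows]

def aggregate_regression(tree_predictions):
    """Final prediction for regression: the average of the trees' outputs."""
    return mean(tree_predictions)

def aggregate_classification(tree_predictions):
    """Final prediction for classification: the most common class (majority vote)."""
    return Counter(tree_predictions).most_common(1)[0][0]

rng = random.Random(0)
rows = [(50, 150), (60, 180), (80, 240), (100, 310)]   # made-up (size, price) rows
sample_for_one_tree = bootstrap_sample(rows, rng)

price_prediction = aggregate_regression([10.0, 12.0, 14.0])
class_prediction = aggregate_classification(["fraud", "legit", "fraud"])
```

Individual trees can be wrong in different ways; averaging or voting tends to cancel out part of those individual errors.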

Usually, the final prediction is more accurate than that of any single tree. But the complexity of the model also
makes it harder to interpret compared to decision trees.

As with decision trees, we don’t need to scale the input data.

Further Reading: Unlocking Random Forest in Machine Learning

Gradient Boosting

Gradient boosting is also an ensemble of models. It is a boosting method that combines weak prediction
models into a single strong one in an iterative way.

This ensembling method often involves decision trees. But unlike random forests, the trees are not trained
independently, but rather sequentially within Gradient tree boosting. The training data for the next tree
depends on the output of the previous tree in the sequence. The trees are trained to minimize a cost function,
with Mean Squared Error (MSE) and cross entropy being the most popular.
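For a regression problem with MSE as the cost function, "depends on the output of the previous tree" can be sketched as fitting each new tree to the residuals (actual value minus current prediction) left by the trees before it. The example below is a simplified illustration, assuming one input variable, one-split trees (stumps), made-up data, and an assumed learning rate. Libraries such as XGBoost are far more elaborate.

```python
def fit_stump(xs, targets):
    """One-split regression tree: pick the threshold with the lowest squared error."""
    best = None
    sorted_xs = sorted(set(xs))
    for lo, hi in zip(sorted_xs, sorted_xs[1:]):
        threshold = (lo + hi) / 2
        left = [t for x, t in zip(xs, targets) if x < threshold]
        right = [t for x, t in zip(xs, targets) if x >= threshold]
        left_value = sum(left) / len(left)
        right_value = sum(right) / len(right)
        error = (sum((t - left_value) ** 2 for t in left)
                 + sum((t - right_value) ** 2 for t in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_value, right_value)
    _, threshold, left_value, right_value = best
    return lambda x: left_value if x < threshold else right_value

def fit_gradient_boosting(xs, ys, n_trees=50, learning_rate=0.3):
    """Train stumps sequentially, each on the residuals of the current ensemble."""
    base = sum(ys) / len(ys)              # start from the mean prediction
    stumps = []
    predictions = [base] * len(xs)
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        predictions = [p + learning_rate * stump(x) for x, p in zip(xs, predictions)]
    return lambda x: base + learning_rate * sum(s(x) for s in stumps)

def mse(model, xs, ys):
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]       # made-up inputs
ys = [1.0, 1.5, 3.5, 4.0, 8.0, 9.0]       # made-up targets
one_tree = fit_gradient_boosting(xs, ys, n_trees=1)
many_trees = fit_gradient_boosting(xs, ys, n_trees=50)
```

Each added stump is weak on its own, but the training error of the combined model keeps shrinking as more of them are added in sequence.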

Gradient tree boosting is often more accurate than random forests when it is tuned well. It has been one of the
most successful algorithms in Kaggle competitions with structured datasets.

One disadvantage of this method is its difficulty in interpretation.

Related article: Hyperparameter Tuning with Python: XGBoost Step-by-Step Guide


Gradient boosting has many hyperparameters, which makes it harder to tune. Check out this practical guide to
improve your model’s performance and learn how to use this machine learning technique with an XGBoost example.

Neural Networks

Neural networks are a family of algorithms inspired by the biological neural networks within animal brains.

Within the networks, a collection of connected units or nodes (neurons) is aggregated into layers. Each layer
may perform a different transformation on its inputs.

Each node computes a weighted sum of its inputs, much like a linear regression, and stacking many such nodes
transforms the input into the output. During this process, non-linear activation functions are also applied to
the linear equations so that the network can model non-linear relationships.
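A forward pass through a tiny network can be sketched as follows. Each node computes a weighted sum of its inputs plus a bias and passes the result through an activation function. The layer sizes and weights are made up and untrained, and the sigmoid is just one possible choice of activation.

```python
import math

def sigmoid(z):
    """A common non-linear activation function."""
    return 1.0 / (1.0 + math.exp(-z))

def dense_layer(inputs, weights, biases, activation):
    """One layer: every node applies activation(weighted sum of inputs + bias)."""
    outputs = []
    for node_weights, bias in zip(weights, biases):
        z = sum(w * x for w, x in zip(node_weights, inputs)) + bias
        outputs.append(activation(z))
    return outputs

# Made-up, untrained parameters: 2 inputs -> 2 hidden nodes -> 1 output node.
hidden_weights = [[0.5, -0.2], [0.1, 0.8]]
hidden_biases = [0.0, -0.1]
output_weights = [[1.0, -1.0]]
output_biases = [0.2]

def forward(inputs):
    """Pass the inputs through the hidden layer, then the output layer."""
    hidden = dense_layer(inputs, hidden_weights, hidden_biases, sigmoid)
    return dense_layer(hidden, output_weights, output_biases, sigmoid)[0]

probability = forward([1.0, 2.0])
```

Training a network means adjusting these weights and biases so that the outputs match the training labels as closely as possible.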

The objective of the algorithm is to find the equation that best fits the training data. As with linear/logistic
regression, how well it fits is measured by a cost function.

There are different popular neural network models, such as multilayer perceptron (MLP), long short-term
memory (LSTM), and convolutional neural network (CNN). CNNs are widely used for computer vision problems,
while LSTMs are commonly used for sequence data, such as natural language processing (NLP) problems.

Related articles:

How to do Sentiment Analysis with Deep Learning (LSTM Keras)


A tutorial showing an example of sentiment analysis: learn how to build a deep learning model to classify the
reviews data in Python step-by-step.

3 Steps to Time Series Forecasting: LSTM with TensorFlow Keras


A machine learning time series analysis example with Python. See how to transform the dataset and fit LSTM
with the TensorFlow Keras model.

Hyperparameter Tuning with Python: Keras Step-by-Step Guide


Neural Networks have many hyperparameters, which makes them harder to tune. This is a practical guide to
Hyperparameter Tuning with Keras and Tensorflow in Python. Read on to implement this machine learning
technique to improve your model’s performance.

K-Means Clustering

K-means clustering is usually the first unsupervised learning algorithm that machine learning beginners learn.

The algorithm assigns each observation into one of the k clusters, with k being the parameter chosen by us.
The algorithm aims to find the clusters that result in the lowest within-cluster sum of squares.
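A common way to search for such clusters is to alternate two steps: assign every observation to its nearest cluster center, then move each center to the mean of its assigned observations. The sketch below assumes k = 2, made-up 2D points, and hand-picked starting centers; practical implementations choose starting centers more carefully.

```python
def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [min(range(len(centroids)), key=lambda i: squared_distance(p, centroids[i]))
            for p in points]

def mean_point(cluster):
    return tuple(sum(coords) / len(cluster) for coords in zip(*cluster))

def k_means(points, initial_centroids, n_iterations=10):
    centroids = list(initial_centroids)
    for _ in range(n_iterations):
        labels = assign(points, centroids)
        for i in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == i]
            if members:                      # keep the old center if a cluster is empty
                centroids[i] = mean_point(members)
    return centroids, assign(points, centroids)

def within_cluster_sum_of_squares(points, centroids, labels):
    return sum(squared_distance(p, centroids[label]) for p, label in zip(points, labels))

points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centroids, labels = k_means(points, initial_centroids=[(0.0, 0.0), (10.0, 10.0)])
wcss = within_cluster_sum_of_squares(points, centroids, labels)
```

With these points, the first three observations end up in one cluster and the last three in the other.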

Principal Component Analysis (PCA)

Principal component analysis (PCA) is one of the most popular dimensionality reduction algorithms.

Given a dataset with N input variables/features, we can apply PCA and transform it into a dataset with n
features, where n < N.

The algorithm finds n linear combinations of the original features, called principal components, and expresses
the data in terms of these components.

If there is a lot of collinearity/correlation among the features, we would be able to reduce the dimensionality of
the dataset without losing much information.
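The sketch below reduces a made-up dataset with two highly correlated features (N = 2) to a single feature (n = 1). It finds the first principal component with power iteration on the covariance matrix, which is a simple stand-in chosen for this example; libraries typically use a full eigendecomposition or SVD instead.

```python
import math

def center(rows):
    """Subtract each feature's mean so the data is centered at zero."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[v - m for v, m in zip(row, means)] for row in rows]

def covariance_matrix(centered):
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]

def first_principal_component(cov, n_iterations=200):
    """Power iteration: converges to the direction of largest variance."""
    d = len(cov)
    v = [1.0] * d
    for _ in range(n_iterations):
        v = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    return v

# Made-up data: the second feature is roughly twice the first.
rows = [[1.0, 2.1], [2.0, 3.9], [3.0, 6.2], [4.0, 7.8], [5.0, 10.1]]
centered = center(rows)
cov = covariance_matrix(centered)
component = first_principal_component(cov)

# New 1-feature dataset: each row expressed along the principal component.
projected = [sum(a * b for a, b in zip(row, component)) for row in centered]

total_variance = sum(cov[i][i] for i in range(len(cov)))
projected_variance = sum(p * p for p in projected) / (len(projected) - 1)
explained_ratio = projected_variance / total_variance
```

Because the two features are almost perfectly correlated, the single new feature keeps nearly all of the variance of the original two.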

Isolation Forests

Isolation forest is commonly used for anomaly or outlier detection on unlabeled data.

All we need to do is to specify a proportion of outliers in the dataset, and the algorithm will separate the data
into two groups: normal and outliers.

The algorithm uses a similar idea to random forests: an ensemble of trees that are trained and work
together to make better predictions. The main difference is that the isolation forest uses isolation trees, which
split the data at random and are therefore easier to create than decision trees.
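The core intuition can be sketched with one input variable: if we keep splitting the data at random values, an outlier tends to end up alone after far fewer splits than a typical point. The code below is a simplified illustration with made-up transaction amounts; a real isolation forest also picks random features, subsamples the data, and converts the depths into an anomaly score.

```python
import random

def isolation_depth(point, values, rng, depth=0, max_depth=50):
    """Number of random splits needed before `point` is alone in its partition."""
    if len(values) <= 1 or depth >= max_depth:
        return depth
    lo, hi = min(values), max(values)
    if lo == hi:
        return depth
    split = rng.uniform(lo, hi)              # random split value
    if point < split:
        remaining = [v for v in values if v < split]
    else:
        remaining = [v for v in values if v >= split]
    return isolation_depth(point, remaining, rng, depth + 1, max_depth)

def average_depth(point, values, n_trees=200, seed=0):
    """Average isolation depth over many random trees."""
    rng = random.Random(seed)
    return sum(isolation_depth(point, values, rng) for _ in range(n_trees)) / n_trees

amounts = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 100.0]   # 100.0 is the odd one out
depth_typical = average_depth(12.0, amounts)
depth_outlier = average_depth(100.0, amounts)
```

The value 100.0 is usually isolated by the very first split, while 12.0 always needs at least two, so a short average depth is a sign of an outlier.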

Related article: How to apply Unsupervised Anomaly Detection on bank transactions


This is a practical example of unsupervised learning of anomaly (outlier) detection. Learn how to apply the
algorithms with a step-by-step guide in Python.

You’ve made it! Hope you got a good overview of machine learning and its algorithm types.

With a good foundation, keep learning and dig deeper!

Leave a comment for any questions you may have or anything else.

Related “Break into Data Science” resources:

How to Learn Data Science Online: ALL You Need to Know


A detailed review of resources online, including courses, books, free tutorials, portfolios building, and more.

Python crash course: breaking into Data Science


A FREE Python online course, beginner-friendly tutorial. Start your successful data science career journey.

What are the In-Demand Skills for Data Scientists in 2020


Why Python, SQL, Machine Learning are the most in-demand skills for data science.

SQL Tutorial for Beginners: Learn SQL for Data Analysis


An ultimate tutorial to learn SQL for data analysis (from beginner to advanced). Learn & master SQL queries
with this practical guide.
