You are on page 1of 11

Introduction of the Popular Machine

Learning Algorithm
Author Details:

Jasmin Praful Bharadiya, Doctor of Philosophy Information Technology, University of the


Cumberlands, USA, jasminbharadiya92@gmail.com

Abstract

Machine Learning (ML) is a research area that has developed over the past few decades as a
result of the work of a small group of computer enthusiasts who were interested in the idea of
computers learning to play video games and from a branch of mathematics called statistics that
hardly ever took computational methods into consideration. The development of a large number
of algorithms that are frequently used for text interpretation, pattern recognition, and a variety of
other business purposes has sparked clear research interest in data mining to find hidden
regularities or irregularities in data. data. data. data. social data is growing by the second This
article describes the idea and history of machine learning and contrasts the three most popular
machine learning algorithms using some fundamental ideas. The Sentiment140 dataset has been
used to demonstrate and evaluate the efficiency of each method in terms of training time,
prediction time, and prediction accuracy.

Machine learning algorithms have become indispensable tools in analyzing complex datasets and
extracting valuable insights. Among the myriad of algorithms available, one particular technique
has gained widespread popularity due to its versatility and effectiveness. This comprehensive
review aims to delve into the efficacy of this popular machine learning algorithm by offering a
comprehensive analysis of its underlying principles, diverse applications, notable strengths, and
inherent limitations.

Keywords: Algorithm, Supervised learning, unsupervised learning, Regression, Deep learning


and Support vector machines.

Introduction
Machine learning is a paradigm that can be used to describe learning from the past (in this case,
historical data) to enhance performance in the future. This field only focuses on autonomous
learning techniques. The term "learning" describes the automatic adjustment or enhancement of
an algorithm based on prior "experiences" without any outside aid from a person.

Machine learning's exponential rise in recent years has revolutionized several industries, from
data analysis to artificial intelligence. Computers are now able to learn from data, recognize
patterns, and forecast the future thanks to machine learning algorithms. Because of its
adaptability and exceptional performance across a wide range of applications, Random Forests
has become one of the most well-known and effective algorithms among the many others that are
currently accessible.

Humans have employed a number of tools since the beginning of time to speed up the
completion of various tasks. Thanks to the inventiveness of the human brain, various
technologies have been realized. By enabling people to fulfill a variety of needs, including those
related to travel, industry, and computing, these devices improved the quality of human
existence. There is also machine learning among these.

Machine Learning Algorithms (MLA) are used to help machines handle data more effectively.
Even after examining the data, there are times when we are unable to interpret or extrapolate the
results. In that case, we apply machine learning. The need for machine learning has expanded as
a result of the abundance of data sets. Many industries use machine learning to find relevant data.
Machine learning aims to learn from data. This problem, which requires the processing of large
volumes of data, is addressed by a large number of mathematicians and programmers using a
variety of methods.

To handle data challenges, machine learning employs a variety of strategies. The lack of a single
type of algorithm that works well in all circumstances is a point that data scientists want to
underline. Solve a problem. The type of problem you are trying to solve, how many variables
there are, what type of model would work best, and other considerations affect the type of
algorithm that is used.

Here is a brief overview of some of the most popular machine learning (ML) algorithms.

1. Linear Regression
Machine learning uses a variety of methodologies to address data difficulties. Data
scientists are meant to emphasize the fact that there is no particular type of algorithm that
works well in all situations. Solve a dilemma. The type of method used depends on the
type of problem you are trying to solve, the number of variables, the type of optimal
model, and other factors.

Linear regression can be expressed mathematically as: y= β0+ β 1x+ ε

Since the Linear Regression algorithm represents a linear relationship between a dependent (y)
and one or more independent (y) variables, it is known as Linear Regression. This means it finds
how the value of the dependent variable changes according to the change in the value of the
independent variable. The relation between independent and dependent variables is a straight line
with a slope.

2. Logistic Regression
To estimate discrete values (often binary values such as 0/1) from a set of independent
variables, logistic regression is used. By fitting the data to a logit function, it helps predict
the probability of an event. Also known as logit regression.
These methods listed below are often used to help improve logistic regression models
 include interaction terms
 eliminate features
 regularize techniques
 use a non-linear model
3. Decision Tree
Decision Tree algorithm in machine learning is one of the most popular algorithm in use
today; this is a supervised learning algorithm that is used for classifying problems. It
works well in classifying both categorical and continuous dependent variables. This
algorithm divides the population into two or more homogeneous sets based on the most
significant attributes/ independent variables.
4. SVM (Support Vector Machine) Algorithm
SVM algorithm is a method of a classification algorithm in which you plot raw data as
points in an n-dimensional space (where n is the number of features you have). The value
of each feature is then tied to a particular coordinate, making it easy to classify the data.
Lines called classifiers can be used to split the data and plot them on a graph.

5. Naive Bayes Algorithm


A Naive Bayes classifier assumes that the presence of a particular feature in a class is
unrelated to the presence of any other feature.
Even if these features are related to each other, a Naive Bayes classifier would consider
all of these properties independently when calculating the probability of a particular
outcome.
A Naive Bayesian model is easy to build and useful for massive datasets. It's simple and
is known to outperform even highly sophisticated classification methods.

6. KNN (K- Nearest Neighbors) Algorithm


Problems related to classification and regression can be solved using this approach. It
seems that categorization problem solving is being applied more and more within the data
science business. It is a simple algorithm that classifies new instances by obtaining the
consent of at least k of its neighbors and then saves all existing instances. The class with
which the case has more features is assigned a case. This calculation is done using a
distance function.
KNN can be easily understood by comparing it to real life. For example, if you want
information about a person, it makes sense to talk to his or her friends and colleagues!

Things to consider before selecting K Nearest Neighbours Algorithm:

 KNN is computationally expensive


 Variables should be normalized, or else higher range variables can bias the
algorithm
 Data still needs to be pre-processed.
7. K-Means
It is an unsupervised learning algorithm that solves clustering problems. Data sets are
classified into a particular number of clusters (let's call that number K) in such a way that
all the data points within a cluster are homogenous and heterogeneous from the data in
other clusters.

How K-means forms clusters:


 The K-means algorithm picks k number of points, called centroids, for each
cluster.
 The closest centroids of each data point create a cluster or K clusters.
Based on the cluster members already present, now generates new centroids.
 The closest distance for each data point is calculated using these updated
centroids. Until the centroids remain the same, this process is repeated.
8. Random Forest Algorithm
The term "random forest" refers to a collection of decision trees. Each tree is assigned a
class, and the tree "votes" for that class, to categorize a new item based on its
characteristics. Of all the trees in the forest, the forest chooses the ranking with the
highest votes.
Each tree is planted & grown as follows:
• If the training set has N instances, then a random sample of N cases is chosen.
This sample will serve as the training set of the tree.
• If there are M input variables, a number mm is given and the best division of
this m is used to divide each node. At each node, m variables of M are randomly
chosen. Throughout this procedure, the value of m is held constant.
• Each tree develops to its maximum potential size. Pruning is not done.
9. Dimensionality Reduction Algorithms
In today's world, vast amounts of data are being stored and analyzed by corporates,
government agencies, and research organizations. As a data scientist, you know that this
raw data contains a lot of information - the challenge is to identify significant patterns
and variables.
Dimensionality reduction algorithms like Decision Tree, Factor Analysis, Missing Value
Ratio, and Random Forest can help you find relevant details.
10. Gradient Boosting Algorithm and AdaBoosting Algorithm
Gradient Boosting Algorithm and AdaBoosting Algorithm are boosting algorithms used
when massive loads of data have to be handled to make predictions with high accuracy.
Boosting is an ensemble learning algorithm that combines the predictive power of several
base estimators to improve robustness.

In short, it combines multiple weak or average predictors to build a strong predictor.


These boosting algorithms always work well in data science competitions like Kaggle,
AV Hackathon, CrowdAnalytix. These are the most preferred machine learning
algorithms today. Use them, along with Python and R Codes, to achieve accurate
outcomes.

Popular machine learning (ML) algorithms new advancements:

As per my pragmatic understanding the Machine learning algorithms have been continuously
evolving, and several advancements have taken place in recent years. Here are some popular
machine learning algorithms and their advancements:

Deep Learning: Deep learning, particularly deep neural networks, has made significant
advancements in various areas, such as computer vision, natural language processing, and speech
recognition. Some notable advancements include:
a. Transformer models: Transformer models, such as the Transformer architecture and its
variants (e.g., BERT, GPT, T5), have achieved state-of-the-art performance in language-related
tasks by leveraging self-attention mechanisms.

b. Generative adversarial networks (GANs): GANs have become more powerful and versatile,
enabling the generation of highly realistic and high-resolution images and videos. Progressive
GANs and StyleGAN are examples of advancements in this area.

Reinforcement Learning: Reinforcement learning has seen advancements in training algorithms


and applications in various domains, such as robotics and game playing. Notable advancements
include:

a. Deep Q-Networks (DQN): DQN combines deep neural networks with Q-learning, enabling
agents to learn directly from raw sensory input. It has achieved remarkable results in playing
video games.

b. Proximal Policy Optimization (PPO): PPO is an algorithm that improves policy


optimization in reinforcement learning. It has shown stable and efficient learning in complex
environments.

Transfer Learning: Transfer learning has gained attention as a technique that allows models to
leverage knowledge from pre-trained models and apply it to new tasks with limited data. Recent
advancements include:

a. Pre-trained language models: Large-scale pre-trained language models, such as GPT-3 and
T5, have demonstrated impressive performance on a wide range of natural language processing
tasks. Fine-tuning these models with task-specific data has become a common practice.

b. Domain adaptation: Techniques for transferring knowledge across different domains have
improved, allowing models to generalize well from the source domain to the target domain with
limited labeled data.

Bayesian Deep Learning: Bayesian methods in deep learning have been explored to capture
uncertainty and improve model robustness. Advancements include:
a. Variational inference: Variational inference techniques have been applied to deep learning
models, enabling Bayesian inference with deep neural networks. Variational Autoencoders
(VAEs) and Bayesian Neural Networks (BNNs) are examples of this advancement.

b. Bayesian Optimization: Bayesian optimization algorithms have been used to tune


hyperparameters of deep learning models efficiently, reducing the need for
manualhyperparameter tuning.

Summary

The article discusses various machine learning algorithms, including supervised learning,
unsupervised learning, regression, deep learning, and support vector machines. These algorithms
are widely used in the field of machine learning for different purposes.

 Supervised learning involves training a model using labeled data, where the algorithm
learns to make predictions based on input-output pairs. Regression, a type of supervised
learning, is specifically used for predicting continuous numeric values.
 Unsupervised learning, on the other hand, deals with unlabeled data and focuses on
discovering patterns and structures within the data. Clustering is a common unsupervised
learning technique that groups similar data points together based on their inherent
characteristics.
 Deep learning refers to the use of deep neural networks, which are artificial neural
networks with multiple layers. Deep learning has gained popularity due to its ability to
automatically learn hierarchical representations from complex data, such as images, text,
and audio.
 Support Vector Machines (SVMs) are a type of supervised learning algorithm that is
effective for both classification and regression tasks. SVMs find the best hyperplane that
separates different classes in the data, maximizing the margin between them.

These algorithms play a crucial role in various applications of machine learning, such as image
recognition, natural language processing, fraud detection, and recommendation systems. Each
algorithm has its strengths and weaknesses, and the choice of algorithm depends on the specific
problem and the available data.
Conclusion:

Machine learning is both supervised and unsupervised. If you have fewer data points with clearly
identified training data, choose supervised learning. Unsupervised.

For large data sets, learning would generally result in better performance and better results.
Consider using deep learning techniques if you have a large collection of data that is easily
accessible. Also, you found out what is deep reinforcement learning and reinforcement learning.
Given his greater understanding of neural networks, their uses and their drawbacks. Several
different machine learning algorithms are examined in this paper. Today, machine learning is
used by everyone, whether they know it or not. Update your profile photo on social networking
sites to receive product recommendations when you shop online. This article provides
introductions to the vast majority of known machine learning methods.

References

[1].W. Richert, L. P. Coelho, “Building Machine Learning Systems with Python”, Packt
Publishing Ltd., ISBN 978-1-78216-140-0
[2].https://ijisrt.com/convolutional-neural-networks-for-image-classification
[3].J. M. Keller, M. R. Gray, J. A. Givens Jr., “A Fuzzy KNearest Neighbor Algorithm”,
IEEE Transactions on Systems, Man and Cybernetics, Vol. SMC-15, No. 4, August 1985
[4].P. Harrington, “Machine Learning in action”, Manning Publications Co., Shelter Island,
New York, 2012
[5]. Jasmin Praful Bharadiya https://journaljerr.com/index.php/JERR/article/view/858
[6].M. Bkassiny, Y. Li, and S. K. Jayaweera, “A survey on machine learning techniques in
cognitive radios,” IEEE Communications Surveys & Tutorials, vol. 15, no. 3, pp. 1136–
1159, Oct. 2012.
[7].Jasmin Praful Bharadiya https://journalajorr.com/index.php/AJORR/article/view/164
[8].S. Marsland, Machine learning: an algorithmic perspective. CRC press, 2015

You might also like