Kundan Kumar(21MCA2029)
Abstract
The digital world has access to a multitude of data in the Fourth Industrial
Revolution (4IR, also known as Industry 4.0) era, including Internet of Things (IoT)
data, cybersecurity data, business data, mobile data, health data, social media
data, etc. Knowledge of artificial intelligence (AI), and machine learning (ML) in
particular, is essential to analyse these data intelligently and to create the
associated smart and automated applications. Machine learning algorithms come
in a variety of forms, including supervised, unsupervised, semi-supervised, and
reinforcement learning. In addition, deep learning, which belongs to a broader
family of machine learning methods, can effectively examine data at large scale.
In this paper we provide a thorough overview of various machine learning
techniques, showing how they may be used to increase the functionality and
intelligence of an application.
Determining the fundamentals of various machine learning approaches and how
they may be applied in a variety of real-world application areas, such as
cybersecurity systems, smart cities, healthcare, e-commerce, agriculture, and
many more, is thus the core contribution of this work. Based on our work, we also
identify the difficulties and promising paths for further research. Overall, this
article seeks to serve as a resource for decision-makers in a variety of real-world
settings and application areas, especially from a technical standpoint.
Introduction
According to Simon, learning is "the process of change and enhancement in
behaviours through investigating new information through time." When the
"learning" referred to in this definition is carried out by computers, the term
used is machine learning. During the machine learning process, enhancement
refers to developing the optimum solution based on previously acquired
knowledge and sample data (Srmaçek, 2007). The term "big data" emerged as a
result of advancements in information technologies. The concept itself is not
new: it can be characterised as vast, growing, unrestricted collections of raw
data that cannot be analysed using conventional database procedures. Large
amounts of data are gathered through websites, ATMs, credit card readers, and
other devices, and the information gathered in this manner is ready for analysis.
Depending on the business industry, different data collection fields have distinct
objectives for analysis. Applications for machine learning can be found in a variety
of industries, including biology, calculative finance, automotive, aviation,
production, natural language processing, image processing, and computer vision.
In every case, however, the goal rests on analysing and interpreting previously
collected data. Because humans are unable to analyse and interpret data at this
scale, machine learning techniques and algorithms have been developed to do so
(Amasyal, 2008).
In this study, the recently popularised notion of machine learning is thoroughly
studied. The paper provides details on the development of machine learning, the
techniques and algorithms employed, and the domains in which it is applied. The
conclusion, which is the last section, includes the findings from the earlier
research.
Machine learning
Computer procedures that follow the specific steps of an algorithm leave no
margin for error. Machine learning is different: instead of commands created to
produce a fixed result for a given input, the computer makes judgments based on
the available sample data, and in some circumstances it may err in its
decision-making just as people do. In other words, machine learning is the
process of giving computers the ability to learn from data and experience much
like a human brain (Gör, 2014). The basic goal of machine learning is to develop
models that can learn from previous data in order to improve, recognise
intricate patterns, and find solutions to new problems.
Fig. 1: Machine Learning Types
• Supervised Learning: - A technique that generates the output using the current
labelled input data. Classification and regression are the two categories of
supervised learning.
➢ Classification: - Dividing the data into the groups listed in the data set in
accordance with their distinctive characteristics.
➢ Regression: - Estimating a continuous output value for the data based on the
features it does have.
• Unsupervised Learning: - In unsupervised learning no output data is provided,
which is the difference between supervised and unsupervised learning. The
learning process uses the relationships and connections between the data items;
unsupervised learning also lacks labelled training data.
➢ Clustering: - Finding the data groups that are like one another when
the data's intrinsic categories are unknown.
➢ Association: - Identifying the links and relationships among the data
in a single data collection.
• Semi-Supervised Learning: - Given that it works with both labelled and
unlabelled data, semi-supervised learning is a combination of the
supervised and unsupervised approaches outlined above. As a result, it falls
in the middle between learning "without supervision" and learning "with
supervision". Semi-supervised learning is helpful when there are many
unlabelled data sets and few labelled ones in the actual world. In the end,
a semi-supervised learning model should be able to predict outcomes superior to
those obtained from the labelled data alone. Machine translation, fraud
detection, data labelling, and text
categorization are a few examples of application domains where semi-
supervised learning is applied.
• Reinforcement Learning: - In this type of learning, agents learn by receiving
rewards. Given start and finish points, the agent's objective is to reach the
destination by the most direct and efficient route. The agent receives positive
rewards when it takes the proper actions, while choosing a wrong path results in
penalties. Learning happens while working toward the goal.
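The reward-driven, goal-directed behaviour described above can be sketched with Q-learning, a standard reinforcement learning algorithm. The corridor world below is a hypothetical toy example, not from the paper: the agent starts at cell 0, the goal is cell 4, reaching the goal earns +10, and every other step costs -1, so the highest-reward policy is also the most direct route.

```python
import random

# Tabular Q-learning on a 1-D corridor of 5 cells; the goal is the last cell.
N_STATES, GOAL = 5, 4
ACTIONS = [-1, +1]                      # step left, step right
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
alpha, gamma, epsilon = 0.5, 0.9, 0.1   # learning rate, discount, exploration

random.seed(0)
for episode in range(200):
    s = 0
    while s != GOAL:
        # Epsilon-greedy action choice: mostly exploit, sometimes explore.
        a = random.choice(ACTIONS) if random.random() < epsilon \
            else max(ACTIONS, key=lambda act: Q[(s, act)])
        s2 = min(max(s + a, 0), N_STATES - 1)
        r = 10.0 if s2 == GOAL else -1.0
        # Q-learning update: move Q(s,a) toward r + gamma * max_a' Q(s',a').
        best_next = max(Q[(s2, b)] for b in ACTIONS)
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
        s = s2

# The greedy policy learned for each non-goal state (step right everywhere).
policy = {s: max(ACTIONS, key=lambda act: Q[(s, act)]) for s in range(GOAL)}
```

Wrong actions (stepping left, away from the goal) accumulate negative reward, so their Q-values fall and the agent converges on the direct route, exactly as the description above states.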
Analyses of Frequently Used Machine Learning Algorithms
• Decision Tree Algorithm: - The decision tree method is a classic technique
that is frequently used in machine learning. Its basic operation is to begin
processing the data at the root node of the tree and to proceed, branch by
branch, until a leaf node is reached, dividing the real-world examples
systematically. The decision tree algorithm continues to split branches to make
data analysis easier, while also pruning branches to preserve the integrity of
the data content. From a computational standpoint the algorithm falls into the
top-down category: at each node the content is examined for the best attribute,
and the node is then expanded into two or more child nodes. For instance, when
evaluating data you could designate a decision tree with a large amount of data
as the larger tree A and also choose a maximum number of branch splits. If the
upper limit is set to 5, the larger tree A stops splitting once it reaches that
value and instead uses the pruning method to analyse the larger tree model, in
order to clean up the data and increase the objectivity of the data analysis
findings.
➢ Root node: - A node with one or more outgoing branches but no incoming
branch. The root node displays the dependent variable along with the variable
that will be used for classification.
➢ Interior nodes: - Nodes that have one incoming branch and two or more
outgoing branches.
➢ Leaf or terminal nodes: - Nodes that have an incoming branch but no outgoing
branch.
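Choosing the "best attribute" at each node, as described above, is usually done by maximising information gain, the drop in entropy produced by a split. The sketch below is a generic illustration on a small hypothetical data set; it is not code from the paper.

```python
from math import log2
from collections import Counter

def entropy(labels):
    # Shannon entropy of a label list, in bits.
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attr):
    # Split the rows on attr and weight each branch's entropy by its size;
    # the gain is how much the split reduces entropy overall.
    branches = {}
    for row, label in zip(rows, labels):
        branches.setdefault(row[attr], []).append(label)
    remainder = sum(len(ys) / len(labels) * entropy(ys)
                    for ys in branches.values())
    return entropy(labels) - remainder

# Hypothetical observations: weather perfectly predicts the label here.
rows = [{"weather": "Sunny"}, {"weather": "Sunny"},
        {"weather": "Rainy"}, {"weather": "Rainy"}]
labels = ["No", "No", "Yes", "Yes"]
gain = information_gain(rows, labels, "weather")  # -> 1.0, a perfect split
```

A top-down tree builder computes this gain for every candidate attribute at a node, splits on the best one, and recurses on each branch.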
Fig. 2: Observations of the last ten days
Fig. 3: Decision Tree Diagram (root node: Weather; interior nodes: Humidity and
Wind; leaf nodes: Yes/No)
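The tree sketched in Figs. 2 and 3 can be written out as nested decisions. The figures only name the attributes, so the split values below are assumptions borrowed from the classic "play tennis" example: the root tests Weather, the interior nodes test Humidity and Wind, and the leaves return Yes or No.

```python
def play_outside(weather, humidity, wind):
    # Root node: test the Weather attribute.
    if weather == "Overcast":
        return "Yes"                                   # leaf node
    if weather == "Sunny":
        # Interior node: test Humidity on the Sunny branch.
        return "No" if humidity == "High" else "Yes"   # leaf nodes
    if weather == "Rainy":
        # Interior node: test Wind on the Rainy branch.
        return "No" if wind == "Strong" else "Yes"     # leaf nodes
    raise ValueError("unknown weather value")

print(play_outside("Sunny", "High", "Weak"))    # -> No
print(play_outside("Rainy", "Normal", "Weak"))  # -> Yes
```

Each call walks one path from the root through at most one interior node to a leaf, which is exactly how a trained decision tree classifies a new example.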
• Support Vector Machine (SVM): - The SVM algorithm is another popular method
used in machine learning. In application it uses the support-vector approach to
carry out the established data analysis work, and it automatically assesses the
data that has to be handled in order to improve it. To increase the scientific
rigour of the final data analysis conclusions, several sets of analysis samples
must be gathered to identify the sample data that lie on the boundary (the
support vectors). Assuming, for instance, that the data to be processed is
H(d), processing starts with central processing of H(d) using SVM technology,
allowing for complete dispersion. The boundary of the H(d) plane is then chosen
to maximise the distance to the nearest samples on either side of the plane,
and the accuracy of data processing is improved by analysing the vector content
of the H(d) plane to produce the output vector.
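One concrete way to realise the margin maximisation described above is sub-gradient descent on the hinge loss. The sketch below is an illustrative linear SVM on a hypothetical two-cluster data set, not the paper's own method.

```python
# A minimal linear SVM trained by sub-gradient descent on the hinge loss.
# Labels are +1 / -1; the model is f(x) = w . x + b, and a sample sits
# outside the margin when y * f(x) >= 1.
def train_svm(points, labels, lr=0.01, lam=0.01, epochs=200):
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for (x1, x2), label in zip(points, labels):
            if label * (w[0] * x1 + w[1] * x2 + b) < 1:
                # Inside the margin: hinge-loss gradient step plus decay.
                w[0] += lr * (label * x1 - lam * w[0])
                w[1] += lr * (label * x2 - lam * w[1])
                b += lr * label
            else:
                # Outside the margin: only the regularisation decay applies,
                # which keeps the margin as wide as possible.
                w[0] -= lr * lam * w[0]
                w[1] -= lr * lam * w[1]
    return w, b

# Two linearly separable clusters (hypothetical data).
X = [(-5, -5), (-6, -5), (-5, -6), (5, 5), (6, 5), (5, 6)]
y = [-1, -1, -1, +1, +1, +1]
w, b = train_svm(X, y)
preds = [1 if w[0] * x1 + w[1] * x2 + b > 0 else -1 for (x1, x2) in X]
```

Only the samples nearest the separating plane (the support vectors) keep triggering gradient updates, which is why gathering boundary samples matters so much for the final decision surface.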