
ARTIFICIAL INTELLIGENCE
and MACHINE LEARNING
a primer

VS Joshi
Global Head of Product Marketing

VS Joshi has done a nice job of articulating the concepts of ML in this eBook. His approach of one concept per page, with a diagram and example per concept, makes it easier for readers to learn the basics of AI/ML. It provides them with a bird’s eye view of AI/ML.

- Ashish Nadkarni, GVP, Worldwide Infrastructure Research at IDC.
Table of Contents
Introduction
What is Artificial Intelligence (AI)?
What is Machine Learning (ML)?
How Do Machines Learn?
Types of Machine Learning
Supervised Learning
Regression
Classification
Unsupervised Learning
Clustering
Dimensionality Reduction
Reinforcement Learning
What is Deep Learning (DL)?
Convolutional Neural Network (CNN)
Recurrent Neural Network (RNN)
Transformers
Conclusion, and hopefully the beginning!

Introduction

AI is the new electricity. Just as electricity transformed everything people knew then, AI is going to transform everything we know now.

- Andrew Ng, thought leader in AI and a professor at Stanford University.

Artificial intelligence (AI) and machine learning (ML) are at the core of many services we use every day.

Do you use email? You’re already using AI via the spam filtering feature of your email provider.

Do you watch shows on Netflix? AI drives its suggestions for what to watch next.

Do you shop on Amazon? Recommendations for other products you may like are powered by AI.

Do you use a virtual assistant? Alexa, Siri, OK Google – all get their abilities from AI.

Needless to say, AI and ML have enterprise uses as well, for everything from shortening delivery routes
to fraud detection and prevention or anticipating how soon industrial machinery is likely to wear out.
However, the inherent complexity of AI and ML can be daunting. And their practical value can get lost in
marketing jargon or science-fiction dazzle.

Yet AI and ML have the potential to disrupt nearly every industry and line of work, especially as data grows exponentially, immense compute power becomes available cheaply at the click of a button, and the underlying technologies of data collection, storage, and manipulation keep evolving to extend AI’s reach and drop its cost. Product managers, business leaders, and everyone else need to recognize how AI and ML can transform their products and dramatically enhance the efficiency, output, and user experience of their processes.

But to work with artificial intelligence, first you need an understanding of its main concepts. This eBook
is essentially a guided tour around the field, presenting key concepts with examples and illustrations,
and providing the high-level map to start your AI/ML journey. You’ll gain a basic framework to build your
own AI/ML learning strategy. So, turn the page to join our tour.

What is Artificial Intelligence (AI)?
AI is a set of technologies that enable machines to perceive, think, and act like humans. Machine
Learning (ML) is a set of techniques within AI, and Deep Learning is a subset of that. We’ll go into
different types of ML and how they work in the following pages.

The following figure depicts how computers/machines use technologies such as computer vision
and natural language processing (NLP) to perceive data. Using the data, Machine Learning enables
computers to learn, think and reason. Autonomous vehicles, robots, chatbots, and virtual
assistants are some common use cases for these technologies.

[Figure: PERCEIVE (computer vision, natural language processing) → THINK (deep learning) → ACT (autonomous vehicles, robotics, virtual assistant)]

What is Machine Learning (ML)?
Machine learning is a subset of AI. It is the science of enabling computers to learn by themselves from the data presented to them, that is, from experience, without being explicitly programmed. In Machine Learning, computers learn from the input and output data presented to them and produce a formula, program, or hypothesis that defines the relationship between the input and output data. This is completely different from conventional programming, where a program acts on the input data to generate an output.

[Figure: TRADITIONAL PROGRAMMING: Data + Program → Computer → Output. MACHINE LEARNING: Data + Output → Computer → Program.]
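To make the contrast concrete, here is a minimal Python sketch. The Celsius-to-Fahrenheit task, the sample data, and the function names are our own illustrative assumptions, not something from this eBook. In the traditional version a human writes the rule; in the machine learning version the computer is given input/output pairs and recovers the rule itself, here as a straight line fitted by ordinary least squares.

```python
# Traditional programming: a human writes the rule; the computer applies it to data.
def fahrenheit_rule(celsius):
    return celsius * 9 / 5 + 32

# Machine learning: the computer is given inputs AND outputs and produces the rule.
# Here the produced "program" is a straight line fitted by ordinary least squares.
def learn_rule(inputs, outputs):
    n = len(inputs)
    mean_x, mean_y = sum(inputs) / n, sum(outputs) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(inputs, outputs))
             / sum((x - mean_x) ** 2 for x in inputs))
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical observed (Celsius, Fahrenheit) pairs
celsius = [-10, 0, 15, 30, 40]
fahrenheit = [14, 32, 59, 86, 104]

slope, intercept = learn_rule(celsius, fahrenheit)
print(f"learned rule: F = {slope:.2f} * C + {intercept:.2f}")  # F = 1.80 * C + 32.00
print(slope * 25 + intercept, fahrenheit_rule(25))
```

Both versions end up converting 25 °C to about 77 °F; the difference is only where the rule came from.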

Of the two basic approaches to AI, ML takes the one based on statistics and probability. In other words, it comes up with a program based on the statistical likelihood that a given input is associated with a given output.

The counterpart to ML’s statistical or probabilistic approach is a deterministic or rules-based approach, in which experts directly program the machines. The early 1980s rise of expert systems (the first truly successful forms of AI) stemmed from this deterministic approach. However, the current explosion of AI is due to advances in ML, primarily in Deep Learning.

How Do Machines Learn?
So, how do machines/computers learn from data?
Here is one way practitioners do it: they divide the data into two portions, a Training Dataset and a Validation Dataset, in an 80:20 proportion.

[Figure: the data split into an 80% training dataset and a 20% validation dataset]

Training dataset: The training dataset is fed to the “learning algorithm” (aka an “untrained model”) to find patterns in the data. Machine learning algorithms learn through a process called induction or inductive learning. Induction is a reasoning process that generalizes (creates a model) from specific information (the training dataset). That generalization is captured in terms of a generated function or formula that describes the relationship between two or more elements in the dataset.

Validation dataset: The validation dataset is the data that is withheld to be used to test the generated function/formula. It provides an unbiased evaluation of how well the model fits other data not present in the training dataset. With the help of this validation dataset, the model can then be fine-tuned by tweaking other parameters, often called “hyper-parameters,” to arrive at a better fit.

After the final tuning of these hyper-parameters, the model has been “trained” (like an employee
learning a new job). This trained model can now infer or predict outcomes. If new data (where only the
input, or X, is known) is fed to the trained model, it will predict output Y.
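The workflow above can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions: the toy data (y roughly equal to 3x + 5 plus noise) is generated, and the learning algorithm is k-nearest-neighbors regression, chosen here only because it has an easy-to-see hyper-parameter (k, the number of neighbors averaged). The eBook does not prescribe any particular algorithm.

```python
import random

def split_80_20(pairs, seed=0):
    """Shuffle (x, y) pairs and split them 80:20 into training and validation sets."""
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * 0.8)
    return shuffled[:cut], shuffled[cut:]

def knn_predict(train, x, k):
    """Predict y for input x as the average y of the k closest training inputs."""
    nearest = sorted(train, key=lambda pair: abs(pair[0] - x))[:k]
    return sum(y for _, y in nearest) / k

def validation_error(train, valid, k):
    """Mean absolute error on the withheld validation data for a given k."""
    return sum(abs(knn_predict(train, x, k) - y) for x, y in valid) / len(valid)

# Hypothetical data: y is roughly 3x + 5 with some random noise
noise = random.Random(42)
data = [(x, 3 * x + 5 + noise.uniform(-2, 2)) for x in range(50)]

train, valid = split_80_20(data)

# Tune the hyper-parameter k: keep the value with the lowest validation error
best_k = min([1, 3, 5, 9], key=lambda k: validation_error(train, valid, k))

# The "trained" model can now predict Y for a new X
print("best k:", best_k)
print("prediction for X = 20.5:", round(knn_predict(train, 20.5, best_k), 2))
```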

Types of Machine Learning
Computers learn in different ways, giving rise to multiple types of Machine Learning, such as Supervised Learning, Unsupervised Learning, and Reinforcement Learning.

The following bubble chart shows use cases for various types of ML. For example, regression
learning models can be used for predicting population growth, estimating life expectancy, forecasting
markets, and much more.

[Figure: Machine Learning Bubble Chart. Supervised Learning: Classification (image classification, customer retention, fraud detection, diagnostics) and Regression (advertising popularity prediction, population growth prediction, weather forecasting, market forecasting, estimating life expectancy). Unsupervised Learning: Dimensionality Reduction (meaningful compression, structure discovery, big data visualization, feature elicitation) and Clustering (recommender systems, targeted marketing, customer segmentation). Reinforcement Learning: real-time decisions, game AI, robot navigation, skill acquisition, learning tasks.]
Source: Data Science Central

Supervised Learning

[Figure: Supervised Learning use cases. Classification: image classification, customer retention, fraud detection, diagnostics. Regression: advertising popularity prediction, population growth prediction, weather forecasting, market forecasting, estimating life expectancy.]

In Supervised Learning, labeled/tagged data is fed to the algorithm. The data is tagged by a human, for example, as “car” or “dollars.” Both the input and the output data (labels/tags) are known. The computer/machine ingests this data and generates a function/formula.

Thus, Supervised Learning is about learning a function (f) that maps an input to an output based on pre-labeled data pairs. The goal is to approximate the mapping function so well that, given new input data (x’), the output for that data can be predicted as y’ = f(x’).

It is called “supervised” learning because the process of an algorithm learning from the training dataset can be thought of as a teacher supervising the learning process*. In Supervised Learning, we know the correct answers, in the form of labeled data pairs. The algorithm iteratively makes predictions on the training data and is corrected by the teacher. Learning stops when the algorithm achieves an acceptable level of performance. Supervised Learning problems can be further grouped into Regression and Classification.
* Source - Wikipedia.org
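The predict-and-correct loop described above can be sketched as gradient descent on a straight-line model. This is a minimal, assumption-laden illustration: the labeled pairs are generated from y = 2x + 1, the model is f(x) = w * x + b, the “teacher’s correction” is the gradient of the mean squared error, and the learning rate and “acceptable” error threshold are arbitrary values chosen for this example.

```python
# Hypothetical labeled pairs (x, y); the "correct answers" come from y = 2x + 1
pairs = [(x, 2 * x + 1) for x in range(10)]

w, b = 0.0, 0.0           # parameters of the hypothesis f(x) = w * x + b
learning_rate = 0.01      # size of each correction (assumed value)
acceptable_error = 1e-4   # stop once the mean squared error is below this (assumed value)

for step in range(100_000):
    # Make predictions on the training data and compare them with the labels
    errors = [(w * x + b) - y for x, y in pairs]
    mse = sum(e * e for e in errors) / len(pairs)
    if mse < acceptable_error:
        break  # acceptable level of performance reached
    # "Correction by the teacher": move w and b against the gradient of the error
    grad_w = 2 * sum(e * x for e, (x, _) in zip(errors, pairs)) / len(pairs)
    grad_b = 2 * sum(errors) / len(pairs)
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

print(f"learned f(x) = {w:.2f} * x + {b:.2f} after {step} corrections")
print("predicted y' for new x' = 12:", round(w * 12 + b, 2))
```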

Regression
In a Regression problem, the output variable is a real value, such as “dollars” or “weight.” Here, the result is predicted on a continuous scale; that is, the input variable is mapped to some continuous function. For example, if somebody is asked to predict the amount of revenue generated by a $1,000 advertising budget, they wouldn’t know where to start. But if they are given paired data (the revenues that previous advertising budgets have generated), they can generate a hypothesis (model).

[Figure: Revenue Prediction and Advertising Budget. Scatter plot of revenue generated in $ (0 to 35,000) against advertising budget ($0 to $1,500), with a fitted line.]

As presented in the graph, the dots represent the input-output pairs of “advertising budget” to “revenue generated.” The line represents the continuous output hypothesized by the model; in fact, it is a function. Now, if the person maps the new input variable ($1,000 advertising budget) onto this continuous function, they will be able to predict the approximate revenue that an advertising budget of $1,000 might deliver, which in this case is about $22,500.
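Fitting that line is a job for simple linear regression. Below is a minimal Python sketch using ordinary least squares; the five budget/revenue pairs are invented for illustration and are not the points plotted in the chart, so the printed prediction is illustrative only.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one input: returns (slope, intercept) of the line
    that minimizes the sum of squared vertical distances to the points."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_y - slope * mean_x

# Hypothetical paired data: advertising budget ($) -> revenue generated ($)
budgets = [250, 500, 750, 1250, 1500]
revenues = [7000, 12000, 18000, 27000, 33000]

slope, intercept = fit_line(budgets, revenues)
predicted = slope * 1000 + intercept  # map the new input ($1,000) onto the fitted line
print(f"revenue = {slope:.2f} * budget + {intercept:.2f}")
print(f"predicted revenue for a $1,000 budget: ${predicted:,.0f}")
```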

Classification
In a classification problem, the input variables are mapped into discrete categories. The simplest type of classification is binary, where there are only two choices: Yes or no? This or that? Some examples:

• Is this email spam or not? Discrete categories: Spam or No spam.
• Is this pet a dog or a cat? Discrete categories: Dog or Cat.

In addition to binary classification, there is multi-class classification, where the data can be put into more than two classes:

• Coffee cup size: Small, Medium, or Large.
• T-shirt size: XS, S, M, L, XL.

In summary, Classification separates the data, Regression fits data!

[Figure: Classification vs. Regression]
(credit: Deep Math machine learning.ai)
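One simple way to turn labeled examples into discrete categories is a nearest-centroid classifier: average the examples of each class, then assign a new input to the class whose average is closest. The sketch below is our own illustration, not a method from this eBook; the cup volumes and the spam feature (a count of suspicious words per email) are hypothetical. The same code handles the binary case (two labels) and the multi-class case (three labels).

```python
def train_centroids(examples):
    """examples: list of (feature_value, label). Returns {label: mean feature value}."""
    sums, counts = {}, {}
    for value, label in examples:
        sums[label] = sums.get(label, 0.0) + value
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}

def classify(centroids, value):
    """Return the label (a discrete category) whose centroid is closest to value."""
    return min(centroids, key=lambda label: abs(centroids[label] - value))

# Binary: hypothetical count of suspicious words in an email -> Spam / No spam
spam_model = train_centroids([(0, "No spam"), (1, "No spam"), (6, "Spam"), (8, "Spam")])

# Multi-class: hypothetical cup volume in millilitres -> Small / Medium / Large
cup_model = train_centroids([(236, "Small"), (250, "Small"), (354, "Medium"),
                             (360, "Medium"), (473, "Large"), (480, "Large")])

print(classify(spam_model, 5))    # Spam
print(classify(cup_model, 340))   # Medium
print(classify(cup_model, 500))   # Large
```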

Unsupervised Learning

[Figure: Unsupervised Learning use cases. Dimensionality Reduction: meaningful compression, structure discovery, big data visualization, feature elicitation. Clustering: recommender systems, targeted marketing, customer segmentation.]

In unsupervised learning, the data is not labeled, so the algorithm acts on the data without guidance. The algorithm finds natural groupings and patterns within the presented data; that is, it identifies hidden structure, either by clustering the data or by reducing its dimensionality.

Thus, unsupervised learning looks for previously undetected patterns in a dataset with no preexisting labels and with a minimum of human supervision. Unlike supervised learning, there are no correct answers and there is no teacher. Algorithms are left to their own devices to discover and present interesting structures in the data.

Two common kinds of Unsupervised Learning are Clustering and Dimensionality Reduction.

Clustering
Clustering is the process of using algorithms to identify how different types of data are related. In
Clustering, the machine is fed a dataset and asked to identify some structure. A clustering algorithm
then finds natural groupings between objects within the given dataset. The algorithm groups the
data into one or more data clusters based upon similarities, differences, or some other complex
relationships between data objects.

In the diagram below, the marketer is not sure of the target customer for the product. Using a clustering approach, the marketer identifies similarities among prospects and groups them into clusters based on attributes such as gender, income, age, location, and so on. Once the clustering is done, the target market can be segmented. Customizing the marketing activity for specific segments delivers better business outcomes.

[Figure: Trying to determine appropriate audience → Using clustering algorithms on customer base → Selling the product to a targeted audience]

In addition to market segmentation, some other use cases for Clustering include identifying fraudulent/criminal activities, identifying fake news, spam filtering, and recommendation systems.
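A classic clustering algorithm is k-means, which the eBook does not name specifically; we use it here as a representative example. The sketch assumes hypothetical customer records (age, annual income in $ thousands) and a chosen number of clusters, k = 3. It starts from a simple farthest-point initialization, then alternates between assigning each customer to the nearest cluster centre and moving each centre to the mean of its customers. For brevity the features are not rescaled, although real pipelines usually would rescale them.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two 2-D points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2

def kmeans(points, k, max_iterations=100):
    # Farthest-point initialization: start with the first point, then repeatedly add
    # the point that is farthest from all centres chosen so far.
    centres = [points[0]]
    while len(centres) < k:
        centres.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centres)))

    assignment = None
    for _ in range(max_iterations):
        # Assignment step: each point joins the cluster of its nearest centre
        new_assignment = [min(range(k), key=lambda c: sq_dist(p, centres[c])) for p in points]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Update step: move each centre to the mean of the points assigned to it
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centres[c] = (sum(p[0] for p in members) / len(members),
                              sum(p[1] for p in members) / len(members))
    return centres, assignment

# Hypothetical customers: (age, annual income in $ thousands)
customers = [(22, 25), (25, 30), (24, 28), (23, 27),
             (40, 80), (42, 85), (45, 82), (41, 78),
             (60, 140), (62, 150), (58, 145), (65, 155)]

centres, segments = kmeans(customers, k=3)
for c, centre in enumerate(centres):
    members = [cust for cust, s in zip(customers, segments) if s == c]
    print(f"segment {c}: centre {centre}, customers {members}")
```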

Dimensionality Reduction
Dimensionality Reduction is the process of reducing the number of features/variables in a dataset under consideration, thus obtaining a smaller set of principal variables. Here, the aim is to discard the features/variables that are not relevant to the target variable and can potentially be considered noise.
Feature selection is one type of Dimensionality Reduction. It is the process of identifying and selecting the features that are relevant to the target variable. It can be done either manually, using common knowledge, or programmatically, using various tools.

Example: Let’s say you are building a model that predicts people’s eyesight and you have a large
amount of data that describes each person in detail — their biometrics, habits, demographic data,
lifestyle information, details of their clothing preferences, etc.

We can safely assume that the color of a person’s clothes or brand of shoes won’t be of much help in predicting the person’s eyesight. So, these fields can be dropped without hesitation. Consider lifestyle. Do we think the number of hours they spend in front of a screen affects their eyesight? Of course, we do! That will be a factor in predicting their eyesight. By making simple manual feature selections, we have reduced the dimensionality of the given data.

This was possible because the unhelpful features were obvious or could be easily inferred based
upon common knowledge. In case these features are not obvious, various tools can be used to aid
the feature selection.

[Figure: of the many features (brand used, clothing preference, habits, demographic data, biometrics, lifestyle, no. of screen hours), only “no. of screen hours” is kept to predict people’s eyesight]
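When the unhelpful features are not obvious, one simple programmatic tool is a correlation filter: score each feature by how strongly it correlates with the target and keep only those above a threshold. The Python sketch below uses hypothetical eyesight data and an arbitrary threshold of 0.5; the feature names and values are invented. A correlation filter only detects linear relationships, so treat it as a rough first pass rather than a complete feature-selection method.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical records for 8 people; the target is an eyesight score (higher is better)
features = {
    "screen_hours": [1, 2, 3, 4, 5, 6, 7, 8],
    "shoe_brand_code": [2, 5, 1, 4, 4, 1, 5, 2],
}
eyesight = [9.5, 9.0, 8.6, 8.0, 7.4, 7.1, 6.5, 6.0]

THRESHOLD = 0.5  # keep features whose |correlation| with the target exceeds this
scores = {name: pearson(values, eyesight) for name, values in features.items()}
kept = [name for name, r in scores.items() if abs(r) > THRESHOLD]

print(scores)
print("kept features:", kept)  # the dataset shrinks from 2 columns to 1
```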

Reinforcement Learning
[Figure: Reinforcement Learning use cases: game AI, skill acquisition, learning tasks, robot navigation, real-time decisions]

Of all the types of Machine Learning, Reinforcement Learning (RL) is closest to the kind of learning that humans and other animals do. As infants and toddlers, humans learn by interacting with their environment. This interaction produces a wealth of information about cause and effect, about action and reaction, and about the steps involved in achieving goals, however small.

This back and forth with the environment continues well into adulthood. Whether we are learning to ride a bicycle or having a conversation with our colleagues, we are aware of how the environment responds to our actions, and we seek to influence what happens by modulating our actions and behavior. In short, humans learn by trial and error. Reinforcement Learning is conceptually the same, but it is a computational approach to learning by performing actions.

[Figure: the AGENT sends ACTIONS to the ENVIRONMENT, which returns OBSERVATIONS and REWARDS to the agent]

Here is the typical framing of a Reinforcement Learning scenario: an agent takes actions in an environment. The actions are interpreted into a reward and an observation (a representation of the state), which are fed back to the agent. Learning happens as the agent performs sequences of actions, choosing them so as to maximize the returned reward. The goal of a Reinforcement Learning agent is to collect as much reward as possible over time.

In the most interesting cases, actions may affect not only the immediate reward but also the next situation and, through that, all subsequent rewards. These two characteristics, trial-and-error search and delayed reward, are the most important distinguishing features of RL. The model needs to pay attention not only to the immediate reward but to the overall reward. RL is used in robot control, elevator scheduling, telecommunications, checkers, optimizing the behavior of self-driving cars, and much more.
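A classic way to implement this trial-and-error loop is tabular Q-learning, which the eBook does not name; we use it as a representative example. The sketch below assumes a made-up environment: a corridor of five cells in which the agent earns a reward only when it reaches the last cell, so the reward is delayed. The learning rate, discount factor, exploration rate, and episode count are arbitrary illustrative values.

```python
import random

N_CELLS, GOAL = 5, 4
ACTIONS = (-1, +1)  # move left or move right

def environment_step(state, action):
    """The environment: applies an action and returns (next_state, reward, done)."""
    next_state = min(max(state + action, 0), N_CELLS - 1)
    reward = 1.0 if next_state == GOAL else 0.0  # reward only at the goal (delayed)
    return next_state, reward, next_state == GOAL

rng = random.Random(0)
q = {(s, a): 0.0 for s in range(N_CELLS) for a in ACTIONS}  # estimated long-term reward
alpha, gamma, epsilon = 0.5, 0.9, 0.2  # learning rate, discount factor, exploration rate

def greedy(state):
    """Best-known action for a state; ties are broken at random."""
    best = max(q[(state, a)] for a in ACTIONS)
    return rng.choice([a for a in ACTIONS if q[(state, a)] == best])

for episode in range(200):
    state, done = 0, False
    while not done:
        # Trial and error: mostly exploit what has been learned, sometimes explore
        action = rng.choice(ACTIONS) if rng.random() < epsilon else greedy(state)
        next_state, reward, done = environment_step(state, action)
        # Update: immediate reward plus the discounted value of the best next action
        future = 0.0 if done else max(q[(next_state, a)] for a in ACTIONS)
        q[(state, action)] += alpha * (reward + gamma * future - q[(state, action)])
        state = next_state

policy = [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(GOAL)]
print("learned action in cells 0-3:", policy)  # +1 means "move right"
```

The discount factor is what lets the single delayed reward at the goal propagate back to the earlier cells, so the agent values moving right even where the immediate reward is zero.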

What is Deep Learning?
Deep Learning (DL), a type of Machine Learning, is in essence a newer name for neural networks with many layers. Modeled loosely on the human brain, neural networks consist of interconnected “neurons” (software-based calculators) organized in layers. In a basic (feedforward) network, the data passes through multiple layers of densely packed artificial neurons in one direction. The neural network processes a huge amount of data, learns complex features of the data, and then uses what it has learned to make determinations about new data and to provide more accurate results.

Since around 2010, modern GPUs, courtesy of the computer gaming industry, have enabled the one-layer networks of yesteryear to blossom into the sophisticated 10-, 25-, even 50-layer networks of today. That’s what the “deep” in “deep learning” refers to: the depth of the network’s layers. Currently, DL is responsible for the best-performing systems in almost every area of AI. Deep Learning encompasses several architectures, each better suited to particular tasks. Some of the popular Deep Learning models are CNNs, RNNs, LSTMs, and Transformers.
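To show what “data passing through layers in one direction” means, here is a forward pass through a tiny fully connected network in plain Python. The layer sizes (2 inputs, hidden layers of 3 and 2 neurons, 1 output), the ReLU and sigmoid activations, and all weights are hand-picked assumptions. In a real network the weights would be learned from data during training (typically by backpropagation with gradient descent), which this sketch does not show.

```python
import math

def relu(z):
    return max(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def layer(inputs, weights, biases, activation):
    """Each neuron: weighted sum of all inputs plus a bias, passed through an activation."""
    return [activation(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

# Hypothetical, hand-picked (untrained) weights: one entry per layer
network = [
    ([[0.5, -0.2], [0.3, 0.8], [-0.6, 0.1]], [0.0, 0.1, 0.0], relu),  # 2 inputs -> 3 neurons
    ([[0.7, -0.4, 0.2], [0.1, 0.9, -0.3]], [0.0, 0.0], relu),         # 3 -> 2 neurons
    ([[1.0, -1.0]], [0.0], sigmoid),                                   # 2 -> 1 output
]

def forward(x, network):
    """Feed the data through every layer in turn, from input to output."""
    for weights, biases, activation in network:
        x = layer(x, weights, biases, activation)
    return x

print(forward([1.0, 2.0], network))
```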

[Figure: Timeline and Hierarchy of AI, ML and DL. ARTIFICIAL INTELLIGENCE: engineering of machines that mimic cognitive functions. MACHINE LEARNING: ability to perform tasks without explicit instructions, relying on patterns. DEEP LEARNING: machine learning based on artificial neural networks.]
Source: Data Science Central

Convolutional Neural Network (CNN)
A CNN is a multi-layer neural network used to analyze images for image classification, segmentation, or object detection. CNNs work by reducing an image to its key features and using the combined probabilities of the identified features appearing together to determine a classification. (The “convolution” part of the name comes from the mathematical operation of that name, which combines two functions so that one modifies the shape of the other.) These algorithms are increasingly being used for tasks such as facial recognition, image classification, or video analysis.

When a CNN is used to classify images as, say, either dogs or cats, it can automatically find the features that distinguish one species from the other and classify the images for you. CNNs are often used in diagnosing medical scans, understanding customer brand perception and usage, or detecting defective products on a production line.

[Figure: a CNN takes an image as input and outputs the label CAT rather than DOG]
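The core operation can be sketched in plain Python. The function below slides a small kernel over an image and sums element-wise products, which is what a convolutional layer computes (strictly, deep learning libraries implement cross-correlation; that is, the kernel is not flipped). The 4×6 “image” and the vertical-edge kernel are hand-made assumptions; in a real CNN the kernel values are learned during training, and many kernels are applied in each layer.

```python
def convolve2d(image, kernel):
    """'Valid' 2-D convolution (cross-correlation form, kernel not flipped): slide the
    kernel over the image; each output value is the sum of element-wise products of
    the kernel and the image patch beneath it."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(kernel[i][j] * image[r + i][c + j] for i in range(kh) for j in range(kw))
             for c in range(out_w)]
            for r in range(out_h)]

# Hypothetical grayscale image: dark on the left (0), bright on the right (9)
image = [[0, 0, 0, 9, 9, 9] for _ in range(4)]

# Hand-made kernel that responds to dark-to-bright vertical edges
vertical_edge = [[-1, 0, 1],
                 [-1, 0, 1],
                 [-1, 0, 1]]

feature_map = convolve2d(image, vertical_edge)
for row in feature_map:
    print(row)  # large values mark where the edge is
```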

Recurrent Neural Network (RNN)
A Recurrent Neural Network (RNN) is a multi-layer neural network that learns from a data sequence and provides output as a number or another sequence. It is used to analyze sequential input, such as text, speech, or videos, for classification and prediction purposes.

As shown in the following example, RNNs are used in language modeling and text generation. Here, a language model captures the structure of a large body of text and can write additional text in the same tone or style. Models work by predicting the most suitable next character, word, or phrase in the generated text. RNNs are also used in language translation, speech recognition, generating image descriptions, time-series anomaly detection, and video tagging.

Though extensively used early on, RNNs have a couple of big issues that are making them lose their luster quickly. First, RNNs are not very efficient at handling long sequences; the models tend to forget the contents of distant positions. Second, they process data sequentially, one step after another, which makes training the model slow and hard to parallelize.

Step 1: seed sequence “the man is walking” → predicted word “down”
Step 2: seed sequence “the man is walking down” → predicted word “the”
Step 3: seed sequence “the man is walking down the” → predicted word “street”
Step 4: seed sequence “the man is walking down the street” → predicted word “.”
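A minimal sketch of the recurrence, and of why distant positions get forgotten: a single-unit RNN with hand-picked weights (our assumptions, not trained values) updates its hidden state once per element, h_t = tanh(w_x * x_t + w_h * h_(t-1) + b). We feed two sequences that differ only in their first element and measure how different the final hidden states are as the sequences grow longer.

```python
import math

def rnn_final_state(sequence, w_x=1.0, w_h=0.5, b=0.0):
    """Single-unit RNN: each step mixes the current input with the previous hidden
    state, so elements must be processed one after another, in order."""
    h = 0.0
    for x in sequence:
        h = math.tanh(w_x * x + w_h * h + b)
    return h

gaps = {}
for length in (2, 5, 20):
    rest = [0.0] * (length - 1)  # identical tail after the first element
    gaps[length] = abs(rnn_final_state([1.0] + rest) - rnn_final_state([-1.0] + rest))
    print(f"length {length:2d}: difference in final state = {gaps[length]:.6f}")
```

Because tanh never amplifies a difference, each step shrinks the first input’s influence by at least the factor |w_h| = 0.5 here, so after 20 steps it has all but vanished. This is a simplified picture of the forgetting problem described above.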

Long Short-Term Memory (LSTM)

Long Short-Term Memory (LSTM) architecture, often referred to as a fancy RNN, offers an improvement over the basic RNN. LSTM units include a “memory cell” that uses “gate mechanisms” to maintain information in memory for long periods of time. That makes them better at handling long sequences such as time series. However, just like RNNs, LSTMs cannot be trained in parallel across the time steps of a sequence. Well, “Transformers” to the rescue!
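For comparison, here is a minimal single-unit LSTM in Python using the standard gate equations: forget gate f, input gate i, output gate o, and candidate g, with c = f * c + i * g and h = o * tanh(c). All parameter values are hand-picked assumptions, chosen so that the forget gate stays close to 1. We feed two 20-step sequences that differ only in their first element; because the memory cell decays slowly, that first element still visibly affects the final state.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical hand-picked parameters (w_x, w_h, b) for each part of a single-unit LSTM.
# The large forget-gate bias keeps f close to 1, so the memory cell decays slowly.
PARAMS = {
    "forget": (0.0, 0.0, 5.0),
    "input": (0.0, 0.0, 0.0),
    "output": (0.0, 0.0, 0.0),
    "candidate": (2.0, 0.0, 0.0),
}

def unit(name, x, h, activation):
    w_x, w_h, b = PARAMS[name]
    return activation(w_x * x + w_h * h + b)

def lstm_final_state(sequence):
    h, c = 0.0, 0.0  # hidden state and memory cell
    for x in sequence:
        f = unit("forget", x, h, sigmoid)       # how much old memory to keep
        i = unit("input", x, h, sigmoid)        # how much new information to write
        o = unit("output", x, h, sigmoid)       # how much memory to expose as output
        g = unit("candidate", x, h, math.tanh)  # the new candidate information
        c = f * c + i * g                       # update the memory cell
        h = o * math.tanh(c)                    # new hidden state
    return h

rest = [0.0] * 19
gap = abs(lstm_final_state([1.0] + rest) - lstm_final_state([-1.0] + rest))
print(f"difference in final state after 20 steps: {gap:.3f}")
```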

Transformers
Introduced in 2017 in a paper titled “Attention Is All You Need,” Transformers are one of the most powerful classes of models to rock the Natural Language Processing (NLP) world. Unlike RNNs and LSTMs, Transformers can be trained in parallel. This allows them to leverage the power of GPUs and train models on huge datasets quickly, thus producing superlative results. The innovation of Transformers is the result of two main features: positional encoding and self-attention.

Positional encoding makes the order of each word in the sentence part of the word’s representation itself. Example: YOUNG CATS ARE CALLED KITTENS. Each word in this input sentence is tagged with its position in the sentence: (young-1) (cats-2) (are-3) (called-4) (kittens-5).
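In the original Transformer paper, the position is not literally appended as a number; each position gets a vector of sine and cosine values that is added to the word’s embedding vector. A minimal Python sketch of that sinusoidal encoding follows. The model dimension of 4 is an arbitrary small value chosen for readability (real models use hundreds of dimensions), and positions are counted from 0 rather than 1.

```python
import math

def positional_encoding(num_positions, d_model):
    """Sinusoidal positional encoding: for even index 2i,
    PE[pos][2i] = sin(pos / 10000**(2i / d_model)) and PE[pos][2i + 1] = cos(same angle)."""
    table = []
    for pos in range(num_positions):
        row = []
        for two_i in range(0, d_model, 2):
            angle = pos / (10000 ** (two_i / d_model))
            row.append(math.sin(angle))
            if two_i + 1 < d_model:
                row.append(math.cos(angle))
        table.append(row)
    return table

words = ["young", "cats", "are", "called", "kittens"]
pe = positional_encoding(len(words), d_model=4)
for pos, word in enumerate(words):
    print(word, [round(v, 3) for v in pe[pos]])
```

In a full model, each row is added element-wise to the corresponding word’s embedding, so the same word at two different positions ends up with two different input vectors.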

Self-attention allows a neural network to understand a word in the context of the words around it. The self-attention mechanism allows the inputs to interact with each other (self) and find out which ones they should pay more attention to (attention). The outputs are aggregates of these interactions and attention scores.

Example: Depending on whether the last word of the sentence is “full” or “empty,” “it” refers to different things, and the attention is directed to different words.

1. Michael poured water from the jar into the cup until it was full. (Here, “it” attends most to “cup.”)
2. Michael poured water from the jar into the cup until it was empty. (Here, “it” attends most to “jar.”)
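The core computation behind self-attention is scaled dot-product attention, softmax(Q Kᵀ / √d) V. The Python sketch below simplifies it by using each token’s vector directly as its query, key, and value; a real Transformer first multiplies the vectors by learned matrices and uses many dimensions and attention heads. The three 2-D token vectors are hypothetical: we deliberately placed “it” close to “cup” to mimic the “full” sentence, so the weights illustrate the mechanism, not what a trained model would compute.

```python
import math

def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(vectors):
    """Simplified scaled dot-product self-attention with Q = K = V = vectors."""
    d = len(vectors[0])
    weights = [softmax([sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in vectors])
               for q in vectors]
    outputs = [[sum(w * v[j] for w, v in zip(row, vectors)) for j in range(d)]
               for row in weights]
    return weights, outputs

tokens = ["cup", "jar", "it"]
vectors = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]  # hypothetical token vectors
weights, outputs = self_attention(vectors)
for token, row in zip(tokens, weights):
    print(token, "attends to", dict(zip(tokens, (round(w, 2) for w in row))))
```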

With the model’s understanding of positions and self-attention, the results of using these models have been truly transformational in NLP and beyond: machine translation, document summarization, document generation, named entity recognition, biological sequence analysis, and video understanding.

BERT (Bidirectional Encoder Representations from Transformers), created by Google, and GPT-3 (Generative Pre-trained Transformer 3), created by OpenAI, are two of the most popular Transformer-based models. GPT-3’s training corpus was drawn in part from roughly 45TB of compressed text (measured before filtering), an unusually large amount of data, which gives it a more sophisticated understanding of language.

The scale and scope of Transformers are driving a paradigm shift in the world of AI. As of this writing, the Transformer is the state-of-the-art technique and increasingly the model of choice, replacing RNNs and, for many tasks, CNNs.

Conclusion, and hopefully the beginning!
Now you’ve finished your tour through the major types of machine learning and their most popular use
cases. We’ve defined terms such as Supervised and Unsupervised Learning, Clustering, and
Reinforcement, and explained when “CNN” is not a global news network, and “Transformers” are not
just toys or animation.

You’ve seen examples of how machines learn and how they can apply that learning, to everything
from face recognition to customer service chatbots. But you may still be wondering how this fast-
growing field is relevant to your organization.

In fact, AI and ML promise to streamline and speed up many B2B processes, from supply chain and procurement to software testing, self-diagnosing, and even self-healing server systems, while easing the manual response burden on exhausted support teams and reducing overhead costs.

If you’ve become curious about exploring the applications for AI and ML in your industry, we hope this eBook has better equipped you to have productive conversations with subject matter experts. Let this eBook be the start of your AI/ML education journey. And let it trigger exploration of important use cases in your domain area where you can apply AI/ML for better customer experience and improved business outcomes. And if you’re ready to take the next step and see how Digitate’s AI-based, award-winning enterprise solutions can streamline your business and technology operations, get in touch with us. All the very best!

Special thanks to Stannie Holt for painstaking copy
editing and to Aarti Joshi for the design help. You both
made it read and look much better…

- VS Joshi

© Copyright 2022, Tata Consultancy Services Limited. All Rights Reserved.
