ARTIFICIAL INTELLIGENCE X

Session 4 : DATA EXPLORATION WITH DATA VISUALISATION

SESSION
- What is Data Exploration?
- Data Visualisation
- Ways to Visualise Data

4.1 INTRODUCTION
After acquiring data, it is important to understand the data and its characteristics before an AI model can be prepared using it. To understand the data characteristics, Data Exploration is carried out, which uses many techniques to describe the data collected. In this session, you will learn about one such data exploration technique: data visualisation.

4.2 WHAT IS DATA EXPLORATION?
After data acquisition, the data needs to be cleaned by removing redundant data and handling missing values, and then its characteristics are to be thoroughly understood. This whole process is known as data exploration. Data exploration uses various techniques, such as data visualisation and statistical techniques, to describe the characteristics of a dataset in order to better understand the nature of the data by showing its trends and patterns.

Data exploration is the phase after data acquisition wherein the collected data is cleaned by removing redundant data and handling missing values¹, and then analysed using data visualisation and statistical techniques to understand the nature of the data before it can be converted into AI models.

Data Exploration
Data Exploration is the phase of exploring data with an intention of understanding the nature of data.

Need for Data Exploration
During data acquisition, the data which is gathered from various sources is often in large, unstructured volumes. Thus, it is first cleaned and handled to bring it into a form useful for data analysis, and then data exploration techniques, such as data visualisation, are applied to gain greater insight into the raw data.

The data exploration techniques are applied mainly:
- to visually explore and identify relationships between different data variables,
- to understand the structure of the dataset,
- to identify the presence of data points that differ significantly from other observations (called outliers),
- to obtain the distribution of data values in order to reveal trends, patterns and points of interest.

¹ There are special techniques which help clean data, remove redundant data and handle missing values by filling in some legal, agreed-upon values. This is done so that the data becomes analysable. Covering these techniques here is beyond the scope of the book; however, data visualisation techniques are discussed here.

4.3 DATA VISUALISATION
Data visualisation refers to the process of representing data visually or graphically, by using visual elements like charts, graphs, diagrams and maps etc.
The importance of data visualisation is summarised as follows:
(i) Data visualisation is a powerful way to represent a bulk of data in a collective visual form.
(ii) It is a way to explore data with presentable results.
(iii) Data visualisation makes it easy to interpret and comprehend data.
(iv) It becomes easier to see the trends and relationships in data through data visualisation.
(v) Data visualisation is useful for combining categories of data and thereby reducing data for processing.
(vi) Data visualisation helps in defining the strategy for using data for the AI model to be developed at a later stage.

4.4 WAYS TO VISUALISE DATA
Some visualisation techniques are discussed below:
- Charts: These use an established pattern or theme for displaying data. These may or may not use axes. Examples: scatter plot, bubble chart.
- Graphs: These contain an X and a Y axis, with at least one showing numerical data. Examples: line graph, pie graph, bar graph, histogram.
- These are used for visualising geospatial (geographic) data. Examples: choropleth, heat map.
- These display data over a period of time, with a start and a finish time. Example: timeline.
- These demonstrate how data is related within a network. Example: node-link diagram.
- These visually display textual data in multiple aesthetically pleasing ways. Example: word cloud.

Let us now briefly talk about these data visualisation techniques, one by one.

1. SCATTER CHART (Used with numeric type of data)
An XY (scatter) chart either shows the relationships among the numeric values in several data series or plots two groups of numbers as one series of XY coordinates.
How to draw?
The scatter chart is drawn by plotting the independent variable on the horizontal axis (X), the dependent variable on the vertical axis (Y) and then by marking data points as per their XY values.
Figure 4.1 A Scatter Chart (Sales by Company)

2. BUBBLE CHART (Used with numeric type of data)
A bubble chart is primarily used to depict and show relationships between numeric variables with marker size as an additional dimension. A bigger marker means a bigger value.
How to draw?
The bubble chart is drawn by plotting the independent variable on the horizontal axis (X), the dependent variable on the vertical axis (Y) and then by marking bubbles at their XY values. A third value for each point determines the bubble size.
Figure 4.2 A Bubble Chart

3. LINE GRAPH (Used with numeric type of data)
A line chart shows trends in data at equal intervals. Line charts are useful for depicting the change in a value over a period of time.
How to draw?
The line chart is drawn by plotting the independent variable on the horizontal axis (X), the dependent variable on the vertical axis (Y) and then by marking data points as per their XY values. Then a line is drawn by joining the marked data points.
Figure 4.3 A Line Chart (Sales By Company)
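The "How to draw?" steps for the scatter chart (independent variable on X, dependent variable on Y, one mark per XY pair) can be sketched in plain Python without any charting library. This is only a minimal text-mode sketch under our own assumptions: the function name `text_scatter`, the grid size and the sample points are hypothetical, and tools such as MS Excel or Google Charts draw the same mapping graphically. The same coordinate mapping is also the first step of a line chart, before the marked points are joined.

```python
def text_scatter(points, width=20, height=10, mark="*"):
    """Place (x, y) points on a character grid: X runs left to right, Y bottom to top."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    grid = [[" "] * width for _ in range(height)]
    for x, y in points:
        # Scale each value into the grid; a constant axis collapses to column/row 0.
        col = round((x - x_min) / (x_max - x_min) * (width - 1)) if x_max > x_min else 0
        row = round((y - y_min) / (y_max - y_min) * (height - 1)) if y_max > y_min else 0
        grid[height - 1 - row][col] = mark  # printed row 0 is the top of the chart
    # Left edge stands for the Y axis, bottom line for the X axis.
    return ["|" + "".join(cells) for cells in grid] + ["+" + "-" * width]


# Hypothetical data: (advertising spend, sales) for six showrooms
data = [(1, 12), (2, 18), (3, 21), (4, 30), (5, 33), (6, 41)]
for line in text_scatter(data):
    print(line)
```

Points that rise from bottom-left to top-right, as in this sample, are the visual sign of a positive relationship between the two variables.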
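For the bubble chart, the extra step is turning the third value into a marker size. The text only says that a bigger marker means a bigger value; the sketch below assumes the common convention of making the bubble's area, rather than its radius, proportional to the value, so the radius grows with the square root. The function name `bubble_radii`, the `max_radius` default and the data are hypothetical, and the values are assumed to be non-negative.

```python
import math


def bubble_radii(sizes, max_radius=30.0):
    """Return one radius per value so that bubble AREA is proportional to the value."""
    largest = max(sizes)
    # area = pi * r**2 is proportional to value, so r is proportional to sqrt(value)
    return [max_radius * math.sqrt(value / largest) for value in sizes]


# Hypothetical (x, y, third value) records: the third value sets the bubble size
records = [(10, 20, 100), (25, 35, 400), (40, 50, 25)]
radii = bubble_radii([size for _, _, size in records])
for (x, y, size), radius in zip(records, radii):
    print(f"bubble at ({x}, {y}) for value {size}: radius {radius:.1f}")
```

Scaling the radius directly would make a value that is twice as large look about four times bigger (area grows with the square of the radius), which is why sizing by area is a frequent choice.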
4. PIE GRAPH (Used with numeric type of data)
A pie chart shows the proportional size of items that make up a single data series to the sum of the items.
How to draw?
The pie chart represents a single data series, the whole of which represents a full circle (360°). Each data value is calculated as a percentage of the whole and drawn as a slice of the circle.
Figure 4.4 A Pie Chart (Sales by Company)

5. BAR GRAPH (Used with numeric type of data)
A bar chart illustrates comparisons among individual items, mainly of number types.
How to draw?
The bar chart is drawn by plotting the independent variable on the horizontal axis (X), the dependent variable(s) on the vertical axis (Y) and then by marking bars for their Y values.
Figure 4.5 A Bar Chart (Sales by Company; see the difference in appearance of a column chart and a bar chart)

6. HISTOGRAM (Used with numeric type of data)
A histogram is used to summarize discrete or continuous data by showing the number of data points that fall within a specified range of values (called "bins"). Unlike a bar chart, there are no gaps between the bars of a histogram.
How to draw?
Like a bar chart, rectangles of varying height are used to represent the frequency of different values of the continuous variable (Y values). There are no spaces between the rectangles.
Figure 4.6 A Histogram Chart

7. CHOROPLETH (Used with processed numeric data linked with textual units)
Choropleth maps are used with statistical data (numeric, processed data) attached to enumeration units (textual data e.g., countries, provinces, states etc.) to depict data for geographic regions. For example,
• world map of income tax rates, country wise.
• world map of Covid 19 spread, country wise.
• map showing the percentage increase in real estate value, state wise.
How to draw?
In the region map, firstly the statistical values are written in the sub-regions. Then the sub-regions are filled with the corresponding colour for that value.
Figure 4.7 Choropleth Chart (Covid 19 spread all over the world)

8. HEAT MAP (Used with numeric data depicted through colour codes)
A heat map is a graphic representation of data in which values are represented by colours. Some examples of heat maps are:
• A geographical heatmap representing areas of high and low density of a certain parameter (population density, network density, etc.) by displaying data points on a map through different colours.
• A stock index heatmap depicting prevailing trends in the market through colours, e.g., a cold-to-hot colour scheme to indicate which stock options are bullish and which are bearish.
How to draw?
The region is divided into smaller squares. Then each square is filled with the colour code as per the data it is storing.
Figure 4.8 Heat Map (India Climate heat map)

9. TIMELINE (Used to represent all types of data against time)
A Timeline Chart shows a series of events in chronological order. It can be used to depict the order of historical events, critical milestones of a project schedule, and so on.
How to draw?
Draw a horizontal/vertical line with ends marking the start and end dates. Mark the points on the line for each of the events. Mark at each point the date and event.
Figure 4.9 Timeline Chart (A Brief Timeline of Ancient India)
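The pie chart's "How to draw?" rule (each value as a percentage of the whole, with the whole series filling 360°) is simple arithmetic. A minimal sketch, assuming a hypothetical function name `pie_slices` and hypothetical company names and sales figures:

```python
def pie_slices(series):
    """Map each label to (percentage of the total, slice angle in degrees)."""
    total = sum(series.values())
    slices = {}
    for label, value in series.items():
        share = value / total  # fraction of the whole series
        slices[label] = (round(share * 100, 1), round(share * 360, 1))
    return slices


# Hypothetical "Sales by Company" series
sales = {"Company A": 50, "Company B": 30, "Company C": 20}
for label, (percent, angle) in pie_slices(sales).items():
    print(f"{label}: {percent}% of the sales, drawn as a {angle} degree slice")
```

Because every slice is a share of one total, a pie chart suits a single data series; it cannot compare two unrelated series.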
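What separates a histogram from a bar chart is that the values are first grouped into adjacent ranges (bins); counting the data points in each bin gives the heights of the rectangles. This sketch assumes equal-width bins whose lower edge is inclusive, with the final upper edge also counted in the last bin; the function name `histogram_counts` and the marks are hypothetical.

```python
def histogram_counts(values, start, bin_width, num_bins):
    """Count values in equal-width bins [start, start + bin_width), ...; last bin is closed."""
    counts = [0] * num_bins
    end = start + bin_width * num_bins
    for value in values:
        if value < start or value > end:
            continue  # outside the chosen range, so not counted
        index = min(int((value - start) // bin_width), num_bins - 1)
        counts[index] += 1
    return counts


marks = [12, 25, 37, 41, 45, 58, 60, 73, 88, 95, 100]  # hypothetical test marks
counts = histogram_counts(marks, start=0, bin_width=20, num_bins=5)
for i, count in enumerate(counts):
    low, high = i * 20, (i + 1) * 20
    # One row per bin, printed back to back with no gap between bins
    print(f"{low:>3}-{high:<3}|" + "#" * count)
```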
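The choropleth and the heat map both end with the same step: each region or square is filled with a colour chosen from its value. A minimal sketch of that colour-coding step, assuming a hypothetical function `colour_for`, hypothetical thresholds and colour names, and a small grid of made-up temperature readings (a choropleth would apply the same function to one value per state or country):

```python
def colour_for(value, thresholds, colours):
    """Pick a colour: colours[i] is used while value <= thresholds[i]; else the last colour."""
    for limit, colour in zip(thresholds, colours):
        if value <= limit:
            return colour
    return colours[-1]


THRESHOLDS = [15, 25, 35]                     # hypothetical temperature limits in °C
COLOURS = ["blue", "green", "orange", "red"]  # cold-to-hot colour scheme

# Hypothetical heat map: the region divided into a 2 x 3 grid of squares
readings = [
    [12, 18, 27],
    [22, 31, 40],
]
for row in readings:
    print(" ".join(f"{colour_for(v, THRESHOLDS, COLOURS):>6}" for v in row))
```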
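A timeline's "How to draw?" steps amount to sorting the events chronologically, taking the first and last dates as the ends of the line, and marking each event with its date. A minimal text sketch; the function name `timeline` and the project milestones are hypothetical:

```python
from datetime import date


def timeline(events):
    """Return text lines for (date, event) pairs sorted in chronological order."""
    ordered = sorted(events, key=lambda event: event[0])
    start, end = ordered[0][0], ordered[-1][0]
    lines = [f"start {start.isoformat()}"]
    for when, name in ordered:
        lines.append(f"  |-- {when.isoformat()}: {name}")  # one mark per event
    lines.append(f"end   {end.isoformat()}")
    return lines


# Hypothetical project milestones, deliberately listed out of order
milestones = [
    (date(2024, 3, 1), "Model trained"),
    (date(2024, 1, 10), "Problem scoping done"),
    (date(2024, 2, 5), "Data acquired and explored"),
    (date(2024, 4, 15), "Project evaluated"),
]
print("\n".join(timeline(milestones)))
```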
10. NODE LINK DIAGRAM (Used with all types of data)
A node-link diagram shows how things are interconnected through the use of nodes/vertices and link lines to represent their connections and the type of relationships between a group of entities. Points are represented as nodes (vertices) that are linked by lines (edges). These are used in many applications, for example, for analysis of social networks or mapping product sales across geographic areas and many similar ones.
Figure 4.10 Node Link Diagram (A Cricket Family of India)

11. WORD CLOUD (Used with textual data)
The word cloud data visualisation technique represents the frequency of a word within a body of text with its relative size in the cloud. This technique is used on unstructured data as a way to display high- or low-frequency words. The frequency of each word determines its weight, which determines its priority. The words with the highest priority get drawn first, and will be drawn with a larger font-size.
Figure 4.11 Word Cloud

Popular Data Visualisation Tools
There are many software tools available that let you visualise data in various forms. The most commonly used such software are MS Excel, Google Charts etc. However, these days there are many other software tools available that are so strong in data visualisation techniques that they have gained equal popularity. Some such popular, open source data visualisation tools are shown in the adjacent figure 4.12.
Figure 4.12 Some Popular Open-source Data Visualisation Tools (Qlik, FusionCharts, Tableau, IBM Watson, Datawrapper)

Check Point
1. Data Visualisation is carried out during ______ phase of AI project cycle.
(a) Data Acquisition (b) Data Exploration (c) Modelling (d) Problem Scoping
2. Data Visualisation cannot happen before the ______ phase of AI project cycle is over.
(a) Data Acquisition (b) Data Exploration (c) Modelling (d) Problem Scoping
3. Data visualization tools provide an accessible way to see and understand ______ in data.
(a) trends (b) outliers (c) patterns (d) all of these
4. ______ are visual methods of displaying data.
(a) Tables (b) Data sets (c) Charts (d) Histogram
5. What are the common types of data visualization?
(a) Charts (b) Word Cloud (c) Heat Map (d) All of these
6. Data can be visualized using ______.
(a) graphs (b) charts (c) maps (d) all of these
7. What are specific examples of methods to visualize data?
(a) Bubble Chart (b) Pie Chart (c) Scatter Map (d) All of these
8. The importance of data visualization is
(a) Leading the target audience to focus on business insights to discover areas that require attention
(b) Revealing previously unnoticed key points about the data sources to help decision makers compose data analysis reports
(c) Helping decision makers understand how the business data is being interpreted to determine business decisions
(d) All of these
9. What are the benefits of data visualization?
(a) Better analysis (b) Identifying patterns (c) Exploring business insights (d) All of these
10. The ______ is a commonly used term for a value that appears far away and diverges from an overall pattern in a sample.
(a) Data (b) Feature (c) Plotted value (d) Outlier
11. A ______ is a chart used to plot a correlation between two or more variables.
(a) Bar chart (b) Scatter plot (c) Pie chart (d) Bubble chart
12. To show "relationship" between variables, ______ are used.
(a) Bar charts (b) Scatter plots (c) Pie charts (d) Bubble charts
13. For what type of data are histograms usually used?
(a) Continuous Data (b) Random Data (c) Redundant Data (d) Missing Data
14. ______ is a map that represents data through different shades of colours.
(a) Heat map (b) Choropleth (c) Bar graph (d) All of these
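Behind a node-link diagram there is usually a list of links between entities. A minimal sketch of building that structure as an adjacency list, where each link also records the type of relationship; the function name `build_network`, the people and the relationships are hypothetical. A drawing tool would then place each node as a point and each link as a line (edge).

```python
from collections import defaultdict


def build_network(links):
    """Adjacency list: node -> list of (linked node, relationship). Links are two-way."""
    network = defaultdict(list)
    for a, b, relation in links:
        network[a].append((b, relation))
        network[b].append((a, relation))
    return dict(network)


# Hypothetical family / friendship network
links = [
    ("Asha", "Ravi", "sibling"),
    ("Asha", "Meena", "friend"),
    ("Ravi", "Kabir", "colleague"),
]
for node, neighbours in build_network(links).items():
    text = ", ".join(f"{other} ({relation})" for other, relation in neighbours)
    print(f"{node} -> {text}")
```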
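The word cloud rule in section 11 (frequency gives a word its weight; the heaviest words are drawn first and largest) can be sketched with a word counter. Assumptions of our own: a "word" is a run of letters, only the `top` most frequent words are kept, font sizes are scaled linearly between hypothetical `min_size` and `max_size` values, and the text is not empty. The function name `word_cloud_sizes` and the sample text are hypothetical.

```python
import re
from collections import Counter


def word_cloud_sizes(text, top=5, min_size=10, max_size=48):
    """Return (word, frequency, font size) for the most frequent words, highest first."""
    words = re.findall(r"[a-z]+", text.lower())
    ranked = Counter(words).most_common(top)  # highest priority first = drawn first
    highest, lowest = ranked[0][1], ranked[-1][1]
    result = []
    for word, freq in ranked:
        if highest == lowest:
            size = max_size
        else:
            # Scale the frequency linearly into the font-size range
            size = min_size + (freq - lowest) * (max_size - min_size) / (highest - lowest)
        result.append((word, freq, round(size)))
    return result


sample = "security data security privacy data security network"
for word, freq, size in word_cloud_sizes(sample):
    print(f"{word}: appears {freq} times, font size {size}")
```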
UNIT II : AI PROJECT CYCLE

Competency Based Questions
15. During the tough pandemic time, the vaccination data is to be updated and shown regularly. Which of the following is the best way to show the vaccination data of multiple states simultaneously?
(a) Line Chart (b) Scatter Chart (c) Bar Chart (d) Histogram
16. The Government of India has to release its data about the yearly amount spent on Education in the past 5 years. Which of the following is the best way to present this data visually?
(a) Pie Graph (b) Scatter Chart (c) Bar Chart (d) Bubble Chart
17. During the Population Control Summit, you have to make a presentation about the increasing population of the world, the density of people and its impact on natural resources. How would you present the world population showing each country's population along with its density?
(a) Pie Chart (b) Heat Map (c) Bar Chart (d) Bubble Chart
18. Pluto Company's new manager wants to see how the prices of the most selling item have evolved in the past ten years. Which of the following is the best way to present this data visually?
(a) Line Chart (b) Scatter Chart (c) Bar Chart (d) Pie Chart
19. Making a film or drama having multiple main characters puts an added pressure on the production team. The team has to ensure that all the main characters get balanced screen time and a nearly equal share of dialogues. To ensure this, many production teams use some data visualisation charts to ensure that the main characters get an equal share and representation. For instance, the makers of the popular sitcom 'FRIENDS' used a data visualisation chart during shooting to ensure that each of the six characters had an equal number of jokes and dialogues. Which chart could it be to depict the share of dialogues for each of the main six characters?
(a) Pie Chart (b) Line Chart (c) Bar Chart (d) Bubble Chart
20. Global warming has become a major issue of concern and the governments of all countries are seriously working towards it. In the global warming conference, as a student participant, you have to present the global surface temperatures data given to you in the form of a chart/graph. Your graph/chart should track global surface temperatures from 1880 to the 2020s. Which chart type would you pick to show the trend of global surface temperatures?
(a) Line Chart (b) Histogram (c) Bar Chart (d) Pie Chart
21. In the national wellness meet, you have to represent the happiness score of each state visually in such a way that, just by looking at the data, its size is able to tell which state is happier than others. Which of the following would be the best way to show this?
(a) Line Chart (b) Scatter Chart (c) Bar Chart (d) Pie Chart
22. After presenting the budget in the parliament, the government has released a chart/graph where the percentage share of the budget allocated is shown for different parameters such as military, education, agriculture, women and child development, health & family welfare and so forth. Which chart type could it be?
(a) Line Chart (b) Scatter Chart (c) Bar Chart (d) Pie Chart

LET US REVISE
❖ Data Exploration is the phase of exploring data with an intention of understanding the nature of data.
❖ Data visualisation refers to the process of representing data visually or graphically, by using visual elements like charts, graphs, diagrams and maps etc.
❖ Data visualisation is important and useful for understanding the information stored in data.
❖ An XY (scatter) chart either shows the relationships among the numeric values in several data series or plots two groups of numbers as one series of XY coordinates.
❖ A bubble chart is primarily used to depict and show relationships between numeric variables with marker size as an additional dimension. A bigger marker means a bigger value.
❖ A line chart shows trends in data at equal intervals. Line charts are useful for depicting the change in a value over a period of time.
❖ A pie chart shows the proportional size of items that make up a single data series to the sum of the items.
❖ A bar chart illustrates comparisons among individual items, mainly of number types.
❖ A histogram is used to summarize discrete or continuous data by showing the number of data points that fall within a specified range of values (called "bins").
❖ Choropleth maps are used with statistical data (numeric, processed data) attached to enumeration units (textual data e.g., countries, provinces, states etc.) to depict data for geographic regions.
❖ A heat map is a graphic representation of data in which values are represented by colours.
❖ A Timeline Chart shows a series of events in chronological order.
❖ A node-link diagram shows how things are interconnected through the use of nodes/vertices and link lines to represent their connections and the type of relationships between a group of entities.
❖ The word cloud data visualisation technique represents the frequency of a word within a body of text with its relative size in the cloud.

Solved Questions
1. What is Data exploration in AI project cycle?
Ans. The phase of data exploration follows the Data acquisition phase. In this phase, the data is explored and analysed using various techniques to describe the characteristics of the dataset, to better understand the nature of the data by showing the trends and patterns in it.
2. What are the techniques used in Data exploration phase?
Ans. There are many techniques used during data exploration, the most common of which are Data visualisation and statistical techniques.
3. Name some Data visualisation software.
Ans. Some commonly used data visualisation software are: MS Excel, Google Charts, Tableau, Datawrapper, Infogram, IBM Watson etc.
4. What is the need of data visualisation?
Ans. Data visualisation is important and useful for understanding and comprehending the information stored in data. It becomes easier to see the trends and relationships of data through data visualisation. Data visualisation is also useful for combining categories of data and thereby reducing data for processing.
5. Mention some popular Data visualisation techniques.
Ans. (i) Bubble chart (ii) Scatter graph (iii) Line chart (iv) Bar graph (v) Histogram (vi) Pie chart (vii) Choropleth (viii) Heat map (ix) Timeline (x) Node link diagram (xi) Word cloud
GLOSSARY
Data Exploration: The phase (following the Data Acquisition phase) for exploring data with an intention of understanding the nature of data.
Data Visualisation: The process of representing data visually or graphically, by using visual elements like charts, graphs, diagrams and maps etc.
Outliers: Data points that differ significantly from other observations.

Assignment
1. What is Data Exploration?
2. What is the importance of Data Exploration phase?
3. What are the techniques used during Data exploration phase?
4. What is Data visualisation?
5. What is the need for Data visualisation?
6. Name some data visualisation techniques used for depicting statistical data.
7. Name some data visualisation techniques used for depicting numerical data.
8. Name some data visualisation techniques used for depicting textual data.
9. Name a data visualisation technique that uses colours for plotting values.
10. Name a data visualisation technique used to depict chronological order of events.
11. Name a data visualisation technique that depicts the relationships of data among a network.
12. Name a data visualisation technique that can only be used for:
(a) textual data (b) numeric data (c) statistical data
13. Briefly discuss the use of data visualisation techniques listed below:
(i) Bubble chart (ii) Scatter graph (iii) Line chart (iv) Bar graph (v) Histogram (vi) Pie chart (vii) Choropleth (viii) Heat map (ix) Timeline (x) Node link diagram (xi) Word cloud
14. Name some popular Data visualisation tools.

PRACTICAL ASSIGNMENT
For our AI project of Face Mask Detection, our dataset after the data acquisition phase contains many images of different types. This data is not numeric but just images and pictures. Do you think you can use any data visualisation technique discussed above in this session for the dataset containing images? Why? / Why not?

SESSION 5 : MODELLING
- What is Modelling?
- Categories of AI Models
- Supervised/Unsupervised Learning Modelling
- Semi-Supervised Learning/Reinforcement Learning

5.1 INTRODUCTION
After the phases of Data acquisition and Data exploration comes the modelling phase. In this phase, the model for the AI project is created using the dataset and its discoveries (patterns, trends, outliers etc.) so that the prepared model is able to make predictions or decisions. In this session, you will learn about AI modelling, where we shall discuss various AI approaches.

5.2 WHAT IS MODELLING?
Modelling is the phase during which the AI model for the desired outcome is trained using the collected data repeatedly until it starts producing the desired results. Mostly, for this, some publicly available pre-trained AI models are picked, tuned as per requirements and then trained using own data.

Modelling
Modelling (developing and training an AI model) refers to mathematically analysing the data and its inside relationships with the parameters passed, and finding ways through algorithm & repeated training to reach the desired and expected outcome.

The data processed and assessed in the Data exploration phase, after discovering the patterns and trends, is first represented mathematically. The representation of data in mathematical equations and other forms is necessary for developing AI models. This is because the AI model explores and analyses the relationships among data parameters and figures out the ways to reach the intelligent outcome through repeated training.

The mathematical processing is essential for all AI models, whichever way they are developed. In the coming lines, you will learn about the different categories of AI models.

Note
The ability to mathematically describe the relationships between data and parameters forms the core of every AI model.
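As a small illustration of "representing data mathematically" and "repeated training", the sketch below assumes the relationship in the data is a straight line, y = w * x + b, and finds w and b by gradient descent: each training cycle nudges both parameters to reduce the mean squared error. Gradient descent, the learning rate, the number of cycles and the function name `train_line` are our own assumptions, not something the text prescribes; the data points are hypothetical.

```python
def train_line(xs, ys, learning_rate=0.01, epochs=5000):
    """Fit y = w * x + b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):  # each pass is one training cycle
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = 2 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = 2 * sum(errors) / n
        w -= learning_rate * grad_w  # step against the gradient
        b -= learning_rate * grad_b
    return w, b


# Hypothetical data that actually follows y = 2x + 1
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]
w, b = train_line(xs, ys)
print(f"learned model: y = {w:.2f} * x + {b:.2f}")
print(f"prediction for x = 6: {w * 6 + b:.2f}")
```

The learned equation is the "mathematical description of the relationship" that the note above refers to; once it is known, the model can make predictions for inputs it has not seen.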

5.3 CATEGORIES OF AI MODELS
The AI models can be either data driven or model driven. The model driven AI models are mainly rule based, while data driven AI models are mainly learning based (see Fig. 5.1).
Figure 5.1 Types of AI Models
Let us quickly talk about these categories of AI models.

5.3.1 Rule Based AI Model
Also known as model-driven AI model, it is a system which derives decisions about the right answer through explicit representation and rules. For example, in such a system a cat would be explicitly represented as a four-legged animal with two eyes, a nose and a mouth, that is furry (except when not) and that is relatively small (except when not), etc.

Model Driven AI
Rule Based AI (Model driven AI) refers to the branch of AI where models are developed using algorithms having pre-defined labels, rules, patterns and relationships.

In this, an AI system is developed using the predefined labels, rules, patterns or relationships as given by the developer in the algorithm. Thus, the machine follows the algorithm's rules or instructions and performs the given tasks or takes the decisions accordingly.

Note
The rule based AI is used when we have a known or labelled dataset.

Drawbacks of Rule Based AI Models
Although the rule based AI models are comparatively easier to maintain and implement, they also suffer from the following drawbacks:
(i) Lot of manual work. The rule based system requires a lot of manual work, as all the rules governing the decisions must be pre-coded and made available to the system.
(ii) Consumes a lot of time. Creating all possible rules for a system requires a lot of time. The bigger and more complex the system is, the more time-consuming it becomes.
(iii) Suitable only for less complex domains. Complex systems would require a large number of rules. Covering all the combinations and permutations of rules for a complex system is challenging for a rule-based system.
(iv) Limited adaptability and learning capacity. The rules of a rule based system are predefined and do not get updated on their own. Changing rules and incorporating more rules depends upon human capacities, and thus it limits the capabilities and learning capacity of a rule based system.
(v) Static and not scalable. Since rule based systems cannot update their rules on their own, these systems are static and not scalable.

Note
Rule based models are often preferred for limited scale projects that require limited efforts, cost, and updates. In other words, rule based systems are not largely scalable.

5.3.2 Learning Based AI Model
Also known as data-driven AI model, it is a system in which a lot of data is shown and data based questions/answers are asked in order to train the system about the right answer(s). For example, to train a system about recognising cats, it would be shown lots and lots of images of cats and other animals, and be told whether it "guessed" correctly or not. After many (millions of) training cycles, it will "learn" to get it increasingly right.

Data Driven AI
The Learning based AI (Data Driven AI) refers to that branch of AI where models are trained to learn by inputting them tons of data. Here, there are no patterns, rules and relationships predefined by the developer; rather, the machine learns with each new input and comes up with its own algorithm.

Note
The learning based approach is used when data is unknown or random or unlabelled.

Using this style, when an AI system is developed, the relationships or patterns in data are not defined by the developer. Rather, random data is repeatedly input to the machine. The machine analyses each input and tries to figure out patterns and trends out of the input collectively. This style is appropriate when the data to be processed is unlabelled and too random to fit into a frame of common rules and patterns.

Learning based models (Machine Learning (ML) and Deep Learning (DL) models) constantly "adapt" and "evolve" their performance in accordance with the continuous streams of training data. Thus, they automatically learn from more and more data fed and from human experiences, and improve their performance; hence these are dynamic and scalable.

Both Machine Learning (ML) and Deep Learning (DL) fall under the category of learning-based AI:
• Machine Learning (ML) is a branch of AI that enables machines to automatically learn and improve at tasks with experience and by the use of data. ML based machines undergo lots of repetitions of taking data and testing it; they keep track of when things went wrong or right, and keep improving their results. The ML systems can automatically learn and improve without explicitly being programmed. The recommendation systems on music and video streaming services are examples of ML. Machine learning finds patterns in data and uses them to make predictions.
• Deep Learning (DL) is a subset of machine learning where learning takes place through examples. Deep Learning computer-models filter the input data using layers and rules-based algorithms to predict and classify information. Tasks like speech and image recognition are performed through deep learning systems. Driver-less cars are being developed using deep learning technologies.

Machine Learning
Machine Learning (ML) is a branch of AI that enables machines to progressively learn and improve at tasks with experience, without being explicitly programmed, and by the use of data.

Deep Learning
Deep Learning (DL) is a subset of machine learning where learning takes place through examples, by filtering the input data using layers and rules-based algorithms to predict and classify information.
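The cat example from the rule based approach (section 5.3.1) can be written down directly as hand-coded rules. In this minimal sketch the function name `is_cat`, the attribute names and the rules themselves are hypothetical. It also hints at why the drawbacks of rule based models arise: a hairless cat is rejected (each "except when not" case needs yet another rule), and a small furry dog is wrongly accepted.

```python
def is_cat(animal):
    """Hypothetical hand-written rules: the developer, not the data, defines a cat."""
    return bool(
        animal.get("legs") == 4
        and animal.get("eyes") == 2
        and animal.get("furry", False)
        and animal.get("size") == "small"
    )


examples = {
    "house cat": {"legs": 4, "eyes": 2, "furry": True, "size": "small"},
    "hairless cat": {"legs": 4, "eyes": 2, "furry": False, "size": "small"},
    "small dog": {"legs": 4, "eyes": 2, "furry": True, "size": "small"},
}
for name, features in examples.items():
    print(f"{name}: {'cat' if is_cat(features) else 'not a cat'}")
```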
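For contrast, a learning based model is not given the rules; it adjusts itself each time it is told whether its guess was right, as in the cat-recognition example of section 5.3.2. The sketch below uses a perceptron, one of the simplest learning algorithms (our choice; the text does not name any algorithm), with hypothetical yes/no features [meows, barks, furry] and labels 1 = cat, 0 = not a cat. Real ML systems learn from far more data and richer features such as image pixels.

```python
def predict(weights, bias, features):
    """Guess 1 (cat) if the weighted sum is positive, else 0 (not a cat)."""
    total = sum(w * x for w, x in zip(weights, features)) + bias
    return 1 if total > 0 else 0


def train_perceptron(samples, labels, epochs=20, learning_rate=0.1):
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):  # repeated training cycles
        for features, target in zip(samples, labels):
            error = target - predict(weights, bias, features)  # 0 if the guess was right
            # Feedback: the weights only move when the guess was wrong
            weights = [w + learning_rate * error * x for w, x in zip(weights, features)]
            bias += learning_rate * error
    return weights, bias


# Hypothetical labelled examples: [meows, barks, furry]
samples = [[1, 0, 1], [0, 1, 1], [0, 0, 0], [1, 0, 0]]
labels = [1, 0, 0, 1]  # cat, dog, fish, hairless cat
weights, bias = train_perceptron(samples, labels)
print("learned weights:", weights, "bias:", bias)
print("new animal [1, 0, 1] ->", "cat" if predict(weights, bias, [1, 0, 1]) else "not a cat")
```

Unlike the hand-written rules, nobody told this model that fur does not matter; it worked that out from the examples, so the hairless cat is classified correctly.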
UNIT U : Al PROJECT CYCLE A~
Session 5 . MODELLING 97
96 ARTIFICIAL INTElllGENCl--X
5.3.2A Unlabelled and Labelled Data
Before we proceed to the discussion of different learning based approaches, it is important to first talk about labelled and unlabelled data.
• Unlabelled Data. Unlabelled data is a description for pieces of data that have not been tagged with labels identifying characteristics, properties or classifications of data.
Some examples of unlabelled data might include photos, audio recordings, videos, news articles, tweets, x-rays (in case of some medical application), etc. There is no "explanation" for each piece of unlabelled data: it just contains the data, and nothing else.
• Labelled Data. Labelled data is a group of samples that have been marked with one or more labels. Labelling puts meaningful tags on data so that it gives some information or explanation about the data, e.g., if some X-ray images are labelled as tumour, then those X-ray images are not unlabelled any more; rather, now they belong to a category of images that show tumours. Similarly, if some audio-clips are labelled as "speeches", then they clearly give information that these contain speeches of some people. Similarly, if an article is labelled as "news article", it gives information that it contains some sort of news.
After knowing the difference between unlabelled and labelled data, let us now talk about three different learning based approaches, as covered in the coming subsections.

5.3.2B Learning Based Approaches
The learning based models are developed through three different approaches :
•> Supervised learning
•> Unsupervised learning
•> Reinforcement learning
Before we proceed to the detailed discussion of these learning based approaches, let us talk about an analogy.
How would you learn carpentry or wood-working? Well, you can follow any one of the following ways :
(i) Approach 1. You approach a carpenter or experienced wood-worker. Then you learn and follow as he tells/asks you to do, under his supervision. By following his instructions, and after some repetition, you will learn to make wood items.
(ii) Approach 2. No carpenter is there, but an instruction manual is available with some premanufactured pieces. You figure out the use of the pieces by carefully looking at them, and fit and assemble the furniture taking cues from the manual. Repeat with many different items and types and you will start getting a hang of it.
(iii) Approach 3. Neither carpenter nor instruction manual is available. All you have is wooden planks and tools. You will now explore the tools and try to figure out how to reach a specific result. With every correct move and effort, your face lights up and with every wrong move, you register it so as not to repeat it again. After some tries, trials and errors, you will start making sense of that pile of wooden dowels and planks.
The first approach of learning carpentry is Supervised learning, while approach 2 is Unsupervised learning and approach 3 is Reinforcement learning.
After the above analogy, let us now talk about the three different learning based approaches in the coming sections.

5.4 SUPERVISED LEARNING
Supervised learning is a machine learning approach in which a machine, with the help of an algorithm (called the model), learns from a labelled dataset and desired outputs. Using this dataset, it learns to identify the type or class of the data given to it. Later some data is shown to it to test if it can clearly identify the data or not. For example, a labelled dataset of flower images would contain photos of roses tagged as roses, photos of daisies tagged as daisies and so on for other flowers. When shown a new image, the model compares it to the training examples to predict the correct label (see figure below). The model gets feedback about its result as per the desired outputs and, this way, it learns to classify correctly; that is why it is called supervised learning.
Supervised Learning
Supervised Learning is a learning approach of machine where the machine, with the help of an algorithm (the model), learns on a labelled dataset and is later tested with some unlabelled data whose answers are pre-known, to evaluate how accurately it has learnt from the training data.
[Figure 5.2 Supervised Learning: labelled data (training dataset) and labels → AI Model → prediction for test data; flower classes shown include Rose, Daisy, Jasmine, Peony, Orchids and Calendula]
With supervised machine learning, the algorithm (the model) learns from labelled data. The data variables of labelled data have some relation or association and based on that the algorithm learns to make predictions, e.g., through supervised learning, an AI model can learn to identify if a word used is positive (such as Superb, Bravo etc.) or negative (such as pathetic, useless etc.).
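The flower example of supervised learning can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the book's method: each flower is described by two hypothetical measurements (petal length and width, in cm) instead of an image, the numbers are invented, and the model is a 1-nearest-neighbour classifier, one simple way to "compare a new input to the training examples". The names TRAINING_DATA, predict and accuracy are illustrative. The labelled test set plays the role of data whose answers are pre-known.

```python
import math

# Hypothetical labelled training data: (petal_length_cm, petal_width_cm) -> label.
# The measurements are invented for illustration; they are not real botanical data.
TRAINING_DATA = [
    ((4.0, 3.0), "Rose"),
    ((4.5, 3.2), "Rose"),
    ((1.5, 0.4), "Daisy"),
    ((1.8, 0.5), "Daisy"),
    ((2.5, 1.0), "Jasmine"),
    ((2.7, 1.2), "Jasmine"),
]


def predict(features):
    """1-nearest-neighbour: return the label of the closest labelled example."""
    _, label = min(TRAINING_DATA, key=lambda example: math.dist(example[0], features))
    return label


def accuracy(test_data):
    """Fraction of test inputs whose predicted label matches the pre-known answer."""
    correct = sum(predict(x) == y for x, y in test_data)
    return correct / len(test_data)


if __name__ == "__main__":
    test_data = [((4.2, 3.1), "Rose"), ((1.6, 0.45), "Daisy"), ((2.6, 1.1), "Jasmine")]
    print(predict((4.2, 3.1)))   # Rose
    print(accuracy(test_data))   # 1.0
```

A real image classifier would learn from pixel data with a far richer model; what carries over is the workflow: learn from labelled examples, then check accuracy on inputs whose answers are already known.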
InfoBox 5.1 Discrete vs. Continuous Data
You must have heard about whole numbers and decimal numbers. Whole numbers/integers represent discrete values, whereas decimal numbers represent continuous data. Read on.
Discrete data is information that involves integers. Only a limited number of values are possible. The discrete values cannot be subdivided into smaller numbers. For example, the number of players in a team is discrete data. You can count only whole individuals; you can't say that the team has 11.5 players. Examples of discrete data include anything that is countable, e.g., the number of students in a class, number of computers in a lab, number of questions in an exam, number of items in a basket and so forth.
Continuous data is the opposite of discrete data. That is, it can be subdivided into smaller numbers, e.g., between 1 and 2 lie other numbers such as 1.5; between 1.5 and 2 there lie numbers such as 1.6, 1.7, 1.8, 1.9; between 1.7 and 1.8 lies 1.75. This goes on and on. Thus, it is continuous and can be measured on a scale or continuum and can have almost any numeric value. Height, weight, temperature, time and so on, all these things can be measured and hence belong to continuous data.

5.4.1 Types of Supervised Learning
There are two main areas where supervised learning is useful, based on non-continuous (discrete) data or continuous data : Classification problems and Regression problems. If you want to know the difference between discrete and continuous data, you may refer to InfoBox 5.1 that discusses this difference.
(i) Classification. Classification means, as the name suggests, identifying the class of an input value. For example, in the supervised learning model explained above, the training dataset was based on labelled images of flowers. Now upon receiving a new unlabelled image of a flower, if it can identify the class type (which flower type) of the image, it has accurately performed classification.
[Figure: a classification model sorting input items into the classes Circular and Triangular]
The classification problems are based on non-continuous (also called discrete) data.
Some examples of problems that a classification model may answer are :
♦ Is this a picture of a specific animal ?
♦ Is this email Spam or not ?
♦ Is this borrower going to repay their loan or default ?
♦ Is this social media post negative or positive (Predict the sentiment) ?
♦ What is the genre of this song/movie ?
♦ Which type of gene is this ?
♦ Is it going to rain or not ?
Classification AI model
A Classification AI model refers to a type of Supervised Learning technique, which can classify the category of new unlabelled test inputs on the basis of training data. The classification models use non-continuous, i.e., discrete data.
(ii) Regression. Regression refers to a mathematical approach used to find the relationship between two or more variables. Regression works with continuous data. For example, how the prices of a specific fruit would be affected if its production is increased and there is an overall dip in production cost can easily be determined through a regression model.
Some examples of problems that a regression model may answer are :
•> What would be the house prices in the wake of factors like square footage, location and proximity to schools, hospitals, public transport etc. ?
•> What will be the temperature of the city tomorrow if the factors like historical temperature data, precipitation, wind, humidity etc. are available ?
•> What all colleges and streams would be within reach if the qualifying scores are known ?
•> What will be the sales if the historical data of demand and sales is available and other factors like marketing and target customers are known ?
Regression AI model
Regression AI model refers to a type of Supervised Learning technique, which is based on a mathematical approach used to find the relationship between two or more variables and predict the outcome. The regression models use continuous data.
Supervised learning is best suited to problems where there is enough labelled data available to train the model. But such data isn't always available. So, in such cases, other types of learning approaches are used.

5.4.2 Advantages and Disadvantages of Supervised Learning
Let us now talk about some advantages and disadvantages of supervised learning.
Advantages
(i) It is computationally less complex.
(ii) It is a highly accurate and trustworthy method.
(iii) It is very useful in cases when a user has an exact idea about the classes in classification projects.
(iv) It works even better and optimally when the user has some prior experience with similar cases.
Disadvantages
(i) Computation time is very high for supervised learning.
(ii) Enough knowledge about the classes of subjects is a necessity, otherwise the training of the system may go haywire.
(iii) Since it works with labelled data, the data must be pre-processed to be in a certain form. Pre-processing of data is a huge challenge in terms of representing correct classes correctly and in allocated cost & time. Pre-processing of data has a huge impact on the overall training of the system.
(iv) The training data must be based on real, good working examples, as their absence would largely affect the efficiency of the system.
(v) If unwanted data has crept in, it affects and many a times hampers efficiency.
(vi) A supervised learning system requires continuous updating with all the new learning and findings.

InfoBox 5.2 Some Real-world Applications of Supervised Learning
Supervised learning systems are being used in the real world in many applications, such as :
• Biometric Identification. Biological information of humans such as fingerprints, iris texture, earlobe and so on can be stored electronically in terms of some patterns. Modern devices are trained in a supervised way to recognise and identify people based on their biometric information, e.g., finger-print unlocking or facial recognition by cell phones or other devices.
• Speech Recognition. Machines and devices of today can be taught in a supervised way to recognise how you speak. Using this, the machine is able to recognise your voice through tonal quality, voice throw and diction. The most common examples of speech recognition are virtual assistants such as Google Assistant, Alexa and Siri.
• Spam Detection. Algorithms can be trained to identify the emails with specific keywords to be termed as spam, e.g., "Congratulations on winning so and so" and so forth. These days even apps are available in which we can specify which keywords need to be blocked, and the app will block the messages having those keywords.
• Object-Recognition for Vision. Under supervision, the machine can be trained to identify something. Recall the Emoji Scavenger Hunt game you played. For this, you teach your algorithm with a set of data and their predicted result. Using this, the machines or devices learn to identify and recognise a new instance.

5.5 UNSUPERVISED LEARNING
In unsupervised learning, as the name suggests, there is no supervision, no feedback, no pre-known/desired outputs and not even any labelled data. Unsupervised Machine Learning discovers patterns within an existing set of unlabelled data, i.e., the data without having any pre-existing labels or categories.
Unsupervised Learning
Unsupervised Learning is a learning approach of machine where the machine, with the help of an algorithm (the model), learns on an unlabelled dataset where it categorises data on the basis of common characteristics, features and patterns.
An unsupervised learning based AI model (the algorithm) finds the patterns, trends and features, and clubs the data having the same patterns in one group (or cluster). This way, unsupervised learning creates clusters, each with a similar set of values that is sufficiently different from the other clusters. For every new input (also unlabelled), it tries to put it in a cluster as per its pattern or characteristics, thus enabling new data to be categorised into an existing cluster (see figure below).
[Figure 5.3 Unsupervised Learning: unlabelled input data (no labels) grouped into Cluster 1, Cluster 2 and Cluster 3]
Note
Unsupervised learning approach works with unlabelled data and creates clusters of items having similar features, characteristics or patterns.
The training dataset of an unsupervised learning based AI model is a collection of unlabelled data without a specific desired outcome or correct answer. The AI model then attempts to automatically find structure or patterns in the data by extracting useful features and analysing its structure. For example, after giving it a set of images of animals (as training data) without any label/tag/explanation, it would try to combine images having similar features or characteristics in one cluster (see Fig. 5.3).
Note
Unlike supervised learning where the training data set contains labelled data with the corresponding outputs, unsupervised learning works with unlabelled training data without any corresponding outputs.

5.5.1 Types of Unsupervised Learning
There are many areas where unsupervised learning is useful. However, the three most common ones are : Clustering problems, Association and Dimensionality Reduction problems. Let us briefly talk about these.
5.5.1A Clustering
Clustering is an unsupervised learning approach of AI models, which groups unlabelled data based on their similarities or differences. Thus, we can say that it is because of
clustering algorithms that an AI model, without even being an expert ornithologist (bird specialist), can take a collection of bird photos and separate them roughly by species, relying on cues like feather colour, size or beak shape.
Clustering
Clustering is an unsupervised learning approach of AI models, which groups unlabelled data based on their similarities or differences.
Some examples of clustering problems/applications are :
♦ Pattern recognition. A group of cancer patients may be considered for a special type of treatment on the basis of their gene expression measurements.
♦ Identifying fake news by clustering articles with a high percentage of sensationalising and click-bait terms.
♦ Document analysis by clustering and organising similar documents quickly using the characteristics identified in the paragraphs of multiple documents.

5.5.1B Dimensionality Reduction
Dimensionality reduction broadly means representing an object in smaller dimensions. For example, a 3-dimensional object requires more variables to represent all its sides and dimensions. For visualisation purposes, we may need to view it in 2-dimensional views, i.e., as its front view or as its side view. In other words, we may need to represent it in a smaller number of variables for visualisation purposes. Thus, dimensionality reduction is required in cases where the goal is to summarise the data in a reduced number of dimensions, i.e., by using a reduced number of variables.
Dimensionality is represented via the number of variables, characteristics or features present in the dataset. For dimensionality reduction using the unsupervised learning approach, the features are identified, and then a feature subset is selected out of them to reduce the dimensions in a way that the actual meaning of the object is not lost.
An important aspect of dimensionality reduction is to ensure that the meaning and sense of the actual dataset is retained even while dimensions are being reduced. This is important because when we reduce the dimensions, the information starts getting distorted. For example, consider a 3D object as shown in Fig. 5.4(a). If we click its image from the front and the right side [Figs. 5.4(b), (c)] to view it in a 2D manner, it might look like a rectangle whereas it is not. Thus, it is crucial to select the feature subset in a way that the actual meaning and intent of the original dataset is not lost. Thus, dimensionality reduction will ensure ways to retain the meaning, e.g., by also extracting the top view in the subset [Fig. 5.4(d)].
Dimensionality Reduction
Dimensionality reduction is an unsupervised learning approach that uses techniques for reducing the number of input variables in training data while retaining its sense and meaning.
[Figure 5.4 (a) A 3D object; (b) its front view; (c) its right side view; (d) its top view]
Note
Dimensionality reduction reduces the complexity of a problem by reducing the number of variables involved (using feature subsets), concentrating on crucial information/variables while reducing/removing the unnecessary or less-contributing factors.
Some examples of dimensionality reduction applications are :
•> Data visualisation applications
•> Video and satellite observation compression
•> Email classification
•> Human gene expressions
•> Determining an outcome with a smaller number of variables, e.g., with how little information can a bank determine if a customer would be able to pay back the loan.
Other than Clustering and Dimensionality reduction algorithms, there are some other algorithms used as unsupervised learning algorithms. One such algorithm is Association, covered below.

5.5.1C Association
Association is another unsupervised learning technique that finds important relations between variables or features in a data set. For example, if you pick some home decor items such as lamps or shelves in an online shopping cart, it will start suggesting related items such as furniture, rugs and even interior designing firms. This is an example of association, where certain features of a data sample correlate with other features. By looking at a couple of key attributes of a data point, an unsupervised learning model can predict the other attributes with which they're commonly associated.
Association
Association is an unsupervised learning technique that finds important relations between variables or features in a data set.
Some examples of association problems/applications are :
•> As Recommendation Systems, based on people's own personality/habits (known traits)
• People that buy a new home are most likely to buy new furniture, so furniture items and stores can be suggested to them.
• A shopper who buys products for children is most likely to buy children's books or look for children's activity schools, so the same can be suggested to them.
•> Anomaly detection for detecting abnormal action or behaviour
• For example, if a credit card is used at the same time in two distant cities, it points to some kind of fraud or anomaly.

5.5.2 Advantages and Disadvantages of Unsupervised Learning
Let us now talk about some advantages and disadvantages of unsupervised learning.
Advantages
(i) It is very useful in finding all kinds of unknown patterns and features in data.
(ii) As it works with unlabelled data, it is very useful in many real-world situations where labelling of data is not feasible.
(iii) With unsupervised learning, the training of the model takes place in real time.
(iv) Training data containing unlabelled data is easier to get than labelled data, which requires a certain type of supervision before being made available for training.
Disadvantages
(i) With unsupervised learning, it is not easy to comment about the accuracy of the results, as the input dataset is unlabelled and its structure is largely unknown. After a long training, the results can be thought of as moving towards accuracy.
(ii) The training dataset, which is unlabelled, may not contribute towards the information intended to be garnered.
(iii) The user needs to spend time interpreting and labelling the classes generated by the system after training.
(iv) The classes or clusters generated may not fit in with the new discoveries over time if new unknown features are unearthed.

InfoBox 5.3 Some Real-world Applications of Unsupervised Learning
Unsupervised learning systems are being used in the real world in many applications, such as :
• Finding customer segments. Clustering algorithms have found great applications in finding customer segments in marketing data. Marketing teams need to know about various customer segments based on features like gender, location, age, education, income bracket, and so on. Marketing teams can then address different segments of customers in unique ways to woo them.
• Reducing the complexity of a problem. Dimensionality reduction can project the feature space to a lower-dimensional space so that less correlated variables are used for processing. For example, when you want to classify cancer patients from non-cancer patients based on their data for a variety of datasets, the dimension of this kind of data can be 60000, which is too huge for processing. However, using Dimensionality Reduction (DR) and some other feature extraction techniques, it can be reduced to a few 100s while still producing the same results without loss of much accuracy.
• Document clustering. Text can be clustered at various levels of granularity by considering cluster objects as documents, paragraphs, sentences, or phrases. Clustering algorithms use both supervised and unsupervised learning methods. Document clustering has several applications including collection browsing, summarisation and document classification.
• Finding fraudulent transactions. Fraudulent transactions have specific features that legitimate transactions do not have. Based on this assumption, machine learning algorithms detect patterns in financial operations and decide whether a given transaction is legitimate.

5.6 SEMI-SUPERVISED LEARNING
Semi-supervised learning refers to a learning approach where the training dataset has both labelled and unlabelled data. In semi-supervised learning, supervised learning techniques are used on the labelled data, and unsupervised learning techniques are used to extract new, unexplored features from the unlabelled data.
Semi-supervised Learning
Semi-supervised learning refers to a learning approach where the training dataset has both labelled and unlabelled data and the AI model uses a combination of supervised and unsupervised learning techniques. Supervised learning techniques are used on the labelled data, and new/unexplored features are extracted from the unlabelled data using unsupervised learning techniques.
Semi-supervised learning is especially useful for medical images like CT scans or MRIs. A trained radiologist can go through and label a small subset of scans for tumours or diseases. It would be too time-intensive and costly to manually label all the scans, but the AI model can still benefit from the small proportion of labelled data and improve its accuracy compared to a fully unsupervised model.
As the semi-supervised learning systems are a mix of supervised and unsupervised learning systems, the advantages and disadvantages of these systems are also a mix of the two.

5.7 REINFORCEMENT LEARNING
As per Merriam Webster, the word reinforcement means something that strengthens or encourages something. The reinforcement learning approach is exactly the same. It is similar to the way children and kids are taught things: if they utter a correct word or answer, they are applauded or rewarded, and in case of a wrong answer, they are corrected. You can also say that reinforcement learning is similar to playing video games: do a correct thing or choose a better option, earn a badge or life or bonus; similarly, do a forbidden action or take a wrong step, pay a penalty. After repeatedly playing such a video game, the player learns which actions are to be preferably performed. Reinforcement learning works in a similar fashion.
In the reinforcement learning approach, the AI model (the algorithm, also called the agent) iteratively attempts to accomplish a particular goal, or improve performance on a specific task, in the best possible way known to it. If the action/step of the agent is helpful toward achieving the goal, it is given a reward. The overall aim of the agent is to predict the best next step to take to earn the biggest final reward.
To make its choices in reinforcement learning, the agent relies on :
•> the learnings from past feedback
•> exploration of new tactics that may present a larger payoff
•> a long term strategy keeping in mind the final goal and best possible reward
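The first two ingredients, learning from past feedback and exploring new tactics, can be seen in a very small setting called a multi-armed bandit, a standard toy problem that is not from this book. In this minimal sketch the agent repeatedly picks one of three hypothetical actions whose reward chances (0.2, 0.5 and 0.8) are hidden from it. With probability epsilon it explores a random action; otherwise it exploits the action with the best average reward seen so far. The function name run_bandit and all parameter values are illustrative assumptions. A bandit has no long-term state, so the third ingredient (a long term strategy) is not modelled here.

```python
import random


def run_bandit(true_reward_probs, steps=2000, epsilon=0.1, seed=0):
    """Epsilon-greedy agent on a multi-armed bandit (a standard toy RL problem).

    true_reward_probs holds the hidden chance that each action pays a reward of 1.
    The agent never reads these numbers; it only sees the rewards it receives.
    """
    rng = random.Random(seed)
    n_actions = len(true_reward_probs)
    counts = [0] * n_actions        # how many times each action was tried
    estimates = [0.0] * n_actions   # average reward observed for each action
    total_reward = 0

    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(n_actions)                           # explore a new tactic
        else:
            action = max(range(n_actions), key=lambda a: estimates[a])  # use past feedback
        reward = 1 if rng.random() < true_reward_probs[action] else 0   # environment's response
        counts[action] += 1
        # Running average: nudge the estimate towards the reward just observed.
        estimates[action] += (reward - estimates[action]) / counts[action]
        total_reward += reward

    return estimates, counts, total_reward


if __name__ == "__main__":
    estimates, counts, total = run_bandit([0.2, 0.5, 0.8])
    print("estimated rewards:", [round(e, 2) for e in estimates])
    print("times each action was chosen:", counts)
    print("total reward:", total)
```

With these settings the agent should end up choosing the third action most of the time, since its observed average reward is highest. Setting epsilon=0 removes exploration; in this code the agent then never leaves the first action, because untried actions keep an estimate of 0.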
Reinforcement Learning
Reinforcement Learning (RL) refers to an AI learning approach that trains algorithms using a system of reward and penalty. The learning system (called agent) learns in an interactive environment where the agent iteratively selects and performs actions and receives rewards for performing correctly and penalties for performing incorrectly.
After each step, the agent gets feedback. As it is an iterative process, the more rounds of feedback there are, the better the agent's strategy becomes (Fig. 5.5). This technique is especially useful for training robots, which make a series of decisions in tasks such as walking through dangerous situations like a building on fire while avoiding the places on fire, steering an autonomous vehicle or managing inventory in a warehouse.
[Figure 5.5 Reinforcement Learning is Iterative Process where Feedback Plays an Important Role: (1) observe the environment, (2) select action using policy, (3) act, (4) get reward or penalty, (5) update policy (learning step), (6) iterate until an optimal policy is found]
Note
In reinforcement learning, the agent learns by itself, without the intervention from a human, using the best strategy to maximise reward in a particular situation using AI principles.
The main elements of an RL system are (Fig. 5.6) :
(i) The agent or the learner.
(ii) The environment the agent interacts with.
(iii) The policy that the agent follows to take actions.
(iv) The reward signal that the agent observes upon taking actions.
[Figure 5.6 Reinforcement Learning: the agent (AI model), holding a policy and a reinforcement learning algorithm that performs policy updates, sends an action A_t to the environment and receives an observation O_t and a reward R_t]
'Learning to make correct decisions' is the core of reinforcement learning.

5.7.1 Types of Reinforcement Learning
Broadly, there are two types of Reinforcement Learning :
•> Positive Reinforcement Learning, which focusses on increasing rewards to encourage a certain behaviour.
•> Negative Reinforcement Learning, which focusses on removing or lessening penalty to encourage a certain type of behaviour.
Following figure 5.7 explains this.
[Figure 5.7 Types of Reinforcement learning. Positive reinforcement (positive behaviour followed by positive consequences): (A) an employee increases his/her productivity; (B) a student is more active in class. Negative reinforcement (positive behaviour followed by removal of negative consequences): (A) the manager stops nagging the employee, and the employee starts being more productive; (B) criticizing of the student's input stops, and the student is more likely to participate in discussion.]
Typical Practical Applications of Reinforcement Learning include :
•> Training self-driving cars about how to drive on roads, in traffic situations, with speed limits, in case of any stationary or moving obstacles in front of the car and so on.
•> Training robots to work in hazardous situations and terrains.
•> Training robots to perform difficult and dangerous tasks in various industries.
•> Experience-based recommendation systems based on the impact of non-controllable or unknown factors such as market, e.g., in trading and finance for determining whether to sell, hold or buy stocks.
•> Text summarizing engines and dialogue agents that use text and speech (in NLP), such as Siri, Alexa, Google Assistant etc.
•> In healthcare for determining time-dependent decisions for the best treatment for a patient at a specific time.
•> In gaming for learning ways to better and better game play and strategies.
and many more.
Note
Reinforcement learning algorithms maintain a balance between exploration and exploitation. Exploration is the process of trying different things to see if they are better than what has been tried before. Exploitation is the process of trying the things that have worked best in the past. Other learning algorithms do not perform this balance.
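The loop of Fig. 5.5 and the elements listed for Fig. 5.6 can be tied together in a small program. The sketch below uses tabular Q-learning, one well-known RL algorithm that this section does not itself name, on a hypothetical five-cell corridor: the environment gives a reward of 1 for reaching the last cell and a small penalty (-0.01) for every other move. The agent's policy is read from a table of learned action values, and it uses epsilon-greedy selection to balance exploration and exploitation. All names and numbers (step, train, greedy_policy, alpha, gamma, epsilon) are illustrative assumptions, not a definitive implementation.

```python
import random

# Hypothetical environment: a corridor of cells 0..4; the agent starts in cell 0
# and the goal is cell 4.
N_STATES = 5
GOAL = 4
ACTIONS = (-1, +1)  # index 0 = move left, index 1 = move right


def step(state, action):
    """Environment: apply an action and return (next_state, reward, done)."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    if next_state == GOAL:
        return next_state, 1.0, True   # reward for reaching the goal
    return next_state, -0.01, False    # small penalty for every other move


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning: q[s][a] estimates how good action a is in state s."""
    rng = random.Random(seed)
    q = [[0.0] * len(ACTIONS) for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False                        # 1. observe the start state
        while not done:
            if rng.random() < epsilon:                # 2. select an action with the policy:
                a = rng.randrange(len(ACTIONS))       #    explore a random action...
            else:                                     #    ...or exploit the best known one
                a = max(range(len(ACTIONS)), key=lambda i: q[state][i])
            next_state, reward, done = step(state, ACTIONS[a])  # 3. act, 4. reward/penalty
            best_next = 0.0 if done else max(q[next_state])
            # 5. update the policy (learning step) towards reward + discounted future value
            q[state][a] += alpha * (reward + gamma * best_next - q[state][a])
            state = next_state
    return q                                          # 6. repeated over many episodes


def greedy_policy(q):
    """The learned policy: best action (-1 or +1) in each non-goal cell."""
    return [ACTIONS[max(range(len(ACTIONS)), key=lambda i: q[s][i])] for s in range(GOAL)]


if __name__ == "__main__":
    print(greedy_policy(train()))  # [1, 1, 1, 1]: move right from every cell
```

Here alpha controls how far each update moves a value and gamma how much future reward counts; the small per-move penalty is what makes the shortest route (always moving right) the best policy.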
5.7.2 Advantages and Disadvantages of Reinforcement Learning
Let us now talk about the advantages and disadvantages of reinforcement based learning.
Advantages
(i) It focuses on the problem as a whole. It does not divide a problem into sub-problems.
(ii) It does not need a separate data collection step. In RL, training data is obtained via the direct interaction of the agent with the environment.
(iii) It can work easily in unknown environments. RL provides a way to learn the behaviour of an agent in an unknown environment without any prior knowledge or model.
(iv) It is very useful in solving very complex problems, which cannot be solved using conventional techniques.
(v) Reinforcement learning produces long-term results, which are very difficult to achieve with other models.
(vi) Reinforcement learning is very similar to the learning of human beings and thus we can say that with time these models can inch towards perfection, just like humans.
(vii) Reinforcement learning based models can correct the errors that occurred during the training process.
(viii) It controls error repetition. Reinforcement learning based models have shown that the chances of the same error occurring again are very low once this error is correctly identified and corrected.
(ix) Reinforcement learning models can learn from their experience.
(x) Reinforcement learning models can outperform humans in many tasks such as playing games, chess and more, e.g., DeepMind's AlphaGo program, a reinforcement learning model, beat the world champion Lee Sedol at the game of Go in March 2016.
(xi) Reinforcement learning models are useful in situations where the training datasets are totally missing and, with time and interaction, information is to be collected.
Disadvantages
(i) Reinforcement learning is not suitable for simple problems.
(ii) Too much reinforcement learning may hamper exploration, which can weaken or tweak the results.
(iii) Reinforcement learning is data-heavy. It needs a lot of data and a lot of computation.
(iv) Pure reinforcement learning based models are not suitable for many generic problems. They should be combined with other AI techniques.
(v) Reinforcement learning models may take more time than expected for generating satisfying results.
(vi) It is expensive, especially in fields like
[Figure: Classical learning. Supervised (data is pre-categorised or numerical): Classification, predict a category (e.g., divide the socks by color); Regression, predict a number (e.g., divide the ties by length). Unsupervised (data is not labelled in any way): Clustering, divide by similarity (e.g., split up similar clothing into stacks); Association, find hidden dependencies (e.g., find what clothes you often wear together); Dimension Reduction (Generalization), e.g., make the best outfits from the given clothes. Reinforcement: agent action; reward, penalty.]
InfoBox 5.4 Unsupervised Learning vs. Supervised Learning
Properties | Unsupervised Learning | Supervised Learning
Definition | Unsupervised learning is the type of machine learning that happens without human supervision. A machine tries to find any patterns in data by itself. | Supervised learning is the type of machine learning that happens under human supervision, meaning people label input data with answer keys showing a machine the desired outputs.
Input data | Unlabelled | Labelled
Use of data | A model is given only input variables (X) and no corresponding output data. | A model is given input variables (X), output variables (Y), and an algorithm to learn the function from input to output.
When to use | You don't know what you're looking for in data. | You know what you're looking for in data.
Applicable in | Clustering and association problems. | Classification and regression problems.
Accuracy of the results | May provide less accurate results. | Provides more accurate results.
Algorithms | K-Means, Gaussian Mixture Models, Frequent Pattern (FP) Growth, Principal Component Analysis | Support vector machines, Decision trees, Random forest, Naive Bayes
Use cases | Recommender systems | Spam filters
. e s e robotics where training takes time and robots
and maintain. * Anomaly detection * Demand forecasting
are expenSJve to create
Note * Customer segmentation ** Price prediction
* Preparing data for supervised learning Image recognition
As we can see chat all the learning (unsupervlsed super · d •
' vise reinforcement) base d systems can
adapt as per changes in rhe data being made availabl f ' . .
. models contrary co rule-based model which e or training ' we can say that they are
dynam,c
are comparatively static or fixed. With this we have come to the end of this session. Let us quickly revise what we have learnt so far.
SESSION 6
Neural Networks
•> Biological Neural Networks
•> Artificial Neural Networks (ANN)
•> Features of Neural Networks
•> Advantages and Disadvantages of Neural Networks
•> Applications of Neural Networks

6.1 INTRODUCTION

Long ago when we started learning computers, we were taught that even though
computers are very fast and accurate machines, they are not intelligent machines. In
other words, they cannot learn on their own. But years later, here I am, writing that
modern computers are intelligent machines, thanks to Artificial Intelligence. A major
contribution to this artificial intelligence is through artificial Neural Networks, which can
learn on their own and increase and update their knowledge with time and experience.
This session is dedicated to the discussion on artificial Neural Networks, what they are,
how they work and learn, what their benefits are, and so on. So, let us begin.

6.2 BIOLOGICAL NEURAL NETWORKS

Our brain has a large number (≈ 10¹¹) of highly connected elements (≈ 10⁴ connections per
element) called neurons. A neuron has three major components (Fig. 6.1) :
•> Soma. It is the cell body, which contains the nucleus and sums and thresholds all incoming signals.
•> Axon. The axon is a long fibre that carries signals from the cell body out to other neurons.
•> Dendrites. Dendrites are numerous tree-like receptive networks of nerve fibres that
carry electrical signals into the cell body.
Along with the three components listed above, an important role is played by the synapse.
•> Synapse. The point of contact between an axon of one cell and a dendrite of another
cell is called a synapse.
Message/Information Transfer in Neurons
Signals in the brain are transmitted between neurons by electrical pulses (called action
potentials or 'spike' trains) travelling along the axon. Each pulse arriving at a synapse
initiates the release of a small amount of a chemical substance, or neurotransmitter. This
travels across the synaptic cleft (the gap) and is then received by the dendritic side
of the neuron on the other side of the synapse. This is how information travels across
the neuron circuits.

[Figure 6.1 (a) Structure of human brain neurons: dendrites, cell body, axon, branches and terminal buttons of the axon. Impulses are carried toward the cell body by the dendrites, and away from the cell body along the axon in the form of electro-chemical signals. (b) Information transfer happens at the synapse: the signal is transferred to another ('receiving') neuron at the synapse, the gap at the end of the terminal buttons.]

Figures 6.1(a) and (b) show the structure and working of human brain neurons.

The way the neurons are connected together and the nature of the synapses together determine the function of a particular part of the brain. The brain learns new tasks by establishing new pathways and by modifying the strengths of the synapses.

6.3 ARTIFICIAL NEURAL NETWORKS (ANN)

An Artificial Neural Network (ANN) is a software- or circuit-based simulation of a biological neural network. An ANN can be thought of as an interconnected assembly of simple processing elements, called units or nodes, whose functionality is loosely based on a biological neuron. The components of an artificial neural network (ANN) are :
•> Node (also called neurode) : equivalent of a neuron
•> Weighted inputs : equivalent to dendrites
•> Activation function : equivalent to soma (it defines how the weighted sum of the inputs is transformed into an output from a node or nodes in a layer of the network)
•> Connection : equivalent to a synapse (a connection from one neuron to another carries the information)
•> Output : equivalent to axon

In an ANN, a node (or neurode) is the artificial equivalent of a neuron. It consists of a set of weighted inputs (dendrites), an activation function (soma) and one output (axon). An information signal travels through multiple layers of connected neurons before it is transformed into an output.

Bias is an additional parameter in the Neural Network. It is used internally, as per some hidden rules and algorithm, to adjust the output along with the weighted sum of the inputs to the neuron.

Note
The first artificial neural network was invented in 1958 by psychologist Frank Rosenblatt. Called Perceptron, it was intended to model how the human brain processed visual data and learned to recognise objects.

Note
An artificial neuron mimics a biological neuron, but its capacity and functioning are far more limited than those of a biological neuron. We cannot say that it functions identically to a biological neuron, but we can say that it functions similarly to a biological neuron, with a comparatively limited capacity.

As this whole session is dedicated to the discussion of artificial neural networks (ANN), from this point onwards the terms ANN (Artificial Neural Networks) and Neural Networks (NN) refer to the same thing, i.e., ANNs, and will be used interchangeably.

6.3.1 Structure of ANN
A Neural Network is divided into multiple layers. Each layer of an ANN consists of several artificial neurons called nodes. Each node has to perform a specific task and pass the information to the next layer.

[Figure 6.2 Artificial Neural Network: an input layer, hidden layer 1, hidden layer 2 and an output layer, with data flowing from input to output. Inset: inside an artificial neuron, the inputs multiplied by weights 1 to n are summed and a bias is added. The sum of weights, the machine learning algorithm and hidden rules then work together in the form of an activation function to bring the final answer (output). Each layer can contain any number of nodes, and each layer may have a different number of nodes.]
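To make the node description concrete, here is a minimal Python sketch of a single artificial neuron. It forms the weighted sum of its inputs, adds the bias, and passes the result through an activation function. The function names (`neuron_output`, `step`, `sigmoid`) and the hand-picked weights and bias are hypothetical; they were chosen only so that the node behaves like a logical AND gate. Step and sigmoid are two common activation functions, used here as assumptions rather than as the only choices.

```python
import math


def step(z: float) -> float:
    """Step (threshold) activation: output 1 if z is 0 or more, else 0."""
    return 1.0 if z >= 0 else 0.0


def sigmoid(z: float) -> float:
    """Sigmoid activation: squashes any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron_output(inputs, weights, bias, activation=step):
    """Output of one artificial neuron (node).

    inputs     : signals arriving at the node (the role of dendrites)
    weights    : one adjustable weight per input connection
    bias       : extra parameter added to the weighted sum
    activation : turns the weighted sum into the node's output (the role of soma)
    """
    if len(inputs) != len(weights):
        raise ValueError("need exactly one weight per input")
    # Weighted sum of the inputs plus the bias (the activation value).
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return activation(z)


# Hypothetical weights and bias, picked by hand so the node acts as logical AND.
w, b = [1.0, 1.0], -1.5
for x in ([0, 0], [0, 1], [1, 0], [1, 1]):
    print(x, neuron_output(x, w, b))
```

With the step activation, only the input [1, 1] pushes the weighted sum (1 + 1 - 1.5 = 0.5) to zero or above, so only that input produces 1. Passing `activation=sigmoid` instead gives a graded output, about 0.62 for the same input. That kind of smooth activation is what gradient-based training methods such as back propagation rely on.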
In addition to the terms you learnt earlier, the following are also some terms related to ANN :
•> PE. Processing Element (the neuron).
•> Exemplar. One individual set of input/output data.
•> Weight. The adjustable parameter on each connection that scales the data passing through it.

Note
We have also explained neural networks in detail in the Class 9th Artificial Intelligence textbook, in Part B's Unit 3. You may refer to that also for learning about neural networks.

There are three types of layers in an ANN :
(i) Input Layer
The first layer of a Neural Network is called the input layer. Its role is to acquire data and feed it to the Neural Network. The input layer carries out no processing; it just takes the input data and passes it on to the next connected layer.
(ii) Hidden Layer
The input layer is connected to a hidden layer, which is further connected to other hidden layers or to the final output layer. The role of hidden layers is to process the inputs and carry out a task. The processing at the hidden layers is carried out as :
   Sum of weighted inputs
   + activation function (i.e., machine learning algorithm)
   + hidden rules (such as using additional parameters like a bias)
There can be multiple hidden layers in an ANN, depending upon the complexity of the task(s) being performed. Hidden layers are not visible to the user. The processed output of a hidden layer is then fed to the subsequent hidden layer of the network.
(iii) Output Layer
After the processed data travels through multiple hidden layers, it (the final processed data) is finally fed to the final layer, known as the output layer. The output layer simply provides the final output to the user. At the output layer also, no processing takes place; it only provides a user-interface for the output.

In a neural network, information transfers from the input layer nodes to the connected layer(s). After the required processing and decision-making (using the activation function), it reaches the final layer as output. Neural networks are initially trained with some input data, and then they keep learning with every new input and feedback.

6.3.2 Training an ANN
Training is a necessary process for every ANN. It is a process in which the ANN gets familiar with the problem it needs to solve. To train an ANN, we usually have some collected data (called the training dataset) based on which we need to create our predictions.
The training of an ANN can take place in supervised, unsupervised or reinforcement learning styles. You have read about these in the previous session. Out of these training styles, supervised learning is the most commonly used with neural networks.
The following steps briefly describe how an ANN is trained :
(i) Initialize weights for all neurons.
(ii) Present the input layer with the required exemplar (one set of inputs).
(iii) Calculate outputs as per the weights and activation function.
(iv) Compare outputs with the expected results.
(v) Update weights if the output produced is not a match. Make changes in the weights so that the network can calculate a matching output, as per these :
   ■ In case of a correct guess or output, strengthen/increase the weight of the node.
   ■ In case of an incorrect guess or output, reduce the weight of the node.
(vi) Repeat until all exemplars are presented.

Note
One complete pass of providing the network with all the exemplars (inputs) and updating the network's weights is called an epoch. Typically, many epochs are required to train a neural network.

[Figure 6.3 Back Propagation Helps Neural Networks Learn: the difference from the desired values is sent back through the network (back propagation), and the neurons make internal adjustments so that they can reach the correct output.]

Consider the human brain. When we provide an input or feedback about the expected output, the brain registers it and modifies its way of reaching that conclusion. Similarly, the neurons in ANNs use the received feedback about the difference from the correct output to change the weights or bias used with the activation function, so that they can reach the correct output.
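The training steps (i) to (vi) can be sketched in Python for a single neuron with a step activation. As one concrete update, this sketch swaps in Rosenblatt's perceptron learning rule. Under that rule, weights change only on a mismatch, each by `learning_rate * error * input`, which moves the weighted sum toward the expected output. The OR dataset, the learning rate of 0.1 and the 100-epoch cap are illustrative assumptions, not values from this session.

```python
def step(z):
    """Step activation: output 1 when the weighted sum is 0 or more, else 0."""
    return 1 if z >= 0 else 0


def predict(weights, bias, inputs):
    """Weighted sum of the inputs plus bias, passed through the step activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return step(z)


def train_perceptron(exemplars, learning_rate=0.1, max_epochs=100):
    """Train one neuron on (inputs, expected_output) exemplars.

    Uses the perceptron learning rule: weights change only when the
    neuron's output does not match the expected output.
    Returns (weights, bias, epochs_run).
    """
    # (i) Initialise the weights (and bias) of the neuron.
    weights = [0.0] * len(exemplars[0][0])
    bias = 0.0
    for epoch in range(1, max_epochs + 1):
        mistakes = 0
        for inputs, expected in exemplars:           # (ii) present one exemplar
            output = predict(weights, bias, inputs)  # (iii) calculate the output
            error = expected - output                # (iv) compare: 0 means a match
            if error != 0:                           # (v) update only on a mismatch
                mistakes += 1
                weights = [w + learning_rate * error * x
                           for w, x in zip(weights, inputs)]
                bias += learning_rate * error
        # (vi) Every exemplar has been presented once: one epoch is complete.
        if mistakes == 0:
            return weights, bias, epoch
    return weights, bias, max_epochs


# Hypothetical training dataset: the logical OR of two inputs.
OR_DATA = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
weights, bias, epochs = train_perceptron(OR_DATA)
print("weights:", weights, "bias:", bias, "epochs:", epochs)
print([predict(weights, bias, x) for x, _ in OR_DATA])
```

On the OR exemplars, the loop stops after a few epochs, as soon as a full pass produces no mismatch. The trained neuron then predicts [0, 1, 1, 1]. On the XOR pattern (output 1 only when exactly one input is 1), the same loop never finishes an epoch without a mistake, because no single straight-line boundary separates those cases. Networks with hidden layers are used for such problems.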
An activation function in a neural network defines how the weighted sum of the input (called the activation value) is transformed into an output from a node or nodes in a layer of the network. Providing feedback about the difference from the correct output is known as back propagation.

Note
Supervised learning is the most commonly used with neural networks, although other ways of training neural networks are also used.

In case of wrong outputs, the error information is sent back to the neural network (back propagation) and the weights are modified accordingly. The aim is to correct the network's functioning so that it produces a matching output next time. After initial training, neural networks keep updating their internal mechanism of reaching an output with every new feedback received.

6.4 FEATURES OF NEURAL NETWORKS
From the above discussion, we can list the features of neural networks as follows :
(i) Neural networks have been developed to mimic the structure and working of the human brain.
(ii) Neural networks evolve and automatically learn with each input and each new attempt.
(iii) Neural networks can work with big data sets.
(iv) Neural networks employ machine learning techniques to function and evolve.
(v) Neural Networks (NNs) are said to exhibit the following two abilities :
   (a) Ability to learn
      • NNs can figure out how to perform their function on their own.
      • NNs can determine their function based only upon sample inputs.
   (b) Ability to generalize
      • NNs can produce reasonable outputs, based on their past learning, for inputs they have not been taught how to deal with.
(vi) ANNs take ample time to train and need lots of data to train.

6.5 ADVANTAGES AND DISADVANTAGES OF NEURAL NETWORKS
With their ability to learn and adapt, NNs and neural computing have revolutionized the technology world. Like other technologies, they also offer some advantages and disadvantages. Let us talk about these.

Advantages
The advantages of neural networks and ANN computing are :
(i) They are massively parallel, i.e., they can perform/learn multiple tasks in parallel.
(ii) They are fundamentally fault-tolerant. In biological brains, many brain cells die each day and yet the brain's function does not deteriorate much. Similarly, ANNs can survive and keep functioning if some of their functional units stop functioning.
(iii) They are capable of learning and generalizing. They can keep learning and updating their knowledge with time, exposure and experience.
(iv) They support black box functioning. There is no need to know the underlying laws or governing equations.

Disadvantages
ANNs also suffer from some disadvantages. Some common ones are :
(i) ANNs need a massive amount of data to be trained. If, for a task or situation, a very large set of data containing thousands and lakhs of cases is not available for training and testing, an ANN cannot be developed.
   Note : An ANN can best be used only if you have a large set (thousands and lakhs of cases, or more) of data for training and testing, including all of the possible inputs along with the corresponding correct (desired) outputs.
(ii) The ANNs' black box nature also turns out to be their disadvantage. The developer just cannot figure out how or why the ANN came up with a certain output. For example, when you put an image of a cat into a neural network and it predicts it to be a car, it is very hard to understand what caused it to arrive at this prediction, because of its black-box nature.
(iii) ANNs are also more computationally expensive than traditional machine learning algorithms. Deep learning based ANNs can take several weeks to train completely from scratch. In contrast, traditional machine learning algorithms take much less time to train, ranging from a few minutes to a few hours or days.
(iv) ANNs are not suitable for every type of problem, as they are more complicated and their development takes much longer (depending on what you want to build). Thus, it is important to decide if it is really worth it for expensive engineers to spend weeks developing something that may be solved much faster with a simpler algorithm.

6.6 APPLICATIONS OF NEURAL NETWORKS
Neural Networks have found their applications in multiple fields and areas. Some common applications of neural networks are listed below.
•> Character Recognition. Character Recognition is the process of recognizing text inside images and converting it into an electronic form. These images could be of handwritten text, printed text like documents, receipts, name cards, etc., or even a natural scene photograph. Neural networks can be used to recognise characters in various forms, such as in images, printed text or even handwritten characters.
•> Speech Recognition. NNs have found success in speech or communication recognition. You can see these applications around you in the form of Siri, Alexa, Google Assistant and so on.
•> Computer Vision. With the help of NNs, computers can accurately understand and efficiently process visual data like videos and images.
•> Image Compression. Neural networks can receive and process vast amounts of information at once, which makes them useful in image compression. Image compression means storing and processing images with a smaller amount of data.
•> Stock Market Prediction. The day-to-day business of the stock market is extremely complicated. Many factors weigh in on whether a given stock will go up or down on any given day. Since neural networks can examine a lot of information quickly and sort it all out, they can be used to predict stock prices.
•> Pattern Recognition. NNs are also very useful in pattern recognition (PR), i.e., the recognition of a pattern or patterns. Patterns are basically repeated trends in various forms of data. For example, a pattern could be a fingerprint image, a handwritten cursive word, a human face, a speech signal and so forth.
•> Detection. Detection in medical diagnosis, security, image objects, financial irregularity or a fault in a system is being enhanced through ANN applications. Thus, ANNs play an essential role in detection and diagnosis, such as the diagnosis of breast cancer, crime detection using DNA and so on.
•> Travelling Salesman's Problem. The travelling salesman problem is defined as : Given a set of cities and the distance between every pair of cities, the problem is to find the shortest possible route that visits every city exactly once and returns to the starting point. Interestingly enough, neural networks can solve the travelling salesman problem, but only to a certain degree of approximation.
•> Miscellaneous Applications. There are some very interesting applications of neural networks in finance, loan applications, medicine etc. In these, a neural network decides whether or not to grant a loan, whether a security lapse is repeatable, whether an ailment will relapse and so forth.

With this we have come to the end of this session. Let us quickly revise what we have learnt so far.

Check Point
1. ______ are modelled on the human brain; it is essentially a machine learning algorithm useful for solving problems when the dataset is large. [CBSE 2021-22 (Term-1)]
   (a) Computer Vision
   (b) Data Science
   (c) Natural Language Processing
   (d) Neural Network
2. The full form of ANNs is :
   (a) AI Neural Networks
   (b) Artificial Neural Node
   (c) Artificial Neural Numbers
   (d) Artificial Neural Networks
3. What are the three main parts of a Neuron ?
   (a) Dendrite, Axon, Aoma
   (b) Camio, Samia, Dendrition
   (c) Dendrite, Soma, Axon
   (d) D~drite, Axon, S6ma
4. In an ANN, the adjustable parameter on each connection that scales the data passing through it is called ______.
   (a) Soma (b) Synapse (c) Weight (d) Axon
5. ANNs can be trained using ______ learning style.
   (a) Supervised (b) Unsupervised (c) Reinforcement (d) All of these
6. An ______ in a neural network defines how the weighted sum of the input is transformed into an output from a node or nodes in a layer of the network.
   (a) activation function (b) back propagation
   (c) action potential (d) training dataset
7. In ______, the error information is sent back to the neural network and weights are modified accordingly.
   (a) activation function (b) back propagation
   (c) action potential (d) training dataset
8. What is an activation value ?
   (a) Weighted sum of inputs (b) Threshold value
   (c) Main input to neuron (d) None of these
9. ANNs are said to have mainly these two abilities :
   (a) Ability to code (b) Ability to learn (c) Ability to design (d) Ability to generalize
10. Which of the following is/are true for neural networks ?
   (a) The training time depends on the size of the network.
   (b) Neural networks can be simulated on a conventional computer.
   (c) Artificial neurons are identical in operation to biological ones.
   (d) Artificial Neuron Networks can solve any and all types of problems.
11. What is/are the advantage(s) of neural networks over conventional computers ?
   (a) They have the ability to learn by example.
   (b) They are more fault tolerant.
   (c) They are more suited for real time operation due to their high 'computational' rates.
   (d) All of these
12. What are the disadvantages of ANNs ?
   (a) They need massive amounts of data for training and testing.
   (b) They are not suitable for every type of problem.
   (c) They are computationally expensive.
   (d) All of these

Competency Based Questions
13. An online proofreading tool is very popular for automatic contract reviews. It is designed for attorneys, notaries, and other professionals who deal with multiple legal documents daily. The user uploads a document as a PDF, Word or plain text file, and in less than a minute is able to read a summary report. This proofreading tool uses a neural network. What type of data, do you think, would it be trained on ?
   (a) Lots of images (b) Lots of characters
   (c) Lots of professional documents (d) Lots of documents
SESSION 7
Evaluation
•> What is Model Evaluation ?
•> Model Evaluation Metrics

7.1 INTRODUCTION
In your previous session (Modelling, Neural Networks), you learnt how to create AI models using supervised, unsupervised or reinforcement learning, using either simple machine learning or neural networks. You can create or develop multiple/different types of AI models in the Modelling phase of the AI project cycle. However, deciding 'which AI model to finally choose' or 'which AI model will be optimal' is tricky. To answer these questions, 'Evaluation' of AI models is done based on certain performance parameters. In this session, we shall discuss what 'Evaluation of AI models' means, which metrics and tests are used for it, and how the best AI model out of the available choices is picked.

7.2 WHAT IS MODEL EVALUATION?
Evaluation, in general, refers to evaluating a system or device in a systematic way to check its merit, correctness and performance as per a set of standards. In the AI project cycle, we can say that Evaluation refers to systematically checking and analysing the merit, correctness and reliability of an AI model based on the outputs produced by it.

7.2.1 Why Evaluation is Important ?
Evaluation is crucial for any AI model to succeed in real world applications. The aim of an AI model is to produce results for any unknown new data, based on its learning; evaluation ensures that this is truly happening.
Evaluation checks and assesses an AI model for these :
(i) Whether an AI model is producing results because of its learning or because of memorisation. For example, if a student mugs up some questions and by chance a question same as the one he mugged up comes in the exam, he will answer it beautifully. But this would be a correct answer and not an assurance that he has learnt the concept. In the same way, for an AI model, it is important to assess whether a result comes from memorisation or from learning. Learning is crucial for the success of an AI model. An AI model that predicts results based on memorisation will only predict correct results for known data (the training data) and not for unknown data. Hence, such a model is an inefficient model.
(ii) Whether an AI model is operating correctly and optimally, that is, whether the AI model is accurate, smart and good at learning. This is ensured by evaluating the AI model through various evaluation metrics and addressing the causes behind the performance of the AI model. Evaluation metrics refer to the measures used to test the quality of the AI model.

7.2.2 Causes Behind Performance of AI Model
Let us now briefly discuss various causes behind the performance of an AI model. Data plays an important role in the performance of an AI model, and these causes also highlight how the AI model is using and utilising data. The causes behind the performance of an AI model are :
1. Overfitting
Overfitting means the AI model performs very well only against known data, i.e., the training data or data very similar to it (it fits that data). However, the AI model fails to fit unknown data, i.e., it cannot predict reliable results for unknown data. An overfitted model will appear to have a high accuracy when you apply it to the training data. The model developers may take this as the success of the AI model and think of it as a highly accurate model, whereas in reality it will underperform in production when given new data.
2. Underfitting
Underfitting is the opposite of overfitting. It happens when the AI model is not complex enough to accurately capture the structure and relationships of the training data, and so cannot use the dataset's features to produce a specific result. An underfitted model produces problematic or erroneous outcomes on new data (data not the same as its training data), and it often performs poorly even on the training data.
3. Generalization
Generalization refers to how well the concepts learned by a machine learning model apply to specific examples not seen by the model when it was learning. The goal of a good
Ideally, ~n Al model hou)d
5
. to eneralize well from tl~e bal,,nccd hl'lwccn und ( b~
machine learning model is ;he problem domain. This . . er 1111 Before we proceed to how to create and use confusion matrices, it is important to discuss
and ov<'r(1t1111g lo br a u ng
training data to any data from . the future on data the ' ,,0 0 1I flt, some terms associated with it. These are :
allows us to make pre dictions in (i) True Poslttve (TP) , True positive refers to an instance for which both predicted
model has never seen, ed b tween underfitting and overfitting to be a value of the AI model and actual value are positive. For example, while testing a
ld be balanc e 90od
Ideally an AI model shou th two biggest causes for poor performance of h patient for Covid, if the test also produced the result (predicted value) as
• · derfittin9 are e · · 'l! positive and the actual result (actual value) is also positive, it is True positive.
fit. Overfittmg and un d lopers have to use certain protection means to contra\
models. And thus, the model developer needs to handle overfitting and underfitting to strike the balance and create a good-fit AI model.

Note
Overfitting and underfitting are the biggest causes of poor performance of AI models, and thus model developers employ certain methods to control these two.

7.3 MODEL EVALUATION METRICS
In technical terms, metrics refer to a system or standard of measurement used to assess something. In the world of AI, there are metrics to test and evaluate a developed AI model to assess if it is accurate and efficient enough. As AI models are based on different types of algorithms such as classification, regression, deep learning etc., there are different types of metrics used to assess the AI models based on these, such as: Classification Metrics (Confusion Matrix, Accuracy, Precision, Recall, F1-score, ...); Regression Metrics (MSE, MAE); Deep Learning Related Metrics (Inception Score, Fréchet Inception Distance).
Covering all these metrics is beyond the scope of the book. So, we shall stick to the syllabus and scope, and will cover some basic metrics for evaluating classification AI models. These metrics calculate a score which indicates how correct the AI model's predictions are: the higher the score, the better our model is.

7.3.1 Confusion Matrix
A Confusion Matrix is a technique using a chart or table for summarizing the performance of a classification based AI model by listing the predicted values of an AI model and the actual/correct outcome values.
A confusion table includes both predicted and actual values in the context of an AI model, which are:
♦ the Actual Value represents the actual result (observed or measured).
♦ the Predicted Value is the value of the outcome/result of the AI model, produced on the basis of its algorithm and learning.

(i) True Positive (TP). True positive refers to an instance for which both the predicted value of the AI model and the actual value are positive. For example, while testing a patient for Covid, if the test produced the result (predicted value) as positive and the actual result (actual value) is also positive, it is a True positive.
(ii) True Negative (TN). True negative refers to an instance for which both the predicted value of the AI model and the actual value are negative. For example, while testing a patient for Covid, if the test produced the result (predicted value) as negative and the actual result (actual value) is also negative, it is a True negative.
(iii) False Positive (FP) (also called Type I Error). False positive refers to an instance for which the predicted value of an AI model is positive but the actual value is negative. For example, while testing a patient for Covid, if the test produced the result (predicted value) as positive and the actual result (actual value) is negative, it is a False positive.
(iv) False Negative (FN) (also called Type II Error). False negative refers to an instance for which the predicted value of an AI model is negative but the actual value is positive. For example, while testing a patient for Covid, if the test produced the result (predicted value) as negative and the actual result (actual value) is positive, it is a False negative.

The first two terms signify cases where the actual and predicted values match. In the 3rd and 4th terms listed above, the latter part (Positive/Negative) represents the predicted value, and "False" means the actual value is the opposite of the predicted value.
For example, if, looking at a picture, the AI model has to identify whether it is a picture of the vegetable Lady finger, then True Positive/Negative and False Positive/Negative will be identified as:

                                   Predicted Value: Positive            Predicted Value: Negative
                                   (Result: Lady finger vegetable)      (Result: NOT Lady finger vegetable)
Actual Value: Lady finger          True Positive                        False Negative
Actual Value: NOT Lady finger      False Positive                       True Negative
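To make the four outcomes concrete, here is a minimal Python sketch (not from the book) that tallies TP, TN, FP and FN from two equal-length lists of labels. It assumes a binary classifier whose labels are encoded as 1 for positive and 0 for negative. The function name `confusion_counts` and the sample data are hypothetical.

```python
def confusion_counts(actual, predicted):
    """Tally TP, TN, FP and FN for binary labels (1 = positive, 0 = negative).

    `confusion_counts` is a hypothetical helper written for illustration.
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for a, p in zip(actual, predicted):
        if a == 1 and p == 1:
            counts["TP"] += 1   # predicted positive, actually positive
        elif a == 0 and p == 0:
            counts["TN"] += 1   # predicted negative, actually negative
        elif a == 0 and p == 1:
            counts["FP"] += 1   # predicted positive, actually negative (Type I error)
        elif a == 1 and p == 0:
            counts["FN"] += 1   # predicted negative, actually positive (Type II error)
        else:
            raise ValueError(f"labels must be 0 or 1, got actual={a}, predicted={p}")
    return counts


# Hypothetical Covid test results for six patients (1 = Covid positive, 0 = negative)
actual    = [1, 0, 1, 1, 0, 0]
predicted = [1, 0, 0, 1, 1, 0]
print(confusion_counts(actual, predicted))
# {'TP': 2, 'TN': 2, 'FP': 1, 'FN': 1}
```

Each pair of actual and predicted values falls into exactly one of the four cells, so the four counts always add up to the number of tests.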
Using the number of True Positives, False Positives, True Negatives and False Negatives out of the total tests conducted, a Confusion Matrix is prepared. It takes the format shown below:

Table 7.1 : Confusion Matrix format

                          Predicted: Positive               Predicted: Negative
Actual: Positive (1)      No. of True Positives (TP)        No. of False Negatives (FN) (Type II error)
Actual: Negative (0)      No. of False Positives (FP)       No. of True Negatives (TN)
                          (Type I error)

♦ When both the predicted and actual values match, they are True Positives or True Negatives.

Note
A Confusion Matrix is a technique using a chart or table (N × N matrix) for summarizing the performance of a classification based AI model by listing the predicted values of an AI model and the actual/correct outcome values.

Using the confusion matrices, you need to compute the following values to evaluate an AI model:
♦ Accuracy rate. This is the percentage of predictions, out of all the observations, that are correct.
♦ Precision rate. This is the rate at which the positive predictions turn out to be correct (True Positives out of all predicted positives).
♦ Recall. It is the ratio of correct positive predictions to the overall number of positive instances in the dataset.
♦ F1 score. It is a measure of balance between precision and recall.
We shall discuss these evaluation metrics in detail, with formulas, in the coming section with the help of an example.

7.3.2 Evaluation Metrics using Confusion Matrix
Let us now learn to compute evaluation metrics with the help of the example case given below:

Example Case 1
Let us assume that you developed an AI model that tests pooled specimens (blood/urine/mucus/cell tissues etc.) to diagnose some ailment (say Covid). After training it with a sample collection of specimens whose accurate results were known to you, you are now ready to test and evaluate your AI model. For your AI model, you conduct about 630 tests, and the confusion matrix with these 630 test results looked like:

Table 7.2 Confusion Matrix for an AI model being evaluated after 630 tests

                      Predicted: Positive     Predicted: Negative
Actual: Positive      110 (TP)                50 (FN)
Actual: Negative      60 (FP)                 410 (TN)

It means, out of N = 630 tests:
True Positives (TP) = 110
False Positives (FP) = 60
False Negatives (FN) = 50
True Negatives (TN) = 410
Total = 630

Let us now understand each of the evaluation metrics mentioned in the previous section and learn how to calculate it.

1. Accuracy
A prediction by an AI model is considered correct only when the predicted result (the outcome) matches the actual value (the reality). Accuracy of an AI model is determined as the percentage of correct predictions (all True cases, i.e., TP + TN) out of all the observations or tests conducted (N, i.e., TP + FP + TN + FN).
The formula to determine Accuracy is:
$$\text{Accuracy} = \frac{\text{Number of correct predictions } (TP + TN)}{\text{Total number of predictions made } (TP + TN + FP + FN)} \times 100\%$$
Thus, the accuracy for the AI model, as per the Confusion Matrix given in Table 7.2, would be:
$$\text{Accuracy} = \frac{110\,(TP) + 410\,(TN)}{110\,(TP) + 50\,(FN) + 60\,(FP) + 410\,(TN)} \times 100\% = \frac{520}{630} \times 100\% = 82.5396825\%$$
Thus, the Accuracy of our sample AI model is 82.5396825%.
The accuracy metric by itself is not sufficient to determine the efficiency of an AI model, as it can be misleading when the data is not balanced (i.e., one class far outnumbers the other). Thus, other metrics must also be taken into account to evaluate an AI model.

Note
The accuracy metric by itself is not sufficient to determine the efficiency of an AI model, as it may mislead when the data is not balanced (known as the Accuracy Paradox).
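As a quick check of the arithmetic, the sketch below computes accuracy from the four counts of Table 7.2. It also illustrates the accuracy paradox with a hypothetical, heavily imbalanced test set: 1000 specimens, only 10 of them positive, and a model that always answers "negative". Those numbers are invented for illustration, and the function name `accuracy` is an assumption.

```python
def accuracy(tp, tn, fp, fn):
    """Accuracy = (TP + TN) / (TP + TN + FP + FN), returned as a fraction of 1."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("cannot compute accuracy with zero predictions")
    return (tp + tn) / total


# Example Case 1 (Table 7.2): TP = 110, TN = 410, FP = 60, FN = 50
print(f"Accuracy = {accuracy(110, 410, 60, 50) * 100:.7f}%")
# Accuracy = 82.5396825%

# Hypothetical imbalanced data: 10 positives among 1000 specimens, and a
# "model" that predicts negative every time, so TP = 0 and FN = 10.
print(accuracy(tp=0, tn=990, fp=0, fn=10))
# 0.99
```

The second model scores 99% accuracy even though it never detects a single positive case. This is why other metrics must be considered alongside accuracy.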
2. Precision
Precision refers to how accurate the model's positive predictions are: out of all its positive predictions, how many are truly positive. In other words, it is the percentage of True Positive cases out of all the cases where the prediction is positive (True Positives and False Positives). A model with high precision is considered trustworthy.
Thus, the formula for Precision rate is:
$$\text{Precision} = \frac{TP}{TP + FP}$$
In percentage,
$$\text{Precision rate} = \frac{TP}{TP + FP} \times 100\%$$
Thus, precision for our sample AI model as per Table 7.2's confusion matrix is:
$$\text{Precision} = \frac{110}{110 + 60} \times 100\% = \frac{110}{170} \times 100\% = 64.7058824\%$$
Thus, the precision rate for our sample AI model is 0.64706; in percentage, 64.7058824%.

3. Recall
Recall indicates, out of all actually positive values, how many are predicted positive. Recall measures the fraction of positive cases that are correctly identified. It is the ratio of correct positive predictions to the overall number of positive instances in the dataset.
The formula to compute Recall is:
$$\text{Recall} = \frac{\text{Predictions actually positive}}{\text{Actual positive values in the dataset}} = \frac{TP}{TP + FN}$$
In percentage,
$$\text{Recall} = \frac{TP}{TP + FN} \times 100\%$$
For our example case given above (Table 7.2),
$$\text{Recall} = \frac{110}{110 + 50} \times 100\% = \frac{110}{160} \times 100\% = 68.75\%$$
Thus, the Recall for our sample AI model is 0.6875; in percentage, 68.75%.

As we can see, the Precision (≈ 64.71%) and Recall (≈ 68.75%) are both lower than the Accuracy (≈ 82.54%) for our example case.

Significance of Precision or Recall Metrics
Carefully examining the formulas of Precision and Recall, you will observe that Precision counts the False Positives while Recall takes False Negatives into consideration. How is this information useful? Precision is used as a metric when our objective is to minimize false positives, and Recall is used when the objective is to minimize false negatives. Let us try to understand this statement. Read on.

Depending upon the type and use of the model, False Positives or False Negatives may result in high costs and risk. Consider the examples given below:
♦ For an AI model developed to check for gas leakage in a manufacturing unit, False Negatives (i.e., it shows no gas leakage while in reality there is a leak) will be highly risky and may cost lives and money.
♦ For an AI model developed to locate the position of a lump not visible through normal diagnosis, a False Positive for a body part may lead to a wrong operation/surgery, and the patient's life may be in danger.
♦ For an AI model developed to test if a customer can pay back their loan, considering various parameters before approving their loan application, a False Negative will result in turning down the application of a valid and rightful applicant.
♦ For an AI model that predicts the onset of calamities like flood danger etc. for a Dam management group, a False Negative will result in huge losses of money, resources and human lives.

To minimise such cases, Precision and Recall metrics are used. Precision is used as a metric when our objective is to minimize false positives, and Recall is used when the objective is to minimize false negatives.

[Figure 7.1 Precision vs. Recall: relevant elements (true positives, false negatives) and selected elements (true positives, false positives), with true negatives outside both. Precision asks "How many selected items are relevant?" and Recall asks "How many relevant items are selected?"]

However, for a given model we usually cannot increase both these metrics simultaneously. Tuning the model to raise one of them typically lowers the other, so there is a trade-off between them. Thus, we have to decide which is more important in our situation, depending on our case/problem.
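The two formulas can be written as small Python functions and checked against Example Case 1. The second half is a hypothetical illustration of the trade-off between the two metrics. It assumes the model outputs a score between 0 and 1 and declares "positive" when the score reaches a chosen threshold. The scores, labels and thresholds are invented, and returning 0.0 when a denominator is zero is a convention assumed here.

```python
def precision(tp, fp):
    """Precision = TP / (TP + FP); 0.0 when there are no positive predictions (assumed convention)."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """Recall = TP / (TP + FN); 0.0 when there are no actual positives (assumed convention)."""
    return tp / (tp + fn) if (tp + fn) else 0.0


# Example Case 1 (Table 7.2): TP = 110, FP = 60, FN = 50
print(f"Precision = {precision(110, 60):.5f}")   # Precision = 0.64706
print(f"Recall    = {recall(110, 50):.4f}")      # Recall    = 0.6875

# Hypothetical model scores (higher = more likely positive) and true labels
scores = [0.95, 0.85, 0.75, 0.65, 0.55, 0.45, 0.35, 0.25]
labels = [1,    1,    0,    1,    0,    1,    0,    0]


def precision_recall_at(threshold):
    """Predict positive when score >= threshold, then compute precision and recall."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    return precision(tp, fp), recall(tp, fn)


for t in (0.3, 0.5, 0.8):
    p, r = precision_recall_at(t)
    print(f"threshold {t}: precision = {p:.3f}, recall = {r:.3f}")
# threshold 0.3: precision = 0.571, recall = 1.000
# threshold 0.5: precision = 0.600, recall = 0.750
# threshold 0.8: precision = 1.000, recall = 0.500
```

With these made-up scores, raising the threshold from 0.3 to 0.8 lifts precision from 0.571 to 1.000, while recall drops from 1.000 to 0.500.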
Once we decide which metric is more important in our case, we can then optimize model performance on the selected metric. When both matter, another metric, the F1 Score, is useful, as it strikes a balance between Precision and Recall values.

4. F1 Score (F-Measure)
When avoiding both False Positives and False Negatives is equally important for our problem, we need a trade-off between Precision and Recall, which the F1 Score metric provides. F1 Score refers to a metric that balances Precision and Recall and hence balances the impact of False Positives and False Negatives. It is computed as per the following formula:
$$F_1 = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} = \frac{TP}{TP + \frac{1}{2}(FP + FN)}$$
So the F1 score for our Example Case 1 (confusion matrix in Table 7.2) will be:
$$F_1 = \frac{110}{110 + \frac{1}{2}(60 + 50)} = \frac{110}{165} = 0.66666666666$$
So, for our example case, the F1 score is 0.6666666.

In an ideal situation, both Precision and Recall will be 100% (i.e., a value of 1). In that case, the F1 score would also be an ideal 1 (100%), known as the perfect value for the F1 Score. The metrics Precision, Recall and F1 score range from 0 to 1.

How Precision and Recall impact the F1 score is listed in the table below:

Precision    Recall    F1 Score
Low          Low       Low
Low          High      Low
High         Low       Low
High         High      High

The F1 score combines the two metrics, Precision and Recall, of a model into one. A high F1 score indicates a high value for both Recall and Precision. The F1 score is very useful when we need to compare two or more classification AI models for the same data: we opt for the AI model whose F1 score is higher.

Let us now compute these metric values for another example case.

Example Case 2
For an AI model developed to check if a painting is authentic or not, its confusion matrix is given below (0 means False and 1 means True).

N = 192          Predicted: 0    Predicted: 1
Actual: 0        118             12
Actual: 1        47              15

As per the given confusion matrix:
True Positives (TP) = 15
True Negatives (TN) = 118
False Positives (FP) = 12
False Negatives (FN) = 47

The various evaluation metrics are calculated as:
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} = \frac{15 + 118}{15 + 118 + 12 + 47} = \frac{133}{192} \approx 0.6927083 = 69.27083\%$$
$$\text{Precision} = \frac{TP}{TP + FP} = \frac{15}{15 + 12} = 0.55555555555$$
$$\text{Recall} = \frac{TP}{TP + FN} = \frac{15}{15 + 47} = 0.24193548387$$
$$F_1 = \frac{TP}{TP + \frac{1}{2}(FP + FN)} = \frac{15}{15 + \frac{1}{2}(12 + 47)} \approx 0.33707865168$$
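Both forms of the F1 formula give the same value, since 2PR/(P + R) simplifies to TP / (TP + ½(FP + FN)). The sketch below uses the second form and gathers all four metrics for Example Case 2, so the hand calculations can be compared against it. The function names `f1_score` and `evaluate` are hypothetical, and returning 0.0 for an undefined ratio is an assumed convention.

```python
def f1_score(tp, fp, fn):
    """F1 = TP / (TP + (FP + FN) / 2), the same value as 2PR / (P + R)."""
    denom = tp + (fp + fn) / 2
    return tp / denom if denom else 0.0   # 0.0 when undefined: an assumed convention


def evaluate(tp, tn, fp, fn):
    """Return accuracy, precision, recall and F1 for one binary confusion matrix."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("empty confusion matrix")
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
        "f1": f1_score(tp, fp, fn),
    }


# Example Case 1 (Table 7.2)
print(round(f1_score(tp=110, fp=60, fn=50), 4))   # 0.6667

# Example Case 2 (painting authenticity, N = 192)
for name, value in evaluate(tp=15, tn=118, fp=12, fn=47).items():
    print(f"{name:>9}: {value:.4f}")
#  accuracy: 0.6927
# precision: 0.5556
#    recall: 0.2419
#        f1: 0.3371
```

An accuracy of about 69% looks acceptable on its own. However, the recall of about 0.24 shows that the model misses most authentic (positive) paintings, and the low F1 score reflects that.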
