
Sri Sri Sri Dr. Nirmalanandanatha Maha Swamiji
President, Sri Adichunchanagiri Shikshana Trust ®

Padmabhushana Sri Sri Sri Dr. Balagangadharanath Maha Swamiji

Sri Sri Dr. Prakashanatha Swamiji
Managing Director, BGS & SJB Group of Institutions & Hospitals

Prof. B. G. Sangameshwara, ME., Ph.D, Advisor, SJBIT
Dr. K. V. Mahendra Prashanth, ME., Ph.D, Principal


Artificial Intelligence & Machine Learning
5th Semester, CSE, 21CS54

Bhaktavatsala Shivaram
Adjunct Faculty
Module 2
Informed Search Strategies
• Greedy best-first search
• A* search
• Heuristic function

Machine Learning
• Introduction
• Understanding Data



Module 2: Informed Search (Background)

INFORMED SEARCH                                  UNINFORMED SEARCH
• Search with information                        • Search without information
• Uses knowledge to find steps to the solution   • No knowledge
• Quick solution                                 • Time consuming
• Less complexity                                • More complexity (time, space)
• E.g. A*, heuristic DFS, best-first search      • E.g. depth-first search, breadth-first search
• Also called "heuristic" search                 • Also called "blind" search


Module 2: Informed Search (Background)

• An informed search algorithm has knowledge such as how far we are from the goal, the path cost, and how to reach the goal node. This knowledge helps the agent explore less of the search space and find the goal node more efficiently.
• The informed search algorithm is more useful for large search spaces.
• Because it uses the idea of a heuristic, an informed search algorithm is also called a heuristic search.
Module 2: Informed Search Strategies – Greedy Best-First Search

Greedy Best-First Search:
• An AI search algorithm that attempts to find the most promising path from a given starting point to a goal.
• It prioritizes paths that appear to be the most promising, regardless of whether or not they are actually the shortest.
• The algorithm works by evaluating the estimated cost of each frontier node and expanding the node with the lowest estimate.
• This process is repeated until the goal is reached.

Greedy Best-First Search algorithm:
• Uses a heuristic function to determine which path is the most promising.
• The heuristic function estimates the remaining cost from a node to the goal.
• The node whose estimated remaining cost is lowest is chosen for expansion.
• This process is repeated until the goal is reached.
Module 2: Informed Search Strategies – Greedy Best-First Search

• Always selects the path that appears best at the moment.
• It is a combination of depth-first search and breadth-first search.
• It uses a heuristic function to guide the search.
• With best-first search, at each step we can choose the most promising node.
• In the best-first search algorithm, we expand the node that is closest to the goal node, where the minimum cost is estimated by the heuristic function.

Algorithm (see the sketch below)
• Create a priority queue Y containing the initial state
Loop
  • If Y is empty, return FAIL
  • Else Node <- Remove-First(Y)
    • If Node = Goal, return the path from the initial state
    • Else generate all successors of Node
      and insert the newly generated nodes into Y ordered by heuristic value
End Loop
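A minimal Python sketch of this loop, using heapq as the priority queue Y. The toy graph below loosely mirrors Greedy Search Example 4 later in this module; the edges that are not legible in that figure are our own assumptions, so treat the data as illustrative only:

    import heapq

    def greedy_best_first(graph, h, start, goal):
        """Greedy best-first search: always expand the node with the smallest h(n).
        graph maps node -> list of (neighbor, edge_cost); costs are ignored here,
        since the evaluation function is f(n) = h(n)."""
        open_list = [(h[start], start, [start])]          # priority queue Y
        closed = set()
        while open_list:                                  # Loop
            _, node, path = heapq.heappop(open_list)      # Remove-First(Y)
            if node == goal:
                return path                               # path from initial state
            if node in closed:
                continue
            closed.add(node)
            for succ, _cost in graph[node]:               # generate all successors
                if succ not in closed:
                    heapq.heappush(open_list, (h[succ], succ, path + [succ]))
        return None                                       # Y empty -> FAIL

    # Toy graph loosely based on Example 4; several edges are assumptions.
    graph = {'S': [('A', 3), ('B', 2)], 'A': [('C', 4)], 'B': [('E', 4), ('F', 1)],
             'C': [], 'E': [('H', 2)], 'F': [('G', 3), ('I', 3)],
             'G': [], 'H': [], 'I': []}
    h = {'S': 13, 'A': 12, 'B': 4, 'C': 7, 'E': 8, 'F': 2, 'G': 0, 'H': 4, 'I': 9}
    print(greedy_best_first(graph, h, 'S', 'G'))          # -> ['S', 'B', 'F', 'G']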
Module 2: Informed Search Strategies – Greedy Best-First Search

• The evaluation function is f(n) = h(n), where h(n) = estimated cost from node n to the goal.
• Greedy search ignores the cost of the path that has already been traversed to reach n.
• Therefore, the solution given is not necessarily optimal.
Module 2: Informed Search Strategies – Greedy Best-First Search

Greedy Search Example 1
f(n) = h(n) = straight-line-distance heuristic

[Graph figure: nodes A, B, C, D, E, F, G, H, I; edge costs A-B 75, A-C 118, A-E 140, E-F 99, E-G 80, F-I 211, G-H 97, H-I 101, plus an edge of cost 111 at D]

Step   Open            Closed
1.     [A]             []
2.     [E, C, B]       [A]
3.     [F, G, C, B]    [A, E]
4.     [I, G, C, B]    [A, E, F]
5.     [G, C, B]       [A, E, F, I]

Traverse path = A -> E -> F -> I
Heuristic distance (A-E-F-I) = 253 + 178 + 0 = 431
Path cost (A-E-F-I) = 140 + 99 + 211 = 450
Module 2: Informed Search Strategies – Greedy Best-First Search

Greedy Search Example 2
f(n) = h(n) = straight-line-distance heuristic

[Graph figure: a variant of the Example 1 graph with modified heuristic values (D is marked with h = 220)]

Step   Open          Closed
1.     [A]           []
2.     [C, E, B]     [A]
3.     [D, E, B]     [A, C]
...
INFINITE LOOP – without checking for repeated states, the search oscillates and never reaches the goal.
Module 2: Informed Search Strategies – Greedy Best-First Search

• Greedy best-first search can start down an infinite path and never return to try other possibilities; it is incomplete.
• Because of its greediness, the search can make choices that lead to a dead end; then one backs up in the search tree to the deepest unexpanded node.
  ▪ Greedy best-first search resembles depth-first search in the way it prefers to follow a single path all the way to the goal, but it will back up when it hits a dead end.
  ▪ The quality of the heuristic function determines the practical usability of greedy search.
Module 2: Informed Search Strategies – Greedy Best-First Search

Greedy Search Example 3

[Graph figure: nodes A, B, C, D, E, F, G, H; edge costs 7, 11, 14, 18, 25, 10, 15, 8, 20, 9, 10; in particular A-C = 14, C-F = 10, F-G = 20]

Straight-line distances, given (say):
A -> G = 40    B -> G = 32    C -> G = 25    D -> G = 35
E -> G = 19    F -> G = 17    G -> G = 0     H -> G = 10

Step   Open            Closed
1.     [A]             []
2.     [C, B, D]       [A]
3.     [F, E, B, D]    [A, C]
4.     [G, E, B, D]    [A, C, F]
5.     [E, B, D]       [A, C, F, G]

Traverse path = A -> C -> F -> G
Path cost (A-C-F-G) = 14 + 10 + 20 = 44
Heuristic distance (A-C-F-G) = 40 + 25 + 17 + 0 = 82
Module 2: Informed Search Strategies – Greedy Best-First Search

Greedy Search Example 4

[Graph figure: nodes S, A, B, C, D, E, F, G, H, I; edge costs include S-A 3, S-B 2, B-F 1, F-G 3]

Heuristic values h(n):
S -> 13    A -> 12    B -> 4    C -> 7    D -> 3
E -> 8     F -> 2     G -> 0    H -> 4    I -> 9

Step   Open            Closed
1.     [S]             []
2.     [B, A]          [S]
3.     [F, E, A]       [S, B]
4.     [G, I, E, A]    [S, B, F]
5.     [I, E, A]       [S, B, F, G]

Traverse path = S -> B -> F -> G
Path cost (S-B-F-G) = 2 + 1 + 3 = 6
Heuristic distance (S-B-F-G) = 13 + 4 + 2 + 0 = 19
Module 2: Informed Search Strategies – Greedy Best-First Search

• Greedy search is not optimal.
• Greedy search is incomplete without systematic checking of repeated states.
• In the worst case, the time and space complexity of greedy search are both O(b^m),
  where b is the branching factor and m is the maximum path length.
Module 2: Informed Search Strategies – Greedy Best-First Search

Advantages:
• Simple and easy to implement
• Fast and efficient
• Low memory requirements
• Flexible

Disadvantages:
• Inaccurate results
• Can get trapped in local optima
• Depends heavily on the quality of the heuristic function
• Lack of completeness
Module 2: Informed Search Strategies – Greedy Best-First Search

A few applications of Greedy Best-First Search:
Pathfinding: used to find the shortest path between two points in a graph; applied in video games, robotics, and navigation systems.
Machine Learning: used in machine learning algorithms to find the most promising path through a search space.
Optimization: used to optimize the parameters of a system in order to achieve the desired result.
Game AI: used in game AI to evaluate potential moves and choose the best one.
Navigation: used to find the shortest path between two locations.
Natural Language Processing: used in tasks such as language translation or speech recognition to generate the most likely sequence of words.
Image Processing: used to segment an image into regions of interest.
Module 2: Informed Search Strategies – A* Search

A* Search
• Greedy best-first search minimizes a heuristic h(n), the estimated cost from the current state n to the goal state.
• Greedy best-first search is efficient, but it is neither optimal nor complete.
• Uniform-cost search minimizes the cost g(n) from the initial state to the current state n.
• Uniform-cost search is optimal and complete, but not efficient.

A* SEARCH:
It combines greedy best-first search and uniform-cost search to get an efficient algorithm that is complete and optimal.


Module 2: Informed Search Strategies – A* Search

• A* evaluates nodes by combining g(n), the cost to reach the node, and h(n), the estimated cost to get from the node to the goal:
      f(n) = g(n) + h(n)
• f(n) is the evaluation function, which estimates the cheapest solution cost through n.
• g(n) is the exact cost to reach node n from the initial state.
• h(n) is an estimate of the cost from the current state n to the goal.

Algorithm (see the sketch below)
i.   Enter the starting node in the OPEN list.
ii.  If the OPEN list is empty, return FAIL.
iii. Take from the OPEN list the node with the smallest value of (g + h).
     a. If the node is the goal, return SUCCESS.
iv.  Expand node n and generate all its successors.
     a. Compute (g + h) for each successor.
v.   If node n is already on the OPEN/CLOSED list, attach it to the back-pointer of the better path.
vi.  Continue from step iii.
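A minimal Python sketch of this algorithm, again using heapq; the function and variable names are ours, not from the slides. A usage check against Example 1 follows that worked example below.

    import heapq

    def a_star(graph, h, start, goal):
        """A* search: expand the node with the smallest f(n) = g(n) + h(n).
        graph maps node -> list of (neighbor, edge_cost)."""
        open_list = [(h[start], 0, start, [start])]       # entries: (f, g, node, path)
        best_g = {start: 0}                                # cheapest g found per node
        while open_list:
            f, g, node, path = heapq.heappop(open_list)   # smallest (g + h) first
            if node == goal:
                return path, g                            # solution path and its cost
            for succ, cost in graph[node]:                # expand node n
                g2 = g + cost
                if g2 < best_g.get(succ, float('inf')):   # better path to successor
                    best_g[succ] = g2                     # (back-pointer update)
                    heapq.heappush(open_list,
                                   (g2 + h[succ], g2, succ, path + [succ]))
        return None                                       # OPEN list empty -> FAIL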
Module 2: Informed Search Strategies – A* Search

A* Search Example 1
f(n) = g(n) + h(n)

[Graph figure: same graph as Greedy Search Example 1]

Expanding A:
f(C) = g(C) + h(C) = 118 + 329 = 447
f(E) = g(E) + h(E) = 140 + 253 = 393
f(B) = g(B) + h(B) = 75 + 374 = 449


Module 2: Informed Search Strategies – A* Search

A* Search Example 1 (continued)

Expanding E, the node with the smallest f so far (f(E) = 393):
f(G) = g(G) + h(G) = (140 + 80) + 193 = 413
f(F) = g(F) + h(F) = (140 + 99) + 178 = 417


Module 2: Informed Search Strategies – A* Search

A* Search Example 1 (continued)

Expanding G (f(G) = 413):
f(H) = g(H) + h(H) = (140 + 80 + 97) + 98 = 415


Module 2: Informed Search Strategies – A* Search

A* Search Example 1 (continued)

Expanding H (f(H) = 415):
f(I) = g(I) + h(I) = (140 + 80 + 97 + 101) + 0 = 418

Traverse path = A -> E -> G -> H -> I (cost 418)
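As a check, running the a_star sketch given after the algorithm on the graph implied by the f(n) calculations above (edge endpoints are inferred from those sums; h(A) is not stated on the slides and does not affect the result) reproduces the same path:

    # Edges and h-values reconstructed from the f(n) arithmetic above.
    graph = {'A': [('B', 75), ('C', 118), ('E', 140)],
             'B': [], 'C': [],
             'E': [('G', 80), ('F', 99)],
             'G': [('H', 97)], 'F': [('I', 211)],
             'H': [('I', 101)], 'I': []}
    h = {'A': 0, 'B': 374, 'C': 329, 'E': 253, 'F': 178, 'G': 193, 'H': 98, 'I': 0}
    print(a_star(graph, h, 'A', 'I'))   # -> (['A', 'E', 'G', 'H', 'I'], 418)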


Module 2: Informed Search Strategies – A* Search

A* with this f(n) is not admissible when h(n) overestimates the cost to reach the goal state.


Module 2: Informed Search Strategies – A* Search

A* Search Example 2

[Graph figure: same graph as Greedy Search Example 3]

Straight-line distances, given (say):
A -> G = 40    B -> G = 32    C -> G = 25    D -> G = 35
E -> G = 19    F -> G = 17    G -> G = 0     H -> G = 10

Traverse path = ?


Module 2: Informed Search Strategies – A* Search

A* Search Example 3

[Graph figure: same graph as Greedy Search Example 4]

Heuristic values h(n):
S -> 13    A -> 12    B -> 4    C -> 7    D -> 3
E -> 8     F -> 2     G -> 0    H -> 4    I -> 9

Traverse path = ?


Module 2: Informed Search Strategies – A* Search

A* Search Example 4

[Search-tree figure: nodes a to m, with start a and goal m]

Start: Arad
Goal node: Bucharest
Traverse path = ?


Module 2: Informed Search Strategies – A* Search

Advantages:
• One of the best searching algorithms
• Optimal and complete
• Can solve complex problems

Disadvantages:
• Does not always produce the shortest path, since it relies on the heuristic estimate
• Complexity issues
• Requires a lot of memory


Module 2: Informed Search – Heuristic Function

Heuristic Function
• It takes the current state of the agent as its input and produces an estimate of how close the agent is to the goal.
• The heuristic method might not always give the best solution, but it is guaranteed to find a good solution in reasonable time.
• A heuristic function estimates how close a state is to the goal.
• It is represented by h(n), and it estimates the cost of an optimal path between the pair of states.
• The value of the heuristic function is always positive.

Admissibility of the heuristic function is given as:
    h(n) <= h*(n)
where h(n) is the heuristic (estimated) cost and h*(n) is the actual optimal cost.
The heuristic cost should be less than or equal to the actual cost.
Module 2: Heuristic Search and Heuristic Function

Heuristic search and heuristic functions are used in informed search.

• Heuristic search is a simple searching technique that tries to optimize a problem using a heuristic function.
• Optimization means solving the problem in the minimum number of steps or at minimum cost.


Module 2: Heuristic Function

Heuristic function
▪ A function h(n) that gives an estimate of the cost of getting from node n to the goal state.
▪ It helps in selecting the optimal node for expansion.

[Figure: two routes from City A (start) to City B (goal). Route R1: 35 km travelled, estimated 175 km remaining, so h = 175. Route R2: 20 km travelled, estimated 215 km remaining, so h = 215.]
Module 2: Heuristic Function

Types of Heuristic

Admissible
• Never overestimates the cost of reaching the goal.
• h(n) is always less than or equal to the actual cost of the lowest-cost path from node n to the goal.

Non-Admissible
• Overestimates.
• h(n) is greater than the actual cost of the lowest-cost path from node n to the goal.
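Admissibility can be checked numerically by comparing h(n) with the true optimal cost h*(n), computed here by Dijkstra's algorithm run backwards from the goal. This helper is our own sketch, not something defined on the slides:

    import heapq

    def true_costs_to_goal(graph, goal):
        """h*(n): optimal cost from each node to the goal (Dijkstra, run
        backwards over the reversed graph, starting at the goal)."""
        rev = {n: [] for n in graph}                      # reverse adjacency
        for n, edges in graph.items():
            for m, c in edges:
                rev.setdefault(m, []).append((n, c))
        dist, pq = {goal: 0}, [(0, goal)]
        while pq:
            d, n = heapq.heappop(pq)
            if d > dist.get(n, float('inf')):
                continue                                  # stale queue entry
            for m, c in rev.get(n, []):
                if d + c < dist.get(m, float('inf')):
                    dist[m] = d + c
                    heapq.heappush(pq, (d + c, m))
        return dist

    def is_admissible(graph, h, goal):
        """True if h never overestimates: h(n) <= h*(n) for every node
        from which the goal is reachable."""
        return all(h.get(n, 0) <= c
                   for n, c in true_costs_to_goal(graph, goal).items())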


Module 2: Machine Learning

Introduction
• Basic concepts
• Relationships with other domains
• Types and applications
Module 2: Machine Learning – Introduction

Data
• The full potential of data is not utilized at most businesses.
• Data is scattered across different archive systems, and organizations are not able to integrate these sources fully.
• There is a lack of awareness about software tools that could help unearth the useful information from data.
• To improve efficiency and productivity, business organizations are now adopting the latest technology: machine learning.

"Machine learning is the field of study that gives computers the ability to learn without being explicitly programmed." – Arthur Samuel
That is, systems should learn by themselves without explicit programming.


Module 2: Machine Learning – Introduction

Popularity of Machine Learning
1. Managing data
   ▪ High volume of data.
   ▪ It is estimated that data approximately doubles every year.
2. Storage of data
   ▪ Storage (hardware) cost has reduced.
   ▪ It is easier now to capture, process, store, distribute, and transmit digital information.
3. Availability of complex algorithms
   ▪ Especially with the advent of deep learning, many algorithms are available for machine learning.
Introduction: From Data to Intelligence

1. Data – raw facts, unformatted information.
2. Information – the result of processing, manipulating and organising data in response to a specific need; information relates to the understanding of the problem domain (patterns, associations, or relationships among data).
3. Knowledge – relates to the understanding of the solution domain – what to do? (historical patterns and future trends).
4. Intelligence – knowledge in operation towards the solution – how to do it? how to apply the solution? (an actionable form of knowledge).
5. Artificial Intelligence (AI) – the study of how we can make computers do things which, at the moment, people do better (the maturity to take better decisions).

Heuristics: mental shortcuts that allow people to solve problems and make judgments quickly and efficiently.
Module 2: Machine Learning

Machine Learning
"Machine learning is the field of study that gives computers the ability to learn without being explicitly programmed." – Arthur Samuel
That is, systems should learn by themselves without explicit programming.

Conventional Programming
• A detailed design of the program, such as a flowchart or an algorithm, needs to be created and converted into a program using a suitable programming language.
• This approach could be difficult for many real-world problems such as puzzles, games, and complex image-recognition applications.

Initially – AI – general-purpose rules
• Develop general-purpose rules manually.
• These rules are then formulated into logic and implemented in a program to create intelligent systems.
• This requires an expert's knowledge; a program built from such a set of rules is called an expert system. E.g. MYCIN.
• Such programs lacked real intelligence.
Module 2: Machine Learning

Machine Learning – data-driven systems
• The focus of AI is to develop intelligent systems using a data-driven approach, where data is used as input to develop intelligent models.
• The models can then be used to make predictions on new inputs.
• Thus, the aim of machine learning is to learn a model or set of rules from the given dataset automatically, so that it can predict unknown data correctly.
Learning System

[Figure: a learned model may take the form of a mathematical equation, relational diagrams like trees/graphs, logical if/else rules, or groupings called clusters]


Module 2: Machine Learning

Models
• A machine learning model can be understood as a program that has been trained to find patterns within new data and make predictions.
• A model can be a formula, a procedure, or a representation that can generate data decisions.
• The model is generated automatically from the given data.

Pattern: local, and applicable only to certain attributes.
Model: global, and fits the entire dataset.
Module 2: Machine Learning

Models
• Models in computer systems are the equivalent of human experience.
• Experience is based on data.
• Humans gain experience by various means:
  • They gain knowledge by rote learning.
  • They observe others and imitate them.
  • They gain a lot of knowledge from teachers and books.
  • They learn many things by trial and error.
• Once knowledge is gained, when a new problem is encountered, humans search for similar past situations, then formulate heuristics and use them for prediction.
Module 2: Machine Learning

Models – Experience
• Collection of data.
• Once data is gathered, abstract concepts are formed out of that data. Abstraction is used to generate concepts. This is equivalent to a human's idea of objects; for example, we have some idea of how an elephant looks.
• Generalization converts the abstraction into an actionable form of intelligence. It can be viewed as an ordering of all possible concepts. So generalization involves ranking of concepts, inferencing from them, and the formation of heuristics, an actionable aspect of intelligence.
• Heuristics are educated guesses for all tasks. For example, if one runs on encountering a danger, that is the result of human experience, or heuristics formation. In machines, it happens the same way.
• Heuristics normally work! But occasionally they may fail too. That is not the fault of the heuristic, as it is just a 'rule of thumb'. Course correction is done by taking evaluation measures. Evaluation checks the thoroughness of the model and does course correction, if necessary, to generate better formulations.
Module 2: Machine Learning

Relation to other fields
ML uses concepts from:
• Artificial Intelligence
• Data Science (Big Data, Data Mining, Data Analytics, Pattern Recognition)
• Statistics
ML is the result of combined ideas from these diverse fields.
Module 2: Machine Learning – Relation to other fields

Machine Learning and Artificial Intelligence
• ML is a branch of Artificial Intelligence.
• AI aims to develop intelligent agents.
• AI initially focused on logic and logical inference; it had ups and downs (AI winters).
• The resurgence of AI is due to the development of data-driven systems, whose aim is to find the relations and regularities present in data.

Machine Learning
• A sub-branch of AI.
• Aim: extract patterns and make predictions – learning from examples, reinforcement learning.

Deep Learning
• A sub-branch of machine learning.
• Models are constructed using neural-network technology; neural networks are based on models of the human neuron.
Module 2: Machine Learning – Relation to other fields

Machine Learning – Data Science, Data Mining & Data Analytics
• Data science encompasses many fields, and machine learning starts with data, so data science and machine learning are interlinked.
• ML is a branch of data science. Data science deals with the gathering of data for analysis and includes:
  ▪ Big Data
  ▪ Data Mining
  ▪ Data Analytics
  ▪ Pattern Recognition
Module 2: Machine Learning – Relation to other fields

Machine Learning & Data Science
Big Data is a field of data science that deals with data having the following characteristics:
▪ Volume – a huge amount of data
▪ Variety – data in a variety of forms, like images, videos, etc.
▪ Velocity – the speed at which data is generated and processed
➢ Big Data is used in many machine learning algorithms, for applications such as language translation and image recognition.
➢ Big Data influences the growth of deep learning.
Module 2: Machine Learning – Relation to other fields

Machine Learning & Data Mining
• Data mining aims to extract the hidden patterns that are present in the data.
• Machine learning aims to use data mining for prediction.

Machine Learning & Data Analytics
• Data analytics aims to extract useful knowledge from crude data.
• Different types of analytics:
  ▪ Descriptive
  ▪ Diagnostic
  ▪ Predictive
  ▪ Prescriptive
Module 2: Machine Learning – Relation to other fields

Machine Learning & Pattern Recognition
• Pattern recognition is an engineering field. It uses ML algorithms to extract features for pattern analysis and pattern classification.
• Machine learning is closely related to this branch and shares almost all of its algorithms.
Module 2: Machine Learning – Relation to other fields

Machine Learning & Statistics
• Statistics is a branch of mathematics that has a solid theoretical foundation regarding statistical learning.
• It learns from data.
• Statistical methods look for regularity in data, called patterns.
• Statistics requires knowledge of statistical procedures and the guidance of a good statistician.
• Statistical methods are developed in relation to the data being analysed.
• Machine learning makes fewer assumptions and requires less statistical knowledge, but it often requires interaction with various tools to automate the process of learning.
• It is sometimes said that machine learning is just the latest version of 'old statistics'.
Module 2: Machine Learning – Types of Machine Learning

Data
• Data is a raw fact.
• It is represented in the form of a table.
• It can also be referred to as a data point, sample, or example.
• Each row of a table represents a data point.
• Features are attributes or characteristics.
• One important attribute is called the label.


Module 2: Machine Learning – Types of Machine Learning

• Labelled data – examples that have labels.
• Unlabelled data – examples that do not have any labels in the dataset.
Module 2: Machine Learning – Types of Machine Learning

Four types of machine learning:
• Supervised Learning
• Unsupervised Learning
• Semi-supervised Learning
• Reinforcement Learning


Module 2: Machine Learning – Types of Machine Learning

Supervised Learning
• Uses a labelled dataset: a supervisor provides labelled data so that a model can be constructed, and the model is then evaluated on test data.
• In a supervised learning algorithm, learning takes place in two stages:
  • 1st stage – teaching, with no knowledge yet of whether the information has been understood.
  • 2nd stage – asking a set of questions to find out how much information has been grasped.
• Supervised learning has two methods:
  1. Classification
  2. Regression
Module 2: Machine Learning – Types of Machine Learning

Supervised Learning – Classification
• The input attributes of a classification algorithm are called independent variables.
• The target attribute is called the label or dependent variable.
• The relationship between the input and target variables is represented in the form of a structure called a classification model.
• The focus of classification is to predict the 'label', which is in a discrete form (a value from a set of finite values).


Module 2: Machine Learning – Types of Machine Learning

Supervised Learning – Classification
The classification process has two stages:
• During the first stage, called the training stage, the learning algorithm takes a labelled dataset and starts learning. After the training-set samples are processed, the model is generated.
• In the second stage, the constructed model is tested with a test or unknown sample, and a label is assigned to it.
Module 2: Machine Learning – Types of Machine Learning

Supervised Learning – Classification
Some of the key algorithms for classification are:
• Decision Tree
• Random Forest
• Support Vector Machines
• Naïve Bayes
• Artificial Neural Networks and deep learning networks like CNNs
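As a minimal sketch of the first algorithm in this list, here is a scikit-learn decision tree trained on a made-up labelled dataset (the features and labels are invented for illustration):

    from sklearn.tree import DecisionTreeClassifier

    # Made-up labelled dataset: features = [age, fever], label = disease yes/no
    X_train = [[21, 0], [36, 1], [50, 1], [25, 0], [60, 1], [30, 0]]
    y_train = ['No', 'Yes', 'Yes', 'No', 'Yes', 'No']

    model = DecisionTreeClassifier()    # training stage: learn from labelled data
    model.fit(X_train, y_train)

    X_test = [[40, 1], [22, 0]]         # testing stage: unknown samples
    print(model.predict(X_test))        # the model assigns a label to each sample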


Module 2: Machine Learning – Types of Machine Learning

Supervised Learning – Regression
• Regression predicts continuous variables, like price.
• The regression model takes an input x and generates a model in the form of a fitted line, y = f(x); here x is the independent variable (one or more attributes) and y is the dependent variable.
• Linear regression takes the training set and tries to fit it with a line: product sales = 0.66 × Week + 0.54.
Module 2: Machine Learning – Types of Machine Learning

Supervised Learning – Regression
• The advantage of this model is that a prediction of product sales (y) can be made for unknown week data (x).
• For example, the prediction for the unknown eighth week can be made by substituting x = 8 into the regression formula, product sales = 0.66 × Week + 0.54, to get y.
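A small sketch of this idea with NumPy: fit a line to made-up weekly sales (numbers chosen to roughly match the slide's fitted line) and predict the eighth week:

    import numpy as np

    # Hypothetical training data: (week, product sales); the numbers are made up
    weeks = np.array([1, 2, 3, 4, 5, 6, 7])
    sales = np.array([1.2, 1.9, 2.5, 3.2, 3.8, 4.5, 5.2])

    slope, intercept = np.polyfit(weeks, sales, deg=1)  # fit y = slope*x + intercept
    print(f"fitted line: sales = {slope:.2f} * Week + {intercept:.2f}")

    # Prediction for the unknown eighth week, as on the slide:
    week = 8
    print("predicted sales for week 8:", round(0.66 * week + 0.54, 2))  # -> 5.82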
Module 2: Machine Learning – Types of Machine Learning

Supervised Learning – Classification vs Regression
• Regression models predict continuous variables such as product price, while
• classification models concentrate on assigning discrete labels such as a class.


Module 2: Machine Learning – Types of Machine Learning

Unsupervised Learning
• Learning by self-instruction; there is no supervisor.
• Self-instruction is based on the concept of trial and error.
• The program is supplied with objects, but no labels are defined.
• The algorithm itself observes the examples and recognizes patterns based on the principles of grouping: grouping is done so that similar objects form the same group.
• Cluster analysis and dimensionality reduction algorithms are examples of unsupervised algorithms.


Module 2: Machine Learning – Types of Machine Learning

Unsupervised Learning – Cluster Analysis
• It aims to group objects into disjoint clusters or groups.
• Cluster analysis clusters objects based on their attributes.
• All the data objects of a partition are similar in some aspect and vary significantly from the data objects in the other partitions.
Examples:
• segmentation of a region of interest in an image,
• detection of abnormal growth in a medical image,
• determining clusters of signatures in a gene database.


Module 2: Machine Learning – Types of Machine Learning

Unsupervised Learning – Cluster Analysis
[Figure: a clustering algorithm takes a set of dog and cat images and groups them into two clusters, dogs and cats]
• The samples belonging to a cluster are similar, while samples differ radically across clusters.
• Some of the key clustering algorithms are:
  • k-means algorithm
  • hierarchical algorithms
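A minimal sketch of the first of these, k-means, using scikit-learn on made-up 2-D points (two well-separated blobs standing in for the dog and cat image features):

    import numpy as np
    from sklearn.cluster import KMeans

    # Made-up 2-D feature vectors forming two blobs (stand-ins for image features)
    rng = np.random.default_rng(0)
    blob_a = rng.normal(loc=[0, 0], scale=0.5, size=(20, 2))
    blob_b = rng.normal(loc=[5, 5], scale=0.5, size=(20, 2))
    X = np.vstack([blob_a, blob_b])

    kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
    labels = kmeans.fit_predict(X)        # each sample is assigned a cluster id
    print(labels)                         # e.g. twenty 0s followed by twenty 1s
    print(kmeans.cluster_centers_)        # the two learned cluster centres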
Module 2: Machine Learning – Types of Machine Learning

Unsupervised Learning – Dimensionality Reduction
• It takes higher-dimensional data as input and outputs the data in a lower dimension, by taking advantage of the variance of the data.
• It is the task of reducing the dataset to fewer features without losing generality.
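A short sketch of one common dimensionality reduction technique, principal component analysis (PCA), using scikit-learn on made-up 3-D data; PCA is our chosen example here, as the slides do not name a specific algorithm. PCA keeps the directions of largest variance:

    import numpy as np
    from sklearn.decomposition import PCA

    # Made-up 3-D data in which the third feature carries little variance
    rng = np.random.default_rng(1)
    X = rng.normal(size=(100, 2))
    X3 = np.column_stack([X, 0.01 * rng.normal(size=100)])  # near-flat 3rd axis

    pca = PCA(n_components=2)             # keep the two highest-variance axes
    X_reduced = pca.fit_transform(X3)     # 100 x 3 -> 100 x 2
    print(X_reduced.shape)                # (100, 2)
    print(pca.explained_variance_ratio_)  # ~all variance kept in 2 components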


Module 2: Machine Learning – Types of Machine Learning

Supervised Learning vs Unsupervised Learning
[Comparison table: supervised learning uses labelled data under a supervisor; unsupervised learning groups unlabelled data by itself]


Module 2: Machine Learning – Types of Machine Learning

Semi-supervised Learning
• There are circumstances where the dataset has a huge collection of unlabelled data and some labelled data.
• Labelling is a costly process and difficult for humans to perform.
• Semi-supervised algorithms use the unlabelled data by assigning each example a pseudo-label.
• Then the labelled and pseudo-labelled datasets can be combined.
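A minimal sketch of this pseudo-labelling idea with scikit-learn; the data and the labelled/unlabelled split are contrived for illustration:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(2)
    # Contrived data: two Gaussian classes in 2-D
    X = np.vstack([rng.normal([0, 0], 1, (50, 2)), rng.normal([4, 4], 1, (50, 2))])
    y = np.array([0] * 50 + [1] * 50)

    # Pretend only 10 examples are labelled; the rest are unlabelled
    labelled = rng.choice(100, size=10, replace=False)
    unlabelled = np.setdiff1d(np.arange(100), labelled)

    model = LogisticRegression().fit(X[labelled], y[labelled])  # train on labelled
    pseudo = model.predict(X[unlabelled])                       # assign pseudo-labels

    # Combine the labelled and pseudo-labelled data, then retrain
    X_all = np.vstack([X[labelled], X[unlabelled]])
    y_all = np.concatenate([y[labelled], pseudo])
    model = LogisticRegression().fit(X_all, y_all)
    print(model.score(X, y))   # accuracy on the full, truly labelled set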


Module 2: Machine Learning – Types of Machine Learning

Reinforcement Learning
• It mimics human beings: just as human beings use ears and eyes to perceive the world and take actions, reinforcement learning allows an agent to interact with the environment to get rewards.
• The agent can be a human, an animal, a robot, or any independent program.
• The rewards enable the agent to gain experience; the agent aims to maximize the reward.
• The reward can be positive or negative (punishment).
• When the rewards are higher, the behavior gets reinforced, and learning becomes possible.
Module 2: Machine Learning – Types of Machine Learning

Reinforcement Learning
[Figure: a grid game in which the gray tile indicates danger, black is a block, and the tile with diagonal lines is the goal]
• The aim is to start, say, from the bottom-left cell and use the actions left, right, up and down to reach the goal state.
• To solve this sort of problem, there is no data; the agent interacts with the environment to get experience.
• The agent tries to create a model by simulating many paths and finding rewarding paths. This experience helps in constructing a model.
• Many sequential decisions need to be taken to reach the final decision.
• Therefore, reinforcement algorithms are reward-based, goal-oriented algorithms.
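One classic reward-based algorithm for grid worlds like this is Q-learning. The slides do not name an algorithm, so the following is only an illustrative sketch on a tiny made-up grid, with our own reward values:

    import random

    # Made-up 3x3 grid: start (0,0) bottom-left, goal (2,2), danger (1,1).
    ACTIONS = {'left': (0, -1), 'right': (0, 1), 'up': (1, 0), 'down': (-1, 0)}
    GOAL, DANGER, SIZE = (2, 2), (1, 1), 3

    def step(state, action):
        dr, dc = ACTIONS[action]
        r = min(max(state[0] + dr, 0), SIZE - 1)   # stay inside the grid
        c = min(max(state[1] + dc, 0), SIZE - 1)
        nxt = (r, c)
        if nxt == GOAL:
            return nxt, 10.0, True      # positive reward at the goal
        if nxt == DANGER:
            return nxt, -10.0, True     # negative reward (punishment) at danger
        return nxt, -1.0, False         # small cost per move

    Q = {((r, c), a): 0.0 for r in range(SIZE) for c in range(SIZE) for a in ACTIONS}
    alpha, gamma, eps = 0.5, 0.9, 0.2   # learning rate, discount, exploration

    random.seed(0)
    for _ in range(2000):               # many simulated paths (episodes)
        s, done = (0, 0), False
        while not done:
            if random.random() < eps:   # explore
                a = random.choice(list(ACTIONS))
            else:                       # exploit the best-known action
                a = max(ACTIONS, key=lambda act: Q[(s, act)])
            s2, reward, done = step(s, a)
            target = reward + (0 if done else gamma * max(Q[(s2, b)] for b in ACTIONS))
            Q[(s, a)] += alpha * (target - Q[(s, a)])   # Q-learning update
            s = s2

    # Greedy action learned for the start state ('up' or 'right' both avoid danger):
    print(max(ACTIONS, key=lambda act: Q[((0, 0), act)]))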
Module 2: Machine Learning – Challenges of Machine Learning

The quality of machine learning systems depends on the quality of data:
1. Problems
2. Huge data
3. High computation power
4. Complexity of the algorithms
5. Bias/Variance


Module 2: Machine Learning – Challenges of Machine Learning

1. Problems – ML can deal with well-posed problems; it cannot solve ill-posed problems.
2. Huge data – data is the primary requirement of ML, and the availability of quality data is a challenge (no missing data or incorrect data).
3. High computation power – with the availability of Big Data, the computational resources required have increased; time complexity has grown and can be handled only with high computing power.


Module 2: Machine Learning – Challenges of Machine Learning

4. Complexity of the algorithms – algorithms have become a big topic of discussion, and it is a challenge for machine learning professionals to design, select, compare and evaluate optimal algorithms.
5. Bias/Variance – variance is the error of the model.
   • A model that fits the training data correctly but fails on test data, i.e. that lacks generalization, is said to be overfitting.
   • The reverse problem is called underfitting, where the model fails even on the training data.
   • Overfitting and underfitting are great challenges for machine learning algorithms.
Module 2: Machine Learning – Understanding Data

Data
• Human-interpretable data – numbers, texts.
• Diffused data – images or video (can be interpreted only by a computer).
• Operational data – seen in normal business procedures and processes.
• Non-operational data – used for decision making.
• Small data – data whose volume is low and can be stored on a small-scale computer.
• Big data – data whose volume is much larger than that of small data.

Units: 1 bit = 0 or 1; 1 byte = 8 bits; 1 KB = 1024 bytes; 1 MB = 1024 KB; 1 GB = 1024 MB; 1 TB = 1024 GB; 1 PB (petabyte) = 1024 TB; 1 EB (exabyte) = 1024 PB; 1 ZB (zettabyte) = 1024 EB; 1 YB (yottabyte) = 1024 ZB.


Module 2: Machine Learning – Understanding Data

[Figure: Flighttracker24.com as an example of Big Data]


Module 2: Machine Learning – Understanding Data

Elements (6 V's) that characterize Big Data:
Volume   – the reduction in the cost of storage devices and the tremendous growth of data
Velocity – the fast arrival speed of data and the resulting increase in data volume
Variety  – form: text, graph, audio, video, maps, etc., including composites (video + audio);
           function: human conversations, transaction records, archive data;
           source of data: open/public, social media, multimodal
Veracity – conformity to facts: truthfulness, believability, and confidence in the data
Validity – the accuracy of the data for taking decisions
Value    – the value of the information extracted from the data and its influence on decisions


Module 2: Machine Learning – Understanding Data

Data Quality
Precision – the closeness of repeated measurements.
Bias      – systematic error due to erroneous assumptions of the algorithms or procedures.
Accuracy  – the degree of measurement error; it refers to the closeness of the measurements to the true value.


Module 2: Machine Learning – Understanding Data

Data Types
Structured
• Data stored in an organised manner in a database, in the form of a table.
• Data can be retrieved in an organised manner using tools like SQL.
• Record data – a dataset.
• Data matrix – a variation of record type with numeric attributes.
• Graph data – relationships among objects.
• Ordered data – attributes that have an implicit order:
  • Temporal data – data associated with time
  • Sequence data – a sequence of words or letters
  • Spatial data – positions or areas
Unstructured
• Includes video, image, audio, textual documents, programs, and blog data.
• About 80% of all data is unstructured.
Semi-structured
• Partially structured and partially unstructured.
• XML/JSON data, RSS data, hierarchical data.
Module 2: Machine Learning – Understanding Data

Data Analytics and Types of Analytics
Data analytics refers to the process of data collection, preprocessing and analysis.

Descriptive  – describes the main features of the data; deals with the collected data and quantifies it.
Diagnostic   – causal analysis; aims to find cause and effect.
Predictive   – deals with the future; ML is mostly about predictive analysis.
Prescriptive – finds the best course of action; goes beyond prediction and helps in decision making by giving a set of actions.


Module 2: Machine Learning – Understanding Data

[Figure: Flighttracker24.com example, continued]


Module 2: Machine Learning – Understanding Data

Example: signals from a GPS receiver (raw NMEA sentences)

$GPRMC,044840.00,A,3401.21044,N,11824.67722,W,0.165,,010621,,,D*60
$GPVTG,,T,,M,0.165,N,0.306,K,D*21
$GPGGA,044840.00,3401.21044,N,11824.67722,W,2,07,1.34,29.5,M,-32.9,M,,0000*56
$GPGSA,A,3,21,32,46,27,08,22,10,,,,,,3.07,1.34,2.76*0F
$GPGSV,3,1,12,01,18,319,17,08,27,257,20,10,44,068,33,18,00,142,*73
$GPGSV,3,2,12,21,39,313,23,22,20,296,14,23,16,089,,27,22,221,32*7A
$GPGSV,3,3,12,31,28,168,20,32,70,025,31,46,49,199,30,51,49,161,*7E
$GPGLL,3401.21044,N,11824.67722,W,044840.00,A,D*7F


Module 2: Machine Learning – Descriptive Statistics

Data Management Life Cycle
• The data management lifecycle refers to the different stages a unit of data undergoes, from initial collection to the point when it is no longer considered useful and is deleted.
• It is a continuous, policy-based process in which each phase informs the next.
• The stages of the data lifecycle can vary depending on the organization; six key phases are outlined:
1. Data Collection
2. Data Storage
3. Data Processing
4. Data Analysis
5. Data Deployment
6. Data Archiving
Module 2: Machine Learning – Descriptive Statistics

Data Life Cycle
1. Collection
   The first stage in the data lifecycle is collecting customer data from various internal and external sources.
2. Storage
   Once data is collected, it is time to store it. One trap that many businesses fall into is keeping data scattered across different teams and tools. This creates blind spots throughout the organization, leaving teams with only a partial view of customer behavior or business performance.
3. Processing
   The next step is to process the data so it becomes usable. Data processing falls into three core categories:
   • Data encryption: scrambling or translating human-readable data into a format that can only be decoded by authorized personnel.
   • Data wrangling: cleaning and transforming data from its raw state into a more accessible and functional format.
   • Data compression: reducing the size of a piece of data, making it easier to store, by restructuring or re-encoding it.
Module 2: Machine Learning – Descriptive Statistics

Data Life Cycle (continued)
4. Analysis
   Data analysis involves studying processed or raw data to identify trends and patterns. Techniques used at this stage include machine learning, statistical modeling, artificial intelligence, data mining, and algorithms. This stage is critical, as it provides valuable insight into the business and customer experience, like helping to pinpoint weak points in the funnel or potential churn risks.
5. Deployment
   Also called dissemination, the deployment stage is where data validation, sharing, and usage occur.
   • Data validation: checking the accuracy, structure, and integrity of your data.
   • Data sharing: communicating the insights from your analysis to stakeholders using data reports and different types of data visualizations like graphs, dashboards, charts, and more.
   • Data usage: using data to inform management strategies and growth initiatives.
   The goal of the deployment stage is to ensure that all relevant parties understand the value of the data and are able to successfully leverage it in their day-to-day work (also known as data democratization).
6. Archiving
   Archiving involves moving data from all active deployment environments into an archive. At this point the data is no longer operationally useful, but instead of destroying it, you keep it in a long-term storage location. Archived data is useful for future reference or analysis, but it can also pose a security risk for businesses.
Module 2: Machine Learning – Descriptive Statistics

Consistency, performance, compatibility, completeness, timeliness, and duplicate or corrupted records are all checked during the data quality monitoring process.

Good Data
• Good data is close to correct and precise.
• Good-quality data uses the same units and the same format, consistently.
• Good data is mostly up to date.
• The data is understandable and easy to access.
• You can be confident in your dataset.
• Time and cost are reduced, as the process runs smoothly.
• It helps enhance the company's decision-making capability and supports better actions.

Bad Data
• Bad data lacks precision and accuracy.
• The format of bad data is not well defined and not consistent.
• Bad data might not be the latest.
• There are difficulties understanding the data.
• You are not very sure about the results.
• Time and cost increase, and obstacles can arise in machine learning.
• With bad-quality data, companies take wrong decisions, which can affect further actions.
Module 2: Machine Learning – Descriptive Statistics

Descriptive Statistics
• A branch of statistics that summarizes data.
• Descriptive statistics are just descriptive; they don't go beyond that.
• Data visualisation is a branch of study that deals with the presentation of data.
• Exploratory Data Analysis (EDA) – understanding the given data with both descriptive analytics and data visualisation, in preparation for machine learning algorithms. EDA aims to understand the data better.


Module 2: Machine Learning – Descriptive Statistics

Descriptive Statistics – Dataset and Data Types
• A dataset is a collection of data objects; these may be records, points, vectors, patterns, events, cases, samples or observations.
• These records contain many attributes (properties or characteristics of an object).
• Measurement – each attribute should be associated with a value.
• The type of attribute determines the data type, also called the measurement scale type.
Module 2: Machine Learning – Descriptive Statistics

Table 2.2 Sample patient table (database info)

Patient ID | Name  | Age | Blood Test | Fever | Disease
-----------+-------+-----+------------+-------+--------
1          | John  | 21  | Negative   | Low   | No
2          | Andre | 36  | Positive   | High  | Yes


Module 2: Machine Learning – Descriptive Statistics

Data types (Fig 2.1)

Data Type
├── Categorical Data
│   ├── Nominal Data
│   └── Ordinal Data
└── Numerical Data
    ├── Interval Data
    └── Ratio Data


Module 2: Machine Learning – Descriptive Statistics

Categorical Data
• A form of information that can be stored and identified based on names or labels.
• A type of qualitative data that can be grouped into categories instead of being measured numerically.
• Examples of categorical variables: a person's gender, hometown, favourite sport, hair colour, and so on.
• Categorical measurements are not given in numbers but rather in natural-language descriptions.
• Numbers can sometimes represent categorical data, but those numbers don't mean anything mathematically (e.g. birthdate and postcode).
Module 2: Machine Learning – Descriptive Statistics

Nominal Data
• A type of categorical data that consists of categories that can't be ordered or ranked; it is measured on a nominal scale.
• This type of data can't be ranked or measured in any way.
• Some examples of nominal data are symbols, words, letters, and the gender of a person.
• Still, nominal data can be both qualitative and quantitative at times.
Module 2: Machine Learning – Descriptive Statistics

Ordinal Data
• A type of categorical data that has a natural order.
• Often used in surveys, questionnaires, and in the fields of finance and economics.
• Ordinal data stands out because the differences between data values cannot be quantified.
• E.g. clothing sizes (small, medium, and large are not measurable differences, but they are clearly ordered to show size comparisons).
• E.g. ratings such as "poor", "satisfactory", "good".
Module 2: Machine Learning – Descriptive Statistics

• Nominal data is the simplest data type. It classifies data purely by labeling or naming values, e.g. measuring marital status, hair, or eye color. It has no hierarchy to it.
• Ordinal data classifies data while introducing an order, or ranking; for instance, measuring economic status using the hierarchy 'wealthy', 'middle income', 'poor'.
Module 2: Machine Learning – Descriptive Statistics

Quantitative (Numerical) Data
• Quantitative data is, quite simply, information that can be quantified.
• It can be counted or measured and given a numerical value, such as length in centimetres or revenue in dollars.
• Quantitative data tends to be structured in nature and is suitable for statistical analysis.
• If you have questions such as "How many?", "How often?" or "How much?", you'll find the answers in quantitative data.


Module 2: Machine Learning – Descriptive Statistics

Interval Data
• Numeric data for which the differences between the values are meaningful.
• E.g. the difference between 30 °C and 32 °C.

Ratio Data
• Numeric data for which the ratios between the values are meaningful.
• E.g. ratios between heights, or amounts of money.




Module 2: Machine Learning – Descriptive Statistics

Data types – summary
1. Categorical (qualitative) data
   a) Nominal data – values are symbols and cannot be processed like numbers.
   b) Ordinal data – provides enough information and has a natural order.
2. Numerical (quantitative) data
   a) Interval data – numeric data for which the differences between values are meaningful.
   b) Ratio data – numeric data for which both the differences and the ratios are meaningful.

A second way of classification – based on how the data is received:
1. Discrete data – recorded as integers, received at regular intervals.
2. Continuous data – recorded with a decimal point, received continuously.


Module 2: Machine Learning – Descriptive Statistics

A third way of classification – based on the number of variables in the dataset:
1. Univariate – analysis of a dataset with one variable.
2. Bivariate – analysis of a dataset with two variables; the aim is to find relationships among the data, or correlations between the data.
3. Multivariate – analysis of a dataset with three or more variables.


Module 2: Machine Learning – Descriptive Statistics

1. Univariate analysis
• The simplest form of analysis, since the information deals with only one quantity that changes.
• It does not deal with causes or relationships; the main purpose of the analysis is to describe the data and find the patterns that exist within it.
• An example of univariate data is height.


Module 2: Machine Learning – Descriptive Statistics

Univariate analysis – overview

Continuous variables:
• Measures of central tendency: mean, median, mode
• Measures of dispersion (spread): range, variance, standard deviation
• Distributional forensics (measures of shape): skewness, kurtosis

Discrete or categorical variables:
• Frequency analysis: charts, graphs, crosstabs


Module 2: Machine Learning – Descriptive Statistics

Univariate Analysis
• The central tendency summarizes the most likely value for a variable, and the average is the common name for the calculation of the mean.
• The arithmetic mean is appropriate if the values have the same units, whereas the geometric mean is appropriate if the values have differing units.
• The harmonic mean is appropriate if the data values are ratios of two variables with different measures, called rates.


Module 2: Machine Learning – Descriptive Statistics

Univariate analysis – data visualisation (a few chart types)

[Figures: bar chart, pie chart, histogram, dot plot, and area chart of student marks]
Module 2: Machine Learning – Descriptive Statistics

Univariate analysis – measures of central tendency
1. Mean – a measure of central tendency that represents the 'centre' of the dataset.
   • Arithmetic mean: the sum of the values divided by the number of values,
         mean = (x1 + x2 + ... + xN) / N
   • Geometric mean: the average value that signifies the central tendency of a set of numbers by using the product of their values; it is defined as the Nth root of the product of N numbers,
         GM = (x1 * x2 * ... * xN)^(1/N)
Module 2: Machine Learning – Descriptive Statistics

Univariate analysis – measures of central tendency
   • Harmonic mean: the number of values N divided by the sum of the reciprocals of the values,
         Harmonic Mean = N / (1/x1 + 1/x2 + ... + 1/xN)

Tips:
• If the values have the same units: use the arithmetic mean.
• If the values have differing units: use the geometric mean.
• If the values are rates: use the harmonic mean.
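Python's standard statistics module implements all three means; a quick sketch on made-up values:

    import statistics

    values = [2, 4, 8]                          # made-up sample values
    print(statistics.mean(values))              # arithmetic mean -> 4.666...
    print(statistics.geometric_mean(values))    # (2*4*8)**(1/3)  -> 4.0
    print(statistics.harmonic_mean(values))     # 3/(1/2+1/4+1/8) -> 3.428...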


Module 2: Machine Learning – Descriptive Statistics

Univariate analysis – measures of central tendency
2. Median – the middle value of the distribution.
• If the total number of items in the distribution is odd, the middle value is the median.
• If the total number of items in the distribution is even, the average of the two items in the centre is the median.


Module 2: Machine Learning – Descriptive Statistics

Univariate analysis – measures of central tendency
3. Mode – the value that occurs most frequently in the dataset.
• The mode applies only to discrete data; it is not applicable to continuous data, as continuous data has no repeated values.
• A dataset with one mode is unimodal, with two modes bimodal, and with three modes trimodal.


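The median and mode definitions above can be checked with the standard library as well (a minimal sketch with hypothetical lists; statistics.multimode needs Python 3.8+):

import statistics

odd = [3, 5, 1, 9, 7]             # odd count: the middle value
even = [3, 5, 1, 9, 7, 11]        # even count: average of the two middle values
print(statistics.median(odd))     # 5
print(statistics.median(even))    # 6.0

marks = [40, 60, 60, 80, 85]
print(statistics.mode(marks))       # 60, the most frequent value
print(statistics.multimode(marks))  # [60]; lists several values when multimodal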
Model 2: Machine Learning – Descriptive Statistics
Descriptive Statistics
Third way of classification – based on number of variables (category) in dataset
1. Univariate – analysis of dataset with one variable
Continuous Variables:
• Measures of central tendency – Mean, Median, Mode
• Measures of Dispersion OR Spread – Range, Variance, Standard Deviation, etc.
• Distributional Forensics OR Measure of Shape – Skewness, Kurtosis
Discrete OR Categorical variables:
• Frequency Analysis – Charts, Graphs, Crosstabs
Model 2: Machine Learning – Descriptive Statistics
Descriptive Statistics
Third way of classification – based on number of variables (category) in dataset
1. Univariate – analysis of dataset with one variable (Measurement of Spread)
The Range
• The difference between the maximum and minimum values in the set is called the range.
Variance
• The variance is a measure of how far a set of data values is dispersed from the mean or average value.
• Equivalently, the variance of the dataset is the average squared distance between the mean value and each data value.
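For reference, the two standard variants of this definition can be written as follows (the die-roll example below divides by n, the population form, while the attendance example divides by n − 1, the sample form):

\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i-\mu)^2
\qquad
s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^2
\qquad
\sigma = \sqrt{\sigma^2},\quad s = \sqrt{s^2}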
Model 2: Machine Learning – Descriptive Statistics
Descriptive Statistics
Third way of classification – based on number of variables (category) in dataset
1. Univariate – analysis of dataset with one variable (Measurement of Spread)
The Standard Deviation
• The standard deviation is a measure of dispersion that can be interpreted as approximately the average distance of every data value from the mean.
• The standard deviation defines the spread of data values around the mean.
Model 2: Machine Learning – Descriptive Statistics
Descriptive Statistics
Example:
Q: If a die is rolled, find the variance and standard deviation of the possibilities.
Solution: When a die is rolled, there are 6 possible outcomes.
So the sample space is n = 6 and the data set = {1, 2, 3, 4, 5, 6}.
To find the variance, first calculate the mean of the data set.
Mean, x̅ = (1+2+3+4+5+6)/6 = 3.5
Putting the data values and the mean into the formula:
σ2 = Σ (xi – x̅)2/n
σ2 = (6.25 + 2.25 + 0.25 + 0.25 + 2.25 + 6.25)/6
σ2 = 2.917
Now, the standard deviation, σ = √2.917 = 1.708
Model 2: Machine Learning – Descriptive Statistics
Descriptive Statistics
Example:
Q. Attendance of students in one of the semesters: 5, 20, 8, 10, 35, 12.
Calculate the range and standard deviation.
Solution:
Range: 35 − 5 = 30
Variance: 121.6 (sample variance, dividing by n − 1 = 5)
Standard Deviation: 11.027239001672177 (≈ 11.03)
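A minimal check of this example with Python's statistics module; note that statistics.variance/stdev divide by n − 1 (sample), matching 121.6 above, while statistics.pvariance/pstdev divide by n (population), as in the die-roll example:

import statistics

attendance = [5, 20, 8, 10, 35, 12]

print(max(attendance) - min(attendance))  # range: 30
print(statistics.variance(attendance))    # sample variance: 121.6
print(statistics.stdev(attendance))       # sample std dev: 11.027239001672177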
Model 2: Machine Learning – Descriptive Statistics
Descriptive Statistics
Home Assignment
Use the table given, which gives the final results for the 2021 National Women's Soccer League season.
The columns are standings points (PTS; teams earn three points for a win and one point for a tie), wins (W), losses (L), ties (T), goals scored by that team (GF), and goals scored against that team (GA).

Team                     PTS  W   L   T   GF  GA
Portland Thorns FC       44   13  6   5   33  17
OL Reign                 42   13  8   3   37  24
Washington Spirit        39   11  7   6   29  26
Chicago Red Stars        38   11  8   5   28  28
NJ/NY Gotham FC          35   8   5   11  29  21
North Carolina Courage   33   9   9   6   28  23
Houston Dash             32   9   10  5   31  31
Orlando Pride            28   7   10  7   27  32
Racing Louisville FC     22   5   12  7   21  40
Kansas City Current      16   3   14  7   15  36

1. Compute the standard deviation and range of points.
2. Compute the standard deviation and range of wins.
3. Compute the standard deviation and range of losses.
4. Compute the standard deviation and range of ties.
5. Compute the standard deviation and range of goals scored (GF).
6. Compute the standard deviation and range of goals against (GA).
Model 2: Machine Learning – Descriptive Statistics
Descriptive Statistics
Third way of classification – based on number of variables (category) in dataset
1. Univariate – analysis of dataset with one variable (Measurement of Spread)
Quartiles:
• Quartiles are the three values that divide an ordered list of numerical data into four quarters.
• The middle quartile measures the central point of the distribution and shows the data which are near to the central point.
• The lower quartile indicates the half of the dataset which comes under the median, and the upper quartile the remaining half, which falls over the median.

Min   Q1   Q2 (Median)   Q3   Max

Inter Quartile Range: the difference between Q3 and Q1, IQR = (Q3 − Q1).
Semi Inter Quartile Range: SIQR = IQR/2 = (Q3 − Q1)/2.
Outliers are normally the values falling apart by at least 1.5 times the IQR above the third quartile or below the first quartile;
the non-outlier fences are [(Q1 − 1.5·IQR), (Q3 + 1.5·IQR)].
Model 2: Machine Learning – Descriptive Statistics
Descriptive Statistics
Example:
Find the Inter Quartile Range, outlier fences and Semi Inter Quartile Range.
a) 5, 8, 15, 26, 10, 18, 3, 12, 6, 14, 11

Min  Q1  Q2  Q3  Max
3    6   11  15  26

IQR = (Q3 − Q1) = 15 − 6 = 9
Outlier fences = [(Q1 − 1.5·IQR), (Q3 + 1.5·IQR)] = [−7.5, 28.5]
SIQR = IQR/2 = 4.5
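A minimal sketch of these quartile calculations; Python's statistics.quantiles (default 'exclusive' method) happens to reproduce the Q1/Q2/Q3 shown above for this dataset, though other conventions (e.g. NumPy's default interpolation) can give slightly different quartiles:

import statistics

data = [5, 8, 15, 26, 10, 18, 3, 12, 6, 14, 11]

q1, q2, q3 = statistics.quantiles(data, n=4)  # 6.0, 11.0, 15.0
iqr = q3 - q1                                 # 9.0
fences = (q1 - 1.5 * iqr, q3 + 1.5 * iqr)     # (-7.5, 28.5)
siqr = iqr / 2                                # 4.5
print(q1, q2, q3, iqr, fences, siqr)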
Model 2: Machine Learning – Descriptive Statistics
Descriptive Statistics
Example:
Find the Inter Quartile Range, outlier fences and Semi Inter Quartile Range.
b) 11, 31, 21, 19, 8, 54, 35, 26, 23, 13, 29, 17

Min  Q1  Q2  Q3  Max
8    15  22  30  54

IQR = (Q3 − Q1) = 30 − 15 = 15
Outlier fences = [(Q1 − 1.5·IQR), (Q3 + 1.5·IQR)] = [−7.5, 52.5]
(so the value 54 is an outlier)
SIQR = IQR/2 = 7.5
Model 2: Machine Learning – Descriptive Statistics
Descriptive Statistics
Third way of classification – based on number of variables (category) in dataset
1. Univariate – analysis of dataset with one variable (Measurement of Spread)
Five point summary and box plots:
The median, the quartiles Q1 and Q3, and the minimum and maximum, written in the order < Minimum, Q1, Median, Q3, Maximum >, are known as the five point summary.

Box plots (also called Box and Whisker Plots):
• Suitable for continuous variables (often grouped by a nominal variable); used to illustrate data distributions and to summarise data.
• The box contains the bulk of the data; these data lie between the first and third quartiles.
• The line inside the box indicates the median of the data.
• If the median is not equidistant from the box edges, the data is skewed.
• The whiskers that project from the ends of the box indicate the spread of the tails and the maximum and minimum data values.
Model 2: Machine Learning – Descriptive Statistics
Descriptive Statistics
Example:
Find the five point summary of the list {13, 11, 2, 3, 4, 8, 9}.

Min  Q1  Q2  Q3  Max
2    3   8   11  13

[Box plot of this five point summary shown on slide]
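A minimal five point summary helper matching this example (same quartile-convention caveat as in the earlier sketch):

import statistics

def five_point_summary(data):
    q1, q2, q3 = statistics.quantiles(data, n=4)  # default 'exclusive' method
    return min(data), q1, q2, q3, max(data)

print(five_point_summary([13, 11, 2, 3, 4, 8, 9]))  # (2, 3.0, 8.0, 11.0, 13)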
Model 2: Machine Learning – Descriptive Statistics
Descriptive Statistics
Third way of classification – based on number of variables (category) in dataset
1. Univariate – analysis of dataset with one variable (Measure of Shape)
Skewness:
• Skewness is a statistical measure that assesses the asymmetry of a probability distribution.
• It quantifies the extent to which the data is skewed or shifted to one side.
• It quantifies the degree to which the data deviates from a perfectly symmetrical distribution, such as a normal (bell-shaped) distribution. Skewness is a valuable statistical term because it provides insight into the shape and nature of a dataset's distribution.
[Figure: negatively skewed, normal and positively skewed distributions]
Model 2: Machine Learning – Descriptive Statistics
Descriptive Statistics
Third way of classification – based on number of variables (category) in dataset
1. Univariate – analysis of dataset with one variable (Measure of Shape)
• Pearson's first coefficient of skewness is helpful when the data have a pronounced mode.
• Pearson's second coefficient of skewness is used when the data have a weak mode or multiple modes.

Rule of thumb:
• For skewness values between −0.5 and 0.5, the data exhibit approximate symmetry.
• Skewness values within the range of −1 to −0.5 (negative skew) or 0.5 to 1 (positive skew) indicate slightly skewed data distributions.
• Data with skewness values less than −1 (negative skew) or greater than 1 (positive skew) are considered highly skewed.
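The formulas referred to above did not survive extraction; the standard definitions of Pearson's two coefficients (with sample mean x̄ and standard deviation s) are:

\text{Sk}_1 = \frac{\bar{x} - \text{mode}}{s}
\qquad
\text{Sk}_2 = \frac{3(\bar{x} - \text{median})}{s}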
Model 2: Machine Learning – Descriptive Statistics
Descriptive Statistics
Third way of classification – based on number of variables (category) in dataset
1. Univariate – analysis of dataset with one variable (Measure of Shape)
Kurtosis
• Kurtosis is a statistical measure that quantifies the shape of a probability distribution.
• It provides information about the tails and peakedness of the distribution compared to a normal distribution.
• Positive (excess) kurtosis indicates heavier tails and a more peaked distribution.
• Negative (excess) kurtosis suggests lighter tails and a flatter distribution.
• Kurtosis helps in analysing the characteristics and outliers of a dataset.
• The measure of kurtosis refers to the tailedness of a distribution; tailedness refers to how often outliers occur.
• Kurtosis = Σ (Yi − Ȳ)4 / (N·s4), where Yi is a data value, Ȳ is the mean and s is the standard deviation.
For more details: https://www.educba.com/kurtosis-formula/
Model 2: Machine Learning – Descriptive Statistics
Descriptive Statistics
Third way of classification – based on number of variables (category) in dataset
1. Univariate – analysis of dataset with one variable (Measure of Shape)
Leptokurtic (Kurtosis > 3)
Leptokurtic distributions have very long and thick tails, which means there are more chances of outliers. Positive values of excess kurtosis indicate that the distribution is peaked and possesses thick tails. Extremely positive kurtosis indicates a distribution where more of the numbers are located in the tails instead of around the mean.
Platykurtic (Kurtosis < 3)
Platykurtic distributions have thin tails and are stretched around the centre, meaning most data points are in close proximity to the mean. A platykurtic distribution is flatter (less peaked) when compared with the normal distribution.
Mesokurtic (Kurtosis = 3)
Mesokurtic is the same as the normal distribution: the kurtosis is 3 (i.e. excess kurtosis near 0). Mesokurtic distributions are moderate in breadth, and their curves have a medium peaked height.
For more details: https://www.analyticsvidhya.com/blog/2021/05/shape-of-data-skewness-and-kurtosis/
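A minimal sketch computing both shape measures with SciPy; with fisher=False, scipy.stats.kurtosis returns Pearson kurtosis, so normally distributed data lands near the mesokurtic reference value of 3:

import numpy as np
from scipy.stats import kurtosis, skew

rng = np.random.default_rng(0)
sample = rng.normal(loc=0, scale=1, size=10_000)  # synthetic normal data

print(skew(sample))                    # near 0 for symmetric data
print(kurtosis(sample, fisher=False))  # near 3 for normal data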
Model 2: Machine Learning – Descriptive Statistics
Descriptive Statistics
Third way of classification – based on number of variables (category) in dataset
1. Univariate – analysis of dataset with one variable (Measurement of Spread)
Mean Absolute Deviation (MAD):
• It is another dispersion measure and is robust to outliers.
• It is a measure of variability that indicates the average distance between observations and their mean.

Mean Absolute Deviation (MAD) = Σ |X – µ| / N
where:
X = the value of a data point
µ = mean
|X – µ| = absolute deviation
N = sample size
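A minimal sketch of MAD, together with the coefficient of variation defined on the next slide (hypothetical sample reused from the attendance example):

import statistics

data = [5, 20, 8, 10, 35, 12]
mu = statistics.mean(data)                        # 15.0

mad = sum(abs(x - mu) for x in data) / len(data)  # mean absolute deviation: 8.33
cv = statistics.stdev(data) / mu * 100            # coefficient of variation, in %

print(mad, cv)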
Model 2: Machine Learning – Descriptive Statistics
Descriptive Statistics
Third way of classification – based on number of variables (category) in dataset
1. Univariate – analysis of dataset with one variable (Measurement of Spread)
Coefficient of variation (CV)
• It is used to compare datasets with different units.
• Do not use it when the mean is close to zero.
• Do not use it with interval scales (use it with ratio scales).
• Coefficient of variation: CV = (standard deviation / mean) × 100%
[Exercise shown on slide: find the CV for the given dataset]
Model 2: Machine Learning – Descriptive Statistics
Descriptive Statistics
Third way of classification – based on number of variables (category) in dataset
1. Univariate – analysis of dataset with one variable (Measure of Shape)
Special Univariate Plots (Stem and Leaf Plot)
• This plot helps us to know the shape and distribution of the data.
• Each value is split into a 'stem' and a 'leaf'; the last digit is usually the 'leaf', and the digits to the left of it form the 'stem'.
On the normal probability (Q–Q) plot:
• The ideal shape of a dataset is a bell-shaped curve, and the points fall along the 45-degree reference line if the data follow a normal distribution.
• If the deviation from the line is larger, there is greater evidence that the dataset follows some different distribution.
• Most parametric statistical tests assume a normal distribution.
Model 2: Machine Learning – Multivariate Statistics
Univariate Analysis Problems
For the given univariate dataset S = {5, 10, 15, 20, 25, 30} of marks:
• Find the mean, median, mode, standard deviation and variance
• Find the arithmetic mean and geometric mean
• Find the five point summary and plot the box chart
Model 2: Machine Learning – Multivariate Statistics
Univariate Analysis Problems
For a univariate attribute such as weight, or English and Maths marks:
• Find the mean, median, mode
• Find the weighted mean, geometric mean and harmonic mean
• Find the standard deviation and variance
• Find the absolute deviation, mean absolute deviation and median absolute deviation
• Coefficient of variation
• Skewness and kurtosis
• Five point summary, IQR, semi-interquartile range

Age  Weight
1    4.2
2    4.5
3    4.7
4    5.2
5    6
6    6.2
7    7
8    7.2
9    7.5
10   8.5

Std  English  Hindi  Maths  Science
1    45       70.5   90     40
2    60       72.5   80     45
3    60       80     90     50
4    80       80     90     80
5    85       72     70     60
Model 2: Machine Learning – Descriptive Statistics
Descriptive Statistics
Third way of classification – based on number of variables (category) in dataset
2. Bivariate – analysis of dataset with 2 variables (not in syllabus)
• It involves two different variables. The analysis of this type of data deals with causes and relationships, and the analysis is done to find out the relationship between the two variables.
Eg: temperature and ice cream sales in the summer season.
• Temperature and sales are directly proportional to each other and thus related, because as the temperature increases, the sales also increase.
• This involves comparisons, relationships, causes and explanations.
• These variables are often plotted on the X and Y axes of a graph for better understanding of the data; one of these variables is independent while the other is dependent.
Model 2: Machine Learning – Descriptive Statistics
Descriptive Statistics
Third way of classification – based on number of variables (category) in dataset
2. Bivariate – analysis of dataset with two variables (not in syllabus)
The analysis handles any combination of categorical and continuous variables:
• Categorical and Categorical
• Categorical and Continuous
• Continuous and Continuous
Different methods are used to tackle these combinations during the analysis process.
Model 2: Machine Learning – Descriptive Statistics
Descriptive Statistics
Third way of classification – based on number of variables (category) in dataset
3. Multivariate – analysis of dataset with 3 or more variables
• It involves 3 or more different variables. The analysis of this type of data depends on the goals to be achieved.
• For example, a study might consider factors such as age, employment status, how often a person exercises, and relationship status.
Model 2: Machine Learning – Descriptive Statistics
Descriptive Statistics
Third way of classification – based on number of variables (category) in dataset
3. Multivariate – analysis of dataset with 3 or more variables
• It involves 3 or more different variables. The analysis of this type of data depends on the goals to be achieved.

Id  Attribute1  Attribute2  Attribute3
1   1           4           1
2   2           5           2
3   3           6           1
Model 2: Machine Learning – Descriptive Statistics
Descriptive Statistics
Third way of classification – based on number of variables (category) in dataset
1. Univariate – analysis of dataset with one variable
2. Bivariate – analysis of dataset with 2 variables
   The aim is to find relationships among data OR correlations between data.
3. Multivariate – analysis of dataset with 3 or more variables

Objectives of Multivariate analysis:
• Data Reduction – simplifying data without sacrificing valuable information.
• Data Organisation – sorting and grouping of data depending on certain characteristics.
• Data Interdependency – understanding the relationships between variables.
• Hypothesis construction – helps validate assumptions or reinforce prior convictions.
Model 2: Machine Learning – Multivariate Statistics
Multivariate Statistics
In Machine Learning, almost all datasets are multivariable.
The analysis involves more than 2 variables, and often thousands of measurements need to be conducted for 2 or more subjects.
• Simultaneous study of several variables
• Accounts for interdependence between them
• More informative than univariate analysis
• More complex than univariate analysis
Model 2: Machine Learning – Multivariate Statistics
Multivariate Statistics
Heatmap – a graphical representation of a 2D matrix, with each cell's value encoded as a colour.
[Figure: example heatmap shown on slide]
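A minimal heatmap sketch with seaborn (assuming seaborn/matplotlib are installed); the correlation matrix of the iris dataset serves as the 2D matrix:

import matplotlib.pyplot as plt
import seaborn as sns

iris = sns.load_dataset("iris")
corr = iris.drop(columns="species").corr()  # 4x4 correlation matrix

sns.heatmap(corr, annot=True, cmap="viridis")  # each cell's value shown as colour
plt.show()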
Model 2: Machine Learning – Multivariate Statistics
Multivariate Statistics
Pairplot/Scatter Matrix – a graphical representation of a 2D matrix of pairwise scatter plots (illustrated here with the iris dataset).

Sl No  sepal_length  sepal_width  petal_length  petal_width  species
0      5.1           3.5          1.4           0.2          setosa
1      4.9           3.0          1.4           0.2          setosa
2      4.7           3.2          1.3           0.2          setosa
3      4.6           3.1          1.5           0.2          setosa
4      5.0           3.6          1.4           0.2          setosa

       sepal_length  sepal_width  petal_length  petal_width
count  150.000000    150.000000   150.000000    150.000000
mean   5.843333      3.057333     3.758000      1.199333
std    0.828066      0.435866     1.765298      0.762238
min    4.300000      2.000000     1.000000      0.100000
25%    5.100000      2.800000     1.600000      0.300000
50%    5.800000      3.000000     4.350000      1.300000
75%    6.400000      3.300000     5.100000      1.800000
max    7.900000      4.400000     6.900000      2.500000
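The tables above are the standard pandas head()/describe() output for iris; a minimal pairplot sketch that reproduces them and draws the scatter matrix:

import matplotlib.pyplot as plt
import seaborn as sns

iris = sns.load_dataset("iris")
print(iris.head())      # first five rows, as in the table above
print(iris.describe())  # count/mean/std/min/quartile/max summary

sns.pairplot(iris, hue="species")  # pairwise scatter plots of all variables
plt.show()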
Model 2: Machine Learning – Descriptive Statistics
Descriptive Statistics
Third way of classification – based on number of variables (category) in dataset
3. Multivariate – analysis of dataset with 3 or more variables
• Dependence methods and Interdependence methods

Dependence methods
• Used when one or some of the variables are dependent on others.
• Dependence looks at cause and effect.
• Eg: a dependent variable of "weight" might be predicted by independent variables such as "height" and "age."
• In machine learning, dependence techniques are used to build predictive models. The analyst enters input data into the model, specifying which variables are independent and which ones are dependent; in other words, which variables they want the model to predict, and which variables they want the model to use to make those predictions.
Model 2: Machine Learning – Descriptive Statistics
Descriptive Statistics
Third way of classification – based on number of variables (category) in dataset
3. Multivariate – analysis of dataset with 3 or more variables
• Dependence methods and Interdependence methods

Interdependence methods
• Used to understand the structural makeup and underlying patterns within a dataset.
• No variables are dependent on others, so you're not looking for causal relationships.
• The aim is to understand a set of variables as a group, or to group the variables together in meaningful ways.
Model 2: Machine Learning – Multivariate Statistics
Multivariate Statistics (Examples)
[Worked examples shown as images on slide]
Model 2: Machine Learning – Multivariate Statistics
Multivariate Analysis techniques – Advantages
The one major advantage of multivariate analysis is the depth of insight it provides.
Multivariate analysis enables the identification of:
• Spurious relationships
• Intervening variables
• The replication of relationships
• The specification of relationships
Model 2: Machine Learning – Multivariate Statistics
Multivariate Analysis techniques (commonly used)
• Multiple Regression Analysis
• Discriminant Analysis
• Multivariate Analysis of Variance (MANOVA)
• Factor Analysis
• Cluster Analysis
• Principal Component Analysis
• Redundancy Analysis
• Path Analysis
Model 2: Machine Learning – Hypothesis
Overview of Hypothesis
What?
[Definition shown as images on slide]
Model 2: Machine Learning – Hypothesis
Overview of Hypothesis
When?
Whenever you want to prove or say something about the TEST LOT with a sample.
Model 2: Machine Learning – Hypothesis
Overview of Hypothesis
A hypothesis is an assumption about a result that is falsifiable, meaning it can be proven wrong by some evidence.
A hypothesis can either be rejected or fail to be rejected.
Statistical methods are used to confirm or reject the hypothesis:
1. Null Hypothesis (H0): says that there is no significant effect
   - assumes that there is NO difference between 2 or more groups
2. Alternative Hypothesis (Ha or H1): says that there is some significant effect
   - assumes that there is A difference between 2 or more groups
Model 2: Machine Learning – Hypothesis
Overview of Hypothesis (Hypothesis Test)
Hypothesis testing is a formal procedure for investigating our ideas about the world using statistics.
It is most often used to test specific predictions, called hypotheses, that arise from theories.
There are 5 main steps in hypothesis testing:
1. State your research hypothesis as a null hypothesis (H0) and an alternate hypothesis (Ha or H1).
2. Collect data in a way designed to test the hypothesis.
3. Perform an appropriate statistical test.
4. Decide whether to reject or fail to reject your null hypothesis.
5. Present the findings in your results.
Model 2: Machine Learning – Hypothesis
Overview of Hypothesis (Hypothesis Test types)
Parametric Tests
▪ Based on parameters such as the mean and standard deviation
▪ Eg: t-test, ANOVA or Pearson correlation
Non-Parametric Tests
▪ Based on characteristics such as independence of events or data
Model 2: Machine Learning – Hypothesis
Statistical tests help to:
1. Define the null and alternate hypotheses
2. Describe the hypotheses using parameters
3. Identify the statistical test and test statistic
4. Decide the criterion, called the significance value α
5. Compute the p-value (probability value)
6. Take the final decision of accepting or rejecting the hypothesis

The significance value α is set in advance as the threshold for statistical significance.
It is the maximum risk of making a false positive conclusion (Type 1 error) that you are willing to accept.
Model 2: Machine Learning – Hypothesis
Statistical Tests – true/false positives and negatives, and the 2 types of errors involved
"Wolf" is a positive class. "No wolf" is a negative class.
• A true positive is an outcome where the model correctly predicts the positive class.
• A true negative is an outcome where the model correctly predicts the negative class.
• A false positive is an outcome where the model incorrectly predicts the positive class.
  - Type I error – incorrect rejection of a true null hypothesis
• A false negative is an outcome where the model incorrectly predicts the negative class.
  - Type II error – incorrect failure to reject a false null hypothesis

                H0 is true      H1 is true
Do Not Reject   correct         Type 2 Error
Reject          Type 1 Error    correct
Model 2: Machine Learning – Hypothesis
For the calculation we need:
• The size of the data sample
• The degrees of freedom ν, which indicate the number of independent pieces of information available for estimating a quantity (such as a mean or a variance)
Model 2: Machine Learning – Hypothesis
Hypothesis testing
Sample Error/Estimator
• The sample error of hypothesis h with respect to target function f and data sample S is the proportion of examples in S that h misclassifies.
• Here δ(f(x), h(x)) = 1 if f(x) ≠ h(x), and 0 otherwise.
Actual/True Error
• The true error is the probability that the hypothesis will misclassify a single instance drawn at random from the population.
• Consider a hypothesis h(x) and a true/target function f(x) over a population P. The probability that h will misclassify an instance drawn at random, i.e. the true error, is:
  True error = Prob[ f(x) ≠ h(x) ]
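In standard notation (a reconstruction of the slide's formulas, with S a sample of n instances and D the population distribution):

\mathrm{error}_S(h) = \frac{1}{n}\sum_{x \in S}\delta\big(f(x) \neq h(x)\big)
\qquad
\mathrm{error}_D(h) = \Pr_{x \sim D}\big[f(x) \neq h(x)\big]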
Model 2: Machine Learning – Hypothesis
p value (probability value)
• The p value is a number, calculated from a statistical test, that describes how likely you are to have found a particular set of observations if the null hypothesis were true.
• p values are used in hypothesis testing to help decide whether to reject the null hypothesis.
• The smaller the p value, the more likely you are to reject the null hypothesis.
The p value can only tell you whether or not the null hypothesis is supported.
p values are most often used to say whether a certain measured pattern is statistically significant.
Significance depends on the threshold, called alpha (α):
If p value ≤ α, the null hypothesis H0 is rejected, and we say the result of the test is statistically significant.
If p value > α, we fail to reject the null hypothesis H0.
Model 2: Machine Learning – Hypothesis
Confidence Intervals
• The acceptance or rejection of the hypothesis can also be done using a confidence interval.
• Confidence level = 1 – significance level (α)
• The confidence interval is the range of values that indicates the location of the true mean.
• The confidence interval indicates the confidence in the result.
• A confidence level of 90% infers a 90% chance that the true mean lies in this range, with the remaining 10% the chance that it does not.

Confidence Interval = x̄ +/- z*(s/√n), i.e. Confidence Interval = x̄ +/- Margin of Error, where Margin of Error = z*(s/√n)
where:
• x̄: sample mean
• z: the z-critical value associated with the confidence level
• s: sample standard deviation
• n: sample size
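A minimal sketch of this interval for a hypothetical sample (z from scipy.stats.norm; for small samples a t-critical value would normally replace z):

import math
import statistics
from scipy.stats import norm

data = [5, 20, 8, 10, 35, 12]  # hypothetical sample
confidence = 0.95

x_bar = statistics.mean(data)
s = statistics.stdev(data)
z = norm.ppf(1 - (1 - confidence) / 2)  # z-critical value: 1.96 for 95%
margin = z * s / math.sqrt(len(data))   # margin of error

print((x_bar - margin, x_bar + margin))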
Model 2: Machine Learning – Hypothesis
Comparing Learning Models
Z Test
• The Z-test for a sampling distribution is used to determine whether the sample mean is statistically different from the population mean.
The formula for the z-statistic in the Z-test is:
Z = (X̄ - μ)/SE
where:
X̄ is the sample mean
μ is the mean of the observations in the population
SE is the standard error of the sample mean, calculated as:
SE = σ/√n
where σ is the population standard deviation and n is the sample size.
Model 2: Machine Learning – Hypothesis
Comparing Learning Models
Example
Consider the sample X = [1, 2, 3, 4, 5]; the population mean (μ) is 12 and the population standard deviation (σ) is 2.
Apply the Z test and show whether the result is significant.

Sample mean X̄ = 15/5 = 3, n = 5
SE = σ/√n = 2/√5 ≈ 0.894
Z = (X̄ - μ)/SE = (3 − 12)/0.894 = -10.06
Since |Z| far exceeds the critical value at significance 0.05 (1.96 for a two-tailed test), the null hypothesis H0 is rejected.
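A minimal check of this example (σ = 2, per the corrected problem statement above):

import math

sample = [1, 2, 3, 4, 5]
mu, sigma = 12, 2

x_bar = sum(sample) / len(sample)    # 3.0
se = sigma / math.sqrt(len(sample))  # standard error: 0.894...
z = (x_bar - mu) / se

print(round(z, 2))                   # -10.06
if abs(z) > 1.96:                    # two-tailed critical value at alpha = 0.05
    print("Reject H0")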
Model 2: Machine Learning – Hypothesis
Comparing Learning Models
T-test
• It is a hypothesis test that checks whether the difference between two sample means is real or by chance.
• Data is randomly selected.
• It is used with only a small number of samples, where the variance between groups is real.
• The distribution follows the t-distribution rather than the Gaussian distribution.
Variants:
• One Sample t-test
• Independent Two Sample t-test
Model 2: Machine Learning – Hypothesis
Comparing Learning Models
T-Test – One Sample Test
A one-sample t-test always uses the following null hypothesis:
H0: μ = μ0 (population mean is equal to some hypothesized value μ0)
The alternative hypothesis can be either two-tailed, left-tailed, or right-tailed:
H1 (two-tailed): μ ≠ μ0 (population mean is not equal to some hypothesized value μ0)
H1 (left-tailed): μ < μ0 (population mean is less than some hypothesized value μ0)
H1 (right-tailed): μ > μ0 (population mean is greater than some hypothesized value μ0)
Model 2: Machine Learning – Hypothesis
Comparing Learning Models
T-Test – One Sample Test
• The mean of one group is checked against a set average that can be either a theoretical value or the population mean.
• Procedure:
  • Select a group
  • Compute the average
  • Compare it with the theoretical value and compute the t-statistic:
t = (x̄ – μ0) / (s/√n)
where:
t: t-statistic
x̄: sample mean
μ0: hypothesized population mean
s: sample standard deviation
n: sample size
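A minimal one-sample t-test sketch with SciPy (the marks and the hypothesized mean are hypothetical):

from scipy.stats import ttest_1samp

marks = [62, 70, 58, 65, 73, 67, 60]  # one group of observations
mu0 = 60                              # hypothesized population mean

t_stat, p_value = ttest_1samp(marks, popmean=mu0)  # two-tailed by default
print(t_stat, p_value)

if p_value <= 0.05:
    print("Reject H0: the mean differs from", mu0)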
Model 2: Machine Learning – Hypothesis
Comparing Learning Models
T-Test – Independent Two Sample t-test
• The t-statistic for 2 groups A & B is computed from the two group statistics.
• Refer to the Text Book for the formula and more information.
Model 2: Machine Learning – Hypothesis
Comparing Learning Models
Paired t-test
A paired samples t-test always uses the following null hypothesis:
• H0: μ1 = μ2 (the two population means are equal)
The alternative hypothesis can be either two-tailed, left-tailed, or right-tailed:
• H1 (two-tailed): μ1 ≠ μ2 (the two population means are not equal)
• H1 (left-tailed): μ1 < μ2 (population 1 mean is less than population 2 mean)
• H1 (right-tailed): μ1 > μ2 (population 1 mean is greater than population 2 mean)
Model 2: Machine Learning – Hypothesis
Comparing Learning Models
Paired t-test
• A paired samples t-test is used to compare the means of two samples when each observation in one sample can be paired with an observation in the other sample.
• A paired samples t-test is commonly used in two scenarios:
1. A measurement is taken on a subject before and after some treatment
2. A measurement is taken under two different conditions

t = x̄diff / (sdiff/√n)
where:
• x̄diff: sample mean of the pairwise differences (x̄diff = m – μ, where m is the mean of the group and μ is the theoretical value or population mean being compared against)
• sdiff: sample standard deviation of the differences
• n: sample size (i.e. number of pairs)
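A minimal paired t-test sketch with SciPy, using hypothetical before/after measurements on the same subjects (scenario 1 above):

from scipy.stats import ttest_rel

before = [72, 68, 75, 70, 66, 74]
after = [75, 70, 80, 71, 70, 76]

t_stat, p_value = ttest_rel(after, before)  # pairs: same subject, two conditions
print(t_stat, p_value)

if p_value <= 0.05:
    print("Reject H0: the treatment changed the mean")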
Model 2: Machine Learning – Hypothesis
Comparing Learning Models
Chi-square test (Non-Parametric Test)
• Based on characteristics such as independence of events or data.
There are two different types of Chi-Square tests:
1. The Chi-Square Goodness of Fit Test – used to determine whether or not a categorical variable follows a hypothesized distribution.
2. The Chi-Square Test of Independence – used to determine whether or not there is a significant association between two categorical variables.

The Chi-Square Goodness of Fit Test is used to compare the observed frequency of some observation with the expected frequency of the observations and decide whether a statistical difference exists.
The Chi-square test allows us to detect duplication of data and helps to remove the redundancy of values.
Model 2: Machine Learning – Hypothesis
Comparing Learning Models
Chi-square test
Apply the Chi-square test and find out whether there is any difference between boys and girls for course registration.

Gender  Registered  Not Registered  Total
Boys    35          15              50
Girls   25          25              50
Total   60          40              100
Model 2: Machine Learning – Hypothesis
Comparing Learning Models
Chi-square test (solution)
Soln:
Let the null hypothesis H0 be that there is no difference between boys and girls, and H1 be the alternate hypothesis that there is a significant difference between boys and girls.

Expected counts (row total × column total / grand total):

Gender  Registered         Not Registered     Total
Boys    (50x60)/100 = 30   (50x40)/100 = 20   50
Girls   (50x60)/100 = 30   (50x40)/100 = 20   50
Total   60                 40                 100

X2 = (35-30)2/30 + (15-20)2/20 + (25-30)2/30 + (25-20)2/20
X2 = 4.166
Degrees of freedom = (rows − 1) × (columns − 1) = 1
The p-value of the statistic is 0.0412, which is less than 0.05, so H0 is rejected: the difference is significant.
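A minimal check of this example with SciPy; correction=False disables the Yates continuity correction so the output matches the hand calculation:

from scipy.stats import chi2_contingency

observed = [[35, 15],
            [25, 25]]  # rows: boys, girls; columns: registered, not registered

chi2, p, dof, expected = chi2_contingency(observed, correction=False)
print(chi2, p, dof)    # ~4.167, ~0.0412, 1
print(expected)        # [[30. 20.], [30. 20.]]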
Model 2: Machine Learning – Hypothesis
Feature Engg and Dimensionality Reduction Techniques
• Features are attributes.
• Feature Engg is about determining the subset of features that form an important part of the input and improve the performance of a model (classification or any other model in ML).
• Feature Engg deals with 2 problems:
  o Feature Transformation – extraction of features and creation of new features that may be helpful in improving performance.
  o Feature Selection – focusses on selecting a subset of features to reduce training time, but not at the cost of reliability.
Model 2: Machine Learning – Hypothesis
Feature Engg and Dimensionality Reduction Techniques
Curse of Dimensionality (CoD)
• Refers to a set of problems that arise when working with high-dimensional data.
• The dimension of a dataset corresponds to the number of attributes/features that exist in the dataset.
• A dataset with a large number of attributes, generally of the order of a hundred or more, is referred to as high-dimensional data.
Domains of the curse of dimensionality – there are a lot of domains where its direct effect can be seen, "Machine Learning" being the most affected:
• Anomaly Detection – anomaly detection is used for finding unforeseen items or events in the dataset. In high-dimensional data, anomalies often show a remarkable number of attributes which are irrelevant in nature; certain objects occur more frequently in neighbour lists than others.
• Combinatorics – whenever there is an increase in the number of possible input combinations, the complexity increases rapidly, and the curse of dimensionality occurs.
• Machine Learning – a marginal increase in dimensionality requires a large increase in the volume of data in order to maintain the same level of performance.
Model 2: Machine Learning – Hypothesis
Feature Engg and Dimensionality Reduction Techniques
Dimensionality Reduction
• Is the process of reducing the number of input variables in a dataset; also known as the process of converting high-dimensional variables into lower-dimensional variables without changing their attributes.
• It does not contain any extra variables, which makes it very simple for analysts to analyze the data, leading to faster results for algorithms.
• Feature subset selection problems typically use a greedy approach, making the locally optimal choice at each step while hoping that it leads to a globally optimal solution.
Features can be removed based on 2 aspects:
1. Feature Relevancy
• Some features contribute more to classification than other features.
• The relevancy of the features can be determined based on information measures such as mutual information, correlation-based measures like the correlation coefficient, and distance measures.
2. Feature Redundancy
• Some features are redundant.
Model 2: Machine Learning – Hypothesis
Feature Engg and Dimensionality Reduction Techniques
Procedure for Dimensionality Reduction:
1. Generate all possible subsets
2. Evaluate subsets and model performance
3. Evaluate the results for optimal feature selection
• Filter-based algorithms use statistical measures for assessing features.
  • No learning algorithm is used.
  • Eg: correlation and information gain measures (mutual information) and entropy.
• Wrapper-based methods use a learning algorithm (for selection and evaluation).
  • Eg: wrapper-based methods use classifiers to identify the best features.
  • Computationally intensive, but with superior performance.
Model 2: Machine Learning – Hypothesis
Feature Engg and Dimensionality Reduction Techniques
• Stepwise Forward Selection
  • Starts with an empty set of attributes.
  • At each step, the attribute that tests best for statistical significance/quality is added to the reduced set.
  • This process is continued till a good reduced set of attributes is obtained.
• Stepwise Backward Elimination
  • Starts with the complete set of attributes.
  • At every stage, the worst attribute is removed from the set, leading to a reduced set.
  • This process is continued till a good reduced set of attributes is obtained.
COMBINED Approach
Both forward and backward methods can be combined so that the procedure adds the best attribute and removes the worst attribute (a sketch of forward/backward selection follows below).
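A minimal stepwise-selection sketch using scikit-learn's SequentialFeatureSelector, one of several ways to implement the idea (the dataset and estimator are illustrative choices):

from sklearn.datasets import load_iris
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

# direction='forward' grows from an empty set; 'backward' shrinks from the full set
selector = SequentialFeatureSelector(model, n_features_to_select=2,
                                     direction="forward")
selector.fit(X, y)
print(selector.get_support())  # boolean mask of the selected attributes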
Model 2: Machine Learning – Hypothesis
Feature Engg and Dimensionality Reduction Techniques
• Gaussian Elimination Method problems
• LU Decomposition Method problems
• Refer to the Text Book
Model 2: Machine Learning – Hypothesis
Feature Engg and Dimensionality Reduction Techniques
• Principal Component Analysis (PCA, also known as the Karhunen-Loeve Transform or Hotelling Transform)
  • Transforms a given set of measurements to a new set of features so that the features exhibit high information-packing properties.
  • This leads to a reduced and compact set of features.
  • Elimination is possible because of information redundancies; the compact representation is of reduced dimensionality.
• Goal of PCA
  • To reduce the set of attributes to a newer, smaller set that captures the variance of the data.
  • The advantages of PCA are immense: it reduces the attribute set by eliminating all irrelevant attributes.
• PCA Algorithm (see the sketch below)
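A minimal PCA sketch with scikit-learn: project the 4-dimensional iris measurements onto the 2 components that capture the most variance (an illustration, not the textbook's step-by-step algorithm):

from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)     # 150 samples x 4 features

pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)      # 150 samples x 2 features

print(X_reduced.shape)
print(pca.explained_variance_ratio_)  # variance captured per component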
Model 2: Machine Learning – Hypothesis
Feature Engg and Dimensionality Reduction Techniques
• Singular Value Decomposition (SVD)
  • Is useful in compression.
  • Retain only certain components instead of the original matrix.
  • Based on the choice of retention, the compression can be controlled.
Main advantages of SVD:
• Compression
• Used in data reduction
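A minimal SVD compression sketch with NumPy: keep only the top-k singular values/vectors and rebuild a low-rank approximation (random matrix used purely for illustration):

import numpy as np

rng = np.random.default_rng(0)
A = rng.random((6, 4))                    # illustrative data matrix

U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 2                                     # number of components retained
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k]  # rank-k approximation of A

print(np.linalg.norm(A - A_k))            # reconstruction error shrinks as k grows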
Model 2: Machine Learning – Hypothesis
Feature Engg and Dimensionality Reduction Techniques
• Linear Discriminant Analysis (LDA)
  • Is also a feature reduction technique like PCA.
  • The focus of LDA is to project higher-dimensional data to a line (lower-dimensional data).
  • It is also used to classify the data.
Model 2: Machine Learning – Hypothesis
Feature Engg and Dimensionality Reduction Techniques
• Refer to Text Book Chapter 2 for more information.