U 3

OMBAIML 301:
Basics of Artificial Intelligence & Machine Learning
Unit 3:
Statistical Analysis Initial Data Analysis
By: Asst. Prof. Toshi Dave

Introduction
• Mining data involves understanding the data and finding relationships within it.
• An attribute is a characteristic or feature that is measured for each observation (record) and can vary from one observation to another.
• The population is the entire group that you want to draw conclusions about.
• The sample is the specific group that you will collect data from.
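The difference between a population and a sample can be illustrated with a small sketch. The exam scores here are synthetic and the sizes (1,000 and 50) are arbitrary assumptions chosen only for illustration.

```python
import random
import statistics

# Hypothetical population: synthetic exam scores for 1,000 students.
rng = random.Random(42)  # fixed seed so the run is repeatable
population = [rng.gauss(70, 10) for _ in range(1000)]

# The sample is the subset we actually collect data from.
sample = rng.sample(population, 50)

population_mean = statistics.mean(population)
sample_mean = statistics.mean(sample)

# The sample mean is an estimate of the population mean.
print(f"population mean: {population_mean:.2f}")
print(f"sample mean:     {sample_mean:.2f}")
```

The two means are usually close but not identical, which is why conclusions drawn from a sample carry some uncertainty.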
Types of Data
Measuring Relationship between Attributes
• Covariance: Covariance measures how two variables vary together. It indicates the direction of the linear relationship between the variables.
✓ Positive covariance means that the variables tend to vary in the same direction.
✓ Negative covariance means that they tend to vary in opposite directions.
✓ Zero covariance means that the variables have no linear relationship. Independent variables always have zero covariance, but zero covariance alone does not prove independence.
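The three covariance cases can be checked with a minimal sketch. The function name `covariance` and the data sets are assumptions made for illustration; the function computes the sample covariance (dividing by n - 1).

```python
def covariance(xs, ys):
    """Sample covariance of two equal-length sequences (divides by n - 1)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    products = ((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sum(products) / (len(xs) - 1)

# Hypothetical data: study hours against two made-up score columns.
hours = [1, 2, 3, 4, 5]
score_up = [52, 55, 61, 64, 70]    # rises with hours
score_down = [70, 64, 61, 55, 52]  # falls with hours

print(covariance(hours, score_up))    # positive value
print(covariance(hours, score_down))  # negative value

# Zero covariance without independence: y is fully determined by x.
xs = [-2, -1, 0, 1, 2]
ys = [x * x for x in xs]
print(covariance(xs, ys))  # 0.0
```

The last case shows why zero covariance only rules out a linear relationship: `ys` depends entirely on `xs`, yet the covariance is zero.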
• Correlation: A statistical measure of the strength and direction of the linear relationship between two variables. Its values range from -1 to 1.
✓ -1 means perfect negative or inverse correlation.
✓ 1 means perfect positive or direct correlation.
✓ 0 means no linear relationship.
o Two common methods of calculating correlation:
1) Pearson Correlation, which measures the linear relationship between the raw values.
2) Spearman Rank Correlation, which measures the monotonic relationship between the ranks of the values.
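The two methods can be compared with a short sketch in plain Python. The helper names (`pearson`, `ranks`, `spearman`) and the data are assumptions for illustration. Spearman is computed here as the Pearson correlation of the ranks, with tied values given their average rank.

```python
import math

def pearson(xs, ys):
    """Pearson correlation: covariance divided by the product of spreads."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))

xs = [1, 2, 3, 4, 5]
ys = [x ** 3 for x in xs]  # monotonic, but not linear

print(round(pearson(xs, ys), 3))   # below 1 because the relation is curved
print(round(spearman(xs, ys), 3))  # 1.0 because the ordering is identical
```

When the relationship is monotonic but curved, Spearman reaches 1 while Pearson stays below it, which is the practical reason for having both.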
• Chi-square: A statistical procedure for assessing the difference between observed and expected frequencies.
✓ It is also used to determine whether the categorical variables in our data are associated with each other.
✓ It helps to find out whether an observed difference between two categorical variables is due to chance or to a relationship between them.
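A minimal sketch of the chi-square statistic for a contingency table follows. The counts are made up for illustration, and the function name `chi_square_statistic` is an assumption. Expected counts are computed under the hypothesis that the two variables are unrelated, and the result is compared with 3.841, the standard critical value for 1 degree of freedom at the 5% level.

```python
def chi_square_statistic(observed):
    """Chi-square statistic for a table of observed counts (list of rows)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            # Count expected in this cell if rows and columns were unrelated.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (obs - expected) ** 2 / expected
    return statistic

# Hypothetical 2x2 table: rows are two groups, columns are two outcomes.
observed = [[30, 10],
            [20, 40]]

stat = chi_square_statistic(observed)
df = (len(observed) - 1) * (len(observed[0]) - 1)
CRITICAL_05_DF1 = 3.841  # chi-square critical value, df = 1, alpha = 0.05

print(f"chi-square = {stat:.3f}, df = {df}")
print("association likely" if stat > CRITICAL_05_DF1 else "could be chance")
```

A statistic above the critical value suggests the difference is unlikely to be due to chance alone.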
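Pearson correlation coefficient, where σ_X and σ_Y are the standard deviations of X and Y:
$$\rho(X, Y) = \frac{\operatorname{cov}(X, Y)}{\sigma_X \, \sigma_Y}$$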
Measure of Distribution
• Statistical dispersion is the extent to which numerical data is likely to vary about an average value. Dispersion therefore helps to understand the spread of the data distribution.
• Skewness and kurtosis are statistical measures that describe the shape of the data distribution. Both are numerical ways to assess the shape of a data set. They are often used in normality checks to determine whether the distribution is asymmetrical or otherwise departs from the bell shape.
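Dispersion can be made concrete with a short sketch. The two data sets are invented so that they share the same mean but differ in spread, and `describe_spread` is a hypothetical helper name. Range, sample variance and sample standard deviation are used as the measures of dispersion.

```python
import statistics

def describe_spread(data):
    """Return common dispersion measures for a list of numbers."""
    return {
        "mean": statistics.mean(data),
        "range": max(data) - min(data),
        "variance": statistics.variance(data),  # sample variance (n - 1)
        "std_dev": statistics.stdev(data),      # sample standard deviation
    }

# Two hypothetical data sets with the same average but different spread.
tight = [48, 49, 50, 51, 52]
wide = [10, 30, 50, 70, 90]

print(describe_spread(tight))
print(describe_spread(wide))
```

Both lists average 50, so the mean alone cannot tell them apart; the dispersion measures can.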
Skewness
• Skewness measures the distortion of a symmetrical distribution, that is, the asymmetry in a data set.
• Skewness appears on a bell curve when data points are not distributed symmetrically to the left and right of the median.
• If the bell curve is shifted to the left or the right, it is said to be skewed.
• A distribution with zero skew is symmetric. The normal distribution (bell-shaped curve) is the standard example.
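One common way to put a number on skewness is the moment-based coefficient, the third central moment divided by the second central moment raised to the power 1.5. The sketch below assumes that definition (other estimators exist) and uses made-up data.

```python
def skewness(data):
    """Moment-based (population) skewness: m3 / m2 ** 1.5."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in data) / n  # third central moment
    return m3 / m2 ** 1.5

symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 2, 2, 3, 10]          # long tail on the right
left_skewed = [-x for x in right_skewed]    # mirror image, tail on the left

print(skewness(symmetric))     # 0.0
print(skewness(right_skewed))  # positive
print(skewness(left_skewed))   # negative
```

A positive value indicates a longer right tail, a negative value a longer left tail, and zero indicates symmetry.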
Definition of Kurtosis
• Kurtosis is a numerical measure in statistics that is traditionally described as the sharpness of the peak of the data distribution.
• It is also called the tailedness of a distribution, because it reflects how heavy the tails are.
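Kurtosis can be sketched with the moment-based formula, the fourth central moment divided by the squared second central moment. The normal distribution has a kurtosis of 3, so the sketch reports excess kurtosis (kurtosis minus 3). The function name and data are assumptions for illustration.

```python
def excess_kurtosis(data):
    """Moment-based excess kurtosis: m4 / m2 ** 2 - 3 (normal gives 0)."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n  # second central moment
    m4 = sum((x - mean) ** 4 for x in data) / n  # fourth central moment
    return m4 / m2 ** 2 - 3

flat = [1, 2, 3, 4, 5]                                # evenly spread, light tails
heavy_tailed = [-10, -1, -1, 0, 0, 0, 0, 1, 1, 10]    # a few extreme values

print(round(excess_kurtosis(flat), 3))          # negative
print(round(excess_kurtosis(heavy_tailed), 3))  # positive
```

Positive excess kurtosis points to heavier tails than the normal distribution, negative excess kurtosis to lighter tails.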
Box & Whiskers Plot
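A box and whiskers plot is usually drawn from the five-number summary: minimum, first quartile, median, third quartile and maximum. The sketch below computes those values with the standard library. The data is invented, the "inclusive" quartile method is one of several conventions, and the 1.5 × IQR fences are the common rule of thumb for flagging outliers, assumed here rather than taken from the slides.

```python
import statistics

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) using inclusive quartiles."""
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return min(data), q1, median, q3, max(data)

data = [2, 4, 4, 5, 6, 7, 8, 9, 30]  # hypothetical values, one extreme point

low, q1, median, q3, high = five_number_summary(data)
iqr = q3 - q1                      # interquartile range: the box height
lower_fence = q1 - 1.5 * iqr       # whiskers usually stop inside the fences
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print("five-number summary:", (low, q1, median, q3, high))
print("IQR:", iqr)
print("outliers:", outliers)
```

The box spans Q1 to Q3 with a line at the median, and points beyond the fences are drawn individually as outliers.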
Probability
• Fundamental concept in statistics and data analysis.
• Measures the likelihood of events and their outcomes.
Types of Probability
• Joint Probability: Probability of two events occurring simultaneously.
• Marginal Probability: Probability of a single event occurring, irrespective of the other event.
• Conditional Probability: Probability of one event occurring, given that another event has already occurred.
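The three types can be computed from a single table of counts. The sketch below uses made-up counts of 100 commuting days, classified by weather and by arrival, and exact fractions so the arithmetic is easy to follow. All names and numbers are assumptions for illustration.

```python
from fractions import Fraction

# Hypothetical counts of 100 days by (weather, arrival).
counts = {
    ("rain", "late"): 15,
    ("rain", "on_time"): 25,
    ("dry", "late"): 10,
    ("dry", "on_time"): 50,
}
total = sum(counts.values())

# Joint probability: both events occur together.
p_rain_and_late = Fraction(counts[("rain", "late")], total)

# Marginal probability: one event, irrespective of the other.
p_rain = Fraction(sum(v for (w, _), v in counts.items() if w == "rain"), total)
p_late = Fraction(sum(v for (_, a), v in counts.items() if a == "late"), total)

# Conditional probability: late, given that it rained.
p_late_given_rain = p_rain_and_late / p_rain

print("P(rain and late) =", p_rain_and_late)    # 3/20
print("P(rain)          =", p_rain)             # 2/5
print("P(late)          =", p_late)             # 1/4
print("P(late | rain)   =", p_late_given_rain)  # 3/8
```

The conditional probability is the joint probability divided by the marginal probability of the condition.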
Probability Distributions
• Describe how probabilities are distributed across different outcomes or values in a random experiment.
• Two categories:
– Continuous distributions, for variables like height or weight.
– Discrete distributions, for variables like the number of coin tosses needed to get a head.
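The coin-toss example is a discrete distribution, and it can be written out directly. If each toss gives a head with probability p, the chance that the first head arrives on toss k is (1 - p)^(k - 1) · p, known as the geometric distribution. The sketch assumes a fair coin (p = 0.5), and the function name is hypothetical.

```python
def first_head_probability(k, p=0.5):
    """P(first head occurs on toss k) for a coin with head probability p."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    return (1 - p) ** (k - 1) * p

# Probability of needing exactly 1, 2, ..., 5 tosses with a fair coin.
for k in range(1, 6):
    print(k, first_head_probability(k))

# The probabilities over all outcomes add up to 1 (checked for 50 terms).
total = sum(first_head_probability(k) for k in range(1, 51))
print(total)
```

Each outcome has its own probability and the probabilities sum to 1, which is what distinguishes a discrete distribution from a continuous one.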
PDF
• Stands for Probability Density Function.
• PDFs are used in continuous probability distributions to describe the relative likelihood (density) of values. The probability of observing any single exact value is zero.
• The area under the PDF curve between two values represents the probability of the random variable falling within that range.
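The area interpretation can be checked numerically. The sketch below assumes heights follow a normal distribution with a mean of 170 cm and a standard deviation of 10 cm (illustrative numbers only) and approximates the area under the PDF with the trapezoidal rule.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of the normal distribution at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def area_under_pdf(a, b, mu=0.0, sigma=1.0, steps=10_000):
    """Trapezoidal approximation of the area under the PDF from a to b."""
    width = (b - a) / steps
    total = 0.5 * (normal_pdf(a, mu, sigma) + normal_pdf(b, mu, sigma))
    for i in range(1, steps):
        total += normal_pdf(a + i * width, mu, sigma)
    return total * width

# Hypothetical heights: mean 170 cm, standard deviation 10 cm.
p_range = area_under_pdf(160, 180, mu=170, sigma=10)
p_point = area_under_pdf(170, 170, mu=170, sigma=10)

print(f"P(160 <= height <= 180) is about {p_range:.4f}")
print(f"P(height == 170 exactly) = {p_point}")
```

The range within one standard deviation of the mean holds roughly 68% of the probability, while a single exact value has zero width and therefore zero probability.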
CDF
• Stands for Cumulative Distribution Function.
• CDFs are used in both continuous and discrete distributions.
• A CDF gives the cumulative probability of a random variable being less than or equal to a specific value.
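A brief sketch shows a CDF for one discrete and one continuous case. The fair six-sided die and the standard normal distribution are examples chosen for illustration, and the function names are hypothetical. The normal CDF uses the error function from the standard library.

```python
import math
from fractions import Fraction

def die_cdf(x):
    """P(X <= x) for a fair six-sided die (discrete)."""
    favourable = sum(1 for face in range(1, 7) if face <= x)
    return Fraction(favourable, 6)

def normal_cdf(x, mu=0.0, sigma=1.0):
    """P(X <= x) for a normal distribution (continuous)."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

print(die_cdf(4))       # 2/3: faces 1, 2, 3 and 4
print(normal_cdf(0.0))  # 0.5: half the probability lies below the mean

# The probability of a range is a difference of two CDF values.
print(round(normal_cdf(1) - normal_cdf(-1), 4))
```

The discrete CDF rises in steps at each possible value, while the continuous CDF rises smoothly from 0 to 1.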
THANK YOU
