Statistical Measures

Statistical measures of central tendency and dispersion are fundamental concepts in descriptive
statistics. They help summarize and understand the characteristics of a dataset. Here's an
explanation of each:

Central Tendency:

Central tendency measures provide insight into the center or average of a dataset. They indicate
where most of the data points are concentrated. The three common measures of central tendency
are:

a. Mean: The mean, also known as the average, is calculated by adding up all the values in a
dataset and dividing by the number of data points. It's represented by the formula: Mean = (Sum
of all values) / (Number of values). The mean is sensitive to outliers.

b. Median: The median is the middle value in a dataset when the data is arranged in ascending
or descending order. If there's an even number of data points, the median is the average of the
two middle values. The median is largely unaffected by extreme values, which makes it a robust
measure of central tendency for skewed data.

c. Mode: The mode is the value that appears most frequently in a dataset. A dataset can have one
mode (unimodal), two modes (bimodal), or more (multimodal). The mode is useful for
categorical and discrete data.

Central tendency measures help answer questions like "What is the typical value in this dataset?"
and provide a sense of where the data clusters.
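
As a minimal sketch of the three measures above, the following uses Python's standard statistics module on a made-up sample (the values in data are illustrative, not taken from any real dataset). It also shows how a single outlier pulls the mean upward while the median and mode stay with the bulk of the data.

```python
import statistics

# Hypothetical sample: nine values between 2 and 7 plus one outlier (100).
data = [2, 3, 3, 4, 5, 5, 5, 6, 7, 100]

mean_value = statistics.mean(data)       # sum of all values / number of values
median_value = statistics.median(data)   # even count: average of the two middle values
modes = statistics.multimode(data)       # list of every most frequent value

print("mean:", mean_value)
print("median:", median_value)
print("mode(s):", modes)
```

Here the mean comes out at 14 even though nine of the ten values are 7 or below, while the median and the mode are both 5.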

Dispersion (Variability):

Dispersion measures describe the spread or variability of data points in a dataset. They tell us
how much the data points deviate from the central tendency measures. Common measures of
dispersion include:

a. Range: The range is the difference between the maximum and minimum values in the dataset.
It provides a simple but limited view of data spread.

b. Variance: Variance measures the average of the squared differences between each data point
and the mean. For a whole population it's represented by the formula: Variance = Σ (x - μ)² / N,
where x is each data point, μ is the mean, and N is the number of data points. When the data is
a sample from a larger population, the sum is divided by N - 1 instead of N.

c. Standard Deviation: The standard deviation is the square root of the variance. It provides a
measure of how spread out the data is around the mean. A larger standard deviation indicates
greater variability.

d. Interquartile Range (IQR): The IQR is the range between the first quartile (25th percentile)
and the third quartile (75th percentile). It's less sensitive to outliers and provides a measure of the
middle 50% of the data.

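e. Coefficient of Variation (CV): The CV is the standard deviation divided by the mean, often
expressed as a percentage. It's used to compare the relative variability of two or more datasets
with different units or scales. It is only meaningful when the mean is not zero and the data is
measured on a scale with a true zero.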
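
The dispersion measures listed above can be sketched with the standard statistics module as follows. The sample in data is hypothetical. Quartile conventions differ between textbooks and software, so the IQR here reflects the default method of statistics.quantiles and may differ slightly from a hand calculation that uses another convention.

```python
import statistics

# Hypothetical sample of eight values with mean 5.
data = [2, 4, 4, 4, 5, 5, 7, 9]

data_range = max(data) - min(data)            # maximum minus minimum

variance = statistics.pvariance(data)         # population variance: divide by N
sample_variance = statistics.variance(data)   # sample variance: divide by N - 1
std_dev = statistics.pstdev(data)             # square root of the population variance

# Quartiles via the module's default ("exclusive") method.
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1                                 # spread of the middle 50% of the data

cv = std_dev / statistics.mean(data)          # relative variability (unitless)

print("range:", data_range)
print("population variance:", variance)
print("sample variance:", sample_variance)
print("standard deviation:", std_dev)
print("IQR:", iqr)
print("CV:", cv)
```

Because the CV has no units, it can be compared across datasets measured on different scales, which is not true of the standard deviation itself.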

In summary, central tendency measures give us a sense of where the data clusters, while
dispersion measures provide insights into the spread and variability of the data. Together, these
statistics help us better understand the characteristics of a dataset and make informed decisions in
various fields, including science, business, and social sciences.

Correlation & its types


Correlation is a statistical measure that quantifies the relationship between two or more variables.
It indicates whether and to what extent changes in one variable are associated with changes in
another variable. In statistics, correlation is used to assess the degree and direction of the
relationship between variables. There are different types of correlation, including:

Pearson Correlation (Pearson's r):

Pearson correlation, also known as the Pearson correlation coefficient or Pearson's r, measures
the linear relationship between two continuous variables.

It ranges from -1 to 1, where:

r = 1 indicates a perfect positive linear relationship (as one variable increases, the other increases
proportionally).

r = -1 indicates a perfect negative linear relationship (as one variable increases, the other
decreases proportionally).

r = 0 indicates no linear relationship.

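It is sensitive to outliers. The coefficient itself can be computed for any pair of numeric
variables, but the usual significance tests for r assume that the data is approximately normally
distributed.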
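
To make the three reference values of r concrete, here is a small sketch that computes Pearson's r from its textbook definition (the covariance term divided by the product of the spread terms). The function name pearson_r and the example data are illustrative assumptions, not part of the original notes.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r for two equal-length numeric sequences (illustrative helper)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (numerator) and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("r is undefined when one variable is constant")
    return cov / math.sqrt(ss_x * ss_y)

x = [1, 2, 3, 4, 5]
perfect_positive = pearson_r(x, [2, 4, 6, 8, 10])   # y = 2x
perfect_negative = pearson_r(x, [10, 8, 6, 4, 2])   # y = 12 - 2x
# y = x squared: a strong relationship, but not a linear one.
no_linear = pearson_r([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4])

print(perfect_positive, perfect_negative, no_linear)
```

The last case is a reminder that r = 0 only rules out a linear relationship; the two variables can still be strongly related in a non-linear way.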

Spearman Rank Correlation:

Spearman rank correlation is used when dealing with ordinal data, or with continuous data
whose relationship is monotonic but not necessarily linear. It measures the strength and
direction of the monotonic relationship (increasing or decreasing) between two variables.

It is based on the ranks of the data rather than the actual values, making it less sensitive to
outliers.

Spearman's rank correlation coefficient (rho) ranges from -1 to 1, where positive values indicate
a direct relationship and negative values indicate an inverse relationship.
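
One way to see what "based on the ranks" means is to rank each variable and then apply the Pearson formula to the ranks. The sketch below does exactly that, giving tied values the average of their positions. The helper names average_ranks, pearson_r and spearman_rho are hypothetical, and the data is made up.

```python
import math

def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson_r(xs, ys):
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)

def spearman_rho(xs, ys):
    """Spearman's rho computed as Pearson's r of the ranks."""
    return pearson_r(average_ranks(xs), average_ranks(ys))

x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]      # y = x cubed: monotonic but not linear

rho = spearman_rho(x, y)
r = pearson_r(x, y)
print("Spearman rho:", rho)
print("Pearson r:", r)
```

For this data rho is 1 because the ordering of y matches the ordering of x exactly, while Pearson's r falls short of 1 because the points do not lie on a straight line.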

Kendall's Tau:

Kendall's Tau is another rank-based measure of correlation that assesses the strength and
direction of the relationship between two variables. It's also suitable for non-continuous or
ordinal data.

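It compares the number of concordant pairs (where the order of values in both variables
agrees) with the number of discordant pairs (where the order disagrees): the difference between
the two counts is divided by the total number of pairs.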

Kendall's Tau has a range from -1 to 1, with the same interpretation as Spearman's rank
correlation.
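
A minimal sketch of the pair-counting idea follows. It implements the simplest variant, often called tau-a: concordant pairs minus discordant pairs, divided by the total number of pairs, with no correction for ties. The function name kendall_tau_a and the data are illustrative assumptions.

```python
from itertools import combinations

def kendall_tau_a(xs, ys):
    """Kendall's tau-a: (concordant - discordant) / total pairs, no tie correction."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length with at least 2 points")
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        product = (x1 - x2) * (y1 - y2)
        if product > 0:       # both variables order the pair the same way
            concordant += 1
        elif product < 0:     # the variables order the pair in opposite ways
            discordant += 1
        # product == 0 means a tie in x or y; tau-a counts it as neither
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs

x = [1, 2, 3, 4, 5]
y = [1, 3, 2, 5, 4]   # two neighbouring pairs are swapped relative to x

tau = kendall_tau_a(x, y)
print("Kendall tau-a:", tau)
```

Of the 10 possible pairs here, 8 are concordant and 2 are discordant, so tau is (8 - 2) / 10 = 0.6.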

Point-Biserial Correlation:

Point-biserial correlation is used when one variable is continuous, and the other is dichotomous
(having only two categories, e.g., yes/no).

It measures the strength and direction of the relationship between a continuous variable and a
binary variable.

The point-biserial correlation coefficient can range from -1 to 1, with positive values indicating a
positive relationship and negative values indicating a negative relationship.
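
The point-biserial coefficient is numerically the same as Pearson's r when the binary variable is coded 0/1, and it is commonly written as (M1 - M0) / s × sqrt(p × q), where M1 and M0 are the group means, s is the population standard deviation of the continuous variable, and p and q are the proportions in each group. The sketch below uses that form; the function name point_biserial and the data are hypothetical.

```python
import math
import statistics

def point_biserial(values, groups):
    """Point-biserial r for a continuous variable and a 0/1 coded variable."""
    ones = [v for v, g in zip(values, groups) if g == 1]
    zeros = [v for v, g in zip(values, groups) if g == 0]
    if not ones or not zeros:
        raise ValueError("both categories must be present")
    p = len(ones) / len(values)       # proportion coded 1
    q = 1 - p                         # proportion coded 0
    s = statistics.pstdev(values)     # population SD of all values
    mean_diff = statistics.mean(ones) - statistics.mean(zeros)
    return mean_diff / s * math.sqrt(p * q)

# Hypothetical data: a score and a yes/no outcome coded 1/0.
scores = [1, 2, 3, 4, 5, 6]
passed = [0, 0, 0, 1, 1, 1]

r_pb = point_biserial(scores, passed)
print("point-biserial r:", r_pb)
```

A positive value means the group coded 1 has the higher mean on the continuous variable; swapping the 0/1 coding flips the sign but not the magnitude.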

Biserial Correlation:

Biserial correlation is used when one variable is continuous, and the other is dichotomous, but
the dichotomous variable is believed to have an underlying continuous distribution.

It estimates the relationship between the continuous variable and the latent continuous variable
underlying the dichotomous variable.
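
As a rough sketch, the biserial coefficient is often computed with the textbook formula (M1 - M0) / s × (p × q / y), where y is the height of the standard normal density at the point that splits the distribution into proportions p and q. The code below assumes that formula and that the latent variable is normal; the function name biserial and the data are hypothetical.

```python
import statistics
from statistics import NormalDist

def biserial(values, groups):
    """Biserial r, assuming a normal latent variable behind the 0/1 split."""
    ones = [v for v, g in zip(values, groups) if g == 1]
    zeros = [v for v, g in zip(values, groups) if g == 0]
    if not ones or not zeros:
        raise ValueError("both categories must be present")
    p = len(ones) / len(values)       # proportion coded 1
    q = 1 - p                         # proportion coded 0
    s = statistics.pstdev(values)     # population SD of all values
    std_normal = NormalDist()         # mean 0, standard deviation 1
    # Height of the normal curve at the cut point dividing it into p and q.
    y = std_normal.pdf(std_normal.inv_cdf(p))
    mean_diff = statistics.mean(ones) - statistics.mean(zeros)
    return mean_diff / s * (p * q / y)

# Hypothetical data: a score and a pass/fail label thought to reflect
# an underlying continuous ability.
scores = [1, 2, 3, 4, 5, 6, 7, 8]
passed = [0, 0, 1, 0, 1, 0, 1, 1]

r_b = biserial(scores, passed)
print("biserial r:", r_b)
```

For the same data the biserial value is always larger in magnitude than the point-biserial value, and it can even exceed 1 when the continuous variable is far from normal, which is one reason to treat it as an estimate under stated assumptions.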

Phi Coefficient:

The phi coefficient is used when both variables are dichotomous (binary).

It measures the association between two binary variables, usually from a 2×2 contingency table,
and is equal to Pearson's r computed on the two variables coded as 0 and 1.
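
For a 2×2 table with cell counts a, b, c and d, phi is commonly computed as (a×d - b×c) divided by the square root of the product of the four row and column totals. The sketch below builds the table from two 0/1 coded lists; the function name phi_coefficient and the data are illustrative assumptions.

```python
import math

def phi_coefficient(xs, ys):
    """Phi for two 0/1 coded variables, computed from the 2x2 table counts."""
    pairs = list(zip(xs, ys))
    a = sum(1 for x, y in pairs if x == 1 and y == 1)
    b = sum(1 for x, y in pairs if x == 1 and y == 0)
    c = sum(1 for x, y in pairs if x == 0 and y == 1)
    d = sum(1 for x, y in pairs if x == 0 and y == 0)
    # Product of the two row totals and the two column totals.
    denominator = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    if denominator == 0:
        raise ValueError("phi is undefined when a row or column total is zero")
    return (a * d - b * c) / denominator

# Hypothetical data: two yes/no variables coded 1/0 for eight cases.
exposed = [1, 1, 1, 1, 0, 0, 0, 0]
outcome = [1, 1, 1, 0, 0, 0, 0, 1]

phi = phi_coefficient(exposed, outcome)
print("phi:", phi)
```

Here a = 3, b = 1, c = 1 and d = 3, so phi = (9 - 1) / 16 = 0.5, a moderate positive association.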

These different types of correlation coefficients allow statisticians and data analysts to assess and
describe the relationships between variables in various contexts, helping to understand patterns,
dependencies, and associations in data. The choice of the appropriate correlation method depends
on the nature of the data and the research question.
