
INDIAN INSTITUTE OF TECHNOLOGY ROORKEE

CSN-373: Probability Theory for Computer Engineers

Lecture 7: Frequency Distributions, Distribution Shapes and Graphs, Measures of Central Tendency, Measures of Variation, Measures of Position

Dr. Sudip Roy (a.k.a., SR)


Department of Computer Science & Engineering
Outline of Module 1:

● Concept of probability
● Random variables
● Distribution functions: discrete and continuous
● Moments and moment generating functions

Representing Data in Frequency Distributions Graphically:
Histograms, Frequency Polygons, and Ogives:

• The three most commonly used graphs in research are:
1. The histogram
2. The frequency polygon
3. The cumulative frequency graph, or ogive (pronounced o-jive)
• The Histogram: The histogram is a graph that displays the data by using contiguous
vertical bars (unless the frequency of a class is 0) of various heights to represent the
frequencies of the classes.
• As the example for each graph, consider the distribution of the miles that 20 randomly selected runners ran during a given week.
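The bar heights of a histogram are the class frequencies. Below is a minimal Python sketch of how to tally them. It assumes integer-valued data and classes given as inclusive (lower, upper) class limits. The mileage values and class limits are hypothetical and are not the runners' data referred to above.

```python
def class_frequencies(data, classes):
    """Count the values falling in each class; classes are inclusive (lower, upper) limits."""
    return [sum(1 for x in data if lower <= x <= upper) for lower, upper in classes]


# Hypothetical weekly mileage for 10 runners and hypothetical class limits
miles = [3, 7, 12, 14, 18, 21, 22, 25, 27, 31]
classes = [(1, 10), (11, 20), (21, 30), (31, 40)]

freqs = class_frequencies(miles, classes)
print(freqs)  # [2, 3, 4, 1]
```

Each frequency becomes the height of one contiguous bar drawn over its class. A class with frequency 0 simply has no bar.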

• The Frequency Polygon: The frequency polygon is a graph that displays the data by
using lines that connect points plotted for the frequencies at the midpoints of the
classes. The frequencies are represented by the heights of the points.
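The points of a frequency polygon come straight from the class limits. Each class midpoint is (lower limit + upper limit) / 2, and the class frequency gives that point's height. The following minimal sketch uses hypothetical classes and frequencies, not the slide's data.

```python
def class_midpoints(classes):
    """Midpoint of each class: (lower limit + upper limit) / 2."""
    return [(lower + upper) / 2 for lower, upper in classes]


def polygon_points(classes, freqs):
    """(midpoint, frequency) pairs; consecutive points are joined by line segments."""
    return list(zip(class_midpoints(classes), freqs))


# Hypothetical classes and their frequencies
classes = [(1, 10), (11, 20), (21, 30), (31, 40)]
freqs = [2, 3, 4, 1]
print(polygon_points(classes, freqs))
# [(5.5, 2), (15.5, 3), (25.5, 4), (35.5, 1)]
```

A common convention, assumed here and not stated above, also extends the line down to the x-axis at the midpoints of the empty classes just before the first class and just after the last. This makes the polygon close on the axis.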
• The Ogive: The ogive is a graph that represents the cumulative frequencies for the
classes in a frequency distribution.
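The cumulative frequency of a class is the running total of the frequencies up to and including that class. The minimal sketch below computes it with `itertools.accumulate`. It assumes the usual convention of plotting each cumulative frequency at the upper class boundary, which is the upper limit + 0.5 for integer class limits. The classes and frequencies are hypothetical.

```python
from itertools import accumulate


def ogive_points(classes, freqs):
    """(upper class boundary, cumulative frequency) pairs for an ogive."""
    cumulative = accumulate(freqs)  # running totals: f1, f1 + f2, ...
    # For integer class limits, the upper boundary sits 0.5 above the upper limit
    return [(upper + 0.5, cf) for (_, upper), cf in zip(classes, cumulative)]


# Hypothetical classes and their frequencies
classes = [(1, 10), (11, 20), (21, 30), (31, 40)]
freqs = [2, 3, 4, 1]
print(ogive_points(classes, freqs))
# [(10.5, 2), (20.5, 5), (30.5, 9), (40.5, 10)]
```

The last cumulative frequency always equals the total number of data values. As a result, an ogive never decreases and always ends at n.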

• Relative Frequency Graphs: These distributions can be converted to distributions that use proportions (each class frequency divided by the total number of data values) instead of raw frequencies. Graphs drawn from these proportions are called relative frequency graphs.
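Converting frequencies to relative frequencies means dividing each class frequency by the total count. Here is a minimal sketch using hypothetical frequencies.

```python
def relative_frequencies(freqs):
    """Convert class frequencies to proportions of the total count."""
    n = sum(freqs)
    return [f / n for f in freqs]


freqs = [2, 3, 4, 1]  # hypothetical class frequencies, n = 10
print(relative_frequencies(freqs))  # [0.2, 0.3, 0.4, 0.1]
```

The graphs keep the same shape. Only the vertical scale changes, from counts to proportions that sum to 1.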
• Example: Miles Run per Week

Distribution Shapes:
• When one is describing data, it is important to be
able to recognize the shapes of the distribution
values. Later, you will see that the shape of a
distribution also determines the appropriate statistical
methods used to analyze the data.

• A distribution can have many shapes, and one method of analyzing a distribution is to draw a histogram or frequency polygon for it. Several of the most common shapes are:
– bell-shaped or mound-shaped
– uniform
– J-shaped
– reverse J-shaped
– positively skewed or right-skewed
– negatively skewed or left-skewed
– bimodal
– U-shaped

• Distributions are most often not perfectly shaped, so it is not necessary to identify an exact shape, but rather an overall pattern.

Other Types of Graphs:

• Several other types of graphs are often used in statistics. They are the bar graph,
Pareto chart, time series graph, and pie graph.
• Bar Graphs: A bar graph represents the data by using vertical or horizontal bars
whose heights or lengths represent the frequencies of the data.

• Pareto Charts: A Pareto chart is used to represent a frequency distribution for a
categorical variable, and the frequencies are displayed by the heights of vertical
bars, which are arranged in order from highest to lowest.
• The Time Series Graph: A time series graph represents data that occur over a
specific period of time.
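Ordering the categories for a Pareto chart is a descending sort by frequency. The minimal sketch below uses hypothetical category counts.

```python
def pareto_order(counts):
    """(category, frequency) pairs arranged from highest to lowest frequency."""
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)


# Hypothetical frequencies for a categorical variable (mode of travel)
counts = {"Bus": 5, "Car": 15, "Walk": 2, "Bike": 9}
print(pareto_order(counts))
# [('Car', 15), ('Bike', 9), ('Bus', 5), ('Walk', 2)]
```

The bars are then drawn left to right in this order.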

• The Pie Graph: A pie graph is a circle that is divided into sections or wedges
according to the percentage of frequencies in each category of the distribution.
Pie graphs are used extensively in statistics. The purpose of the pie graph is to
show the relationship of the parts to the whole by visually comparing the sizes of
the sections. Percentages or proportions can be used.
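A full circle is 360°, so each wedge's central angle is (f / n) × 360°. Its percentage is (f / n) × 100%, where f is the category frequency and n is the total. Here is a minimal sketch with hypothetical counts.

```python
def pie_sections(counts):
    """Map each category to (wedge angle in degrees, percentage of the whole)."""
    n = sum(counts.values())
    return {cat: (f / n * 360, f / n * 100) for cat, f in counts.items()}


# Hypothetical category frequencies, n = 20
counts = {"Car": 10, "Bus": 5, "Bike": 5}
print(pie_sections(counts))
# {'Car': (180.0, 50.0), 'Bus': (90.0, 25.0), 'Bike': (90.0, 25.0)}
```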

Measures of Central Tendency: Mean

• A statistic is a characteristic or measure obtained by using the data values from a sample.
• A parameter is a characteristic or measure obtained by using all the data values from a specific population.
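The mean is the sum of the values divided by the number of values. Computed from a sample, it is the statistic X̄ = ΣX / n. Computed from all values of a population, it is the parameter μ = ΣX / N. The minimal Python sketch below uses hypothetical data. The arithmetic is the same in both cases; only the data set differs.

```python
def mean(values):
    """Arithmetic mean: sum of the values divided by how many there are.

    Applied to a sample this is the statistic X-bar = sum(X) / n;
    applied to a whole population it is the parameter mu = sum(X) / N.
    """
    if not values:
        raise ValueError("mean of an empty data set is undefined")
    return sum(values) / len(values)


sample = [2, 4, 6, 8]  # hypothetical sample, n = 4
print(mean(sample))  # 5.0
```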

Measures of Central Tendency: Median and Mode

• The mode is the value that occurs most often in the data set.
• A data set that has only one value that occurs with the greatest frequency is said to be unimodal.
• If a data set has two values that occur with the same greatest frequency, both values are considered to be the mode, and the data set is said to be bimodal. If a data set has more than two values that occur with the same greatest frequency, each value is used as the mode, and the data set is said to be multimodal. When no data value occurs more than once, the data set is said to have no mode.
• For example, in a data set in which each value occurs only once, there is no mode. Note: Do not say that the mode is zero. That would be incorrect, because in some data, such as temperature, zero can be an actual value.
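The median is the midpoint of the data array. To find it, sort the data and take the middle value when n is odd, or the mean of the two middle values when n is even. Below is a minimal Python sketch of the median and of the mode rules above. In line with the note above, the mode function returns an empty list for a data set with no mode rather than 0. The data sets are hypothetical.

```python
from collections import Counter


def median(values):
    """Midpoint of the data array: the middle value, or the mean of the two middle values."""
    if not values:
        raise ValueError("median of an empty data set is undefined")
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def modes(values):
    """All values with the greatest frequency; [] means the data set has no mode."""
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []  # no value occurs more than once: no mode (not 0)
    return [value for value, count in counts.items() if count == top]


print(median([7, 1, 5]))        # 5
print(median([4, 1, 3, 2]))     # 2.5
print(modes([1, 2, 2, 3]))      # [2]     (unimodal)
print(modes([1, 1, 2, 2, 3]))   # [1, 2]  (bimodal)
print(modes([0, 5, 9]))         # []      (no mode)
```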

Measures of Central Tendency:

• The Midrange
• The midrange is a rough estimate of the middle of a data set. It is found by adding the lowest and highest values in the data set and dividing by 2. Because it uses only these two values, it can be strongly affected by a single extremely high or low value.
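The midrange calculation, and its sensitivity to a single extreme value, can be seen in a short sketch with hypothetical data.

```python
def midrange(values):
    """Midrange = (lowest value + highest value) / 2."""
    return (min(values) + max(values)) / 2


data = [2, 3, 4, 5, 6]             # hypothetical data
print(midrange(data))              # 4.0
print(midrange(data + [50]))       # 26.0: one extreme value shifts it sharply
```

For comparison, adding the value 50 moves the mean of this data only from 4.0 to 70 / 6 ≈ 11.7.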

• The Weighted Mean
• Sometimes, you must find the mean of a data set in which not all values are equally represented. The type of mean that takes this additional factor (a weight for each value) into account is called the weighted mean.
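The weighted mean multiplies each value X by its weight w, sums the products, and divides by the sum of the weights: X̄ = Σ(wX) / Σw. The minimal sketch below uses a hypothetical grade-point example in which course credits act as the weights.

```python
def weighted_mean(values, weights):
    """Weighted mean: sum(w * x) / sum(w)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight


# Hypothetical example: grade points weighted by course credits
grade_points = [4, 3, 2]
credits = [3, 4, 1]
print(weighted_mean(grade_points, credits))  # 3.25  (= 26 / 8)
```

With equal weights, the weighted mean reduces to the ordinary mean.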

Measures of Variation:

• Range
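• The range is the simplest of the three measures of variation (range, variance, and standard deviation). It is the highest value minus the lowest value: R = highest value − lowest value.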

• Population Variance and Standard Deviation
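The population variance is the average squared deviation from the population mean, σ² = Σ(X − μ)² / N. The population standard deviation is its square root, σ = √σ². Here is a minimal sketch using a hypothetical population of N = 8 values.

```python
import math


def population_variance(values):
    """sigma^2 = sum((X - mu)^2) / N, with mu the population mean."""
    n = len(values)
    mu = sum(values) / n
    return sum((x - mu) ** 2 for x in values) / n


def population_std(values):
    """sigma = sqrt(sigma^2)."""
    return math.sqrt(population_variance(values))


population = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical; mu = 5
print(population_variance(population))  # 4.0
print(population_std(population))       # 2.0
```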

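• Sample Variance and Standard Deviation

• When computing the variance of a sample, one might expect to use the population formula with X̄ in place of μ and n in place of N, that is, Σ(X − X̄)² / n. This formula is not usually used, however, because in most cases the purpose of calculating the statistic is to estimate the corresponding parameter, and dividing by n tends to underestimate the population variance. The sample variance is therefore s² = Σ(X − X̄)² / (n − 1), and the sample standard deviation is s = √s².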
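Below is a minimal Python sketch of the sample variance s² = Σ(X − X̄)² / (n − 1) and the standard deviation s = √s². It is cross-checked against Python's `statistics.variance`, which also uses the n − 1 denominator. The sample values are hypothetical.

```python
import math
import statistics


def sample_variance(values):
    """s^2 = sum((X - X_bar)^2) / (n - 1)."""
    n = len(values)
    if n < 2:
        raise ValueError("sample variance needs at least two values")
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)


def sample_std(values):
    """s = sqrt(s^2)."""
    return math.sqrt(sample_variance(values))


sample = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample, n = 8
print(sample_variance(sample))  # 32 / 7, about 4.571
# Cross-check against the standard library's sample variance
print(math.isclose(sample_variance(sample), statistics.variance(sample)))  # True
```

Dividing the same sum of squares (32) by n instead would give 32 / 8 = 4.0.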

Chebyshev’s Theorem:

• Chebyshev’s Theorem: As stated previously, the variance and standard deviation of a variable can be used to determine the spread, or dispersion, of a variable. That is, the larger the variance or standard deviation, the more the data values are dispersed.

• Chebyshev’s theorem can be used to find the minimum percentage of data values that will fall within k standard deviations of the mean (that is, between two values equally distant from the mean), for any k > 1. This theorem can be applied to any distribution regardless of its shape.
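Chebyshev's theorem states that the proportion of values within k standard deviations of the mean is at least 1 − 1/k², for any k > 1. The minimal sketch below evaluates that bound and the corresponding interval. The mean and standard deviation are hypothetical.

```python
def chebyshev_min_proportion(k):
    """Lower bound on the proportion of values within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("Chebyshev's theorem requires k > 1")
    return 1 - 1 / k**2


def chebyshev_interval(mean, std, k):
    """The interval (mean - k*std, mean + k*std) to which the bound applies."""
    return mean - k * std, mean + k * std


# Hypothetical data set with mean 50 and standard deviation 5
print(chebyshev_interval(50, 5, 2))  # (40, 60)
print(chebyshev_min_proportion(2))   # 0.75
```

In this example, at least 75% of the values lie between 40 and 60, whatever the shape of the distribution. For k = 3, the bound rises to 1 − 1/9 ≈ 88.9%.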

Next Class…
