
STAT 1206 Probability and Modelling

Class-01:
Date: 28/08/2023

1. What is a population?

Ans: In statistics, a population is the entire set of individuals, objects, or
events that are being studied. It is the collection of all possible data
values that could be obtained from the study. The population is typically
denoted by a capital letter, such as P.

For example, the population of all students in a school could be denoted
by P = {student1, student2, ..., studentN}.

A sample is a subset of the population that is used to represent the
population. The sample is typically denoted by a lowercase letter, such
as s.

For example, a sample of 100 students from the school population could
be denoted by s = {student1, student2, ..., student100}.

The goal of statistical inference is to use the sample data to make
inferences about the population. This can be done by estimating
population parameters, such as the mean and standard deviation, or by
testing hypotheses about the population.

There are two main types of populations:


 Finite populations: These are populations that have a finite number
of members. For example, the population of all students in a school
is a finite population.
 Infinite populations: These are populations that have an infinite
number of members. For example, the population of all possible
real numbers is an infinite population.

In practice, most populations are finite. However, the population is often
treated as infinite when making statistical inferences. This is reasonable
when the sample is small relative to the population, because the results
of statistical inference are then not sensitive to the finiteness of the
population.
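
One way to see this is the finite population correction factor, sqrt((N - n) / (N - 1)), which multiplies the standard error of a sample mean when sampling without replacement from a population of size N with sample size n. The sketch below is only an illustration; the function name and the population sizes are assumptions, not part of the class notes.

```python
import math

def fpc(population_size: int, sample_size: int) -> float:
    """Finite population correction factor sqrt((N - n) / (N - 1))."""
    return math.sqrt((population_size - sample_size) / (population_size - 1))

# Hypothetical sizes: a sample of 100 drawn from populations of growing size.
for N in (200, 1_000, 100_000):
    print(N, round(fpc(N, 100), 4))
```

The factor moves toward 1 as the population grows relative to the sample, which is when treating the population as infinite changes the result very little.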

Here are some examples of populations in statistics:

 The population of all students in a school


 The population of all adults in a country
 The population of all cars in a city
 The population of all possible outcomes of a coin toss
 The population of all possible hands in a game of poker

2. What is a sample?

Ans: In statistics, a sample is a subset of the population that is used to
represent the population. The sample is typically denoted by a lowercase
letter, such as s.

For example, a sample of 100 students from the school population could
be denoted by s = {student1, student2, ..., student100}.

The goal of statistical inference is to use the sample data to make
inferences about the population. This can be done by estimating
population parameters, such as the mean and standard deviation, or by
testing hypotheses about the population.

There are many different ways to select a sample from a population.
Some of the most common methods include:

 Simple random sampling: Every member of the population has an
equal chance of being selected.
 Stratified random sampling: The population is divided into strata,
and then a random sample is selected from each stratum.
 Cluster sampling: The population is divided into clusters, and then
a random sample of clusters is selected.
 Systematic sampling: Every kth member of the population is
selected, starting from a randomly chosen member, where k is a fixed
interval (usually the population size divided by the sample size).

The choice of sampling method depends on the specific situation and the
goals of the study.
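
The four sampling methods can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the population of 20 students, the class labels used as strata and clusters, and all function names are hypothetical. Systematic sampling is written in its usual textbook form, with a fixed interval k = N // n and a random starting point.

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: 20 students, each tagged with a class (stratum/cluster).
population = [(f"student{i}", f"class{(i - 1) // 5 + 1}") for i in range(1, 21)]

def simple_random(pop, n):
    """Every member has the same chance of being selected."""
    return random.sample(pop, n)

def stratified(pop, n_per_stratum):
    """Draw a random sample separately from each stratum."""
    strata = {}
    for member in pop:
        strata.setdefault(member[1], []).append(member)
    return [m for group in strata.values()
            for m in random.sample(group, n_per_stratum)]

def cluster(pop, n_clusters):
    """Randomly pick whole clusters and keep every member of them."""
    names = sorted({member[1] for member in pop})
    chosen = set(random.sample(names, n_clusters))
    return [m for m in pop if m[1] in chosen]

def systematic(pop, n):
    """Random start, then every k-th member with k = N // n."""
    k = len(pop) // n
    start = random.randrange(k)
    return pop[start::k][:n]

print(simple_random(population, 5))
print(systematic(population, 4))
```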

Here are some of the advantages of using samples in statistics:

 Data are easier to collect from a sample than from a whole population.
 Data are less expensive to collect from a sample than from a whole population.
 Data can be collected more quickly from a sample than from a whole population.
 Samples can be used to make inferences about populations.

Here are some of the disadvantages of using samples in statistics:

 Samples may not be representative of the population.


 Samples may be biased.
 Samples may be too small to make accurate inferences about the
population.

It is important to carefully consider the advantages and disadvantages of
using samples in statistics before making a decision.

3. What is a parameter of a population?

Ans: In statistics, a parameter is a characteristic of a population. It is a value that
describes the population as a whole, and it is not affected by the sample size.

Some common parameters include:

 The mean: The mean is the average value of the population.


 The median: The median is the middle value of the population, when all the
values are ranked from least to greatest.
 The mode: The mode is the most frequent value in the population.
 The variance: The variance is a measure of how spread out the values in the
population are.
 The standard deviation: The standard deviation is the square root of the
variance. It is a measure of how much variation there is from the mean.

Parameters are typically denoted by Greek letters, such as μ (mu) for the mean and
σ² (sigma squared) for the variance.

It is important to note that parameters are typically unknown, and they must be
estimated from samples. The sample mean, sample median, sample mode, sample
variance, and sample standard deviation are all estimates of the corresponding
population parameters.

The accuracy of these estimates depends on the sample size. In general, larger
sample sizes produce more accurate estimates.
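
The difference between a parameter and its sample estimate can be illustrated with Python's standard `statistics` module. The population below is made up for illustration (10,000 simulated heights in cm), so every number in it is an assumption rather than real data.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical population of 10,000 heights in cm (simulated values).
population = [random.gauss(165, 8) for _ in range(10_000)]

mu = statistics.fmean(population)          # population mean (a parameter)
sigma2 = statistics.pvariance(population)  # population variance (a parameter)
print("parameters:", round(mu, 2), round(sigma2, 2))

for n in (10, 100, 1000):
    sample = random.sample(population, n)
    x_bar = statistics.fmean(sample)       # sample mean, an estimate of mu
    s2 = statistics.variance(sample)       # sample variance, an estimate of sigma^2
    print(n, round(x_bar, 2), round(s2, 2))
```

In a run like this the estimates from larger samples tend to sit closer to the parameters, although any single small sample can land close by chance.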

4. What is data (information), and how is it classified?


Ans: Data in statistics is a collection of observations, measurements, or facts that can be used
to describe, analyze, or draw conclusions about a population. It can be quantitative or qualitative,
and it can be collected from a variety of sources, such as surveys, experiments, or observations.

There are two main types of data in statistics:


 Numerical data: This is data that can be represented by numbers. It can be either
discrete (countable) or continuous (measurable). Examples of numerical data include
height, weight, age, and income.
 Categorical data: This is data that can be categorized into groups or classes. It is not
represented by numbers. Examples of categorical data include gender, marital status, and
eye color.

Data can be classified in many different ways, depending on the specific needs of the study.
Some common ways to classify data include:

 Primary data: This is data that is collected directly from the source. For example, data
collected from a survey or experiment is primary data.
 Secondary data: This is data that has already been collected by someone else. For
example, data collected from a government census or a company's financial records is
secondary data.
 Discrete data: This is data that can only take on distinct, countable values. For example,
the number of siblings a person has is discrete data.
 Continuous data: This is data that can take on any value within a given range. For
example, a person's height is continuous data.
 Qualitative data: This is data that describes a characteristic or attribute of a person or
thing. For example, a person's gender is qualitative data.
 Quantitative data: This is data that can be measured or counted. For example, a person's
height is quantitative data.

The type of data that is collected will depend on the specific research question that is being
asked. For example, if a researcher is interested in the average height of adults in a particular
country, they would collect quantitative data. If a researcher is interested in the different types of
cars that people drive, they would collect qualitative data.

Data can be analyzed using a variety of statistical methods. These methods can be used to
describe the data, to test hypotheses, and to make predictions. The choice of statistical method
will depend on the specific type of data that is being analyzed and the research question that is
being asked.

Data is an essential part of statistics. It is the foundation on which all statistical analyses are
based. By understanding the different types of data and how to collect and analyze it, statisticians
can make valuable contributions to a wide range of fields, such as business, economics,
healthcare, and education.

5. What is Variable?

Ans: In statistics, a variable is a characteristic or attribute of a person or thing that can be
measured or counted. It is a symbol that represents a value that can change. Variables are used to
collect data and to analyze the data.

There are two main types of variables in statistics:

 Quantitative variables: These variables are numbers that can be measured or counted.
Examples of quantitative variables include height, weight, age, and income.
 Qualitative variables: These variables are categories or groups. Examples of qualitative
variables include gender, marital status, and eye color.

Variables can also be classified as discrete or continuous.

 Discrete variables: These variables can only take on distinct, countable values. For
example, the number of siblings a person has is a discrete variable.
 Continuous variables: These variables can take on any value within a given range. For
example, a person's height is a continuous variable.

The type of variable that is used will depend on the specific research question that is being asked.
For example, if a researcher is interested in the average height of adults in a particular country,
they would use a quantitative variable. If a researcher is interested in the different types of cars
that people drive, they would use a qualitative variable.

Variables are an essential part of statistics. They are used to collect data, to analyze the data, and
to make inferences about the population. By understanding the different types of variables and
how to use them, statisticians can make valuable contributions to a wide range of fields, such as
business, economics, healthcare, and education.

Here are some examples of variables in statistics:

 Height: This is a quantitative variable that can be measured in centimeters or inches.


 Weight: This is a quantitative variable that can be measured in kilograms or pounds.
 Age: This is a quantitative variable that can be measured in years.
 Gender: This is a qualitative variable that can be categorized as male or female.
 Marital status: This is a qualitative variable that can be categorized as married, single,
divorced, or widowed.
 Eye color: This is a qualitative variable that can be categorized as brown, blue, green, or
hazel.

6. Scale of measurement:

In statistics, a scale of measurement is a classification system for variables. It describes the type
of information that is being measured and the mathematical operations that can be performed on
the data.

There are four main scales of measurement:

 Nominal scale: This is the weakest scale of measurement. It only classifies data into
categories that have no intrinsic order. For example, gender, eye color, and blood type are
nominal variables.
 Ordinal scale: This scale has order, but the intervals between the categories are not
equal. For example, academic levels (freshman, sophomore, junior, senior) and Likert
scale (strongly agree, agree, neutral, disagree, strongly disagree) are ordinal variables.
 Interval scale: This scale has order and equal intervals between the categories. However,
it does not have a true zero point. For example, temperature in degrees Celsius or
Fahrenheit is an interval variable.
 Ratio scale: This scale has order, equal intervals, and a true zero point. This means that
the zero point on the scale represents the absence of the quantity being measured. For
example, height, weight, and time are ratio variables.

The scale of measurement of a variable determines the types of statistical analysis that can be
performed on the data. For example, nominal variables are summarized with frequency counts
and cross-tabulations and analyzed with methods for categorical data, such as the chi-square
test. Ordinal variables can also be summarized with medians and ranks, and they are usually
analyzed with rank-based (nonparametric) methods, such as the Mann-Whitney and
Kruskal-Wallis tests. Interval and ratio variables can be used with t-tests, analysis of variance
(ANOVA), and regression analysis. Ratio variables additionally support statements about
ratios, such as one value being twice another.

It is important to choose the correct scale of measurement for a variable. This will ensure that the
data is analyzed correctly and that the results are meaningful.
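
As a rough illustration of how the scale limits the operations that make sense, the sketch below computes one suitable summary for each scale. All observations are hypothetical, and the choice of summaries (mode, median category, mean, ratio) is one common convention rather than a complete list.

```python
import statistics

# Hypothetical observations, one list per scale of measurement.
blood_type = ["A", "O", "B", "O", "AB", "O"]                             # nominal
agreement = ["agree", "neutral", "agree", "strongly agree", "disagree"]  # ordinal
temperature_c = [21.5, 19.0, 23.5, 22.0]                                 # interval
height_cm = [150.0, 165.0, 180.0]                                        # ratio

# Nominal: only counting makes sense, so the mode is the natural summary.
most_common = statistics.mode(blood_type)

# Ordinal: categories can be ranked, so a median category is meaningful.
order = ["strongly disagree", "disagree", "neutral", "agree", "strongly agree"]
ranks = sorted(order.index(a) for a in agreement)
median_category = order[ranks[len(ranks) // 2]]

# Interval: differences are meaningful, so a mean and a range can be used.
mean_temp = statistics.fmean(temperature_c)
temp_range = max(temperature_c) - min(temperature_c)

# Ratio: a true zero makes ratios meaningful ("1.2 times as tall").
height_ratio = height_cm[2] / height_cm[0]

print(most_common, median_category, mean_temp, temp_range, height_ratio)
```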

Here is a table summarizing the four scales of measurement:

| Scale of measurement | Characteristics | Mathematical operations |
|----------------------|-----------------|-------------------------|
| Nominal scale | Data is classified into categories | Grouping, counting |
| Ordinal scale | Data has order, but the intervals are not equal | Ranking, comparing |
| Interval scale | Data has order and equal intervals | Adding, subtracting |
| Ratio scale | Data has order, equal intervals, and a true zero point | Adding, subtracting, multiplying, dividing |

7. Classify data variables:

Ans: In statistics, data variables can be classified into two main categories:

 Quantitative variables: These variables are numbers that can be measured or counted.
Examples of quantitative variables include height, weight, age, and income.
 Qualitative variables: These variables are categories or groups. Examples of qualitative
variables include gender, marital status, and eye color.

Quantitative variables can be further classified into two categories:

 Discrete variables: These variables can only take on distinct, countable values. For
example, the number of siblings a person has is a discrete variable.
 Continuous variables: These variables can take on any value within a given range. For
example, a person's height is a continuous variable.

Qualitative variables can also be classified into two categories:

 Nominal variables: These variables are categories that have no intrinsic order. For
example, gender, eye color, and blood type are nominal variables.
 Ordinal variables: These variables have order, but the intervals between the categories
are not equal. For example, academic levels (freshman, sophomore, junior, senior) and
Likert scale (strongly agree, agree, neutral, disagree, strongly disagree) are ordinal
variables.

The type of a variable determines the types of statistical analysis that can be performed on the
data. Nominal variables are summarized with frequency counts and cross-tabulations and
analyzed with methods for categorical data, such as the chi-square test. Ordinal variables can
also be summarized with medians and ranks and are usually analyzed with rank-based
(nonparametric) methods. Quantitative variables measured on an interval or ratio scale can be
used with t-tests, analysis of variance (ANOVA), and regression analysis.

Here is a table summarizing the different types of data variables and their characteristics:

| Data type | Characteristics | Examples |
|-----------|-----------------|----------|
| Quantitative | Numbers that can be measured or counted | Height, weight, age, income |
| Discrete | Can only take on distinct, countable values | Number of siblings, number of students in a class |
| Continuous | Can take on any value within a given range | A person's height, a car's fuel efficiency |
| Qualitative | Categories or groups | Gender, eye color, marital status |
| Nominal | Categories that have no intrinsic order | Gender, eye color, blood type |
| Ordinal | Categories that have order, but the intervals are not equal | Academic levels, Likert scale |
[Diagram: Data variable, divided into Qualitative and Quantitative]

Class-02:
Date: 30/08/2023

Data collection:
Data collection in statistics is the process of gathering information from different sources to
answer research questions or test hypotheses. It is an essential step in the statistical analysis
process, and the quality and accuracy of the data collected directly impact the validity and
reliability of the findings.

There are two main types of data collection methods in statistics: primary data collection and
secondary data collection.

 Primary data collection involves collecting new data specifically for the research study.
This can be done through surveys, interviews, experiments, or observations.
 Secondary data collection involves using data that has already been collected by
someone else. This data can be found in government databases, academic journals, or
commercial data sources.

The choice of data collection method will depend on the research questions being asked, the
resources available, and the time constraints.

Here are some examples of data collection methods in statistics:

 Surveys: Surveys are a popular method of collecting data from a large number of people.
They can be used to collect information about people's opinions, behaviors, or
demographics.
 Interviews: Interviews are a more in-depth way to collect data from people. They can be
used to get people's perspectives on a particular topic or to gather detailed information
about their experiences.
 Experiments: Experiments are used to test cause-and-effect relationships. They involve
manipulating one variable (the independent variable) and observing how it affects
another variable (the dependent variable).
 Observations: Observations are used to collect data about people's behaviors or the
environment. They can be conducted in a natural setting or in a laboratory setting.
 Reviews of existing records: This method involves collecting data from existing records,
such as government databases, academic journals, or commercial data sources.
The data collection process should be carefully planned and executed to ensure that the data is
accurate and reliable. The following are some important considerations when planning a data
collection study:

 Define the research questions or hypotheses. What do you want to learn from the data?
 Identify the target population. Who are you going to collect data from?
 Choose the appropriate data collection method. What method will best answer your
research questions?
 Develop a data collection plan. This should include the sampling strategy, the data
collection instruments, and the data collection procedures.
 Pilot test the data collection plan. This will help you identify any potential problems
and make necessary adjustments.
 Collect the data. This should be done carefully and according to the data collection plan.
 Clean and prepare the data. This involves checking for errors and inconsistencies, and
formatting the data for analysis.

Data collection is an important part of the statistical analysis process. By carefully planning and
executing the data collection process, you can ensure that you collect the data you need to
answer your research questions or test your hypotheses.

Data presentation:
Data presentation in statistics is the process of organizing and displaying data in a way that is
clear, concise, and easy to understand. It is an important step in the statistical analysis process, as
it allows the researcher to communicate the findings of the study to others.

There are three main types of data presentation in statistics:

 Textual presentation: This is the simplest form of data presentation, and involves
describing the data in words. It is often used to present small amounts of data or to
provide a brief overview of the data.
 Tabular presentation: This involves organizing the data in a table, which makes it
easier to see patterns and relationships. Tables are often used to present large amounts of
data or to compare different groups of data.
 Graphical presentation: This involves using graphs or charts to represent the data
visually. Graphs and charts can be used to communicate complex data in a way that is
easy to understand.
The choice of data presentation method will depend on the type of data, the purpose of the
presentation, and the audience. For example, if the data is complex or if the audience is not
familiar with statistics, then a graphical presentation may be the best option.

Here are some tips for effective data presentation in statistics:

 Keep it simple. The presentation should be easy to understand and should not overwhelm
the audience with too much information.
 Use clear and concise language. Avoid jargon and technical terms that the audience may
not understand.
 Use appropriate visuals. The visuals should be clear and easy to read, and they should be
used to enhance the presentation, not to distract from it.
 Label all axes and categories. This will help the audience understand the data and make
comparisons.
 Use consistent formatting. This will make the presentation look neat and professional.
 Proofread carefully. This will help to avoid errors and ensure that the presentation is
accurate.

Data presentation is an important part of the statistical analysis process. By carefully presenting
the data, the researcher can ensure that the findings of the study are communicated effectively to
others.

Exclusive and inclusive methods:

In statistics, the inclusive method and the exclusive method are two ways of grouping data into
classes or intervals.

 Inclusive method is a method of grouping data in which the upper limit of a class
interval is included in the class itself. For example, if the class interval is 10-19, then the
upper limit of 19 is included in the class. This means that a value of 19 is included in the
class 10-19.
 Exclusive method is a method of grouping data in which the upper limit of a class
interval is excluded from the class. For example, if the class interval is 10-20, then the
upper limit of 20 is excluded from the class. This means that a value of 20 is not included
in the class 10-20; it falls in the next class, 20-30.

Inclusive method

The inclusive method is the simplest way to group data into classes or intervals. It is easy to
understand and use, and it is often the default method for grouping discrete data.
To use the inclusive method, simply define the lower and upper limits of each class interval. The
upper limit of each class interval is included in the class itself. For example, if you are grouping
data on the number of children in a family, you might use the following class intervals:

 0-1
 2-3
 4-5
 6-7
 8+

In this example, the upper limit of each class interval is included in the class itself. This means
that a family with 3 children would be included in the class interval 2-3.

Exclusive method

The exclusive method is a bit more complex than the inclusive method, but it is useful for
grouping continuous data.

To use the exclusive method, define the lower and upper limits of each class interval so that the
upper limit of one class is the lower limit of the next. The upper limit of each class interval is
excluded from the class. For example, if you are grouping data on the height of people (in
inches), you might use the following class intervals:

 50-55
 55-60
 60-65
 65-70
 70+

In this example, the upper limit of each class interval is excluded from the class. This means that
a person who is 54.9 inches tall would be included in the class interval 50-55, but a person who is
exactly 55 inches tall would be included in the class interval 55-60.

When to use which method

The choice of whether to use the inclusive or exclusive method depends on the specific data set
and the purpose of the analysis.

The inclusive method is often used when the data is discrete, meaning that it can only take on
distinct, countable values. For example, the number of children in a family is discrete data.
Qualitative data, such as the color of a person's hair, is not numerical; it is grouped by category
rather than by class interval, so neither method applies to it.

The exclusive method is often used when the data is continuous, meaning that it can take on any
value within a range. For example, the height of people is continuous data. The exclusive method
is also preferred when grouped data is used to calculate measures of central tendency, such as the
median, because these calculations need class boundaries with no gaps between classes.

Here are some general guidelines for choosing between the inclusive and exclusive methods:

 Use the inclusive method for discrete data.
 Use the exclusive method for continuous data.
 Use the exclusive method when the grouped data is used to calculate measures of central
tendency.

| Feature | Inclusive Method | Exclusive Method |
|---------|------------------|------------------|
| Upper limit of class interval | Included | Excluded |
| Values included in class interval | All values up to and including the upper limit | All values up to the upper limit, but not including it |
| When to use | Discrete data | Continuous data |
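
The difference between the two methods comes down to which class a boundary value falls into. The following is a small sketch, not a standard library routine: the function names are hypothetical, inclusive classes are written as 10-19, 20-29 and exclusive classes as 10-20, 20-30, and the inclusive version assumes whole-number data.

```python
def inclusive_class(value, start, width):
    """Inclusive method for whole-number data: classes like 10-19, 20-29."""
    lower = start + ((value - start) // width) * width
    return (lower, lower + width - 1)   # both limits belong to the class

def exclusive_class(value, start, width):
    """Exclusive method: classes like 10-20, 20-30 (upper limit excluded)."""
    lower = start + ((value - start) // width) * width
    return (lower, lower + width)       # lower <= value < upper

print(inclusive_class(19, 10, 10))    # 19 stays in the class 10-19
print(exclusive_class(19.5, 10, 10))  # 19.5 falls in the class 10-20
print(exclusive_class(20, 10, 10))    # 20 moves to the next class, 20-30
```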

Example:
1. Prepare a frequency distribution by the inclusive method, taking a class interval of 7,
from the following data.
28, 17, 15, 22, 29, 21, 23, 27, 18, 12, 7, 2, 9, 4, 6, 1, 8, 3, 10,
5, 20, 16, 12, 8, 4, 33, 27, 21, 15, 9, 3, 36, 27, 18, 9, 2, 4, 6,
32, 31, 29, 18, 14, 13, 15, 11, 9, 7, 1, 5, 37, 32, 28, 26, 24,
20, 19, 25, 19, 20
Solution

Frequency Distribution
(inclusive Method)
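
The solution table itself is not reproduced in these notes. One way to work it out is sketched below, under the assumption that the first class starts at the smallest value (1), which gives the classes 1-7, 8-14, and so on; a different starting point would give different frequencies.

```python
from collections import Counter

data = [28, 17, 15, 22, 29, 21, 23, 27, 18, 12, 7, 2, 9, 4, 6, 1, 8, 3, 10,
        5, 20, 16, 12, 8, 4, 33, 27, 21, 15, 9, 3, 36, 27, 18, 9, 2, 4, 6,
        32, 31, 29, 18, 14, 13, 15, 11, 9, 7, 1, 5, 37, 32, 28, 26, 24,
        20, 19, 25, 19, 20]

width = 7
start = min(data)  # assumption: the first class begins at the smallest value

# Index of the class each value belongs to: 0 for 1-7, 1 for 8-14, ...
counts = Counter((x - start) // width for x in data)

table = []
for i in range(max(counts) + 1):
    lower = start + i * width
    upper = lower + width - 1          # inclusive upper limit
    table.append((f"{lower}-{upper}", counts.get(i, 0)))

for label, freq in table:
    print(f"{label:>6}  {freq}")
print(" Total ", sum(freq for _, freq in table))
```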

2. To create a frequency table with class intervals and tally marks from
a set of data points (49 values in the example below) using the inclusive
method, follow these steps:

Step 1: Sort the data in ascending order.

Step 2: Determine the range (the difference between the maximum and
minimum values) and the number of classes (intervals) you want. For
this example, let's choose 5 classes.
Step 3: Calculate the class width by dividing the range by the number of
classes and round up to the nearest whole number.

Step 4: Determine the starting point of the first class. The minimum
value, or a convenient whole number just below it, is the usual choice.

Step 5: Create the class intervals.

Step 6: Count the frequency of data points in each class interval using
tally marks.

Here's an example using randomly generated data:

Data: 23, 29, 34, 38, 40, 42, 45, 46, 48, 51, 53, 56, 57, 59, 60, 61, 62, 63,
65, 67, 68, 69, 70, 73, 74, 75, 77, 79, 80, 81, 82, 83, 84, 85, 87, 88, 89,
91, 93, 95, 96, 97, 99, 100, 102, 105, 108, 110, 115

Step 1: Sort the data in ascending order. The data listed above are
already in ascending order.

Step 2: Determine the range and the number of classes.


Range = 115 (max) - 23 (min) = 92
Number of classes (intervals) = 5

Step 3: Calculate the class width:

Class width = Range / Number of classes = 92 / 5 = 18.4, which is
rounded up to the nearest whole number, 19

Step 4: Determine the starting point of the first class:

Starting point of the first class = minimum value = 23

Step 5: Create the class intervals:


Class 1: 23-41
Class 2: 42-60
Class 3: 61-79
Class 4: 80-98
Class 5: 99-117

Step 6: Count the frequency of data points in each class interval using
tally marks:

Class Interval    Tally marks           Frequency
23-41             |||||                 5
42-60             ||||| |||||           10
61-79             ||||| ||||| |||       13
80-98             ||||| ||||| ||||      14
99-117            ||||| ||              7
Total                                   49

This frequency table represents the data using the inclusive method
with 5 class intervals and tally marks for each interval.
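
The six steps can also be checked with a short script. This is a sketch that follows the steps as written (5 classes, width rounded up, first class starting at the minimum); the variable names are illustrative.

```python
import math

data = [23, 29, 34, 38, 40, 42, 45, 46, 48, 51, 53, 56, 57, 59, 60, 61, 62, 63,
        65, 67, 68, 69, 70, 73, 74, 75, 77, 79, 80, 81, 82, 83, 84, 85, 87, 88, 89,
        91, 93, 95, 96, 97, 99, 100, 102, 105, 108, 110, 115]

num_classes = 5
data_range = max(data) - min(data)           # Step 2: 115 - 23 = 92
width = math.ceil(data_range / num_classes)  # Step 3: 18.4 rounded up to 19
start = min(data)                            # Step 4: first class starts at 23

rows = []
for i in range(num_classes):                 # Step 5: build the class intervals
    lower = start + i * width
    upper = lower + width - 1                # inclusive upper limit
    freq = sum(lower <= x <= upper for x in data)   # Step 6: count the values
    groups = ["|||||"] * (freq // 5)
    if freq % 5:
        groups.append("|" * (freq % 5))
    rows.append((lower, upper, freq, " ".join(groups)))

for lower, upper, freq, tally in rows:
    print(f"{lower}-{upper:<5} {tally:<20} {freq}")
```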

Class-03:
Date: 04/09/2023

Basic principles of table construction in statistics:


A statistical table has at least four major parts and some other minor parts.
(1) The Title
(2) The Box Head (column captions)
(3) The Stub (row captions)
(4) The Body
(5) Prefatory Notes
(6) Foot Notes
(7) Source Notes
The general sketch of a table indicating its necessary parts is shown below:

                  THE TITLE
               (Prefatory Notes)

Row Captions    |  Box Head (Column Captions)
----------------+-----------------------------
Stub Entries    |  The Body

Foot Notes ...
Source Notes ...

(1) The Title

The title is the main heading, written in capitals and shown at the top of the table. It must explain the
contents of the table and throw light on the table as a whole. Different parts of the heading can be
separated by commas. There are no full stops in the title.

(2) The Box Head (column captions)

The vertical headings and subheadings of the columns are called column captions. The space
where these column headings are written is called the box head. Only the first letter of each box
head entry is capitalized; the remaining letters must be written in lowercase.

(3) The Stub (row captions)

The horizontal headings and subheadings of the rows are called row captions, and the space where
these row headings are written is called the stub.

(4) The Body

This is the main part of the table which contains the numerical information classified with
respect to row and column captions.

(5) Prefatory Notes


The prefatory notes are a statement given below the title and enclosed in brackets. They usually
describe the units of measurement.

(6) Foot Notes

These appear immediately below the body of the table providing additional explanation.

(7) Source Notes

The source notes are given at the end of the table and indicate the source from which the information
has been taken. They include information about the compiling agency, publication, etc.

General Rules of Tabulation

 A table should be simple and attractive. It should need no further explanation
(details).
 Proper and clear headings for columns and rows are necessary.
 Suitable approximation may be adopted and figures may be rounded off.
 The unit of measurement should be well defined.
 If the observations are large in number, they can be broken into two or three tables.
 Thick lines should be used to separate the data under big classes and thin lines to separate
the sub classes of data.

Tabulation serves the following purposes:

 To represent data meaningfully in a short form.
 To represent complicated data in a simple and meaningful form.
 To detect errors and omissions in the data.
 To facilitate statistical analysis.
 To help reference.

Graphical presentation of data:

A graphical presentation of data is a visual representation of data using graphs, charts, and
diagrams. It is often a more effective way of understanding and comparing data than a tabular
form. Graphical presentation helps to quantify, sort, and present data in a way that is simple to
understand for a larger audience. Graphs help in studying the cause-and-effect relationship
between two variables through both time series and frequency distributions.

There are many different types of graphical presentations of data, each with its own strengths and
weaknesses. Some of the most common types of graphical presentations of data in statistics
include:

 Bar graph: A bar graph is a chart that uses bars to represent the frequencies of
different data values. It is a good way to compare the frequencies of different categories
of data.

Bar graph of number of students in different classes

Bar graphs are pictorial representations of data (generally grouped) in the form of vertical or
horizontal rectangular bars, where the lengths of the bars are proportional to the measure of the data.
They are also known as bar charts. Bar graphs are one of the means of data handling in statistics.

The collection, organization, presentation, analysis, and interpretation of data are
known as statistics. Statistical data can be represented by various methods such as tables, bar
graphs, pie charts, histograms, and frequency polygons.


What is a Bar Graph?

The pictorial representation of grouped data in the form of vertical or horizontal rectangular
bars, where the lengths of the bars are proportional to the measure of the data, is known as a bar graph
or bar chart.

The bars drawn are of uniform width. The categories of the variable are shown on one axis,
and the measure of the variable is shown on the other axis. The heights or lengths of the
bars denote the value of the variable, and these graphs are also used to compare certain
quantities. Frequency distribution tables can be easily represented using bar charts, which
simplify the understanding of the data.

The three major attributes of bar graphs are:

 The bar graph helps to compare the different sets of data among different groups easily.
 It shows the relationship using two axes, in which the categories are on one axis and the discrete
values are on the other axis.
 The graph shows the major changes in data over time.

What Constitutes a Bar Graph?

Following are the many parts of a bar graph:

 Vertical axis
 Horizontal axis
 The bar graph’s title informs the reader of its purpose.
 The title of the horizontal axis indicates the information that is shown there.
 The title of the vertical axis indicates the data it is used to display.
 The categories on the particular axis indicate what each bar represents.
 The bar graph’s scale demonstrates how numbers are used in the data. It is a system of markings
spaced at specific intervals that aid in object measurement. For instance, the scale of a graph
may be stated as 1 unit = 10 fruits

Types of Bar Graphs


The bar graphs can be vertical or horizontal. The primary feature of any bar graph is its length or
height. The longer or taller a bar, the greater the value it represents.
Bar graphs normally show categorical and numeric variables arranged in class intervals. They
consist of an axis and a series of labelled horizontal or vertical bars. The bars represent
frequencies of distinctive values of a variable or commonly the distinct values themselves. The
number of values on the x-axis of a bar graph or the y-axis of a column graph is called the scale.

The types of bar charts are as follows:

1. Vertical bar chart


2. Horizontal bar chart

Even though the graph can be plotted horizontally or vertically, the most usual type of bar
graph used is the vertical bar graph. The roles of the x-axis and y-axis are swapped
depending on whether the chart is vertical or horizontal. Apart from the vertical and horizontal
bar graph, the two other types of bar charts are:

 Grouped Bar Graph


 Stacked Bar Graph

Now, let us discuss the four different types of bar graphs.

Vertical Bar Graphs

When the grouped data are represented vertically in a graph or chart with the help of bars, where
the bars denote the measure of data, such graphs are called vertical bar graphs. The data is
represented along the y-axis of the graph, and the height of the bars shows the values.

Horizontal Bar Graphs

When the grouped data are represented horizontally in a chart with the help of bars, then such
graphs are called horizontal bar graphs, where the bars show the measure of data. The data is
depicted here along the x-axis of the graph, and the length of the bars denote the values.

Grouped Bar Graph

The grouped bar graph is also called the clustered bar graph, which is used to represent the
discrete value for more than one object that shares the same category. In this type of bar chart,
the bars for the different series are placed side by side within each category rather than
combined into a single bar. In other words, a grouped bar graph is a type of bar graph in which
different sets of data items are compared. Here, a single colour is used to represent each
specific series across the set. The grouped bar graph can be represented using both vertical and
horizontal bar charts.

Stacked Bar Graph


The stacked bar graph is also called the composite bar chart, which divides the aggregate into
different parts. In this type of bar graph, each part can be represented using different colours,
which helps to easily identify the different categories. The stacked bar chart requires specific
labelling to show the different parts of the bar. In a stacked bar graph, each bar represents the
whole and each segment represents the different parts of the whole.

Properties of Bar Graph


Some of the important properties of a bar graph are as follows:

 All the bars should have a common base.


 Each column in the bar graph should have equal width.
 The height of the bar should correspond to the data value.
 The distance between each bar should be the same.

Applications of Bar Graphs


Bar graphs are used to compare things between different groups or to trace changes over time. Yet,
when trying to estimate change over time, bar graphs are most suitable when the changes are
bigger.

Bar charts possess a discrete domain of categories and are normally scaled so that all the data can
fit on the graph. When there is no natural order of the categories being compared, bars on the chart
may be organized in any order. Bar charts organized from the highest to the lowest number are
called Pareto charts.

Real-Life Applications of Bar Graph

Bar graphs are a visual representation of data. They are used to show the relationship between
two or more sets of data. They are mostly used in business and finance, but they can also be
found in other contexts. Bar graphs are used in many real-life situations. For example, a bar
graph can be used to show the distribution of different types of food in a restaurant. The height
of each rectangle would represent how many orders were placed for that type of food.

Bar graphs are also often used to represent the data grouped into categories, such as how many
people have voted for each candidate in an election or how much money was spent by each
department. The bars on this type of graph represent the number or percentage of people or
money spent and are usually stacked on top of one another so that they can be easily compared to
one another.

Advantages and Disadvantages of Bar Chart


Advantages:
 Bar graph summarises the large set of data in simple visual form.
 It displays each category of data in the frequency distribution.
 It clarifies the trend of data better than the table.
 It helps in estimating the key values at a glance.

Disadvantages:

 Sometimes, the bar graph fails to reveal the patterns, cause, effects, etc.
 It can be easily manipulated to yield fake information.

Difference Between Bar Graph and Histogram


The bar graph and the histogram look similar, but they have an important difference: they plot
different types of data. In the bar chart, discrete data is plotted, whereas the histogram plots
continuous data. For instance, if we have different categories of data like types of dog breeds or
types of TV programs, the bar chart is best as it compares things among different groups. On the
other hand, if we have continuous data like the weight of people, the best choice is the histogram.

Difference Between Bar Graph and Pie Chart


A pie chart is one of the types of graphical representation. The pie chart is a circular chart and is
divided into parts. Each part represents the fraction of a whole. Whereas, bar graph represents the
discrete data and compares one data with the other data.

Difference Between Bar Graph and Line Graph


The major differences between the bar graph and the line graph are as follows:

 The bar graph represents the data using rectangular bars, and the height of each bar
represents the value shown in the data. A line graph, in contrast, shows the information
as a series of data points connected by a line.
 A line graph can become hard to read when many lines are plotted on the same graph,
whereas a bar graph helps to show the relationship between the data quickly.

Important Notes:

Some of the important notes related to the bar graph are as follows:

 In the bar graph, there should be an equal spacing between the bars.
 It is advisable to use the bar graph if the frequency of the data is very large.
 Understand the data that should be presented on the x-axis and y-axis and the relation between
the two.
How to Draw a Bar Graph?
Let us consider an example, we have four different types of pets, such as cat, dog, rabbit, and
hamster and the corresponding numbers are 22, 39, 5 and 9 respectively.

In order to visually represent the data using the bar graph, we need to follow the steps given
below.

 Step 1: First, decide the title of the bar graph.
 Step 2: Draw the horizontal axis and vertical axis.
 Step 3: Now, label the horizontal axis. (For example, Types of Pets)
 Step 4: Write the names on the horizontal axis, such as Cat, Dog, Rabbit, Hamster.
 Step 5: Now, label the vertical axis. (For example, Number of Pets)
 Step 6: Finalise the scale range for the given data.
 Step 7: Finally, draw the bar graph that should represent each category of the pet with their
respective numbers.
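
The steps above can be sketched in a few lines of Python. This is only an illustration: the function name text_bar_chart and the choice of one '#' per pet are assumptions made here, and the bars are drawn horizontally because that is easier in plain text.

```python
# Pet counts taken from the example above.
pets = {"Cat": 22, "Dog": 39, "Rabbit": 5, "Hamster": 9}


def text_bar_chart(data, title, unit=1):
    """Return a text bar chart; one '#' stands for `unit` items (illustrative helper)."""
    width = max(len(name) for name in data)  # align the category labels
    lines = [title]
    for name, count in data.items():
        bar = "#" * round(count / unit)  # bar length is proportional to the count
        lines.append(f"{name:<{width}} | {bar} {count}")
    return "\n".join(lines)


chart = text_bar_chart(pets, "Types of Pets")
print(chart)
```

Every bar starts from the same base and has the same thickness, so only its length carries information, which matches the properties of a bar graph listed earlier.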

Bar Graph Solved Examples


To understand the above types of bar graphs, consider the following examples:

Example 1:

In a firm of 400 employees, the percentage of monthly salary saved by each employee is given in
the following table. Represent it through a bar graph.

Savings (in percentage) Number of Employees(Frequency)

20 105

30 199

40 29

50 73

Total 400

Solution:

The given data can be represented as


This can also be represented using a horizontal bar graph as follows:
Example 2:

A cosmetic company manufactures 4 different shades of lipstick. The sale for 6 months is shown
in the table. Represent it using bar charts.

Month        Sales (in units)
             Shade 1    Shade 2    Shade 3    Shade 4

January      4500       1600       4400       3245

February     2870       5645       5675       6754

March        3985       8900       9768       7786

April        6855       8976       9008       8965

May          3200       5678       5643       7865

June         3456       4555       2233       6547

Solution:

The graph given below depicts the following data
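
As a rough sketch of how a stacked bar chart for this table is built, the Python below computes where each shade's segment starts and ends within a month's bar. The helper name stacked_segments is an assumption introduced for this illustration; the numbers are those in the table.

```python
# Monthly sales (in units) for Shade 1 to Shade 4, copied from the table above.
sales = {
    "January": [4500, 1600, 4400, 3245],
    "February": [2870, 5645, 5675, 6754],
    "March": [3985, 8900, 9768, 7786],
    "April": [6855, 8976, 9008, 8965],
    "May": [3200, 5678, 5643, 7865],
    "June": [3456, 4555, 2233, 6547],
}


def stacked_segments(values):
    """Return (bottom, top) pairs for the segments of one stacked bar."""
    segments, bottom = [], 0
    for value in values:
        segments.append((bottom, bottom + value))  # each segment sits on the previous one
        bottom += value
    return segments


# The full height of each stacked bar is the total sales for that month.
totals = {month: sum(values) for month, values in sales.items()}
best_month = max(totals, key=totals.get)

print(stacked_segments(sales["January"]))
print(totals)
print(best_month)
```

In a grouped bar chart the same four values would instead be drawn as four separate bars side by side for each month.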


Example 3:

The variation of temperature in a region during a year is given as follows. Depict it through the
graph (bar).

Month Temperature

January -6°C

February -3.5°C

March -2.7°C

April 4°C

May 6°C

June 12°C

July 15°C

August 8°C

September 7.9°C

October 6.4°C

November 3.1°C

December -2.5°C

Solution:

As the temperature in the given table has negative values, it is more convenient to represent such
data through a horizontal bar graph.
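
A minimal text sketch of such a chart is given below. It assumes one '#' per degree, rounded to the nearest whole degree, and the helper name diverging_bar is made up for this illustration. Negative temperatures extend to the left of the zero line '|' and positive ones to the right.

```python
# Temperatures (in °C) copied from the table above.
temps = {
    "January": -6, "February": -3.5, "March": -2.7, "April": 4,
    "May": 6, "June": 12, "July": 15, "August": 8,
    "September": 7.9, "October": 6.4, "November": 3.1, "December": -2.5,
}


def diverging_bar(value, max_abs):
    """Text bar around a zero line '|'; negative values extend to the left."""
    n = round(abs(value))  # bar length in whole degrees
    if value < 0:
        return " " * (max_abs - n) + "#" * n + "|"
    return " " * max_abs + "|" + "#" * n


# Widest bar needed on either side of the zero line.
max_abs = round(max(abs(t) for t in temps.values()))

for month, t in temps.items():
    print(f"{month:<9} {diverging_bar(t, max_abs)} {t}°C")
```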
Frequently Asked Questions on Bar Graph
Q1

What is meant by a bar graph?

Bar graph (bar chart) is a graph that represents the categorical data using rectangular bars. The
bar graph shows the comparison between discrete categories.

Q2

What are the different types of bar graphs?

The different types of bar graphs are:


Vertical bar graph
Horizontal bar graph
Grouped bar graph
Stacked bar graph

Q3
When is a bar graph used?

The bar graph is used to compare the items between different groups over time. Bar graphs are
used to measure the changes over a period of time. When the changes are larger, a bar graph is
the best option to represent the data.

Q4

When to use a horizontal bar chart?

The horizontal bar graph is the best choice while graphing the nominal variables.

Q5

When to use a vertical bar chart?

The vertical bar graph is the most commonly used bar chart, and it is best to use it while
graphing the ordinal variables.

Pie Chart

A pie chart is a type of graph that represents the data in the circular graph. The slices of pie
show the relative size of the data, and it is a type of pictorial representation of data. A pie chart
requires a list of categorical variables and numerical variables. Here, the term “pie” represents
the whole, and the “slices” represent the parts of the whole.

What is a Pie Chart?


The “pie chart” is also known as a “circle chart”. It divides a circular statistical graphic into
sectors or sections to illustrate numerical proportions. Each sector denotes a proportionate part
of the whole. A pie chart works best for showing the composition of something. In many cases, a
pie chart can be used in place of other graphs like the bar graph, line plots, histograms, etc.

Formula
The pie chart is an important type of data representation. It contains different segments or
sectors, in which each sector forms a specific portion (percentage) of the total. The central
angles of all the sectors add up to 360°.

The total value of the pie is always 100%.

To work out with the percentage for a pie chart, follow the steps given below:

 Categorize the data


 Calculate the total
 Divide the categories
 Convert into percentages
 Finally, calculate the degrees

Therefore, the pie chart formula is given as

(Given Data/Total value of Data) × 360°

Note: It is not mandatory to convert the given data into percentages until it is specified. We can
directly calculate the degrees for given data values and draw the pie chart accordingly.
How to Create a Pie Chart?
Imagine a teacher surveys her class on the basis of favourite Sports of students:

Football Hockey Cricket Basketball Badminton

10 5 5 10 10

The data above can be represented by a pie chart as follows, using the circle graph formula,
i.e. the pie chart formula given below. It makes the size of each portion easy to
understand.

Step 1: First, Enter the data into the table.

Football Hockey Cricket Basketball Badminton

10 5 5 10 10

Step 2: Add all the values in the table to get the total.

I.e. Total students are 40 in this case.

Step 3: Next, divide each value by the total and multiply by 100 to get a per cent:

Football      (10/40) × 100 = 25%
Hockey        (5/40) × 100 = 12.5%
Cricket       (5/40) × 100 = 12.5%
Basketball    (10/40) × 100 = 25%
Badminton     (10/40) × 100 = 25%

Step 4: Next to know how many degrees for each “pie sector” we need, we will take a full circle
of 360° and follow the calculations below:

The central angle of each component = (Value of each component/sum of values of all the
components)✕360°

Football      (10/40) × 360° = 90°
Hockey        (5/40) × 360° = 45°
Cricket       (5/40) × 360° = 45°
Basketball    (10/40) × 360° = 90°
Badminton     (10/40) × 360° = 90°

Now you can draw a pie chart.


Step 5: Draw a circle and use the protractor to measure the degree of each sector.
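
Steps 2 to 4 can be checked with a short Python sketch. The variable names are chosen here for illustration; the counts are those from the survey table.

```python
# Favourite-sport counts from the survey table above.
survey = {"Football": 10, "Hockey": 5, "Cricket": 5, "Basketball": 10, "Badminton": 10}

total = sum(survey.values())  # Step 2: total number of students

# Step 3: share of each sport as a percentage of the total.
percentages = {sport: count / total * 100 for sport, count in survey.items()}

# Step 4: central angle of each sector, as a share of the full 360° circle.
angles = {sport: count / total * 360 for sport, count in survey.items()}

for sport in survey:
    print(f"{sport:<10} {percentages[sport]:5.1f}%  {angles[sport]:5.1f}°")
```

The angles can be computed directly from the counts, without going through percentages first, as the note after the formula points out.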

Let us take an example for a pie chart with an explanation here to understand the concept in a
better way.

Question: The percentages of various crops cultivated in a village of a particular district are
given in the following table.

Items Wheat Pulses Jowar Groundnuts Vegetables Total

Percentage of crops 125/3 125/6 25/2 50/3 25/3 100

Represent this information using a pie-chart.

Solution:

The central angle = (component value/100) × 360°

The central angle for each category is calculated as follows

Items         Percentage of crops    Central angle

Wheat         125/3                  [(125/3)/100] × 360° = 150°

Pulses        125/6                  [(125/6)/100] × 360° = 75°

Jowar         25/2                   [(25/2)/100] × 360° = 45°

Groundnuts    50/3                   [(50/3)/100] × 360° = 60°

Vegetables    25/3                   [(25/3)/100] × 360° = 30°

Total         100                    360°

Now, the pie-chart can be constructed by using the given data.
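
Because the percentages are given as fractions, the central angles can be verified exactly. The sketch below uses Python's Fraction type for this; the function name central_angle is an assumption made for the illustration.

```python
from fractions import Fraction

# Percentages of crops from the table above, kept as exact fractions.
crops = {
    "Wheat": Fraction(125, 3),
    "Pulses": Fraction(125, 6),
    "Jowar": Fraction(25, 2),
    "Groundnuts": Fraction(50, 3),
    "Vegetables": Fraction(25, 3),
}


def central_angle(percentage):
    """Central angle in degrees = (component value / 100) x 360."""
    return percentage / 100 * 360


crop_angles = {name: central_angle(p) for name, p in crops.items()}

for name, angle in crop_angles.items():
    print(f"{name:<11} {angle}°")
print("Total percentage:", sum(crops.values()))
print("Total angle:", sum(crop_angles.values()))
```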

Steps to construct:

Step 1: Draw the circle of an appropriate radius.

Step 2: Draw a vertical radius anywhere inside the circle.

Step 3: Choose the largest central angle. Construct a sector of a central angle, whose one radius
coincides with the radius drawn in step 2, and the other radius is in the clockwise direction to the
vertical radius.

Step 4: Construct other sectors representing other values in the clockwise direction in
descending order of magnitudes of their central angles.

Step 5: Shade the sectors obtained by different colours and label them as shown in the figure
below.
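
Steps 3 and 4 amount to sorting the sectors by angle and laying them out one after another. The sketch below computes where each sector starts and ends, measured clockwise from the vertical radius. The helper name sector_boundaries is hypothetical; the angles are those computed in the table above.

```python
# Central angles (in degrees) from the solution table above.
crop_angles = {"Wheat": 150, "Pulses": 75, "Jowar": 45, "Groundnuts": 60, "Vegetables": 30}


def sector_boundaries(angles):
    """Return (name, start, end) in degrees clockwise from the vertical radius,
    with sectors placed in descending order of central angle."""
    ordered = sorted(angles.items(), key=lambda item: item[1], reverse=True)
    boundaries, start = [], 0
    for name, angle in ordered:
        boundaries.append((name, start, start + angle))  # next sector begins where this ends
        start += angle
    return boundaries


for name, start, end in sector_boundaries(crop_angles):
    print(f"{name:<11} from {start:>3}° to {end:>3}°")
```

The last sector should end at 360°, which is a quick check that the angles were computed correctly.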


How to Solve Pie Chart Questions?


In this section, you will learn how to solve or interpret the pie chart to get the original values. For
this, we need to check whether the given chart is given in percentages, degrees or without any
value. Based on this information, we can solve the questions related to pie charts. Let’s have a
look at the solved example to understand this thoroughly.

Question:

The pie-chart shows the marks obtained by a student in an examination. If the student
secures 440 marks in all, calculate the marks in each of the given subjects.

Solution:

The given pie chart shows the marks obtained in the form of degrees.

Given, total marks obtained = 440

i.e. 360 degrees = 440 marks

Now, we can calculate the marks obtained in each subject as follows.


Marks secured in mathematics = (central angle of maths/ 360°) × Total score secured

= (108°/ 360°) × 440 = 132 marks

Marks secured in science = (central angle of science / 360°) × Total score secured

= (81°/ 360°) × 440 = 99 marks

Marks secured in English = (central angle of English/ 360°) × Total score secured

= (72°/ 360°) × 440 = 88 marks

Marks secured in Hindi = (central angle of Hindi / 360°) × Total score secured

= (54°/ 360°) × 440 = 66 marks

Marks secured in social science = (central angle of social science / 360°) × Total score secured

= (45°/ 360°) × 440 = 55 marks

This can be tabulated as:

Subject Mathematics Science English Hindi Social science Total

Marks 132 99 88 66 55 440
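
The same calculation can be written as a short Python sketch, which also confirms that the angles cover the whole circle and that the marks add up to the total. The variable names are chosen here for illustration.

```python
# Central angles (in degrees) read from the pie chart in the solution above.
subject_angles = {
    "Mathematics": 108,
    "Science": 81,
    "English": 72,
    "Hindi": 54,
    "Social science": 45,
}
total_marks = 440  # 360 degrees correspond to 440 marks

# Marks in a subject = (central angle / 360) x total marks.
marks = {subject: angle * total_marks / 360 for subject, angle in subject_angles.items()}

for subject, value in marks.items():
    print(f"{subject:<15} {value:.0f}")
print("Sum of angles:", sum(subject_angles.values()))
print("Sum of marks:", sum(marks.values()))
```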

Examples
A pie chart can be used to represent the relative size of a variety of data such as:

 The type of houses (1bhk, 2bhk, 3bhk, etc.) people have


 Types of 2 wheelers or 4 wheelers people have
 Number of customers a retail market has in all weekdays
 Weights of students in a class
 Types of cuisine liked by different people in an event
 Monthly expenditure of a family, etc.

Uses of Pie Chart


 Within a business, it is used to compare areas of growth, such as turnover, profit and exposure.
 To represent categorical data.
 To show the performance of a student in a test, etc.


Advantages
 The picture is simple and easy-to-understand
 Data can be represented visually as a fractional part of a whole
 It provides an effective communication tool even for an uninformed audience
 Provides a data comparison for the audience at a glance to give an immediate analysis or to
quickly understand information
 Readers do not need to examine or measure the underlying numbers themselves, since the chart
shows the proportions directly
 To emphasize a few points you want to make, you can highlight pieces of data in the pie chart

Disadvantages
 It becomes less effective if there are too many pieces of data to use
 If there are too many pieces of data, even data labels and numbers may not help, because
they themselves become crowded and hard to read
 As this chart only represents one data set, you need a series of charts to compare multiple sets
 This may make it more difficult for readers to analyze and assimilate information
quickly


Frequently Asked Questions – FAQs


Q1

What is a pie chart?

A pie chart is a pictorial representation of data. The slices of the pie show the relative sizes of the data.
Different parts of the same data set are represented by slices of different sizes.

Q2

Why do we use pie charts?


Pie charts are used to represent the proportional data or relative data in a single chart. The concept of
pie slices is used to show the percentage of a particular data from the whole pie.

Q3

How to calculate the percentage of data in the pie chart?

Measure the angle of each slice of the pie chart and divide by 360 degrees. Now multiply the value by
100. The percentage of particular data will be calculated.

Q4

How to find the total number of pieces of data in a slice of a pie chart?

To find the total number of pieces of data in a slice of a pie chart, multiply the slice percentage with the
total number of data set and then divide by 100.
For example, a slice of the pie chart is equal to 60% and the pie chart contains a total data set of 150.
Then, the value of 60% of pie slice is: (60×150)/100 = 90.

Q5

What are the examples of a pie chart?

There are many real-life examples of pie charts, such as:


Representation of marks obtained by students in a class
Representation of kinds of cars sold in a month
To show the type of food liked by people in a room

Assignment:01
Date:04/09/2023
1.Construction of a Bar Diagram.
2.Construction of a Pie Chart.
 Histogram: A histogram is a graph that uses bars to represent the frequencies of different
data values in a continuous range. It is a good way to show the distribution of data.

Histogram of heights of students

 Line graph: A line graph is a chart that uses lines to connect points that represent the
values of data over time. It is a good way to show trends in data over time.

Line graph of number of students enrolled in a university over time

 Scatter plot: A scatter plot is a graph that uses points to represent the values of two
variables. It is a good way to show the relationship between two variables.
Scatter plot of height vs. weight of students

The best type of graphical presentation of data for a particular set of data will depend on the
purpose of the presentation and the type of data being presented.

Here are some tips for creating effective graphical presentations of data in statistics:

 Choose the right type of graph for the data.


 Use clear and concise labels for the axes and the data points.
 Use consistent colors and symbols throughout the graph.
 Make sure the graph is large enough to be easily read.
 Use a title that accurately describes the data being presented.
 Provide a legend if necessary.
 Avoid cluttering the graph with too much information.

By following these tips, you can create graphical presentations of data that are clear, informative,
and easy to understand.

Class:04

Date:11/09/2023

1.Histogram

2.Frequency curve

3.Frequency polygon

4.Ogive curve

1.Histogram

In statistics, a histogram is a graphical representation of the distribution of data. The histogram
is represented by a set of rectangles, adjacent to each other, where each bar represents a class of
data. Statistics is a branch of mathematics that is applied in various fields. The number of times a
value is repeated in statistical data is known as its frequency, and the frequencies can be written
in the form of a table, called a frequency distribution. A frequency distribution can be shown
graphically by using different types of graphs, and a histogram is one among them. The sections
below discuss what a histogram is, how to create a histogram for given data, the different types of
histogram, and the difference between the histogram and the bar graph.


What is Histogram?
A histogram is a graphical representation of a grouped frequency distribution with continuous
classes. It is an area diagram and can be defined as a set of rectangles with bases along the
intervals between class boundaries and with areas proportional to frequencies in the
corresponding classes. In such representations, all the rectangles are adjacent since the base
covers the intervals between class boundaries. When the classes have equal widths, the heights of
the rectangles are proportional to the corresponding frequencies; when the widths differ, the
heights are proportional to the corresponding frequency densities.

In other words, a histogram is a diagram involving rectangles whose area is proportional to the
frequency of a variable and width is equal to the class interval.
How to Plot Histogram?
You need to follow the below steps to construct a histogram.

1. Begin by marking the class intervals on the X-axis and frequencies on the Y-axis.
2. Use a uniform scale along each axis; the two axes may use different units.
3. Class intervals need to be exclusive.
4. Draw rectangles with bases as class intervals and corresponding frequencies as heights.
5. A rectangle is built on each class interval since the class limits are marked on the horizontal axis,
and the frequencies are indicated on the vertical axis.
6. The height of each rectangle is proportional to the corresponding class frequency if the intervals
are equal.
7. The area of every individual rectangle is proportional to the corresponding class frequency if the
intervals are unequal.
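
Points 6 and 7 can be illustrated with a small Python sketch. The data here are hypothetical, invented only to show unequal class widths, and the function name frequency_density is an assumption. The height of each rectangle is the frequency divided by the class width, so that the area equals the frequency.

```python
# Hypothetical grouped data with unequal class widths: (lower, upper, frequency).
classes = [(0, 10, 5), (10, 20, 8), (20, 40, 12)]


def frequency_density(lower, upper, frequency):
    """Height of the rectangle: frequency per unit of class width."""
    return frequency / (upper - lower)


densities = [frequency_density(*row) for row in classes]

# Area = height x width, which recovers the class frequency.
areas = [d * (upper - lower) for d, (lower, upper, _) in zip(densities, classes)]

for (lower, upper, f), d in zip(classes, densities):
    print(f"{lower:>2} - {upper:<2}  frequency {f:>2}  height {d:.2f}")
```

Note that the class 20 - 40 has the largest frequency but not the tallest rectangle, because it is twice as wide as the others.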

Although histograms seem similar to bar graphs, there is a slight difference between them. The
histogram does not involve any gaps between two successive bars.

When to Use Histogram?


The histogram graph is used under certain conditions. They are:

 The data should be numerical.


 A histogram is used to check the shape of the data distribution.
 Used to check whether the process changes from one period to another.
 Used to determine whether the output is different when it involves two or more processes.
 Used to analyse whether the given process meets the customer requirements.

Difference Between Bar Graph and Histogram


A histogram is one of the most commonly used graphs to show the frequency distribution. As we
know that the frequency distribution defines how often each different value occurs in the data
set. The histogram looks more similar to the bar graph, but there is a difference between them.
The list of differences between the bar graph and the histogram is given below:

Histogram                                   Bar Graph

It is a two-dimensional figure              It is a one-dimensional figure

The frequency is shown by the area of       The height shows the frequency and the width has no
each rectangle                              significance.

It shows rectangles touching each other     It consists of rectangles separated from each other with
                                            equal spaces.

The above differences can be observed from the below figures:

Bar Graph (Gaps between bars)

Histogram (No gaps between bars)

Types of Histogram
The histogram can be classified into different types based on the frequency distribution of the
data. There are different types of distributions, such as normal distribution, skewed distribution,
bimodal distribution, multimodal distribution, comb distribution, edge peak distribution, dog
food distribution, heart cut distribution, and so on. The histogram can be used to represent these
different types of distributions. The different types of a histogram are:

 Uniform histogram
 Symmetric histogram
 Bimodal histogram
 Probability histogram

Uniform Histogram

In a uniform histogram, each class has roughly the same number of elements. This shape may
indicate that the number of classes is too small. It may also arise from a distribution that has
several peaks.

Bimodal Histogram
If a histogram has two peaks, it is said to be bimodal. Bimodality occurs when the data set has
observations on two different kinds of individuals or combined groups, provided the centers of the
two separate histograms are far enough apart relative to the variability in both data sets.

Symmetric Histogram

A symmetric histogram is also called a bell-shaped histogram. When you draw the vertical line
down the center of the histogram, and the two sides are identical in size and shape, the histogram
is said to be symmetric. The diagram is perfectly symmetric if the right half of the image
is a mirror image of the left half. Histograms that are not symmetric are known as skewed.
Probability Histogram

A Probability Histogram shows a pictorial representation of a discrete probability distribution. It
consists of a rectangle centered on every value of x, and the area of each rectangle is proportional
to the probability of the corresponding value. The probability histogram diagram is begun by
selecting the classes. The probabilities of each outcome are the heights of the bars of the
histogram.
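
As an illustration, the sketch below builds the probability histogram for the number of heads in two tosses of a coin. The fair coin and the two-toss experiment are assumptions chosen for this example, not taken from the notes above.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All equally likely outcomes of two tosses of a fair coin (assumed example).
outcomes = list(product("HT", repeat=2))

# x = number of heads in an outcome; count how many outcomes give each x.
counts = Counter(toss.count("H") for toss in outcomes)

# Probability of each value of x = favourable outcomes / total outcomes.
probabilities = {x: Fraction(n, len(outcomes)) for x, n in sorted(counts.items())}

for x, p in probabilities.items():
    bar = "#" * int(p * len(outcomes))  # bar height proportional to the probability
    print(f"x = {x}: {bar} {p}")
```

The heights of the bars add up to 1, as they must for any probability distribution.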

Applications of Histogram
The applications of histograms can be seen when we learn about different distributions.

Normal Distribution

The usual pattern that is in the shape of a bell curve is termed normal distribution. In a normal
distribution, the data points are as likely to appear on one side of the average as on the other.
Note that other distributions can look similar to the normal distribution, so statistical
calculations are used to establish whether a distribution is actually normal. Note also that the
term “normal” names a specific distribution; it does not mean “typical” for a process. For
instance, many processes have a natural limit on one side and therefore produce skewed
distributions. A skewed distribution is what is expected for such processes, even though it is not
a normal distribution.

Skewed Distribution

A skewed distribution is asymmetrical because a natural limit prevents outcomes on one side. The
peak of the distribution is off-center towards the limit, and a tail extends away from it. For
instance, a distribution of purity analyses of a very pure product would be skewed because the
product cannot exceed 100 per cent purity. Other instances of natural limits are holes that cannot
be smaller than the diameter of the drill, or call-receiving times that cannot be less than zero.
These distributions are termed right-skewed or left-skewed based on the direction of the tail.

Multimodal Distribution

The alternate name for the multimodal distribution is the plateau distribution. Various processes
with normal distribution are put together. Since there are many peaks adjacent together, the tip of
the distribution is in the shape of a plateau.

Edge peak Distribution

This distribution resembles the normal distribution except that it possesses a bigger peak at one
tail. Generally, it is due to the wrong construction of the histogram, with data combined together
into a collection named “greater than”.
Comb Distribution

In this distribution, tall and short bars alternate. It mostly results from data that is rounded
off and/or an incorrectly drawn histogram. For instance, temperature that is rounded off to the
nearest 0.2° would display a comb shape if the width of the bars of the histogram were 0.1°.

Truncated or Heart-Cut Distribution

The above distribution resembles a normal distribution with the tails being cut off. The producer
might be manufacturing a normal distribution of product and then depending on the inspection to
segregate what lies within the limits of specification and what is out. The resulting parcel to the
end-user from within the specifications is heart cut.

Dog Food Distribution

This distribution is missing something: the results close to the average. If an end-user gets this
distribution, someone else is receiving a heart-cut distribution, and the end-user who is left gets
the dog food, the odds and ends left behind after the master's meal. Even though what the end-user
receives is within the specification limits, the items fall into two clusters: one close to the
upper specification limit and another close to the lower specification limit. This variation
causes problems in the end-user's process.

Histogram Solved Example

Question: The following table gives the lifetime of 400 neon lamps. Draw the histogram for the
below data.

Lifetime (in hours) Number of lamps

300 – 400 14

400 – 500 56

500 – 600 60

600 – 700 86

700 – 800 74

800 – 900 62

900 – 1000 48

Solution:
The histogram for the given data is:
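
A rough text version of this histogram can be produced with the Python sketch below. The choice of one '#' per two lamps is an assumption made to keep the bars short. All classes have the same width of 100 hours, so the bar heights are proportional to the frequencies.

```python
# Lifetime classes (in hours) and number of lamps, copied from the table above.
lamps = [
    ((300, 400), 14),
    ((400, 500), 56),
    ((500, 600), 60),
    ((600, 700), 86),
    ((700, 800), 74),
    ((800, 900), 62),
    ((900, 1000), 48),
]

total = sum(frequency for _, frequency in lamps)

# The modal class is the class with the highest frequency (the tallest bar).
modal_class, modal_frequency = max(lamps, key=lambda row: row[1])

for (low, high), f in lamps:
    print(f"{low:>4} - {high:<4} | {'#' * (f // 2)} {f}")
print("Total lamps:", total)
print("Modal class:", modal_class)
```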

Frequently Asked Questions on Histogram


Q1

Are histogram and bar chart the same?

No, histograms and bar charts are different. In the bar chart, each column represents the group
which is defined by a categorical variable, whereas in the histogram each column is defined by
the continuous and quantitative variable.

Q2

Which histogram represents the consistent data?

The uniform shaped histogram shows consistent data. In the uniform histogram, the frequency of
each class is similar to that of the others. In most cases, the data values in the uniform shaped
histogram may be multimodal.
Q3

Can a histogram be drawn for the normally distributed data?

Yes, the histogram can be drawn for the normal distribution of the data. A normal distribution
should be perfectly symmetrical around its center. It means that the right should be the mirror
image of the left side about its center and vice versa.

Q4

When is a histogram skewed to the right?

A histogram is skewed to the right if most of the data values are on the left side of the
histogram and the tail of the histogram extends to the right. When the data are skewed to the
right, the mean value is larger than the median of the data set.

Q5

When is a histogram skewed to the left?

A histogram is skewed to the left if most of the data values fall on the right side of the
histogram and the tail of the histogram extends to the left. In this case, the mean value is
smaller than the median of the data set.


How to Find the Mode of a Histogram (With Example)

The mode of a dataset represents the value that occurs most often.

To find the mode in a histogram, we can use the following steps:

1. Identify the tallest bar.

2. Draw a line from the top-left corner of the tallest bar to the top-left corner of the bar
immediately after it.

3. Draw a line from the top-right corner of the tallest bar to the top-right corner of the bar
immediately before it.

4. Identify the point where the two lines intersect. Then draw a line straight down to the x-axis.
The point where the line hits the x-axis is our best estimate for the mode.

The following step-by-step example shows how to find the mode of the following histogram:

Step 1: Identify the Tallest Bar


First, we need to identify the tallest bar in the histogram.

This is the bar with the bin range of 16 to 20:


Step 2: Draw the First Line
Next, we need to draw a line from the left corner of the tallest bar to the left corner of the bar
immediately after it:
Step 3: Draw the Second Line
Next, we need to draw a line from the right corner of the tallest bar to the right corner of the bar
immediately before it:
Step 4: Identify the Point of Intersection
Next, we need to identify the point where the two lines intersect. Then draw a line straight down
to the x-axis:
The point where the line hits the x-axis is our best estimate for the mode.

In this example, our best estimate for the mode is roughly 17.

Note: Since the data in a histogram is grouped into bins, it’s not possible to know the exact value
of the mode but the method that we used here allows us to make our best estimate.
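
The graphical construction above is equivalent to linear interpolation inside the tallest bar, and the intersection point can be computed directly. The Python sketch below does this. The bar heights 6, 8 and 2 are hypothetical values chosen for illustration, since the exact frequencies of the example histogram are not stated; only the bin range of 16 to 20 comes from the example.

```python
def estimate_mode(lower, width, f_prev, f_modal, f_next):
    """Estimate the mode from the tallest bar and its two neighbours.

    lower   : lower edge of the tallest bar
    width   : width of the tallest bar
    f_prev  : height of the bar immediately before the tallest bar
    f_modal : height of the tallest bar
    f_next  : height of the bar immediately after the tallest bar
    """
    d1 = f_modal - f_prev   # how far the tallest bar rises above the previous bar
    d2 = f_modal - f_next   # how far the tallest bar rises above the next bar
    if d1 + d2 == 0:
        raise ValueError("the three bars are equally tall; the lines do not cross")
    # x-coordinate where the two diagonal lines intersect
    return lower + d1 / (d1 + d2) * width

# Hypothetical heights: 6 before, 8 for the tallest bar (16 to 20), 2 after.
mode_estimate = estimate_mode(lower=16, width=4, f_prev=6, f_modal=8, f_next=2)
print(mode_estimate)
```

With these assumed heights the estimate is 17, which lies in the left part of the tallest bar because the bar before it is taller than the bar after it.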

Bar Graph V/s Histogram

The major difference between a bar chart and a histogram is that the bars of a bar chart are
separated by gaps, whereas the bars of a histogram are adjacent to each other. In statistics, bar
charts and histograms are important for presenting large amounts of data. The similarity between
the two is that both are pictorial representations of grouped data. Here, we will learn histogram
vs bar graph with examples.
What are the Differences Between Bar Chart and Histogram?

Bar graph | Histogram
The bar graph is the graphical representation of categorical data. | A histogram is the graphical representation of quantitative data.
There is equal space between each pair of consecutive bars. | There is no space between the consecutive bars.
The height of the bars shows the frequency, and the widths of the bars are the same. | The area of the rectangular bars shows the frequency of the data, and the widths of the bars need not be the same.

Bar Graph V/s Histogram


A bar graph is a pictorial representation using vertical and horizontal bars in a graph. The length
of the bar is proportional to the measure of data. It is also called a bar chart.

A histogram is also a pictorial representation of data using rectangular bars, that are adjacent to
each other. It is used to represent grouped frequency distribution with continuous classes.
Frequency polygon

A frequency polygon is closely related to a histogram and is used to compare sets of data or to
display a cumulative frequency distribution. It uses a line graph to represent quantitative data.

Statistics deals with the collection of data and information for a particular purpose. The
tabulation of each run for each ball in cricket gives the statistics of the game. Tables, graphs, pie-
charts, bar graphs, histograms, polygons etc. are used to represent statistical data pictorially.

Frequency polygons are a visually substantial method of representing quantitative data and its
frequencies. Let us discuss how to represent a frequency polygon.

Steps to Draw Frequency Polygon


To draw a frequency polygon, first draw the histogram and then follow the steps below:

 Step 1- Choose the class intervals and mark the values on the horizontal axis.
 Step 2- Mark the mid value of each interval on the horizontal axis.
 Step 3- Mark the frequency of each class on the vertical axis.
 Step 4- Corresponding to the frequency of each class interval, mark a point at that height
above the middle of the class interval.
 Step 5- Connect these points using line segments.
 Step 6- The obtained representation is a frequency polygon.

Let us consider an example to understand this in a better way.

Example

Example 1: In a batch of 400 students, the height of students is given in the following table.
Represent it through a frequency polygon.

Solution: Following steps are to be followed to construct a histogram from the given data:

 The heights are represented on the horizontal axes on a suitable scale as shown.
 The number of students is represented on the vertical axes on a suitable scale as shown.
 Now rectangular bars with widths equal to the class size and lengths corresponding to the
frequencies of the class intervals are drawn.

ABCDEF represents the given data graphically in form of frequency polygon as:

Frequency polygons can also be drawn independently without drawing histograms. For this, the
midpoints of the class intervals known as class marks are used to plot the points.
Question 1: Construct a frequency polygon using the data given below:

Test Scores    Frequency
49.5-59.5      5
59.5-69.5      10
69.5-79.5      30
79.5-89.5      40
89.5-99.5      15

Answer: We first need to calculate the cumulative frequency from the frequency given.

Test Scores    Frequency    Cumulative Frequency
49.5-59.5      5            5
59.5-69.5      10           15
69.5-79.5      30           45
79.5-89.5      40           85
89.5-99.5      15           100

We now start by plotting the class marks such as 54.5, 64.5, 74.5 and so on till 94.5. Note that
we will also plot the previous and next class marks to start and end the polygon, i.e. we plot 44.5
and 104.5 as well.

Then, the frequencies corresponding to the class marks are plotted against each class mark. Like
you can see below, this makes sense as the frequency for class marks 44.5 and 104.5 are zero and
touching the x-axis. These plot points are used only to give a closed shape to the polygon. The
polygon looks like this:
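
The points to be plotted can be computed directly from the table. The Python sketch below is a minimal illustration, assuming equal class widths; the function name `polygon_points` is hypothetical.

```python
# Class boundaries and frequencies from the test-score table.
classes = [(49.5, 59.5), (59.5, 69.5), (69.5, 79.5), (79.5, 89.5), (89.5, 99.5)]
freqs = [5, 10, 30, 40, 15]

def polygon_points(classes, freqs):
    """Return (class mark, frequency) points, closed with zero-frequency ends."""
    width = classes[0][1] - classes[0][0]            # assumes equal class widths
    marks = [(lo + hi) / 2 for lo, hi in classes]    # class marks (midpoints)
    start = (marks[0] - width, 0)                    # class mark before the first class
    end = (marks[-1] + width, 0)                     # class mark after the last class
    return [start] + list(zip(marks, freqs)) + [end]

points = polygon_points(classes, freqs)
for x, y in points:
    print(x, y)
```

The first and last points, (44.5, 0) and (104.5, 0), lie on the x-axis and only serve to close the polygon.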
Ogive
The word Ogive is a term used in architecture to describe curves or curved shapes. In statistics,
Ogives are graphs used to estimate how many values lie below or above a particular value in the
data. To construct an Ogive, the cumulative frequency of each class is first calculated from a
frequency table by adding the frequencies of all the preceding classes. The last number in the
cumulative frequency table is always equal to the total frequency. The most commonly used graphs
of a frequency distribution are the histogram, frequency polygon, frequency curve, and Ogives
(cumulative frequency curves). Here, we look at what an Ogive is, along with its graph, chart, and
an example.

Ogive Definition
The Ogive is defined as the frequency distribution graph of a series. It is a graph of a
cumulative distribution, showing data values on the horizontal axis and either the cumulative
frequencies, the cumulative relative frequencies or the cumulative per cent frequencies on the
vertical axis.

Cumulative frequency is defined as the sum of all the previous frequencies up to the current
point. The Ogive curve helps in finding how common particular values are, or how likely it is
that the data fall within a certain range.

Create the Ogive by plotting the point corresponding to the cumulative frequency of each class
interval. Most statisticians use the Ogive curve to illustrate data pictorially. It helps in
estimating the number of observations that are less than or equal to a particular value.

Ogive Graph
The graphs of the frequency distribution are frequency graphs that are used to exhibit the
characteristics of discrete and continuous data. Such figures are more appealing to the eye than
the tabulated data. It helps us to facilitate the comparative study of two or more frequency
distributions. We can relate the shape and pattern of the two frequency distributions.

The two methods of Ogives are:

 Less than Ogive


 Greater than or more than Ogive

The graph given above represents less than and the greater than Ogive curve. The rising curve
(Rose Curve) represents the less than Ogive, and the falling curve (Purple Curve) represents the
greater than Ogive.

Less than Ogive

The frequencies of all preceding classes are added to the frequency of a class. This series is
called the less than cumulative series. It is constructed by adding the first-class frequency to the
second-class frequency and then to the third class frequency and so on. The downward
cumulation results in the less than cumulative series.
Greater than or More than Ogive

The frequencies of the succeeding classes are added to the frequency of a class. This series is
called the more than or greater than cumulative series. It is constructed by starting from the
total frequency and subtracting the first class frequency, then the second class frequency from
that result, then the third class frequency, and so on. The upward cumulation results in the
greater than or more than cumulative series.

Ogive Chart
An Ogive Chart is a curve of the cumulative frequency distribution or cumulative relative
frequency distribution. For drawing such a curve, the frequencies must be expressed as a
percentage of the total frequency. Then, such percentages are cumulated and plotted, as in the
case of an Ogive. Below are the steps to construct the less than and greater than Ogive.

How to Draw Less Than Ogive Curve?

 Draw and mark the horizontal and vertical axes.


 Take the cumulative frequencies along the y-axis (vertical axis) and the upper-class limits on the
x-axis (horizontal axis).
 Against each upper-class limit, plot the cumulative frequencies.
 Connect the points with a continuous curve.

How to Draw Greater than or More than Ogive Curve?

 Draw and mark the horizontal and vertical axes.


 Take the cumulative frequencies along the y-axis (vertical axis) and the lower-class limits on the
x-axis (horizontal axis).
 Against each lower-class limit, plot the cumulative frequencies.
 Connect the points with a continuous curve.

Uses of Ogive Curve


Ogive graphs, or cumulative frequency graphs, are used to find the median of a given set of data.
If both the less than and the greater than cumulative frequency curves are drawn on the same
graph, the median can be read off easily: the x-coordinate of the point where the two curves
intersect gives the median value. Apart from finding the median, Ogives are used in computing the
percentiles of a data set.

Ogive Example

Question 1:
Construct the more than cumulative frequency table and draw the Ogive for the below-given
data.

Marks 1-10 11-20 21-30 31-40 41-50 51-60 61-70 71-80

Frequency 3 8 12 14 10 6 5 2

Solution:

“More than” Cumulative Frequency Table:

Marks           Frequency    More than Cumulative Frequency
More than 1     3            60
More than 11    8            57
More than 21    12           49
More than 31    14           37
More than 41    10           23
More than 51    6            13
More than 61    5            7
More than 71    2            2

Plotting an Ogive:

Plot the points with coordinates such as (70.5, 2), (60.5, 7), (50.5, 13), (40.5, 23), (30.5, 37),
(20.5, 49), (10.5, 57), (0.5, 60).

The Ogive is then joined to the point on the x-axis that represents the actual upper limit of the
last class, i.e., (80.5, 0).

Scale: x-axis, 1 cm = 10 marks; y-axis, 1 cm = 10 c.f.
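
The table and the plotted points can be reproduced with a few lines of Python. This is a minimal sketch; the function name `more_than_cumulative` is hypothetical, and the lower boundary of each class is taken as its lower limit minus 0.5, as in the coordinates above.

```python
# Class limits and frequencies from Question 1.
classes = [(1, 10), (11, 20), (21, 30), (31, 40),
           (41, 50), (51, 60), (61, 70), (71, 80)]
freqs = [3, 8, 12, 14, 10, 6, 5, 2]

def more_than_cumulative(freqs):
    """Start from the total and subtract each class frequency in turn."""
    remaining = sum(freqs)
    result = []
    for f in freqs:
        result.append(remaining)   # observations in this class and all later classes
        remaining -= f
    return result

more_than_cf = more_than_cumulative(freqs)

# Plot against the lower class boundaries (lower limit - 0.5).
points = [(lo - 0.5, cf) for (lo, _), cf in zip(classes, more_than_cf)]
# Close the curve at the upper boundary of the last class.
points.append((classes[-1][1] + 0.5, 0))

print(more_than_cf)
print(points)
```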

More than the Ogive Curve:


More Information
Ogive (Cumulative frequency curve)

Ogive is a graphical representation of the cumulative frequency distribution of continuous
series.

In a continuous series, if the upper limit (or the lower limit) of each class
interval is taken as x-coordinate and its corresponding cumulative frequency
(c.f.) as y-coordinate and the points are plotted in the graph, we obtain a
curve by joining the points with freehand. Such a curve is known as
the ogive or cumulative frequency curve. It is also called a free hand
curve.
Since less than c.f. and more than c.f. are the two types of cumulative
frequencies, there are two types of ogives as well. They are:

1. Less than ogive (or less than cumulative frequency curve)


2. More than ogive (or more than cumulative frequency curve)

Less than ogive

When the upper limit of each class interval is taken as x-coordinate and its
corresponding less than cumulative frequency as y-coordinate, the ogive so
obtained is known as less than ogive (or less than cumulative frequency curve).

Obviously, less than ogive is an increasing curve, sloping upwards from left
to right and has the shape of an elongated S.

Construction of less than ogive:

To construct a less than ogive, we proceed through the following steps:

1. Make a less than cumulative frequency table.


2. Choose a suitable scale and mark the upper class limit of each class
interval along the x-axis and cumulative frequencies along the y-axis.
3. Plot the coordinates (upper limit, less than c.f.) on the graph.
4. Join the points by freehand and obtain a less than ogive.

More than ogive


When the lower limit of each class interval is taken as x-coordinate and its
corresponding more than cumulative frequency as y-coordinate, the ogive so
obtained is known as more than ogive (or more than cumulative frequency curve).

More than ogive is a decreasing curve sloping downward from left to right
and has the shape of an elongated S, upside down.

Construction of more than ogive:

To construct a more than ogive, we proceed through the following steps:

1. Make a more than cumulative frequency table.


2. Choose a suitable scale and mark the lower class limit of each class
interval along the x-axis and cumulative frequencies along the y-axis.
3. Plot the coordinates (lower limit, more than c.f.) on the graph.
4. Join the points by freehand and obtain a more than ogive.

Examples:

Example 1: The table given below shows the marks obtained by 80 students in
science. Construct (i) less than ogive (ii) more than ogive.
Solution: Here,

Less than cumulative frequency distribution table:

Here, we have the coordinates to draw less than ogive: (10, 3), (20, 11), (30, 28),
(40, 57), (50, 72), (60, 78) and (70, 80).

Plotting these points on a graph, we have the following less than ogive.
More than cumulative frequency distribution table:

Here, we have the coordinates to draw more than ogive: (0, 80), (10, 77), (20,
69), (30, 52), (40, 23), (50, 8) and (60, 2).

Plotting these points on a graph, we have the following more than ogive.
Less than and more than (Combined) Ogive

If we draw both less than and more than ogive of a distribution on the same graph
paper, it is called combined ogive. The point of intersection of two ogives in a
combined ogive is called the vital-point.

The foot of the perpendicular drawn from vital-point to x-axis gives the value of the
median of the distribution. And, the corresponding class is the proper class for the
median.

Uses of ogive
Following are the uses of cumulative frequency curve (ogive):

1. Ogive or cumulative frequency curve helps us to determine as well as
portray the number or proportion of cases above or below the given value.
2. Median (Q2) and quartiles i.e. lower quartile (Q1) and upper quartile (Q3)
can be estimated from the ogive of a frequency distribution.

Examples:

Example 2: Construct the combined ogive from the data given in the table of
Example 1, and find the median.

Solution: Here,

Less than and more than cumulative frequency distribution table:


Here, we have the coordinates to draw less than ogive: (10, 3), (20, 11), (30, 28),
(40, 57), (50, 72), (60, 78) and (70, 80).

Here, we have the coordinates to draw more than ogive: (0, 80), (10, 77), (20,
69), (30, 52), (40, 23), (50, 8) and (60, 2).

Plotting these points on a graph, we have the following less than and more than
combined ogive.

A perpendicular is drawn from the point of intersection of two ogives which meets
the x-axis at 34.14 (approx.) units from the origin. So, the required median of the
given distribution is 34.14.
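
Because the more than cumulative frequency at any boundary equals the total minus the less than cumulative frequency, the two ogives cross where the less than ogive reaches half of the total. The Python sketch below reads off that point. It assumes straight-line segments between the plotted points and an added starting point (0, 0) for the less than ogive; both are assumptions made for illustration, and a freehand curve may give a slightly different reading.

```python
# Less than ogive points from Example 1; (0, 0) is an assumed starting point.
less_than = [(0, 0), (10, 3), (20, 11), (30, 28), (40, 57), (50, 72), (60, 78), (70, 80)]
total = less_than[-1][1]

# More than c.f. at each boundary = total - less than c.f. at the same boundary.
more_than = [(x, total - cf) for x, cf in less_than]

def crossing_x(less_points, total):
    """x-coordinate where the less than ogive reaches total / 2."""
    half = total / 2
    for (x0, y0), (x1, y1) in zip(less_points, less_points[1:]):
        if y0 <= half <= y1 and y1 > y0:
            # linear interpolation inside the segment that contains total / 2
            return x0 + (half - y0) / (y1 - y0) * (x1 - x0)
    raise ValueError("ogive never reaches half of the total")

median = crossing_x(less_than, total)
print(round(median, 2))
```

The crossing lies in the 30-40 class, at 30 + (40 − 28)/(57 − 28) × 10, which is approximately 34.14.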

Example 3: The table given below shows the marks obtained by 60 students in
mathematics. Construct a less than ogive and compute median, first quartile (Q1)
and third quartile (Q3).
Solution: Here,

Less than cumulative frequency distribution table,

Here, we have the coordinates to draw less than ogive: (10, 4), (20, 14), (30, 34),
(40, 49), (50, 55) and (60, 60).

Plotting these points on a graph, we have the following less than ogive.
Median lies in 50% of the total data.

∴ 50% of 60 students = 30 students

Q1 lies in 25% of the total data.

∴ 25% of 60 students = 15 students

Q3 lies in 75% of the total data.

∴ 75% of 60 students = 45 students


Straight lines drawn from 30, 15, and 45 (frequency) on the y-axis parallel to the x-axis
intersect the curve at A, B, and C. Perpendiculars drawn from A, B, and C meet the
x-axis at P, Q, and R with x-coordinates 28, 20.5, and 37.33 respectively.

Therefore,

Median = 28 (approx.)

Q1 = 20.5 (approx.)

Q3 = 37.33 (approx.)
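
The graphical readings in Example 3 can be checked numerically. The Python sketch below assumes the ogive is made of straight segments between the plotted points and adds (0, 0) as a starting point; the function name `ogive_quantile` is hypothetical.

```python
# Less than ogive points from Example 3; (0, 0) is an assumed starting point.
ogive = [(0, 0), (10, 4), (20, 14), (30, 34), (40, 49), (50, 55), (60, 60)]

def ogive_quantile(points, fraction):
    """x-value where the piecewise-linear ogive reaches `fraction` of the total."""
    total = points[-1][1]
    target = fraction * total          # e.g. 25% of 60 students = 15
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if y0 <= target <= y1 and y1 > y0:
            return x0 + (target - y0) / (y1 - y0) * (x1 - x0)
    raise ValueError("fraction must be between 0 and 1")

q1 = ogive_quantile(ogive, 0.25)
median = ogive_quantile(ogive, 0.50)
q3 = ogive_quantile(ogive, 0.75)
print(round(q1, 2), round(median, 2), round(q3, 2))
```

Under these assumptions the values agree with the graphical readings: approximately 20.5 for Q1, 28 for the median and 37.33 for Q3.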

Example 4: Draw the less than ogive from the following data and answer the questions.

a. Find the no. of workers having wages more than 40.

b. Find the no. of workers having wages less than 80.


c. Find the no. of workers having wages between 80 and 100.

Solution: Here,

Less than cumulative frequency distribution table,


Here, we have the coordinates to draw less than ogive: (20, 5), (40, 13), (60, 23),
(80, 32) and (100, 35).

Plotting these points on a graph, we have the following less than ogive.

From the graph,


a. No. of workers having wages more than 40 = (35 – 13) = 22
b. No. of workers having wages less than 80 = 32
c. No. of workers having wages between 80 and 100 = 3

Example 5: From the given cumulative frequency curve (ogive), determine the
lower quartile (Q1) class, median (Q2) class and upper quartile (Q3) class.

Solution: From the given graph,

Total number of data (N) = 40


Passing the lines from 10, 20, and 30 of the y-axis (i.e cumulative frequencies)
parallel to x-axis, we get the corresponding points on the ogive curve. And, again
vertical lines from those points give the required class for Q1, Q2 and Q3.
Therefore, from the above graph,

∴ Q1 class = 10-20

∴ Q2 class = 30-40

∴ Q3 class = 40-50

Frequently Asked Questions on Ogive


Q1

What is an Ogive?
An ogive is a freehand curve drawn to show the cumulative frequency distribution. It is
also known as a cumulative frequency polygon.

Q2

What are the two types of ogive graphs?

The two types of ogives are less than ogive and greater than or more than ogive. In a less than
ogive, the frequencies of all preceding classes are added to the frequency of a class. In a more
than ogive, the frequencies of the succeeding classes are added to the frequency of a class.

Q3

How to draw a less than ogive?

Following are the steps to draw a less than ogive:


1. Draw and mark the horizontal and vertical axes.
2. Take the cumulative frequencies along the y-axis (vertical axis) and the upper-class limits on
the x-axis (horizontal axis).
3. Against each upper-class limit, plot the cumulative frequencies.
4. Connect the points with a continuous curve.

Q4

How to draw a more than ogive?

Following are the steps to draw a more than ogive:


1. Draw and mark the horizontal and vertical axes.
2. Take the cumulative frequencies along the y-axis (vertical axis) and the lower-class limits on
the x-axis (horizontal axis).
3. Against each lower-class limit, plot the cumulative frequencies.
4. Connect the points with a continuous curve.
Quartiles
In statistics, quartiles are the three values that divide a data set into four equal parts. We
ordinarily deal with large amounts of numerical data in statistics. There are several concepts
and formulas that are extensively applicable in research and surveys. One of the best
applications of quartiles is the box and whisker plot.

Quartiles are the values that divide a list of numerical data into four quarters. The middle
quartile measures the central point of the distribution and shows the data that are near the
central point. The lower quartile marks the middle of the half of the data that falls below the
median, and the upper quartile marks the middle of the remaining half, which falls above the
median. In all, the quartiles depict the distribution or dispersion of the data set.

Quartiles Definition
Quartiles divide the entire set into four equal parts. So, there are three quartiles, first, second and
third represented by Q1, Q2 and Q3, respectively. Q2 is nothing but the median, since it indicates
the position of the item in the list and thus, is a positional average. To find quartiles of a group of
data, we have to arrange the data in ascending order.

Around the median, we can measure the spread of the distribution with the help of the lower and
upper quartiles. Apart from the mean and median, there are other measures in statistics that
divide the data into specific equal parts. A median divides a series into two equal parts. We can
partition the values of a data set mainly in three different ways:

1. Quartiles
2. Deciles
3. Percentiles

Quartiles Formula

Suppose Q3, the upper quartile, is the median of the upper half of the data set, whereas Q1, the
lower quartile, is the median of the lower half of the data set. Q2 is the median. Suppose we
have n items in a data set. Then the quartiles are given by;

Q1 = [(n+1)/4]th item

Q2 = [(n+1)/2]th item
Q3 = [3(n+1)/4]th item

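Hence, the formula for the quartiles of grouped data can be given by;

Qr = l1 + [(r × N/4 − c) / f] × (l2 − l1)

Where, Qr is the rth quartile

l1 is the lower limit of the quartile class

l2 is the upper limit of the quartile class

f is the frequency of the quartile class

c is the cumulative frequency of the class preceding the quartile class

N is the total frequency.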

Quartiles in Statistics

Similar to the median, which divides the data in half so that 50% of the observations lie below
the median and 50% lie above it, the quartiles split the data into quarters so that 25% of the
observations are less than the lower quartile, 50% are less than the median, and 75% are less
than the upper quartile. Usually, the data is ordered from smallest to largest:

 First quartile: the lowest 25% of the numbers
 Second quartile: between 25.1% and 50% (up to the median)
 Third quartile: 51% to 75% (above the median)
 Fourth quartile: the highest 25% of the numbers

Quartile Deviation

You have learned about standard deviation in statistics. Quartile deviation is defined as half of
the distance between the third and the first quartile. It is also called Semi Interquartile range. If
Q1 is the first quartile and Q3 is the third quartile, then the formula for deviation is given by;

Quartile deviation = (Q3-Q1)/2

Interquartile Range

The interquartile range (IQR) is the difference between the upper and lower quartiles of a given
data set and is also called the midspread. It is a measure of statistical dispersion and
describes the variation in a data set once it is divided into quartiles. If Q1 is the first
quartile and Q3 is the third quartile, then the IQR formula is given by;

IQR = Q3 – Q1

Quartiles Examples

Question 1: Find the quartiles of the following data: 4, 6, 7, 8, 10, 23, 34.

Solution: Here the numbers are arranged in the ascending order and number of items, n = 7

Lower quartile, Q1 = [(n+1)/4] th item

Q1 = (7+1)/4 = 2nd item = 6

Median, Q2 = [(n+1)/2]th item

Q2 = (7+1)/2 = 4th item = 8

Upper Quartile, Q3 = [3(n+1)/4]th item

Q3 = 3(7+1)/4 item = 6th item = 23

Question 2: Find the Quartiles of the following age:-

23, 13, 37, 16, 26, 35, 26, 35

Solution:

First, we need to arrange the numbers in increasing order.

Therefore, 13, 16, 23, 26, 26, 35, 35, 37

Number of items, n = 8

Lower quartile, Q1 = [(n+1)/4] th item

Q1 = (8+1)/4 = 9/4 = 2.25th term

From the quartile formula we can write;

Q1 = 2nd term + 0.25(3rd term − 2nd term)

Q1 = 16 + 0.25(23 − 16) = 17.75

Similarly,

Median, Q2 = [(n+1)/2]th item


Q2 = (8+1)/2 = 9/2 = 4.5th term

Q2 = 4th term+0.5 (5th term-4th term)

Q2= 26+0.5(26-26) = 26

And,

Upper Quartile, Q3 = [3(n+1)/4]th item

Q3 = 3(8+1)/4 = 6.75th term

Q3 = 6th term + 0.75(7th term-6th term)

Q3 = 35+0.75(35-35) = 35
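
The positional rule used in these two questions can be written as a short function. The Python sketch below is a minimal illustration of the [r(n+1)/4]th-item rule with linear interpolation between neighbouring terms; the function name `quartile` is hypothetical, and other software may use a different quartile convention and give slightly different values.

```python
def quartile(sorted_data, r):
    """r-th quartile (r = 1, 2, 3) by the [r(n+1)/4]th-item rule."""
    n = len(sorted_data)
    position = r * (n + 1) / 4      # 1-based position, may be fractional
    lower = int(position)           # whole part of the position
    fraction = position - lower     # fractional part of the position
    if lower < 1:                   # position falls before the first item
        return sorted_data[0]
    if lower >= n:                  # position falls at or after the last item
        return sorted_data[-1]
    a = sorted_data[lower - 1]      # the lower-th term
    b = sorted_data[lower]          # the next term
    return a + fraction * (b - a)

ages = sorted([23, 13, 37, 16, 26, 35, 26, 35])
q1, q2, q3 = (quartile(ages, r) for r in (1, 2, 3))
print(q1, q2, q3)
```

For the ages in Question 2 the function gives Q1 = 17.75, Q2 = 26 and Q3 = 35; for the data in Question 1 it gives 6, 8 and 23.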

Test your Knowledge on Quartiles

Example 1: Calculate the median, lower quartile, upper quartile, and interquartile range of the
following data set of values: 20, 19, 21, 22, 23, 24, 25, 27, 26

Solution:

Arranging the values in ascending order: 19, 20, 21, 22, 23, 24, 25, 26, 27

Putting the values in the formulas above we get,

Median(Q2) = 5th Term = 23

Lower Quartile (Q1) = Mean of 2nd and 3rd term = (20 + 21)/2 = 20.5

Upper Quartile(Q3) = Mean of 7th and 8th term = (25 + 26)/2 = 25.5

IQR = Upper Quartile−Lower Quartile

IQR = 25.5 – 20.5

IQR = 5

Answer: IQR = 5

Example 2: What will be the upper quartile for the following set of numbers?
26, 19, 5, 7, 6, 9, 16, 12, 18, 2, 1.

Solution:
The formula for the upper quartile is Q3 = ¾(n + 1)th term.

Instead of giving the value of the upper quartile, the formula gives its position, for example,
the 8th place, 10th place, etc.

So first we put the numbers in ascending order: 1, 2, 5, 6, 7, 9, 12, 16, 18, 19, 26. There are a
total of 11 numbers, so:

Q3 = ¾(n + 1)th term

Q3 = ¾(12)th term = 9th term

Answer: The upper quartile is the 9th term from the left, which is 18.

Example 3: Find the 3rd quartile in the following data set: 4, 5, 8, 7, 11, 9, 9

Solution:

Let us first arrange our array in ascending order and it becomes 4, 5, 7, 8, 9, 9, 11

The median of our data is 8.

In order to find the 3rd quartile, we have to find the median of the data points that are greater
than the median, that is, 9, 9, 11. The median of these three values is 9.

Answer: Hence, the 3rd quartile of our data set is 9.

Decile
Decile is a method that is used to divide a distribution into ten equal parts. When data is divided
into deciles a decile rank is assigned to each data point in order to sort the data into ascending or
descending order. A decile has 10 categorical buckets while a quartile has 4 and a percentile has
100.
The concept of a decile is used widely in the field of finance and economics to perform the
analysis of data. It can be used to check the performance of a portfolio in the field of finance. In
this article, we learn more about a decile, its definition, rank, and see associated examples on
calculating the decile value.

What is Decile?
Decile, percentile, quartile, and quintile are different types of quantiles in statistics. A
quantile refers to a value that divides the observations in a sample into equal subsections.
There will always be one fewer quantile than the number of subsections created.

Decile Definition

Decile is a type of quantile that divides the dataset into 10 equal subsections with the help of 9
data points. Each section of the sorted data represents 1/10 of the original sample or population.
Decile helps to order large amounts of data in the increasing or decreasing order. This ordering is
done by using a scale from 1 to 10 where each successive value represents an increase by 10
percentage points.

Decile Class Rank


To split the given data and order it according to some specified metric, statisticians use the decile
rank also known as decile class rank. Once the given data is divided into deciles then each
subsequent data set is assigned a decile rank. Each rank is based on an increase by 10 percentage
points and is used to order the deciles in the increasing order. The 5th decile of a distribution will
give the value of the median.

Decile Formula

The decile formulas can be used to calculate the deciles for grouped and ungrouped data. When
data is in its raw form it is known as ungrouped data. When this data is sorted and organized then
it forms grouped data. These are given as follows:
Decile Formula for ungrouped data: D(x) = value of the [x(n+1)/10]th term in the data set.

x is the value of the decile that needs to be calculated and ranges from 1 to 9. n is the total
number of observations in that data set.

Decile Formula for grouped data: D(x) = l + (w/f)(Nx/10 − C)

l is the lower boundary of the class containing the decile, which is located at the
(x × cf)/10 th position, where cf is the cumulative frequency of the entire data set; w is the
size of the class; f is the frequency of that class; N is the total frequency; C is the
cumulative frequency of the preceding class.

The next section will cover the steps for calculating a particular decile.
Decile Example
Suppose a data set consists of the following numbers: 24, 32, 27, 32, 23, 62, 45, 80, 59, 63, 36,
54, 57, 36, 72, 55, 51, 32, 56, 33, 42, 55, 30. The value of the first two deciles has to be
calculated. The steps required are as follows:

 Step 1: Arrange the data in increasing order. This gives 23, 24, 27, 30, 32, 32, 32, 33, 36, 36, 42,
45, 51, 54, 55, 55, 56, 57, 59, 62, 63, 72, 80.
 Step 2: Identify the total number of points. Here, n = 23
 Step 3: Apply the decile formula to calculate the position of the required data point.
D(1) = (n+1)/10 = 2.4. This implies the value of the 2.4th data point has to be determined. This
will lie between the scores in the 2nd and 3rd positions. In other words, the 2.4th data point is
0.4 of the way between the scores 24 and 27.
 Step 4: The value of the decile can be determined as [lower score + (distance)(higher score -
lower score)]. This is given as 24 + 0.4 * (27 – 24) = 25.2
 Step 5: Apply steps 3 and 4 to determine the rest of the deciles. D(2) = 2(n+1)/10 = 4.8th data
point, which lies between the 4th and 5th data points. Thus, 30 + 0.8 * (32 – 30) = 31.6
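
The steps above can be sketched in Python as follows. This is a minimal illustration of the [x(n+1)/10]th-term rule with linear interpolation; the function name `decile` is hypothetical.

```python
def decile(data, x):
    """x-th decile (x = 1..9) of ungrouped data by the [x(n+1)/10]th-term rule."""
    values = sorted(data)               # Step 1: arrange in increasing order
    n = len(values)                     # Step 2: number of data points
    position = x * (n + 1) / 10         # Step 3: 1-based position, may be fractional
    lower = int(position)
    fraction = position - lower
    if lower < 1:                       # position falls before the first item
        return values[0]
    if lower >= n:                      # position falls at or after the last item
        return values[-1]
    low, high = values[lower - 1], values[lower]
    return low + fraction * (high - low)    # Step 4: interpolate between neighbours

scores = [24, 32, 27, 32, 23, 62, 45, 80, 59, 63, 36, 54,
          57, 36, 72, 55, 51, 32, 56, 33, 42, 55, 30]
d1 = decile(scores, 1)
d2 = decile(scores, 2)
print(round(d1, 1), round(d2, 1))
```

For this data set the first two deciles come out as approximately 25.2 and 31.6.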

Important Notes on Decile

 A decile is a quantile that is used to divide a data set into 10 equal subsections.
 The 5th decile will be the median for the dataset.
 The decile formula for ungrouped data is given as the [x(n+1)/10]th term in the data set.
 The decile formula for grouped data is given by l + (w/f)(Nx/10 − C)

Examples on Decile
1. Example 1: Find the 6th and the 9th decile for the data in the above-mentioned example.
Solution: The arranged data is 23, 24, 27, 30, 32, 32, 32, 33, 36, 36, 42, 45, 51, 54, 55,
55, 56, 57, 59, 62, 63, 72, 80
n = 23
D(6) = 6(n+1)/10 = 14.4th data point. This lies between 54 and 55.
D(6) = 54 + 0.4 * (55 – 54) = 54.4
D(9) = 9(n+1)/10 = 21.6th data point. This lies between 63 and 72.
D(9) = 63 + 0.6 * (72 – 63) = 68.4
Answer: D(6) = 54.4 and D(9) = 68.4

 Example 2: Find the median of the following data set using the concept of deciles.
55, 58, 61, 67, 68, 70, 74, 81, 82, 93, 20, 28, 29, 30, 36, 37, 39, 42, 53, 54
Solution: Arranging the data in increasing order gives 20, 28, 29, 30, 36, 37, 39, 42, 53, 54, 55, 58,
61, 67, 68, 70, 74, 81, 82, 93
The fifth decile is the median of the data set, thus,
n = 20
D(5) = 5(n + 1)/10 = 10.5th data point. This lies between the 10th and 11th scores, 54 and 55.
D(5) = 54 + 0.5 * (55 - 54) = 54.5
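As a cross-check of Example 2, the result can be compared with Python's standard statistics module. This is a sketch, assuming Python 3.8 or later; statistics.quantiles with n=10 returns the nine cut points D(1) to D(9), and its default "exclusive" method places the i-th cut point at position i(n + 1)/10, which is the same rule used in these notes.

```python
import statistics

data = [55, 58, 61, 67, 68, 70, 74, 81, 82, 93,
        20, 28, 29, 30, 36, 37, 39, 42, 53, 54]

# quantiles() sorts the data itself and returns [D(1), ..., D(9)]
deciles = statistics.quantiles(data, n=10)
fifth_decile = deciles[4]            # D(5) is the 5th entry (index 4)
median = statistics.median(data)     # mean of the two middle scores

print(fifth_decile, median)          # 54.5 54.5
```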

 Example 3: Find the 7th decile for the following frequency distribution table.

Class      Frequency
10 - 20    15
20 - 30    10
30 - 40    12
40 - 50    8
50 - 60    7
60 - 70    18
70 - 80    5
80 - 90    25

Solution:
From the given frequency distribution table, we can have,

Class      Frequency    Cumulative Frequency (cf)
10 - 20    15           15
20 - 30    10           25
30 - 40    12           37
40 - 50    8            45
50 - 60    7            52
60 - 70    18           70
70 - 80    5            75
80 - 90    25           100

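Position of D(7) = 7 × 100/10 = 70th data point in the cf column.

This data point lies in the 60 - 70 class, so l = 60, w = 10, f = 18, N = 100 and C = 52.

D(7) = l + (w/f)(Nx/10 - C)

= 60 + (10/18)(7 × 100/10 - 52)

= 60 + (10/18)(18)

= 70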
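The grouped-data calculation in Example 3 can be sketched in Python as follows. The function name grouped_decile and the (lower limit, upper limit, frequency) layout of the table are assumptions made for this illustration. The decile class is taken as the first class whose cumulative frequency reaches Nx/10.

```python
def grouped_decile(classes, x):
    """Return the x-th decile for grouped data.

    classes: list of (lower_limit, upper_limit, frequency) tuples,
             listed in increasing order of class limits.
    """
    total = sum(freq for _, _, freq in classes)      # N, the total frequency
    if total == 0:
        raise ValueError("total frequency must be positive")
    target = x * total / 10                          # Nx/10
    cumulative = 0                                   # C, cf of the preceding classes
    for lower, upper, freq in classes:
        if cumulative + freq >= target:              # decile class found
            width = upper - lower                    # w, the class size
            # l + (w/f)(Nx/10 - C)
            return lower + width * (target - cumulative) / freq
        cumulative += freq
    raise ValueError("x must be between 1 and 9")


table = [(10, 20, 15), (20, 30, 10), (30, 40, 12), (40, 50, 8),
         (50, 60, 7), (60, 70, 18), (70, 80, 5), (80, 90, 25)]
print(grouped_decile(table, 7))   # 70.0
```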

What is a Decile in Statistics?

A decile in statistics is one of the 9 cut points that divide a sorted distribution into 10 equal
parts. Each data point can then be assigned a decile rank according to the part in which it falls.

What is a Decile Class Rank?

Once the data set has been divided into deciles, a decile class rank is assigned to each point so
that the decile groups are arranged in increasing order.

What is the Decile Formula?

The decile formula for ungrouped data is the value of the [x(n + 1)/10]th term. The formula for
grouped data is D(x) = l + (w/f)(Nx/10 - C).

How to Find the Value of the Median Using the Decile Formula?

The value of the 5th decile represents the median. For ungrouped data the median will be given
by D(5) = the [5(n + 1)/10]th term.

How to Interpret the First Decile?

The first decile is a point such that 90% of the data lies above it and 10% of the data lies below
it. Similarly, the 2nd decile is a point with 20% of data lying below it and 80% lying above it.
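For a small data set, the 10% and 90% shares are only approximate. A quick check on the 23 scores from the worked decile example is sketched below, assuming Python 3.8 or later, where statistics.quantiles with its default "exclusive" method uses the same i(n + 1)/10 position rule as these notes.

```python
import statistics

scores = [24, 32, 27, 32, 23, 62, 45, 80, 59, 63, 36, 54,
          57, 36, 72, 55, 51, 32, 56, 33, 42, 55, 30]

d1 = statistics.quantiles(scores, n=10)[0]    # first decile, D(1)
below = sum(1 for s in scores if s < d1)      # scores lying below D(1)
above = sum(1 for s in scores if s > d1)      # scores lying above D(1)

print(round(d1, 2), below, above)             # 25.2 2 21
```

Here 2 of the 23 scores (about 8.7%) lie below D(1) and 21 of them (about 91.3%) lie above it. The split approaches 10% and 90% as the number of data points grows.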

How to Calculate the Decile for Ungrouped Data?

The steps to calculate the decile for ungrouped data are as follows:

 Arrange the data in increasing order.
 Find the position of the decile using the formula D(x) = x(n + 1)/10 to see which two scores the
decile lies between.
 Find the value of the decile as [lower score + (distance)(higher score - lower score)].
