
Part 1

1. Explain the importance of data analysis in daily life.

Statistics is a very important field of study in mathematics and it is used frequently in
everyday life. Here are some of the ways statistics matters in our daily life.

Firstly, statistics is used in weather forecasts. Do you watch the weather forecast at some point
during the day? Do you understand what it means when there is a 25% chance of rain? Have you ever
heard the weather forecaster talk about weather models? These computer models are built using
statistics that compare prior weather conditions with current weather to predict the future weather,
giving you the most accurate prediction available so that you can plan your day accordingly.

Statistics is also used to predict diseases. News reports often include statistics about a
disease. If the reporter simply reports the number of people who either have
the disease or who have died from it, it is an interesting fact but it might not mean much to your life.
But when statistics become involved, you have a better idea of how that disease may affect you and
how likely you are to get that disease.

Besides that, statistics is also used in medical studies. These medical studies are used to
check that a drug works and that patients taking it are not harmed by its side effects. When a new
drug has been produced to treat a specific disease, scientists must run clinical trials to
show a statistically valid rate of effectiveness before the drug can be prescribed to patients.
Statistics are behind every medical study you hear about and every drug you have consumed.

Lastly, statistics is also used in quality testing of products. Companies make thousands of
products every day and each company must make sure that a good quality item is sold. But a
company can’t test each and every item that it ships to you, the consumer. If it did, it would be
very time consuming and costly. So, the company uses statistics to test just a few items, called a sample, of
what it makes. If the sample passes the quality tests, then the company assumes that all the items
made in a group, called a batch, are good. If any item in the sample fails the quality test, the whole batch
has to be rechecked for damage or spoilage.

2(a) Specify: -

i) Three types of measure of central tendency

Central tendency refers to a central or typical value for a probability distribution. It may also
be called a centre or location of the distribution. The three main measures of central tendency are
the mean, the mode and the median.

Mean

The mean is commonly known as the average and refers to a central value of a discrete
set of numbers. Specifically, it is the sum of the values divided by the number of values. The mean is typically
denoted as x̄ and pronounced as “x bar”.

To calculate the mean for ungrouped data, we use the formula: -

\[ \bar{x} = \frac{\sum x}{N} \]

where: x̄ = mean of ungrouped data

∑x = sum of all data values

N = number of data
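As a minimal sketch of the formula above, the mean of ungrouped data can be computed as follows. The function name `mean_ungrouped` and the marks are made up purely for illustration.

```python
def mean_ungrouped(values):
    """Return the sum of the values divided by the number of values."""
    if not values:
        raise ValueError("at least one data value is required")
    return sum(values) / len(values)


# Hypothetical examination marks, chosen only for illustration.
marks = [62, 75, 80, 91, 57]
print(mean_ungrouped(marks))  # 365 / 5 = 73.0
```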

To calculate the mean for grouped data, we use the formula: -

\[ \bar{x} = \frac{\sum fx}{\sum f} \]

where: x̄ = mean of grouped data

f = class frequency

x = midpoint of class

∑fx = sum of the values of (frequency × midpoint) of all the classes

∑f = sum of frequencies of all classes
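The grouped calculation, the sum of (frequency × midpoint) divided by the sum of the frequencies, might be sketched like this. The function name `mean_grouped` and the three classes are hypothetical.

```python
def mean_grouped(midpoints, frequencies):
    """Return sum(f * x) / sum(f) for class midpoints x and frequencies f."""
    if len(midpoints) != len(frequencies):
        raise ValueError("midpoints and frequencies must have the same length")
    total_frequency = sum(frequencies)
    if total_frequency == 0:
        raise ValueError("total frequency must be positive")
    weighted_sum = sum(f * x for x, f in zip(midpoints, frequencies))
    return weighted_sum / total_frequency


# Hypothetical classes 0-10, 10-20 and 20-30, with midpoints 5, 15 and 25.
midpoints = [5, 15, 25]
frequencies = [2, 5, 3]
print(mean_grouped(midpoints, frequencies))  # (10 + 75 + 75) / 10 = 16.0
```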

The advantage of using the mean is that it takes into account all the values in
the data. The disadvantage of the mean is that it is influenced by extreme values, so a few
very large or very small values can pull it away from the centre of the data.

Mode

The mode is the most commonly occurring value in a data set. When there is only one mode
in the data set, it is known as unimodal. When there is more than one mode in the data set, it is known
as multimodal.

To calculate the mode for ungrouped data, we just have to find the value with the highest
frequency, that is, the value that is repeated the most times in the data set.
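Counting how often each value occurs can be sketched as below. The helper name `modes` is made up for this example. It returns a list so that a data set with more than one mode is handled as well.

```python
from collections import Counter


def modes(values):
    """Return every value that occurs with the highest frequency, sorted."""
    if not values:
        return []
    counts = Counter(values)
    highest = max(counts.values())
    return sorted(v for v, c in counts.items() if c == highest)


print(modes([3, 5, 5, 7, 9]))  # [5]     one mode (unimodal)
print(modes([1, 1, 2, 2, 3]))  # [1, 2]  more than one mode
```

If every value occurs exactly once, this sketch returns all of them, since each ties for the highest frequency of 1.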

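For grouped data, we can find the mode by drawing a histogram. First determine the modal
class, which is the tallest bar of the histogram. Then draw two lines: join the top left corner of the
modal bar to the top left corner of the bar on its right, and join the top right corner of the modal bar
to the top right corner of the bar on its left. The value on the horizontal axis directly below the point
where the two lines cross is the mode. Figure 1 shows the method to find mode for grouped data.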
Figure 1

The advantage of using the mode as a measure of central tendency is that it can be used on a
set of data which contains a large number of values and many repeated values. The disadvantage of
using the mode is that in some distributions, the mode may not reflect the centre of the distribution
very well. It is also possible for there to be more than one mode for the same distribution of data.
The presence of more than one mode can limit the ability of the mode to describe the centre or
typical value of the distribution because a single value to describe the centre cannot be identified. In
some cases, particularly where all the data have different values, the distribution may have no mode
at all.

Median

The median is the middle value in the data set when the values are arranged in ascending
order. The median separates the higher half of the data set from the lower half.

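To find the median for ungrouped data, we first locate its position in the ordered data using the formula: -

\[ \left( \frac{n + 1}{2} \right)^{\text{th}} \text{ term} \]

where n is the number of data. In a data set with an odd number of data, the median value can be directly identified using
the formula above.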

In a data set with an even number of data, the median is the average of the two middle
values. For example, a data set contains 12 data. If we use the formula for the median, we find the
median of the data set is the 6.5th term, but there is no 6.5th term, so we
add up the 6th term and 7th term and divide by 2. This is the median value for the data set.
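Both cases, an odd and an even number of data, can be sketched in one small function. The name `median_ungrouped` and the sample values are illustrative only.

```python
def median_ungrouped(values):
    """Return the middle value of the sorted data (average of two if n is even)."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("at least one data value is required")
    middle = n // 2
    if n % 2 == 1:
        # Odd n: the ((n + 1) / 2)th term, which is index n // 2 counting from 0.
        return ordered[middle]
    # Even n: average the (n / 2)th and (n / 2 + 1)th terms.
    return (ordered[middle - 1] + ordered[middle]) / 2


print(median_ungrouped([7, 1, 5]))           # 5
print(median_ungrouped(list(range(1, 13))))  # 12 data: (6 + 7) / 2 = 6.5
```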
For grouped data, there are two ways to find the median of a data set. The first way is by
using the formula: -
\[ m = L + \left( \frac{\frac{1}{2}N - F}{f_m} \right) \times C \]

where: m = median

L = lower boundary of the median class

N = total frequency

F = cumulative frequency before the median class

f_m = frequency of the median class

C = size of the class interval

Before using the formula, we have to determine the median class. The median class is the
class which contains the median of the data set, that is, the first class whose cumulative
frequency reaches half of the total frequency.
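A rough sketch of the grouped median calculation is given below: it walks through the classes, finds the median class from the cumulative frequency, and then applies the formula. The function name `median_grouped` and the four classes are hypothetical, and equal class sizes are assumed.

```python
def median_grouped(lower_boundaries, frequencies, class_size):
    """Estimate the median as L + ((N / 2 - F) / f_m) * C."""
    total = sum(frequencies)  # N
    half = total / 2
    cumulative = 0            # F: cumulative frequency before the current class
    for lower, f in zip(lower_boundaries, frequencies):
        if f > 0 and cumulative + f >= half:
            # First class whose cumulative frequency reaches N / 2: the median class.
            return lower + (half - cumulative) / f * class_size
        cumulative += f
    raise ValueError("at least one class must have a positive frequency")


# Hypothetical classes 0-10, 10-20, 20-30 and 30-40 (N = 20).
print(median_grouped([0, 10, 20, 30], [3, 5, 8, 4], 10))  # 20 + (10 - 8) / 8 * 10 = 22.5
```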

The second method is to draw an ogive, draw a horizontal line at half of the total cumulative
frequency until it meets the curve, and then read off the corresponding value on the horizontal
axis. Figure 2 shows the way to determine the median by using
an ogive.

Figure 2

The advantage of the median is that it is less affected by outliers and skewed data, so it is usually
preferred when there are extreme values or outliers in the data set.

ii) At least two types of measure of dispersion

Dispersion, also called variability, scatter or spread, is the extent to which a distribution is
stretched or squeezed. Common examples of measures of dispersion are the standard deviation,
interquartile range, variance and range.

Standard Deviation

The standard deviation, which is represented by σ and pronounced as “sigma”, is a measure
that is used to quantify the amount of variation or dispersion of a set of data values. A low standard
deviation indicates that the data points tend to be close to the mean of the data set, while a high
standard deviation indicates that the data points are spread out over a wider range of values. The
standard deviation is the square root of the variance. A useful property of the standard deviation is that
it is expressed in the same units as the data.

To calculate the standard deviation for ungrouped data, we use the following formulas: -

\[ \sigma = \sqrt{\frac{\sum (x - \bar{x})^2}{N}} \]

where: σ = standard deviation

x = data value

N = number of data

x̄ = mean

∑(x − x̄)² = sum of the squared deviations from the mean

\[ \sigma = \sqrt{\frac{\sum x^2}{N} - \bar{x}^2} \]

where: σ = standard deviation

x = data value

N = number of data

x̄ = mean

∑x² = sum of the squares of all data values
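The two formulas for ungrouped data give the same result, which can be checked with a short sketch. The function names and the data values are made up for illustration.

```python
import math


def std_dev_definition(values):
    """Square root of the mean squared deviation from the mean."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((x - mean) ** 2 for x in values) / n)


def std_dev_shortcut(values):
    """Square root of (mean of the squares minus the square of the mean)."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum(x * x for x in values) / n - mean ** 2)


# Hypothetical data with mean 5 and squared deviations summing to 32.
data = [2, 4, 4, 4, 5, 5, 7, 9]
print(std_dev_definition(data))  # sqrt(32 / 8) = 2.0
print(std_dev_shortcut(data))    # sqrt(232 / 8 - 25) = 2.0
```

Both functions divide by N, matching the formulas in the text. Python's standard library offers `statistics.pstdev` for the same quantity.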


To calculate the standard deviation for grouped data, we use the following formulas: -

\[ \sigma = \sqrt{\frac{\sum f(x - \bar{x})^2}{\sum f}} \]

where: σ = standard deviation

x = midpoint of class

f = frequency of class

x̄ = mean

∑f = total frequency

∑f(x − x̄)² = sum of (frequency × squared deviation from the mean)

\[ \sigma = \sqrt{\frac{\sum fx^2}{\sum f} - \bar{x}^2} \]

where: σ = standard deviation

x = midpoint of class

f = frequency of class

x̄ = mean

∑f = total frequency

∑fx² = sum of (frequency × midpoint squared)
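For grouped data, the same calculation weights each class midpoint by its frequency. The sketch below uses hypothetical function names and classes, and returns the variance alongside the standard deviation, since the latter is just its square root.

```python
import math


def variance_grouped(midpoints, frequencies):
    """Return sum(f * (x - mean)^2) / sum(f) for class midpoints x."""
    total = sum(frequencies)
    if total == 0:
        raise ValueError("total frequency must be positive")
    mean = sum(f * x for x, f in zip(midpoints, frequencies)) / total
    return sum(f * (x - mean) ** 2 for x, f in zip(midpoints, frequencies)) / total


def std_dev_grouped(midpoints, frequencies):
    """Return the square root of the grouped variance."""
    return math.sqrt(variance_grouped(midpoints, frequencies))


# Hypothetical classes with midpoints 5, 15 and 25; the grouped mean is 16.
midpoints = [5, 15, 25]
frequencies = [2, 5, 3]
print(variance_grouped(midpoints, frequencies))  # (242 + 5 + 243) / 10 = 49.0
print(std_dev_grouped(midpoints, frequencies))   # 7.0
```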

Variance

Variance is the expectation of the squared deviation of a variable from its mean. Informally,
it measures how far a set of numbers is spread out from its average value. Variance is often
denoted as σ² and pronounced as “sigma squared”.

To calculate the variance for ungrouped data, we can use the formulas: -

\[ \sigma^2 = \frac{\sum (x - \bar{x})^2}{N} \]

where: σ² = variance

x = data value

N = number of data

x̄ = mean

∑(x − x̄)² = sum of the squared deviations from the mean


\[ \sigma^2 = \frac{\sum x^2}{N} - \bar{x}^2 \]

where: σ² = variance

x = data value

N = number of data

x̄ = mean

∑x² = sum of the squares of all data values
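The two variance formulas for ungrouped data are equivalent. A short derivation, using only the definition of the mean (∑x = N x̄), is sketched below.

```latex
\begin{aligned}
\sigma^2 &= \frac{\sum (x - \bar{x})^2}{N} \\
         &= \frac{\sum x^2 - 2\bar{x}\sum x + N\bar{x}^2}{N} \\
         &= \frac{\sum x^2}{N} - 2\bar{x}\cdot\bar{x} + \bar{x}^2
            \qquad \text{since } \sum x = N\bar{x} \\
         &= \frac{\sum x^2}{N} - \bar{x}^2
\end{aligned}
```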

To calculate the variance for grouped data, we use the following formulas: -

\[ \sigma^2 = \frac{\sum f(x - \bar{x})^2}{\sum f} \]

where: σ² = variance

x = midpoint of class

f = frequency of class

x̄ = mean

∑f = total frequency

∑f(x − x̄)² = sum of (frequency × squared deviation from the mean)

\[ \sigma^2 = \frac{\sum fx^2}{\sum f} - \bar{x}^2 \]

where: σ² = variance

x = midpoint of class

f = frequency of class

x̄ = mean

∑f = total frequency

∑fx² = sum of (frequency × midpoint squared)

Interquartile Range

The interquartile range, also called the midspread or middle 50%, is a measure of statistical
dispersion, being equal to the difference between the 75th and 25th percentiles, or between the upper and
lower quartiles. Quartiles divide the ordered data into four equal parts. The values that separate the parts are called
the first, second and third quartiles, which are denoted as Q1, Q2 and Q3 respectively.

The formula used for the interquartile range is the same for ungrouped and grouped data, which is: -

Interquartile range = Q3 − Q1

where: Q3 = Third Quartile

Q1 = First Quartile

To find the first quartile and third quartile for ungrouped data, we first arrange the data in
ascending order and find the second quartile, also known as the median. After that, the data before the median is
divided in half again. The middle value of that half is the first quartile. Similarly, the data
after the median is divided in half again. The middle value of that half is the third quartile.

For example, a data set contains 7 data. The second quartile, also known as the median,
would be the 4th data. Before the second quartile there are 3 data, so the first quartile is the 2nd
data. After the second quartile there are also 3 data, so the third quartile is the 6th data. The
interquartile range is the 6th data value minus the 2nd data value.
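The 7-data example can be reproduced with a short sketch that splits the sorted data around the median, as described above. The function name `quartiles` and the data values are made up. Other quartile conventions exist, so library functions may give slightly different answers for some data sets.

```python
from statistics import median


def quartiles(values):
    """Return (Q1, Q2, Q3) by taking medians of the halves around the median.

    Needs at least two data values so that both halves are non-empty.
    """
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]  # data before the median
    # For odd n the median itself is left out of the upper half.
    upper = ordered[half + 1:] if n % 2 == 1 else ordered[half:]
    return median(lower), median(ordered), median(upper)


# Hypothetical data set with 7 values.
q1, q2, q3 = quartiles([3, 7, 8, 5, 12, 14, 21])
print(q1, q2, q3)  # 5 8 14  (the 2nd, 4th and 6th sorted values)
print(q3 - q1)     # interquartile range: 9
```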

There are two ways to find the first quartile and third quartile for grouped data. The first
way is by using the cumulative frequency table. First, determine the class of the first quartile and the
third quartile. Then, to find the value for the first quartile use the formula below: -
\[ Q_1 = L_1 + \left( \frac{\frac{1}{4}N - F_1}{f_{Q_1}} \right) \times C \]

where: Q1 = First Quartile

L1 = Lower boundary of first quartile class

N = total frequency

F1 = cumulative frequency before the first quartile class

fQ1 = frequency of the first quartile class

C = size of the class interval

To find the third quartile, we use the formula below: -

\[ Q_3 = L_3 + \left( \frac{\frac{3}{4}N - F_3}{f_{Q_3}} \right) \times C \]

where: Q3 = Third Quartile

L3 = Lower boundary of third quartile class

N = total frequency

F3 = cumulative frequency before the third quartile class

fQ3 = frequency of the third quartile class

C = size of the class interval

After that, the interquartile range is just the third quartile minus the first
quartile.
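The two quartile formulas differ only in the fraction of N used (1/4 or 3/4), so one function can serve for both. The sketch below uses a hypothetical function name, `grouped_quantile`, and made-up classes of equal size.

```python
def grouped_quantile(lower_boundaries, frequencies, class_size, fraction):
    """Estimate the value at `fraction` of the total frequency.

    Applies L + ((fraction * N - F) / f) * C to the first class whose
    cumulative frequency reaches fraction * N.
    """
    total = sum(frequencies)  # N
    target = fraction * total
    cumulative = 0            # F: cumulative frequency before the current class
    for lower, f in zip(lower_boundaries, frequencies):
        if f > 0 and cumulative + f >= target:
            return lower + (target - cumulative) / f * class_size
        cumulative += f
    raise ValueError("at least one class must have a positive frequency")


# Hypothetical classes 0-10, 10-20, 20-30 and 30-40 (N = 20).
lowers = [0, 10, 20, 30]
freqs = [3, 5, 8, 4]
q1 = grouped_quantile(lowers, freqs, 10, 0.25)  # 10 + (5 - 3) / 5 * 10 = 14.0
q3 = grouped_quantile(lowers, freqs, 10, 0.75)  # 20 + (15 - 8) / 8 * 10 = 28.75
print(q1, q3, q3 - q1)                          # interquartile range: 14.75
```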

The second method to find the interquartile range is by drawing an ogive. First draw
an ogive, then divide the cumulative frequency into four equal parts. Find the values corresponding to
the cumulative frequencies of N/4 and 3N/4. Lastly, find the difference between these 2 values. Figure 3
shows an example of an ogive to find the interquartile range.

Figure 3

Range

The range of a set of data is the difference between the largest and smallest values.

b) For each type of measure of central tendency stated in (a), give examples of their uses in daily life.

Uses of Mean

The mean is also known as the simple average. One of the uses of the mean in everyday life is
that students use it to calculate their average marks in an examination. For example, parents
usually ask students for their average result, and not the individual subject results, for the
examination to evaluate how well they performed in class.

Uses of Mode

The mode is the number that occurs most often in the data set and is not skewed when
extreme values are added. One of the uses of the mode in daily life is to find the typical salary in a
company. Some of the posts in a company include the Chief Executive Officer, Manager,
Clerk and minimum wage workers. All of these people earn different amounts of money and
some of the workers earn a lot more than others. The workers who earn a lot of money are
extreme values in the data set and will influence the calculated mean. This is where the mode is the
most useful. It only shows the most frequently occurring salary, which is the mode of the data.

Uses of Median

The median is the middle value of the data set. The median is normally used to summarise a
large amount of data with a single middle value. For example, the government may want to know the
median age of the population. The government just has to arrange the ages in
ascending order and find the middle value.
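The salary example can be illustrated with made-up figures. The salaries below are hypothetical and chosen only to show how one very large value pulls the mean upwards while the mode and median stay close to what most workers earn.

```python
from statistics import mean, median, mode

# Hypothetical monthly salaries: four minimum wage workers, two clerks,
# one manager and one chief executive officer.
salaries = [1500, 1500, 1500, 1500, 2500, 2500, 6000, 30000]

mean_salary = mean(salaries)      # pulled up by the single 30000 salary
mode_salary = mode(salaries)      # the most frequently occurring salary
median_salary = median(salaries)  # average of the 4th and 5th sorted values

print(f"mean = {mean_salary:.0f}")      # mean = 5875
print(f"mode = {mode_salary:.0f}")      # mode = 1500
print(f"median = {median_salary:.0f}")  # median = 2000
```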
