
Ministère de l'Enseignement Supérieur et de la Recherche Scientifique
Université de Carthage
Ecole Polytechnique de Tunisie

Statistics
______________________________________

Assignment 3
______________________________________

Developed by: Houcem Ben Salem

Academic year: 2019 – 2020

Rue Elkhawarezmi, BP 743, La Marsa 2078
Tel: 71 774 611 -- 71 774 699   Fax: 71 748 843
Website: www.ept.rnu.tn

Exercise 1:
a) The arithmetic mean follows directly from its definition:

$$\bar{x}_D = \frac{1}{10}\sum_{i=1}^{10} x_i = \frac{1}{10}(12.5 + 29.9 + 14.8 + 18.7 + 7.6 + 16.2 + 16.5 + 27.4 + 12.1 + 17.5) = 17.32$$

$$\bar{x}_A = \frac{1}{10}\sum_{i=1}^{10} x_i = \frac{1}{10}(342 + 1245 + 502 + 555 + 398 + 670 + 796 + 912 + 238 + 466) = 612.4$$

The median is the middle value of the ordered observations, so we first sort the data in ascending order and then read off the median:

i    1     2     3     4     5     6     7     8     9     10
D    7.6   12.1  12.5  14.8  16.2  16.5  17.5  18.7  27.4  29.9
A    238   342   398   466   502   555   670   796   912   1245

$$\tilde{x}_{0.5,D} = \frac{1}{2}\left(x_{(5)} + x_{(6)}\right) = \frac{1}{2}(16.2 + 16.5) = 16.35$$

$$\tilde{x}_{0.5,A} = \frac{1}{2}\left(x_{(5)} + x_{(6)}\right) = \frac{1}{2}(502 + 555) = 528.5$$
b) Here n = 10, and for the first quartile α = 0.25, so nα = 10 · 0.25 = 2.5, which is not an integer. At least 25% of the values must be less than or equal to the quantile (here, the quartile) and at least 75% greater than or equal to it, so $\tilde{x}_{0.25} = x_{(k)}$, where k is the smallest integer greater than 2.5, i.e. k = 3:

$$\tilde{x}_{0.25,D} = x_{(3)} = 12.5, \qquad \tilde{x}_{0.25,A} = x_{(3)} = 398$$

In the same way, for the third quartile α = 0.75 and nα = 7.5, so $\tilde{x}_{0.75} = x_{(k)}$ with k = 8, the smallest integer greater than 7.5:

$$\tilde{x}_{0.75,D} = x_{(8)} = 18.7, \qquad \tilde{x}_{0.75,A} = x_{(8)} = 796$$
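The quartile rule above can be sketched in R as follows. This is a minimal illustration, not a built-in: `quantile_rule` is a hypothetical helper name, and the case where nα is an integer (averaging two neighbouring order statistics) is assumed to follow the same convention as the median in part a).

```r
# Data from part a)
distance <- c(12.5, 29.9, 14.8, 18.7, 7.6, 16.2, 16.5, 27.4, 12.1, 17.5)
altitude <- c(342, 1245, 502, 555, 398, 670, 796, 912, 238, 466)

# Hypothetical helper implementing the rule for 0 < alpha < 1:
# - n * alpha not an integer: take x_(k), k = smallest integer above n * alpha
# - n * alpha an integer:     average x_(n * alpha) and x_(n * alpha + 1)
quantile_rule <- function(x, alpha) {
  xs <- sort(x)
  na <- length(xs) * alpha
  if (abs(na - round(na)) < 1e-9) {
    k <- round(na)
    (xs[k] + xs[k + 1]) / 2
  } else {
    xs[ceiling(na)]
  }
}

quantile_rule(distance, 0.25)  # 12.5
quantile_rule(distance, 0.75)  # 18.7
quantile_rule(altitude, 0.25)  # 398
quantile_rule(altitude, 0.75)  # 796
```

With alpha = 0.5 the same helper returns the medians 16.35 and 528.5 found in part a).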

• Distance:
The difference between the median and the first quartile (16.35 − 12.5 = 3.85) is larger than the difference between the third quartile and the median (18.7 − 16.35 = 2.35). This tells us that the distribution is not symmetric but skewed towards the left.
• Altitude:
The same comparison gives 528.5 − 398 = 130.5 < 796 − 528.5 = 267.5.
This distribution is therefore not symmetric either, but here it is skewed towards the right.
c)

The interquartile range:

$$d_{Q,A} = 796 - 398 = 398$$

$$d_{Q,D} = 18.7 - 12.5 = 6.2$$
The absolute median deviation:

$$D_D(\tilde{x}_{0.5,D}) = \frac{1}{10}\sum_{i=1}^{10} |x_i - \tilde{x}_{0.5,D}| = 4.68$$

$$D_A(\tilde{x}_{0.5,A}) = \frac{1}{10}\sum_{i=1}^{10} |x_i - \tilde{x}_{0.5,A}| = 223.2$$
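The standard deviation:

$$\tilde{s}_A = \sqrt{\tilde{s}^2_A} = \sqrt{\frac{1}{10}\sum_{i=1}^{10}(x_i - \bar{x}_A)^2} = \sqrt{82314.44} \approx 286.9$$

$$\tilde{s}_D = \sqrt{\tilde{s}^2_D} = \sqrt{\frac{1}{10}\sum_{i=1}^{10}(x_i - \bar{x}_D)^2} = \sqrt{41.5} \approx 6.44$$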

The absolute median deviation and the standard deviation both describe how much the observations vary: the former measures dispersion around the median, the latter around the arithmetic mean. For both variables the standard deviation is large relative to the mean (6.44 against 17.32 for distance, 286.9 against 612.4 for altitude), which indicates that the observations are not tightly concentrated around the mean.
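As a check on the hand calculations in part c), the two dispersion measures can be written out directly from their formulas. The function names `abs_median_dev` and `sd_n` are hypothetical, chosen only for this sketch.

```r
# Data from part a)
distance <- c(12.5, 29.9, 14.8, 18.7, 7.6, 16.2, 16.5, 27.4, 12.1, 17.5)
altitude <- c(342, 1245, 502, 555, 398, 670, 796, 912, 238, 466)

# Absolute median deviation: mean absolute distance from the median
abs_median_dev <- function(x) mean(abs(x - median(x)))

# Standard deviation with divisor n (not n - 1), as in the formulas above
sd_n <- function(x) sqrt(mean((x - mean(x))^2))

round(abs_median_dev(distance), 2)  # 4.68
round(abs_median_dev(altitude), 1)  # 223.2
round(sd_n(distance), 2)            # 6.44
round(sd_n(altitude), 1)            # 286.9
```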
d)
We apply the rule for linear transformations of the arithmetic mean (if $y_i = a + b x_i$, then $\bar{y} = a + b \bar{x}$):

$$\bar{y}_A = \frac{1}{3.28}\,\bar{x}_A \approx 186.7$$
e) A box plot summarises the observations through the median, the first quartile, the third quartile, the interquartile range, the minimum and the maximum. All of this information is available from the previous parts; what remains is to identify any extreme values, i.e. values lying more than 1.5 interquartile ranges below the first quartile or above the third quartile.
For the distance hiked, a value is extreme if it is higher than 18.7 + 1.5 · 6.2 = 28 or lower than 12.5 − 1.5 · 6.2 = 3.2. For the maximum altitude, a value is extreme if it is higher than 796 + 1.5 · 398 = 1393 or lower than 398 − 1.5 · 398 = −199. Only the distance of 29.9 is extreme; none of the altitudes is.
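The extreme-value thresholds can be recomputed from the raw data with a short R sketch. The helper `extreme_values` is hypothetical, and `type = 2` in `quantile()` is an assumption made here because it reproduces the quartile rule of part b) for these data.

```r
# Data from part a)
distance <- c(12.5, 29.9, 14.8, 18.7, 7.6, 16.2, 16.5, 27.4, 12.1, 17.5)
altitude <- c(342, 1245, 502, 555, 398, 670, 796, 912, 238, 466)

# Hypothetical helper: fences at 1.5 * IQR beyond the quartiles,
# plus the observations that fall outside them.
extreme_values <- function(x) {
  q <- quantile(x, c(0.25, 0.75), type = 2, names = FALSE)
  iqr <- q[2] - q[1]
  lower <- q[1] - 1.5 * iqr
  upper <- q[2] + 1.5 * iqr
  list(lower = lower, upper = upper, extreme = x[x < lower | x > upper])
}

extreme_values(distance)  # fences about 3.2 and 28; extreme value 29.9
extreme_values(altitude)  # fences -199 and 1393; no extreme values
```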

f)
Class intervals   (5; 15]   (15; 20]   (20; 30]
n_j               4         4          2
f_j               4/10      4/10       2/10
∑ f_j             4/10      8/10       1

• With the mid-value of the jth class interval defined as $m_j = (e_{j-1} + e_j)/2$, the weighted arithmetic mean for grouped data is:

$$\bar{x} = \frac{1}{10}\sum_{j=1}^{3} n_j m_j = \sum_{j=1}^{3} f_j m_j = 0.4 \cdot 10 + 0.4 \cdot 17.5 + 0.2 \cdot 25 = 16$$
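• We define the median class as the class $K_m$ for which the following holds:

$$\sum_{j=1}^{m-1} f_j < 0.5 \quad \text{and} \quad \sum_{j=1}^{m} f_j \ge 0.5$$

In our case m = 2. We can then determine the median as

$$\tilde{x}_{0.5} = e_{m-1} + \frac{0.5 - \sum_{j=1}^{m-1} f_j}{f_m}\, d_m$$

where $d_m$ is the width of the median class. In our case $\tilde{x}_{0.5} = 15 + \frac{0.5 - 0.4}{0.4} \cdot 5 = 16.25$.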
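The grouped-data calculations of part f) can be reproduced with the following R sketch. The variable names are illustrative; the class boundaries are the ones from the table above.

```r
# Distances from part a) and the class boundaries e_0, ..., e_3 of part f)
distance <- c(12.5, 29.9, 14.8, 18.7, 7.6, 16.2, 16.5, 27.4, 12.1, 17.5)
edges <- c(5, 15, 20, 30)

# cut() uses right-closed intervals by default: (5,15], (15,20], (20,30]
n_j <- as.vector(table(cut(distance, breaks = edges)))  # 4 4 2
f_j <- n_j / sum(n_j)                                   # relative frequencies
m_j <- (head(edges, -1) + tail(edges, -1)) / 2          # mid-values 10, 17.5, 25
d_j <- diff(edges)                                      # class widths 10, 5, 10

# Weighted arithmetic mean for grouped data
grouped_mean <- sum(f_j * m_j)                          # 16

# Median class: first class whose cumulative relative frequency reaches 0.5
cum_f <- cumsum(f_j)
m <- which(cum_f >= 0.5)[1]                             # 2
below <- if (m > 1) cum_f[m - 1] else 0
# edges[m] is e_{m-1} because edges[1] holds e_0
grouped_median <- edges[m] + (0.5 - below) / f_j[m] * d_j[m]  # 16.25
```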

g) In R, all of these quantities can be obtained with built-in (predefined) functions.

• We use mean() and median() to calculate the mean and the median.

• We use the function quantile() to find the first and third quartiles, passing the desired proportions (0.25 and 0.75) through the argument probs.

The values returned by this function are not exactly the same as in part b), because quantile() offers several methods for computing quantiles (selected through its type argument), and each of them can give a slightly different result. With a different method we could reproduce the results of part b).

• The interquartile ranges can be calculated by means of the difference of the quantiles
obtained above.

• R has no built-in function for the absolute median deviation as defined in part c), so we program our own function.

• The function var() in R computes the variance with the divisor n − 1 instead of n. We therefore multiply the result by (n − 1)/n and then apply the function sqrt() to obtain our standard deviation.

• We calculate the weighted mean of the grouped data with weighted.mean(), using the class mid-values and the relative frequencies.

• We use the function boxplot() to draw our box plots.
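The original R output is not reproduced here, so the following is only a sketch of how the built-in functions named above might be called on the Exercise 1 data. The choice of `type = 2` is an assumption: it is one of the quantile methods that reproduces the hand-computed quartiles for these data.

```r
distance <- c(12.5, 29.9, 14.8, 18.7, 7.6, 16.2, 16.5, 27.4, 12.1, 17.5)
altitude <- c(342, 1245, 502, 555, 398, 670, 796, 912, 238, 466)
n <- length(distance)

mean(distance);  median(distance)   # 17.32 and 16.35
mean(altitude);  median(altitude)   # 612.4 and 528.5

# Default method (type = 7) interpolates between order statistics
q_default <- quantile(distance, probs = c(0.25, 0.75), names = FALSE)  # 13.075, 18.4
# type = 2 matches the rule used by hand in part b)
q_manual <- quantile(distance, probs = c(0.25, 0.75), type = 2, names = FALSE)  # 12.5, 18.7

# Interquartile range as the difference of the quartiles
iqr_manual <- diff(q_manual)        # 6.2

# var() divides by n - 1; rescale to the divisor n before taking the root
sd_distance <- sqrt(var(distance) * (n - 1) / n)   # about 6.44

# Weighted mean of the class mid-values from part f)
w_mean <- weighted.mean(c(10, 17.5, 25), c(4, 4, 2) / 10)   # 16

# Box plot of the distances (whiskers use the default range = 1.5)
boxplot(distance, main = "Distance")
```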

Exercise 2:
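a) To find the average growth rate of the membership, we first compute the growth factors:
year              2011   2012    2013    2014    2015   2016
members           23     24      27      25      30     28
Growth factor     --     1.043   1.125   0.926   1.2    0.933
Growth rate (%)   --     4.3     12.5    -7.4    20     -6.7
Growth factors multiply from year to year, so their average is the geometric mean: $\bar{x}_G = (1.043 \cdot 1.125 \cdot 0.926 \cdot 1.2 \cdot 0.933)^{1/5} = (28/23)^{1/5} \approx 1.04$, i.e. an average growth rate of about 4% per year. (The arithmetic mean of the five growth rates, about 4.5%, would overstate the growth.)
b) We can use the average growth factor to project the number of members in 2018:
$N(2018) = \bar{x}_G \cdot N(2017) = \bar{x}_G^2 \cdot N(2016) = 1.04^2 \cdot 28 \approx 30.3 \approx 30$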
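A minimal R sketch of the growth calculation, assuming the average growth factor is taken as the geometric mean of the yearly growth factors (the arithmetic mean of the yearly rates comes out somewhat higher). Variable names are illustrative.

```r
members <- c(23, 24, 27, 25, 30, 28)   # 2011 to 2016

# Year-on-year growth factors: members in year t divided by members in year t - 1
growth <- members[-1] / members[-length(members)]

# Geometric mean of the growth factors; equals (28 / 23)^(1 / 5)
g_mean <- prod(growth)^(1 / length(growth))   # about 1.040

avg_rate_percent <- (g_mean - 1) * 100        # about 4.0

# Two further years of average growth starting from 2016
n_2018 <- members[length(members)] * g_mean^2 # about 30.3
```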
c) We could apply the average growth factor again, but six years of observations (2011 to 2016) are not a sound basis for an estimate for 2025, so the result would be unrealistic. Even within those six years the number of members increased in some years and decreased in others, so we cannot be sure that it will keep increasing over the next 9 years.

Exercise 3:

• In graph (a), the quantiles for men are higher than those for women. This means that the salary distributions of men and women are not equal: men earn more.
• In graph (b), the quantiles lie approximately on the bisection line, except between 80 and 95 months, where the quantiles for men are higher. This means that the distribution of length of service is approximately the same for men and women.
• Finally, taking the two graphs together, we can conclude that in this company men and women have similar lengths of service but not the same salaries, which suggests that its pay policy favours men.

Exercise 4:

a)

The speeds apply to stretches of distance, not of time, so the right average is the weighted harmonic mean, i.e. the total distance divided by the total travel time:

$$\bar{x}_H = \frac{418}{\frac{180}{48} + \frac{117}{37} + \frac{121}{52}} \approx 45.24 \text{ km/h}$$

b) $8 \cdot \bar{x}_H \approx 361.9$ km < 418 km, so the bus cannot finish the trip in time.
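The average speed can also be checked in R as total distance over total travel time, which is the distance-weighted harmonic mean of the speeds. This is a sketch under the assumption that the speeds are in km/h and the 8 in part b) is a time limit in hours.

```r
km    <- c(180, 117, 121)   # length of each stretch
speed <- c(48, 37, 52)      # speed on each stretch (assumed km/h)

hours <- km / speed                   # travel time per stretch
avg_speed <- sum(km) / sum(hours)     # harmonic mean, about 45.24
total_time <- sum(hours)              # about 9.24 hours

# Distance covered in 8 hours at the average speed
reach_8h <- 8 * avg_speed             # about 361.9 km, less than 418 km
```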

Exercise 5:

a) A convenient way to get an overview of the data is the function summary(), which returns the minimum, first quartile, median, mean, third quartile and maximum.

b) All we have to do is use the function quantile() with the 99% proportion.

• These results tell us that 99% of the pizzas are delivered within 48.62 min and only 1% take longer, and that 99% of the pizzas arrive with a temperature of at most 79.87 °C and only 1% with a higher temperature.

c) As for the absolute median deviation in Exercise 1, R has no built-in function that gives the absolute mean deviation, so we program it as follows:
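The original function is not reproduced in this text, so the following is only one possible sketch. The name `amd` is hypothetical, and the temperatures are made-up illustrative values, not the pizza delivery data.

```r
# Absolute mean deviation: average absolute distance of the values from their mean
amd <- function(x) mean(abs(x - mean(x)))

# Hypothetical temperatures for illustration only (not the pizza data)
temperature_example <- c(60, 65, 70, 75, 80)
amd(temperature_example)   # 6
```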

d) All we have to do is use the functions scale(), mean() and var().
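A short sketch of how these three functions fit together, assuming the task is to standardise a variable and confirm that the result has mean 0 and variance 1. The values are made up for illustration and are not the pizza delivery data.

```r
# Hypothetical temperatures for illustration only (not the pizza data)
temperature_example <- c(60, 65, 70, 75, 80)

# scale() subtracts the mean and divides by sd(), which uses the divisor n - 1;
# as.vector() drops the matrix structure and attributes that scale() returns
z <- as.vector(scale(temperature_example))

mean(z)   # 0 (up to rounding error)
var(z)    # 1
```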

e) We can use the function boxplot() with the parameter range, which controls whether extreme values are plotted and, if so, how such values are defined. In our case we do not want any extreme values to be plotted.

f) We use the cut() command to create a variable with the categories (10, 20], (20, 30], (30, 40], (40, 50], (50, 60]. We then take the interval mid-points and the relative frequencies of each class, which are the arguments of the function weighted.mean().
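The steps in f) can be sketched as follows. The delivery times are made-up illustrative values, not the pizza delivery data, and the variable names are hypothetical.

```r
# Hypothetical delivery times in minutes, for illustration only
time_example <- c(12, 18, 25, 27, 33, 38, 41, 44, 52, 58)

breaks <- c(10, 20, 30, 40, 50, 60)

# cut() builds right-closed classes (10,20], (20,30], ..., (50,60]
time_class <- cut(time_example, breaks = breaks)

# Relative frequency of each class and the class mid-points
rel_freq  <- as.vector(table(time_class)) / length(time_example)
midpoints <- (head(breaks, -1) + tail(breaks, -1)) / 2   # 15 25 35 45 55

# Weighted mean of the mid-points, weighted by the relative frequencies
grouped_mean <- weighted.mean(midpoints, rel_freq)       # 35
```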
