et de la Recherche Scientifique
جامعـة قرطاج
Université de Carthage
Statistics
______________________________________
Assignment 3
______________________________________
Exercise 1:
a) The arithmetic means follow directly from the definition:
$$\bar{x}_D = \frac{1}{10}\sum_{i=1}^{10} x_i = \frac{1}{10}(12.5 + 29.9 + 14.8 + 18.7 + 7.6 + 16.2 + 16.5 + 27.4 + 12.1 + 17.5) = 17.32$$
$$\bar{x}_A = \frac{1}{10}\sum_{i=1}^{10} x_i = \frac{1}{10}(342 + 1245 + 502 + 555 + 398 + 670 + 796 + 912 + 238 + 466) = 612.4$$
The median is the middle value of the ordered observations, so we first sort each variable:
i    1     2     3     4     5     6     7     8     9     10
D    7.6   12.1  12.5  14.8  16.2  16.5  17.5  18.7  27.4  29.9
A    238   342   398   466   502   555   670   796   912   1245
Since n = 10 is even, the median is the average of the fifth and sixth ordered values:
$$\tilde{x}_{0.5,D} = \frac{1}{2}\left(x_{(5)} + x_{(6)}\right) = \frac{1}{2}(16.2 + 16.5) = 16.35$$
$$\tilde{x}_{0.5,A} = \frac{1}{2}\left(x_{(5)} + x_{(6)}\right) = \frac{1}{2}(502 + 555) = 528.5$$
b) With n = 10 and α = 0.25 for the first quartile, nα = 10 · 0.25 = 2.5, which is not an integer. At least 25% of the values must be less than or equal to the quantile and at least 75% greater than or equal to it, so $\tilde{x}_{0.25} = x_{(k)}$, where k is the smallest integer greater than 2.5, that is k = 3:
$$\tilde{x}_{0.25,D} = x_{(3)} = 12.5$$
$$\tilde{x}_{0.25,A} = x_{(3)} = 398$$
In the same way, for the third quartile α = 0.75 and nα = 7.5, so $\tilde{x}_{0.75} = x_{(k)}$ with k = 8, the smallest integer greater than 7.5:
$$\tilde{x}_{0.75,D} = x_{(8)} = 18.7$$
$$\tilde{x}_{0.75,A} = x_{(8)} = 796$$
• Distance:
The distance between the median and the first quartile (16.35 − 12.5 = 3.85) is larger than the distance between the third quartile and the median (18.7 − 16.35 = 2.35). The distribution is therefore not symmetric but skewed to the left.
• Altitude:
The same comparison gives 528.5 − 398 = 130.5, which is much smaller than 796 − 528.5 = 267.5.
Here the direction is reversed: the distribution is not symmetric but skewed to the right.
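The quantile rule of part b) can be sketched in R as follows. The helper name quantile_by_rule is hypothetical (it is not part of R) and assumes 0 < α < 1. For comparison, R's own quantile() with type = 2 follows the same rule, whereas its default (type = 7) interpolates and generally returns different values.

```r
# Hike data from part a): distance and maximum altitude
distance <- c(12.5, 29.9, 14.8, 18.7, 7.6, 16.2, 16.5, 27.4, 12.1, 17.5)
altitude <- c(342, 1245, 502, 555, 398, 670, 796, 912, 238, 466)

# Hypothetical helper: quantile by the rule used in the text.
# If n * alpha is an integer k, average x(k) and x(k + 1);
# otherwise take x(k) with k the smallest integer above n * alpha.
quantile_by_rule <- function(x, alpha) {
  xs <- sort(x)
  na <- length(x) * alpha
  if (abs(na - round(na)) < 1e-9) {
    k <- round(na)
    (xs[k] + xs[k + 1]) / 2
  } else {
    xs[ceiling(na)]
  }
}

q_D <- sapply(c(0.25, 0.5, 0.75), function(a) quantile_by_rule(distance, a))
q_A <- sapply(c(0.25, 0.5, 0.75), function(a) quantile_by_rule(altitude, a))

# The built-in equivalent of this rule
q_D_builtin <- unname(quantile(distance, probs = c(0.25, 0.5, 0.75), type = 2))

# Symmetry check: compare (median - Q1) with (Q3 - median)
left_D  <- q_D[2] - q_D[1]
right_D <- q_D[3] - q_D[2]
left_A  <- q_A[2] - q_A[1]
right_A <- q_A[3] - q_A[2]
```

A larger left-hand difference, as for distance, points to left skewness; a larger right-hand difference, as for altitude, points to right skewness.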
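c) The interquartile ranges:
$$d_{Q,D} = 18.7 - 12.5 = 6.2$$
$$d_{Q,A} = 796 - 398 = 398$$
The absolute median deviation:
$$D_D(\tilde{x}_{0.5,D}) = \frac{1}{10}\sum_{i=1}^{10} |x_i - \tilde{x}_{0.5,D}| = 4.68$$
$$D_A(\tilde{x}_{0.5,A}) = \frac{1}{10}\sum_{i=1}^{10} |x_i - \tilde{x}_{0.5,A}| = 223.2$$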
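The standard deviation:
$$\tilde{s}_A = \sqrt{\tilde{s}_A^2} = \sqrt{\frac{1}{10}\sum_{i=1}^{10}(x_i - \bar{x}_A)^2} = \sqrt{82314.44} \approx 286.9$$
$$\tilde{s}_D = \sqrt{\tilde{s}_D^2} = \sqrt{\frac{1}{10}\sum_{i=1}^{10}(x_i - \bar{x}_D)^2} = \sqrt{41.5} \approx 6.44$$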
The absolute median deviation and the standard deviation both describe how much the observations vary: the former around the median, the latter around the arithmetic mean. For both variables the standard deviation is large relative to the mean (6.44 against 17.32 for distance, 286.9 against 612.4 for altitude), which indicates that the observations are not tightly concentrated around the mean.
d)
The simplest way to answer this question is to use the rule for linear transformations, $\bar{y} = a\,\bar{x}$:
$$\bar{y}_A = \frac{1}{3.28}\,\bar{x}_A = \frac{612.4}{3.28} \approx 186.7$$
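The linear transformation rule can be checked numerically: scaling every observation by a constant a scales the mean by the same constant. A minimal sketch in R, with the factor 1/3.28 taken from the calculation above (the rule holds for any constant):

```r
altitude <- c(342, 1245, 502, 555, 398, 670, 796, 912, 238, 466)

a <- 1 / 3.28                       # conversion factor as used in the text

mean_direct <- mean(a * altitude)   # transform every value, then average
mean_rule   <- a * mean(altitude)   # average first, then transform

same_result <- isTRUE(all.equal(mean_direct, mean_rule))
rounded     <- round(mean_rule, 1)
```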
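e) A box plot summarises the observations through the median, the first quartile, the third quartile, the interquartile range, the minimum and the maximum. All of these are available from the previous parts; it only remains to identify any extreme values, that is, values more than 1.5 interquartile ranges below the first quartile or above the third quartile.
For the distance hiked, each value higher than 18.7 + 1.5 · 6.2 = 28 or lower than 12.5 − 1.5 · 6.2 = 3.2 is extreme, and for the maximum altitude each value higher than 796 + 1.5 · 398 = 1393 or lower than 398 − 1.5 · 398 = −199 is extreme. The hike of 29.9 is therefore the only extreme value for distance, and the altitude data contain none.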
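The limits for extreme values can be computed in R as a check. This is a sketch only: the helper name extreme_values is hypothetical, and the quartiles are taken with quantile(type = 2), which reproduces the hand rule of part b).

```r
distance <- c(12.5, 29.9, 14.8, 18.7, 7.6, 16.2, 16.5, 27.4, 12.1, 17.5)
altitude <- c(342, 1245, 502, 555, 398, 670, 796, 912, 238, 466)

# Hypothetical helper: limits at 1.5 interquartile ranges beyond the
# quartiles, plus the observations falling outside those limits
extreme_values <- function(x) {
  q     <- unname(quantile(x, probs = c(0.25, 0.75), type = 2))
  iqr   <- q[2] - q[1]
  lower <- q[1] - 1.5 * iqr
  upper <- q[2] + 1.5 * iqr
  list(lower = lower, upper = upper, extreme = x[x < lower | x > upper])
}

res_D <- extreme_values(distance)
res_A <- extreme_values(altitude)
```

For distance, res_D$extreme contains only 29.9; for altitude, res_A$extreme is empty.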
f)
Class intervals        (5; 15]   (15; 20]   (20; 30]
n_j                    4         4          2
f_j                    4/10      4/10       2/10
Cumulative ∑ f_j       4/10      8/10       1
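• With the mid-value of the jth class interval defined as $m_j = (e_{j-1} + e_j)/2$, here 10, 17.5 and 25, the weighted arithmetic mean for grouped data is:
$$\bar{x} = \frac{1}{10}\sum_{j=1}^{3} n_j m_j = \sum_{j=1}^{3} f_j m_j = 0.4 \cdot 10 + 0.4 \cdot 17.5 + 0.2 \cdot 25 = 16$$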
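• We define the median class as the class $K_m$ for which the following holds:
$$\sum_{j=1}^{m-1} f_j < 0.5 \quad\text{and}\quad \sum_{j=1}^{m} f_j \geq 0.5$$
With the cumulative relative frequencies 4/10 and 8/10, this is the second class, (15; 20].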
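The grouped-data calculations of part f) can be reproduced in R. The sketch below uses cut() to form the right-closed classes and takes the median class to be the first class whose cumulative relative frequency reaches 0.5; the variable names are illustrative.

```r
distance <- c(12.5, 29.9, 14.8, 18.7, 7.6, 16.2, 16.5, 27.4, 12.1, 17.5)

breaks  <- c(5, 15, 20, 30)
classes <- cut(distance, breaks = breaks)   # (5,15], (15,20], (20,30]

n_j <- as.vector(table(classes))            # absolute frequencies
f_j <- n_j / sum(n_j)                       # relative frequencies
m_j <- (head(breaks, -1) + tail(breaks, -1)) / 2   # class mid-values

grouped_mean <- sum(f_j * m_j)              # weighted arithmetic mean

F_j          <- cumsum(f_j)                 # cumulative relative frequencies
m            <- which(F_j >= 0.5)[1]        # index of the median class
median_class <- levels(classes)[m]
```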
g) In R, all of these quantities can be obtained with predefined functions:
• We use mean() and median() to calculate the mean and the median.
• We use the function quantile() to find the first and third quartiles. Its argument probs gives the proportion of the data that lies at or below the requested quantile (0.25 and 0.75 here).
• The interquartile ranges can be calculated as the differences between the quartiles obtained above.
• Base R has no built-in function for the absolute median deviation as defined above (mad() computes the median, not the mean, of the absolute deviations and scales it by a constant), so we program our own function.
• The function var() in R returns the variance computed with the divisor n − 1 instead of n. We therefore multiply its result by (n − 1)/n and then apply the function sqrt() to obtain the standard deviation.
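The steps of part g) might look as follows in R. This is a sketch rather than the original code: the helper names amd and sd_n are made up for illustration, and type = 2 is chosen in quantile() so that the quartiles agree with the hand calculation in part b).

```r
distance <- c(12.5, 29.9, 14.8, 18.7, 7.6, 16.2, 16.5, 27.4, 12.1, 17.5)
altitude <- c(342, 1245, 502, 555, 398, 670, 796, 912, 238, 466)

# Mean and median
mean_D <- mean(distance);   mean_A <- mean(altitude)
med_D  <- median(distance); med_A  <- median(altitude)

# First and third quartiles, then the interquartile ranges
q_D   <- unname(quantile(distance, probs = c(0.25, 0.75), type = 2))
q_A   <- unname(quantile(altitude, probs = c(0.25, 0.75), type = 2))
iqr_D <- q_D[2] - q_D[1]
iqr_A <- q_A[2] - q_A[1]

# Absolute median deviation: mean absolute distance from the median
amd <- function(x) mean(abs(x - median(x)))

# Standard deviation with divisor n: rescale var(), which divides by n - 1
sd_n <- function(x) {
  n <- length(x)
  sqrt(var(x) * (n - 1) / n)
}

amd_D <- amd(distance);  amd_A <- amd(altitude)
sd_D  <- sd_n(distance); sd_A  <- sd_n(altitude)
```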
Exercise 2:
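a) To find the average growth rate of the membership, we first compute the growth factors:
year               2011   2012    2013    2014    2015   2016
members            23     24      27      25      30     28
Growth factor      --     1.043   1.125   0.926   1.2    0.933
Growth rate (%)    --     4.3     12.5    −7.4    20     −6.7
Growth factors are averaged with the geometric mean rather than with the arithmetic mean of the growth rates:
$$\bar{x}_G = (1.043 \cdot 1.125 \cdot 0.926 \cdot 1.2 \cdot 0.933)^{1/5} = \left(\frac{28}{23}\right)^{1/5} \approx 1.040$$
The average growth rate is therefore about 4.0% per year. The arithmetic mean of the five growth rates, about 4.5%, would overstate it.
b) We can use the average growth factor to estimate the number of members in 2018:
$$N(2018) = \bar{x}_G \cdot N(2017) = \bar{x}_G \cdot \bar{x}_G \cdot N(2016) = 1.040^2 \cdot 28 \approx 30.3 \approx 30$$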
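The growth calculation can be sketched in R as below. The sketch uses the geometric mean of the growth factors, the usual average for multiplicative changes, so its result differs from an arithmetic mean of the growth rates. The variable names are illustrative.

```r
members <- c(23, 24, 27, 25, 30, 28)
names(members) <- 2011:2016

# Growth factor: each year's membership divided by the previous year's
growth_factor <- members[-1] / members[-length(members)]
growth_rate   <- (growth_factor - 1) * 100          # in percent

# Geometric mean of the five growth factors; equals (28 / 23)^(1 / 5)
geo_mean <- prod(growth_factor)^(1 / length(growth_factor))

# Projection two years beyond 2016
proj_2018 <- unname(members["2016"]) * geo_mean^2
```

Rounding growth_factor to three decimals gives the factors in the table, and round(proj_2018) gives the whole-number estimate.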
c) We could apply the growth factor again, but it makes little sense to extrapolate to 2025 from only six years of observations (2011 to 2016), so the result would be unrealistic. Even within those six years the number of members increased in some years and decreased in others, so it is impossible to be sure that it will keep increasing over the next nine years.
Exercise 3:
• In graph (a), the quantiles for men are higher than those for women. The salary distributions of men and women are therefore not equal: men earn more.
• In graph (b), the quantiles coincide approximately with the bisecting line, except between 80 and 95 months, where the values for men are higher. The distribution of length of service is therefore approximately the same for men and women.
Finally, taking the two graphs together, we can conclude that in this company men and women have similar lengths of service but not the same salaries, which suggests that its pay policy favours men.
Exercise 4:
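a)
Each speed applies to a given distance rather than to a given time, so the right average is the harmonic mean of the speeds, weighted by the share of the total distance (418 km) driven at each speed:
$$\bar{x}_H = \frac{1}{\frac{180}{418}\cdot\frac{1}{48} + \frac{117}{418}\cdot\frac{1}{37} + \frac{121}{418}\cdot\frac{1}{52}} \approx 45.24$$
b) $8 \cdot \bar{x}_H \approx 361.9$ km < 418 km, so the bus cannot finish the trip in time.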
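A short R sketch of the average-speed calculation, using the distance-weighted harmonic mean and assuming an 8-hour limit as in part b). The distance-weighted arithmetic mean is computed alongside for comparison only; it is larger because it ignores that more time is spent on the slower sections. The variable names are illustrative.

```r
speed   <- c(48, 37, 52)      # speed on each section
dist_km <- c(180, 117, 121)   # length of each section in km

w <- dist_km / sum(dist_km)   # share of the total distance

# Weighted harmonic mean: total distance divided by total travel time
harmonic_mean <- 1 / sum(w / speed)

# Total travel time in hours; the same as sum(dist_km / speed)
travel_time <- sum(dist_km) / harmonic_mean

# Distance covered in 8 hours at the average speed
covered_8h <- 8 * harmonic_mean

# For comparison only: distance-weighted arithmetic mean
arithmetic_mean <- sum(w * speed)
```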
Exercise 5:
a) A quick way to get an overview of the data is the function summary(), which returns the minimum, maximum, mean, median, and first and third quartiles.
b) These results tell us that 99% of the pizzas are delivered within 48.62 min and that 99% have a temperature of at most 79.87 °C. Only 1% of the pizzas take longer than 48.62 min, and only 1% have a temperature higher than 79.87 °C.
c) As already noted in Exercise 1 for the absolute median deviation, R has no built-in function that returns the absolute mean deviation, so we program it ourselves.
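One possible way to write such a function is shown below. The name amd_mean and the small example vector are hypothetical; the pizza delivery data are not reproduced here.

```r
# Absolute mean deviation: average absolute distance from the arithmetic mean
amd_mean <- function(x) {
  mean(abs(x - mean(x)))
}

# Hypothetical example values, not taken from the pizza data
example <- c(2, 4, 6, 8)
result  <- amd_mean(example)   # mean is 5, deviations are 3, 1, 1, 3
```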
e) We can use the function boxplot() with the parameter range, which controls whether extreme values are plotted and, if so, how such values are defined. In our case we do not want any extreme values to be shown.
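The effect of range can be illustrated as follows. Setting range = 0 makes the whiskers extend to the minimum and maximum, so no points are drawn separately. The vector below is hypothetical, and plot = FALSE is used only to inspect the computed statistics; leaving it out draws the plot.

```r
# Hypothetical values with one clearly unusual observation
x <- c(1, 2, 3, 4, 5, 50)

# Default (range = 1.5): values beyond 1.5 box lengths are flagged
default_stats <- boxplot(x, plot = FALSE)

# range = 0: whiskers reach the data extremes, nothing is flagged
no_extremes <- boxplot(x, range = 0, plot = FALSE)

flagged_default <- default_stats$out      # contains 50
flagged_range0  <- no_extremes$out        # empty
upper_whisker   <- no_extremes$stats[5]   # equals max(x)
```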
f) We use the cut() command to create a variable with the categories (10, 20], (20, 30], (30, 40], (40, 50], (50, 60]. We then take the interval mid-points and the relative frequencies of the classes, which form the arguments of the function weighted.mean().
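A minimal sketch of these steps, assuming a hypothetical vector of delivery times in minutes (the actual pizza data are not reproduced here):

```r
# Hypothetical delivery times in minutes
time <- c(12, 18, 19, 25, 27, 33, 41, 47, 52, 58)

breaks   <- seq(10, 60, by = 10)
time_cat <- cut(time, breaks = breaks)   # (10,20], (20,30], ..., (50,60]

rel_freq   <- as.vector(table(time_cat)) / length(time)   # relative frequencies
mid_points <- (head(breaks, -1) + tail(breaks, -1)) / 2   # 15, 25, 35, 45, 55

# Mean for grouped data: mid-points weighted by relative frequencies
grouped_mean <- weighted.mean(mid_points, rel_freq)
```

Because weighted.mean() normalises its weights, passing the absolute frequencies instead of the relative ones gives the same result.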