
STATISTICS 3001 - CLASSES 15/21

PRACTICAL SESSION 2

Exercise 1

Consider the following frequency distribution:

1. Construct a relative frequency distribution and draw the associated histogram.

2. Construct a cumulative relative frequency distribution and draw the associated ogive diagram.

3. What is the relation between the two diagrams? Explain.

4. Compute $\mathrm{Fr}(X > 37)$.

5. BONUS: Compute the (approximate) mean associated with the above frequency distribution.

SOL

1. We observe that the total number of observations is 49 and construct the following table:

Class      Rel. Freq.   Freq. dens.
[0,10)     0.163        0.0163
[10,20)    0.204        0.0204
[20,30)    0.265        0.0265
[30,40)    0.245        0.0245
[40,50)    0.122        0.0122

Then we draw the following histogram:
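
As a minimal sketch, the table and the histogram can be reproduced in R. The absolute
frequencies used below (8, 10, 13, 12, 6) are an assumption: they are inferred from the
relative frequencies and the total of 49, because the original frequency table is not shown
here. All variable names are illustrative.

```r
# Class boundaries taken from the table above.
breaks <- seq(0, 50, by = 10)

# ASSUMPTION: absolute frequencies inferred from the relative frequencies
# and the total n = 49; they are not given explicitly in this handout.
counts <- c(8, 10, 13, 12, 6)

n         <- sum(counts)        # total number of observations
rel_freq  <- counts / n         # relative frequencies
width     <- diff(breaks)       # class widths (all equal to 10)
freq_dens <- rel_freq / width   # frequency densities

tab <- data.frame(
  class     = paste0("[", head(breaks, -1), ",", tail(breaks, -1), ")"),
  rel_freq  = round(rel_freq, 3),
  freq_dens = round(freq_dens, 4)
)
print(tab)

# Histogram from grouped data: adjacent bars whose heights are the densities,
# so that the area of each bar equals the relative frequency of its class.
# Drawn only in an interactive session (e.g. RStudio).
if (interactive()) {
  barplot(freq_dens, width = width, space = 0, names.arg = tab$class,
          xlab = "Class", ylab = "Frequency density")
}
```

Because all classes have the same width here, the bars of the density histogram have the same
shape as those of a plot of the relative frequencies; only the vertical scale changes.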

2. Starting from the previous table we obtain

Class      Cum. Rel. Freq.
[0,10)     0.163
[10,20)    0.367
[20,30)    0.632
[30,40)    0.877
[40,50)    1

and draw the ogive diagram:
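
A short R sketch of this step, assuming the rounded relative frequencies from the table in
point 1 as input (variable names are illustrative). With rounded inputs the last cumulative
value comes out as 0.999 rather than exactly 1, which is why the sketch sets the final ogive
point to 1 explicitly.

```r
# Class boundaries and rounded relative frequencies from the table in point 1.
breaks   <- seq(0, 50, by = 10)
rel_freq <- c(0.163, 0.204, 0.265, 0.245, 0.122)

# Cumulative relative frequencies at the upper boundary of each class.
cum_rel_freq <- cumsum(rel_freq)
print(round(cum_rel_freq, 3))   # 0.163 0.367 0.632 0.877 0.999

# Points of the ogive: it starts at 0 at the lower boundary of the first class.
# The last point is set to 1 to remove the rounding error in the inputs.
ogive_x <- breaks
ogive_y <- c(0, cum_rel_freq[-length(cum_rel_freq)], 1)

# Ogive: the points joined by straight segments.
# Drawn only in an interactive session (e.g. RStudio).
if (interactive()) {
  plot(ogive_x, ogive_y, type = "b",
       xlab = "x", ylab = "Cumulative relative frequency")
}
```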


3. The histogram illustrates how the (relative) frequency is distributed among the classes,
treating the frequency of each class as spread uniformly over the corresponding interval.
The ogive diagram is the graph of the associated cumulative relative frequency function
$\mathrm{Fr}(X \le x)$: at each value $x$, the value $y$ of the function shown in the ogive
corresponds to the total area of the rectangular regions to the left of $x$ in the histogram.
For example, the area of the red-shaded region in the histogram corresponds to $y$:

[Figure: histogram with the region to the left of $x$ shaded in red]

while in the ogive diagram we can read the same value $y$ on the vertical axis:

[Figure: ogive diagram with the point $(x, y)$ marked]

In the ogive diagram, on each interval (i.e. each class) $[x_1, x_2)$, the slope of the linear
piece joining the points $(x_1, y_1)$ and $(x_2, y_2)$, where $y_1 = \mathrm{Fr}(X \le x_1)$ and
$y_2 = \mathrm{Fr}(X \le x_2)$, corresponds to the frequency density associated with that
interval. This can be easily verified using the formula for the line through two points
\[
\frac{x - x_1}{x_2 - x_1} = \frac{y - y_1}{y_2 - y_1}
\iff
y = \left( y_1 - \frac{y_2 - y_1}{x_2 - x_1}\, x_1 \right) + \frac{y_2 - y_1}{x_2 - x_1}\, x ,
\]
where $x_1 \le x < x_2$; we can observe that:
\[
\frac{y_2 - y_1}{x_2 - x_1}
= \frac{\text{Rel. freq. of the class } [x_1, x_2)}{\text{Width of the class } [x_1, x_2)}
= \text{freq. dens. of the class } [x_1, x_2).
\]

From the above, we also learn that for any $x$ inside the interval (i.e. the class) $[x_1, x_2)$
the cumulative relative frequency at $x$ is given by
\[
\mathrm{Fr}(X \le x) = y
= \left( y_1 - \frac{y_2 - y_1}{x_2 - x_1}\, x_1 \right) + \frac{y_2 - y_1}{x_2 - x_1}\, x
= y_1 + (x - x_1) \times \frac{y_2 - y_1}{x_2 - x_1} .
\]
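
The interpolation formula can be sketched in R as follows. The function name
`cum_rel_freq_at` is hypothetical, and the boundary values are the rounded cumulative relative
frequencies from the table in point 2. The sketch assumes $0 \le x \le 50$.

```r
# Cumulative relative frequencies at the class boundaries (table in point 2).
x_knots <- c(0, 10, 20, 30, 40, 50)
y_knots <- c(0, 0.163, 0.367, 0.632, 0.877, 1)

# Fr(X <= x): linear interpolation inside the class [x1, x2) that contains x.
# Hypothetical helper, valid for 0 <= x <= 50.
cum_rel_freq_at <- function(x) {
  i  <- findInterval(x, x_knots, rightmost.closed = TRUE)  # index of x1
  x1 <- x_knots[i]
  y1 <- y_knots[i]
  x2 <- x_knots[i + 1]
  y2 <- y_knots[i + 1]
  y1 + (x - x1) * (y2 - y1) / (x2 - x1)
}

# Halfway through [20,30): 0.367 + 5 * 0.0265 = 0.4995
print(cum_rel_freq_at(25))

# Base R's approx() performs the same linear interpolation.
print(approx(x_knots, y_knots, xout = 25)$y)
```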

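4. Using areas/histograms, we can compute
\[
\begin{aligned}
\mathrm{Fr}(X > 37) &= \mathrm{Fr}(37 < X < 40) + \mathrm{Fr}(40 \le X < 50) \\
&= (40 - 37) \times 0.0245 + 0.123 = \mathbf{0.1965},
\end{aligned}
\]
where $0.123 = 1 - 0.877$ is the relative frequency of the class $[40, 50)$ implied by the
rounded cumulative table.

Equivalently, using the ogive/cumulative relative frequency we can compute
\[
\mathrm{Fr}(X > 37) = 1 - \mathrm{Fr}(X \le 37)
= 1 - \left[ 0.632 + (37 - 30) \times \frac{0.877 - 0.632}{40 - 30} \right]
= \mathbf{0.1965}.
\]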
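
As a quick numerical check, both routes can be evaluated in R. This is only a sketch using the
rounded values quoted above; the variable names are illustrative.

```r
# Route 1: areas in the histogram.
dens_30_40 <- 0.0245   # frequency density of the class [30,40)
rel_40_50  <- 0.123    # relative frequency of [40,50), taken as 1 - 0.877
via_hist   <- (40 - 37) * dens_30_40 + rel_40_50

# Route 2: linear interpolation on the ogive between x = 30 and x = 40.
F30 <- 0.632           # Fr(X <= 30)
F40 <- 0.877           # Fr(X <= 40)
via_ogive <- 1 - (F30 + (37 - 30) * (F40 - F30) / (40 - 30))

# Both routes give 0.1965.
print(c(via_hist, via_ogive))
```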

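5. Using the class midpoints as representative values, we need to compute
\[
5 \times 0.163 + 15 \times 0.204 + 25 \times 0.265 + 35 \times 0.245 + 45 \times 0.122 \approx 24.59 .
\]
The value 24.59 is obtained with the unrounded relative frequencies; the three-decimal values
shown above give 24.565.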
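
A minimal R sketch of the grouped-data mean. The absolute frequencies (8, 10, 13, 12, 6) are
an assumption: they are inferred from the relative frequencies and the total of 49, since the
original frequency table is not shown here.

```r
# Class midpoints, used as representative values of each class.
midpoints <- c(5, 15, 25, 35, 45)

# With the rounded relative frequencies from the table in point 1.
rel_freq     <- c(0.163, 0.204, 0.265, 0.245, 0.122)
mean_rounded <- sum(midpoints * rel_freq)
print(mean_rounded)          # 24.565

# ASSUMPTION: absolute frequencies inferred from the table (n = 49).
counts     <- c(8, 10, 13, 12, 6)
mean_exact <- weighted.mean(midpoints, counts)
print(round(mean_exact, 2))  # 24.59
```
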
Exercise 2
The Municipality of Milan has produced a report on a survey of the city bars. The report contains
the following graph representing the relative cumulative frequencies:

Specify the type of the variable represented in the graph and provide a reasonable example of a
variable that could generate the results shown.

SOL It is a numerical discrete variable. Examples: number of employees, number of windows, number of
bars.
Exercise 3
TIR is an international store chain that sells innovative products for body care. The chain does
not have stores in Italy. However, a pilot store has been opened in a small town to evaluate the
profitability of entering this market. The following data have been obtained from a sample of
customers who have made purchases in the pilot store. The following variables have been
collected:

The table also contains the result of some calculations performed on the data:

1. After classifying the variable EXPENSE in the classes

produce the corresponding histogram. Could you repeat this task using R/Radiant?
2. Produce a bar chart for the variable TOP using R.

3. BONUS: compute the mean of the variable TIME using R/Radiant.

SOL

As for the histogram, we need to obtain the following table and graph.

For the R/Radiant part, have a look at the script file “commands_PS2.R”, uploaded on Blackboard.
To open it, proceed as follows:

A) save the file “commands_PS2.R” in a directory of your choice (on my laptop it will be
“PS2”);

B) then open RStudio;

C) click on File >> Open File… >> then you can browse your computer folders and select the
directory in which you stored the command file >> click on the file and select “Open”.
Exercise 4

The food delivery company Eat-at-home produced the following table, which reports the number
of customers who made an order for different food categories in the city of Milan:

Food         Number of customers who made an order
Pizza        360
Sushi        280
Chinese      265
Hamburger    143
Indian       87
Other        26

Classify the variable “Food” and represent its frequency distribution by a suitable graph using R.

SOL See the script file “commands_PS2.R”.
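
For readers without access to that script, the following is a minimal sketch of one way to
proceed; it is not the content of “commands_PS2.R”. “Food” is a categorical (nominal)
variable, so a bar chart is a suitable graph. The variable names are illustrative.

```r
# Frequency table from the exercise, entered by hand as a named vector.
customers <- c(Pizza = 360, Sushi = 280, Chinese = 265,
               Hamburger = 143, Indian = 87, Other = 26)

# Relative frequencies of each food category.
rel_freq <- customers / sum(customers)
print(round(rel_freq, 3))

# Bar chart of the absolute frequencies, one bar per category.
# Drawn only in an interactive session (e.g. RStudio).
if (interactive()) {
  barplot(customers, xlab = "Food", ylab = "Number of customers",
          main = "Orders by food category")
}
```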

Exercise 5
A tour operator extracts from its database a sample of people who bought an organized holiday to a
certain destination in 2011. For the sampled clients, gender and money spent (in euros) on extra
excursions (not included in the package) have been recorded:

Gender     Money spent
           0 ≤ x < 50    50 ≤ x < 200    200 ≤ x < 500
Male       47            45              29
Female     34            28              17

Could you build the histogram of the variable “Money spent”, considering the whole sample,
using R?

SOL Notice that the information in this exercise is presented in the form of a frequency table and
we do not have the raw data. We will not use R or Radiant in these cases, since we do not have
access to the data set: the data has already been summarized.
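
Even without the raw data, the histogram can still be drawn by hand from the table. As a
sketch of the arithmetic (not part of the original solution): summing over gender gives class
frequencies 81, 73 and 46, for a total of 200 clients. Since the classes have different widths
(50, 150 and 300), the bar heights must be frequency densities rather than relative
frequencies:

```latex
\[
\begin{aligned}
[0, 50):    &\quad \frac{81/200}{50}  = \frac{0.405}{50}  = 0.0081, \\
[50, 200):  &\quad \frac{73/200}{150} = \frac{0.365}{150} \approx 0.00243, \\
[200, 500): &\quad \frac{46/200}{300} = \frac{0.230}{300} \approx 0.00077 .
\end{aligned}
\]
```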
