
Statistics I

Exercises for Topic 2


Academic year 2021/22

Exercises
1. The following table shows the absolute frequency distribution of the duration (in minutes) of 60 taxi services with origin
in a certain airport:

Duration (interval) Number of services


[0, 10) 8
[10, 20) 17
[20, 30) 14
[30, 40) 10
[40, 60) 11
Total 60

(a) Draw a histogram, taking into account that not all classes have the same width. Calculate the height of each bar so
that the area of each rectangle equals the relative frequency of its class (histogram with unit total area).
(b) From the histogram, describe the shape of the distribution. Indicate the modal and median intervals.
(c) From the table of frequencies, calculate (approximately) the mean and variance of the duration using the class marks.
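
As a hedged sketch of the arithmetic behind parts (a) and (c), the following Python snippet computes the bar heights (relative frequency divided by class width) and the approximate mean and variance from the class marks. The variable names are illustrative, and the variance uses the divisor n, which is an assumption; use n - 1 if the quasi-variance is wanted.

```python
# Class limits and absolute frequencies copied from the table in Exercise 1.
limits = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 60)]
freqs = [8, 17, 14, 10, 11]

n = sum(freqs)                                   # 60 services
widths = [b - a for a, b in limits]
rel_freqs = [f / n for f in freqs]

# Bar height = relative frequency / class width, so that area = relative frequency.
heights = [r / w for r, w in zip(rel_freqs, widths)]
total_area = sum(h * w for h, w in zip(heights, widths))

# Class marks (interval midpoints) stand in for the unobserved individual durations.
marks = [(a + b) / 2 for a, b in limits]
mean = sum(f * m for f, m in zip(freqs, marks)) / n
# Assumption: divisor n (population variance of the grouped data).
variance = sum(f * (m - mean) ** 2 for f, m in zip(freqs, marks)) / n

for (a, b), h in zip(limits, heights):
    print(f"[{a}, {b}): height = {h:.4f}")
print(f"total area = {total_area:.4f}, mean = {mean:.2f}, variance = {variance:.2f}")
```

Because the last class is twice as wide as the others, its bar is lower than its frequency alone would suggest; this is the point of dividing by the width.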

2. The spreadsheet data_condemned_2016_INE of the Excel workbook Datos_spreadsheet2.xls contains information provided
by the INE¹ about the age of convicted persons and the number of prison sentences handed down in 2016.

(a) Represent the relative and cumulative frequency distributions of the variable age through a bar chart. What information
can you obtain about the age of the convicted persons? (Note: if you use Excel you can represent both
distributions simultaneously through a combined chart. Select the cumulative frequencies as “secondary axis”.)
(b) Represent the relative frequency distribution of the variable number of prison sentences through a pie chart. Do the
quartiles and percentiles make sense for this variable? If so, calculate the 80th percentile and interpret it.

3. The following is a chart from the report “La Universidad Española en Cifras 2015/2016”².
¹ In Estadística de condenados: Adultos, based on information from the Registro Central de Penados of the Ministerio de Justicia.
² Published by the Conferencia de Rectores de las Universidades Españolas (CRUE) with the collaboration of Santander Universidades.

(a) What is the variable of interest? What is its type? What are the population and the sample?
(b) From the chart, obtain the maximum, the minimum, the median, the first and the third quartiles, the range and
the interquartile range (IQR) of the variable. Which universities occupy each of these positions? Interpret the values
obtained.
(c) Draw the chart that you consider most appropriate to represent the data and comment on its shape and the possible
presence of outliers.
(d) If you observe outliers, calculate the mean and the standard deviation of the data with and without outliers. Obtain
also the median and the interquartile range of the data without outliers. Comment on the results obtained.
(e) Taking into account the datum at the end of the chart, indicating that the total percentage of mobility students in
public Spanish universities is 6,18 %, which criterion do you guess has been used to select the 20 universities in the
chart?

4. The following bar chart represents the distribution of cumulative frequencies of a certain variable:

(a) What is the type of the variable?


(b) Deduce and represent the corresponding absolute frequency table.

(c) Discuss the shape of the distribution.
(d) Calculate the mean and the standard deviation of this dataset.
(e) Calculate the mode, the median and the 20th and 80th percentiles.

5. The following table shows the number of university graduates for the academic year 2009-2010 by Comunidad Autónoma
(CA) of the university in which they graduated (INE, Encuesta de inserción laboral de titulados universitarios 2014).

CA                             Number of univ. graduates
Andalucía                      31655
Aragón                         4989
Asturias, Principado de        3947
Balears, Illes                 1905
Canarias                       4615
Cantabria                      1751
Castilla y León                14368
Castilla - La Mancha           4924
Cataluña                       31345
Comunitat Valenciana           19799
Extremadura                    3767
Galicia                        10175
Madrid, Comunidad de           38739
Murcia, Región de              6771
Navarra, Comunidad Foral de    3162
País Vasco                     9744
Rioja, La                      1005

If the variable of interest is the CA of the university in which graduates obtained their degree:
(a) What is the variable type and what is the population?
(b) What frequency distribution does the above table show?
(c) What statistical measures can you obtain for such a variable?
Draw a Pareto chart to check whether the following claims, which refer to university graduates in the academic year
2009-2010, are true or false:
(a) Less than 25 % of universities produce more than 60 % of graduates.
(b) The median of the distribution is Cataluña.
(c) 20 % of CCAA concentrate the universities from which more than 50 % of graduates come.
(d) Less than 10 % of graduates come from the universities of 35 % of the CCAA.
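
The table behind a Pareto chart can be sketched in Python as follows: the CCAA are sorted by decreasing number of graduates and the cumulative share of graduates is set against the share of CCAA counted so far. This is only one possible way to organise the computation; the names `ranked` and `pareto` are illustrative, and the figures are copied from the table above.

```python
# Graduates per CA, copied from the table in Exercise 5.
graduates = {
    "Andalucía": 31655, "Aragón": 4989, "Asturias, Principado de": 3947,
    "Balears, Illes": 1905, "Canarias": 4615, "Cantabria": 1751,
    "Castilla y León": 14368, "Castilla - La Mancha": 4924, "Cataluña": 31345,
    "Comunitat Valenciana": 19799, "Extremadura": 3767, "Galicia": 10175,
    "Madrid, Comunidad de": 38739, "Murcia, Región de": 6771,
    "Navarra, Comunidad Foral de": 3162, "País Vasco": 9744, "Rioja, La": 1005,
}

# A Pareto chart orders the categories from most to least frequent.
ranked = sorted(graduates.items(), key=lambda kv: kv[1], reverse=True)
total = sum(graduates.values())
k = len(ranked)

# Each row: (CA, share of CCAA counted so far, cumulative share of graduates).
pareto = []
running = 0
for i, (ca, count) in enumerate(ranked, start=1):
    running += count
    pareto.append((ca, i / k, running / total))

for ca, share_ccaa, cum_share in pareto:
    print(f"{ca:<30} {share_ccaa:6.1%} {cum_share:6.1%}")
```

Reading the two percentage columns side by side is enough to check claims of the form "x % of the CCAA account for more than y % of graduates"; the bars and the cumulative line of the chart display the same two columns.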
6. Consider the following charts published in El Mundo³ about diffusion data of the Spanish press (OJD, Oficina de Justificación
de la Difusión).
³ 24 September, 2014. Source: blog Malaprensa

(a) Do you find the charts adequate? Why?
(b) Represent properly the data in such charts and compare the charts obtained with those that were published.

7. The following table shows data from the Encuesta de Condiciones de Vida (INE) corresponding to the years 2014 and 2006
about the percentage of households facing economic hardship by CCAA.

CA                             2014    2006
Andalucía                      24,3    16,8
Aragón                         9,8     5,7
Asturias, Principado de        4,6     3,1
Balears, Illes                 14,7    9,7
Canarias                       19,5    18,4
Cantabria                      15,2    9,3
Castilla y León                12,1    8,5
Castilla - La Mancha           15,9    10,7
Cataluña                       12,2    11,3
Comunitat Valenciana           18,0    12,3
Extremadura                    19,6    8,7
Galicia                        20,8    11,9
Madrid, Comunidad de           12,4    8,8
Murcia, Región de              22,7    14,1
Navarra, Comunidad Foral de    4,2     6,6
País Vasco                     11,5    5,2
Rioja, La                      12,9    6,6
Ceuta                          32,9    25,7
Melilla                        12,9    15,9

The following tables show information about the variable percentage of households facing economic hardship in each of the
observed periods:

Statistic               2014       2006
Mean                    15,5895    11,0158
Standard error          1,5677     1,2385
Median                  14,7       9,7
Mode                    12,9       6,6
Standard deviation      6,8333     5,3986
Sample variance         46,6943    29,1447
Kurtosis                1,1493     1,7247
Skewness coefficient    0,6357     1,1245
Range                   28,7       22,6
Minimum                 4,2        3,1
Maximum                 32,9       25,7
Sum                     296,2      209,3
Count                   19         19

(a) Represent the data of 2006 and of 2014 in histograms and compare their distributions. What differences do you find?
(b) To analyze the evolution of the percentage of households facing economic hardship in the period 2006–2014, obtain the
percentiles 20, 40, 60 and 80 for each year. Tabulate these data for each year, along with the minimum and maximum
values. What conclusions can you draw? Also, represent the data in the table as a chart.
(c) What central tendency measure is more adequate in each case and why?
(d) Which year shows more variability in the data?
(e) In which of the two periods, 2014 or 2006, do the Comunidad de Madrid and the Comunitat Valenciana show the worst
results relative to the situation in those years?
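
For parts (d) and (e), one possible approach (not necessarily the intended one) is to compare relative rather than absolute quantities: the coefficient of variation for the variability of each year, and standardized values (z-scores) for the position of a region within its year. The sketch below uses the means and standard deviations from the summary table; the dictionary names are illustrative.

```python
# Mean and standard deviation per year, copied from the summary table.
summary = {
    2014: {"mean": 15.5895, "sd": 6.8333},
    2006: {"mean": 11.0158, "sd": 5.3986},
}

# Percentage of households facing economic hardship in the two regions of part (e).
values = {
    "Madrid, Comunidad de": {2014: 12.4, 2006: 8.8},
    "Comunitat Valenciana": {2014: 18.0, 2006: 12.3},
}

# Coefficient of variation: standard deviation relative to the mean.
cv = {year: s["sd"] / s["mean"] for year, s in summary.items()}

# z-score: distance from the year's mean, measured in standard deviations.
z = {
    ca: {year: (x - summary[year]["mean"]) / summary[year]["sd"]
         for year, x in by_year.items()}
    for ca, by_year in values.items()
}

for year in (2006, 2014):
    print(f"{year}: CV = {cv[year]:.3f}")
for ca, by_year in z.items():
    for year in (2006, 2014):
        print(f"{ca}, {year}: z = {by_year[year]:+.3f}")
```

Since a higher percentage means more hardship, a larger z-score indicates a worse position relative to the other regions in that year.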

Exercises from exams of previous academic years


8. (May 2015 exam) Vendors doing business with a particular company were sampled to determine the economic impact
of company business on their gross sales. A sample of 15 firms that provide services to the company had the following
percentages of total annual sales as a result of sales to the company:

27 12 14,9 1,2 0,1 1 0,1 5,3 7,6 5 1 1 3,2 3 7

(a) Is the sample mean of the 15 percentages larger than the sample median? If true, what does this result suggest? Justify
your answers.
(b) Calculate the three sample quartiles. Interpret them in terms of percentages.
(c) Compute the sample quasi-variance and coefficient of variation of the 15 percentages.
(d) Draw a box-plot of the data and identify the outliers (if any). Justify your answer.
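
The quantities asked for in this exercise can be computed with Python's standard `statistics` module, as in the hedged sketch below. Two choices here are assumptions rather than part of the exercise: quartiles follow the "exclusive" convention of `statistics.quantiles` (other conventions give slightly different values), and outliers are flagged with the usual 1.5 × IQR fences.

```python
import statistics

# Percentages of total annual sales for the 15 sampled firms (Exercise 8).
pct = [27, 12, 14.9, 1.2, 0.1, 1, 0.1, 5.3, 7.6, 5, 1, 1, 3.2, 3, 7]

mean = statistics.mean(pct)
median = statistics.median(pct)

# Assumption: "exclusive" quartile convention; the course may use another rule.
q1, q2, q3 = statistics.quantiles(pct, n=4, method="exclusive")
iqr = q3 - q1

quasi_var = statistics.variance(pct)        # divisor n - 1 (quasi-variance)
cv = statistics.stdev(pct) / mean           # coefficient of variation, using the quasi-sd

# Assumption: box-plot fences at 1.5 * IQR beyond the quartiles.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in pct if x < lower_fence or x > upper_fence]

print(f"mean = {mean:.2f}, median = {median}")
print(f"Q1 = {q1}, Q2 = {q2}, Q3 = {q3}, IQR = {iqr:.1f}")
print(f"quasi-variance = {quasi_var:.3f}, CV = {cv:.3f}")
print(f"fences = ({lower_fence:.1f}, {upper_fence:.1f}), outliers = {outliers}")
```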
9. (June 2015 exam) The following tables contain information about the GDP and the unemployment rate of the Spanish
Autonomous Regions:

Answer and justify the following questions:
(a) Fill in the gaps in Table 2.
(b) Which of the two variables shows greater dispersion?
(c) Determine the group of Autonomous Regions formed by the 15 % with higher GDP.
(d) Draw the box-plot of the unemployment rate. What can you tell about the shape of the distribution?
(e) From the previous box-plot, decide if there are outliers and/or extreme outliers in the data. Identify the Autonomous
Regions that can be considered outliers and/or extreme outliers.
10. (May 2016 Exam) The following tables contain information about 10 companies of the IBEX 35. In particular, three variables
are shown: X1 =“average remuneration of the governing board”, X2 =“average remuneration of senior management” and
X3 =“average expenditure per employee” (in millions of euros). Source: El País, 8th May 2016.

Table 1
Company       X1       X2       X3
BBVA          0,985    1,144    0,455
ACS           0,667    0,540    0,401
FCC           0,720    0,650    0,323
Inditex       1,270    1,730    0,231
Acciona       0,463    0,590    0,390
Santander     1,484    2,580    0,586
IAG           1,220    2,440    0,809
Iberdrola     0,920    1,979    0,894
Ferrovial     1,330    1,800    0,391
Telefónica    1,240    1,869    0,491

Figure 1: box-plots A and B (figure not reproduced).

Table 2
                 X1       X2       X3
Mean             1,030    ____     0,497
Median           ____     1,765    0,428
Standard dev.    0,333    0,756    0,210
Variance         ____     0,572    0,044
Q1               0,770    0,774    0,390
Q3               1,263    1,952    ____

Answer the following questions:


(a) Fill in the gaps in Table 2.
(b) Determine the shape of the distribution of X2 . Justify your answer.
(c) Which of the three variables shows the greatest dispersion? Justify your answer.
(d) Are there any outliers in X3 ? Justify your answer.
(e) Match the box-plots A and B of Figure 1 with the corresponding variables (X1 , X2 , X3 ). Justify your answer.
(f) It is known that the correlation between X1 and X3 is 0.175 and, on the other hand, that the covariance between X2
and X3 is 0.093. Is it true that the linear relationship between X3 and X1 is stronger than between X3 and X2 ? Justify
your answer. (Note: this question is from Chapter 3)
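
Part (f) compares a correlation with a covariance, which are not on the same scale. A minimal sketch of the conversion, r = cov(X2, X3) / (s2 · s3), is given below using the standard deviations from Table 2. It assumes that the stated covariance and the standard deviations were computed with the same divisor; the variable names are illustrative.

```python
# Values stated in the exercise and in Table 2.
r_x1_x3 = 0.175            # correlation between X1 and X3 (given)
cov_x2_x3 = 0.093          # covariance between X2 and X3 (given)
sd_x2, sd_x3 = 0.756, 0.210

# Correlation = covariance divided by the product of the standard deviations.
# Assumption: covariance and standard deviations use the same divisor (n or n - 1).
r_x2_x3 = cov_x2_x3 / (sd_x2 * sd_x3)

print(f"r(X1, X3) = {r_x1_x3:.3f}")
print(f"r(X2, X3) = {r_x2_x3:.3f}")
print("Stronger linear relationship with X3:",
      "X1" if abs(r_x1_x3) > abs(r_x2_x3) else "X2")
```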
11. (June 2016 Exam) The following tables contain information about 10 companies of the Dow Jones. In particular:
X1 =“highest-paid CEO (in million dollars)” and X2 =“share price (in dollars)”. Source: El País, 8th May 2016.

Table 1
X1       X2
44,91    106,11
27,29    87,81
24,2     53,95
23,79    113,69
23,37    29,97
22,58    158,31
22,03    100,31
21,98    64,21
20,01    110,68
19,82    147,6

Figure 1: box-plots A, B and C (figure not reproduced).

a) Draw the box-plot for X2 and identify the outliers (if any). Justify your answer.
b) Determine if X1 and X2 have the same type of asymmetry. Justify your answer.
c) Determine which box-plot (A, B, C) corresponds to X1 . Justify your answer.

d) If 1 euro = 1.14 dollars, calculate the average salary for those CEOs (in million euros) and the variance.
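
Part d) is a change of units: dividing every observation by 1.14 divides the mean by 1.14 and the variance by 1.14². The sketch below illustrates this with the X1 column of Table 1. Two assumptions are made: the eighth value is read as 21,98 (that row is garbled in the source), and the variance uses the divisor n - 1; `statistics.pvariance` would give the version with divisor n.

```python
import statistics

usd_per_eur = 1.14

# X1 in million dollars; the value 21.98 is an assumed reading of a garbled row.
x1_usd = [44.91, 27.29, 24.2, 23.79, 23.37, 22.58, 22.03, 21.98, 20.01, 19.82]

mean_usd = statistics.mean(x1_usd)
var_usd = statistics.variance(x1_usd)       # assumption: divisor n - 1

# Linear change of scale y = x / c: mean(y) = mean(x) / c, var(y) = var(x) / c**2.
mean_eur = mean_usd / usd_per_eur
var_eur = var_usd / usd_per_eur ** 2

# Cross-check by converting each observation first and recomputing.
x1_eur = [x / usd_per_eur for x in x1_usd]
print(f"mean: {mean_eur:.4f} vs {statistics.mean(x1_eur):.4f}")
print(f"variance: {var_eur:.4f} vs {statistics.variance(x1_eur):.4f}")
```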

12. (May 2017 exam) The following table shows the values of the Human Development Index (HDI) for different countries of
Africa, America and Europe in the year 2015.
Africa    0,348 0,411 0,413 0,416 0,419 0,646 0,666 0,684 0,69 0,698 0,721 0,724 0,736 0,772 0,777
America   0,483 0,666 0,679 0,714 0,715 0,772 0,78 0,783 0,785 0,79 0,793 0,827 0,847 0,919 0,923
Europe    0,693 0,751 0,754 0,761 0,771 0,899 0,907 0,907 0,908 0,916 0,916 0,922 0,923 0,93 0,944

Answer the following questions:

(a) Find the three quartiles for each of the three continents and decide if there are any outliers in the data of each continent.
(b) Draw the box-plot of the American data in the following picture. Determine the shape of each distribution and compare
them. Which measures of centrality and variability are more appropriate in each case? Do not calculate them.

(c) Justify the truthfulness or falseness of the following statements. Apply the quartiles to justify your answers.
1) 50 % of African countries in the table have an HDI that is below the level reached by any of the European countries
in the table.
2) 75 % of American countries in the table have an HDI that is above the level reached by any African countries in the
table.
(d) The HDI can be classified as follows: Very High [0,8; 1); High [0,7; 0,8); Medium [0,55; 0,7) and Low (0; 0,55). The
contingency table for the variable continent (X) and the variable HDI in categories (Y) is depicted below:

X/Y        Low, (0; 0,55)    Medium, [0,55; 0,7)    High, [0,7; 0,8)    Very High, [0,8; 1)
Africa     5                 5                      5                   0
America    1                 2                      8                   4
Europe     0                 1                      4                   10
What percentage of countries with high or very high HDI belongs to Europe? And what percentage of countries with
an HDI of less than 0,55 belongs to the African continent?
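
Both questions ask for a conditional relative frequency: the count for one continent within the chosen HDI categories, divided by the total count in those categories. A hedged Python sketch using the contingency table above follows; the helper `share` is a hypothetical name introduced only for illustration.

```python
# Contingency table: continent (X) by HDI category (Y), copied from part (d).
table = {
    "Africa":  {"Low": 5, "Medium": 5, "High": 5, "Very High": 0},
    "America": {"Low": 1, "Medium": 2, "High": 8, "Very High": 4},
    "Europe":  {"Low": 0, "Medium": 1, "High": 4, "Very High": 10},
}

def share(continent, categories):
    """Percentage of the countries in `categories` that belong to `continent`."""
    in_continent = sum(table[continent][c] for c in categories)
    overall = sum(row[c] for row in table.values() for c in categories)
    return 100 * in_continent / overall

europe_high_or_very_high = share("Europe", ["High", "Very High"])
africa_low = share("Africa", ["Low"])

print(f"Europe among High or Very High: {europe_high_or_very_high:.1f} %")
print(f"Africa among Low: {africa_low:.1f} %")
```

The denominator is the column total, not the row total: the condition is on the HDI category, and the continent is the event of interest.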
