Escarilla Task 1 - Descriptive Statistics
1-MALT Filipino
University of Southern Mindanao – Graduate School
Kabacan, Cotabato
Task 1
Topic: Descriptive Statistics, which includes measures of central tendency,
measures of variation, quartiles, box plots, and graphs.
1. Select four variables (two qualitative variables and two quantitative variables) that are in some
way related. Discuss your rationale for selecting the variables and describe which one you have
chosen to be the dependent variable. Elaborate on the relationship between the two variables,
including your prior expectation of the probable relationship.
- 2 Qualitative Variables
Country
Region
- 2 Quantitative Variables
COVID-19 Active Cases
COVID-19 Deaths
2. Gather cross-sectional data for the variables chosen via survey, Google Docs, or data from the
internet. You must have sufficient observations for each variable for a decent analysis.
Record and indicate the source(s) of your data. Minimum number of observations is 30.
Country                     Region              Active Cases      Deaths
United States of America    Northern America      77,729,481     926,287
India                       Asia                  42,851,929     512,344
Brazil                      South America         28,208,212     644,286
France                      Western Europe        21,650,413     133,958
The United Kingdom          Western Europe        18,654,576     160,610
Russian Federation          Northern Asia         15,657,928     347,031
Germany                     Western Europe        13,762,895     121,603
Turkey                      Western Europe        13,588,620      92,719
Italy                       Western Europe        12,494,459     153,190
Spain                       Western Europe        10,858,000      98,462
Argentina                   South America          8,838,674     125,451
Iran                        Asia                   6,961,562     135,276
Netherlands                 Western Europe         6,118,542      21,489
Colombia                    South America          6,047,042     138,106
Poland                      Central Europe         5,582,217     110,157
Mexico                      Northern America       5,413,425     315,688
Indonesia                   Asia                   5,289,414     146,798
Ukraine                     Eastern Europe         4,758,773     104,932
Japan                       Asia                   4,540,656      22,000
South Africa                Africa                 3,659,698      98,804
Philippines                 Asia                   3,653,526      55,763
Israel                      Asia                   3,569,718      10,003
Czechia                     Central Europe         3,523,869      38,335
Belgium                     Western Europe         3,512,212      30,015
Peru                        South America          3,496,009     209,468
Malaysia                    Asia                   3,246,779      32,390
Canada                      Northern America       3,241,869      36,041
Portugal                    Western Europe         3,193,178      20,866
Chile                       South America          2,876,455      41,491
Viet Nam                    Asia                   2,834,373      39,605
Thailand                    Asia                   2,749,561      22,691
3. Encode your data in the free statistical software (the software to be used will be agreed and
discussed in the class).
4. Present all data in a table and give the table a proper title, names of variables, time period and
units of variables. Minimum number of observations is 30.
Country                     Region              Active Cases      Deaths
United States of America    Northern America      77,729,481     926,287
India                       Asia                  42,851,929     512,344
Brazil                      South America         28,208,212     644,286
France                      Western Europe        21,650,413     133,958
The United Kingdom          Western Europe        18,654,576     160,610
Russian Federation          Northern Asia         15,657,928     347,031
Germany                     Western Europe        13,762,895     121,603
Turkey                      Western Europe        13,588,620      92,719
Italy                       Western Europe        12,494,459     153,190
Spain                       Western Europe        10,858,000      98,462
Argentina                   South America          8,838,674     125,451
Iran                        Asia                   6,961,562     135,276
Netherlands                 Western Europe         6,118,542      21,489
Colombia                    South America          6,047,042     138,106
Poland                      Central Europe         5,582,217     110,157
Mexico                      Northern America       5,413,425     315,688
Indonesia                   Asia                   5,289,414     146,798
Ukraine                     Eastern Europe         4,758,773     104,932
Japan                       Asia                   4,540,656      22,000
South Africa                Africa                 3,659,698      98,804
Philippines                 Asia                   3,653,526      55,763
Israel                      Asia                   3,569,718      10,003
Czechia                     Central Europe         3,523,869      38,335
Belgium                     Western Europe         3,512,212      30,015
Peru                        South America          3,496,009     209,468
Malaysia                    Asia                   3,246,779      32,390
Canada                      Northern America       3,241,869      36,041
Portugal                    Western Europe         3,193,178      20,866
Chile                       South America          2,876,455      41,491
Viet Nam                    Asia                   2,834,373      39,605
Thailand                    Asia                   2,749,561      22,691
5. For quantitative variables, present them in a table of frequency distributions (frequency and
percent only), and present a histogram for each variable separately using the free statistical tools.
Explain the graphs (make it concise).
Active Cases
Bin           Frequency    %
2,000,000         0        0%
15,000,000       24       80%
27,000,000        3       10%
40,000,000        1        3%
52,000,000        1        3%
65,000,000        0        0%
77,000,000        0        0%
90,000,000        1        3%
More              0        0%
TOTAL            30      100%
[Histogram of Active Cases: Frequency (vertical axis) by Bin (horizontal axis)]
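The histogram for Active Cases shows that the bin with an upper limit of 15,000,000 has the highest
frequency: 24 of the 30 countries (80%) have more than 2,000,000 and at most 15,000,000 active cases.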
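As an illustration of how such a frequency table can be built, the following is a minimal Python sketch, not the statistical software used for this task. It assumes each bin value is an upper limit (a value is counted in the first bin whose limit it does not exceed) and that the analysis used the first 30 rows of the data table, since the reported total is 30. The names `active_cases`, `bin_limits`, and `frequency_table` are illustrative only.

```python
from bisect import bisect_left

# Active cases for the first 30 rows of the data table (assumption: the
# 31st row, Thailand, is left out because the reported total is 30).
active_cases = [
    77_729_481, 42_851_929, 28_208_212, 21_650_413, 18_654_576, 15_657_928,
    13_762_895, 13_588_620, 12_494_459, 10_858_000, 8_838_674, 6_961_562,
    6_118_542, 6_047_042, 5_582_217, 5_413_425, 5_289_414, 4_758_773,
    4_540_656, 3_659_698, 3_653_526, 3_569_718, 3_523_869, 3_512_212,
    3_496_009, 3_246_779, 3_241_869, 3_193_178, 2_876_455, 2_834_373,
]

# Upper bin limits copied from the frequency table.
bin_limits = [2_000_000, 15_000_000, 27_000_000, 40_000_000,
              52_000_000, 65_000_000, 77_000_000, 90_000_000]


def frequency_table(values, limits):
    """Count values per bin, where each bin is (previous limit, limit]."""
    counts = [0] * (len(limits) + 1)  # the extra last slot is the "More" bin
    for value in values:
        # bisect_left returns the index of the first limit >= value.
        counts[bisect_left(limits, value)] += 1
    return counts


counts = frequency_table(active_cases, bin_limits)
percents = [round(100 * c / len(active_cases)) for c in counts]

for label, count, pct in zip(bin_limits + ["More"], counts, percents):
    print(f"{label!s:>12} {count:>3} {pct:>3}%")
```

Passing the Deaths column and its own bin limits to the same function gives the second frequency table.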
Deaths
Bin           Frequency    %
10,000            0        0%
160,000          23       77%
310,000           2        7%
460,000           2        7%
620,000           1        3%
770,000           1        3%
920,000           0        0%
1,000,000         1        3%
More              0        0%
TOTAL            30      100%
[Histogram of Deaths: Frequency (vertical axis) by Bin (horizontal axis)]
The histogram for COVID-19 Deaths shows that the bin with an upper limit of 160,000 has the highest
frequency: 23 of the 30 countries (77%) have more than 10,000 and at most 160,000 deaths.
6. For qualitative variables, choose two types of graphs (for example pie chart and bar graph) and
present your data using it. Explain the graphs (make it concise)
[Charts of countries by region. Legend: Asia, Northern America, South America, Africa, Central Europe,
Western Europe, Eastern Europe]
Of the 30 countries taken from the data table of the World Health Organization (WHO), 9 are from
Western Europe, 8 from Asia, 5 from South America, 3 from Northern America, 2 from Central Europe,
and 1 each from Africa, Eastern Europe, and Northern Asia (the Russian Federation, as classified in
the data table).
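The region counts behind such charts can be tallied directly from the Region column. The sketch below is one way to do it in Python, assuming the first 30 rows of the data table and the region labels exactly as they appear there; the name `regions` is illustrative.

```python
from collections import Counter

# Region of each of the first 30 countries, in table order (assumption:
# labels are taken verbatim from the Region column of the data table).
regions = [
    "Northern America", "Asia", "South America", "Western Europe",
    "Western Europe", "Northern Asia", "Western Europe", "Western Europe",
    "Western Europe", "Western Europe", "South America", "Asia",
    "Western Europe", "South America", "Central Europe", "Northern America",
    "Asia", "Eastern Europe", "Asia", "Africa", "Asia", "Asia",
    "Central Europe", "Western Europe", "South America", "Asia",
    "Northern America", "Western Europe", "South America", "Asia",
]

region_counts = Counter(regions)
total = sum(region_counts.values())

# Print each region with its frequency and share of the sample,
# from the most to the least frequent.
for region, count in region_counts.most_common():
    print(f"{region:<18} {count:>2} {100 * count / total:5.1f}%")
```

The frequencies feed a bar graph and the percentages feed a pie chart.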
7. Display the measures of central tendency for all your variables. Explain the results.
Measure    Active Cases    Deaths
Mean       11,527,150      164,106
Median     5,497,821       107,545
Mode       None            None
The mean number of active cases across the thirty countries is 11,527,150. The median is 5,497,821,
which is the average of the 15th and 16th values in the ordered data set. No value repeats in the
data set, so there is no mode.
The mean number of COVID-19 deaths across the thirty countries is 164,106, and the median is 107,545.
This variable also has no mode, since no value repeats.
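These figures can be checked with Python's standard `statistics` module. The sketch below assumes the first 30 rows of the data table, which is the subset that reproduces the reported means and medians; the helper name `central_tendency` is illustrative.

```python
import statistics

# First 30 rows of the data table, in table order (assumption: Thailand,
# the 31st row, is not part of the 30 observations analysed).
active_cases = [
    77_729_481, 42_851_929, 28_208_212, 21_650_413, 18_654_576, 15_657_928,
    13_762_895, 13_588_620, 12_494_459, 10_858_000, 8_838_674, 6_961_562,
    6_118_542, 6_047_042, 5_582_217, 5_413_425, 5_289_414, 4_758_773,
    4_540_656, 3_659_698, 3_653_526, 3_569_718, 3_523_869, 3_512_212,
    3_496_009, 3_246_779, 3_241_869, 3_193_178, 2_876_455, 2_834_373,
]
deaths = [
    926_287, 512_344, 644_286, 133_958, 160_610, 347_031, 121_603, 92_719,
    153_190, 98_462, 125_451, 135_276, 21_489, 138_106, 110_157, 315_688,
    146_798, 104_932, 22_000, 98_804, 55_763, 10_003, 38_335, 30_015,
    209_468, 32_390, 36_041, 20_866, 41_491, 39_605,
]


def central_tendency(values):
    """Return the mean, the median, and whether any value repeats."""
    has_mode = len(set(values)) < len(values)  # a mode needs a repeated value
    return statistics.mean(values), statistics.median(values), has_mode


mean_active, median_active, mode_active = central_tendency(active_cases)
mean_deaths, median_deaths, mode_deaths = central_tendency(deaths)

print(f"Active cases: mean={mean_active:,.0f} median={median_active:,.1f} "
      f"has mode={mode_active}")
print(f"Deaths:       mean={mean_deaths:,.0f} median={median_deaths:,.1f} "
      f"has mode={mode_deaths}")
```

With an even number of observations, `statistics.median` averages the two middle values, which is the same rule described above for the 15th and 16th values.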
8. Display the measures of variation for the quantitative variables. Compare the results. Which
variable is more spread? Why?
Active Cases Deaths
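One way to compute and compare the spread of the two variables is sketched below in Python. It is an illustration under stated assumptions, not the output of the software used for this task: it uses the first 30 rows of the data table, the sample (n - 1) standard deviation, and the "inclusive" quartile method, which may differ slightly from other tools. Because the two variables are on very different scales, the coefficient of variation (standard deviation divided by the mean) is included as a scale-free basis for comparison. The helper name `spread` is illustrative.

```python
import statistics

# First 30 rows of the data table, in table order (assumption: Thailand,
# the 31st row, is not part of the 30 observations analysed).
active_cases = [
    77_729_481, 42_851_929, 28_208_212, 21_650_413, 18_654_576, 15_657_928,
    13_762_895, 13_588_620, 12_494_459, 10_858_000, 8_838_674, 6_961_562,
    6_118_542, 6_047_042, 5_582_217, 5_413_425, 5_289_414, 4_758_773,
    4_540_656, 3_659_698, 3_653_526, 3_569_718, 3_523_869, 3_512_212,
    3_496_009, 3_246_779, 3_241_869, 3_193_178, 2_876_455, 2_834_373,
]
deaths = [
    926_287, 512_344, 644_286, 133_958, 160_610, 347_031, 121_603, 92_719,
    153_190, 98_462, 125_451, 135_276, 21_489, 138_106, 110_157, 315_688,
    146_798, 104_932, 22_000, 98_804, 55_763, 10_003, 38_335, 30_015,
    209_468, 32_390, 36_041, 20_866, 41_491, 39_605,
]


def spread(values):
    """Return common measures of variation for one variable."""
    std_dev = statistics.stdev(values)  # sample standard deviation (n - 1)
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "range": max(values) - min(values),
        "variance": statistics.variance(values),  # sample variance (n - 1)
        "std_dev": std_dev,
        "iqr": q3 - q1,
        "cv": std_dev / statistics.mean(values),  # unit-free relative spread
    }


spread_active = spread(active_cases)
spread_deaths = spread(deaths)

for name, result in (("Active cases", spread_active), ("Deaths", spread_deaths)):
    print(name)
    for measure, value in result.items():
        print(f"  {measure:<9} {value:,.3f}")
```

The range, variance, standard deviation, and interquartile range are in the units of each variable, so only the coefficient of variation can be compared directly across the two.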
9. Display the box-plot for two quantitative variables. What is the shape of the distribution? Compare
the box plot of the two variables.
[Box plot; horizontal axis from 0 to 1,000,000 in steps of 100,000]
Both variables are skewed in terms of distribution. The pattern for both is right-skewed, with a
long tail to the right (high values), as shown by the longer right whisker and by the mean lying
well above the median for each variable (11,527,150 against 5,497,821 for Active Cases, and 164,106
against 107,545 for Deaths). Neither distribution is symmetric, although the pattern inside the box
is harder to describe than the long right tail.
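The five-number summary that underlies a box plot can be used to check the direction of skewness numerically. The following Python sketch assumes the first 30 rows of the data table and the "inclusive" quartile method, so the quartiles may differ slightly from those drawn by other software; whiskers are taken to run to the minimum and maximum. The helper name `five_number_summary` is illustrative.

```python
import statistics

# First 30 rows of the data table, in table order (assumption: Thailand,
# the 31st row, is not part of the 30 observations analysed).
active_cases = [
    77_729_481, 42_851_929, 28_208_212, 21_650_413, 18_654_576, 15_657_928,
    13_762_895, 13_588_620, 12_494_459, 10_858_000, 8_838_674, 6_961_562,
    6_118_542, 6_047_042, 5_582_217, 5_413_425, 5_289_414, 4_758_773,
    4_540_656, 3_659_698, 3_653_526, 3_569_718, 3_523_869, 3_512_212,
    3_496_009, 3_246_779, 3_241_869, 3_193_178, 2_876_455, 2_834_373,
]
deaths = [
    926_287, 512_344, 644_286, 133_958, 160_610, 347_031, 121_603, 92_719,
    153_190, 98_462, 125_451, 135_276, 21_489, 138_106, 110_157, 315_688,
    146_798, 104_932, 22_000, 98_804, 55_763, 10_003, 38_335, 30_015,
    209_468, 32_390, 36_041, 20_866, 41_491, 39_605,
]


def five_number_summary(values):
    """Return box-plot statistics plus two simple skewness indicators."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "min": min(values), "q1": q1, "median": median, "q3": q3,
        "max": max(values),
        "mean": statistics.mean(values),
        "upper_tail": max(values) - q3,  # length of the right whisker
        "lower_tail": q1 - min(values),  # length of the left whisker
    }


summary_active = five_number_summary(active_cases)
summary_deaths = five_number_summary(deaths)

for name, s in (("Active cases", summary_active), ("Deaths", summary_deaths)):
    right_skewed = s["mean"] > s["median"] and s["upper_tail"] > s["lower_tail"]
    print(f"{name}: Q1={s['q1']:,.0f} median={s['median']:,.1f} "
          f"Q3={s['q3']:,.0f} right-skewed={right_skewed}")
```

A mean above the median together with a longer right whisker is used here as a simple indicator of right skew; it is a rule of thumb rather than a formal test.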