
JADE ESCARILLA

1-MALT Filipino
University of Southern Mindanao – Graduate School
Kabacan, Cotabato

Task 1
Topic: Descriptive statistics, including measures of central tendency, measures of variation,
quartiles, box plots, and graphs.

1. Select four variables (two qualitative variables and two quantitative variables) that are in some
way related. Discuss your rationale for selecting the variables and describe which one you have
chosen to be the dependent variable. Elaborate on the relationship between the two variables,
including your prior expectation of the probable relationship.
- Two qualitative variables:
  Country
  Region
- Two quantitative variables:
  COVID-19 Active Cases
  COVID-19 Deaths

2. Gather cross-sectional data for the chosen variables via a survey or Google Docs, or use data
from the internet. You must have sufficient observations for each variable for a decent analysis.
Record and indicate the source(s) of your data. The minimum number of observations is 30.
Table 1. COVID-19 Active Cases and Deaths by Country and Region
Source: World Health Organization (WHO) data table. Units: number of persons.

Country                   Region            Active Cases   Deaths
United States of America  Northern America    77,729,481  926,287
India                     Asia                42,851,929  512,344
Brazil                    South America       28,208,212  644,286
France                    Western Europe      21,650,413  133,958
The United Kingdom        Western Europe      18,654,576  160,610
Russian Federation        Northern Asia       15,657,928  347,031
Germany                   Western Europe      13,762,895  121,603
Turkey                    Western Europe      13,588,620   92,719
Italy                     Western Europe      12,494,459  153,190
Spain                     Western Europe      10,858,000   98,462
Argentina                 South America        8,838,674  125,451
Iran                      Asia                 6,961,562  135,276
Netherlands               Western Europe       6,118,542   21,489
Colombia                  South America        6,047,042  138,106
Poland                    Central Europe       5,582,217  110,157
Mexico                    Northern America     5,413,425  315,688
Indonesia                 Asia                 5,289,414  146,798
Ukraine                   Eastern Europe       4,758,773  104,932
Japan                     Asia                 4,540,656   22,000
South Africa              Africa               3,659,698   98,804
Philippines               Asia                 3,653,526   55,763
Israel                    Asia                 3,569,718   10,003
Czechia                   Central Europe       3,523,869   38,335
Belgium                   Western Europe       3,512,212   30,015
Peru                      South America        3,496,009  209,468
Malaysia                  Asia                 3,246,779   32,390
Canada                    Northern America     3,241,869   36,041
Portugal                  Western Europe       3,193,178   20,866
Chile                     South America        2,876,455   41,491
Viet Nam                  Asia                 2,834,373   39,605
Thailand                  Asia                 2,749,561   22,691

3. Encode your data in the free statistical software (the software to be used will be agreed and
discussed in the class).
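The software agreed in class is not named in this document, and the statistics under items 7 and 8 appear to be Excel output. As a hedged illustration only, assuming Python is used instead, the data could be encoded as a list of records, one per country. Only the 30 rows whose totals match the Sum values reported under item 8 (345,814,504 active cases and 4,923,168 deaths) are included, so the Thailand row is left out. The names `RECORDS`, `active_cases`, and so on are illustrative, not part of the original work.

```python
# Hypothetical encoding of the data set in Python (an assumed tool, not the class's software).
# Each record is (country, region, active_cases, deaths); Thailand is omitted because the
# reported summary statistics match these 30 rows.
RECORDS = [
    ("United States of America", "Northern America", 77_729_481, 926_287),
    ("India", "Asia", 42_851_929, 512_344),
    ("Brazil", "South America", 28_208_212, 644_286),
    ("France", "Western Europe", 21_650_413, 133_958),
    ("The United Kingdom", "Western Europe", 18_654_576, 160_610),
    ("Russian Federation", "Northern Asia", 15_657_928, 347_031),
    ("Germany", "Western Europe", 13_762_895, 121_603),
    ("Turkey", "Western Europe", 13_588_620, 92_719),
    ("Italy", "Western Europe", 12_494_459, 153_190),
    ("Spain", "Western Europe", 10_858_000, 98_462),
    ("Argentina", "South America", 8_838_674, 125_451),
    ("Iran", "Asia", 6_961_562, 135_276),
    ("Netherlands", "Western Europe", 6_118_542, 21_489),
    ("Colombia", "South America", 6_047_042, 138_106),
    ("Poland", "Central Europe", 5_582_217, 110_157),
    ("Mexico", "Northern America", 5_413_425, 315_688),
    ("Indonesia", "Asia", 5_289_414, 146_798),
    ("Ukraine", "Eastern Europe", 4_758_773, 104_932),
    ("Japan", "Asia", 4_540_656, 22_000),
    ("South Africa", "Africa", 3_659_698, 98_804),
    ("Philippines", "Asia", 3_653_526, 55_763),
    ("Israel", "Asia", 3_569_718, 10_003),
    ("Czechia", "Central Europe", 3_523_869, 38_335),
    ("Belgium", "Western Europe", 3_512_212, 30_015),
    ("Peru", "South America", 3_496_009, 209_468),
    ("Malaysia", "Asia", 3_246_779, 32_390),
    ("Canada", "Northern America", 3_241_869, 36_041),
    ("Portugal", "Western Europe", 3_193_178, 20_866),
    ("Chile", "South America", 2_876_455, 41_491),
    ("Viet Nam", "Asia", 2_834_373, 39_605),
]

# Split the records into one list (column) per variable.
countries = [r[0] for r in RECORDS]
regions = [r[1] for r in RECORDS]
active_cases = [r[2] for r in RECORDS]
deaths = [r[3] for r in RECORDS]

# Quick check against the Count and Sum values in the descriptive statistics.
print(len(RECORDS), sum(active_cases), sum(deaths))
```

Keeping one list per variable makes each later calculation (frequency table, mean, standard deviation, quartiles) a single function call on that list.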
4. Present all data in a table and give the table a proper title, names of variables, time period and
units of variables. Minimum number of observations is 30.
The complete data set is presented in the table under item 2.

5. For the quantitative variables, present them in frequency distribution tables (frequency and
percent only), and present a histogram for each variable separately using the free statistical
tools. Explain the graphs concisely.
Active Cases
Bin (upper limit) Frequency      %
2,000,000                 0     0%
15,000,000               24    80%
27,000,000                3    10%
40,000,000                1     3%
52,000,000                1     3%
65,000,000                0     0%
77,000,000                0     0%
90,000,000                1     3%
More                      0     0%
TOTAL                    30   100%

[Figure: Histogram of Active Cases; x-axis: Bin, y-axis: Frequency]

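The histogram for Active Cases shows that most countries fall in the bin above 2,000,000 and up to
15,000,000 cases: 24 of the 30 countries (80%). Only six countries have more than 15,000,000 cases,
so the distribution is concentrated at the low end with a long tail to the right.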
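The counts in the Active Cases table can be reproduced with a short sketch, assuming the same counting rule as Excel's FREQUENCY function: each bin holds the values above the previous upper limit and up to its own limit, and anything above the last limit goes to "More". The list holds the 30 values used in the analysis (Thailand excluded), and `frequency_table` is a hypothetical helper written for this example.

```python
from bisect import bisect_left

# Active cases of the 30 countries, in the order of the data table.
ACTIVE_CASES = [
    77_729_481, 42_851_929, 28_208_212, 21_650_413, 18_654_576, 15_657_928,
    13_762_895, 13_588_620, 12_494_459, 10_858_000, 8_838_674, 6_961_562,
    6_118_542, 6_047_042, 5_582_217, 5_413_425, 5_289_414, 4_758_773,
    4_540_656, 3_659_698, 3_653_526, 3_569_718, 3_523_869, 3_512_212,
    3_496_009, 3_246_779, 3_241_869, 3_193_178, 2_876_455, 2_834_373,
]

# Upper limits of the bins used in the Active Cases table.
BINS = [2_000_000, 15_000_000, 27_000_000, 40_000_000,
        52_000_000, 65_000_000, 77_000_000, 90_000_000]


def frequency_table(values, bin_limits):
    """Count values per bin: x goes to the first bin whose upper limit is >= x.

    Values above the last limit go to an extra "More" slot at the end.
    Percentages are rounded to whole numbers, as in the table.
    """
    counts = [0] * (len(bin_limits) + 1)
    for x in values:
        counts[bisect_left(bin_limits, x)] += 1
    percents = [round(100 * c / len(values)) for c in counts]
    return counts, percents


counts, percents = frequency_table(ACTIVE_CASES, BINS)
labels = [f"{b:,}" for b in BINS] + ["More"]
for label, c, p in zip(labels, counts, percents):
    print(f"{label:>12} {c:>3} {p:>4}%")
```

Because the percentages are rounded individually, they add up to 99% rather than 100%, which is also why the TOTAL row is best read as the unrounded share of all 30 countries.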

Deaths
Bin (upper limit) Frequency      %
10,000                    0     0%
160,000                  23    77%
310,000                   2     7%
460,000                   2     7%
620,000                   1     3%
770,000                   1     3%
920,000                   0     0%
1,000,000                 1     3%
More                      0     0%
TOTAL                    30   100%

[Figure: Histogram of Deaths; x-axis: Bin, y-axis: Frequency]

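The histogram for COVID-19 deaths shows that most countries fall in the bin above 10,000 and up to
160,000 deaths: 23 of the 30 countries (77%). The remaining seven countries are spread thinly over
the higher bins, so this distribution also has a long tail to the right.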
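A minimal sketch, assuming Excel's FREQUENCY counting rule (each bin holds the values above the previous upper limit and up to its own limit; values above the last limit go to "More"), reproduces the counts in the Deaths table. `frequency_table` is a hypothetical helper, and the list holds the 30 values used in the analysis.

```python
from bisect import bisect_left

# COVID-19 deaths of the 30 countries, in the order of the data table.
DEATHS = [
    926_287, 512_344, 644_286, 133_958, 160_610, 347_031,
    121_603, 92_719, 153_190, 98_462, 125_451, 135_276,
    21_489, 138_106, 110_157, 315_688, 146_798, 104_932,
    22_000, 98_804, 55_763, 10_003, 38_335, 30_015,
    209_468, 32_390, 36_041, 20_866, 41_491, 39_605,
]

# Upper limits of the bins used in the Deaths table.
DEATH_BINS = [10_000, 160_000, 310_000, 460_000,
              620_000, 770_000, 920_000, 1_000_000]


def frequency_table(values, bin_limits):
    """Count values per bin (first bin with upper limit >= x); last slot is "More"."""
    counts = [0] * (len(bin_limits) + 1)
    for x in values:
        counts[bisect_left(bin_limits, x)] += 1
    percents = [round(100 * c / len(values)) for c in counts]
    return counts, percents


counts, percents = frequency_table(DEATHS, DEATH_BINS)
labels = [f"{b:,}" for b in DEATH_BINS] + ["More"]
for label, c, p in zip(labels, counts, percents):
    print(f"{label:>10} {c:>3} {p:>4}%")
```

Note that Israel (10,003 deaths) lands in the 160,000 bin rather than the 10,000 bin, because it is just above that first limit.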

6. For the qualitative variables, choose two types of graphs (for example, a pie chart and a bar
graph) and present your data using them. Explain the graphs concisely.
[Figure: Graphs of the number of countries per region; legend: Asia, Northern America, South America, Africa, Central Europe, Western Europe, Eastern Europe]

Of the 30 countries taken from the data table of the World Health Organization (WHO), 9 are from
Western Europe, 8 from Asia, 5 from South America, 3 from Northern America, 2 from Central Europe,
and 1 each from Africa, Eastern Europe, and Northern Asia (the Russian Federation). Western Europe
and Asia together account for more than half of the countries (17 of 30).
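The region counts are a simple tally of the Region column, which is what a pie chart or bar graph displays. A minimal Python sketch using `collections.Counter` (an assumed tool, not necessarily the one used to draw the graphs) gives the counts and percentages. The tally includes the Russian Federation under Northern Asia, which brings the total to 30.

```python
from collections import Counter

# Region of each of the 30 countries, in the order of the data table.
REGIONS = [
    "Northern America", "Asia", "South America", "Western Europe", "Western Europe",
    "Northern Asia", "Western Europe", "Western Europe", "Western Europe", "Western Europe",
    "South America", "Asia", "Western Europe", "South America", "Central Europe",
    "Northern America", "Asia", "Eastern Europe", "Asia", "Africa",
    "Asia", "Asia", "Central Europe", "Western Europe", "South America",
    "Asia", "Northern America", "Western Europe", "South America", "Asia",
]

counts = Counter(REGIONS)

# most_common() lists regions from the largest count to the smallest,
# which is a convenient order for a bar graph.
for region, n in counts.most_common():
    print(f"{region:<16} {n:>2}  {100 * n / len(REGIONS):5.1f}%")
```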

7. Display the measures of central tendency for all your variables. Explain the results.
Measure Active Cases    Deaths
Mean      11,527,150   164,106
Median     5,497,821   107,545
Mode            None      None

The mean number of active cases across the thirty countries is 11,527,150. The median is 5,497,821,
which is the average of the 15th and 16th values when the data are sorted. The mean is about twice
the median because a few countries with very large counts, such as the United States and India, pull
the mean upward. No value appears more than once in the data set, so there is no mode.

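The mean number of COVID-19 deaths across the thirty countries is 164,106, and the median is
107,545. Again the mean is well above the median, pulled upward by a few countries with very high
death counts, such as the United States and Brazil. This data set also has no mode, since no value
repeats.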
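The three measures can be checked with Python's standard `statistics` module, as a sketch assuming Python as the tool. Because `statistics.mode` returns the first value it sees when every value is unique (Python 3.8 and later), the hypothetical helper `central_tendency` checks for repeated values first and reports no mode, which agrees with the #N/A in the item 8 table.

```python
import statistics

# The 30 values used in the analysis, in the order of the data table.
ACTIVE_CASES = [
    77_729_481, 42_851_929, 28_208_212, 21_650_413, 18_654_576, 15_657_928,
    13_762_895, 13_588_620, 12_494_459, 10_858_000, 8_838_674, 6_961_562,
    6_118_542, 6_047_042, 5_582_217, 5_413_425, 5_289_414, 4_758_773,
    4_540_656, 3_659_698, 3_653_526, 3_569_718, 3_523_869, 3_512_212,
    3_496_009, 3_246_779, 3_241_869, 3_193_178, 2_876_455, 2_834_373,
]
DEATHS = [
    926_287, 512_344, 644_286, 133_958, 160_610, 347_031,
    121_603, 92_719, 153_190, 98_462, 125_451, 135_276,
    21_489, 138_106, 110_157, 315_688, 146_798, 104_932,
    22_000, 98_804, 55_763, 10_003, 38_335, 30_015,
    209_468, 32_390, 36_041, 20_866, 41_491, 39_605,
]


def central_tendency(values):
    """Return (mean, median, modes); modes is None when no value repeats."""
    mean = statistics.mean(values)
    # With n = 30, the median is the average of the 15th and 16th sorted values.
    median = statistics.median(values)
    if len(set(values)) == len(values):
        modes = None  # every value occurs exactly once, so there is no mode
    else:
        modes = statistics.multimode(values)  # all of the most frequent values
    return mean, median, modes


for name, values in (("Active Cases", ACTIVE_CASES), ("Deaths", DEATHS)):
    mean, median, modes = central_tendency(values)
    print(f"{name}: mean = {mean:,.2f}, median = {median:,.1f}, mode = {modes}")
```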

8. Display the measures of variation for the quantitative variables. Compare the results. Which
variable is more spread? Why?
Statistic                   Active Cases      Deaths
Mean                         11527150.13    164105.6
Standard Error               2806785.517    37325.98
Median                           5497821    107544.5
Mode                                #N/A        #N/A
Standard Deviation           15373397.42    204442.8
Sample Variance              2.36341E+14    4.18E+10
Kurtosis                     12.09620352    6.633561
Skewness                     3.245813571    2.497822
Range                           74895108      916284
Minimum                          2834373       10003
Maximum                         77729481      926287
Sum                            345814504     4923168
Count                                 30          30
Largest(1)                      77729481      926287
Smallest(1)                      2834373       10003
Confidence Level(95.0%)      5740520.938     76340.2
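Comparing the results, Active Cases has a standard deviation of 15,373,397.42, a variance of
2.36341E+14 and a range of 74,895,108, while COVID-19 deaths has a standard deviation of
204,442.80, a variance of 41,796,860,385 and a range of 916,284. In absolute terms, Active Cases is
therefore more spread than deaths. Because the two variables are on very different scales, a fairer
comparison is the coefficient of variation (standard deviation divided by the mean): about 1.33 for
Active Cases (15,373,397.42 / 11,527,150.13) and about 1.25 for deaths (204,442.80 / 164,105.6).
Active Cases is more spread even relative to its mean.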

SD: Active Cases > Deaths
Variance: Active Cases > Deaths
Range: Active Cases > Deaths
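Because the two variables are on very different scales, comparing standard deviations alone mostly reflects the size of the numbers. The sketch below, assuming Python's `statistics` module and a hypothetical helper `spread`, recomputes the sample standard deviation, variance, range and standard error shown in the item 8 table, and adds the coefficient of variation (standard deviation divided by the mean), which has no units and so can be compared across variables.

```python
import math
import statistics

# The 30 values used in the analysis, in the order of the data table.
ACTIVE_CASES = [
    77_729_481, 42_851_929, 28_208_212, 21_650_413, 18_654_576, 15_657_928,
    13_762_895, 13_588_620, 12_494_459, 10_858_000, 8_838_674, 6_961_562,
    6_118_542, 6_047_042, 5_582_217, 5_413_425, 5_289_414, 4_758_773,
    4_540_656, 3_659_698, 3_653_526, 3_569_718, 3_523_869, 3_512_212,
    3_496_009, 3_246_779, 3_241_869, 3_193_178, 2_876_455, 2_834_373,
]
DEATHS = [
    926_287, 512_344, 644_286, 133_958, 160_610, 347_031,
    121_603, 92_719, 153_190, 98_462, 125_451, 135_276,
    21_489, 138_106, 110_157, 315_688, 146_798, 104_932,
    22_000, 98_804, 55_763, 10_003, 38_335, 30_015,
    209_468, 32_390, 36_041, 20_866, 41_491, 39_605,
]


def spread(values):
    """Sample measures of variation (n - 1 in the denominator) plus the CV."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation
    return {
        "sd": sd,
        "variance": statistics.variance(values),  # sample variance = sd squared
        "range": max(values) - min(values),
        "se": sd / math.sqrt(len(values)),        # standard error of the mean
        "cv": sd / mean,                          # coefficient of variation
    }


for name, values in (("Active Cases", ACTIVE_CASES), ("Deaths", DEATHS)):
    s = spread(values)
    print(f"{name}: SD = {s['sd']:,.2f}, range = {s['range']:,}, CV = {s['cv']:.2f}")
```

On this data the coefficient of variation is about 1.33 for Active Cases and about 1.25 for Deaths, so Active Cases is the more spread variable in relative terms as well.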

9. Display the box-plot for two quantitative variables. What is the shape of the distribution? Compare
the box plot of the two variables.

[Figure: Box plot of Active Cases; horizontal axis from 0 to 100,000,000]

[Figure: Box plot of Deaths; horizontal axis from 0 to 1,000,000]

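Both variables are right-skewed, with a long tail toward the high values. This is shown by the
upper (right) whisker, which is much longer than the lower one, and by the extreme values at the high
end, such as the United States. For Active Cases, the right portion of the box (median to third
quartile) is also longer than the left portion (first quartile to median). For Deaths, the pattern is
less simple: the right portion of the box is shorter than the left portion, so the skewness comes
from the long upper whisker and the few very high values rather than from the box itself. Of the
two, Active Cases is the more strongly skewed variable (skewness of 3.25 against 2.50).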
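The quartiles behind the box plots can be computed with `statistics.quantiles`. This sketch assumes the "inclusive" method (the same linear interpolation as Excel's QUARTILE.INC) and the common rule that flags values more than 1.5 times the IQR beyond the box as outliers; the software that drew the box plots may use a different quartile rule, so its values can differ slightly. `box_summary` is a hypothetical helper written for this example.

```python
import statistics

# The 30 values used in the analysis, in the order of the data table.
ACTIVE_CASES = [
    77_729_481, 42_851_929, 28_208_212, 21_650_413, 18_654_576, 15_657_928,
    13_762_895, 13_588_620, 12_494_459, 10_858_000, 8_838_674, 6_961_562,
    6_118_542, 6_047_042, 5_582_217, 5_413_425, 5_289_414, 4_758_773,
    4_540_656, 3_659_698, 3_653_526, 3_569_718, 3_523_869, 3_512_212,
    3_496_009, 3_246_779, 3_241_869, 3_193_178, 2_876_455, 2_834_373,
]
DEATHS = [
    926_287, 512_344, 644_286, 133_958, 160_610, 347_031,
    121_603, 92_719, 153_190, 98_462, 125_451, 135_276,
    21_489, 138_106, 110_157, 315_688, 146_798, 104_932,
    22_000, 98_804, 55_763, 10_003, 38_335, 30_015,
    209_468, 32_390, 36_041, 20_866, 41_491, 39_605,
]


def box_summary(values):
    """Quartiles, box-half lengths and 1.5 * IQR outliers for a box plot."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "q1": q1, "median": median, "q3": q3,
        # Lengths of the two halves of the box (left: Q1 to median, right: median to Q3).
        "left_half": median - q1,
        "right_half": q3 - median,
        "outliers": sorted(v for v in values if v < low_fence or v > high_fence),
    }


for name, values in (("Active Cases", ACTIVE_CASES), ("Deaths", DEATHS)):
    b = box_summary(values)
    print(f"{name}: Q1 = {b['q1']:,.2f}, median = {b['median']:,.1f}, Q3 = {b['q3']:,.2f}")
    print(f"  box halves: left = {b['left_half']:,.1f}, right = {b['right_half']:,.1f}")
    print(f"  outliers: {b['outliers']}")
```

On this data, the right half of the box is longer than the left half for Active Cases but shorter for Deaths, while both variables have high outliers, including the United States.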
