ACTIVITIES/ASSESSMENTS: Read each item carefully. Write the answer on the yellow
paper. Answers Only.
I. A research objective is presented. For each, identify the (A) population and (B) sample in
the study.
1. A polling organization contacts 2141 male university graduates who have a white-collar
job and asks whether or not they had received a raise at work during the past 4 months.
2. Every year the PSA releases the Current Population Report based on a survey of 50,000
households. The goal of this report is to learn the demographic characteristics, such as
income, of all households within the Philippines.
3. Researchers want to determine whether or not higher folate intake is associated with a
lower risk of hypertension (high blood pressure) in women (27 to 44 years of age). To make
this determination, they look at 7373 cases of hypertension in these women and find that
those who consume at least 1000 micrograms per day of total folate had a decreased risk
of hypertension compared with those who consume less than 200.
B. The 7373 women with hypertension, ages 27 to 44 years old, whose cases were
examined.
II. Indicate whether the following statements require the use of descriptive or inferential
statistics.
Inferential Statistics 1. A teacher wants to know the attitudes of all students towards
abortion.
Descriptive Statistics 2. A market analyst of a sales firm draws a chart showing the
sales figures of a given product for the period 2006-2007.
Inferential Statistics 3. A forecaster predicts the results of an election using the
number of votes cast in 15 out of 25 barangays.
Descriptive Statistics 9. Records indicated that 75% of the faculty in the graduate
school are doctoral degree holders.
III. Identify the qualitative and quantitative variables and indicate the highest level of
measurement required in each. If quantitative, classify whether discrete or continuous.
Qualitative-nominal 1. Occupation
1.
MODULE 2: DATA COLLECTION AND BASIC CONCEPTS IN SAMPLING DESIGN
ACTIVITIES/ASSESSMENTS:
Secondary 2. Dictionary
Primary 3. Artifact
Primary 6. Enrile's diary describing what he thought about World War II.
Primary 8. Speeches
Secondary 9. Newspaper
II. Determine the sample size of the following problems. Show your solution.
2. The administration at a college wishes to estimate, the proportion of all its entering
freshmen who graduate within four years, with 95% confidence. Estimate the
minimum size sample required. Assume 1. That the population standard deviation is
σ = 1.3 and precision level is 0.05.
n ≥ (z/e)^2 p(1 − p)
n≥ (1.65/.01) ^2 (.12(1-.12))
n≥ 165^2 x .1056
n≥ 27,225 x .1056
n≥ 2875
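The arithmetic above can be checked with a short script. This is a minimal sketch, assuming the formula n ≥ (z/e)^2 p(1 − p) and the values used in the solution (z = 1.65, e = 0.01, p = 0.12); the function name `sample_size_for_proportion` is hypothetical.

```python
import math


def sample_size_for_proportion(z: float, e: float, p: float) -> int:
    """Smallest whole n satisfying n >= (z / e)^2 * p * (1 - p)."""
    return math.ceil((z / e) ** 2 * p * (1 - p))


# Values taken from the worked solution above.
print(sample_size_for_proportion(z=1.65, e=0.01, p=0.12))
```

Rounding up with `math.ceil` reflects that a sample size must be a whole number at least as large as the computed bound.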
4. An internet service provider wishes to estimate, to within one percentage error, the
current proportion of all email that is spam, with 85% confidence. Last year the
proportion that was spam was 71%. Estimate the minimum size sample required if
the total email that is spam is 10,000.
n ≥ N / (1 + N e^2)
n ≥ 10,000/ (1+10,000(.01^2))
n ≥ 10,000/2
n ≥ 5,000
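The same finite-population formula, n ≥ N / (1 + N e^2), can be sketched in code. The function name `slovin_sample_size` is hypothetical, and the inputs are the ones used in the solution (N = 10,000, e = 0.01).

```python
def slovin_sample_size(population: int, margin_of_error: float) -> float:
    """Raw value of N / (1 + N * e^2); round up to get a usable sample size."""
    return population / (1 + population * margin_of_error ** 2)


# Values from the worked solution above.
print(slovin_sample_size(10_000, 0.01))
```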
III. Determine the type of sampling. (ex. Simple Random Sampling, Purposive Sampling)
Systematic Random Sampling 6. A tax auditor selects every 1000th income tax return
that is received.
Multi-stage Sampling 7. For a survey, a sample of municipalities was selected
from every province in the country and included all child laborers in the selected
municipalities.
Stratified Random Sampling 9. A college official divides the student population into five
classes: freshman, sophomore, junior, senior, and graduate student. The official takes a
simple random sample from each class and asks the members' opinions regarding student
services.
Simple Random Sampling 10. In the game of lotto, 6 balls are selected from a container
with 42 balls.
IV. Using proportional allocation, determine the sample size needed for every school. The
total population of students is 10,679, and the minimum sample is 2,450.
IV. SOLUTION
ACTIVITIES/ASSESSMENTS:
- For me, the second bar graph is more informative because it separates the levels of
likelihood into distinct categories and provides a legend for the corresponding data. The
graphic's axes are titled and labeled clearly. It also includes the units of measurement and
an appropriate data source.
2. What features of the ‘Good Presentation’ make it better than the ‘Bad Presentation’?
- The good presentation is better than the bad presentation because it is more organized
and neater to look at. Its graph shows more detailed and specific data, so readers can
interpret it easily. The variables on the left-side graph of the bad presentation lack
detail: it only lists the wages in one-line form, and the range of the specified quantity
on the right-side graph is very difficult to estimate from data presented that way. The
good presentation has its title at the top, variables that correspond to the subjects of
the data obtained, and clear labels.
2. What percentage of the employees are both internal and rated ‘Very Good’?
- The percentage of employees who earn less than or equal to 80,000 is
78%.
- The salary category that includes the most employees is the
61,000-70,000 salary range.
5. The length of life of an instrument produced by a machine has a normal distribution with
a mean of 12 months and standard deviation of 2 months. Find the probability that an
instrument produced by this machine will last
A. less than 7 months.
μ =12months
σ =2 months
z=(X−μ)/σ
For, Z=(7−12)/2=−2.5
P(X<7)= P(Z<−2.5)
=0.0062 (using z-score table)
B. between 7 and 12 months.
=P(−2.5<Z<0)
=0.5−0.0062
=0.4938 (using z-score table)
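The table lookups for this problem can be cross-checked numerically. This is a minimal sketch that evaluates the standard normal CDF through `math.erf`; the helper name `phi` is hypothetical.

```python
import math


def phi(z: float) -> float:
    """Standard normal cumulative probability P(Z < z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


mu, sigma = 12.0, 2.0              # instrument life in months
z_7 = (7 - mu) / sigma             # z-score for X = 7
p_less_than_7 = phi(z_7)           # part A
p_between_7_and_12 = phi(0.0) - phi(z_7)  # part B, since X = 12 is the mean

print(round(p_less_than_7, 4), round(p_between_7_and_12, 4))
```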
Be sure to draw a normal curve with the area corresponding to the probability shaded.
μ =266
σ =16
z=(X−μ)/σ
Z=(270−266)/16=0.25
P(X>270) =P(Z>0.25)
= 0.4013
μ =266
σ =16
z=(X−μ)/σ
Z=(250−266)/16=-1
P(X<250) =P(Z<-1)
= 0.1587
C. What proportion of pregnancies lasts between 240 and 280 days?
μ =266
σ =16
z=(X−μ)/σ
D. What is the probability that a randomly selected pregnancy lasts more than 280
days?
The probability that a randomly selected pregnancy lasts more than 280
days is 0.1908. The probability that a randomly selected pregnancy
lasts fewer than 240 days is 0.0521.
μ =266
σ =16
z=(X−μ)/σ
Z=(280−266)/16=0.875
P(X>280) =P(Z>0.875)
= 0.1908
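The pregnancy-length probabilities can be reproduced without a z-table. This is a sketch using `statistics.NormalDist` from the Python standard library (Python 3.8 or later), with μ = 266 and σ = 16 as given; the variable names are illustrative only.

```python
from statistics import NormalDist

pregnancy = NormalDist(mu=266, sigma=16)  # length of pregnancy in days

p_more_than_270 = 1 - pregnancy.cdf(270)
p_less_than_250 = pregnancy.cdf(250)
p_between_240_and_280 = pregnancy.cdf(280) - pregnancy.cdf(240)
p_more_than_280 = 1 - pregnancy.cdf(280)

for label, p in [
    ("P(X > 270)", p_more_than_270),
    ("P(X < 250)", p_less_than_250),
    ("P(240 < X < 280)", p_between_240_and_280),
    ("P(X > 280)", p_more_than_280),
]:
    print(label, round(p, 4))
```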
Be sure to draw a normal curve with the area corresponding to the probability shaded.
x = (LC + UP)/2
x = (26 + 30)/2 = 28
x = (46 + 50)/2 = 48
Class     f    x    fx
31 - 35   10   33   330
36 - 40   16   38   608
41 - 45   18   43   774
46 - 50   18   48   864
Total     n = 75    2,940
Mean:
x̄ = 2,940/75
x̄ = 39.2
Class     f    LCB    <cf
31 - 35   10   30.5   23
36 - 40   16   35.5   39
41 - 45   18   40.5   57
46 - 50   18   45.5   75
Total     n = 75
n/2 = 75/2 = 37.5
Median:
x̃ = 35.5 + (37.5 − 23)(5)/16 = 40.03
Modal class:
d1 = 16 − 10 = 6
d2 = 16 − 18 = −2
Mode = 35.5 + [6/(6 + (−2))](5) = 43
Measures of Variations:
x̄ = 2,940/75
x̄ = 39.2
(1) Standard Deviation
s = √(3,692/(75 − 1))
s = 7.06341
(2) Variance
s² = 3,692/(75 − 1)
s² = 49.89
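The grouped mean, variance, and standard deviation can be recomputed from class midpoints and frequencies. This is a sketch only. The first class (26-30, frequency 13) does not appear in the table rows shown. It is inferred from the midpoint example, the total n = 75, and the sum 2,940, so treat it as an assumption.

```python
import math

# Class midpoints and frequencies. The first pair (28, 13) is an assumption
# inferred from n = 75 and the fx total of 2,940; the rest come from the table.
midpoints = [28, 33, 38, 43, 48]
freqs = [13, 10, 16, 18, 18]

n = sum(freqs)
mean = sum(f * x for f, x in zip(freqs, midpoints)) / n
sum_sq_dev = sum(f * (x - mean) ** 2 for f, x in zip(freqs, midpoints))
variance = sum_sq_dev / (n - 1)   # sample variance
std_dev = math.sqrt(variance)     # sample standard deviation

print(n, mean, round(sum_sq_dev, 2), round(variance, 2), round(std_dev, 5))
```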
Q1
nk/4 = (75)(1)/4 = 18.75
Q1 = 30.5 + (18.75 − 13)(5)/10
Q1 = 33.375
D9
nk/10 = (75)(9)/10 = 67.5
D9 = 45.5 + (67.5 − 57)(5)/18
D9 = 48.417
P10
nk/100 = (75)(10)/100 = 7.5
P10 = 25.5 + (7.5 − 0)(5)/13
P10 = 28.385
Q3
nk/4 = (75)(3)/4 = 56.25
Q3 = 40.5 + (56.25 − 39)(5)/18
Q3 = 45.292
P90
nk/100 = (75)(90)/100 = 67.5
P90 = 45.5 + (67.5 − 57)(5)/18
P90 = 48.417
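Every quartile, decile, and percentile above uses the same interpolation: lower class boundary + (position − cumulative frequency before the class) × class width / class frequency. The sketch below wraps that rule in a hypothetical helper, `grouped_quantile`. As before, the first class (lower boundary 25.5, frequency 13) is inferred from the P10 computation and n = 75 rather than read from the table.

```python
def grouped_quantile(position, lower_boundaries, freqs, width):
    """Interpolate inside the first class whose cumulative frequency reaches position."""
    cum_before = 0
    for boundary, f in zip(lower_boundaries, freqs):
        if cum_before + f >= position:
            return boundary + (position - cum_before) * width / f
        cum_before += f
    raise ValueError("position exceeds the total frequency")


# First class (25.5, 13) is an assumption inferred from the P10 computation.
lower_boundaries = [25.5, 30.5, 35.5, 40.5, 45.5]
freqs = [13, 10, 16, 18, 18]
n, width = sum(freqs), 5

q1 = grouped_quantile(n * 1 / 4, lower_boundaries, freqs, width)
median = grouped_quantile(n / 2, lower_boundaries, freqs, width)
q3 = grouped_quantile(n * 3 / 4, lower_boundaries, freqs, width)
p10 = grouped_quantile(n * 10 / 100, lower_boundaries, freqs, width)
p90 = grouped_quantile(n * 90 / 100, lower_boundaries, freqs, width)

print(round(q1, 3), round(median, 2), round(q3, 3), round(p10, 3), round(p90, 3))
```

D9 and P90 share the position 67.5, which is why the two results coincide.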
Skewness
Sk = 3(39.2 − 40.03)/7.06341
Sk = −0.35252
Kurtosis
QD = (Q3 − Q1)/2 = (45.292 − 33.375)/2 = 5.9585
k = QD/(P90 − P10) = 5.9585/(48.417 − 28.385) = 0.29745
BASED ON EXCEL:
B. Based on the raw data, compute measures of central tendency, measures of
variation, Skewness and kurtosis using Excel
- In this example, we want a data set with a large mean value and a
small standard deviation. The computed values for the grouped and
ungrouped data differ from each other: the grouped data have a
larger mean and a smaller standard deviation than the ungrouped
data. Therefore, the grouped data are preferred here.
A. Compute the sample standard deviation and sample mean of Data Set I.
x̄ = 35/11 ≈ 3.18
x      5     -2     6     14     -3     0      1      4     3      2      5
x−x̄   1.82  -5.18  2.82  10.82  -6.18  -3.18  -2.18  0.82  -0.18  -1.18  1.82
Σ(x−x̄)² = (1.82)² + (−5.18)² + (2.82)² + (10.82)² + (−6.18)² + (−3.18)² + (−2.18)²
+ (0.82)² + (−0.18)² + (−1.18)² + (1.82)² ≈ 213.64
s² = Σ(x−x̄)²/(n−1) = 213.64/10 ≈ 21.36
s = √21.36 ≈ 4.62
B. Form a new data set, Data Set II, by adding 3 to each number in Data Set I.
Calculate the sample standard deviation and sample mean of Data Set II.
Data Set I. 5, -2, 6, 14, -3, 0, 1, 4, 3, 2, 5 (adding 3 each number for Data Set II)
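Data Set II. 8, 1, 9, 17, 0, 3, 4, 7, 6, 5, 8
x̄ = 68/11 ≈ 6.18
x      8     1      9     17     0      3      4      7     6      5      8
x−x̄   1.82  -5.18  2.82  10.82  -6.18  -3.18  -2.18  0.82  -0.18  -1.18  1.82
Σ(x−x̄)² = (1.82)² + (−5.18)² + (2.82)² + (10.82)² + (−6.18)² + (−3.18)² + (−2.18)²
+ (0.82)² + (−0.18)² + (−1.18)² + (1.82)² ≈ 213.64
s² = Σ(x−x̄)²/(n−1) = 213.64/10 ≈ 21.36
s = √21.36 ≈ 4.62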
C. Form a new data set, Data Set III, by subtracting 6 from each number in Data
Set I. Calculate the sample standard deviation and sample mean of Data Set III.
Data Set I. 5, -2, 6, 14, -3, 0, 1, 4, 3, 2, 5 (subtracting 6 in each number for Data
Set III)
Data Set III. -1, -8, 0, 8, -9, -6, -5, -2, -3, -4, -1
x̄ = −31/11 ≈ −2.82
x      -1    -8     0     8      -9     -6     -5     -2    -3     -4     -1
x−x̄   1.82  -5.18  2.82  10.82  -6.18  -3.18  -2.18  0.82  -0.18  -1.18  1.82
Σ(x−x̄)² = (1.82)² + (−5.18)² + (2.82)² + (10.82)² + (−6.18)² + (−3.18)² + (−2.18)²
+ (0.82)² + (−0.18)² + (−1.18)² + (1.82)² ≈ 213.64
s² = Σ(x−x̄)²/(n−1) = 213.64/10 ≈ 21.36
s = √21.36 ≈ 4.62
D. Comparing the answers to parts (a), (b), and (c), can you guess the pattern?
State the general principle that you expect to be true.
The sample standard deviation of all three data sets is the same. The pattern
is that the action “add or subtract the same number from every data point”
doesn’t change the standard deviation of the whole data set. This makes
sense, because that action doesn’t change the spread of the data, only
its location. It slides the data along the number line without changing its
shape.
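The principle stated in part D can be checked directly. This is a small sketch using `statistics.mean` and `statistics.stdev` (the sample standard deviation, with n − 1 in the denominator) on Data Set I and its two shifted copies; the variable names are illustrative.

```python
from statistics import mean, stdev

data_set_1 = [5, -2, 6, 14, -3, 0, 1, 4, 3, 2, 5]
data_set_2 = [x + 3 for x in data_set_1]   # add 3 to every value
data_set_3 = [x - 6 for x in data_set_1]   # subtract 6 from every value

for name, data in [("I", data_set_1), ("II", data_set_2), ("III", data_set_3)]:
    # The mean shifts by the added constant; the standard deviation does not.
    print(name, round(mean(data), 2), round(stdev(data), 2))
```

Using the unrounded mean matters here: rounding the mean before taking deviations changes the sum of squares and can make the three standard deviations look different.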
9. Using “Encoded Data file”, construct frequency distribution table for age, sex, marital
status and educational attainment and interpret the table.
Table 1 shows the frequency and percentage distribution of the respondents in terms of
sex. It can be gleaned from the data that, out of 75 respondents, 30 or 40% are male
while 45 or 60% are female.
Table 3 shows the frequency and percentage distribution of the respondents in terms of
marital status. The information from the table shows that 40 or 53.33% are single, 40%
of the 75 respondents are married, 4% are divorced or separated, and 2.67% are widowed.
Dependent 2. A political scientist wants to know how a random sample of 18- to 25-
year-olds feel about Democrats and Republicans in Congress. She obtains a random
sample of 1030 registered voters 18 to 25 years of age and asks; do you have
favorable/unfavorable opinion of the Democratic/ Republican party? Each individual
was asked to disclose his or her opinion about each party.
Independent 5. An urban economist believes that commute times to work in the South
are less than commute times to work in the Midwest. He randomly selects 40
employed individuals in the South and 45 employed individuals in the Midwest and
determines their commute times.
ACTIVITIES/ASSESSMENTS:
Solve the following problems. Make sure to follow the 6 steps procedure.
Normal Bone Density   (X − 938.3)   (X − 938.3333)²
Total                 0             130,083.3
Thus,
Osteoporosis   (X − 715.0)   (X − 715.0)²
Total          0             449,750
Thus,
ANOVA TABLE
Conclusion.
2. Some studies have shown that in the United States, men spend more than women
buying gifts and cards on Valentine’s Day. Suppose a researcher wants to test this
hypothesis by randomly sampling nine men and 10 women with comparable
demographic characteristics from various large cities across the United States to be
in a study. Each study participant is asked to keep a log beginning one month before
Valentine’s Day and record all purchases made for Valentine’s Day during that one-
month period. The resulting data are shown below. Use these data and a 1% level of
significance to test to determine if, on average, men actually do spend significantly
more than women on Valentine’s Day. Assume that such spending is normally
distributed in the population and that the population variances are equal.
x̄ 1=106.45 , x̄ 2=88.82
s1=26.80 and s2=23.90 ,
n1=12 and n2=12
Null hypothesis: the average spending of men on Valentine's Day is equal to the
average spending of women on Valentine's Day.
Alternative hypothesis: the average spending of men on Valentine's Day is greater
than the average spending of women on Valentine's Day.
Given,
x̄ 1=106.45, x̄ 2=88.82,n1=12,n2=12,s1=26.8,s2=23.9,α=0.01
H0: μ1 = μ2
H1: μ1 > μ2
Pooled variance:
sp² = [(n1 − 1)s1² + (n2 − 1)s2²]/(n1 + n2 − 2)
= [(12 − 1)(26.8)² + (12 − 1)(23.9)²]/(12 + 12 − 2) = 644.725
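The pooled variance can be recomputed and carried one step further to the test statistic. This is a sketch under the equal-variance assumption stated in the problem. It uses the summary values listed under "Given" and the standard pooled t formula, t = (x̄1 − x̄2)/√(sp²(1/n1 + 1/n2)), which the text above does not write out.

```python
import math

# Summary statistics as listed under "Given".
x1_bar, x2_bar = 106.45, 88.82
s1, s2 = 26.8, 23.9
n1, n2 = 12, 12

df = n1 + n2 - 2
pooled_variance = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / df

# Standard pooled two-sample t statistic (equal variances assumed).
t_stat = (x1_bar - x2_bar) / math.sqrt(pooled_variance * (1 / n1 + 1 / n2))

print(round(pooled_variance, 3), df, round(t_stat, 4))
```

The resulting t value would then be compared with the one-tailed critical value for df = 22 at α = 0.01 from a t-table.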
Alternative hypothesis:
The training course increases the teaching performance of teachers who attended
the training course. H1: µ1 < µ2
Conclusion:
There is sufficient evidence to support that the training course helped increase the
teaching performance of teachers who attended the training course.
4. A pediatrician wants to determine the relation that may exist between a child’s
height and head circumference. She randomly selects eleven 3- year-old children
from her practice, measures their heights and head circumference, and obtains the
data shown in the table below.
Select the data > INSERT > Recommended Charts > All Charts > Scatter X Y.
c) The correlation coefficient can be obtained using the excel function
=CORREL(array1, array2)
rc= 0.707
Sample   Height   Head Circumference
1        68.58    44.196
2        64.77    43.688
3        66.04    43.688
4        65.405   43.18
5        70.485   44.45
6        67.31    43.688
7        66.675   43.688
8        67.945   44.196
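For readers without Excel, the quantity that `=CORREL(array1, array2)` returns (the Pearson correlation coefficient) can be computed by hand. This is a sketch only: `pearson_r` is a hypothetical helper, and it uses just the eight rows visible in the table. The problem describes eleven children, so the printed value need not match the reported coefficient.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Only the eight rows shown in the table; three rows are not available here.
heights = [68.58, 64.77, 66.04, 65.405, 70.485, 67.31, 66.675, 67.945]
head_circumferences = [44.196, 43.688, 43.688, 43.18, 44.45, 43.688, 43.688, 44.196]

r = pearson_r(heights, head_circumferences)
print(round(r, 3))
```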
5. The following data represent the smoking status from a random sample of 1054 U.S.
residents 18 years or older by level of education.
Test whether smoking status and level of education are independent at the α = 0.05
level of significance.
Objective 1: Are smoking status and level of education independent?
Test used:
Attributes:
1. A: smoking status
2. B: Level of Education
Hypothesis:
To test,
Ho: Attributes A&B are independent
H1: Attributes A & B are not independent.
Data:
                     Smoking Status
Level of Education   Current   Former   Never   Total
<12                  178       88       208     474
12                   137       69       143     349
13-15                44        25       44      113
16 or more           34        33       51      118
Total                393       215      446     1054
Eij = Ai x Bj/ N
                     Smoking Status
Level of Education   Current    Former     Never      Total
<12                  176.7381   96.6888    200.5731   474
12                   130.13     71.1907    147.6793   349
13-15                42.13378   23.05028   47.81594   113
16 or more           43.9981    24.07021   49.93169   118
Total                393        215        446        1054
Oij²/Eij
                     Smoking Status
Level of Education   Current    Former     Never      Total
<12                  179.2709   80.092     215.702    475.0648
12                   144.2327   66.87671   138.469    349.5784
13-15                45.94888   27.11463   40.48859   113.5521
16 or more           26.27386   45.24265   52.09117   123.6077
Total                395.7263   219.326    446.7507   1061.803
Conclusion: There is not sufficient evidence to reject Ho. We therefore conclude
that smoking status and level of education are independent.
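The expected counts and the test statistic can be reproduced from the observed table. This is a minimal sketch of the chi-square test of independence using Eij = (row total × column total)/N. The critical value 12.592 (df = 6, α = 0.05) is taken from a standard chi-square table and is not stated in the text above.

```python
# Observed counts: rows are education levels (<12, 12, 13-15, 16 or more),
# columns are smoking status (Current, Former, Never).
observed = [
    [178, 88, 208],
    [137, 69, 143],
    [44, 25, 44],
    [34, 33, 51],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count for each cell under independence.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(row_totals) - 1) * (len(col_totals) - 1)
critical_value = 12.592  # chi-square table value for df = 6, alpha = 0.05

print(round(chi_square, 3), df, chi_square > critical_value)
```

Because the statistic falls below the critical value, the script agrees with the conclusion that Ho is not rejected.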
6. A pediatrician wants to determine the relation that may exist between a child’s height
and head circumference. She randomly selects eleven 3-year-old children from her
practice, measures their heights and head circumference, and obtains the data shown in
the table below.
From the correlation coefficient value we can conclude that a linear relationship
exists between height and head circumference.
The correlation coefficient can be obtained using the Excel function =CORREL(array1,
array2)
rc= 0.707
Yes, there appears to be a positive linear association because r is positive and is greater
than the critical value.
1   68.58    44.196
2   64.77    43.688
3   66.04    43.688
4   65.405   43.18
5   70.485   44.45
6   67.31    43.688
7   66.675   43.688
8   67.945   44.196
The correlation coefficient is obtained using the excel function =CORREL(array1, array2)
Directions: Read each item carefully. Write the letter corresponding to the best answer on a
yellow paper on each item. Write NONE if no correct choice is given. Make sure to write
also your solution.
PROBLEM SOLVING
A. The PUPCET scores for the math portion of the test were normally
distributed, with a mean of 23.4 and a standard deviation of 4.8. Find the
probability that a randomly selected student who took the math portion of
the PUPCET has a score that is
μ = 23.4
σ = 4.8
z = (X−μ)/σ
Z = (18−23.4)/4.8 = −1.125
P(X<18) = P(Z<−1.125)
= 0.13029 (using z-score table)
(a) Mean
x̄ = 9,715/50
x̄ = 194.3
(b) Median
n/2 = 50/2 = 25
x̃ = 179.5 + (25 − 15)(20)/13
x̃ = 194.88
(c) Mode
d1 = 13 − 10 = 3
d2 = 13 − 12 = 1
Mode = 179.5 + [3/(3 + 1)](20)
Mode = 194.5
(d) Standard Deviation
s = √(47,648/(50 − 1))
s = 31.18
(e) Q1
nk/4 = (50)(1)/4 = 12.5
Q1 = 159.5 + (12.5 − 10)(20)/5
Q1 = 169.5
(f) Q3
nk/4 = (50)(3)/4 = 37.5
Q3 = 199.5 + (37.5 − 28)(20)/12
Q3 = 215.33
(g) D1
nk/10 = (50)(1)/10 = 5
D1 = 139.5 + (5 − 0)(20)/10
D1 = 149.50
(h) D9
nk/10 = (50)(9)/10 = 45
D9 = 219.5 + (45 − 40)(20)/5
D9 = 239.50
(i) P10
nk/100 = (50)(10)/100 = 5
P10 = 139.5 + (5 − 0)(20)/10
P10 = 149.50
(j) P90
nk/100 = (50)(90)/100 = 45
P90 = 219.5 + (45 − 40)(20)/5
P90 = 239.50
(k) Skewness
Sk = 3(194.30 − 194.88)/31.18
Sk = −0.05581
(l) Kurtosis
QD = (Q3 − Q1)/2 = (215.33 − 169.50)/2 = 22.915
k = QD/(P90 − P10) = 22.915/(239.50 − 149.50) = 0.25461
(a) What percentage of couples married seven years has two children?
(b) What percentage of couples married seven years has at least two
children?
D. The ACT is a college entrance exam. ACT has determined that a score of
22 on the mathematics portion of the ACT suggests that a student is ready
for college-level mathematics. To achieve this goal, ACT recommends that
students take a core curriculum of math courses: Algebra I, Algebra II, and
Geometry. Suppose a random sample of 2020 students who completed this core set
of courses results in a mean ACT math score of 22.6 with a standard
deviation of 3.9. Do these results suggest that students who complete the
core curriculum are ready for college-level mathematics? That is, are they
scoring above 22 on the math portion of the ACT?
The smaller the p-value, the greater the evidence against the null
hypothesis. If the p-value is 0.001, the result is highly statistically significant
and we can reject Ho.
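The test statistic and p-value for the ACT problem can be sketched as follows. This assumes a one-sided test of Ho: μ = 22 against H1: μ > 22, takes the sample size of 2020 exactly as printed in the problem, and uses the normal approximation, which is reasonable for a sample this large. None of these steps are shown in the text above.

```python
import math
from statistics import NormalDist

mu_0 = 22        # hypothesized mean (readiness benchmark)
x_bar = 22.6     # sample mean
s = 3.9          # sample standard deviation
n = 2020         # sample size as printed in the problem statement

standard_error = s / math.sqrt(n)
z_stat = (x_bar - mu_0) / standard_error

# One-sided p-value for H1: mu > 22, using the normal approximation.
p_value = 1 - NormalDist().cdf(z_stat)

print(round(z_stat, 2), p_value < 0.001)
```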