Biostatistics
Lesson 2
Sampling, Numerical Presentation of Data
N. Yassine Biostatistics 1
Objective
Why Sampling?
• When the population is too large
• When the time required to study the entire population is too
long
• When the cost of studying the entire population is too high
• When it is very difficult or even impossible to survey, measure, or
locate the individuals or objects of the population
• When the objects of the population must be destroyed when
examined
• When the sample results are adequate estimates of the
population parameters
Sampling Methods
• Samples must be representative and randomly selected
• Simple random sampling: Each member, or each group of a
particular size, of the population has the same probability of
being selected
• Systematic sampling: Sample members are selected
according to a starting point and a fixed interval whose
length is the quotient obtained when dividing the population
size by the required sample size.
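As a minimal sketch of simple random sampling, assume the population is available as a list of identifiers. The names `population`, `sample_size`, and `simple_random_sample` are illustrative and do not come from the lesson. Python's standard `random.sample` draws without replacement, so every group of the requested size is equally likely to be chosen:

```python
import random

def simple_random_sample(population, sample_size, seed=None):
    """Draw sample_size distinct members; every subset of that size is equally likely."""
    rng = random.Random(seed)  # a fixed seed only makes the draw reproducible
    return rng.sample(population, sample_size)

# Hypothetical population of 5000 numbered individuals.
population = list(range(1, 5001))
srs = simple_random_sample(population, 500, seed=1)
print(len(srs), len(set(srs)))  # 500 500 (no member is selected twice)
```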
Systematic sampling
Himaya NGO is seeking to form a systematic sample of 500
volunteers from a population of 5000.
1. Calculate and fix the sampling interval. (The number of
elements in the population divided by the number of
elements needed for the sample.)
The interval is N/n = 5000/500 = 10
2. Choose a random starting point between 1 and the
sampling interval.
Here, suppose the starting point is the first observation.
3. Lastly, repeat the sampling interval to choose subsequent
elements.
Hence, they select every 10th person in the population to
build a sample systematically.
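The three steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not the NGO's actual procedure: the volunteers are represented by hypothetical ID numbers 1 to 5000, and `systematic_sample` is a made-up helper name.

```python
import random

def systematic_sample(population, sample_size, start=None, seed=None):
    """Select every k-th member, k = N // n, beginning at 1-based position `start`."""
    interval = len(population) // sample_size              # step 1: k = N / n
    if start is None:
        start = random.Random(seed).randint(1, interval)   # step 2: random start in 1..k
    # step 3: take positions start, start + k, start + 2k, ...
    return population[start - 1::interval][:sample_size]

volunteers = list(range(1, 5001))        # hypothetical IDs 1..5000
sample = systematic_sample(volunteers, 500, start=1)
print(len(sample), sample[:3], sample[-1])   # 500 [1, 11, 21] 4991
```

Passing `start=None` picks the starting point at random between 1 and the interval, as step 2 describes; `start=1` reproduces the choice made on the slide.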
Sampling Methods
• Stratified sampling: Population members are grouped
according to certain characteristics (age group, gender,
location, …). From each group, or stratum, a certain number of
members are selected based on simple random or systematic
sampling.
- Divide the population into strata
- Select a random sample from each stratum
- The sample size from each stratum should be proportional to the
stratum's share of the population
Stratified Sampling
The distribution of employees in a certain
company:
male, full-time: 90
male, part-time: 18
female, full-time: 9
female, part-time: 63
A stratified sample of 40 staff is to be selected according to the
above categories.
Stratified Sampling
% male, full-time = 90 ÷ 180 = 50%
% male, part-time = 18 ÷ 180 = 10%
% female, full-time = 9 ÷ 180 = 5%
% female, part-time = 63 ÷ 180 = 35%
Hence, in the stratified sample of 40:
50% (20 individuals) should be male, full-time.
10% (4 individuals) should be male, part-time.
5% (2 individuals) should be female, full-time.
35% (14 individuals) should be female, part-time.
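The proportional allocation above can be checked with a short Python sketch. The dictionary keys simply restate the four categories from the example; `allocation` is an illustrative name. Rounding is an assumption that works here because every share is a whole number; in general, rounded allocations may need a small adjustment so that they sum to the sample size.

```python
strata = {
    "male, full-time": 90,
    "male, part-time": 18,
    "female, full-time": 9,
    "female, part-time": 63,
}
sample_size = 40
total = sum(strata.values())  # 180 employees in all

# Proportional allocation: sample_size * (stratum size / population size).
allocation = {name: round(sample_size * count / total)
              for name, count in strata.items()}

for name, n_h in allocation.items():
    print(f"{name}: {strata[name] / total:.0%} of staff -> {n_h} sampled")
```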
Sampling Methods
• Cluster sampling: When a list of population members is not available
or the population members are spread across dispersed geographical
locations (clusters), the two-stage process of cluster sampling can be
used. First, a sample of clusters is selected. Second, members are
sampled from within the selected clusters.
Example: An organization aims to survey the performance of Moodle across
Germany. It can divide the country's population into cities
(clusters), select a sample of those cities (for example, the most
populous ones), and then survey the Moodle users within the selected cities.
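A minimal sketch of the two-stage idea, assuming the clusters are given as a mapping from cluster name to its members. The city and user names below are invented placeholders, and choosing clusters at random is one possible selection rule, not necessarily the one the organization would use.

```python
import random

def two_stage_cluster_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """Stage 1: pick clusters at random. Stage 2: sample members inside each one."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)            # stage 1
    return {
        name: rng.sample(clusters[name], min(n_per_cluster, len(clusters[name])))
        for name in chosen                                       # stage 2
    }

# Hypothetical data: 20 cities with 100 users each.
clusters = {f"city_{i}": [f"city_{i}_user_{j}" for j in range(100)]
            for i in range(20)}
picked = two_stage_cluster_sample(clusters, n_clusters=4, n_per_cluster=10, seed=0)
print({name: len(members) for name, members in picked.items()})
```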
Frequency Tables and Graphical Presentation:
Qualitative Data
• Bar Graph
• Pie Chart
Frequency Table
A frequency table is a grouping of qualitative data into mutually
exclusive and collectively exhaustive classes showing the number of
observations in each class.
Bar Graph
Pie Chart
Example 2.1
The following are the blood groups of a sample of 50 patients admitted
to ER in the last 24 hours:
O-, O+, O-, O+, A+, O-, O+, B-, O-, AB+, O+, B-, A+, O+, O+, A+, O+, O+,
O+, B-, AB+, O+, A+, O-, O+, O-, O+, O-, A+, O-, O+, O-, AB+, O+, B-, A+,
O-, O+, A+, O+, O-, O+, AB+, O+, O+, A+, O-, B-, AB+, O-
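The frequency table for these 50 observations can be tallied with Python's standard `collections.Counter`. This is a sketch for checking the counts; the variable names are illustrative.

```python
from collections import Counter

raw = """
O-, O+, O-, O+, A+, O-, O+, B-, O-, AB+, O+, B-, A+, O+, O+, A+, O+, O+,
O+, B-, AB+, O+, A+, O-, O+, O-, O+, O-, A+, O-, O+, O-, AB+, O+, B-, A+,
O-, O+, A+, O+, O-, O+, AB+, O+, O+, A+, O-, B-, AB+, O-
"""
groups = [g.strip() for g in raw.split(",")]   # one label per patient
counts = Counter(groups)                       # frequency of each blood group
n = len(groups)

print(f"{'Group':<6}{'Freq':>5}{'Percent':>9}")
for group, freq in counts.most_common():
    print(f"{group:<6}{freq:>5}{freq / n:>9.0%}")
```

The counts (O+ 19, O- 13, A+ 8, AB+ 5, B- 5) are the values plotted in the bar chart and pie chart for this example.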
Frequency Table
Bar Chart
[Bar chart "Blood Group": frequency by blood group. O+: 19, O-: 13, AB+: 5, A+: 8, B-: 5]
Pie Chart
[Pie chart "Blood-Groups": O+ 38%, O- 26%, A+ 16%, AB+ 10%, B- 10%]
Frequency Distributions and Graphical
Presentation: Quantitative Data
• Quantitative data can be summarized using:
Frequency Distributions: Quantitative Data
• Frequency distribution: A grouping of quantitative data into mutually
exclusive and collectively exhaustive classes showing the number of
observations in each class.
• Class lower limit: smallest value in the class (included).
• Class upper limit: upper boundary of the class (excluded); it is the
lower limit of the next class.
• Class frequency: number of observations that are greater than or
equal to the lower limit and less than the upper limit of the class.
• Class relative frequency: class frequency divided by the total number
of observations.
• Class interval: difference between its upper and lower limits.
• Class midpoint: average of its upper and lower limits.
Example 2.2
• Refer to Example 1.1. Construct a frequency distribution whose classes
represent the letter grades A, B, C, D, and F.
66 80 73 58 63 77
72 52 86 65 75 67
67 69 73 70 68 66
80 75 78 67 73 61
66 72 65 71 69 82
83 72 78 74 60 91
81 84 90 73 64 68
87 82 88 76 85 68
68 79 79 68 75 72
65 66 80 59 75 64
Set the individual class limits, frequency table
A frequency distribution representing letter grades must have a class
interval of 10. Since no grade in the data is less than 50, the first class
has a lower limit of 50. The frequency distribution is:
Class          Frequency   Relative Frequency   % Frequency
50 up to 60         3            0.050              5.0 %
60 up to 70        22            0.367             36.7 %
70 up to 80        21            0.350             35.0 %
80 up to 90        12            0.200             20.0 %
90 up to 100        2            0.033              3.3 %
Total              60            1.000            100.0 %
Example 2.2
Consider the first class.
• Class interval: 60 − 50 = 10.
• Midpoint: (50 + 60)/2 = 55.
• Frequency: 3, indicating that three grades are at least 50 and less
than 60.
• Relative frequency: 0.05 indicating that the proportion of grades
between 50 and 60 is 0.05.
• Percent frequency: 5% indicating that the percentage of grades
between 50 and 60 is 5%.
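The frequency, relative frequency, and percent frequency of every class in Example 2.2 can be reproduced with a short Python sketch. It assumes classes that include the lower limit and exclude the upper limit, as defined earlier; the variable names are illustrative.

```python
grades = [
    66, 80, 73, 58, 63, 77, 72, 52, 86, 65, 75, 67,
    67, 69, 73, 70, 68, 66, 80, 75, 78, 67, 73, 61,
    66, 72, 65, 71, 69, 82, 83, 72, 78, 74, 60, 91,
    81, 84, 90, 73, 64, 68, 87, 82, 88, 76, 85, 68,
    68, 79, 79, 68, 75, 72, 65, 66, 80, 59, 75, 64,
]
lower, width, n_classes = 50, 10, 5

freq = [0] * n_classes
for g in grades:
    # Integer division maps 50..59 to class 0, 60..69 to class 1, and so on.
    # (A grade of exactly 100 would need separate handling; none occurs here.)
    freq[(g - lower) // width] += 1

n = len(grades)
for i, f in enumerate(freq):
    lo = lower + i * width
    hi = lo + width
    print(f"{lo} up to {hi}: midpoint {(lo + hi) / 2}, "
          f"frequency {f}, relative {f / n:.3f}, percent {f / n:.1%}")
```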
2-K Rule
Example 2.3
• A random sample of 30 students was selected and the number of
hours each student studied last week was recorded.
• 15, 23, 19, 15, 18, 23, 14, 20, 13, 20, 17, 12, 20, 13, 31, 18, 29, 17, 18,
11, 26, 15, 15, 17, 33, 31, 23, 12, 27, 16.
a) Organize the data into a frequency distribution.
b) Find the relative frequency distribution and the percent frequency
distribution.
c) Find the cumulative frequency, relative cumulative frequency, and
percent cumulative frequency distributions.
Ex. 2.3: Number of Classes
• To find the smallest K satisfying 2^K > 30, consider the powers of 2:
K      1   2   3   4   5
2^K    2   4   8   16  32
• The smallest value satisfying 2^K > n is K = 5.
• Number of classes selected is 5.
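The search for the smallest K with 2^K > n can be written as a small loop. This is a sketch; `number_of_classes` is an illustrative helper name.

```python
def number_of_classes(n):
    """Return the smallest K such that 2**K > n."""
    k = 1
    while 2 ** k <= n:
        k += 1
    return k

print(number_of_classes(30))  # 5, since 2**4 = 16 <= 30 < 32 = 2**5
```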
Ex. 2.3: Class Interval and Class Limits
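• I ≥ (H − L)/K, where H and L are the highest and lowest values in the data
• I ≥ (33 − 11)/5
• I ≥ 4.4
• Rounding up to a convenient value gives I = 5, with the first class
starting at 10.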
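The class interval calculation can be sketched in Python using the study-hours data from Example 2.3. Rounding up to the next whole number and starting the first class at a multiple of the interval are assumptions, chosen because they reproduce the classes of width 5 beginning at 10 that the example uses.

```python
import math

hours = [15, 23, 19, 15, 18, 23, 14, 20, 13, 20, 17, 12, 20, 13, 31,
         18, 29, 17, 18, 11, 26, 15, 15, 17, 33, 31, 23, 12, 27, 16]
K = 5                                   # number of classes from the 2^K rule

H, L = max(hours), min(hours)           # 33 and 11
min_interval = (H - L) / K              # 4.4
interval = math.ceil(min_interval)      # assumed rounding: next whole number, 5
lower = (L // interval) * interval      # assumed start: multiple of the interval, 10

print(min_interval, interval, lower)
# The K classes must reach past the highest value.
print(lower + K * interval > H)
```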
Frequency Distribution
Study Hours    Frequency   Relative Frequency   % Frequency
10 up to 15         6             0.2               20 %
15 up to 20        12             0.4               40 %
20 up to 25         6             0.2               20 %
25 up to 30         3             0.1               10 %
30 up to 35         3             0.1               10 %
Total              30             1.0              100 %
Cumulative Frequency Distribution
Study Hours    Cumulative Frequency   Relative Cumulative Frequency   % Cumulative Frequency
10 up to 15             6                        0.2                          20 %
15 up to 20            18                        0.6                          60 %
20 up to 25            24                        0.8                          80 %
25 up to 30            27                        0.9                          90 %
30 up to 35            30                        1.0                         100 %
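Each cumulative frequency is the running total of the class frequencies up to and including that class. A minimal sketch using the standard `itertools.accumulate`, with the class frequencies of Example 2.3 entered by hand:

```python
from itertools import accumulate

classes = ["10 up to 15", "15 up to 20", "20 up to 25", "25 up to 30", "30 up to 35"]
freq = [6, 12, 6, 3, 3]            # class frequencies from Example 2.3
n = sum(freq)                      # 30 students

cum = list(accumulate(freq))       # running totals: [6, 18, 24, 27, 30]
for label, c in zip(classes, cum):
    print(f"{label:<13}{c:>4}{c / n:>6.1f}{c / n:>7.0%}")
```

The last cumulative frequency always equals the sample size, so the final relative cumulative frequency is 1.0.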