Week 2 - Principles in Sampling

BIOSTAT WEEK 2
TERMS IN PRINCIPLES OF SAMPLING
• Population (N) - the complete collection, or totality, of all possible values of the variable.
• Sample (n) - a subset or sub-collection of elements drawn from a population.

STAGES IN THE SELECTION OF SAMPLE
1. Define the target population.
2. Select a sampling frame.
3. Determine whether a probability or nonprobability sampling method will be chosen.
4. Plan the procedure for selecting sampling units.
5. Determine the sample size.
6. Select the actual sampling units from the population.
7. Conduct fieldwork.

DETERMINING THE SAMPLE
• There is no general rule regarding the sample size.
• However, the larger the sample as a percentage of the population, the higher the validity of the study.
• The bigger the population, the smaller the percentage of it that is taken as the sample.
• For a specific calculation of an adequate sample size, the use of Slovin's formula, presented below as given by Pagoso, is advised.
• Slovin's Formula - used to calculate the sample size, given the population size N and the margin of error e.
  Simplified Formula for Proportion:
  $n = \frac{N}{1 + Ne^2}$
  Where: N = population size; n = sample size; e = margin of error (percentage of error)

Note: If a sample is taken from a population, a formula must be used to take into account the confidence level and margin of error so that the adequacy of the sample can be assessed.

Examples: Determine the required sample size.
1. N = 10,000, e = 1%
   Solution: $n = \frac{N}{1 + Ne^2} = \frac{10{,}000}{1 + (10{,}000)(0.01)^2} = \frac{10{,}000}{2} = 5000$
2. N = 10,000, e = 5%
   Solution: $n = \frac{N}{1 + Ne^2} = \frac{10{,}000}{1 + (10{,}000)(0.05)^2} = \frac{10{,}000}{26} \approx 384.6$, rounded up to 385
3. N = 10,000, e = 10%
   Solution: $n = \frac{N}{1 + Ne^2} = \frac{10{,}000}{1 + (10{,}000)(0.1)^2} = \frac{10{,}000}{101} \approx 99.01$, rounded up to 100

SAMPLING TECHNIQUES
Probability Sampling (Non-biased) - a sample is selected in a systematic way, in which every element in the population has an equal chance of being included in the sample. It is a more complex, time-consuming, and costly approach.
• Simple random sampling - the elements of the sample are selected by lottery.
  - Gives every element in the target population, and each possible sample of a given size, an equal chance of being selected.
• Systematic sampling / Interval Random Sampling - done by assigning a number to every element in the population and taking every kth element as part of the sample.
  - To select a systematic sample of n elements from a population of N elements, divide the N elements of the population into n groups of k elements, where $k = N/n$ is the sampling interval.
  - The first element of the sample is selected at random; subsequent elements are selected at a fixed (systematic) interval until the desired sample size is reached.
  - The random start distinguishes this sampling procedure from its non-probability counterpart (non-random systematic sampling).
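Slovin's formula and the three worked examples above can be checked with a short Python sketch. It assumes that fractional results are rounded up to the next whole subject, which is what examples 2 and 3 do (384.6 becomes 385, 99.01 becomes 100), and it uses e = 0.05 as the default, following the "use 5% if no margin of error is given" note in these notes. The function name `slovin_sample_size` is a hypothetical helper, not part of any library.

```python
import math


def slovin_sample_size(N: int, e: float = 0.05) -> int:
    """Hypothetical helper: sample size from Slovin's formula n = N / (1 + N * e^2).

    The result is rounded UP to a whole subject, matching the worked examples
    (10,000 / 26 = 384.6 -> 385). The default e = 0.05 follows the note that
    5% is used when no margin of error is given.
    """
    if N <= 0:
        raise ValueError("population size N must be positive")
    if not 0 < e < 1:
        raise ValueError("margin of error e must be a proportion between 0 and 1")
    n = N / (1 + N * e ** 2)
    # Round away tiny floating-point noise first, so that a value such as
    # 5000.0000000001 is not pushed up to 5001 by math.ceil.
    return math.ceil(round(n, 6))


if __name__ == "__main__":
    for e in (0.01, 0.05, 0.10):
        print(f"N = 10,000, e = {e:.0%}: n = {slovin_sample_size(10_000, e)}")
```

Running it prints sample sizes of 5000, 385 and 100, the same values as the worked examples.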
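Simple random (lottery) sampling and systematic sampling with a random start, as described above, can be sketched with Python's standard `random` module. This is an illustration under stated assumptions: the population is an ordered list, the interval is k = N // n (integer division), and when N is not a multiple of n the last N - n*k units can never be chosen, which is one common simplification rather than the only way to handle the remainder. The function names are hypothetical.

```python
import random


def simple_random_sample(population, n, rng=random):
    """Hypothetical helper, lottery method: every subset of size n is equally likely."""
    return rng.sample(list(population), n)


def systematic_sample(population, n, rng=random):
    """Hypothetical helper: systematic (interval) random sampling with a random start.

    Assumption: interval k = N // n; units past position n * k are never
    reached when N is not an exact multiple of n.
    """
    units = list(population)
    N = len(units)
    if not 0 < n <= N:
        raise ValueError("sample size must satisfy 0 < n <= N")
    k = N // n                  # sampling interval: N units split into n groups of k
    start = rng.randrange(k)    # random start inside the first group of k units
    # Take the unit at the same position in each of the n groups.
    return [units[start + i * k] for i in range(n)]


if __name__ == "__main__":
    rng = random.Random(2024)          # fixed seed so the run is repeatable
    population = list(range(1, 101))  # hypothetical units numbered 1..100
    print("Simple random:", sorted(simple_random_sample(population, 10, rng)))
    print("Systematic:   ", systematic_sample(population, 10, rng))
```

Because the start is random, each unit in positions 1 to n*k has the same 1/k chance of selection, which is what separates this procedure from non-random systematic selection.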
• Cluster sampling - the population is divided into sections (or clusters), and some of these clusters are randomly selected as members of the sample.
  - Random selection is made among naturally occurring groupings.
  - Cluster - an aggregate or intact grouping of population elements.
• Stratified sampling - the population is subdivided into at least two different subpopulations (strata; singular: stratum) that share the same characteristics, and the elements of the sample are then drawn from each stratum proportionately.
  - Once the groups are formed, every stratum is represented in proportion to its size: the larger the class, the more sample we get from that class.
  - The target population is first separated into mutually exclusive, homogeneous segments (strata/subsets). A simple random sample is selected from each segment or stratum, and the samples selected from the various subsets are then combined into a single sample.
  - Also referred to as Quota Random Sampling.
  • Proportional Stratified Sampling - the number of elements allocated to the various strata is proportional to the representation of the strata in the target population.
    Note: Because each stratum's allocation is rounded, the total of the allocations should be within ±1 of the computed sample size.
  • Non-proportional Stratified Sampling - a.k.a. Disproportionate Stratified Sampling
    - The number of elements sampled from each stratum is not proportional to its representation in the total population.
    - The population elements are not given an equal chance to be included in the sample.

Note: If no margin of error is given, use 5% (e = 0.05).

Non-Probability Sampling (Biased) - the sample is not a proportion of the population. There is no system in selecting a sample; the selection depends on the situation.
  - There is no assurance that each item has a chance of being included in the sample. It assumes an even distribution of characteristics within the population, and thus that any sample would be representative.
• Purposive sampling (Judgmental / Authoritative) - the elements of the sample are selected according to set criteria or rules.
  - There are selection criteria.
  - Elements that do not meet the selection criteria are removed.
• Quota sampling - the sample size is limited to the required number of subjects in the study.
  - It considers the distribution of the population.
• Convenience sampling (Haphazard / Opportunity) - the sample is selected from a particular place at a preferred, specified time.
• Snowball sampling - the researcher asks respondents to give referrals to other possible respondents.
  - Used in very sensitive research.
  • Linear Snowball Sampling
  • Exponential Non-Discriminative Snowball Sampling
  • Exponential Discriminative Snowball Sampling
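Proportional stratified sampling combines two steps described above: allocate the sample to the strata in proportion to their sizes, then draw a simple random sample within each stratum and combine the results. This Python sketch assumes allocations are rounded half up (one convention among several) and uses three hypothetical strata of 150 units each, chosen so that the rounded allocations add up to 99 rather than 100, which illustrates the ±1 note. With many strata the rounding gap can exceed 1, so the check is a rule of thumb. Function names are hypothetical.

```python
import math
import random


def proportional_allocation(strata_sizes, n):
    """Hypothetical helper: n_h = n * N_h / N per stratum, rounded half up (assumed)."""
    N = sum(strata_sizes.values())
    return {name: math.floor(n * size / N + 0.5) for name, size in strata_sizes.items()}


def stratified_sample(strata, n, rng=random):
    """Hypothetical helper: simple random sample within each stratum, combined."""
    sizes = {name: len(units) for name, units in strata.items()}
    allocation = proportional_allocation(sizes, n)
    sample = []
    for name, units in strata.items():
        sample.extend(rng.sample(list(units), allocation[name]))
    return sample, allocation


if __name__ == "__main__":
    # Hypothetical strata: three groups of 150 units each (N = 450).
    strata = {
        "A": range(0, 150),
        "B": range(150, 300),
        "C": range(300, 450),
    }
    sample, allocation = stratified_sample(strata, 100, random.Random(7))
    total = sum(allocation.values())
    print(allocation, "total =", total)
    # Rounding makes the total drift from n; the notes accept a gap of +/-1.
    print("within +/-1 of n:", abs(total - 100) <= 1)
```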
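Cluster sampling, as described above (randomly select whole, naturally occurring groups), can be sketched in one stage: pick clusters at random and keep every member of each chosen cluster. Keeping all members is an assumption of this sketch; two-stage designs that subsample within clusters also exist. The ward names, patient IDs and the `cluster_sample` helper are hypothetical.

```python
import random


def cluster_sample(clusters, n_clusters, rng=random):
    """Hypothetical helper: randomly pick n_clusters intact clusters, keep all elements."""
    if not 0 < n_clusters <= len(clusters):
        raise ValueError("need 0 < n_clusters <= number of clusters")
    chosen = rng.sample(sorted(clusters), n_clusters)  # sorted for reproducibility
    return {name: list(clusters[name]) for name in chosen}


if __name__ == "__main__":
    # Hypothetical hospital wards (clusters) with patient IDs (elements).
    wards = {
        "Ward 1": [101, 102, 103],
        "Ward 2": [201, 202],
        "Ward 3": [301, 302, 303, 304],
        "Ward 4": [401, 402, 403],
    }
    picked = cluster_sample(wards, 2, random.Random(3))
    for ward, patients in picked.items():
        print(ward, patients)
```

Unlike stratified sampling, the randomness here is applied to whole groups, so the sample size depends on how large the chosen clusters happen to be.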
DATA PRESENTATION
Pictures of Data - depict the nature or shape of the data distribution. Their purpose is to tell others about a set of data quickly and to allow them to grasp its important characteristics.
• Graphs - visual aids to rapid understanding.

METHODS OF PRESENTING DATA
• Textual (Text Presentation)
  * Texts are systematically collected materials consisting of written, printed, or electronically printed words, typically either purposely written or transcribed from speech.
  * Used for qualitative data, e.g., interviews.
  - The main method of conveying information, as it is used to explain results and trends and to provide contextual information.
  - Data are fundamentally presented in paragraphs or sentences.
  - Used to provide interpretation or to emphasize certain data.
  - For instance, information about the incidence rates of delirium following anesthesia in 2016–2017 can be presented with a few numbers: "The incidence rate of delirium following anesthesia was 11% in 2016 and 15% in 2017; no significant difference in incidence rates was found between the two years."
  - If this information were presented in a graph or a table, it would occupy an unnecessarily large space on the page without enhancing the readers' understanding of the data.
NOTE: If the quantitative information to be conveyed consists of only one or two numbers, it is more appropriate to use written language than a table or graph. But if more data are to be presented, or if other information such as data trends is to be conveyed, a table or graph is more appropriate.
• Tabular (Table Presentation)
  * Data are structured into rows, each of which contains information about something, arranged in a table.
  - Conveys information that has been converted into words or numbers in rows and columns.
  - Tables are most appropriate for presenting individual information and can present both quantitative and qualitative information.
  - Useful for summarizing and comparing quantitative information of different variables; information with different units can be presented together.
  - A table is the simplest means of summarizing a set of observations and can be used for all types of numerical data.
• Graphical (Graph Presentation)
  * A way of analyzing numerical data; it exhibits the relations among data, ideas, or concepts in a diagram.
  - Graphs simplify complex information by using images and emphasizing data patterns or trends, and are useful for summarizing, explaining, or exploring quantitative data.

DESCRIPTIVE STATISTICS
- Can assume a number of different forms: tables, graphs, and numerical summary measures.
1. ORGANIZE DATA - means for organizing and summarizing observations; they provide an overview of the general features of a set of data.
• Tables
  • Frequency Distributions - each type of variable has its own properties, and the distribution of each type of variable has a particular shape and characteristics.
    - The distribution of a variable consists of a summary of the possible values the variable
      can have and the number of subjects with each of these values.
    - Understanding the shape and characteristics of a distribution gives an investigator greater insight and can help in answering the research question.
    - A frequency distribution is a table listing all classes and their frequencies.
    - For nominal and ordinal data, a frequency distribution consists of a set of classes or categories along with the numerical count that corresponds to each one.
    - To display discrete or continuous data in the form of a frequency distribution, break down the range of values of the observations into a series of distinct, non-overlapping intervals.
  • Intervals - often constructed so that they all have equal widths, which facilitates comparisons among the classes.

  Frequency Distribution | Probability Distribution
  Uses counts to describe the number of subjects with a particular value. | Uses proportions to describe the number of subjects with a particular value.

  When the data are presented as in the picture above, it is difficult to make sense of them quickly; hence, a summary of the data makes things easier.
  Note: In order to synthesize the information in a table or graph, it is important to count the number of observations in each category of the variable, thus obtaining its absolute frequency.
  • Frequency - the number of times that something occurs.
  • Relative Frequency Distributions - the proportion of the total number of observations that appears in each interval.
    - Computed by dividing the number of values within an interval by the total number of values in the table, then multiplying by 100% to obtain the percentage of values in the interval.
    - Useful for comparing sets of data that contain unequal numbers of observations.
    - We are often interested in the percentage of each class:
      $\text{Percentage} = \frac{\text{frequency of the class}}{\text{total number of observations}} \times 100$
  • Cumulative Relative Frequency - the percentage of the total number of observations that have a value less than or equal to the upper limit of the interval.
    - Calculated by summing the relative frequencies for the specified interval and all previous ones.
• Graphs
  - Bar Chart or Histogram
  - Stem-and-Leaf Plot
  - Frequency Polygon

2. SUMMARIZE DATA
• Central Tendency (or Groups' "Middle Values")
  - Mean
  - Median
  - Mode
• Variation (or Summary of Differences Within Groups)
  - Range
  - Interquartile Range
  - Variance
  - Standard Deviation

GRAPHICAL PRESENTATION OF DATA - FREQUENCY DISTRIBUTION
Categorical | Numerical
May be presented in a table or graph. | Can be displayed in a table, histogram, frequency polygon, scatter plot, and line chart.

A. Bar Charts - a popular type of graph used to display a frequency distribution for nominal or ordinal data.
  - In a bar chart, the various categories into which the observations fall are presented along a horizontal axis.
  - A vertical bar is drawn above each category such that the height of the bar represents either the frequency or the relative frequency of observations within that class.
  - The bars of a bar chart do not touch each other.
  - In a horizontal bar graph, the classes are displayed on the vertical axis and the frequencies of the classes on the horizontal axis.
B. Histogram - depicts a frequency distribution for discrete or continuous data. There are no spaces between the bars.
  - A bar graph in which the horizontal scale represents classes and the vertical scale represents frequencies.
  - The horizontal axis displays the true limits of the various intervals.
  - The true limits of an interval are the points that separate it from the intervals on either side.
  - The height of each bar marks the frequency associated with that interval.

BAR CHART | HISTOGRAM
Used to display "categorical data" - data that fit into categories. | Used to present "continuous data" - data that represent a measured quantity. The data are then collected into categories to present a histogram.

C. Pareto Chart - used to represent a frequency distribution for a categorical variable; frequencies are displayed by the heights of vertical bars, which are arranged in order from highest to lowest.
  - It is used when the variable displayed on the horizontal axis is qualitative or categorical.

HISTOGRAM | PARETO
Used when the data to be tallied are quantitative. | Used when the data to be tallied are qualitative.
Computation of average, variability, and changes over time is possible. | Computation of average and variability is not possible.
Can be used to display how bad the problem is. | Can be used to display which problem is the greatest and where it is.

D. Pie Chart - useful for comparing individual categories with the total.
E. Frequency Polygon - constructed by placing a point at the center of each interval such that the height of the point is equal to the frequency or relative frequency associated with that interval.
  - Points are also placed on the horizontal axis at the midpoints of the intervals immediately preceding and immediately following the intervals that contain observations.
  - The points are then connected by straight lines.
  - Similar to a histogram in many respects; it uses the same two axes as a histogram.
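The frequency distribution quantities from 1. ORGANIZE DATA (absolute frequency, relative frequency as a percentage, and cumulative relative frequency) can be computed with a short Python sketch. The ten values and the equal-width classes 10-19, 20-29 and 30-39 are hypothetical, class limits are treated as inclusive whole numbers, and `frequency_table` is a hypothetical helper name.

```python
def frequency_table(values, classes):
    """Hypothetical helper: rows of (class, frequency, relative %, cumulative relative %).

    classes: list of (lower, upper) inclusive limits in ascending order.
    """
    total = len(values)
    rows = []
    cumulative = 0.0
    for lower, upper in classes:
        freq = sum(1 for v in values if lower <= v <= upper)  # absolute frequency
        relative = freq * 100 / total          # percentage of all observations
        cumulative += relative                 # this class plus all previous ones
        rows.append((f"{lower}-{upper}", freq, relative, cumulative))
    return rows


if __name__ == "__main__":
    ages = [12, 15, 18, 21, 22, 25, 27, 30, 31, 35]    # hypothetical data
    classes = [(10, 19), (20, 29), (30, 39)]          # equal widths of 10
    print(f"{'Class':<8}{'f':>4}{'Rel %':>8}{'Cum %':>8}")
    for label, f, rel, cum in frequency_table(ages, classes):
        print(f"{label:<8}{f:>4}{rel:>8.1f}{cum:>8.1f}")
```

The last cumulative relative frequency is always 100%, which is a quick check that every observation fell into some class.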
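The frequency polygon rule in E above (a point at the midpoint of each interval, plus zero-height points at the midpoints of the empty intervals just before and after) can be turned into plot coordinates. This sketch assumes equal-width intervals given by their true limits; the limits 9.5-19.5, 19.5-29.5 and 29.5-39.5 (the true limits of whole-number classes 10-19, 20-29, 30-39) and the frequencies 3, 4, 3 are hypothetical, as is the helper name.

```python
def frequency_polygon_points(true_limits, freqs):
    """Hypothetical helper: (x, y) vertices of a frequency polygon, equal-width intervals."""
    if len(true_limits) != len(freqs):
        raise ValueError("need one frequency per interval")
    width = true_limits[0][1] - true_limits[0][0]
    midpoints = [(lo + hi) / 2 for lo, hi in true_limits]
    points = [(midpoints[0] - width, 0)]           # empty interval before the first
    points += list(zip(midpoints, freqs))          # one point per interval
    points.append((midpoints[-1] + width, 0))      # empty interval after the last
    return points


if __name__ == "__main__":
    limits = [(9.5, 19.5), (19.5, 29.5), (29.5, 39.5)]   # hypothetical true limits
    print(frequency_polygon_points(limits, [3, 4, 3]))
    # -> [(4.5, 0), (14.5, 3), (24.5, 4), (34.5, 3), (44.5, 0)]
```

Joining consecutive points with straight lines, in any plotting tool, gives the polygon; because it shares the histogram's axes, it can be drawn over a histogram of the same data.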
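For a Pareto chart (C above), the only extra step over a bar chart is sorting the categories by frequency from highest to lowest. The Python sketch below does that and also reports each category's percentage and a running cumulative percentage; many Pareto charts draw the cumulative percentage as a line, but these notes only require the descending bars. The complaint categories, counts and the `pareto_table` helper are hypothetical.

```python
def pareto_table(counts):
    """Hypothetical helper: categories sorted by frequency (highest first) with % and cumulative %."""
    total = sum(counts.values())
    ordered = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    rows, running = [], 0
    for category, freq in ordered:
        running += freq
        rows.append((category, freq, freq * 100 / total, running * 100 / total))
    return rows


if __name__ == "__main__":
    # Hypothetical patient-complaint counts (a categorical variable).
    complaints = {"Noise": 5, "Late medication": 12, "Cleanliness": 3, "Food": 8}
    for category, f, pct, cum in pareto_table(complaints):
        print(f"{category:<16}{f:>3}{pct:>7.1f}%{cum:>7.1f}%")
```

Python's `sorted` is stable, so categories with equal counts keep their original order.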

F. Scatter Plots
  • One-Way Scatter Plots - another type of graph that can be used to summarize a set of discrete or continuous observations.
    - Uses a single horizontal axis to display the relative position of each data point in the group.
    - Advantage: since each observation is represented individually, no information is lost.
    - Disadvantage: the graph may be difficult to read if many data points lie close together.
  • Box Plots - similar to one-way scatter plots in that they require a single axis; instead of plotting every observation, however, they display only a summary of the data.
    - Five-number summary: minimum, first quartile, median, third quartile, maximum.
  • Two-Way Scatter Plots - used to depict the relationship between two different continuous measurements.
    - Each point on the graph represents a pair of values.
    - The scale for one quantity is marked on the horizontal axis, or x-axis, and the scale for the other on the vertical axis, or y-axis.
G. Line Graphs - similar to a two-way scatter plot in that they can be used to illustrate the relationship between continuous quantities.
  - Each point on the graph represents a pair of values.
  - Adjacent points are connected by straight lines.
  - Useful for representing time-series data.
  - Useful for studying patterns and trends across data.
  - Also appropriate for representing not only time-series data but also data measured over the progression of a continuous variable such as distance.
  - Most commonly, the scale along the horizontal axis of a line graph represents time.
  - This lets us trace the chronological changes in the quantities on the vertical axis over a specified period of time.

OTHER PICTURES OF DATA
• Stem-and-Leaf Plot - a device for presenting quantitative data in a graphical format, similar to a histogram, to assist in visualizing the shape of the distribution.
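The summary measures listed under 2. SUMMARIZE DATA, and the five-number summary that a box plot draws, can be computed with Python's standard `statistics` module (Python 3.8 or later for `quantiles` and `multimode`). Two assumptions to flag: `variance` and `stdev` use the sample (n - 1) formulas, and quartiles are taken with `method="inclusive"`; textbooks and software use several quartile rules, so Q1 and Q3 may differ slightly from hand calculations. The data set and the `summary_measures` helper are hypothetical.

```python
import statistics


def summary_measures(data):
    """Hypothetical helper: central tendency, variation, and the box-plot five-number summary."""
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return {
        # Central tendency
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "mode": statistics.multimode(data),      # a list, in case of ties
        # Variation
        "range": max(data) - min(data),
        "iqr": q3 - q1,
        "variance": statistics.variance(data),   # sample variance, divides by n - 1
        "sd": statistics.stdev(data),            # sample standard deviation
        # Box plot: minimum, Q1, median, Q3, maximum
        "five_number": (min(data), q1, median, q3, max(data)),
    }


if __name__ == "__main__":
    scores = [2, 4, 4, 5, 7, 9, 10]   # hypothetical observations
    for name, value in summary_measures(scores).items():
        print(f"{name:>11}: {value}")
```

For this data set the five-number summary is (2, 4.0, 5.0, 8.0, 10), so a box plot would draw the box from 4 to 8 with the median line at 5.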
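A stem-and-leaf plot can be printed as plain text. This Python sketch assumes non-negative whole numbers, uses the tens digit(s) as the stem and the units digit as the leaf, and keeps empty stems so that gaps in the distribution stay visible, much like empty bars in a histogram. The data and the `stem_and_leaf` helper are hypothetical.

```python
def stem_and_leaf(values):
    """Hypothetical helper: text stem-and-leaf plot, stem = value // 10, leaf = value % 10."""
    if not values:
        return []
    if any(v < 0 or v != int(v) for v in values):
        raise ValueError("this sketch handles non-negative whole numbers only")
    leaves_by_stem = {}
    for v in sorted(int(v) for v in values):
        stem, leaf = divmod(v, 10)
        leaves_by_stem.setdefault(stem, []).append(str(leaf))
    lines = []
    for stem in range(min(leaves_by_stem), max(leaves_by_stem) + 1):
        # Empty stems are kept so gaps in the distribution remain visible.
        lines.append(f"{stem:>2} | {''.join(leaves_by_stem.get(stem, []))}")
    return lines


if __name__ == "__main__":
    data = [12, 15, 18, 21, 22, 25, 27, 30, 31, 35, 52]   # hypothetical values
    print("\n".join(stem_and_leaf(data)))
```

Read sideways, the leaf counts per stem (3, 4, 3, 0, 1) trace the same shape a histogram with classes 10-19, 20-29, and so on would show, while every original value is still recoverable.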
