
Dr. Wafaa Ibrahim
● Population or Universe: the group of all individuals, items, or units in any region or area.

- It refers to the group of people, items, or units under investigation and includes every individual.

● Target Population:

- The collection of all individuals, families, groups, organizations, or events that we are interested in finding out about. It is the population to which the researcher would like to generalize the results.

For example, the entire adult population of Myanmar aged 65 or older.

● If we are interested in any topic related to pensions in Egypt, then

- the population is all Egyptians.
- the target population is all Egyptians older than 65 years.
● Sample: a part or subset of the objects or individuals of the target population, selected for the purpose of representing the target population.

● Sampling: the process of selecting a sample from the population. For this, the population is divided into a number of parts called sampling units.
Sampling unit / element / unit of analysis / explanatory unit


● Sampling unit: the unit about which information is collected by asking it the questions in the questionnaire. Its answers are the collected data that will be used to draw conclusions about the target population. Every unit in the target population is a sampling unit.
● Unit of analysis: the unit that provides the basis of analysis.

● Each member of a population is an element (e.g. a child under 5).

● Sometimes the unit is a household, e.g. any injury in the household in the last three months.
❖ The sampling frame:
The list of all sampling units / observations / explanatory units in the target population, NOT the population. It must be clear how to contact every person on this list.
The choice of the sampling frame depends on the mode of data collection: face-to-face (F2F, e.g. door-to-door, D2D), mail (postal), telephone, or online (email).
❖ Notes
- For a face-to-face poll or a mail poll, a list of addresses is required (the sampling frame in F2F and mail polls).

- For a telephone poll, the researcher must have a list of telephone numbers (the sampling frame in telephone polls/surveys).

- For an online poll, the researcher must have a list of email addresses.

● How is a sample drawn from a target population? How are a number of people selected that can be considered
representative? A researcher needs a sampling frame to do this. A sampling frame is a list of all people in the
target population. And it must be clear for all persons in this list how to contact them. The choice of the
sampling frame depends on the mode of data collection.
A sampling frame can exist on paper (e.g., a card-index box for the members of a club or a telephone directory) or in a computer (e.g., a database containing names and addresses or a population register). If such lists are not available, detailed geographical maps are sometimes used to locate people.
Note: the sampling frame does not have to be a list of names. It may be a list of emails, phone numbers, detailed geographical maps, the sections across the faculty "FEPS", etc., as long as it represents the target population.
The sampling frame should be an accurate representation of the target population. There is a risk of drawing wrong conclusions from a poll if the sample is selected from a sampling frame that differs from the population.
Generally and theoretically speaking, the perfect sampling frame includes all the sampling units/observations in the target population, but in practice we may face either of two problems when constructing the sampling frame: under-coverage and over-coverage.

Two problems associated with the sampling frame: under-coverage and over-coverage.
❖ Under-coverage problem of the sampling frame:
Occurs if the target population contains people who have no counterpart in the sampling frame. In other words, the target population has sampling units NOT included in the sampling frame, so the frame covers fewer units of the target population than it should. Such persons can never be selected in the sample.

❖ Examples of under-coverage:
- A poll in which the sample is selected from a population register. Illegal immigrants are part of the population but will never appear in the sampling frame.
- An online poll in which respondents are selected via the internet. There will be under-coverage because of people without internet access.
- Also remember the elections example from Lecture 3: the problem there was under-coverage, because the sampling frame included only people who had mobile phones and cars and neglected those who did not.
Under-coverage can have serious consequences. If people outside the sampling frame systematically differ from those in the sampling frame, estimates of population characteristics may be seriously biased. A complicating factor is that under-coverage is often not easy to detect.
❖ Another problem that can occur in a sampling frame is over-coverage:
This refers to the situation in which the sampling frame contains people who do not belong to the target population (i.e. the sampling frame contains units beyond the target population).

❖ If such people end up in the sample and their data are used in the analysis, estimates of population characteristics may be affected. It should be rather simple to detect over-coverage, as these people should have difficulty answering the questions in the questionnaire.

❖ NOTE:
Over-coverage is easier to detect than under-coverage. People who are in the sampling frame but do not belong to the target population should have difficulty answering the questions in the questionnaire, because the questions were written for the target population and are unfamiliar to anyone outside it.
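The two coverage problems can be illustrated with plain set operations. This is only a minimal sketch: the unit IDs below are invented for illustration and do not come from the lecture.

```python
# Hypothetical unit IDs, invented purely for illustration.
target_population = {"A", "B", "C", "D", "E"}
sampling_frame = {"C", "D", "E", "F", "G"}

# Under-coverage: units in the target population with no counterpart in the frame.
under_coverage = target_population - sampling_frame
# Over-coverage: units in the frame that do not belong to the target population.
over_coverage = sampling_frame - target_population
# Units that can be selected and also belong to the target population.
observed_population = target_population & sampling_frame

print("under-coverage:", sorted(under_coverage))
print("over-coverage:", sorted(over_coverage))
print("observed population:", sorted(observed_population))
```

In this toy frame both problems occur at once, which is why comparing sizes alone is not enough to diagnose a frame.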

❖ Examples:
street intercept

Figure: Coverage problems in a sampling frame. Two overlapping circles, the target population and the sampling frame. Their intersection is the observed population, the part of the target population outside the frame is under-coverage, and the part of the frame outside the target population is over-coverage.

- The purple circle represents the target population. Ideally it is identical to the sampling frame, in which case there is no problem of either over-coverage or under-coverage.

- If we add the lemon-colored half circle to a problem-free sampling frame, the frame then contains units that are not in the target population, so we have over-coverage. If instead we remove half of the purple circle from a problem-free sampling frame, the frame no longer contains all the units of the target population (it contains fewer units than the target population), so we have under-coverage.
NOTE:
1) The target population is a subset of the population.
2) The sample is a subset of both the target population and the population.
❖ Sample Design

- A set of rules or procedures (that specify how a sample is to be selected or by which the
sample is to be selected). This can either be probability or non-probability.

❖ Sample size:
The number of elements in the obtained sample
Need for sampling
● A large population can be conveniently covered.
● Time, money, and energy are saved.
● Helpful when the units of the area are homogeneous.
● Used when hundred percent accuracy is not required.
● Used when the data is unlimited.
Advantages of sampling
❖ Economical:
Reduces the cost compared to studying the entire population (i.e. money is saved).
❖ Increased speed:
Collection, analysis, and interpretation of data from a sample take less time than for the whole target population (i.e. time is saved). Because the target population is larger than the sample, interviewing everyone in the target population takes more time than interviewing a sample drawn to represent it.

❖ Accuracy (ONLY when we have a limited area of coverage):

Due to the limited area of coverage, completeness and accuracy are possible.

❖ Rapport:

Better rapport is established with the respondents, which helps the validity and reliability of the results.
Disadvantages of sampling
● Biasedness:
Chances of biased selection leading to incorrect conclusions.
● Selection of a true representative sample:
Sometimes it is difficult to select the right representative sample.
● Need for specialized knowledge:
The researcher needs knowledge, training, and experience in sampling technique, statistical analysis, and calculation of probable error.
● Impossibility of sampling:
Sometimes the population is too small or too heterogeneous to select a representative sample.
Characteristics of a good sample
● A true representative of the population.
● Free from error due to bias.
● Adequate in size for being reliable.
● Units of the sample should be independent and relevant.
● Units of the sample should be complete, precise, and up to date.
● Free from random sampling error.
● Avoids substituting the original sample for convenience.
Random sampling (probability sampling): NO PATTERN for selecting the sampling units exists.
Non-random sampling (non-probability sampling): a PATTERN for selecting units exists.

The difference between the two is whether the sample selection is based on randomization or not.

11/3/2023
➢ Probability Sampling (Random Sampling; sampling with no pattern):
A sampling method that uses random criteria and in which each member of the target population has a known chance of being selected. The chance does NOT have to be equal: it may be EQUAL or UNEQUAL.
➢ Non-Probability Sampling:
- A sampling method that uses non-random criteria (i.e. the researcher selects the sample based on subjective judgment). The probability of a particular member of the target population being chosen is unknown, unlike in probability sampling.
- In this method not all members of the target population have an equal chance of participating in the study and, unlike in probability sampling, that chance cannot be calculated.

➢ In probability sampling, randomness is the element of control. Non-probability sampling relies on personal judgment.

LINK 1: What Is Non-Probability Sampling? | Types & Examples (scribbr.com)


LINK 2:Non-Probability Sampling: Types, Examples, & Advantages | QuestionPro
Types of sampling

Sampling
- Probability samples: Simple Random, Systematic, Stratified (a simple random sample is drawn from each stratum / sub-population / homogeneous group), Cluster
- Non-probability samples: Convenience, Quota, Purposive, Snowball
1. Simple Random Sampling: all members have the same chance (probability) of being selected, i.e. equal probabilities.
The random method provides an unbiased cross-section of the population.
For example, we wish to draw a sample of 50 students from a population of 400 students. Place all 400 names in a container and draw out 50 names one by one.
Target population size = 400 students
Sample size = 50 students
Disadvantages:
1) It needs a complete sampling frame with no under-coverage or over-coverage problem, which is rarely the case in practice.
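The container example can be imitated in a few lines. This is a minimal sketch, assuming a complete frame of 400 students is available; the student labels and the seed are invented for illustration.

```python
import random

rng = random.Random(42)  # fixed seed only so the sketch is repeatable

# Hypothetical sampling frame: 400 student labels.
students = [f"student_{i:03d}" for i in range(1, 401)]

# Draw 50 names without replacement; every student has the same chance.
sample = rng.sample(students, k=50)

print(len(sample), "students drawn,", len(set(sample)), "distinct")
```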
2. Systematic Sampling: each member of the sample comes after an equal interval from the previous member. The first observation is the only one selected on a random basis, while all other observations follow a systematic sequence.

For example, for a sample of 50 students, the sampling fraction is 50/400 = 1/8, i.e. select one student out of every eight students in the population. The starting point for the selection is chosen at random.
NOTE:
This type is preferred over the simple random sample when the sampling frame is unknown. In other words, when we do not have the sampling frame, it is preferable to use systematic sampling (the systematic random sample) rather than simple random sampling.

NOTE:
This technique does not solve the problem of heterogeneity of the population.
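The 1-in-8 rule can be sketched as follows. The function name `systematic_sample` and the numbered student list are assumptions made for this illustration only.

```python
import random

def systematic_sample(frame, n, rng):
    """Pick a random start inside the first interval, then take every k-th unit."""
    k = len(frame) // n       # sampling interval: 400 // 50 = 8
    start = rng.randrange(k)  # the only random choice in the whole procedure
    return frame[start::k][:n]

rng = random.Random(0)
students = list(range(1, 401))  # hypothetical students numbered 1..400
sample = systematic_sample(students, 50, rng)

print("first five selected:", sample[:5])
```

Only the starting position is random; once it is fixed, the rest of the sample is fully determined.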
3. Stratified Sampling: the population is divided into smaller homogeneous groups, or strata, by some characteristic, and members are selected randomly from each of these strata.
Within each stratum, the simple random or systematic sampling method is used to select the final sample.

NOTE:
It solves the problem of a heterogeneous population.
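A minimal sketch of stratified selection, assuming the 400 students are split into four hypothetical strata by year of study and that the same sampling fraction (50/400) is applied in every stratum. Proportional allocation is an assumption of this sketch, not something the notes prescribe.

```python
import random

def stratified_sample(strata, fraction, rng):
    """Draw a simple random sample of the same fraction from every stratum."""
    sample = {}
    for name, units in strata.items():
        n = round(len(units) * fraction)
        sample[name] = rng.sample(units, n)
    return sample

rng = random.Random(1)
# Hypothetical strata; the sizes are invented and add up to 400.
strata = {
    "year_1": [f"y1_{i}" for i in range(160)],
    "year_2": [f"y2_{i}" for i in range(120)],
    "year_3": [f"y3_{i}" for i in range(80)],
    "year_4": [f"y4_{i}" for i in range(40)],
}
sample = stratified_sample(strata, 50 / 400, rng)
sizes = {name: len(units) for name, units in sample.items()}
print(sizes)
```

Every stratum contributes to the sample, which is what distinguishes this design from cluster sampling.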
4. Cluster Sampling (Area Sampling): a researcher/enumerator selects groups (clusters) at random and then does complete observation of all units in each selected group.

For example, the study involves primary schools. Select 15 schools at random, then study all the children of those 15 schools. In cluster sampling the unit of sampling consists of multiple cases. It is also known as area sampling, as the selection of individual members is made on the basis of place of residence or employment.

Strata: WITHIN the group, units are HOMOGENEOUS; BETWEEN the groups, they are HETEROGENEOUS.
Clusters: BETWEEN the groups, units are HOMOGENEOUS; WITHIN the group, they are HETEROGENEOUS.
Difference between strata and clusters
Although strata and clusters are both non-overlapping subsets of the
population, they differ in several ways.
All strata are represented in the sample; but only a subset of
clusters are in the sample.
With stratified sampling, the best survey results occur when
elements within strata are internally homogeneous. However,
with cluster sampling, the best results occur when elements
within clusters are internally heterogeneous.
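The primary-school example of cluster sampling can be sketched like this. The number of schools (60) and the pupil counts are invented for illustration; only the "select 15 schools, then study every child in them" logic comes from the notes.

```python
import random

rng = random.Random(7)

# Hypothetical frame of clusters: 60 schools, each holding 20 to 40 pupils.
schools = {
    f"school_{s:02d}": [f"s{s:02d}_pupil_{p}" for p in range(rng.randint(20, 40))]
    for s in range(60)
}

# Stage 1: select 15 whole clusters at random.
chosen = rng.sample(sorted(schools), k=15)
# Stage 2: observe every pupil in each selected cluster.
sample = [pupil for school in chosen for pupil in schools[school]]

print(len(chosen), "schools,", len(sample), "pupils")
```

Unlike the stratified design, only a subset of the groups appears in the sample, but each selected group is observed completely.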
1) Purposive, judgmental, selective, or subjective sampling:
is a non-probability sampling technique that involves selecting units based on specific characteristics that are relevant to the research question. Researchers rely on their own judgment (whether this judgment is known or not) when choosing members of the target population to participate in their surveys. Unlike quota sampling, the target population is not divided into mutually exclusive groups. In other words, units are selected "on purpose" in purposive sampling. As the name suggests, researchers go to a community "on purpose" because they think that these individuals fit the profile of the people they need to reach.
2) Convenience sampling (e.g. street intercept):
is a non-probability sampling technique where samples are selected from the target population only because they are conveniently available to the researcher, i.e. these are often individuals who are geographically close to the researchers or have previously completed an online survey. Researchers choose these samples just because they are easy to recruit (so it is based on accessibility), and the researcher does not consider selecting a sample that represents the entire population (a representative sample).
3) Quota Sampling:
is a non-probability sampling method that relies on the non-random selection of a predetermined number or proportion of units. This is called a quota. You first divide the population into mutually exclusive subgroups (called strata) and then recruit sample units from the strata until you reach your quota. These units share specific characteristics, determined by you prior to forming your strata, i.e. according to the researcher's purpose.
4) Snowball sampling or chain-referral sampling:
is a non-probability sampling method. It is used when the target population is hidden, unspecified, or hard to reach. It is particularly useful when studying sensitive topics (such as addiction) or when the members of a population are difficult to locate.

Example:

Suppose you want to study addiction. It will be hard to reach people with an addiction, because you cannot simply approach someone, ask whether they are addicted, and expect a straightforward answer: addiction is a very sensitive topic. Instead, you use the snowball sampling technique: you reach a first person with an addiction and ask them to put you in touch with a friend who also has one, that friend connects you with another, and so on until you reach your sample.
6) Sampling
Precision (similar to a confidence interval): Precision is the range within which the population average (or other parameter) will lie in accordance with the reliability specified in the confidence level. For example, if the estimate is LE 4000 and the precision desired is ± 4%, then the true value will be no less than LE 3840 and no more than LE 4160. This is the range (LE 3840 to LE 4160) within which the true answer should lie.
Higher precision requires a lower margin of error, which is achieved by using a larger sample size, and vice versa.
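The arithmetic of the LE 4000 example can be checked with a short sketch. The helper name `precision_limits` is an assumption made for this illustration.

```python
def precision_limits(estimate, relative_precision):
    """Return the lower and upper limits for a precision given as a fraction."""
    margin = estimate * relative_precision  # 4% of 4000 is 160
    return estimate - margin, estimate + margin

low, high = precision_limits(4000, 0.04)
print(f"LE {low:.0f} to LE {high:.0f}")
```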
Confidence level and significance level: The confidence level, or reliability, is the expected percentage of times that the actual value will fall within the stated precision limits. Thus, if we take a confidence level of 95%, we mean that there are 95 chances in 100 (or 0.95 in 1) that the sample results represent the true condition of the population within a specified precision range, against 5 chances in 100 (or 0.05 in 1) that they do not.

A 95% confidence level corresponds to a 5% significance level.
The confidence level indicates the likelihood that the answer will fall within that range, and the significance level indicates the likelihood that the answer will fall outside that range.
We can always remember that if the confidence level is 95%, then the significance level will be (100 – 95), i.e. 5%; if the confidence level is 99%, the significance level is (100 – 99), i.e. 1%, and so on.
We should also remember that the area of the normal curve within the precision limits for the specified confidence level constitutes the acceptance region, and the area of the curve outside these limits in either direction constitutes the rejection regions (left tail, right tail, or both tails).
NOTE:
The higher the confidence level (CL), the higher the Z value, hence the higher the margin of error "e" and the wider the confidence interval (CI), i.e. the lower the precision.
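The chain "higher CL, higher Z, higher e" can be checked numerically. This is a sketch under stated assumptions: a two-tailed interval for a mean, with the textbook margin of error e = Z·σ/√n, which the notes do not write out. The values σ = 10 and n = 100 are invented.

```python
from statistics import NormalDist

def z_value(confidence_level):
    """Two-tailed critical value of the standard normal distribution."""
    alpha = 1 - confidence_level  # significance level
    return NormalDist().inv_cdf(1 - alpha / 2)

def margin_of_error(confidence_level, sigma, n):
    """Assumed formula for a mean: e = Z * sigma / sqrt(n)."""
    return z_value(confidence_level) * sigma / n ** 0.5

for cl in (0.90, 0.95, 0.99):
    print(f"CL={cl:.2f}  Z={z_value(cl):.3f}  e={margin_of_error(cl, 10, 100):.3f}")
```

With σ and n held fixed, the printed margin grows with the confidence level, which is the trade-off the note describes.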
The Sample Size

How large must the sample be?
There is no simple answer to this question.
It can be shown that the precision of the estimates of population characteristics depends on the sample size. So, if very precise estimates are required, a large sample must be selected. If one is satisfied with less precision, a smaller sample may suffice. The sample size can be computed once it is known how precise the estimates must be.
In short: first determine the required precision of the estimates, then determine the sample size n for this precision.
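As an illustration of "precision first, then n", the sketch below uses the common textbook formula for estimating a proportion, n = Z²·p(1−p)/e². The notes do not give this formula, so treat it as an assumption, along with ignoring any finite-population correction and using p = 0.5 as the most conservative guess.

```python
import math
from statistics import NormalDist

def sample_size_for_proportion(margin_of_error, confidence_level=0.95, p=0.5):
    """Assumed formula: n = Z^2 * p * (1 - p) / e^2, rounded up."""
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    return math.ceil(z ** 2 * p * (1 - p) / margin_of_error ** 2)

for e in (0.05, 0.03):
    print(f"e = {e:.2f}  ->  n = {sample_size_for_proportion(e)}")
```

Tightening the margin of error from 5 to 3 percentage points nearly triples the required sample, which shows why very precise estimates are expensive.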
Sampling distribution (the distribution of a sample statistic such as the sample mean "x-bar" or the sample proportion): If we take a certain number of samples and for each sample compute various statistical measures such as the mean, proportion, etc., then we find that each sample may give its own value for the statistic under consideration. All such values of a particular statistic, say the mean (x-bar), together with their relative frequencies constitute the sampling distribution of that statistic. Accordingly, we can have the sampling distribution of the mean or the sampling distribution of the proportion. The sampling distribution tends toward the normal distribution when the sample size n is large. In addition, the higher the n, the lower the margin of error e, hence the narrower the CI and the more precise the estimate, i.e. the closer the estimate is to the true value of the parameter.
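A small simulation can make the sampling distribution of the mean concrete. Everything here is invented for illustration: a hypothetical population of 10,000 uniform values, 2,000 repeated samples, and sample sizes of 10 and 100.

```python
import random
from statistics import mean, pstdev

rng = random.Random(3)
# Hypothetical population of 10,000 values, deliberately non-normal (uniform).
population = [rng.uniform(0, 100) for _ in range(10_000)]

def sample_means(n, repetitions=2000):
    """Draw many samples of size n and record the mean of each one."""
    return [mean(rng.sample(population, n)) for _ in range(repetitions)]

means_small = sample_means(10)
means_large = sample_means(100)

# The spread of the sample means shrinks as the sample size n grows.
spread_small = pstdev(means_small)
spread_large = pstdev(means_large)
print(f"n=10:  spread of x-bar = {spread_small:.2f}")
print(f"n=100: spread of x-bar = {spread_large:.2f}")
```

A histogram of `means_large` would look close to a bell curve even though the population itself is uniform.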
ERRORS

SAMPLING ERRORS (SAMPLE ONLY)
- Increasing the sample size n will decrease the sampling error.
- The sampling error will be 0 (i.e. will not exist) when we take the whole target population instead of a sample from it, because the sampling error arises from taking a sample from the target population, i.e. it happens in samples ONLY.

NON-SAMPLING ERRORS (SAMPLE AND CENSUS)
- Error not related to the sample.
- The non-sampling error exists whether we use a sample from the target population or the target population itself, because this type of error is not related to the sample.

Errors
- Sampling (random): increase the sample size to decrease this error.
- Non-sampling (systematic):
  - Coverage error (happens due to the sampling frame)
  - Non-response error
  - Measurement and processing error
NON SAMPLING ERRORS

• Data specification inadequate & inconsistent with respect to the objective of the census.
• Inaccurate or inappropriate methods of interview, observation, definitions.
• Lack of trained & experienced investigators.
• Errors due to non response.
• Errors in data processing operations.
• Errors committed during presentation.

MORE IN COMPLETE ENUMERATION SURVEY
Types of Survey Errors

- Coverage error: units excluded from the frame.
- Non response error: follow up on non responses.
- Processing error.
- Measurement error: bad question!
- Sampling error: chance differences from sample to sample.
2.1 Coverage Error
(Remember the over-coverage and under-coverage problems of sampling frames.)
Coverage errors consist of omissions, inclusions, duplications, and misclassifications of units in the survey frame. Since they affect every estimate produced by the survey, they are one of the most important types of error; in the case of a census they may be the main source of error. Coverage errors may cause a bias in the estimates.
2.2 Measurement Error
Measurement error is the difference between the recorded response to a
question and the ‘true’ value. It can be caused by the respondent, the
interviewer, the questionnaire, the data collection method or the
measuring tool.
One of the main causes of measurement error is misunderstanding on the
part of the respondent or interviewer.
Misunderstanding may result from: -
- the lack of clarity of the concepts (i.e., use of non-standard concepts);
- poorly worded questions;
- inadequate interviewer training;
- false information given;
- a language barrier;
- poor translation (when several languages are used).
The method of data collection can also affect measurement error.
For example, interviewer-assisted methods (using well-trained
interviewers) can result in smaller measurement error than
self-enumeration methods, where the respondent has no assistance in
completing the questionnaire.
In direct measurement surveys, the interviewer collects data through
observation or by taking measurements (e.g. pricing surveys). Here,
measurement error may be due to the interviewer or the measurement
tool. For example, in a survey of people’s weight, if the measuring scale is
not properly calibrated, the weights will not be correctly measured.
If the measurement errors are systematically skewed to reflect certain
values or categories, a bias will be introduced and the survey estimates will
be misleading.

For example, systematic error would occur if an interviewer were instructed to measure the height of school children and took these measurements while the children had their shoes on, in which case all heights would be systematically overestimated.
Note that measurement error is sometimes referred to as response error
and the terms are often used interchangeably.
2.3 Nonresponse Error
There are two types of nonresponse:
• Item (or partial) nonresponse and total nonresponse. Item nonresponse occurs when information is provided for only some items, such as when the respondent responds to only part of the questionnaire.
• Item nonresponse may occur when the respondent does not know the
answer to a question, refuses to answer a question, forgets to answer
or follows the wrong flow through the questionnaire. Sometimes the
respondent cannot provide the answer due to an illness or language
difficulties. A poorly designed questionnaire can also lead to item
nonresponse.
• Total nonresponse occurs when all or almost all data for a sampling unit
are missing. Nonresponse can create several problems in a survey. The
main problem is that non-respondents often have different
characteristics from respondents, which can result in biased survey
estimates if nonresponse is not corrected for properly. For example, in a
literacy survey, if most non-respondents are illiterate, this could bias the
survey results. If the nonresponse rate is high, bias can be so severe as
to invalidate the survey results.
• The second problem with total nonresponse is that it reduces the
effective size of the sample, since fewer units than expected answered
the survey. As a result, the sampling variance increases and the
precision of the estimates decreases. If the response rate can be
predicted in advance, the initial sample size should be inflated
accordingly.
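Inflating the initial sample for an anticipated response rate is a one-line calculation. The sketch below assumes the response rate is given as a fraction between 0 and 1; the function name and the figures (385 required respondents, 60% response rate) are invented for illustration.

```python
import math

def initial_sample_size(required_respondents, expected_response_rate):
    """Number of units to select so that about the required number respond."""
    if not 0 < expected_response_rate <= 1:
        raise ValueError("expected_response_rate must be in (0, 1]")
    return math.ceil(required_respondents / expected_response_rate)

# Hypothetical: 385 completed questionnaires needed, 60% expected to respond.
print(initial_sample_size(385, 0.6))
```

Inflation restores the effective sample size and hence the precision, but it does not remove the bias that arises when non-respondents differ from respondents.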

Total nonresponse may occur for reasons such as
• no one being at home, or a refusal or the inability of the selected
person to participate in the survey.
• A poor explanation as to the purpose of the survey or its intended use
can also result in nonresponse.
• Poor or out-of-date frame data is another factor: the identification data
of the survey unit might be inadequate to locate it.
Partial nonresponse may occur for reasons such as
The concepts presented to the respondent on the questionnaire or during
the interview might be difficult to understand or be poorly defined.
The interview might be too long or contain an illogical flow of questions.
As a result, respondents may get discouraged and stop answering before
the end of the interview or they might simply follow the wrong flow of
questions.
Interviewers can also contribute to total or item nonresponse. Poor
interviewing techniques prevent some interviewers from establishing a
good rapport with the respondent who might as a result refuse to
participate or, if they agree to participate, quickly lose interest in the
survey. Some interviewers introduce item nonresponse errors because
they do not follow instructions or do not read the questions as worded.
Finally, data collection procedures can be a source of nonresponse.
Nonresponse is often followed-up by interviewers in order to obtain some
responses (e.g. convert a refusal). Inadequate follow-up of
non-respondents, or follow-up at the wrong time can prevent nonresponse
from being corrected. Data lost from a computer file or a lost
questionnaire can also result in ‘nonresponse’ errors.
Improving Response Rates

Methods of Improving Response Rates
- Reducing Refusals
  - Prior Notification
  - Motivating Respondents
  - Incentives
  - Questionnaire Design and Administration
  - Follow-Up
  - Other Facilitators
- Reducing Not-at-Homes
  - Callbacks
2.4 Processing Error
Processing transforms survey responses obtained during collection into a
form that is suitable for tabulation and data analysis. It includes all data
handling activities after collection and prior to estimation.
Since it is a mixture of automated and manual activities, it is
time-consuming, resource intensive and potentially a source of errors. For
example, processing errors can occur during data coding, data capture,
editing or imputation. Like all other errors, they can be random in nature,
and inflate the variance of the survey’s estimates, or systematic, and
introduce bias.
Coding is the process of assigning a numerical value to responses to
facilitate data capture & processing in general.
For closed questions (questions with pre-determined response categories),
codes are often assigned before interviewing takes place.
For open questions (where the respondent provides the answer in his or her
own words), coding may be either manual or automated.
The quality of the coding depends on the completeness and quality of the
response to the open question, and the way in which the answer is coded.
Manual coding of open questions requires interpretation and judgement and
hence is subject to error. Two different coders could code the same answer
differently. Inexperienced and poorly trained coders are particularly prone to
making coding errors. In an automated coding operation, errors can result
from an error in the program or because the program may not properly take
into account all information that is available.
Data capture is the transformation of responses into a machine-readable
format. Data capture errors result when the data are not entered into the
computer exactly as they appear on the questionnaire. This can be caused
by the lack of clarity in the answer provided. The physical layout of the
questionnaire itself or the coding documents can cause data capture
errors. The method of data capture may also result in errors (data capture
may be a manual operation or it may be automated, for example, using an
optical scanner).
Editing is the application of checks to identify missing, invalid or
inconsistent entries that point to data records that are potentially in error.
Imputation is a process used to determine and assign replacement values
to resolve problems of missing, invalid or inconsistent data.
Errors arising from editing and imputation often occur together since these
two processes are very closely linked.
Editing and imputation errors can be caused by the poor quality of the
original data or by its complex structure.
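Editing and imputation can be illustrated on a toy data set. This is a sketch only: the records, the plausible age range of 0 to 120, and the choice of mean imputation are all assumptions made for illustration, and real surveys use far more careful rules.

```python
from statistics import mean

# Hypothetical captured records, invented for illustration.
records = [
    {"id": 1, "age": 34},
    {"id": 2, "age": None},  # missing entry
    {"id": 3, "age": 210},   # invalid entry: outside the plausible range
    {"id": 4, "age": 58},
]

def fails_edit(record):
    """Editing: flag a record whose age is missing or implausible."""
    age = record["age"]
    return age is None or not 0 <= age <= 120

flagged = [r["id"] for r in records if fails_edit(r)]

# Imputation: replace flagged values with the mean of the valid ones.
# Mean imputation is used here only because it is the simplest option.
replacement = mean(r["age"] for r in records if not fails_edit(r))
imputed = [
    {**r, "age": replacement, "imputed": True} if fails_edit(r)
    else {**r, "imputed": False}
    for r in records
]

print("flagged ids:", flagged)
print("replacement value:", replacement)
```

Keeping an `imputed` flag on each record makes it possible to assess later how much the estimates depend on replaced values.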
Evaluating Survey Worthiness
- What is the purpose of the survey?
- Is the survey based on a probability sample?
- Coverage error: appropriate frame
- Non-response error: follow up
- Measurement error: good questions elicit good responses
- Sampling error: always exists

Unreliable polls: example 1
A poll in a shopping mall:
A radio station conducted a radio-listening poll. To collect a lot of data quickly, it was decided to send interviewers to a local shopping mall on Saturday afternoon, and it was not too difficult to get a large number of completed questionnaires.
Analysis of the collected data led to a surprising conclusion: almost no one listened to the sports program that was broadcast every Saturday afternoon.
