
DEFINITIONS OF BASIC CONCEPTS OF SAMPLE SURVEYS

CONTENTS

1. INTRODUCTION
2. POPULATION
3. TARGET POPULATION
4. SURVEY POPULATION
5. CENSUSES
6. SAMPLING
7. SAMPLING UNIT
8. SAMPLING FRAME
9. PARAMETER
10. STATISTIC
11. ESTIMATOR/ESTIMATE
12. UNBIASED ESTIMATOR
13. ACCURACY
14. PRECISION
15. EFFICIENCY or RELATIVE EFFICIENCY
16. SAMPLING FRACTION AND WEIGHT
17. STANDARD ERROR
18. SAMPLING AND NON-SAMPLING ERRORS
19. COEFFICIENT OF VARIATION or RELATIVE ERROR
20. MARGIN OF ERROR

Prepared by Dr. V. K. Dwivedi, Department of Statistics, UB, for the STA 354 Course.

1. INTRODUCTION

In this lecture the basic concepts and definitions of sampling theory and estimation are discussed. A thorough grasp of these ideas is necessary for a clear understanding of the survey methods considered in the subsequent lectures.

2. POPULATION

The population is the group of people or entities to which findings are to be generalized. The population must be defined explicitly before a sample is taken. This group is defined by the question being asked, i.e. the objective of the study.

Example 1: Do UB students enrolled during 2011-12 receive a stipend?
How many populations are of interest? One.
What is the population of interest? All current UB students.

Example 2: Is the IQ of female students the same as the IQ of male students in UB?
How many populations are of interest? Two.
What are the populations of interest? All female students and all male students.

It is essential to define the population in terms of:
(i) Content, which refers to the definition of the type and characteristics of the elements which comprise the population, e.g. a list of establishments or a list of households;
(ii) Extent, which refers to the geographic boundaries as they relate to coverage, e.g. a list of establishments or a list of households in Gaborone; and
(iii) Time, which refers to the time period to which the population refers, e.g. a list of establishments or a list of households in Gaborone in the 2001 Census.

Remark: Note that the term population is used in the statistical sense, i.e. to denote the aggregate of units to which the survey results are to apply. It need not, though of course it often does, refer to a population of human beings. An alternative term is universe.

3. TARGET POPULATION

The target population is the population (of elements) which we want to investigate by means of a sample survey, i.e. the population required to meet the survey objectives. Examples are the population aged 12 years and above, or children below 5 years.

4. SURVEY POPULATION

The survey (or study) population is the population actually covered by the survey or, better still, the population we have access to by means of the sampling frame. Ideally the target and survey populations will be the same, but for practical reasons they may not be identical. The survey population is usually a subset of the target population.

Remark: When the target and survey populations are not identical, the results of the sample should be generalized to the survey population. Generalizing the sample results to the target population may only be done with some degree of caution.

Examples

1. Many national surveys in Botswana would ideally include persons living in hospitals, hotels, prisons, army barracks and other institutions. However, the severe problems involved in collecting responses from such persons frequently lead to their exclusion from the survey population. The advantage of starting with the ideal target population is that the exclusions are explicitly identified, thus enabling the magnitude and consequences of the restrictions to be assessed.

2. In the HIES, one of the main objectives was to measure the average income and expenditure of all households in Botswana (the target population), but the survey population comprised only those households residing in private dwellings. Thus, in this instance, generalizing the sample results to all households in Botswana should be done with some degree of caution.

Note (household): A household consists of one or more persons, related or unrelated, living together "under the same roof" in the same lolwapa, eating together "from the same pot" and/or making common provision for food and other living arrangements.

5. CENSUSES

Censuses are collections of data from every person or entity in the population.

6. SAMPLING

Sampling is the selection of a part (sample) from a population, observing some characteristics of interest and then drawing conclusions (inference) about the parent population. Since statements based on samples are always probability statements, it is important to know the underlying principles and the limitations of such results. The study of the methods of collecting and analyzing data through samples is termed sampling theory. In summary, the main objective of sampling is to estimate certain population parameters (mean, total, proportion) using statistics derived from the sample.

7. SAMPLING UNIT

The sampling unit refers to any potential member of the sample at the appropriate stage of selection. It is important that the sampling units are clearly defined, since their nature may affect the usefulness of different sampling methods. For example:
(i) In a single-stage sample survey on housing needs, houses may be used as sampling units, but in making such a choice it is important that a complete list (sampling frame) of houses from which to draw the sample exists.
(ii) In a two-stage design, each primary sampling unit (PSU; an Enumeration Area) comprises secondary sampling units (SSUs; households), i.e. the sampling unit depends on the stage of selection.

8. SAMPLING FRAME

A complete list of sampling units which represents the population to be covered is called the sampling frame.

9. PARAMETER

The population parameter is the summary value of a characteristic (variable or attribute) of the population which one is trying to estimate using the sample. In the HIES, for instance, we measure household income, and thus the mean household income for all households in Botswana is a parameter.

10. STATISTIC

Any function of the sample values is called a statistic. For example, the mean household income calculated from the sample is a statistic. In general, we use a statistic to estimate a population parameter.
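The distinction between sampling frame, parameter and statistic can be made concrete with a small sketch. Everything in it is hypothetical: the frame of 200 household identifiers, the made-up income values and the sample size of 20 are assumptions chosen only for illustration, not HIES data.

```python
import random
from statistics import mean

# Hypothetical sampling frame: household id -> monthly income (made-up values).
frame = {f"HH{i:03d}": 1000 + 50 * i for i in range(1, 201)}

# Parameter: the mean income over every household in the population.
population_mean = mean(frame.values())

# Draw a simple random sample of 20 sampling units from the frame.
rng = random.Random(354)  # fixed seed so that the run is repeatable
sample_ids = rng.sample(sorted(frame), k=20)

# Statistic: a function of the sample values only.
sample_mean = mean(frame[hh] for hh in sample_ids)

print(f"parameter (population mean): {population_mean:.2f}")
print(f"statistic (sample mean):     {sample_mean:.2f}")
```

In a real survey only the statistic can be computed, because the incomes of the households outside the sample are not observed.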

Note (Enumeration Areas): An Enumeration Area (EA) is the smallest geographic unit, which represents an average workload for an enumerator over a specified period. The average size of an EA is approximately 120-150 malwapa. An EA may be a whole locality (this is the case of a small village which is an EA by itself), a part of a locality (this is the case of a bigger village which has been divided into more than one EA) or a group of localities (this is the case of cattle posts, lands areas or freehold farms).


11. ESTIMATOR/ESTIMATE

The estimator is the method (e.g. the sample mean) of estimating the population parameter. An estimator is a random variate and may take different values from sample to sample. The value of the estimator obtained from any particular sample is called the estimate.

12. UNBIASED ESTIMATOR

An estimator is said to be unbiased if the mean of the estimates derived from all possible samples equals the population parameter. For example, the sample mean $\bar{y}$ is an unbiased estimator of the population mean $\mu$ if $E(\bar{y}) = \mu$.

13. ACCURACY

The accuracy of a sample estimate refers to its closeness to the correct population value (parameter), i.e. the size of the deviation of the estimate $\bar{y}$ from the true mean $\mu$:

$(\mu - \bar{y}) = 0$: accurate
$(\mu - \bar{y}) \neq 0$: inaccurate
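Unbiasedness and accuracy can both be checked by brute force on a population small enough to list every possible sample. The sketch below is an illustration only: the five population values and the sample size of 2 are hypothetical, and `mu` and `y_bar` stand for the true mean and the sample mean.

```python
from itertools import combinations
from statistics import mean

# Hypothetical population of N = 5 values (illustrative only).
population = [2, 4, 6, 8, 10]
mu = mean(population)  # population parameter (true mean)

# Every possible sample of size n = 2 drawn without replacement.
samples = list(combinations(population, 2))

# The estimator (sample mean) yields one estimate per sample.
estimates = [mean(s) for s in samples]

# Unbiasedness: the mean of the estimates over all samples equals mu.
mean_of_estimates = mean(estimates)

# Accuracy of each individual estimate: the deviation (mu - y_bar).
deviations = [mu - y_bar for y_bar in estimates]
exact_hits = sum(d == 0 for d in deviations)

print("mu =", mu)
print("mean of all estimates =", mean_of_estimates)
print("accurate samples:", exact_hits, "of", len(samples))
```

Here the estimator is unbiased, yet only 2 of the 10 possible samples give a perfectly accurate estimate, and identifying them requires knowing `mu`.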

Remark: Since the population value is not usually known, the accuracy of a sample estimate cannot usually be assessed. For this reason we usually speak of the precision of the estimate.

14. PRECISION

The precision of an estimate refers to the probable accuracy of the estimate. The probable accuracy is measured by the standard error of the estimate. All other things being equal, an estimate with lower variation is more precise than one with greater variation:

$\text{Precision of estimate} \propto \dfrac{1}{v(\text{estimate})}$

15. EFFICIENCY or RELATIVE EFFICIENCY

If, for a given sample size, one unbiased estimator has a lower variation than another, we say it is more efficient. When comparing the efficiency of sample designs for fixed sample sizes and outlay of resources, we compare the variances of the estimators. The more efficient of the two is the one with the lower variance. Define

$V_1 = \text{variance of the complex design} = S_1^2 / n$
$V_2 = \text{variance of the SRS for the same sample size} = S_2^2 / n$

Thus the efficiency of the complex design with respect to an SRS of the same size is

$E = \dfrac{\text{Precision of complex design}}{\text{Precision of SRS}} = \dfrac{V_2}{V_1} = \dfrac{S_2^2 / n}{S_1^2 / n} = \dfrac{S_2^2}{S_1^2}$

There are three cases:
(i) if E = 1, both designs are equally efficient;
(ii) if E < 1, the complex design is less efficient than SRS; and
(iii) if E > 1, the complex design is more efficient than SRS.

In general, efficiency is presented as a percentage. The percent gain in efficiency of the complex design over SRS is

$GE = \left( \dfrac{V_2}{V_1} - 1 \right) \times 100$
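As a minimal sketch of the efficiency calculation, the two functions below compute E as the variance under SRS divided by the variance under the complex design, and the percent gain as (E - 1) x 100. The function names and the two variance values are hypothetical and serve only to illustrate the arithmetic.

```python
def efficiency(v_complex: float, v_srs: float) -> float:
    """Return E = V2 / V1, the SRS variance over the complex-design variance."""
    if v_complex <= 0 or v_srs <= 0:
        raise ValueError("variances must be positive")
    return v_srs / v_complex


def percent_gain(v_complex: float, v_srs: float) -> float:
    """Return GE = (V2 / V1 - 1) * 100."""
    return (efficiency(v_complex, v_srs) - 1.0) * 100.0


# Hypothetical variances of the same estimator for the same sample size.
v1 = 4.0  # V1: variance under the complex design
v2 = 5.0  # V2: variance under SRS

e = efficiency(v1, v2)
ge = percent_gain(v1, v2)
print(f"E = {e:.2f}, percent gain = {ge:.1f}%")
```

With these values E = 1.25 > 1, so the complex design is the more efficient one, with a gain of 25%. A negative gain would indicate that SRS is more efficient.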


16. SAMPLING FRACTION AND WEIGHT

The ratio of the size of the sample (n) to that of the population (N) is called the sampling fraction and is denoted by the letter f, that is, $f = n/N$. The inverse of this quantity, that is, $w = 1/f = N/n$, is sometimes called the expansion, raising or weighting factor. It is the factor by which the sample results are expanded or raised to derive estimates of the population total. One other use of the sampling fraction is in the finite population correction, symbolized by $(1 - f)$.

17. STANDARD ERROR

The standard error is the measure of variability between all possible samples. It is the square root of the variance of the estimator, i.e. of the mean squared deviation of the estimates around their mean:

$SE(\bar{y}) = \sqrt{\operatorname{Var}(\bar{y})}$

The standard error plays numerous roles in sampling theory, viz. measuring the sampling error and determining confidence intervals, sample sizes, etc.
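The quantities of Sections 16 and 17 can be put together in a short sketch. The population size and the ten sample values are hypothetical. The variance formula Var(y_bar) = (1 - f) s^2 / n is the usual estimate for simple random sampling without replacement; it is an assumption here, since the lecture only gives SE as the square root of the variance.

```python
import math
from statistics import variance

N = 1000  # hypothetical population size
sample = [12, 15, 11, 14, 18, 13, 16, 12, 15, 14]  # hypothetical sample values
n = len(sample)

f = n / N    # sampling fraction
w = N / n    # expansion (raising, weighting) factor, equal to 1 / f
fpc = 1 - f  # finite population correction

# Expanding the sample total by w estimates the population total.
estimated_total = w * sum(sample)

# Assumed SRS-without-replacement variance of the sample mean.
s2 = variance(sample)  # sample variance with divisor n - 1
var_mean = fpc * s2 / n
se_mean = math.sqrt(var_mean)

print(f"f = {f}, w = {w}, fpc = {fpc}")
print(f"estimated population total = {estimated_total}")
print(f"SE of the sample mean = {se_mean:.4f}")
```

Each sampled unit "represents" w = 100 population units, which is why the sample total of 140 is raised to an estimated population total of 14000.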
18. SAMPLING AND NON-SAMPLING ERRORS

The errors involved in the collection, processing and analysis of the data in a survey may be classified as:
(i) sampling error, and
(ii) non-sampling error.

SAMPLING ERROR

The error which arises because only a sample is used to estimate the population parameter is termed sampling error or sampling fluctuation. Whatever the degree of care in selecting a sample, there will always be a difference between the population value (parameter) and its corresponding estimate. Sampling error is evaluated statistically: it is measured in terms of the standard error of a particular statistic (mean, total, proportion, etc.). This error can be reduced by increasing the size of the sample. In fact, the sampling error is inversely proportional to the square root of the sample size:

$\text{Sampling error} \propto \dfrac{1}{\sqrt{\text{sample size}}}$

The relationship can be examined graphically as shown below.

[Figure: sampling error (vertical axis) plotted against sample size (horizontal axis)]
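The inverse square-root relationship can also be tabulated numerically. This sketch assumes the familiar form SE = sigma / sqrt(n) for the mean, ignoring the finite population correction; the population standard deviation of 10 and the list of sample sizes are hypothetical.

```python
import math


def standard_error(sigma: float, n: int) -> float:
    """Standard error of the mean, assuming SE = sigma / sqrt(n) (no fpc)."""
    return sigma / math.sqrt(n)


sigma = 10.0  # hypothetical population standard deviation

# Each step quadruples the sample size, which halves the standard error.
for n in (25, 100, 400, 1600):
    print(f"n = {n:5d}  SE = {standard_error(sigma, n):.2f}")
```

The table makes the diminishing return visible: halving the sampling error requires four times as many units.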


Remark: When a sample survey becomes a census (complete enumeration), the sampling error becomes zero.

NON-SAMPLING ERROR

Besides sampling error, the sample estimate may be subject to other errors which, grouped together, are termed non-sampling error. The main sources of non-sampling error are:
i. failure to measure some of the units in the selected sample;
ii. observational errors; and
iii. errors introduced in editing, coding and tabulating the results.

Remark: Non-sampling error is likely to increase with an increase in sample size, while sampling error decreases with an increase in sample size.

19. COEFFICIENT OF VARIATION or RELATIVE ERROR

The coefficient of variation (CV) is defined as 100 times the coefficient of dispersion based upon the standard deviation, i.e.

$CV = 100 \times \dfrac{\sigma}{\bar{y}}$

where $\sigma$ is the standard deviation and $\bar{y}$ is the mean. The CV is thus the percentage variation in the mean, the standard deviation being considered as the total variation in the mean. It is a good statistic for comparing the variability of two series: the series having the greater CV is said to be more variable than the other. It is also of interest to note that the population coefficient of variation is usually fairly stable over time and over characteristics of a similar nature. This stability of the CV makes it possible to determine the sample size for estimating a parameter with a specified margin of error. This is one of the many uses of the coefficient of variation.

20. MARGIN OF ERROR

The margin of error is a common summary of sampling error, which quantifies the uncertainty about a survey result. The margin of error can be interpreted by making use of ideas from the laws of probability.

Example: A researcher wishes to estimate the percentage of people belonging to blood group O in a particular region and wants to know the size of the sample required to conduct a small sample survey using SRS. The next question is how accurately the researcher wishes to know the percentage of people with blood group O. The researcher will be content if the percentage is correct within 5% (the margin of error), in the sense that if the sample shows 45% to have blood group O, the percentage for the region is sure to lie between 40 and 50.
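The last two sections can be illustrated with a short sketch. The CV function follows the definition of 100 times the standard deviation divided by the mean. The sample size function uses the standard large-sample formula n = z^2 p (1 - p) / e^2 for a proportion under SRS, which is not stated in the lecture; the 95% confidence level (z = 1.96), the function names and the two data series are likewise assumptions.

```python
import math
from statistics import mean, pstdev


def coefficient_of_variation(values) -> float:
    """CV = 100 * standard deviation / mean (population standard deviation)."""
    return 100.0 * pstdev(values) / mean(values)


def sample_size_for_proportion(p: float, margin: float, z: float = 1.96) -> int:
    """Assumed large-sample SRS formula: n = z^2 * p * (1 - p) / margin^2."""
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)


# Two hypothetical series with the same mean but different spread.
series_a = [48, 50, 52, 50, 50]
series_b = [30, 50, 70, 40, 60]
cv_a = coefficient_of_variation(series_a)
cv_b = coefficient_of_variation(series_b)
print(f"CV of series A = {cv_a:.2f}%, CV of series B = {cv_b:.2f}%")

# Blood group O example: anticipated 45%, margin of error 5 percentage points.
n_required = sample_size_for_proportion(p=0.45, margin=0.05)
print("required sample size:", n_required)
```

Under these assumptions about 381 people would be needed. Strictly, the interval from 40% to 50% then holds with roughly 95% confidence rather than with certainty.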
