SAMPLING DISTRIBUTION

CHAPTER ONE
1. SAMPLING THEORY AND SAMPLING DISTRIBUTION

1.1 Sampling Theory

Introduction
• Usually the population under study is very large or infinite, which makes studying it in full very difficult or impossible.
• Under such circumstances we take a sample, that is, a subset of the population, and use it to study the population.
• Statistics is a science of inference: it draws general conclusions about an entire group (the population) from information obtained from a small group (the sample).

1.1.1 Basic Definitions: Sampling Versus Census
In collecting information from a population there are two alternative approaches: census and sampling.
• Census is an approach in which every element of the population is included in the investigation.
• Sampling is an approach in which only part of the population is taken, and the information gained from that part is used to make a judgment about the whole population.

Sampling is something we all do in our day-to-day activities, consciously or unconsciously:
• A housewife judges the quality of the wheat she buys for domestic use by examining only a handful of grains taken from the lot offered for sale.
• A production manager closely examines only a few items of raw material to be sure of their quality before placing a bulk purchase order.
• A doctor examines a few drops of blood as a sample and draws conclusions about the blood constitution of the whole body.

The ultimate purpose of any sampling process is to make a judgment about the whole by examining a part.
• Population is the totality about which a judgment is to be made.
• Sample is the part of the totality that is used to make the judgment.
Sampling, in effect, consists of selecting a sample and using the sample data to gain knowledge about the parameters of a given population.
Investigators need to choose between these two alternatives based on their merits. In principle a census is more reliable than sampling as far as the validity and accuracy of the collected data are concerned, because it is free of sampling error. In practice, however, investigators often prefer sampling to a census, especially when the population is very large.

1.1.2 The Need for Samples
The major reasons why sampling is necessary are the following.

1. Feasibility aspect:
In some circumstances it is not possible to include every element of the population in the investigation. This holds for infinite populations, for very large populations, and for populations that are in constant movement, e.g. populations of insects, fish, birds and the like. In such cases the objective of the study can still be achieved through the use of samples.

2. Destructive aspect:
• In some cases the investigation involves a test that damages or destroys the items included.
• In such cases a census would be not only costly but also self-defeating.
• For example, suppose that the quality control manager of Kalit is contemplating a test on the factory's products.
• In the test, products are exposed to increasing heat until they start melting.
• Including every unit of the product in the test would therefore destroy the entire output.

3. Efficiency aspect:
In some cases information is needed within a prescribed time limit. Contacting the whole population through a census would be so time consuming that it could delay the needed information. To deliver the required information on time, sampling is preferable to a census because it takes much less time.

4. Cost aspect:
• Because it covers every element of the population, a census is usually more costly than sampling.
• This is especially so when the population is very large or the cost of inspection per unit is very high.
• In such circumstances the cost of a census will often far exceed the budget limit, which rules out complete enumeration. In some cases the cost of studying the whole population may even be greater than the value of the research itself.

5. Manageability:
• All survey results, whether based on a sample or a census, are subject to some kind of error.
• These errors may arise from poor planning, inefficient execution, and a lack of control and coordination over the survey staff.
• Owing to the sheer number of elements included, census results are more vulnerable to these errors than sample survey results.

1.1.3 Sampling and Non-Sampling Errors
• Sampling error is the difference between a sample statistic and the corresponding population parameter. It is related to the sampling technique and approach used.
• Non-sampling error is related to the administration of the survey: incorrect enumeration of population members, non-random selection of samples, use of an incomplete, vague or faulty questionnaire for data collection, and wrong editing, coding and presentation of the responses received through the questionnaire.

• Population mean = (sum of elements in the population) / (number of elements in the population)
• Sample mean = (sum of elements in the sample) / (number of elements in the sample)
• When a sample is chosen, the difference between the observed sample statistic and the population parameter may consist of both sampling and non-sampling errors.
• Example: Suppose that you have a population consisting of 5 households, A, B, C, D and E.
Their monthly income (MI) is as follows:

Household    A       B      C       D       E
MI (birr)    10000   7400   12000   20000   5600

Assuming that you take a sample of 3 households, A, C and D, find the sampling and non-sampling errors.

Solution:
• Mean MI of the population = (10000 + 7400 + 12000 + 20000 + 5600) / 5 = 55000 / 5 = 11000
• Mean MI of the 3 sampled households = (10000 + 12000 + 20000) / 3 = 42000 / 3 = 14000
• Thus, the difference between the sample mean and the population mean = 14000 - 11000 = 3000
• Now suppose the income of household C has wrongly been recorded as birr 15000. The sample mean = (10000 + 15000 + 20000) / 3 = 45000 / 3 = 15000
• This means that the difference between the sample mean and the population mean is not due to sampling error alone. There is also a non-sampling error, i.e. the error can be split into two components:
  1. Sampling error = 14000 - 11000 = 3000
  2. Non-sampling error = (15000 - 11000) - sampling error (3000) = 1000
Total error = sampling error + non-sampling error = 3000 + 1000 = birr 4000
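The split of the total error in the household example can be checked with a few lines of Python. This is only a sketch of the arithmetic above; the variable names and the small `mean` helper are illustrative choices, not part of the original material.

```python
# Monthly incomes (birr) of the five households in the example.
incomes = {"A": 10000, "B": 7400, "C": 12000, "D": 20000, "E": 5600}
sample_ids = ["A", "C", "D"]  # the three sampled households


def mean(values):
    """Arithmetic mean: sum of the elements divided by their number."""
    values = list(values)
    return sum(values) / len(values)


population_mean = mean(incomes.values())
true_sample_mean = mean(incomes[h] for h in sample_ids)

# Household C is wrongly recorded as 15000 instead of 12000.
recorded = dict(incomes, C=15000)
recorded_sample_mean = mean(recorded[h] for h in sample_ids)

sampling_error = true_sample_mean - population_mean
total_error = recorded_sample_mean - population_mean
non_sampling_error = total_error - sampling_error

print(sampling_error, non_sampling_error, total_error)  # 3000.0 1000.0 4000.0
```

Computing the non-sampling error as the remainder of the total error mirrors the decomposition used in the solution.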
1.1.4 Types of Samples
Probability (random)
o Each item or person in the population being studied has a known (nonzero) likelihood of being included in the sample.
Non-probability (non-random)
o The sample is selected based on convenience or judgment.

I Random (Probability) Sampling Methods
Every member of the population must have a known nonzero chance of being selected.

a) Simple Random Sampling
Every member of the population has an equal and independent chance of being selected.
• Simple random sampling without replacement: an element can enter the sample only once, i.e. a unit once selected is not returned to the population before the next draw.
• Simple random sampling with replacement: a population unit may enter the sample more than once.

Methods of selecting a simple random sample:

Lottery method: we list the identification of all items, write each identification number on a card, place the cards in a bowl, mix them, and then draw the needed number of cards (the sample). This method is time consuming and awkward.
Example: Suppose a population consists of 400 employees in small industries, and a sample of 50 employees is to be selected from that population. The procedure is: write the name of each employee on a small slip of paper, deposit all of the slips in a box, and mix them thoroughly. The first selection is made by drawing a slip out of the box without looking at it, and the drawing is repeated until 50 slips have been selected.

Table of random numbers: a more convenient method of selecting a random sample. The procedure is: first give an identification number to every element in the population. Select a starting point by closing your eyes and placing your finger on a number in the table at random, then proceed downward until you have selected the needed sample.
Example: In an area there are 500 families. Using the following extract from a table of random numbers, select a sample of 15 families to find out the standard of living of the families in that area.
4652 3819 8431 2150 2352 2472 0043 3488
9031 7617 1220 4129 7148 1943 4890 1749
2030 2327 7353 6007 9410 9179 2722 8445
0641 1489 0828 0385 8488 0422 7209 4950

Solution:
• In the random number table above we can start from any row or column and read three-digit numbers continuously, row-wise or column-wise.
• Starting from the third row and reading row-wise, the numbers are:
  203 023 277 353 600 794 109 179 272 284 450 641 148 908 280
• Since there are only 500 families and some of these numbers are greater than 500, we subtract 500 from those numbers and rewrite the selected numbers as follows:
  203 023 277 353 100 294 109 179 272 284 450 141 148 408 280

b) Systematic Random Sampling
• The items or individuals of the population are arranged in some order (alphabetical or otherwise), and every kth (third, fifth, tenth, etc.) item is selected from the population to be included in the sample.
• This is done after the first item is selected at random. The value k is called the sampling interval: k = N/n.
• For example, suppose a sample of size 50 is desired from a population consisting of 1000 accounts receivable. The sampling interval is k = N/n = 1000/50 = 20. Thus, a sample of 50 accounts is identified by moving systematically through the population and taking every 20th account after the first randomly selected account.

c) Stratified Random Sampling
• This method is useful when the population consists of a number of heterogeneous subpopulations and the members within a given subpopulation are relatively homogeneous compared to the population as a whole.
• The population is first divided into subgroups called strata, and a sample is selected from each stratum.
• E.g. we can divide a human population into different strata (subgroups) on the basis of age, sex, occupation, education, region, religion, etc.

Types of Stratified Sampling:
• There are two types of stratified sampling: proportional and non-proportional.
• In proportional sampling, each subgroup or stratum is represented in proportion to its size. If a stratum contains many items, a larger share of the sample is drawn from it, and vice versa.
• The population size is denoted by N and the sample size by n. The sample size is allocated to the strata in such a way that the sampling fraction is the same constant for each stratum.
• That is, ni/Ni = n/N = c for every stratum i. So in this method each stratum is represented according to its size.
• In non-proportional sampling, equal representation is given to all the strata regardless of their size in the population.

Example:
• A sample of 50 students is to be drawn from a population of 500 students belonging to two institutions, A and B. The number of students in institution A is 200 and in institution B is 300. How will you draw the sample using proportional allocation?

Solution:
• There are two strata in this case, with sizes N1 = 200 and N2 = 300, and the total population is N = N1 + N2 = 500.
• The sample size is n = 50.
• If n1 and n2 are the sample sizes from the two strata, then
• n1 = n × N1/N = 50 × 200/500 = 20
• n2 = n × N2/N = 50 × 300/500 = 30
• The sample sizes are 20 from A and 30 from B. The units from each institution are then selected by simple random sampling.

d) Cluster Sampling
• The population is divided into small units, called primary units or clusters.
• Certain clusters are then selected at random. After the clusters have been selected, all of the elements in each selected cluster are included in the sample.
• The remaining clusters are ignored. Precision will suffer if the ignored clusters are not similar to the sampled clusters.
• It works best when the population is homogeneous across clusters, so that each cluster resembles the others. This technique is often employed to reduce the cost of sampling a population scattered over a large geographic area.

II Non-random (Non-probability) Sampling Methods
• A random sample is ideal for statistical analysis, but non-random sampling methods have also been devised for situations where random sampling is not feasible.
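As a point of comparison for the non-random methods that follow, the random selection rules of part I (simple random sampling, systematic selection with interval k = N/n, and proportional allocation ni = n × Ni/N) can be sketched in Python. This is a minimal sketch rather than a prescribed procedure: the function names, the fixed seed, and the use of integer labels for accounts and students are all illustrative assumptions.

```python
import random


def simple_random_sample(population, n, rng):
    """Draw n distinct units (sampling without replacement)."""
    return rng.sample(population, n)


def systematic_sample(population, n, rng):
    """Take every k-th unit after a random start, with k = N // n."""
    k = len(population) // n
    start = rng.randrange(k)          # random position within the first interval
    return population[start::k][:n]


def proportional_allocation(n, stratum_sizes):
    """Allocate n across strata as n_i = n * N_i / N, rounded to whole units.

    With awkward sizes the rounded values may not add up to n exactly.
    """
    total = sum(stratum_sizes.values())
    return {name: round(n * size / total) for name, size in stratum_sizes.items()}


rng = random.Random(1)                # fixed seed, an arbitrary choice for repeatability

# Systematic sample of 50 from 1000 accounts receivable (k = 20).
accounts = list(range(1, 1001))
picked = systematic_sample(accounts, 50, rng)
print(len(picked), picked[1] - picked[0])   # 50 20

# Proportional allocation for institutions A (200 students) and B (300 students).
allocation = proportional_allocation(50, {"A": 200, "B": 300})
print(allocation)                           # {'A': 20, 'B': 30}

# Within each stratum the units are then chosen by simple random sampling.
students_a = list(range(1, 201))
sample_a = simple_random_sample(students_a, allocation["A"], rng)
print(len(sample_a))                        # 20
```

In every function the chance of selection is fixed by the procedure and not by the investigator, which is the property the non-random methods give up.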
i) Quota sampling
• With random sampling, the investigator plays no part in the selection of respondents; he/she merely has a list with the names and addresses of the respondents.
• In quota sampling, the selection of respondents lies with the investigator, although in making the selection he/she must ensure that each respondent satisfies certain criteria that are essential for the study.

ii) Judgment (purposive) sampling
• Judgment sampling can also be called sampling by opinion.
• In this procedure, the investigator selects the units of the sample that he/she feels are most representative of the population with respect to the characteristics under study.
• For example, a teacher may select two or three students from his class, judging that these students would reflect the general opinion of all students in the class on certain issues.

iii) Convenience sampling
• In this procedure, the units to be included in the sample are selected at the convenience of the investigator rather than by any pre-specified or known probabilities of selection.
• Convenience samples make it easy to collect data on a particular issue.
• The results obtained by convenience sampling can hardly be said to be representative of the population. Therefore, the results are generally biased and unsatisfactory.
• However, convenience sampling is commonly used for pilot studies, particularly for testing a questionnaire and for obtaining preliminary information about the population.

1.2 Sampling Distribution
To arrive at a statistical inference, samples of a given size are drawn repeatedly from the population and a statistic is computed for each sample. The computed value of a particular statistic will differ from sample to sample. Theoretically or by observation, it is possible to construct a frequency table showing the values assumed by the statistic and their frequency of occurrence.
This distribution of the values of a statistic is called a sampling distribution, because the values are the outcome of a process of sampling. A sampling distribution is a probability distribution consisting of all possible values of a sample statistic.

In a population containing N observations, the number of possible samples each containing n observations is NCn = N! / (n! (N - n)!) if sampling is done without replacement. If sampling is done with replacement, the number of possible samples is N^n.

Two important terms in sampling distribution:
• Population parameter: a numerical measure of a population, such as the population mean (μ), population variance (σ²), population standard deviation (σ), population proportion (p), etc.
• Sample statistic: a numerical measure of a sample, such as the sample mean (x̄), sample variance (S²), sample standard deviation (S), sample proportion (p̂), etc.

1.2.1 Sampling Distribution of the Mean (x̄)
A sampling distribution of sample means is the distribution of the means computed from all possible random samples of a specific size taken from a population.
Sampling error is the difference between the sample measure and the corresponding population measure, due to the fact that the sample is not a perfect representation of the population.

Properties of the Distribution of Sample Means
1. The mean of the sample means is the same as the population mean.
2. The standard deviation of the sample means, called the standard error of the mean, is smaller than the standard deviation of the population. For a large population or sampling with replacement, σx̄ = σ/√n.
3. The sampling distribution of sample means drawn from a normally distributed population is normal for samples of all sizes.

If the sampling distribution of x̄ is normal, the standard error of the mean, σx̄, can be used together with the normal distribution to determine the probabilities of various values of the sample mean.
For this purpose the value of the sample mean is first converted into a Z value on the standard normal distribution, which shows how far any single sample mean deviates from the mean of the sample means:

Z = (x̄ - μ) / (σ/√n)

The Z value is the number of standard deviations (here, standard errors) by which a particular value lies away from the mean. The variable Z is also called the standardized normal random variable.

Example 1: The average IQ score of students in a school for gifted children is 165, with a standard deviation of 27. A random sample of 36 students is taken. What is the probability that:
a) the sample mean will be greater than 170
b) the sample mean will be less than 158
c) the sample mean will be between 155 and 160
d) the sample mean will be either less than 170 or more than 175

Example 2: The mean length of life of a certain cutting tool is 41.5 hrs, with a standard deviation of 2.5 hrs. What is the probability that a simple random sample of size 50 drawn from this population will have a mean between 40.5 hrs and 42 hrs?

Example 3: The annual wages of all employees of a company have a mean of 20,400 per year with a standard deviation of 3200. The personnel manager is going to take a random sample of 36 employees. What is the probability that the sample mean will exceed 21,000?

Example 4: A company makes engines used in speedboats. The company's engineers believe that the engine delivers an average power of 220 horsepower (HP) and that the standard deviation of the power delivered is 15 HP. A potential buyer intends to sample 100 engines (each engine to be run a single time). What is the probability that the sample mean will be less than 217 HP?

1.2.2 Sampling Distribution of the Sample Proportion
There are many situations in which each individual member of the population can be classified into one of two mutually exclusive categories, such as success/failure, accept/reject, head/tail of a coin, etc.
In such cases the sample proportion p̂ of members having the characteristic of interest is the best statistic to use for statistical inference about the population proportion p. For large n, p̂ is approximately normally distributed with mean p and standard error √(p(1 - p)/n), so that Z = (p̂ - p) / √(p(1 - p)/n).

Example 1: 15% of the people in the small community of Sands Point have type B blood. A random sample of 500 persons is selected. What is the probability that the sample proportion of people with blood type B is:
a) more than 17.5%
b) less than 14%
c) between 16% and 18%

Example 2: The quality control department of a paints manufacturing company discovered, at the time of dispatch of decorative paints, that 30 percent of the containers are defective. If a random sample of 500 containers is drawn with replacement from the population, what is the probability that the sample proportion will be less than or equal to 25 percent defective?

Example 3: A manufacturer of screws has found that on average 0.04 of the screws produced are defective. A random sample of 400 screws is examined for the proportion of defective screws. Find the probability that the proportion of defective screws in the sample is between 0.02 and 0.05.

1.2.3 Sampling Distribution of the Difference Between Two Means
The sampling distribution of the sample mean can also be used to compare a population of size N1 having mean μ1 and standard deviation σ1 with another similar population of size N2 having mean μ2 and standard deviation σ2. Here n1 and n2 are the sizes of independent random samples drawn from the 1st and 2nd population respectively.
The sampling distribution of x̄1 - x̄2 is the probability distribution of all possible values that the random variable x̄1 - x̄2 may take when independent samples of sizes n1 and n2 are taken from two specified populations.
When sampling is done from two populations with means μ1 and μ2 and standard deviations σ1 and σ2 respectively, the sampling distribution of the difference of sample means x̄1 - x̄2 approaches a normal distribution with mean μ1 - μ2 and standard deviation √(σ1²/n1 + σ2²/n2) as the sample sizes n1 and n2 increase.
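The normal approximation described above can be turned into a short calculation. The sketch below is illustrative only. It assumes that the standard error of the difference of sample means is √(σ1²/n1 + σ2²/n2), the standard result for independent samples, and it uses `NormalDist` from Python's standard-library `statistics` module. The function name is a hypothetical choice, and the figures are those of Example 1 stated next (means of 1400 and 1200 hours, standard deviations of 200 and 100 hours, samples of 125 each).

```python
from math import sqrt
from statistics import NormalDist


def diff_of_means_distribution(mu1, sigma1, n1, mu2, sigma2, n2):
    """Normal approximation to the sampling distribution of (mean 1 - mean 2).

    Assumes independent samples that are large enough for the
    approximation to be reasonable.
    """
    mean = mu1 - mu2
    std_error = sqrt(sigma1**2 / n1 + sigma2**2 / n2)
    return NormalDist(mean, std_error)


dist = diff_of_means_distribution(1400, 200, 125, 1200, 100, 125)

# P(difference >= d) is the upper tail of the normal distribution at d.
p_at_least_160 = 1 - dist.cdf(160)
p_at_least_250 = 1 - dist.cdf(250)

print(dist.mean, round(dist.stdev, 2))   # 200 20.0
print(round(p_at_least_160, 4))          # 0.9772
print(round(p_at_least_250, 4))          # 0.0062
```

The same two steps (build the distribution of the difference, then read off a tail or interval probability with `cdf`) apply to any exercise of this type.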
Example 1: The products of car manufacturer A have a mean lifetime of 1400 hours with a standard deviation of 200 hours, while those of manufacturer B have a mean lifetime of 1200 hours with a standard deviation of 100 hours. If random samples of 125 items from each manufacturer are tested, what is the probability that the sample from manufacturer A will have a mean lifetime that is at least
(a) 160 hours more than that of manufacturer B
(b) 250 hours more than that of manufacturer B

Example 2: The makers of Duracell batteries claim that their size AA battery lasts on average 45 minutes longer than that of Duracell's main competitor, the Energizer. Two independent random samples of 100 batteries of each kind are selected. Assuming σ1 = 84 minutes and σ2 = 67 minutes, find the probability that the difference in the average lives of Duracell and Energizer batteries based on the samples does not exceed 54 minutes.

1.2.4 Sampling Distribution of the Difference Between Two Proportions (Reading Assignment)
• Let us assume we have two binomial populations, labeled 1 and 2, so that
  - p1 and p2 denote the two population proportions
  - n1 and n2 denote the two sample sizes
  - p̂1 and p̂2 denote the two sample proportions
The sampling distribution of p̂1 - p̂2 is the probability distribution of all possible values that the random variable p̂1 - p̂2 may take when independent samples of sizes n1 and n2 are taken from two specified binomial populations.
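One way to see this sampling distribution is to simulate it: draw the two samples many times and record the difference of the sample proportions each time. The sketch below does that with hypothetical values (p1 = 0.30, p2 = 0.20, n1 = n2 = 200, 2000 repetitions), none of which come from the text. The notes above do not state the mean or standard error of this distribution; the values p1 - p2 and √(p1(1 - p1)/n1 + p2(1 - p2)/n2) used for comparison are the standard large-sample results, brought in here as an assumption.

```python
import random
from math import sqrt
from statistics import mean, stdev


def sample_proportion(p, n, rng):
    """Proportion of successes in n independent draws with success probability p."""
    return sum(rng.random() < p for _ in range(n)) / n


# Hypothetical population proportions and sample sizes, for illustration only.
p1, n1 = 0.30, 200
p2, n2 = 0.20, 200
rng = random.Random(0)   # fixed seed, an arbitrary choice for repeatability

# Approximate the sampling distribution by repeating the two-sample draw.
diffs = [sample_proportion(p1, n1, rng) - sample_proportion(p2, n2, rng)
         for _ in range(2000)]

theoretical_mean = p1 - p2
theoretical_se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

print("simulated mean:", round(mean(diffs), 3), "theory:", round(theoretical_mean, 3))
print("simulated sd:  ", round(stdev(diffs), 3), "theory:", round(theoretical_se, 3))
```

With enough repetitions the simulated mean and standard deviation should sit close to the theoretical values, and a histogram of `diffs` would look approximately normal.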