Researchers usually cannot make direct observations of every individual in the population they are studying. Instead, they collect data from a subset of individuals – a sample – and use those observations to make inferences about the entire population. Ideally, the sample corresponds to the larger population on the characteristic(s) of interest. In that case, the researcher's conclusions from the sample are probably applicable to the entire population. This type of correspondence between the sample and the larger population is most important when a researcher wants to know what proportion of the population has a certain characteristic – like a particular opinion or a demographic feature. Public opinion polls that try to describe the percentage of the population that plans to vote for a particular candidate, for example, require a sample that is highly representative of the population.
Probability samples and convenience samples
Two general approaches to sampling are used in research. With probability sampling, all elements (e.g., persons, households) in the population have some opportunity of being included in the sample, and the mathematical probability that any one of them will be selected can be calculated. With nonprobability sampling, in contrast, population elements are selected on the basis of their availability (e.g., because they volunteered) or because of the researcher's personal judgment that they are representative. The consequence is that an unknown portion of the population is excluded (e.g., those who did not volunteer).
One of the most common types of nonprobability sample is called a convenience sample, not because such samples are necessarily easy to recruit, but because the researcher uses whatever individuals are available rather than selecting from the entire population. Because some members of the population have no chance of being sampled, the extent to which a convenience sample, regardless of its size, actually represents the entire population cannot be known.
Recruiting a probability sample is not always a priority for researchers. A scientist can demonstrate that a particular trait occurs in a population by documenting a single instance. For example, the assertion that all lesbians are mentally ill can be refuted by documenting the existence of even one lesbian who is free from psychopathology.
Another situation in which a probability sample is not necessary is when a researcher wishes to describe a particular group in an exploratory way. For example, interviewing 25 people with AIDS (PWAs) about their experiences with HIV could provide valuable insights about stress and coping, even though it would not yield data about the proportion of PWAs in the general population who share those experiences.
Types of probability samples
Many strategies can be used to create a probability sample. Each starts with a sampling frame, which can be thought of as a list of all elements in the population of interest (e.g., names of individuals, telephone numbers, house addresses, census tracts). The sampling frame operationally defines the target population from which the sample is drawn and to which the sample data will be generalized.
Probably the most familiar type of probability sample is the simple random sample, for which all elements in the sampling frame have an equal chance of selection, and sampling is done in a single stage with each element selected independently (rather than, for example, in clusters). Somewhat more common than simple random samples are systematic samples, which are drawn by starting at a randomly selected element in the sampling frame and then taking every nth element (e.g., starting at a random location in a telephone book and then taking every 100th name). In yet another approach, cluster sampling, a researcher selects the sample in stages, first selecting groups of elements, or clusters (e.g., city blocks, census tracts, schools), and then selecting individual elements from each cluster (e.g., randomly or by systematic sampling).
An Example
Suppose some researchers want to find out which of two mayoral candidates is favored by voters. Obtaining a probability sample would involve defining the target population (in this case, all eligible voters in the city) and using one of many available procedures for selecting a relatively small number (probably fewer than 1,000) of those people for interviewing. For example, the researchers might create a systematic sample by obtaining the voter registration roster, starting at a randomly selected name, and contacting every 500th person thereafter. Or, in a more sophisticated procedure, the researchers might use a computer to randomly select telephone numbers from all of those in use in the city, and then interview a registered voter at each telephone number. (This procedure would yield a sample that represents only those people who have a telephone.)
Several procedures would also be available for recruiting a convenience sample, but none of them would include the entire population as potential respondents. For example, the researchers might ascertain the voting preferences of their own friends and acquaintances. Or they might interview shoppers at a local mall. Or they might publish two telephone numbers in the local newspaper and ask readers to call either number in order to "vote" for one of the candidates. The important feature of these methods is that they would systematically exclude some members of the population (respectively, eligible voters who do not know the researchers, do not go to the shopping mall, and do not read the newspaper). Consequently, their findings could not be generalized to the population of city voters.
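The three probability designs described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the roster of 100,000 voters, the precinct clusters, and all function names are hypothetical, and the systematic sample starts at a random position within the first interval rather than at an arbitrary point in the list.

```python
import random

def simple_random_sample(frame, k, rng):
    """Draw k elements so that every element of the frame has an equal chance."""
    return rng.sample(frame, k)

def systematic_sample(frame, step, rng):
    """Pick a random start within the first interval, then take every step-th element."""
    start = rng.randrange(step)
    return frame[start::step]

def cluster_sample(clusters, n_clusters, per_cluster, rng):
    """Stage 1: choose clusters at random. Stage 2: choose elements within each one."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    sample = []
    for name in chosen:
        members = clusters[name]
        sample.extend(rng.sample(members, min(per_cluster, len(members))))
    return sample

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical sampling frame: a voter registration roster of 100,000 names.
roster = [f"voter_{i:06d}" for i in range(100_000)]

srs = simple_random_sample(roster, 200, rng)
systematic = systematic_sample(roster, 500, rng)  # every 500th person

# Hypothetical clusters: 100 precincts of 1,000 voters each.
precincts = {f"precinct_{p:03d}": roster[p * 1000:(p + 1) * 1000] for p in range(100)}
clustered = cluster_sample(precincts, n_clusters=10, per_cluster=20, rng=rng)

print(len(srs), len(systematic), len(clustered))
```

In each design the selection probability of every element in the frame can be calculated, which is what distinguishes these procedures from the convenience approaches in the example.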
Nonprobability samples
The difference between nonprobability and probability sampling is that nonprobability sampling does not involve random selection and probability sampling does. Does that mean that nonprobability samples aren't representative of the population? Not necessarily. But it does mean that nonprobability samples cannot depend upon the rationale of probability theory. At least with a probabilistic sample, we know the odds or probability that we have represented the population well. We are able to estimate confidence intervals for the statistic. With nonprobability samples, we may or may not represent the population well, and it will often be hard for us to know how well we've done so. In general, researchers prefer probabilistic or random sampling methods over nonprobabilistic ones, and consider them to be more accurate and rigorous. However, in applied social research there may be circumstances where it is not feasible, practical or theoretically sensible to do random sampling. Here, we consider a wide range of nonprobabilistic alternatives.
We can divide nonprobability sampling methods into two broad types: accidental and purposive. Most sampling methods are purposive in nature because we usually approach the sampling problem with a specific plan in mind. The most important distinctions among these types of sampling methods are the ones between the different types of purposive sampling approaches.
Types of nonprobability samples
Accidental, Haphazard or Convenience Sampling
One of the most common methods of sampling goes under the various titles listed here. I would include in this category the traditional "man on the street" (of course, now it's probably the "person on the street") interviews conducted frequently by television news programs to get a quick (although nonrepresentative) reading of public opinion. I would also argue that the typical use of college students in much psychological research is primarily a matter of convenience. (You don't really believe that psychologists use college students because they believe they're representative of the population at large, do you?) In clinical practice, we might use clients who are available to us as our sample. In many research contexts, we sample simply by asking for volunteers. Clearly, the problem with all of these types of samples is that we have no evidence that they are representative of the populations we're interested in generalizing to, and in many cases we would clearly suspect that they are not.
Purposive Sampling
In purposive sampling, we sample with a purpose in mind. We usually would have one or more specific predefined groups we are seeking. For instance, have you ever run into people in a mall or on the street who are carrying a clipboard and who are stopping various people and asking if they could interview them? Most likely they are conducting a purposive sample (and most likely they are engaged in market research). They might be looking for Caucasian females between 30 and 40 years old. They size up the people passing by and stop anyone who looks to be in that category to ask if they will participate. One of the first things they're likely to do is verify that the respondent does in fact meet the criteria for being in the sample. Purposive sampling can be very useful for situations where you need to reach a targeted sample quickly and where sampling for proportionality is not the primary concern.
With a purposive sample, you are likely to get the opinions of your target population, but you are also likely to overweight subgroups in your population that are more readily accessible. All of the methods that follow can be considered subcategories of purposive sampling methods. We might sample for specific groups or types of people, as in modal instance, expert, or quota sampling. We might sample for diversity, as in heterogeneity sampling. Or we might capitalize on informal social networks to identify specific respondents who are hard to locate otherwise, as in snowball sampling. In all of these methods we know what we want: we are sampling with a purpose.
Modal Instance Sampling
In statistics, the mode is the most frequently occurring value in a distribution. In sampling, when we do a modal instance sample, we are sampling the most frequent case, or the "typical" case. Many informal public opinion polls, for instance, interview a "typical" voter. There are a number of problems with this sampling approach. First, how do we know what the "typical" or "modal" case is? We could say that the modal voter is a person who is of average age, educational level, and income in the population. But it's not clear that using the averages of these is the fairest choice (consider the skewed distribution of income, for instance). And how do you know that those three variables (age, education, income) are the only or even the most relevant ones for classifying the typical voter? What if religion or ethnicity is an important discriminator? Clearly, modal instance sampling is only sensible for informal sampling contexts.
Expert Sampling
Expert sampling involves the assembling of a sample of persons with known or demonstrable experience and expertise in some area. Often, we convene such a sample under the auspices of a "panel of experts." There are actually two reasons you might do expert sampling. First, it may be the best way to elicit the views of persons who have specific expertise. In this case, expert sampling is essentially just a specific subcase of purposive sampling. The other reason you might use expert sampling is to provide evidence for the validity of another sampling approach you've chosen. For instance, let's say you do modal instance sampling and are concerned that the criteria you used for defining the modal instance are subject to criticism. You might convene an expert panel consisting of persons with acknowledged experience and insight into that field or topic and ask them to examine your modal definitions and comment on their appropriateness and validity. The advantage of doing this is that you aren't out on your own trying to defend your decisions: you have some acknowledged experts to back you. The disadvantage is that even the experts can be, and often are, wrong.
Quota Sampling
In quota sampling, you select people nonrandomly according to some fixed quota. There are two types of quota sampling: proportional and nonproportional. In proportional quota sampling you want to represent the major characteristics of the population by sampling a proportional amount of each. For instance, if you know the population has 40% women and 60% men, and you want a total sample size of 100, you will continue sampling until you get those percentages and then you will stop. So, if you've already got the 40 women for your sample, but not the 60 men, you will continue to sample men, but even if legitimate women respondents come along, you will not sample them because you have already "met your quota." The problem here (as in much purposive sampling) is that you have to decide the specific characteristics on which you will base the quota. Will it be by gender, age, education, race, religion, etc.?
Nonproportional quota sampling is a bit less restrictive. In this method, you specify the minimum number of sampled units you want in each category. Here, you're not concerned with having numbers that match the proportions in the population. Instead, you simply want to have enough to assure that you will be able to talk about even small groups in the population. This method is the nonprobabilistic analogue of stratified random sampling in that it is typically used to assure that smaller groups are adequately represented in your sample.
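The proportional quota rule from the 40 women / 60 men example can be written as a short Python sketch. The stream of passers-by, the `quota_sample` function, and its parameter names are all hypothetical; the point is only that respondents are taken in whatever order they happen to arrive, and anyone whose category is already full is turned away.

```python
import random

def quota_sample(respondents, quotas, category_of):
    """Accept respondents in arrival order until every quota is filled."""
    counts = {category: 0 for category in quotas}
    sample = []
    for person in respondents:
        category = category_of(person)
        # Skip anyone outside the quota categories or whose category is full.
        if counts.get(category, 0) < quotas.get(category, 0):
            sample.append(person)
            counts[category] += 1
        if counts == quotas:
            break  # every quota has been met, so sampling stops
    return sample

rng = random.Random(7)
# Hypothetical stream of 1,000 passers-by in arrival order.
passersby = [{"id": i, "gender": rng.choice(["woman", "man"])} for i in range(1000)]

sample = quota_sample(passersby, {"woman": 40, "man": 60}, lambda p: p["gender"])
women = sum(p["gender"] == "woman" for p in sample)
print(women, len(sample) - women)
```

Nothing in the procedure is random by design: who ends up in the sample depends entirely on who walks by first, which is why a quota sample matches the population on the quota variables without being a probability sample.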
Heterogeneity Sampling
We sample for heterogeneity when we want to include all opinions or views, and we aren't concerned about representing these views proportionately. Another term for this is sampling for diversity. In many brainstorming or nominal group processes (including concept mapping), we would use some form of heterogeneity sampling because our primary interest is in getting a broad spectrum of ideas, not identifying the "average" or "modal instance" ones. In effect, what we would like to be sampling is not people, but ideas. We imagine that there is a universe of all possible ideas relevant to some topic and that we want to sample this population, not the population of people who have the ideas. Clearly, in order to get all of the ideas, and especially the "outlier" or unusual ones, we have to include a broad and diverse range of participants. Heterogeneity sampling is, in this sense, almost the opposite of modal instance sampling.
Snowball Sampling
In snowball sampling, you begin by identifying someone who meets the criteria for inclusion in your study. You then ask them to recommend others they know who also meet the criteria. Although this method would hardly lead to representative samples, there are times when it may be the best method available. Snowball sampling is especially useful when you are trying to reach populations that are inaccessible or hard to find. For instance, if you are studying the homeless, you are not likely to be able to find good lists of homeless people within a specific geographical area. However, if you go to that area and identify one or two, you may find that they know very well who the other homeless people in their vicinity are and how you can find them.
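One way to picture the referral process is as a walk through a network of acquaintances. The sketch below is a simplified model, not a field procedure: the `referrals` dictionary is an invented network, and the breadth-first order (interview everyone named by the first person before moving on) is just one plausible assumption about how recruitment proceeds.

```python
from collections import deque

def snowball_sample(seeds, referrals, max_size):
    """Recruit the seeds, then the people they name, then the people those name."""
    recruited = []
    seen = set(seeds)
    queue = deque(seeds)
    while queue and len(recruited) < max_size:
        person = queue.popleft()
        recruited.append(person)
        for contact in referrals.get(person, []):
            if contact not in seen:  # do not recruit anyone twice
                seen.add(contact)
                queue.append(contact)
    return recruited

# Hypothetical referral network: who each person can point the researcher to.
referrals = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D", "E"],
    "E": ["A", "F"],
    "G": ["H"],  # G and H are not connected to anyone reachable from A
}

print(snowball_sample(["A"], referrals, max_size=10))
# ['A', 'B', 'C', 'D', 'E', 'F']
```

G and H never enter the sample because nobody in the chain knows them, which shows in miniature why snowball samples can reach hidden populations yet cannot be assumed to be representative.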
Sample Size
The sample size is simply the number of subjects in the sample. If there is only one sample, the letter "N" is used to designate the sample size. If samples are taken from each of "a" populations, then the small letter "n" is used to designate the size of the sample from each population. When there are samples from more than one population, N is used to indicate the total number of subjects sampled; when every sample has the same size n, N is equal to (a)(n). If the sample sizes from the various populations are different, then n1 would indicate the sample size from the first population, n2 from the second, etc. The total number of subjects sampled would still be indicated by N, which is then the sum of the individual sample sizes. When correlations are computed, the sample size (N) refers to the number of subjects and thus the number of pairs of scores rather than to the total number of scores. The symbol N also refers to the number of subjects in the formulas for testing differences between dependent means. Again, it is the number of subjects, not the number of scores.
Confidence Interval and Confidence Level
The confidence interval is the plus-or-minus figure (also called the margin of error) usually reported in newspaper or television opinion poll results. For example, if you use a confidence interval of 4 and 47% of your sample picks an answer, you can be "sure" that if you had asked the question of the entire relevant population, between 43% (47-4) and 51% (47+4) would have picked that answer.
The confidence level tells you how sure you can be. It is expressed as a percentage and represents how often the true percentage of the population who would pick an answer lies within the confidence interval. The 95% confidence level means you can be 95% certain; the 99% confidence level means you can be 99% certain. Most researchers use the 95% confidence level.
When you put the confidence level and the confidence interval together, you can say that you are 95% sure that the true percentage of the population is between 43% and 51%. The wider the confidence interval you are willing to accept, the more certain you can be that the whole population's answers would be within that range. For example, if you asked a sample of 1000 people in a city which brand of cola they preferred, and 60% said Brand A, you can be very certain that between 40% and 80% of all the people in the city actually do prefer that brand, but you cannot be so sure that between 59% and 61% of the people in the city prefer the brand.
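To make the cola example concrete, here is a minimal Python sketch of how the plus-or-minus figure can be computed. It assumes the standard normal approximation for a proportion from a simple random sample (the text does not say which formula a given calculator uses), and the function names are illustrative.

```python
import math
from statistics import NormalDist

def margin_of_error(p, n, confidence=0.95):
    """Plus-or-minus figure for a sample proportion p from a random sample of size n."""
    # z is about 1.96 for 95% confidence and about 2.58 for 99%.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return z * math.sqrt(p * (1 - p) / n)

def confidence_interval(p, n, confidence=0.95):
    """Lower and upper bounds, clipped to the valid range 0 to 1."""
    moe = margin_of_error(p, n, confidence)
    return max(0.0, p - moe), min(1.0, p + moe)

# Cola example: 1000 people sampled, 60% prefer Brand A.
low, high = confidence_interval(0.60, 1000)
print(f"95%: {low:.3f} to {high:.3f}")  # 95%: 0.570 to 0.630

low99, high99 = confidence_interval(0.60, 1000, confidence=0.99)
print(f"99%: {low99:.3f} to {high99:.3f}")
```

Under these assumptions the 95% interval is roughly 57% to 63%: wider than the 59% to 61% range the text warns about, and far narrower than 40% to 80%. Asking for 99% confidence widens the interval, which is the trade-off described above.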
Factors that Affect Confidence Intervals
There are three factors that determine the size of the confidence interval for a given confidence level:
• Sample size
• Percentage
• Population size
Sample Size
The larger your sample size, the more sure you can be that their answers truly reflect the population. This indicates that for a given confidence level, the larger your sample size, the smaller your confidence interval. However, the relationship is not linear (i.e., doubling the sample size does not halve the confidence interval). Because the interval shrinks roughly in proportion to the square root of the sample size, halving it takes about four times as many respondents.
Percentage
Your accuracy also depends on the percentage of your sample that picks a particular answer. If 99% of your sample said "Yes" and 1% said "No," the chances of error are remote, irrespective of sample size. However, if the percentages are 51% and 49%, the chances of error are much greater. It is easier to be sure of extreme answers than of middle-of-the-road ones. When determining the sample size needed for a given level of accuracy, you must use the worst-case percentage (50%). You should also use this percentage if you want to determine a general level of accuracy for a sample you already have. To determine the confidence interval for a specific answer your sample has given, you can use the percentage picking that answer and get a smaller interval.
Population Size
How many people are there in the group your sample represents? This may be the number of people in a city you are studying, the number of people who buy new cars, etc. Often you may not know the exact population size. This is not a problem. The mathematics of probability shows that the size of the population is irrelevant unless the size of the sample exceeds a few percent of the total population you are examining. This means that a sample of 500 people is equally useful in examining the opinions of a state of 15,000,000 as it is in examining those of a city of 100,000. For this reason, The Survey System ignores the population size when it is "large" or unknown. Population size is only likely to be a factor when you work with a relatively small and known group of people (e.g., the members of an association).
The confidence interval calculations assume you have a genuine random sample of the relevant population. If your sample is not truly random, you cannot rely on the intervals. Non-random samples usually result from some flaw in the sampling procedure. An example of such a flaw is to only call people during the day and miss almost everyone who works. For most purposes, the non-working population cannot be assumed to accurately represent the entire (working and non-working) population.
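The three factors can be combined into a sample size calculation. The sketch below is an illustration, not the method of any particular calculator: it assumes the normal approximation for a proportion, uses the worst-case percentage of 50% by default, and applies the usual finite population correction only when a population size is supplied. The function and parameter names are hypothetical.

```python
import math
from statistics import NormalDist

def required_sample_size(margin, confidence=0.95, p=0.5, population=None):
    """Smallest random sample giving the requested plus-or-minus margin."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    n = z ** 2 * p * (1 - p) / margin ** 2  # large or unknown population
    if population is not None:
        # Finite population correction: matters only for small, known groups.
        n = n / (1 + (n - 1) / population)
    return math.ceil(n)

# Confidence interval of plus or minus 4 points at the 95% level.
print(required_sample_size(0.04))                         # 601
print(required_sample_size(0.04, population=15_000_000))  # a state
print(required_sample_size(0.04, population=100_000))     # a city
print(required_sample_size(0.04, population=2_000))       # an association

# Halving the interval takes roughly four times the sample.
print(required_sample_size(0.02))
```

With these assumptions the state and the city need almost the same number of respondents, while the 2,000-member association needs noticeably fewer, which matches the point that population size matters only for relatively small, known groups.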
Sampling Error
In statistics, sampling error or estimation error is the error caused by observing a sample instead of the whole population. An estimate of a quantity of interest, such as an average or percentage, will generally be subject to sample-to-sample variation. These variations in the possible sample values of a statistic can theoretically be expressed as sampling errors, although in practice the exact sampling error is typically unknown. Sampling error also refers more broadly to this phenomenon of random sampling variation.
The likely size of the sampling error can generally be controlled by taking a large enough random sample from the population, although the cost of doing this may be prohibitive. If the observations are collected from a random sample, statistical theory provides probabilistic estimates of the likely size of the sampling error for a particular statistic or estimator. These are often expressed in terms of its standard error.
Sampling error can be contrasted with non-sampling error. Non-sampling error is a catch-all term for the deviations from the true value that are not a function of the sample chosen, including various systematic errors and any random errors that are not due to sampling. Non-sampling errors are much harder to quantify than sampling error.
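Sample-to-sample variation is easy to see in a small simulation. The sketch below assumes a very large population in which exactly half the members hold some opinion (an invented figure), draws many independent random samples of 400, and compares the spread of the resulting estimates with the standard error that theory predicts for a proportion.

```python
import math
import random
import statistics

def simulate_estimates(true_p, n, trials, rng):
    """Estimate a population proportion from many independent random samples."""
    estimates = []
    for _ in range(trials):
        hits = sum(rng.random() < true_p for _ in range(n))
        estimates.append(hits / n)
    return estimates

rng = random.Random(0)   # fixed seed so the run is repeatable
true_p, n = 0.5, 400     # hypothetical population share and sample size

estimates = simulate_estimates(true_p, n, trials=2000, rng=rng)

observed_spread = statistics.pstdev(estimates)
theoretical_se = math.sqrt(true_p * (1 - true_p) / n)  # 0.025 for these values

print(f"observed spread of estimates: {observed_spread:.4f}")
print(f"theoretical standard error:   {theoretical_se:.4f}")
```

Each individual estimate misses the true value by some unknown amount, but the typical size of the miss is close to the standard error. The simulation captures sampling error only; a flawed procedure such as calling only during the day would add non-sampling error that no formula here accounts for.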