Methodological Rigor with Internet Samples: New Ways to Reach Underrepresented Populations
ROBIN M. MATHY, M.A.,1,2 MARC SCHILLACE,1 SARAH M. COLEMAN, B.A.,1 and BARRIE E. BERQUIST, B.A.S.1
1 Department of Psychiatry, Medical School, and School of Social Work, University of Minnesota-Twin Cities, Minneapolis, Minnesota.
2 Evidence-Based Health Care, University of Oxford, Oxford; International Relations, University of Cambridge, Cambridge, United Kingdom; and Institute of Child Development, University of Minnesota-Twin Cities, Minneapolis, Minnesota.

ABSTRACT

We present several rigorous methods for sampling difficult-to-reach and empirically underrepresented populations via the Internet. The methodology’s representativeness was tested by comparing the demographics of a small sample of 82 lesbian and bisexual females with a much larger Gallup Organization sample of the general population (n > 1,000) obtained via random digit dialing. Compared to the latter poll, the rigorous sampling designs developed for the Internet were found to be significantly more robust and equally representative of the U.S. general population. The Gallup Organization reached a sample more representative of the age distribution of the United States. The Internet sample reached a sample more representative of the population, with less education, lower incomes, and a broad spectrum of ethnic diversity. The samples were equally effective in representing the distribution of the population with rural and urban residence.

INTRODUCTION

FEMALES HAVE BEEN SIGNIFICANTLY UNDERREPRESENTED (p < 0.00001) in research concerning sexual orientation and suicidal behavior.1 Mathy1 found that females were significantly underrepresented (p < 0.05), even under the debatable assumption that there are twice as many gay men as lesbian women in the general population. The first population-based survey on the topic of sexual orientation and suicidal behavior was published only recently.2 That study did not find a statistically significant difference in the numbers of males and females who self-identified as homosexual. Other population-based surveys conducted by the National Opinion Research Center (NORC) also have found a much smaller difference between the proportions of lesbians and gay males in the general population. The presence of a statistically significant sex difference in convenience samples vis-à-vis population surveys suggests the existence of sampling bias3 in the former.

Few researchers have the time and financial resources to conduct surveys of sufficient size to gather a subsample of lesbian, gay male, and bisexual respondents large enough to conduct rigorous analyses. This observation is important and rarely delineated for the benefit of readers who are somewhat less familiar with the difficulties of conducting research in this population. Therefore, it is appropriate to articulate them here. Consider that a sample of 384 gay men and 384 lesbian women would be needed to obtain sample sizes with a 95% confidence interval with no more than a relatively generous 0.05 error of estimation ($n = [(1.96)^2 (0.25)]/(0.05)^2 \approx 384$).4 With about 0.025 of the general population publicly identifying as homosexual to NORC survey interviewers, a random sample of 15,360 men (1.00/0.025 = 40; 40 × 384 = 15,360) and the same number of women would be needed to obtain the requisite sizes of subsamples of gay men and lesbian women in the general population. The time and expense of interviewing more than 30,000 respondents would be very considerable. It is quite probably prohibitive without substantially greater resources than those available to university-affiliated researchers in the United States today. This is important to note because even random surveys may not generate a large enough subsample of lesbian and gay male respondents to obtain a 95% confidence interval with a reasonable error of estimation. The smaller the survey size of the general population and hence the fewer the number of lesbian and gay male respondents in the sample, the less certain we can be that findings are reliable within a reasonable error of estimation.

Similarly, the use of convenience samples of lesbian, gay male, and bisexual respondents has been vigorously criticized.5 Some researchers6–8 have conducted studies in geographic areas where there are relatively large populations of gay male, lesbian, and bisexual individuals, making uncertain the extent to which the sample statistics reflect the population parameters within the geographic area in which the study was conducted, much less the population of the United States. Another frequent objection5 is the use of samples drawn largely from mental health clinics,9 gay and lesbian social support groups,10 and gay and lesbian social service agencies. We would expect a clinical sample to have more depressed and suicidal respondents than a sample of the general population. A community social service agency providing services to lesbian, gay male, and bisexual clients may attract a clientele particularly in need of its specialized resources and assistance. Finally, a support group would include clients who need or want support specifically related to their sexual orientation, suggesting the existence of adjustment difficulties that may not exist among other lesbian, gay male, and bisexual individuals. Any or all of these (clinical, community, or support group) samples easily could bias the findings in a study of sexual orientation and suicidal behavior.

Random sampling of gay male, lesbian, and bisexual respondents has been hampered because the population is unknown. The absence of known population parameters makes impossible the reliable estimation of sample statistics. In studies of sexual orientation and suicidal behavior, sampling has been conducted opportunistically. The most significant sampling problems have been sample homogeneity and the absence of a truly random sample.11 Some researchers have simply concluded that, indeed, “It is not possible to select a truly random sample of lesbian, gay, and bisexual (LGB) people.”12 Hence, it was considered an ideal, if not altogether unrealistic and impossible, objective to find a way to conduct random sampling in a population that was not circumscribed by a clinical nature, community agency mission, or support-group focus. However, most research with lesbian, gay male, and bisexual individuals has been conducted using a sampling design in which respondents refer researchers to other prospective participants. This (“snowball”) sampling design has well-known methodological problems,13 which have been uniformly acknowledged as undesirable and nonetheless considered unavoidable, until now.

The methodology presented here overcomes several of the well-known problems of conducting research in difficult-to-reach populations, including lesbian and bisexual women and other empirically underrepresented groups. The new methodology was designed to meet several criteria. First, it had to be able to address the statistically significant underrepresentation of females on the topic of sexual orientation and suicidal behavior.1 Second, it had to address the bias inherent in using samples drawn from mental health clinics, social service agencies, and support groups.5 Third, it had to find a way of conducting research with more (demonstrably) representative samples.

The general methodology delineated here falls within the domain described as social research in cyberspace.3 The first specific methodology is here defined as cybersurvey.
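The sample-size arithmetic in the Introduction can be checked with a short calculation. The sketch below is an illustration only and is not part of the original study: the function names are hypothetical, and it assumes the conservative p = 0.5 implied by the 0.25 variance term, z = 1.96 for a 95% confidence interval, and the 0.025 prevalence attributed above to the NORC surveys.

```python
import math


def required_subsample(z: float = 1.96, p: float = 0.5, error: float = 0.05) -> int:
    """Subsample size for estimating a proportion: n = z^2 * p(1 - p) / e^2."""
    n = (z ** 2) * p * (1 - p) / (error ** 2)
    # The text reports 384, i.e. the computed value truncated to whole respondents.
    return math.floor(n)


def required_general_sample(subsample: int, prevalence: float = 0.025) -> int:
    """General-population sample needed to expect `subsample` members of a
    group that makes up `prevalence` of the population."""
    return round(subsample / prevalence)


n_sub = required_subsample()                 # per-sex subsample
n_per_sex = required_general_sample(n_sub)   # general-population sample per sex
print(n_sub, n_per_sex, 2 * n_per_sex)
```

Doubling the per-sex figure gives the total number of interviews, which is consistent with the "more than 30,000 respondents" mentioned in the text.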
Cybersurvey research includes the electronic transmission of a questionnaire via electronic mail or the placement of a questionnaire on an Internet web page. Cybersurvey uses random sampling of Internet chat rooms. The second specific methodology is here defined as cyberethnography. Cyberethnography is the process of engaging in participant observation and ethnographic research in chat rooms on the Internet. The third specific methodology is here defined as multi-method cyberspace research. This latter methodology is an integrated combination of cyberethnography and cybersurvey research methods.

Cybersurvey

We are concerned by the proliferation of questionnaires being placed on the Internet. Often, potential respondents in focus groups (e.g., members of disability student services) are asked to participate in the survey and given a link to the web site on which the questionnaire has been placed. However, the response rates are generally unknown. It is not even certain that the e-mail was received or read. Distribution of information regarding the web-based questionnaire circumscribes the research to rather homogeneous groups holding institutional or organizational membership. This methodology is, perhaps, one step above standing on a street corner and asking bystanders and passersby to answer some questions or participate in a research project. It is slightly better because it is a bit more heterogeneous.

Cybersurveys are conducted by using major search engines to find chat rooms for one’s target sample. In the multi-method study of sexual orientation and suicidal behavior, for example, the target sample was lesbian and bisexual females. Using keywords related to lesbian and bisexual females (e.g., lesbian, bisexual, dyke, homosexual), chat rooms are located. The procedure is similar to doing an on-line literature search. Each Internet search engine is used separately, just as one would use each on-line index separately to search for periodicals containing particular keywords. Once the researcher has obtained an exhaustive list of chat rooms related to the target topic, a random number key is used to ask individuals in each chat room to participate in the study. Chat rooms list community members who are on-line. The random number key enables researchers to identify in the list the next enumerated potential participant. Participants who decline are replaced with the next person in the list.

In terms of anthropological research, chat rooms have acephalous (or headless) organization. Therefore, the attitudes, norms, and values of chat room communities are generally constructed without social, political, or economic hegemony. This is a dimension notably absent in clinical, community agency, and support group samples.

The ethical protocols are identical to research conducted in-person. Participants are told of the study’s (general) purpose, invited to participate, assured of confidentiality and/or anonymity, and informed of their right to decline or rescind agreement of participation at any time without negative repercussions. To ensure the benefit of participation outweighs the risks, participants are offered relevant information they might find useful. In this case, participants are given the name and phone number of a national toll-free hotline specific to lesbian, bisexual, gay male, and transgender issues.

Participants who accept the invitation to participate are informed of the next phase. The next phase depends upon whether the researcher is using electronic mail to transmit the questionnaire or whether it is posted on a web page. If the researcher is using electronic mail (e-mail), the participant is asked to email the researcher a request to participate in the study. Upon receipt of the request, the questionnaire is sent via e-mail. The participant uses an e-mail Reply command to complete and return the questionnaire. Upon receipt of the questionnaire, all identifying information is stripped and the participant is coached to clear their computer’s memory cache.

This approach is less confidential than we would prefer. All electronic mail is stored in the computer of the recipient organization. If the participant is using their actual name in their e-mail, it is conceivable that the medium (tape or disc) on which e-mail data are backed up could be searched by a computer operator or a reasonably sophisticated computer hacker.
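The random-number-key selection described in this section can be sketched in a few lines. The text does not specify the exact form of the key, so the version below is an assumption: the key is taken to be a random starting position in the enumerated list of on-line members, and the search wraps around to the top of the list if the end is reached. All names and handles are hypothetical.

```python
import random
from typing import Callable, Optional, Sequence


def invite_from_room(
    members: Sequence[str],
    accepts: Callable[[str], bool],
    rng: random.Random,
) -> Optional[str]:
    """Pick one participant from a chat room's list of on-line members.

    A random key selects the starting position in the enumerated list; a
    member who declines is replaced by the next person in the list.
    Returns None if the room is empty or everyone declines.
    """
    if not members:
        return None
    key = rng.randrange(len(members))  # assumed form of the "random number key"
    for offset in range(len(members)):
        # Wrapping around the list is an assumption, not stated in the text.
        candidate = members[(key + offset) % len(members)]
        if accepts(candidate):
            return candidate
    return None


# Hypothetical room list and responses, for illustration only.
room = ["handle_a", "handle_b", "handle_c", "handle_d"]
declined = {"handle_b"}
chosen = invite_from_room(room, lambda m: m not in declined, random.Random(0))
print(chosen)
```

Passing the random generator in explicitly keeps the draw reproducible, which makes it possible to document exactly how each invitation was selected.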
Free anonymous e-mail is available. However, coaching the participant to send and receive the questionnaire via an anonymous e-mail intermediary is an additional and somewhat tedious step in the research process. A better approach is to place the questionnaire on a web page configured so that only responses are transmitted to the researcher. Nothing on the e-mail containing questionnaire responses includes the electronic address or the handle (nickname) of the sender. Although the answers (only) are stored on an electronic medium in the computer hosting the questionnaire answer recipient’s e-mail account, there is no way to associate the responses with the sender. This ensures both anonymity and confidentiality.

Cyberethnography

Cyberethnography is generally conducted in one or perhaps several closely related chat room communities. In contrast to cybersurvey, cyberethnography uses cluster sampling to identify the chat room community to be studied. That is, a chat room (block) which has the characteristics most representative of other chat rooms is identified and studied intensively. As with cluster sampling in non-virtual research, multiple chat rooms (blocks) can be selected and alternated. However, we do not recommend the study of multiple sites simultaneously without considerable time and research assistance.

The focus of cyberethnography is participant observation of interactions between and among chat room community members. The approach requires intensive and extensive time commitments. The investment of time should not be underestimated. As with ethnographies and naturalistic studies, researchers who use cyberethnography will find it most useful to commit the majority of each day for several months or more to the study of the selected site. In the Multi-Method Study on Sexual Orientation and Families of Origin, for example, three researchers shared responsibility for observing interactions a combined total of 36 h in each 24 h for several months. Lists of community members were available and recorded, and the focus member was rotated. When the focus member was not available, the next person on the list was observed. This methodology is common in primatology and zoology. However, interactions are generally very fluid. The researchers’ role as participant observer was neither emphasized nor minimized.

Community was defined as the recognition of others across time as well as recognition by others across time. That is, a person was considered a community member who entered on two or more occasions (logging out and logging in at a later date) and who both recognized others and was recognized by others. Hence, a person who entered the chat room and never interacted would not be considered a community member. A person who entered the chat room and interacted with others once also would not be considered a community member.

The data gathered are related to the questions being asked. In the study of sexual orientation and suicidal behavior, for example, the data gathered included issues of acceptance and rejection by others (family members, friends, and parents), disclosure to others (family members, friends, and parents), experiences of child maltreatment and victimization, and others. Reliability is ensured by inviting other researchers to participate in the observation and to ascertain independently the extent to which data gathered are reliable between observers. Further, cyberethnography enables researchers to print out complete interactions among members over extended periods, and all interactions in the chat room are visible to the observer. This facilitates qualitative analyses of interactions, just as ethnographies do for anthropologists in nonvirtual communities.

Multi-Method Cyberspace Research

Multi-Method Cyberspace Research is highly effective. In the study of Sexual Orientation and Families of Origin, for example, our first block sample yielded a response rate greater than 97.5% during the cybersurvey component of our research. That is, more than 97.5% of respondents who participated in the cyberethnography also participated in the cybersurvey.
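The response rates behind this claim are simple ratios. A minimal sketch follows, using the counts reported in the Results of this paper (86 community members, 84 who agreed to participate, and 82 who completed questionnaires); the function name is hypothetical.

```python
def rate(numerator: int, denominator: int) -> float:
    """Return a percentage rounded to two decimal places."""
    return round(100 * numerator / denominator, 2)


members, agreed, completed = 86, 84, 82

print(rate(agreed, members))     # agreement rate among community members
print(rate(completed, agreed))   # completion rate among those who agreed
print(rate(completed, members))  # overall completion rate
```

The first ratio is the one that exceeds 97.5%; the overall completion rate is lower because it compounds agreement and completion.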
Multi-method research facilitates the use of both qualitative and quantitative methods. From an anthropological perspective, the qualitative methods provide idiographic, emic information. The quantitative methods provide nomothetic, etic information. Put somewhat differently, the qualitative methods give us information about meanings and the quantitative methods give us information about structure. Hence, these methods are complementary.

In addition, the methodology is a multi-panel design which enables researchers to return to the same site and gather new samples over an extended period of time. Individual chat room communities are relatively small and sampling of membership can be exhausted within two months. Participant observation of several chat rooms (blocks) simultaneously enables researchers to alternate among virtual communities. This adds an important and extremely valuable dimension to this work.

This methodology overcomes the apparent impossibility of random sampling in this population. We believe this methodology is significantly superior to snowball sampling or geographically homogeneous samples. Specifically, this methodology is not limited to school-based surveys of youth. Just as the telephone has become a dominant tool in social surveys, the Internet also may become an important instrument of social research with difficult-to-reach and empirically underrepresented populations.

Limitations

All research methodologies have limitations. However, it is not yet known to what extent the population from which chat room community members are selected is similar to or different from those in the general population. Nonetheless, one thing is certain: Some random sampling is possible in a lesbian and bisexual population that is not clinical, agency based, support focused, or predicated by an educational mission or milieu.

MATERIALS AND METHODS

Technical design and development

The design and development of the Web site containing the research questionnaire was based on accepted and recognized criteria for Web sites in general.14–18 These criteria emphasize the following: (1) ease of location of the site via search engines, (2) ease and quick speed for downloading files and graphics, (3) use of multimedia and graduated increased use of graphic sources to enhance interest and attractiveness of participation, (4) use of design elements to enhance curiosity in pursuing exploration of the site, and (5) use of accurate information and dissemination of information to site visitors. In addition to these criteria, it was important to locate the site with an Internet Host that offered free storage space, provided reliable service, and was generally well-known among Internet users.

The site itself consists of three Web pages or files. The first, known as the Home or index page, features the title of the site and includes a graphic symbol (a rainbow emblem) that is commonly recognized among BGLTU (bisexual, gay male, lesbian, transgender, or undeclared) communities. The page acts as the first element to enhance curiosity about the site by a visitor by involving an animated graphic, an element of color, and brevity of download time. The Home page includes another file which is automatically activated when the page file is executed that plays a copy of a well-known classical music piece. It provides an invitation to participate in the study of sexual orientation and family of origin issues by asking visitors to complete questions on an anonymous, confidential survey. However, in order to view the questionnaire, the visitor must activate a link to the next page.

The second page serves the purpose of providing information about the study, the questionnaire itself, and the usual caveats regarding participation and known risks. This page also provides the visitor with sources to contact in the event there are emotionally upsetting reactions to survey questions. In addition, the visitor reads a statement indicating they may decline or rescind participation at any point in time without adverse consequences.
000 people. however. This free service features a CGI (Common Gateway Interface) script that removes the participant’s email address. The remaining 40% had some college education.g. The participant submits the questionnaire responses by clicking a button at the end of the questionnaire.4%) identified as Native American. and success of this methodology.S. two (2. Ethnicity. and other identifying information. This methodology allows other researchers to employ similar techniques. At the time of the cyberethnography. and another link that activates the file containing the survey questions. This page also includes a music file that is activated automatically.000) and 32 participants (40%) reported they had grown up in an area with a population of fewer than 50. Netscape and Internet Explorer). who consequently may fail to return to the research site to complete the questionnaire.8%).. poverty level. these disadvantages are outweighed by the overwhelming reliability. Thus. Succinctly stated. This can cause a loss of data as well as frustration for the participant.08. The ability to do so while maintaining both anonymity and confidentiality of respondents is a significant achievement. The sample median age was 21. For example.4%) indicated they were Black. It is accessible 24 h daily. The final page of the site consists of the research questionnaire. Rural residency. Approximately one-third (35.9%) of the participants reported they have personal incomes below the U.5 to 45.4%) identified themselves as Latina. featuring an easy-listening piece and a different animated graphic. three (3. This button activates a hidden command that is embedded in the file that forwards the participants’ responses to a different Web site.5%) indicated they lived in rural areas (populations of less than 50. a link back to the Home page for the site.2%) as had a high school degree or some secondary or primary education only (28. 
as the elements of design as well as use of technology are commonly available throughout cyberspace. the confidentiality as well as anonymity of participants is ensured. RESULTS Sample demographics Age. This page includes the e-mail addresses of the Co-Investigators of the Research Project on Sexual Orientation and Families of Origin. and two (2. Education. Possible disadvantages are primarily of a technical nature. two (2.7%) identified as Asian American. Participants ranged in age from 15. 18 participants (22. The participant’s connection to the Internet can be interrupted due to problems with their provider or that of the provider hosting the web site. The responses are forwarded to one of the Co-Investigators of this study for inclusion in a database. or higher education campuses and communities is a significant methodological breakthrough. SD = 7. consistency. This page features a different file that automatically plays a different easy-listening piece as well as graphics of researchers and participants at computers that are used to break each section of the questionnaire and stimulate interest.17 years.3). The Research Project on Sexual Orientation and Family of Origin obtained in its first cybersurvey a sample of 82 lesbian and bisexual females. Generally.38 years (M = 24. Seventy-three of the 82 participants (89%) reported they were White. they may decline or rescind participation at any point in time without adverse consequences. name. Roughly equal proportions of the sample were college graduates (31. Sample comparisons Our primary goal was to obtain a representative sample of lesbian and bisexual participants . e-mail containing a participant’s responses may not be processed correctly and fail to reach the researcher. social service agencies.258 MATHY ET AL. the ability to conduct random sampling of underrepresented and oppressed groups without relying upon clinical settings. The site can be reached by using any of the available Web browsers (e. Income. 
support groups (usually facilitated). possibly reflecting the relatively young median age of the sample.
67%) agreed to participate in the study.S.S. Hence. we wanted to know whether the demographics of our sample differed from the general population. The chat room (block) which had characteristics most central to all potential chat rooms was chosen for the cyberethnography. ethnically diverse.S. Bureau of the Census with even larger samples would be more representative of the U. Bureau of the Census. we would expect that demographic data obtained by the U. Bureau of the Census demographics for the U. we used unpublished data from a CNN/USA Today poll conducted by The Gallup Organization in 1998 and data published by the U. data regarding the U. Two forms of representativeness were considered. For this purpose. this methodology has since been improved. dollars. population than either our sample or the sample obtained by The Gallup Organization.000 or more) conducted by The Gallup Organization would be more representative than our data. All three of these samples were compared to U. population. For this purpose.19 Second. population. Therefore. based on sample size we would expect that demographic data obtained from larger surveys (n = 1. Bureau of the Census sample which assessed demographics of Internet use. We had frankly expected our sample would significantly underrepresent elderly. Smaller samples generally have less methodological power to estimate the population because of the dynamics of the central limit theorem. Age group distributions in our sample were compared with the samples of The Gallup Organization in 1998 and Mediamark Research Inc. The samples included our sample as well as both a Gallup Poll sample and a U.S.S. in 1998 .62%) completed and returned questionnaires. as the size of a sample increases its sample statistics more closely approximate the population parameters. general population were examined. 
the apparent difficulties of obtaining formal demographic data just two or three years ago might seem somewhat exaggerated by the data readily available today. The lead author obtained the crosstabulations of the CNN/USA Today poll from The Gallup Organization for 25 U. in 1998 (Table 917)19 as well as the U.S. Of the 86 community members of this chat room. Overall. As noted earlier. It is “difficult to substantiate any formal demographics to support the sampling procedures” and those with such knowledge often charge large sums of money for it.. less educated. U. 82 of 86 community members (95. we conducted statistical comparisons between several samples and the U.S.S. population. U. smaller samples (including random samples) may be significantly more skewed and hence less representative of the population than estimates obtained with larger samples. Of the 84 members who agreed to participate in the cybersurvey. we assume it is no different than among youth and adults in the general population. obviating the need to transmit the questionnaire (and identifying information) via email. after which a questionnaire was electronically submitted to the recipient. Bureau of the Census. we would suggest social science is effectively developed by overcoming such methodological challenges.S. Bureau of the Census in 199819 in the Statistical Abstract of the United States. Age.20 However.35%) completed questionnaires. 84 (97.13 We used cyberethnography and subsequently cybersurvey to obtain a sample of 82 lesbian and bisexual females. Further. To determine the representativeness of our small sample (n = 82) relative to the general population. and rural residents—even as we expected to draw participants from a population without many of the previously noted methodological problems.S. 82 (97. First. Bureau of the Census data were free and easily accessible via the Census Bureau’s web site and links to the on-line Statistical Abstract of the United States. 
Although the population of lesbian and bisexual females is unknown. The chat room community was selected through cluster sampling3 lesbian and bisexual chat room communities. Consent to participate was indicated by receipt of an email request for the questionnaire. Similarly.S.S. we wanted to know whether the demographics of our sample differed from reliable estimates regarding the demographics of Internet users.METHODOLOGICAL RIGOR WITH INTERNET SAMPLES 259 living in rural areas. and the U. The ultimate measure of any sampling methodology is its ability to reliably represent the population from which its data were drawn.S. poor.4 According to this theorem.
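The sample-size argument in this section can be made concrete with the standard error of a sample proportion, sqrt(p(1 - p)/n). The sketch below is illustrative only: the function name is hypothetical, p = 0.5 is an assumed worst case, and n = 1,000 stands in for the Gallup sample, which the text describes only as 1,000 or more.

```python
import math


def standard_error(p: float, n: int) -> float:
    """Standard error of a sample proportion p estimated from n respondents."""
    return math.sqrt(p * (1 - p) / n)


for n in (82, 1000):
    se = standard_error(0.5, n)
    # 1.96 * SE is the half-width of an approximate 95% confidence interval.
    print(n, round(se, 4), round(1.96 * se, 4))
```

Because the standard error shrinks with the square root of n, the larger survey yields a markedly narrower interval around any estimated proportion, which is why the larger samples were expected to be more representative.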
e.4%).S. 2.S.6% identified as members of other ethnic groups.764). political.S. Nor did the Mediamark sample underrepresent black respondents (p = 0. population (p = 0. However.4% of the population was between the ages of 15 and 34 in 1997. Bureau of the Census.S. and develop in a virtual community independent of geographic constraints and economic.S. and “other” (3. the Gallup poll sample significantly underrepresented white respondents relative to their representation in the U. Nor did the proportion of “other” in the Mediamark sample differ from the U. only 17.S. whereas 13.3% of their weighted base of 210 respondents aged 18–29 had used a computer on-line services at home.00001) from the representation of the U. adults were below the age of 35. population (p = 0. Mediamark Research Inc. In our sample.6% of respondents older than age 55 had used the Internet in the previous 30 days. 49.” Approximately 71. The U. In the United States. 89% of participants identified as white. The proportion of “non-white” in the Gallup poll sample differed markedly (i. By contrast. Internet usage is dominated by younger age groups (middle age or younger).7% of the U.1% of respondents aged 30–49 and 18. 17.7% were black. Mediamark estimated 33.59).e. The Gallup Organization reported that 46. Native American. (Table 16). About 32. This has enormous significance for both research and clinical work with this population.S. However. Bureau of the Census in 1998 (Table 13)19 reported that as of July 1. also trichotomized its sample.008).2% of respondents aged 18–34 had used the Internet in the previous 30 days.S.S. A two-tailed z-approximation test of Gallup poll data did not reveal a difference between white and non-white respondents . Asian American. the Gallup poll did not underrepresent black respondents relative to their representation in the U.6% as other groups (i.4% of respondents aged 65 or up had used a computer on-line service at home.” and “black.S. and social hegemony. 
The Gallup Organization reported that 35.6% of our samples was below the age of 25. more than 7.9% of “non-white” respondents had accessed on-line services from home. However.1% of black respondents reported that they had done so. population classified as “other” by the U. 1997.7% of our sample was younger than age 35. using as categories “white” (84. Only 4.77). Although 25.7% of our sample was below the age of 30..7% of U. and 8.085). Only 8.S. population (p = 0.6% of their weighted base of 210 respondents aged 18–29 had used computer on-line services at work or school. population was between the ages of 15 and 29 in 1997.005) relative to their proportion in the U.5% of respondents were white.S.5% of white respondents and 22.6% of the U. resident population was white. “black” (11. 81. and 4. Mediamark U.S. population (p = 0. 12. Reliable demographic data regarding chat room community memberships have yet to be found. 28.” “non-white.8% of those aged 50–64 had used a computer on-line service at home.S. Neither the Mediamark sample nor our sample differed significantly from the U. However.” and 10. respectively).4% as black. population. Chat room community members are a logical subset of Internet users. p < 0.7% was black.6% of the Gallup poll sample’s respondents were between the ages of 18 and 29.68 and p = 0. Nonetheless.8% were “non-white.260 MATHY ET AL. population (p < 0. However.9%). the proportion of other ethnic groups in our sample did not differ significantly from the U. The Gallup Organization trichotomized the race of its sample into “white. population was in this age group in 1997. Fewer than 2% of respondents aged 65 or up had used the Internet at work or school.13. our data suggest that lesbian and bisexual females have found a new way to socialize.7% of respondents aged 50 to 64 had used it in those locations.6%). Our sample did significantly underrepresent black respondents (p = 0. Approximately 31. interact. and 20. Ethnicity. 
and Latina).2% of those 35–54 years old had done so.7% of the U. 19 finding 64.. Bureau of the Census in 1998 (Table 917)19 reported that 42. there is some evidence that findings regarding Internet access are somewhat inconsistent. 21. whereas 92.69 standard deviations.2% of respondents aged 30–49 and 16. 82.
S. In The Gallup Organization data.. Interestingly. The highest proportion of Internet access at work or school occurred among “non-white” respondents (0. Similarly. whereas the Gallup Poll did not. in 1998 (Table 916)19 also reported that respondents classified as “other” had the highest proportion (.8%) had not attended college.055.S. there was a statistically significant difference (p = 0.3% of the total U.21 This is in marked contrast to 13. two-tailed p = 0. Nonetheless.S. perhaps.. However.199 proportion of U. U.205 proportion of respondents in poverty in our sample was different than the 0.199. Further.19 Precisely because 64.205 did not differ significantly from the population proportion of 0. Income.S.S. and 40% had attended only some college.S.19 Education. The differences in on-line access from work or school were also insignificant between black and white respondents (p = 0. This is particularly notable given our small sample size. The observed proportion of 0. it must be noted that we significantly underrepresented black respondents relative to their representation in the U.S. We also conducted a two-tailed z-approximation test to ascertain whether the 0. our data were more representative of ethnic diversity than data obtained with random sampling by The Gallup Organization. Nonetheless.S. 19.888. Bureau of the Census in 1998 19 reported that in 1997 about half (51.83) and between non-white and black respondents (p = 0. followed by white (0. population in 1997.S.7%) of the U. We conducted a two-tailed z-approximation test to ascertain whether the 0. Mediamark Research Inc. population. The observed proportion of . youth are accessing the Internet in numbers disproportionate to their numerical representation in the U.248) respondents.133 in the U.9% of youth aged 18–24 years were living in poverty (Table A). In sum.044) who had accessed on-line services from work or school. Bureau of the Census in 1998.000 U. 
Bureau of Census seem inconsistent with two stark realities.5% of The Gallup Organization sample had incomes below 20. two-tailed p = 0. First. respondents living in poverty than the data provided by either The Gallup Organization in 1998 or the U. In 1995.5% of our respondents reported they have incomes below the poverty level. the significantly greater proportion of “other” (non-white and non-black) respondents in our sample reflects the ethnic diversity in Internet use noted in both The Gallup Organization and Mediamark Research surveys. and 5% had obtained a graduate degree.S.47). the lowest household income reported by Mediamark Research Inc.000.3% had graduated from college.” The lower limits reported by The Gallup Organization and the U.205 proportion of respondents in poverty in our sample was different than the proportion of 0. followed by white (0.02) between white and black respondents.19 was “less than $50. Bureau of the Census.288).48) who had accessed on-line services from home. About 14. However.S.6% of our sample is in this age group. according to the U. One of the most startling demographic statistics currently available is the sheer poverty of youth.232) and black (0. there was not a significant difference between non-white and white respondents (p = 0. Bureau of the Census in 1998. population had either not (yet) completed high school or had graduated .S. Nonetheless.133.25) was found between “non-white” and black respondents in regard to on-line access from home. youth aged 18–24 living in poverty.S. population.128) respondents. it may not be surprising that 20.258) and black (0. youth are disproportionately poor. Bureau of the Census.S. The U.S. no significant difference (p = 0. Second. the median income of youth aged 15–24 was 6.960 U. population. this amount is so much greater than the poverty level that any comparisons with our sample would be meaningless. in 1998. U. 
There were fewer differences in Internet access from work by ethnicity in The Gallup Organization data. dollars.205 did not differ significantly from the population proportion of 0. One-fourth of our sample (28. it is clear that (whether or not we control for age) our sample was more representative of U.METHODOLOGICAL RIGOR WITH INTERNET SAMPLES 261 (p = 0.326) of Internet access in the previous 30 days. In 1997. Another 21. dollars.
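The poverty comparisons above rest on a two-tailed z-approximation test of a sample proportion against a known population proportion. A minimal sketch of that calculation follows, assuming the standard one-sample formula with the null-hypothesis standard error and n = 82; the function name is hypothetical, and the article does not state which variant of the test was used. Under these assumptions the two comparisons yield p values of roughly 0.89 and 0.055, both non-significant at the 0.05 level.

```python
import math


def one_sample_proportion_z(p_hat, p0, n):
    """Two-tailed z-approximation test of a sample proportion against p0.

    Hypothetical helper: uses the standard error under the null hypothesis,
    sqrt(p0 * (1 - p0) / n), which is an assumption about the article's method.
    """
    se = math.sqrt(p0 * (1.0 - p0) / n)
    z = (p_hat - p0) / se
    # Two-tailed p value from the standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


N_SAMPLE = 82  # sample size reported in the article

# Sample poverty proportion (0.205) vs. youth aged 18-24 in poverty (0.199).
z_youth, p_youth = one_sample_proportion_z(0.205, 0.199, N_SAMPLE)

# Sample poverty proportion (0.205) vs. total U.S. population in poverty (0.133).
z_total, p_total = one_sample_proportion_z(0.205, 0.133, N_SAMPLE)

print(f"vs. youth poverty rate: z = {z_youth:.2f}, p = {p_youth:.3f}")
print(f"vs. total poverty rate: z = {z_total:.2f}, p = {p_total:.3f}")
```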
In essence. We conducted a two-tailed z-approximation test to ascertain whether the proportion of their sample that had completed high school or less (0.8% had earned an advanced degree.288) differed from the population proportion of 0. The Gallup Organization simultaneously underrepresented the least educated group and overrepresented the most educated group. The observed proportion of 0.S. two-tailed p = 0. our sample neither underrepresented nor overrepresented the proportion of the population with a college degree or additional postgraduate education. The observed proportion of 0. whereas our sample (also) underrepresented the least educated group and overrepresented the proportion of the population with some college education but not a degree.33. Of the remainder. and overrepresented the proportion of the population with some college education but not a college degree.238.378). it is notable that our sample neither over- nor underrepresented college graduates.238.012. from high school. 16% had earned a Bachelor’s degree and 7. We conducted a two-tailed z-approximation test to determine whether the proportion of our sample that had some college education but not a Bachelor’s degree (0. The observed proportion of 0.3%) had graduated college or received some postgraduate education. The Gallup Organization also significantly underrepresented the proportion of the population with less formal education than high school completion.333) differed from the population proportion of 0.517.517.288) differed from the proportion of The Gallup Organization sample with a high school education or less (0. .245. Therefore.1% of their respondents had attended some college but not earned a Bachelor’s degree.238.40 was significantly different from the population proportion of 0. We conducted a two-tailed z-approximation test to determine whether the proportion of their sample with a college degree or postgraduate education (0.238. 
we compared our data directly with The Gallup Organization data to see whether our small sample differed significantly from the larger random sample.04. The observed proportion of 0. The remainder (37. We conducted a two-tailed z-approximation test to determine whether the proportion of our sample with a high school education or less (0. Both our sample and The Gallup Organization sample were equivocal in their ability to approximate the educational distribution of the U.245. two-tailed p < 0. The observed proportion of 0.001. In sum. two-tailed p = 0.5% had attended some college or had obtained either an Associate’s or vocational degree but had not (yet) earned a Bachelor’s degree. Another 24.0001. About 29. We conducted a two-tailed z-approximation test to determine whether the proportion of their sample with some college education but not a Bachelor’s degree (0. However. two-tailed p = 0. The Gallup Organization obtained a sample in which one-third of respondents (33. We conducted a two-tailed z-approximation test to ascertain whether the proportion of our sample that had completed high school or less formal education (0. two-tailed p = 0.S. the proportion of college graduates was significantly overrepresented. population.291) differed from the population proportion of 0. population in this group.8%) had a high school diploma or less.378) differed from the population proportion of 0.378 differed significantly from the population proportion. Although the proportion of The Gallup Organization sample with more than a high school education and less than a college degree did not differ from the proportion in the U.288 was significantly different from the population proportion of 0.291 did not differ significantly from the population proportion of 0.333 differed significantly from the population proportion of 0. our sample had underrepresented the proportion of the population with a high school education or less. 
our sample underrepresented respondents who had not gone to college and overrepresented those with some college education but not a degree.263) differed from the population proportion of 0. two-tailed p = 0.245. The observed proportion of 0. we conducted a two-tailed z-approximation test to assess whether the proportion of our sample with a Bachelor’s degree or some postgraduate education (0. In essence. Finally.517.40) was different from the population proportion of 0.263 did not differ significantly from the population proportion of 0.60.245. However. In contrast.
U. The observed proportion of 0.5% were attending college.25). two-tailed p = 0.263) did not differ significantly from the Gallup poll proportion of college graduates (0.243) did not differ significantly from the proportion of rural residents in the U.378. in 1998. two-tailed p = 0. 48..19 also found a skewed educational distribution related to Internet access. Of those who had accessed the Internet in the previous 30 days.291. The Gallup Organization obtained a sample of 24.333).6% had not (yet) done so.53.3% rural residents. Mediamark Research Inc. the remainder (24.243) also did not differ significantly from the proportion of rural residents in our sample (0. In all definitions. less than one-tenth the size). The U.95%) reported that they lived in rural areas at the time of the cyberethnography. We conducted a z-approximation test to determine whether the proportion of respondents residing in rural areas in our sample (0. perhaps.5% of respondents with more than a high school education and less than a college degree had done so. as it were) at approximating the educational distribution in the United States. We conducted a z-approximation test to determine whether the proportion of rural respondents in their sample differed from the proportion in the U. and another 3.S.22) differed from the population proportion of 0. we conducted a two-tailed z-approximation test to determine whether there was a difference in the proportion of our two samples that had completed college or obtained additional postgraduate education. That is. population or our sample. Put somewhat differently..22).e. Bureau of the Census in 1998 (Table 46)19 reported that 75.3% of respondents with a high school education or less had used their home computer to connect to the Internet. two-tailed p = 0. 
By definition.6% of The Gallup Organization sample with at least a college degree had used a computer at work or school to connect to the Internet.22 did not differ significantly from the population proportion of 0. Precisely because there were no statistically significant differences between our sample and the sample obtained by The Gallup Organization. Our sample proportion of 0.000) on any level of educational attainment. The observed proportion of rural residents in their sample (0. The same data revealed that 49.4% of their sample with at least a college degree had used their home computer to connect to the Internet. 26. 28. We conducted a two-tailed z-approximation test to determine whether there was a difference in the proportion of our two samples with regard to those with a high school education and less than a college degree. our much smaller sample (n = 82) did not differ significantly from a significantly larger sample (n > 1.2% of the population lived in urban areas in 1990.000 persons. Hence. In contrast. the population not classified as urban constitutes the rural population.288 did not differ significantly from the Gallup poll proportion of 0.7% were college graduates. The Gallup Organization data revealed that 47. Bureau of the Census defines an urban area as “one or more places and the adjacent densely settled surrounding territory that together have a minimum population of 50.S.S. Finally. a .25. two-tailed p = 0. two-tailed p = 0. we can assert that our sampling methodology is much more robust. Again in contrast. our methodology was equally effective at approximating the rural and urban residential distribution in the U.20.88.S.4% of respondents with a high school education or less had used a computer at work or school to connect to the Internet. Eighteen respondents in our study (21. Interestingly.8% had graduated from high school. two-tailed p = 0. 
our small sample with a more robust sampling methodology was equally effective (or ineffective.25. Bureau of the Census. Rural and urban residence. with a much smaller sample (i. a somewhat larger percentage (39%) reported that they had been raised in rural areas. Because the Central Limit Theorem predicts that a larger sample will more closely approximate the population.40 did not differ significantly from the Gallup poll proportion of 0.S.07.8%) lived in rural areas. (0.63.”19 The U. Our sample proportion of 0. Only 12. Only 12.S. 29% of respondents with more than a high school education and less than a college degree had done so. 10. Our sample proportion of college graduates (0.11. The observed proportion of rural residents in their sample (0. 
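The direct comparisons between the two samples can be illustrated with a pooled two-sample z-approximation test for proportions. The sketch below is offered under stated assumptions: the article reports the Gallup sample size only as n > 1,000, so the value 1,000 used here is a placeholder, the function name is hypothetical, and the article does not say whether a pooled two-sample test or a one-sample test against the Gallup proportion was actually applied. The example uses the reported proportions with a high school education or less (0.288 in our sample, 0.333 in the Gallup sample).

```python
import math


def two_sample_proportion_z(p1, n1, p2, n2):
    """Pooled two-tailed z-approximation test for a difference in proportions.

    Hypothetical helper; the pooled standard error is an assumption about
    how the two samples might be compared, not the article's stated method.
    """
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2))
    z = (p1 - p2) / se
    # Two-tailed p value from the standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


OUR_N = 82        # sample size reported in the article
GALLUP_N = 1000   # placeholder: the article states only n > 1,000

# Proportion with a high school education or less in each sample.
z_hs, p_hs = two_sample_proportion_z(0.288, OUR_N, 0.333, GALLUP_N)

print(f"high school or less: z = {z_hs:.2f}, p = {p_hs:.3f}")
```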
It may have significant clinical as well as research implications.1% of rural residents have used their home computer to connect to the Internet or an online service. their interests are primarily and conspicuously sexual. For example. we cannot make reliable or even reasonable inferences about possible differences between the psychology of chat room community members and others without careful assessment. However. While designing these methodologies. however. However. Through triangulation. Yet this conspicuous demonstration of cultural obsession with female anatomy is frequently given as a self-descriptor by males who pose as lesbians interested in cybersex (i. The inherent asset of studying a virtual community is the ability to observe interactions among community members as well as between researcher and members. yeast infections. as we did. CONCLUSION There are significant methodological problems endemic to finding a representative sample of difficult-to-reach and empirically underrepresented populations.. even as we ascribe sex based on criteria for discerning gender. Clinically. gay male. However. which generally violates community norms. In contrast. pregnancy. the predominance of young people among lesbian and bisexual female chat room community members suggests this population has found a new way out of the closet. makes a habit of giving false information to surveyors. two-tailed p = 0. We have yet to see a genetic female who advertises her identity to others by announcing her bra and cup size. Individuals do pose as others on the Internet. The methodologies presented in this paper proffer several inexpensive and robust ways to reach them.. individuals with a questionable presentation of self become apparent and. professionally or organizationally facilitated social support groups. Yet we rarely have access to the first-order knowledge of the other person’s sex. The Gallup poll data indicate that only 17. creating caricatures of themselves. When asked. 
Internet users. Their answers about routine female health concerns (e. and computer users.0004 (home) and p = 0. Unfortunately. this work is left to others.6% of rural residents have used a computer at work or school to connect to the Internet. Other researchers may find. and educational milieu. Only 17. The most significant asset of these methodologies is their independence from heavily biased sources. at times. Telephone interviewers may assume that the respondent is as they have presented themselves. These differences between rural and urban residents are statistically significant. a form of qualitative interrater reliability. This observation warrants further consideration. smaller sample that does so may prove particularly beneficial to researchers with limited financial resources. these caricatures are just as much a part of themselves. and ovulatory cycles) are charitably described as naive. a male home-owner. The most significant liability of these methodologies is the absence of substantive and reliable information regarding psychological variances between chat room community members. social service agencies. Our sample reflects the younger median age of Internet users relative to the general population. that there are significant differences between samples drawn from acephalous virtual communities and samples selected from mental health clinics.g. About 29. 27. lesbian. Yet many people pose as someone else to telephone surveyors. One of the first author’s acquaintances. A final caveat is worthy of mention. We assume individuals are male or female based upon their appearance and presentation of self.1% of urban residents have done so. it is not uncommon for males to pose as females who are interested in fulfilling a desire to play “lesbian” with a lesbian or bisexual female.4% of urban residents have done so. humorous caricatures because of their ineptitude. sexual gratification via role play in cyberspace). 
and bisexual clients who feel isolated and rejected may find others with similar interests and issues in a virtual community. we were struck with the similarities in taken-for-granted assumptions we make about each other based on the flimsiest of detail.0001 (work and school). Ironically.e. they are unable to describe a pap smear or a mammogram.
Babbie.K. The practice of social research. American Journal of Community Psychology 26:307–334. the study by Hershberger and D’Augelli10 depended upon responses to questionnaires mailed to youth groups throughout North America. pp. Berkeley. D. Muehrer. Sexual preference: its development in men and women. 4th ed. 10. A. French. those who might question the reliability of our participants’ sex are asked to reflect upon the unquestioned assumptions made about the demographic characteristics of telephone survey and mailed questionnaire respondents. Berkeley. (1998). Developmental Psychology 31:65–74.R. ed. Bruce Center. D. In: Proceedings of 18. Information architecture for the World Wide Web.. Bell... Statistical abstract of the United States. & Weinberg. 6. T. Vaughn. All research is imperfect. & Hammersmith. Waldo. CA: Peachpit Press. M. (1995). (1994). C. CA: Osborne/McGraw-Hill. M. 4.. A. Dual attraction: understanding bisexuality. Bureau of the Census. Parker. The relationship between suicide risk and sexual orientation: results of a population-based study. & Stiles. Cambridge. REFERENCES 1. Elements of web design. 11. and sense making: a proposed theory for information design. J. 8.M. R. Story. Rosenfeld. The reliability of our participants’ sex is at least equal to these methodologies. Hershberger. and sexual orientation. A. 15. it is probably no more likely that one would do so than those who “pass” as members of the “other” sex in everyday life. & Finlay. P. New York: Simon and Schuster. 19. and bisexual young people: a structural model comparing rural university and urban samples. P. (1996). Suicidal behavior and gay-related stress among gay and bisexual male adolescents. M. Giudice. 7. (1998). & Rosario. (1998). Carole Bland. Jacobson.S.. Cambridge. 13. MA: O’Reilly. M. Boston: Wadsworth Publishing Company.. 35–58. Washington. (1995). gay. B. gay. Bell.. gender. Upper Saddle River. 
The impact of victimization on the mental health and suicidality of lesbian. . Journal of Sex Research 22:21–34. 12. (1986). DC: American Association of Suicidology. the 29th Annual Conference of the American Association of Suicidology. & Morville. pp.. 9.. (1998). NJ: Prentice Hall. Hunter. M.S. L. 5. (1990). 2.P. & D’Augelli. order. IN: Indiana University Press. Washington. G. Dervin. J. R. Weinberg. Web content and design. Harry. M. Deb Finstad and other members of the Research Program in Family Practice and Community Health provided invaluable feedback on a presentation of the methodological developments associated with the data reported here. For example. Journal of Interpersonal Violence 5: 295–300. Journal of Adolescent Research 9:498–508. Chaos. Williams. Sampling gay men. of Public Documents. New York: Oxford University Press. (1994). & D’Augelli. C. (1998). Remafedi. Although it is possible to deceive us. S. Bloomington. 3rd ed. & Pryor.S. (1978). B. E. DiNucci. Gary Remafedi.W.M. Mathy. This paper has introduced several new rigorous methodologies for sampling via the Internet. Ours is not an exception. (1998).. Parasuicide among lesbians: research neglect and familial abuse. Antecedents and consequences of victimization of lesbian. 17. S. U. 14.. Joseph Harry.. A. (1981). et al. Two of the authors of this paper have decades of experience conducting research on issues related to sex. 16. New York: MIS Press.J. Violence against lesbian and gay male youths. Agresti.R.P. and bisexual youths.. and Ritch Savin-Williams have provided invaluable feedback on the Research Project on Sexual Orientation and Families of Origin. K. the methodology has promise for increasing our knowledge about difficult-to-reach and empirically underrepresented populations. In: R. M. 95–97. Suicide and sexual orientation: a critical summary of recent research and directions for future research. (1998). 
ACKNOWLEDGMENTS This research was supported in part by an NIMH supplemental Research Grant to R. (1999). 3.W..S.M. Suicide and Life-Threatening Behavior 25:72–81. Multimedia: making it work. MA: MIT Press. the methodology tested here is significantly more robust than much larger random samples drawn from the general population. However..R. Homosexualities: a study of diversity among men and women. Hunter... (1997). Hesson-McInnis. M. DC: Supt. In essence. J. 118th ed.. A. (1997). Information design.L.J. on which the data for this article are based. Weinberg. Statistical methods for the social sciences.C. American Journal of Public Health 88:57–60. 8th ed. Rotheram-Borus. 2nd ed. Therefore. S.
J. (1998). Conducting on-line focus groups: a methodological discussion. Poverty in the United States. 21.edu . T. Social Science Computer Review 15:135–144. MATHY ET AL. Government Printing Office. Paul.J. Washington. Address reprint requests to: Dr. D. Dallaker. M. United States Bureau of Census. Current Population Reports.C. (1997). Series P60-207.266 20. MN 55108 E-mail: math5577@. Robin M.umn.: U. Mathy School of Social Work/105 Peters Hall 1404 Gortner Avenue St. & Naifer.S. Gaiser.