Study Design in Medical Research
Part 2 of a Series on the Evaluation of Scientific Publications

Bernd Röhrig, Jean-Baptist du Prel, Maria Blettner

Background: The scientific value and informativeness of a medical study are determined to a major extent by the study design. Errors in study design cannot be corrected afterwards. Various aspects of study design are discussed in this article.

Methods: Six essential considerations in the planning and evaluation of medical research studies are presented and discussed in the light of selected scientific articles from the international literature as well as the authors' own scientific expertise with regard to study design.

Results: The six main considerations for study design are the question to be answered, the study population, the unit of analysis, the type of study, the measuring technique, and the calculation of sample size.

Conclusions: This article is intended to give the reader guidance in evaluating the design of studies in medical research. This should enable the reader to categorize medical studies better and to assess their scientific quality more accurately.

Dtsch Arztebl Int 2009; 106(11): 184–9
DOI: 10.3238/arztebl.2009.0184

Key words: study design, quality, study, study type, measuring technique

Medical research studies can be split into five phases: planning, performance, documentation, analysis, and publication (1, 2). Aside from financial, organizational, logistical and personnel questions, scientific study design is the most important aspect of study planning. The significance of study design for subsequent quality, the reliability of the conclusions, and the ability to publish a study are often underestimated (1). Long before the volunteers are recruited, the study design has set the course for fulfilling the study objectives. In contrast to errors in the statistical evaluation, errors in design cannot be corrected after the study has been completed. This is why the study design must be laid down carefully before starting and specified in the study protocol.

The term "study design" is not used consistently in the scientific literature. The term is often restricted to the use of a suitable type of study. However, the term can also mean the overall plan for all procedures involved in the study. If a study is properly planned, the factors which distort or bias the result of a test procedure can be minimized (3, 4). We will use the term in a comprehensive sense in the present article. On the basis of selected articles from the international literature and our own expertise, the article deals with the following six aspects of study design: the question to be answered, the study population, the type of study, the unit of analysis, the measuring technique, and the calculation of sample size. This is intended to help the reader to classify and evaluate the results in publications. Those who plan to perform their own studies must occupy themselves intensively with the issue of study design.

Institut für Medizinische Biometrie, Epidemiologie und Informatik (IMBEI), Johannes Gutenberg-Universität Mainz: Dr. rer. nat. Röhrig, Prof. Dr. rer. nat. Maria Blettner
Zentrum Präventive Pädiatrie, Zentrum für Kinder- und Jugendmedizin, Johannes Gutenberg-Universität Mainz: Dr. med. du Prel, M.P.H
Deutsches Ärzteblatt International, Dtsch Arztebl Int 2009; 106(11): 184–9

Question to be answered
The question to be answered by the research is of decisive importance for study planning. The research worker must be clear about the objectives and must think very carefully about the question(s) to be answered by the study. This question must be operationalized, meaning that it must be converted into a measurable and evaluable form. This demands an adequate design and suitable measurement parameters. A distinction must be made between the main questions to be answered and secondary questions. The result of the study should be that open questions are answered
and possibly that new hypotheses are generated. The following questions are important: Why? Who? What? How? When? Where? How many? The question to be answered also implies the target group and should therefore be very precisely formulated. For example, the question should not be "What is the quality of life?", but must specify the group of patients (e.g. age), the area (e.g. Germany), the disease (e.g. mammary carcinoma), the condition (e.g. tumor stage 3), perhaps also the intervention (e.g. after surgery), and what endpoint (in this case, quality of life) is to be determined with which method (e.g. the EORTC QLQ-C30 questionnaire) at what point in time. Scientific questions are often not purely descriptive, but also include comparisons, for example, between two groups, or before and after the intervention. For example, it may be interesting to compare the quality of life of breast cancer patients with women of the same age without cancer.

The research worker specifies the question to be answered, and whether the study is to be evaluated in a descriptive, exploratory or confirmatory manner. Whereas in a descriptive study the units of analysis are to be described by the recorded variables (e.g. blood parameters or diagnosis), the aim in an exploratory analysis is to recognize connections between variables, to evaluate these and to formulate new hypotheses. On the other hand, confirmatory analyses are planned to provide statistical proofs by testing specified study hypotheses.

The question to be answered also determines the type and extent of the data to be recorded. This specifies which data are to be recorded at which point in time. In this case, less is often more. Data irrelevant to the question(s) to be answered should not be collected for the moment. If too many variables are recorded at too many time points, this can lead to low participation rates, high dropout rates, and poor compliance from the volunteers. The experience is then that not all data are evaluated.

The question to be answered and the strategy for evaluation must be specified in the study protocol before the study is started.

Study population
The question to be answered by the study implies that there is a target group for whom this is to be clarified. Nevertheless, the research worker is not primarily interested in the observed study population, but in whether the results can be transferred to the target population. Accordingly, statistical test procedures must be used to generalize the results from the sample for the whole population (figure 1).

FIGURE 1: Connection between overall population and study population/data

The sample can be highly representative of the study population if it is properly selected. This can be attained with defined and selective inclusion and exclusion criteria, such as sex, age, and tumor stage. Study participants may be selected randomly, for example, by random selection through the residents' registration office, or consecutively, for example, all patients in a clinical department in the course of one year.

With a selective sample, a statement can only be made about a population corresponding to these selection criteria. The possibility of generalizing the results may, for example, be greatly influenced by whether the patients come from a specialist practice, a specialized hospital department or from several different practices.

The possibility of generalization may also be influenced by the decision to perform the study at a single institution or site, or at several (multicenter study). The advantages of a multicenter study are that the required number of patients can be reached within a shorter period and that the results can more readily be generalized, as they are from different treatment centers. This raises the external validity.

Type of study
Before the study type is specified, the research worker must be clear about the category of research. There is a distinction in principle between research on primary data and research on secondary data.

Research on primary data means performing the actual scientific studies, recording the primary study data. This is intended to answer scientific questions and to gain new knowledge.

In contrast, research on secondary data involves the analysis of studies which have already been performed and published. This may include (renewed) analysis of recorded data, perhaps from a register, or from population statistics. Another objective may be to gain a comprehensive overview of the current state of research and to come to appropriate conclusions. In secondary data research, a distinction is made between narrative reviews, systematic reviews, and meta-analyses.

The underlying question to be answered also influences the selection of the type of study. In primary research, experimental, clinical and epidemiological research are distinguished.

Experimental research includes applied studies, such as animal experiments, cell studies, biochemical and physiological investigations, and studies on
material properties, as well as the development of analytical and biometric procedures.

Clinical research includes interventional and noninterventional studies. The objective of interventional clinical studies (clinical trials) is "to study or demonstrate the clinical or pharmacological activities of drugs" and "to provide convincing evidence of the safety or efficacy of drugs" (AMG, German Drugs Act §4) (5). In clinical trials, patients are randomly assigned to treatment groups. In contrast, noninterventional clinical studies are observational studies, in which patients are given an individually specified treatment (6, 7).

Epidemiological research studies the distribution and changes with time of the frequency of diseases and of their causes. Experimental studies are distinguished from observational studies (7, 8). Interventional studies (such as vaccination, addition of food additives, fluoride addition to drinking water) are of experimental character. Examples of observational epidemiological studies include cohort studies, case control studies, cross-sectional studies, and ecological studies.

A subsequent article will discuss the different study types in detail.

Unit of analysis
The unit of analysis (investigational unit) must be specified before starting a medical study. In a typical clinical study, the patient is the unit of analysis. However, the unit of analysis may also be a technical model, hereditary information, a cell, a cellular structure, an organ, an organ system, a single test individual (animal or man), a specified subgroup, or the population of a region or of a country. In systematic reviews, the unit of analysis is a single study. The sample then includes the total of all units of analysis. The information or data of interest (observations, variables, characteristics) are recorded for the statistical units. For example, if the heart is being investigated in a patient (the unit of analysis), the heart rate may be measured as a characteristic of performance.

The selection of the unit of analysis influences the interpretation of the study results. It is therefore important for statistical reasons to know whether the units of analysis are dependent or independent of each other with respect to the outcome parameter. This distinction is not always easy. For example, if the teeth of test persons are the unit of analysis, it must be clarified whether these are independent with respect to the question to be answered (i.e. from different test persons) or dependent (i.e. from the same test person). Teeth in the mouth of a single test person are generally dependent, as specific factors, such as nutrition and teeth cleaning habits, act on all teeth in the mouth in the same way. On the other hand, extracted teeth are generally independent study objects, as there are no longer any shared factors which influence them. This is particularly the case when the teeth are subject to additional preparation, for example, cutting or grinding. On the other hand, if the observations are on tooth characteristics developed before extraction, these characteristics must be regarded as dependent.

Measuring technique
The term "measuring technique" includes the use of measuring instruments and the method of measurement.

Use of measuring instruments
Measuring instruments include instruments which specifically record measuring data (such as blood pressure or laboratory parameters), as well as data collection with standardized or self-designed questionnaires (for example, quality of life, depression, satisfaction).

During the validation of a measuring instrument, its quality and practicability are evaluated using statistical parameters. Unfortunately, the nomenclature is not fully standardized and also depends on the special area (for example, chemical analysis, psychological studies with questionnaires, or diagnostic studies). It is always the case that a measuring instrument of high quality should be of high precision and validity.

Precision describes the extent to which a measuring technique consistently provides the same results if the measurement is repeated (9). The reliability (or precision) provides information on the precision or the occurrence of random errors. If the precision is low, the correlation coefficients are low, measurements are imprecise and a larger sample size is needed (9).

FIGURE 2: Portrayal of the terms reliability (precision) and validity (trueness), using a target model

On the other hand, the validity (accuracy of the mean or trueness) of a measuring instrument is high if it measures exactly what it is supposed to measure. Thus the
validity provides information on the occurrence of systematic errors (10). The problem is not only that the measurements may be invalid or false, but also that the measurements may lead to erroneous conclusions. Whereas the precision describes the difference (variance) between repeated measurements, the validity reflects the difference between the measured and true parameter (10). Figure 2 portrays the terms, using a target as a model.

Reliability and validity are subsumed in the term accuracy (11, 12). The accuracy is only high when both the precision and the validity are high. Table 1 summarizes the important terms to validate a measurement method.

TABLE 1: Summary of important terms to validate a measurement method

Term          Concept
Accuracy      Accuracy
Reliability   Precision
Validity      Trueness (accuracy of the mean)

External and internal validity can be distinguished (13). External validity means the possibility of generalizing the study results for the study population to the target population. The internal validity is the validity of a result for the actual question to be answered. This can be optimized by detailed planning, defined inclusion and exclusion criteria, and reduction of external interfering factors.

Measurement plan
The measurement plan describes the number and time points of the measurements to be performed. To obtain comparable and objective measurements, the measurement conditions must be standardized. For example, clinical study measurements such as blood pressure must always be performed at the same time, in the same room, in the same position, with the same instrument, and by the same person. If there are differences, for example in the investigator, measuring instrument, analytical laboratory or recording time, it must be established that the measurements are in agreement (10, 12).

The type of scale used for the recorded parameter is also of decisive importance. Putting it simply, metric scales are superior to ordinal scales, which are superior to nominal scales. The type of scale is so important because both descriptive statistics and statistical test procedures depend on it. Transformation from a higher to a lower scale type is in principle possible, although the converse is impossible. For example, the hemoglobin content may be determined with a metric scale (e.g. g/dL). It can then be transformed to an ordinal scale (e.g. low, normal and high hemoglobin status), but not conversely.

Calculation of sample size
Whatever the study design, a calculation must be performed before the start of the study to estimate the necessary number of units of analysis (for example, patients) to answer the main study question (14–16). This requires calculation of sample size, exploiting knowledge of the expected effect (for example, the clinically relevant difference) and its scatter (for example, standard deviation). These may be determined in preliminary studies or from published information.

It is generally true that a large sample is required to discover a small difference. The sample must also be large if the scatter of the outcome parameter is large in the study groups.

Sample size planning helps to ensure that the study is large enough, but not excessively large. The sample size is often restricted by the available time and/or by the budget. This is not in accordance with good scientific practice. If the sample is small, the power will also be low, bringing the risk that real differences will not be identified (16, 17). There are both ethical problems (stress to patients, possibly random allocation of therapy) and economic problems (financial, structural, and with regard to personnel) which make it difficult to justify a study which is either too large or not large enough (16–19). The research worker has to consider whether alternative procedures might be possible, such as increasing the time available, the personnel or the funding, or whether a multicenter study should be performed in collaboration with colleagues.

Discussion
Planning, performance, documentation, analysis, and publication are the component parts of medical studies (1, 2). Study design is of decisive importance in planning. This not only lays down the statistical analysis, but also ultimately the reliability of the conclusions and the significance and implementation of the study results (2). A six point checklist can be used for the rapid evaluation of the study design (table 2).

According to Sackett, about two thirds of 56 typical errors in studies are connected to errors in design and performance (20). This cannot be corrected once the data have been collected. This makes the study less convincing. As a consequence, the design must be precisely planned before starting the study and this must be laid down in the study protocol. This requires a great deal of time.

In the final analysis, studies with poor design are unethical. Test persons (or animals) are subjected to unnecessary stress and research capacity is wasted (21, 22). Medical studies must consider both individual ethics (protection of the individual) and collective ethics (benefit for society) (22). The size of medical studies is often too small, so that the power is also too small (23). For this reason, a real difference, for example between the activity of two therapies, is either unidentified or only described imprecisely (24). Low power is the result if the study is too small, the
difference between the study groups is too small, or the scatter of the measurements is too great. Sterne demands that the quality of studies should be increased by increasing their size and increasing the precision of measurement (25). On the other hand, if the study is too large, unnecessarily many test persons (or animals) are exposed to stress and resources (such as personnel or financial resources) are wasted. It is therefore necessary to evaluate the feasibility of a study during the planning phase by calculating the sample size. It may be necessary to take suitable measures to ensure that the power is adequate. The excuse that there is not enough time or money is misplaced. The power may be increased by reducing the heterogeneity, improving measurement precision, or by cooperation in multicenter studies. Much more new knowledge is won from a single accurately performed, well designed study of adequate size than from several inadequate studies.

Only adequately planned studies give results which can be published in high quality journals. Planning errors and inadequacies can no longer be corrected once the study has been completed. It is therefore advisable to consult an experienced biometrician during the planning phase of the study (1, 18).

TABLE 2: Checklist to evaluate study design
(each item is followed by the content/information to check)

Question to be answered
❃ Is the question clearly defined?

Study population
❃ Information on
– recruitment (type, area, time)
– sociodemographic information on test persons (for example, age, sex, illness)
– inclusion and exclusion criteria
– period of follow-up observation

Type of study
❃ Research on secondary data
❃ Research on primary data (actual trials)
– Experimental studies
– Clinical studies
– Epidemiological studies

Unit of observation
❃ Technical model (for example, a prosthesis, material in dentistry)
❃ Hereditary information
❃ Cell
❃ Cell system (for example, a blood sample)
❃ Organ (for example, heart or lung)
❃ Organ system (for example, cardiovascular system)
❃ Single test subject (animal or man)
❃ Selected patient group (for example, hospital group, risk group)
❃ Population (for example, from a region)

Measuring technique
❃ Use of measuring instruments (=validation)
– Reliability
– Validity
❃ Measurement plan
– Time points
– Number of investigators
– Standardization of measurement conditions
– Type of scale

Calculation of sample size
❃ Was the sample size calculated?
❃ If yes, what were the conditions?
– Type of test
– Level of significance
– Power
– Clinically relevant difference
– Scatter/variance

Conflict of interest statement
The authors declare that no conflict of interest exists according to the guidelines of the International Committee of Medical Journal Editors.

Manuscript received on 30 November 2007, revised version accepted on 8 February 2008.

Translated from the original German by Rodney A. Yeates, M.A., Ph.D.

REFERENCES
1. Altman DG, Gore SM, Gardner MJ, Pocock SJ: Statistical guidelines for contributors to medical journals. BMJ 1983; 286: 1489–
2. Schäfer H, Berger J, Biebler K-E et al.: Empfehlungen für die Erstellung von Studienprotokollen (Studienplänen) für klinische Studien. Informatik, Biometrie und Epidemiologie in Medizin und Biologie 1999; 30: 141–54.
3. Altman DG, Machin D, Bryant TN, Gardner MJ: Statistics with confidence. 2nd edition. Bristol: BMJ Books 2000; 173.
4. Schumacher M, Schulgen G: Methodik klinischer Studien. Methodische Grundlagen der Planung, Durchführung und Auswertung. 2. Aufl., Berlin, Heidelberg, New York: Springer 2007; 1–28.
5. Moher D, Schulz KF, Altman D, for the CONSORT Group: The CONSORT Statement: Revised Recommendations for Improving the Quality of Reports of Parallel-Group Randomized Trials. Ann Intern Med 2001; 134: 657–62.
6. DocCheck Flexikon: Thema: Studiendesign. http://flexikon.doccheck.
7. Beaglehole R, Bonita R, Kjellström T: Einführung in die Epidemiologie. Bern: Verlag Hans Huber 1997; 53–84.
8. Fletcher RH, Fletcher SW, Wagner EH, Haerting J: Klinische Epidemiologie. Grundlagen und Anwendung. Bern: Verlag Hans Huber 2007; 1–24 und 349–78.
9. Fleiss JL: The design and analysis of clinical experiments. New York: John Wiley & Sons 1986: 1–32.
10. Hüttner M, Schwarting U: Grundzüge der Marktforschung. 7. Aufl., München: Oldenbourg Verlag 2002; 1–600.
11. Brüggemann L: Bewertung von Richtigkeit und Präzision bei Analysenverfahren. GIT Labor-Fachzeitschrift 2002; 2: 153–6.
12. Funk W, Dammann V, Donnevert G: Qualitätssicherung in der Analytischen Chemie: Anwendungen in der Umwelt-, Lebensmittel- und Werkstoffanalytik, Biotechnologie und Medizintechnik. 2. Aufl., Weinheim, New York: Wiley-VCH 2005; 1–100.
13. Lienert GA, Raatz U: Testaufbau und Testanalyse. 6. Aufl., Weinheim: Psychologie Verlags Union 1998; 220–71.
14. Altman DG: Practical Statistics for Medical research. London: Chapman and Hall 1991; 1–9.
15. Machin D, Campbell MJ, Fayers PM, Pinol APY: Sample Size Tables for Clinical Studies. 2. Aufl., Oxford, London, Berlin: Blackwell Science Ltd. 1987: 296–9.
16. Eng J: Sample size estimation: how many individuals should be studied? Radiology 2003; 227: 309–13.
17. Halpern SD, Karlawish JHT, Berlin JA: The continuing unethical conduct of underpowered clinical trials. JAMA 2002; 288: 358–62.
18. Krummenauer F, Kauczor H-U: Fallzahlplanung in referenzkontrollierten Diagnosestudien. Fortschr Röntgenstr 2002; 174: 1438–44.
JAMA 1994. BMJ 1980. 22. Altman DG: Statistics and ethics in medical research. sample size. H: Signifikanz. Reha- bilitation 2004. 21. Effektstärke und Konfidenzintervall. Palmer CR: Ethics and statistical methodology in clinical trials. 1: 23–9. Wells GA: Statistical power. 272: 122–4. Sterne JAC. Dulberg CS. Faller. 20. and their reporting in randomized controlled trials. 19: 219–22. 23. Referat Rehabilitation/Biometrie Albiger Straße 19 d 55232 Deutsches Ärzteblatt International Dtsch Arztebl Int 2009. 25. Corresponding author Dr. Moher D. 32: 51–63. May WW: The composition and function of ethical committees. J Chronic Dis 1979. 43: 174–8. misuse of statistics is unethical. MEDICINE 19. rer. nat. JME 1993. Smith GD: Sifting the evidence—what's wrong with significance tests? BMJ 2001. 24. Bernd Röhrig MDK Rheinland-Pfalz. 281: 1182–4. Sackett DL: Bias in analytic research. 106(11): 184–9 189 . Germany Bernd. 322: 226–31.Roehrig@mdk-rlp. J Med Ethics 1975.