STATISTICS AND RESEARCH DESIGN: DOES SIMPLE RANDOMIZATION AFFECT GROUP BALANCE

STATISTICS AND RESEARCH DESIGN
Simple randomization may lead to unequal group sizes. Is that a problem?
Wendel Minoru Shibasakiᵃ and Renato Parsekian Martinsᵇ
Araraquara, São Paulo, Brazil
ᵃ Graduate Program of Orthodontics, Faculdade de Odontologia de Araraquara, Universidade Estadual Paulista, Araraquara, São Paulo, Brazil.
ᵇ Private Practice and Graduate Program of Orthodontics, Faculdade de Odontologia de Araraquara, Universidade Estadual Paulista, Araraquara, São Paulo, Brazil.
All authors have completed and submitted the ICMJE Form for Disclosure of Potential Conflicts of Interest, and none were reported.
Am J Orthod Dentofacial Orthop 2018;154:600-5
0889-5406/$36.00
© 2018 by the American Association of Orthodontists. All rights reserved.
https://doi.org/10.1016/j.ajodo.2018.07.005

It is virtually impossible to control all the factors that may confound results in research. Factors unknown to the researcher can create differences between groups that are not related to the primary factor under analysis. Randomization is perhaps one of the best ways to try to overcome such unknown factors.1 It homogenizes groups so that each specimen, participant, or intervention has the same chance of being allocated to each experimental group. Therefore, it is important to include randomization in experimental designs of orthodontic studies when possible.
Imagine a research project where 2 adhesive systems from 2 manufacturers are being compared for shear bond strength. The researcher may collect data of the adhesive system from manufacturer A in the morning and from manufacturer B in the afternoon. It is possible that the researcher might be tired toward the end of the day and therefore could, unconsciously, be less precise when collecting data. Thus, if differences were found between the groups, would those be attributed only to the adhesive system or to the researcher's fatigue as well?
Similarly, imagine a clinical study aiming to compare the bracket debonding rates using 2 composites. To compare them in the same patient in a split-mouth design, the researcher could bond the right-side brackets with composite A and the left-side ones with composite B. Because the orthodontist might have a different perspective between the patient's right and left sides, the bonding quality could be slightly different between the sides, and this may result in differences that are not really attributed to the composites. This could undermine the results because it is expected that differences between 2 interventions are explained only by the factors being evaluated.
In these 2 examples, one might think that the biases mentioned could be controlled if the adhesive systems (in the in-vitro test) or composite types (in the in-vivo test) were evaluated or applied in an alternate manner. Even though this idea may seem interesting, it could also lead to selection bias: eg, if the operator knows which adhesive is being evaluated, a systematic bias might be introduced. Besides eliminating biases and giving all specimens the same chance of allocation to any of the groups, randomization is a fundamental premise that justifies most of the statistical procedures used in data analysis.2
Thus, any attempt at randomization that has a logical pattern of allocation and that deviates from pure chance (eg, chart identification numbers, day of the week, date of birth) can possibly, even if unconsciously, introduce unknown factors into the groups being studied. This could confound the intervention investigated and compromise the research results.
In orthodontics, these problems are usually circumvented by generating random numbers that do not follow any pattern by a method called simple randomization. Specific software or functions, such as RANDBETWEEN (Excel; Microsoft, Redmond, Wash), are normally used to produce a list of random numbers that leave all expected or unforeseen biases to chance alone.
Nevertheless, simple randomization has a significant problem in studies with small sample sizes, because there is a high probability that the groups will be in a state of imbalance3; in most of the current orthodontic literature, sample sizes are usually small. Orthodontic randomized controlled trials published from 1992 through 2012 were shown to have a median sample size of 46 to obtain adequate power, and a median sample size of 60 was necessary to produce results.4 However, the percentages of these studies with groups in a state of imbalance, or whether they used any methods to restrict randomization, decreasing the chance of producing groups in a state of imbalance, are unknown. This imbalance can produce differences in distribution and variances, decreasing the statistical power.5,6
One way to overcome this problem is to use block randomization, which produces equal numbers of
specimens in each group. In such a method, the specimens are distributed into blocks of multiple numbers related to the number of groups under study, containing all possible combinations of allocation but maintaining a 1:1 balance. Thus, in the aforementioned clinical research example, when determining which composite, A or B, will be tested on the right or left side of 24 patients in a split-mouth design, 4 allocations are arranged into 6 blocks (AABB, BBAA, BABA, ABAB, ABBA, and BAAB). These blocks will then undergo simple randomization to determine the sequence in which they will be applied until all patients are bonded. The investigator must be careful, though, when using blocks of the same size, which are easier to manage, because that could lead to a prediction of which treatment will be allocated next. Different block sizes can be used to overcome that issue.
Therefore, the aim of this article was to determine when block or simple randomization is necessary, based on the probability of imbalance between groups and on its influence on the statistical power.

MATERIAL AND METHODS
Four hypothetical research designs were analyzed, varying the numbers of subjects (20, 30, 60, and 90) allocated into 2 groups, using an independent 2-tailed t test, with α = 0.05, by simple randomization using the RANDBETWEEN function of the Excel 2011 software.
A total of 100 allocation simulations were made for each research design to describe the differences between specimens in the groups, their frequencies, and the balance ratios. Statistical power was also calculated for each balance ratio using the G*Power software (http://www.gpower.hhu.de/en.html).7 The effect sizes were varied from small and medium to large using Cohen's d (d = 0.2, d = 0.5, and d = 0.8, respectively)8 and the aforementioned parameters.
The values obtained with block randomization simulations, with a fixed equilibrium ratio of 1:1, were compared with those using simple randomization.

RESULTS
When simulating with 20 specimens, 17% of the simulations showed a ratio of 1:1 between the samples. On the other hand, 70% of the simulations showed imbalances from 1.2:1 to 1.9:1, which caused a maximum reduction of 10% of the test power to evaluate large effects, but less than 1% when medium or small effects were evaluated. The remaining 13% of the simulations had larger imbalances (from 2.3:1 to 5.7:1), with gradually decreasing test power. This decrease was marked especially when a large effect was used in the simulation, causing a 17% decrease (Table I). The decrease in test power with imbalanced samples occurred in all effect sizes, with greater reductions for larger effect sizes. In block randomization, all groups have a 1:1 equilibrium rate; thus, the test power will be 40% with large effect sizes, 19% with medium effect sizes, and 7% with small effect sizes.
Only 8% of the simple randomization simulations with 30 specimens showed balanced groups (1:1). In the remaining simulations, imbalance ranged from 1.1:1 to 2.3:1, with maximum decreases of 7% in the test power with a large effect size, 3% with a medium effect size, and less than 1% with a small effect size. In all effect sizes, the test power dropped, but the drop was more pronounced when a large effect size was used. As in the previous simulation, block randomization groups had a 1:1 equilibrium, but the test power was higher for these comparisons. Test powers were 56%, 26%, and 8% for large, medium, and small effect sizes, respectively (Table II).
With 60 specimens, only 10% of the simulations had balanced groups, which resulted in an 86% power in a test with a large effect size. In the remaining simulations of large effect size, test power was above 80% even when the imbalance was 2.2:1. When a medium effect size was used, the maximum decrease of the test power was 5% (from 1:1 to 2.2:1); in small effect sizes, the test power decreased by only 1%. In the block randomization, the groups had a 1:1 equilibrium rate, and the test power values were 86%, 48%, and 12% with large, medium, and small effect sizes, respectively (Table III).
When the sample size was 90, 4% of the simulations produced balanced groups, whereas 15% of the groups were similar (1:1) "on average." In the other 77% of the simulations, where imbalances ranged from 1.1:1 to 1.6:1, the test power (96%) did not change for the large effect size, but it fell by 2% and 1% for the medium and small effect sizes, respectively. The largest imbalance (1.8:1) had a frequency of 3% and was responsible for decreasing the test power by 1%, 4%, and 1% in large, medium, and small effect sizes, respectively. In block randomization, the groups had a 1:1 equilibrium, and the test power values were 96%, 65%, and 16% for the large, medium, and small effect sizes, respectively (Table IV).
The decreases in test power became smaller as the sample size increased, and the effect sizes were larger.
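The allocation procedure described in the Material and Methods can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the article used the RANDBETWEEN function in Excel and does not say exactly how it was applied, so the sketch assumes each subject was assigned independently to group 1 or 2 with equal probability. The function names, the seed, and the rounding of ratios to 1 decimal place are hypothetical choices, and the counts it prints will not match Tables I to IV exactly because they depend on the random draws.

```python
import random
from collections import Counter


def simple_randomization(n_subjects, rng):
    """Assign each subject to group 1 or 2 independently (assumed analog of RANDBETWEEN(1, 2))."""
    return [rng.randint(1, 2) for _ in range(n_subjects)]


def imbalance_ratio(allocation):
    """Size of the larger group divided by the smaller one (inf if one group is empty)."""
    n1 = allocation.count(1)
    n2 = len(allocation) - n1
    small, large = sorted((n1, n2))
    return float("inf") if small == 0 else large / small


def tally_ratios(n_subjects, n_simulations=100, seed=2018):
    """Count how often each imbalance ratio (rounded to 1 decimal) occurs across simulations."""
    rng = random.Random(seed)  # hypothetical fixed seed, used only for repeatability
    ratios = [
        round(imbalance_ratio(simple_randomization(n_subjects, rng)), 1)
        for _ in range(n_simulations)
    ]
    return Counter(ratios)


for n in (20, 30, 60, 90):
    counts = tally_ratios(n)
    near_balanced = counts.get(1.0, 0)
    print(f"n={n}: {near_balanced} of 100 runs rounded to 1.0:1, largest ratio {max(counts)}:1")
```

Rounding to 1 decimal place groups near-equal splits such as 46:44 with exact 1:1 splits, which mirrors the "similar (1:1) on average" category reported for 90 subjects.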
American Journal of Orthodontics and Dentofacial Orthopedics October 2018 Vol 154 Issue 4
Table I. Results from 100 simulations of the allocation of 20 patients into 2 groups by simple randomization
                                                    Power (%)
G1   G2   Imbalance   Imbalance ratio   Frequency   d = 0.8   d = 0.5   d = 0.2
10   10               1:1               17%         40        19        7
11   9    0.8         1.2:1             25%         39        18        7
12   8    0.7         1.5:1             25%         38        18        7
13   7    0.5         1.9:1             20%         37        17        7
14   6    0.4         2.3:1             6%          34        16        7
15   5    0.3         3:1               4%          31        15        7
16   4    0.3         4:1               2%          27        14        6
17   3    0.2         5.7:1             1%          23        12        6
G, Group.
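The observed frequencies in Table I can be compared with what theory predicts. As a sketch, assuming each patient is assigned to either group independently with probability 1/2 (an assumption about how the simple randomization was carried out), the number of patients in group 1 follows a binomial distribution, so the chance of any given split can be computed exactly. The function names are hypothetical.

```python
from math import comb


def split_probability(n_subjects, n_group1):
    """Probability that exactly n_group1 of n_subjects land in group 1 (fair coin per subject)."""
    return comb(n_subjects, n_group1) / 2 ** n_subjects


def balanced_probability(n_subjects):
    """Probability of an exact 1:1 split; n_subjects is assumed to be even."""
    return split_probability(n_subjects, n_subjects // 2)


for n in (20, 30, 60, 90):
    print(f"n={n}: P(exact 1:1 split) = {balanced_probability(n):.3f}")
```

For 20 patients this gives a probability of about 0.176 for an exact 10:10 split, close to the 17% observed in the 100 simulations. The probability of exact balance keeps falling as the sample grows, even though large imbalance ratios become rarer.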
Table II. Results from 100 simulations of the allocation of 30 patients into 2 groups by simple randomization
                                                    Power (%)
G1   G2   Imbalance   Imbalance ratio   Frequency   d = 0.8   d = 0.5   d = 0.2
15   15               1:1               8%          56        26        8
16   14   0.9         1.1:1             30%         56        26        8
17   13   0.8         1.3:1             18%         55        26        8
18   12   0.7         1.5:1             19%         54        25        8
19   11   0.6         1.7:1             12%         53        25        8
20   10   0.5         2:1               9%          51        24        8
21   9    0.4         2.3:1             4%          49        23        8
G, Group.
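The way power falls with imbalance in these tables can be illustrated with a rough calculation. The sketch below is not the G*Power computation used in the article, which is based on the noncentral t distribution. It substitutes a normal approximation for the independent 2-tailed t test, which for small groups tends to read a few percentage points higher, so its values should be compared with the tables only in their trend. The function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(n1, n2, d, alpha=0.05):
    """Normal-approximation power of a 2-tailed, 2-sample test for Cohen's d (not G*Power's exact method)."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    # The term n1 * n2 / (n1 + n2) is largest when the groups are equal for a fixed total.
    shift = d * sqrt(n1 * n2 / (n1 + n2))
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


for n1, n2 in ((15, 15), (18, 12), (21, 9)):
    print(f"{n1}:{n2} -> approximate power {approx_power(n1, n2, 0.8):.2f}")
```

The same function also shows why total sample size matters more than balance: with a large effect size, `approx_power(26, 23, 0.8)` is higher than `approx_power(23, 23, 0.8)`, because discarding subjects to restore a 1:1 ratio shrinks the term `n1 * n2 / (n1 + n2)`.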
Table III. Results from 100 simulations of the allocation of 60 patients into 2 groups by simple randomization
                                                    Power (%)
G1   G2   Imbalance   Imbalance ratio   Frequency   d = 0.8   d = 0.5   d = 0.2
30   30               1:1               10%         86        48        12
31   29   0.9         1.1:1             14%         86        48        12
33   27   0.8         1.2:1             18%         86        47        12
34   26   0.8         1.3:1             23%         86        47        12
35   25   0.7         1.4:1             7%          85        47        12
36   24   0.7         1.5:1             7%          85        46        12
37   23   0.6         1.6:1             3%          84        46        11
39   21   0.5         1.9:1             4%          83        44        11
41   19   0.5         2.2:1             1%          81        43        11
G, Group.
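Each table is compared against block randomization, which holds the groups at 1:1. A minimal sketch of the permuted-block scheme described in the introduction follows, assuming blocks of 4 with 2 arms (A and B) and 24 patients. The function names and the seed are hypothetical, and a real trial would normally also vary the block size to keep the next allocation unpredictable, as the article notes.

```python
import random
from itertools import permutations


def balanced_blocks(block_size=4, arms=("A", "B")):
    """All distinct orderings of a block that holds each arm equally often."""
    per_arm = block_size // len(arms)  # block_size is assumed to be a multiple of the arm count
    template = [arm for arm in arms for _ in range(per_arm)]
    return sorted({"".join(p) for p in permutations(template)})


def block_randomization(n_subjects, block_size=4, seed=None):
    """Draw whole blocks at random until n_subjects allocations are available."""
    rng = random.Random(seed)
    blocks = balanced_blocks(block_size)
    sequence = []
    while len(sequence) < n_subjects:
        sequence.extend(rng.choice(blocks))
    # Exact balance is guaranteed only when n_subjects is a multiple of block_size.
    return sequence[:n_subjects]


print(balanced_blocks())  # the 6 blocks of size 4: AABB, ABAB, ABBA, BAAB, BABA, BBAA
schedule = block_randomization(24, seed=1)
print("".join(schedule), schedule.count("A"), schedule.count("B"))
```

With 24 patients and blocks of 4, the schedule always contains 12 allocations to A and 12 to B, whatever the order of the blocks.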
Table IV. Results from 100 simulations of the allocation of 90 patients into 2 groups by simple randomization
                                                    Power (%)
G1   G2   Imbalance   Imbalance ratio   Frequency   d = 0.8   d = 0.5   d = 0.2
45   45               1:1               4%          96        65        16
46   44   1.0         1:1               15%         96        65        16
47   43   0.9         1.1:1             16%         96        65        16
48   42   0.9         1.1:1             13%         96        65        15
49   41   0.8         1.2:1             11%         96        65        15
50   40   0.8         1.3:1             11%         96        64        15
51   39   0.8         1.3:1             11%         96        64        15
52   38   0.7         1.4:1             3%          96        64        15
53   37   0.7         1.4:1             7%          96        64        15
54   36   0.7         1.5:1             2%          96        63        15
55   35   0.6         1.6:1             3%          96        63        15
56   34   0.6         1.6:1             1%          95        62        15
58   32   0.6         1.8:1             3%          95        61        15
G, Group.

DISCUSSION
The probability of groups being balanced after simple randomization is small, regardless of the sample size. This does not happen with block randomization, where the groups are balanced, as has already been shown in the literature.9 The problem with group imbalance is that the power of the test decreases, increasing the likelihood of a type II error, which is what our results showed.10 As the imbalance increases, regardless of sample size, the test power decreases. This may lead to the need for balanced groups to maintain a high power in the statistical tests. Thus, researchers may use a block type of randomization or erroneously try to balance the groups by improperly interfering with the simple randomization process. This improper balancing may occur due to a simple randomization imbalance or even by losing some specimens of 1 group. Let us analyze the following example: in a sample size calculation for an independent 2-tailed t test, with a large effect size (0.8), with the type I error probability set to 5% and the power set to 80% (type II error probability of 20%), the result will be a sample of 26 participants in each group. If 3 patients drop out of 1 group and 3 are removed from the other group (in an attempt to recover a 1:1 balance), 2 groups of 23 patients will be left. This would decrease the power of the test by 4.4%, to 75.6%. On the other hand, if the patient drop is not compensated for and the imbalance is maintained, resulting in 23 patients in 1 group and 26 in the other, the test power would only drop by 2%, to 78%, with a smaller decrease than when trying to balance the groups.
The innocent attempt to remove specimens to balance groups will reduce the test power because it is more sensitive to the overall sample size than to the balance between the groups (Fig). Although this procedure is not considered common, it happens more often than expected. A search conducted in the PubMed database on July 17, 2017, using the terms "orthodontic [All Fields] AND (Controlled Clinical Trial[ptyp] AND ("2016/07/17"[PDAT]: "2017/07/17"[PDAT]) AND "humans"[MeSH Terms] AND English [lang])" returned 30 studies. A more in-depth analysis of their methodologies showed that 15 articles applied the simple randomization method, and surprisingly, 10 of them obtained identical groups.

Fig. Graph depicting the decrease of test power in relation to the group imbalance in different sample sizes.

As the sample size increases, larger imbalance ratios are less frequent, and as mentioned, these large ratios will significantly affect the test power.10 In sample sizes larger than 60, the highest observed imbalance ratio was 2.2:1, reducing the test power by 5%, regardless of the effect size. If 80% test power is considered to be adequate for orthodontic research, simple randomization can be used safely in samples larger than 60 if the effect size is large, because the test power will not be lower than 80%. In small samples (n = 30) or experiments that intend to demonstrate small differences between groups, block randomization is suggested as a safe way to maintain balance between groups and to preserve the test power. If a simple randomization is used instead, not only would it produce a small test power (based on the experimental design itself), but it could also decrease the test power even further due to an almost certain imbalance between groups.
Even though a sample of 46 has been shown to be adequate to obtain enough power to produce results in the orthodontic literature,4 it is always wise to restrict randomization, decreasing the chance of producing imbalanced groups, because this sample size is considered too small to produce balanced results. In addition to block randomization, there are other methods to overcome some of the problems of simple random allocation: eg, stratification and minimization. They can be used when known factors could confound the results and need to be controlled, especially in small sample sizes. A good example may be the influence of sex on compliance. If women comply more in using their retainers, a study comparing relapse between 2 appliances may need to have a balance between the sexes within each group; otherwise, the outcome can be confounded.11
The first method, stratification, involves dividing the groups studied into subgroups, called strata, where
known factors that could confound the results can be controlled. Then the features of block randomization are used in each stratum to balance them; therefore, it is a combination of both methods. It is good for small sample sizes when imbalances of prognostic predictors are more likely to occur.1 However, when there are too many prognostic factors that could confound the results, the division of the sample into many strata will result in a large number of subgroups with fewer specimens, which could imbalance the treatment allocation. The alternative method for the problem of having several prognostic factors in small samples is minimization.12
Minimization works by randomizing the sample into strata dynamically during the experiment. In contrast to stratified randomization, where the specimens are allocated before the experiment, in minimization, the treatment allocated to the next participant enrolled in the trial depends on the characteristics of participants already enrolled.12 Besides being slightly more complicated to use, it is best performed with the help of software; the main issue with minimization is that it may be possible to predict the next allocation by looking at the past assignment.
Another issue with small sample sizes that should be discussed is the situation where the covariates may remain imbalanced even when the groups are numerically balanced. Normally, this will not occur in large samples where randomization was performed correctly, but it can affect the results of small samples, particularly with fewer than 50 specimens.13 When baseline covariates are imbalanced, a multiple linear regression, where the baseline variable of the dependent variable is also included in the model (analysis of covariance), should be used.14 This will account for possible bias because of the baseline values of covariates between groups.15 However, the interpretation of this postadjustment approach is often difficult because the imbalance of covariates frequently leads to unanticipated interaction effects, such as unequal slopes among subgroups of covariates.16

CONCLUSIONS
For independent parallel group studies, the following conclusions were made.
1. Simple randomization almost always produces imbalances between groups, which is a problem in studies with fewer than 60 specimens. Therefore, simple randomization can be used for studies with samples larger than 60.
2. Block randomization or another method that restricts randomization and manages imbalance, such as stratified randomization or minimization, should be used in studies with fewer than 60 specimens.
3. The sample size influences the test power more than the imbalance rate.

REFERENCES
1. Pandis N, Polychronopoulou A, Eliades T. Randomization in clinical trials in orthodontics: its significance in research design and methods to achieve it. Eur J Orthod 2011;33:684-90.
2. Hahn GJ, Meeker WQ. Assumptions for statistical inference. Am Stat 1993;47:1-11.
3. Lachin JM. Properties of simple randomization in clinical trials. Control Clin Trials 1988;9:312-26.
4. Koletsi D, Pandis N, Fleming PS. Sample size in orthodontic randomized controlled trials: are numbers justified? Eur J Orthod 2014;36:67-73.
5. Chen Z, Zhao Y, Cui Y, Kowalski J. Methodology and application of adaptive and sequential approaches in contemporary clinical trials. J Probab Stat 2012;2012:1-20.
6. Suresh K. An overview of randomization techniques: an unbiased assessment of outcome in clinical research. J Hum Reprod Sci 2011;4:8-11.
7. Faul F, Erdfelder E, Lang AG, Buchner A. G*Power 3: a flexible statistical power analysis program for the social, behavioral, and biomedical sciences. Behav Res Methods 2007;39:175-91.
8. Cohen J. A power primer. Psychol Bull 1992;112:155-9.
9. Lachin JM, Matts JP, Wei LJ. Randomization in clinical trials: conclusions and recommendations. Control Clin Trials 1988;9:365-74.
10. Lachin JM. Statistical properties of randomization in clinical trials. Control Clin Trials 1988;9:289-311.
11. Pratt MC, Kluemper GT, Lindstrom AF. Patient compliance with orthodontic retainers in the postretention phase. Am J Orthod Dentofacial Orthop 2011;140:196-201.
12. Altman DG, Bland JM. Treatment allocation by minimisation. BMJ 2005;330:843.
13. Grizzle JE. A note on stratifying versus complete random assignment in clinical trials. Control Clin Trials 1982;3:365-8.
14. Kirkwood BR, Sterne JA, Malden MA. Essential medical statistics. 2nd ed. Oxford, United Kingdom: Blackwell; 2003.
15. Pandis N. Analysis of covariance. Am J Orthod Dentofacial Orthop 2016;150:200-1.
16. Kang M, Ragan BG, Park JH. Issues in outcomes research: an overview of randomization techniques for clinical trials. J Athl Train 2008;43:215-21.
