Professional Documents
Culture Documents
Textbook Analysis of Correlated Data With Sas and R Mohamed M Shoukri Ebook All Chapter PDF
Textbook Analysis of Correlated Data With Sas and R Mohamed M Shoukri Ebook All Chapter PDF
https://textbookfull.com/product/clinical-trial-data-analysis-
with-r-and-sas-second-edition-chen/
https://textbookfull.com/product/survival-analysis-with-interval-
censored-data-a-practical-approach-with-examples-in-r-sas-and-
bugs-1st-edition-kris-bogaerts/
https://textbookfull.com/product/graphical-data-analysis-with-r-
antony-unwin/
https://textbookfull.com/product/data-analysis-with-r-2nd-
edition-tony-fischetti/
Functional Data Analysis with R 1st Edition Ciprian M.
Crainiceanu & Jeff Goldsmith & Andrew Leroux And Erjia
Cui
https://textbookfull.com/product/functional-data-analysis-
with-r-1st-edition-ciprian-m-crainiceanu-jeff-goldsmith-andrew-
leroux-and-erjia-cui/
https://textbookfull.com/product/uncertainty-analysis-of-
experimental-data-with-r-1st-edition-benjamin-david-shaw/
https://textbookfull.com/product/statistical-analysis-of-
financial-data-with-examples-in-r-1st-edition-james-gentle/
https://textbookfull.com/product/statistical-data-analysis-using-
sas-intermediate-statistical-methods-mervyn-g-marasinghe/
https://textbookfull.com/product/r-in-action-data-analysis-and-
graphics-with-r-bonus-ch-23-only-2nd-edition-robert-kabacoff/
Analysis of Correlated
Data with SAS and R
Fourth Edition
http://taylorandfrancis.com
Analysis of Correlated
Data with SAS and R
Fourth Edition
Mohamed M. Shoukri
CRC Press
Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742
This book contains information obtained from authentic and highly regarded sources. Reasonable efforts have
been made to publish reliable data and information, but the author and publisher cannot assume responsi-
bility for the validity of all materials or the consequences of their use. The authors and publishers have
attempted to trace the copyright holders of all material reproduced in this publication and apologize to
copyright holders if permission to publish in this form has not been obtained. If any copyright material has not
been acknowledged, please write and let us know so we may rectify in any future reprint.
Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, trans-
mitted, or utilized in any form by any electronic, mechanical, or other means, now known or hereafter
invented, including photocopying, microfilming, and recording, or in any information storage or retrieval
system, without written permission from the publishers.
For permission to photocopy or use material electronically from this work, please access www.copyright.com
(http://www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive,
Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization that provides licenses and regis-
tration for a variety of users. For organizations that have been granted a photocopy license by the CCC, a
separate system of payment has been arranged.
Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used
only for identification and explanation without intent to infringe.
Preface.....................................................................................................................xv
vii
viii Contents
References .........................................................................................................465
Index...................................................................................................................481
Preface
A decade has passed since the publication of the third edition. There has been
an increasing demand by biological and clinical research journals to use state-
of-the-art statistical methods in the data analysis components of submitted
research papers. It is not uncommon to read papers published in high-impact
journals such as The Lancet, New England Journal of Medicine, and British
Medical Journal that deal with sophisticated topics such as meta-analysis,
multiple imputations of missing data, and competing risks in survival anal-
ysis. The fourth edition results in part from continuous collaboration between
the author and biomedical and clinical researchers inside and outside the
King Faisal Specialist Hospital. I want to emphasize that there are funda-
mental differences, both qualitative and quantitative, between this and the
previous editions.
The fourth edition has five additional chapters that cover a wide range of
applied statistical methods. Chapter 1 covers the basics of study design and
the concept of effect size. Chapter 2 deals with comparing group means when
the assumptions of normality and independence of measured outcomes are
not satisfied. The last three chapters titled “Introduction to Propensity Score
Analysis” (Chapter 10), “Introductory Meta-Analysis” (Chapter 11), and
“Missing Data” (Chapter 12) are also new. The other chapters from the third
edition have been extensively revised to keep up with new developments in
the respective areas. For example, topics such as competing risk, time-
dependent covariates, joint modeling of longitudinal and survival data, and
forecasting with exponential smoothing are introduced. Equally important is
the use of the extensive library of packages in R (4.3.3), throughout the book
we use packages produced by the R Development Core Team (2006, 2010,
2014), and the new developments in the SAS (version 9.4) output capabilities.
Every topic is illustrated by examples. Data, with both R and SAS codes are
also given.
Special thanks go to Mr. Abdelmonein El-Dali, Mrs. Tusneem Al-Hassan,
and Mr. Parvez Siddiqui for their valued help. I am also grateful to Mrs. Kris
Hervera and Mrs. Cielo Mendiola who both typed the entire volume.
Last but not the least, special thanks go to Mr. David Grubbs, Ms. Shelly
Thomas, Mr. Jay Margolis, Ms. Adel Rosario, and the entire production team
at CRC Press. Without the help of these people, the timely completion of this
work would have been impossible.
Additional material is available from the CRC Web site: https://www.crcpress
.com/9781138197459.
xv
http://taylorandfrancis.com
1
Study Designs and Measures of Effect Size
1
2 Analysis of Correlated Data with SAS and R
• Advantages
• Can test cause-and-effect relationships.
• Provides better control of the experiment and minimizes threats
to validity.
• Disadvantages
• For many studies, all extraneous variables cannot be identified
or controlled.
• Not all variables can be experimentally manipulated.
• Some experimental procedures may not be practical or ethical
in a clinical setting.
• Advantages
• Good for studying rare conditions or diseases.
• Less time needed to conduct the study because the condition or
disease has already occurred.
• These studies allow us to simultaneously look at multiple risk
factors.
• Useful as initial studies to establish an association.
• Can answer questions that could not be answered through
other study designs.
Study Designs and Measures of Effect Size 5
• Disadvantages
• These studies are subject to what is known as “recall bias.” They
also have more problems with data quality because they rely on
memory.
• Not good for evaluating diagnostic tests because it is already
clear that the cases have the condition and the controls do not.
• In many situations, finding a suitable control group may be
difficult.
• Advantages
• Efficient, not all members of parent cohort require diagnostic
testing.
• Flexible, allows testing of hypotheses not anticipated when the
cohort was drawn.
• Reduces selection bias since cases and controls sampled from
same population.
• Disadvantages
• Reduction in the statistical power of validating the null hypothesis
because of the reduction in the sample size.
6 Analysis of Correlated Data with SAS and R
• Advantages
• Efficient self-matching
• Efficient (reduction in variation) because we select only cases
• Can use multiple controls for one case window
• Disadvantages
• Information bias, inaccurate recall of exposure during control
window (can be overcome by choosing control window to
occur after case window)
• Requires careful selection of time period during which the
control window occurs
• Requires careful selection of the length and timing of the
windows (e.g., in an investigation of the risk of cell phone usage
Study Designs and Measures of Effect Size 7
1.1.7 Confounding
The confounding phenomenon is a potential problem in nonexperimental
studies that try to establish cause and effect. Confounding occurs when part
or all of a significant association between two variables arises through both
being causally associated with a third variable. For example, in a population
study you could easily show a negative association between smoking and
heart disease. But older people are less active, and older people are more
diseased, so you are bound to find an association between activity and disease
without one necessarily causing the other. To get over this problem you have
to control for potential confounding factors. For example, make sure all your
subjects are the same age or create several age strata to try to remove its effect
on the relationship between the other two variables.
Study Designs and Measures of Effect Size 9
1.1.8 Sampling
In most epidemiological investigations we have to work with a sample of
subjects rather than the full population. Since the public is interested in the
population under study we should be able to generalize our results from the
sample to the population. To generalize from the sample to the population,
the sample has to be representative of the population. The safest way to
ensure that it is representative is to use a random selection procedure. Other
strategies for sampling to control confounding and reduce heterogeneity may
be followed. For example, we may use a stratified random sampling proce-
dure to make sure that we have proportional representation of population
subgroups (e.g., sexes, races, regions).
When the sample is not representative of the population, selection bias is a
possibility. A typical source of bias in population studies is age, gender, level
of education, or socioeconomic status. It is noted that the very young and the
very old tend not to take part in the studies. Thus a high compliance (the
proportion of people contacted who end up as subjects) is important in
avoiding bias.
Failure to randomize subjects to control and treatment groups in experi-
ments can also produce bias. If we select patients who are willing to partic-
ipate in a clinical trial with a promising new drug or if we select the groups in
any way that makes one group different from another, then any result we get
might reflect the group difference rather than an effect of the intervention. For
this reason, it is important to randomly assign subjects in a way that ensures
the groups are balanced in terms of important variables that could modify the
effect of the treatment (e.g., age, gender, physical performance). Human
subjects may not be happy about being randomized, so we need to state
clearly that it is a condition of taking part.
Often the most important variables to balance are the baseline or
preintervention value of the dependent variable itself.
a hat from a hospital of 200 nurses. In this case, the population is all 200
nurses, and the sample is random because each nurse has an equal chance of
being selected.
Stratified random sampling is a method of sampling that involves the division
of a population into smaller subgroups known as strata. In stratified random
sampling, the strata are formed based on members’ shared characteristics. A
random sample from each stratum is taken in a number proportional to the
stratum’s size when compared to the population. These subsets of the strata
are then pooled to form a random sample. Similar to a weighted average, this
method of sampling produces characteristics in the sample that are propor-
tional to the overall population. Stratified sampling works well for popula-
tions with a variety of characteristics.
Cluster sampling (also known as one-stage cluster sampling) is a technique
in which aggregates or clusters of participants that represent the population
are identified and included in the sample. This is popular in conducting
household or family studies. The main aim of cluster sampling can be spec-
ified as cost reduction and increasing the levels of efficiency of sampling. This
specific technique can also be applied in integration with multistage
sampling.
Multistage sampling (also known as multistage cluster sampling) is a more
complex form of cluster sampling that contains two or more stages in sample
selection. In multistage sampling, large clusters of the population are divided
into smaller clusters in several stages in order to make primary data collection
more manageable. It has to be acknowledged that multistage sampling is not
as efficient as true random sampling; however, it addresses certain disad-
vantages associated with true random sampling such as being overly
expensive and time consuming.
A major difference between cluster and stratified sampling relates to the fact
that in cluster sampling a cluster is perceived as a sampling unit, whereas in
stratified only specific elements of strata are accepted as a sampling unit.
Accordingly, in cluster sampling a complete list of clusters represent the
sampling frame. Then, a few clusters are chosen randomly as the source of
primary data.
An advantage of multistage sampling is that it is effective in primary data
collection from geographically dispersed populations when face-to-face
contact in required (e.g., semistructured in-depth interviews). This type of
sampling can also be cost effective and time effective. A disadvantage of
multistage sampling is to effectively account for the heterogeneity, group-
level information should be available.
1.1.10 Summary
The selection of a research design is based on the research question or
hypothesis and the phenomena being studied. A true-experimental design is
Study Designs and Measures of Effect Size 11