
A. Abu Awwad

Healthcare Statistics/152526030

Lecture 1: Introduction to Biostatistics

Instructors: AbdulFattah.AbuAwwad@aaup.edu


Outline

1 Learning Objectives

2 What is Statistics/Biostatistics?

3 Data

4 Data Sources

5 Collecting Sample Data

6 Descriptive Statistics

7 Sampling and Statistical Inference

8 The Scientific Method and the Design of Experiments


General Guidelines

• All data sets from Daniel and Cross (2018) can be accessed at Daniel and Cross (2018).
• All data sets from Triola et al. (2017) can be accessed at Triola et al. (2017).


Learning Objectives

After studying this chapter, the student will be able to:

Define statistics generally.

Distinguish between statistics and biostatistics.

Differentiate between the two branches of statistics.

State the meaning of Biostatistics and its application to life sciences.

Differentiate between population and sample data.

Identify types of data and levels of measurement.

Identify the four basic sampling techniques.

What is Statistics/Biostatistics?

• Statistics is a field of study concerned with
  1. the collection, organization, summarization, and analysis of data; and
  2. the drawing of inferences about a body of data when only a part of the data is observed.

• Biostatistics
  - The tools of statistics are employed in many fields: business, education, psychology, agriculture, economics, and others.
  - When the data analyzed are derived from the biological sciences and medicine, we use the term biostatistics to distinguish this particular application of statistical tools and concepts.

Data

Data: The facts and figures collected, analyzed, and summarized for presentation and interpretation.

• Some Basic Terms:
  - Element/Member: A specific subject/object (for example, a person, company, or country) about which the information is collected.
  - Variable: A characteristic of interest for the elements. It assumes different values for different elements.
    Examples: diastolic blood pressure, heart rate, heights of adult males, ages of patients seen in a dental clinic, ...
  - Random Variable: When the values obtained arise as a result of chance factors, so that they cannot be exactly predicted in advance, the variable is called a random variable.
    Example: When a child is born, we cannot predict exactly his or her height at maturity. Attained adult height is the result of numerous genetic and environmental factors.
  - Observation/Measurement: The value of a variable for an element.
  - Data set: All the data collected in a particular study.

Basic Terms
Example: Patient satisfaction. A hospital administrator wished to study the relation of patient satisfaction to the patient’s age (in years) and severity of illness (an index). The administrator randomly selected 46 patients and collected the data presented below, where larger values of patient satisfaction and severity of illness are associated, respectively, with more satisfaction and increased severity of illness.

The dimension of the data is 46 × 3 (46 patients by 3 variables).
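The basic terms can be sketched in Python on a toy data set. The three rows below are hypothetical values made up for illustration (the lecture's 46-patient table is not reproduced), and the names `records`, `variables`, and `dimension` are assumptions of this sketch.

```python
# Each dictionary is one element (a patient); each key is a variable.
# The numbers are hypothetical, not the lecture's data.
records = [
    {"satisfaction": 48, "age": 50, "severity": 51},
    {"satisfaction": 57, "age": 36, "severity": 46},
    {"satisfaction": 66, "age": 40, "severity": 48},
]

variables = list(records[0].keys())       # the characteristics of interest
n_elements = len(records)                 # number of patients (rows)
n_variables = len(variables)              # number of variables (columns)
dimension = (n_elements, n_variables)     # elements x variables

# An observation is the value of one variable for one element.
observation = records[1]["age"]

print("variables:", variables)
print("dimension:", dimension)
print("age of second patient:", observation)
```

With 46 patients and the same three variables, the same calculation would give the 46 × 3 dimension stated above.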



Scales of Measurement

• The scale determines the amount of information contained in the data.
• Scales of measurement include:
  - Nominal
  - Ordinal
  - Interval
  - Ratio
• The scale indicates the data summarization and statistical analyses that are most appropriate.


Nominal scale

• The lowest measurement scale.
• It consists of “naming” observations or classifying them into various mutually exclusive and collectively exhaustive categories.
• A nonnumeric label or numeric code may be used.
• Examples: Gender, smoking status, marital status, ...

A nonnumeric label could be used for the marital status variable, such as single, married, divorced, and so on. Alternatively, a numeric code could be used:

\[
\text{Marital status} =
\begin{cases}
1, & \text{Single} \\
2, & \text{Married} \\
3, & \text{Divorced} \\
\dots, & \dots
\end{cases}
\]
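A minimal Python sketch of nominal coding, assuming the codes 1, 2, 3 for Single, Married, Divorced as on this slide. The list `coded` is hypothetical data. Because the codes are only names, counting and the mode are meaningful, while arithmetic on the codes (such as an average) is not.

```python
from collections import Counter

# Numeric codes for a nominal variable; the numbers are labels, not quantities.
MARITAL_STATUS = {1: "Single", 2: "Married", 3: "Divorced"}

coded = [1, 2, 2, 3, 1, 2]                      # hypothetical coded observations
labels = [MARITAL_STATUS[code] for code in coded]

counts = Counter(labels)                        # frequency of each category
most_common_label = counts.most_common(1)[0][0] # the mode is valid for nominal data

print(counts)
print("mode:", most_common_label)
# Note: sum(coded) / len(coded) can be computed, but it has no interpretation.
```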


Ordinal scale

• The data have the properties of nominal data, and the order or rank of the data is meaningful.
• A nonnumeric label or numeric code may be used.
• Examples:
  - Satisfaction: A nonnumeric label can be used (such as strongly satisfied, satisfied, dissatisfied, ...). Alternatively, a numeric code could be used:

\[
\text{Satisfaction} =
\begin{cases}
1, & \text{Strongly satisfied} \\
2, & \text{Satisfied} \\
3, & \text{Dissatisfied} \\
\dots, & \dots
\end{cases}
\]

  - Convalescing patients: may be characterized as unimproved, improved, and much improved.
    The degree of improvement between unimproved and improved is probably not the same as that between improved and much improved.
  - Grading: A+, A, A-, B+, ...


Interval scale

• With this scale not only is it possible to order measurements, but also the distance between any two measurements is known.
• The ability to do this implies the use of a unit distance and a zero point.
• The selected zero point is not necessarily a true zero in that it does not have to indicate a total absence of the quantity being measured.
• Interval data are always numeric/quantitative.
• Examples:
  - Temperature (degrees Fahrenheit or Celsius)
    The unit of measurement is the degree, and the point of comparison is the arbitrarily chosen “zero degrees,” which does not indicate a lack of heat.
  - College admission SAT scores


Ratio scale

• The highest level of measurement. Data have all the properties of interval data and the ratio of two values is meaningful.
• Ratio data are always numeric/quantitative.
• The scale has a true zero: a value of zero indicates that none of the quantity is present.
• Examples: Age, distance, height, weight, ...
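A short worked check of why ratios are meaningful on a ratio scale but not on an interval scale. The temperatures and weights are hypothetical; the conversion functions are written out here for the sketch. A ratio of two temperatures changes when the unit changes (the zero is arbitrary), whereas a ratio of two weights does not (the zero is a true zero). Differences, by contrast, keep their relative size on the interval scale.

```python
def celsius_to_fahrenheit(c):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return c * 9 / 5 + 32

def kg_to_lb(kg):
    """Convert kilograms to pounds (approximate factor)."""
    return kg * 2.20462

# Interval scale: the ratio depends on the unit, so "twice as hot" is not meaningful.
ratio_celsius = 20 / 10
ratio_fahrenheit = celsius_to_fahrenheit(20) / celsius_to_fahrenheit(10)

# Interval scale: equal differences stay equal after changing units.
equal_steps_c = (30 - 20) == (20 - 10)
equal_steps_f = (celsius_to_fahrenheit(30) - celsius_to_fahrenheit(20)) == (
    celsius_to_fahrenheit(20) - celsius_to_fahrenheit(10)
)

# Ratio scale: the ratio is the same in any unit, so "twice as heavy" is meaningful.
ratio_kg = 80 / 40
ratio_lb = kg_to_lb(80) / kg_to_lb(40)

print(ratio_celsius, ratio_fahrenheit)   # the two ratios differ
print(ratio_kg, ratio_lb)                # the two ratios agree
```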


Categorical and Quantitative Data/Variables

• Data can be further classified as being categorical (also called qualitative) or quantitative.
• Categorical data:
  - Data that can be grouped by specific categories.
  - Use either the nominal or ordinal scale of measurement.
  - A categorical variable is a variable with categorical data.
• Quantitative data:
  - Data that use numeric values to indicate how much or how many.
  - Obtained using either the interval or ratio scale of measurement.
  - A quantitative variable is a variable with quantitative data.
  - Quantitative variables may be classified as either
    - discrete variables (e.g., the number of daily admissions to a general hospital), or
    - continuous variables (e.g., height, weight, and skull circumference).

Categorical and Quantitative Data/Variables

• The statistical analysis that is appropriate depends on whether the data for the variable are categorical or quantitative.
• In general, there are more alternatives for statistical analysis when the data are quantitative.

Figure: Data/variables types.


Data Sources

• Routinely Kept Records.
  - Hospital medical records contain immense amounts of information on patients.
• Surveys or Observational (Non-Experimental) Studies.
  - Used when the data needed to answer a question are not available from routinely kept records.
  - Example: The administrator of a clinic wishes to obtain information regarding the mode of transportation used by patients to visit the clinic.
• Experiments.
  - Frequently the data needed to answer a question are available only as the result of an experiment.
  - Example: A pharmaceutical company would like to learn how a new drug it has developed affects blood pressure. Researchers select a sample of individuals. Different groups of individuals are given different dosage levels of the new drug, and before and after data on blood pressure are collected for each group. Statistical analysis of the data can help determine how the new drug affects blood pressure.
• External sources.
  - Published reports, commercially available data banks, or the research literature.

Collecting Sample Data

• Experiments: In an experiment, we apply some treatment and then proceed to observe its effects on the individuals. (The individuals in experiments are called experimental units, and they are often called subjects when they are people.)
• Observational Studies: In an observational study, we observe and measure specific characteristics, but we don’t attempt to modify the individuals being studied.


Example 1: The Salk Vaccine Experiment

In 1954, an experiment was designed to test the effectiveness of the Salk vaccine in preventing polio, which had killed or paralyzed thousands of children. In the experiment, 401,974 children were randomly assigned to two groups:
  1. 200,745 children were given a treatment consisting of Salk vaccine injections;
  2. 201,229 children were injected with a placebo that contained no drug.
Children were assigned to the treatment or placebo group through a process of random selection, equivalent to flipping a coin. Among the children given the Salk vaccine, 33 later developed paralytic polio, and among the children given a placebo, 115 later developed paralytic polio.
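The counts in Example 1 are easier to compare as rates. The sketch below uses only the four numbers given above; expressing the result per 100,000 children and as a ratio of rates is a presentation choice of this sketch, not something stated in the lecture.

```python
# Counts taken from Example 1.
vaccine_n, vaccine_cases = 200_745, 33
placebo_n, placebo_cases = 201_229, 115

# Paralytic polio cases per 100,000 children in each group.
vaccine_rate = vaccine_cases / vaccine_n * 100_000
placebo_rate = placebo_cases / placebo_n * 100_000

# How the vaccine group's rate compares with the placebo group's rate.
rate_ratio = vaccine_rate / placebo_rate

print(f"vaccine: {vaccine_rate:.1f} per 100,000")
print(f"placebo: {placebo_rate:.1f} per 100,000")
print(f"rate ratio: {rate_ratio:.2f}")
```

The two groups are almost the same size, so the raw counts (33 versus 115) already point the same way; the rates make the comparison independent of group size.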


Example 2: Ice Cream and Drownings

• Observational Study: Use past data to conclude (mistakenly) that ice cream causes drownings, based on data showing that increases in ice cream sales are associated with increases in drownings. The mistake is to miss the lurking variable of temperature and to fail to see that as the temperature increases, ice cream sales increase and drownings increase because more people swim.

  A lurking variable is one that affects the variables included in the study, but it is not included in the study.

• Experiment: Conduct an experiment with one group treated with ice cream while another group gets no ice cream. We would see that the rate of drowning victims is about the same in both groups, so ice cream consumption has no effect on drownings.

Here, the experiment is clearly better than the observational study.
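A small simulation can illustrate the lurking variable in Example 2. Everything here is simulated under assumed, made-up relationships: temperature drives both ice cream sales and drownings, and the two have no direct link. The helper names `pearson` and `residuals` are defined in the sketch itself. The raw correlation comes out strongly positive, while the correlation after removing the temperature trend from both variables is close to zero.

```python
import random

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def residuals(y, x):
    """What is left of y after subtracting a least-squares line fitted on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum(
        (a - mx) ** 2 for a in x
    )
    return [b - (my + slope * (a - mx)) for a, b in zip(x, y)]

rng = random.Random(0)  # fixed seed so the run is reproducible

# Assumed data-generating process: temperature affects both variables.
temperature = [rng.uniform(0, 35) for _ in range(1000)]
ice_cream_sales = [10 * t + rng.gauss(0, 20) for t in temperature]
drownings = [0.2 * t + rng.gauss(0, 1) for t in temperature]

raw_r = pearson(ice_cream_sales, drownings)
adjusted_r = pearson(
    residuals(ice_cream_sales, temperature), residuals(drownings, temperature)
)

print(f"correlation ignoring temperature:      {raw_r:.2f}")
print(f"correlation after removing temperature: {adjusted_r:.2f}")
```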


Experiments Versus Observational Studies

• Ethical, cost, time, and other considerations sometimes prohibit the use of an experiment.
  Example: We would never want to conduct a driving/texting experiment in which we ask subjects to text while driving, because some of them could die. It would be far better to observe past crash results to understand the effects of driving while texting.
• Experiments are often better than observational studies (see Example 2) because well-planned experiments typically reduce the chance of having the results affected by some variable that is not part of the study (a lurking variable).


Design of Experiments

• Good design of experiments includes replication, blinding, and randomization.
  - Replication is the repetition of an experiment on more than one individual. Good use of replication requires sample sizes that are large enough so that we can see the effects of treatments. In Example 1, the experiment used sufficiently large sample sizes, so the researchers could see that the Salk vaccine was effective.
  - Blinding is in effect when the subject doesn’t know whether he or she is receiving a treatment or a placebo. Blinding is a way to get around the placebo effect, which occurs when an untreated subject reports an improvement in symptoms. (The reported improvement in the placebo group may be real or imagined.) The Salk experiment in Example 1 was double-blind:
    1. The children being injected didn’t know whether they were getting the Salk vaccine or a placebo, and
    2. the doctors who gave the injections and evaluated the results did not know either.
  - Randomization is used when individuals are assigned to different groups through a process of random selection, as in Example 1. The logic behind randomization is to use chance as a way to create two groups that are similar.
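Randomization can be sketched in a few lines of Python. This is one simple way to do it (shuffle, then split in half), not a description of how the Salk trial was actually run; the subject IDs and the function name `randomize` are hypothetical.

```python
import random

def randomize(subjects, seed=None):
    """Randomly split subjects into a treatment group and a placebo group."""
    rng = random.Random(seed)
    shuffled = list(subjects)      # copy, so the caller's list is not modified
    rng.shuffle(shuffled)          # chance alone decides the order
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

subjects = [f"S{i:02d}" for i in range(1, 21)]   # 20 hypothetical subject IDs
treatment, placebo = randomize(subjects, seed=1)

print("treatment:", sorted(treatment))
print("placebo:  ", sorted(placebo))
```

Because chance decides who goes where, characteristics such as age or sex tend to balance out between the two groups as the sample grows.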

Designs of Experiments

• In a study, confounding occurs when we can see some effect, but we can’t identify the specific factor that caused it, as in the ice cream and drowning observational study in Example 2.
• Figure (a) shows a bad experimental design, where confounding can occur when the treatment group of women shows strong positive results. Because the treatment group consists of women and the placebo group consists of men, confounding has occurred: we cannot determine whether the treatment or the gender of the subjects caused the positive results.
• The Salk vaccine experiment in Example 1 illustrates one method for controlling the effect of the treatment variable: Use a completely randomized experimental design, whereby randomness is used to assign subjects to the treatment group and the placebo group.
  - A completely randomized experimental design is one of the following methods that are used to control effects of variables.


Designs of Experiments

• Completely Randomized Experimental Design: Assign subjects to different treatment groups through a process of random selection, as illustrated in Figure (b).
• Randomized Block Design: A block is a group of subjects that are similar, but blocks differ in ways that might affect the outcome of the experiment. Use the following procedure, as illustrated in Figure (c):
  1. Form blocks (or groups) of subjects with similar characteristics.
  2. Randomly assign treatments to the subjects within each block.

For example, in designing an experiment to test the effectiveness of aspirin treatments on heart disease, we might form a block of men and a block of women, because it is known that the hearts of men and women can behave differently. By controlling for gender, this randomized block design eliminates gender as a possible source of confounding.
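The two-step block procedure can be sketched as follows, assuming two arms ("aspirin" and "placebo") and blocks defined by sex as in the example. The subjects, the function name `randomized_block_assignment`, and the equal split within each block are assumptions of this sketch.

```python
import random
from collections import defaultdict

def randomized_block_assignment(subjects, block_of, seed=None):
    """Form blocks, then randomly assign treatments within each block."""
    rng = random.Random(seed)

    # Step 1: form blocks of subjects with similar characteristics.
    blocks = defaultdict(list)
    for subject in subjects:
        blocks[block_of(subject)].append(subject)

    # Step 2: randomize separately inside every block.
    assignment = {}
    for members in blocks.values():
        rng.shuffle(members)
        half = len(members) // 2
        for subject in members[:half]:
            assignment[subject] = "aspirin"
        for subject in members[half:]:
            assignment[subject] = "placebo"
    return assignment

# Hypothetical subjects: (sex, id) pairs, six men and six women.
subjects = [("M", i) for i in range(6)] + [("F", i) for i in range(6)]
assignment = randomized_block_assignment(subjects, block_of=lambda s: s[0], seed=2)

for subject in subjects:
    print(subject, "->", assignment[subject])
```

Each arm ends up with the same number of men and the same number of women, so a difference between arms cannot be explained by sex.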


Designs of Experiments

• Matched Pairs Design or Before/After Design: Matched pairs might consist of measurements from subjects before and after some treatment, as illustrated in Figure (d). Each subject yields a “before” measurement and an “after” measurement, and each before/after pair of measurements is a matched pair.
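In a matched pairs design the natural quantity to look at is the within-subject difference. A minimal sketch, using hypothetical before/after blood pressure readings for five subjects:

```python
from statistics import mean

# Hypothetical systolic blood pressure readings; position i is the same subject.
before = [150, 142, 160, 155, 148]
after = [144, 140, 151, 150, 149]

# One difference per matched pair (after minus before).
differences = [a - b for b, a in zip(before, after)]
mean_difference = mean(differences)

print("differences:", differences)
print("mean difference:", mean_difference)
```

Working with differences removes the variation between subjects, because each subject serves as his or her own comparison.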


Observational Studies

• Types of observational studies:
  - In a cross-sectional study, data are observed, measured, and collected at one point in time, not over a period of time.

Example: If we want to measure current obesity levels in a population, we could draw a sample of 1,000 people randomly from that population, measure their weight and height, and calculate what percentage of that sample is categorized as obese.
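The calculation in the obesity example might look like the sketch below. It assumes obesity is classified with the body mass index (weight in kg divided by height in metres squared) and the common adult cutoff of 30; the lecture does not say which definition is intended. The five (weight, height) pairs are hypothetical stand-ins for the 1,000 sampled people.

```python
def bmi(weight_kg, height_m):
    """Body mass index: weight in kilograms divided by height in metres squared."""
    return weight_kg / height_m ** 2

# Hypothetical sample of (weight in kg, height in m) measurements.
sample = [(95.0, 1.70), (70.0, 1.75), (110.0, 1.80), (60.0, 1.65), (82.0, 1.78)]

OBESITY_CUTOFF = 30.0  # assumed classification rule: BMI of 30 or more

obese = [bmi(w, h) >= OBESITY_CUTOFF for w, h in sample]
percent_obese = 100 * sum(obese) / len(sample)

print(f"{percent_obese:.0f}% of the sample is categorized as obese")
```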


Observational Studies

• In a retrospective (or case-control) study, data are collected from a past time period by going back in time (through examination of records, interviews, and so on).
• In a retrospective study, the subjects have already experienced the outcome of interest, or developed the disease, before the start of the study.

Example 1: A researcher was trying to determine the effects that eating organic foods has on overall health. The researcher finds 200 individuals, where 100 of them have eaten organically for the past 3 years (cases), and the other 100 haven’t eaten organically in the past 3 years (controls). He then gives each subject an overall health assessment. Lastly, he analyzes the data and uses them to draw conclusions on how eating organically can affect one’s overall health.


Observational Studies

• In a prospective (or longitudinal or cohort) study, data are collected in the future from groups that share common factors (such groups are called cohorts).
  - In a prospective study, the investigators will design the study, recruit subjects, and collect baseline data on all subjects, before any of them have developed the outcomes of interest.
  - The subjects are followed and observed over a period of time to gather information and record the development of outcomes.


Descriptive Statistics

• Most of the statistical information in newspapers, magazines, company reports, and other publications consists of data that are summarized and presented in a form that is easy to understand.
• Such summaries of data (tabular, graphical, or numerical) are referred to as descriptive statistics.
• Summarizing Categorical Data: Univariate Descriptive Statistics
  - Tabular displays: Frequency, Relative Frequency, and Percent Frequency Distribution Tables
  - Graphical displays: Bar and Pie Charts
• Summarizing Quantitative Data: Univariate Descriptive Statistics
  - Tabular displays: Frequency, Relative Frequency, Percent Frequency, Cumulative Frequency, Cumulative Relative Frequency, and Cumulative Percent Frequency Distribution Tables
  - Graphical displays: Histogram, Boxplot, Polygon, ...
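As an illustration of the tabular displays for categorical data, the sketch below builds a frequency, relative frequency, and percent frequency table with the standard library. The blood type values are hypothetical.

```python
from collections import Counter

# Hypothetical categorical data: blood types of 10 patients.
blood_types = ["A", "O", "B", "O", "A", "O", "AB", "O", "A", "O"]
n = len(blood_types)

frequency = Counter(blood_types)

# One row per category: frequency, relative frequency, percent frequency.
table = {
    category: {
        "frequency": count,
        "relative": count / n,
        "percent": 100 * count / n,
    }
    for category, count in frequency.items()
}

for category, row in sorted(table.items()):
    print(f"{category:>2}  {row['frequency']:>2}  {row['relative']:.2f}  {row['percent']:.0f}%")
```

The relative frequencies always sum to 1 and the percent frequencies to 100, which is a quick check on any such table.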


Bivariate Descriptive Statistics

• Often we are interested in tabular and graphical methods that will help understand the relationship between two variables.
• Two Categorical Variables:
  - Tabular: Crosstabulation (Contingency Tables)
  - Graphical: Side-by-Side Bar Chart, Stacked Bar Chart, ...
• Two Numerical Variables:
  - Tabular: Crosstabulation (Contingency Tables) (Hint: Transform the Numerical Variables to Categorical!)
  - Graphical: Scatter Diagram, ...
• Categorical and Numerical:
  - Tabular: Crosstabulation (Contingency Tables) (Hint: Transform the Numerical Variable to Categorical!)
  - Graphical: Side-by-Side Boxplot, ...
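The hint about transforming a numerical variable to a categorical one can be sketched as follows. The records, the age cutoff of 40, and the helper name `age_group` are hypothetical choices made for this illustration.

```python
from collections import Counter

def age_group(age):
    """Turn a numerical age into a category (assumed cutoff: 40 years)."""
    return "<40" if age < 40 else "40+"

# Hypothetical (age, smoking status) records.
records = [
    (25, "Smoker"), (52, "Non-smoker"), (38, "Non-smoker"),
    (61, "Smoker"), (45, "Non-smoker"), (30, "Smoker"),
]

# Count each (row category, column category) combination.
cell_counts = Counter((age_group(age), status) for age, status in records)

rows = ["<40", "40+"]
cols = ["Smoker", "Non-smoker"]
crosstab = {r: {c: cell_counts[(r, c)] for c in cols} for r in rows}

for r in rows:
    print(r, crosstab[r])
```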


Numerical Descriptive Statistics

• Measures of central tendency
  - Mean (or average)
  - Median
  - Mode
• Measures of dispersion/variability
  - Range
  - Variance and Standard Deviation
  - Interquartile Range
• Measures of location/relative standing
  - Percentiles
  - Quartiles
• Others: Measures of Distribution Shape, Measures of Association Between Two Variables, ...
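Most of these measures are available in Python's standard `statistics` module. The sketch below applies them to hypothetical heart rate data. Note that textbooks and software use several conventions for quartiles, so the quartile values from `statistics.quantiles` (default method) may differ slightly from a hand calculation done with another rule.

```python
import statistics as st

# Hypothetical heart rates (beats per minute) for seven patients.
heart_rates = [62, 70, 70, 74, 78, 81, 90]

# Central tendency
mean_value = st.mean(heart_rates)
median_value = st.median(heart_rates)
mode_value = st.mode(heart_rates)

# Dispersion
data_range = max(heart_rates) - min(heart_rates)
variance = st.variance(heart_rates)   # sample variance (divides by n - 1)
std_dev = st.stdev(heart_rates)       # sample standard deviation

# Relative standing: quartiles, and the interquartile range built from them
q1, q2, q3 = st.quantiles(heart_rates, n=4)
iqr = q3 - q1

print(mean_value, median_value, mode_value)
print(data_range, round(variance, 2), round(std_dev, 2))
print(q1, q2, q3, iqr)
```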


Sampling and Statistical Inference

• Population: The set of all elements of interest in a particular study.
  - For example, we are interested in the weights of all the children enrolled in a certain elementary school.
• Sample: A subset of the population.
  - For example, suppose our population consists of the weights of all the elementary school children enrolled in a certain county school system. If we collect for analysis the weights of only a fraction (e.g., 10%) of these children, we have only a part of the population of weights, that is, a sample.
• Census: Collecting data for the entire population.
• Sample survey: Collecting data for a sample.

Statistical inference: The process of using data obtained from a sample to make estimates and test hypotheses about the characteristics of a population.


Sampling and Statistical Inference

• Sampling Techniques
  - There are many kinds of samples that may be drawn from a population.
  - In general, in order to make a valid inference about a population, we need a scientific/probabilistic sample from the population.
  - There are also many kinds of scientific/probabilistic samples that may be drawn from a population. For example:
    - Simple Random Sample
    - Systematic Sample
    - Stratified Random Sample
    - Cluster Sample
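The four sampling techniques can be sketched on a hypothetical population of 100 ID numbers. The sample sizes, the two strata, and the clusters of 10 are arbitrary choices for illustration; real studies define strata and clusters from the subject matter (for example, wards or schools).

```python
import random

rng = random.Random(42)                   # fixed seed for a reproducible sketch
population = list(range(1, 101))          # 100 hypothetical subject IDs

# Simple random sample: every subset of size 10 is equally likely.
simple_random = rng.sample(population, 10)

# Systematic sample: random starting point, then every k-th element.
k = len(population) // 10
start = rng.randrange(k)
systematic = population[start::k]

# Stratified random sample: a random sample drawn separately from every stratum.
strata = {"A": population[:60], "B": population[60:]}     # assumed strata
stratified = {
    name: rng.sample(members, len(members) // 10)
    for name, members in strata.items()
}

# Cluster sample: randomly pick whole clusters and keep everyone in them.
clusters = [population[i:i + 10] for i in range(0, len(population), 10)]
chosen_clusters = rng.sample(clusters, 2)
cluster_sample = [member for cluster in chosen_clusters for member in cluster]

print("simple random:", sorted(simple_random))
print("systematic:   ", systematic)
print("stratified:   ", {name: sorted(ids) for name, ids in stratified.items()})
print("cluster:      ", cluster_sample)
```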


Common Sampling Methods


The Scientific Method and the Design of Experiments

The scientific method is a process by which scientific information is collected, analyzed, and reported in order to produce unbiased and replicable results in an effort to provide an accurate representation of observable phenomena.

• Key elements associated with the scientific method
  - Making an Observation
    Example: It is readily observable that regular exercise reduces body weight in many people.
  - Formulating a Hypothesis
    A statistical hypothesis: “The average (mean) loss of body weight of people who exercise is greater than the average (mean) loss of body weight of people who do not exercise.”


Key Elements Associated with the Scientific Method

• Designing an Experiment
  - A sample of 100 participants could be randomly assigned to two conditions. Fifty would be assigned to a specific exercise program, and the remaining 50 would be monitored but asked not to exercise for a specific period of time. At the end of this experiment the mean weight losses of the two groups could be compared. Experimental designs are desirable because, if all other potential factors are controlled, a cause–effect relationship may be tested.
• Conclusion
  - It is often the case that hypotheses need to be modified and retested with new data and a different design.
  - Results are rarely considered to be conclusive. That is, results need to be replicated, often a large number of times, before scientific credence is granted them.
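The comparison of mean weight losses described in the experiment above can be sketched as below. The weight losses are hypothetical and use five participants per group rather than 50, purely to keep the example short. A difference in sample means is only a descriptive summary; deciding whether it supports the hypothesis requires a formal test, which this sketch does not perform.

```python
from statistics import mean

# Hypothetical weight losses in kg (negative values would mean weight gain).
exercise_loss = [3.1, 2.4, 4.0, 1.8, 2.7]
control_loss = [0.5, 1.2, -0.3, 0.8, 0.3]

mean_exercise = mean(exercise_loss)
mean_control = mean(control_loss)
difference = mean_exercise - mean_control

print(f"mean loss, exercise group: {mean_exercise:.1f} kg")
print(f"mean loss, control group:  {mean_control:.1f} kg")
print(f"difference in means:       {difference:.1f} kg")
```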


The End
