
Biostatistics

Lecture 1
2022-2 Fall Semester

Instructor: Min Jin Ha


Department of Health Informatics and Biostatistics
Graduate School of Public Health
Yonsei University
Outline

Biostatistics GHPF-Batch3, GCID-Batch1 Lectures (16 weeks)

Module 1   2:00~3:00   Statistics Theory   Required
Module 2   3:00~4:00   R programming       Required
Module 3   4:00~5:00   R workshop

Week  Date   Topic                                     Professor
1     9/7    Introduction                              Min Jin Ha
2     9/14   Data Presentation/Descriptive Statistics  Min Jin Ha
3     9/21   Probability                               Min Jin Ha
4     9/28   Theoretical Probability Distributions     Min Jin Ha
5     10/5   Sampling Distribution of the Mean         Min Jin Ha
6     10/12  Estimation/Confidence Intervals           Min Jin Ha
7     10/19  Mid-term Examination                      Min Jin Ha
8     10/26  Hypothesis Testing                        Min Jin Ha
9     11/2   Comparison of Two Means                   Min Jin Ha
10    11/9   Analysis of Variance                      Min Jin Ha
11    11/16  Analysis of Variance                      Min Jin Ha
12    11/23  Nonparametric Methods                     Min Jin Ha
13    11/30  Correlation                               Min Jin Ha
14    12/7   Simple Linear Regression                  Min Jin Ha
15    12/14  Multiple Regression                       Min Jin Ha
16    12/21  Final Exam                                Min Jin Ha

• Textbook: Principles of Biostatistics by Marcello Pagano and Kimberlee Gauvreau
• Software: R
• Slides and R workshop files will be provided
• GHPF Evaluation: Attendance (20%), Mid-term (40%), Final Exam (40%)
• GCID Evaluation: Attendance (20%), Mid-term (30%), Final Exam (30%), Assignments (20%)
Readings
• Pagano and Gauvreau, Chapters 1 and 2.1
What is Biostatistics?
• Statistics is the science of obtaining, analyzing and interpreting data

• When the focus is on the biological and health sciences, we use the
term Biostatistics

• Biostatisticians forge advances in science that benefit human health
through innovations in biostatistical methodology and theory as well
as the thoughtful implementation of biostatistical methods in practice
Welcome to Biostatistics
The Biostatistics lectures
• form an introductory course in probability and statistical inference
• provide a tour of basic statistical methods commonly encountered in
public health and biomedical research
• place emphasis on understanding basic statistical methods, using
the methods to evaluate evidence from studies, and communicating
statistical results to statisticians and non-statisticians
Learning Objectives
By December, students successfully completing the course will
• have a basic working knowledge of important statistical topics including
descriptive statistics and probability, inference on means, regression
methods, and nonparametrics
• understand how to evaluate which methods are appropriate in answering a
research question for a given study design
• be able to evaluate straightforward statistical usage in public health and
biomedicine
• have the tools to interact knowledgeably with biostatisticians in planning,
conducting, analyzing, and reporting public health and medical research
Keep up with the readings for success
(they were assigned for a reason!)
The Big Picture

• Statistics is the process by which we convert data into useful
information. As part of this process, we
• Collect data
• Summarize data
• Interpret the results

Graphic from the CMU Open Learning Initiative


Population
• The process starts when we identify what group we want to study or learn
something about. We call the group the population.
• We might be interested in all babies born in Seoul, all breast cancer
patients diagnosed in South Korea, or all adults (>18 years) in South Korea
• Population, then, is the entire group that is the target of our interest

Graphic from the CMU Open Learning Initiative


Sampling from the Population
• In most cases, the population is so large that there is absolutely no way
we can study all of it (e.g., all adults in South Korea)
• Usually we have to compromise by taking a sample of objects from the
population
• This involves choosing a sample that is representative of the population
and collecting data from it

Graphic from the CMU Open Learning Initiative
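A simple random sample, in which every possible group of n individuals has the same chance of being chosen, is one common way to aim for a representative sample. The short Python sketch below (Python is used only for illustration; the course itself uses R) draws one from a simulated population. The population of 10,000 ages and the names `population`, `rng` and `sample` are hypothetical, not real survey data.

```python
import random

# Hypothetical simulated population: ages of 10,000 adults (not real data)
rng = random.Random(2022)  # fixed seed so the example is reproducible
population = [rng.randint(19, 90) for _ in range(10_000)]

# Simple random sample without replacement: every subset of n individuals
# has the same chance of being the one selected
n = 100
sample = rng.sample(population, n)

print(f"population size = {len(population)}, sample size = {len(sample)}")
```

In practice the hard part is obtaining a list (sampling frame) that covers the population; the random draw itself is the easy step.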


Explain the data
• Once the data have been collected, we need to summarize them in a
meaningful way; this step is called exploratory data analysis

Graphic from the CMU Open Learning Initiative
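As a minimal sketch of the first step of exploratory data analysis, the Python code below computes common numerical summaries with the standard-library `statistics` module. The birth weights are made-up illustrative values, and `summarize` is a hypothetical helper, not part of any library.

```python
import statistics

def summarize(values):
    """Return common numerical summaries of a list of measurements."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "sd": statistics.stdev(values),  # sample standard deviation (divisor n - 1)
        "min": min(values),
        "max": max(values),
    }

# Hypothetical birth weights in kg (illustrative values only)
birth_weights = [3.1, 2.8, 3.5, 3.3, 4.0, 2.9, 3.2]
summary = summarize(birth_weights)

for name, value in summary.items():
    print(f"{name:>6}: {value:.4g}")
```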


Not finished yet
• Remember that our goal is to study the population!
• We want to be able to draw conclusions about the population based on the sample
results
• We use probability to examine the difference between the population and the
sample
• In essence, probability is the 'machinery' that allows us to draw conclusions about
the population based on the data collected from the sample

Graphic from the CMU Open Learning Initiative
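The simulation below is a rough illustration, under assumed data, of why probability is needed: repeated samples from the same population give different sample means, yet those means scatter around the population mean in a regular way. The simulated population, the sample size (50) and the number of repetitions (1,000) are arbitrary choices for this sketch.

```python
import random
import statistics

# Hypothetical simulated population (not real data): ages of 10,000 adults
rng = random.Random(7)
population = [rng.randint(19, 90) for _ in range(10_000)]
population_mean = statistics.mean(population)

# Draw many independent simple random samples and record each sample mean
n_samples, n = 1_000, 50
sample_means = [statistics.mean(rng.sample(population, n)) for _ in range(n_samples)]

# Individual sample means differ from the population mean by chance,
# but as a group they scatter around it in a predictable way
print(f"population mean:             {population_mean:.2f}")
print(f"average of sample means:     {statistics.mean(sample_means):.2f}")
print(f"spread (SD) of sample means: {statistics.stdev(sample_means):.2f}")
```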


Statistical Inference
We can use what we’ve discovered about our sample to draw conclusions
about our target population. This final step in the process is called
inference.

Graphic from the CMU Open Learning Initiative
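A minimal sketch of inference, assuming a simulated population: the mean of one random sample serves as the estimate of the population mean. Only because the population is simulated (hypothetical systolic blood pressure values drawn from a normal distribution with mean 120 and SD 15 mmHg) can we compare the estimate with the true value; in a real study the population mean is unknown.

```python
import random
import statistics

# Hypothetical simulated population (not real data): systolic BP in mmHg
rng = random.Random(1)
population = [rng.gauss(120, 15) for _ in range(20_000)]

# In a real study, this one sample is all we observe
sample = rng.sample(population, 200)

estimate = statistics.mean(sample)        # inference: sample mean estimates the population mean
true_value = statistics.mean(population)  # known here only because we simulated it
print(f"estimate from sample: {estimate:.1f}")
print(f"population value:     {true_value:.1f}")
```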


What do we really mean by data?
• Data are pieces of information about individuals organized into
variables
• By an individual, we mean a particular person or object.
• By a variable, we mean a particular characteristic of the individual
• A dataset is a collection of data gathered in a particular setting, typically
displayed as a table/matrix with one row per individual and one column per variable.

Graphic from the CMU Open Learning Initiative
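One way to picture the individuals-by-variables layout is the sketch below, which stores a tiny made-up dataset as a list of Python dictionaries: each dictionary is an individual (a row) and each key is a variable (a column). In R, the course software, such a table is usually held in a data frame; plain Python structures are used here only so the example runs without extra packages.

```python
# Hypothetical dataset (not real data): each row is one individual,
# each key is one variable measured on that individual
dataset = [
    {"id": 1, "sex": "F", "age": 34, "blood_type": "A", "bmi": 22.4},
    {"id": 2, "sex": "M", "age": 51, "blood_type": "O", "bmi": 27.9},
    {"id": 3, "sex": "F", "age": 45, "blood_type": "B", "bmi": 24.1},
]

variables = list(dataset[0].keys())     # the columns of the table
ages = [row["age"] for row in dataset]  # one variable across all individuals
third = dataset[2]                      # all variables for one individual

print(f"{len(dataset)} individuals x {len(variables)} variables")
print("age column:", ages)
print("individual 3:", third)
```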


Types of Data: Discrete
• Nominal Data (categorical data, Qualitative)
• Classification into named categories without numeric meaning
• e.g., gender, race, blood type, whether or not you have a disease
• Ordinal Data (Quantitative)
• Categories are ordered, but differences between levels not easily measured;
only relative comparisons are made about differences between levels
• e.g., clinical/pathological stages of cancer (I, IIA, IIB, IIC, IIIA, …) and the Likert scale
(1 = strongly disagree, 2 = disagree, 3 = neutral, 4 = agree, 5 = strongly agree)
• Count data (Quantitative)
• Counted observations, e.g., new confirmed COVID-19 cases in Texas
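The sketch below, with made-up values, illustrates which operations suit each kind of discrete data: counting frequencies for nominal data, ordering for ordinal data (using the Likert scale from the slide), and arithmetic such as totals for count data. The helper `likert_rank` and all variable names are hypothetical.

```python
from collections import Counter

# Nominal: named categories with no natural order (hypothetical blood types)
blood_types = ["A", "O", "B", "O", "AB", "A", "O"]
blood_counts = Counter(blood_types)  # counting is meaningful; ordering is not

# Ordinal: categories with a natural order. We store the order explicitly;
# ranks support "higher/lower" comparisons, but the gap between levels
# is not a measured quantity
LIKERT_LEVELS = ["strongly disagree", "disagree", "neutral", "agree", "strongly agree"]

def likert_rank(response: str) -> int:
    """Return the position of a response in the ordered scale (0 = lowest)."""
    return LIKERT_LEVELS.index(response)

responses = ["agree", "neutral", "strongly agree", "disagree", "agree"]
ordered = sorted(responses, key=likert_rank)
median_response = ordered[len(ordered) // 2]  # middle category (odd number of responses)

# Count: non-negative integers from counting events (hypothetical daily case counts)
daily_cases = [12, 7, 0, 15, 9]
total_cases = sum(daily_cases)  # arithmetic on counts is meaningful

print(blood_counts.most_common(1))  # [('O', 3)]
print(median_response)              # agree
print(total_cases)                  # 43
```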
Types of Data: Continuous data
• Data representing measurable quantities
• The difference between any two possible data values can be arbitrarily small
• e.g., birth weight, BMI, serum cholesterol level

Graphic from the Centers for Disease Control and Prevention
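A small illustration, with hypothetical measurements, of a continuous variable: BMI is weight in kilograms divided by the square of height in metres, so two people whose weights differ by only 100 g get BMI values that differ by a tiny amount, and another possible value (for instance, their midpoint) always lies between them. The function `bmi` is a hypothetical helper written for this example.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index: weight in kilograms divided by height in metres squared."""
    return weight_kg / height_m ** 2

# Hypothetical measurements (illustrative only)
a = bmi(70.0, 1.75)
b = bmi(70.1, 1.75)  # 100 g heavier, same height
print(f"{a:.3f} vs {b:.3f}: difference {b - a:.4f}")

# Between any two continuous values another possible value exists,
# e.g. their midpoint; only measurement precision limits how finely
# we can tell values apart
midpoint = (a + b) / 2
print(a < midpoint < b)  # True
```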


Quiz
The table shows part of the dataset for a random sample from the 2000 U.S.
Census
• Who are the individuals described by these data?
• What type of variable is zipcode?
• What type of variable is Family_Size?
• What type of variable is Annual_income?
Reading for Next Time
• Pagano and Gauvreau, remainder of Chapter 2
