Spring Semester, 2020-2021

General information

Instructors:
• A.Prof.Dr. Nguyen Tan Khoi: ntkhoi@hcmiu.edu.vn (8 weeks)
• Ms. Do Ngoc Phuc Chau: dnpchau@hcmiu.edu.vn (7 weeks)
Course assessment: decided by A.Prof. Khoi
Labwork assessment (given by Ms. Chau): 30% quizzes, 70% lab exam
Course outline

- Chapter 1: Introduction
- Chapter 2: Descriptive statistics
- Chapter 3: Probability and probability distributions
- Chapter 4: Continuous probability distributions
- Chapter 5: Hypothesis testing
- Chapter 6: ANOVA
- Chapter 7: Regression and correlation analysis
- Chapter 8: Normality test
- Chapter 9: Non-parametric tests


INTRODUCTION
WHAT IS BIOSTATISTICS?
Statistics applied to biological fields.
So, what is statistics?

Statistics is the process of converting data into information. It consists of several steps, such as generating hypotheses, collecting data, and applying analysis methods.
Biostatistics, then, teaches us how to summarize, analyze, and draw meaningful inferences from data, which in turn lead to conclusions about hypotheses related to biological problems.
CATEGORIES of STATISTICS
How many pairs of shoes does each student in our class own?
Descriptive Statistics
• Collect
• Organize
• Summarize
• Display
• Analyze
Inferential Statistics
• Predict and forecast values of population parameters
• Test hypotheses about values of population parameters
TYPES of DATA
Qualitative data (Categorical or Nominal)
• Examples: color, gender, level of agreement
Quantitative data (Measurable or Countable)
• Discrete variable
• Continuous variable
• Examples: temperatures, salaries, number of students in a group
SCALES of MEASUREMENT
Nominal Scale – groups or classes
Ordinal Scale – order matters
Interval Scale – difference or distance matters – has an arbitrary zero value
Ratio Scale – ratio matters – has a natural zero value
DISPLAYING DATA
Discrete variables
• Pie chart
• Bar chart
• Line graph
Continuous variables
• Histogram
• Frequency polygon
• Ogive (or cumulative frequency graph)
• Stem-and-leaf diagram
• Scatter plot
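An ogive is drawn from a cumulative frequency table. A minimal Python sketch of that table, assuming equal-width class intervals that include their lower bound and exclude their upper bound; `cumulative_frequency` is a hypothetical helper, the data are made up, and plotting is left out:

```python
def cumulative_frequency(data, start, width, n_classes):
    """Return (upper boundary, frequency, cumulative frequency) for each class.

    Classes are half-open intervals [start + i*width, start + (i+1)*width);
    values falling outside every class are ignored.
    """
    rows = []
    running_total = 0
    for i in range(n_classes):
        lower = start + i * width
        upper = lower + width
        freq = sum(1 for x in data if lower <= x < upper)
        running_total += freq
        rows.append((upper, freq, running_total))
    return rows


# Hypothetical measurements, for illustration only
values = [12, 15, 18, 21, 22, 25, 27, 31, 34, 38]

# An ogive plots each cumulative count against its class's upper boundary.
for upper, freq, cum in cumulative_frequency(values, start=10, width=10, n_classes=3):
    print(f"below {upper}: frequency {freq}, cumulative {cum}")
```

The last cumulative count always equals the number of observations that fall inside the classes, which is a quick check that no value was missed.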
SAMPLE and POPULATION
A population – consists of the set of all measurements in which the investigator is interested
A sample – is a subset of the measurements selected from the population
A census – is a complete enumeration of every item in a population
?? Population vs. Census ??
POPULATION or SAMPLE
What is the average height of IU students?
Population: all of the roughly 7,000 IU students
• Impossible
• Impractical
• Too costly
• Takes a long time
Sample: 300 students
• Possible
• Easy to achieve
• Cheaper
• Faster
Sampling and Simple random sample
• Sampling from the population is often done randomly,
such that every possible sample of equal size (n) will have
an equal chance of being selected
• A sample selected in this way is called a simple random
sample or just a random sample
• A random sample allows chance to determine its elements
How can we draw a random sample of 300 students from all IU students?
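One way to answer the question above, sketched in Python under the assumption that every student can be listed and numbered: the IDs 1 to 7000 below are hypothetical stand-ins for the real student list.

```python
import random

# Hypothetical sampling frame: IDs 1..7000 stand in for the roughly
# 7,000 IU students (a real frame would be the official student list).
population_ids = list(range(1, 7001))

rng = random.Random(2021)  # fixed seed only so the draw can be reproduced

# random.sample draws without replacement, so every set of 300 distinct
# students has the same chance of being chosen (a simple random sample).
sample_ids = rng.sample(population_ids, k=300)

print(len(sample_ids), sorted(sample_ids)[:5])
```

The seed is only for reproducibility in class; dropping it gives a different random sample on each run.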
EXPERIMENT, SET and EVENT
Set and Complement of set
Intersection of sets
Union of sets
Mutually exclusive or disjoint sets
Partitions
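The set operations listed above map directly onto Python's built-in `set` type. A sketch using a hypothetical experiment (rolling one six-sided die); `is_partition` is a hypothetical helper written for this example:

```python
# Hypothetical experiment: roll one six-sided die.
S = {1, 2, 3, 4, 5, 6}   # sample space
A = {2, 4, 6}            # event "even number"
B = {4, 5, 6}            # event "greater than 3"

complement_A = S - A     # outcomes in S but not in A
intersection = A & B     # outcomes in both A and B
union = A | B            # outcomes in A, in B, or in both


def is_partition(parts, space):
    """True if the parts are non-empty, pairwise disjoint, and together cover space."""
    if any(not p for p in parts):
        return False
    union_all = set().union(*parts)
    # Pairwise disjoint exactly when no element is counted twice.
    total = sum(len(p) for p in parts)
    return total == len(union_all) and union_all == space


print(complement_A, intersection, union)
print(A.isdisjoint(complement_A))          # A and its complement are mutually exclusive
print(is_partition([A, complement_A], S))  # and together they partition S
print(is_partition([A, B], S))             # A and B overlap, so not a partition
```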
DESCRIPTIVE STATISTICS
SUMMARY MEASURES
Measures of Central Tendency

Median
Mode
Mean
Measures of Variability
Range
Interquartile range
Variance
Standard deviation
Other measures
Skewness
Kurtosis
Does the girl own more shoes than the boy?
MEASURES of CENTRAL TENDENCY
Median – middle value when sorted in order of magnitude
Mode – most frequently-occurring value
Mean – average
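Arithmetic Mean or Average
Population mean: $\mu = \frac{1}{N}\sum_{i=1}^{N} x_i$ (N = population size)
Sample mean: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$ (n = sample size)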
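A minimal Python sketch of the three measures of central tendency, using the standard-library `statistics` module and hypothetical shoe counts for nine students (not data from the course):

```python
import statistics

# Hypothetical data: pairs of shoes owned by nine students
shoes = [2, 3, 3, 4, 5, 5, 5, 7, 11]

median_value = statistics.median(shoes)  # middle (5th) value of the sorted list
mode_value = statistics.mode(shoes)      # most frequent value
mean_value = statistics.mean(shoes)      # sum of values / number of values

print(median_value, mode_value, mean_value)
```

Here all three measures equal 5 (a total of 45 pairs over 9 students); with skewed data they usually differ.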

MEASURES of VARIABILITY or DISPERSION
Range – difference between maximum and minimum values
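Interquartile range (IQR) – difference between the third and first quartiles
Variance – average of the squared deviations from the mean (the sample variance divides by n – 1 instead of n)
Population variance: $\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2$
Sample variance: $s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$
Standard deviation (SD) – square root of the variance
Population SD: $\sigma = \sqrt{\sigma^2}$
Sample SD: $s = \sqrt{s^2}$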
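The range, variance and standard deviation can be sketched with Python's `statistics` module on hypothetical data. `pvariance`/`pstdev` divide by N (population formulas), while `variance`/`stdev` divide by n – 1 (sample formulas):

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations, mean = 5

data_range = max(data) - min(data)     # 9 - 2 = 7
pop_var = statistics.pvariance(data)   # squared deviations sum to 32; 32 / N = 32 / 8 = 4
samp_var = statistics.variance(data)   # 32 / (n - 1) = 32 / 7
pop_sd = statistics.pstdev(data)       # square root of the population variance
samp_sd = statistics.stdev(data)       # square root of the sample variance

print(data_range, pop_var, round(samp_var, 3), pop_sd, round(samp_sd, 3))
```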
* Find the 80th percentile:
To find the 80th percentile, determine the data
point in position (n + 1) x P / 100 of the data set
Position (20 + 1) x 80 / 100 = 16.8
The 80th percentile is located at the 16.8th position of the ordered data set. The 16th observation in the ordered set is 32 and the 17th observation is 33.
The 80th percentile lies 0.8 of the way between the 16th and 17th values → the value is 16th + (17th – 16th) x 0.8 = 32 + (33 – 32) x 0.8 = 32.8
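The (n + 1) x P / 100 position rule with linear interpolation can be sketched as below. `percentile_n_plus_1` is a hypothetical helper name, and clamping positions outside 1..n to the minimum or maximum is an assumption the notes do not address (Excel's PERCENTILE.EXC, which uses the same (n + 1)-based positioning, reports an error there instead). The demo data are hypothetical:

```python
import math


def percentile_n_plus_1(data, p):
    """P-th percentile using the (n + 1) x P / 100 position rule.

    The result interpolates linearly between the two neighbouring ordered
    observations. Positions below 1 or above n are clamped to the
    minimum / maximum (an assumption, not part of the lecture notes).
    """
    x = sorted(data)
    n = len(x)
    pos = (n + 1) * p / 100      # 1-based position, e.g. 16.8
    if pos <= 1:
        return x[0]
    if pos >= n:
        return x[-1]
    k = math.floor(pos)          # whole part: the k-th ordered observation
    frac = pos - k               # fractional part, e.g. 0.8
    return x[k - 1] + (x[k] - x[k - 1]) * frac


# Hypothetical ordered data 1, 2, ..., 20 (n = 20), so the k-th value equals k
demo = list(range(1, 21))
print(percentile_n_plus_1(demo, 80))  # position 16.8
```

With real data the same call reproduces the hand calculation: find the position, then move the fractional part of the way from the lower to the upper neighbour.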
Quartiles – the percentile points that divide the ordered data set into quarters
• The first quartile, Q1, or lower quartile is the 25th percentile – the point below which ¼ of the data lie
• The second quartile, Q2, or middle quartile is the 50th percentile – the point below which ½ of the data lie. This is also the median
• The third quartile, Q3, or upper quartile is the 75th percentile – the point below which ¾ of the data lie
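For quartiles, Python's `statistics.quantiles` (Python 3.8+) with `method="exclusive"` (its default) places cut points at the same (n + 1)-based positions, so it should reproduce Q1, Q2 and Q3 as defined above. A sketch on hypothetical data:

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical, n = 8

# n=4 requests the three cut points that split the ordered data into quarters;
# "exclusive" positions them at (n + 1) x 25/100, (n + 1) x 50/100 and (n + 1) x 75/100.
q1, q2, q3 = statistics.quantiles(data, n=4, method="exclusive")
iqr = q3 - q1  # interquartile range

print(q1, q2, q3, iqr)
print(q2 == statistics.median(data))  # Q2 is the median
```

Here the positions are 2.25, 4.5 and 6.75, giving Q1 = 4.0, Q2 = 4.5, Q3 = 6.5 and IQR = 2.5.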
Example: finding percentiles
Find the 50th, 80th and 90th percentiles of a data set with n = 20 observations.
*** Sort the data in ascending order (e.g., with Excel's Sort command).
*** Determine the data point:
To find the 50th percentile (median), determine the data point in position (n + 1) x P / 100 of the data set
Position (20 + 1) x 50 / 100 = 10.5
In Excel, input =(20+1)*50/100
*** Determine the percentile value:
The 50th percentile is located at the 10.5th position of the data set.
The 10th observation in the ordered set is 22 and the 11th observation is also 22.
The 50th percentile lies halfway between the 10th and 11th values (both 22 in this case) → the value is 22
80th percentile: position (20 + 1) x 80 / 100 = 16.8
It lies 0.8 of the way between the 16th and 17th values → the value is 32 + (33 – 32) x 0.8 = 32.8
90th percentile: position (20 + 1) x 90 / 100 = 18.9
It lies 0.9 of the way between the 18th and 19th values → the value is 49 + (52 – 49) x 0.9 = 51.7
Example: finding quartiles
The interquartile range: IQR = Q3 – Q1

OTHER MEASURES
Skewness – measure of the degree of asymmetry of a frequency distribution
• Skewed to left
• Symmetric or unskewed
• Skewed to right
Kurtosis – measure of the flatness or peakedness of a frequency distribution
• Platykurtic (relatively flat)
• Mesokurtic (normal)
• Leptokurtic (relatively peaked)
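Skewness and kurtosis have several textbook formulas, and the notes do not give one, so the sketch below assumes the simple moment-based versions: skewness g1 = m3 / m2^1.5 and excess kurtosis g2 = m4 / m2^2 – 3, where mk is the k-th central moment. Roughly, g1 < 0 suggests left skew, g1 near 0 symmetry, g1 > 0 right skew; g2 < 0 suggests platykurtic, near 0 mesokurtic, g2 > 0 leptokurtic. `moment_skewness_kurtosis` is a hypothetical helper and the data are made up:

```python
def moment_skewness_kurtosis(data):
    """Moment-based skewness g1 and excess kurtosis g2 (divisor n).

    Spreadsheet functions such as Excel's SKEW and KURT apply
    small-sample adjustments, so their values differ somewhat.
    """
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in data) / n  # third central moment
    m4 = sum((x - mean) ** 4 for x in data) / n  # fourth central moment
    g1 = m3 / m2 ** 1.5
    g2 = m4 / m2 ** 2 - 3  # subtract 3 so a normal distribution scores 0
    return g1, g2


# Hypothetical data sets
print(moment_skewness_kurtosis([1, 2, 3, 4, 5]))   # symmetric and fairly flat
print(moment_skewness_kurtosis([1, 1, 1, 2, 10]))  # long right tail
```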
RELATIONS between the MEAN and S.D.
Chebyshev’s Theorem
•Applies to any distribution, regardless of shape
•Places lower limits on the percentages of observations within a
given number of standard deviations from the mean
Empirical Rule
•Applies only to roughly mound-shaped and symmetric
distributions
•Specifies approximate percentages of observations within a given
number of standard deviations from the mean
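Chebyshev’s Theorem: for any k > 1, at least 1 – 1/k² of the observations lie within k standard deviations of the mean (e.g., at least 75% within 2 SD and at least 88.9% within 3 SD)
Empirical Rule: approximately 68% of the observations lie within 1 SD of the mean, 95% within 2 SD, and 99.7% within 3 SD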
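A sketch comparing the share of observations actually found within k standard deviations of the mean against Chebyshev's lower bound and the empirical-rule percentages. The data are hypothetical, the helper names are made up for this example, and the population SD (`pstdev`) is used; the empirical-rule figures are only meaningful for mound-shaped, symmetric data:

```python
import statistics


def share_within_k_sd(data, k):
    """Fraction of observations within k standard deviations of the mean."""
    mean = statistics.mean(data)
    sd = statistics.pstdev(data)
    return sum(1 for x in data if abs(x - mean) <= k * sd) / len(data)


def chebyshev_lower_bound(k):
    """Chebyshev: at least 1 - 1/k**2 of any data set lies within k SD (k > 1)."""
    return 1 - 1 / k ** 2


# Approximate shares, valid only for roughly mound-shaped, symmetric data
EMPIRICAL_RULE = {1: 0.68, 2: 0.95, 3: 0.997}

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations (mean 5, SD 2)

for k in (2, 3):
    print(f"k={k}: observed {share_within_k_sd(data, k):.3f}, "
          f"Chebyshev at least {chebyshev_lower_bound(k):.3f}, "
          f"empirical rule about {EMPIRICAL_RULE[k]:.3f}")
```

Whatever the data, the observed share never falls below the Chebyshev bound; the empirical rule can be far off for small or skewed data sets.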
Homework
Bulimic Adolescents (n = 23)    |  Healthy Adolescents (n = 15)
15.9   17.0   18.9              |  30.6   40.8
16.0   17.6   19.6              |  25.7   37.4
16.5   28.7   21.5              |  25.3   37.1
18.9   28.0   24.1              |  24.5   30.6
18.4   25.6   23.6              |  20.7   33.2
18.1   25.2   22.9              |  22.4   33.7
30.9   25.1   21.6              |  23.1   36.6
29.2   24.5                     |  23.8
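A possible starting point for the homework in Python, assuming the first three columns of the table belong to the bulimic group (23 values) and the last two to the healthy group (15 values), which is how the header appears to be laid out; please check this split against the original handout. It only computes summary measures from this chapter:

```python
import statistics

# Assumed split of the homework table: first three columns = bulimic (n = 23),
# last two columns = healthy (n = 15).
bulimic = [15.9, 16.0, 16.5, 18.9, 18.4, 18.1, 30.9, 29.2,
           17.0, 17.6, 28.7, 28.0, 25.6, 25.2, 25.1, 24.5,
           18.9, 19.6, 21.5, 24.1, 23.6, 22.9, 21.6]
healthy = [30.6, 25.7, 25.3, 24.5, 20.7, 22.4, 23.1, 23.8,
           40.8, 37.4, 37.1, 30.6, 33.2, 33.7, 36.6]

for name, group in (("Bulimic", bulimic), ("Healthy", healthy)):
    # Quartiles by the (n + 1)-based "exclusive" positions
    q1, q2, q3 = statistics.quantiles(group, n=4, method="exclusive")
    print(f"{name}: n={len(group)}, mean={statistics.mean(group):.2f}, "
          f"median={statistics.median(group):.2f}, "
          f"SD={statistics.stdev(group):.2f}, IQR={q3 - q1:.2f}")
```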
