
Spring semester, 2020-2021

General information

• Instructors:
  A.Prof.Dr. Nguyen Tan Khoi: ntkhoi@hcmiu.edu.vn (8 weeks)
  Ms. Do Ngoc Phuc Chau: dnpchau@hcmiu.edu.vn (7 weeks)
• Course assessment: decided by A.Prof. Khoi
• Labwork assessment (given by Ms. Chau): 30% quizzes, 70% lab exam
Course outline

- Chapter 1: Introduction
- Chapter 2: Descriptive statistics
- Chapter 3: Probability & Distribution of probability
- Chapter 4: Continuous distribution of probability
- Chapter 5: Hypothesis testing
- Chapter 6: ANOVA
- Chapter 7: Regression and correlation analysis
- Chapter 8: Normality test
- Chapter 9: Non-parametric tests


INTRODUCTION
WHAT IS BIOSTATISTICS?
Biostatistics is statistics applied to biological fields.

So, what is statistics?

Statistics is the process of converting data into information.
It consists of several steps, such as generating hypotheses, collecting data, and applying analysis methods.

Biostatistics, then, teaches us how to summarize, analyze, and draw meaningful inferences from data, which in turn lead to the confirmation of hypotheses related to biological problems.
CATEGORIES of STATISTICS
How many pairs of shoes does each
student in our class own?
Descriptive Statistics
• Collect
• Organize
• Summarize
• Display
• Analyze

Inferential Statistics
• Predict and forecast values of population parameters
• Test hypotheses about values of population parameters
TYPES of DATA
Qualitative data (Categorical or Nominal)
Examples:
• Color
• Gender
• Level of agreement

Quantitative data (Measurable or Countable)
• Discrete variable
• Continuous variable
Examples:
• Temperatures
• Salaries
• Number of students in a group

SCALES of MEASUREMENT
Nominal Scale – groups or classes
Ordinal Scale – order matters
Interval Scale – difference or distance matters – has arbitrary zero value
Ratio Scale – ratio matters – has a natural zero value
DISPLAYING DATA

Discrete variables
Pie chart
Bar chart
Line graph

Continuous variables
Histogram
Frequency polygon
Ogive (or Cumulative frequency graph)
Stem-and-Leaf diagram
Scatter plot
SAMPLE and POPULATION

A population consists of the set of all measurements in which the investigator is interested.

A sample is a subset of the measurements selected from the population.

A census is a complete enumeration of every item in a population.

Population vs. Census?
POPULATION or SAMPLE
What is average height of IU students?
Population: all IU students (about 7,000)
• Impossible
• Impractical
• Too costly
• Takes a long time

Sample: 300 students
• Possible
• Easy to achieve
• Cheaper
• Faster
Sampling and Simple random sample
• Sampling from the population is often done randomly,
such that every possible sample of equal size (n) will have
an equal chance of being selected
• A sample selected in this way is called a simple random
sample or just a random sample
• A random sample allows chance to determine its elements
How can we draw a random sample of 300 students from all IU students?
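One possible answer, sketched in Python: number every student, then let a random number generator pick 300 of them without replacement. The ID range 1..7000 and the seed are assumptions made for illustration; the slides give only the approximate population size, not a real student roster.

```python
import random

# Hypothetical sampling frame: student IDs 1..7000 (not a real roster).
population = list(range(1, 7001))

# A fixed seed (an arbitrary choice) makes the draw reproducible.
rng = random.Random(2021)

# sample() draws without replacement, so every subset of size 300
# has the same chance of being selected.
sample = rng.sample(population, k=300)

print(len(sample))   # number of students drawn
print(sample[:5])    # first few selected IDs
```

In practice the frame would be the registrar's list of student IDs rather than a plain range of integers.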
EXPERIMENT, SET and EVENT
Set and Complement of set
Intersection of sets
Union of sets
Mutually exclusive or disjoint sets
Partitions
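The set operations listed above can be tried out with Python's built-in `set` type. The die-rolling experiment, the event names, and the helper `is_partition` are assumptions chosen for illustration; they are not taken from the slides.

```python
# Experiment (assumed example): roll one six-sided die.
sample_space = {1, 2, 3, 4, 5, 6}

even = {2, 4, 6}            # event A: the roll is even
greater_than_4 = {5, 6}     # event B: the roll is greater than 4

odd = sample_space - even               # complement of A in the sample space
intersection = even & greater_than_4    # outcomes in both A and B
union = even | greater_than_4           # outcomes in A or B (or both)

# Mutually exclusive (disjoint) events share no outcomes.
disjoint = even.isdisjoint(odd)


def is_partition(parts, whole):
    """True if the parts are non-empty, pairwise disjoint, and cover `whole`."""
    covered = set()
    for part in parts:
        if not part or covered & part:
            return False
        covered |= part
    return covered == whole


print(odd, intersection, union, disjoint)
print(is_partition([even, odd], sample_space))
```

Here the even and odd outcomes form a partition of the sample space, while `even` and `greater_than_4` do not, because they overlap and leave some outcomes uncovered.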
DESCRIPTIVE STATISTICS
SUMMARY MEASURES

Measures of Central Tendency


Median
Mode
Mean

Measures of Variability
Range
Interquartile range
Variance
Standard deviation

Other measures
Skewness
Kurtosis
Does the girl own more shoes than the boy?
MEASURES of CENTRAL TENDENCY

Median – middle value when sorted in order of magnitude

Mode – most frequently-occurring value

Mean – average
Arithmetic Mean or Average

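Population Mean: $\mu = \frac{1}{N}\sum_{i=1}^{N} x_i$ (N = population size)
Sample Mean: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$ (n = sample size)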
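As a quick check of the three measures of central tendency, here is a minimal sketch using Python's standard `statistics` module. The shoe counts are invented for illustration and are not data from the course.

```python
import statistics

# Hypothetical data: pairs of shoes owned by ten students.
shoes = [2, 3, 3, 4, 5, 5, 5, 7, 8, 10]

mean = statistics.mean(shoes)      # sum of the values divided by their count
median = statistics.median(shoes)  # middle value of the sorted data
mode = statistics.mode(shoes)      # most frequently occurring value

print(mean, median, mode)
```

With an even number of observations, `statistics.median` averages the two middle values.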


MEASURES of VARIABILITY or DISPERSION

Range – difference between maximum and minimum values

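Interquartile range (IQR) – difference between the third and first quartiles

Variance – average of the squared deviations from the mean

Population Variance: $\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2$ (μ = population mean)
Sample Variance: $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$ (x̄ = sample mean)

Standard deviation (SD) – square root of the variance

Population SD: $\sigma = \sqrt{\sigma^2}$
Sample SD: $s = \sqrt{s^2}$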
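The difference between the population and sample versions (dividing by N versus n − 1) is easy to see numerically. The sketch below uses Python's `statistics` module on invented shoe counts; the data are an assumption for illustration only.

```python
import statistics

# Hypothetical data: pairs of shoes owned by ten students.
shoes = [2, 3, 3, 4, 5, 5, 5, 7, 8, 10]

data_range = max(shoes) - min(shoes)   # maximum minus minimum

pop_var = statistics.pvariance(shoes)  # squared deviations divided by N
samp_var = statistics.variance(shoes)  # squared deviations divided by n - 1

pop_sd = statistics.pstdev(shoes)      # square root of the population variance
samp_sd = statistics.stdev(shoes)      # square root of the sample variance

print(data_range, pop_var, samp_var, pop_sd, samp_sd)
```

The sample variance is always at least as large as the population variance computed from the same values, because it divides the same sum by a smaller number.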
Quartiles – the percentage points that divide the ordered data set
into quarters
• The first quartile, Q1, or lower quartile is the 25th percentile – the
point below which lie ¼ of the data
• The second quartile, Q2, or middle quartile is the 50th percentile –
the point below which lie ½ of the data. This is also called the median
• The third quartile, Q3, or upper quartile is the 75th percentile – the
point below which lie ¾ of the data
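Quartiles and the interquartile range can be computed with `statistics.quantiles` (Python 3.8 or later). Its `"exclusive"` method interpolates at position i(n + 1)/4, which corresponds to the (n + 1) x P / 100 position rule for P = 25, 50, 75. The eleven values below are invented for illustration.

```python
import statistics

# Hypothetical data set with n = 11, already sorted for readability.
data = [2, 4, 4, 5, 7, 8, 9, 10, 12, 13, 15]

# n=4 asks for the three cut points that split the data into quarters.
q1, q2, q3 = statistics.quantiles(data, n=4, method="exclusive")

iqr = q3 - q1  # interquartile range: third quartile minus first quartile

print(q1, q2, q3, iqr)
```

With n = 11 the quartile positions are 3, 6 and 9, so Q1, Q2 and Q3 fall exactly on the 3rd, 6th and 9th ordered values and no interpolation is needed.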
Example: finding percentile

Find the 50th, 80th and 90th percentiles of this data set.
*** Sorting data (by Excel):

???

???
*** Determine the data point:

To find the 50th percentile (median), determine the data point in position (n + 1) x P / 100 of the data set.
Position: (20 + 1) x 50 / 100 = 10.5
In Excel, input =(20+1)*50/100
*** Determine the percentile value:

* Find the 50th percentile:
The 50th percentile is located at the 10.5th position of the data set.
The 10th observation in the ordered set is 22 and the 11th observation is also 22.
The 50th percentile lies halfway between the 10th and 11th values (which are both 22 in this case) → the value is 22
* Find the 80th percentile:
To find the 80th percentile, determine the data point in position (n + 1) x P / 100 of the data set.
Position: (20 + 1) x 80 / 100 = 16.8
The 80th percentile is located at the 16.8th position of the data set.
The 16th observation in the ordered set is 32 and the 17th observation is 33.
The 80th percentile lies 0.8 of the way from the 16th to the 17th value → the value is 16th + (17th – 16th) * 0.8 = 32 + (33 – 32) * 0.8 = 32.8
* Find the 90th percentile:
To find the 90th percentile, determine the data point in position (n + 1) x P / 100 of the data set.
Position: (20 + 1) x 90 / 100 = 18.9
The 90th percentile is located at the 18.9th position of the data set.
The 18th observation in the ordered set is 49 and the 19th observation is 52.
The 90th percentile lies 0.9 of the way from the 18th to the 19th value → the value is 49 + (52 – 49) * 0.9 = 51.7
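The position rule used in these three calculations can be written as a small function. This is a sketch, not the course's official procedure. The function name `percentile` is hypothetical, and the 20 values are invented for illustration: only the observations at positions 10, 11, 16, 17, 18 and 19 (22, 22, 32, 33, 49, 52) come from the worked example, because the original data set is not reproduced in the text.

```python
def percentile(values, p):
    """Return the p-th percentile using the (n + 1) * P / 100 position rule."""
    data = sorted(values)
    n = len(data)
    position = (n + 1) * p / 100   # 1-based, possibly fractional position
    lower = int(position)          # whole part: rank of the lower neighbour
    fraction = position - lower    # how far to move toward the upper neighbour
    if lower < 1:                  # position falls before the first value
        return data[0]
    if lower >= n:                 # position falls at or after the last value
        return data[-1]
    below, above = data[lower - 1], data[lower]
    return below + (above - below) * fraction


# Hypothetical data: 20 values chosen to agree with the worked example
# at positions 10, 11, 16, 17, 18 and 19; the remaining values are made up.
data = [6, 9, 10, 12, 13, 14, 14, 16, 17, 22,
        22, 24, 24, 26, 27, 32, 33, 49, 52, 56]

for p in (50, 80, 90):
    print(p, round(percentile(data, p), 1))
```

For this assumed data set the function should reproduce the values found by hand: 22, 32.8 and 51.7.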
Example: finding Quartile

The interquartile range ????


OTHER MEASURES

Skewness – measure of the degree of asymmetry of a frequency distribution
• Skewed to left
• Symmetric or unskewed
• Skewed to right

Kurtosis – measure of flatness or peakedness of a frequency distribution
• Platykurtic (relatively flat)
• Mesokurtic (normal)
• Leptokurtic (relatively peaked)
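Both measures can be computed from central moments. The sketch below uses the moment-based (population) definitions, which is one common convention; statistical packages may apply small-sample corrections and report slightly different numbers. The function names and the two small data sets are assumptions for illustration.

```python
def central_moment(data, order):
    """Average of (x - mean) raised to the given power."""
    mean = sum(data) / len(data)
    return sum((x - mean) ** order for x in data) / len(data)


def skewness(data):
    """Third central moment divided by the variance to the power 1.5.

    Negative: skewed to the left; zero: symmetric; positive: skewed to the right.
    """
    return central_moment(data, 3) / central_moment(data, 2) ** 1.5


def excess_kurtosis(data):
    """Fourth central moment divided by the squared variance, minus 3.

    Negative: platykurtic; near zero: mesokurtic; positive: leptokurtic.
    """
    return central_moment(data, 4) / central_moment(data, 2) ** 2 - 3


symmetric = [1, 2, 3, 4, 5]          # hypothetical symmetric data
right_skewed = [1, 1, 2, 2, 3, 10]   # hypothetical data with a long right tail

print(skewness(symmetric), excess_kurtosis(symmetric))
print(skewness(right_skewed))
```

Subtracting 3 sets the normal distribution to zero, so the sign of the result maps directly onto the platykurtic, mesokurtic, and leptokurtic labels.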
RELATIONS between the MEAN and S.D.

Chebyshev’s Theorem
•Applies to any distribution, regardless of shape
•Places lower limits on the percentages of observations within a
given number of standard deviations from the mean

Empirical Rule
•Applies only to roughly mound-shaped and symmetric
distributions
•Specifies approximate percentages of observations within a given
number of standard deviations from the mean
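Chebyshev's Theorem: at least 1 − 1/k² of the observations lie within k standard deviations of the mean, for any k > 1 (at least 75% for k = 2, about 89% for k = 3).
Empirical Rule: approximately 68%, 95%, and 99.7% of the observations lie within 1, 2, and 3 standard deviations of the mean, respectively.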
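The two rules can be compared numerically. In this sketch the Chebyshev bound is 1 − 1/k², and the Empirical Rule percentages are taken as the exact proportions of a normal distribution, computed with `math.erf`. The function names are hypothetical, and treating "mound-shaped and symmetric" as exactly normal is an assumption.

```python
import math


def chebyshev_lower_bound(k):
    """Minimum fraction of observations within k SDs of the mean (any shape)."""
    if k <= 1:
        raise ValueError("Chebyshev's bound is only informative for k > 1")
    return 1 - 1 / k ** 2


def normal_fraction_within(k):
    """Fraction of a normal distribution within k SDs of the mean."""
    return math.erf(k / math.sqrt(2))


for k in (2, 3):
    print(k, round(chebyshev_lower_bound(k), 4), round(normal_fraction_within(k), 4))

print(1, round(normal_fraction_within(1), 4))
```

Chebyshev's bound is much looser than the Empirical Rule because it has to hold for every distribution, not only for mound-shaped ones.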
Homework

Bulimic Adolescents          Healthy Adolescents
15.9    17.0    18.9         30.6    40.8
16.0    17.6    19.6         25.7    37.4
16.5    28.7    21.5         25.3    37.1
18.9    28.0    24.1         24.5    30.6
18.4    25.6    23.6         20.7    33.2
18.1    25.2    22.9         22.4    33.7
30.9    25.1    21.6         23.1    36.6
29.2    24.5                 23.8
