Biostatistics

Lecture plan

1. Basics

2. Variable types

3. Descriptive statistics:

- Categorical data

- Numerical data

4. Inferential statistics:

- Confidence intervals

- Hypothesis testing

DEFINITIONS

STATISTICS can mean 2 things:

- the numbers we get when we measure and count things (data);

- a collection of procedures for describing and analysing data.

BIOSTATISTICS: the application of statistics in the natural sciences, when biomedical problems are analysed.


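Statistics has two branches:

- Descriptive: summarises and presents the data of a sample

- Inferential: draws conclusions about the population from the sample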

Terminology

Population

Sample

Variables

Variable types

Categorical (qualitative)

Numerical (quantitative)

Combined

Categorical data

Nominal (2 categories or >2 categories)

Ordinal

Numerical data

Continuous

Discrete

Description of categorical data

Arranging data

Frequencies, tables

Visualization (graphical presentation)

Frequencies and contingency tables

From those who were unsatisfied, 4 were males and 6 were females.

Values are n (column %):

              Total        Males        Females
Satisfied     40 (80%)     14 (77.8%)   26 (81.3%)
Unsatisfied   10 (20%)     4 (22.2%)    6 (18.7%)
Total         50 (100%)    18 (100%)    32 (100%)
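The column percentages in the table can be recomputed from the raw counts. A minimal Python sketch, assuming the table is stored as a nested dictionary (the names `counts`, `add_total_column` and `column_percentages` are illustrative, not from the lecture):

```python
# Counts from the satisfaction table: rows = answer, columns = sex.
counts = {
    "Satisfied": {"Males": 14, "Females": 26},
    "Unsatisfied": {"Males": 4, "Females": 6},
}


def add_total_column(table):
    """Append a 'Total' column holding each row's sum."""
    return {answer: {**row, "Total": sum(row.values())} for answer, row in table.items()}


def column_percentages(table):
    """Express every cell as a percentage of its column total."""
    columns = list(next(iter(table.values())))
    col_totals = {c: sum(row[c] for row in table.values()) for c in columns}
    return {
        answer: {c: 100 * row[c] / col_totals[c] for c in columns}
        for answer, row in table.items()
    }


pct = column_percentages(add_total_column(counts))
for answer, row in pct.items():
    print(answer, {col: f"{value:.2f}%" for col, value in row.items()})
```

Printed with two decimals, the female column shows 81.25% and 18.75%; the slide rounds these to 81.3% and 18.7% so that the column adds up to 100%.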

Graphical presentation

Other:

- Maps

- Chernoff faces

- Star plots, etc.

Description of numerical data

Arranging data

Frequencies (relative and cumulative), graphical presentation

Measures of central tendency and variability (dispersion)

Assessing normality

Grouping

Sort the data, then form groups (usually 5-17) according to the researcher's criteria.
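A sketch of grouping numeric data into classes with relative and cumulative frequencies. It assumes equal-width classes (one possible researcher's criterion) and uses a hypothetical list of cash amounts, because the survey data of the 197 students are not reproduced here; `frequency_table` is an illustrative name.

```python
def frequency_table(values, n_groups):
    """Group values into n_groups equal-width classes [lower, upper) and return
    (lower, upper, count, relative %, cumulative %) for each class.
    The maximum value is placed in the last class."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all values are equal; nothing to group")
    width = (hi - lo) / n_groups
    counts = [0] * n_groups
    for v in values:
        idx = min(int((v - lo) / width), n_groups - 1)
        counts[idx] += 1
    n = len(values)
    rows, cumulative = [], 0
    for i, count in enumerate(counts):
        cumulative += count
        rows.append((lo + i * width, lo + (i + 1) * width, count,
                     100 * count / n, 100 * cumulative / n))
    return rows


# Hypothetical cash amounts, NOT the survey data from the lecture.
cash = [0, 2, 3, 5, 5, 7, 8, 10, 12, 15, 15, 18, 20, 25, 30, 40, 50, 60]
table = frequency_table(cash, n_groups=5)
for lower, upper, count, rel, cum in table:
    print(f"{lower:5.1f}-{upper:5.1f}  n={count:2d}  {rel:5.1f}%  cumulative {cum:5.1f}%")
```

The last cumulative percentage is always 100%, which is a quick check that no observation was lost during grouping.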

… and calculation

Example: 197 students were asked about the amount of money (litas) they had in cash at the …

Graphical presentation of frequencies

Normal distribution

Most values lie close to the central value; values above and below it are less frequent and occur in approximately the same proportions.

Also called the Gaussian distribution.

Asymmetrical (skewed) distribution
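To see what asymmetry does to the measures of central tendency, the sketch below draws a hypothetical symmetric (Gaussian) sample and a hypothetical right-skewed sample (an exponential distribution, chosen here only as an example of skew) and compares mean and median.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is reproducible
symmetric = [random.gauss(100, 15) for _ in range(10_000)]        # mean 100, SD 15
right_skewed = [random.expovariate(1 / 20) for _ in range(10_000)]  # mean 20, long right tail

for name, data in [("symmetric", symmetric), ("right-skewed", right_skewed)]:
    print(f"{name:13s} mean={statistics.mean(data):6.1f} "
          f"median={statistics.median(data):6.1f}")
```

In the symmetric sample mean and median nearly coincide; in the right-skewed one the long tail pulls the mean above the median.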

How can you describe/present your respondents if the data are numeric?

2 groups of measures:

1. Central tendency (central value, average)

2. Variability (dispersion)

MEASURES OF CENTRAL TENDENCY

Means/averages (arithmetic, geometric, harmonic, etc.)

Mode

Median

Quartiles
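Python's standard `statistics` module (Python 3.8+) provides the measures listed above. A short sketch on a hypothetical data set:

```python
import statistics

data = [2, 3, 3, 4, 5, 6, 9]  # hypothetical positive values

arithmetic = statistics.mean(data)
geometric = statistics.geometric_mean(data)   # Python 3.8+, values must be positive
harmonic = statistics.harmonic_mean(data)
mode = statistics.mode(data)                  # most frequent value
median = statistics.median(data)              # middle value of the sorted data
quartiles = statistics.quantiles(data, n=4)   # Python 3.8+, default "exclusive" method

print(f"arithmetic {arithmetic:.2f}, geometric {geometric:.2f}, harmonic {harmonic:.2f}")
print(f"mode {mode}, median {median}, quartiles {quartiles}")
```

Quartile values depend on the calculation method; other statistical software may use a different definition and give slightly different quartiles for the same data.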


Median: the 50th percentile (the value of the observation that divides the sorted data into two equal parts).

It is found this way:

- when n is odd, the median is the value of the middle observation, at position (n + 1)/2;

- when n is even, the median is the average of the values of the two middle observations.
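The odd/even rule for the median translates directly into code. A minimal sketch (the function name `median` is illustrative; `statistics.median` in the standard library follows the same rule):

```python
def median(values):
    """Median by the odd/even rule: sort, then take the middle value (n odd)
    or the average of the two middle values (n even)."""
    if not values:
        raise ValueError("median of empty data")
    s = sorted(values)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:              # n odd: position (n + 1) / 2 in 1-based counting
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2   # n even: average of the two middle values


print(median([7, 1, 3]))      # odd n  -> 3
print(median([7, 1, 3, 10]))  # even n -> 5.0
```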

Mode: the most frequently occurring value (or values).

Quartiles: the values that divide the sorted sample into 4 equal parts, with 25% of the observations in each part.

Is a measure of central tendency enough to describe respondents?
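A quick illustration of why central tendency alone is not enough: the two hypothetical groups below have the same mean but very different spread.

```python
import statistics

group_a = [48, 49, 50, 51, 52]   # hypothetical values clustered around 50
group_b = [30, 40, 50, 60, 70]   # hypothetical values widely spread around 50

for name, g in [("A", group_a), ("B", group_b)]:
    print(name, "mean =", statistics.mean(g), "SD =", round(statistics.stdev(g), 2))
```

Both groups have a mean of 50, but the standard deviation of B is ten times that of A, so a measure of variability is needed as well.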

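MEASURES OF VARIABILITY (DISPERSION)

Min and max

Range (max − min)

Variance: $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$

Standard deviation (SD): the square root of the variance, $SD = \sqrt{s^2}$

Interquartile range (IQR): Q3 − Q1, i.e. the 75th percentile minus the 25th percentile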
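The measures of variability can be computed by hand from their definitions and checked against the standard library. A sketch on a hypothetical sample; the quartiles come from `statistics.quantiles` (Python 3.8+), whose default method may differ slightly from other software.

```python
import math
import statistics

data = [4, 7, 8, 8, 9, 10, 12, 15]   # hypothetical sample

n = len(data)
mean = sum(data) / n
value_range = max(data) - min(data)
# Sample variance with the n - 1 denominator.
variance = sum((x - mean) ** 2 for x in data) / (n - 1)
sd = math.sqrt(variance)                       # SD = square root of the variance
q1, _, q3 = statistics.quantiles(data, n=4)    # "exclusive" method by default
iqr = q3 - q1

print(f"min={min(data)} max={max(data)} range={value_range}")
print(f"variance={variance:.3f} SD={sd:.3f} IQR={iqr}")
```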

Which measures to use for sample description?

If the distribution is NORMAL:

Mean

Variance (or standard deviation)

If the distribution is NOT normal:

Median

IQR or min/max

Comparing x̄, Mo and Me: in a normal distribution Mean ≈ Median ≈ Mode.

SD and the empirical rule.

EMPIRICAL RULE

If the distribution is normal, about 68% of values lie within ±1 SD of the mean, about 95% within ±2 SD and about 99.7% within ±3 SD.

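Example

x̄ = 8, SD = 2.5

−2SD: 8 − 2 × 2.5 = 3

+2SD: 8 + 2 × 2.5 = 13

If the distribution is normal, about 95% of the values lie between 3 and 13.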
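For a normal distribution, the shares behind the empirical rule can be computed exactly with `statistics.NormalDist` (Python 3.8+). This sketch uses the mean and SD from the example above (x̄ = 8, SD = 2.5):

```python
from statistics import NormalDist

mean, sd = 8, 2.5
dist = NormalDist(mean, sd)   # normal distribution with this mean and SD

for k in (1, 2, 3):
    low, high = mean - k * sd, mean + k * sd
    share = dist.cdf(high) - dist.cdf(low)   # probability of falling in [low, high]
    print(f"±{k} SD: {low:.1f} to {high:.1f} contains {share:.1%} of values")
```

The ±2 SD interval is 3 to 13 and holds about 95.4% of values, which the rule rounds to 95%.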

Normality assessment

Summary

Graphical

Comparison of measures of central tendency; empirical rule (mean and standard deviation)

Skewness and kurtosis (both = 0 for a Gaussian distribution, with kurtosis expressed as excess kurtosis)

Kolmogorov-Smirnov test
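Skewness and kurtosis can be computed from the central moments of the data. The sketch below uses the simple moment estimators; this is an assumption, since statistical packages often report bias-adjusted versions whose values differ slightly for small samples. A Kolmogorov-Smirnov test is available in libraries such as SciPy (`scipy.stats.kstest`) and is not reimplemented here.

```python
def skewness_kurtosis(values):
    """Moment-based sample skewness (g1) and excess kurtosis (g2).
    Both are about 0 for data from a Gaussian distribution."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    g1 = m3 / m2 ** 1.5          # > 0: right (positive) skew, < 0: left skew
    g2 = m4 / m2 ** 2 - 3        # subtract 3 so that the Gaussian value is 0
    return g1, g2


print(skewness_kurtosis([1, 2, 3, 4, 5]))          # symmetric, flatter than Gaussian
print(skewness_kurtosis([1, 1, 1, 2, 2, 3, 10]))   # long right tail
```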

Boxplot

[Figure: boxplot with labelled parts: 75th percentile, mean (*), median, 25th percentile, outliers]

[Figure: Boxplot example]
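The quantities drawn in a boxplot can be computed directly. This sketch assumes the common (Tukey) convention that outliers are values more than 1.5 × IQR below Q1 or above Q3; the lecture does not state which rule its software used, and the data are hypothetical.

```python
import statistics


def boxplot_summary(values):
    """Quartiles, median, mean and outliers as shown in a boxplot.
    Outliers use the 1.5 x IQR fences (an assumed convention)."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [x for x in values if x < low_fence or x > high_fence]
    return {"q1": q1, "median": median, "q3": q3,
            "mean": statistics.mean(values), "outliers": outliers}


summary = boxplot_summary([14, 16, 17, 18, 18, 19, 20, 21, 22, 30])
print(summary)
```

Here the box would run from 16.75 to 21.25, and the value 30 lies above the upper fence (28.0), so it would be drawn as an outlier.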

Inferential statistics

Confidence intervals

Hypothesis testing

Confidence intervals

An interval in which the true (population) value most likely lies.

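[Figure: a population (μ, σ, p0) and samples drawn from it, each with its own measures: x̄1, SD1, p1; x̄2, SD2, p2; x̄3, SD3, p3; x̄4, SD4, p4]

[Figure: confidence intervals of the samples compared with the true population value (μ, p0)]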

Confidence interval

Statistical definition:

If the study were carried out 100 times, giving 100 results and 100 95% CIs, about 95 of the 100 intervals would contain the true value and about 5 would not.
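The repeated-study definition can be checked by simulation. This sketch assumes a hypothetical normal population with a known mean, draws 1000 samples, builds a 95% CI (x̄ ± 1.96 SE) for each, and counts how many intervals contain the true mean; the count should be close to 950.

```python
import math
import random
import statistics

random.seed(42)                 # reproducible run
TRUE_MEAN, TRUE_SD = 50, 10     # hypothetical population parameters
N, STUDIES = 100, 1000          # sample size and number of repeated studies

covered = 0
for _ in range(STUDIES):
    sample = [random.gauss(TRUE_MEAN, TRUE_SD) for _ in range(N)]
    mean = statistics.fmean(sample)                 # Python 3.8+
    se = statistics.stdev(sample) / math.sqrt(N)    # SE = SD / sqrt(N)
    low, high = mean - 1.96 * se, mean + 1.96 * se
    if low <= TRUE_MEAN <= high:
        covered += 1

print(f"{covered} of {STUDIES} intervals contain the true mean")
```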

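Confidence intervals (calculation)

Numeric data: 95% CI: $\bar{x} \pm 1.96 \cdot SE$, giving $(\bar{x}_{min};\ \bar{x}_{max})$

Categorical data: 95% CI: $p \pm 1.96 \cdot SE$, giving $(p_{min};\ p_{max})$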

Standard error (SE)

Numeric data ($\bar{x}$): $SE = \frac{SD}{\sqrt{N}}$

Categorical data ($p$): $SE = \sqrt{\frac{p(1-p)}{N}}$
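The CI formulas can be wrapped in two small functions (illustrative names). The proportion example uses the satisfaction table (40 of 50 respondents satisfied); the mean example reuses x̄ = 8 and SD = 2.5 from the empirical-rule example with a hypothetical sample size of N = 100.

```python
import math


def ci_mean(mean, sd, n, z=1.96):
    """95% CI for a mean: mean +/- z * SD / sqrt(N)."""
    se = sd / math.sqrt(n)
    return mean - z * se, mean + z * se


def ci_proportion(p, n, z=1.96):
    """95% CI for a proportion: p +/- z * sqrt(p(1 - p) / N)."""
    se = math.sqrt(p * (1 - p) / n)
    return p - z * se, p + z * se


print(ci_proportion(40 / 50, 50))   # satisfied respondents, 80%
print(ci_mean(8, 2.5, 100))         # N = 100 is hypothetical
```

This normal-approximation interval for a proportion becomes unreliable for small samples or for proportions close to 0 or 1.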

The width of the confidence interval depends on:

a) sample size;

b) confidence level (usually 95%, but any level can be chosen);

c) dispersion.
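The effect of these factors can be seen by computing the CI width, 2 × z × SD / √N, for a few sample sizes and confidence levels. The z value comes from `statistics.NormalDist().inv_cdf` (Python 3.8+); the SD of 2.5 is hypothetical.

```python
import math
from statistics import NormalDist


def ci_width(sd, n, level=0.95):
    """Full width of a normal-approximation CI for a mean."""
    z = NormalDist().inv_cdf(0.5 + level / 2)   # about 1.96 for a 95% level
    return 2 * z * sd / math.sqrt(n)


sd = 2.5  # hypothetical dispersion
for n in (25, 100, 400):
    print(n, [round(ci_width(sd, n, lvl), 2) for lvl in (0.90, 0.95, 0.99)])
```

Larger samples give narrower intervals (quadrupling N halves the width), a higher confidence level gives wider ones, and a larger SD widens the interval proportionally.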

Hypothesis testing

H0: $\mu_1 = \mu_2$; $p_1 = p_2$ (RR = 1, OR = 1, difference = 0)

HA: $\mu_1 \neq \mu_2$; $p_1 \neq p_2$ (two-sided or one-sided)

Hypothesis testing

Significance level α (conventionally 0.05).

Test to obtain the P value (t test, χ² test, etc.).

The P value is the probability of getting a difference (association) at least as large as the one observed, if the null hypothesis is true.

In other words, the P value is the probability of getting the difference (association) by chance alone, when the null hypothesis is true.

Statistical conventions

P < 0.05: the difference is unlikely to be explained by chance alone, therefore we reject H0 and accept HA.

P ≥ 0.05: the difference can be due to chance alone, therefore we do not reject H0.
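As a sketch of the whole procedure (H0, test, P value, decision at α = 0.05), the satisfaction data can be used to compare the proportion of satisfied males (14 of 18) and females (26 of 32) with a two-sided z test for two proportions. The choice of this test is an assumption made for illustration.

```python
import math
from statistics import NormalDist


def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided z test of H0: p1 = p2 using the pooled proportion."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))   # both tails
    return z, p_value


z, p = two_proportion_z_test(14, 18, 26, 32)   # satisfied males vs females
alpha = 0.05
decision = "reject H0" if p < alpha else "do not reject H0"
print(f"z = {z:.2f}, P = {p:.2f} -> {decision}")
```

With these small groups the expected number of unsatisfied males is only 18 × 0.2 = 3.6, so the normal approximation is rough; Fisher's exact test (for example `scipy.stats.fisher_exact`) would be the more appropriate choice here.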

Tests

The choice of test depends on:

Study design,

Variable type and distribution,

Number of groups, etc.

z test

t test (one-sample, two independent samples, paired)

χ² test (+ test for trend)

F test

Fisher's exact test

Mann-Whitney test

Wilcoxon test and others.
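For the t test with two independent samples, the test statistic can be computed with the standard library. This sketch assumes equal variances (Student's pooled t test) and uses hypothetical data; the P value additionally requires the t distribution, which SciPy provides (`scipy.stats.ttest_ind`, equal variances by default).

```python
import math
import statistics


def pooled_t_statistic(a, b):
    """Student's t statistic for two independent samples (equal variances
    assumed) and its degrees of freedom."""
    n1, n2 = len(a), len(b)
    v1, v2 = statistics.variance(a), statistics.variance(b)
    pooled_var = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (statistics.mean(a) - statistics.mean(b)) / se
    return t, n1 + n2 - 2


t, df = pooled_t_statistic([1, 2, 3, 4, 5], [3, 4, 5, 6, 7])   # hypothetical groups
print(f"t = {t:.2f}, df = {df}")
```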

Inferential statistics

Summary

… significant difference (association).

… be.

… explanations of the result (bias and confounding).

… about the biological, clinical or public health meaning of the results.

