© All Rights Reserved


The word statistics is derived from "status", meaning information useful to the state, e.g., the size of the population and of the armed forces.

BIOSTATISTICS

Numerical data relating to an aggregate of facts.

Procedures and techniques used to collect, process and analyze data to make inferences and to reach conclusions.

IMPORTANT CHARACTERISTICS OF BIOSTATISTICS:

It deals with population groups and events.

It deals with random variations, like the height of children, etc.

Procedures have to be correct to obtain meaningful statistics.

1. Descriptive: Deals with methods concerned with the summarization and description of the important aspects of the numerical data.

2. Inferential: Deals with procedures for making inferences about the characteristics of large groups (populations) by using a part of the data called the sample.

Definitions

Population: the set of measurements of interest to the sample collector.

Sample: any subset of measurements selected from the population.

Element/Unit: an entity on which measurements are obtained.

obtained for each element

Data: facts and figures collected, summarised and analysed.

Data set: a set of different variables in a particular study.

There must be variability; otherwise there is nothing to study.

Variation is important!

A characteristic that takes different values for different people, times, places, species, etc. is called a VARIABLE, e.g., height, weight, uric acid level, X-ray findings, parity, social class, etc.

A quantity that does not vary is a constant, e.g., the ratio of the circumference of a circle to its diameter is a constant, 3.141592654, for circles of all sizes.

Types of variables

A QUALITATIVE variable is one which does not take a numerical value. It may be concerned with characteristics, e.g., gender, survival or death, place of birth, colour of eyes, etc.

A QUANTITATIVE variable takes a numerical value, e.g., height, blood pressure, lung capacity, exact age, parity, number of cases in a study, completed family size, age last birthday, etc.

TYPES OF VARIABLES

Variable
    Qualitative or categorical
        Nominal (not ordered), e.g. ethnic group
        Ordinal (ordered), e.g. response to treatment
    Quantitative measurement
        Discrete (count data), e.g. number of admissions
        Continuous (real-valued), e.g. height

CATEGORICAL VARIABLES

Categories must cover all possibilities

CATEGORICAL NOMINAL VARIABLES

Named categories

No implied order among categories

Examples:

Gender: Male/Female

Blood Groups: O, A, B, AB

Ethnic Group: Chinese, Malay, Indian, Jordanian

Eye color: brown/black/blue/green/mixed

CATEGORICAL ORDINAL VARIABLES

Ordered categories

Differences between categories may not be considered equal

Examples:

unsatisfactory

Pain severity: no pain, slight pain, moderate pain, severe pain

QUANTITATIVE VARIABLES

Can be measured numerically

Examples:

Weight

Number of admissions to the hospital

Concentration of chlorine

DISCRETE DATA

Integers that correspond to a count

Can assume only whole numbers

Examples:

Number of missing teeth

Number of accidents in a time period

Number of illnesses in a time period

CONTINUOUS DATA

Can take any value within a range

Limitations imposed by the measuring stick

Example: time

Categorical and quantitative variables are statistically summarized and presented in different ways:

Variable Type    Data Presentation
Quantitative     Graphs, Tables
Categorical      Charts, Tables

TYPES OF DATA

Qualitative data = Categorical data

Quantitative data = Numerical data

Qualitative/Categorical Data

There are two types of categorical data: nominal and ordinal.

NOMINAL DATA

The data are divided into named categories. These categories, however, cannot be ordered one above another (as they are not greater or less than each other).

Example:

NOMINAL DATA        CATEGORIES
Sex/Gender:         male, female
Marital status:     single, married, widowed, separated, divorced

ORDINAL DATA

The data are divided into a number of categories, but they can be ordered one above another, from lowest to highest or vice versa.

Example:

ORDINAL DATA               CATEGORIES
Level of knowledge:        good, average, poor
Opinion on a statement:    fully agree, agree, disagree, totally disagree

Numerical Data

We speak of NUMERICAL DATA if the VARIABLES are expressed in numbers. They can be examined through:

Frequency distributions

Percentages, proportions, ratios and rates

Figures, etc.

Numerical Data

May be discrete or continuous.

Discrete numerical data are counts, which can be expressed only as whole numbers, e.g., number of people, parity, number of males/females in a family, etc.

Continuous numerical data are measurements, which can take any value between two whole numbers, e.g., weight, height, uric acid levels, etc.

SCALES OF MEASUREMENT

There are four scales (or levels) at which we measure:

__________________________________________________________
Level      Scale      Characteristic
__________________________________________________________
Lowest     Nominal    naming
           Ordinal    ordering
           Interval   equal intervals without an absolute zero
Highest    Ratio      equal intervals with an absolute zero
__________________________________________________________

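DATA SUMMARIZATION

Measures of Central Location

Measures of Dispersion and Spread

Measures of Shapes

[Figure: number of people by age, illustrating central location and spread]

MEASURE OF CENTRAL LOCATION

A single measure that represents (is a good summary of) an entire distribution of data

Also called: measure of central position

Common measures:

Arithmetic mean

Median

Mode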

Ages of students in a class (years)

27, 30, 28, 31, 28, 36, 29, 37, 29, 34, 30, 30, 27, 30

1. Arrange the data from the lowest value to the highest value

2. Add observation numbers

Obs   Age
1     27
2     27
3     28
4     28
5     28
6     29
7     29
8     29
9     29
10    30
11    30
12    30
13    30
14    30
15    31
16    31
17    32
18    34
19    36
20    37

MODE

Definition: Mode is the value that occurs most frequently

Method for identification

1. Arrange the data into a frequency distribution or histogram, showing the values of the variable and the frequency with which each value occurs

2. Identify the value that occurs most often

Mode

Age (years)   Frequency
27            2
28            3
29            4
30            5
31            2
32            1
33            0
34            1
35            0
36            1
37            1

[Figure: histogram of frequency by age (years)]

Mode: the most frequent value of the variable

Mode = 30

STAY DATA

0, 2, 3, 4, 5, 5, 6, 7, 8, 9,
9, 9, 10, 10, 10, 10, 10, 11, 12, 12,
12, 13, 14, 16, 18, 18, 19, 22, 27, 49

Mode = 10
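A minimal Python sketch of the two-step method above, assuming Python 3 and only the standard library: `collections.Counter` builds the frequency distribution, and every value with the highest count is returned, so data with more than one mode are handled. The helper names `frequency_table` and `modes` are illustrative, not from the slides.

```python
from collections import Counter

# Ages of the 20 students (years), sorted from lowest to highest
ages = [27, 27, 28, 28, 28, 29, 29, 29, 29, 30,
        30, 30, 30, 30, 31, 31, 32, 34, 36, 37]

# STAY DATA: nights of stay for 30 patients
stay = [0, 2, 3, 4, 5, 5, 6, 7, 8, 9,
        9, 9, 10, 10, 10, 10, 10, 11, 12, 12,
        12, 13, 14, 16, 18, 18, 19, 22, 27, 49]


def frequency_table(values):
    """Step 1: list each distinct value with its frequency, in ascending order."""
    return sorted(Counter(values).items())


def modes(values):
    """Step 2: return every value that occurs most often (there may be several)."""
    counts = Counter(values)
    highest = max(counts.values())
    return sorted(value for value, count in counts.items() if count == highest)


for age, freq in frequency_table(ages):
    print(age, freq)
print("Mode of ages:", modes(ages))
print("Mode of stay:", modes(stay))
```

In Python 3.8 and later, `statistics.multimode` from the standard library returns the same most-frequent values.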

[Figure: unimodal and bimodal distributions, population on the vertical axis]

Properties / Uses

Simple to explain and identify

Always equals an original value

Insensitive to extreme values (outliers)

Good descriptive measure, but poor statistical properties

May be more than one mode

May be no mode

Does not use all the data

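MEDIAN

Definition: Median is the middle value; also, the value that splits the distribution into two equal parts

50% of observations are below the median and 50% are above it

Method for identification

1. Arrange the observations in order from the lowest value to the highest value

2. Find the middle position as (n + 1) / 2

3. Identify the value at the middle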

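Median: Odd Number of Values

Ages (years), first 19 sorted observations: 27, 27, 28, 28, 28, 29, 29, 29, 29, 30, 30, 30, 30, 30, 31, 31, 32, 34, 36

n = 19

$$\text{Median observation} = \frac{n+1}{2} = \frac{19+1}{2} = \frac{20}{2} = 10$$

Median = 10th observation = 30 years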

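Median: Even Number of Values

Ages (years), all 20 sorted observations: 27, 27, 28, 28, 28, 29, 29, 29, 29, 30, 30, 30, 30, 30, 31, 31, 32, 34, 36, 37

n = 20

$$\text{Median observation} = \frac{n+1}{2} = \frac{20+1}{2} = \frac{21}{2} = 10.5$$

i.e., halfway between the 10th and 11th observations:

$$\text{Median} = \frac{30+30}{2} = 30 \text{ years}$$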
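The (n + 1)/2 rule can be sketched in Python as follows, assuming the list is already sorted. `median_by_position` is a hypothetical helper name; the standard library's `statistics.median` is used only as a cross-check.

```python
import statistics


def median_by_position(sorted_values):
    """Median via the (n + 1) / 2 position rule; assumes the values are sorted."""
    n = len(sorted_values)
    position = (n + 1) / 2                     # 1-based position of the middle
    if position.is_integer():                  # odd n: one middle observation
        return sorted_values[int(position) - 1]
    lower = sorted_values[int(position) - 1]   # e.g. 10th obs when position is 10.5
    upper = sorted_values[int(position)]       # e.g. 11th obs
    return (lower + upper) / 2                 # even n: average the two middle values


ages_20 = [27, 27, 28, 28, 28, 29, 29, 29, 29, 30,
           30, 30, 30, 30, 31, 31, 32, 34, 36, 37]
ages_19 = ages_20[:19]                         # odd example: 37 excluded

print(median_by_position(ages_19), median_by_position(ages_20))
print(statistics.median(ages_19), statistics.median(ages_20))
```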

IS MEDIAN SENSITIVE TO OUTLIERS?

0, 2, 3, 4, 5, 5, 6, 7, 8, 9,
9, 9, 10, 10, 10, 10, 10, 11, 12, 12,
12, 13, 14, 16, 18, 18, 19, 22, 27, 49

Median (50th percentile) = 10

0, 2, 3, 4, 5, 5, 6, 7, 8, 9,
9, 9, 10, 10, 10, 10, 10, 11, 12, 12,
12, 13, 14, 16, 18, 18, 19, 22, 27, 149

Median (50th percentile) = 10

Replacing the largest value 49 by 149 leaves the median unchanged.
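A short Python check of the question above, assuming Python 3 and the standard `statistics` module: the median is computed for the STAY DATA and for the copy in which 49 is replaced by 149, with the mean shown alongside for contrast.

```python
import statistics

# STAY DATA: nights of stay for 30 patients
stay = [0, 2, 3, 4, 5, 5, 6, 7, 8, 9,
        9, 9, 10, 10, 10, 10, 10, 11, 12, 12,
        12, 13, 14, 16, 18, 18, 19, 22, 27, 49]

# Same data, but the largest value 49 is replaced by the outlier 149
stay_outlier = stay[:-1] + [149]

for label, data in [("original", stay), ("49 -> 149", stay_outlier)]:
    # The median depends only on the two middle (15th and 16th) values,
    # whereas the mean uses every value, including the outlier.
    print(label, "median:", statistics.median(data),
          "mean:", round(statistics.mean(data), 1))
```

The median stays at 10 in both cases, while the mean moves from 12 to about 15.3.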

Properties / Uses

Insensitive to extreme values (outliers)

Good descriptive measure, but poor statistical properties

Measure of choice for skewed data

Equals an original value if n is odd

Quartiles

Definition: Quartiles are the values that split the distribution into four equal parts

25% of observations are below Q1

25% of observations are between Q1 and Q2 (median)

25% of observations are between Q2 (median) and Q3

25% of observations are above Q3

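Quartiles (20 sorted ages)

$$Q_1 \text{ observation} = \operatorname{round}\left(\frac{n+1}{4}\right) = \operatorname{round}\left(\frac{20+1}{4}\right) = \operatorname{round}\left(\frac{21}{4}\right) = \operatorname{round}(5.25) = 5$$

Q1 = 5th observation: Q1 age = 28

Q2 observation = 10.5 (median): Q2 age = 30

$$Q_3 \text{ observation} = \operatorname{round}\left(\frac{3(n+1)}{4}\right) = \operatorname{round}\left(\frac{3(20+1)}{4}\right) = \operatorname{round}\left(\frac{63}{4}\right) = \operatorname{round}(15.75) = 16$$

Q3 = 16th observation: Q3 age = 31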
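A sketch of the slide's quartile rule in Python, assuming sorted data and rounding the position k(n + 1)/4 half up to the nearest observation; `quartile_nearest_obs` is a hypothetical helper. For comparison, `statistics.quantiles(..., method="exclusive")` (Python 3.8+) uses the same (n + 1)-based positions but interpolates between neighbouring observations, so it reports 28.25 rather than 28 for Q1. Quartile conventions differ between textbooks and software.

```python
import math
import statistics

ages = [27, 27, 28, 28, 28, 29, 29, 29, 29, 30,
        30, 30, 30, 30, 31, 31, 32, 34, 36, 37]


def quartile_nearest_obs(sorted_values, k):
    """k-th quartile (k = 1 or 3): position k(n + 1)/4, rounded half up."""
    n = len(sorted_values)
    position = k * (n + 1) / 4
    index = math.floor(position + 0.5)   # e.g. 5.25 -> 5, 15.75 -> 16
    index = min(max(index, 1), n)        # keep the position inside 1..n
    return sorted_values[index - 1]      # 1-based position -> 0-based index


q1 = quartile_nearest_obs(ages, 1)
q2 = statistics.median(ages)             # Q2 is the median (position 10.5)
q3 = quartile_nearest_obs(ages, 3)
print(q1, q2, q3)

# Interpolating version from the standard library, for comparison
print(statistics.quantiles(ages, n=4, method="exclusive"))
```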

Percentiles

Definition: Percentiles are the values of the variable that split the distribution into 100 equal parts

35% of observations are below the 35th percentile

65% of observations are above the 35th percentile

Percentiles (ages of 20 students)

Value (Age)   Freq   Percent (Freq/Total)   Cumulative Percent
27            2      10%                    10%
28            3      15%                    25%
29            4      20%                    45%
30            5      25%                    70%
31            2      10%                    80%
32            1      5%                     85%
34            1      5%                     90%
36            1      5%                     95%
37            1      5%                     100%
Total         20     100%

25th percentile: 28 (cumulative percent reaches 25%)

90th percentile: 34 (cumulative percent reaches 90%)
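The frequency table above can be rebuilt in Python, assuming a percentile is read off as the smallest value whose cumulative percent reaches the target. This is a simple convention consistent with the table, not the only one in use; `cumulative_table` and `percentile_from_table` are illustrative names.

```python
from collections import Counter

ages = [27, 27, 28, 28, 28, 29, 29, 29, 29, 30,
        30, 30, 30, 30, 31, 31, 32, 34, 36, 37]


def cumulative_table(values):
    """Rows of (value, frequency, percent, cumulative percent), ascending by value."""
    total = len(values)
    rows, running = [], 0
    for value, freq in sorted(Counter(values).items()):
        running += freq
        rows.append((value, freq, 100 * freq / total, 100 * running / total))
    return rows


def percentile_from_table(rows, p):
    """Smallest value whose cumulative percent is at least p (assumed convention)."""
    for value, _, _, cumulative in rows:
        if cumulative >= p:
            return value
    return rows[-1][0]


table = cumulative_table(ages)
for value, freq, pct, cum in table:
    print(f"{value:>3} {freq:>3} {pct:5.0f}% {cum:5.0f}%")
print("25th percentile:", percentile_from_table(table, 25))
print("90th percentile:", percentile_from_table(table, 90))
```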

ARITHMETIC MEAN

Arithmetic mean = average value

1. Add all of the values

2. Divide the sum by the number of observations (n)

Arithmetic Mean (20 sorted ages)

n = 20, sum of ages = 605

$$\bar{x} = \frac{\sum x_i}{n} = \frac{605}{20} = 30.25 \text{ years}$$

0, 2, 3, 4, 5, 5, 6, 7, 8, 9,

9, 9, 10, 10, 10, 10, 10, 11, 12, 12,

12, 13, 14, 16, 18, 18, 19, 22, 27, 49

Sum = 360

n = 30

Mean = 360 / 30 = 12
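The two steps translate directly into Python; `arithmetic_mean` is an illustrative helper (the standard library's `statistics.mean` does the same job).

```python
ages = [27, 27, 28, 28, 28, 29, 29, 29, 29, 30,
        30, 30, 30, 30, 31, 31, 32, 34, 36, 37]

stay = [0, 2, 3, 4, 5, 5, 6, 7, 8, 9,
        9, 9, 10, 10, 10, 10, 10, 11, 12, 12,
        12, 13, 14, 16, 18, 18, 19, 22, 27, 49]


def arithmetic_mean(values):
    """Step 1: add all the values; step 2: divide the sum by n."""
    total = sum(values)
    return total / len(values)


print(sum(ages), len(ages), arithmetic_mean(ages))
print(sum(stay), len(stay), arithmetic_mean(stay))
```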

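CENTERING PROPERTY OF MEAN

Deviations of the STAY DATA from their mean (12):

0 - 12 = -12      9 - 12 = -3       12 - 12 = 0
2 - 12 = -10      9 - 12 = -3       13 - 12 = 1
3 - 12 = -9       10 - 12 = -2      14 - 12 = 2
4 - 12 = -8       10 - 12 = -2      16 - 12 = 4
5 - 12 = -7       10 - 12 = -2      18 - 12 = 6
5 - 12 = -7       10 - 12 = -2      18 - 12 = 6
6 - 12 = -6       10 - 12 = -2      19 - 12 = 7
7 - 12 = -5       11 - 12 = -1      22 - 12 = 10
8 - 12 = -4       12 - 12 = 0       27 - 12 = 15
9 - 12 = -3       12 - 12 = 0       49 - 12 = 37
Sum = -71         Sum = -17         Sum = 88

The deviations below the mean (-71 - 17 = -88) exactly balance those above it (+88), so the deviations sum to zero. Because every value enters this balance, a single extreme value (49, deviation +37) pulls the mean towards it: the mean is SENSITIVE TO OUTLIERS.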

[Figure: histograms of nights of stay; Mean = 12.0 for the STAY DATA and Mean = 15.3 when 49 is replaced by 149]
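A quick Python check of the centering property for the STAY DATA, assuming the same 30 sorted values: the three columns of deviations sum to -71, -17 and 88, the total is zero, and replacing the largest value (49) with 149 moves the mean from 12 to about 15.3.

```python
# STAY DATA: nights of stay for 30 patients (sorted)
stay = [0, 2, 3, 4, 5, 5, 6, 7, 8, 9,
        9, 9, 10, 10, 10, 10, 10, 11, 12, 12,
        12, 13, 14, 16, 18, 18, 19, 22, 27, 49]

mean = sum(stay) / len(stay)            # 360 / 30 = 12.0
deviations = [x - mean for x in stay]   # x - mean for every observation

# Sums of the three columns of ten deviations
column_sums = [sum(deviations[i:i + 10]) for i in (0, 10, 20)]
print("column sums:", column_sums)
print("total:", sum(deviations))        # deviations balance around the mean

# Sensitivity to outliers: replace the largest value 49 by 149
stay_outlier = stay[:-1] + [149]
print("mean with 149:", round(sum(stay_outlier) / len(stay_outlier), 1))
```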

The mean is OK for a:

Centered distribution

Approximately symmetrical distribution

Distribution with few extreme values (outliers)

USES OF THE ARITHMETIC MEAN

Measure of central location

Uses all of the data

Affected by extreme values (outliers)

Best for normally distributed data

Not usually equal to one of the original values

Good statistical properties

Obs   Var A   Var B   Var C
1     0       0       0
2     0       4       1
3     1       4       2
4     1       4       3
5     1       5       4
6     5       5       5
7     9       5       6
8     9       6       7
9     9       6       8
10    10      6       9
11    10      10      10

For each variable, find the:

Sum

Mean

Median

Mode

Minimum value

Maximum value


          Var A   Var B     Var C
Sum:      55      55        55
Mean:     5       5         5
Median:   5       5         5
Mode:     1, 9    4, 5, 6   none
Min:      0       0         0
Max:      10      10        10
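The exercise can be checked in Python with the standard `statistics` module. `modes_or_none` is a hypothetical helper that returns every most-frequent value, or `None` when all distinct values occur equally often (as for Var C, where the table reports no mode).

```python
import statistics
from collections import Counter

variables = {
    "Var A": [0, 0, 1, 1, 1, 5, 9, 9, 9, 10, 10],
    "Var B": [0, 4, 4, 4, 5, 5, 5, 6, 6, 6, 10],
    "Var C": list(range(11)),           # 0, 1, ..., 10
}


def modes_or_none(values):
    """Most frequent value(s); None if several distinct values all tie."""
    counts = Counter(values)
    highest = max(counts.values())
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return None
    return sorted(v for v, c in counts.items() if c == highest)


summary = {}
for name, values in variables.items():
    summary[name] = {
        "sum": sum(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": modes_or_none(values),
        "min": min(values),
        "max": max(values),
    }
    print(name, summary[name])
```

The three variables share the same sum, mean, median, minimum and maximum, yet their modes (and, as the box-and-whiskers diagrams show, their spreads) differ.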


Symmetrical: Mean = Median = Mode

Skewed right: Mode < Median < Mean

Skewed left: Mean < Median < Mode

Measure of Central Location: a single measure that represents an entire distribution

Mode: most common value

Median: central value

Arithmetic mean: average value

Mean uses all data, so it is sensitive to outliers

Mean has the best statistical properties

Mean preferred for normally distributed data

Median preferred for skewed data

[Figure: distributions with the same center but different dispersions]

MEASURES OF SPREAD

Definition: Measures that quantify the variation or dispersion of a set of data from its central location

Also known as:

Measure of dispersion

Measure of variation

Common measures:

Range

Interquartile range

Variance / standard deviation

Standard error

95% confidence interval

RANGE

Definition: difference between the largest and smallest values

Properties / Uses

Greatly affected by outliers

Usually used with the median

STAY DATA

0, 2, 3, 4, 5, 5, 6, 7, 8, 9,
9, 9, 10, 10, 10, 10, 10, 11, 12, 12,
12, 13, 14, 16, 18, 18, 19, 22, 27, 49

Range = 49 - 0 = 49

[Figure: histogram of nights of stay]

INTERQUARTILE RANGE

Definition: the range of the central 50% of a distribution, IQR = Q3 - Q1

Properties / Uses

Used with the median

Five-number summary for box-and-whiskers diagram:

Maximum (100%, largest value)

Third quartile (75%)

Median (50%)

First quartile (25%)

Minimum (0%, smallest value)

INTERQUARTILE RANGE

LENGTH OF STAY DATA

0, 2, 3, 4, 5, 5, 6, 7, 8, 9,
9, 9, 10, 10, 10, 10, 10, 11, 12, 12,
12, 13, 14, 16, 18, 18, 19, 22, 27, 49

Q1 = 25th percentile @ (30 + 1)/4 = 7.75, taken as the 7th observation: Q1 = 6

Median = 50th percentile @ (30 + 1)/2 = 15.5, between the 15th and 16th observations: Median = 10

Q3 = 75th percentile @ 3(30 + 1)/4 = 23.25, taken as the 23rd observation: Q3 = 14
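Range and interquartile range for the length of stay data, sketched in Python. This assumes `statistics.quantiles` with `method="exclusive"` (Python 3.8+), which interpolates between neighbouring observations; it therefore gives Q1 = 6.75 and Q3 = 14.5 rather than the 7th and 23rd observations (6 and 14) read off on the slide. Either way, the box-and-whiskers diagram is drawn from a five-number summary.

```python
import statistics

# LENGTH OF STAY DATA (30 patients)
stay = [0, 2, 3, 4, 5, 5, 6, 7, 8, 9,
        9, 9, 10, 10, 10, 10, 10, 11, 12, 12,
        12, 13, 14, 16, 18, 18, 19, 22, 27, 49]

data = sorted(stay)
data_range = data[-1] - data[0]          # largest minus smallest value

# Quartiles at (n + 1)-based positions, interpolating between neighbours
q1, median, q3 = statistics.quantiles(data, n=4, method="exclusive")
iqr = q3 - q1                            # spread of the central 50%

five_number_summary = (data[0], q1, median, q3, data[-1])
print("range:", data_range)
print("Q1, median, Q3:", q1, median, q3)
print("IQR:", iqr)
print("five-number summary:", five_number_summary)
```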

[Figure: box-and-whiskers diagram, length of stay data]

[Figure: box-and-whiskers diagrams, variables A, B, C]

VARIANCE AND STANDARD DEVIATION

Definition: measures of variation that quantify how closely clustered the observed values are to the mean

Variance = average of squared deviations from the mean

$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$$

Standard deviation = square root of variance

STANDARD DEVIATION

$\bar{x}$: mean

$x_i$: value of each observation

$n$: number of observations

$s^2$: variance

$s$ (SD): standard deviation

$$SD = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$$

1. Calculate the mean, $\bar{x}$.

2. Subtract the mean from each observation: $x_i - \bar{x}$.

3. Square the difference: $(x_i - \bar{x})^2$.

4. Sum the squared differences: $\sum (x_i - \bar{x})^2$.

5. Divide the sum by $n - 1$ to obtain the variance, $s^2$.

6. Take the square root of the variance: $SD = \sqrt{s^2}$.


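SQUARED DEVIATIONS FROM THE MEAN (STAY DATA, mean = 12)

(0 - 12)² = 144     (9 - 12)² = 9      (12 - 12)² = 0
(2 - 12)² = 100     (9 - 12)² = 9      (13 - 12)² = 1
(3 - 12)² = 81      (10 - 12)² = 4     (14 - 12)² = 4
(4 - 12)² = 64      (10 - 12)² = 4     (16 - 12)² = 16
(5 - 12)² = 49      (10 - 12)² = 4     (18 - 12)² = 36
(5 - 12)² = 49      (10 - 12)² = 4     (18 - 12)² = 36
(6 - 12)² = 36      (10 - 12)² = 4     (19 - 12)² = 49
(7 - 12)² = 25      (11 - 12)² = 1     (22 - 12)² = 100
(8 - 12)² = 16      (12 - 12)² = 0     (27 - 12)² = 225
(9 - 12)² = 9       (12 - 12)² = 0     (49 - 12)² = 1369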
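The six steps for the standard deviation, sketched in Python for the STAY DATA (mean 12), assuming Python 3 and the standard library only. `statistics.variance` and `statistics.stdev`, which also divide by n - 1, are used as a cross-check.

```python
import math
import statistics

# STAY DATA: nights of stay for 30 patients
stay = [0, 2, 3, 4, 5, 5, 6, 7, 8, 9,
        9, 9, 10, 10, 10, 10, 10, 11, 12, 12,
        12, 13, 14, 16, 18, 18, 19, 22, 27, 49]

n = len(stay)
mean = sum(stay) / n                     # step 1: mean (12.0)
deviations = [x - mean for x in stay]    # step 2: subtract the mean
squared = [d ** 2 for d in deviations]   # step 3: square each difference
sum_of_squares = sum(squared)            # step 4: sum the squared differences
variance = sum_of_squares / (n - 1)      # step 5: divide by n - 1
sd = math.sqrt(variance)                 # step 6: square root of the variance

print(sum_of_squares, round(variance, 2), round(sd, 2))
print(statistics.variance(stay), statistics.stdev(stay))
```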

USES

The standard deviation is usually calculated only when data are more or less normally distributed (bell-shaped curve).

For normally distributed data:

about 68% of the data fall within 1 SD of the mean

about 95% of the data fall within 2 SD of the mean

about 99.7% of the data fall within 3 SD of the mean
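These percentages can be checked against the standard normal distribution with `statistics.NormalDist` (Python 3.8+). The sketch also shows why 1.96 SD, rather than exactly 2 SD, is used for 95%.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)


def share_within(k):
    """Proportion of a normal distribution lying within k SD of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 1.96, 2, 3):
    print(f"within {k} SD: {share_within(k):.1%}")
```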

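NORMAL DISTRIBUTION

[Figure: normal curve centred on the mean, marking 68% of values within 1 standard deviation, 95% within about 2 standard deviations, and 2.5% in each tail]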

MEASURES OF CENTRAL LOCATION AND SPREAD

Distribution                    Central location    Spread
Symmetrical                     Arithmetic mean*    Standard deviation
Skewed or data with outliers    Median              Range or Interquartile range

* Median and mode will be similar

Properties of Measures of Central Location & Spread

For quantitative / continuous variables:

Mode: simple, descriptive, not always useful

Median: best for skewed data

Arithmetic mean: best for normally distributed data

Range: use with median

Standard deviation: use with mean

Standard error: used to construct confidence intervals

[Figure: population by age, marking the minimum, 1st quartile, median, mode, 3rd quartile and maximum, with the interquartile interval and the range]

Measures of Shapes

Normal distribution: a bell-shaped curve with most of the values clustered near the mean and a few values out near the tails.

MEASURES OF VARIATION

Range: the difference in value between the highest (maximum) and the lowest (minimum) observation

Variance: the sum of the squares of the deviations about the sample mean, divided by one less than the total number of items (n - 1)

Standard deviation: the square root of the variance

[Figure: histogram of Var, with fraction of observations on the vertical axis]

The normal distribution is symmetrical around the mean. The mean, the median and the mode of a normal distribution have the same value.

An important characteristic of a normally distributed variable is that about 95% of the measurements have values within approximately 2 standard deviations (SD) of the mean (more precisely, 1.96 SD).

ESTIMATIONS

Many situations in which statistical methods are applied in practice arise when trying to deduce something about a population from the evidence provided by a sample of observations taken from that population.

The population parameters do not change and remain constant, whereas the sample estimates vary from sample to sample and behave as random values.

Population parameters and their sample estimates:

Mean

Standard deviation (SD)

Proportion

Population correlation coefficient

EXTENT TO WHICH THE SAMPLE REPRESENTS THE POPULATION AS A WHOLE

Since we cannot know exactly how far a particular sample value deviates from the population value, a range or interval around the sample value can be worked out which will most probably contain the population value. This interval is called the CONFIDENCE INTERVAL.

The confidence interval takes into account the STANDARD ERROR. The standard error gives an estimate of the degree to which the sample mean varies from the population mean. It is computed on the basis of the standard deviation.

The standard error of the mean is calculated by dividing the standard deviation by the square root of the sample size:

$$SE(\bar{x}) = \frac{SD}{\sqrt{n}}$$

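In practice, you usually present the calculated sample mean ± 1.96 times the standard error:

$$\bar{x} \pm 1.96 \times SE(\bar{x})$$

This is the 95% CONFIDENCE INTERVAL. It means that if samples of the same size were drawn repeatedly and an interval calculated in this way each time, about 95% of those intervals would contain the population mean; informally, we are 95% confident that the population mean lies within this interval.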
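A minimal Python sketch of the standard error and the 95% confidence interval, using the STAY DATA as an assumed example sample (n = 30). It follows the large-sample formula above with the multiplier 1.96; for small samples a multiplier from the t distribution is commonly used instead, which this sketch does not do.

```python
import math
import statistics

# STAY DATA used as an example sample (n = 30)
stay = [0, 2, 3, 4, 5, 5, 6, 7, 8, 9,
        9, 9, 10, 10, 10, 10, 10, 11, 12, 12,
        12, 13, 14, 16, 18, 18, 19, 22, 27, 49]

n = len(stay)
mean = statistics.mean(stay)
sd = statistics.stdev(stay)              # sample SD, n - 1 in the denominator
se = sd / math.sqrt(n)                   # SE = SD / sqrt(n)
lower = mean - 1.96 * se                 # 95% CI: mean +/- 1.96 x SE
upper = mean + 1.96 * se

print(f"mean = {mean}, SD = {sd:.2f}, SE = {se:.2f}")
print(f"95% CI: {lower:.2f} to {upper:.2f}")
```

Because SE = SD / √n, quadrupling the sample size halves the standard error and therefore halves the width of the interval.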

The larger the sample size, the smaller the standard error and the narrower the confidence interval will be. Thus, the advantage of having a large sample size is that the sample mean will be a better estimate of the population mean.

With a large sample, small differences can be statistically significant, but a large difference may not achieve statistical significance because of a small sample size. This leads us to calculating Confidence Intervals.
