
Uploaded by 19512


**Review: Some Key Definitions**

Definition 1: Frequency table of quantitative data

Tabulations and Frequency Distributions

One of the simplest ways to summarize data is by tabulation.

Table 1 Distribution of Married Women of Reproductive Age According to Present Number of Children in Rural China

| Number of children | Number of women |
|---|---|
| 0 | 13751 |
| 1 | 25171 |
| 2 | 30426 |
| 3 | 28560 |
| 4 | 21719 |
| 5 | 13695 |
| 6 | 7255 |
| 7 | 3268 |
| 8 | 1151 |
| 9 | 373 |
| 10 & above | 156 |
| Total | 145525 |

Definition 2: Frequency distribution

One of the most common ways of describing a sample pictorially is to plot the values of the variable on one axis and, on the other axis, the frequency of occurrence of each value or a measure related to it.

Graph 1 Distribution of Married Women (x-axis: Number of Children, 0 to 10 & above; y-axis: Number of Women, 0 to 35000; legend: Series 1)

The type of curve

Definition 3: Normal distribution: The peak of the curve is in the middle, and the curve is bilaterally symmetric about the mean.

Definition 4: Skewed distribution: The peak of the curve is not in the middle.

Graph 1 (repeated) Distribution of Married Women (x-axis: Number of Children; y-axis: Number of Women)

Measures of central tendency

An entire distribution can be characterized by one typical value that represents all the observations: a measure of central tendency.

Such a measure is an average; the averages covered here are the mean, the geometric mean, the median and the mode.

(1) The mean is calculated by the following formula (for raw data):

$$\bar{X} = \frac{X_1 + X_2 + \dots + X_n}{n} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{\sum X}{n}$$

For example, the weights (kg) of ten 7-year-old boys are 17.3, 18.0, 19.4, 20.6, 21.2, 21.8, 22.5, 23.2, 24.0 and 25.5. Find their average weight.

$$\bar{X} = \frac{17.3 + 18.0 + \dots + 25.5}{10} = \frac{213.5}{10} = 21.35 \text{ (kg)}$$

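For a frequency table with $k$ groups, where the value $X_i$ occurs $f_i$ times and $n = \sum f_i$:

$$\bar{X} = \frac{f_1 X_1 + f_2 X_2 + \dots + f_k X_k}{f_1 + f_2 + \dots + f_k} = \frac{\sum_{i=1}^{k} f_i X_i}{n} = \frac{\sum fX}{n}$$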

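Table 1 Mean and SD of Height of 14-year-old female children

| Height (cm) | $f_i$ | $x_i$ | $f_i x_i$ | $f_i x_i^2$ |
|---|---|---|---|---|
| 124~ | 2 | 126 | 252 | 31752 |
| 128~ | 3 | 130 | 390 | 50700 |
| 132~ | 11 | 134 | 1474 | 197516 |
| 136~ | 22 | 138 | 3036 | 418968 |
| 140~ | 39 | 142 | 5538 | 786396 |
| 144~ | 27 | 146 | 3942 | 575532 |
| 148~ | 16 | 150 | 2400 | 360000 |
| 152~ | 5 | 154 | 770 | 118580 |
| 156~ | 3 | 158 | 474 | 74892 |
| 160~164 | 2 | 162 | 324 | 52488 |
| Total | 130 | | 18600 | 2666824 |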

Applying the frequency-table formula for the mean to the table above:

$$\bar{X} = \frac{2 \times 126 + 3 \times 130 + \dots + 2 \times 162}{2 + 3 + \dots + 2} = \frac{18600}{130} = 143.08 \text{ (cm)}$$
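The frequency-table calculation above can be checked with a few lines of Python. This is only a sketch: the function name `grouped_mean` is made up for illustration, and the midpoints and frequencies are copied from the height table.

```python
# Class midpoints (cm) and frequencies copied from the height table.
midpoints = [126, 130, 134, 138, 142, 146, 150, 154, 158, 162]
frequencies = [2, 3, 11, 22, 39, 27, 16, 5, 3, 2]


def grouped_mean(xs, fs):
    """Mean of grouped data: sum(f * x) / sum(f)."""
    weighted_total = sum(f * x for x, f in zip(xs, fs))
    return weighted_total / sum(fs)


mean_height = grouped_mean(midpoints, frequencies)
print(f"n = {sum(frequencies)}, mean = {mean_height:.2f} cm")
```

Because each group is represented by its midpoint, the result is an approximation to the mean of the underlying raw heights.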

(2) The geometric mean is calculated in the same way as the mean, except that the values are first transformed to logarithms; the antilogarithm of the mean of the logarithms gives the result.

$$G = \lg^{-1}\left(\frac{\lg X_1 + \lg X_2 + \dots + \lg X_n}{n}\right) = \lg^{-1}\left(\frac{\sum \lg X}{n}\right)$$

(for raw data)

$$G = \lg^{-1}\left(\frac{f_1 \lg X_1 + f_2 \lg X_2 + \dots + f_k \lg X_k}{\sum f}\right) = \lg^{-1}\left(\frac{\sum f \lg X}{\sum f}\right)$$

(for frequency table data with $k$ groups)

Example: The serum titres of five persons are 1:2, 1:4, 1:8, 1:16 and 1:32. Find the average titre.

$$G = \lg^{-1}\left(\frac{\lg 2 + \lg 4 + \lg 8 + \lg 16 + \lg 32}{5}\right) = \lg^{-1}\left(\frac{4.5154}{5}\right) = \lg^{-1}(0.9031) = 8$$

The average titre is therefore 1:8.
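As a rough check of the titre example, the raw-data formula can be written directly in Python. The helper name `geometric_mean` is illustrative; the function works on the reciprocals of the titres (2, 4, 8, 16, 32), as the worked example does.

```python
import math


def geometric_mean(values):
    """Antilog (base 10) of the mean of the base-10 logarithms."""
    mean_log = sum(math.log10(v) for v in values) / len(values)
    return 10 ** mean_log


reciprocal_titres = [2, 4, 8, 16, 32]
g = geometric_mean(reciprocal_titres)
print(f"geometric mean titre = 1:{g:.0f}")
```

The arithmetic mean of the same reciprocals is 12.4, which is pulled upward by the largest value; the geometric mean stays at the middle dilution.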

(3) The median is the value of the observation located in the middle of the sorted sequence of observations. It divides the frequency distribution in half when all the values are listed in order.

For example:

120, 123, 125, 127, 128, 130, 132 (7 values)

M = 127.

118, 120, 123, 125, 127, 128, 130, 132 (8 values)

M = (125 + 127)/2 = 126.

For data from a frequency table, we do not know the exact value of the median, so we estimate it by the following formula:

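Median or Percentile

$$P_x = L + \frac{i}{f_x}\left(n \cdot x\% - \sum f_L\right)$$

$P_x$ is the xth percentile; $L$ is the lower limit of the group in which the percentile is located; $i$ is the width of the group interval; $f_x$ is the frequency in that group; $n$ is the total number of observations; and $\sum f_L$ is the cumulative frequency of the groups below $L$.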

Table 2 Latent period of an infective disease

| Latent days (group) | Frequency $f$ | Cumulative frequency $\sum f$ | Cumulative percent (%) |
|---|---|---|---|
| 4~ | 26 | 26 | 24.07 |
| 8~ | 48 | 74 | 68.52 |
| 12~ | 25 | 99 | 91.67 |
| 16~ | 6 | 105 | 97.22 |
| 20~ | 3 | 108 | 100.00 |

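Which group is the median located in? The cumulative percent first reaches 50% in the group 8~, so $L = 8$, $i = 4$, the frequency in that group is 48 and the cumulative frequency below it is 26:

$$P_{50} = 8 + \frac{4}{48}\left(108 \times 50\% - 26\right) = 10.33 \text{ (days)}$$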

Percentile

The xth percentile of a data set is a value such that at least x percent of the items take on this value or less and at least (100 − x) percent of the items take on this value or more.

The median is the fiftieth percentile: 50% of the items are less than the median, and 50% of the items are larger than it.

(4) The mode is the value that occurs with the greatest frequency. For example, in the data set

2, 2, 3, 3, 3, 3, 4, 5, 6

the mode is 3, which occurs four times.
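For raw data, the median and mode examples in this section can be reproduced with Python's standard `statistics` module. This is a small sketch; the variable names are illustrative.

```python
import statistics

odd_sample = [120, 123, 125, 127, 128, 130, 132]        # 7 values
even_sample = [118, 120, 123, 125, 127, 128, 130, 132]  # 8 values
mode_sample = [2, 2, 3, 3, 3, 3, 4, 5, 6]

# With an odd count the median is the middle value; with an even
# count it is the average of the two middle values.
median_odd = statistics.median(odd_sample)
median_even = statistics.median(even_sample)

# The mode is the most frequent value.
most_common = statistics.mode(mode_sample)

print(median_odd, median_even, most_common)
```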

Let's summarize the application of averages:

1. The mean is suitable for data that follow a normal distribution or at least a symmetric distribution.

2. The geometric mean is suitable for data that follow a positively skewed distribution or a log-normal distribution.

3. The median is suitable for all kinds of data, including skewed distributions, but it is less convenient than the mean for further analysis.

Measures of dispersion

(1) Range

The range is the difference between the largest and smallest values in the data set.

Disadvantage: It reflects only the two extreme values, the largest and the smallest.

(2) Quartile interval

The quartile interval, denoted Q, can be regarded as the range of the middle half of the observed values.

The interquartile range is the difference between the third quartile (75th percentile) and the first quartile (25th percentile).

Table 2 Latent period of an infective disease

| Latent days | $f$ | $\sum f$ | % |
|---|---|---|---|
| 4~ | 26 | 26 | 24.07 |
| 8~ | 48 | 74 | 68.52 |
| 12~ | 25 | 99 | 91.67 |
| 16~ | 6 | 105 | 97.22 |
| 20~ | 3 | 108 | 100.00 |

To calculate the interquartile range:

① Find $P_{75}$ and $P_{25}$ using

$$P_x = L + \frac{i}{f_x}\left(n \cdot x\% - \sum f_L\right)$$

$P_{75}$: $L = 12$, $i = 4$, $f_{75} = 25$, $\sum f_L = 74$

$$P_{75} = 12 + \frac{4}{25}\left(108 \times 75\% - 74\right) = 13.12$$

$P_{25}$: $L = 8$, $i = 4$, $f_{25} = 48$, $\sum f_L = 26$

$$P_{25} = 8 + \frac{4}{48}\left(108 \times 25\% - 26\right) = 8.08$$

② Calculate the interquartile range:

$$Q = P_{75} - P_{25} = 13.12 - 8.08 = 5.04$$
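The grouped-data percentile formula used for the median and the quartiles can be sketched as one small function. The name `grouped_percentile` and its parameters are hypothetical; the sketch assumes equal-width groups and takes the lower limits and frequencies from Table 2.

```python
def grouped_percentile(lower_limits, freqs, width, pct):
    """P_x = L + (i / f_x) * (n * x% - cumulative frequency below the group)."""
    n = sum(freqs)
    target = n * pct / 100          # position n * x% within the ordered data
    cum_below = 0                   # cumulative frequency below the current group
    for lower, f in zip(lower_limits, freqs):
        if cum_below + f >= target:  # the percentile falls in this group
            return lower + width / f * (target - cum_below)
        cum_below += f
    raise ValueError("pct must lie between 0 and 100")


lower_limits = [4, 8, 12, 16, 20]
freqs = [26, 48, 25, 6, 3]

p25 = grouped_percentile(lower_limits, freqs, 4, 25)
p50 = grouped_percentile(lower_limits, freqs, 4, 50)
p75 = grouped_percentile(lower_limits, freqs, 4, 75)
print(f"P25 = {p25:.2f}, P50 = {p50:.2f}, P75 = {p75:.2f}, Q = {p75 - p25:.2f}")
```

The function locates the group by walking the cumulative frequencies, which is the same step as reading the cumulative-percent column of the table by eye.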

(3) Variance

A key step in computing the variance is computing the difference between each data value and the mean of the data set.

$$\sum (x - \bar{x}) = 0$$

(The positive and negative deviations cancel each other, so the sum of the deviations about the mean is 0.)

$$\sum (x - \bar{x})^2 \ge 0$$

The sum of the squared deviations depends on the number of values as well as on their variability.

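The average squared deviation is called the variance, denoted $\sigma^2$ for a population and $s^2$ for a sample.

$$\sigma^2 = \frac{\sum (x - \mu)^2}{n}$$

$$s^2 = \frac{\sum (x - \bar{x})^2}{n}$$

Many textbooks divide by $n - 1$ rather than $n$ in the sample formula, which gives an unbiased estimate of $\sigma^2$.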

(4) Standard Deviation, SD

The standard deviation of a data set is defined as the positive square root of the variance.

$$\sigma = \sqrt{\frac{\sum (x - \mu)^2}{n}}$$

$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n}}$$
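A minimal sketch of the variance and standard deviation for grouped data, using the midpoints and frequencies of the height table from the section on the mean. The function name and the `ddof` parameter are illustrative assumptions: `ddof=0` divides by n, as in the formulas above, and `ddof=1` divides by n − 1, the convention used by many textbooks and by the t tests later on.

```python
import math

# Class midpoints (cm) and frequencies copied from the height table.
midpoints = [126, 130, 134, 138, 142, 146, 150, 154, 158, 162]
frequencies = [2, 3, 11, 22, 39, 27, 16, 5, 3, 2]


def grouped_variance(xs, fs, ddof=0):
    """Sum of f * (x - mean)^2 divided by (n - ddof)."""
    n = sum(fs)
    mean = sum(f * x for x, f in zip(xs, fs)) / n
    squared_devs = sum(f * (x - mean) ** 2 for x, f in zip(xs, fs))
    return squared_devs / (n - ddof)


var_n = grouped_variance(midpoints, frequencies)             # divide by n
var_n1 = grouped_variance(midpoints, frequencies, ddof=1)    # divide by n - 1
print(f"SD (n) = {math.sqrt(var_n):.2f} cm, SD (n - 1) = {math.sqrt(var_n1):.2f} cm")
```

With 130 observations the two conventions differ only in the second decimal place; the gap grows as the sample gets smaller.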

Application of standard deviation:

The standard deviation shows the degree of dispersion of a variable's distribution. A large standard deviation indicates large variation: the values are more dispersed (farther from the mean), so the mean represents them poorly. Conversely, a small standard deviation indicates that the values are concentrated around the mean, so the mean represents each value better.

(5) Coefficient of Variation

The coefficient of variation, also called the coefficient of dispersion and denoted CV, is the ratio of the standard deviation $S$ to the mean $\bar{X}$, expressed as a percentage. The formula is

$$CV = \frac{S}{\bar{X}} \times 100\%$$

The range, the quartile interval and the standard deviation all have measurement units, which are the same as the units of the observed values. The coefficient of variation, in contrast, is a relative number and has no measurement unit, so it is better suited to comparing variation between data sets.

The coefficient of variation is often used in:

1) Comparing the degree of variation of several data sets whose means differ greatly, for example when one mean is at least twice another.

2) Comparing the degree of variation of several data sets with different measurement units.

For example:

For 100 20-year-old men in one place, the mean height is 166.06 cm with a standard deviation of 4.95 cm, and the mean weight is 53.72 kg with a standard deviation of 4.96 kg. To decide which varies more, we should compare the coefficients of variation rather than the standard deviations, because the measurement units differ (kg and cm).

$$CV_{\text{height}} = \frac{4.95}{166.06} \times 100\% = 2.98\%$$

$$CV_{\text{weight}} = \frac{4.96}{53.72} \times 100\% = 9.23\%$$

The degree of variation of weight is larger than that of height.
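The height and weight comparison can be reproduced in a couple of lines. The helper name `coefficient_of_variation` is made up for this sketch.

```python
def coefficient_of_variation(sd, mean):
    """CV as a percentage: SD divided by the mean, times 100."""
    return sd / mean * 100


cv_height = coefficient_of_variation(4.95, 166.06)  # cm cancels out
cv_weight = coefficient_of_variation(4.96, 53.72)   # kg cancels out
print(f"CV height = {cv_height:.2f}%, CV weight = {cv_weight:.2f}%")
```

The two standard deviations are almost equal as numbers (4.95 and 4.96), so only the unit-free ratio shows that weight varies roughly three times as much as height.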

Statistical inference of measurement data

There are two purposes of statistical analysis:

(1) Statistical description: to describe and summarize the important features of the data by a few statistics.

(2) Statistical inference: to generalize from the sample to the population, including estimation of population parameters and hypothesis testing.

1. The sampling error

When samples are selected from a population, their means differ from each other and from the population mean. This difference is the sampling error.

Why? The sampling error arises from the variation of the observations in the population.

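2. Standard error, SE

The standard error measures the sampling error.

Formula:

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$

(when the population standard deviation $\sigma$ is known)

$$s_{\bar{x}} = \frac{s}{\sqrt{n}}$$

(estimated from the sample standard deviation $s$)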

The sampling error is also related to the sample size.

If the sample size equals the population size, there is no sampling error. If the sample size equals 1, the standard error equals the SD.
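To see how the standard error shrinks with the sample size, here is a short sketch using the height SD of 4.95 cm from the coefficient-of-variation example. The function name `standard_error` is illustrative.

```python
import math


def standard_error(sd, n):
    """Standard error of the mean: SD divided by the square root of n."""
    return sd / math.sqrt(n)


sd_height = 4.95  # cm, from the 100 men in the CV example
for n in (1, 25, 100):
    print(f"n = {n:3d}: SE = {standard_error(sd_height, n):.3f} cm")
```

Quadrupling the sample size halves the standard error, and with n = 1 the standard error is the SD itself, as stated above.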

3. Hypothesis testing of population mean

Suppose there is a sample whose mean is $\bar{X}_1$. For example, the mean hemoglobin of 280 healthy male adults is 136.0 g/L, with an SD of 6.0 g/L. The population mean is 140.0 g/L. The sample mean differs from the population mean. There are two possibilities:

a. The difference is due to sampling error. This means $\mu_1 = \mu_0 = 140$.

b. The difference is substantial, because the sample comes from a different population. This means $\mu_1 \ne \mu_0 = 140$.

Now we need to judge which possibility is true.

(1) The steps of hypothesis testing

① Setting the hypotheses

$H_0$: null hypothesis; $H_1$: alternative hypothesis

α = 0.05 (the significance level used to decide whether to reject $H_0$)

② Calculating the value of the test statistic (t)

③ Determining the value of P and making a judgment:

Judgment:

When P ≤ α, the conclusion is to reject $H_0$ and accept $H_1$: the difference is significant.

When P > α, the conclusion is not to reject $H_0$: the difference is not significant.
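The three steps can be sketched for the hemoglobin example (sample mean 136.0, SD 6.0, n = 280, population mean 140.0). One assumption should be stated loudly: the P value here comes from the standard normal distribution, which is only a large-sample stand-in for the t distribution with n − 1 degrees of freedom. For small samples a t table or a statistics package would be needed instead. The function name `one_sample_test` is hypothetical.

```python
import math
from statistics import NormalDist


def one_sample_test(sample_mean, sample_sd, n, mu0, alpha=0.05):
    """Return (t, approximate two-sided P, reject H0?)."""
    se = sample_sd / math.sqrt(n)          # standard error of the mean
    t = (sample_mean - mu0) / se           # step 2: test statistic
    # Step 3: two-sided P, approximated with the normal curve (large n only).
    p = 2 * NormalDist().cdf(-abs(t))
    return t, p, p <= alpha


# Step 1: H0: mu1 = 140, H1: mu1 != 140, alpha = 0.05
t, p, reject = one_sample_test(136.0, 6.0, 280, 140.0)
print(f"t = {t:.2f}, reject H0: {reject}")
```

Because the sample is large, the standard error is small (about 0.36 g/L), so a difference of 4 g/L is many standard errors away from zero.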

4. There are three kinds of design patterns:

● Comparing the sample with the population: One-sample t test

● Comparing in a matched-pair way: Paired-sample t test

● Comparing two independent samples: Two independent-sample t test

Homework

1. Calculate the mean, SD, median and Q.

Table Mean and SD of Height of 14-year-old female children

| Height (cm) | Mid-point $X$ | $f_i$ | Cumulative frequency | Cumulative percent (%) |
|---|---|---|---|---|
| 124~ | 126 | 2 | 2 | 1.5 |
| 128~ | 130 | 3 | 5 | 3.8 |
| 132~ | 134 | 11 | 16 | 12.3 |
| 136~ | 138 | 22 | 38 | 29.2 |
| 140~ | 142 | 39 | 77 | 59.2 |
| 144~ | 146 | 27 | 104 | 80.0 |
| 148~ | 150 | 16 | 120 | 92.3 |
| 152~ | 154 | 5 | 125 | 96.2 |
| 156~ | 158 | 3 | 128 | 98.5 |
| 160~164 | 162 | 2 | 130 | 100.0 |
| Total | | 130 | | |

2. Fifty measles-susceptible children were vaccinated one month ago. Their antibody titres are shown in the table below. Find the average titre.

Calculation of average titre

| Antibody titre (1) | Number of children, $f$ (2) | Reciprocal of titre, $x$ (3) | $\lg x$ (4) | $f \cdot \lg x$ (5) = (2) × (4) |
|---|---|---|---|---|
| 1:4 | 1 | 4 | | |
| 1:8 | 5 | 8 | | |
| 1:16 | 6 | 16 | | |
| 1:32 | 7 | 32 | | |
| 1:64 | 8 | 64 | | |
| 1:128 | 10 | 128 | | |
| 1:256 | 8 | 256 | | |
| 1:512 | 5 | 512 | | |
| Total | 50 | | | |

3. The average sleep time is supposed to be 8 hours a day (μ). We think college students sleep a different amount, maybe more, maybe less. We survey 15 students to see how much they sleep. The data are as follows (each value represents a student): 5, 4, 6, 4, 8, 6, 5, 4, 3, 7, 5, 5, 5, 6, 6 (hours).

One-sample t test

① Setting the hypotheses

H0: H1: α = 0.05

② Calculating the value of the statistic (t value for t test)

③ Making a judgment:

4. In a small clinical trial to assess the value of a new tranquillizer on psychoneurotic patients, each patient was given a week's treatment with the drug and a week's treatment with a placebo, the order in which the two treatments were given being determined at random. At the end of each week the patient had to complete a questionnaire, on the basis of which he was given an 'anxiety score' (with possible values from 0 to 30), high scores corresponding to states of anxiety. The results are shown in Table 1.

Table 1 Anxiety scores recorded for 10 patients receiving a new drug and placebo in random order

| Patient | Drug (1) | Placebo (2) | Difference $d_i$ (3) = (1) − (2) |
|---|---|---|---|
| 1 | 19 | 22 | -3 |
| 2 | 11 | 18 | -7 |
| 3 | 14 | 17 | -3 |
| 4 | 17 | 19 | -2 |
| 5 | 23 | 22 | 1 |
| 6 | 11 | 12 | -1 |
| 7 | 15 | 14 | 1 |
| 8 | 19 | 11 | 8 |
| 9 | 11 | 19 | -8 |
| 10 | 8 | 7 | 1 |

Paired-sample t test

① Setting the hypotheses

H0: , H1: , α = 0.05

② Calculating the value of the statistic (t value for t test)

$$t = \frac{\bar{d} - 0}{s_d / \sqrt{n}}$$

③ Making a judgment:
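As one possible way to check the arithmetic of the paired t statistic, the sketch below recomputes the differences from the anxiety-score table. Two assumptions are worth flagging: `statistics.stdev` divides by n − 1, which is the usual convention for t tests, and the critical value 2.262 is the two-sided 5% point of t with 9 degrees of freedom taken from a standard t table.

```python
import math
import statistics

drug = [19, 11, 14, 17, 23, 11, 15, 19, 11, 8]
placebo = [22, 18, 17, 19, 22, 12, 14, 11, 19, 7]

# Differences d_i = drug - placebo for each patient.
diffs = [d - p for d, p in zip(drug, placebo)]
n = len(diffs)

d_bar = statistics.mean(diffs)
s_d = statistics.stdev(diffs)            # assumes the n - 1 convention
t = (d_bar - 0) / (s_d / math.sqrt(n))   # paired t statistic

T_CRIT = 2.262  # t table, two-sided alpha = 0.05, 9 degrees of freedom
print(f"mean difference = {d_bar:.1f}, t = {t:.3f}, |t| > critical: {abs(t) > T_CRIT}")
```

Working on the differences turns the paired design into a one-sample test of whether the mean difference is zero.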

5. Cardiovascular disease, hypertension. Suppose a sample of 20 nonpregnant, premenopausal OC users aged 35-39 who work in a company is identified, with a mean systolic blood pressure of 132.86 mmHg and a sample standard deviation of 15.34 mmHg. A sample of 21 nonpregnant, premenopausal non-OC users aged 35-39 is similarly identified, with a mean systolic blood pressure of 127.44 mmHg and a sample standard deviation of 18.23 mmHg. What can be said about the underlying mean difference in blood pressure between the two populations?

| Group | n | Mean (mmHg) | SD (mmHg) |
|---|---|---|---|
| OC users | 20 | 132.86 | 15.34 |
| non-OC users | 21 | 127.44 | 18.23 |

Two independent-sample t test

① Setting the hypotheses

$H_0: \mu_1 = \mu_2$, $H_1: \mu_1 \ne \mu_2$, α = 0.05

② Calculating the value of the statistic (t value for t test)

$$t = \frac{\bar{x}_1 - \bar{x}_2}{s_{\bar{x}_1 - \bar{x}_2}} = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$

$\nu = n_1 + n_2 - 2$

③ Making a judgment:
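The pooled-variance t statistic for two independent samples can be computed from summary statistics alone. The sketch below plugs in the blood pressure figures from the table; the function name `pooled_t` is hypothetical, and the judgment step (comparing t with a t table at the returned degrees of freedom) is left to the reader.

```python
import math


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Two independent-sample t statistic with a pooled variance."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se_diff = math.sqrt(pooled_var * (1 / n1 + 1 / n2))  # SE of the mean difference
    return (mean1 - mean2) / se_diff, df


t, df = pooled_t(132.86, 15.34, 20, 127.44, 18.23, 21)
print(f"t = {t:.3f}, degrees of freedom = {df}")
```

Pooling assumes the two populations share a common variance; when the sample SDs differ greatly, that assumption deserves a second look before the result is interpreted.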
