Uploaded by Vivek Gangadharan

A.Ramesh

Department of Mechanical Engineering NIT Calicut

Email: ramesh@nitc.ac.in, ramesh.anbanandam@gmail.com

Phone: 0495-228-6540

So far…

• Importance of statistics in business
• Classification of statistics
• Classification of data
– Nominal
– Ordinal
– Interval
– Ratio
• Collecting data
• Difference between sample and population
• Finding a meaningful pattern in the data
– Frequency distribution
– Histogram
– Frequency polygons
– Ogives
– Pareto diagram
– Scatter plot

Measures of Central Tendency:

• Measures of central tendency yield information about “particular places or locations in a group of numbers.”
• A single number to describe the characteristic of a set of data

Summary statistics

• Central tendency or measures of location
– Arithmetic mean
– Weighted mean
– Geometric mean
– Median
– Percentile
• Dispersion
– Skewness
– Kurtosis
– Range
– Interfractile range
– Variance
– Standard score
– Coefficient of variation

Arithmetic Mean

• Commonly called ‘the mean’

• It is the average of a group of numbers

• Applicable for interval and ratio data

• Not applicable for nominal or ordinal data

• Affected by each value in the data set,

including extreme values

• Computed by summing all values in the

data set and dividing the sum by the number

of values in the data set

Population Mean

$$\mu = \frac{\sum X}{N} = \frac{X_1 + X_2 + X_3 + \cdots + X_N}{N}$$

$$\mu = \frac{24 + 13 + 19 + 26 + 11}{5} = \frac{93}{5} = 18.6$$

Sample Mean

$$\bar{X} = \frac{\sum X}{n} = \frac{X_1 + X_2 + X_3 + \cdots + X_n}{n}$$

$$\bar{X} = \frac{57 + 86 + 42 + 38 + 90 + 66}{6} = \frac{379}{6} = 63.167$$
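The two mean calculations can be checked with a few lines of Python. This is only a minimal sketch: the function name `arithmetic_mean` and the variable names are illustrative choices, not part of the slides.

```python
def arithmetic_mean(values):
    """Sum all values and divide by the number of values."""
    return sum(values) / len(values)

# Data taken from the population mean and sample mean slides.
population = [24, 13, 19, 26, 11]
sample = [57, 86, 42, 38, 90, 66]

mu = arithmetic_mean(population)      # population mean
x_bar = arithmetic_mean(sample)       # sample mean

print(round(mu, 1), round(x_bar, 3))
```

The formula is the same for both cases; only the notation (µ and N versus X̄ and n) differs.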

Mean of Grouped Data

• Weighted average of class midpoints

• Class frequencies are the weights

$$\mu = \frac{\sum fM}{\sum f} = \frac{\sum fM}{N} = \frac{f_1 M_1 + f_2 M_2 + f_3 M_3 + \cdots + f_i M_i}{f_1 + f_2 + f_3 + \cdots + f_i}$$

Calculation of Grouped Mean

Class Interval   Frequency (f)   Class Midpoint (M)   fM
20-under 30      6               25                   150
30-under 40      18              35                   630
40-under 50      11              45                   495
50-under 60      11              55                   605
60-under 70      3               65                   195
70-under 80      1               75                   75
Total            50                                   2150

$$\mu = \frac{\sum fM}{\sum f} = \frac{2150}{50} = 43.0$$
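As a rough sketch, the grouped mean can be computed by weighting each class midpoint by its frequency. Representing each class as a `(lower, upper, frequency)` tuple is an assumption made for this illustration.

```python
# Each class is (lower limit, upper limit, frequency); layout chosen for illustration.
classes = [(20, 30, 6), (30, 40, 18), (40, 50, 11),
           (50, 60, 11), (60, 70, 3), (70, 80, 1)]

def grouped_mean(classes):
    """Weighted average of class midpoints, with class frequencies as weights."""
    total_f = sum(f for _, _, f in classes)
    total_fm = sum(f * (lower + upper) / 2 for lower, upper, f in classes)
    return total_fm / total_f

mean_grouped = grouped_mean(classes)
print(mean_grouped)
```

Because every observation in a class is replaced by the class midpoint, the result approximates the mean of the raw data rather than reproducing it exactly.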

Median

• Middle value in an ordered array of

numbers.

• Applicable for ordinal, interval, and ratio

data

• Not applicable for nominal data

• Unaffected by extremely large and

extremely small values.

Median: Computational Procedure

• First Procedure

– Arrange the observations in an ordered array.

– If there is an odd number of terms, the median

is the middle term of the ordered array.

– If there is an even number of terms, the median

is the average of the middle two terms.

• Second Procedure

– The median’s position in an ordered array is

given by (n+1)/2.

Median: Example

with an Odd Number of Terms

Ordered Array

3 4 5 7 8 9 11 14 15 16 16 17 19 19 20 21 22

• Position of median = (n+1)/2 = (17+1)/2 = 9

• The median is the 9th term, 15.

• If the 22 is replaced by 100, the median is

15.

• If the 3 is replaced by -103, the median is

15.

Median: Example

with an Even Number of Terms

Ordered Array

3 4 5 7 8 9 11 14 15 16 16 17 19 19 20 21

• Position of median = (n+1)/2 = (16+1)/2 = 8.5

• The median is between the 8th and 9th terms,

14.5.

• If the 21 is replaced by 100, the median is 14.5.

• If the 3 is replaced by -88, the median is 14.5.
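The median procedure for odd and even counts can be sketched as follows. The function name `median_of` is a hypothetical helper, and the data are the two ordered arrays from the examples.

```python
def median_of(values):
    """Middle term of the ordered array, or the average of the two middle terms."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # odd count: the middle term
    return (ordered[mid - 1] + ordered[mid]) / 2  # even count: average of the middle two

odd_array = [3, 4, 5, 7, 8, 9, 11, 14, 15, 16, 16, 17, 19, 19, 20, 21, 22]
even_array = odd_array[:-1]                       # the same array without the final 22

print(median_of(odd_array), median_of(even_array))

# Replacing an extreme value leaves the median unchanged.
print(median_of(odd_array[:-1] + [100]), median_of([-88] + even_array[1:]))
```

The last line illustrates why the median is described as unaffected by extremely large or extremely small values: only the middle positions matter.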

Median of Grouped Data

$$\text{Median} = L + \frac{\frac{N}{2} - cf_p}{f_{med}}\,(W)$$

Where:

$L$ = the lower limit of the median class

$cf_p$ = cumulative frequency of the class preceding the median class

$f_{med}$ = frequency of the median class

$W$ = width of the median class

$N$ = total of frequencies

Median of Grouped Data -- Example

Class Interval   Frequency   Cumulative Frequency
20-under 30      6           6
30-under 40      18          24
40-under 50      11          35
50-under 60      11          46
60-under 70      3           49
70-under 80      1           50
                 N = 50

• The median class is 40-under 50, the first class whose cumulative frequency (35) reaches N/2 = 25.

$$Md = L + \frac{\frac{N}{2} - cf_p}{f_{med}}\,(W) = 40 + \frac{\frac{50}{2} - 24}{11}\,(10) = 40.909$$
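The grouped-median formula can be sketched in Python as below. The `(lower, upper, frequency)` tuple layout and the function name are assumptions for illustration. The median class is taken to be the first class whose cumulative frequency reaches N/2.

```python
classes = [(20, 30, 6), (30, 40, 18), (40, 50, 11),
           (50, 60, 11), (60, 70, 3), (70, 80, 1)]

def grouped_median(classes):
    """Median = L + ((N/2 - cfp) / fmed) * W, classes given in ascending order."""
    n = sum(f for _, _, f in classes)
    cf_p = 0  # cumulative frequency of the classes before the current one
    for lower, upper, f in classes:
        if cf_p + f >= n / 2:  # this class contains the N/2-th observation
            return lower + (n / 2 - cf_p) / f * (upper - lower)
        cf_p += f
    raise ValueError("no classes supplied")

md = grouped_median(classes)
print(round(md, 3))
```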

Mode

• The most frequently occurring value in a

data set

• Applicable to all levels of data

measurement (nominal, ordinal, interval,

and ratio)

• Bimodal -- Data sets that have two modes

• Multimodal -- Data sets that contain more

than two modes

Mode -- Example

• The mode is 44.
• There are more 44s than any other value.

35 41 44 45
37 41 44 46
37 43 44 46
39 43 44 46
40 43 44 46
40 43 45 48

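Mode of Grouped Data

• Modal class has the greatest frequency

Class Interval   Frequency
20-under 30      6
30-under 40      18
40-under 50      11
50-under 60      11
60-under 70      3
70-under 80      1

$$\text{Mode} = L_{Mo} + \left(\frac{d_1}{d_1 + d_2}\right) w = 30 + \left(\frac{12}{12 + 7}\right)(10) \approx 36.32$$

Here $L_{Mo}$ is the lower limit of the modal class (30-under 40), $w$ is its width, $d_1 = 18 - 6 = 12$ is the modal frequency minus the frequency of the preceding class, and $d_2 = 18 - 11 = 7$ is the modal frequency minus the frequency of the following class.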
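A minimal sketch of the grouped-mode interpolation follows. Two things here are assumptions made for illustration rather than anything stated on the slide: the `(lower, upper, frequency)` tuple layout, and treating a missing neighbouring class as having frequency 0.

```python
classes = [(20, 30, 6), (30, 40, 18), (40, 50, 11),
           (50, 60, 11), (60, 70, 3), (70, 80, 1)]

def grouped_mode(classes):
    """Mode = L_Mo + d1 / (d1 + d2) * w, using the class with the greatest frequency."""
    freqs = [f for _, _, f in classes]
    k = freqs.index(max(freqs))                        # position of the modal class
    lower, upper, f_mo = classes[k]
    f_prev = freqs[k - 1] if k > 0 else 0              # assumption: 0 if no preceding class
    f_next = freqs[k + 1] if k < len(freqs) - 1 else 0 # assumption: 0 if no following class
    d1 = f_mo - f_prev
    d2 = f_mo - f_next
    return lower + d1 / (d1 + d2) * (upper - lower)

mode_value = grouped_mode(classes)
print(round(mode_value, 3))
```

For the example table the result is 36.316 to three decimals (30 + 120/19).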

Percentiles

• Measures of central tendency that divide a group

of data into 100 parts

• At least n% of the data lie below the nth

percentile, and at most (100 - n)% of the data lie

above the nth percentile

• Example: 90th percentile indicates that at least

90% of the data lie below it, and at most 10% of

the data lie above it

• The median and the 50th percentile have the same

value.

• Applicable for ordinal, interval, and ratio data

• Not applicable for nominal data

Percentiles: Computational Procedure

• Organize the data into an ascending ordered array.

• Calculate the percentile location:

$$i = \frac{P}{100}\,(n)$$

• Determine the percentile’s location and its value.

• If i is a whole number, the percentile is the

average of the values at the i and (i+1) positions.

• If i is not a whole number, the percentile is at the

(i+1) position in the ordered array.

Percentiles: Example

• Raw Data: 14, 12, 19, 23, 5, 13, 28, 17

• Ordered Array: 5, 12, 13, 14, 17, 19, 23, 28

• Location of 30th percentile:

$$i = \frac{30}{100}\,(8) = 2.4$$

• The location index, i, is not a whole number; i+1 =

2.4+1=3.4; the whole number portion is 3; the

30th percentile is at the 3rd location of the array;

the 30th percentile is 13.
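The percentile procedure can be sketched as below. The function name is hypothetical. The sketch assumes P is an integer with 0 < P < 100, and it uses integer arithmetic (`divmod`) to decide whether i is a whole number, avoiding floating-point rounding.

```python
def percentile_value(values, p):
    """Percentile by the location rule i = (P / 100) * n; assumes integer 0 < p < 100."""
    ordered = sorted(values)
    whole, remainder = divmod(p * len(ordered), 100)   # i = whole + remainder / 100
    if remainder == 0:
        # i is a whole number: average the values at positions i and i + 1 (1-based).
        return (ordered[whole - 1] + ordered[whole]) / 2
    # i is not a whole number: take the value at the whole-number part of i + 1 (1-based).
    return ordered[whole]

raw = [14, 12, 19, 23, 5, 13, 28, 17]
p30 = percentile_value(raw, 30)
print(p30)
```

Statistical packages often interpolate between neighbouring values instead, so their percentiles may differ slightly from this rule.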

Dispersion

• Reliability of measure of central tendency

• To compare dispersion of various samples

Variability

[Figure: cash flow series plotted against their mean; one series is labelled “No Variability in Cash Flow”.]

Measures of Variability or dispersion

– Range

– Interfractile range

– Deviation of Variance

– Standard Deviation

Measures of Variability:

Ungrouped Data

• Measures of variability describe the spread

or the dispersion of a set of data.

• Common Measures of Variability

– Range

– Interquartile Range

– Mean Absolute Deviation

– Variance

– Standard Deviation

– Z scores

– Coefficient of Variation

Range

• The difference between the largest and the smallest values in a set of data
• Simple to compute
• Ignores all data points except the two extremes
• Example: Range = Largest - Smallest = 48 - 35 = 13

35 41 44 45
37 41 44 46
37 43 44 46
39 43 44 46
40 43 44 46
40 43 45 48

Quartiles

• Measures of central tendency that divide a group of

data into four subgroups

• Q1: 25% of the data set is below the first quartile

• Q2: 50% of the data set is below the second quartile

• Q3: 75% of the data set is below the third quartile

• Q1 is equal to the 25th percentile

• Q2 is located at 50th percentile and equals the median

• Q3 is equal to the 75th percentile

• Quartile values are not necessarily members of the

data set

Quartiles

[Figure: an ordered data set divided into four parts by Q1, Q2, and Q3.]

Quartiles: Example

• Ordered array: 106, 109, 114, 116, 121, 122, 125, 129

• Q1:

$$i = \frac{25}{100}\,(8) = 2, \qquad Q_1 = \frac{109 + 114}{2} = 111.5$$

• Q2:

$$i = \frac{50}{100}\,(8) = 4, \qquad Q_2 = \frac{116 + 121}{2} = 118.5$$

• Q3:

$$i = \frac{75}{100}\,(8) = 6, \qquad Q_3 = \frac{122 + 125}{2} = 123.5$$
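The three quartiles of the example can be reproduced with the same location rule used for percentiles. This is a sketch under the same assumptions as before: the helper name `value_at_percentile` is made up, and P is an integer strictly between 0 and 100. The last line also computes the spread of the middle half, Q3 − Q1.

```python
def value_at_percentile(ordered, p):
    """Apply i = (P / 100) * n to an array that is already sorted in ascending order."""
    whole, remainder = divmod(p * len(ordered), 100)
    if remainder == 0:
        return (ordered[whole - 1] + ordered[whole]) / 2  # average of positions i and i + 1
    return ordered[whole]                                  # position given by whole part of i + 1

data = [106, 109, 114, 116, 121, 122, 125, 129]

q1, q2, q3 = (value_at_percentile(data, p) for p in (25, 50, 75))
iqr = q3 - q1   # interquartile range: Q3 - Q1

print(q1, q2, q3, iqr)
```

None of the three quartiles is a member of the data set here, which matches the remark that quartile values are not necessarily data values.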

Interquartile Range

• Range of values between the first and third quartiles
• Range of the “middle half”
• Less influenced by extremes

$$\text{Interquartile Range} = Q_3 - Q_1$$

Deviation from the Mean

• Data set: 5, 9, 16, 17, 18

• Mean:

$$\mu = \frac{\sum X}{N} = \frac{65}{5} = 13$$

• Deviations from the mean: -8, -4, 3, 4, 5

[Figure: number line from 0 to 20 marking µ and the deviations -8, -4, +3, +4, +5 of the data points.]

Mean Absolute Deviation

• Average of the absolute deviations from the

mean

X       X − µ    |X − µ|
5       -8       +8
9       -4       +4
16      +3       +3
17      +4       +4
18      +5       +5
Total   0        24

$$M.A.D. = \frac{\sum |X - \mu|}{N} = \frac{24}{5} = 4.8$$

Population Variance

• Average of the squared deviations from the

arithmetic mean

X       X − µ    (X − µ)²
5       -8       64
9       -4       16
16      +3       9
17      +4       16
18      +5       25
Total   0        130

$$\sigma^2 = \frac{\sum (X - \mu)^2}{N} = \frac{130}{5} = 26.0$$

Population Standard Deviation

• Square root of the variance

X       X − µ    (X − µ)²
5       -8       64
9       -4       16
16      +3       9
17      +4       16
18      +5       25
Total   0        130

$$\sigma^2 = \frac{\sum (X - \mu)^2}{N} = \frac{130}{5} = 26.0$$

$$\sigma = \sqrt{\sigma^2} = \sqrt{26.0} = 5.1$$
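The deviation-based measures for the data set 5, 9, 16, 17, 18 can be checked with the short sketch below. Variable names are illustrative, and the data are treated as a complete population, so the divisor is N.

```python
import math

data = [5, 9, 16, 17, 18]
n = len(data)

mu = sum(data) / n                              # population mean
deviations = [x - mu for x in data]             # these always sum to zero

mad = sum(abs(d) for d in deviations) / n       # mean absolute deviation
variance = sum(d ** 2 for d in deviations) / n  # population variance
sigma = math.sqrt(variance)                     # population standard deviation

print(mu, mad, variance, round(sigma, 1))
```

Because the raw deviations cancel out, they are made non-negative before averaging, either by taking absolute values (M.A.D.) or by squaring (variance).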

Sample Variance

• Average of the squared deviations from the

arithmetic mean

X        X − X̄    (X − X̄)²
2,398    625      390,625
1,844    71       5,041
1,539    -234     54,756
1,311    -462     213,444
7,092    0        663,866

$$S^2 = \frac{\sum (X - \bar{X})^2}{n - 1} = \frac{663{,}866}{3} = 221{,}288.67$$

Sample Standard Deviation

• Square root of the sample variance

X        X − X̄    (X − X̄)²
2,398    625      390,625
1,844    71       5,041
1,539    -234     54,756
1,311    -462     213,444
7,092    0        663,866

$$S^2 = \frac{\sum (X - \bar{X})^2}{n - 1} = \frac{663{,}866}{3} = 221{,}288.67$$

$$S = \sqrt{S^2} = \sqrt{221{,}288.67} = 470.41$$
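The sample variance and standard deviation differ from the population versions only in the divisor, n − 1 instead of N. The sketch below works through the four-value example by hand and then cross-checks against `statistics.variance` from the Python standard library, which also uses the n − 1 divisor. Variable names are illustrative.

```python
import math
import statistics

data = [2398, 1844, 1539, 1311]
n = len(data)

x_bar = sum(data) / n                                # sample mean
sum_sq_dev = sum((x - x_bar) ** 2 for x in data)     # sum of squared deviations
s_squared = sum_sq_dev / (n - 1)                     # sample variance (divisor n - 1)
s = math.sqrt(s_squared)                             # sample standard deviation

print(x_bar, sum_sq_dev, round(s_squared, 2), round(s, 2))

# Cross-check with the standard library's sample variance.
print(math.isclose(s_squared, statistics.variance(data)))
```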

Uses of Standard Deviation

• Indicator of financial risk

• Quality Control

– construction of quality control charts

– process capability studies

• Comparing populations

– household incomes in two cities

– employee absenteeism at two plants

Standard Deviation as an Indicator of Financial Risk

Annualized Rate of Return

Financial Security   µ     σ
A                    15%   3%
B                    15%   7%

Empirical Rule

• Data are normally distributed (or approximately normal)

Distance from the Mean   Percentage of Values Falling Within Distance
µ ± 1σ                   68
µ ± 2σ                   95
µ ± 3σ                   99.7

Coefficient of Variation

• Ratio of the standard deviation to the mean,

expressed as a percentage

• Measurement of relative dispersion

$$C.V. = \frac{\sigma}{\mu}\,(100)$$

Coefficient of Variation

$$\mu_1 = 29, \quad \sigma_1 = 4.6: \qquad C.V._1 = \frac{\sigma_1}{\mu_1}\,(100) = \frac{4.6}{29}\,(100) = 15.86$$

$$\mu_2 = 84, \quad \sigma_2 = 10: \qquad C.V._2 = \frac{\sigma_2}{\mu_2}\,(100) = \frac{10}{84}\,(100) = 11.90$$
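A short sketch of the comparison follows; the function name is an illustrative choice.

```python
def coefficient_of_variation(sigma, mu):
    """Standard deviation as a percentage of the mean."""
    return sigma / mu * 100

cv1 = coefficient_of_variation(4.6, 29)
cv2 = coefficient_of_variation(10, 84)

print(round(cv1, 2), round(cv2, 2))
```

The second data set has the larger standard deviation (10 versus 4.6) but the smaller coefficient of variation, which is why the C.V. is used to compare dispersion between data sets with different means.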

Variance and Standard Deviation of Grouped Data

Population:

$$\sigma^2 = \frac{\sum f (M - \mu)^2}{N}, \qquad \sigma = \sqrt{\sigma^2}$$

Sample:

$$S^2 = \frac{\sum f (M - \bar{X})^2}{n - 1}, \qquad S = \sqrt{S^2}$$

Population Variance and Standard Deviation of Grouped Data (µ = 43)

Class Interval   f    M    fM     M − µ   (M − µ)²   f(M − µ)²
20-under 30      6    25   150    -18     324        1944
30-under 40      18   35   630    -8      64         1152
40-under 50      11   45   495    2       4          44
50-under 60      11   55   605    12      144        1584
60-under 70      3    65   195    22      484        1452
70-under 80      1    75   75     32      1024       1024
Total            50        2150                      7200

$$\sigma^2 = \frac{\sum f (M - \mu)^2}{N} = \frac{7200}{50} = 144$$

$$\sigma = \sqrt{\sigma^2} = \sqrt{144} = 12$$
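The grouped population variance can be sketched as below. It uses all six classes of the grouped-data example, including the 20-under 30 class with frequency 6, which is needed for the totals of 50 and 7200. The tuple layout and variable names are assumptions for illustration.

```python
import math

# (lower limit, upper limit, frequency) for each class of the grouped-data example.
classes = [(20, 30, 6), (30, 40, 18), (40, 50, 11),
           (50, 60, 11), (60, 70, 3), (70, 80, 1)]

midpoints = [(lower + upper) / 2 for lower, upper, _ in classes]
freqs = [f for _, _, f in classes]

n = sum(freqs)                                                # N = total of frequencies
mu = sum(f * m for f, m in zip(freqs, midpoints)) / n         # grouped mean
weighted_sq = sum(f * (m - mu) ** 2 for f, m in zip(freqs, midpoints))
variance = weighted_sq / n                                    # population variance
sigma = math.sqrt(variance)                                   # population standard deviation

print(n, mu, weighted_sq, variance, sigma)
```

For sample data the only change would be dividing `weighted_sq` by `n - 1` instead of `n`.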

Measures of Shape

• Skewness

– Absence of symmetry

– Extreme values in one side of a distribution

• Kurtosis

– Peakedness of a distribution

– Leptokurtic: high and thin

– Mesokurtic: normal shape

– Platykurtic: flat and spread out

• Box and Whisker Plots

– Graphic display of a distribution

– Reveals skewness

Skewness

[Figure: three distribution curves labelled Skewed, Symmetric (Not Skewed), and Skewed.]

Skewness..

The skewness of a distribution is measured by comparing the

relative positions of the mean, median and mode.

• Distribution is symmetrical

• Mean = Median = Mode

• Distribution skewed right

• Median lies between mode and mean,

and mode is less than mean

• Distribution skewed left

• Median lies between mode and mean,

and mode is greater than mean

Skewness

[Figure: three distribution curves labelled Skewed, Symmetric (Not Skewed), and Skewed, with the positions of the mode and median marked on each.]

Coefficient of Skewness

• Summary measure for skewness

$$S = \frac{3(\mu - M_d)}{\sigma}$$

• If S < 0, the distribution is negatively skewed

(skewed to the left).

• If S = 0, the distribution is symmetric (not

skewed).

• If S > 0, the distribution is positively skewed

(skewed to the right).

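Coefficient of Skewness

$$\mu_1 = 23, \quad M_{d1} = 26, \quad \sigma_1 = 12.3: \qquad S_1 = \frac{3(\mu_1 - M_{d1})}{\sigma_1} = \frac{3(23 - 26)}{12.3} = -0.73$$

$$\mu_2 = 26, \quad M_{d2} = 26, \quad \sigma_2 = 12.3: \qquad S_2 = \frac{3(\mu_2 - M_{d2})}{\sigma_2} = \frac{3(26 - 26)}{12.3} = 0$$

$$\mu_3 = 29, \quad M_{d3} = 26, \quad \sigma_3 = 12.3: \qquad S_3 = \frac{3(\mu_3 - M_{d3})}{\sigma_3} = \frac{3(29 - 26)}{12.3} = +0.73$$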
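The three cases can be verified with a small sketch; the function name is hypothetical.

```python
def skewness_coefficient(mu, median, sigma):
    """S = 3 * (mean - median) / standard deviation."""
    return 3 * (mu - median) / sigma

s1 = skewness_coefficient(23, 26, 12.3)   # mean below median
s2 = skewness_coefficient(26, 26, 12.3)   # mean equals median
s3 = skewness_coefficient(29, 26, 12.3)   # mean above median

print(round(s1, 2), round(s2, 2), round(s3, 2))
```

The sign of S depends only on whether the mean lies below or above the median. Dividing by σ rescales the gap so that data sets with different spreads can be compared.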

Kurtosis

• Peakedness of a distribution

– Leptokurtic: high and thin

– Mesokurtic: normal in shape

– Platykurtic: flat and spread out

[Figure: three distribution curves labelled Leptokurtic, Mesokurtic, and Platykurtic.]

Box and Whisker Plot

• Five specific values are used:

– Median, Q2

– First quartile, Q1

– Third quartile, Q3

– Minimum value in the data set

– Maximum value in the data set

Box and Whisker Plot

[Figure: box and whisker plot marking Minimum, Q1, Q2, Q3, and Maximum.]

Skewness: Box and Whisker Plots, and Coefficient of Skewness

[Figure: three box and whisker plots labelled S < 0 (Skewed), S = 0 (Not Skewed), and S > 0 (Skewed).]

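Pearson Correlation Coefficient

$$r = \frac{SS_{XY}}{\sqrt{(SS_X)(SS_Y)}} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2 \sum (Y - \bar{Y})^2}}$$

$$r = \frac{\sum XY - \dfrac{(\sum X)(\sum Y)}{n}}{\sqrt{\left[\sum X^2 - \dfrac{(\sum X)^2}{n}\right]\left[\sum Y^2 - \dfrac{(\sum Y)^2}{n}\right]}}$$

$$-1 \le r \le 1$$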

Three Degrees of Correlation

[Figure: three scatter plots labelled r < 0, r > 0, and r = 0.]

Computation of r for an Example

Day          Interest (X)   Futures Index (Y)   X²        Y²        XY
1            7.43           221                 55.205    48,841    1,642.03
2            7.48           222                 55.950    49,284    1,660.56
3            8.00           226                 64.000    51,076    1,808.00
4            7.75           225                 60.063    50,625    1,743.75
5            7.60           224                 57.760    50,176    1,702.40
6            7.63           223                 58.217    49,729    1,701.49
7            7.68           223                 58.982    49,729    1,712.64
8            7.67           226                 58.829    51,076    1,733.42
9            7.59           226                 57.608    51,076    1,715.34
10           8.07           235                 65.125    55,225    1,896.45
11           8.03           233                 64.481    54,289    1,870.99
12           8.00           241                 64.000    58,081    1,928.00
Summations   92.93          2,725               720.220   619,207   21,115.07

Computation of r for the Example

$$r = \frac{\sum XY - \dfrac{(\sum X)(\sum Y)}{n}}{\sqrt{\left[\sum X^2 - \dfrac{(\sum X)^2}{n}\right]\left[\sum Y^2 - \dfrac{(\sum Y)^2}{n}\right]}}$$

$$r = \frac{21{,}115.07 - \dfrac{(92.93)(2725)}{12}}{\sqrt{\left[720.22 - \dfrac{(92.93)^2}{12}\right]\left[619{,}207 - \dfrac{(2725)^2}{12}\right]}} = .815$$
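The computational formula for r can be sketched in Python using the twelve days of interest and futures index data from the example. The function and variable names are illustrative assumptions.

```python
import math

interest = [7.43, 7.48, 8.00, 7.75, 7.60, 7.63, 7.68, 7.67, 7.59, 8.07, 8.03, 8.00]
futures = [221, 222, 226, 225, 224, 223, 223, 226, 226, 235, 233, 241]

def pearson_r(xs, ys):
    """r = SS_XY / sqrt(SS_X * SS_Y), computed from the column sums."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    ss_xy = sum_xy - sum_x * sum_y / n
    ss_x = sum_x2 - sum_x ** 2 / n
    ss_y = sum_y2 - sum_y ** 2 / n
    return ss_xy / math.sqrt(ss_x * ss_y)

r = pearson_r(interest, futures)
print(round(r, 3))
```

The sums form is convenient for hand calculation from a table. With floating-point data that have a large mean and small spread, the deviation form Σ(X − X̄)(Y − Ȳ) is usually the more numerically stable choice.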

Scatter Plot and Correlation Matrix for the Example

[Figure: scatter plot of Futures Index (vertical axis, 220 to 245) against Interest (horizontal axis, 7.40 to 8.20).]
