charts

Attribution Non-Commercial (BY-NC)


FOR ENGINEERING AND SCIENCE

PROBABILITY

Informally, probable is one of several words applied to uncertain events or knowledge, being closely related in meaning to likely, risky, hazardous, and doubtful. Chance, odds, and bet are other words expressing similar notions.

A probability is a number between 0 and 1. If the probability is 1 then the event is certain to occur, and if the probability is 0 then that event will never occur.

STATISTICS

Statistics is a mathematical science pertaining to collection,

analysis, interpretation and presentation of data. It is

applicable to a wide variety of academic disciplines from the

physical and social sciences to the humanities, as well as to

business, government, medicine and industry.

Statistical methods can be used to summarize or describe the data; this use is called descriptive statistics.

Application areas include genomics, computational biology, survival analysis, statistical genetics, portfolio optimization and management, financial risk management, and credit rating/scoring.

Statistical Packages

R

http://cran.r-project.org/bin/windows/base/

http://stat.ethz.ch/R-manual/R-patched/doc/html/

http://www.omegahat.org/REventLoop/man.pdf

http://www.r-project.org/other-docs.html

SAS

http://v8doc.sas.com/sashtml/

Descriptive Statistics

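Descriptive statistics summarize the data, either numerically or graphically, to describe the sample.

Numerical descriptors include the mean and standard deviation.

Graphical summaries include various kinds of charts and graphs.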

Pareto Charts

A Pareto chart is a bar graph for qualitative data, with the bars arranged in descending order of frequency.

How To Construct A Pareto Chart

First, divide the data into groups (also called segments, bins or categories).

For example, to analyze the issues associated with processing credit card applications, you could group the data into the following categories:

No signature

Residential address not valid

Non-legible handwriting

Already a customer

Other

The left-side vertical axis of the Pareto chart is labeled with the frequency (the number of counts for each category), the right-side vertical axis is the cumulative percentage, and the horizontal axis is labeled with the group names of your response variables.
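The construction above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: it borrows the accidental-death counts tabulated in the pie chart section of this deck, and the function name `pareto_table` is made up for the example. It sorts the categories by count and builds the cumulative-percentage column used on the right-side axis.

```python
# Counts by category, taken from the accidental-deaths table in this deck.
counts = {
    "Motor Vehicle": 43500,
    "Falls": 12200,
    "Poison": 6400,
    "Drowning": 4600,
    "Fire": 4200,
    "Ingestion of Food/Object": 2900,
    "Firearms": 1400,
}


def pareto_table(counts):
    """Return (category, count, cumulative percent) rows, largest count first."""
    total = sum(counts.values())
    rows = []
    running = 0
    # Sort categories by descending frequency, as a Pareto chart requires.
    for name, count in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
        running += count
        rows.append((name, count, 100.0 * running / total))
    return rows


pareto_rows = pareto_table(counts)
for name, count, cum_pct in pareto_rows:
    print(f"{name:<26}{count:>8}{cum_pct:>8.1f}%")
```

The first two rows already show the 80/20 idea: motor vehicles and falls together account for roughly three quarters of the total.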

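What Questions The Pareto Chart Answers

Which 20% of the sources are causing 80% of the problems (the 80/20 Rule)?

Where should we focus our efforts to achieve the greatest improvements?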

Pareto Chart

[Figure: Pareto chart titled Accidental Deaths. Vertical axis: frequency, 0 to 45,000. Horizontal axis: type of accident, including Vehicles, Poison, and Ingestion of Food/Object.]

Stem-and-Leaf Plots

A stem-and-leaf plot lets us see the distribution of the data, yet not lose the actual data points.

Consider the following example. The 40 values below are temperatures for a city over a span of days:

78 76 82 75 85 82 78 74 83 90

70 76 85 92 87 67 65 68 73 74

83 88 86 85 92 90 82 75 69 80

85 77 86 85 90 85 80 70 65 60

Stem-and-Leaf Plots

We can see that the data ranges from about 60 to about 95.

60 65 65 67 68 69 70 70 73 74

74 75 75 76 76 77 78 78 80 80

82 82 82 83 83 85 85 85 85 85

85 86 86 87 88 90 90 90 92 92

Stem Leaves

6 055789

7 003445566788

8 00222335555556678

9 00022


If you look at the page sideways you can see the distribution of the data. The same rule that says you should use 5 to 20 classes of data in a histogram applies to a stem-and-leaf diagram. The diagram could be expanded to include more rows, or condensed to include fewer rows.
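As a rough sketch of how the plot above is built, the following Python groups each temperature by its tens digit (the stem) and collects the ones digits (the leaves) in sorted order. The function name `stem_and_leaf` is illustrative only.

```python
from collections import defaultdict

# The 40 temperature readings from the example.
temps = [78, 76, 82, 75, 85, 82, 78, 74, 83, 90,
         70, 76, 85, 92, 87, 67, 65, 68, 73, 74,
         83, 88, 86, 85, 92, 90, 82, 75, 69, 80,
         85, 77, 86, 85, 90, 85, 80, 70, 65, 60]


def stem_and_leaf(values):
    """Map each tens digit (stem) to a string of sorted ones digits (leaves)."""
    stems = defaultdict(list)
    for v in sorted(values):
        stems[v // 10].append(str(v % 10))
    return {stem: "".join(leaves) for stem, leaves in sorted(stems.items())}


plot = stem_and_leaf(temps)
print("Stem Leaves")
for stem, leaves in plot.items():
    print(stem, leaves)
```

Because the leaves keep every ones digit, the original values can be read back from the plot, which is the property the slide emphasizes.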

HISTOGRAM

A histogram is a bar graph with a horizontal scale for values of the data being represented, a vertical scale for frequencies, and bars representing the frequency of each class of values.

A relative frequency histogram has the same shape and horizontal scale as the histogram, but the vertical scale is marked with relative frequencies.

Each bar extends from the lower class boundary at the left to the upper class boundary at the right.
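To make the difference between frequencies and relative frequencies concrete, here is a small sketch that reuses the temperature data from the stem-and-leaf example. The class width of 10 starting at 60 is an assumption chosen to mirror the stems; it gives only four classes, fewer than the 5 to 20 the deck recommends, so treat it purely as an illustration. The helper name `class_frequencies` is hypothetical.

```python
# Temperature data from the stem-and-leaf example.
temps = [78, 76, 82, 75, 85, 82, 78, 74, 83, 90,
         70, 76, 85, 92, 87, 67, 65, 68, 73, 74,
         83, 88, 86, 85, 92, 90, 82, 75, 69, 80,
         85, 77, 86, 85, 90, 85, 80, 70, 65, 60]


def class_frequencies(values, start, width, n_classes):
    """Count how many values fall in each class [start + i*width, start + (i+1)*width)."""
    freqs = [0] * n_classes
    for v in values:
        index = (v - start) // width
        if 0 <= index < n_classes:
            freqs[index] += 1
    return freqs


freqs = class_frequencies(temps, start=60, width=10, n_classes=4)
# Relative frequency = class frequency divided by the number of observations.
rel_freqs = [f / len(temps) for f in freqs]

for i, (f, r) in enumerate(zip(freqs, rel_freqs)):
    lower = 60 + 10 * i
    print(f"{lower}-{lower + 9}: {'#' * f} (frequency {f}, relative {r:.3f})")
```

The bars have the same shape in both versions; only the vertical scale changes.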

HISTOGRAM

Histograms give information about the shape of the data distributions and are not limited by the size of the data set.

The purpose of a histogram is to take data collected from a process and display it graphically to view how the distribution of the data centers itself around the mean, or main specification. From the data, the histogram will graphically show:

The spread of the data.

Any data skewness.

The presence of outliers (product outside the specification range).

The presence of multiple modes (or peaks) within the data.


BOXPLOT

In 1977, John Tukey published an efficient method for displaying a five-number data summary. The graph is called a boxplot and summarizes the following statistical measures:

- median
- upper and lower quartile
- minimum and maximum value

Histograms give a good picture of the shape of a distribution (symmetry, skewness), but they are not good tools for making comparisons among datasets. Boxplots are ideal for making comparisons.

John Wilder Tukey

John Wilder Tukey (June 16, 1915 - July 26, 2000) was a statistician born in New Bedford, Massachusetts. Tukey obtained an A.B. in 1936 and an Sc.M. in 1937, both in Chemistry, from Brown University, before moving to Princeton University, where he received his Ph.D. in mathematics. During World War II, Tukey worked at the Fire Control Research Office and collaborated with Samuel Wilks and William Cochran. After the war, he returned to Princeton, dividing his time between the university and AT&T Bell Laboratories.

Lottery payoffs for winning numbers for three time periods (May 1975-March 1976, November 1976-September 1977, and December 1980-September 1981).

Boxplots

The median is indicated by the center line, and the first and third quartiles are the edges of the red area, which is known as the inter-quartile range (IQR).

The extreme values (within 1.5 times the inter-quartile range from the upper or lower quartile) are the ends of the lines extending from the IQR. Points more than 1.5 times the IQR beyond the upper or lower quartile are plotted individually as asterisks. These points represent potential outliers.

median values. The IQR is decreasing from one time

period to the next, indicating reduced variability of

payoffs in the second and third periods. In addition, the

extreme values are closer to the median in the later

time periods.

Dot Plot

In a dot plot, each data entry is plotted, using a point, above

a horizontal axis.

Example: use a dot plot to display the ages of the students in a statistics class.

Ages of Students

18 20 21 27 29 20

19 30 32 19 34 19

24 29 18 37 38 22

30 39 32 44 33 46

54 49 18 51 21 21

Dot Plot

[Figure: dot plot titled Ages of Students. Horizontal axis: age, from 15 to 57 in steps of 3.]

From this graph, we can conclude that most of the values lie between 18 and 32.
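A text-only version of this dot plot can be sketched in Python by counting how often each age occurs and printing one dot per occurrence. This is only an illustration using the 30 ages listed above; it also counts how many ages fall between 18 and 32.

```python
from collections import Counter

# Ages of the 30 students from the example.
ages = [18, 20, 21, 27, 29, 20,
        19, 30, 32, 19, 34, 19,
        24, 29, 18, 37, 38, 22,
        30, 39, 32, 44, 33, 46,
        54, 49, 18, 51, 21, 21]

# One dot per data entry, stacked at the value it represents.
dot_counts = Counter(ages)
for age in sorted(dot_counts):
    print(f"{age:>2} " + "." * dot_counts[age])

# How many of the ages lie between 18 and 32 inclusive?
in_range = sum(1 for a in ages if 18 <= a <= 32)
print(f"{in_range} of {len(ages)} ages lie between 18 and 32")
```

Here 20 of the 30 ages fall in that interval, which is what the plot shows as a cluster at the left.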

Pie Chart

A pie chart is a circle that is divided into sectors that represent

categories. The area of each sector is proportional to the

frequency of each category.

Type                        Frequency
Motor Vehicle               43,500
Falls                       12,200
Poison                      6,400
Drowning                    4,600
Fire                        4,200
Ingestion of Food/Object    2,900
Firearms                    1,400

(Source: US Dept. of Transportation)

Pie Chart

To create a pie chart for the data, find the relative frequency (percent) of each category.

Type                        Frequency   Relative Frequency
Motor Vehicle               43,500      0.578
Falls                       12,200      0.162
Poison                      6,400       0.085
Drowning                    4,600       0.061
Fire                        4,200       0.056
Ingestion of Food/Object    2,900       0.039
Firearms                    1,400       0.019

n = 75,200

Pie Chart

Next, find the central angle. To find the central angle, multiply the relative frequency by 360°.

Type                        Frequency   Relative Frequency   Angle
Motor Vehicle               43,500      0.578                208.2°
Falls                       12,200      0.162                58.4°
Poison                      6,400       0.085                30.6°
Drowning                    4,600       0.061                22.0°
Fire                        4,200       0.056                20.1°
Ingestion of Food/Object    2,900       0.039                13.9°
Firearms                    1,400       0.019                6.7°
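The two columns above can be reproduced with a short Python sketch. It assumes the angle is computed from the unrounded relative frequency, which is what matches the tabulated angles (for example 208.2° rather than 0.578 × 360° = 208.1°).

```python
# Frequencies from the accidental-deaths table.
counts = {
    "Motor Vehicle": 43500,
    "Falls": 12200,
    "Poison": 6400,
    "Drowning": 4600,
    "Fire": 4200,
    "Ingestion of Food/Object": 2900,
    "Firearms": 1400,
}

total = sum(counts.values())  # n = 75,200

# For each category: (relative frequency, central angle in degrees).
pie_rows = {
    name: (count / total, 360.0 * count / total)
    for name, count in counts.items()
}

for name, (rel_freq, angle) in pie_rows.items():
    print(f"{name:<26}{rel_freq:>7.3f}{angle:>8.1f} degrees")
```

The central angles always add up to 360°, which is a quick check on the arithmetic.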

Pie Chart

[Figure: pie chart of accidental deaths. Motor vehicles 57.8%, Falls 16.2%, Poison 8.5%, Drowning 6.1%, Fire 5.6%, Ingestion 3.9%, Firearms 1.9%.]

Time Series Chart

A data set that is composed of quantitative data entries taken at

regular intervals over a period of time is a time series. A time

series chart is used to graph a time series.

Example:

The following table lists the number of minutes Robert used on his cell phone for the last six months. Construct a time series chart for the number of minutes used.

Month       Minutes
January     236
February    242
March       188
April       175
May         199
June        135

Time Series Chart

[Figure: time series chart titled Robert's Cell Phone Usage. Vertical axis: Minutes, 0 to 250. Horizontal axis: Month, Jan to June.]

Quartiles and Percentiles

The percentile rank of a measure is the percent of the total frequency scored at or below that measure.

Quartiles divide the ordered data into four equal parts: 25%, 50%, 75%, 100%.

In an ordered array, the lower quartile is the middle value of the half of the data below the median, and the upper quartile is the middle value of the half of the data above the median.

Quartiles and Percentiles

i x[i]

1 102

2 104

3 105 ---- the first quartile, Q1 = 105

4 107

5 108

6 109 ---- the second quartile, Q2 or median = 109

7 110

8 112

9 115 ---- the third quartile, Q3 = 115

10 115

11 118
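The quartiles marked in the table can be checked with a small Python sketch. It follows the median-of-halves rule described in this section, and for an odd number of values it leaves the median out of both halves. That is one common convention among several, so other software may report slightly different quartiles. The function name `quartiles` is illustrative.

```python
from statistics import median


def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves rule.

    When the number of values is odd, the median itself is excluded
    from both the lower and the upper half.
    """
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    upper = data[half + 1:] if n % 2 else data[half:]
    return median(lower), median(data), median(upper)


x = [102, 104, 105, 107, 108, 109, 110, 112, 115, 115, 118]
q1, q2, q3 = quartiles(x)
print(f"Q1 = {q1}, Q2 = {q2}, Q3 = {q3}")
```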

Quartiles and Percentiles

smallest non-outlier observation = 5 (left "whisker")

lower (first) quartile (Q1, x.25) = 7

median (second quartile) (Med, x.5) = 8.5

upper (third) quartile (Q3, x.75) = 9

largest non-outlier observation = 10

interquartile range, IQR = Q3 − Q1 = 2

the value 3.5 is a "mild" outlier, between 1.5*(IQR) and

3*(IQR) below Q1

the value 0.5 is an "extreme" outlier, more than 3*(IQR) below

Q1

the data is skewed to the left (negatively skewed)
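The mild and extreme outlier labels above follow from fences placed 1.5 and 3 IQRs beyond the quartiles. The sketch below applies those fences to the quartiles given in this summary (Q1 = 7, Q3 = 9); the function name `classify` is made up for the example.

```python
def classify(x, q1, q3):
    """Label x relative to the 1.5*IQR and 3*IQR fences around the quartiles."""
    iqr = q3 - q1
    if x < q1 - 3 * iqr or x > q3 + 3 * iqr:
        return "extreme outlier"
    if x < q1 - 1.5 * iqr or x > q3 + 1.5 * iqr:
        return "mild outlier"
    return "not an outlier"


# Quartiles from the summary: Q1 = 7, Q3 = 9, so IQR = 2.
# Lower fences: 7 - 1.5*2 = 4 (mild) and 7 - 3*2 = 1 (extreme).
for value in (0.5, 3.5, 5, 10):
    print(value, classify(value, q1=7, q3=9))
```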

Quartiles and Percentiles

                            +-----+-+
  o           *     |-------|     | |---|
                            +-----+-+

+---+---+---+---+---+---+---+---+---+---+   number line
0   1   2   3   4   5   6   7   8   9   10

Measures of the center

Measures of center are among the basic methods of measuring characteristics of data.

A measure of center is a value at the center or middle of a data set.

There are four measures of center: mean, median, mode and midrange. Here is some data:

10 11 12 12 15 17 21 22 23 27

Measures of the center

The mean (or arithmetic mean) is the average of these data points. To calculate the mean, add the data points and divide by the number of data points. The mean is denoted by $\bar{x}$. In our example above:

Sum of data points: 10+11+12+12+15+17+21+22+23+27 = 170

Number of data points = 10

Average = 170/10 = 17

The median is the middle value when the scores are arranged in order of increasing (or decreasing) magnitude. To calculate the median, follow this rule:

If the number of scores is odd, the median is the number located in the exact middle of the list.

If the number of scores is even, the median is the mean of the two middle numbers.

NOTE: TO APPLY THE RULES ABOVE THE LIST MUST BE SORTED!

Measures of the center

In our example above there are 10 scores, an even number, so the median is (15+17)/2 = 16.

The mode of the data set is the score that occurs most frequently. When two scores occur with the same greatest frequency, each one is a mode and the data is bimodal. If more than two scores occur with the same greatest frequency, each is a mode and the data is multimodal. When all scores occur just once there is no mode. The mode is denoted by M.

The value 12 in the above dataset occurs most frequently and is therefore the mode.

The midrange is the value midway between the highest and lowest scores. In our example above this is (10+27)/2 = 37/2 = 18.5.
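The four measures of center for the example data can be verified with Python's standard `statistics` module. This is a minimal sketch; note that `multimode` (available from Python 3.8) returns every value when all scores occur just once, whereas the convention in this deck is to say there is no mode in that case.

```python
from statistics import mean, median, multimode

data = [10, 11, 12, 12, 15, 17, 21, 22, 23, 27]

data_mean = mean(data)            # sum of the data divided by the count
data_median = median(data)        # sorts internally; averages the two middle values
data_modes = multimode(data)      # list of the most frequent value(s)
data_midrange = (min(data) + max(data)) / 2

print("mean     =", data_mean)
print("median   =", data_median)
print("mode(s)  =", data_modes)
print("midrange =", data_midrange)
```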

SOME MATHEMATICAL NOTATION

Mathematicians like to have symbols to represent complicated calculations. Here are some we will use throughout the course:

∑ denotes the summation of a group of values (this means add them all up)

x denotes the variable, usually used to represent the individual data values

n represents the number of values in a sample

N represents the number of values in a population

$\bar{x} = \frac{\sum x}{n}$ is the mean of a sample

$\mu = \frac{\sum x}{N}$ is the mean of a population

Measures of Variation

Measures of central tendency tell us where the middle of a set of data occurs, but this is not enough to characterize a set of data. Consider the following two data sets:

50 60 70 80 90 and 69 69 70 71 71

Both these data sets have a mean of 70. Yet the first data set is more widely dispersed than the second data set. So a measure of variation is clearly needed.

The following data are weights, in ounces, of a 20 oz steak at a restaurant. We will use this data set throughout this section:

17 20 21 18 20 20 20 18 19 19

20 19 22 20 18 20 18 19 20 19

Measures of Variation

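The range is the difference between the highest value and the lowest value in a dataset.

To calculate the range, subtract the lowest value from the highest value. In the example above the range is (22-17) = 5.

The problem with the range is that it does not take into consideration every value. Consider each of the following data sets:

1 10 10 10 10

and

1 2 5 8 10

Both have a range of 9, yet the values are distributed very differently: in the first set four of the five values are identical, while in the second they are spread across the whole range.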

Measures of Variation

A more accurate measure of variation can be given by

the standard deviation of the data.

The standard deviation of a set of sample scores is a measure of variation of scores about the mean. It is calculated by

$$ s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}} $$

Measures of Variation

To calculate the standard deviation, proceed as follows:

1. Compute the mean of the scores
2. Subtract the mean from each individual score
3. Square each of the values in step 2
4. Add up all the squares obtained in step 3
5. Divide the total in step 4 by n-1
6. Find the square root of step 5.
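The procedure above, beginning with the computation of the mean, can be traced in Python using the steak weights from earlier in this section. This is a sketch for illustration; the variable names are arbitrary.

```python
import math

# Steak weights (oz) from the example data set.
weights = [17, 20, 21, 18, 20, 20, 20, 18, 19, 19,
           20, 19, 22, 20, 18, 20, 18, 19, 20, 19]

n = len(weights)
mean = sum(weights) / n                      # compute the mean
deviations = [x - mean for x in weights]     # subtract the mean from each score
squares = [d ** 2 for d in deviations]       # square each deviation
total = sum(squares)                         # add up the squares
variance = total / (n - 1)                   # divide by n - 1
std_dev = math.sqrt(variance)                # take the square root

print(f"mean = {mean:.2f}")
print(f"sum of squared deviations = {total:.2f}")
print(f"sample variance = {variance:.4f}")
print(f"sample standard deviation = {std_dev:.4f}")
```

For this data the mean is 19.35 oz and the standard deviation is about 1.18 oz. The standard library function `statistics.stdev` performs the same calculation in one call.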

Measures of Variation

The sample variance is the standard deviation squared. To calculate it, do all the steps for the standard deviation except taking the final square root. Here is the formula:

$$ s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1} $$

Interpretation of standard deviation

A small standard deviation means the data is close together; a large standard deviation means the data is widely spread.

The range rule of thumb states that for typical data sets, the range of the data is about 4 standard deviations wide, so the standard deviation is about the range divided by 4. This is a very rough estimate.

The 68-95-99.7 rule states that, for bell-shaped data, about 68% of all scores fall within 1 standard deviation of the mean, about 95% fall within 2 standard deviations of the mean, and about 99.7% fall within 3 standard deviations of the mean.

The above rule tells us that data more than 2 standard deviations from the mean is unusual, while data within 2 standard deviations is normal.

Chebyshev's theorem states that at least 75% of all scores fall within 2 standard deviations of the mean and at least 88.9% (about 89%) fall within 3 standard deviations of the mean. This works for ANY distribution (not just bell shaped).
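The distribution-free bound quoted above is 1 - 1/k² for k standard deviations (k > 1), which gives 75% for k = 2 and about 88.9% for k = 3. The sketch below computes that bound, compares it with the observed fraction for the steak-weight data, and evaluates the range rule of thumb. The function names are hypothetical, chosen only for this example.

```python
import statistics

# Steak weights (oz) from the measures-of-variation example.
weights = [17, 20, 21, 18, 20, 20, 20, 18, 19, 19,
           20, 19, 22, 20, 18, 20, 18, 19, 20, 19]


def chebyshev_lower_bound(k):
    """Minimum fraction of any data set within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("the bound is only informative for k > 1")
    return 1 - 1 / k ** 2


def fraction_within(values, k):
    """Observed fraction of values within k sample standard deviations of the mean."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    inside = [x for x in values if abs(x - mean) <= k * sd]
    return len(inside) / len(values)


# Range rule of thumb: standard deviation is roughly range / 4.
range_rule_estimate = (max(weights) - min(weights)) / 4

for k in (2, 3):
    print(f"k={k}: bound {chebyshev_lower_bound(k):.3f}, "
          f"observed {fraction_within(weights, k):.3f}")
print("range rule estimate of s:", range_rule_estimate)
print("actual s:", round(statistics.stdev(weights), 3))
```

The observed fractions can be much higher than the bound, since the bound has to hold for every possible distribution.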

Z-Scores

How do we compare two different sets of data?

Suppose we want to compare two kinds of automobiles - say light trucks and compact cars. Assume the mean miles per gallon for the light trucks is 23.6 with a standard deviation of 3.6, and the mean miles per gallon for compact cars is 28.7 with a standard deviation of 5.7.

Now suppose we have a light truck with a miles per gallon rating of 27.5 and a compact car with a miles per gallon rating of 31.2.

To compare them we need some way to standardize these scores - this way we would not have to know what scale was being used. The way to get a standard score is the z score.

Z-Scores

The standard score or z-score, is the number of standard

deviations that a given value x is above or below the

mean. You calculate the z score using:

$$ z = \frac{x - \bar{x}}{s} $$

Light truck: z = (27.5-23.6)/3.6 = 1.08 standard deviations above the mean.

Compact car: z = (31.2-28.7)/5.7 = 0.44 standard deviations above the mean.

Z-Scores

Suppose the number of hours per week that college freshmen spend studying has a mean of 7.06 hours with a standard deviation of 2.32 hours. Suppose Sally Simplestudent spends 2 hours per week studying. Does Sally spend an unusually small amount of time studying?

Sally's z-score is (2-7.06)/2.32 = -2.18. This is more than 2 standard deviations away from the mean, so her low amount of study time is unusual.
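The study-time example can be checked with a short Python sketch. The helper names `z_score` and `is_unusual` are illustrative, and the cutoff of 2 standard deviations is the rule of thumb used in this deck rather than a universal threshold.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd


def is_unusual(z, cutoff=2.0):
    """Rule of thumb from this deck: more than 2 standard deviations is unusual."""
    return abs(z) > cutoff


# Sally studies 2 hours per week; the mean is 7.06 hours, the standard deviation 2.32.
sally_z = z_score(2, mean=7.06, sd=2.32)
print(f"z = {sally_z:.2f}, unusual: {is_unusual(sally_z)}")
```

A negative z-score means the value is below the mean; the sign is kept so that the direction is not lost.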

Z-Scores

Intuition: a measure of how far an individual score is from the mean compared to the average distance of scores in the entire distribution from the mean.

Intuition: you can think of z-scores as simply indicating the number of standard deviations a certain data point is away from the mean.

For a bell-shaped, symmetric distribution of data, the interval $(\bar{x}-s, \bar{x}+s)$ contains approximately 68% of the data points, the interval $(\bar{x}-2s, \bar{x}+2s)$ contains approximately 95% of the data points, and the interval $(\bar{x}-3s, \bar{x}+3s)$ contains nearly all the data points.
