1

INSE 6220 -- Week 2
Advanced Statistical Approaches to Quality

• Overview of Course Contents
• Statistical Methods using MATLAB
• Statistical Process Control using MATLAB

Dr. A. Ben Hamza Concordia University

2

Contents

Probability
Distributions
Descriptive Statistics
Estimation theory
Hypothesis testing
Linear Model
Design of experiments

Probability
Example 1: Probability of success in test
Example 2: Probability of success in test 2, given that test 1 < 5.5?

3

Contents

Distributions
[Figure: density curve over Score (0 to 10), with mu=5.72 and sigma=1.55; vertical axis: Density]

4

Contents

Descriptive Statistics

Test 1   Test 2
5.6      6.1
5.1      7.5
6.8      6.6
3.4      3.1
6.8      8.4
4.6      6.4
5.6      4.9
6.3      10.0
5.0      4.0

Further values (column not recoverable): 7.6, 5.6, 8.2, 5.8

[Figure: histogram of Frequency versus Score (0 to 10)]

5

Contents

Estimation theory
Example: What is μ and σ?
• Bias
• Robustness
• Confidence Interval

6

Contents

Hypothesis testing
Example 1: When you have less than 4.5 on test 1, you will not pass
Example 2: Average Test 1 = Average Test 2

7

Contents

Linear Model
[Figure: scatter plot of Score Test 2 versus Score Test 1, both axes from 0 to 10]

8

Contents

Design of experiments
… To improve estimate
To improve prediction of model

9

Why Study Statistics?
Decision Makers Use Statistics To:
• Present and describe data and information properly
• Draw conclusions about large groups of individuals or items, using information collected from subsets of the individuals or items
• Make reliable forecasts about a computer software company
• Predict the number of software defects and improve software processes

What is Data?
Data: Consist of information coming from observations, counts, measurements, or responses.
• “People who eat three daily servings of whole grains have been shown to reduce their risk of stroke by 37%.”
• “70% of the 1500 U.S. spinal cord injuries to minors result from vehicle accidents, and 68% were not wearing a seatbelt.”

10

What is Statistics?
“Statistics is a way to get information from data”

Data -> Statistics -> Information

Data: Facts, especially numerical facts, collected together for reference or information.
Information: Knowledge communicated concerning some particular fact.

Statistics is a tool for creating new understanding from a set of numbers.

11

Example: Stats Anxiety…
A Computer Science student is anxious about her/his statistics course, since s/he heard the course is difficult. The professor provides last term’s final exam marks to the student. What can be discerned from this list of numbers?

Data -> Statistics -> Information

Data: List of last term’s marks (95, 89, 70, 65, 78, 57, ...).
Information: New information about the statistics class, e.g., class average, proportion of class receiving A’s, most frequent mark, marks distribution, etc.
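As a minimal sketch of how the list of marks becomes information, the snippet below uses only the six marks visible on the slide (the full list is not shown). The cutoff of 85 for an A and all variable names are assumptions made for illustration.

```matlab
% Six marks visible on the slide; the complete list of marks is not shown.
marks = [95 89 70 65 78 57];

classAverage = mean(marks);                  % class average
aThreshold   = 85;                           % ASSUMPTION: cutoff for an A, not given in the source
propA        = mean(marks >= aThreshold);    % proportion of marks at or above the cutoff
markRange    = max(marks) - min(marks);      % spread between best and worst mark

fprintf('Average: %.2f, proportion of A''s: %.2f, range: %d\n', ...
        classAverage, propA, markRange);
```

Each computed quantity is a statistic: a single number that summarizes one aspect of the raw list.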

12

Example: Classifying Data by Type
The base prices of several vehicles are shown in the table. Which data are qualitative data and which are quantitative data?

13

Solution: Classifying Data by Type
Qualitative Data: names of vehicle models are non-numerical entries.
Quantitative Data: base prices of vehicle models are numerical entries.

14

Data Sets
Population: The collection of all outcomes, responses, measurements, or counts that are of interest.
Sample: A subset of the population.

15

Branches of Statistics
Descriptive Statistics: Involves organizing, summarizing, and displaying data, e.g. tables, charts, averages.
Inferential Statistics: Involves using sample data to draw conclusions about a population.

16

Descriptive Statistics
• Collect data (e.g., survey)
• Present data (e.g., tables and graphs)
• Characterize data (e.g., sample mean $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$)
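The sample mean formula can be checked directly against the built-in mean function. This is a small sketch on a made-up sample; the data and variable names are hypothetical.

```matlab
X = [4 8 15 16 23 42];        % hypothetical sample
n = numel(X);                 % sample size

xbarFormula = sum(X) / n;     % sample mean computed from the formula
xbarBuiltin = mean(X);        % same quantity from the built-in function

fprintf('Formula: %.4f, mean(): %.4f\n', xbarFormula, xbarBuiltin);
```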

17

Inferential Statistics
• Estimation (e.g., estimate the population mean weight using the sample mean weight)
• Hypothesis testing (e.g., test the claim that the population mean weight is 120 pounds)

Drawing conclusions about a large group of individuals based on a subset of the large group.
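Both ideas can be sketched with base MATLAB on a made-up sample of weights. The data are hypothetical, and the critical value 2.262 is the tabulated two-sided 5% point of Student's t with 9 degrees of freedom. With the Statistics Toolbox, ttest(weights, 120) performs the same test.

```matlab
weights = [118 122 125 119 121 117 124 120 123 116];  % hypothetical sample, in pounds
mu0     = 120;                       % claimed population mean weight

n     = numel(weights);
xbar  = mean(weights);               % estimation: point estimate of the population mean
se    = std(weights) / sqrt(n);      % standard error of the sample mean
tStat = (xbar - mu0) / se;           % hypothesis testing: one-sample t statistic

tCrit    = 2.262;                    % tabulated t value, 9 degrees of freedom, two-sided 5% level
rejectH0 = abs(tStat) > tCrit;       % true would mean the claim is rejected

fprintf('xbar = %.2f, t = %.3f, reject claim: %d\n', xbar, tStat, rejectH0);
```

For this sample the statistic is well inside the critical region boundary, so the data give no reason to reject the claim of 120 pounds.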

18

Basic Vocabulary of Statistics
VARIABLE: A variable is a characteristic of an item or individual.
DATA: Data are the different values associated with a variable.
POPULATION: A population consists of all the items or individuals about which you want to draw a conclusion.
SAMPLE: A sample is the portion of a population selected for analysis.
PARAMETER: A parameter is a numerical measure that describes a characteristic of a population.
STATISTIC: A statistic is a numerical measure that describes a characteristic of a sample.
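The difference between a parameter and a statistic can be illustrated with a toy population. Everything here is hypothetical: the population is simply the integers 1 to 100, and the sample takes every tenth item.

```matlab
population = 1:100;                 % hypothetical population of 100 values
paramMean  = mean(population);      % PARAMETER: describes the whole population

sample   = population(1:10:end);    % SAMPLE: every 10th item (1, 11, ..., 91)
statMean = mean(sample);            % STATISTIC: describes only the sample

fprintf('Population mean: %.1f, sample mean: %.1f\n', paramMean, statMean);
```

The statistic (46) differs from the parameter (50.5), which is why inferential methods are needed to judge how far a sample value can be trusted.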

19

Basic 2D Plotting in MATLAB
• The simplest kind of plot is a cartesian plot of (x,y) pairs defined by symbols or connected with lines
>> x=0:0.05:10*pi;
>> y=exp(-0.1*x).*sin(x);
>> plot(x,y)
>> xlabel('X axis description')
>> ylabel('Y axis description')
>> title('Title for plot goes here')
>> legend('Legend for graph')
>> grid on

NOTE #1: Reversing the x,y order to (y,x) swaps the axes, so the curve is reflected about the line y = x.
NOTE #2: line(x,y) is similar to plot(x,y) but does not have additional options.

[Figure: the resulting plot, titled 'Title for plot goes here', with legend 'Legend for graph', axis labels 'X axis description' and 'Y axis description', and a manually inserted text annotation]

20

Some basic plot commands you may need:
Kinds of plots:
bar(x) creates a bar graph of the vector x. (Note also the command stairs(x).)
bar(x,y) creates a bar graph of the elements of the vector y, locating the bars according to the vector elements of 'x'.
>> x = -2.9:0.2:2.9;
>> y = x.^2;
>> bar(x,y)

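21

m-function Structure
function volume=cylinder(radius, length)
% CYLINDER computes volume of circular cylinder
% given radius and length
% Use:
% vol=cylinder(radius, length)
%
volume=pi.*radius.^2.*length;

Function definition: the first line names the returned variable (volume), the function (cylinder), and the arguments (radius, length).
Help comments: the comment lines directly below the definition are shown by the help command.
Statements: the body of the function (no end required).

NOTE: in older MATLAB releases, function names were NOT case sensitive on Windows; current releases treat function names as case sensitive on all platforms.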

22

Statistics with MATLAB
Online help for Statistics Toolbox is available from the MATLAB prompt (>> a double arrow), both generally (listing of all available commands):
>> help stats
[a long list of help topics follows]
and for specific commands:
>> help disttool
[a help message on the disttool function follows]

>> help disttool
DISTTOOL Demonstration of many probability distributions.
DISTTOOL creates interactive plots of probability distributions. This is a demo that displays a plot of the cumulative distribution function (cdf) or probability density function (pdf) of the distributions in the Statistics Toolbox.

23

Plotting Probability Distributions
>> disttool

24

Probability density functions (pdf)
• binopdf - Binomial density.
• chi2pdf - Chi square density.
• exppdf - Exponential density.
• fpdf - F density.
• gampdf - Gamma density.
• geopdf - Geometric density.
• hygepdf - Hypergeometric density.
• lognpdf - Lognormal density.
• mvnpdf - Multivariate normal density.
• normpdf - Normal (Gaussian) density.
• pdf - Density function for a specified distribution.
• poisspdf - Poisson density.
• tpdf - T density.
• unifpdf - Uniform density.
• wblpdf - Weibull density.

25

Example: Binomial density function
For discrete distributions, the pdf assigns a probability to each outcome. In this context, the pdf is often called a probability mass function (pmf).
For example, the discrete binomial pdf

$$f(x) = P(X = x) = \binom{n}{x} p^{x} (1-p)^{n-x}, \qquad x = 0, 1, 2, \ldots, n$$

assigns probability to the event of x successes in n trials of a Bernoulli process (such as coin flipping) with probability p of success at each trial.

p = 0.2; % Probability of success for each trial
n = 10; % Number of trials
x = 0:n; % Outcomes
fx = pdf('bino',x,n,p); % Probability mass vector
bar(x,fx) % Visualize the probability distribution
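The binomial formula can also be evaluated directly, without the Statistics Toolbox. The sketch below uses nchoosek from base MATLAB with the same p and n as the slide; the variable names totalProb and expectedValue are introduced here for illustration.

```matlab
p = 0.2;        % probability of success for each trial
n = 10;         % number of trials
x = 0:n;        % possible numbers of successes

% Evaluate the binomial pmf term by term from its formula.
fx = arrayfun(@(k) nchoosek(n, k) * p^k * (1 - p)^(n - k), x);

totalProb     = sum(fx);        % a pmf must sum to 1
expectedValue = sum(x .* fx);   % mean of the distribution, equal to n*p

fprintf('Sum of pmf: %.6f, expected value: %.4f\n', totalProb, expectedValue);
```

Checking that the probabilities sum to 1 and that the mean equals n*p is a quick sanity test for any hand-coded pmf.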

26

Descriptive Statistics
• corrcoef - Linear correlation coefficient with confidence intervals.
• cov - Covariance.
• mean - Sample average (in MATLAB toolbox).
• median - 50th percentile of a sample.
• range - Range.
• std - Standard deviation (in MATLAB toolbox).
• var - Variance (in MATLAB toolbox).

Example:
>> X = [ 1 2 3 5 6 7 23 45 33 46 22]
X =
     1     2     3     5     6     7    23    45    33    46    22
>> mean(X)
ans =
   17.5455
>> std(X)
ans =
   17.2648

27

Mean and Median
Examples:
A = [ 0 2 5 7 20]
B = [1 2 3
     3 3 6
     4 6 8
     4 7 7]

Mean:
mean(A) = 6.8
mean(B) = 3.0 4.5 6.0 (column-wise mean)
mean(B,2) = 2.0 4.0 6.0 6.0 (row-wise mean)

Median:
median(A) = 5
median(B) = 3.5 4.5 6.5 (column-wise median)
median(B,2) = 2.0 3.0 6.0 7.0 (row-wise median)
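The column-wise and row-wise results can be reproduced with a short script. It assumes B is the 4-by-3 matrix [1 2 3; 3 3 6; 4 6 8; 4 7 7], which is the layout consistent with the means and medians quoted on the slide.

```matlab
A = [0 2 5 7 20];
B = [1 2 3; 3 3 6; 4 6 8; 4 7 7];   % ASSUMPTION: 4-by-3 layout inferred from the quoted results

meanA   = mean(A);         % 6.8
colMean = mean(B);         % one mean per column: 3.0 4.5 6.0
rowMean = mean(B, 2);      % one mean per row:    2.0 4.0 6.0 6.0

medA   = median(A);        % 5
colMed = median(B);        % one median per column: 3.5 4.5 6.5
rowMed = median(B, 2);     % one median per row:    2.0 3.0 6.0 7.0

disp(colMean); disp(rowMean'); disp(colMed); disp(rowMed');
```

The second argument selects the dimension to operate along: omitted (or 1) works down each column, 2 works across each row.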

28

Standard Deviation and Variance
• Standard deviation is calculated using the std() function
• std(X) : Calculate the standard deviation of vector X
• If X is a matrix, std() will return the standard deviation of each column
• Variance (defined as the square of the standard deviation) is calculated using the var() function
• var(X) : Calculate the variance of vector X
• If X is a matrix, var() will return the variance of each column
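A small sketch confirms the relationship between the two functions and their column-wise behaviour on a matrix. The matrix M is hypothetical. By default both functions normalize by N-1, where N is the number of observations.

```matlab
M = [1 2; 3 6; 5 10];     % hypothetical data: 3 observations (rows) of 2 variables (columns)

s = std(M);               % standard deviation of each column
v = var(M);               % variance of each column

% The variance is the square of the standard deviation, column by column.
maxDiff = max(abs(v - s.^2));

fprintf('std: [%g %g], var: [%g %g]\n', s(1), s(2), v(1), v(2));
```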

29

Descriptive Statistics
Example: The function “displaytable.m” is posted on the course website
>> X = rand(9,9); % generates 9x9 random matrix
>> displaytable(cov(X)); % plots the covariance matrix of X
>> displaytable(corrcoef(X)); % plots the correlation matrix of X

30

Data Correlations
% Compute sample correlation
r = corrcoef([var1,var2])
r =
    1.0000    0.7051
    0.7051    1.0000

[Figure: scatter plot of Variable 1 versus Variable 2]
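The off-diagonal entry returned by corrcoef is the sample correlation coefficient. The sketch below recomputes it from its definition on two short made-up vectors; the data and the names rManual and rBuiltin are hypothetical.

```matlab
var1 = [1 2 3 4 5]';      % hypothetical observations of variable 1
var2 = [2 4 5 4 5]';      % hypothetical observations of variable 2

% Correlation from its definition: co-variation divided by the product of spreads.
d1 = var1 - mean(var1);
d2 = var2 - mean(var2);
rManual = sum(d1 .* d2) / sqrt(sum(d1.^2) * sum(d2.^2));

r = corrcoef([var1, var2]);   % 2-by-2 correlation matrix, ones on the diagonal
rBuiltin = r(1, 2);

fprintf('Manual: %.4f, corrcoef: %.4f\n', rManual, rBuiltin);
```

The matrix is symmetric, so r(1,2) and r(2,1) carry the same value, as in the 0.7051 entries shown on the slide.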

31

Statistical Plotting
• andrewsplot - Andrews plot for multivariate data.
• biplot - Biplot of variable/factor coefficients and scores.
• boxplot - Boxplots of a data matrix (one per column).
• cdfplot - Plot of empirical cumulative distribution function (cdf).
• fsurfht - Interactive contour plot of a function.
• glyphplot - Plot stars or Chernoff faces for multivariate data.
• gplotmatrix - Matrix of scatter plots grouped by a common variable.
• gscatter - Scatter plot of two variables grouped by a third.
• hist - Histogram (in MATLAB toolbox).
• hist3 - Three-dimensional histogram of bivariate data.
• normplot - Normal probability plot.
• parallelcoords - Parallel coordinates plot for multivariate data.
• probplot - Probability plot.
• surfht - Interactive contour plot of a data grid.
• wblplot - Weibull probability plot.

defects).'cracks'. median. Boxplot(X) produces a box and whisker plot for each column of the matrix X. and upper quartile values. >> pareto(quantity. The whiskers are lines extending from each end of the box to show the extent of the rest of the data. . 32 Statistical Plotting using MATLAB Create a Pareto chart from data measuring the number of manufactured parts rejected for various types of defects. Outliers are data with values beyond the ends of the whiskers >> load parts >> boxplot(runout).'dents'}. >> quantity = [5 3 19 25]. The box has lines at the lower quartile. >> defects = {'pits'.'holes'.

33

Statistical Plotting using MATLAB
Pareto charts display the values in the vector Y as bars drawn in descending order. Values in Y must be nonnegative and not include NaNs. Only the first 95% of the cumulative distribution is displayed.
Examine the cumulative productivity of a group of programmers to see how normal its distribution is:
>> codelines = [200 120 555 608 1024 101 57 687];
>> coders = {'Travis','Waleed','Arash','Emad','Farshad','Khaled','Mohamed','Maggie'};
>> pareto(codelines, coders)
>> title('Lines of Code by Student')
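The quantities behind a Pareto chart are a descending sort and a running cumulative percentage. This base-MATLAB sketch computes them for the codelines vector from the slide; it does not reproduce how pareto applies its 95% display cutoff, and the names sortedLines and cumPct are introduced here for illustration.

```matlab
codelines = [200 120 555 608 1024 101 57 687];   % lines of code per programmer

[sortedLines, order] = sort(codelines, 'descend');   % bars are drawn in descending order
total  = sum(codelines);
cumPct = 100 * cumsum(sortedLines) / total;          % cumulative share of the total, in percent

for k = 1:numel(sortedLines)
    fprintf('Programmer #%d: %4d lines, cumulative %.1f%%\n', ...
            order(k), sortedLines(k), cumPct(k));
end
```

The variable order holds the original positions of the sorted entries, so it can be used to index the matching labels.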

34

Multivariate Statistical Plotting using MATLAB
Scatter plots in 2D and 3D
>> load carsmall
>> X = [Acceleration Displacement Horsepower MPG Weight];
>> scatter(X(:,1),X(:,2),'.');
>> scatter3(X(:,1),X(:,2),X(:,3),'.');

3D histogram
>> hist3([X(:,3),X(:,2)]);

35

Multivariate Statistical Plotting using MATLAB
>> load carbig
>> X = [MPG,Acceleration,Displacement,Weight,Horsepower];
>> varNames = {'MPG', 'Acceleration', 'Displacement', 'Weight', 'Horsepower'};
>> gplotmatrix(X,[],Cylinders,['c' 'b' 'm' 'g' 'r'],[],[],false);
>> text([.08 .24 .43 .66 .83], repmat(-.1,1,5), varNames, 'FontSize',8);
>> text(repmat(-.12,1,5), [.86 .62 .41 .25 .02], varNames, 'FontSize',8, 'Rotation',90);

The points in each scatterplot are color-coded by the number of cylinders: blue for 4 cylinders, green for 6, and red for 8. There is also a handful of 5 cylinder cars, and rotary-engined cars are listed as having 3 cylinders.
This array of plots makes it easy to pick out patterns in the relationships between pairs of variables. However, there may be important patterns in higher dimensions, and those are not easy to recognize in this plot.

36

Statistical Plotting
normplot: Normal probability plot for graphical normality test.
• Generate a normal sample and a normal probability plot of the data.
>> x = normrnd(0,1,50,1);
>> h = normplot(x);

[Figure: Normal Probability Plot, Probability versus Data]

The plot is linear, indicating that you can model the sample by a normal distribution.
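The idea behind a normal probability plot can be sketched in base MATLAB: sort the data, pair each value with a standard normal quantile, and check how close the pairs lie to a straight line. The sample is hypothetical, and the plotting positions (i - 0.5)/n are one common convention, assumed here rather than taken from the normplot documentation.

```matlab
x  = [2.1 -0.3 0.8 1.5 -1.2 0.1 -0.7 0.4];   % hypothetical sample
xs = sort(x);                                % ordered data
n  = numel(xs);

pp = ((1:n) - 0.5) / n;                % ASSUMPTION: plotting positions (i - 0.5)/n
z  = -sqrt(2) * erfcinv(2 * pp);       % standard normal quantiles, using base erfcinv

R = corrcoef(xs, z);                   % linearity of the (quantile, data) pairs
linearity = R(1, 2);

fprintf('Correlation between ordered data and normal quantiles: %.4f\n', linearity);
```

A correlation close to 1 corresponds to the nearly straight line seen in a normal probability plot of normally distributed data.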

37

Multivariate Gaussian distributions
A multivariate Gaussian (or normal) distribution is an n-dimensional extension of a univariate Gaussian.
In a single dimension a normal distribution is the familiar bell-shaped curve. This can be extended to any number of dimensions.
In two dimensions each variable is itself a normal distribution. If the two dimensions are independent (and have equal variances), the points tend to cluster as a circular cloud; if they are correlated, they form an ellipse.
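A correlated two-dimensional Gaussian sample can be generated from independent standard normal draws with a Cholesky factor of the covariance matrix. This is a sketch in base MATLAB; the correlation of 0.8 and the sample size are illustrative assumptions.

```matlab
rho   = 0.8;                        % ASSUMPTION: illustrative correlation between the two variables
Sigma = [1 rho; rho 1];             % covariance matrix with unit variances

L  = chol(Sigma, 'lower');          % lower-triangular factor, Sigma = L*L'
Z  = randn(2, 20000);               % independent standard normal draws (circular cloud)
XY = (L * Z)';                      % correlated draws (elliptical cloud), one per row

R = corrcoef(XY(:,1), XY(:,2));
sampleRho = R(1, 2);                % should be close to rho

fprintf('Target correlation: %.2f, sample correlation: %.3f\n', rho, sampleRho);
% scatter(XY(:,1), XY(:,2), '.')    % uncomment to see the elliptical cloud
```

Setting rho to 0 leaves the draws independent, which gives the circular cloud described above.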

38

Statistical Process Control (SPC)
Statistical process control (SPC) refers to a number of different methods for monitoring and assessing the quality of manufactured goods. Combined with methods from the Design of Experiments, SPC is used in programs that define, measure, analyze, improve, and control development and production processes. These programs are often implemented using "Design for Six Sigma" methodologies.
• capable - Capability indices.
• capaplot - Capability plot.
• ewmaplot - Exponentially weighted moving average plot.
• histfit - Histogram with superimposed normal density.
• normspec - Plot normal density between specification limits.
• schart - S chart for monitoring variability.
• xbarplot - Xbar chart for monitoring the mean.

39

Plot normal density between specification limits
normspec(specs,mu,sigma) plots the normal density between a lower and upper limit defined by the two elements of the vector specs, where mu and sigma are the parameters of the plotted normal distribution.
• Example: Suppose a cereal manufacturer produces 10 ounce boxes of corn flakes. Variability in the process of filling each box with flakes causes a 1.25 ounce standard deviation in the true weight of the cereal in each box. The average box of cereal has 11.5 ounces of flakes. What percentage of boxes will have less than 10 ounces?
>> normspec([10 20],11.5,1.25)

[Figure: normal density plot titled "Probability Between Limits is 0.88493", Density versus Critical Value]

The probability of a weight between 10 and 20 ounces is 0.88493, and the probability above 20 ounces is negligible, so about 1 - 0.88493 = 0.115, or 11.5% of boxes, will have less than 10 ounces.
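The probability reported by normspec can be cross-checked without the Statistics Toolbox by writing the normal cdf in terms of the base function erfc. The helper name normalCdf is introduced here for illustration.

```matlab
mu    = 11.5;          % average weight of cereal per box, in ounces
sigma = 1.25;          % standard deviation of the weight, in ounces
specs = [10 20];       % lower and upper limits passed to normspec

% Normal cdf expressed with the complementary error function (base MATLAB).
normalCdf = @(x) 0.5 * erfc(-(x - mu) ./ (sigma * sqrt(2)));

pBetween = normalCdf(specs(2)) - normalCdf(specs(1));   % probability between the limits
pBelow   = normalCdf(specs(1));                         % probability of less than 10 ounces

fprintf('P(10 < X < 20) = %.5f, P(X < 10) = %.5f\n', pBetween, pBelow);
```

The first value matches the 0.88493 shown in the plot title, and the second gives the share of underweight boxes directly.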

40

Control Charts
• A control chart displays measurements of process samples over time. The measurements are plotted together with user-defined specification limits and process-defined control limits. The process can then be compared with its specifications to see whether it is in control or out of control.
• The chart is just a monitoring tool. Control activity might occur if the chart indicates an undesirable, systematic change in the process. The control chart is used to discover the variation, so that the process can be adjusted to reduce it.

Chart types:
• Xbar or mean
• Standard deviation
• Range
• Exponentially weighted moving average
• Individual observation
• Moving range of individual observations
• Moving average of individual observations
• Proportion defective
• Number of defectives
• Defects per unit
• Count of defects
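The process-defined control limits of an Xbar chart can be sketched in base MATLAB. The data are hypothetical, and the within-subgroup standard deviation is pooled directly, which is a simplification; toolbox functions and textbooks typically apply tabulated bias-correction constants, so their limits may differ slightly.

```matlab
% Hypothetical measurements: 5 subgroups (rows), 4 observations each (columns).
data = [10.1  9.8 10.0 10.2;
         9.9 10.0 10.1  9.7;
        10.3 10.1  9.9 10.0;
        10.0  9.9 10.2 10.1;
         9.8 10.0  9.9 10.1];

[m, n] = size(data);
xbar   = mean(data, 2);                    % one plotted point per subgroup
center = mean(xbar);                       % center line: grand mean

sigmaW = sqrt(mean(var(data, 0, 2)));      % SIMPLIFICATION: pooled within-subgroup std
ucl = center + 3 * sigmaW / sqrt(n);       % upper control limit (3-sigma)
lcl = center - 3 * sigmaW / sqrt(n);       % lower control limit (3-sigma)

outOfControl = find(xbar > ucl | xbar < lcl);   % subgroups signalling a change

fprintf('LCL = %.3f, center = %.3f, UCL = %.3f, signals: %d\n', ...
        lcl, center, ucl, numel(outOfControl));
```

A subgroup mean outside the limits would be a signal to investigate the process; here all five means fall inside.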