Chap 5

Data Collection and Analysis
By C. L. Hsieh, Department of Industrial Management, Aletheia University
Introduction
  "You can observe a lot just by watching."
  Data gathering results in a conceptual model of how the system operates.
  Data gathering should avoid ending up with lots of data but very little useful information.
Questions for Data Gathering
  What is the best procedure to follow?
  What types of data should be gathered?
  What sources should be used?
  What types of analyses should be performed on the data?
  How do you select the right probability distribution to represent the data?
  How should data be documented?
Guidelines for Data Gathering
  Identify triggering events
    Identify the causes or conditions that trigger activities, e.g. the causes of downtime: failure, idleness, unavailability of stock.
  Look for common groupings
    Reduce the data to common behaviors and patterns.
    Identify general categories.
  Focus on key impact factors
    Avoid information with little impact (e.g. off-hour performance, extremely rare downtimes, negligible move times).
  Separate input variables from response variables
    Input variables define how the system works.
    Response variables describe how the system responds; they do not drive model behavior.
  Focus on essence rather than substance
    Capture cause-effect relationships and ignore meaningless details.
    Focus on the activities that use resources or delay entity flow (system abstraction).
  Isolate actual activity times
    Exclude any extra time spent waiting.
Steps to Gathering Data
  1. Determine data requirements
  2. Identify data sources
  3. Collect the data
  4. Make assumptions
  5. Analyze the data
  6. Document and approve the data
Determining Data Requirements
  Structural data
    All the objects in the system to be modeled.
    Describe the layout of the system.
    Identify the items to be processed (e.g. entities, resources, locations).
  Operational data
    Explain how the system operates.
    When, where and how events and activities take place.
    Consist of the logic information about the system, e.g. routing, schedules, downtime behavior and resource allocation.
  Numerical data
    Provide quantitative information about the system.
    Some values are easy to get, but some are not.
    e.g. capacities, arrival rates, activity times.
  Use of a questionnaire (for a sample, see p. 103)
    A questionnaire helps gather the right information.
    If sample data are not available, it is useful to get at least estimates of the minimum, most likely and maximum values until more precise data can be obtained.
Identifying Data Sources
  Good sources of data:
    Historical records
    System documentation
    Personal observation
    Personal interviews
    Comparison with similar systems
    Vendor claims
    Design estimates
    Research literature
Collecting the Data
  Defining entity flow
    Entity flow establishes a skeletal framework to which additional data can be attached.
    Follow the entity movement.
    Use an entity flow diagram (EFD).
  Difference between an entity flow diagram and a process flowchart
    Process flowchart: shows the logical sequence of activities; defines what happens.
    Entity flow diagram: shows the physical movement of entities; defines where it happens.
Developing a Description of Operation
  Description of operation
    Explains how entities are processed and provides the details of the EFD.
  Requirements
    Time and resource requirements of the activity or operation.
    Where, when and in what quantities entities get routed next.
    Time and resource requirements for moving to the next location.
Entity Flow Diagram for Patient Processing (figure not reproduced)
Process Description for Patient Processing (not reproduced)
Defining Incidental Details
  Incidental data (downtimes, setups and work priorities) are not essential for getting a model running, but they are necessary for a complete and accurate model.
  Once a basic model has been constructed, any numerical values (e.g. activity times, arrival rates) should be firmed up.
Making Assumptions
  A simulation cannot run with incomplete data, so assumptions are required for any unknown future conditions.
  Assumptions must make sense in the overall operation of the model. Seeing absurd behavior may tell us that certain assumptions don't make sense.
  Sensitivity analysis assesses the influence of an assumption on the validity of a model. Cases to consider:
    Best or most optimistic case
    Worst or most pessimistic case
    Most likely or best-guess case
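A minimal sketch of a three-case sensitivity check of the kind described above. The availability formula and every number here are hypothetical and are not taken from the chapter; the sketch only shows how one uncertain input (an assumed mean repair time) can be varied across best, most likely and worst cases to see how far the response moves.

```python
# Hypothetical example: how sensitive is machine availability to an
# assumed mean repair time? None of these numbers come from the chapter.

def availability(mean_time_between_failures, mean_repair_time):
    """Long-run fraction of time the machine is up."""
    return mean_time_between_failures / (mean_time_between_failures + mean_repair_time)

MTBF_HOURS = 40.0  # assumed to be known

# The uncertain assumption, expressed as three cases (hours).
repair_time_cases = {
    "best": 1.0,         # most optimistic
    "most_likely": 2.0,  # best guess
    "worst": 5.0,        # most pessimistic
}

results = {case: availability(MTBF_HOURS, r) for case, r in repair_time_cases.items()}

for case, value in results.items():
    print(f"{case:12s} availability = {value:.3f}")

# A small spread between best and worst case suggests the assumption
# has little influence on the response; a large spread suggests the
# assumption deserves better data.
spread = results["best"] - results["worst"]
print(f"spread = {spread:.3f}")
```

In a real study the response would come from simulation runs rather than a closed-form ratio, but the comparison across the three cases works the same way.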
Statistical Analysis of Numerical Data
  Data should be analyzed to ascertain their suitability for use.
  Data characteristics:
    Independence (randomness)
    Homogeneity (the data come from the same distribution)
    Stationarity (the distribution of the data does not change over time)
  Stat::Fit in ProModel can automatically analyze and test data for use in a simulation.
Descriptive Statistics
  Mean: the average of the data
  Median: the value of the middle observation
  Mode: the value with the greatest frequency
  Standard deviation: a measure of the average deviation from the mean
  Variance: the square of the standard deviation
  Coefficient of variation: the standard deviation divided by the mean
  Skewness: a measure of symmetry
  Kurtosis: a measure of flatness or peakedness
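The descriptive statistics listed above can be computed with a few lines of standard-library Python. This is a sketch, not a description of how Stat::Fit computes them: skewness and kurtosis are taken here as simple moment ratios (one common definition, under which a normal distribution has kurtosis 3), and the sample data are hypothetical.

```python
import statistics

def describe(data):
    """Return the descriptive statistics listed above for a list of numbers.

    Assumes the data are not all identical and the mean is non-zero.
    """
    n = len(data)
    mean = statistics.mean(data)
    stdev = statistics.stdev(data)  # sample standard deviation (divides by n - 1)
    # Central moments (dividing by n), used for skewness and kurtosis.
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    return {
        "mean": mean,
        "median": statistics.median(data),
        "mode": statistics.mode(data),
        "stdev": stdev,
        "variance": statistics.variance(data),
        "cv": stdev / mean,            # coefficient of variation
        "skewness": m3 / m2 ** 1.5,    # 0 for symmetric data
        "kurtosis": m4 / m2 ** 2,      # moment ratio, not "excess" kurtosis
    }

# Hypothetical activity times in minutes.
times = [2.0, 3.0, 3.0, 4.0, 5.0, 7.0]
stats = describe(times)
for name, value in stats.items():
    print(f"{name:9s} {value:.4f}")
```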
Suitability of Data for Use
  Test for independence: data are independent if the value of one observation is not influenced by the value of another observation.
  Test for homogeneity: the data come from the same distribution.
  Test for stationarity: the distribution of the data does not change over time.
Test for Independence
  Scatter plot
    A plot of adjacent points in the sequence of observed values, plotted against each other.
    Each point is a pair of consecutive observations (X_i, X_{i+1}), i = 1, ..., n-1.
    If the X_i are positively correlated, the trend line is positively sloped.
    If the X_i are negatively correlated, the trend line is negatively sloped.
  Autocorrelation plot
    If observations in a sample are independent, they are uncorrelated.
    Assumes that the data are taken from a stationary process.
    The measure of autocorrelation is called rho (see p. 104).
    Autocorrelation lies in [-1, 1], i.e. -1 <= rho <= 1.
    If rho is near either extreme, 1 or -1, the data are autocorrelated.
    If rho is near 0, there is little or no correlation in the data.
  Runs test
    A run in a series of observations is an uninterrupted sequence of numbers showing the same trend, e.g. a run up or a run down.
    If there are too many or too few runs, the randomness of the series is rejected.
    Types of runs tests:
      Median test: measures the number of runs (sequences of numbers) above and below the median.
      Turning point test: measures the number of times the series changes direction.
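The independence checks above reduce to a few counts that are easy to compute by hand. The sketch below computes the lag-1 sample autocorrelation of consecutive pairs, the number of runs above and below the median, and the number of turning points. It only produces the statistics; deciding whether a count is "too many or too few" requires comparing it with its expected value under randomness, which is not shown. The function names and both data series are hypothetical.

```python
import statistics

def lag1_autocorrelation(xs):
    """Sample lag-1 autocorrelation of consecutive pairs (x_i, x_{i+1})."""
    n = len(xs)
    mean = sum(xs) / n
    denom = sum((x - mean) ** 2 for x in xs)
    numer = sum((xs[i] - mean) * (xs[i + 1] - mean) for i in range(n - 1))
    return numer / denom

def runs_about_median(xs):
    """Count runs of consecutive values above or below the median.

    Values equal to the median are dropped (one common convention).
    """
    med = statistics.median(xs)
    signs = [x > med for x in xs if x != med]
    if not signs:
        return 0
    return 1 + sum(1 for a, b in zip(signs, signs[1:]) if a != b)

def turning_points(xs):
    """Count interior points that are strict local maxima or minima."""
    return sum(
        1
        for i in range(1, len(xs) - 1)
        if (xs[i] > xs[i - 1] and xs[i] > xs[i + 1])
        or (xs[i] < xs[i - 1] and xs[i] < xs[i + 1])
    )

trend = [1, 2, 3, 4, 5, 6, 7, 8]    # steadily increasing: too few runs
zigzag = [1, 8, 2, 7, 3, 6, 4, 5]   # alternates every step: too many runs

for name, series in (("trend", trend), ("zigzag", zigzag)):
    print(name,
          round(lag1_autocorrelation(series), 3),
          runs_about_median(series),
          turning_points(series))
```

Both series are clearly non-random, but in opposite ways: the trend has positive autocorrelation with very few runs, while the zigzag has negative autocorrelation with a run at almost every observation.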
Test for Homogeneity
  Test for identically distributed data: test whether the data set comes from a single distribution.
  Examples of non-homogeneous data sets:
    Activity times that are longer or shorter depending on the type of entity being processed.
    Interarrival times that vary in length depending on the time of day or the day of the week.
  Ways to check:
    Visually inspect the distribution to see if it has more than one mode (p. 118, Fig. 5.9).
    Analysis of variance (ANOVA) for normally distributed data.
    Two-sample test, chi-square multi-sample test, Kruskal-Wallis nonparametric test.
  One type of non-homogeneous data occurs when the distribution changes over time (non-stationary, or time-variant, data).
  Examples of time-changing distributions:
    Learning curve
    Arrival rate of customers to a service facility
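As an illustration of the Kruskal-Wallis test named above, the following sketch computes the H statistic from pooled ranks. It omits the tie correction factor, the helper names are hypothetical, and the three groups of activity times are made-up data standing in for three entity types.

```python
def average_ranks(values):
    """Rank values from 1 to N, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the block of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic (without the tie correction factor)."""
    pooled = [x for g in groups for x in g]
    ranks = average_ranks(pooled)
    n_total = len(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    return 12.0 / (n_total * (n_total + 1)) * total - 3 * (n_total + 1)

# Hypothetical activity times (minutes) for three entity types.
type_a = [1.0, 2.0, 3.0]
type_b = [4.0, 5.0, 6.0]
type_c = [7.0, 8.0, 9.0]
h = kruskal_wallis_h([type_a, type_b, type_c])
print(f"H = {h:.2f}")
```

H is usually compared with a chi-square critical value with (number of groups - 1) degrees of freedom; for three groups at the 5% level that value is 5.991. With samples as small as these the chi-square approximation is rough, so the comparison is indicative only.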
Approaches for Stationary Data
  Non-stationary data can be detected by plotting subgroups of data that occur within successive time intervals (Fig. 5.10).
  Run Stat::Fit on each data set and see which distribution fits best. If the same distribution fits both, the same population is assumed.
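A small sketch of the subgroup idea above: split the observations, kept in the order they were recorded, into successive blocks and compare a summary of each block. The function name, the block size and the interarrival times are all hypothetical.

```python
import statistics

def subgroup_means(observations, group_size):
    """Mean of each successive block of `group_size` observations.

    Observations must be in time order; a trailing partial block is ignored.
    """
    return [
        statistics.mean(observations[i:i + group_size])
        for i in range(0, len(observations) - group_size + 1, group_size)
    ]

# Hypothetical interarrival times (minutes), listed in the order observed.
interarrival = [5.0, 6.0, 5.5, 5.5, 3.0, 2.5, 3.5, 3.0, 1.0, 1.5, 1.0, 0.5]
means = subgroup_means(interarrival, 4)
print(means)
```

A steady drift in the block means, as in this made-up series, suggests that the arrival rate changes over time and that a single distribution should not be fitted to the whole record.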
Distribution Fitting
  Three ways of representing data
    Original data record
      The data set is usually not large enough.
    Empirical distribution (characterizes the data)
      Continuous frequency distribution: the percentage of values that fall within given intervals.
      Discrete frequency distribution: the percentage of times a particular value occurs.
      Drawbacks:
        An insufficient sample size may create artificial bias.
        It fails to capture rare extreme values that may exist in the population from which the data were sampled.
    Theoretical distribution
      Fit a theoretical distribution to the data.
      Random variates generated from the probability distribution provide the simulated random values.
      Fitting a theoretical distribution to sample data smooths artificial irregularities.
      It ensures that extreme values are included.
      Most simulation software provides utilities for fitting distributions to numerical data.
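To make the empirical-distribution option and its drawback concrete, the sketch below builds a discrete frequency distribution from a small sample and then draws random values from it. The function names and the batch-size data are hypothetical.

```python
import random
from collections import Counter

def discrete_frequency_distribution(data):
    """Map each observed value to the fraction of times it occurs."""
    counts = Counter(data)
    n = len(data)
    return {value: count / n for value, count in sorted(counts.items())}

def sample_empirical(dist, rng):
    """Draw one value by walking the cumulative frequencies."""
    u = rng.random()
    cumulative = 0.0
    for value, prob in dist.items():
        cumulative += prob
        if u < cumulative:
            return value
    return value  # guards against floating-point round-off at the top end

# Hypothetical observed batch sizes; note that 5 was never observed.
batch_sizes = [1, 2, 2, 3, 3, 3, 3, 4, 4, 6]
dist = discrete_frequency_distribution(batch_sizes)

rng = random.Random(42)  # fixed seed so the run is repeatable
draws = [sample_empirical(dist, rng) for _ in range(1000)]
print(dist)
print(sorted(set(draws)))
```

Because a batch size of 5 never appears in the sample, it can never be drawn, however many values are generated. This is the "fails to capture values that exist in the population" drawback, and it is the gap that a fitted theoretical distribution fills.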
Theoretical Distributions
  Uniform distribution (see p. 124)
    X ~ U(a, b) with E[X] = (a + b)/2, Var[X] = (b - a)^2/12
    Used as a first model for a quantity that is felt to vary randomly between a and b but about which little else is known.
  Triangular distribution (see p. 124)
    X ~ Triang(a, m, b) with E[X] = (a + m + b)/3, Var[X] = (a^2 + m^2 + b^2 - am - ab - bm)/18
    Used as a rough model and a good approximation in the absence of data.
  Normal distribution (see p. 125)
    X ~ N(μ, σ^2) with E[X] = μ, Var[X] = σ^2
    Symmetric (bell-shaped curve).
    Physical measurements, e.g. height, length.
    Certain activity times.
  Poisson distribution (p. 126)
    X ~ Po(λ) with E[X] = λ, Var[X] = λ
    Used for the number of events that occur in an interval of time when the events occur at a constant rate.
    e.g. number of items in a batch of random size
    e.g. number of items demanded from an inventory
  Exponential distribution (p. 126)
    X ~ Exp(β) with E[X] = β, Var[X] = β^2
    Used frequently for the interarrival times of customers to a system, when arrivals occur at a constant rate, or for the time to failure of a piece of equipment.
    If occurrences happen at a rate of Po(λ), the time between occurrences is Exp(1/λ).
    Exp(β) is memoryless (useful for events that occur independently of one another).
  Gamma distribution
    X ~ Gamma(α, β) with E[X] = αβ, Var[X] = αβ^2 (α shape, β scale)
    Used for the time to complete some task, e.g. customer service or machine repair.
  Beta distribution
    X ~ Beta(α1, α2)
    Used as a rough model in the absence of data.
    Distribution of a random proportion, e.g. the proportion of defective items in a shipment; time to complete a task in a PERT network.
  Weibull distribution
    X ~ Weibull(α, β); Exp(β) = Weibull(1, β)
    Used for the time to complete some task or the time to failure of a piece of equipment.
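The relationship stated above between the Poisson and exponential distributions can be checked numerically: generate exponential interarrival times with mean 1/λ, count the arrivals in each unit interval, and compare the mean and variance of the counts with λ. This is a simulation sketch with an assumed rate of 10 per interval and a hypothetical function name; note that Python's `expovariate` takes the rate (1/mean), not the mean.

```python
import random

def count_arrivals_per_interval(rate, n_intervals, rng):
    """Generate exponential interarrival times with mean 1/rate and count
    how many arrivals fall in each unit-length interval."""
    counts = [0] * n_intervals
    t = rng.expovariate(rate)  # expovariate takes the rate, i.e. 1/mean
    while t < n_intervals:
        counts[int(t)] += 1
        t += rng.expovariate(rate)
    return counts

rng = random.Random(1)  # fixed seed so the run is repeatable
rate = 10.0             # assumed: about 10 arrivals per interval
counts = count_arrivals_per_interval(rate, 2000, rng)

mean_count = sum(counts) / len(counts)
var_count = sum((c - mean_count) ** 2 for c in counts) / (len(counts) - 1)
print(f"mean of counts     = {mean_count:.2f}")
print(f"variance of counts = {var_count:.2f}")
```

Both printed values should land close to the rate, consistent with E[X] = Var[X] = λ for the Poisson distribution.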
Fitting Theoretical Distributions
  Stat::Fit does a reasonable job of data fitting and ranks the candidate distributions (p. 127).
  Fitting is a trial-and-error process.
  A goodness-of-fit test evaluates each fitted distribution to ascertain the relative goodness of fit.
  Two common goodness-of-fit tests: the chi-square (χ^2) and Kolmogorov-Smirnov tests.
  If little data are available, a goodness-of-fit test is unlikely to reject any candidate distribution.
  It is a good idea to look at a graphical display, such as a histogram, before making decisions.
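A sketch of how a Kolmogorov-Smirnov comparison ranks candidate distributions. It computes the KS statistic D, the largest gap between the empirical CDF and a candidate CDF, for two candidates fitted to the same synthetic sample. This is not Stat::Fit's procedure; the helper names, the sample and the crude parameter estimates are all assumptions made for illustration.

```python
import math
import random

def ks_statistic(data, cdf):
    """Kolmogorov-Smirnov statistic D: the largest gap between the
    empirical CDF of `data` and the candidate theoretical `cdf`."""
    xs = sorted(data)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d

def exponential_cdf(mean):
    """CDF of an exponential distribution parameterized by its mean."""
    return lambda x: 1.0 - math.exp(-x / mean) if x > 0 else 0.0

def uniform_cdf(a, b):
    """CDF of a uniform distribution on [a, b]."""
    return lambda x: min(1.0, max(0.0, (x - a) / (b - a)))

# Synthetic sample: exponential with true mean 4.0 (an assumed value).
rng = random.Random(7)
sample = [rng.expovariate(1 / 4.0) for _ in range(500)]

fitted_mean = sum(sample) / len(sample)  # simple estimate of the mean
d_exp = ks_statistic(sample, exponential_cdf(fitted_mean))
d_uni = ks_statistic(sample, uniform_cdf(min(sample), max(sample)))

print(f"D (exponential) = {d_exp:.3f}")
print(f"D (uniform)     = {d_uni:.3f}")
```

The candidate with the smaller D fits better. For a fully specified distribution and a large sample, D is often compared with roughly 1.36/sqrt(n) at the 5% level; that threshold is only approximate when, as here, the parameters are estimated from the same data.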
Data Absence
  Most likely or mean value
    About 10 customer arrivals per hour
    Approximately 20 minutes to assemble parts
    Around five machine failures per day
  Minimum and maximum values
    1.5 to 3 minutes to inspect items
    5 to 10 customer arrivals per hour
    4 to 6 minutes to set up a machine
  Minimum, most likely and maximum values
    Can easily be set up as a triangular distribution.
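A short sketch of turning a minimum, most likely and maximum estimate into a triangular distribution. The 4 and 6 minute bounds echo the setup-time example above; the most likely value of 4.5 minutes is an assumption added for illustration. Note that Python's `random.triangular` takes its arguments in the order (low, high, mode).

```python
import random

def triangular_mean(a, m, b):
    """Mean of Triang(a, m, b): (a + m + b) / 3."""
    return (a + m + b) / 3

# Setup takes 4 to 6 minutes; the most likely value of 4.5 is assumed.
minimum, most_likely, maximum = 4.0, 4.5, 6.0

rng = random.Random(2024)  # fixed seed so the run is repeatable
# Argument order of triangular is (low, high, mode).
setup_times = [rng.triangular(minimum, maximum, most_likely) for _ in range(10000)]

sample_mean = sum(setup_times) / len(setup_times)
print(f"theoretical mean = {triangular_mean(minimum, most_likely, maximum):.3f}")
print(f"sample mean      = {sample_mean:.3f}")
```

The generated values stay within the stated minimum and maximum, and their average should sit close to (a + m + b)/3.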
Summary
  Data should be collected systematically.
  Three types of data: structural, operational and numerical.
  A questionnaire is a good way to request information.
  Numerical data for random variables should be analyzed to test for independence and homogeneity.
  A theoretical distribution should be fitted to the data whenever possible.
  Data should be documented, reviewed and approved.
