
Data Collection and Analysis

By C. L. Hsieh, Department of Industrial Management, Aletheia University

Introduction

You can observe a lot just by watching.
Data gathering results in a conceptual model of how the system operates.
Data gathering should avoid ending up with lots of data but very little useful information.

Questions for Data Gathering

What is the best procedure to follow?
What types of data should be gathered?
What sources should be used?
What types of analyses should be performed on the data?
How do you select the right probability distribution to represent the data?
How should data be documented?

Guidelines for Data Gathering

Identify triggering events
  Identify the causes or conditions that trigger the activities, e.g. the causes of downtime: failure, idle time, unavailability of stock.
Look for common groupings
  Reduce the data to common behaviors and patterns.
  Identify general categories.

Guidelines for Data Gathering

Focus on key impact factors
  Avoid information with little impact (e.g. off-hour performance, extremely rare downtime, negligible move times).
Separate input variables from response variables
  Input variables define how the system works.
  Response variables describe how the system responds and do not drive model behavior.

Guidelines for Data Gathering

Focus on essence rather than substance
  Capture cause-and-effect relationships and ignore meaningless details.
  Focus on the activity of using resources or the delay of entity flow (system abstraction).
Isolate actual activity times
  Exclude any extra time spent waiting.

Steps to Gathering Data

Determine data requirements
Identify data sources
Collect the data
Make assumptions
Analyze the data
Document and approve the data

Determining Data Requirements

Structural data
  Covers all the objects in the system to be modeled.
  Describes the layout of the system.
  Identifies the items to be processed (e.g. entities, resources, locations).

Determining Data Requirements

Operational data
  Explains how the system operates.
  Describes when, where and how events and activities take place.
  Consists of the logic information about the system, e.g. routing, schedules, downtime behavior and resource allocation.

Determining Data Requirements

Numerical data
  Provides quantitative information about the system.
  Some values are easy to obtain; others are not.
  e.g. capacities, arrival rates, activity times.

Determining Data Requirements

Use of a questionnaire (for a sample, see p. 103)
  A questionnaire helps gather the right information.
  If sample data are not available, it is useful to get at least estimates of the minimum, most likely, and maximum values until more precise data are obtained.

Identifying Data Sources

Good sources of data
  Historical records
  System documentation
  Personal observation
  Personal interviews
  Comparison with similar systems
  Vendor claims
  Design estimates
  Research literature

Collecting the Data

Defining entity flow
  Entity flow establishes a skeletal framework to which additional data can be attached.
  Follow the entity movement.
  Use an entity flow diagram (EFD).
  Note the difference between an entity flow diagram and a process flowchart.

Collecting the Data

Difference between an entity flow diagram and a process flowchart
  Process flowchart
    Shows the logical sequence of activities.
    Defines what happens.
  Entity flow diagram
    Shows the physical movement of entities.
    Defines where it happens.

Developing a Description of Operation

Description of operation
  Explains how entities are processed and provides the details of the EFD.
Requirements
  Time and resource requirements of the activity or operation.
  Where, when and in what quantities entities get routed next.
  Time and resource requirements for moving to the next location.

[Figure: Entity Flow Diagram for Patient Processing]

[Figure: Process Description for Patient Processing]

Defining Incidental Details

Incidental data (downtimes, setups and work priorities) are not essential to an overall understanding of the process, but they are necessary for a complete and accurate model.
Once a basic model is constructed, any numerical values (e.g. activity times, arrival rates) should be firmed up.

Making Assumptions

A simulation can't run with incomplete data, so assumptions are required for any unknown future conditions.
Assumptions must make sense in the overall operation of the model. Absurd model behavior may indicate that certain assumptions don't make sense.


Making Assumptions

Sensitivity analysis assesses the influence of an assumption on the validity of a model.
  Best or most optimistic case
  Worst or most pessimistic case
  Most likely or best-guess case

Statistical Analysis of Numerical Data

Data should be analyzed to ascertain their suitability for use.
Data characteristics:
  Independence (randomness)
  Homogeneity (the data come from the same distribution)
  Stationarity (the distribution of the data does not change over time)

Statistical Analysis of Numerical Data

Stat::Fit in ProModel can automatically analyze and test data for use in a simulation.
Parameters
  Mean: the average of the data
  Median: the value of the middle observation
  Mode: the value with the greatest frequency

Descriptive Statistics

Parameters
  Standard deviation: a measure of the average deviation from the mean
  Variance: the square of the standard deviation
  Coefficient of variation: the standard deviation divided by the mean
  Skewness: a measure of symmetry
  Kurtosis: a measure of flatness or peakedness
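As a rough illustration of the parameters listed above, the following Python sketch computes them with the standard library only. The function name describe and the sample values are made up for illustration, and the skewness and kurtosis formulas are the simple moment-based versions, which may differ from what Stat::Fit reports.

```python
import statistics

def describe(data):
    """Return the descriptive statistics named on the slides for one sample."""
    n = len(data)
    mean = statistics.mean(data)
    std = statistics.stdev(data)  # sample standard deviation (n - 1 divisor)
    # Central moments with an n divisor, used for skewness and kurtosis.
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    return {
        "mean": mean,
        "median": statistics.median(data),
        "mode": statistics.mode(data),
        "std_dev": std,
        "variance": statistics.variance(data),
        "coeff_of_variation": std / mean,
        "skewness": m3 / m2 ** 1.5,   # 0 for a symmetric sample
        "kurtosis": m4 / m2 ** 2,     # about 3 for normally distributed data
    }

sample = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical activity times in minutes
stats = describe(sample)
for name, value in stats.items():
    print(f"{name}: {value}")
```

A positive skewness here indicates a longer right tail, which is common for activity times.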

Suitability of Data for Use

Test for independence: data are independent if the value of one observation is not influenced by the value of another observation.
Test for homogeneity: the data come from the same distribution.
Test for stationarity: the distribution of the data does not change over time.

Test for Independence

Scatter plot
  A plot of adjacent points in the sequence of observed values, plotted against each other.
  Each point is a pair of consecutive observations (X_i, X_{i+1}), i = 1, ..., n-1.
  If the X_i's are positively correlated, the trend line is positively sloped.
  If the X_i's are negatively correlated, the trend line is negatively sloped.

Test for Independence

Autocorrelation plot
  If observations in a sample are independent, they are uncorrelated.
  Assume that the data are taken from a stationary process.
  The measure of autocorrelation is called rho (ρ) (see p. 104).
  Autocorrelation lies in [-1, 1], i.e. -1 <= ρ <= 1.
  If ρ is near either extreme, 1 or -1, the data are autocorrelated.
  If ρ is near 0, there is little or no correlation in the data.
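The scatter plot pairs and the autocorrelation measure can be sketched in a few lines of Python. The estimator below is a commonly used sample autocorrelation and is an assumption here; it is not necessarily the exact formula given on p. 104. The function names and both data series are hypothetical.

```python
def consecutive_pairs(data):
    """Pairs (X_i, X_{i+1}) that a scatter plot of adjacent points would show."""
    return list(zip(data, data[1:]))

def autocorrelation(data, lag=1):
    """Sample autocorrelation at the given lag (a common estimator, assumed here)."""
    n = len(data)
    mean = sum(data) / n
    total = sum((x - mean) ** 2 for x in data)
    cross = sum((data[i] - mean) * (data[i + lag] - mean) for i in range(n - lag))
    return cross / total

trending = list(range(1, 11))   # hypothetical series with a steady upward trend
alternating = [1, -1] * 5       # hypothetical series that flips sign every step

print(consecutive_pairs(trending)[:3])
print(autocorrelation(trending))     # positive: neighbours move together
print(autocorrelation(alternating))  # negative: neighbours move in opposite directions
```

Values near 0 from this function would be consistent with independent observations; values near 1 or -1 would not.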

Test for Independence

Runs test
  A run in a series of observations is an uninterrupted sequence of numbers showing the same trend, e.g. a run up or a run down.

Test for Independence

Types of runs tests: if there are too many or too few runs, the randomness of the series is rejected.
  Median test: measures the number of runs (sequences of numbers) above and below the median.
  Turning point test: measures the number of times the series changes direction.
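The counting step behind the median test and the turning point test can be sketched as follows. This is a minimal illustration that only counts runs and turning points; it does not compute the significance levels a package such as Stat::Fit would report. The function names and sample series are hypothetical.

```python
import statistics

def runs_about_median(data):
    """Count runs of consecutive observations above or below the median."""
    med = statistics.median(data)
    signs = [x > med for x in data if x != med]  # observations equal to the median are dropped
    if not signs:
        return 0
    return 1 + sum(1 for a, b in zip(signs, signs[1:]) if a != b)

def turning_points(data):
    """Count interior observations that are local peaks or troughs."""
    return sum(
        1
        for a, b, c in zip(data, data[1:], data[2:])
        if (b > a and b > c) or (b < a and b < c)
    )

steady_rise = [1, 2, 3, 4, 5, 6, 7, 8]   # too few runs: clearly not random
zigzag = [1, 8, 2, 7, 3, 6, 4, 5]        # too many runs: also suspicious

print(runs_about_median(steady_rise), turning_points(steady_rise))
print(runs_about_median(zigzag), turning_points(zigzag))
```

For a truly random series of n observations the expected number of turning points is 2(n - 2)/3, so counts far from that value in either direction cast doubt on independence.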

Test for Homogeneity

Test for identically distributed data: test whether the data sets come from the same distribution.
Examples of non-homogeneous data sets
  Activity times that are longer or shorter depending on the type of entity being processed.
  Interarrival times that vary in length depending on the time of day or day of the week.

Test for Homogeneity

Visually inspect the distribution to see whether it has more than one mode (p. 118, Fig. 5.9).
Analysis of variance (ANOVA) for normally distributed data.
Two-sample test, chi-square multi-sample test, Kruskal-Wallis non-parametric test.

Test for Homogeneity

One type of non-homogeneous data occurs when the distribution changes over time (non-stationary or time-variant data).
Examples of time-changing distributions
  Learning curve
  Arrival rate of customers to a service facility

Approaches for Stationary Data

Non-stationary data can be detected by plotting subgroups of data that occur within successive time intervals (Fig. 5.10).
Run Stat::Fit and see which distribution best fits each data set. If the same distribution fits both, the same population is assumed.
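The subgroup idea can be tried numerically before plotting anything. The sketch below splits a time-ordered sample into successive, equally sized subgroups and compares their means; the function name and the interarrival times are invented for illustration, and a real study would compare whole distributions rather than means alone.

```python
import statistics

def subgroup_means(data, n_groups):
    """Mean of each successive, equally sized subgroup of a time-ordered sample."""
    size = len(data) // n_groups   # any leftover observations at the end are ignored
    return [statistics.mean(data[i * size:(i + 1) * size]) for i in range(n_groups)]

# Hypothetical interarrival times (minutes), recorded in time order:
# arrivals are sparse early in the day and frequent later on.
interarrival = [6.0, 5.5, 6.5, 6.0, 2.0, 1.5, 2.5, 2.0]

means = subgroup_means(interarrival, 2)
print(means)
```

A large gap between the subgroup means, as in this made-up sample, suggests the data are non-stationary and should be modeled separately for each time interval.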

Distribution Fitting

Three ways of representing data
  Original data record
    The data set is usually not large enough.
  Empirical distribution (characterizes the data)
    Continuous frequency distribution: the percentage of values that fall within given intervals.

Distribution Fitting

Empirical distribution (characterizes the data)
  Discrete frequency distribution: the percentage of times a particular value occurs.
  Drawbacks
    An insufficient sample size may create artificial bias.
    It may fail to capture rare extreme values that exist in the population from which the data were sampled.
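Both kinds of empirical frequency distribution are straightforward to tabulate. The sketch below is one possible way to do it in Python; the function names, the interval edges and the sample values are all assumptions made for illustration.

```python
from collections import Counter

def discrete_frequency(data):
    """Fraction of observations taking each distinct value."""
    counts = Counter(data)
    n = len(data)
    return {value: counts[value] / n for value in sorted(counts)}

def continuous_frequency(data, edges):
    """Fraction of observations in each half-open interval [edges[i], edges[i+1])."""
    n = len(data)
    return [
        sum(1 for x in data if lo <= x < hi) / n
        for lo, hi in zip(edges, edges[1:])
    ]

batch_sizes = [1, 1, 2, 3]              # hypothetical discrete data
service_times = [0.5, 1.5, 1.7, 2.2]    # hypothetical continuous data (minutes)

print(discrete_frequency(batch_sizes))
print(continuous_frequency(service_times, [0, 1, 2, 3]))
```

With only four observations per sample the resulting frequencies are coarse, which illustrates the sample-size drawback noted above.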

Distribution Fitting

Theoretical distribution
  Fit a theoretical distribution to the data.
  Random variates generated from the probability distribution provide the simulated random values.

Distribution Fitting

Theoretical distribution
  Fitting a theoretical distribution to sample data smooths out artificial irregularities.
  It ensures that extreme values are included.
  Most simulation software provides utilities for fitting distributions to numerical data.

Theoretical Distribution

Uniform distribution (see p. 124)
  X ~ U(a, b) with EX = (a + b)/2, VarX = (b - a)^2/12
  Used as a first model for a quantity that is felt to vary randomly between a and b but about which little else is known.

Theoretical Distribution

Triangular distribution (see p. 124)
  X ~ Triang(a, m, b) with EX = (a + m + b)/3, VarX = (a^2 + m^2 + b^2 - am - ab - bm)/18
  Used as a rough model and a good approximation in the absence of data.

Theoretical Distribution

Normal distribution (see p. 125)
  X ~ N(μ, σ^2) with EX = μ, VarX = σ^2
  Symmetric (bell-shaped curve)
  Physical measurements, e.g. height, length
  Certain activity times

Theoretical Distribution

Poisson distribution (p. 126)
  X ~ Po(λ) with EX = λ, VarX = λ
  Used for the number of events that occur in an interval of time when the events occur at a constant rate.
  e.g. number of items in a batch of random size
  e.g. number of items demanded from an inventory

Theoretical Distribution

Exponential distribution (p. 126)
  X ~ Exp(β) with EX = β, VarX = β^2
  Used frequently for interarrival times of customers to a system when arrivals occur at a constant rate, or for the time to failure of a piece of equipment.
  If occurrences follow Po(λ), the time between occurrences is Exp(1/λ).
  Exp(β) is memoryless (useful for events that occur independently of one another).
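The link between exponential interarrival times and Poisson counts can be checked with a small simulation. In this sketch the rate of 10 arrivals per hour, the seed and the function name count_arrivals are assumptions chosen for illustration. Note that Python's random.expovariate takes the rate (1 divided by the mean), not the mean itself.

```python
import random
import statistics

def count_arrivals(rate, horizon, rng):
    """Count arrivals in [0, horizon) when interarrival times are exponential with mean 1/rate."""
    clock, count = 0.0, 0
    while True:
        clock += rng.expovariate(rate)   # next exponential interarrival time
        if clock >= horizon:
            return count
        count += 1

rng = random.Random(42)   # fixed seed so the run is repeatable
rate = 10.0               # hypothetical: 10 arrivals per hour on average

# Simulate 2000 independent one-hour periods and count the arrivals in each.
counts = [count_arrivals(rate, 1.0, rng) for _ in range(2000)]
mean_count = statistics.mean(counts)
var_count = statistics.variance(counts)
print(mean_count, var_count)
```

Both the sample mean and the sample variance of the hourly counts should come out close to the rate, as expected for a Poisson distribution where EX = VarX.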

Theoretical Distribution

Gamma distribution
  X ~ Gamma(α, β) with EX = αβ, VarX = αβ^2
  Used for the time to complete some task, e.g. customer service or machine repair.

Theoretical Distribution

Beta distribution
  X ~ Beta(α1, α2)
  Used as a rough model in the absence of data.
  Distribution of a random proportion, e.g. the proportion of defective items in a shipment; time to complete a task in a PERT network.

Theoretical Distribution

Weibull distribution
  X ~ Weibull(α, β); Exp(β) = Weibull(1, β)
  Used for the time to complete some task or the time to failure of a piece of equipment.

Fitting Theoretical Distribution

Stat::Fit does a reasonable job of fitting data and ranking the candidate distributions (p. 127).
Fitting is a trial-and-error process.
A goodness-of-fit test evaluates each fitted distribution to ascertain its relative goodness of fit.

Fitting Theoretical Distribution

Two common goodness-of-fit tests: the chi-square (χ^2) and Kolmogorov-Smirnov tests.
If little data is available, a goodness-of-fit test is unlikely to reject any candidate distribution.
It is a good idea to look at a graphical display, such as a histogram, before making decisions.
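To make the chi-square test concrete, the sketch below computes the test statistic for one candidate distribution. It assumes a uniform candidate with known bounds, five equal-width intervals and made-up observed counts; the function name is hypothetical, and the critical value 9.488 is the standard table value for 4 degrees of freedom at the 5% level.

```python
def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all intervals."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical counts of 100 observations falling in 5 equal-width intervals.
observed = [18, 22, 20, 19, 21]

# Under a uniform candidate, every interval is expected to hold the same count.
expected = [sum(observed) / len(observed)] * len(observed)

statistic = chi_square_statistic(observed, expected)

# Degrees of freedom = intervals - 1 = 4 (no parameters estimated from the data).
CRITICAL_VALUE_5PCT_4DF = 9.488
reject = statistic > CRITICAL_VALUE_5PCT_4DF
print(statistic, reject)
```

Here the statistic is far below the critical value, so the uniform candidate would not be rejected. With very small samples almost any candidate passes in this way, which is why a histogram check is still worthwhile.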

Data Absence

Most likely or mean value
  About 10 customer arrivals per hour
  Approximately 20 minutes to assemble parts
  Around five machine failures per day
Minimum and maximum values
  1.5 to 3 minutes to inspect items
  5 to 10 customer arrivals per hour
  4 to 6 minutes to set up a machine
Minimum, most likely and maximum values can easily be set up as a triangular distribution.
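Turning three estimates into a triangular distribution takes only a few lines. The sketch below uses the 4 to 6 minute setup-time range from the example above; the most likely value of 5 minutes, the seed and the helper names are assumptions added for illustration. Note that Python's random.triangular takes its arguments in the order low, high, mode.

```python
import random
import statistics

def triangular_mean(a, m, b):
    """EX = (a + m + b) / 3 for Triang(a, m, b)."""
    return (a + m + b) / 3

def triangular_variance(a, m, b):
    """VarX = (a^2 + m^2 + b^2 - am - ab - bm) / 18 for Triang(a, m, b)."""
    return (a * a + m * m + b * b - a * m - a * b - b * m) / 18

low, mode, high = 4.0, 5.0, 6.0   # minutes; the mode of 5 is a hypothetical estimate
rng = random.Random(1)            # fixed seed so the run is repeatable

# Draw simulated setup times from the triangular distribution.
setup_times = [rng.triangular(low, high, mode) for _ in range(10000)]

print(triangular_mean(low, mode, high), statistics.mean(setup_times))
print(triangular_variance(low, mode, high), statistics.variance(setup_times))
```

The sample mean and variance should land close to the formula values, which gives a quick sanity check that the estimates were entered in the right order.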

Summary

Data should be collected systematically.
There are three types of data: structural, operational and numerical.
A questionnaire is a good way to request information.

Summary

Numerical data for random variables should be analyzed to test for independence and homogeneity.
A theoretical distribution should be fitted to the data whenever possible.
Data should be documented, reviewed and approved.
