# Chapter 3 - Developing the Plan (Part 2) Chapter 4 - Data Step

We looked at the following sub-stages in past lectures:
Defining the study units and population
Determining what variates will be measured
Specifying the sampling protocol

We will now address the final two sub-stages of the plan:
Defining and assessing the measurement systems
Planning for data collection

Introduction
Developing the Plan
Recall that in the plan stage, we would like to determine procedures for carrying out the study and collecting the data.

Defining and assessing the measurement systems
The measuring process
A measurement system or process is the combination of instruments (or gauges), methods, material, environment and people/operators that are used to produce the measured value of a variate on a sampled unit.

Definition: Measurement error is the difference between the measured value and the true value of a variate.

We can use a separate PPDAC cycle to investigate the measurement system by itself:
A unit is the act of taking a single measurement.
The response variate is a measured value.
Possible attributes of interest include the average measurement error or the standard deviation (or diversity) of the measurement error.
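As a minimal sketch of the two attributes just mentioned, the snippet below computes the average and the standard deviation of the measurement error from repeated measurements of a single unit. It assumes the true value is known (for instance, from a reference standard). The function name `measurement_error_summary` and the readings are hypothetical, made up for illustration only.

```python
from statistics import mean, stdev

def measurement_error_summary(measured_values, true_value):
    """Summarize measurement error for repeated measurements of one unit.

    Each error is (measured value - true value), following the definition above.
    """
    errors = [m - true_value for m in measured_values]
    return {
        "errors": errors,
        "average_error": mean(errors),  # average measurement error
        "sd_error": stdev(errors),      # sample standard deviation of the errors
    }

# Hypothetical data: five repeated measurements (mm) of a reference shaft
# whose true diameter is assumed to be 25.000 mm.
readings = [25.002, 24.999, 25.003, 25.001, 25.000]
summary = measurement_error_summary(readings, true_value=25.000)
print(summary["average_error"], summary["sd_error"])
```

An average error far from zero would suggest the system reads consistently high or low, while a large standard deviation would suggest that repeated measurements of the same unit disagree with each other.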

Example 1.2 of Chapter 1 Revisited
Recall the example where a micrometer is used to measure the diameter of a ground shaft. The following is a fishbone diagram for the measurement system.

Planning for data collection
Preparing for the data stage
Address various questions regarding the collection of the data. For example:
Who will collect the sample and make the measurements?
What might go wrong? What can we do to avoid this?
When will the sample be collected and the measurements made?
How will we record the data in an effective and efficient manner?

The Data Stage
This stage may be costly and time-consuming. As such, it is important to continuously monitor the data collection process to ensure that everything goes according to the plan. At all points in this stage, we must record any departures from the plan.

Steps:
Execute the plan.
Monitor the data as it is collected.
Store the data.
Examine the data.

Execute the plan
If the plan is experimental, set the explanatory variate values.
Select the sample of units according to the sampling protocol and record any departures.
Measure variates on each unit.
Follow measurement systems protocols.
Record the gauge.
Record the measurer.
Record time/order of each measurement.
Distinguish complete, incomplete, and missing measurements.
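One way to make sure the items above are recorded for every measurement is to fix the shape of a record before data collection starts. The sketch below is only an illustration: the class `MeasurementRecord`, its field names and the three status labels are assumptions, not part of any prescribed protocol.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

VALID_STATUSES = {"complete", "incomplete", "missing"}

@dataclass
class MeasurementRecord:
    unit_id: str
    variate: str
    value: Optional[float]           # None when no value was obtained
    gauge: str                       # which gauge was used
    measurer: str                    # who took the measurement
    order: int                       # position in the measurement sequence
    measured_at: Optional[datetime]  # time of measurement, if known
    status: str = "complete"
    note: str = ""                   # e.g. a departure from the plan

    def __post_init__(self):
        # Reject records whose status and value contradict each other.
        if self.status not in VALID_STATUSES:
            raise ValueError(f"unknown status: {self.status!r}")
        if self.status == "complete" and self.value is None:
            raise ValueError("a complete measurement needs a value")
        if self.status == "missing" and self.value is not None:
            raise ValueError("a missing measurement cannot carry a value")

# Hypothetical records for two shafts.
ok = MeasurementRecord("shaft-01", "diameter_mm", 25.002, "G1", "A", 1,
                       datetime(2024, 1, 15, 9, 30))
lost = MeasurementRecord("shaft-02", "diameter_mm", None, "G1", "A", 2,
                         None, status="missing", note="unit dropped")
```

Validating at creation time means an inconsistent record is caught when it is entered rather than later, in the analysis stage.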

Monitor the data collection
Within each sampled unit, check for unexpected relationships between the measurements and the measurer, the gauge used, and the time of measurement.
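A simple first check of this kind is to compare the average measurement across measurers (or gauges) for the same unit. The following is a rough sketch under assumed data: the helper `mean_by_group` and the records are hypothetical, and a difference in group means is only a prompt to investigate, not proof of a problem.

```python
from collections import defaultdict
from statistics import mean

def mean_by_group(records, key):
    """Average the measured values separately for each level of `key`."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[key]].append(rec["value"])
    return {level: mean(values) for level, values in groups.items()}

# Hypothetical repeated measurements (mm) of one shaft by two measurers.
records = [
    {"value": 25.002, "measurer": "A", "gauge": "G1", "order": 1},
    {"value": 25.010, "measurer": "B", "gauge": "G1", "order": 2},
    {"value": 25.001, "measurer": "A", "gauge": "G2", "order": 3},
    {"value": 25.011, "measurer": "B", "gauge": "G2", "order": 4},
    {"value": 25.003, "measurer": "A", "gauge": "G1", "order": 5},
    {"value": 25.009, "measurer": "B", "gauge": "G1", "order": 6},
]
by_measurer = mean_by_group(records, "measurer")
by_gauge = mean_by_group(records, "gauge")
print(by_measurer)
print(by_gauge)
```

In this made-up data, measurer B reads noticeably higher than measurer A on the same unit, which is the kind of unexpected relationship worth following up while collection is still under way.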

Across the units:
Are there any missing values? (Represent missing data values using "NA" or "NaN". Representing them with numerical values, e.g. 0 or 99, is not recommended, because such codes can be mistaken for real measurements.)
Are there any illegal data types? (e.g. letter values instead of numerical values)
Is the range of data values logically consistent? (e.g. no negative values for a variate which is always positive)
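The three checks above can be sketched as a small classifier applied to each raw entry. This is a minimal illustration, assuming the raw data arrive as text: the function `check_value`, the set `MISSING_CODES` and the labels it returns are hypothetical names chosen for this example.

```python
import math

MISSING_CODES = {"", "NA", "NaN"}

def check_value(raw, lower=None, upper=None):
    """Classify one raw entry as 'missing', 'illegal type', 'out of range' or 'ok'."""
    if raw is None:
        return "missing"
    text = str(raw).strip()
    if text in MISSING_CODES:
        return "missing"
    try:
        value = float(text)
    except ValueError:
        return "illegal type"   # e.g. letters where a number is expected
    if math.isnan(value):
        return "missing"        # catches other spellings such as "nan"
    if lower is not None and value < lower:
        return "out of range"
    if upper is not None and value > upper:
        return "out of range"
    return "ok"

# Hypothetical raw entries for a variate that must be positive.
raw_entries = ["25.01", "NA", "abc", "-3.2", "24.98"]
labels = [check_value(entry, lower=0) for entry in raw_entries]
print(labels)
```

Running such a check as the data come in, rather than at the end, makes it possible to go back to the unit or the measurer while the problem can still be fixed.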

Examine the data
Conduct visual and graphical checks to look for patterns of internal consistency. Is the data logically consistent with any prior information? Look for extreme observations (sometimes called outliers) which seem to be away from the bulk of the data.
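Visual checks are the main tool here, but a numerical screen can help point to candidate extreme observations. The sketch below uses the 1.5 × IQR convention commonly used to draw box plot whiskers. That rule is an assumption brought in for illustration, not something prescribed above, and the function name and data are hypothetical.

```python
from statistics import quantiles

def flag_extreme(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = quantiles(values, n=4)  # first, second and third quartiles
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical shaft diameters (mm); one value looks suspicious.
diameters = [24.98, 25.00, 25.01, 25.02, 25.03, 25.01, 24.99, 26.50]
flagged = flag_extreme(diameters)
print(flagged)
```

A flagged value is not automatically an error. It should be checked against the recorded gauge, measurer and any noted departures from the plan before deciding what to do with it.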

We can perform, for example, a marginal examination (look at one variate at a time), a joint examination (look at multiple variates at once) or a conditional examination (look at one or multiple variates which are organized according to the value of another variate).
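To make the three kinds of examination concrete, here is a small sketch using numerical summaries in place of plots. The variates (`diameter`, `temperature`, `shift`), the helper functions and the data are all hypothetical.

```python
from collections import defaultdict
from statistics import median

def summarize(values):
    """Minimum, median and maximum of a list of values."""
    return {"min": min(values), "median": median(values), "max": max(values)}

def conditional_summary(rows, variate, given):
    """Summarize `variate` separately for each value of `given`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[given]].append(row[variate])
    return {level: summarize(values) for level, values in groups.items()}

rows = [
    {"diameter": 25.01, "temperature": 20.1, "shift": "day"},
    {"diameter": 25.03, "temperature": 21.4, "shift": "day"},
    {"diameter": 25.02, "temperature": 20.8, "shift": "day"},
    {"diameter": 24.97, "temperature": 18.2, "shift": "night"},
    {"diameter": 24.99, "temperature": 18.9, "shift": "night"},
    {"diameter": 24.98, "temperature": 18.5, "shift": "night"},
]

# Marginal: one variate at a time.
marginal = summarize([r["diameter"] for r in rows])
# Joint: two variates together, e.g. as points for a scatter plot.
joint = [(r["temperature"], r["diameter"]) for r in rows]
# Conditional: one variate organized by the value of another.
conditional = conditional_summary(rows, "diameter", "shift")
print(marginal, conditional)
```

In this made-up data the marginal summary hides a difference that the conditional summary shows: diameters recorded on the night shift are systematically smaller.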

Some useful graphical tools include scatter plots, histograms, stem and leaf diagrams, box plots, time series plots and conditional displays.
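Of the tools listed, the stem and leaf diagram is simple enough to produce as plain text. The sketch below assumes non-negative integer data and splits each value into a tens stem and a units leaf. The function name and the data are hypothetical.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return stem and leaf lines for non-negative integers (stem = tens, leaf = units)."""
    stems = defaultdict(list)
    for v in sorted(values):
        stems[v // 10].append(v % 10)
    lines = []
    # Include empty stems so that gaps in the data remain visible.
    for stem in range(min(stems), max(stems) + 1):
        leaves = "".join(str(leaf) for leaf in stems.get(stem, []))
        lines.append(f"{stem:>2} | {leaves}")
    return lines

# Hypothetical counts.
counts = [12, 15, 21, 23, 23, 38, 41]
diagram = stem_and_leaf(counts)
print("\n".join(diagram))
```

Unlike a histogram, this display keeps every individual data value readable, which is convenient when examining a small data set.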

Store the Data
Store the data in an efficient manner so that it can be accessed and used easily in the analysis stage. Store the data in a format that is compatible with the statistical software used in the analysis stage.
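As one possible illustration, plain CSV is a format that most statistical software can read. The sketch below writes records with one row per measurement and "NA" for missing values. The column names in `FIELDS` and the function `write_measurements` are assumptions made for this example, and an in-memory buffer stands in for a real file.

```python
import csv
import io

FIELDS = ["unit_id", "diameter_mm", "gauge", "measurer", "order"]

def write_measurements(rows, handle):
    """Write one row per measurement, storing missing values as 'NA'."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS)
    writer.writeheader()
    for row in rows:
        writer.writerow(
            {k: ("NA" if row.get(k) is None else row[k]) for k in FIELDS}
        )

# Hypothetical data; the second diameter was not obtained.
rows = [
    {"unit_id": "u1", "diameter_mm": 25.002, "gauge": "G1", "measurer": "A", "order": 1},
    {"unit_id": "u2", "diameter_mm": None, "gauge": "G1", "measurer": "A", "order": 2},
]

# With a real file, use open(path, "w", newline="") in place of the buffer.
buffer = io.StringIO()
write_measurements(rows, buffer)
csv_lines = buffer.getvalue().splitlines()
print(csv_lines)
```

Keeping the gauge, measurer and order columns alongside the measured value means the checks from the monitoring step can be repeated later on the stored data.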