# EXPLORING, DISPLAYING, AND EXAMINING DATA

## Types of Data Analysis

• Exploratory data analysis
  – The data guide the choice of analysis, or a revision of the planned analysis
• Confirmatory data analysis
  – Closer to classical statistical inference in its use of significance and confidence
  – May use information from a closely related data set, or validate findings by gathering and analyzing new data

## Techniques to Display and Examine Distributions

• Frequency table
• Visual displays
  – Histograms
  – Stem-and-leaf display
  – Box-plot
• Crosstabulation of variables
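The two tabular displays in the list above can be sketched as follows. This is a minimal illustration, assuming small in-memory lists. The function names `frequency_table` and `stem_and_leaf` and the sample data are hypothetical. The stem-and-leaf version assumes non-negative integers, with tens as stems and units as leaves.

```python
from collections import Counter


def frequency_table(values):
    """Rows of (value, count, percent, cumulative percent), sorted by value."""
    counts = Counter(values)
    n = len(values)
    rows, running = [], 0
    for value in sorted(counts):
        running += counts[value]
        rows.append((value, counts[value], 100 * counts[value] / n, 100 * running / n))
    return rows


def stem_and_leaf(values):
    """Map each tens digit (stem) to its sorted units digits (leaves).

    Assumes non-negative integers; stems with no leaves are kept so that
    gaps in the distribution remain visible."""
    display = {}
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        display.setdefault(stem, []).append(leaf)
    for stem in range(min(display), max(display) + 1):
        display.setdefault(stem, [])
    return dict(sorted(display.items()))


# Hypothetical data, for illustration only
answers = ["A", "B", "A", "C", "A", "B"]
scores = [12, 15, 21, 21, 23, 41, 45, 45, 47, 58]

for value, count, pct, cum in frequency_table(answers):
    print(f"{value}  {count}  {pct:5.1f}%  {cum:5.1f}%")

for stem, leaves in stem_and_leaf(scores).items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
```

Unlike the frequency table, the stem-and-leaf display keeps every individual value visible while still showing the shape of the distribution.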

### Histograms

• Display all intervals in a distribution, even those without observed values
• Examine the shape of the distribution for skewness, kurtosis, and the modal pattern
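Both histogram points can be illustrated with a short sketch. It reports every equal-width interval, including empty ones, and summarizes shape with moment-based skewness and excess kurtosis (kurtosis minus 3, so a normal distribution scores 0). The interval choice (`low`, `width`, `n_bins`) and the data are hypothetical, and the population-moment formulas are one common convention among several.

```python
def histogram(values, low, width, n_bins):
    """Count values in n_bins equal-width, half-open intervals [lo, hi).

    Every interval is returned, including those with no observed values."""
    counts = [0] * n_bins
    for v in values:
        i = int((v - low) // width)
        if not 0 <= i < n_bins:
            raise ValueError(f"{v} lies outside the chosen intervals")
        counts[i] += 1
    return [(low + i * width, low + (i + 1) * width, c) for i, c in enumerate(counts)]


def skewness_and_kurtosis(values):
    """Moment-based skewness and excess kurtosis (population formulas)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3


scores = [12, 15, 21, 21, 23, 41, 45, 45, 47, 58]  # hypothetical data
for lo, hi, count in histogram(scores, low=10, width=10, n_bins=5):
    print(f"[{lo}, {hi})  {'*' * count}")

skew, excess_kurtosis = skewness_and_kurtosis(scores)
print(f"skewness = {skew:.2f}, excess kurtosis = {excess_kurtosis:.2f}")
```

The empty [30, 40) row is printed deliberately. Dropping it would hide the gap that separates the two clusters of scores.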

### Box-plot (box-and-whisker plot)

• The rectangular box encompasses the middle 50% of the data values
• A center line through the width of the box marks the median
• The edges of the box are called hinges (approximately the lower and upper quartiles)
• Whiskers extend from the right and left hinges to the largest and smallest values
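The five values a box-plot draws can be computed with a minimal sketch. Hinges are computed with Tukey's rule: each is the median of one half of the sorted data. Other quartile definitions give slightly different box edges. The function name and data are hypothetical. The sketch uses the standard-library `statistics.median`.

```python
from statistics import median


def five_number_summary(values):
    """Minimum, lower hinge, median, upper hinge, maximum.

    Hinges follow Tukey's rule: each is the median of one half of the
    sorted data, with the overall median included in both halves when n is odd."""
    s = sorted(values)
    n = len(s)
    lower_half = s[: (n + 1) // 2]
    upper_half = s[n // 2 :]
    return s[0], median(lower_half), median(s), median(upper_half), s[-1]


scores = [12, 15, 21, 21, 23, 41, 45, 45, 47, 58]  # hypothetical data
lo, lower_hinge, med, upper_hinge, hi = five_number_summary(scores)
print(f"whiskers from {lo} to {hi}; box [{lower_hinge} | {med} | {upper_hinge}]")
print(f"box length (upper hinge minus lower hinge): {upper_hinge - lower_hinge}")
```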
### Transformation

• To improve interpretation and compatibility with other data sets
• To enhance symmetry and stabilize spread
• To improve linear relationships between and among variables
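The symmetry point can be illustrated with a small sketch of a logarithmic transformation, one common choice for right-skewed, strictly positive data. The data are hypothetical and deliberately idealized: each value doubles the previous one, so the log transform spaces them evenly and the skewness drops to roughly zero. Real data will rarely become perfectly symmetric.

```python
import math


def skewness(values):
    """Moment-based skewness (population formula)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical right-skewed data: each value doubles the previous one
raw = [1, 2, 4, 8, 16, 32, 64]
logged = [math.log(v) for v in raw]  # natural log; requires positive values

print(f"skewness before: {skewness(raw):.2f}")
print(f"skewness after:  {skewness(logged):.2f}")
```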

## Improvement & Control Analysis

• Statistical process control
  – Uses statistical tools to analyze, monitor, and improve process performance
  – Total Quality Management
  – Control chart: displays sequential measurements of a process together with a center line and control limits (an upper control limit and a lower control limit)
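The center line and control limits can be made concrete with a sketch of one common variables chart pair, the X-bar and R charts. The X-bar chart uses the grand mean as its center line and limits of grand mean ± A2·R̄, where R̄ is the average subgroup range. The R chart uses R̄ as its center line, with limits D3·R̄ and D4·R̄. The constants A2 = 0.577, D3 = 0, and D4 = 2.114 are the tabulated values for subgroups of five. The measurements and function names are hypothetical, and the sketch applies only the basic "point outside the limits" rule.

```python
# Tabulated control-chart constants for subgroups of size n = 5
A2, D3, D4 = 0.577, 0.0, 2.114
SUBGROUP_SIZE = 5


def xbar_r_limits(subgroups):
    """(LCL, center line, UCL) for an X-bar chart and for an R chart."""
    if any(len(g) != SUBGROUP_SIZE for g in subgroups):
        raise ValueError("constants above assume subgroups of size 5")
    means = [sum(g) / len(g) for g in subgroups]
    ranges = [max(g) - min(g) for g in subgroups]
    grand_mean = sum(means) / len(means)
    r_bar = sum(ranges) / len(ranges)
    xbar_chart = (grand_mean - A2 * r_bar, grand_mean, grand_mean + A2 * r_bar)
    r_chart = (D3 * r_bar, r_bar, D4 * r_bar)
    return xbar_chart, r_chart


def out_of_control(subgroups, xbar_chart):
    """Indices of subgroups whose mean falls outside the X-bar control limits."""
    lcl, _, ucl = xbar_chart
    return [i for i, g in enumerate(subgroups) if not lcl <= sum(g) / len(g) <= ucl]


# Hypothetical baseline measurements: four subgroups of five readings each
baseline = [[10, 11, 9, 10, 10], [10, 12, 10, 9, 9], [11, 10, 10, 9, 10], [9, 10, 11, 10, 10]]
xbar_chart, r_chart = xbar_r_limits(baseline)
print("X-bar chart (LCL, CL, UCL):", xbar_chart)
print("R chart (LCL, CL, UCL):", r_chart)

# New subgroups are checked against limits set from the baseline period
new_subgroups = [[10, 10, 11, 9, 10], [12, 13, 12, 13, 12]]
print("out-of-control subgroups:", out_of_control(new_subgroups, xbar_chart))
```

The limits are computed from a baseline period and then held fixed. A shifted subgroup therefore stands out instead of pulling the limits toward itself.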
## Types of Control Charts

• Variables data (ratio or interval measurements)
  – X-bar charts
  – R-charts
  – s-charts
• Pareto diagrams
  – Bar chart, ordered from the most to the least frequent category, whose percentages sum to 100 percent
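The table behind a Pareto diagram can be built with a minimal sketch: sort the categories by frequency, convert the counts to percentages that sum to 100, and track the cumulative percentage. The defect categories and counts are hypothetical, and the bars are drawn as text rather than as a chart.

```python
from collections import Counter


def pareto_table(observations):
    """Rows of (category, count, percent, cumulative percent), largest first."""
    counts = Counter(observations)
    total = sum(counts.values())
    rows, cumulative = [], 0.0
    for category, count in counts.most_common():
        pct = 100 * count / total
        cumulative += pct
        rows.append((category, count, pct, cumulative))
    return rows


# Hypothetical defect log
defects = ["scratch"] * 12 + ["dent"] * 5 + ["misalignment"] * 2 + ["other"] * 1
for category, count, pct, cum in pareto_table(defects):
    print(f"{category:<13} {count:>3} {pct:5.1f}% {cum:6.1f}%  {'#' * count}")
```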

## Geographic Information Systems

• Systems of hardware, software, and procedures that capture, store, manipulate, integrate, and display spatially referenced data

• A minimum of four components
  – Integrating information from various sources
  – Capturing data
  – Projection and restructuring
  – Modeling

## Crosstabulation

• A technique for comparing two classification variables
  – Cells
  – Marginals
  – Contingency tables
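The three terms above can be tied together with a minimal sketch. It counts the cells of a contingency table from paired observations, adds the row and column marginals, and computes row percentages. The variables (region, purchase) and counts are hypothetical, as are the function names.

```python
from collections import Counter


def crosstab(pairs):
    """Contingency table from (row_value, column_value) pairs.

    Returns cell counts, row marginals, column marginals, and the grand total."""
    cells = Counter(pairs)
    rows = sorted({r for r, _ in pairs})
    cols = sorted({c for _, c in pairs})
    table = {r: {c: cells[(r, c)] for c in cols} for r in rows}
    row_totals = {r: sum(table[r].values()) for r in rows}
    col_totals = {c: sum(table[r][c] for r in rows) for c in cols}
    return table, row_totals, col_totals, len(pairs)


def row_percentages(table, row_totals):
    """Percentage each cell contributes to its row total."""
    return {r: {c: 100 * n / row_totals[r] for c, n in cells.items()} for r, cells in table.items()}


# Hypothetical survey: region of respondent vs. whether they purchased
pairs = ([("North", "yes")] * 30 + [("North", "no")] * 20
         + [("South", "yes")] * 15 + [("South", "no")] * 35)
table, row_totals, col_totals, total = crosstab(pairs)

cols = list(col_totals)
print("region  " + "  ".join(f"{c:>5}" for c in cols) + "  total")
for r, cells in table.items():
    print(f"{r:<6}  " + "  ".join(f"{cells[c]:>5}" for c in cols) + f"  {row_totals[r]:>5}")
print("total   " + "  ".join(f"{col_totals[c]:>5}" for c in cols) + f"  {total:>5}")
print(row_percentages(table, row_totals))
```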

## Percentaging Errors

• Averaging percentages without weighting
• Using too-large percentages (>100%)
• Using percentages with very small samples
• Citing a percentage decrease exceeding 100 percent
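Two of these errors are easy to reproduce in a few lines. The first is averaging two percentages that rest on very different base sizes. The second is that a decrease can never exceed 100 percent, although the reverse increase can. The store figures are hypothetical.

```python
def unweighted_average(percentages):
    """Simple mean of percentages: ignores the size of each base."""
    return sum(percentages) / len(percentages)


def weighted_average(percentages, base_sizes):
    """Weight each percentage by the size of the group it was computed on."""
    return sum(p * n for p, n in zip(percentages, base_sizes)) / sum(base_sizes)


def percent_change(old, new):
    """Percentage change relative to the starting value."""
    return 100 * (new - old) / old


# Hypothetical: 80% of 1,000 customers at store A and 20% of 100 at store B are satisfied
satisfied_pct = [80, 20]
customers = [1000, 100]
print(f"unweighted: {unweighted_average(satisfied_pct):.1f}%")          # misleading
print(f"weighted:   {weighted_average(satisfied_pct, customers):.1f}%")  # 820 of 1,100 customers

# A decrease is bounded at 100%: falling from 200 to 50 is a 75% drop,
# even though rising from 50 to 200 is a 300% increase.
print(percent_change(200, 50), percent_change(50, 200))
```

The unweighted mean gives the 100 customers at store B as much influence as the 1,000 at store A. The weighted figure is simply the pooled proportion of all satisfied customers.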

## Other Table-based Analysis

• Automatic Interaction Detection (AID)
  – Sequential partitioning procedure that uses a dependent variable and a set of predictors
  – Searches among up to 300 variables for the best single division of the data into subsets according to each predictor variable
  – Chooses one division (the best among the predictors)
  – Splits the sample using chi-square tests to create multi-way splits
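A single step of the chi-square splitting described above can be shown in a deliberately simplified sketch. The chi-square variant is usually referred to as CHAID. For each predictor, the sketch computes the Pearson chi-square of its crosstabulation against a categorical dependent variable. It then picks the predictor with the largest statistic and splits the sample into one subset per category. This simplifies real procedures, which compare adjusted p-values rather than raw statistics, may merge similar categories, and repeat the step within each subset until a stopping rule applies. The records, field names, and function names are hypothetical.

```python
from collections import Counter


def chi_square(pairs):
    """Pearson chi-square statistic for the crosstab of (predictor, outcome) pairs."""
    cells = Counter(pairs)
    row_totals = Counter(p for p, _ in pairs)
    col_totals = Counter(o for _, o in pairs)
    n = len(pairs)
    stat = 0.0
    for r, r_total in row_totals.items():
        for c, c_total in col_totals.items():
            expected = r_total * c_total / n
            stat += (cells[(r, c)] - expected) ** 2 / expected
    return stat


def best_split(records, predictors, outcome):
    """One partitioning step: score every predictor against the outcome,
    keep the one with the largest chi-square, and split the records into
    one subset per category of that predictor (a multi-way split)."""
    scores = {p: chi_square([(rec[p], rec[outcome]) for rec in records]) for p in predictors}
    best = max(scores, key=scores.get)
    subsets = {}
    for rec in records:
        subsets.setdefault(rec[best], []).append(rec)
    return best, scores, subsets


# Hypothetical records: region is related to buying, age group is not
records = []
for region, bought, count in [("North", "yes", 8), ("North", "no", 2),
                              ("South", "yes", 2), ("South", "no", 8)]:
    for i in range(count):
        age_group = "young" if i % 2 == 0 else "old"
        records.append({"region": region, "age_group": age_group, "bought": bought})

best, scores, subsets = best_split(records, ["region", "age_group"], "bought")
print(best, scores)
print({category: len(group) for category, group in subsets.items()})
```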

