Distributions of Data
Understanding the importance of location, spread, and shape
Reading materials
Work through the examples for the FREQ procedure given in the SAS documentation for that procedure.
3/19/2020
A frequency histogram is useful for variables with an ordered scale (ordinal, interval, or ratio) that contain a large number of values.
From the histogram you can easily conclude that extremely low and extremely high values are less likely.
This may not have been easy to deduce if there were a large number of values.
Outlier?
Shapes of distributions
Skewness
For example, these distributions may be associated with outliers, or may represent
variables that are non-normally distributed, sometimes requiring transformation.
The measures of skewness and kurtosis may be quickly obtained (for example,
through PROC MEANS) to identify variables needing more detailed distributional
analysis.
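As a rough illustration of that screening step, the sketch below asks PROC MEANS for skewness and kurtosis alongside the usual summary statistics. The data set name BPressure appears later in these slides, but the variable names Systolic and Diastolic are hypothetical and stand in for whatever numeric variables the data set actually holds.

```sas
/* Hypothetical variable names: Systolic and Diastolic are assumed, not taken from the slides */
proc means data=BPressure n mean std skewness kurtosis maxdec=3;
  var Systolic Diastolic;
  title 'Screening variables for skewness and kurtosis';
run;
```

Variables whose skewness or kurtosis is far from zero are candidates for the more detailed look that PROC UNIVARIATE provides.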
Histogram
FREQ PROCEDURE
/* MISSPRINT lists missing values in the table but leaves them out of the percentages */
proc freq data=new;
  tables a / missprint;
  title '1-WAY FREQUENCY TABLE WITH MISSPRINT OPTION';
run;
Analysis of distributions of variables
PROC UNIVARIATE
Data Gains
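A minimal sketch of a PROC UNIVARIATE call is given here for orientation. It assumes a data set named BPressure (a name used later in these slides) with a hypothetical numeric variable Systolic; substitute the actual data set and variable being analyzed.

```sas
/* Hypothetical variable name: Systolic is assumed for illustration */
proc univariate data=BPressure;
  var Systolic;
  histogram Systolic / normal;   /* histogram with a fitted normal curve overlaid */
  title 'Distribution of Systolic';
run;
```

By default the procedure prints moments, basic measures of location and variability, tests for location, quantiles, and extreme observations, so the output typically runs over several pages.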
10% 108
5% 100
1% 96
0% Min 96
Data BPressure;
Set BPressure;
Run;
Winsorized Means
Percent Winsorized in Tail; Number Winsorized in Tail; Winsorized Mean; Std Error Winsorized Mean; 95% Confidence Limits; DF; t for H0: Mu0=0.00; Pr > |t|
Trimmed Means
Percent Trimmed in Tail; Number Trimmed in Tail; Trimmed Mean; Std Error Trimmed Mean; 95% Confidence Limits; DF; t for H0: Mu0=0.00; Pr > |t|
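Tables with these headings are requested through the WINSORIZED= and TRIMMED= options on the PROC UNIVARIATE statement. The sketch below is one way to ask for them; the 10% tail proportion and the variable name Systolic are assumptions chosen for illustration, not values taken from the slides.

```sas
/* Assumed settings: 10% in each tail, hypothetical variable Systolic */
proc univariate data=BPressure winsorized=0.1 trimmed=0.1;
  var Systolic;
  title 'Winsorized and trimmed means for Systolic';
run;
```

A value between 0 and 0.5 is read as the proportion of observations in each tail; an integer is read as the number of observations in each tail.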