New Univariate and Npar1way

KELANTAN BRANCH, KOTA BHARU CAMPUS
FACULTY OF COMPUTER AND MATHEMATICAL SCIENCES

UNIVERSITI TEKNOLOGI MARA
BACHELOR OF SCIENCE STATISTICS (CS241)
GROUP PROJECT: SAS PROGRAMMING (STA610) TOPIC:

UNIVARIATE & NPAR1WAY
PREPARED FOR: SIR MOHD NOOR AZAM BIN NAFI
PREPARED BY:
D2CS2415A
AMIR HAMZAH BIN ABDUL HALIM 2018801466
NUR FATIHAH ANIS BINTI MAT JUNOH 2018410666
MIMI NOR SHAZLEEN BINTI HANAFIAH 2018695498
NUR SYAI’RAH NAJWA BINTI SHAMSURI 2018695596
NUR AZMINA HUSNA BINTI NORAMINASRUN 2018435767
1.0 INTRODUCTION
UNIVARIATE
The UNIVARIATE procedure provides a variety of descriptive measures, high-resolution graphical displays, and statistical methods, which you can use to summarize, visualize, analyze, and model the statistical distributions of numeric variables.
The data used are listed below:
- Breast Cancer Wisconsin (Diagnostic) Data Set, which consists of 569 observations and 7 variables
- Weight loss using diet, which consists of 78 observations and 7 variables
- Iris Species, which consists of 150 observations and 6 variables
The data were retrieved from:
1. https://www.kaggle.com/uciml/breast-cancer-wisconsin-data/
2. https://www.kaggle.com/tombenny/foodhabbits
3. https://www.kaggle.com/uciml/iris
Some common options used in the PROC UNIVARIATE statement are DATA=, NORMAL, FREQ, and PLOT. These options are explained as follows:
a) The DATA= option specifies the SAS data set to be analyzed by the procedure.
b) The NORMAL option requests several tests for normality of the analysis variable(s).
c) The FREQ option produces a frequency table of the variable(s).
d) The PLOT option produces a stem-and-leaf plot, a box plot, and a normal probability plot of the variable(s).
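For illustration, the four options can be combined in a single PROC UNIVARIATE statement. The sketch below assumes that the breast cancer data have already been imported as WORK.CANCER and that this data set contains the numeric variable radius_mean (both are used in Section 6.1).

```sas
/* Sketch: DATA=, NORMAL, FREQ and PLOT in one PROC UNIVARIATE statement.
   Assumes WORK.CANCER exists and contains the numeric variable RADIUS_MEAN. */
proc univariate data=work.cancer  /* DATA=  : data set to analyze            */
                normal            /* NORMAL : tests for normality            */
                freq              /* FREQ   : frequency table of the values  */
                plot;             /* PLOT   : stem-and-leaf, box and normal
                                               probability plots            */
   var radius_mean;
run;
```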
Procedure Manual:
PROC UNIVARIATE < options >;
CLASS variable(s) < / KEYLEVEL= value >;
VAR variable(s);
BY variable(s);
HISTOGRAM < variables > < / options >;
FREQ variable;
ID variable(s);
INSET keyword-list < / options >;
OUTPUT < OUT=SAS-data-set > . . . < percentile-options >;
PROBPLOT < variable(s) > < / options >;
QQPLOT < variable(s) > < / options >;
WEIGHT variable;
RUN;
NPAR1WAY
Nonparametric or distribution-free tests, such as those available in PROC NPAR1WAY, usually focus on the sign or rank of the data rather than on their exact numerical values. By default, PROC NPAR1WAY performs numerous nonparametric location and scale tests, such as the Wilcoxon (Kruskal-Wallis for more than two groups), median, Van der Waerden, Savage, Kolmogorov-Smirnov and Cramer-von Mises tests. It also performs a standard analysis of variance.
The following statements are available in the NPAR1WAY procedure:

PROC NPAR1WAY < options >;
BY variables;
CLASS variable;
EXACT statistic-options < / computation-options >;
FREQ variable;
OUTPUT < OUT=SAS-data-set > < output-options >;
VAR variables;
2.0 USES
2.1 USES OF UNIVARIATE PROCEDURE
Univariate analysis is used in statistics to describe data that consist of observations on only a single characteristic or attribute. Its main purpose is to describe the data and to find patterns that exist within them.
2.2 USES OF NPAR1WAY PROCEDURE

PROC NPAR1WAY performs nonparametric tests for location and scale differences across a one-way classification. It also provides a standard analysis of variance on the raw data, empirical distribution function (EDF) statistics, and multiple comparison analysis.
3.0 ADVANTAGES AND DISADVANTAGES

3.1 UNIVARIATE PROCEDURE
ADVANTAGES
i) It can be used to explore the distributions of the variables in a data set.
ii) It makes it easy to model the data distribution and to validate distributional assumptions.
iii) It can assess goodness of fit with hypothesis tests and graphical displays, such as probability plots and quantile-quantile (Q-Q) plots.
iv) It can fit parametric distributions, such as the beta, exponential, and gamma distributions, and compute probabilities and percentiles from the fitted models.
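To illustrate advantage iv), the HISTOGRAM statement can overlay fitted parametric curves on a histogram. It also prints the estimated parameters, goodness-of-fit tests, and fitted quantiles. The sketch assumes WORK.CANCER and radius_mean, as in Section 6.1. The normal, gamma, and lognormal families are only examples. Other families, such as BETA and EXPONENTIAL, are requested in the same way, subject to their own data-range requirements.

```sas
/* Sketch: fit several parametric distributions to RADIUS_MEAN.
   Assumes WORK.CANCER exists; the GAMMA and LOGNORMAL fits require all
   values to be greater than the threshold parameter (0 by default). */
proc univariate data=work.cancer;
   var radius_mean;
   /* Overlay fitted normal, gamma and lognormal curves on the histogram */
   histogram radius_mean / normal gamma lognormal;
   /* Add a box of summary statistics in the upper-right corner */
   inset n mean std skewness kurtosis / position=ne;
run;
```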
DISADVANTAGES
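When no other statements are specified, PROC UNIVARIATE generates a wide variety of statistics that summarize the distribution of each analysis variable. As a result, the default output is long and usually has to be restricted to the tables of interest.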
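Here is a minimal sketch of how to trim the default output. The ODS SELECT statement keeps only the named tables. BasicMeasures and TestsForNormality are ODS table names for PROC UNIVARIATE, and ODS TRACE ON lists in the log the names of every table that a step creates. WORK.CANCER and radius_mean are assumed, as in Section 6.1.

```sas
/* Sketch: keep only two tables of the PROC UNIVARIATE output.
   TestsForNormality is created only when the NORMAL option is given. */
ods select BasicMeasures TestsForNormality;
proc univariate data=work.cancer normal;
   var radius_mean;
run;
```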
3.2 NPAR1WAY PROCEDURE
ADVANTAGES
i) It computes tests based on simple linear rank statistics when the data are classified into two samples. When the data are classified into more than two samples, it computes one-way analysis of variance (ANOVA) statistics.
ii) It provides stratified analysis for two-sample data with score types such as median and Savage scores.
iii) It computes empirical distribution function (EDF) statistics, which test whether the distribution of the variable is the same among different groups.
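The following sketch requests the Wilcoxon, median, and EDF analyses explicitly for the two diagnosis groups. It assumes WORK.CANCER with the variables diagnosis and area_mean, which are used in Section 6.2.

```sas
/* Sketch: request selected NPAR1WAY analyses explicitly.
   WILCOXON : rank sum test (two groups) / Kruskal-Wallis test (more groups)
   MEDIAN   : median scores test
   EDF      : empirical distribution function tests (Kolmogorov-Smirnov,
              Cramer-von Mises, and Kuiper for two-sample data)          */
proc npar1way data=work.cancer wilcoxon median edf;
   class diagnosis;   /* grouping variable (B or M) */
   var area_mean;     /* response variable          */
run;
```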
DISADVANTAGES
i) If no analysis options are specified in the PROC NPAR1WAY statement, the full set of default analyses is performed, which makes the output long.
ii) If ODS Graphics is enabled but the PLOTS= option is not specified, PROC NPAR1WAY produces all the plots associated with the requested analyses.
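Both drawbacks can be limited by naming the wanted analyses in the PROC NPAR1WAY statement and requesting only the wanted plots with the PLOTS= option. The sketch below makes the same assumptions as Section 6.2.

```sas
/* Sketch: limit NPAR1WAY to one analysis and one plot.
   WILCOXON restricts the tests; PLOTS=WILCOXONBOXPLOT restricts the graphs. */
ods graphics on;
proc npar1way data=work.cancer wilcoxon plots=wilcoxonboxplot;
   class diagnosis;
   var area_mean;
run;
/* PLOTS=NONE would suppress all NPAR1WAY graphs instead */
```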
4.0 STATISTICAL METHODS

Purpose | Statistical method
Comparison of two groups | t-test (including the paired t-test)
Comparison of more than two groups | Analysis of variance (ANOVA)
Relationship of two or more quantitative variables | Correlation and regression
Dependence of two categorical variables | Contingency tables

T-TEST
- Two independent samples
- Two dependent samples (paired t-test)
MORE THAN TWO GROUPS
- Analysis of variance (ANOVA)
i) Single classification criterion: one-way ANOVA
ii) More than one classification criterion: two-way / multi-way ANOVA
VARIOUS ANOVA MODELS
- You can account for various hierarchical structures in your data
- You can also account for repeated observations
RELATIONSHIP OF TWO (OR MORE) QUANTITATIVE VARIABLES
- If the two compared variables are considered to play equal roles, calculate a correlation coefficient (a value between -1 and 1).
- If one of the variables is considered the predictor (independent, explanatory) and the other the response (dependent, explained), use regression.
NONPARAMETRIC TESTS (PROC NPAR1WAY)
- PROC NPAR1WAY performs numerous nonparametric location and scale tests, such as the Wilcoxon, Kruskal-Wallis, median, Van der Waerden, Savage, Kolmogorov-Smirnov and Cramer-von Mises tests. It also performs a standard analysis of variance.
- When the data are classified into two samples, the tests are based on simple linear rank statistics.
- When the data are classified into more than two samples, the tests are based on one-way ANOVA statistics.
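The two kinds of statistic can be sketched in the general form given in the SAS/STAT documentation for PROC NPAR1WAY. Each observation j receives a score $a(R_j)$ computed from its rank $R_j$; for Wilcoxon scores, the score is simply the rank. With n observations in total, the two-sample statistic is the sum of the scores in one group. The k-sample statistic is a one-way ANOVA on the scores:

$$S = \sum_{j=1}^{n} c_j \, a(R_j), \qquad c_j = \begin{cases} 1 & \text{if observation } j \text{ is in group 1} \\ 0 & \text{otherwise} \end{cases}$$

$$C = \frac{1}{s_a^2} \sum_{i=1}^{k} n_i \left(\bar{A}_i - \bar{a}\right)^2, \qquad s_a^2 = \frac{1}{n-1} \sum_{j=1}^{n} \left(a(R_j) - \bar{a}\right)^2$$

In these formulas, $n_i$ and $\bar{A}_i$ are the size and mean score of group i, and $\bar{a}$ is the mean of all n scores. Under the null hypothesis, C is approximately chi-square with k - 1 degrees of freedom. With Wilcoxon scores, C is the Kruskal-Wallis statistic.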
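5.0 HYPOTHESIS TESTING
Tests for location (PROC UNIVARIATE, default μ0 = 0):
$H_0: \mu = 0$
$H_1: \mu \neq 0$

Tests for p groups (PROC NPAR1WAY):
$H_0: \text{distribution}_1 = \text{distribution}_2 = \cdots = \text{distribution}_p$
$H_1$: not all of $\text{distribution}_1, \ldots, \text{distribution}_p$ are equal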
6.0 EXAMPLES OF UNIVARIATE AND NPAR1WAY
6.1 PERFORMING ONE-SAMPLE NONPARAMETRIC TESTS
First, we must know what a SAS data set is. A SAS data set is a specially structured data file that SAS creates and only SAS can read. It is a table that contains observations and variables. The SAS data set used here is the Diagnosis of Breast Cancer data, computed from a digitized image of a fine needle aspirate (FNA).
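Before PROC CONTENTS can describe it, the Kaggle CSV file has to be read into SAS as WORK.CANCER. A minimal sketch with PROC IMPORT follows. The file path is hypothetical; replace it with the location where the downloaded file is saved.

```sas
/* Sketch: read the Kaggle CSV file into WORK.CANCER.
   The DATAFILE= path below is hypothetical. */
proc import datafile="/folders/myfolders/data.csv"
            out=work.cancer   /* SAS data set to create              */
            dbms=csv          /* input file is comma-separated       */
            replace;          /* overwrite WORK.CANCER if it exists  */
   getnames=yes;              /* first row holds the variable names  */
run;
```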
After the data are imported, the first step is to use PROC CONTENTS to display the descriptor portion of the SAS data set.
proc contents data=work.Cancer;
run;
Figure 1.1: Output of the CONTENTS procedure

Next, use PROC PRINT to display the data portion of the data set.
proc print data=work.Cancer;
run;
Figure 1.2: The list of variables and attributes
The one-sample tests considered are:
WILCOXON TEST
The Wilcoxon matched-pairs (or signed rank) test is a nonparametric test of location for two related samples (for example, a before-and-after study). The null hypothesis is that the samples arise from exactly the same distribution. This is tested against the alternative that the underlying distributions differ in their locations. The data for the one-sample Wilcoxon test are the differences between the two samples.
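For reference, the SAS documentation on tests for location defines the signed rank statistic of PROC UNIVARIATE as follows. Observations equal to $\mu_0$ are discarded first. Let $n_t$ be the number of observations that remain, and let $r_i^{+}$ be the rank of $|x_i - \mu_0|$ among them, where tied values receive average ranks. For paired data, $x_i$ is the difference within pair i and $\mu_0 = 0$.

$$S = \sum_{i:\, x_i > \mu_0} r_i^{+} - \frac{n_t(n_t + 1)}{4}$$

Under the null hypothesis, the expected value of S is 0, so a large |S| indicates a shift in location.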
RUNS TEST
Use this test to check the randomness of a sequence of observations. The data for the test are assumed to be an ordered sequence of observations of two types, and a run is defined as a succession of observations of the same type. The total number of runs in the sequence provides a clue to lack of randomness. A low number of runs might indicate positive serial correlation, while a high number might arise from negative serial correlation. The data are a single variable, and a boundary value splits them into two types. Observations larger than the boundary value are considered to be of the first type, and observations smaller than it are taken to be of the second type. Missing values and observations equal to the boundary value are ignored.
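A common large-sample version of the runs test is the Wald-Wolfowitz normal approximation, given here as a general result. It compares the observed number of runs R with its expected value under randomness. Here $n_1$ and $n_2$ are the numbers of observations of the first and second type:

$$E(R) = \frac{2 n_1 n_2}{n_1 + n_2} + 1, \qquad \operatorname{Var}(R) = \frac{2 n_1 n_2 \,(2 n_1 n_2 - n_1 - n_2)}{(n_1 + n_2)^2 \,(n_1 + n_2 - 1)}$$

$$Z = \frac{R - E(R)}{\sqrt{\operatorname{Var}(R)}}$$

A strongly negative Z (too few runs) points to positive serial correlation. A strongly positive Z (too many runs) points to negative serial correlation.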
SIGN TEST
Use this test to perform a nonparametric two-sided test of location for a single sample of observations. The data for the test are a single variable. The default null hypothesis is that the sample has a median of zero, but other values can be tested; in PROC UNIVARIATE, this is done with the MU0= option. Missing values and observations equal to the hypothesized median are ignored, and the effective sample size is reduced accordingly.
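According to the SAS documentation, PROC UNIVARIATE reports the sign statistic and its two-sided p-value in the form below. In these formulas, $n^{+}$ and $n^{-}$ are the numbers of observations above and below the hypothesized median $\mu_0$:

$$M = \frac{n^{+} - n^{-}}{2}, \qquad \Pr(\geq |M|) = 0.5^{\,n_t - 1} \sum_{j=0}^{\min(n^{+},\, n^{-})} \binom{n_t}{j}, \qquad n_t = n^{+} + n^{-}$$

The p-value is an exact binomial probability with success probability 0.5. This is why the test does not need the data to follow any particular distribution.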
T TEST
The one-sample t-test is a statistical hypothesis test used to determine whether an unknown population mean is different from a specific value. Use the test for continuous data; the data should be a random sample from a normal population. The sections below discuss what we need for the test, checking our data, performing the test, understanding the test results, and statistical details. For the one-sample t-test, we need only one variable. Either a one-sided or a two-sided test can be performed; in PROC TTEST, this is chosen with the SIDES= option.
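The one-sample t statistic uses the sample mean $\bar{x}$, the standard deviation $s$, and the sample size $n$ of the single variable, together with the hypothesized mean $\mu_0$:

$$t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}, \qquad \text{degrees of freedom} = n - 1$$

The p-value is taken from the t distribution with n - 1 degrees of freedom. It is one-sided or two-sided, depending on the alternative hypothesis.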
i) Central Tendency
A measure of central tendency is a single value that attempts to describe a set of data by identifying the central position within that set. The mean (often called the average) is the most commonly used measure of central tendency.
It is important to find the mean of the Diagnosis of Breast Cancer data as a measure of central tendency. Hence, the average radius_mean is computed to describe the central tendency of these data.
Command:
proc means data=work.Cancer;
title 'The average of total radius_mean Diagnosis of Breast Cancer';
var radius_mean;
run;
Output:
Figure 1.3 The average of total radius_mean Diagnosis of Breast Cancer
For the t-test calculations we need the average, standard deviation and sample size. These are
shown in the summary statistics section of Figure 1.3 above.
ii) UNIVARIATE TEST
Univariate analysis is used in statistics to describe data that consist of observations on only a single characteristic or attribute. Its main purpose is to describe the data and to find patterns that exist within them.
Next, we use PROC UNIVARIATE to display the results for radius_mean for each type of diagnosis, Malignant and Benign.
a) Diagnosis Malignant
Command:
title 'The radius mean for diagnosis Malignant';

proc univariate data=work.cancer;
by diagnosis;
var radius_mean;
where diagnosis='M';
run;
Output:
Figure 1.4 output of diagnosis Malignant
The output above gives the test statistic and p-value for each test for location. All p-values are < 0.0001. This indicates that we should reject the null hypothesis of the tests for location, which by default is that the mean or median equals 0. There is sufficient evidence that the location of radius_mean for malignant diagnoses differs from 0.
b) Diagnosis Benign
Command:
title 'The radius mean for diagnosis Benign';
proc univariate data=work.cancer;
by diagnosis;
var radius_mean;
where diagnosis='B';
run;
Output:
Figure 1.5 Output for diagnosis for Benign
Figure 1.5 gives the test statistic and p-value for each test for location. All p-values are < 0.0001. This indicates that we should reject the null hypothesis of the tests for location, which by default is that the mean or median equals 0. There is sufficient evidence that the location of radius_mean for benign diagnoses differs from 0.
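The two runs above can also be done in a single step. CLASS does not require the data to be sorted, unlike BY, and it produces a separate set of tables for each diagnosis. The sketch assumes the same WORK.CANCER data set.

```sas
/* Sketch: one PROC UNIVARIATE step for both diagnoses.
   CLASS produces tables for diagnosis='B' and diagnosis='M' separately,
   without the WHERE statements or prior sorting needed above. */
title 'The radius mean for each diagnosis';
proc univariate data=work.cancer;
   class diagnosis;
   var radius_mean;
run;
```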
Signed Rank Test
The signed rank test is a statistical test used to analyze the direction and relative size of differences in scores between the same or matched pairs of subjects under two experimental conditions.
First, we analyze the change between radius_mean and texture_mean for predicting breast cancer. Here, radius_mean is the mean of distances from the center to points on the perimeter, and texture_mean is the standard deviation of gray-scale values.
Coding:
data work.subset1;
set work.cancer;
diagnosispredict= radius_mean - texture_mean;
drop id perimeter_mean area_mean smoothness_mean;
run;
title 'Analysis of Prediction Breast Cancer Changes';

ods select Frequencies;
proc univariate data=work.subset1 freq;
var diagnosispredict;
run;
The output, shown in Appendix 1.1, lists the negative and positive values of diagnosispredict, the change used for predicting breast cancer.
PROC UNIVARIATE provides three tests for location: Student's t test, the sign test, and the Wilcoxon signed rank test. All three tests produce a test statistic for the null hypothesis that the mean or median is equal to a given value μ0. The alternative is two-sided: the mean or median is not equal to μ0. By default, PROC UNIVARIATE sets μ0 to zero, and the MU0= option in the PROC UNIVARIATE statement specifies a different value. Student's t test is appropriate when the data are from an approximately normal population. Otherwise, use nonparametric tests such as the sign test or the signed rank test. For large samples, the t test is asymptotically equivalent to a z test.
In the code below, the ODS SELECT statement restricts the output to the "TestsForLocation" and "LocationCounts" tables. The MU0= option sets the null hypothesis value μ0 for the tests for location; here μ0 = 569. The LOCCOUNT option produces a table of the number of observations greater than, not equal to, and less than μ0.
Coding:
title 'Sign Rank Test';

title2 'Prediction Diagnosis for Breast Cancer';
ods select TestsForLocation LocationCounts;
proc univariate data=work.subset1 mu0=569 loccount;
var diagnosispredict;
run;
Output:
Figure 1.6
The output in Figure 1.6 contains the results of the tests for location. All three tests are highly significant, so we reject the null hypothesis that the location (mean or median) of diagnosispredict equals 569.
Next, we perform a sign test for the prediction diagnosis for breast cancer. The following statements request basic statistical measures and tests for location:
Coding:
title2 'Prediction Diagnosis for Breast Cancer';
ods select BasicMeasures TestsForLocation;
proc univariate data=work.subset1;
var diagnosispredict;
run;
Output:
The ODS SELECT statement restricts the output above to the "BasicMeasures" and "TestsForLocation" tables. We are not willing to assume that the diagnosispredict variable is normal or even symmetric, so we examine the sign test. The sign test p-value (< 0.0001) provides sufficient evidence that the median of diagnosispredict differs from 0.
Finding the p-value
In the Student's t row, the p-value column identifies the p-value as Pr > |t| < .0001. This is less than the significance level of 0.05, so we reject the null hypothesis. We conclude that the mean of diagnosispredict is significantly different from zero.
Using t-test
Coding:
ODS GRAPHICS ON;
proc ttest data=work.subset1 H0=569 PLOTS(SHOWH0) alpha=0.05;
title 'Analysis of Prediction Diagnosis Breast Cancer';
var diagnosispredict;
run;
Output:
In Figure 1.7, the x-axis plots quantiles from the normal distribution with mean 0 and standard deviation 1. The points seem to fall roughly about a straight line, so normality appears to be a fairly safe assumption for this plot, although the data are not exactly normally distributed.
Furthermore, we test radius_mean, the mean of distances from the center to points on the perimeter, against two further values. These are labelled Radius SE Breast Cancer (μ0 = 13) and Worst Breast Cancer (μ0 = 23).
a) Analysis of Radius SE Breast Cancer

The ODS SELECT statement restricts the output to the "TestsForLocation" and "LocationCounts" tables. The MU0= option sets the null hypothesis value μ0 for the tests for location (by default, μ0 = 0); here μ0 = 13. The LOCCOUNT option produces a table of the number of observations greater than, not equal to, and less than 13.
Coding:
title 'Analysis of Radius SE Breast Cancer';
ods select TestsForLocation LocationCounts;
proc univariate data=work.cancer mu0=13 loccount;
var radius_mean;
run;
Output:
The output in Figure 1.8 contains the results of the tests for location. Student's t test and the signed rank test are highly significant, but the sign test is not. Based on the t and signed rank tests, the researchers reject the null hypothesis that the location of radius_mean equals 13.
Next, we perform a sign test for radius_mean. The following statements request basic statistical measures and tests for location:
Coding:
title2 'Prediction Diagnosis for Radius SE Breast Cancer';
ods select BasicMeasures TestsForLocation;
proc univariate data=work.cancer;
var radius_mean;
run;
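Output:
The ODS SELECT statement restricts the output to the "BasicMeasures" and "TestsForLocation" tables. We are not willing to assume that radius_mean is normal or even symmetric, so we examine the sign test. The sign test p-value (< 0.0001) provides sufficient evidence that the median of radius_mean differs from the hypothesized value.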
b) Analysis of Worst Breast Cancer
The analysis for Worst Breast Cancer uses the same approach. The ODS SELECT statement restricts the output to the "TestsForLocation" and "LocationCounts" tables. The MU0= option sets the null hypothesis value μ0 for the tests for location (by default, μ0 = 0); here μ0 = 23. The LOCCOUNT option produces a table of the number of observations greater than, not equal to, and less than 23.
Coding:
title 'Analysis of Worst Breast Cancer';
ods select TestsForLocation LocationCounts;
proc univariate data=work.cancer mu0=23 loccount;
var radius_mean;
run;
Output:
The output contains the results of the tests for location. All three tests are highly significant, so the researchers reject the null hypothesis that the location of radius_mean equals 23.
Next, we perform a sign test for radius_mean for Worst Breast Cancer. The following statements request basic statistical measures and tests for location:
Coding:
title2 'Prediction Diagnosis for Worst Breast Cancer';
ods select BasicMeasures TestsForLocation;
proc univariate data=work.cancer;
var radius_mean;
run;
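Output:
The ODS SELECT statement restricts the output to the "BasicMeasures" and "TestsForLocation" tables. We are not willing to assume that radius_mean is normal or even symmetric, so we examine the sign test. The sign test p-value (< 0.0001) provides sufficient evidence that the median of radius_mean differs from the hypothesized value.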
One Sample T-Test
A one-sample test of means compares the mean of a sample to a pre-specified value and tests for a deviation from that value. We choose this test because the Diagnosis of Breast Cancer data are continuous and are treated as a random sample from a normal population. The assumptions for the one-sample t-test are that the observations are independent and identically normally distributed. To perform a one-sample t-test in SAS, either PROC UNIVARIATE (as described above) or the following PROC TTEST code can be used.
a) To test whether the mean of radius_mean is less than 569 in the Diagnosis of Breast Cancer data, computed from a digitized image of a fine needle aspirate (FNA):
/*Null:mean radius_mean =569*/
/*Alt: mean radius_mean <569*/
proc ttest data=work.Cancer H0=569 SIDES=L alpha=0.05;
title 'One Sample t-test with proc ttest';
title2 "Testing Radius Mean with Diagnosis of Breast Cancer less than 569";
var radius_mean;
run;
Figure 1.9: Output of one-sample t-test with PROC TTEST
From Figure 1.9, the 95% lower confidence limit shown in the output is 3.3305. We reject H0 because the p-value (0.0001) is less than 0.05. Hence, the mean of radius_mean is less than 569 in the Diagnosis of Breast Cancer data.
Figure 1.10: Histogram and Boxplot for Univariate Radius_Mean for Lower Limit.
Figure 1.10 shows that the histogram has a positive skewness coefficient, which indicates that radius_mean is right skewed. A positive kurtosis coefficient indicates that the distribution of radius_mean is more sharply peaked than a normal distribution. The box plot is also skewed to the right, meaning that most radius_mean values are relatively small, with a few large values.
Next, we display a Q-Q plot, a graphical tool that helps us assess whether a set of data plausibly came from some theoretical distribution.
proc ttest data=work.Cancer H0=569 PLOTS(SHOWH0) alpha=0.05;
title2 "Testing Radius Mean with Diagnosis of Breast Cancer less than 569 ";
var radius_mean;
run;
Figure 1.11: Q-Q plot for radius_mean
In Figure 1.11, the x-axis plots quantiles from the normal distribution with mean 0 and standard deviation 1. The plot indicates that the data are not normally distributed.
b) To test whether the mean of radius_mean is greater than 569 in the Diagnosis of Breast Cancer data, computed from a digitized image of a fine needle aspirate (FNA):
/*Null:mean radius_mean =569*/
/*Alt: mean radius_mean >569*/
proc ttest data=work.Cancer H0=569 SIDES=U alpha=0.05;
title2 "Testing Radius Mean with Diagnosis of Breast Cancer greater than 569";
var radius_mean;
run;
Figure 1.12: Output of one-sample t-test with PROC TTEST
From Figure 1.12, the 95% confidence limit shown in the output is 3.3305. We fail to reject H0 because the p-value (1.000) is greater than 0.05. Hence, there is no evidence that the mean of radius_mean is greater than 569, which is consistent with the lower-tailed test.
Figure 1.13: histogram and boxplot for univariate radius_mean for upper limit.
Figure 1.13 shows a histogram with a negative skewness coefficient, which indicates that the data are left skewed. A negative kurtosis coefficient indicates that the distribution of the data is flatter than a normal distribution. The box plot is also skewed to the left and shows several outliers.
Next, we display a Q-Q plot, a graphical tool that helps us assess whether a set of data plausibly came from some theoretical distribution.
ODS GRAPHICS ON;
proc ttest data=work.Cancer H0=569 PLOTS(SHOWH0) alpha=0.05;
title2 "Testing Radius Mean with Diagnosis of Breast Cancer greater than 569";
var radius_mean;
run;
Figure 1.14: Q-Q plot for radius_mean
In Figure 1.14, the x-axis plots quantiles from the normal distribution with mean 0 and standard deviation 1. The plot indicates that the data are not normally distributed.
iii) NPAR1WAY
Unfortunately, PROC NPAR1WAY cannot be used for the signed rank test or the one-sample t-test, because it requires a CLASS variable with at least two groups. It computes tests based on simple linear rank statistics when the data are classified into two samples, and one-way ANOVA statistics when they are classified into more than two samples.
In conclusion, the one-sample analyses show that radius_mean is not normally distributed and has major influential outliers. The one-sample t-test is designed to determine whether an unknown population mean differs from a specific value. Its results clearly show that the mean of radius_mean is less than 569 in the Diagnosis of Breast Cancer data, computed from a digitized image of a fine needle aspirate (FNA).
6.2 PERFORMING A TWO INDEPENDENT SAMPLE NONPARAMETRIC TEST
A Mann-Whitney U test is typically performed when an analyst would like to test for differences between two independent treatments or conditions, but the continuous response variable of interest is not normally distributed. The Mann-Whitney U test is often considered a nonparametric alternative to the independent samples t-test. It is also known as the Mann-Whitney-Wilcoxon test, the Wilcoxon-Mann-Whitney test, and the Wilcoxon rank sum test.
A Mann-Whitney U test is typically performed when each experimental unit (study subject) is assigned to only one of the two available treatment conditions. Thus, the treatment groups do not have overlapping membership and are considered independent. A Mann-Whitney U test is considered a “between-subjects” analysis.
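PROC NPAR1WAY reports the Wilcoxon rank sum form of this test rather than U itself, but the two differ only by a constant. Suppose the two samples are ranked together, $W_1$ is the sum of the ranks in the first sample of size $n_1$, and the second sample has size $n_2$. Then:

$$U_1 = W_1 - \frac{n_1 (n_1 + 1)}{2}, \qquad U_1 + U_2 = n_1 n_2$$

Tests based on the rank sum W and on U therefore give the same p-values.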
i) DESCRIPTIVE STATISTICS
Descriptive statistics are not only used to describe the data but also help determine if any
inconsistencies are present.
proc means data=work.cancer n nmiss mean std median min max qrange maxdec=2 nonobs;
var area_mean;
class diagnosis;
run;
From this output, we obtain the number of observations, missing values, mean, standard deviation, median, minimum, maximum, and interquartile range for each treatment (diagnosis).
ii) UNIVARIATE PROCEDURE
TEST FOR NORMALITY
Prior to performing the Mann-Whitney U test, it is important to evaluate our assumptions to ensure that we are performing an appropriate and reliable comparison. If normality is present, an independent samples t-test would be a more appropriate test. Normality should be tested with a Shapiro-Wilk normality test (or equivalent) and, for large sample sizes, with QQ plots. PROC UNIVARIATE is used to produce the Shapiro-Wilk normality test and the corresponding QQ plots.
proc univariate data=work.cancer normal cipctldf;
ods select TestsForNormality Quantiles QQPlot;
/* CLASS (unlike BY) does not require the data to be sorted by diagnosis */
class diagnosis;
var area_mean;
histogram area_mean / normal;
qqplot area_mean / normal(mu=est sigma=est);
run;
The Shapiro-Wilk normality test for breast cancer diagnosis B (Benign):

The Shapiro-Wilk normality test for breast cancer diagnosis M (Malignant):
The output above presents four different normality tests, each with its test statistic and p-value. A p-value < 0.05 indicates that we should reject the assumption of normality. The Shapiro-Wilk test p-value is 0.0228 for diagnosis B and 0.0001 for diagnosis M. Both p-values are < 0.05; therefore, the data are not normally distributed.
For QQ plot, the vast majority of points should follow the theoretical normal reference line.
QQ plot for diagnosis B:

QQ plot for diagnosis M:
Since the Shapiro-Wilk test p-values are < 0.05 and the QQ plots do not follow the theoretical normal reference line for both treatment groups, we conclude that the data are not normally distributed.
PROC UNIVARIATE can also create distribution-free 95% confidence intervals for many different percentiles. This can be helpful when describing data that do not follow a normal distribution. A subset of the tables for each diagnosis is presented below to provide confidence intervals for the median:
Median 95% confidence intervals for diagnosis B:
Median 95% confidence intervals for diagnosis M:

Level indicates the percentile for which the confidence interval is computed. Since 50% of the data fall above and 50% below the 50th percentile, the quantile value at this level is the median of each group.
BOXPLOTS
Side-by-side box plots are produced with the SGPLOT procedure. The box plots below seem to indicate four outliers in diagnosis B and three outliers in diagnosis M. We can also compare the range and distribution of area_mean for diagnosis B and diagnosis M. There is greater variability in area_mean for diagnosis M, as well as larger outliers. Also, since the notches in the box plots do not overlap, we can conclude with 95% confidence that the true medians differ.
proc sgplot data=work.cancer;
title 'Boxplot of area_mean and diagnosis of cancer';
vbox area_mean / category=diagnosis notches; /* NOTCHES draws the median notches */
run;
iii) NPAR1WAY PROCEDURE
So far, we have determined that the data for each treatment group is not normally
distributed, and we have major influential outliers. As a result, a Mann-Whitney U test
would be more appropriate than an independent samples t-test to test for significant
differences between treatment groups. Our next step is to officially perform a Mann-
Whitney U test. The NPAR1WAY procedure performs this test in SAS.
proc npar1way data=work.cancer wilcoxon;

class diagnosis;
var area_mean;
run;
For each treatment level, the output above shows the following: the number of observations, the sum of the assigned ranks, the expected sum of the ranks under H0, the standard deviation of the rank sum under H0, and the mean rank.
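The "Expected Under H0" and "Std Dev Under H0" columns follow from the usual formulas for the rank sum S of a group of size $n_1$ out of $n = n_1 + n_2$ observations. The formulas below omit the adjustment for ties:

$$E_0(S) = \frac{n_1 (n + 1)}{2}, \qquad \operatorname{Var}_0(S) = \frac{n_1 n_2 (n + 1)}{12}, \qquad Z = \frac{S - E_0(S)}{\sqrt{\operatorname{Var}_0(S)}}$$

By default, PROC NPAR1WAY applies a continuity correction of 0.5 when computing Z. The CORRECT=NO option removes it.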
MANN-WHITNEY U TEST RESULT
The Mann-Whitney U test gives a two-sided p-value < .0001. This indicates that we should reject the null hypothesis that the distributions are equal. We conclude that area_mean differs significantly between the benign and malignant diagnoses.
We have concluded that the area_mean in each treatment group is not normally distributed. In
addition, outliers exist in each group. As a result, a Mann-Whitney U test is more appropriate
than a traditional independent samples t-test.
6.3 PERFORMING A NONPARAMETRIC TEST FOR PAIRED DATA
For paired samples, the nonparametric test is the Wilcoxon signed rank test. This test assumes that each measurement pair is independent of the other measurement pairs. It can be used with ordinal and continuous variables.
For this test, we use a different data set that is more suitable.
We retrieved data from https://www.kaggle.com/tombenny/foodhabbits
i) UNIVARIATE PROCEDURE
To perform the paired-difference t-test or the Wilcoxon Signed Rank test on a difference
variable, use PROC UNIVARIATE. PROC UNIVARIATE creates the Tests for Location
table, which contains both the t-test and the Wilcoxon Signed Rank test.
Performing the Wilcoxon Signed Rank Test
General form:
PROC UNIVARIATE DATA=data-set-name;
VAR difference-variable;
RUN;
Code used:
ods select TestsForLocation;
proc univariate data=work.fooddiet;
var weight6weeks;
title 'Testing The Differences for Diet';
run;
Figure 4.1a Comparing Paired Groups with PROC UNIVARIATE
Figure 4.1a shows the output. The heading in the procedure identifies the variable
being tested. The tests for location table identifies the hypothesis being tested. PROC
UNIVARIATE automatically tests the hypothesis that the mean difference is 0, shown by the
heading Mu0=0. The test column identifies the statistical test.
Finding the p-value
Look for the Student's t row in the Tests for Location table. The Statistic column identifies the t statistic of 67.96704. The p Value column identifies the p-value as Pr > |t| < .0001. This is less than the significance level of 0.05, so we reject the null hypothesis. We conclude that the mean of weight6weeks is significantly different from zero.
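The run above analyzes weight6weeks itself. Its tests for location therefore compare the mean weight after six weeks with 0, rather than testing the change in weight. Following the general form given at the start of this subsection, the paired analysis can be sketched by first creating a difference variable. The data set name WORK.FOODDIFF and the variable name weightdiff are hypothetical. The variables pre_weight and weight6weeks are the ones used with PROC TTEST below. The output of this sketch is not part of the results reported in this section.

```sas
/* Sketch: paired analysis through an explicit difference variable.
   WORK.FOODDIFF and WEIGHTDIFF are hypothetical names. */
data work.fooddiff;
   set work.fooddiet;
   weightdiff = pre_weight - weight6weeks;   /* before minus after */
run;

title 'Testing the paired differences for diet';
ods select TestsForLocation;
proc univariate data=work.fooddiff;
   var weightdiff;   /* t test and signed rank test of H0: location = 0 */
run;
```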
Using PROC TTEST to Test Paired Differences
proc ttest data=work.fooddiet;

paired pre_weight*weight6weeks;
title 'Paired Differences with Diet Before and After';
run;
Figure 4.1b Comparing Paired Groups with PROC TTEST
Figure 4.1b shows how SAS creates the difference variable, gives descriptive statistics for it, and performs the paired-difference t-test. The subheading "Difference: pre_weight - weight6weeks" indicates how PROC TTEST calculates the paired difference: SAS subtracts the second variable in the PAIRED statement from the first.
Finding the p-value
In Figure 4.1b, the value of Pr > |t| is < 0.0001. This is less than the significance level of 0.05, so we reject the null hypothesis. This test has the same result as the analysis of the difference variable. We conclude that the mean difference between the two measurements is significantly different from 0. We interpret the p-value for this test in exactly the same way as described for PROC UNIVARIATE.
Normal Probability Plot

The normal probability plot can be used to investigate whether the difference variable is normally distributed. The output shows light tails, and the points depart from the reference line, which suggests that the distribution is not normal.
ii) NPAR1WAY PROCEDURE
Performing the Wilcoxon Rank Sum Test (pre_weight by gender)
proc npar1way data=work.fooddiet wilcoxon;

class gender;
var pre_weight;
run;
Figure 4.2a displays the output produced by the Wilcoxon analysis. The Wilcoxon statistic equals 1932.0000, and the output also shows its expected value under the null hypothesis. PROC NPAR1WAY displays the right-sided p-value, which it does when the statistic is greater than its expected value. The one-sided p-value is < 0.0001, which is significant at the 0.05 level, so we reject the null hypothesis.
Figure 4.2b displays the box plot of Wilcoxon scores. This graph corresponds to the
Wilcoxon scores analysis shown in figure 4.2a.
Box plots are available for all PROC NPAR1WAY score types except median scores, which are displayed in a stacked bar chart. If ODS Graphics is enabled but the PLOTS= option is not specified, PROC NPAR1WAY produces all plots that are associated with the requested analyses.
6.4 PERFORMING A KRUSKAL WALLIS ONE WAY ANOVA
The Kruskal – Wallis test is a nonparametric test, and is used when the assumptions of one
way ANOVA are not met. Both the Kruskal – Wallis test and one way ANOVA assess for
significant differences on continuous dependent variable by a categorical independent
variable (with two or more groups).
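For reference, suppose there are N observations in k groups, where group i has $n_i$ observations and rank sum $R_i$ in the pooled ranking. The Kruskal-Wallis statistic, without the adjustment for ties, is:

$$H = \frac{12}{N(N+1)} \sum_{i=1}^{k} \frac{R_i^2}{n_i} - 3(N+1)$$

Under the null hypothesis, H is approximately chi-square with k - 1 degrees of freedom. In SAS, PROC NPAR1WAY with the WILCOXON option reports this test when the CLASS variable has more than two levels.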
For this test, we use another data that suitable.
We retrieved data from https://www.kaggle.com/uciml/iris
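The import step itself is not shown in the original code. A minimal sketch, assuming the Kaggle file was saved as Iris.csv and that its header row contains PetalLengthCm and Species (the path below is a placeholder). PROC IMPORT takes variable names from the header row, and SAS variable names are not case-sensitive, so PetalLengthCm can be referred to as petallengthcm:

```sas
*Hypothetical path: replace with the location of the downloaded Iris.csv;
proc import datafile="/path/to/Iris.csv"
  out=work.iris
  dbms=csv
  replace;
  getnames=yes;   /* first row holds the column names */
run;
```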
After importing the data, we run the following code:
proc contents data=work.iris;
run;
proc print data=work.iris;
run;
*sort the data by species;
proc sort data=work.iris;
by species;
run;
*produce descriptive statistics (N, NMISS, MEAN, STD, STDERR, 95 percent CL, MEDIAN, MIN, MAX, QRANGE);
proc means data=work.iris n nmiss mean std stderr lclm uclm median min max
qrange maxdec=2;
class species;
var petallengthcm;
run;
From this output we can read the number of observations, the number of missing observations, the mean for each species, the standard deviation, the standard error, the lower and upper 95% confidence limits for the mean, the median, the minimum and maximum values for each species, and the interquartile range (the 75th percentile minus the 25th percentile).
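The standard error, confidence limits and interquartile range that PROC MEANS prints for each species follow the standard definitions below, where x̄, s and n are the species' sample mean, standard deviation and size, t_{0.975, n-1} is the t quantile for the default ALPHA=0.05, and Q_1, Q_3 are the 25th and 75th percentiles:

```latex
\[
\text{StdErr} = \frac{s}{\sqrt{n}}, \qquad
(\text{LCLM},\ \text{UCLM}) = \bar{x} \mp t_{0.975,\,n-1}\,\frac{s}{\sqrt{n}}, \qquad
\text{QRANGE} = Q_3 - Q_1
\]
```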
i) UNIVARIATE PROCEDURE
Test for Normality
It is important to evaluate model assumptions to ensure that we are performing an appropriate and reliable comparison. If the data are normally distributed, a one-way ANOVA would be a more powerful alternative.
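PROC NPAR1WAY can report that parametric alternative itself: its ANOVA option performs a standard one-way analysis of variance on the raw data. A minimal sketch that prints the ANOVA F test next to the rank-based analysis for the same variables, so the two conclusions can be compared directly:

```sas
*Parametric one-way ANOVA (ANOVA option) alongside the rank-based analysis (WILCOXON);
proc npar1way data=work.iris anova wilcoxon;
  class species;
  var petallengthcm;
run;
```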
Normality should be checked with a Shapiro-Wilk test (or an equivalent test) and, for large sample sizes, with Q-Q plots; histograms can also be helpful. PROC UNIVARIATE produces both the Shapiro-Wilk test and the corresponding Q-Q plots.
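*test for normality;
proc univariate data=work.iris normal cipctldf;   /* NORMAL: normality tests, CIPCTLDF: distribution-free percentile CIs */
by species;                                        /* one analysis per species (data sorted above) */
var petallengthcm;
histogram petallengthcm / normal;                  /* histogram with a fitted normal curve */
qqplot petallengthcm / normal(mu=est sigma=est);   /* reference line from the sample mean and SD */
run;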
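For reference, the Shapiro-Wilk statistic reported by the NORMAL option is defined below, where x_(1) ≤ x_(2) ≤ ... ≤ x_(n) are the ordered observations in one species and the constants a_i are derived from the means and covariances of normal order statistics. W lies between 0 and 1; values close to 1 are consistent with normality, and small values lead to rejection:

```latex
\[
W = \frac{\left( \sum_{i=1}^{n} a_i \, x_{(i)} \right)^{2}}{\sum_{i=1}^{n} \left( x_i - \bar{x} \right)^{2}}
\]
```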
Shapiro-Wilk normality test for petallengthcm, by species (Iris-setosa, Iris-versicolor, Iris-virginica)
Since each sample size is less than 2000, we use the Shapiro-Wilk test. The output shows that all three p-values are greater than 0.05, so we fail to reject the null hypothesis of normality. There is therefore no evidence against normality, and we treat the petal lengths in each species as approximately normally distributed.
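When several BY groups are involved, it can be convenient to collect the three Shapiro-Wilk results into one dataset instead of reading them from separate tables. A minimal sketch using ODS OUTPUT with the UNIVARIATE table name TestsForNormality; the output dataset name work.iris_normality is made up, and the column names Test, Stat and pValue are how this table is usually stored, so check them with PROC CONTENTS if your release differs:

```sas
*Save the normality tests for every species into one dataset;
ods output TestsForNormality=work.iris_normality;
proc univariate data=work.iris normal;
  by species;
  var petallengthcm;
run;

*Keep only the Shapiro-Wilk rows (one per species);
proc print data=work.iris_normality noobs;
  where Test = 'Shapiro-Wilk';
  var species Stat pValue;
run;
```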
Next, in a Q-Q plot of normally distributed data, the vast majority of points should lie on or near the theoretical normal reference line. In all three Q-Q plots below, the points lie close to the diagonal line, which is consistent with normality and agrees with the Shapiro-Wilk results.
Q-Q plots of petallengthcm: SPECIES = Iris-setosa, SPECIES = Iris-versicolor, SPECIES = Iris-virginica
PROC UNIVARIATE can also create distribution-free 95% confidence intervals for many different percentiles. These are helpful when describing data that do not follow a normal distribution.
Median and distribution-free 95% confidence interval for petallengthcm, by species (Quantiles, Definition 5)
Species          Level        Quantile   95% Confidence Limits (Distribution Free)
Iris-setosa      50% Median   1.5        1.4   1.5
Iris-versicolor  50% Median   4.35       3.7   4.1
Iris-virginica   50% Median   5.55       5.3   5.7
The Level column indicates the percentile for which the interval is computed, and the Quantile column gives the quantile corresponding to that percentile. Since 50% of the data fall below this point and 50% above it, the quantile in this table is also the median of each group.
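To show only this table, the rest of the UNIVARIATE output can be suppressed with ODS SELECT. A minimal sketch: the ALPHA= suboption of CIPCTLDF sets the confidence level (0.05 gives 95% intervals; 0.10 would give 90%), and ODS OUTPUT keeps a copy in a dataset whose name, work.iris_quantiles, is made up for this illustration. Because the limits are built from order statistics rather than from a normal model, they need not be symmetric about the median:

```sas
*Print only the Quantiles table, with distribution-free 95 percent limits;
ods select Quantiles;
ods output Quantiles=work.iris_quantiles;
proc univariate data=work.iris cipctldf(alpha=0.05);
  by species;
  var petallengthcm;
run;
```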
Boxplots
Side-by-side box plots are produced with the SGPLOT procedure.
*produce boxplots;
proc sgplot data=work.iris;
title 'Comparison of three species in the iris';
vbox petallengthcm /category=species;
run;
The box plots suggest that the petal length of setosa differs from the petal lengths of versicolor and virginica. This indicates a difference in petal length among the iris species, which still needs to be confirmed by a formal test.
ii) NPAR1WAY PROCEDURE
So far, we have found no evidence against normality. The next step is to formally perform a Kruskal-Wallis test to determine whether petal length differs among the iris species. The NPAR1WAY procedure performs this test.
*perform the Kruskal-Wallis test;
proc npar1way data=work.iris wilcoxon dscf;   /* with 3 groups, WILCOXON gives Kruskal-Wallis; DSCF adds pairwise comparisons */
class species;
var petallengthcm;
run;
The Kruskal-Wallis p-value is based on the chi-square distribution with k - 1 = 2 degrees of freedom. The p-value for our test is <0.0001. Since this is below our alpha of 0.05, we reject the null hypothesis and conclude that there is a statistically significant difference in petal length among the species.
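To see where the Kruskal-Wallis chi-square comes from, the statistic can be rebuilt by hand from the combined ranks. A minimal sketch using the formula for H without the tie correction; because the iris petal lengths contain many ties, the value it prints should be slightly smaller than the statistic PROC NPAR1WAY reports, which accounts for ties. The dataset and variable names (work.iris_ranked, work.ranksums, work.kw_manual, rank_pl, rank_sum) are made up for this illustration:

```sas
*Step 1: rank all petal lengths together; ties get the mean of the ranks they span;
proc rank data=work.iris out=work.iris_ranked ties=mean;
  var petallengthcm;
  ranks rank_pl;
run;

*Step 2: rank sum R_i and group size n_i for each species;
proc means data=work.iris_ranked noprint nway;
  class species;
  var rank_pl;
  output out=work.ranksums sum=rank_sum n=n_i;
run;

*Step 3: H = 12/(N(N+1)) * sum(R_i**2/n_i) - 3(N+1), chi-square with k - 1 df;
data work.kw_manual;
  set work.ranksums end=last;
  sum_term + rank_sum**2 / n_i;   /* sum statements are retained across rows */
  big_n + n_i;                    /* total sample size N */
  k + 1;                          /* number of groups */
  if last then do;
    h = 12 / (big_n * (big_n + 1)) * sum_term - 3 * (big_n + 1);
    df = k - 1;
    p_value = sdf('CHISQUARE', h, df);
    output;
  end;
  keep h df p_value;
run;

proc print data=work.kw_manual noobs;
run;
```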
We have also concluded that the petal lengths are approximately normally distributed, and outliers exist for setosa and versicolor.
The DSCF pairwise comparisons show that the difference between the median petal lengths of Iris-setosa and Iris-versicolor is 2.85 (p<.0001), of Iris-setosa and Iris-virginica is 4.05 (p<.0001), and of Iris-versicolor and Iris-virginica is 1.2 (p<.0001).
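The median differences quoted above follow directly from the sample medians in the distribution-free quantile table (1.5 for setosa, 4.35 for versicolor and 5.55 for virginica); the p-values come from the DSCF pairwise tests, not from this subtraction:

```latex
\[
\begin{aligned}
\tilde{x}_{\text{versicolor}} - \tilde{x}_{\text{setosa}} &= 4.35 - 1.50 = 2.85 \\
\tilde{x}_{\text{virginica}} - \tilde{x}_{\text{setosa}} &= 5.55 - 1.50 = 4.05 \\
\tilde{x}_{\text{virginica}} - \tilde{x}_{\text{versicolor}} &= 5.55 - 4.35 = 1.20
\end{aligned}
\]
```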
7.0 CONCLUSION
The Breast Cancer Wisconsin (Diagnostic) Data Set, a sample of 569 observations and 7 variables, was used for the one-sample and two-independent-sample nonparametric tests. The second dataset, weight loss using diet, consists of 78 observations and 7 variables and was used for the paired-data nonparametric tests. The third dataset, Iris Species, consists of 150 observations and 6 variables and was used for the Kruskal-Wallis one-way ANOVA. We used PROC UNIVARIATE and PROC NPAR1WAY in SAS; PROC UNIVARIATE is a Base SAS procedure that goes beyond the functionality of PROC MEANS. It is extremely useful for examining the distributions of analysis variables and for producing high-resolution graphics for a dataset. This tutorial has only scratched the surface of the power of PROC UNIVARIATE, and the authors hope that SAS users will use these simple examples as a guide to extend their knowledge of PROC UNIVARIATE and to experiment with other uses of this very versatile procedure. PROC NPAR1WAY allowed us to investigate the relationship between the continuous variable radius_mean and the diagnosis class. Using PROC NPAR1WAY, p-values can be computed for two-sample tests (the Wilcoxon-Mann-Whitney rank-sum test) and for multi-sample tests (Kruskal-Wallis), but not for one-sample or paired tests such as the sign test, the Wilcoxon signed rank test, or the one-sample t test; those are available in PROC UNIVARIATE.
8.0 REFERENCE
i) The NPAR1WAY Procedure. Retrieved from https://documentation.sas.com/?cdcld=pgmsascdc&ccVersion=9.4_3.4&docsetId=statug&docsetTarget=statug_npar1way_syntax01.htm&
ii) Understanding Q-Q Plots. Retrieved from
https://data.library.virginia.edu/understanding-q-q-plots/
iii) Chapter 7 ELSTAT : Comparing Paired Groups – SAS Institute. Retrieved from
https://www.sas.com/storefront/aux/en/spelstat/62097_excerpt.pdf
iv) Perusing, Choosing, and Not Mis-using: Non-parametric vs. Parametric Tests in
SAS ® Venita DePuy and Paul A. Pappas, Duke Clinical Research Institute,
Durham, NC. Retrieved from https://lexjansen.com/nesug/nesug04/an/an10.pdf
v) The NPAR1WAY Procedure - Academics | | WPI. Retrieved from
http://www.math.wpi.edu/saspdf/stat/chap47.pdf
vi) One Sample T Test | Introduction to Statistics | JMP. Retrieved from https://www.jmp.com/en_nl/statistics-knowledge-portal/t-test/one-sample-t-test.html/
vii) Univariate Analysis. Retrieved from https://www.slideshare.net/mobile/drswaroopsoumya/univariate-analys
9.0. APPENDIX
Appendix 1.1
Analysis of Prediction Breast Cancer Changes
The UNIVARIATE Procedure

Variable: diagnosispredict
Frequency Counts
Value Count Cell% Cum%
-22.590 1 0.2 0.2
-21.820 1 0.2 0.4
-18.457 1 0.2 0.5
-18.445 1 0.2 0.7
-18.170 1 0.2 0.9
-18.030 1 0.2 1.1
-17.761 1 0.2 1.2
-17.749 1 0.2 1.4
-17.700 1 0.2 1.6
-17.660 1 0.2 1.8
-17.340 1 0.2 1.9
-17.320 1 0.2 2.1
-16.780 1 0.2 2.3
-16.660 1 0.2 2.5
-16.230 1 0.2 2.6
-16.100 1 0.2 2.8
-16.040 1 0.2 3.0
-15.760 1 0.2 3.2
-15.400 1 0.2 3.3
-14.850 1 0.2 3.5
-14.750 1 0.2 3.7
-14.680 1 0.2 3.9
-14.570 1 0.2 4.0
-14.520 1 0.2 4.2
-13.630 1 0.2 4.4
-13.390 1 0.2 4.6
-13.100 1 0.2 4.7
-13.000 1 0.2 4.9
-12.940 1 0.2 5.1
-12.900 1 0.2 5.3
-12.890 1 0.2 5.4
-12.750 1 0.2 5.6
-12.607 1 0.2 5.8
-12.590 1 0.2 6.0
-12.580 1 0.2 6.2
-12.481 1 0.2 6.3
-12.420 1 0.2 6.5
-12.382 1 0.2 6.7
-12.295 1 0.2 6.9
-12.283 1 0.2 7.0
-12.130 1 0.2 7.2
-12.040 1 0.2 7.4
-11.580 1 0.2 7.6
-11.545 1 0.2 7.7
-11.480 1 0.2 7.9
-11.370 1 0.2 8.1
-11.350 1 0.2 8.3
-11.310 1 0.2 8.4
-11.250 1 0.2 8.6
-11.180 1 0.2 8.8
-11.160 1 0.2 9.0
-11.080 1 0.2 9.1
-10.680 1 0.2 9.3
-10.670 2 0.4 9.7
-10.660 1 0.2 9.8
-10.620 1 0.2 10.0
-10.520 1 0.2 10.2
-10.400 1 0.2 10.4
-10.390 1 0.2 10.5
-10.280 1 0.2 10.7
-10.250 1 0.2 10.9
-10.210 1 0.2 11.1
-10.190 1 0.2 11.2
-10.153 1 0.2 11.4
-10.070 1 0.2 11.6
-10.020 1 0.2 11.8
-10.003 1 0.2 12.0
-9.970 1 0.2 12.1
-9.940 1 0.2 12.3
-9.920 1 0.2 12.5
-9.880 1 0.2 12.7
-9.858 1 0.2 12.8
-9.840 1 0.2 13.0
-9.700 1 0.2 13.2
-9.680 1 0.2 13.4
-9.680 1 0.2 13.5
-9.670 1 0.2 13.7
-9.657 1 0.2 13.9
-9.650 1 0.2 14.1
-9.620 1 0.2 14.2
-9.610 1 0.2 14.4
-9.600 1 0.2 14.6
-9.524 1 0.2 14.8
-9.520 1 0.2 14.9
-9.500 1 0.2 15.1
-9.440 1 0.2 15.3
-9.430 2 0.4 15.6
-9.400 1 0.2 15.8
-9.380 1 0.2 16.0
-9.378 1 0.2 16.2
-9.280 1 0.2 16.3
-9.220 1 0.2 16.5
-9.210 1 0.2 16.7
-9.160 1 0.2 16.9
-8.960 1 0.2 17.0
-8.900 1 0.2 17.2
-8.884 1 0.2 17.4
-8.880 1 0.2 17.6
-8.823 1 0.2 17.8
-8.820 1 0.2 17.9
-8.820 1 0.2 18.1
-8.810 1 0.2 18.3
-8.800 1 0.2 18.5
-8.730 1 0.2 18.6
-8.700 1 0.2 18.8
-8.690 1 0.2 19.0
-8.644 1 0.2 19.2
-8.640 1 0.2 19.3
-8.640 2 0.4 19.7
-8.570 1 0.2 19.9
-8.520 1 0.2 20.0
-8.520 1 0.2 20.2
-8.500 1 0.2 20.4
-8.490 1 0.2 20.6
-8.432 1 0.2 20.7
-8.430 1 0.2 20.9
-8.350 1 0.2 21.1
-8.350 2 0.4 21.4
-8.340 1 0.2 21.6
-8.301 1 0.2 21.8
-8.280 1 0.2 22.0
-8.270 1 0.2 22.1
-8.250 1 0.2 22.3
-8.156 1 0.2 22.5
-8.150 1 0.2 22.7
-8.140 1 0.2 22.8
-8.140 1 0.2 23.0
-8.130 1 0.2 23.2
-8.120 1 0.2 23.4
-8.120 1 0.2 23.6
-8.106 1 0.2 23.7
-8.100 1 0.2 23.9
-8.090 1 0.2 24.1
-7.900 1 0.2 24.3
-7.820 1 0.2 24.4
-7.780 1 0.2 24.6
-7.750 1 0.2 24.8
-7.750 1 0.2 25.0
-7.730 1 0.2 25.1
-7.650 1 0.2 25.3
-7.630 1 0.2 25.5
-7.600 1 0.2 25.7
-7.540 1 0.2 25.8
-7.520 1 0.2 26.0
-7.480 1 0.2 26.2
-7.470 2 0.4 26.5
-7.440 1 0.2 26.7
-7.430 1 0.2 26.9
-7.410 1 0.2 27.1
-7.394 1 0.2 27.2
-7.390 1 0.2 27.4
-7.380 1 0.2 27.6
-7.350 1 0.2 27.8
-7.340 1 0.2 27.9
-7.330 1 0.2 28.1
-7.320 1 0.2 28.3
-7.300 1 0.2 28.5
-7.280 1 0.2 28.6
-7.270 1 0.2 28.8
-7.234 1 0.2 29.0
-7.230 1 0.2 29.2
-7.220 1 0.2 29.3
-7.220 1 0.2 29.5
-7.213 1 0.2 29.7
-7.210 1 0.2 29.9
-7.200 1 0.2 30.1
-7.190 1 0.2 30.2
-7.130 2 0.4 30.6
-7.120 2 0.4 30.9
-7.120 1 0.2 31.1
-7.104 1 0.2 31.3
-7.100 1 0.2 31.5
-7.000 1 0.2 31.6
-6.980 1 0.2 31.8
-6.960 1 0.2 32.0
-6.950 1 0.2 32.2
-6.950 1 0.2 32.3
-6.940 1 0.2 32.5
-6.920 1 0.2 32.7
-6.890 1 0.2 32.9
-6.890 1 0.2 33.0
-6.880 1 0.2 33.2
-6.860 1 0.2 33.4
-6.840 1 0.2 33.6
-6.830 1 0.2 33.7
-6.810 1 0.2 33.9
-6.760 1 0.2 34.1
-6.750 1 0.2 34.3
-6.720 1 0.2 34.4
-6.700 1 0.2 34.6
-6.690 1 0.2 34.8
-6.670 1 0.2 35.0
-6.660 1 0.2 35.1
-6.630 1 0.2 35.3
-6.612 1 0.2 35.5
-6.560 1 0.2 35.7
-6.550 1 0.2 35.9
-6.510 1 0.2 36.0
-6.480 1 0.2 36.2
-6.460 1 0.2 36.4
-6.449 1 0.2 36.6
-6.380 1 0.2 36.7
-6.370 1 0.2 36.9
-6.343 1 0.2 37.1
-6.330 1 0.2 37.3
-6.320 1 0.2 37.4
-6.300 1 0.2 37.6
-6.260 1 0.2 37.8
-6.230 1 0.2 38.0
-6.210 1 0.2 38.1
-6.200 1 0.2 38.3
-6.180 1 0.2 38.5
-6.130 1 0.2 38.7
-6.080 1 0.2 38.8
-6.060 1 0.2 39.0
-6.030 1 0.2 39.2
-5.930 1 0.2 39.4
-5.928 1 0.2 39.5
-5.900 1 0.2 39.7
-5.880 1 0.2 39.9
-5.870 1 0.2 40.1
-5.833 1 0.2 40.2
-5.820 1 0.2 40.4
-5.810 2 0.4 40.8
-5.790 1 0.2 40.9
-5.779 1 0.2 41.1
-5.770 1 0.2 41.3
-5.752 1 0.2 41.5
-5.750 1 0.2 41.7
-5.740 1 0.2 41.8
-5.730 1 0.2 42.0
-5.710 1 0.2 42.2
-5.690 1 0.2 42.4
-5.660 1 0.2 42.5
-5.650 1 0.2 42.7
-5.640 1 0.2 42.9
-5.630 1 0.2 43.1
-5.620 1 0.2 43.2
-5.620 1 0.2 43.4
-5.610 1 0.2 43.6
-5.609 1 0.2 43.8
-5.580 1 0.2 43.9
-5.580 1 0.2 44.1
-5.570 1 0.2 44.3
-5.540 1 0.2 44.5
-5.530 1 0.2 44.6
-5.520 1 0.2 44.8
-5.490 2 0.4 45.2
-5.480 2 0.4 45.5
-5.470 2 0.4 45.9
-5.460 1 0.2 46.0
-5.450 1 0.2 46.2
-5.410 2 0.4 46.6
-5.400 1 0.2 46.7
-5.390 1 0.2 46.9
-5.390 1 0.2 47.1
-5.380 1 0.2 47.3
-5.380 1 0.2 47.5
-5.350 1 0.2 47.6
-5.340 1 0.2 47.8
-5.340 2 0.4 48.2
-5.330 1 0.2 48.3
-5.330 1 0.2 48.5
-5.320 1 0.2 48.7
-5.290 1 0.2 48.9
-5.280 1 0.2 49.0
-5.140 1 0.2 49.2
-5.090 1 0.2 49.4
-5.050 1 0.2 49.6
-5.030 1 0.2 49.7
-5.020 1 0.2 49.9
-4.970 1 0.2 50.1
-4.960 1 0.2 50.3
-4.950 1 0.2 50.4
-4.880 1 0.2 50.6
-4.850 2 0.4 51.0
-4.840 1 0.2 51.1
-4.840 1 0.2 51.3
-4.810 1 0.2 51.5
-4.790 1 0.2 51.7
-4.780 1 0.2 51.8
-4.770 1 0.2 52.0
-4.740 1 0.2 52.2
-4.730 1 0.2 52.4
-4.710 1 0.2 52.5
-4.687 1 0.2 52.7
-4.660 1 0.2 52.9
-4.630 1 0.2 53.1
-4.610 1 0.2 53.3
-4.605 1 0.2 53.4
-4.570 1 0.2 53.6
-4.560 1 0.2 53.8
-4.550 1 0.2 54.0
-4.530 1 0.2 54.1
-4.529 1 0.2 54.3
-4.500 2 0.4 54.7
-4.490 1 0.2 54.8
-4.490 1 0.2 55.0
-4.480 1 0.2 55.2
-4.450 1 0.2 55.4
-4.440 1 0.2 55.5
-4.410 1 0.2 55.7
-4.360 1 0.2 55.9
-4.250 1 0.2 56.1
-4.230 2 0.4 56.4
-4.220 1 0.2 56.6
-4.210 1 0.2 56.8
-4.170 1 0.2 56.9
-4.170 1 0.2 57.1
-4.140 1 0.2 57.3
-4.100 1 0.2 57.5
-4.020 2 0.4 57.8
-3.960 2 0.4 58.2
-3.950 1 0.2 58.3
-3.940 2 0.4 58.7
-3.910 1 0.2 58.9
-3.900 1 0.2 59.1
-3.890 1 0.2 59.2
-3.820 1 0.2 59.4
-3.790 1 0.2 59.6
-3.780 1 0.2 59.8
-3.770 3 0.5 60.3
-3.760 1 0.2 60.5
-3.760 1 0.2 60.6
-3.750 1 0.2 60.8
-3.740 1 0.2 61.0
-3.710 1 0.2 61.2
-3.660 1 0.2 61.3
-3.650 1 0.2 61.5
-3.650 1 0.2 61.7
-3.630 1 0.2 61.9
-3.602 1 0.2 62.0
-3.600 1 0.2 62.2
-3.580 1 0.2 62.4
-3.550 1 0.2 62.6
-3.530 1 0.2 62.7
-3.510 1 0.2 62.9
-3.510 1 0.2 63.1
-3.470 1 0.2 63.3
-3.464 1 0.2 63.4
-3.460 1 0.2 63.6
-3.420 1 0.2 63.8
-3.410 1 0.2 64.0
-3.380 1 0.2 64.1
-3.360 1 0.2 64.3
-3.340 1 0.2 64.5
-3.310 1 0.2 64.7
-3.260 1 0.2 64.9
-3.250 2 0.4 65.2
-3.230 1 0.2 65.4
-3.172 1 0.2 65.6
-3.150 1 0.2 65.7
-3.120 1 0.2 65.9
-3.110 1 0.2 66.1
-3.100 1 0.2 66.3
-3.090 1 0.2 66.4
-3.070 2 0.4 66.8
-3.060 1 0.2 67.0
-3.020 1 0.2 67.1
-3.020 1 0.2 67.3
-3.010 2 0.4 67.7
-2.950 1 0.2 67.8
-2.936 1 0.2 68.0
-2.930 1 0.2 68.2
-2.900 2 0.4 68.5
-2.860 1 0.2 68.7
-2.830 1 0.2 68.9
-2.810 1 0.2 69.1
-2.770 1 0.2 69.2
-2.750 1 0.2 69.4
-2.720 1 0.2 69.6
-2.700 1 0.2 69.8
-2.670 1 0.2 69.9
-2.660 1 0.2 70.1
-2.650 1 0.2 70.3
-2.630 1 0.2 70.5
-2.620 3 0.5 71.0
-2.620 1 0.2 71.2
-2.620 1 0.2 71.4
-2.610 1 0.2 71.5
-2.580 1 0.2 71.7
-2.550 1 0.2 71.9
-2.530 2 0.4 72.2
-2.520 1 0.2 72.4
-2.450 1 0.2 72.6
-2.380 1 0.2 72.8
-2.370 1 0.2 72.9
-2.340 1 0.2 73.1
-2.340 1 0.2 73.3
-2.280 2 0.4 73.6
-2.270 1 0.2 73.8
-2.232 1 0.2 74.0
-2.210 1 0.2 74.2
-2.190 1 0.2 74.3
-2.120 1 0.2 74.5
-2.110 2 0.4 74.9
-2.080 1 0.2 75.0
-2.060 1 0.2 75.2
-2.060 1 0.2 75.4
-2.030 1 0.2 75.6
-2.020 1 0.2 75.7
-2.010 1 0.2 75.9
-2.000 1 0.2 76.1
-1.990 1 0.2 76.3
-1.980 1 0.2 76.4
-1.960 1 0.2 76.6
-1.960 1 0.2 76.8
-1.950 1 0.2 77.0
-1.940 2 0.4 77.3
-1.930 1 0.2 77.5
-1.900 1 0.2 77.7
-1.880 2 0.4 78.0
-1.800 1 0.2 78.2
-1.790 1 0.2 78.4
-1.780 1 0.2 78.6
-1.760 1 0.2 78.7
-1.750 2 0.4 79.1
-1.730 1 0.2 79.3
-1.700 1 0.2 79.4
-1.690 1 0.2 79.6
-1.640 2 0.4 80.0
-1.560 1 0.2 80.1
-1.560 1 0.2 80.3
-1.560 1 0.2 80.5
-1.490 2 0.4 80.8
-1.440 1 0.2 81.0
-1.370 1 0.2 81.2
-1.340 1 0.2 81.4
-1.330 1 0.2 81.5
-1.290 1 0.2 81.7
-1.240 2 0.4 82.1
-1.220 1 0.2 82.2
-1.170 2 0.4 82.6
-1.100 1 0.2 82.8
-1.080 1 0.2 83.0
-1.060 1 0.2 83.1
-0.980 1 0.2 83.3
-0.830 1 0.2 83.5
-0.820 1 0.2 83.7
-0.790 1 0.2 83.8
-0.760 1 0.2 84.0
-0.740 1 0.2 84.2
-0.680 1 0.2 84.4
-0.670 1 0.2 84.5
-0.650 1 0.2 84.7
-0.650 1 0.2 84.9
-0.610 1 0.2 85.1
-0.600 1 0.2 85.2
-0.590 1 0.2 85.4
-0.510 1 0.2 85.6
-0.480 1 0.2 85.8
-0.460 1 0.2 85.9
-0.440 1 0.2 86.1
-0.370 1 0.2 86.3
-0.370 1 0.2 86.5
-0.340 1 0.2 86.6
-0.310 1 0.2 86.8
-0.270 1 0.2 87.0
-0.250 1 0.2 87.2
-0.230 1 0.2 87.3
-0.200 1 0.2 87.5
-0.090 1 0.2 87.7
-0.070 1 0.2 87.9
-0.060 1 0.2 88.0
-0.010 1 0.2 88.2
0.020 1 0.2 88.4
0.070 1 0.2 88.6
0.090 1 0.2 88.8
0.100 1 0.2 88.9
0.110 1 0.2 89.1
0.110 1 0.2 89.3
0.120 1 0.2 89.5
0.220 1 0.2 89.6
0.310 1 0.2 89.8
0.490 1 0.2 90.0
0.500 2 0.4 90.3
0.520 1 0.2 90.5
0.530 1 0.2 90.7
0.550 1 0.2 90.9
0.580 1 0.2 91.0
0.590 1 0.2 91.2
0.600 1 0.2 91.4
0.620 1 0.2 91.6
0.630 1 0.2 91.7
0.640 2 0.4 92.1
0.710 1 0.2 92.3
0.740 1 0.2 92.4
0.760 1 0.2 92.6
0.770 1 0.2 92.8
0.790 1 0.2 93.0
0.820 1 0.2 93.1
0.870 1 0.2 93.3
0.920 1 0.2 93.5
0.970 1 0.2 93.7
1.020 1 0.2 93.8
1.050 1 0.2 94.0
1.080 1 0.2 94.2
1.090 1 0.2 94.4
1.150 1 0.2 94.6
1.180 1 0.2 94.7
1.220 1 0.2 94.9
1.230 1 0.2 95.1
1.230 1 0.2 95.3
1.280 1 0.2 95.4
1.440 1 0.2 95.6
1.540 1 0.2 95.8
1.780 1 0.2 96.0
1.900 1 0.2 96.1
1.980 1 0.2 96.3
2.360 1 0.2 96.5
2.520 1 0.2 96.7
2.590 1 0.2 96.8
2.600 1 0.2 97.0
2.800 1 0.2 97.2
3.030 1 0.2 97.4
3.250 1 0.2 97.5
3.260 1 0.2 97.7
3.290 1 0.2 97.9
3.570 1 0.2 98.1
3.610 1 0.2 98.2
4.050 1 0.2 98.4
4.450 1 0.2 98.6
4.460 1 0.2 98.8
4.620 1 0.2 98.9
5.350 1 0.2 99.1
5.950 1 0.2 99.3
6.270 1 0.2 99.5
7.610 1 0.2 99.6
8.270 1 0.2 99.8
9.640 1 0.2 100.0
