SAS FOR STATISTICAL ANALYSIS

OVERVIEW

SAS/STAT Software

Component of the SAS System

Provides comprehensive tools for a wide range of statistical analyses, including analysis of variance, regression, categorical data analysis, multivariate analysis, and survival analysis.

In addition to 54 procedures for statistical analysis, SAS/STAT software also includes the Market Research Application (MRA), a point-and-click interface to commonly used techniques in market research.

BASIC STATISTICS

Mean

Median

Mode

Dispersion

Standard Deviation

Range

Percentiles

Quartiles

MEAN

An arithmetic average

Procedure for computing

Add up the numbers

Divide by the number of observations

Example: 80 85 90 90 90 100

n = 6

Mean = (80 + 85 + 90 + 90 + 90 + 100) / 6

= 535 / 6

≈ 89.17

MEDIAN

Middle value of the sorted data

Procedure for computing

Sort the data

For an even number of observations

Median = average of the (n/2)th and ((n/2)+1)th observations

For an odd number of observations

Median = ((n+1)/2)th observation

Example: 80 85 90 90 90 100

n = 6

Median = (3rd obs + 4th obs)/2

= (90 + 90)/2

= 90

MODE

Most frequently occurring observation

The value that is repeated most often in the data set.

Example:

data = 80 85 90 90 90 100

since 90 occurs three times

Mode = 90
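The three measures above can be checked in SAS itself. The following is a minimal sketch, assuming the six example values are stored in a hypothetical data set named scores with a single variable named score; N, MEAN, MEDIAN and MODE are statistic keywords of PROC MEANS.

```sas
/* Hypothetical data set and variable names; the values are the example data */
data scores;
    input score @@;
    datalines;
80 85 90 90 90 100
;
run;

/* Request the number of observations, mean, median and mode */
proc means data=scores n mean median mode maxdec=2;
    var score;
run;
```

For these values the output should show a mean of about 89.17 and a median and mode of 90, matching the hand calculations.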

DISPERSION

Standard Deviation

Square root of the average of the squared distances of the observations from the mean.

Range

Difference between highest and lowest observed value.

Percentiles

Divide the data into 100 equal parts.

Quartiles

Divide the data into 4 equal parts.
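The dispersion measures can be requested the same way. This is a sketch under the same assumptions as before (hypothetical data set scores, variable score). Note that PROC MEANS divides by n-1 when computing the standard deviation unless VARDEF=N is specified, so the second step is the one that matches the "average of the squared distances" definition literally.

```sas
/* Hypothetical data set and variable names; the values are the example data */
data scores;
    input score @@;
    datalines;
80 85 90 90 90 100
;
run;

/* Standard deviation (n-1 divisor), range, quartiles and two percentiles */
proc means data=scores std range q1 median q3 p10 p90 maxdec=2;
    var score;
run;

/* Same standard deviation with divisor n instead of n-1 */
proc means data=scores vardef=n std maxdec=2;
    var score;
run;
```

The range reported for these values should be 20 (100 minus 80).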

PROBABILITY

Event:

One or more of the possible outcomes of doing something.

Example:

The event that we will get over an inch of rain tomorrow. Its probability reflects how likely we are to get this much rain.

PROBABILITY

A probability is a number from 0 to 1.

If an event has probability 0, it never occurs.

If an event has probability 1, it always occurs.

If an event has probability 0.5, it is just as likely to occur as not to occur.

HYPOTHESIS TESTING

A procedure for making rational decisions about the reality of effects.

Setting up and testing hypotheses is an essential part of statistical

inference.

Example :

Claiming that a new drug is better than the current drug for treating the same symptoms.

In each problem considered, the question of interest is simplified into two competing claims (hypotheses):

Null hypothesis (H0)

Alternative hypothesis (H1)

NULL HYPOTHESIS

The hypothesis that there is no effect is called the null hypothesis (H0).

Note: unlike in geometry, we cannot prove that the effects are real; we can only decide, on the evidence, that they are real.

Example:

In a clinical trial of a new drug, the null hypothesis might be that the new drug is no better, on average, than the current drug.

H0: there is no difference between the two drugs on average.

ALTERNATIVE HYPOTHESIS

The hypothesis that there is an effect; it is the claim we accept when the null hypothesis is rejected.

Example :

In a clinical trial of a new drug, the alternative hypothesis might be that

the new drug has a different effect, on average, compared to that of the

current drug.

H1: the two drugs have different effects, on average.

OR

H1: the new drug is better than the current drug, on average.


P-VALUE

The probability, computed assuming the null hypothesis is true, of obtaining a result at least as extreme as the one observed.

The p-value is compared with the significance level; if it is smaller, the result is significant,

e.g. if p-value < 0.05 at the 5% level.

Reporting the p-value indicates the strength of the evidence against the null hypothesis H0, rather than simply concluding 'reject H0' or 'do not reject H0'.

SIGNIFICANCE LEVEL

"Does a 5 percent significance level mean there is only a 5% chance that

my results are significant?"

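No. The significance level is alpha: the probability of rejecting the null hypothesis when it is in fact true, that is, of declaring an effect that appeared only because of random variation (luck).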

T TEST

T TEST is performed on three types of samples.

One sample

Two samples

Paired observations


T TEST

One-sample t-test

The one-sample t test compares the mean of the sample to a given number.

Two-sample t-test

The two-sample t test compares the mean of the first sample minus the mean of the second sample to a given number.

Paired t-test

The paired observations t test compares the mean of the

differences in the observations to a given number.


REGRESSION ANALYSIS

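Regression analysis is the analysis of the relationship between one variable and another set of variables.

For a simple linear regression the model is

$y_i = \beta_0 + \beta_1 x_i + \epsilon_i$

where

$y_i$ is the response variable

$x_i$ is a regressor variable

$\beta_0$ and $\beta_1$ are unknown parameters to be estimated

$\epsilon_i$ is an error term.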
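A simple linear model of the kind described above, with one response and one regressor, could be fitted with PROC REG. This is only a sketch: the data set fit, the variables x and y, and the data values are invented for illustration.

```sas
/* Hypothetical data: x is the regressor, y is the response */
data fit;
    input x y @@;
    datalines;
1 2.1  2 3.9  3 6.2  4 7.8  5 10.1
;
run;

/* Estimate the intercept and the slope by least squares */
proc reg data=fit;
    model y = x;
run;
quit;
```

The parameter estimates table lists the estimated intercept and the estimated coefficient of x, each with a t test of the hypothesis that the parameter is zero.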


ANALYSIS OF VARIANCE

Analysis of variance (ANOVA) is a technique for analyzing experimental

data in which one or more response (or dependent or simply Y)

variables are measured under various conditions identified by one or

more classification variables.

Example :

An experiment may measure weight change (the dependent

variable) for men and women who participated in three different

weight-loss programs. The six cells of the design are formed by

the six combinations of sex (men, women) and program (A, B, C).


SAS/STAT

There are 54 procedures for statistical analysis.

Analysis of variance

Generalized linear models

Categorical data analysis

Mixed models

Survival analysis

Multivariate techniques

Nonparametric analysis

Psychometric analysis


PROC TTEST

PROC TTEST < options > ;

CLASS variable ;

PAIRED variables ;

BY variables ;

VAR variables ;

FREQ variable ;

WEIGHT variable ;

No statement can be used more than once. There is no restriction on the order of

the statements after the PROC statement.
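The three kinds of t test map onto these statements roughly as follows. The sketch assumes a hypothetical data set scores with a two-level grouping variable group and two measurements, before and after, on each subject; the null value 85 is likewise an arbitrary choice.

```sas
/* Hypothetical data: two groups, two measurements per subject */
data scores;
    input group $ before after @@;
    datalines;
A 80 84  A 85 86  A 90 93  B 90 88  B 90 95  B 100 101
;
run;

/* One-sample: compare the mean of BEFORE to the given number 85 */
proc ttest data=scores h0=85;
    var before;
run;

/* Two-sample: compare the mean of BEFORE between the two groups */
proc ttest data=scores;
    class group;
    var before;
run;

/* Paired: test whether the mean of (after - before) is zero */
proc ttest data=scores;
    paired after*before;
run;
```

The CLASS variable must have exactly two levels, and the H0= option defaults to 0 when it is omitted.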


COMPARISON BETWEEN

PROC GLM AND PROC ANOVA

PROC GLM can analyze both balanced and unbalanced data.

PROC ANOVA is designed to handle balanced data (that is, data with equal numbers of observations for every combination of the classification factors).

Because PROC ANOVA takes into account the special structure of a balanced design, it is faster and uses less storage than PROC GLM for balanced data.

COMPARISON BETWEEN

PROC GLM AND PROC MIXED

RANDOM statement: PROC GLM treats the listed effects as fixed when fitting the model and computes expected mean squares; PROC MIXED computes REML and ML estimates of variance parameters.

REPEATED statement: in PROC MIXED it is used to specify covariance structures for repeated measurements on subjects; in PROC GLM it is used to specify various transformations with which to conduct the traditional univariate or multivariate tests.

PROC MIXED is more flexible and more widely applicable than either the univariate or multivariate approaches.

PROC ANOVA

PROC ANOVA < options > ;

CLASS variables ;

MODEL dependents=effects < / options > ;

ABSORB variables ;

BY variables ;

FREQ variable ;

MANOVA < test-options >< / detail-options > ;

MEANS effects < / options > ;

REPEATED factor-specification < / options > ;

TEST < H=effects > E=effect ;
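As an illustration of the CLASS, MODEL and MEANS statements, here is a sketch of a one-way analysis on balanced data. The data set weightloss, its variables and its values are made up; each program has the same number of subjects, which is the situation PROC ANOVA is meant for.

```sas
/* Hypothetical balanced data: three programs, three subjects each */
data weightloss;
    input program $ change @@;
    datalines;
A 2.1  A 3.4  A 2.8  B 4.0  B 5.2  B 4.6  C 1.0  C 1.9  C 1.4
;
run;

proc anova data=weightloss;
    class program;             /* classification variable */
    model change = program;    /* dependent = effect */
    means program / tukey;     /* pairwise comparisons of program means */
run;
quit;
```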


PROC MIXED

The primary assumptions underlying the analyses performed by PROC

MIXED are as follows:

The data are normally distributed.

The means (expected values) of the data are linear in terms of a certain set of parameters.

The variances and covariances of the data are in terms of a different set of parameters, and they exhibit a structure matching one of those available in PROC MIXED.

PROC MIXED

The following statements are available in PROC MIXED.

PROC MIXED < options > ;

BY variables ;

CLASS variables ;

ID variables ;

MODEL dependent = < fixed-effects > < / options > ;

RANDOM random-effects < / options > ;

REPEATED < repeated-effect > < / options > ;

PARMS (value-list) ... < / options > ;

PRIOR < distribution > < / options > ;

CONTRAST 'label' < fixed-effect values ... > < | random-effect values ... > , ... < / options > ;

ESTIMATE 'label' < fixed-effect values ... > < | random-effect values ... > < / options > ;

LSMEANS fixed-effects < / options > ;

MAKE 'table' OUT=SAS-data-set ;

WEIGHT variable ;
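A small sketch of how the CLASS, MODEL, RANDOM and LSMEANS statements fit together, assuming a hypothetical data set trial in which every subject is measured under two treatments. Subject is treated as a random effect and treatment as a fixed effect; the names and values are illustrative only.

```sas
/* Hypothetical data: each subject measured under treatments A and B */
data trial;
    input subject trt $ y @@;
    datalines;
1 A 12  1 B 15  2 A 10  2 B 14  3 A 13  3 B 17  4 A 11  4 B 13
;
run;

proc mixed data=trial method=reml;
    class subject trt;
    model y = trt / solution;   /* fixed effect */
    random subject;             /* random effect */
    lsmeans trt / diff;         /* treatment means and their difference */
run;
```

METHOD=REML is the default and is written out here only to make the estimation method explicit.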

PROC GLM

The GLM procedure can be used for many different analyses, including

Simple regression

Multiple regression

Analysis of variance (ANOVA), especially for unbalanced data

Analysis of covariance

Response-surface models

Weighted regression

Polynomial regression

Partial correlation

Multivariate analysis of variance (MANOVA)

Repeated measures analysis of variance


PROC GLM

The following statements are available in PROC GLM.

PROC GLM < options > ;

CLASS variables ;

MODEL dependents=independents < / options > ;

ABSORB variables ;

BY variables ;

FREQ variable ;

ID variables ;

WEIGHT variable ;

PROC GLM (cont.)

CONTRAST 'label' effect values < ... effect values > < / options > ;

ESTIMATE 'label' effect values < ... effect values > < / options > ;

LSMEANS effects < / options > ;

MANOVA < test-options >< / detail-options > ;

MEANS effects < / options > ;

OUTPUT < OUT=SAS-data-set > keyword=names < ... keyword=names > < / option > ;

RANDOM effects < / options > ;

REPEATED factor-specification < / options > ;

TEST < H=effects > E=effect < / options > ;


EXAMPLE ON GLM

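/* Unbalanced two-way layout: the cell A2, B2 has only one observation */
data exp;
    input A $ B $ Y @@;
    datalines;
A1 B1 12 A1 B1 14
A1 B2 11 A1 B2 9
A2 B1 20 A2 B1 18
A2 B2 17
;
run;

/* Fit the main effects A and B and their interaction */
proc glm data=exp;
    class A B;
    model Y=A B A*B;
run;
quit;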


PROC FREQ

The FREQ procedure produces one-way to n-way frequency and crosstabulation (contingency) tables.

The statistics for contingency tables include

Chi-square tests and measures

Measures of association

Risks (binomial proportions) and risk differences for 2×2 tables

Odds ratios and relative risks for 2×2 tables

Tests for trend

Tests and measures of agreement


PROC FREQ

PROC FREQ < options > ;

BY variables ;

EXACT statistic-options < / computation-options > ;

OUTPUT < OUT=SAS-data-set > options ;

TABLES requests < / options > ;

TEST options ;

WEIGHT variable ;
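For a 2×2 table entered as cell counts, the TABLES and WEIGHT statements might be combined as in the sketch below. The data set trial, its variables and its counts are hypothetical.

```sas
/* Hypothetical 2x2 table entered as one row per cell with a count */
data trial;
    input treatment $ outcome $ count @@;
    datalines;
drug improved 30  drug same 10  placebo improved 18  placebo same 22
;
run;

proc freq data=trial;
    weight count;   /* each row stands for COUNT observations */
    tables treatment*outcome / chisq relrisk riskdiff;
run;
```

CHISQ requests the chi-square tests, RELRISK the odds ratio and relative risks, and RISKDIFF the risks and risk difference.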


PROC TABULATE

Simple but powerful methods to create tabular reports .

Flexibility in classifying the values of variables and establishing

hierarchical relationships between the variables.

Mechanisms for labeling and formatting variables and procedure-generated statistics.

PROC TABULATE

PROC TABULATE <option(s)>;

BY <DESCENDING> variable-1 <...<DESCENDING> variable-n> <NOTSORTED>;

CLASS variable(s) </ options>;

CLASSLEV variable(s) / style=<style-element-name | <PARENT>> <[style-attribute-specification(s)]>;

FREQ variable;

KEYLABEL keyword-1='description-1' <...keyword-n='description-n'>;

KEYWORD keyword(s) / style=<style-element-name | <PARENT>> <[style-attribute-specification(s)]>;

TABLE <<page-expression,> row-expression,> column-expression </ table-option(s)>;

VAR analysis-variable(s) </ options>;

WEIGHT variable;
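A sketch of a simple tabular report built from CLASS, VAR, TABLE and KEYLABEL. It assumes a hypothetical data set weightloss in the spirit of the earlier sex-by-program example; all names and values are invented.

```sas
/* Hypothetical data: weight change by sex and program */
data weightloss;
    input sex $ program $ change @@;
    datalines;
M A 2.1  M B 4.0  M C 1.0  F A 3.4  F B 5.2  F C 1.9
M A 2.8  M B 4.6  M C 1.4  F A 3.0  F B 4.9  F C 2.2
;
run;

proc tabulate data=weightloss;
    class sex program;                         /* classification variables */
    var change;                                /* analysis variable */
    table sex, program*change*(n mean*f=6.2);  /* rows, then columns */
    keylabel n='Count' mean='Average';         /* relabel the statistics */
run;
```

The comma in the TABLE statement separates the row expression from the column expression, and the asterisk nests one element inside another.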

PROC UNIVARIATE

The UNIVARIATE procedure provides data summarization tools, high-resolution graphics displays, and information on the distribution of numeric variables.

calculates descriptive statistics based on moments

calculates the median, mode, range, and quantiles

calculates the robust estimates of location and scale

calculates confidence limits

generates frequency tables

performs goodness-of-fit tests for fitted parametric and nonparametric

distributions

creates quantile-quantile plots and probability plots for various

theoretical distributions


PROC UNIVARIATE

PROC UNIVARIATE <option(s)>;

BY <DESCENDING> variable-1 <...<DESCENDING> variable-n> <NOTSORTED>;

CLASS variable-1<(variable-option(s))> <variable-2<(variable-option(s))>> </ KEYLEVEL='value1'|('value1' 'value2')>;

FREQ variable;

HISTOGRAM <variable(s)> </ option(s)>;

ID variable(s);

INSET <keyword(s) DATA=SAS-data-set> </ option(s)>;

OUTPUT <OUT=SAS-data-set> statistic-keyword-1=name(s) <... statistic-keyword-n=name(s)> <percentiles-specification>;

PROBPLOT <variable(s)> </ option(s)>;

QQPLOT <variable(s)> </ option(s)>;

VAR variable(s);

WEIGHT variable;
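A sketch of the VAR and OUTPUT statements in use, again on a hypothetical data set scores with one variable score. The NORMAL option on the PROC statement requests tests for normality, and the output data set name summary and its variable names are arbitrary.

```sas
/* Hypothetical data set and variable names */
data scores;
    input score @@;
    datalines;
80 85 90 90 90 100
;
run;

proc univariate data=scores normal;
    var score;
    /* Save selected statistics and the 10th and 90th percentiles */
    output out=summary mean=mean_score median=median_score
           pctlpts=10 90 pctlpre=p;
run;

proc print data=summary;
run;
```

PCTLPRE=p combined with PCTLPTS=10 90 produces the variables p10 and p90 in the output data set.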


PROC NPAR1WAY

The NPAR1WAY procedure performs nonparametric tests for location

and scale differences across a one-way classification.

PROC NPAR1WAY also provides a standard analysis of variance on

the raw data and statistics based on the empirical distribution function.

PROC NPAR1WAY provides tests using the raw input data as scores.

When the data are classified into two samples, tests are based on

simple linear rank statistics.

When the data are classified into more than two samples, tests are

based on one-way ANOVA statistics.

Both asymptotic and exact p-values are available for these tests.


PROC NPAR1WAY

PROC NPAR1WAY < options > ;

BY variables ;

CLASS variable ;

EXACT statistic-options < / computation-options > ;

FREQ variable ;

OUTPUT < OUT=SAS-data-set > < options > ;

VAR variables ;
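A two-sample rank test could be requested along these lines. The sketch assumes a hypothetical data set scores with a two-level variable group; the WILCOXON option and the EXACT statement are real features of the procedure, while the data are invented.

```sas
/* Hypothetical data: two groups of three observations */
data scores;
    input group $ score @@;
    datalines;
A 80  A 85  A 90  B 90  B 90  B 100
;
run;

proc npar1way data=scores wilcoxon;
    class group;       /* one-way classification */
    var score;         /* response variable */
    exact wilcoxon;    /* exact p-value in addition to the asymptotic one */
run;
```

Exact computations are practical here because the samples are very small; for large samples the asymptotic p-values are usually used alone.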

OUTPUT DELIVERY SYSTEM (ODS)

With ODS you can:

Display your output in Rich-Text-Format (RTF)

Create SAS data sets directly from output tables

Select or exclude individual output tables

Customize the layout, format, and headers of your output

ODS combines raw data with one or more table definitions to produce

one or more output objects. These objects can be sent to any or all

ODS destinations.


How ODS Works

In your ODS statement(s), you specify one or more

destinations for your output

This destination . . .    Produces . . .

Listing                   listing output

HTML                      HTML output

Output                    SAS data sets

ODS LISTING <action>;

ODS LISTING <DATAPANEL=number | DATA | PAGE>;

ODS HTML HTML-file-specification(s) <option(s)>;

ODS OUTPUT data-set-definition(s);
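The statements above might be combined as in the following sketch, which sends a t test to an HTML file, keeps only two of its output tables, and saves one of them as a data set. The file name ttest_report.html and the data set names are hypothetical; Statistics and TTests are ODS table names produced by PROC TTEST.

```sas
/* Hypothetical data set and variable names */
data scores;
    input score @@;
    datalines;
80 85 90 90 90 100
;
run;

ods html file='ttest_report.html';       /* open the HTML destination */
ods select Statistics TTests;            /* keep only these output tables */
ods output TTests=work.ttest_results;    /* save one table as a data set */

proc ttest data=scores h0=85;
    var score;
run;

ods html close;                          /* close the file so it can be viewed */

proc print data=work.ttest_results;
run;
```

The ODS SELECT and ODS OUTPUT requests apply to the next procedure step only, so they do not need to be reset afterwards.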
