© All Rights Reserved


Methods in

Research

13 – 14 January 2011

Department of Mathematical and Actuarial

Sciences,

Universiti Tunku Abdul Rahman

Self Introduction

Name

Department and work nature

Experience in using SPSS or any

statistical tool

Expectation from this program

6/9/2016 2

Contact Information

Objective

Method

Machinery

Agenda

Norm

Development of SPSS

6/9/2016 3

Contact Information

• Mr. Chang Yun Fah
• E-mail: changyf@utar.edu.my
• Tel: 03-41079802
• Department of Mathematical and Actuarial Sciences, Faculty of Engineering and Science

6/9/2016 4

Objective

• Use the analytical functions of SPSS.
• Process data and generate statistics for ANOVA and MANOVA tests.
• Process data and generate statistics for linear regression analysis and logit analysis.
• Process data and generate statistics for principal component analysis and factor analysis.
• Process data and generate statistics for clustering analysis.
• Process data and generate statistics for discriminant analysis.

6/9/2016 5

Method

Method of Learning:
• Presentation
• Case study
• Exercise
• Discussion

6/9/2016 6

Machine & Software

• SPSS 14.0 for Windows
• Student package
• Release 14.0.0 on
• needed a downloadable hotfix to be installed in order to be compatible with Windows Vista.

6/9/2016 7

Agenda

Day 1
• One-Way, Two-Way and Three-Way ANOVA
• Linear Regression Analysis and Logit Analysis

Day 2
• Principal Component Analysis and Factor Analysis
• Clustering Analysis

6/9/2016 8

Navigate

• Writing report
• Interpretation
• Statistical analysis
• Tools selection
• Data collection
• Problem analysis

6/9/2016 9

Development

History: Release history
• SPSS 15.0.1 - November 2006
• SPSS 16.0.2 - April 2008
• SPSS Statistics 17.0.1 - December 2008
• PASW Statistics 17.0.3 - September 2009
• PASW Statistics 18.0 - August 2009
• PASW Statistics 18.0.1 - December 2009
• PASW Statistics 18.0.2 - April 2010

6/9/2016 10

One-Way, Two-Way and Three-Way ANOVA

6/9/2016 11

One-Way ANOVA

• Analysis of variance (ANOVA) is an extension of the independent-samples t-test (one independent variable/factor with 2 levels/groups).
• It deals with an experiment that involves one dependent variable and one factor (independent variable).
• It compares the means of 2 or more levels (treatments) of the factor.

6/9/2016 12

Completely Randomized Design

• In general, there will be $a$ levels of the factor (or $a$ treatments) and $n$ replicates of the experiment, run in random order.
• The objective is to test hypotheses about the equality of the $a$ treatment means.
• $N = a \times n$ total runs.

$$y_{ij} = \mu + \tau_i + \varepsilon_{ij}, \quad i = 1, 2, \ldots, a; \quad j = 1, 2, \ldots, n$$

where $\mu$ is the overall mean, $\tau_i$ is the $i$th treatment effect and $\varepsilon_{ij}$ is the experimental error, $\varepsilon_{ij} \sim \mathrm{NID}(0, \sigma^2)$.

$H_0: \tau_1 = \tau_2 = \cdots = \tau_a = 0$
$H_1: \tau_i \neq 0$ for at least one $i$.

6/9/2016 13

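ANOVA table:

Source of variation | Sum of Squares | DF | MS | F0
Between treatments | $SST = n\sum_{i=1}^{a}(\bar{y}_{i.} - \bar{y}_{..})^2 = \frac{1}{n}\sum_{i=1}^{a} y_{i.}^2 - \frac{y_{..}^2}{N}$ | $a - 1$ | $MST = \frac{SST}{a-1}$ | $F_0 = \frac{MST}{MSE}$
Error (within treatments) | $SSE = SSTO - SST$ | $N - a$ | $MSE = \frac{SSE}{N-a}$ |
Total | $SSTO = \sum_{i=1}^{a}\sum_{j=1}^{n}(y_{ij} - \bar{y}_{..})^2 = \sum_{i=1}^{a}\sum_{j=1}^{n} y_{ij}^2 - \frac{y_{..}^2}{N}$ | $N - 1$ | |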

6/9/2016 14

Exercise:

relationship between the RF power setting and

the etch rate for this tool. The objective of an

experiment like this is to model the relationship

between etch rate and RF power, and to specify

the power setting that will give a desired target

etch rate. She is interested in a particular gas

(C2F6) and gap (0.80cm), and wants to test four

levels of RF power: 160W, 180W, 200W, and

220W. She decided to test five wafers at each

level of RF power

6/9/2016 15

Observations

Power (W) 1 2 3 4 5 Total Average

160 575 542 530 539 570 2756 551.2

180 565 593 590 579 610 2937 587.4

200 600 651 610 637 629 3127 625.4

220 725 700 715 685 710 3535 707.0

(independent variable)

Y=etch rate, X=RF power

4 levels: 160, 180, 200, 220

6/9/2016 16


n=5

Power: 160 160 160 160 160 180 180 180 180 180 200 200 200 200 200 220 220 220 220 220

EtchR: 575 542 530 539 570 565 593 590 579 610 600 651 610 637 629 725 700 715 685 710
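As a cross-check on the SPSS output, the ANOVA table formulas can be applied to the etch-rate data by hand. The following is a minimal sketch in plain Python; the function name `one_way_anova` and the dictionary layout are illustrative choices, not part of the course material.

```python
# Etch-rate observations from the table above, keyed by RF power (W).
etch_rate = {
    160: [575, 542, 530, 539, 570],
    180: [565, 593, 590, 579, 610],
    200: [600, 651, 610, 637, 629],
    220: [725, 700, 715, 685, 710],
}


def one_way_anova(groups):
    """Compute the one-way ANOVA quantities for {level: observations}."""
    all_obs = [y for obs in groups.values() for y in obs]
    N = len(all_obs)          # total number of runs
    a = len(groups)           # number of treatments (levels)
    correction = sum(all_obs) ** 2 / N                  # y..^2 / N
    ssto = sum(y * y for y in all_obs) - correction     # total sum of squares
    sst = sum(sum(obs) ** 2 / len(obs) for obs in groups.values()) - correction
    sse = ssto - sst                                    # error sum of squares
    mst = sst / (a - 1)
    mse = sse / (N - a)
    return {
        "SST": sst, "SSE": sse, "SSTO": ssto,
        "df_treatment": a - 1, "df_error": N - a,
        "MST": mst, "MSE": mse, "F0": mst / mse,
    }


result = one_way_anova(etch_rate)
for name, value in result.items():
    print(f"{name:13s} {value:12.3f}")
```

A large F0 relative to the F distribution with (a-1, N-a) degrees of freedom leads to rejecting H0; the p-value itself is left to SPSS here because the sketch avoids any statistics library.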

6/9/2016 17

Open SPSS file ‘Data 1’: Summaries for

personal savings and personal income based

on ethnic group. What is your ‘conclusion’?

Descriptives

95% Confidence Interval for

Mean

N Mean Std. Deviation Std. Error Lower Bound Upper Bound Minimum Maximum

Malay 11 22.55 15.076 4.545 12.42 32.67 11 51

Chinese 15 27.73 8.514 2.198 23.02 32.45 21 56

Indian 7 37.14 8.783 3.320 29.02 45.27 32 56

Foreigner 6 40.83 14.386 5.873 25.74 55.93 12 49

Total 39 29.97 13.114 2.100 25.72 34.23 11 56

Descriptives

Are they statistically different?

Personal income

95% Confidence Interval for

Mean

N Mean Std. Deviation Std. Error Lower Bound Upper Bound Minimum Maximum

Malay 11 $1,011.8182 $438.01411 $132.066 $717.5563 $1,306.0801 $650.00 $2100.00

Chinese 15 $1,127.3333 $340.95384 $88.03390 $938.5194 $1,316.1473 $600.00 $1600.00

Indian 7 $1,138.5714 $367.48826 $138.898 $798.7015 $1,478.4414 $600.00 $1670.00

Foreigner 6 $1,238.3333 $406.91113 $166.121 $811.3063 $1,665.3604 $590.00 $1700.00

Total 39 $1,113.8462 $376.92394 $60.35614 $991.6615 $1,236.0308 $590.00 $2100.00

6/9/2016 18

Rearrange the personal savings and ethnic group from Questionnaire 1 (Data1) into the completely randomized design format:

Data as entered:

Ethnic  Psavings
3       32
4       45
2       34
2       56
3       32
2       26
2       23
2       27
3       38
2       21
4       48
4       49
4       12
2       27
1       11
1       43
1       18
1       19
1       11
2       24

Completely randomized design format:

           Observations
Treatment  1   2   3   4   5   6   7   8
1          11  43  18  19  11
2          34  56  26  23  27  21  27  24
3          32  32  38
4          45  48  49  12
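The rearrangement into the completely randomized design format is a simple grouping of the savings values by treatment. A minimal Python sketch follows; the helper name `to_crd` is hypothetical, and the records are the (Ethnic, Psavings) pairs in the order they are listed on the slide.

```python
from collections import defaultdict

# (Ethnic, Psavings) pairs in the order listed on the slide.
records = [
    (3, 32), (4, 45), (2, 34), (2, 56), (3, 32),
    (2, 26), (2, 23), (2, 27), (3, 38), (2, 21),
    (4, 48), (4, 49), (4, 12), (2, 27), (1, 11),
    (1, 43), (1, 18), (1, 19), (1, 11), (2, 24),
]


def to_crd(pairs):
    """Group observations by treatment, keeping the original order."""
    groups = defaultdict(list)
    for treatment, value in pairs:
        groups[treatment].append(value)
    return dict(sorted(groups.items()))


crd = to_crd(records)
for treatment, observations in crd.items():
    print(treatment, *observations)
```

Each printed row corresponds to one treatment row of the design table; the groups have unequal sizes, so the design is unbalanced.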

6/9/2016 19

1. Analyze menu → Compare Means → One-Way ANOVA
2. Move the dependent variable to ‘Dependent List’
3. Move the independent variable to ‘Factor’
4. Click ‘Options’
5. Select Statistics and Means plot
6. Continue
7. OK

6/9/2016 20

Open SPSS file ‘Data1’: Conduct a One-Way ANOVA using Personal savings as the dependent variable and Ethnic as the factor. Use the LSD test for multiple comparison.

H0: there is no difference in the amounts of savings among ethnic groups.
H1: at least one ethnic group has different amounts of savings than other ethnic groups.

ANOVA
Personal savings (thousand)

                                                  Sum of Squares  df  Mean Square  F      Sig.
Between Groups (treatment sum of squares, SST)    1749.623        3   583.208      4.266  .011
Within Groups (error sum of squares, SSE)         4785.351        35  136.724
Total (total sum of squares, SSTO)                6534.974        38

p-value < 0.05, reject H0.

6/9/2016 21

Writing it in report 1!

savings totaling RM29970 but the average savings for each of the four

ethnic groups and countries of origin seemed to be different. The means

plot shows that foreign workers had the highest savings and the Malay

workers the lowest. Therefore, one of the appropriate hypotheses in this

study is that there is a significant difference in the amount of savings of

workers from different ethnic groups and countries of origin. To test this

hypothesis, the one-way ANOVA was used. The analysis yielded a

significant result with F-ratio of 4.266 which was significant at the 0.05 level

of significance (p=0.011). Therefore it can be concluded that the workers

from different ethnic groups and countries of origin had different amounts of

savings.

6/9/2016 22

Exercise

Get the One-Way ANOVA for Personal income as

dependent variable and Ethnic as factor

among ethnic groups

H1: at least one ethnic group has different amounts of

income than other ethnic groups.

6/9/2016 23

Multiple Comparison Test (Post Hoc Test)

treatments’ mean, a researcher may want to

know which means differ.

•E.g. the ANOVA test showed that the amounts

of savings differed among ethnic groups but the

analysis does not tell us which ethnic groups

differed in their amounts of savings.

•To detect the difference in the means of each

pair of ethnic groups, the post hoc test (multiple

comparison tests) can be used.

6/9/2016 24

Multiple Comparison Test (Post Hoc Test)

Comparing treatment means simultaneously without control group:

H0: Contrast = 0 vs H1: Contrast ≠ 0
1. Bonferroni t-test
2. Scheffe’s method

H0: μi − μi’ = 0 vs H1: μi − μi’ ≠ 0
1. Tukey’s test
2. LSD (Fisher Least Significant Difference) test

6/9/2016 25

Dunnett’s test: Comparing treatment means with a control

group:

H0: μi-μa= 0 vs H1: μi-μa ≠ 0

Example 1: the Malays are used as the benchmark for personal savings.

Example 2: the ASEAN countries’ GDP for the last 4 consecutive years were compared. The GDP obtained in 2006 is used as the base year.

3 new teaching approaches. A class of 100 students were

randomly divided into 4 groups, A, B, C and D. Classes for

groups A, B and C were conducted using Method 1, Method

2 and Method 3 respectively and the control group D using

the existing method.

6/9/2016 26

1. Analyze menu → Compare Means → One-Way ANOVA
2. Move the dependent variable to ‘Dependent List’
3. Move the independent variable to ‘Factor’
4. Click ‘Post Hoc’
5. Select the multiple comparison methods
6. Continue
7. OK

6/9/2016 27

Multiple Comparisons
LSD

                                    Mean Difference                      95% Confidence Interval
(I) Ethnic group  (J) Ethnic group  (I-J)            Std. Error  Sig.   Lower Bound  Upper Bound
Malay             Chinese           -5.188           4.642       .271   -14.61       4.24
                  Indian            -14.597*         5.653       .014   -26.07       -3.12
                  Foreigner         -18.288*         5.934       .004   -30.34       -6.24
Chinese           Malay             5.188            4.642       .271   -4.24        14.61
                  Indian            -9.410           5.352       .087   -20.28       1.46
                  Foreigner         -13.100*         5.648       .026   -24.57       -1.63
Indian            Malay             14.597*          5.653       .014   3.12         26.07
                  Chinese           9.410            5.352       .087   -1.46        20.28
                  Foreigner         -3.690           6.505       .574   -16.90       9.52
Foreigner         Malay             18.288*          5.934       .004   6.24         30.34
                  Chinese           13.100*          5.648       .026   1.63         24.57
                  Indian            3.690            6.505       .574   -9.52        16.90

*. The mean difference is significant at the .05 level.
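The Std. Error column of the LSD output can be reproduced from the within-groups mean square (MSE = 136.724) and the group sizes, using the usual LSD standard error sqrt(MSE(1/n_i + 1/n_j)). The sketch below is illustrative only: the function name `lsd_pair` is hypothetical, and the group means are the rounded values from the Descriptives table, so the differences agree with SPSS only up to rounding.

```python
import math

MSE = 136.724  # within-groups mean square from the one-way ANOVA table
n = {"Malay": 11, "Chinese": 15, "Indian": 7, "Foreigner": 6}
# Rounded group means of personal savings (thousand) from the Descriptives table.
mean = {"Malay": 22.55, "Chinese": 27.73, "Indian": 37.14, "Foreigner": 40.83}


def lsd_pair(i, j):
    """Return (mean difference, standard error, t ratio) for groups i and j."""
    diff = mean[i] - mean[j]
    se = math.sqrt(MSE * (1 / n[i] + 1 / n[j]))
    return diff, se, diff / se


for other in ("Chinese", "Indian", "Foreigner"):
    diff, se, t = lsd_pair("Malay", other)
    print(f"Malay vs {other:9s} diff={diff:8.3f}  se={se:6.3f}  t={t:6.3f}")
```

Each t ratio is compared with the t distribution on the error degrees of freedom (35 here); the Sig. values themselves are taken from the SPSS output rather than recomputed.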

6/9/2016 28

Writing it in report 2!

Analysis of the data showed that the amount of savings differed among

the ethnic groups with the foreign workers having the highest savings at

RM40830 followed by the Indians, Chinese and lastly Malays. Further

analysis using one-way ANOVA technique revealed that the difference

was significant at the 0.05 level. The LSD test was conducted to detect

which ethnic group differed from the other ethnic groups. The result

showed the mean difference of savings between Malays and the Chinese

was RM5188 and this was not significant at 0.05 (p=0.271). But the mean

differences in savings between the Malays and the Indians and the

foreigners were RM14597 and RM18288 respectively and they were both

significant at 0.05 with the probabilities of error being p=0.014 and

p=0.004 respectively. At the 0.05 level, the Chinese workers had

significant savings differences from the foreigners but no significant

differences with the Malays and the Indians. The Indians had a significant

savings difference from the Malays but not the other ethnic groups. Lastly,

the foreigners had significant savings differences from the Malays and the

Chinese but not the Indians.

6/9/2016 29

Perform the multiple comparison by assuming Malay is a

control group.

Multiple Comparisons

a

Dunnett t (2-sided)

Mean

Difference 95% Confidence Interval

(I) Ethnic group (J) Ethnic group (I-J) Std. Error Sig. Lower Bound Upper Bound

Chinese Malay 5.188 4.642 .563 -6.27 16.64

Indian Malay 14.597* 5.653 .038 .65 28.55

Foreigner Malay 18.288* 5.934 .011 3.64 32.93

*. The mean difference is significant at the .05 level.

a. Dunnett t-tests treat one group as a control, and compare all other groups against it.

6/9/2016 30

Kruskal-Wallis H Test (nonparametric multiple comparison test)

Open SPSS file ‘Data11’:

1. Analyze menu → Nonparametric Tests → K Independent Samples
2. Move the test variable to ‘Test Variable List’
3. Move the grouping variable to ‘Grouping Variable’
4. In ‘Define Range’, set minimum=1, maximum=3 (3 groups)
5. Select the test in ‘Test Type’
6. Continue
7. OK

Test Statistics(a,b)

             (RM)
Chi-Square   9.632
df           2
Asymp. Sig.  .008

a. Kruskal Wallis Test
b. Grouping Variable: Country

Writing it in report 3!

Malaysia to their families in their countries of origin. Initial analysis revealed that the mean remittances for Indonesian, Bangladeshi and Myanmar workers were RM940, RM735 and RM497.5 respectively. Examination of the means suggested that there was a possibility that the amounts of remittances were different among the three groups of foreign workers. However, owing to the small number of samples taken in this study, the test of normality found that the data were not normally distributed. This suggested that the use of one-way ANOVA was not appropriate to test the difference of mean remittances among the three groups. Instead the Kruskal-Wallis H test, a non-parametric test, was used. The test resulted in a fairly large Chi-square value of 9.632 which was significant at 0.05 (p=0.008). Therefore, this study concludes that the amounts of remittances to the countries of origin were different among the Indonesian, Bangladeshi and Myanmar foreign workers.
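To show what the Kruskal-Wallis Chi-Square value measures, the sketch below computes the H statistic from rank sums, H = 12/(N(N+1)) Σ R_i²/n_i − 3(N+1). The remittance figures are hypothetical (the ‘Data11’ values are not reproduced on the slides), the function names are illustrative, and the tie correction that SPSS applies is deliberately omitted.

```python
def average_ranks(values):
    """Rank values from 1..N, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1      # average of ranks pos+1 .. end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks


def kruskal_wallis_h(groups):
    """H statistic without tie correction for {label: observations}."""
    labels = [g for g, obs in groups.items() for _ in obs]
    pooled = [y for obs in groups.values() for y in obs]
    ranks = average_ranks(pooled)
    N = len(pooled)
    rank_sum = {g: 0.0 for g in groups}
    for g, r in zip(labels, ranks):
        rank_sum[g] += r
    total = sum(rank_sum[g] ** 2 / len(groups[g]) for g in groups)
    return 12 / (N * (N + 1)) * total - 3 * (N + 1)


# Hypothetical monthly remittances (RM), for illustration only.
remittance = {
    "Indonesia": [900, 980, 1010],
    "Bangladesh": [700, 720, 790],
    "Myanmar": [450, 500, 540],
}
H = kruskal_wallis_h(remittance)
print(f"H = {H:.3f} with df = {len(remittance) - 1}")
```

H is compared with a chi-square distribution on (number of groups − 1) degrees of freedom, which is why SPSS labels it Chi-Square with df = 2 for three groups.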

6/9/2016 32

Two-Way ANOVA

• Applied to a two-factor factorial design.
• Detecting interaction in data involves at least 3 variables.
• One dependent variable on an interval or numerical scale.
• Two or more independent variables/factors measured on a nominal or categorical scale.

$$y_{ijk} = \mu + \tau_i + \beta_j + (\tau\beta)_{ij} + \varepsilon_{ijk}, \quad i = 1, 2, \ldots, a; \quad j = 1, 2, \ldots, b; \quad k = 1, 2, \ldots, n$$

6/9/2016 33

Interaction Effects

In the study on job satisfaction by ownership of

company and workers’ country of origin, 3

possible hypotheses can be constructed:

1. Whether the perceptions of workers towards

their employers differed significantly for the 3

groups of companies (main effect), i.e.

foreign-owned, joint venture and local.

2. Whether the mean perceptions of workers

differed significantly for the 2 groups of

workers (main effect), i.e. locals and

foreigners.

6/9/2016 34

Interaction Effects

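3. Whether the means of perception were significantly influenced by both the types of companies and citizenships of workers (interaction effects).

$$y_{ijk} = \mu + \tau_i + \beta_j + (\tau\beta)_{ij} + \varepsilon_{ijk}, \quad i = 1, 2, \ldots, a; \quad j = 1, 2, \ldots, b; \quad k = 1, 2, \ldots, n$$

where $y_{ijk}$ is the dependent variable, $\mu$ is the mean, $\tau_i$ is the mean effect of Factor 1, $\beta_j$ is the mean effect of Factor 2 and $(\tau\beta)_{ij}$ is the interaction effect.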

6/9/2016 35

Open SPSS file ‘Data15’:

the data suggested that foreign workers responded favorably to the

question whether they received good treatment from their employers, but

the situation was reversed for companies owned by local investors.

H0: the perceptions of workers towards their employers for the 3 groups of companies are not different.

H0: there is no difference in perceptions of workers between locals and foreigners.

H0: there is no interaction between the types of companies (owner) and citizenships of workers.

6/9/2016 36

Interaction Plot

1. Analyze menu → General Linear Model → Univariate
2. Move the dependent variable to ‘Dependent Variable’
3. Move factors to ‘Fixed Factors’
4. Click ‘Plot’
5. Move one factor to ‘Horizontal Axis’, one to ‘Separate Lines’
6. Click ‘Add’
7. Continue

6/9/2016 37

Multiple Comparison

1. Click ‘Post Hoc’
2. Move factors (can be >2 factors) to ‘Post Hoc Tests for’ and select the tests
3. Continue
4. Click ‘Options’
5. Click ‘Descriptive statistics’
6. Continue
7. OK

6/9/2016 38

Between-Subjects Factors

         Value  Label          N
OWNER    1      foreign        20
         2      joint-venture  20
         3      local          20
WORKER   1      local          30
         2      Vietnamese     30

Descriptive Statistics

OWNER          WORKER      Mean  Std. Deviation  N
foreign        local       5.30  .949            10
               Vietnamese  7.30  1.494           10
               Total       6.30  1.593           20
joint-venture  local       6.50  1.269           10
               Vietnamese  7.10  1.370           10
               Total       6.80  1.322           20
local          local       6.00  1.563           10
               Vietnamese  4.00  1.491           10
               Total       5.00  1.806           20
Total          local       5.93  1.337           30
               Vietnamese  6.13  2.080           30
               Total       6.03  1.737           60

Tests of Between-Subjects Effects
Dependent Variable: SATIS

Source              Type III Sum of Squares  df  Mean Square  F         Sig.
Corrected Model     76.333(a)                5   15.267       8.114     .000
Intercept           2184.067                 1   2184.067     1160.823  .000
Ownership           34.533                   2   17.267       9.177     .000
Worker              .600                     1   .600         .319      .575
Ownership * Worker  41.200                   2   20.600       10.949    .000
Error               101.600                  54  1.881
Total               2362.000                 60
Corrected Total     177.933                  59

a. R Squared = .429 (Adjusted R Squared = .376)

1. There is a significant difference in the level of satisfaction towards employers between workers in different types of companies.
2. There is no significant difference in the level of satisfaction towards employers between workers of different citizenships.
3. There is an interaction effect between company ownership and citizenship.
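Because the design is balanced (10 workers per cell), the main-effect and interaction sums of squares in the table can be rebuilt from the six cell means alone. The sketch below does this in plain Python as an illustration; the error sum of squares cannot be recovered from means, so it is copied from the SPSS output, and all variable names are illustrative.

```python
# Cell means of SATIS from the Descriptive Statistics table (n = 10 per cell).
cell_mean = {
    ("foreign", "local"): 5.30, ("foreign", "Vietnamese"): 7.30,
    ("joint-venture", "local"): 6.50, ("joint-venture", "Vietnamese"): 7.10,
    ("local", "local"): 6.00, ("local", "Vietnamese"): 4.00,
}
n = 10
SS_ERROR, DF_ERROR = 101.600, 54   # copied from the SPSS output, not recomputed

owners = sorted({owner for owner, _ in cell_mean})
workers = sorted({worker for _, worker in cell_mean})
a, b = len(owners), len(workers)

grand = sum(cell_mean.values()) / (a * b)
owner_mean = {o: sum(cell_mean[o, w] for w in workers) / b for o in owners}
worker_mean = {w: sum(cell_mean[o, w] for o in owners) / a for w in workers}

# Sums of squares for a balanced two-factor factorial design.
ss_owner = b * n * sum((m - grand) ** 2 for m in owner_mean.values())
ss_worker = a * n * sum((m - grand) ** 2 for m in worker_mean.values())
ss_cells = n * sum((m - grand) ** 2 for m in cell_mean.values())
ss_inter = ss_cells - ss_owner - ss_worker

mse = SS_ERROR / DF_ERROR
f_owner = (ss_owner / (a - 1)) / mse
f_worker = (ss_worker / (b - 1)) / mse
f_inter = (ss_inter / ((a - 1) * (b - 1))) / mse

print(f"Ownership          SS={ss_owner:7.3f}  F={f_owner:7.3f}")
print(f"Worker             SS={ss_worker:7.3f}  F={f_worker:7.3f}")
print(f"Ownership*Worker   SS={ss_inter:7.3f}  F={f_inter:7.3f}")
```

The large interaction sum of squares next to the tiny Worker main effect reflects the reversal in the cell means: Vietnamese workers score higher in foreign-owned companies and lower in local ones, so the two effects cancel in the Worker margin.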

6/9/2016 39

Multiple Comparisons

Tukey HSD

Mean

Difference 95% Confidence Interval

(I) OWNER (J) OWNER (I-J) Std. Error Sig. Lower Bound Upper Bound

foreign joint-venture -.50 .434 .486 -1.55 .55

local 1.30* .434 .011 .25 2.35

joint-venture foreign .50 .434 .486 -.55 1.55

local 1.80* .434 .000 .75 2.85

local foreign -1.30* .434 .011 -2.35 -.25

joint-venture -1.80* .434 .000 -2.85 -.75

Based on observed means.

*. The mean difference is significant at the .05 level.

• Citizenship and ownership interact with each other.

• Vietnamese workers achieved

higher level of satisfaction in

foreign and joint-venture

companies.

• Local workers achieved higher

level of satisfaction in local

companies.

6/9/2016 40

Writing it in report 4!

This study interviewed 60 local and foreign (Vietnamese) workers in local,

joint-venture and foreign-owned companies. Using 1-9 Likert scale ratings,

they were asked to indicate their level of satisfaction with the treatment they

received from their respective employers. In this study, the dependent

variable is the level of satisfaction while the ownership of the companies and

the citizenship of the workers acted as the explanatory variables (factors).

An initial look at the data suggested that the satisfaction levels of the

workers towards their employers were influenced by types of ownership of

the companies and their citizenships. Further analysis using the two-way

ANOVA method yielded three results. First, there was a significant

difference in the means of satisfaction towards employers among the

workers in the three groups of companies with F=9.177 (p=0.000). This

means that ownership of company exerted influence on workers’ level of

satisfaction.

6/9/2016 41

Second, there was no significant difference in the levels of satisfaction

between local workers and the Vietnamese workers with F=0.319 (p=0.575).

This means that citizenship did not have significant influence on satisfaction.

Third, there was an interaction effect in which both ownership of company

and workers’ citizenship together exerted significant influence on workers’

satisfaction with F=10.949 and significant at the 0.05 level (p=0.000).

A closer look at the interaction effects (figure) revealed that there was a big

gap between the level of satisfaction of local and Vietnamese workers in

foreign-owned companies, with the latter showing higher satisfaction than

the former. In joint-venture companies, Vietnamese workers continued to

show higher satisfaction than local workers but the gap was narrowed,

whereas in local companies, the situation was reversed where local workers

indicated a higher level of satisfaction than Vietnamese workers.

6/9/2016 42

Case of No Interaction Effects

Tests of Between-Subjects Effects

Source             Type III Sum of Squares  df  Mean Square  F        Sig.
Corrected Model    6.222(a)                 5   1.244        .391     .853
Intercept          1725.853                 1   1725.853     542.749  .000
Worker             2.391                    1   2.391        .752     .390
Activity           5.307                    2   2.654        .835     .440
Worker * Activity  .120                     2   .060         .019     .981
Error              171.711                  54  3.180
Total              2362.000                 60
Corrected Total    177.933                  59

a. R Squared = .035 (Adjusted R Squared = -.054)

p>0.05, Accept H0 and conclude that there is no interaction between worker and activity.

Parallel lines

6/9/2016 43

Tests of Between-Subjects Effects

Source                Type III Sum of Squares  df  Mean Square  F        Sig.
Corrected Model       49.931(a)                8   6.241        2.487    .023
Intercept             1931.094                 1   1931.094     769.406  .000
Activity              4.162                    2   2.081        .829     .442
Ownership             33.183                   2   16.592       6.611    .003
Activity * Ownership  11.332                   4   2.833        1.129    .353
Error                 128.002                  51  2.510
Total                 2362.000                 60
Corrected Total       177.933                  59

a. R Squared = .281 (Adjusted R Squared = .168)

p>0.05, Accept H0 and conclude that there is no interaction between ownership and activity.

It is hard to determine the interaction effects based on a plot with 3 or more lines.

6/9/2016 44

Three-Way ANOVA

• It is a 3-factor factorial design.
• One dependent variable in numeric scale and 3 explanatory variables (factors), either all in numeric scale or a combination of numeric and categorical scales.

$$y_{ijkl} = \mu + \tau_i + \beta_j + \gamma_k + (\tau\beta)_{ij} + (\tau\gamma)_{ik} + (\beta\gamma)_{jk} + (\tau\beta\gamma)_{ijk} + \varepsilon_{ijkl}$$

$$i = 1, 2, \ldots, a; \quad j = 1, 2, \ldots, b; \quad k = 1, 2, \ldots, c; \quad l = 1, 2, \ldots, n$$

6/9/2016 45

Open SPSS file ‘Data16’:

1. Analyze menu → General Linear Model → Univariate
2. Move the variable to ‘Dependent Variable’
3. Move factors to ‘Fixed Factors’, or variables in numerical, interval, Likert scale to ‘Covariate’
4. Click ‘Plot’
5. Move one factor to ‘Horizontal Axis’, one to ‘Separate Lines’ and one to ‘Separate Plots’
6. Click ‘Add’
7. Continue
8. OK

6/9/2016 46

Tests of Between-Subjects Effects
Dependent Variable: PERCEPTION

Source                        Type III Sum of Squares  df  Mean Square  F        Sig.
Corrected Model               172.717(a)               11  15.702       8.416    .000
Intercept                     544.027                  1   544.027      291.588  .000
Ethnicity                     27.432                   2   13.716       7.352    .005
Education                     5.767                    1   5.767        3.091    .096
City                          10.607                   1   10.607       5.685    .028
Ethnicity * Education         65.617                   2   32.809       17.585   .000
Ethnicity * City              3.832                    2   1.916        1.027    .378
Education * City              1.179                    1   1.179        .632     .437
Ethnicity * Education * City  23.096                   2   11.548       6.189    .009
Error                         33.583                   18  1.866
Total                         869.000                  30
Corrected Total               206.300                  29

a. R Squared = .837 (Adjusted R Squared = .738)

• Ethnicity: There is a significant difference in perception towards efficiency of the police among the respondents of different ethnicities at 0.05 level (p=0.005).
• Education: There is no significant difference in perception towards the efficiency of the police among respondents of different educational backgrounds at 0.05 (p=0.096).
• City: There is a significant difference in perception towards the efficiency of the police among respondents of different locations at 0.05 (p=0.028).
• Ethnicity * Education: Significant interaction effect between ethnic background and educational attainment of the respondents.
• Ethnicity * City: No interaction effect between ethnic and location factors.
• Education * City: No interaction effect between education and location factors.
• Ethnicity * Education * City: Significant interaction effect between ethnic, education and location factors.

6/9/2016 47

6/9/2016 48

Writing it in report 5!

This study looks at the public perception from the different ethnic,

educational background and location towards the efficiency of the police,

with the hypothesis that their perceptions are different and the differences

can be explained by their personal backgrounds. The three-way ANOVA

was employed in this study.

The results of the analysis showed that there was a significant difference in

perception towards efficiency of the police among respondents of different

ethnicities at 0.05 (p=0.005) and different locations (p=0.028), but no

significant difference in perception among respondents of different

educational backgrounds (p=0.096). Similarly, there was a

significant interaction effect between ethnic background and educational

attainment of the respondents at 0.05 (p=0.000), and a significant

interaction effect between ethnic, education, and location backgrounds of

the respondents (p=0.009), but no significant interaction effect between

ethnic and location factors (p=0.378) and no significant interaction effect

between education and location factors (p=0.437).

6/9/2016 49

Controlling for the location factor, this study revealed that there was a

significant interaction effect in perception towards efficiency in the police in

terms of ethnicity among respondents living in small cities. In small cities,

lowly educated Malay respondents demonstrated a relatively high regard for

the police compared with the highly educated Malays, a similar pattern

observable among the Chinese respondents but with a wider gap between

the lowly and highly educated. However, this phenomenon was not

noticeable among the Indian respondents; in fact this ethnic group showed a

reverse situation in which the highly educated Indians tended to have a

relatively more favorable perception towards the police compared with the

lowly educated Indians. Moreover, the gap between the two groups of

educational attainment among the Indians was much wider compared with

the other two ethnic groups, namely the Malays and the Chinese.

6/9/2016 50

A similar interaction effect can be seen among respondents living in big

cities. Generally the Malays demonstrated the highest regard for the police,

followed by the Chinese and the Indians. The lowly educated Malays

demonstrated a relatively high regard for the police; a similar pattern which

was observable among the Chinese respondents but at relatively low levels

and a narrower gap between the two groups. However, the situation is

reversed among the Indians where perception towards efficiency of the

police among the highly educated of this ethnic group was relatively higher

than that of the lowly educated. In general, the Indians were found to have

very low regard for the police compared with the other two ethnic groups,

and there was not much difference between the lowly and highly educated

respondents among members of this ethnic group in terms of the level of

respect for the police.

6/9/2016 51

Exercise using ‘Data15’: construct the three-way ANOVA using three factors: Ownership, Worker, and Activity

Source                         Type III Sum of Squares  df  Mean Square  F        Sig.
Corrected Model                100.690(a)               16  6.293        3.503    .001
Intercept                      1287.435                 1   1287.435     716.697  .000
Ownership                      38.985                   2   19.492       10.851   .000
Worker                         1.310                    1   1.310        .729     .398
Activity                       2.170                    2   1.085        .604     .551
Ownership * Worker             35.736                   2   17.868       9.947    .000
Ownership * Activity           7.209                    4   1.802        1.003    .416
Worker * Activity              1.284                    2   .642         .357     .702
Ownership * Worker * Activity  14.089                   3   4.696        2.614    .063
Error                          77.243                   43  1.796
Total                          2362.000                 60
Corrected Total                177.933                  59

a. R Squared = .566 (Adjusted R Squared = .404)

6/9/2016 52

Difficult to

interpret 3 lines

6/9/2016 53

6/9/2016 54

Linear Regression Analysis and

Logit Analysis

6/9/2016 55

Simple linear regression

Fitted line

Multiple linear regression and model

checking

Significance test

Model selection

Nonlinear regression and curve estimation

Dummy variable in regression

Logit analysis

6/9/2016 56

Simple Linear Regression

$$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \quad i = 1, 2, \ldots, n$$

where:

• The intercept $\beta_0$ and the slope $\beta_1$ are unknown constants (parameters).
• The regressor (independent or predictor variable) $x_i$ is a known constant (fixed).
• $\varepsilon_i$ is the random error.
• $y_i$ is the value of the response (dependent) variable in the $i$-th trial/observation.

6/9/2016 57

Assumptions:

1) The error terms $\varepsilon_i$ are normally (needed in inferences and MLE) and independently distributed with mean $E(\varepsilon_i) = 0$ and constant variance $\mathrm{Var}(\varepsilon_i) = \sigma^2$, i.e. $\varepsilon_i \sim \mathrm{NID}(0, \sigma^2)$.

2) The errors (thus, the $y_i$ also) are uncorrelated with each other in successive observations: $\mathrm{Cov}(\varepsilon_i, \varepsilon_j) = 0, \; \forall i \neq j$.

3) The relationship between the variables $y$ and $x$ should be linear.

6/9/2016 58

1. Analyze menu → Regression → Linear
2. Move one variable to ‘Dependent’, one variable to ‘Independent’
3. Click ‘Statistics’
4. Select types of statistics: Estimates, Model fit, R2
5. Continue
6. OK

6/9/2016 59

Coefficients(a)

                   Unstandardized Coefficients  Standardized Coefficients
Model              B         Std. Error         Beta                       t       Sig.
1 (Constant)       296.813   53.519                                        5.546   .000
  CGPA (max 4.0)   -48.650   20.532             -.324                      -2.369  .022

a. Dependent Variable: Handphone bills

• Sig. < 0.05 means that the intercept and slope are significantly different from zero.
• The standardized intercept is always zero; the standardized slope (= -0.324) is now comparable to other slopes, if any.

$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} y_i x_i - \frac{\left(\sum_{i=1}^{n} y_i\right)\left(\sum_{i=1}^{n} x_i\right)}{n}}{\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}} = \frac{\sum_{i=1}^{n} y_i (x_i - \bar{x})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} = \frac{S_{xy}}{S_{xx}} = -48.650$$

$$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} = 296.813$$

$$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$$

Estimated Mobile phone bill = 296.813 – 48.65*CGPA

6/9/2016 60

ANOVA(b)

Model          Sum of Squares  df  Mean Square  F      Sig.
1 Regression   41792.297       1   41792.297    5.615  .022(a)
  Residual     357292.9        48  7443.603
  Total        399085.2        49

a. Predictors: (Constant), CGPA (max 4.0)
b. Dependent Variable: Handphone bills

p-value < 0.05 means the regression model is significant.

Model Summary

                                                             Change Statistics
Model  R        R Square  Adjusted R Square  Std. Error of the Estimate  R Square Change  F Change  df1  df2  Sig. F Change
1      .324(a)  .105      .086               86.276                      .105             5.615     1    48   .022

a. Predictors: (Constant), CGPA (max 4.0)

• Coefficient of correlation (R): measures the strength of the relationship between the dependent and independent variables.
• Coefficient of determination (R2): the variation of the dependent variable explained by the model (%).
• Std error of the estimate: standard errors of the predicted values.
• When Adjusted R2 – R2 is large, it means the linear model is not appropriate or some ‘important’ independent variables are missing.
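The Model Summary figures are not independent results: they follow from the ANOVA sums of squares and the Coefficients table. The sketch below recomputes them in plain Python as a consistency check and adds a small prediction helper; `predict_bill` is a hypothetical name, and every constant is copied from the SPSS output on the slides.

```python
import math

# Values copied from the SPSS output on the slides.
B0, B1 = 296.813, -48.650            # intercept and slope
SE_B1 = 20.532                       # standard error of the slope
SS_REG, SS_RES = 41792.297, 357292.9
DF_REG, DF_RES = 1, 48

ss_total = SS_REG + SS_RES
ms_res = SS_RES / DF_RES                         # residual mean square
f_stat = (SS_REG / DF_REG) / ms_res              # F in the ANOVA table
r_squared = SS_REG / ss_total                    # coefficient of determination
adj_r_squared = 1 - (1 - r_squared) * (DF_REG + DF_RES) / DF_RES
std_error_estimate = math.sqrt(ms_res)           # Std. Error of the Estimate
t_slope = B1 / SE_B1                             # t for the CGPA coefficient


def predict_bill(cgpa):
    """Estimated handphone bill for a given CGPA from the fitted line."""
    return B0 + B1 * cgpa


print(f"F = {f_stat:.3f}, R2 = {r_squared:.3f}, adjusted R2 = {adj_r_squared:.3f}")
print(f"Std. error of the estimate = {std_error_estimate:.3f}, t = {t_slope:.3f}")
print(f"Predicted bill at CGPA 3.0 = {predict_bill(3.0):.3f}")
```

With a single predictor, F equals the square of the slope's t statistic, which is why both carry the same Sig. value of .022.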

6/9/2016 61

Writing it in report 6!

One of the hypotheses in this study is that several personal characteristics of

the youths interviewed can predict the monthly amount they spent on the

mobile phone. Simple linear regression analysis using CGPA as the

explanatory variable produced a significant result with F=5.615 and

significant at the 0.05 level (p=0.022). There was a weak and inverse

relationship between CGPA and expenditure on mobile phone with a

correlation coefficient of -0.324. The derived R2 was rather small at 0.105,

indicating only 10.5% variations in expenditure on mobile phone were

explained by academic performance measured in CGPA. The resultant

model from the analysis is Y= 296.813 – 48.65*X where Y is expenditure on

mobile phone and X is CGPA. It showed that the higher the CGPA of the

respondent, the lower the amount spent on mobile phone bills.

Note: Although the ANOVA test shows that the model is significant, the small R and R2 values indicate that CGPA is not a good predictor of expenditure on mobile phone.

as predictor.

6/9/2016 62

Fitted Line

Construct the scatter plot for Age and Handphone bills. Double-clicking the plot obtained yields the following graph.

• Click ‘Add fit line at total’
• Select the ‘Linear’ model
• Plot annotation: this is a possible outlier point

6/9/2016 63

Multiple Linear Regression

• It involves one dependent variable and a set of several independent variables.
• The assumptions of simple linear regression still hold, e.g. linearity, normality, uncorrelated errors and equal variance.
• IMPORTANT: At the end of MLR, we try to select a model that has a minimum number of independent variables with acceptable model performance (e.g. large coefficient of determination).

6/9/2016 64

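Matrix form of multiple linear regression model

The model in matrix form:

$$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$$

where

$$\mathbf{y} = [y_1 \; y_2 \; \cdots \; y_n]', \quad \boldsymbol{\beta} = [\beta_0 \; \beta_1 \; \cdots \; \beta_k]', \quad \boldsymbol{\varepsilon} = [\varepsilon_1 \; \varepsilon_2 \; \cdots \; \varepsilon_n]'$$

$$\mathbf{X} = \begin{bmatrix} 1 & x_{11} & x_{12} & \cdots & x_{1k} \\ 1 & x_{21} & x_{22} & \cdots & x_{2k} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_{n1} & x_{n2} & \cdots & x_{nk} \end{bmatrix}$$

$\boldsymbol{\beta}$ has size $p \times 1$, where $p = k + 1$.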


Least squares estimator:

$$\hat{\beta} = (X'X)^{-1}X'y$$

Fitted values:

$$\hat{y} = X\hat{\beta} = X(X'X)^{-1}X'y = Hy$$

To obtain the estimator, the following conditions must be satisfied: WHY?
1) The number of observations (n) must be at least the number of
parameters (p = k+1).
2) The matrix X'X must be non-singular.
3) All regressors must be linearly independent.
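The estimator above can be sketched in plain Python to show why X'X must be non-singular. This is only an illustration under stated assumptions: the helper names (`transpose`, `matmul`, `solve`, `ols`) and the toy data are hypothetical and are not part of SPSS or of the workshop data files.

```python
def transpose(m):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(row) for row in zip(*m)]


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    b_cols = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in b_cols] for row in a]


def solve(a, b):
    """Solve a z = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [list(a[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            # A zero pivot means X'X is singular (conditions 2 and 3 fail).
            raise ValueError("X'X is singular: regressors are linearly dependent")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def ols(x_rows, y):
    """Estimate beta by solving the normal equations (X'X) beta = X'y."""
    X = [[1.0] + list(row) for row in x_rows]  # leading column of 1s for beta_0
    Xt = transpose(X)
    XtX = matmul(Xt, X)
    Xty = [sum(xi * yi for xi, yi in zip(col, y)) for col in Xt]
    return solve(XtX, Xty)


# Hypothetical toy data generated exactly from y = 1 + 2*x1 + 3*x2
x_rows = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]
y = [1 + 2 * x1 + 3 * x2 for x1, x2 in x_rows]
beta_hat = ols(x_rows, y)
print([round(b, 6) for b in beta_hat])
```

Solving the normal equations directly avoids forming the inverse explicitly; if two regressors were identical columns, the pivot check would raise an error, which is the practical meaning of conditions 2 and 3.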


Open SPSS file 'Data17'
Analyze menu → Regression → Linear
Move one variable to 'Dependent' and the regressors to 'Independent'
Statistics: select the types of statistics (Estimates, Model fit, Descriptive) → Continue
Plots: move Dependent to 'Y' and ZRESID (or others) to 'X'; click 'Histogram' and 'Normal Probability Plot' → Continue → OK

Variables available in the Plots dialog:
• Dependent variable
• Standardized predicted values
• Standardized residuals
• Deleted residuals
• Adjusted predicted values
• Studentized residuals
• Studentized deleted residuals

Residual plot to check assumptions


Do the data meet the conditions and assumptions?

1) Do the explanatory variables have a relatively strong linear

relationship with the dependent variable?

YES, except variable AGE

Correlations
                                                 Job            Working Experience   INCOME   AGE
                                                 Satisfaction   (Years)              (RM)     (Years)   SEX
Pearson Correlation  Job Satisfaction            1.000          .660                 .719     -.045     -.407
                     Working Experience (Years)  .660           1.000                .482     .135      -.232
                     INCOME (RM)                 .719           .482                 1.000    -.093     -.183
                     AGE (Years)                 -.045          .135                 -.093    1.000     .166
                     SEX                         -.407          -.232                -.183    .166      1.000
Sig. (1-tailed)      Job Satisfaction            .              .000                 .000     .406      .013
                     Working Experience (Years)  .000           .                    .003     .239      .109
                     INCOME (RM)                 .000           .003                 .        .313      .166
                     AGE (Years)                 .406           .239                 .313     .         .191
                     SEX                         .013           .109                 .166     .191      .
N                    Job Satisfaction            30             30                   30       30        30
                     Working Experience (Years)  30             30                   30       30        30
                     INCOME (RM)                 30             30                   30       30        30
                     AGE (Years)                 30             30                   30       30        30
                     SEX                         30             30                   30       30        30


Residual Analysis

Residual is defined as

$$e_i = y_i - \hat{y}_i$$

The deviation between the data and the fit. The realized or observed values of the errors.

Standardized residuals:

$$d_i = \frac{e_i}{\sqrt{MS_E}}$$

Zero mean and approximately unit variance. $\sqrt{MS_E}$ is the average standard deviation.

Studentized residuals:

$$r_i = \frac{e_i}{\sqrt{MS_E\left[1 - \left(\frac{1}{n} + \frac{(x_i - \bar{x})^2}{S_{xx}}\right)\right]}}$$

Uses the exact standard error. More useful in regression diagnosis than the standardized residuals.
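The three kinds of residuals can be computed by hand for a simple linear regression. The sketch below is illustrative only: the function name `residual_diagnostics` and the six data points are hypothetical, and MS_E is taken as SS_E/(n – 2), the usual definition for a model with one regressor.

```python
import math


def residual_diagnostics(x, y):
    """Return raw, standardized and studentized residuals for y = b0 + b1*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar

    # e_i = y_i - y_hat_i
    e = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    # MS_E = SS_E / (n - 2) for simple linear regression (assumption stated above)
    ms_e = sum(ei ** 2 for ei in e) / (n - 2)
    # d_i = e_i / sqrt(MS_E)
    d = [ei / math.sqrt(ms_e) for ei in e]
    # r_i = e_i / sqrt(MS_E * [1 - (1/n + (x_i - x_bar)^2 / S_xx)])
    r = [
        ei / math.sqrt(ms_e * (1 - (1 / n + (xi - x_bar) ** 2 / s_xx)))
        for xi, ei in zip(x, e)
    ]
    return e, d, r


# Hypothetical data for illustration only
x = [1, 2, 3, 4, 5, 6]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 13.5]
e, d, r = residual_diagnostics(x, y)
for xi, ei, di, ri in zip(x, e, d, r):
    print(xi, round(ei, 3), round(di, 3), round(ri, 3))
```

Each studentized residual is at least as large in magnitude as the corresponding standardized residual, and the gap is widest for points far from the mean of x.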


2) Does the normality assumption hold?

YES, since the histogram is approximately bell-shaped or the points
lie approximately along a straight line in the Normal P-P plot.


Patterns in the normal probability plot:
• Heavy-tailed/long-tailed distribution: the points show a sharp upward and downward curve at both extremes.
• Light-tailed distribution: flattening at the extremes.
• Negative/left skewed
• Positive/right skewed


3) Is the error variance constant?

The error variance is constant if the residuals are contained in a horizontal band.

Residual patterns that indicate a problem: funnel (two forms), double bow, nonlinear/curvilinear.


Model Summaryb
Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .834a   .695       .646                1.492
a. Predictors: (Constant), SEX, AGE (Years), INCOME (RM), Working Experience (Years)
b. Dependent Variable: Job Satisfaction

• The response variable and regressors have a strong relationship.
• 69.5% of the variation in the response is explained by the regressors.
• No important regressor missing.

ANOVAb
Model          Sum of Squares   df   Mean Square   F        Sig.
1 Regression   126.678          4    31.670        14.234   .000a
  Residual     55.622           25   2.225
  Total        182.300          29
a. Predictors: (Constant), SEX, AGE (Years), INCOME (RM), Working Experience (Years)
b. Dependent Variable: Job Satisfaction

The overall regression model was significant at the 0.05 level (p=0.000).
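The F, R Square and Adjusted R Square values can be cross-checked from the sums of squares and degrees of freedom in the ANOVA table. A minimal sketch, assuming only the figures reported above; the function name `anova_summary` is illustrative and not an SPSS command.

```python
def anova_summary(ss_regression, ss_residual, df_regression, df_residual):
    """Recompute F, R Square and Adjusted R Square from an ANOVA table."""
    ss_total = ss_regression + ss_residual
    df_total = df_regression + df_residual
    ms_regression = ss_regression / df_regression   # Mean Square (Regression)
    ms_residual = ss_residual / df_residual         # Mean Square (Residual)
    f_value = ms_regression / ms_residual
    r_square = ss_regression / ss_total
    adj_r_square = 1 - (1 - r_square) * df_total / df_residual
    return f_value, r_square, adj_r_square


# Sums of squares and df taken from the ANOVA table for the job satisfaction model
f_value, r_square, adj_r_square = anova_summary(126.678, 55.622, 4, 25)
print(round(f_value, 3), round(r_square, 3), round(adj_r_square, 3))
```

The recomputed values agree with the SPSS output to three decimal places, which is a quick way to confirm that a flattened table has been read correctly.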


JobStat = 1.406 + 0.413Exp + 0.001Income – 0.003Age – 1.134Sex

Coefficientsa
                                 Unstandardized Coefficients   Standardized Coefficients
Model                            B         Std. Error          Beta                        t        Sig.
1  (Constant)                    1.406     1.225                                           1.148    .262
   Working Experience (Years)    .413      .148                .368                        2.798    .010
   INCOME (RM)                   .001      .000                .499                        3.884    .001
   AGE (Years)                   -.003     .030                -.011                       -.097    .923
   SEX                           -1.134    .578                -.228                       -1.963   .061
a. Dependent Variable: Job Satisfaction

Significance test for individual parameters using the one-sample t-test.

• Age and Sex are not significant at the 0.05 level, so they should be removed from further analysis.
• Income is the most significant predictor of job satisfaction.
• An increase in working experience of 1 year leads to an increase of about 0.413 level of job satisfaction. A change in sex from female to male leads to a decrease of about 1.134 levels of job satisfaction.
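Each t value in the Coefficients table is the coefficient divided by its standard error. The sketch below recomputes them from the rounded table entries, so small differences from the SPSS output are expected. INCOME is left out because its standard error is printed as .000 after rounding; the dictionary and function names are illustrative.

```python
# (B, Std. Error) pairs copied from the Coefficients table.
# INCOME (RM) is omitted: its standard error is printed as .000 after rounding.
COEFFICIENTS = {
    "(Constant)": (1.406, 1.225),
    "Working Experience (Years)": (0.413, 0.148),
    "AGE (Years)": (-0.003, 0.030),
    "SEX": (-1.134, 0.578),
}


def t_statistic(b, std_error):
    """t = B / Std. Error, the test statistic for H0: coefficient = 0."""
    return b / std_error


t_values = {name: t_statistic(b, se) for name, (b, se) in COEFFICIENTS.items()}
for name, t in t_values.items():
    print(name, round(t, 3))
```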


Writing it in report 7!

This study examines whether several personal characteristics of workers can
explain job satisfaction in the firm. Job satisfaction was measured on a
9-point Likert scale in which 1 denotes extreme dissatisfaction and 9
denotes extreme satisfaction, working experience in years, monthly
income in RM, age in years, and sex as a categorical variable in which 0 is
for female and 1 for male. Initial analysis using the Pearson correlation
method found that the dependent variable had relatively strong
correlations with the independent variables, except age. The normality
assumption held but the error variance was not constant. (Assuming that a
transformation method was applied to deal with the error variance.) In
general, the analysis yielded a regression model that was significant at the
0.05 level, with an F value of 14.234. The derived model is

JobStat = 1.406 + 0.413Exp + 0.001Income – 0.003Age – 1.134Sex


Job satisfaction was found to be positively correlated with working

experience and income but had an inverse relationship with age and

sex. High job satisfaction was associated with high income, longer

working experience and young workers. This study also found that

female workers were more likely to be more satisfied than their male

counterparts. Taking the regression model as a whole it was found that

the four independent variables were able to explain 69.5% of the

variance in levels of job satisfaction among the studied workers. At the

individual level, income was the most significant variable in explaining

variations in levels of job satisfaction, followed by working experience

and sex, while age did not have significant impact.


Model Selection

Choose 'Stepwise' method. Repeat by using 'Remove', 'Backward', 'Forward' methods.

• Enter (Regression): all variables in a block are entered in a single step.
• Remove: all variables in a block are removed in a single step.
• Stepwise: At each step, the independent variable not in the model that has the smallest probability of F is entered, if that probability is sufficiently small. Variables already in the regression equation are removed if their probability of F becomes sufficiently large.
• Backward Elimination: all variables are entered into the equation and then sequentially removed. The variable with the smallest partial correlation with the dependent variable that meets the elimination criterion is removed first. The procedure is repeated until there are no variables in the model that satisfy the removal criteria.
• Forward Selection: variables are sequentially entered into the model. The variable with the largest positive or negative correlation with the dependent variable that satisfies the entry criterion enters first. The procedure is repeated until there are no variables that meet the entry criterion.


Variables Entered/Removeda
Model   Variables Entered            Variables Removed   Method
1       INCOME (RM)                  .                   Stepwise
2       Working Experience (Years)   .                   Stepwise
3       SEX                          .                   Stepwise
Method criteria for all three models: Probability-of-F-to-enter <= .050, Probability-of-F-to-remove >= .100.
a. Dependent Variable: Job Satisfaction

Model Summary
Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .719a   .517       .500                1.773
2       .803b   .645       .619                1.548
3       .834c   .695       .660                1.463
a. Predictors: (Constant), INCOME (RM)
b. Predictors: (Constant), INCOME (RM), Working Experience (Years)
c. Predictors: (Constant), INCOME (RM), Working Experience (Years), SEX

Models 1, 2 and 3 are all significant, but Model 3 is preferred because there is a large increase in R2
values from Model 2 to Model 3.

ANOVAd
Model          Sum of Squares   df   Mean Square   F        Sig.
1 Regression   94.261           1    94.261        29.979   .000a
  Residual     88.039           28   3.144
  Total        182.300          29
2 Regression   117.581          2    58.790        24.527   .000b
  Residual     64.719           27   2.397
  Total        182.300          29
3 Regression   126.657          3    42.219        19.727   .000c
  Residual     55.643           26   2.140
  Total        182.300          29
a. Predictors: (Constant), INCOME (RM)
b. Predictors: (Constant), INCOME (RM), Working Experience (Years)
c. Predictors: (Constant), INCOME (RM), Working Experience (Years), SEX
d. Dependent Variable: Job Satisfaction


Coefficientsa
                                 Unstandardized Coefficients   Standardized Coefficients
Model                            B         Std. Error          Beta                        t        Sig.
1  (Constant)                    1.875     .609                                            3.078    .005
   INCOME (RM)                   .001      .000                .719                        5.475    .000
2  (Constant)                    .344      .724                                            .475     .638
   INCOME (RM)                   .001      .000                .522                        3.990    .000
   Working Experience (Years)    .458      .147                .408                        3.119    .004
3  (Constant)                    1.320     .832                                            1.586    .125
   INCOME (RM)                   .001      .000                .501                        4.034    .000
   Working Experience (Years)    .410      .141                .365                        2.913    .007
   SEX                           -1.145    .556                -.230                       -2.059   .050
a. Dependent Variable: Job Satisfaction

method (remove constant)


Nonlinear Regression and Curve Estimation

Open SPSS file ‘Data18’ and create the scatter plot with

fitted line

The straight line does not fit the data well. There are two ways to solve
this problem:

1. Transform the data

using an appropriate

transformation (say

log)

2. Fit the data using curve

estimation


Transform the y and x values using logarithm: y => ln y and x => ln x

• Power transformation: Y = a*X^b
• Compound transformation: Y = a*b^X
• Logarithmic transformation: Y = a + b*lnX
• Exponential transformation: Y = a*exp(bX)
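The power model Y = a*X^b becomes a straight line after taking logarithms of both variables: ln Y = ln a + b*ln X. The sketch below fits that line by ordinary least squares and transforms the intercept back. The function name `fit_power` and the data are hypothetical; the data are generated exactly from Y = 100*X^(-1.5), so the fit should recover those values.

```python
import math


def fit_power(x, y):
    """Fit Y = a * X**b by least squares on ln Y = ln a + b * ln X."""
    ln_x = [math.log(v) for v in x]
    ln_y = [math.log(v) for v in y]
    n = len(x)
    mean_x = sum(ln_x) / n
    mean_y = sum(ln_y) / n
    s_xx = sum((u - mean_x) ** 2 for u in ln_x)
    s_xy = sum((u - mean_x) * (v - mean_y) for u, v in zip(ln_x, ln_y))
    b = s_xy / s_xx                 # slope on the log-log scale
    ln_a = mean_y - b * mean_x      # intercept on the log-log scale
    return math.exp(ln_a), b        # back-transform the intercept to a


# Hypothetical data generated from Y = 100 * X**-1.5
x = [1, 2, 3, 4, 5]
y = [100.0 * v ** -1.5 for v in x]
a, b = fit_power(x, y)
print(round(a, 4), round(b, 4))
```

Both X and Y must be strictly positive for the logarithms to exist, which is one practical limit of this transformation.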

Curve Estimation

Analyze menu → Regression → Curve Estimation
Move one variable to 'Dependent' and the regressor to 'Independent'
Select the types of models: Logarithmic, Inverse, Quadratic, Cubic, Power, Growth etc.
OK


Model Summary and Parameter Estimates

Model Summary Parameter Estimates

Equation R Square F df1 df2 Sig. Constant b1 b2 b3

Logarithmic .759 25.137 1 8 .001 799276.8 -431306

Inverse .894 67.131 1 8 .000 -206770 1165690

Quadratic .768 11.598 2 7 .006 1150714 -379446 29109.793

Cubic .851 11.460 3 6 .007 1811317 -1089058 199266.6 -11279.3

Power .895 68.545 1 8 .000 1990114 -4.253

Growth .918 89.780 1 8 .000 14.831 -1.227

Exponential .918 89.780 1 8 .000 2761657 -1.227

The independent variable is Distance.

The Power, Growth and Exponential methods are the most appropriate due to
their large R2 values and model simplicity.

Power method: Value = 1990114*Distance^(-4.253)


Writing it in report 8!

This study examines the relationship between property value and
distance, with the hypothesis that property values are highest in
areas near the city centre and decrease with distance from the city
centre. Initial analysis using simple linear regression yielded a
significant but weak relationship, with only about 60% of the variation in
property values explained by distance. Double-log or power
transformation (growth and exponential as well) using the natural logarithm
for both variables resulted in a better R2 of 0.895, indicating that
approximately 90% of the variation is explained. This study concludes that
there is a non-linear relationship between property values and distance
from the city centre, with the following equation:

VALUE = 1990114 / DISTANCE^4.253

where VALUE is average property value in RM per 1000m3 and
DISTANCE is distance in kilometres from the city centre.


Dummy Variable

A dummy variable is a variable measured on a categorical (nominal) scale.

It may consist of 2 groups (dichotomy), e.g. sex, or more than 2 groups.

A dummy variable acts as an independent variable.


Open SPSS file 'Data10b': the scatter plot indicates that the perception
of the public has a strong negative relationship with age. Careful
investigation reveals that the dummy variable, family background, could
possibly influence public opinion.


Method 1: split the variables into two groups manually,

based on non-drug addict family and drug-addict family

Coefficientsa

Unstandardized Standardized

Coefficients Coefficients

Model B Std. Error Beta t Sig.

1 (Constant) 9.043 .597 15.158 .000

Age -.104 .014 -.713 -7.686 .000

Family -2.647 .453 -.542 -5.842 .000

a. Dependent Variable: Perception


Non-drug addict family (AGE0, PERCEP0)

Model Summary
Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .854a   .730       .713                1.290
a. Predictors: (Constant), AGE0

ANOVAb
Model          Sum of Squares   df   Mean Square   F        Sig.
1 Regression   71.889           1    71.889        43.224   .000a
  Residual     26.611           16   1.663
  Total        98.500           17
a. Predictors: (Constant), AGE0
b. Dependent Variable: PERCEP0

Drug-addict family (AGE1, PERCEP1)

Model Summary
Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .760a   .578       .552                1.356
a. Predictors: (Constant), AGE1

ANOVAb
Model          Sum of Squares   df   Mean Square   F        Sig.
1 Regression   40.355           1    40.355        21.945   .000a
  Residual     29.423           16   1.839
  Total        69.778           17
a. Predictors: (Constant), AGE1
b. Dependent Variable: PERCEP1

Coefficientsa

Unstandardized Standardized

Coefficients Coefficients

Model B Std. Error Beta t Sig.

1 (Constant) 9.903 .782 12.665 .000

AGE0 -.127 .019 -.854 -6.575 .000

a. Dependent Variable: PERCEP0

Coefficientsa

Unstandardized Standardized

Coefficients Coefficients

Model B Std. Error Beta t Sig.

1 (Constant) 5.769 .693 8.326 .000

AGE1 -.085 .018 -.760 -4.685 .000

a. Dependent Variable: PERCEP1


Method 2: Use split file command of the SPSS (see module 1)

Model Summary

Change Statistics

Adjusted Std. Error of R Square

Family Model R R Square R Square the Estimate Change F Change df1 df2 Sig. F Change

non-drug addict family 1 .854a .730 .713 1.290 .730 43.224 1 16 .000

drug-addict family 1 .760a .578 .552 1.356 .578 21.945 1 16 .000

a. Predictors: (Constant), Age

ANOVAb

Sum of

Family Model Squares df Mean Square F Sig.

non-drug addict family 1 Regression 71.889 1 71.889 43.224 .000a

Residual 26.611 16 1.663

Total 98.500 17

drug-addict family 1 Regression 40.355 1 40.355 21.945 .000a

Residual 29.423 16 1.839

Total 69.778 17

a. Predictors: (Constant), Age

b. Dependent Variable: Perception

Coefficientsa

Unstandardized Standardized

Coefficients Coefficients

Family Model B Std. Error Beta t Sig.

non-drug addict family 1 (Constant) 9.903 .782 12.665 .000

Age -.127 .019 -.854 -6.575 .000

drug-addict family 1 (Constant) 5.769 .693 8.326 .000

Age -.085 .018 -.760 -4.685 .000

a. Dependent Variable: Perception

Non-drug addict: Perception = 9.903 – 0.127AGE
Drug addict: Perception = 5.769 – 0.085AGE


Dummy Variable with 3 or more categories

Need special treatment in the SPSS.

The # of dummy variables created = #

categories in the variable – 1.

1) Sex (# categories=2) => 1 dummy variable (0=female, 1=male)

2) Agreement (3 categories) => 2 dummy variables

3) Ethnic (4 categories) => 3 dummy variables

Agreement   Variable 1   Variable 2
Agree       1            0
Disagree    0            1
Neutral     0            0

Ethnic      Variable 1   Variable 2   Variable 3
Malay       1            0            0
Chinese     0            1            0
Indian      0            0            1
Others      0            0            0
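The coding in the tables above follows one rule: a variable with k categories gets k – 1 indicator columns, and the last category is the reference group coded as all zeros. A minimal sketch of that rule; the function name `dummy_code` is illustrative and is not an SPSS command.

```python
def dummy_code(value, categories):
    """Return k-1 indicator values for `value`.

    The last entry of `categories` is the reference group (all zeros).
    """
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1 if value == category else 0 for category in categories[:-1]]


AGREEMENT = ["Agree", "Disagree", "Neutral"]        # 3 categories -> 2 dummies
ETHNIC = ["Malay", "Chinese", "Indian", "Others"]   # 4 categories -> 3 dummies

for group in ETHNIC:
    print(group, dummy_code(group, ETHNIC))
for answer in AGREEMENT:
    print(answer, dummy_code(answer, AGREEMENT))
```

Using one dummy per category instead of k – 1 would make the columns sum to the constant column, so X'X would be singular.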


Model Summary
Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .895a   .800       .774                1.177

a. Predictors: (Constant), Ethnic2, Age, Ethnic1, Family

ANOVAb

Sum of

Model Squares df Mean Square F Sig.

1 Regression 172.013 4 43.003 31.031 .000a

Residual 42.960 31 1.386

Total 214.972 35

a. Predictors: (Constant), Ethnic2, Age, Ethnic1, Family

b. Dependent Variable: Perception

Coefficientsa

Unstandardized Standardized

Coefficients Coefficients

Model B Std. Error Beta t Sig.

1 (Constant) 8.139 .631 12.907 .000

Age -.084 .013 -.579 -6.445 .000

Family -1.705 .483 -.349 -3.528 .001

Ethnic1 .663 .445 .136 1.492 .146

Ethnic2 -1.378 .543 -.278 -2.539 .016

a. Dependent Variable: Perception

Malay: Perception = 8.139 – 0.084AGE – 1.705FAMILY + 0.663(1) – 1.378(0)

= 8.802 – 0.084AGE – 1.705FAMILY
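Substituting the dummy values into the fitted model gives one equation per group, differing only in the intercept. The sketch below uses the coefficients from the table above. Only the Malay coding (Ethnic1 = 1, Ethnic2 = 0) is stated on the slide; the labels for the other two codings are left generic here because the slide does not name them.

```python
# Coefficients copied from the Coefficients table
COEF = {
    "const": 8.139,
    "age": -0.084,
    "family": -1.705,
    "ethnic1": 0.663,
    "ethnic2": -1.378,
}


def group_intercept(ethnic1, ethnic2):
    """Intercept of the Perception equation for a given ethnic dummy coding."""
    return COEF["const"] + COEF["ethnic1"] * ethnic1 + COEF["ethnic2"] * ethnic2


def perception(age, family, ethnic1, ethnic2):
    """Predicted perception score from the fitted model."""
    return group_intercept(ethnic1, ethnic2) + COEF["age"] * age + COEF["family"] * family


print("Ethnic1=1, Ethnic2=0 (Malay):", round(group_intercept(1, 0), 3))
print("Ethnic1=0, Ethnic2=1:", round(group_intercept(0, 1), 3))
print("Ethnic1=0, Ethnic2=0 (reference group):", round(group_intercept(0, 0), 3))
```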


Writing it in report 9!

One of the objectives of this study is to examine whether several personal

background characteristics of the surveyed public influence the shaping of

their opinion towards the proposal that drug addicts be sent to an

uninhabited island as a measure to rehabilitate them. There are three

independent variables, namely age, family background and ethnicity,

where the last two are dummy variables. Analysis using multiple linear

regression taking perception as the dependent variable yielded a

significant model (F=31.03, p=0.000) with R2=0.80, implying that 80% of the
variation in perception is explained by the independent variables. This
implies that the independent variables exerted strong and significant
influences in shaping public opinion. The derived model is as follows:

PERCEPTION = 8.139 – 0.084*AGE – 1.705*FAMILY + 0.663*ETHNIC1 – 1.378*ETHNIC2


Regardless of family background and ethnicity, age had an inverse but

strong (WHY?) and significant influence on perception in that older

respondents tended to be less favorable towards the proposal. In terms

of ethnic influence, the regression model indicated that the Malays

generally showed more favorable response to the proposal followed by

the Indians and the Chinese in that order. In terms of family experience

in drug addiction, respondents from non-drug addict families were more

favorable to the proposal of sending drug addicts to the island compared

with those from drug addict families.


Logit Analysis/ Logistic Regression

It is used when the dependent variable is in

categorical scale.

It is a linear probability method of predicting the

category of outcome for individual cases or

observations.

Advantage: not requiring assumptions of

multivariate normality and equal variance-

covariance


Open ‘Data19[dengue fine]’

Fine=whether the house was fined by the local authority (1=fined, 0=not fined)

Size=size of the house indicated by the floor area

Age=average age of family members

Pot=number of outdoor flower pots in the house compound

Family=total number of persons in the family

Helper=whether the house employs a housemaid (1=yes, 0=no)

House1=house location (1=squatter, 0=other types of houses)

House2=house location (1=flat, 0=other types of houses)


Analyze menu → Regression → Binary Logistic
Move one variable to 'Dependent' and the regressors to 'Covariates'
cases', 'include constant in model'


The prediction equation for the probability of a household being fined by the
local authority is:

$$\text{Prob}(y=1) = \frac{\exp(z)}{1+\exp(z)}$$

where

$$z = 6.141 - 0.001\,\text{Size} - 0.269\,\text{Age} - 0.049\,\text{Pot} + 0.324\,\text{Family} - 0.741\,\text{Help} + 4.773\,\text{H1} + 3.879\,\text{H2}$$

outcome A is associated with the presence or absence of factor B in a

given population.

None of the factors is found to be significant (all p > 0.05); equivalently,
exp(0)=1 lies within all the 95% CIs.
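The prediction equation can be evaluated for any household by computing the linear part first and then applying the logistic function. A minimal sketch using the coefficients from the slide; the function names and the example household are hypothetical.

```python
import math


def linear_predictor(size, age, pot, family, helper, house1, house2):
    """Linear part of the fitted logit model (coefficients from the slide)."""
    return (6.141 - 0.001 * size - 0.269 * age - 0.049 * pot
            + 0.324 * family - 0.741 * helper
            + 4.773 * house1 + 3.879 * house2)


def prob_fined(**household):
    """Prob(y=1) = exp(z) / (1 + exp(z)), written in the equivalent form 1 / (1 + exp(-z))."""
    z = linear_predictor(**household)
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical household: floor area 1000, average age 30, 5 flower pots,
# 4 family members, no housemaid, neither squatter nor flat.
base = dict(size=1000, age=30, pot=5, family=4, helper=0, house1=0, house2=0)
squatter = dict(base, house1=1)

p_base = prob_fined(**base)
p_squatter = prob_fined(**squatter)
print(round(p_base, 3), round(p_squatter, 3))
```

A case is classified as 'fined' when the predicted probability exceeds the cut value of 0.5, which corresponds to z > 0.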

Hosmer and Lemeshow Test
Step   Chi-square   df   Sig.
1      23.908       8    .002

Tests the null hypothesis that predictions made by the model fit
perfectly with observed group memberships.
Since p=0.002 < 0.05, we reject the null hypothesis and conclude that
predictions made by the model do not fit with observed group memberships.

The model is significant

Model Summary
Step   -2 Log likelihood   Cox & Snell R Square   Nagelkerke R Square
1      14.583a             .498                   .665
a. Estimation terminated at iteration number 7 because
parameter estimates changed by less than .001.

-2 Log likelihood measures how poorly the model predicts the decisions. The
smaller the statistic the better the model.

Cox & Snell R square and Nagelkerke R square can be interpreted like R
square in multiple linear regression. Cox & Snell R2 cannot reach the
maximum of 1, whereas Nagelkerke R2 can reach 1.


Classification Tablea
                                   Predicted FINE
Observed                           not fined   fined   Percentage Correct
Step 1   FINE   not fined          10          1       90.9
                fined              1           9       90.0
         Overall Percentage                            90.5
a. The cut value is .500

Out of 11 households that were actually not fined, 10 were correctly
predicted not fined (90.9% accuracy).

Casewise Lista
Case   Status   FINE   Predicted   Predicted Group   Resid    ZResid
1      S        f      .804        f                 .196     .494
2      S        f      .909        f                 .091     .317
3      S        f      .666        f                 .334     .707
4      S        f      .890        f                 .110     .351
5      S        f      .954        f                 .046     .220
6      S        f      .999        f                 .001     .028
7      S        f      .999        f                 .001     .028
8      S        f**    .033        n                 .967     5.444
9      S        f      .717        f                 .283     .628
10     S        f      .935        f                 .065     .264
11     S        n      .387        n                 -.387    -.794
12     S        n      .196        n                 -.196    -.493
13     S        n      .185        n                 -.185    -.477
14     S        n      .001        n                 -.001    -.030
15     S        n      .007        n                 -.007    -.084
16     S        n**    .550        f                 -.550    -1.105
17     S        n      .016        n                 -.016    -.126
18     S        n      .302        n                 -.302    -.658
19     S        n      .116        n                 -.116    -.362
20     S        n      .196        n                 -.196    -.494
21     S        n      .139        n                 -.139    -.401
a. S = Selected, U = Unselected cases, and ** = Misclassified cases.

Household no. 8 was misclassified. It was actually fined, but predicted not
fined (predicted probability of being fined = 0.033).

Household no. 16 was misclassified. It was actually not fined, but predicted
fined (predicted probability of being fined = 0.55).
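The percentages in the Classification Table follow directly from the four cell counts. A minimal sketch, assuming the counts reported above; the function name `classification_rates` is illustrative.

```python
def classification_rates(table):
    """Return per-group and overall percentage correct.

    `table[observed][predicted]` holds the number of cases in each cell.
    """
    per_group = {}
    correct = 0
    total = 0
    for observed, row in table.items():
        row_total = sum(row.values())
        per_group[observed] = 100.0 * row[observed] / row_total
        correct += row[observed]
        total += row_total
    return per_group, 100.0 * correct / total


# Cell counts from the Classification Table (cut value .500)
TABLE = {
    "not fined": {"not fined": 10, "fined": 1},
    "fined": {"not fined": 1, "fined": 9},
}

per_group, overall = classification_rates(TABLE)
print({k: round(v, 1) for k, v in per_group.items()}, round(overall, 1))
```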


Writing it in report 10!

This study is concerned with the dengue fever epidemics and how the

households in a local community responded to measures adopted by the

local health authority in imposing fines on households found to allow their

house compounds to become the breeding ground of the aedes mosquito

which spreads the viruses. The dependent variable took the form of a

dummy with value 1 denoting the households that were fined by the local

health authority and value 0 for those which were not fined. The objective

was to examine whether a set of predictor variables comprising

household characteristics such as house and family sizes, average age of

family members, whether the households employed housemaids, and

locations of the houses could be used to predict the probability of being

fined. The study found that of the six predictor variables entered in the

analysis, only house type was found to be significant at least at the 0.05

level. Other variables were not significant owing to the small sample size

used in this study.


The parameter estimates showed that houses of smaller sizes, with more

members, occupied by family members dominated by the young, which

did not employ housemaids, and were located in squatter and flat areas

had greater probability of being negligent and thus fined by the local

authority. The Hosmer and Lemeshow goodness-of-fit test was significant at
the 0.05 level (chi-square = 23.908, p=0.002), indicating that the model
predictions did not fit the observed group memberships well. The
classification table indicated that out of the total 11 households that were

actually not fined by the local authority, 10 households were predicted not

fined by the logit model resulting in a 90.9% success, and out of the 10

households that were actually fined, 9 households were predicted fined

resulting in a 90% success, giving the overall model predictability of

90.5%. The study concludes that the household characteristics selected

were able to predict the probability of the households being fined by the

local health authority for failing to prevent the spread of dengue fever in

the community under study.


Principal Component Analysis and Factor Analysis

Comparing PCA and Factor Analysis

PCA: to identify a relatively small number of factors that can be used to represent relationships among sets of many interrelated variables (reduce dimension).
FA: same

PCA: identify the underlying, not directly observable, constructs (hidden variables).
FA: same

PCA: all the variations in a given population are contained within the variables used to define that population.
FA: only part of the variation in a given population is contained within the variables used to define that population.

PCA: used in deterministic approach studies.
FA: used in studies that use a more flexible experimental approach.

PCA: Objective: to select a number of components that explain as much of the total variance as possible.
FA: Objective: the factors (components) are selected mainly to explain the interrelationship among the original variables.

PCA: Its value for a given individual is relatively simple to compute and interpret.
FA: Emphasis on obtaining easily understandable factors that convey the essential information contained in the original set of variables.


Open SPSS file 'Data20'
Analyze menu → Data Reduction → Factor
Move all selected variables to 'Variables'
Descriptives: select 'Initial solution', 'Coefficients' and 'Univariate descriptives' → Continue
Extraction: choose 'Principal Components' method; select 'Correlation matrix', 'Unrotated factor solution', 'Scree plot' and 'Eigenvalues over: 1' → Continue
Rotation: choose 'Varimax' method; select 'Rotated solution', 'Loading plots' → Continue
Scores: save the factor scores as variables (method: Regression) and display the score coefficient matrix → Continue
Options: choose the listwise option for missing values and click 'Sorted by size' → Continue

For Factor Analysis, choose 'Principal axis factoring', 'Alpha factoring' or 'Image factoring' as the extraction method.


Economic Sector Variable Definition

Employment Sector Variable

Agriculture, hunting and forestry AGRIC

Fishing FISH

Mining and quarrying MINE

Manufacturing MANU

Electricity, gas and water supply ELECT

Construction CONST

Wholesale, retail, repair of motor vehicle, personnel WSALE

Hotels and restaurants HOTEL

Transport, storage and communication TRANS

Financial intermediation FINANCE

Real estate, renting and business activities ESTATE

Public admin, defence, social security ADMIN

Education EDU

Health and social work HEALTH

Other community, social and personal services OTHERS

Housemaid MAID


Step #17: saving the factor scores as variables yields the table below.
There are only 4 optimum factors (FAC1_1 to FAC4_1). These factor
scores are listed after the last variable in the Data View window.


Determine the number of components using Scree Plot

component. It appears that 3 (or

perhaps 4) sample components

effectively summarize the total

sample variance.


Determine the number of components using % of variance

The first 4 components contributed 82.685% cumulative % of variance, whereas
the contribution of the 5th component is small (5.231% of variance), which is
not significant.

Step #10 Extract:
• If you put the 'eigenvalues over' = 1, the optimum # of components is obtained
• If the eigenvalues over = 0, the # of components = the number of variables
• You can also set the number of components desired

Total Variance Explained
            Initial Eigenvalues                        Extraction Sums of Squared Loadings        Rotation Sums of Squared Loadings
Component   Total       % of Variance   Cumulative %   Total   % of Variance   Cumulative %       Total   % of Variance   Cumulative %
1           6.738       42.112          42.112         6.738   42.112          42.112             5.790   36.185          36.185
2           3.023       18.892          61.004         3.023   18.892          61.004             3.147   19.672          55.857
3           2.432       15.199          76.203         2.432   15.199          76.203             2.994   18.712          74.568
4           1.037       6.482           82.685         1.037   6.482           82.685             1.299   8.117           82.685
5           .837        5.231           87.916
6           .767        4.796           92.711
7           .414        2.585           95.297
8           .297        1.854           97.151
9           .208        1.301           98.452
10          .131        .821            99.273
11          .064        .399            99.672
12          .042        .260            99.933
13          .011        .067            100.000
14          1.10E-016   6.85E-016       100.000
15          -6.6E-017   -4.11E-016      100.000
16          -1.9E-016   -1.17E-015      100.000
Extraction Method: Principal Component Analysis.

The 'Rotation Sums of Squared Loadings' columns show the contributions of the
first 4 components after rotation.
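The '% of Variance' and 'Cumulative %' columns can be reproduced from the eigenvalues alone, because for a correlation matrix the total variance equals the number of variables (16 here). The sketch below also applies the 'eigenvalues over 1' rule. The eigenvalues are the rounded values from the table, with the last three (which are zero up to numerical error) entered as 0.0, so the results match the SPSS output only approximately; the function names are illustrative.

```python
# Initial eigenvalues copied (rounded) from the Total Variance Explained table
EIGENVALUES = [6.738, 3.023, 2.432, 1.037, 0.837, 0.767, 0.414, 0.297,
               0.208, 0.131, 0.064, 0.042, 0.011, 0.0, 0.0, 0.0]


def variance_explained(eigenvalues):
    """Return (% of variance, cumulative %) lists for a correlation-matrix PCA."""
    total_variance = len(eigenvalues)  # each standardized variable contributes 1
    percent = [100.0 * ev / total_variance for ev in eigenvalues]
    cumulative = []
    running = 0.0
    for p in percent:
        running += p
        cumulative.append(running)
    return percent, cumulative


def components_over(eigenvalues, threshold=1.0):
    """Number of components kept by the 'eigenvalues over' rule."""
    return sum(1 for ev in eigenvalues if ev > threshold)


percent, cumulative = variance_explained(EIGENVALUES)
n_kept = components_over(EIGENVALUES)
print(n_kept, round(cumulative[n_kept - 1], 2))
```

Setting the threshold to 0 would keep every component with a positive eigenvalue, which is why that option returns as many components as there are non-redundant variables.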

Component matrix before rotation

Component Matrixa
          Component
          1       2       3       4
FINANCE   .932    .286    .011    -.148
ESTATE    .929    .242    .056    .084
MAID      .910    .180    -.327   -.027
TRANS     .839    .053    -.241   .198
AGRIC     -.797   .164    -.435   -.277
OTHERS    .761    .136    -.385   -.121
WSALE     .687    .425    .269    -.329
EDU       -.636   .362    .429    -.144
MANU      .330    -.815   .203    .364
CONST     -.268   .778    .349    .100
ELECT     .599    .677    -.032   .188
ADMIN     -.528   .565    .283    -.072
HOTEL     .167    -.293   .770    .063
HEALTH    .410    -.188   .656    -.382
FISH      -.563   .204    -.583   .040
MINE      -.233   .529    .251    .643
Extraction Method: Principal Component Analysis.
a. 4 components extracted.

Component matrix after rotation

Rotated Component Matrixa
          Component
          1       2       3       4
FINANCE   .939    -.038   .271    -.121
MAID      .939    -.274   -.020   -.105
ESTATE    .901    -.139   .304    .091
ELECT     .826    .255    -.015   .325
OTHERS    .805    -.245   -.105   -.205
TRANS     .795    -.401   .049    .089
WSALE     .743    .310    .392    -.179
MANU      -.135   -.823   .488    .112
CONST     .044    .797    .002    .413
ADMIN     -.259   .756    -.078   .197
EDU       -.472   .707    .075    .106
HOTEL     -.149   -.049   .821    .111
HEALTH    .146    .070    .810    -.313
FISH      -.277   .143    -.776   .025
AGRIC     -.519   .355    -.691   -.235
MINE      -.034   .360    -.047   .823
Extraction Method: Principal Component Analysis.
Rotation Method: Varimax with Kaiser Normalization.
a. Rotation converged in 7 iterations.

Component 1 consists of variables 'Finance', 'Maid', 'Estate', 'Elect', 'Others',
'Trans' and 'Wsale' because they have the largest coefficient values as compared to
components 2, 3 and 4.

Component 2 consists of variables Manu, Const, Admin and Edu.

Component 3 consists of variables Hotel, Health, Fish and Agric.

Component 4 consists of variable Mine only.
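The grouping rule used above, where each variable is assigned to the component on which it has the largest loading in absolute value, can be sketched as follows. The loadings are copied from the rotated component matrix; the function name `assign_components` is illustrative.

```python
# Rotated loadings (components 1-4) copied from the Rotated Component Matrix
ROTATED = {
    "FINANCE": (0.939, -0.038, 0.271, -0.121),
    "MAID": (0.939, -0.274, -0.020, -0.105),
    "ESTATE": (0.901, -0.139, 0.304, 0.091),
    "ELECT": (0.826, 0.255, -0.015, 0.325),
    "OTHERS": (0.805, -0.245, -0.105, -0.205),
    "TRANS": (0.795, -0.401, 0.049, 0.089),
    "WSALE": (0.743, 0.310, 0.392, -0.179),
    "MANU": (-0.135, -0.823, 0.488, 0.112),
    "CONST": (0.044, 0.797, 0.002, 0.413),
    "ADMIN": (-0.259, 0.756, -0.078, 0.197),
    "EDU": (-0.472, 0.707, 0.075, 0.106),
    "HOTEL": (-0.149, -0.049, 0.821, 0.111),
    "HEALTH": (0.146, 0.070, 0.810, -0.313),
    "FISH": (-0.277, 0.143, -0.776, 0.025),
    "AGRIC": (-0.519, 0.355, -0.691, -0.235),
    "MINE": (-0.034, 0.360, -0.047, 0.823),
}


def assign_components(loadings):
    """Group variables by the component with the largest absolute loading."""
    groups = {}
    for variable, row in loadings.items():
        best = max(range(len(row)), key=lambda j: abs(row[j]))
        groups.setdefault(best + 1, []).append(variable)  # components numbered from 1
    return groups


groups = assign_components(ROTATED)
for component in sorted(groups):
    print(component, groups[component])
```

The sign of the loading is ignored when grouping but matters for interpretation: MANU loads negatively on component 2, which is why that component is read as a lack of manufacturing.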


Hidden Variables:
• Component 1: Dominated by Tertiary Employment (Y1)
• Component 2: Lack in Manufacturing (Y2)
• Component 3: Dominant in Tourist-Related Activities (Y3)
• Component 4: Dependent on Mining (Y4)

Instead of using 16 variables, the researcher may use these 4 components to
study the problem concerned.

Rotated Component Matrixa
          Component
          1       2       3       4
FINANCE   .939    -.038   .271    -.121
MAID      .939    -.274   -.020   -.105
ESTATE    .901    -.139   .304    .091
ELECT     .826    .255    -.015   .325
OTHERS    .805    -.245   -.105   -.205
TRANS     .795    -.401   .049    .089
WSALE     .743    .310    .392    -.179
MANU      -.135   -.823   .488    .112
CONST     .044    .797    .002    .413
ADMIN     -.259   .756    -.078   .197
EDU       -.472   .707    .075    .106
HOTEL     -.149   -.049   .821    .111
HEALTH    .146    .070    .810    -.313
FISH      -.277   .143    -.776   .025
AGRIC     -.519   .355    -.691   -.235
MINE      -.034   .360    -.047   .823
Extraction Method: Principal Component Analysis.
Rotation Method: Varimax with Kaiser Normalization.
a. Rotation converged in 7 iterations.

Y1 = 0.939FINANCE + 0.939MAID + 0.901ESTATE + 0.826ELECT
     + 0.805OTHERS + 0.795TRANS + 0.743WSALE

Y2 = –0.823MANU + 0.797CONST + 0.756ADMIN + 0.707EDU

Y3 = 0.821HOTEL + 0.810HEALTH – 0.776FISH – 0.691AGRIC

Y4 = 0.823MINE


The component plot for the first 3 components from the
rotated component matrix, e.g. the FINANCE coordinate is
(0.939, -0.038, 0.271)


Scatter plot: the relative positions of states in terms of dominance of

employment in the tertiary sector (component 1) and problem of lack in

manufacturing (component 2). You may plot the scatter diagrams for any

combination of the components.

Plot using the factor scores obtained from Step #17

Writing it in report 11!

In this study, principal component analysis was used to portray the

basic structure of economic development of states in Malaysia

based on the percentage of employment in sixteen economic sub-

sectors, namely agriculture (including hunting & forestry), fishing,

mining (including quarrying), manufacturing, utility (including

electricity, gas & water supply), construction, wholesale (including

retail, repair of motor vehicle & personal trade), hotels and

restaurants, transport (including storage & communication),

financial intermediation, real estate (including renting and business

activities), administration, housemaid services, education, health

and other community, social and personal services. Initial analysis

showed variations in the values of the variables and correlations

among several variables, both indicating the possibility of using this

technique. Initial solution resulted in four components (factors) that

had eigenvalues of more than one with a total of 82.7% variance

explained.


The first component accounted for 42.1% of the variance explained,

the second component contributed 18.9%, the third component 15.2%

and the fourth component 6.5%. Rotation using the varimax method

did not result in an increase in total variance explained, but it

managed to raise slightly the percentage of variance explained for the

first, second and third components so that the contributions of these

components to the underlying economic structure of the country

studied were made clear and more significant.

Seven employment sub-sector variables, namely in finance,

housemaid services, real estate, utility, transport and communication,

wholesale and others loaded high on Component 1. Employment in

manufacturing, construction, administration and educational services

loaded high on Component 2. Employment in hotel, health services,

fishing and agriculture loaded high on Component 3. The only

variable contributing to the formation of Component 4 was

employment in mining. In this study, the first component was labeled

‘Dominant by tertiary employment’, the second component ‘Lack in

manufacturing’, the third ‘Dominant in tourist-related activities’ and the

fourth ‘Dependent on mining’.


Figure 1 maps out the position of each state in the employment

structure based on factor scores. Kuala Lumpur was positioned far

ahead of other states in the first component of employment

dimension, followed by Selangor in the middle, while other states

flocked together with generally low scores. On the other extreme,

Perlis and Kelantan, and to a lesser extent, Terengganu, showed not

only low scores for tertiary activities but also lack of employment in

manufacturing. Seen from the second figure which was defined as

the dominance of employment in tertiary activities and tourism,

Kuala Lumpur did quite well, positioned positively far from other

states. States such as Melaka, Penang and Kelantan did quite well

in tourism but greatly lacked in tertiary activities.


From the third figure where employment in tertiary activities was

seen together with mining, the position of Terengganu clearly

eclipsed that of other states. But because several other sub-sectors

such as utility, construction and agriculture too loaded (either

positively or negatively) quite heavily on this component, Melaka

was also positioned near Terengganu. When scores for component 2

depicting lack in manufacturing and component 3 for significant

contribution to tourism in employment were put together, the position

of Kelantan, Perlis, Terengganu and Kuala Lumpur as a group

separated from other states was very clear. Relative to other states,

Penang did well both in manufacturing and tourism as shown by the

negative and low scores. When components 2 and 4 were plotted

together, Terengganu was found to be positioned far from other

states, indicating a problem of lack in manufacturing and dependence

on mining, while Kelantan and, to a lesser extent, Perlis were behind

other states in both sub-sectors. Finally, when components 3 and 4

were plotted in a two-dimensional map, two states were positioned

in two extremes where Melaka lacked in mining, fishing and

agriculture but did well in tourism, and Sabah did well in agriculture

but not quite well in mining.


Better tertiary employment



Clustering

Techniques


This procedure attempts to identify relatively

homogeneous groups of cases (or variables)

based on selected characteristics, using an

algorithm that starts with each case (or variable)

in a separate cluster and combines clusters until

only one is left

It can be used to reduce dimension or reduce

cases. Since the variables or cases in the same

group are homogeneous, we can choose only

one of them to represent the whole group


Measure

Allows you to specify the distance or similarity measure to be used

in clustering. Select the type of data and the appropriate

distance or similarity measure:

1. Interval: Available alternatives are Euclidean distance,

squared Euclidean distance, cosine, Pearson correlation,

Chebychev, block, Minkowski, and customized

2. Counts: Available alternatives are chi-square measure and phi-

square measure

3. Binary: Available alternatives are Euclidean distance, squared

Euclidean distance, size difference, pattern difference,

variance, dispersion, shape, simple matching, phi 4-point

correlation, lambda, Anderberg's D, dice, Hamann, Jaccard,

Kulczynski 1, Kulczynski 2, Lance and Williams, Ochiai, Rogers

and Tanimoto, Russel and Rao, Sokal and Sneath 1, Sokal and

Sneath 2, Sokal and Sneath 3, Sokal and Sneath 4, Sokal and

Sneath 5, Yule's Y, and Yule's Q


Euclidean Distance

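Let the data on m variables and n observations be recorded as follows:

\[
\begin{array}{c|cccc}
 & \text{Var } 1 & \text{Var } 2 & \cdots & \text{Var } m \\ \hline
\text{Obs } 1 & x_{11} & x_{12} & \cdots & x_{1m} \\
\text{Obs } 2 & x_{21} & x_{22} & \cdots & x_{2m} \\
\vdots & \vdots & \vdots & x_{ij} & \vdots \\
\text{Obs } n & x_{n1} & x_{n2} & \cdots & x_{nm}
\end{array}
\]

The Euclidean distance between observations (cases) h and k is

\[
d_{hk} = \sqrt{(x_{h1}-x_{k1})^2 + (x_{h2}-x_{k2})^2 + \cdots + (x_{hm}-x_{km})^2}, \qquad h, k = 1, 2, \ldots, n
\]

and the Euclidean distance between variables h and k is

\[
d_{hk} = \sqrt{(x_{1h}-x_{1k})^2 + (x_{2h}-x_{2k})^2 + \cdots + (x_{nh}-x_{nk})^2}, \qquad h, k = 1, 2, \ldots, m
\]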
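The two distance formulas can be checked with a few lines of Python. This is a minimal sketch using only the standard library; the function names and the small data set are illustrative assumptions, not part of the SPSS procedure.

```python
import math

def euclidean_distance(a, b):
    """Euclidean distance between two equal-length sequences of numbers."""
    if len(a) != len(b):
        raise ValueError("both vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def distance_matrix(vectors):
    """Symmetric matrix of pairwise Euclidean distances (zero diagonal)."""
    return [[euclidean_distance(u, v) for v in vectors] for u in vectors]

# Hypothetical data: 3 observations (rows) measured on 2 variables (columns).
data = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]

# Distances between cases compare the rows of the data matrix.
case_distances = distance_matrix(data)

# Distances between variables compare the columns (the transposed matrix).
variable_distances = distance_matrix(list(zip(*data)))
```

Transposing the data is all that separates clustering cases from clustering variables, which is why the slides describe both with the same formula.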


Distance Matrix

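For n cases the distance matrix is n × n (for m variables it is m × m):

\[
D = \begin{pmatrix}
d_{11} = 0 & d_{12} & \cdots & d_{1n} \\
d_{21} & 0 & \cdots & d_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
d_{n1} & d_{n2} & \cdots & d_{nn} = 0
\end{pmatrix}
\]

The diagonal entries = 0, which means there is no distance between a variable/case and itself.

The similarity (correlation) matrix of q variables is

\[
R = \begin{pmatrix}
1 & r_{12} & \cdots & r_{1q} \\
r_{21} & 1 & \cdots & r_{2q} \\
\vdots & \vdots & \ddots & \vdots \\
r_{q1} & r_{q2} & \cdots & 1
\end{pmatrix},
\qquad
r_{jk} = \frac{s_{jk}}{s_j s_k},
\]

where

\[
s_{jk} = \frac{1}{n} \sum_{h=1}^{n} (x_{hj} - \bar{x}_j)(x_{hk} - \bar{x}_k),
\qquad
s_j = \sqrt{s_{jj}}
\]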

Note: The distance matrix and similarity matrix can be obtained by

using the command Analyze => Correlate => Distances
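As a rough illustration of the similarity matrix, the sketch below computes the correlation between columns from the covariance with the 1/n divisor. The function names and the three data columns are assumptions made for the example.

```python
import math

def covariance(x, y):
    """Covariance s_jk with the 1/n divisor."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n

def correlation(x, y):
    """r_jk = s_jk / (s_j * s_k), where s_j is the square root of s_jj."""
    return covariance(x, y) / math.sqrt(covariance(x, x) * covariance(y, y))

def similarity_matrix(columns):
    """Correlation matrix R of the given variables (one list per variable)."""
    return [[correlation(c, d) for d in columns] for c in columns]

# Hypothetical variables: the second is a multiple of the first,
# the third runs in the opposite direction.
columns = [[1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0], [4.0, 3.0, 2.0, 1.0]]
R = similarity_matrix(columns)
```

The divisor (n or n - 1) cancels in the ratio, so the correlation is the same either way.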


Single (Nearest) Linkage

The single link method defines the similarity between the (j,k)-group and each remaining variable l as follows:

\[
r_{(jk)l} = \max\{ r_{jl}, r_{kl} \}, \qquad l \neq j, k; \quad 1 \le l \le n \ (\text{or } m)
\]

The complete link method defines the similarity between the (j,k)-group and each remaining variable l as follows:

\[
r_{(jk)l} = \min\{ r_{jl}, r_{kl} \}, \qquad l \neq j, k; \quad 1 \le l \le n \ (\text{or } m)
\]


Between (Average) Groups Linkage


Steps in SPSS (hierarchical cluster analysis of cases):

1. Open SPSS file ‘Data20’
2. Analyze menu => Classify => Hierarchical Cluster
3. Move all selected variables to ‘Variables’
4. Move the categorical variable to ‘Label Cases by’
5. Under Cluster, select ‘Cases’
6. Statistics => Agglomeration schedule => Continue
7. Plots => Dendrogram => Continue
8. Method => select cluster method ‘Furthest neighbor’ => select measure ‘Pearson Correlation’ => Continue
9. OK

Available cluster methods:

1. Between (average) groups linkage
2. Within (average) groups linkage
3. Nearest neighbor/Single linkage
4. Furthest neighbor/Complete linkage
5. Centroid clustering
6. Median clustering
7. Ward’s method

Proximity Matrix

Case 1:Johor 2:Kedah 3:Kelantan 4:Melaka 5:N.Sembilan 6:Pahang 7:Perak 8:Perlis 9:Penang 10:Sabah 11:Sarawak 12:Selangor 13:Terengganu 14:Kuala Lumpur

1:Johor 1.000 .949 .637 .973 .951 .680 .927 .698 .967 .526 .528 .930 .728 .618

2:Kedah .949 1.000 .817 .894 .985 .869 .975 .867 .842 .749 .759 .818 .831 .523

3:Kelantan .637 .817 1.000 .601 .805 .965 .856 .979 .448 .914 .927 .531 .933 .486

4:Melaka .973 .894 .601 1.000 .929 .611 .905 .663 .973 .416 .424 .968 .735 .736

5:N.Sembilan .951 .985 .805 .929 1.000 .840 .981 .859 .863 .692 .710 .876 .843 .626

6:Pahang .680 .869 .965 .611 .840 1.000 .874 .973 .485 .953 .966 .520 .866 .393

7:Perak .927 .975 .856 .905 .981 .874 1.000 .902 .825 .752 .755 .847 .891 .647

8:Perlis .698 .867 .979 .663 .859 .973 .902 1.000 .515 .903 .924 .583 .944 .477

9:Penang .967 .842 .448 .973 .863 .485 .825 .515 1.000 .303 .301 .950 .585 .655

10:Sabah .526 .749 .914 .416 .692 .953 .752 .903 .303 1.000 .989 .339 .764 .235

11:Sarawak .528 .759 .927 .424 .710 .966 .755 .924 .301 .989 1.000 .344 .783 .225

12:Selangor .930 .818 .531 .968 .876 .520 .847 .583 .950 .339 .344 1.000 .691 .825

13:Terengganu .728 .831 .933 .735 .843 .866 .891 .944 .585 .764 .783 .691 1.000 .631

14:Kuala Lumpur .618 .523 .486 .736 .626 .393 .647 .477 .655 .235 .225 .825 .631 1.000

This is a similarity matrix

Agglomeration Schedule (complete linkage)

         Cluster Combined                   Stage Cluster First Appears
Stage    Cluster 1   Cluster 2   Coefficients   Cluster 1   Cluster 2   Next Stage
1        10          11          .989           0           0           9
2        2           5           .985           0           0           4
3        3           8           .979           0           0           7
4        2           7           .975           2           0           10
5        1           4           .973           0           0           6
6        1           9           .967           5           0           8
7        3           6           .965           3           0           9
8        1           12          .930           6           0           12
9        3           10          .903           7           1           11
10       2           13          .831           4           0           11
11       2           3           .692           10          9           13
12       1           14          .618           8           0           13
13       1           2           .225           12          11          0

Start with 14 clusters (states). First combine cluster 10 and cluster 11 because they have the largest similarity value (0.989). This is followed by combining clusters 2 and 5, and so on.
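The way the schedule is built can be traced on a small part of the proximity matrix. The sketch below is an illustrative re-implementation of complete linkage on similarities, not the SPSS algorithm itself. It uses only five states, with values copied from the proximity matrix, and the function names are assumptions.

```python
# Similarities copied from the proximity matrix (five of the fourteen states).
SIMILARITY = {
    frozenset(("Kelantan", "Pahang")): 0.965,
    frozenset(("Kelantan", "Perlis")): 0.979,
    frozenset(("Kelantan", "Sabah")): 0.914,
    frozenset(("Kelantan", "Sarawak")): 0.927,
    frozenset(("Pahang", "Perlis")): 0.973,
    frozenset(("Pahang", "Sabah")): 0.953,
    frozenset(("Pahang", "Sarawak")): 0.966,
    frozenset(("Perlis", "Sabah")): 0.903,
    frozenset(("Perlis", "Sarawak")): 0.924,
    frozenset(("Sabah", "Sarawak")): 0.989,
}

def complete_link(cluster_a, cluster_b):
    """Complete linkage: similarity of two clusters is the smallest pairwise value."""
    return min(SIMILARITY[frozenset((a, b))] for a in cluster_a for b in cluster_b)

def agglomerate(items, link):
    """Repeatedly merge the two most similar clusters; return the merge schedule."""
    clusters = [(item,) for item in items]
    schedule = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                value = link(clusters[i], clusters[j])
                if best is None or value > best[0]:
                    best = (value, i, j)
        value, i, j = best
        schedule.append((clusters[i], clusters[j], value))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return schedule

states = ["Kelantan", "Pahang", "Perlis", "Sabah", "Sarawak"]
schedule = agglomerate(states, complete_link)
coefficients = [round(value, 3) for _, _, value in schedule]
```

On this subset the merge coefficients come out as .989, .979, .965 and .903, the same values as stages 1, 3, 7 and 9 of the schedule above. Replacing `min` with `max` in `complete_link` would give single linkage instead.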


Horizontal Icicle

Number of clusters

Case 1 2 3 4 5 6 7 8 9 10 11 12 13

11:Sarawak X X X X X X X X X X X X X

X X X X X X X X X X X X X

10:Sabah X X X X X X X X X X X X X

X X X X X

6:Pahang X X X X X X X X X X X X X

X X X X X X X

8:Perlis X X X X X X X X X X X X X

X X X X X X X X X X X

3:Kelantan X X X X X X X X X X X X X

X X X

13:Terengganu X X X X X X X X X X X X X

X X X X

7:Perak X X X X X X X X X X X X X

X X X X X X X X X X

5:N.Sembilan X X X X X X X X X X X X X

X X X X X X X X X X X X

2:Kedah X X X X X X X X X X X X X

X

14:Kuala Lumpur X X X X X X X X X X X X X

X X

12:Selangor X X X X X X X X X X X X X

X X X X X X

9:Penang X X X X X X X X X X X X X

X X X X X X X X

4:Melaka X X X X X X X X X X X X X

X X X X X X X X X

1:Johor X X X X X X X X X X X X X

1 cluster 3 clusters

2 clusters 4 clusters


Dendrogram

Cluster 1: Sabah,

Sarawak, Kelantan,

Perlis, Pahang

Cluster 2: Kedah, N. Sembilan,

Perak, Terengganu

Cluster 3: Johor, Melaka,

Penang, Selangor

Cluster 4: KL


The number of clusters can be adjusted according to the

desired degrees of similarity




Clustering variable


K-Means Clustering

This procedure attempts to identify relatively

homogeneous groups of cases based on selected

characteristics, using an algorithm that can handle large

numbers of cases. However, the algorithm requires you

to specify the number of clusters

You can specify initial cluster centers if you know this

information

You can select one of two methods for classifying cases,

either updating cluster centers (centers will change)

iteratively or classifying only

The k-means cluster analysis command is efficient

primarily because it does not compute the distances

between all pairs of cases, as do many clustering

algorithms
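The "iterate and classify" idea can be sketched in a few lines. This is a minimal illustration of the usual k-means loop, assuming Euclidean distance and user-supplied initial centers. The function names and the toy data are hypothetical, and SPSS's own implementation may differ in details such as how initial centers are chosen.

```python
import math

def nearest_center(point, centers):
    """Index of the center closest to the point (Euclidean distance)."""
    distances = [math.dist(point, center) for center in centers]
    return distances.index(min(distances))

def k_means(points, initial_centers, max_iterations=10):
    """Alternate between classifying cases and updating the cluster centers."""
    centers = [list(center) for center in initial_centers]
    for _ in range(max_iterations):
        labels = [nearest_center(p, centers) for p in points]
        updated = []
        for k, old_center in enumerate(centers):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                # New center = mean of each variable over the cluster's members.
                updated.append([sum(col) / len(members) for col in zip(*members)])
            else:
                updated.append(old_center)  # keep the center of an empty cluster
        if updated == centers:  # no change in cluster centers: converged
            break
        centers = updated
    labels = [nearest_center(p, centers) for p in points]
    return centers, labels

# Hypothetical data: two well-separated groups of three cases each.
points = [[1, 1], [1, 2], [2, 1], [8, 8], [9, 8], [8, 9]]
centers, labels = k_means(points, initial_centers=[[1, 1], [9, 9]])
```

Each pass only measures the distance from every case to the k centers, never between pairs of cases, which is the efficiency argument made above.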


It is a tool designed to assign cases to a fixed

number of groups (clusters) whose

characteristics are not yet known but are based

on a set of specified variables. It is most useful

when you want to classify a large number

(thousands) of cases

A good cluster analysis is:

Efficient. Uses as few clusters as possible

Effective. Captures all statistically and commercially

important clusters. For example, a cluster with five

customers may be statistically different but not very

profitable


Steps in SPSS (K-means cluster analysis):

1. Analyze menu => Classify => K-Means Cluster
2. Move all selected variables to ‘Variables’
3. Select the number of clusters needed
4. Choose the method ‘Iterate and classify’
5. Iterate => define the maximum number of iterations => Continue
6. Save => click the boxes to save => Continue
7. Options => select ‘Statistics’ => Continue
8. Create a new file to store results
9. OK


Initial Cluster Centers

           Cluster
           1       2       3
AGRIC      1.4     .3      29.5
FISH       .8      .0      4.7
MINE       .2      .3      .3
MANU       40.1    13.6    12.2
ELECT      .5      1.0     .6
CONST      6.2     10.6    7.6
WSALE      15.9    22.3    15.3
HOTEL      7.5     7.4     4.7
TRANS      4.8     6.4     4.7
FINANCE    2.4     7.6     1.3
ESTATE     4.3     8.9     1.6
ADMIN      4.3     6.9     6.5
EDU        4.2     4.5     4.7
HEALTH     2.9     2.4     1.3
OTHERs     2.0     3.2     2.3
MAID       2.3     3.7     2.5

Iteration History (a)

             Change in Cluster Centers
Iteration    1         2        3
1            10.000    7.484    10.664
2            2.283     .000     1.753
3            1.743     .000     1.944
4            .000      .000     .000

a. Convergence achieved due to no or small change in cluster centers. The maximum absolute coordinate change for any center is .000. The current iteration is 4. The minimum distance between initial centers is 28.739.

In early iterations, the cluster centres shift quite a lot. They then settle down to the general area of their final location, and the last iterations are minor adjustments.

If the algorithm stops because the maximum number of iterations (10 in this example) is reached, you may want to increase the maximum because the solution may otherwise be unstable.

Cluster Membership

Case Number    Cluster    Distance
1              1          5.680
2              1          7.869
3              3          4.522
4              1          6.084
5              1          7.462
6              3          3.860
7              1          8.716
8              3          4.437
9              1          13.639
10             3          8.516
11             3          7.837
12             2          7.484
13             3          10.846
14             2          7.484

Final Cluster Centers

           Cluster
           1       2       3
AGRIC      10.0    1.3     23.2
FISH       .9      .2      2.2
MINE       .3      .3      .5
MANU       30.2    20.1    13.8
ELECT      .6      .9      .6
CONST      7.8     10.0    11.0
WSALE      15.4    19.6    15.3
HOTEL      7.2     6.3     6.1
TRANS      4.7     7.1     3.9
FINANCE    1.9     6.0     1.2
ESTATE     3.0     8.0     2.1
ADMIN      7.1     7.0     8.5
EDU        5.5     4.7     6.7
HEALTH     2.0     2.2     1.7
OTHERs     1.7     2.9     1.7
MAID       1.9     3.4     1.6

The final cluster centres are computed as the mean for each variable within each final cluster. The final cluster centres reflect the characteristics of the typical case for each cluster.

Number of Cases in each Cluster

Cluster 1    6.000
        2    2.000
        3    6.000
Valid        14.000
Missing      .000

Distances between Final Cluster Centers

Cluster    1         2         3
1                    15.922    21.514
2          15.922              24.916
3          21.514    24.916

This table shows the distances between the final cluster centres. Greater distances between clusters correspond to greater dissimilarities.

• Cluster 3 is approximately equally similar to clusters 1 and 2.
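The distances in the last table can be recomputed from the final cluster centres as ordinary Euclidean distances. The sketch below is an illustrative check; the variable and function names are assumptions, and because the printed centres are rounded to one decimal the results only agree with the SPSS output to about one or two decimals.

```python
import math

# Final cluster centres, in the order AGRIC, FISH, MINE, MANU, ELECT, CONST,
# WSALE, HOTEL, TRANS, FINANCE, ESTATE, ADMIN, EDU, HEALTH, OTHERs, MAID.
FINAL_CENTERS = {
    1: [10.0, 0.9, 0.3, 30.2, 0.6, 7.8, 15.4, 7.2, 4.7, 1.9, 3.0, 7.1, 5.5, 2.0, 1.7, 1.9],
    2: [1.3, 0.2, 0.3, 20.1, 0.9, 10.0, 19.6, 6.3, 7.1, 6.0, 8.0, 7.0, 4.7, 2.2, 2.9, 3.4],
    3: [23.2, 2.2, 0.5, 13.8, 0.6, 11.0, 15.3, 6.1, 3.9, 1.2, 2.1, 8.5, 6.7, 1.7, 1.7, 1.6],
}

def center_distance(a, b):
    """Euclidean distance between the final centres of clusters a and b."""
    return math.dist(FINAL_CENTERS[a], FINAL_CENTERS[b])

d12 = center_distance(1, 2)
d13 = center_distance(1, 3)
d23 = center_distance(2, 3)
```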


ANOVA

           Cluster                Error
           Mean Square    df      Mean Square    df      F         Sig.
AGRIC      462.014        2       33.735         11      13.696    .001
FISH       4.296          2       1.139          11      3.771     .057
MINE       .064           2       .070           11      .916      .429
MANU       407.604        2       27.601         11      14.768    .001
ELECT      .073           2       .023           11      3.184     .081
CONST      14.986         2       5.299          11      2.828     .102
WSALE      15.184         2       3.097          11      4.903     .030
HOTEL      1.764          2       1.229          11      1.435     .279
TRANS      7.688          2       .426           11      18.037    .000
FINANCE    18.092         2       .579           11      31.242    .000
ESTATE     26.271         2       .764           11      34.375    .000
ADMIN      3.908          2       2.818          11      1.387     .290
EDU        3.922          2       1.390          11      2.821     .103
HEALTH     .167           2       .282           11      .590      .571
OTHERs     1.202          2       .214           11      5.618     .021
MAID       2.535          2       .198           11      12.774    .001

The F tests should be used only for descriptive purposes because the clusters have been chosen to maximize the differences among cases in different clusters. The observed significance levels are not corrected for this and thus cannot be interpreted as tests of the hypothesis that the cluster means are equal.

The ANOVA table indicates which variables contribute the most to your cluster solution. Variables with large F values (or small p-values) provide the greatest separation between clusters.
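Each F value in the table is the cluster mean square divided by the error mean square. The short sketch below recomputes the ratio for four variables taken from the table and ranks them; the names are illustrative, and small differences from the printed F values come from rounding of the mean squares.

```python
# (cluster mean square, error mean square) copied from the ANOVA table.
MEAN_SQUARES = {
    "AGRIC": (462.014, 33.735),
    "MANU": (407.604, 27.601),
    "MINE": (0.064, 0.070),
    "ESTATE": (26.271, 0.764),
}

def f_ratio(cluster_mean_square, error_mean_square):
    """F = between-cluster mean square / within-cluster (error) mean square."""
    return cluster_mean_square / error_mean_square

f_values = {name: f_ratio(*ms) for name, ms in MEAN_SQUARES.items()}

# Variables ordered from most to least separation between clusters.
ranked = sorted(f_values, key=f_values.get, reverse=True)
```

Here employment in real estate separates the clusters most strongly and mining least, in line with the descriptive reading of the table.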


A new SPSS file was created to store the results

column data

Data

Transpose

select all

variables needed

OK


Plot of Distances from Cluster Center by Cluster Membership

This is a diagnostic plot that helps you to find outliers within clusters.

► Click the Graphs

menu and select Chart

Builder

► Click the Gallery

tab, select Boxplot

from the list of chart

types, and drag and

drop the Simple

Boxplot icon onto the

canvas.

► Drag and drop

Distance of Case from

its Classification

Cluster Center onto the

y axis.

► Drag and drop

Cluster Number of

Case onto the x axis.

► Click OK to create

the boxplot.



Discriminant Analysis


Discriminant analysis is used to model the value

of a dependent categorical variable based on its

relationship to one or more predictors

Given a set of independent variables,

discriminant analysis attempts to find linear

combinations of those variables that best

separate the groups of cases. These

combinations are called discriminant functions

and have the form displayed in the equation


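\[
d_{ik} = b_{0k} + b_{1k} x_{i1} + \cdots + b_{pk} x_{ip}
\]

where d_{ik} is the value of the kth discriminant function for the ith case, p is the number of predictors, b_{jk} is the jth coefficient of the kth function and x_{ij} is the value of the jth predictor for the ith case.

The procedure first chooses a function that will separate the groups as much as possible. It then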

chooses a second function that is both uncorrelated with

the first function and provides as much further separation

as possible. The procedure continues adding functions in

this way until reaching the maximum number of functions

as determined by the number of predictors and categories

in the dependent variable.


The discriminant model has the following assumptions:

1. The predictors are not highly correlated with each other

2. The mean and variance of a given predictor are not

correlated

3. The correlation between two predictors is constant across

groups

4. The values of each predictor have a normal distribution

Exercise:

Suppose you are a political analyst. You want to identify characteristics that are indicative of voters who are likely to vote for a given USA presidential candidate, and you want to use those characteristics to identify supporters and opponents.

The data are in ‘Data21’. Use a random sample of 1102 (about 70%) of these voters to create a discriminant analysis model, setting the remaining voters aside to validate the analysis. Then use the model to classify the remaining 466 voters as supporter or opponent.

Generate random selection of cases:

1. Transform menu => Random Number Generators => select ‘Set Starting Point’ => select ‘Fixed Value’ and type ‘9191972’ => OK
2. Transform menu => Compute Variable => enter validate in the Target Variable text box and a Bernoulli random-variate expression with probability 0.7 in the Numeric Expression text box => OK

This sets the values of validate to be randomly generated Bernoulli variates with probability parameter 0.7. Approximately 70% of the voters who previously voted will have a validate value of 1. These voters will be used to create the model. The remaining voters who previously voted will be used to validate the model results.
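The same kind of split can be sketched outside SPSS. The example below draws a 0/1 flag for each case with probability 0.7 using Python's standard random module. The function name is an assumption, and although the seed value is the one typed into SPSS above, Python's generator will not reproduce the SPSS random sequence.

```python
import random

def assign_validate(n_cases, probability=0.7, seed=9191972):
    """Return a 0/1 flag per case; 1 means the case is used to build the model."""
    rng = random.Random(seed)  # fixed seed so the split can be repeated
    return [1 if rng.random() < probability else 0 for _ in range(n_cases)]

# 1102 + 466 = 1568 voters in total in this exercise.
flags = assign_validate(1568)
share_selected = sum(flags) / len(flags)
```

Because each case is selected independently, the share of selected cases is only approximately 70%, which is why the exercise says "about 70%".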


Steps in SPSS (discriminant analysis):

1. Analyze menu => Classify => Discriminant
2. Select ‘candidate’ as the grouping variable => Define range: minimum=1, maximum=2 => Continue
3. Select the independent variables and the selection variable
4. Select Means, Univariate ANOVA, Box’s M => select ‘Fisher’s’ and ‘Unstandardized’ => select ‘Within-groups correlation’ => Continue
5. Select ‘Leave-one-out classification’
6. Select ‘Probabilities of group membership’


Classification Statistics

The classification functions are used to assign cases to

groups.

Prior Probabilities for Groups

CLINTON, BUSH Prior Unweighted Weighted

Bush .500 466 466.000

Clinton .500 636 636.000

Total 1.000 1102 1102.000

Classification Function Coefficients

                                        VOTE FOR CLINTON, BUSH
                                        Bush        Clinton
AGE OF RESPONDENT                       .589        .563
age categories                          -4.613      -4.213
HIGHEST YEAR OF SCHOOL COMPLETED        5.761       5.740
RS HIGHEST DEGREE                       -9.436      -9.411
RESPONDENTS SEX                         6.604       7.126
(Constant)                              -46.455     -46.770

Fisher's linear discriminant functions

The coefficients for Age and Highest year of school completed are smaller for the Clinton classification function, which means that older voters and voters with more years of schooling are less likely to be Clinton supporters. Similarly, voters with larger age categories and higher degree are more likely to support Clinton.


There is a separate function for each group. For each case, a

classification score is computed for each function. The discriminant model

assigns the case to the group whose classification function obtained the

highest score.

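Classification function for Bush:

\[
Y_1 = -46.455 + 0.589\,\text{Age} - 4.613\,\text{AgeCategory} + 5.761\,\text{School} - 9.436\,\text{Degree} + 6.604\,\text{Sex}
\]

Classification function for Clinton:

\[
Y_2 = -46.770 + 0.563\,\text{Age} - 4.213\,\text{AgeCategory} + 5.740\,\text{School} - 9.411\,\text{Degree} + 7.126\,\text{Sex}
\]

For the example voter, Y1 = 37.223 and Y2 = 37.025. Thus, this voter will be classified as a Bush supporter. Because the two scores are close, there is still a sizeable chance that she will vote Clinton, so she is more likely to vote Bush (but not a strong supporter).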
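The classification rule can be sketched as follows. The coefficients are the ones in the two functions above; the voter profiles are hypothetical, the function names are assumptions, and the posterior calculation assumes equal prior probabilities (as in the Prior Probabilities table) so that the probabilities are proportional to the exponentials of the scores.

```python
import math

COEFFICIENTS = {
    "Bush": {"constant": -46.455, "age": 0.589, "age_category": -4.613,
             "school": 5.761, "degree": -9.436, "sex": 6.604},
    "Clinton": {"constant": -46.770, "age": 0.563, "age_category": -4.213,
                "school": 5.740, "degree": -9.411, "sex": 7.126},
}

def classification_scores(voter):
    """Evaluate each group's classification function for one voter."""
    return {
        group: coef["constant"] + sum(coef[name] * value for name, value in voter.items())
        for group, coef in COEFFICIENTS.items()
    }

def classify(voter):
    """Assign the voter to the group with the highest classification score."""
    scores = classification_scores(voter)
    return max(scores, key=scores.get)

def posterior_probabilities(scores):
    """Convert scores to group probabilities (equal priors assumed)."""
    top = max(scores.values())
    weights = {g: math.exp(s - top) for g, s in scores.items()}
    total = sum(weights.values())
    return {g: w / total for g, w in weights.items()}

# Hypothetical voters who differ only in the sex code.
voter_a = {"age": 40, "age_category": 2, "school": 12, "degree": 1, "sex": 2}
voter_b = dict(voter_a, sex=1)

# Scores quoted on the slide for the example voter.
slide_posterior = posterior_probabilities({"Bush": 37.223, "Clinton": 37.025})
```

With the slide's scores the Bush probability comes out at roughly 0.55, which matches the description of a likely but not strong Bush supporter.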


The eigenvalues table provides information about the relative efficacy

of each discriminant function. When there are two groups, the

canonical correlation is the most useful measure in the table, and it is

equivalent to Pearson's correlation between the discriminant scores

and the groups.

Eigenvalues

Function    Eigenvalue    % of Variance    Cumulative %    Canonical Correlation
1           .020(a)       100.0            100.0           .141

a. First 1 canonical discriminant functions were used in the

analysis.

Wilks' lambda is a measure of how well each function separates cases into

groups. It is equal to the proportion of the total variance in the discriminant

scores not explained by differences among the groups. Smaller values of Wilks'

lambda indicate greater discriminatory ability of the function.

Wilks' Lambda

Wilks'

Test of Function(s) Lambda Chi-square df Sig.

1 .980 22.038 5 .001

The associated chi-square statistic tests the hypothesis that the means

of the functions listed are equal across groups. The small significance

value indicates that the discriminant function does better than chance at

separating the groups.
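With a single discriminant function, the quantities in the two tables are linked by standard formulas: the canonical correlation is sqrt(λ / (1 + λ)), Wilks' lambda is 1 / (1 + λ), and the chi-square statistic is Bartlett's approximation −[N − 1 − (p + g)/2] ln Λ with p(g − 1) degrees of freedom. The sketch below applies them to the printed eigenvalue; the variable names are assumptions, and since the eigenvalue is rounded to .020 the chi-square only approximately matches the table.

```python
import math

eigenvalue = 0.020   # from the Eigenvalues table (rounded)
n_cases = 1102       # cases used to create the model
n_predictors = 5
n_groups = 2

# Canonical correlation and Wilks' lambda for a single function.
canonical_correlation = math.sqrt(eigenvalue / (1 + eigenvalue))
wilks_lambda = 1 / (1 + eigenvalue)

# Bartlett's chi-square approximation and its degrees of freedom.
chi_square = -(n_cases - 1 - (n_predictors + n_groups) / 2) * math.log(wilks_lambda)
degrees_of_freedom = n_predictors * (n_groups - 1)
```

The small eigenvalue explains why Wilks' lambda is so close to 1: the function is statistically better than chance but separates the groups only weakly.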


Checking Collinearity of Predictors: The within-groups correlation matrix

shows the correlations between the predictors.

Pooled Within-Groups Matrices (Correlation)

                                    AGE OF        age           HIGHEST YEAR OF     RS HIGHEST    RESPONDENTS
                                    RESPONDENT    categories    SCHOOL COMPLETED    DEGREE        SEX
AGE OF RESPONDENT                   1.000         .943          -.306               -.246         .037
age categories                      .943          1.000         -.247               -.189         .021
HIGHEST YEAR OF SCHOOL COMPLETED    -.306         -.247         1.000               .870          -.068
RS HIGHEST DEGREE                   -.246         -.189         .870                1.000         -.066
RESPONDENTS SEX                     .037          .021          -.068               -.066         1.000

The tests of equality of group means measure each independent variable's potential before the model is created.

Tests of Equality of Group Means

                                    Wilks' Lambda    F         df1    df2     Sig.
AGE OF RESPONDENT                   1.000            .069      1      1100    .794
age categories                      1.000            .194      1      1100    .660
HIGHEST YEAR OF SCHOOL COMPLETED    1.000            .147      1      1100    .701
RS HIGHEST DEGREE                   1.000            .042      1      1100    .837
RESPONDENTS SEX                     .985             16.822    1      1100    .000

Each test displays the results of a one-way ANOVA for the independent variable using the grouping variable as the factor. If the significance value is greater than 0.10, the variable probably does not contribute to the model.

Wilks' lambda is another measure of a variable's potential. Smaller values indicate the variable is better at discriminating between groups.


Standardized Canonical Discriminant Function Coefficients

                                    Function 1
AGE OF RESPONDENT                   -1.497
age categories                      1.453
HIGHEST YEAR OF SCHOOL COMPLETED    -.210
RS HIGHEST DEGREE                   .104
RESPONDENTS SEX                     .886

The standardized coefficients allow you to compare variables measured on different scales. Coefficients with large absolute values correspond to variables with greater discriminating ability. This table downgrades the importance of Sex.

Structure Matrix

                                    Function 1
RESPONDENTS SEX                     .868
age categories                      .093
HIGHEST YEAR OF SCHOOL COMPLETED    -.081
AGE OF RESPONDENT                   -.055
RS HIGHEST DEGREE                   -.043

Pooled within-groups correlations between discriminating variables and standardized canonical discriminant functions. Variables ordered by absolute size of correlation within function.

The structure matrix shows the correlation of each predictor variable with the discriminant function. The ordering in the structure matrix is the same as that suggested by the tests of equality of group means and is different from that in the standardized coefficients table. This disagreement is likely due to the collinearity noted in the correlation matrix.

Since the structure matrix is unaffected by collinearity, it's safe to say that this

collinearity has inflated the importance of Age, Age Categories, Highest year of

school completed and Highest degree in the standardized coefficients table.

Thus, voter’s sex best discriminates between supporters and opponents.


Checking for Correlation of Group Means and Variances: The group

statistics table reveals a potentially more serious problem. For all five

predictors, larger group means tend to associate with larger group

standard deviations.

Group Statistics

                                                                               Valid N (listwise)
CLINTON, BUSH                                  Mean       Std. Deviation    Unweighted    Weighted
Bush       AGE OF RESPONDENT                   48.9893    16.53325          466           466.000
           age categories                      2.5300     1.05757           466           466.000
           HIGHEST YEAR OF SCHOOL COMPLETED    13.9614    2.65156           466           466.000
           RS HIGHEST DEGREE                   1.7103     1.18402           466           466.000
           RESPONDENTS SEX                     1.5193     .50016            466           466.000
Clinton    AGE OF RESPONDENT                   48.7264    16.41338          636           636.000
           age categories                      2.5582     1.04002           636           636.000
           HIGHEST YEAR OF SCHOOL COMPLETED    13.8931    3.10213           636           636.000
           RS HIGHEST DEGREE                   1.6950     1.25292           636           636.000
           RESPONDENTS SEX                     1.6415     .47993            636           636.000
Total      AGE OF RESPONDENT                   48.8376    16.45719          1102          1102.000
           age categories                      2.5463     1.04709           1102          1102.000
           HIGHEST YEAR OF SCHOOL COMPLETED    13.9220    2.91902           1102          1102.000
           RS HIGHEST DEGREE                   1.7015     1.22374           1102          1102.000
           RESPONDENTS SEX                     1.5898     .49209            1102          1102.000

For a predictor such as highest year of school completed, the mean for the Clinton group is lower but its standard deviation is considerably higher. In further analysis, you may want to consider using transformed values of this predictor.


Checking Homogeneity of Covariance Matrices: Box’s Test

H0: There is equality of covariance matrices

H1: The covariance matrices are not equal

Log Determinants

VOTE FOR CLINTON, BUSH    Rank    Log Determinant
Bush                      5       2.865
Clinton                   5       3.166
Pooled within-groups      5       3.075

The ranks and natural logarithms of determinants printed are those of the group covariance matrices.

Log determinants are a measure of the variability of the groups. Larger log determinants correspond to more variable groups. Large differences in log determinants indicate groups that have different covariance matrices.

Test Results

Box's M            39.887
F    Approx.       2.646
     df1           15
     df2           4016676
     Sig.          .001

Tests null hypothesis of equal population covariance matrices.

Since Box's M is significant, you should request separate matrices to see if it gives radically different classification results. See the section on specifying separate-groups covariance matrices for more information.


Model Validation

The classification table shows the practical results of using the discriminant model.

Classification Results (b,c,d)

                                                              Predicted Group Membership
                                               VOTE FOR
                                               CLINTON, BUSH    Bush     Clinton    Total
Cases Selected        Original           Count    Bush          240      226        466
                                                  Clinton       250      386        636
                                         %        Bush          51.5     48.5       100.0
                                                  Clinton       39.3     60.7       100.0
                      Cross-validated(a) Count    Bush          238      228        466
                                                  Clinton       255      381        636
                                         %        Bush          51.1     48.9       100.0
                                                  Clinton       40.1     59.9       100.0
Cases Not Selected    Original           Count    Bush          105      90         195
                                                  Clinton       120      151        271
                                         %        Bush          53.8     46.2       100.0
                                                  Clinton       44.3     55.7       100.0

a. Cross validation is done only for those cases in the analysis. In cross validation, each case is classified by the functions derived from all cases other than that case.
b. 56.8% of selected original grouped cases correctly classified.
c. 54.9% of unselected original grouped cases correctly classified.
d. 56.2% of selected cross-validated grouped cases correctly classified.

Of the cases used to create the model, 240 of the 466 voters who voted Bush are classified correctly, and 386 of the 636 voters who previously voted Clinton are classified correctly. Overall, 56.8% of the cases are classified correctly.

Because these cases were also used to build the model, this rate tends to be optimistic. The cross-validated section of the table attempts to correct this by classifying each case while leaving it out from the model calculations.

The model was also applied to the cases that were not used to create the model. These results are shown in the Cases Not Selected section of the table. 54.9% of these cases were correctly classified by the model. This suggests that, overall, your model is in fact correct about half of the time.
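The three percentages in the footnotes are simple hit rates: correctly classified cases divided by all cases in that section of the table. A minimal sketch, with counts copied from the Classification Results table and illustrative names:

```python
def hit_rate(table):
    """Percentage of cases on the diagonal (actual group == predicted group)."""
    correct = sum(table[group][group] for group in table)
    total = sum(sum(row.values()) for row in table.values())
    return 100 * correct / total

# table[actual][predicted] = count, copied from the Classification Results.
selected = {"Bush": {"Bush": 240, "Clinton": 226},
            "Clinton": {"Bush": 250, "Clinton": 386}}
cross_validated = {"Bush": {"Bush": 238, "Clinton": 228},
                   "Clinton": {"Bush": 255, "Clinton": 381}}
not_selected = {"Bush": {"Bush": 105, "Clinton": 90},
                "Clinton": {"Bush": 120, "Clinton": 151}}

rates = {
    "selected": round(hit_rate(selected), 1),
    "cross_validated": round(hit_rate(cross_validated), 1),
    "not_selected": round(hit_rate(not_selected), 1),
}
```

The rate drops slightly from the selected cases to the cross-validated and hold-out cases, which is the usual pattern when a model is tested on data it was not fitted to.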

Separate-groups: Creates

separate-group scatterplots of

the first two discriminant

function values. If there is only

one function, histograms are

displayed instead.


Combined-groups. Creates an all-groups

scatterplot of the first two discriminant function

values. If there is only one function, a histogram

is displayed instead.

Territorial map. A plot of the boundaries used to

classify cases into groups based on function

values. The numbers correspond to groups into

which cases are classified. The mean for each

group is indicated by an asterisk within its

boundaries. The map is not displayed if there is

only one discriminant function.


SPSS also produces an ASCII territorial map plot which shows the relative

location of the boundaries of the different categories.

Territory for

Group 2

Group 1 Group 3


The Discriminant Analysis procedure is useful

for modeling the relationship between a

categorical dependent variable and one or more

scale independent variables.

If your dependent variable is scale, use the

Linear Regression procedure.

Alternatively, if your dependent variable is scale,

try the GLM Univariate procedure.

If your predictors are multicollinear and you want

to reduce their number, use the Factor Analysis

procedure.


Differences between DA and CA

In clustering, the category of the object is unknown.

However, we know the rule to classify (usually based on

distance) and we also know the features (independent

variables) that can describe the classification of the

object. There is no training example to examine whether

the classification is correct or not. Thus, the objects are

assigned into groups merely based on the given rule.

In discriminant analysis, object groups and several

training examples of objects that have been grouped are

known. The model of classification is also given (e.g.

linear or quadratic) and we want to know the best fit

parameters of the model that can best separate the

objects based on the training samples.


Neural

Network

Method

ANNs – The basics

ANNs incorporate the two fundamental

components of biological neural nets:

1. Neurones (nodes)

2. Synapses (weights)

The Key Elements of Neural Networks

Neural computing requires a number of neurons to be connected together into a neural network. Neurons are arranged in layers.

A single neuron has inputs p1, p2, p3 with weights w1, w2, w3, a bias b (attached to a constant input of 1), a transfer function f and an output a:

\[
a = f(p_1 w_1 + p_2 w_2 + p_3 w_3 + b) = f\left(\sum_i p_i w_i + b\right)
\]

Each neuron is a processing unit which takes one or more inputs and produces an output. At each neuron, every input has an associated weight which modifies the strength of that input. The neuron simply adds together all the weighted inputs and calculates an output to be passed on.
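The neuron equation translates almost directly into code. In this minimal sketch the transfer function is passed in as an argument; the function names, inputs and weights are hypothetical values chosen for illustration.

```python
def neuron(inputs, weights, bias, transfer):
    """Compute a = f(sum of p_i * w_i + b) for one neuron."""
    net_input = sum(p * w for p, w in zip(inputs, weights)) + bias
    return transfer(net_input)

def identity(n):
    """Linear transfer function: the output is the net input itself."""
    return n

def hard_limit(n):
    """Step transfer function: 1 if the net input is >= 0, otherwise 0."""
    return 1 if n >= 0 else 0

# Hypothetical inputs p1..p3 and weights w1..w3, with zero bias.
inputs = [1.0, 0.5, -1.0]
weights = [0.2, -0.4, 0.5]

net = neuron(inputs, weights, 0.0, identity)       # weighted sum only
fired = neuron(inputs, weights, 0.0, hard_limit)   # thresholded output
```

Swapping the transfer function changes the neuron's behaviour without touching the weighted-sum step, which is why the two are kept separate here.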

Day 3 – Data Science 170

Activation functions

The activation function is generally non-linear. Linear

functions are limited because the output is simply

proportional to the input.

Perceptrons

Neuron Model

The perceptron neuron produces a 1 if the net input into the transfer function is equal to or

greater than 0; otherwise it produces a 0.

Architecture Decision boundaries

Feed-forward nets

Information flow is unidirectional

Data is presented to Input layer

Passed on to Hidden Layer

Passed on to Output layer

Information is distributed

Feeding data through the net:

the weighted sum of the inputs = −0.5

Squashing:

\[
\frac{1}{1 + e^{0.5}} = 0.3775
\]
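The squashing step is the logistic function applied to the weighted sum. The sketch below reproduces the value 0.3775 and passes a signal through a one-neuron layer; the layer structure, names, inputs and weights are assumptions made for the example, chosen so that the weighted sum is −0.5.

```python
import math

def logistic(n):
    """Logistic squashing function: maps any net input into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-n))

def layer_output(inputs, weight_rows, biases):
    """Output of one layer: each neuron squashes its own weighted sum."""
    return [
        logistic(sum(p * w for p, w in zip(inputs, row)) + b)
        for row, b in zip(weight_rows, biases)
    ]

def feed_forward(inputs, layers):
    """Pass the signal through the layers in order (information flows one way)."""
    signal = inputs
    for weight_rows, biases in layers:
        signal = layer_output(signal, weight_rows, biases)
    return signal

squashed = logistic(-0.5)

# Hypothetical single-neuron layer whose weighted sum is -0.5.
output = feed_forward([1.0, 0.5, -1.0], [([[0.2, -0.4, 0.5]], [0.0])])
```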

Backpropagation

1. A set of examples for training the

network is assembled. Each

case consists of a problem

statement (which represents the

input into the network) and the

corresponding solution (which

represents the desired output

from the network).

2. The input data is entered into the

network via the input layer.

Backpropagation

3. Each neuron in the network processes the input data with the resultant values steadily "percolating" through the network, layer by layer, until a result is generated by the output layer.

4. The actual output of the network is compared to expected output for that particular input. This results in an error value. The connection weights in the network are gradually adjusted, working backwards from the output layer, through the hidden layer, and to the input layer, until the correct output is produced.

5. Fine tuning the weights in this way has the effect of teaching the network how to produce the correct output for a particular input, i.e. the network learns.

The Learning Rule

The delta rule is often utilized by the most common class

of ANNs called backpropagational neural networks.

When the network is first presented with an input pattern, it makes a random guess as to what the desired output might be. It then sees how far its answer was from the actual one and makes an appropriate adjustment to its connection weights.
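The adjustment step can be illustrated with the delta rule for a single linear unit: each weight is changed by learning rate × error × input. This is a simplified sketch, not backpropagation through hidden layers. The function name, learning rate and training data are assumptions, and the weights start at zero rather than at random values so that the result is repeatable.

```python
def train_delta_rule(samples, n_inputs, rate=0.1, epochs=500):
    """Fit one linear unit by repeatedly applying the delta rule."""
    weights = [0.0] * n_inputs  # assumption: zero start instead of a random guess
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            output = sum(w * p for w, p in zip(weights, inputs)) + bias
            error = target - output  # how far the answer is from the desired one
            # Delta rule: move each weight in proportion to the error and its input.
            weights = [w + rate * error * p for w, p in zip(weights, inputs)]
            bias += rate * error
    return weights, bias

# Hypothetical training set generated from target = 2 * x + 1.
samples = [([0.0], 1.0), ([1.0], 3.0), ([2.0], 5.0)]
weights, bias = train_delta_rule(samples, n_inputs=1)
```

After training, the weight and bias are close to 2 and 1, so the unit has learned the relationship used to generate the examples.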


Recurrent Networks

Feed forward networks:

Information only flows one way

One input pattern produces one output

No sense of time (or memory of previous state)

Recurrency

Nodes connect back to other nodes or themselves

Information flow is multidirectional

Sense of time and memory of previous state(s)
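The contrast between the two kinds of network can be illustrated with single units. This is a toy sketch: the `tanh` activation, the weights and the names `feed_forward_unit` and `RecurrentUnit` are assumptions, not part of the slides.

```python
import math


def feed_forward_unit(x, w_in=1.0):
    """No memory: the output depends only on the current input."""
    return math.tanh(w_in * x)


class RecurrentUnit:
    """The node connects back to itself, so earlier inputs influence later outputs."""

    def __init__(self, w_in=1.0, w_rec=0.5):
        self.w_in = w_in
        self.w_rec = w_rec
        self.state = 0.0  # memory of the previous state

    def step(self, x):
        self.state = math.tanh(self.w_in * x + self.w_rec * self.state)
        return self.state


unit = RecurrentUnit()
ff_outputs = [feed_forward_unit(1.0) for _ in range(3)]
rec_outputs = [unit.step(1.0) for _ in range(3)]
print(ff_outputs)
print(rec_outputs)
```

Feeding the same input three times gives three identical feed-forward outputs, while the recurrent outputs change from step to step because the stored state is fed back in.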

[Screenshots, callouts 2-7: Analyze > Neural Networks; choose MLP or RBF; method; covariates (scale, categorical or both)]

- The MLP procedure can find more complex relationships, while the RBF procedure is faster.

[Screenshots, callouts 8-12: Partitions (use 70% training : 30% testing); activation function]
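The idea behind a 70% training : 30% testing partition can be sketched outside SPSS. The helper name `partition`, the fixed seed and the use of case numbers 0 to 99 are assumptions made for illustration; SPSS assigns cases through its own Partitions settings.

```python
import random


def partition(cases, train_fraction=0.7, seed=0):
    """Shuffle the cases and split them into training and testing samples."""
    rng = random.Random(seed)   # fixed seed so the split is repeatable
    shuffled = list(cases)
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


train, test = partition(range(100))
print(len(train), len(test))
```

The network is fitted on the training sample only, and the testing sample is held back to check how well it generalizes.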

[Screenshots, callouts 13-17: Output; specify network structure; specify network performance methods]

Main Reference

Statistical Methods in Research, Petaling Jaya: Prentice Hall
