# Asgn2 MultVarStats Jul 10


10/13/2013


Multivariate Statistics - Special Lecture: Assignment 2, July 2010

The assignment has to be completed using STATA version 10.0 or above.

The assignment has to be done in groups of 3-4 students and submitted by 9th August (5 pm) at the latest. Late submissions will not be accepted.

From the file hhld_new_9.dta, each group has to pick one state. For example, the STATA command `keep if hv024==1` selects the state with code 1, that is, the state of J&K, for your assignment. The state codes are available in the file state codes.doc.
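A minimal sketch of the subsetting step. So that it runs on its own, it builds a tiny hypothetical dataset with `input` in place of the real file; with the actual data, replace the `clear`/`input` ... `end` lines with `use "hhld_new_9.dta", clear`. Tabulating hv024 first shows which state codes are present before anything is dropped.

```stata
* Hypothetical toy data standing in for hhld_new_9.dta;
* with the real file use:  use "hhld_new_9.dta", clear
clear
input hv024
1
1
2
3
1
end

* List the state codes present before subsetting
tabulate hv024

* Keep only the chosen state (code 1 = J&K in this assignment)
keep if hv024 == 1
count
```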

The submitted assignment should include (i) the Word file presenting the main results, including figures, along with the discussion; (ii) the STATA do file; and (iii) the STATA log/output file or the smcl file. Please DO NOT submit a hard copy of the assignment; instead, email the entire set to my gmail address: brindav3@gmail.com. Kindly indicate the group members clearly in the Word file.
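One way to produce the smcl log that accompanies the do file is to open a log at the top of the do file and close it at the end. The file name asgn2_group is hypothetical; the final `translate` line is optional and writes a plain-text copy of the same log.

```stata
* Close any log left open from an earlier run, then start a new one
capture log close
log using "asgn2_group.smcl", replace

* ... assignment commands go here; their output is recorded in the log ...
display "Assignment 2 output"

log close

* Optional plain-text copy of the SMCL log
translate "asgn2_group.smcl" "asgn2_group.log", replace
```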

All the members in a group will be given the same marks, but I reserve the discretion to call a group for a separate session with them.

From the file hhld_new_9.dta, use the following variables to do the assignment.

hv025 - type of place of residence (Urban=1 and Rural=2). The STATA code for generating the dummy from this variable is

    gen plaresi = (hv025==1)

The variable plaresi is created by the user; it is a dummy variable which takes the value 1 for urban and 0 for rural households.
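Because hv025 codes urban as 1, a dummy that is 1 for urban households is `(hv025 == 1)`. The coding can be checked by cross-tabulating the new dummy against its source. The sketch below uses a small hypothetical dataset; the `if !missing(hv025)` qualifier is an added precaution, not part of the assignment's code, so that a household with a missing hv025 gets a missing plaresi instead of being counted as rural.

```stata
* Hypothetical toy data: hv025 coded Urban=1, Rural=2, one missing value
clear
input hv025
1
2
2
1
.
end

* (hv025 == 1) is 1 when true and 0 when false: 1 = urban, 0 = rural
gen plaresi = (hv025 == 1) if !missing(hv025)

* Every urban code should map to 1 and every rural code to 0
tabulate hv025 plaresi, missing
```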
The variables to be generated are indicated below as bullets or with a mark. They have to be generated from the original variables using codes like those shown for plaresi above. For the sake of uniformity, use the variable names suggested below.

hv201 - source of drinking water
- dwpipe - drinking water from pipe (codes 11-13 are yes=1; the remaining are no=0);
- dwborw - drinking water from borewell/well etc. (codes 21, 31, 32 are yes=1; the remaining are no=0);
- dwoths - drinking water from other sources (codes >=41 are yes=1; the remaining are no=0).

Note that using the variable hv201 you have to create three new variables as indicated below:

    gen dwpipe = 1 if (hv201==11 | hv201==12 | hv201==13)
    mvencode dwpipe, mv(0)
    gen dwborw = 1 if (hv201==21 | hv201==31 | hv201==32)
    mvencode dwborw, mv(0)
    * !missing() is needed because Stata treats missing values as larger than any number
    gen dwoths = 1 if (hv201>=41 & !missing(hv201))
    mvencode dwoths, mv(0)

hv205 - type of toilet facility
- dsanit1 - flush toilet (codes 11-15 are yes=1; the remaining are no=0);
- dsanit2 - pit toilet/latrine (codes 21-23 are yes=1; the remaining are no=0);
- dsanit3 - none/other toilet (codes >=31 are yes=1; the remaining are no=0).

hv225 - share a toilet
- dsanit4 (yes=1, no=0)

hv242 - separate room as a kitchen
- dsepkitch (yes=1, no=0)

hv226 - type of cooking fuel
- dclfuel - main cooking fuel is 'clean' (clean cooking fuels are codes 1-4; yes=1, no=0)

hv206 - household has electricity
- delect (electricity=1, others=0)

| Source variable | Description | New variable |
|---|---|---|
| hv243b | Own clock/watch | dclock |
| hv207 | | |
| hv208 | Own television | dtv |
| hv221, hv243a | Own a telephone/mobile | dtelemob |
| hv209 | Own refrigerator | drefrg |
| hv210 | Own bicycle | dbycycl |
| hv211 | Own motorcycle/scooter | dscootr |
| hv212 | Own car | dcar |
| hv247 | Has bank account | dbankacc |
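A sketch of the toilet, kitchen, fuel, electricity and ownership dummies, written as logical expressions instead of the `gen ... if` plus `mvencode` pattern used for hv201. It assumes, as in the standard DHS household recode, that hv225, hv242, hv206 and the ownership variables are coded 1 = yes, 0 = no; check this with `tabulate` or `label list` on the real file first. `inrange()` returns 0 for a missing code, and the `!missing()` guard keeps a missing hv205 out of dsanit3, so missing codes end up as 0, in line with the "remaining for no=0" rule. The three-household dataset is hypothetical.

```stata
* Hypothetical toy data: three households with the DHS source variables
clear
input hv205 hv225 hv242 hv226 hv206 hv243b hv208 hv221 hv243a hv209 hv210 hv211 hv212 hv247
12 0 1  2 1 1 1 0 1 1 0 1 0 1
22 1 0  8 1 1 0 0 1 0 1 0 0 0
31 . 0 11 0 0 0 0 0 0 1 0 0 .
end

* Toilet facility (hv205): inrange() gives 0 for missing codes
gen dsanit1 = inrange(hv205, 11, 15)
gen dsanit2 = inrange(hv205, 21, 23)
* Missing counts as larger than any number in Stata, hence the guard
gen dsanit3 = (hv205 >= 31 & !missing(hv205))
gen dsanit4 = (hv225 == 1)

* Kitchen, cooking fuel (clean = codes 1-4) and electricity
gen dsepkitch = (hv242 == 1)
gen dclfuel   = inrange(hv226, 1, 4)
gen delect    = (hv206 == 1)

* Ownership dummies (source variables assumed coded 1 = yes)
gen dclock   = (hv243b == 1)
gen dtv      = (hv208 == 1)
gen dtelemob = (hv221 == 1 | hv243a == 1)
gen drefrg   = (hv209 == 1)
gen dbycycl  = (hv210 == 1)
gen dscootr  = (hv211 == 1)
gen dcar     = (hv212 == 1)
gen dbankacc = (hv247 == 1)

list dsanit1-dsanit4 dclfuel dtelemob dbankacc
```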
sh42 - where do members go for treatment when sick
- dfmlhosp (formal institutions Y=1, N=0; codes 11-33 are considered formal)

hv213 - floor material
- dhifloor - high quality Yes=1, No=0 (high-quality materials are codes >=31)

hv214 - wall material
- dhiwall - high quality Yes=1, No=0 (high-quality materials are codes >=31)

hv215 - roof material
- dhiroof - high quality Yes=1, No=0 (high-quality materials are codes >=31)

- ddwelhi - all high-quality dwelling materials (1 if yes (1) in all three, 0 otherwise)
- ddwello - all low-quality dwelling materials (1 if no (0) in all three, 0 otherwise)

sh47d - chair, sh47f - table
- dtabchr - owns table or chair

sh47e - owns cot/bed
- dcotbed
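The health, dwelling and furniture dummies can be sketched the same way. The `!missing()` guards matter for the "codes >=31" rules: Stata treats a missing value as larger than any number, so without the guard a missing material code would be counted as high quality. The toy data are hypothetical, and sh47d, sh47f and sh47e are assumed to be coded 1 = yes, 0 = no.

```stata
* Hypothetical toy data: three households
clear
input sh42 hv213 hv214 hv215 sh47d sh47f sh47e
12 33 31 35 1 0 1
41 11 31 31 0 0 1
.  11  . 13 0 1 0
end

* Treatment at a formal institution (codes 11-33); missing gives 0
gen dfmlhosp = inrange(sh42, 11, 33)

* High-quality floor, wall and roof (codes >= 31), guarding against missing
gen dhifloor = (hv213 >= 31 & !missing(hv213))
gen dhiwall  = (hv214 >= 31 & !missing(hv214))
gen dhiroof  = (hv215 >= 31 & !missing(hv215))

* Composite indicators: all three high quality / all three low quality
gen ddwelhi = (dhifloor == 1 & dhiwall == 1 & dhiroof == 1)
gen ddwello = (dhifloor == 0 & dhiwall == 0 & dhiroof == 0)

* Furniture: table or chair, and cot/bed (assumed coded 1 = yes)
gen dtabchr = (sh47d == 1 | sh47f == 1)
gen dcotbed = (sh47e == 1)

list dfmlhosp dhifloor dhiwall dhiroof ddwelhi ddwello dtabchr dcotbed
```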
In all, 24 variables are created, and the principal component and factor analyses have to be carried out on these based on the following questions.

(I) Principal Component Analysis

(1a) Perform the principal component analysis on all these 24 discrete variables. Report the eigenvalues and eigenvectors, and indicate what proportion of the variation is explained by the components. Are all weights positive in the first PC? If not, what do you make of the negative weights?
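A sketch of the Stata steps for (1a), and for the scoring and summaries asked for in questions (3) to (6). To keep it self-contained it simulates four hypothetical binary indicators x1-x4 driven by one latent variable, plus a hypothetical group dummy `urban`; with the real data, replace x1-x4 by the 24 (or 12) dummies and `urban` by plaresi or the religion and caste variables. `pca` works on the correlation matrix by default and stores the eigenvalues in e(Ev) and the eigenvectors in e(L). `tabstat` is shown as an alternative way to get the means and standard deviations that the `table` commands in the questions produce.

```stata
* Simulated data: four binary indicators driven by one latent variable
clear
set obs 500
set seed 2010
gen double latent = rnormal()
forvalues j = 1/4 {
    gen x`j' = (latent + rnormal() > 0)
}
gen urban = (runiform() < 0.3)

* (1a) PCA on the correlation matrix: eigenvalues, proportions, eigenvectors
pca x1-x4
matrix list e(Ev)
matrix list e(L)

* (2) Scree plot with a reference line at eigenvalue 1 (Kaiser criterion)
screeplot, yline(1)

* (3) Score on the first component and its correlation with each indicator
predict double pca1, score
correlate pca1 x1-x4

* (4) and (6) Mean and standard deviation of pca1, overall and by group
tabstat pca1, statistics(mean sd)
tabstat pca1, by(urban) statistics(mean sd)

* (5) Households with a value 1 in every indicator of the chosen list
egen allones = rowmin(x1-x4)
tabstat pca1 if allones == 1, statistics(mean sd)
```

With standardized variables the eigenvalues add up to the number of variables, so each eigenvalue divided by that number is the proportion of variation explained by the corresponding component.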

(1b) Now choose about 12 variables from these 24 based on the magnitude of their weights in the first principal component. That is, variables with very small weights are to be dropped and those with comparatively larger weights are to be retained. Alternatively, you can choose a set of 12 variables on some logical basis and, after substantiating the choice, complete the following. Redo the PCA and indicate what proportion of the variation is explained by this new set of components.

(2) On what basis will you decide how many components to retain? After deciding on the number of components to retain, try to interpret those components.

(3) Obtain the predicted value of the first principal component and call it pca1. Which variables have large weights in the first component? Which variables are more correlated with pca1? Discuss your findings briefly.

(4) How would you interpret the first principal component? Obtain the mean and standard deviation of pca1 for the state as a whole using the following STATA command:

    table state, c(m pca1 sd pca1)

(5) Further, obtain the mean and standard deviation of pca1 for those households which have a value 1 in all the (X) variables. Note that for drinking water source and sanitation you choose only the first variable, dwpipe and dsanit1 respectively; the other two categories are not to be considered (any reason why?). How would you characterize such households? How does the mean of pca1 for these households compare with the overall mean for the state?

(6) Now get the mean and standard deviation of pca1 for the following categories:
- Place of residence: rural and urban households separately;
- Religion: Hindus, Muslims and other religions separately;
- Caste: SC/ST, OBC and others separately.

What do you infer from the mean and standard deviation values of pca1 across the groups in each case? The STATA command for the rural/urban case would be as follows:

    table plaresi, c(m pca1 sd pca1)

Similarly, it can be estimated for the other two cases as well. Note that the religion and caste variables are available in the data.

(II) Factor Analysis (Retain three factors as and when possible, or else retain two factors)
(1a) Perform the factor analysis using the principal component method for the same data and interpret the first two factors to the extent possible. Report the communalities and specific variances for the first two factors along with the other necessary results.

(1b) Rotate the factors and indicate how the results change.
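A sketch of the factor-analysis steps in Stata, again on simulated hypothetical indicators (x1-x6, built from two latent variables) so that it runs on its own. The `pcf` option requests the principal-component factor method, and `factors(2)` keeps two factors; use `factors(3)` where three can be retained. Stata reports each variable's uniqueness, which is its specific variance; because the method works with standardized variables, the communality is 1 minus the uniqueness. For (1b), `estat rotatecompare` places the rotated and unrotated loadings side by side.

```stata
* Simulated data: six binary indicators built from two latent variables
clear
set obs 500
set seed 2010
gen double f1 = rnormal()
gen double f2 = rnormal()
forvalues j = 1/3 {
    gen x`j' = (f1 + 0.7*rnormal() > 0)
}
forvalues j = 4/6 {
    gen x`j' = (f2 + 0.7*rnormal() > 0)
}

* (1a) Principal-component factor method, retaining two factors
factor x1-x6, pcf factors(2)

* Specific variances (uniquenesses) are stored in e(Psi);
* communality = 1 - uniqueness
matrix Psi  = e(Psi)
matrix Comm = J(1, colsof(Psi), 1) - Psi
matrix list Psi
matrix list Comm

* (1b) Orthogonal varimax rotation, then compare with the unrotated loadings
rotate, varimax
estat rotatecompare
```

An orthogonal rotation such as varimax leaves the communalities and specific variances unchanged; only the way the common variance is split between the factors changes, which is usually what makes the rotated factors easier to interpret.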