Asgn2 MultVarStats Jul 10

Published by: Amitmse on Jun 07, 2011
Copyright: Attribution Non-commercial


Multivariate Statistics - Special Lecture: Assignment 2, July 2010
The assignment has to be completed using STATA version 10.0 or above.
The assignment has to be done in groups of 3-4 students and submitted by 9 August (5 pm) at the latest. Late submissions will not be accepted.
From the data file, each group has to pick one state. The STATA command (keep if hv024==1) selects the state with code 1, that is, the state of J&K, for your assignment. The state codes are available in the file state codes.doc.
The submitted assignment should include (i) the Word file presenting the main results, including figures, along with discussions; (ii) the STATA do file; and (iii) the STATA log/output file or the smcl file. Please DO NOT submit a hard copy of the assignment; email this entire set to my gmail address: brindav3@gmail.com. Kindly indicate the group members clearly in the Word file.
All the members of a group will be given the same marks, but I reserve the discretion to call a group and have a separate session with them.
From the data file, use the following variables to do the assignment.
hv025 - type of place of residence (Urban=1 and Rural=2)
The STATA code for generating this variable is
    gen plaresi = hv025==2
The variable plaresi is created by the user. It is a dummy variable which takes the value 1 for rural (hv025==2) and zero for urban.
The variables to be generated are marked with bullets. They have to be generated from the original variables using code of the kind shown for ‘plaresi’ above.
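The state selection and the residence dummy described above can be tried out as follows. This is only a minimal sketch: it assumes the household data file is already open in STATA and that hv024 and hv025 are coded as stated in the assignment. With urban=1 and rural=2, the expression hv025==2 evaluates to 1 for rural households and 0 otherwise.

```stata
* Assumption: the household data file for the assignment is already in memory.
* Keep one state only; code 1 (J&K) is the example given in the assignment.
keep if hv024 == 1

* Inspect the raw coding before building the dummy (1 = urban, 2 = rural).
tabulate hv025, missing

* Dummy as specified in the assignment: 1 when hv025 == 2, 0 otherwise.
* A missing hv025 also yields 0, because the comparison is then false.
generate plaresi = hv025 == 2

* Cross-check the new variable against the original coding.
tabulate hv025 plaresi, missing
```

The final cross-tabulation is a quick way to confirm which category received the value 1 before the variable is used in any later step.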
For the sake of uniformity, use the variable names suggested below, which have been highlighted.
hv201 - source of drinking water
- dwpipe - Drinking water from pipe (codes 11-13 are for yes=1, and the remaining for no=0)
- dwborw - Drinking water from borewell/well etc. (codes 21, 31, 32 are for yes=1, and the remaining for no=0)
- dwoths - Drinking water from other sources (codes >=41 are for yes=1, and the remaining for no=0)
Note that using the variable hv201 you have to create three new variables, as indicated below:
    gen dwpipe = 1 if (hv201==11 | hv201==12 | hv201==13)
    mvencode dwpipe, mv(0)
    gen dwborw = 1 if (hv201==21 | hv201==31 | hv201==32)
    mvencode dwborw, mv(0)
    * STATA treats missing values as larger than any number, so exclude them explicitly
    gen dwoths = 1 if (hv201>=41 & !missing(hv201))
    mvencode dwoths, mv(0)
hv205 - type of toilet facility
- dsanit1 - Flush toilet (codes 11-15 are for yes=1 and the remaining for no=0)
- dsanit2 - Pit toilet/latrine (codes 21-23 are for yes=1 and the remaining for no=0)
- dsanit3 - None/other toilet (codes >=31 are for yes=1 and the remaining for no=0)
hv225 - share a toilet
- dsanit4 - (yes=1, no=0)
hv242 - separate room as a kitchen
- dsepkitch - (yes=1, no=0)
hv226 - type of cooking fuel
- dclfuel - Main cooking fuel is ‘clean’ (yes=1, no=0); clean cooking fuels are those in codes 1-4
hv206 - household has electricity
- delect - (electricity=1, others=0)
hv243b -
- Own clock/watch
- Own radio
- Own television
hv221, hv243a -
- Own a telephone/mobile
- Own refrigerator
- Own bicycle
- Own motorcycle/scooter
- Own car
- Has bank account
sh42 - where do members go for treatment when sick
- (formal institutions Y=1, N=0; codes 11-33 are considered formal)
hv213 - floor material
- dhifloor - high quality Yes=1, No=0 (high quality materials are codes >=31)
hv214 - wall material
- dhiwall - high quality Yes=1, No=0 (high quality materials are codes >=31)
hv215 - roof material
- dhiroof - high quality Yes=1, No=0 (high quality materials are codes >=31)
- All high quality dwelling materials (yes (1) in all three = 1, and 0 otherwise)
- All low quality dwelling materials (no (0) in all three = 1, and 0 otherwise)
sh47d - chair, sh47f - table
- Owns table or chair
- Owns cot/bed
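The range-based dummies in the list above (sanitation and dwelling materials) can be generated in the same spirit as the drinking water variables. The following is a sketch under stated assumptions, not the prescribed code: it assumes hv205, hv213, hv214 and hv215 are in memory with the codes described above, and the names dhiall and dlowall are hypothetical placeholders, since the suggested names for the two combined dwelling variables are not legible in this copy.

```stata
* Assumption: hv205, hv213, hv214, hv215 are in memory, coded as listed above.

* Toilet facility: inrange() returns 0 for missing values, so no mvencode is needed.
generate dsanit1 = inrange(hv205, 11, 15)
generate dsanit2 = inrange(hv205, 21, 23)
* Missing values compare as larger than any number, so rule them out explicitly.
generate dsanit3 = hv205 >= 31 & !missing(hv205)

* Dwelling materials: codes 31 and above count as high quality.
generate dhifloor = hv213 >= 31 & !missing(hv213)
generate dhiwall  = hv214 >= 31 & !missing(hv214)
generate dhiroof  = hv215 >= 31 & !missing(hv215)

* Combined indicators (dhiall and dlowall are hypothetical names).
generate dhiall  = dhifloor == 1 & dhiwall == 1 & dhiroof == 1
generate dlowall = dhifloor == 0 & dhiwall == 0 & dhiroof == 0

* Check the result against the raw codes.
tabulate hv205 dsanit1, missing
```

As in the assignment's own drinking water code, households with a missing source value end up coded 0 here; whether that is appropriate is worth checking with a tabulation of the raw variable.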
In all, 24 variables are created, and the principal component and factor analysis has to be carried out on these based on the following questions.
(I) Principal Component Analysis
(1a) Perform the principal component analysis on all these 24 discrete variables. Report the eigenvectors and indicate what proportion of variation is explained by the components. Are all weights positive in the first PC? If not, what do you make of the negative weights?
(1b) Now choose about 12 variables from these 24 based on the magnitude of the weights in the first principal component. That is, those with very small weights are to be dropped and those with comparatively larger weights are to be retained. Alternatively, you can choose a set of 12 variables on some logical basis and, after substantiating the choice, complete the following. Redo the same exercise of PCA and indicate what proportion of variation is explained by this new set of components.
(2) On what basis will you decide how many components to retain? After deciding on the number of components to retain, try to interpret those components.
(3) Obtain the predicted value of the first principal component and call it pca1. Which variables have large weights in the first component? Which variables are more correlated with pca1? Discuss your findings in brief.
(4) How would you like to interpret the first principal component? Obtain the mean and standard deviation of pca1 for the state as a whole using the following STATA command.
    table state, c(m pca1 sd pca1)
(5) Further, obtain the mean and standard deviation of pca1 for those households which have a value 1 in all the bulleted variables. Note that for drinking water source and sanitation you choose only the first variable, dwpipe and dsanit1 respectively; the other two categories are not to be considered (any reason why?). How would you like to characterize such households? How does the mean value of pca1 for these households compare with the overall mean for the state?
(6) Now get the mean and standard deviation of pca1 for the following categories.
Place of residence - rural and urban households separately
Religion - Hindus, Muslims and other religions separately
Caste - SC/ST, OBC and Others separately
What do you infer from the mean and standard deviation values of pca1 across the groups mentioned in each case? The STATA command for the rural/urban case would be as follows:
    table plaresi, c(m pca1 sd pca1)
Similarly, it can be estimated for the other two cases as well. Note that the religion and caste variables are available in the data.
(II) Factor Analysis (Retain three factors as and when possible, or else retain two factors)
(1a) Perform the factor analysis using the Principal Component method for the same data and interpret the first two factors to the extent possible. Report the communalities and specific variances for the first two factors along with the other necessary results.
(1b) Rotate the factors and indicate how the results change.
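The sequence of commands behind parts (I) and (II) might look roughly like the sketch below. It is an illustration under assumptions, not a model answer. The macro xvars and the variable allyes are hypothetical names. xvars lists only variable names that are legible in this copy of the assignment, so it has to be extended to the full set. tabstat is used here as an alternative to the table command quoted in the questions.

```stata
* Assumption: the dummy variables named below already exist in memory.
* xvars is a hypothetical placeholder list; extend it to the full variable set.
local xvars dwpipe dsanit1 dsanit4 dsepkitch dclfuel delect dhifloor dhiwall dhiroof

* (I) Principal components of the correlation matrix (the pca default).
* The output reports eigenvalues, proportion explained and the eigenvectors.
pca `xvars'

* Plot of eigenvalues by component, to help decide how many to retain.
screeplot

* Score on the first component, named pca1 as the assignment asks.
predict pca1, score
correlate pca1 `xvars'

* Mean and SD of pca1 overall and by place of residence.
tabstat pca1, statistics(mean sd)
tabstat pca1, by(plaresi) statistics(mean sd)

* Households with 1 in every listed dummy (allyes is a hypothetical helper).
* rowmin() ignores missing values when taking the row minimum.
egen allyes = rowmin(`xvars')
tabstat pca1 if allyes == 1, statistics(mean sd)

* (II) Principal-component factor method with three factors, then rotation.
factor `xvars', pcf factors(3)
* Orthogonal varimax is the default rotation.
rotate
```

The factor output includes a Uniqueness column for each variable; the communality is 1 minus that uniqueness. Because a local macro only exists while the do file (or selected block) is running, these lines are best run together rather than one at a time.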

