*LDEMO2404 - Class 1
********************************
*PART 1: IMPORT AND EXPORT DATA
********************************
browse // shows what the data look like, opens the 'Data Editor' window
list // shows data in results window (only useful for small datasets or selected cases)
list in 1/10
*now we can open the data, stored in Stata format, from our working directory
use lifeexp.dta, clear
/*OR*/ use "C:\Users\menasheoren\OneDrive - UCL\Bureau\lifeexp.dta", clear
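* A minimal sketch of the export side, assuming lifeexp is already in memory.
* The file names lifeexp_copy.dta and lifeexp_copy.csv are hypothetical.
save lifeexp_copy.dta, replace // writes a copy in Stata format to the working directory
preserve // keep the data in memory safe while we test the text import
export delimited using lifeexp_copy.csv, replace // writes a comma-separated text file
import delimited using lifeexp_copy.csv, clear // reads the text file back in
describe // check how the variables were read from the text file
restore // back to the original Stata-format data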
***********************************
*PART 2: VARIABLE FORMATS AND LABELS
***********************************
/*
EXTRA: Rounding matters! If a variable contains integers with 9 digits or fewer,
store it as a long
* Otherwise, store it as a double (about 16 digits of accuracy)
recast double id // changes the storage type of the identification number to double
* When in doubt (e.g. when computing a new variable), store it as a double
* Then save space by letting Stata pick the most economical storage type for every variable:
compress
*/
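* A small sketch of why the storage type matters, using a made-up 9-digit identifier.
* The variables id_float, id_long and id_double are hypothetical and are discarded by restore.
preserve
clear
set obs 1
generate float id_float = 123456789 // float keeps only about 7 digits
generate long id_long = 123456789 // long stores the integer exactly
generate double id_double = 123456789 // double stores it exactly as well
format id_float id_long id_double %12.0f
list // id_float is displayed as 123456792, the other two as 123456789
restore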
*************************
*we can sort the order of observations by variable(s)
*e.g. we can sort the countries by their life expectancy
list country lexp
sort lexp
list country lexp
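* sort always orders from lowest to highest. A sketch of the descending case with gsort:
gsort -lexp // highest life expectancy first
list country lexp in 1/5 // the five countries with the highest life expectancy
sort region lexp // by region, then by life expectancy within each region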
* Warning! tabulate leaves out missing values unless you add the missing option (see later)
tab safewater
tab safewater, missing
mean popgrowth if region==1 //gives mean value for Europe and Asia
mean popgrowth, over(region) //gives region-specific means
/*alternative*/
bysort region: summarize popgrowth, detail
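* Another possible alternative, shown as a sketch: tabstat gives a compact table
* of chosen statistics by group (the statistics listed here are an illustrative choice)
tabstat popgrowth lexp, by(region) statistics(mean sd n)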
* IF restricts the scope of a command to those observations for which the logical expression is true
mean lexp if region==1 //gives mean value for Europe and Asia
* IF is very often used!
* Display according to a condition
tab country lexp if lexp<70 // handy if crossing two categorical variables
list country lexp if lexp<70 // lists countries and their life expectancies
browse country lexp if lexp<70 // same as list, but in a separate window
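* Conditions can be combined with & (and), | (or) and ! (not). A short sketch
* (the cut-off values are arbitrary examples):
count if lexp<70 & region==1 // both conditions must hold
list country lexp if lexp<70 | lexp>78 // either condition may hold
list country region lexp if !(region==1) & lexp>=75 // not region 1, and lexp at least 75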
*Missing values:
*******************************
*- the basic missing value is displayed as a dot "."
*- 26 additional codes, .a to .z, can distinguish different types of missingness
*- Stata treats missing values as larger than any number: all numbers < . < .a < ... < .z
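* Because missing values rank above every number, a ">" condition also selects them.
* A sketch of the pitfall (the threshold 20000 is an arbitrary example):
count if gnppc>20000 // also counts countries whose gnppc is missing
count if gnppc>20000 & !missing(gnppc) // counts only countries with a recorded gnppc
count if gnppc>20000 & gnppc<. // same result: "<." excludes ., .a, ..., .z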
tab gnppc, missing // check whether there are missing values in this variable
list country gnppc if missing(gnppc) // list countries with missing values in gnppc
/*OR*/
list country gnppc if gnppc==.
***********************
* Graphical association between 2 continuous variables
help graph twoway
* Scatter plot:
twoway scatter lexp gnppc
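* A sketch of two common refinements (axis titles and the log scale are illustrative choices):
twoway scatter lexp gnppc, xscale(log) ///
    xtitle("GNP per capita (log scale)") ytitle("Life expectancy")
* Overlay a fitted regression line on the scatter plot:
twoway (scatter lexp gnppc) (lfit lexp gnppc)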