* LDEMO2404 Session1 Dofile
* 20/09/23 13:59
*LDEMO2404 - Class 1
********************************
*PART 1: IMPORT AND EXPORT DATA
********************************
*1.1 Introduction to Stata

*************************
*do file example (pay attention to the different colours!)

********************************************************************
* a line starting with * is a separate comment line
* with // you add a comment after a command, on the same line
* comments are displayed in green in the Do-file Editor
*1) clear the memory

*it is important to clear the memory before loading a data set into Stata
clear // removes the data and value labels from memory, but keeps saved results
clear all // clears all information in memory
set more off // output scrolls continuously instead of pausing at each screen (--more--)
*2) define directory

*cd "path" // template: define the working directory, where data sets are read from and saved to
*example: data path
cd "C:\Users\menasheoren\OneDrive - UCL\Bureau\"
*3) open an example data set

sysuse lifeexp, clear // sysuse opens an example data set that is shipped with Stata
*to open any other data set, type -- use filename.dta, clear --
*1.2 Check and format data: basics

*********************************
*check whether your data looks OK
describe // this command describes the variables in the data set

*we see that one variable (country) is a string variable, as it contains names.
*we see that some variables have notes. I do not recommend this feature, but let's check:
notes // contains a bit more information than the variable names and labels
browse // shows what the data look like, opens the Data Editor window
list // shows data in results window (only useful for small datasets or selected cases)
list in 1/10
summarize // gives mean, s.d., range

sum //abbreviated summarize command
sum, detail // gives all the summary statistics (incl. percentiles) of the variables
sum lexp // most often we do this only for one variable
help summarize // opens the help file of the summarize command in the Viewer

search summarize // opens the Viewer with search results
*label variables your own way

des // shows which variables need labels
label var safewater "pop access to safe water"
*label categorical data your own way

tab region // shows whether categories need (better) labels
label drop region //the value label "region" already exists, so we first drop it before redefining it
label define region 1 "Europe & C.Asia" 2 "North America" 3 "South America"
lab value region region
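* Sketch: check the new value labels. label list shows the mapping between
* the numeric codes and their labels, and tab region now displays the new labels.
label list region
tab region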
* https://moodle.uclouvain.be/pluginfile.php/516396/mod_resource/content/1/LDEMO2404_session1_dofile.do

*1.3 Export the data

**************************************
*1) export data in Stata format

save lifeexp.dta, replace // saves the data as a Stata data set in the directory we specified with 'cd'
/*alternatively*/ save "C:\Users\menasheoren\OneDrive - UCL\Bureau\lifeexp.dta", replace
*2) Export in Excel format

*we can also save it in other formats, e.g. in Excel
export excel lifeexp.xlsx, firstrow(variables) replace
*1.4 Import the data

******************************
*now we can open this data in Stata format from our directory
use lifeexp.dta, clear
/*OR*/ use "C:\Users\menasheoren\OneDrive - UCL\Bureau\lifeexp.dta", clear
*or we open the Excel file again

import excel lifeexp.xlsx, firstrow clear
*import data in raw format

*if the file includes a string (text) variable, precede its name with str#, where # is the maximum number of characters
clear
/*example*/ infile str14 country setting effort change using "https://grodri.github.io/datasets/effort.raw"
***********************************
*Part 2 VARIABLE FORMATS AND LABELS
***********************************
* 2.1 Data type

*************************************
* open dataset: here we use an example dataset provided by Stata (="system")
* Note again: `sysuse' instead of `use' (no path needed)
sysuse lifeexp, clear
describe // storage types and labels of the variables

* Note: variables contain either numbers or strings
help datatype // to get more details about storage types
/*
EXTRA: Rounding matters! If a variable contains integers with at most 9 digits, store it as long
* Otherwise, store it as double (about 16 digits of accuracy)
recast double id // changes storage type of identification number to double
* When in doubt (computing new variable), always store as double
* Then, save on space by reformatting all variables with the most economical format:
compress
* Display format is NOT the same as storage format

* Even if double, browse may display variable like this: 2.30470e+14. To display fully:
format id %20.0f //sets variable's output format. Now id is 230470003170001 */
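* Sketch: why the storage type matters for long identifiers, using the
* 15-digit id value from the comment block above. A float keeps only about
* 7 significant digits, so the value gets rounded; a double keeps about 16,
* so the value is stored exactly. A long cannot hold it at all.
display %20.0f float(230470003170001)   // rounded to float precision: last digits lost
display %20.0f 230470003170001          // double precision: exact
display c(maxlong)                       // largest value a long can store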
* 2.2 Get a quick overview of the data

***********************
sysuse lifeexp, clear // open again the original data set
browse //shows data in editor window
list country lexp // like `browse', but in results window
*2.3 Structure the data set

*************************
*we can sort the order of observations by variable(s)
*e.g. we can sort the countries by their life expectancy
list country lexp
sort lexp
list country lexp
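* Sketch: sort always orders from the smallest to the largest value.
* To see the countries with the highest life expectancy first, gsort
* accepts a minus sign in front of the variable name.
gsort -lexp
list country lexp in 1/5   // five countries with the highest life expectancy
sort lexp                  // back to ascending order, as in the rest of this file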
*we can also re-order the variables in the dataset

*e.g. we want country and life expectancy to be the first variables
order country lexp, first
*2.4 Descriptive statistics

******************************
summarize lexp gnppc // gives observations, mean, s.d., range
summarize lexp gnppc, detail // gives also percentiles
mean lexp //gives mean

tabulate country //frequency table
tabulate country region //cross-tabulation
*commands can be abbreviated

sum lexp
tab country
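* variable names can be abbreviated if unambiguous (e.g. lex for lexp),
* selected with a wildcard: v* (all variables whose names start with v)
* or listed as a range: v01-v05 (all variables from v01 to v05, in the order of the dataset)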
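* Sketch with the lifeexp variables: the wildcard picks every variable whose
* name matches the pattern, and the range picks all variables between the two
* names in the current order of the data set (which depends on order above).
describe pop*                   // all variables starting with "pop"
summarize popgrowth-safewater   // popgrowth, safewater and every variable between them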
*Display a statistic for a subgroup

tab region
tab region, nolab //first step: get the category numbers
* Display categories with values AND labels
numlabel, add
tab region
* Warning! Missing values are not displayed in standard output (see later)
tab safewater
tab safewater, missing
mean popgrowth if region==1 //gives mean value for Europe & Central Asia
mean popgrowth, over(region) //gives region-specific means
/*alternative*/
bysort region: summarize popgrowth, detail
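* Sketch: tabstat shows the region-specific statistics in one compact table
* (one row per region); statistics() selects which statistics are shown.
tabstat popgrowth, by(region) statistics(n mean sd min max)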
* IF restricts the scope of a command to those observations for which the logical expression is true
mean lexp if region==1 //gives mean value for Europe & Central Asia
* IF is very often used!
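* Sketch: conditions can be combined with & (and) and | (or); inlist() is a
* shorter way to write several "or" conditions on the same variable.
mean lexp if region==2 | region==3        // North and South America together
mean lexp if inlist(region, 2, 3)         // same observations, same result
mean lexp if region==1 & popgrowth < 1    // Europe & C.Asia, population growth below 1% per year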
* Display according to a condition
tab country lexp if lexp<70 // handy if crossing two categorical variables
list country lexp if lexp<70 // lists countries and their life expectancies
browse country lexp if lexp<70 // same as list, but in a separate window
* IN restricts the scope of the command to a specific observation range

list lexp in 1/10 //gives the values for the first ten observations
* for larger data sets, prefer browse to list
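* Sketch: IN counts observations in their current sort order, and negative
* numbers count from the end (l stands for the last observation).
sort lexp
list country lexp in 1/5     // five lowest life expectancies
list country lexp in -5/l    // five highest life expectancies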
* Cross-tabulation of categorical variables

* Here we use a continuous variable (lexp) since the dataset has no other categorical variable than region
*Column, row, and cell percentages
tab lexp region, nof col // no absolute frequencies (nof), only column percents (col)
tab lexp region, nof row // no absolute frequencies (nof), only row percents (row)
tab lexp region, nof cell // no absolute frequencies (nof), only cell percents (cell)
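* Sketch: a two-category version of lexp gives a far more readable cross-tab.
* The variable name high_lexp and the 70-year cut-off are illustrative
* assumptions, not part of the course material.
generate high_lexp = (lexp >= 70) if !missing(lexp)
label define high_lexp 0 "below 70" 1 "70 or more"
label values high_lexp high_lexp
tab high_lexp region, col    // distribution of high_lexp within each region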

*Missing values:
*******************************
*- basic (system) missing values are displayed as a dot "."
*- 26 additional codes, .a to .z (extended missing values), can distinguish different types of missing values
*- in Stata, missing values are treated as very large (+infinite) numbers: valid numbers < . < .a < .b < ... < .z
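* Sketch: because missing values count as larger than any number, a condition
* such as gnppc > 10000 also selects countries whose gnppc is missing.
* Exclude them explicitly with !missing():
count if gnppc > 10000                      // also counts missing gnppc
count if gnppc > 10000 & !missing(gnppc)    // only observed values above 10000
display (. < .a) & (.a < .z)                // 1 (true): . < .a < ... < .z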
tab gnppc, missing // check whether there are missing values in this variable
list country gnppc if missing(gnppc) // list countries with missing values in gnppc
/*OR*/
list country gnppc if gnppc==.
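* Sketch: gnppc==. only finds the system missing value ".", whereas missing()
* also finds the extended codes .a to .z. gnppc_demo is a hypothetical copy
* created only for this illustration.
generate gnppc_demo = gnppc
replace gnppc_demo = .a if missing(gnppc)   // recode every missing value as .a
count if gnppc_demo == .                     // 0: no "." is left, only .a
count if missing(gnppc_demo)                 // all missing values, whatever their code
drop gnppc_demo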
***********************
* Graphical association between 2 continuous variables
help graph twoway
* Scatter plot:
twoway scatter lexp gnppc
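* Sketch: plots in parentheses are drawn on the same axes, so a linear fit
* (lfit) can be overlaid on the scatter plot; xscale(log) draws the x-axis
* on a logarithmic scale.
twoway (scatter lexp gnppc) (lfit lexp gnppc)
twoway scatter lexp gnppc, xscale(log)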
