This action might not be possible to undo. Are you sure you want to continue?

BooksAudiobooksComicsSheet Music### Categories

### Categories

### Categories

Editors' Picks Books

Hand-picked favorites from

our editors

our editors

Editors' Picks Audiobooks

Hand-picked favorites from

our editors

our editors

Editors' Picks Comics

Hand-picked favorites from

our editors

our editors

Editors' Picks Sheet Music

Hand-picked favorites from

our editors

our editors

Top Books

What's trending, bestsellers,

award-winners & more

award-winners & more

Top Audiobooks

What's trending, bestsellers,

award-winners & more

award-winners & more

Top Comics

What's trending, bestsellers,

award-winners & more

award-winners & more

Top Sheet Music

What's trending, bestsellers,

award-winners & more

award-winners & more

Welcome to Scribd! Start your free trial and access books, documents and more.Find out more

-1- Environment

-2.a- System Syntax

-2.b- Statistical Syntax

-3- Output

**Stata Tutorial — IT2010
**

Francesco Andreoli – Andrea Bonfatti

Università di Verona

January 11-15, 2010 Alba di Canazei

C.I.D.E.

January 11-15, 2010 1 / 59

Francesco Andreoli - Andrea Bonfatti (Università di Verona) Stata Tutorial

Introduction

-1- Environment

-2.a- System Syntax

-2.b- Statistical Syntax

-3- Output

Acknowledgements Department of Economics (DSE) - Università di Verona --Nicola Tommasi (CIDE and Università di Verona) provided valuable support with related documents and an early version of this presentation.

C.I.D.E.

January 11-15, 2010 2 / 59

Francesco Andreoli - Andrea Bonfatti (Università di Verona) Stata Tutorial

Introduction

-1- Environment

-2.a- System Syntax

-2.b- Statistical Syntax

-3- Output

**Organization of the Directory
**

../stata_tutorial | |--> /read_me_first.txt | |--> /tutorial.pdf | |--> /data_files | | | |--> /smallPSELL.dta | |--> /smallPSELL.csv | |--> /smallPSELL.txt | |--> /smallPSELL2.dta | |--> /smallPSELL2.csv | |--> /smallPSELL2.txt | |--> /estimates.txt | |--> /do_files | |--> /do_first.do |--> /do_first.log |--> /model.txt |--> /model.gph

* You are here! *

C.I.D.E.

January 11-15, 2010 3 / 59

Francesco Andreoli - Andrea Bonfatti (Università di Verona) Stata Tutorial

Introduction

-1- Environment

-2.a- System Syntax

-2.b- Statistical Syntax

-3- Output

Motivation

This tutorial addresses to a beginner level or early trained audience which needs notions as well as operational hints to get started with Stata software. This handout is meant to provide a support for a two hours tutorial, and it can be considered complementary to other more exhaustive and rigorous sources. You are invited to take a look at the Oﬃcial Stata website, http://www.stata.com/bookstore/documentation.html, where you can ﬁnd all published documents. The “Getting Started with Stata” manual is an introductory (although very complete) users manual.

C.I.D.E.

January 11-15, 2010 4 / 59

Francesco Andreoli - Andrea Bonfatti (Università di Verona) Stata Tutorial

Introduction

-1- Environment

-2.a- System Syntax

-2.b- Statistical Syntax

-3- Output

Motivation

Additional material can be found in oﬃcial websites:

http://www.stata.com/bookstore/pdf/gsw_samplesession.pdf http://www.stata.com/bookstore/pdf/r_intro.pdf (commands) http://www.stata.com/bookstore/pdf/g_graph_intro.pdf (graphs) http://www.stata.com/bookstore/pdf/d_merge.pdf (merge)

and unoﬃcial ones: http://www.ats.ucla.edu/stat/stata/ http://www.nyu.edu/its/statistics/Docs/Intro_stata5.pdf http://www.eui.eu/Personal/Researchers/decio/PS/Stata.pdf http://leuven.economists.nl/stata/stataintro.pdf

C.I.D.E.

January 11-15, 2010 5 / 59

Francesco Andreoli - Andrea Bonfatti (Università di Verona) Stata Tutorial

2010 6 / 59 Francesco Andreoli . actually at the 11th release (the same procedure holds for older releases and other versions). interface. input/output of data ﬁles.I. preserving the measurability. C. principles of graphs). On a second stage.a. to be able to replicate an experiment starting from the observed phenomena.Output Organization of the Tutorial The tutorial focuses on Stata for Windows package. the tutorial will exploit the general settings of the environment where the data are stored. managed and analyzed. in order to make the research work intelligible by all audienced.Environment -2. Here we will look closely on how organize data into folders and how to work with a hierarchy of folders. basic syntax (data management.D.Introduction -1.Statistical Syntax -3.e.ado extensions. Order is necessary to work scientiﬁcally. i.log .Andrea Bonfatti (Università di Verona) Stata Tutorial . Moreover.dta .E. attention will be paid to programming and coding and the general setting of the program will be introduced: windows. regression. January 11-15. order is a prerequisite for understanding and interpretability of results. Firstly. SE version.b. statistics.System Syntax -2. .do .

C.do ﬁle) to a small subsample (50 times 8) from the PSELL (Panel Socio-économique “Liewen zu Lëtzebuerg”) database (. Stata output will be described and analyzed (.Statistical Syntax -3.b.Environment -2.dta ﬁle).Andrea Bonfatti (Università di Verona) Stata Tutorial . Stata Technical Bullettin or from Repec Library. The handout structure follows the tutorial sectioning. Finally.I. 2010 7 / 59 Francesco Andreoli . as the one treated in classes to perform income distribution analysis.System Syntax -2. The use of any particular syntax.D.log ﬁle). For more advanced users. you will be given with the means to autonomously search and apply new syntax already stored in Stata memory (. depends on your research scopes.Introduction -1. by applying the program (. Stata oﬀers the possibility to code new functions according to program language standards.E.a. January 11-15.Output Organization of the Tutorial The concluding stage of the tutorial aims at showing how Stata works in practice. As a general purpose.ado ﬁles) or downloadable from the web sites of Stata Journal.

b) an input window. c) a review window in which is recorded the code inputted in Stata and d) a variables window.b.Introduction -1. You recognize four windows: a) the output window. 2010 8 / 59 Francesco Andreoli . All inputs you select can either be written (see point b)) or selected interactively in by the option bar of the program.E.Environment -2. the regression coeﬃcients.Andrea Bonfatti (Università di Verona) Stata Tutorial .) is displayed in the output window (point a)). C.. the output (the value of the mean. In the former case. These inputs may refer to statistical operations as means..I. regressions. frequency tables. a sort of summary of the database. Open Stata by clicking on the Stata icon.Statistical Syntax -3. which reports all the variables in use. etc.System Syntax -2. line by line. the variable window (point d) will report the resulting modiﬁcations. where each code...D. while in the latter. etc. as well as changes to the database.a. transformations or the creation of new variables or modiﬁcation of the entire database. the table. and are systematically recorded by Stata in the review windows (see point c)). January 11-15. reporting code input and associated output from a database in use. can be written.Output A First Approach.

the original database is sacred.Introduction -1.a. When you decide to save... For the sake of reproducibility of the experiment you are running on your data. In the philosophy of the software.D. Input data are stored in the RAM memory of your computer (which is then required to be at least as large as the database dimension) as an n observations by k variables matrix which you do not need to see while programming.e. your original database) must be preserved intact and unaﬀected by additional elaborations which may induce errors in future users elaborations.Environment -2. January 11-15.b.Output A First Approach.System Syntax -2.Statistical Syntax -3. For this reason.Andrea Bonfatti (Università di Verona) Stata Tutorial . any change of the database remains stored only in the RAM memory of your computer and it will be completely deleted when Stata is closed.I. 2010 9 / 59 Francesco Andreoli .E. it is important to choose what to save and how to save it appropriately. the primitive source of information (i. if you refuse to save changes and results when asked. C. tests or veriﬁcations.

b.Environment -2.. reporting the full list of commands appearing in the review window on a .Andrea Bonfatti (Università di Verona) Stata Tutorial .a.txt extension.D.Output A First Approach. This list can be used by other readers to understand how the new database and the output saved in the log ﬁle were obtained. Inputs as well can be saved.Introduction -1. this is a do-ﬁle. Stata opens and saves databases in the . In Stata jargon. C. but any information about how to go from one database to the other will be lost. Three fundamental steps: 1 If the original database has been modiﬁed.E. reporting a list of your results in a .I. a new database must be created containing all new variables and transformations performed.. From this ﬁle you can copy tables or coeﬃcient results to be used in you research report. January 11-15.Statistical Syntax -3. 2010 10 / 59 2 3 Francesco Andreoli . Output results can be saved in a log-on ﬁle. In this way. In Stata jargon. this is a log-ﬁle.dta extention. there is no need to save it and you will not be even asked to do so. the original database will be preserved as an independent object from the new one. If the original database is not modiﬁed.System Syntax -2.txt document.

A relative path is a path relative to the working directory of the user or application (eg: . Data can be read. January 11-15. speciﬁes a unique location in a ﬁle system.do).System Syntax -2. Stata programming must be created to work with relative paths which can be used in any ﬁle system.b.Environment -2.. The ﬁrst object you need to manage is the initial source of data.Introduction -1.do) which does not need to depend on the full path root usually system speciﬁc (C://Francesco_Andrea/IT2010/stata_tutorial/do_files/do_first. C. The original database is your primitive source of information./stata_tutorial/do_files/do_first.Output Setting the Environment The scope of this section is to show how to correctly manage your input. 2010 11 / 59 Francesco Andreoli . commands and output ﬁles for the sake of reproducibility and intelligibility of your work.a. but never modify or delete them because they represent the primitive source of information. the general form of a ﬁlename or of a directory name. therefore it must be preserved on its original status. A path.Andrea Bonfatti (Università di Verona) Stata Tutorial .I.D.E.Statistical Syntax -3. and you can work with them.

E.I. In the programs directory put your coding and the statistical results. January 11-15.a. parameters estimates.b.Andrea Bonfatti (Università di Verona) Stata Tutorial ...Output Setting the Environment Golden rules in Stata Generate a directory projects in which to put in all your current projects. programs as distinct elements..Environment -2. take documents.System Syntax -2.. In the data directory put your database.. Use simple names for ﬁles. C. Use comments and always describe what you are doing. File log in accordance with programming ﬁle.). Be sequential. 2010 12 / 59 Francesco Andreoli .. It is your original object.D.Introduction -1. Inside a speciﬁc project. tables.) or by other softwares (derived databases. data.Statistical Syntax -3. Statistical results can be used either directly in a paper (graphs.

Andrea Bonfatti (Università di Verona) Stata Tutorial January 11-15.System Syntax -2.scheme. . advanced code .D.Introduction -1.ado ﬁles used by programmers.b.Environment -2.smcl output logging ﬁles.sthlp help ﬁles .do program ﬁle containing coding. .hlp.gph extension for graph attributes and to save graphs C.a.txt .Statistical Syntax -3. . results storing .style.E.dta data ﬁles. Francesco Andreoli . the starting point . .csv or . 2010 13 / 59 .log. also in .I.Output Files Extentions in Stata .mata ﬁle extention in Mata environment .

I. options mandatory coding between brackets optional coding between commands in { } are parameters whose value must be speciﬁed underlined letters are abbreviations for commands parts in ’. The general syntax of a command: command varlist =exp if in weight . you can search for command help ﬁles. 2010 14 / 59 Francesco Andreoli .Statistical Syntax -3. even for previously unseen objects.Introduction -1.a. January 11-15. allowing you to recognize always what is the command employed.Environment -2.System Syntax -2.Andrea Bonfatti (Università di Verona) Stata Tutorial .E.D.b.’ are the commands options C. the variables in use and the options.Output Commands All Stata commands are contructed following a precise syntax structure. In this way.

search. Help is the most important command in Stata..Introduction -1. find help command_or_topic_name .. appearing on Stata screen.D.b.Statistical Syntax -3. Use help! You cannot (and you must not) recall all commands options and possible applications.Environment -2. search_options findit word word .I. has the form: 1 2 3 4 5 6 Command syntax Description Options Additional options for related commands Examples See Also C. options search word word .E. Each help ﬁle..System Syntax -2.Output Help The general syntax for help.. 2010 15 / 59 Francesco Andreoli . .a. January 11-15.Andrea Bonfatti (Università di Verona) Stata Tutorial . It provides the full dictionary and help of the commandlist speciﬁed.

In case of errors.System Syntax -2.I.do ﬁles must be intellegible: use comments but be sequential and schematic Do ﬁle is the source of your commands.do ﬁle: positioning Stata write and run the .do File Stata works with commands.Environment -2.Introduction -1.Statistical Syntax -3.b. 2010 16 / 59 Francesco Andreoli .do ﬁle by Stata do-editor (or parts of it) look at results on .E. you can easily correct it an re-run the program.Andrea Bonfatti (Università di Verona) Stata Tutorial .Output Prepare Your .D.a. This would replace erroneous estimation previously done with new ones Follow a operational sequence: 1 2 3 4 5 double click on .log ﬁles save changes and a new database if the original one has changed keep all the results in a organized directory C. January 11-15. so you need to know the syntax Coding in .

Francesco Andreoli . log using dofile.b...System Syntax -2. command list. you are recommanded to follow the three steps here presented (“.Introduction -1.do ﬁle #delimit.log ﬁle Output is recorded in doﬁle. rep. 2010 17 / 59 .a. exit. set mem 150m. . January 11-15.E.” can be deleted if no delimit is speciﬁed): . set more off.I.do File In any application of a doﬁle.log. capture log close. clear.log as it appears in result window doﬁle do doﬁle C.Output Prepare Your ..Andrea Bonfatti (Università di Verona) Stata Tutorial Stata runs Stata .do.Statistical Syntax -3. ..D.Environment -2.

Many of them can be derived (see the help ﬁles) from the ones we put here. Common Syntax section refers to commands and options which allow to use other statistical commands or to menage the data stored in memory.Introduction -1. regression and testing.D.Statistical Syntax -3. All these commands apply exclusively to data stored in Stata memory at the moment in which they are used.I. and they provide a syntetic output (coeﬃcients. Statistical Syntax section contains mainly used statistical tools in Stata to perform data management. graphic analysis. data analysis. They are mainly functional to a statistical program.Environment -2.E. while other can be found in Repec Lybrary or looking at the Stata online help.System Syntax -2.a. It is worth note that our list does not exhaust the full set of statistical routines in Stata. January 11-15. 2010 18 / 59 Francesco Andreoli .b. We will see in the next section how to work with estimates.Andrea Bonfatti (Università di Verona) Stata Tutorial .Output Syntax: an Introduction In the following 2 sections we will describe mainly used syntax in Stata programming. estimates) as well as new variables to be added at the database. C.

a. 2010 19 / 59 Francesco Andreoli .D. January 11-15.b.I.Introduction -1.System Syntax -2.Andrea Bonfatti (Università di Verona) Stata Tutorial .Output Change Folder Shows current working folder pwd Create a new folder mkdir path directoryname Folder content dir path directoryname Change working folder cd path C.Statistical Syntax -3.Environment -2.E.

E. 2010 20 / 59 Francesco Andreoli . n(#) ssc new Update the software update all Update commands adoupdate pkglist .Statistical Syntax -3.Output Search and Upload New Commands Search ssc hot . January 11-15.System Syntax -2. C.D. options adoupdate.Environment -2.Andrea Bonfatti (Università di Verona) Stata Tutorial .b. update Note: These commands work iﬀ Stata is connected with the internet.Introduction -1.a.I.

clear Use data /2 use varlist if in using ﬁlename .Andrea Bonfatti (Università di Verona) Stata Tutorial . clear nolabel Compress the data compress varlist C.dta Files Set memory capacity set memory # b|k|m|g .Statistical Syntax -3.Environment -2.E. permanently Use data /1 use ﬁlename .I.Introduction -1.System Syntax -2.Output Use .D. January 11-15.b.a. 2010 21 / 59 Francesco Andreoli .

I.b.Statistical Syntax -3. insheet insheet varlist using ﬁlename .D.Andrea Bonfatti (Università di Verona) Stata Tutorial ..System Syntax -2. | <space> <tab> Between the most important features: tab: to indicate that data are devided by tabs.Introduction -1.Output Use Delimited Format Data Each variable realization is separated from the other by a given separating character or by tabulation Delimiters: . options . .txt comma: to indicate that data are devided by commas.Environment -2. 2010 22 / 59 Francesco Andreoli .a.E. January 11-15.csv delimiter: speciﬁes the delimiting object between quotations clear: to clean other data stored in memory C.

b.Environment -2.a.Each varible is identiﬁed depending on the required space.System Syntax -2.Output Use Not Delimited Format Data .D. in a . January 11-15. using(ﬁlename2) clear Build a dictionary ﬁle infix dictionary using datafile.E. 2010 23 / 59 Francesco Andreoli . while each observation is a row .Introduction -1.txt data ﬁle .Andrea Bonfatti (Università di Verona) Stata Tutorial .Starting from the left. position is determined in terms of columns.ext { var1 s1-e1 var2 s2-e2 var3 s3-e3 } C.A dictionary set boundary limits of each variable inﬁx infix using dictﬁlename if in .I.Statistical Syntax -3.

replace saveold In text format outsheet outsheet varlist using filename if in .b. January 11-15.Introduction -1. 2010 24 / 59 Francesco Andreoli . not the label replace overwrite the existing ﬁle C.E. options comma data separated by “.a.Environment -2. replace .Output Export Data In Stata format save & saveold save ﬁlename ﬁlename .Andrea Bonfatti (Università di Verona) Stata Tutorial .Statistical Syntax -3.” nolabel export the numeric value.” (usually .I.D.csv) delimiter(“char”) other delimiter. for instance “.txt) instead of tabulation (.System Syntax -2.

Introduction -1.I. January 11-15.System Syntax -2.Output Key Variable(s) Deﬁnition A key variable(s) is that variable (or set of variables) which uniquely identiﬁes each observation How to identify the key variable duplicates report duplicates report varlist if in C.E.Statistical Syntax -3.b.Environment -2. 2010 25 / 59 Francesco Andreoli .Andrea Bonfatti (Università di Verona) Stata Tutorial .D.a.

Statistical Syntax -3.Introduction -1. 2010 26 / 59 Francesco Andreoli .Environment -2.Andrea Bonfatti (Università di Verona) Stata Tutorial .System Syntax -2.b. January 11-15.I.E.a.D.Output Qualiﬁers in and if in restricts the set of observations to which a command applies it refers to the rows identifying the observations not applicable to all commands not sensitive to the sorting of data if speciﬁes the conditions for the execution of a command it applies to the values of variables and always refers to observations not applicable to all commands not sensitive to the sorting of data it requires relational qualiﬁers C.

System Syntax -2.E.D.Output Relational-Logical-Jolly Operators Relational operators > strictly greater of < strictly less of >= greater or equal to <= less or equal to == equal to (note the use of the double sign ==) ˜= or != diﬀerent from Logical operators & (and) it requires that both relations hold | (or) it requires that at least one of the relations holds Jolly characters * any character and for whatever number of times ? any character for one time only . this espression depends on the order of variables!) January 11-15.a contiguous series of variables. 27 / 59 Francesco Andreoli .I.a. (Note.Statistical Syntax -3.Environment -2.Andrea Bonfatti (Università di Verona) Stata Tutorial .b. 2010 C.Introduction -1.

Introduction -1.D.I.Output by and bysort by repeats the command for each group of observations for which the values of the variables in varlistare the same. Without the sort option by requires that the data be sorted by varlist bysort performs the sorting of varlistand then repeats the command by and bysort by varlist: command command bysort varlist: Not all commands are byable.Environment -2. that is support bysort C.System Syntax -2.Andrea Bonfatti (Università di Verona) Stata Tutorial January 11-15.b.Statistical Syntax -3. 2010 28 / 59 .a.E. Francesco Andreoli .

D.System Syntax -2.b. memory_options memory_options short less information and memory space allocated.Output Describe and Label describe describe varlist . number of variables.a. constants) compact yields a more concise report on variables C. number of observations detail more detailed information fullnames variable names not abbreviated codebook codebook varlist if in . options notes displays the notes associated to the variables tabulate(#) shows the values of categorical variables problems detail reports problems to the dataset (missing variables.Statistical Syntax -3.Introduction -1.I. variables without label. 2010 29 / 59 Francesco Andreoli .Andrea Bonfatti (Università di Verona) Stata Tutorial .E. January 11-15.Environment -2.

label define label_name #1 “desc 1” add modify nofix #2 “desc 2” .. ...System Syntax -2.a.I.b. options C..Andrea Bonfatti (Università di Verona) Stata Tutorial ..Environment -2.Statistical Syntax -3.D.Output Describe and Label Put a label to your variable (var# by default) label variable varname “label” Deﬁne a label (a). 2010 30 / 59 Francesco Andreoli .E.Introduction -1.#n “desc n” .. and label your values (b) label values varname label_name . January 11-15.

Output Rename Variables Put a new name on variables rename old_varname new_varname renvars renvars varlist \ newvarlist . 2010 31 / 59 Francesco Andreoli .I. display test varlist .D.a. display test symbol(string) display displays each change upper convert the names in upper case lower convert the names in lower case prefix(str) assign the preﬁx str to the name postfix(str) add str at the end of the name subst(str1 str2) replace all str1 with str2 (str2 can be empty) trim(#) take only the ﬁrst # characters of the name trimend(#) take only the last # characters of the name C.Statistical Syntax -3.Introduction -1.b.System Syntax -2.Environment -2.Andrea Bonfatti (Università di Verona) Stata Tutorial . January 11-15.E. transformation_option .

2010 32 / 59 Francesco Andreoli .D.Output Modify Data Your original database (n × k) can be integrated. C. compressed or shaped: Add observations: type help append You add observations to your database from other data sources (m observations) obtaining a new database (n + m) × k with m > 0 and k = k required to have a balanced sample (otherwise missing values generated for surplus variables).a.E.b.System Syntax -2.Introduction -1.I. 3} is created. Both databases used must have the same key variable(s). showing if missing observations result from merging.Andrea Bonfatti (Università di Verona) Stata Tutorial . A variable _merge∈ {1. January 11-15. 2.Statistical Syntax -3.Environment -2. Add variables: type help merge or help mmerge You add variables to your database from other data sources (h variables) obtaining a new database n × (k + h) with h > 0 and n = n required to have a balanced sample (no missing observations).

. C. Note that information lost cannot be restored from the last database saved.Introduction -1.. Transform but not preserve information: type help collapse Transform a (n × m) × k database in a m × k format.Environment -2. It is not required that all n groups display m observations. sd.Andrea Bonfatti (Università di Verona) Stata Tutorial .. You loose information but you can work with subsample group averaged data..System Syntax -2.D.E. where k ≥ k contains statistics of k as mean.Output Modify Data Transform and preserve information: type help reshape Transform a (n × m) × (k × h) database in a n × (k × h × m) format (wide option) or in a (n × m × h) × k format (long option).b. freq. 2010 33 / 59 Francesco Andreoli . January 11-15. count.a.Statistical Syntax -3.I.

options varname Keep or drop observations keep if condition drop if condition sample # if in .Environment -2.E.Output Sort..Andrea Bonfatti (Università di Verona) Stata Tutorial . 2010 34 / 59 Francesco Andreoli .a.System Syntax -2.D.. count by(groupvars) Keep or drop variables keep (or drop) varlist C. Keep of Drop Variables/Observations Order variables order varlist move varname1 varname2 Sort observations sort varlist in gsort +|- .I.b.Introduction -1. . stable +|varname . January 11-15.Statistical Syntax -3.

xn) returns the minimum value of x1.Andrea Bonfatti (Università di Verona) Stata Tutorial .. xn sum(x) returns the running sum of x treating missing values as zero uniform() returns uniformly distributed pseudorandom numbers on the interval [0..Environment -2.xn) returns the maximum value of x1.. .System Syntax -2. upper(s) return s in lower (upper) case letters C.b..1) invnormal() returns the inverse cumulative standard normal distribution lower(s). 2010 35 / 59 Francesco Andreoli . They are column operators which return a new column of values in the database.Statistical Syntax -3... x2.E. generate generate type newvarname=exp if in An algebraic function between existing variables abs(x) generate the absolute value of each value of the variable x int(x) returns the integer obtained by truncating x toward 0 ln(x) returns the natural logarithm of x max(x1. .. xn min(x1.. January 11-15.. x2.x2.Output Create Variables These commands allow to operate with numeric variables only.I.D.a.....Introduction -1..x2.

.D.Output Create Variables Advanced generate command (for column-wide functions) egen type newvarname = fcn(arguments) if in .Andrea Bonfatti (Università di Verona) Stata Tutorial .b. . for the groups formed by varlist Replace values of a variable replace varname =exp if in C.System Syntax -2.a. 2. January 11-15. treating missing as 0 group(varlist) creates one variable taking on values 1.I. 2010 36 / 59 Francesco Andreoli .. options count(exp) creates a constant (within varlist) containing the number of nonmissing observations of exp mean(varlist) creates a constant (within varlist) containing the mean of exp rowtotal(varlist) creates the (row) sum of the variables in varlist.Introduction -1.Statistical Syntax -3.Environment -2.E.

E. 2010 Francesco Andreoli . or (0) otherwise. or: 1) Recode your data into a dummy variable recode varlist (erule) (erule) .Andrea Bonfatti (Università di Verona) Stata Tutorial .D.I. when the character of interest is present. options prefix(string) create new variables with the preﬁx string This command can be used simply to change sequences of values 2) Follow a programming procedure char varname omit value xi .System Syntax -2.. To generate the variable you can either use generate and than replace missing values generated.Statistical Syntax -3. January 11-15.Introduction -1. generate(newvar) create a new variable if in .a. 37 / 59 char speciﬁes the reference variable of a set of dummies (to evitate perfect collinearity) term speciﬁes with a i.b.. prefix(string) : term(s) C.varname the variables that must be converted in dummies.Environment -2.Output Create Variables: Dummy Variables Dummy variables: variables taking on the values (1).

min.b. abspct.Statistical Syntax -3. sd. p75. options Percentiles on variables: generate pctile values or ranking function pctile type newvar = exp if in if in weight .Environment -2. options C. p25. p95.System Syntax -2. January 11-15. 2010 38 / 59 xtile newvar = exp weight . mean. se. p99. options where the main option is stats() with these possibilities: n. p5.I. p50 or median. p1.Introduction -1. options Francesco Andreoli .Output Continuous Variables To obtain statistics as output coeﬃcients summarize varlist if in weight .D.E. max To obtain statistics on the mean (like ci and se) ci varlist if in weight . vari. miss. detail To obtain statistics between data fsum varlist weight if in .a.Andrea Bonfatti (Università di Verona) Stata Tutorial .

System Syntax -2. generate(newvar1 newvar2 ) p(#) in . options C. correlate_options Compute the correlation (2) pwcorr varlist if in weight .Output Continuous Variables Compute the correlation (1) correlate varlist if in weight .a.Environment -2.Andrea Bonfatti (Università di Verona) Stata Tutorial . pwcorr_options obs print the number of observations for each couple of variables sig print the signiﬁcance level of the correlation star(#) display with the sign * signiﬁcance levels less than # bonferroni use Bonferroni-adjusted signiﬁcance level sidak use Sidak-adjusted signiﬁcance level Check for outliers hadimvo varlist if grubbs varlist if in .E. January 11-15..) generate dummy variables for identifying outliers Francesco Andreoli .I. 2010 39 / 59 drop eliminate the observations identiﬁed as outliers generate(newvar1 .Introduction -1.b..D.Statistical Syntax -3.

Andrea Bonfatti (Università di Verona) Stata Tutorial January 11-15. tabulate_options weight . frequences and missing observations fre varlist if in weight .System Syntax -2.Environment -2.I.Output Discrete Variables Table of frequences for single variable(s) tabulate varname if tab1 varlist if in in weight . tab1_options missing include missing values nolabel display numeric codes rather than value labels sort display the table in descending order of frequency Generate a table of counts.b. 2010 .Statistical Syntax -3.D.E. options nomissing omit missing values from the table nolabel omit labels include(numlist) include only values speciﬁed in numlist ascending display rows in ascending order of frequency descending display rows in discending order of frequency C.Introduction -1.a. 40 / 59 Francesco Andreoli .

Introduction -1. options report Fisher’s exact test gamma report Goodman and Kruskal’s gamma column report the relative frequency within its column of each cell row report the relative frequency within its row of each cell cell report the relative frequency of each cell nofreq do not display frequencies (use only with column.a.System Syntax -2.I.D.Environment -2.E. sd) for varname3 Cross-tabulation of more than 2 variables.Output Discrete Variables Cross-tabulation of 2 variables tabulate varname1 varname2 chi2 report Pearson’s χ exact (#) 2 if in weight . options C.Statistical Syntax -3. row or cell) summarize(varname3) report summary statistics (mean.Andrea Bonfatti (Università di Verona) Stata Tutorial .b. by values tab2 varlist if in weight . January 11-15. 2010 41 / 59 Francesco Andreoli .

. options where in rowvar colvar supercolvar we put categorical variables (up to 3).Introduction -1.b.I.Statistical Syntax -3. by(superrowvarlist) variables to be treated as superrows (up to 4) contents(clist) contents of the table’s cells. where clist may contain up to 5 statistics mean varname mean sd varname standard deviation sum varname sum n varname count of nonmissing observations max.D. p99 varname percentiles iqr varname interquartile range (p75-p25) C..Output Tables of Statistics table table rowvar colvar supercolvar if in weight .Environment -2. 2010 42 / 59 Francesco Andreoli . min varname maximum and minimum value median varname median p1. January 11-15.Andrea Bonfatti (Università di Verona) Stata Tutorial .E.a.System Syntax -2.

in by(varname) a categorical variable and among the options in statistics() we can choose: mean n sum max.Introduction -1.Andrea Bonfatti (Università di Verona) Stata Tutorial .D.p25 C.min iqr interquartile range = p75 .I.System Syntax -2. by(varname) options where in varlist we place a list of continuous variables. January 11-15.Environment -2.Output Tables of Statistics tabstat tabstat varlist if in weight .Statistical Syntax -3.. p99 range = max ..a. 2010 43 / 59 Francesco Andreoli . min sd cv coeﬃcient of variation (sd/mean) semean standard error of mean (sd/sqrt(n)) skewness index of skewness kurtosis index of kurtosis p1.E.b.

sd. means) that you like to compare.Andrea Bonfatti (Università di Verona) Stata Tutorial . once a dicotomus variable identifying groups is selected (use missing values in this dummy varible to select only two subgroups which do not exhaust the sample dimension n).a.Introduction -1.b.Environment -2.I. You can specify distributions. You can specify distributions. Other more speciﬁc tests can be downloaded and installed.Statistical Syntax -3. Test for standard deviations equality: help sdtest Performs t-test for 1) one variable sample sd equality to a constant 2) one variable two subsample sd equality 3) two variables sd equality 4) two variables two subsample sd equality. 44 / 59 Francesco Andreoli . 2010 C. Here is a list of general features: Test for means equality: help ttest Performs t-test for 1) one variable sample mean equality to a constant 2) one variable two subsample means equality 3) two variables means equality 4) two variables two subsample means equality. You can also use the command to perform tests under subgroups mean equality for the same variable. You can use the test_command i to obtain t-tests from inputted data (n.Output Sample Tests Tests apply to variables and allow to compare statistical signiﬁcance of estimates against a null hp on the whole sample observed or for its subgroups.D.E. January 11-15.System Syntax -2.

. lfit.b. for univariate or multivariate representations. The objective is to obtain a classical 2 coordinates graph. C. bar.I. mband. as well as statistical constructions. graphs are built sequentially and all the objects will appear in the same space.Statistical Syntax -3.E. distributions. like scatter plots.a. regression ﬁt. Options and in. Using by()..Environment -2. This is the broadest family of graphics. In this way. connected points.Andrea Bonfatti (Università di Verona) Stata Tutorial . you obtain separeted graphs according to the variable you want to be conditioned to. 2010 45 / 59 Francesco Andreoli ... spike.). tsline.Output Graphs help graph This is the general command for all graphs. help twoway This command applies to bivariate graphics. if must be speciﬁed for each graphic tool you use. lpoly.System Syntax -2. bacause they refer to a particular set of data in use.. connected. conﬁdence intervals. January 11-15. function. you have to select (in pairs) the variables that you want to put in the same graph and the type of graph linking the two variables (ex: scatter. You can add graphs options by looking at the help.Introduction -1. Once twoway is declared. From here you can start searching the graphic style that you need. One graph may contain diﬀerent series (ex: time against income and consumption) or diﬀerent objects (ex: observed and predicted values).D.

censoring and selection bias corrections.E. help tsset can be used to declare a time series structure of your data.b. and than proceed with usual regression techniques. In particular.Andrea Bonfatti (Università di Verona) Stata Tutorial . and how to build econometrics models in Stata. 3SLS.D. Using the command help regress_postestimationts you also obtain information on post-estimation tests and model application syntax.Output Regression Analysis Stata econometrics models can be included into 5 large families. January 11-15. You are invited to read on the help the full characterization of commands. quantile regression.Statistical Syntax -3. Using the command help regress_postestimation you also obtain information on post-estimation tests and model application syntax.a. limited dependent variables methods. 2010 46 / 59 Francesco Andreoli . 2) Time series econometrics You are invited to look at help time. treatment eﬀects models. you will ﬁnd all the list of commands associated with time series estimations. IV.System Syntax -2. C. In each help ﬁle you also ﬁnd examples and interpretations of output results. including OLS.Introduction -1.Environment -2. but only the ﬁrst one will be analysed.I. systems of equations. 1) Cross-section econometrics You are invited to look at help regress for the admissible regression commands.

I.System Syntax -2.Output Regression Analysis 3) Panel data econometrics You are invited to look at help xt. while with help xtreg_postestimation you also obtain information on post-estimation tests and model application syntax.a. help xtreg oﬀers a wide explanation of panel-data analysis techniques.Introduction -1. January 11-15. 4) Survey data analysis See help survey for all the details on data setting and regression techniques 5) Spatial econometrics See help spatreg or spatwmat for geographically located data. C.D. you will ﬁnd all the list of commands associated with the panel dimesion of a database. 2010 47 / 59 Francesco Andreoli .Statistical Syntax -3.Andrea Bonfatti (Università di Verona) Stata Tutorial .b.E.Environment -2. Commands available for Stata 10 or superior. In particular.

Andrea Bonfatti (Università di Verona) Stata Tutorial . GMM or limited info max likelihood. C.b. January 11-15. IV. 2010 48 / 59 Francesco Andreoli .Environment -2.Introduction -1.a. noc options options refers to regression speciﬁc options or estimation correction (robust se..I. multiple logit regress depvar if in weight indepvars if in weight . constant.Statistical Syntax -3.Output Regression Analysis Common models: OLS.System Syntax -2.D.) estimator IV can be performed by 2SLS.E.. noc options ivregress estimator depvar probit depvar mlogit depvar indepvars indepvars varlist1 if if in in (varlist2=varlist_iv) weight weight . options . options . probit.

System Syntax -2.D. 2010 49 / 59 Francesco Andreoli .Output Regression Anlysis Other regression models: List I areg an easier way to ﬁt regressions with many dummy variables arch regression models with ARCH errors arima ARIMA models boxcox Box-Cox regression models cnreg censored-normal regression cnsreg constrained linear regression eivreg errors-in-variables regression frontier stochastic frontier models heckman Heckman selection model intreg interval regression ivregress single-equation instrumental-variables regression ivtobit tobit regression with endogenous variables newey regression with Newey-West standard errors qreg quantile (including median) regression C.I.E.Introduction -1.a.Andrea Bonfatti (Università di Verona) Stata Tutorial .Environment -2.b. January 11-15.Statistical Syntax -3.

Output Regression Anlysis Other regression models: List II tobit tobit regression treatreg treatment-eﬀects model truncreg truncated regression xtabond Arellano-Bond linear dynamic panel-data estimation xtdpd linear dynamic panel-data estimation xtfrontier panel-data stochastic frontier model xtgls panel-data GLS models xthtaylor Hausman-Taylor estimator for error-components models xtintreg panel-data interval regression models xtivreg panel-data instrumental variables (2SLS) regression xtpcse linear regression with panel-corrected standard errors xtreg ﬁxed.E.System Syntax -2.D.and random-eﬀects linear models with an AR(1) disturbance xttobit panel-data tobit models January 11-15.a.Introduction -1.b. 50 / 59 Francesco Andreoli .Environment -2.Andrea Bonfatti (Università di Verona) Stata Tutorial . 2010 C.Statistical Syntax -3.I.and random-eﬀects linear models xtregar ﬁxed.

Predict calculates predictions. residuals.System Syntax -2.I.Andrea Bonfatti (Università di Verona) Stata Tutorial .D.E.Output Post Estimation Any Post Estimation command must be used immediately after a regression model. 2010 51 / 59 Francesco Andreoli . and the like after estimation Predict regression output as new data predict statistic xb linear prediction. statistic C.Environment -2. You can save any model with a name and then proceed in post estimations recalling the model name. and in any case it refers to last estimates stored in Stata memory. the default residuals residuals rstandard standardized residuals rstudent studentized (jackknifed) residuals stdp standard error of the linear prediction stdr standard error of the residual type newvar if in .Statistical Syntax -3. January 11-15.Introduction -1.b.a. inﬂuence statistics.

Andrea Bonfatti (Università di Verona) Stata Tutorial .E.D.I.Output Post Estimation Test linear hypothesis on coeﬃcients test coeﬂist test exp=exp =.a.. 2010 52 / 59 Francesco Andreoli .Introduction -1. Test on residuals: normality and zero-mean sktest varname ttest varname = 0 acatter/mband Test on residuals: heteroskedasticity by graph and tests rvplot estat hettest estat imtest whitetst C. January 11-15.b.Environment -2.Statistical Syntax -3..System Syntax -2.

System Syntax -2.Environment -2. 2010 53 / 59 Francesco Andreoli .D. January 11-15.Statistical Syntax -3.E.a.b.Output Post Estimation Test on variables: inﬂuential values dfbeta Logic of the dfbeta test: 1 2 3 Estimation with the complete sample Estimation without the i-th observation Comparison Test on variables: collinearity vif collin NOTE: values of VIF > 10 deserve attention C.Andrea Bonfatti (Università di Verona) Stata Tutorial .I.Introduction -1.

D. add columns).Output Output: Interpreting Results from computations are showed in the Results Window and stored in memory by the . the mean of a variable. In this Programming section we will see how output coeﬃcients. Remember that all the syntax previously presented applies exclusively to data stored in memory.a.e. its standard deviation.Andrea Bonfatti (Università di Verona) Stata Tutorial . we can perform advanced econometrics on data. Stata oﬀers other opportunites to list estimates and coeﬃcients in a proper way (ex.I.System Syntax -2.log ﬁle. matrices (ex. 2010 54 / 59 Francesco Andreoli . or an R2 estimation) can be properly represented and used. with standard errors between brackets) whithout requiring additional changes in estimates values. C.b. To apply this syntax to your output coeﬃcients (for example compute the average of average values obtained by bootstrapping) you need to transform your results in data format (i.Introduction -1.Environment -2. By matrix algebra. vectors (ex. January 11-15. the regression parameters).E.Statistical Syntax -3. the variance-covariance matrix of regression coeﬃcients) or scalars (ex. You can paste your output directly on your paper (or in other spreadsheet to rework them) by using the tables reported in the .log ﬁle.

January 11-15. N.Statistical Syntax -3. you need to perform your tests or report coeﬃcients independently from your output windows results (ex: t-test for means).D.Andrea Bonfatti (Università di Verona) Stata Tutorial . Therefore.b. 2 Save your results and estimates in the memory by using r() or e() respectively.System Syntax -2. In this way you generate a new object.Introduction -1. 2010 55 / 59 Francesco Andreoli .a.: addstat(r2. save regression results outreg varlist using ﬁlename . coefstar signiﬁcance opt. 3 Use your objects or display them in the output ﬁles. 3aster stat opt.Output Show Estimates Your code must be operable even after that changes in your data occurs (by merge or append). The general procedure is the following: 1 Perform your statistical procedures or your estimations (like summarize or regress). Stata allows to save the estimates as scalars or vectors/matrices.I.: nol. options text opt.: bdec().: se | p | ci | beta. bracket. Alternatively.Environment -2. xstats other opt.E. F). append C. addn() coeﬃcients opt. title().: replace.

Show estimates After statistics: see help return After regressions: see help ereturn Scalar deﬁne. 2010 56 / 59 Francesco Andreoli .Output Show Estimates See help postest for a general description of post estimations commands.Environment -2. use and report estimates.a.E.Andrea Bonfatti (Università di Verona) Stata Tutorial . January 11-15. help estimates provides all the info to store.Statistical Syntax -3.b.I. after summary statistics or table are reported scalar define scalar_name = exp scalar list scalar_list scalar drop scalar_list C.Introduction -1. All commands work iﬀ there is some result stored in the memory.D.System Syntax -2.

drop(coeflist) to eliminate some coeﬀ from the table.E. p specify the format of coeﬀ reported and additioal stats. To use results t(names). 2010 57 / 59 Francesco Andreoli .I.2f).b.D. options stats(N r2 ll chi2 aic bic rank) reports statistics in the table. nocopy To report statistics and estimates tables from model name estimates stats namelist estimates table namelist .Environment -2. r(coef). b(%9. r(stats) C. January 11-15.Statistical Syntax -3. use: To store estimates in the memory (use name of your model) estimates store name estimates dir scalar drop name_list .a. t.System Syntax -2.Andrea Bonfatti (Università di Verona) Stata Tutorial .Introduction -1. se. n(#) . keep(coeflist).Output Show Estimates After a regression command.

you can use previously seen commands. See help mata for review of commands.Output Work with Data as Matrices Data can be exported as a matrix n × k (a vector is a single column matrix). If you operate with matrices. C. predictions) can be transformed in data from matrices.Environment -2.Andrea Bonfatti (Università di Verona) Stata Tutorial . If you transform matrices in data (see the data editor). any other command will not be working.System Syntax -2. ¯ X = (e e)−1 e X).D.Statistical Syntax -3. January 11-15. Use help matrix for an exhaustive review of commands to be used. New data created (ex. any statistics or estimation can be performed and result listed in the matrix format.E. To perform more complicated computation or build your program.Introduction -1. 2010 58 / 59 Francesco Andreoli . use the Mata environment which allows to use Stata more interactively. available in Stata 9 or superior.b. use algebra to compute statistics (ex.I.a..

Statistical Syntax -3.Environment -2....System Syntax -2.][\#[.Andrea Bonfatti (Università di Verona) Stata Tutorial . #..b.][\[.a.Introduction -1. #.I. 2010 59 / 59 Francesco Andreoli . January 11-15. matrix(matname) nomissing rownames(varname) svmat matname . Input Matrix mkmat varlist if in ... names( col | eqcol | matcol | string ) matrix input matname = (#[. Matrix → Data.D.Output Work with Data as Matrices Set the size of a matrix set matsize # Data → Matrix.E.]]]) Operations with matrices matrix define matname = matrix_expression C.

Are you sure?

This action might not be possible to undo. Are you sure you want to continue?

We've moved you to where you read on your other device.

Get the full title to continue

Get the full title to continue listening from where you left off, or restart the preview.

scribd