**(ver. 5.8) Oscar Torres-Reyna**

Data Consultant

otorres@princeton.edu

December 2007 (first draft)

http://dss.princeton.edu/training/

PU/DSS/OTR

**Stata Tutorial Topics**

• What is Stata?
• Stata screen and general description
• First steps:
    • Setting the working directory (pwd and cd ….)
    • Log file (log using …)
    • Memory allocation (set mem …)
    • Do-files (doedit)
    • Opening/saving a Stata datafile
    • Quick way of finding variables
    • Subsetting (using conditional “if”)
    • Stata color coding system
• From SPSS/SAS to Stata
• Example of a dataset in Excel
• From Excel to Stata (copy-and-paste, *.csv)
• Describe and summarize
• Rename
• Variable labels
• Adding value labels
• Creating new variables (generate)
• Creating new variables from other variables (generate)
• Recoding variables (recode)
• Recoding variables using egen
• Changing values (replace)
• Indexing (using _n and _N)
• Creating ids and ids by categories
• Lags and forward values
• Countdown and specific values
• Sorting (ascending and descending order)
• Deleting variables (drop)
• Dropping cases (drop if)
• Extracting characters from regular expressions
• Merge
• Append
• Merging fuzzy text (reclink)
• Frequently used Stata commands
• Exploring data:
    • Frequencies (tab, table)
    • Crosstabulations (with test for associations)
    • Descriptive statistics (tabstat)
• Examples of frequencies and crosstabulations
• Three way crosstabs
• Three way crosstabs (with average of a fourth variable)
• Creating dummies
• Graphs
    • Scatterplot
    • Histograms
    • Catplot (for categorical data)
    • Bars (graphing mean values)
• Data preparation/descriptive statistics (open a different file): http://dss.princeton.edu/training/DataPrep101.pdf
• Linear Regression (open a different file): http://dss.princeton.edu/training/Regression101.pdf
• Panel data (fixed/random effects) (open a different file): http://dss.princeton.edu/training/Panel101.pdf
• Multilevel Analysis (open a different file): http://dss.princeton.edu/training/Multilevel101.pdf
• Time Series (open a different file): http://dss.princeton.edu/training/TS101.pdf
• Useful sites (links only)
• Is my model OK?
• I can’t read the output of my model!!!
• Topics in Statistics
• Recommended books

What is Stata?

• It is a multi-purpose statistical package to help you explore, summarize and analyze datasets.
• A dataset is a collection of several pieces of information called variables (usually arranged by columns). A variable can have one or several values (information for one or several cases).
• Other statistical packages are SPSS, SAS and R.
• Stata is widely used in social science research and is the most used statistical software on campus.

| Features | Learning curve | User interface | Data manipulation | Data analysis | Graphics | Cost |
|---|---|---|---|---|---|---|
| Stata | Steep/gradual | Programming/point-and-click | Very strong | Powerful | Very good | Affordable (perpetual licenses, renew only when upgrading) |
| SPSS | Gradual/flat | Mostly point-and-click | Moderate | Powerful | Very good | Expensive (but no need to renew until upgrade, long term licenses) |
| SAS | Pretty steep | Programming | Very strong | Powerful/versatile | Good | Expensive (yearly renewal) |
| R | Pretty steep | Programming | Very strong | Powerful/versatile | Excellent | Open source (free) |

This is the Stata screen… and here is a brief description… [screenshots of the Stata screen and its windows]

First steps: Working directory

To see your working directory, type pwd

    . pwd
    h:\statadata

To change the working directory to avoid typing the whole path when calling or saving files, type: cd c:\mydata

    . cd c:\mydata
    c:\mydata

Use quotes if the new directory has blank spaces, for example cd "h:\stata and data"

    . cd "h:\stata and data"
    h:\stata and data

First steps: log file

Create a log file, sort of Stata’s built-in tape recorder and where you can: 1) retrieve the output of your work and 2) keep a record of your work.

In the command line type:

    log using mylog.log

This will create the file ‘mylog.log’ in your working directory. You can read it using any word processor (notepad, word, etc.).

To close a log file type:

    log close

To add more output to an existing log file add the option append, type:

    log using mylog.log, append

To replace a log file add the option replace, type:

    log using mylog.log, replace

Note that the option replace will delete the contents of the previous version of the log.
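The log commands above can be combined into one short session. This is a minimal sketch rather than a prescribed workflow: the file name mylog.log comes from the text, while the display lines are placeholders standing in for real work, and capture log close is added only so the sketch does not fail if a log is already open.

```stata
* Minimal sketch of a logging session (display lines are placeholders)
capture log close                  // close any log that is already open, ignore errors
log using mylog.log, replace text  // start a fresh plain-text log
display "first batch of output"
log close

log using mylog.log, append text   // reopen the same file and keep adding to it
display "second batch of output"
log close
```

Using replace at the start and append later keeps one log per project while avoiding the "file already exists" error on re-runs.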

First steps: set the correct memory allocation

If you get the following error message while opening a datafile or adding more variables:

    no room to add more observations
    An attempt was made to increase the number of observations beyond what is
    currently possible. You have the following alternatives:
    1. Store your variables more efficiently; see help compress. (Think of Stata's
       data area as the area of a rectangle; Stata can trade off width and length.)
    2. Drop some variables or observations; see help drop.
    3. Increase the amount of memory allocated to the data area using the set memory
       command; see help memory.

You need to set the correct memory allocation for your data or the maximum number of variables allowed. Some big datasets need more memory; depending on the size you can type, for example:

    . set mem 700m

    Current memory allocation
                        current                                 memory usage
        settable          value     description                 (1M = 1024k)
        set maxvar         5000     max. variables allowed           1.909M
        set memory         700M     max. data space                700.000M
        set matsize         400     max. RHS vars in models          1.254M
                                                                   703.163M

Note: If this does not work try a bigger number.

*To allow more variables type set maxvar 10000

First steps: do-file

Do-files are ASCII files that contain Stata commands to run specific procedures. It is highly recommended to use do-files to store your commands so you do not have to type them again should you need to re-do your work.

You can use any word processor and save the file in ASCII format, or you can use Stata’s ‘do-file editor’ with the advantage that you can run the commands from there. Type:

    doedit

Check the following site for more info on do-files: http://www.princeton.edu/~otorres/Stata/

First steps: Opening/saving Stata files (*.dta)

To open files already in Stata with extension *.dta, run Stata and you can either:

• Go to file->open in the menu, or
• Type use "c:\mydata\mydatafile.dta"

If your working directory is already set to c:\mydata, just type

    use mydatafile

To save a data file from Stata go to file – save as or just type:

    save, replace

If the dataset is new or just imported from other format go to file –> save as or just type:

    save mydatafile /*Pick a name for your file*/

For ASCII data please see http://dss.princeton.edu/training/DataPrep101.pdf

First steps: Quick way of finding variables (lookfor)

You can use the command lookfor to find variables in a dataset. For example, if you want to see which variables refer to education, type:

    lookfor educ

    . lookfor educ
                    storage  display     value
    variable name   type     format      label      variable label
    educ            byte     %10.0g                 Education of R.

lookfor will look for the keyword ‘educ’ in the variable names and labels. You will need to be creative with your keyword searches to find the variables you need.

It is always recommended to use the codebook that comes with the dataset to have a better idea of where things are.

First steps: Subsetting using conditional ‘if’

Sometimes you may want to get frequencies, crosstabs or run a model just for a particular group (let's say just for females or people younger than a certain age). You can do this by using the conditional ‘if’, for example:

    /*Frequencies of var1 when gender = 1*/
    tab var1 if gender==1

    /*Frequencies of var1 when gender = 1 and age < 33*/
    tab var1 if gender==1 & age<33

    /*Frequencies of var1 when gender = 1 and marital status = single*/
    tab var1 if gender==1 & (marital==2 | marital==3 | marital==4)

    /*You can do the same with crosstabs; the options column and row need two variables*/
    tab var1 var2 if gender==1, column row

    /*Regression when gender = 1 and age < 33*/
    regress y x1 x2 if gender==1 & age<33, robust

    /*Scatterplots when gender = 1 and age < 33*/
    scatter var1 var2 if gender==1 & age<33

“if” goes at the end of the command BUT before the comma that separates the options from the command. Stata evaluates & before |, so use parentheses when you combine them.
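To try these patterns without your own data, here is a small sketch on the auto dataset that ships with Stata. The dataset and its variables (rep78, foreign, price, mpg, weight) are stand-ins chosen for illustration; they are not part of this tutorial's data.

```stata
* Sketch: conditional "if" on Stata's bundled auto data (illustrative variables)
sysuse auto, clear

* One-way frequencies for a subgroup
tab rep78 if foreign==1

* Crosstab for a subgroup; "if" comes before the comma, options after it
tab rep78 foreign if price<6000 & mpg>20, column row

* Parentheses make the grouping of & and | explicit
count if foreign==1 & (rep78==3 | rep78==4)

* Regression for a subgroup with robust standard errors
regress price mpg weight if foreign==0, robust
```

Without the parentheses in the count line, Stata would read the condition as (foreign==1 & rep78==3) | rep78==4 and include domestic cars as well.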

First steps: Stata color-coded system

An important step is to make sure variables are in their expected format. Stata has a color-coded system for each type. Black is for numbers, red is for text or string and blue is for labeled variables.

• For var1 a value 2 has the label “Fairly well”. It is still a numeric variable.
• Var2 is a string variable even though you see numbers. You can’t do any statistical procedure with this variable other than simple frequencies.
• Var3 is a numeric variable. You can do any statistical procedure with this variable.
• Var4 is clearly a string variable. You can do frequencies and crosstabulations with this but not statistical procedures.

First steps: graphic view

Three basic procedures you may want to do first: create a log file (sort of Stata’s built-in tape recorder and where you can retrieve the output of your work), set your working directory, and set the correct memory allocation for your data.

1 Click on “Save as type:” right below “File name:” and select Log (*.log). This will create the file called Log1.log (or whatever name you want with extension *.log) which can be read by any word processor or by Stata (go to File – Log – View). If you save it as *.smcl (Formatted Log) only Stata can read it. It is recommended to save the log file as *.log. The log file will record everything you type including the output.

2 Shows your current working directory. You can change it by typing cd c:\mydirectory

3 When dealing with really big datasets you may want to increase the memory:

    set mem 700m /*You type this in the command window */

To estimate the size of the file you can use the formula:

    Size (in bytes) = (8*Number of cases or rows*(Number of variables + 8))
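As a worked example of the size formula, the sketch below plugs in 30 cases and 14 variables, the dimensions of the students file used later in this tutorial. The scalar names are made up for illustration.

```stata
* Sketch: estimated file size from the rule-of-thumb formula
scalar ncases = 30        // number of cases or rows
scalar nvars  = 14        // number of variables
scalar bytes  = 8*ncases*(nvars + 8)
display "Estimated size: " bytes " bytes, or " bytes/1024^2 " MB"
```

For a dataset this small the estimate is far below any memory setting; the formula matters when the number of rows runs into the millions.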

From SPSS/SAS to Stata

If your data is already in SPSS format (*.sav) or SAS (*.sas7bdat), there are two options:

Option A) Use Stat/Transfer, see here http://dss.princeton.edu/training/StatTransfer.pdf

Option B) You can use the command usespss to read SPSS files in Stata or the command usesas to read SAS files. If you have a file in SAS XPORT format you can use fduse (or go to file-import).

For SPSS and SAS, you may need to install it by typing

    ssc install usespss
    ssc install usesas

Once installed just type

    usespss using "c:\mydata.sav"
    usesas using "c:\mydata.sas7bdat"

Type help usespss or help usesas for more details.

For ASCII data please see http://dss.princeton.edu/training/DataPrep101.pdf

Example of a dataset in Excel

Variables are arranged by columns and cases by rows. Each variable has more than one value.

Path to the file: http://www.princeton.edu/~otorres/Stata/Students.xls

Excel to Stata (copy-and-paste)

1 - To go from Excel to Stata you simply copy-and-paste data into Stata’s “Data editor”, which you can open by clicking on the data editor icon.

2 - This window will open; it is the data editor.

3 - Press Ctrl-v to paste the data from Excel…

Saving the dataset

1 - Close the data editor by pressing the “X” button on the upper-right corner of the editor.

2 - The “Variables” window will show all the variables in your data.

3 - Do not forget to save the file. In the command window type:

    save students, replace

You can also use the menu, go to File – Save As.

4 - This is what you will see in the output window: the data has been saved as students.dta

NOTE: You need to close the data editor or data browser to continue working.

Excel to Stata (using insheet) step 1

Another way to bring Excel data into Stata is by saving the Excel file as *.csv (comma-separated values) and importing it in Stata using the insheet command.

In Excel go to File->Save as and save the Excel file as *.csv. You may get some warning messages; click OK and YES…

Excel to Stata (insheet using *.csv) step 2

1 In Stata go to File->Import->”ASCII data created by spreadsheet”.

2 Click on ‘Browse’ to find the file and then OK.

As an alternative to using the menu you can type:

    insheet using "c:\mydata\mydatafile.csv"

Command: describe

To get a general description of the dataset and the format for each variable type describe

    . describe
    Contains data from http://dss.princeton.edu/training/students.dta
      obs:            30
     vars:            14                          29 Sep 2009 17:12
     size:         2,580 (99.9% of memory free)

                      storage  display   value
    variable name     type     format    label    variable label
    id                byte     %8.0g              ID
    lastname          str5     %9s                Last Name
    firstname         str6     %9s                First Name
    city              str14    %14s               City
    state             str14    %14s               State
    gender            str6     %9s                Gender
    studentstatus     str13    %13s               Student Status
    major             str8     %9s                Major
    country           str9     %9s                Country
    age               byte     %8.0g              Age
    sat               int      %8.0g              SAT
    averagescoreg~e   byte     %8.0g              Average score (grade)
    heightin          byte     %8.0g              Height (in)
    newspaperread~k   byte     %8.0g              Newspaper readership

Type help describe for more information…

Command: summarize

Type summarize to get some basic descriptive statistics.

    . summarize
        Variable |  Obs        Mean   Std. Dev.    Min    Max
              id |   30        15.5    8.803408      1     30
        lastname |    0
       firstname |    0
            city |    0
           state |    0
          gender |    0
    studentsta~s |    0
           major |    0
         country |    0
             age |   30        25.2    6.870226     18     39
             sat |   30      1848.9    275.1122   1338   2309
    averagesco~e |   30    80.36667    10.11139     63     96
        heightin |   30    66.43333    4.658573     59     75
    newspaperr~k |   30    4.866667    1.279368      3      7

Zeros in the ‘Obs’ column indicate string variables.

Use ‘min’ and ‘max’ values to check for a valid range in each variable. For example, ‘age’ should have the expected values (‘don’t know’ or ‘no answer’ are usually coded as 99 or 999).

Type help summarize for more information…

Exploring data: frequencies

Frequency refers to the number of times a value is repeated. Frequencies are used to analyze categorical data. The tables below are frequency tables; values are in ascending order. In Stata use the command tab varname.

    . tab major
       Major |  Freq.   Percent     Cum.
        Econ |     10     33.33    33.33
        Math |     10     33.33    66.67
    Politics |     10     33.33   100.00
       Total |     30    100.00

• ‘Freq.’ provides a raw count of each value. In this case 10 students for each major.
• ‘Percent’ gives the relative frequency for each value. For example, 33.33% of the students in this group are econ majors.
• ‘Cum.’ is the cumulative frequency in ascending order of the values. For example, 66.67% of the students are econ or math majors.

    . tab readnews
       Newspaper |
     readership  |
      (times/wk) |  Freq.   Percent     Cum.
               3 |      6     20.00    20.00
               4 |      5     16.67    36.67
               5 |      9     30.00    66.67
               6 |      7     23.33    90.00
               7 |      3     10.00   100.00
           Total |     30    100.00

• ‘Freq.’ Here 6 students read the newspaper 3 days a week, 9 students read it 5 days a week.
• ‘Percent’. Those who read the newspaper 3 days a week represent 20% of the sample, 30% of the students in the sample read the newspaper 5 days a week.
• ‘Cum.’ 66.67% of the students read the newspaper 3 to 5 days a week.

Type help tab for more details.

Exploring data: frequencies and descriptive statistics (using table)

Command table produces frequencies and descriptive statistics per category. For more info and a list of all statistics type help table. Here are some examples, type

    table gender, contents(freq mean age mean score)

    . table gender, contents(freq mean age mean score)
    Gender |  Freq.   mean(age)   mean(score)
    Female |     15        23.2      78.73333
      Male |     15        27.2            82

The mean age of females is 23 years, for males it is 27. The mean score is about 79 for females and 82 for males.

Here is another example:

    table major, contents(freq mean age mean sat mean score mean readnews)

    . table major, contents(freq mean age mean sat mean score mean readnews)
       Major |  Freq.   mean(age)   mean(sat)   mean(score)   mean(read~s)
        Econ |     10        23.8        1806          76.2            4.4
        Math |     10          23        1844          79.8            5.3
    Politics |     10        28.8      1896.7          85.1            4.9

Exploring data: crosstabs

Also known as contingency tables, crosstabs help you to analyze the relationship between two or more categorical variables. Below is a crosstab between the variable ‘ecostatu’ and ‘gender’. We use the command tab var1 var2. The options ‘column’ and ‘row’ give you the column and row percentages.

    . tab ecostatu gender, column row

    Key: frequency / row percentage / column percentage

       Status of |  Gender of Respondent
       Nat'l Eco |      Male     Female |     Total
    -------------+----------------------+----------
       Very well |        90         59 |       149
                 |     60.40      39.60 |    100.00
                 |     14.33       7.92 |     10.85
     Fairly well |       337        333 |       670
                 |     50.30      49.70 |    100.00
                 |     53.66      44.70 |     48.80
    Fairly badly |       139        209 |       348
                 |     39.94      60.06 |    100.00
                 |     22.13      28.05 |     25.35
      Very badly |        57        134 |       191
                 |     29.84      70.16 |    100.00
                 |      9.08      17.99 |     13.91
        Not sure |         2         10 |        12
                 |     16.67      83.33 |    100.00
                 |      0.32       1.34 |      0.87
         Refused |         3          0 |         3
                 |    100.00       0.00 |    100.00
                 |      0.48       0.00 |      0.22
    -------------+----------------------+----------
           Total |       628        745 |     1,373
                 |     45.74      54.26 |    100.00
                 |    100.00     100.00 |    100.00

• The first value in a cell tells you the number of observations for each xtab. In this case, 90 respondents are ‘male’ and said that the economy is doing ‘very well’, 59 are ‘female’ and believe the economy is doing ‘very well’.
• The second value in a cell gives you row percentages for the first variable in the xtab. Out of those who think the economy is doing ‘very well’, 60.40% are males and 39.60% are females.
• The third value in a cell gives you column percentages for the second variable in the xtab. Among males, 14.33% think the economy is doing ‘very well’ while 7.92% of females have the same opinion.

NOTE: You can use tab1 for multiple frequencies or tab2 to run all possible crosstabs combinations. Type help tab for further details.

Exploring data: crosstabs (a closer look)

You can use crosstabs to compare responses among categories in relation to aggregate responses. In the crosstab of ‘ecostatu’ by ‘gender’ (tab ecostatu gender, column row) we can see how opinions for males and females diverge from the national average.

As a rule-of-thumb, a margin of error of ±4 percentage points can be used to indicate a significant difference (some use ±3). For example, rounding up the percentages, 11% (10.85) answer ‘very well’ at the national level. With the margin of error, this gives a range roughly between 7% and 15%; anything beyond this range could be considered significantly different (remember this is just an approximation). There does not appear to be a significant bias between males and females for this answer.

In the ‘fairly well’ category we have 49%, with a range between 45% and 53%. The response for males is 54% and for females 45%. We could say here that males tend to be a bit more optimistic on the economy and females tend to be a bit less optimistic.

If we aggregate responses, we could get a better picture:

    recode ecostatu (1 2 = 1 "Well") (3 4 = 2 "Bad") (5 6=3 "Not sure/ref"), gen(ecostatu1) label(eco)

    . tab ecostatu1 gender, column row

    Key: frequency / row percentage / column percentage

       RECODE of |
        ecostatu |
      (Status of |  Gender of Respondent
      Nat'l Eco) |      Male     Female |     Total
    -------------+----------------------+----------
            Well |       427        392 |       819
                 |     52.14      47.86 |    100.00
                 |     67.99      52.62 |     59.65
             Bad |       196        343 |       539
                 |     36.36      63.64 |    100.00
                 |     31.21      46.04 |     39.26
    Not sure/ref |         5         10 |        15
                 |     33.33      66.67 |    100.00
                 |      0.80       1.34 |      1.09
    -------------+----------------------+----------
           Total |       628        745 |     1,373
                 |     45.74      54.26 |    100.00
                 |    100.00     100.00 |    100.00

In the table above 68% of males believe the economy is doing well (compared to 60% at the national level), while 46% of females think the economy is bad (compared to 39% aggregate). Males seem to be more optimistic than females.

Exploring data: crosstabs (test for associations)

To see whether there is a relationship between two variables you can choose a number of tests. Some apply to nominal variables, some others to ordinal. I am running all of them here for presentation purposes.

    tab ecostatu1 gender, column row nokey chi2 lrchi2 V exact gamma taub

The options are: chi2 for X2 (chi-square), lrchi2 for the likelihood-ratio χ2 (chi-square), V for Cramer’s V, exact for Fisher’s exact test, gamma for Goodman & Kruskal’s γ (gamma) and taub for Kendall’s τb (tau-b).

    . tab ecostatu1 gender, column row nokey chi2 lrchi2 V exact gamma taub

    Enumerating sample-space combinations:
    stage 3:  enumerations = 1
    stage 2:  enumerations = 16
    stage 1:  enumerations = 0

       RECODE of |
        ecostatu |
      (Status of |  Gender of Respondent
      Nat'l Eco) |      Male     Female |     Total
    -------------+----------------------+----------
            Well |       427        392 |       819
                 |     52.14      47.86 |    100.00
                 |     67.99      52.62 |     59.65
             Bad |       196        343 |       539
                 |     36.36      63.64 |    100.00
                 |     31.21      46.04 |     39.26
    Not sure/ref |         5         10 |        15
                 |     33.33      66.67 |    100.00
                 |      0.80       1.34 |      1.09
    -------------+----------------------+----------
           Total |       628        745 |     1,373
                 |     45.74      54.26 |    100.00
                 |    100.00     100.00 |    100.00

              Pearson chi2(2) =  33.5266   Pr = 0.000
     likelihood-ratio chi2(2) =  33.8162   Pr = 0.000
                   Cramér's V =   0.1563
                        gamma =   0.3095  ASE = 0.050
              Kendall's tau-b =   0.1553  ASE = 0.026
               Fisher's exact =                 0.000

• X2 (chi-square) tests for relationships between variables. The null hypothesis (Ho) is that there is no relationship. To reject this we need a Pr < 0.05 (at 95% confidence). Here both chi2 are significant. Therefore we conclude that there is some relationship between perceptions of the economy and gender. lrchi2 reads the same way.
• Cramer’s V is a measure of association between two nominal variables. It goes from 0 to 1 where 1 indicates strong association (for rXc tables). In 2x2 tables, the range is -1 to 1. Here the V is 0.16, which shows a small association.
• Gamma and taub are measures of association between two ordinal variables (both have to be in the same direction, i.e. negative to positive, low to high). Both go from -1 to 1. Negative shows an inverse relationship, closer to 1 a strong relationship. Gamma is recommended when there are lots of ties in the data. Taub is recommended for square tables.
• Fisher’s exact test is used when there are very few cases in the cells (usually less than 5). It tests the relationship between two variables. The null is that variables are independent. Here we reject the null and conclude that there is some kind of relationship between variables.

In short:

– For nominal data use chi2, lrchi2, V
– For ordinal data use gamma and taub
– Use exact instead of chi2 when frequencies are less than 5 across the table.
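If you only have the cell counts, the immediate form tabi lets you rerun these tests without the original survey file. The sketch below types in the counts of the recoded table (rows Well, Bad, Not sure/ref; columns Male, Female) and should agree with the Pearson chi2 and Cramér's V reported above; it assumes nothing beyond those published counts.

```stata
* Sketch: tests of association from cell counts only
* Rows: Well, Bad, Not sure/ref. Columns: Male, Female.
tabi 427 392 \ 196 343 \ 5 10, chi2 lrchi2 V

* Stored results left behind by tabulate
display "Pearson chi2 = " r(chi2) "   p-value = " r(p)

* For a table whose smaller dimension is 2, V = sqrt(chi2/N)
display "Cramer's V by hand = " sqrt(r(chi2)/r(N))
```

tabi leaves the dataset in memory untouched, so it is convenient for checking a published table while working on something else.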

Exploring data: descriptive statistics

For continuous data use descriptive statistics. These statistics are a collection of measurements of location and variability. Location tells you the central value of the variable (the mean is the most common measure of this). Variability refers to the spread of the data from the center value (i.e. variance, standard deviation). Statistics is basically the study of what causes such variability. We use the command tabstat to get these stats.

    tabstat age sat score heightin readnews, s(mean median sd var count range min max)

    . tabstat age sat score heightin readnews, s(mean median sd var count range min max)
       stats |       age        sat      score   heightin   readnews
        mean |      25.2     1848.9   80.36667   66.43333   4.866667
         p50 |        23       1817       79.5       66.5          5
          sd |  6.870226   275.1122   10.11139   4.658573   1.279368
    variance |      47.2   75686.71   102.2402    21.7023   1.636782
           N |        30         30         30         30         30
       range |        21        971         33         16          4
         min |        18       1338         63         59          3
         max |        39       2309         96         75          7

Type help tabstat for a complete list of descriptive statistics.

• The mean is the sum of the observations divided by the total number of observations.
• The median (p50 in the table above) is the number in the middle. To get the median you have to order the data from lowest to highest. If the number of cases is odd the median is the single value, for an even number of cases the median is the average of the two numbers in the middle.
• The standard deviation is the square root of the variance. It indicates how close the data is to the mean. Assuming a normal distribution, 68% of the values are within 1 sd from the mean, 95% within 2 sd and 99% within 3 sd.
• The variance measures the dispersion of the data from the mean. It is the average of the squared distances from the mean (Stata divides the sum of squares by N – 1).
• Count (N in the table) refers to the number of observations per variable.
• Range is a measure of dispersion. It is the difference between the largest and smallest value, max – min.
• Min is the lowest value in the variable.
• Max is the largest value in the variable.
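The definitions above can be checked by hand on a tiny dataset. The eight values below are made up for illustration; the sketch computes the variance from the squared deviations and compares it with what summarize stores.

```stata
* Sketch: variance and standard deviation by hand (made-up data)
clear
input x
2
4
4
4
5
5
7
9
end

quietly summarize x
scalar m = r(mean)                     // mean of x
generate sqdev = (x - m)^2             // squared distance from the mean
quietly summarize sqdev
scalar v = r(sum)/(r(N) - 1)           // sample variance divides by N - 1
display "by hand: variance = " v "   sd = " sqrt(v)

quietly summarize x
display "summarize: variance = " r(Var) "   sd = " r(sd) "   range = " r(max) - r(min)
```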

e.43333 66. p50. PU/DSS/OTR .5 10.587 94609.55238 15 12 63 75 66.870226 47.46667 71 3.66012 113.4 63 3.2 28 6. gender.7023 30 16 59 75 readnews 5.2 23 6.5 4.42857 15 31 65 96 80.2 30 21 18 39 sat 1871.74 15 971 1338 2309 1826 1787 247.773899 45. N.533333 4 1. max by categories of: gender (Gender) gender Female age 23.6381 15 32 63 95 82 82 9.1122 75686.0752 61046.685714 15 9 59 68 69. etc.71 30 971 1338 2309 score 78.2 20 6. s(mean median sd var count range min max) by(gender) .457143 15 4 3 7 4.658573 21.2402 30 33 63 96 heightin 63. variance.943651 15.302013 1. age. s(mean median sd var count range min max) by(gender) Summary statistics: mean.Exploring data: descriptive statistics You could also estimate descriptive statistics by subgroups (i.207122 1.73333 79 10.88571 15 21 18 39 25. min.2 5 1.11139 102. sd.) tabstat age sat score heightin readnews.695238 15 4 3 7 4. range.636782 30 4 3 7 Male Total Type help tabstat for more options.8 1821 307.581359 43. tabstat age sat score heightin readnews.112188 9.31429 15 20 18 38 27.279368 1.14 15 845 1434 2279 1848.36667 79.9 1817 275.613978 92.866667 5 1.

Examples of frequencies and crosstabulations

Frequencies (tab command)

    . tab gender
    Gender |  Freq.   Percent     Cum.
    Female |     15     50.00    50.00
      Male |     15     50.00   100.00
     Total |     30    100.00

In this sample we have 15 females and 15 males. Each represents 50% of the total cases.

Crosstabulations (tab with two variables)

    . tab gender studentstatus, column row

    Key: frequency / row percentage / column percentage

           |    Student Status
    Gender |  Graduate  Undergrad |     Total
    -------+----------------------+----------
    Female |         5         10 |        15
           |     33.33      66.67 |    100.00
           |     33.33      66.67 |     50.00
      Male |        10          5 |        15
           |     66.67      33.33 |    100.00
           |     66.67      33.33 |     50.00
    -------+----------------------+----------
     Total |        15         15 |        30
           |     50.00      50.00 |    100.00
           |    100.00     100.00 |    100.00

    . tab gender major, sum(sat)

    Means, Standard Deviations and Frequencies of SAT

           |              Major
    Gender |       Econ        Math    Politics |      Total
    -------+------------------------------------+-----------
    Female |  1952.3333      1762.5        2030 |     1871.8
           |  312.43773   317.99326   262.25052 |  307.58697
           |          3           8           4 |         15
      Male |  1743.2857        2170   1807.8333 |       1826
           |   155.6146   72.124892   288.99994 |  247.07518
           |          7           2           6 |         15
    -------+------------------------------------+-----------
     Total |       1806        1844      1896.7 |     1848.9
           |  219.16559   329.76928   287.20687 |  275.11218
           |         10          10          10 |         30

Average SAT scores by gender and major. Notice, ‘sat’ variable is a continuous variable. The first cell reads: the average SAT score for a female whose major is econ is 1952.3333 with a standard deviation of 312.44; there are only 3 females with a major in econ.

Three way crosstabs

    bysort var3: tab var1 var2, column row

    bysort studentstatus: tab gender major, column row

    . bysort studentstatus: tab gender major, column row

    -> studentstatus = Graduate

    Key: frequency / row percentage / column percentage

           |             Major
    Gender |      Econ       Math   Politics |     Total
    -------+---------------------------------+----------
    Female |         0          2          3 |         5
           |      0.00      40.00      60.00 |    100.00
           |      0.00      66.67      37.50 |     33.33
      Male |         4          1          5 |        10
           |     40.00      10.00      50.00 |    100.00
           |    100.00      33.33      62.50 |     66.67
    -------+---------------------------------+----------
     Total |         4          3          8 |        15
           |     26.67      20.00      53.33 |    100.00
           |    100.00     100.00     100.00 |    100.00

    -> studentstatus = Undergraduate

    Key: frequency / row percentage / column percentage

           |             Major
    Gender |      Econ       Math   Politics |     Total
    -------+---------------------------------+----------
    Female |         3          6          1 |        10
           |     30.00      60.00      10.00 |    100.00
           |     50.00      85.71      50.00 |     66.67
      Male |         3          1          1 |         5
           |     60.00      20.00      20.00 |    100.00
           |     50.00      14.29      50.00 |     33.33
    -------+---------------------------------+----------
     Total |         6          7          2 |        15
           |     40.00      46.67      13.33 |    100.00
           |    100.00     100.00     100.00 |    100.00

Three way crosstabs with summary statistics of a fourth variable

    . bysort studentstatus: tab gender major, sum(sat)

    -> studentstatus = Graduate

    Means, Standard Deviations and Frequencies of SAT

           |              Major
    Gender |       Econ        Math    Politics |      Total
    -------+------------------------------------+-----------
    Female |          .        1777   2092.6667 |     1966.4
           |          .   373.35238   282.13531 |  323.32924
           |          0           2           3 |          5
      Male |    1659.25        2221      1785.6 |     1778.6
           |  154.66819           0   317.32286 |   284.3086
           |          4           1           5 |         10
    -------+------------------------------------+-----------
     Total |    1659.25        1925     1900.75 |     1841.2
           |  154.66819   367.97826    324.8669 |  300.38219
           |          4           3           8 |         15

    -> studentstatus = Undergraduate

    Means, Standard Deviations and Frequencies of SAT

           |              Major
    Gender |       Econ        Math    Politics |      Total
    -------+------------------------------------+-----------
    Female |  1952.3333   1757.6667        1842 |     1824.5
           |  312.43773   337.01197           0 |  305.36872
           |          3           6           1 |         10
      Male |  1855.3333        2119        1919 |     1920.8
           |  61.711695           0           0 |  122.23011
           |          3           1           1 |          5
    -------+------------------------------------+-----------
     Total |  1903.8333   1809.2857      1880.5 |     1856.6
           |  208.30979   336.59952   54.447222 |  257.72682
           |          6           7           2 |         15

Average SAT scores by gender and major for graduate and undergraduate students. The third cell reads: the average SAT score of a female graduate student whose major is politics is 2092.6667 with a standard deviation of 282.13; there are 3 graduate female students with a major in politics.

Renaming variables and adding variable labels

Renaming variables, type:

    rename [old name] [new name]

    rename var1 id
    rename var2 country
    rename var3 party
    rename var4 imports
    rename var5 exports

Adding/changing variable labels, type:

    label variable [var name] "Text"

    label variable id "Unique identifier"
    label variable country "Country name"
    label variable party "Political party in power"
    label variable imports "Imports as % of GDP"
    label variable exports "Exports as % of GDP"

Assigning value labels

Adding labels to each category in a variable is a two step process in Stata.

Step 1: You need to create the labels using label define, type:

    label define label1 1 "Agree" 2 "Disagree" 3 "Do not know"

Step 2: Assign that label to a variable with those categories using label values:

    label values var1 label1

If another variable has the same corresponding categories you can use the same label, type

    label values var2 label1

Verify by running frequencies for var1 and var2 (using tab).

If you type labelbook it will list all the labels in the datafile.

NOTE: Defining labels is not the same as creating variables
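The two steps can be tried end to end on a few typed-in cases. The data values below are made up; the label name label1 and the variable names var1 and var2 follow the text above.

```stata
* Sketch: define one value label and attach it to two variables (made-up data)
clear
input var1 var2
1 2
2 3
3 1
1 1
end

label define label1 1 "Agree" 2 "Disagree" 3 "Do not know"   // step 1: create the label
label values var1 label1                                     // step 2: attach it
label values var2 label1                                     // same label, second variable

tab var1
tab var2
labelbook label1
```

The variables stay numeric after labeling, so conditions such as if var1==1 still refer to the underlying codes.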

Creating new variables

To generate a new variable use the command generate (gen for short), type

    generate [newvar] = [expression]

    generate score2 = score/100
    generate readnews2 = readnews*4

You can use generate to create constant variables. For example:

    generate x = 5
    generate y = 4*15
    generate z = y/x

You can also use generate with string variables. For example:

    generate fullname = last + ", " + first
    label variable fullname "Student full name"
    browse id fullname last first

Creating variables from a combination of other variables

To generate a new variable as a conditional from other variables type:

    generate newvar=(var1==1 & var2==1)
    generate newvar=(var1==1 & var2<26)

NOTE: & = and, | = or

    . tab gender status
           |    Student Status
    Gender |  Graduate  Undergrad |     Total
    Female |         5         10 |        15
      Male |        10          5 |        15
     Total |        15         15 |        30

    . gen fem_grad=(gender==1 & status==1)
    . tab fem_grad
    fem_grad |  Freq.   Percent     Cum.
           0 |     25     83.33    83.33
           1 |      5     16.67   100.00
       Total |     30    100.00

    . tab age gender
          |       Gender
      Age |   Female      Male |     Total
       18 |        4         1 |         5
       19 |        3         2 |         5
       20 |        1         1 |         2
       21 |        2         1 |         3
       25 |        1         1 |         2
       26 |        0         1 |         1
       28 |        0         1 |         1
       30 |        1         3 |         4
       31 |        1         0 |         1
       33 |        1         2 |         3
       37 |        0         1 |         1
       38 |        1         0 |         1
       39 |        0         1 |         1
    Total |       15        15 |        30

    . gen fem_less25=(gender==1 & age<26)
    . tab fem_less25
    fem_less25 |  Freq.   Percent     Cum.
             0 |     19     63.33    63.33
             1 |     11     36.67   100.00
         Total |     30    100.00
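A parenthesized condition evaluates to 1 when true and 0 when false, which is what turns these generate commands into dummy variables. The sketch below uses five made-up cases with numeric codes (gender 1 = female, status 1 = graduate, both assumed for illustration).

```stata
* Sketch: dummies from logical expressions (made-up data, assumed codes)
clear
input gender status age
1 1 24
1 2 19
2 1 30
1 1 31
2 2 18
end

generate fem_grad   = (gender==1 & status==1)   // 1 if female graduate, else 0
generate fem_less25 = (gender==1 & age<26)      // 1 if female younger than 26, else 0
list
```

A case with a missing age would get 0 rather than missing in fem_less25, because Stata treats missing as larger than any number; adding if !missing(age) to the command leaves such cases missing.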

33 76.00 90.33 40.33 3.00 Cum.00 93. .Recoding ‘age’ into three groups. 5 5 2 3 2 1 1 4 1 3 1 1 1 30 Percent 16.67 10.67 60.33 3.67 100.33 13.33 10.33 100.00 Cum. 10 9 11 30 Percent 33.Use recode command.33 63.33 3.67 6.00 50.67 33.67 100.67 3. tab agegroups RECODE of age (Age) 18 to 19 20 to 29 30 to 39 Total Freq.The new variable is called ‘agegroups’: .67 80.00 56.00 36.00 63.33 3. generate(agegroups) label(agegroups) 3.33 96.00 6.00 PU/DSS/OTR .00 3.).1. type Type help recode for more details recode age (18 19 = 1 “18 to 19”) /// (20/29 = 2 “20 to 29”) /// (30/39 = 3 “30 to 39”) (else=. tab age Age 18 19 20 21 25 26 28 30 31 33 37 38 39 Total Freq...00 Recoding variables 2. 16. 33..33 30.67 16.33 100.

Recoding variables using egen

You can recode variables using the command egen and options cut/group.

    egen newvariable = cut(oldvariable), at(break1, break2, break3, etc.)

Notice that the breaks show ranges. Below we type four breaks. The first group starts at 18 and ends before 20, the second starts at 20 and ends before 30, the third starts at 30 and ends before 40.

    . egen agegroups2=cut(age), at(18, 20, 30, 40)
    . tab agegroups2
    agegroups2 |  Freq.   Percent     Cum.
            18 |     10     33.33    33.33
            20 |      9     30.00    63.33
            30 |     11     36.67   100.00
         Total |     30    100.00

You could also use the option group, which specifies groups with equal frequency (you have to add value labels):

    egen newvariable = cut(oldvariable), group(# of groups)

    . egen agegroups3=cut(age), group(3)
    . tab agegroups3
    agegroups3 |  Freq.   Percent     Cum.
             0 |     10     33.33    33.33
             1 |      9     30.00    63.33
             2 |     11     36.67   100.00
         Total |     30    100.00

For more details and options type help egen
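With at(), each new value is the lower break of the interval the case falls in, so value labels make the result easier to read. A small sketch with made-up ages; the variable and label name agegrp is hypothetical.

```stata
* Sketch: egen cut() with at() breaks plus value labels (made-up ages)
clear
input age
18
19
25
30
39
end

egen agegrp = cut(age), at(18, 20, 30, 40)    // codes are the lower breaks: 18, 20, 30
label define agegrp 18 "18 to 19" 20 "20 to 29" 30 "30 to 39"
label values agegrp agegrp
tab agegrp
```

Ages below the first break or at or above the last break are set to missing, so the last break should sit above the maximum age.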

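Changing variable values (using replace)

    replace read = . if read>5

Before:

    . tab read

      Newspaper |
     readership |
     (times/wk) |      Freq.     Percent        Cum.
    ------------+-----------------------------------
              3 |          6       20.00       20.00
              4 |          5       16.67       36.67
              5 |          9       30.00       66.67
              6 |          7       23.33       90.00
              7 |          3       10.00      100.00
    ------------+-----------------------------------
          Total |         30      100.00

After:

    . tab read, missing

      Newspaper |
     readership |
     (times/wk) |      Freq.     Percent        Cum.
    ------------+-----------------------------------
              3 |          6       20.00       20.00
              4 |          5       16.67       36.67
              5 |          9       30.00       66.67
              . |         10       33.33      100.00
    ------------+-----------------------------------
          Total |         30      100.00

    replace read = . if inc==7

Before:

    . tab read

      Newspaper |
     readership |
     (times/wk) |      Freq.     Percent        Cum.
    ------------+-----------------------------------
              3 |          6       20.00       20.00
              4 |          5       16.67       36.67
              5 |          9       30.00       66.67
              6 |          7       23.33       90.00
              7 |          3       10.00      100.00
    ------------+-----------------------------------
          Total |         30      100.00

After:

    . tab read, missing

      Newspaper |
     readership |
     (times/wk) |      Freq.     Percent        Cum.
    ------------+-----------------------------------
              3 |          6       20.00       20.00
              4 |          5       16.67       36.67
              5 |          9       30.00       66.67
              6 |          7       23.33       90.00
              . |          3       10.00      100.00
    ------------+-----------------------------------
          Total |         30      100.00

    replace gender = "F" if gender == "Female"
    replace gender = "M" if gender == "Male"

Before:

    . tab gender

         Gender |      Freq.     Percent        Cum.
    ------------+-----------------------------------
         Female |         15       50.00       50.00
           Male |         15       50.00      100.00
    ------------+-----------------------------------
          Total |         30      100.00

After:

    . tab gender

         Gender |      Freq.     Percent        Cum.
    ------------+-----------------------------------
              F |         15       50.00       50.00
              M |         15       50.00      100.00
    ------------+-----------------------------------
          Total |         30      100.00

You can also do:

    replace var1=# if var2==#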
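The following is a small self-contained sketch of both kinds of replace (numeric and string) on made-up data. Stata treats a numeric missing value as larger than any number, so a condition such as `read>5` is also true for observations that are already missing; the extra `& !missing(read)` is an optional safeguard added for illustration, not part of the original commands.

```stata
* Toy data (hypothetical values)
clear
input read str6 gender
3 "Female"
6 "Male"
7 "Female"
5 "Male"
end

* Set readership above 5 to missing (only where a value exists)
replace read = . if read>5 & !missing(read)

* Shorten the string categories
replace gender = "F" if gender == "Female"
replace gender = "M" if gender == "Male"

* Confirm how many values are now missing
count if missing(read)
list
```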

Extracting characters from regular expressions

To remove strings from var1 use the following command

    gen var2=regexr(var1,"[.\}\)\*a-zA-Z]+","")
    destring var2, replace

    . list var1 var2

             var1      var2
      1.   123A33     12333
      2.    2144F      2144
      3.    2312A      2312
      4. 3567754G   3567754
      5.   35457S     35457
      6.   34234N     34234
      7.  234212*    234212
      8.   23146}     23146
      9.   31231)     31231
     10.  AFN.345       345
     11.  NYSE.12        12

To extract strings from a combination of strings and numbers

    gen var2=regexr(var1,"[.0-9]+","")

    . list var1 var2

                   var1       var2
      1.        AFM.123        AFM
      2.      ADGT.2345       ADGT
      3.  ACDET.1234564      ACDET
      4. CDFGEEGY.596544  CDFGEEGY
      5.   ACGETYF.1235    ACGETYF

More info see: http://www.ats.ucla.edu/stat/stata/faq/regex.htm
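As a runnable sketch of the first case, the block below uses a simplified pattern (letters and periods only) on three made-up values. Keep in mind that regexr() replaces only the first substring that matches, so a value with two separate runs of letters would need a second pass.

```stata
* Toy data (hypothetical values)
clear
input str10 var1
"123A33"
"2144F"
"AFN.345"
end

* Remove the first run of letters and/or periods
gen var2 = regexr(var1, "[.a-zA-Z]+", "")

* Convert the cleaned string to a numeric variable
destring var2, replace

list var1 var2
```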

Indexing: creating ids

Using _n, you can create a unique identifier for each case in your data. Check the results in the data editor; ‘idall’ is equal to ‘id’.

Using _N you can also create a variable with the total number of cases in your dataset. Check the results in the data editor.
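A minimal sketch of both steps, assuming an empty dataset of five observations. The name ‘idall’ comes from the text above; the name `total` for the case count is an assumption made for this example.

```stata
* Toy dataset with five empty observations
clear
set obs 5

* _n is the number of the current observation: 1, 2, ..., 5
gen idall = _n

* _N is the total number of observations: 5 in every row
gen total = _N

list
```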

Indexing: creating ids by categories

We can create ids by categories. For example by major. First we have to sort the data by the variable on which we are basing the id (major in this case). Then we use the command by to tell Stata that we are using major as the base variable (notice the colon). Then we use browse to check the two variables. Check the results in the data editor.
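The steps described above can be sketched as follows on a made-up list of majors. The variable names `id_major` and `n_major` are hypothetical, and `list` is used here in place of browse so the example runs without the data editor.

```stata
* Toy data (hypothetical values)
clear
input str8 major
"Math"
"Econ"
"Math"
"Politics"
"Econ"
end

* Sort by the variable that defines the categories
sort major

* Within each major, _n restarts at 1 and _N is the group size
by major: gen id_major = _n
by major: gen n_major = _N

list, sepby(major)
```

The prefix `bysort major:` combines the sort and by steps into one line.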

Indexing: lag and forward values

----- You can create lagged values with _n -----

    gen lag1_year=year[_n-1]
    gen lag2_year=year[_n-2]

A more advanced alternative to create lags uses the “L” operator within a time series setting (the tsset command must be specified first):

    tsset year
            time variable:  year, 1980 to 2009
                    delta:  1 unit

    gen l1_year=L1.year
    gen l2_year=L2.year

----- You can create forward values with _n -----

    gen for1_year=year[_n+1]
    gen for2_year=year[_n+2]

You can also use the “F” operator (with tsset)

    gen f1_year=F1.year
    gen f2_year=F2.year

NOTE: Notice the square brackets

For time series see: http://dss.princeton.edu/training/TS101.pdf
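The two approaches agree when the time variable has no gaps, but they can differ otherwise: `[_n-1]` takes the previous row, while `L1.` takes the previous time period. The sketch below illustrates this with a made-up series `y` in which 1982 is absent; all names and values are assumptions for the example.

```stata
* Toy series with a gap (no observation for 1982)
clear
input year y
1980 10
1981 20
1983 30
end

sort year

* Previous row, regardless of the year it belongs to
gen lag_n = y[_n-1]

* Previous year, using the time-series operator
tsset year
gen lag_ts = L1.y

list
```

For 1983, `lag_n` holds the 1981 value while `lag_ts` is missing, because there is no 1982 observation.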

Indexing: countdown and specific values

Combining _n and _N you can create a countdown variable. Check the results in the data editor.

You can create a variable based on one value of another variable. For example, create a variable with the highest SAT value in the sample. Check the results in the data editor.

NOTE: You could get the same result without sorting by using egen and the max function
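One possible way to write these steps is sketched below on three made-up SAT scores. The exact commands used in the original slides are not shown in the text, so the variable names (`countdown`, `maxsat`, `maxsat2`) and the countdown formula are assumptions.

```stata
* Toy data (hypothetical values)
clear
input sat
1800
2300
1500
end

* Countdown: first row gets _N, last row gets 1
gen countdown = _N - _n + 1

* Highest SAT via sorting: after sort, the last row holds the maximum
sort sat
gen maxsat = sat[_N]

* Same result without relying on the sort order
egen maxsat2 = max(sat)

list
```

The sorting approach assumes sat has no missing values, since sort places missing values last; egen with max() ignores them.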

Sorting

    sort var1 var2 …

gsort is another command to sort data. The difference between gsort and sort is that with gsort you can sort in ascending or descending order, while with sort you can sort only in ascending order. Use +/- to indicate whether you want to sort in ascending/descending order. Here are some examples:
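A short sketch of sort and gsort on made-up data (the values below are hypothetical):

```stata
* Toy data (hypothetical values)
clear
input str8 major sat
"Math" 1800
"Econ" 2100
"Math" 2200
"Econ" 1500
end

* Ascending by major, then ascending by sat
sort major sat

* Descending by sat only
gsort -sat

* Ascending by major, descending by sat within major
gsort +major -sat

list
```

A variable with no sign in gsort is sorted in ascending order, so `gsort major -sat` is equivalent to the last command.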

Deleting variables

Use drop to delete variables and keep to keep them. A dash between two variable names (for example, between ‘total’ and ‘readnews2’) indicates a list, so you do not have to type in the name of all the variables.
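As an illustration, the sketch below builds a small dataset and drops a range of variables with the dash notation. The names ‘total’ and ‘readnews2’ come from the text; the other variables are made up. The range follows the order of the variables in the dataset, not alphabetical order.

```stata
* Toy dataset; variable order is id total read1 read2 readnews2 sat
clear
set obs 3
gen id = _n
gen total = 10
gen read1 = 1
gen read2 = 2
gen readnews2 = 3
gen sat = 1800

* Drop every variable from total through readnews2
drop total-readnews2

* Only id and sat remain; "keep id sat" would give the same result
describe
```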

Deleting cases (selectively)

You can drop cases selectively using the conditional “if”, for example

    drop if var1==1  /*This will drop observations (rows) where var1==1*/
    drop if age>40   /*This will drop observations where age>40*/

Alternatively, you can keep the observations you want

    keep if var1==1
    keep if age<40
    keep if country==7 | country==13
    keep if state=="New York" | state=="New Jersey"

| = “or”, & = “and”

For more details type help keep or help drop.
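One detail to watch when dropping on a numeric condition: Stata treats a missing value as larger than any number, so `drop if age>40` also drops observations where age is missing. The sketch below, on made-up data, keeps those observations by adding `!missing(age)`; whether you want that depends on your analysis.

```stata
* Toy data (hypothetical values, one missing age)
clear
input age
25
45
.
end

* Drop only observations with a known age above 40
drop if age>40 & !missing(age)

* Two observations remain: age 25 and the missing one
count
list
```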

Merge/Append

Please check this document: http://dss.princeton.edu/training/Merge101.pdf


**Merging fuzzy text (reclink)
**

RECLINK - Matching fuzzy text. Reclink stands for ‘record linkage’. It is a program written by Michael Blasnik to merge imperfect string variables. For example:

    Data1: Princeton University
    Data2: Princeton U

Reclink helps you to merge the two databases by using a matching algorithm for these types of variables. Since it is a user-created program, you may need to install it by typing ssc install reclink. Once installed you can type help reclink for details.

As in merge, the merging variables must have the same name: state, university, city, name, etc. Both the master and the using files should have an id variable identifying each observation. Note: the names of the ids must be different, for example id1 (id master) and id2 (id using). Sort both files by the matching (merging) variables. The basic syntax is:

    reclink var1 var2 var3 … using myusingdata, gen(myscore) idm(id1) idu(id2)

The variable myscore indicates the strength of the match; a perfect match will have a score of 1. Description (from reclink help pages):

“reclink uses record linkage methods to match observations between two datasets where no perfect key fields exist -essentially a fuzzy merge. reclink allows for user-defined matching and non-matching weights for each variable and employs a bigram string comparator to assess imperfect string matches. The master and using datasets must each have a variable that uniquely identifies observations. Two new variables are created, one to hold the matching score (scaled 0-1) and one for the merge variable. In addition, all of the matching variables from the using dataset are brought into the master dataset (with newly prefixed names) to allow for manual review of matches.”


Graphs: scatterplot

Scatterplots are good to explore possible relationships or patterns between variables and to identify outliers. Use the command scatter (sometimes adding twoway is useful when adding more graphs). The format is scatter y x. Below we check the relationship between SAT scores and age. For more details type help scatter.

    twoway scatter sat age
    twoway scatter sat age, mlabel(last)

[Figure: two scatterplots of SAT (y axis, 1400 to 2400) against Age (x axis, 20 to 40). In the second plot each point is labeled with the value of ‘last’ (DOE01, DOE02, ...).]

**twoway scatter sat age, mlabel(last) || lfit sat age**

**twoway scatter sat age, mlabel(last) || lfit sat age, yline(1800) xline(30)**

yline() takes a value on the y axis (SAT) and xline() a value on the x axis (age).

[Figure: two scatterplots of SAT (y axis, 1400 to 2400) against Age (x axis, 20 to 40) with points labeled by ‘last’ and a linear fit line; legend: SAT, Fitted values.]


Graphs: scatterplot

By categories

    twoway scatter sat age, mlabel(last) by(major, total)

[Figure: scatterplots of SAT (y axis, 1000 to 2500) against Age (x axis, 20 to 40) in four panels: Econ, Math, Politics, Total. Points are labeled by ‘last’; legend: SAT, Fitted values; note: Graphs by Major.]

Go to http://www.princeton.edu/~otorres/Stata/ for additional tips

Graphs: histogram

Histograms are another good way to visually explore data, especially to check for a normal distribution. Type help histogram for details.

    histogram age, frequency
    histogram age, frequency normal

[Figure: two histograms of Age (x axis, 20 to 40) with Frequency on the y axis; the second includes the normal density curve.]

Graphs: catplot

To graph categorical data use catplot. Since it is a user-defined program you have to install it by typing:

    ssc install catplot

    . tab agegroups major, col row cell

    +-------------------+
    | Key               |
    |-------------------|
    |     frequency     |
    |  row percentage   |
    | column percentage |
    |  cell percentage  |
    +-------------------+

     RECODE of |
           age |              Major
         (Age) |      Econ       Math   Politics |     Total
    -----------+---------------------------------+----------
      18 to 19 |         4          5          1 |        10
               |     40.00      50.00      10.00 |    100.00
               |     40.00      50.00      10.00 |     33.33
               |     13.33      16.67       3.33 |     33.33
    -----------+---------------------------------+----------
      20 to 29 |         4          3          2 |         9
               |     44.44      33.33      22.22 |    100.00
               |     40.00      30.00      20.00 |     30.00
               |     13.33      10.00       6.67 |     30.00
    -----------+---------------------------------+----------
      30 to 39 |         2          2          7 |        11
               |     18.18      18.18      63.64 |    100.00
               |     20.00      20.00      70.00 |     36.67
               |      6.67       6.67      23.33 |     36.67
    -----------+---------------------------------+----------
         Total |        10         10         10 |        30
               |     33.33      33.33      33.33 |    100.00
               |    100.00     100.00     100.00 |    100.00
               |     33.33      33.33      33.33 |    100.00

    catplot bar major agegroups, blabel(bar)

[Figure: bar chart of frequency by major (Econ, Math, Politics) within each age group (18 to 19, 20 to 29, 30 to 39), with bar labels.]

Note: Numbers correspond to the frequencies in the table.

Graphs: catplot

    . tab agegroups major, col row

    +-------------------+
    | Key               |
    |-------------------|
    |     frequency     |
    |  row percentage   |
    | column percentage |
    +-------------------+

     RECODE of |
           age |              Major
         (Age) |      Econ       Math   Politics |     Total
    -----------+---------------------------------+----------
      18 to 19 |         4          5          1 |        10
               |     40.00      50.00      10.00 |    100.00
               |     40.00      50.00      10.00 |     33.33
    -----------+---------------------------------+----------
      20 to 29 |         4          3          2 |         9
               |     44.44      33.33      22.22 |    100.00
               |     40.00      30.00      20.00 |     30.00
    -----------+---------------------------------+----------
      30 to 39 |         2          2          7 |        11
               |     18.18      18.18      63.64 |    100.00
               |     20.00      20.00      70.00 |     36.67
    -----------+---------------------------------+----------
         Total |        10         10         10 |        30
               |     33.33      33.33      33.33 |    100.00
               |    100.00     100.00     100.00 |    100.00

Row %:

    catplot bar major agegroups, percent(agegroups) blabel(bar)

[Figure: bar chart of the percent of each age group by major; bar labels match the row percentages in the table.]

Column %:

    catplot hbar agegroups major, percent(major) blabel(bar)

[Figure: horizontal bar chart of the percent of each major by age group; bar labels match the column percentages in the table.]

Graphs: catplot

    . bysort gender: tab agegroups major, col nokey

    -> gender = Female

     RECODE of |
           age |              Major
         (Age) |      Econ       Math   Politics |     Total
    -----------+---------------------------------+----------
      18 to 19 |         2          5          0 |         7
               |     66.67      62.50       0.00 |     46.67
    -----------+---------------------------------+----------
      20 to 29 |         1          1          2 |         4
               |     33.33      12.50      50.00 |     26.67
    -----------+---------------------------------+----------
      30 to 39 |         0          2          2 |         4
               |      0.00      25.00      50.00 |     26.67
    -----------+---------------------------------+----------
         Total |         3          8          4 |        15
               |    100.00     100.00     100.00 |    100.00

    -> gender = Male

     RECODE of |
           age |              Major
         (Age) |      Econ       Math   Politics |     Total
    -----------+---------------------------------+----------
      18 to 19 |         2          0          1 |         3
               |     28.57       0.00      16.67 |     20.00
    -----------+---------------------------------+----------
      20 to 29 |         3          2          0 |         5
               |     42.86     100.00       0.00 |     33.33
    -----------+---------------------------------+----------
      30 to 39 |         2          0          5 |         7
               |     28.57       0.00      83.33 |     46.67
    -----------+---------------------------------+----------
         Total |         7          2          6 |        15
               |    100.00     100.00     100.00 |    100.00

Raw counts by major and gender:

    catplot hbar major agegroups, blabel(bar) by(gender)

[Figure: horizontal bar charts of frequency by major within age group, one panel per gender (Female, Male); note: Graphs by Gender.]

Percentages by major and gender:

    catplot hbar major agegroups, percent(major gender) blabel(bar) by(gender)

[Figure: horizontal bar charts of percent of category by major within age group, one panel per gender (Female, Male); note: Graphs by Gender.]

Graphs: means

Stata can also help to visually present summaries of data. If you do not want to type you can go to ‘graphics’ in the menu.

    graph hbar (mean) age (mean) averagescoregrade, blabel(bar) by(, title(gender and major)) by(gender major, total)

[Figure: “gender and major”. Horizontal bars of mean of age and mean of averagescoregrade in panels for each gender and major combination (Female/Male by Econ, Math, Politics) plus Total; note: Graphs by Gender and Major.]

    graph hbar (mean) age averagescoregrade newspaperreadershiptimeswk, over(gender) over(studentstatus, label(labsize(small))) blabel(bar) title(Student indicators) legend(label(1 "Age") label(2 "Score") label(3 "Newsp read"))

[Figure: “Student indicators”. Horizontal bars of mean Age, Score and Newsp read for Female and Male within Graduate and Undergraduate.]

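Creating dummies

You can create dummy variables by either using recode or using a combination of tab/gen commands:

    tab major, generate(major_dum)

    . tab major, generate(major_dum)

          Major |      Freq.     Percent        Cum.
    ------------+-----------------------------------
           Econ |         10       33.33       33.33
           Math |         10       33.33       66.67
       Politics |         10       33.33      100.00
    ------------+-----------------------------------
          Total |         30      100.00

Check the ‘variables’ window; at the end you will see three new variables. Using tab1 (for multiple frequencies) you can check that they are all 0 and 1 values.

    . tab1 major_dum1 major_dum2 major_dum3

    -> tabulation of major_dum1

    major==Econ |      Freq.     Percent        Cum.
    ------------+-----------------------------------
              0 |         20       66.67       66.67
              1 |         10       33.33      100.00
    ------------+-----------------------------------
          Total |         30      100.00

    -> tabulation of major_dum2

    major==Math |      Freq.     Percent        Cum.
    ------------+-----------------------------------
              0 |         20       66.67       66.67
              1 |         10       33.33      100.00
    ------------+-----------------------------------
          Total |         30      100.00

    -> tabulation of major_dum3

    major==Poli |
           tics |      Freq.     Percent        Cum.
    ------------+-----------------------------------
              0 |         20       66.67       66.67
              1 |         10       33.33      100.00
    ------------+-----------------------------------
          Total |         30      100.00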

Creating dummies (cont.)

Here is another example:

    tab agegroups, generate(agegroups_dum)

    . tab agegroups, generate(agegroups_dum)

      RECODE of |
      age (Age) |      Freq.     Percent        Cum.
    ------------+-----------------------------------
       18 to 19 |         10       33.33       33.33
       20 to 29 |          9       30.00       63.33
       30 to 39 |         11       36.67      100.00
    ------------+-----------------------------------
          Total |         30      100.00

Check the ‘variables’ window; at the end you will see three new variables. Using tab1 (for multiple frequencies) you can check that they are all 0 and 1 values.

    . tab1 agegroups_dum1 agegroups_dum2 agegroups_dum3

    -> tabulation of agegroups_dum1

    agegroups== |
       18 to 19 |      Freq.     Percent        Cum.
    ------------+-----------------------------------
              0 |         20       66.67       66.67
              1 |         10       33.33      100.00
    ------------+-----------------------------------
          Total |         30      100.00

    -> tabulation of agegroups_dum2

    agegroups== |
       20 to 29 |      Freq.     Percent        Cum.
    ------------+-----------------------------------
              0 |         21       70.00       70.00
              1 |          9       30.00      100.00
    ------------+-----------------------------------
          Total |         30      100.00

    -> tabulation of agegroups_dum3

    agegroups== |
       30 to 39 |      Freq.     Percent        Cum.
    ------------+-----------------------------------
              0 |         19       63.33       63.33
              1 |         11       36.67      100.00
    ------------+-----------------------------------
          Total |         30      100.00
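A compact, self-contained sketch of the tab/gen approach on made-up data follows. The `check` variable is an addition for illustration: since every observation belongs to exactly one category, the dummies should sum to 1 in each row.

```stata
* Toy data (hypothetical values)
clear
input str8 major
"Econ"
"Math"
"Politics"
"Math"
end

* One dummy per category: major_dum1 (Econ), major_dum2 (Math), major_dum3 (Politics)
tab major, generate(major_dum)

* Each row should have exactly one dummy equal to 1
gen check = major_dum1 + major_dum2 + major_dum3
assert check == 1

list
```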

Frequently used Stata commands

    Category                            Stata commands
    ----------------------------------  ------------------------------------------------
    Getting on-line help                help, search
    Operating-system interface          pwd, cd, sysdir, mkdir, dir / ls, erase, copy, type
    Using and saving data from disk     use, clear, save, append, merge, compress
    Inputting data into Stata           input, edit, infile, infix, insheet
    The Internet and Updating Stata     update, net, ado, news
    Basic data reporting                describe, codebook, inspect, list, browse, count,
                                        assert, summarize, table, tabulate (tab)
    Data manipulation                   generate, replace, egen, recode, rename, drop, keep,
                                        sort, encode, decode, order, by, reshape
    Formatting                          format, label
    Keeping track of your work          log, notes
    Convenience                         display

Type help [command name] in the command window for details

Source: http://www.ats.ucla.edu/stat/stata/notes2/commands.htm

Is my model OK? (links)

Regression diagnostics: A checklist
http://www.ats.ucla.edu/stat/stata/webbooks/reg/chapter2/statareg2.htm

Logistic regression diagnostics: A checklist
http://www.ats.ucla.edu/stat/stata/webbooks/logistic/chapter3/statalog3.htm

Time series diagnostics: A checklist (pdf)
http://homepages.nyu.edu/~mrg217/timeseries.pdf

Time series: dfuller test for unit roots (for R and Stata)
http://www.econ.uiuc.edu/~econ472/tutorial9.html

Panel data tests: heteroskedasticity and autocorrelation
– http://www.stata.com/support/faqs/stat/panel.html
– http://www.stata.com/support/faqs/stat/xtreg.html
– http://www.stata.com/support/faqs/stat/xt.html
– http://dss.princeton.edu/online_help/analysis/panel.htm

I can’t read the output of my model!!! (links)

Data Analysis: Annotated Output
http://www.ats.ucla.edu/stat/AnnotatedOutput/default.htm

Data Analysis Examples
http://www.ats.ucla.edu/stat/dae/

Regression with Stata
http://www.ats.ucla.edu/STAT/stata/webbooks/reg/default.htm

Regression
http://www.ats.ucla.edu/stat/stata/topics/regression.htm

How to interpret dummy variables in a regression
http://www.ats.ucla.edu/stat/Stata/webbooks/reg/chapter3/statareg3.htm

How to create dummies
http://www.stata.com/support/faqs/data/dummy.html
http://www.ats.ucla.edu/stat/stata/faq/dummy.htm

Logit output: what are the odds ratios?
http://www.ats.ucla.edu/stat/stata/library/odds_ratio_logistic.htm

Topics in Statistics (links)

What statistical analysis should I use?
http://www.ats.ucla.edu/stat/mult_pkg/whatstat/default.htm

Statnotes: Topics in Multivariate Analysis, by G. David Garson
http://www2.chass.ncsu.edu/garson/pa765/statnote.htm

Elementary Concepts in Statistics
http://www.statsoft.com/textbook/stathome.html

Introductory Statistics: Concepts, Models, and Applications
http://www.psychstat.missouristate.edu/introbook/sbk00.htm

Statistical Data Analysis
http://math.nicholls.edu/badie/statdataanalysis.html

Stata Library. Graph Examples (some may not work with STATA 10)
http://www.ats.ucla.edu/STAT/stata/library/GraphExamples/default.htm

Comparing Group Means: The T-test and One-way ANOVA Using STATA, SAS, and SPSS
http://www.indiana.edu/~statmath/stat/all/ttest/

Useful links / Recommended books

• DSS Online Training Section http://dss.princeton.edu/training/
• UCLA Resources to learn and use STATA http://www.ats.ucla.edu/stat/stata/
• DSS help-sheets for STATA http://dss/online_help/stats_packages/stata/stata.htm
• Introduction to Stata (PDF), Christopher F. Baum, Boston College, USA. “A 67-page description of Stata, its key features and benefits, and other useful information.” http://fmwww.bc.edu/GStat/docs/StataIntro.pdf
• STATA FAQ website http://stata.com/support/faqs/
• Princeton DSS Libguides http://libguides.princeton.edu/dss

Books

• Introduction to econometrics / James H. Stock, Mark W. Watson. 2nd ed., Boston: Pearson Addison Wesley, 2007.
• Data analysis using regression and multilevel/hierarchical models / Andrew Gelman, Jennifer Hill. Cambridge ; New York : Cambridge University Press, 2007.
• Econometric analysis / William H. Greene. 6th ed., Upper Saddle River, N.J. : Prentice Hall, 2008.
• Designing Social Inquiry: Scientific Inference in Qualitative Research / Gary King, Robert O. Keohane, Sidney Verba, Princeton University Press, 1994.
• Unifying Political Methodology: The Likelihood Theory of Statistical Inference / Gary King, Cambridge University Press, 1989
• Statistical Analysis: an interdisciplinary introduction to univariate & multivariate methods / Sam Kachigan, New York : Radius Press, c1986
• Statistics with Stata (updated for version 9) / Lawrence Hamilton, Thomson Books/Cole, 2006

