Stata Tutorial

Reading Data and Saving Stata Files

Use: The Stata use command reads data that has been saved in Stata format:

    use "C:\data\ncr00.dta"
    use "C:\data\ncr00.dta", clear
    save "C:\data\ncr00r.dta"
    save "C:\data\ncr00.dta", replace

If you already have a Stata file named "ncr00.dta" and wish to save an updated version of the file under the same name, use the Stata save command with the replace option. This command destroys the previous version of your file, so use the replace option only if you are certain that you will not need the older version. There is no way to retrieve your original file once another file has been written over it.

How to Increase Memory

Sometimes you may need to allocate additional memory for your Stata session, such as when you are working with a large file. If you receive this message from Stata:

    no room to add more observations

then you should increase the amount of memory available to your Stata session. Here's how.

1. Find out how large the file is. First, issue the clear command to remove the data currently in memory. Then issue the desc using filename command:

    desc using ncr00.dta

At the top of the information listed is the size of the file, in bytes. There are roughly 1,000 bytes in a kilobyte and 1,000 kilobytes in a megabyte, so if the size is 11,000 then the file is about 11 kilobytes. For example:

    Contains data
      obs:       899,094                          3 Jul 2010 08:39
     vars:            76
     size:    91,707,588

This shows that the file has 899,094 observations, 76 variables, and is just slightly over 91.7 megabytes.

2. From the Stata dot prompt, issue the command set memory to increase the amount of memory. For example, the following command allocates 100 megabytes of memory to the current Stata session:

    set memory 100m

Set the memory to a number slightly larger than the size of the file you are trying to read.

3. Now read your data file.

Log and Do Files

Log files

    log using "C:\data\ir299_01.log"
    log using "C:\data\ir299_01.log", replace
    log using "C:\data\ir299_01.log", append
    log close

Logs keep a record of the commands you have issued and their results during your Stata session. By default, everything displayed on the screen will be recorded in the log file. You can give the file any name you like, but you should use names that will help you remember what analyses you did during that session. Stata will automatically append an extension of ".log" to the filename.

Log and do files are very useful. A "do" file is a set of commands just as you would type them in one-by-one during a regular Stata session. Any command you use in Stata can be part of a do file. Do files are particularly useful when you have many commands to issue repeatedly, or when you need to reproduce results with minor or no changes. They are good for long series of commands that may need to be "tweaked" to work properly. They are also necessary to replicate things that you have done on new or modified datasets.

Examining your Data

Describe

Once you have the data in Stata, you will want to make sure that all the variables are there and that they are in the format you need. You can do this with the "describe" command. Describe, which can be abbreviated as simply "d," will provide basic information about the file and the variables. You don’t have to call the data into Stata to be able to describe it. The command:

    d using "C:\data\ncr00.dta"

will accomplish this.

Keep, Drop, and Rename

Well, now we’ve created many new variables and converted some old ones. Since we no longer need all of these variables, we’ll want to eliminate some of the ones we don’t really need and perhaps rename some of the ones we keep. We can either keep the observations we are interested in or drop the ones we are not, and rename the variables we keep:

    keep if prov==71
    drop if prov==72
    ren p6 age

Creating Variables

Stata can store data as either numbers or characters. Since Stata will allow you to do most analyses only on numeric data, you will need to convert string data to numeric data.

Generate

Often, you will need to create new variables based on the ones you have already. The two most common ways of creating new variables are "generate" (abbreviated "gen") and "egen." Here are some examples of gen:

    gen total= var1 + var2 + var3
    gen kid014= kids06 + kids714
    gen lnwage= ln(wage)

In the first example, we generate a new variable called "total" which is simply the sum of var1, var2, and var3. In the second example, "kid014" is the sum of kids06 and kids714. In the third example, we generate a variable called "lnwage" which is the natural log of wage.

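The conversion from string to numeric data mentioned under Creating Variables could look like the following minimal sketch. The variable names (wage_str, provname, wage, provcode) and the three example rows are hypothetical and do not come from the tutorial's data file. It assumes the standard destring and encode commands are the right tools for your variables: destring suits strings that hold numbers, and encode suits text categories.

```stata
* Hypothetical example data: both variables are stored as strings
clear
input str6 wage_str str10 provname
"1200"  "North"
"950"   "South"
"1,480" "North"
end

* destring converts strings that hold numbers; ignore(",") strips the commas
destring wage_str, generate(wage) ignore(",")

* encode maps text categories to numeric codes and attaches value labels
encode provname, generate(provcode)

* Check the storage types and confirm that wage can now be summarized
describe wage provcode
summarize wage
```

After running this, describe should list wage and provcode with numeric storage types, so both can be used in commands that require numeric data.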
Sometimes we need to generate a "dummy" variable. Stata makes this very easy. For example, if you want dummy variables that indicate only a particular category:

    ren p7 sex
    gen male=(sex==1)
    ren p9 mstat
    gen married=(mstat==2 | mstat==5)
    ren p22 educ
    gen grader=1 if educ<16
    replace grader=2 if (educ==16 | educ==17)
    replace grader=3 if (educ>=21 & educ<=24)
    replace grader=4 if (educ==25)
    replace grader=5 if (educ>=31 & educ<=58)
    replace grader=6 if (educ>=61 & educ<=71)

Here Stata will create a dummy variable such that male = 1 if the gender is male, and 0 otherwise. In general, using this command will make Stata create a dummy variable equal to 1 for each observation where the expression in brackets is true, and equal to zero otherwise. The gen and replace lines for grader assign it a value from 1 to 6 according to the range that educ falls in.

Extended Generate (egen)

"egen", or "extended generate", is useful when you need a new variable that is the mean, median, etc. of another variable, for all observations or for groups of observations. Egen is also useful when you need to simply number groups of observations based on some classification variables. Here are some examples:

    ren p2h rel
    gen kids=(rel>=3 & rel<=8)
    gen kids06=(kids==1 & (age>=0 & age<=6))
    bysort hhid: egen hhkids20=sum(kids20)

Basic Commands

Now that you have your data in a format you want, check it before doing any analyses. This can save you quite a bit of frustration later on.

Summarize

"sum", short for summarize, will give you the means, sd’s, etc. of the variable, or variables, listed. If you don’t list any variables, it will give you the information for all numeric variables. If a variable you thought was numeric shows up as having 0 observations and a mean of 0, then, most likely, Stata still thinks it’s a character variable.

    sum
    sum age male
    bysort male: sum age

Tabulate

"tab", short for tabulate, will produce frequency tables. By specifying two variables, you will get a crosstab.

    tab grader
    tab1 grader sex prov
    tab prov sex
    tab prov sex, col
    tab prov sex, col row
    tab prov sex, col nofreq
    bysort prov: tab grader sex
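
The commands covered in this tutorial could be collected into a single do file along the following lines. This is only a sketch under stated assumptions: it uses Stata's bundled auto dataset (loaded with sysuse) as a stand-in for ncr00.dta so that it runs anywhere, and the log file name and the variables expensive and mean_price are hypothetical.

```stata
* Sketch of a do file. Save it under a name such as check_data.do
* (hypothetical) and run it with:  do check_data.do

* Close any log left open by an earlier run, then start a fresh one
capture log close
log using "check_data.log", replace

* Bundled example data, standing in for the tutorial's ncr00.dta
sysuse auto, clear
describe

* Dummy variable: 1 where the expression is true, 0 otherwise
gen expensive = (price > 6000)

* Group-level statistic with egen: mean price within each value of foreign
bysort foreign: egen mean_price = mean(price)

* Check the results before doing any analyses
sum price expensive
tab foreign expensive, row

log close
```

Because every step is recorded in the do file and its log, the same checks can be rerun on a new or modified dataset by changing only the line that loads the data.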