
The 'auto' data set has been included with Stata for many, many years. It contains information
about cars from 1978. Every Stata user has access to it, so it is frequently used for examples,
and we'll use it today. To load it, type:

sysuse auto

Normally the use command loads data from disk into memory. The sysuse command is a
variation of the normal use command which loads data that was installed with Stata. You'll
probably never use it for anything other than this data set. (There's also a webuse command that
opens example data sets from Stata's web site.) To see what's in the data set, type:

browse

or click the button that looks like a magnifying glass over a spreadsheet. This opens Stata's Data
Editor, which shows you your data set in a spreadsheet-like form, in browse mode. You can also
invoke the Data Editor by typing edit or clicking the button that looks like a pencil writing in a
spreadsheet, and then it will allow you to make changes. You might use edit mode for data entry,
but since you should never change your data interactively, get in the habit of using browse mode
so you don't make changes by accident.
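
If you'd rather stay in the Results window, a few other commands give a quick look at the data with no risk of changing it. This is a minimal sketch rather than a required workflow; the choice of variables and of the first five rows is just for illustration:

```stata
* Load the example data, replacing anything already in memory
sysuse auto, clear

* List each variable's name, storage type, and label
describe

* Open the Data Editor in browse (read-only) mode for just three variables
browse make price mpg

* Print the first five observations in the Results window
list make price mpg in 1/5
```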

Observations and Variables


A Stata data set is a matrix, with one row for each observation and one column for each variable.
This raises the question "What is an observation in this data set?" The values of the make
variable suggest they are cars, but are they individual cars or kinds of cars? The fact that there is
just one row for each value of make suggests kinds of cars. We'll discuss this much more in Data
Wrangling in Stata, but you should always know what an observation is in your data set.
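
One way to check the "one row per kind of car" reading is to ask Stata whether make uniquely identifies the rows. A minimal sketch, assuming the auto data set as shipped with Stata:

```stata
sysuse auto, clear

* How many observations are there?
count

* isid exits with an error if make does not uniquely identify the observations
isid make

* duplicates report shows how many copies of each value of make exist
duplicates report make
```

isid says nothing when the check passes, so no output is good news here: each value of make appears exactly once.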

Variable Types

The variable make contains text or, as Stata calls them, "strings" (as in strings of characters).
Obviously you can't do math with text, but Stata can do many other useful things with string
variables.
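
As a small illustration of what Stata can do with strings, the sketch below measures the length of make and pulls out its first word. The variable names make_len and brand are made up for this example, and treating the first word as the manufacturer is an assumption that happens to fit this data set (some names are abbreviated):

```stata
sysuse auto, clear

* The storage type shows make is a string (str followed by its maximum length)
describe make

* Number of characters in each value of make (make_len is a made-up name)
generate make_len = strlen(make)

* First word of make, which here looks like the manufacturer (brand is a made-up name)
generate brand = word(make, 1)

* String values go in quotes when used in conditions
list make brand make_len if brand == "Buick"
```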

Variables like price and mpg are continuous or quantitative variables. They can, in principle,
take on an infinite number of values (though they've been recorded as integers) and represent
quantities in the real world.
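
For quantitative variables, summary statistics are a natural first look. A minimal sketch:

```stata
sysuse auto, clear

* Number of observations, mean, standard deviation, minimum, and maximum
summarize price mpg

* The detail option adds percentiles, skewness, and kurtosis
summarize price, detail

* The storage types show how the values were recorded
describe price mpg
```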

The variable rep78 is a categorical variable. It can only take on certain values, or levels. It is an
ordered categorical variable because 5 is better than 4, 4 is better than 3, etc. But the numbers don't
represent actual quantities: a 5 is not five times better than a 1. Other categorical variables are
unordered, and in that case the numbers used to represent the categories are completely arbitrary.
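
For a categorical variable, a frequency table is usually a better first look than a mean. A minimal sketch:

```stata
sysuse auto, clear

* Frequency of each level
tabulate rep78

* The missing option adds a row for observations with no recorded value
tabulate rep78, missing

* A mean can be computed, but it treats the gaps between levels as equal
summarize rep78
```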

The variable foreign is an indicator or binary or dummy variable. Indicator variables are just
categorical variables with two levels.
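
You can see both the labels and the numbers stored behind an indicator variable with tabulate. A minimal sketch:

```stata
sysuse auto, clear

* The table displays value labels rather than the underlying numbers
tabulate foreign

* nolabel shows the numbers actually stored
tabulate foreign, nolabel

* Because the values are 0 and 1, the mean is the proportion of cars coded 1
summarize foreign
```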
