
QS104 - Introduction to Social Analytics I: Worksheet Week 7

Dr Florian Reiche
Department of Politics and International Studies
University of Warwick
F.Reiche@warwick.ac.uk

Descriptive Statistics
So far, we have covered quite a large number of descriptive statistics. These are:

• Mean
• Median
• Mode
• Range
• Standard Deviation
• Variance
• Quartiles and Percentiles
• Interquartile Range

They are a lot of effort to calculate by hand, especially for larger data sets, but Stata can do
all of these in one go in a matter of seconds. All you need is the following command:

summarize var, detail

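This is what the output would look like for an example variable called age:
All descriptives except the mode can be read directly from this output or derived from it: the range is
the largest minus the smallest value, and the interquartile range is the 75th minus the 25th percentile.
There are two additional descriptives, however, which we have not covered so far: skewness and kurtosis.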
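To see where the median, the quartiles, the range and the interquartile range come from, here is a minimal sketch in Python, used here only to show the arithmetic (in the module, Stata does this work for you). It assumes the percentile rule given in Stata's documentation for summarize: with P = n × p / 100, average the P-th and (P+1)-th smallest values when P is a whole number, otherwise take the next value above P. Other software uses slightly different rules, so quartiles can differ a little between packages. The function names and the ages are hypothetical.

```python
import math


def percentile(values, p):
    """p-th percentile (0 < p < 100), assuming Stata's summarize rule:
    with P = n * p / 100, average the P-th and (P+1)-th smallest values
    if P is a whole number, otherwise take the next value above P."""
    x = sorted(values)
    position = len(x) * p / 100
    i = math.floor(position)
    if position == i:
        # Python lists count from 0, so the i-th smallest value is x[i - 1]
        return (x[i - 1] + x[i]) / 2
    return x[i]  # the (i+1)-th smallest value


def order_statistics(values):
    """Median, quartiles, range and interquartile range of a list of numbers."""
    p25 = percentile(values, 25)
    p75 = percentile(values, 75)
    return {
        "median": percentile(values, 50),
        "p25": p25,
        "p75": p75,
        "range": max(values) - min(values),  # largest minus smallest value
        "iqr": p75 - p25,  # 75th minus 25th percentile
    }


ages = [18, 19, 21, 22, 25, 30, 34, 41]  # hypothetical ages
print(order_statistics(ages))
# {'median': 23.5, 'p25': 20.0, 'p75': 32.0, 'range': 23, 'iqr': 12.0}
```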
Skewness is based on the third moment of a distribution (in case you wonder, the first two moments
are the mean and the variance). If it has a positive value, the distribution is positively skewed;
if it is negative, the distribution is negatively skewed; a symmetric distribution has a skewness of 0.
Sounds a bit cryptic? We already covered this indirectly in week 4, when I talked about the shape of
distributions in the lecture. Maybe you remember the following figure:

Kurtosis, meanwhile, is based on the fourth moment of a distribution and describes how peaked it is.
A normal distribution has a kurtosis of 3. If the kurtosis is greater than 3, the distribution is higher
and slimmer; if it is less than 3, the distribution is wider and does not peak as high.
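The following minimal Python sketch (used only to show the arithmetic) computes the moment-based statistics reported by summarize, detail. It assumes the formulas documented for Stata's summarize: the variance divides by n - 1, while skewness and kurtosis are built from central moments that divide by n, so a normal distribution has a kurtosis of 3 rather than 0. The function name and the two small data sets are hypothetical.

```python
import math


def moment_statistics(values):
    """Mean, variance, SD, skewness and kurtosis of a list of numbers."""
    n = len(values)
    mean = sum(values) / n

    def central_moment(r):
        # r-th central moment: average of (x - mean) ** r, dividing by n
        return sum((v - mean) ** r for v in values) / n

    m2, m3, m4 = central_moment(2), central_moment(3), central_moment(4)
    variance = m2 * n / (n - 1)  # sample variance divides by n - 1
    return {
        "mean": mean,
        "variance": variance,
        "sd": math.sqrt(variance),
        "skewness": m3 / m2 ** 1.5,  # theoretical value 0 if symmetric
        "kurtosis": m4 / m2 ** 2,  # theoretical value 3 for a normal distribution
    }


flat = [1, 2, 3, 4, 5]  # symmetric and flat, no high peak
right_skewed = [1, 1, 1, 2, 10]  # long tail to the right
print(moment_statistics(flat))  # skewness 0.0, kurtosis about 1.7 (< 3)
print(moment_statistics(right_skewed)["skewness"])  # a positive value
```

The flat data set is symmetric, so its skewness is exactly 0, and its kurtosis is well below 3, matching the description of a distribution that is wider and does not peak as high.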
If we want to find out the mode, or more generally about the frequencies in individual
categories of the variable, we can tabulate it with the following command:

tabulate var

For our example of age this would look as follows:

Now that you have all the relevant tools at hand, complete the following tasks:
1. Generate descriptive statistics for 5 of your variables.
2. Start to build your presentation and think about how you would accommodate these descriptive
statistics.
3. Recode two Likert-scale variables into binary variables.
4. Produce tabulations for these binary variables and include them in the presentation.
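Tasks 3 and 4 combine recoding and tabulating. The minimal Python sketch below shows only the logic (in Stata you would use commands such as recode or generate, followed by tabulate). It assumes a five-point Likert item coded 1 (strongly disagree) to 5 (strongly agree) and a hypothetical cut-off that treats 4 and 5 as agreement (1) and everything else, including the neutral midpoint 3, as 0; where you draw the line for your own variables is a substantive decision you should be able to justify. The table rows mirror the Freq., Percent and Cum. columns of a one-way tabulate, and the most frequent category is the mode. Function names and responses are made up.

```python
from collections import Counter


def to_binary(response):
    """Hypothetical cut-off: 4 or 5 (agree) -> 1, 1 to 3 -> 0."""
    return 1 if response >= 4 else 0


def tabulate(values):
    """Rows of (category, frequency, percent, cumulative percent)."""
    counts = Counter(values)
    n = len(values)
    rows, cumulative = [], 0.0
    for category in sorted(counts):
        percent = 100 * counts[category] / n
        cumulative += percent
        rows.append((category, counts[category], percent, cumulative))
    return rows


def mode(values):
    """Most frequent value (on a tie, the one encountered first)."""
    return Counter(values).most_common(1)[0][0]


responses = [5, 4, 2, 3, 4, 1, 5, 4, 3, 4]  # hypothetical Likert answers
agree = [to_binary(r) for r in responses]
for row in tabulate(agree):
    print(row)  # (0, 4, 40.0, 40.0) then (1, 6, 60.0, 100.0)
print("mode:", mode(responses))  # mode: 4
```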

Graphs
Stata has very powerful tools for creating graphs. To start with, I would recommend using the
drop-down menus, as the syntax for graphs can get very involved. That being said, once you
have created a graph and Stata displays the corresponding syntax, read through it and try
to understand the logic (and make sure you copy it into your do-file). If you want to delve
deeper into the syntax of graphs, or see what is possible, I strongly recommend chapter 3 of
the book by Lawrence Hamilton (on the reading list). To get you started, however, here are
some sample commands for the most frequently used plot types:
• Histograms: histogram
• Two-variable plots: graph twoway, for example:
– Line graph: graph twoway line y x
– Scatter plot: graph twoway scatter y x
• Pie charts: graph pie
• Bar charts: graph bar
• Box plots: graph box
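Of these, the histogram is the plot whose construction is least obvious from the command alone. The minimal Python sketch below shows the basic idea: the range of the variable is split into a chosen number of equal-width bins and the observations in each bin are counted. Stata picks a default number of bins (which you can change with its bin() or width() options) and plots densities by default (its frequency option shows counts like these); the sketch does not reproduce Stata's default choice of bins, and the function name and data are hypothetical.

```python
def histogram_counts(values, bins):
    """Count observations in `bins` equal-width bins from min to max.

    Assumes the values are not all identical (otherwise the width is 0).
    """
    low, high = min(values), max(values)
    width = (high - low) / bins
    counts = [0] * bins
    for v in values:
        index = int((v - low) / width)
        # the maximum sits exactly on the upper edge; count it in the last bin
        counts[min(index, bins - 1)] += 1
    return counts


ages = [18, 19, 21, 22, 25, 30, 34, 41]  # hypothetical ages
print(histogram_counts(ages, 4))  # [4, 1, 2, 1]
```

Changing the number of bins can change the visual impression of the shape considerably, which is worth keeping in mind when you choose a histogram for your presentation.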

Using these commands, and moving beyond with the help of today’s reading, complete the
following tasks:
1. Produce three graphs of different types (line graph, pie chart, etc.) for separate variables
in your data set.
2. Again, incorporate these into your presentation for next week. Decide whether to
display data in the form of descriptive statistics in tables (from above), or whether you
prefer a graphical depiction.
Tip: If you have a categorical variable, for example the type of accommodation with four
categories, and you want to display the percentage of people falling into each category, you
need to tell Stata to split the data by the categories of that variable. You do this in
a pie chart by typing

graph pie, over(var)

The same applies to a bar chart, for example:

graph bar, over(var)
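Behind both commands sits a simple calculation: the share of observations in each category. The minimal Python sketch below shows it, together with the angle each slice of a pie chart would take up (share × 360 degrees). It only illustrates the arithmetic and is not how Stata itself draws the graph; the accommodation categories and counts are made up.

```python
from collections import Counter


def category_shares(values):
    """Percentage of observations and pie-slice angle for each category."""
    counts = Counter(values)
    n = len(values)
    return {
        category: {
            "percent": 100 * count / n,
            "degrees": 360 * count / n,  # slice size in a pie chart
        }
        for category, count in counts.items()
    }


# hypothetical accommodation types for ten respondents
accommodation = (["halls"] * 5 + ["private rent"] * 3
                 + ["family home"] + ["other"])
for category, share in category_shares(accommodation).items():
    print(category, share)
```

Here "halls" accounts for 50% of respondents and therefore half the pie (180 degrees); the percentages across all categories always add up to 100.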

Captions for Tables and Figures in Word


You will be producing tables and figures for the presentation in week 8. It is good practice to
design tables and figures in such a way that they can be read and interpreted without
having to read the text. Likewise, you need to write the text in such a way that the
reader obtains all the necessary information contained in the figure or table “without having to
look at it”.
In any case, you will have to refer to tables and figures in the text. Now, you can do this by
writing “the figure below”, but this is not very elegant. Also, what happens if you change
the layout and all of a sudden “the figure below” becomes “the figure above”? This not only
causes additional work, because you have to edit the text and check all references to tables
and figures once you are done (which is tedious beyond description), but also creates the risk
that you miss one or a few in the process.
MS Word has a nifty function that allows you to insert captions for figures and tables, and
then to insert cross-references into the text which get updated automatically before you send
the document to the printer. Here is how to do it:
Say you have inserted a figure into Word. Click on it, then hover over the small square at its
bottom right-hand corner and right-click. From the resulting context menu, select
“Insert Caption”.

This results in the following window:

Select whether the item you want to describe is a figure or a table. Then make sure the caption
is placed “below” the item (this is the default). Then complete your caption in the box at
the top, after the automatically generated label, so that it reads, for example, “Figure 1:
Skewness of Distributions”. Make sure the caption is telling: the reader needs to know from the
caption what the figure or table is about. When you click OK, the document looks like this:

Now you start writing the text and come to the point where you refer to the figure in question.
Here, all you have to do is select “Insert” and then “Cross-reference” . . .

. . . and select the following options in the pop-up window:

Your text will then look like this:

You no longer have to worry about the sequence of numbering. If you insert another figure
above this one and insert a cross-reference in the text again, the sequence is updated
automatically and the former “Figure 1” becomes “Figure 2”. Tables and figures
have separate sequences of numbering.
One last word on the display of data in tables: DO NOT screenshot tables from Stata and
insert them into your assessment or the presentations. They look ugly and unprofessional.
Make the effort to create a proper table, either in Word or in Excel, and populate it manually
with the data from Stata. Captions and cross-references are inserted in the same way as
described above.
