
Problem Set #1

10 points
DUE DATE: Saturday following the end of Module 2 (check D2L for the actual due date)

Goal: Students will use SPSS and Excel to generate descriptive statistics, create frequency distributions,
and interpret these results using a set of data that was created to study the relationship between the use of
anticonvulsants, size of a treatment center, and the number of convulsions a patient had. Students will
calculate incidence and prevalence rates and compare rates between groups.

This is a two-part assignment. Part 1 includes Module 1 concepts. Part 2 includes Module 2 concepts.

Materials (Uploaded to D2L):


1. SPSS software (on the MSU Virtual desktop).
2. Microsoft Excel
3. SPSS demonstration videos
4. Excel demonstration videos
5. SPSS datafile (anticonvulsants.sav)
6. Excel database (anticonvulsants.xlsx)

Part 1 Instructions:

1. 3 POINTS: Calculate descriptive statistics in SPSS and Excel for the following variable:
convulsions. Calculate the mean, median, mode, and standard deviation using Excel and SPSS. Also
calculate the quartiles using SPSS (remember: Analyze → Frequencies → Statistics). Copy and paste
the appropriate tables from SPSS and Excel below. How do your results compare between the two
techniques? Use the appropriate statistics to calculate a 5-point summary.

ANSWER BELOW:
Frequencies

Statistics
Number of convulsions
N                Valid      3390
                 Missing    0
Mean                        5.03
Median                      3.00
Mode                        2
Std. Deviation              5.459
Range                       85
Minimum                     0
Maximum                     85
Percentiles      25         2.00
                 50         3.00
                 75         6.00

This table is from SPSS (Analyze → Frequencies → Statistics). It should also include the minimum,
maximum, and range so that you can calculate the 5-point summary.
Convulsions           Value    Actual Formula Used
Mean                  5.03     =AVERAGE(I2:I3391)
Median                3.00     =MEDIAN(I2:I3391)
Mode                  2.00     =MODE.SNGL(I2:I3391)
Standard Deviation    5.46     =STDEV.S(I2:I3391)
This table is from Excel. I included the actual formulas you would type into the cells to get the mean,
median, mode, and standard deviation. For example, in the cell that shows 5.03 you would have entered
the formula =AVERAGE(I2:I3391).

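You can see that the values from Excel and SPSS are the same (SPSS reports the standard deviation as
5.459, which rounds to Excel's 5.46). This is what you would expect if you used the correct formulas in
Excel and the correct menu options in SPSS.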
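The same four statistics can also be checked outside SPSS and Excel. Below is a minimal sketch using Python's standard-library `statistics` module. It assumes the convulsions column has been exported to a plain list of numbers. The short list in the example is hypothetical and is not the course data, and `describe` is an illustrative helper name. `statistics.stdev` uses the n - 1 denominator, which is the same sample standard deviation that Excel's STDEV.S computes.

```python
import statistics


def describe(values):
    """Return mean, median, mode, and sample SD, mirroring the Excel formulas."""
    return {
        "mean": statistics.mean(values),      # like =AVERAGE(...)
        "median": statistics.median(values),  # like =MEDIAN(...)
        "mode": statistics.mode(values),      # like =MODE.SNGL(...); Python 3.8+ returns the first mode found on ties
        "sd": statistics.stdev(values),       # like =STDEV.S(...): sample SD with n - 1 denominator
    }


# Hypothetical example values, NOT the anticonvulsants data
sample = [2, 2, 3, 5, 8]
for name, value in describe(sample).items():
    print(f"{name}: {value:.2f}")
```

Run on the real exported column, the printed values should agree with the SPSS and Excel tables above, up to rounding.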

5-point summary: Remember that there are 5 data points we are interested in (makes sense, huh? A
5-point summary). We are interested in Q0 (min), Q1 (25th %ile), Q2 (median, 50th %ile), Q3 (75th %ile),
and Q4 (max). Based on the SPSS output above, the minimum is 0, the maximum is 85, the 25th %ile is 2,
the 50th %ile is 3, and the 75th %ile is 6. So the 5-point summary is 0, 2, 3, 6, 85. To go even further,
the IQR (interquartile range) is Q3 - Q1 = 6 - 2 = 4. One step more and we can find outliers in our data.
First we multiply the IQR by 1.5, so 4 × 1.5 = 6. Then we subtract it from Q1 and add it to Q3:
Q1 - 6 = 2 - 6 = -4, and Q3 + 6 = 6 + 6 = 12. So any values below -4 or above 12 are outliers. You can't
have fewer than 0 convulsions, so there are no lower outliers. However, the maximum is 85, which is above
12, so we know that at least this value is an outlier. When we cover graphical displays soon, you'll see
how outliers show up in them.
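The fence arithmetic above can be checked with a short Python sketch. The inputs are the five values read from the SPSS table, and the 1.5 multiplier is the same rule used in the paragraph. `tukey_fences` is a hypothetical helper name chosen for illustration.

```python
def tukey_fences(q1, q3, k=1.5):
    """Return (lower, upper) outlier fences: Q1 - k*IQR and Q3 + k*IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# 5-point summary as read from the SPSS output: min, Q1, median, Q3, max
minimum, q1, median, q3, maximum = 0, 2, 3, 6, 85

lower, upper = tukey_fences(q1, q3)
print("IQR:", q3 - q1)                               # IQR: 4
print("Fences:", lower, upper)                       # Fences: -4.0 12.0
print("Lower outliers possible:", minimum < lower)   # False: the minimum is 0
print("Maximum is an outlier:", maximum > upper)     # True: 85 > 12
```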

2. 2 POINTS: Create a frequency table in SPSS and Excel using the following variable: center_size.
Copy and paste the appropriate tables from SPSS and Excel below. How do your results compare
from each technique?

ANSWER BELOW:
Frequencies

Statistics
Center size
N     Valid      3390
      Missing    0

Center size
                   Frequency   Percent   Valid Percent   Cumulative Percent
Valid   Small         594        17.5         17.5              17.5
        Medium       1320        38.9         38.9              56.5
        Large        1476        43.5         43.5             100.0
        Total        3390       100.0        100.0

Table from SPSS – you can see the number (frequency) and valid percent for each category.

Center size (code)   Frequency   Percent
1                    594         0.175221239
2                    1320        0.389380531
3                    1476        0.43539823
                     0
Sum                  3390
From Excel: Below the data in the center_size column, enter the codes 1, 2, and 3 in separate rows. You
know there are only three codes because SPSS showed this for the variable, but you could also scroll
through the entire column and see that it is coded only 1, 2, or 3. Then select all the cells where the
counts will go at the same time (the cells that end up holding 594, 1320, and 1476, which are blank at
this point because they have no formula yet). Enter the formula =FREQUENCY(B2:B3391,B3393:B3395).
The first part of this (B2:B3391) is the data array (all the data in the column), while the second part
is the bins array (B3393:B3395, where you entered 1, 2, and 3 as the code values). Once you enter the
formula, Excel returns the numbers 594, 1320, and 1476, which correspond to codes 1, 2, and 3,
respectively. In older versions of Excel, press Ctrl+Shift+Enter instead of Enter so the formula is
entered as an array formula. FREQUENCY returns one more value than there are bins: the extra 0 below
the three counts is the number of values greater than the last bin (3), and it is 0 because no record is
coded above 3. Then calculate the percent for each frequency (e.g., the percent for 594 = 594/3390,
where the formula in the cell showing 0.175 is =C3393/$C$3397).
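The binning rule that FREQUENCY applies can be mimicked in a few lines of Python. This makes the extra overflow count easier to see. The sketch below assumes the bins are listed in ascending order, as they are in B3393:B3395. `excel_frequency` is a hypothetical helper name, and the toy list of codes is made up for illustration.

```python
from bisect import bisect_left


def excel_frequency(data, bins):
    """Mimic the counting rule of Excel's FREQUENCY.

    Assumes `bins` is in ascending order. Bin i counts values v with
    bins[i-1] < v <= bins[i] (the first bin counts v <= bins[0]).
    The extra last element counts values above bins[-1].
    """
    edges = list(bins)
    counts = [0] * (len(edges) + 1)
    for v in data:
        # Index of the first edge >= v; len(edges) means "above the last bin"
        counts[bisect_left(edges, v)] += 1
    return counts


codes = [1, 1, 2, 3, 3, 3]  # hypothetical toy column of center-size codes
print(excel_frequency(codes, [1, 2, 3]))  # [2, 1, 3, 0]
```

The trailing 0 plays the same role as the 0 under the three counts in the Excel table.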

Again, the results from SPSS and Excel should agree if you did everything correctly.
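As a further cross-check, the percent and cumulative percent columns of the SPSS table can be rebuilt from the raw counts. The sketch below recreates a center_size column from the reported counts, using codes 1 = Small, 2 = Medium, and 3 = Large, which matches the SPSS and Excel tables above. It then tallies the codes with `collections.Counter`. `frequency_table` is a hypothetical helper name. Cumulative percent is computed from cumulative counts, not by adding rounded percents, which is why Medium shows 56.5 rather than 17.5 + 38.9 = 56.4.

```python
from collections import Counter


def frequency_table(codes, labels):
    """Return (total, rows) where each row is (label, frequency, percent, cumulative percent)."""
    counts = Counter(codes)
    total = sum(counts.values())
    cumulative = 0
    rows = []
    for code, label in labels.items():
        n = counts.get(code, 0)
        cumulative += n
        rows.append((label, n, round(100 * n / total, 1), round(100 * cumulative / total, 1)))
    return total, rows


# Rebuild the center_size column from the counts reported above
codes = [1] * 594 + [2] * 1320 + [3] * 1476
total, rows = frequency_table(codes, {1: "Small", 2: "Medium", 3: "Large"})
for label, n, pct, cum in rows:
    print(f"{label:<8}{n:>6}{pct:>8}{cum:>8}")
print("Total", total)
```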

Part 2 Instructions:

3. 5 POINTS: The National Institute on Drug Abuse makes data available on prevalence and incidence
of drug use in the United States. Trends in the prevalence of drug use over time among 8th, 10th, and
12th graders are based on data available here:
https://www.drugabuse.gov/trends-statistics/monitoring-future/monitoring-future-study-trends-in-prevalence-various-drugs.

Answer the following questions below:

• What do these data tell us about prevalence and incidence (consider what these concepts mean in
your answer)?
o The data give us the prevalence of drug use at different time points. They do not tell us anything
about new drug use over time, so they do not measure incidence. Each year's data describe one point
in time, when students in each grade were surveyed, and data from one point in time measure
prevalence. We can use these data to compare the rates of use between groups.

• Compare the reported rates (using a ratio) of Marijuana use between 12th graders and 8th graders
in 2019 and interpret what this number means.
o 12th graders: 35.7% report using marijuana in the last year
o 8th graders: 11.8% report using marijuana in the last year
o Comparison of 12th graders to 8th graders: 0.357/0.118 = 3.03. This means that 12th graders are
about 3 times as likely to report marijuana use in the past year as 8th graders. This is an
increase in prevalence of about 200% (3.03 - 1 = 2.03, 2.03*100 ≈ 203%).

• Compare the reported rates (using a ratio) of Marijuana use between 12th graders and 10th graders
in 2019 and interpret what this number means.
o 12th graders: 35.7% report using marijuana in the last year
o 10th graders: 28.8% report using marijuana in the last year
o Comparison of 12th graders to 10th graders: 0.357/0.288 = 1.24. This means that 12th graders
are about 1.24 times as likely to report marijuana use in the past year as 10th graders. This is an
increase in prevalence of about 24% (1.24 - 1 = 0.24, 0.24*100 = 24%).
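Both ratio comparisons follow the same two steps: divide one prevalence by the other, then subtract 1 and multiply by 100 to get the percent increase. A minimal Python sketch of those steps is below, using the 2019 percentages quoted above. `prevalence_ratio` is a hypothetical helper name. Working in percent units gives the same ratio as working in proportions, because the units cancel.

```python
def prevalence_ratio(pct_a, pct_b):
    """Ratio of two reported prevalences; 35.7/11.8 equals 0.357/0.118."""
    return pct_a / pct_b


# 2019 past-year marijuana use, as quoted above
grade12, grade10, grade8 = 35.7, 28.8, 11.8

for label, other in [("12th vs 8th", grade8), ("12th vs 10th", grade10)]:
    ratio = prevalence_ratio(grade12, other)
    increase = (ratio - 1) * 100  # percent higher than the comparison group
    print(f"{label}: ratio = {ratio:.2f}, about {increase:.0f}% higher")
```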
