Imagine you are a statistician who has been tasked with assisting an investigator to determine if there are
differences in hospital experiences that lead to readmission after a hospitalization. You are given the
dataset: “Diabetes Readmission Data.sav,” and are responsible for determining the best types of analyses
to do to help the investigator answer their questions.
1. 6 POINTS Before analyzing any data, you need to make sure that you understand just what type of
data you have (i.e. categorical - nominal, binary, ordinal, or interval or ratio). Fill out the following
table (specify the data type and why you chose that type) for each of the following variables:
Variable | Data type | Explain the reason you chose data type
Race | Categorical-nominal | Race is categorical with 5 valid values. The categories have no natural order, so they cannot be ranked from lowest to highest.
Gender | Categorical-nominal | Gender is categorical because it is recorded as Female or Male, with no ordering between the two. A missing value is assigned its own code, so the variable is treated as categorical-nominal.
Age | Interval | Age is recorded in 10 intervals rather than as exact values.
Time in hospital | Ratio | It counts the number of days between admission and discharge.
Change | Categorical-binary | It indicates whether or not a change was recorded; the value is either 0 or 1.
Readmit30Days | Categorical-binary | It indicates whether or not the patient was readmitted to the hospital within 30 days.
2. 4 POINTS Using the graphing options in SPSS, choose two appropriate graphical display options to
represent the following variable: time in hospital. One option must show outliers on the graph. Copy
and paste your graphs/charts below and for each chart, explain why you chose that chart option.
ANSWER BELOW:
Below are the two graphs that show the time in hospital variable. Both can be drawn with the Chart
Builder option in SPSS. The first is a histogram, chosen because it shows how the values of time in
hospital are distributed. The second is a boxplot, chosen because it also shows the outliers. A line
graph can also be created by selecting the histogram option and drawing it with a line only, as shown
below.
3. 1 POINT Describe the shape, location, and spread of the data for the time in hospital variable and
chart you generated above.
4. 1 POINT Calculate a 5-point-summary for the time in hospital variable and describe how the 5-
point-summary relates to the graph in Question 2.
ANSWER BELOW:
Statistics: Length of Stay - Inpatient days between admission and discharge
N (Valid)        101766
N (Missing)      0
Mean             4.40
Median           4.00
Mode             3
Minimum          1
Maximum          14
Sum              447362
Percentile 25    2.00
Percentile 50    4.00
Percentile 75    6.00
5-point summary: There are 5 data points of interest: Q0 (minimum), Q1 (25th percentile), Q2 (median,
50th percentile), Q3 (75th percentile), and Q4 (maximum). Based on the SPSS output above, the
minimum is 1, the 25th percentile is 2, the median is 4, the 75th percentile is 6, and the maximum is
14, so the 5-point summary is 1, 2, 4, 6, 14. Going further, the IQR (interquartile range) is
Q3 - Q1 = 6 - 2 = 4. To find outliers, first multiply the IQR by 1.5, so 4 x 1.5 = 6. Then subtract
that from Q1 and add it to Q3: Q1 - 6 = 2 - 6 = -4, and Q3 + 6 = 6 + 6 = 12. Any values below -4 or
above 12 are outliers. The number of days in hospital cannot be less than 0, so there are no lower
outliers, but because the maximum is 14, which is above 12, there are upper outliers, as the boxplot
in Question 2 also shows.
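The outlier arithmetic above can be checked with a short Python sketch. The function name tukey_fences and the dictionary layout are illustrative choices, not part of the SPSS output; only the five summary values come from the table.

```python
def tukey_fences(q1, q3, k=1.5):
    """Return the (lower, upper) outlier cutoffs: Q1 - k*IQR and Q3 + k*IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# 5-point summary taken from the SPSS output for time in hospital.
five_point = {"min": 1, "q1": 2, "median": 4, "q3": 6, "max": 14}

lower, upper = tukey_fences(five_point["q1"], five_point["q3"])

# A summary value beyond a cutoff means at least one outlier on that side.
has_lower_outliers = five_point["min"] < lower
has_upper_outliers = five_point["max"] > upper

print(lower, upper, has_lower_outliers, has_upper_outliers)
```

With these values the cutoffs come out at -4 and 12, so only the upper side can contain outliers.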
5. 4 POINTS Using the graphing options in SPSS, choose an appropriate graphical display option to
represent the following variables: race and age. Copy and paste your graphs/charts below and for
each chart (1 for each variable), explain why you chose that chart option.
ANSWER BELOW:
Because age is defined in intervals, a bar graph is recommended for the age variable. The categories
of the race variable can be plotted as either a pie chart or a bar chart. All of the graphs are shown
below.
For age, either the count or the cumulative frequency can be plotted.
Bar Plot
ANSWER BELOW:
There are only two groups, and they are mutually exclusive. The number of inpatient visits in the
previous year is compared between the group that had a readmission within 30 days and the group that
did not. Because the two groups are independent, an independent-samples t-test is applied.
Call patients who were not readmitted to the hospital within 30 days Group 1, and patients who were
readmitted to the hospital within 30 days Group 2.
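As a rough illustration of what the independent-samples t-test computes, here is a minimal Python sketch of the equal-variance (pooled) t statistic. The visit counts are invented for illustration and are not taken from the dataset; the function name pooled_t_statistic is likewise hypothetical. SPSS would also report a p-value, which this sketch does not compute.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t_statistic(group1, group2):
    """Return (t, degrees of freedom) for an equal-variance two-sample t-test."""
    n1, n2 = len(group1), len(group2)
    # Pooled variance weights each sample variance by its degrees of freedom.
    pooled_var = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
    standard_error = sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean(group1) - mean(group2)) / standard_error, n1 + n2 - 2


# Hypothetical inpatient-visit counts, invented for illustration only.
group1_visits = [0, 1, 0, 2, 1, 0]  # Group 1: not readmitted within 30 days
group2_visits = [1, 3, 2, 4, 2, 3]  # Group 2: readmitted within 30 days

t_stat, df = pooled_t_statistic(group1_visits, group2_visits)
print(round(t_stat, 3), df)
```

A negative t here means Group 1 has the lower sample mean. The statistic would then be compared against the t distribution with the returned degrees of freedom.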
7. 8 POINTS Sometimes hospital readmission is coded in different ways. For instance, data can be
collected that describes whether a patient was readmitted within 30 days, was readmitted beyond 30
days, or not readmitted at all. Instead of looking at readmission within 30 days or not, the investigator
changes their question to ask if there is a difference in the number of inpatient visits in the previous
year between the groups who were not readmitted, were readmitted within 30 days, and were
readmitted after 30 days. Conduct a formal hypothesis test to answer this question (choose the
appropriate statistical test, explain why you chose it, write out your null and alternative hypotheses,
run the test, and interpret the results). Include appropriate output from SPSS to show what you did.
ANSWER BELOW:
Here there are three groups that are mutually exclusive. Because there are more than 2 groups, a
t-test is not applicable to this data. ANOVA is applied instead, since it can compare more than 2
mutually exclusive groups.
Let patients not readmitted be Group 1, patients readmitted within 30 days be Group 2, and patients
readmitted after 30 days be Group 3.
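To show what the ANOVA compares, here is a minimal Python sketch of the one-way ANOVA F statistic, the ratio of between-group to within-group variability. The three lists of visit counts are invented for illustration and are not taken from the dataset; the function name one_way_anova_f is hypothetical. The p-value that SPSS reports is not computed here.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    # Between-group sum of squares: how far each group mean is from the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of the values around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical inpatient-visit counts, invented for illustration only.
not_readmitted = [1, 2, 3]        # Group 1
readmitted_within_30 = [2, 3, 4]  # Group 2
readmitted_after_30 = [5, 6, 7]   # Group 3

f_stat, df_b, df_w = one_way_anova_f(
    [not_readmitted, readmitted_within_30, readmitted_after_30]
)
print(f_stat, df_b, df_w)
```

A large F suggests that at least one group mean differs from the others. It does not say which one, so a post hoc comparison would be the usual follow-up.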
8. 8 POINTS When a person is admitted to the hospital, if they have uncontrolled diabetes it can greatly
affect the course of their hospital stay. One indicator of uncontrolled diabetes is a hemoglobin A1c
test. The investigator asks: Is there an association between being readmitted within 30 days (or not),
and whether an A1c test is normal (or not)? Conduct a formal hypothesis test to answer this question
(choose the appropriate statistical test, explain why you chose it, write out your null and alternative
hypotheses, run the test, and interpret the results). Include appropriate output from SPSS to show
what you did.
ANSWER BELOW: