Multivariate Analysis

Homework #

Each question is worth ten points, for a total of one hundred points.
1 What is the difference between an ordered categorical variable and an unordered one?
Define and then give an example of each.
2 What kind of link function does an ordered logistic regression employ? How does it differ
from an ordinary logit link?
3 When count data are zero-inflated, using a model that ignores zero-inflation will tend to
induce which kind of inferential error?
4 Over-dispersion is common in count data. Give an example of a natural process that might
produce over-dispersed counts. Can you also give an example of a process that might produce
under-dispersed counts?
5 At a certain university, employees are annually rated from 1 to 4 on their productivity, with
1 being least productive and 4 most productive. In a certain department at this certain university in a
certain year, the numbers of employees receiving each rating were (from 1 to 4): 12, 36, 7, 41.
Compute the log cumulative odds of each rating.
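
As a reminder of the arithmetic, the log cumulative odds of rating k is log( Pr(rating <= k) / Pr(rating > k) ). The base R sketch below uses made-up counts (the vector `counts` is hypothetical and is not the data from this question), so treat it as an illustration of the steps rather than a solution.

```r
# Hypothetical counts for ratings 1..4 (NOT the data from the question)
counts <- c(5, 10, 15, 20)

# Cumulative proportion: Pr(rating <= k) for k = 1..4
cum_prop <- cumsum(counts) / sum(counts)

# Log cumulative odds: log( Pr(rating <= k) / Pr(rating > k) )
log_cum_odds <- log(cum_prop / (1 - cum_prop))

print(round(log_cum_odds, 3))
```

The last value is always Inf, because the cumulative proportion of the highest category is 1. This is why an ordered model with K categories needs only K - 1 cutpoints.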
6 In 2014, a paper was published that was entitled “Female hurricanes are deadlier than male
hurricanes.” As the title suggests, the paper claimed that hurricanes with female names have caused
greater loss of life, and the explanation given is that people unconsciously rate female hurricanes as
less dangerous and so are less likely to evacuate.
Statisticians severely criticized the paper after publication. Here, you’ll explore the complete
data used in the paper and consider the hypothesis that hurricanes with female names are deadlier.
Load the data with: library(rethinking); data(Hurricanes)

Acquaint yourself with the columns by inspecting the help ?Hurricanes.
In this problem, you’ll focus on predicting deaths using femininity of each hurricane’s name. Fit
and interpret the simplest possible model, a Poisson model of deaths using femininity as a
predictor. Compare the model to an intercept-only Poisson model of deaths. How strong is the
association between femininity of name and deaths? Which storms does the model fit (retrodict)
well? Which storms does it fit poorly?

7 Counts are nearly always over-dispersed relative to Poisson. So fit a gamma-Poisson (aka
negative-binomial) model to predict deaths using femininity. Show that the over-dispersed
model no longer shows as precise a positive association between femininity and deaths, with an
89% interval that overlaps zero. Can you explain why the association diminished in strength?
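
To see what "over-dispersed relative to Poisson" means in practice, the following base R simulation may help. It assumes a hypothetical mean `mu` and gamma shape `shape` (neither comes from the Hurricanes data) and compares the mean and variance of pure Poisson counts with those of gamma-Poisson counts, where each unit draws its own rate.

```r
set.seed(42)
n     <- 1e4
mu    <- 10   # hypothetical mean count
shape <- 2    # hypothetical gamma shape; smaller values give more dispersion

# Pure Poisson: every unit shares the same rate mu
y_pois <- rpois(n, lambda = mu)

# Gamma-Poisson: each unit has its own rate, drawn from a gamma with mean mu
rates <- rgamma(n, shape = shape, rate = shape / mu)
y_gp  <- rpois(n, lambda = rates)

# Poisson counts have variance close to the mean; gamma-Poisson counts
# have variance near mu + mu^2 / shape, which is much larger
print(c(mean = mean(y_pois), var = var(y_pois)))
print(c(mean = mean(y_gp),   var = var(y_gp)))
```

Both samples have about the same mean, but the gamma-Poisson variance is far larger. A model that allows this extra variation has less reason to attribute extreme counts to the predictor.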
8 In order to infer a strong association between deaths and femininity, it’s necessary to include
an interaction effect. In the data, there are two measures of a hurricane’s potential to cause death:
damage_norm and min_pressure. Consult ?Hurricanes for their meanings. It makes some sense to
imagine that femininity of a name matters more when the hurricane is itself deadly. This implies an
interaction between femininity and either or both of damage_norm and min_pressure.
Fit a series of models evaluating these interactions. Interpret and compare the models. In
interpreting the estimates, it may help to generate counterfactual predictions contrasting hurricanes
with masculine and feminine names. Are the effect sizes plausible?

9 In the original hurricanes paper, storm damage (damage_norm) was used directly. This
assumption implies that mortality increases exponentially with a linear increase in storm strength,
because a Poisson regression uses a log link. So it’s worth exploring an alternative hypothesis: that
the logarithm of storm strength is what matters. Explore this by using the logarithm of
damage_norm as a predictor. Using the best model structure from the previous problem, compare a
model that uses log(damage_norm) to a model that uses damage_norm directly. Compare their DIC/
WAIC values as well as their implied predictions. What do you conclude?
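
The contrast between the two hypotheses can be checked numerically. In the sketch below, `a` and `b` are hypothetical coefficients chosen only for illustration. With a log link, a predictor used directly multiplies the expected count by exp(b) for every unit increase, while a log-transformed predictor gives exp(a) * x^b, so the expected count is multiplied by 2^b for every doubling.

```r
a <- 0.5   # hypothetical intercept
b <- 0.3   # hypothetical slope

# Predictor enters directly: mu = exp(a + b * x)
mu_linear <- function(x) exp(a + b * x)

# Predictor enters on the log scale: mu = exp(a + b * log(x)) = exp(a) * x^b
mu_log <- function(x) exp(a + b * log(x))

x <- c(1, 2, 4, 8)

# Direct predictor: each +1 step multiplies the mean by the constant exp(b)
print(mu_linear(x + 1) / mu_linear(x))

# Log predictor: each doubling multiplies the mean by the constant 2^b
print(mu_log(2 * x) / mu_log(x))
```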
10 The data in data(Fish) are records of visits to a national park. See ?Fish for details. The
question of interest is how many fish an average visitor takes per hour, when fishing. The problem is
that not everyone tried to fish, so the fish_caught numbers are zero-inflated. As with the monks
example in the chapter, there is a process that determines who is fishing (working) and another
process that determines fish per hour (manuscripts per day), conditional on fishing (working). We want
to model both. Otherwise we’ll end up with an underestimate of the rate of fish extraction from the
park.
You will model these data using zero-inflated Poisson GLMs. Predict fish_caught as a
function of any of the other variables you think are relevant. One thing you must do, however, is use
a proper Poisson offset/exposure in the Poisson portion of the zero-inflated model. Use the
hours variable to construct the offset. This will adjust the model for the differing amounts of time
individuals spent in the park.
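
The role of the offset can be illustrated with simulated data. Everything in the sketch below is hypothetical (the data frame `d`, `true_rate`, and `p_fishing` are invented and are not the Fish data), and it uses base R `glm` rather than the rethinking tools. An offset adds log(hours) to the linear model with its coefficient fixed at 1, so that exp(intercept) is a rate per hour.

```r
set.seed(7)
n         <- 5000
true_rate <- 1.5   # hypothetical fish per hour while fishing
p_fishing <- 0.6   # hypothetical probability that a visitor fishes at all

# Simulated visits: hours in the park, whether the visitor fished, fish caught
d <- data.frame(hours = runif(n, 0.5, 8))
d$fishing <- rbinom(n, 1, p_fishing)
d$fish    <- d$fishing * rpois(n, true_rate * d$hours)

# Ignoring zero-inflation: non-fishers drag the estimated rate down
naive_rate <- sum(d$fish) / sum(d$hours)

# Poisson GLM with exposure offset, fit only to visitors who fished.
# The fishing indicator is known here only because the data are simulated.
fit <- glm(fish ~ 1 + offset(log(hours)), family = poisson,
           data = d, subset = fishing == 1)
offset_rate <- unname(exp(coef(fit)))

print(c(naive = naive_rate, with_offset = offset_rate))
```

In real data the fishing indicator is not observed, which is why the zero-inflated mixture is needed to separate the two processes. The simulation only shows why pooling non-fishers underestimates the rate and how the offset puts the estimate on a per-hour scale.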
