
Appendix

by Emir Izat bin Abdul Rashid

July 21, 2022

1 Measuring Health Disparity for Groups

For the purpose of this study, we are looking for a measure that captures inter-group vaccine hesitancy/uptake within a district. The index of disparity is a suitable choice for this purpose.

Definition 1.1. The index of disparity, ID, for n groups in a given district is given as follows:

$$\mathrm{ID} = \frac{\sum_{i=1}^{n-1} \frac{|r_i - r_{rp}|}{n-1}}{r_{rp}} \qquad [1]$$

where $r_i$ is the rate of vaccination for the $i$th group, and $r_{rp}$ is the rate of vaccination for the reference group (the best-performing group in the district).
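As a sketch, the index of disparity can be computed directly from a district's group vaccination rates. The function name and the example rates below are illustrative, not from the study's data:

```python
def index_of_disparity(rates):
    """Index of disparity (ID): the mean absolute gap between each
    non-reference group's vaccination rate and the best-performing
    (reference) group's rate, relative to the reference rate."""
    r_rp = max(rates)              # reference group: best performer
    others = sorted(rates)[:-1]    # the remaining n - 1 groups
    n = len(rates)
    return sum(r_rp - r for r in others) / (n - 1) / r_rp

# Hypothetical rates for three groups in one district
print(index_of_disparity([0.9, 0.6, 0.75]))  # -> 0.25
```

Since the reference group is the best performer, each absolute gap $|r_i - r_{rp}|$ simplifies to $r_{rp} - r_i$.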

The drawback of the measure above is that it does not take into account the relative size of each group with respect to the whole population. To remedy this problem, I propose a new measure, the weighted index of disparity ($\mathrm{ID}_w$), to account for population size.

Definition 1.2. The weighted index of disparity $\mathrm{ID}_w$ for a collection of n groups $\{1, \ldots, n\}$ is

$$\mathrm{ID}_w = \frac{\sum_{i=1}^{n-1} \frac{|r_i - r_{rp}|}{n-1} \cdot \frac{p_i}{p_1 + \cdots + p_n}}{r_{rp}}$$

given that pj is the population size for the j-th group.
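Under the same assumptions as the previous sketch, the weighted version can be computed as follows, with illustrative population sizes:

```python
def weighted_index_of_disparity(rates, pops):
    """Weighted index of disparity (ID_w): each non-reference group's
    absolute gap from the reference rate is weighted by the group's
    share of the total population."""
    r_rp = max(rates)
    ref = rates.index(r_rp)        # index of the best-performing group
    total = sum(pops)
    n = len(rates)
    weighted_sum = sum((r_rp - r) / (n - 1) * (p / total)
                       for i, (r, p) in enumerate(zip(rates, pops))
                       if i != ref)
    return weighted_sum / r_rp

# Same hypothetical rates; the most hesitant group is also the largest
print(weighted_index_of_disparity([0.9, 0.6, 0.75], [100, 300, 100]))
```

Because the gaps are weighted by population share, the same set of rates yields a larger value when the worst-performing group is also the largest.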

Still, both the weighted and unweighted index of disparity are sensitive to inter-group disparity, but they may not capture intra-group disparity as well. For institutionally established categories such as ethnicity, gender, and income brackets, the index of disparity is a suitable measure for redistributive policy goals.

2 The Theory of Linear Models

2.1 What is a Linear Model?

Definition 2.1. For an observation i, we assume that the response variable $y_i$ and the explanatory variables $x_1, x_2, \ldots, x_p$ are linearly related:

$$y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip} + e_i$$

Or, more concisely,

$$Y = X\beta + e$$

where $X$ is the design matrix, and we assume that $E(e_i) = 0$, $\mathrm{Var}(e_i) = \sigma^2$ for all $i$, and $\mathrm{Cov}(e_i, e_j) = 0$ for all $i \neq j$.

Theorem 2.1. The least squares estimate for the parameter $\beta$ in a linear model is

$$\hat{\beta} = (X^T X)^{-1} X^T Y$$

given that $X^T X$ is invertible (it usually is, unless two explanatory variables are exactly linearly dependent).

Proof. The proof is left as an exercise to the reader.
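As an illustration with synthetic data and made-up coefficients, the least squares estimate can be computed from the normal equations; `numpy.linalg.solve` is used rather than forming the inverse explicitly:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
# Design matrix with an intercept column and two explanatory variables
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
beta_true = np.array([1.0, 2.0, -0.5])
y = X @ beta_true + rng.normal(scale=0.1, size=n)

# beta_hat = (X^T X)^{-1} X^T y, solved as a linear system
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_hat)  # should be close to beta_true
```

Solving the linear system is numerically more stable than explicitly inverting $X^T X$, which matters when the explanatory variables are nearly collinear.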

2.2 Why is Linear Model Attractive?

The popularity of linear models is largely due to the following result, known as the Gauss-Markov theorem.

Theorem 2.2. Among all linear unbiased estimators, the least squares estimate has the smallest sampling variance (i.e. it is optimal).

Proof. The proof is left as an exercise to the reader.

Although one could find a non-linear or biased estimator that fits a given sample better, we usually stick with OLS to mitigate overfitting.
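The Gauss-Markov theorem can also be checked empirically. The sketch below (synthetic data, arbitrary weights) compares the Monte Carlo sampling variance of OLS against another linear unbiased estimator: a weighted estimator whose weights do not match the (homoskedastic) error structure:

```python
import numpy as np

rng = np.random.default_rng(1)
n, reps = 50, 4000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta = np.array([1.0, 2.0])
W = np.diag(rng.uniform(0.1, 10.0, size=n))  # arbitrary fixed weights

# Both maps are linear in y and unbiased for beta
A_ols = np.linalg.solve(X.T @ X, X.T)
A_w = np.linalg.solve(X.T @ W @ X, X.T @ W)

slopes_ols, slopes_w = [], []
for _ in range(reps):
    y = X @ beta + rng.normal(size=n)        # homoskedastic errors
    slopes_ols.append((A_ols @ y)[1])
    slopes_w.append((A_w @ y)[1])

print(np.var(slopes_ols), np.var(slopes_w))  # OLS variance is smaller
```

Both estimators recover the true slope on average; the theorem shows up as the smaller spread of the OLS estimates across replications.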

2.3 Variable Selection

This is one of the most crucial parts of building a linear model: it is essentially a collection of techniques for picking the "best" model out of the many potential models.

There are two distinct families of techniques for variable selection: the first is stepwise regression and the second is criteria-based selection. Commonly used variable selection techniques include:

1. Backward elimination (stepwise)

2. Forward selection (stepwise)

3. Adjusted R2 (criteria)

4. Akaike Information Criterion (criteria)

5. Bayes Information Criterion (criteria)

6. Mallow’s Cp (criteria)

7. Cross-validation (criteria)

All of the techniques above will generally yield different models. Cross-validation is the most computationally expensive among them, but a favorite among many practitioners. One approach is to select the best models chosen by techniques 1-6 and then compare them using cross-validation.
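As an illustrative sketch of criteria-based selection on synthetic data, candidate models can be compared by AIC computed from the residual sum of squares (the Gaussian-likelihood form, $n \log(\mathrm{RSS}/n) + 2k$); the candidate sets below are made up for the example:

```python
import numpy as np

def aic(y, X):
    """AIC for an OLS fit under Gaussian errors: n*log(RSS/n) + 2k."""
    n, k = X.shape
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    rss = np.sum((y - X @ beta) ** 2)
    return n * np.log(rss / n) + 2 * k

rng = np.random.default_rng(2)
n = 100
x1, x2, x3 = (rng.normal(size=n) for _ in range(3))
y = 1.0 + 2.0 * x1 - 1.0 * x2 + rng.normal(size=n)  # x3 is irrelevant

ones = np.ones(n)
candidates = {
    "x1": np.column_stack([ones, x1]),
    "x1+x2": np.column_stack([ones, x1, x2]),
    "x1+x2+x3": np.column_stack([ones, x1, x2, x3]),
}
best = min(candidates, key=lambda name: aic(y, candidates[name]))
print(best)
```

The penalty term $2k$ discourages the irrelevant variable: dropping a genuinely predictive variable raises the RSS far more than the penalty saves, while adding a noise variable barely lowers it.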

2.4 Regression Diagnostics

We need to ensure that our model is robust by checking all the assumptions in Definition 2.1 and investigating outliers. In R, the standard diagnostic plots (residuals, leverage, influence) are available by calling plot() on a fitted lm object.
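As a minimal sketch of one such check, on synthetic data with one injected outlier, standardized residuals above a common rule-of-thumb threshold can flag candidate outliers:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0]) + rng.normal(size=n)
y[0] += 10.0                         # inject one gross outlier

beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ beta_hat
std_resid = resid / resid.std()
outliers = np.flatnonzero(np.abs(std_resid) > 3)  # rule-of-thumb cutoff
print(outliers)
```

This is a crude screen; leverage-aware measures such as studentized residuals or Cook's distance are the more careful versions of the same idea.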

2.5 Statistical Significance

Last but not least, we need to conduct hypothesis testing to ensure that our observations are statistically significant. Statistical significance is an important step towards establishing some grounds for our hypothesis, a first step towards showing causality. The more popular tests include the F-test and t-tests, which are equivalent forms of hypothesis testing for model significance but require the assumption that the errors are normally distributed. Another interesting test is the permutation test, which is also widely used because it involves no normality assumption. We usually test our model against an empty model; essentially, we argue that our hypothesized parameters fit the observations better than no parameters at all.
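A minimal sketch of a permutation test for model significance on synthetic data: shuffling the response breaks any association with the predictor, so the observed $R^2$ is compared against its permutation null distribution:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 60
x = rng.normal(size=n)
y = 1.5 * x + rng.normal(size=n)     # a genuine linear signal

def r_squared(x, y):
    """R^2 of a simple regression of y on x (with an intercept)."""
    X = np.column_stack([np.ones_like(x), x])
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta
    return 1 - resid @ resid / ((y - y.mean()) @ (y - y.mean()))

observed = r_squared(x, y)
# Permutation null: refit after shuffling y to destroy the x-y link
perm = [r_squared(x, rng.permutation(y)) for _ in range(999)]
p_value = (1 + sum(p >= observed for p in perm)) / (1 + len(perm))
print(p_value)
```

The "+1" in numerator and denominator counts the observed statistic as one of the permutations, which keeps the p-value strictly positive.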

References

[1] Jeffrey N. Pearcy and Kenneth G. Keppel. A summary measure of health disparity. Public Health Reports, 2016.
