Analysis - INLA
Corresponding author:
Professor Luciano de Andrade, PhD
Department of Medicine, State University of Maringa
Post-Graduation Program in Health Sciences
landrade@uem.br
Data availability
All data are publicly and freely available from the Brazilian Health System Informatics
Department (DATASUS), and cartographic base data for all municipalities in Paraná State were
obtained from the Paraná branch of the Brazilian Institute of Geography and Statistics (IBGE). The table
below lists all datasets accessed and utilized in this study. The database and script used in this
study can be found in the online repository: https://doi.org/10.6084/m9.figshare.25007108.v1.
# head(data)
# head(variables)
410010 0.6003 1 0 0
410020 0.6127 1 0 0
410030 0.5264 1 0 0
410040 0.5565 0 0 1
410045 0.6446 1 0 0
410050 0.6762 0 1 0
# map (PR)
library(sp)
rownames(d) <- d$id
# join the municipality-level data onto the map polygons by municipality id
map <- merge(map, d, by.x = "county", by.y = "id")
head(map@data)
colnames(d)
str(map@data)
Data Organization
We calculated the observed and expected counts, as well as the SMRs and SIRs for each
municipality and year, and created a data frame with the following variables:
Year: year,
Observed Cases
We obtained the number of cases for all strata combined in each municipality and year by aggregating
the data by municipality. To do this, we used the aggregate() function, specifying the cases vector, the
list of grouping elements, such as list(county = data$name, year = data$year), and the function to be
applied to each subset of the data, which is the mean. We also set the names of the returned data frame, yielding:
id Y
410010 1.2187
410020 1.7395
410030 2.6666
410040 19.4635
410045 0.5937
410050 2.4479
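The aggregation step described above can be sketched in base R as follows. This is a minimal sketch with toy data; the input column names (name, year, cases) are assumptions based on the text, not taken from the source.

```r
# Toy input: one row per municipality, year, and stratum (hypothetical values).
data <- data.frame(
  name  = c("410010", "410010", "410020"),
  year  = c(2010, 2010, 2010),
  cases = c(1.0, 1.4, 1.7)
)
# Aggregate the cases vector by municipality and year, applying the mean
# to each subset, as described in the text.
d <- aggregate(x = data$cases,
               by = list(county = data$name, year = data$year),
               FUN = mean)
# Rename the returned data frame's columns.
names(d) <- c("id", "year", "Y")
```

The same call with sum instead of mean would return total counts per municipality-year.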
Expected Cases
table(is.na(d$E))
id Y E
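Expected counts E are typically obtained by indirect standardization: overall stratum-specific rates are applied to each municipality's population. The base-R sketch below uses toy data, and all column names in it are hypothetical; the expected() function in the SpatialEpi package implements the same computation.

```r
# Toy population table: one row per municipality and stratum (hypothetical).
pop <- data.frame(
  id      = c("410010", "410010", "410020", "410020"),
  stratum = c("F", "M", "F", "M"),
  pop     = c(100, 120, 200, 180),
  cases   = c(1, 2, 3, 2)
)
# Overall rate in each stratum across all municipalities.
rate <- tapply(pop$cases, pop$stratum, sum) / tapply(pop$pop, pop$stratum, sum)
# Expected cases per municipality: population times stratum rate, summed over strata.
pop$E_s <- pop$pop * rate[as.character(pop$stratum)]
E <- tapply(pop$E_s, pop$id, sum)
```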
This model assumes that the number of observed cases in a specific municipality in a given
year follows a Poisson distribution. The formula Oij ~ Poisson(nijΘij) describes the statistical
relationship used to model the number of observed cases (Oij), where:
Oij: the number of observed cases in a specific context, such as a municipality i in year j.
Poisson: the Poisson distribution, a discrete probability distribution used to model the number of
rare or discrete events occurring in a fixed time or space interval. In this context, it models the case counts.
nij: the number of people, or population at risk, in the same context in which we
are counting the cases; in other words, the number of individuals potentially subject to the event.
Θij: the rate of occurrence of the event in question in the specific context (municipality i, year j). It is the value we are trying to estimate or
model.
Therefore, the formula Oij ~ Poisson(nijΘij) indicates that we model the number of
observed cases (Oij) as a random variable following a Poisson distribution, where the rate of
occurrence (Θij) is multiplied by the population at risk (nij) to determine the probability of observing a
given number of cases. The number of observed cases is influenced by the expected number of cases and the
relative risk of that specific municipality and year. This model allows each area to have its own
intercept and linear trend, accounting for the specific characteristics of each municipality.
log(Θij) = μ + ϕi + (β + δi)tj, where:
log(Θij): the natural logarithm of the occurrence rate (Θij) of the event under study. The
logarithmic transformation is common in statistical models to stabilize variability and ensure that the estimated rate remains positive.
μ: an overall intercept, or baseline, of the occurrence rate. It is a constant parameter that
captures the average level of the occurrence rate across the entire population or context.
ϕi: a specific random effect for each municipality (or other unit of analysis), denoted by i.
These random effects capture spatial variation, or unexplained variation beyond the overall intercept.
β: the overall effect of the time variable (tj) on the occurrence rate. It is a parameter that
captures the average temporal trend shared by all municipalities.
δi: the specific effect of each municipality for the time variable (tj). These effects capture
how each municipality's temporal trend deviates from the overall trend β.
tj: the time variable, which may represent the different time points at which data were collected. It is used
to model changes in the occurrence rate over time.
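A model of this form can be specified in R-INLA roughly as follows. This is a minimal sketch, not the authors' exact call: the index column names (idarea, idarea1, idtime) and the choice of the BYM spatial prior are assumptions, not taken from the source.

```r
# Hypothetical R-INLA specification of the space-time model described above.
# d is assumed to contain area indices (idarea, idarea1 as a copy), a time
# index (idtime), observed counts Y, and expected counts E; g is the INLA
# adjacency graph built with nb2INLA()/inla.read.graph().
library(INLA)
formula <- Y ~
  f(idarea, model = "bym", graph = g) +   # spatial random effect phi_i
  f(idarea1, idtime, model = "iid") +     # area-specific trend delta_i * t_j
  idtime                                  # overall linear trend beta * t_j
res <- inla(formula, family = "poisson", data = d, E = E,
            control.predictor = list(compute = TRUE))
```

The fitted object res is what the summary(res) and res$summary.fitted.values calls later in this document operate on.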
# SMR
d$SMR <- d$Y/d$E
# SIR
d$SIR <- d$Y/d$E
# SMR
# Mapping variables
library(leaflet)
l <- leaflet(map) %>% addTiles()
pal <- colorNumeric(palette = "YlOrRd", domain = map$SMR)
l %>% addPolygons(color = "grey", weight = 1, fillColor = ~pal(SMR), fillOpacity = 0.5) %>%
  addLegend(pal = pal, values = ~SMR, opacity = 0.5, title = "SMR", position = "bottomright")
# Neighbourhood matrix
library(spdep)
library(foreach)
library(INLA)
nb <- poly2nb(map)               # contiguity-based neighbour list
head(nb)
nb2INLA("map.adj", nb)           # write the neighbour list in INLA format
g <- inla.read.graph(filename = "map.adj")
table(is.na(map@data$Lager_sized))
# Results
summary(res)
library(ggplot2)
library(car)       # for vif()
library(glmmTMB)
# Collinearity check among the candidate covariates
modelo <- lm(Y ~ INDEX1021 + Small_sized + Medium_sized + Lager_sized,
             data = map@data)
vif_result <- vif(modelo)
print(vif_result)
# Poisson GLM, then test for overdispersion
m <- glm(Y ~ INDEX1021 + Small_sized + Medium_sized + Lager_sized,
         family = poisson, data = map@data)
performance::check_overdispersion(m)
# Poisson mixed model with a random intercept per municipality (CODIBGE)
m <- glmmTMB(
  Y ~ INDEX1021 + Small_sized + Medium_sized + Lager_sized + (1 | CODIBGE),
  family = poisson,
  data = map@data
)
performance::check_overdispersion(m)
head(res$summary.fitted.values)
names(res)
map$RR <- res$summary.fitted.values[, "mean"]        # posterior mean relative risk
map$LL <- res$summary.fitted.values[, "0.025quant"]  # lower 95% credible limit
map$UL <- res$summary.fitted.values[, "0.975quant"]  # upper 95% credible limit
range(map@data$SMR)
range(map@data$RR)
head(map@data)
summary(d$Y)
summary(d$E)
# SIR
# Mapping variables
library(leaflet)
l <- leaflet(map) %>% addTiles()
pal <- colorNumeric(palette = "YlOrRd", domain = map$SIR)
l %>% addPolygons(color = "grey", weight = 1, fillColor = ~pal(SIR), fillOpacity = 0.5) %>%
addLegend(pal = pal, values = ~SIR, opacity = 0.5, title = "SIR", position = "bottomright")
# Neighbourhood matrix
library(spdep)
library(foreach)
library(INLA)
nb <- poly2nb(map)
head(nb)
nb2INLA("map.adj", nb)
g <- inla.read.graph(filename = "map.adj")
# Results
summary(res)
library(ggplot2)
library(glmmTMB)
# Poisson GLM, then test for overdispersion
m <- glm(Y ~ INDEX1021 + Small_sized + Medium_sized + Lager_sized,
         family = poisson, data = map@data)
performance::check_overdispersion(m)
m <- glmmTMB(
Y ~ INDEX1021+Small_sized+Medium_sized+Lager_sized + (1 |CODIBGE),
family = poisson,
data = map@data
)
performance::check_overdispersion(m)
head(res$summary.fitted.values)
names(res)
map$RR <- res$summary.fitted.values[, "mean"]        # posterior mean relative risk
map$LL <- res$summary.fitted.values[, "0.025quant"]  # lower 95% credible limit
map$UL <- res$summary.fitted.values[, "0.975quant"]  # upper 95% credible limit
range(map@data$SIR)
range(map@data$RR)
head(map@data)
summary(d$Y)
summary(d$E)
From here, the results were exported along with their respective shapefiles and plotted in QGIS.
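The export step can be sketched as follows. This is a minimal sketch using the sf package; the output file name is hypothetical.

```r
# Hypothetical export of the mapped results for plotting in QGIS.
library(sf)
map_sf <- st_as_sf(map)            # convert the sp object to an sf object
st_write(map_sf, "results_rr.shp") # shapefile readable by QGIS (name is hypothetical)
```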