
Low-cost Air Quality Sensors: From Nuts & Bolts to Real
World Applications
by
Sakshi Jain
B. Tech., Center of Environmental Planning and Technology, 2016

M. Sc., Carnegie Mellon University, 2017
A THESIS SUBMITTED IN PARTIAL FULFILLMENT

OF THE REQUIREMENTS FOR THE DEGREE OF
Doctor of Philosophy
in
THE FACULTY OF GRADUATE AND POSTDOCTORAL

STUDIES
(Mechanical Engineering)
The University of British Columbia

(Vancouver)
August 2023
© Sakshi Jain, 2023

The following individuals certify that they have read, and recommend to the Fac-
ulty of Graduate and Postdoctoral Studies for acceptance, the thesis entitled:
Low-cost Air Quality Sensors: From Nuts & Bolts to Real World Ap-
plications
submitted by Sakshi Jain in partial fulfillment of the requirements for the degree
of Doctor of Philosophy in Mechanical Engineering.
Examining Committee:
Naomi Zimmerman, Assistant Professor, Mechanical Engineering, UBC
Supervisor
Milind Kandlikar, Professor, Institute for Resources, Environment and Sustainabil-
ity, UBC
University Examiner
Patrick Kirchen, Associate Professor, Mechanical Engineering, UBC
University Examiner
Amanda Giang, Assistant Professor, Institute for Resources, Environment and Sus-
tainability, UBC
Supervisory Committee Member
Steven Rogak, Professor, Mechanical Engineering, UBC
Additional Supervisory Committee Members:

Adam Rysanek, Assistant Professor, School of Architecture and Landscape Archi-
tecture, UBC
Abstract
Recent advancements in low-cost sensor (LCS) technology have presented a new
and affordable opportunity to understand and subsequently improve air quality.
This thesis assessed the different stages of adoption and application of LCS tech-
nology, including calibrating the sensors, using sensors to build spatiotemporal
pollutant maps, and using these maps to identify inequities in air pollution expo-
sures.
In Chapter 3, a general calibration method for commercially available low-
cost PM2.5 sensors (PurpleAir/Plantower) was explored, such that the calibration
models can be transferable to large geographical areas, especially in areas with
limited monitoring. Inter-city models (e.g., trained in California and tested in
India) built for regional concentrations were found to be effective in reducing
measurement errors by 30%. Chapter 4 used data from a network of 50 LCS
deployed in Pittsburgh (Pennsylvania, USA) to build daily average land-use
regression (LUR) and random-forest (LURF) spatiotemporal models for PM2.5, NO2,
and CO. The LURF models outperformed traditional regression techniques, with
an increase in average externally cross-validated R2 of 0.10-0.19. Models built
after separating local contributions from the regional signal improved the R2 by
0.14. In Chapter 5, the LURF models for PM2.5 were then used to build static
(population spends 24 hours/day in a fixed residential area) and dynamic models
(population moves between residential and commercial areas) and used to esti-
mate variations in residents’ exposures to PM2.5 due to movement. The exposure
estimates were consistently about 10% higher when the population spends more
time in commercially-dense locations (dynamic model) vs residentially-dense lo-
cations (static model). Weekend concentrations were also 10% higher than week-
day concentrations. Chapter 6 describes the deployment and analysis of data from
a network of 11 LCS deployed in an environmental injustice neighborhood in
Vancouver (British Columbia, Canada). PM2.5, NO2, and O3 concentrations were used
to calculate cumulative hazard indices (CHIs) to identify hotspots within the
neighborhood and to address the inequities in air pollution when compared to the Greater
Vancouver region. Lastly, Chapter 7 summarizes the lessons learned from this
thesis and provides insight into key design considerations for deployment.
Lay Summary
Air quality and pollutant concentrations can vary over short distances. However,
these effects are poorly characterized since the high cost of traditional air
quality monitoring instruments has limited their widespread deployment, resulting
in a limited number of monitoring stations in most cities and a lack of coverage in
rural areas. Consequently, a significant portion of the global population is not ade-
quately represented in the available air quality data. To address this issue, low-cost
air quality sensors offer a promising solution. These sensors are characterized by
their affordability, low power requirements, and compact size, making them
suitable, once calibrated, for wider deployment to fill in the gaps in traditional
monitoring networks. This thesis explores the calibration and subsequent applica-
tion of low-cost sensors to improve our understanding of how air pollution varies in
space and time, and presents example assessments of how these variations can impact
exposure estimates.
Preface
This dissertation is the original intellectual property of the author, Sakshi Jain.
Various results from Chapters 3-6 of this dissertation have been presented as
published manuscripts in scholarly journals and conference proceedings.
A version of Chapter 3 is in preparation for submission. The results of this
chapter have been presented as a poster at the 2022 Air Sensor International Con-
ference, Pasadena, California. I conceptualized the study, conducted the literature
review and analysis for this work, developed the figures and wrote the manuscript.
NZ provided critical feedback and editing during all stages of the project and the
manuscript.
Chapter 4 contains a paper published in the peer-reviewed journal Environmental
Science & Technology. “S. Jain, A. Presto, and N. Zimmerman (2021). Spatial
Modeling of Daily PM2.5, NO2, and CO Concentrations Measured by a Low-Cost
Sensor Network: Comparison of Linear, Machine Learning, and Hybrid Land Use
Models. Environmental Science & Technology, 2021, 55 (13), 8631-8641.” The
work in this chapter has also been presented as an oral presentation at the 37th
American Association for Aerosol Research (AAAR) Annual Conference, Portland,
Oregon, and as a poster at the 2019 Machine Learning in Science and Engineering
Conference, Atlanta, Georgia. I conducted the literature review and analysis
for this work, developed the figures and wrote the manuscript. NZ acquired
the funding for this project, and provided critical feedback during all stages of the
project and the manuscript. AP provided the original low-cost sensor data used in
this work and provided critical feedback for the manuscript.
A version of Chapter 5 is in preparation for submission. The results of this
chapter have been presented as posters at the 2020 AGU Fall Meeting and the 33rd
Annual Conference of the International Society for Environmental Epidemiology,
New York City, New York. I conducted the literature review and analysis for this
work, developed the figures and wrote the manuscript. NZ acquired the funding
for this project, and provided critical feedback during all stages of the project and
the manuscript. AP provided the original low-cost sensor data used in this work
and provided critical feedback for the manuscript.
A version of Chapter 6 is in preparation for submission. The field work de-
scribed in the chapter was covered by UBC Ethics Certificate number H21-02425.
This work was proposed by RGF and me, and we also independently secured fund-
ing via the Public Scholars Initiative. Ethics approval was obtained by RGF and
me, with support from NZ and AG. The field work was designed by everyone and
executed by RGF, NM, and me. DJ provided community support, shared his insights on
the neighbourhood during a walking tour, and helped with recruitment of hosts. I
conducted the literature review and analysis for this chapter, developed the figures
and wrote the manuscript. AG and NZ provided critical feedback during all stages
of this project. AG provided insights on the methodology and NZ provided critical
feedback on the manuscript.
Table of Contents
Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iii
Lay Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . v
Preface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vi
Table of Contents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viii
List of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xvi
List of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xx
List of Abbreviations . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxx
Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxxii
Dedication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxxiv
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1 Background and Motivation . . . . . . . . . . . . . . . . . . . . 1
1.2 Research Objectives . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.3 Thesis Structure and Ordering of Chapters . . . . . . . . . . . . . 4
2 Literature Review . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2.1 Sources of Ambient Air Pollution . . . . . . . . . . . . . . . . . 6
2.2 Air Quality Monitoring . . . . . . . . . . . . . . . . . . . . . . . 9
2.2.1 Air Pollution Regulation . . . . . . . . . . . . . . . . . . 9
2.2.2 Existing Infrastructure and its Limitations . . . . . . . . . 10
2.2.3 Overview: Low-Cost Sensors . . . . . . . . . . . . . . . 12
2.2.4 Operating Principles of Low-Cost Sensors . . . . . . . . . 13
2.2.5 Challenges with Low-cost Sensors and Calibration Tech-
niques . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.3 Extending Sensor Data: Spatiotemporal Models . . . . . . . . . . 22
2.4 Extending Spatiotemporal Models: Exposure Assessments . . . . 25
2.5 Extending Spatiotemporal Models: Hotspot Identification . . . . . 27
2.6 Environmental Justice and Air Pollution . . . . . . . . . . . . . . 28
2.7 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
3 Exploration of intra-city and inter-city PM2.5 regional calibration
models to improve low-cost sensor performance . . . . . . . . . . . 33
3.1 Executive Summary . . . . . . . . . . . . . . . . . . . . . . . . . 33
3.2 Author Contributions . . . . . . . . . . . . . . . . . . . . . . . . 34
3.3 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
3.4 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
3.5 Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
3.5.1 Low-cost Sensor: PurpleAir . . . . . . . . . . . . . . . . 38
3.5.2 Data Collection and Processing . . . . . . . . . . . . . . 40
3.5.3 Baseline Separation . . . . . . . . . . . . . . . . . . . . . 44
3.5.4 Model Building . . . . . . . . . . . . . . . . . . . . . . . 46
3.5.5 Intra-city Models . . . . . . . . . . . . . . . . . . . . . . 48
3.5.6 Inter-city Models . . . . . . . . . . . . . . . . . . . . . . 49
3.5.7 Validation Testing of the Method . . . . . . . . . . . . . . 51
3.5.8 Performance Metrics . . . . . . . . . . . . . . . . . . . . 52
3.6 Results and Discussion . . . . . . . . . . . . . . . . . . . . . . . 53
3.6.1 Regression Model Selection . . . . . . . . . . . . . . . . 53
3.6.2 Performance Evaluation of Intra-city Regional Concentra-
tion Models . . . . . . . . . . . . . . . . . . . . . . . . . 55
3.6.3 Performance Evaluation of Inter-city Regional Concentra-
tion Models . . . . . . . . . . . . . . . . . . . . . . . . . 59
3.6.4 Validation Testing of the Method . . . . . . . . . . . . . . 64
3.7 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
4 Spatial modelling of daily PM2.5 , NO2 and CO concentrations mea-
sured by a low-cost sensor network: Comparison of linear, machine
learning, and hybrid land use models . . . . . . . . . . . . . . . . . 67
4.3 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
4.4 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
4.5 Materials and Methods . . . . . . . . . . . . . . . . . . . . . . . 72
4.5.1 Measurement Details . . . . . . . . . . . . . . . . . . . . 73
4.5.2 Standard and Decomposed Concentration Data Processing 73
4.5.3 Predictor Variables . . . . . . . . . . . . . . . . . . . . . 76
4.5.4 Land Use Regression (LUR) . . . . . . . . . . . . . . . . 77
4.5.5 Land Use Random Forest and Hybrid Models . . . . . . . 78
4.6 Results and discussion . . . . . . . . . . . . . . . . . . . . . . . 79
4.6.1 PM2.5 Models . . . . . . . . . . . . . . . . . . . . . . . . 79
4.6.2 NO2 Models . . . . . . . . . . . . . . . . . . . . . . . . 84
4.6.3 CO Models . . . . . . . . . . . . . . . . . . . . . . . . . 87
4.6.4 Mapping the models . . . . . . . . . . . . . . . . . . . . 89
5 Using Spatiotemporal Prediction Models to Quantify PM2.5 Expo-
sure due to Daily Movement . . . . . . . . . . . . . . . . . . . . . . 92
5.3 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
5.4 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
5.5 Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
5.5.1 PM2.5 Measurements . . . . . . . . . . . . . . . . . . . . 96
5.5.2 Modeling . . . . . . . . . . . . . . . . . . . . . . . . . . 97
5.5.3 Predictions . . . . . . . . . . . . . . . . . . . . . . . . . 98
5.5.4 Land Use: Residential and Commercial Areas . . . . . . 99
5.5.5 Sampling . . . . . . . . . . . . . . . . . . . . . . . . . . 100
5.5.6 Static and Dynamic Models . . . . . . . . . . . . . . . . 102
5.6.1 Temporal Variations . . . . . . . . . . . . . . . . . . . . 105
5.6.2 Spatial Variations . . . . . . . . . . . . . . . . . . . . . 107
5.6.3 Static and Dynamic Models . . . . . . . . . . . . . . . . 109
5.7 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
6 Identification of Neighbourhood Hotspots via the Cumulative Haz-
ard Index: Results from a Community-Partnered Low-cost Sensor
Deployment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
6.3 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
6.4 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116
6.5 Methodology . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
6.5.1 Study Area . . . . . . . . . . . . . . . . . . . . . . . . . 119
6.5.2 Community Partner: Strathcona Residents Association . . 120
6.5.3 Low-cost Sensors . . . . . . . . . . . . . . . . . . . . . . 121
6.5.4 Site Selection . . . . . . . . . . . . . . . . . . . . . . . . 123
6.5.5 Data Collection and Processing . . . . . . . . . . . . . . 125
6.5.6 Spatial Modeling . . . . . . . . . . . . . . . . . . . . . . 126
6.5.7 Estimating Cumulative Air Pollution Impacts . . . . . . . 128
6.6.1 Data Summary . . . . . . . . . . . . . . . . . . . . . . . 129
6.6.2 Inter-neighbourhood Variability . . . . . . . . . . . . . . 132
6.6.3 Intra-neighbourhood Variability . . . . . . . . . . . . . . 136
6.7 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140
7 Concluding Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . 142
7.1 Main Contributions . . . . . . . . . . . . . . . . . . . . . . . . . 142
7.2 Revisiting Original Objectives . . . . . . . . . . . . . . . . . . . 143
7.2.1 Objective 1: Explore and develop a geographically-transferable
calibration method to improve sensor performance over
broad concentration ranges. . . . . . . . . . . . . . . . . 143
7.2.2 Objective 2: Develop and compare different spatiotempo-
ral pollution models using data collected via LCS networks. 144
7.2.3 Objective 3: Compare residents’ exposure due to mobil-
ity using spatiotemporal pollution models built from LCS
network data. . . . . . . . . . . . . . . . . . . . . . . . . 145
7.2.4 Objective 4: Assess intra- and inter-neighbourhood vari-
abilities in a community using data collected via LCS to
identify hotspots and recognize environmental injustice con-
cerns. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 146
7.3 Key Design Considerations for Deployment - Lessons from this
Thesis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147
7.3.1 Purpose: Why are you deploying the sensor? . . . . . . . 147
7.3.2 Pre-deployment . . . . . . . . . . . . . . . . . . . . . . . 148
7.3.3 During Deployment . . . . . . . . . . . . . . . . . . . . 154
7.3.4 Post-deployment . . . . . . . . . . . . . . . . . . . . . . 155
7.4 Future Research Directions . . . . . . . . . . . . . . . . . . . . . 159
Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 162
A Supplementary Information: Chapter 3 . . . . . . . . . . . . . . . . 198
A.1 Calculating ALT Concentrations . . . . . . . . . . . . . . . . . . 198
A.2 Rolling Ball Algorithm . . . . . . . . . . . . . . . . . . . . . . . 199
A.3 Rolling Ball Model Input . . . . . . . . . . . . . . . . . . . . . . 201
A.4 Regression Model Selection . . . . . . . . . . . . . . . . . . . . 201
A.5 Intra-city Model Performances . . . . . . . . . . . . . . . . . . . 202
A.6 Inter-City Model Performances . . . . . . . . . . . . . . . . . . . 207
A.7 Comparison with Other Studies . . . . . . . . . . . . . . . . . . . 208
B Supplementary Information: Chapter 4 . . . . . . . . . . . . . . . . 209
B.1 QA/QC for the monitoring data (Zimmerman et al., 2020 and Ma-
lings et al., 2019, 2020) . . . . . . . . . . . . . . . . . . . . . . . 209
B.1.1 Gas Sensor Calibrations – paraphrased from Malings et al.
(2019) . . . . . . . . . . . . . . . . . . . . . . . . . . . . 209
B.1.2 Correction of PM2.5 Data – paraphrased from Malings et
al. (2020) . . . . . . . . . . . . . . . . . . . . . . . . . . 212
B.1.3 Empirical Correction Method . . . . . . . . . . . . . . . 214
B.2 Distribution of daily pollutant concentrations . . . . . . . . . . . 215
B.3 Limit of Detection values . . . . . . . . . . . . . . . . . . . . . . 217
B.4 Description of wavelet decomposition approach (Zimmerman et
al., 2020) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 218
B.5 Description of model covariates . . . . . . . . . . . . . . . . . . 219
B.6 Model covariates of sites and LUR variable coefficients . . . . . . 221
B.7 Description of random forest models . . . . . . . . . . . . . . . . 224
B.8 Observed vs Predicted concentrations . . . . . . . . . . . . . . . 225
B.9 LUR Coefficients . . . . . . . . . . . . . . . . . . . . . . . . . . 226
B.10 Average performance metrics of the pollutants across all models . 229
B.11 Comparison to other published studies . . . . . . . . . . . . . . . 230
B.12 Spatial Mapping . . . . . . . . . . . . . . . . . . . . . . . . . . . 234
B.13 Comparison of model performance with EPA monitoring stations . 235
C Supplementary Information: Chapter 5 . . . . . . . . . . . . . . . . 236
C.1 Limit of Detection (LOD) . . . . . . . . . . . . . . . . . . . . . . 236
C.2 Selection of prediction models and variables . . . . . . . . . . . . 236
C.3 Land Use Types by Allegheny County GIS Group . . . . . . . . . 238
C.4 Spatial distribution at 100m buffers for residential and commercial
areas . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 239
C.5 Average day-wise concentrations in 2017 . . . . . . . . . . . . . 241
C.6 Uncertainties in measurement and models . . . . . . . . . . . . . 242
C.7 Static and Dynamic Models . . . . . . . . . . . . . . . . . . . . . 244
D Supplementary Information: Chapter 6 . . . . . . . . . . . . . . . . 245
D.1 Population . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 245
D.2 % Error . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 246
D.3 Typical Diurnal Pattern . . . . . . . . . . . . . . . . . . . . . . . 247
D.4 RAMPs vs MV comparison . . . . . . . . . . . . . . . . . . . . . 248
D.5 Descriptive Statistics . . . . . . . . . . . . . . . . . . . . . . . . 248
D.6 Aggregate Multiplicative and Additive CHI . . . . . . . . . . . . 249
D.7 Wind Speed and Direction . . . . . . . . . . . . . . . . . . . . . 250
List of Tables
Table 2.1 Air quality objectives from the CAAQS (Canada; for 2020 and
2025), MV (Metro Vancouver), US EPA (USA) and recommended
WHO (global) levels for the criteria pollutants. . . . . . . . . . 11
Table 2.2 RAMP monitor Specifications . . . . . . . . . . . . . . . . . . 16
Table 3.1 Information on PurpleAir data and regulatory bodies used to
create intra-city and inter-city calibration models. Wildfire con-
centrations were used for modeling in Chico, to represent high
levels of air pollution in South Asia. . . . . . . . . . . . . . . 41
Table 3.2 Means and medians for the ratio of estimated regional to total
concentrations for window widths 4-9 hours for Chico. The
target average SOA/TOA ratio was 0.65. . . . . . . . . . . . . 46
Table 3.3 Average PM2.5 model RMSE from the inter-city models. . . . . 53
Table 3.4 Regression Model Coefficients for different Cities. . . . . . . . 54
Table 3.5 Mean R2 , RMSE (µg m−3 ) and nRMSE values of the testing
dataset for the intra-city models, built for estimated regional
ATM concentrations. ‘n’ is the number of test models (months;
maximum = 12). One model was created for each city and
tested for each individual LCS in the city and each month us-
ing LOMO cross validation method. Chico models were trained
using either 2020 or 2021 wildfire data and tested on the other. 56
Table 3.6 Mean (standard deviation) of R2 , RMSE (µg m−3 ) and nRMSE
of inter-city calibration models for sensor observed and cali-
brated ATM and ALT concentrations. . . . . . . . . . . . . . . 60
Table 3.7 Performance comparison of observed ATM and ALT concentra-
tion, with traditional MLR and separate regional and local MLR
models, when tested in Vancouver (VPA1) and Yreka (CPA5). . 65
Table 4.1 Changes to averages of R2 and CvMAE by using ‘standard’ and
‘decomposed’ signal LURF and Hybrid models compared to the
base case (LUR built with the standard signal). More details on
average performance of models are provided in Appendix B.10. 82
Table 6.1 Mean and standard deviations (S.D.) for the performances of
calibration models on withheld data from collocation period. . 123
Table 6.2 PM2.5 (daily average), NO2 (daily 1-hour maximum) and O3
(daily 8-hour maximum) MV air quality objectives and reported
concentrations across four MV stations. The values reported
are average during the deployment period, and the numbers in
brackets are 10th and 90th percentile concentrations. . . . . . . 132
Table 7.1 Suggested Performance Goals for LCS by the US EPA [251]. . 157
Table A.1 Average values for the ratio of estimated baselines to total con-
centrations for all cities. The selected model input (wm) is in
bold font. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 201
Table A.2 Average model RMSE (µg m−3 ) across all testing PAs. Text in
bold is the lowest RMSE model for the city. . . . . . . . . . . 201
Table A.3 Model Performances for Chico. . . . . . . . . . . . . . . . . . 202
Table A.4 Model Performances for Kathmandu. . . . . . . . . . . . . . . 202
Table A.5 Model Performances for Bengaluru. . . . . . . . . . . . . . . . 203
Table A.6 Model Performances for Delhi. . . . . . . . . . . . . . . . . . 204
Table A.7 Model Performances for Lahore. . . . . . . . . . . . . . . . . 205
Table A.8 Model Performances for Lahore. . . . . . . . . . . . . . . . . 206
Table A.9 Performances of Inter-city Models. RMSE values are in µg m−3
and nRMSE values are in %. . . . . . . . . . . . . . . . . . . 207
Table A.10 Comparison with other studies. MAE and RMSE values are in
µg m−3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 208
Table B.1 Calibrated coefficients (θ0 and θ1 ) calculated using typical lin-
ear regression techniques for Met-One NPM and PurpleAir PPA
over 3 different periods – summer, winter and other. S.D. denotes
the standard deviation. (Table taken from Malings et al.
2020 [142]) . . . . . . . . . . . . . . . . . . . . . . . . . . . 214
Table B.2 Land use covariates used in LUR and LURF models . . . . . . 220
Table B.3 Predictor values used in model building . . . . . . . . . . . . . 221
Table B.4 LUR Coefficients for PM2.5 . U and S represent coefficients of
Unstandardized and Standardized data, respectively. . . . . . . 226
Table B.5 LUR Coefficients for NO2 . U and S represent coefficients of
Unstandardized and Standardized data, respectively. . . . . . . 227
Table B.6 LUR Coefficients for CO. U and S represent coefficients of Un-
standardized and Standardized data, respectively. . . . . . . . . 228
Table B.7 Average performance metrics of the pollutants across all models
for 20 iterations . . . . . . . . . . . . . . . . . . . . . . . . . 229
Table B.8 Comparison with other published studies. . . . . . . . . . . . . 230
Table B.9 MAE calculations for data predicted at EPA monitoring stations 235
Table C.1 Top 5 most important variables for modeling of random forest
decomposed signal. Values in brackets signify buffer distance. 238
Table D.1 Individual RAMPs with corresponding number of days
when the error-informed dataset exceeded average regional MV
concentrations. Total number of days when the data was as-
sessed is listed in the last column. . . . . . . . . . . . . . . . . 248
Table D.2 Descriptive Statistics for each pollutant (across all RAMPs) and
calculated CHI (across all DBs). . . . . . . . . . . . . . . . . 248
List of Figures
Figure 1.1 Guiding Research Questions. . . . . . . . . . . . . . . . . . . 4
Figure 2.1 Schematic diagram of the main chemical reactions involved in
NOx cycle, including the coupling chemistry with peroxy rad-
icals (RO2 and HO2 ), reproduced from Pan and Faloona [177],
licensed under a Creative Commons Attribution 4.0 Interna-
tional License. . . . . . . . . . . . . . . . . . . . . . . . . . 8
Figure 2.2 Working principle of the low cost PM2.5 sensor, reproduced
from Nguyen et al. [169], licensed under a Creative Commons
Attribution 4.0 International License. The air enters through
the inlet, and the particles in the air scatter the laser beam’s
light. The scattered light is then captured and transformed into
a digital signal. . . . . . . . . . . . . . . . . . . . . . . . . . 14
Figure 2.3 Schematic diagram of an electrochemical sensor, reproduced
from Dräger [68]. . . . . . . . . . . . . . . . . . . . . . . . 15
Figure 3.1 Locations of PurpleAirs (green star) and selected regulatory
stations (pink stars) in the selected 5 locations. The scale of
the map is different for each city. . . . . . . . . . . . . . . . . 43
Figure 3.2 Flowchart for validation, intra-city and inter-city models built
for this work. . . . . . . . . . . . . . . . . . . . . . . . . . . 50
Figure 3.3 Line and scatter plots for variations in estimated regional con-
centrations (black line) for raw ATM concentration and cali-
brated ATM concentration of BPA1 and BPA2 for intra-city
model. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
Figure 3.4 Performance of our intra-city (square markers) and inter-city
(circle markers; outline is the training and fill is the testing
city) models with reference to the US EPA guidelines (R2 >
0.7 and nRMSE < 30%). Cities are denoted by different col-
ors. Models in the top-right quadrant represent the models that
met the US EPA guidelines, whereas models in the bottom-left
quadrant didn’t meet either criteria by the US EPA. The model
trained in Kathmandu and tested in Bengaluru (orange circle
with green outline) didn’t meet the nRMSE criteria by the US
EPA, however it met the RMSE criteria (< 7 µg m−3 ). Trian-
gle markers are the average performance of models reported
by other studies. Solid gray and hollow gray markers are the
average performance across all intra-city and inter-city models
for Campmier et al. [42], respectively. The yellow marker with
gray outline is approximately placed to represent the average
performance across all inter-city models by Zusman et al. [273] 58
Figure 3.5 Line and scatter plots for variations in estimated regional con-
centrations (black line) for raw ATM concentration and cali-
brated ATM concentration of CPA1, CPA2 and CPA3 sensors
when trained in Bengaluru. . . . . . . . . . . . . . . . . . . . 62
Figure 4.1 The location of sampling sites and regulatory monitor sites.
Orange markers show the 50 locations. The purple markers
show the locations of EPA regulatory grade monitors for PM2.5
and CO. The map is divided into municipalities and map shad-
ing becomes darker with increasing population density. . . . . 74
Figure 4.2 Flowchart on model building. The models were created using
LOLOCV method, to generate unique LUR, LURF and Hybrid
externally-validated models. . . . . . . . . . . . . . . . . . . 80
Figure 4.3 Relative VIF for LUR and LURF models for standard and de-
composed signals (short-lived events, long-lived events, per-
sistent enhancements) of PM2.5 . The black dashed lines divide
the figure into spatial (cooler colors) and temporal (warmer
colors) variables. The model selected for the VIF analysis
was the best performing model for the standard signal and kept
same for the time-decomposed models for consistency. . . . . 81
Figure 4.4 Model performance evaluation of standard and decomposed
signals for CO, NO2 and PM2.5 . Model is evaluated using 3
metrics: (1) external cross-validated R2 (higher is better; max-
imum value 1), (2) MAE (mean absolute error; lower is better)
and (3) CvMAE (Coefficient of variation of MAE; lower is
better). All data above are on testing data (i.e., sites not en-
countered during model building). A time-series and scatter
plot for measurements and predictions at Allegheny County
Health Department (ACHD) can be found in Appendix B.8 as
an example. . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
Figure 4.5 Relative VIF for LUR and LURF models for standard and de-
composed signals (short-lived events, long-lived events, per-
sistent enhancements) of NO2 . The black dashed lines divide
the figure into spatial (cooler colors) and temporal (warmer
colors) variables. The model selected for the VIF analysis
was the best performing model for the standard signal and kept
same for the time-decomposed models for consistency. . . . . 85
Figure 4.6 VIF for performing models for LUR and LURF for standard
and decomposed signals (short-lived events, long-lived events,
persistent enhancements) of CO. Black dashed line divides the
figure into spatial (cooler colors) and temporal (warmer colors)
variables. Models selected for the VIF analysis were the best
performing model for standard signal and kept same for other
signals for consistency. . . . . . . . . . . . . . . . . . . . . . 88
Figure 4.7 Mean annual (figure A) and seasonal (figure B) maps for daily
predicted PM2.5 at every 50x50m grid in Allegheny County,
plotted using decomposed LURF model. Pittsburgh and Al-
legheny County boundaries are marked by blue and black lines
respectively. . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
Figure 5.1 Flowchart of steps involved in this work. The blue box at the
top represents the results from Jain et al. (2021) used for this
work. The grey boxes are the outcomes. EPA CO and EPA PM
in the blue box refer to daily measurements of CO and PM2.5
by the US EPA’s Lawrenceville site in the City of Pittsburgh. . 98
Figure 5.2 Flowchart for weighted stratified samples and resultant static
and dynamic models. ‘a’ and ‘b’ represent the hours spent in
the selected land-use type over weekdays or weekends. . . . . 101
Figure 5.3 Sub-daily variations at low-cost sensor sites in the City of Pitts-
burgh (25 out of 47 total sites in Allegheny County) with high
residential (PR ; blue boxes) and high commercial values (PC ;
red boxes) (top 5 sites for commercial and residential density
each). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
Figure 5.4 Boxplots for annual averages of predicted PM2.5 for residential
(plots with solid colors) and commercial (plots with diagonal
lines) land-use type separately. Weekday and weekend aver-
ages are also shown separately to represent difference in con-
centration over different days of the week. Summer and winter
concentrations are also displayed separately to represent the
difference in concentration over different seasons.
Figure 5.5 Spatial variations in annual averages of predicted PM2.5 during
weekdays (Monday-Friday) and weekends (Saturday-Sunday)
and during summer (May-October) and winter (November-April)
seasons.
Figure 5.6 Scalar graph of difference in exposure between static and dy-
namic models informed by amount of time spent in residential
area over weekdays and weekends separately, calculated using
Equations 5.4 and 5.5.
Figure 6.1 The Strathcona and Downtown Eastside neighbourhoods of
Vancouver that were studied in this work (black dashed line;
3km x 1km). Green lines are the rail lines within the study
area, and orange lines highlight the major roads (line sources
of air pollution). Red markers identify major point sources of
air pollution (port, industries). Blue star markers are the
deployment locations of the RAMPs.
Figure 6.2 Calendar plots for the ratio of sensors exceeding average MV
regional concentrations for lower bound, calibrated LCS and
upper bound datasets for PM2.5 (plots A-C), NO2 (plots D-F)
and O3 (plots G-I). Ratio=0 (blue) indicates that none of the
sensor readings exceeded MV concentrations, whereas ratio=1
(red) indicates that all the operational sensors exceeded MV
averages. The calendar plot at the bottom for each category
(plots J-L), shows the ratio of sensors exceeding average MV
regional concentrations for the three pollutants together (addi-
tive form; combined results from the three plots above). Ra-
tio=0 (blue) indicates that no pollutant across all the sensors
exceeded MV concentrations, whereas ratio=3 (red) indicates
that all the pollutants across all the sensors exceeded MV
concentrations.
Figure 6.3 Multiplicative CHI for the first week of deployment (April 27-
May 3, 2022). The arrow at the bottom of each subplot is the
prevailing wind direction for the day.
Figure A.1 Visual representation of the rolling ball technique.
Figure B.1 (Adapted from Malings et al., 2019 [141], duplicated with
permission): Performance evaluation of various calibration
techniques for gRAMP models. Out of the 5 techniques listed, the
3 best performing calibration techniques are displayed for each gas
in the figure.
Figure B.2 Distribution of daily average PM2.5 concentrations for 47 sites
(only 47 out of deployed 50 sensors collected PM2.5 data) for
the period August 2016 – December 2017. Mean values are
marked with an ‘X’ and median values are denoted with a
solid line. PM2.5 concentrations vary notably across 47 sites,
with 4 datapoints exceeding 80 µg m−3 (not shown here). Site
17 (blue plot, with high concentrations) is on a roof in down-
town Pittsburgh, 20m away from a restaurant exhaust vent. The
restaurant specialized in wood-fired pizzas, and was therefore
characterized by extremely high concentrations.
Figure B.3 Distribution of daily average NO2 concentrations for all sites
for the period August 2016 – December 2017. Mean values
are marked with an ‘X’ and median values are denoted with
a solid line. Site 49 (blue, second last boxplot) is near a rail-
way track, and hence is characterized with high concentrations.
NO2 concentrations vary notably across 50 sites, with 4
datapoints exceeding 35 ppb (not shown here).
Figure B.4 Distribution of daily average CO concentrations for all sites
for the period August 2016 – December 2017. Mean values
are marked with an ‘X’ and median values are denoted with a
solid line. Site 49 (blue, second last boxplot) is near a railway
track, and hence is characterized with high concentrations. CO
concentrations vary notably across 50 sites, with 4 datapoints
exceeding 2000 ppb (not shown here).
Figure B.5 Boxplots of error fractions for PM2.5 , NO2 and CO divided
into deciles (of observed daily average concentrations) used
to determine LoD. Black solid line represents the ideal error
fraction (closer to zero is better) and was used to determine the
decile where it stabilizes. The final LoD was determined as the
lower decile of the bin where error fraction stabilizes.
Figure B.6 (From Zimmerman et al., 2020, duplicated with permission):
Wavelet decomposition of measured pollutant concentration
(CO), from short-lived events to regional background signals.
Figure B.7 Figures above depict predicted LUR (orange and green) and
LURF (blue and pink) concentrations (for standard and deconvolved
signals respectively) for the duration between October
2016 and June 2017 for the site at Allegheny County Health
Department (ACHD). Figure (b) is a scatterplot between observed
and predicted concentrations, with the dotted black line representing
the 1:1 line.
Figure B.8 Figures above depict annual average PM2.5 using LURF standard
and decomposed signal prediction models. Pittsburgh
city and Allegheny County boundaries are marked by blue and
black lines respectively.
Figure C.1 Boxplot for percent of data at each site (n = 47 sites) that were
below Limit of Detection (LOD; 5 µg m−3) and replaced with
LOD/√2 (=3.53 µg m−3).
Figure C.2 Spatial distribution of (a) Residential and (b) Commercial ar-
eas at 100m buffers in Pittsburgh city, obtained via Allegheny
County GIS Group [12]. Grid cells with no value (colorless)
imply residential or commercial density is zero for that grid.
Figure C.3 Boxplot for mean day-wise concentrations in 2017 for data
collected at EPA’s Lawrenceville site in Pittsburgh city.
Figure C.4 Boxplots for daily predicted PM2.5 for residential (plots with
solid colors) and commercial (plots with diagonal lines) land-
use type separately. Blue and orange boxplots refer to annual
average predicted PM2.5 concentrations when random forest
models noted 5th and 95th percentile concentrations, instead of
mean concentrations (pink boxplots).
Figure C.5 Boxplots for static and dynamic models when α = 12 and β =
18 hours in Equations 4 and 5 of the main manuscript.
Figure D.1 Population across each Dissemination Block [214].
Figure D.2 Boxplots for % error of each pollutant across each decile bin.
Figure D.3 Diurnal pattern observed over the study period (for RAMP 1011).
Figure D.4 Aggregated Multiplicative CHI across the study period.
Figure D.5 Aggregated Additive CHI across the study period.
Figure D.6 Windrose for the direction and speed of wind during the study
period.
List of Abbreviations
ANN Artificial Neural Networks

AOD Aerosol Optical Depth
API Application Programming Interface
AQHI Air Quality Health Index
AQI Air Quality Index
AQ-SPEC Air Quality Sensor Performance Evaluation Center
BC British Columbia
CA Census Agglomerations
CAAQS Canadian Ambient Air Quality Standards
CACES Center for Air, Climate, and Energy Solutions
CHI Cumulative Hazard Index
CMA Census Metropolitan Areas
CPCB Central Pollution Control Board
CV Cross Validation
CvMAE Coefficient of Variation of Mean Absolute Error
DP Dew Point
DPCC Delhi Pollution Control Committee
DQO Data Quality Objectives
DTES Downtown Eastside
ERC Estimated Regional Concentration
GVRD Greater Vancouver Regional District
HEI Health Effects Institute
IDW Inverse Distance Weighting
IMD India Meteorological Department
KSPCB Karnataka State Pollution Control Board
LCS Low-Cost Sensors
LMIC Low- and Middle-Income Countries
LOD Limit of Detection
LOLO Leave One Location Out
LOMO Leave One Month Out
LU Land Use
LUR Land Use Regression
LURF Land Use Random Forests
MAE Mean Absolute Error
MBE Mean Bias Error
ML Machine Learning
MLR Multiple Linear Regression
MV Metro Vancouver
N2 Nitrogen
NAPS National Air Pollution Surveillance Program
NCR National Capital Region
NDIR Non-dispersive InfraRed
NN Neural Networks
NO Nitrogen monoxide
NO2 Nitrogen dioxide
NOx Nitrogen oxides, including NO2 and NO
nRMSE Normalized Root Mean Square Error
NYCCAS New York City Community Air Survey
O2 Oxygen
O3 Ozone
OPC Optical Particle Counter
PA PurpleAir
PM Particulate matter
PM2.5 Particulate matter equal to or less than 2.5 µm in aerodynamic diameter
RAMP Remote Air Quality Monitoring Platform
RF Random Forests
RH Relative Humidity
RMSE Root Mean Square Error
SO2 Sulphur dioxide
SOA Secondary Organic Aerosol
SRA Strathcona Residents Association
TOA Total Organic Aerosol
TRAP Traffic Related Air Pollutants
US United States
US EPA United States Environmental Protection Agency
VIF Variable Importance Factor
VOC Volatile Organic Compounds
WHO World Health Organization
Acknowledgments
I would like to express my heartfelt gratitude to Naomi for her unwavering belief
in me and her continuous support throughout this incredible journey. I am grateful
that you took a chance on me and provided me with a life-changing opportunity.
Your understanding, empathy, and relatability as a supervisor made this daunting
and often lonely journey more bearable. Thank you for being an incredible guiding
force.
I would also like to thank my committee members, Amanda, Steve, and Adam.
Thank you for being so supportive of my journey, always pushing me to be better
and asking insightful questions. Thank you all for believing in me.
I am deeply grateful to Dan Jackson for his immense support with the com-
munity project, as well as to the residents of Strathcona for their unwavering en-
couragement. I am also indebted to Emily Pearson and Johnny Le for generously
offering their assistance with the projects.
My gratitude extends to every member of iREACH I had the pleasure of col-
laborating and brainstorming with. Special thanks to Rivkah, my friend and work
partner in the department, who made this Ph.D. journey easier and more fun.
I cannot overstate my immense gratitude to Shalini, for always believing in me
and being my biggest cheerleader. Your continuous support and reminders that I
am capable have been invaluable. I would like to thank Ankur, Charu, Garima, and
Khayal for providing much-needed breaks throughout this journey. To Shivam,
thank you for patiently listening to my rants and for the food deliveries. I am forever
grateful to you. To my family, who never pressed me for updates and provided
much-needed relief from school: thank you all for your unwavering support
and incredible companionship.
My deepest gratitude to my wonderful Ph.D. companion, Maira. Our walks
together have been the highlight of my time at UBC, and I would have been lost
without your unwavering support. Thank you for always making time to listen to
my personal and professional experiences, and for being my biggest cheerleader.
And Mary, thank you for always finding time to watch movies or play board games
with me, for patiently listening to my school, family, and life-related rants, and
for being my partner during the pandemic. The two of you have been the most
wonderful outcome of my Ph.D. journey, and I would not have made it through
without your constant support and love.
Lastly, I want to give a resounding shout-out to my wonderful partner Erick,
who has supported me unconditionally throughout my thesis writing journey. You
have listened to my struggles, fed me every day so I could find moments of relax-
ation, and made this entire experience more manageable. I am forever grateful for
your unwavering love and support. Thank you.
Dedication
To Misha, my niece, whose future inspires me to work harder in my efforts to create
a better world.
To Maira and Mary, my unwavering constants and forever cheerleaders, who
inspire me to be a better person.
To Erick, my partner, who inspires me to be better.
Chapter 1
Introduction
1.1 Background and Motivation

Recently, the World Health Organization (WHO) issued more stringent guidelines
for air pollutant concentrations, including reducing the guideline level for PM2.5
(mass concentration of particulate matter with diameter <2.5 µm) by 50%,
highlighting the adverse health effects even at low levels of air pollution [254].
However, despite air pollution being a leading cause of premature mortality globally, the
reporting of air pollution often relies on a limited number of monitoring stations
that are sparsely distributed. This presents a significant challenge because air
pollutants exhibit small-scale spatial and temporal variations [71, 244]; consequently,
monitoring stations generally represent only a limited portion of the population
[149]. The sparse distribution of regulatory stations is primarily due to the sig-
nificant operating and capital cost involved in establishing and maintaining these
stations [45, 140], which poses a challenge to achieving widespread monitoring
efforts. The emergence of low-cost air quality sensor (LCS) technology presents
an opportunity to supplement the existing monitoring data by enabling broader de-
ployment and data collection.
This thesis aims to investigate the potential of LCS across the different stages
of its adoption. One of the significant challenges in adopting LCS technology on a larger
scale is its inherent lower accuracy and precision compared to regulatory stations,
due to its sensitivities to meteorological factors and other pollutants [58, 145, 187].
As such, robust and transferable calibration models are necessary to use LCS as
a reliable supplementary source of information [21, 141, 150]. However, despite
these limitations, calibrated LCS networks have potential for enhancing our un-
derstanding of air pollution. By filling the spatial gaps in air monitoring, LCS
technology can provide valuable insights into air quality patterns in time and space
[102, 166]. Additionally, when combined with different statistical and predictive
methods, they can contribute towards a better understanding of the impacts of air
pollution. This contribution encompasses various aspects, such as improved ex-
posure estimates and identification of pollution hotspots, among others. As such,
given the rapid growth in LCS technology, a meaningful adoption has the poten-
tial to greatly amplify its impact. In this thesis, the capabilities and limitations of
LCS technology are explored and examined, with the aim of contributing to the
advancement of LCS and air pollution research, and its applications in improving
air quality management by governments, academics and communities.
1.2 Research Objectives

The overall objective of this thesis is to leverage LCS for improved understanding
of air pollution and its impacts. This work is built on the central hypothesis that
existing air pollution management protocols often fail to capture the small-scale
variations in pollutant concentrations that the public is exposed to, and that the
adoption of LCS technology can help bridge this gap. By developing spatially-
transferable calibration models, LCS can be deployed in areas lacking in monitor-
ing infrastructure. LCS network data can be valuable for generating maps of pollu-
tion concentrations at high temporal resolution (e.g., sub-annual), which can pro-
vide insights into human exposure patterns resulting from movement. Moreover,
data from LCS networks can assist in identifying intra- and inter-neighbourhood
pollutant variabilities, facilitating the identification of pollution hotspots which
may be used to address environmental injustice concerns. With these aims
in mind, the following overarching research objectives were identified:
1. Explore and develop a geographically-transferable calibration method to im-
prove sensor performance over broad concentration ranges.
2. Develop and compare different spatiotemporal pollution models using data
collected from LCS networks.
3. Compare residents’ exposure due to mobility using spatiotemporal pollution
models built from LCS network data.
4. Assess intra- and inter-neighbourhood variabilities in a community using
data collected via LCS to identify hotspots and recognize environmental in-
justice concerns.
1.3 Thesis Structure and Ordering of Chapters
Although each chapter is based on LCS data collected during different campaigns,
the thesis is designed to have a coherent thematic flow, and the guiding research
questions are summarized in Figure 1.1. Chapter 2 is a review of published litera-
ture on air pollution, working principles of LCS, and opportunities and challenges
associated with the adoption of the technology, which motivates the real-world
studies in Chapters 3-6.
[Figure 1.1 content: LOW-COST SENSORS (LCS)
1. CALIBRATION: How can we improve sensor performance over broad geographic
ranges? (Chapter 3)
2. SPATIOTEMPORAL MODELS: Can we use LCS data, in combination with machine
learning, to improve the modelling of spatiotemporal variations in pollutant
concentrations? (Chapter 4)
3. EXPOSURE MODELS: Can we leverage LCS data to improve exposure estimates?
(Chapter 5) Can we use LCS to identify hotspots within a neighbourhood and
acknowledge inequity concerns? (Chapter 6)]
Figure 1.1: Guiding Research Questions.
Due to environmental and pollutant cross-sensitivities, LCSs need to be cal-
ibrated across a broad range of meteorological conditions and pollutant concen-
trations. Chapter 3 focuses on calibration methods for LCS to enable application
in a broader geographical range. Using calibrated data from an LCS network, a
spatiotemporal model using land use predictor variables was built via simple mul-
tiple linear regression and tested against a more complex machine learning random
forest regression model in Chapter 4. The best performing model from Chapter 4
was used to create a movement-based dynamic exposure estimate in Chapter 5 and
assess whether ignoring mobility in exposure analysis results in systematic under-
estimations in exposure. In Chapter 6, a reduced complexity spatial interpolation
map was built to identify hotspots within a neighbourhood of concern for envi-
ronmental injustice. Chapter 7 summarizes the main contributions of this work,
revisits the research objectives, provides key design considerations for future de-
ployments and suggests future work.
Chapter 2
Literature Review
As part of this literature review, sources of air pollution and air pollution
regulations are explored. Furthermore, the limitations of the existing air pollution
monitoring infrastructure, along with opportunities and challenges arising from
the adoption of LCS-based monitoring, are discussed. The application of LCS to
build air pollutant maps, and their use to assist in assessing exposure and identi-
fying hotspots is explored. Finally, the existing inequities in risks of air pollution
exposure are discussed.
2.1 Sources of Ambient Air Pollution

Air pollutants are classified into two types: (1) primary pollutants, which are
emitted directly into the atmosphere and can be either anthropogenic
(human-derived; e.g., traffic, power plants) or natural (e.g., sea spray, wildfires) [254],
and (2) secondary pollutants, which are formed through atmospheric reactions (e.g.,
ozone) [123]. Sources of primary pollutants can be further divided into
stationary (e.g., industrial facility) or mobile (e.g., traffic) sources. Major air pollutants
include oxides of nitrogen (NOx), PM2.5, volatile organic compounds (VOCs),
carbon monoxide (CO) and ozone (O3). While other air pollutants exist, most
regulations are focused on these pollutants.
Oxides of nitrogen (NOx, including NO and NO2) are emitted mostly from
the combustion of fossil fuels (anthropogenic sources). NOx is typically
formed by reactions between O2 and N2 at the high temperatures that occur
during combustion [5]. In ambient conditions, NO2 is formed by rapid oxidation
of NO in the air by available oxidants (oxygen, O2; ozone, O3; volatile
organic compounds, VOCs) [5] (e.g., Equations 2.3 and 2.6). In sunlight, NO2
photolyzes to NO and an oxygen atom (Equation 2.1), and this oxygen atom
combines with an O2 molecule to form O3 (Equation 2.2). O3 then reacts with
NO to form NO2 and O2 (Equation 2.3). Under high O3 conditions, O3 can also
react with NO2 to form NO3 and O2 (Equation 2.4); NO3 photodissociates into
NO or NO2 (Equations 2.5 and 2.7) and re-feeds the cycle [73]. Oxygen
atoms, formed due to the dissociation of NO2, also react with water vapor in the
air to form hydroxyl (OH), which further feeds the NOx cycle (Figure 2.1) [73].
NO2 + hν → NO + O (λ < 424 nm) (2.1)
O + O2 + M → O3 + M (M = N2, O2) (2.2)
O3 + NO → NO2 + O2 (fast) (2.3)
NO2 + O3 → NO3 + O2 (2.4)
NO3 + hν → NO + O2 (2.5)
2NO + O2 → 2NO2 (2.6)
NO3 + hν → NO2 + O (2.7)
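Taken on their own, Equations 2.1-2.3 imply a simple steady-state balance for O3, sketched below. The symbols j_NO2 (photolysis frequency of NO2 in Equation 2.1) and k_3 (rate coefficient of Equation 2.3) are notation introduced here for illustration only. The sketch assumes that the oxygen atom produced in Equation 2.1 is consumed immediately by Equation 2.2, and it ignores Equations 2.4-2.7 and the peroxy radical chemistry shown in Figure 2.1.

```latex
\begin{align*}
\frac{d[\mathrm{O_3}]}{dt}
  &= j_{\mathrm{NO_2}}[\mathrm{NO_2}] - k_3[\mathrm{O_3}][\mathrm{NO}] = 0 \\
\Rightarrow\quad [\mathrm{O_3}]
  &= \frac{j_{\mathrm{NO_2}}[\mathrm{NO_2}]}{k_3[\mathrm{NO}]}
\end{align*}
```

Under these assumptions, a higher NO concentration lowers the steady-state O3 concentration for a given NO2 concentration and sunlight intensity.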
Figure 2.1: Schematic diagram of the main chemical reactions involved in
NOx cycle, including the coupling chemistry with peroxy radicals (RO2
and HO2 ), reproduced from Pan and Faloona [177], licensed under a
Creative Commons Attribution 4.0 International License.
Since the NOx cycle (Equations 2.1-2.3) is very fast (tens of seconds to a few
minutes) [124], NO2 is generally also considered as a primary pollutant [5]. Traf-
fic is the primary source of ambient NO2 globally [5], while other major sources
associated with NOx emissions include airports and power generation facilities. In
Canada, mobile emissions (primarily on-road and off-road diesel vehicles) con-
tribute to 50% of total NOx emissions; other major sources are industries (30%)
and electricity generation (12%) [92]. In the United States, highway vehicles,
off-highway vehicles, and stationary fuel combustion
sources are the top three sources of NOx in emission inventories [236].
Although O3 is formed from the products of combustion activities, it is typically suppressed near
fresh traffic emissions as it reacts with NO (a primary pollutant emitted during
combustion activities) to form NO2 (Equation 2.3) [97].
PM2.5 can originate from both anthropogenic and natural sources, and the con-
tribution of each can vary by location and weather conditions [176]. PM2.5 particles
resulting from anthropogenic activities are either emitted directly via incomplete
combustion (primary) or formed via transformation of gaseous emissions (sec-
ondary) [230]. As such, stationary and mobile sources emit primary PM2.5 and
precursors to secondary PM2.5 (e.g., SO2 ) directly into the ambient air [235].
Ambient CO is formed by both natural and anthropogenic processes, with the
latter contributing to over two-thirds of total CO emissions [229]. It is formed
primarily due to incomplete combustion of fuels, and therefore combustion con-
ditions, such as air-to-fuel ratio and fuel type, influence the rate of formation of
CO. Highly efficient combustion activities, such as at large power plants, typically
result in low CO. However, mobile sources have widely varying operating con-
ditions, vehicle maintenance conditions and emission control technologies, that
result in higher and more variable CO formation, and therefore are a significant
source of CO [229].
2.2 Air Quality Monitoring
2.2.1 Air Pollution Regulation
In 2019, air pollution was associated with an estimated 4.2 million premature
deaths globally [255]. Since air pollution has health consequences [30, 184, 223],
governing bodies develop and implement regulations to help reduce overall levels
of air pollution by limiting the amount of pollutants that are released into the air.
In Canada, the Canadian Ambient Air Quality Standards (CAAQS) establish
the minimum standards to which all provinces must adhere. While provinces have
the flexibility to establish their own regulatory standards, they must still, at a
minimum, align with the objectives in the CAAQS. Uniquely, in the province of British
Columbia, air quality management within the Greater Vancouver Regional District
(GVRD) is handled by Metro Vancouver (MV), the regional air monitoring agency [167],
under authority delegated from the Provincial government under the Environmental
Management Act. In the United States (US), the Environmental Protection Agency
(US EPA) regulates PM2.5 , NO2 , O3 and CO as criteria pollutants [230]. The
CAAQS (2020 and 2025) [83], the MV [154] and US EPA [229, 233, 236, 237]
air quality objectives and the World Health Organization (WHO) [254] guideline
limits for these pollutants are listed in Table 2.1.
2.2.2 Existing Infrastructure and its Limitations
Compliance with air quality standards is typically assessed via regulatory moni-
toring stations that are equipped with reference grade instruments. The siting and
quality control of these stations are subject to regulations, and in Canada, guide-
lines are provided by the National Air Pollution Surveillance Program (NAPS) to
ensure the attainment of air quality objectives outlined in the CAAQS [72]. For
siting, it is required that each province or territorial air zone, as well as all cen-
sus metropolitan areas (CMA) and census agglomerations (CA) with a population
of at least 100,000, have at least one monitoring site that continuously measures
PM2.5 , NO2 , and O3 levels [72]. The quality control measures for instruments vary
Table 2.1: Air quality objectives from the CAAQS (Canada; for 2020 and
2025), MV (Metro Vancouver), US EPA (USA) and recommended WHO
(global) levels for the criteria pollutants.

Pollutant        Averaging Time   CAAQS 2020   CAAQS 2025   MV     US EPA   WHO
PM2.5 (µg m−3)   24-Hour          27α          27α          25γ    35       15
                 Annual           8.8β         8.8β         8δ     12       5
NO2 (ppb)        1-Hour           60ε          42ε          60ε    100      100
                 Annual           17δ          12δ          17δ    53       5
O3 (ppb)         8-Hour           62η          60η          62η    70       100
                 Peak Season*     -            -            -      -        60
CO (ppm)         1-Hour           -            -            13     35       30
                 8-Hour           -            -            5γ     9        9

*Average of daily maximum 8-hour mean O3 concentration in the six consecutive
months with the highest six-month running-average O3 concentration.
α: 3-year average of annual 98th percentile of daily 24-hour average
concentrations.
β: 3-year average of annual average of all 1-hour concentrations.
γ: Rolling average.
δ: Annual average of 1-hour concentrations, over one year.
ε: Annual 98th percentile of the daily maximum 1-hour concentration,
averaged over three consecutive years.
η: Annual 4th highest daily maximum 8-hour average concentration, averaged
over three consecutive years.
depending on the pollutant being monitored. Data Quality Objectives (DQO) in-
clude continuous 1-hour averaging samples for gas instruments, and continuous
or semi-continuous 24-hour averaging samples for PM instruments [72]. For both
gases and PM, 75% data completeness and accuracy within 15% is required. To
maintain the reliability, gas and PM instruments are required to undergo weekly
and quarterly quality checks respectively [72].
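To illustrate how a statistic such as footnote α of Table 2.1 (3-year average of the annual 98th percentile of daily 24-hour average concentrations) could be evaluated from hourly data, a minimal Python sketch follows. It is not the procedure of any regulatory agency. All function names are hypothetical. Applying the 75% completeness criterion to each day's hourly values and using a nearest-rank percentile are assumptions made for this example.

```python
import math
from statistics import mean


def daily_average(hourly, min_fraction=0.75):
    """Return the mean of one day's hourly values, or None if too few are valid.

    `hourly` holds 24 readings; a missing hour is given as None.
    Applying the 75% completeness rule per day is an assumption of this sketch.
    """
    valid = [v for v in hourly if v is not None]
    if len(valid) < min_fraction * 24:
        return None
    return mean(valid)


def nearest_rank_percentile(values, pct):
    """Nearest-rank percentile (an assumed definition, chosen for simplicity)."""
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def pm25_24h_metric(daily_by_year):
    """3-year average of the annual 98th percentile of daily averages.

    `daily_by_year` maps a year to its list of valid daily averages (µg m-3).
    """
    if len(daily_by_year) != 3:
        raise ValueError("expected exactly three years of data")
    annual_p98 = [nearest_rank_percentile(days, 98)
                  for days in daily_by_year.values()]
    return mean(annual_p98)
```

The value returned by `pm25_24h_metric` would then be compared against the relevant 24-hour objective in Table 2.1.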
While they provide detailed and accurate air quality measurements, regula-
tory monitoring stations are often insufficiently distributed across a region. For
example, Allegheny County (Pennsylvania, USA; 2020 Population 1.25M [226])
has only eight regulatory monitoring stations for PM2.5 to measure concentra-
tions throughout the county, while the Metro Vancouver Regional District (British
Columbia, Canada; 2021 Population 2.8M [155]) has only 15 PM2.5 monitoring
stations. As such, the measurements reported represent concentrations for a lim-
ited area surrounding the station, or regional background concentrations [131]. The
primary reason for the lack of more spatially-resolved pollution concentration pro-
files is that although instruments used at regulatory monitoring stations have high
accuracy, they are also characterized by high capital and maintenance costs. These
instruments can cost anywhere from $5,000 to $40,000 per pollutant [45, 140, 206].
Additionally, they are bulky and require specialized personnel for routine calibra-
tion and maintenance [45, 206]. Consequently, many places have limited or no
monitoring in place [157].
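The monitoring density implied by the two examples above can be made concrete with simple arithmetic. The sketch below uses only the rounded population and station counts quoted in the text; the function name is hypothetical.

```python
def residents_per_station(population, n_stations):
    """Average number of residents represented by one monitoring station."""
    return population / n_stations


# Figures quoted in the text (rounded populations).
allegheny = residents_per_station(1_250_000, 8)    # Allegheny County, 8 PM2.5 stations
metro_van = residents_per_station(2_800_000, 15)   # Metro Vancouver, 15 PM2.5 stations

print(f"Allegheny County: ~{allegheny:,.0f} residents per station")
print(f"Metro Vancouver:  ~{metro_van:,.0f} residents per station")
```

On these figures, each PM2.5 station represents well over 100,000 residents on average in both regions.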
2.2.3 Overview: Low-Cost Sensors
Low-cost sensing technologies have emerged as a complementary solution to regulatory
stations due to advancements in sensor technologies and in statistical methods
for sensor calibration (Section 2.2.5), and can address some of the shortcomings of
regulatory stations [46, 58, 182, 210, 271]. Apart from being cheaper (< CAD300
for a single pollutant sensor [21]), there are several advantages of using LCS.
They can be easily deployed in areas with limited monitoring due to their small
size and low power demand, and are therefore well suited for use in denser networks
[21, 45, 271, 273]. Additionally, they have high time resolution (hourly or better)
[18, 42, 141], which enables the quantification of both spatial and temporal varia-
tions in pollutant concentrations. Some LCS are also multi-pollutant sensors, that
can improve our understanding of the relative mix of pollutants [271]. Finally, data
collection is generally more accessible due to the ability of many LCS systems to
transfer information via cellular networks [141, 271]. As such, there are opportu-
nities with LCS to increase our understanding of air pollution. Sections 2.2.4 and
2.2.5 describe common LCS systems in more detail, including operating principles,
calibration approaches and quality assurance / quality control guidelines.
2.2.4 Operating Principles of Low-Cost Sensors
Low-cost PM2.5 sensors typically use nephelometers or optical particle counters
(OPCs) that measure concentrations by scattering light [192]. They measure particles
indirectly, using laser counting to estimate particulate matter
mass concentrations in real time. Each laser counter uses a fan to draw a sample
of air past a laser beam. Particles in the sample scatter light from the beam
onto a detection plate, much like dust shimmering in sunlight. The scattered light
is measured as a pulse by the detection plate; the length of the pulse determines
the size of the particle, while the number of pulses determines the particle count.
Figure 2.2 illustrates the working principle of a low cost PM2.5 sensor, reproduced
from Nguyen et al. [169].
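The step from particle size and count to a mass concentration can be sketched as follows. This is an illustrative calculation and not the proprietary algorithm of any sensor manufacturer. It assumes spherical particles of uniform density; the default density of 1.65 g cm-3, the function name and the representative bin diameters are all assumptions made for this example.

```python
import math


def mass_concentration(counts_per_cm3, bin_diameters_um, density_g_cm3=1.65):
    """Estimate mass concentration (µg m-3) from number concentrations per size bin.

    counts_per_cm3   : particles per cm3 of air in each size bin
    bin_diameters_um : representative particle diameter of each bin (µm)
    density_g_cm3    : assumed particle density (hypothetical default)
    """
    total_g_per_cm3 = 0.0
    for n, d_um in zip(counts_per_cm3, bin_diameters_um):
        d_cm = d_um * 1e-4                        # µm -> cm
        volume_cm3 = math.pi / 6 * d_cm ** 3      # volume of a sphere
        total_g_per_cm3 += n * volume_cm3 * density_g_cm3
    # g per cm3 of air -> µg per m3 of air (1e6 µg/g x 1e6 cm3/m3)
    return total_g_per_cm3 * 1e12
```

Because the result scales with the cube of the assumed diameter and linearly with the assumed density, errors in either assumption carry directly into the reported mass concentration.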
LCS for measuring NOx, O3, and CO are typically electrochemical sensors that
operate on the principle of the reduction-oxidation (redox) reaction. Electrochemical
low-cost sensors consist of a working (measuring) electrode, a counter
electrode, and a reference electrode that are separated by wetting filters (hydrophilic
separators) and are in contact with a liquid electrolyte (typically sulphuric acid) during
sensing operations. The reference electrode is kept at a constant potential to
account for changes in temperature and relative humidity (RH), and therefore is not
Figure 2.2: Working principle of the low cost PM2.5 sensor, reproduced from
Nguyen et al. [169], licensed under a Creative Commons Attribution 4.0
International License. The air enters through the inlet, and the particles
in the air scatter the laser beam’s light. The scattered light is then cap-
tured and transformed into a digital signal.
exposed to direct air. When the target gas reaches the working electrode, an
electrochemical reaction occurs, driving a flow of electrons between
the working and counter electrodes; the resulting electrical current is proportional to the
concentration of the target gas [150, 215, 271]. Figure 2.3 illustrates
the schematic diagram of an electrochemical sensor, reproduced from Dräger [68].
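Because the sensor output is proportional to the target gas concentration, the simplest possible conversion is linear. The sketch below shows that idealized relationship only. The function name, the zero offset and the sensitivity value are hypothetical, and the sketch deliberately ignores the temperature, RH and cross-pollutant sensitivities that make calibration models necessary in practice.

```python
def gas_concentration_ppb(signal_mv, zero_mv, sensitivity_mv_per_ppb):
    """Idealized linear conversion of a sensor signal to a concentration.

    signal_mv              : measured working-electrode signal (mV)
    zero_mv                : signal in the absence of the target gas (mV)
    sensitivity_mv_per_ppb : signal change per ppb of target gas (mV/ppb)
    All values are illustrative; real sensors need a fitted calibration.
    """
    if sensitivity_mv_per_ppb <= 0:
        raise ValueError("sensitivity must be positive")
    return (signal_mv - zero_mv) / sensitivity_mv_per_ppb


# Hypothetical example: 350 mV reading, 300 mV zero offset, 0.25 mV per ppb.
example_ppb = gas_concentration_ppb(350.0, 300.0, 0.25)
```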
14
Figure 2.3: Schematic diagram of an electrochemical sensor, reproduced
from Dräger [68].
Commercial LCS monitors often package individual sensors in a weatherproof
box, and typically feature local data storage (via SD cards) and/or remote transmis-
sion via cellular networks, enabling data access when deployed [45, 271]. Multi-
pollutant monitors combine several types of LCS to measure multiple pollutants
(generally PM2.5 and gaseous pollutants) in real-time, providing a useful tool for
understanding the relative mix of pollutants. These monitors are often equipped
with temperature and RH sensors, and are battery-operated, solar-powered, or both
[45, 141, 271]. However, these monitors can cost between CAD700 and CAD14,000,
which is much higher than the cost of individual pollutant sensors
(CAD25-125) [45, 58, 119, 199]. Some common examples of multi-pollutant
monitors are the AQMesh monitor (Environmental Instruments Ltd, UK) [45], the
AriSense monitor (Aerodyne Research, Inc.) [58], and the Remote Air Quality
Monitoring Platform monitor (RAMP, SENSIT Technologies) [141, 271]. RAMPs
are typically equipped with four electrochemical sensors for NO, NO2 , O3 and
CO, a commercial nephelometer to measure PM2.5 , and one NDIR CO2 sensor
[141, 271]. RAMPs also have a high time resolution (15 seconds). In this thesis,
RAMP monitors were used in Chapters 4, 5 and 6, and therefore specifications for
the RAMP monitor are provided in Table 2.2.
Table 2.2: RAMP monitor specifications
Pollutant               Sensor                         Manufacturer                    Detection Range   Accuracy
NO (ppb)                Electrochemical                NO-B4 (Alphasense)              20 ppb – 25 ppm   ±20 ppb
NO2 (ppb)               Electrochemical                NO2-B43F (Alphasense)           20 ppb – 25 ppm   ±20 ppb
O3 (ppb)                Electrochemical                Ox-B431 (Alphasense)            20 ppb – 25 ppm   ±40 ppb
CO (ppb)                Electrochemical                CO-B4 (Alphasense)              100 ppb – 25 ppm  ±100 ppb
PM2.5 (µg m−3)          Nephelometer                   PurpleAir-II (Plantower)        1 - 1000 µg m−3   ±10 µg m−3
CO2 (ppm)               Nondispersive infrared (NDIR)  SST CO2S-A (SST Technologies)   100 - 2000 ppm    ±200 ppm
Temperature (°C)        Bandgap                        SST CO2S-A (SST Technologies)
Relative Humidity (%)   Capacitive                     SST CO2S-A (SST Technologies)
2.2.5 Challenges with Low-cost Sensors and Calibration Techniques
LCS are associated with several drawbacks that need to be taken into account.
Firstly, they can be sensitive to environmental conditions [10, 22, 58, 142, 146,
150, 178, 187, 250]. Masson et al. [146] found that high relative humidity (> 75%)
can lead to significant errors in NO sensors, while Cross et al. [58] reported
that temperature can interfere with NO, NO2, and O3 sensors and mask real
variations in pollutant concentrations. Similarly, deSouza et al. [64] found that RH
affected PM2.5 sensor performance and observed that the greatest degradation in
sensor performance took place in hot and humid environments, while Bai et al.
[18] reported larger mean normalized error in PM2.5 readings at high relative
humidity (> 75%). Secondly, LCS can have cross-sensitivities with other pollutants
[22, 58, 129, 146, 150, 164, 212]. For instance, Mead et al. [150] reported an
improvement in NO2 sensor performance once corrected for O3 interference, while
Zimmerman et al. [271] found that the NO2 sensor reading was the most important
variable in a calibration model of O3 , after O3 itself. This highlights the advantage
of using multi-pollutant monitors that measure many gases simultaneously, as this
may help correct for these cross-sensitivities. Finally, sensor readings can drift over
time as the sensing unit wears out [112, 142, 145, 150, 209]. According to deSouza
et al. [64], for PM2.5 sensors, the difference between corrected measurements and
the corresponding reference measurements increased after operating for 3.5 years.
Afshar-Mohajer et al. [7] reported approximately 2% drift per month for NOx and
O3 sensors, and Wei et al. [247] found an average drift of 2 ppb per month for NOx
and O3 sensors and 0.02 ppm per month for CO sensors. Additionally, the electrochemical
sensor manufacturer Alphasense suggests replacing their sensors after two years of
operation, as at that point, sensors can have a 50% difference in measured readings
compared to the first day of operation [7].
To overcome the limitations of environmental artifacts and cross-sensitivity to
other pollutants associated with LCS and to improve the accuracy and precision,
sensors need to be calibrated across the full range of meteorological conditions and
pollutant concentrations that a sensor may experience during deployment [58, 146,
150, 160, 178]. Calibrating LCS monitors typically involves collocating them with
regulatory stations and using statistical calibration models [145, 150].
One simple form of calibration is the multiple linear regression (MLR) model
(Equation 2.8), which describes the output (Y) as a linear combination of input
variables (X1, X2, ..., Xn), weighted by coefficients (α1, α2, ..., αn), plus a
constant term (β). MLR models have been widely used since they are easy to
construct and apply and do not require substantial computational resources.
Additionally, they have a closed form with interpretable model coefficients,
which help to understand the influence of any given environmental or pollutant
variable. Calibration models for PM2.5 have largely been MLR models, as they
have been shown to achieve acceptable model performance [42, 190]. For gas-phase LCS
monitors, which have more complex cross-sensitivities, machine-learning (ML)
based regression models have typically been used [141, 212]. Commonly applied
ML models are random forests (RF), which combine multiple decision trees to improve
predictive accuracy [33] (a detailed description can be found in Section B.7 in
Appendix B), and artificial neural networks (ANNs), which leverage interconnected
layers of nodes to model complex patterns and relationships within data [76].
$$Y = \alpha_1 X_1 + \alpha_2 X_2 + \dots + \alpha_n X_n + \beta \tag{2.8}$$
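As an illustration of Equation 2.8, the sketch below fits an MLR calibration by
ordinary least squares using only the Python standard library. The function names
(fit_mlr, predict_mlr), the choice of raw sensor PM2.5 and RH as the two predictors,
and the toy collocation values are assumptions made for this example; they are not
taken from any of the cited studies.

```python
def fit_mlr(X, y):
    """Ordinary least squares fit of y = a1*x1 + ... + an*xn + b.

    X is a list of rows (one list of predictors per observation) and
    y is the list of reference values. Returns ([a1, ..., an], b).
    """
    rows = [list(r) + [1.0] for r in X]  # extra column of ones for the constant term
    p = len(rows[0])
    # Normal equations: (A^T A) w = A^T y
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    aty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    # Gaussian elimination with partial pivoting on the augmented matrix
    m = [ata[i] + [aty[i]] for i in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, p):
            f = m[r][col] / m[col][col]
            for c in range(col, p + 1):
                m[r][c] -= f * m[col][c]
    # Back substitution
    w = [0.0] * p
    for i in reversed(range(p)):
        tail = sum(m[i][j] * w[j] for j in range(i + 1, p))
        w[i] = (m[i][p] - tail) / m[i][i]
    return w[:-1], w[-1]


def predict_mlr(coefs, intercept, X):
    """Apply Equation 2.8 to each row of predictors."""
    return [sum(a * x for a, x in zip(coefs, row)) + intercept for row in X]


# Hypothetical collocation data: [raw sensor PM2.5, RH in %]
X = [[10.0, 40.0], [20.0, 60.0], [30.0, 50.0], [40.0, 80.0], [50.0, 70.0]]
# Synthetic reference values built from known coefficients so the fit can be checked
y = [0.5 * pm - 0.1 * rh + 6.0 for pm, rh in X]

coefs, intercept = fit_mlr(X, y)
calibrated = predict_mlr(coefs, intercept, X)
```

Because the synthetic reference values are an exact linear function of the
predictors, the fitted coefficients should recover the values used to generate
them; with real collocation data the residuals would be non-zero and would be
summarized with the performance metrics described below.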
Different studies have used different metrics to assess the performance of cali-
bration models after applying the models on testing datasets (i.e., data that is with-
held while building the calibration models). For LCS, the US EPA suggests at least
75% data completeness and precision and bias error of at most 30% [251]. The
typical parameters used for performance analysis of calibration models are (1) the
coefficient of determination (R2 ), (2) root mean square error (RMSE), (3) mean
bias error (MBE) and (4) mean absolute error (MAE).
The coefficient of determination (R2) is a statistical measure for regression models
that quantifies the proportion of variance in the dependent variable that can be
explained by the independent variables. A higher R2 value indicates better agreement
between the predicted (calibrated) and observed values, with a maximum value of 1.
Here, observed values refer to the reported concentrations, whereas predicted values
refer to the concentrations calculated after applying calibration or spatial models.
Root Mean Square Error (RMSE) is the square root of the mean squared difference
between the predicted and observed values (Equation 2.9). For locations with high
average concentrations, normalized RMSE (nRMSE) (Equation 2.10) is instead used to
estimate error in the model, as high concentrations may skew the RMSE [270]; nRMSE
is calculated by normalizing the RMSE by the average observed value. For both RMSE
and nRMSE, a lower value is better.
$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (\text{Predicted value}_i - \text{Observed value}_i)^2} \tag{2.9}$$
$$\mathrm{nRMSE} = \frac{\mathrm{RMSE}}{\text{Average observed value}} \tag{2.10}$$
Mean Bias Error (MBE) is the mean of the difference between the predicted
values and the observed values, with values closer to zero indicating lower bias
(Equation 2.11).
$$\mathrm{MBE} = \frac{1}{n} \sum_{i=1}^{n} (\text{Predicted value}_i - \text{Observed value}_i) \tag{2.11}$$
Mean Absolute Error (MAE) is the average of the absolute value of the dif-
ference between the predicted value and the observed value (Equation 2.12). For
high average concentrations, the coefficient of variation of mean absolute error
(CvMAE) (Equation 2.13) can also be used by normalizing the MAE by the aver-
age observed concentration [141]. For both MAE and CvMAE, a lower value is
preferred.
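$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| \text{Predicted value}_i - \text{Observed value}_i \right| \tag{2.12}$$
$$\mathrm{CvMAE} = \frac{\mathrm{MAE}}{\text{Average observed value}} \tag{2.13}$$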
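The error metrics in Equations 2.9 to 2.13 can be sketched in a few lines of
Python. The function names and the four-point toy dataset below are assumptions
made for illustration; the formulas follow the definitions given above.

```python
import math


def rmse(pred, obs):
    """Root mean square error (Equation 2.9)."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))


def nrmse(pred, obs):
    """RMSE normalized by the average observed value (Equation 2.10)."""
    return rmse(pred, obs) / (sum(obs) / len(obs))


def mbe(pred, obs):
    """Mean bias error (Equation 2.11); positive means over-prediction."""
    return sum(p - o for p, o in zip(pred, obs)) / len(obs)


def mae(pred, obs):
    """Mean absolute error (Equation 2.12)."""
    return sum(abs(p - o) for p, o in zip(pred, obs)) / len(obs)


def cvmae(pred, obs):
    """MAE normalized by the average observed value (Equation 2.13)."""
    return mae(pred, obs) / (sum(obs) / len(obs))


# Hypothetical observed (reference) and predicted (calibrated) concentrations
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 41.0]

results = {
    "RMSE": rmse(predicted, observed),
    "nRMSE": nrmse(predicted, observed),
    "MBE": mbe(predicted, observed),
    "MAE": mae(predicted, observed),
    "CvMAE": cvmae(predicted, observed),
}
```

In this toy case the errors are 2, -2, 3 and 1, so the MBE (1.0) is smaller than
the MAE (2.0): positive and negative errors partly cancel in the MBE but not in
the MAE or RMSE.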
After applying either MLR or ML based regression models, PM2.5 LCS per-
formance has generally been shown to improve. For instance, Barkjohn et al. [21]
created a calibration model for data collected in the USA using MLR, with only
RH as the additive term, and reported RMSE reduction from 8 µg m−3 to 3 µg m−3 .
Similarly, Zusman et al. [273] created calibration models for multiple cities in the
USA and found MLR to perform well, with R2 between 0.74-0.95 and RMSE of
2.46 µg m−3 . Using machine learning approaches, Bai et al. [18] and Si et al. [206]
created ANN models and found ANNs to have improved performance compared to
MLR models. Bai et al. [18] reported an increase in R2 from 0.75 to 0.84, whereas
Si et al. [206] reported RMSE reductions of 1 to 3.9 µg m−3. Malyan
et al. [143] compared regression and RF models for PM2.5 and found RF to have
superior performance, with an increase in R2 from 0.49 to 0.75 and a reduction in
RMSE from 25.3 µg m−3 to 20.7 µg m−3. However, McFarlane et al. [148] and Jha
et al. [111] created both MLR and ML models (random forests and deep neural
networks, respectively) for PM2.5 and reported that the MLR models performed
comparably to the machine learning models.
Linear regression models have been found to perform well for CO [141, 271].
However, these models may not work well for other gases [212, 271]. Spinelle
et al. [212] reported that MLR models for NO2 , CO, and CO2 had higher uncer-
tainties compared to ANNs. Zimmerman et al. [271] reported that random forest
(RF) models reduced relative error by 14-29% for NO2, O3 and CO2 when
compared to MLR models. Malings et al. [141] tested six calibration models (var-
ious linear regression and machine learning techniques) and reported that hybrid
linear regression-random forest models for NO2 and O3 gave the best and most
consistent performance.
The majority of the aforementioned studies investigating LCS have focused on
developing calibration models for individual monitors based on local collocation
with a regulatory-grade monitor. However, as LCS deployments grow into larger
networks (e.g., with over 100 sensor nodes), there is a
growing interest in exploring generalized calibration models. These models apply
a common calibration to all LCS in a network, offering greater efficiency and
scalability when calibrating numerous sensors. Barkjohn et al. [21]
developed a generalized model for PurpleAir sensors (commercial nephelometers)
using 50 sensors in the USA. Malings et al. [141] developed a generalized calibra-
tion model for gas sensors using a network of 50 RAMPs in Pittsburgh. Malings
et al. [141] reported 15% higher CvMAE for NO2 using a generalized model, com-
pared to individual models (unique calibration model for each LCS), and suggested
that this decline in performance may be acceptable depending on the intended use
case, such as for calibrating LCS that have not been collocated before deployment.
Furthermore, Malings et al. [141] highlighted that generalized models had better
transferability to other locations, as they are less influenced by changes in ambient
conditions or the mixture of gases.
2.3 Extending Sensor Data: Spatiotemporal Models

While LCS systems can support more dense air quality monitoring networks, it is
not feasible to collect air quality data from all locations and at all times within a
given geographic area. As such, spatiotemporal models are often developed using
data from networks of spatially distributed sensors to estimate pollutant concentra-
tions in unsampled locations or times. These models can range from monitor-based
(e.g. satellite data), statistical (e.g. land use), to process-based (e.g. dispersion,
chemical transport models) [234], and the choice of model depends on factors such
as accuracy and spatiotemporal resolution [77]. This section discusses statistical
methods as they are the approaches most commonly applied to LCS data [94, 166].
The simplest form of spatial model is interpolation, which uses arithmetic pro-
cessing to estimate concentrations in nearby locations [29]. Various interpolation
techniques have been employed; common techniques include averaging of all the
sensors in a buffer [197] and inverse distance weighting (IDW: inverse distance
weighted average of all monitors within an area) [162, 198]. Kriging is a special
form of spatial interpolation that accounts for auto-correlation in the data (unlike
other forms of interpolation). Ordinary kriging (Equation 2.14) estimates the
concentration at an unmeasured location (zy) as a weighted sum of the concentrations
at N measured locations (zi), each assigned a kriging weight (λi). Kriging has
been extensively used in air quality research as it is easy to implement and doesn’t
require any additional data [40, 108, 170, 260]. However, since kriging does not in-
corporate any temporal or spatial variables that might influence the concentrations,
it is usually not as accurate as more complex methods.
$$z_y = \sum_{i=1}^{N} \lambda_i z_i \tag{2.14}$$
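Both IDW and ordinary kriging estimate the unmeasured value as a weighted sum of
the form in Equation 2.14; they differ in how the weights are chosen. The sketch
below illustrates the simpler IDW case, where each weight is proportional to an
inverse power of the distance to the monitor. The function name, the power of 2
and the monitor coordinates are assumptions made for this example. Kriging
weights would instead be derived from a fitted variogram, which is not shown here.

```python
import math


def idw_estimate(target, monitors, power=2.0):
    """Inverse distance weighted estimate at `target`.

    target   : (x, y) coordinates of the unmeasured location
    monitors : list of ((x, y), concentration) pairs
    power    : distance exponent (2 is a common but arbitrary choice)
    """
    weights, values = [], []
    for (x, y), conc in monitors:
        d = math.hypot(x - target[0], y - target[1])
        if d == 0.0:
            return conc  # target coincides with a monitor
        weights.append(1.0 / d ** power)
        values.append(conc)
    total = sum(weights)
    # Normalized weights play the role of lambda_i in Equation 2.14
    return sum((w / total) * v for w, v in zip(weights, values))


# Two hypothetical monitors 2 distance units apart
monitors = [((0.0, 0.0), 10.0), ((2.0, 0.0), 20.0)]

midpoint = idw_estimate((1.0, 0.0), monitors)      # equal distances, equal weights
near_first = idw_estimate((0.5, 0.0), monitors)    # weights 0.9 and 0.1
```

At the midpoint both monitors receive a weight of 0.5; closer to the first
monitor, its weight rises to 0.9 and the estimate moves towards its reading.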
Land Use (LU) models are data-intensive models that use land use variables
to predict concentrations in unsampled areas, and have been extensively used in
air quality research [25, 71, 94]. The predictors for LU models can vary, but of-
ten include surrounding spatial parameters (such as traffic counts, land use) and
meteorological data (temperature, relative humidity) within certain circular buffers
[88, 130, 196, 213, 248]. One of the most common types of LU models is the land
use regression (LUR) model, which uses spatial covariates as independent predic-
tor variables in a multiple regression to predict pollutant concentrations at different
locations [14, 25, 59, 91, 94, 110, 139, 191, 195, 261]. A typical LUR model is
shown in Equation 2.15, in which concentration c at location i and time j is
calculated as an affine function of spatial variables (A and B, with coefficients
β1 and β2), temporal variables (Y and Z, with coefficients γ1 and γ2) and a
constant (α0).
$$c_{i,j} = \alpha_0 + (\beta_1 A_i + \beta_2 B_i) + (\gamma_1 Y_j + \gamma_2 Z_j) \tag{2.15}$$
However, LUR models have several disadvantages. Many of the factors
and processes that determine pollutant concentrations are non-linear, whereas
LUR generally assumes a linear relationship between pollutant concentrations and
different predictors [69, 118]. Moreover, correlations between the predictor
variables can also be difficult to resolve and may lead to overfitting [71]. Another
disadvantage of LURs is their poor spatial transferability, meaning that they may
not perform well when applied to areas with different characteristics from the area
in which they were developed [181, 186].
To overcome the limitations of LURs, researchers have explored using ML
techniques as a land use model. Two of the most common ML techniques that
have been implemented in spatiotemporal predictions of air pollutants are Neural
Networks (NNs) [8, 180, 240, 268] and random forests (RFs) [34, 268]. Multiple
studies have reported improved prediction accuracy by using machine learning land
use models over LUR models [11, 63, 138, 158]. Specifically, Araki et al. [14] and
Rao et al. [191] found that using RFs improved the R2 values for NO2 predictions
by more than 0.05 compared to regression. For PM2.5 spatial models, Yao et al.
[262] observed an increase in performance from R2 of 0.34 for LUR models to
0.66 for NN models.
While ML techniques offer advantages over LURs, such as the ability to cap-
ture non-linear relationships between pollutant concentrations and predictor vari-
ables, they are not without limitations. One such limitation is that ML models may
still suffer from the issue of poor transferability [15]. They are also unable to
extrapolate outside the training range [95]. Furthermore, they do not directly reveal
the magnitude and direction of the effect of each predictor variable [34]. However,
the variable importance factor (VIF) can be used to identify the predictive power of
each variable in a regression model. For MLR, one way to assess VIF is by comparing
the absolute
value of the t-statistic for each variable, where t-statistic is calculated by divid-
ing the coefficient for the regression model (β ) by the standard error (SE) of the
coefficient estimates (Equation 2.16). For RF models, VIF can be assessed by cal-
culating the increase in mean square error (MSE) if the variable is permuted (i.e.,
the values of the selected variable are randomly shuffled, effectively eliminating
this variable from the model or breaking the relationship between this variable and
the target output; Equation 2.17).
$$t\text{-statistic} = \frac{\beta}{SE} \tag{2.16}$$
$$\mathrm{MSE} = \frac{1}{N} \sum_{i=1}^{N} (\text{Observed value}_i - \text{Predicted value}_i)^2 \tag{2.17}$$
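The permutation approach described above can be sketched independently of any
particular model: shuffle one predictor column, re-predict, and record the
increase in MSE (Equation 2.17). In the sketch below, the function names, the
number of repeats, the random seed and the stand-in toy_model (which takes the
place of a trained RF) are all assumptions made for illustration.

```python
import random


def mse(obs, pred):
    """Mean square error (Equation 2.17)."""
    return sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs)


def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Average increase in MSE when each predictor column is shuffled.

    predict : callable mapping one row of predictors to a prediction
    X       : list of rows (lists of predictor values)
    y       : list of observed values
    """
    rng = random.Random(seed)
    baseline = mse(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # breaks the link between variable j and y
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            permuted = mse(y, [predict(row) for row in X_perm])
            increases.append(permuted - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


def toy_model(row):
    """Hypothetical fitted model that depends only on the first predictor."""
    return 2.0 * row[0]


X = [[1.0, 5.0], [2.0, 3.0], [3.0, 8.0], [4.0, 1.0], [5.0, 9.0], [6.0, 2.0]]
y = [toy_model(row) for row in X]

importances = permutation_importance(toy_model, X, y)
```

Shuffling the second column leaves the predictions unchanged, so its importance
is zero, whereas shuffling the first column degrades the predictions and yields
a positive increase in MSE.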
Studies have explored hybrid models to improve the predictive performance
of air pollutant models. Brokamp et al. [35] used satellite data as a predictor in
a RF model and found that missingness of aerosol optical depth (AOD) data was an
effective predictor of PM2.5 concentrations. Wu et al. [256] and Chen et al. [51]
showed improved performance when using hybrid kriging-LUR models, which incorporated
kriging predictions as a variable in a stepwise LUR model. Wu et al.
[256] reported an increase in model R2 from 0.66 for the traditional LUR model
to 0.85 for the kriging-LUR model; similarly, Chen et al. [51] reported an increase
in model R2 from 0.75 for the traditional LUR to 0.87 for the kriging-LUR model.
2.4 Extending Spatiotemporal Models: Exposure Assessments

As introduced in Section 2.2.2, regulatory monitors represent concentrations for
a limited area surrounding the station. However, numerous studies have reported
small-scale spatial variations in pollutant concentrations [19, 71, 130, 244]. For
example, Henderson et al. [94] reported that traffic variables (e.g., automobile
density, distance to highway) within a 100 m buffer were important predictors in an
NO spatial model. Baldwin et al. [19] reported that NO2 concentrations 50 m downwind
from the study road were double the NO2 concentrations 15-20 m upwind
from the study road. These variations can lead to differences in human exposure to
pollutants and potentially the resultant health impacts [66, 109].
Exposure studies often rely on annual average concentrations at residential
addresses, without accounting for the daily movements of individuals for work,
recreation, and other activities [66, 96]. However, people’s mobility exposes them
to varying pollution concentrations as they move through different spaces, which
may lead to different exposure profiles and potential health impacts. By focusing
on residence-only exposures and ignoring mobility or time spent in other locations,
exposure estimates may be negatively biased and may result in the underestimation
of the relative risks of a pollutant [55, 57, 263, 264].
To better understand this, personal monitors have been used to estimate an
individual’s exposure to pollutants [137]. Although they are likely to be the most
accurate tools for assessing exposure, personal monitor studies suffer from logistical
(e.g., recruiting volunteers) and cost constraints (e.g., cost of personal monitors),
which limit the number of samples. Consequently, due to this small sample size,
study findings may be strongly influenced by the characteristics of the participants
[16]. As an alternative, some studies have developed movement-based exposure
(dynamic) models by merging mobility of individuals and spatiotemporal models
(as discussed in Section 2.3). Nyhan et al. [171] used mobile network data to track
mobility and estimated an average increase in exposure to PM2.5 of 0.02 µg m−3
when compared to residence-only (static) models, while Lu [136] used agent-based
models and found that ignoring mobility resulted in an underestimation of exposure
by an average of 13%. Setton et al. [201] used time-activity patterns to develop a
mobility-based model and reported that ignoring daily mobility patterns resulted in
relative risks being underestimated by 16%.
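The difference between static and dynamic exposure estimates can be illustrated
with a simple time-weighted average. The sketch below is not the method of any of
the cited studies; the function name, the 16 h/8 h split between home and work,
and the two concentrations are hypothetical values chosen for this example.

```python
def time_weighted_exposure(segments):
    """Time-weighted average concentration.

    segments : list of (hours, concentration) pairs describing where a
               person spends their day
    """
    total_hours = sum(hours for hours, _ in segments)
    return sum(hours * conc for hours, conc in segments) / total_hours


# Hypothetical PM2.5 concentrations (µg m−3) at two locations
home_conc = 8.0
work_conc = 12.0

# Static model: all 24 hours assigned to the residential location
static = time_weighted_exposure([(24, home_conc)])

# Dynamic model: 16 hours at home and 8 hours at work (assumed split)
dynamic = time_weighted_exposure([(16, home_conc), (8, work_conc)])

# Share of the dynamic exposure that the static model misses, in percent
underestimate_pct = 100.0 * (dynamic - static) / dynamic
```

With these assumed values the static estimate is 8.0 µg m−3 and the dynamic
estimate is about 9.3 µg m−3, so ignoring mobility would understate exposure by
roughly 14%. The sign and size of the difference depend entirely on the
concentrations at the locations visited.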
2.5 Extending Spatiotemporal Models: Hotspot Identification

Section 2.4 explores the small-scale spatial variations that exist in pollutant con-
centrations, which can result in hotspots where concentrations are higher than the
surrounding areas. Spatial models, discussed in Section 2.3, can improve the iden-
tification of these potential hotspots [204, 205]. Additionally, the accuracy or spa-
tial resolution of hotspot identification may be enhanced by incorporating LCS
data as a supplemental source of information. Huang et al. [101] created two sep-
arate machine learning models for PM2.5 , one using only US EPA data (regula-
tory) and another using New York City Community Air Survey (NYCCAS) data
(non-regulatory) from 150 locations within New York City. The study reported
that the model created using non-regulatory monitors could identify more pollu-
tion hotspots compared to the regulatory-monitoring-data-only model. However,
since different pollutants exhibit different behaviors, single pollutant hotspots may
not necessarily coincide, and different pollutants may highlight different areas of
concern [80, 217]. With this in mind, identifying hotspots for cumulative exposure
to multiple pollutants has also been shown to be important: while
individual pollutants have health impacts, three pollutants at slightly elevated
concentrations may be worse than one highly elevated pollutant. The methods used
to estimate cumulative exposure have varied, with a key consideration being the
interactions between pollutants [217]. The two most commonly used aggregat-
ing methods for estimating cumulative exposure are the additive method and the
multiplicative method. In the simple additive method, the indicators for different
pollutants are added together, assuming their independence [117, 165]. The mul-
tiplicative method assumes interactions between different indicators and involves
multiplying them together [217, 269]. Another approach, introduced by Giang and
Castellani [80], is the binary aggregation method, where pollutants are assigned a
score of either 0 or 1 based on whether they meet the air quality objectives.
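The three aggregation methods can be compared with a small sketch. Here each
pollutant indicator is assumed to be the ratio of its concentration to an air
quality objective, and the binary score is assumed to be 1 when the objective is
exceeded; both conventions, along with the function names, concentrations and
objectives, are assumptions for illustration and may differ from the definitions
used in the cited studies.

```python
import math


def additive_index(indicators):
    """Sum of the individual pollutant indicators (assumes independence)."""
    return sum(indicators.values())


def multiplicative_index(indicators):
    """Product of the individual pollutant indicators (assumes interaction)."""
    return math.prod(indicators.values())


def binary_index(concentrations, objectives):
    """Number of pollutants scoring 1, i.e. exceeding their objective (assumed)."""
    return sum(1 for p in concentrations if concentrations[p] > objectives[p])


# Hypothetical concentrations and objectives (same units within each pollutant)
concentrations = {"PM2.5": 12.0, "NO2": 30.0, "O3": 40.0}
objectives = {"PM2.5": 8.0, "NO2": 40.0, "O3": 50.0}

# Indicator = concentration relative to its objective (assumed definition)
indicators = {p: concentrations[p] / objectives[p] for p in concentrations}

additive = additive_index(indicators)              # 1.5 + 0.75 + 0.8
multiplicative = multiplicative_index(indicators)  # 1.5 * 0.75 * 0.8
binary = binary_index(concentrations, objectives)  # only PM2.5 exceeds
```

In this example the additive index (3.05) and the binary score (1) both register
the PM2.5 exceedance, while the multiplicative index (0.9) is pulled below 1 by
the two pollutants that are under their objectives, which shows how the choice of
aggregation method can change which locations appear as hotspots.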
2.6 Environmental Justice and Air Pollution

The US EPA defines environmental justice as “the fair treatment and meaning-
ful involvement of all people regardless of race, color, national origin, or income
with respect to the development, implementation and enforcement of environmental
laws, regulations and policies” [231]. The concern for environmental justice began
in 1982, when civil rights activists organized to stop the state of North Carolina
from dumping contaminated soil in a predominantly African American county [173].
Since then, a large body of literature in the US has highlighted
the unequal distribution of air pollution hazards within populations, with
historically or currently marginalized communities (including low-income, people
of color and Indigenous communities) often disproportionately experiencing many
of the health risks of exposure to ambient air pollution [20, 115, 157, 159]. For
example, a study conducted in the contiguous United States found that low-income
non-white young children and elderly individuals were disproportionately exposed
to residential outdoor NO2 [54]. In addition to environmental exposure, marginal-
ized groups often also experience social and political marginalization, which can
be due to inequitable access to healthcare and policy decisions, further increas-
ing their vulnerability to the health impacts of air pollution [175]. Furthermore,
marginalized communities are frequently underrepresented in air quality reporting;
the strategic placement of monitoring stations by the US EPA has often overlooked
hotspot areas [149].
In Canada, studies have documented and researched the uneven distributions
of environmental hazards since the 1990s. For example, a study conducted in
Canada’s three largest cities (Toronto, Montreal, Vancouver) found that areas with
higher proportions of non-English or non-French speaking tenants and residents
were associated with greater exposure to ambient NO2 [183]. However, the number
of studies has been limited, and Canada has been slow to adopt environmental
justice as a framework for environmental research [49]. Haluza-Delay et al. [90]
have also emphasized that environmental justice research from the US may not
be directly applicable in Canada for many reasons, including geography,
history, and demographics.
2.7 Conclusion
Recent advances in low-cost sensing technology for air pollution have provided
a unique opportunity to improve our understanding of air pollution and pollution
exposure estimates. This literature review has highlighted the potential of LCS to
address real-world questions, such as improving our understanding of the spatial
and temporal variability of a pollutant. However, it has also revealed key gaps in
current research on LCS, including lack of geographically-transferable calibration
models. The structure of the thesis is guided by these gaps, progressing from im-
proving the performance of LCS and culminating in the practical implementation
of LCS within real-world scenarios, such as generating exposure estimates.
Although Low- and Middle-Income Countries (LMICs) grapple with poor air quality
and a greater disease burden, most monitoring stations and large-scale LCS
deployments are found in developed nations [189].
As such, LCS are not being used in areas where monitoring doesn’t currently exist.
One of the key challenges associated with the widespread deployment of LCS in
LMICs is the limited availability of monitoring stations suitable for sensor collo-
cation, or constrained access to such stations. This existing gap in calibrating LCS
for areas characterized by elevated air pollution levels served as the guiding princi-
ple behind the first objective of this work: to explore a geographically-transferable
calibration model to enable the calibration and deployment of LCS in different
locations. In Chapter 3 of this thesis, I explore a calibration method that is built
(trained) and tested in different locations on a global scale (i.e., built in one country,
tested in another).
Pollutant concentrations exhibit small-scale spatial variations, and
LCS can be used to construct denser networks due to their low cost and low
maintenance, potentially improving the ability to measure spatial gradients in air
pollutants. Spatial maps have traditionally been created using LUR models [25, 59,
110], although recent research has begun to explore the use of machine learning
methods for creating prediction models [11, 34, 63]. However, there is still a lack
of research on using LCS to create machine learning land use models, and these
models are often not easily transferable to other cities. This gap motivated the
second objective of this work: to develop and compare different spatiotemporal
pollution models built from LCS network data. In Chapter 4 of this thesis, I com-
pare traditional LUR and random forest models using data collected from an LCS
network and create spatiotemporal prediction models. I also investigate potential
solutions to the issue of model transferability by building models after removing
regional background signals and decomposing the remaining signal into different
time frequencies (e.g., local influence) to increase the influence of spatial predic-
tors in the models.
The central objective of achieving more accurate spatiotemporal models stems
from the exposure implications of these variations. This inspired the extension of
spatiotemporal models built for LCS to exposure models. (1) Previous exposure
estimates for PM2.5 have relied on a limited number of centrally located moni-
tors [66]. However, people move between different areas throughout the day and
this mobility can have an impact on their overall exposures. This leaves a gap in
our understanding of exposure to PM2.5 due to the movement of people between
different areas, such as work and home. This gap motivated the third objective
of this work: to compare residents’ exposure due to mobility using spatiotempo-
ral pollution models built from LCS network data. In Chapter 5, using the spa-
tial maps created in Chapter 4 (with a grid size of 50 m), I built static models
(where the population spends 24 hours a day in a fixed residential area) and dy-
namic models (where the population moves between residential and commercial
areas) to estimate variations in residents’ exposure due to movement. (2) People
are exposed to a mixture of air pollutants, but spatiotemporal predictions within a
neighbourhood have traditionally been built for individual pollutants. Cumulative
assessments that take into account the combined exposure to multiple pollutants
may be more representative of human exposure and may potentially assist com-
munity members in making informed decisions. This leaves a gap in literature
that assesses the cumulative impacts of air pollutants using spatiotemporal mod-
els built for LCS networks. This gap motivated the last objective of this work: to
assess intra- and inter-neighbourhood variabilities in a community using data col-
lected via LCS to identify pollution hotspots and recognize environmental injustice
concerns. In Chapter 6 of this thesis, I used the method of estimating cumulative
hazard indices to identify hotspots and areas of concern within an environmental
justice community.
Chapter 3
Exploration of intra-city and inter-city PM2.5 regional calibration models to improve low-cost sensor performance
3.1 Executive Summary

This chapter contains a paper prepared for submission. Supporting Information
from this paper is presented in Appendix A. This work uses data from 15 PurpleAir
sensors across the globe to create geographically transferable calibration models.
S. Jain and N. Zimmerman (2023). Exploration of intra-city and inter-city
PM2.5 regional calibration models to improve low-cost sensor performance. Pre-
pared for submission.
3.2 Author Contributions
SJ conceptualized the study, conducted the literature review and analysis for this
work, developed the figures and wrote the manuscript. NZ provided critical feed-
back and editing during all stages of the project and the manuscript.
3.3 Summary
Low-cost PM2.5 sensors often suffer from environmental cross-sensitivities, requir-
ing regular calibration across a wide range of concentrations. This is typically
achieved by collocating LCS with regulatory stations and using statistical models.
However, this approach becomes challenging in regions with limited regulatory
monitoring stations or access. To address this challenge, we explored building
separate calibration models for the regional component of the total PM2.5 concen-
tration, which represents background concentration, and the local component of the
total concentration. This is based on the premise that the regional concentration is
consistent across a given region and therefore direct collocation is less necessary,
and the idea that the local concentration is not influenced by geographic properties
and therefore can be calibrated based on collocation elsewhere. In this work, we
used publicly-available PurpleAir data for 2022 from five different cities in South
Asia and North America, and built city-specific calibration models for the regional
concentrations using multiple linear regression. We tested the model performance
in the city the model was built in (intra-city models; trained and cross-validated in
the same city) and in other cities (inter-city models; trained and cross-validated in
different cities). By calibrating the regional concentration separately, we were able
to reduce the normalized root mean square error (nRMSE) of both intra-city models,
from 51% to 26%, and inter-city models, from 55% to 25%. Overall, the results
of this work demonstrate the potential for improved transferability of calibration
models and provide evidence that calibration models built for regional concentration
and local concentration separately may be a viable solution when deploying in
places with limited regulatory monitoring or access to monitoring stations.
3.4 Introduction
Low- and middle-income countries (LMICs), which represent 82% of the total
global population, disproportionately experience the burden of air pollution risks,
with 87% of total air pollution-related deaths occurring in LMICs in 2012 [253].
Despite the higher disease burden associated with poor air quality in LMICs, most
monitoring stations are located in developed nations [253]. One possible reason for
this disparity is the lack of financial resources to establish a widespread monitoring
network, as reference-grade monitoring instruments can cost anywhere from
$5,000 to $40,000 per pollutant [45, 140, 206]. Additionally, these instruments
require operational personnel for routine calibration and maintenance, further lim-
iting their accessibility in LMICs.
Recent advancements in lower-cost sensor (LCS) technologies provide an af-
fordable opportunity to fill the gaps of regulatory monitoring and improve spatial
coverage [26, 156, 182, 210, 271]. LCS cost a fraction of the price of
traditional monitoring stations, typically less than CAD 300 for a single-pollutant
sensor [21], allowing for a denser network of sensors to be deployed. Additionally,
their small size and low power demands can enable deployment in areas with lim-
ited monitoring. LCS also offer high time resolution (hourly or better) [18, 42, 141]
and data collection is generally more accessible due to their ability to transfer data
via cellular networks [141, 271].
LCS technology is often promoted as a means to improve air quality in areas
significantly affected by both poor air quality and limited monitoring (e.g., LMICs)
[253]. However, visual inspection of LCS deployment maps for some of the most
popular systems, such as PurpleAir, reveals that the vast majority of deployments
have been in developed nations where monitoring already exists [189]. In this
context, LCS networks are mainly used as a supplemental source of information,
rather than being deployed where monitoring does not currently exist.
A major challenge in deploying LCS in LMICs is the development of effec-
tive calibration models. Sensor readings can drift over time as the sensing unit
wears out [64, 112, 142, 224]. They are also sensitive to environmental conditions
[18, 64, 142] and can exhibit cross-sensitivities with other pollutants [58, 146, 150].
Therefore, statistical calibration models are typically required across the full range
of meteorological conditions and pollutant concentrations to achieve good perfor-
mance [106, 142, 266].
To build correction models for LCS and assess their performance,
a common approach is to collocate them with a regulatory station and
account for meteorological factors that can affect sensor readings using a cali-
bration model [112, 114, 273]. Regression models with environmental variables
(such as relative humidity) as an additive term have been widely used for statistical
calibration of PM2.5 sensors [21, 42, 190], due to their ease of development and
comparable or better performance compared to more complex models [111, 148].
Multiple linear regression models have also been found to exhibit improved perfor-
mance [266] and reduce uncertainties [190] in studies conducted in LMICs, where
average PM2.5 concentrations are higher than in developed nations.
Since LCS have the potential of denser and wider networks, a single gener-
alized calibration model has previously been tested as a solution to the logistical
challenges of individually calibrating multiple sensors [21]. Some studies have
also developed and tested location-specific calibration models (trained and tested
for all LCS in a city; intra-city models); however, these studies have found that
direct collocation calibration models tend to perform better [143], implying that
hyper-local meteorological and pollutant profiles can impact sensor performance.
Sensors are typically calibrated in the global north due to the ease of access
to monitoring stations for collocation. This presents a challenge when deploying
sensors to regions where limited regulatory monitoring stations exist, especially
in areas with higher pollutant concentrations outside of the concentration ranges
typically observed in the global north. Moreover, calibration models developed
in one location may not be readily transferable, as even intra-city models exhibit
reduced performance. Studies have shown that inter-city models (trained and tested
in different cities) have higher errors than direct collocation or intra-city (trained
and tested in the same city) models [42, 273].
To fill this research gap, we propose a calibration approach for low-cost PM
sensors using the popular Plantower sensor found in PurpleAir (PA) sensors. As
part of this method, the LCS PM concentration is separated into ‘estimated re-
gional’ and ‘local’ components by isolating the concentration baselines and cal-
ibrating them separately. The regional concentration represents the background
concentrations across a region, while the local concentration consists of brief spikes
in concentrations that are likely caused by local sources such as domestic fuel burn-
ing. We hypothesize that the regional concentration should be consistent across a
broader (city-wide) geographical region, reducing the need for direct collocation.
Furthermore, we hypothesize that the local concentration could be calibrated via
collocation anywhere since local spikes should be independent of regional or geo-
graphical influences. As part of this investigation, data from the PurpleAir network
in South Asia (India, Nepal, and Pakistan) and wildfire-period data from California
(to focus on similarly high concentrations of PM2.5) were used.
3.5 Methods
3.5.1 Low-cost Sensor: PurpleAir
The low-cost sensors used in this study were PurpleAir sensors, which measure am-
bient PM2.5 concentrations using the Plantower PMS5003 sensor and cost around
CAD300-400 [21, 42]. PurpleAir has two Plantower sensors, labeled as channels
A and B, and provides two-minute average data. The sensor operates by scattering
light [192] at a 90° angle using a laser with a wavelength range of 680±10 nm
[119, 199]. The number of pulses from the scattering signal is converted to PM
concentration using a proprietary algorithm developed by Plantower [199]. The
sensors are factory calibrated with ambient aerosol across several cities in China
[142] and have an effective detection range of 0-500 µg m−3 , with uncertainties of
±10 µg m−3 for concentrations between 0-100 µg m−3 and ±10% for concentra-
tions between 100-500 µg m−3 . The working temperature and relative humidity
(RH) ranges are -10 to +60°C and 0-99%, respectively, with a reported sensor lifetime
of 3 years [199]. The Plantower sensors count particles per
deciliter in six overlapping size categories (> 0.3 µm, > 0.5 µm, > 1 µm, >
2.5 µm, > 5 µm and > 10 µm) and use a proprietary algorithm to derive estimates
for various PM size fractions (PM1, PM2.5, and PM10). Plantower provides
two data-series to the users: (1) CF=1 (calibration factor = 1, denoted as ‘CF1’ sig-
nal) that is based on laboratory evaluations and is recommended for indoor sensors
and (2) CF=atmosphere (atmospheric corrected data, denoted as ‘ATM’ signal) that
is based on field evaluations and is recommended for outdoor sensors.
However, since Plantower provides no information on the algorithm, Wallace et al.
[242] proposed a more transparent and reproducible alternative. They used the number
of particles per deciliter smaller than 2.5 µm in three different size categories
(0.3-0.5 µm, 0.5-1 µm, 1-2.5 µm) to estimate PM2.5 concentrations and arrived
at a correction factor of 3. This signal is independent of the CF1 and ATM concentrations,
and is reported by PurpleAir as the ‘ALT’ signal. All three signals, CF1,
ATM and ALT, can be downloaded via the PurpleAir API. Appendix A.1 outlines the
steps to calculate ALT concentrations. ALT concentrations have been shown to have superior
performance, with 4-6% precision, compared to 7-14% precision of CF1 concen-
trations [242]. In addition to PM2.5 measurements, PurpleAir sensors also have
temperature, relative humidity, and pressure sensors (BOSCH-BME280) and can
transmit data online [21].
PurpleAir data can be obtained through an Application Programming Interface
(API) available online using an API key [4]. Users are required to input the start
date and time, sensor ID, and the desired output data, such as ‘ATM’, ‘CF1’, or
‘ALT’ concentration. Additional parameters, such as relative humidity, temper-
ature, uptime, and firmware version, can also be selected. Users can choose to
retrieve data from separate channels or averaged outputs. PurpleAir reports the average
of the A and B channels; however, if the values observed for the two channels
drift apart, the system may downgrade one of the channels and exclude the downgraded
channel. It is important to note that access to PurpleAir data is subject to
PurpleAir fair use policies [4].
3.5.2 Data Collection and Processing
To develop the calibration method, we downloaded data from a total of 15 Pur-
pleAir sensors located in Bengaluru (India), Chico (California, USA), Delhi (In-
dia), Kathmandu (Nepal), and Lahore (Pakistan) using the PurpleAir API (Table
3.1). Locations in South Asia were selected based on the availability of regulatory
data and of data from at least two PAs in the city for the year 2022.
During wildfires, California experiences high pollutant concentrations similar to
those observed in South Asia. Chico (Butte County) had an extended wildfire sea-
son in both 2020 and 2021, making it a relevant addition to our analysis to test
inter-city models (total 85 days of data).
We downloaded two-minute data for the ATM concentration, which we con-
sidered as the ‘raw’ data for this study, and for the ALT concentration, which is
the PA-corrected data. To calibrate the PAs, we used regulatory data from the
U.S. Environmental Protection Agency (US EPA) for Chico [227] and from US
embassies and consulates for Lahore and Kathmandu [9]. We used data from the
Central Pollution Control Board (CPCB) of India, along with the state agencies Delhi
Pollution Control Committee (DPCC) [1] and Karnataka State Pollution Control Board
(KSPCB) [3], and the India Meteorological Department (IMD) [2], to calibrate the PAs
in India [47]. Since the US EPA reports 1-hour data, we down-averaged the 2-
minute PA data and 15-minute CPCB data to hourly concentrations. As a valida-
tion dataset, we downloaded 2-minute data for a PA sensor in Vancouver, British
Columbia (VPA1) and in Yreka, California (CPA5), which was down-averaged
to hourly data. The VPA1 dataset was tested against six monitoring stations of
Metro Vancouver (MV), the regulatory body responsible for air quality in Vancouver
[155], and CPA5 was tested against the nearest US EPA station (Table 3.1).

Table 3.1: Information on PurpleAir data and regulatory bodies used to create
intra-city and inter-city calibration models. Wildfire concentrations were
used for modeling in Chico, to represent high levels of air pollution in
South Asia.

Location    Sensor  Time Period                             Regulatory Body  Used in Final Analysis
For building and testing of estimated regional calibration models
Bengaluru   BPA1    Jan-Dec 2022                            CPCB*            Yes
Bengaluru   BPA2    Jan-Dec 2022                            CPCB*            Yes
Chico       CPA1    Aug-Oct 2020; Aug-Sep 2021 (Wildfire)   US EPA           Yes
Chico       CPA2    Aug-Oct 2020; Aug-Sep 2021 (Wildfire)   US EPA           Yes
Chico       CPA3    Aug-Oct 2020; Aug-Sep 2021 (Wildfire)   US EPA           Yes
Chico       CPA4    Aug-Oct 2020; Aug-Sep 2021 (Wildfire)   US EPA           No
Delhi       DPA1    Jan-Sep 2022                            CPCB*            Yes
Delhi       DPA2    Jan-Dec 2022                            CPCB*            Yes
Delhi       DPA3    Jan-Dec 2022                            CPCB*            Yes
Kathmandu   KPA1    Jan-Dec 2022                            US EPA           No
Kathmandu   KPA2    Jun-Dec 2022                            US EPA           Yes
Lahore      LPA1    Jan-Dec 2022                            US EPA           Yes
Lahore      LPA2    Jan-Oct 2022                            US EPA           Yes
Lahore      LPA3    Jan-Jun 2022                            US EPA           Yes
Lahore      LPA4    Apr-Dec 2022                            US EPA           Yes
For conceptual testing of the model
Yreka       CPA5    Sep-Dec 2021                            US EPA
Vancouver   VPA1    Apr-Nov 2022                            MV

CPCB*: Data gathered by the central Indian agency (CPCB), state agencies
(DPCC and KSPCB) and the India Meteorological Department (IMD) are
collectively referred to as CPCB data.
CPA4 was excluded due to a high degree of disagreement of temperature
and RH with other PurpleAirs in Chico.
KPA1 did not meet the inclusion criteria.

Both Bengaluru and Delhi have multiple monitoring stations throughout the
city. In the National Capital Region (NCR), which includes Delhi and has more
than 20 monitoring stations, we narrowed down the list to include only those within
a 10-kilometer radius of any PA sensor, resulting in a final selection of 14 stations.
For model building, two stations, one situated at the airport and another along a
national highway, were excluded to prevent their persistent sources from influenc-
ing the models. Additionally, one station was removed due to its extended period
of inoperability, leaving a total of 11 stations for building calibration models for
Delhi. Out of the 10 monitoring stations in Bengaluru, we used data from only
four stations for model development. This selection was based on several factors:
two stations had an operational uptime of only 10% throughout 2022, three sta-
tions exhibited flat-lined data (lacked diurnal patterns) rendering them unreliable,
and one station was located in close proximity to a factory, introducing potential
interference. Consequently, these stations were not included in building calibration
models for Bengaluru. The final set of regulatory stations and PA monitors
used in this study is shown in Figure 3.1.
The analysis excluded data points for which the percent difference between the A and
B channels, calculated using Equation 3.1, exceeded one standard deviation (=
68%). However, since the percent difference can be skewed high at low concentrations,
data points with an absolute difference below 5 µg m−3 were retained even if the
percent difference exceeded 68% [224]. Thus, a data point was considered valid if the
difference between the two channels was either less than 5 µg m−3 in absolute terms
or less than 68% in percent terms.
\[
\%\ \text{difference} = \frac{(A - B) \times 2}{A + B} \tag{3.1}
\]
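The channel-agreement rule can be illustrated with a short Python sketch. This is a minimal illustration rather than the code used in this work: the function name `is_valid_reading` and the example readings are hypothetical, and the sketch assumes the magnitude of the A-B difference is what matters (Equation 3.1 is written with a signed difference).

```python
def is_valid_reading(a, b, abs_limit=5.0, pct_limit=0.68):
    """Return True if channels A and B agree closely enough to keep the reading.

    Assumption: the absolute value of (A - B) is used, and the 68% threshold
    is expressed as a fraction (0.68).
    """
    abs_diff = abs(a - b)
    # Small absolute differences are always kept, because the percent
    # difference is skewed high at low concentrations.
    if abs_diff < abs_limit:
        return True
    mean = (a + b) / 2
    if mean <= 0:
        return False
    pct_diff = abs_diff / mean  # same as |A - B| * 2 / (A + B)
    return pct_diff < pct_limit


# Hypothetical (A, B) channel readings in ug/m3.
readings = [(3.0, 6.5), (40.0, 46.0), (20.0, 60.0)]
kept = [pair for pair in readings if is_valid_reading(*pair)]
```

With these example values, the first pair is kept because the channels differ by less than 5 µg m−3, the second because the percent difference is about 14%, and the third is dropped because the percent difference is 100%.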
Sensor CPA4 was excluded from the analysis due to low agreement on temperature
and relative humidity with other PurpleAir sensors in Chico. Sensor KPA1
was also excluded because it had a high degree of disagreement between its two
channels and did not meet the inclusion criteria. In Delhi, DPA1 had high
disagreement between its two channels between October and December 2022, and
therefore all the data from that period were removed from the analysis.

Figure 3.1: Locations of PurpleAirs (green stars) and selected regulatory stations
(pink stars) in the five selected locations. The scale of the map is
different for each city.
3.5.3 Baseline Separation
In this work, we suggest that the total concentration for a pollutant can be divided
into two parts: the regional component and the local component, such that the total
concentration can be expressed as the sum of the regional concentration and the
local concentration. The regional concentration typically reflects regional background
sources that are located farther from receptors and produce slower-varying
(i.e., lower-frequency) signals. The local concentration represents short-term
spikes in concentration that are likely caused by local sources of pollution
such as traffic, construction, or industrial activity, and that produce faster-varying
(i.e., higher-frequency) signals [221].
‘Estimated regional’ concentrations were separated from total concentrations
for both PA data and regulatory data using the rolling ball method [134]. The
method estimates the baseline by dividing the dataset into multiple subsets, each with
a local window width of wm, and then smoothing the results by iterating ws times; a
detailed description of the rolling ball method and its parameters is given in Appendix
A.1. The local concentrations were then calculated by subtracting the estimated
regional concentrations from the total concentration. Since the rolling ball method
requires a continuous set of data, we used the Kalman filter to impute missing
values [60]. Imputed data points were removed from the dataset after baseline
separation, but before analysis. In cases where the estimated regional concentration
exceeded the total concentration, the regional concentration was set equal to the
total concentration (i.e., local concentration forced to zero). These datapoints were
removed from datasets for modeling of local concentrations. One limitation of
this method is that estimating the baseline at any point requires data from both
before and after that point, since it works with rolling averages. Therefore, for
calibration of both local and regional components, a period of length wm was removed
from the beginning and end of each dataset to avoid underestimation of
baseline concentrations. When the time gap between data points was longer than
wm, the data were divided into separate sets for baseline estimation.
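The separation step can be sketched in Python as follows. This is a simplified stand-in for the rolling ball method, not the implementation described in Appendix A.1: it assumes a rolling minimum followed by a rolling maximum over a window of half-width wm, then a rolling mean over a window of half-width ws, with both widths given in samples. The function names and the synthetic hourly series are hypothetical.

```python
def _rolling(values, half_width, reducer):
    """Apply reducer over a centred window that is clipped at the series edges."""
    n = len(values)
    return [
        reducer(values[max(0, i - half_width): min(n, i + half_width + 1)])
        for i in range(n)
    ]


def _mean(window):
    return sum(window) / len(window)


def split_regional_local(total, wm, ws):
    """Split a gap-free series into estimated regional and local components."""
    lowered = _rolling(total, wm, min)     # rolling minimum removes short spikes
    raised = _rolling(lowered, wm, max)    # rolling maximum restores the level
    smoothed = _rolling(raised, ws, _mean)  # smoothing pass
    # Where the estimated baseline exceeds the total, set it equal to the
    # total so that the local component is forced to zero.
    regional = [min(r, t) for r, t in zip(smoothed, total)]
    local = [t - r for t, r in zip(total, regional)]
    return regional, local


# Synthetic hourly series: flat 10 ug/m3 background with a one-hour spike.
hourly_pm = [10.0] * 48
hourly_pm[20] = 60.0
regional, local = split_regional_local(hourly_pm, wm=6, ws=6)
```

In this synthetic example the estimated regional component stays at the background level and the spike is assigned entirely to the local component. The edge windows are clipped here for brevity; in the analysis itself, the periods at the start and end of each dataset were removed instead.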
To tune the choice of wm for regional baseline estimation, we relied on previous
field studies of regional pollution indicators, such as the ratio of secondary organic
aerosol (SOA) to total organic aerosol (TOA). Therefore, we calculated the ratio of
estimated regional to total concentration for regulatory data for wm values ranging
from 4-13 hours and compared it to the SOA/TOA ratio, under the assumption that
SOA is more regional since it is formed due to chemical transformations in the
atmosphere [200]. The ws values were set equal to the wm values.
The ratio of SOA to TOA has been previously investigated in Delhi and Cali-
fornia, however, we couldn’t identify any studies estimating the SOA/TOA ratio in
Lahore, Kathmandu, or Bengaluru. According to Gani et al. [74], the SOA/TOA
ratio ranged from 0.5 to 0.7 during winter and 0.6 to 0.8 during summer in Delhi.
Jo et al. [113] reported SOA/TOA ratios varying between 0.5 and 0.73 across four
different cities in California, with an average ratio of 0.65 (median=0.68). We
used the values from Delhi, separate for summer and winter, as a proxy for Lahore.
Since cities in the Indo-Gangetic Basin (Delhi and Lahore) experience higher levels
of air pollution during the post-monsoon/winter season (Section 3.5.5) [50, 193],
summer ratios from Delhi were used as a proxy for annual ratios in Bengaluru and
Kathmandu. Table 3.2 shows the corresponding ratios as a function of wm for
Chico, California, as an example. In Chico and Lahore, since there is only one
regulatory station, we used the average value across the whole period (separately
for summer and winter for Lahore) to select the wm value. In Delhi, Bengaluru and
Kathmandu, since there are multiple stations, we estimated the ratio at each station
and used the mean ratio across all the stations to select wm values (separately for
summer and winter for Delhi). The wm values ranged from 6 to 11 hours across
different cities and periods (Appendix A.3, Table A.1).

Table 3.2: Means and medians for the ratio of estimated regional to total
concentrations for window widths 4-9 hours for Chico. The target average
SOA/TOA ratio was 0.65.

wm (Hours)        4     5     6     7     8     9
Ratio - Average   0.69  0.67  0.65  0.64  0.62  0.6
Ratio - Median    0.73  0.7   0.68  0.67  0.65  0.63
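As a worked illustration of the tuning step, the sketch below picks the window width whose mean regional-to-total ratio is closest to a target SOA/TOA ratio, using the Chico values from Table 3.2. The "closest ratio" rule and the names `pick_window` and `mean_ratio_by_wm` are assumptions made for illustration; the values actually selected for each city are those reported in Appendix A.3.

```python
# Mean regional/total ratios for Chico, copied from Table 3.2 (wm in hours).
mean_ratio_by_wm = {4: 0.69, 5: 0.67, 6: 0.65, 7: 0.64, 8: 0.62, 9: 0.60}
target_ratio = 0.65  # average SOA/TOA ratio reported for California [113]


def pick_window(ratio_by_wm, target):
    """Return the window width whose ratio is closest to the target ratio."""
    return min(ratio_by_wm, key=lambda wm: abs(ratio_by_wm[wm] - target))


best_wm = pick_window(mean_ratio_by_wm, target_ratio)
```

For the tabulated Chico ratios, this rule returns a window width of 6 hours, where the mean ratio equals the target of 0.65.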
3.5.4 Model Building
Previous studies [21, 142] have demonstrated that linear regression models for
PM2.5 calibrations have comparable or superior performance compared to other
modeling techniques, such as machine learning [111, 148]. Since linear regres-
sion models also have advantages in terms of simplicity and interpretability [270],
we only tested various linear regression models. In addition to the PurpleAir esti-
mated regional concentration (ERC), other candidate predictor variables included
temperature (T), relative humidity (RH), and dew point (DP) [18, 21, 42, 142].
These predictor variables were regressed against the estimated regional PM2.5 data
from the regulatory monitoring stations.
Temperature and RH were included as candidate variables in the model be-
cause they are two factors that have been shown to influence disagreement between
reference-grade PM2.5 sensors and low-cost optical sensors [106, 107]. As RH in-
creases, particles undergo hygroscopic growth, resulting in an increase in their
light scattering coefficient [41]. Additionally, most optical low-cost PM2.5 sensors
report data at ambient conditions, while the reference-grade PM2.5 sensors have
strict requirements on temperature (20-23°C) and relative humidity (30%-40%)
[142, 232]. Dew point (DP) was also considered as a variable related to condensa-
tion that could impact the PM2.5 sensor. Dew point was calculated using one form
of the Magnus-Tetens Formula [43, 61] (Equation 3.2) due to its suitability at room
temperature conditions.
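\[
DP(T, RH) = \frac{\lambda \left( \ln\left( \frac{RH}{100} \right) + \frac{\beta T}{\lambda + T} \right)}
                 {\beta - \left( \ln\left( \frac{RH}{100} \right) + \frac{\beta T}{\lambda + T} \right)},
\quad \text{where } \lambda = 243.12\,^{\circ}\mathrm{C} \text{ and } \beta = 17.62 \tag{3.2}
\]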
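Equation 3.2 translates directly into a few lines of Python. The sketch below is illustrative only; the function name `dew_point` is hypothetical, temperature is assumed to be in °C, and RH is assumed to be in percent.

```python
import math

LAMBDA_C = 243.12  # Magnus coefficient lambda, in degrees C (Equation 3.2)
BETA = 17.62       # Magnus coefficient beta, dimensionless (Equation 3.2)


def dew_point(temp_c, rh_percent):
    """Dew point in degrees C from temperature (degrees C) and RH (%)."""
    gamma = math.log(rh_percent / 100.0) + BETA * temp_c / (LAMBDA_C + temp_c)
    return LAMBDA_C * gamma / (BETA - gamma)


dp_saturated = dew_point(20.0, 100.0)  # at 100% RH the dew point equals T
dp_typical = dew_point(25.0, 50.0)     # roughly 13.9 degrees C
```

A useful sanity check is that the formula returns the air temperature itself at 100% relative humidity, since the logarithmic term vanishes.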
Candidate models for the analysis included both a simple linear regression and
multiple linear regression (MLR). In the MLR models, all combinations of T, RH
and DP were exhaustively considered as additional predictors alongside ERC
(Equations 3.3-3.10). Predictor variables with p-values > 0.05 were removed.
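\begin{align}
\text{Corrected PM}_{2.5} &= \beta_0 + \beta_1 \cdot ERC \tag{3.3} \\
&= \beta_0 + \beta_1 \cdot ERC + \beta_2 \cdot T \tag{3.4} \\
&= \beta_0 + \beta_1 \cdot ERC + \beta_3 \cdot RH \tag{3.5} \\
&= \beta_0 + \beta_1 \cdot ERC + \beta_4 \cdot DP \tag{3.6} \\
&= \beta_0 + \beta_1 \cdot ERC + \beta_2 \cdot T + \beta_3 \cdot RH \tag{3.7} \\
&= \beta_0 + \beta_1 \cdot ERC + \beta_2 \cdot T + \beta_4 \cdot DP \tag{3.8} \\
&= \beta_0 + \beta_1 \cdot ERC + \beta_3 \cdot RH + \beta_4 \cdot DP \tag{3.9} \\
&= \beta_0 + \beta_1 \cdot ERC + \beta_2 \cdot T + \beta_3 \cdot RH + \beta_4 \cdot DP \tag{3.10}
\end{align}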
The various linear regression models were trained and tested in different
locations (inter-city), and the root-mean-square error (RMSE) was calculated between
the corrected PA estimated regional PM2.5 and the regulatory regional PM2.5. The
model with the lowest RMSE was selected as the final model [21].
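The eight candidate predictor sets in Equations 3.3-3.10 can be enumerated programmatically. The short sketch below is only an illustration of that enumeration; the variable names are hypothetical, and fitting each model is left to whichever regression routine is in use.

```python
from itertools import combinations

# ERC is always included; T, RH and DP are optional additive terms.
optional_predictors = ["T", "RH", "DP"]

candidate_models = [
    ["ERC", *subset]
    for size in range(len(optional_predictors) + 1)
    for subset in combinations(optional_predictors, size)
]
```

The resulting list follows the same order as Equations 3.3-3.10, starting with the ERC-only model and ending with the model that includes all four predictors.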
3.5.5 Intra-city Models
A single calibration model was developed for each city for the PA’s estimated re-
gional ATM concentration. This city-specific model was built by regressing the
median regional concentration across all PA sensors against the median regional
concentration across all regulatory stations at each timestamp. The models were
trained and tested within the same city, and their performance was evaluated using
R², RMSE, and nRMSE metrics (Section 3.5.8).
To validate the model’s ability to predict PM2.5 concentrations in new datasets,
a leave-one-month-out (LOMO) cross-validation approach was used. This in-
volved testing the model on one month of data while training it on the rest of the
year, and repeating this process for all months. For Chico, the models were instead
trained on 2020 wildfire data and then tested on 2021 wildfire data, and vice versa,
as we only included wildfire period data from Chico in this work. This technique
was adopted to validate the model’s ability to predict PM2.5 concentrations in the
same city for a new dataset.
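The LOMO procedure can be sketched as a simple generator of train/test splits. This is a minimal illustration under assumed data structures: each record is a dictionary with a hypothetical `month` key, and the synthetic records stand in for hourly observations.

```python
def lomo_splits(records):
    """Yield (held_out_month, train, test) with one calendar month held out per fold."""
    months = sorted({rec["month"] for rec in records})
    for held_out in months:
        train = [rec for rec in records if rec["month"] != held_out]
        test = [rec for rec in records if rec["month"] == held_out]
        yield held_out, train, test


# Synthetic example: three records per month for a full year.
records = [{"month": m, "pm": 10.0 + m} for m in range(1, 13) for _ in range(3)]
folds = list(lomo_splits(records))
```

Each fold trains on eleven months and tests on the remaining one, so a full year of data yields twelve test models per sensor.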
The Indo-Gangetic Basin (IGB), which covers both Delhi and Lahore, expe-
riences higher levels of air pollution during the post-monsoon season (October-
December) [50, 193]. This is due to meteorological conditions that result in haze
and smog [86], which can alter the optical and physical properties of aerosols [62],
as well as higher-than-usual pollution emissions from crop burning and Diwali
festival-related fireworks [151, 193, 208]. A previous study conducted in Delhi
has shown that calibration models created for the monsoon season exhibited sea-
sonality and lacked transferability to other seasons [42]. Therefore, for sensors in
Delhi and Lahore, separate models were constructed for the monsoon period (July-
September) and the non-monsoon period (October-June). Intra-city models created
for this work are shown in Figure 3.2 (city-specific calibration models for locations
in pink and blue boxes).
3.5.6 Inter-city Models
In addition to testing city-specific calibration models in the same city, we also
tested city-specific calibration models for geographical transferability in other cities
(inter-city models). These models were trained in one city and then tested in a different
city, which enabled us to perform an external validation by testing the model
on different PAs and locations. For instance, the models were trained using the
three PA sensors in Chico and then tested separately on the PurpleAir sensors in
Bengaluru. Models were created for both ATM and ALT concentrations.

Figure 3.2: Flowchart for validation, intra-city and inter-city models built for
this work. [Flowchart: calibration models branch into total-concentration models
(Vancouver, Yreka; traditional and separate regional + local concentration models)
and regional-concentration models (IGB: Delhi, Lahore; non-IGB: Bengaluru, Chico,
Kathmandu; intra-city and inter-city models).]

As mentioned in Section 3.5.5, the IGB experiences fog during the post-monsoon
(winter) season, which can alter the optical and physical properties of aerosols
[62]. Therefore, the models in the IGB (Delhi and Lahore) were trained and tested
within the same region, stratified by monsoon and non-monsoon periods; e.g.,
models created using monsoon data in Delhi were tested against monsoon data
collected in Lahore. The models for the remaining cities (Bengaluru, Chico and
Kathmandu) were trained and tested against each other. Inter-city models created
for this work are shown in Figure 3.2 (cities within the pink and cyan boxes were
trained and tested against each other).
3.5.7 Validation Testing of the Method
To test the concept behind our model building, we used two different PA sensors,
located in Vancouver (VPA1) and Yreka (CPA5), respectively (Figure 3.2, orange
box). VPA1 was located less than a block away (< 500 m) from an MV monitoring
station, and CPA5 was located 60 m away from the nearest US EPA monitoring station.
These locations were specifically selected because they typically have cleaner
air due to the absence of significant local or regional sources of air pollution, except
during wildfires. Therefore, for this analysis, we excluded the data from Vancouver
during wildfire periods (10-14 September and 14-20 October, 2022) and
used only post-wildfire data from Yreka (after 21 September, 2022).
The model validation involved separating the total concentration into estimated
regional and local concentrations and creating separate calibration models for each
using the results of Section 3.5.4. The performance of the combined (local + estimated
regional) calibrated concentration was then compared with the sensor-observed ATM
and ALT concentrations, and with traditional MLR performance. This was done to test
the hypothesis behind our work, which suggests that estimating the regional
concentration and calibrating it separately from the local concentration could lead to
improved performance.
For CPA5, both the estimated regional and local concentrations (as well as the
total concentration) were tested against a single US EPA station. However, since
Metro Vancouver has multiple regulatory monitoring stations, for VPA1, the local
concentration was calibrated against the nearest Metro Vancouver station, while the
estimated regional concentration was calibrated against the median regional con-
centration, at each timestamp, across six different regulatory monitoring stations.
3.5.8 Performance Metrics
To evaluate the calibration models, three performance metrics were calculated on
the testing dataset (i.e., data that was withheld during the model-building process):
(1) the coefficient of determination (R²), (2) root mean square error (RMSE), and
(3) normalized RMSE (nRMSE). R² is a statistical measure for regression models that
quantifies the proportion of variance in the dependent variable that is explained by
the independent variables. A higher R² value indicates better agreement between
calibrated and expected values, with a maximum value of 1. The US EPA recommends
that the R² between a reference monitor and a calibrated sensor exceed 0.70 for
daily average concentrations
[70]. The root mean square error (RMSE), which is the standard deviation of pre-
diction errors, was calculated using Equation 3.11. A lower RMSE value indicates
better performance. Normalized root mean square error (nRMSE) provides the ap-
proximate mean normalized error of the model, with a lower value indicating better
performance. It was calculated using Equation 3.12. The US EPA recommends a
target RMSE between reference monitor and calibrated sensor of ≤ 7 µg m−3 or
target nRMSE of ≤ 30% for daily average concentrations [70].
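\[
\mathrm{RMSE} = \sqrt{\frac{\sum_{i=1}^{n} \left( \text{Calibrated value}_i - \text{Observed value}_i \right)^2}{n}} \tag{3.11}
\]
\[
\mathrm{nRMSE} = \frac{\mathrm{RMSE}}{\text{Average observed concentration}} \tag{3.12}
\]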
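Equations 3.11 and 3.12 can be written as two short Python functions. This is an illustrative sketch with hypothetical function names and made-up example values, not the analysis code used in this work.

```python
import math


def rmse(calibrated, observed):
    """Root mean square error between calibrated and observed values (Eq. 3.11)."""
    n = len(observed)
    return math.sqrt(sum((c - o) ** 2 for c, o in zip(calibrated, observed)) / n)


def nrmse(calibrated, observed):
    """RMSE normalized by the mean observed concentration (Eq. 3.12)."""
    return rmse(calibrated, observed) / (sum(observed) / len(observed))


# Made-up example values in ug/m3.
calibrated = [12.0, 18.0, 33.0]
observed = [10.0, 20.0, 30.0]
example_rmse = rmse(calibrated, observed)    # about 2.38 ug/m3
example_nrmse = nrmse(calibrated, observed)  # about 0.119, i.e. 11.9%
```

In this example the squared errors sum to 17 over three points, and the mean observed concentration is 20 µg m−3, which gives the values noted in the comments.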
3.6 Results and Discussion
3.6.1 Regression Model Selection
To determine the best model from the eight linear regression models tested, the
RMSE metric was used. We observed that including RH as an additive term led
to the lowest RMSE (Table 3.3), while maintaining parsimony, which is consistent
with other studies [21]. Therefore, for further examination of intra-city, inter-city
and validation test models, we used an MLR model with PA-observed concentrations
and RH as the predictors (Equation 3.5). Comprehensive performance evaluations
for each training and testing model can be found in Appendix A.4.
Table 3.3: Average PM2.5 model RMSE from the inter-city models.

Model            RMSE (µg m−3)
PM               10.70
PM+T             11.06
PM+RH            10.68
PM+DP            11.74
PM+T+RH          11.65
PM+T+DP          11.80
PM+RH+DP         11.70
PM+T+RH+DP       11.80

During our testing in Delhi and Lahore, we observed a considerable decline in
the performance of the calibration model for concentrations exceeding 100 µg m−3 .
Thus, we built a separate model for concentrations below 100 µg m−3 which demon-
strated better performance for both Delhi and Lahore. In Bengaluru and Kath-
mandu, no data exceeded 100 µg m−3 , making it a non-issue. For Chico, since less
than 2% of the data exceeded 100 µg m−3 (n=103 data-points), these points were
removed, and the models were developed for concentrations below 100 µg m−3 .
The model coefficients for the regression model for PM2.5 for different cities are
listed in Table 3.4.

Table 3.4: Regression Model Coefficients for different Cities.

City                      β0      β1     β3
< 100 µg m−3
Bengaluru                 8.80    0.64   -0.13
Chico                     2.54    0.69   -0.09
Kathmandu                 8.21    0.60   -0.16
Delhi (Monsoon)           13.33   1.35   -0.22
Delhi (Non-monsoon)       36.64   1.21   -0.52
Lahore (Monsoon)          22.53   0.92   -0.27
Lahore (Non-monsoon)      56.69   0.59   -0.41
> 100 µg m−3
Delhi (Non-monsoon)       67.00   1.26   -0.58
Lahore (Non-monsoon)      81.61   0.85   -0.70

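As a worked example of applying Equation 3.5 with the tabulated coefficients, the sketch below corrects a single estimated regional ATM reading. The function name and the input values are hypothetical; the sketch assumes ERC is in µg m−3 and RH is in percent, and it includes only the three non-IGB cities for brevity.

```python
# (beta0, beta1, beta3) for models below 100 ug/m3, copied from Table 3.4.
COEFFICIENTS = {
    "Bengaluru": (8.80, 0.64, -0.13),
    "Chico": (2.54, 0.69, -0.09),
    "Kathmandu": (8.21, 0.60, -0.16),
}


def corrected_regional_pm25(city, erc, rh):
    """Apply Corrected PM2.5 = beta0 + beta1 * ERC + beta3 * RH (Equation 3.5)."""
    b0, b1, b3 = COEFFICIENTS[city]
    return b0 + b1 * erc + b3 * rh


# Hypothetical reading: ERC of 50 ug/m3 at 40% relative humidity in Chico.
example = corrected_regional_pm25("Chico", erc=50.0, rh=40.0)
```

For this hypothetical reading the corrected value is 2.54 + 34.5 - 3.6 = 33.44 µg m−3, which shows how the slope and the negative RH term both pull the raw estimate downward.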
The intercept of the MLR model (β0) varied between models built in IGB and
non-IGB locations, and between monsoon and non-monsoon periods. Higher β0
and β1 values were observed in IGB models, compared to non-IGB models, and
during non-monsoon periods, likely due to overall higher concentrations experienced
in IGB during non-monsoon periods. Our model coefficients for the non-IGB
(Bengaluru, Chico, Kathmandu) regression models are similar in magnitude
and direction to those of a previous study conducted by the US EPA [21]; however,
our β1 and β3 values are up to 2 times higher (Barkjohn et al. [21]: β0 = 5.75,
β1 = 0.524, β3 = -0.0862). Similarly, our β1 and β3 for the IGB (Delhi, Lahore)
regression models are in the same direction as those of a study conducted in India
[42] but up to 2.5 times higher in magnitude (Campmier et al. [42]: β0 = 50.6,
β1 = 0.546, β3 = -0.936). These differences could be attributed to the fact that the
aforementioned studies constructed calibration models using the total CF1
concentration, whereas our models were constructed for estimated regional
concentrations of the ATM concentration.
3.6.2 Performance Evaluation of Intra-city Regional Concentration Models
Intra-city models were developed to assess each model’s performance when cor-
recting concentrations in a new dataset within the same city. The performance
of the calibration models, except for the Chico models, was assessed using the
LOMO cross-validation method, where the model was tested on one month of data
and trained on the rest of the year. In Chico, models were built for wildfire data
in either 2020 or 2021 and tested on the other. The evaluation metrics, includ-
ing R2 , RMSE, and nRMSE, for each sensor are reported in Table 3.5. Detailed
performance metrics of each city and PurpleAir can be found in Appendix A.5.
Overall, for models created for concentrations < 100 µg m−3 , applying the
calibration model resulted in improved performance, with an average improvement
in cross-validated nRMSE of 25%, although the R2 improvement was only 0.06. When
considering individual sensor performance, the nRMSE values were generally below
the US EPA recommended 30% across all sensors. However, the cross-validated
R2 values for all city-specific models, except Chico, were lower than the US EPA
guidelines, with an average of 0.55; an example of the pre- and post-calibration
data from one of the better performing intra-city models is shown as line and
scatter plots in Figure 3.3. Similarly, models created for concentrations
> 100 µg m−3 showed minimal improvement in R2 (0.05), but modest improvements in
nRMSE (17%). This discrepancy between R2 and nRMSE performance highlights the
systematic bias in PurpleAir data: other studies have shown that, due to
hygroscopicity, PurpleAirs tend to overestimate concentrations by approximately
40% [21, 199, 242].
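
The contrast between R2 and nRMSE can be illustrated with a small sketch. It assumes that R2 is the squared Pearson correlation and that nRMSE is the RMSE divided by the mean reference concentration; both definitions are assumptions here, since they are not restated in this section. The data are synthetic: a sensor that reads exactly 40% high.

```python
import math


def mean(xs):
    return sum(xs) / len(xs)


def r_squared(ref, obs):
    """Squared Pearson correlation (assumed definition of R2)."""
    mr, mo = mean(ref), mean(obs)
    cov = sum((r - mr) * (o - mo) for r, o in zip(ref, obs))
    var_r = sum((r - mr) ** 2 for r in ref)
    var_o = sum((o - mo) ** 2 for o in obs)
    return cov ** 2 / (var_r * var_o)


def rmse(ref, obs):
    return math.sqrt(mean([(o - r) ** 2 for r, o in zip(ref, obs)]))


def nrmse(ref, obs):
    """RMSE normalised by the mean reference concentration (assumed)."""
    return rmse(ref, obs) / mean(ref)


reference = [10.0, 20.0, 30.0, 40.0]       # synthetic reference data
sensor = [1.4 * r for r in reference]      # sensor with a constant 40% high bias
print(r_squared(reference, sensor), nrmse(reference, sensor))
```

A purely multiplicative bias leaves the correlation untouched while inflating the error, which is why a calibration can reduce nRMSE substantially with little change in R2.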
Table 3.5: Mean R2 , RMSE (µg m−3 ) and nRMSE values of the testing
dataset for the intra-city models, built for estimated regional ATM con-
centrations. ‘n’ is the number of test models (months; maximum = 12).
One model was created for each city and tested for each individual LCS
in the city and each month using the LOMO cross-validation method. Chico
models were trained using either 2020 or 2021 wildfire data and tested
on the other.

                      Observed ATM                Calibrated ATM
Sensor ID   n     R2      RMSE     nRMSE      R2      RMSE     nRMSE
< 100 µg m−3
BPA1        12    0.61    14.9     0.71       0.66    5.32     0.28
BPA2        12    0.59    7.07     0.34       0.63    5.36     0.27
CPA1        2     0.95    18.04    0.61       0.97    8.62     0.29
CPA2        2     0.94    18.82    0.64       0.96    8.61     0.29
CPA3        2     0.87    19.17    0.65       0.91    8.76     0.29
DPA1        10    0.60    28.11    0.59       0.70    13.73    0.30
DPA2        11    0.56    19.04    0.35       0.64    20.04    0.40
DPA3        11    0.46    24.95    0.48       0.56    11.53    0.24
KPA1        6     0.42    13.26    0.81       0.46    4.18     0.24
LPA1        10    0.39    20.99    0.37       0.47    14.3     0.24
LPA2        10    0.44    21.94    0.34       0.53    11.68    0.19
LPA3        2     0.25    27.98    0.40       0.37    11.17    0.16
LPA4        7     0.46    19.68    0.33       0.51    11.97    0.21
Average           0.58    19.53    0.51       0.64    10.41    0.26
> 100 µg m−3
DPA2        2     0.66    73.08    0.43       0.68    34.77    0.21
DPA3        1     0.69    88.78    0.52       0.70    31.63    0.19
LPA1        8     0.34    48.46    0.30       0.43    37.31    0.22
LPA2        6     0.41    42.68    0.32       0.47    23.94    0.17
LPA3        4     0.40    38.22    0.28       0.43    24.36    0.17
LPA4        3     0.45    49.32    0.30       0.51    33.12    0.18
Average           0.49    56.76    0.36       0.54    30.86    0.19

Consistent with previous studies [42], we observed seasonality in sensor
performance in Delhi and Lahore. Specifically, the models built using monsoon data
yielded better results than the non-monsoon models, with average improvements
in R2 of 0.22 in Delhi (monsoon model R2 = 0.79) and 0.16 in Lahore (monsoon
model R2 = 0.67). RMSE was also higher during non-monsoon periods, which
can be attributed to lower air pollution during the monsoon season and therefore
less influence of persistent air pollution in the area that might influence the
estimation of regional concentrations. Generally, models built for concentrations
above 100 µg m−3 had similar performance compared to IGB models built for
concentrations below 100 µg m−3 , and had improved performance when compared to
raw ATM concentration (R2 increased from 0.49 to 0.54 and nRMSE reduced from
36% to 19%, Table 3.5).

[Figure 3.3 image: two rows of panels (uncalibrated and calibrated ATM
concentration), each with a time series (2022-10-11 to 2022-10-31) and a scatter
plot against reference concentrations (µg/m3, 0 to 100); legend: reference
concentrations, 1:1 line, BPA1, BPA2.]
Figure 3.3: Line and scatter plots for variations in estimated regional
concentrations (black line) for raw ATM concentration and calibrated ATM
concentration of BPA1 and BPA2 for intra-city model.

[Figure 3.4 image: model performance plotted as R2 against nRMSE; legend:
Bengaluru, Chico, Kathmandu, Delhi, Lahore; intra-city models, inter-city models
(outline: training, fill: testing), other studies.]
Figure 3.4: Performance of our intra-city (square markers) and inter-city
(circle markers; outline is the training and fill is the testing city) models
with reference to the US EPA guidelines (R2 > 0.7 and nRMSE < 30%).
Cities are denoted by different colors. Models in the top-right quadrant
met the US EPA guidelines, whereas models in the bottom-left quadrant met
neither US EPA criterion. The model trained in Kathmandu and tested in
Bengaluru (orange circle with green outline) did not meet the US EPA nRMSE
criterion; however, it met the RMSE criterion (< 7 µg m−3 ). Triangle markers are
the average performance of models reported by other studies. Solid gray
and hollow gray markers are the average performance across all intra-city
and inter-city models for Campmier et al. [42], respectively. The
yellow marker with gray outline is approximately placed to represent
the average performance across all inter-city models by Zusman et al. [273].

Previous studies have explored the performance of intra-city models. A study
conducted by Campmier et al. [42] collocated PurpleAir PMS5003 sensors at three
different locations in India (Delhi, Hamirpur, and Bengaluru) and reported higher
R2 values ranging from 0.82 to 0.95 and nRMSE values ranging from 20% to 32%.
However, their approach involved collocating sensors at specific locations within
a city and did not spread out the LCS to create a single model; as such, the model
transferability was not tested with the intra-city models. V et al. [239] created
a city-specific model using the leave-one-location-out (LOLO) cross-validation
method for the Plantower PMS7003 sensor at multiple locations within Mumbai,
India, and reported low R2 (as low as 0.02) and very high RMSE (up to 85 µg m−3 ).
Our intra-city models had better performance in comparison. Figure 3.4 maps the
performance of our intra-city models for each city (square markers) relative to
the US EPA guidelines and to Campmier et al. [42]. As such, the results of
this approach provide confidence in the potential of using regional models for
LCS. This approach could be specifically beneficial in areas where direct access
to monitoring stations is limited, as a low-cost sensor can be deployed in different
locations within a city and calibrated for the estimated regional component of the
concentration.
3.6.3 Performance Evaluation of Inter-city Regional Concentration Models
Inter-city models were developed to evaluate the transferability of the regional cal-
ibration model from one location to another. To test the model, we applied the
regional calibration model to PA sensors in a different country, stratified by IGB
and non-IGB locations (Figure 3.2). The evaluation metrics, including R2 , RMSE,
and nRMSE, for each sensor are reported in Table 3.6. Detailed performance
metrics of each model can be found in Appendix A.6.

Table 3.6: Mean (standard deviation) of R2 , RMSE (µg m−3 ) and nRMSE of
inter-city calibration models for sensor observed and calibrated ATM and
ALT concentrations.

                              Observed                   Calibrated
Testing      Training     R2      RMSE     nRMSE     R2      RMSE     nRMSE
< 100 µg m−3 ATM Concentration
Bengaluru    Chico        0.81    11.67    0.56      0.86    6.44     0.3
Bengaluru    Kathmandu    0.81    11.67    0.56      0.84    6.94     0.32
Chico        Bengaluru    0.92    19.19    0.62      0.93    6.86     0.22
Chico        Kathmandu    0.92    19.19    0.62      0.93    6.45     0.21
Kathmandu    Bengaluru    0.84    16.66    0.54      0.89    5.86     0.19
Kathmandu    Chico        0.84    16.66    0.54      0.89    4.42     0.14
Lahore       Delhi        0.19    28.5     0.45      0.31    18.97    0.32
Delhi        Lahore       0.55    23.28    0.48      0.62    14.84    0.32
Average                   0.74    18.35    0.55      0.78    8.85     0.25
< 100 µg m−3 ALT Concentration
Bengaluru    Chico        0.84    7.99     0.38      0.86    5.81     0.27
Bengaluru    Kathmandu    0.84    7.99     0.38      0.88    5.55     0.26
Chico        Bengaluru    0.9     10.79    0.35      0.93    6.86     0.22
Chico        Kathmandu    0.9     10.79    0.35      0.9     8.13     0.27
Kathmandu    Bengaluru    0.81    6.76     0.22      0.86    5.25     0.17
Kathmandu    Chico        0.81    6.76     0.22      0.86    5.69     0.18
Lahore       Delhi        0.21    31.98    0.5       0.23    26.39    0.47
Delhi        Lahore       0.56    30.12    0.63      0.58    15.23    0.33
Average                   0.73    14.15    0.38      0.76    9.86     0.27
> 100 µg m−3 ATM Concentration
Lahore       Delhi        0.62    43.36    0.28      0.68    73.84    0.48
Delhi        Lahore       0.50    64.66    0.43      0.37    43.54    0.29
> 100 µg m−3 ALT Concentration
Lahore       Delhi        0.10    36.96    0.55      0.62    49.70    0.32
Delhi        Lahore       0.51    74.92    0.49      0.48    40.85    0.27

When averaged across all models created for estimated regional ATM concen-
trations < 100 µg m−3 , inter-city models met the US EPA guideline for sensor
performance, with average R2 of 0.78 and nRMSE of 25% (improvement of 0.04
and 30% respectively when compared to uncalibrated ATM concentrations). Mod-
els built in non-IGB locations (Bengaluru, Chico, Kathmandu) exhibited strong
performance and all models met the US EPA guideline for sensor performance,
with an average cross-validated R2 at 0.89 and an average cross-validated nRMSE
of 23%; an example of the pre- and post-calibration data is shown in Figure 3.5.
Interestingly, applying a model from a different city did not result in worsened
performance (when compared to intra-city models). This suggests that the calibra-
tion models developed for estimated regional concentrations can be transferable to
other cities, even across continents.
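
The inter-city test amounts to fitting the regression in one city and applying the fitted coefficients, unchanged, to data from another. The sketch below illustrates that workflow with a small ordinary-least-squares solver. The two predictors (sensor concentration and relative humidity), the function names and the synthetic data are assumptions for illustration and do not reproduce the models or data of this work.

```python
def fit_mlr(rows, targets):
    """Ordinary least squares for y = b0 + b1*x1 + b2*x2 via normal equations."""
    X = [[1.0, x1, x2] for x1, x2 in rows]
    n = 3
    # Augmented normal-equation matrix [X^T X | X^T y].
    A = [[sum(r[i] * r[j] for r in X) for j in range(n)]
         + [sum(r[i] * y for r, y in zip(X, targets))] for i in range(n)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(n):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][n] / A[i][i] for i in range(n)]


def predict(coeffs, rows):
    b0, b1, b2 = coeffs
    return [b0 + b1 * x1 + b2 * x2 for x1, x2 in rows]


# Synthetic "training city": (sensor PM2.5, RH %) pairs with a known relationship.
train_rows = [(20.0, 40.0), (35.0, 60.0), (50.0, 55.0), (65.0, 70.0), (80.0, 45.0)]
train_ref = [5 + 0.6 * s - 0.1 * rh for s, rh in train_rows]
coeffs = fit_mlr(train_rows, train_ref)

# Synthetic "testing city": apply the trained coefficients without refitting.
test_rows = [(30.0, 50.0), (60.0, 65.0)]
test_pred = predict(coeffs, test_rows)
```

In practice the predictions for the testing city would be compared against that city's own reference concentrations using R2, RMSE and nRMSE.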
In contrast, models built in the IGB region demonstrated lower R2 and higher
error (nRMSE: 32%), with the Delhi dataset performing worse (R2 = 0.31) than
the Lahore dataset (R2 = 0.62), and neither model met the US EPA criteria. As
with the intra-city models, the monsoon models resulted in higher R2 values (0.69)
compared to the non-monsoon models (0.39). However, both RMSE and nRMSE
were lower for the non-monsoon models. Applying the Lahore model to the Delhi
dataset rendered similar outcomes to the intra-city models for Delhi (R2 and nRMSE
for intra-city and inter-city models were approximately equal; Figure 3.4). How-
ever, although the performance of the Lahore dataset after applying the Delhi
model outperformed the raw ATM concentration, the performance was worse when
compared to the intra-city model for Lahore (R2 decreased by 0.16 and nRMSE in-
creased by 12%). These findings suggest that the inter-city models built in the
IGB region may have some potential for transferability, but the estimated regional
concentrations could be influenced by persistent air pollution sources specific to
the region. Furthermore, it is worth noting that this work uses existing PA
network data, and due to the high levels of air pollution in the IGB region, the
sensors may have experienced enhanced degradation over time. This wear-out effect
could require calibration models specific to the individual sensors, limiting
their transferability.

[Figure 3.5 image: two rows of panels (uncalibrated and calibrated ATM
concentration), each with a time series (2020-08-21 to 2020-10-10) and a scatter
plot against reference concentrations (µg/m3, 0 to 100); legend: reference
concentrations, 1:1 line, CPA1, CPA2, CPA3.]
Figure 3.5: Line and scatter plots for variations in estimated regional
concentrations (black line) for raw ATM concentration and calibrated ATM
concentration of CPA1, CPA2 and CPA3 sensors when trained in Bengaluru.

Upon testing the ALT concentration to further calibrate the dataset for
concentrations < 100 µg m−3 , we observed a modest improvement in performance
metrics. Specifically, the R2 value increased from 0.73 to 0.76 and nRMSE
decreased from 38% to 27% (Table 3.6), so that the average performance across all
models met the US EPA criteria after calibrating the ALT concentrations.
Calibrated ATM concentrations had slightly better performance compared to
calibrated ALT concentrations (nRMSE improvement by 2%; Table 3.6), especially for
locations outside of the IGB.
We also built the model for ALT and ATM estimated regional concentrations
exceeding 100 µg m−3 (observed during non-monsoon periods only) by training in
Delhi and testing in Lahore and vice versa. The models built for ALT concentra-
tions had improved performance (R2 improved by 0.25 and nRMSE improved by
0.22, Table 3.6), when compared to the raw ALT concentration. However, models
built for ATM concentrations produced mixed and inconsistent results:
neither the Delhi nor the Lahore dataset met the US EPA criteria for R2 and only
the model trained in Lahore met the nRMSE criteria. One factor contributing to
the varying results may be the limited instances of estimated regional concentra-
tions exceeding 100 µg m−3 in Delhi; this may have affected the performance of
the calibration model. To gain a better understanding of the behavior of regional
estimates and the transferability of the model at very high concentrations, further
data collection and analysis are necessary.
Our inter-city model tested in non-IGB locations resulted in comparable or bet-
ter performance than the inter-city models previously built by Zusman et al. [273]
for the USA (model R2 : 0.67-0.80; RMSE: 3.41 µg m−3 ). Moreover, we tested our
model across continents, reinforcing the advantages of this method. Campmier et al.
[42] noted the poorest performance of Delhi models when tested in another rural
location in the IGB (nRMSE = 39%), which is similar to our findings. Nonetheless,
our models built in Lahore exhibited comparable performance to Campmier et al.
[42] (lower R2 and nRMSE compared to average inter-city model performance by
the study). Figure 3.4 maps the performance of our inter-city models for each city
(circle markers) relative to the US EPA guidelines, and with Campmier et al. [42]
and Zusman et al. [273].
3.6.4 Validation Testing of the Method
We further assessed the concept behind our work by building separate calibra-
tion models for estimated regional and local concentrations for PAs in Vancouver,
Canada and Yreka, California. These models were then compared to the sensor-
observed ATM and ALT concentrations, and traditional MLR models. Overall,
separate MLR models resulted in improved performance when compared to sensor-
observed concentrations, with R2 improvement of up to 0.12 and RMSE reduction
by up to 8 µg m−3 (Table 3.7). Moreover, separate regional and local MLR models
resulted in similar (CPA5) or better (VPA1) performance of both R2 and RMSE
compared to traditional MLR. Notably, this improvement was more significant for
VPA1 where the median regional concentration was determined from six moni-
toring stations. As such, it highlights the effectiveness of our approach and sup-
ports our hypothesis that regional concentrations exhibit consistency throughout a
region, allowing for the construction of a model without the need for direct collo-
cation with regulatory stations.
Table 3.7: Performance comparison of observed ATM and ALT concentration,
with traditional MLR and separate regional and local MLR models,
when tested in Vancouver (VPA1) and Yreka (CPA5).

                                      R2             RMSE (µg m−3)     nRMSE
Model Type                        VPA1    CPA5     VPA1    CPA5      VPA1    CPA5
ATM concentration                 0.51    0.72     7.32    11.53     1.33    1.38
ALT concentration                 0.49    0.74     4.39    4.31      0.80    0.51
Traditional MLR                   0.53    0.72     3.29    3.46      0.59    0.41
Separate Regional & Local MLR     0.63    0.73     2.75    3.46      0.50    0.41

The performance of low-cost PM2.5 calibration models has varied across different
studies, in both the global north and south, and a comparative analysis can
be found in Appendix A.7. In the global south, Bai et al. [18] calibrated hourly
data collected in Nanjing, China, using the Shinyei PPD42NS sensor and reported
a model R2 value of 0.75. Meanwhile, Malyan et al. [143] and Jha et al. [111]
calibrated hourly data collected in Mumbai, India, from the Plantower PMS5003
and PMS7003 sensors, respectively, and reported varying performance, with the
former achieving a model R2 of 0.49 and the latter reporting a model R2 of 0.75. In
the global north, Malings et al. [142] developed a calibration model for hourly data
collected in Pittsburgh, USA, from the Plantower PMS5003 sensor, and reported
an average model R2 of 0.52, while Zusman et al. [273] created calibration models
across various cities in the USA for daily average concentrations, with
reported R2 values ranging from 0.74 to 0.95 and an average root mean square error
(RMSE) of 2.46 µg m−3 . Given these comparisons, our decomposed concentration
model (R2 : 0.63-0.73, RMSE: 2.75-3.46 µg m−3 ) performs comparably to other
relevant studies in the field.
3.7 Conclusion
One of the main challenges with wide adoption of LCS is a lack of geographically
transferable models. With this work, we created intra- and inter-city calibration
models for estimated regional concentrations that relied solely on openly available
regulatory data and tested the transferability of the model in the same city and
across continents, respectively.
The findings of this study provide evidence that calibration models built
separately for the regional and local components of the total signal can lead to
more transferable models, while having comparable performance to models built via
traditional
MLR. The comparable performance of intra-city and inter-city models, as well
as their performance compared to other studies, highlights the potential scope of
deploying sensors in areas with limited regulatory stations or restricted access to
monitoring stations.
There are several limitations to this work. Firstly, it is exploratory and uses
limited datasets from specific locations. For a general calibration model, a wider
network of data needs to be tested. Secondly, this work heavily focuses on the
estimated regional component of the total concentration. Calibration models for
the local concentration may need to be developed to account for variations in pol-
lution chemistry or other factors that may affect sensor performance [104, 179].
Thirdly, the results reported here heavily focus on estimated regional concentra-
tions below 100 µg m−3 . More analysis needs to be done to test the transferability
of estimated regional models at very high concentrations. Finally, when estimating
the proportion of regional to total concentrations, our approach assumed similarity
across regions regarding the contribution of organic and inorganic aerosols to total
concentrations. Calibration models may need to be tested for areas where this as-
sumption might not hold true, such as in regions like western China characterized
by substantial mineral or desert dust [44, 133]. Addressing this aspect stands as a
recommended area for future research.
Chapter 4
Spatial modelling of daily PM2.5, NO2 and CO concentrations measured by a
low-cost sensor network: Comparison of linear, machine learning, and hybrid
land use models

This chapter contains a paper published in the peer-reviewed journal Environmental
Science & Technology. Reprinted with permission from Environ. Sci. Technol.
2021, 55, 13, 8631–8641. Publication Date: June 16, 2021. https://doi.org/10.1021/acs.est.1c02653.
Copyright 2021 American Chemical Society. The Supporting Information from
this paper is presented in Appendix B. This work uses the data collected by 50
RAMPs in Allegheny County to build LUR and LURF spatiotemporal models.
S. Jain, A. Presto, and N. Zimmerman (2021). Spatial Modeling of Daily
PM2.5 , NO2 , and CO Concentrations Measured by a Low-Cost Sensor Network:
Comparison of Linear, Machine Learning, and Hybrid Land Use Models. Environ-
mental Science & Technology, 2021 55 (13), 8631-8641.

SJ conducted the literature review and analysis for this work, developed the figures
and wrote the manuscript. NZ acquired the funding for this project, and provided
critical feedback during all stages of the project and the manuscript. AP provided
the original low-cost sensor data used in this work and provided critical feedback
for the manuscript.
4.3 Summary
Previous studies have characterized spatial patterns of pollution with land use re-
gression (LUR) models from distributed passive or filter samplers at low temporal
resolution. Large-scale deployment of low-cost sensors (LCS), which typically
sample in real time, may enable time-resolved or real-time modelling of con-
centration surfaces. The aim of this study was to develop spatiotemporal mod-
els of PM2.5 , NO2 and CO using an LCS network in Pittsburgh, Pennsylvania.
We modelled daily average concentrations in August 2016–December 2017 across
50 sites. Land use variables included 13 time-independent (e.g., elevation) and
time-dependent (e.g., temperature) predictors. We examined two models: LUR
and a machine-learning-enabled land use model (land use random forest, LURF).
The LURF models outperformed LUR models, with increase in average externally
cross-validated R2 of 0.10-0.19. Using wavelet decomposition to separate short-
lived events from the regional background, we also created time-decomposed LUR
and LURF models. Compared to the standard model, this resulted in improvement
in R2 of up to 0.14. The time-decomposed models were more influenced by spa-
tial parameters. Mapping our models across Allegheny County, we observed that
time-decomposed LURF models created robust PM2.5 predictions, suggesting that
this approach may improve our ability to map air pollutants at high spatiotemporal
resolution.
4.4 Introduction
Over 92% of the global population lives in areas with unhealthy air [93]. Exposure
to particulates and air pollutants is associated with several negative health
outcomes (e.g., cardiovascular disease and lower respiratory infections), resulting
in an estimated 4.2 million premature deaths globally in 2015 [253]. Therefore, fine
particulate matter (PM2.5 ), nitrogen dioxide (NO2 ) and carbon monoxide (CO) are
all criteria air pollutants regulated by the United States Environmental Protection
Agency (US EPA) [230].
Regulatory monitoring networks, such as the US EPA Air Quality System, are
a common tool used to monitor compliance with ambient air pollution standards.
However, regulatory-grade instruments are expensive to purchase and maintain,
and stations are usually sparsely distributed. Numerous studies have shown that
pollutant concentrations have small-scale spatial variations that are not captured
by regulatory networks [71, 130, 244]. These spatial variations create variations in
human pollutant exposures and resultant health impacts [66, 109].
To capture and quantify these small-scale spatial variations in pollutant con-
centrations, interest has risen in low-cost sensor networks due to improved sensor
technologies becoming available and new methods for sensor calibration being de-
veloped [26, 156, 182, 210, 271]. A significant benefit of using low-cost sensors
is that their small size and low power demand allow them to be deployed in places
where limited monitoring currently exists. Another benefit is that low-cost sensors
typically operate at high time resolution (hourly or better), enabling the quantifica-
tion of both spatial and temporal variations in pollutant concentrations.
However, it is impossible to sample all areas at all times in a given geographic
area. Thus, spatially distributed pollutant measurements are often used to train
spatial models for predicting pollutant concentrations in unsampled locations. A
common spatial model is the land use regression (LUR) model. LURs use spatial
covariates as predictor variables in a multivariate regression to predict the concen-
tration of different air pollutants [195]. The predictors for these models include
parameters such as surrounding land use, traffic, and meteorological data within
certain circular buffers [88, 102, 248].
LUR models typically assume a linear correlation between pollutant concen-
trations and different predictors. This has several consequences, as many of the
processes that determine pollutant concentrations are non-linear [69, 118]. One
approach to account for non-linearities is to include non-linear variables, such as
inverse distances from roadways, as predictors. While this approach can reason-
ably account for spatial patterns driven by dilution away from an emission source
[219], it cannot account for the effects of one predictor variable interacting with
another. Interactions between predictors are difficult to resolve without overfit-
ting [71]. Another limitation of linear models is that they often suffer from poor
transferability; models built for a specific city, or even part of a city, are often not
transferable outside of the model training region [181, 186]. This occurs because
local models are essentially tuned to the model training domain; this tuning can be
interpreted as a type of overfitting [17]. Furthermore, pollutant concentration at
any spatial point is an aggregate of local sources and regional background concen-
trations. The latter remains relatively unchanged from one site to another in a city
and thus including regional background concentrations may dampen the relation-
ship between the site specific predictors and concentrations [272].
To combat the issues that arise when LUR is used, non-parametric techniques,
such as artificial neural networks (ANNs) and random forests, have been used
to predict air pollutant spatial and temporal patterns, as they do not assume a
functional form to the relationship between pollutant data and predictor variables
[11, 63]. Land Use Random Forests (LURF), an alternative to LURs, apply the
machine learning algorithm of random forests to land use models. LURF models
have a set of training data that is fed into the system, which uses decision trees
based on land use covariates to predict the pollutant concentration.
The advantage of using LURF is that it does not assume a linear relation be-
tween inputs (land use covariates) and outputs (pollutant concentrations) and can
capture complex relationships between the predictor variables and the prediction
values. This can be achieved even for small training datasets as a consequence of
bootstrapping for processing. The disadvantages of these models are their inabil-
ity to determine the magnitude and direction of the effect of an individual variable
[34] and the model’s inability to predict concentrations outside of the concentra-
tions encountered within the training dataset [95]. Additionally, similar to LUR
models, LURF models suffer from poor model transferability.
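
The extrapolation limitation noted above follows from how tree-based models predict: each leaf returns an average of training targets, so no prediction can exceed the largest concentration seen in training. The sketch below illustrates this with a single one-split regression tree as a simplified stand-in for a random forest, compared with a straight-line fit. The data are synthetic and the function names are hypothetical.

```python
def fit_stump(xs, ys):
    """One-split regression tree: pick the threshold with the lowest squared error."""
    best = None
    for t in sorted(set(xs))[1:]:
        left = [y for x, y in zip(xs, ys) if x < t]
        right = [y for x, y in zip(xs, ys) if x >= t]
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        sse = sum((y - ml) ** 2 for y in left) + sum((y - mr) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t, ml, mr)
    _, t, ml, mr = best
    # Each leaf predicts the mean of the training targets that fell into it.
    return lambda x: ml if x < t else mr


def fit_line(xs, ys):
    """Simple linear regression, which can extrapolate beyond the training range."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return lambda x: my + slope * (x - mx)


xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.0 * x for x in xs]               # synthetic linear relationship
tree, line = fit_stump(xs, ys), fit_line(xs, ys)
print(tree(12.0), line(12.0))            # predictor value outside the training range
```

This bounded behaviour is the motivation for pairing a linear model with the random forest in a hybrid approach.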
In this work, we constructed LUR and LURF models to predict daily (24-hr)
average PM2.5 , NO2 and CO concentrations measured by a network of low-cost
sensors deployed in Pittsburgh from August 2016 – December 2017. A combined
hybrid model using LUR-LURF methodologies was also developed and tested to
circumvent the inability of random forests to extrapolate at concentrations outside
those encountered in the training data. To combat the issue of model transferability,
we also built LUR and LURF models after removing regional background, and the
remaining signal was decomposed into different frequencies: short-lived events
(events<2h), longer-lived events (2-8h), and persistent enhancements above the
regional background. A unique LUR and LURF was constructed for each of these
time frequency signals and then they were reconstructed into a single concentration
estimate.
4.5 Materials and Methods

We used PM2.5 , NO2 and CO concentration data collected by a network of low-cost
sensors to build and evaluate LUR, LURF and LUR-LURF spatial models. Models
were built for daily (24 h average) concentrations for standard and time-frequency
decomposed signals (details in Section 4.5.2). The leave-one-location-out
cross-validation (LOLOCV) method was adopted to test the models for performance
[116, 246]. In this cross-validation method, data is split into training and testing
sets based on their monitoring location. For each iteration, the model is trained
on all sites but one, and the left-out site acts as the testing data for performance
evaluation. The model is iterated as many times as there are unique locations, until
all sites have been withheld as the testing site (here, n = 50). The LOLOCV method
was adopted to externally validate the dataset at a spatially unique location.
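
The LOLOCV split described above can be sketched as follows. This is a minimal illustration rather than the implementation used in this work; the observation layout (a "site" key per record) and the function name are hypothetical, and the data are synthetic.

```python
def leave_one_location_out(observations):
    """Yield (held_out_site, train, test), withholding each site exactly once."""
    sites = sorted({obs["site"] for obs in observations})
    for held_out in sites:
        train = [o for o in observations if o["site"] != held_out]
        test = [o for o in observations if o["site"] == held_out]
        yield held_out, train, test


# Synthetic example: four sites with three daily averages each.
observations = [{"site": f"S{i}", "day": d, "pm25": 8.0 + i + d}
                for i in range(4) for d in range(3)]
folds = list(leave_one_location_out(observations))
```

With 50 sites, the generator would yield 50 folds, each testing the model at a location it has never seen.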
4.5.1 Measurement Details
Figure 4.1 shows the locations of the 50 sampling sites used for model training and
testing (orange hexagons). The sites were deployed as part of the Center for Air,
Climate, and Energy Solutions (CACES) air quality monitoring network [272], which
has been described fully in a previous publication [220].
All sampling sites were equipped with a low-cost pollutant monitoring system,
the Real-Time Affordable Multi-Pollutant sensor (RAMP, SENSIT Technologies)
[271]. The RAMP uses a commercial nephelometer (either a Met-One Neighbor-
hood Monitor or a PurpleAir PA-II) to measure PM2.5 and electrochemical sensors
(Alphasense CO-B4 and Alphasense NO2-B43F) for CO and NO2 . CO and NO2
signals were converted to concentration using the calibration approaches described
by Zimmerman et al. [271] and Malings et al. [141]. PM2.5 data were corrected
(e.g., for humidity) following the approach of Malings et al. [142]. Details on
the calibrations applied to the RAMP data are extensively provided in Appendix
B.1. Data were reported at 15-minute resolution over a period from August 2016
to December 2017, resulting in more than 25,000 unique 24 h average data
points. The range of measured CO, NO2 and PM2.5 concentrations across all 50
sites measured during this campaign are reported in Appendix B.2.
4.5.2 Standard and Decomposed Concentration Data Processing
Figure 4.1: The location of sampling sites and regulatory monitor sites.
Orange markers show the 50 locations. The purple markers show the
locations of EPA regulatory grade monitors for PM2.5 and CO. The map
is divided into municipalities and map shading becomes darker with
increasing population density.

The 15-min data reported for each site were first averaged down to daily
concentrations. We then defined the ‘standard’ signal as the original calibrated
daily average concentration from the deployment. The lower Limit of Detection
(LoD) was identified as the smallest concentration that can be reliably measured.
To determine the
LoD for each pollutant, we first ran 20 unique LURF standard signal models using
5 random sites as testing data and the remaining 45 sites as training data. Using the
predicted concentrations from these model runs and their observed concentrations,
we calculated the error fraction using the following equation:

\[ \text{Error Fraction} = \frac{\text{Predicted concentration} - \text{Observed concentration}}{\text{Observed concentration}} \tag{4.1} \]

The calculated error fractions were plotted against decile bins of observed
concentrations (Figure B.5, Appendix B.3). The bin with the error fraction closest
to zero was selected (and subsequently rounded) as the limit of detection: the 20th
percentile for PM2.5 (7 µg m−3 ) and the 30th percentile for NO2 and CO (6.5 ppb
and 200 ppb, respectively).
For each pollutant, models were trained only with daily average concentrations
exceeding the LoD limits. We hypothesize that this would improve model
performance, as measurements deemed unreliable are removed.
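
The LoD selection can be sketched as below. Several details are assumptions made for illustration: each decile bin is summarised by its mean error fraction, the bin is reported by its upper percentile and upper concentration edge, and the data are synthetic. The procedure used in this work may differ in these choices.

```python
def error_fraction(predicted, observed):
    """Equation 4.1: (predicted - observed) / observed."""
    return (predicted - observed) / observed


def decile_error_fractions(observed, predicted):
    """Summarise the error fraction within each decile bin of observed values."""
    pairs = sorted(zip(observed, predicted))
    n = len(pairs)
    bins = []
    for k in range(10):
        chunk = pairs[k * n // 10:(k + 1) * n // 10]
        fractions = [error_fraction(p, o) for o, p in chunk]
        bins.append({"percentile": (k + 1) * 10,
                     "upper_edge": chunk[-1][0],
                     "mean_error_fraction": sum(fractions) / len(fractions)})
    return bins


def pick_lod(bins):
    """Select the bin whose mean error fraction is closest to zero."""
    return min(bins, key=lambda b: abs(b["mean_error_fraction"]))


# Synthetic data: over-prediction at low concentrations, slight under-prediction above.
observed = [float(i) for i in range(1, 21)]
predicted = [0.9 * o + 0.95 for o in observed]
lod_bin = pick_lod(decile_error_fractions(observed, predicted))
```

Observations below the selected concentration would then be excluded before model training.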
The time decomposed signals (henceforth referred to as ‘decomposed’ signals)
were determined by applying wavelet decomposition to the original 15 min concen-
trations to isolate the short-lived (<2 h), longer-lived events (2-8 h), and persistent
enhancements above regional background concentrations. Details of the wavelet
decomposition are described in Zimmerman et al. [272] and summarized in Ap-
pendix B.4. The regional background was defined dynamically as the site with the
lowest concentration after removing the short and long-lived events from the data.
The separated short-lived event, long-lived event and persistent enhancement sig-
nals were then converted to individual daily averages and modeled independently.
This process is summarized by the following steps:
1. Wavelet decomposition was performed on original, calibrated, 15-min con-
centrations. This resulted in three signal components - short-lived events,
longer-lived events and persistent enhancement. Regional background con-
centrations were calculated dynamically as the lowest persistent enhance-
ment at any given time.
2. The four components of the signal were each averaged into daily average
concentrations.
3. Daily average concentrations were also calculated for the ‘standard’ (non-
decomposed) total signal.
4. Separate LUR and LURF prediction models were developed for the standard,
short-lived, long-lived and persistent signals.
5. The ‘decomposed’ signal prediction was calculated as LUR(F)_persistent +
LUR(F)_long-lived event + LUR(F)_short-lived event + regional background.
6. The ‘standard’ and ‘decomposed’ models were tested for performance on the
withheld sites.
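The recombination in steps 1 and 5 can be sketched as follows. This is a minimal illustration, assuming the per-site baseline (the signal left after removing short- and long-lived events) is already available as daily values; the function and argument names are hypothetical and the wavelet decomposition itself is not shown.

```python
def regional_background(baseline_by_site):
    """Lowest baseline value across all sites at each time step.

    baseline_by_site maps a (hypothetical) site name to a list of values,
    one per time step, all lists having the same length.
    """
    return [min(values) for values in zip(*baseline_by_site.values())]


def decomposed_prediction(persistent, long_lived, short_lived, background):
    """Step 5: add the three component predictions to the regional background."""
    return [p + l + s + b
            for p, l, s, b in zip(persistent, long_lived, short_lived, background)]
```

Each of the three component lists would come from its own LUR or LURF model; only the final summation is shown here.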
4.5.3 Predictor Variables
Land-use-based models were built using a combination of static and time-varying
predictor variables. Static predictors used for model building were chosen from
a previous PM2.5 LUR built for Pittsburgh by Li et al. [130] and included land
use variables often found in LUR models, such as population, traffic and land use
covariates. Appendix B.5 defines each of the land use predictors used and the
buffer zones taken into consideration.
To build models representing daily average concentrations, we included time-varying
meteorological variables such as daily average wind speed, average temperature,
and total daily precipitation. The US EPA regulatory measurements of
CO and PM2.5 (Figure 4.1, purple markers) were also included to account for
day-to-day variations in regional pollutant concentrations (e.g., from shifts in the
atmospheric boundary layer). These measurements may also account for day-of-week
changes in emissions (e.g., due to weekday-weekend traffic patterns). Static
predictor variables for each site were also extracted from various published datasets
and are outlined in Appendix B.6. Different buffer sizes were considered for each
of the static land use variables. Since different buffer sizes for the same variable
are correlated, only one buffer size for a given variable was permitted in the final
model. The buffer with the highest adjusted R2 was chosen.
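The buffer choice can be illustrated with a small sketch: compute the adjusted R2 for a single-variable model at each buffer size and keep the largest. The buffer radii and R2 values in the usage note are invented for illustration, and the helper names are hypothetical.

```python
def adjusted_r2(r2, n_obs, n_predictors):
    """Adjusted R2 = 1 - (1 - R2) * (n - 1) / (n - p - 1)."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)


def best_buffer(r2_by_buffer, n_obs, n_predictors=1):
    """Return the buffer size whose model has the highest adjusted R2.

    r2_by_buffer maps a buffer size (e.g. radius in metres) to the plain R2
    of a model fitted with that buffer's version of the variable.
    """
    return max(r2_by_buffer,
               key=lambda b: adjusted_r2(r2_by_buffer[b], n_obs, n_predictors))
```

For example, `best_buffer({100: 0.21, 500: 0.34, 1000: 0.29}, n_obs=45)` picks the 500 m buffer under these made-up values.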
4.5.4 Land Use Regression (LUR)
The methodology used for model building and evaluation was a supervised step-
wise regression (combination of forward and backward regression), in which ev-
ery predictor is evaluated separately and provided a direction for regression. For
instance, elevation is negatively correlated with PM2.5 concentration (higher con-
centrations in river valleys) and traffic variables are positively correlated with CO
concentration (higher concentration near busy roads). Models were selected based
on the concept of parsimony.
Four basic rules were applied to variable selection: (1) a new predictor was
added to the model only if it increased the model’s adjusted R2 value by more than
0.01 and had the correct correlation direction; (2) predictor variables with p-values
>0.1 were removed; (3) predictor variables with variance inflation factor >3 were
removed; and (4) predictors with Cook’s D value >1 were removed if large changes
in predictor’s coefficients were observed when a site was removed [34, 130].
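Rules (1) to (3) can be written as simple checks, sketched below. This is only an illustration of the thresholds stated above; the function names are hypothetical, the model fitting that produces the inputs is not shown, and rule (4) (Cook's D) is omitted because it requires refitting with sites removed.

```python
def keep_new_predictor(adj_r2_before, adj_r2_after, coefficient, expected_sign):
    """Rule 1: adjusted R2 gain above 0.01 and coefficient in the expected direction.

    expected_sign is +1 or -1 (e.g. -1 for elevation in a PM2.5 model).
    """
    gains_enough = (adj_r2_after - adj_r2_before) > 0.01
    right_direction = coefficient * expected_sign > 0
    return gains_enough and right_direction


def predictors_to_drop(p_values, variance_inflation):
    """Rules 2 and 3: drop predictors with p > 0.1 or variance inflation factor > 3."""
    return sorted(name for name in p_values
                  if p_values[name] > 0.1 or variance_inflation[name] > 3)
```

The argument is named `variance_inflation` rather than VIF to avoid confusion with the variable importance factor used later in this chapter.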
Variables in the final model were analyzed and evaluated using the variable
importance factor (VIF). For regression, VIF represents the absolute value of the
t-statistic for each variable. The higher the VIF, the higher the contribution to the
prediction capability of the model. Relative VIF is defined as the relative importance
of a variable with respect to the other variables (with the total contribution by all the
variables equaling 100%).
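A small worked example of relative VIF for the regression case: take the absolute t-statistic of each variable and rescale so the values sum to 100%. The variable names and t-statistics in the usage note are hypothetical.

```python
def relative_importance(t_statistics):
    """Scale |t| for each variable so that all contributions sum to 100%."""
    magnitudes = {name: abs(t) for name, t in t_statistics.items()}
    total = sum(magnitudes.values())
    return {name: 100.0 * m / total for name, m in magnitudes.items()}
```

With made-up t-statistics of 6.0, -3.0 and 1.0, the relative contributions are 60%, 30% and 10%.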
To validate our models, we calculated several performance metrics after ap-
plying our models at the withheld testing sites (i.e., not used in training). The
first was the coefficient of determination (R2 ) calculated from a linear least squares
regression of predicted concentrations versus observed concentrations. We also
calculated mean absolute error (MAE), which is the average of the absolute
differences between the model prediction and the measured concentration. The coefficient
of variation of the MAE (CvMAE), which is MAE normalized by the average
concentration, was also computed to compare differences between the models.
CvMAE can be interpreted as the relative error of the model. The reported results
in this study provide R2 , MAE and CvMAE on the hold-out (testing) site.
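The three metrics can be sketched in a few lines. MAE is taken here as the mean of the absolute differences (its standard definition), R2 as the squared Pearson correlation (equivalent to the R2 of a simple least-squares line of predicted versus observed), and CvMAE as MAE divided by the mean observed concentration. The function names are illustrative.

```python
from statistics import mean


def r_squared(observed, predicted):
    """R2 of a least-squares line of predicted vs. observed (squared Pearson r)."""
    mo, mp = mean(observed), mean(predicted)
    sxy = sum((o - mo) * (p - mp) for o, p in zip(observed, predicted))
    sxx = sum((o - mo) ** 2 for o in observed)
    syy = sum((p - mp) ** 2 for p in predicted)
    return sxy ** 2 / (sxx * syy)


def mae(observed, predicted):
    """Mean absolute error: average of |predicted - observed|."""
    return mean(abs(p - o) for o, p in zip(observed, predicted))


def cvmae(observed, predicted):
    """MAE normalized by the mean observed concentration (a relative error)."""
    return mae(observed, predicted) / mean(observed)
```

For observed values of 8, 10 and 12 and predictions of 9, 10 and 14, MAE is 1.0 and CvMAE is 0.1 (10%).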
4.5.5 Land Use Random Forest and Hybrid Models
For LURF model building on the training dataset, we used a 10-fold cross valida-
tion and terminated at 1000 trees with minimum terminal node size of 1. More
details about random forests can be found in Appendix B.7.
For random forests, VIF explains the prediction power of each variable by
calculating the increase in model error if the variable is permuted (i.e., randomly
shuffled). The initial model was created using all the variables to get the variable
importance factor (VIF). Several models were built, each one created by removing
the least important predictor variable from the model and reporting the new R2 .
For all the iterations of LURF created by removing the least important variable,
VIF from the original model (with all variables; non-recursive) was used. This was
adopted as the recursive version was previously found to be greedier (overfit) and
have worse performance [218].
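The non-recursive elimination order can be sketched as below: the importance ranking from the full model is computed once and reused, so each successive model simply drops the lowest-ranked remaining variable. Refitting the LURF on each subset and recording R2 is not shown, and the function name and example importances are hypothetical.

```python
def nested_variable_sets(importance):
    """Return variable subsets, each dropping the least important remaining variable.

    The ranking comes from the full model only (non-recursive): importance is
    not recomputed after each removal.
    """
    ranked = sorted(importance, key=importance.get, reverse=True)
    return [ranked[:k] for k in range(len(ranked), 0, -1)]
```

A recursive variant would recompute the importances after every removal; the text above explains why that was not used.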
Random forest predictions are calculated by averaging training values associ-
ated with the terminal node. As such, these models are incapable of predicting
concentrations outside of the training range. To account for this, during external
validation with the testing sites, if the predicted concentration from the LURF
exceeded the 90th percentile of the training data concentrations, the
LURF prediction was replaced with the LUR prediction. This amalgamation of
LUR and LURF is referred to as the ‘Hybrid’ model in the manuscript (Figure 4.2).
Similar to the LUR validation, we calculated R2, MAE and CvMAE
for the LURF and Hybrid models using the LOLOCV method.
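The Hybrid substitution rule can be sketched as follows. This is an illustration only: the percentile uses linear interpolation and the comparison is strict ("exceeded"), both of which are assumptions, and the function names are hypothetical.

```python
def percentile(values, q):
    """Linear-interpolation percentile (q between 0 and 100) of a non-empty list."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def hybrid_prediction(lurf_pred, lur_pred, training_concentrations, q=90):
    """Use the LUR value wherever the LURF value exceeds the q-th training percentile."""
    threshold = percentile(training_concentrations, q)
    return [lur if lurf > threshold else lurf
            for lurf, lur in zip(lurf_pred, lur_pred)]
```

Predictions at or below the threshold are left as LURF values; only those above it fall back to the LUR model.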
4.6 Results and discussion
4.6.1 PM2.5 Models
Figure 4.3 shows the relative VIF from the best performing model for both standard
and decomposed signals. The daily average EPA PM2.5 concentration was an im-
portant predictor in both the LUR and LURF models of the ‘standard’ signal. The
EPA monitor is located at an urban background site, which is less influenced by
short-lived events, and thus is expected to primarily be influenced by the regional
background. Consequently, in the ‘decomposed’ LUR and LURF models, which
do not model the regional background, the importance of the EPA PM2.5 monitors
is reduced.
As the time-frequency of the model increases (i.e., going from models of persistent
enhancements to short-lived event frequencies), the importance of spatial
variables in the LUR model increases. This follows the logic that static predictors,
such as traffic, represent a local source. In other words, temporal variables domi-
nate the regional background signal, while spatial variables are a major contributor
[Flowchart: Daily average concentrations (n = 50 sites) undergo wavelet
decomposition; n = 1 location is withheld for testing and the n = 49 remaining
locations are used for model training. LUR and LURF model building: for LUR,
does including a predictor increase R2 by >0.01? If NO, omit the variable from
the model; if YES, the variable proceeds to the final LUR and LURF models.
Test performance: LURF prediction on withheld sites. Hybrid model: is the LURF
prediction ≥90th percentile of training predictions? If YES, use the LUR
prediction on withheld sites; if NO, use the LURF prediction.]
Figure 4.2: Flowchart on model building. The models were created us-
ing LOLOCV method, to generate unique LUR, LURF and Hybrid
externally-validated models.
to short-lived events. This is significant because it influences the way we think
about the construction and transferability of LUR models. Temporal variables at
a regulatory monitoring site can be used to determine the regional concentrations
of the pollutant, and LUR models of enhancements above the regional background
can be layered on top for a more accurate concentration estimate in a specific grid
cell. Furthermore, these LUR models may be more transferable to new domains,
as the influence of regional background has been removed.
Figure 4.3: Relative VIF for LUR and LURF models for standard and decom-
posed signals (short-lived events, long-lived events, persistent enhance-
ments) of PM2.5 . The black dashed lines divide the figure into spatial
(cooler colors) and temporal (warmer colors) variables. The model se-
lected for the VIF analysis was the best performing model for the stan-
dard signal and kept same for the time-decomposed models for consis-
tency.
Figure 4.4 shows that the greatest improvements in performance for PM2.5 prediction
modelling were achieved by switching from LUR to LURF, with an average
increase in externally cross-validated R2 of 0.19 and a decrease in CvMAE of 0.07
(Table 4.1). This is consistent with our hypothesis that LURF should outperform
LUR in model building and suggests that LUR models failed to capture the
Table 4.1: Changes to averages of R2 and CvMAE by using ‘standard’ and
‘decomposed’ signal LURF and Hybrid models compared to the base
case (LUR built with the standard signal). More details on average per-
formance of models are provided in Appendix B.10.
                      Standard Signal     Decomposed Signal
Pollutant   Metric    LURF     Hybrid     LUR      LURF     Hybrid
PM2.5       R2        +0.19    +0.15      +0.14    +0.17    +0.15
PM2.5       CvMAE     -0.07    -0.06      -0.02    -0.04    -0.04
NO2         R2        +0.19    +0.17      +0.04    +0.16    +0.14
NO2         CvMAE     -0.05    -0.05      -0.01    -0.05    -0.05
CO          R2        +0.10    +0.05      +0.01    +0.11    +0.04
CO          CvMAE     -0.01    -0.04       0.00    -0.01    -0.04
Change in R2 = R2_i - R2_LUR,STD (higher is better)
Change in CvMAE = CvMAE_i - CvMAE_LUR,STD (lower is better),
where i represents the modified model and LUR,STD is the base case.
complex interactions between covariates that drive PM2.5 concentration. Hybrid
models were developed for cases where testing site concentrations exceeded the
training range, with the hypothesis that this would improve the low-performing
LURF models. These low-performing models were characterized by test sites
whose maximum daily average concentrations exceeded the maximum daily av-
erage concentrations of the training sites. However, adoption of the Hybrid model
didn’t improve the overall model performance or robustness of LURF models. This
implies that even though LURs are technically capable of extrapolation, the extrapolation
performance is poor. A possible remedy for this would be training LURs
specifically at higher concentration ranges, which is a recommended area of future
work.
From Table 4.1, decomposition led to dissimilar impacts on PM2.5 model performance
when compared to their standard signal counterparts. Although decomposition
decreased the externally cross-validated R2 of the LURF models by 0.02,
Figure 4.4: Model performance evaluation of standard and decomposed sig-
nals for CO, NO2 and PM2.5 . Model is evaluated using 3 metrics: (1)
external cross-validated R2 (higher is better; maximum value 1), (2)
MAE (mean absolute error; lower is better) and (3) CvMAE (Coeffi-
cient of variation of MAE; lower is better). All data above are on testing
data (i.e., sites not encountered during model building). A time-series
and scatter plot for measurements and predictions at Allegheny County
Health Department (ACHD) can be found in Appendix B.8 as an exam-
ple.
it increased by 0.14 for LUR models. This implies that decomposition is a poten-
tial method to improve LUR performance. Additionally, although it didn’t improve
the overall performance, decomposition may be advantageous. This is because, as
noted above for LUR, decomposing the signal increased the relative importance of
static variables (Figure 4.3), especially when modelling short-lived events. This
implies that local spikes in PM2.5 concentration can be predicted by surrounding
land use.
It is difficult to directly compare performance of our daily average models to
previous studies, which typically model annual or seasonal averages. Performance
of our PM2.5 LUR and LURF models is comparable to other published regression
and machine-learning enabled models [35, 213, 262] for predicting daily average
concentrations (see Appendix B.11 for full analysis), suggesting that low-cost sensors
are an acceptable tool for LUR and LURF model building, and that the methods
described here can improve daily average predictions. Comparatively, although
the LURF analyzed by the CCAAPS group in Cincinnati had a high R2 of 0.88 when
sensor data was combined with satellite monitoring [35], their models using Harvard-type
Impactors only had an average R2 of 0.51 [34].
4.6.2 NO2 Models
Traffic related variables (vehicle density, bus fuel consumption) were found in both
LUR and LURF standard models (Figure 4.5). This is consistent with previous re-
search showing that the major source of NO2 in the atmosphere is combustion
[230]. The contribution of spatial variables to the different decomposed signals is
fairly consistent across the LUR models but varies noticeably for the LURF
models, with LURF’s short-lived model primarily driven by spatial variables.
Nevertheless, standard and decomposed models of both LUR and LURF are heavily
driven by temporal variables. For this work, annual averages of spatial variables
were used, which doesn’t account for seasonal changes; these are instead accounted
for in temporal variables (e.g., temperature, which is a major contributor in all
models except two). This is important for several reasons. In colder temperatures,
mixing in the planetary boundary layer is greatly reduced [144]. Additionally,
cold weather causes higher fuel consumption [168] and increases cold-start
emissions; emissions per car are higher at lower temperatures. Moreover,
meteorology-based models have been shown to have high correlation with NO2
[135]. Hence, it isn’t surprising that temporal variables had a significant influence
on NO2 concentration. The importance of temporal variables hasn’t been
explored independent of seasonal variations and hence, this is an important outcome
of this study.
Figure 4.5: Relative VIF for LUR and LURF models for standard and de-
composed signals (short-lived events, long-lived events, persistent en-
hancements) of NO2 . The black dashed lines divide the figure into spa-
tial (cooler colors) and temporal (warmer colors) variables. The model
selected for the VIF analysis was the best performing model for the
standard signal and kept same for the time-decomposed models for con-
sistency.
Switching from standard signal LUR to LURF increased externally cross-validated
R2 by 0.19 and decreased CvMAE by 0.05 (Table 4.1). This is consistent with our
hypothesis that LURF should outperform LUR and suggests that LUR failed to
capture complex interactions between variables in the model building. Similar to
PM2.5 performance, hybrid LUR-LURF models did not improve the NO2 LURF
model performance, implying that LURF outperformed LUR for all instances.
From Table 4.1, the decomposed signal increased the average externally cross-validated
R2 of the LUR model by 0.04 and made performance more consistent
(smaller range between 25th and 75th percentile), when compared to standard LUR
signal (Figure 4.4). NO2 is formed through secondary reactions between NO and
O3 and persists more uniformly in the atmosphere. However, NO2 can be variable,
driven by characteristics of local land use - for instance, traffic. Removing the
regional background enables a better fit as it excludes the non-spatial fraction of
the signal (regional background) and thereafter is driven by factors that influence
NO2 , such as traffic. In other words, when regional background concentrations
are excluded from the prediction modelling, the models better reflect the impact of
various combustion related variables on the concentrations leading to an improved
model performance. Although decomposition didn’t improve the average perfor-
mance of LURF or hybrid models (no changes in CvMAE), it made the models
more consistent. Additionally, LURF short-lived models were primarily driven by
static spatial predictors (Figure 4.5) and may better reflect the true relationship
between land use and air quality.
There are limited studies of daily average NO2 in areas with low ambient
concentrations. Brunelli et al. [37] modelled daily maximum concentrations;
however, it is hard to directly compare our results with that study since they used
regulatory grade monitors and modelled daily maximum concentrations. This
is significant because low-cost sensors perform better at higher concentrations. It
was previously established in Zimmerman et al. [271] that RAMP sensor error
increases substantially with decreasing average concentration. For instance, at 8.7
ppb daily average concentration, Zimmerman et al. [271] suggest a CvMAE of up
to 120% compared to reference monitors. Thus, it is likely that the uncertainty in
the measurement masks the relationship between the measured concentrations and
the predictor variables. Studies conducted in areas with low ambient concentra-
tions using low-cost sensors [94, 110, 191] were modelled for annual averages and
therefore, aren’t directly comparable. Moving forward, we recommend applying
LURF modelling approaches in regions with higher NO2 to better assess performance.
4.6.3 CO Models
Traffic-related variables (vehicle density, road length, inverse distance to the road,
bus fuel consumption) were found to be covariates in all of the constructed LUR
and LURF CO models, including both standard and decomposed signal (except
long-lived models); traffic-related variables contributed between 5-100% of overall
VIF (Figure 4.6). This is consistent with our understanding that the major source of
CO in the atmosphere is traffic. Another interesting observation to note in the VIF
analysis was that the short-lived signals of CO for LUR and LURF models have
dissimilar significant predictor variables. The LURF short-lived model is highly
driven by traffic-related variables (80% contribution by traffic-related variables),
whereas the LUR model is predominantly driven by non-traffic-related variables (<20%
contribution by traffic-related variables). Since traffic is one of the major
sources of short-lived CO, LURF’s ability to capture the intricate relationship be-
tween different variables may make it preferable to LUR models.
Figure 4.6: VIF for performing models for LUR and LURF for standard and
decomposed signals (short-lived events, long-lived events, persistent en-
hancements) of CO. Black dashed line divides the figure into spatial
(cooler colors) and temporal (warmer colors) variables. Models selected
for the VIF analysis were the best performing model for standard signal
and kept same for other signals for consistency.
Across all models, CO was poorly predicted compared to the other pollutants
examined here (Figure 4.4). In general, the standard signal LURF model performed
better than the LUR model, with average externally cross-validated R2 for LUR
and LURF of 0.30 and 0.40, respectively. Introduction of hybrid models didn’t
improve the performance (R2 = 0.35); however, it decreased the CvMAE by 0.02
(Table 4.1). Decomposing the signal didn’t improve the model performance for any
of the LUR, LURF or hybrid models (CvMAE_decomposed - CvMAE_standard < 0.01;
Table 4.1).
One of the probable reasons for the poor performance of the CO models is that
in ambient conditions, CO signals are typically characterized by extremely short-
lived spikes when a high-emitting source is present (e.g., old, poorly tuned vehicle)
followed by long periods of very low concentrations. Given these characteristics,
calculating a daily average concentration dampens the spikes in CO concentrations,
resulting in lack of precision for model building, and hence, unfavorable prediction
models. CO sources are also inconsistent (traffic), and difficult to associate with
different land use variables – especially since annual averages of different spatial
variables (e.g., vehicle density) were considered for modelling. Going forward,
CO modelling may be improved by considering hourly models due to the transient
nature of CO signals. In general, published studies using LUR and LURF to predict
CO are lacking. Brunelli et al. [37] successfully created a LUR model for CO
(Appendix B.11). However, they used regulatory grade monitors and modelled
only daily maximum values. This is important to note because low-cost sensor
measurements at high concentrations are generally more certain. As such, it is
difficult to compare our model to other studies, and this is a recommended area of
future work.
4.6.4 Mapping the models
The final PM2.5 LURF decomposed model was applied to relevant grids in Allegheny
County for 2017 (i.e., grids for which LURF model variables are available)
(Figure 4.7); the steps can be found in Appendix B.12. The range of predicted
concentrations was compared against EPA’s monitored data [13], and the spatial
distribution was visually inspected and compared with the Pittsburgh Breathe
Project [32].
Annual average concentrations at each grid varied between 10-30 µg m−3 [10-
Figure 4.7: Mean annual (figure A) and seasonal (figure B) maps for daily
predicted PM2.5 at every 50x50m grid in Allegheny County, plotted
using decomposed LURF model. Pittsburgh and Allegheny County
boundaries are marked by blue and black lines respectively.
90th percentile: 10.96-12.33]. Seasonally, summer had the highest predicted daily
PM2.5 , followed by winter, fall and spring. This pattern is consistent with PM2.5
data obtained via EPA monitors in Allegheny County for 2017, when averaged seasonally.
In general, the annual average concentrations (11.67 µg m−3; averaged
across all grids) were slightly higher than those from EPA’s Allegheny County
monitors (10 µg m−3). This is because the training dataset consisted of observations
above 7 µg m−3 (LoD, Section 4.5.2) and random forests are incapable of
predicting concentrations outside training ranges. Consequently, the results are
on the higher side. In future work, this issue could potentially be circumvented
by replacing values below the LoD with regional concentration estimates.
Spatially, our model successfully identifies hot-spots in the county (downtown,
highways, rails) as areas with higher concentrations (Figure 4.7) and is comparable
to black carbon spatial maps prepared in the Breathe Project [32]. Visual inspection
of Figure 4.7 shows higher concentrations in Pittsburgh City, as is expected
due to the higher population and traffic densities. Moreover, across the county,
highways and major roads have higher predictions (red lines in the map), which is
also a typical pattern that was expected. This was a unique feature of the decomposed
LURF model; the standard LURF failed to resolve roadways (see Appendix B.12).
When compared to EPA’s monitoring data, MAE across the 6 locations was found
to be 2.1 µg m−3, which is similar to our testing limit (2.6 µg m−3; Figure 4.4 and
Appendix B.13), strengthening the reliability of and confidence in the model
application. As such, the ability of the prediction model to estimate expected spatial
variations is an important outcome of the study, and the model can therefore be used
in unmonitored locations to estimate exposure.
Broadly speaking, this study illustrates that low-cost sensors can be useful tools
to build more accurate predictive models of air pollutant concentrations at higher
time resolution than most previously reported models, especially when combined
with more advanced data analysis techniques such as machine learning and wavelet
decomposition to resolve short-lived events from regional backgrounds. Going
forward, we recommend investigating similar analysis techniques in regions with
higher NO2 concentrations (e.g., Europe or other locations where diesel is promi-
nent), testing our models at higher time resolution (hourly) to improve modelling
of CO, and testing the transferability of PM2.5 models in new geographic
domains.
Chapter 5
Using Spatiotemporal Prediction

Models to Quantify PM2.5
Exposure due to Daily Movement

from this paper is presented in Appendix C. This work uses the data from Chapter
4 and builds on its results to assess the effect of mobility between different land use
types on the exposure of the residents.
S. Jain, A. Presto, and N. Zimmerman (2023). Using Spatiotemporal Predic-
tion Models to Quantify PM2.5 Exposure due to Daily Movement. Submitted.
SJ conducted the literature review and analysis for this work, developed the figures
and wrote the manuscript. NZ acquired the funding for this project, and provided
critical feedback during all stages of the project and the manuscript. AP provided
the original low-cost sensor data used in this work and provided critical feedback
for the manuscript.
5.3 Summary
To date, epidemiological studies have generally not accounted for the spatiotem-
poral variations in PM2.5 concentration that populations experience. These studies
typically infer exposure using home address and annually-averaged concentrations
measured by a few centrally-located monitors. To quantify the impact of spatiotem-
poral variation on exposure estimates, this study uses land-use random forest mod-
els to estimate daily-average PM2.5 concentrations in Allegheny County, USA. The
data were collected using a network of 47 low-cost air quality sensors, and predic-
tions were made for 50x50 m grids in Pittsburgh. Residential (PR ) and commercial
(PC ) probability weighting values were assigned to each grid. The daily-average
predictions were divided into “weekday” and “weekend” concentrations for each
grid and averaged annually to estimate total annual exposure. Weighted stratified
sampling was conducted using PR and PC values as probabilities, and weekdays and
weekends as strata. Static models (population spends 24 hours/day in a fixed res-
idential area) and dynamic models (estimates that account for movement between
areas) were created using these samples. The daily-average predicted concentra-
tions across all grids ranged from 4-75 µg m−3 (µ=12.0 µg m−3 ). Weekend con-
centrations were 10% higher than weekday concentrations, and commercial area
concentrations were 9% higher than residential areas. These results support the
hypotheses that exposure profiles vary due to movement between different areas
and that exposure is underestimated when residents’ mobility is ignored. Furthermore,
the existence of temporal variations between weekdays and weekends suggests
that short-term exposure can be improved through behavioral changes. As
adoption of low-cost sensor networks grows, this work suggests that epidemiological
exposure models can leverage this data to further refine exposure estimates and
identify behaviors that may reduce exposure.
5.4 Introduction
Exposure to particulate matter (PM2.5 ) is associated with several health problems,
ranging from asthma to premature death [30, 185]. Short-term exposure to high
PM2.5 concentrations can trigger cardiovascular-disease-related mortality and non-
fatal events [36, 66], while long-term exposure can lead to acute and chronic ill-
nesses, including aggravated asthma, cardiovascular disease and lung cancer [185].
As a result, a reduction of annual average PM2.5 concentrations by 10 µg m−3 has
been associated with a 7.3% reduction in all-cause mortality in the US Medicare
population [66].
To mitigate the public health effects of PM2.5 , it is important to accurately
estimate the population’s exposure. However, there are challenges with properly
assessing exposure. Exposure is often inferred from PM2.5 concentrations taken via
only a few centrally-located outdoor monitors [75, 128, 265]. In contrast, previous
studies have shown that small-scale spatial variations in PM2.5 exist [71, 244]. As
such, people’s movement exposes them to changing pollution concentrations, resulting
in varying exposure profiles, which can impact a given person’s consequent
health [66, 109]. Additionally, exposure misclassification due to unaccounted human
mobility can have effects on the inferences derived and, consequently, on relevant
policies. Residence-only exposure profiles have been found to result in negative
biases in the estimates [136, 171, 201, 264]; in essence, the relative risk is
underestimated by ignoring mobility.
The effect of mobility on exposure levels has previously been assessed using
personal monitors and comparing personal monitoring concentrations with ambi-
ent concentrations at home residences and applying correction factors. However,
even though personal monitors are the most accurate method to estimate personal
exposure, they have both logistic and cost constraints, such as recruiting an ade-
quate number of individuals from representative populations to carry the monitors.
Furthermore, the characteristics of participants (e.g., age, gender) may affect the
accuracy of the correction factors used to infer personal exposure from ambient
PM2.5 [16]. An alternate way of addressing the impacts of mobility and the dis-
crepancy between personal exposure and at-residence concentrations is by includ-
ing the spatial variability of the pollutant [109, 125, 132, 172].
Most exposure studies are based on residential address [66]; the daily move-
ment of an individual (for work, recreation etc.) isn’t typically accounted for.
Consequently, the impact of spatial movement in a person’s day is often not
represented in epidemiology studies. There are some studies that have estimated
movement-based exposure using mobile phone data [65, 171, 264], activity-based
data [23], or agent-based models [136]. However, their spatial resolution is coarse,
ranging from 400 m [171] up to 3 km [23, 264].
The lack of spatial resolution via ground measurements primarily exists be-
cause dense networks of regulatory monitoring stations aren’t feasible due to their
high initial capital investment and ongoing maintenance costs (USD 10,000-100,000
per pollutant). To overcome the shortcomings associated with monitoring stations,
lower-cost sensing technologies have increasingly been used as an alternative due
to a combination of improved sensor technologies and researcher-developed methods
for sensor calibration [46, 58, 182, 210, 271]. Due to the low cost and low
power demands of low-cost sensors (LCS), they can be deployed to form a dense
network, which can assist in capturing small-scale spatial variations. As such,
there are opportunities to use LCS to increase our understanding of air pollution
exposure.
For this work, we used previously published predicted concentration data at ev-
ery 50x50 m grid in the City of Pittsburgh [105] to estimate exposure of Pittsburgh
residents to PM2.5 informed by land-use type. The objective of this work is to
compare the base case used in epidemiology (PM2.5 estimated at home addresses;
static models) with an estimate where people spend time at both home and work
or commercial locations (dynamic models). In this work, we have excluded expo-
sure during commuting or transit; i.e., we are not replicating personal exposure.
Instead, the term ‘movement’ implies that people aren’t necessarily always located
in the same place and may move from one land-use type to another.
5.5 Methods
5.5.1 PM2.5 Measurements
As part of the Center for Air, Climate, and Energy Solutions (CACES) air quality
monitoring network, a network of 47 Real-time Affordable Multi-Pollutant
(RAMP) sensors (developed by SENSIT Technologies) was deployed in Allegheny
County between August 2016 and December 2017. These sensors use a commer-
cial nephelometer (either a Met-One Neighborhood Monitor or a PurpleAir PA-II)
to collect 15-minute resolution PM2.5 data. The collected data has previously been
calibrated according to Malings [142] against collocated US EPA data and down-
averaged to daily concentrations [105]. For this work, we identified 5 µg m−3 as the
smallest concentration that can be reliably measured by the sensors (limit of detection,
LOD). Measurements below the LOD made up approximately 9% of the total data set across
the 47 sites, and the 15-minute data below the LOD were replaced with 3.53 µg m−3
(LOD/√2) [99, 222].
More information on LOD determination can be found in Appendix C.1.
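As a minimal sketch of the substitution described above (not the processing code used in this work), the snippet below replaces readings below the LOD with LOD/√2 and contrasts that with simply dropping them. The function names and the example readings are hypothetical; only the LOD of 5 µg m−3 comes from the text.

```python
import math

LOD = 5.0  # µg/m³, the limit of detection reported in the text


def substitute_below_lod(values, lod=LOD):
    """Replace every reading below the LOD with LOD / sqrt(2)."""
    fill = lod / math.sqrt(2)
    return [fill if v < lod else v for v in values]


def drop_below_lod(values, lod=LOD):
    """Alternative treatment: discard readings below the LOD."""
    return [v for v in values if v >= lod]


# Hypothetical 15-minute PM2.5 readings (µg/m³), for illustration only.
readings = [2.0, 4.0, 6.0, 12.0]

substituted = substitute_below_lod(readings)
dropped = drop_below_lod(readings)

mean_substituted = sum(substituted) / len(substituted)
mean_dropped = sum(dropped) / len(dropped)
```

In this toy example the mean after dropping the low readings is higher than the mean after substitution, which illustrates why removal can skew a data set high.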
5.5.2 Modeling
The general methodology adopted for prediction modeling of PM2.5 concentrations
has previously been described in Jain et al. [105], with one primary change: for this
work, we replaced concentrations below LOD with LOD/√2 (in Jain et al. [105],
concentrations below LOD were removed). We opted for this change to avoid the
data set being skewed high.
Briefly, the collected RAMP data were processed using signal decomposition
(wavelet decomposition) into four separate signals [272]: (1) regional concentrations,
(2) persistent enhancements above regional (> 8 h) events, (3) long-lived (2-8 h)
events, and (4) short-lived (< 2 h) events. The latter three signals were individually
modelled using land-use random forests (LURF), added to the regional
concentrations, and validated using the leave-one-location-out cross-validation
(LOLOCV) technique [116, 246]. Various spatial and temporal variables were used as
predictors in the model. The variables used in the final
models can be found in Figure 5.1 (blue box). Full details of all variables assessed
are given in Jain et al. [105] (see Table C.1 in Appendix C for a summary). The
value in brackets refers to the buffer size; multiple buffer sizes indicate that
different buffers were used for different signals.
[Flowchart. Blue box (Jain et al., 2021): deployed 50 low-cost sensors across
Allegheny County; developed prediction models for daily PM2.5 using time-decomposed
random forest methods. Input variables, temporal: temperature, wind, precipitation,
EPA PM and EPA CO. Input variables, spatial: elevation, inverse distance to the road,
housing density (100 m), population density (100 m), vehicle density (50 m; 100 m),
rail length (100 m), road density (50 m; 100 m), bus fuel consumption (50 m; 100 m).
Prediction steps: divide the city into a 50x50 m grid (total = 57,768 grids); remove
the grids with ≥ 50% of spatial variables outside training model limits; select the
remaining grids (total = 44,595); predict daily PM2.5 concentrations at every grid for
2017; analyze weekly, seasonal and annual variations. Land-use steps: land-use
variables (housing and commercial densities) at 100 m buffer (data: Allegheny County
GIS Group); assign residential/commercial probabilities to each grid using 100 m
buffer values; combine the land-use information with predicted concentrations; divide
the predictions into 'weekday' and 'weekend' concentrations; sample the data set using
weighted stratified sampling for residential and commercial areas for weekday and
weekend concentrations; calculate exposure. Static: the population is in a residential
area 24 hours/day. Dynamic: the population moves between residential and commercial
areas.]
Figure 5.1: Flowchart of steps involved in this work. The blue box at the
top represents the results from Jain et al. (2021) used for this work. The
grey boxes are the outcomes. EPA CO and EPA PM in the blue box refer
to daily measurements of CO and PM2.5 by the US EPA’s Lawrenceville
site in the City of Pittsburgh.
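The leave-one-location-out validation mentioned above can be sketched as a simple splitting routine: each monitoring site is held out in turn, and the model is trained on the remaining sites. This is an illustrative outline only; the function name and the toy site labels are hypothetical, and the model fitting step is omitted.

```python
def leave_one_location_out(site_ids):
    """Yield (held_out_site, train_indices, test_indices) for each unique site.

    site_ids holds one site label per observation (row) in the data set.
    """
    for site in sorted(set(site_ids)):
        train = [i for i, s in enumerate(site_ids) if s != site]
        test = [i for i, s in enumerate(site_ids) if s == site]
        yield site, train, test


# Hypothetical site label for each of six observations.
site_ids = ["A", "A", "B", "C", "C", "B"]
folds = list(leave_one_location_out(site_ids))

for site, train_idx, test_idx in folds:
    # A model would be fitted on train_idx and scored on test_idx here.
    print(site, train_idx, test_idx)
```

Splitting by site rather than by row keeps every observation from the held-out location out of training, so the score reflects prediction at an unmonitored location.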
5.5.3 Predictions
The City of Pittsburgh was divided into grids of 50x50 m (total grids = 57,768) to
quantify the small-scale variations in PM2.5 concentrations. We quantified PM2.5
concentrations at each grid using a spatiotemporal land-use random forest model as
it was identified to render more robust models (see Appendix C.3 for more details).
Grids with ≥ 50% of the spatial variables falling outside the training model limits
(the upper and lower limits observed at the 47 training sites) were excluded from the
assessment since random forests are incapable of extrapolation (remaining grids =
44,595; 77% retained).
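The exclusion rule above can be sketched as follows. This is an assumed implementation, not the code used in this work: the variable names, the training limits and the example cells are hypothetical, and only the 50% threshold comes from the text.

```python
def fraction_outside(cell, limits):
    """Fraction of a cell's spatial variables outside the training range."""
    outside = sum(
        1
        for name, value in cell.items()
        if not (limits[name][0] <= value <= limits[name][1])
    )
    return outside / len(cell)


def keep_cell(cell, limits, threshold=0.5):
    """Keep a cell only if fewer than `threshold` of its variables are outside."""
    return fraction_outside(cell, limits) < threshold


# Hypothetical (min, max) limits taken from the training sites.
limits = {
    "elevation": (220.0, 380.0),
    "housing_density": (0.0, 0.6),
    "road_density": (0.0, 4.0),
    "rail_length": (0.0, 250.0),
}

# Hypothetical grid cells with 0, 1 and 2 of 4 variables out of range.
cells = [
    {"elevation": 300.0, "housing_density": 0.2, "road_density": 1.0, "rail_length": 0.0},
    {"elevation": 410.0, "housing_density": 0.3, "road_density": 2.0, "rail_length": 10.0},
    {"elevation": 410.0, "housing_density": 0.9, "road_density": 2.0, "rail_length": 10.0},
]
kept = [keep_cell(c, limits) for c in cells]
```

The third cell has exactly half of its variables out of range and is therefore dropped, matching the "≥ 50%" wording.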
5.5.4 Land Use: Residential and Commercial Areas
Each 50x50 m grid in Pittsburgh was assigned a residential and commercial value.
This was done using a land cover area data set provided by the Allegheny County
GIS group [12], that demarcated land cover into 14 types (Appendix C.3). For as-
signing a residential density value, we calculated the total area in a 100 m buffer
with land use type demarcated as ‘Residential’ (Types 6, 7 and 8 in Appendix C.3).
The maximum residential value was calculated as 31,425 m² (Area = πr²; r = 100
m), when all the locations in the buffer area were categorized as ‘Residential’; and
the minimum value was 0 m² when none of the locations in the buffer area were
categorized as ‘Residential’ (Appendix C.4). Similarly, commercial values were
assigned using areas demarcated by the Allegheny GIS Group as ‘Commercial’
(Type 10 in Appendix C.3). The LURF model used for PM2.5 prediction determined
that 100 m was the optimal buffer size for housing density; therefore, we
chose the same buffer size for estimating residential and commercial values
(Figure 5.1). These residential and commercial values were then normalized on a
scale from 0-1 and used as probability weights, PR and PC , respectively, for sam-
pling (Section 5.5.5).
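The text does not state which normalization was applied; the sketch below assumes simple min-max scaling, which maps the smallest buffer area to 0 and the largest to 1. The function name and the example areas are hypothetical, apart from the 0 and 31,425 m² extremes quoted above.

```python
def min_max_normalize(values):
    """Scale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: no spread to scale, so return zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical residential areas (m²) within the 100 m buffer of four grid cells.
residential_area_m2 = [0.0, 7856.25, 15712.5, 31425.0]

# Assumed form of the residential probability weight P_R.
p_r = min_max_normalize(residential_area_m2)
```

Under this assumption a cell whose buffer is entirely residential gets a weight of 1, and a cell with no residential land gets a weight of 0 and would never be drawn as a residential sample.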
We chose to assess population exposure using this split between residential and
commercial areas to acknowledge that a population spends time in both areas al-
most every day, but the exposure profiles might be different due to various factors
(e.g., higher vehicle emissions in commercially-dense areas). The exposure estimate
can be further improved by tracking individual people via personal notes or
cellular network data. However, such movement data were not available, which is a
recognized limitation of this work.
5.5.5 Sampling
For this work, we defined a sample as the average concentration (for the defined
period) at a grid cell that is picked via weighted stratified sampling. To determine
the minimum number of samples required to represent the population, the formula
in Equation 5.1 was used [252]:
\[
n \geq \left( \frac{z \, \sigma}{\mathrm{MOE}} \right)^{2} \tag{5.1}
\]
In Equation 5.1, z is the z-score for the desired confidence level (z = 1.96 at 95% CI)
and σ is the standard deviation (annual average at each grid cell, 1.88 µg m−3).
MOE, the margin of error, is the acceptable tolerance level or sensitivity, set as the
least count of PM2.5 measured by the RAMPs for this work (0.01 µg m−3). With
these inputs, the number of samples was determined to be approximately 140,000,
as shown in Equation 5.2.
\[
n \geq \left( \frac{1.96 \times 1.88}{0.01} \right)^{2} = 135{,}778 \approx 140{,}000 \ \text{total samples} \tag{5.2}
\]
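The arithmetic in Equation 5.2 can be checked with a few lines of code. The function name is hypothetical; the inputs are the values quoted in the text.

```python
import math


def min_sample_size(z, sigma, moe):
    """Smallest integer n satisfying n >= (z * sigma / moe) ** 2."""
    return math.ceil((z * sigma / moe) ** 2)


# z-score at 95% CI, standard deviation (µg/m³), margin of error (µg/m³).
n_min = min_sample_size(z=1.96, sigma=1.88, moe=0.01)
print(n_min)
```

Because the margin of error enters as a squared denominator, halving it would quadruple the required number of samples.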
The daily predictions were then separated into ‘weekday’ and ‘weekend’ concentrations
for each grid to acknowledge differences in human movement patterns
during different days of the week and then averaged annually to estimate total an-
nual exposure for different models (i.e., the weekend concentration of a sample is
the average concentration over all 52 Saturdays and Sundays in 2017 at the selected
grid cell. See Appendix C.5 for more details).
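A minimal sketch of the weekday/weekend split for a single grid cell is given below, assuming the daily predictions are held in a dictionary keyed by date. The function name and the synthetic concentrations are hypothetical and are not taken from the model output.

```python
from datetime import date, timedelta
from statistics import mean


def weekday_weekend_means(daily):
    """Return (weekday mean, weekend mean) from a {date: concentration} mapping.

    Monday-Friday count as weekdays; Saturday and Sunday count as weekend days.
    """
    weekday = [c for d, c in daily.items() if d.weekday() < 5]
    weekend = [c for d, c in daily.items() if d.weekday() >= 5]
    return mean(weekday), mean(weekend)


# Synthetic daily predictions for 2017 at one grid cell (µg/m³):
# 10.0 on weekdays and 12.0 on weekend days, purely for illustration.
start = date(2017, 1, 1)
daily = {}
for offset in range(365):
    day = start + timedelta(days=offset)
    daily[day] = 12.0 if day.weekday() >= 5 else 10.0

weekday_mean, weekend_mean = weekday_weekend_means(daily)
```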
[Flowchart: Sampling is split into Residential (n = 140,000; probability = PR) and
Commercial (n = 140,000; probability = PC). Each branch is divided into weekday
(n = 100,000) and weekend (n = 40,000) samples, giving RD, RE, CD and CE, with
‘a’, ‘b’, ‘24-a’ and ‘24-b’ hours respectively, which feed the static and dynamic
models.]
Figure 5.2: Flowchart for weighted stratified samples and resultant static and
dynamic models. ‘a’ and ‘b’ represent the hours spent in the selected
land-use type over weekdays or weekends.
Sampling was achieved using a weighted stratified sampling (with replacement)
method, in which the population is divided into homogeneous strata (strata
for this work: weekdays and weekends) and samples are selected from each stratum
based on the assigned probability weights (Figure 5.2). The probability weights
PR and PC were used to calculate the sampling fraction, such that grids with higher
probability weights were sampled more. For instance, between two grid cells with
PR values of 0.1 and 0.2, the latter is twice as likely to be picked as a sample.
For each of the land use types (residential and commercial), a total of 140,000
samples was taken. The total number of samples was then divided into weekday
and weekend concentrations based on population size of the strata (weekday size =
5 days, weekend size = 2 days). Therefore, 5/7*140,000 = 100,000 samples were
taken of weekday concentrations and 2/7*140,000 = 40,000 samples were taken of
weekend concentrations. The samples for residential and commercial areas were
assessed for statistically significant differences using the Welch Two Sample t-test.
The Welch Two Sample t-test was used since it doesn’t assume that the two data
sets have equal variances.
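The sampling scheme described above can be sketched with the Python standard library, under the assumption that each stratum is drawn independently with replacement and with probability proportional to the land-use weight. The helper name, the three example grid cells and the fixed random seed are hypothetical; only the 140,000 total and the 5:2 split come from the text.

```python
import random


def weighted_sample(values, weights, k, rng):
    """Draw k values with replacement, with probability proportional to weights."""
    return rng.choices(values, weights=weights, k=k)


TOTAL_SAMPLES = 140_000
n_weekday = round(TOTAL_SAMPLES * 5 / 7)  # stratum size: 5 of 7 days
n_weekend = TOTAL_SAMPLES - n_weekday     # stratum size: 2 of 7 days

# Hypothetical grid cells: annual weekday and weekend means (µg/m³)
# and residential probability weights P_R.
weekday_conc = [11.0, 12.0, 13.0]
weekend_conc = [12.0, 13.0, 14.0]
p_r = [0.0, 0.1, 0.2]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
r_d = weighted_sample(weekday_conc, p_r, n_weekday, rng)  # residential, weekday
r_e = weighted_sample(weekend_conc, p_r, n_weekend, rng)  # residential, weekend

# The cell with weight 0.2 should be drawn about twice as often as the 0.1 cell.
ratio = r_d.count(13.0) / r_d.count(12.0)
```

The same routine would be repeated with PC as the weights to obtain the commercial samples.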
5.5.6 Static and Dynamic Models
For this work, we used the term “exposure” as a proxy for time-weighted concen-
trations, i.e., for the total concentration of PM2.5 that the population experiences in
different locations (Equation 5.3).
\[
\text{Exposure} = \frac{(\text{Concentration} \times \text{Time})_{\text{Location A}} + (\text{Concentration} \times \text{Time})_{\text{Location B}}}{\text{Total time}} \tag{5.3}
\]
The ‘Static’ models assume that residents spend 24 hours a day in residential
areas. ‘Dynamic’ models were defined as the models that account for movement
between commercial and residential areas. These models were created using
Equations 5.4 and 5.5 and were used to estimate the difference in exposure to PM2.5
due to daily movement.
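\[
\text{Static model (samples)} =
\begin{cases}
\sum_{n=1}^{100{,}000} R_D \\[4pt]
\sum_{n=1}^{40{,}000} R_E
\end{cases}
\tag{5.4}
\]
\[
\text{Dynamic model (samples)} =
\begin{cases}
\sum_{n=1}^{100{,}000} \left[ \frac{\alpha}{24} R_D + \frac{(24-\alpha)}{24} C_D \right] \\[4pt]
\sum_{n=1}^{40{,}000} \left[ \frac{\beta}{24} R_E + \frac{(24-\beta)}{24} C_E \right]
\end{cases}
\tag{5.5}
\]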
In Equations 5.4 and 5.5, R and C refer to sample concentrations taken for
residential and commercial areas respectively. Subscripts D and E are the time
periods, used for weekdays and weekends respectively (e.g., RD is the sampled
concentration for the sample residential area over weekdays). α represents the
number of hours spent in residential areas over weekdays, whereas β represents the
number of hours spent in residential areas over weekends. As such, the individual
exposure level will vary depending on the amount of time spent in each area.
For our analysis, α and β were estimated as 12 and 18 hours respectively for
the dynamic models, informed by data provided by the US Bureau of Labor Statistics
[38], to facilitate comparison with static models. The concentrations for static
and dynamic models were assessed for statistically significant differences using
the Welch Two Sample t-test. As mentioned previously, this test was chosen as it
doesn’t assume that the populations have equal variances.
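For a single pair of samples, the time weighting in Equations 5.3 to 5.5 reduces to the short sketch below. The function name and the example concentrations are hypothetical; α = 12 and β = 18 are the values stated in the text.

```python
def dynamic_exposure(residential, commercial, hours_residential):
    """Time-weighted concentration for one residential/commercial sample pair."""
    if not 0 <= hours_residential <= 24:
        raise ValueError("hours_residential must be between 0 and 24")
    frac = hours_residential / 24
    return frac * residential + (1 - frac) * commercial


ALPHA = 12  # hours in residential areas on weekdays
BETA = 18   # hours in residential areas on weekends

# Hypothetical sampled concentrations (µg/m³).
r_d, c_d = 11.0, 13.0  # residential and commercial, weekday
r_e, c_e = 12.0, 14.0  # residential and commercial, weekend

static_weekday = r_d                                 # 24 h in residential areas
dynamic_weekday = dynamic_exposure(r_d, c_d, ALPHA)  # 12 h residential, 12 h commercial
dynamic_weekend = dynamic_exposure(r_e, c_e, BETA)   # 18 h residential, 6 h commercial
```

Setting the residential hours to 24 recovers the static case, so the static model can be read as a special case of the dynamic one.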
The models (Equations 5.4 and 5.5) are a fractional split and are not a true
representation of time spent in residential or commercial areas (i.e., the model
isn’t informed by sub-daily movement; PM2.5 concentrations are modelled as daily
averages). While the RAMP sensors have sub-daily measurement time resolution,
PM2.5 concentrations were modelled as daily averages due to prediction modeling
constraints (specifically a lack of sub-daily model inputs such as hourly traffic) and
this is an identified limitation of this work.
[Plot: measured PM2.5 (µg/m3) against hour of day (0-23); legend: High Commercial
Density, High Residential Density.]
Figure 5.3: Sub-daily variations at low-cost sensor sites in the City of Pitts-
burgh (25 out of 47 total sites in Allegheny County) with high residential
(PR ; blue boxes) and high commercial values (PC ; red boxes) (top 5 sites
for commercial and residential density each).
Nonetheless, we observed sub-daily variations at the 47 sites where low-cost
sensors were deployed (Figure 5.3). Although the nighttime concentrations were
similar across the residential and commercial land use types, the sites with high
commercial density (PC ) were characterized by higher concentrations during daytime.
As such, the actual differences between the static and dynamic exposures are likely
larger than reported here, because people are more likely to
be in a commercial area at 2 PM (when the concentrations are higher in commercial
areas) than at 2 AM.
Figure 5.4: Boxplots for annual averages of predicted PM2.5 for residential
(plots with solid colors) and commercial (plots with diagonal lines)
land-use types separately. Weekday and weekend averages are also
shown separately to represent the difference in concentration over different
days of the week. Summer and winter concentrations are also displayed
separately to represent the difference in concentration over different seasons.
5.6.1 Temporal Variations
The daily average predicted concentrations across all grids ranged from 4 to 75
µg m−3, with an average of 12.0 µg m−3 [10th-90th percentiles: 6.6-18.6 µg m−3].
When averaged annually, concentrations varied between 10 and 30 µg m−3 across
all grids, with an average of 12.2 µg m−3 [10th-90th percentiles: 11.1-13.8 µg m−3]
(Figure 5.4). These results illustrate the range of concentrations to which residents
were exposed.
The annual average concentration at the US EPA’s Lawrenceville site [13]
(AQS ID: 42-003-0008; Pittsburgh, PA) was reported to be 9.2 µg m−3 . However,
the RAMP collocated at the Lawrenceville site had an annual average measure-
ment of 11.3 µg m−3 , suggesting that RAMPs may be biased high, with an average
difference of approximately 2 µg m−3 .
On average, summer (May-October) had higher concentrations than winter
(November-April), with mean summer and winter concentrations of 13.4 and 11.0
µg m−3, respectively. Weekend (Saturday-Sunday) concentrations were 10% higher
than weekday concentrations, with average concentrations of 13.1 and 11.9 µg m−3,
respectively, across all grids.
Figure 5.4 shows the spatial variations in annual averages of predicted PM2.5
during weekdays (Monday-Friday) and weekends (Saturday-Sunday) and during
summer (May-October) and winter (November-April) seasons.
Overall, our results highlight important temporal variations in concentrations.
Summer concentrations were 20% higher than winter concentrations, and week-
end concentrations were 10% higher than weekday concentrations. These pat-
terns are consistent with PM2.5 data obtained via a US EPA monitor in the City
of Pittsburgh, which showed approximately 1.1 µg m−3 higher concentrations over
weekends compared to weekdays [227]. The higher weekend concentrations are
likely due to increased traffic (especially trucks) in Allegheny County on weekends
[39]. While previous studies have examined daily variations in concentrations
[136, 171], our findings reinforce the existence of temporal variations and suggest
the potential for reducing short-term exposure through behavioral changes, such
as choosing lower-traffic roads when out on weekends.
5.6.2 Spatial Variations
As introduced in Section 5.6.1, annual average concentrations across all grids varied
between 10 and 30 µg m−3. This suggests that residents were exposed to a wide
range of concentrations, which may be underestimated by relying on a single or
limited number of stationary monitors.
Upon visual inspection, highways and major roads were found to have higher
predicted PM2.5 concentrations (red lines in Figure 5.5), which is an expected
pattern [32]. The model also identified hotspots and potential areas of
concern (areas with high traffic count or high population, such as downtown, highways,
railways). As such, along with details about personal movement between
different areas (e.g., between different grid cells), the maps shown in Figure
5.5 can be a useful tool in estimating the exposure of an individual.
When the grid cells sampled as residential or commercial via weighted stratified
sampling were compared, the commercial areas had 0.4 µg m−3 higher median values.
The mean for commercial areas was 1.1 µg m−3 higher (sample standard deviations:
σresidential: 0.7 µg m−3; σcommercial: 2.6 µg m−3) (Figure 5.4) and the difference
between averages was found to be statistically significant (p < 0.05). Additionally,
the overall range of concentrations that the modelled population was exposed
to in commercial areas was noticeably higher, with the difference in 90th
percentile concentrations up to 3.7 µg m−3 (30%). Although outside the scope of
this work, the analysis of spatial variability holds potential for assessing various
facets of home vs mobility exposures, such as exposure differences between suburban
and inner-city residents, areas of recreation that may influence the total
exposure of residents, or the effect of work location on daily average exposure.
[Maps of predicted PM2.5 (µg/m3), three panels: annual average concentrations;
weekly variations (weekday, weekend); seasonal variations (winter, summer).]
Figure 5.5: Spatial variations in annual averages of predicted PM2.5 during
weekdays (Monday-Friday) and weekends (Saturday-Sunday) and during
summer (May-October) and winter (November-April) seasons.
We acknowledge that both measurement and modeling uncertainty affect
this work, and we address them in Appendix C.6. Broadly speaking, although
the absolute difference between static and dynamic models changed when uncertainties
were taken into account, we found that average concentrations in commercial
areas were always higher. As such, addressing the uncertainties reinforced
our result that the average PM2.5 concentration that the population was exposed to
was always higher when the population moves into commercially-dense areas than
when the population stays in residential areas only.
5.6.3 Static and Dynamic Models
We estimated exposure when the spatial mobility was ignored (static models) and
compared it to estimated exposure when the spatial mobility was addressed (dy-
namic models). Static and dynamic models were based on time-weighted PM2.5
concentrations from different locations in the study area. Therefore, exposure es-
timates represent spatially averaged concentrations, resulting in pseudo-mobility-
based exposure.
We used the annual average of the daily predicted concentrations for assessment
of the static and dynamic models. The differences between the static and
dynamic models, assessed over 140,000 samples, were found to be statistically
significant (p < 0.05). The difference in concentration between land-use types
(residential and commercial) resulted in higher exposure across the population
for dynamic models compared to static models.
In all instances, the dynamic model had higher average exposure compared to the
static model, with an average difference of up to 1.1 µg m−3 when the population spends
all their time in commercial areas (Figure 5.6). To understand potential individual
mobility effects, we also report the 10th and 90th percentile of differences between
the static and dynamic models (10th percentile: 0.1; 90th percentile: 3.73 µg m−3 ),
suggesting that for an individual, pollutant exposure differences may be as high as
approximately 4 µg m−3 .
[Heat map titled ‘Difference in Exposure between Static and Dynamic Models’.
x-axis: hours spent in residential areas over weekends, β (0-24); y-axis: hours
spent in residential areas over weekdays, α (0-24); colour scale: PM2.5 (µg/m3),
0.0 to 1.2.]
Figure 5.6: Scalar graph of difference in exposure between static and dy-
namic models informed by amount of time spent in residential area over
weekdays and weekends separately, calculated using Equations 5.4 and
5.5.
When assessed for α and β as 12 and 18 hours respectively in Equation 5.5, the
mean exposure using the dynamic model was 0.5 µg m−3 higher compared to the
static model and the difference between averages was found to be statistically
significant (p < 0.05) (x̄static = 11.8 µg m−3, s = 1.0 µg m−3; x̄dynamic = 12.3 µg m−3,
s = 1.3 µg m−3) (see Appendix C.7). The 90th percentile concentration for the dynamic
model was 0.9 µg m−3 (7%) more than the static model. The mean difference (MD)
and mean absolute error (MAE) were 0.5 µg m−3 and 0.7 µg m−3, respectively. The
mean difference was higher over the weekdays (0.6 µg m−3) compared to weekends
(0.3 µg m−3).
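To illustrate how Welch's test works on these summary statistics, the sketch below computes the t statistic and the Welch-Satterthwaite degrees of freedom from the rounded means and standard deviations quoted above. It assumes 140,000 samples in each group, and because the inputs are rounded it is a rough check rather than a reproduction of the reported test. The function name is hypothetical.

```python
import math


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    v1 = sd1 ** 2 / n1  # squared standard error of group 1
    v2 = sd2 ** 2 / n2  # squared standard error of group 2
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Dynamic vs static model, rounded values from the text (µg/m³).
# The group size of 140,000 per model is an assumption.
t_stat, dof = welch_t(12.3, 1.3, 140_000, 11.8, 1.0, 140_000)
print(round(t_stat, 1), round(dof))
```

With samples this large, even a 0.5 µg m−3 difference yields a very large t statistic, so statistical significance alone says little about how practically important the difference is.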
A few studies have previously calculated dynamic exposures and the impact
of movement on exposure estimates. Nyhan et al. [171] used mobile network data
for mobility and estimated a difference between static and dynamic models of 0.02
µg m−3. Similarly, Lu [136] used agent-based models and estimated a difference
of 0.05 µg m−3. However, the above-mentioned research lacked fine spatial resolution
(≤ 50 m), which is potentially one of the reasons their differences between the
static and dynamic models were smaller than what was observed here (0.5 µg m−3
for the typical case we have considered): our models capture
fine spatial scale variations in PM2.5 concentration. Our work also supports
the importance of low-cost sensors to improve exposure estimates, in line with
the findings of Lu [136]. However, our approach is likely less computationally
intensive than agent-based models and, as a result, may be more
readily applied elsewhere. Additionally, to our knowledge, none of the previous
studies have separately analyzed weekday and weekend concentrations, which is an
important outcome of our work and is recommended in future studies.
5.7 Conclusions
This work addresses some of the shortcomings associated with using static models
(assuming that the population spends their whole time at home) to inform exposure
for epidemiology studies. To do this, we leveraged data collected via a low-cost
sensor network in Pittsburgh to estimate exposure across the city. The results of
this work support the hypothesis that exposure estimates would be impacted by
movement of an individual between different areas due to (1) spatial variations in
PM2.5 concentrations, particularly in commercial areas, and (2) temporal variations
such as weekend vs weekday differences.
Given our findings, a centrally-located monitoring station is not recommended
for exposure assessment of the whole population as it could result in negative bi-
ases in health effect estimates, i.e., we may be underestimating exposure by using
a few centrally-located monitors and residential address. Even though absolute
PM2.5 concentration differences in this study were small, the resulting impact on
health may still be substantial. This is supported by a recent report from the Health
Effects Institute describing that even a 4.16 µg m−3 (one interquartile range in the
study population long-term concentrations) increase in average annual PM2.5 con-
centration is associated with a 1.034 hazard ratio for total nonaccidental death [95%
CI: 1.030-1.039] [31]. Furthermore, this same study concluded that there was no
PM2.5 concentration below which no health effects were observed [31]. This sug-
gests that even small-scale reductions in PM2.5 concentration are beneficial and this
warrants further research.
This study used low-cost sensor network data to create spatiotemporal pollu-
tant concentration models and investigated the model’s utility to identify hotspots
and subsequent variations in exposure based on the location and movement patterns.
However, this work doesn’t capture the unique movement of an individual,
which is one of the identified limitations. Capturing it would require movement data
via cellular networks or personal notes, neither of which was in the scope of this
work. Additionally, daily pollutant concentrations are approximated for sub-daily
movement. This is due to a lack of time resolution in prediction model inputs,
such as hourly average traffic volume, and is another identified limitation of this
work. Going forward, if the appropriate sub-daily predictors become available,
we recommend the development of hourly pollutant land use regression models,
which could then be paired with agent-based models or with travel survey data
to simulate individual daily exposures. Low-cost sensors are also unable to detect
the composition, size, or elemental carbon to organic carbon ratios of PM2.5,
which may influence the health impacts of exposure to PM2.5; this is an identified
disadvantage of using LCS technology for exposure assessment [89]. This study,
nonetheless, provides an overarching estimate of total exposure, and highlights the
utility of LCS networks for identifying hotspots, and underestimation or overesti-
mation in reported exposures. This work also assumes that indoor concentrations
of PM2.5 are comparable to outdoor concentrations, which is typically untrue and
is another area of future study. This work provides a more granular look at spatial
variations in outdoor concentrations and provides an opportunity to improve our
overall exposure estimates if combined with indoor-outdoor ratios.
Chapter 6
Identification of Neighbourhood
Hotspots via the Cumulative
Hazard Index: Results from a
Community-Partnered Low-cost
Sensor Deployment

from this paper is presented in Appendix D. This work uses the data collected via
11 RAMPs deployed in the Strathcona neighbourhood to assess intra- and inter-
neighbourhood variabilities.
S. Jain, R. Gardner-Frolick, N. Martinussen, D. Jackson, A. Giang and N. Zimmerman
(2023). Identification of Neighbourhood Hotspots via the Cumulative
Hazard Index: Results from a Community-Partnered Low-cost Sensor Deploy-
ment. Prepared for submission.

This work was proposed by SJ and RGF, who also independently secured funding
via the Public Scholars Initiative. Ethics approval was obtained by SJ and RGF,
with support from NZ and AG. The field work was designed by everyone and
executed by SJ, RGF and NM. DJ provided community support, his insights on
the neighbourhood during a walking tour, and helped with recruitment of hosts. I
conducted the literature review and analysis for this chapter, developed the figures
and wrote the manuscript. AG and NZ provided critical feedback during all stages
of this project. AG provided insights on the methodology and NZ provided critical
feedback on the manuscript.
6.3 Summary
The Strathcona neighbourhood in Vancouver is particularly vulnerable to environ-
mental injustice due to its close proximity to the Port of Vancouver, and a high
proportion of Indigenous and low-income households. Furthermore, local sources
of air pollutants (e.g., roadways) can contribute to small-scale variations within
communities.
The aim of this study was to assess hyperlocal air quality patterns (intra-neighbourhood
variability) and compare them to average Vancouver concentrations (inter-neighbourhood
variability) to identify possible disparities in air pollution exposure for the Strath-
cona community. Between April and August 2022, 11 low-cost sensors (LCS)
were deployed within the neighbourhood to measure PM2.5 , NO2 , and O3 concen-
trations. The collected 15-minute concentrations were down-averaged to daily con-
centrations and compared to greater Vancouver region concentrations to quantify
the exposures faced by the community relative to the rest of the region. Concentra-
tions were also predicted at every 25 m grid within the neighbourhood to quantify
the distribution of air pollution within the community. Using population informa-
tion from census data, cumulative hazard indices (CHIs) were computed for every
dissemination block.
We found that although PM2.5 concentrations in the neighbourhood were lower
than regional Vancouver averages, daily NO2 concentrations and summer O3 con-
centrations were consistently higher. Additionally, although CHIs varied daily, we
found that CHIs were consistently higher in areas with high commercial activ-
ity. As such, estimating CHI for dissemination blocks was useful in identifying
hotspots and potential areas of concern within the neighbourhood. This informa-
tion can collectively assist the community in their advocacy efforts.
6.4 Introduction
It is well established that air pollution has many associated health impacts, which
can range from asthma to premature mortality [30, 36, 66, 185]. Even low levels
of air pollution can be detrimental to human health; according to a recent Health
Effects Institute report, there is no concentration below which the negative health
impacts of PM2.5 are not observed [31].
The cumulative effect of chronic exposure to multiple pollutants can exacer-
bate these health impacts, even at levels below national benchmarks [259]. Addi-
tionally, cumulative effects of different pollutants can highlight different areas of
concern than when assessments are conducted for individual pollutants separately
[80, 217]. Since people breathe a mixture of air pollutants, cumulative assessment
can be more representative of human exposure. In Canada, the AQHI (Air Quality
Health Index) is used for health assessment of air quality, which includes cumula-
tive effects of PM2.5 , NO2 and O3 [87].
Air pollution concentrations have small-scale variations [25, 244] and can vary
greatly in space and time [19, 71, 130, 244]. Socially or economically marginalized
communities, including low-income communities, people of colour, and Indigenous
communities, are often disproportionately exposed to air pollution [54, 161, 183, 249].
In addition to environmental exposure, marginalized groups often also experience
social and political marginalization, which can be due to inequitable access to
healthcare and policy decisions, further increasing their vulnerability to the health
impacts of air pollution [175]. As such, risks arising from air pollution exposure
are not equitably distributed.
Canada, and British Columbia in particular, generally has good air quality,
but there are areas that are disproportionately impacted by higher concentrations
and that have larger potentially vulnerable populations
[81, 108, 121, 183]. One such area in British Columbia is the Downtown Eastside
and Strathcona neighbourhood in Vancouver, from both a geographic and demo-
graphic perspective. Geographically, in addition to typical Vancouver sources of
air pollution like residential wood burning and construction, Strathcona and the
Downtown Eastside are located next to the Port of Vancouver. The Port of Van-
couver is the largest port in Canada and in the top five in North America by tonnes
of cargo [56], and as such has significant associated ship, truck, and rail traffic.
These port activities are associated with emissions of particulate matter (PM2.5 )
and nitrogen oxides (NOx : NO and NO2 ) [82]. The neighbourhood is also adjacent
to major roadways and Downtown Vancouver, which further exacerbates air pol-
lution via light-duty vehicle emissions of NOx . From a demographic perspective,
Strathcona and the Downtown Eastside have a high proportion of low-income, un-
housed, immigrant, and Indigenous people [52]. The diversity and potential social
vulnerability of many residents in these neighbourhoods, combined with higher
than regional air pollution resulting from various local sources, means that the
people of Strathcona and Downtown Eastside may be disproportionately exposed
to air pollution.
However, assessing the cumulative hazard of air pollution in Strathcona is not
feasible with the current regulatory monitoring network. The City of Vancouver
has only one air quality monitoring station that measures all pollutants, making
it difficult to identify areas of concern within the city or specific neighbourhoods.
The sparse distribution of regulatory monitors is typically due to the high capital
and maintenance costs involved [45, 140, 206]. One potential solution for obtain-
ing more spatially representative data is the use of low-cost air quality sensors
(LCS). LCS cost a fraction of the price of regulatory stations [45] and can operate on battery
or solar power. This provides an opportunity for a denser sensor network, capable
of capturing small-scale variations in pollutant levels [26, 156, 182, 210].
In this study, we partnered with the Strathcona Residents Association (SRA) and
deployed 11 multi-pollutant low-cost sensors in the Strathcona and Downtown
Eastside neighbourhoods in Vancouver for a duration of 6 months in 2022, with
the aim of capturing the small-scale spatial variability in pollutant concentrations.
By comparing the measured concentrations with the average concentrations in the
broader Vancouver region, we investigated the effectiveness of using LCS to iden-
tify disparities in air quality. Additionally, we calculated cumulative hazard indices
(CHIs) as a method to identify hotspots and areas of concern within the neighbour-
hood.
6.5 Methodology
6.5.1 Study Area
Strathcona is a neighbourhood located within the City of Vancouver and is clas-
sified as one of the 22 planning areas [53], with a population of approximately
12,600, as of the 2016 census [52]. The neighbourhood encompasses mixed land
use types, including residential, commercial, and industrial areas. Residential ar-
eas consist of privately owned homes, as well as collective dwellings that house
20% of the population, including senior residences and single room occupancy
(SRO) hotels [52]. Strathcona is surrounded by industrial facilities on its south and
east sides (produce terminal, recycling facility, small chemical processing plants),
and a shipping yard on its north side (Centerm and Vanterm container terminals of
Port of Vancouver) [216]. The neighbourhood’s western border adjoins Downtown
Vancouver, another planning area of the City of Vancouver. The portion of Down-
town Vancouver adjacent to Strathcona is called the Downtown Eastside (DTES),
which is home to Chinatown and Gastown, the historic center of Vancouver [53].
Figure 6.1 shows the locations of major roadways, rail lines and industrial sources
within the study area (dashed black line).
In Strathcona, approximately 10% of the residents identify as Indigenous, which
is the highest proportion of any neighbourhood in Vancouver (city
average = 2.4%) [52]. Moreover, 52% of the population in Strathcona has household
incomes below the national poverty line, which is notably higher than the
citywide average of 20% [52]. According to a report by the City of Vancouver,
about 22% of the residents in Strathcona are unhoused or living in SROs, and are
not represented in the census demographics [52]. Strathcona and the DTES area
together account for 52% of the total unhoused individuals in the City of Vancouver
[147].
Figure 6.1: The Strathcona and Downtown Eastside neighbourhoods of Vancouver
that were studied in this work (black dashed line; 3 km x 1 km).
Green lines are the rail lines within the study area, and orange lines
highlight the major roads (line sources of air pollution). Red markers
identify major point sources of air pollution (port, industries). Blue star
markers are the deployment locations of the RAMPs.
6.5.2 Community Partner: Strathcona Residents Association
The Strathcona Residents Association (SRA) is a volunteer-based nonprofit orga-
nization that represents residents and workers in the Strathcona neighbourhood of
Vancouver. In 2019, the Port of Vancouver initiated construction activities to expand
the Centerm and Vanterm container terminals, aiming to boost their cargo
handling capacity by 50% [78]. This expansion is projected to lead to an increase
in the volume of ships, trains, and trucks passing through Strathcona. In 2021,
the SRA conducted a survey among its residents to assess their perspectives on
air quality in the neighbourhood. The survey revealed that out of 181 partici-
pants, 84% viewed the air quality as gradually declining. Furthermore, 79% of
the participants expressed being very concerned regarding exhaust emissions from
heavy-duty diesel trucks transporting shipping containers through the neighbour-
hood [216].
Recognizing these concerns, the study team at the University of British Columbia
(UBC) partnered with the SRA to assess the community’s concerns about their
air quality. We proposed a comprehensive plan to collect air quality data, sought
approval from the UBC ethics board (UBC Ethics ID: H21-02425) and secured
funding through the UBC Public Scholar Initiative. The initiative, run by the University
of British Columbia [238], supports doctoral students whose research
contributes to the public good.
6.5.3 Low-cost Sensors
The low-cost pollutant monitoring system used for this work is the Remote Air
Quality Monitoring Platform (RAMP, SENSIT Technologies), which costs around
CAD 4000 (less than 5% of the cost of regulatory grade monitors). The RAMP
package combines a power supply (battery-operated, solar powered or both), a SIM
card slot for online transmission of data via cellular network, a memory card for
data storage, and gas and particle sensors in a weatherproof enclosure. The RAMP
includes a commercial nephelometer to measure PM2.5 (Plantower PMS5003), and
electrochemical sensors for NO2 and O3 (Alphasense NO2-B43F and Alphasense
Ox-B431). It also records temperature (T) and relative humidity (RH). The RAMP
records data with a 15-second sampling resolution.
Since LCS systems need routine calibration across the full range of expected
meteorological conditions and pollutant concentrations during deployment to achieve
good performance [58, 146, 150, 160, 178], we collocated the RAMP sensors at
Metro Vancouver’s Clark Drive Near-Road regulatory monitoring station before
and after the campaign for a total of 62 days. Calibration models were built for
each RAMP using previously published calibration techniques [141, 142] (a mul-
tiple linear regression model for PM2.5 and hybrid random-forest-multiple-linear
regression model for NO2 and O3 ) after down-averaging the data to 15-minute res-
olution to reduce the effect of noise [142]. The performance of the calibration
models was assessed using two metrics: R2 (coefficient of determination, linear
least squares regression of calibrated concentrations versus observed concentra-
tions; higher is better) and MAE (mean absolute error; lower is better). We also
reported relative error in the calibration models by calculating CvMAE (coefficient
of variation of MAE; lower is better) using Equation 6.1. The calibration models
had varied performance across different pollutants, which is reported in Table 6.1.
\[
\mathrm{CvMAE} = \frac{\frac{1}{n}\sum_{i=1}^{n}\left|\text{Calibrated value}_i - \text{Observed value}_i\right|}{\text{Average observed concentrations}} \tag{6.1}
\]
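As an illustration of the three metrics described above, the following is a minimal Python sketch. It is not the code used in this work: the function names and the example values are hypothetical, and R2 is computed here as the squared Pearson correlation, which equals the R2 of a simple linear least squares regression of calibrated versus observed concentrations.

```python
from statistics import mean


def mae(calibrated, observed):
    """Mean absolute error between calibrated and observed values."""
    return mean(abs(c - o) for c, o in zip(calibrated, observed))


def cv_mae(calibrated, observed):
    """CvMAE (Equation 6.1): MAE divided by the average observed value."""
    return mae(calibrated, observed) / mean(observed)


def r_squared(calibrated, observed):
    """R2 of a simple linear regression (squared Pearson correlation)."""
    mc, mo = mean(calibrated), mean(observed)
    sxy = sum((c - mc) * (o - mo) for c, o in zip(calibrated, observed))
    sxx = sum((c - mc) ** 2 for c in calibrated)
    syy = sum((o - mo) ** 2 for o in observed)
    return sxy ** 2 / (sxx * syy)


# Hypothetical withheld collocation data (same units for both lists).
observed = [10.0, 20.0, 30.0, 40.0]
calibrated = [12.0, 18.0, 33.0, 39.0]

example_mae = mae(calibrated, observed)
example_cv_mae = cv_mae(calibrated, observed)
example_r2 = r_squared(calibrated, observed)
```

Because CvMAE divides by the mean observed concentration, it allows errors to be compared across pollutants reported in different units.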
Table 6.1: Mean and standard deviations (S.D.) for the performances of calibration
models on withheld data from collocation period.

Pollutant   R2              MAE             CvMAE
            Mean    S.D.    Mean    S.D.    Mean    S.D.
PM2.5       0.70    0.17    1.56    0.79    0.29    0.06
NO2         0.61    0.06    4.02    0.24    0.22    0.02
O3          0.85    0.04    2.83    0.52    0.25    0.04
6.5.4 Site Selection
The design of this study was rooted in our belief that community members are the
most knowledgeable of their spaces. As such, we conducted a walking tour of the
neighbourhood with representatives from the SRA. During this tour, we identified
potential emission sources (major roadways, rail lines, construction), and recep-
tor locations (cycling routes, parks, schools, senior housing facilities, Indigenous
daycares, and unhoused communities). Informed by the insights gained from the
walking tour, the SRA compiled a list of potential hosts for the study and contacted
them. There were two key logistical requirements for inclusion in the study, which
were communicated to the residents or business owners. Firstly, the chosen resi-
dences or businesses needed to have an outdoor area with access to a power supply
to ensure continuous operation of the RAMPs. Secondly, residents needed
to occupy their homes at least 75% of the time, and businesses needed to be operational
on at least 75% of working days, so that we could access the RAMPs
for maintenance purposes.
Eleven prospective hosts with residences in the neighbourhood expressed interest in
participating. In alignment with COVID-19 protocols, we conducted virtual tours
with each host to obtain their consent and discuss the logistics associated with
deploying the RAMPs at their respective households. After careful consideration,
a total of seven hosts were selected for RAMP deployment based on their proximity
to pollution sources or their representation of vulnerable populations. The selected
deployment locations are as follows: (1) near a rail line to monitor rail emissions,
(2) on Hastings Street, a major roadway in the neighbourhood, to monitor truck
and road traffic-related pollutants, (3) across from Strathcona Park to represent
individuals engaging in outdoor physical activities, (4) Union Street, a prominent
biking route, (5) two RAMPs near an elementary school and a community center
and (6) near a cluster of low-income households. All RAMPs were deployed at
ground level.
In addition to the residential deployments, four RAMPs were deployed at busi-
nesses in the study area, to provide additional insights into the neighbourhood air
quality, particularly in areas influenced by commercial activities and community-
focused establishments. These businesses themselves did not generate pollution.
The RAMP placements at commercial businesses were as follows: (1) second floor
of a yoga studio in Chinatown, located on Main Street, which experiences high
commercial foot and vehicle traffic, (2) a community garden on Hastings Street,
(3) the rooftop located on the fourth floor of a community-center hub on Main
Street, and (4) the rooftop located on the fourth floor of a veteran’s housing soci-
ety. Ideally, it would have been preferable to deploy all of the RAMPs at ground
level. However, due to safety concerns related to theft prevention and logistics of
installing RAMPs, these organizations did not have access to suitable spots on the
ground level. As a result, two RAMPs were deployed at an elevated area, approxi-
mately 10-15 meters high. Figure 6.1 shows the approximate location of the study
RAMPs (blue stars).
6.5.5 Data Collection and Processing
RAMPs were deployed in the backyards of residents and businesses in Strathcona
and Downtown Eastside between April and November of 2022 to collect PM2.5 ,
NO2 and O3 concentrations. Regulatory data for comparison was obtained for
four Metro Vancouver neighbourhood monitoring stations in Burnaby (stations:
Burnaby South and Burnaby Kensington), North Vancouver (station: Mahon Park)
and Richmond (station: Richmond South). Population data for each dissemination
block (DBs; smallest geographic area for which population counts are disseminated
in Canada) was extracted from census data provided by the Canadian government,
and is visualized in Appendix D.1.
Metro Vancouver’s (MV) ambient air quality objective [153] for each pollutant
was used as the denominator for comparison, with PM2.5, NO2 and O3 benchmark
values of 25 µg m−3, 60 ppb (daily maximum 1-hour concentration) and 62
ppb (daily maximum 8-hour concentration), respectively (Table 6.2). To facilitate
analysis, we down-averaged the calibrated data to match the time resolutions and
criteria of the benchmark concentrations. For example, 15-minute calibrated NO2
concentrations were down-averaged to 1-hour concentrations, and then the daily
maximum 1-hour concentrations were used as the predicted value for each calen-
dar day.
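The NO2 example above can be sketched in Python as follows. This is a minimal illustration rather than the processing code used in this work: the function names and sample values are hypothetical, and the sketch assumes that every clock hour with at least one 15-minute value is averaged (no per-hour completeness threshold is applied).

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean


def hourly_means(samples):
    """Average (timestamp, value) samples into clock-hour means."""
    buckets = defaultdict(list)
    for ts, value in samples:
        hour = ts.replace(minute=0, second=0, microsecond=0)
        buckets[hour].append(value)
    return {hour: mean(vals) for hour, vals in buckets.items()}


def daily_max_1h(samples):
    """Daily maximum of the 1-hour means, keyed by calendar day."""
    per_day = defaultdict(list)
    for hour, avg in hourly_means(samples).items():
        per_day[hour.date()].append(avg)
    return {day: max(vals) for day, vals in per_day.items()}


# Hypothetical example: eight 15-minute NO2 values (ppb) covering two hours.
start = datetime(2022, 5, 1, 0, 0)
values = [10, 10, 10, 10, 20, 20, 40, 40]
samples = [(start + timedelta(minutes=15 * k), v) for k, v in enumerate(values)]
daily = daily_max_1h(samples)
```

The same pattern extends to O3 by replacing the clock-hour mean with a rolling 8-hour mean before taking the daily maximum.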
Since there are measurement uncertainties associated with the RAMP sensors
(Table 6.1), the comparison with the average regional MV concentrations was
also conducted with datasets that incorporated these sensor uncertainties. To accomplish
this, we estimated residuals for each decile concentration bin and subtracted
them from or added them to the corresponding calibrated concentrations. This approach
was employed because the error in RAMP measurements depends on the concen-
tration, with greater uncertainties observed at lower concentrations [141, 271]. The
error-informed dataset was generated through the following steps:
1. The collocation data from the RAMPs were divided into decile bins and
residuals were calculated for each bin (Equation 6.2).
\[
\%\,\mathrm{error}_{bin} = \frac{\left|\text{Calibrated concentration} - \text{Reference concentration}\right|}{\text{Reference concentration}} \times 100\% \tag{6.2}
\]
2. The median % error was subtracted from or added to the calibrated data for the
deployment period to generate the error-informed datasets (Equations 6.3 and 6.4).
Boxplots of the % error are shown in Appendix D.2.
\[
\text{Lower bound} = \text{Calibrated concentration}\,\left(1 - \mathrm{error}\,(\%)\right) \tag{6.3}
\]
\[
\text{Upper bound} = \text{Calibrated concentration}\,\left(1 + \mathrm{error}\,(\%)\right) \tag{6.4}
\]
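The two steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the code used in this work: the function names and data are hypothetical, the decile bins are assumed to be defined on the calibrated concentrations (so that deployment data can be assigned to a bin), and the % error is converted to a fraction before Equations 6.3 and 6.4 are applied.

```python
from bisect import bisect_right
from statistics import median, quantiles


def fit_bin_errors(calibrated, reference):
    """Median % error (Equation 6.2) for each decile bin of collocation data."""
    edges = quantiles(calibrated, n=10)  # nine cut points define ten bins
    pct_errors = [abs(c - r) / r * 100.0 for c, r in zip(calibrated, reference)]
    per_bin = [[] for _ in range(10)]
    for c, pct in zip(calibrated, pct_errors):
        per_bin[bisect_right(edges, c)].append(pct)
    fallback = median(pct_errors)  # assumption: used only if a bin is empty
    return edges, [median(b) if b else fallback for b in per_bin]


def error_bounds(value, edges, bin_errors):
    """Lower and upper bounds (Equations 6.3 and 6.4) for one calibrated value."""
    err = bin_errors[bisect_right(edges, value)] / 100.0
    return value * (1.0 - err), value * (1.0 + err)


# Hypothetical collocation data: 50% error at low and 10% at high concentrations.
calibrated = [float(c) for c in range(1, 21)]
reference = [c / 1.5 if c <= 10 else c / 1.1 for c in calibrated]
edges, bin_errors = fit_bin_errors(calibrated, reference)

low_lower, low_upper = error_bounds(4.0, edges, bin_errors)
high_lower, high_upper = error_bounds(20.0, edges, bin_errors)
```

In this example the bounds are wider in relative terms for the low concentration, which mirrors the concentration-dependent uncertainty described above.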
6.5.6 Spatial Modeling
Air pollution studies have often used various interpolation techniques to estimate
concentrations in unsampled areas [84, 166]. Interpolation involves applying math-
ematical processes to the measured concentrations to estimate values across a con-
tinuous spatial field. One commonly used interpolation technique is kriging [40,
170, 260], which takes into account autocorrelation in the data, unlike other tech-
niques such as IDW (inverse distance weighting) or spline.
Kriging operates on the principle that nearby points have a higher influence on an
estimate than distant points [166]. It also takes clustering into account, whereby
clusters of points are given less weight to reduce bias in predictions. The kriging
process involves two steps: (1) fitting a variogram, which is a visual representation
of the covariance between each pair of points in the sampled data, to determine
the spatial covariance; and (2) using the spatial covariance to derive weights for
interpolating values (Equation 6.5). As such, there are two principles that influ-
ence kriging weights - 1) points nearby to the location of interest are given more
weight than those further away and 2) clustered points are weighted lower (i.e.,
they contain less information than single points).
\[
z_y = \sum_{i=1}^{N} \lambda_i z_{x_i} \tag{6.5}
\]
In Equation 6.5, $N$ is the number of measured values, $z_y$ is the predicted value z
at point y, $\lambda_i$ are the kriging weights and $z_{x_i}$ is the observed concentration z at sampled
point $x_i$.
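To make the two steps concrete, the following Python sketch solves the ordinary kriging system for the weights in Equation 6.5. It is an illustration under stated assumptions, not the implementation used in this work: the exponential variogram and its parameters (sill, range, nugget) are hypothetical stand-ins for a fitted variogram, and the sensor coordinates and concentrations are invented.

```python
import math


def variogram(h, sill=1.0, rng=500.0, nugget=0.0):
    """Hypothetical exponential variogram; h and rng are in metres."""
    if h == 0:
        return 0.0
    return nugget + sill * (1.0 - math.exp(-3.0 * h / rng))


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def kriging_weights(points, target):
    """Ordinary kriging weights; the extra row and column force them to sum to 1."""
    n = len(points)
    a = [[variogram(math.dist(points[i], points[j])) for j in range(n)] + [1.0]
         for i in range(n)]
    a.append([1.0] * n + [0.0])
    b = [variogram(math.dist(p, target)) for p in points] + [1.0]
    return solve(a, b)[:n]  # drop the Lagrange multiplier


def krige(points, values, target):
    """Equation 6.5: weighted sum of the observed values."""
    weights = kriging_weights(points, target)
    return sum(w * z for w, z in zip(weights, values))


# Hypothetical sensors: three clustered near the origin, one farther away.
points = [(0.0, 0.0), (100.0, 0.0), (0.0, 100.0), (400.0, 400.0)]
values = [20.0, 22.0, 21.0, 15.0]
weights = kriging_weights(points, (50.0, 50.0))
estimate = krige(points, values, (50.0, 50.0))
```

With a zero nugget, the estimate at a sampled location reproduces the observed value, and the weights always sum to one, which is what makes the estimator unbiased under the stationarity assumption.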
However, kriging suffers from assumptions of both linearity (uniformity in all
directions) and stationarity (stationary mean and variance across the study space)
[174], and is less accurate than other more complex methods that incorporate ad-
ditional data, such as land use models [6, 152]. Nevertheless, ordinary kriging
has been widely used in environmental justice research due to its ease of imple-
mentation and lack of additional data requirements [40, 77, 108, 170, 260]. For
this project, we opted for ordinary kriging as a simpler spatial method to priori-
tize solutions that communities can independently construct and that do not rely on
complex data inputs.
In this work, kriging models were applied to estimate daily concentrations first
at each 25 m grid distance and then were averaged to concentrations at each dis-
semination block.
6.5.7 Estimating Cumulative Air Pollution Impacts
Composite measures of sustainability have previously been calculated by aggre-
gating indicators [117, 165] using approaches such as multiplicative, additive, bi-
nary/threshold and mixed aggregation [80, 269]. These approaches have also been
adopted to measure the unequal distribution of environmental hazards caused by
multiple pollutants [80, 217]. For this work, we aggregated multiple air pollu-
tants to calculate a cumulative hazard index (CHI); the CHI is then used to identify
hotspots within the study area. CHI was calculated using the multiplicative and
additive methods (Equations 6.6 and 6.8).
\[
\mathrm{CHI}_{\text{Multiplicative},\,j} = \prod_{i=1}^{3} r^{norm}_{i,j} \tag{6.6}
\]
In Equation 6.6, $r^{norm}_{i,j}$ is the normalized CHI of the pollutant i at dissemination
block j. $r^{norm}_{i,j}$ is calculated by first dividing the air pollutant concentration by the
MV air quality objective (benchmark value) to account for different measurement
units [165], and then scaling the data by the population so that all the pollutants are
on the same scale [217]. The process is shown in Equation 6.7, where $c_{i,j}$ is the
pollutant i concentration at the dissemination block j, $s_i$ is the benchmark value
for the pollutant, $p_j$ is the population of the dissemination block j and $r_{i,j}$ is the
normalized pollutant i concentration.
\[
r^{norm}_{i,j} = \frac{r_{i,j}}{\sum (r_{i,j} \ast p_j) / \sum p_j} \quad \text{where} \quad r_{i,j} = \frac{c_{i,j}}{s_i} \tag{6.7}
\]
The multiplicative CHI was built with the assumption that there is an inter-
action between different pollutants [241]. We also calculated the Additive CHI
(Equation 6.8), as it assumes no interaction between pollutants [205] and therefore
can also indicate areas where individual pollutants are high.
\[
\mathrm{CHI}_{\text{Additive},\,j} = \sum_{i=1}^{3} r^{norm}_{i,j} \tag{6.8}
\]
The $r^{norm}_{i,j}$ values were scaled to have a mean of 1; therefore, the mean Multiplicative
CHI is expected to be 1 (1x1x1 for the three pollutants) and the mean
Additive CHI is expected to be 3 (1+1+1 for the three pollutants). Higher Multiplicative
CHI values indicate a higher cumulative impact of pollutants in a particular
area, whereas higher Additive CHI values indicate areas where individual
pollutants exhibit high concentrations and contribute to a higher cumulative impact.
This approach allows for the assessment of hyper-local air quality patterns
(intra-neighbourhood variability) and can be used to identify hotspots.
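Equations 6.6 to 6.8 can be sketched in Python as follows. This is a minimal illustration, not the code used in this work: the function names, block concentrations and populations are hypothetical, while the benchmark values are the MV objectives quoted earlier (25 µg m−3, 60 ppb and 62 ppb). The sums in Equation 6.7 are assumed to run over all dissemination blocks.

```python
from math import prod


def normalize(conc, benchmark, population):
    """Equation 6.7: benchmark ratios scaled by their population-weighted mean."""
    ratios = [c / benchmark for c in conc]
    weighted_mean = sum(r * p for r, p in zip(ratios, population)) / sum(population)
    return [r / weighted_mean for r in ratios]


def cumulative_hazard(concs, benchmarks, population):
    """Return (multiplicative, additive) CHI lists, one value per block."""
    norm = [normalize(concs[pol], benchmarks[pol], population) for pol in concs]
    n_blocks = len(population)
    multiplicative = [prod(r[j] for r in norm) for j in range(n_blocks)]  # Eq. 6.6
    additive = [sum(r[j] for r in norm) for j in range(n_blocks)]  # Eq. 6.8
    return multiplicative, additive


benchmarks = {"PM2.5": 25.0, "NO2": 60.0, "O3": 62.0}

# Hypothetical concentrations for three dissemination blocks.
concs = {
    "PM2.5": [4.0, 5.0, 6.0],
    "NO2": [20.0, 25.0, 15.0],
    "O3": [28.0, 28.0, 28.0],
}
population = [100, 200, 100]

chi_mult, chi_add = cumulative_hazard(concs, benchmarks, population)
hotspot = max(range(len(population)), key=lambda j: chi_mult[j])
```

In this sketch the population-weighted mean of the additive index equals 3 by construction, so a block with a value above 3 is above the neighbourhood average for the pollutants combined.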
6.6.1 Data Summary
During the campaign, six of the 11 total RAMPs experienced some degree of mal-
function, possibly due to power loss, sensor degradation, or a failure in the data
logging/transmission system. Unfortunately, we were unable to address the mal-
functioning sensors effectively, due to various reasons, including scheduling con-
flicts. Two RAMPs underwent sensor degradation and reported data with quality
issues (e.g., uncharacteristically high, zero, or no readings). Furthermore, at the
beginning of a renovation period at the community garden, the charging cable for
one of the RAMP sensors was cut. We decided against redeploying the RAMP to
avoid the influence of construction on the overall data collection. Consequently,
only 53% of the originally planned data was collected.
Since the missing data was not sporadic (e.g., some sensors malfunctioned and were
never fixed), we applied a criterion for data completeness for the analysis. Specifically,
we considered only those days with at least eight functioning RAMP sensors,
which corresponds to a data completeness of approximately 75% (a benchmark suggested in the
low-cost sensor guidelines provided by the US Environmental Protection Agency
(US EPA) [251]). Therefore, for the analysis in this work, we used a dataset consisting
of 119 days of RAMP data, collected between April 27, 2022, and August
23, 2022, which met our completeness criteria.
During the period when all 11 RAMPs were operational, we conducted Monte
Carlo simulations to assess the sensitivity of the model to the presence or absence
of each sensor. We predicted daily concentrations at every grid by removing one
sensor at a time and compared these predictions to the predictions when all RAMPs
were used. We reported the p-value of the mean differences in the two datasets
(difference in predicted concentrations when test RAMP is excluded and when test
RAMP is included), and repeated the process 11 times for 11 RAMPs. Through
these simulations, we identified three RAMPs as critical, as their absence resulted
in statistically significant differences (p < 0.05) in the predictions. While two of the
critical sensors remained operational throughout the campaign period, one sensor
stopped working on June 10th. It is likely that having all critical sensors operational
would have led to more accurate predictions, which could potentially affect the CHI
estimates. We acknowledge this as a limitation of our work and emphasize the
importance of identifying critical sensors early on to ensure data completeness in
future studies. Furthermore, it is worth noting that two sensors were deployed at an
elevated height, which may have impacted the measured pollutant concentrations.
A study by Wu et al. [258] reported PM2.5 concentration decays of up to 73% at a
height of 19 m. As such, concentrations at the ground level are likely to be higher
than those reported at 10-15 m. The effect of height and associated micro-climate
was not considered in our analysis, and we recognize this as a limitation of our
work.
All sensors across all days recorded data below the Metro Vancouver Air Quality
Objectives (Table 6.2). The average 24-hour PM2.5 concentration was 4.6
µg m−3 [10th-90th percentile: 2.2-6.4 µg m−3], with highest concentrations observed
on June 30th when a fire broke out in the neighbourhood [120]. The average
daily 1-hour maximum NO2 across all sensors was 21.7 ppb [10th-90th percentile:
14.9-28.7 ppb], with diurnal peaks observed around 7-8 AM only on weekdays
(see Appendix D.3 for diurnal plots), suggesting contribution from morning rush
hour traffic. The average daily 8-hour maximum O3 concentration was 28.1 ppb
[10th-90th percentile: 17.4-39.2 ppb], with diurnal peaks observed in the afternoon.
This is expected, as tropospheric ozone is a secondary air pollutant that is formed
photochemically in the atmosphere from the reactions of NOx and volatile organic
compounds (VOCs) [200]. By comparison, across four neighbourhood MV regulatory
monitoring stations in the region, average PM2.5, NO2 and O3 concentrations
were 4.6 µg m−3, 16.3 ppb and 27.5 ppb respectively (Table 6.2).
Table 6.2: PM2.5 (daily average), NO2 (daily 1-hour maximum) and O3 (daily
8-hour maximum) MV air quality objectives and reported concentrations
across four MV stations. The values reported are average during the deployment
period, and the numbers in brackets are 10th and 90th percentile
concentrations.

Station                   PM2.5 (µg m−3)    NO2 (ppb)           O3 (ppb)
Air Quality Objectives    25α               60β                 62γ
Burnaby South             5.1 (2.3-8.9)     16.1 (9.7-23.2)     28.4 (20.1-37.5)
Burnaby Kensington        4.6 (1.9-7.7)     16.0 (9.2-24.8)     26.4 (18.9-34.7)
Richmond South            4.4 (1.8-7.6)     16.3 (9.2-23.7)     29.4 (21.1-39.3)
Mahon Park                4.3 (1.9-7.3)     16.7 (7.8-28.3)     25.7 (17.5-35.3)
Average (MV Stations)     4.6 (2.0-6.5)     16.3 (9.0-25.0)     27.5 (19.4-36.7)
Average (RAMPs)           4.6 (2.2-6.4)     21.7 (14.9-28.7)    28.1 (17.4-39.2)
α: Achievement based on rolling average.
β : Achievement based on annual 98th percentile of the daily maximum 1-hour
concentration, averaged over three consecutive years.
γ: Achievement based on annual 4th highest daily maximum 8-hour concentration,
averaged over three consecutive years.
6.6.2 Inter-neighbourhood Variability
We conducted a comparison between the concentrations of each pollutant across
all operational RAMPs within the neighbourhood and the average concentrations
measured at four neighbourhood regulatory monitoring stations in the MV region
(Table 6.2). The comparison was made on the same time-scale as the air quality
objectives. Specifically, we compared PM2.5 concentrations on a daily basis, NO2
concentrations on the maximum 1-hour concentration over the day, and O3 con-
centrations on the maximum 8-hour concentration over the day. To account for
the measurement uncertainties associated with the RAMP sensors, we also com-
pared the error-informed datasets (Section 6.5.5). We then calculated the number
of RAMPs (and the corresponding ratio) that exceeded the average MV regional
concentrations for each pollutant individually, as well as for all pollutants com-
bined.
Figure 6.2: Calendar plots for the ratio of sensors exceeding average MV re-
gional concentrations for lower bound, calibrated LCS and upper bound
datasets for PM2.5 (plots A-C), NO2 (plots D-F) and O3 (plots G-I). Ra-
tio=0 (blue) indicates that none of the sensor readings exceeded MV
concentrations, whereas ratio=1 (red) indicates that all the operational
sensors exceeded MV averages. The calendar plot at the bottom for each
category (plots J-L), shows the ratio of sensors exceeding average MV
regional concentrations for the three pollutants together (additive form;
combined results from the three plots above). Ratio=0 (blue) indicates
that no pollutant across all the sensors exceeded MV concentrations,
whereas ratio=3 (red) indicates that all the pollutants across all the sen-
sors exceeded MV concentrations.
Figure 6.2 panels A-I illustrate the fraction of RAMPs exceeding the average
concentrations in the MV region for PM2.5, NO2, and O3, respectively. A pollutant was
considered to be ‘exceeding’ the regional average if more than 50% of the RAMPs
exceeded the average concentrations in the MV region (ratio ≥ 0.5 in panels A-I).
During the 119-day study period, the concentrations of O3 and PM2.5 in the neigh-
bourhood exceeded the average concentrations in the MV region on 58 (lower - up-
per bound: 28-92) and 62 (lower - upper bound: 36-87) days, respectively. On the
other hand, NO2 concentrations in the neighbourhood exceeded the average con-
centrations on almost every day (113 days; lower - upper bound: 105-117), with
an average conservative estimate (lower bound dataset) across all RAMPs of 19.6
ppb (13.8-25.0 ppb), 3 ppb higher than average MV concentrations (Table 6.2).
This indicates that residents in the study area experienced higher NO2 concentra-
tions compared to the regional average. For a detailed breakdown of the number of
days each RAMP exceeded the MV averages, please refer to Appendix D.4. Ad-
ditionally, at least two pollutants exceeded the average MV concentrations on 93
out of the total 119 days, reinforcing the hypothesis that neighbourhood residents
disproportionately experience higher levels of air pollution.
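The exceedance counting described above can be sketched in Python as follows. This is a minimal illustration, not the code used in this work: the function names and daily values are hypothetical, missing sensor readings are represented by None, and the sketch assumes the ratio ≥ 0.5 criterion given for panels A-I (the text also describes this as more than 50% of RAMPs, so the treatment of a ratio of exactly 0.5 is an assumption).

```python
def exceedance_ratio(sensor_values, regional_mean):
    """Fraction of operational sensors reading above the regional mean."""
    valid = [v for v in sensor_values if v is not None]
    if not valid:
        return None  # no operational sensors on this day
    return sum(v > regional_mean for v in valid) / len(valid)


def count_exceedance_days(daily_sensor_values, daily_regional_means, threshold=0.5):
    """Number of days on which the exceedance ratio reaches the threshold."""
    days = 0
    for sensors, regional in zip(daily_sensor_values, daily_regional_means):
        ratio = exceedance_ratio(sensors, regional)
        if ratio is not None and ratio >= threshold:
            days += 1
    return days


# Hypothetical daily 1-hour maximum NO2 (ppb) for four sensors over three days.
daily_sensor_values = [
    [22.0, 18.0, None, 20.0],  # one sensor offline; all others exceed
    [15.0, 17.0, 14.0, 16.0],  # one of four exceeds
    [17.0, 16.0, 18.0, 15.0],  # two of four exceed
]
daily_regional_means = [16.3, 16.3, 16.3]

ratios = [exceedance_ratio(s, r)
          for s, r in zip(daily_sensor_values, daily_regional_means)]
n_days = count_exceedance_days(daily_sensor_values, daily_regional_means)
```

Running the same count on the lower bound and upper bound datasets gives the ranges reported alongside each day count.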
The persistently high levels of NO2 in the area, even after accounting for sensor
uncertainties, raise concerns about air quality. Vehicle traffic in the Lower Fraser
Valley is the primary source of NOx , contributing approximately 63% to the over-
all pollution levels [67]. Among vehicular sources, heavy-duty diesel trucks are
considered the most significant contributors to traffic-related air pollution (TRAP).
This is supported by a study conducted by Metro Vancouver at the Clark Drive
Near-Road monitoring station (approximately 3 km from the nearest deployed
RAMP, and where the RAMPs were calibrated), that found that the vehicle type,
particularly heavy-duty diesel trucks, rather than total traffic volume, was the main
contributor to the amount and type of air contaminants associated with major road-
ways in the area [67]. In 2017, at the Clark Drive station, which can be used as a
proxy for our study area, heavy-duty diesel trucks comprised 18% of the total vehicle
fleet, six times higher than the regional fleet percentage of 3% [67]. The planned
expansions at Vanterm and Centerm, set to increase the port capacity by 50% [78],
are expected to result in a further rise in shipping-related traffic, including trucks,
within the study area. It is important to consider the potential consequences of
these increases in TRAP, as elevated TRAP not only suggests increased NO2
levels but also has implications for elevated O3 levels, especially during the summer
months (reflected in Figure 6.2(c); June-August).
The results of this analysis support the community’s concerns regarding poorer
air quality and highlight the potential of LCS monitoring as a useful tool for identi-
fying disparities in air quality. These findings can also support communities in their
advocacy efforts for improved air quality by providing quantitative evidence of
their concerns. Targeted policies aimed at reducing emissions from traffic sources,
particularly trucks, could help mitigate overall air pollution levels in the neigh-
bourhood. One specific policy approach that members of the SRA have been ad-
vocating for is the phasing out of pre-2007 trucks, as part of the Vancouver Fraser
Port Authority’s Rolling Truck Age Program. This initiative aims to address the
higher emissions from older trucks which lack current emission control technolo-
gies [122].
Previous studies have consistently shown that downtown Vancouver and sur-
rounding neighbourhoods have higher annual average NO2 concentrations com-
pared to the broader MV area [80, 183, 202]. Wang et al. [245] reported an av-
erage concentration of 10.8 ppb in 2010, estimated using land use regression for
each dissemination area, and highest concentrations in Downtown Vancouver. Gi-
ang and Castellani [80] used annual air quality datasets for 2012 and predicted
concentrations for each dissemination area in the city of Vancouver. The study
reported average concentrations in the study area for NO2 and O3 to be approxi-
mately 27 ppb and 30 ppb (40%; city average=21.4 ppb), higher than city averages
by 70% and 40% respectively. A study conducted by MV for 2017 reported approximately
9 ppb and 6 ppb higher concentrations of NO2 at the Clark Drive and
Downtown monitoring stations respectively, when compared to five
other neighbourhood MV stations (average=13 ppb) [67]. Our study, conducted
in 2022, shows a reduction in average NO2 and O3 concentrations in the MV
region compared to Wang et al. [245], Giang and Castellani [80] and Doerksen
et al. [67] (average NO2 = 9.2 ppb and O3 = 18.8 ppb); however, the Strathcona and
DTES areas still experience higher NO2 concentrations (average = 14.2 ppb across
11 RAMPs). Additionally, the reported averages of our work are for summer only.
Since NO2 concentrations are typically higher in winter due to lower atmospheric
mixing height and increased heating [67, 194], the average annual concentrations
are likely to be higher. Furthermore, Doerksen et al. [67] reported an increase in
annual NO2 concentrations from 2015 to 2017 across 7 out of 8 monitoring stations
in the region.
6.6.3 Intra-neighbourhood Variability
We calculated Multiplicative and Additive CHIs for each dissemination block and
each day of the study period to assess the intra-neighbourhood variability. Spa-
tial maps generated using both additive and multiplicative CHIs exhibited similar
patterns, although the additive CHIs showed generally less spatial variation (coefficients
of variation: Additive CHI = 0.05; Multiplicative CHI = 0.16; more descriptive
statistics can be found in Appendix D.5), which aligns with the findings
from previous studies [80, 205, 217]. The higher degree of variation observed in
the multiplicative CHIs suggests a relatively non-homogeneous distribution of CHI
values across the neighbourhood.
Based on the CHI analysis, two areas were consistently identified as hotspots
during the study period: the western and eastern periphery of the study area. The
western periphery of the study region includes Main Street and Chinatown, has a
high residential (population distribution in each DB is shown in Appendix D.1) and
commercial density, and has a large population of unhoused individuals [147]. The
eastern periphery of the study area includes Commercial Drive, which has high
foot traffic and road traffic due to high commercial density. The prevailing wind
in the region blows from the east (land breeze), followed by winds from the west
(sea breeze; see Figure D.6). This influence of wind is highlighted in Figure 6.3,
which illustrates the Multiplicative CHI over a one-week period. Depending on the
day and time of day, different parts of the neighbourhood, specifically the eastern
and western peripheries, are located downwind and experience elevated CHI values.
Aggregated Multiplicative and Additive CHIs over the whole deployment period are
shown in Appendix D.6.
The results of this analysis provide valuable insights into the occurrence of air
pollution hotspots within the neighborhood. Residents can use these findings to
make informed decisions and minimize their exposure to pollutants. For instance,
they can choose to exercise in parks located in less polluted areas in the middle
of the neighborhood rather than those at the periphery, or choose bike routes that
avoid elevated pollution areas within the neighbourhood. Building on this work,
one potential application is the development of a community dashboard that
incorporates simple geospatial models like kriging, intended for communities that
deploy a network of sensors. This dashboard would allow residents to input their
own pollution and location data and visualize the hotspots in their specific areas.
Such a tool would empower individuals to take proactive measures to protect their health
and make informed choices regarding their daily activities.
[Figure 6.3 image: seven map panels of the study area, one per day (27/04, 28/04,
29/04, 30/04, 01/05, 02/05 and 03/05), with a scale bar in km.]
Figure 6.3: Multiplicative CHI for the first week of deployment (April 27-May 3,
2022). The arrow at the bottom of each subplot is the prevailing wind direction
for the day.
A few studies have investigated the intra-neighbourhood variability of individual
pollutants. Shakya et al. [203] assessed PM2.5 in five separate neighbourhoods in
Philadelphia (USA) using mobile sampling conducted for 2-4 hours each day. Within
a single day, the study found variability within a neighbourhood to be as high as
17 µg m−3. Tunno et al. [225] assessed intra-neighbourhood PM2.5 variability in
Braddock (Pittsburgh, USA) using mobile monitoring and found that average measured
concentrations varied between 42 and 55 µg m−3 within the neighbourhood. In line
with these findings, our study collected data from 11 different locations within
the neighbourhood and found daily average PM2.5 concentrations to vary by as much
as 7 µg m−3. Li et al. [131] conducted mobile sampling in Pittsburgh (USA) and
reported that NO2 exhibited within-neighbourhood spatial variation, with hotspots
elevated by up to 20 ppb above the regional background concentrations (7 ppb).
This supports the findings of our study; we observed intra-neighbourhood
variability as high as 36 ppb in daily maximum 1-hour NO2 concentration over the
deployment period. However, although intra-neighbourhood variability has previously
been studied for individual pollutants, we could not identify any studies that
addressed hotspots for the cumulative effects of different pollutants. Furthermore,
previous studies have often relied on mobile monitoring to assess
intra-neighbourhood variability, which may not be easily adopted by communities
due to the associated costs and the technical expertise required to establish
and maintain such mobile monitors.
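The intra-neighbourhood variability figures quoted above are, in essence, the
spread between monitoring locations on a given day. A minimal sketch of that
calculation follows, assuming that variability is taken as the maximum minus the
minimum of the daily values across sensors; the sensor names and concentrations
are hypothetical and are not study data.

```python
def daily_range(daily_values):
    """Map each day to (max - min) of the per-sensor daily values."""
    return {day: max(by_sensor.values()) - min(by_sensor.values())
            for day, by_sensor in daily_values.items()}

# Hypothetical daily average PM2.5 (ug/m3) per sensor; not study data.
pm25 = {
    "2022-05-01": {"RAMP_A": 4.0, "RAMP_B": 6.5, "RAMP_C": 5.0},
    "2022-05-02": {"RAMP_A": 3.0, "RAMP_B": 10.0, "RAMP_C": 7.5},
}

ranges = daily_range(pm25)
largest_day = max(ranges, key=ranges.get)  # day with the widest spread
```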
6.7 Conclusion
This work reported the intra- and inter-neighbourhood variabilities in air pollution
within the Strathcona and DTES neighbourhoods in Vancouver. To achieve this,
we deployed 11 LCS within the neighbourhoods and collected pollution data to
capture various sources and receptors. The findings of this study support
the hypothesis that LCS can serve as valuable tools for air pollution monitoring
and neighbourhood-level assessments for communities, and highlight that some
neighbourhoods in a city may experience higher pollutant concentrations than the
regional average.
The findings of this study provide evidence supporting the use of LCS by com-
munities to gain a better understanding of their local air quality. We conducted
a comparison of pollutant concentrations within the neighborhood with the av-
erage regional levels, which provided valuable insights into the extent to which
the neighbourhood concentrations deviated. Moreover, these findings support the
community’s concerns regarding air quality and can potentially serve as a basis
for advocating for improved traffic-related policies. The study also highlights the
significance of LCS as a valuable tool that communities can use to identify ar-
eas of concern within their neighbourhoods and make informed decisions towards
improving their overall exposure to pollutants.
There are several limitations to this work. Firstly, as previously mentioned, two
sensors were deployed at an elevated height, and the study did not consider the
impact of height and micro-climate in identifying hotspots. Secondly, this study
investigated intra-neighbourhood variability using daily values, in line with
the time resolution of the air quality objectives set by MV; exploring sub-daily
concentrations could provide insights into different areas of concern. Lastly,
we used kriging to create spatial models and identify hotspots because it is a
solution that communities can adopt. While kriging is a useful tool for estimating
concentrations in unsampled areas, it is inherently less accurate than more complex
models. As such, for future work where accuracy is important, employing more
sophisticated models, such as land use models, may be preferable.
Chapter 7
Concluding Remarks
This chapter summarizes the key findings of this research. It discusses the main
contributions of this thesis, revisits the objectives introduced in Chapter 1,
outlines the key design considerations for deployment of LCS, and provides future
research directions.
7.1 Main Contributions

The most significant and actionable contributions identified in this thesis are cen-
tered around the application and interpretation of data collected through LCS, pri-
marily due to its potential impact. These are:
1. Results from Chapters 3 and 4 highlight that leveraging high time resolution
data from LCS, to enable the processing of local vs regional models, can
facilitate transferable calibration and spatial models.
2. A limited number of studies have investigated the use of LCS to enhance
exposure estimates [28, 136], but prior studies have lacked fine spatial res-
olution. However, since pollutant concentrations have small scale spatial
variations, using a higher resolution air pollutant surface model, as shown in
Chapter 5, can have a significant impact on the exposure estimates.
3. Intra-neighbourhood variability in air pollution has traditionally been as-
sessed using mobile monitoring techniques [131, 203, 225]. However, the
adoption of mobile monitoring by community members is unlikely due to the
associated costs and technical challenges. In Chapter 6, LCS were explored
as a more accessible option that communities can use to make informed
decisions and take appropriate actions to mitigate potential exposure
risks.
4. Results from Chapter 6 also highlight LCS as a more affordable solution to
identifying disparities in air pollution, without employing complex spatial
models. As such, deploying a network of sensors in various parts of the
city and comparing the collected data can be a powerful approach towards
addressing environmental injustices.
7.2 Revisiting Original Objectives
7.2.1 Objective 1: Explore and develop a geographically-transferable
calibration method to improve sensor performance over broad
concentration ranges.
In Chapter 3, a geographically-transferable calibration method was explored for the
estimated regional component of the total PM2.5 concentration. The study involved
PurpleAir LCS located in five different cities across four countries, and used mul-
tiple linear regression for calibration. Intra-city models, which were trained and
cross-validated within the same city, as well as inter-city models, trained and cross-
validated across different cities, were developed and evaluated to assess model per-
formance. The main outcomes pertaining to this objective are:
• Decomposing the concentration into estimated regional and local concentra-
tions and calibrating them separately resulted in better performance when
compared to models built without decomposing the concentration (tradi-
tional MLR approach).
• Intra-city and inter-city models reduced the nRMSE by 25% and 30%,
respectively, when compared to sensor-reported concentrations.
The results of inter-city models suggest the potential in building transferable
models by separately calibrating the estimated regional and local components of
PM2.5 concentration.
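For readers unfamiliar with the metric, the sketch below shows one common way to
compute nRMSE and the relative reduction achieved by a calibration. Normalizing
the RMSE by the mean reference concentration is an assumption here (other
normalizations exist), and all numbers are invented for illustration.

```python
from math import sqrt

def nrmse(predicted, reference):
    """RMSE divided by the mean reference concentration."""
    n = len(reference)
    rmse = sqrt(sum((p - r) ** 2 for p, r in zip(predicted, reference)) / n)
    return rmse / (sum(reference) / n)

reference = [10.0, 20.0, 30.0, 40.0]   # hypothetical reference PM2.5
raw_sensor = [14.0, 26.0, 36.0, 48.0]  # hypothetical sensor-reported PM2.5
calibrated = [11.0, 21.0, 29.0, 41.0]  # hypothetical calibrated PM2.5

# Fractional reduction in nRMSE relative to the sensor-reported values.
reduction = 1 - nrmse(calibrated, reference) / nrmse(raw_sensor, reference)
```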
7.2.2 Objective 2: Develop and compare different spatiotemporal
pollution models using data collected via LCS networks.
In Chapter 4, land use regression and land use random forest models were explored
as spatiotemporal pollution models for data collected via LCS for PM2.5, NO2
and CO. The total concentration was also decomposed by separating short-lived
events from regional concentrations to address the issue of model transferability.
The main outcomes pertaining to this objective are:
• For all three pollutants, land use random forest models outperformed
traditional regression models, with increases in cross-validated R2 of
0.10-0.19.
• Decomposing the signal reduced the influence of temporal predictors, and
increased the emphasis on spatial predictors. As such, decomposed models
may be better at reflecting the relationship between land use and pollutant
concentrations.
• Spatial maps built using decomposed LURF models successfully identified
hotspots in the county.
These results suggest that LCS, in combination with advanced data analytic
techniques, can be a useful tool in building more accurate and transferable
spatiotemporal pollution models.
7.2.3 Objective 3: Compare residents’ exposure due to mobility using
spatiotemporal pollution models built from LCS network data.
In Chapter 5, the results of Chapter 4 were used to predict daily concentrations
on a 50 m grid across the city of Pittsburgh and subsequently to create static
(residents spend 24 hours a day in residential areas) and dynamic (residents move
between commercial and residential areas) models. The main outcomes pertaining
to this objective are:
• Weekend concentrations were 10% higher than weekday concentrations, sug-
gesting that short-term exposure can be improved through behavioural changes
by shifting optional activities to weekdays.
• Exposure estimates on average were about 10% higher when the popula-
tion spends more time in commercially dense locations (dynamic models) vs
residentially-dense locations (static models).
• The differences between static and dynamic models are likely to be under-
estimated, as residents are more likely to be in commercial areas during the
daytime, when the difference in pollutant concentrations between residential
and commercial areas is higher, compared to nighttime.
These results suggest that LCS data can be leveraged to refine exposure
estimates and identify behaviours that may reduce exposure. As such, spatial maps
built using LCS data, along with personal movement details, can be a useful tool
in estimating the exposure of an individual.
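The static and dynamic exposure estimates compared above can be thought of as
time-weighted averages of the concentrations in the places where a person spends
the day. The sketch below illustrates this idea only; the 8-hour commercial stay,
the function name, and the concentrations are assumptions for illustration, not
the parameters used in Chapter 5.

```python
def time_weighted_exposure(segments):
    """segments: list of (hours, concentration); returns the 24 h average."""
    total_hours = sum(hours for hours, _ in segments)
    if total_hours != 24:
        raise ValueError("segments must cover exactly 24 hours")
    return sum(hours * conc for hours, conc in segments) / 24

residential, commercial = 8.0, 11.0  # hypothetical PM2.5 in ug/m3

# Static: the whole day at home. Dynamic: 8 h of the day in a commercial area.
static = time_weighted_exposure([(24, residential)])
dynamic = time_weighted_exposure([(16, residential), (8, commercial)])
relative_increase = (dynamic - static) / static
```

With personal movement details, the list of segments would simply be extended to
one entry per location visited.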
7.2.4 Objective 4: Assess intra- and inter-neighbourhood
variabilities in a community using data collected via LCS to
identify hotspots and recognize environmental injustice
concerns.
In Chapter 6, LCS data were collected from 11 different locations within a
neighbourhood in Vancouver. The variations within the neighbourhood were assessed
and compared to average regional Metro Vancouver (MV) concentrations. The
main outcomes pertaining to this objective are:
• Across all days and all pollutants, the concentrations measured were lower
than the air quality objectives set by MV.
• NO2 concentrations in the study area were consistently higher than other MV
neighbourhoods, suggesting a persistent traffic related air pollution influence
in the community.
• The cumulative impact of the three pollutants highlighted the western and
eastern periphery, areas with higher commercial density within the neigh-
bourhood, as hotspots, compared to other parts of the study area. Hotspots
were also influenced by prevailing wind direction.
These results provide evidence that the adoption of LCS by communities can
be useful in improving their understanding of how air pollution varies within their
neighbourhood, and of how LCS can be used to address disparities in air pollution.
7.3 Key Design Considerations for Deployment - Lessons
from this Thesis
This section contributes to the field of low-cost air pollution sensing by providing
insights on the measurement and assessment of ambient air quality gained through
the lessons learned in this thesis. While this section focuses on ambient monitoring
of pollution, it is important to note that LCS can also be used to monitor indoor
concentrations or non-stationary (mobile) exposures. Insights are structured as
questions that a potential LCS user should consider before embarking on an LCS
study.
7.3.1 Purpose: Why are you deploying the sensor?
While seemingly an obvious question, identifying the purpose of deploying LCS
is a crucial step in making informed decisions about its implementation. In this
thesis, various applications of LCS were explored from Chapter 4 to Chapter 6,
which also correspond to the applications identified by the US EPA.
The primary motivation for deploying sensors has generally been to enhance air
quality monitoring and gain a better understanding of overall air quality. However,
due to measurement uncertainties, LCS data may not accurately reflect whether
air quality meets the required objectives, and LCS are therefore typically not
suitable for regulatory monitoring purposes. Nevertheless, LCS can still serve as a valuable
tool for identifying sources of air pollution, or for assessment of spatial or temporal
variation in pollutant concentrations, which can be useful as a reference in directing
policies [207].
Another application of LCS networks is hotspot identification. In Chapters
4 and 6, LCS data was combined with spatial models to identify areas of concern,
and assess variabilities between and within neighborhoods. This opens up oppor-
tunities for communities to use LCS in supporting their advocacy efforts or for
government entities to assess disparities in air quality and address environmental
justice concerns [126].
Finally, wearable sensors have previously been deployed to estimate personal
exposures via mobile monitoring [79, 163, 257]. Although exposure assessments
with stationary LCS have been limited, Chapter 5 discusses the opportunity with
LCS networks in improving exposure estimates.
7.3.2 Pre-deployment
Pollutants: What are you measuring?
The measurement of specific pollutants should be based on the desired outcome.
PM2.5 is a commonly measured pollutant with extensive LCS networks in many
global north locations, owing to its health impacts at every concentration level
[30, 184]. NO2, in contrast, may be a better indicator of traffic-related air
pollution, as vehicular traffic is the primary source of NO2 in most places [5].
O3, a secondary pollutant, should be considered in areas with persistent NO2
problems. For
instance, in Chapter 6, we found that the study neighborhood experienced higher
air pollution from traffic-related sources. Therefore, in such a case, monitoring
NO2 and O3 can provide more insight on the neighbourhood air quality than PM2.5
alone. Since chronic exposure to each of these three pollutants has demonstrated
health consequences, there are benefits in monitoring all of them.
However, it is important to note that different pollutants and different sensors
have varying levels of precision and accuracy. Low-cost gas sensors, such as NO2
and O3 sensors, often suffer from cross-sensitivities with other gases [150], result-
ing in poorer accuracy compared to LCS for PM2.5 or CO [271]. The measurement
accuracy of a sensor can also vary depending on its specific characteristics. For
example, the measurement accuracy of an electrochemical sensor may depend on
the type of electrolyte or electrode used [212]. Additionally, among sensors of the
same type, some are more mature and report more accurate readings. For instance,
in Chapter 3, I discussed that PurpleAir, a vendor of PM2.5 LCS, uses Plantower
sensors to measure the pollutant. While Plantower provides proprietary calibrated
concentrations as PM2.5 measurements, it has been shown that PurpleAir’s
independent corrections perform better [242].
Commercial sensors: Which LCS to buy?
When selecting a low-cost sensor (LCS), considerations such as budgetary con-
straints and logistical limitations of deployment should guide the decision-making
process. For instance, in areas with limited access to power supply such as rural
India, LCS with solar panels could be a suitable choice. Since there is a growing
number of commercial LCS vendors with varying accuracy and precision, the
selection process can also be simplified by using the performance evaluations
offered by the AQ-SPEC (Air Quality Sensor Performance Evaluation Center)
program [211]. Other considerations may include: (1) Sensors that are part of
existing networks, which allow online data to be leveraged for calibration, as
demonstrated in Chapter 3, and provide access to resources, expertise, and
community support. (2) Sensors with an existing online visual dashboard, which can
enhance the understanding and visualization of collected data, making it easier
for stakeholders to interpret the information. (3) Multi-pollutant sensors, which
monitor multiple pollutants simultaneously, provide insights into their relative
mix, and, as demonstrated in Chapter 6, can help identify air pollution hotspots.
Location: Where to deploy the sensor?
Siting decisions can often be dictated by logistics, such as access to power supply
(for sensors with charging cables), adequate sun exposure (for outdoor sensors with
solar panels), and mounting requirements. However, the primary consideration
for choosing a location should be based on the intended purpose of the sensor
deployment. Careful consideration of deployment locations can greatly enhance
the effectiveness of air pollution monitoring efforts. Below are common reasons for
air quality monitoring initiatives, and their recommended strategies for identifying
optimal sensor deployment locations.
For area monitoring: As discussed in Chapter 4, pollutants have correlations
with geography (e.g., the pollutant concentrations are typically lower at higher
elevations [100]), spatial patterns (e.g., areas with high commercial density expe-
riencing elevated concentrations compared to areas with high residential density is
reported in Chapter 5) and meteorology (e.g., the influence of prevailing wind on
hotspots within a neighbourhood is reported in Chapter 6). Therefore, locations for
sensor deployment should be carefully selected to cover varying geography, spatial
patterns and meteorological influences. Including a ‘control’ sensor in an area
with the lowest expected pollution concentration, such as near a park in a
residential area, can also help establish a baseline for comparison and facilitate
the identification of areas with elevated pollution levels.
For monitoring traffic related air pollution (TRAP): To capture TRAP, the ideal
placement of sensors is at intersections, bus shelters, or other areas with heavy
traffic. Since TRAP decreases with distance from the road [19], stakeholders with
access to these locations, such as governing bodies, should prioritize this place-
ment. However, for communities or individuals, obtaining government support or
permission might pose significant challenges. In Chapter 6, we describe deploying
sensors to capture TRAP (including major roads and rail line) at nearby residences
of individuals who were willing to host them.
For addressing environmental injustice concerns: Sensors should be deployed
in areas where marginalized communities, elderly care homes, and other vulner-
able populations are located. Governments and academics working with a com-
munity should prioritize engaging with its members to gain insights about their
neighborhood. In Chapter 6, we conducted a walking tour of the neighborhood
with a community member to identify areas with different receptors. Census data
can serve as an alternative method for obtaining demographic information.
Although these are ideal practices, field deployments are often limited by con-
straints such as accessibility to deployment location. As a result, the data collected
may not be perfect. In such situations, sensors may need to be placed where hosts
agree. In these cases, certain sensors become more critical for understanding air
quality patterns. For example, sensors located closer to one another may provide
redundant data, whereas, an isolated sensor away from the rest of the network may
have a bigger influence on the spatial patterns of pollutants. To address this, crit-
ical sensors should be identified and prioritized, ensuring their functionality and
providing prompt maintenance. One potential way of identifying critical sensors
using a Monte Carlo simulation is described in Section 6.6.1. In this method,
concentrations are estimated at each point (or grid cell) in space via kriging
interpolation using LCS data from the sensors in the network twice: (1) including
the test sensor and (2) excluding the test sensor. The outcomes of the two data
sets are compared, and the test sensor is flagged as critical if the mean
difference is statistically significant (p < 0.05).
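The leave-one-out comparison described above can be sketched as follows. This is
not the implementation from Section 6.6.1: inverse-distance weighting is swapped in
as a simple stand-in for kriging, the significance check is a Monte Carlo sign-flip
test chosen for illustration, and the coordinates, concentrations, and function
names are all hypothetical.

```python
import random
from math import hypot

def idw(sensors, x, y, power=2.0):
    """Inverse-distance-weighted estimate at (x, y); a stand-in for kriging."""
    num = den = 0.0
    for sx, sy, value in sensors:
        d = hypot(x - sx, y - sy)
        if d == 0:
            return value  # grid point coincides with a sensor
        w = 1.0 / d ** power
        num += w * value
        den += w
    return num / den

def leave_one_out_check(sensors, test_index, grid,
                        n_draws=2000, alpha=0.05, seed=0):
    """Compare surfaces with and without one sensor; return (flag, p, mean)."""
    without = sensors[:test_index] + sensors[test_index + 1:]
    diffs = [idw(sensors, x, y) - idw(without, x, y) for x, y in grid]
    mean_diff = sum(diffs) / len(diffs)
    rng = random.Random(seed)
    extreme = 0
    # Sign-flip test: under the null hypothesis each grid difference is
    # equally likely to be positive or negative.
    for _ in range(n_draws):
        flipped = sum(d if rng.random() < 0.5 else -d for d in diffs)
        if abs(flipped / len(diffs)) >= abs(mean_diff):
            extreme += 1
    p_value = (extreme + 1) / (n_draws + 1)
    return p_value < alpha, p_value, mean_diff

# Hypothetical network: four clustered sensors and one isolated sensor.
sensors = [(0.0, 0.0, 10.0), (1.0, 0.0, 12.0), (0.0, 1.0, 11.0),
           (1.0, 1.0, 10.5), (5.0, 5.0, 30.0)]
grid = [(0.5 * i, 0.5 * j) for i in range(11) for j in range(11)]

flag_iso, p_iso, diff_iso = leave_one_out_check(sensors, 4, grid)
flag_clu, p_clu, diff_clu = leave_one_out_check(sensors, 0, grid)
```

Because interpolated values at neighbouring grid points are correlated, the p-value
from this simple test should be treated as indicative only; the size of the mean
difference is a useful companion measure of how much a sensor shapes the surface.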
Sensor network density: How many sensors?
The number of sampling locations has varied across studies, ranging from 42-174
at regional scale [94, 191, 220], 40-133 at city scale [25, 59, 110] and 60
at neighbourhood scale [188]. In practice, the density of sensors is often limited
by two factors: a) the availability of volunteers for hosting the sensor, and
b) budget constraints. Therefore, it is important to deploy sensors in meaningful locations
to understand the pollutant variability. For example, in Chapter 6, we explain our
site selection strategy in a neighbourhood to capture diverse sources and receptors
of air pollution with only 11 sensors. Nonetheless, to ensure data reliability, it
is recommended to have at least a few sensors validating each other’s readings.
Optimizing sensor density is an open question and is a major area identified for
future research.
Length of deployment: How long should the sensors be deployed for?
In Chapter 5, the data collected from a year-long deployment of sensors enabled
temporal assessment across seasons, highlighting that it may be ideal to deploy
sensors for a minimum of one year. However, if a full year deployment is not
feasible, low-cost sensors, with high temporal resolution (hourly or better), can
still be valuable in identifying temporal patterns as concentrations can vary by
time of day and day of the week. In Chapter 6, we deployed sensors for three
months and observed spatial and temporal patterns within that period, which was
useful in identifying areas and times of concern. Therefore, it is recommended to
deploy sensors for at least a few weeks to capture spatial and temporal variations
under typical conditions (e.g., non-wildfire periods, for non-wildfire studies) and
ideally for a year to account for seasonal variability. If a full-year deployment
of sensors is not feasible, it is recommended to consider the specific pollutant’s
characteristics when determining the deployment period, since different
pollutants exhibit different seasonal patterns. For example, to capture the
persistent issues related to O3, it is advisable to deploy sensors during the
summer season.
It is important to note that sensor performance may also exhibit seasonal varia-
tions [42], which could necessitate the maintenance of sensor calibrations through-
out the deployment period. For instance, a study by Sayahi et al. [199] highlighted
a decrease in the correlation between the PM2.5 LCS and regulatory monitors
during spring and wildfire seasons. To ensure the ongoing calibration of sensors,
one approach is to: (1) collocate all sensors with regulatory monitors before and
after the study period and establish a relationship between the different LCS, and
(2) leave a few sensors collocated with the regulatory monitors for the entire
duration of the study, which can be used to build and maintain calibration models.
These calibration models can then be retroactively applied to deployed sensors by
leveraging the established relationship between the sensors.
7.3.3 During Deployment
Maintenance: How to perform sensor maintenance?
Tracking daily LCS data is a recommended practice for sensor maintenance. It
involves monitoring the status of the sensor’s power supply, noting high and/or
erratic spikes in the readings, and identifying any non-operational sensors that are
displaying zero readings. Power supply issues can usually be resolved by checking
the connection of the charging cable or replacing the battery when there is no active
power supply. Sensors reporting consistently incorrect values or showing unusual
spikes may require recalibration. Non-operational sensors, which fail to provide
any readings, should be replaced.
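The daily checks described above can be partly automated. The sketch below flags
zero readings and unusually high spikes in a series of hourly values; the
five-times-median spike threshold, the function name, and the data are assumptions
for illustration rather than recommended settings.

```python
from statistics import median

def screen(readings, spike_factor=5.0):
    """Return indices of zero readings and of suspiciously high spikes."""
    positive = [r for r in readings if r > 0]
    typical = median(positive) if positive else 0.0
    zeros = [i for i, r in enumerate(readings) if r == 0]
    spikes = [i for i, r in enumerate(readings)
              if typical > 0 and r > spike_factor * typical]
    return zeros, spikes

# Hypothetical hourly PM2.5 readings (ug/m3) from one sensor.
hourly_pm25 = [5.0, 6.0, 0.0, 5.5, 80.0, 6.5, 0.0, 5.0]
zeros, spikes = screen(hourly_pm25)
```

A flagged spike is only a prompt for inspection, since a genuine pollution episode
can also exceed the threshold.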
To track the collected data, Chapter 2 discusses various options for data trans-
mission that are offered by different companies, including Wi-Fi, built-in SIM
cards, or SD cards [45]. Among these options, the use of SD cards can be par-
ticularly beneficial in areas with limited internet access. However, it is important
to note that tracking the quality of data can be more challenging in such cases.
Determining the responsible party for network management is an important
aspect of sensor maintenance. Academic and government bodies should assign
a dedicated person or team to track the sensors and the quality of data collec-
tion, and perform maintenance as needed. In a community, a large sensor network
(20 or more sensors) may require a dedicated person responsible for daily main-
tenance and data monitoring. For smaller networks, especially those deployed at
residences, hosts can be trained to carry out monitoring and maintenance activities.
To streamline the management of a large sensor network, it is advisable to
automate the tracking of sensor data. This can be accomplished by developing
databases using SQL and employing dashboard platforms such as Tableau or Grafana
to visualize the measured concentrations. Furthermore, stakeholders are encour-
aged to establish automated alerts within Grafana or use external alert systems like
Google Firebase Cloud Messaging and Microsoft Azure so that they can be notified
promptly of any issues or anomalies in the sensor data, enabling timely intervention
and maintenance.
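As a minimal sketch of the database side of such a setup, the example below stores
readings in an SQL table and queries for sensors that report only zeros, which
could then feed an alert. It uses Python's built-in sqlite3 module with an
in-memory database; the table name, column names, and readings are hypothetical,
and the dashboard and alerting tools themselves are not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory database for illustration
conn.execute("CREATE TABLE readings (sensor_id TEXT, hour TEXT, pm25 REAL)")

# Hypothetical hourly readings; sensor 'B' is stuck at zero.
rows = [
    ("A", "2022-05-01T00", 5.1), ("A", "2022-05-01T01", 4.8),
    ("B", "2022-05-01T00", 0.0), ("B", "2022-05-01T01", 0.0),
    ("C", "2022-05-01T00", 7.3), ("C", "2022-05-01T01", 0.0),
]
conn.executemany("INSERT INTO readings VALUES (?, ?, ?)", rows)

# Sensors whose readings are all zero are candidates for an alert.
stuck = [row[0] for row in conn.execute(
    "SELECT sensor_id FROM readings "
    "GROUP BY sensor_id HAVING MAX(ABS(pm25)) = 0 ORDER BY sensor_id"
)]
```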
7.3.4 Post-deployment
Calibration: How to calibrate the sensors?
One of the identified disadvantages of using LCS is that they are subject to
sensor degradation and need to be calibrated across the full range of
meteorological conditions and pollutant concentrations every 1-2 years
[142, 150, 212]. Data used in Chapters 4 and 6 were calibrated by collocating
sensors at a regulatory monitoring station, which is a typical practice
recommended for governments or academics. The
calibration models used in previous research have predominantly been constructed
using two types of statistical techniques: multiple linear regression (MLRs) and
machine learning (ML) methods, such as random forests (RFs) and artificial neural
networks (ANNs). MLRs offer advantages in terms of simplicity, interpretability,
speed, and the ability to extrapolate beyond the training range. However, they as-
sume a linear relationship between the independent and dependent variables, and
fail to account for the interactions between different predictors. In contrast, ML
models are computationally demanding and are restricted in their ability to extrap-
olate beyond the training range [95]. However, they hold the advantage of not
assuming linearity between independent and dependent variables [11, 63], and are
able to capture the complex relationships between various independent variables,
yielding the potential to model complex systems more accurately. For the purpose
of calibrating LCS, research indicates that MLR methods have demonstrated com-
parable or better performance for PM2.5 sensors [111, 148], while ML models have
improved performance for gas sensors [141, 212].
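As a minimal illustration of the regression approach, the sketch below fits a
single-predictor linear calibration on collocated data by ordinary least squares
and applies it to a new reading. A full MLR would add further predictors (for
example, meteorological variables); the data, the names, and the single-predictor
simplification are assumptions for illustration.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical collocation data: raw LCS vs regulatory monitor (ug/m3).
lcs_raw = [5.0, 10.0, 15.0, 20.0]
reference = [4.0, 7.0, 10.0, 13.0]

slope, intercept = fit_line(lcs_raw, reference)
calibrated = slope * 12.0 + intercept  # apply the model to a new raw reading
```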
However, for communities and individuals with limited access to a monitoring
station, collocating sensors may not be feasible. Additionally,
while calibration models may not require complex data inputs, their computational
requirements can limit accessibility for the general public. In such cases, stake-
holders are encouraged to connect with their air quality monitoring agencies for
tools that can support calibration or performance assessment [153]. Otherwise,
stakeholders can opt for sensor reported data. Although the sensor reported data
may not meet the stringent accuracy and precision standards of governments or
academic institutions, certain sensors have undergone extensive studies and veri-
fication to demonstrate their ability to provide reliable data. Resources like AQ-
SPEC [211] or the US EPA Air Sensor Toolbox [228] can be used to evaluate the
commercial sensors for performance before selecting a vendor.
Accuracy and precision: How to test the reliability of reported
concentrations?
Typically, the reliability of measurements (reported or calibrated) is assessed by
testing for accuracy and precision against observed (regulatory monitoring) data.
The US EPA has suggested performance guidelines for LCS, segregated by
their application; these are listed in Table 7.1.
Table 7.1: Suggested Performance Goals for LCS by the US EPA [251].

Tier  Application                                  Precision &   Data
                                                   Bias Error    Completeness
I     Education and Information                    < 50%         ≥ 50%
II    Hotspot Identification and Characterization  < 30%         ≥ 75%
III   Supplemental Monitoring                      < 20%         ≥ 80%
IV    Personal Exposure                            < 30%         ≥ 80%
V     Regulatory Monitoring                        < 7-15%*      ≥ 75%

* < 7% for O3, < 10% for PM2.5 and CO, and < 15% for NO2.
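The goals in Table 7.1 can be turned into a simple lookup. The sketch below assumes the user has already summarized sensor performance as a single error percentage (the table does not specify how precision and bias error are computed, so that step is left out) and computes data completeness as the share of expected records that were received. Function and variable names are hypothetical.

```python
# Goals transcribed from Table 7.1: (tier, application, max error %, min completeness %).
# Tier V's error limit depends on the pollutant (table footnote), so it is stored separately.
TIER_V_ERROR_LIMIT = {"O3": 7.0, "PM2.5": 10.0, "CO": 10.0, "NO2": 15.0}
TIERS = [
    ("I", "Education and Information", 50.0, 50.0),
    ("II", "Hotspot Identification and Characterization", 30.0, 75.0),
    ("III", "Supplemental Monitoring", 20.0, 80.0),
    ("IV", "Personal Exposure", 30.0, 80.0),
    ("V", "Regulatory Monitoring", None, 75.0),
]


def data_completeness(n_valid, n_expected):
    """Percentage of expected measurements that were actually recorded."""
    return 100.0 * n_valid / n_expected


def tiers_met(error_pct, completeness_pct, pollutant):
    """Return the tiers whose error and completeness goals are both satisfied."""
    met = []
    for tier, _application, max_error, min_completeness in TIERS:
        if max_error is None:
            max_error = TIER_V_ERROR_LIMIT[pollutant]
        if error_pct < max_error and completeness_pct >= min_completeness:
            met.append(tier)
    return met


# Hypothetical sensor: 25% error, 700 valid hourly values out of 744 expected.
completeness = data_completeness(700, 744)
result = tiers_met(25.0, completeness, "PM2.5")
```

For this hypothetical sensor, the error of 25% rules out Tiers III and V, while the completeness of about 94% satisfies every tier.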
However, as mentioned earlier, collocating sensors is not always feasible,
especially for communities and individuals, making performance testing challenging.
In such situations, alternative methods for assessing sensor performance include:
Data screening: Manually screening the collected data for outliers, or data
gaps, can be useful in identifying measurement errors or equipment malfunctions
when dealing with a few sensors. For a larger network of sensors, a dashboard and
an automated alert can be set up for identifying any high and/or erratic spikes, or
zero readings.
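An automated version of this screening could look like the following sketch, which labels missing records, zero readings, and high spikes in a series of equally spaced readings. The spike rule (a multiple of the series median) and its threshold are assumptions chosen for illustration; a deployed alert would need thresholds tuned to the pollutant and site.

```python
from statistics import median


def screen_readings(values, spike_factor=5.0):
    """Label each reading as 'ok', 'gap', 'zero', or 'spike'.

    values: equally spaced readings, with None marking a missing record.
    spike_factor: hypothetical threshold; a reading above spike_factor times
    the median of the valid, non-zero readings is flagged as a spike.
    """
    valid = [v for v in values if v is not None and v > 0]
    baseline = median(valid) if valid else None
    flags = []
    for v in values:
        if v is None:
            flags.append("gap")
        elif v == 0:
            flags.append("zero")
        elif baseline is not None and v > spike_factor * baseline:
            flags.append("spike")
        else:
            flags.append("ok")
    return flags


# Hypothetical hourly PM2.5 readings (ug/m3).
readings = [8.0, 9.5, None, 10.0, 0.0, 250.0, 11.0]
flags = screen_readings(readings)
```

A flagged spike is not necessarily an error (a real pollution episode would also trigger it), so flags of this kind are best treated as prompts for manual review.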
Sensor validation: Comparing the data collected by LCS with data from reference
monitors can identify potential drift or deviation in the sensor readings. To
avoid collocation, stakeholders can instead track the ratio of their LCS data to
the closest reference data during quiet hours (e.g., 2-4 AM), when pollution
sources are limited.
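The quiet-hour ratio check can be sketched as below. The data layout (dictionaries keyed by day and hour), the function name, and the use of a daily median are assumptions made for this example; a ratio that changes steadily over weeks or months would suggest sensor drift.

```python
from statistics import median


def quiet_hour_ratios(lcs, reference, quiet_hours=(2, 3)):
    """Median LCS-to-reference ratio per day, using quiet hours only.

    lcs, reference: dicts mapping (day, hour) to an hourly concentration.
    quiet_hours: hours of the day treated as quiet; (2, 3) covers 2-4 AM.
    """
    by_day = {}
    for (day, hour), value in lcs.items():
        ref = reference.get((day, hour))
        # Skip hours outside the quiet window and missing or zero reference data.
        if hour in quiet_hours and ref is not None and ref > 0:
            by_day.setdefault(day, []).append(value / ref)
    return {day: median(ratios) for day, ratios in by_day.items()}


# Hypothetical hourly data for day 1 and day 30 of a deployment.
lcs = {(1, 2): 12.0, (1, 3): 11.0, (1, 14): 30.0, (30, 2): 15.0, (30, 3): 16.5}
reference = {(1, 2): 10.0, (1, 3): 10.0, (1, 14): 20.0, (30, 2): 10.0, (30, 3): 11.0}

ratios = quiet_hour_ratios(lcs, reference)
```

In this made-up example the ratio rises from about 1.15 on day 1 to 1.5 on day 30, which is the kind of change that would warrant a closer look at the sensor.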
Visual inspection: Time-series plots are an effective tool for monitoring the
performance of sensors over an extended period and identifying any changes or
degradation in the sensor readings. In Chapter 6, we discuss the typical diurnal
patterns of pollutants, which can be particularly helpful in assessing whether
reported concentrations behave as expected. For example, nighttime O3
concentrations that exceed daytime concentrations may indicate that
the sensor is not providing reliable readings.
Data quality assurance: Assessing the operating principle of the LCS can be a
useful tool for testing the quality of data. For instance, in Chapter 3, we discuss that
PurpleAir (PM2.5 vendor) reports average concentration readings based on two in-
dividual PM2.5 sensors housed within the same box. As such, comparing the data
reported by these two sensors can serve as a data quality test; discrepancies be-
tween the two sensors may indicate potential issues with the sensor’s performance.
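A minimal sketch of such a two-channel comparison follows. It flags a pair of readings only when the channels differ by more than both an absolute and a relative limit, so that small differences at low concentrations are not over-flagged. The limits used here are hypothetical placeholders, not values taken from this thesis or from the vendor.

```python
def channels_disagree(a, b, abs_limit=5.0, rel_limit=0.5):
    """Return True when two channel readings differ beyond both limits.

    a, b: PM2.5 readings (ug/m3) from the two sensors in one unit.
    abs_limit: hypothetical absolute difference limit (ug/m3).
    rel_limit: hypothetical limit on the difference relative to the channel mean.
    """
    diff = abs(a - b)
    mean = (a + b) / 2.0
    if mean <= 0:
        # Relative difference is undefined; fall back to the absolute test.
        return diff > abs_limit
    return diff > abs_limit and diff / mean > rel_limit


# Hypothetical paired readings from the two channels.
pairs = [(10.0, 11.0), (40.0, 12.0), (100.0, 108.0), (3.0, 0.5)]
flags = [channels_disagree(a, b) for a, b in pairs]
```

Only the second pair is flagged: its channels differ by 28 µg m−3, which is more than the mean of the two readings.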
Spatial Models: How to estimate concentrations in unsampled areas?
As monitoring is not feasible everywhere and at all times, spatiotemporal pollution
models can be applied to LCS networks to estimate air pollutant concentrations
in unsampled areas [40, 71, 170]. Since there are many types of models, with
varying accuracy and complexity, the selection depends on the purpose and user.
For LCS, kriging and land use (LU) models have been primarily used for building
spatial models. Kriging is an interpolation technique that accounts for
autocorrelation in the dataset [166], and is favoured for its ease of implementation
and self-sufficiency, as it operates without the need for external datasets.
However, kriging models rely on assumptions of both linearity (uniformity in all
directions) and stationarity (stationary mean and variance across the study space)
[174], and are less accurate than more complex LU models [6, 152]. LU models require
an extensive set of data for accurate predictions, as they use different spatial
(e.g., elevation) and temporal (e.g., temperature) variables in a regression model
(e.g., MLR, RF, ANN) to estimate concentrations at unmonitored locations. In
Chapter 4, LU models were
found to be accurate but required extensive LU data, making them suitable for aca-
demics or government bodies. However, for community-based projects like the one
discussed in Chapter 6, simpler geospatial alternatives such as kriging or IDW (In-
verse Distance Weighting) are highlighted as a more suitable option for individuals
or communities with limited access to required model inputs.
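To illustrate how little input the simpler geospatial alternatives need, the sketch below implements IDW using only sensor coordinates and concentrations. The planar coordinates, the power of 2, and the sample values are assumptions for this example; kriging would additionally require fitting a variogram and is not shown.

```python
from math import hypot


def idw_estimate(sensors, target, power=2.0):
    """Inverse distance weighted estimate at an unsampled location.

    sensors: list of (x, y, concentration) in planar coordinates (e.g., km).
    target: (x, y) of the location to estimate.
    power: distance exponent; 2 is a common default, assumed here.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for x, y, value in sensors:
        distance = hypot(x - target[0], y - target[1])
        if distance == 0.0:
            # The target coincides with a sensor, so return its reading directly.
            return value
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total


# Hypothetical network of three sensors and one unsampled location.
sensors = [(0.0, 0.0, 10.0), (2.0, 0.0, 20.0), (0.0, 2.0, 30.0)]
estimate = idw_estimate(sensors, (1.0, 0.0))
```

The estimate is a weighted average of the three readings, with the two sensors one unit away each weighted five times more heavily than the sensor at a distance of √5.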
Individuals without personal monitors, but who are concerned about air qual-
ity at their homes or workplaces, can instead choose to leverage existing systems
to approximate concentrations at specific locations of interest. One such example
is IQAir, which generalizes air quality from various sensor networks, including
regulatory and LCS data, and merges it with wind information [103].
7.4 Future Research Directions

1. The calibration models developed in Chapter 3 specifically target PM2.5 and
do not encompass other pollutants such as NO2 and O3. Since these pollutants
exhibit greater atmospheric complexity, and sensor accuracy is often worse,
ongoing exploration of transferable gas-phase LCS calibration methods is
recommended as an area for future research.
2. The calibration method proposed in Chapter 3 offers the greatest advantage
for areas characterized by high pollutant concentrations, because financial
resources are typically limited in these areas and the establishment of
traditional monitoring stations is therefore challenging. A recommended area of
future study is to test the regional calibration models for transferability in
extreme pollution scenarios (regional concentrations > 100 µg m−3).
3. Calibration models can be developed specifically targeting the local com-
ponent of the total signal for different sources of emissions, independent of
geographic influence. The key advantage of such models is their univer-
sal applicability, allowing stakeholders from diverse locations to adopt them
without the need for substantial modifications.
4. Generalized and transferable spatiotemporal pollution models are currently
lacking. Replicating and testing decomposed LURF models in new regions
is a promising avenue for future research to explore, particularly in areas
with high pollutant concentrations.
5. To improve temporal resolution and capture sub-daily variations in exposure
profiles, sub-daily predictors can be incorporated into the dynamic models
created in Chapter 5. Furthermore, integrating agent-based models can en-
able the simulation of individual exposures, leading to more precise estima-
tion.
6. While this thesis primarily concentrates on ambient air pollution, it is
important to acknowledge that individuals are also exposed to air pollution in
indoor settings. The use of indoor-outdoor ratios, in combination with dynamic
models, can provide valuable insights and enhance exposure estimates by
accounting for both indoor and outdoor sources of pollution, and is a
recommended area of future study.
7. LCS technology offers the potential to collect abundant data, raising the
question of how many sensors are necessary for optimal coverage. Future
work can focus on sensor optimization to determine the ideal number and
placement of sensors for effective air pollution monitoring and assessment.
8. A user-friendly kriging-based dashboard, using only location and pollution
data as inputs, can make the results of spatial variations more accessible to
the public.
Bibliography
[1] Delhi Pollution Control Committee. URL
https://www.dpcc.delhigovt.nic.in/. → page 40
[2] India Meteorological Department. URL https://mausam.imd.gov.in/. →
page 40
[3] Karnataka State Pollution Control Board. URL
https://kspcb.karnataka.gov.in/. → page 40
[4] PurpleAir. URL https://www2.purpleair.com. → pages 39, 40, 198

[5] WHO guidelines for indoor air quality: selected pollutants. WHO,
Copenhagen, 2010. ISBN 978 92 890 0213 4. OCLC: ocn696099951. →
pages 7, 8, 148
[6] A. Adam-Poupart, A. Brand, M. Fournier, M. Jerrett, and A. Smargiassi.
Spatiotemporal Modeling of Ozone Levels in Quebec (Canada): A
Comparison of Kriging, Land-Use Regression (LUR), and Combined
Bayesian Maximum Entropy–LUR Approaches. Environmental Health
Perspectives, 122(9):970–976, Sept. 2014. doi:10.1289/ehp.1306566. →
pages 127, 159
[7] N. Afshar-Mohajer, C. Zuidema, S. Sousan, L. Hallett, M. Tatum, A. M.
Rule, G. Thomas, T. M. Peters, and K. Koehler. Evaluation of low-cost
electro-chemical sensors for environmental monitoring of ozone, nitrogen
dioxide, and carbon monoxide. Journal of Occupational and
Environmental Hygiene, 15(2):87–98, Feb. 2018.
doi:10.1080/15459624.2017.1388918. → page 17
[8] M. Ahmad, K. Alam, S. Tariq, S. Anwar, J. Nasir, and M. Mansha.

Estimating fine particulate concentration using a combined approach of
linear regression and artificial neural network. Atmospheric Environment,
219:117050, Dec. 2019. doi:10.1016/j.atmosenv.2019.117050. → page 24
[9] AirNow. AirNow.gov - Home of the U.S. Air Quality Index: US
Embassies and Consulates. URL
www.airnow.gov/international/us-embassies-and-consulates/. → page 40
[10] M. Aleixandre and M. Gerboles. Review of Small Commercial Sensors for

Indicative Monitoring of Ambient Gas. Chemical Engineering
Transactions, 30, 2012. doi:10.3303/CET1230029. → page 16
[11] A. Alimissis, K. Philippopoulos, C. Tzanis, and D. Deligiorgi. Spatial

estimation of urban air pollution with the use of artificial neural network
models. Atmospheric Environment, 191:205–213, Oct. 2018.
doi:10.1016/j.atmosenv.2018.07.058. → pages 24, 30, 71, 156
[12] Allegheny County GIS Group. Allegheny County Land Cover Areas, 2015.
URL https://services1.arcgis.com/vdNDkVykv9vEWFX4/arcgis/rest/
services/Land Cover/FeatureServer. → pages xxix, 99, 238, 240
[13] Allegheny County Health Department. Air Quality - Annual Data

Summary. Criteria Pollutants and Selected Other Pollutants, 2017. URL
https://www.alleghenycounty.us/uploadedFiles/Allegheny Home/
Health Department/Resources/Data and Reporting/Air Quality Reports/
2017-data-summary.pdf. → pages 89, 106
[14] S. Araki, M. Shima, and K. Yamamoto. Spatiotemporal land use random

forest model for estimating metropolitan NO2 exposure in Japan. Science
of The Total Environment, 634:1269–1277, Sept. 2018.
doi:10.1016/j.scitotenv.2018.03.324. → pages 23, 24, 230
[15] O. O. Arowosegbe, M. Röösli, N. Künzli, A. Saucy, T. C. Adebayo-Ojo,

M. F. Jeebhay, M. A. Dalvie, and K. De Hoogh. Comparing Methods to
Impute Missing Daily Ground-Level PM10 Concentrations between
2010–2017 in South Africa. International Journal of Environmental
Research and Public Health, 18(7):3374, Mar. 2021.
doi:10.3390/ijerph18073374. → page 24
[16] C. L. Avery, K. T. Mills, R. Williams, K. A. McGraw, C. Poole, R. L.

Smith, and E. A. Whitsel. Estimating Error in Using Ambient PM2.5
Concentrations as Proxies for Personal Exposures: A Review.
Epidemiology, 21(2):215–223, 2010.
doi:10.1097/EDE.0b013e3181cb41f7. → pages 26, 95
[17] M. A. Babyak. What You See May Not Be What You Get: A Brief,
Nontechnical Introduction to Overfitting in Regression-Type Models.
Psychosomatic Medicine, page 11, 2004.
doi:10.1097/01.psy.0000127692.23278.a9. → page 71
[18] L. Bai, L. Huang, Z. Wang, Q. Ying, J. Zheng, X. Shi, and J. Hu.

Long-term field Evaluation of Low-cost Particulate Matter Sensors in
Nanjing. Aerosol and Air Quality Research, 20(2):242–253, 2020.
doi:10.4209/aaqr.2018.11.0424. → pages 12, 17, 20, 35, 36, 46, 64, 208
[19] N. Baldwin, O. Gilani, S. Raja, S. Batterman, R. Ganguly, P. Hopke,

V. Berrocal, T. Robins, and S. Hoogterp. Factors affecting pollutant
concentrations in the near-road environment. Atmospheric Environment,
115:223–235, Aug. 2015. doi:10.1016/j.atmosenv.2015.05.024. → pages
25, 117, 151
[20] S. Banzhaf, L. Ma, and C. Timmins. Environmental Justice: The

Economics of Race, Place, and Pollution. Journal of Economic
Perspectives, 33(1):185–208, Feb. 2019. doi:10.1257/jep.33.1.185. →
page 28
[21] K. K. Barkjohn, B. Gantt, and A. L. Clements. Development and

application of a United States-wide correction for PM2.5 data collected
with the PurpleAir sensor. Atmospheric Measurement Techniques, 14(6):
4617–4637, June 2021. doi:10.5194/amt-14-4617-2021. → pages
2, 12, 20, 21, 35, 36, 37, 38, 39, 46, 48, 53, 54, 55
[22] M. Bart, D. E. Williams, B. Ainslie, I. McKendry, J. Salmond, S. K.

Grange, M. Alavi-Shoshtari, D. Steyn, and G. S. Henshaw. High Density
Ozone Monitoring Using Gas Sensitive Semi-Conductor Sensors in the
Lower Fraser Valley, British Columbia. Environmental Science &
Technology, 48(7):3970–3977, Apr. 2014. doi:10.1021/es404610t. →
pages 16, 17
[23] C. Beckx, L. Int Panis, T. Arentze, D. Janssens, R. Torfs, S. Broekx, and

G. Wets. A dynamic activity-based population modelling approach to
evaluate exposure to air pollution: Methods and application to a Dutch
urban area. Environmental Impact Assessment Review, 29(3):179–185,
Apr. 2009. doi:10.1016/j.eiar.2008.10.001. → page 95
[24] T. Becnel, K. Tingey, J. Whitaker, T. Sayahi, K. Le, P. Goffin,

A. Butterfield, K. Kelly, and P.-E. Gaillardon. A Distributed Low-Cost
Pollution Monitoring Platform. IEEE Internet of Things Journal, 6(6):
10738–10748, Dec. 2019. doi:10.1109/JIOT.2019.2941374. → page 208
[25] R. Beelen, G. Hoek, D. Vienneau, M. Eeftens, K. Dimakopoulou,
X. Pedeli, M.-Y. Tsai, N. Künzli, T. Schikowski, A. Marcon, K. T. Eriksen,
O. Raaschou-Nielsen, E. Stephanou, E. Patelarou, T. Lanki, T. Yli-Tuomi,
C. Declercq, G. Falq, M. Stempfelet, M. Birk, J. Cyrys, S. von Klot,
G. Nádor, M. J. Varró, A. Dėdelė, R. Gražulevičienė, A. Mölter, S. Lindley,
C. Madsen, G. Cesaroni, A. Ranzi, C. Badaloni, B. Hoffmann,
M. Nonnemacher, U. Krämer, T. Kuhlbusch, M. Cirach, A. de Nazelle,
M. Nieuwenhuijsen, T. Bellander, M. Korek, D. Olsson, M. Strömgren,
E. Dons, M. Jerrett, P. Fischer, M. Wang, B. Brunekreef, and K. de Hoogh.
Development of NO2 and NOx land use regression models for estimating
air pollution exposure in 36 study areas in Europe – The ESCAPE project.
Atmospheric Environment, 72:10–23, 2013.
doi:10.1016/j.atmosenv.2013.02.037. → pages 23, 30, 117, 152, 231
[26] J. Bi, J. Stowell, E. Y. Seto, P. B. English, M. Z. Al-Hamdan, P. L. Kinney,

F. R. Freedman, and Y. Liu. Contribution of low-cost sensor measurements
to the prediction of PM2.5 levels: A case study in Imperial County,
California, USA. Environmental Research, 180:108810, Jan. 2020.
doi:10.1016/j.envres.2019.108810. → pages 35, 70, 118
[27] J. Bi, A. Wildani, H. H. Chang, and Y. Liu. Incorporating Low-Cost Sensor

Measurements into High-Resolution PM2.5 Modeling at a Large Spatial
Scale. Environmental Science & Technology, 54(4):2152–2162, Feb. 2020.
doi:10.1021/acs.est.9b06046. → page 208
[28] J. Bi, N. Carmona, M. N. Blanco, A. J. Gassett, E. Seto, A. A. Szpiro, T. V.

Larson, P. D. Sampson, J. D. Kaufman, and L. Sheppard. Publicly available
low-cost sensor measurements for PM2.5 exposure modeling: Guidance for
monitor deployment and data selection. Environment International, 158:
106897, Jan. 2022. doi:10.1016/j.envint.2021.106897. → page 142
[29] M. Branis and M. Linhartova. Association between unemployment,
income, education level, population size and air pollution in Czech cities:
Evidence for environmental inequality? A pilot national scale analysis.
Health & Place, 18(5):1110–1114, Sept. 2012.
doi:10.1016/j.healthplace.2012.04.011. → page 22
[30] M. Brauer, G. Freedman, J. Frostad, A. van Donkelaar, R. V. Martin,

F. Dentener, R. v. Dingenen, K. Estep, H. Amini, J. S. Apte,
K. Balakrishnan, L. Barregard, D. Broday, V. Feigin, S. Ghosh, P. K.
Hopke, L. D. Knibbs, Y. Kokubo, Y. Liu, S. Ma, L. Morawska, J. L. T.
Sangrador, G. Shaddick, H. R. Anderson, T. Vos, M. H. Forouzanfar, R. T.
Burnett, and A. Cohen. Ambient Air Pollution Exposure Estimation for the
Global Burden of Disease 2013. Environmental Science & Technology, 50
(1):79–88, Jan. 2016. doi:10.1021/acs.est.5b03709. → pages
9, 94, 116, 148
[31] M. Brauer, J. R. Brook, T. Christidis, Y. Chu, D. L. Crouse, A. Erickson,

P. Hystad, C. Li, R. V. Martin, J. Meng, A. J. Pappin, L. L. Pinault,
M. Tjepkema, A. v. Donkelaar, C. Weagle, S. Weichenthal, and R. T.
Burnett. Mortality–Air Pollution Associations in Low Exposure
Environments (MAPLE): Phase 2. Research report (Health Effects
Institute), 2022. → pages 112, 116
[32] Breathe Collaborative. Pollution Map - Breathe Project, 2015. URL

https://breatheproject.org/pollution-map/. → pages 89, 90, 107
[33] L. Breiman. Random Forests. Machine Learning, 45:5–32, 2001.
doi:10.1023/A:1010933404324. → pages 18, 224
[34] C. Brokamp, R. Jandarov, M. Rao, G. LeMasters, and P. Ryan. Exposure

assessment models for elemental components of particulate matter in an
urban environment: A comparison of regression and random forest
approaches. Atmospheric Environment, 151:1–11, Feb. 2017.
doi:10.1016/j.atmosenv.2016.11.066. → pages 24, 30, 71, 77, 84, 225, 232
[35] C. Brokamp, R. Jandarov, M. Hossain, and P. Ryan. Predicting Daily

Urban Fine Particulate Matter Concentrations Using a Random Forest
Model. Environmental Science & Technology, 52(7):4173–4179, Apr.
2018. doi:10.1021/acs.est.7b05381. → pages 25, 84, 232
[36] R. D. Brook, S. Rajagopalan, C. A. Pope, J. R. Brook, A. Bhatnagar, A. V.

Diez-Roux, F. Holguin, Y. Hong, R. V. Luepker, M. A. Mittleman,
A. Peters, D. Siscovick, S. C. Smith, L. Whitsel, and J. D. Kaufman.
Particulate Matter Air Pollution and Cardiovascular Disease: An Update to
the Scientific Statement From the American Heart Association.
Circulation, 121(21):2331–2378, June 2010.
doi:10.1161/CIR.0b013e3181dbece1. → pages 94, 116
[37] U. Brunelli, V. Piazza, L. Pignato, F. Sorbello, and S. Vitabile. Two-days

ahead prediction of daily maximum concentrations of SO2, O3, PM10,
NO2, CO in the urban area of Palermo, Italy. Atmospheric Environment, 41
(14):2967–2995, May 2007. doi:10.1016/j.atmosenv.2006.12.013. →
pages 86, 89, 230
[38] Bureau of Labor Statistics. American Time Use Survey — 2019 Results,
June 2020. URL https://www.bls.gov/news.release/pdf/atus.pdf. Issue:
USDL-20-1275. → page 103
[39] Bureau of Planning and Research: Transportation Planning Division. 2017

Pennsylvania Traffic Data, 2017. URL https:
//gis.penndot.gov/BPR PDF FILES/Documents/Traffic/Traffic Information/
Annual Report/2017/2017 Traffic Information Report.pdf. → page 106
[40] M. Buzzelli, M. Jerrett, R. Burnett, and N. Finklestein. Spatiotemporal

Perspectives on Air Pollution and Environmental Justice in Hamilton,
Canada, 1985–1996. Annals of the Association of American Geographers,
93(3):557–573, Sept. 2003. doi:10.1111/1467-8306.9303003. → pages
22, 126, 127, 158
[41] J. C. Cabada, A. Khlystov, A. E. Wittig, C. Pilinis, and S. N. Pandis. Light

scattering by fine particles during the Pittsburgh Air Quality Study:
Measurements and modeling. Journal of Geophysical Research:
Atmospheres, 109(D16), 2004. doi:10.1029/2003JD004155.
→ page 47
[42] M. J. Campmier, J. Gingrich, S. Singh, N. Baig, S. Gani, A. Upadhya,

P. Agrawal, M. Kushwaha, H. R. Mishra, A. Pillarisetti, S. Vakacherla,
R. K. Pathak, and J. S. Apte. Site and Season Specific Calibrations Improve
Low-cost Sensor Performance: Long-term Field Evaluation of PurpleAir
Sensors in Urban and Rural India. Atmospheric Measurement Techniques
Discussions [Preprint], Mar. 2023. doi:10.5194/amt-2023-35. →
pages xxi, 12, 18, 35, 36, 37, 38, 46, 49, 54, 56, 58, 59, 63, 64, 153, 208
[43] D. Camuffo, A. della Valle, and F. Becherini. A critical analysis of one

standard and five methods to monitor surface wetness and time-of-wetness.
Theoretical and Applied Climatology, 132(3-4):1143–1151, May 2018.
doi:10.1007/s00704-017-2167-9. → page 47
[44] J.-J. Cao, B.-Q. Xu, J.-Q. He, X.-Q. Liu, Y.-M. Han, G.-h. Wang, and C.-s.
Zhu. Concentrations, seasonal variations, and transport of carbonaceous
aerosols at a remote Mountainous region in western China. Atmospheric
Environment, 43(29):4444–4452, 2009. ISSN 1352-2310.
doi:10.1016/j.atmosenv.2009.06.023. URL
https://www.sciencedirect.com/science/article/pii/S1352231009005263.
→ page 66
[45] N. Castell, F. R. Dauge, P. Schneider, M. Vogt, U. Lerner, B. Fishbain,
D. Broday, and A. Bartonova. Can commercial low-cost sensor platforms
contribute to air quality monitoring and exposure estimates? Environment
International, 99:293–302, Feb. 2017. doi:10.1016/j.envint.2016.12.007.
→ pages 1, 12, 15, 35, 118, 154
[46] J. Caubel, T. Cados, and T. Kirchstetter. A New Black Carbon Sensor for
Dense Air Quality Monitoring Networks. Sensors, 18(3):738, Mar. 2018.
doi:10.3390/s18030738. → pages 12, 96
[47] Central Pollution Control Board. Central Control Room for Air Quality
Management - All India. URL
https://app.cpcbccr.com/ccr/#/caaqm-dashboard-all/caaqm-landing. →
page 40
[48] K. M. Cerully, A. Bougiatioti, J. R. Hite, H. Guo, L. Xu, N. L. Ng,

R. Weber, and A. Nenes. On the link between hygroscopicity, volatility,
and oxidation state of ambient and water-soluble aerosols in the
southeastern United States. Atmospheric Chemistry and Physics, 15(15):
8679–8694, Aug. 2015. doi:10.5194/acp-15-8679-2015. → page 213
[49] D. Chakravartty, C. L. S. Wiseman, and D. C. Cole. Differential

environmental exposure among non-Indigenous Canadians as a function of
sex/gender and race/ethnicity variables: A scoping review. Canadian
journal of public health, 105(6):e438–e444, 2014.
doi:10.17269/cjph.105.4265. → page 29
[50] P. Chen, S. Kang, L. Tripathee, A. K. Panday, M. Rupakheti, D. Rupakheti,

Q. Zhang, J. Guo, C. Li, and T. Pu. Severe air pollution and characteristics
of light-absorbing particles in a typical rural area of the Indo-Gangetic
Plain. Environmental Science and Pollution Research, 27(10):
10617–10628, Apr. 2020. doi:10.1007/s11356-020-07618-6. → pages
46, 49
[51] T.-H. Chen, Y.-C. Hsu, Y.-T. Zeng, S.-C. Candice Lung, H.-J. Su, H. J.
Chao, and C.-D. Wu. A hybrid kriging/land-use regression model with
Asian culture-specific sources to assess NO2 spatial-temporal variations.
Environmental Pollution, 259:113875, Apr. 2020.
doi:10.1016/j.envpol.2019.113875. → page 25
[52] City of Vancouver. Strathcona: Neighbourhood Social Indicators Profile

2020. Technical report, Oct. 2020. URL
vancouver.ca/files/cov/social-indicators-profile-strathcona.pdf. → pages
118, 119, 120
[53] City of Vancouver. Vancouver: City Social Indicator Profile 2020.
Technical report, Oct. 2020. URL https:
//vancouver.ca/files/cov/social-indicators-profile-city-of-vancouver.pdf. →
page 119
[54] L. P. Clark, D. B. Millet, and J. D. Marshall. National Patterns in
Environmental Injustice and Inequality: Outdoor NO2 Air Pollution in the
United States. PLoS ONE, 9(4):e94431, Apr. 2014.
doi:10.1371/journal.pone.0094431. → pages 28, 117
[55] T. Cole-Hunter, A. De Nazelle, D. Donaire-Gonzalez, N. Kubesch,

G. Carrasco-Turigas, F. Matt, M. Foraster, T. Martı́nez, A. Ambros,
M. Cirach, D. Martinez, J. Belmonte, and M. Nieuwenhuijsen. Estimated
effects of air pollution and space-time-activity on cardiopulmonary
outcomes in healthy adults: A repeated measures study. Environment
International, 111:247–259, Feb. 2018. doi:10.1016/j.envint.2017.11.024.
→ page 26
[56] K. C. Conway. North American Port Analysis: Beyond Post-Panamax
Basics to Logistics, 2012. URL https://cre.org/real-estate-issues/
north-american-port-analysis-beyond-post-panamax-basics-logistics/. →
page 117
[57] T. W. Crawford, S. B. Jilcott Pitts, J. T. McGuirt, T. C. Keyserling, and
A. S. Ammerman. Conceptualizing and comparing neighborhood and
activity space measures for food environment research. Health & Place, 30:
215–225, Nov. 2014. doi:10.1016/j.healthplace.2014.09.007. → page 26
[58] E. S. Cross, L. R. Williams, D. K. Lewis, G. R. Magoon, T. B. Onasch,
M. L. Kaminsky, D. R. Worsnop, and J. T. Jayne. Use of electrochemical
sensors for measurement of air pollution: correcting interference response
and validating measurements. Atmospheric Measurement Techniques, 10
(9):3575–3588, Sept. 2017. doi:10.5194/amt-10-3575-2017. → pages
2, 12, 15, 16, 17, 18, 36, 96, 122
[59] D. L. Crouse, M. S. Goldberg, and N. A. Ross. A prediction-based
approach to modelling temporal and spatial variability of traffic-related air
pollution in Montreal, Canada. Atmospheric Environment, 43(32):
5075–5084, Oct. 2009. doi:10.1016/j.atmosenv.2009.06.040. → pages
23, 30, 152, 231
[60] J. Dahlin and T. B. Schön. Getting Started with Particle
Metropolis-Hastings for Inference in Nonlinear Dynamical Models.
Journal of Statistical Software, Code Snippets, 88(2):1–41, Mar. 2019.
doi:10.18637/jss.v088.c02. → page 44
[61] C. Dario. A method to obtain precise determinations of relative humidity

using thin film capacitive sensors under normal or extreme humidity
conditions. Journal of Cultural Heritage, 37:166–169, May 2019.
doi:10.1016/j.culher.2018.11.003. → page 47
[62] S. K. Das, A. Chatterjee, S. K. Ghosh, and S. Raha. Fog-Induced Changes

in Optical and Physical Properties of Transported Aerosols over
Sundarban, India. Aerosol and Air Quality Research, 15(4):1201–1212,
2015. doi:10.4209/aaqr.2014.11.0287. → pages 49, 50
[63] M. Delavar, A. Gholami, G. Shiran, Y. Rashidi, G. Nakhaeizadeh, K. Fedra,

and S. Hatefi Afshar. A Novel Method for Improving Air Pollution
Prediction Based on Machine Learning Approaches: A Case Study Applied
to the Capital City of Tehran. ISPRS International Journal of
Geo-Information, 8(2):99, Feb. 2019. doi:10.3390/ijgi8020099. → pages
24, 30, 71, 156
[64] P. deSouza, K. Barkjohn, A. Clements, J. Lee, R. Kahn, B. Crawford, and

P. Kinney. An analysis of degradation in low-cost particulate matter
sensors. Environmental Science: Atmospheres, 3(3):521–536, 2023.
doi:10.1039/D2EA00142J. → pages 17, 36
[65] B. Dewulf, T. Neutens, W. Lefebvre, G. Seynaeve, C. Vanpoucke,

C. Beckx, and N. Van de Weghe. Dynamic assessment of exposure to air
pollution using mobile phone data. International Journal of Health
Geographics, 15(1):14, Dec. 2016. doi:10.1186/s12942-016-0042-z. →
page 95
[66] Q. Di, Y. Wang, A. Zanobetti, Y. Wang, P. Koutrakis, C. Choirat,

F. Dominici, and J. D. Schwartz. Air Pollution and Mortality in the
Medicare Population. New England Journal of Medicine, 376(26):
2513–2522, June 2017. doi:10.1056/NEJMoa1702747. → pages
26, 31, 70, 94, 95, 116
[67] G. Doerksen, K. Howe, A. Thai, and K. Reid. Metro Vancouver Near-Road

Air Quality Monitoring Study. July 2020. → pages 134, 136
[68] Dräger. Dräger’s Guide to Portable Gas Detection. Technical report. URL
https://www.draeger.com/products/content/
9046736-guide-to-portable-gas-detection-en.pdf. → pages xx, 14, 15
[69] J. L. Durant, C. A. Ash, E. C. Wood, S. C. Herndon, J. T. Jayne, W. B.

Knighton, M. R. Canagaratna, J. B. Trull, D. Brugge, W. Zamore, and C. E.
Kolb. Short-term variation in near-highway air pollutant gradients on a
winter morning. Atmospheric Chemistry and Physics, 10(17):8341–8352,
2010. doi:10.5194/acp-10-8341-2010. → pages 23, 70
[70] R. M. Duvall, A. L. Clements, G. Hagler, A. Kamal, V. Kilaru,
L. Goodman, and S. Frederick. Performance Testing Protocols, Metrics,
and Target Values for Fine Particulate Matter Air Sensors. Technical report,
United States Environmental Protection Agency, Feb. 2021. URL
https://cfpub.epa.gov/si/si public record Report.cfm?dirEntryId=350785&
Lab=CEMM. → page 52
[71] M. Eeftens, R. Beelen, K. de Hoogh, T. Bellander, G. Cesaroni, M. Cirach,

C. Declercq, A. Dėdelė, E. Dons, A. de Nazelle, K. Dimakopoulou,
K. Eriksen, G. Falq, P. Fischer, C. Galassi, R. Gražulevičienė, J. Heinrich,
B. Hoffmann, M. Jerrett, D. Keidel, M. Korek, T. Lanki, S. Lindley,
C. Madsen, A. Mölter, G. Nádor, M. Nieuwenhuijsen, M. Nonnemacher,
X. Pedeli, O. Raaschou-Nielsen, E. Patelarou, U. Quass, A. Ranzi,
C. Schindler, M. Stempfelet, E. Stephanou, D. Sugiri, M.-Y. Tsai,
T. Yli-Tuomi, M. J. Varró, D. Vienneau, S. v. Klot, K. Wolf, B. Brunekreef,
and G. Hoek. Development of Land Use Regression Models for PM2.5,
PM2.5 Absorbance, PM10 and PMcoarse in 20 European Study Areas;
Results of the ESCAPE Project. Environmental Science & Technology, 46
(20):11195–11205, Oct. 2012. doi:10.1021/es301948k. → pages
1, 23, 25, 70, 71, 94, 117, 158, 232
[72] Environment and Climate Change Canada. Ambient Air Monitoring and
quality assurance/quality control guidelines - National Air Pollution
Surveillance Program. Technical report, Canadian Council of Ministers of
the Environment, 2019. URL https:
//ccme.ca/en/res/ambientairmonitoringandqa-qcguidelines ensecure.pdf.
ISBN 978-1-77202-056-4 PDF. → pages 10, 11
[73] B. J. Finlayson-Pitts and J. N. Pitts. Chemistry of the upper and lower
atmosphere: theory, experiments, and applications. Academic Press,
San Diego, 2000. ISBN 9780122570605. → page 7
[74] S. Gani, S. Bhandari, S. Seraj, D. S. Wang, K. Patel, P. Soni, Z. Arub,
G. Habib, L. Hildebrandt Ruiz, and J. S. Apte. Submicron aerosol
composition in the world’s most polluted megacity: the Delhi Aerosol
Supersite study. Atmospheric Chemistry and Physics, 19(10):6843–6859,
May 2019. doi:10.5194/acp-19-6843-2019. → page 45
[75] C. A. Garcia, P.-S. Yap, H.-Y. Park, and B. L. Weller. Association of
long-term PM2.5 exposure with mortality using different air pollution
exposure models: impacts in rural and urban California. International
Journal of Environmental Health Research, 26(2):145–157, Mar. 2016.
doi:10.1080/09603123.2015.1061113. → page 94
[76] M. W. Gardner and S. R. Dorling. Artificial neural networks (the

multilayer perceptron)—a review of applications in the atmospheric
sciences. Atmospheric Environment, 32(14):2627–2636, 1998.
ISSN 1352-2310. → page 18
[77] R. Gardner-Frolick, D. Boyd, and A. Giang. Selecting Data Analytic and
Modeling Methods to Support Air Pollution and Environmental Justice
Investigations: A Critical Review and Guidance Framework.
Environmental Science & Technology, 56(5):2843–2860, Mar. 2022.
doi:10.1021/acs.est.1c01739. → pages 22, 127
[78] GCT Global Container Terminals Inc. Analysis of Capacity Options on the
West Coast of Canada. Technical Report BQ-0894, Jan. 2019. URL
https://iaac-aeic.gc.ca/050/documents/p80054/130087E.pdf. → pages
121, 134
[79] J. Gelb and P. Apparicio. Modelling Cyclists’ Multi-Exposure to Air and
Noise Pollution with Low-Cost Sensors—The Case of Paris. Atmosphere,
11(4):422, Apr. 2020. doi:10.3390/atmos11040422. → page 148
[80] A. Giang and K. Castellani. Cumulative air pollution indicators highlight
unique patterns of injustice in urban Canada. Environmental Research
Letters, 15(12):124063, Dec. 2020. doi:10.1088/1748-9326/abcac5. →
pages 27, 28, 117, 128, 135, 136
[81] A. Giang, D. R. Boyd, A. J. Ono, and B. McIlroy-Young. Exposure, access,
and inequities: Central themes, emerging trends, and key gaps in Canadian
environmental justice literature from 2006 to 2017. The Canadian
geographer, 66(3):434–449, 2022. ISSN 0008-3658.
doi:10.1111/cag.12754. → page 117
[82] G. P. Gobbi, L. Di Liberto, and F. Barnaba. Impact of port emissions on
EU-regulated and non-regulated air quality indicators: The case of
Civitavecchia (Italy). Science of The Total Environment, 719:134984, June
2020. doi:10.1016/j.scitotenv.2019.134984. → page 118
[83] Government of British Columbia. British Columbia Ambient Air Quality

Objectives. Information Sheet, Nov. 2021. URL
https://www2.gov.bc.ca/assets/gov/environment/air-land-water/air/
reports-pub/prov air qual objectives fact sheet.pdf. → page 10
[84] A. Gressent, L. Malherbe, A. Colette, H. Rollin, and R. Scimia. Data

fusion for air quality mapping using low-cost sensor observations:
Feasibility and added-value. Environment International, 143:105965, Oct.
2020. doi:10.1016/j.envint.2020.105965. → page 126
[85] P. Gu, H. Z. Li, Q. Ye, E. S. Robinson, J. S. Apte, A. L. Robinson, and

A. A. Presto. Intra-city variability of PM exposure is driven by
carbonaceous sources and correlated with land use variables.
Environmental Science & Technology, Sept. 2018.
doi:10.1021/acs.est.8b03833. → page 213
[86] L. Gupta, R. Dev, K. Zaidi, R. Sunder Raman, G. Habib, and B. Ghosh.
Assessment of PM10 and PM2.5 over Ghaziabad, an industrial city in the
Indo-Gangetic Plain: spatio-temporal variability and associated health
effects. Environmental Monitoring and Assessment, 193(11):735, Nov.
2021. doi:10.1007/s10661-021-09411-5. → page 49
[87] S. Gutenberg. Demystifying the Air Quality Health Index. Canadian
Pharmacists Journal / Revue des Pharmaciens du Canada, 147(6):
332–334, Nov. 2014. doi:10.1177/1715163514552560. → page 117
[88] M. Habermann, M. Billger, and M. Haeger-Eugensson. Land use
Regression as Method to Model Air Pollution. Previous Results for
Gothenburg/Sweden. Procedia Engineering, 115:21–28, 2015.
doi:10.1016/j.proeng.2015.07.350. → pages 23, 70
[89] D. H. Hagan and J. H. Kroll. Assessing the accuracy of low-cost optical
particle sensors using a physics-based approach. Atmospheric
Measurement Techniques, 13(11):6343–6355, Nov. 2020.
doi:10.5194/amt-13-6343-2020. → page 113
[90] R. Haluza-Delay, M. Mascarenhas, J. L. Robinson, D. B. Tindall, E. Seldat,
G. Pechlaner, L. L. Hanson, J. Page, S. F. Trainor, I. Chapin, F. Stuart, H. P.
Huntington, D. C. Natcher, G. Kofinas, C. Teelucksingh, and M. C. J.
Stoddart. Environmental Justice in Canada. Local Environment, 12(6):
557–564, 2007. doi:10.1080/13549830701657323. → page 29
[91] S. A. H. Hassanpour Matikolaei, H. Jamshidi, and A. Samimi.
Characterizing the effect of traffic density on ambient CO, NO2, and
PM2.5 in Tehran, Iran: an hourly land-use regression model.
Transportation Letters, 11(8):436–446, Sept. 2019.
doi:10.1080/19427867.2017.1385201. → pages 23, 230, 231, 232
[92] Health Canada. Human Health Risk Assessment for Ambient Nitrogen
Dioxide. Technical report, May 2016. URL https:
//www.canada.ca/en/health-canada/services/publications/healthy-living/
human-health-risk-assessment-ambient-nitrogen-dioxide.html. → page 8
[93] Health Effects Institute. State of Global Air 2019. Special Report.
Technical report, Boston, MA, 2019. URL
https://www.stateofglobalair.org/sites/default/files/soga_2019_report.pdf.
→ page 69
[94] S. B. Henderson, B. Beckerman, M. Jerrett, and M. Brauer. Application of
Land Use Regression to Estimate Long-Term Concentrations of
Traffic-Related Nitrogen Oxides and Fine Particulate Matter.
Environmental Science & Technology, 41(7):2422–2428, Apr. 2007.
doi:10.1021/es0606780. → pages 22, 23, 25, 87, 152, 231, 232
[95] T. Hengl, M. Nussbaum, M. N. Wright, G. B. M. Heuvelink, and B. Gräler.
Random forest as a generic framework for predictive modeling of spatial
and spatio-temporal variables. PeerJ, 6:e5518, 2018.
doi:10.7717/peerj.5518. Place: United States Publisher: PeerJ. → pages
24, 72, 156
[96] M. Hernandez, T. W. Collins, and S. E. Grineski. Immigration, mobility,
and environmental injustice: A comparative study of Hispanic people’s
residential decision-making and exposure to hazardous air pollutants in
Greater Houston, Texas. Geoforum, 60:83–94, Mar. 2015.
doi:10.1016/j.geoforum.2015.01.013. → page 26
[97] N. Hilker, J. M. Wang, C.-H. Jeong, R. M. Healy, U. Sofowote, J. Debosz,
Y. Su, M. Noble, A. Munoz, G. Doerksen, L. White, C. Audette, D. Herod,
J. R. Brook, and G. J. Evans. Traffic-related air pollution near roadways:
discerning local impacts from background. Atmospheric Measurement
Techniques, 12(10):5247–5261, Oct. 2019.
doi:10.5194/amt-12-5247-2019. → page 9
[98] J. Hindson. Development and assessment of an integrated low-cost air
quality and traffic sensor network : quantifying traffic-related air pollution
in Vancouver, Canada. Text, 2022. URL
https://open.library.ubc.ca/collections/24/items/1.0422687. → page 199
[99] R. W. Hornung and L. D. Reed. Estimation of Average Concentration in
the Presence of Nondetectable Values. Applied Occupational and
Environmental Hygiene, 5(1):46–51, Jan. 1990.
doi:10.1080/1047322X.1990.10389587. → pages 97, 236
[100] X. Hu, L. A. Waller, A. Lyapustin, Y. Wang, M. Z. Al-Hamdan, W. L.
Crosson, M. G. Estes, S. M. Estes, D. A. Quattrochi, S. J. Puttaswamy, and
Y. Liu. Estimating ground-level PM2.5 concentrations in the Southeastern
United States using MAIAC AOD retrievals and a two-stage model.
Remote Sensing of Environment, 140:220–232, Jan. 2014.
doi:10.1016/j.rse.2013.08.032. → page 150
[101] K. Huang, J. Bi, X. Meng, G. Geng, A. Lyapustin, K. J. Lane, D. Gu, P. L.
Kinney, and Y. Liu. Estimating daily PM2.5 concentrations in New York
City at the neighborhood-scale: Implications for integrating non-regulatory
measurements. Science of The Total Environment, 697:134094, Dec. 2019.
doi:10.1016/j.scitotenv.2019.134094. → page 27
[102] L. Huang, C. Zhang, and J. Bi. Development of land use regression models
for PM2.5, SO2, NO2 and O3 in Nanjing, China. Environmental Research,
158:542–552, Oct. 2017. doi:10.1016/j.envres.2017.07.010. → pages 2, 70
[103] IQAir. Air quality in the world. URL
https://www.iqair.com/world-air-quality. → page 159
[104] D. A. Jaffe, C. Miller, K. Thompson, B. Finley, M. Nelson, J. Ouimette,
and E. Andrews. An evaluation of the U.S. EPA’s correction equation for
PurpleAir sensor data in smoke, dust, and wintertime urban pollution
events. Atmospheric Measurement Techniques, 16(5):1311–1322, 2023.
doi:10.5194/amt-16-1311-2023. URL
https://amt.copernicus.org/articles/16/1311/2023/. → page 66
[105] S. Jain, A. A. Presto, and N. Zimmerman. Spatial Modeling of Daily
PM2.5, NO2, and CO Concentrations Measured by a Low-Cost Sensor
Network: Comparison of Linear, Machine Learning, and Hybrid Land Use
Models. Environmental Science & Technology, June 2021.
doi:10.1021/acs.est.1c02653. → pages 96, 97, 98, 236, 238
[106] R. Jayaratne, X. Liu, P. Thai, M. Dunbabin, and L. Morawska. The
influence of humidity on the performance of a low-cost air particle mass
sensor and the effect of atmospheric fog. Atmospheric Measurement
Techniques, 11(8):4883–4890, 2018. doi:10.5194/amt-11-4883-2018. →
pages 36, 47
[107] R. Jayaratne, X. Liu, K.-H. Ahn, A. Asumadu-Sakyi, G. Fisher, J. Gao,
A. Mabon, M. Mazaheri, B. Mullins, M. Nyaku, Z. Ristovski, Y. Scorgie,
P. Thai, M. Dunbabin, and L. Morawska. Low-cost PM2.5 Sensors: An
Assessment of Their Suitability for Various Applications. Aerosol and Air
Quality Research, 2020. doi:10.4209/aaqr.2018.10.0390. → page 47
[108] M. Jerrett, R. T. Burnett, P. Kanaroglou, J. Eyles, N. Finkelstein, C. Giovis,
and J. R. Brook. A GIS–Environmental Justice Analysis of Particulate Air
Pollution in Hamilton, Canada. Environment and Planning A: Economy
and Space, 33(6):955–973, June 2001. doi:10.1068/a33137. → pages
22, 117, 127
[109] M. Jerrett, R. T. Burnett, R. Ma, C. A. Pope, D. Krewski, K. B. Newbold,
G. Thurston, Y. Shi, N. Finkelstein, E. E. Calle, and M. J. Thun. Spatial
Analysis of Air Pollution and Mortality in Los Angeles. Epidemiology
(Cambridge, Mass.), 16(6):727–736, 2005.
doi:10.1097/01.ede.0000181630.15826.7d. → pages 26, 70, 95
[110] M. Jerrett, M. A. Arain, P. Kanaroglou, B. Beckerman, D. Crouse, N. L.
Gilbert, J. R. Brook, N. Finkelstein, and M. M. Finkelstein. Modeling the
Intraurban Variability of Ambient Traffic Pollution in Toronto, Canada.
Journal of Toxicology and Environmental Health, Part A, 70(3-4):200–212,
Feb. 2007. doi:10.1080/15287390600883018. → pages
23, 30, 87, 152, 231
[111] S. K. Jha, M. Kumar, V. Arora, S. N. Tripathi, V. M. Motghare, A. A.
Shingare, K. A. Rajput, and S. Kamble. Domain Adaptation-Based Deep
Calibration of Low-Cost PM2.5 Sensors. IEEE Sensors Journal, 21(22):
25941–25949, Nov. 2021. doi:10.1109/JSEN.2021.3118454. → pages
20, 36, 46, 64, 156, 208
[112] W. Jiao, G. Hagler, R. Williams, R. Sharpe, R. Brown, D. Garver, R. Judge,
M. Caudill, J. Rickard, M. Davis, L. Weinstock, S. Zimmer-Dauphinee, and
K. Buckley. Community Air Sensor Network (CAIRSENSE) project:
evaluation of low-cost sensor performance in a suburban environment in the
southeastern United States. Atmospheric Measurement Techniques, 9(11):
5281–5292, Nov. 2016. doi:10.5194/amt-9-5281-2016. → pages 17, 36
[113] Y. Jo, M. Jang, S. Han, A. Madhu, B. Koo, Y. Jia, Z. Yu, S. Kim, and
J. Park. CAMx-UNIPAR Simulation of SOA Mass Formed from
Multiphase Reactions of Hydrocarbons under the Central Valley Urban
Atmospheres of California. EGUsphere [Preprint], Feb. 2023.
doi:10.5194/egusphere-2023-93. → page 45
[114] K. K. Johnson, M. H. Bergin, A. G. Russell, and G. S. Hagler. Field Test of
Several Low-Cost Particulate Matter Sensors in High and Low
Concentration Urban Environments. Aerosol and Air Quality Research, 18
(3):565–578, 2018. doi:10.4209/aaqr.2017.10.0418. → page 36
[115] J. Johnston and L. Cushing. Chemical Exposures, Health, and
Environmental Justice in Communities Living on the Fenceline of Industry.
Current Environmental Health Reports, 7(1):48–57, Mar. 2020.
doi:10.1007/s40572-020-00263-8. → page 28
[116] A. C. Just, R. O. Wright, J. Schwartz, B. A. Coull, A. A. Baccarelli, M. M.
Tellez-Rojo, E. Moody, Y. Wang, A. Lyapustin, and I. Kloog. Using
High-Resolution Satellite Aerosol Optical Depth To Estimate Daily PM2.5
Geographical Distribution in Mexico City. Environmental Science &
Technology, 49(14):8576–8584, July 2015. doi:10.1021/acs.est.5b00859.
→ pages 72, 97
[117] S. M. Kang, M. S. Kim, and M. Lee. The trends of composite
environmental indices in Korea. Journal of Environmental Management, 64
(2):199–206, 2002. doi:10.1006/jema.2001.0529. Place: Oxford Publisher:
Elsevier Ltd. → pages 28, 128
[118] A. A. Karner, D. S. Eisinger, and D. A. Niemeier. Near-Roadway Air
Quality: Synthesizing the Findings from Real-World Data. Environmental
Science & Technology, 44(14):5334–5344, July 2010.
doi:10.1021/es100008x. → pages 23, 70
[119] K. Kelly, J. Whitaker, A. Petty, C. Widmer, A. Dybwad, D. Sleeth,
R. Martin, and A. Butterfield. Ambient and laboratory evaluation of a
low-cost particulate matter sensor. Environmental Pollution, 221:491–500,
Feb. 2017. doi:10.1016/j.envpol.2016.12.039. → pages 15, 38
[120] T. Kerr. Fire destroys Value Village in East Vancouver. Global News, June
2022. URL
https://globalnews.ca/news/8958070/fire-value-village-east-vancouver/.
→ page 131
[121] S. Kershaw, S. Gower, C. Rinner, and M. Campbell. Identifying inequitable
exposure to toxic air pollution in racialized and low-income
neighbourhoods to support pollution prevention. Geospatial Health, 7(2):
265, May 2013. doi:10.4081/gh.2013.85. → page 117
[122] I. A. Khalek, M. G. Blanks, P. M. Merritt, and B. Zielinska. Regulated and
unregulated emissions from modern 2010 emissions-compliant heavy-duty
on-highway diesel engines. Journal of the Air & Waste Management
Association, 65(8):987–1001, Aug. 2015.
doi:10.1080/10962247.2015.1051606. → page 135
[123] A. Kibble and R. Harrison. Point sources of air pollution. Occupational
Medicine, 55(6):425–431, Sept. 2005. doi:10.1093/occmed/kqi138. →
page 6
[124] S. Kimbrough, R. Chris Owen, M. Snyder, and J. Richmond-Bryant. NO to
NO2 conversion rate analysis and implications for dispersion model
chemistry methods using Las Vegas, Nevada near-road field measurements.
Atmospheric Environment, 165:23–34, Sept. 2017.
doi:10.1016/j.atmosenv.2017.06.027. → page 8
[125] I. Kloog, B. Ridgway, P. Koutrakis, B. A. Coull, and J. D. Schwartz. Long-
and Short-Term Exposure to PM2.5 and Mortality: Using Novel Exposure
Models. Epidemiology, 24(4):555–561, July 2013.
doi:10.1097/EDE.0b013e318294beaa. → page 95
[126] A. L. Kramer, J. Liu, L. Li, R. Connolly, M. Barbato, and Y. Zhu.
Environmental justice analysis of wildfire-related PM2.5 exposure using
low-cost sensors in California. Science of The Total Environment, 856:
159218, Jan. 2023. doi:10.1016/j.scitotenv.2022.159218. → page 148
[127] S. M. Kreidenweis, M. D. Petters, and P. J. DeMott. Single-parameter
estimates of aerosol water content. Environmental Research Letters, 3(3):
035002, July 2008. doi:10.1088/1748-9326/3/3/035002. → pages 212, 213
[128] J. Lepeule, F. Laden, D. Dockery, and J. Schwartz. Chronic Exposure to
Fine Particles and Mortality: An Extended Follow-up of the Harvard Six
Cities Study from 1974 to 2009. Environmental Health Perspectives, 120
(7):965–970, July 2012. doi:10.1289/ehp.1104660. → page 94
[129] A. C. Lewis, J. D. Lee, P. M. Edwards, M. D. Shaw, M. J. Evans, S. J.
Moller, K. R. Smith, J. W. Buckley, M. Ellis, S. R. Gillot, and A. White.
Evaluating the performance of low cost chemical sensors for air pollution
research. Faraday Discussions, 189:85–103, 2016.
doi:10.1039/C5FD00201J. → page 17
[130] H. Z. Li, T. R. Dallmann, P. Gu, and A. A. Presto. Application of mobile
sampling to investigate spatial variation in fine particle composition.
Atmospheric Environment, 142:71–82, Oct. 2016.
doi:10.1016/j.atmosenv.2016.07.042. → pages 23, 25, 70, 76, 77, 117, 232
[131] H. Z. Li, P. Gu, Q. Ye, N. Zimmerman, E. S. Robinson, R. Subramanian,
J. S. Apte, A. L. Robinson, and A. A. Presto. Spatially dense air pollutant
sampling: Implications of spatial variability on the representativeness of
stationary air pollutant monitors. Atmospheric Environment: X, 2:100012,
Apr. 2019. doi:10.1016/j.aeaoa.2019.100012. → pages 12, 139, 143
[132] T. Li, Y. Guo, Y. Liu, J. Wang, Q. Wang, Z. Sun, M. Z. He, and X. Shi.
Estimating mortality burden attributable to short-term PM2.5 exposure: A
national observational study in China. Environment International, 125:
245–251, Apr. 2019. doi:10.1016/j.envint.2019.01.073. → page 95
[133] L. Liang, Z. Han, J. Li, and M. Liang. Investigation of the influence of
mineral dust on airborne particulate matter during the COVID-19 epidemic
in spring 2020 over China. Atmospheric Pollution Research, 13(6):101424,
2022. ISSN 1309-1042. doi:10.1016/j.apr.2022.101424.
URL
https://www.sciencedirect.com/science/article/pii/S1309104222001106.
→ page 66
[134] K. H. Liland, T. Almøy, and B.-H. Mevik. Optimal Choice of Baseline
Correction for Multivariate Calibration of Spectra. Applied Spectroscopy,
64(9):1007–1016, 2010. doi:10.1366/000370210792434350. → page 44
[135] W. Liu, X. Li, Z. Chen, G. Zeng, T. León, J. Liang, G. Huang, Z. Gao,
S. Jiao, X. He, and M. Lai. Land use regression models coupled with
meteorology to model spatial and temporal variability of NO2 and PM10 in
Changsha, China. Atmospheric Environment, 116:
272–280, 2015. doi:10.1016/j.atmosenv.2015.06.056. → page 85
[136] Y. Lu. Beyond air pollution at home: Assessment of personal exposure to
PM2.5 using activity-based travel demand model and low-cost air sensor
network data. Environmental Research, 201:111549, Oct. 2021.
doi:10.1016/j.envres.2021.111549. → pages 26, 95, 106, 111, 142
[137] Y. Lu and T. Fang. Examining Personal Air Pollution Exposure, Intake, and
Health Danger Zone Using Time Geography and 3D Geovisualization.
ISPRS International Journal of Geo-Information, 4(1):32–46, Dec. 2014.
doi:10.3390/ijgi4010032. → page 26
[138] H. Lyu, T. Dai, Y. Zheng, G. Shi, and T. Nakajima. Estimation of PM2.5
Concentrations over Beijing with MODIS AODs Using an Artificial Neural
Network. SOLA, 14(0):14–18, 2018. doi:10.2151/sola.2018-003. → page
24
[139] X. Ma, I. Longley, J. Gao, A. Kachhara, and J. Salmond. A site-optimised
multi-scale GIS based land use regression model for simulating local scale
patterns in air pollution. Science of The Total Environment, 685:134–149,
Oct. 2019. doi:10.1016/j.scitotenv.2019.05.408. → pages 23, 231
[140] B. Maag, Z. Zhou, and L. Thiele. A Survey on Sensor Calibration in Air
Pollution Monitoring Deployments. IEEE Internet of Things Journal, 5(6):
4857–4870, Dec. 2018. doi:10.1109/JIOT.2018.2853660. → pages
1, 12, 35, 118
[141] C. Malings, R. Tanzer, A. Hauryliuk, S. P. N. Kumar, N. Zimmerman, L. B.
Kara, A. A. Presto, and R. Subramanian. Development of a general
calibration model and long-term performance evaluation of low-cost
sensors for air pollutant gas monitoring. Atmospheric Measurement
Techniques, 12(2):903–920, Feb. 2019. doi:10.5194/amt-12-903-2019. →
pages
xxvi, 2, 12, 13, 15, 16, 18, 20, 21, 22, 35, 36, 73, 122, 126, 156, 209, 211
[142] C. Malings, R. Tanzer, A. Hauryliuk, P. K. Saha, A. L. Robinson, A. A.
Presto, and R. Subramanian. Fine particle mass monitoring with low-cost
sensors: Corrections and long-term performance evaluation. Aerosol
Science and Technology, 54(2):160–174, 2020.
doi:10.1080/02786826.2019.1623863. Publisher: Taylor & Francis. → pages
xviii, 16, 17, 36, 38, 46, 47, 65, 73, 97, 122, 155, 208, 209, 212, 214
[143] V. Malyan, V. Kumar, and M. Sahu. Significance of sources and size
distribution on calibration of low-cost particle sensors: Evidence from a
field sampling campaign. Journal of Aerosol Science, 168:106114, Feb.
2023. doi:10.1016/j.jaerosci.2022.106114. → pages 20, 37, 64, 208
[144] P. J. Mason and D. J. Thomson. Boundary Layer (Atmospheric) and Air
Pollution | Overview. In G. R. North, J. Pyle, and F. Zhang, editors,
Encyclopedia of Atmospheric Sciences (Second Edition), pages 220–226.
Academic Press, Oxford, second edition, 2015. ISBN
978-0-12-382225-3. doi:10.1016/B978-0-12-382225-3.00081-5. → page
85
[145] N. Masson, R. Piedrahita, and M. Hannigan. Approach for quantification
of metal oxide type semiconductor gas sensors used for ambient air quality
monitoring. Sensors and Actuators B: Chemical, 208:339–345, Mar. 2015.
doi:10.1016/j.snb.2014.11.032. → pages 2, 17, 18
[146] N. Masson, R. Piedrahita, and M. Hannigan. Quantification Method for
Electrolytic Sensors in Long-Term Monitoring of Ambient Air Quality.
Sensors, 15(10):27283–27302, Oct. 2015. doi:10.3390/s151027283. →
pages 16, 17, 18, 36, 122
[147] C. Mauboules. Homelessness & Supportive Housing Strategy, Oct. 2020.
URL
https://council.vancouver.ca/20201007/documents/pspc1presentation.pdf.
→ pages 120, 137
[148] C. McFarlane, P. K. Isevulambire, R. S. Lumbuenamo, A. M. E. Ndinga,
R. Dhammapala, X. Jin, V. F. McNeill, C. Malings, R. Subramanian, and
D. M. Westervelt. First Measurements of Ambient PM2.5 in Kinshasa,
Democratic Republic of Congo and Brazzaville, Republic of Congo Using
Field-calibrated Low-cost Sensors. Aerosol and Air Quality Research, 21
(7):200619, 2021. doi:10.4209/aaqr.200619. → pages 20, 36, 46, 156, 208
[149] T. McLaughlin, L. Kearney, and L. Sanicola. Special Report: U.S. air
monitors routinely miss pollution - even refinery explosions. Reuters, Dec.
2020. URL
https://www.reuters.com/article/usa-pollution-airmonitors-specialreport/
u-s-air-monitors-routinely-miss-pollution-even-refinery-explosions-idUSKBN28B4RT.
→ pages 1, 29
[150] M. Mead, O. Popoola, G. Stewart, P. Landshoff, M. Calleja, M. Hayes,
J. Baldovi, M. McLeod, T. Hodgson, J. Dicks, A. Lewis, J. Cohen,
R. Baron, J. Saffell, and R. Jones. The use of electrochemical sensors for
monitoring urban air quality in low-cost, high-density networks.
Atmospheric Environment, 70:186–203, May 2013.
doi:10.1016/j.atmosenv.2012.11.060. → pages
2, 14, 16, 17, 18, 36, 122, 149, 155
[151] M. Mehra, F. Zirzow, K. Ram, and S. Norra. Geochemistry of PM2.5
aerosols at an urban site, Varanasi, in the Eastern Indo-Gangetic Plain
during pre-monsoon season. Atmospheric Research, 234:104734, Apr.
2020. doi:10.1016/j.atmosres.2019.104734. → page 49
[152] L. D. Mercer, A. A. Szpiro, L. Sheppard, J. Lindström, S. D. Adar, R. W.
Allen, E. L. Avol, A. P. Oron, T. Larson, L.-J. S. Liu, and J. D. Kaufman.
Comparing universal kriging and land-use regression for predicting
concentrations of gaseous oxides of nitrogen (NOx) for the Multi-Ethnic
Study of Atherosclerosis and Air Pollution (MESA Air). Atmospheric
Environment, 45(26):4412–4420, Aug. 2011.
doi:10.1016/j.atmosenv.2011.05.043. → pages 127, 159
[153] MetroVancouver. Air Aware and Small Sensors. URL
http://www.metrovancouver.org/services/air-quality/action/air-aware/
Pages/default.aspx. → pages 125, 156
[154] MetroVancouver. Metro Vancouver Ambient Air Quality Objectives.
Technical Report 34097451, Jan. 2020. URL
https://metrovancouver.org/services/air-quality-climate-change/
Documents/ambient-air-quality-objectives.pdf. → page 10
[155] MetroVancouver. Metro Vancouver Overview, 2021. URL
http://www.metrovancouver.org/about/. → pages 12, 41
[156] L. Minet, R. Gehr, and M. Hatzopoulou. Capturing the sensitivity of
land-use regression models to short-term mobile monitoring campaigns
using air pollution micro-sensors. Environmental Pollution, 230:280–290,
Nov. 2017. doi:10.1016/j.envpol.2017.06.071. → pages 35, 70, 118
[157] M. L. Miranda, S. E. Edwards, M. H. Keating, and C. J. Paul. Making the
Environmental Justice Grade: The Relative Burden of Air Pollution
Exposure in the United States. International Journal of Environmental
Research and Public Health, 8(6):1755–1771, May 2011.
doi:10.3390/ijerph8061755. → pages 12, 28
[158] M. Mirzaei, J. Amanollahi, and C. G. Tzanis. Evaluation of linear,
nonlinear, and hybrid models for predicting PM2.5 based on a GTWR
model and MODIS AOD data. Air Quality, Atmosphere & Health, 12(10):
1215–1224, Oct. 2019. doi:10.1007/s11869-019-00739-z. → page 24
[159] P. Mohai, D. Pellow, and J. T. Roberts. Environmental Justice. Annual
Review of Environment and Resources, 34(1):405–430, Nov. 2009.
doi:10.1146/annurev-environ-082508-094348. → page 28
[160] S. Moltchanov, I. Levy, Y. Etzion, U. Lerner, D. M. Broday, and
B. Fishbain. On the feasibility of measuring urban air pollution by wireless
distributed sensor networks. Science of The Total Environment, 502:
537–547, Jan. 2015. doi:10.1016/j.scitotenv.2014.09.059. → pages 18, 122
[161] R. Morello-Frosch and B. M. Jesdale. Separate and Unequal: Residential
Segregation and Estimated Cancer Risks Associated with Ambient Air
Toxics in U.S. Metropolitan Areas. Environmental Health Perspectives,
114(3):386–393, Mar. 2006. doi:10.1289/ehp.8500. → page 117
[162] A. Moreno-Jiménez, R. Cañada-Torrecilla, M. J. Vidal-Domínguez,
A. Palacios-García, and P. Martínez-Suárez. Assessing environmental
justice through potential exposure to air pollution: A socio-spatial analysis
in Madrid and Barcelona, Spain. Geoforum, 69:117–131, Feb. 2016.
doi:10.1016/j.geoforum.2015.12.008. → page 22
[163] N. H. Motlagh, M. A. Zaidan, P. L. Fung, E. Lagerspetz, K. Aula,
S. Varjonen, M. Siekkinen, A. Rebeiro-Hargrave, T. Petäjä, Y. Matsumi,
M. Kulmala, T. Hussein, P. Nurmi, and S. Tarkoma. Transit pollution
exposure monitoring using low-cost wearable sensors. Transportation
Research Part D: Transport and Environment, 98:102981, Sept. 2021.
doi:10.1016/j.trd.2021.102981. → page 148
[164] M. Mueller, J. Meyer, and C. Hueglin. Design of an ozone and nitrogen
dioxide sensor unit and its long-term operation within a sensor network in
the city of Zurich. Atmospheric Measurement Techniques, 10(10):
3783–3799, Oct. 2017. doi:10.5194/amt-10-3783-2017. → page 17
[165] G. Munda. “Measuring Sustainability”: A Multi-Criterion Framework.
Environment, Development and Sustainability, 7(1):117–134, Jan. 2005.
doi:10.1007/s10668-003-4713-0. → pages 28, 128
[166] S. Munir, M. Mayfield, and D. Coca. Understanding Spatial Variability of
NO2 in Urban Areas Using Spatial Modelling and Data Fusion
Approaches. Atmosphere, 12(2):179, Jan. 2021.
doi:10.3390/atmos12020179. → pages 2, 22, 126, 127, 158
[167] National Round Table on the Environment and the Economy. Developing
ambient air quality objectives for Canada: Advice to the Minister of
Environment. Technical report, Ottawa, Ont., 2008. URL
http://nrt-trn.ca/wp-content/uploads/2011/06/ambient-air.pdf. OCLC:
246931974. → page 10
[168] Natural Resources Canada. Learn the facts: Cold weather effects on fuel
efficiency. 2014. URL
https://www.nrcan.gc.ca/sites/www.nrcan.gc.ca/files/oee/pdf/
transportation/fuel-efficient-technologies/autosmart_factsheet_3_e.pdf. →
page 85
[169] N. H. Nguyen, H. X. Nguyen, T. T. B. Le, and C. D. Vu. Evaluating
Low-Cost Commercially Available Sensors for Air Quality Monitoring and
Application of Sensor Calibration Methods for Improving Accuracy. Open
Journal of Air Pollution, 10(01):1–17, 2021.
doi:10.4236/ojap.2021.101001. → pages xx, 13, 14
[170] N. P. Nguyen and J. D. Marshall. Impact, efficiency, inequality, and
injustice of urban air pollution: variability by emission location.
Environmental Research Letters, 13(2):024002, Feb. 2018.
doi:10.1088/1748-9326/aa9cb5. → pages 22, 126, 127, 158
[171] M. M. Nyhan, I. Kloog, R. Britter, C. Ratti, and P. Koutrakis. Quantifying
population exposure to air pollution using individual mobility patterns
inferred from mobile phone data. Journal of Exposure Science &
Environmental Epidemiology, 29(2):238–247, Mar. 2019.
doi:10.1038/s41370-018-0038-9. → pages 26, 95, 106, 111
[172] Ø. Næss, P. Nafstad, G. Aamodt, B. Claussen, and P. Rosland. Relation
between Concentration of Air Pollution and Cause-Specific Mortality:
Four-Year Exposures to Nitrogen Dioxide and Particulate Matter Pollutants
in 470 Neighborhoods in Oslo, Norway. American Journal of
Epidemiology, 165(4):435–443, Feb. 2007. doi:10.1093/aje/kwk016. →
page 95
[173] Office of Legacy Management. Environmental Justice History. URL
https://www.energy.gov/lm/environmental-justice-history. → page 28
[174] R. A. Olea. Ordinary Kriging. In Geostatistics for Engineers and Earth
Scientists, pages 39–65. Springer US, Boston, MA, 1999. ISBN
978-1-4615-5001-3. doi:10.1007/978-1-4615-5001-3_4. → pages 127, 158
[175] M. S. O’Neill, M. Jerrett, I. Kawachi, J. I. Levy, A. J. Cohen, N. Gouveia,
P. Wilkinson, T. Fletcher, L. Cifuentes, J. Schwartz, and Workshop on Air
Pollution and Socioeconomic Conditions. Health, wealth, and air pollution:
advancing theory and methods. Environmental Health Perspectives, 111
(16):1861–1870, Dec. 2003. doi:10.1289/ehp.6334. → pages 28, 117
[176] S. J. Pai, T. S. Carter, C. L. Heald, and J. H. Kroll. Updated World Health
Organization Air Quality Guidelines Highlight the Importance of
Non-anthropogenic PM2.5. Environmental Science & Technology Letters, 9
(6):501–506, June 2022. doi:10.1021/acs.estlett.2c00203. → page 9
[177] K. Pan and I. C. Faloona. The impacts of wildfires on ozone production
and boundary layer dynamics in California’s Central Valley. Atmospheric
Chemistry and Physics, 22(14):9681–9702, 2022.
doi:10.5194/acp-22-9681-2022. → pages xx, 8
[178] X. Pang, M. D. Shaw, A. C. Lewis, L. J. Carpenter, and T. Batchellier.
Electrochemical ozone sensors: A miniaturised alternative for ozone
measurements in laboratory experiments and air-quality monitoring.
Sensors and Actuators B: Chemical, 240:829–837, Mar. 2017.
doi:10.1016/j.snb.2016.09.020. → pages 16, 18, 122
[179] S. Park, S. Lee, M. Yeo, and D. Rim. Field and laboratory evaluation of
PurpleAir low-cost aerosol sensors in monitoring indoor airborne particles.
Building and Environment, 234:110127, 2023.
doi:10.1016/j.buildenv.2023.110127. → page 66
[180] Y. Park, B. Kwon, J. Heo, X. Hu, Y. Liu, and T. Moon. Estimating PM2.5
concentration of the conterminous United States via interpretable
convolutional neural networks. Environmental Pollution, 256:113395, Jan.
2020. doi:10.1016/j.envpol.2019.113395. → page 24
[181] A. P. Patton, W. Zamore, E. N. Naumova, J. I. Levy, D. Brugge, and J. L.
Durant. Transferability and generalizability of regression models of
ultrafine particles in urban neighborhoods in the Boston area.
Environmental Science and Technology, 49(10):6051–6060, May 2015.
doi:10.1021/es5061676. → pages 24, 71
[182] R. Piedrahita, Y. Xiang, N. Masson, J. Ortega, A. Collier, Y. Jiang, K. Li,
R. P. Dick, Q. Lv, M. Hannigan, and L. Shang. The next generation of
low-cost personal air quality sensors for quantitative exposure monitoring.
Atmospheric Measurement Techniques, 7(10):3325–3336, Oct. 2014.
doi:10.5194/amt-7-3325-2014. → pages 12, 35, 70, 96, 118
[183] L. Pinault, D. Crouse, M. Jerrett, M. Brauer, and M. Tjepkema. Spatial
associations between socioeconomic groups and NO2 air pollution
exposure within three large Canadian cities. Environmental Research, 147:
373–382, May 2016. doi:10.1016/j.envres.2016.02.033. → pages
29, 117, 135
[184] C. A. Pope, M. Ezzati, and D. W. Dockery. Fine-Particulate Air Pollution
and Life Expectancy in the United States. New England Journal of
Medicine, 360(4):376–386, Jan. 2009. doi:10.1056/NEJMsa0805646. →
pages 9, 148
[185] C. A. Pope III. Lung Cancer, Cardiopulmonary Mortality, and Long-term
Exposure to Fine Particulate Air Pollution. Journal of the American
Medical Association, 287(9):1132, Mar. 2002.
doi:10.1001/jama.287.9.1132. → pages 94, 116
[186] K. Poplawski, T. Gould, E. Setton, R. Allen, J. Su, T. Larson,
S. Henderson, M. Brauer, P. Hystad, C. Lightowlers, P. Keller, M. Cohen,
C. Silva, and M. Buzzelli. Intercity transferability of land use regression
models for estimating ambient concentrations of nitrogen dioxide. Journal
of Exposure Science and Environmental Epidemiology, 19(1):107–117,
2009. doi:10.1038/jes.2008.15. Place: United States Publisher: Nature
Publishing Group. → pages 24, 71
[187] O. A. Popoola, G. B. Stewart, M. I. Mead, and R. L. Jones. Development of
a baseline-temperature correction methodology for electrochemical sensors
and its implications for long-term stability. Atmospheric Environment, 147:
330–343, Dec. 2016. doi:10.1016/j.atmosenv.2016.10.024. → pages 2, 16
[188] P. S. Prathibha. Hyperlocal Air Quality Exposure Assessment to Support
Health Studies. PhD thesis, ProQuest Dissertations Publishing, 2021. URL
https://www.proquest.com/dissertations-theses/
hyperlocal-air-quality-exposure-assessment/docview/2623379179/se-2.
→ page 152
[189] PurpleAir. Retrieved from https://api.openaq.org, 2023. URL
https://explore.openaq.org. → pages 30, 36
[190] N. Puttaswamy, V. Sreekanth, A. Pillarisetti, A. R. Upadhya, S. Saidam,
B. Veerappan, K. Mukhopadhyay, S. Sambandam, R. Sutaria, and
K. Balakrishnan. Indoor and Ambient Air Pollution in Chennai, India
during COVID-19 Lockdown: An Affordable Sensors Study. Aerosol and
Air Quality Research, 22(1):210170, 2022. doi:10.4209/aaqr.210170. →
pages 18, 36, 208
[191] M. Rao, L. A. George, V. Shandas, and T. N. Rosenstiel. Assessing the
potential of land use modification to mitigate ambient NO2 and its
consequences for respiratory health. International Journal of
Environmental Research and Public Health, 14(7):750, July 2017.
doi:10.3390/ijerph14070750. → pages 23, 24, 87, 152, 231
[192] V. Rao and W. Vizuete. Detection and evaluation of airborne particulates.
In Particulates Matter, pages 95–110. Elsevier, 2021. ISBN
978-0-12-816904-9. doi:10.1016/B978-0-12-816904-9.00010-6. → pages
13, 38
[193] K. Ravindra, T. Singh, V. Pandey, and S. Mor. Air pollution trend in
Chandigarh city situated in Indo-Gangetic Plains: Understanding
seasonality and impact of mitigation strategies. Science of The Total
Environment, 729:138717, Aug. 2020.
doi:10.1016/j.scitotenv.2020.138717. → pages 46, 49
[194] D. Roberts–Semple, F. Song, and Y. Gao. Seasonal characteristics of
ambient nitrogen oxides and ground–level ozone in metropolitan
northeastern New Jersey. Atmospheric Pollution Research, 3(2):247–257,
Apr. 2012. doi:10.5094/APR.2012.027. → page 136
[195] P. H. Ryan and G. K. LeMasters. A Review of Land-use Regression
Models for Characterizing Intraurban Air Pollution Exposure. Inhalation
Toxicology, 19(sup1):127–133, Jan. 2007.
doi:10.1080/08958370701495998. → pages 23, 70
[196] K. Sabaliauskas, C.-H. Jeong, X. Yao, and G. J. Evans. The application of
wavelet decomposition to quantify the local and regional sources of
ultrafine particles in cities. Atmospheric Environment, 95:249–257, Oct.
2014. doi:10.1016/j.atmosenv.2014.05.035. → pages 23, 218
[197] J. L. Sadd, M. Pastor, R. Morello-Frosch, J. Scoggins, and B. Jesdale.
Playing It Safe: Assessing Cumulative Impact and Social Vulnerability
through an Environmental Justice Screening Method in the South Coast Air
Basin, California. International Journal of Environmental Research and
Public Health, 8(5):1441–1459, May 2011. ISSN 1660-4601.
doi:10.3390/ijerph8051441. → page 22
[198] T. Sahsuvaroglu, J. G. Su, J. Brook, R. Burnett, M. Loeb, and M. Jerrett.
Predicting Personal Nitrogen Dioxide Exposure in an Elderly Population:
Integrating Residential Indoor and Outdoor Measurements, Fixed-Site
Ambient Pollution Concentrations, Modeled Pollutant Levels, and
Time–Activity Patterns. Journal of Toxicology and Environmental Health,
Part A, 72(23):1520–1533, Oct. 2009. doi:10.1080/15287390903129408.
→ page 22
[199] T. Sayahi, A. Butterfield, and K. Kelly. Long-term field evaluation of the

Plantower PMS low-cost particulate matter sensors. Environmental
Pollution, 245:932–940, Feb. 2019. doi:10.1016/j.envpol.2018.11.065. →
pages 15, 38, 55, 153
[200] J. H. Seinfeld and S. N. Pandis. Atmospheric chemistry and physics: from

air pollution to climate change. Wiley-Interscience, Hoboken, N.J, 3rd
edition, Apr. 2016. ISBN 978-1-118-94740-1. → pages 45, 131
[201] E. Setton, J. D. Marshall, M. Brauer, K. R. Lundquist, P. Hystad, P. Keller,

and D. Cloutier-Fisher. The impact of daily mobility on exposure to
traffic-related air pollution and health effect estimates. Journal of Exposure
Science & Environmental Epidemiology, 21(1):42–48, Jan. 2011.
doi:10.1038/jes.2010.14. → pages 26, 95
[202] E. M. Setton, C. P. Keller, D. Cloutier-Fisher, and P. W. Hystad. Spatial

variations in estimated chronic exposure to traffic-related air pollution in
working populations: A simulation. International Journal of Health
Geographics, 7(1):39, 2008. doi:10.1186/1476-072X-7-39. → page 135
[203] K. M. Shakya, P. Kremer, K. Henderson, M. McMahon, R. E. Peltier,

S. Bromberg, and J. Stewart. Mobile monitoring of air and noise pollution
in Philadelphia neighborhoods during summer 2017. Environmental
Pollution, 255:113195, Dec. 2019. doi:10.1016/j.envpol.2019.113195. →
pages 139, 143
[204] Y. Shi, K. K.-L. Lau, and E. Ng. Developing Street-Level PM2.5 and PM10
Land Use Regression Models in High-Density Hong Kong with Urban
Morphological Factors. Environmental Science & Technology, 50(15):
8178–8187, Aug. 2016. doi:10.1021/acs.est.6b01807. → page 27
[205] R. Shrestha, J. Flacke, J. Martinez, and M. Van Maarseveen.

Environmental Health Related Socio-Spatial Inequalities: Identifying
“Hotspots” of Environmental Burdens and Social Vulnerability.
International Journal of Environmental Research and Public Health, 13(7):
691, July 2016. doi:10.3390/ijerph13070691. → pages 27, 129, 136
[206] M. Si, Y. Xiong, S. Du, and K. Du. Evaluation and Calibration of a

Low-cost Particle Sensor in Ambient Conditions Using Machine Learning
Technologies. Atmospheric Measurement Techniques, 13(4):1693–1707,
2020. doi:10.5194/amt-2019-393. → pages 12, 20, 35, 118
[207] D. Sinaga, W. Setyawati, F. Y. Cheng, and S.-C. C. Lung. Investigation on

daily exposure to PM2.5 in Bandung city, Indonesia using low-cost sensor.
Journal of Exposure Science & Environmental Epidemiology, 30(6):
1001–1012, 2020. doi:10.1038/s41370-020-0256-9. → page 148
[208] T. Singh, K. Ravindra, G. Beig, and S. Mor. Influence of agricultural

activities on atmospheric pollution during post-monsoon harvesting
seasons at a rural location of Indo-Gangetic Plain. Science of The Total
Environment, 796:148903, Nov. 2021.
doi:10.1016/j.scitotenv.2021.148903. → page 49
[209] K. R. Smith, P. M. Edwards, M. J. Evans, J. D. Lee, M. D. Shaw,

F. Squires, S. Wilde, and A. C. Lewis. Clustering approaches to improve
the performance of low cost air pollution sensors. Faraday Discussions,
200:621–637, 2017. doi:10.1039/C7FD00020K. → page 17
[210] E. G. Snyder, T. H. Watkins, P. A. Solomon, E. D. Thoma, R. W. Williams,

G. S. W. Hagler, D. Shelow, D. A. Hindin, V. J. Kilaru, and P. W. Preuss.
The Changing Paradigm of Air Pollution Monitoring. Environmental
Science & Technology, 47(20):11369–11377, Oct. 2013.
doi:10.1021/es4022602. → pages 12, 35, 70, 96, 118
[211] South Coast AQMD. Air Quality Sensor Performance Evaluation Center
(AQ-SPEC). URL http://www.aqmd.gov/aq-spec. → pages 150, 156
[212] L. Spinelle, M. Gerboles, M. G. Villani, M. Aleixandre, and

F. Bonavitacola. Field calibration of a cluster of low-cost available sensors
for air quality monitoring. Part A: Ozone and nitrogen dioxide. Sensors
and Actuators B: Chemical, 215:249–257, Aug. 2015.
doi:10.1016/j.snb.2015.03.031. → pages 17, 18, 21, 149, 155, 156
[213] M. Stafoggia, T. Bellander, S. Bucci, M. Davoli, K. d. Hoogh, F. d. Donato,

C. Gariazzo, A. Lyapustin, P. Michelozzi, M. Renzi, M. Scortichini,
A. Shtein, G. Viegi, I. Kloog, and J. Schwartz. Estimation of daily PM10
and PM2.5 concentrations in Italy, 2013–2015, using a spatiotemporal
land-use random-forest model. Environment International, 124:170–179,
2019. doi:10.1016/j.envint.2019.01.016. → pages 23, 84, 232
[214] Statistics Canada. Population Density (Canada Census 2021), 2021. URL
https://censusmapper.ca/#13/49.2755/-123.0614. → pages xxix, 245
[215] J. R. Stetter and J. Li. Amperometric Gas Sensors - A Review. Chemical

Reviews, 108(2):352–366, Feb. 2008. doi:10.1021/cr0681039. → page 14
[216] Strathcona Residents Association. Air Quality Monitoring Project

Community Survey Report. Technical report, May 2021. URL
https://strathcona-residents.org/wp-content/uploads/2021/06/
Community-Survey-Report.pdf. → pages 119, 121
[217] J. G. Su, R. Morello-Frosch, B. M. Jesdale, A. D. Kyle, B. Shamasunder,

and M. Jerrett. An Index for Assessing Demographic Inequalities in
Cumulative Environmental Hazards with Application to Los Angeles,
California. Environmental Science & Technology, 43(20):7626–7634, Oct.
2009. doi:10.1021/es901041p. → pages 27, 28, 117, 128, 136
[218] V. Svetnik, A. Liaw, C. Tong, and T. Wang. Application of Breiman’s

Random Forest to Modeling Structure-Activity Relationships of
Pharmaceutical Molecules. In Multiple Classifier Systems, pages 334–343,
Berlin, Heidelberg, 2004. Springer Berlin Heidelberg.
doi:10.1007/978-3-540-25966-4_33. → page 78
[219] Y. Tan, T. R. Dallmann, A. L. Robinson, and A. A. Presto. Application of

plume analysis to build land use regression models from mobile sampling
to improve model transferability. Atmospheric Environment, 134:51–60,
2016. doi:10.1016/j.atmosenv.2016.03.032. → page 70
[220] R. Tanzer, C. Malings, A. Hauryliuk, R. Subramanian, and A. A. Presto.

Demonstration of a Low-Cost Multi-Pollutant Network to Quantify
Intra-Urban Spatial Variations in Air Pollutant Source Impacts and to
Evaluate Environmental Justice. International Journal of Environmental
Research and Public Health, 16(14):2523, July 2019.
doi:10.3390/ijerph16142523. → pages 73, 152
[221] O. Tchepel and C. Borrego. Frequency analysis of air quality time series
for traffic related pollutants. Journal of Environmental Monitoring, 12(2):
544–550, 2010. ISSN 1464-0325, 1464-0333. doi:10.1039/B913797A. →
page 44
[222] M. A. Tekindal, B. D. Erdoğan, and Y. Yavuz. Evaluating Left-Censored
Data Through Substitution, Parametric, Semi-parametric, and
Nonparametric Methods: A Simulation Study. Interdisciplinary Sciences:
Computational Life Sciences, 9(2):153–172, 2017.
doi:10.1007/s12539-015-0132-9. → pages 97, 236
[223] C. L. Townsend. Effects on health of prolonged exposure to low

concentrations of carbon monoxide. Occupational and Environmental
Medicine, 59(10):708–711, Oct. 2002. doi:10.1136/oem.59.10.708. →
page 9
[224] J. Tryner, C. L’Orange, J. Mehaffy, D. Miller-Lionberg, J. C. Hofstetter,

A. Wilson, and J. Volckens. Laboratory evaluation of low-cost PurpleAir
PM monitors and in-field correction using co-located portable filter
samplers. Atmospheric Environment, 220:117067, Jan. 2020.
doi:10.1016/j.atmosenv.2019.117067. → pages 36, 42
[225] B. J. Tunno, K. N. Shields, P. Lioy, N. Chu, J. B. Kadane, B. Parmanto,

G. Pramana, J. Zora, C. Davidson, F. Holguin, and J. E. Clougherty.
Understanding intra-neighborhood patterns in PM2.5 and PM10 using
mobile monitoring in Braddock, PA. Environmental Health, 11(1):76, Dec.
2012. doi:10.1186/1476-069X-11-76. → pages 139, 143
[226] United States Census Bureau. QuickFacts: Allegheny County,

Pennsylvania. URL
https://www.census.gov/quickfacts/alleghenycountypennsylvania. → page
12
[227] United States Environmental Protection Agency. Air Data: Air Quality
Data Collected at Outdoor Monitors Across the US. URL
https://www.epa.gov/outdoor-air-quality-data. → pages 40, 106
[228] United States Environmental Protection Agency. Air Sensor Toolbox.

URL https://www.epa.gov/air-sensor-toolbox. → page 156
[229] United States Environmental Protection Agency. Review of National

Ambient Air Quality Standards for Carbon Monoxide. Technical Report
Federal Register 76: Vol 169, Aug. 2011. → pages 9, 10
[230] United States Environmental Protection Agency. America’s Children and

the Environment, Third Edition. Technical Report EPA 240-R-13-001, Jan.
2013. → pages 9, 10, 69, 84
[231] United States Environmental Protection Agency. EJ 2020 Action Agenda:
The U.S. EPA’s Environmental Justice Strategic Plan for 2016-2020.
Technical Report EPA-300-B-1-6004, Oct. 2016. URL
https://www.epa.gov/sites/default/files/2016-05/documents/
052216_ej_2020_strategic_plan_final_0.pdf. → page 28
[232] United States Environmental Protection Agency. Quality Assurance

Guidance Document 2.12: Monitoring PM2.5 in Ambient Air Using
Designated Reference or Class I Equivalent Methods. Technical Report
EPA-454/B-16-001, Jan. 2016. → page 47
[233] United States Environmental Protection Agency. Review of the Primary

National Ambient Air Quality Standards for Oxides of Nitrogen. Technical
Report Federal Register 83: Vol 75, Apr. 2018. → page 10
[234] United States Environmental Protection Agency. Integrated Science

Assessment (ISA) for Particulate Matter (Final Report, Dec 2019).
Technical Report EPA/600/R-19/188, Washington, D.C., 2019. → page 22
[235] United States Environmental Protection Agency. Policy Assessment for the
Review of the National Ambient Air Quality Standards for Particulate
Matter. Technical Report EPA–452/R–20–002, Research Triangle Park,
NC. U.S. Environmental Protection Agency, Office of Air Quality Planning
and Standards, Health and Environmental Impacts Division, Jan. 2020. →
page 9
[236] United States Environmental Protection Agency. Review of the National

Ambient Air Quality Standards for Particulate Matter. Technical Report
Federal Register 85: Vol 244, Dec. 2020. → pages 9, 10
[237] United States Environmental Protection Agency. Review of the Ozone

National Ambient Air Quality Standards. Technical Report Federal
Register 85: Vol 251, Dec. 2020. → page 10
[238] University of British Columbia. Public Scholars Initiative. URL

https://www.grad.ubc.ca/psi. → page 121
[239] S. V, A. B. R, P. Kulkarni, N. Puttaswamy, V. Prabhu, P. Agrawal, A. R.

Upadhya, S. Rao, R. Sutaria, S. Mor, S. Dey, R. Khaiwal, K. Balakrishnan,
S. N. Tripathi, and P. Singh. Inter- versus Intracity Variations in the
Performance and Calibration of Low-Cost PM2.5 Sensors: A Multicity
Assessment in India. ACS Earth and Space Chemistry, 6(12):3007–3016,
Dec. 2022. doi:10.1021/acsearthspacechem.2c00257. → pages 59, 208
[240] M. Vakili, S.-R. Sabbagh-Yazdi, K. Kalhor, and S. Khosrojerdi. Using
Artificial Neural Networks for Prediction of Global Solar Radiation in
Tehran Considering Particulate Matter Air Pollution. Energy Procedia, 74:
1205–1212, Aug. 2015. doi:10.1016/j.egypro.2015.07.764. → page 24
[241] F. Villa and H. McLeod. Environmental Vulnerability Indicators for

Environmental Planning and Decision-Making: Guidelines and
Applications. Environmental Management, 29(3):335–348, Mar. 2002.
doi:10.1007/s00267-001-0030-2. → page 129
[242] L. Wallace, J. Bi, W. R. Ott, J. Sarnat, and Y. Liu. Calibration of low-cost

PurpleAir outdoor monitors using an improved method of calculating PM.
Atmospheric Environment, 256:118432, July 2021.
doi:10.1016/j.atmosenv.2021.118432. → pages 39, 55, 149
[243] K. Wang, F.-e. Chen, W. Au, Z. Zhao, and Z.-l. Xia. Evaluating the
feasibility of a personal particle exposure monitor in outdoor and indoor
microenvironments in Shanghai, China. International Journal of
Environmental Health Research, 29(2):209–220, Mar. 2019.
doi:10.1080/09603123.2018.1533531. → page 208
[244] M. Wang, R. Beelen, X. Basagana, T. Becker, G. Cesaroni, K. de Hoogh,

A. Dedele, C. Declercq, K. Dimakopoulou, M. Eeftens, F. Forastiere,
C. Galassi, R. Gražulevičienė, B. Hoffmann, J. Heinrich, M. Iakovides,
N. Künzli, M. Korek, S. Lindley, A. Mölter, G. Mosler, C. Madsen,
M. Nieuwenhuijsen, H. Phuleria, X. Pedeli, O. Raaschou-Nielsen,
A. Ranzi, E. Stephanou, D. Sugiri, M. Stempfelet, M.-Y. Tsai, T. Lanki,
O. Udvardy, M. J. Varró, K. Wolf, G. Weinmayr, T. Yli-Tuomi, G. Hoek,
and B. Brunekreef. Evaluation of Land Use Regression Models for NO2
and Particulate Matter in 20 European Study Areas: The ESCAPE Project.
Environmental Science & Technology, 47(9):4357–4364, May 2013.
doi:10.1021/es305129t. → pages 1, 25, 70, 94, 117
[245] R. Wang, S. B. Henderson, H. Sbihi, R. W. Allen, and M. Brauer. Temporal

stability of land use regression models for traffic-related air pollution.
Atmospheric Environment, 64:312–319, Jan. 2013.
doi:10.1016/j.atmosenv.2012.09.056. → pages 135, 136
[246] G. L. Watson, D. Telesca, C. E. Reid, G. G. Pfister, and M. Jerrett. Machine

learning models accurately predict ozone exposure during wildfire events.
Environmental Pollution, 254:112792, Nov. 2019.
doi:10.1016/j.envpol.2019.06.088. → pages 72, 97
[247] P. Wei, Z. Ning, S. Ye, L. Sun, F. Yang, K. Wong, D. Westerdahl, and
P. Louie. Impact Analysis of Temperature and Humidity Conditions on
Electrochemical Sensor Response in Ambient Air Quality Monitoring.
Sensors, 18(2):59, Jan. 2018. doi:10.3390/s18020059. → page 17
[248] S. Weichenthal, K. Van Ryswyk, A. Goldstein, M. Shekarrizfard, and

M. Hatzopoulou. Characterizing the spatial distribution of ambient
ultrafine particles in Toronto, Canada: A land use regression model.
Environmental Pollution, 208(Pt A):241–248, 2016.
doi:10.1016/j.envpol.2015.04.011. → pages 23, 70
[249] S. M. Wiebe. Everyday Exposure: Indigenous Mobilization and

Environmental Justice in Canada’s Chemical Valley. Vancouver, BC: UBC
Press, 2016. ISBN 978-0-7748-3263-2 (hardback). → page 117
[250] D. E. Williams, G. S. Henshaw, M. Bart, G. Laing, J. Wagner, S. Naisbitt,

and J. A. Salmond. Validation of low-cost ozone measurement instruments
suitable for use in an air-quality monitoring network. Measurement Science
and Technology, 24(6):065803, June 2013.
doi:10.1088/0957-0233/24/6/065803. → page 16
[251] R. Williams, V. Kilaru, E. Snyder, A. Kaufman, T. Dye, A. Rutter,

A. Russell, and H. Hafner. Air Sensor Guidebook. Technical Report
EPA/600/R-14/159, June 2014. → pages xviii, 18, 130, 157
[252] W. L. Winston. Simulation Modeling Using @RISK. Duxbury Press, 2000.

ISBN 0-534-38059-X. → page 100
[253] World Health Organization. Ambient Air Pollution: A global assessment of

exposure and burden of disease. Technical report, 2016. URL
https://apps.who.int/iris/handle/10665/250141. → pages 35, 36, 69
[254] World Health Organization. WHO Global Air Quality Guidelines.

Particulate matter (PM2.5 and PM10), ozone, nitrogen dioxide, sulfur
dioxide and carbon monoxide. Technical report, Geneva, 2021. Licence:
CC BY-NC-SA 3.0 IGO. → pages 1, 6, 10
[255] World Health Organization. Ambient (outdoor) air pollution. Dec. 2022.
URL https://www.who.int/news-room/fact-sheets/detail/ambient-(outdoor)
-air-quality-and-health. → page 9
[256] C.-D. Wu, Y.-T. Zeng, and S.-C. C. Lung. A hybrid kriging/land-use
regression model to assess PM2.5 spatial-temporal variability. Science of
The Total Environment, 645:1456–1464, Dec. 2018.
doi:10.1016/j.scitotenv.2018.07.073. → page 25
[257] T.-G. Wu, Y.-D. Chen, B.-H. Chen, K. H. Harada, K. Lee, F. Deng, M. J.
Rood, C.-C. Chen, C.-T. Tran, K.-L. Chien, T.-H. Wen, and C.-F. Wu.
Identifying low-PM2.5 exposure commuting routes for cyclists through
modeling with the random forest algorithm based on low-cost sensor
measurements in three Asian cities. Environmental Pollution, 294:118597,
Feb. 2022. doi:10.1016/j.envpol.2021.118597. → page 148
[258] Y. Wu, J. Hao, L. Fu, Z. Wang, and U. Tang. Vertical and horizontal
profiles of airborne particulate matter near major roads in Macao, China.
Atmospheric Environment, 36(31):4907–4918, Oct. 2002.
doi:10.1016/S1352-2310(02)00467-3. → page 131
[259] Y. Xia and H. Tong. Cumulative effects of air pollution on public health.
Statistics in Medicine, 25(20):3548–3559, Oct. 2006.
doi:10.1002/sim.2446. → page 116
[260] Y. Xu, S. Jiang, R. Li, J. Zhang, J. Zhao, S. Abbar, and M. C. González.

Unraveling environmental justice in ambient PM2.5 exposure in Beijing: A
big data approach. Computers, Environment and Urban Systems, 75:12–21,
May 2019. doi:10.1016/j.compenvurbsys.2018.12.006. → pages
22, 126, 127
[261] X. Yang, Y. Zheng, G. Geng, H. Liu, H. Man, Z. Lv, K. He, and

K. de Hoogh. Development of PM2.5 and NO2 models in a LUR
framework incorporating satellite remote sensing and air quality model data
in Pearl River Delta region, China. Environmental Pollution, 226:143–153,
July 2017. doi:10.1016/j.envpol.2017.03.079. → pages 23, 231, 232
[262] L. Yao, N. Lu, and S. Jiang. Artificial Neural Network (ANN) for
Multi-source PM2.5 Estimation Using Surface, MODIS, and
Meteorological Data. In 2012 International Conference on Biomedical
Engineering and Biotechnology, pages 1228–1231, Macau, Macao, May
2012. IEEE. doi:10.1109/iCBEB.2012.81. → pages 24, 84, 232
[263] E. Yoo, C. Rudra, M. Glasgow, and L. Mu. Geospatial Estimation of

Individual Exposure to Air Pollutants: Moving from Static Monitoring to
Activity-Based Dynamic Exposure Assessment. Annals of the Association
of American Geographers, 105(5):915–926, Sept. 2015.
doi:10.1080/00045608.2015.1054253. → page 26
[264] X. Yu, C. Ivey, Z. Huang, S. Gurram, V. Sivaraman, H. Shen, N. Eluru,
S. Hasan, L. Henneman, G. Shi, H. Zhang, H. Yu, and J. Zheng.
Quantifying the impact of daily mobility on errors in air pollution exposure
estimation using mobile phone location data. Environment International,
141:105772, Aug. 2020. doi:10.1016/j.envint.2020.105772. → pages
26, 95
[265] A. Zanobetti and J. Schwartz. The Effect of Fine and Coarse Particulate
Air Pollution on Mortality: A National Analysis. Environmental Health
Perspectives, 117(6):898–903, June 2009. doi:10.1289/ehp.0800108. →
page 94
[266] T. Zheng, M. H. Bergin, K. K. Johnson, S. N. Tripathi, S. Shirodkar, M. S.
Landis, R. Sutaria, and D. E. Carlson. Field evaluation of low-cost
particulate matter sensors in high- and low-concentration environments.
Atmospheric Measurement Techniques, 11(8):4823–4846, Aug. 2018.
doi:10.5194/amt-11-4823-2018. → page 36
[267] T. Zheng, M. H. Bergin, R. Sutaria, S. N. Tripathi, R. Caldow, and D. E.

Carlson. Gaussian Process regression model for dynamically calibrating a
wireless low-cost particulate matter sensor network in Delhi. Atmospheric
Measurement Techniques, Sept. 2019. doi:10.5194/amt-12-5161-2019. → page 208
[268] T. Zheng, M. H. Bergin, S. Hu, J. Miller, and D. E. Carlson. Estimating
ground-level PM2.5 using micro-satellite images by a convolutional neural
network and random forest approach. Atmospheric Environment, 230:
117451, June 2020. doi:10.1016/j.atmosenv.2020.117451. → page 24
[269] P. Zhou, B. Ang, and K. Poh. Comparing aggregating methods for
constructing the composite environmental index: An objective measure.
Ecological Economics, 59(3):305–311, Sept. 2006.
doi:10.1016/j.ecolecon.2005.10.018. → pages 28, 128
[270] N. Zimmerman. Tutorial: Guidelines for implementing low-cost sensor

networks for aerosol monitoring. Journal of Aerosol Science, 159:105872,
Jan. 2022. doi:10.1016/j.jaerosci.2021.105872. → pages 19, 46
[271] N. Zimmerman, A. A. Presto, S. P. N. Kumar, J. Gu, A. Hauryliuk, E. S.
Robinson, A. L. Robinson, and R. Subramanian. A machine learning
calibration model using random forests to improve sensor performance for
lower-cost air quality monitoring. Atmospheric Measurement Techniques,
11(1):291–313, Jan. 2018. doi:10.5194/amt-11-291-2018. → pages
12, 13, 14, 15, 16, 17, 21, 35, 36, 70, 73, 87, 96, 126, 149, 209
[272] N. Zimmerman, H. Z. Li, A. Ellis, A. Hauryliuk, E. S. Robinson, P. Gu,
R. U. Shah, Q. Ye, L. Snell, R. Subramanian, A. L. Robinson, J. S. Apte,
and A. A. Presto. Improving Correlations between Land Use and Air
Pollutant Concentrations Using Wavelet Analysis: Insights from a
Low-cost Sensor Network. Aerosol and Air Quality Research, 20(2):
314–328, 2020. doi:10.4209/aaqr.2019.03.0124. → pages
71, 73, 75, 97, 209
[273] M. Zusman, C. S. Schumacher, A. J. Gassett, E. W. Spalt, E. Austin, T. V.

Larson, G. Carvlin, E. Seto, J. D. Kaufman, and L. Sheppard. Calibration
of low-cost particulate matter sensors: Model development for a multi-city
epidemiological study. Environment International, 134:105329, Jan. 2020.
doi:10.1016/j.envint.2019.105329. → pages
xxi, 12, 20, 36, 37, 58, 63, 64, 65, 208
Appendix A
Supplementary Information:
Chapter 3
A.1 Calculating ALT Concentrations

This section is recreated from information provided by PurpleAir and is subject
to PurpleAir's fair use policies [4].
Plantower output provides the total number of particles in six overlapping size
categories: > 0.3 µm, > 0.5 µm, > 1 µm, > 2.5 µm, > 5 µm and > 10 µm.
Steps to calculate ALT concentrations:
1. Calculate the total particle number N (per deciliter) in the following size
categories: 0.3-0.5 µm, 0.5-1 µm, 1-2.5 µm.
2. Select the geometric mean in each size category to represent the bin.
3. Calculate the total volume V using Equation A.1, where d is the representative
(geometric mean) diameter of the size category:
V = \frac{\pi N d^{3}}{6}    (A.1)
4. Calculate the mass M by multiplying the volume by the density of water, ρ
(Equation A.2).
M = ρV (A.2)
5. Add the three mass concentrations to estimate total PM2.5 concentrations.
6. Multiply by a calibration factor (CF) of 3.
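
The six steps above can be sketched in a few lines of Python. This is a minimal illustration and not PurpleAir's own code: the function name `alt_pm25`, the argument names, and the explicit unit conversion (counts per deciliter and diameters in µm converted to µg/m³, with water density taken as 1 g/cm³) are assumptions made for this example.

```python
import math

# Size categories (µm) from step 1, given as (lower edge, upper edge).
BIN_EDGES_UM = [(0.3, 0.5), (0.5, 1.0), (1.0, 2.5)]
WATER_DENSITY_G_CM3 = 1.0   # assumed density of water
CALIBRATION_FACTOR = 3.0    # CF from step 6


def alt_pm25(n_gt_03, n_gt_05, n_gt_10, n_gt_25):
    """Estimate PM2.5 (µg/m³) from cumulative particle counts per deciliter."""
    cumulative = [n_gt_03, n_gt_05, n_gt_10, n_gt_25]
    total_mass = 0.0
    for i, (low, high) in enumerate(BIN_EDGES_UM):
        # Step 1: particles inside the bin = difference of cumulative counts.
        count = cumulative[i] - cumulative[i + 1]
        # Step 2: geometric mean diameter represents the bin.
        diameter_um = math.sqrt(low * high)
        # Step 3 (Equation A.1): total sphere volume, in µm³ per deciliter.
        volume_um3 = count * math.pi * diameter_um ** 3 / 6.0
        # Step 4 (Equation A.2): mass = density * volume, converted to µg/m³.
        # 1 µm³ = 1e-12 cm³, 1 g = 1e6 µg, 1 m³ = 1e4 deciliters.
        total_mass += WATER_DENSITY_G_CM3 * volume_um3 * 1e-12 * 1e6 * 1e4
    # Steps 5 and 6: sum over bins (done above) and apply the calibration factor.
    return CALIBRATION_FACTOR * total_mass
```

For example, `alt_pm25(1000, 300, 50, 5)` uses bin counts of 700, 250, and 45 particles per deciliter for the three size categories.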
A.2 Rolling Ball Algorithm

The rolling ball method traces the topmost point of a ball as it is rolled
under the total concentration in a line graph; Figure A.1 contains a visual
representation of the method. The method requires two parameters: the width of
the local window, wm, and the width of the local window for smoothing, ws. To
establish the baseline concentration, the dataset is divided into separate
segments, each equal in size to wm. The algorithm then identifies the minimum
point of each segment and performs linear interpolation between these points.
This process is repeated two more times, with the dataset shifted by 1/3 of the
window width [98].
Smoothing is applied by repeating the process ws times. With each iteration,
the wm increases by h ∗ wm, with h ranging from 1 to ws. The outcomes from all
the iterations, up to h = ws, are averaged to give the final result. Points where
the baseline value exceeds the total concentration are then replaced with the
total concentration value.

Figure A.1: Visual representation of the rolling ball technique. Panels:
(a) Standard, (b) Baseline Separation, (c) Estimated Regional, (d) Local.
Vertical axis in each panel: PM2.5 (µg/m³).
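
The baseline procedure described in Section A.2 can be sketched as follows. This is an illustrative reading of the text, not the implementation from [98]: the function names are hypothetical, the three shifted passes are assumed to be combined by averaging, and the window at smoothing iteration h is assumed to be h times wm. The input is assumed to be a non-empty, evenly spaced series.

```python
import numpy as np


def segment_min_baseline(y, width, offset=0):
    """Interpolate linearly between the minima of consecutive windows."""
    y = np.asarray(y, dtype=float)
    n = y.size
    # Window edges: the series start, every `width` points from `offset`, the end.
    edges = sorted(set([0] + list(range(offset, n, width)) + [n]))
    # Index of the minimum inside each window.
    idx = np.array([start + int(np.argmin(y[start:stop]))
                    for start, stop in zip(edges[:-1], edges[1:])])
    return np.interp(np.arange(n), idx, y[idx])


def rolling_ball_baseline(y, wm, ws=1):
    """Estimate a baseline with window wm and ws smoothing iterations."""
    y = np.asarray(y, dtype=float)
    estimates = []
    for h in range(1, ws + 1):
        width = h * wm  # assumption: window grows as h * wm
        # Original pass plus two passes shifted by 1/3 of the window width.
        shifts = (0, width // 3, (2 * width) // 3)
        passes = [segment_min_baseline(y, width, s) for s in shifts]
        estimates.append(np.mean(passes, axis=0))  # assumption: average passes
    baseline = np.mean(estimates, axis=0)
    # The baseline may not exceed the total concentration.
    return np.minimum(baseline, y)
```

Subtracting the returned baseline from the total concentration then gives the remaining local contribution, in the sense of the panels in Figure A.1.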
A.3 Rolling Ball Model Input
Table A.1: Average values for the ratio of estimated baselines to total
concentrations for all cities. The selected model input (wm) is in bold font.
                    wm (hours)
City       Season   4     5     6     7     8     9     10    11    12    13
Bengaluru  -        0.76  0.74  0.73  0.70  0.69  0.66  0.65  0.63  0.62  0.62
Chico      -        0.69  0.67  0.65  0.64  0.62  0.60  0.59  0.58  0.57  0.56
Kathmandu  -        0.80  0.78  0.77  0.76  0.74  0.71  0.68  0.66  0.65  0.65
Delhi      Summer   0.79  0.77  0.74  0.71  0.68  0.66  0.63  0.62  0.61  0.60
Delhi      Winter   0.81  0.78  0.75  0.72  0.67  0.63  0.60  0.59  0.58  0.57
Lahore     Summer   0.75  0.73  0.72  0.69  0.66  0.65  0.63  0.63  0.63  0.62
Lahore     Winter   0.75  0.73  0.72  0.69  0.65  0.63  0.61  0.60  0.60  0.59
A.4 Regression Model Selection
Table A.2: Average model RMSE (µg m−3) across all testing PAs. Text in
bold is the lowest RMSE model for the city.
Model Bengaluru Chico Delhi Kathmandu Lahore Mean Median

PM 5.97 5.67 18.98 6.81 16.07 10.70 6.81
PM+T 6.21 5.64 19.57 8.63 15.23 11.06 8.63
PM+RH 6.61 5.76 19.92 6.30 14.84 10.68 6.61
PM+DP 6.97 5.80 20.08 8.77 17.08 11.74 8.77
PM+T+RH 6.88 6.54 19.89 8.89 16.04 11.65 8.89
PM+T+DP 7.01 5.92 21.30 8.47 16.28 11.80 8.47
PM+RH+DP 6.97 6.20 20.87 8.50 15.98 11.70 8.50
PM+T+RH+DP 6.97 6.64 20.93 8.48 16.00 11.80 8.48
A.5 Intra-city Model Performances
Table A.3: Model Performances for Chico.
                           Uncalibrated ATM Signal  Calibrated ATM Signal
Testing Period  Sensor ID  R2    RMSE    nRMSE      R2    RMSE    nRMSE
2020            CPA1       0.94  19.70   57%        0.97  9.38    27%
2020            CPA2       0.94  20.36   59%        0.97  9.10    26%
2020            CPA3       0.86  21.87   63%        0.90  10.67   31%
2021            CPA1       0.95  16.39   65%        0.96  7.86    31%
2021            CPA2       0.95  17.29   69%        0.96  8.12    32%
2021            CPA3       0.88  16.48   66%        0.93  6.91    27%
Table A.4: Model Performances for Kathmandu.
                           Uncalibrated ATM Signal  Calibrated ATM Signal
Testing Period  Sensor ID  R2    RMSE    nRMSE      R2    RMSE    nRMSE
June            KPA2       0.34  12.20   44%        0.39  3.15    11%
July            KPA2       0.48  10.36   93%        0.43  3.01    27%
August          KPA2       0.26  9.22    103%       0.21  3.18    36%
September       KPA2       0.33  12.34   100%       0.38  2.99    24%
October         KPA2       0.81  12.71   80%        0.92  3.13    20%
November        KPA2       0.32  22.71   69%        0.42  9.59    29%
Table A.5: Model Performances for Bengaluru.
                           Uncalibrated ATM Signal  Calibrated ATM Signal
Testing Period  Sensor ID  R2    RMSE    nRMSE      R2    RMSE    nRMSE
January         BPA1       0.47  17.62   73%        0.55  7.16    30%
January         BPA2       0.25  13.18   54%        0.26  7.17    30%
February        BPA1       0.74  15.38   51%        0.72  4.80    16%
February        BPA2       0.75  8.37    28%        0.71  6.28    21%
March           BPA1       0.79  15.65   47%        0.85  3.58    11%
March           BPA2       0.84  7.43    22%        0.86  6.15    18%
April           BPA1       0.71  13.19   45%        0.78  5.27    18%
April           BPA2       0.76  7.68    26%        0.85  5.78    20%
May             BPA1       0.45  13.76   75%        0.57  5.17    28%
May             BPA2       0.34  6.95    38%        0.50  4.52    24%
June            BPA1       0.41  7.97    88%        0.41  4.42    49%
June            BPA2       0.25  4.32    48%        0.25  3.68    41%
July            BPA1       0.43  6.76    61%        0.50  3.93    35%
July            BPA2       0.56  3.50    31%        0.66  4.47    40%
August          BPA1       0.42  7.39    73%        0.60  3.24    32%
August          BPA2       0.38  4.18    41%        0.56  4.05    40%
September       BPA1       0.59  8.56    69%        0.73  3.18    26%
September       BPA2       0.34  5.25    43%        0.50  3.48    28%
October         BPA1       0.71  22.61   104%       0.90  6.14    30%
October         BPA2       0.84  7.81    36%        0.87  3.72    18%
November        BPA1       0.91  24.04   75%        0.94  5.64    18%
November        BPA2       0.90  8.08    25%        0.92  10.60   33%
December        BPA1       0.87  21.39   101%       0.91  7.59    36%
December        BPA2       0.87  8.98    43%        0.91  4.74    23%
Table A.6: Model Performances for Delhi.
                           Uncalibrated ATM Signal  Calibrated ATM Signal
Testing Period  Sensor ID  R2    RMSE    nRMSE      R2    RMSE    nRMSE
< 100 µg m−3
January         DPA1       0.36  28.03   39%        0.58  13.86   19%
January         DPA2       0.61  27.77   39%        0.70  57.98   81%
January         DPA3       0.25  24.51   34%        0.41  14.44   20%
February        DPA1       0.68  25.72   46%        0.57  11.89   21%
February        DPA2       0.58  15.26   27%        0.62  36.93   67%
February        DPA3       0.68  19.82   36%        0.68  12.39   22%
March           DPA1       0.76  30.57   56%        0.74  11.02   20%
March           DPA2       0.76  9.67    18%        0.83  24.44   45%
March           DPA3       0.61  24.94   46%        0.72  9.58    18%
April           DPA1       0.29  44.08   68%        0.41  17.14   26%
April           DPA2       0.37  27.74   43%        0.50  13.42   21%
April           DPA3       0.18  40.49   62%        0.25  15.09   23%
May             DPA1       0.48  39.68   66%        0.59  19.60   33%
May             DPA2       0.45  25.77   43%        0.54  12.74   21%
May             DPA3       0.37  35.49   59%        0.45  15.13   25%
June            DPA1       0.18  30.45   75%        0.59  10.84   27%
June            DPA2       0.14  22.13   55%        0.43  11.96   30%
June            DPA3       0.12  27.23   67%        0.43  9.19    23%
July            DPA1       0.84  13.74   51%        0.85  7.70    29%
July            DPA2       0.74  6.86    26%        0.77  17.33   65%
July            DPA3       0.56  11.29   42%        0.60  9.69    36%
August          DPA1       0.79  14.31   59%        0.82  11.29   47%
August          DPA2       0.77  6.31    26%        0.81  9.72    40%
August          DPA3       0.66  11.58   47%        0.70  7.38    30%
September       DPA1       0.85  18.75   61%        0.89  11.83   38%
September       DPA2       0.83  9.22    30%        0.86  11.04   36%
September       DPA3       0.78  14.48   47%        0.83  7.72    25%
October         DPA1       0.83  35.76   67%        0.91  22.15   42%
October         DPA2       0.83  24.85   47%        0.90  14.01   26%
October         DPA3       0.82  22.91   43%        0.92  9.86    19%
November        DPA2       0.08  33.83   38%        0.12  10.87   12%
November        DPA3       0.08  41.69   47%        0.11  16.41   19%
> 100 µg m−3
November        DPA2       0.83  83.81   49%        0.82  30.71   18%
November        DPA3       0.69  88.78   52%        0.70  31.63   19%
December        DPA2       0.49  62.36   37%        0.53  38.83   23%
Table A.7: Model Performances for Lahore.
                           Uncalibrated ATM Signal  Calibrated ATM Signal
Testing Period  Sensor ID  R2    RMSE    nRMSE      R2    RMSE    nRMSE
February        LPA1       0.47  18.60   27%        0.49  16.56   24%
February        LPA2       0.41  18.92   28%        0.53  14.22   21%
March           LPA1       0.35  17.52   26%        0.36  16.13   23%
March           LPA2       0.49  20.27   30%        0.47  13.00   19%
April           LPA1       0.08  28.23   39%        0.04  13.90   19%
April           LPA2       0.27  33.44   46%        0.30  12.24   17%
April           LPA3       0.18  31.18   43%        0.24  11.48   16%
April           LPA4       0.16  28.06   39%        0.08  13.95   19%
May             LPA1       0.27  21.54   32%        0.54  14.21   21%
May             LPA2       0.37  30.16   45%        0.59  11.37   17%
May             LPA3       0.32  24.78   37%        0.50  10.86   16%
May             LPA4       0.42  23.94   36%        0.65  13.85   21%
June            LPA1       0.17  19.44   30%        0.37  14.33   22%
June            LPA2       0.28  33.29   51%        0.57  12.61   19%
June            LPA4       0.24  28.40   43%        0.37  14.34   22%
July            LPA1       0.53  30.96   84%        0.62  14.09   38%
July            LPA2       0.69  8.13    22%        0.77  7.80    21%
July            LPA4       0.73  11.87   32%        0.81  5.83    16%
August          LPA1       0.62  16.84   44%        0.64  7.55    20%
August          LPA2       0.55  9.09    24%        0.60  7.77    20%
August          LPA4       0.58  10.41   27%        0.59  9.08    24%
September       LPA1       0.68  16.63   29%        0.69  16.67   29%
September       LPA2       0.62  13.40   24%        0.65  12.12   21%
September       LPA4       0.69  13.11   23%        0.70  13.04   23%
October         LPA1       0.36  24.41   36%        0.35  13.81   20%
October         LPA2       0.32  16.28   24%        0.43  12.64   18%
October         LPA4       0.43  21.99   32%        0.36  13.69   20%
December        LPA1       0.34  15.71   21%        0.58  15.72   21%
December        LPA4       0.39  36.37   49%        0.44  13.07   18%
Table A.8: Model Performances for Lahore.
                           Uncalibrated ATM Signal  Calibrated ATM Signal
Testing Period  Sensor ID  R2    RMSE    nRMSE      R2    RMSE    nRMSE
> 100 µg m−3
January         LPA1       0.47  50.60   28%        0.53  47.12   26%
January         LPA2       0.53  46.76   26%        0.63  39.88   22%
January         LPA3       0.48  49.87   27%        0.62  41.12   23%
February        LPA1       0.62  26.34   19%        0.62  26.34   19%
February        LPA2       0.73  39.15   28%        0.76  23.20   16%
February        LPA3       0.67  34.15   24%        0.53  20.58   15%
March           LPA1       0.18  17.42   16%        0.18  26.83   25%
March           LPA2       0.52  13.69   13%        0.47  20.79   19%
March           LPA3       0.24  16.46   15%        0.23  23.94   22%
April           LPA1       0.21  42.29   35%        0.31  18.20   15%
April           LPA2       0.06  60.43   50%        0.09  14.83   12%
April           LPA3       0.20  52.40   44%        0.33  11.80   10%
April           LPA4       0.23  49.49   41%        0.27  15.51   13%
May             LPA1       0.00  44.71   38%        0.43  8.39    7%
May             LPA2       0.01  56.87   49%        0.18  11.09   10%
October         LPA1       0.47  39.73   25%        0.57  29.55   18%
October         LPA2       0.62  39.19   24%        0.70  33.87   21%
November        LPA1       0.41  54.44   32%        0.49  38.10   22%
November        LPA4       0.53  45.00   26%        0.64  34.81   21%
December        LPA1       0.34  112.18  49%        0.31  103.93  45%
December        LPA4       0.60  53.48   23%        0.62  49.04   21%
A.6 Inter-City Model Performances
Table A.9: Performances of Inter-city Models. RMSE values are in µg m−3 and
nRMSE values are in %.
                     ATM Raw Signal         ATM Calibrated Signal  ALT Raw Signal         ALT Calibrated Signal
Testing  Model       R2    RMSE    nRMSE    R2    RMSE    nRMSE    R2    RMSE    nRMSE    R2    RMSE    nRMSE
< 100 µg m−3
BPA1     California  0.81  15.71   75       0.87  5.03    23       0.83  9.69    46       0.88  5.19    24
BPA1     Kathmandu   0.81  15.71   75       0.84  5.70    26       0.83  9.69    46       0.90  5.54    26
BPA2     California  0.83  7.63    36       0.85  7.85    36       0.85  6.30    30       0.85  6.43    30
BPA2     Kathmandu   0.83  7.63    36       0.83  8.18    38       0.85  6.30    30       0.87  5.55    26
CPA1     Bengaluru   0.94  18.46   60       0.96  6.13    20       0.93  9.61    32       0.96  6.13    20
CPA1     Kathmandu   0.94  18.46   60       0.95  5.59    18       0.93  9.61    32       0.93  7.56    25
CPA2     Bengaluru   0.94  19.20   62       0.95  6.25    20       0.93  10.31   34       0.95  6.25    20
CPA2     Kathmandu   0.94  19.20   62       0.95  5.46    18       0.93  10.31   34       0.94  7.32    24
CPA3     Bengaluru   0.87  19.92   65       0.87  8.20    27       0.84  12.45   41       0.87  8.20    27
CPA3     Kathmandu   0.87  19.92   65       0.87  8.31    27       0.84  12.45   41       0.84  9.51    31
KPA1     Bengaluru   0.84  16.66   54       0.89  5.86    19       0.81  6.76    22       0.86  5.25    17
KPA1     California  0.84  16.66   54       0.89  4.42    14       0.81  6.76    22       0.86  5.69    18
DPA1     Lahore      0.60  27.95   58       0.58  14.21   32       0.55  33.61   71       0.52  14.44   32
DPA2     Lahore      0.51  18.08   36       0.61  16.94   36       0.52  23.40   49       0.62  16.88   36
DPA3     Lahore      0.56  23.82   49       0.68  13.37   28       0.59  33.34   70       0.61  14.37   31
LPA1     Lahore      0.15  29.34   48       0.18  30.20   53       0.17  26.67   43       0.10  37.96   69
LPA2     Delhi       0.32  24.07   38       0.49  13.72   24       0.33  31.14   50       0.39  17.94   33
LPA3     Delhi       0.10  22.08   33       0.18  11.39   17       0.30  28.02   44       0.23  24.57   45
LPA4     Delhi       0.20  38.49   60       0.37  20.59   36       0.05  42.09   64       0.20  25.08   41
Average              0.68  19.95   54       0.73  10.39   27       0.68  17.29   42       0.70  12.10   30
> 100 µg m−3
DPA1     Lahore      0.72  61.83   41       0.40  40.83   27       0.67  66.57   44       0.64  29.63   20
DPA2     Lahore      0.19  62.71   41       0.16  45.53   30       0.16  74.22   49       0.12  47.87   32
DPA3     Lahore      0.59  69.43   46       0.56  44.27   29       0.70  83.96   55       0.67  45.06   30
LPA1     Delhi       0.54  42.25   27       0.59  82.99   54       0.03  30.01   45       0.55  50.09   33
LPA2     Delhi       0.67  45.37   30       0.73  52.88   34       0.22  36.68   55       0.66  56.80   37
LPA3     Delhi       0.64  42.40   28       0.66  70.99   46       0.15  33.92   51       0.61  46.87   30
LPA4     Delhi       0.65  43.42   28       0.73  88.52   58       0.01  47.23   71       0.66  45.05   29
Average              0.57  52.49   34       0.55  60.86   40       0.28  53.23   53       0.56  45.91   30
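
The three statistics reported in Tables A.3 to A.9 can be computed along the following lines. This is a sketch under stated assumptions rather than the exact procedure used here: R2 is taken as the squared Pearson correlation between reference and sensor values, and nRMSE is taken as RMSE divided by the mean reference concentration, expressed in percent. The function name `performance_metrics` is hypothetical.

```python
import numpy as np


def performance_metrics(reference, sensor):
    """Return R2, RMSE (input units), and nRMSE (%) for paired measurements."""
    reference = np.asarray(reference, dtype=float)
    sensor = np.asarray(sensor, dtype=float)
    # Root-mean-square error between sensor and reference values.
    rmse = float(np.sqrt(np.mean((sensor - reference) ** 2)))
    # Assumption: normalise by the mean reference concentration.
    nrmse = 100.0 * rmse / float(np.mean(reference))
    # Assumption: R2 as the squared Pearson correlation coefficient.
    r = np.corrcoef(reference, sensor)[0, 1]
    return {"R2": float(r ** 2), "RMSE": rmse, "nRMSE": nrmse}
```

Other normalisations, such as dividing by the range of the reference values, would give different nRMSE values from the same data.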
A.7 Comparison with Other Studies
Table A.10: Comparison with other studies. MAE and RMSE values are in µg m−3.
Study | Location | n | Sensor | T.R. | Time Span | Calibration | R2 | Error
Bai et al. [18] | Nanjing, China | 3 | Shinyei PPD42NS | Hourly | 18 months | Regression | 0.75 | MNE: 41%
Bai et al. [18] | Nanjing, China | 3 | Shinyei PPD42NS | Hourly | 18 months | ANN | 0.84 | MNE: 29.7%
Becnel et al. [24] | Salt Lake County, Utah | 50 | PMS7003 | Hourly | 6 months | Regression | 0.85-0.90 | RMSE: 3.49-5.59
Bi et al. [27] | California, USA | 2090 | PMS5003 | Hourly | 1 year | Regression | 0.86 | NA
Campmier et al. [42] | Multiple cities, India | >6 | PMS5003 | Hourly | 1 year | Regression | 0.82-0.95 | nRMSE: 20-32% α
Campmier et al. [42] | Multiple cities, India | >6 | PMS5003 | Hourly | 1 year | Regression | 0.52-0.93 | nRMSE: 24-71% β
Jha et al. [111] | Mumbai, India | 12 | PMS7003 | Hourly | 3 months | Regression | 0.81 | MAPE: 13%
Jha et al. [111] | Mumbai, India | 12 | PMS7003 | Hourly | 3 months | DNN | 0.82 | MAPE: 11.84%
Malings et al. [142] | Pittsburgh, USA | 9 | PMS5003 | Hourly | 11 months | Regression | 0.52 | MAE: 2.5
Malings et al. [142] | Pittsburgh, USA | 25 | Met-One NPM | Hourly | 11 months | Regression | 0.49 | MAE: 3.5
Malyan et al. [143] | Mumbai, India | 5 | PMS5003 | Hourly | 3 days | Regression | 0.49 | RMSE: 25.31
Malyan et al. [143] | Mumbai, India | 5 | PMS5003 | Hourly | 3 days | RF | 0.75 | RMSE: 20.73
McFarlane et al. [148] | Kampala, Uganda | 5 | PMS5003 | Hourly | 6 months | Regression | 0.9 | MAE: 7.3
McFarlane et al. [148] | Kampala, Uganda | 5 | PMS5003 | Hourly | 6 months | RF | 0.91 | MAE: 7.2
Puttaswamy et al. [190] | Tamil Nadu, India | 5 | PMS7003 | Hourly | 4 months | Regression | 0.87 | nRMSE: 15%
V et al. [239] | Multiple cities, India | 9 | PMS7003 | Hourly | 45 days | Regression | 0.32-0.89 | nRMSE: 14-23%
V et al. [239] | Mumbai, India | 9 | PMS7003 | Hourly | 45 days | Regression | NA | NA α
Wang et al. [243] | Shanghai | 17 | PMS7003 | Hourly | 2 days | Regression | 0.72-0.78 | NA
Zheng et al. [267] | Delhi, India | 10 | PMS7003 | Daily | 59 days | GPR | 0.58-0.81 | nRMSE: 30%
Zusman et al. [273] | Multiple cities, USA | 26 | PMS A003; Shinyei PPD42NS | Daily | 2.5 years | Regression | 0.74-0.95 | RMSE: 1.67-2.46
Zusman et al. [273] | Seattle, USA | 26 | PMS A003; Shinyei PPD42NS | Daily | 2.5 years | Regression | 0.67-0.80 | RMSE: 0.84-3.41 β
Our Study | Bengaluru, Chico, Delhi, Kathmandu, Lahore | 15 | PMS5003 | Hourly | 1 year | Regression | 0.64 | nRMSE: 26% α
Our Study | Bengaluru, Chico, Delhi, Kathmandu, Lahore | 15 | PMS5003 | Hourly | 1 year | Regression | 0.78 | nRMSE: 25% β
T.R.: Time Resolution; ANN: Artificial Neural Networks; DNN: Deep Neural Networks; RF: Random Forests; GPR: Gaussian Process Regression
α: Intra-city model; β: Inter-city model
Appendix B
Chapter 4
B.1 QA/QC for the monitoring data (Zimmerman et al., 2020 and Malings et al., 2019, 2020)
A detailed report on calibration models for RAMP monitors can be found in Zim-
merman et al. (2020) [272], Malings et al. (2019, 2020) [141, 142]; Zimmerman et
al. (2018) [271]. A subset of methodology and results from Malings et al. (2019,
2020) [141, 142] is paraphrased below. All methodologies, results, figures and
tables in this section are the intellectual property of Carl Malings and have been
paraphrased or reproduced here with his permission.
B.1.1 Gas Sensor Calibrations – paraphrased from Malings et al. (2019)
RAMP monitors were deployed in 2017 for 1 month in summer and fall at Carnegie
Mellon University (CMU). High-quality regulatory-grade instruments were located
less than 10 m from the RAMP monitors (all instruments were owned and operated
by the CAPS group at Carnegie Mellon University) and measured ambient
concentrations of CO (Teledyne T300U instrument), CO2 (LI-COR 820), O3 (Teledyne
T400 photometric ozone analyzer), and NO and NO2 (2B Technologies Model 405 nm).
These instruments provided the “true” concentration values for the gases to
which the RAMP monitors were exposed. Additionally, one RAMP monitor was
collocated with the Allegheny County Health Department (ACHD) regulatory
monitors in Lawrenceville (an urban background site measuring all EPA criteria
pollutants) and one was collocated at the ACHD Parkway East site along the I-376
highway (higher levels of NOx). These sensors were first calibrated at CMU and
subsequently tested independently against data from the ACHD sites. Essentially,
these collocations were used to evaluate the sensors against regulatory-grade
monitors at a different location and at a location with higher concentrations.
A general calibration model was developed for each gas sensor (CO and NO2) using
all sensors deployed at the same place and time. Data from 75% of the RAMP
monitors were used as the training dataset and the remaining 25% of the RAMP
monitors were used as testing data. The steps followed by Malings to create a
general calibration model are paraphrased below:
1. Data from all the RAMPs deployed at the same time and place were compiled.
2. Median concentrations were identified from the RAMPs at each timestamp
(RAMPs from point 1 above).
3. A new time series was created using the median values. This signal is
treated/labelled as a ‘typical RAMP’.
4. The data from this ‘typical RAMP’ were used to build one calibration model
against the reference monitors. The resulting models are denoted as gRAMP
(general RAMP).
5. The gRAMP model is applied to all RAMPs.
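The five steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names (`typical_ramp`, `fit_linear`, `apply_gramp`) are hypothetical, the readings are assumed to be already aligned by timestamp, and an ordinary least-squares line stands in for the calibration model, whereas Malings et al. tested several model families.

```python
from statistics import mean, median


def typical_ramp(ramp_signals):
    """Steps 1-3: median across all RAMPs at each timestamp."""
    series = list(ramp_signals.values())
    return [median(values) for values in zip(*series)]


def fit_linear(x, y):
    """Step 4 (stand-in model): least-squares slope and intercept."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def apply_gramp(ramp_signals, reference):
    """Step 5: one model, built on the typical RAMP, applied to every RAMP."""
    slope, intercept = fit_linear(typical_ramp(ramp_signals), reference)
    return {
        ramp_id: [slope * value + intercept for value in values]
        for ramp_id, values in ramp_signals.items()
    }


# Toy data (hypothetical): three RAMPs and one reference series.
raw = {"A": [1, 2, 3, 4], "B": [2, 3, 4, 5], "C": [3, 4, 5, 6]}
reference = [5, 7, 9, 11]
calibrated = apply_gramp(raw, reference)
```

Because the model is fitted to the median signal, a single unusual RAMP has little influence on the coefficients that are later applied to every unit.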
Various calibration techniques were tested for the RAMP monitors by collocating
them with regulatory-grade monitors, including linear and quadratic regression,
clustering, artificial neural networks and random forests. To circumvent a known
limitation of random forests (the output of a random forest model for new data
can only fall within the range of the values included in the training data), the
random forest was extended into a hybrid random forest–linear regression model.
Figure B.1: (Adapted from Malings et al., 2019 [141], duplicated with
permission): Performance evaluation of various calibration techniques for
gRAMP models. Of the five techniques listed, the three best-performing
calibration techniques are displayed for each gas in the figure.
B.1.2 Correction of PM2.5 Data – paraphrased from Malings et al.
(2020)
PM2.5 measurements were obtained with a commercial low-cost nephelometer, either
a Met-One Neighborhood Monitor or a PurpleAir PA-II. The sensors were calibrated
against Beta Attenuation Monitors (BAM) at Allegheny County Health Department
(ACHD) or Pennsylvania Department of Environmental Protection (DEP) sites. The
two nephelometers were calibrated differently, using the methods developed by
Malings et al. (2020) [142], where the long-term performance of the two sensors
is also discussed thoroughly. Two different methods were tested for the accuracy
and precision of the calibration. The first method, a physics-based approach,
combined a hygroscopic growth factor with a linear correction to calibrate the
sensors. These two steps are described below.
Hygroscopic growth factor
The RAMPs’ PM2.5 sensors were corrected for humidity using a hygroscopic growth
factor (the ratio of PM at a given humidity and temperature to that at 22°C and
35% relative humidity (RH)). It was calculated using the method described in
Petters and Kreidenweis (2007) [127], and the formula is given in Eq. B.1:
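\[ f_{RH}(T, RH) = 1 + \kappa_{bulk} \, \frac{\alpha_w(T, RH)}{1 - \alpha_w(T, RH)} \qquad \text{(B.1)} \]
where:
\[ \alpha_w(T, RH) = RH \left[ \exp\!\left( \frac{4 \sigma_w M_w}{\rho_w R T D_p} \right) \right]^{-1} \qquad \text{(B.2)} \]
Here,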
κbulk: hygroscopicity of the bulk aerosol; κbulk = Σi xi κi, where xi and κi are the
volume fractions and hygroscopicity parameters of the major non-refractory aerosol
components (sulfate, nitrate, ammonium, and organic matter). xi values can be found
in Gu et al. (2018) [85]; κi values can be obtained from Cerully et al. (2015) [48]
and Petters and Kreidenweis (2007) [127].
αw: water activity parameter
σw, Mw, and ρw: the surface tension, molecular weight and density of water (0.072
N/m, 0.018 kg/mol and 1000 kg/m3, respectively)
T: absolute temperature
R: ideal gas constant (8.314 J/(mol K))
RH: ambient relative humidity
Dp: particle diameter; adopted as the volume median diameter from long-term size
distribution measurements using an SMPS in Pittsburgh.
Linear Correction
Eq. B.3 was adopted to calibrate the sensors against the reference-grade
instrument.
\[ \text{Corrected PM}_{2.5} = \theta_1 \, \frac{\text{PM}_{2.5}\ \text{as reported}}{f_{RH}(T, RH)} + \theta_0 \qquad \text{(B.3)} \]
Here, corrected PM2.5 refers to the concentrations obtained by the reference
monitor and PM2.5 as reported refers to the RAMP observations. Values of the
coefficients θ0 and θ1 are reported in Table B.1.
Table B.1: Calibrated coefficients (θ0 and θ1) calculated using typical linear
regression techniques for Met-One NPM and PurpleAir PPA over 3 different
periods: summer, winter and other. S.D. denotes the standard deviation.
(Table taken from Malings et al. 2020 [142])
Period | θ0, Met-One NPM: Coefficient | S.D. | θ0, PurpleAir PPA: Coefficient | S.D. | θ1, Met-One NPM: Coefficient | S.D. | θ1, PurpleAir PPA: Coefficient | S.D.
Summer | 5.28 | 0.09 | 5.4 | 0.4 | 1.5 | 0.01 | 0.62 | 0.03
Winter | 2.03 | 0.08 | -0.3 | 0.2 | 1.5 | 0.01 | 1.25 | 0.01
Other | 1.68 | 0.13 | 3.7 | 0.1 | 1.76 | 0.02 | 0.83 | 0.01
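As a worked illustration of Eqs. B.1 to B.3, the following Python sketch chains the water activity, the growth factor and the linear correction. Several things here are assumptions rather than values from this appendix: RH is expressed as a fraction between 0 and 1, and the `kappa_bulk` and `d_p` inputs in the example call are hypothetical placeholders. The θ0 and θ1 values in the example are the summer PurpleAir coefficients from Table B.1.

```python
import math

SIGMA_W = 0.072   # surface tension of water, N/m
M_W = 0.018       # molecular weight of water, kg/mol
RHO_W = 1000.0    # density of water, kg/m3
R_GAS = 8.314     # ideal gas constant, J/(mol K)


def water_activity(temp_k, rh, d_p):
    """Eq. B.2: RH (as a fraction) divided by the exponential curvature term."""
    curvature = math.exp(4.0 * SIGMA_W * M_W / (RHO_W * R_GAS * temp_k * d_p))
    return rh / curvature


def growth_factor(temp_k, rh, kappa_bulk, d_p):
    """Eq. B.1: hygroscopic growth factor fRH(T, RH)."""
    a_w = water_activity(temp_k, rh, d_p)
    return 1.0 + kappa_bulk * a_w / (1.0 - a_w)


def corrected_pm25(reported, temp_k, rh, theta0, theta1, kappa_bulk, d_p):
    """Eq. B.3: humidity correction followed by the linear correction."""
    f_rh = growth_factor(temp_k, rh, kappa_bulk, d_p)
    return theta1 * reported / f_rh + theta0


# kappa_bulk and d_p are hypothetical placeholders, not values from the thesis.
example = corrected_pm25(reported=20.0, temp_k=295.15, rh=0.60,
                         theta0=5.4, theta1=0.62, kappa_bulk=0.3, d_p=2.0e-7)
```

At RH = 0 the growth factor equals 1 and the correction reduces to the plain linear fit; as RH rises, the reported value is scaled down before the linear coefficients are applied.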
B.1.3 Empirical Correction Method
A second method, an empirical correction, was tested as an alternative because
the specific aerosol chemical composition might be unavailable at some locations.
Separate equations were used for the NPM and PurpleAir sensors (Eq. B.4 and
Eq. B.5, respectively). Coefficients for the equations can be found in Table S4
of Malings et al. (2020) [142].
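\[ \text{Corrected PM}_{2.5} = \alpha_0 + \alpha_1 [\text{NPM PM}_{2.5}] + \alpha_2 T + \alpha_3 RH + \alpha_4 [\text{NPM PM}_{2.5}]^2 + \alpha_5 [\text{NPM PM}_{2.5}]\, T + \alpha_6 [\text{NPM PM}_{2.5}]\, RH + \alpha_7 T^2 + \alpha_8 T \cdot RH + \alpha_9 RH^2 \qquad \text{(B.4)} \]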





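\[ \text{Corrected PM}_{2.5} =
\begin{cases}
\beta_0 + \beta_1 [\text{PPA PM}_{2.5}] + \beta_2 T + \beta_3 RH + \beta_4 DP(T, RH) & \text{if } [\text{PPA PM}_{2.5}] > 20\ \mu g\, m^{-3} \\
\gamma_0 + \gamma_1 [\text{PPA PM}_{2.5}] + \gamma_2 T + \gamma_3 RH + \gamma_4 DP(T, RH) & \text{if } [\text{PPA PM}_{2.5}] \le 20\ \mu g\, m^{-3}
\end{cases}
\qquad \text{(B.5)} \]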
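The piecewise form of Eq. B.5 can be written as a single function that selects one of two coefficient sets by comparing the reported concentration with the 20 µg m−3 threshold. The sketch below is illustrative only: the function name is hypothetical, the DP(T, RH) term is passed in precomputed as `dp`, and the coefficient tuples in the example are made-up placeholders (the fitted values are in Table S4 of Malings et al. (2020)).

```python
def corrected_ppa_pm25(pm, temp, rh, dp, beta, gamma, threshold=20.0):
    """Eq. B.5: use the beta set above the threshold, the gamma set otherwise."""
    c0, c1, c2, c3, c4 = beta if pm > threshold else gamma
    return c0 + c1 * pm + c2 * temp + c3 * rh + c4 * dp


# Placeholder coefficients chosen only to make the two branches visible.
beta = (1.0, 0.5, 0.0, 0.0, 0.0)    # hypothetical, used when pm > 20
gamma = (0.0, 1.0, 0.0, 0.0, 0.0)   # hypothetical, used when pm <= 20

high = corrected_ppa_pm25(30.0, temp=20.0, rh=50.0, dp=9.0, beta=beta, gamma=gamma)
low = corrected_ppa_pm25(10.0, temp=20.0, rh=50.0, dp=9.0, beta=beta, gamma=gamma)
```

A reading of exactly 20 µg m−3 falls in the second branch, matching the ≤ condition in Eq. B.5.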
B.2 Distribution of daily pollutant concentrations
Figure B.2: Distribution of daily average PM2.5 concentrations for 47 sites
(only 47 of the 50 deployed sensors collected PM2.5 data) for the period
August 2016 – December 2017. Mean values are marked with an ‘X’
and median values are denoted with a solid line. PM2.5 concentrations
vary notably across the 47 sites, with 4 datapoints exceeding 80 µg m−3
(not shown here). Site 17 (blue plot, with high concentrations) is on
a roof in downtown Pittsburgh, 20 m away from the exhaust vent of a
restaurant that specialized in wood-fired pizzas, and was therefore
characterized by extremely high concentrations.
Figure B.3: Distribution of daily average NO2 concentrations for all sites for
the period August 2016 – December 2017. Mean values are marked
with an ‘X’ and median values are denoted with a solid line. Site 49
(blue, second-to-last boxplot) is near a railway track, and hence is
characterized by high concentrations. NO2 concentrations vary notably
across the 50 sites, with 4 datapoints exceeding 35 ppb (not shown here).
Figure B.4: Distribution of daily average CO concentrations for all sites for
the period August 2016 – December 2017. Mean values are marked
with an ‘X’ and median values are denoted with a solid line. Site 49
(blue, second-to-last boxplot) is near a railway track, and hence is
characterized by high concentrations. CO concentrations vary notably
across the 50 sites, with 4 datapoints exceeding 2000 ppb (not shown here).
B.3 Limit of Detection values
The Limit of Detection (LoD) was determined for each pollutant to establish the
minimum concentration that could be reliably measured by the sensors. Our approach
to determining the LoD involved calculating error fractions for each pollutant.
Essentially, we trained random forest models on 45 randomly selected sites and
tested them on the remaining 5 sites. This was iterated until every site had been
used for testing, or 20 times, whichever was larger.
[Figure B.5: three boxplot panels (PM2.5, NO2 and CO); y-axis: Error Fraction (%); x-axis: Percentile bin]
Figure B.5: Boxplots of error fractions for PM2.5, NO2 and CO, divided into
deciles of observed daily average concentrations, used to determine the
LoD. The black solid line represents the ideal error fraction (closer to
zero is better) and was used to identify the decile at which the error
fraction stabilizes. The final LoD was taken as the lower bound of the
decile bin in which the error fraction stabilizes.
The predicted concentration from each run is then plotted as an error fraction
(in percent) against the observed concentration (Figure B.5), where the error
fraction is defined as the ratio of the error (predicted concentration – observed
concentration) to the observed concentration.
The LoD was then determined as the observed concentration at which the error
fraction stabilizes (10% median error fraction). The LoDs for daily average PM2.5,
NO2 and CO concentrations were therefore selected as 7 µg m−3, 6.5 ppb and 200 ppb,
respectively (rounded off).
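The decile-based procedure described above can be sketched as follows. This is a minimal illustration, not the code used for the thesis: the function names are hypothetical, and "stabilizes" is interpreted here as "the lowest decile from which the median error fraction stays within the threshold in all higher deciles", which is an assumption about the exact rule.

```python
from statistics import median


def error_fraction_pct(predicted, observed):
    """Error fraction: (predicted - observed) / observed, in percent."""
    return 100.0 * (predicted - observed) / observed


def lod_from_deciles(observed, predicted, threshold_pct=10.0):
    """Lower bound of the first decile bin where the median error stabilizes."""
    pairs = sorted(zip(observed, predicted))       # sort by observed value
    n = len(pairs)
    bins = [pairs[i * n // 10:(i + 1) * n // 10] for i in range(10)]
    medians = [median(error_fraction_pct(p, o) for o, p in b) for b in bins]
    for i in range(10):
        if all(abs(m) <= threshold_pct for m in medians[i:]):
            return bins[i][0][0]                   # lowest observed value in bin
    return None                                    # never stabilizes


# Synthetic example: a constant +3 bias matters less as concentrations grow.
observed = list(range(1, 101))
predicted = [o + 3 for o in observed]
lod = lod_from_deciles(observed, predicted)
```

In this synthetic case the relative error shrinks with concentration, so the returned value marks the concentration above which the constant bias no longer dominates.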
B.4 Description of wavelet decomposition approach (Zimmerman et al., 2020)
(Paraphrased with permissions from Naomi Zimmerman)
Wavelet decomposition is a method used to separate local and regional
background signals, reflecting the impact of sporadic vs persistent local
enhancements. It is a signal processing method that separates time-series data
into a high-frequency component (level of detail, d1) and a low-frequency
component (level of approximation, a1). The low-frequency signal is further broken
down into a second level of detail (d2) and approximation (a2), and the process is
repeated until the desired level of detail is obtained. The original data are
reconstructed by adding the final level of approximation (an) to the sum of all
levels of detail (d1 + d2 + ... + dn). The level of decomposition, n, corresponds
to changes on time scales of 2^n times the sampling interval (Sabaliauskas et al.,
2014) [196].
Details of the wavelet decomposition approach used in the manuscript are
described previously (Zimmerman et al., 2019) but summarized here. A 3-level
decomposition was used to separate the short-lived events (3-level = changes above
baseline occur on the order of 2 h; 15 min × 2^3 = 2 h). The difference between the
baseline of the 3-level decomposition and that of a 5-level decomposition (changes
above baseline occur on the order of 8 h; 15 min × 2^5 = 8 h) was defined as the
contribution from longer-lived events. The difference between the 5-level
decomposition baseline and the regional background was defined as the persistent
enhancement. A second decomposition of 3 levels was conducted to identify and
separate the short-lived events, which is equivalent to separating a baseline (a3)
that changes on the order of 2 hours (15 min × 2^3 = 2 h). Persistent enhancements
were obtained by subtracting regional background concentrations from the fifth
level of approximation, a5.
Figure B.6: (From Zimmerman et al., 2020, duplicated with permission):
Wavelet decomposition of measured pollutant concentration (CO), from
short-lived events to regional background signals.
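The additive structure of the decomposition (signal = an + d1 + ... + dn) and the split into short-lived, longer-lived and persistent components can be illustrated with a small Python sketch. Two assumptions are made here that are not stated in this appendix: a Haar-style block average is used as the level-n approximation (the wavelet family actually used may differ), and the regional background is supplied as an input series.

```python
def block_means(signal, block):
    """Replace each run of `block` samples by its mean (Haar-style approximation)."""
    out = []
    for start in range(0, len(signal), block):
        chunk = signal[start:start + block]
        out.extend([sum(chunk) / len(chunk)] * len(chunk))
    return out


def decompose(signal, levels):
    """Return (a_n, [d_1, ..., d_n]) with a_n + sum of d_i equal to the signal."""
    approx = list(signal)
    details = []
    for level in range(1, levels + 1):
        smoother = block_means(signal, 2 ** level)
        details.append([a - s for a, s in zip(approx, smoother)])
        approx = smoother
    return approx, details


def split_enhancements(signal, regional_background):
    """Short-lived, longer-lived and persistent parts, as described above."""
    a3, _ = decompose(signal, 3)   # baseline changing on ~2 h for 15-min data
    a5, _ = decompose(signal, 5)   # baseline changing on ~8 h for 15-min data
    short_lived = [x - b for x, b in zip(signal, a3)]
    longer_lived = [b3 - b5 for b3, b5 in zip(a3, a5)]
    persistent = [b5 - bg for b5, bg in zip(a5, regional_background)]
    return short_lived, longer_lived, persistent


# Synthetic 15-min series (hypothetical values) and a flat regional background.
signal = [float((i * 7) % 11) for i in range(64)]
background = [1.0] * len(signal)
parts = split_enhancements(signal, background)
```

With 15-minute data, level 3 groups 2^3 = 8 samples (2 h) and level 5 groups 2^5 = 32 samples (8 h), which is where the time scales quoted in the text come from.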
B.5 Description of model covariates
Table B.2: Land use covariates used in LUR and LURF models
Category | Description | Source | Unit | Buffer Radius (m)
Static predictors
Population density | Number of inhabitants | Allegheny County | | 25, 50, 100, 200
Road length | Length of all roads | PASDA | m | 25, 50, 100
Housing density | Number of households | Allegheny County | | 25, 50, 100, 200
Vehicle density | Vehicle density on all roads | PennDOT | Veh m/day | 25, 50, 100
Bus fuel consumption | Bus fuel consumption | PennDOT | kg fuel/day | 25, 50, 100
Rail length | Rail length | PennDOT | m | 50, 100
Elevation | Elevation (height above mean sea level) | USGS | m |
Inverse distance to the road | Inverse distance to the road | PASDA | 1/m |
Time-varying predictors
Temperature | Daily average temperature | NWS | °F |
Wind | Daily average wind speed | NWS | km/h |
Precipitation | Daily total precipitation | NWS | mm |
EPA CO | EPA’s daily CO measurement | EPA | ppm |
EPA PM | EPA’s daily PM2.5 measurement | EPA | µg m−3 |
Allegheny County: Allegheny County GIS Group
PASDA: Pennsylvania Spatial Data Access
PennDOT: Pennsylvania Department of Transportation
USGS: USGS National Elevation Dataset
NWS: US National Weather Service
EPA: US Environmental Protection Agency
B.6 Model covariates of sites and LUR variable coefficients
Table B.3: Predictor values used in model building
Site Lat Lon PD(25) PD(50) PD(100) PD(200) RL(25) RL(50) RL(100) HD(25) HD(50) HD(100) HD(200) VD(25) VD(50) VD(100) BFC(25) BFC(50) BFC(100) RaL(50) RaL(100) Inv Ele
1 40.511 -79.869 0.8 3.3 12.2 53.3 2.3 9.3 41.4 0.3 1.1 4.1 18.0 0.5 0.5 0.5 0.0 0.0 0.0 0.0 0.0 0.0 329.8
2 40.494 -79.907 1.0 3.8 18.5 81.6 5.7 21.8 89.8 0.5 1.9 9.2 40.4 151.7 148.1 151.9 0.0 0.0 0.1 0.0 0.0 0.0 260.8
3 40.478 -79.970 1.9 7.5 25.8 122.7 11.0 41.0 153.5 1.3 5.0 17.1 78.1 194.7 192.9 178.8 0.1 0.4 1.6 8.7 33.3 0.0 226.7
4 40.478 -79.957 5.8 22.2 87.8 322.2 9.8 38.2 148.4 3.5 13.5 53.1 194.4 29.8 39.0 41.0 0.1 0.7 3.0 1.6 5.2 0.1 229.8
5 40.465 -79.959 1.5 12.5 62.7 463.6 5.3 23.7 76.1 0.4 6.8 35.9 284.7 2.0 1.2 42.6 0.0 0.7 5.1 0.0 0.0 0.1 255.9
6 40.468 -79.926 11.5 44.9 179.6 706.7 9.7 38.1 148.4 7.3 28.5 114.1 444.4 48.1 49.2 45.8 0.2 1.0 3.7 0.0 0.0 0.1 279.8
7 40.463 -79.916 4.1 16.1 67.1 275.2 7.4 30.1 117.2 2.5 9.9 41.1 166.2 13.8 14.1 13.7 0.1 0.3 1.1 0.0 0.0 0.1 304.6
8 40.459 -79.928 4.6 18.1 73.6 334.6 11.7 44.5 175.3 2.8 11.1 44.6 212.4 72.9 67.3 64.7 1.1 4.6 16.7 3.6 14.3 0.1 274.6
9 40.456 -79.890 4.7 18.4 73.7 307.3 10.5 40.8 159.2 3.4 13.3 53.0 214.1 47.5 46.7 45.6 0.2 0.9 3.5 0.0 0.1 0.1 278.9
10 40.457 -79.925 15.8 62.3 246.9 944.7 8.0 33.6 137.0 10.3 40.6 161.4 620.6 24.2 24.6 26.1 0.1 0.7 2.9 0.4 2.2 0.1 277.7
11 40.453 -79.908 1.3 4.9 19.0 81.0 4.6 4.7 4.5 0.8 2.9 11.0 46.2 70.8 63.3 60.2 0.0 0.0 0.0 0.0 0.0 0.0 335.2
12 40.451 -79.942 9.7 38.4 145.0 630.2 6.0 25.3 99.2 5.2 20.7 78.2 345.6 55.0 54.3 51.2 0.1 0.4 1.3 0.0 0.0 0.0 274.9
13 40.450 -79.909 7.4 29.1 114.5 457.0 8.0 33.3 133.7 3.6 14.1 55.7 227.4 64.9 66.9 63.5 0.2 0.7 2.8 0.0 0.0 0.1 297.6
14 40.449 -79.901 6.1 24.0 94.8 352.2 7.2 29.7 119.0 3.3 12.9 50.9 187.1 34.9 37.0 35.5 0.3 1.1 4.3 1.7 6.4 0.0 290.5
15 40.441 -79.997 0.0 0.0 2.6 25.2 19.3 51.5 184.7 0.0 0.0 6.6 64.0 464.2 268.3 216.7 8.5 19.6 42.0 0.0 0.0 0.1 229.6
16 40.442 -79.942 3.9 15.4 61.3 240.6 0.0 0.0 0.0 0.0 0.0 0.2 9.3 1.3 1.0 0.5 0.0 0.0 0.0 0.0 0.0 0.1 285.3
17 40.441 -79.995 0.0 0.0 0.0 1.2 4.3 44.4 175.0 0.0 0.0 0.0 2.0 1.7 178.7 186.5 0.0 5.0 29.5 0.0 0.0 0.1 230.9
18 40.444 -79.996 0.0 19.5 45.1 132.8 10.4 46.0 156.7 4.0 15.4 38.1 125.0 130.3 216.6 135.4 2.4 5.4 11.1 0.0 0.0 0.1 224.9
19 40.445 -79.976 0.7 9.8 43.7 257.5 9.3 61.2 236.3 1.0 4.6 26.6 187.2 58.4 110.0 67.0 0.0 1.9 4.8 0.0 0.0 0.1 305.0
20 40.447 -79.927 5.4 21.2 81.5 345.8 5.2 21.1 90.9 2.0 7.7 29.1 129.9 22.2 27.3 29.8 0.0 0.1 0.4 0.0 0.0 0.0 323.6
21 40.444 -79.895 4.4 17.3 72.1 286.4 7.4 30.9 128.5 2.6 10.3 42.7 167.3 11.8 11.3 11.1 0.1 0.3 1.0 0.0 0.0 0.0 323.5
22 40.438 -79.926 8.2 32.3 127.9 503.3 6.4 22.6 88.5 3.8 14.9 59.9 236.6 53.6 36.9 36.3 0.6 1.9 6.6 0.0 0.0 0.0 334.1
23 40.438 -79.963 6.7 26.6 115.1 482.6 8.7 35.9 135.3 2.9 11.6 51.7 200.2 125.6 130.0 119.1 0.7 3.1 11.4 0.0 0.0 0.0 278.0
24 40.433 -80.010 6.4 25.0 99.5 380.9 8.6 33.7 133.2 4.4 17.3 68.7 258.5 58.1 56.4 56.7 0.0 0.3 1.3 2.7 10.7 0.1 313.4
25 40.429 -79.986 0.0 10.6 91.1 288.2 20.1 63.3 194.3 1.8 8.8 68.8 222.1 3.7 93.1 55.5 0.0 0.0 0.0 0.0 0.0 0.1 229.6
26 40.435 -79.877 4.9 19.2 73.2 298.9 5.2 22.2 91.6 2.4 9.5 35.8 146.5 2.6 2.7 2.9 0.0 0.0 0.0 0.0 0.0 0.0 319.9
27 40.431 -79.893 6.1 23.7 88.8 329.7 7.9 31.0 124.9 2.9 11.1 41.4 155.1 1.6 2.9 13.4 0.0 0.0 0.2 0.0 0.0 0.1 272.9
28 40.435 -79.896 6.7 26.5 108.5 429.5 8.1 30.5 123.7 4.0 15.8 64.7 251.4 32.7 30.3 33.4 0.2 0.6 2.8 0.0 0.0 0.0 277.6
29 40.437 -79.863 2.9 11.4 44.0 178.1 7.1 26.6 103.9 1.8 6.9 26.1 105.4 207.4 192.6 177.1 0.1 0.2 0.7 0.0 0.0 0.0 367.1
30 40.427 -79.917 10.3 39.9 148.2 633.6 7.4 29.8 125.7 5.4 20.9 76.3 327.0 89.2 89.5 94.6 0.1 0.2 0.8 0.0 0.0 0.0 328.2
31 40.422 -79.933 10.7 41.9 167.4 661.0 8.5 34.3 139.2 5.4 21.0 84.0 332.5 10.3 11.8 16.4 0.2 0.8 3.1 0.0 0.0 0.1 332.2
32 40.425 -79.894 4.3 17.0 75.9 307.5 11.9 48.3 181.7 2.4 9.5 42.2 168.4 126.7 120.0 94.6 0.0 0.1 0.3 0.0 0.0 0.1 265.6
33 40.416 -79.880 4.3 16.9 65.7 278.9 6.0 24.5 99.9 2.4 9.4 36.0 149.9 30.5 30.9 29.3 0.4 1.3 6.3 3.9 15.2 0.0 295.1
34 40.418 -79.903 4.4 16.9 67.6 264.9 8.7 33.9 126.9 2.1 8.0 32.2 125.8 10.9 10.1 9.4 0.0 0.0 0.1 2.3 6.3 0.0 275.8
35 40.407 -79.901 6.3 24.3 98.6 380.1 8.9 36.0 147.4 3.3 12.8 52.0 206.3 3.1 2.8 9.7 0.0 0.0 0.6 0.0 0.0 0.1 264.2
36 40.398 -79.863 1.1 4.4 17.1 79.1 6.4 25.5 94.9 0.4 1.5 6.1 28.1 10.6 9.7 8.5 0.0 0.0 0.0 2.5 9.7 0.0 228.4
37 40.402 -79.860 4.5 17.5 69.0 274.2 8.3 34.2 136.7 2.3 9.1 36.5 145.4 8.6 9.2 9.2 0.1 0.2 0.8 0.6 2.6 0.0 288.4
38 40.402 -79.844 5.8 22.9 86.3 322.8 11.6 45.6 179.5 3.0 11.6 44.1 167.0 26.5 30.4 32.4 0.0 0.2 0.7 0.0 0.0 0.1 315.5
39 40.388 -79.919 3.2 12.6 50.6 189.4 3.5 14.0 55.2 1.5 5.9 24.0 89.4 0.7 0.7 0.7 0.0 0.0 0.0 0.0 0.0 0.0 338.9
40 40.381 -80.049 9.8 38.2 150.9 571.8 8.1 30.4 124.8 5.8 22.4 89.8 333.0 12.1 14.0 16.1 0.0 0.0 0.1 0.0 0.0 0.0 355.5
41 40.380 -79.871 0.0 0.0 0.2 1.5 2.4 10.4 44.0 0.0 0.0 0.1 0.7 16.1 20.4 21.5 0.0 0.0 0.2 0.0 0.0 0.0 332.1
42 40.359 -80.110 2.8 11.1 42.3 165.5 6.8 26.6 102.0 1.5 5.8 22.2 87.4 58.0 53.9 50.5 0.1 0.4 1.7 5.5 21.2 0.0 259.4
43 40.355 -79.977 1.9 7.3 29.3 124.3 3.3 14.3 62.8 0.8 3.1 12.4 52.8 14.7 14.8 16.8 0.0 0.1 0.3 0.0 0.0 0.0 365.7
44 40.344 -79.824 0.9 3.6 13.0 67.3 3.0 12.8 57.6 0.4 1.4 5.3 27.3 0.6 0.7 0.7 0.0 0.0 0.0 0.0 0.0 0.0 316.0
45 40.318 -79.901 0.4 1.4 5.6 23.6 3.6 14.0 49.3 0.2 0.7 2.7 11.7 10.7 11.2 8.4 0.0 0.0 0.0 0.0 0.0 0.0 268.0
46 40.311 -79.863 0.4 1.5 5.7 22.0 1.7 6.1 26.2 0.2 0.6 2.3 8.8 1.0 1.0 1.3 0.0 0.0 0.0 0.0 0.0 0.0 308.7
47 40.314 -79.875 0.1 0.3 1.1 4.9 1.7 6.7 24.9 0.0 0.2 0.6 2.8 6.2 7.3 7.1 0.0 0.0 0.0 2.2 8.3 0.0 274.5
48 40.308 -79.869 0.1 0.2 0.7 2.9 2.7 9.9 41.5 0.0 0.1 0.4 1.7 5.6 5.4 5.7 0.0 0.0 0.0 1.3 4.6 0.0 314.7
49 40.297 -79.886 0.5 1.9 10.8 46.1 3.1 11.0 43.0 0.3 1.2 6.1 24.5 0.6 0.6 0.6 0.0 0.0 0.0 4.6 16.8 0.0 244.3
50 40.291 -79.888 3.6 14.2 58.0 239.0 7.6 28.8 115.4 1.7 6.8 27.9 121.7 1.6 1.5 2.1 0.0 0.0 0.0 0.0 0.0 0.1 300.5
Lat: Latitude; Lon: Longitude; PD: Population Density; RL: Road Length; HD: Housing Density; VD: Vehicle Density; BFC: Bus Fuel Consumption; RaL: Rail Length; Inv: Inverse Distance to the Road; Ele: Elevation
B.7 Description of random forest models
A random forest model is a machine learning algorithm for solving regression or
classification problems [33]. The model is constructed as an ensemble of decision
trees. A decision tree is a series of distinct choices that assigns a probability to
each outcome scenario. It starts with a single main node and branches into sub-
sequent secondary nodes. The secondary nodes represent the possible outcomes
of the event. These secondary nodes then act as a primary node and branch into
further possibilities. The process is repeated until a terminal node is reached. The
user specifies the number of trees that make up the forest, and each tree uses a
bootstrapped random sample from the training data set. The origin node of the
decision tree is split into sub-nodes by considering a random subset of the possible
variables and splitting based on which of these variables are strong predictors of
the response. The number of variables to randomly select at each node is chosen
by the user. This is repeated until a terminal node is reached; the user can specify
the maximum number of sub-nodes or the minimum number of data points in the
node as the indication to terminate the tree. In this project, we used a 10-fold
cross-validation method and grew 1000 trees with a minimum terminal node size
of 1. The 10-fold cross-validation method, an instance of the K-fold technique,
involves randomly splitting the data into K (= 10 here) folds. The model is trained
on K − 1 folds (training set) and validated against the remaining fold (testing
set). The process is iterated K times, until each fold has served as the testing
dataset; essentially, until every datapoint from the original dataset has appeared
in the testing set. Random forests are often implemented in prediction analysis
due to their increased accuracy and ability to capture complex interactions as
compared to linear regression [34].
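The K-fold splitting logic described above can be sketched without any modelling library. The generator below is a hypothetical helper (`k_fold_splits` is not a name from the thesis) that only produces the train/test index sets; fitting the random forest on each training set is left out.

```python
import random


def k_fold_splits(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) for each of the K folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)           # random assignment to folds
    folds = [indices[i::k] for i in range(k)]      # K folds of near-equal size
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


splits = list(k_fold_splits(n_samples=103, k=10, seed=1))
```

Across the K iterations every index appears in exactly one test set, which is the property the text refers to when it says every datapoint gets an opportunity to appear in the testing set.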
B.8 Observed vs Predicted concentrations
[Figure B.7 panels: (a) time series of observed and predicted (LUR and LURF; standard and deconvolved signals) PM2.5 (μg/m3), NO2 (ppb) and CO (ppb); (b) scatterplots of predicted vs observed concentrations for each pollutant]
Figure B.7: (a) Predicted LUR (orange and green) and LURF (blue and pink)
concentrations (for standard and deconvolved signals, respectively)
between October 2016 and June 2017 for the site at the Allegheny County
Health Department (ACHD). (b) Scatterplot of observed versus predicted
concentrations, with the dotted black line representing the 1:1 line.
B.9 LUR Coefficients
Table B.4: LUR Coefficients for PM2.5 . U and S represent coefficients of Unstandardized and Standardized data,
respectively.
Standard Coefficients; Persistent Coefficients; Long-lived Coefficients; Short-lived Coefficients
U S p-Value U S p-Value U S p-Value U S p-Value
Intercept 3.93 13.63 0.51 3.60 0.19 0.76 3.30 1.00
EPA PM 0.93 3.80 <2e-16 0.26 1.05 <2e-16 0.04 0.18 <2e-16
EPA CO 0.36 0.10 2.4E-09
Inverse Distance to the Road
Housing Density (200m) -0.01 -1.31 <2e-16 -0.004 -0.45 7.3E-09 -0.002 -0.26 <2e-16 -0.01 -1.12 <2e-16
Bus Fuel Consumption (25m)
Population Density (25m) 0.19 0.62 2.8E-07 0.09 0.29 2.5E-04 0.03 0.11 1.1E-07 0.15 0.52 <2e-16
Elevation 0.01 0.35 <2e-16
Bus Fuel Consumption (100m) 0.27 2.34 <2e-16 0.14 1.20 <2e-16 0.04 0.34 <2e-16
Road Density (100m) 0.01 0.62 <2e-16
R2 0.72 0.54 0.13 0.22
MAE 1.83 0.77 0.42 0.20
Average 13.63 3.60 0.76 1.00
CvMAE 0.13 0.21 0.56 0.20
Table B.5: LUR Coefficients for NO2 . U and S represent coefficients of Unstandardized and Standardized data, respec-
tively.

Intercept 13.00 10.53 4.37 4.67 0.39 0.63 0.85 0.64
EPA PM 0.16 0.70 <2e-16 0.08 0.36 <2e-16 0.00 0.02 6.6E-03
EPA CO 2.24 0.64 <2e-16 -0.45 -0.13 1.1E-03 0.39 0.11 <2e-16
Bus Fuel Consumption (100m) 0.03 0.25 1.3E-08 0.03 0.30 1.4E-15
Population Density (200m) 0.00 0.08 <2e-16
Temperature 0.06 1.14 <2e-16 0.003 0.05 <2e-16
Precipitation
Wind -0.09 -0.29 5.5E-10 -0.08 -0.26 2.7E-10 -0.02 -0.07 <2e-16 -0.02 -0.08 8.6E-09
Housing Density (200m) -0.001 -0.08 1.3E-04
Road Density (100m) 0.00 0.01 8.2E-02
Population Density (50m) -0.09 -0.15 1.4E-04
Elevation 0.02 0.62 <2e-16 0.01 0.47 <2e-16
R2 0.64 0.20 0.16 0.08
MAE 2.02 1.14 0.28 0.25
Average 10.53 4.67 0.63 0.64
CvMAE 0.19 0.24 0.44 0.39
Table B.6: LUR Coefficients for CO. U and S represent coefficients of Unstandardized and Standardized data, respec-
tively.

Intercept 229.6 367.6 129.4 189.9 13.18 26.87 19.24 31.25
EPA PM 10.35 44.66 <2e-16 6.16 26.57 <2e-16 1.26 5.42 <2e-16
EPA CO 111.8 32.31 <2e-16 17.14 4.95 <2e-16 16.54 4.78 7E-08
Population Density (25m) 0.61 0.24 3E-06 3.42 12.58 2E-14
Housing Density (200m) -0.17 -21.63 <2e-16 -0.12 -14.70 1E-15
Temperature 2.06 31.42 <2e-16
Population Density (200m) -0.07 -15.52 4E-15
Bus Fuel Consumption (100m) 2.21 14.63 2.8E-09 1.10 7.30 4E-04
Road Density (100m) 0.08 5.04 1E-05
Elevation 0.37 14.46 1.0E-11 0.03 1.25 9.6E-03
R2 0.67 0.16 0.30 0.10
MAE 96.15 48.71 8.60 6.43
Average 367.59 189.90 26.87 31.25
CvMAE 0.26 0.26 0.32 0.21
B.10 Average performance metrics of the pollutants across all models
Table B.7: Average performance metrics of the pollutants across all models
for 20 iterations
R-squared (Average)
LUR LURF Hybrid
Standard Decomposed Standard Decomposed Standard Decomposed
PM2.5 0.59 0.73 0.78 0.76 0.74 0.75
NO2 0.40 0.43 0.59 0.56 0.56 0.54
CO 0.30 0.31 0.40 0.41 0.35 0.34
MAE (Average)
LUR LURF Hybrid
PM2.5 2.63 2.43 1.83 2.08 1.81 2.19
NO2 1.94 1.81 1.30 1.55 1.30 1.53
CO 108.76 109.11 100.46 98.97 95.37 96.74
CvMAE (Average)
LUR LURF Hybrid
PM2.5 0.23 0.21 0.16 0.19 0.17 0.19
NO2 0.22 0.21 0.17 0.17 0.17 0.17
CO 0.35 0.35 0.34 0.34 0.32 0.32
B.11 Comparison to other published studies
Table B.8: Comparison with other published studies.
Project | Location | Sensor | Prediction Method | Time Resolution | Average Concentration | No of Sites | No of Days | R2 [Mean (Min-Max)]
CO (ppb)
Brunelli et al. [37] | Italy | Regulatory Monitoring Network | ANN | Daily Max. | N.R. | 8 | 730 | 0.9 (0.86-0.92)
Hassanpour Matikolaei et al. [91] | Tehran, Iran | N.R. | Regression | Hourly | N.R. | 30 | 90 | 0.38
This project | Pittsburgh, USA | RAMP | Regression | Daily | 367.6 | 50 | 518 | 0.30 (0.05-0.53)
This project | Pittsburgh, USA | RAMP | LURF | Daily | 367.6 | 50 | 518 | 0.40 (0.14-0.75)
This project | Pittsburgh, USA | RAMP | Hybrid | Daily | 367.6 | 50 | 518 | 0.35 (0.05-0.65)
NO2 (ppb)
Araki et al. [14] | Japan | Regulatory Monitoring Network | LURF | Monthly | 15-20 | 81 | 1460 | 0.79
Araki et al. [14] | Japan | Regulatory Monitoring Network | Regression | Monthly | 15-20 | 81 | 1460 | 0.73
Brunelli et al. [37] | Italy | Regulatory Monitoring Network | ANN | Daily Max. | N.R. | 8 | 730 | 0.87 (0.81-0.96)
Yang et al. [261] | Pearl River Delta, China | Regulatory Monitoring Network | Regression | Annual | 18.22 | 69 | 365 | 0.56
Beelen et al. [25] | Europe | Ogawa passive samplers | Regression | Annual | 14.3 | 1434 | 42 | 0.7 (0.31-0.87)
Ma et al. [139] | Auckland, New Zealand | Passive Diffusion Tubes | Regression | Annual | 15.5 | 107 | 120-132 | 0.6
Crouse et al. [59] | Montreal, Canada | Ogawa passive samplers | Regression | Annual | 11.9 | 129 | 14 | 0.78
Jerrett et al. [110] | Toronto, Canada | Ogawa passive samplers | Regression | N.R. | 32.2 | 95 | 14 | 0.69
Rao et al. [191] | Portland, USA | Ogawa passive samplers | Regression | Annual | 11-13 | 174 | 14 | 0.75-0.80
Rao et al. [191] | Portland, USA | Ogawa passive samplers | LURF | Annual | 11-13 | 174 | 14 | 0.80-0.83
Henderson et al. [94] | Vancouver, Canada | Ogawa passive samplers | Regression | Annual | 16.2 | 116 | 14 | 0.56-0.60
Hassanpour Matikolaei et al. [91] | … | … | … | … | … | … | … | …
This project | … USA | … | … | … | … | … | … | … 0.57)
This project | … USA | … | … | … | … | … | … | … 0.82)
This project | … USA | … | … | … | … | … | … | … 0.80)
PM2.5 (µg m−3)
Brokamp et al. [34] | Cincinnati, USA | Harvard-type Impactors, 37-mm Teflon Membrane | LURF | N.R. | 17.6 | 24 | 1460 | 0.51
Brokamp et al. [34] | Cincinnati, USA | Harvard-type Impactors, 37-mm Teflon Membrane | Regression | N.R. | 17.6 | 24 | 1460 | 0.64
Eeftens et al. [71] | Europe | Harvard Impactors | Regression | Annual | 15.7 | 436 | 42 | 0.58 (0.21-0.79)
Li et al. [130] | Pittsburgh, USA | Filter sampling, 47-mm Teflon Filter | Regression | N.R. | 14.6 | 36 | 92 | 0.34
Yang et al. [261] | Pearl River Delta, China | Regulatory Monitoring Network | Regression | Annual | 41.7 | 69 | 365 | 0.87
Yao et al. [262] | North China | Satellite Monitoring | ANN | Daily | N.R. | N.R. | 90 | 0.66
Yao et al. [262] | North China | Satellite Monitoring | Regression | Daily | N.R. | N.R. | 90 | 0.34
Brokamp et al. [35] | Cincinnati, USA | Satellite Monitoring | Random Forest | Daily | 12.61 (median) | 52 | +5000 | 0.88
Henderson et al. [94] | Vancouver, Canada | Harvard Impactors | Regression | Annual | 4.08 | 116 | 14 | 0.52
Hassanpour Matikolaei et al. [91] | … | … | … | … | … | … | … | …
Stafoggia et al. [213] | Italy | N.R. | LURF | Daily | 17.1 | 198-229 | 1095 | 0.80
This project | … USA | … | … | … | … | … | … | … 0.73)
This project | … USA | … | … | … | … | … | … | … 0.92)
This project | … USA | … | … | … | … | … | … | … 0.90)
N.R.: Not Reported; ANN: Artificial Neural Networks
Note 1: ESCAPE (Europe) Project’s R2 and average concentration values have been averaged over the 36/20 sites for NO2/PM2.5, respectively.
Maximum and minimum concentrations were noted for individual sites.
Note 2: R2 values for “this project” consist of average values for standard signals. The minimum-maximum values have been noted after
removing outliers.
Note 3: Daily average concentration for Araki et al. (2018) is an estimate obtained via figure 1 in the paper.
B.12 Spatial Mapping
Using the LURF standard and decomposed prediction models, spatial maps at 50 m
resolution were plotted for Allegheny County (total grids = 772,805). The steps
followed were:
1. Grids with ≥ 50% of spatial variables outside the training model limits (both upper
and lower limits from the 50 training sites) were excluded from the map. For instance,
if the standard signal model used 6 variables for model training, grids with < 4 variables
within the training model limits were filtered out.
2. Daily PM2.5 was predicted at each of the remaining grids (total remaining grids
= 387,938) for 2017. These predictions were then averaged to determine annual
concentrations.
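The two steps above can be sketched as follows. This is a minimal illustration rather than the code used in this work: the variable names, the limits, and the example grids are hypothetical, and a variable is assumed to be "within limits" when it lies between the minimum and maximum observed at the training sites.

```python
from statistics import mean


def within_limits(value, lower, upper):
    """True when a grid's variable lies inside the range seen at the training sites."""
    return lower <= value <= upper


def keep_grid(grid_values, training_limits, max_outside_fraction=0.5):
    """Step 1: keep a grid only if fewer than 50% of its variables are outside the limits."""
    outside = sum(
        not within_limits(grid_values[name], lower, upper)
        for name, (lower, upper) in training_limits.items()
    )
    return outside / len(training_limits) < max_outside_fraction


def annual_average(daily_predictions):
    """Step 2: average the daily predictions at one grid into an annual concentration."""
    return mean(daily_predictions)


# Hypothetical six-variable model with every training range set to [0, 1].
limits = {f"var{i}": (0.0, 1.0) for i in range(6)}
grid_kept = {f"var{i}": 0.5 for i in range(6)}
grid_kept.update({"var0": 2.0, "var1": -1.0})   # 2 of 6 outside: 4 within, kept
grid_excluded = dict(grid_kept, var2=5.0)        # 3 of 6 outside: 3 within, excluded
```

With six variables, a grid is retained only when at least four of them fall within the training limits, which matches the example given in step 1.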
[Figure B.8 panels: Standard LURF Signal; Decomposed LURF Signal. Color scale: Average PM2.5 (μg/m3), 10 to 14. Scale bar: 0 to 32 km. Boundaries shown: Pittsburgh City, Allegheny County.]
Figure B.8: Annual average PM2.5 predicted using the LURF standard
and decomposed signal prediction models. Pittsburgh city and
Allegheny county boundaries are marked by blue and black lines,
respectively.
Although both standard and decomposed signal models had similar performance when
tested against an external dataset, the decomposed signal model was found to give more
reliable results. This is depicted in Figure B.8: the standard signal model tended to predict
lower concentrations on roadways compared to the surrounding area. Therefore, when
considering machine learning approaches, it is recommended to opt for wavelet
signal decomposition for more accurate and reliable results.
B.13 Comparison of model performance with EPA monitoring stations
Table B.9: MAE calculations for data predicted at EPA monitoring stations
Site              EPA’s monitoring data*   LURF Decomposed Model Predictions

South Fayette     8.7                      11.19
Lawrenceville     9.2                      12.16
Clairton          9.8                      11.67
Harrison          10                       11.57
North Braddock    11                       12.58
Liberty           13.4                     11.52
MAE                                        2.06
*Data collected via Allegheny County Health Department report.
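As a worked check of the MAE in Table B.9, the short sketch below recomputes it from the six tabulated rows. The function name is illustrative; the values are copied from the table in row order.

```python
# (EPA monitoring value, LURF decomposed model prediction) for each row of Table B.9
rows = [
    (8.7, 11.19),
    (9.2, 12.16),
    (9.8, 11.67),
    (10.0, 11.57),
    (11.0, 12.58),
    (13.4, 11.52),
]


def mean_absolute_error(pairs):
    """Mean of |observed - predicted| over all sites."""
    return sum(abs(observed - predicted) for observed, predicted in pairs) / len(pairs)


mae = mean_absolute_error(rows)
print(round(mae, 2))  # 2.06
```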
Appendix C
Chapter 5
C.1 Limit of Detection (LOD)

For this work, LOD is the smallest concentration that can be reliably measured by
the low-cost sensors and was identified as 5 µg m−3 . We opted against removing the
data points below LOD as the data would then be skewed high. Similarly, replacing
the data points below LOD with 0 would result in data skewed low. Therefore, we
opted for replacing the data measured below LOD with 3.53 µg m−3 (LOD/√2),
as recommended by [99] and [222]. Figure C.1 shows a boxplot for the percent of
data at each of the 47 sites replaced due to being below LOD.
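The substitution described above can be sketched as below. This is an illustrative sketch only: the function names and the sample readings are hypothetical, and values exactly equal to the LOD are assumed to be kept as measured.

```python
import math

LOD = 5.0  # limit of detection in µg m−3, as stated in the text


def replace_below_lod(values, lod=LOD):
    """Replace every reading below the LOD with LOD / sqrt(2); keep the rest unchanged."""
    substitute = lod / math.sqrt(2)
    return [substitute if value < lod else value for value in values]


def percent_below_lod(values, lod=LOD):
    """Percent of readings at one site that fall below the LOD (the quantity in Figure C.1)."""
    return 100.0 * sum(value < lod for value in values) / len(values)


readings = [1.0, 5.0, 7.2, 3.9]  # hypothetical readings at one site
cleaned = replace_below_lod(readings)
```

Substituting LOD/√2 keeps the low readings in the dataset without pushing them to zero, which is the compromise between the two skewed alternatives described in the paragraph.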
Figure C.1: Boxplot for percent of data at each site (n = 47 sites) that were
below Limit of Detection (LOD; 5 µg m−3 ) and replaced with LOD/√2
(=3.53 µg m−3 ).
C.2 Selection of prediction models and variables

In our previous study, Jain et al. [105], we compared regression, random forests, and
hybrid regression-random forest models for both standard and time-decomposed
signals. In time-decomposed analysis, we created separate land use regression /
random forest models for persistent enhancements (>8 h duration), longer-lived
events (2-8 h duration) and short-lived events (< 2 h duration) and layered these
on top of the regional background to predict overall PM2.5 concentrations. For
modeling of PM2.5 , random forest models outperformed regression models (R2 increased
by 0.17-0.19; normalized mean absolute error decreased by 4-7%). Hybrid
regression-random forest models were created to address the inability of random
forests to extrapolate. However, we found that hybrid models did not improve
the overall model performance or robustness of random forest models. Therefore,
random forest models were chosen for predicting concentrations at every grid cell.
For random forest models, the standard signal model had comparable performance
to decomposed signals. However, decomposing the signal improved the relative
importance of static (spatial) variables in the model for short-lived events. This
implies that local spikes in concentrations can be predicted using land use charac-
teristics in the nearby locations. Since we are primarily looking into spatial effects
for this work, we opted for the decomposed signal model. For a detailed discus-
sion of the relative model performance of land use regression vs. land use random
forest, as well as the impact of time decomposition, see Jain et al. [105].
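The layering of time-decomposed components described in this section can be illustrated with a short sketch. It assumes the components are combined by simple addition on top of the regional background and that events lasting exactly 2 h or 8 h fall in the 2-8 h class; the function names and the example values are hypothetical, and the decomposition itself is described in Jain et al. [105].

```python
def classify_enhancement(duration_hours):
    """Assign an enhancement to one of the three duration classes used in the text."""
    if duration_hours > 8:
        return "persistent enhancement"   # > 8 h duration
    if duration_hours >= 2:
        return "long-lived event"         # 2-8 h duration
    return "short-lived event"            # < 2 h duration


def layered_pm25(background, persistent, long_lived, short_lived):
    """Overall PM2.5 as the regional background plus the three modeled components."""
    return background + persistent + long_lived + short_lived


# Hypothetical component predictions (µg m−3) for one grid cell on one day.
total = layered_pm25(background=6.0, persistent=2.0, long_lived=1.0, short_lived=0.5)
```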
Table C.1: Top 5 most important variables for modeling of the random forest
decomposed signal. Values in brackets signify buffer distance.
Signal                   Spatial variables                                 Temporal variables

Persistent enhancement   Population density (100m), road length (100m),    EPA’s daily PM2.5 measurements
                         housing density (100m), rail length (100m)
Long-lived events        Elevation, vehicle density (50m), bus fuel        EPA’s daily CO measurements
                         consumption (50m), inverse distance to the road
Short-lived events       Elevation, road length (50m), vehicle density     Wind
                         (100m), bus fuel consumption (100m)
C.3 Land Use Types by Allegheny County GIS Group

The land use types available from the Allegheny County GIS Group [12] are
listed below.
1. Water
2. Transportation
3. Forest
4. Grasslands
5. Agriculture
6. Low-density residential
7. Medium-density residential
8. High-density residential
9. Identified malls
10. Commercial
11. Light industrial
12. Heavy industrial
13. Strip mine
14. Non-vegetative
C.4 Spatial distribution at 100m buffers for residential and commercial areas
Figure C.2: Spatial distribution of (a) Residential and (b) Commercial areas
at 100m buffers in Pittsburgh city, obtained via Allegheny County GIS
Group [12]. Grid cells with no value (colorless) imply residential or
commercial density is zero for that grid.
C.5 Average day-wise concentrations in 2017
Figure C.3: Boxplot for mean day-wise concentrations in 2017 for data col-
lected at EPA’s Lawrenceville site in Pittsburgh city.
This work segregates the daily concentrations into weekday (Monday through
Friday) and weekend (Saturday and Sunday) concentrations. Figure C.3 shows a
boxplot for EPA’s daily PM2.5 concentrations across different days of the week for
2017. Amongst weekdays, Mondays through Thursdays have similar medians (approximately 8
µg m−3 ). Median Friday concentrations are higher (approximately 9 µg m−3 ) and can be
attributed to a higher rate of evening activities by individuals or citywide events (e.g.,
dining out, game nights, festivals). Nonetheless, we opted to group weekday concentrations
because we expect similar behavioural movement (e.g., amount of time
spent) between residential and commercial areas during the weekdays. Analogously,
even though Saturday and Sunday concentrations are also dissimilar, we
have grouped them together into weekends.
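The weekday/weekend grouping can be sketched as follows. This is a minimal illustration with hypothetical concentrations, not the processing code used in this work; it relies only on Python's standard `datetime` and `statistics` modules.

```python
from datetime import date
from statistics import median


def day_group(day):
    """Monday through Friday -> 'weekday'; Saturday and Sunday -> 'weekend'."""
    return "weekend" if day.weekday() >= 5 else "weekday"


def median_by_group(daily_concentrations):
    """Median concentration per group from a {date: concentration} mapping."""
    groups = {"weekday": [], "weekend": []}
    for day, concentration in daily_concentrations.items():
        groups[day_group(day)].append(concentration)
    return {name: median(values) for name, values in groups.items() if values}


# Hypothetical daily PM2.5 values (µg m−3) for three days in January 2017.
sample = {
    date(2017, 1, 2): 8.0,   # Monday
    date(2017, 1, 6): 9.0,   # Friday
    date(2017, 1, 7): 7.0,   # Saturday
}
medians = median_by_group(sample)
```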
C.6 Uncertainties in measurement and models

We identified two sources of uncertainty associated with this work: uncertainties in
measurements taken by RAMPs and uncertainties in prediction modeling.
• Uncertainties in measurements: In Pittsburgh city, we collocated a RAMP
at EPA’s Lawrenceville site. We compared calibrated daily PM2.5 measure-
ments from RAMP with EPA’s data and found mean error to be 8%.
\[
\text{Mean Error (ME)} = \frac{\sum_{i=1}^{n} \left( \text{EPA measurements}_i - \text{RAMP measurements}_i \right)}{n} \qquad \text{(C.1)}
\]
• Uncertainties in modeling: Random Forest models are created as an ensemble
of decision trees, and the predicted value is estimated as the mean across
these trees. To account for uncertainties associated
with this stage of modeling, in addition to mean values, we noted the 2.5th
and 97.5th percentile predicted values. Using these, we found mean error to
be about 28% and 72% respectively for the 2.5th and 97.5th percentile predicted
values.
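The two quantities in the list above can be illustrated with a short sketch. It is a sketch under stated assumptions, not the analysis code: `mean_error` follows Equation C.1 as written (in concentration units, not percent), the percentile uses linear interpolation between ordered values, and the per-tree predictions are hypothetical numbers standing in for the output of the individual decision trees.

```python
from statistics import mean


def mean_error(epa, ramp):
    """Equation C.1: average of (EPA - RAMP) over n paired daily measurements."""
    return sum(e - r for e, r in zip(epa, ramp)) / len(epa)


def percentile(values, q):
    """Percentile q (0 to 100) of a list, with linear interpolation between ranks."""
    ordered = sorted(values)
    position = (len(ordered) - 1) * q / 100
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (position - lower)


def ensemble_summary(tree_predictions):
    """Mean plus 2.5th and 97.5th percentile of the per-tree predictions for one grid-day."""
    return {
        "p2.5": percentile(tree_predictions, 2.5),
        "mean": mean(tree_predictions),
        "p97.5": percentile(tree_predictions, 97.5),
    }


# Hypothetical predictions (µg m−3) from five trees for a single grid cell and day.
summary = ensemble_summary([8.0, 9.0, 10.0, 11.0, 12.0])
```

Reporting the spread of the per-tree predictions alongside the mean gives a lower and upper bound for each prediction, which is how the percentile model runs are used in the comparison that follows.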
We assume that uncertainties in modeling had a higher overall effect due to
higher mean error. Therefore, we opted to evaluate model uncertainties: we used
the predicted values from the 5th and 95th percentile random forest models to
compare the concentrations in residential and commercial areas (Figure C.4).
Figure C.4: Boxplots for daily predicted PM2.5 for residential (plots with
solid colors) and commercial (plots with diagonal lines) land-use type
separately. Blue and orange boxplots refer to annual average predicted
PM2.5 concentrations when random forest models noted 5th and 95th
percentile concentrations, instead of mean concentrations (pink box-
plots).
Average and median concentrations at commercial areas were higher than concentrations
at residential areas across the 5th percentile, mean and 95th percentile model runs
(Figure C.4). For the pink boxes (mean; Figure C.4), both median and average
concentrations at commercial areas were 6% higher than at residential areas. Both
average and median concentrations at commercial areas were 1% higher than at
residential areas for the 5th percentile model runs. Similarly, average and median
concentrations at commercial grids were 22% and 18% higher for the 95th percentile
model runs.
Although the absolute values differ between models (5th percentile, mean and 95th
percentile; Figure C.4), the average concentrations at commercial areas were always
higher than at residential areas. As such, addressing the uncertainties
strengthens our argument that the average PM2.5 concentrations that the population
was exposed to were underreported when movement of the population is not
considered.
C.7 Static and Dynamic Models
Figure C.5: Boxplots for static and dynamic models when α = 12 and β = 18
hours in Equations 4 and 5 of the main manuscript.
Appendix D
Chapter 6
D.1 Population
Figure D.1: Population across each Dissemination Block [214].
D.2 Error Fraction %
[Figure D.2 panels: PM2.5; NO2; O3. y-axis: Error Fraction % (% Error); x-axis: Percentile bin.]
Percentile bins shown on the x-axis:
PM2.5: <10th [0, 1.46); 10-20th [1.46, 2.04); 20-30th [2.04, 2.68); 30-40th [2.68, 3.32); 40-50th [3.32, 4.07); 50-60th [4.07, 4.92); 60-70th [4.92, 5.97); 70-80th [5.97, 7.53); 80-90th [7.53, 9.75); >90th [9.75, 31.58)
NO2: <10th [0, 7.07); 10-20th [7.07, 10.30); 20-30th [10.30, 12.29); 30-40th [12.29, 14.13); 40-50th [14.13, 16.25); 50-60th [16.25, 18.84); 60-70th [18.84, 22.00); 70-80th [22.00, 25.99); 80-90th [25.99, 30.33); >90th [30.33, 74.03)
O3: <10th [0, 0.34); 10-20th [0.34, 0.64); 20-30th [0.64, 2.30); 30-40th [2.30, 6.09); 40-50th [6.09, 11.26); 50-60th [11.26, 14.87); 60-70th [14.97, 18.25); 70-80th [18.25, 21.87); 80-90th [21.87, 27.47); >90th [27.47, 41.47)
Figure D.2: Boxplots for % error of each pollutant across each decile bin.

D.3 Typical Diurnal Pattern
Figure D.3: Diurnal pattern observed over the study period (for RAMP
1011).
D.4 RAMPs vs MV comparison
Table D.1: Individual RAMPs with the corresponding number of days when
the error-informed dataset exceeded average regional MV concentrations.
The total number of days when the data was assessed is listed in the
last column.
RAMP PM2.5 NO2 O3 Total

1001 51 95 0 100
1002 53 111 28 119
1004 30 99 28 119
1005 18 82 55 96
1008 41 88 19 119
1009 35 110 17 119
1011 22 42 42 108
1012 5 92 29 96
1039 17 39 34 59
1040 13 96 33 119 (PM=28)
1041 25 85 62 93
D.5 Descriptive Statistics
Table D.2: Descriptive Statistics for each pollutant (across all RAMPs) and
calculated CHI (across all DBs).
                     Mean    Median   Std. Dev.   COV    Min.a   Max.b

PM2.5 (µg/m3 )       4.57    4.01     3.39        0.07   1.37    16.18
NO2 (ppb)            21.72   21.51    5.68        0.26   10.32   36.32
O3 (ppb)             28.14   27.86    8.89        0.32   9.82    51.86
Additive CHI         2.99    2.99     0.16        0.05   2.60    3.43
Multiplicative CHI   0.99    0.99     0.16        0.16   0.61    1.44
Std. Dev.: Standard Deviation; COV: Coefficient of Variation (Std. Dev./Mean)
a: 1st percentile; b: 99th percentile
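As a worked example of the coefficient of variation, the sketch below takes COV as the standard deviation divided by the mean, which reproduces the tabulated NO2 and O3 values in Table D.2. The function name is illustrative.

```python
def coefficient_of_variation(mean_value, std_dev):
    """COV as the ratio of standard deviation to mean (dimensionless)."""
    return std_dev / mean_value


cov_no2 = coefficient_of_variation(mean_value=21.72, std_dev=5.68)
cov_o3 = coefficient_of_variation(mean_value=28.14, std_dev=8.89)
print(round(cov_no2, 2), round(cov_o3, 2))  # 0.26 0.32
```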
D.6 Aggregate Multiplicative and Additive CHI
Figure D.4: Aggregated Multiplicative CHI across the study period.
Figure D.5: Aggregated Additive CHI across the study period.
D.7 Wind Speed and Direction
Figure D.6: Windrose for the direction and speed of wind during the study
period.
