LICENSE
CC BY 4.0
20-01-2023 / 24-01-2023
CITATION
Arevalo-Ramirez, Tito; Alfaro, Anali; Saavedra, José M.; Recabarren, Matías; Ponce-Donoso, Mauricio;
Delpiano, José (2023): Exploring the Potential of Vegetation Indices for Urban Tree Segmentation in Street
View Images. TechRxiv. Preprint. https://doi.org/10.36227/techrxiv.21933291.v1
DOI
10.36227/techrxiv.21933291.v1
JOURNAL OF LATEX CLASS FILES, VOL. 13, NO. 9, SEPTEMBER 2014 1
Abstract—[...] multispectral vegetation indices, one can boost the average intersection over union by between 5.07 % and 13.7 %. Specifically, we suggest the SegFormer segmentation network pre-trained with the CityScapes dataset and the Red Edge Modified Simple Ratio index for improving urban tree segmentation. However, if no multispectral data is available, the DeepLabV3 network pre-trained with the ADE20k dataset is suggested because it achieves the best RGB outcome, an average IoU value of 0.671.

Index Terms—Urban trees, Semantic Segmentation, Image to Image translation, Multispectral Features, Neural Networks

I. INTRODUCTION

Urban forests have become essential in developing sustainable cities in the last decades. Air and water quality control, microclimate regulation, or carbon sequestration, [...] identification of trees, which is commonly obscured by tree occlusion or crown combinations; see Fig. 1. In particular, [...]

[...] previous works have proposed different artificial-intelligence-based strategies; see [6], [7], [8], [9], [10], [11], [12] and the references therein. It is essential to highlight that the aforementioned works use street view images in the electromagnetic spectrum's Red, Green, and Blue (RGB) bands.

Most previous works focus on identifying tree species by first detecting trees using object detection algorithms (e.g., You Only Look Once, YOLO, a deep learning approach). Information about tree species and the number of individuals [...] obstacles or other trees.

The research presented by [10] detects trees using the YOLO network with MobileNet as the backbone and computes their height by pixel coordinates of the tree bounding box.

Tito Arevalo-Ramirez, Anali Alfaro, José Saavedra, Matías Recabarren, and José Delpiano are with the Faculty of Engineering and Applied Sciences, Universidad de los Andes, Santiago, Chile. Mauricio Ponce-Donoso is with the Sociedad Chilena de Arboricultura, Santiago, Chile.
Note that this work handles tree occlusion and multiple tree bounding boxes in the dataset generation stage. Images that compose the dataset are taken at a distance that frames the tree of interest. Further, a person holding a reference object (an object with known dimensions) stands close to that tree. In particular, the reference object has two purposes: the first serves as a scale for determining tree height, and the second alleviates the identification of the tree of interest. The tree under analysis is selected as the tree closest to the reference object. Even though the proposed data acquisition protocol could alleviate the recognition of the tree of interest, it might share some of the field survey disadvantages. For instance, data collection could be prone to long periods and require trained staff, because two people are needed: one to take photographs of the trees and the other to carry the reference object and stand next to the tree of interest.

Fig. 2: Field sampling scheme. The tree of interest is represented in cyan. The binary mask is generated using an image manipulation program [21].
Based on the research works mentioned above, a reliable segmentation of the tree to be assessed must be available for an appropriate tree characterization. However, large urban datasets which include tree instances have the disadvantage that they do not discriminate individual occurrences of trees [14], [15]. Moreover, as mentioned by [8], it is challenging to generate a customized semantic segmentation dataset comparable to object detection datasets [16] in terms of training samples, because semantic segmentation requires pixel-wise labeled instances, which is an arduous task. Therefore, in the present work, we explore a strategy to improve the segmentation of urban trees by two state-of-the-art deep neural networks (i.e., DeepLabV3 and SegFormer [17], [18]). Both networks are fine-tuned using a limited set of training, validation, and testing samples.

Since the custom dataset we have created has only one hundred RGB samples, we attempt to incorporate information into the segmentation models by vegetation indices (VIs) computed from the electromagnetic spectrum's visible, red-edge, and near-infrared bands. We hypothesize that vegetation indices could add valuable knowledge about urban trees, which might not be decoded in the training stage of the segmentation models due to lacking training samples. The belief that vegetation indices could boost urban tree segmentation is supported by a previous work, which shows that multispectral indices improve the classification of ground points in forested regions [19]. In this context, we evaluate the behavior and segmentation performance of the DeepLabV3 and SegFormer models when they are directly fed with a four-channel image (i.e., RGB channels plus a vegetation index channel). A total of 19 vegetation indices were computed, ten based on visible bands and nine on red-edge and near-infrared bands. The latter indices are calculated using multispectral data synthesized from the RGB channels using a conditional adversarial network. By integrating vegetation indices during the training process, we have found that one could improve the segmentation performance by about 13.7 % when using an appropriate deep neural network and vegetation index, even if it is computed using multispectral synthesized data.

II. DATA ACQUISITION

The RGB images for this study were obtained by volunteers using smartphones under the project Arbocensus in the Santiago metropolitan region, Chile. In particular, urban trees were mapped from the Las Condes and La Reina communes. The Santiago region's climate can be described as Continental Mediterranean, with average coldest and warmest temperatures of about 9.4 and 21.9 Celsius, respectively. Further, per year, precipitation is around 275 millimeters [20]. Regarding urban tree photographs, volunteers (citizen scientists) could capture about three thousand images of about one hundred species. However, we randomly chose one hundred images to evaluate whether vegetation indices could improve urban tree segmentation when using a small number of tree samples. A detailed process of urban tree sampling is shown in Fig. 2.

After the random selection of the one hundred urban tree images, the binary masks were created using an image manipulation program [21]. The primary motivation to employ the software mentioned above was to create fine binary masks like the one shown in Fig. 2. The detailed segmentation masks were obtained manually by pixel-wise labeling of the tree of interest using free selection and color selection tools.

III. METHODOLOGY

Once we obtain the RGB images and their corresponding binary masks, the VIs are computed. Visible-based VIs are computed using the red, green, and blue channels. Regarding multispectral indices, they are generated using synthesized red-edge and near-infrared channels. A supervised image-to-image translation model, derived by training and validating a conditional adversarial network, generates the synthesized red-edge and near-infrared channels. Next, we compute 19 sets of four-channel images (i.e., red, green, blue, and VI) to evaluate if any proposed VI improves the performance of the segmentation models. The segmentation outcomes using RGB images are used as reference. Furthermore, it is essential to highlight that the deep neural networks are pre-trained with two urban datasets (ADE20k and CityScapes), which include urban tree instances. Figure 3 shows the general scheme of the proposed methodology.
Fig. 3: General scheme of the proposed methodology for evaluating visible and synthesized red-edge and near-infrared vegetation indices. Descriptions of the visible and multispectral indices are shown in Table I. The light gray block describes the computation of vegetation indices based on visible and predicted multispectral bands. The gray block pictures the fine-tuning of the pre-trained segmentation networks. Note that I2I refers to the image-to-image translation model.
A. Visible, Red-Edge, and Near-Infrared Processing

[...] predict the near-infrared channel from aerial crop images [23]. Nevertheless, in our case, the source and target images are street view RGB and red-edge/near-infrared images, respectively. Note that the RGB-to-multispectral mapping is learned by two different models, one for red-edge and the other for near-infrared. Each model is trained, validated, and tested from scratch using a hyperspectral city dataset [24]. Specifically, we use 1054 and 100 images to train and validate each model. Both stages are performed using one thousand epochs with default parameters, UNet as the generator, and PatchGAN as the discriminator. Then, the red-edge and near-infrared mapping models are evaluated using the structural similarity index measure (SSIM) with 176 images not seen in the training or validation stages.

After the red-edge and near-infrared mapping models are determined, we use them to compute multispectral channels using our dataset's RGB images. Next, the VIs are calculated as operations and transformations between visible or multispectral channels. We have chosen ten indices based on RGB channels and nine on multispectral channels, which have been reported in previous works for evaluating the status of vegetation [25].

2) Visible Vegetation Indices: These indices are computed using only the red, green, and blue channels of the electromagnetic spectrum. Table I shows the description of each VI.

3) Multispectral Vegetation Indices: Multispectral indices use bands in the visible, red-edge, and near-infrared regions to estimate the status of vegetation. Table I explains each index.

In this sense, nineteen sets of one hundred images, which [...]

The pixel-wise identification is performed by two state-of-the-art semantic segmentation deep networks implemented in a PyTorch open-source toolbox, MMSegmentation [27].

1) DeepLabV3: is a deep convolutional neural network that exploits the potential of atrous convolution for improving its performance in semantic image segmentation tasks [17]. This segmentation model achieves a mean Intersection over Union (mIoU) of 42.42 % and 79.09 % on the ADE20k and Cityscapes datasets, respectively. Table II shows the computation of intersection over union.

2) SegFormer: is a semantic segmentation deep network that unifies transformers with lightweight multilayer perceptron decoders, which avoids complex decoders [18]. When using the ADE20k and Cityscapes datasets, SegFormer yields a mIoU of about 37.85 % and 76.54 %, correspondingly.

Note that the segmentation networks are pre-trained with three-channel images (i.e., red, green, and blue). A detailed description of the pre-training stages and models can be found in [27].

3) Training and Validation: Using our dataset, we took advantage of the transfer learning procedure for training, validating, and fine-tuning the segmentation models. Specifically, we use 74, 16, and 10 images for training, validating, and testing the models. Note that default parameters are used for the stages mentioned above. Nevertheless, it is important to note that when using four-channel images, the networks' input is set to four. Further, 15 thousand iterations with a batch size of one are employed for fine-tuning the segmentation models.
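Setting a pre-trained network's input to four channels only requires a fourth input slice in the first convolution's weights. The paper does not detail how this slice is initialized; a common heuristic, shown here purely as our illustration, is to copy the mean of the pre-trained RGB filters:

```python
import numpy as np

def expand_first_conv(weight_rgb):
    """Extend a pre-trained first-conv weight of shape (C_out, 3, kH, kW) to
    accept a fourth input channel (e.g., a VI map), initializing the new
    slice with the mean of the RGB filters."""
    c_out, c_in, kh, kw = weight_rgb.shape
    assert c_in == 3, "expected an RGB first convolution"
    extra = weight_rgb.mean(axis=1, keepdims=True)      # (C_out, 1, kH, kW)
    return np.concatenate([weight_rgb, extra], axis=1)  # (C_out, 4, kH, kW)

w3 = np.random.default_rng(0).normal(size=(64, 3, 7, 7))
w4 = expand_first_conv(w3)
```

The remaining layers keep their pre-trained weights, so transfer learning proceeds as in the three-channel case.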
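The intersection-over-union metric referenced above (Table II) can be sketched for binary tree masks as follows; this is an illustrative implementation, not the authors' code:

```python
import numpy as np

def iou(pred, gt):
    """Intersection over Union for binary masks (1 = tree, 0 = background)."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        return 1.0  # both masks empty: treat as perfect agreement
    return np.logical_and(pred, gt).sum() / union

pred = np.array([[1, 1], [0, 0]])
gt   = np.array([[1, 0], [1, 0]])
print(iou(pred, gt))  # intersection = 1 pixel, union = 3 pixels -> 1/3
```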
TABLE I: Visible and multispectral VIs, where ρ_α is the reflectance in the α band. The red, green, blue, red-edge, and near-infrared channels are represented by R, G, B, RE, and NIR, respectively. References for each vegetation index equation can be found in [19], [25], [26].

Visible
  Color Index of Vegetation Extraction (cive):      18.78745 + 0.44ρ_R − 0.88ρ_G + 0.385ρ_B
  Excess Green Index (exg):                         2ρ_G − ρ_R − ρ_B
  Excess Red Index (exr):                           1.4ρ_R − ρ_G
  Excess Green Minus Red Index (exgr):              exg − exr
  Green Leaf Index (gli):                           (2ρ_G − ρ_R − ρ_B) / (2ρ_G + ρ_R + ρ_B)
  Modified Green Red Vegetation Index (mgrvi):      (ρ_G² − ρ_R²) / (ρ_G² + ρ_R²)
  Modified Photochemical Reflectance Index (mpri):  (ρ_G − ρ_R) / (ρ_G + ρ_R)
  Normalized Difference Index (ndi):                128((ρ_G − ρ_R) / (ρ_G + ρ_R) + 1)
  Red Green Blue Vegetation Index (rgbvi):          (ρ_G² − ρ_R ρ_B) / (ρ_G² + ρ_R ρ_B)
  Triangular Greenness Index (tgi):                 0.5((ρ_R − ρ_B) − (ρ_R − ρ_G)) − ((ρ_R − ρ_G) − (ρ_R − ρ_B))

Multispectral
  Chlorophyll Absorption Reflectance Index (cari):           (ρ_RE − ρ_R) − 0.2(ρ_RE − ρ_G)
  Enhanced Vegetation Index (evi):                           2.5(ρ_NIR − ρ_R) / (ρ_NIR + 6ρ_R − 7.5ρ_B + 1)
  Green Normalized Difference Vegetation Index (gndvi):      (ρ_NIR − ρ_G) / (ρ_NIR + ρ_G)
  Modified CARI (mcari):                                     ((ρ_RE − ρ_R) − 0.2(ρ_RE − ρ_G))(ρ_RE / ρ_R)
  Modified Soil Adjusted Vegetation Index (msavi):           (2ρ_NIR + 1 − √((2ρ_NIR + 1)² − 8(ρ_NIR − ρ_R))) / 2
  Normalized Difference Vegetation Index (ndvi):             (ρ_NIR − ρ_R) / (ρ_NIR + ρ_R)
  Optimization of Soil Adjusted Vegetation Index (osavi):    1.16(ρ_NIR − ρ_R) / (ρ_NIR + ρ_R + 0.16)
  Red Edge Modified Simple Ratio (remsr):                    (ρ_NIR / ρ_RE − 1) / (√(ρ_NIR / ρ_RE) + 1)
  Red Edge Normalized Difference Vegetation Index (rendvi):  (ρ_NIR − ρ_RE) / (ρ_NIR + ρ_RE)
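A few of the Table I equations, written out for scalar or numpy reflectance values (our sketch; the function names and the small epsilon guard against division by zero are ours):

```python
import numpy as np

# Visible indices from Table I.
def exg(r, g, b):             # Excess Green Index
    return 2 * g - r - b

def mpri(r, g):               # Modified Photochemical Reflectance Index
    return (g - r) / (g + r + 1e-12)

# Multispectral indices from Table I (use synthesized RE/NIR channels).
def ndvi(nir, r):             # Normalized Difference Vegetation Index
    return (nir - r) / (nir + r + 1e-12)

def remsr(nir, re):           # Red Edge Modified Simple Ratio
    ratio = nir / (re + 1e-12)
    return (ratio - 1) / (np.sqrt(ratio) + 1)

print(exg(0.2, 0.5, 0.1))     # 2*0.5 - 0.2 - 0.1 ≈ 0.7
print(ndvi(0.8, 0.2))         # (0.8 - 0.2) / (0.8 + 0.2) ≈ 0.6
```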
IV. RESULTS

For training, validating, and testing the image-to-image translation model, we extracted the reflectance at 718 nm (red-edge) and 840 nm (near-infrared) from the hyperspectral city dataset. These bands are close to the ones used by the MicaSense RedEdge multispectral camera. The performance of the conditional adversarial network is described in Table III. The worst prediction outcomes are shown in Fig. 4.

Since the average SSIM is over 0.8 for the synthesized red-edge and near-infrared channels, and no previous works evaluate this information in urban tree segmentation, we used the corresponding image-to-image models for predicting the multispectral channels using our dataset. It should be highlighted that poorly illuminated environments yield the worst outcomes, specifically in dark images with no light sources. Nevertheless, this reconstruction behavior might not occur in our dataset because all images were captured on sunny days. Once the multispectral channels are determined, vegetation [...]

Fig. 4: Worst cases for qualitative results of the conditional adversarial network for red-edge and near-infrared channels. Both of them are associated with conditions of extremely low illumination.

TABLE III: Performance of the conditional adversarial network (SSIM) for the synthesized channels.

  Metric    Red-Edge (avg, std)    Near-Infrared (avg, std)
  SSIM      0.86, 0.14             0.81, 0.17

Figure 5 shows qualitative outcomes of the best and worst segmentation instances when using the fine-tuning set that helps to enhance segmentation performance.
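The SSIM values above are typically computed with a windowed implementation (e.g., scikit-image's structural_similarity). The formula itself, evaluated over global image statistics for illustration only (our sketch, not the evaluation code used in the paper):

```python
import numpy as np

def ssim_global(x, y, data_range=1.0):
    """Structural similarity from global statistics; real evaluations slide
    a local window over the image, this only illustrates the formula."""
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))

img = np.random.default_rng(0).random((32, 32))
print(ssim_global(img, img))  # identical images -> 1.0 (up to rounding)
```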
TABLE IV: Segmentation performance of the deep neural networks using the IoU metric. Descriptions of the visible and multispectral VIs are in Table I. Marked values (*) are the ones that achieve the best average IoU metric.

Model: DeepLabV3, pretraining: ADE20k
  Set         S1    S2    S3    S4    S5    S6    S7    S8    S9    S10   avg
  RGB         0.859 0.922 0.718 0.632 0.486 0.643 0.453 0.736 0.526 0.739 0.671
  RGB&exg     0.830 0.894 0.776 0.693 0.490 0.576 0.470 0.741 0.579 0.723 0.677
  RGB&gndvi   0.900 0.904 0.697 0.668 0.486 0.611 0.524 0.719 0.561 0.751 0.682*
  RGB&ndvi    0.864 0.926 0.729 0.675 0.481 0.630 0.519 0.706 0.540 0.724 0.679

Model: DeepLabV3, pretraining: CityScapes
  RGB         0.843 0.921 0.646 0.584 0.475 0.623 0.483 0.687 0.575 0.765 0.660
  RGB&exg     0.863 0.919 0.716 0.685 0.479 0.561 0.533 0.659 0.612 0.656 0.668
  RGB&mpri    0.913 0.921 0.766 0.662 0.504 0.614 0.573 0.725 0.587 0.692 0.696*
  RGB&tgi     0.898 0.913 0.678 0.595 0.466 0.552 0.568 0.678 0.556 0.787 0.669
  RGB&evi     0.862 0.924 0.767 0.720 0.530 0.606 0.537 0.722 0.541 0.746 0.696*
  RGB&mcari   0.883 0.900 0.756 0.625 0.532 0.625 0.521 0.699 0.520 0.777 0.684
  RGB&msavi   0.874 0.861 0.735 0.651 0.502 0.673 0.531 0.652 0.588 0.760 0.683
  RGB&osavi   0.871 0.918 0.730 0.736 0.508 0.626 0.533 0.662 0.530 0.787 0.690
  RGB&rendvi  0.859 0.909 0.717 0.728 0.510 0.617 0.536 0.754 0.511 0.721 0.686

Model: SegFormer, pretraining: ADE20k
  RGB         0.759 0.925 0.645 0.613 0.534 0.706 0.431 0.642 0.642 0.767 0.667
  RGB&rgbvi   0.823 0.911 0.710 0.656 0.496 0.686 0.428 0.707 0.684 0.712 0.681
  RGB&cari    0.848 0.922 0.720 0.609 0.454 0.669 0.594 0.781 0.674 0.771 0.704*
  RGB&mcari   0.862 0.913 0.709 0.621 0.515 0.644 0.478 0.741 0.640 0.676 0.680

Model: SegFormer, pretraining: CityScapes
  RGB         0.627 0.925 0.506 0.695 0.502 0.722 0.440 0.687 0.370 0.728 0.620
  RGB&tgi     0.874 0.913 0.596 0.764 0.574 0.679 0.518 0.632 0.172 0.675 0.640
  RGB&cari    0.852 0.902 0.563 0.684 0.513 0.649 0.552 0.781 0.349 0.492 0.634
  RGB&evi     0.854 0.919 0.498 0.674 0.485 0.700 0.438 0.742 0.580 0.720 0.661
  RGB&mcari   0.858 0.916 0.559 0.560 0.498 0.627 0.559 0.624 0.639 0.552 0.639
  RGB&msavi   0.858 0.905 0.722 0.606 0.454 0.700 0.428 0.739 0.658 0.765 0.683
  RGB&ndvi    0.893 0.872 0.717 0.519 0.492 0.699 0.615 0.688 0.510 0.767 0.677
  RGB&osavi   0.863 0.916 0.663 0.548 0.423 0.608 0.521 0.682 0.532 0.790 0.655
  RGB&remsr   0.881 0.913 0.787 0.687 0.436 0.642 0.485 0.737 0.675 0.808 0.705*
  RGB&rendvi  0.867 0.910 0.597 0.700 0.460 0.694 0.428 0.532 0.564 0.776 0.653
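The relative improvements quoted in Section V follow directly from the best RGB&VI and RGB-only averages in Table IV; a quick arithmetic check:

```python
# Relative IoU gain of the best RGB&VI set over the RGB-only baseline,
# per model and pretraining dataset (averages taken from Table IV).
def gain(best, rgb):
    return round(100 * (best - rgb) / rgb, 2)

print(gain(0.682, 0.671))  # DeepLabV3 / ADE20k     -> 1.64 %
print(gain(0.696, 0.660))  # DeepLabV3 / CityScapes -> 5.45 %
print(gain(0.704, 0.667))  # SegFormer / ADE20k     -> 5.55 %
print(gain(0.705, 0.620))  # SegFormer / CityScapes -> 13.71 % (~13.7 %)
```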
V. DISCUSSIONS

The image-to-image translation models can retrieve reliable red-edge and near-infrared channels, with an SSIM greater than 0.8. Nevertheless, one should be aware that in environments with inadequate illumination, this model fails to provide a fair representation of multispectral channels; see Fig. 4. Despite a previous work [23] achieving SSIM values above 0.9, there are significant differences in the environments mapped. First, [23] captured aerial photographs from crop fields, which have two classes, vegetation and terrain. This might alleviate the prediction task of near-infrared channels. However, in our case, street view images of urban trees and objects such as buildings could change the illumination conditions by introducing unexpected shadows or sparkles that might affect the spectral reflectance recorded by the hyperspectral sensor. On the other side, it is essential to highlight that [23] performs a radiometric calibration using a reflectance panel to obtain absolute spectral information. In our case, the hyperspectral city dataset does not acknowledge whether the spectral reflectance values are absolute or relative. Nevertheless, we considered this dataset and the outcomes suitable for urban tree segmentation purposes. We could not further assess the synthesized red-edge and near-infrared channels because our dataset lacks multispectral data. Moreover, the assessment of multispectral reconstruction is out of the focus of the present work.

The image-to-image translation model's reconstruction performance might influence the segmentation networks' performance. Although the red-edge and near-infrared channels achieve an SSIM over 0.8, the synthesized multispectral data could not be as informative as the measured one. In this context, [...] computed using genuine multispectral channels because they could add knowledge not retrieved by synthesized data. Note that this work is the first research that attempts to improve urban tree segmentation performance using multispectral information. Therefore, the strategy presented could be used as a baseline for future works that seek to improve pixel-wise identification of urban trees.

As expected, the knowledge provided by vegetation indices helps to improve the segmentation performance of the DeepLabV3 and SegFormer networks. However, not all 19 sets (RGB&VI) boost the segmentation behavior. Moreover, the enhancement depends on the pretraining dataset. For instance, when DeepLabV3 and SegFormer are pre-trained with the CityScapes dataset, more sets (RGB&VI) yield higher IoU values than solely RGB images.

The segmentation outcomes presented in Table IV show that vegetation indices could improve the segmentation of urban trees. Specifically, when using DeepLabV3, one could expect a boost in the IoU between 1.64 % and 5.45 % for the ADE20k and CityScapes datasets, respectively. For SegFormer, the enhancement is about 5.55 % and 13.7 % for ADE20k and CityScapes, correspondingly. Note that SegFormer achieves the most considerable IoU difference compared to RGB images when it uses RGB&remsr information and has been trained with the CityScapes dataset; see Table IV.

Regarding visible and multispectral VIs, the latter could be the ones that add more information for boosting segmentation networks. Specifically, the EVI and REMSR vegetation indices are the ones that allow us to achieve the best segmentation [...]
Fig. 5: (a) DeepLabV3: pre-trained on CityScapes and fine-tuned using RGB&evi. The p and r for the best outcome are 0.949 and 0.973, and for the worst outcome are 0.539 and 0.969. (b) SegFormer: pre-trained on CityScapes and fine-tuned using RGB&remsr. The p and r for the best outcome are 0.931 and 0.979, and for the worst outcome are 0.437 and 0.995.

[...] DeepLabV3. However, since the difference for their best average [...] Specifically, from the testing samples evaluated, the ones that output the lowest IoU values are occluded trees and trees with combined crowns. Instances of these cases are shown in Fig. 1. In this sense, future works should focus on fusing street view, aerial images, and tree georeferences to tackle the [...]

[...] bands. We infer that because, from the 19 VIs, just three indices yield an average IoU greater than the one obtained with RGB images when using the ADE20k dataset. This behavior might be due to all VIs being determined from a single source of information, RGB images. The network's ability to decode [...]

[...] ADE20k). These results might suggest that deep neural networks can decode knowledge of vegetation indices in the training and validation stages. In particular, the ADE20k dataset might have a fair number of vegetation and tree instances that alleviates the transfer learning procedure for segmenting the tree of interest with a small quantity of urban tree samples.

Based on those mentioned above, the selection of pretraining samples plays a crucial role in alleviating further steps in the pixel-wise identification of urban trees. In particular, for RGB images, we suggest using the DeepLabV3 network pre-trained with the ADE20k dataset for future works related to urban tree segmentation because it retrieves the best IoU values for RGB images. If multispectral information is available, the SegFormer model pre-trained with the CityScapes dataset should be used due to its performance.

On the other hand, the differences between DeepLabV3 and SegFormer are shown in Fig. 5. In particular, for the worst case, the SegFormer network shows a greater region of false positives than DeepLabV3. Further, for the best and [...]

VI. CONCLUSIONS

[...] indices shows that the knowledge derived from these indices can improve pixel-wise identification of urban trees. Note that the multispectral indices are computed using red-edge and near-infrared channels estimated by an image-to-image translation model. The structural similarity index for the red-edge and near-infrared channels was 0.86 and 0.81, respectively. Regarding segmentation outcomes, one could improve the IoU score from 0.620 to 0.705 (13.7 %) by using the SegFormer segmentation network pre-trained with the CityScapes dataset and RGB images combined with the Red Edge Modified Simple Ratio index. However, more experiments with measured multispectral information are suggested, given the segmentation improvements achieved with synthesized red-edge and near-infrared channels. Moreover, if just RGB images are available, we advise employing DeepLabV3 pre-trained with the ADE20k dataset as the base network for further fine-tuning with a custom urban trees dataset. Specifically, this configuration achieves the best IoU value (i.e., 0.671) when fine-tuned with our RGB custom tree dataset.
VII. ACKNOWLEDGEMENTS

This work is supported by the Agencia Nacional de Investigación y Desarrollo (ANID) under grant Fondecyt 11220510, FONDEF ID21I10360, and Tree Fund 21-JD-01. JD thankfully acknowledges funding from the Advanced Center of Electrical and Electronic Engineering, AC3E (ANID/FB0008).

REFERENCES

[1] Gillner, S., Vogt, J., Tharang, A., Dettmann, S., and Roloff, A., 2015. "Role of street trees in mitigating effects of heat and drought at highly sealed urban sites". Landscape and Urban Planning, 143, pp. 33–42.
[2] Ponce-Donoso, M., Vallejos-Barra, O., Ingram, B., and Daniluk-Mosquera, G., 2020. "Urban trees and environmental variables relationships in a city of central Chile". Arboriculture & Urban Forestry, 46(2), pp. 84–95.
[3] McPherson, G., Simpson, J. R., Peper, P. J., Maco, S. E., and Xiao, Q., 2005. "Municipal forest benefits and costs in five US cities". Journal of Forestry, 103(8), pp. 411–416.
[4] Schwarz, K., Fragkias, M., Boone, C. G., Zhou, W., McHale, M., Grove, J. M., O'Neil-Dunne, J., McFadden, J. P., Buckley, G. L., Childers, D., et al., 2015. "Trees grow on money: urban tree canopy cover and environmental justice". PLoS ONE, 10(4), p. e0122051.
[5] Mullaney, J., Lucke, T., and Trueman, S. J., 2015. "A review of benefits and challenges in growing street trees in paved urban environments". Landscape and Urban Planning, 134, pp. 157–166.
[6] Beery, S., Wu, G., Edwards, T., Pavetic, F., Majewski, B., Mukherjee, S., Chan, S., Morgan, J., Rathod, V., and Huang, J., 2022. "The Auto Arborist dataset: A large-scale benchmark for multiview urban forest monitoring under domain shift". In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 21294–21307.
[7] Branson, S., Wegner, J. D., Hall, D., Lang, N., Schindler, K., and Perona, P., 2018. "From Google Maps to a fine-grained catalog of street trees". ISPRS Journal of Photogrammetry and Remote Sensing, 135, pp. 13–30.
[8] Choi, K., Lim, W., Chang, B., Jeong, J., Kim, I., Park, C.-R., and Ko, D. W., 2022. "An automatic approach for tree species detection and profile estimation of urban street trees using deep learning and Google street view images". ISPRS Journal of Photogrammetry and Remote Sensing, 190, pp. 165–180.
[9] Jodas, D. S., Brazolin, S., Yojo, T., De Lima, R. A., Velasco, G. D. N., Machado, A. R., and Papa, J. P., 2021. "A deep learning-based approach for tree trunk segmentation". In 2021 34th SIBGRAPI Conference on Graphics, Patterns and Images (SIBGRAPI), IEEE, pp. 370–377.
[10] Jodas, D. S., Yojo, T., Brazolin, S., Velasco, G. D. N., and Papa, J. P., 2022. "Detection of trees on street-view images using a convolutional neural network". International Journal of Neural Systems, 32(01), p. 2150042.
[11] Lumnitz, S., Devisscher, T., Mayaud, J. R., Radic, V., Coops, N. C., and Griess, V. C., 2021. "Mapping trees along urban street networks with deep learning and street-level imagery". ISPRS Journal of Photogrammetry and Remote Sensing, 175, pp. 144–157.
[12] Wang, Y., Yan, X., Bao, H., Chen, Y., Gong, L., Wei, M., and Li, J. "Detecting occluded and dense trees in urban terrestrial views with a high-quality tree detection dataset". IEEE Transactions on Geoscience and Remote Sensing, 60, pp. 1–12.
[13] Berland, A., and Lange, D. A., 2017. "Google street view shows promise for virtual street tree surveys". Urban Forestry & Urban Greening, 21, pp. 11–15.
[14] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., and Schiele, B., 2016. "The Cityscapes dataset for semantic urban scene understanding". In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[15] Zhou, B., Zhao, H., Puig, X., Fidler, S., Barriuso, A., and Torralba, A., 2017. "Scene parsing through ADE20k dataset". In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 633–641.
[16] Everingham, M., Van Gool, L., Williams, C. K. I., Winn, J., and Zisserman, A. The PASCAL Visual Object Classes Challenge 2012 (VOC2012) Results. http://www.pascal-network.org/challenges/VOC/voc2012/workshop/index.html.
[17] Chen, L.-C., Papandreou, G., Schroff, F., and Adam, H., 2017. "Rethinking atrous convolution for semantic image segmentation". arXiv preprint arXiv:1706.05587.
[18] Xie, E., Wang, W., Yu, Z., Anandkumar, A., Alvarez, J. M., and Luo, P., 2021. "SegFormer: Simple and efficient design for semantic segmentation with transformers". arXiv preprint arXiv:2105.15203.
[19] Arevalo-Ramirez, T., Guevara, J., Rivera, R. G., Villacrés, J., Menéndez, O., Fuentes, A., and Cheein, F. A., 2021. "Assessment of multispectral vegetation features for digital terrain modeling in forested regions". IEEE Transactions on Geoscience and Remote Sensing, 60, pp. 1–9.
[20] Climates to Travel, 2022. Climate - Santiago (Chile). https://www.climatestotravel.com/climate/chile/santiago. [Online; accessed 20-December-2022].
[21] The GIMP Development Team. GIMP.
[22] Isola, P., Zhu, J.-Y., Zhou, T., and Efros, A. A., 2017. "Image-to-image translation with conditional adversarial networks". CVPR.
[23] Aslahishahri, M., Stanley, K. G., Duddu, H., Shirtliffe, S., Vail, S., Bett, K., Pozniak, C., and Stavness, I., 2021. "From RGB to NIR: Predicting of near infrared reflectance from visible spectrum aerial images of crops". In Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 1312–1322.
[24] Huang, Y., Ren, T., Shen, Q., Fu, Y., and You, S., 2021. HSICityV2: Urban Scene Understanding via Hyperspectral Images, July.
[25] Jaya, N., Harmayani, K., Widhiawati, I., Atmaja, G., and Jaya, I., 2022. "Spatial analysis of vegetation density classification in determining environmental impacts using UAV imagery". ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences, 3, pp. 417–422.
[26] Fu, H., Wang, C., Cui, G., She, W., and Zhao, L., 2021. "Ramie yield estimation based on UAV RGB images". Sensors, 21(2), p. 669.
[27] Contributors, M., 2020. MMSegmentation: OpenMMLab semantic segmentation toolbox and benchmark. https://github.com/open-mmlab/mmsegmentation.
[28] Wallace, L., Lucieer, A., and Watson, C. S., 2014. "Evaluating tree detection and segmentation routines on very high resolution UAV LiDAR data". IEEE Transactions on Geoscience and Remote Sensing, 52(12), pp. 7619–7628.
[29] Harikumar, A., Bovolo, F., and Bruzzone, L., 2018. "A local projection-based approach to individual tree detection and 3-D crown delineation in multistoried coniferous forests using high-density airborne LiDAR data". IEEE Transactions on Geoscience and Remote Sensing, 57(2), pp. 1168–1182.