
Exploring the Potential of Vegetation Indices for Urban Tree Segmentation in Street View Images

This paper was downloaded from TechRxiv (https://www.techrxiv.org).

LICENSE
CC BY 4.0

SUBMISSION DATE / POSTED DATE
20-01-2023 / 24-01-2023

CITATION
Arevalo-Ramirez, Tito; Alfaro, Anali; Saavedra, José M.; Recabarren, Matías; Ponce-Donoso, Mauricio; Delpiano, José (2023): Exploring the Potential of Vegetation Indices for Urban Tree Segmentation in Street View Images. TechRxiv. Preprint. https://doi.org/10.36227/techrxiv.21933291.v1

DOI
10.36227/techrxiv.21933291.v1

Exploring the Potential of Vegetation Indices for Urban Tree Segmentation in Street View Images

Tito Arevalo-Ramirez, Anali Alfaro, José M. Saavedra, Matías Recabarren, Mauricio Ponce-Donoso, José Delpiano

Tito Arevalo-Ramirez, Anali Alfaro, José M. Saavedra, Matías Recabarren, and José Delpiano are with the Faculty of Engineering and Applied Sciences, Universidad de los Andes, Santiago, Chile. Mauricio Ponce-Donoso is with the Sociedad Chilena de Arboricultura, Santiago, Chile.

Abstract—Urban forests play a crucial role in the development of cities because of the urban ecosystem services they provide. Previous works have alleviated urban forest monitoring by discriminating tree species and performing tree inventories using street view images and convolutional neural networks. However, the characterization of trees from street-view images remains a challenging task. Determining tree structural parameters has been limited because of inaccurate tree segmentation caused by combined, occluded, or leaf-off trees. Therefore, the current work evaluates the potential of vegetation indices derived from red, green, blue, and synthesized near-infrared and red-edge spectral bands for urban tree segmentation. In particular, we attempt to show whether or not vegetation indices add relevant information to deep neural segmentation networks when few fine-tuning training samples are available. A conditional adversarial network generates red-edge and near-infrared images in urban environments, reaching average structural similarity indices of 0.86 and 0.81, respectively. Furthermore, we note that by using appropriate multispectral vegetation indices, one can boost the average intersection over union by 5.07 % to 13.7 %. Specifically, we suggest the SegFormer segmentation network pre-trained with the CityScapes dataset and the Red Edge Modified Simple Ratio index for improving urban tree segmentation. However, if no multispectral data is available, the DeepLabV3 network pre-trained with the ADE20k dataset is suggested because it achieves the best RGB outcome, with an average IoU of 0.671.

Index Terms—Urban trees, Semantic Segmentation, Image-to-Image Translation, Multispectral Features, Neural Networks

Fig. 1: Tree segmentation challenges. The tree of interest is enclosed by cyan lines; unwanted objects are shown by magenta regions.

I. INTRODUCTION

Urban forests have become essential in developing sustainable cities in the last decades. Air and water quality control, microclimate regulation, or carbon sequestration, among other ecosystem services, are usually determined by the characterization of urban trees [1], [2]. For instance, crown projection area (the area under the tree dripline) and leaf area are used to calculate rainfall interception for estimating stormwater-runoff reduction benefits [3]. Moreover, urban trees can be associated with social inequality and can facilitate its quantification and mitigation [4]. Further, urban trees can be used to retrieve economic compensation metrics for communities and local governments [5]. In this sense, proper management and characterization of urban trees enable environmental and socioeconomic benefits.

In situ measurements of tree dendrometric parameters constitute the traditional and most accurate approach for determining tree characteristics. Nevertheless, they require long periods (e.g., 18 months) and considerable funding [6]. In order to alleviate the characterization and evaluation of urban trees, previous works have proposed different artificial intelligence-based strategies; see [6], [7], [8], [9], [10], [11], [12] and the references therein. It is essential to highlight that the aforementioned works use street view images in the Red, Green, and Blue (RGB) bands of the electromagnetic spectrum.

Most previous works focus on identifying tree species by first detecting trees using object detection algorithms (e.g., the You Only Look Once, YOLO, deep learning approach). Information about tree species and the number of individuals significantly alleviates tasks related to urban tree inventories [6], [13]. Nevertheless, tree characterization remains a challenging task because it usually depends on pixel-wise identification of trees, which is commonly obscured by tree occlusion or crown combinations; see Fig. 1. In particular, based on the literature review, we found that only a few works tackle the segmentation of urban trees and the computation of their dendrometric parameters [8], [10].

The work in [8] aims to automatically determine tree profile information such as the tree height, diameter at breast height (DBH), and tree species. Specifically, the proposed methodology starts by detecting trees within a bounding box using YOLOv3. Next, pixel-wise segmentation is performed for the detailed identification of trees. The Panoptic-DeepLab framework performs the semantic segmentation using the Cityscapes dataset for the training and validation stages [14]. Although the authors achieve acceptable outcomes, the main drawback is that the model is evaluated using ideal tree images, that is, images in which trees are detectable without being occluded or overlapped by obstacles or other trees.

The research presented in [10] detects trees using the YOLO network with MobileNet as the backbone and computes their height from the pixel coordinates of the tree bounding box.
Note that this work handles tree occlusion and multiple tree bounding boxes in the dataset generation stage. Images that compose the dataset are taken at a distance that frames the tree of interest. Further, a person holding a reference object (an object with known dimensions) stands close to that tree. The reference object has two purposes: it serves as a scale for determining tree height, and it alleviates the identification of the tree of interest, since the tree under analysis is selected as the tree closest to the reference object. Even though the proposed data acquisition protocol could alleviate the recognition of the tree of interest, it might share some of the disadvantages of field surveys. For instance, data collection could take long periods and require trained staff, because two people are needed: one to take photographs of the trees and another to carry the reference object and stand next to the tree of interest.

Based on the research works mentioned above, a reliable segmentation of the tree to be assessed must be available for an appropriate tree characterization. However, large urban datasets that include tree instances have the disadvantage that they do not discriminate individual occurrences of trees [14], [15]. Moreover, as mentioned by [8], it is challenging to generate a customized semantic segmentation dataset comparable to object detection datasets [16] in terms of training samples, because semantic segmentation requires pixel-wise labeled instances, which is an arduous task. Therefore, in the present work, we explore a strategy to improve the segmentation of urban trees by two state-of-the-art deep neural networks (i.e., DeepLabV3 and SegFormer [17], [18]). Both networks are fine-tuned using a limited set of training, validation, and testing samples.

Since the custom dataset we have created has only one hundred RGB samples, we attempt to incorporate information into the segmentation models through vegetation indices (VIs) computed from the visible, red-edge, and near-infrared bands of the electromagnetic spectrum. We hypothesize that vegetation indices could add valuable knowledge about urban trees, which might not be decoded in the training stage of the segmentation models due to the lack of training samples. The belief that vegetation indices could boost urban tree segmentation is supported by a previous work, which shows that multispectral indices improve the classification of ground points in forested regions [19].

In this context, we evaluate the behavior and segmentation performance of the DeepLabV3 and SegFormer models when they are directly fed with a four-channel image (i.e., RGB channels plus a vegetation index channel). A total of 19 vegetation indices were computed, ten based on visible bands and nine on red-edge and near-infrared bands. The latter indices are calculated using multispectral data synthesized from the RGB channels by a conditional adversarial network. By integrating vegetation indices during the training process, we have found that one could improve the segmentation performance by about 13.7 % when using an appropriate deep neural network and vegetation index, even if the index is computed from synthesized multispectral data.

Fig. 2: Field sampling scheme. The tree of interest is represented in cyan. The binary mask is generated using an image manipulation program [21].

II. DATA ACQUISITION

The RGB images for this study were obtained by volunteers using smartphones under the Arbocensus project in the Santiago metropolitan region, Chile. In particular, urban trees were mapped in the Las Condes and La Reina communes. The Santiago region's climate can be described as continental Mediterranean, with average coldest and warmest temperatures of about 9.4 and 21.9 degrees Celsius, respectively, and annual precipitation of around 275 millimeters [20]. Regarding the urban tree photographs, volunteers (citizen scientists) captured about three thousand images of roughly one hundred species. However, we randomly chose one hundred images to evaluate whether vegetation indices could improve urban tree segmentation when using a small number of tree samples. A detailed depiction of the urban tree sampling process is shown in Fig. 2.

After the random selection of the one hundred urban tree images, the binary masks were created using an image manipulation program [21]. The primary motivation to employ this software was to create fine binary masks such as the one shown in Fig. 2. The detailed segmentation masks were obtained manually by pixel-wise labeling of the tree of interest using free selection and color selection tools.

III. METHODOLOGY

Once we obtain the RGB images and their corresponding binary masks, the VIs are computed. Visible-based VIs are determined straightforwardly using each image's red, green, and blue channels. Multispectral indices, in turn, are generated using synthesized red-edge and near-infrared channels. A supervised image-to-image translation model, derived by training and validating a conditional adversarial network, generates the synthesized red-edge and near-infrared channels. Next, we compute 19 sets of four-channel images (i.e., red, green, blue, and VI) to evaluate whether any proposed VI improves the performance of the segmentation models. The segmentation outcomes using RGB images are used as a reference. Furthermore, it is essential to highlight that the deep neural networks are pre-trained with two urban datasets (ADE20k and CityScapes), which include urban tree instances. Figure 3 shows the general scheme of the proposed methodology.

Fig. 3: General scheme of the proposed methodology for evaluating visible and synthesized red-edge and near-infrared vegetation indices. The visible and multispectral indices are described in Table I. The light gray block describes the computation of vegetation indices based on visible and predicted multispectral bands. The gray block pictures the fine-tuning of pre-trained segmentation networks. Note that I2I refers to the image-to-image translation model.

A. Visible, Red-Edge, and Near-Infrared Processing

Before explaining how to compute the VIs, we describe the procedure for determining the artificial red-edge and near-infrared channels.

1) Conditional Adversarial Network: This network determines a model for mapping a pixel from a source image to a target image; see [22] for more details. It has been demonstrated that image-to-image translation models can predict the near-infrared channel from aerial crop images [23]. In our case, however, the source and target images are street view RGB and red-edge/near-infrared images, respectively. Note that the RGB-to-multispectral mapping is learned by two different models, one for red-edge and the other for near-infrared. Each model is trained, validated, and tested from scratch using a hyperspectral city dataset [24]. Specifically, we use 1054 and 100 images to train and validate each model. Both stages are performed over one thousand epochs with default parameters, using UNet as the generator and PatchGAN as the discriminator. Then, the red-edge and near-infrared mapping models are evaluated using the structural similarity index measure (SSIM) with 176 images not seen in the training or validation stages.

After the red-edge and near-infrared mapping models are determined, we use them to compute multispectral channels from our dataset's RGB images. Next, the VIs are calculated as operations and transformations between visible or multispectral channels. We have chosen ten indices based on RGB channels and nine based on multispectral channels, which have been reported in previous works for evaluating the status of vegetation [25].

2) Visible Vegetation Indices: These indices are computed using only the red, green, and blue channels of the electromagnetic spectrum. Table I shows the description of each VI.

3) Multispectral Vegetation Indices: Multispectral indices use bands in the visible, red-edge, and near-infrared regions to estimate the status of vegetation. Table I explains each index. In this sense, nineteen sets of one hundred images, which retrieve knowledge about the red, green, and blue channels and a VI, are generated. In particular, we use four-channel images where the first three channels correspond to red, green, and blue information, and the fourth channel provides knowledge about a specific vegetation index.

B. Segmentation Models

The pixel-wise identification is performed by two state-of-the-art semantic segmentation deep networks implemented in a PyTorch open-source toolbox, MMSegmentation [27].

1) DeepLabV3: A deep convolutional neural network that exploits the potential of atrous convolution to improve its performance in semantic image segmentation tasks [17]. This segmentation model achieves a mean Intersection over Union (mIoU) of 42.42 % and 79.09 % on the ADE20k and Cityscapes datasets, respectively. Table II shows how the intersection over union is computed.

2) SegFormer: A semantic segmentation deep network that unifies transformers with lightweight multilayer perceptron decoders, which avoids complex decoders [18]. When using the ADE20k and Cityscapes datasets, SegFormer yields a mIoU of about 37.85 % and 76.54 %, correspondingly.

Note that the segmentation networks are pre-trained with three-channel images (i.e., red, green, and blue). A detailed description of the pre-training stages and models can be found in [27].

3) Training and Validation: Using our dataset, we took advantage of transfer learning for training, validating, and fine-tuning the segmentation models. Specifically, we use 74, 16, and 10 images for training, validating, and testing the models. Note that default parameters are used for the stages mentioned above. Nevertheless, when using four-channel images, the networks' input is set to four channels. Further, 15 thousand iterations with a batch size of one are employed for fine-tuning the segmentation models.
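The paper does not detail how MMSegmentation handles the extra input channel, so the following sketch only illustrates one plausible way of setting a segmentation network's input to four channels, using torchvision's DeepLabV3 as a stand-in. Initializing the fourth-channel filters as the mean of the RGB filters is our assumption, not the authors' procedure.

```python
import torch
import torch.nn as nn
from torchvision.models.segmentation import deeplabv3_resnet50

def extend_first_conv_to_four_channels(model: nn.Module) -> nn.Module:
    """Replace the backbone's first convolution so the network accepts RGB + VI input.

    The three existing RGB filters are kept; the new fourth-channel filters are
    initialized as the mean of the RGB filters (a common heuristic, assumed here).
    """
    old = model.backbone.conv1  # Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False)
    new = nn.Conv2d(4, old.out_channels, kernel_size=old.kernel_size,
                    stride=old.stride, padding=old.padding, bias=old.bias is not None)
    with torch.no_grad():
        new.weight[:, :3] = old.weight                       # keep existing RGB filters
        new.weight[:, 3:] = old.weight.mean(dim=1, keepdim=True)  # init VI filters
    model.backbone.conv1 = new
    return model

# Illustration only: a randomly initialized DeepLabV3 adapted to four-channel input.
# In a real fine-tuning run the pre-trained checkpoint would be loaded first.
model = extend_first_conv_to_four_channels(
    deeplabv3_resnet50(weights=None, weights_backbone=None)
)
model.eval()
logits = model(torch.randn(1, 4, 512, 512))["out"]  # shape: (1, 21, 512, 512)
```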

TABLE I: Visible and multispectral VIs, where ρ_α is the reflectance in the α band. The red, green, blue, red-edge, and near-infrared channels are represented by R, G, B, RE, and NIR, respectively. References for each vegetation index equation can be found in [19], [25], [26].

Visible
  Color Index of Vegetation Extraction (cive):    18.78745 + 0.44ρ_R − 0.88ρ_G + 0.385ρ_B
  Excess Green Index (exg):                       2ρ_G − ρ_R − ρ_B
  Excess Red Index (exr):                         1.4ρ_R − ρ_G
  Excess Green Minus Red Index (exgr):            exg − exr
  Green Leaf Index (gli):                         (2ρ_G − ρ_R − ρ_B)/(2ρ_G + ρ_R + ρ_B)
  Modified Green Red Vegetation Index (mgrvi):    (ρ_G² − ρ_R²)/(ρ_G² + ρ_R²)
  Modified Photochemical Reflectance Index (mpri): (ρ_G − ρ_R)/(ρ_G + ρ_R)
  Normalized Difference Index (ndi):              128((ρ_G − ρ_R)/(ρ_G + ρ_R) + 1)
  Red Green Blue Vegetation Index (rgbvi):        (ρ_G² − ρ_R ρ_B)/(ρ_G² + ρ_R ρ_B)
  Triangular Greenness Index (tgi):               0.5((ρ_R − ρ_B) − (ρ_R − ρ_G)) − ((ρ_R − ρ_G) − (ρ_R − ρ_B))

Multispectral
  Chlorophyll Absorption Reflectance Index (cari):          (ρ_RE − ρ_R) − 0.2(ρ_RE − ρ_G)
  Enhanced Vegetation Index (evi):                          2.5(ρ_NIR − ρ_R)/(ρ_NIR + 6ρ_R − 7.5ρ_B + 1)
  Green Normalized Difference Vegetation Index (gndvi):     (ρ_NIR − ρ_G)/(ρ_NIR + ρ_G)
  Modified CARI (mcari):                                    ρ_RE((ρ_RE − ρ_R) − 0.2(ρ_RE − ρ_G))/ρ_R
  Modified Soil Adjusted Vegetation Index (msavi):          (2ρ_NIR + 1 − sqrt((2ρ_NIR + 1)² − 8(ρ_NIR − ρ_R)))/2
  Normalized Difference Vegetation Index (ndvi):            (ρ_NIR − ρ_R)/(ρ_NIR + ρ_R)
  Optimized Soil Adjusted Vegetation Index (osavi):         1.16(ρ_NIR − ρ_R)/(ρ_NIR + ρ_R + 0.16)
  Red Edge Modified Simple Ratio (remsr):                   (ρ_NIR/ρ_RE − 1)/sqrt(ρ_NIR/ρ_RE + 1)
  Red Edge Normalized Difference Vegetation Index (rendvi): (ρ_NIR − ρ_RE)/(ρ_NIR + ρ_RE)
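As a concrete reading of Table I, the sketch below computes a few of the listed indices from per-band reflectance arrays with NumPy. The small epsilon guarding the divisions is our addition and is not part of the original definitions.

```python
import numpy as np

def vegetation_indices(r, g, b, re=None, nir=None, eps=1e-12):
    """A few of the indices from Table I, computed from per-band reflectance arrays.

    r, g, b are the visible channels; re and nir are the (synthesized) red-edge
    and near-infrared channels. eps guards against division by zero.
    """
    vi = {
        "exg":  2 * g - r - b,                             # Excess Green Index
        "mpri": (g - r) / (g + r + eps),                   # Modified Photochemical Reflectance Index
        "gli":  (2 * g - r - b) / (2 * g + r + b + eps),   # Green Leaf Index
    }
    if re is not None and nir is not None:
        vi["ndvi"]   = (nir - r) / (nir + r + eps)
        vi["rendvi"] = (nir - re) / (nir + re + eps)
        vi["remsr"]  = (nir / (re + eps) - 1) / np.sqrt(nir / (re + eps) + 1)
    return vi
```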

TABLE II: Quantitative metrics for evaluating segmentation performance, where ToI is the tree of interest, IoU refers to the intersection over union, p is the precision, and r is the recall.

                              Predicted
                          ToI       non-ToI        Metric
Ground-truth   ToI         a           b           IoU = a/(a + b + c)
               non-ToI     c           d           p = a/(a + c)
                                                   r = a/(a + b)

After training and validating the segmentation models on our dataset, we evaluate them using the IoU metric detailed in Table II on the testing set, whose samples have not been seen in the previous stages.
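The definitions in Table II translate directly into code; a minimal sketch for a single binary prediction/ground-truth pair:

```python
import numpy as np

def segmentation_metrics(pred: np.ndarray, gt: np.ndarray):
    """IoU, precision, and recall for the tree of interest, following Table II.

    pred and gt are boolean masks where True marks tree-of-interest (ToI) pixels.
    """
    a = np.logical_and(pred, gt).sum()    # ToI correctly predicted as ToI
    b = np.logical_and(~pred, gt).sum()   # ToI missed (predicted as non-ToI)
    c = np.logical_and(pred, ~gt).sum()   # non-ToI wrongly predicted as ToI
    iou = a / (a + b + c)
    precision = a / (a + c)
    recall = a / (a + b)
    return iou, precision, recall
```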

IV. RESULTS

For training, validating, and testing the image-to-image translation models, we extracted the reflectance at 718 nm (red-edge) and 840 nm (near-infrared) from the hyperspectral city dataset. These bands are close to the ones used by the MicaSense RedEdge multispectral camera. The performance of the conditional adversarial network is described in Table III. The worst prediction outcomes are shown in Fig. 4.

Since the average SSIM is over 0.8 for the synthesized red-edge and near-infrared channels, and no previous works evaluate this information in urban tree segmentation, we used the corresponding image-to-image models to predict the multispectral channels for our dataset. It should be highlighted that poorly illuminated environments yield the worst outcomes, specifically dark images with no light sources. Nevertheless, this reconstruction behavior might not occur in our dataset because all images were captured on sunny days.

Once the multispectral channels are determined, the vegetation indices are computed straightforwardly from the equations in Table I. Then, the RGB and VI information is fed to the segmentation models. The behavior of these models on the testing set is shown in Table IV.

Figure 5 shows qualitative outcomes of the best and worst segmentation instances when using the fine-tuning set that helps to enhance segmentation performance.

Fig. 4: Worst cases for qualitative results of the conditional adversarial network for the red-edge and near-infrared channels. Both are associated with conditions of extremely low illumination. (a) Red-Edge: worst case, SSIM = 0.14. (b) Near-Infrared: worst case, SSIM = 0.09.

TABLE III: Structural similarity index measure outcomes for the image-to-image translation using 176 testing samples from the hyperspectral city dataset.

            Red-Edge          Near-Infrared
Metric     avg     std        avg     std
SSIM       0.86    0.14       0.81    0.17
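The SSIM figures in Table III can be obtained, for instance, with scikit-image. The sketch below assumes the synthesized and measured single-band images are already co-registered and scaled to [0, 1], which is our assumption about the evaluation protocol.

```python
import numpy as np
from skimage.metrics import structural_similarity

def average_ssim(synthesized: np.ndarray, measured: np.ndarray) -> float:
    """Mean SSIM over a test set of synthesized vs. measured single-band images.

    synthesized, measured: arrays of shape (N, H, W) with values scaled to [0, 1].
    """
    scores = [
        structural_similarity(s, m, data_range=1.0)
        for s, m in zip(synthesized, measured)
    ]
    return float(np.mean(scores))
```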

TABLE IV: Segmentation performance of the deep neural networks using the IoU metric on the ten testing samples (S1-S10). The visible and multispectral VIs are described in Table I. Values marked with an asterisk (*) achieve the best average IoU for each model and pretraining dataset.

Model       Pretraining   Set          S1     S2     S3     S4     S5     S6     S7     S8     S9     S10    avg
DeepLabV3   ADE20k        RGB          0.859  0.922  0.718  0.632  0.486  0.643  0.453  0.736  0.526  0.739  0.671
                          RGB&exg      0.830  0.894  0.776  0.693  0.490  0.576  0.470  0.741  0.579  0.723  0.677
                          RGB&gndvi    0.900  0.904  0.697  0.668  0.486  0.611  0.524  0.719  0.561  0.751  0.682*
                          RGB&ndvi     0.864  0.926  0.729  0.675  0.481  0.630  0.519  0.706  0.540  0.724  0.679
            CityScapes    RGB          0.843  0.921  0.646  0.584  0.475  0.623  0.483  0.687  0.575  0.765  0.660
                          RGB&exg      0.863  0.919  0.716  0.685  0.479  0.561  0.533  0.659  0.612  0.656  0.668
                          RGB&mpri     0.913  0.921  0.766  0.662  0.504  0.614  0.573  0.725  0.587  0.692  0.696*
                          RGB&tgi      0.898  0.913  0.678  0.595  0.466  0.552  0.568  0.678  0.556  0.787  0.669
                          RGB&evi      0.862  0.924  0.767  0.720  0.530  0.606  0.537  0.722  0.541  0.746  0.696*
                          RGB&mcari    0.883  0.900  0.756  0.625  0.532  0.625  0.521  0.699  0.520  0.777  0.684
                          RGB&msavi    0.874  0.861  0.735  0.651  0.502  0.673  0.531  0.652  0.588  0.760  0.683
                          RGB&osavi    0.871  0.918  0.730  0.736  0.508  0.626  0.533  0.662  0.530  0.787  0.690
                          RGB&rendvi   0.859  0.909  0.717  0.728  0.510  0.617  0.536  0.754  0.511  0.721  0.686
SegFormer   ADE20k        RGB          0.759  0.925  0.645  0.613  0.534  0.706  0.431  0.642  0.642  0.767  0.667
                          RGB&rgbvi    0.823  0.911  0.710  0.656  0.496  0.686  0.428  0.707  0.684  0.712  0.681
                          RGB&cari     0.848  0.922  0.720  0.609  0.454  0.669  0.594  0.781  0.674  0.771  0.704*
                          RGB&mcari    0.862  0.913  0.709  0.621  0.515  0.644  0.478  0.741  0.640  0.676  0.680
            CityScapes    RGB          0.627  0.925  0.506  0.695  0.502  0.722  0.440  0.687  0.370  0.728  0.620
                          RGB&tgi      0.874  0.913  0.596  0.764  0.574  0.679  0.518  0.632  0.172  0.675  0.640
                          RGB&cari     0.852  0.902  0.563  0.684  0.513  0.649  0.552  0.781  0.349  0.492  0.634
                          RGB&evi      0.854  0.919  0.498  0.674  0.485  0.700  0.438  0.742  0.580  0.720  0.661
                          RGB&mcari    0.858  0.916  0.559  0.560  0.498  0.627  0.559  0.624  0.639  0.552  0.639
                          RGB&msavi    0.858  0.905  0.722  0.606  0.454  0.700  0.428  0.739  0.658  0.765  0.683
                          RGB&ndvi     0.893  0.872  0.717  0.519  0.492  0.699  0.615  0.688  0.510  0.767  0.677
                          RGB&osavi    0.863  0.916  0.663  0.548  0.423  0.608  0.521  0.682  0.532  0.790  0.655
                          RGB&remsr    0.881  0.913  0.787  0.687  0.436  0.642  0.485  0.737  0.675  0.808  0.705*
                          RGB&rendvi   0.867  0.910  0.597  0.700  0.460  0.694  0.428  0.532  0.564  0.776  0.653
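For clarity, the improvement percentages quoted in the Discussion below are relative gains of the best RGB&VI average IoU over the corresponding RGB baseline in Table IV; they can be reproduced as follows (values copied from the table):

```python
# Best average IoU per model/pretraining block and its RGB baseline, from Table IV.
baseline = {"DeepLabV3/ADE20k": 0.671, "DeepLabV3/CityScapes": 0.660,
            "SegFormer/ADE20k": 0.667, "SegFormer/CityScapes": 0.620}
best_vi  = {"DeepLabV3/ADE20k": 0.682, "DeepLabV3/CityScapes": 0.696,
            "SegFormer/ADE20k": 0.704, "SegFormer/CityScapes": 0.705}

for key in baseline:
    gain = 100 * (best_vi[key] - baseline[key]) / baseline[key]
    print(f"{key}: +{gain:.2f} %")  # 1.64, 5.45, 5.55, 13.71 (quoted as 13.7 %)
```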

V. DISCUSSION

The image-to-image translation models can retrieve reliable red-edge and near-infrared channels, with an SSIM greater than 0.8. Nevertheless, one should be aware that in environments with inadequate illumination, the model fails to provide a fair representation of the multispectral channels; see Fig. 4. Although a previous work [23] achieved SSIM values above 0.9, there are significant differences in the environments mapped. First, [23] captured aerial photographs of crop fields, which contain two classes, vegetation and terrain. This might alleviate the prediction of near-infrared channels. In our case, however, street view images of urban trees and objects such as buildings could change the illumination conditions by introducing unexpected shadows or sparkles that might affect the spectral reflectance recorded by the hyperspectral sensor.

On the other hand, it is essential to highlight that [23] performs a radiometric calibration using a reflectance panel to obtain absolute spectral information. In our case, the hyperspectral city dataset does not state whether the spectral reflectance values are absolute or relative. Nevertheless, we considered this dataset and the resulting outcomes suitable for urban tree segmentation purposes. We could not further assess the synthesized red-edge and near-infrared channels because our dataset lacks multispectral data. Moreover, the assessment of multispectral reconstruction is beyond the focus of the present work.

The image-to-image translation model's reconstruction performance might influence the segmentation networks' performance. Although the red-edge and near-infrared channels achieve an SSIM over 0.8, the synthesized multispectral data might not be as informative as measured data. In this context, we encourage future works to evaluate vegetation indices computed from genuine multispectral channels because they could add knowledge not retrieved by synthesized data. Note that this work is the first attempt to improve urban tree segmentation performance using multispectral information. Therefore, the presented strategy could be used as a baseline for future works that seek to improve the pixel-wise identification of urban trees.

As expected, the knowledge provided by vegetation indices helps to improve the segmentation performance of the DeepLabV3 and SegFormer networks. However, not all 19 sets (RGB&VI) boost the segmentation behavior. Moreover, the enhancement depends on the pretraining dataset. For instance, when DeepLabV3 and SegFormer are pre-trained with the CityScapes dataset, more sets (RGB&VI) yield higher IoU values than RGB images alone.

The segmentation outcomes presented in Table IV show that vegetation indices could improve the segmentation of urban trees. Specifically, when using DeepLabV3, one could expect a boost in the average IoU of 1.64 % and 5.45 % for the ADE20k and CityScapes datasets, respectively. For SegFormer, the enhancement is about 5.55 % and 13.7 % for ADE20k and CityScapes, correspondingly. Note that SegFormer achieves the largest IoU difference with respect to RGB images when it uses the RGB&remsr set and is pre-trained with the CityScapes dataset; see Table IV.

Regarding visible and multispectral VIs, the latter could be the ones that add more information for boosting the segmentation networks. Specifically, the EVI and REMSR vegetation indices allow us to achieve the best segmentation performance using the DeepLabV3 and SegFormer networks, respectively.
Note that the MPRI index, an RGB-based index, shows performance similar to a multispectral index (EVI); see Table IV. Despite these segmentation improvements, one should be aware that the best outcome obtained with RGB&VI data (SegFormer pre-trained with CityScapes) is only 5.07 % greater than the best RGB outcome (DeepLabV3 pre-trained with ADE20k). These results might suggest that deep neural networks can decode the knowledge carried by vegetation indices during the training and validation stages. In particular, the ADE20k dataset might have a fair number of vegetation and tree instances that alleviates the transfer learning procedure for segmenting the tree of interest with a small quantity of urban tree samples.

Based on the above, the selection of pretraining samples plays a crucial role in alleviating further steps in the pixel-wise identification of urban trees. In particular, for RGB images, we suggest using the DeepLabV3 network pre-trained with the ADE20k dataset for future works related to urban tree segmentation, because it retrieves the best IoU values for RGB images. If multispectral information is available, the SegFormer model pre-trained with the CityScapes dataset should be used because of its performance.

On the other hand, the differences between DeepLabV3 and SegFormer are shown in Fig. 5. In particular, for the worst case, the SegFormer network shows a greater region of false positives than DeepLabV3. Further, for both the best and worst outcomes, SegFormer achieves lower precision than DeepLabV3. However, since the difference between their best average IoU outcomes is about 0.009, more experiments should be conducted to assess each segmentation network for urban tree segmentation.

Fig. 5: Qualitative results of the segmentation networks; the best and worst outcomes are presented. Dark cyan represents true positive pixels, dark magenta represents false positive pixels, and the light green region shows false negative pixels. Samples S2 and S5 achieve the best and worst outcomes, respectively. (a) DeepLabV3: pre-trained on CityScapes and fine-tuned using RGB&evi; p and r are 0.949 and 0.973 for the best outcome, and 0.539 and 0.969 for the worst outcome. (b) SegFormer: pre-trained on CityScapes and fine-tuned using RGB&remsr; p and r are 0.931 and 0.979 for the best outcome, and 0.437 and 0.995 for the worst outcome.

Although the proposed work alleviates urban tree segmentation, this task should still be considered challenging. Specifically, among the testing samples evaluated, the ones that yield the lowest IoU values are occluded trees and trees with combined crowns. Instances of these cases are shown in Fig. 1. In this sense, future works should focus on fusing street view images, aerial images, and tree georeferences to tackle the still unsolved issues regarding tree segmentation. Moreover, we also suggest performing aerial surveys and applying the methodologies proposed by previous researchers [28], [29] to improve the segmentation of urban trees.

Our experiments also revealed that deep neural segmentation networks can, by themselves, decode the information retrieved by vegetation indices based on visible or synthesized multispectral bands. We infer this because, out of the 19 VIs, just three indices yield an average IoU greater than the one obtained with RGB images when using the ADE20k dataset. This behavior might be due to all VIs being determined from a single source of information, the RGB images. The network's ability to decode vegetation indices might depend on the vegetation and tree instances available in the pretraining sets. For instance, by pre-training the segmentation models with the ADE20k dataset, one can achieve the best IoU scores using solely RGB images for fine-tuning these models. Conversely, the CityScapes dataset might need to include more vegetation or tree examples for inferring the information retrieved by vegetation indices.

Finally, the insight retrieved by the current work alleviates and guides future works regarding the visible and multispectral assessment of urban trees and related tasks.

VI. CONCLUSIONS

The assessment of visible and multispectral vegetation indices shows that the knowledge derived from these indices can improve the pixel-wise identification of urban trees. Note that the multispectral indices are computed using red-edge and near-infrared channels estimated by an image-to-image translation model; the structural similarity index for the red-edge and near-infrared channels was 0.86 and 0.81, respectively. Regarding segmentation outcomes, one could improve the IoU score from 0.620 to 0.705 (13.7 %) by using the SegFormer segmentation network pre-trained with the CityScapes dataset and RGB images combined with the Red Edge Modified Simple Ratio index. However, more experiments with measured multispectral information are suggested, given that the segmentation improvements were achieved with synthesized red-edge and near-infrared channels. Moreover, if just RGB images are available, we advise employing DeepLabV3 pre-trained with the ADE20k dataset as the base network for further fine-tuning with a custom urban tree dataset. Specifically, this configuration achieves the best RGB IoU value (i.e., 0.671) when fine-tuned with our RGB custom tree dataset.

VII. ACKNOWLEDGEMENTS

This work is supported by the Agencia Nacional de Investigación y Desarrollo (ANID) under grants Fondecyt 11220510, FONDEF ID21I10360, and Tree Fund 21-JD-01. JD thankfully acknowledges funding from the Advanced Center of Electrical and Electronic Engineering, AC3E (ANID/FB0008).

REFERENCES

[1] Gillner, S., Vogt, J., Tharang, A., Dettmann, S., and Roloff, A., 2015. "Role of street trees in mitigating effects of heat and drought at highly sealed urban sites". Landscape and Urban Planning, 143, pp. 33-42.
[2] Ponce-Donoso, M., Vallejos-Barra, O., Ingram, B., and Daniluk-Mosquera, G., 2020. "Urban trees and environmental variables relationships in a city of central Chile". Arboriculture & Urban Forestry, 46(2), pp. 84-95.
[3] McPherson, G., Simpson, J. R., Peper, P. J., Maco, S. E., and Xiao, Q., 2005. "Municipal forest benefits and costs in five US cities". Journal of Forestry, 103(8), pp. 411-416.
[4] Schwarz, K., Fragkias, M., Boone, C. G., Zhou, W., McHale, M., Grove, J. M., O'Neil-Dunne, J., McFadden, J. P., Buckley, G. L., Childers, D., et al., 2015. "Trees grow on money: Urban tree canopy cover and environmental justice". PLoS ONE, 10(4), p. e0122051.
[5] Mullaney, J., Lucke, T., and Trueman, S. J., 2015. "A review of benefits and challenges in growing street trees in paved urban environments". Landscape and Urban Planning, 134, pp. 157-166.
[6] Beery, S., Wu, G., Edwards, T., Pavetic, F., Majewski, B., Mukherjee, S., Chan, S., Morgan, J., Rathod, V., and Huang, J., 2022. "The Auto Arborist dataset: A large-scale benchmark for multiview urban forest monitoring under domain shift". In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 21294-21307.
[7] Branson, S., Wegner, J. D., Hall, D., Lang, N., Schindler, K., and Perona, P., 2018. "From Google Maps to a fine-grained catalog of street trees". ISPRS Journal of Photogrammetry and Remote Sensing, 135, pp. 13-30.
[8] Choi, K., Lim, W., Chang, B., Jeong, J., Kim, I., Park, C.-R., and Ko, D. W., 2022. "An automatic approach for tree species detection and profile estimation of urban street trees using deep learning and Google Street View images". ISPRS Journal of Photogrammetry and Remote Sensing, 190, pp. 165-180.
[9] Jodas, D. S., Brazolin, S., Yojo, T., De Lima, R. A., Velasco, G. D. N., Machado, A. R., and Papa, J. P., 2021. "A deep learning-based approach for tree trunk segmentation". In 2021 34th SIBGRAPI Conference on Graphics, Patterns and Images (SIBGRAPI), IEEE, pp. 370-377.
[10] Jodas, D. S., Yojo, T., Brazolin, S., Velasco, G. D. N., and Papa, J. P., 2022. "Detection of trees on street-view images using a convolutional neural network". International Journal of Neural Systems, 32(01), p. 2150042.
[11] Lumnitz, S., Devisscher, T., Mayaud, J. R., Radic, V., Coops, N. C., and Griess, V. C., 2021. "Mapping trees along urban street networks with deep learning and street-level imagery". ISPRS Journal of Photogrammetry and Remote Sensing, 175, pp. 144-157.
[12] Wang, Y., Yan, X., Bao, H., Chen, Y., Gong, L., Wei, M., and Li, J. "Detecting occluded and dense trees in urban terrestrial views with a high-quality tree detection dataset". IEEE Transactions on Geoscience and Remote Sensing, 60, pp. 1-12.
[13] Berland, A., and Lange, D. A., 2017. "Google Street View shows promise for virtual street tree surveys". Urban Forestry & Urban Greening, 21, pp. 11-15.
[14] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., and Schiele, B., 2016. "The Cityscapes dataset for semantic urban scene understanding". In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[15] Zhou, B., Zhao, H., Puig, X., Fidler, S., Barriuso, A., and Torralba, A., 2017. "Scene parsing through ADE20K dataset". In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 633-641.
[16] Everingham, M., Van Gool, L., Williams, C. K. I., Winn, J., and Zisserman, A. The PASCAL Visual Object Classes Challenge 2012 (VOC2012) Results. http://www.pascal-network.org/challenges/VOC/voc2012/workshop/index.html.
[17] Chen, L.-C., Papandreou, G., Schroff, F., and Adam, H., 2017. "Rethinking atrous convolution for semantic image segmentation". arXiv preprint arXiv:1706.05587.
[18] Xie, E., Wang, W., Yu, Z., Anandkumar, A., Alvarez, J. M., and Luo, P., 2021. "SegFormer: Simple and efficient design for semantic segmentation with transformers". arXiv preprint arXiv:2105.15203.
[19] Arevalo-Ramirez, T., Guevara, J., Rivera, R. G., Villacrés, J., Menéndez, O., Fuentes, A., and Cheein, F. A., 2021. "Assessment of multispectral vegetation features for digital terrain modeling in forested regions". IEEE Transactions on Geoscience and Remote Sensing, 60, pp. 1-9.
[20] Climates to Travel, 2022. Climate - Santiago, Chile. https://www.climatestotravel.com/climate/chile/santiago. [Online; accessed 20-December-2022].
[21] The GIMP Development Team. GIMP.
[22] Isola, P., Zhu, J.-Y., Zhou, T., and Efros, A. A., 2017. "Image-to-image translation with conditional adversarial networks". In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[23] Aslahishahri, M., Stanley, K. G., Duddu, H., Shirtliffe, S., Vail, S., Bett, K., Pozniak, C., and Stavness, I., 2021. "From RGB to NIR: Predicting of near infrared reflectance from visible spectrum aerial images of crops". In Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 1312-1322.
[24] Huang, Y., Ren, T., Shen, Q., Fu, Y., and You, S., 2021. HSICityV2: Urban Scene Understanding via Hyperspectral Images, July.
[25] Jaya, N., Harmayani, K., Widhiawati, I., Atmaja, G., and Jaya, I., 2022. "Spatial analysis of vegetation density classification in determining environmental impacts using UAV imagery". ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences, 3, pp. 417-422.
[26] Fu, H., Wang, C., Cui, G., She, W., and Zhao, L., 2021. "Ramie yield estimation based on UAV RGB images". Sensors, 21(2), p. 669.
[27] MMSegmentation Contributors, 2020. MMSegmentation: OpenMMLab semantic segmentation toolbox and benchmark. https://github.com/open-mmlab/mmsegmentation.
[28] Wallace, L., Lucieer, A., and Watson, C. S., 2014. "Evaluating tree detection and segmentation routines on very high resolution UAV LiDAR data". IEEE Transactions on Geoscience and Remote Sensing, 52(12), pp. 7619-7628.
[29] Harikumar, A., Bovolo, F., and Bruzzone, L., 2018. "A local projection-based approach to individual tree detection and 3-D crown delineation in multistoried coniferous forests using high-density airborne LiDAR data". IEEE Transactions on Geoscience and Remote Sensing, 57(2), pp. 1168-1182.
