
APPLICATION OF MACHINE LEARNING IN CALCULATING SEDIMENT YIELD OF PULANGUI IV HYDROELECTRIC POWER PLANT LOCATED IN MARAMAG, BUKIDNON

Einstine M. Opiso¹, Ruel C. Wenceslao II¹, Glenn G. Balighot², Hannah Llay B. Galve², Liezel Mae C. Divinagracia², Clyde Exel T. Encornal², Niño Joshua G. Ortega², Leo Jay S. Maestrado²

¹ Geo-Environmental Research Group, Department of Civil Engineering, Central Mindanao University, Maramag, Bukidnon
² Research Collaborator, Department of Civil Engineering, Central Mindanao University, Maramag, Bukidnon

Abstract: Sediment yield from the catchment areas is becoming a serious threat to the efficiency and lifespan of the Pulangui IV Hydroelectric Plant. The manual approach to sediment monitoring is complex, time- and labor-intensive, and costly. To address these problems, this study investigates the application of machine learning in image processing for siltation calculation in the dam's lower reservoir, power channel, and surge pool. Using aerial imagery, the developed machine learning model classifies images into two categories: maintained and silted. The model achieved an accuracy of approximately 0.80, with a validation accuracy of 0.80. These promising results could serve as a basis for intervention strategies that enhance the sediment management system of the hydroelectric power plant.

Key words: machine learning, sediment monitoring, image classification, hydroelectric plant

1 INTRODUCTION
Machine learning approaches have been used in recent years to help knowledge-based systems in civil engineering overcome the knowledge acquisition bottleneck. Because of the heuristic nature of many problems, civil engineers often rely on time- and resource-saving computational technologies (Deka, 2020). One challenge that machine learning could potentially address is sediment accumulation.
Many reservoirs have been developed across the world for various objectives, including hydropower generation, irrigation, and flood control. These facilities often face a significant challenge: a rapid decline in storage capacity due to erosion and sedimentation processes (Njouya et al., 2020). Sediment accumulation can compromise dam safety as well as energy output, storage, discharge capacity, and flood attenuation. Morris and Fan (1998) noted that the accumulation increases the stress on the dam and gates, damages mechanical equipment, and has a range of additional severe environmental impacts. The storage capacity of a dam is frequently designed to last no less than 100 years. As a result, a thorough understanding of storage mitigation is critical for appropriate planning and management (Sumi & Hirose, 2009).
A study by Panondi and Izumi (2021) on the climate change impact on the hydrologic regimes and sediment yield of the Pulangi River Basin (PRB) for watershed sustainability found that the river, with a reach of 320 km stretching between the Bukidnon Province of Region 10 and the Cotabato Province of Region 12, has exceeded the tolerable amount of sedimentation, with an accumulated value of 112 tons/ha/yr. They added that the observed changes in the data correspond to an increase in runoff (44.58–76.80%) and sediment yield (1.33–26.28%) within the sub-basins.
This problem underscores the importance of machine learning for developing programs that can predict outcomes from past data. Therefore, the main objective of this study is to examine the use of machine learning in image processing for sediment yield calculation using aerial imagery, implemented through a Convolutional Neural Network (CNN) with the Keras API, in the Pulangui IV Hydroelectric Power Plant catchment areas situated in Maramag, Bukidnon.

2 METHODOLOGY

2.1 Data Acquisition
Because the model (built with Keras) takes mixed inputs, three types of data were gathered: numeric, categorical, and image data. The numeric data, which include site-specific values such as the depth of the catchment, were obtained from existing and current bathymetric surveys available at the dam facility office. Aside from these figures, each site's technical characteristics and name were also included. These data then served as a reference for the model when processing the images.
The image data comprised shots taken by drone at different angles over no fewer than 16 separate time intervals. This number of acquisitions was the most practical choice given the time constraints. The drones used were the DJI Mini 2 and the Mavic 2 Pro. The DJI Mini 2 has a maximum
picture resolution of 4000x3000 pixels (4:3) or 4000x2250 pixels (16:9), can fly as high as 500 meters vertically, and can cover a maximum distance of 2000 meters horizontally. The Mavic 2 Pro has a maximum still-image resolution of 5472x3648 pixels, can fly as high as 6000 meters, and has a maximum travel distance of 1800 meters horizontally. More specifically, the shots were taken before and after major runoff events.

Fig. 1 Sample image dataset.

In acquiring aerial images, the drone launch locations were predetermined. For the reservoir, the first point was at the Pulangui dam in Panadtalan, Maramag, with coordinates of latitude 7.786455° and longitude 125.023847°. The next point was at Purok 17, Butong, Quezon, with coordinates of latitude 7.798682° and longitude 125.047427°. The third location was Tubigon, Maramag, with coordinates of latitude 7.826099° and longitude 125.026706°. The fourth and last point in the reservoir was Purok 6A, Panadtalan, Maramag, with coordinates of latitude 7.806186° and longitude 125.024572°.
For the power channel and surge pool, three points were established. The first point along the power channel was at Purok 8A, North Poblacion, Maramag, with coordinates of latitude 7.7703° and longitude 125.021028°. The second point was at Purok 2A, North Poblacion, Maramag, with coordinates of latitude 7.76004° and longitude 125.011751°. Lastly, the third point along the power channel was at Purok 1b, Camp 1, Maramag, with coordinates of latitude 7.736179° and longitude 125.012427°. These points were plotted using the QGIS application.

Fig. 2 Location points established for image data acquisition

2.2 Pre-processing of Data
On loading the numeric and categorical data, attributes such as the site's technical properties were defined. The site's location, as represented by its coordinates, as well as the site's name and dimensions, were treated as categorical data. The categorical dataset was then converted into numerical variables using one-hot encoding, which converts each categorical value into a binary vector. With this, the numeric and categorical data were combined into a single dataset. The images were preprocessed through data augmentation such as flipping, rescaling, zooming, and shearing. Augmentation supplements the existing image data with minor alterations to keep the model from overfitting to the training data.
After preprocessing, two branches were used to handle the numeric/categorical data and the image data. The first branch is a Multi-layer Perceptron (MLP), a type of feed-forward neural network that handles numeric and categorical data. The second branch, a Convolutional Neural Network (CNN), extracted features from the input images and flattened them into a vector. After these branches were defined, the model was trained on the new inputs from the MLP and the CNN, namely the combined numeric and categorical dataset and the flattened vector of image data, respectively. These combined data were fed into a fully connected network, which was trained to produce the final multi-input Keras model.

2.3 Research Model
Keras is a neural network Application Programming Interface (API) designed for the Python programming language and used to develop machine learning models. The functional API, one of the two main ways of building models in Keras (alongside the Sequential API), was used in this study because it can handle multiple inputs and outputs. As a deep learning framework, Keras can accept multiple independent data types, such as numeric, categorical, and image data, and use them to define a multi-input model (Functional API) composed of a Convolutional Neural Network (CNN) and a Multi-layer Perceptron (MLP). Since this study gathered and used multiple independent data types, such as the sites' dimensions and coordinates as well as images taken at different angles and time intervals, Keras is well suited as the framework for this research.

Fig. 3 Keras architecture capable of multiple inputs, including numerical, categorical, and image data
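The pre-processing steps in 2.2 can be sketched in plain NumPy. This is a minimal illustration, not the authors' pipeline: the site records and depth values below are hypothetical placeholders, the helper names (`one_hot`, `min_max_scale`, `augment`) are invented for this sketch, and only flipping and rescaling are shown (zoom and shear are usually taken from an augmentation library).

```python
import numpy as np

# Hypothetical site records (site name, depth in metres). The names follow the
# monitored areas; the depth values are placeholders, not survey data.
records = [
    ("lower reservoir", 12.0),
    ("power channel", 6.5),
    ("surge pool", 9.0),
    ("power channel", 6.1),
]


def one_hot(values):
    """Encode each categorical value as a binary vector, one column per category."""
    categories = sorted(set(values))
    column = {category: i for i, category in enumerate(categories)}
    encoded = np.zeros((len(values), len(categories)), dtype=np.float32)
    for row, value in enumerate(values):
        encoded[row, column[value]] = 1.0
    return encoded, categories


def min_max_scale(x):
    """Rescale a numeric column to the [0, 1] range."""
    return (x - x.min()) / (x.max() - x.min())


names = [name for name, _ in records]
depths = np.array([[depth] for _, depth in records], dtype=np.float32)
site_onehot, categories = one_hot(names)

# Single tabular dataset for the MLP branch: [scaled depth | one-hot site name].
tabular = np.hstack([min_max_scale(depths), site_onehot])


def augment(image, rng):
    """Rescale pixels to [0, 1] and randomly flip the image horizontally."""
    image = image.astype(np.float32) / 255.0
    if rng.random() < 0.5:
        image = image[:, ::-1, :]  # reverse the width axis (H, W, C layout)
    return image
```

Each row of `tabular` is one site sample ready for the MLP branch; `augment` would be applied to every drone image with a shared generator such as `np.random.default_rng(0)`.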

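The two-branch, multi-input design described in 2.2 to 2.4 (an MLP for tabular features, a CNN for images, joined with Keras's `concatenate` and wrapped in a `Model` with two inputs) can be sketched with the Keras functional API as follows. Layer widths, the input names `site_features` and `drone_image`, the 64x64 image size, and the Adam optimizer are assumptions for illustration; the paper does not report them.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers


def build_mlp_branch(num_features):
    """MLP branch for the combined numeric and one-hot categorical features."""
    inputs = keras.Input(shape=(num_features,), name="site_features")
    x = layers.Dense(8, activation="relu")(inputs)
    x = layers.Dense(4, activation="relu")(x)
    return inputs, x


def build_cnn_branch(height, width, channels=3):
    """CNN branch: extract image features and flatten them into a vector."""
    inputs = keras.Input(shape=(height, width, channels), name="drone_image")
    x = layers.Conv2D(16, 3, padding="same", activation="relu")(inputs)
    x = layers.MaxPooling2D()(x)
    x = layers.Conv2D(32, 3, padding="same", activation="relu")(x)
    x = layers.MaxPooling2D()(x)
    x = layers.Flatten()(x)
    x = layers.Dense(16, activation="relu")(x)
    return inputs, x


def build_multi_input_model(num_features, height, width):
    mlp_in, mlp_out = build_mlp_branch(num_features)
    cnn_in, cnn_out = build_cnn_branch(height, width)
    # Join both branch outputs, then classify with a small fully connected head.
    combined = layers.concatenate([mlp_out, cnn_out])
    x = layers.Dense(8, activation="relu")(combined)
    output = layers.Dense(1, activation="sigmoid", name="silted_probability")(x)
    model = keras.Model(inputs=[mlp_in, cnn_in], outputs=output)
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model


# Usage with dummy data: 2 samples, 4 tabular features, 64x64 RGB images.
model = build_multi_input_model(num_features=4, height=64, width=64)
probs = model.predict(
    [np.zeros((2, 4), dtype="float32"), np.zeros((2, 64, 64, 3), dtype="float32")],
    verbose=0,
)
```

The inputs are passed as a list in the same order as `inputs=[mlp_in, cnn_in]`; the single sigmoid output gives one silted probability per sample.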
2.4 Model Training
The combined input to the final layers of the network was based on the outputs of both the MLP and CNN branches. The model was trained on an NVIDIA GeForce GPU system running on Windows. Python libraries and dependencies such as TensorFlow, Keras, Matplotlib, and NumPy were imported to build the model. The Model class first had to be imported from the models module of Keras. The class was then instantiated, passing the inputs and outputs as arguments: the inputs were the input layers already defined, while the output was the final layer of the model. To build a model that shares its input layer, a highlight of the functional API, the input layer was passed to different hidden layers. The branches could then be joined with the concatenate layer in Keras. The outputs of both branches were combined, and a single output was defined.

Fig. 4 Convolutional Neural Network architecture

The convolution layer, pooling layer, fully connected layer, and loss layer are among the architectural layers of a convolutional neural network (CNN). Each of these layers plays a part in processing two-dimensional data. The convolution layer performs convolution operations on the output of the preceding layer and is the main mechanism underpinning a CNN.
To train the model, training variables were set up and datasets were created for training and validation. The model was compiled with its loss function and an optimizer with learning-rate decay. Training, known as fitting the model, then began; this is where all the weights are tuned through backpropagation. The model was trained by slicing the data into batches and repeatedly iterating over the entire dataset for a given number of epochs. The returned History object holds a record of the loss and metric values during training. The prediction method was then called on the testing data to obtain predictions for evaluating the model.
Training continued until the system was able to distinguish images with a tolerable amount of silt from those with an alarming volume. The resulting predictions, the key predictive analytics of the study, were visualized in the confusion matrix.

2.5 Model Training Validation
The confusion matrix, a table that summarizes the performance of a binary classifier, was used to characterize the classification model's performance. It uses the following abbreviations:
o TN (True Negative) – outcomes that were originally negative and were predicted negative.
o FP (False Positive) – outcomes that were originally negative but were predicted positive. This error is also called a type 1 error.
o FN (False Negative) – outcomes that were originally positive but were predicted negative. This error is also called a type 2 error.
o TP (True Positive) – outcomes that were originally positive and were predicted positive.
The numbers of correct and incorrect predictions were summarized with count values and broken down by class in the confusion matrix.
A geographic information system (GIS) was used to validate the sediment yield results against the field-observed data. This approach can reduce errors and increase the relative accuracy of the model analysis. A precise GIS with extensive functions enables users to store, retrieve, and update basic information in the form of layers or information tables. With this system, human error was minimized, and accurate prediction of sediment production became easier.

3 RESULTS AND DISCUSSION

A total of 1,757 initial images gathered by drone from the identified location points were iterated over by the CNN. Dataset preparation involved resizing the images, assigning the training, test, and validation datasets, and specifying the number of epochs (the number of passes over the training dataset) and the batch size (the number of samples processed per training step). The model was initialized by creating input, processing, and output layers. The 3-channel RGB images were convolved layer by layer through four convolution layers, pooling layers, and 2 dense layers, using ReLU as the hidden-layer activation function and sigmoid as the final output activation function. The number of training epochs was set to 100 and the batch size to 20.

Fig. 5 Sample pre-processed images
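The architecture and training settings reported above (four convolution layers with pooling, 2 dense layers, ReLU hidden activations, a sigmoid output, an optimizer with learning-rate decay, 100 epochs, batch size 20) could look roughly like this in Keras. This is a hedged sketch: the 128x128 input size, filter counts, the Adam optimizer, the `ExponentialDecay` parameters, and the mapping 1 = silted are assumptions, and the helper names `build_silt_classifier`, `train`, and `predict_labels` are hypothetical.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

IMG_SIZE = 128  # assumed; the paper resizes the images but does not state the size


def build_silt_classifier(img_size=IMG_SIZE):
    """Four conv + pooling stages, then 2 dense layers (ReLU hidden, sigmoid output)."""
    model = keras.Sequential([
        keras.Input(shape=(img_size, img_size, 3)),  # 3-channel RGB input
        layers.Rescaling(1.0 / 255),                 # normalize pixels to [0, 1]
        layers.Conv2D(16, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(32, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(64, activation="relu"),
        layers.Dense(1, activation="sigmoid"),  # probability that the image is silted
    ])
    # Learning rate that decays exponentially with the number of training steps.
    schedule = keras.optimizers.schedules.ExponentialDecay(
        initial_learning_rate=1e-3, decay_steps=1000, decay_rate=0.9
    )
    model.compile(
        optimizer=keras.optimizers.Adam(learning_rate=schedule),
        loss="binary_crossentropy",
        metrics=["accuracy"],
    )
    return model


def train(model, x_train, y_train, x_val, y_val, epochs=100, batch_size=20):
    """Fit the model; the returned History holds per-epoch loss and accuracy."""
    return model.fit(
        x_train, y_train,
        validation_data=(x_val, y_val),
        epochs=epochs,
        batch_size=batch_size,
        verbose=0,
    )


def predict_labels(model, images, threshold=0.5):
    """Map sigmoid outputs to class labels (assumed: 1 = silted, 0 = non-silted)."""
    probs = np.asarray(model.predict(images, verbose=0))[:, 0]
    return (probs >= threshold).astype(int)
```

`train(...).history` exposes the per-epoch `loss`, `accuracy`, `val_loss`, and `val_accuracy` values from which accuracy and loss curves can be plotted.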
Keras Conv2D was used as the 2D convolution layer for this model; this layer creates a convolution kernel that is convolved with the layer input to produce a tensor of outputs. Normalizing the photos was the final step in the preparation. For better and more accurate model performance, data augmentation was used to artificially increase the quantity of data by producing additional data points. Data augmentation techniques were used to extend the dataset with additional pictures under different lighting settings and orientations, to help the model generalize, and to improve test and validation accuracy, especially on distorted images. The iterative training process is shown in Fig. 6. Note that as training progresses, the time consumed per epoch decreases. It was also observed that both training accuracy and validation accuracy increased with the number of training epochs. The validation loss is expected to decrease by the last epoch.

Fig. 6 Sample model training process

As the accuracy and loss rate of the model on the silted and non-silted dataset changed during training, a Python visualization tool was used to plot the accuracy and loss curves, as shown in Figure 7.

Fig. 7 Changes in training accuracy and loss rate of the image dataset.

The model developed on the training and test datasets achieved an accuracy of 0.80. The training dataset reached an accuracy of 0.9130, while the test dataset achieved an accuracy of 0.8048. For model loss, the test data showed a higher value than the training data. The accuracy of the model is expected to improve with continued feeding and training of new images.

Fig. 8 Sample model prediction results based on actual results from model testing

Twenty unlabeled images from the identified catchment areas of the Pulangi IV Hydro Power Plant were fed into the trained model. The model correctly labeled 0 non-silted areas; 2 images were incorrectly labeled as silted instead of non-silted.

Fig. 9 Confusion matrix from the trained model

A classification report generated with the sklearn.metrics library showed that the model attained a precision score of 0 for non-silted images, with 0 false negatives, and 0.90 for silted images, with 2 false positives. The model scored 1 and 0.75 on recall, respectively, predicting the 2 non-silted areas incorrectly and all silted areas correctly. The model's F1 scores are 0 for non-silted and 0.95 for silted.

Fig. 10 Model training classification report
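The per-class figures in a classification report can be reproduced by hand from confusion-matrix counts. The sketch below uses plain Python and hypothetical labels (not the paper's test set); `confusion_counts` and `precision_recall_f1` are illustrative helpers, not scikit-learn functions. An undefined ratio (0/0) is reported as 0, mirroring scikit-learn's default behaviour, which also emits a warning in that case.

```python
def confusion_counts(y_true, y_pred, positive):
    """Return (TP, FP, FN, TN), treating `positive` as the positive class."""
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive and actual == positive:
            tp += 1
        elif predicted == positive:
            fp += 1
        elif actual == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def precision_recall_f1(tp, fp, fn):
    """Per-class precision, recall and F1; an undefined ratio (0/0) is reported as 0."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical labels for illustration only:
# 8 silted and 2 non-silted images, all predicted as silted.
y_true = ["silted"] * 8 + ["non-silted"] * 2
y_pred = ["silted"] * 10

for label in ("silted", "non-silted"):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive=label)
    precision, recall, f1 = precision_recall_f1(tp, fp, fn)
    print(f"{label}: P={precision:.2f} R={recall:.2f} F1={f1:.2f}")
```

Under this convention, a class that the model never predicts receives precision, recall, and F1 of 0, while the other class absorbs its misclassifications as false positives.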
Precision is a classifier's ability to avoid labeling a negative instance as positive, and recall is its ability to find all positive instances. F1 scores are often lower than accuracy because they combine precision and recall. The formulas are, respectively:

$$\text{Precision} = \frac{TP}{TP + FP}$$

$$\text{Recall} = \frac{TP}{TP + FN}$$

$$\text{F1 Score} = \frac{2\,(\text{Recall} \times \text{Precision})}{\text{Recall} + \text{Precision}}$$

Although the model and dataset were simple, the model's accuracy was fair, at around _ on a 100-epoch training run. Owing to the model's rapid and lightweight processing, it may be used on devices with limited computational capabilities, including mobile devices, web servers, and Internet-of-Things devices.
The bathymetric and topographic survey conducted by the Pulangui IV Hydroelectric Power Plant Dredging/Survey group can take almost 3 months to complete each year. Because these surveys are the only methods for monitoring silt accumulation, monitoring is limited to that specific period.
The model could benefit the hydroelectric plant by supporting more frequent silt monitoring through aerial imagery. In addition, the team could monitor and determine which areas of the catchment have a greater sediment yield in terms of percentage. Thus, the model could help the Plant's Dredging/Survey group carry out their work in a more time-efficient and effective way.

3.1 Conclusion
In this paper, the application of machine learning for sediment yield calculation at the Pulangi IV Hydroelectric Plant was investigated. A deep learning model was presented that classifies an image dataset from the lower reservoir, power channel, and surge pool of the plant. The following conclusions are drawn from the results. The model was able to classify images from these areas into their respective classes. This could help address the problems stated at the beginning of this study, particularly the challenges of the manual sediment monitoring system practiced by the Plant management, especially in the new normal. The model showed promising results during training and testing: trained on 1,757 initial images of the specified classes (silted and non-silted), it achieved a training accuracy of 0.9130. On new data, it classified the two classes with an accuracy of up to 0.80.

ACKNOWLEDGMENT
The authors would like to express their deep appreciation to Central Mindanao University and the National Power Corporation for the success of this study as a whole. Above all, they thank the Almighty Father, who gave them all the favors for the conduct of this study.

REFERENCES
Abdolahnejad, M., Moocarme, M., & Bhagwat, R. (2019). Applied deep learning with Keras: Solve complex real-life problems with the simplicity of Keras. Packt Publishing.
Arnold, T. B. (2017). kerasR: R interface to the Keras deep learning library. Journal of Open Source Software, 2(14), 296. https://doi.org/10.21105/joss.00296
Deka, P. C. (2020). A primer on machine learning applications in civil engineering. Taylor & Francis Group, LLC.
Fagorite, V. I., Ehujuo, N. N., Okeke, C. A., & Onunkwo, A. A. (2019). Reservoir sedimentation: Causes, effects and mitigation. International Journal of Advanced Academic Research, 5(10). ISSN: 2488-9849.
Manjunath, S. (2021). Convolutional neural network and Keras mixed data input. https://sharathmanjunath.medium.com/convolutional-neural-network-and-keras-mixed-data-input-part-2-by-sharath-manjunath-bb9529d76f89
Morris, G. L., & Fan, J. (1998). Reservoir sedimentation handbook: Design and management of dams, reservoirs, and watersheds for sustainable use. McGraw-Hill Book Co.
Njouya, O., Ako, A., & Zhao, S. (2020). Erosion and sedimentation impact on storage volume of Lom Pangar hydroelectric dam, Cameroon. International Journal of Scientific Research & Engineering Trends, 6(5), 3017. Retrieved from https://ijsret.com/wp-content/uploads/2020/09/IJSRET_V6_issue5_672.pdf
Panondi, W., & Izumi, N. (2021). Climate change impact on the hydrologic regimes and sediment yield of Pulangi river basin (PRB) for watershed sustainability. Sustainability, 13(16), 9041. https://doi.org/10.3390/su13169041
