
DETECTION AND PREDICTION OF

SHORELINE USING MACHINE


LEARNING TECHNIQUES

REPORT OF MINI PROJECT-I


Submitted in partial fulfilment of the requirements for the degree of
BACHELOR OF TECHNOLOGY
IN
CIVIL ENGINEERING
By
Kiran Shripad Hegde (211CV126)
N Ranjith Shetty (211CV130)
Raghav Bajpai (211CV139)
Kartikey Tiwari (211CV225)

Under the guidance of


Dr. Subrahmanya Kundapura

DEPARTMENT OF WATER RESOURCES AND OCEAN


ENGINEERING
NATIONAL INSTITUTE OF TECHNOLOGY KARNATAKA
SURATHKAL, MANGALURU-575 025
NOVEMBER 2023

ACKNOWLEDGEMENT

It's not just about doing things; it is about doing things the right way.

We have tried to get a glimpse of the workings of "DETECTION AND PREDICTION OF SHORELINE USING MACHINE LEARNING TECHNIQUES" and to move higher on the learning curve. This report is the result of work during which we were accompanied and supported by many people, and it is a pleasure to now have the opportunity to express our gratitude to all of them. With immense pleasure, we express our sincere gratitude to our guide and mentor, Dr. Subrahmanya Kundapura, for his flawless guidance, invaluable suggestions and continuous encouragement at all stages of our mini-project work.

DECLARATION

We hereby declare that the Mini Project-I report (WO380) entitled


"DETECTION AND PREDICTION OF SHORELINE USING
MACHINE LEARNING TECHNIQUE" which is being submitted
to the National Institute of Technology Karnataka, Surathkal in
partial fulfillment of the requirement of the award of the degree of
Bachelor of Technology in the Department of Civil Engineering is a
bonafide report of the work carried out by us.

The material contained in this report has not been submitted to any university or institute for the award of any degree.

Member 1: KIRAN SHRIPAD HEGDE (211CV126),

Member 2: N RANJITH SHETTY (211CV130),

Member 3: RAGHAV BAJPAI (211CV139),

Member 4: KARTIKEY TIWARI (211CV225)

Place: Surathkal

Date: 23-11-2023

CERTIFICATE

This is to certify that the content of this project entitled,


“SHORELINE DETECTION AND PREDICTION USING MACHINE LEARNING” by KIRAN SHRIPAD HEGDE (211CV126), N RANJITH SHETTY (211CV130), RAGHAV BAJPAI (211CV139) and KARTIKEY TIWARI (211CV225) is a bonafide work carried out by them under my supervision in the academic year 2023. Based on the declaration made by them, I recommend this project report for evaluation.

Dr. Subrahmanya Kundapura
Faculty of Water Resources Engineering and Mini Project Supervisor
Department of Water Resources and Ocean Engineering
National Institute of Technology Karnataka, Surathkal

Dr. Varija K.
Professor and Head
Department of Water Resources and Ocean Engineering
National Institute of Technology Karnataka, Surathkal

TABLE OF CONTENTS

ACKNOWLEDGEMENT 3

CERTIFICATE 7

1. INTRODUCTION 10

1.1 History 10

1.2 Objective 11

1.3 Methodology 12

2. DATA COLLECTION 13

2.1 Google Earth Engine 13

2.2 USGS and QGIS 14

3. DATA PREPROCESSING 15

3.1 Canny Edge Detection 15

3.2 Edge Point Extraction 21

4. REGRESSION MODEL 24

4.1 Support Vector Regression 24

4.2 Radial Basis Function 26

5. RESULTS AND DISCUSSION 30

6. REFERENCES 31

LIST OF TABLES

Table 1 Report from Ministry of Earth Sciences, 2018 10


Table 2 Data Frame after extracting points 23
Table 3 Data frame after missing data handling 24

LIST OF FIGURES

Fig. 1.1 Report from Ministry of Earth Sciences, 2018 11

Fig. 1.2 Methodology flow chart 13

Fig. 2.1 Google Earth Engine Code Editor 14

Fig. 2.2 Google Earth Engine Timelapse 14

Fig. 3.1.1 a) Original grayscale image b) Image after noise reduction 16

Fig. 3.1.2 a) Image after noise reduction b) Image after non-maximum suppression 18

Fig. 3.1.3 a) Image after non-maximum suppression b) Image after double threshold 19

Fig. 3.1.4 Edge tracking by hysteresis process 20

Fig. 3.1.5 a) Image after double threshold b) Image after edge tracking by hysteresis 20

Fig. 3.1.6 a) Original grayscale image b) Image after Canny function 21

Fig. 3.2.1 Plot of the extracted points 22

Fig. 3.2.2 Plot after removal of duplication 22

Fig. 3.2.3 a) Plot after missing data handling b) Plot of comparative analysis of shoreline 24

Fig. 4.1 Plot of predicted shoreline for 2050 along with the detected shoreline of 2022 28

Fig. 4.2 a) Fit of the regression model for the 450th y-layer b) Plot of each model's mean absolute error vs y-layer 29

1. INTRODUCTION
1.1 HISTORY

Shoreline analysis in Karnataka, a state in southwest India, is a critical field of


study due to the state's extensive coastline, which spans approximately 320
kilometers along the Arabian Sea. This analysis involves a comprehensive
examination of various factors that affect the coastline, including geological
formations, oceanographic conditions, climatic influences, and human activities.

As per the National Assessment of Shoreline Changes along the Indian Coast, the status of Karnataka was reported as below:

Table 1. Report from Ministry of Earth Sciences, 2018

The report highlighting severe erosion along the Karnataka coastline indicates a
critical environmental and socio-economic issue facing this region. Karnataka's
coastline, with its diverse landscapes and ecological systems, is experiencing
significant shoreline changes, primarily attributed to erosion. This erosion is not
uniformly distributed but varies significantly across different regions, with some
areas experiencing more severe impacts.

Shashihithlu Beach near Mulki is one such example, where the verdant beachfront was completely destroyed after the construction of a bridge and a fishing jetty on the Nandini river. It is said that the coffer-bund erected for the bridge construction altered the river's flow into the sea, thereby causing erosion.

Fig. 1.1 Report from Ministry of Earth Sciences, 2018

1.2 OBJECTIVE

a) To collect high-resolution timelapse images (from 1984 to 2022) from USGS through Google Earth Engine or QGIS for the detection and prediction of shoreline movement near Shashihitlu beach.

b) To detect the shoreline in the collected images using the Canny Edge Detection Algorithm.

c) To create a data frame of the shoreline pixels for each year and to handle the missing data using Pandas and NumPy.

d) To predict the shoreline profile for a given year by training Support Vector Regression with a Radial Basis Function kernel, with optimal values for C and γ found by trial and error.

e) To display the predicted shoreline using Matplotlib along with the referred year's actual shoreline.

1.3 METHODOLOGY

Shoreline analysis in geographic and environmental contexts typically involves


assessing the position of the shoreline and its changes over time due to various
natural and anthropogenic factors. This type of analysis is crucial for
understanding coastal dynamics, which can be influenced by erosion, sediment
deposition, sea-level rise, and human activities such as construction and land
reclamation.

In a program designed for shoreline analysis, you would generally follow several
key steps:

1. Data Collection: Gather satellite images, aerial photographs, historical


maps, and other relevant data sources that capture the shoreline at different
time points.

2. Image Pre-processing: Standardize the images by correcting for distortions,


aligning them to a common coordinate system, and enhancing image quality
for better feature extraction.

3. Shoreline Extraction: Identify and delineate the shoreline edge using


image processing techniques, such as edge detection algorithms (e.g., Canny
edge detector), or classification methods (e.g., supervised classification).

4. Shoreline Digitization: Convert the extracted shoreline edge into a vector


format, such as lines or polygons, which can be analyzed in GIS software.

5. Change Detection: Compare shorelines from different periods to identify


areas of change, which could be erosion or accretion. This can be done by
overlaying digitized shorelines or calculating the distance between shorelines
at different times.

6. Statistical Analysis: Use statistical methods to quantify the rate of change,


identify patterns, and assess the significance of the observed changes.

7. Predictive Modeling: Employ models, such as SVR with RBF kernels, to


predict future shoreline positions based on past trends and influencing factors.

8. Interpretation and Application: Analyze the results to draw conclusions
about coastal processes and to inform coastal management decisions, such as
where to fortify against erosion, where to allow natural dynamics, or where to
focus conservation efforts.

9. Visualization and Reporting: Create maps, graphs, and other


visualizations to communicate the findings of the shoreline analysis to
stakeholders, policymakers, and the scientific community.

Fig 1.2 Methodology flow chart

2. DATA COLLECTION

2.1 Google Earth Engine:

Satellite image collection from Google Earth Engine can be a cornerstone for
projects like Shoreline Detection and Prediction. Google Earth Engine (GEE) is a
cloud-based platform for planetary-scale environmental data analysis that
combines a multi-petabyte catalog of satellite imagery and geospatial datasets
with planetary-scale analysis capabilities.

For a project focused on shoreline analysis, GEE offers an extensive archive of


satellite data from various sources like Landsat, Sentinel, and MODIS, which are
regularly updated and available for historical time series analysis. These datasets
can be invaluable for identifying and quantifying changes in shoreline position
over time due to natural and anthropogenic factors.

Fig. 2.1 Google Earth Engine Code Editor

Fig. 2.2 Google Earth Engine Timelapse

2.2 USGS AND QGIS:

The collection of satellite imagery by the USGS is a sophisticated process that


involves various programs and tools. One of the key programs is the Landsat
program, which has been observing Earth's changes for over 50 years. Through
this program, a vast collection of images has been amassed, revealing significant
changes across the planet. The USGS EarthExplorer web application is a primary
tool for accessing this trove of digital data, which includes not just satellite
imagery but also aerial photography and cartographic products. This tool allows

users to download data, such as Landsat imagery, by setting search criteria such
as location, date range, and cloud cover, and then selecting from various data sets.

When it comes to analyzing these satellite images, QGIS, an open-source


geographic information system, is a powerful resource. It enables users to
perform remote sensing, which is the science and technology of identifying,
measuring, and analyzing the characteristics of objects without direct contact. In
QGIS, image classification is a crucial task that helps analyze land use and cover
classes. However, raw satellite images require processing to be fully analyzed.
The initial steps in QGIS analysis involve examining the data to understand how
features are represented, which informs the type of analysis that may be suitable
for the specific data.

3. DATA PREPROCESSING

3.1 CANNY EDGE DETECTION:

The Canny edge detector is an edge detection operator that uses a multi-stage
algorithm to detect a wide range of edges in images. It was developed by John F.
Canny in 1986. Canny also produced a computational theory of edge detection
explaining why the technique works.

The Canny edge detection algorithm is composed of 5 steps:

3.1.1 Noise reduction:

One way to get rid of the noise in the image is to apply a Gaussian blur to smooth it. To do so, the image convolution technique is applied with a Gaussian kernel (3×3, 5×5, 7×7, etc.). The kernel size depends on the expected blurring effect: the smaller the kernel, the less visible the blur. In our example, we will use a 5×5 Gaussian kernel.

The equation for a Gaussian filter kernel of size (2k+1)×(2k+1) is given by:
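A commonly used form of this kernel is:

H(i, j) = (1 / (2πσ²)) exp( −[ (i − (k+1))² + (j − (k+1))² ] / (2σ²) ), for 1 ≤ i, j ≤ 2k+1

As a hedged illustration (not the project's exact code), such a kernel can be built and applied as sketched below; the file name, kernel size and σ are assumptions, and cv2.GaussianBlur is the equivalent built-in call:

    # Build a normalized Gaussian kernel and blur the grayscale image with OpenCV.
    import cv2
    import numpy as np

    def gaussian_kernel(size=5, sigma=1.0):
        """Return a normalized (size x size) Gaussian kernel."""
        k = size // 2
        y, x = np.mgrid[-k:k + 1, -k:k + 1]
        kernel = np.exp(-(x**2 + y**2) / (2.0 * sigma**2)) / (2.0 * np.pi * sigma**2)
        return kernel / kernel.sum()

    img = cv2.imread("shoreline_1984.png", cv2.IMREAD_GRAYSCALE)  # hypothetical file name
    blurred = cv2.GaussianBlur(img, (5, 5), 1.0)                  # equivalent built-in call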

Fig. 3.1.1 a) Original Grayscale image b) Image after noise
reduction

3.1.2 Gradient Calculation

The Gradient calculation step detects the edge intensity and direction by
calculating the gradient of the image using edge detection operators.

Edges correspond to a change in pixel intensity. The easiest way to detect this is to apply filters that highlight the intensity change in both directions: horizontal (x) and vertical (y).

When the image is smoothed, the derivatives Ix and Iy w.r.t. x and y are
calculated. It can be implemented by convolving I with Sobel kernels Kx and Ky,
respectively:

Then, the magnitude G and the slope θ of the gradient are calculated as G = √(Ix² + Iy²) and θ = arctan(Iy / Ix).

The gradient intensity level lies between 0 and 255 and is not uniform. The edges in the final result should all have the same intensity (i.e., white pixels = 255).
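A minimal sketch of this gradient step with OpenCV's Sobel operator, continuing from the blurred image of the previous sketch (variable names are assumptions), is:

    # Gradient magnitude and direction via Sobel kernels.
    # Kx = [[-1,0,1],[-2,0,2],[-1,0,1]] and Ky (its transpose) are applied by cv2.Sobel.
    import cv2
    import numpy as np

    Ix = cv2.Sobel(blurred, cv2.CV_64F, 1, 0, ksize=3)  # derivative w.r.t. x
    Iy = cv2.Sobel(blurred, cv2.CV_64F, 0, 1, ksize=3)  # derivative w.r.t. y

    G = np.hypot(Ix, Iy)          # magnitude: sqrt(Ix**2 + Iy**2)
    G = G / G.max() * 255         # rescale so the strongest edge maps to 255
    theta = np.arctan2(Iy, Ix)    # gradient direction (slope)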

3.1.3 Non-Maximum Suppression:

Ideally, the final image should have thin edges. Thus, we must perform non-
maximum suppression to thin out the edges.

The principle is simple: the algorithm goes through all the points on the gradient
intensity matrix and finds the pixels with the maximum value in the edge
directions.
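A minimal sketch of this suppression step, assuming the magnitude G and direction theta from the gradient sketch above, is:

    # Non-maximum suppression: keep a pixel only if it is a local maximum
    # along the (quantized) gradient direction.
    import numpy as np

    def non_max_suppression(G, theta):
        H, W = G.shape
        out = np.zeros_like(G)
        angle = np.rad2deg(theta) % 180                 # fold directions into [0, 180)
        for i in range(1, H - 1):
            for j in range(1, W - 1):
                a = angle[i, j]
                if a < 22.5 or a >= 157.5:              # ~0 degrees: compare left/right
                    nbrs = (G[i, j - 1], G[i, j + 1])
                elif a < 67.5:                          # ~45 degrees: one diagonal pair
                    nbrs = (G[i + 1, j - 1], G[i - 1, j + 1])
                elif a < 112.5:                         # ~90 degrees: compare up/down
                    nbrs = (G[i - 1, j], G[i + 1, j])
                else:                                   # ~135 degrees: other diagonal pair
                    nbrs = (G[i - 1, j - 1], G[i + 1, j + 1])
                if G[i, j] >= max(nbrs):                # keep only local maxima
                    out[i, j] = G[i, j]
        return out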

Fig.3.1.2 a) Image after noise reduction b) Image after Non Maximum
Suppression

3.1.4 Double threshold:

The double threshold step aims at identifying 3 kinds of pixels: strong, weak, and
non-relevant:

Strong pixels are pixels that have an intensity so high that we are sure they
contribute to the final edge.

Weak pixels are pixels whose intensity value is not high enough to be considered strong, yet not small enough to be considered non-relevant for edge detection.

Other pixels are considered non-relevant to the edge. This is what the two thresholds are used for:

The high threshold is used to identify the strong pixels (intensity higher than the high threshold).

The low threshold is used to identify the non-relevant pixels (intensity lower than the low threshold).

All pixels having intensity between both thresholds are flagged as weak and the
Hysteresis mechanism (next step) will help us identify the ones that could be
considered as strong and the ones that are considered non-relevant.
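A minimal sketch of this classification, where the ratio values and the weak-pixel marker are assumptions, is:

    # Double threshold: label strong, weak and non-relevant pixels.
    import numpy as np

    def double_threshold(img, low_ratio=0.05, high_ratio=0.15, strong=255, weak=75):
        high = img.max() * high_ratio
        low = img.max() * low_ratio
        out = np.zeros_like(img, dtype=np.uint8)
        out[img >= high] = strong                  # strong: certainly part of an edge
        out[(img >= low) & (img < high)] = weak    # weak: decided later by hysteresis
        return out                                 # remaining pixels stay 0 (non-relevant)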

Fig. 3.1.3 a) Image after Non-Maximum Suppression b) Image after Double
threshold

3.1.5 Edge Tracking by Hysteresis:

Based on the threshold results, the hysteresis consists of transforming weak


pixels into strong ones, if and only if at least one of the pixels around the one
being processed is a strong one, as described below:
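Complementing the figure below, a simplified single-pass sketch of this promotion rule (a full implementation would propagate the decision iteratively) is:

    # Edge tracking by hysteresis: promote a weak pixel if any 8-neighbour is strong.
    import numpy as np

    def hysteresis(img, strong=255, weak=75):
        out = img.copy()
        H, W = out.shape
        for i in range(1, H - 1):
            for j in range(1, W - 1):
                if out[i, j] == weak:
                    if (out[i - 1:i + 2, j - 1:j + 2] == strong).any():
                        out[i, j] = strong   # connected to a strong pixel: keep it
                    else:
                        out[i, j] = 0        # isolated weak pixel: discard it
        return out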

Fig.3.1.4 Edge Tracking by Hysteresis Process

Fig. 3.1.5 a) Image after Double threshold b) Image after Edge Tracking by
Hysteresis process

All of this can be achieved using the built-in OpenCV function Canny(image, threshold1, threshold2).

With it we can regulate the features in the detected edge image; for the further discussion we will be using this function, as shown below:
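A minimal usage sketch (the threshold values here are placeholders; in the project they were tuned manually, and the file name is hypothetical):

    # One-call OpenCV equivalent of the five steps described above.
    import cv2

    gray = cv2.imread("shoreline_1984.png", cv2.IMREAD_GRAYSCALE)  # hypothetical file name
    edge = cv2.Canny(gray, 50, 150)  # threshold1=50, threshold2=150 (assumed values)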

Fig.3.1.6 a) Original Grayscale Image b) Image after Canny function

3.2 EDGE POINT EXTRACTION:

3.2.1 White pixels’ point extraction

The result of the edge detection is a grayscale image where the edges are
highlighted. In this image, the pixel value 255 (white color) signifies an edge
point, while other values indicate non-edge areas.

numpy.argwhere Function: The numpy.argwhere function is a NumPy library


function that finds the indices of array elements that are non-zero, in this case, the
elements that are equal to 255. When applied to the grayscale edge image,
np.argwhere(edge == 255) returns a two-dimensional array of coordinates where
each row represents the position [row, column] of an edge point.

Extracting Edge Points: The edge_points array now contains the coordinates of
all edge pixels detected in the grayscale image. For example, if the edge array
represents an image where the detected shorelines are marked with the value 255,
edge_points will contain the coordinates of all shoreline points.
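A short sketch of this extraction and the corresponding scatter plot, continuing from the edge image produced by cv2.Canny above:

    # Extract (row, column) coordinates of white edge pixels and plot them.
    import numpy as np
    import matplotlib.pyplot as plt

    edge_points = np.argwhere(edge == 255)     # shape (N, 2): [row, column] per edge pixel
    rows, cols = edge_points[:, 0], edge_points[:, 1]

    plt.scatter(cols, rows, s=1)
    plt.gca().invert_yaxis()                   # image row 0 is at the top
    plt.show()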

Fig. 3.2.1 plot of the extracted points

3.2.2 Duplication removal in the DataFrame:

The code drops duplicates based on the 'Axis' column and then fills NaN values.
It’s unclear why there would be NaN values that need filling after dropping
duplicates, as np.argwhere should not produce NaNs. This step might be
redundant unless there’s a specific reason for it that’s not apparent from the
snippet.
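A hedged sketch of this de-duplication step, assuming a frame with an 'Axis' column (the y-pixel) and one x-pixel column per year, and reusing rows and cols from the extraction sketch above:

    # Keep a single shoreline point per y-layer ('Axis') and index by it.
    import pandas as pd

    ndf = pd.DataFrame({"Axis": rows, 2022: cols})          # year column name is an assumption
    ndf = ndf.drop_duplicates(subset="Axis", keep="first")  # one x-pixel per y-layer
    ndf = ndf.set_index("Axis").sort_index()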

Fig. 3.2.2 Plot after removal of duplication

3.2.3 DataFrame Concatenation:

The line df = pd.concat([df, ndf[year]], axis=1) concatenates the new data with an existing DataFrame df. If df is not initialized before the loop, this raises an error, so it must be created (for example as an empty DataFrame) beforehand. Since the loop processes images from multiple years, each iteration adds a new column for that year. There must also be a mechanism to align the rows between df and ndf[year], because the number of edge points can vary between images; indexing each yearly frame consistently handles this, as shown in the sketch below.
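One way this can be sketched, with the year range and the extract_shoreline helper as assumptions, is to index each yearly frame by 'Axis' so that pandas aligns rows automatically:

    # Accumulate one column per year; index alignment handles differing point counts.
    import pandas as pd

    df = pd.DataFrame()                      # initialise before the loop to avoid a NameError
    for year in range(1984, 2023):           # assumed range of available images
        ndf = extract_shoreline(year)        # hypothetical helper returning an 'Axis'-indexed frame
        df = pd.concat([df, ndf[year]], axis=1)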

Table 2 Data Frame after extracting points

3.2.4 Handling Missing Data:

Forward Fill: Forward fill is a method to fill missing values (NaNs) in time series data or any ordered sequence. In pandas, this is done using the fillna method with the method='ffill' argument (or the DataFrame.ffill method), which propagates the last valid observation forward to the next valid one.

In the context of the shoreline detection program, where each row represents a point along the shoreline for a given year, forward fill carries the last observed data point forward across any gap.

Backward Fill: The same fillna method also supports backward filling, via the method='bfill' argument (or the DataFrame.bfill method).

Backfill will scan from top to bottom and, for each NaN value, it will look ahead
to find the next non-null value and use it to fill the NaN. If there is no valid
observation after the missing value (i.e., at the end of the dataset), the missing
value will remain as NaN.
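A minimal sketch of both fills applied to the per-year data frame df built in the previous step:

    # Handle missing data: forward fill first, then backward fill any leading NaNs.
    filled = df.ffill()      # same effect as fillna(method='ffill')
    filled = filled.bfill()  # same effect as fillna(method='bfill')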

Table 3 Data frame after missing data handling

Fig. 3.2.3 a) plot after missing data handling b) plot of comparative analysis of
shoreline

4. REGRESSION MODEL

4.1 SVR (SUPPORT VECTOR REGRESSION):

Support Vector Regression (SVR) is a type of Support Vector Machine (SVM)


that is used for regression problems. It operates by finding a function that
approximates the relationship between the independent variables (features) and
the dependent variable (target) as accurately as possible.

4.1.1 General Concept of SVR:

SVR seeks to find a function f(x) such that the deviation from the actual targets yi for the training data is minimized within a certain threshold ϵ. For any given data point xi, the predicted value f(xi) should not deviate from the actual target yi by
more than ϵ. This is often visualized as fitting a tube or "epsilon-insensitive tube"
around the regression function and trying to contain as many data points within
this tube.

4.1.2 Mathematical Formulation:

The basic form of the SVR function is:

f(x) = ⟨w, x⟩ + b

Where ⟨w,x⟩ represents the dot product between the weights vector w and the
feature vector x, and b is the bias.

To train an SVR model, we solve the following optimization problem:

a) Minimize ½‖w‖² to find the flattest possible tube.

b) Subject to the constraints that for each data point (xi,yi):

yi − ⟨w, xi⟩ − b ≤ ϵ

⟨w, xi⟩ + b − yi ≤ ϵ

These constraints ensure that predictions lie within the ϵ-insensitive tube.

For non-linear relationships, SVR uses kernel functions to map the input features
into higher-dimensional space where it is possible to find a linear separating
hyperplane. Common kernels include the linear, polynomial, and radial basis
function (RBF) or Gaussian kernel.

4.1.3 Shoreline Prediction Using SVR:

In the context of shoreline prediction, SVR can be used to understand and predict
the changing position of a shoreline over time. The independent variables might
include factors such as time, tide levels, wave energy, sediment supply, and other
relevant features. The dependent variable would be the shoreline position.

By applying SVR, you can model the relationship between these factors and the shoreline movement, potentially allowing for predictions about future shoreline
positions based on trends learned from past data. This could be particularly useful
for predicting the effects of coastal erosion or the impact of climate change on
sea-level rise.

The actual implementation of SVR for shoreline prediction would require:

a) Feature Extraction: Determining which variables significantly impact


shoreline changes.

b) Model Training: Using historical data to train the SVR model.

c) Prediction: Applying the trained model to new data to predict future shoreline
positions.

The SVR method provides a robust way to handle non-linear relationships and
can perform well with small to medium-sized datasets, which makes it a suitable
choice for this kind of geospatial analysis.

4.2 Radial Basis Function (RBF)

The Radial Basis Function (RBF) kernel is a popular kernel function used in
various machine learning algorithms, including Support Vector Regression
(SVR). It is particularly used when the relationship between the dependent and
independent variables is non-linear.

4.2.1 RBF Kernel in SVR:

In the context of SVR, the RBF kernel transforms the input feature space into a
higher-dimensional space where it is easier to find a linear separation (or in the
case of regression, a linear fit). This is done without the need to compute the
coordinates of the data in that higher-dimensional space, thanks to the kernel
trick.

The RBF kernel function is defined as:

K(xi, xj) = exp( −γ ||xi − xj||² )

where:

• K(xi, xj) is the kernel function, computing the similarity between two points xi and xj in the input space.

• γ is a parameter that defines how much influence a single training example has. The larger γ is, the closer other examples must be to affect the model.

• ||xi − xj||² is the squared Euclidean distance between the two feature vectors.

General Explanation of RBF:

The RBF kernel is a measure of similarity or closeness between two points. The
idea is that points that are close to each other in the input space will have a high
kernel value, and points that are far apart will have a low kernel value.

One of the main advantages of the RBF kernel is its ability to handle cases where
the relationship between class labels and attributes is nonlinear. The RBF kernel
can map an input space in infinite-dimensional space, which provides a lot of
flexibility for the model to fit the data.

A key aspect of using the RBF kernel in any SVM model is the setting of the γ
parameter. If γ is too large, the radius of the area of influence of the support
vectors only includes the support vector itself and can lead to over-fitting (i.e.,
the decision boundary becomes too specific to the training data). If γ is too small,
the model might become too constrained and cannot capture the complexity or
"shape" of the data, leading to under-fitting.

In the specific application to shoreline detection and prediction, the RBF kernel
could help the SVR model capture complex, non-linear relationships between the
predictors (like time, tidal patterns, human activity, sediment characteristics) and
the shoreline movement, facilitating more accurate predictions that could inform
coastal management and planning strategies.

In total, 672 SVR models with the RBF kernel (cost C = 100 and γ = 0.5) were trained, one for each y-layer of pixels. The training used the year as the independent variable and the x-pixel of the particular y-layer as the dependent variable.

The prediction function takes a year as input and computes the expected x-pixel position for each y-layer in that year.
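A hedged sketch of this per-y-layer training and prediction, assuming the filled data frame from Section 3 (rows indexed by y-layer, one column per year); the variable names are illustrative:

    # One SVR (RBF kernel, C=100, gamma=0.5) per y-layer, trained on year -> x-pixel.
    import numpy as np
    from sklearn.svm import SVR

    X_years = filled.columns.to_numpy(dtype=float).reshape(-1, 1)  # independent variable: year
    models = {}
    for y_layer, x_pixels in filled.iterrows():                    # one model per y-layer row
        model = SVR(kernel="rbf", C=100, gamma=0.5)
        model.fit(X_years, x_pixels.to_numpy(dtype=float))
        models[y_layer] = model

    def predict_shoreline(year):
        """Predicted x-pixel for every y-layer in the given year."""
        return {y: m.predict([[year]])[0] for y, m in models.items()}

    shoreline_2050 = predict_shoreline(2050)   # e.g. the 2050 profile shown in Fig 4.1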

Fig 4.1 Plot of predicted shoreline for 2050 along with the detected shoreline of
2022

4.2.2 Evaluation Metric:

Mean Absolute Error (MAE)

It calculates the absolute differences between the predicted and actual values and averages them. It gives us an idea of how wrong our predictions were: the higher the MAE, the worse the model fits, so a lower MAE is desirable.

Given a set of n observations (x1, y1), (x2, y2), (x3, y3) … (xn, yn), where xi represents the feature vector of the i-th observation and yi is the actual value of the i-th observation, and a predictive model F, the predicted value for the i-th observation is F(xi) and the MAE is (1/n) Σ |yi − F(xi)|.
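As a hedged illustration, the per-model MAE (and its average, reported in the results) can be computed with scikit-learn, reusing filled, models and X_years from the sketch above:

    # Mean absolute error of each y-layer model on the training years.
    import numpy as np
    from sklearn.metrics import mean_absolute_error

    maes = []
    for y_layer, x_pixels in filled.iterrows():
        pred = models[y_layer].predict(X_years)
        maes.append(mean_absolute_error(x_pixels.to_numpy(dtype=float), pred))

    print("average MAE over all y-layers:", np.mean(maes))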

Fig 4.2 a) Fit of the regression model for 450th y-layer b) plot of all model mean
absolute error vs y-layer

5. RESULTS AND CONCLUSIONS

a) The satellite images for the period 1984 to 2022 were collected from Google Earth Engine, Google Earth Engine Timelapse and USGS/QGIS. The satellite images collected from USGS had more noise compared to Earth Engine, and hence the images taken from Google Earth Engine were used for the further procedures.

b) The Canny Edge Detection Algorithm, taken from the OpenCV library, was able to successfully detect the edges in the given images. The thresholds were adjusted manually to obtain the detection with less noise.

c) In-built functions of the Pandas and NumPy libraries were used to extract the pixel points, remove duplicates and handle missing data. The data frame containing the shoreline profile points for each year was successfully created.

d) The created data frame was used to train the SVR with the RBF kernel. The SVR was imported from the scikit-learn library. C and gamma were adjusted by the trial and error method to fit the curve to the data points.

e) The small number of data points, together with the noise still present in the satellite images from 2010 to 2013, led the models to have varying MAE values, as shown in the figure. The average of the calculated MAEs was 5.93.

f) The predicted shoreline was plotted along with the given input satellite image. This plot provided a comparative overview of the shoreline variation.

6. REFERENCES

i. Kankara, R. S., Ramana Murthy, M. V., & Rajeevan, M. (2018). National Assessment of Shoreline Changes along Indian Coast. Ministry of Earth Sciences, National Centre for Coastal Research, Chennai-600100, July 2018.

ii. Wen, Shiyong, et al. (2019). Coastal Erosion Monitoring and Hazard Degree Assessment at Penglai Sandy Coast Based on Remote Sensing. IOP Conference Series: Earth and Environmental Science.

iii. Tsiakos, Chrysovalantis-Antonios D., & Chalkias, Christos. Use of Machine Learning and Remote Sensing Techniques for Shoreline Monitoring: A Review of Recent Literature. Department of Geography, Harokopio University of Athens, 17676 Kallithea, Greece.

iv. Kumar, Lalit, Afzal, Mohammad Saud, & Afzal, Mohammad Mashhood. Mapping Shoreline Change Using Machine Learning: A Case Study from the Eastern Indian Coast.

v. Akoum, A., & Al Mawla, N. (2015). Hand gesture recognition approach for ASL language using hand extraction algorithm. Journal of Software Engineering and Applications, 8(08), 419.

vi. Documentation from OpenCV and scikit-learn.
