
A Project Report on
Pile Driveability Prediction Using Machine Learning

LIST OF FIGURES

FIGURE-1 FLOW CHART OF MACHINE LEARNING MODEL
FIGURE-2 LINEAR REGRESSION
FIGURE-3 MODEL-1 SIMPLE LINEAR REGRESSION
FIGURE-4 MODEL-2 MODIFIED LINEAR REGRESSION
FIGURE-5 MODEL-3 MODIFIED LINEAR REGRESSION
FIGURE-6 SUPPORT VECTOR MACHINE
FIGURE-7 ARTIFICIAL NEURAL NETWORK
FIGURE-8 VARIATION OF DEPTH
FIGURE-9 VARIATION OF BLOWCOUNTS
FIGURE-10 VARIATION OF BLOWCOUNTS VS DEPTH
FIGURE-11 VARIATION OF BLOWCOUNTS VS qc
FIGURE-12 VARIATION OF BLOWCOUNTS VS fs
FIGURE-13 VARIATION OF BLOWCOUNTS VS u2
FIGURE-14 VARIATION OF BLOWCOUNTS VS NORMALIZED ENTHRU [-]
FIGURE-15 VARIATION OF BLOWCOUNTS VS NORMALIZED HAMMER ENERGY [-]
FIGURE-16 VARIATION OF BLOWCOUNTS VS NUMBER OF BLOWS
FIGURE-17 VARIATION OF BLOWCOUNTS VS DIAMETER
FIGURE-18 VARIATION OF BLOWCOUNTS VS BOTTOM WALL THICKNESS
FIGURE-19 VARIATION OF BLOWCOUNTS VS PILE PENETRATION
FIGURE-20 DEPLOYMENT
FIGURE-21 BARPLOTS OF MODELS

LIST OF TABLES

TABLE-1 ACTUAL AND PREDICTED VALUES OF BLOW COUNTS ON RANDOM VALUES
TABLE-2 RANGES OF VALUES IN GIVEN DATASET
TABLE-3 SAMPLE DATA
TABLE-4 CORRELATION OF BLOWCOUNTS WITH ALL OTHER PARAMETERS
TABLE-5 RESULTS - ALGORITHMS AND THEIR RESPECTIVE ACCURACIES IN TERMS OF R2 SCORE

CHAPTER 1
INTRODUCTION
Pile driveability prediction is the process of estimating the resistance that a pile will face during
installation. It involves the use of analytical approaches, numerical models, and empirical data to
determine the capacity of the pile to withstand various loads during installation.
Pile driveability prediction provides useful information for the design of pile foundations, enabling
engineers to select appropriate pile types, sizes, and installation methods. Additionally, it can help
avoid potential problems during pile installation, such as pile damage or excessive driving stresses,
which could lead to structural issues or costly repairs. Pile driveability prediction is an essential
element in the design and construction of effective and efficient pile foundations.
Pile blow counts are a common measure of soil resistance during pile driving. They are determined by
counting the number of blows required to achieve a certain depth of pile penetration. Accurate
predictions of pile blow counts can provide valuable information for pile design and optimization.
However, accurately predicting pile blow counts is challenging due to the complex interactions
between soil properties, pile geometry, and hammer characteristics.
Pile driveability prediction is a critical aspect of geotechnical engineering that determines the
feasibility and efficiency of pile construction projects. This review paper highlights some of the recent
studies on pile driveability prediction techniques and tools. One of the earliest and most widely used
methods for pile driveability prediction is the wave equation analysis (WEA). Various studies have
focused on the development of WEA-based tools that can accurately predict pile driveability
parameters such as blow count, penetration rate, and driving stresses. For example, Ghazavi et al.
(2014) developed a user-friendly WEA-based algorithm that uses field measurements to estimate pile
capacity and soil properties. Similarly, Wang et al. (2016) proposed an iterative WEA method that
incorporates soil resistance, pile length, and hammer performance to predict pile driveability
parameters.
Pile driveability prediction models are used in a wide range of engineering applications, including
foundation design, construction planning, quality control, environmental impact assessment, risk
assessment, and cost estimation.

Chapter 2
Literature Review

1. Sang et al. (2020) “Prediction of pile blow counts using artificial neural networks”
developed an Artificial Neural Network (ANN) model to predict pile blow-counts using soil
properties, pile geometry, and hammer characteristics as input variables. The authors aimed
to improve the accuracy of pile blow count predictions and reduce the cost and time
associated with field tests.
Background - Pile blow counts are a common measure of soil resistance during pile driving.
They are determined by counting the number of blows required to achieve a certain depth of
pile penetration. Accurate predictions of pile blow counts can provide valuable information
for pile design and optimization. However, accurately predicting pile blow counts is
challenging due to the complex interactions between soil properties, pile geometry, and
hammer characteristics.
Methodology - Sang et al. (2020) used an ANN model to predict pile blow counts based on a
dataset of field test data collected from 15 different pile driving projects. The dataset included
soil properties, pile geometry, and hammer characteristics, as well as pile blow counts
measured during field tests.
The ANN model was constructed using three hidden layers with 10, 8, and 4 neurons,
respectively. The model was trained using back-propagation and Levenberg-Marquardt
algorithms. The authors used a root mean square error (RMSE) and correlation coefficient
(R) to evaluate the performance of the ANN model.
Results - The results of the study showed that the ANN model could accurately predict pile
blow counts using soil properties, pile geometry, and hammer characteristics as input
variables. The RMSE and R values were 1.89 and 0.912, respectively, which indicated good
agreement between the predicted and observed pile blow counts. The authors also conducted
sensitivity analysis to identify the most significant input variables in the ANN model. The
results showed that soil strength and pile diameter were the most significant input variables,
while hammer energy was found to have a relatively minor effect on pile blow count
predictions.
Conclusion - The ANN model developed by Sang et al. (2020) provided an accurate and
cost-effective method for predicting pile blow counts using soil properties, pile geometry, and
hammer characteristics as input variables. The model could potentially reduce the number of
field tests required for pile design and optimization. The authors suggested that future
research should focus on improving the accuracy of the ANN model by incorporating
additional input variables and expanding the dataset used for the model training.

2. Wang et al. (2016) “An iterative WEA method for predicting pile drivability
parameters considering soil resistance, pile length and hammer performance” proposed
an iterative WEA method that incorporates soil resistance, pile length, and hammer
performance to predict pile drivability parameters.
Background - In traditional pile driving analysis, soil resistance is often assumed to be
constant along the length of the pile, which can result in inaccurate predictions of pile
drivability parameters such as pile stress, acceleration, and velocity. Moreover, pile length
and hammer performance are also important factors that can significantly affect pile
drivability. However, these factors are often neglected or simplified in traditional analyses,
which can lead to poor design and construction decisions. To address these issues, Wang et
al. proposed an iterative WEA method that incorporates soil resistance, pile length, and
hammer performance in a more comprehensive manner. The method involves an iterative
process that adjusts the pile length, hammer performance, and soil resistance until the
predicted pile drivability parameters match the measured values.
Methodology - The iterative process starts with an initial estimate of pile length, hammer
performance, and soil resistance. The WEA model is then used to predict the pile drivability
parameters based on the initial input parameters. The predicted parameters are then compared
with the measured values, and the differences between them are used to update the input
parameters. This process is repeated until the predicted and measured parameters converge to
within a specified tolerance. The proposed method was validated through a case study
involving driven piles at a construction site in China. The results showed that the iterative
WEA method was able to accurately predict pile drivability parameters and provide a better
understanding of the driving process. Moreover, the proposed method can be applied to
different soil types and pile lengths, and it can be used to optimize the pile driving process by
selecting the most suitable hammer and pile length.
Conclusion - The proposed method provides a more accurate and comprehensive approach to predicting pile
drivability parameters by considering the effects of soil resistance, pile length, and hammer
performance. The method can be a useful tool for designers and contractors in pile driving
projects, helping them to make more informed decisions and avoid potential construction
problems.

CHAPTER 3
Objectives

1. Aim is to predict the blow counts for piles by building a prediction model using machine
learning techniques.

2. To perform statistical analysis of how depth and blow counts vary with parameters such as cone
tip resistance (qc), blow count (Blows/m), and normalized ENTHRU (-).

3. To improve the accuracy of pile blow counts predictions and reduce the cost and time
associated with field tests.

4. Deployment of machine learning model.

Chapter 4
Methodology
It starts with determining the type of problem, which dictates the machine learning technique to be
used (classification, regression, etc.); in our case, it is a regression use case.
Model training is then performed to improve performance and to identify patterns, rules, and features.
Finally, model testing determines the accuracy of the model.

Figure-1 Flow chart of machine learning process

In this project, we first use a simple linear regression machine learning model.

Linear regression is a machine learning algorithm based on supervised learning. It performs a
regression task: regression models a target prediction value based on independent variables, and it is
mostly used for finding the relationship between variables and for forecasting. Different regression
models differ in the kind of relationship between the dependent and independent variables they
consider, and in the number of independent variables used. There are many names for a regression's
dependent variable: it may be called an outcome variable, criterion variable, endogenous variable, or
regressand. The independent variables can be called exogenous variables, predictor variables, or
regressors.

Figure-2 Linear Regression

Linear regression performs the task of predicting a dependent variable value (y) based on a given
independent variable (x); hence the name linear regression. In the figure above, X (input) is the
work experience and Y (output) is the salary of a person. The regression line is the best-fit line for
our model.
Hypothesis function for linear regression:

y = a0 + a1 · x

where a0 is the intercept and a1 the slope of the line.

Model 1: Linear model based on normalized ENTHRU only


The simplest linear model depends on only one feature. We can select the normalized energy
transmitted to the pile (ENTHRU) as the only feature for illustration purposes:

Blowcount = a0 + a1 · ENTHRU_normalized + ε

Figure-3 Model-1 simple linear regression
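A minimal sketch of fitting Model 1 with scikit-learn is given below. The file name training_data.csv and the exact column headers are assumptions based on the data dictionary in Chapter 5; adjust them to the actual dataset.

import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Assumed file and column names; adjust to the actual dataset headers
df = pd.read_csv("training_data.csv").dropna()
X = df[["Normalized ENTHRU [-]"]]          # single feature
y = df["Blowcount [Blows/m]"]

# Fit Blowcount = a0 + a1 * ENTHRU_normalized
model = LinearRegression().fit(X, y)
print(f"a0 = {model.intercept_:.2f}, a1 = {model.coef_[0]:.2f}")
print(f"R2 score: {r2_score(y, model.predict(X)):.3f}")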

Model 2: ENTHRU feature transformed with a hyperbolic tangent (tanh) function

ENTHRU_modified = tanh(5 · ENTHRU_normalized − 0.5)

Blowcount = a0 + a1 · ENTHRU_modified + ε

Figure-4 Model-2 modified linear regression
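The same fit with the tanh-transformed feature; a sketch under the same assumptions as the Model 1 sketch above:

import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

df = pd.read_csv("training_data.csv").dropna()     # assumed file name
# Hyperbolic tangent transformation of the normalized ENTHRU feature
df["ENTHRU modified"] = np.tanh(5.0 * df["Normalized ENTHRU [-]"] - 0.5)

model = LinearRegression().fit(df[["ENTHRU modified"]], df["Blowcount [Blows/m]"])
pred = model.predict(df[["ENTHRU modified"]])
print(f"R2 score: {r2_score(df['Blowcount [Blows/m]'], pred):.3f}")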

Model 3: Linear model based on shaft resistance, assuming proportionality between the cone
resistance and the unit shaft friction (fs = α · qc)

We know from engineering considerations on the pile driving problem that the soil resistance to
driving (SRD) can be expressed as the sum of shaft friction and end bearing resistance. The shaft
friction can be expressed as the integral of the unit shaft friction over the pile circumference and
length. If we make the simplifying assumption that there is a proportionality between the cone
resistance and the unit shaft friction (fs = α · qc), we can write the shaft resistance as:

Rs = π · D · ∫ α · qc dz

The blow count is then modeled as:

Blowcount = 85 · tanh(Rs / 1000 − 0.5)

Figure-5 Model-3 modified linear regression
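A sketch of building the shaft-resistance feature for a single location using cumulative trapezoidal integration. The file name, the α value, and the unit handling are illustrative assumptions:

import numpy as np
import pandas as pd
from scipy.integrate import cumulative_trapezoid

df = pd.read_csv("training_data.csv").dropna()        # assumed file name
# Work with one pile location, sorted by depth
loc = df[df["Location ID"] == df["Location ID"].iloc[0]].sort_values("z [m]")

alpha = 0.02   # assumed proportionality constant in fs = alpha * qc
# Rs(z) = integral over depth of (alpha * qc) * pi * D, i.e. unit shaft
# friction times pile circumference, accumulated from the mudline down
integrand = alpha * loc["qc [MPa]"].to_numpy() * np.pi * loc["Diameter [m]"].to_numpy()
Rs = cumulative_trapezoid(integrand, loc["z [m]"].to_numpy(), initial=0.0)

# Model 3 (depending on units, Rs may need a conversion factor in practice)
blowcount_pred = 85.0 * np.tanh(Rs / 1000.0 - 0.5)
print(blowcount_pred[:5])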

Model 4: K-Nearest Neighbors (KNN) regression, using all the parameters of the dataset to train,
test, and make predictions

Introduction:

K-Nearest Neighbors (KNN) is a supervised machine learning algorithm that is used for both
classification and regression problems. In KNN regression, the goal is to predict the continuous output
variable based on the K-nearest neighbors from the training dataset.

Theory:
KNN regression is based on the principle that similar inputs will have similar outputs. The KNN
algorithm first calculates the distance between the test input and all the training inputs. It then selects
the K-nearest neighbors based on the calculated distances. Finally, it computes the weighted average
of the output variables of the K-nearest neighbors to predict the output variable for the test input.

Mathematical Background:
The KNN regression algorithm can be mathematically represented as:

For a new test input X, let D be the distance between X and all the training inputs
(x1, x2, ...,xn). Then, the K-nearest neighbors of X can be calculated as:

KNN(X) = {x1, x2, ..., xk}, where D(X, x1) ≤ D(X, x2) ≤ ... ≤ D(X, xk).

Next, the predicted output variable y for the test input X can be calculated as:

y = (Σwi*yi)/Σwi, where wi = 1/D(X, xi), and yi is the output variable for the training input xi.

Code:
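A sketch with scikit-learn's KNeighborsRegressor; weights="distance" reproduces the 1/D weighting described above. The file name, the column names, and K = 5 are assumptions:

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsRegressor
from sklearn.metrics import r2_score

df = pd.read_csv("training_data.csv").dropna()        # assumed file name
target = "Blowcount [Blows/m]"                        # assumed column names
X = df.drop(columns=[target, "ID", "Location ID"], errors="ignore")
y = df[target]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Scaling matters for distance-based methods; weights="distance" gives
# each neighbour a weight of 1/D(X, xi), as in the formula above
knn = make_pipeline(StandardScaler(),
                    KNeighborsRegressor(n_neighbors=5, weights="distance"))
knn.fit(X_train, y_train)
print(f"R2 score on test set: {r2_score(y_test, knn.predict(X_test)):.3f}")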

Model 5: Decision Tree Regression, using all the parameters of the dataset to train, test, and make
predictions

Introduction:
Decision Tree Regression is a non-parametric supervised learning algorithm used for regression tasks.
It uses a decision tree as a predictive model to map the input features to the target variable. The
decision tree is constructed by recursively splitting the data based on the values of the input features
until the terminal nodes are reached, which contain the predicted target values.

Theory:
The decision tree regression algorithm works by recursively partitioning the input feature space into
smaller regions based on the values of the input features. At each step, the algorithm selects the input
feature that best splits the data into two regions such that the variance of the target variable in each
region is minimized. The splitting criterion can be based on different metrics such as mean squared
error or mean absolute error.

Mathematical Background:
The decision tree regression algorithm is based on the concept of entropy, which is a measure of the
impurity or randomness of a set of data. The entropy of a set S is defined as:

H(S) = − Σ pi · log2(pi)

where pi is the proportion of data points in S that belong to class i. The entropy is maximum when
the classes are equally distributed in S and minimum when all data points in S belong to the same
class. For regression trees, the analogous impurity measure is the variance of the target values in a
node.

The algorithm constructs the decision tree by recursively partitioning the input feature space into
smaller regions based on the entropy of the target variable. At each step, the algorithm selects the
input feature that best splits the data into two regions such that the entropy of the target variable in
each region is minimized. The splitting criterion can be based on different metrics such as mean
squared error or mean absolute error.

Code:
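A sketch with scikit-learn's DecisionTreeRegressor, which uses variance reduction (squared error) as the split criterion; the file name, column names, and depth limit are assumptions:

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import r2_score

df = pd.read_csv("training_data.csv").dropna()        # assumed file name
target = "Blowcount [Blows/m]"                        # assumed column names
X = df.drop(columns=[target, "ID", "Location ID"], errors="ignore")
y = df[target]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# criterion="squared_error" minimizes the variance of the target in each
# split; limiting the depth curbs overfitting of a single tree
tree = DecisionTreeRegressor(criterion="squared_error", max_depth=8, random_state=42)
tree.fit(X_train, y_train)
print(f"R2 score on test set: {r2_score(y_test, tree.predict(X_test)):.3f}")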
Model 6: Random Forest Regression, using all the parameters of the dataset to train, test, and make
predictions

Introduction:
Random Forest Regression is a popular ensemble learning algorithm used for solving regression
problems. It combines multiple decision trees to make predictions, each tree contributing to the final
prediction. Random Forest Regression is a powerful algorithm that can handle both linear and
non-linear data, and it is highly accurate and robust to overfitting.

Theory:
Random Forest Regression works by creating a forest of decision trees. Each decision tree is created
using a random subset of the features and a random subset of the training data. This helps in reducing
overfitting and increasing the accuracy of the model. The final prediction is made by aggregating the
predictions of all the trees in the forest.

Mathematical Background:
The Random Forest Regression algorithm can be mathematically represented as follows:
1. Create a forest of decision trees with n trees.
2. For each tree i in the forest, randomly select a subset of m features from the total set of p
features.
3. Randomly select a subset of n samples from the training dataset to build the decision tree.
4. For each node in the tree, select the best feature among the m features to split the data.
5. Grow the tree until a stopping criterion is reached.
6. Repeat steps 2-5 for all the trees in the forest.
7. To make a prediction for a new input x, pass x through all the trees in the forest and average
their predictions.

Code:
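A sketch with scikit-learn's RandomForestRegressor; the file name, column names, and number of trees are assumptions:

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score

df = pd.read_csv("training_data.csv").dropna()        # assumed file name
target = "Blowcount [Blows/m]"                        # assumed column names
X = df.drop(columns=[target, "ID", "Location ID"], errors="ignore")
y = df[target]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# 200 trees, each grown on a bootstrap sample with random feature subsets;
# the predictions of all trees are averaged
rf = RandomForestRegressor(n_estimators=200, random_state=42)
rf.fit(X_train, y_train)
print(f"R2 score on test set: {r2_score(y_test, rf.predict(X_test)):.3f}")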

Model 7: Gradient Boosting Regression, using all the parameters of the dataset to train, test, and
make predictions

Introduction:
Gradient Boosting Regression is another popular ensemble learning algorithm used for solving
regression problems. It works by combining multiple weak regression models to create a strong
model. Each weak model is added to the ensemble sequentially, with each new model focusing on the
errors of the previous model. Gradient Boosting Regression is known for its high accuracy and ability
to handle complex datasets with many features.

Theory:
Gradient Boosting Regression works by adding new models to the ensemble sequentially, each one
focusing on the errors of the previous model. The final prediction is the sum of the predictions of all
the models in the ensemble. In order to improve the accuracy of the model, each new model is added
in a way that minimizes the errors of the previous models.

Mathematical Background:
The Gradient Boosting Regression algorithm can be mathematically represented as follows:
1. Initialize the prediction to the average of the target variable.
2. For each iteration, compute the negative gradient of the loss function with respect to the
current predictions (for squared loss, these are simply the residuals).
3. Fit a new regression model to the negative gradients.
4. Update the prediction by adding the prediction of the new model multiplied by a learning rate.
5. Repeat steps 2-4 until a stopping criterion is reached.
6. The final prediction is the sum of the predictions of all the models in the ensemble.

Code:
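A sketch with scikit-learn's GradientBoostingRegressor; the file name, column names, and hyperparameter values are assumptions:

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import r2_score

df = pd.read_csv("training_data.csv").dropna()        # assumed file name
target = "Blowcount [Blows/m]"                        # assumed column names
X = df.drop(columns=[target, "ID", "Location ID"], errors="ignore")
y = df[target]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Each shallow tree is fitted to the negative gradient (residuals) of the
# current ensemble; its contribution is shrunk by the learning rate
gbr = GradientBoostingRegressor(n_estimators=300, learning_rate=0.05,
                                max_depth=3, random_state=42)
gbr.fit(X_train, y_train)
print(f"R2 score on test set: {r2_score(y_test, gbr.predict(X_test)):.3f}")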

Model 8: AdaBoost Regression, using all the parameters of the dataset to train, test, and make
predictions

Introduction:
AdaBoost Regression is a popular ensemble learning algorithm used for solving regression problems.
It works by combining multiple weak regression models to create a strong model. Each weak model is
added to the ensemble sequentially, with each new model focusing on the errors of the previous
model. AdaBoost Regression is known for its high accuracy and ability to handle complex datasets
with many features.

Theory:
AdaBoost Regression works by adding new models to the ensemble sequentially, each one focusing
on the errors of the previous model. The final prediction is the sum of the predictions of all the models
in the ensemble. In order to improve the accuracy of the model, each new model is added in a way
that gives more weight to the samples that were not correctly predicted by the previous models.

Mathematical Background:
The AdaBoost Regression algorithm can be mathematically represented as follows:
1. Initialize the weights of all the samples to 1/N, where N is the total number of samples.
2. For each iteration, fit a regression model to the training data using the current weights.
3. Compute the weighted mean squared error of the predictions on the training data.
4. Compute the alpha value, which is a measure of the contribution of the current model to the
final prediction.
5. Update the weights of the samples, giving more weight to the samples that were not correctly
predicted by the current model.
6. Repeat steps 2-5 until a stopping criterion is reached.
7. The final prediction is the sum of the predictions of all the models in the ensemble, weighted
by their alpha values.

Code:
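A sketch with scikit-learn's AdaBoostRegressor; the file name, column names, and hyperparameter values are assumptions:

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import AdaBoostRegressor
from sklearn.metrics import r2_score

df = pd.read_csv("training_data.csv").dropna()        # assumed file name
target = "Blowcount [Blows/m]"                        # assumed column names
X = df.drop(columns=[target, "ID", "Location ID"], errors="ignore")
y = df[target]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Sample weights are increased on poorly predicted rows after each round,
# and each weak model's contribution is weighted by its alpha value
ada = AdaBoostRegressor(n_estimators=200, learning_rate=0.5, random_state=42)
ada.fit(X_train, y_train)
print(f"R2 score on test set: {r2_score(y_test, ada.predict(X_test)):.3f}")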

Model 9: Support Vector Regression (SVR), using all the parameters of the dataset to train, test,
and make predictions

Introduction:
Support Vector Regression (SVR) is a machine learning algorithm used for regression tasks. It is a
variant of Support Vector Machines (SVM) that can handle continuous variables and non-linear
relationships between features and the target variable. SVR is known for its ability to handle
high-dimensional data and its flexibility in defining the shape of the decision boundary.

Theory:
SVR is based on the same principles as SVM, which involves finding a hyperplane that maximally
separates the data into two classes. In SVR, the goal is to find a hyperplane that maximizes the margin
between the data points and a decision boundary. The decision boundary is defined as a function of
the input features, and the goal of the algorithm is to minimize the error between the predicted and
actual target values.
The key idea behind SVR is to transform the input data into a higher-dimensional space using a kernel
function. This allows for non-linear relationships between the input features and the target variable to
be captured. The algorithm then finds the hyperplane that maximizes the margin in this
higher-dimensional space.

Some important definitions:

1. Kernel: a function used to map lower-dimensional data points into a higher-dimensional space. As
SVR performs linear regression in a higher dimension, this function is crucial. There are many types
of kernel, such as the polynomial kernel, Gaussian (RBF) kernel, and sigmoid kernel.

2. Hyperplane: in Support Vector Machines, a hyperplane is the surface used to separate two classes
of data in a dimension higher than the original one. In SVR, the hyperplane is the surface used to
predict the continuous value.

3. Boundary lines: the two lines drawn parallel to the hyperplane at a distance given by the error
threshold ε (epsilon). The boundary line on the positive side is known as the positive hyperplane
and the one on the negative side as the negative hyperplane. These lines create a margin around the
data points.

4. Support vectors: the data points closest to the boundary lines; their distance to the hyperplane
is minimal, and they determine its position.

Mathematical Background:
The mathematical formulation of SVR involves finding a hyperplane in a higher-dimensional space
that fits the data such that deviations between the target variable and the predicted value larger
than a threshold ε are penalized. The optimization problem is as follows:

minimize (1/2) · ||w||² + C · Σ (ξi + ξi*)

subject to:
yi − (w · xi + b) ≤ ε + ξi
(w · xi + b) − yi ≤ ε + ξi*
ξi, ξi* ≥ 0

where w and b define the hyperplane, C is the regularization parameter, and ξi, ξi* are slack
variables for points lying outside the ε-tube.

Figure-6 support vector machine

Code:
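A sketch with scikit-learn's SVR; scaling is essential for kernel methods, and the file name, column names, and C and epsilon values are assumptions:

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR
from sklearn.metrics import r2_score

df = pd.read_csv("training_data.csv").dropna()        # assumed file name
target = "Blowcount [Blows/m]"                        # assumed column names
X = df.drop(columns=[target, "ID", "Location ID"], errors="ignore")
y = df[target]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# The RBF kernel maps the inputs to a higher-dimensional space; C controls
# regularization and epsilon the width of the error tube
svr = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0, epsilon=1.0))
svr.fit(X_train, y_train)
print(f"R2 score on test set: {r2_score(y_test, svr.predict(X_test)):.3f}")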

Model 10: ExtraTrees Regression, using all the parameters of the dataset to train, test, and make
predictions

Introduction:
ExtraTrees regression is a machine learning algorithm used for regression tasks. It is an ensemble
learning method, based on decision trees, where each tree is built from a random sample of the
training data and a random subset of features. The algorithm constructs a forest of decision trees and
combines their predictions to make the final prediction. ExtraTrees regression is known for its
accuracy and ability to handle high-dimensional data.

Theory:

ExtraTrees regression is an ensemble learning method that uses multiple decision trees to make a
prediction. The algorithm constructs a forest of decision trees, where each tree is built from a random
sample of the training data and a random subset of features. The trees are trained independently, and
their predictions are combined to make the final prediction.
The training process involves constructing a large number of decision trees, where each tree is built
using a different subset of the training data and features. The algorithm then aggregates the
predictions of all the trees to make the final prediction. The randomness in the tree construction and
feature selection process helps to reduce overfitting and improve the accuracy of the model.

Mathematical Background:
The ExtraTrees algorithm can be mathematically represented as follows:
 Randomly select n subsets of the training data and m subsets of the features.
 For each subset, build a decision tree using the CART algorithm.
 Aggregate the predictions of all the trees to make the final prediction.
The CART algorithm is a recursive binary splitting algorithm that recursively partitions the data into
subsets based on the values of the features. At each node of the tree, the algorithm selects the feature
and split point that results in the largest reduction in the variance of the target variable. The algorithm
continues to split the data until a stopping criterion is reached, such as reaching a maximum tree depth
or a minimum number of samples in each leaf node.

Code:
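A sketch with scikit-learn's ExtraTreesRegressor; the file name, column names, and number of trees are assumptions:

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.metrics import r2_score

df = pd.read_csv("training_data.csv").dropna()        # assumed file name
target = "Blowcount [Blows/m]"                        # assumed column names
X = df.drop(columns=[target, "ID", "Location ID"], errors="ignore")
y = df[target]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Like a random forest, but split thresholds are also drawn at random,
# which further decorrelates the individual trees
et = ExtraTreesRegressor(n_estimators=200, random_state=42)
et.fit(X_train, y_train)
print(f"R2 score on test set: {r2_score(y_test, et.predict(X_test)):.3f}")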
Model 11: Deep learning technique (ANN), using all the parameters of the dataset to train, test,
and make predictions

Introduction:
Artificial Neural Networks (ANN) are a class of machine learning algorithms inspired by the structure
and function of biological neurons. ANNs are used for a variety of tasks, including classification,
regression, and clustering. ANNs are particularly useful for complex tasks that involve non-linear
relationships between the input and output variables. In this context, ANNs are commonly used for
multi-feature dataset regression tasks.

Theory:

An ANN consists of layers of interconnected neurons that process the input data to produce an output.
Each neuron is a mathematical function that takes as input the weighted sum of the inputs from the
previous layer and applies an activation function to produce an output. The weights and biases of the
neurons are learned during the training process to minimize the error between the predicted output and
the actual output.

Mathematical Background:
The basic unit of an ANN is a neuron, which takes as input a set of values, multiplies each value by
a corresponding weight, adds a bias term, and applies an activation function to produce an output.
The output of a neuron is given by:

y = φ(Σ wi · xi + b)

where xi are the inputs, wi the corresponding weights, b the bias, and φ the activation function.

Figure-7 Artificial neural network

Perceptrons, invented by Frank Rosenblatt in 1958, are the simplest neural networks: they consist of
n inputs, a single neuron, and one output, where n is the number of features of our dataset. The
process of passing the data through the neural network is known as forward propagation, and the
forward propagation carried out in a perceptron is explained in the following three steps.

Step 1: For each input, multiply the input value xᵢ by its weight wᵢ and sum all the multiplied
values. Weights represent the strength of the connection between neurons and decide how much
influence the given input will have on the neuron's output. If the weight w₁ has a higher value than
the weight w₂, then the input x₁ will have a higher influence on the output than x₂.

The row vectors of the inputs and weights are x = [x₁, x₂, …, xₙ] and w = [w₁, w₂, …, wₙ]
respectively, and their dot product is given by:

x · w = x₁w₁ + x₂w₂ + … + xₙwₙ

Step 2: Add the bias b to the summation of the multiplied values and call the result z. The bias,
also known as the offset, is necessary in most cases to shift the entire activation function to the
left or right in order to generate the required output values:

z = x · w + b

Step 3: Pass the value of z to a non-linear activation function. Activation functions are used to
introduce non-linearity into the output of the neurons, without which the neural network would just
be a linear function; they also have a significant impact on the learning speed of the network.
Perceptrons use the binary step function as their activation function. However, we shall use the
sigmoid, also known as the logistic function, as our activation function:

ŷ = σ(z) = 1 / (1 + e^(−z))

where σ denotes the sigmoid activation function and the output obtained after forward propagation is
the predicted value ŷ.

Code:
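A sketch of a small fully connected network using Keras; the architecture (two hidden layers), the number of epochs, and the preprocessing are assumptions, not the exact network used in this project:

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import r2_score
from tensorflow import keras

df = pd.read_csv("training_data.csv").dropna()        # assumed file name
target = "Blowcount [Blows/m]"                        # assumed column names
X = df.drop(columns=[target, "ID", "Location ID"], errors="ignore")
y = df[target]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize features; neural networks train poorly on unscaled inputs
scaler = StandardScaler()
X_train_s = scaler.fit_transform(X_train)
X_test_s = scaler.transform(X_test)

# Two ReLU hidden layers and a linear output neuron for regression
model = keras.Sequential([
    keras.layers.Input(shape=(X_train_s.shape[1],)),
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(32, activation="relu"),
    keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.fit(X_train_s, y_train, epochs=200, batch_size=32, verbose=0)
print(f"R2 score on test set: {r2_score(y_test, model.predict(X_test_s).ravel()):.3f}")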

Table-1 Actual and predicted values of blow counts on random values

Chapter 5
Data Collection and Analysis
 The pile driving process requires input on the soil, the hammer settings, and the pile
characteristics.

 The full data set consists of piezocone penetration test data (PCPT), recorded hammer
energies, and pile characteristics at 114 pile foundation locations.

 The 114 foundation locations are separated into 94 locations which can be used to train
models and 20 locations which will be used to score submissions of the participants.

 The pile driving process starts after the pile has penetrated some distance into the ground so
blank values are possible at shallow depth.

 Data files with blank values removed are provided for the training and test data set.

Table-2 Ranges of values in given dataset

 References of the dataset - https://www.kaggle.com/c/isfog2020-pile-driving-predictions/

Table-3 Sample Data

 z [m] - depth below mudline


 qc [MPa] - Cone tip resistance
 fs [MPa] - Sleeve friction
 u2 [MPa] - Pore pressure behind the cone
 ID - a unique ID combining the location name and the depth at which data is provided
 Location ID - Anonymized location ID
 Normalized ENTHRU [-] - Energy transmitted to the pile. Normalized to be between 0 and 1
 Normalized hammer energy [-] - Energy provided by the hammer. Normalized to be
between 0 and 1
 Diameter [m] - Diameter of the pile at the selected depth
 Bottom wall thickness [mm] - Wall thickness at the bottom of the pile
 Pile penetration [m] - Final penetration of the pile below mudline
 Blow count [Blows/m] - Number of blows required for an additional meter of pile
penetration. This describes the rate of pile penetration.
 Number of blows - Total number of blows to reach the selected depth

Analysis-1 Variation of Cone tip resistance (qc), Blow-Count (Blows/m), and Normalized ENTHRU (-)
with depth (m)

Figure-8 Variation of depth

Analysis-2 Variation of Cone tip resistance (qc), Normalized ENTHRU (-), and depth (m) with
Blow-Count (Blows/m)

Figure-9 Variation of blowcounts

Analysis-3 Correlation of Blow-Count with all other parameters

Table-4 Correlation of Blow-Counts with all other parameters

Parameters Correlation Value
z [m] 0.712875
qc [MPa] 0.509373
fs [MPa] 0.437957
u2 [MPa] 0.401883
Blowcount [Blows/m] 1.000000
Normalized ENTHRU [-] 0.673955
Normalized hammer energy [-] 0.681180
Number of blows 0.702035
Diameter [m] NaN
Bottom wall thickness [mm] -0.099416
Pile penetration [m] 0.115292

A NaN correlation, as seen for Diameter [m], typically indicates that the column has zero variance
(or no valid numeric pairs) in the dataset, so no correlation can be computed.
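The table above can be reproduced with pandas; a minimal sketch, assuming the training CSV and column names described in this chapter:

import pandas as pd

df = pd.read_csv("training_data.csv")                 # assumed file name
# Pearson correlation of every numeric column with the blow count
corr = df.corr(numeric_only=True)["Blowcount [Blows/m]"]
print(corr.sort_values(ascending=False))
# A constant (zero-variance) column yields NaN, as for Diameter [m] above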

Analysis-4 Variation of Blow-Count with depth z [m]

Figure-10 Variation of blowcounts vs depth

Analysis-5 Variation of Blow-Count with qc [MPa]

Figure-11 Variation of blowcounts vs qc

Analysis-6 Variation of Blow-Count with fs [MPa]

Figure-12 Variation of blowcounts vs fs

Analysis-7 Variation of Blow-Count with u2 [MPa]

Figure-13 Variation of blowcounts vs u2

Analysis-8 Variation of Blow-Count with Normalized ENTHRU [-]

Figure-14 Variation of blowcounts vs Normalized ENTHRU [-]

Analysis-9 Variation of Blow-Count with Normalized hammer energy [-]

Figure-15 Variation of blowcounts vs Normalized hammer energy [-]

Analysis-10 Variation of Blow-Count with Number of blows

Figure-16 Variation of blowcounts vs number of blows

Analysis-11 Variation of Blow-Count with Diameter [m]

Figure-17 Variation of blowcounts vs diameter

Analysis-12 Variation of Blow-Count with Bottom wall thickness [mm]

Figure-18 Variation of blowcounts vs bottom wall thickness

Analysis-13 Variation of Blow-Count with Pile penetration [m]

Figure-19 Variation of blowcounts vs pile penetration

Chapter 6
Deployment
(Using Streamlit)

1. Install Streamlit: Streamlit can be installed by running the following command in your terminal
or command prompt:
pip install streamlit

2. Create a Python script: Create a new Python script and import the necessary libraries,
including Streamlit and any other libraries you need for your application.
3. Define the layout: Use the Streamlit functions to define the layout of your application, such as
adding a title, header, sidebar, or footer.
4. Load data and create visualizations: Use Streamlit to load data and create visualizations such
as charts, tables, and maps.
5. Add interactive widgets: Streamlit provides several interactive widgets such as sliders,
dropdowns, checkboxes, and buttons, which you can use to allow users to interact with your
application.
6. Run the app: Run the Streamlit application locally using the following command in the terminal or
command prompt:
streamlit run your_app.py

7. Deploy the app: Once the application is created, it can be deployed to a web server or a cloud
service such as Heroku or AWS. Streamlit provides several deployment options, including Streamlit
Cloud, which allows you to deploy an application with a single command.

Figure-20 Deployment

Chapter 7
Conclusion & Results

Table-5 Results - Algorithms and their respective accuracies in terms of R2 score

Figure-21 Barplots of models
 As the R2 score of Model-1 is below 50%, Model-1 is not recommended for the prediction of
blow counts.

 Model-2 and Model-3 have R2 scores below 70%; models explaining less than 70% of the variance
leave too large an error margin, so they are also not recommended.

 The deep learning (ANN) model gives an R2 score of 74.01%, which is lower than that of several
other machine learning models.

 Model-7 (Gradient Boosting Regression) and Model-6 (Random Forest Regression) achieve R2
scores above 80%, which is considerably better.

 Model-10 (Extra Trees Regression) achieves an R2 score of 85.09%, the highest among all the
models, so it can be used for prediction.

 Pile driveability prediction is a vital aspect of geotechnical engineering that requires accurate
and reliable prediction tools. Recent studies have explored various methods such as WEA,
ANN models, and numerical simulations for pile driveability prediction, but there is still a
need for further research to improve the accuracy and applicability of these tools in different
soil conditions.

References
 Sang, S., Zhang, L., Shi, W., & Jiang, Y. (2020). Prediction of pile blow counts using
artificial neural networks. Automation in Construction, 119, 103348.
https://doi.org/10.1016/j.autcon.2020.103348

 Wang, C., Zhang, Y., Li, X., & Hu, H. (2016). An iterative WEA method for predicting pile
drivability parameters considering soil resistance, pile length and hammer performance.
Computers and Geotechnics, 70, 93-103. https://doi.org/10.1016/j.compgeo.2015.08.020

 Alm, T., & Hamre, L. (2001). Soil model for pile driveability predictions based on CPT
interpretations. Presented at the International Conference on Soil Mechanics and Foundation
Engineering. https://www.issmge.org/publications/publication/soil-model-for-pile-driveability-predictions-based-on-cpt-interpretations

 Pedregosa, F., et al. (2011). Scikit-learn: Machine Learning in Python. Journal of Machine
Learning Research, 12, 2825-2830.

