
Forecasting Stability Categories Using Neural Networks
Steevo Xavier

Abstract—A feed-forward neural network is used to construct a prediction model for forecasting stability categories. Exploratory data analysis is performed to understand the provided data.

Index Terms—Forecasting stability categories using neural networks, Deep Learning, EDA

I. Introduction

Deep Learning, a subset of machine learning, has revolutionized various industries by enabling computers to learn and make decisions from complex and unstructured data. At its heart are neural networks: interconnected layers of artificial neurons that perform specific computations on the input data and pass the results through the network to generate an output.

A remarkable feature of deep learning is its ability to automatically extract relevant features from raw data, eliminating the need for manual feature engineering, which can be difficult when working with high volumes of data such as images or text.

II. LIBRARIES USED

Several libraries and frameworks were utilized to perform various tasks such as data manipulation, model construction, visualization, and evaluation. The following is a list of key libraries used throughout the project:

• pandas: A versatile data manipulation library used for loading, cleaning, and organizing datasets. It provides data structures for efficient data handling and transformation.

• numpy: A fundamental library for numerical computations in Python. It offers support for arrays and matrices, enabling efficient mathematical operations on large datasets.

• matplotlib: A popular data visualization library that aids in creating various types of plots and charts to visually represent data distributions, relationships, and trends.

• seaborn: Built on top of matplotlib, seaborn is another data visualization library that provides an interface for creating aesthetically pleasing and informative statistical graphics.

• sklearn (scikit-learn): A comprehensive machine learning library that offers tools for data preprocessing, model selection, training, and evaluation. It includes various algorithms for classification, regression, clustering, and more.

• keras: A high-level neural network library built on top of TensorFlow and Theano. It simplifies the process of constructing neural network architectures and facilitates rapid experimentation.

• tensorflow: An open-source deep learning framework developed by Google. It provides a flexible platform for building and training various types of neural networks.

• ann_visualizer: A library that aids in visualizing the architecture of neural networks, including activations and connections between layers.

Figure 1: Libraries Used

III. DATA ANALYTIC STEPS

Data analytics involves extracting valuable insights and knowledge from raw data to make informed decisions. To predict stability categories using neural networks, the following data analytics steps were performed on the data:

a. Data Loading and Understanding
b. Data Preprocessing
c. Exploratory Data Analysis (EDA)
d. Data Preparation
e. Neural Network Architecture
f. Model Compilation and Training
g. Model Evaluation and Visualization
h. Evaluation Metrics

A. Data Loading and Understanding


The necessary libraries, such as pandas, numpy, and matplotlib, were imported first. The training and testing datasets were then loaded using the read_csv function from pandas. The info() method provided an overview of the datasets, including the number of entries, data types, and memory usage.
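A minimal sketch of this loading step, assuming illustrative file names for the training and testing CSVs:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# File names are placeholders; substitute the actual dataset paths
train_df = pd.read_csv('train.csv')
test_df = pd.read_csv('test.csv')

# Overview of entries, data types, and memory usage (see Figure 2)
train_df.info()
test_df.info()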

Figure 2: Dataframe details

B. Data Preprocessing

Data preprocessing is crucial to ensure data quality and suitability for analysis. The datasets were examined for missing values using the isnull().sum() method. Any missing values were addressed by techniques such as dropping rows or imputing values, which ensured that the data was ready for exploration and modelling. We use the dropna function to drop the rows with NaN values at a later stage of the analysis, as we have enough records for training and testing the model.

Figure 3: Details of the Null values in the train and test dataset
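A short sketch of the missing-value check and row dropping described above, assuming the train_df and test_df dataframes from the loading step:

# Count missing values per column in both datasets (see Figure 3)
print(train_df.isnull().sum())
print(test_df.isnull().sum())

# Rows containing NaN values are dropped later in the analysis
train_df = train_df.dropna()
test_df = test_df.dropna()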

C. Exploratory Data Analysis (EDA)

EDA involves analyzing and visualizing data to gain insights into its characteristics. Various visualizations were created to understand the distribution of stability categories, explore feature distributions, and examine correlations between numerical features. Histograms, count plots, heatmaps, and descriptive statistics were used to uncover patterns and relationships in the data.

Figure 4: Stability Category Distribution
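The EDA plots can be reproduced along the following lines; a hedged sketch that assumes the target column is named 'SCMstability_category' and uses train_df from above:

import seaborn as sns
import matplotlib.pyplot as plt

# Distribution of the target stability categories (Figure 4)
sns.countplot(x='SCMstability_category', data=train_df)
plt.title('Stability Category Distribution')
plt.show()

# Histograms of the numerical features (Figure 6)
train_df.hist(figsize=(12, 8))
plt.show()

# Correlation heatmap of the numerical features (Figure 7)
corr = train_df.select_dtypes(include='number').corr()
sns.heatmap(corr, annot=True, cmap='coolwarm')
plt.show()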

The correlation matrix gives us insight into the relationships between the features in the dataset. Through the EDA we gained an understanding of the data and of the features of our dataset.

Figure 5: Box plot for the risk indexes

Figure 6: Feature Distribution

Figure 7: Correlation Matrix

D. Data Preparation

The data was divided into input features and target labels. Categorical labels were converted to one-hot encoded format using the to_categorical function from Keras. Unnecessary columns, such as 'Timestamp' and 'SCMstability_category', were removed from the datasets to isolate relevant features. The rows containing null values were deleted using the dropna function.

Figure 8: Code for Data Preparation
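Figure 8 shows the project's own code; a minimal sketch of the preparation described above, assuming the column names given in the text and that the label column is already integer-encoded:

from keras.utils import to_categorical

# Separate features and labels; column names follow the text above
X_train = train_df.drop(columns=['Timestamp', 'SCMstability_category'])
X_test = test_df.drop(columns=['Timestamp', 'SCMstability_category'])

# One-hot encode the categorical labels
y_train = to_categorical(train_df['SCMstability_category'])
y_test = to_categorical(test_df['SCMstability_category'])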

The X_train dataframe contains the features of our data for training the model.

Figure 9: X_Train Dataframe

The Y_train dataframe contains the categorical data, which has been converted to one-hot encoded labels.

Figure 10: Y_train Dataframe array



E. Neural Network Architecture

A neural network model was constructed using the Keras library. The architecture included hidden layers with ReLU activation functions and an output layer with softmax activation. The model's summary and architecture visualization were generated to understand its structure. The code for the construction of the neural network is given below; a feed-forward neural network (FFNN) is used for the model.
# Build the feed-forward neural network
from keras.models import Sequential
from keras.layers import Dense

model = Sequential()
# Two hidden layers with ReLU activation
model.add(Dense(64, activation='relu', input_dim=X_train.shape[1]))
model.add(Dense(32, activation='relu'))
# Softmax output layer with one unit per stability category
model.add(Dense(y_train.shape[1], activation='softmax'))
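The model summary (Figure 11) and model plot (Figure 12) shown below can be produced with Keras utilities; a minimal sketch, where the output file name is illustrative and plot_model requires pydot and graphviz to be installed:

from keras.utils import plot_model

# Textual summary of layers and parameter counts (Figure 11)
model.summary()

# Graphical plot of the architecture (Figure 12); the file name is an assumption
plot_model(model, to_file='model_plot.png', show_shapes=True)

# ann_visualizer rendering of the network (cf. Figure 13); treat this call as illustrative
from ann_visualizer.visualize import ann_viz
ann_viz(model, title='Stability Category FFNN')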
Figure 11: Model Summary

Figure 12: Model Plot

Figure 13: Neural Network Model

The neural network comprises an input layer, two hidden layers, and one output layer. There are 5 input features and 5 output categories. Each of the 5 input neurons is connected to the 64 neurons of the 1st hidden layer, so the 1st hidden layer receives 5 inputs and produces 64 outputs. These 64 outputs are connected to the 32 neurons of the 2nd hidden layer, which therefore receives 64 inputs from the 1st hidden layer and produces 32 outputs. The output layer receives these 32 inputs from the 2nd hidden layer and has 5 outputs, corresponding to the 5 stability categories.
F. Model Compilation and Training

The neural network model was compiled using an optimizer (Adam) and a loss function (categorical cross-entropy). The model was then trained using the training data. Techniques like early stopping and learning rate reduction were applied using Keras callbacks to prevent overfitting and optimize training.

Figure 14: Compilation and Training of the model
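Figure 14 shows the project's own code; a sketch of the compilation and training step with the callbacks described, where the hyperparameter values (epochs, batch size, patience, validation split) are assumptions:

from keras.callbacks import EarlyStopping, ReduceLROnPlateau

# Adam optimizer with categorical cross-entropy loss and accuracy metric
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

# Callbacks to prevent overfitting and to reduce the learning rate on plateaus
early_stop = EarlyStopping(monitor='val_loss', patience=10, restore_best_weights=True)
reduce_lr = ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=5)

history = model.fit(X_train, y_train,
                    validation_split=0.2,
                    epochs=100,
                    batch_size=32,
                    callbacks=[early_stop, reduce_lr])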



G. Model Evaluation and Visualization

Model evaluation involved various steps, sketched in code after the figures below:

• ROC Curve: An ROC curve was plotted to visualize the true positive rate vs. the false positive rate for a specific class.

Figure 15: ROC curve

• Accuracy and Loss Curves: Training and validation accuracy and loss curves were plotted using matplotlib to track model performance during training. The accuracy of the model was around 98 percent.

Figure 16: Model Accuracy Train VS Test

Figure 17: Model Loss Train VS Test
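A sketch of the ROC curve and accuracy/loss plots listed above, assuming the model, history, X_test, and y_test objects from the earlier sketches; the class index used for the ROC curve is illustrative:

from sklearn.metrics import roc_curve, auc
import matplotlib.pyplot as plt

y_prob = model.predict(X_test)

# ROC curve for one chosen class (Figure 15); index 0 is illustrative
fpr, tpr, _ = roc_curve(y_test[:, 0], y_prob[:, 0])
plt.plot(fpr, tpr, label='AUC = %.3f' % auc(fpr, tpr))
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.legend()
plt.show()

# Training vs. validation accuracy and loss (Figures 16 and 17);
# older Keras versions use the keys 'acc' and 'val_acc'
plt.plot(history.history['accuracy'], label='train accuracy')
plt.plot(history.history['val_accuracy'], label='validation accuracy')
plt.legend()
plt.show()

plt.plot(history.history['loss'], label='train loss')
plt.plot(history.history['val_loss'], label='validation loss')
plt.legend()
plt.show()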
H. Evaluation Metrics

Further evaluation metrics were calculated, including the confusion matrix, F1 score, and accuracy. These metrics provided a comprehensive understanding of the model's performance. Mean Squared Error (MSE) and Root Mean Squared Error (RMSE) were also plotted to assess the error values of the model.

Figure 18: Confusion Matrix & F1 Score

Figure 19: MSE & RMSE Values
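A sketch of these metrics using scikit-learn, assuming y_test holds the one-hot encoded true labels and y_prob the predicted probabilities from the previous sketch; the weighted F1 averaging is an assumption:

import numpy as np
from sklearn.metrics import confusion_matrix, f1_score, accuracy_score, mean_squared_error

y_true = np.argmax(y_test, axis=1)
y_pred = np.argmax(y_prob, axis=1)

# Confusion matrix, F1 score, and accuracy (Figure 18)
print(confusion_matrix(y_true, y_pred))
print('F1 score:', f1_score(y_true, y_pred, average='weighted'))
print('Accuracy:', accuracy_score(y_true, y_pred))

# MSE and RMSE between predicted probabilities and one-hot labels (Figure 19)
mse = mean_squared_error(y_test, y_prob)
print('MSE:', mse, 'RMSE:', np.sqrt(mse))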

IV. CONCLUSION

The neural network for stability category prediction showcased the efficacy of deep learning in complex data analysis. The workflow comprised data preprocessing, neural network architecture creation, model training, thorough evaluation, and insightful interpretation. Key evaluation metrics, such as ROC curves and F1 scores, underscored the model's competence in forecasting stability categories. Visualizations and comprehensive reporting facilitated clear communication of the findings.

V. REFERENCES

Pedregosa, F., et al. (2011). Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research, 12, 2825-2830.

Seaborn Development Team. (2020). Seaborn: statistical data visualization. https://seaborn.pydata.org/

The TensorFlow Authors. (2019). TensorFlow. https://github.com/tensorflow/tensorflow

