Prerequisite
Dataset:
Here, we use Yahoo Finance to obtain the share market dataset (Apple Inc. stock prices).
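The download call itself is not shown in this excerpt; a minimal sketch follows. The ticker symbol is an assumption, and an offline stand-in DataFrame is included so the snippet runs without network access:

```python
import numpy as np
import pandas as pd

# Hypothetical download (ticker is an assumption; requires the yfinance package):
#   import yfinance as yf
#   df = yf.download('AAPL')
# Offline stand-in with the same column layout and length as the dataset used here:
index = pd.date_range('1990-01-02', periods=8492, freq='B')
df = pd.DataFrame({'Open': np.random.rand(8492) * 100}, index=index)
print(df.shape)  # (8492, 1)
```

With 8492 rows, an 80:20 split yields the 6794/1698 lengths seen in the outputs below.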
Output:
[*********************100%%**********************] 1 of 1 completed
ncols = 2
nrows = int(round(df_plot.shape[1] / ncols, 0))
data_plot(df)
Output:
Line plots showing the features of Apple Inc. stock through time
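The helper data_plot used above is not defined in the excerpt; one plausible sketch (the grid layout mirrors the ncols/nrows computation in the snippet, the body is assumed) is:

```python
import matplotlib.pyplot as plt

def data_plot(df):
    """Line-plot every column of df on a 2-column grid of subplots."""
    ncols = 2
    nrows = int(round(df.shape[1] / ncols, 0))
    fig, axes = plt.subplots(nrows=nrows, ncols=ncols, figsize=(14, 3 * nrows))
    for ax, col in zip(axes.flatten(), df.columns):
        ax.plot(df.index, df[col])
        ax.set_title(col)
    fig.tight_layout()
    plt.show()
```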
We follow the common practice of splitting the data into training and testing sets. We compute the length of the training dataset and print the shapes of both splits to confirm them. Generally, an 80:20 split is used for the training and test sets.
import math

# Train-Test Split
# Setting 80 percent of the data aside for training
training_data_len = math.ceil(len(df) * .8)
dataset_train = df[['Open']].values[:training_data_len]
dataset_test = df[['Open']].values[training_data_len:]
print(dataset_train.shape, dataset_test.shape)
Output:
(6794, 1) (1698, 1)
Here, we choose the feature ('Open' prices), reshape it into the required 2D format, and validate the resulting shape to make sure it matches the format expected for model input. This step prepares the training data for use in a neural network.
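As a sketch of that reshaping step (variable names are assumptions), turning the 1D price series into the 2D (samples, features) layout looks like:

```python
import numpy as np

# Stand-in for the 'Open' column of the training split (length taken from the output below)
open_prices = np.random.rand(6794) * 100
dataset_train = np.reshape(open_prices, (-1, 1))  # 2D: (samples, features)
print(dataset_train.shape)  # (6794, 1)
```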
Training Data
Output:
(6794, 1)
Testing Data
Output:
(1698, 1)
We carefully prepared the training and testing datasets so that the model could produce accurate predictions. By creating sequences of the appropriate lengths together with their corresponding labels, we turned the task into a supervised learning problem.
Normalization
We apply Min-Max scaling, a standard preprocessing step in machine learning and time series analysis. It rescales values into the range [0, 1], which helps neural networks and other models converge faster and perform better. The scaler is fitted on the training data only and then reused to transform the test data, so no information from the test set leaks into preprocessing. The normalized values end up in the scaled_train and scaled_test arrays, ready to be used in modeling or analysis.
from sklearn.preprocessing import MinMaxScaler

# Normalizing values between 0 and 1
scaler = MinMaxScaler(feature_range=(0, 1))
scaled_train = scaler.fit_transform(dataset_train)
print(scaled_train[:5])
scaled_test = scaler.transform(dataset_test)  # reuse the scaler fitted on training data
print(*scaled_test[:5])  # prints the first 5 rows of scaled_test
Output:
In this step, we separate the time-series data into X_train and y_train from the training set, and X_test and y_test from the testing set. This transforms the time series into a supervised learning problem that can be used to train the model. While iterating through the time series, the loop generates input/output sequences of length 50 for the training data and length 30 for the test data. This technique lets us predict future values while accounting for the data's temporal dependence on earlier observations.
We prepare the training and testing data for the neural network by generating sequences of a given length and their related labels, then convert these sequences to NumPy arrays and PyTorch tensors.
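A sequence builder along the lines described above can be sketched as follows (the function name and the stand-in input are assumptions):

```python
import numpy as np

def create_sequences(data, seq_len):
    # Each input is a window of seq_len values; the label is the value that follows it
    X, y = [], []
    for i in range(len(data) - seq_len):
        X.append(data[i:i + seq_len])
        y.append(data[i + seq_len])
    return np.array(X), np.array(y)

scaled_train = np.arange(100).reshape(-1, 1) / 100.0  # stand-in for the scaled series
X_train, y_train = create_sequences(scaled_train, 50)
print(X_train.shape, y_train.shape)  # (50, 50, 1) (50, 1)
```

The resulting arrays can then be wrapped with torch.tensor(..., dtype=torch.float32) for use in PyTorch.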
Training Data
Output:
Testing Data
Output:
batch_first=True: the input data will have the batch size as the first dimension.
The call super(LSTMModel, self).__init__() initializes the parent class nn.Module for building the neural network.
The forward method defines the forward pass of the model, where the input x is processed through the layers of the model to produce an output.
class LSTMModel(nn.Module):
    # input_size : number of features in input at each time step
    # hidden_size : number of LSTM units
    # num_layers : number of LSTM layers
    def __init__(self, input_size, hidden_size, num_layers):
        super(LSTMModel, self).__init__()  # initializes the parent class nn.Module
        self.lstm = nn.LSTM(input_size, hidden_size, num_layers,
                            batch_first=True)
        self.linear = nn.Linear(hidden_size, 1)

    def forward(self, x):
        out, _ = self.lstm(x)
        out = self.linear(out[:, -1, :])  # map the last time step's output to one value
        return out
Output:
cuda
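The `cuda` shown above is the result of a device check along these lines:

```python
import torch

# Use the GPU when one is available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(device)
```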
Defining the model
Now, we define the model, loss function and optimizer for forecasting. We set the model's hyperparameters, use mean squared error as the loss function, and choose the Adam optimizer to update the parameters during training.
input_size = 1
num_layers = 2
hidden_size = 32
output_size = 1

model = LSTMModel(input_size, hidden_size, num_layers).to(device)
loss_fn = torch.nn.MSELoss(reduction='mean')
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
print(model)
Output:
LSTMModel(
(lstm): LSTM(1, 32, num_layers=2, batch_first=True)
(linear): Linear(in_features=32, out_features=1, bias=True)
)
batch_size = 16

# Create DataLoader for batch training
train_dataset = torch.utils.data.TensorDataset(X_train, y_train)
train_loader = torch.utils.data.DataLoader(train_dataset,
                                           batch_size=batch_size, shuffle=True)

num_epochs = 50
train_hist = []
test_hist = []

# Training loop
for epoch in range(num_epochs):
    total_loss = 0.0

    # Training
    model.train()
    for batch_X, batch_y in train_loader:
        batch_X, batch_y = batch_X.to(device), batch_y.to(device)
        predictions = model(batch_X)
        loss = loss_fn(predictions, batch_y)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        total_loss += loss.item()

    # Average training loss for this epoch
    train_hist.append(total_loss / len(train_loader))

    # Evaluation on the held-out test set
    model.eval()
    with torch.no_grad():
        predictions_test = model(X_test.to(device))
        test_loss = loss_fn(predictions_test, y_test.to(device))
        test_hist.append(test_loss.item())
Output:
We plot the learning curve to track the training progress and get an idea of how much time and training the model needs to learn the patterns.
x = np.linspace(1, num_epochs, num_epochs)
plt.plot(x, train_hist, label="Training loss")
plt.plot(x, test_hist, label="Test loss")
plt.legend()
plt.show()
Output:
Step 7: Forecasting
After training the neural network on the provided data, we can forecast the next month. The model predicts the future opening prices and stores the future values along with their corresponding dates. Using a for loop, we perform a rolling forecast; the steps are as follows:
- We set the number of future time steps to 30, convert the test sequence to a NumPy array, and remove singleton dimensions using sequence_to_plot.
- We then convert historical_data to a PyTorch tensor. The shape of the tensor is (1, sequence_length, 1), where sequence_length is the length of the historical data sequence.
- The model predicts the next value based on historical_data_tensor.
- The prediction is converted to a NumPy array, its first element is extracted, and the value is appended to the history so the window can roll forward.
Once the loop ends, the forecasted values are stored in a list, and future dates are generated to create an index for these values.
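The rolling loop described above can be sketched as follows; the function and variable names are assumptions, and a trivial stand-in predictor replaces the trained model so the snippet runs standalone:

```python
import numpy as np

def rolling_forecast(predict_next, last_sequence, num_forecast_steps=30):
    # predict_next: callable mapping a (seq_len, 1) history window to the next scalar.
    # In the article this would wrap model(historical_data_tensor) on a (1, seq_len, 1) tensor.
    historical_data = np.squeeze(last_sequence).tolist()
    forecasted_values = []
    for _ in range(num_forecast_steps):
        window = np.array(historical_data[-30:]).reshape(-1, 1)
        next_value = float(predict_next(window))
        forecasted_values.append(next_value)
        historical_data.append(next_value)  # roll the window forward
    return forecasted_values

# Stand-in predictor: simply repeats the last observed value
preds = rolling_forecast(lambda h: h[-1, 0], np.linspace(0, 1, 30).reshape(-1, 1))
print(len(preds))  # 30
```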
Once we have forecasted the future prices, we can visualize them using line plots. We plot the graph for a specific time range: the blue line indicates the test data, the green line plots the last 30 time steps of the test data index, and the red line plots the forecasted values against a combined index that includes both the historical dates and the future dates.
# Test data
plt.plot(test_data.index[-100:-30], test_data.Open[-100:-30],
         label="test_data", color="b")

# reverse the scaling transformation
original_cases = scaler.inverse_transform(
    np.expand_dims(sequence_to_plot[-1], axis=0)).flatten()

# plotting the last 30 time steps of the test data
plt.plot(test_data.index[-30:], original_cases, label='actual values', color='green')

# Forecasted Values
# reverse the scaling transformation
forecasted_cases = scaler.inverse_transform(
    np.expand_dims(forecasted_values, axis=0)).flatten()

# plotting the forecasted values
plt.plot(combined_index[-60:], forecasted_cases, label='forecasted values',
         color='red')

plt.xlabel('Time Step')
plt.ylabel('Value')
plt.legend()
plt.title('Time Series Forecasting')
plt.grid(True)
plt.show()
Output:
By plotting the test data, the actual values, and the model's forecasts together, we get a clear idea of how well the forecasted values align with the actual time series.
This article has thoroughly examined the intriguing field of time series forecasting using PyTorch and LSTM neural networks. We imported the yfinance library to collect historical stock market data from Yahoo Finance and began preprocessing. We then applied crucial steps such as data loading, train-test splitting, and data scaling to make sure our model could learn from the data accurately and make predictions.
For more accurate forecasts, further adjustment, hyperparameter tuning, and optimization are frequently needed. Ensemble methods and other advanced techniques can also be investigated to improve forecasting performance.
We have barely begun to explore the enormous field of time series forecasting in this article. There is much more to learn, from handling multivariate time series to solving practical problems in novel ways. With this knowledge in hand, you are ready to use PyTorch and LSTM neural networks on your own time series forecasting adventures.