Professional Documents
Culture Documents
Search...
Long Short-Term Memory networks, or LSTMs for short, can be applied to time series forecasting.
There are many types of LSTM models that can be used for each specific type of time series
forecasting problem.
In this tutorial, you will discover how to develop a suite of LSTM models for a range of standard time
series forecasting problems.
The objective of this tutorial is to provide standalone examples of each model on each type of time
series problem as a template that you can copy and adapt for your specific time series forecasting
problem.
This is a large and important post; you may want to bookmark it for future reference.
Kick-start your project with my new book Deep Learning for Time Series Forecasting, including
step-by-step tutorials and the Python source code files for all examples.
Tutorial Overview
In this tutorial, we will explore how to develop a suite of different types of LSTM models for time
series forecasting.
The models are demonstrated on small contrived time series problems intended to give the flavor of
the type of time series problem being addressed. The chosen configuration of the models is arbitrary
and not optimized for each problem; that was not the goal.
These are problems comprised of a single series of observations and a model is required to learn
from the series of past observations to predict the next value in the sequence.
We will demonstrate a number of variations of the LSTM model for univariate time series forecasting.
1. Data Preparation
2. Vanilla LSTM
3. Stacked LSTM
4. Bidirectional LSTM
5. CNN LSTM
6. ConvLSTM
Each of these models are demonstrated for one-step univariate time series forecasting, but can
easily be adapted and used as the input part of a model for other types of time series forecasting
problems.
Data Preparation
Before a univariate series can be modeled, it must be prepared.
The LSTM model will learn a function that maps a sequence of past observations as input to an
output observation. As such, the sequence of observations must be transformed into multiple
examples from which the LSTM can learn.
We can divide the sequence into multiple input/output patterns called samples, where three time
steps are used as input and one time step is used as output for the one-step prediction that is being
learned.
The split_sequence() function below implements this behavior and will split a given univariate
sequence into multiple samples where each sample has a specified number of time steps and the
output is a single time step.
We can demonstrate this function on our small contrived dataset above.
Running the example splits the univariate series into six samples where each sample has three input
time steps and one output time step.
Now that we know how to prepare a univariate series for modeling, let’s look at developing LSTM
models that can learn the mapping of inputs to outputs, starting with a Vanilla LSTM.
Click to sign-up and also get a free PDF Ebook version of the course.
Vanilla LSTM
A Vanilla LSTM is an LSTM model that has a single hidden layer of LSTM units, and an output layer
used to make a prediction.
We can define a Vanilla LSTM for univariate time series forecasting as follows.
Key in the definition is the shape of the input; that is what the model expects as input for each
sample in terms of the number of time steps and the number of features.
We are working with a univariate series, so the number of features is one, for one variable.
The number of time steps as input is the number we chose when preparing our dataset as an
argument to the split_sequence() function.
The shape of the input for each sample is specified in the input_shape argument on the definition of
first hidden layer.
We almost always have multiple samples, therefore, the model will expect the input component of
training data to have the dimensions or shape:
Our split_sequence() function in the previous section outputs the X with the shape [samples,
timesteps], so we easily reshape it to have an additional dimension for the one feature.
In this case, we define a model with 50 LSTM units in the hidden layer and an output layer that
predicts a single numerical value.
The model is fit using the efficient Adam version of stochastic gradient descent and optimized using
the mean squared error, or ‘mse‘ loss function.
Once the model is defined, we can fit it on the training dataset.
We can predict the next value in the sequence by providing the input:
The model expects the input shape to be three-dimensional with [samples, timesteps, features],
therefore, we must reshape the single input sample before making the prediction.
We can tie all of this together and demonstrate how to develop a Vanilla LSTM for univariate time
series forecasting and make a single prediction.
Running the example prepares the data, fits the model, and makes a prediction.
Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or
differences in numerical precision. Consider running the example a few times and compare the
average outcome.
We can see that the model predicts the next value in the sequence.
Stacked LSTM
Multiple hidden LSTM layers can be stacked one on top of another in what is referred to as a
Stacked LSTM model.
An LSTM layer requires a three-dimensional input and LSTMs by default will produce a two-
dimensional output as an interpretation from the end of the sequence.
We can address this by having the LSTM output a value for each time step in the input data by
setting the return_sequences=True argument on the layer. This allows us to have 3D output from
hidden LSTM layer as input to the next.
We can tie this together; the complete code example is listed below.
Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or
differences in numerical precision. Consider running the example a few times and compare the
average outcome.
Running the example predicts the next value in the sequence, which we expect would be 100.
Bidirectional LSTM
On some sequence prediction problems, it can be beneficial to allow the LSTM model to learn the
input sequence both forward and backwards and concatenate both interpretations.
We can implement a Bidirectional LSTM for univariate time series forecasting by wrapping the first
hidden layer in a wrapper layer called Bidirectional.
An example of defining a Bidirectional LSTM to read input both forward and backward is as follows.
The complete example of the Bidirectional LSTM for univariate time series forecasting is listed below.
Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or
differences in numerical precision. Consider running the example a few times and compare the
average outcome.
Running the example predicts the next value in the sequence, which we expect would be 100.
CNN LSTM
A convolutional neural network, or CNN for short, is a type of neural network developed for working
with two-dimensional image data.
The CNN can be very effective at automatically extracting and learning features from one-
dimensional sequence data such as univariate time series data.
A CNN model can be used in a hybrid model with an LSTM backend where the CNN is used to
interpret subsequences of input that together are provided as a sequence to an LSTM model to
interpret. This hybrid model is called a CNN-LSTM.
The first step is to split the input sequences into subsequences that can be processed by the CNN
model. For example, we can first split our univariate time series data into input/output samples with
four steps as input and one as output. Each sample can then be split into two sub-samples, each
with two time steps. The CNN can interpret each subsequence of two time steps and provide a time
series of interpretations of the subsequences to the LSTM model to process as input.
We can parameterize this and define the number of subsequences as n_seq and the number of time
steps per subsequence as n_steps. The input data can then be reshaped to have the required
structure:
For example:
We want to reuse the same CNN model when reading in each sub-sequence of data separately.
This can be achieved by wrapping the entire CNN model in a TimeDistributed wrapper that will apply
the entire model once per input, in this case, once per input subsequence.
The CNN model first has a convolutional layer for reading across the subsequence that requires a
number of filters and a kernel size to be specified. The number of filters is the number of reads or
interpretations of the input sequence. The kernel size is the number of time steps included of each
‘read’ operation of the input sequence.
The convolution layer is followed by a max pooling layer that distills the filter maps down to 1/2 of
their size that includes the most salient features. These structures are then flattened down to a single
one-dimensional vector to be used as a single input time step to the LSTM layer.
Next, we can define the LSTM part of the model that interprets the CNN model’s read of the input
sequence and makes a prediction.
We can tie all of this together; the complete example of a CNN-LSTM model for univariate time
series forecasting is listed below.
Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or
differences in numerical precision. Consider running the example a few times and compare the
average outcome.
Running the example predicts the next value in the sequence, which we expect would be 100.
ConvLSTM
A type of LSTM related to the CNN-LSTM is the ConvLSTM, where the convolutional reading of input
is built directly into each LSTM unit.
The ConvLSTM was developed for reading two-dimensional spatial-temporal data, but can be
adapted for use with univariate time series forecasting.
The layer expects input as a sequence of two-dimensional images, therefore the shape of input data
must be:
For our purposes, we can split each sample into subsequences where timesteps will become the
number of subsequences, or n_seq, and columns will be the number of time steps for each
subsequence, or n_steps. The number of rows is fixed at 1 as we are working with one-dimensional
data.
We can now reshape the prepared samples into the required structure.
We can define the ConvLSTM as a single layer in terms of the number of filters and a two-
dimensional kernel size in terms of (rows, columns). As we are working with a one-dimensional
series, the number of rows is always fixed to 1 in the kernel.
The output of the model must then be flattened before it can be interpreted and a prediction made.
The complete example of a ConvLSTM for one-step univariate time series forecasting is listed below.
Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or
differences in numerical precision. Consider running the example a few times and compare the
average outcome.
Running the example predicts the next value in the sequence, which we expect would be 100.
Now that we have looked at LSTM models for univariate data, let’s turn our attention to multivariate
data.
There are two main models that we may require with multivariate time series data; they are:
The input time series are parallel because each series has an observation at the same time steps.
We can demonstrate this with a simple example of two parallel input time series where the output
series is the simple addition of the input series.
We can reshape these three arrays of data as a single dataset where each row is a time step, and
each column is a separate time series. This is a standard way of storing parallel time series in a CSV
file.
Running the example prints the dataset with one row per time step and one column for each of the
two input and one output parallel time series.
As with the univariate time series, we must structure these data into samples with input and output
elements.
An LSTM model needs sufficient context to learn a mapping from an input sequence to an output
value. LSTMs can support parallel input time series as separate variables or features. Therefore, we
need to split the data into samples maintaining the order of observations across the two input
sequences.
If we chose three input time steps, then the first sample would look as fo
Input:
Output:
That is, the first three time steps of each parallel series are provided as input to the model and the
model associates this with the value in the output series at the third time step, in this case, 65.
We can see that, in transforming the time series into input/output samples to train the model, that we
will have to discard some values from the output time series where we do not have values in the
input time series at prior time steps. In turn, the choice of the size of the number of input time steps
will have an important effect on how much of the training data is used.
We can define a function named split_sequences() that will take a dataset as we have defined it with
rows for time steps and columns for parallel series and return input/output samples.
We can test this function on our dataset using three time steps for each input time series as input.
The first dimension is the number of samples, in this case 7. The second dimension is the number of
time steps per sample, in this case 3, the value specified to the function. Finally, the last dimension
specifies the number of parallel time series or the number of variables, in this case 2 for the two
parallel series.
This is the exact three-dimensional structure expected by an LSTM as input. The data is ready to
use without further reshaping.
We can then see that the input and output for each sample is printed, showing the three time steps
for each of the two input series and the associated output for each sample.
Any of the varieties of LSTMs in the previous section can be used, such as a Vanilla, Stacked,
Bidirectional, CNN, or ConvLSTM model.
We will use a Vanilla LSTM where the number of time steps and parallel series (features) are
specified for the input layer via the input_shape argument.
When making a prediction, the model expects three time steps for two input time series.
We can predict the next value in the output series providing the input values of:
The shape of the one sample with three time steps and two variables must be [1, 3, 2].
We would expect the next value in the sequence to be 100 + 105, or 205.
Running the example prepares the data, fits the model, and makes a prediction.
We may want to predict the value for each of the three time series for the next time step.
Again, the data must be split into input/output samples in order to train a model.
Input:
Output:
The split_sequences() function below will split multiple parallel time series with rows for time steps
and one series per column into the required input/output shape.
We can demonstrate this on the contrived problem; the complete example is listed below.
Running the example first prints the shape of the prepared X and y components.
The shape of X is three-dimensional, including the number of samples (6), the number of time steps
chosen per sample (3), and the number of parallel time series or feature
The shape of y is two-dimensional as we might expect for the number of samples (6) and the number
of time variables per sample to be predicted (3).
The data is ready to use in an LSTM model that expects three-dimensional input and two-
dimensional output shapes for the X and y components of each sample.
Then, each of the samples is printed showing the input and output components of each sample.
Any of the varieties of LSTMs in the previous section can be used, such as a Vanilla, Stacked,
Bidirectional, CNN, or ConvLSTM model.
We will use a Stacked LSTM where the number of time steps and parallel series (features) are
specified for the input layer via the input_shape argument. The number of parallel series is also used
in the specification of the number of values to predict by the model in the output layer; again, this is
three.
We can predict the next value in each of the three parallel series by providing an input of three time
steps for each series.
The shape of the input for making a single prediction must be 1 sample, 3 time steps, and 3 features,
or [1, 3, 3]
We would expect the vector output to be:
We can tie all of this together and demonstrate a Stacked LSTM for multivariate output time series
forecasting below.
Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or
differences in numerical precision. Consider running the example a few
average outcome.
Running the example prepares the data, fits the model, and makes a prediction.
Specifically, these are problems where the forecast horizon or interval is more than one time step.
There are two main types of LSTM models that can be used for multi-step forecasting; they are:
Before we look at these models, let’s first look at the preparation of data for multi-step forecasting.
Data Preparation
As with one-step forecasting, a time series used for multi-step time series forecasting must be split
into samples with input and output components.
Both the input and output components will be comprised of multiple time steps and may or may not
have the same number of steps.
We could use the last three time steps as input and forecast the next two time steps.
Input:
Output:
The split_sequence() function below implements this behavior and will split a given univariate time
series into samples with a specified number of input and output time steps.
We can demonstrate this function on the small contrived dataset.
Running the example splits the univariate series into input and output time steps and prints the input
and output components of each.
Now that we know how to prepare data for multi-step forecasting, let’s look at some LSTM models
that can learn this mapping.
This approach was seen in the previous section were one time step of each output time series was
forecasted as a vector.
As with the LSTMs for univariate data in a prior section, the prepared samples must first be
reshaped. The LSTM expects data to have a three-dimensional structure of [samples, timesteps,
features], and in this case, we only have one feature so the reshape is straightforward.
With the number of input and output steps specified in the n_steps_in and n_steps_out variables, we
can define a multi-step time-series forecasting model.
Any of the presented LSTM model types could be used, such as Vanilla, Stacked, Bidirectional,
CNN-LSTM, or ConvLSTM. Below defines a Stacked LSTM for multi-step forecasting.
The model can make a prediction for a single sample. We can predict the next two steps beyond the
end of the dataset by providing the input:
As expected by the model, the shape of the single sample of input data when making the prediction
must be [1, 3, 1] for the 1 sample, 3 time steps of the input, and the single feature.
Tying all of this together, the Stacked LSTM for multi-step forecasting with a univariate time series is
listed below.
Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or
differences in numerical precision. Consider running the example a few times and compare the
average outcome.
Running the example forecasts and prints the next two time steps in the sequence.
Encoder-Decoder Model
A model specifically developed for forecasting variable length output sequences is called the
Encoder-Decoder LSTM.
The model was designed for prediction problems where there are both input and output sequences,
so-called sequence-to-sequence, or seq2seq problems, such as translating text from one language
to another.
As its name suggests, the model is comprised of two sub-models: the encoder and the decoder.
The encoder is a model responsible for reading and interpreting the input sequence. The output of
the encoder is a fixed length vector that represents the model’s interpretation of the sequence. The
encoder is traditionally a Vanilla LSTM model, although other encoder models can be used such as
Stacked, Bidirectional, and CNN models.
The decoder uses the output of the encoder as an input.
First, the fixed-length output of the encoder is repeated, once for each required time step in the
output sequence.
This sequence is then provided to an LSTM decoder model. The model must output a value for each
value in the output time step, which can be interpreted by a single output model.
We can use the same output layer or layers to make each one-step prediction in the output
sequence. This can be achieved by wrapping the output part of the model in a TimeDistributed
wrapper.
The full definition for an Encoder-Decoder model for multi-step time series forecasting is listed below.
As with other LSTM models, the input data must be reshaped into the expected three-dimensional
shape of [samples, timesteps, features].
In the case of the Encoder-Decoder model, the output, or y part, of the training dataset must also
have this shape. This is because the model will predict a given number of time steps with a given
number of features for each input sample.
The complete example of an Encoder-Decoder LSTM for multi-step time series forecasting is listed
below.
Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or
differences in numerical precision. Consider running the example a few times and compare the
average outcome.
Running the example forecasts and prints the next two time steps in the sequence.
It is possible to mix and match the different types of LSTM models presented so far for the different
problems. This too applies to time series forecasting problems that involve multivariate and multi-
step forecasting, but it may be a little more challenging.
In this section, we will provide short examples of data preparation and modeling for multivariate
multi-step time series forecasting as a template to ease this challenge, specifically:
1. Multiple Input Multi-Step Output.
2. Multiple Parallel Input and Multi-Step Output.
Perhaps the biggest stumbling block is in the preparation of data, so this is where we will focus our
attention.
For example, consider our multivariate time series from a prior section:
We may use three prior time steps of each of the two input time series to predict two time steps of
the output time series.
Input:
Output:
We can see that the shape of the input portion of the samples is three-dimensional, comprised of six
samples, with three time steps, and two variables for the 2 input time series.
The output portion of the samples is two-dimensional for the six samples and the two time steps for
each sample to be predicted.
The prepared samples are then printed to confirm that the data was prepared as we specified.
We can now develop an LSTM model for multi-step predictions.
A vector output or an encoder-decoder model could be used. In this case, we will demonstrate a
vector output with a Stacked LSTM.
Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or
differences in numerical precision. Consider running the example a few times and compare the
average outcome.
It is a challenging framing of the problem with very little data, and the arbitrarily configured version of
the model gets close.
For example, consider our multivariate time series from a prior section:
We may use the last three time steps from each of the three time series as input to the model and
predict the next time steps of each of the three time series as output.
Input:
Output:
Running the example first prints the shape of the prepared training dataset.
We can see that both the input (X) and output (Y) elements of the dataset are three dimensional for
the number of samples, time steps, and variables or parallel time series respectively.
The input and output elements of each series are then printed side by side so that we can confirm
that the data was prepared as we expected.
We can use either the Vector Output or Encoder-Decoder LSTM to model this problem. In this case,
we will use the Encoder-Decoder model.
We would expect the values for these series and time steps to be as follows:
Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or
differences in numerical precision. Consider running the example a few times and compare the
average outcome.
We can see that the model forecast gets reasonably close to the expected values.
Further Reading
Long short-term memory, Wikipedia.
Deep Learning for Time Series Forecasting (my book)
Summary
In this tutorial, you discovered how to develop a suite of LSTM models for a range of standard time
series forecasting problems.
How to Develop Convolutional Neural Network Models for Time Series Forecasting
How to Grid Search Deep Learning Models for Time Series Forecasting
REPLY
Jenna Ma November 16, 2018 at 12:09 am #
REPLY
Jenna Ma November 16, 2018 at 7:27 pm #
REPLY
Jason Brownlee November 17, 2018 at 5:45 am #
Great!
REPLY
Ahmed Saif February 11, 2021 at 9:51 pm #
Thank you
REPLY
maria October 8, 2019 at 8:52 pm #
Hi!
i would like to cite your book “Deep Learning for Time Series Forecasting: Predict the Future
with MLPs, CNNs and LSTMs in Python.” Is there an appropriate format for doing this?
REPLY
Jason Brownlee October 9, 2019 at 8:12 am #
REPLY
H October 31, 2019 at 2:31 am #
Hi Jason,
I want please an example of Sliding window-based support vector regression for prediction.
have you this example .
Thanks a lot
REPLY
chi August 27, 2019 at 7:07 am #
Hello Jason,
Thank you so so much for your post, it was super helpful. For the multiple timesteps output LSTM
model, I am wondering what will be the difference of the performance between model-1 and
model-2? Model-1 is your multiple timesteps output LSTM model, for example, we input last 7
days data features, and the output is the next 5 days prices. Model-2 is the simple 1-timstep
output LSTM model, where the input is last 7 days data features, output is the next day price.
Then we use our predicted price as the new input to predict future prices until we predict all next 5
days prices.
I am wondering what are the key differences between those 2 strategies to predict the next 5
days prices? What are the advantages and disadvantages of those 2 LSTM models?
Thank you,
REPLY
Jason Brownlee August 27, 2019 at 2:06 pm #
Good question, the differences really depend on the choice of model and complexity
of the dataset.
REPLY
Rick March 28, 2020 at 5:45 pm #
Hey Jason,
Thanks for the blogs. They are really helpful and I have learned a lot from
machinelearningmastery.
This blog about LSTM is very informative, but I have a question
I have a set of amplitude scans, and I want to predict next scan (many to one problem). So my
data is of (6,590) and the result should be (1,590). 590 are the amplitude values in the scan.
Thanks
You’re welcome.
Try it and see. This framework will help:
https://machinelearningmastery.com/how-to-develop-a-skilful-time-series-forecasting-model/
REPLY
Amy November 16, 2018 at 7:17 am #
Thanks Jason for this good tutorial. I have a question. When we have two different time
series, 1 and 2. Time series 1 will influence time series 2 and our goal is to predict the future value of
time series 2. How can we use LSTM for this case?
REPLY
Jason Brownlee November 16, 2018 at 1:55 pm #
I call this a dependent time series problem. I given an example of how to model it on this
post:
https://machinelearningmastery.com/how-to-develop-lstm-models-for-time-series-forecasting/
REPLY
Amy November 16, 2018 at 2:21 pm #
The link is the link of the current page, Do you mean that?
REPLY
Jason Brownlee November 17, 2018 at 5:41 am #
REPLY
Kwan November 22, 2018 at 8:03 pm #
Thanks Jason for this good tutorial, I have read your tutorial for a long time , I have a
question. How to use LSTM model forecasting Multi-Site Multivariate Time Series, such as EMC Data
Science Global Hackathon dataset, thank you very much!
REPLY
Jason Brownlee November 23, 2018 at 7:48 am #
REPLY
Jason Brownlee November 29, 2018 at 2:40 pm #
Sounds like the model has learned a persistance model and may not be skillful.
REPLY
Sudrit Saisa-ing August 9, 2019 at 8:48 pm #
Thank you
REPLY
Jason Brownlee August 10, 2019 at 7:16 am #
If your model is predicting a class label, you can specify the accuracy measure
as a metric in the call to compile() then use the evaluate() model to calculate the
accuracy.
You can learn how for an MLP in this post which will be the same for an LSTM:
http://machinelearningmastery.com/tutorial-first-neural-network-python-keras/
REPLY
WLF December 5, 2018 at 6:04 pm #
REPLY
Jason Brownlee December 6, 2018 at 5:51 am #
REPLY
rkk621 December 6, 2018 at 2:27 am #
For the Multiple input case in Multivariate series, if we use three time steps and
10,15
20,25
30,35
as our inputs, shouldn’t the output (predicted val used for training) be
85
instead of 65?
REPLY
Jason Brownlee December 6, 2018 at 5:57 am #
In the chosen framing of the problem, we want to predict the output at t not t+1, given
inputs up to and including t.
You can choose to frame the problem differently if you like. It is arbitrary.
REPLY
WLF December 6, 2018 at 11:02 pm #
if you want to predict 85, you can change the code to:
REPLY
Ida December 10, 2018 at 7:55 pm #
REPLY
John December 12, 2018 at 9:21 am #
Hi Jason,
Thanks for this nice blog! I am new to LSTM in time-series, and I need your help.
Most info on internet is for a single time series and for next-step forecasting. I want to produce 6
months ahead forecast using previous 15 months for 100 different time series, each of length 54
months.
So, there is 34 windows for each time-series if we use sliding windows. So, my initial X_train has a
shape of (3400,15). Then. I am reshaping my X_train [samples, timesteps, features] as follows:
(3400, 15, 1). Is this reshaping correct? In genera, how can we choose “timesteps” and “features”
arguments in this multi-input multi-step forecast?
Also, how can I choose “batch_size” and “units”? Since I want 6 months ahead forecast, my output
should be a matrix with dimensions (100,6). I chose units=6, and batch_size=1. Are these numbers
correct?
REPLY
Jason Brownlee December 12, 2018 at 2:14 pm #
Looks good.
Time steps is really problem specific – e.g. how much history do you need to make a prediction.
Perhaps test with your data.
Batch size and units – again, depends on your problem. Test. 6 units is too few. Start with 100, try
500, 1000, etc. Batch size of 1 seems small, perhaps also try 32, 64, etc.
REPLY
John December 13, 2018 at 2:06 am #
Hi Jason,
I don’t understand “6 units is too few”. In documentation of lstm functions in R, units is defined
as “dimensionality of the output space”. Since I need an output with 6 columns (6 months
forecast), I define units=6. Any other number does not produce the output I want. Is there
anything wrong in my interpretation?
REPLY
Jason Brownlee December 13, 2018 at 7:55 am #
I recommend using a Dense layer as the output rather than the outputting from
the LSTM directly.
Then dramatically increase the capacity of the model by increasing the number of LSTM
units.
REPLY
Ravi Varma Injeti December 18, 2019 at 12:55 am #
Hii Jason that’s great tutorial. I have time series data of the size 2245 where timings of
bus from starting station to destination station. I want to find the pattern is it possible through
LSTM WITHOUT THE CATEGORICAL RESPONSES.
REPLY
Jason Brownlee December 18, 2019 at 6:08 am #
REPLY
Shaifali December 16, 2018 at 1:21 am #
Bidirectional LSTM works better than LSTM. Can you please explain the working of
bidirectional LSTM. Since we do not know future values. How do we do prediction?
REPLY
Jason Brownlee December 16, 2018 at 5:23 am #
It has two LSTM layers, one that processes the sequences forwards, and one that
processes it backwards.
REPLY
Jenna Ma December 16, 2018 at 4:24 pm #
In the last encoder-decoder model, if I have different features of input and output, is it correct
that I change the code like this?
model = Sequential()
model.add(LSTM(200, activation=’relu’, input_shape=(n_steps_in, n_features_in)))
model.add(RepeatVector(n_steps_out))
model.add(LSTM(200, activation=’relu’, return_sequences=True))
model.add(TimeDistributed(Dense(n_features_out)))
model.compile(optimizer=’adam’, loss=’mse’)
REPLY
Jason Brownlee December 17, 2018 at 6:19 am #
REPLY
Jenna Ma December 18, 2018 at 1:50 pm #
REPLY
Jason Brownlee December 18, 2018 at 2:36 pm #
It looks correct, but I don’t have the capacity to test the code to be sure.
Nice work! Now you can start tuning the model to lift skill.
REPLY
dani December 19, 2018 at 12:55 pm #
how we can test these examples if have big excel data set?and its time series data, kindly
refer to a link?
REPLY
Jason Brownlee December 19, 2018 at 2:30 pm #
Save as a CSV file then use code in this post to prepare it for modeling:
https://machinelearningmastery.com/convert-time-series-supervised-learning-problem-python/
REPLY
mk December 20, 2018 at 7:13 pm #
REPLY
Jason Brownlee December 21, 2018 at 5:27 am #
REPLY
Lionel December 21, 2018 at 6:16 pm #
I want to predict visibility on one airport for the next 120 hours.
I already build a LSTM to predict the visibility for the next hour, solely based on visibility observation.
(Basically, the network learned that persistance is a good algorithm.)
REPLY
Jason Brownlee December 22, 2018 at 6:03 am #
REPLY
Lionel December 22, 2018 at 7:21 pm #
let’s say:
Input : last 120 h of measured visibility
weather forcast for the next 120 h
Implementation:
make visibility prediction every hour for the next 120 h
I have trouble to see how the LSTM will update its state every hour, since it will only get as
new information a measured visibility for the last hour, and not about the full 120 h prediction.
REPLY
Jason Brownlee December 23, 2018 at 6:04 am #
The model is only aware of the data that you provide it.
REPLY
Potofski December 22, 2018 at 3:48 am #
Thanks a lot for your post. Your work is a great resource on forecasts with lstm!
Assume, I have dependent time series (heating costs and temperature) and I want to predict the
dependent (heating costs), how could I implement temperature predictions (from other weather
forecasts) into my model for heating cost predictions?
Do you know of any common approaches to this? Or any papers on how to handle external forecasts
for independent variables?
REPLY
Jason Brownlee December 22, 2018 at 6:07 am #
REPLY
Jenna Ma January 4, 2019 at 9:35 pm #
Hi Jason,
I think I saw you mentioning the activation function ‘relu’ usually works better than ‘tanh’ in LSTM
model. But, I forget I saw this in which post. I don’t find any post from your blog that focuses on how
to choose the activation function. So, I submit this question under this post and hope you don’t mind.
Is it true that ‘relu’ often works better than ‘tanh’ in your experience? If you have any post talking
about activation function, please give me the title or URL.
Thank you very much!
REPLY
Jason Brownlee January 5, 2019 at 6:55 am #
It really depends on the dataset, I have found LSTMs with relu more robust on some
problems.
REPLY
Jenna Ma January 7, 2019 at 12:50 am #
Thank you! So, the way I can make sure which activation function is the best for my dataset
is to enumerate and see the results?
REPLY
Jason Brownlee January 7, 2019 at 6:37 am #
REPLY
Matt January 9, 2019 at 3:21 am #
All the content on your site is amazing, I really appreciate it. Thank you.
REPLY
Jason Brownlee January 9, 2019 at 8:47 am #
Thanks!
REPLY
Andrew Jabbitt January 10, 2019 at 4:23 am #
Hi Jason,
1 question: can you please explain the purpose of the out_seq series in the Multiple Parallel Series
example?
Many thanks,
Andrew
REPLY
Jason Brownlee January 10, 2019 at 7:57 am #
REPLY
Andrei February 20, 2020 at 2:59 am #
Correct me if I’m wrong, but isn’t the prediction the output? I mean, besides the way
you obtained the out_seq sequence in the first place, it’s no different than in_seq1 or in_seq2.
It could even be considered an engineered feature that expands the data.
REPLY
Jason Brownlee February 20, 2020 at 6:19 am #
REPLY
sophia January 22, 2019 at 8:29 am #
another great article, Jason! I’m trying to get started on a project that is similar to the LSTM
model described in this article: https://medium.com/bcggamma/using-deep-learning-to-predict-not-
just-what-but-when-fae6515acb1b
I’d greatly appreciate your input on how to develop an LSTM model that can predict ‘what’ a
consumer may buy and ‘when’ they will buy it;
Based on your article, it looks like the right model to choose would be Multiple Parallel Input and
Multi-Step Output. Would you agree or do you think i should choose a different model? Any pointers
or links to relevant articles would help!
Thanks,
REPLY
Jason Brownlee January 22, 2019 at 11:42 am #
I’d encourage you to prototype and explore a suite of different framings of the problem in
order to discover what works best for your specific dataset.
REPLY
Raman January 22, 2019 at 3:17 pm #
I have used your code to get started, at the last step I am getting a below error-
NameError: name ‘to_list’ is not defined
REPLY
Jason Brownlee January 23, 2019 at 8:43 am #
Ensure you have copied all of the code from the tutorial, I have more suggestions here:
https://machinelearningmastery.com/faq/single-faq/why-does-the-code-in-the-tutorial-not-work-for-
me
REPLY
Raman January 23, 2019 at 4:09 pm #
Hi Jason,
Thanks for taking time out, I have copied your code line by line and checked couple of times as well.
Example is from Vanila LSTM.
Checks done-
I was getting some error, then I followed stack overflow and downgraded my keras to Version: 2.1.5
I searched stack overflow and related questions and even posted my questions there.
REPLY
Jason Brownlee January 24, 2019 at 6:38 am #
REPLY
Sarra January 30, 2019 at 3:02 am #
Please, have you an example of LSTM encoder-decoder with the train / test-evaluation
partitions.
# reshape
# fit model
model.fit(trainX, trainy, epochs=5, verbose=2)
# make predictions
trainPredict = model.predict(trainX)
testPredict = model.predict(testX)
I may, you can use the search box to look at all tutorials that use the encoder-decoder
pattern.
REPLY
Gunay February 6, 2019 at 2:43 am #
Hi Jason,
Thanks for this tutorial. I am quite new to the time series forecasting with LSTM. I have a question
about the part “Multiple Parallel Input and Multi-Step Output”. The output data shape is (5,2,3). I mean
the each instance on the output is not just a sequence, It is a sequence of sequence. And you have
show the example there with Encoder and Decoder. I just want to implement one of the methods of
Stacked or Bidirectional LSTM. But I am not sure which number I should put the Dense layer. For
example, in the previous examples, the output shape is like (6,2) and It is obvious we should put 2 for
the Dense layer. But I can not figure out the right thing for the Stacked LSTM. Do you have any
example tutorial for this?
Kind Regards,
Gunay
REPLY
Jason Brownlee February 6, 2019 at 7:51 am #
With multi-step output, the number of nodes in the output layer must match the number
of output time steps.
With multivariate multi-step, a vanilla or bidirectional LSTM is not suited. You could force it, but
you will need n x m nodes in the output for n time steps for m time series. The time steps of each
series would be flattened in this structure. You must interpret each of the outputs as a specific
time step for a specific series consistently during training and prediction.
REPLY
Gunay February 6, 2019 at 7:27 pm #
Thank you!
REPLY
Gunay February 6, 2019 at 7:29 pm #
Is there any alternative structure for this kind of problems except Encoder-Decoder?
Jason Brownlee February 7, 2019 at 6:37 am #
Yes, the one I described. There may be others, it is good to brainstorm and prototype
approaches.
REPLY
Tian February 10, 2019 at 5:29 pm #
Thanks for your great tutorial. I just wonder should we avoid using bidirectional LSTM for
time series data? Does it mean we use future data to train the past model parameters?
REPLY
Jason Brownlee February 11, 2019 at 7:56 am #
No, it means the model will process the input sequence forwards and backwards at the
same time.
REPLY
Gunay February 15, 2019 at 8:56 am #
Hi Jason,
I faced one problem and just interesting maybe you did it before. I have the forecasting problem as
like Multiple Input Multi-Step Output but a little bit different. Let’s just assume, my input(which are
features dataset) and output (target we want to forecast) datasets have historic data. And I should
forecast one week ahead for the target. But I have also the one week ahead forecasted input
dataset(which is forecasted by another system). I should use both the historic input and one week
ahead forecasted input to forecast one week ahead output. But I do not know how I should use that
one week ahead forecasted input data during the learning process. Can you give me any hint?
REPLY
Jason Brownlee February 15, 2019 at 2:20 pm #
Perhaps use a model with two heads, one for the historical data and one for the other
forecast?
The functional API will help you to design a model of this type:
https://machinelearningmastery.com/keras-functional-api-deep-learning/
REPLY
Anirban February 15, 2019 at 4:08 pm #
What if we want to predict anything for the next 20 upcoming days! Here sequentially we
have to predict for 20 days. How can we apply LSTM here?
REPLY
Jason Brownlee February 16, 2019 at 6:13 am #
Yes, although the further you predict into the future, the more error the model will make.
This is called multi-step forecasting, there are many examples, perhaps start here:
https://machinelearningmastery.com/start-here/#deep_learning_time_series
REPLY
Aaron March 6, 2019 at 2:47 pm #
HI Jason, thanks for all the tutorials. They are really helpful. I am looking to try and
implement an LSTM that returns a sequence, and had read this tutorial –
https://machinelearningmastery.com/multi-step-time-series-forecasting-long-short-term-memory-
networks-python/
One thing I am having trouble understanding is how to really shape the input data and get a sequence
output using Tensorflow / Keras. I am looking to predict the sequence T – T+12 hours using T-1 –
T-48 hours. So predicting the next 12 hours from the last 48 hours in 1 hour increments. Each hour of
data has a dozen or so features for that time step. From what I have read of yours so far it seems as
if each of the 48 previous time steps should be considered features of the time step T to predict a
sequence for the next 12 hours. And so basically, from what I gather, I would end up with the input for
Timestep T having 576 columns (48 time steps, each with 12 features) – I mean does that seem
right? I am also a bit unsure of what particular model I should use… is it going to be a multi-step,
multi-input network… just a bit confused on the jargon as well and maybe thats why I’m having
trouble figuring out what I need to do.
Looking at some of your books too, but not sure what might be the right one to help guide me through
a problem like this.
Thanks,
Aaron
REPLY
Jason Brownlee March 6, 2019 at 2:49 pm #
REPLY
Aaron March 6, 2019 at 11:59 pm #
Thanks! That definitely makes sense now from the input shape standpoint. If I have
20 samples with 48 timesteps and 12 features the input shape would be [20, 48, 12]
For the output however, looking through the Keras docs https://keras.io/layers/recurrent/, I am
trying to get a return sequence. Would I be using a 3D tensor? (batch_size, timesteps, units)
where it would look like (20, 12, 1)? Since I am trying to find 1 value at each of the 12 time
steps for the sample size of 20
Thanks again!
Aaron
REPLY
Jason Brownlee March 7, 2019 at 6:52 am #
I don’t recommend returning a sequence from the LSTM itself, instead use an
encoder-decoder model:
https://machinelearningmastery.com/start-here/#lstm
Why don’t you recommend returning a sequence from the LSTM? If I was
using the below encoder-decoder model from another one of your posts, what would
the output of the first LSTM be?
model = Sequential()
model.add(LSTM(…, input_shape=(…)))
model.add(RepeatVector(…))
model.add(LSTM(…, return_sequences=True))
model.add(TimeDistributed(Dense(…)))
Generally the output sequence from an LSTM is the activation of the nodes
from each step in the input sequence. It is unlikely to capture anything meaningful.
It is better to interpret these activations or the final activations with more LSTM or
Dense layers, and the output a sequence of the same or different lengths using a
separate model.
Hi there,
I love this tutorial, all of your tutorials actually but this one I have found the most
helpful. Questions about the MIMO LSTM output shape has come up a few times,
and I am also having trouble with it.
Keras complains that it is expecting the dense layer to have 2 dimensions, but I am
passing it an array with shape (n_samples,n_steps_out,n_features)
REPLY
Abderrahim March 15, 2019 at 5:20 pm #
Hi Jason,
I have a question: are LSTM suitable for predicting based on a test set with the same nature of inputs
as of train set ? Like in other cases of prediction where you will be having input signals in train set,
that the model will work on. plus the memory based on the fact that entries are ordered.
I trained an LSTM on a CNN model acting on ordered images, to predict a timeserie. on test set I
have the following ordered set of images by time. I guess there is no concept of horizon here, how
should I improve my model, and what starting point in predicting test set in this case?
Many thanks.
REPLY
Jason Brownlee March 16, 2019 at 7:48 am #
I would recommend modeling the raw time series directly, instead of images of the time
series.
REPLY
Tayson March 26, 2019 at 12:06 am #
Hello Jason,
[[
[147.56306 167.8626 312.92883]
[185.38152 205.36024 385.96536] ] ]
Best regards,
Tayson
REPLY
Chris March 26, 2019 at 1:27 am #
Hi Jason,
How would you handle building the LSTM model for time series data with irregular time intervals (e.g.
Jan 1, Jan 2, Jan 4, Jan 7, Jan 13, Jan 14, etc…)?
You could fill the “missing” days with zeros or impute them with, say, the mean of the last 3 values,
but I would like to know how to make the LSTM model without filling/imputing the time series data.
How would you handle this?
REPLY
Jason Brownlee March 26, 2019 at 8:10 am #
Yes, I would try many approaches and compare results, such as:
– model as is
– normalize interval with padding
– upsample/downsample to new intervals
– etc.
REPLY
neb August 21, 2019 at 7:17 am #
Are the various combination of models above able to cope when the number of time-steps per
each Sample is variable?
REPLY
Jason Brownlee August 21, 2019 at 1:57 pm #
Yes, you can either pad all samples to the same length or use a dynamic RNN.
Assumptions of the model hold for both cases.
REPLY
Jason Brownlee March 27, 2019 at 9:05 am #
Perhaps perform a sensitivity analysis of the model to see how history impacts model
performance.
REPLY
Ron March 27, 2019 at 1:16 pm #
Thanks Jason! If the history has distinct patterns for each quarter, should we have 3
months in each row? How would the results differ when we keep 12 months on each row
versus 3 months on each row versus 1 month on each row?
REPLY
Jason Brownlee March 27, 2019 at 2:07 pm #
REPLY
Peter March 28, 2019 at 5:57 am #
Hi Jason,
I am trying to predict high and low value of a time series in next X days, my output layer in RNN is :
model.add(Dense(2, activation=’linear’))
so basically output vector is [y_high, y_low], the model works pretty well however it sometimes
outputs y_low > y_high, which of course doesn’t make any sense, is there a way to enforce model so
that condition y_high >= y_low is always met.
REPLY
Jason Brownlee March 28, 2019 at 8:24 am #
Interesting, perhaps you simplify the problem and predict a value in a discrete ordinal
interval, e.g. each category is a blocks of values?
REPLY
Jason Brownlee March 29, 2019 at 8:42 am #
REPLY
Joe March 29, 2019 at 2:43 am #
Hi Jason, a colleague and I are thinking of trying an LSTM model for time series forecasting.
We are faced with over a thousand potential predictors, and would like to select only a smaller
number for the final model. In particular, I have recently become fascinated by SHAP values; e.g., see
this informal blog post by Scott Lundberg himself, in the context of XGBoost.
https://towardsdatascience.com/interpretable-machine-learning-with-xgboost-9ec80d148d27
Tantalizingly, Scott L. demonstrates SHAP values in the context of an LSTM model here:
https://slundberg.github.io/shap/notebooks/deep_explainer
/Keras%20LSTM%20for%20IMDB%20Sentiment%20Classification.html
But that is using text input (sentiment classification in the IMDB data set), which involves an
Embedding layer just before the LSTM layer. For a non-text problem like time series forecasting, we
would exclude the Embedding layer. But doing so breaks the code.
Do you have any suggestions how SHAP values might be used in the context of LSTMs for time
series forecasting (not text processing)? If not, do you have any suggestions for feature selection in
that context?
Thanks!
REPLY
Jason Brownlee March 29, 2019 at 8:42 am #
REPLY
Sam April 14, 2020 at 6:31 pm #
Hi, Joe. I am running into the exact same topic. Have you found a way to implement
SHAP to multivariate timeseries forecasting?
REPLY
Hsin March 29, 2019 at 5:26 pm #
Hi Jason,
Thanks for this useful tutorial.
I am confused to inverse scaling of my data after splitting it into the form:
x(data_length, n_step, feature)
Because the scaler only can be used in 2D condition.
What I want to do is evaluate rmse between prediction and true values, so I have to
inverse transform data. Could you please tell me how to deal with this problem?
REPLY
Jason Brownlee March 30, 2019 at 6:23 am #
REPLY
Pratik March 30, 2019 at 12:12 am #
Hi Jason,
Firstly, I must say you have a fabulous chunk of articles on ML/DL. Thanks for helping out the
community at large.
Coming to LSTMs, I am stuck in one problem from last few days. Here is how it goes –
I have 3 columns namely customer id and basket_index and timestamp. For every customer, each
row represents one time stamp. Lets say there are 3 customers with variable time stamps. First one is
having 30 time stamps, 2nd is having 25 and 3rd is having 50. So, the total number of rows are 105.
Now for the column basket index, each row signifies a list of product keys bought by any customer on
a particular timestamp. Here is the snapshot of the dataset –
3) Also, I am creating an embedding layer for both customer and product keys (taking mean for every
basket). How to specify how many steps back does every time series look in such cases?
4) How should I specify batch size in this case?
REPLY
Huiping March 30, 2019 at 1:13 am #
One question hopes to get your guide: For a LSTM work, we can’t stop on say the model is good but
most important is how to use the good model outcome.
For example flu or not for patients. Now I want to predict the flu for future half year (Jun-2019 to
Dec-2019) but what I have is history data (I have past 4 years those people’s flu data and target on
that model is half year from 6-1-2018 to 12-31-2018).
Can I get a list of important features from the history model with some value(like a weight) and apply
this to my future data?
Or can i get the list of important feature from a good fit LSTM model and those features are important
than other features?
REPLY
Jason Brownlee March 30, 2019 at 6:30 am #
REPLY
Jeyson Hernández April 3, 2019 at 11:40 am #
Hi Jason,
Amazing work! Thanks sharing us your knowledge, this tutorial was so helpfull.
I’m new in ML/DL, i’m trying to predict sales in a company for future six months using LSTM. But i
have an issue, i’m not sure about how to get more than 1 next step from your code using just one x
vector by input. I’m using a monthly time step
Could you help me to understand a little bit better how to get it?
Jason Brownlee April 3, 2019 at 4:12 pm #
There are many examples of multi-step forecasting that you can use, including in the
above tutorial.
REPLY
Md. Abul Kalam Azad April 4, 2019 at 3:05 pm #
Dear Sir,
Thanks for your sharing example. I have collected traffic information like (Road property, weather,
datetime,adjacent road speed, target road speed and more) for predicting road speed. Currently, I
have prepared my code using Vanilla LSTM model for one step as well as multi-step-ahead
prediction. Can you suggest me for which below model will be best for road speed prediction with
higher accuracy?
Models are:
Data Preparation
Vanilla LSTM
Stacked LSTM
Bidirectional LSTM
CNN LSTM
ConvLSTM
Thanks,
Azad
REPLY
Jason Brownlee April 5, 2019 at 6:11 am #
I recommend testing a suite of model types and model configurations in order to discover
what works best for your specific dataset, you can learn more here:
https://machinelearningmastery.com/how-to-develop-a-skilful-time-series-forecasting-model/
REPLY
Fazano April 8, 2019 at 8:58 pm #
hi Jason, im using vanilla LSTM for forecasting,and i want to forecast 10 days ahead using
this code
REPLY
Jason Brownlee April 9, 2019 at 6:23 am #
REPLY
Sher April 13, 2019 at 2:41 am #
Hi, do you have any tips for implementing univariate ConvLSTM for two-dimensional spatial-
temporal data? I’m trying to input 10 time steps of 55 x 55 images for single-step time series
forecasting.
REPLY
Jason Brownlee April 13, 2019 at 6:38 am #
https://machinelearningmastery.com/how-to-develop-lstm-models-for-multi-step-time-series-
forecasting-of-household-power-consumption/
REPLY
Jason Brownlee April 13, 2019 at 6:44 am #
REPLY
Lloyd April 19, 2019 at 6:23 am #
I have a question, is there a model where the outputs can influence each other?
I.e. you have multiple sequences all which move independently but can influence the others?
Thank you
REPLY
Jason Brownlee April 19, 2019 at 3:03 pm #
Thanks.
Yes, an encoder-decoder model that outputs a time step for each series in concert might be such
an approach.
REPLY
Jules April 19, 2019 at 10:07 pm #
Awesome. Great Explanation as always. I have always got rather frustrated and confused
over the shape of data going into Keras models. So I relied upon your tutorials to make it clear.
Anyway using your examples I have been able demonstrate use of LSTM in predicting simple 2-D
ballistics prediction calculations. I have used your code to help me here.
https://github.com/JulesVerny/BallisticsRNNPredictions
REPLY
Kishore April 19, 2019 at 11:48 pm #
Dear Prof,
Imagine I have raw text containing only words ‘N1,N2,N3,………….,N1000’ in a shuffled format , i.e, 1
million words, each of which can belong to any of these 1000 words.
I want to select the number of time steps =5, and predict the next word.
Eg: An input of [N1,N6,N5,N88,N32] would be followed by ‘N73′.
Now, assume that I have tokenized all the 1000 possible words into numbers.
REPLY
Jason Brownlee April 20, 2019 at 7:39 am #
If the words are shuffled, then there would be no structure for a model to learn.
REPLY
HK April 23, 2019 at 8:06 pm #
Dear Jason!
I’m trying to use stacked lstm for this problem – Multiple Parallel Input and Multi-Step Output.
However I’m not sure how the final Dense layer should look like. Could you give me some hints,
please?
REPLY
Jason Brownlee April 24, 2019 at 7:57 am #
Perhaps start with the example in the above post and then add an additional LSTM
layer?
REPLY
HK April 29, 2019 at 6:11 am #
Which example do you mean? I can’t find any example for Multiple parallel input and
multi step output LSTM, which uses stacked LSTM layers instead of encoder decoder.
REPLY
Jason Brownlee April 29, 2019 at 8:28 am #
REPLY
Raman Singh April 25, 2019 at 8:27 am #
Could you please tell how can we add hyperparameters for tuning “Forget Gate”, Input Gate” and
“Output Gate” in LSTM compile or fit methods or is it done internally and we can’t control these
gates?
REPLY
Jason Brownlee April 25, 2019 at 8:28 am #
REPLY
will April 25, 2019 at 1:36 pm #
REPLY
Jason Brownlee April 25, 2019 at 2:45 pm #
I believe there is a few multi-time step models listed above that will provide a good
starting point.
REPLY
John April 25, 2019 at 4:33 pm #
Hi Jason,
REPLY
Jason Brownlee April 26, 2019 at 8:29 am #
REPLY
Shiva April 26, 2019 at 10:59 pm #
Hi jason,
REPLY
Jason Brownlee April 27, 2019 at 6:32 am #
REPLY
Raghu April 28, 2019 at 3:45 pm #
Hi Jason,
Thanks for the very informative tutorial. Can you please throw more light on how to come up with
confidence intervals for the predicted value
REPLY
Jason Brownlee April 29, 2019 at 8:16 am #
Hi Jason
Thanks for your helpful tutorial
Could you please tell how can we predict the futures that we don’t have its data available
for example, I finalized my LSTM model, how can I predict the values on 2050
REPLY
Jason Brownlee April 29, 2019 at 8:22 am #
REPLY
will April 28, 2019 at 11:41 pm #
Thanks for the article.However, I have a problem that every prediction results are different,
such as Multiple Parallel Series,The first time is [[101.25582 106.49429 207.8928 ]],The second
time it became [[101.82945 107.527626 209.8016 ]],Why is this?
thanks
REPLY
Jason Brownlee April 29, 2019 at 8:22 am #
REPLY
shiva April 29, 2019 at 4:39 am #
now say such systems are in paralell(one above another, say2) and series(say 2, the final outlet from
each parallel series join at the final output.) (Total 4 buckets).
Perhaps.
You can use this framework to explore different framings of your problem:
http://machinelearningmastery.com/how-to-define-your-machine-learning-problem/
REPLY
Ali April 29, 2019 at 7:53 pm #
Hello Jason,
to the step:
To the step:
I don´t have a right mathematical equation. I want to predict the output without any knowledge about
the relationship between the input signals.
Kind regards
Ali
REPLY
Jason Brownlee April 30, 2019 at 6:53 am #
Hi Jason,
I was interested in the example you have provided for multi-variate version of LSTM. You have
provided an example of a simple addition case. How can this be extended to instances where there
are multiple inputs, but an exact relation between the inputs are not known even though it is known
that the inputs are correlated? Thanks much for your guidance!
REPLY
Jason Brownlee April 30, 2019 at 6:54 am #
The model will learn the relationship, addition was just for demonstration.
REPLY
Sree May 1, 2019 at 9:34 am #
REPLY
Jason Brownlee May 1, 2019 at 2:09 pm #
Observations across multiple input time series are aligned to time steps based on their
time of observation.
REPLY
Sree May 2, 2019 at 8:17 pm #
Cheers,
Sree.
I have tried using the split_sequences method from the encoder/decoder example with the vector
output example and the dimensions dont work out. I end up with a value error
ValueError: Error when checking target: expected dense_2 to have 2 dimensions, but got array with
shape (5, 2, 3)
I greatly appreciate your help, I have been struggling with this for a while. I would imagine the output
should be a matrix (number of features X prediction horizon) so I think there is something
conceptually I am not understanding.
Thank you, and thank you for all of the wonderful tutorials
Gideon
REPLY
Jason Brownlee May 3, 2019 at 6:25 am #
Perhaps start wit the code example you want to use and slowly change it for your needs.
If the data size does not match the models expectations, you will need to change the data shape
or change the model’s expectations.
REPLY
Gideon May 3, 2019 at 7:30 am #
I will toil away some more, but I just want to be sure it is possible to use a dense
layer/vector output approach for Multiple Parallel Input and Multi-Step Output LSTM in Keras.
Thanks again for your time.
Gideon
REPLY
Jason Brownlee May 3, 2019 at 2:40 pm #
E.g. the output would be a vector with n x m nodes, where n is number of variates and m
is the number of steps.
Ive figured it out, and its not too ugly and exactly what I needed. I was
unaware of the Reshape layer in Keras.
model.add(Dense(n_steps_out*n_features))
model.add(Reshape((n_steps_out,n_features)))
Thank you again for your help. I am buying your book right now.
Cheers
Gideon
Nice work.
REPLY
George July 17, 2019 at 11:48 pm #
Hi Gideon,
I was struggling around something similar and applying your solution, solved all the
matters! Do you have any more documentation on this?
REPLY
Alberto May 3, 2019 at 11:07 pm #
Hello Jason,
Great article, very useful. I want to use LSTM to predict sun irradiance 12 hours ahead using 8
features (including sun irradiance) of the last 24 hours as inputs. Thus, it would be a multivariate
multi-step LSTM where the output is a sequence of 12 timesteps. I have 8 years of data and I want to
use first 6 for training and last 2 for testing. I have some questions:
REPLY
Jason Brownlee May 4, 2019 at 7:08 am #
I recommend testing both approaches and use data to make the decision, e.g. choose
the model that gives the best result.
REPLY
aravind May 5, 2019 at 3:54 am #
hai jason,
the article was very much helpful.
can you just tell me which approach should I take if I have two columns in my dataset .
one is time in ddmmyyyy format and the other is stock price.
I have the data for last 12 months.
I want to predict the stock price for 4 upcoming months.
how can I do the same.
one more doubt is that if the column for time is not actually having a same interval in between them,
then is there anything more that I should do to or consider for predicting the 4 upcoming months stock
price
REPLY
Jason Brownlee May 5, 2019 at 6:35 am #
You can drop the date if all observations have a consistent interval.
REPLY
aravind May 5, 2019 at 3:44 pm #
So after dropping the dates column I guess I would have to go for a univariate
multistep lstm model. right?
And what should be done if seasonality comes into picture?
and one more doubt is that when I am predicting the four upcoming months in a multistep ,
will the model consider the predicted value of 3rd month while predicting the fourth month or
will the model consider the predicted value of 2rd month while predicting the third month and
likewise ?
REPLY
Jason Brownlee May 6, 2019 at 6:46 am #
Try a suite of models, and compare to a linear model or naive model to confirm
they have skill.
If you have seasonality, try modeling with and without the seasonality and compare
performance.
Try multiple approaches for multi-step prediction, e.g. direct, recursive, etc:
https://machinelearningmastery.com/multi-step-time-series-forecasting/
REPLY
shiva May 7, 2019 at 7:33 am #
[[10 15]
[20 25]
[30 35]] 65
[[20 25]
[30 35]
[40 45]] 85
[[30 35]
[40 45]
[50 55]] 105
[[40 45]
[50 55]
[60 65]] 125
[[50 55]
[60 65]
[70 75]] 145
[[60 65]
[70 75]
[80 85]] 165
[[70 75]
[80 85]
[90 95]] 185
2.are timesteps, neurons and batchsize all hyperparameter? how do we optimize them
REPLY
Jason Brownlee May 7, 2019 at 2:26 pm #
No, the number of blocks in the first hidden layer is unrelated to the length of the input
sequences.
See this:
https://machinelearningmastery.com/faq/single-faq/what-is-the-difference-between-samples-
timesteps-and-features-for-lstm-input
REPLY
shiva May 8, 2019 at 4:56 am #
thanks..
Then what is the total number of LSTM blocks?
for every epoch, are the weights reinitialized and states are reset?
REPLY
Jason Brownlee May 8, 2019 at 6:46 am #
REPLY
shiva May 8, 2019 at 8:15 am #
my question is the number of individual LSTM nodes(block) equal to number of samples in the a
batch?
REPLY
Jason Brownlee May 8, 2019 at 2:08 pm #
Yes, the shape defines the shape of each input sample (time steps and features).
The number of units and sample shape are both unrelated to the batch size. Unless you are
working with a stateful LSTM, in which case the input shape must also specify the batch size.
REPLY
Jason Brownlee May 9, 2019 at 6:43 am #
REPLY
shiva May 9, 2019 at 11:05 am #
how does this input concatenate with hidden layer … i cannot visualize this..
i was thinking the input were a vector[n*1]
Each node in the hidden layer gets the complete put sequence.
[10 15]
[20 25]
[30 35]] 65
REPLY
shiva May 8, 2019 at 11:07 am #
https://stackoverflow.com/questions/37901047/what-is-num-units-in-tensorflow-
basiclstmcell#39440218
REPLY
Jason Brownlee May 8, 2019 at 2:11 pm #
REPLY
Philipp May 13, 2019 at 2:26 am #
Dear Jason,
My question:
As I understood it, a LSTM network learns the information in a time-series by backpropagation
through a specific length (in time) at which the LSTM cells are unrolled during training.
So, while training it is necessary to define the number of timesteps provided in the training data. But
shouldn’t it be possible to use the (trained) network with ANY number of input timesteps to make a
prediction (because of the recurrent nature in which the LSTM cells work)?
Am I getting something wrong here from the beginning?
REPLY
sumitra May 13, 2019 at 3:41 pm #
Dear Jason,
I am currently working on a disease outbreak prediction model. I have 4 years of data with over 100
input variables and each year has got 365 data points. I would like to create a LSTM model that will
be able to predict the future outbreak (whether thr will be an outbreak-1 or no outbreak-0) based on
the given input variables. For example, given 7 days of data points, i would like to predict the
occurance of outbreak (whether 0 or 1) on the 8th day.
However, i am not sure on which LSTM model will best fit my case. Will ‘multiple input multi-step
output) be the best approach? Your guidance will be much appreciated.
Thank you
REPLY
Jason Brownlee May 14, 2019 at 7:38 am #
REPLY
Nitin May 26, 2019 at 1:54 pm #
Hi Jason,
Can you please provide some pointers that will help us in minimizing the step-loss during model
fitting….
Thanks
REPLY
Jason Brownlee May 27, 2019 at 6:43 am #
REPLY
ICHaLiL May 29, 2019 at 12:26 am #
Dear Jason,
Thank you for your tutorials. They are really useful for us.
I’ve one question about LSTM. I have different time series more than one (for example 100). I need to
train network with 100 different time series. and test 10 different time series. Which method should I
use?
REPLY
Jason Brownlee May 29, 2019 at 8:44 am #
Thank you for sharing. I wonder if there is a way to set timestep > 1 without doing subsequence
sampling as you did in data preparation, e.g. convert a 9-by-1 time series to a 6-by-3 data set. After
the conversion, the 3-feature dataset is no more time dependent. You are able to use any kind of ML
models (say OLS) to predict y. So why LSTM? Should LSTM be able to select (forget) previous
information without this conversion?
REPLY
Jason Brownlee May 30, 2019 at 2:55 pm #
LSTM does have the benefit that it can remember across samples.
This may or may not be useful, and is often not useful for simple autoregressions.
REPLY
QuantCub May 31, 2019 at 1:19 am #
Thank you for your quick reply. In your example, if I do a subsequence sampling and
convert
[10, 20, 30, 40, 50, 60, 70, 80, 90] to
[[10, 20, 30],
[40, 50, 60],
[70, 80, 90]] (no replacement between each subsequence)
and run LSTM(input_shape=(3,1)), is that the same as I run LSTM(batch_input_shape=
(3,1,1), stateful=True) on the origin time series (9-by-1)?
REPLY
Jason Brownlee May 31, 2019 at 7:51 am #
No, as each “sample” will cause an output from the model, e.g. 1 sample with 3
time steps vs 3 samples with 1 time step.
No problem.
REPLY
Neel June 11, 2019 at 9:08 pm #
For a classification LSTM, using a Seed I get the same classification matrix each time I run it.
However, when I vary the batch size in model.predict, I get the following:
Batch size in predictions is merely for ram managment. Correct? If yes, what do you think Dr. Jason
would cause these irregularities ?
REPLY
Jason Brownlee June 12, 2019 at 8:01 am #
REPLY
Neel June 12, 2019 at 4:06 pm #
Hi Jason,
Sorry I didn’t explain my concern well. I was referring to the Batch Size parameter that we
mention in “model.predict i.e. predicting” and not while training. I agree that batch size during
training will have an impact. During prediction, the default size is 32 as defined by keras but
when I change that to anything but 32 I get a different classification matrix even though I use
a seed. When I leave the batch size as default, my seed is able to produce the same results.
REPLY
Jason Brownlee June 13, 2019 at 6:11 am #
Recall that with the LSTM, the state is reset at the end of each batch. This
explains why you are getting different results for the same model with different inference
batch sizes.
REPLY
Diego June 24, 2019 at 5:23 am #
Hi Jason,
Thanks for the tutorial.
I’d like to apply this example to a real case.
I have to forecast how much money will be withdrawn every day from a group of ATMs.
Currently I am using a time series for every ATM. (100 ATMs = 100 time series).
REPLY
Jason Brownlee June 24, 2019 at 6:37 am #
REPLY
Liang Zhao June 25, 2019 at 5:58 am #
Hi Jason, I want to use some kind of machine learning method to demonstrate that there is a
relationship between the score gap of two basketball teams and the demand for a taxi outside the
stadium.
I have time series of pick-ups near a stadium. I have the score gap time series between two
basketball teams.
What I want to achieve is that training a machine learning model that could tell me, based on the taxi
pick-ups at time t, what is the taxi pick-ups at time t+1.
I also want to see if I also have the score gap at time t, can I improve my prediction accuracy of pick-
ups at time t+1.
REPLY
Jason Brownlee June 25, 2019 at 6:30 am #
REPLY
liang zhao June 26, 2019 at 7:26 pm #
Yes, but perhaps test a suite of methods and discover what works best for your
specific dataset.
REPLY
James July 1, 2019 at 1:11 am #
Hi Jason,
Suppose I have several time series showing cumulative bookings for different trains last year. I don’t
want to forecast but just classify those time series to see if some of them have similar patterns. Can I
include all those series into one LSTM model? Is there any risks when doing so?
Thanks in advance.
REPLY
Jason Brownlee July 1, 2019 at 6:35 am #
REPLY
James July 1, 2019 at 11:44 pm #
Thanks Jason!
So is it the same as multivariate LSTM? Sorry I’m new to modelling so still find things
confusing
REPLY
Jason Brownlee July 2, 2019 at 7:32 am #
Probably not, each example is a separate sample or input-output pair for the
model to learn from.
REPLY
Irini July 1, 2019 at 9:35 am #
Hi Jason,
I have a dataset with 3000 univariate timeseries (i.e. 3000 samples) and each sample has 4000
timesteps. When i use [samples, time steps, features]=[3000, 4000, 1] the code is extremely slow and
with bad performance.
On the other hand, if instead [3000, 4000, 1] i write [3000, 1, 4000] the c
great performance.
But is the reshape [3000, 1, 4000] correct? I mean according to the rule [samples, time steps,
features] and given the fact that each of my samples have 4000 timesteps and for each time step
there is one feature the correct should be [3000, 4000, 1].
So is [3000, 1, 4000] correct? And if it is not (logically it is not) why it works much better than [3000,
4000, 1] ?
Thanks in advance
REPLY
Jason Brownlee July 1, 2019 at 11:35 am #
I would recommend not using more than 200 to 400 time steps per sample. Perhaps you
can truncate your data?
REPLY
Irini July 1, 2019 at 8:15 pm #
I did also an experiment and i truncated my data and used as input [samples, time
steps, features]=[3000, 400, 1]. It was quicker but i got a mean accuracy 42% (in 10 random
splits).
As i told you in my previous post when i exchange timesteps with features namely when i use
[3000, 1, 4000] i get an accuracy 90%.
But giving 1 timestep means that i don’t exploit the memory, whis is the characteristic of lstm?
I am confused as to whether i should use [3000, 1, 4000], which is very quick and gives very
good results but maybe it is not very correct? Or it is correct as if i used [3000, 400, 1](if i
truncated my data to 400)
REPLY
Jason Brownlee July 2, 2019 at 7:30 am #
The state of the LSTM is reset at the end of each batch by default, so you can
get some across-sample memory.
I recommend testing a suite of different configurations to see what works well or best for
your specific dataset. I cannot know what will work well, you must discover the answer.
REPLY
Manish July 2, 2019 at 2:39 am #
Hello Jason,
I am quite new to ML and LSTMs. I have a scenario where I intend to train a model using my hourly
sensor values. For eg
12-1-2019 12:00:00 12
12-1-2019 13:00:00 16
…
12-5-2019 12:00:00 14
Once I am done with my training I intend to predict values every hour and compare the values with
live sensor values….I am planning to use LSTM and which approach do you recommend me ?
REPLY
Jason Brownlee July 2, 2019 at 7:35 am #
REPLY
Harish July 2, 2019 at 7:18 pm #
Jason, this is very useful. Im try to to do some prediction around IT incidents. based on
historic data i want to predict what type incident i can expect next month/week/day. do you have
anything similar done if so request to share pls
REPLY
Jason Brownlee July 3, 2019 at 8:32 am #
Perhaps you can model it as a time series classification, e.g. what event is predicted in
this interval.
REPLY
Lopa July 3, 2019 at 12:35 am #
Hi Jason,
Thanks for answering my question in your other tutorials. I have a minor doubt suppose my data has
a continuous time series(non stationary) & other categorical variables (which are already encoded).
Under that circumstance what is the best way to difference the data ? Because categorical data are
not differenced but they have to be used while training the model.
The function written above differences all the variables irrespective of whether they are continuous or
categorical. It would be great if you can help.
REPLY
Jason Brownlee July 3, 2019 at 8:36 am #
Hi Jason,
I copied and pasted your first example from Multi-Step LSTM Models, the one with the vector output
of two values and the input being one.
This you use the parameters you report? Is this so dependant on architecture?
REPLY
Jason Brownlee July 4, 2019 at 7:40 am #
Results are dependent upon the model, the model configuration and the data, the
performance is also stochastic, subject to random variance.
REPLY
myro July 19, 2019 at 7:49 pm #
REPLY
Lopa July 3, 2019 at 7:10 pm #
My data is non stationary & there are seasonality every 7 days ( as evident from the ADF
tests & ETS plots) & a first order differencing makes it stationary.
I totally get that I have to difference only the real values & that is what I have been aiming to do . But
the reason I asked this question because the moment I difference the real values its get shifted by
one place so if the original data has 100 observations the differenced data will have 99 observations
(with a first order differencing). But the categorical data which cannot be differenced remains to be the
same 100. How do I deal with this ?
REPLY
Lopa July 3, 2019 at 8:06 pm #
I think I have been able to solve the issue thanks Jason for addressing my query
REPLY
Jason Brownlee July 4, 2019 at 7:44 am #
REPLY
Leon July 5, 2019 at 5:58 am #
REPLY
Jason Brownlee July 5, 2019 at 8:11 am #
Perhaps try running the example a few times? It can very given the stochastic nature of
the learning algorithm.
REPLY
Leon July 11, 2019 at 1:30 am #
never get any chance to around [100, 110]. I ran many times, the output is always
around [110, 120] with some variations.
no kidding you can try that part of codes. The output looks ridiculous.
REPLY
Jason Brownlee July 11, 2019 at 9:50 am #
Intersting.
Is Keras/TensorFlow/Python up to date?
REPLY
Jason Brownlee July 6, 2019 at 8:29 am #
I recommend testing each method and use the one with the lowest error.
Also, get creative and test a suite of other configurations. Ensure your test harness is robust and
reliable so that you can trust the decisions you make.
REPLY
skyrim4ever July 8, 2019 at 8:13 pm #
Hello, this example was nice to follow and seemed little more simpler than other LSTM
examples because of no pre-processinhg transformations (normalization, standardization, making
data into stationary, etc.). However, should I perform these pre-processing transformations in general
for time series prediction? Should I do such thing for this kind of examples too even though the
dataset is simple?
REPLY
Jason Brownlee July 9, 2019 at 8:09 am #
REPLY
Nutakki July 11, 2019 at 8:59 pm #
how the input is looping to obtain output as follows: [[ 72.74373 106.51455 251.78499]]?
Can you give a clear idea what does n_steps=4, n_features=X.shape[2] really means and how does it
function?
REPLY
Jason Brownlee July 12, 2019 at 8:40 am #
Yes, perhaps this will help you to understand the input shape:
https://machinelearningmastery.com/faq/single-faq/what-is-the-difference-between-samples-
timesteps-and-features-for-lstm-input
REPLY
Amelie July 12, 2019 at 2:10 am #
Please, is there a method to find the correct parameter of an ANN model: LSTM, MLP
(hidden layer number, activation function, loss function ..)
what does it mean when my train and validation loss curves are parallel while the Train Score and
Test Score are small?
REPLY
Jason Brownlee July 12, 2019 at 8:45 am #
REPLY
Samil July 17, 2019 at 8:16 am #
Thanks for the tutorial. I have applied the multistep, multivariate logic to my own dataset.
Namely, I have 12 look-back, 12 look-ahead and 41 features (all having exact look-back as the main
variable of interest). Trying the TimeDistributed code snippet gave me progressively increasing
RMSE. Is this due to the nature of my time series or is it a sign of mistake done during construction of
the model? It is hard to tell for you but maybe you can share your take on this issue. Thanks
REPLY
Jason Brownlee July 17, 2019 at 8:33 am #
It could be either.
REPLY
Samil July 19, 2019 at 9:48 am #
Also, one quick related question. You use “-1” in multi step future multivariate
split_sequence models (such as n_steps_out-1 etc.). This reduces the number of
resulting features by one when compared to other split_sequence snippets. I tested it with
the other multistep split_sequence code you shared above. Not sure but are’nt we
supposed to have the same number of features? Thanks
Thanks.
REPLY
Jason Brownlee July 19, 2019 at 2:20 pm #
REPLY
Aziz Ahmad July 22, 2019 at 4:56 am #
Sir Plz! Suggest me good learning sources about my project ( carbon emission forcasting
using LSTM).
REPLY
Jason Brownlee July 22, 2019 at 8:28 am #
Start here:
https://machinelearningmastery.com/start-here/#deep_learning_time_series
REPLY
Mans Oshanov July 22, 2019 at 6:20 pm #
Thank you for the great tutorial. Is it possible to get the probability of prediction(in
percentage) or second best prediction out of these models? Thank you)
Jason Brownlee July 23, 2019 at 7:58 am #
REPLY
Kennard July 26, 2019 at 11:57 am #
Hi, Jason
And I have a question that how to adjust the learning rate of the LSTM network in the CNN-LSTM
code you’ve mentioned above.
REPLY
Jason Brownlee July 26, 2019 at 2:18 pm #
You can learn more about how to tune the learning rate (generally) here:
https://machinelearningmastery.com/understand-the-dynamics-of-learning-rate-on-deep-learning-
neural-networks/
REPLY
Luis July 27, 2019 at 3:10 am #
REPLY
Jason Brownlee July 27, 2019 at 6:11 am #
Thanks Luis.
REPLY
Armande Kertanio July 28, 2019 at 8:31 am #
I have a problem when reshaping the data for multiple output architecture.
the architecture is:
outputs=[]
y=y.reshape((len(y),5,1))
ValueError: Error when checking model target: the list of Numpy arrays that you are passing to your
model is not the size the model expected. Expected to see 5 array(s), but instead got the following list
of 1 arrays: [array([[0.35128802, 0.01439778, 0.60109704, 0.52722118, 0.25493708],
REPLY
Jason Brownlee July 29, 2019 at 5:58 am #
Perhaps define what you want the output shape to be, e.g. n samples with m time steps,
then confirm your data has that shape, or if not set that shape?
REPLY
Florian July 29, 2019 at 2:00 am #
REPLY
Jason Brownlee July 29, 2019 at 6:16 am #
If the map is 8×8 and we apply a 2×2 pooling layer, then we get a 4×4 out, e.g. 1/4 the area (64
down to 16).
For time series, if we have 1×8 and apply a 1×2 pooling, we get 1×4, you’re right. 1/2 the size, not
1/4 as in image data.
Fixed. Thnaks!
REPLY
Nic July 29, 2019 at 5:53 pm #
Hi Jason,
In the section “Multiple Input Series” you used the following example:
[[ 10 15 25]
[ 20 25 45]
[ 30 35 65]
[ 40 45 85]
[ 50 55 105]
[ 60 65 125]
[ 70 75 145]
[ 80 85 165]
[ 90 95 185]]
As you mentioned the first two entries in the arrays refer to the two time series and the last one to the
corresponding target variable. To train the LSTM you split the data into input and output samples like:
[[10 15]
[20 25]
[30 35]] 65
Why do I drop the first two target entries (25 and 45). Isn’t that information my network loses for
training? Why don’t we use each (single) sample like x = [10 15] y[25] to train the time series. Isn’t it
easier to lern the series if i have the target for each step?
REPLY
Jason Brownlee July 30, 2019 at 6:04 am #
Good question.
Some of the input at the beginning of the dataset don’t have enough prior data to recreate an
input, therefore must be removed.
REPLY
Joel August 1, 2019 at 12:04 am #
Good work, However, you should provide the library imports, to make it easier for beginners.
REPLY
Jason Brownlee August 1, 2019 at 6:53 am #
All library inputs are provided in the “complete example” listed in the post.
REPLY
will August 11, 2019 at 12:27 am #
Hi, Jason,I need to predict a hundred thousand sequences like this[10, 20, 30, 40, 50, 60,
70, 80, 90], how do I do it, do I do it in cycles, one by one, I do it in cycles, it feels like it’s going to take
longer
REPLY
Jason Brownlee August 11, 2019 at 5:59 am #
If the model is read only and you are not dependent upon state across samples, you can
run the model in parallel on different machines and prepare batches of samples for each model to
make predictions.
REPLY
PRADEEP CHAKRAVARTHI NUTAKKI August 11, 2019 at 3:42 am #
I have 300 excel workbooks of which each excel sheet has 3 values…..
the 3 values will be in this format [1.02,2.20,1.0]; [2.9,3.5,3.3];…….like this 300 sets.
Now i want to train and test my model with the data from 300 excel workbooks as input and the model
has to predict the 301th set for example: [5,3.3,2.4] depending on the sequence of previous values.
Note: the output shouldn’t be the probability set from the 300 sets, the output should be a new set.
REPLY
Jason Brownlee August 11, 2019 at 6:04 am #
Perhaps you can use some custom code to extract all of the data from the excel files into
a csv file ready for modeling?
REPLY
y jing August 12, 2019 at 4:45 pm #
How to construct parallel three lstms, and then add a DNN in series.
REPLY
Jason Brownlee August 13, 2019 at 6:06 am #
You can use one LSTM with 3 variables or 3 LSTMs and concat the outputs together.
REPLY
Doron August 13, 2019 at 4:42 pm #
Hi Jason,
Thanks for this wonderful post. I have been trying to digest LSTM’s (metaphorically) and one
particular aspect was not clear to me. I know the general structure of LSTM’s but I’m having hard time
to understand:
When ReLU is set as an activation function, but not in the output layer, what exactly happens behind
the scenes? To make myself clear, I am aware of the gates and their respective activation functions:
sigmoid and tanh. But if we set ReLU like above, does that mean that each unit/LSTM cell outputs a
hidden state –> pass it to a ReLu –> pass it to the next unit/LSTM cell?
Thanks!
REPLY
Jason Brownlee August 14, 2019 at 6:33 am #
Yes, that is correct. It controls the output gate, not the internal gates which are governed
by a sigmoid.
REPLY
Dawjidda August 17, 2019 at 1:46 am #
hello Mr Jason Brownlee please my dataset is in matrics form, i want convert it to fit into
GRU or LSTM sequential model,
REPLY
Jason Brownlee August 17, 2019 at 5:55 am #
If your matrix represents a sequence, you can reshape it for your model. This will help:
https://machinelearningmastery.com/faq/single-faq/what-is-the-differe
timesteps-and-features-for-lstm-input
REPLY
Tommy August 18, 2019 at 9:22 pm #
Hi Jason,
What will happen if we use both lstm and gru layers simultaneously in the model? Does this make
sense?
For example this architecture:
model=Sequential()
model.add(GRU(256 , input_shape = (x.shape[1], x.shape[2]) , return_sequences=True))
model.add(LSTM(256))
model.add(Dense(64))
model.add(Dense(1))
Because I used this model and I got good results compared to using each one separately.
REPLY
Jason Brownlee August 19, 2019 at 6:06 am #
REPLY
Ali Altin August 21, 2019 at 6:14 pm #
I have a question. My dataset has 27 features. 26 of them I want to use as input and the last one as
output (this feature is also the last column in my dataset). I use the multiple input multi-step output
code from above. After using the function “def split_sequences(sequences, n_steps_in,
n_steps_out)”, I split the dataset into train and test sets and choose a number of time steps for
n_steps_in and n_steps out. After transforming from 2D to 3D with “split_sequences(train, n_steps_in,
n_steps_out)” I printed the shape of train_X, train_y, test_X and test_y. The results are:
(14476887, 25, 26) (14476887, 20) (7130386, 25, 26) (7130386, 20)
1.) Does python count from 0 upwards, so that 0 is my first feature or does it count from 1 upwards?
2.) Does python work from left to right, so that the left feature in the csv file is my first feature and so
on?
3.) Is the shape above (7130386, 20) equal to (7130386, 20, 1) or why is it 2D?
I hope that I could explain my problem and the questions good enough.
Many thanks in advance.
Ali
REPLY
Jason Brownlee August 22, 2019 at 6:23 am #
Yes, you can transform (7130386, 20) to (7130386, 20, 1) directly. They are the same thing.
REPLY
Ali Altin August 22, 2019 at 5:59 pm #
Hello Jason,
I take the ‘split a multivariate sequence into samples’ code from above:
After that I split the dataset into train and test sets:
14476930 7130429
(14476887, 25, 26) (14476887, 20) (7130386, 25, 26) (7130386, 20)
n_features = 26
model = Sequential()
model.add(LSTM(50, input_shape=(n_steps_in, n_features)))
I want to predict the last column (column 27) in my csv-file. The first 26 are the input features
(columns).
1.) Where in the codes above do I explicitly define my input features and my output feature?
Sorry that I bother you with this banal questions but I have not enough experience.
REPLY
Jason Brownlee August 23, 2019 at 6:23 am #
REPLY
Joe August 21, 2019 at 6:42 pm #
Hi Jason,
cnn = Sequential([
Conv1D(filters=16, kernel_size=4, strides=2, activation=’relu’, input_shape=(n_steps, n_features)),
BatchNormalization(),
MaxPooling1D(pool_size=2)
])
model = Sequential()
model.add(cnn)
model.add(LSTM(50, activation=’relu’))
model.add(Dense(1))
The reason is that when the true model is path dependency, longer look back period should be used,
but it is not very efficient for LSTM dealing with large time step, so I use CNN to reduce the length of
time step and encode some predictive information.
Joe
REPLY
Jason Brownlee August 22, 2019 at 6:25 am #
REPLY
Ahmad August 23, 2019 at 7:28 pm #
Dear Jason,
As I understood from your explanations, for bidirectional neural networks we need both past and
future input data to predict the current time step. So, in case of univariate LSTM, when we are going
to predict the energy use of current time as example, we need to know the energy use of future? This
is abit confusing to me. Would you please explain about it.
Thank you
REPLY
Jason Brownlee August 24, 2019 at 7:47 am #
Or you can frame your prediction problem any way you wish.
REPLY
Ahmad August 24, 2019 at 7:04 pm #
Thank you for your answer. Can you explain a bit more to make it clear? Because as
I just checked the mathematical formulation of Bidirectional RNNs, I see that there is a hidden
state of the next time step as the input: ( x(t), h(t-1) and h(t+1) are used to calculate y(t) ).
So, when there is a hidden state from the next time step as the input, how is t possible to just
use the past data in univariate bidirectional RNN?
REPLY
Jason Brownlee August 25, 2019 at 6:36 am #
REPLY
Amirreza August 24, 2019 at 10:23 pm #
Actually I applied the bidirectional layer but I got much higher error than typical LSTM
network. Is it possible or I am doing wrong?
When I write 50 neurons it means that each single layer of bidirectional has 50 neurons or it would be
the summation of two layers?
REPLY
Jason Brownlee August 25, 2019 at 6:37 am #
REPLY
helloworld August 27, 2019 at 4:31 pm #
Hi, I have question regarding data normalization (scaling values between specific number
such as [0,1]). Should I perform it before making the dataset supervised form as in this example? Or
after the supervised form?
I noticed that if I do after, the columns looks little different from each other because the scaling are
done via columns only. Here is example output if done after:
t-1 t t+1
-1.000000 -1.000000 -0.870529
-1.000000 -0.869976 -0.895359
-0.869976 -0.894799 -0.897133
-0.894799 -0.896572 -0.901271
REPLY
Jason Brownlee August 28, 2019 at 6:31 am #
REPLY
Samit August 28, 2019 at 7:26 pm #
Hi Jason.
Great Tutorial. I have electronic health record data which has multivariate time series inputs. Is it
better to use normal LSTM or bidirectional LSTM for prediction?
Thanks
REPLY
Jason Brownlee August 29, 2019 at 6:04 am #
I would encourage you to test a range of models, linear, ml, mlp, cnn and lstms and
discover what works best for your specific dataset.
REPLY
Radhouane Baba August 28, 2019 at 8:20 pm #
Hi Jason,
i am trying to train my model to forecast a 144 data points (1 day) (10 minutes for each values (load
forecast for a home)) based on 5 days (=144*5 values) (i have more data but till now i didnt find a
good result so i training my model by less amount of data.. it takes so long)
there a seasonality each day.. so i chose the n_input to be 144.
i am varying the batch size from 1 to 6… and the epochs from 25 to 150,
but my problem is: each time i get a result, i have one of these problems:
1- values converge to a constant (i thought maybe it is underfitting)… so i try to reduce batch size and
increase epochs
2- when i do so.. i always get a loss value of n.a.n and then i get no predictions from the model….
REPLY
Jason Brownlee August 29, 2019 at 6:04 am #
You can discover how to diagnose the performance of your model and improve
performance here:
https://machinelearningmastery.com/start-here/#better
REPLY
Radhouane Baba August 29, 2019 at 9:22 am #
thank you,
but i mean not the values of the loss.. the values of the forecast.. they converge to a constant,
and when i increase the number of epochs.. there are better results but then many times,
starting from epoch n. 150 or so.. the loss is n.a.n.. and there are no predictions…
REPLY
Jason Brownlee August 29, 2019 at 1:32 pm #
I see.
REPLY
Radhouane Baba August 28, 2019 at 8:28 pm #
Hello Jason,
history.append(yhat_sequence)
…
add.dense(1)
Thank you so Much!!!
REPLY
Jason Brownlee August 29, 2019 at 6:05 am #
Perhaps compare a few approaches for your dataset and discover what works best.
REPLY
Ken August 30, 2019 at 7:15 pm #
Hello Jason,
Thanks for this great tutorial and dive into LSTMs.
For Multiple Parallel Input and Multi-Step Output you also mention that it is possible to use the vector
version of LSTMs. I cannot get my head around it how that model should look like.
model = Sequential()
model.add(LSTM(100, activation=’relu’, return_sequences=True, input_shape=(n_steps_in,
n_features)))
model.add(LSTM(100, activation=’relu’, return_sequences=True))
# What is needed here??? dim to n_steps_out
model.add(TimeDistributed(Dense(n_features)))
model.compile(optimizer=’adam’, loss=’mse’)
The above architecture doesn’t what I intend. In the end there should be an output of dim (batch_size,
n_steps_out, n_features) but what I achieve is (batch_size, 100, n_feautres) or an error. So how to
make the above architecture work without the encoder-decoder version of your snippets?
REPLY
Jason Brownlee August 31, 2019 at 6:03 am #
Perhaps you can use the example in the post as a starting point?
REPLY
Vishal Saini August 30, 2019 at 8:37 pm #
Hi Jason,
REPLY
Sai Krishna Natesan September 2, 2019 at 6:23 am #
Dear Jason,
80, 85
90, 95
100, 105
My question is, in the first model, you know more information about the output in the past that you are
not passing on to the model.
Similarly, in the second model let us assume that you know the first two fields 100 and 105 and you
only want to predict the 205.
70, 75, 145
80, 85, 165
90, 95, 185
100, 105, X
Again we are unnecessarily trying to predict some known values.
Is there a model where I can use all available information from the previous time series and try to
predict X?
I learnt a lot from this post and the above question is something I am trying to answer. Thanks a lot for
sharing your knowledge. It is helping us a lot.
In your proposed framing, you could use a new token to indicate missing and then use a Masking
input layer.
Or a multiple input model with a separate input for the dependent variables and the univariate
series that your predicting.
Perhaps experiment and see what model you prefer and what works best for your specific
dataset.
REPLY
SAI KRISHNA NATESAN September 3, 2019 at 2:33 am #
Thank you Jason. Is there any reference model that you can point me to where you
have explained how the above has been done?
REPLY
Jason Brownlee September 3, 2019 at 6:18 am #
Great question.
REPLY
raaj February 28, 2020 at 6:02 pm #
Can you please give an example code for handling missing data and using Masking
layer?
I am trying to eliminate training with samples where even 1 point has missing values in case
of multivariate (multiple series)
REPLY
Jason Brownlee February 29, 2020 at 7:09 am #
REPLY
Jem September 3, 2019 at 2:13 am #
Hi Jason,
I was implementing the cross-validation method for the LSTM Encoder-Decoder model, I wanted to
ask you if it is better that at each step I recreate the class or I can use the old one calling the fit
method.
Thanks and Regards
REPLY
Jason Brownlee September 3, 2019 at 6:18 am #
Typically cross-validation is not valid for sequence prediction, instead you must use
walk-forward validation:
https://machinelearningmastery.com/backtest-machine-learning-models-time-series-forecasting/
REPLY
sara j September 4, 2019 at 12:31 am #
Hi Jason,
If I have to train my model in such a manner that I have the data like :
Input are two columns i.e temperature and pressure i.e. the first 25 perc data and output are also two
column temperature and pressure i.e. the 75 perc data remaining one.
My goal is to predict the temperature and pressure together by giving little input and receiving greater
output to LSTM
. If i train my model by giving input [x,y] can I predict [x,y] but I do not want to give time stamp. Which
method should I follow?
I have already made my data according to your blog and I am now confused hot to train the model
without time steps
REPLY
Jason Brownlee September 4, 2019 at 6:00 am #
REPLY
rita September 10, 2019 at 12:20 am #
Hi Jason,
Why you have not used the minmax scaler over here while training the input sequence in the LSTM
model?
REPLY
Jason Brownlee September 10, 2019 at 5:50 am #
Good question, I skipped scaling to keep the example simpler – e.g. for brevity.
REPLY
rita September 10, 2019 at 6:08 pm #
Thank you very much and I have one more question if I have 200000 data points and I have
to make time steps for them maybe dividing the data into 5 time steps and giving 40,000 points in
each of the timestep for LSTM will it be a good training? or you can suggest something for this? So,
that I can prepare the data properly.
I have a multivariate data of 2 variables and want to predict both of them. So, basically 2 inputs and 2
outputs but do I have to make them supervised first as they are temperature and viscosity and they
are dependent on each other with respect to time.
So, should I supervise them first or I can directly use multivariate time series for the prediction by
dividing the data into 5 time steps and predicting 2 outputs.
REPLY
Jason Brownlee September 11, 2019 at 5:32 am #
Regarding consulting:
https://machinelearningmastery.com/faq/single-faq/can-you-do-some-consulting
REPLY
Himanshi September 13, 2019 at 5:28 pm #
Hi,
can you please tell me how to visualize the results. As, when I am reshaping the array it is not able to
get reshaped into 2 dimension from 3D.
REPLY
Jason Brownlee September 14, 2019 at 6:16 am #
You can use matplotlib via the plot() function to create a line plot.
I think I did not reframe my question properly. My question is for example: I trained my LSTM
model with 300 n_step_in and 300 n_steps_out. Now, after the training, yhat has a shape (20000,
300,2) . So, when I am reshaping it to 2D so as to see the results it is giving me an error and is not
able to reshape it back.
REPLY
Jason Brownlee September 17, 2019 at 6:27 am #
REPLY
sx September 17, 2019 at 3:54 pm #
Hi can i add an extra layer under this one and if yes how should i do that?
model.add(LSTM(200, activation=’relu’, input_shape=(n_timesteps, n_features)))
Thanks in advance.
REPLY
Jason Brownlee September 18, 2019 at 5:55 am #
REPLY
bhavna September 17, 2019 at 6:21 pm #
Hi, can you please tell me is this type of prediction only suitable for sequential data?
REPLY
Jason Brownlee September 18, 2019 at 5:58 am #
Sequence prediction is appropriate for data comprised of sequences. You can learn
more here:
https://machinelearningmastery.com/sequence-prediction-problems-learning-lstm-recurrent-
neural-networks/
REPLY
suraj September 17, 2019 at 8:09 pm #
Hi Jason,
If I have unsupervised data and I make it supervised for the training in the LSTM model.
My question is that when we make the data supervised and we give input data points and we predict
the output data points, but the output is just the n+1 point of input and at last we are only predicting 1
point from the whole data. Basically we are giving the model all the points in the training only. What is
the model actually doing?
REPLY
Jason Brownlee September 18, 2019 at 6:05 am #
The model learns a function that takes input points and predicts the next point.
REPLY
Suraj September 18, 2019 at 4:57 pm #
but what if I want the model to get not all data as input points and just few input
points to predict the remaining data? then what strategy is used?
REPLY
Jason Brownlee September 19, 2019 at 5:52 am #
The examples above will provide a template you can use to start with and adapt for your
problem.
REPLY
sx September 18, 2019 at 12:24 am #
REPLY
Jason Brownlee September 18, 2019 at 6:12 am #
REPLY
Peter September 26, 2019 at 5:52 pm #
Sorry you are talking about time series, what if there is a date with time (I didn’t see the
feature of date and time in your created data)
Jason Brownlee September 27, 2019 at 7:47 am #
Date and time are removed from the dataset and the series of observations is worked
with directly.
REPLY
Luis September 29, 2019 at 9:32 am #
Hi Jason,
I have really enjoyed many of your articles over the last half year. Question on your output vector
model using stacked LSTM model. Under the hood, what type of architecture is being used here for 3
input time-steps and 2 output time-steps. I’m sure it is a many-to-many problem, but can you help me
with the exact visual connection? Is the first output time-step laid out directly over the second time-
step of the input series?
REPLY
Jason Brownlee September 30, 2019 at 6:01 am #
Good question.
For a model that take a sequence input and outputs one time step which happens to be a vector –
I would classify that as many to one model:
https://machinelearningmastery.com/models-sequence-prediction-recurrent-neural-networks/
REPLY
Luis October 1, 2019 at 1:51 am #
Great read, that was a nice nugget of knowledge. Thank you I appreciate you work.
REPLY
Jason Brownlee October 1, 2019 at 6:55 am #
Thanks.
REPLY
Al October 4, 2019 at 8:57 pm #
Hi Jason, thanks for your great posts and prompt replies. On a Multi-Step LSTM Models
when I loaded my dataset I first noticed that the number of steps should be a number divisible by the
length of the dataset (i.e. if my data is 1239 rows, a step in number of 59 is suitable since 1239/59 =
21). In fact trying with non-divisible numbers assigned to n_steps_in would result in nan loss values
when fitting the model. I was indeed able to run all the way 50 epochs using 59 over 1239, however
something I cannot explain happened: after re-running the code without making any changes, the loss
on the various epochs (after setting the verbose to 1) jumped back to nan. Running it again it would
start populating some values and along the way end up in nan.. It is very
and to end up all epochs looks like a lucky test, Could you help me to understand what is wrong?
Thanks!
REPLY
Jason Brownlee October 6, 2019 at 8:09 am #
REPLY
Al October 7, 2019 at 10:52 pm #
Yes, you are correct, as always. Scaling not only did not return nan but also made
each epoch faster to run. Thanks Jason!
REPLY
Jason Brownlee October 8, 2019 at 8:02 am #
Well done.
REPLY
Addi October 7, 2019 at 1:32 am #
Thanks Jason. I apologize if this was addressed somewhere in the list of comments but in
the case of predicting a continuous variable, how would you compare the performance of LSTM vs.
another algorithm such as Random Forest?
Other than comparing the actual value vs. predicted value from both models, is there a separate way
to assess accuracy of both models?
REPLY
Jason Brownlee October 7, 2019 at 8:30 am #
Use the same test harness, that is the same evaluation methodology like walk forward
validation:
https://machinelearningmastery.com/backtest-machine-learning-models-time-series-forecasting/
REPLY
ABDULKARIM GIZZINI October 8, 2019 at 7:43 pm #
Thanks Jason,
all your work is clear! thank you very much. I have some questions if you please. What are the
differences between all LSTM models you applied above ? is there and performance trade-off
between them? because you repeat the sentence that we can use any of them for time series
forecasting.
on the other hand, im working in the domain of wireless channel prediction. its a complex number
problem. So can I split it into real and Imag parts and apply your LSTM models for each part
separately and then concatenate the output results?
REPLY
Jason Brownlee October 9, 2019 at 8:10 am #
Good question.
Not so much a performance trade-off as different framings of the problem, or different problem
types.
The goal was to show you how flexible the method is and that you should adapt it to your
problem, not your problem to the method.
REPLY
Mayank Prakash October 9, 2019 at 10:46 pm #
I wanted to know how to approach this problem. Let’s say we have a time series with 2
features, ranging from 0 to n as:
[a0, b0], [a1, b1], [a2, b2] upto [an, bn]
The output of the series would be,
[a0 b0], [a1, b1], [a2, b2] -> [a3]
My question is how do I incorporate this so that, I am able to use a0, a1, a2, b0, b1, b2, b3 to feed
into the model and predict [a3].
REPLY
Jason Brownlee October 10, 2019 at 6:59 am #
REPLY
Yawar Abbas October 10, 2019 at 4:30 am #
Great tutorial.
I have a question related to lstm model for time series forecasting problem. I have dataset with four
input features like 78, 153.23, 77.25, 4.33.
The first input ordering difference is like 78,80,87,96….so on.
The other inputs ordering is well like 77.25,77.35,77.40….
I have used lstm model with one previous timestamp as input to predict the next timestamp which
predict well on the last three input but poor for the first one.i.e.
Actual: 78, 153.23, 77.25, 4.33
Predicted: 82, 153.01, 77.02, 4.12
How i tunned this model for good result of first input?
REPLY
Jason Brownlee October 10, 2019 at 7:05 am #
You can discover suggestions on how to diagnose and tune deep learning models here:
https://machinelearningmastery.com/start-here/#better
REPLY
ovi95 October 12, 2019 at 12:01 am #
hi Jason,
I want to make a model to predict the Inflow to a reservoir, with past rainfall data, temperature data,
and also past inflow data.
i want the model to be able to predict the inflow for a week ahead (7 timesteps) when given the past
week’s, rainfall and temperature data.
what model should i use for this?
REPLY
Jason Brownlee October 12, 2019 at 7:05 am #
REPLY
ovi95 October 12, 2019 at 4:25 pm #
thank you jason. i read through the process and some other links on your site and
decided that a multivariate multi-step lstm would help.
Is there specific link for that? because i only found univariate multistep lstm.
REPLY
Jason Brownlee October 13, 2019 at 8:25 am #
The above tutorial is a great starting place that you can adapt to your problem.
Hi Jason,
Can I put time in the X axis to predict wind speed on Y axis?
Best Regards
REPLY
Jason Brownlee October 18, 2019 at 5:48 am #
Sure. You would discard time and model wind speed directly.
REPLY
Ulia October 18, 2019 at 8:27 pm #
sorry, i did not understand that. Should I discard time or I can use it to train my model
so as to predict the wind speed.
REPLY
Jason Brownlee October 19, 2019 at 6:31 am #
REPLY
James A October 21, 2019 at 6:43 am #
Hi Jason,
In the “Multiple Parallel Input and Multi-Step Output” example, you stated that it could be done with
the vector output method, or the encoder/decoder, and proceeded to demonstrate the
encoder/decoder.
I’ve been wondering how the example would look in vector output form. Would the target, y, for each
sample need to be merged into a single 1D array, or vector?
For example,
If y for one sample looks like:
[a1,b1,c1],
[a2,b2,c2],
…
[an,bn,cn]
REPLY
Jason Brownlee October 21, 2019 at 1:38 pm #
Probably one long 1d vector with all time steps that you can then choose to interpret
anyway you wish (e.g. by the structure of the expected/target y).
REPLY
Julien Loutre October 21, 2019 at 12:27 pm #
REPLY
Jason Brownlee October 21, 2019 at 1:41 pm #
REPLY
Kannu October 22, 2019 at 12:55 am #
hey,
How can we se the root mean square error in the training of the model here
Best Regards,
Kannu
REPLY
Jason Brownlee October 22, 2019 at 5:54 am #
Here is an example:
https://machinelearningmastery.com/custom-metrics-deep-learning-keras-python/
Hello Jason,
I am trying to use the CNN-LSTM for forecasting
(175196, 4, 4) (175196, 1)
Where 175196 is the samples, 4 is number of steps and 4 is the features ( variables)
Then i reshape the input vector as directed in the tutorial, but when i run the model
25 model.add(Dense(1))
26 model.compile(optimizer=’adam’, loss=’mse’)
I know it is hard to debug in this manner! but any idea what could be wrong here ?
REPLY
Jason Brownlee October 24, 2019 at 5:46 am #
REPLY
Vishnu Suresh October 24, 2019 at 6:04 am #
REPLY
Vishnu Suresh October 24, 2019 at 8:54 am #
You were right I updated to higher version of tensorflow and keras and it
worked! thanks!
Perhaps try simplifying the example and see what the case of the fault could be on your
workstation?
REPLY
FRI October 24, 2019 at 9:31 pm #
Hi Jason,
Thank you for this interesting article. Can I create one model for all sites with LSTM ? That means if
we have for example a group of persons and every person has its time series data with different
features, LSTM model can learn from all these time series for once?
Best regards
REPLY
Jason Brownlee October 25, 2019 at 6:40 am #
REPLY
FRI November 12, 2019 at 7:26 pm #
REPLY
Jason Brownlee November 13, 2019 at 5:39 am #
You’re welcome.
REPLY
Sarveswara Rao October 30, 2019 at 10:22 pm #
REPLY
Jason Brownlee October 31, 2019 at 5:29 am #
A hyperparameter.
Hi Jason,
i have some questions in LSTM model.
First, it is the LSTM input x definition. In the time series forecast case, we divide input data into some
portion by batchsize parameters. Later, these 2D portion data were transformed into 3D tensor data
and feed to model for training. After all portions feed to the model and complete forward/backward
propagation, the 1 epoch routine is completed. My question is : in the x[t] input time, the LSTM model
input x refers to only first portion of x data or the all portions data ?
Second, what is the LSTM_unit parameter definition ? My understanding is the number of the LSTM
input x vector’s element. For example, if have 10 input, the LSTM_unit should be 10 to capture all the
input vector. But, it is not always requiring the higher numbers such as 20, so on.
Third, is there any “feature importance” example in the LSTM now? I am looking forward and quite
frustration this moment. Could LSTM and XGBoost have sample feature importance result ?
many thank
REPLY
Jason Brownlee November 10, 2019 at 8:18 am #
A unit is like a node in an MLP. Each unit gets the entire input sequence as input. The number of
nodes is unrelated to the number of input time steps.
REPLY
mingkai November 11, 2019 at 7:31 pm #
Hi Jason, I run the first example, but it was failed. It shows: TypeError: Input ‘b’ of ‘MatMul’
Op has type float32 that does not match type int32 of argument ‘a’. Do you know what the problem
is?
REPLY
Jason Brownlee November 12, 2019 at 6:36 am #
REPLY
Jason Brownlee November 15, 2019 at 7:41 am #
You can use Keras 2.3. with TensorFlow 2.0, or Keras 2.2 with TensorFlow 1.15.
REPLY
Bonnard November 12, 2019 at 7:43 pm #
Hi Jason,
I have a problem to modelize and i think lstm network are the most adapted models to do it.
I want to predict the true trajectory of an airplane before it takes off. I have to set of data, the first is
the trajectories announced before departure (the fake), and the second is the trajectory announced
after landing (the true).
I want to predict the true, giving the fake.
I have a list of array, each array represent a flight made by a plane. Each flight is represented by
different variable and after an interpolation i have 50 observations points by flight.
At each point we can observe a vector of our variables like latitude, longitude ect ..
Let assume i have N variables like that.
I have 2200 flights, so my input data is an array with (2200,50,N) shape.
I already tried a little model but oddly the model seems to follow the fake trajectory and not the true.
Do you have an idea of what architecture i can use ?
REPLY
Jason Brownlee November 13, 2019 at 5:40 am #
Perhaps test a suite of different approaches and discover what works best for your
specific dataset?
REPLY
Bonnard November 13, 2019 at 7:45 pm #
Yeah this what i am doing, but maybe you can help with the last layer, i think the
error comes from there.
As i said I have a vector (50,N) shape wich represent a flight with 50 points and N features,
and i want to predict a (50,2) vector wich is 50 points with (latitude longitude).
I cannot use dense layer at the end of the model because it does not return the right shape.
REPLY
Jason Brownlee November 14, 2019 at 8:01 am #
Encoder-decoder with 2 nodes in the output layer and 50 in the repeat vector
layer – this would achieve the desired output.
REPLY
Marlon November 13, 2019 at 12:51 am #
Hello,
Thanks for your tutorials; they are amazing! I’m having the following pitfall by implementing your
ideas: I use your “split_sequences” in order to prepare the network input and, accordingly, I train my
network and save the model. When I use the same input in the trained model and plot it, I get a very
weirdo plot, like the many times over ploted lines. Do you mind what is my problem?
REPLY
Jason Brownlee November 13, 2019 at 5:46 am #
Thanks.
REPLY
Michaela November 13, 2019 at 6:27 am #
Hi Jason,
I’m building a Multiple Parallel Input and Multi-Step Output model, and I’m curious why you repeat the
same LSTM output in model.add(RepeatVector(n_steps_out))? The alternative that I was
thinking is using the keras functional API, training n_steps_out LSTMs from the input, concatenating
the output of these LSTMs, and feeding it into the next LSTM. so it would look something like this
input = Input(shape=(n_steps_in,n_features))
concat_layers = []
for i in range n_steps_out:
concat_layers.concat(LSTM(200,activation=’relu’))(input)
x = tf.keras.layer.Concatenate(concat_layers)
x = LSTM(200,activation=’relu’,return_sequences=True)(x)
x = TimeDistributed(Dense(n_features)))(x)
model=Model(input,x)
model.compile(optimizer=’adam’, loss=’mse’)
The biggest drawback that I can see is there will be a ton more parameters, but are there other issues
that I’m missing? For instance, does this get rid of some relationship between the different timesteps
that the previous model maintains better?
Thanks!
REPLY
Jason Brownlee November 13, 2019 at 1:41 pm #
The reason is because it is an encoder-decoder model where the same encoding of the
input is used in the generation of each output time step.
Perhaps try it and see? It’s could to test a suite of different models in order to discover what works
best for your specific dataset.
REPLY
fan November 19, 2019 at 9:01 pm #
Dear Jason,
thanks for the tutorial, that is very helpful! However, i am having a hard time to understand the input
shape given in the CNN LSTM example below:
X, y = split_sequence(raw_seq, n_steps)
# reshape from [samples, timesteps] into [samples, subsequences, timesteps, features]
n_features = 1
n_seq = 2
n_steps = 2
X = X.reshape((X.shape[0], n_seq, n_steps, n_features))
# define model
model = Sequential()
model.add(TimeDistributed(Conv1D(filters=64, kernel_size=1, activation=’relu’), input_shape=(None,
n_steps, n_features)))
…
Here, X is first reshaped into 4 dimensions, however, the input_shape defined and used in the model
Conv1D layer is 3 dimensions. Is the None used in “input_shape=(None, n_steps, n_features)”
referring to the “n_seq” dimension of X or the number of samples of X…?
And then, the data used to fit and predict are again 4 dimensions…
could you please kindly explain a bit? I am really confused …
thanks a lot!
REPLY
Arjun November 21, 2019 at 8:34 pm #
Hi jason,
What if we had a dataset of every day of a years sales data and we wanted to predict say for example
10 days sales based on the sales data of previous 30 days? What should be the form of output that
we get? and also the code for getting the predicted value? Is it model.predict(X_test)?
REPLY
Jason Brownlee November 22, 2019 at 6:01 am #
REPLY
Arne November 22, 2019 at 12:58 am #
Hey Jason, I am halfway through and reading this stuff is pure joy! Thank you for your
tremendous efforts and making this available! I’ve become an instand fan of your site.
REPLY
Jason Brownlee November 22, 2019 at 6:05 am #
Thanks Arne!
REPLY
jasper November 25, 2019 at 1:15 am #
Hi Jason,
one practical question in LSTM. If the input data sets have the various range, how to deal with the
LSTM forecast model ? For example, if input vector one spans 0~100, vector two spans 0~0.5, could
we still put these two input vectors together to compile the model? I use SHAP package to analyze
the weight. In this case, vector one is always very strong rather than vector two. In mathematical view,
this result is correct. how do you think in this case?
jasper
You can try normalizing or standardizing each variable prior to modeling, or try using a
relu activation.
REPLY
Saeed November 27, 2019 at 1:44 pm #
Hi Jason,
Thank you for such a detailed explanation. I am having an issue with scaling data for a multistep
multivariate lstm problem. I am taking data of last 14/21 days to predict for the next 7 days. Can you
please give any idea what is the proper way of scaling data using MinMax for these type of problems,
as I am lost in the shapes of matrices.
REPLY
Jason Brownlee November 27, 2019 at 1:51 pm #
REPLY
Saeed November 27, 2019 at 3:38 pm #
Thank you. I know how scaling works and I have implemented it in single step
forecasting. However, when it comes to multistep, we actually split the data and it becomes 3
dim after using the split_sequency function. Which means we have 3 dim matrices for X and
Y.
Scaler doesn’t work on 3 dim matrices.
If I do scaling before splitting, I will end up with a matrix dimension that I can’t retrieve after
prediction and thus will be stuck without doing the inverse_transform for scaling. I will
appreciate your help in this matter
REPLY
Jason Brownlee November 28, 2019 at 6:31 am #
Yes, it is sticky. You may have to write some custom code as the libraries don’t
accomodate it.
REPLY
Jason Brownlee December 19, 2019 at 3:03 pm #
REPLY
husfe December 25, 2019 at 8:27 pm #
Hi Jason.
Is there some simple method to add attention to the Encoder_Decoder Model in this article?
I’ve trid to use AttentionWrapper class to achieve it, but I’m failed, because It’s hard for me to do it
during a short time. So can you give me some guide?
Thank you!
REPLY
Jason Brownlee December 26, 2019 at 7:38 am #
REPLY
San December 26, 2019 at 10:33 pm #
Hi Jason,
Thank you so much for this valuable tutorial. Really appreciate it.
Jason, I’m bit new to DL with RNN. I have two small doubts to get cleared. In my question I want to
predict how many steps (i.e:- step counts) a participant walk tomorrow depending on the previous
step counts. For this we have collected step counts of large number of participants for n number of
days.
Is this a univariate problem where each participant step count is taken as a univariate sequence and
train the model? AND do you think RNN is a good move to this problem?
Do I have to scale the sequences of each and everyone’s step counts (by taking the each participants
current mean and sd) Or can’t I use the raw count?
Thank you so much in advanced again. All the best for your future work too!!!!
San
It is a good idea to scale data prior to modeling, perhaps try fitting on raw data first to get started.
REPLY
San December 27, 2019 at 8:24 am #
Thank you Jason !!!. Please keep up the good work !!!
REPLY
Jason Brownlee December 28, 2019 at 7:39 am #
Thanks!
REPLY
Abhishek Singhal January 5, 2020 at 4:06 am #
I am trying to predict next 6 or 12 hours data, as of now trying to predict next 6 hours data training
with n_steps_in- 72 and expecting n_steps_out- 6 with 6 features
but I am getting output as nan
dataset = df_104902.values
# choose a number of time steps
n_steps_in, n_steps_out = 72, 6
# covert into input/output
X, y = split_sequences(dataset, n_steps_in, n_steps_out)
# the dataset knows the number of features, e.g. 2
n_features = X.shape[2]
# define model
model = Sequential()
model.add(LSTM(200, activation=’relu’, input_shape=(n_steps_in, n_features)))
model.add(RepeatVector(n_steps_out))
model.add(LSTM(200, activation=’relu’, return_sequences=True))
model.add(TimeDistributed(Dense(n_features)))
model.compile(optimizer=’adam’, loss=’mse’)
# fit model
model.fit(X, y, epochs=30, verbose=0)
# demonstrate prediction
x_input = array(df_104902[-72:])
x_input = x_input.reshape((1, n_steps_in, n_features))
yhat = model.predict(x_input, verbose=0)
print(yhat)
Output is coming-
REPLY
Jason Brownlee January 5, 2020 at 7:08 am #
Perhaps check the scale of your input data and normalize or standardize prior to fitting the
model?
REPLY
Abhishek Singhal January 5, 2020 at 4:14 am #
REPLY
Jason Brownlee January 14, 2020 at 7:14 am #
REPLY
Willian Alcoser January 11, 2020 at 1:57 am #
Saludos Jason, una consulta, como puedo validar el método de predicción. He visto en otros
ejemplos que la serie lo dividen en dos partes: en entrenamiento y prueba, y en este caso no lo hace,
a que se debe eso ?
REPLY
Jason Brownlee January 11, 2020 at 7:27 am #
REPLY
Mehdi January 17, 2020 at 12:06 pm #
Dear Jason,
First of all, thank you so much for your time and great contents.
Second, I studied your website for long time. I have a question: I have developed a model which
predict the price of shares, my model can predict X_test data as well, now how can I forecast
sequences(future times) does not happened?
REPLY
Jason Brownlee January 17, 2020 at 1:50 pm #
You’re welcome.
REPLY
Mehdi January 19, 2020 at 4:18 am #
newData are not available, i.e. the future days does not happened and not available,
how do I prepare them for the model?
REPLY
Jason Brownlee January 19, 2020 at 7:20 am #
You must design and train your model based on the data you will have available
at the time a prediction is required.
For example, if you have 7 days prior data at the time of prediction when predicting the
next week, then design your model around that and train it on that type of data.
Then when you start using your model on new data, you will have the data available.
REPLY
Mehdi January 19, 2020 at 5:28 pm #
Dear Jason,
Thank you so much for your time and attention. I will try your approach.
REPLY
Jason Brownlee January 20, 2020 at 8:39 am #
You’re welcome.
REPLY
Pietro FUSCO January 21, 2020 at 3:07 am #
Dear Jason,
Thank you so much for your time and attention
I was wondering if I can use time as univariate sequence.
Regards
REPLY
Jason Brownlee January 21, 2020 at 7:18 am #
Sure.
REPLY
Adonis El Hajj January 23, 2020 at 1:49 am #
Hello Jason,
my model will learn from the past Forcast data and past actual AC Power data.
my Input is the future 7 days Forecast as csv file.
my goal is to predict the AC Power data based on the input.
I dont know how to apply what I want to you model here.
can you please help me?
REPLY
Jason Brownlee January 23, 2020 at 6:39 am #
REPLY
Anshu Shah January 25, 2020 at 6:30 pm #
REPLY
Jason Brownlee January 26, 2020 at 5:15 am #
REPLY
Patrick February 7, 2020 at 11:13 pm #
Dear Jason,
Thank you for your contributions. You have helped me a lot in the start of deep learning.
I have a question. I am working on a model and surprisingly the predicted output shape is
different from the target shape of training data
Traning: X (12000, 12, 8), Y (12000,)
Test: X (3000, 12, 8); Y (3000,)
pred = model.predict (X (3000, 12, 8))
and pred shape is (3000, 12, 1) but I was expecting (3000,)
what am i doing wrong?
Please help me
REPLY
Jason Brownlee February 8, 2020 at 7:13 am #
Perhaps double check the structure of your model, e.g. the output layer/model.
REPLY
wang February 3, 2020 at 5:49 pm #
Dear Jason,
thanks for the tutorial, that is very helpful! However, I use data normalization method for input
data(10,20,30…) carry out your Multi-Step LSTM Models, it happens error. I dont konw how to resolve
it. Pls see the belowing program. Thanks!
training_set = np.array([10, 20, 30, 40, 50, 60, 70, 80, 90])
training_set = training_set.reshape(-1,1)
from sklearn.preprocessing import MinMaxScaler
sc = MinMaxScaler(feature_range = (0, 1))
raw_seq = sc.fit_transform(training_set)
print(raw_seq)
# define model
model = Sequential()
model.add(LSTM(40, activation=’relu’, return_sequences=True, input_shape=(n_steps_in,
n_features)))
model.add(LSTM(40, activation=’relu’))
model.add(Dense(n_steps_out))
model.compile(optimizer=’adam’, loss=’mse’)
# fit model
print(‘X: \n’,X)
print(‘y: \n’,y)
model.fit(X, y, epochs=60, verbose=0)
# demonstrate prediction
#x_input = array([70, 80, 90])
x_input = np.array([70, 80, 90])
x_input= x_input.reshape(-1,1)
x_input = sc.transform(x_input)
x_input = x_input.reshape((1, n_steps_in, n_features))
yhat = model.predict(x_input, verbose=0)
yhat = sc.inverse_transform(yhat)
print(100,110)
print(yhat)
REPLY
Jason Brownlee February 4, 2020 at 7:48 am #
I’m eager to help, but I don’t have the capacity to debug your code, sorry.
REPLY
wang February 4, 2020 at 3:09 pm #
REPLY
Jason Brownlee February 5, 2020 at 8:01 am #
REPLY
Wandy February 10, 2020 at 7:36 pm #
X, y = split_sequence(raw_seq, n_steps)
# reshape from [samples, timesteps] into [samples, subsequences, timesteps, features]
n_features = 1
n_seq = 2
n_steps = 2
X = X.reshape((X.shape[0], n_seq, n_steps, n_features))
# define model
model = Sequential()
model.add(TimeDistributed(Conv1D(filters=64, kernel_size=1, activation=’relu’), input_shape=(None,
n_steps, n_features)))
model.add(TimeDistributed(MaxPooling1D(pool_size=2)))
model.add(TimeDistributed(Flatten()))
model.add(LSTM(50, activation=’relu’))
The code in your post, use CNN+LSTM for Univariate Models above.
I am confused in the numbers of n_seq, why is 2. And Can I consider the n_seq as the times_step of
LSTM?
REPLY
Jason Brownlee February 11, 2020 at 5:10 am #
The configuration is arbitrary, you can change it you anything you like.
REPLY
Wandy February 11, 2020 at 2:57 pm #
And can I consider the n_seq as the times_step of LSTM? I mean the first subsequence is the
first step as input of LSTM, and the second subsequence is the second step as input of
LSTM?
REPLY
Jason Brownlee February 12, 2020 at 5:41 am #
REPLY
Francis February 12, 2020 at 7:25 pm #
REPLY
Amelie February 12, 2020 at 11:58 pm #
In your code:
In your code:
#########################
# define model
model = Sequential()
model.add(LSTM(50, activation=’relu’, return_sequences=True, input_shape=(n_steps, n_features)))
model.add(LSTM(50, activation=’relu’))
model.add(Dense(1))
#########################
REPLY
Jason Brownlee February 13, 2020 at 5:42 am #
Yes.
REPLY
bunty sahoo February 13, 2020 at 4:28 pm #
Thanks for the wonderful explanation. I have query regarding which category my dataset and
requirement falls into.
i want to forecast what will be the number of defects of all parts in next 2
Tools shipped column has relationship with number of defects.I have future data for Tools shipped
too. so output desired :
2019-01-22 part a 2 ??
2019-01-22 part b 2 ??
2019-01-22 part c 2 ??
REPLY
Jason Brownlee February 14, 2020 at 6:28 am #
REPLY
Ayush February 13, 2020 at 8:04 pm #
Hi Jason,
Thank you for such an informative tutorial. I am planning on implementing LSTM for a multivariate
time series data. The input dimension is (1000*7*24) and the output is (1000*30). I wanted to
understand how can I decide how many layers and units to use. Similarly the batch size which would
be appropriate in this case. It would be great you could comment on some of standard heuristics or
point to some reliable resource for the same.
REPLY
Jason Brownlee February 14, 2020 at 6:29 am #
REPLY
Andrei February 19, 2020 at 9:07 pm #
Hi Jason,
Great tutorial, it really helped my get on my feet and started.
I have a question on Multiple Parallel series. Does parallel mean the input features and the output are
treated as independent across columns?
to predict:
[F1_t5, F2_t5, F3_t5]
Also, how would you go about combining Multiple input and Multiple parallel series in a case where
the the input is a N-feature vector and using 3 timesteps, predict M-features (many-to-one), where M
< N (and M-features included in the N-features) .
And on a separate note, any literature suggestions for using this with categorical data? I tried
encoding to numerical, but they are not treated as categories
REPLY
Jason Brownlee February 20, 2020 at 6:12 am #
No, parallel inputs mean separate input variables measured over time. Perhaps this will
help:
https://machinelearningmastery.com/taxonomy-of-time-series-forecasting-problems/
Yes, try ordinal and one hot encode and compared to an embedding:
https://machinelearningmastery.com/how-to-prepare-categorical-data-for-deep-learning-in-python/
REPLY
zhouhua February 20, 2020 at 3:14 pm #
Hi Jason,
I am new for the LSTM, can you put a related picture of topology for each type’s visualization?
REPLY
Jason Brownlee February 21, 2020 at 8:18 am #
REPLY
Roberto February 22, 2020 at 5:07 am #
Hi Jason,
Thank you very much your effort and for offering us your great tutorials. I enjoy a lot!
I do not have much experience with LSTM so I get already problems with definitions which are
problably clear for most of the readers. For Vanilla LSTM you say you use 50 LSTM units. Does it
mean you have 1 LSTM whose Input is 3 dimensional and the output 50 dimensional or you actually
have 50 LSTM accepting 3 dimensional vectors and 1 dimensional outputs?
REPLY
Ben February 27, 2020 at 3:15 am #
Hi Jason, where you state Vanilla LSTM for univariate time series
forecasting and make a single prediction. is it possible to predict more than a single
variable? How would I modify to make 5 value predictions?
REPLY
Jason Brownlee February 27, 2020 at 5:59 am #
REPLY
Alvaro Fierro Clavero March 4, 2020 at 12:59 am #
REPLY
Jason Brownlee March 4, 2020 at 5:57 am #
Thanks!
REPLY
manjeet kumar yadav March 6, 2020 at 6:18 pm #
Hi Jason
I need help i am working on project for HAR for video dataset, could you help me making model
which use cnn-lstm .
REPLY
Jason Brownlee March 7, 2020 at 7:14 am #
REPLY
Jason Brownlee March 7, 2020 at 7:21 am #
REPLY
David March 7, 2020 at 8:02 am #
Thanks
REPLY
Jason Brownlee March 8, 2020 at 6:00 am #
You’re welcome.
REPLY
Peter Yocote March 19, 2020 at 7:59 am #
Hello Jason
I was wondering how could we know the accuracy and have some sort of validation_data (the
parameter used in model.fit).
This to obtain the loss and accuracy curves for training and validation
REPLY
Jason Brownlee March 19, 2020 at 10:13 am #
Do you think that TS Deep Learning has proved itself successful when applied to stock market
forecasting?
REPLY
Jason Brownlee March 31, 2020 at 8:15 am #
No.
See this:
https://machinelearningmastery.com/faq/single-faq/can-you-help-me-with-machine-learning-for-
finance-or-the-stock-market
And this:
https://machinelearningmastery.com/findings-comparing-classical-and-machine-learning-
methods-for-time-series-forecasting/
REPLY
Marlon April 1, 2020 at 7:41 am #
df = np.load(‘Principal.npy’)
# Conv1D
#model = load_model(‘ModeloConv1D.h5’)
model = autoencoder_conv1D((2, 20, 17), n_passos=40)
model.load_weights(‘weights_35067.hdf5’)
# summarize model.
model.summary()
# load dataset
df = df
# Make predictions
test_predictions = model.predict(X)
## test_predictions.shape = (59891, 40, 1)
test_predictions = model.predict(X).flatten()
##test_predictions.shape = (2395640, 1)
plt.figure(3)
plt.plot(test_predictions)
plt.legend(‘Prediction’)
plt.show()
REPLY
Jason Brownlee April 1, 2020 at 8:12 am #
REPLY
Marlon April 1, 2020 at 10:34 pm #
Hello,
Thank you very much for your reply. Anyway, it didn’t help. I’ve changed the input size of my
Conv1D from (2, 20, 17) to (40, 1, 17), but it didn’t accept – it tells me that it has negative
dimension. I don’t understand why it doesn’t happen when training the network but does
when I use the saved model to predict.
REPLY
Marlon April 1, 2020 at 10:36 pm #
REPLY
Jason Brownlee April 2, 2020 at 5:54 am #
– Consider aggressively cutting the code back to the minimum required. This will help you
isolate the problem and focus on it.
– Consider cutting the problem back to just one or a few simple examples.
– Consider finding other similar code examples that do work and slowly modify them to
meet your needs. This might expose your misstep.
– Consider posting your question and code to StackOverflow.
REPLY
Marlon April 2, 2020 at 2:33 am #
I’ve used your split_sequences() for multivariate and 40 steps. Therefore, for dataset was taking the
ith+40 steps and later ith+1+40 steps and so on. It always has the last item of each subsequence as a
new one, all the rest equals the past subsequence.
The output layer, for some reason that I still couldn’t figure out, is making a prediction of every
subsequence. Then I design a function that takes the first item of each subsequence.
def separador_output(sequence):
X = list()
for i in range(len(sequence)):
x = sequence[i][30]
X.append(x)
return np.array(X)
I sharing that because I still believe that there should be a manner of doing this without introduce
such function as above.
Best regards!
REPLY
Jason Brownlee April 2, 2020 at 6:01 am #
Perhaps write your own function for preparing your data how you need it?
REPLY
Jim April 4, 2020 at 11:26 am #
I am trying to perform an LSTM model of time series data following the strategy you outline in tis
article.
I have one input (feature) at multiple timepoints in the past, and I use your code “split_sequence()” to
split the univariate sequence into multiple samples, each with a specified number of time steps and a
single output.
I have to standardize my “train” dataset for which I had planned on using StandardScaler (per your
other excellent articles including: https://machinelearningmastery.com/how-to-develop-lstm-models-
for-time-series-forecasting/). I am performing the standardization prior to performing the SPLIT into
multiple samples for the LSTM. This seems straightforward (although please comment if you think this
plan is inappropriate.)
The complication is that at any given timepoint, my single input feature actually has multiple values,
each derived from any one of many “related but independent” sources. While I can perform the LSTM
on each source separately, I would like to try maximizing my sample size by performing the LSTM on
the aggregate of all of the sources (since the sources seem to follow similar behavior to each other,
but not necessarily within the same time window). Or at least I would like to see what the results of
that aggregated model looks like. My only question is: does it make more sense to perform the data
input standardization separately for each source (so each source is standardized to mean of zero and
SD of 1, and has equal weighting in the model), versus standardizing once across all sources in the
aggregated data.
Jim
REPLY
Jason Brownlee April 5, 2020 at 5:39 am #
REPLY
Jim April 5, 2020 at 8:24 am #
I understand “feature”.
By ‘”time series” variable’: Do you mean each of the individual sequences created by the
“split_sequence()” function in your example is scaled separately?
Thank you!
REPLY
Jason Brownlee April 5, 2020 at 1:40 pm #
REPLY
Jam April 5, 2020 at 7:58 pm #
Hey, Jason Thanks for your helpful blog. could you please help me on a case?
my data includes a fixed size of input as (1, 16, 2) . but output is different in number of timesteps. i
mean that one may be like (1,2,2) or other may be (1, 20, 2). i thought to use Encoder-Decoder
format. but the problem is determining dimension of “repeatVector()”. how should i do that?
is it possible to adjust its size for each input?
REPLY
Jason Brownlee April 6, 2020 at 6:04 am #
Perhaps try padding all output sequences to the same length and use an encoder-
decoder model to that length.
REPLY
Abhishek Neema April 7, 2020 at 7:41 am #
REPLY
Jason Brownlee April 7, 2020 at 7:49 am #
This tutorial will help you understand input shape for LSTMs:
https://machinelearningmastery.com/faq/single-faq/what-is-the-difference-between-samples-
timesteps-and-features-for-lstm-input
REPLY
Jason Brownlee April 7, 2020 at 1:33 pm #
REPLY
Jason Brownlee April 9, 2020 at 7:55 am #
REPLY
Dan April 9, 2020 at 12:35 am #
I’m wondering what the intuition behind applying a convolution with a 1D kernel on a sequence of
data is? What does this involve – is this equivalent to taking a single value as a feature, to represent
the input sequence?
REPLY
Jason Brownlee April 9, 2020 at 8:05 am #
Thanks.
REPLY
Jason Brownlee April 10, 2020 at 8:26 am #
Each filter will extract different patterns from the sequence – in an analogy to
filters extracting patterns from an image.
REPLY
Ricardo April 12, 2020 at 10:17 am #
Hi Jason,
What are the limits of LSTM models on multistep prediction length?, like if we have N samples and M
features and we are predicting K future samples of a 1-D variable. Is there a way to relate N, M and
K? or to have a quick rule of thumb on how large K can go before it doesn’t make sense anymore?
Thanks!
REPLY
Jason Brownlee April 12, 2020 at 1:15 pm #
The further you predict into the future, the more errors will compound.
That is about as general as we can go – you will need test specific models on specific datasets to
learn more.
REPLY
younes April 16, 2020 at 8:42 pm #
Hi Jason,
is there a method to choose the best n-steps-in? knowing that I need to make a prediction of 3 days,
and I have a data of 1 year (8760 observations).
REPLY
Jason Brownlee April 17, 2020 at 6:19 am #
Test different values for your dataset and use the configuration that results in the best
average performance.
REPLY
Dkb April 16, 2020 at 10:24 pm #
REPLY
Jason Brownlee April 17, 2020 at 6:20 am #
REPLY
Abdül Meral April 17, 2020 at 4:40 am #
REPLY
Jason Brownlee April 17, 2020 at 6:27 am #
Well done!
REPLY
esyraq ekram April 18, 2020 at 1:18 am #
omg omg omg omg omg, I just graduate last year and i did my internship. there i learn the
great wonders of machine learning. for 1 year i have look at so many tutorial saying a this and that
blah blah blah then it takes me a couple of hours to try to run that nonsense. Then i saw this, omg I
cant stop saying that. this is what i want. this is it, i just want a simple code that i can run my self, i
don’t need the million lines of explanation. i just want to know what works. you sir are my hero. Thank
you so much from the bottom of my heart, i really feel that i don’t deserve this kindness of free usable
knowledge. Thank you Dr Jason Brownlee my hero.
REPLY
Jason Brownlee April 18, 2020 at 6:04 am #
Well done!
REPLY
Jung Hwan Kim April 29, 2020 at 4:59 pm #
Professor Brownlee , What makes you put relu activation function on LSTM? When I tested
my own project, the loss value increased astronomically (e.g. Loss: 2382585115.4067 Acc: 0.23)
When I removed relu function on LSTM. It run as charm. Could you explain it more about this topic?
REPLY
Jason Brownlee April 30, 2020 at 6:36 am #
If it doesn’t work well for your data, don’t use it. Find what works best on your project.
REPLY
H. Joner May 4, 2020 at 3:37 am #
Hi professor Brownlee,
I’m thinking why when i use “Multiple Input Multi-Step Output” and select just 1 n_steps_out i don’t get
the same result has i just make a simple Multiple Input Series predicting the next output?
Shouldn’t get the same result?
Thank you,
REPLY
Jason Brownlee May 4, 2020 at 6:26 am #
The models and data are very small in these cases, they are to show you how to use
them, not actually solve the tiny prediction tasks.
REPLY
Ehsan Ameri May 9, 2020 at 8:30 pm #
Hello everyone.
Thanks to Mr.Jason Brownlee
I reviewed some examples of the site (airplane passengers and shampoo) and I am totally confused
the concept of timesteps with features.
REPLY
Jason Brownlee May 10, 2020 at 6:08 am #
REPLY
Juan Pablo May 15, 2020 at 8:49 am #
REPLY
Jason Brownlee May 15, 2020 at 1:27 pm #
The tutorials on sequence classification here will help you to get started:
https://machinelearningmastery.com/start-here/#deep_learning_time_series
REPLY
Hrishikesh Bawane May 16, 2020 at 6:29 am #
Thank you so much for this. Preparing data is always a great task. I have a chat dataset
which I want to use to create a chatbot. How should I prepare data for the encoder-decoder model?
REPLY
Jason Brownlee May 16, 2020 at 10:11 am #
Perhaps prepare sequences of integers (ordinal encoded words) for input and output sequences
per sample.
Hi Jason, thank you so much for this article. I really liked this and learned many things.
I have a time series data, I have 60 input data points and I have to predict 1 output at the last layer of
LSTM, so basically I want my lstm to be
first_day_data–>lstm_unit1–>second_day_data–>lstm_unit2–>….60th_day_data–>lstm_unit60–>den
seLayer–>output.
REPLY
Jason Brownlee May 20, 2020 at 1:32 pm #
REPLY
Makarand Datar May 20, 2020 at 1:01 pm #
Hi Jason,
If I have a time series with say 600000 time steps as output and 3 time serieses for 3 features all of
them also of the same length. Then I form sequences (say 60 consecutive features at a time, first
sequence will be 1-60, second will be 2-61 third will be 3-62 and so on) from my data and now I want
to split it into training and testing sets.
If I shuffle the sequences (within a sequence, all the points will still be chronologically sequential) and
then split the data into 80 – 20 train test split, is that ok or would it lead to data leakage into testing?
REPLY
Jason Brownlee May 20, 2020 at 1:36 pm #
Yes.
REPLY
Makarand Datar May 20, 2020 at 1:43 pm #
Hi Jason, sorry for my ambiguity in the question. You said yes for
1. Ok to shuffle or
2. data leak into testing?
If you shuffle sequences, then train a model, it will likely lead to data leakage and an
optimistic evaluation of model performance.
You’re welcome.
REPLY
Makarand Datar May 20, 2020 at 1:02 pm #
Also, thank you for this and all the other articles!
REPLY
Jason Brownlee May 20, 2020 at 1:36 pm #
You’re welcome!
REPLY
Matheus May 23, 2020 at 4:17 am #
REPLY
Jason Brownlee May 23, 2020 at 6:30 am #
(to be more precise I have x products with sales for z days and I want to make each of the products a
train sample, but by doing this the days will be repeated for each product is it correct? or should I
build a train sample to contain all items?) * I mention that I’ve implemented the sliding window
approach to build the dataset for each item
REPLY
Jason Brownlee May 31, 2020 at 6:32 am #
Thanks!
Yes, the order of samples probably matters both in splitting data for train/eval and within the
training and test set themselves.
REPLY
tnuich June 10, 2020 at 4:48 pm #
At the model.fit the samples are automatically shuffle, so in this case the order of
samples(items)[in train] still matter?
REPLY
tuteja June 3, 2020 at 5:45 am #
hi Jason,
Could you provide your opinion on this usecase – I am working on a multivariate, multi-step time
series problem to forecast sales for each of the cities. I understand from your tutorials how to use
LSTM with vector output on such a problem but how do I handle the forecasting by cities? one way I
read is to build separate models for each of the cities and then model concatenate at the end. what
are your thoughts? do you have a post on it that I can refer to?
Your blog has been a “go to” solution for all my problems. Thanks for sharing knowledge and keeping
it simple!
REPLY
Gopikrishna K S June 8, 2020 at 3:21 am #
Thanks a lot for the article, can you please explain briefly how to do the same in Java using
Deeplearning4j library
REPLY
Jason Brownlee June 8, 2020 at 6:18 am #
REPLY
Adideva June 10, 2020 at 4:04 am #
Hey Jason
Thanks for the wonderful article. Can you help me with the data reshaping for the Multiple Parallel
Series for a CNN LSTM model? It would be great if you could provide a python function fro the same.
As a beginner, it is a bit tricky to understand the data shapes needed for the different models. Thanks
REPLY
Jason Brownlee June 10, 2020 at 6:21 am #
REPLY
Onur June 14, 2020 at 8:43 pm #
Hi Jason ,
I get the error that string data could not be converted to float type.
REPLY
Jason Brownlee June 15, 2020 at 6:05 am #
REPLY
Higo Felipe Pires June 16, 2020 at 4:31 pm #
Hi, Jason. Thank you for your helpful blog and post.
I’m doing a project to predict COVID-19 growth in countries/regions. My plan is to use data of a
handful of chosen countries in training and do the prediction with only one country (dataset:
https://github.com/datasets/covid-19/blob/master/data/time-series-19-covid-combined.csv). Is this
possible with the knowledge exposed in the post? If yes, which type of time series I’ll have to apply?
Univariate? Multivariate? Multi-step?
Best regards,
Higo
REPLY
Higo Felipe Silva Pires June 17, 2020 at 5:45 am #
I wanna use the “Confirmed”, “Recovered” and “Deaths” to predict “Cases” (and eventually
“Deaths”).
REPLY
Jason Brownlee June 17, 2020 at 6:18 am #
The growth rate can be modelled directly with an exponential function, use the
GROWTH() function in excel.
REPLY
Higo Felipe Pires June 17, 2020 at 7:03 am #
Jason, thanks for the reply, but I don’t think I expressed myself in the best way.
What I intend to do on my project is to train an LSTM with data from confirmed cases,
recovered patients and deaths from a certain set of countries and try to predict the number of
cases in another country. The dataset is that on my first comment.
For example: training the LSTM with data from Australia, Costa Rica, Greece, Hungary and
Israel (from 2020-01-22 to 2020-06-15) and trying to predict the number of cases in Brazil
(here i would like to try two approaches: a validation with predictions in the same range
2020-01-22 to 2020-06-15, and another aimed at predicting future cases, beyond the date
2020-06-15).
Which of the approaches exposed in the article should I use? It is not yet clear to me which
would be the best.
Thanks in advance.
REPLY
Jason Brownlee June 17, 2020 at 1:40 pm #
That sounds like a great project, I think this might give you some ideas:
https://machinelearningmastery.com/faq/single-faq/how-to-develop-forecast-models-for-
multiple-sites
REPLY
Syed Nazir Hussain June 20, 2020 at 1:05 am #
I would like to know, how can I get the next week’s forecasting results in the vanilla LSTM model. In
this site example, we only get single forecast value.
Can you help me in this senario.?
REPLY
Jason Brownlee June 20, 2020 at 6:16 am #
See this:
https://machinelearningmastery.com/make-predictions-long-short-term-memory-models-keras/
REPLY
Kyu June 22, 2020 at 8:26 am #
Hi Jason,
Thank you for the valuable post. I have a question regarding multi-step LSTM model. I was trying to
apply CNN-LSTM for the multi-step model, but I am a bit confused on reshaping [sample, timesteps]
into [sample, subsequences, time steps, features].
but in case of CNN-LSTM, we need the number of subsequence for the CNN model. But whenever I
input n_seq=2 and run the code
X = X.reshape((X.shape[0], n_seq, X.shape[1], n_features))
, the error occurred: ValueError: cannot reshape array of size 15 into shape (5,2,3,3)
You may need to experiment with different input shapes that are divisible by the number of
timesteps in each sample.
REPLY
mike mirza June 24, 2020 at 4:58 am #
so I need to predict a time series but with help of another that I already have, whats the best
approach?
REPLY
Jason Brownlee June 24, 2020 at 6:39 am #
I recommend that you test a suite of different models and configs and discover what
works best for your dataset, this may help:
https://machinelearningmastery.com/how-to-develop-a-skilful-time-series-forecasting-model/
REPLY
busssard June 25, 2020 at 7:43 am #
REPLY
Jason Brownlee June 25, 2020 at 1:04 pm #
You’re welcome!
Hello,
I am testing some forecasting algorithms including the LSTM model. From this, I wanted to seek its
complexity in terms of memory and computing time.
So if you allow me, what complexities for the example of the univriate time series forecasting
presented in the example above.
REPLY
Jason Brownlee July 6, 2020 at 2:05 pm #
Not sure off hand, sorry. You might have to check the literature if anyone has estimated
the big-O for the method.
REPLY
Julian July 10, 2020 at 5:07 pm #
Hi Jason,
I have a litlle complictaed, but I think not so rare forecast Problem I’d like to solve.
Example description:
Lets say we do klimate-measurements at ground level but also at 15km hight. The last 2 years we
started weatherbaloons every day to measure i.e. the pressure at 15 km hight. weather balloons are
expensive and not really enviromental friendly, so we like to reduce the amount of weather balloons
we need.
The idea:
from now on, we could start a weather balloon only every Sunday. The folowing 6 days we would
predict the pressure at 15 km hight based on the current measurements of each day at ground level
and on the Sundays we could ‘refocus’ our model using the real world measurement.
This sounds feasable to me, but I do not know where to start.
my first idea:
Not really what i want but possible:
put all the input data for last week together ((Sunday+)? Monday-Sunday) in one feature set and build
a standard RNN to predict the pressure in 15km hight for Monday-Saturday. I think this would work,
but then i would only get the values for last week. If I would like to have a estimation for today I to not
see a way.
I think there are many processes you could optimise this way. Also in Industry where products of one
batch often have rather equal properties. We could drastically reduce the prodcution time if we predict
kalibration Measurements which take a long time to perform based on rather simple Measurements.
Do you have a idea how I could start building such a model? Do you know a good book with a similar
example?
Chears,
Julian
REPLY
Jason Brownlee July 11, 2020 at 6:05 am #
Generally, I would encourage you to prototype and evaluate each approach you can think of,
rather than guess a priori what might be best – use results to guide you.
REPLY
Joe July 13, 2020 at 11:56 pm #
Thanks again.
REPLY
Jason Brownlee July 14, 2020 at 6:27 am #
You can use either, as long as the model matches the data.
REPLY
Emmanuel July 16, 2020 at 5:27 pm #
Great tutorial, your work has always been of help to me. I am trying to develop a predictive
model for a belt drive. In this case, my time series data is not necessarily for forecasting but the
trained model predicts the status of the belt drive based on new time series data. Is LSTM
nevertheless optimal or do you have any two to three neural network you can recommend in this
case?
REPLY
Jason Brownlee July 17, 2020 at 6:02 am #
You’re welcome.
Good question. I recommend testing a suite of algorithms and algorithm configurations in order to
discover what works best for your specific dataset.
REPLY
Melissa July 22, 2020 at 2:58 am #
Hey Jason, I have very much enjoyed your tutorial! In your opinion, is there a ‘right’ amount
to data points (e.g., rows) to feed into an LSTM model? I was thinking to use around 500000 – 1M
data points and I was wondering if they are too much and what wold be the limitations between using
a small dataset vs a very large one?
REPLY
Jason Brownlee July 22, 2020 at 5:44 am #
Thanks.
Hello Jason,
Thanks for your great tutorials, they have been always very helpful.
I’m interested in the calculation process behind LSTM. I’m familiar with all formulas which are used in
LSTM but I’m not sure what is the input at each calculation step in Vanilla LSTM example.
For example, let suppose that the input time series is [30, 40, 50]
So, at the first step, using C_{0} (cell memory), H_{0} (cell output) and number 30 (from time series
above), we calculate C_{1} and H_{1}
Next, using C_{1}, H_{1} and 40 are calculated C_{2} and H_{2} and so on. Right?
I’m a little confused because in sentence time series, each word can be represented as a one-hot
vector and in that example, the sentence would be time series of one-hot vectors and at each
calculation step, the input in the formula would be one one-hot vector.
Regards, Enes
REPLY
Jason Brownlee August 3, 2020 at 1:31 pm #
You’re welcome.
And this:
https://machinelearningmastery.com/faq/single-faq/how-is-data-processed-by-an-lstm
REPLY
Jinhui August 3, 2020 at 8:05 pm #
So the input [100000, 150, 11] and output [100000,15, 11] are used to train. I set the epochs to 50 and
got the model after 4h’s training. But I find that all the prediction result of this model keeps constant,
i.e. [0, 15, 11], [1, 15, 11], [2, 15, 11], … are the same.
I will be grateful if you could give me some possible reasons that I should check.
Thank you!
REPLY
Gab August 7, 2020 at 4:42 am #
Which of the above models would be more logical to use so that I could predict both variables at the
same time?
How would I predict the next 30 days of the month from the last dataset date?
REPLY
Jason Brownlee August 7, 2020 at 6:29 am #
REPLY
Sumedha Sandip Borde August 8, 2020 at 6:31 pm #
Hello
I have an EEG(brain signal) dataset which i want to use for classification. .64 electrodes are attached
to every subject(patient) and 5012 samples are recorded for every electrode. this way every subject
has 64 series of 5012 samples and one class label for each subject. likewise there are 108 such
subjects.
Can you suggest the right deep learning method that can be used for classification?
REPLY
Jason Brownlee August 9, 2020 at 5:37 am #
Accidents that happened in the morning would affect traffic in the afternoon and traffic patterns that
developed in the morning due to these accidents will as well. Hence I thought an LSTM could help.
But I want to test against simpler models.
I would imagine model performance varies over the course of the day so my performance measure
would be a graph showing the errors of the model over the course of the day as a distribution as the
test set would include multiple days.
Where I am stuck is the training part: I selected a few characteristic days over the past years that I
want to pass to the model. I assume that no effects spill over from one day to the next as there’s
almost no traffic at night. So in effect my train data set consists of a number of days that shall be
taken individually. That way I don’t have to pass years of data but can select typical days and only
train on these. How do I pass these to the model and avoid at the same time that the “memory” takes
info from previous days into consideration?
Should I just use one model.fit(X, y) where I add a dummy variable to the X representing the day?
that doesn’t seem like good practice to me. If I do not point out the day specifically the model may
think that the state of the neural network from the day before would affect the following day.
REPLY
Mike August 17, 2020 at 6:43 pm #
Sorry, mistake in the last code snippet. That would have to be:
REPLY
Jason Brownlee August 18, 2020 at 6:00 am #
Perhaps you can use all prior data up to the day you want to test as training, then test on
the hold out day. Repeat for each day you want to evaluate.
REPLY
emmanuel August 19, 2020 at 10:58 pm #
Hello Jason, thank you for this tutorial which is very useful. I’m working on panel data right
now, i.e. I observe certain variables on several individuals at different times. I have a dataset of 719
individuals and 11 variables observed daily over 10 years (2010 to 2019)
Can we apply an LSTM model on these data?
If yes, how to prepare the data (reshape).
Thank you.
REPLY
Jason Brownlee August 20, 2020 at 6:42 am #
LSTM might be appropriate if each subject is a time series and you want to learn across
subjects.
REPLY
Malik Elam August 23, 2020 at 8:53 pm #
Hi Jason,
Your answers were always inspiring for me. I am always thankful for that.
Assuming (t-current time, t+1 -future time), I prepared my data set as follows: Xt -> Yt+1, so that data
links current feature inputs with future change in the price.
As I understand, RNNs try to map Yt -> Yt+1. In my case I cannot include (Yt+1, Yt+2,…) in the
training set as upon using the trained model for forecasting; these will be unknown future values that
cannot be fed into the model.
On the other hand, using a data set of Xt -> Yt does not hold the core information: the future price
change of the stock, so as to be able to forecast it.
How can I make use of say: Xt-n, … ,Xt, Yt-n,…,Yt to forecast Yt+1 ?
REPLY
Jason Brownlee August 24, 2020 at 6:22 am #
Thanks.
REPLY
Jason Brownlee August 24, 2020 at 7:48 am #
Recall, you have complete control over the choice of inputs and outputs of the model. Try
a suite of different framings of the problem and discover what works well/best for your
dataset.
Perhaps try modeling the target using any and all data you have available. Then, once
you have something working, try removing/ablating data and review how it impacts model
performance.
If you’re question is about the mechanics of preparing data, perhaps the function in this
tutorial will be helpful:
https://machinelearningmastery.com/convert-time-series-supervised-learning-problem-
python/
REPLY
Asish August 25, 2020 at 4:29 am #
Hi Jason, I have time series eye data like diameter, number of blinks, duration of fixation and
each features has different threshold like diameter is more than 3.5 means high cognitive load for
eyes. Which LSTM can I use for this dataset to measure cognitive load? Or any other ML will fit for
this problem?
REPLY
Jason Brownlee August 25, 2020 at 6:44 am #
I recommend testing a suite of different models and model configurations, not list lstms,
in order to discover what works best for your specific dataset.
REPLY
Asish August 28, 2020 at 6:27 am #
Hi Jason,
Thanks for your suggestions.
I don’t have ground truth data. I’m recording data using device and I’m thinking to ask user to
label data for the last 2/3 min recording data. But it has downside to label many rows with
same label. Is there any way to generate ground truth data?
Thanks
REPLY
Jason Brownlee August 28, 2020 at 6:58 am #
You can take each candidate answer as a separate row, or try consolidating
each row using the mode or mean estimate.
REPLY
Simon PERROTT August 25, 2020 at 6:52 pm #
I love your articles so much I’ve bought several of your books which I find excellent.
Do you have any advice for how I can balance out my training dataset?
REPLY
Jason Brownlee August 26, 2020 at 6:48 am #
Great question.
Finally, try simple duplication of input patterns for the minority class and add gaussian noise to the
observations – e.g. a primitive form of random oversampling.
REPLY
Simon Perrott August 26, 2020 at 8:51 pm #
I’ll try those out, I’m learning a lot from you and appreciate your explanations
REPLY
Jason Brownlee August 27, 2020 at 6:13 am #
You’re welcome.
REPLY
Khin Thida San August 27, 2020 at 2:16 pm #
Thank so much for your articles, I have been learning deep NN and LSTM, this helps me a
lot to understand deep down and to build my own model for time series analysis.
REPLY
Jason Brownlee August 28, 2020 at 6:34 am #
REPLY
Shinichiro Imoto August 31, 2020 at 12:09 pm #
Hi Jason!
I always appreciate your blobs. They help me understand the deep idea of DNN with precious sample
codes.
Now, I’m little struggling with CNN + LSTM model for Multivariate – Multistep time series forecasting
problem.
I experimentally added CNN before LSTM layer and your blob made me notice that I needed
TimeDistributed wrapper to layers before LSTM layers. To do so, I reshaped input as follows, as well
as x validation set.
[Now]
InputLayer(input_shape=(x_train_multi.shape[1],x_train_multi.shape[2],x_train_multi.shape[3]),
batch_size=BATCH_SIZE)
x_train.shape[1]: subsequences (e.g. 600)
x_train.shape[2]: time steps (e.g. 1)
x_train.shape[3]: # of features, No change.,
batch_size: No change.
I adjusted the ratio of [1]:[2], then found 600:1 is the best.
“fit()” works normally and the accuracy is almost the same as before I added CNN.
However, when I enable MaxPooling1D layer after Conv1D, the layer throws ValueError with regards
to input shape. When I delete “padding=”causal”” from Conv1D arg, Conv1D also throws the same
ValueError too.
I’m sorry for this long question, but if you see any wrong part especially about the input shape, please
give me your comment.
Thank you.
REPLY
Jason Brownlee August 31, 2020 at 1:24 pm #
Well done!
Sorry, I don’t have a good off the cuff answer for you, you will need to tune the model for your
problem including ensuring the architecture is a good match for the shape of the data flowing
through the model. I cannot debug the model for you.
REPLY
Shinichiro Imoto September 1, 2020 at 12:39 pm #
I found the cause of this ValueError. It is because the size of Maxpooling1D has to be more
than “timesteps”. As I posted, I reshaped the original time step 600 into 600 x 1 (
subsequences x “timesteps” in [samples, subsequences, “timesteps”, features]).
It has to be 300 x *2*(or more) since the pooling size is *2*.
But no errors do not mean that it is correct. I hope this would work to fit.
REPLY
Jason Brownlee September 1, 2020 at 1:47 pm #
REPLY
Suwei September 9, 2020 at 10:40 pm #
Hi, Jason, thank you for your tutorial. I have a question, I want to predict the flood, and my
data is not continuous, like for the year, 2019, I have the data of part weeks of 5, 8 month, and for the
year 2020, I have data of 3, 6 month. how should I do to make the prediction?
REPLY
Jason Brownlee September 10, 2020 at 6:29 am #
REPLY
Victor September 16, 2020 at 6:52 pm #
Sorry Jason, I have read many times and alongside with some questions people asked
above. I still don’t understand what’s the difference of using a RepeatVector comparing to LSTM with
return_sequence = True? Is there any easy way to understand the major difference? Would like to
understand when each method would be ideal to use.
Much appreciated!
REPLY
Jason Brownlee September 17, 2020 at 6:43 am #
Repeat vector uses the same single output vector from the encoder in the creation of
each output step by the decoder.
Return sequences is the output of each input time step from the encoder.
REPLY
Rudiger September 18, 2020 at 6:28 am #
Hi Jason,
Thank you for this wonderful post!
I have tried out multistep your example “Vector Output Model” with exactly the same numbers, same
code. Some of the important data:
raw_seq = [10, 20, 30, 40, 50, 60, 70, 80, 90]
n_steps_in, n_steps_out = 3, 2
x_input = array([70, 80, 90])
print(yhat)
[[124.500435 137.70433 ]]
Normally yhat should be close to 100 and 110. Do you have an explanation what is happening or
possibly going wrong?
REPLY
Rudiger September 18, 2020 at 6:40 am #
By the way, I am running Keras ‘2.3.0-tf’. Re-running the model changes the numbers
but it stays far away from 100 and 110.
REPLY
Jason Brownlee September 18, 2020 at 6:52 am #
REPLY
Rudiger September 22, 2020 at 6:48 am #
REPLY
Armin October 6, 2020 at 6:42 am #
Hello Jason,
thank you very much for your great work. Please let me ask you two questions:
a) can I train (fit) a LSTM with a series with e.g. timestep =5, but predict the network with data with
timestep = 1?
b) is there a relationship between quantity of timestep and quantity of hidden layers or neurons per
layer?
Thanks
BR
Armin
REPLY
Jason Brownlee October 6, 2020 at 7:02 am #
I don’t see why not. You might have to re-define the model input layer after it is fit.
Yes, it varies for dataset and models. Run a sensitivity analysis to see how performance varies
with model capacity in your case.
REPLY
Armin October 6, 2020 at 7:39 am #
REPLY
Jason Brownlee October 6, 2020 at 8:01 am #
You’re welcome!
REPLY
jean October 8, 2020 at 6:30 am #
Thanks for the algorithms. I have a question about how optmize the LTSM hyperparameters.
Is there some algorithm that do this?
REPLY
Jason Brownlee October 8, 2020 at 8:36 am #
REPLY
Manoj October 20, 2020 at 9:28 pm #
I’ve difficulty in understanding LSTM input shape. For example. I’ve 50 videos out of these
25 are categorized as Awake (0) and 25 as Drowsy (1). I preprocessed them to extract Eye Aspect
Ratio and Mouth Aspect Ratio as features every second.
I extracted above data from first 3 and 4 seconds of two videos respectively as the length of videos
may be different.
I’ve a very basic question here. How should I feed this data to LSTM? Any code example would be
fine. I know input shape should be [Batch Size, Time Step, Features] but I’m confused how to feed
this to LSTM should I feed each video’s data in a loop.
REPLY
Jason Brownlee October 21, 2020 at 6:40 am #
REPLY
Adrien October 23, 2020 at 1:21 am #
Hello Jason,
Great article!
just a quick question about the split sequence method for Multiple Input Multi-Step Output.
On this line, you select only the first two features in X and the last feature in Y.
seq_x, seq_y = sequences [i: end_ix,: -1], sequences [end_ix-1: out_end_ix, -1]
Thank you
REPLY
Jason Brownlee October 23, 2020 at 6:14 am #
You can structure the prediction problem any way you wish.
REPLY
Adrien October 23, 2020 at 7:06 pm #
You’re welcome.
REPLY
Xu Ji October 29, 2020 at 1:13 pm #
Thanks you very much for this. I learned a lot from you different post, especially LSTM. Just
wondering if you have recommendation using LSTM for anomaly detection? Thank you!
REPLY
Jason Brownlee October 29, 2020 at 1:43 pm #
THanks.
Yes, you could use LSTMs for time series classification, this will help:
https://machinelearningmastery.com/faq/single-faq/how-do-i-model-anomaly-detection
REPLY
yjk November 3, 2020 at 12:33 am #
Thanks for sharing this! I learned really well about LSTM models, and I am wondering why
you used a Vanilla LSTM on ‘Multiple Input Series’ part, and why I cant use other models such as
Stacked LSTM, Bidirectional LSTM, or ConvLSTM. Is it because of the dimensional of input?
REPLY
Jason Brownlee November 3, 2020 at 6:55 am #
In some cases yes, on other cases because one model performs better than the others
for a given dataset.
REPLY
Eugene November 15, 2020 at 1:25 pm #
Thanks for sharing, how would you model a regression problem to predict at a various
arbitrary time steps? For example: predicting a inflection point where we are interested in where
inflection occur and when is the time step it will happen. For example: The next predicted inflection
point at 12345 occur at t+136.
Will it be the same multi time step LSTM model above, or is it a completely different problem, and how
can we approach to this?
e.g. time series classification – is an event expected to occur in the next interval.
or multi-step forecast and use an if-statement to post-process the predictions.
etc.
REPLY
Arslan November 15, 2020 at 11:22 pm #
Currently I am working on a project where I want to predict how many pieces of a material should be
ordered for the next three month.
I have purchasing data of 20,000 materials (different time series) on monthly base which correlate to
eachother ín case of seasonality but have very short time series (50-80 data points).
For example:
date | mat | amount | workload |
2020-08 | A | 20.0 | 0.8
Does it make sense to build a LSTM model for this kind of problem?
As a regressor I could implement the months (for seasonality) and also the workload for this month.
I could train the model with all time series and 80-90% of data points. The other 10-20% for test set)
Maybe another model is better? (S)ARIMA is only a univariate approach, so I can’t implement the
workload.
Thank you!
REPLY
Jason Brownlee November 16, 2020 at 6:26 am #
REPLY
Arslan December 3, 2020 at 3:46 am #
REPLY
Jason Brownlee December 3, 2020 at 8:23 am #
REPLY
Konstantinos November 25, 2020 at 7:42 am #
Is the model trained only with training data or for every prediction the actual data of the
prediction is added to trainig data and the model retrained?
REPLY
Jason Brownlee November 25, 2020 at 7:53 am #
You can re-train the model as new observations become available if you like – both in
walk forward validation and when the model is deployed.
REPLY
Tiago November 29, 2020 at 10:19 pm #
I am using PyTorch instead of Keras and would like to reproduce your vanilla LSTM. Could you
please explain more about what is the ‘input’ parameter of the LSTM?
Thanks!
REPLY
Jason Brownlee November 30, 2020 at 6:36 am #
Thanks.
I am trying to build and LSTM for a time series data, unfortunately i am unable to reshape a 4D input
data into 3D input data to fit my LSTM model. do you know how is this possible?
REPLY
Jason Brownlee December 7, 2020 at 7:38 am #
REPLY
Ítalo Romani December 8, 2020 at 11:34 am #
Hi Jason, I am trying to develop a custom loss function for my LSTM mode which was based
on yours, like:
model = Sequential()
model.add(LSTM(neurons, activation=’relu’, input_shape=(n_steps_in, n_features)))
model.add(RepeatVector(n_steps_out))
model.add(LSTM(neurons, activation=’relu’, return_sequences=True))
model.add(TimeDistributed(Dense(1)))
model.compile(optimizer=’adam’, loss=my_loss,run_eagerly=True)
The custom loss my_loss function receives from Keras the parameters (y_true,y_pred), in my
understanding they should have shape like input_shape. However, regardless of the input shape I
use, y_true comes with shape (32,1,1), even if I remove all layers and leave a bare Sequential model.
I am trying to understand the logic of this, googled around but so far nothing helped me to explain
this.
REPLY
Jason Brownlee December 8, 2020 at 1:31 pm #
REPLY
Ítalo Romani December 8, 2020 at 12:16 pm #
Actually I made some confusion in this question. Trying to explain again: imagine that I am
trying to predict a series with a single (1-dimensional) value each time step, so y_true, y_pred should
have shape (total_time_steps,1). However I always get shape (32,1,1), with values that have no
remembrance to the actual values.
REPLY
Jason Brownlee December 8, 2020 at 1:32 pm #
Sorry, I don’t understand what you’re asking exactly. Perhaps you can rephrase.
REPLY
Italo Romani December 8, 2020 at 9:14 pm #
Hi Jason, thanks so much for the reply. I did a further search and found the answers
to my problem. First, y_true, y_pred come with sizes defined by batch_size; second, by
default, their values come shuffled, so I have to use shuffle=False. Third, and most important,
I don’t know if what I am trying to do is even possible with Keras because all the operations in
the loss function have to use tensor operations, otherwise the loss function cannot provide
gradients to the optimiser. My intended loss function goes sequentially over each element of
y_true and y_pread, compares each pair and updates an accumulation function not definable
by custom algebraic/symbolic functions. It’s a bit large and too specific to share here, but if
you are interested I can share the details of what I am trying to do.
REPLY
Italo Romani December 8, 2020 at 9:23 pm #
Perhaps there’s an optimiser in Keras that does not require gradients, but not
that I know of
REPLY
Jason Brownlee December 9, 2020 at 6:18 am #
Well done!
REPLY
Jam December 9, 2020 at 11:48 am #
Hi Jason, I learn a lot from your articles. Could you please help on a network. I have an input
of presumably (4, 10, 2). [(10,2) are time steps and features, respectively.] There are a lot of data in
such a shape and for each one I propose to train a lstm and then make a Convolution layer among
them. So by an Conv1D(1), I expect the output (3, 10, 2).
please correct me if I am wrong. I reshaped data into (1, 4,10,2). Then I used TimeDistributed
wrapper for prediction. but then I am not able to make a convolution on shape[0] (I mean 4). what is
get is convolution on the shape[2] (I mean 2). can you help me how to arrange data for the network or
whether my network is true or not?
REPLY
Jason Brownlee December 9, 2020 at 1:26 pm #
Typically you would use a CNN than an LSTM, not the othe
I have not tried LSTM-CNN, but I expect it would be challenging and you may need to debug the
model yourself.
REPLY
John White December 25, 2020 at 2:42 pm #
Hello Jason!
First off, thanks for being here for my machine learning journey! So I have a base scenario to check
for understanding:
Context: I have supervised binary classification dataset on weather temperatures with 4 features.
Target variable at time t is 0 or 1. 0 is if temperature at t+30 is down, 1 if up.
Framing the Problem for LSTM: Say timesteps is 60. So we take the previous 60 timesteps of data to
predict 0 or 1 for t+1. In doing so, we can predict if the weather temperature is up or down in 30 days.
Input shape would be (60, 4). I would have to chunk the training dataset and reshape it to be
compatible with the (60, 4) input shape.
REPLY
Jason Brownlee December 26, 2020 at 5:09 am #
You’re welcome.
REPLY
John White December 26, 2020 at 8:52 pm #
Sweet! Thanks
Follow up question: Is there an article that can provide guidelines in order to build a good
base LSTM model (ie adding layers, how many layers, stacked or vanilla, # of units in each
layer, etc)?
I feel like I’m shooting in the dark with loss values and accuracy not changing at all for every
epoch.
REPLY
Jason Brownlee December 27, 2020 at 5:01 am #
Dear Jason,
The tutorial was really useful as ever is.
But I have not seen in your tutorials that you applied any **Bilstm** network for regression to predict
** Multivariate and Multi-step ahead** data.
model = Sequential()
model.add(Bidirectional(LSTM(200, return_sequences=True), activation=’relu’, input_shape=
(n_steps_in, n_features)))
model.add(RepeatVector(n_steps_out))
model.add(Dropout(0.5))
model.add(Bidirectional(LSTM(100, activation=’relu’, return_sequences=False)))
model.add(Dense(3))
model.add(TimeDistributed(Dense(n_features)))
model.compile(optimizer=’adam’, loss=’mse’)
1- Is it common to use **Bilstm** for regression in the case of ** Multivariate and Multi-step ahead**??
2- what is the best model for regression in the case of ** Multivariate and Multi-step ahead**??
I am really sorry for writing too much, but I am really looking forward to get anwer.
Best
Mary
REPLY
Jason Brownlee December 27, 2020 at 6:14 am #
No, typically bidirectional LSTMs are not used in the encoder-decoder architecture, but I
don’t see any reason why they couldn’t be used.
We cannot know the best model for a given dataset, the job of a machine learning practitioner is
to use careful experiments and discvoer what works well or best.
REPLY
Mary December 27, 2020 at 7:16 am #
Dear Jason,
I really appreciate your quick reply.
( I am a beginner in using Bilstm in regression, so I am not sure whether I made the layers correctly or
not)?
model = Sequential()
model.add(Bidirectional(LSTM(200, return_sequences=True), activation=’relu’, input_shape=
(n_steps_in, n_features)))
model.add(RepeatVector(n_steps_out))
model.add(Dropout(0.5))
model.add(Bidirectional(LSTM(100, activation=’relu’, return_sequences=False)))
model.add(Dense(3))
model.add(TimeDistributed(Dense(n_features)))
model.compile(optimizer=’adam’, loss=’mse’)
2- Is it common to use **Bilstm** for regression in the case of ** Multivariate and Multi-step ahead**??
I am really looking forward to see your clear answer, as I did not get the mean of your previous
answer.
Best
Mary
REPLY
Jason Brownlee December 27, 2020 at 9:25 am #
I don’t have the capacity to review and comment on your model architecture:
https://machinelearningmastery.com/faq/single-faq/can-you-read-review-or-debug-my-code
LSTMs are used for sequence prediction, not regression. In numeric sequence prediction,
bidirectional are rarely used – but if it gives the best results for your dataset, then use it.
REPLY
Mary December 28, 2020 at 4:48 am #
Dear Jason,
I am grateful for your reply.
As you mentioned “LSTMs are used for sequence prediction, not regression”,
so do you have a tutorial post to introduce the best techniques for numeric sequence
regression?
Best
Mary
Jason Brownlee December 28, 2020 at 6:04 am #
“sequence prediction” or “sequence regression” are the same kind of thing. The above
examples fall into this.
Dear Jason,
Thank you a lot for the time you spent answering that much clearly.
Best
Mary
You’re welcome.
REPLY
Devendra January 4, 2021 at 1:30 am #
i want to make an earthquake prediction using rnn (LSTM).I am getting difficulty to code . can
you please help me
REPLY
Jason Brownlee January 4, 2021 at 6:09 am #
Is there a specific problem you are having that I can perhaps address?
REPLY
Devendra January 4, 2021 at 3:25 pm #
can i get it from those code you have provided here? If yes then which LSTM should
i follow?
Perhaps you can start with a model listed above and adapt it for your specific dataset.
REPLY
Bensayah Abdallah January 8, 2021 at 5:25 am #
Many thanks
This is my situation: I have several companies. According to 20 measurable features varying from
year to year. For ten years, we have a binary classification Fail/Succes.
My question is what model adequate for this problem to train the machine to predict a probable
success or failure of a given company with its successive given features?
Many thanks
REPLY
Jason Brownlee January 8, 2021 at 5:47 am #
Perhaps try a suite of data preparation methods, models and model configurations in
order to discover what works well or best for your dataset.
REPLY
Bensayah Abdallah January 8, 2021 at 10:05 am #
Many thanks
A googling led to your article
https://machinelearningmastery.com/how-to-develop-baseline-forecasts-for-multi-site-
multivariate-air-pollution-time-series-forecasting/
Is it helpful?
REPLY
Jason Brownlee January 8, 2021 at 1:20 pm #
I think only you can comment whether you find the linked article helpful.
REPLY
Hnin January 9, 2021 at 8:10 pm #
Thanks Jason for the insights. I have one question regarding Convo_LSTM. It can extract the
spatio-temporal features. How can we input the spatio data? Do we have the examples code of it?
REPLY
Jason Brownlee January 10, 2021 at 5:39 am #
REPLY
Hnin January 13, 2021 at 6:29 pm #
Do you mean to extract spatio temporal feature , we need to input a sequence of images as
input rather than a sequence of values?
REPLY
Jason Brownlee January 14, 2021 at 6:12 am #
Yes.
REPLY
Hyundong January 11, 2021 at 7:44 pm #
Thank you for your great tutorial. I learned through this article but I have a question about the
number of samples.
If [10, 20, 30, 40, 50, 60, 70, 80, 90] is one sample, I have about 10,000 samples that each sample is
independent and has the same characteristics.
For instance, [10.01, 20, 30.035, 40.102, 50.1, 60, 70.364, 80.112, 90.623], [10.541, 20.983, 30.097,
40.152, 50.2, 60.942, 70.73, 80, 90.53], [10.543, 20.486, 30.897, 40, 50.766, 60.519, 70.132, 80.11,
90.445], …
In this case, I am wondering if there is a way to apply all 10,000 samples to training the model.
Thank you.
REPLY
Jason Brownlee January 12, 2021 at 7:50 am #
REPLY
Yilma January 12, 2021 at 12:50 am #
Dear Jason,
I am not familiar with python. Do you have this tutorial in R?
Best
Yilma
REPLY
Jason Brownlee January 12, 2021 at 7:52 am #
Sorry I do not.
REPLY
Alex January 28, 2021 at 2:06 am #
Hi Jason
regarding the case ‘Multiple Parallel Series’… my problem has 150k time series, and for each I need
to predict the future value.
I guess this means that I will have 150k features.
So the input array for my LSTM NN will have dimensions [n_samples, n_steps, 150k].
The size of the array is too large! I get the error:
‘Unable to allocate 606. MiB for an array with shape (365, 3, 150000) and data type float32’.
What should I do? is this the right way to approach the problem?
Many thanks!
REPLY
Jason Brownlee January 28, 2021 at 6:06 am #
Also, you may want to work with a smaller sample of the data initially or run on a large machine
like an AWS EC2 instance with tons of RAM.
REPLY
Alex January 29, 2021 at 2:21 am #
Hi Jason
Thanks
REPLY
Jason Brownlee January 29, 2021 at 6:07 am #
Ideally, data scaling is fit on training examples and applied to train and test examples.
REPLY
Raheel Anjum February 8, 2021 at 1:51 pm #
Hi Jason,
I am intending to do my research work in electricity prices forecasting. I have electricity price data of 6
years from 2012 to 2017. And I need to forecast the value for 2017 using NEURAL NETWORK AUTO
REGRESSIVE in R.The data set ranges from January 1st 2012 to December 31th 2017 (52608
observations, covering 2192 days). Each day of the data set comprises 24 observations, where each
observation corresponds to a load period. For modeling and forecasting purposes,the data set is
further divided into two sets:January 1st 2012 to December 31th 2016 (43848 observations, covering
1827 days) for identification and estimation of the models, and January 1st 2017 to December 31th
2017 (8760 observations, covering 365 days) for evaluating one-day-ahead out-of-sample forecasting
accuracy of the models. I need your help. I’ve tried searching but couldn’t find a specific code of one
day ahead forecasting with NN-AR .Can you kindly send me the code of neural network
autoregression to make forecasts for one-day-ahead out-of-sample forecasting for the complete year
2017. I will be highly obliged for this favor. Thank you and have a nice day.
REPLY
Jason Brownlee February 9, 2021 at 6:30 am #
This will help you understand how to make out of sample predictions:
https://machinelearningmastery.com/make-predictions-long-short-term-memory-models-keras/
REPLY
Nigel February 14, 2021 at 12:32 pm #
Hi Jason,
About the tutorial. You’ve stacked the output sequence with the input sequence, and I’m trying to
understand how it differentiates x from y.
Let’s say, I have 10 input_seq and 1 out_seq how would you approach this?
I tried it myself with some random numbers, but the code predicts all values along the x-axis, which
takes forever with LSTM. Should I stack the output)seq at the end of the input_seq’s.
Thanks in advance!
REPLY
Jason Brownlee February 14, 2021 at 2:17 pm #
Thanks.
They are past observations of the target that we believe will help to predict future values of the
target.
REPLY
Robet February 16, 2021 at 8:08 am #
Hello I am new to machine learning and trying to wrap my head around some of the
examples to find the best use cases for each.
In the section on ‘Multiple Parallel Series’ is this procesessed as multiple paralel univariable
predictions or multiple multivariable predictions?
I am looking for a solution where it is the later. I was considering creating seperate multivariable
models for each output but wondering if the parallel series might be the better way to go.
REPLY
Jason Brownlee February 16, 2021 at 10:04 am #
Multiple parallel univariate time series, which is a multivariate input time series.
Perhaps experiment with a few of the approaches and see what is a good fit for your data.
REPLY
Gerard Church February 16, 2021 at 9:54 am #
Hi Jason,
Really great and informative article. My first time working with LSTMs but the input format is really
clear and has been easy to understand.
I am trying to adapt this to an a problem I am trying to solve. I am trying to predict net income from a
financial income statement from 31 balance sheet and income statement items. I am using 3 years of
quarterly data to predict this, thus a time step of 12. For each yhat, my x_train contains 12 lists for
each quarter that contains the 31 independent balance sheet/ income statement variables being used
to try and predict my yhat.
Thus due to the fact my y_train has a length of 63, my input data is 63 x 12 x 31. This is stored at a
list of arrays, each with 12 lists containing the 31 variables values for each quarter. The LSTM model
really doesn’t like this format and gives the error:
ValueError: Failed to find data adapter that can handle input: ( containing values of types {“”}), (
containing values of types {“”})
Do you have any advice as to how to format this input into my LSTM? Hope the question is clear and
thanks for the help!
REPLY
Jason Brownlee February 16, 2021 at 10:05 am #
Thanks!
REPLY
Anshuka Anshuka February 24, 2021 at 3:05 pm #
Hi Jason,
Say for example I have two sets of multivariate datasets with parallel input series in both.
How can we use dataset (X) which is multivariate and has parallel input time series , to predict
dataset (Y), which again is a multivariate dataset with parallel input series.
REPLY
Jason Brownlee February 25, 2021 at 5:23 am #
The above examples under “Multivariate LSTM Models” can be used as a starting point
and adapted directly.
REPLY
Alireza February 26, 2021 at 3:01 am #
Hi Jason,
Thanks
REPLY
Jason Brownlee February 26, 2021 at 5:04 am #
Yes many, you can use the search box at the top of the page.
REPLY
H. March 2, 2021 at 6:57 am #
REPLY
Jason Brownlee March 2, 2021 at 8:14 am #
Sorry to hear that you’re having trouble, are you able to check that you are using the
latest version of Python, Keras and TensorFlow.
REPLY
Martin March 2, 2021 at 5:22 pm #
Hello Jason,
Thanks for this brilliant blog post. it has really been helpful to me.
However, I have got a real-world Spatio-temporal traffic dataset and I reckon that the procedure to
model it as a supervised learning problem would be quite different from multivariate time series (as
the order of the spatial variables matter).
T1 T2 T3 T4 T5 T6
S1 | 67 | 34 | 24 | 54 | 49 | 67 |
S2 | 61 | 55 | 23 | 42 | 53 | 78 |
S3 | 74 | 83 | 55 | 50 | 62 | 68 |
S4 | 48 | 73 | 78 | 56 | 61 | 78 |
S5 | 80 | 58 | 67 | 54 | 51 | 89 |
where the rows represent the spatial identity (the position of the detectors) and the columns represent
the time interval for collection of the data.
In formulating this as a supervised learning problem with 5 time-step per sample and 1 step prediction
made at S3, would this be a logical formulation?
Input:
T1 T2 T3 T4 T5
S1 | 67 | 34 | 24 | 54 | 49 |
S2 | 61 | 55 | 23 | 42 | 53 |
S3 | 74 | 83 | 55 | 50 | 62 |
S4 | 48 | 73 | 78 | 56 | 61 |
S5 | 80 | 58 | 67 | 54 | 51 |
Output:
68
Also, Since I am working with a real Spatio-temporal dataset, do I need to split the samples into
subsequences when using the ConvLSTM module?
If No, for the example above, would this input to the ConvLSTM be correct:
[no of samples, time-step=5, rows=spatial, columns=temporal, features=1]
REPLY
Jason Brownlee March 3, 2021 at 5:26 am #
It’s hard to be prescriptive, perhaps experiment and see what works/makes sense for
your dataset.
REPLY
Rigveda Sengupta March 3, 2021 at 11:25 pm #
Hi, just a quick question I am working with a multiple multivariate timeseries. Will the
structure remain the same as the Multiple Input Series model discussed above?
REPLY
Jason Brownlee March 4, 2021 at 5:49 am #
REPLY
Zhou March 5, 2021 at 11:45 am #
Hi Jason,
Thank you for such an informative tutorial.
But I’m having problems using the convLSTM module for multivariate time series prediction. I hope
you can answer this for me, it is really important for me and I would appreciate it.
My topic is to learn the train dataset to perform outlier detection on the test dataset. If the test set has
no outliers, the convLSTM module can predict well. However, when I add outliers, the predictions
change and I can’t do outlier detection.I can’t explain it very well.
Ideally, the prediction generated by learning the train set would be [8, 9, 10, 11, 12, 13, 14], which is
used to prove that there are 2 outliers in my test set.But the real situation is that I get predictions
similar to[8, 9, 10, 11, 11.1241, 11.3661, 14].
Questions:
1). The data in the prediction set and the test set are too close to each other, so I can’t do outlier
detection.
2). How to use the convLSTM module to perform multi-step prediction for multivariate sequences?
Because I guess the reason for the first problem is that I am using the convLSTM module for single-
step time series prediction.
REPLY
Jason Brownlee March 5, 2021 at 1:38 pm #
You’re welcome.
Sorry, I don’t understand your first question sorry. Outlier detection would probably occur prior to
modeling as a data prep step.
You can perform multi-step prediction a few ways – all described above, e.g. vector out for an
encoder-decoder model each time step.
REPLY
ZHAO, WENYU March 8, 2021 at 8:47 pm #
Hi Jason!
I would like to ask another question. After training a mulitivate lstm model, how do we know if the
model is good or not?
REPLY
Jason Brownlee March 9, 2021 at 5:17 am #
You can evaluate the performance of the model on a hold out dataset and calculate a
metric, then compare the metric to other methods including a naive method. This will help:
https://machinelearningmastery.com/faq/single-faq/how-to-know-if-a-model-has-good-
performance
REPLY
ZHAO, WENYU March 9, 2021 at 1:19 pm #
Hi Jason!
Thanks for your quick reply! However, I probably did not make myself understood. I was
struggling with how to know the test error of my multivariate lstm model…
REPLY
Jason Brownlee March 10, 2021 at 4:36 am #
You can calculate test error for an LSTM using walk-forward validation described
in this tutorial:
https://machinelearningmastery.com/backtest-machine-learning-models-time-series-
forecasting/
You’re welcome.
Thanks.
REPLY
Aditya March 12, 2021 at 6:35 am #
Hi Jason,
I’m currently working on stock price prediction. As of now, I’ve used historical data of the end-of-day
‘Closing’ prices ONLY as univariate sequences. My aim is further improve the model by giving it more
than just old ‘closing’ prices. I want to give it open, high and low too. From your article, I could
understand that I can achieve this using Multivariate sequences. I have gained so much knowledge
from this and I can make my project even better.
Thanks a lot! I would be really happy if you can give me some tips!
REPLY
Jason Brownlee March 12, 2021 at 7:59 am #
REPLY
Aditya March 12, 2021 at 5:51 pm #
Yes, that’s true. Stock prices cannot be predicted but I’m trying to at least achieve a
fair/legit movement in the chart for the future (at the back of my mind I feel, even this is not
completely achievable).
Should I continue with this project or just be honest with my faculty saying it is not
achievable?
REPLY
Jason Brownlee March 13, 2021 at 5:27 am #
You have the freedom to choose what you work on, I don’t want to interfere with
that!
Hey Jason,
So, I’m using the Multistep Univariate method for my problem. Instead of splitting into
only X, y; I have split the data into X_train, y_train & X_test, y_test. It’s obvious that
X_test will be used to test the model and compare it to y_test for ERROR computing.
For example,
If my y_test looks like [[10],[20],[15],[21]…….[34]]
and my predicted y_test looks like [[11[,[19],[17],[20]…….[34]]
Collect predictions in a list or array and then calculate the error with
predictions vs expected values.
Now I know how to develop a Multivariate Multistep forecasting model for the hourly weather
forecasting task.
But in case we are also given a day ahead “weather guess” dataset, how can I use these guessed
values in a model? do you know any tutorial or blog post?
Leave a Reply
Name (required)
Website
SUBMIT COMMENT
Welcome!
I'm Jason Brownlee PhD
and I help developers get results with machine learning.
Read more
How to Develop Convolutional Neural Network Models for Time Series Forecasting
The Deep Learning for Time Series EBook is where you'll find the Really Good stuff.