A Gentle Introduction to LSTM Autoencoders



by Jason Brownlee on November 5, 2018 in Long Short-Term Memory Networks



Last Updated on August 27, 2020


An LSTM Autoencoder is an implementation of an autoencoder for sequence data using an Encoder-Decoder LSTM architecture.

Once fit, the encoder part of the model can be used to encode or compress sequence data that in turn may be used in data visualizations or as a feature vector input to a supervised learning model.

In this post, you will discover the LSTM Autoencoder model and how to implement it in Python using
Keras.
After reading this post, you will know:
Autoencoders are a type of self-supervised learning model that can learn a compressed representation of input data.
LSTM Autoencoders can learn a compressed representation of sequence data and have been used on video, text, audio, and time series sequence data.
How to develop LSTM Autoencoder models in Python using the Keras deep learning library.

Kick-start your project with my new book Long Short-Term Memory Networks With Python, including
step-by-step tutorials and the Python source code files for all examples.

Let’s get started.

A Gentle Introduction to LSTM Autoencoders.
Photo by Ken Lund, some rights reserved.

Overview
This post is divided into six sections; they are:

1. What Are Autoencoders?
2. A Problem with Sequences
3. Encoder-Decoder LSTM Models
4. What Is an LSTM Autoencoder?
5. Early Application of LSTM Autoencoder
6. How to Create LSTM Autoencoders in Keras
What Are Autoencoders?
An autoencoder is a neural network model that seeks to learn a compressed representation of an input.

They are an unsupervised learning method, although technically, they are trained using supervised
learning methods, referred to as self-supervised. They are typically trained as part of a broader model
that attempts to recreate the input.

For example:

X = model.predict(X)

The design of the autoencoder model purposefully makes this challenging by restricting the architecture to a bottleneck at the midpoint of the model, from which the reconstruction of the input data is performed.

There are many types of autoencoders, and their use varies, but perhaps the most common use is as a learned or automatic feature extraction model.

In this case, once the model is fit, the reconstruction aspect of the model can be discarded and the model up to the point of the bottleneck can be used. The output of the model at the bottleneck is a fixed-length vector that provides a compressed representation of the input data.

Input data from the domain can then be provided to the model and the output of the model at the bottleneck can be used as a feature vector in a supervised learning model, for visualization, or more generally for dimensionality reduction.
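To make the bottleneck idea concrete before turning to sequences, here is a minimal sketch (not from the original post) of a plain autoencoder in Keras, assuming 8 input features compressed to a 3-element code:

# a plain (non-LSTM) autoencoder with a bottleneck, for illustration only
from keras.models import Model
from keras.layers import Input, Dense

visible = Input(shape=(8,))
bottleneck = Dense(3, activation='relu')(visible)    # compressed representation
reconstructed = Dense(8)(bottleneck)                 # recreate the 8 inputs
autoencoder = Model(inputs=visible, outputs=reconstructed)
autoencoder.compile(optimizer='adam', loss='mse')
# autoencoder.fit(X, X, ...) trains the model to recreate its own input;
# afterwards, keep only the encoder half for feature extraction:
encoder = Model(inputs=visible, outputs=bottleneck)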
A Problem with Sequences

Sequence prediction problems are challenging, not least because the length of the input sequence can vary.

This is challenging because machine learning algorithms, and neural networks in particular, are designed to work with fixed-length inputs.

Another challenge with sequence data is that the temporal ordering of the observations can make it challenging to extract features suitable for use as input to supervised learning models, often requiring deep expertise in the domain or in the field of signal processing.
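On the variable-length challenge in particular, one common workaround (which Jason also suggests in the comments below) is to zero-pad all sequences to a common length and add a masking layer so the padded steps are ignored. A minimal sketch, with hypothetical data:

from numpy import array
from keras.preprocessing.sequence import pad_sequences
from keras.models import Sequential
from keras.layers import Masking, LSTM

# two sequences of different lengths (made-up values)
sequences = [[0.1, 0.2, 0.3], [0.1, 0.2, 0.3, 0.4, 0.5]]
# zero-pad at the end to the longest length
padded = pad_sequences(sequences, padding='post', dtype='float32')
padded = padded.reshape((len(sequences), padded.shape[1], 1))

model = Sequential()
# mask the zero-padded time steps so the LSTM skips them
model.add(Masking(mask_value=0.0, input_shape=(padded.shape[1], 1)))
model.add(LSTM(100, activation='relu'))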

Finally, many predictive modeling problems involving sequences require a prediction that itself is also a sequence. These are called sequence-to-sequence, or seq2seq, prediction problems.
You can learn more about sequence prediction problems here:
Making Predictions with Sequences

Encoder-Decoder LSTM Models


Recurrent neural networks, such as the Long Short-Term Memory, or LSTM, network, are specifically designed to support sequences of input data.

They are capable of learning the complex dynamics within the temporal ordering of input sequences as well as using an internal memory to remember information across long input sequences.

The LSTM network can be organized into an architecture called the Encoder-Decoder LSTM that allows the model both to support variable-length input sequences and to predict or output variable-length output sequences.

This architecture is the basis for many advances in complex sequence prediction problems such as
speech recognition and text translation.

In this architecture, an encoder LSTM model reads the input sequence step-by-step. After reading in the entire input sequence, the hidden state or output of this model represents an internal learned representation of the entire input sequence as a fixed-length vector. This vector is then provided as an input to the decoder model that interprets it as each step in the output sequence is generated.

You can learn more about the encoder-decoder architecture here:

Encoder-Decoder Long Short-Term Memory Networks

What Is an LSTM Autoencoder?


An LSTM Autoencoder is an implementation of an autoencoder for sequence data using an Encoder-Decoder LSTM architecture.
For a given dataset of sequences, an encoder-decoder LSTM is configured to read the input sequence, encode it, decode it, and recreate it. The performance of the model is evaluated based on the model’s ability to recreate the input sequence.
Once the model achieves a desired level of performance recreating the sequence, the decoder part of the model may be removed, leaving just the encoder model. This model can then be used to encode input sequences to a fixed-length vector.

The resulting vectors can then be used in a variety of applications, not least as a compressed representation of the sequence as an input to another supervised learning model.

Early Application of LSTM Autoencoder


One of the early and widely cited applications of the LSTM Autoencoder was in the 2015 paper titled “Unsupervised Learning of Video Representations using LSTMs.”

LSTM Autoencoder Model.
Taken from “Unsupervised Learning of Video Representations using LSTMs.”

In the paper, Nitish Srivastava, et al. describe the LSTM Autoencoder as an extension or application of the Encoder-Decoder LSTM.

They use the model with video input data to both reconstruct sequences of frames of video as well as to
predict frames of video, both of which are described as an unsupervised learning task.
The input to the model is a sequence of vectors (image patches or features). The encoder LSTM reads in this sequence. After the last input has been read, the decoder LSTM takes over and outputs a prediction for the target sequence.

— Unsupervised Learning of Video Representations using LSTMs, 2015.

More than simply using the model directly, the authors explore some interesting architecture choices that may help inform future applications of the model.
They designed the model in such a way as to recreate the target sequence of video frames in reverse order, claiming that it makes the optimization problem solved by the model more tractable.

The target sequence is same as the input sequence, but in reverse order. Reversing the target sequence makes the optimization easier because the model can get off the ground by looking at low range correlations.
— Unsupervised Learning of Video Representations using LSTMs, 2015.
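In terms of the Keras reconstruction example later in this post, this reversal would amount to a one-line change to the training target; a hypothetical sketch, not the paper’s code:

# train the decoder to output the input sequence in reverse order;
# sequence has shape [samples, timesteps, features]
seq_rev = sequence[:, ::-1, :]  # flip along the time axis
model.fit(sequence, seq_rev, epochs=300, verbose=0)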

They also explore two approaches to training the decoder model, specifically a version conditioned on the previous output generated by the decoder, and another without any such conditioning.
The decoder can be of two kinds – conditional or unconditioned. A conditional decoder receives the last generated output frame as input […]. An unconditioned decoder does not receive that input.

— Unsupervised Learning of Video Representations using LSTMs, 2015.

A more elaborate autoencoder model was also explored where two decoder models were used for the one encoder: one to predict the next frame in the sequence and one to reconstruct frames in the sequence, referred to as a composite model.

… reconstructing the input and predicting the future can be combined to create a composite
 […]. Here the encoder LSTM is asked to come up with a state from which we can both
predict the next few frames as well as reconstruct the input.

— Unsupervised Learning of Video Representations using LSTMs, 2015.

LSTM Autoencoder Model With Two Decoders.
Taken from “Unsupervised Learning of Video Representations using LSTMs.”

The models were evaluated in many ways, including using the encoder to seed a classifier. It appears that rather than using the output of the encoder as an input for classification, they chose to seed a standalone LSTM classifier with the weights of the encoder model directly. This is surprising given the complication of the implementation.

We initialize an LSTM classifier with the weights learned by the encoder LSTM from this
 model.

— Unsupervised Learning of Video Representations using LSTMs, 2015.
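In Keras terms, this weight-seeding idea might look something like the sketch below; the layer sizes, n_classes, and the fitted autoencoder handle are assumptions for illustration, not taken from the paper:

from keras.models import Sequential
from keras.layers import LSTM, Dense

n_in, n_classes = 9, 10  # hypothetical input length and number of classes
# assume `autoencoder` is a fitted reconstruction model whose first layer
# is an LSTM(100) encoder, as in the examples later in this post
classifier = Sequential()
classifier.add(LSTM(100, activation='relu', input_shape=(n_in, 1)))
classifier.add(Dense(n_classes, activation='softmax'))
# seed the classifier's LSTM with the trained encoder weights
classifier.layers[0].set_weights(autoencoder.layers[0].get_weights())
classifier.compile(optimizer='adam', loss='categorical_crossentropy')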

The composite model without conditioning on the decoder was found to perform the best in their experiments.

The best performing model was the Composite Model that combined an autoencoder and a
 future predictor. The conditional variants did not give any significant improvements in terms
of classification accuracy after fine-tuning, however they did give slightly lower prediction
errors.

— Unsupervised Learning of Video Representations using LSTMs, 2015.
Many other applications of the LSTM Autoencoder have been demonstrated, not least with sequences of text, audio data, and time series.

How to Create LSTM Autoencoders in Keras
Prediction in Keras
Creating an LSTM Autoencoder in Keras can be achieved by implementing an Encoder-Decoder LSTM architecture and configuring the model to recreate the input sequence.

Let’s look at a few examples to make this concrete.

Reconstruction LSTM Autoencoder


The simplest LSTM autoencoder is one that learns to reconstruct each input sequence.

For these demonstrations, we will use a dataset of one sample of nine time steps and one feature:

[0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]

We can start off by defining the sequence and reshaping it into the preferred shape of [samples, timesteps, features].
# define input sequence
sequence = array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9])
# reshape input into [samples, timesteps, features]
n_in = len(sequence)
sequence = sequence.reshape((1, n_in, 1))
Next, we can define the encoder-decoder LSTM architecture that expects input sequences with nine
time steps and one feature and outputs a sequence with nine time steps and one feature.

# define model
model = Sequential()
model.add(LSTM(100, activation='relu', input_shape=(n_in,1)))
model.add(RepeatVector(n_in))
model.add(LSTM(100, activation='relu', return_sequences=True))
model.add(TimeDistributed(Dense(1)))
model.compile(optimizer='adam', loss='mse')

Next, we can fit the model on our contrived dataset.

# fit model
model.fit(sequence, sequence, epochs=300, verbose=0)

The complete example is listed below.

The configuration of the model, such as the number of units and training epochs, was completely
arbitrary.

# lstm autoencoder recreate sequence
from numpy import array
from keras.models import Sequential
from keras.layers import LSTM
from keras.layers import Dense
from keras.layers import RepeatVector
from keras.layers import TimeDistributed
from keras.utils import plot_model
# define input sequence
sequence = array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9])
# reshape input into [samples, timesteps, features]
n_in = len(sequence)
sequence = sequence.reshape((1, n_in, 1))
# define model
model = Sequential()
model.add(LSTM(100, activation='relu', input_shape=(n_in,1)))
model.add(RepeatVector(n_in))
model.add(LSTM(100, activation='relu', return_sequences=True))
model.add(TimeDistributed(Dense(1)))
model.compile(optimizer='adam', loss='mse')
# fit model
model.fit(sequence, sequence, epochs=300, verbose=0)
plot_model(model, show_shapes=True, to_file='reconstruct_lstm_autoencoder.png')
# demonstrate recreation
yhat = model.predict(sequence, verbose=0)
print(yhat[0,:,0])

Running the example fits the autoencoder and prints the reconstructed input sequence.

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.

The results are close enough, with very minor rounding errors.

[0.10398503 0.20047213 0.29905337 0.3989646  0.4994707  0.60005534
 0.70039135 0.80031013 0.8997728 ]
A plot of the architecture is created for reference.
LSTM Autoencoder for Sequence Reconstruction

Prediction LSTM Autoencoder


We can modify the reconstruction LSTM Autoencoder to instead predict the next step in the sequence.

In the case of our small contrived problem, we expect the output to be the sequence:

[0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]

This means that the model will expect each input sequence to have nine time steps and the output
sequence to have eight time steps.

# reshape input into [samples, timesteps, features]
n_in = len(seq_in)
seq_in = seq_in.reshape((1, n_in, 1))
# prepare output sequence
seq_out = seq_in[:, 1:, :]
n_out = n_in - 1

The complete example is listed below.

# lstm autoencoder predict sequence
from numpy import array
from keras.models import Sequential
from keras.layers import LSTM
from keras.layers import Dense
from keras.layers import RepeatVector
from keras.layers import TimeDistributed
from keras.utils import plot_model
# define input sequence
seq_in = array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9])
# reshape input into [samples, timesteps, features]
n_in = len(seq_in)
seq_in = seq_in.reshape((1, n_in, 1))
# prepare output sequence
seq_out = seq_in[:, 1:, :]
n_out = n_in - 1
# define model
model = Sequential()
model.add(LSTM(100, activation='relu', input_shape=(n_in,1)))
model.add(RepeatVector(n_out))
model.add(LSTM(100, activation='relu', return_sequences=True))
model.add(TimeDistributed(Dense(1)))
model.compile(optimizer='adam', loss='mse')
plot_model(model, show_shapes=True, to_file='predict_lstm_autoencoder.png')
# fit model
model.fit(seq_in, seq_out, epochs=300, verbose=0)
# demonstrate prediction
yhat = model.predict(seq_in, verbose=0)
print(yhat[0,:,0])
Start Machine Learning ×
Running the example prints the output sequence that predicts the next time step for each input time step.

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.

We can see that the model is accurate, barring some minor rounding errors.
[0.1657285  0.28903174 0.40304852 0.5096578  0.6104322  0.70671254
 0.7997272  0.8904342 ]

A plot of the architecture is created for reference.


LSTM Autoencoder for Sequence Prediction

Composite LSTM Autoencoder


Finally, we can create a composite LSTM Autoencoder that has a single encoder and two decoders, one for reconstruction and one for prediction.

We can implement this multi-output model in Keras using the functional API. You can learn more about the functional API in this post:

How to Use the Keras Functional API for Deep Learning

First, the encoder is defined.


# define encoder
visible = Input(shape=(n_in,1))
encoder = LSTM(100, activation='relu')(visible)

Then the first decoder that is used for reconstruction.

# define reconstruct decoder
decoder1 = RepeatVector(n_in)(encoder)
decoder1 = LSTM(100, activation='relu', return_sequences=True)(decoder1)
decoder1 = TimeDistributed(Dense(1))(decoder1)

Then the second decoder that is used for prediction.

# define predict decoder
decoder2 = RepeatVector(n_out)(encoder)
decoder2 = LSTM(100, activation='relu', return_sequences=True)(decoder2)
decoder2 = TimeDistributed(Dense(1))(decoder2)

We then tie the whole model together.

# tie it together
model = Model(inputs=visible, outputs=[decoder1, decoder2])
The complete example is listed below.
# lstm autoencoder reconstruct and predict sequence
from numpy import array
from keras.models import Model
from keras.layers import Input
from keras.layers import LSTM
from keras.layers import Dense
from keras.layers import RepeatVector
from keras.layers import TimeDistributed
from keras.utils import plot_model
# define input sequence
seq_in = array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9])
# reshape input into [samples, timesteps, features]
n_in = len(seq_in)
seq_in = seq_in.reshape((1, n_in, 1))
# prepare output sequence
seq_out = seq_in[:, 1:, :]
n_out = n_in - 1
# define encoder
visible = Input(shape=(n_in,1))
encoder = LSTM(100, activation='relu')(visible)
# define reconstruct decoder
decoder1 = RepeatVector(n_in)(encoder)
decoder1 = LSTM(100, activation='relu', return_sequences=True)(decoder1)
decoder1 = TimeDistributed(Dense(1))(decoder1)
# define predict decoder
decoder2 = RepeatVector(n_out)(encoder)
decoder2 = LSTM(100, activation='relu', return_sequences=True)(decoder2)
decoder2 = TimeDistributed(Dense(1))(decoder2)
# tie it together
model = Model(inputs=visible, outputs=[decoder1, decoder2])
model.compile(optimizer='adam', loss='mse')
plot_model(model, show_shapes=True, to_file='composite_lstm_autoencoder.png')
# fit model
model.fit(seq_in, [seq_in,seq_out], epochs=300, verbose=0)
# demonstrate prediction
yhat = model.predict(seq_in, verbose=0)
print(yhat)

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.

Running the example both reconstructs and predicts the output sequence, using both decoders.

[array([[[0.10736275],
        [0.20335874],
        [0.30020815],
        [0.3983948 ],
        [0.4985725 ],
        [0.5998295 ],
        [0.700336  ],
        [0.8001949 ],
        [0.89984304]]], dtype=float32),

 array([[[0.16298929],
        [0.28785267],
        [0.4030449 ],
        [0.5104638 ],
        [0.61162543],
        [0.70776784],
        [0.79992455],
        [0.8889787 ]]], dtype=float32)]
Picked for you:
A plot of the architecture is created for reference.
Composite LSTM Autoencoder for Sequence Reconstruction and Prediction

Keep Standalone LSTM Encoder


Regardless of the method chosen (reconstruction, prediction, or composite), once the autoencoder has been fit, the decoder can be removed and the encoder can be kept as a standalone model.

The encoder can then be used to transform input sequences to a fixed-length encoded vector.

We can do this by creating a new model that has the same inputs as our original model, and outputs directly from the end of the encoder model, before the RepeatVector layer.

# connect the encoder LSTM as the output layer
model = Model(inputs=model.inputs, outputs=model.layers[0].output)

A complete example of doing this with the reconstruction LSTM autoencoder is listed below.

# lstm autoencoder recreate sequence
from numpy import array
from keras.models import Sequential
from keras.models import Model
from keras.layers import LSTM
from keras.layers import Dense
from keras.layers import RepeatVector
from keras.layers import TimeDistributed
from keras.utils import plot_model
# define input sequence
sequence = array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9])
# reshape input into [samples, timesteps, features]
n_in = len(sequence)
sequence = sequence.reshape((1, n_in, 1))
# define model
model = Sequential()
model.add(LSTM(100, activation='relu', input_shape=(n_in,1)))
model.add(RepeatVector(n_in))
model.add(LSTM(100, activation='relu', return_sequences=True))
model.add(TimeDistributed(Dense(1)))
model.compile(optimizer='adam', loss='mse')
# fit model
model.fit(sequence, sequence, epochs=300, verbose=0)
# connect the encoder LSTM as the output layer
model = Model(inputs=model.inputs, outputs=model.layers[0].output)
plot_model(model, show_shapes=True, to_file='lstm_encoder.png')
# get the feature vector for the input sequence
yhat = model.predict(sequence)
print(yhat.shape)
print(yhat)

Running the example creates a standalone encoder model that could be used or saved for later use.

We demonstrate the encoder by predicting the sequence and getting back the 100-element output of the encoder.
Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.

Obviously, this is overkill for our tiny nine-step input sequence.


[[0.03625513 0.04107533 0.10737951 0.02468692 0.06771207 0.
  0.0696108  0.         0.         0.0688471  0.         0.
  0.         0.         0.         0.         0.         0.03871286
  0.         0.         0.05252134 0.         0.07473809 0.02688836
  0.         0.         0.         0.         0.         0.0460703
  0.         0.         0.05190025 0.         0.         0.11807001
  0.         0.         0.         0.         0.         0.
  0.         0.14514188 0.         0.         0.         0.
  0.02029926 0.02952124 0.         0.         0.         0.
  0.         0.08357017 0.08418129 0.         0.         0.
  0.         0.         0.09802645 0.07694854 0.         0.03605933
  0.         0.06378153 0.         0.05267526 0.02744672 0.
  0.06623861 0.         0.         0.         0.08133873 0.09208347
  0.03379713 0.         0.         0.         0.07517676 0.08870222
  0.         0.         0.         0.         0.03976351 0.09128518
  0.08123557 0.         0.08983088 0.0886112  0.         0.03840019
  0.00616016 0.0620428  0.         0.        ]]

A plot of the architecture is created for reference.

Standalone Encoder LSTM Model
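As a closing note on this example, the standalone encoder can be saved for later use and its output fed to any downstream supervised learner; a minimal sketch (file name hypothetical):

# persist the standalone encoder for later use
model.save('lstm_encoder.h5')
# the 100-element output can serve as a feature vector elsewhere
features = model.predict(sequence)  # shape (1, 100)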

Further Reading

This section provides more resources on the topic if you are looking to go deeper.

Making Predictions with Sequences
Encoder-Decoder Long Short-Term Memory Networks
Autoencoder, Wikipedia
Unsupervised Learning of Video Representations using LSTMs, ArXiv, 2015.
Unsupervised Learning of Video Representations using LSTMs, PMLR, PDF, 2015.
Unsupervised Learning of Video Representations using LSTMs, GitHub Repository.
Building Autoencoders in Keras, 2016.
How to Use the Keras Functional API for Deep Learning
Summary
In this post, you discovered the LSTM Autoencoder model and how to implement it in Python using Keras.

Specifically, you learned:

Autoencoders are a type of self-supervised learning model that can learn a compressed representation of input data.
LSTM Autoencoders can learn a compressed representation of sequence data and have been used on video, text, audio, and time series sequence data.
How to develop LSTM Autoencoder models in Python using the Keras deep learning library.
Do you have any questions?


Ask your questions in the comments below and I will do my best to answer.

Develop LSTMs for Sequence Prediction Today!

Develop Your Own LSTM models in Minutes

...with just a few lines of python code

Discover how in my new Ebook:
Long Short-Term Memory Networks with Python

It provides self-study tutorials on topics like:
CNN LSTMs, Encoder-Decoder LSTMs, generative models, data preparation, making predictions and much more...

Finally Bring LSTM Recurrent Neural Networks to Your Sequence Predictions Projects

Skip the Academics. Just Results.

SEE WHAT'S INSIDE


About Jason Brownlee


Jason Brownlee, PhD is a machine learning specialist who teaches developers how to get results
with modern machine learning methods via hands-on tutorials.
View all posts by Jason Brownlee →

261 Responses to A Gentle Introduction to LSTM Autoencoders
samaksh kumar November 5, 2018 at 11:35 am #

Nice Explained……….

Jason Brownlee November 5, 2018 at 2:27 pm #

Thanks.

ramar October 11, 2019 at 8:31 pm #

Very Good and Detailed representation of LSTM.

I have a csv file which contains 3000 values, when i run it in Google colab or jupyter notebook it was much slow What may be the reason?

Jason Brownlee October 12, 2019 at 6:56 am #

Thanks.

Perhaps try running on a faster machine, like EC2?
Perhaps try using a more efficient implementation?
Perhaps try using less training data?

Ali Alwehaibi November 6, 2018 at 8:16 am #

Thanks for the great posts! I have learn a lot from them.
Can this approach for classification problems such as sentiment analysis?

Jason Brownlee November 6, 2018 at 2:16 pm #

Perhaps.

TJ Chen November 7, 2018 at 1:21 pm #

Hi Jason,
Thanks for the posts, I really enjoy reading this.
I’m trying to use this method to do time series data anomaly detection and I got few questions here:
When you reshape the sequence into [samples, timesteps, features], samples and features always equal to 1. What is the guidance to choose the value here? If the input sequences have variable length, how to set timesteps, always choose max length?

Also, if the input is two dimension tabular data with each row has different length, how will you do the reshape or normalization?
Thanks in advance!
A Gentle Introduction to LSTM
Autoencoders
REPLY 
Jason Brownlee November 7, 2018 at 2:49 pm #

The time steps should provide enough history to make a prediction, the features are the
Loving the Tutorials?
observations recorded at each time step.
The LSTMs with Python EBook is
More on preparing data for LSTMs here:
where you'll find the Really Good stuff.
https://machinelearningmastery.com/faq/single-faq/how-do-i-prepare-my-data-for-an-lstm

>> SEE WHAT'S INSIDE

Soheila November 8, 2018 at 5:52 pm #

Hi,
I am wondering why the output of encoder has a much higher dimension(100), since we usually use encoders to create lower dimensions!

Could you please bring examples if I am wrong?

And what about variable length of samples? You keep saying that LSTM is useful for variable length. So how does it deal with a training set like:

dataX[0] = [1,2,3,4]
dataX[1] = [2,5,7,8,4]
dataX[2] = [0,3]

I am really confused with my second question and I’d be very thankful for your help!

Jason Brownlee November 9, 2018 at 5:19 am #

The model reproduces the output, e.g. a 1D vector with 9 elements.

You can pad the variable length inputs with 0 and use a masking layer to ignore the padded values.

SB December 25, 2018 at 5:31 pm #

“I am wondering why the output of encoder has a much higher dimension(100), since we usually use encoders to create lower dimensions!”, I have the same question, can you please explain more?
Jason Brownlee December 26, 2018 at 6:40 am #

It is a demonstration of the architecture only, feel free to change the model configuration for your specific problem.

J.V May 14, 2019 at 7:10 am #

Great article. But reading through it I thought you were tackling the most important problem with sequences – that is they have variable lengths. Turns out it wasn’t. Any chance you could write a tutorial on using a mask to neutralise the padded value? This seems to be more difficult than the rest of the model.
Jason Brownlee May 14, 2019 at 7:54 am #

Yes, I believe I have many tutorials on the topic.

Perhaps start here:
https://machinelearningmastery.com/handle-missing-timesteps-sequence-prediction-problems-python/

Ufanc November 9, 2018 at 6:17 pm #

I really likes your posts and they are important.I got a lot of knowledge from your post.
Today, am going to ask your help. I am doing research on local music classifications. the key features of the music is it sequence and it uses five keys out of the seven keys, we call it scale.

1. C – E – F – G – B. This is a major 3rd, minor 2nd, major 2nd, major 3rd, and minor 2nd
2. C – Db – F – G – Ab. This is a minor 2nd, major 3rd, major 2nd, minor 2nd, and major 3rd.
3. C – Db – F – Gb – A. This is a minor 2nd, major 3rd, minor 2nd, minor 3rd, and a minor 3rd.
4. C – D – E – G – A. This is a major 2nd, major 2nd, minor 3rd, major 2nd, and a minor 3rd

it is not dependent on range, rythm, melody and other features.

This key has to be in order. Otherwise it will be out of scale.

So, which tools /algorithm do i need to use for my research purpose and also any sampling mechanism to take 30 sec sample music from each track without affecting the sequence of the keys ?

Regards

Jason Brownlee November 10, 2018 at 5:59 am #

Perhaps try a suite of models and discover what works best for your specific dataset.

More here:
https://machinelearningmastery.com/faq/single-faq/what-algorithm-config-should-i-use

REPLY 
lungen
How to Use November 9, 2018 at 7:43
the TimeDistributed pm #in
Layer
Email Address
Keras
Hi, can you please explain the use of repeat vector between encoder and decoder?
Encoder is encoding 1-feature time-series into fixed length 100 vector. In my understanding, decoder
START MY EMAIL COURSE
shouldAtake thisIntroduction
Gentle 100-lengthtovector
LSTMand transform it into 1-feature time-series.
So, encoder is like
Autoencoders many-to-one lstm, and decoder is one-to-many (even though that ‘one’ is a vector of
length 100). Is this understanding correct?

Jason Brownlee November 10, 2018 at 6:01 am #

The RepeatVector repeats the internal representation of the input n times for the number of required output steps.
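For example, the shape transformation can be checked directly; a minimal sketch:

from keras.models import Sequential
from keras.layers import Dense, RepeatVector

model = Sequential()
model.add(Dense(100, input_shape=(5,)))  # 2D output: (None, 100)
model.add(RepeatVector(9))               # repeated for 9 output steps
print(model.output_shape)                # (None, 9, 100)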

Abraham May 10, 2019 at 4:02 am #

Hi Jason?

What is the intuition behind “representing of the input n times for the number of required output steps?” Here n times denotes, let say as in simple LSTM AE, 9 i.e. output step number.
I understand from repeatvector that here sequence are being read and transformed into a single vector(9×100) which is the same 100 dim vector, then the model uses that vector to reconstruct the original sequence. Is it right?
What about using any number except for 9 for the number of required output steps?
Thanks from now on.

Jason Brownlee May 10, 2019 at 8:19 am #

To provide input for the LSTM on each output time step for one sample.

rekha November 10, 2018 at 4:10 am #

Which model is most suited for stock market prediction

Jason Brownlee November 10, 2018 at 6:10 am #

None, a time series of prices is a random walk as far as I’ve read.

More here:
https://machinelearningmastery.com/faq/single-faq/can-you-help-me-with-machine-learning-for-finance-or-the-stock-market

MJ November 10, 2018 at 4:41 pm #

Hi,
thanks for the instructive post!

I am trying to repeat your first example (Reconstruction LSTM Autoencoder) using a different syntax of Keras; here is the code:

import numpy as np
from keras.layers import Input, LSTM, RepeatVector
from keras.models import Model

timesteps = 9
input_dim = 1
latent_dim = 100

# input placeholder
inputs = Input(shape=(timesteps, input_dim))

# “encoded” is the encoded representation of the input
encoded = LSTM(latent_dim, activation=’relu’)(inputs)

# “decoded” is the lossy reconstruction of the input
decoded = RepeatVector(timesteps)(encoded)
decoded = LSTM(input_dim, activation=’relu’, return_sequences=True)(decoded)

sequence_autoencoder = Model(inputs, decoded)
encoder = Model(inputs, encoded)

# compile model
sequence_autoencoder.compile(optimizer=’adadelta’, loss=’mse’)

# run model
sequence_autoencoder.fit(sequence, sequence, epochs=300, verbose=0)

# prediction
sequence_autoencoder.predict(sequence, verbose=0)

I did not know why, but I always get a poor result than the model using your code.
So my question is: is there any difference between the two method (syntax) under the hood? or they are actually the same ?
Thanks.
Jason Brownlee November 11, 2018 at 5:58 am #

If you have trouble with the code in the tutorial, confirm that your version of Keras is 2.2.4 or higher and TensorFlow is up to date.
J Hogue November 29, 2018 at 5:01 am #

I feel like a bit more description could go into how to setup the LSTM autoencoder. Particularly how to tune the bottleneck. Right now when I apply this to my data its basically just returning the mean for everything, which suggests the its too aggressive but I’m not clear on where to change things.
Jason Brownlee November 29, 2018 at 7:48 am #

Thanks for the suggestion.
Dimitre Oliveira December 10, 2018 at 11:59 am #
Hi Jason, thanks for the wonderful article, I took some time and wrote a kernel on Kaggle
inspired by your content, showing regular time-series approach using LSTM and another one using a
MLP but with features encoded by and LSTM autoencoder, as shown here, for anyone interested here’s
the link: https://www.kaggle.com/dimitreoliveira/time-series-forecasting-with-lstm-autoencoders

I would love some feedback.


Jason Brownlee December 10, 2018 at 2:17 pm #

Well done!

Simranjit Singh December 27, 2018 at 10:53 pm #

Hey! I am trying to compact the data single row of 217 rows. After running the program it is
returning nan values for prediction Can you guide me where did i do wrong?
Jason Brownlee December 28, 2018 at 5:57 am #

I have some advice here:


https://machinelearningmastery.com/faq/single-faq/can-you-read-review-or-debug-my-code


Net December 31, 2018 at 9:32 pm #

Dear Jason
After building and training the model above, how to evaluate the model? (like model.evaluate(train_x, train_y…) in common LSTM)?
Thanks a lot
Jason Brownlee January 1, 2019 at 6:15 am #

The model is evaluated by its ability to reconstruct the input. You can use the evaluate function or perform the evaluation of the predictions manually.
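For instance, with the reconstruction model defined earlier in the post; a minimal sketch:

# reconstruction MSE of the fitted autoencoder on its own input
loss = model.evaluate(sequence, sequence, verbose=0)
print(loss)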
Andy Hung January 2, 2019 at 6:28 am #

I have learned a lot from your website. Autoencoder can be used as dimension reduction. Is it possible to merge multiple time-series inputs into one using RNN autoencoder? My data shape is (9500, 20, 5) => (sample size, time steps, features). How to encode-decode into (9500, 20, 1)?
Thank you very much,

Jason Brownlee January 2, 2019 at 6:44 am #

Perhaps, that may require some careful design. It might be easier to combine all data to a single input.

Andy Hung January 3, 2019 at 3:47 am #

Thank you for your reply. Will (9500,100,1) => (9500,20,1) be easier?

Jason Brownlee January 3, 2019 at 6:14 am #

Perhaps test it and see.


Jimmy Joe January 5, 2019 at 9:41 am #

Hi Jason,
I’m a regular reader of your website, I learned a lot from your posts and books!
This one is also very informative, but there’s one thing I can’t fully understand: if the encoder input is [0.1, 0.2, …, 0.9] and the expected decoder output is [0.2, 0.3, …, 0.9], that’s basically a part of the input sequence. I’m not sure why you say it’s “predicting next step for each input step”. Could you please explain? Is an autoencoder a good fit for multi-step time series prediction?
Another question: does training the composite autoencoder imply that the error is averaged for both expected outputs ([seq_in, seq_out])?
Jason Brownlee January 6, 2019 at 10:15 am #
I am demonstrating two ways to learn the encoding, by reproducing the input and by
predicting the next step in the output.
Remember, the model outputs a single step at a time in order to construct a sequence.

Good question, I assume the reported error is averaged over both outputs. I’m not sure.

Junetae Kim January 27, 2019 at 4:12 pm #

Hi, I am JT.
First of all, thanks for your post that provides an excellent explanation of the concept of LSTM AE models and codes.
If I understand your AE model correctly, features from your LSTM AE vector layer [shape (,100)] does not seem to be time dependent.

So, I have tried to build a time-dependent AE layer by modifying your codes.

Could you check my codes whether my codes are correct to build an AE model that include a time-wise AE layer, if you don’t mind?

My codes are below.

from numpy import array
from keras.models import Model
from keras.layers import Input
from keras.layers import LSTM
from keras.layers import Dense
from keras.layers import RepeatVector
from keras.layers import TimeDistributed

## Data generation
# define input sequence
seq_in = array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9])
# reshape input into [samples, timesteps, features]
n_in = len(seq_in)
seq_in = seq_in.reshape((1, n_in, 1))
# prepare output sequence
seq_out = array([3, 5, 7, 9, 11, 13, 15, 17, 19])
seq_out = seq_out.reshape((1, n_in, 1))

## Model specification
# define encoder
visible = Input(shape=(n_in,1))
encoder = LSTM(60, activation=’relu’, return_sequences=True)(visible)

# AE Vector
AEV = LSTM(30, activation=’relu’, return_sequences=True)(encoder)

# define reconstruct decoder
decoder1 = LSTM(60, activation=’relu’, return_sequences=True)(AEV)
decoder1 = TimeDistributed(Dense(1))(decoder1)

# define predict decoder
decoder2 = LSTM(30, activation=’relu’, return_sequences=True)(AEV)
decoder2 = TimeDistributed(Dense(1))(decoder2)

# tie it together
model = Model(inputs=visible, outputs=[decoder1, decoder2])
model.summary()
model.compile(optimizer=’adam’, loss=’mse’)

# fit model
model.fit(seq_in, [seq_in,seq_out], epochs=2000, verbose=2)

## The model that feeds seq_in to predict seq_out
hat1 = model.predict(seq_in)

## The model that feeds seq_in to predict AE Vector values
model2 = Model(inputs=model.inputs, outputs=model.layers[2].output)
hat_ae = model2.predict(seq_in)

## The model that feeds AE Vector values to predict seq_out
input_vec = Input(shape=(n_in,30))
dec2 = model.layers[4](input_vec)
dec2 = model.layers[6](dec2)
model3 = Model(inputs=input_vec, outputs=dec2)
hat_ = model3.predict(hat_ae)

Thank you very much

Jason Brownlee January 28, 2019 at 7:11 am #

I’m happy to answer questions, but I don’t have the capacity to review and debug your
code, sorry.

Anirban Ray January 28, 2019 at 5:33 pm #

Thanks for the nice post. Being a beginner in machine learning, your posts are really helpful.

I want to build an auto-encoder for data-set of names of a large number of people. I want to encode the entire field instead of doing it character-wise, for example [“Neil Armstrong”] instead of [“N”, “e”, “i”, “l”, ” “, “A”, “r”, “m”, “s”, “t”, “r”, “o”, “n”, “g”] or [“Neil”, “Armstrong”]. How can I do it?

Jason Brownlee January 29, 2019 at 6:09 am #

Wrap your list of strings in one more list/array.
How to Develop an Encoder-Decoder You can master applied Machine Learning
Model with Attention in 15,
Keras REPLY 
Benjamin February 2019 at 11:05 am # without math or fancy degrees.
Find out how in this free and practical course.
Hey, thanks for the post, I have found it helpful… although I am confused about one, in my opinion, major point.

– If autoencoders are used to obtain a compressed representation of the input, what is the purpose of taking the output after the encoder part if it is now 100 elements instead of 9? I’m struggling to find a meaning of the 100-element data and how one could use this 100-element data to predict anomalies. It sort of seems like doing the exact opposite of what was stated in the explanation prior to the example. An explanation would be greatly appreciated.

– In the end I’m trying to really understand how, after learning the weights by minimizing the reconstruction error of the training set using the AE, to then use this trained model to predict anomalies in the cross-validation and test sets.
Jason Brownlee February 15, 2019 at 2:22 pm #

It is just a demonstration, perhaps I could have given a better example.

For example, you could scale up the input to be 1,000 inputs that are summarized with 10 or 100 values.

Ahmad May 19, 2019 at 10:22 pm #

Hi Jason, Benjamin is right. The last example you provided uses a standalone LSTM encoder. The input sequence is 9 elements, but the output of the encoder is 100 elements, despite the first part of the tutorial explaining that the encoder part compresses the input sequence and can be used as a feature vector. I am also confused about how an output of 100 elements can be used as a feature representation of 9 input elements. A more detailed explanation would help. Thank you!

Jason Brownlee May 20, 2019 at 6:29 am #

Thanks for the suggestion.


Cloudy February 17, 2019 at 7:51 pm #

Thank you for your great post.


As you mentioned in the first section, “Once fit, the encoder part of the model can be used to encode or compress sequence data that in turn may be used as a feature vector input to a supervised learning model”. I fed the feature vector (encoder part), with n_dimensions=50, to a feedforward neural network with 1 hidden layer:
def get_model(n_dimensions):
    inputs = Input(shape=(timesteps, input_dim))
    encoded = LSTM(n_dimensions, return_sequences=False, name="encoder")(inputs)
    decoded = RepeatVector(timesteps)(encoded)
    decoded = LSTM(input_dim, return_sequences=True, name='decoder')(decoded)
    decoded = TimeDistributed(Dense(features_n))(decoded)

    autoencoder = Model(inputs, decoded)
    encoder = Model(inputs, encoded)
    mid = Dense(num_units, activation='relu')(encoded)  # FFNN (1 in, 1 hid, 1 out)
    out = Dense(num_classes, activation='softmax')(mid)  # FFNN
    full_model = Model(inputs, out)
    return autoencoder, encoder, full_model

autoencoder, encoder, full_model = get_model(n_dimensions)
history = autoencoder.fit(train_x, train_x, batch_size=100, epochs=epochs, validation_data=(val_x, val_y))
train_encoded = encoder.predict(train_x)
val_encoded = encoder.predict(train_x)
history_class = full_model.fit(train_encoded, train_y, epochs=3, batch_size=256, validation_data=(val_encoded, val_y))

Error when fit(): ValueError: Error when checking input: expected input_1 to have 3 dimensions, but got array with shape (789545, 50).

I mix the autoencoder with an FFNN. Is my method right? Can you help me shape the feature vector before feeding it to the FFNN?

Jason Brownlee February 18, 2019 at 6:30 am #
Change the LSTM to not return sequences in the decoder.
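For example, a minimal sketch of one way to avoid the shape mismatch (an assumption about the intent, with hypothetical sizes): give the classifier its own 2-D input so it can consume the encoder's output directly:

from keras.models import Model
from keras.layers import Input, Dense

n_dimensions, num_units, num_classes = 50, 32, 10  # hypothetical sizes

# classifier that consumes the 2-D encoded vectors, shape (samples, n_dimensions)
enc_in = Input(shape=(n_dimensions,))
mid = Dense(num_units, activation='relu')(enc_in)
out = Dense(num_classes, activation='softmax')(mid)
classifier = Model(enc_in, out)
classifier.compile(loss='categorical_crossentropy', optimizer='adam')

# classifier.fit(encoder.predict(train_x), train_y, ...)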

Cloudy February 19, 2019 at 6:14 pm #

Thanks for your quick reply.


In your post, the LSTM layer at the decoder is set with “return_sequences=True”, and I followed that and then got the error you saw. Actually, I thought the decoder is not a stacked LSTM (only 1 LSTM layer), so “return_sequences=False” would be suitable. I changed it as you recommended. Another error:
decoded = TimeDistributed(Dense(features_n))(decoded)
File “/usr/local/lib/python3.4/dist-packages/keras/engine/topology.py”, line 592, in __call__
self.build(input_shapes[0])
File “/usr/local/lib/python3.4/dist-packages/keras/layers/wrappers.py”, line 164, in build
assert len(input_shape) >= 3
AssertionError
Can you give me any advice? Thank you.

Jason Brownlee February 20, 2019 at 7:52 am #

I’m not sure about this error, sorry. Perhaps post the code and error to StackOverflow, or try debugging?
Cloudy February 22, 2019 at 7:45 pm #
Hi Jason,
I found another way to build full_model. I don’t use autoencoder.predict(train_x) as the input to full_model. I used the original inputs, saved the weights of the encoder part of the autoencoder model, then set those weights on the encoder part of full_model. Something like this:
autoencoder.save_weights('autoencoder.h5')
for l1, l2 in zip(full_model.layers[:a], autoencoder.layers[0:a]):  # a: the number of layers in the encoder part
    l1.set_weights(l2.get_weights())
# train full_model:
full_model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
history_class = full_model.fit(train_x, train_y, epochs=2, batch_size=256, validation_data=(val_x, val_y))
My full_model runs, but the results are so bad. Hic, train 0%, test/val: 100%.


Jason Brownlee February 23, 2019 at 6:31 am #

Interesting, sounds like more debugging might be required.


Anshuman Singh Bhadauria February 21, 2019 at 12:20 am #

Hi Jason,

Thank you for putting in the effort of writing the posts, they are very helpful.

Is it possible to learn representations of multiple time series at the same time? By multiple time-series I
don’t mean multivariate.

For example, if I have time series data from 10 sensors, how can I feed them simultaneously to obtain 10
representations, not a combined one.


Best,
Anshuman

Jason Brownlee February 21, 2019 at 8:13 am #
Yes, each series would be a different sample used to train one model.
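For example, a minimal sketch of that framing (hypothetical shapes): ten sensors, each a univariate series, become ten samples for one model:

from numpy import random

series = random.rand(10, 1000)  # hypothetical: 10 sensors, 1000 steps each
# reshape to [samples, timesteps, features]: each sensor's series is one sample
X = series.reshape((10, 1000, 1))
# encoding each sample afterwards yields one representation per sensor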
JasOlean March 5, 2019 at 9:42 pm #
Hi Jason,

I use the MinMaxScaler function to normalize my training and testing data.
After that I did some training to get a model.h5 file.
And then I use this file to predict on my testing data.

After that I got some prediction results in the range (0,1).

I reverse back to my original data using the inverse_transform function from MinMaxScaler.
But when I compare my original data (before scaling) with my prediction data, the x,y coordinates are swapped, like this:

Ori_data = [7.6291,112.74,43.232,96.636,61.033,87.311,91.55,115.28,121.22,136.48,119.52,80.53,172.08,77.987,199.21,94.94,228.03,110.2,117.83,104.26,174.62,103.42,211.92,109.35,204.29,122.91,114.44,125.46,168.69,124.61,194.97,134.78,173.77,141.56,104.26,144.11,125.46,166.99,143.26,185.64,165.3,205.14]

Predicted_data = [94.290375, 220.07372, 112.91617, 177.89548, 133.5322, 149.65489, 161.85602, 99.74797, 178.18903, 60.718987, 86.012276, 113.3682, 111.641655, 90.18026, 134.16464, 82.28861, 155.12575, 78.26058, 99.82883, 145.162, 98.78825, 98.62861, 130.25414, 62.43494, 143.52762, 74.574684, 99.36809, 169.79303, 107.395615, 131.40468, 124.29078, 114.974014, 135.11014, 107.4492, 90.64477, 188.39305, 121.55309, 174.63484, 138.58575, 167.6933, 144.91512, 162.34071]

When I visualize these prediction data on my image, the direction is changed by 90 degrees (i.e. the original data is horizontal but the prediction data is vertical).
Why do I face this and how can I fix it?

Jason Brownlee March 6, 2019 at 7:54 am #

You must ensure that the columns match when calling transform() and inverse_transform().

See this tutorial:


https://machinelearningmastery.com/machine-learning-data-transforms-for-time-series-forecasting/
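As an illustration, a minimal sketch (hypothetical two-column data, not from the post) of keeping the column order consistent in both directions:

from numpy import array
from sklearn.preprocessing import MinMaxScaler

# hypothetical (x, y) coordinate pairs
data = array([[7.6, 112.7], [43.2, 96.6], [61.0, 87.3]])

scaler = MinMaxScaler()
scaled = scaler.fit_transform(data)  # fit on columns in [x, y] order

# stand-in for model predictions, still in [x, y] order
predictions = scaled

# inverse_transform assumes the same column order used during fit;
# swapping x and y here is what makes the points look rotated by 90 degrees
restored = scaler.inverse_transform(predictions)
print(restored)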



saria March 11, 2019 at 6:09 am #

Hi Jason,

Thank you so much for your great post. I wish you had done this with a real data set like the 20 Newsgroups data set.
At first it is not clear what the different ways of preparing the data for different objectives are.
My understanding is that with an LSTM Autoencoder we can prepare data in different ways based on the goal. Am I correct?
Or can you please give me a link that prepares text data like 20 Newsgroups for this kind of model?
Again, thanks for your awesome material.
Jason Brownlee March 11, 2019 at 6:58 am #
If you are working with text data, perhaps start here:
https://machinelearningmastery.com/start-here/#nlp



saria March 11, 2019 at 3:06 pm #
Thank you so much Jason for the link. I have already gone through lots of material, including, in detail, the mini course at the mentioned link months ago.
My problem is mainly the label data here.
For example, in your code, in the reconstruction part, you have given the sequence as both data and label. However, in the prediction part you have given seq_in and seq_out as the data and the label, and their difference is that seq_out looks one timestep forward.
My question, according to your example, is: if I want to use this LSTM autoencoder for the purpose of topic modeling, do I need to follow the reconstruction part, since I don’t need any prediction?



saria March 12, 2019 at 1:31 am #

I think I got my answer. Thanks Jason

Jason Brownlee March 12, 2019 at 6:43 am #

No. Perhaps this model would be more useful as a starting point:


https://machinelearningmastery.com/develop-word-embedding-model-predicting-movie-review-sentiment/



saria March 11, 2019 at 6:35 am #

By different objectives I meant, for example, if we use this architecture for topic modeling, or sequence generation, or …, should the data preparation be different?



Mingkuan Wu March 14, 2019 at 9:27 am #

Thanks for your post! When you use RepeatVectors(), I guess you are using the unconditional
decoder, am I right?
Jason Brownlee March 14, 2019 at 9:32 am #
I guess so. Are you referring to a specific model in comparison?
rekha March 18, 2019 at 4:47 am #

Thanks for the post. Can this be formulated as a sequence prediction research problem?
Jason Brownlee March 18, 2019 at 6:08 am #

The demonstration is a sequence prediction problem.

Kristian March 20, 2019 at 11:32 pm #

Hi Jason,

What a fantastic tutorial!

I have a question about the loss function used in the composite model.
Say you have different loss functions for the reconstruction and the prediction/classification parts, and pre-train the reconstruction part.

In Keras, would it be possible to combine these two loss functions into one when training the model, such that the model does not lose or diminish its reconstruction ability while training the prediction/classification part?

If so, could you please point me in the right direction.

Kind regards

Kristian



Jason Brownlee March 21, 2019 at 8:16 am #

Yes, great question!

You can specify a list of loss functions to use for each output of the network.
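For example, a minimal sketch (hypothetical data and layer sizes) of a composite model compiled with one loss per output; loss_weights balances the objectives so one head does not dominate:

from numpy import array
from keras.models import Model
from keras.layers import Input, LSTM, Dense, RepeatVector, TimeDistributed

n_in = 9
seq_in = array([0.1 * i for i in range(1, n_in + 1)]).reshape((1, n_in, 1))
labels = array([[1.0]])  # hypothetical classification target

visible = Input(shape=(n_in, 1))
encoder = LSTM(100, activation='relu')(visible)

# reconstruction head
dec1 = RepeatVector(n_in)(encoder)
dec1 = LSTM(100, activation='relu', return_sequences=True)(dec1)
dec1 = TimeDistributed(Dense(1))(dec1)

# classification head
dec2 = Dense(1, activation='sigmoid')(encoder)

model = Model(inputs=visible, outputs=[dec1, dec2])

# one loss per output, in the order the outputs were defined
model.compile(optimizer='adam',
              loss=['mse', 'binary_crossentropy'],
              loss_weights=[1.0, 0.5])
model.fit(seq_in, [seq_in, labels], epochs=10, verbose=0)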
El March 26, 2019 at 4:54 am #

Dear,
How to Develop an Encoder-Decoder
Would Model
it makeforsense to set statefull = true on the LSTMs layers of an encoder decoder?
Sequence-to-Sequence

Thanks
Prediction in Keras
Jason Brownlee March 26, 2019 at 8:12 am #

It really depends on whether you want control over when the internal state is reset, or not.
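For example, a minimal sketch of taking that control (hypothetical data and sizes): with stateful=True, state persists across batches until you reset it yourself:

from numpy import random
from keras.models import Sequential
from keras.layers import LSTM, Dense

X = random.rand(8, 9, 1)  # hypothetical sequences
y = random.rand(8, 1)

# stateful LSTMs require a fixed batch size
model = Sequential()
model.add(LSTM(32, stateful=True, batch_input_shape=(1, 9, 1)))
model.add(Dense(1))
model.compile(optimizer='adam', loss='mse')

for epoch in range(5):
    model.fit(X, y, epochs=1, batch_size=1, shuffle=False, verbose=0)
    model.reset_states()  # you decide when the internal state is cleared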



saria March 28, 2019 at 3:34 am #
Thank you, Jason, but still, I have not got the answer to my question.
Let’s put it another way: what is the latent space in this model? Is it only a compressed version of the input data?
Do you think that if I use a many-to-one architecture, I will have one word representation for each sequence of data?
Why am I able to print out the clusters of the topics in an autoencoder easily, but when it comes to this architecture I am lost!

Jason Brownlee March 28, 2019 at 8:22 am #

In some sense, yes, but a one-value representation is an aggressive projection/compression of the input and may not be useful.

What problem are you having exactly?

JohnAlex March 28, 2019 at 2:01 pm #

Hi Jason,
I appreciate your tutorial!
Now I’m implementing the paper “Unsupervised Learning of Video Representations using LSTMs”, but my result is not very good. The predicted pictures are blurred, not as good as the paper’s result.
(You can see my result here:


https://i.loli.net/2019/03/28/5c9c374d68af2.jpg
https://i.loli.net/2019/03/28/5c9c37af98c65.jpg)

I don’t think there is any difference between my Keras model and the paper’s model. But the problem has confused me for 2 weeks, and I cannot find a good solution. I would really appreciate your help!

This is my Keras model’s code:
Input_seqlen = 10
Output_seqlen = 10
Pic_size = 64
Channels = 1
Num_units = 2048

def Basic_Encoder_Decoder(input_img_shape=(Input_seqlen, Pic_size, Pic_size, Channels)):
    encoder_x = keras.layers.Input(shape=input_img_shape[1:])
    encoder_flatten = keras.layers.Flatten()(encoder_x)
    conv_model = keras.Model(encoder_x, encoder_flatten)

    decode_in = keras.layers.Input(shape=(Num_units,))
    decode_dense = keras.layers.Dense(Pic_size*Pic_size, activation='sigmoid')(decode_in)
    decode_reshape = keras.layers.Reshape(target_shape=(Pic_size, Pic_size, Channels))(decode_dense)
    decode_model = keras.Model(decode_in, decode_reshape)

    # TensorShape([Dimension(None), Dimension(10), Dimension(64), Dimension(64), Dimension(1)])
    Encoder_inp = keras.layers.Input(shape=input_img_shape)
    Encode_seq = keras.layers.TimeDistributed(conv_model)(Encoder_inp)

    Encoder_Lstm, Encoder_h, Encoder_c = keras.layers.LSTM(units=Num_units,
        return_sequences=False, return_state=True, dropout=0.2)(Encode_seq)

    copylayer = keras.layers.RepeatVector(Output_seqlen)(Encoder_h)
    decoder_lstm, _, _ = keras.layers.LSTM(units=Num_units, return_sequences=True,
        return_state=True, dropout=0.2, activation='relu')(copylayer)
    Decode_seq = keras.layers.TimeDistributed(decode_model)(decoder_lstm)

    cae = keras.Model(Encoder_inp, Decode_seq)

    rms = keras.optimizers.RMSprop(lr=0.001)

    def ae_loss(inp, outp):
        inp = K.flatten(inp)
        outp = K.flatten(outp)
        xent_loss = keras.losses.binary_crossentropy(inp, outp)
        return xent_loss
    cae.compile(optimizer=rms, loss=ae_loss)


Jason Brownlee March 28, 2019 at 2:43 pm #

Sounds like a great project!

Sorry, I don’t have the capacity to debug your code, I have some suggestions here though:
https://machinelearningmastery.com/faq/single-faq/can-you-read-review-or-debug-my-code

JohnAlex March 28, 2019 at 10:33 pm #


Thanks for your reply.

And I want to know what may cause the output image to be blurred, in your experience? Thank you~

Jason Brownlee March 29, 2019 at 8:34 am #
In what context exactly?

Birish April 5, 2019 at 2:26 am #
How can I use the cell state of this “Standalone LSTM Encoder” model as an input layer for another model? Suppose in your code for “Keep Standalone LSTM Encoder” you had the “return_state=True” option on the encoder LSTM layer and created the model like:

model = Model(inputs=model.inputs, outputs=[model.layers[0].output, hidden_state, cell_state])

Then one can retrieve the cell state by: model.outputs[2]
The problem is that this will return a “Tensor”, and Keras complains that it only accepts an “Input Layer” as an input for ‘Model()’. How can I feed this cell state to another model as input?
Jason Brownlee April 5, 2019 at 6:20 am #

I think it would be odd to use the cell state as an input, I’m not sure I follow what you want to do.
Nevertheless, you can use Keras to evaluate the tensor, get the data, create a numpy array, and provide it as input to the model.

Also, this may help:
https://machinelearningmastery.com/return-sequences-and-return-states-for-lstms-in-keras/
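For example, a minimal sketch (hypothetical shapes) of pulling the states out with return_state=True and handing the cell state to a second model as plain numpy data:

from numpy import random
from keras.models import Model
from keras.layers import Input, LSTM, Dense

X = random.rand(4, 9, 1)  # hypothetical sequences

# encoder that also returns its final hidden and cell states
inputs = Input(shape=(9, 1))
outputs, state_h, state_c = LSTM(100, return_state=True)(inputs)
encoder = Model(inputs, [outputs, state_h, state_c])

_, h, c = encoder.predict(X)  # h and c are numpy arrays of shape (4, 100)

# a second model that consumes the cell state as an ordinary input
vec_in = Input(shape=(100,))
out = Dense(1)(vec_in)
downstream = Model(vec_in, out)
yhat = downstream.predict(c)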

Birish April 5, 2019 at 4:21 pm #

That’s the approach used in this paper: https://arxiv.org/pdf/1709.01907.pdf

“After the encoder-decoder is pre-trained, it is treated as an intelligent feature-extraction blackbox. Specifically, the last LSTM cell states of the encoder are extracted as learned embedding. Then, a prediction network is trained to forecast the next one or more timestamps using the learned embedding as features.”

They trained an LSTM autoencoder and fed the last cell states of the last encoder layer to another model. Did I misunderstand it?



Jason Brownlee April 6, 2019 at 6:40 am #

Sounds odd, perhaps confirm with the authors that they are not referring to hidden
states (outputs) instead?

George April 11, 2019 at 1:07 am #

Hello Jason,

Is there any way to stack the LSTM autoencoder?
for example:

model = Sequential()
model.add(LSTM(units, activation=activation, input_shape=(n_in, d)))
model.add(RepeatVector(n_in))
model.add(LSTM(units/2, activation=activation))
model.add(RepeatVector(n_in))
model.add(LSTM(units/2, activation=activation, return_sequences=True))
model.add(LSTM(units, activation=activation, return_sequences=True))
model.add(TimeDistributed(Dense(d)))
Is this a correct approach?
Do you see any benefits from stacking the autoencoder?

Jason Brownlee April 11, 2019 at 6:43 am #
I have never seen something like this.

Leland Hepworth September 14, 2019 at 5:53 am #

Hi George,
Stacked encoders/decoders with a narrowing bottleneck are used in a tutorial on the Keras website in the section “Deep autoencoder”:

https://blog.keras.io/building-autoencoders-in-keras.html

The tutorial claims that the deeper architecture gives slightly better results than the more shallow
model definition in the previous example. This tutorial uses simple dense layers in its models, so I
wonder if something similar could be done with LSTM layers.
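A minimal sketch of that idea with LSTM layers (an untested assumption, mirroring the narrowing-bottleneck pattern):

from keras.models import Sequential
from keras.layers import LSTM, RepeatVector, TimeDistributed, Dense

n_in, d = 9, 1
model = Sequential()
# encoder: 64 units down to a 16-unit bottleneck
model.add(LSTM(64, activation='relu', return_sequences=True, input_shape=(n_in, d)))
model.add(LSTM(16, activation='relu'))
model.add(RepeatVector(n_in))
# decoder: mirror the encoder back out
model.add(LSTM(16, activation='relu', return_sequences=True))
model.add(LSTM(64, activation='relu', return_sequences=True))
model.add(TimeDistributed(Dense(d)))
model.compile(optimizer='adam', loss='mse')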

Jason Brownlee September 14, 2019 at 6:25 am #

Thanks for sharing.

Rojin April 12, 2019 at 3:36 am #

I have a theoretical question about autoencoders. I know that autoencoders are supposed to reconstruct the input at the output, and by doing so they will learn a lower-dimensional representation of the input. Now I want to know if it is possible to use autoencoders to construct something else at the output (let’s say something that is a modified version of the input).



Jason Brownlee April 12, 2019 at 7:53 am #
Sure.
Perhaps check out conditional generative models, VAEs, GANs.

Rojin April 12, 2019 at 10:35 am #
Thanks for the response. I will check those out. I thought about denoising autoencoders, but was not sure if they are applicable to my situation.
Let’s say that I have two versions of a feature vector: one is X, and the other one is X’, which has some meaningful noise (technically not noise, but meaningful information). Now my question is whether it is appropriate to use denoising autoencoders in this case to learn the transition from X to X’?
Loving the Tutorials?

The LSTMs with Python EBook is


where you'll find theJason
Really Brownlee
Good stuff. April 12, 2019 at 2:43 pm # REPLY 

Conditional GANs do this for image-to-image translation.

Nick April 16, 2019 at 4:18 pm #
Hi Jason, could you explain the difference between RepeatVector and return_sequences?
It looks like they both repeat a vector several times, but what’s the difference?
Can we just use return_sequences in the last LSTM encoder layer and not use RepeatVector before the first LSTM decoder layer?

Jason Brownlee April 17, 2019 at 6:53 am #

Yes, they are very different in operation, but similar in effect.

The “return_sequences” argument returns the LSTM layer outputs for each input time step.

The “RepeatVector” layer copies the output from the LSTM for the last input time step and repeats it n times.
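A minimal sketch (hypothetical sizes) that makes the two behaviours visible:

from numpy import random
from keras.models import Sequential
from keras.layers import LSTM, RepeatVector

X = random.rand(1, 9, 1)

# return_sequences=True: one output per input time step -> (1, 9, 100)
m1 = Sequential([LSTM(100, return_sequences=True, input_shape=(9, 1))])
print(m1.predict(X).shape)

# RepeatVector: the final output repeated n times -> (1, 5, 100)
m2 = Sequential([LSTM(100, input_shape=(9, 1)), RepeatVector(5)])
print(m2.predict(X).shape)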

Nick April 18, 2019 at 3:44 am #

Thank you, Jason, now I understand the difference between them. But here is another question: can we do it like this:
'''
encoder = LSTM(100, activation='relu', input_shape=(n_in,1), return_sequences=True)
(no RepeatVector layer here, but return_sequences is True in the encoder layer)
decoder = LSTM(100, activation='relu', return_sequences=True)(encoder)
decoder = TimeDistributed(Dense(1))(decoder)
'''

If yes, what’s the difference between this one and the one you shared (with a RepeatVector layer between encoder and decoder, but return_sequences False in the encoder layer)?

Jason Brownlee April 18, 2019 at 8:54 am #
The repeat vector allows the decoder to use the same representation when creating each output time step.

A stacked LSTM is different in that it will read all input time steps before formulating an
output, and in your case, will output an activation for each input time step.
There’s no “best” way, test a suite of models for your problem and use whatever works best.


Nick April 18, 2019 at 2:53 pm #

Thank you for answering my questions.



TS May 6, 2019 at 3:27 pm #

Dear Sir,

One point I would like to mention is the unconditioned model that Srivastava et al. use.
a) They do not supply any inputs to the decoder model. Is this tutorial only using the conditioned model?

b) Even if we are using either of the 2 models mentioned in the paper, we should be passing the hidden state, or maybe even the cell state, of the encoder model to the decoder's first time step, and not to all the time steps.

The tutorial over here shows that the repeat vector supplies inputs to all the time steps in the decoder model, which should not be the case in either of the models.

Also, the target time steps in the auto-reconstruction decoder model should have been reversed.

Please correct me if I am wrong in understanding the paper. I await your clarification. Thank you in advance.



Jason Brownlee May 7, 2019 at 6:11 am #

Perhaps.

You can consider the implementation inspired by the paper, not a direct re-implementation.
TS May 9, 2019 at 1:16 am #
Thank you for the clarification. Thank you for the post, it helped.

Jason Brownlee May 9, 2019 at 6:46 am #
You’re welcome.

Xiaoyang Ruan May 22, 2020 at 9:48 am #

My understanding is that the RepeatVector function utilizes a more “dense” representation of the original inputs. For an encoder LSTM with 100 hidden units, all information is compressed into a 100-element vector (which is then duplicated by RepeatVector for the desired number of output timesteps). With return_sequences=True it is a totally different scenario: you end up with 100 x (input time steps) latent variables. It is more like a sparse autoencoder. Correct me if I am wrong.

Taraka Rama April 18, 2019 at 5:47 pm #

Hi Jason,

The blog is very interesting. A paper that I published sometime ago uses LSTM autoencoders for
German and Dutch dialect analysis.

Best,
Taraka

Jason Brownlee April 19, 2019 at 6:04 am #

Thanks.

Taraka Rama April 18, 2019 at 5:48 pm #
Hi Jason,

(Forgot to paste the paper link)
The blog is very interesting. A paper that I published some time ago uses LSTM autoencoders for German and Dutch dialect analysis.

https://www.aclweb.org/anthology/W16-4803

Best,
Taraka
Jason Brownlee April 19, 2019 at 6:05 am #
Thanks for sharing.

Geralt Xu May 4, 2019 at 8:01 pm #

Hi Jason,

Thanks for the tutorial, it really helps.


Here is a question about the connection between the Encoder and Decoder.
In your implementation, you copy the H-dimension hidden vector from the Encoder T times, and convey it as a T*H time series into the Decoder.
Why choose this way? I’m wondering, since there are other ways to do it:

Take the hidden vector as the initial state at the first time step of the Decoder, with a zero input series.

Can this way work?

Best,
Geralt

Jason Brownlee May 5, 2019 at 6:26 am #

Because it is an easy way to achieve the desired effect from the paper using the Keras
library.

No, I don’t think your approach is in the spirit of the paper. Try it and see what happens!



Atefeh May 6, 2019 at 11:29 am #

Hello Mr. Jason,
I want to start handwritten isolated character recognition with RNN and LSTM. I mean, we have a number of character images and I want code to recognize each character.
Would you please help me to find basic Python code for this purpose, so I could start the work?
Thank you.
Short-Term Memory Networks in Keras

How to Develop an Encoder-Decoder REPLY 


Jason Brownlee May 6, 2019 at 2:33 pm #
Sounds like a great problem.
Perhaps a CNN-LSTM model would be a good fit!

Xinyang May 13, 2019 at 12:37 pm #
Hi, Dr Brownlee,

Thanks for your post. Here I want to use an LSTM to predict a time series. For example, a series like (1 2 3 4 5 6 7 8 9), and use this series for training. Then the output series is the series of multi-step predictions until it reaches the ideal value, like this: (9.9 10.8 11.9 12 13.1).



Jason Brownlee May 13, 2019 at 2:32 pm #
See this post:
https://machinelearningmastery.com/how-to-develop-lstm-models-for-time-series-forecasting/

Xinyang May 13, 2019 at 12:57 pm #

Sorry, maybe I didn’t make it clear. Here I want to use an LSTM to predict a time series. The sequence may be like this: [10,20,30,40,50,60,70], and use it for training, with a time_step of 3. When the input is [40,50,60], we want the output to be 70. When the model finishes training, the prediction begins: when the input is [50,60,70], the output may be 79, which is then used for the next step of prediction; the input is [60,70,79] and the output might be 89. The iteration is over when a certain condition is satisfied (like the output >= 100).
So how can I realize the prediction process above, and where can I find the code?
Please, I hope to get your reply.

Jason Brownlee May 13, 2019 at 2:32 pm #

Yes, you can get started with time series forecasting with LSTMs in this post:
https://machinelearningmastery.com/how-to-develop-lstm-models-for-time-series-forecasting/

I have more advanced posts here:


https://machinelearningmastery.com/start-here/#deep_learning_time_series
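The recursive part of the question can be sketched independently of those tutorials (a toy stand-in model; in practice it would be trained first):

from numpy import array
from keras.models import Sequential
from keras.layers import LSTM, Dense

# a stand-in one-step forecaster; in practice this would be trained first
model = Sequential()
model.add(LSTM(10, input_shape=(3, 1)))
model.add(Dense(1))
model.compile(optimizer='adam', loss='mse')

history = [50.0, 60.0, 70.0]
for _ in range(100):  # hard cap so an untrained model cannot loop forever
    x = array(history[-3:]).reshape((1, 3, 1))
    yhat = float(model.predict(x, verbose=0)[0, 0])
    history.append(yhat)  # feed the prediction back in as the next input
    if yhat >= 100.0:     # stopping condition from the question
        break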

Xinyang May 13, 2019 at 5:55 pm #

Thanks for your quick reply.
And I still have a question. The multi-step LSTM model uses the last three time steps as input and forecasts the next two time steps. But in my case, I want to predict the capacity decline trend of a lithium-ion battery: for example, take the data of the declining capacity curve (cycle number < 160) as the training data, then predict the future trend of capacity until it reaches a certain value (maybe <= 0.7 Ah), the failure threshold, which might be reached at a cycle number of 250 or so. Between cycle numbers 160 and 220, around 90 data points need to be predicted. So I have no idea how to define time steps and samples: if the output time steps are defined as 60 (220-160=60), then how should I define the time steps of the input? It seems unreasonable.

I am extremely hoping to get your reply. Thank you so much.

Jason Brownlee May 14, 2019 at 7:41 am #

You can define the model with any number of inputs or outputs that you require.

If you are having trouble working with numpy arrays, perhaps this will help:
https://machinelearningmastery.com/index-slice-reshape-numpy-arrays-machine-learning-python/
Kishore Surendra May 15, 2019 at 8:34 pm #
Dear Prof,

I have a list as follows :

[5206, 1878, 1224, 2, 329, 89, 106, 901, 902, 149, 8]


When I’m passing it as an input to the reconstruction LSTM (with an added LSTM and repeat vector
layer and 1000 epochs), I get the following predicted output:

[5066.752 1615.2777 1015.1887 714.63916 292.17035 250.14038 331.69427 356.30664 373.15497 365.38977 335.48383]

While some values are almost accurate, most of the others have large deviations from the original.

What can be the reason for this, and how do you suggest I fix this ?

Jason Brownlee May 16, 2019 at 6:30 am #

No model is perfect.
You can expect error in any model, and variance in neural nets over different training runs, more
here:
https://machinelearningmastery.com/faq/single-faq/why-do-i-get-different-results-each-time-i-run-the-code
Kishore Surendra May 21, 2019 at 11:57 pm #

Thanks, professor.
If I have varying numbers such as 2 and 1000 in the same list, is it better to normalize the list by dividing each element by the highest element, and then passing the resulting sequence as an input to the autoencoder?
Jason Brownlee May 22, 2019 at 8:10 am #
Yes, normalizing input is a good idea in general:
https://machinelearningmastery.com/how-to-improve-neural-network-stability-and-modeling-performance-with-data-scaling/
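As a tiny illustration of the idea in the question (dividing by the largest value is a special case of min-max scaling with a minimum of 0):

from numpy import array

seq = array([5206.0, 1878.0, 1224.0, 2.0, 329.0, 89.0])
normalized = seq / seq.max()       # values now in [0, 1]
restored = normalized * seq.max()  # invert after prediction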
A Gentle Introduction to LSTM
Autoencoders

REPLY 
Harsh May 15, 2019 at 11:17 pm #

Hi, I have two questions, and would be grateful if you can help –
1) The above sequence is very small. What if the length of the vector is around 800? I tried, but it's taking too long. What do you suggest?

2) Also, is it possible to encode a matrix into a vector?

thanks

Jason Brownlee May 16, 2019 at 6:32 am #

Perhaps reduce the size of the sequence?


Perhaps try running on a faster computer?
Perhaps try an alternate approach?

Harsh May 16, 2019 at 6:30 pm #


Thanks for your quick response… I have a confusion: right now when you mention ‘training’, it is only one vector… how can I truly train it with batches of multiple vectors?

Jason Brownlee May 17, 2019 at 5:51 am #
If you are new to Keras, perhaps start with this tutorial:
https://machinelearningmastery.com/5-step-life-cycle-neural-network-models-keras/
snowbear May 26, 2019 at 5:51 pm #

Hello Jason, I really appreciate your informative posts. But I have two questions.
Question 1. Does model.add(LSTM(100, activation='relu', input_shape=(n_in,1))) mean that you are creating an LSTM layer with a 100-unit hidden state?
The LSTM structure needs a hidden state (h_t) and cell state (c_t) in addition to the input_t, right? So the number 100 there means that with data whose shape is (9,1) (timesteps = 9, input_feature_number = 1), the LSTM layer produces a 100-unit-long hidden state (h_t)?

Question 2. How small did it get in terms of ‘dimension reduction’? Can you tell me how much smaller the (9, 1) data got when reduced to the latent vector?
Jason Brownlee May 27, 2019 at 6:46 am #
100 refers to 100 units/nodes in the first hidden layer, perhaps this will help:
https://machinelearningmastery.com/faq/single-faq/what-is-the-difference-between-samples-timesteps-and-features-for-lstm-input

You can experiment with different sized bottlenecks to see what works well/best for your specific dataset.

hassam May 30, 2019 at 3:07 am #
Hi Jason! Can this approach be used for sentence correction, i.e. spelling or grammatical mistakes in the input text?
For example, I have a huge corpus of unlabelled text, and I trained it using the autoencoder technique. I want to build a model that takes an input sentence (of variable length) and outputs the most probable or corrected sentence based on the training data distribution. Is this possible?

Jason Brownlee May 30, 2019 at 9:05 am #

Perhaps, I’d encourage you to review the literature first.

John June 11, 2019 at 9:35 am #

How do I shape the data for the autoencoder if I have multiple samples?

Jason Brownlee June 11, 2019 at 2:22 pm #
Perhaps this will help:
https://machinelearningmastery.com/faq/single-faq/what-is-the-difference-between-samples-timesteps-and-features-for-lstm-input
Jose Luis July 14, 2019 at 7:48 am #
Hi Jason, thanks for your great articles! I have a project where I get several hundred galaxy spectra (a graphic where I have a continuous range of frequencies on the x axis and the number of received photons from each galaxy on the y axis; it’s something like a continuous histogram). I need to do unsupervised clustering with all these spectra. Do you think this LSTM autoencoder could be a good option? (Each spectrum has 4000 frequency-flux pairs.)
each galaxy inLayer
the yinaxis; it’s something like a continuos histogram). I need to
Email Address
Keras
make an unsupervised clustering with all this spectra. Do you thing this LSTM autoencoder can be a
good option I can use? (Each spectrum has 4000 pairs frecuency-flux).
Jason Tutorials?
July 14, 2019 at 8:18 am #
REPLY 

The LSTMs
Perhapswith Python EBook
try it and is
evaluate the result?
where you'll find the Really Good stuff.

>> SEE WHAT'S INSIDE


Xing Wang Tong July 17, 2019 at 10:30 pm #

Hello, and thanks for your tutorial… do you have a similar tutorial with LSTM but with multiple
features?
The reason I ask for multiple features is because I built multiple autoencoder models with different
structures but all had timesteps = 30… during training the loss, the rmse, the val_loss and the val_rmse
seem all to be within acceptable range ~ 0.05, but when I do prediction and plot the prediction with the
original data in one graph, it seems that they both are totally different.

I used MinMaxScaler so I tried to plot the original data and the predictions before I inverse the transform
and after, but still the original data and the prediction aren’t even close. So, I think I am having trouble
plotting the prediction correctly

Jason Brownlee July 18, 2019 at 8:27 am #

You could adapt the examples in this post:


https://machinelearningmastery.com/how-to-develop-lstm-models-for-time-series-forecasting/

sara July 18, 2019 at 1:15 am #

I would like to thank you for the great post, though I wish you had included a more sophisticated model.

For example, the same thing with 2 features rather than one feature.

Jason Brownlee July 18, 2019 at 8:30 am #

Thanks for the suggestion.
The examples here will be helpful:
https://machinelearningmastery.com/how-to-develop-lstm-models-for-time-series-forecasting/

Shiva July 25, 2019 at 4:30 am #
Hi Jason,
Thanks for the tutorial.
I have a sequence A B C. Each of A, B, and C is a vector of length 25.
My samples are like this: A B C label, A’ B’ C’ label’, ….
How should I reshape the data?
What is the size of the input dimension?
Jason Brownlee July 25, 2019 at 7:58 am #

Sorry, I don’t follow.

What is the problem that you are having exactly?

Shiva July 25, 2019 at 7:39 pm #

My dataset is an array with the shape (10,3,25) (3 features, and each feature is a vector of 25 values).
Is it necessary to reshape it?
And what is the value of input_shape for this array?

Jason Brownlee July 26, 2019 at 8:20 am #


Perhaps read this first to confirm your data is in the correct format:
https://machinelearningmastery.com/faq/single-faq/what-is-the-difference-between-samples-timesteps-and-features-for-lstm-input

Shiva July 27, 2019 at 9:24 pm #
Thank you, Jason.

Jason Brownlee July 28, 2019 at 6:43 am #
You’re welcome.
Felix August 12, 2019 at 4:11 am #
Hi Jason,

Thank you for the great work.

I have one doubt about the layer concept. Does LSTM(100) mean a hidden layer of 100 neurons, where the output of the first LSTM layer across all these 100 units is taken as the final state value? Is that correct?

Jason Brownlee August 12, 2019 at 6:39 am #
Yes.

Felix August 13, 2019 at 5:56 am #

Hello Jason,
Thank you for the quick response, and I appreciate your kindness in responding to my doubt. Still, I am confused by the diagram provided by Keras:

https://github.com/MohammadFneish7/Keras_LSTM_Diagram

There they explain that the output of each layer will be the “No of Y variables we are predicting * timesteps”.

My doubt is: is the output size the “Y – predicted value” or the “Hidden Values”?

Thanks

Jason Brownlee August 13, 2019 at 6:14 am #

Perhaps ask the authors of the diagram about it?

I have some general advice here that might help:


https://machinelearningmastery.com/faq/single-faq/what-is-the-difference-between-samples-timesteps-and-features-for-lstm-input

Felix August 13, 2019 at 6:55 am #


Thank you Jason for the reply.
I have gone through your post and I am clear about the input format to the initial LSTM layer.
I have the below doubt about the internal structure of Keras. Suppose I have the code below.
step_size = 3
model = Sequential()
model.add(LSTM(32, input_shape=(2, step_size), return_sequences=True))
model.add(LSTM(18))
model.add(Dense(1))
model.add(Activation('linear'))

I am getting the summary below.

_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
lstm_1 (LSTM) (None, 2, 32) 4608
_________________________________________________________________
lstm_2 (LSTM) (None, 18) 3672
_________________________________________________________________
dense_1 (Dense) (None, 1) 19
_________________________________________________________________
activation_1 (Activation) (None, 1) 0
=================================================================
Total params: 8,299
Trainable params: 8,299
Non-trainable params: 0
_________________________________________________________________
None

And I have the following internal layer weight matrix shapes.

Layer 1
(3, 128)
(32, 128)
(128,)

Layer 2
(32, 72)
(18, 72)
(72,)

Layer 3
(18, 1)
(1,)
I cannot find any relation between the output size and the matrix size in each layer. But in each layer the parameter count specified is the total of the weight matrix sizes. Can you please help me get an idea of how these numbers arise?
Jason Brownlee August 13, 2019 at 2:36 pm #

I believe this will help you understand the input shape to an LSTM model:
https://machinelearningmastery.com/faq/single-faq/what-is-the-difference-between-samples-timesteps-and-features-for-lstm-input
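For what it's worth, the parameter counts in the summary above can be reproduced by hand: a Keras LSTM layer has 4 * (input_dim + units + 1) * units weights (four gates, each with input weights, recurrent weights, and a bias). The per-layer matrices listed, e.g. (3, 128), (32, 128), and (128,) for lstm_1, are the same numbers split into input kernel, recurrent kernel, and bias, with 128 = 4 * 32:

def lstm_params(input_dim, units):
    # four gates, each with input weights, recurrent weights and a bias
    return 4 * (input_dim + units + 1) * units

print(lstm_params(3, 32))   # 4608, matches lstm_1
print(lstm_params(32, 18))  # 3672, matches lstm_2
print(18 * 1 + 1)           # 19, the Dense layer (weights + bias)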
Felix August 13, 2019 at 5:07 pm #

Hi Jason,

Thanks for the reply.
I have gone through the post https://machinelearningmastery.com/faq/single-faq/what-is-the-difference-between-samples-timesteps-and-features-for-lstm-input

I am able to understand the structure of the input data into the first LSTM layer. But I am not able to identify the matrix structure in the first layer and the connection with the second layer. Can you please give me more guidelines to understand the matrix dimensions in the layers?
Thanks

Jason Brownlee August 14, 2019 at 6:35 am #

If the LSTM has return_sequences=False then the output of an LSTM layer is one value for each node, e.g. LSTM(50) returns a vector with 50 elements.

If the LSTM has return_sequences=True, such as when we stack LSTM layers, then the return will be a sequence for each node, where the length of the sequence is the length of the input to the layer, e.g. with LSTM(50, input_shape=(100,1)) the output will be (100,50), or 100 time steps for 50 nodes.

Does that help?
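A quick way to check both cases (a minimal sketch):

from numpy import random
from keras.models import Sequential
from keras.layers import LSTM

X = random.rand(1, 100, 1)

m1 = Sequential([LSTM(50, input_shape=(100, 1))])
print(m1.predict(X).shape)  # (1, 50): one value per node

m2 = Sequential([LSTM(50, return_sequences=True, input_shape=(100, 1))])
print(m2.predict(X).shape)  # (1, 100, 50): a sequence per node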


Felix August 14, 2019 at 6:52 am #

Thank you Jason for the reply.


I really appreciate the time and effort you took to give me the answer. It helped me a lot. Thank you very much. You are teaching the whole world. Great!!!
Jason Brownlee August 14, 2019 at 2:07 pm #
Thanks, I’m glad it helped.
Hossein September 4, 2019 at 6:15 pm #

Hi, I am a student and I want to forecast a time series (electrical load) for the next 24 hours.
I want to do it using an autoencoder combined with an LSTM.
I am looking for a suitable topology and structure for it. Is it possible to help me?
Best regards
Jason Brownlee September 5, 2019 at 6:50 am #

Perhaps some of the tutorials here will help as a first step:
https://machinelearningmastery.com/start-here/#deep_learning_time_series
Shreeram Bhattarai September 18, 2019 at 11:19 pm #

Hi,
I have a question regarding the composite model. In your tutorial, you feed all the data into the LSTM encoder, and decoder 1 tries to reconstruct whatever has been passed to the encoder. The other decoder tries to predict the next sequence.

My question is: once the encoder has seen all the data, does it make sense for the prediction branch? Since it has already seen all the data, surely it can predict well enough, right?

I don't understand how the encoder part works. Does it work differently for the two branches? Does the encoder create a single encoded latent space from which both branches do their jobs?

Could you please help me figure it out. Thank you.

Jason Brownlee September 19, 2019 at 6:01 am #

Perhaps focus on the samples aspect of the data: the model receives a sample and predicts the output, then the next sample is processed and predicts an output, and so on.

It just so happens that when we train the model, we provide all the samples in the dataset together.

Does that help?


Shreeram Bhattarai September 25, 2019 at 9:52 pm #

Thanks for your reply, but it is still not clear to me.


For example:
we have data with 10 time steps of size 120, i.e. (N, 10, 120), where N is the number of samples.
f5 = first 5 time steps
l5 = last 5 time steps

While training:

Option 1:
seq_in = (N, f5, 120)
seq_out = (N, l5, 120)
model.fit(seq_in, [seq_in,seq_out], epochs=300, verbose=0)

Option 2:
seq_in = (N, 10, 120)
seq_out = (N, l5, 120)

model.fit(seq_in, [seq_in,seq_out], epochs=300, verbose=0)

Could you please help me understand the difference between the above options? Which is the correct way to train the network? Thank you.
Jason Brownlee September 26, 2019 at 6:39 am #

I don’t follow, sorry.

len(f5) == 5?

Then you're asking the difference between (N,5,120) and (N,10,120)?

The difference is the number of time steps.

If you are new to array shapes, this will help:


https://machinelearningmastery.com/faq/single-faq/what-is-the-difference-between-samples-timesteps-and-features-for-lstm-input
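For illustration, a minimal NumPy sketch of the two options discussed above (N, f5 and l5 are the hypothetical names from the comment):

import numpy as np

N = 32                             # hypothetical number of samples
data = np.random.rand(N, 10, 120)  # (samples, time steps, features)
f5 = data[:, :5, :]                # first 5 time steps -> (N, 5, 120)
l5 = data[:, 5:, :]                # last 5 time steps -> (N, 5, 120)
# option 1: seq_in = f5; option 2: seq_in = data (all 10 steps)
print(f5.shape, l5.shape, data.shape)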

Shreeram Bhattarai September 26, 2019 at 6:28 pm #

Sorry for the inconvenience.


I am trying to ask whether we have to pass all time steps (in this case 10), or pass the first 5 time steps to predict the next 5 steps. (I have data of 10 time steps; my wish is to train a network with two decoders. The first decoder should return the reconstruction of the input, and the second decoder should predict the next values.)

The question is: if I pass all 10 time steps to the network, then it will see all the time steps, which means it encodes all the seen data, and from the encoding space the two decoders will try to reconstruct and predict. It seems that both decoders look similar, so what is the significance of the reconstruction branch decoder? How does it help the prediction decoder in the composite model?

Thank you once again.


Jason Brownlee September 27, 2019 at 7:51 am #
Yes, the goal is not to train a predictive model, it is to train an effective encoding of the input.

Using a prediction model as a decoder does not guarantee a better encoding, it is just an alternate strategy to try that may be useful on some problems.
If you want to use an LSTM for time series prediction, you can start here:
https://machinelearningmastery.com/start-here/#deep_learning_time_series
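For readers following this thread, here is a minimal sketch of such a composite model, using the hypothetical shapes from the comment above (10 input steps, 120 features, 5 predicted steps); it mirrors the composite example in the post:

from keras.models import Model
from keras.layers import Input, LSTM, RepeatVector, TimeDistributed, Dense

n_in, n_out, n_features = 10, 5, 120
visible = Input(shape=(n_in, n_features))
encoder = LSTM(100, activation='relu')(visible)
# reconstruction decoder: reproduce the full input sequence
decoder1 = RepeatVector(n_in)(encoder)
decoder1 = LSTM(100, activation='relu', return_sequences=True)(decoder1)
decoder1 = TimeDistributed(Dense(n_features))(decoder1)
# prediction decoder: predict the next n_out steps
decoder2 = RepeatVector(n_out)(encoder)
decoder2 = LSTM(100, activation='relu', return_sequences=True)(decoder2)
decoder2 = TimeDistributed(Dense(n_features))(decoder2)
model = Model(inputs=visible, outputs=[decoder1, decoder2])
model.compile(optimizer='adam', loss='mse')
# model.fit(seq_in, [seq_in, seq_out], epochs=300, verbose=0)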

Shreeram Bhattarai September 27, 2019 at 6:13 pm #

Thank you very much for your answers.

Marvi Waheed September 23, 2019 at 8:07 am #

Hi Jason,

I get NaN values when I apply the reconstruction autoencoder to my data (1,1000,1).
What can be the reason for it, and how can I resolve it?
I am exploring how reshaping data works for LSTMs and have tried dividing my data into batches of 5 with 200 timesteps each, but wanted to check how (1,1000,1) works.

Jason Brownlee September 23, 2019 at 10:04 am #

Sorry to hear that, this might help with reshaping data:


https://machinelearningmastery.com/faq/single-faq/what-is-the-difference-between-samples-timesteps-and-features-for-lstm-input

Marvi Waheed September 23, 2019 at 5:15 pm #

Thanks for replying.

Can you identify the LSTM model used for reconstruction? Is it one-to-one or many-to-one?
Where can I find explicit examples of LSTM models on the website?

Jason Brownlee September 24, 2019 at 7:41 am #

You can get started with LSTMs here:
https://machinelearningmastery.com/start-here/#lstm
Including tutorials, a mini-course, and my book.
Sounak Ray September 29, 2019 at 2:26 am #
Hello,

I had a question. If I am using the Masking layer in the first part of the network, does the RepeatVector() layer support masking? If it does not support masking and replicates each timestep with the same value, then our output loss will not be computed properly, because ideally the MSE loss for each example should not include the timesteps where we had zero padding.
Could you please share how to ignore the zero-padded values while calculating the MSE loss function.

Jason Brownlee September 29, 2019 at 6:14 am #
Masking is only needed for input.
The bottleneck will have an internal representation of the input, after masking.

Masked values are skipped from input.
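For example, a minimal sketch of masking zero-padded input (the shapes are illustrative):

from keras.models import Sequential
from keras.layers import Masking, LSTM

# time steps whose features are all 0.0 are skipped by downstream layers
model = Sequential()
model.add(Masking(mask_value=0.0, input_shape=(100, 1)))
model.add(LSTM(32))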

Sounak Ray September 29, 2019 at 8:34 pm #

Hello,

But if the reconstructed timesteps corresponding to the padded part are not zero, then the mean squared error loss will be very large, I suppose? Can you tell me if I am wrong here, because my MSE loss becomes "nan" after a certain number of epochs. And is it best to do post-padding or pre-padding?

Thanks,
Sounak Ray.

Jason Brownlee September 30, 2019 at 6:08 am #

Correct.

The goal is not to create a great predictive model, it is to learn a great intermediate
representation.
Sorry to hear that you are getting NaNs, I have some suggestions here that might help:
https://machinelearningmastery.com/faq/single-faq/why-does-the-code-in-the-tutorial-not-work-for-me

James December 8, 2019 at 3:55 am #
Hi Jason, thanks for the article. I'm struggling with the same problem as Sounak: the mask actually gets lost when the LSTM has return_sequences=False (also, the RepeatVector does not explicitly support masking because it actually changes the timestep dimension). Since the mask cannot be passed to the end of the model, the loss will also be calculated for those padded timesteps (I've validated this on a simple example), which is not preferred.

Jason Brownlee December 8, 2019 at 6:17 am #

I wonder if you can do experiments to see if it makes a difference to the bottleneck representation that is learned?

Alireza Hadj October 10, 2019 at 4:58 am #

Hi Jason,
I really enjoy your posts. Thanks for sharing your expertise. Really appreciate it!

I also have a question regarding this post. In the "Prediction Autoencoder", shouldn't you split the time sequence in half and try to predict the second half by feeding the first half to the encoder? The way you have implemented the decoder does not truly predict the sequence, because the entire sequence has been summarized and given to it by the encoder. Is that true, or am I missing something here?

Jason Brownlee October 10, 2019 at 7:05 am #

You can try that – there are many ways to frame a sequence prediction problem, but that is
not the model used in this example.

Recall, we are not developing a prediction model, instead an autoencoder.



Marvi Waheed October 21, 2019 at 5:22 pm #

Hello,

I'm working on data reconstruction where the input is columns [0:8] of the dataset and the required output is the 9th column. However, the LSTM autoencoder model returns the same value as output after 10 to 15 timesteps. I have applied the model to different datasets but face a similar issue.
What parameter adjustments must I make to obtain unique reconstructed values?

Jason Brownlee October 22, 2019 at 5:44 am #
Perhaps try using a different model architecture or different training hyperparameters?
Xi Zhu October 26, 2019 at 5:38 am #
Fantastic! I hope you are getting paid for your time here.

Jason Brownlee October 26, 2019 at 5:47 am #

Thanks.

Yes, some readers purchase ebooks to support me:


https://machinelearningmastery.com/products/
wysohn October 29, 2019 at 4:47 pm #

Hello,

Thank you for the amazing article!

I’ve read comments regarding the RepeatVector(), yet I’m still skeptical if I understood it correctly.
We are merely copying the last output of the encoder LSTM and feeding it to each cell of the decoder LSTM to produce the unconditional sequence. Is that correct?

Also, I’m curious that what happens to the internal state of the encoder LSTM. Is it just discarded and will
never be used for the decoder? I wonder if using the internal state of the final LSTM cell of the encoder
for the initial internal state of LSTM of the decoder would have any kind of benefit. Or is it just completely
unnecessary since all we want is to train the encoder?

Thank you for your time!

Jason Brownlee October 30, 2019 at 5:57 am #

Correct.
The internal state from the encoder is discarded. The decoder uses state to create the
output.

The construction of each output step is conditional on the bottleneck vector and the state from creating the prior output step.
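For anyone who wants to run the experiment suggested above, here is a minimal sketch of passing the encoder's final states to the decoder with the functional API (the shapes and sizes are illustrative, not from the post):

from keras.models import Model
from keras.layers import Input, LSTM, RepeatVector, TimeDistributed, Dense

n_steps, n_features, n_units = 9, 1, 100
inputs = Input(shape=(n_steps, n_features))
enc_out, state_h, state_c = LSTM(n_units, return_state=True)(inputs)
# initialize the decoder with the encoder's final states instead of discarding them
dec_in = RepeatVector(n_steps)(enc_out)
dec_out = LSTM(n_units, return_sequences=True)(dec_in, initial_state=[state_h, state_c])
outputs = TimeDistributed(Dense(n_features))(dec_out)
model = Model(inputs, outputs)
model.compile(optimizer='adam', loss='mse')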

Syed November 12, 2019 at 7:55 am #

Really appreciate your hard work, and the tutorials are great. I have learned a lot. Can you please write a tutorial on the teacher forcing method in the encoder-decoder architecture? That would be really helpful.
Jason Brownlee November 12, 2019 at 2:01 pm #

Thanks!
Yes, I believe all of my tutorials for the encoder-decoder use teacher forcing.

Chrysostome November 17, 2019 at 1:27 am #

I would like to know what the point of doing an autoencoder is.


It seems equivalent to building an encoder-decoder on one side and the prediction model on the other side, as the two outputs don't seem to be used by each other.
Maybe it would be more meaningful to use the decoder as a discriminator for the prediction, like a GAN.

Jason Brownlee November 17, 2019 at 7:15 am #

It can be used as a feature extraction model for sequence data.

E.g. you could fit a decoder or any model and make predictions.
Meenal December 9, 2019 at 5:09 am #

Is there any way of building an overfitted autoencoder (does overfitting need to be taken care of while training an autoencoder)?
And how can one justify that the encoded features obtained are the best possible compression for reconstruction of the original?

Also, can you please explain the TimeDistributed layer in terms of the input to this layer? What is the use of the TimeDistributed layer? Is this layer only useful when working with an LSTM layer?

Thanks for all your posts and books, they are very useful in understanding concepts and applying them.

Jason Brownlee December 9, 2019 at 6:55 am #

Good question!
Yes. E.g. an autoencoder that does well on a training set but cannot reconstruct test data well.
More on the time distributed layer here:
https://machinelearningmastery.com/timedistributed-layer-for-long-short-term-memory-networks-in-python/
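As a quick illustration, a minimal sketch of the TimeDistributed wrapper (the layer sizes are arbitrary): the same Dense layer, with the same weights, is applied to each of the 10 time steps.

from keras.models import Sequential
from keras.layers import LSTM, TimeDistributed, Dense

model = Sequential()
model.add(LSTM(32, return_sequences=True, input_shape=(10, 1)))
model.add(TimeDistributed(Dense(1)))
print(model.output_shape)  # (None, 10, 1)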

Gideon Prior December 12, 2019 at 8:04 am #

I am having trouble seeing the bottleneck. Is it the 100-unit layer after the input? Should this normally (without a trivial data set as in your example) be much smaller than the number of time steps?
Jason Brownlee December 12, 2019 at 1:41 pm #

Yes, the output of the first hidden layer (the encoder) is the encoded representation.

Jenna December 16, 2019 at 12:53 am #

Hi Jason,
Thank you so much for writing this great post. But I have a question that is really confusing me. Here it is.
As the Encoder-Decoder LSTM can benefit training with variable output lengths, I'm wondering if it can support variable multi-step output. I am trying to vary the length of the output steps with the "Multiple Parallel Input and Multi-Step Output" example from another post, https://machinelearningmastery.com/how-to-develop-lstm-models-for-time-series-forecasting/, so the output sequence is like:
[[[ 40 45 85]
[ 0 0 0]
[ 0 0 0]] Start Machine Learning
[[ 50 55 105]
[ 60 65 125]
[ 70 75 145]]

[[ 60 65 125]
[ 70 75 145]
[ 0 0 0]]

[[ 70 75 145]
[ 80 85 165]
[ 0 0 0]]

[[ 80 85 165]
[ 0 0 0]
[ 0 0 0]]]
But my prediction results turned out to be not good. Could you give me some guidance? Is the padding value 0 not suitable? Is it that the Encoder-Decoder LSTM cannot support a variable number of steps?
Thanks again.

Jason Brownlee December 16, 2019 at 6:18 am #
Yes, but you must pad the values. If you cannot use padding with 0, perhaps try -1.
Alternately, you can use a dynamic LSTM and process one time step at a time. This will show you how:
https://machinelearningmastery.com/develop-encoder-decoder-model-sequence-sequence-prediction-keras/
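For the padding itself, a minimal sketch using the Keras pad_sequences utility with -1 as the pad value (the toy sequences are illustrative):

from keras.preprocessing.sequence import pad_sequences

seqs = [[40, 45, 85], [50, 55], [60]]
padded = pad_sequences(seqs, padding='post', value=-1.0, dtype='float32')
print(padded)
# [[40. 45. 85.]
#  [50. 55. -1.]
#  [60. -1. -1.]]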

Jenna December 16, 2019 at 8:49 pm #

Thank you for suggesting that I process one time step at a time. I suddenly realize there is no need to make the output time steps variable, since we can predict the output step by step. Did I get it right? Besides, I think there is no real difference between the two Encoder-Decoder models from these two posts, except for predicting different timesteps and using different Keras functions. Is this understanding correct?
Hope to hear from you. Thanks again.
Jason Brownlee December 17, 2019 at 6:34 am #
Perhaps.

jackson January 10, 2020 at 7:21 pm #
I have tried your model with my input. The loss was converging before 10 epochs as expected. However, the loss became bigger after a point in the 10th epoch.
5536/42706 [==>...........................] - ETA: 39s - loss: 0.4187
5600/42706 [==>...........................] - ETA: 39s - loss: 0.4190
5664/42706 [==>...........................] - ETA: 39s - loss: 0.4189
5728/42706 [===>..........................] - ETA: 39s - loss: 0.4188
5792/42706 [===>..........................] - ETA: 39s - loss: 0.4189
5856/42706 [===>..........................] - ETA: 39s - loss: 0.4184
5920/42706 [===>..........................] - ETA: 38s - loss: 0.4185
5984/42706 [===>..........................] - ETA: 38s - loss: 0.4188
6048/42706 [===>..........................] - ETA: 38s - loss: 7.7892
6112/42706 [===>..........................] - ETA: 38s - loss: 8.6366
6176/42706 [===>..........................] - ETA: 38s - loss: 8.5517
6240/42706 [===>..........................] - ETA: 38s - loss: 8.4680
6304/42706 [===>..........................] - ETA: 38s - loss: 8.3862
6368/42706 [===>..........................] - ETA: 38s - loss: 8.3056
6432/42706 [===>..........................] - ETA: 38s - loss: 8.2270
6496/42706 [===>..........................] - ETA: 38s - loss: 8.1499
6560/42706 [===>..........................] - ETA: 38s - loss: 8.0738
6624/42706 [===>..........................] - ETA: 38s - loss: 7.9993
6688/42706 [===>..........................] - ETA: 38s - loss: 7.9269
6752/42706 [===>..........................] - ETA: 38s - loss: 7.8556
6816/42706 [===>..........................] - ETA: 38s - loss: 7.7855
6880/42706 [===>..........................] - ETA: 37s - loss: 7.7169
6944/42706 [===>..........................] - ETA: 37s - loss: 7.6496
7008/42706 [===>..........................] - ETA: 37s - loss: 7.5831
7072/42706 [===>..........................] - ETA: 37s - loss: 7.5183
7136/42706 [====>.........................] - ETA: 37s - loss: 7.4546
7200/42706 [====>.........................] - ETA: 37s - loss: 7.3912
7264/42706 [====>.........................] - ETA: 37s - loss: 7.3297
7328/42706 [====>.........................] - ETA: 37s - loss: 7.2693
7392/42706 [====>.........................] - ETA: 37s - loss: 7.2094
7456/42706 [====>.........................] - ETA: 37s - loss: 7.1505
7520/42706 [====>.........................] - ETA: 37s - loss: 7.0928
7584/42706 [====>.........................] - ETA: 37s - loss: 7.0363
7648/42706 [====>.........................] - ETA: 37s - loss: 6.9807
7712/42706 [====>.........................] - ETA: 37s - loss: 6.9260
7776/42706 [====>.........................] - ETA: 37s - loss: 6.8724
7840/42706 [====>.........................] - ETA: 37s - loss: 6.8196
7904/42706 [====>.........................] - ETA: 36s - loss: 6.7676
7968/42706 [====>.........................] - ETA: 36s - loss: 6.7163
8032/42706 [====>.........................] - ETA: 36s - loss: 6.6655
8096/42706 [====>.........................] - ETA: 36s - loss: 6.6160
8160/42706 [====>.........................] - ETA: 36s - loss: 6.5667
8224/42706 [====>.........................] - ETA: 36s - loss: 6.5184
8288/42706 [====>.........................] - ETA: 36s - loss: 6.4707
8352/42706 [====>.........................] - ETA: 36s - loss: 6.4239
8416/42706 [====>.........................] - ETA: 36s - loss: 6.3782
8480/42706 [====>.........................] - ETA: 36s - loss: 2378.7514
8544/42706 [=====>........................] - ETA: 36s - loss: 27760.9716
8608/42706 [=====>........................] - ETA: 36s - loss: 27755.8645
8672/42706 [=====>........................] - ETA: 36s - loss: 27978.9607
8736/42706 [=====>........................] - ETA: 36s - loss: 28032.9492
8800/42706 [=====>........................] - ETA: 35s - loss: 28025.2542
8864/42706 [=====>........................] - ETA: 35s - loss: 27902.1603
8928/42706 [=====>........................] - ETA: 35s - loss: 27837.8133
8992/42706 [=====>........................] - ETA: 35s - loss: 27830.6104
9056/42706 [=====>........................] - ETA: 35s - loss: 27731.7000
9120/42706 [=====>........................] - ETA: 35s - loss: 27630.7813
9184/42706 [=====>........................] - ETA: 35s - loss: 27768.5311
9248/42706 [=====>........................] - ETA: 35s - loss: 28076.0159

Jason Brownlee January 11, 2020 at 7:23 am #

Nice work.
Perhaps try fitting the model again to see if you get a different result?

sampath January 18, 2020 at 8:46 am #

Hi Jason,
I am trying to implement an LSTM autoencoder using the encoder-decoder architecture. What if I want to use the functional API of Keras and NOT have my decoder get the inputs from the previous step, i.e. my decoder LSTM will not have any input, just the hidden and cell state initialized from the encoder? (Because I want my encoder output to preserve all the information necessary to reconstruct the signal without giving any inputs to the decoder.) Is something like this possible in Keras?
Find out how in this free and practical course.

How to Use the TimeDistributed Layer in


Email Address
Keras REPLY 
Jason Brownlee January 18, 2020 at 8:56 am #

Yes, I believe that is the normal architecture described in the above tutorial.
If not, perhaps I don’t understand what you’re trying to achieve.

sampath January 18, 2020 at 8:59 am #

I mean with the functional API (the above mentioned is a sequential API).
This is the code from one of your articles:
def define_models(n_input, n_output, n_units):
	# define training encoder
	encoder_inputs = Input(shape=(None, n_input))
	encoder = LSTM(n_units, return_state=True)
	encoder_outputs, state_h, state_c = encoder(encoder_inputs)
	encoder_states = [state_h, state_c]
	# define training decoder
	decoder_inputs = Input(shape=(None, n_output))
	decoder_lstm = LSTM(n_units, return_sequences=True, return_state=True)
	decoder_outputs, _, _ = decoder_lstm(decoder_inputs, initial_state=encoder_states)
	decoder_dense = Dense(n_output, activation='softmax')
	decoder_outputs = decoder_dense(decoder_outputs)
	model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
	# define inference encoder
	encoder_model = Model(encoder_inputs, encoder_states)
	# define inference decoder
	decoder_state_input_h = Input(shape=(n_units,))
	decoder_state_input_c = Input(shape=(n_units,))
	decoder_states_inputs = [decoder_state_input_h, decoder_state_input_c]
	decoder_outputs, state_h, state_c = decoder_lstm(decoder_inputs, initial_state=decoder_states_inputs)
	decoder_states = [state_h, state_c]
	decoder_outputs = decoder_dense(decoder_outputs)
	decoder_model = Model([decoder_inputs] + decoder_states_inputs, [decoder_outputs] + decoder_states)
	# return all models
	return model, encoder_model, decoder_model


What if I want my 'decoder_lstm' to not have any inputs? (In this code, it is given 'decoder_inputs' as input.)

Jason Brownlee January 18, 2020 at 9:02 am #

Ah I see. Thanks.

Some experimentation will be required, I don't have an example for you.

sampath January 18, 2020 at 9:04 am #
The reason I want to use the functional API is because I want to use a stacked LSTM (multiple layers), and I want the hidden state from all layers at the last time step of the encoder. This is possible only with the functional API, right?

Jason Brownlee January 19, 2020 at 7:03 am #

Most likely, yes.
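A minimal sketch of collecting the final states from all stacked encoder layers with the functional API (the shapes and sizes are illustrative):

from keras.models import Model
from keras.layers import Input, LSTM

n_steps, n_features, n_units = 10, 1, 64
inputs = Input(shape=(n_steps, n_features))
x, h1, c1 = LSTM(n_units, return_sequences=True, return_state=True)(inputs)
x, h2, c2 = LSTM(n_units, return_state=True)(x)
# h1/c1 and h2/c2 are the final states of the two stacked layers
encoder = Model(inputs, [h1, c1, h2, c2])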


Rajnish Pandey February 3, 2020 at 11:53 pm #

Hey @Jason Brownlee, I am working on textual data; could you please explain this concept with regard to text? I am calculating errors with GloVe pre-trained vectors, but my result is not up to the mark.
Thank you in advance
Jason Brownlee February 4, 2020 at 7:55 am #

Perhaps start here:


https://machinelearningmastery.com/start-here/#nlp

Nattachai February 23, 2020 at 7:57 pm #

Hi Jason,
I am working on time series data.
Can I use an RNN autoencoder as a time series representation, like SAX or PAA?

Thank you

Jason Brownlee February 24, 2020 at 7:39 am #

Perhaps try it and see?

David March 2, 2020 at 1:20 pm #
Great article! Thanks!

Jason Brownlee March 3, 2020 at 5:54 am #
Thanks, I’m happy it helped.
Leung Lau March 9, 2020 at 3:40 pm #
Hi Jason, I have a question. Is the last 100x1 vector you printed at the end of the article the feature vector of the sequence? Can this vector later be used for, for example, sequence classification or regression? Thanks!
Jason Brownlee March 10, 2020 at 5:38 am #
In most of the examples we are reconstructing the input sequence of numeric values.
Regression, but not really.
The final example is the feature vector.
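For reference, a minimal sketch of that final example: fit a reconstruction autoencoder on a toy sequence, then keep only the encoder and use its 100-element output as a feature vector.

from numpy import array
from keras.models import Sequential, Model
from keras.layers import LSTM, RepeatVector, TimeDistributed, Dense

seq = array([0.1 * i for i in range(1, 10)]).reshape((1, 9, 1))
model = Sequential()
model.add(LSTM(100, activation='relu', input_shape=(9, 1)))
model.add(RepeatVector(9))
model.add(LSTM(100, activation='relu', return_sequences=True))
model.add(TimeDistributed(Dense(1)))
model.compile(optimizer='adam', loss='mse')
model.fit(seq, seq, epochs=300, verbose=0)

# keep the encoder only and predict the feature vector
encoder = Model(inputs=model.inputs, outputs=model.layers[0].output)
features = encoder.predict(seq)
print(features.shape)  # (1, 100)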

Han March 21, 2020 at 5:44 pm #

Hello, Dr. Jason, thanks for this useful tutorial!


I built a convolutional autoencoder (CAE); the reconstructed image from the decoder is better than the original image, and I think that if a classifier took a better image it would provide a good output.

So I want to classify the input, whether it is a bag, shoes, etc.


Is it better to:
1- delete the decoder and make the encoder a classifier? (If I do this, will it be like a normal CNN?)
2- or do the same as the "Composite LSTM Autoencoder" in this tutorial with my CAE
3- or take the output of the decoder (the better image) into a classifier

I do not know, and I am really new to the AI world; your reply would be very useful to me.
Thank you.

Jason Brownlee March 22, 2020 at 6:52 am #

You would keep the encoder and use the output of the encoder as input to a new classifier
model.

Han March 22, 2020 at 2:41 pm #

So I take the output of the encoder (maybe an 8x8 matrix) and make it the input to a model that takes the same size (8x8)? No need to connect both CNNs (encoder, classifier)?
Jason Brownlee March 23, 2020 at 6:11 am #
You can connect them if you want, or use the encoder as a feature extractor.
Typically extracted features are a 1d vector, e.g. a bottleneck layer.

Rekha March 28, 2020 at 1:05 am #
Is it possible to use autoencoders for LSTM time series prediction?

Jason Brownlee March 28, 2020 at 6:21 am #

Sure. They could extract features, then feed these features into another model to make predictions.

Rekha March 28, 2020 at 2:17 pm #

Will there be a blog post on autoencoders for LSTM time series prediction on machinelearningmastery.com?

Jason Brownlee March 29, 2020 at 5:49 am #

The above tutorial is exactly this.

Augustus Van Dusen March 29, 2020 at 5:13 am #

Jason, I ran the Prediction LSTM Autoencoder from this post and saw the following error
message:

2020-03-28 14:01:53.115186: E tensorflow/core/grappler/optimizers/dependency_optimizer.cc:697] Iteration = 0, topological sort failed with message: The graph couldn't be sorted in topological order.
2020-03-28 14:01:53.120793: E tensorflow/core/grappler/optimizers/dependency_optimizer.cc:697] Iteration = 1, topological sort failed with message: The graph couldn't be sorted in topological order.
2020-03-28 14:01:53.127457: E tensorflow/core/grappler/optimizers/meta_optimizer.cc:502] layout failed: Invalid argument: The graph couldn't be sorted in topological order.
2020-03-28 14:01:53.190262: E tensorflow/core/grappler/optimizers/meta_optimizer.cc:502] remapper failed: Invalid argument: The graph couldn't be sorted in topological order.
2020-03-28 14:01:53.194523: E tensorflow/core/grappler/optimizers/meta_optimizer.cc:502] arithmetic_optimizer failed: Invalid argument: The graph couldn't be sorted in topological order.
2020-03-28 14:01:53.198763: E tensorflow/core/grappler/optimizers/dependency_optimizer.cc:697] Iteration = 0, topological sort failed with message: The graph couldn't be sorted in topological order.
2020-03-28 14:01:53.204018: E tensorflow/core/grappler/optimizers/dependency_optimizer.cc:697] Iteration = 1, topological sort failed with message: The graph couldn't be sorted in topological order.

However, the code ran and the answer was equivalent to your answer. Have you seen this error? If so, do you know what it means?
Thanks.
Thanks. Find out how in this free and practical course.

Jason Brownlee March 29, 2020 at 6:05 am #
I have not seen these warnings before, sorry.
Perhaps try searching/posting on stackoverflow?

Chad Paik April 19, 2020 at 11:50 am #
Hello Dr. Brownlee,

I was wondering why you use the RepeatVector layer after the LSTM to match the time step size, when you can obtain the same shape of tensor by using return_sequences=True on the LSTM layer?

Thank you!

Jason Brownlee April 19, 2020 at 1:18 pm #

Perhaps that is a new addition?

Where is that in the Keras API?


https://keras.io/layers/recurrent/

Chad Paik April 25, 2020 at 8:26 am #

Hello Dr. Brownlee,


To clarify what I meant, please refer to the following code snippet I ran on TensorFlow 2.0 with eager execution enabled. (I wanted to post a screenshot, but I couldn't reply with a picture.)

inputs = np.random.random([2, 10, 1]).astype(np.float32)

x = LSTM(4, return_sequences=False)(inputs)
x = RepeatVector(10)(x)
x = LSTM(8, return_sequences=True)(x)
x = TimeDistributed(Dense(5))(x)
print(f"input1: {inputs.shape}")
print(f"output1: {x.shape}")

x = LSTM(4, return_sequences=True)(inputs)
x = LSTM(8, return_sequences=True)(x)
x = TimeDistributed(Dense(5))(x)
print(f"input2: {inputs.shape}")
print(f"output2: {x.shape}")

input1: (2, 10, 1)
output1: (2, 10, 5)
input2: (2, 10, 1)
output2: (2, 10, 5)
I have compared two architectures: the first one emulates your code, with RepeatVector after the first LSTM layer, and in the second I used return_sequences=True and did not use the RepeatVector layer. The output shapes of both networks are the same.

Going back to my original question, is there a reason why you used RepeatVector layer instead
of putting return_sequences=True on the first LSTM layer?

I hope this clarifies. Thank you!

Jason Brownlee April 25, 2020 at 1:21 pm #

Yes, it results in a different architecture called an encoder-decoder model:


https://machinelearningmastery.com/encoder-decoder-long-short-term-memory-networks/


Kyle May 4, 2020 at 9:08 pm #

I actually had the same question as Chad. That second article you linked to
does the same thing with ‘RepeatVector’ though. It seems like unless we’re using
‘return_sequence’ with the first LSTM layer (instead of using ‘repeatvector’), this
example only works when there’s a one-to-one pairing of single value outputs to input
sequences. For example, if multiple sequences could lead to a 0.9 value, I don’t see
how this could work since the encoder only uses the last frame of the sequence with
return_sequence=False. If the only argument for using “RepeatVector” is that we have to
do that to make it fit and not throw an error, then why not use return_sequence and not
throw away useful information that the encoder presumably would need? Seems like the

https://machinelearningmastery.com/lstm-Autoencoders/ 63/74
10/22/2020 A Gentle Introduction to LSTM Autoencoders

proper way to do this would be as Chad outlined above (i.e. with


Never miss a tutorial:
return_sequences=True and without simply repeating the last output so it fits).

Jason Brownlee May 5, 2020 at 6:25 am #

Not quite.

The output of the encoder is the bottleneck: it is the internal representation of the entire input sequence. We then condition each step of the output on this representation and the previously generated output step.
Kyle May 5, 2020 at 11:49 pm #
Couldn't respond in the proper spot in the thread, so sorry this is out of order, but looking into it some more, I think I see. Is it basically that, while the output of the encoder is just one element (it doesn't return the full sequence), that value could be a very precise number that would then correspond to a full sequence, which the decoder half would learn? So two different sequences ending in 0.9 could be encoded as different floats here, like 3.47 and 5.72 (chosen at random for illustrative purposes), for instance? I was experimenting with this a bit on my own, and indeed if I use return_sequences=True, there's very little memory that actually gets saved in the encoding, which makes it kinda pointless. What I really want to do is encode sequences of images into small vectors, building on the autoencoder examples here: https://blog.keras.io/building-autoencoders-in-keras.html. This has all been very helpful, so thank you.
Loving the Tutorials?

Jason Brownlee May 6, 2020 at 6:25 am #

The bottleneck is typically a vector, not a single number.

Jameson April 20, 2020 at 11:17 pm #
Hello Jason,

In this post (https://machinelearningmastery.com/develop-encoder-decoder-model-sequence-sequence-prediction-keras/), the state vectors of the decoder network are initialized with the last layer's state vectors of the encoder network, right (let's call it type 1)? However, here you only feed the decoder network's input using the output of the encoder network (repeating the output values; let's call it type 2). The initial states of the decoder network are zeroed (automatically?), similar to the initial values of the state vectors of the encoder network?

So, what is the difference between these encoder-decoder networks in terms of usage (e.g. when to choose type 1 over type 2)?

Why didn't you build the network you explain here with type 1? Or the one in section 9 of your book (where you give an example similar to what you are explaining here, the basic calculator)?
Best
Jason Brownlee April 21, 2020 at 5:57 am #
The difference in the architecture is exactly as you say, architectural.
I find both approaches are pretty much the same in practice, although the RepeatVector method is way simpler to implement. This is the approach I recommend and use in my tutorials.

Theodor Marcu April 24, 2020 at 12:50 am #
Hi Jason, it's unclear why both you and Francois Chollet (https://blog.keras.io/a-ten-minute-introduction-to-sequence-to-sequence-learning-in-keras.html) fit a training model, but don't reuse it (and the associated learned weights) in the inference step. From reading your code, I don't understand how the fitted model (the train variable) is used in the infenc/infdec models, since the train variable is never used/called again.

Jason Brownlee April 24, 2020 at 5:47 am #
The model/weights for the encoder and decoder are used during inference. Perhaps review
the code again?

Anthony The Koala May 31, 2020 at 3:40 pm #
Dear Dr Jason,
In the exercises which involve the plot_model function

from keras.utils import plot_model

That is, in order to produce a graphical model *.png file using plot_model, the Python interface may throw an error.
For Windows OS users, in order to get the graphical model via a *.png file, you will have to:
* Install the GraphViz binaries
* Set the environment path for the GraphViz program, e.g.: path=c:\program files (x86)\graphviz 2.38\bin; rest of path;
* In a command window, do the following pip:

pip install graphviz --upgrade

Your plot_model function should work.

Thank you,
Anthony of Sydney

Jason Brownlee June 1, 2020 at 6:16 am #

Great tip!

Baqar June 9, 2020 at 3:20 pm #
Hi, there’s still an error with graphviz installation. An error something like this:

stdout, stderr: b" b"'C:\\Program' is not recognized as an internal or external command, operable program or batch file.\r\n"
This can, however, be resolved using the solution provided here:
https://github.com/conda-forge/graphviz-feedstock/issues/43
How to Develop an Encoder-Decoder You can master applied Machine Learning
Thanks
Model with Attention in Keras without math or fancy degrees.
Find out how in this free and practical course.

How to Use the TimeDistributed Layer in


Iraj June 1, 2020 at 8:33 am #

Hi, and thank you for a great post.


My question is: in the composite version you have presented, it seems the forecasting works independently from the reconstruction layers. How can I make a change so that it first reconstructs the input sequence, and the forecast layers then take the extracted features and do the forecasting? Can I simply fit decoder1 first, then take the output of the encoder as input to decoder2 and forecast? I don't want to save the reconstruction phase.
Thank you
Jason Brownlee June 1, 2020 at 1:40 pm #
You’re welcome.

Why reconstruct then forecast, why not use input directly to forecast?

For example, see this:


https://machinelearningmastery.com/how-to-develop-lstm-models-for-time-series-forecasting/
Iraj June 1, 2020 at 3:31 pm #

Yes, I have seen that link as well.


I thought forecasting on extracted features may be more accurate.

Jason Brownlee June 2, 2020 at 6:10 am #


It really depends on the specifics of the model and the data. I recommend controlled experiments in order to discover what works best for your specific dataset.

Iraj June 1, 2020 at 3:45 pm #

Let me rephrase my question. I'm not sure whether the input of the forecasting part is the extracted features or the raw inputs! If it is the raw inputs, how can I use the extracted features as the input of decoder2?
Thank you

Jason Brownlee June 2, 2020 at 6:11 am #
The input to the decoder is the extracted features.


You can define a multi-input model using the functional API and have the input flow anywhere you like:
https://machinelearningmastery.com/keras-functional-api-deep-learning/
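For illustration, a minimal sketch of forecasting directly on the extracted features (the layer sizes and shapes are illustrative, not from the post):

from keras.models import Model
from keras.layers import Input, LSTM, Dense

n_steps, n_features = 10, 1
inputs = Input(shape=(n_steps, n_features))
features = LSTM(100, activation='relu')(inputs)  # encoder / feature extractor
forecast = Dense(1)(features)                    # forecasting head on the features
model = Model(inputs, forecast)
model.compile(optimizer='adam', loss='mse')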

Killian June 13, 2020 at 3:27 am #
Thanks for the excellent (as usual) post Jason.

“Regardless of the method chosen (reconstruction, prediction, or composite), once the autoencoder has
been fit, the decoder can be removed and the encoder can be kept as a standalone model.”
If your data are 2D images from a video, it may make more sense to use a 2D convolutional LSTM, as outlined in this post: https://towardsdatascience.com/prototyping-an-anomaly-detection-system-for-videos-step-by-step-using-lstm-convolutional-4e06b7dcdd29. If using this method, is it possible to extract the compressed features from the last layer of the encoder (the "bottleneck") as you have below?
>> SEE WHAT'S INSIDE

model = Model(inputs=model.inputs, outputs=model.layers[0].output)

Jason Brownlee June 13, 2020 at 6:10 am #

Perhaps try it?

weiL June 24, 2020 at 12:07 am #

Hi Jason,

Is it possible to make a Conv1D+LSTM autoencoder? I think I saw an example in PyTorch, but I am not sure if there is an example in Keras?

Jason Brownlee June 24, 2020 at 6:33 am #

I don’t see why not. Try experimenting.

Ananthakrishnan September 6, 2020 at 3:05 am #
Hi sir,

I have used the same algorithm mentioned here for sequence reconstruction.
But I have a total of 1 sample, 2205 time steps and 1 feature.
I am getting the reconstructed values as NaN while using the 'relu' activation function.
Instead of 'relu', if I use 'tanh', the reconstruction works fine without NaN, but I get the same reconstructed values everywhere, which is also an error.
Kindly help me to get the correct reconstruction values.

Jason Brownlee September 6, 2020 at 6:07 am #
Perhaps try scaling your data prior to modeling?

Ananthakrishnan September 6, 2020 at 2:32 pm #
Thank you very much for your response.
Ananthakrishnan September 7, 2020 at 12:01 am #

Hi sir,

I have tried scaling my data with a technique called normalization. Now I am not getting NaN errors, but the reconstructed values are all the same, which is an error. If I feed a total of 10 values to the LSTM autoencoder with the relu activation function, reconstruction works very well.

In my case, I need to feed in the whole 2205 time steps. What can I do to get the correct reconstruction? Kindly waiting for your reply.

Thank you

Jason Brownlee September 7, 2020 at 8:33 am #

2205 time steps is probably too much, perhaps try splitting your long sequence into
subsequences:


https://machinelearningmastery.com/handle-long-sequences-long-short-term-memory-recurrent-neural-networks/
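For this specific case, a minimal NumPy sketch of splitting the 2205-step sequence into subsequences (the 147 x 15 split is just one choice):

import numpy as np

data = np.arange(2205, dtype='float32')  # one long univariate sequence
samples = data.reshape((147, 15, 1))     # (samples, time steps, features)
print(samples.shape)  # (147, 15, 1)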

Ananthakrishnan September 28, 2020 at 10:44 pm #

Sir,
I split my data (2205 time steps) into 147 different arrays. Each array now contains 15 elements, so 147 * 15 = 2205.

Now I am able to input only one array at a time to the model (sequence reconstruction).
How do I input these 147 arrays (each containing 15 elements) at once to the above mentioned model (sequence reconstruction), so that I get the 147 reconstructed arrays at once?
Waiting for your valuable reply.
Thank you.


Jason Brownlee September 29, 2020 at 5:37 am #


This will help you prepare data for LSTMs:
https://machinelearningmastery.com/faq/single-faq/what-is-the-difference-between-samples-timesteps-and-features-for-lstm-input


Jason Brownlee September 7, 2020 at 8:25 am #
You’re welcome.
Ananthakrishnan September 7, 2020 at 5:44 pm #

Thank you very much sir


Jason Brownlee September 8, 2020 at 6:47 am #

You’re welcome.

Abhijeet Parida September 17, 2020 at 5:18 am #

A nice and super easy to read article explaining the LSTM autoencoder.


I have a clarification regarding autoencoders being self-supervised. The task of an autoencoder is to learn the compressed representation. The task does not use any kind of label and so is completely unsupervised, as opposed to self-supervised.

If the task of the autoencoder were to learn to recreate the image, then you could call it self-supervised, as you provide a created label (the same image itself) where none existed.
So I feel that the statement regarding autoencoder being self-supervised is not entirely correct.
Jason Brownlee September 17, 2020 at 6:54 am #
Thanks.
How so?
Abhishek Mane September 25, 2020 at 7:44 am #

Hi Jason,
I'm trying something different as part of my Master's research.

I'm working on predicting hourly traffic for individual bike stations (like Lime Bike or Citi Bike).
So I have this data which has a start point and end point entry and the time. I convert this into a time series for each station based on the hourly traffic at the station.

My goal with the AE-LSTM is to use all the stations' hourly data, like an RBM or AE-LSTM, where the model predicts the next hour's traffic for all stations. (So it takes into account neighbouring stations' previous-hour data and the current station's last 24 hourly time steps of traffic data.)
Now I tried to use the model from this tutorial, but I'm stuck with an error:
"ValueError: Error when checking target: expected time_distributed_5 to have 3 dimensions, but got array with shape (11221, 175)"

My input data shape is (11221, 23, 175) and my output should be something like (11221, 175).

The last LSTM layer generates the output size but due to the TimeDistributed layer I get an error.

Any thoughts you may have would be really helpful.


Jason Brownlee September 25, 2020 at 7:45 am #

You may need to reshape your target to be [11221, 175, 1], try that and let me know how
you go.

Abhishek Mane October 2, 2020 at 3:21 am #

Hi Jason,

That did not work for some reason, but after reading articles on TimeDistributedDense, I think TimeDistributedDense is more important in one-to-many and many-to-many cases (predicting multiple time steps).

So I tried with just stacked LSTM layers and a final Dense layer. It works, but I'm not sure if this method will give me good results.
I'm not sure what the RepeatVector layer actually does, but I did not include it in the LSTM-and-Dense-only architecture.
Can this architecture be called an AE-LSTM?

--------------------------------------------------------------------------------

The old AE-LSTM (with TimeDistributedDense):

model = Sequential()
model.add(LSTM(64, dropout=0.2, recurrent_dropout=0.2, input_shape=(X0_train.shape[1], X0_train.shape[2]), activation='relu'))
model.add(keras.layers.RepeatVector(n=X0_train.shape[1]))
model.add(LSTM(64, dropout=0.2, recurrent_dropout=0.2, activation='relu', return_sequences=True))
model.add(keras.layers.TimeDistributed(Dense(X0_train.shape[2])))

# Compile model
model.compile(loss='mse', optimizer='adam')

--------------------------------------------------------------------------------

Model: “sequential_1”
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
lstm_1 (LSTM)                (None, 64)                61440
_________________________________________________________________
repeat_vector_1 (RepeatVecto (None, 23, 64)            0
_________________________________________________________________
lstm_2 (LSTM)                (None, 23, 64)            33024
_________________________________________________________________
time_distributed_1 (TimeDist (None, 23, 175)           11375
=================================================================
Total params: 105,839
Trainable params: 105,839
Non-trainable params: 0

--------------------------------------------------------------------------------

and finally the error:

ValueError: Error when checking target: expected time_distributed_1 to have shape (23, 175) but got array with shape (175, 1)



Abhishek Mane October 2, 2020 at 5:02 am #

My goal here is to predict only the next hour’s values, so I think the Dense layer is good for my case.
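A minimal sketch of that many-to-one variant (stacked LSTMs plus a final Dense layer) with the shapes described above; the layer sizes and variable names are assumptions, not code from the tutorial:

from keras.models import Sequential
from keras.layers import LSTM, Dense

# input: 23 hourly timesteps for 175 stations; output: next hour for all 175
model = Sequential()
model.add(LSTM(64, input_shape=(23, 175), return_sequences=True))
model.add(LSTM(64))    # last LSTM returns a single vector, not a sequence
model.add(Dense(175))  # one prediction per station
model.compile(loss='mse', optimizer='adam')

# model.fit(X0_train, y_train) with X0_train.shape == (11221, 23, 175)
# and y_train.shape == (11221, 175); no 3D target is needed here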

Jason Brownlee October 2, 2020 at 6:03 am #

I’m eager to help, but I don’t have the capacity to review/debug your code, see this:
https://machinelearningmastery.com/faq/single-faq/can-you-read-review-or-debug-my-code
Abhishek Mane October 2, 2020 at 6:59 am #
Thank you very much Jason.

It meant a lot that you got back to me.

What I really wanted to know was what exactly the TimeDistributedDense and the RepeatVector layers do.

I found out after some deep searching.


Thanks a lot.
Sincerely,
Abhishek

Jason Brownlee October 2, 2020 at 8:11 am #

The repeat vector repeats the output of the encoder n times, where n is the number of outputs required from the decoder.

The time distributed wrapper allows you to use the same decoder for each step of the output instead of outputting a vector directly.
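A minimal sketch to make those shapes concrete (the layer sizes here are arbitrary, chosen only for illustration):

from keras.models import Sequential
from keras.layers import LSTM, Dense, RepeatVector, TimeDistributed

model = Sequential()
model.add(LSTM(8, input_shape=(5, 3)))     # encoder output: (None, 8)
model.add(RepeatVector(5))                 # repeated 5 times: (None, 5, 8)
model.add(LSTM(8, return_sequences=True))  # decoder output: (None, 5, 8)
model.add(TimeDistributed(Dense(3)))       # same Dense per step: (None, 5, 3)
model.summary()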

Taemin October 2, 2020 at 4:12 pm #

Hi,

I am figuring out the “prediction autoencoder LSTM.” I am wondering which part is the prediction, because the input is [1 2 3 … 9] and the output is [around 2, around 3, …, around 9]. I am interested in the value after 9, but this system doesn’t show that result, so it looks like just reconstruction.
So, I would appreciate it if you would let me know which part is the prediction part in this system.

Jason Brownlee October 3, 2020 at 6:05 am #

The first input is 1 and the first output is 2, etc.


Taemin October 3, 2020 at 2:30 pm #
Thank you for your reply.
But, I may have to ask again more specifically.

So, in your example, your input sequence is
[0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9] and the output sequence is
[0.1657285 0.28903174 0.40304852 0.5096578 0.6104322 0.70671254 0.7997272 0.8904342].
According to your answer, if 0.1657285 is the prediction after the input 0.1, what is the prediction after the input 0.9?
Because the last output is 0.8904342, which is the prediction after 0.8, I don’t see the prediction after the input 0.9.
How to Develop an Encoder-Decoder You can master applied Machine Learning
Thank
Model you.
with Attention in Keras without math or fancy degrees.
Find out how in this free and practical course.

Jason Brownlee October 4, 2020 at 6:50 am #

We don’t predict an output for the input of 0.9, because we don’t know the answer.
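To make the reconstruction-versus-prediction distinction concrete, here is a minimal sketch of how the prediction variant shifts its training target (the array follows the 9-step example above; the variable names are placeholders):

import numpy as np

# the 9-step input sequence from the example above
seq = np.array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9])

# reconstruction autoencoder: the target is the input itself (9 values)
recon_target = seq

# prediction autoencoder: the target is the input shifted one step ahead,
# so there are only 8 targets and no known answer after the final 0.9
pred_target = seq[1:]     # [0.2, 0.3, ..., 0.9]
print(pred_target.shape)  # (8,)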
