
Table of Contents

Why use Keras?.......................................................................................................................................13
Keras prioritizes developer experience...............................................................................................13
Keras has broad adoption in the industry and the research community..............................................13
Keras makes it easy to turn models into products...............................................................................14
Keras supports multiple backend engines and does not lock you into one ecosystem.......................14
Keras has strong multi-GPU support and distributed training support...............................................15
Keras development is backed by key companies in the deep learning ecosystem..............................15
You have just found Keras...................................................................................................................15
Multi-backend Keras and tf.keras:......................................................................................................15
Guiding principles...............................................................................................................................16
Getting started: 30 seconds to Keras...................................................................................................16
Installation...........................................................................................................................................17
Configuring your Keras backend........................................................................................................18
Support................................................................................................................................................18
Why this name, Keras?........................................................................................................................18
Getting started with the Keras Sequential model.....................................................................................19
Specifying the input shape..................................................................................................................19
Compilation.........................................................................................................................................20
Training...............................................................................................................................................21
Examples.............................................................................................................................................21
Multilayer Perceptron (MLP) for multi-class softmax classification:............................................22
MLP for binary classification:........................................................................................................22
VGG-like convnet:..........................................................................................................................23
Sequence classification with LSTM:..............................................................................................24
Sequence classification with 1D convolutions:..............................................................................24
Stacked LSTM for sequence classification.....................................................................................24
Same stacked LSTM model, rendered "stateful"............................................................................26
Getting started with the Keras functional API.........................................................................................26
First example: a densely-connected network......................................................................................27
All models are callable, just like layers...............................................................................................27
Multi-input and multi-output models..................................................................................................28
Shared layers.......................................................................................................................................30
The concept of layer "node"................................................................................................................31
More examples....................................................................................................................................33
Inception module............................................................................................................................33
Residual connection on a convolution layer...................................................................................33
Shared vision model.......................................................................................................................33
Visual question answering model...................................................................................................34
Video question answering model....................................................................................................35
Keras FAQ: Frequently Asked Keras Questions......................................................................................35
How should I cite Keras?................................................................................................................36
How can I run Keras on GPU?.......................................................................................................36
How can I run a Keras model on multiple GPUs?..........................................................................36
Data parallelism.........................................................................................................................37
Device parallelism.....................................................................................................................37
What does "sample", "batch", "epoch" mean?................................................................................37
How can I save a Keras model?......................................................................................................38
Saving/loading whole models (architecture + weights + optimizer state).................................38
Saving/loading only a model's architecture...............................................................................39
Saving/loading only a model's weights......................................................................................39
Handling custom layers (or other custom objects) in saved models..........................................40
Why is the training loss much higher than the testing loss?...........................................................40
How can I obtain the output of an intermediate layer?...................................................................40
How can I use Keras with datasets that don't fit in memory?.........................................................41
How can I interrupt training when the validation loss isn't decreasing anymore?.........................41
How is the validation split computed?............................................................................................42
Is the data shuffled during training?...............................................................................................42
How can I record the training / validation loss / accuracy at each epoch?.....................................42
How can I "freeze" Keras layers?...................................................................................................42
How can I use stateful RNNs?........................................................................................................43
How can I remove a layer from a Sequential model?.....................................................................44
How can I use pre-trained models in Keras?..................................................................................44
How can I use HDF5 inputs with Keras?.......................................................................................45
Where is the Keras configuration file stored?................................................................................45
How can I obtain reproducible results using Keras during development?.....................................46
How can I install HDF5 or h5py to save my models in Keras?.....................................................47
About Keras models.................................................................................................................................48
Model subclassing...............................................................................................................................49
The Sequential model API.......................................................................................................................50
Sequential model methods...................................................................................................................50
compile...........................................................................................................................................50
fit.....................................................................................................................................................51
evaluate...........................................................................................................................................54
predict.............................................................................................................................................55
train_on_batch................................................................................................................................56
test_on_batch..................................................................................................................................57
predict_on_batch.............................................................................................................................57
fit_generator....................................................................................................................................58
evaluate_generator..........................................................................................................................60
predict_generator............................................................................................................................61
get_layer.........................................................................................................................................62
Model class API.......................................................................................................................................62
Methods...............................................................................................................................................62
compile...........................................................................................................................................62
fit.....................................................................................................................................................64
evaluate...........................................................................................................................................66
predict.............................................................................................................................................67
train_on_batch................................................................................................................................68
test_on_batch..................................................................................................................................69
predict_on_batch.............................................................................................................................70
fit_generator....................................................................................................................................70
evaluate_generator..........................................................................................................................72
predict_generator............................................................................................................................73
get_layer.........................................................................................................................................74
About Keras layers...................................................................................................................................75
Dense..............................................................................................................................................75
Activation.......................................................................................................................................77
Dropout...........................................................................................................................................77
Flatten.............................................................................................................................................77
Input................................................................................................................................................78
Reshape...........................................................................................................................................79
Permute...........................................................................................................................................80
RepeatVector...................................................................................................................................80
Lambda...........................................................................................................................................81
ActivityRegularization....................................................................................................................82
Masking..........................................................................................................................................83
SpatialDropout1D...........................................................................................................................83
SpatialDropout2D...........................................................................................................................84
SpatialDropout3D...........................................................................................................................84
Conv1D...........................................................................................................................................85
Conv2D...........................................................................................................................................87
SeparableConv1D...........................................................................................................................88
SeparableConv2D...........................................................................................................................90
DepthwiseConv2D..........................................................................................................................91
Conv2DTranspose..........................................................................................................................93
Conv3D...........................................................................................................................................95
Conv3DTranspose..........................................................................................................................96
Cropping1D....................................................................................................................................98
Cropping2D....................................................................................................................................99
Cropping3D..................................................................................................................................100
UpSampling1D.............................................................................................................................101
UpSampling2D.............................................................................................................................101
UpSampling3D.............................................................................................................................102
ZeroPadding1D.............................................................................................................................103
ZeroPadding2D.............................................................................................................................103
ZeroPadding3D.............................................................................................................................104
MaxPooling1D..............................................................................................................................105
MaxPooling2D..............................................................................................................................106
MaxPooling3D..............................................................................................................................106
AveragePooling1D........................................................................................................................107
AveragePooling2D........................................................................................................................108
AveragePooling3D........................................................................................................................109
GlobalMaxPooling1D...................................................................................................................109
GlobalAveragePooling1D.............................................................................................................110
GlobalMaxPooling2D...................................................................................................................111
GlobalAveragePooling2D.............................................................................................................111
GlobalMaxPooling3D...................................................................................................................112
GlobalAveragePooling3D.............................................................................................................112
LocallyConnected1D....................................................................................................................113
LocallyConnected2D....................................................................................................................114
RNN..............................................................................................................................................116
SimpleRNN...................................................................................................................................118
GRU..............................................................................................................................................120
LSTM............................................................................................................................................121
ConvLSTM2D..............................................................................................................................123
ConvLSTM2DCell.......................................................................................................................125
SimpleRNNCell............................................................................................................................127
GRUCell.......................................................................................................................................127
LSTMCell.....................................................................................................................................128
CuDNNGRU................................................................................................................................130
CuDNNLSTM..............................................................................................................................130
Add...............................................................................................................................................132
Subtract.........................................................................................................................................133
Multiply........................................................................................................................................133
Average.........................................................................................................................................134
Maximum......................................................................................................................................134
Minimum......................................................................................................................................134
Concatenate...................................................................................................................................134
Dot................................................................................................................................................134
add................................................................................................................................................135
subtract..........................................................................................................................................135
multiply.........................................................................................................................................136
average..........................................................................................................................................136
maximum......................................................................................................................................137
minimum.......................................................................................................................................137
concatenate...................................................................................................................................137
dot.................................................................................................................................................138
LeakyReLU...................................................................................................................................138
PReLU..........................................................................................................................................138
ELU..............................................................................................................................................139
ThresholdedReLU.........................................................................................................................140
Softmax.........................................................................................................................................140
ReLU............................................................................................................................................140
BatchNormalization......................................................................................................................141
GaussianNoise..............................................................................................................................142
GaussianDropout..........................................................................................................................142
AlphaDropout...............................................................................................................................143
TimeDistributed............................................................................................................................144
Bidirectional.................................................................................................................................144
Writing your own Keras layers..............................................................................................................145
Sequence Preprocessing.............................................................................................................................146
TimeseriesGenerator.....................................................................................................146
pad_sequences..............................................................................................................................148
skipgrams......................................................................................................................................148
make_sampling_table...................................................................................................................149
Text Preprocessing........................................................................................................................150
Tokenizer......................................................................................................................................150
hashing_trick.................................................................................................................................151
one_hot.........................................................................................................................................151
text_to_word_sequence................................................................................................................152
Image Preprocessing..............................................................................................................................152
ImageDataGenerator class.................................................................................................................152
ImageDataGenerator methods...........................................................................................................156
apply_transform............................................................................................................................156
fit...................................................................................................................................................157
flow...............................................................................................................................................157
flow_from_dataframe...................................................................................................................158
flow_from_directory.....................................................................................................................160
get_random_transform..................................................................................................................162
random_transform.........................................................................................................................162
standardize....................................................................................................................................162
Usage of loss functions......................................................................................................................163
Available loss functions.....................................................................................................................163
mean_squared_error.....................................................................................................................163
mean_absolute_error.....................................................................................................................163
mean_absolute_percentage_error.................................................................................................163
mean_squared_logarithmic_error.................................................................................................163
squared_hinge...............................................................................................................................163
hinge.............................................................................................................................................164
categorical_hinge..........................................................................................................................164
logcosh..........................................................................................................................................164
huber_loss.....................................................................................................................................164
categorical_crossentropy..............................................................................................................164
sparse_categorical_crossentropy..................................................................................................164
binary_crossentropy......................................................................................................................165
kullback_leibler_divergence.........................................................................................................165
poisson..........................................................................................................................................165
cosine_proximity..........................................................................................................................165
is_categorical_crossentropy..........................................................................................................165
Usage of metrics................................................................................................................................165
Arguments................................................................................................................................166
Returns.....................................................................................................................................166
Available metrics...............................................................................................................................166
accuracy........................................................................................................................................166
binary_accuracy............................................................................................................................166
categorical_accuracy.....................................................................................................................166
sparse_categorical_accuracy.........................................................................................................166
top_k_categorical_accuracy.........................................................................................................167
sparse_top_k_categorical_accuracy.............................................................................................167
cosine_proximity..........................................................................................................................167
clone_metric.................................................................................................................................167
clone_metrics................................................................................................................................167
Custom metrics..................................................................................................................................167
Usage of optimizers...........................................................................................................................168
Parameters common to all Keras optimizers.....................................................................................168
SGD..............................................................................................................................................168
RMSprop......................................................................................................................................169
Adagrad.........................................................................................................................................169
Adadelta........................................................................................................................................169
Adam............................................................................................................................................170
Adamax.........................................................................................................................................170
Nadam...........................................................................................................................................171
Usage of activations..........................................................................................................................171
Available activations.........................................................................................................................172
elu.................................................................................................................................................172
softmax.........................................................................................................................................172
selu................................................................................................................................................173
softplus..........................................................................................................................................173
softsign..........................................................................................................................................173
relu................................................................................................................................................174
tanh...............................................................................................................................................174
sigmoid.........................................................................................................................................174
hard_sigmoid................................................................................................................................175
exponential....................................................................................................................................175
linear.............................................................................................................................................175
On "Advanced Activations"..............................................................................................................176
Usage of callbacks.............................................................................................................................176
Callback........................................................................................................................................176
BaseLogger...................................................................................................................................177
TerminateOnNaN..........................................................................................................................177
ProgbarLogger..............................................................................................................................177
History..........................................................................................................................................177
ModelCheckpoint.........................................................................................................................178
EarlyStopping...............................................................................................................................178
RemoteMonitor.............................................................................................................................179
LearningRateScheduler.................................................................................................................179
ReduceLROnPlateau.....................................................................................................................180
CSVLogger...................................................................................................................................180
LambdaCallback...........................................................................................................................181
TensorBoard..................................................................................................................................182
Create a callback....................................................................................................................................183
Example: recording loss history...................................................................................................183
Example: model checkpoints........................................................................................................184
Datasets..................................................................................................................................................184
CIFAR10 small image classification.................................................................................................184
Usage:...........................................................................................................................................184
CIFAR100 small image classification...............................................................................................184
Usage:...........................................................................................................................................184
IMDB Movie reviews sentiment classification.................................................................................185
Usage:...........................................................................................................................................185
Reuters newswire topics classification..............................................................................................186
Usage:...........................................................................................................................................186
MNIST database of handwritten digits..............................................................................................187
Usage:...........................................................................................................................................187
Fashion-MNIST database of fashion articles....................................................................................187
Usage:...........................................................................................................................................187
Boston housing price regression dataset...........................................................................................188
Usage:...........................................................................................................................................188
Applications...........................................................................................................................................188
Available models...............................................................................................................................188
Models for image classification with weights trained on ImageNet:...........................................188
Usage examples for image classification models..............................................................................189
Classify ImageNet classes with ResNet50...................................................................................189
Extract features with VGG16.......................................................................................................189
Extract features from an arbitrary intermediate layer with VGG19.............................................190
Fine-tune InceptionV3 on a new set of classes.............................................................................190
Build InceptionV3 over a custom input tensor.............................................................................191
Documentation for individual models....................................................................................................191
Xception............................................................................................................................................192
Arguments.....................................................................................................................................192
Returns..........................................................................................................................................193
References.....................................................................................................................................193
License..........................................................................................................................................193
VGG16..............................................................................................................................................193
Arguments.....................................................................................................................................193
Returns..........................................................................................................................................194
References.....................................................................................................................................194
License..........................................................................................................................................194
VGG19..............................................................................................................................................194
Arguments.....................................................................................................................................194
Returns..........................................................................................................................................195
References.....................................................................................................................................195
License..........................................................................................................................................195
ResNet...............................................................................................................................................195
Arguments.....................................................................................................................................196
Returns..........................................................................................................................................196
References.....................................................................................................................................196
License..........................................................................................................................................196
InceptionV3.......................................................................................................................................196
Arguments.....................................................................................................................................197
Returns..........................................................................................................................................197
References.....................................................................................................................................197
License..........................................................................................................................................197
InceptionResNetV2...........................................................................................................................197
Arguments.....................................................................................................................................198
Returns..........................................................................................................................................198
References.....................................................................................................................................198
License..........................................................................................................................................198
MobileNet..........................................................................................................................................198
Arguments.....................................................................................................................................199
Returns..........................................................................................................................................199
References.....................................................................................................................................199
License..........................................................................................................................................199
DenseNet...........................................................................................................................................200
Arguments.....................................................................................................................................200
Returns..........................................................................................................................................200
References.....................................................................................................................................201
License..........................................................................................................................................201
NASNet.............................................................................................................................................201
Arguments.....................................................................................................................................201
Returns..........................................................................................................................................202
References.....................................................................................................................................202
License..........................................................................................................................................202
MobileNetV2.....................................................................................................................................202
Arguments.....................................................................................................................................202
Returns..........................................................................................................................................203
Raises............................................................................................................................................203
References.....................................................................................................................................203
License..........................................................................................................................................203
Keras backends......................................................................................................................................203
What is a "backend"?.........................................................................................................................203
Switching from one backend to another............................................................................................204
keras.json details...............................................................................................................................205
Using the abstract Keras backend to write new code........................................................................205
Backend functions.............................................................................................................................206
backend.........................................................................................................................................206
symbolic........................................................................................................................................206
eager..............................................................................................................................................207
get_uid..........................................................................................................................................207
manual_variable_initialization.....................................................................................................207
epsilon...........................................................................................................................................208
reset_uids......................................................................................................................................208
set_epsilon....................................................................................................................................208
floatx.............................................................................................................................................208
set_floatx.......................................................................................................................................209
cast_to_floatx................................................................................................................................209
image_data_format.......................................................................................................................210
set_image_data_format.................................................................................................................210
learning_phase..............................................................................................................................210
set_learning_phase........................................................................................................................211
clear_session.................................................................................................................................211
is_sparse........................................................................................................................................211
to_dense........................................................................................................................................211
variable.........................................................................................................................................212
is_variable.....................................................................................................................................213
constant.........................................................................................................................................213
is_keras_tensor.............................................................................................................................213
is_tensor........................................................................................................................................214
placeholder....................................................................................................................................214
is_placeholder...............................................................................................................................215
shape.............................................................................................................................................215
int_shape.......................................................................................................................................215
ndim..............................................................................................................................................216
size................................................................................................................................................217
dtype.............................................................................................................................................217
eval................................................................................................................................................218
zeros..............................................................................................................................................218
ones...............................................................................................................................................219
eye.................................................................................................................................................219
zeros_like......................................................................................................................................220
ones_like.......................................................................................................................................221
identity..........................................................................................................................................221
random_uniform_variable............................................................................................................222
random_normal_variable..............................................................................................................222
count_params................................................................................................................................223
cast................................................................................................................................................223
update............................................................................................................................................224
update_add....................................................................................................................................224
update_sub....................................................................................................................................225
moving_average_update...............................................................................................................225
dot.................................................................................................................................................225
batch_dot.......................................................................................................................................226
transpose.......................................................................................................................................227
gather............................................................................................................................................228
max...............................................................................................................................................228
min................................................................................................................................................229
sum................................................................................................................................................230
prod...............................................................................................................................................230
cumsum.........................................................................................................................................231
cumprod........................................................................................................................................231
var.................................................................................................................................................231
std..................................................................................................................................................232
mean..............................................................................................................................................232
any................................................................................................................................................233
all..................................................................................................................................................233
argmax..........................................................................................................................................234
argmin...........................................................................................................................................234
square............................................................................................................................................235
abs.................................................................................................................................................235
sqrt................................................................................................................................................235
exp................................................................................................................................................235
log.................................................................................................................................................236
logsumexp.....................................................................................................................................236
round.............................................................................................................................................237
sign................................................................................................................................................237
pow...............................................................................................................................................237
clip................................................................................................................................................238
equal..............................................................................................................................................238
not_equal.......................................................................................................................................238
greater...........................................................................................................................................239
greater_equal.................................................................................................................................239
less................................................................................................................................................240
less_equal......................................................................................................................................240
maximum......................................................................................................................................240
minimum.......................................................................................................................................241
sin..................................................................................................................................................241
cos.................................................................................................................................................241
normalize_batch_in_training........................................................................................................242
batch_normalization.....................................................................................................................242
concatenate...................................................................................................................................243
reshape..........................................................................................................................................243
permute_dimensions.....................................................................................................................243
resize_images................................................................................................................................244
resize_volumes.............................................................................................................................244
repeat_elements............................................................................................................................245
repeat.............................................................................................................................................245
arange............................................................................................................................................245
tile.................................................................................................................................................246
flatten............................................................................................................................................246
batch_flatten.................................................................................................................................247
expand_dims.................................................................................................................................247
squeeze..........................................................................................................................................247
temporal_padding.........................................................................................................................248
spatial_2d_padding.......................................................................................................................248
spatial_3d_padding.......................................................................................................................248
stack..............................................................................................................................................249
one_hot.........................................................................................................................................249
reverse...........................................................................................................................................250
slice...............................................................................................................................................250
get_value.......................................................................................................................................251
batch_get_value............................................................................................................................251
set_value.......................................................................................................................................251
batch_set_value.............................................................................................................................252
print_tensor...................................................................................................................................252
function.........................................................................................................................................252
gradients........................................................................................................................................252
stop_gradient.................................................................................................................................253
rnn.................................................................................................................................................253
switch............................................................................................................................................254
in_train_phase...............................................................................................................................254
in_test_phase.................................................................................................................................255
relu................................................................................................................................................255
elu.................................................................................................................................................256
softmax.........................................................................................................................................256
softplus..........................................................................................................................................257
softsign..........................................................................................................................................257
categorical_crossentropy..............................................................................................................258
sparse_categorical_crossentropy..................................................................................................258
binary_crossentropy......................................................................................................................259
sigmoid.........................................................................................................................................259
hard_sigmoid................................................................................................................................259
tanh...............................................................................................................................................260
dropout..........................................................................................................................................260
l2_normalize.................................................................................................................................261
in_top_k........................................................................................................................................261
conv1d..........................................................................................................................................261
conv2d..........................................................................................................................................262
conv2d_transpose.........................................................................................................................262
separable_conv1d.........................................................................................................................263
separable_conv2d.........................................................................................................................263
depthwise_conv2d........................................................................................................................264
conv3d..........................................................................................................................................264
conv3d_transpose.........................................................................................................................265
pool2d...........................................................................................................................................266
pool3d...........................................................................................................................................266
local_conv1d.................................................................................................................................267
local_conv2d.................................................................................................................................267
bias_add........................................................................................................................................268
random_normal.............................................................................................................................268
random_uniform...........................................................................................................................269
random_binomial..........................................................................................................................269
truncated_normal..........................................................................................................................269
ctc_label_dense_to_sparse............................................................................................................270
ctc_batch_cost...............................................................................................................................270
ctc_decode....................................................................................................................................271
control_dependencies...................................................................................................................271
map_fn..........................................................................................................................................272
foldl...............................................................................................................................................272
foldr..............................................................................................................................................272
Usage of initializers...........................................................................................................................273
Available initializers..........................................................................................................................273
Initializer.......................................................................................................................................273
Zeros.............................................................................................................................................273
Ones..............................................................................................................................................273
Constant........................................................................................................................................273
RandomNormal.............................................................................................................................274
RandomUniform...........................................................................................................................274
TruncatedNormal..........................................................................................................................274
VarianceScaling............................................................................................................................275
Orthogonal....................................................................................................................................275
Identity..........................................................................................................................................275
lecun_uniform...............................................................................................................................276
glorot_normal...............................................................................................................................276
glorot_uniform..............................................................................................................................277
he_normal.....................................................................................................................................277
lecun_normal................................................................................................................................278
he_uniform....................................................................................................................................278
Using custom initializers...................................................................................................................279
Usage of regularizers.........................................................................................................................279
Example.............................................................................................................................................279
Available penalties.............................................................................................................................279
Developing new regularizers.............................................................................................................279
Usage of constraints..........................................................................................................................280
Available constraints.........................................................................................................................280
MaxNorm......................................................................................................................................280
NonNeg.........................................................................................................................................281
UnitNorm......................................................................................................................................281
MinMaxNorm...............................................................................................................................281
Model visualization...........................................................................................................................282
Training history visualization............................................................................................................282
Wrappers for the Scikit-Learn API.........................................................................................................283
Arguments.....................................................................................................................................283
CustomObjectScope.....................................................................................................................284
HDF5Matrix.................................................................................................................................284
Sequence.......................................................................................................................................285
to_categorical................................................................................................................................286
normalize......................................................................................................................................286
get_file..........................................................................................................................................287
print_summary..............................................................................................................................288
plot_model....................................................................................................................................288
multi_gpu_model..........................................................................................................................288
On Github Issues and Pull Requests......................................................................................................291
Bug reporting.....................................................................................................................................291
Requesting a Feature.........................................................................................................................291
Requests for Contributions................................................................................................................292
Pull Requests.....................................................................................................................................292
Note:.........................................................................................................................................292
Adding new examples.......................................................................................................................294
An implementation of sequence to sequence learning for performing addition....................................294
This example demonstrates how to write custom layers for Keras........................................................297
Trains two recurrent neural networks based upon a story and a question..............................................299
Notes.............................................................................................................................................300
Trains a memory network on the bAbI dataset......................................................................................303
Train a simple deep CNN on the CIFAR10 small images dataset.........................................................308
Trains a ResNet on the CIFAR10 dataset...............................................................................................310
Visualization of the filters of VGG16, via gradient ascent in input space.............................................318
This script demonstrates the use of a convolutional LSTM network.....................................................323
Deep Dreaming in Keras........................................................................................................................325
Optical character recognition.................................................................................................................328
Additional dependencies........................................................................................................................329
Trains a Bidirectional LSTM on the IMDB sentiment classification task.............................................338
This example demonstrates the use of Convolution1D for text classification.......................................339
Train a recurrent convolutional network on the IMDB sentiment classification task...........................340
This example demonstrates the use of fasttext for text classification....................................................342
Trains an LSTM model on the IMDB sentiment classification task......................................................344
Sequence to sequence example in Keras (character-level)....................................................................345
Restore a character-level sequence to sequence model from disk to generate predictions......................349
How to use a stateful LSTM model, stateful vs stateless LSTM performance comparison..................352
Example script to generate text from Nietzsche's writings....................................................................356
Train an Auxiliary Classifier GAN (ACGAN) on the MNIST dataset..................................................358
Why use Keras?
There are countless deep learning frameworks available today. Why use Keras rather than any other?
Here are some of the areas in which Keras compares favorably to existing alternatives.

Keras prioritizes developer experience


• Keras is an API designed for human beings, not machines. Keras follows best practices for
reducing cognitive load: it offers consistent & simple APIs, it minimizes the number of user
actions required for common use cases, and it provides clear and actionable feedback upon user
error.
• This makes Keras easy to learn and easy to use. As a Keras user, you are more productive,
allowing you to try more ideas than your competition, faster -- which in turn helps you win
machine learning competitions.
• This ease of use does not come at the cost of reduced flexibility: because Keras integrates with
lower-level deep learning languages (in particular TensorFlow), it enables you to implement
anything you could have built in the base language. In particular, as tf.keras, the Keras API
integrates seamlessly with your TensorFlow workflows.

Keras has broad adoption in the industry and the research community

Deep learning frameworks ranking computed by Jeff Hale, based on 11 data sources across 7 categories
With over 250,000 individual users as of mid-2018, Keras has stronger adoption in both the industry
and the research community than any other deep learning framework except TensorFlow itself (and the
Keras API is the official frontend of TensorFlow, via the tf.keras module).

You are already constantly interacting with features built with Keras -- it is in use at Netflix, Uber,
Yelp, Instacart, Zocdoc, Square, and many others. It is especially popular among startups that place
deep learning at the core of their products.
Keras is also a favorite among deep learning researchers, coming in #2 in terms of mentions in
scientific papers uploaded to the preprint server arXiv.org. Keras has also been adopted by researchers
at large scientific organizations, in particular CERN and NASA.

Keras makes it easy to turn models into products


Your Keras models can be easily deployed across a greater range of platforms than any other deep
learning framework:
• On iOS, via Apple’s CoreML (Keras support officially provided by Apple). Here's a tutorial.
• On Android, via the TensorFlow Android runtime. Example: Not Hotdog app.
• In the browser, via GPU-accelerated JavaScript runtimes such as Keras.js and WebDNN.
• On Google Cloud, via TensorFlow-Serving.
• In a Python webapp backend (such as a Flask app).
• On the JVM, via DL4J model import provided by SkyMind.
• On Raspberry Pi.

Keras supports multiple backend engines and does not lock you
into one ecosystem
Your Keras models can be developed with a range of different deep learning backends. Importantly,
any Keras model that only leverages built-in layers will be portable across all these backends: you can
train a model with one backend, and load it with another (e.g. for deployment). Available backends
include:
• The TensorFlow backend (from Google)
• The CNTK backend (from Microsoft)
• The Theano backend
Amazon also has a fork of Keras which uses MXNet as backend.
As such, your Keras model can be trained on a number of different hardware platforms beyond CPUs:
• NVIDIA GPUs
• Google TPUs, via the TensorFlow backend and Google Cloud
• OpenCL-enabled GPUs, such as those from AMD, via the PlaidML Keras backend
Keras has strong multi-GPU support and distributed training
support
• Keras has built-in support for multi-GPU data parallelism (see the sketch after this list)
• Horovod, from Uber, has first-class support for Keras models
• Keras models can be turned into TensorFlow Estimators and trained on clusters of GPUs on
Google Cloud
• Keras can be run on Spark via Dist-Keras (from CERN) and Elephas
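For instance, the built-in data parallelism from the first bullet is exposed as keras.utils.multi_gpu_model. A minimal sketch, assuming an existing model, training arrays x_train/y_train, and 2 available GPUs:
from keras.utils import multi_gpu_model

# Replicates `model` on 2 GPUs, splitting each incoming batch between them
# and concatenating the results on CPU.
parallel_model = multi_gpu_model(model, gpus=2)
parallel_model.compile(loss='categorical_crossentropy', optimizer='rmsprop')
parallel_model.fit(x_train, y_train, epochs=10, batch_size=256)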

Keras development is backed by key companies in the deep learning ecosystem
Keras development is backed primarily by Google, and the Keras API comes packaged in TensorFlow
as tf.keras. Additionally, Microsoft maintains the CNTK Keras backend. Amazon AWS is
maintaining the Keras fork with MXNet support. Other contributing companies include NVIDIA, Uber,
and Apple (with CoreML).

You have just found Keras.


Keras is a high-level neural networks API, written in Python and capable of running on top of
TensorFlow, CNTK, or Theano. It was developed with a focus on enabling fast experimentation. Being
able to go from idea to result with the least possible delay is key to doing good research.
Use Keras if you need a deep learning library that:
• Allows for easy and fast prototyping (through user friendliness, modularity, and extensibility).
• Supports both convolutional networks and recurrent networks, as well as combinations of the
two.
• Runs seamlessly on CPU and GPU.
Read the documentation at Keras.io.
Keras is compatible with: Python 2.7-3.6.

Multi-backend Keras and tf.keras:


At this time, we recommend that Keras users who use multi-backend Keras with the TensorFlow
backend switch to tf.keras in TensorFlow 2.0. tf.keras is better maintained and has better
integration with TensorFlow features (eager execution, distribution support, and more).
Keras 2.2.5 was the last release of Keras implementing the 2.2.* API. It was the last release to only
support TensorFlow 1 (as well as Theano and CNTK).
The current release is Keras 2.3.0, which makes significant API changes and adds support for
TensorFlow 2.0. The 2.3.0 release will be the last major release of multi-backend Keras. Multi-backend
Keras is superseded by tf.keras.
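For code that only touches the public Keras API, the switch is typically just an import change -- a minimal sketch:
# Multi-backend Keras:
# import keras

# tf.keras in TensorFlow 2.0:
from tensorflow import keras

model = keras.Sequential()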

Bugs present in multi-backend Keras will only be fixed until April 2020 (as part of minor releases).
For more information about the future of Keras, see the Keras meeting notes.

Guiding principles
• User friendliness. Keras is an API designed for human beings, not machines. It puts user
experience front and center. Keras follows best practices for reducing cognitive load: it offers
consistent & simple APIs, it minimizes the number of user actions required for common use
cases, and it provides clear and actionable feedback upon user error.
• Modularity. A model is understood as a sequence or a graph of standalone, fully configurable
modules that can be plugged together with as few restrictions as possible. In particular, neural
layers, cost functions, optimizers, initialization schemes, activation functions and regularization
schemes are all standalone modules that you can combine to create new models.
• Easy extensibility. New modules are simple to add (as new classes and functions), and existing
modules provide ample examples (see the sketch after this list). Being able to easily create new
modules allows for total expressiveness, making Keras suitable for advanced research.
• Work with Python. No separate model configuration files in a declarative format. Models are
described in Python code, which is compact, easier to debug, and allows for ease of
extensibility.
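As a small illustration of that extensibility, a custom loss is just a Python function; here we re-implement something like the built-in mean squared logarithmic error (the name custom_msle is ours, for illustration only):
import keras.backend as K

def custom_msle(y_true, y_pred):
    # Mean squared difference of the log-transformed values.
    return K.mean(K.square(K.log(y_pred + 1.) - K.log(y_true + 1.)), axis=-1)

# It can then be passed to compile() like any built-in loss:
# model.compile(optimizer='sgd', loss=custom_msle)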

Getting started: 30 seconds to Keras


The core data structure of Keras is a model, a way to organize layers. The simplest type of model is the
Sequential model, a linear stack of layers. For more complex architectures, you should use the
Keras functional API, which allows you to build arbitrary graphs of layers.
Here is the Sequential model:
from keras.models import Sequential

model = Sequential()

Stacking layers is as easy as .add():
from keras.layers import Dense

model.add(Dense(units=64, activation='relu', input_dim=100))
model.add(Dense(units=10, activation='softmax'))

Once your model looks good, configure its learning process with .compile():
model.compile(loss='categorical_crossentropy',
              optimizer='sgd',
              metrics=['accuracy'])

If you need to, you can further configure your optimizer. A core principle of Keras is to make things
reasonably simple, while allowing the user to be fully in control when they need to (the ultimate control
being the easy extensibility of the source code).
model.compile(loss=keras.losses.categorical_crossentropy,
              optimizer=keras.optimizers.SGD(lr=0.01, momentum=0.9, nesterov=True))

You can now iterate on your training data in batches:


# x_train and y_train are Numpy arrays --just like in the Scikit-Learn API.
model.fit(x_train, y_train, epochs=5, batch_size=32)

Alternatively, you can feed batches to your model manually:


model.train_on_batch(x_batch, y_batch)

Evaluate your performance in one line:


loss_and_metrics = model.evaluate(x_test, y_test, batch_size=128)

Or generate predictions on new data:


classes = model.predict(x_test, batch_size=128)

Building a question answering system, an image classification model, a Neural Turing Machine, or any
other model is just as fast. The ideas behind deep learning are simple, so why should their
implementation be painful?
For a more in-depth tutorial about Keras, you can check out:
• Getting started with the Sequential model
• Getting started with the functional API
In the examples folder of the repository, you will find more advanced models: question-answering with
memory networks, text generation with stacked LSTMs, etc.

Installation
Before installing Keras, please install one of its backend engines: TensorFlow, Theano, or CNTK. We
recommend the TensorFlow backend.
• TensorFlow installation instructions.
• Theano installation instructions.
• CNTK installation instructions.
You may also consider installing the following optional dependencies:
• cuDNN (recommended if you plan on running Keras on GPU).
• HDF5 and h5py (required if you plan on saving Keras models to disk).
• graphviz and pydot (used by visualization utilities to plot model graphs).
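The Python-level packages among these can typically be installed with pip (graphviz itself is a system package, installed separately, e.g. via your OS package manager):
pip install h5py pydot
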
Then, you can install Keras itself. There are two ways to install Keras:
• Install Keras from PyPI (recommended):
Note: These installation steps assume that you are on a Linux or Mac environment. If you are on
Windows, you will need to remove sudo to run the commands below.
sudo pip install keras

If you are using a virtualenv, you may want to avoid using sudo:
pip install keras

• Alternatively: install Keras from the GitHub source:


First, clone Keras using git:
git clone https://github.com/keras-team/keras.git

Then, cd to the Keras folder and run the install command:


cd keras
sudo python setup.py install

Configuring your Keras backend


By default, Keras will use TensorFlow as its tensor manipulation library. Follow these instructions to
configure the Keras backend.
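For reference, these settings live in a JSON file at $HOME/.keras/keras.json; a typical file looks like this (the backend field is what those instructions change):
{
    "image_data_format": "channels_last",
    "epsilon": 1e-07,
    "floatx": "float32",
    "backend": "tensorflow"
}

You can also override the backend for a single run by setting the KERAS_BACKEND environment variable before importing Keras.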

Support
You can ask questions and join the development discussion:
• On the Keras Google group.
• On the Keras Slack channel. Use this link to request an invitation to the channel.
You can also post bug reports and feature requests (only) in GitHub issues. Make sure to read our
guidelines first.

Why this name, Keras?


Keras (κέρας) means horn in Greek. It is a reference to a literary image from ancient Greek and Latin
literature, first found in the Odyssey, where dream spirits (Oneiroi, singular Oneiros) are divided
between those who deceive men with false visions, who arrive to Earth through a gate of ivory, and
those who announce a future that will come to pass, who arrive through a gate of horn. It's a play on the
words κέρας (horn) / κραίνω (fulfill), and ἐλέφας (ivory) / ἐλεφαίρομαι (deceive).
Keras was initially developed as part of the research effort of project ONEIROS (Open-ended Neuro-
Electronic Intelligent Robot Operating System).
"Oneiroi are beyond our unravelling --who can be sure what tale they tell? Not all that men
look for comes to pass. Two gates there are that give passage to fleeting Oneiroi; one is
made of horn, one of ivory. The Oneiroi that pass through sawn ivory are deceitful, bearing
a message that will not be fulfilled; those that come out through polished horn have truth
behind them, to be accomplished for men who see them." Homer, Odyssey 19. 562 ff
(Shewring translation).

Getting started with the Keras Sequential model


The Sequential model is a linear stack of layers.

You can create a Sequential model by passing a list of layer instances to the constructor:
from keras.models import Sequential
from keras.layers import Dense, Activation

model = Sequential([
    Dense(32, input_shape=(784,)),
    Activation('relu'),
    Dense(10),
    Activation('softmax'),
])

You can also simply add layers via the .add() method:
model = Sequential()
model.add(Dense(32, input_dim=784))
model.add(Activation('relu'))

Specifying the input shape


The model needs to know what input shape it should expect. For this reason, the first layer in a
Sequential model (and only the first, because following layers can do automatic shape inference)
needs to receive information about its input shape. There are several possible ways to do this:
• Pass an input_shape argument to the first layer. This is a shape tuple (a tuple of integers or
None entries, where None indicates that any positive integer may be expected). In
input_shape, the batch dimension is not included.
• Some 2D layers, such as Dense, support the specification of their input shape via the argument
input_dim, and some 3D temporal layers support the arguments input_dim and
input_length.
• If you ever need to specify a fixed batch size for your inputs (this is useful for stateful recurrent
networks), you can pass a batch_size argument to a layer. If you pass both
batch_size=32 and input_shape=(6, 8) to a layer, it will then expect every batch of
inputs to have the batch shape (32, 6, 8) (see the sketch below).

As such, the following snippets are strictly equivalent:


model = Sequential()
model.add(Dense(32, input_shape=(784,)))

model = Sequential()
model.add(Dense(32, input_dim=784))
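
And if you ever need the fixed batch size described in the last bullet above, a minimal sketch (the model will then only accept batches of exactly 32 samples):
model = Sequential()
model.add(Dense(32, batch_size=32, input_shape=(6, 8)))
# Equivalent to passing batch_input_shape=(32, 6, 8).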

Compilation
Before training a model, you need to configure the learning process, which is done via the compile
method. It receives three arguments:
• An optimizer. This could be the string identifier of an existing optimizer (such as rmsprop or
adagrad), or an instance of the Optimizer class. See: optimizers.
• A loss function. This is the objective that the model will try to minimize. It can be the string
identifier of an existing loss function (such as categorical_crossentropy or mse), or it
can be an objective function. See: losses.
• A list of metrics. For any classification problem you will want to set this to
metrics=['accuracy']. A metric could be the string identifier of an existing metric or a
custom metric function. See: metrics.
# For a multi-class classification problem
model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

# For a binary classification problem
model.compile(optimizer='rmsprop',
              loss='binary_crossentropy',
              metrics=['accuracy'])

# For a mean squared error regression problem
model.compile(optimizer='rmsprop',
              loss='mse')

# For custom metrics
import keras.backend as K

def mean_pred(y_true, y_pred):
    return K.mean(y_pred)

model.compile(optimizer='rmsprop',
              loss='binary_crossentropy',
              metrics=['accuracy', mean_pred])

Training
Keras models are trained on Numpy arrays of input data and labels. For training a model, you will
typically use the fit function. Read its documentation here.
# For a single-input model with 2 classes (binary classification):

model = Sequential()
model.add(Dense(32, activation='relu', input_dim=100))
model.add(Dense(1, activation='sigmoid'))
model.compile(optimizer='rmsprop',
              loss='binary_crossentropy',
              metrics=['accuracy'])

# Generate dummy data
import numpy as np
data = np.random.random((1000, 100))
labels = np.random.randint(2, size=(1000, 1))

# Train the model, iterating on the data in batches of 32 samples
model.fit(data, labels, epochs=10, batch_size=32)

# For a single-input model with 10 classes (categorical classification):

import keras
model = Sequential()
model.add(Dense(32, activation='relu', input_dim=100))
model.add(Dense(10, activation='softmax'))
model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

# Generate dummy data
import numpy as np
data = np.random.random((1000, 100))
labels = np.random.randint(10, size=(1000, 1))

# Convert labels to categorical one-hot encoding
one_hot_labels = keras.utils.to_categorical(labels, num_classes=10)

# Train the model, iterating on the data in batches of 32 samples
model.fit(data, one_hot_labels, epochs=10, batch_size=32)

Examples
Here are a few examples to get you started!
In the examples folder, you will also find example models for real datasets:
• CIFAR10 small images classification: Convolutional Neural Network (CNN) with realtime data
augmentation
• IMDB movie review sentiment classification: LSTM over sequences of words
• Reuters newswires topic classification: Multilayer Perceptron (MLP)
• MNIST handwritten digits classification: MLP & CNN
• Character-level text generation with LSTM
...and more.

Multilayer Perceptron (MLP) for multi-class softmax classification:


import keras
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation
from keras.optimizers import SGD

# Generate dummy data
import numpy as np
x_train = np.random.random((1000, 20))
y_train = keras.utils.to_categorical(np.random.randint(10, size=(1000, 1)),
                                     num_classes=10)
x_test = np.random.random((100, 20))
y_test = keras.utils.to_categorical(np.random.randint(10, size=(100, 1)),
                                    num_classes=10)

model = Sequential()
# Dense(64) is a fully-connected layer with 64 hidden units.
# in the first layer, you must specify the expected input data shape:
# here, 20-dimensional vectors.
model.add(Dense(64, activation='relu', input_dim=20))
model.add(Dropout(0.5))
model.add(Dense(64, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(10, activation='softmax'))

sgd = SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(loss='categorical_crossentropy',
              optimizer=sgd,
              metrics=['accuracy'])

model.fit(x_train, y_train,
          epochs=20,
          batch_size=128)
score = model.evaluate(x_test, y_test, batch_size=128)

MLP for binary classification:


import numpy as np
from keras.models import Sequential
from keras.layers import Dense, Dropout

# Generate dummy data
x_train = np.random.random((1000, 20))
y_train = np.random.randint(2, size=(1000, 1))
x_test = np.random.random((100, 20))
y_test = np.random.randint(2, size=(100, 1))

model = Sequential()
model.add(Dense(64, input_dim=20, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(64, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(1, activation='sigmoid'))

model.compile(loss='binary_crossentropy',
              optimizer='rmsprop',
              metrics=['accuracy'])

model.fit(x_train, y_train,
          epochs=20,
          batch_size=128)
score = model.evaluate(x_test, y_test, batch_size=128)

VGG-like convnet:
import numpy as np
import keras
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten
from keras.layers import Conv2D, MaxPooling2D
from keras.optimizers import SGD

# Generate dummy data
x_train = np.random.random((100, 100, 100, 3))
y_train = keras.utils.to_categorical(np.random.randint(10, size=(100, 1)),
                                     num_classes=10)
x_test = np.random.random((20, 100, 100, 3))
y_test = keras.utils.to_categorical(np.random.randint(10, size=(20, 1)),
                                    num_classes=10)

model = Sequential()
# input: 100x100 images with 3 channels -> (100, 100, 3) tensors.
# this applies 32 convolution filters of size 3x3 each.
model.add(Conv2D(32, (3, 3), activation='relu', input_shape=(100, 100, 3)))
model.add(Conv2D(32, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))

model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))

model.add(Flatten())
model.add(Dense(256, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(10, activation='softmax'))

sgd = SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(loss='categorical_crossentropy', optimizer=sgd)
model.fit(x_train, y_train, batch_size=32, epochs=10)
score = model.evaluate(x_test, y_test, batch_size=32)

Sequence classification with LSTM:


from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras.layers import Embedding
from keras.layers import LSTM

max_features = 1024

model = Sequential()
model.add(Embedding(max_features, output_dim=256))
model.add(LSTM(128))
model.add(Dropout(0.5))
model.add(Dense(1, activation='sigmoid'))

model.compile(loss='binary_crossentropy',
              optimizer='rmsprop',
              metrics=['accuracy'])

model.fit(x_train, y_train, batch_size=16, epochs=10)
score = model.evaluate(x_test, y_test, batch_size=16)

Sequence classification with 1D convolutions:


from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras.layers import Embedding
from keras.layers import Conv1D, GlobalAveragePooling1D, MaxPooling1D

seq_length = 64

model = Sequential()
model.add(Conv1D(64, 3, activation='relu', input_shape=(seq_length, 100)))
model.add(Conv1D(64, 3, activation='relu'))
model.add(MaxPooling1D(3))
model.add(Conv1D(128, 3, activation='relu'))
model.add(Conv1D(128, 3, activation='relu'))
model.add(GlobalAveragePooling1D())
model.add(Dropout(0.5))
model.add(Dense(1, activation='sigmoid'))

model.compile(loss='binary_crossentropy',
              optimizer='rmsprop',
              metrics=['accuracy'])

model.fit(x_train, y_train, batch_size=16, epochs=10)
score = model.evaluate(x_test, y_test, batch_size=16)

Stacked LSTM for sequence classification


In this model, we stack 3 LSTM layers on top of each other, making the model capable of learning
higher-level temporal representations.
The first two LSTMs return their full output sequences, but the last one only returns the last step in its
output sequence, thus dropping the temporal dimension (i.e. converting the input sequence into a single
vector).

from keras.models import Sequential
from keras.layers import LSTM, Dense
import numpy as np

data_dim = 16
timesteps = 8
num_classes = 10

# expected input data shape: (batch_size, timesteps, data_dim)
model = Sequential()
model.add(LSTM(32, return_sequences=True,
               input_shape=(timesteps, data_dim)))  # returns a sequence of vectors of dimension 32
model.add(LSTM(32, return_sequences=True))  # returns a sequence of vectors of dimension 32
model.add(LSTM(32))  # returns a single vector of dimension 32
model.add(Dense(10, activation='softmax'))

model.compile(loss='categorical_crossentropy',
              optimizer='rmsprop',
              metrics=['accuracy'])

# Generate dummy training data
x_train = np.random.random((1000, timesteps, data_dim))
y_train = np.random.random((1000, num_classes))

# Generate dummy validation data
x_val = np.random.random((100, timesteps, data_dim))
y_val = np.random.random((100, num_classes))
model.fit(x_train, y_train,
          batch_size=64, epochs=5,
          validation_data=(x_val, y_val))

Same stacked LSTM model, rendered "stateful"


A stateful recurrent model is one for which the internal states (memories) obtained after processing a
batch of samples are reused as initial states for the samples of the next batch. This allows you to
process longer sequences while keeping computational complexity manageable.
You can read more about stateful RNNs in the FAQ.
from keras.models import Sequential
from keras.layers import LSTM, Dense
import numpy as np

data_dim = 16
timesteps = 8
num_classes = 10
batch_size = 32

# Expected input batch shape: (batch_size, timesteps, data_dim)
# Note that we have to provide the full batch_input_shape since the network is stateful:
# the sample of index i in batch k is the follow-up for the sample of index i in batch k-1.
model = Sequential()
model.add(LSTM(32, return_sequences=True, stateful=True,
               batch_input_shape=(batch_size, timesteps, data_dim)))
model.add(LSTM(32, return_sequences=True, stateful=True))
model.add(LSTM(32, stateful=True))
model.add(Dense(10, activation='softmax'))

model.compile(loss='categorical_crossentropy',
              optimizer='rmsprop',
              metrics=['accuracy'])

# Generate dummy training data
x_train = np.random.random((batch_size * 10, timesteps, data_dim))
y_train = np.random.random((batch_size * 10, num_classes))

# Generate dummy validation data
x_val = np.random.random((batch_size * 3, timesteps, data_dim))
y_val = np.random.random((batch_size * 3, num_classes))

model.fit(x_train, y_train,
          batch_size=batch_size, epochs=5, shuffle=False,
          validation_data=(x_val, y_val))

Getting started with the Keras functional API


The Keras functional API is the way to go for defining complex models, such as multi-output models,
directed acyclic graphs, or models with shared layers.
This guide assumes that you are already familiar with the Sequential model.

Let's start with something simple.

First example: a densely-connected network


The Sequential model is probably a better choice to implement such a network, but it helps to start
with something really simple.
• A layer instance is callable (on a tensor), and it returns a tensor
• Input tensor(s) and output tensor(s) can then be used to define a Model
• Such a model can be trained just like Keras Sequential models.
from keras.layers import Input, Dense
from keras.models import Model

# This returns a tensor
inputs = Input(shape=(784,))

# a layer instance is callable on a tensor, and returns a tensor
output_1 = Dense(64, activation='relu')(inputs)
output_2 = Dense(64, activation='relu')(output_1)
predictions = Dense(10, activation='softmax')(output_2)

# This creates a model that includes
# the Input layer and three Dense layers
model = Model(inputs=inputs, outputs=predictions)
model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
model.fit(data, labels)  # starts training

All models are callable, just like layers


With the functional API, it is easy to reuse trained models: you can treat any model as if it were a layer,
by calling it on a tensor. Note that by calling a model you aren't just reusing the architecture of the
model, you are also reusing its weights.
x = Input(shape=(784,))
# This works, and returns the 10-way softmax we defined above.
y = model(x)

This allows you, for instance, to quickly create models that can process sequences of inputs. You could
turn an image classification model into a video classification model in just one line.
from keras.layers import TimeDistributed

# Input tensor for sequences of 20 timesteps,
# each containing a 784-dimensional vector
input_sequences = Input(shape=(20, 784))

# This applies our previous model to every timestep in the input sequences.
# the output of the previous model was a 10-way softmax,
# so the output of the layer below will be a sequence of 20 vectors of size 10.
processed_sequences = TimeDistributed(model)(input_sequences)

Multi-input and multi-output models


Here's a good use case for the functional API: models with multiple inputs and outputs. The functional
API makes it easy to manipulate a large number of intertwined datastreams.
Let's consider the following model. We seek to predict how many retweets and likes a news headline
will receive on Twitter. The main input to the model will be the headline itself, as a sequence of words,
but to spice things up, our model will also have an auxiliary input, receiving extra data such as the time
of day when the headline was posted, etc. The model will also be supervised via two loss functions.
Using the main loss function earlier in a model is a good regularization mechanism for deep models.
Here's what our model looks like:
Let's implement it with the functional API.
The main input will receive the headline, as a sequence of integers (each integer encodes a word). The
integers will be between 1 and 10,000 (a vocabulary of 10,000 words) and the sequences will be 100
words long.
import keras
from keras.layers import Input, Embedding, LSTM, Dense
from keras.models import Model
import numpy as np
np.random.seed(0)  # Set a random seed for reproducibility

# Headline input: meant to receive sequences of 100 integers, between 1 and 10000.
# Note that we can name any layer by passing it a "name" argument.
main_input = Input(shape=(100,), dtype='int32', name='main_input')

# This embedding layer will encode the input sequence
# into a sequence of dense 512-dimensional vectors.
x = Embedding(output_dim=512, input_dim=10000, input_length=100)(main_input)

# An LSTM will transform the vector sequence into a single vector,
# containing information about the entire sequence
lstm_out = LSTM(32)(x)

Here we insert the auxiliary loss, allowing the LSTM and Embedding layer to be trained smoothly even
though the main loss will be much higher in the model.
auxiliary_output = Dense(1, activation='sigmoid', name='aux_output')(lstm_out)

At this point, we feed into the model our auxiliary input data by concatenating it with the LSTM
output:
auxiliary_input = Input(shape=(5,), name='aux_input')
x = keras.layers.concatenate([lstm_out, auxiliary_input])

# We stack a deep densely-connected network on top
x = Dense(64, activation='relu')(x)
x = Dense(64, activation='relu')(x)
x = Dense(64, activation='relu')(x)

# And finally we add the main logistic regression layer
main_output = Dense(1, activation='sigmoid', name='main_output')(x)

This defines a model with two inputs and two outputs:


model = Model(inputs=[main_input, auxiliary_input], outputs=[main_output, auxiliary_output])

We compile the model and assign a weight of 0.2 to the auxiliary loss. To specify different
loss_weights or loss for each different output, you can use a list or a dictionary. Here we pass a
single loss as the loss argument, so the same loss will be used on all outputs.
model.compile(optimizer='rmsprop', loss='binary_crossentropy',
              loss_weights=[1., 0.2])

We can train the model by passing it lists of input arrays and target arrays:
headline_data = np.round(np.abs(np.random.rand(12, 100) * 100))
additional_data = np.random.randn(12, 5)
headline_labels = np.random.randn(12, 1)
additional_labels = np.random.randn(12, 1)
model.fit([headline_data, additional_data], [headline_labels, additional_labels],
          epochs=50, batch_size=32)

Since our inputs and outputs are named (we passed them a "name" argument), we could also have
compiled the model via:
model.compile(optimizer='rmsprop',
              loss={'main_output': 'binary_crossentropy', 'aux_output': 'binary_crossentropy'},
              loss_weights={'main_output': 1., 'aux_output': 0.2})

# And trained it via:
model.fit({'main_input': headline_data, 'aux_input': additional_data},
          {'main_output': headline_labels, 'aux_output': additional_labels},
          epochs=50, batch_size=32)

To use the model for inference, use


model.predict({'main_input': headline_data, 'aux_input': additional_data})

or alternatively,
pred = model.predict([headline_data, additional_data])

Shared layers
Another good use for the functional API is models that use shared layers. Let's take a look at shared
layers.
Let's consider a dataset of tweets. We want to build a model that can tell whether two tweets are from
the same person or not (this can allow us to compare users by the similarity of their tweets, for
instance).
One way to achieve this is to build a model that encodes two tweets into two vectors, concatenates the
vectors and then adds a logistic regression; this outputs a probability that the two tweets share the same
author. The model would then be trained on positive tweet pairs and negative tweet pairs.
Because the problem is symmetric, the mechanism that encodes the first tweet should be reused
(weights and all) to encode the second tweet. Here we use a shared LSTM layer to encode the tweets.
Let's build this with the functional API. We will take as input for a tweet a binary matrix of shape
(280, 256), i.e. a sequence of 280 vectors of size 256, where each dimension in the 256-
dimensional vector encodes the presence/absence of a character (out of an alphabet of 256 frequent
characters).
import keras
from keras.layers import Input, LSTM, Dense
from keras.models import Model

tweet_a = Input(shape=(280, 256))
tweet_b = Input(shape=(280, 256))

To share a layer across different inputs, simply instantiate the layer once, then call it on as many inputs
as you want:
# This layer can take as input a matrix
# and will return a vector of size 64
shared_lstm = LSTM(64)

# When we reuse the same layer instance
# multiple times, the weights of the layer
# are also being reused
# (it is effectively *the same* layer)
encoded_a = shared_lstm(tweet_a)
encoded_b = shared_lstm(tweet_b)

# We can then concatenate the two vectors:
merged_vector = keras.layers.concatenate([encoded_a, encoded_b], axis=-1)

# And add a logistic regression on top
predictions = Dense(1, activation='sigmoid')(merged_vector)

# We define a trainable model linking the
# tweet inputs to the predictions
model = Model(inputs=[tweet_a, tweet_b], outputs=predictions)

model.compile(optimizer='rmsprop',
              loss='binary_crossentropy',
              metrics=['accuracy'])

# data_a, data_b and labels are your paired-tweet training data (not defined above)
model.fit([data_a, data_b], labels, epochs=10)

Let's pause to take a look at how to read the shared layer's output or output shape.

The concept of layer "node"


Whenever you are calling a layer on some input, you are creating a new tensor (the output of the layer),
and you are adding a "node" to the layer, linking the input tensor to the output tensor. When you are
calling the same layer multiple times, that layer owns multiple nodes indexed as 0, 1, 2...
In previous versions of Keras, you could obtain the output tensor of a layer instance via
layer.get_output(), or its output shape via layer.output_shape. You still can (except
get_output() has been replaced by the property output). But what if a layer is connected to
multiple inputs?
As long as a layer is only connected to one input, there is no confusion, and .output will return the
one output of the layer:
a = Input(shape=(280, 256))

lstm = LSTM(32)
encoded_a = lstm(a)

assert lstm.output == encoded_a

Not so if the layer has multiple inputs:


a = Input(shape=(280, 256))
b = Input(shape=(280, 256))

lstm = LSTM(32)
encoded_a = lstm(a)
encoded_b = lstm(b)

lstm.output

>> AttributeError: Layer lstm_1 has multiple inbound nodes,
hence the notion of "layer output" is ill-defined.
Use `get_output_at(node_index)` instead.

Okay then. The following works:


assert lstm.get_output_at(0) == encoded_a
assert lstm.get_output_at(1) == encoded_b

Simple enough, right?


The same is true for the properties input_shape and output_shape: as long as the layer has only
one node, or as long as all nodes have the same input/output shape, then the notion of "layer
output/input shape" is well defined, and that one shape will be returned by layer.output_shape/
layer.input_shape. But if, for instance, you apply the same Conv2D layer to an input of shape
(32, 32, 3), and then to an input of shape (64, 64, 3), the layer will have multiple
input/output shapes, and you will have to fetch them by specifying the index of the node they belong
to:
a = Input(shape=(32, 32, 3))
b = Input(shape=(64, 64, 3))

conv = Conv2D(16, (3, 3), padding='same')
conved_a = conv(a)

# Only one input so far, the following will work:
assert conv.input_shape == (None, 32, 32, 3)

conved_b = conv(b)
# now the `.input_shape` property wouldn't work, but this does:
assert conv.get_input_shape_at(0) == (None, 32, 32, 3)
assert conv.get_input_shape_at(1) == (None, 64, 64, 3)

More examples
Code examples are still the best way to get started, so here are a few more.

Inception module
For more information about the Inception architecture, see Going Deeper with Convolutions.
import keras
from keras.layers import Conv2D, MaxPooling2D, Input

input_img = Input(shape=(256, 256, 3))

tower_1 = Conv2D(64, (1, 1), padding='same', activation='relu')(input_img)
tower_1 = Conv2D(64, (3, 3), padding='same', activation='relu')(tower_1)

tower_2 = Conv2D(64, (1, 1), padding='same', activation='relu')(input_img)
tower_2 = Conv2D(64, (5, 5), padding='same', activation='relu')(tower_2)

tower_3 = MaxPooling2D((3, 3), strides=(1, 1), padding='same')(input_img)
tower_3 = Conv2D(64, (1, 1), padding='same', activation='relu')(tower_3)

# Concatenate along the channel axis (axis 3 for channels_last data)
output = keras.layers.concatenate([tower_1, tower_2, tower_3], axis=3)

Residual connection on a convolution layer


For more information about residual networks, see Deep Residual Learning for Image Recognition.
import keras
from keras.layers import Conv2D, Input

# input tensor for a 3-channel 256x256 image
x = Input(shape=(256, 256, 3))
# 3x3 conv with 3 output channels (same as input channels)
y = Conv2D(3, (3, 3), padding='same')(x)
# this returns x + y.
z = keras.layers.add([x, y])

Shared vision model


This model reuses the same image-processing module on two inputs, to classify whether two MNIST
digits are the same digit or different digits.
import keras
from keras.layers import Conv2D, MaxPooling2D, Input, Dense, Flatten
from keras.models import Model

# First, define the vision modules
digit_input = Input(shape=(27, 27, 1))
x = Conv2D(64, (3, 3))(digit_input)
x = Conv2D(64, (3, 3))(x)
x = MaxPooling2D((2, 2))(x)
out = Flatten()(x)

vision_model = Model(digit_input, out)

# Then define the tell-digits-apart model
digit_a = Input(shape=(27, 27, 1))
digit_b = Input(shape=(27, 27, 1))
# The vision model will be shared, weights and all
out_a = vision_model(digit_a)
out_b = vision_model(digit_b)

concatenated = keras.layers.concatenate([out_a, out_b])
out = Dense(1, activation='sigmoid')(concatenated)

classification_model = Model([digit_a, digit_b], out)

Visual question answering model


This model can select the correct one-word answer when asked a natural-language question about a
picture.
It works by encoding the question into a vector, encoding the image into a vector, concatenating the
two, and training on top a logistic regression over some vocabulary of potential answers.
import keras
from keras.layers import Conv2D, MaxPooling2D, Flatten
from keras.layers import Input, LSTM, Embedding, Dense
from keras.models import Model, Sequential

# First, let's define a vision model using a Sequential model.
# This model will encode an image into a vector.
vision_model = Sequential()
vision_model.add(Conv2D(64, (3, 3), activation='relu', padding='same',
                        input_shape=(224, 224, 3)))
vision_model.add(Conv2D(64, (3, 3), activation='relu'))
vision_model.add(MaxPooling2D((2, 2)))
vision_model.add(Conv2D(128, (3, 3), activation='relu', padding='same'))
vision_model.add(Conv2D(128, (3, 3), activation='relu'))
vision_model.add(MaxPooling2D((2, 2)))
vision_model.add(Conv2D(256, (3, 3), activation='relu', padding='same'))
vision_model.add(Conv2D(256, (3, 3), activation='relu'))
vision_model.add(Conv2D(256, (3, 3), activation='relu'))
vision_model.add(MaxPooling2D((2, 2)))
vision_model.add(Flatten())

# Now let's get a tensor with the output of our vision model:
image_input = Input(shape=(224, 224, 3))
encoded_image = vision_model(image_input)

# Next, let's define a language model to encode the question into a vector.
# Each question will be at most 100 words long,
# and we will index words as integers from 1 to 9999.
question_input = Input(shape=(100,), dtype='int32')
embedded_question = Embedding(input_dim=10000, output_dim=256, input_length=100)(question_input)
encoded_question = LSTM(256)(embedded_question)

# Let's concatenate the question vector and the image vector:
merged = keras.layers.concatenate([encoded_question, encoded_image])

# And let's train a logistic regression over 1000 words on top:
output = Dense(1000, activation='softmax')(merged)

# This is our final model:
vqa_model = Model(inputs=[image_input, question_input], outputs=output)

# The next stage would be training this model on actual data.

Video question answering model


Now that we have trained our image QA model, we can quickly turn it into a video QA model. With
appropriate training, you will be able to show it a short video (e.g. 100-frame human action) and ask a
natural language question about the video (e.g. "what sport is the boy playing?" -> "football").
from keras.layers import TimeDistributed

video_input = Input(shape=(100, 224, 224, 3))

# This is our video encoded via the previously trained vision_model (weights are reused)
encoded_frame_sequence = TimeDistributed(vision_model)(video_input)  # the output will be a sequence of vectors
encoded_video = LSTM(256)(encoded_frame_sequence)  # the output will be a vector

# This is a model-level representation of the question encoder, reusing the same weights as before:
question_encoder = Model(inputs=question_input, outputs=encoded_question)

# Let's use it to encode the question:
video_question_input = Input(shape=(100,), dtype='int32')
encoded_video_question = question_encoder(video_question_input)

# And this is our video question answering model:
merged = keras.layers.concatenate([encoded_video, encoded_video_question])
output = Dense(1000, activation='softmax')(merged)
video_qa_model = Model(inputs=[video_input, video_question_input], outputs=output)

Keras FAQ: Frequently Asked Keras Questions


• How should I cite Keras?
• How can I run Keras on GPU?
• How can I run a Keras model on multiple GPUs?
• What does "sample", "batch", "epoch" mean?
• How can I save a Keras model?
• Why is the training loss much higher than the testing loss?
• How can I obtain the output of an intermediate layer?
• How can I use Keras with datasets that don't fit in memory?
• How can I interrupt training when the validation loss isn't decreasing anymore?
• How is the validation split computed?
• Is the data shuffled during training?
• How can I record the training / validation loss / accuracy at each epoch?
• How can I "freeze" layers?
• How can I use stateful RNNs?
• How can I remove a layer from a Sequential model?
• How can I use pre-trained models in Keras?
• How can I use HDF5 inputs with Keras?
• Where is the Keras configuration file stored?
• How can I obtain reproducible results using Keras during development?
• How can I install HDF5 or h5py to save my models in Keras?

How should I cite Keras?


Please cite Keras in your publications if it helps your research. Here is an example BibTeX entry:
@misc{chollet2015keras,
  title={Keras},
  author={Chollet, Fran\c{c}ois and others},
  year={2015},
  howpublished={\url{https://keras.io}},
}

How can I run Keras on GPU?


If you are running on the TensorFlow or CNTK backends, your code will automatically run on GPU if
any available GPU is detected.
If you are running on the Theano backend, you can use one of the following methods:
Method 1: use Theano flags.
THEANO_FLAGS=device=gpu,floatX=float32 python my_keras_script.py

The name 'gpu' might have to be changed depending on your device's identifier (e.g. gpu0, gpu1, etc).

Method 2: set up your .theanorc (see the Theano configuration instructions).

Method 3: manually set theano.config.device and theano.config.floatX at the beginning of your code:
import theano
theano.config.device = 'gpu'
theano.config.floatX = 'float32'

How can I run a Keras model on multiple GPUs?


We recommend doing so using the TensorFlow backend. There are two ways to run a single model on
multiple GPUs: data parallelism and device parallelism.
In most cases, what you need is most likely data parallelism.
Data parallelism
Data parallelism consists in replicating the target model once on each device, and using each replica to
process a different fraction of the input data. Keras has a built-in utility,
keras.utils.multi_gpu_model, which can produce a data-parallel version of any model, and
achieves quasi-linear speedup on up to 8 GPUs.
For more information, see the documentation for multi_gpu_model. Here is a quick example:
from keras.utils import multi_gpu_model

# Replicates `model` on 8 GPUs.
# This assumes that your machine has 8 available GPUs.
parallel_model = multi_gpu_model(model, gpus=8)
parallel_model.compile(loss='categorical_crossentropy',
                       optimizer='rmsprop')

# This `fit` call will be distributed on 8 GPUs.
# Since the batch size is 256, each GPU will process 32 samples.
parallel_model.fit(x, y, epochs=20, batch_size=256)

Device parallelism
Device parallelism consists in running different parts of a same model on different devices. It works
best for models that have a parallel architecture, e.g. a model with two branches.
This can be achieved by using TensorFlow device scopes. Here is a quick example:
# Model where a shared LSTM is used to encode two different sequences in parallel
import keras
import tensorflow as tf

input_a = keras.Input(shape=(140, 256))
input_b = keras.Input(shape=(140, 256))

shared_lstm = keras.layers.LSTM(64)

# Process the first sequence on one GPU
with tf.device('/gpu:0'):
    encoded_a = shared_lstm(input_a)
# Process the next sequence on another GPU
with tf.device('/gpu:1'):
    encoded_b = shared_lstm(input_b)

# Concatenate results on CPU
with tf.device('/cpu:0'):
    merged_vector = keras.layers.concatenate([encoded_a, encoded_b],
                                             axis=-1)

What does "sample", "batch", "epoch" mean?


Below are some common definitions you need to know to use Keras correctly:
• Sample: one element of a dataset.
  • Example: one image is a sample in a convolutional network.
  • Example: one audio file is a sample for a speech recognition model.
• Batch: a set of N samples. The samples in a batch are processed independently, in parallel. If
training, a batch results in only one update to the model.
• A batch generally approximates the distribution of the input data better than a single input. The
larger the batch, the better the approximation; however, it is also true that the batch will take
longer to process and will still result in only one update. For inference (evaluate/predict), it is
recommended to pick a batch size that is as large as you can afford without going out of
memory (since larger batches will usually result in faster evaluation/prediction).
• Epoch: an arbitrary cutoff, generally defined as "one pass over the entire dataset", used to
separate training into distinct phases, which is useful for logging and periodic evaluation.
• When using validation_data or validation_split with the fit method of Keras
models, evaluation will be run at the end of every epoch.
• Within Keras, there is the ability to add callbacks specifically designed to be run at the end of an
epoch. Examples of these are learning rate changes and model checkpointing (saving).
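To make the batch/epoch arithmetic concrete, here is a minimal sketch (the data shapes are
illustrative, and `model` is assumed to be an already-compiled model accepting 20 input features):
import numpy as np

x = np.random.rand(1000, 20)   # 1000 samples, 20 features each
y = np.random.rand(1000, 1)

# With batch_size=100, one epoch = 1000 / 100 = 10 batches = 10 weight updates,
# so epochs=5 performs 50 updates in total.
model.fit(x, y, batch_size=100, epochs=5)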

How can I save a Keras model?


Saving/loading whole models (architecture + weights + optimizer state)
It is not recommended to use pickle or cPickle to save a Keras model.
You can use model.save(filepath) to save a Keras model into a single HDF5 file which will
contain:
• the architecture of the model, allowing you to re-create the model
• the weights of the model
• the training configuration (loss, optimizer)
• the state of the optimizer, allowing you to resume training exactly where you left off.
You can then use keras.models.load_model(filepath) to reinstantiate your model.
load_model will also take care of compiling the model using the saved training configuration
(unless the model was never compiled in the first place).
Example:
from keras.models import load_model

model.save('my_model.h5')  # creates an HDF5 file 'my_model.h5'
del model  # deletes the existing model

# returns a compiled model
# identical to the previous one
model = load_model('my_model.h5')

Please also see How can I install HDF5 or h5py to save my models in Keras? for instructions on how to
install h5py.
Saving/loading only a model's architecture
If you only need to save the architecture of a model, and not its weights or its training configuration,
you can do:
# save as JSON
json_string = model.to_json()

# save as YAML
yaml_string = model.to_yaml()

The generated JSON / YAML files are human-readable and can be manually edited if needed.
You can then build a fresh model from this data:
# model reconstruction from JSON:
from keras.models import model_from_json
model = model_from_json(json_string)

# model reconstruction from YAML:
from keras.models import model_from_yaml
model = model_from_yaml(yaml_string)

Saving/loading only a model's weights


If you need to save the weights of a model, you can do so in HDF5 with the code below:
model.save_weights('my_model_weights.h5')

Assuming you have code for instantiating your model, you can then load the weights you saved into a
model with the same architecture:
model.load_weights('my_model_weights.h5')

If you need to load the weights into a different architecture (with some layers in common), for instance
for fine-tuning or transfer-learning, you can load them by layer name:
model.load_weights('my_model_weights.h5', by_name=True)

Example:
"""
Assuming the original model looks like this:
model = Sequential()
model.add(Dense(2, input_dim=3, name='dense_1'))
model.add(Dense(3, name='dense_2'))
...
model.save_weights(fname)
"""

# new model
model = Sequential()
model.add(Dense(2, input_dim=3, name='dense_1')) # will be loaded
model.add(Dense(10, name='new_dense')) # will not be loaded

# load weights from first model; will only affect the first layer, dense_1.
model.load_weights(fname, by_name=True)

Please also see How can I install HDF5 or h5py to save my models in Keras? for instructions on how to
install h5py.

Handling custom layers (or other custom objects) in saved models


If the model you want to load includes custom layers or other custom classes or functions, you can pass
them to the loading mechanism via the custom_objects argument:
from keras.models import load_model
# Assuming your model includes instance of an "AttentionLayer" class
model = load_model('my_model.h5', custom_objects={'AttentionLayer':
AttentionLayer})

Alternatively, you can use a custom object scope:


from keras.utils import CustomObjectScope

with CustomObjectScope({'AttentionLayer': AttentionLayer}):
    model = load_model('my_model.h5')

Custom objects handling works the same way for load_model, model_from_json,
model_from_yaml:
from keras.models import model_from_json
model = model_from_json(json_string, custom_objects={'AttentionLayer': AttentionLayer})

Why is the training loss much higher than the testing loss?
A Keras model has two modes: training and testing. Regularization mechanisms, such as Dropout and
L1/L2 weight regularization, are turned off at testing time.
Besides, the training loss is the average of the losses over each batch of training data. Because your
model is changing over time, the loss over the first batches of an epoch is generally higher than over
the last batches. On the other hand, the testing loss for an epoch is computed using the model as it is at
the end of the epoch, resulting in a lower loss.

How can I obtain the output of an intermediate layer?


One simple way is to create a new Model that will output the layers that you are interested in:
from keras.models import Model

model = ... # create the original model

layer_name = 'my_layer'
intermediate_layer_model = Model(inputs=model.input,
outputs=model.get_layer(layer_name).output)
intermediate_output = intermediate_layer_model.predict(data)

Alternatively, you can build a Keras function that will return the output of a certain layer given a
certain input, for example:
from keras import backend as K

# with a Sequential model
get_3rd_layer_output = K.function([model.layers[0].input],
                                  [model.layers[3].output])
layer_output = get_3rd_layer_output([x])[0]

Similarly, you could build a Theano or TensorFlow function directly.


Note that if your model has a different behavior in training and testing phase (e.g. if it uses Dropout,
BatchNormalization, etc.), you will need to pass the learning phase flag to your function:
get_3rd_layer_output = K.function([model.layers[0].input, K.learning_phase()],
                                  [model.layers[3].output])

# output in test mode = 0
layer_output = get_3rd_layer_output([x, 0])[0]

# output in train mode = 1
layer_output = get_3rd_layer_output([x, 1])[0]

How can I use Keras with datasets that don't fit in memory?
You can do batch training using model.train_on_batch(x, y) and
model.test_on_batch(x, y). See the models documentation.

Alternatively, you can write a generator that yields batches of training data and use the method
model.fit_generator(data_generator, steps_per_epoch, epochs).

You can see batch training in action in our CIFAR10 example.
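Here is a minimal sketch of such a generator (x_train and y_train are hypothetical in-memory
arrays, standing in for data you would stream from disk in practice):
import numpy as np

def batch_generator(x, y, batch_size=32):
    # fit_generator expects the generator to loop over its data indefinitely;
    # an epoch ends after steps_per_epoch batches have been drawn.
    while True:
        indices = np.random.permutation(len(x))
        for start in range(0, len(x), batch_size):
            batch = indices[start:start + batch_size]
            yield x[batch], y[batch]

model.fit_generator(batch_generator(x_train, y_train, batch_size=32),
                    steps_per_epoch=len(x_train) // 32,
                    epochs=10)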

How can I interrupt training when the validation loss isn't decreasing anymore?
You can use an EarlyStopping callback:
from keras.callbacks import EarlyStopping
early_stopping = EarlyStopping(monitor='val_loss', patience=2)
model.fit(x, y, validation_split=0.2, callbacks=[early_stopping])

Find out more in the callbacks documentation.


How is the validation split computed?
If you set the validation_split argument in model.fit to e.g. 0.1, then the validation data
used will be the last 10% of the data. If you set it to 0.25, it will be the last 25% of the data, etc. Note
that the data isn't shuffled before extracting the validation split, so the validation is literally just the last
x% of samples in the input you passed.
The same validation set is used for all epochs (within the same call to fit).
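For example (the array sizes here are hypothetical): with 1000 samples and
validation_split=0.25, the last 250 samples are held out for validation and never trained on:
model.fit(x, y, validation_split=0.25)  # trains on x[:750], validates on x[750:]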

Is the data shuffled during training?


Yes, if the shuffle argument in model.fit is set to True (which is the default), the training data
will be randomly shuffled at each epoch.
Validation data is never shuffled.

How can I record the training / validation loss / accuracy at each epoch?
The model.fit method returns a History callback, which has a history attribute containing the
lists of successive losses and other metrics.
hist = model.fit(x, y, validation_split=0.2)
print(hist.history)

How can I "freeze" Keras layers?


To "freeze" a layer means to exclude it from training, i.e. its weights will never be updated. This is
useful in the context of fine-tuning a model, or using fixed embeddings for a text input.
You can pass a trainable argument (boolean) to a layer constructor to set a layer to be non-
trainable:
frozen_layer = Dense(32, trainable=False)

Additionally, you can set the trainable property of a layer to True or False after instantiation.
For this to take effect, you will need to call compile() on your model after modifying the
trainable property. Here's an example:
x = Input(shape=(32,))
layer = Dense(32)
layer.trainable = False
y = layer(x)

frozen_model = Model(x, y)
# in the model below, the weights of `layer` will not be updated during training
frozen_model.compile(optimizer='rmsprop', loss='mse')
layer.trainable = True
trainable_model = Model(x, y)
# with this model the weights of the layer will be updated during training
# (which will also affect the above model since it uses the same layer instance)
trainable_model.compile(optimizer='rmsprop', loss='mse')

frozen_model.fit(data, labels) # this does NOT update the weights of `layer`


trainable_model.fit(data, labels) # this updates the weights of `layer`

How can I use stateful RNNs?


Making an RNN stateful means that the states for the samples of each batch will be reused as initial
states for the samples in the next batch.
When using stateful RNNs, it is therefore assumed that:
• all batches have the same number of samples
• If x1 and x2 are successive batches of samples, then x2[i] is the follow-up sequence to
x1[i], for every i.

To use statefulness in RNNs, you need to:


• explicitly specify the batch size you are using, by passing a batch_size argument to the first
layer in your model. E.g. batch_size=32 for a 32-samples batch of sequences of 10
timesteps with 16 features per timestep.
• set stateful=True in your RNN layer(s).
• specify shuffle=False when calling fit().

To reset the states accumulated:


• use model.reset_states() to reset the states of all layers in the model
• use layer.reset_states() to reset the states of a specific stateful RNN layer

Example:
x # this is our input data, of shape (32, 21, 16)
# we will feed it to our model in sequences of length 10

model = Sequential()
model.add(LSTM(32, input_shape=(10, 16), batch_size=32, stateful=True))
model.add(Dense(16, activation='softmax'))

model.compile(optimizer='rmsprop', loss='categorical_crossentropy')

# we train the network to predict the 11th timestep given the first 10:
model.train_on_batch(x[:, :10, :], np.reshape(x[:, 10, :], (32, 16)))

# the state of the network has changed. We can feed the follow-up sequences:
model.train_on_batch(x[:, 10:20, :], np.reshape(x[:, 20, :], (32, 16)))

# let's reset the states of the LSTM layer:
model.reset_states()

# another way to do it in this case:
model.layers[0].reset_states()

Note that the methods predict, fit, train_on_batch, predict_classes, etc. will all
update the states of the stateful layers in a model. This allows you to do not only stateful training, but
also stateful prediction.

How can I remove a layer from a Sequential model?


You can remove the last added layer in a Sequential model by calling .pop():
model = Sequential()
model.add(Dense(32, activation='relu', input_dim=784))
model.add(Dense(32, activation='relu'))

print(len(model.layers)) # "2"

model.pop()
print(len(model.layers)) # "1"

How can I use pre-trained models in Keras?


Code and pre-trained weights are available for the following image classification models:
• Xception
• VGG16
• VGG19
• ResNet
• ResNet v2
• ResNeXt
• Inception v3
• Inception-ResNet v2
• MobileNet v1
• MobileNet v2
• DenseNet
• NASNet
They can be imported from the module keras.applications:
from keras.applications.xception import Xception
from keras.applications.vgg16 import VGG16
from keras.applications.vgg19 import VGG19
from keras.applications.resnet import ResNet50
from keras.applications.resnet import ResNet101
from keras.applications.resnet import ResNet152
from keras.applications.resnet_v2 import ResNet50V2
from keras.applications.resnet_v2 import ResNet101V2
from keras.applications.resnet_v2 import ResNet152V2
from keras.applications.resnext import ResNeXt50
from keras.applications.resnext import ResNeXt101
from keras.applications.inception_v3 import InceptionV3
from keras.applications.inception_resnet_v2 import InceptionResNetV2
from keras.applications.mobilenet import MobileNet
from keras.applications.mobilenet_v2 import MobileNetV2
from keras.applications.densenet import DenseNet121
from keras.applications.densenet import DenseNet169
from keras.applications.densenet import DenseNet201
from keras.applications.nasnet import NASNetLarge
from keras.applications.nasnet import NASNetMobile

model = VGG16(weights='imagenet', include_top=True)

For a few simple usage examples, see the documentation for the Applications module.
For a detailed example of how to use such a pre-trained model for feature extraction or for fine-tuning,
see this blog post.
The VGG16 model is also the basis for several Keras example scripts:
• Style transfer
• Feature visualization
• Deep dream

How can I use HDF5 inputs with Keras?


You can use the HDF5Matrix class from keras.utils. See the HDF5Matrix documentation for
details.
You can also directly use a HDF5 dataset:
import h5py
with h5py.File('input/file.hdf5', 'r') as f:
    x_data = f['x_data']
    model.predict(x_data)

Please also see How can I install HDF5 or h5py to save my models in Keras? for instructions on how to
install h5py.

Where is the Keras configuration file stored?


The default directory where all Keras data is stored is:
$HOME/.keras/
Note that Windows users should replace $HOME with %USERPROFILE%. In case Keras cannot create
the above directory (e.g. due to permission issues), /tmp/.keras/ is used as a backup.

The Keras configuration file is a JSON file stored at $HOME/.keras/keras.json. The default
configuration file looks like this:
{
    "image_data_format": "channels_last",
    "epsilon": 1e-07,
    "floatx": "float32",
    "backend": "tensorflow"
}

It contains the following fields:


• The image data format to be used as default by image processing layers and utilities (either
channels_last or channels_first).
• The epsilon numerical fuzz factor to be used to prevent division by zero in some operations.
• The default float data type.
• The default backend. See the backend documentation.
Likewise, cached dataset files, such as those downloaded with get_file(), are stored by default in
$HOME/.keras/datasets/.
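As a quick illustration (the file name and origin URL are just an example), get_file downloads a
file once and reuses the cached copy on later calls:
from keras.utils import get_file

# Downloads on first call; later calls reuse the cached copy under
# $HOME/.keras/datasets/ (the printed path below is hypothetical).
path = get_file('mnist.npz',
                origin='https://s3.amazonaws.com/img-datasets/mnist.npz')
print(path)  # e.g. /home/user/.keras/datasets/mnist.npz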

How can I obtain reproducible results using Keras during development?


During development of a model, sometimes it is useful to be able to obtain reproducible results from
run to run in order to determine if a change in performance is due to an actual model or data
modification, or merely a result of a new random sample.
First, you need to set the PYTHONHASHSEED environment variable to 0 before the program starts (not
within the program itself). This is necessary in Python 3.2.3 onwards to have reproducible behavior for
certain hash-based operations (e.g., the item order in a set or a dict, see Python's documentation or issue
#2280 for further details). One way to set the environment variable is when starting python like this:
$ cat test_hash.py
print(hash("keras"))
$ python3 test_hash.py # non-reproducible hash (Python 3.2.3+)
-8127205062320133199
$ python3 test_hash.py # non-reproducible hash (Python 3.2.3+)
3204480642156461591
$ PYTHONHASHSEED=0 python3 test_hash.py # reproducible hash
4883664951434749476
$ PYTHONHASHSEED=0 python3 test_hash.py # reproducible hash
4883664951434749476

Moreover, when using the TensorFlow backend and running on a GPU, some operations have non-
deterministic outputs, in particular tf.reduce_sum(). This is due to the fact that GPUs run many
operations in parallel, so the order of execution is not always guaranteed. Due to the limited precision
of floats, even adding several numbers together may give slightly different results depending on the
order in which you add them. You can try to avoid the non-deterministic operations, but some may be
created automatically by TensorFlow to compute the gradients, so it is much simpler to just run the
code on the CPU. For this, you can set the CUDA_VISIBLE_DEVICES environment variable to an
empty string, for example:
$ CUDA_VISIBLE_DEVICES="" PYTHONHASHSEED=0 python your_program.py

The below snippet of code provides an example of how to obtain reproducible results - this is geared
towards a TensorFlow backend for a Python 3 environment:
import numpy as np
import tensorflow as tf
import random as rn

# The below is necessary for starting Numpy generated random numbers
# in a well-defined initial state.
np.random.seed(42)

# The below is necessary for starting core Python generated random numbers
# in a well-defined state.
rn.seed(12345)

# Force TensorFlow to use single thread.
# Multiple threads are a potential source of non-reproducible results.
# For further details, see: https://stackoverflow.com/questions/42022950/
session_conf = tf.ConfigProto(intra_op_parallelism_threads=1,
                              inter_op_parallelism_threads=1)

from keras import backend as K

# The below tf.set_random_seed() will make random number generation
# in the TensorFlow backend have a well-defined initial state.
# For further details, see:
# https://www.tensorflow.org/api_docs/python/tf/set_random_seed
tf.set_random_seed(1234)

sess = tf.Session(graph=tf.get_default_graph(), config=session_conf)
K.set_session(sess)

# Rest of code follows ...

How can I install HDF5 or h5py to save my models in Keras?


In order to save your Keras models as HDF5 files, e.g. via
keras.callbacks.ModelCheckpoint, Keras uses the h5py Python package. It is a dependency
of Keras and should be installed by default. On Debian-based distributions, you will have to
additionally install libhdf5:
sudo apt-get install libhdf5-serial-dev

If you are unsure whether h5py is installed, you can open a Python shell and load the module via
import h5py

If it imports without error, it is installed; otherwise, you can find detailed installation instructions here:
http://docs.h5py.org/en/latest/build.html

About Keras models


There are two main types of models available in Keras: the Sequential model, and the Model class used
with the functional API.
These models have a number of methods and attributes in common:
• model.layers is a flattened list of the layers comprising the model.
• model.inputs is the list of input tensors of the model.
• model.outputs is the list of output tensors of the model.
• model.summary() prints a summary representation of your model. For layers with multiple
outputs, multiple is displayed instead of each individual output shape due to size limitations.
It is a shortcut for utils.print_summary.
• model.get_config() returns a dictionary containing the configuration of the model. The
model can be reinstantiated from its config via:
config = model.get_config()
model = Model.from_config(config)
# or, for Sequential:
model = Sequential.from_config(config)

• model.get_weights() returns a list of all weight tensors in the model, as Numpy arrays.
• model.set_weights(weights) sets the values of the weights of the model, from a list of
Numpy arrays. The arrays in the list should have the same shape as those returned by
get_weights() (a round-trip sketch follows this list).
• model.to_json() returns a representation of the model as a JSON string. Note that the
representation does not include the weights, only the architecture. You can reinstantiate the
same model (with reinitialized weights) from the JSON string via:
from keras.models import model_from_json

json_string = model.to_json()
model = model_from_json(json_string)
• model.to_yaml() returns a representation of the model as a YAML string. Note that the
representation does not include the weights, only the architecture. You can reinstantiate the
same model (with reinitialized weights) from the YAML string via:
from keras.models import model_from_yaml

yaml_string = model.to_yaml()
model = model_from_yaml(yaml_string)

• model.save_weights(filepath) saves the weights of the model as an HDF5 file.
• model.load_weights(filepath, by_name=False) loads the weights of the model
from an HDF5 file (created by save_weights). By default, the architecture is expected to be
unchanged. To load weights into a different architecture (with some layers in common), use
by_name=True to load only those layers with the same name.

Note: Please also see How can I install HDF5 or h5py to save my models in Keras? in the FAQ for
instructions on how to install h5py.
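As a minimal round-trip sketch for get_weights/set_weights (model_a and model_b are
hypothetical models that share the same architecture):
weights = model_a.get_weights()  # list of Numpy arrays
model_b.set_weights(weights)     # model_b now has model_a's weights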

Model subclassing
In addition to these two types of models, you may create your own fully-customizable models by
subclassing the Model class and implementing your own forward pass in the call method (the
Model subclassing API was introduced in Keras 2.2.0).

Here's an example of a simple multi-layer perceptron model written as a Model subclass:


import keras

class SimpleMLP(keras.Model):

    def __init__(self, use_bn=False, use_dp=False, num_classes=10):
        super(SimpleMLP, self).__init__(name='mlp')
        self.use_bn = use_bn
        self.use_dp = use_dp
        self.num_classes = num_classes

        self.dense1 = keras.layers.Dense(32, activation='relu')
        self.dense2 = keras.layers.Dense(num_classes, activation='softmax')
        if self.use_dp:
            self.dp = keras.layers.Dropout(0.5)
        if self.use_bn:
            self.bn = keras.layers.BatchNormalization(axis=-1)

    def call(self, inputs):
        x = self.dense1(inputs)
        if self.use_dp:
            x = self.dp(x)
        if self.use_bn:
            x = self.bn(x)
        return self.dense2(x)

model = SimpleMLP()
model.compile(...)
model.fit(...)

Layers are defined in __init__(self, ...), and the forward pass is specified in call(self,
inputs). In call, you may specify custom losses by calling self.add_loss(loss_tensor)
(like you would in a custom layer).
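For illustration, here is a minimal sketch of a subclassed model that adds a custom loss in call
(the penalty coefficient and layer sizes are arbitrary choices for this example, not prescribed above):
import keras
from keras import backend as K

class RegularizedMLP(keras.Model):

    def __init__(self, num_classes=10):
        super(RegularizedMLP, self).__init__(name='reg_mlp')
        self.dense1 = keras.layers.Dense(32, activation='relu')
        self.dense2 = keras.layers.Dense(num_classes, activation='softmax')

    def call(self, inputs):
        x = self.dense1(inputs)
        # Hypothetical custom loss: penalize large hidden activations.
        self.add_loss(1e-4 * K.sum(K.square(x)))
        return self.dense2(x)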
In subclassed models, the model's topology is defined as Python code (rather than as a static graph of
layers). That means the model's topology cannot be inspected or serialized. As a result, the following
methods and attributes are not available for subclassed models:
• model.inputs and model.outputs.
• model.to_yaml() and model.to_json()
• model.get_config() and model.save().

Key point: use the right API for the job. The Model subclassing API can provide you with greater
flexibility for implementing complex models, but it comes at a cost (in addition to these missing
features): it is more verbose, more complex, and has more opportunities for user errors. If possible,
prefer the functional API, which is more user-friendly.

The Sequential model API


To get started, read this guide to the Keras Sequential model.

Sequential model methods


compile
compile(optimizer, loss=None, metrics=None, loss_weights=None,
sample_weight_mode=None, weighted_metrics=None, target_tensors=None)

Configures the model for training.


Arguments
• optimizer: String (name of optimizer) or optimizer instance. See optimizers.
• loss: String (name of objective function) or objective function or Loss instance. See losses. If
the model has multiple outputs, you can use a different loss on each output by passing a
dictionary or a list of losses. The loss value that will be minimized by the model will then be the
sum of all individual losses.
• metrics: List of metrics to be evaluated by the model during training and testing. Typically you
will use metrics=['accuracy']. To specify different metrics for different outputs of a
multi-output model, you could also pass a dictionary, such as metrics={'output_a':
'accuracy', 'output_b': ['accuracy', 'mse']}. You can also pass a list (len
= len(outputs)) of lists of metrics such as metrics=[['accuracy'], ['accuracy',
'mse']] or metrics=['accuracy', ['accuracy', 'mse']].
• loss_weights: Optional list or dictionary specifying scalar coefficients (Python floats) to weight
the loss contributions of different model outputs. The loss value that will be minimized by the
model will then be the weighted sum of all individual losses, weighted by the loss_weights
coefficients. If a list, it is expected to have a 1:1 mapping to the model's outputs. If a dict, it is
expected to map output names (strings) to scalar coefficients.
• sample_weight_mode: If you need to do timestep-wise sample weighting (2D weights), set this
to "temporal". None defaults to sample-wise weights (1D). If the model has multiple
outputs, you can use a different sample_weight_mode on each output by passing a
dictionary or a list of modes.
• weighted_metrics: List of metrics to be evaluated and weighted by sample_weight or
class_weight during training and testing.
• target_tensors: By default, Keras will create placeholders for the model's target, which will be
fed with the target data during training. If instead you would like to use your own target tensors
(in turn, Keras will not expect external Numpy data for these targets at training time), you can
specify them via the target_tensors argument. It can be a single tensor (for a single-
output model), a list of tensors, or a dict mapping output names to target tensors.
• **kwargs: When using the Theano/CNTK backends, these arguments are passed into
K.function. When using the TensorFlow backend, these arguments are passed into
tf.Session.run.

Raises
• ValueError: In case of invalid arguments for optimizer, loss, metrics or
sample_weight_mode.
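Here is a quick example of a typical call (the optimizer and loss choices are illustrative):
model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])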

fit
fit(x=None, y=None, batch_size=None, epochs=1, verbose=1, callbacks=None,
validation_split=0.0, validation_data=None, shuffle=True, class_weight=None,
sample_weight=None, initial_epoch=0, steps_per_epoch=None, validation_steps=None,
validation_freq=1, max_queue_size=10, workers=1, use_multiprocessing=False)

Trains the model for a fixed number of epochs (iterations on a dataset).


Arguments
• x: Input data. It could be:
• A Numpy array (or array-like), or a list of arrays (in case the model has multiple inputs).
• A dict mapping input names to the corresponding array/tensors, if the model has named
inputs.
• A generator or keras.utils.Sequence returning (inputs, targets) or
(inputs, targets, sample weights).
• None (default) if feeding from framework-native tensors (e.g. TensorFlow data tensors).
• y: Target data. Like the input data x, it could be either Numpy array(s), framework-native
tensor(s), list of Numpy arrays (if the model has multiple outputs) or None (default) if feeding
from framework-native tensors (e.g. TensorFlow data tensors). If output layers in the model are
named, you can also pass a dictionary mapping output names to Numpy arrays. If x is a
generator, or keras.utils.Sequence instance, y should not be specified (since targets
will be obtained from x).
• batch_size: Integer or None. Number of samples per gradient update. If unspecified,
batch_size will default to 32. Do not specify the batch_size if your data is in the form
of symbolic tensors, generators, or Sequence instances (since they generate batches).
• epochs: Integer. Number of epochs to train the model. An epoch is an iteration over the entire x
and y data provided. Note that in conjunction with initial_epoch, epochs is to be
understood as "final epoch". The model is not trained for a number of iterations given by
epochs, but merely until the epoch of index epochs is reached.
• verbose: Integer. 0, 1, or 2. Verbosity mode. 0 = silent, 1 = progress bar, 2 = one line per epoch.
• callbacks: List of keras.callbacks.Callback instances. List of callbacks to apply
during training and validation (if validation is performed). See callbacks.
• validation_split: Float between 0 and 1. Fraction of the training data to be used as validation
data. The model will set apart this fraction of the training data, will not train on it, and will
evaluate the loss and any model metrics on this data at the end of each epoch. The validation
data is selected from the last samples in the x and y data provided, before shuffling. This
argument is not supported when x is a generator or Sequence instance.
• validation_data: Data on which to evaluate the loss and any model metrics at the end of each
epoch. The model will not be trained on this data. validation_data will override
validation_split. validation_data could be:
  • a tuple (x_val, y_val) of Numpy arrays or tensors
  • a tuple (x_val, y_val, val_sample_weights) of Numpy arrays
  • a dataset or a dataset iterator
For the first two cases, batch_size must be provided. For the last case,
validation_steps must be provided.

• shuffle: Boolean (whether to shuffle the training data before each epoch) or str (for 'batch').
'batch' is a special option for dealing with the limitations of HDF5 data; it shuffles in batch-
sized chunks. Has no effect when steps_per_epoch is not None.

• class_weight: Optional dictionary mapping class indices (integers) to a weight (float) value,
used for weighting the loss function (during training only). This can be useful to tell the model
to "pay more attention" to samples from an under-represented class.
• sample_weight: Optional Numpy array of weights for the training samples, used for weighting
the loss function (during training only). You can either pass a flat (1D) Numpy array with the
same length as the input samples (1:1 mapping between weights and samples), or in the case of
temporal data, you can pass a 2D array with shape (samples, sequence_length), to
apply a different weight to every timestep of every sample. In this case you should make sure to
specify sample_weight_mode="temporal" in compile(). This argument is not
supported when x is a generator or Sequence instance; instead, provide the sample weights
as the third element of x.
• initial_epoch: Integer. Epoch at which to start training (useful for resuming a previous training
run).
• steps_per_epoch: Integer or None. Total number of steps (batches of samples) before declaring
one epoch finished and starting the next epoch. When training with input tensors such as
TensorFlow data tensors, the default None is equal to the number of samples in your dataset
divided by the batch size, or 1 if that cannot be determined.
• validation_steps: Only relevant if steps_per_epoch is specified, or if
validation_data is provided and is a generator. Total number of steps (batches of
samples) to draw from the validation data before stopping when performing validation at the
end of every epoch.
• validation_freq: Only relevant if validation data is provided. Integer or list/tuple/set. If an
integer, specifies how many training epochs to run before a new validation run is performed,
e.g. validation_freq=2 runs validation every 2 epochs. If a list, tuple, or set, specifies the
epochs on which to run validation, e.g. validation_freq=[1, 2, 10] runs validation at
the end of the 1st, 2nd, and 10th epochs.
• max_queue_size: Integer. Used for generator or keras.utils.Sequence input only.
Maximum size for the generator queue. If unspecified, max_queue_size will default to 10.
• workers: Integer. Used for generator or keras.utils.Sequence input only. Maximum
number of processes to spin up when using process-based threading. If unspecified, workers
will default to 1. If 0, will execute the generator on the main thread.
• use_multiprocessing: Boolean. Used for generator or keras.utils.Sequence input only.
If True, use process-based threading. If unspecified, use_multiprocessing will default
to False. Note that because this implementation relies on multiprocessing, you should not pass
non-picklable arguments to the generator as they can't be passed easily to children processes.
• **kwargs: Used for backwards compatibility.
Returns
A History object. Its History.history attribute is a record of training loss values and metrics
values at successive epochs, as well as validation loss values and validation metrics values (if
applicable).
Raises
• RuntimeError: If the model was never compiled.
• ValueError: In case of mismatch between the provided input data and what the model expects.
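Here is a quick sketch combining several of these arguments (x_train, y_train and the class
weights are hypothetical):
history = model.fit(x_train, y_train,
                    batch_size=64,
                    epochs=20,
                    validation_split=0.1,  # hold out the last 10% of the data
                    class_weight={0: 1., 1: 5.},  # up-weight the rarer class 1
                    verbose=2)
print(history.history['val_loss'])  # list of per-epoch validation losses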

evaluate
evaluate(x=None, y=None, batch_size=None, verbose=1, sample_weight=None,
steps=None, callbacks=None, max_queue_size=10, workers=1,
use_multiprocessing=False)

Returns the loss value & metrics values for the model in test mode.
Computation is done in batches.
Arguments
• x: Input data. It could be:
• A Numpy array (or array-like), or a list of arrays (in case the model has multiple inputs).
• A dict mapping input names to the corresponding array/tensors, if the model has named
inputs.
• A generator or keras.utils.Sequence returning (inputs, targets) or
(inputs, targets, sample weights).
• None (default) if feeding from framework-native tensors (e.g. TensorFlow data tensors).
• y: Target data. Like the input data x, it could be either Numpy array(s), framework-native
tensor(s), list of Numpy arrays (if the model has multiple outputs) or None (default) if feeding
from framework-native tensors (e.g. TensorFlow data tensors). If output layers in the model are
named, you can also pass a dictionary mapping output names to Numpy arrays. If x is a
generator, or keras.utils.Sequence instance, y should not be specified (since targets
will be obtained from x).
• batch_size: Integer or None. Number of samples per evaluation batch. If unspecified,
batch_size will default to 32. Do not specify the batch_size if your data is in the form
of symbolic tensors, generators, or keras.utils.Sequence instances (since they generate
batches).
• verbose: 0 or 1. Verbosity mode. 0 = silent, 1 = progress bar.
• sample_weight: Optional Numpy array of weights for the test samples, used for weighting the
loss function. You can either pass a flat (1D) Numpy array with the same length as the input
samples (1:1 mapping between weights and samples), or in the case of temporal data, you can
pass a 2D array with shape (samples, sequence_length), to apply a different weight
to every timestep of every sample. In this case you should make sure to specify
sample_weight_mode="temporal" in compile().
• steps: Integer or None. Total number of steps (batches of samples) before declaring the
evaluation round finished. Ignored with the default value of None.
• callbacks: List of keras.callbacks.Callback instances. List of callbacks to apply
during evaluation. See callbacks.
• max_queue_size: Integer. Used for generator or keras.utils.Sequence input only.
Maximum size for the generator queue. If unspecified, max_queue_size will default to 10.
• workers: Integer. Used for generator or keras.utils.Sequence input only. Maximum
number of processes to spin up when using process-based threading. If unspecified, workers
will default to 1. If 0, will execute the generator on the main thread.
• use_multiprocessing: Boolean. Used for generator or keras.utils.Sequence input only.
If True, use process-based threading. If unspecified, use_multiprocessing will default
to False. Note that because this implementation relies on multiprocessing, you should not pass
non-picklable arguments to the generator as they can't be passed easily to children processes.
Raises
• ValueError: in case of invalid arguments.
Returns
Scalar test loss (if the model has a single output and no metrics) or list of scalars (if the model has
multiple outputs and/or metrics). The attribute model.metrics_names will give you the display
labels for the scalar outputs.
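Here is a quick usage sketch (x_test and y_test are hypothetical test arrays):
loss_and_metrics = model.evaluate(x_test, y_test, batch_size=128)
print(model.metrics_names)  # display labels, e.g. ['loss', 'acc']
print(loss_and_metrics)     # the corresponding scalar values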

predict
predict(x, batch_size=None, verbose=0, steps=None, callbacks=None,
max_queue_size=10, workers=1, use_multiprocessing=False)

Generates output predictions for the input samples.


Computation is done in batches.
Arguments
• x: Input data. It could be:
• A Numpy array (or array-like), or a list of arrays (in case the model has multiple inputs).
• A dict mapping input names to the corresponding array/tensors, if the model has named
inputs.
• A generator or keras.utils.Sequence returning (inputs, targets) or
(inputs, targets, sample weights).
• None (default) if feeding from framework-native tensors (e.g. TensorFlow data tensors).
• batch_size: Integer or None. Number of samples per prediction batch. If unspecified,
batch_size will default to 32. Do not specify the batch_size if your data is in the form
of symbolic tensors, generators, or keras.utils.Sequence instances (since they generate
batches).
• verbose: Verbosity mode, 0 or 1.
• steps: Total number of steps (batches of samples) before declaring the prediction round
finished. Ignored with the default value of None.
• callbacks: List of keras.callbacks.Callback instances. List of callbacks to apply
during prediction. See callbacks.
• max_queue_size: Integer. Used for generator or keras.utils.Sequence input only.
Maximum size for the generator queue. If unspecified, max_queue_size will default to 10.
• workers: Integer. Used for generator or keras.utils.Sequence input only. Maximum
number of processes to spin up when using process-based threading. If unspecified, workers
will default to 1. If 0, will execute the generator on the main thread.
• use_multiprocessing: Boolean. Used for generator or keras.utils.Sequence input only.
If True, use process-based threading. If unspecified, use_multiprocessing will default
to False. Note that because this implementation relies on multiprocessing, you should not pass
non-picklable arguments to the generator as they can't be passed easily to children processes.
Returns
Numpy array(s) of predictions.
Raises
• ValueError: In case of mismatch between the provided input data and the model's expectations,
or in case a stateful model receives a number of samples that is not a multiple of the batch size.
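Here is a quick usage sketch (x_new is a hypothetical array of new input samples):
predictions = model.predict(x_new, batch_size=128, verbose=1)
print(predictions.shape)  # matches the model's output shape, e.g. (num_samples, num_classes)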

train_on_batch
train_on_batch(x, y, sample_weight=None, class_weight=None, reset_metrics=True)

Runs a single gradient update on a single batch of data.


Arguments
• x: Numpy array of training data, or list of Numpy arrays if the model has multiple inputs. If all
inputs in the model are named, you can also pass a dictionary mapping input names to Numpy
arrays.
• y: Numpy array of target data, or list of Numpy arrays if the model has multiple outputs. If all
outputs in the model are named, you can also pass a dictionary mapping output names to
Numpy arrays.
• sample_weight: Optional array of the same length as x, containing weights to apply to the
model's loss for each sample. In the case of temporal data, you can pass a 2D array with shape
(samples, sequence_length), to apply a different weight to every timestep of every sample. In
this case you should make sure to specify sample_weight_mode="temporal" in compile().
• class_weight: Optional dictionary mapping class indices (integers) to a weight (float) to apply
to the model's loss for the samples from this class during training. This can be useful to tell the
model to "pay more attention" to samples from an under-represented class.
• reset_metrics: If True, the metrics returned will be only for this batch. If False, the metrics
will be statefully accumulated across batches.
Returns
Scalar training loss (if the model has a single output and no metrics) or list of scalars (if the model has
multiple outputs and/or metrics). The attribute model.metrics_names will give you the display
labels for the scalar outputs.

test_on_batch
test_on_batch(x, y, sample_weight=None, reset_metrics=True)

Test the model on a single batch of samples.


Arguments
• x: Numpy array of test data, or list of Numpy arrays if the model has multiple inputs. If all
inputs in the model are named, you can also pass a dictionary mapping input names to Numpy
arrays.
• y: Numpy array of target data, or list of Numpy arrays if the model has multiple outputs. If all
outputs in the model are named, you can also pass a dictionary mapping output names to
Numpy arrays.
• sample_weight: Optional array of the same length as x, containing weights to apply to the
model's loss for each sample. In the case of temporal data, you can pass a 2D array with shape
(samples, sequence_length), to apply a different weight to every timestep of every sample. In
this case you should make sure to specify sample_weight_mode="temporal" in compile().
• reset_metrics: If True, the metrics returned will be only for this batch. If False, the metrics
will be statefully accumulated across batches.
Returns
Scalar test loss (if the model has a single output and no metrics) or list of scalars (if the model has
multiple outputs and/or metrics). The attribute model.metrics_names will give you the display
labels for the scalar outputs.

predict_on_batch
predict_on_batch(x)

Returns predictions for a single batch of samples.


Arguments
• x: Input samples, as a Numpy array.
Returns
Numpy array(s) of predictions.

fit_generator
fit_generator(generator, steps_per_epoch=None, epochs=1, verbose=1, callbacks=None,
validation_data=None, validation_steps=None, validation_freq=1, class_weight=None,
max_queue_size=10, workers=1, use_multiprocessing=False, shuffle=True,
initial_epoch=0)

Trains the model on data generated batch-by-batch by a Python generator (or an instance of
Sequence).

The generator is run in parallel to the model, for efficiency. For instance, this allows you to do real-time
data augmentation on images on CPU in parallel to training your model on GPU.
The use of keras.utils.Sequence guarantees the ordering and the single use of every input
per epoch when using use_multiprocessing=True.

Arguments
• generator: A generator or an instance of Sequence (keras.utils.Sequence) object in
order to avoid duplicate data when using multiprocessing. The output of the generator must be
either
• a tuple (inputs, targets)
• a tuple (inputs, targets, sample_weights).

This tuple (a single output of the generator) makes a single batch. Therefore, all arrays in this
tuple must have the same length (equal to the size of this batch). Different batches may have
different sizes. For example, the last batch of the epoch is commonly smaller than the others, if
the size of the dataset is not divisible by the batch size. The generator is expected to loop over
its data indefinitely. An epoch finishes when steps_per_epoch batches have been seen by
the model.
• steps_per_epoch: Integer. Total number of steps (batches of samples) to yield from
generator before declaring one epoch finished and starting the next epoch. It should
typically be equal to ceil(num_samples / batch_size). Optional for Sequence: if
unspecified, will use len(generator) as the number of steps.

• epochs: Integer. Number of epochs to train the model. An epoch is an iteration over the entire
data provided, as defined by steps_per_epoch. Note that in conjunction with
initial_epoch, epochs is to be understood as "final epoch". The model is not trained for
a number of iterations given by epochs, but merely until the epoch of index epochs is
reached.
• verbose: Integer. 0, 1, or 2. Verbosity mode. 0 = silent, 1 = progress bar, 2 = one line per epoch.
• callbacks: List of keras.callbacks.Callback instances. List of callbacks to apply
during training. See callbacks.
• validation_data: This can be either
• a generator or a Sequence object for the validation data
• tuple (x_val, y_val)
• tuple (x_val, y_val, val_sample_weights)

on which to evaluate the loss and any model metrics at the end of each epoch. The model will
not be trained on this data.
• validation_steps: Only relevant if validation_data is a generator. Total number of steps
(batches of samples) to yield from validation_data generator before stopping at the end
of every epoch. It should typically be equal to the number of samples of your validation dataset
divided by the batch size. Optional for Sequence: if unspecified, will use
len(validation_data) as the number of steps.

• validation_freq: Only relevant if validation data is provided. Integer or
collections.Container instance (e.g. list, tuple, etc.). If an integer, specifies how many
training epochs to run before a new validation run is performed, e.g. validation_freq=2
runs validation every 2 epochs. If a Container, specifies the epochs on which to run validation,
e.g. validation_freq=[1, 2, 10] runs validation at the end of the 1st, 2nd, and 10th
epochs.
• class_weight: Optional dictionary mapping class indices (integers) to a weight (float) value,
used for weighting the loss function (during training only). This can be useful to tell the model
to "pay more attention" to samples from an under-represented class.
• max_queue_size: Integer. Maximum size for the generator queue. If unspecified,
max_queue_size will default to 10.
• workers: Integer. Maximum number of processes to spin up when using process-based
threading. If unspecified, workers will default to 1. If 0, will execute the generator on the
main thread.
• use_multiprocessing: Boolean. If True, use process-based threading. If unspecified,
use_multiprocessing will default to False. Note that because this implementation
relies on multiprocessing, you should not pass non-picklable arguments to the generator as they
can't be passed easily to children processes.
• shuffle: Boolean. Whether to shuffle the order of the batches at the beginning of each epoch.
Only used with instances of Sequence (keras.utils.Sequence). Has no effect when
steps_per_epoch is not None.
• initial_epoch: Integer. Epoch at which to start training (useful for resuming a previous training
run).
Returns
A History object. Its History.history attribute is a record of training loss values and metrics
values at successive epochs, as well as validation loss values and validation metrics values (if
applicable).
Raises
• ValueError: In case the generator yields data in an invalid format.
Example
def generate_arrays_from_file(path):
    while True:
        with open(path) as f:
            for line in f:
                # create numpy arrays of input data
                # and labels, from each line in the file
                x1, x2, y = process_line(line)
                yield ({'input_1': x1, 'input_2': x2}, {'output': y})

model.fit_generator(generate_arrays_from_file('/my_file.txt'),
                    steps_per_epoch=10000, epochs=10)
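
Plain generators cannot be safely shared across processes, so keras.utils.Sequence is the preferred input when use_multiprocessing=True. A minimal sketch of such a Sequence backed by in-memory arrays (NumpySequence is a hypothetical name; x_train and y_train are assumed Numpy arrays):

import numpy as np
from keras.utils import Sequence

class NumpySequence(Sequence):
    def __init__(self, x, y, batch_size=32):
        self.x, self.y, self.batch_size = x, y, batch_size

    def __len__(self):
        # number of batches per epoch, used when steps_per_epoch is unspecified
        return int(np.ceil(len(self.x) / float(self.batch_size)))

    def __getitem__(self, idx):
        sl = slice(idx * self.batch_size, (idx + 1) * self.batch_size)
        return self.x[sl], self.y[sl]

model.fit_generator(NumpySequence(x_train, y_train),
                    epochs=10, workers=4, use_multiprocessing=True)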

evaluate_generator
evaluate_generator(generator, steps=None, callbacks=None, max_queue_size=10,
workers=1, use_multiprocessing=False, verbose=0)

Evaluates the model on a data generator.


The generator should return the same kind of data as accepted by test_on_batch.

Arguments
• generator: Generator yielding tuples (inputs, targets) or (inputs, targets, sample_weights) or an
instance of Sequence (keras.utils.Sequence) object in order to avoid duplicate data when using
multiprocessing.
• steps: Total number of steps (batches of samples) to yield from generator before stopping.
Optional for Sequence: if unspecified, will use len(generator) as the number of steps.
• callbacks: List of keras.callbacks.Callback instances. List of callbacks to apply
during evaluation. See callbacks.
• max_queue_size: Maximum size for the generator queue.
• workers: Integer. Maximum number of processes to spin up when using process-based
threading. If unspecified, workers will default to 1. If 0, will execute the generator on the
main thread.
• use_multiprocessing: If True, use process-based threading. Note that because this
implementation relies on multiprocessing, you should not pass non-picklable arguments to the
generator as they can't be passed easily to children processes.
• verbose: Verbosity mode, 0 or 1.
Returns
Scalar test loss (if the model has a single output and no metrics) or list of scalars (if the model has
multiple outputs and/or metrics). The attribute model.metrics_names will give you the display
labels for the scalar outputs.
Raises
• ValueError: In case the generator yields data in an invalid format.
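
A minimal usage sketch, reusing the hypothetical NumpySequence class sketched above (steps is inferred from the Sequence length; the model is assumed to have been compiled with at least one metric):

scores = model.evaluate_generator(NumpySequence(x_test, y_test))
print(dict(zip(model.metrics_names, scores)))  # label each returned scalar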

predict_generator
predict_generator(generator, steps=None, callbacks=None, max_queue_size=10,
workers=1, use_multiprocessing=False, verbose=0)

Generates predictions for the input samples from a data generator.


The generator should return the same kind of data as accepted by predict_on_batch.

Arguments
• generator: Generator yielding batches of input samples or an instance of Sequence
(keras.utils.Sequence) object in order to avoid duplicate data when using multiprocessing.
• steps: Total number of steps (batches of samples) to yield from generator before stopping.
Optional for Sequence: if unspecified, will use len(generator) as the number of steps.
• callbacks: List of keras.callbacks.Callback instances. List of callbacks to apply
during prediction. See callbacks.
• max_queue_size: Maximum size for the generator queue.
• workers: Integer. Maximum number of processes to spin up when using process-based
threading. If unspecified, workers will default to 1. If 0, will execute the generator on the
main thread.
• use_multiprocessing: If True, use process-based threading. Note that because this
implementation relies on multiprocessing, you should not pass non-picklable arguments to the
generator as they can't be passed easily to children processes.
• verbose: Verbosity mode, 0 or 1.
Returns
Numpy array(s) of predictions.
Raises
• ValueError: In case the generator yields data in an invalid format.
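
A minimal sketch using a plain generator of input-only batches; since it is not a Sequence, steps must be given explicitly (x_test is an assumed Numpy array):

import numpy as np

def input_batches(x, batch_size=32):
    # yields batches of inputs only, as predict_on_batch expects
    for start in range(0, len(x), batch_size):
        yield x[start:start + batch_size]

steps = int(np.ceil(len(x_test) / 32.0))
preds = model.predict_generator(input_batches(x_test), steps=steps)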

get_layer
get_layer(name=None, index=None)

Retrieves a layer based on either its name (unique) or index.


If name and index are both provided, index will take precedence.

Indices are based on order of horizontal graph traversal (bottom-up).


Arguments
• name: String, name of layer.
• index: Integer, index of layer.
Returns
A layer instance.
Raises
• ValueError: In case of invalid layer name or index.

Model class API


In the functional API, given some input tensor(s) and output tensor(s), you can instantiate a Model via:
from keras.models import Model
from keras.layers import Input, Dense

a = Input(shape=(32,))
b = Dense(32)(a)
model = Model(inputs=a, outputs=b)

This model will include all layers required in the computation of b given a.

In the case of multi-input or multi-output models, you can use lists as well:
model = Model(inputs=[a1, a2], outputs=[b1, b2, b3])

For a detailed introduction of what Model can do, read this guide to the Keras functional API.

Methods
compile
compile(optimizer, loss=None, metrics=None, loss_weights=None,
sample_weight_mode=None, weighted_metrics=None, target_tensors=None)
Configures the model for training.
Arguments
• optimizer: String (name of optimizer) or optimizer instance. See optimizers.
• loss: String (name of objective function) or objective function or Loss instance. See losses. If
the model has multiple outputs, you can use a different loss on each output by passing a
dictionary or a list of losses. The loss value that will be minimized by the model will then be the
sum of all individual losses.
• metrics: List of metrics to be evaluated by the model during training and testing. Typically you
will use metrics=['accuracy']. To specify different metrics for different outputs of a
multi-output model, you could also pass a dictionary, such as metrics={'output_a':
'accuracy', 'output_b': ['accuracy', 'mse']}. You can also pass a list (len
= len(outputs)) of lists of metrics such as metrics=[['accuracy'], ['accuracy',
'mse']] or metrics=['accuracy', ['accuracy', 'mse']].
• loss_weights: Optional list or dictionary specifying scalar coefficients (Python floats) to weight
the loss contributions of different model outputs. The loss value that will be minimized by the
model will then be the weighted sum of all individual losses, weighted by the loss_weights
coefficients. If a list, it is expected to have a 1:1 mapping to the model's outputs. If a dict, it is
expected to map output names (strings) to scalar coefficients.
• sample_weight_mode: If you need to do timestep-wise sample weighting (2D weights), set this
to "temporal". None defaults to sample-wise weights (1D). If the model has multiple
outputs, you can use a different sample_weight_mode on each output by passing a
dictionary or a list of modes.
• weighted_metrics: List of metrics to be evaluated and weighted by sample_weight or
class_weight during training and testing.
• target_tensors: By default, Keras will create placeholders for the model's target, which will be
fed with the target data during training. If instead you would like to use your own target tensors
(in turn, Keras will not expect external Numpy data for these targets at training time), you can
specify them via the target_tensors argument. It can be a single tensor (for a single-
output model), a list of tensors, or a dict mapping output names to target tensors.
• **kwargs: When using the Theano/CNTK backends, these arguments are passed into
K.function. When using the TensorFlow backend, these arguments are passed into
tf.Session.run.

Raises
• ValueError: In case of invalid arguments for optimizer, loss, metrics or
sample_weight_mode.
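
For instance, a two-output model can be compiled with a different loss, loss weight and metric per output. A minimal sketch; the output names 'output_a' and 'output_b' are hypothetical and must match the model's output layer names:

model.compile(optimizer='rmsprop',
              loss={'output_a': 'categorical_crossentropy',
                    'output_b': 'mse'},
              loss_weights={'output_a': 1.0, 'output_b': 0.2},
              metrics={'output_a': ['accuracy']})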
fit
fit(x=None, y=None, batch_size=None, epochs=1, verbose=1, callbacks=None,
validation_split=0.0, validation_data=None, shuffle=True, class_weight=None,
sample_weight=None, initial_epoch=0, steps_per_epoch=None, validation_steps=None,
validation_freq=1, max_queue_size=10, workers=1, use_multiprocessing=False)

Trains the model for a fixed number of epochs (iterations on a dataset).


Arguments
• x: Input data. It could be:
• A Numpy array (or array-like), or a list of arrays (in case the model has multiple inputs).
• A dict mapping input names to the corresponding array/tensors, if the model has named
inputs.
• A generator or keras.utils.Sequence returning (inputs, targets) or
(inputs, targets, sample weights).
• None (default) if feeding from framework-native tensors (e.g. TensorFlow data tensors).
• y: Target data. Like the input data x, it could be either Numpy array(s), framework-native
tensor(s), list of Numpy arrays (if the model has multiple outputs) or None (default) if feeding
from framework-native tensors (e.g. TensorFlow data tensors). If output layers in the model are
named, you can also pass a dictionary mapping output names to Numpy arrays. If x is a
generator, or keras.utils.Sequence instance, y should not be specified (since targets
will be obtained from x).
• batch_size: Integer or None. Number of samples per gradient update. If unspecified,
batch_size will default to 32. Do not specify the batch_size if your data is in the form
of symbolic tensors, generators, or Sequence instances (since they generate batches).
• epochs: Integer. Number of epochs to train the model. An epoch is an iteration over the entire x
and y data provided. Note that in conjunction with initial_epoch, epochs is to be
understood as "final epoch". The model is not trained for a number of iterations given by
epochs, but merely until the epoch of index epochs is reached.
• verbose: Integer. 0, 1, or 2. Verbosity mode. 0 = silent, 1 = progress bar, 2 = one line per epoch.
• callbacks: List of keras.callbacks.Callback instances. List of callbacks to apply
during training and validation (if validation data is provided). See callbacks.
• validation_split: Float between 0 and 1. Fraction of the training data to be used as validation
data. The model will set apart this fraction of the training data, will not train on it, and will
evaluate the loss and any model metrics on this data at the end of each epoch. The validation
data is selected from the last samples in the x and y data provided, before shuffling. This
argument is not supported when x is a generator or Sequence instance.
• validation_data: Data on which to evaluate the loss and any model metrics at the end of each
epoch. The model will not be trained on this data. validation_data will override
validation_split. validation_data could be:
• tuple (x_val, y_val) of Numpy arrays or tensors
• tuple (x_val, y_val, val_sample_weights) of Numpy arrays
• dataset or a dataset iterator
For the first two cases, batch_size must be provided. For the last case,
validation_steps must be provided.

• shuffle: Boolean (whether to shuffle the training data before each epoch) or str (for 'batch').
'batch' is a special option for dealing with the limitations of HDF5 data; it shuffles in batch-
sized chunks. Has no effect when steps_per_epoch is not None.

• class_weight: Optional dictionary mapping class indices (integers) to a weight (float) value,
used for weighting the loss function (during training only). This can be useful to tell the model
to "pay more attention" to samples from an under-represented class.
• sample_weight: Optional Numpy array of weights for the training samples, used for weighting
the loss function (during training only). You can either pass a flat (1D) Numpy array with the
same length as the input samples (1:1 mapping between weights and samples), or in the case of
temporal data, you can pass a 2D array with shape (samples, sequence_length), to
apply a different weight to every timestep of every sample. In this case you should make sure to
specify sample_weight_mode="temporal" in compile(). This argument is not
supported when x is a generator or Sequence instance; instead, provide the sample_weights
as the third element of x.
• initial_epoch: Integer. Epoch at which to start training (useful for resuming a previous training
run).
• steps_per_epoch: Integer or None. Total number of steps (batches of samples) before declaring
one epoch finished and starting the next epoch. When training with input tensors such as
TensorFlow data tensors, the default None is equal to the number of samples in your dataset
divided by the batch size, or 1 if that cannot be determined.
• validation_steps: Only relevant if steps_per_epoch is specified, or if
validation_data is provided and is a generator. Total number of steps (batches of
samples) to draw before stopping when performing validation at the end of every epoch.
• validation_freq: Only relevant if validation data is provided. Integer or list/tuple/set. If an
integer, specifies how many training epochs to run before a new validation run is performed,
e.g. validation_freq=2 runs validation every 2 epochs. If a list, tuple, or set, specifies the
epochs on which to run validation, e.g. validation_freq=[1, 2, 10] runs validation at
the end of the 1st, 2nd, and 10th epochs.
• max_queue_size: Integer. Used for generator or keras.utils.Sequence input only.
Maximum size for the generator queue. If unspecified, max_queue_size will default to 10.
• workers: Integer. Used for generator or keras.utils.Sequence input only. Maximum
number of processes to spin up when using process-based threading. If unspecified, workers
will default to 1. If 0, will execute the generator on the main thread.
• use_multiprocessing: Boolean. Used for generator or keras.utils.Sequence input only.
If True, use process-based threading. If unspecified, use_multiprocessing will default
to False. Note that because this implementation relies on multiprocessing, you should not pass
non-picklable arguments to the generator as they can't be passed easily to children processes.
• **kwargs: Used for backwards compatibility.
Returns
A History object. Its History.history attribute is a record of training loss values and metrics
values at successive epochs, as well as validation loss values and validation metrics values (if
applicable).
Raises
• RuntimeError: If the model was never compiled.
• ValueError: In case of mismatch between the provided input data and what the model expects.
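
A minimal sketch of a typical call, holding out 10% of the training data for validation and stopping early when the validation loss stalls (x_train and y_train are assumed Numpy arrays):

from keras.callbacks import EarlyStopping

history = model.fit(x_train, y_train,
                    batch_size=64,
                    epochs=20,
                    validation_split=0.1,
                    callbacks=[EarlyStopping(monitor='val_loss', patience=2)])
print(history.history['val_loss'])  # one entry per completed epoch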

evaluate
evaluate(x=None, y=None, batch_size=None, verbose=1, sample_weight=None,
steps=None, callbacks=None, max_queue_size=10, workers=1,
use_multiprocessing=False)

Returns the loss value & metrics values for the model in test mode.
Computation is done in batches.
Arguments
• x: Input data. It could be:
• A Numpy array (or array-like), or a list of arrays (in case the model has multiple inputs).
• A dict mapping input names to the corresponding array/tensors, if the model has named
inputs.
• A generator or keras.utils.Sequence returning (inputs, targets) or
(inputs, targets, sample weights).
• None (default) if feeding from framework-native tensors (e.g. TensorFlow data tensors).
• y: Target data. Like the input data x, it could be either Numpy array(s), framework-native
tensor(s), list of Numpy arrays (if the model has multiple outputs) or None (default) if feeding
from framework-native tensors (e.g. TensorFlow data tensors). If output layers in the model are
named, you can also pass a dictionary mapping output names to Numpy arrays. If x is a
generator, or keras.utils.Sequence instance, y should not be specified (since targets
will be obtained from x).
• batch_size: Integer or None. Number of samples per gradient update. If unspecified,
batch_size will default to 32. Do not specify the batch_size if your data is in the form
of symbolic tensors, generators, or keras.utils.Sequence instances (since they generate
batches).
• verbose: 0 or 1. Verbosity mode. 0 = silent, 1 = progress bar.
• sample_weight: Optional Numpy array of weights for the test samples, used for weighting the
loss function. You can either pass a flat (1D) Numpy array with the same length as the input
samples (1:1 mapping between weights and samples), or in the case of temporal data, you can
pass a 2D array with shape (samples, sequence_length), to apply a different weight
to every timestep of every sample. In this case you should make sure to specify
sample_weight_mode="temporal" in compile().
• steps: Integer or None. Total number of steps (batches of samples) before declaring the
evaluation round finished. Ignored with the default value of None.
• callbacks: List of keras.callbacks.Callback instances. List of callbacks to apply
during evaluation. See callbacks.
• max_queue_size: Integer. Used for generator or keras.utils.Sequence input only.
Maximum size for the generator queue. If unspecified, max_queue_size will default to 10.
• workers: Integer. Used for generator or keras.utils.Sequence input only. Maximum
number of processes to spin up when using process-based threading. If unspecified, workers
will default to 1. If 0, will execute the generator on the main thread.
• use_multiprocessing: Boolean. Used for generator or keras.utils.Sequence input only.
If True, use process-based threading. If unspecified, use_multiprocessing will default
to False. Note that because this implementation relies on multiprocessing, you should not pass
non-picklable arguments to the generator as they can't be passed easily to children processes.
Raises
• ValueError: in case of invalid arguments.
Returns
Scalar test loss (if the model has a single output and no metrics) or list of scalars (if the model has
multiple outputs and/or metrics). The attribute model.metrics_names will give you the display
labels for the scalar outputs.
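
A minimal sketch; pairing the returned scalars with model.metrics_names labels the results (x_test and y_test are assumed Numpy arrays, and the model is assumed to have been compiled with at least one metric):

scores = model.evaluate(x_test, y_test, batch_size=128)
print(dict(zip(model.metrics_names, scores)))  # e.g. {'loss': ..., 'acc': ...}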

predict
predict(x, batch_size=None, verbose=0, steps=None, callbacks=None,
max_queue_size=10, workers=1, use_multiprocessing=False)

Generates output predictions for the input samples.


Computation is done in batches.
Arguments
• x: Input data. It could be:
• A Numpy array (or array-like), or a list of arrays (in case the model has multiple inputs).
• A dict mapping input names to the corresponding array/tensors, if the model has named
inputs.
• A generator or keras.utils.Sequence returning (inputs, targets) or
(inputs, targets, sample weights).
• None (default) if feeding from framework-native tensors (e.g. TensorFlow data tensors).
• batch_size: Integer or None. Number of samples per gradient update. If unspecified,
batch_size will default to 32. Do not specify the batch_size if your data is in the form
of symbolic tensors, generators, or keras.utils.Sequence instances (since they generate
batches).
• verbose: Verbosity mode, 0 or 1.
• steps: Total number of steps (batches of samples) before declaring the prediction round
finished. Ignored with the default value of None.
• callbacks: List of keras.callbacks.Callback instances. List of callbacks to apply
during prediction. See callbacks.
• max_queue_size: Integer. Used for generator or keras.utils.Sequence input only.
Maximum size for the generator queue. If unspecified, max_queue_size will default to 10.
• workers: Integer. Used for generator or keras.utils.Sequence input only. Maximum
number of processes to spin up when using process-based threading. If unspecified, workers
will default to 1. If 0, will execute the generator on the main thread.
• use_multiprocessing: Boolean. Used for generator or keras.utils.Sequence input only.
If True, use process-based threading. If unspecified, use_multiprocessing will default
to False. Note that because this implementation relies on multiprocessing, you should not pass
non-picklable arguments to the generator as they can't be passed easily to children processes.
Returns
Numpy array(s) of predictions.
Raises
• ValueError: In case of mismatch between the provided input data and the model's expectations,
or in case a stateful model receives a number of samples that is not a multiple of the batch size.
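
A minimal sketch for a softmax classifier; argmax converts the predicted probabilities into class indices (x_test is an assumed Numpy array):

probs = model.predict(x_test, batch_size=128)
classes = probs.argmax(axis=-1)  # most probable class per sample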

train_on_batch
train_on_batch(x, y, sample_weight=None, class_weight=None, reset_metrics=True)

Runs a single gradient update on a single batch of data.


Arguments
• x: Numpy array of training data, or list of Numpy arrays if the model has multiple inputs. If all
inputs in the model are named, you can also pass a dictionary mapping input names to Numpy
arrays.
• y: Numpy array of target data, or list of Numpy arrays if the model has multiple outputs. If all
outputs in the model are named, you can also pass a dictionary mapping output names to
Numpy arrays.
• sample_weight: Optional array of the same length as x, containing weights to apply to the
model's loss for each sample. In the case of temporal data, you can pass a 2D array with shape
(samples, sequence_length), to apply a different weight to every timestep of every sample. In
this case you should make sure to specify sample_weight_mode="temporal" in compile().
• class_weight: Optional dictionary mapping class indices (integers) to a weight (float) to apply
to the model's loss for the samples from this class during training. This can be useful to tell the
model to "pay more attention" to samples from an under-represented class.
• reset_metrics: If True, the metrics returned will be only for this batch. If False, the metrics
will be statefully accumulated across batches.
Returns
Scalar training loss (if the model has a single output and no metrics) or list of scalars (if the model has
multiple outputs and/or metrics). The attribute model.metrics_names will give you the display
labels for the scalar outputs.

test_on_batch
test_on_batch(x, y, sample_weight=None, reset_metrics=True)

Test the model on a single batch of samples.


Arguments
• x: Numpy array of test data, or list of Numpy arrays if the model has multiple inputs. If all
inputs in the model are named, you can also pass a dictionary mapping input names to Numpy
arrays.
• y: Numpy array of target data, or list of Numpy arrays if the model has multiple outputs. If all
outputs in the model are named, you can also pass a dictionary mapping output names to
Numpy arrays.
• sample_weight: Optional array of the same length as x, containing weights to apply to the
model's loss for each sample. In the case of temporal data, you can pass a 2D array with shape
(samples, sequence_length), to apply a different weight to every timestep of every sample. In
this case you should make sure to specify sample_weight_mode="temporal" in compile().
• reset_metrics: If True, the metrics returned will be only for this batch. If False, the metrics
will be statefully accumulated across batches.
Returns
Scalar test loss (if the model has a single output and no metrics) or list of scalars (if the model has
multiple outputs and/or metrics). The attribute model.metrics_names will give you the display
labels for the scalar outputs.

predict_on_batch
predict_on_batch(x)

Returns predictions for a single batch of samples.


Arguments
• x: Input samples, as a Numpy array.
Returns
Numpy array(s) of predictions.

fit_generator
fit_generator(generator, steps_per_epoch=None, epochs=1, verbose=1, callbacks=None,
validation_data=None, validation_steps=None, validation_freq=1, class_weight=None,
max_queue_size=10, workers=1, use_multiprocessing=False, shuffle=True,
initial_epoch=0)

Trains the model on data generated batch-by-batch by a Python generator (or an instance of
Sequence).

The generator is run in parallel to the model, for efficiency. For instance, this allows you to do real-time
data augmentation on images on CPU in parallel to training your model on GPU.
The use of keras.utils.Sequence guarantees the ordering and guarantees the single use of
every input per epoch when using use_multiprocessing=True.

Arguments
• generator: A generator or an instance of Sequence (keras.utils.Sequence) object in
order to avoid duplicate data when using multiprocessing. The output of the generator must be
either
• a tuple (inputs, targets)
• a tuple (inputs, targets, sample_weights).
This tuple (a single output of the generator) makes a single batch. Therefore, all arrays in this
tuple must have the same length (equal to the size of this batch). Different batches may have
different sizes. For example, the last batch of the epoch is commonly smaller than the others, if
the size of the dataset is not divisible by the batch size. The generator is expected to loop over
its data indefinitely. An epoch finishes when steps_per_epoch batches have been seen by
the model.
• steps_per_epoch: Integer. Total number of steps (batches of samples) to yield from
generator before declaring one epoch finished and starting the next epoch. It should
typically be equal to ceil(num_samples / batch_size). Optional for Sequence: if
unspecified, will use len(generator) as the number of steps.

• epochs: Integer. Number of epochs to train the model. An epoch is an iteration over the entire
data provided, as defined by steps_per_epoch. Note that in conjunction with
initial_epoch, epochs is to be understood as "final epoch". The model is not trained for
a number of iterations given by epochs, but merely until the epoch of index epochs is
reached.
• verbose: Integer. 0, 1, or 2. Verbosity mode. 0 = silent, 1 = progress bar, 2 = one line per epoch.
• callbacks: List of keras.callbacks.Callback instances. List of callbacks to apply
during training. See callbacks.
• validation_data: This can be either
• a generator or a Sequence object for the validation data
• tuple (x_val, y_val)
• tuple (x_val, y_val, val_sample_weights)

on which to evaluate the loss and any model metrics at the end of each epoch. The model will
not be trained on this data.
• validation_steps: Only relevant if validation_data is a generator. Total number of steps
(batches of samples) to yield from validation_data generator before stopping at the end
of every epoch. It should typically be equal to the number of samples of your validation dataset
divided by the batch size. Optional for Sequence: if unspecified, will use
len(validation_data) as the number of steps.

• validation_freq: Only relevant if validation data is provided. Integer or
collections.Container instance (e.g. list, tuple, etc.). If an integer, specifies how many
training epochs to run before a new validation run is performed, e.g. validation_freq=2
runs validation every 2 epochs. If a Container, specifies the epochs on which to run validation,
e.g. validation_freq=[1, 2, 10] runs validation at the end of the 1st, 2nd, and 10th
epochs.
• class_weight: Optional dictionary mapping class indices (integers) to a weight (float) value,
used for weighting the loss function (during training only). This can be useful to tell the model
to "pay more attention" to samples from an under-represented class.
• max_queue_size: Integer. Maximum size for the generator queue. If unspecified,
max_queue_size will default to 10.
• workers: Integer. Maximum number of processes to spin up when using process-based
threading. If unspecified, workers will default to 1. If 0, will execute the generator on the
main thread.
• use_multiprocessing: Boolean. If True, use process-based threading. If unspecified,
use_multiprocessing will default to False. Note that because this implementation
relies on multiprocessing, you should not pass non-picklable arguments to the generator as they
can't be passed easily to children processes.
• shuffle: Boolean. Whether to shuffle the order of the batches at the beginning of each epoch.
Only used with instances of Sequence (keras.utils.Sequence). Has no effect when
steps_per_epoch is not None.
• initial_epoch: Integer. Epoch at which to start training (useful for resuming a previous training
run).
Returns
A History object. Its History.history attribute is a record of training loss values and metrics
values at successive epochs, as well as validation loss values and validation metrics values (if
applicable).
Raises
• ValueError: In case the generator yields data in an invalid format.
Example
def generate_arrays_from_file(path):
    while True:
        with open(path) as f:
            for line in f:
                # create numpy arrays of input data
                # and labels, from each line in the file
                x1, x2, y = process_line(line)
                yield ({'input_1': x1, 'input_2': x2}, {'output': y})

model.fit_generator(generate_arrays_from_file('/my_file.txt'),
                    steps_per_epoch=10000, epochs=10)

evaluate_generator
evaluate_generator(generator, steps=None, callbacks=None, max_queue_size=10,
workers=1, use_multiprocessing=False, verbose=0)
Evaluates the model on a data generator.
The generator should return the same kind of data as accepted by test_on_batch.

Arguments
• generator: Generator yielding tuples (inputs, targets) or (inputs, targets, sample_weights) or an
instance of Sequence (keras.utils.Sequence) object in order to avoid duplicate data when using
multiprocessing.
• steps: Total number of steps (batches of samples) to yield from generator before stopping.
Optional for Sequence: if unspecified, will use len(generator) as the number of steps.
• callbacks: List of keras.callbacks.Callback instances. List of callbacks to apply
during evaluation. See callbacks.
• max_queue_size: Maximum size for the generator queue.
• workers: Integer. Maximum number of processes to spin up when using process-based
threading. If unspecified, workers will default to 1. If 0, will execute the generator on the
main thread.
• use_multiprocessing: If True, use process-based threading. Note that because this
implementation relies on multiprocessing, you should not pass non-picklable arguments to the
generator as they can't be passed easily to children processes.
• verbose: Verbosity mode, 0 or 1.
Returns
Scalar test loss (if the model has a single output and no metrics) or list of scalars (if the model has
multiple outputs and/or metrics). The attribute model.metrics_names will give you the display
labels for the scalar outputs.
Raises
• ValueError: In case the generator yields data in an invalid format.

predict_generator
predict_generator(generator, steps=None, callbacks=None, max_queue_size=10,
workers=1, use_multiprocessing=False, verbose=0)

Generates predictions for the input samples from a data generator.


The generator should return the same kind of data as accepted by predict_on_batch.

Arguments
• generator: Generator yielding batches of input samples or an instance of Sequence
(keras.utils.Sequence) object in order to avoid duplicate data when using multiprocessing.
• steps: Total number of steps (batches of samples) to yield from generator before stopping.
Optional for Sequence: if unspecified, will use len(generator) as the number of steps.
• callbacks: List of keras.callbacks.Callback instances. List of callbacks to apply
during prediction. See callbacks.
• max_queue_size: Maximum size for the generator queue.
• workers: Integer. Maximum number of processes to spin up when using process-based
threading. If unspecified, workers will default to 1. If 0, will execute the generator on the
main thread.
• use_multiprocessing: If True, use process-based threading. Note that because this
implementation relies on multiprocessing, you should not pass non-picklable arguments to the
generator as they can't be passed easily to children processes.
• verbose: Verbosity mode, 0 or 1.
Returns
Numpy array(s) of predictions.
Raises
• ValueError: In case the generator yields data in an invalid format.

get_layer
get_layer(name=None, index=None)

Retrieves a layer based on either its name (unique) or index.


If name and index are both provided, index will take precedence.

Indices are based on order of horizontal graph traversal (bottom-up).


Arguments
• name: String, name of layer.
• index: Integer, index of layer.
Returns
A layer instance.
Raises
• ValueError: In case of invalid layer name or index.
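
A common use of get_layer is building a feature extractor on top of an existing single-input model. A minimal sketch; the layer name 'dense_1' is hypothetical:

from keras.models import Model

hidden = model.get_layer('dense_1').output
feature_extractor = Model(inputs=model.input, outputs=hidden)
features = feature_extractor.predict(x_test)  # activations of 'dense_1'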
About Keras layers
All Keras layers have a number of methods in common:
• layer.get_weights(): returns the weights of the layer as a list of Numpy arrays.
• layer.set_weights(weights): sets the weights of the layer from a list of Numpy
arrays (with the same shapes as the output of get_weights).
• layer.get_config(): returns a dictionary containing the configuration of the layer. The
layer can be reinstantiated from its config via:
layer = Dense(32)
config = layer.get_config()
reconstructed_layer = Dense.from_config(config)

Or:
from keras import layers

config = layer.get_config()
layer = layers.deserialize({'class_name': layer.__class__.__name__,
                            'config': config})
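
A minimal sketch of the get_weights/set_weights round trip, copying weights between two layers of identical shape:

from keras.models import Sequential
from keras.layers import Dense

model_a = Sequential([Dense(32, input_shape=(16,))])
model_b = Sequential([Dense(32, input_shape=(16,))])
# both layers are built, so their weight lists have matching shapes
model_b.layers[0].set_weights(model_a.layers[0].get_weights())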

If a layer has a single node (i.e. if it isn't a shared layer), you can get its input tensor, output tensor,
input shape and output shape via:
• layer.input
• layer.output
• layer.input_shape
• layer.output_shape

If the layer has multiple nodes (see: the concept of layer node and shared layers), you can use the
following methods:
• layer.get_input_at(node_index)
• layer.get_output_at(node_index)
• layer.get_input_shape_at(node_index)
• layer.get_output_shape_at(node_index)

Dense
keras.layers.Dense(units, activation=None, use_bias=True,
kernel_initializer='glorot_uniform', bias_initializer='zeros',
kernel_regularizer=None, bias_regularizer=None, activity_regularizer=None,
kernel_constraint=None, bias_constraint=None)

Just your regular densely-connected NN layer.


Dense implements the operation: output = activation(dot(input, kernel) + bias)
where activation is the element-wise activation function passed as the activation argument,
kernel is a weights matrix created by the layer, and bias is a bias vector created by the layer (only
applicable if use_bias is True).

Note: if the input to the layer has a rank greater than 2, then it is flattened prior to the initial dot product
with kernel.

Example
# as first layer in a sequential model:
model = Sequential()
model.add(Dense(32, input_shape=(16,)))
# now the model will take as input arrays of shape (*, 16)
# and output arrays of shape (*, 32)

# after the first layer, you don't need to specify
# the size of the input anymore:
model.add(Dense(32))

Arguments
• units: Positive integer, dimensionality of the output space.
• activation: Activation function to use (see activations). If you don't specify anything, no
activation is applied (ie. "linear" activation: a(x) = x).
• use_bias: Boolean, whether the layer uses a bias vector.
• kernel_initializer: Initializer for the kernel weights matrix (see initializers).
• bias_initializer: Initializer for the bias vector (see initializers).
• kernel_regularizer: Regularizer function applied to the kernel weights matrix (see
regularizer).
• bias_regularizer: Regularizer function applied to the bias vector (see regularizer).
• activity_regularizer: Regularizer function applied to the output of the layer (its "activation").
(see regularizer).
• kernel_constraint: Constraint function applied to the kernel weights matrix (see
constraints).
• bias_constraint: Constraint function applied to the bias vector (see constraints).
Input shape
nD tensor with shape: (batch_size, ..., input_dim). The most common situation would be
a 2D input with shape (batch_size, input_dim).

Output shape
nD tensor with shape: (batch_size, ..., units). For instance, for a 2D input with shape
(batch_size, input_dim), the output would have shape (batch_size, units).
Activation
keras.layers.Activation(activation)

Applies an activation function to an output.


Arguments
• activation: name of activation function to use (see: activations), or alternatively, a Theano or
TensorFlow operation.
Input shape
Arbitrary. Use the keyword argument input_shape (tuple of integers, does not include the samples
axis) when using this layer as the first layer in a model.
Output shape
Same shape as input.

Dropout
keras.layers.Dropout(rate, noise_shape=None, seed=None)

Applies Dropout to the input.


Dropout consists in randomly setting a fraction rate of input units to 0 at each update during training
time, which helps prevent overfitting.
Arguments
• rate: float between 0 and 1. Fraction of the input units to drop.
• noise_shape: 1D integer tensor representing the shape of the binary dropout mask that will be
multiplied with the input. For instance, if your inputs have shape (batch_size,
timesteps, features) and you want the dropout mask to be the same for all timesteps,
you can use noise_shape=(batch_size, 1, features).
• seed: A Python integer to use as random seed.
References
• Dropout: A Simple Way to Prevent Neural Networks from Overfitting
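
For instance, noise_shape can share one dropout mask across all timesteps of a sequence. A minimal sketch, assuming inputs of shape (batch, 10 timesteps, 64 features):

from keras.models import Sequential
from keras.layers import Dropout, LSTM

model = Sequential()
# None keeps the batch dimension dynamic; 1 broadcasts the mask over timesteps
model.add(Dropout(0.2, noise_shape=(None, 1, 64), input_shape=(10, 64)))
model.add(LSTM(32))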

Flatten
keras.layers.Flatten(data_format=None)

Flattens the input. Does not affect the batch size.


Arguments
• data_format: A string, one of channels_last (default) or channels_first. The
ordering of the dimensions in the inputs. The purpose of this argument is to preserve weight
ordering when switching a model from one data format to another. channels_last
corresponds to inputs with shape (batch, ..., channels) while channels_first
corresponds to inputs with shape (batch, channels, ...). It defaults to the
image_data_format value found in your Keras config file at ~/.keras/keras.json.
If you never set it, then it will be "channels_last".
Example
model = Sequential()
model.add(Conv2D(64, (3, 3),
                 input_shape=(3, 32, 32), padding='same'))
# now: model.output_shape == (None, 64, 32, 32)

model.add(Flatten())
# now: model.output_shape == (None, 65536)

Input
keras.engine.input_layer.Input()

Input() is used to instantiate a Keras tensor.

A Keras tensor is a tensor object from the underlying backend (Theano, TensorFlow or CNTK), which
we augment with certain attributes that allow us to build a Keras model just by knowing the inputs and
outputs of the model.
For instance, if a, b and c are Keras tensors, it becomes possible to do:
model = Model(input=[a, b], output=c)

The added Keras attributes are:
• _keras_shape: Integer shape tuple propagated via Keras-side shape inference.
• _keras_history: Last layer applied to the tensor. The entire layer graph is retrievable
from that layer, recursively.
Arguments
• shape: A shape tuple (integer), not including the batch size. For instance, shape=(32,)
indicates that the expected input will be batches of 32-dimensional vectors.
• batch_shape: A shape tuple (integer), including the batch size. For instance,
batch_shape=(10, 32) indicates that the expected input will be batches of 10 32-
dimensional vectors. batch_shape=(None, 32) indicates batches of an arbitrary number
of 32-dimensional vectors.
• name: An optional name string for the layer. Should be unique in a model (do not reuse the
same name twice). It will be autogenerated if it isn't provided.
• dtype: The data type expected by the input, as a string (float32, float64, int32...)
• sparse: A boolean specifying whether the placeholder to be created is sparse.
• tensor: Optional existing tensor to wrap into the Input layer. If set, the layer will not create a
placeholder tensor.
Returns
A tensor.
Example
# this is a logistic regression in Keras
x = Input(shape=(32,))
y = Dense(16, activation='softmax')(x)
model = Model(x, y)

Reshape
keras.layers.Reshape(target_shape)

Reshapes an output to a certain shape.


Arguments
• target_shape: target shape. Tuple of integers. Does not include the batch axis.
Input shape
Arbitrary, although all dimensions in the input shape must be fixed. Use the keyword argument
input_shape (tuple of integers, does not include the batch axis) when using this layer as the first
layer in a model.
Output shape
(batch_size,) + target_shape

Example
# as first layer in a Sequential model
model = Sequential()
model.add(Reshape((3, 4), input_shape=(12,)))
# now: model.output_shape == (None, 3, 4)
# note: `None` is the batch dimension

# as intermediate layer in a Sequential model
model.add(Reshape((6, 2)))
# now: model.output_shape == (None, 6, 2)

# also supports shape inference using `-1` as dimension
model.add(Reshape((-1, 2, 2)))
# now: model.output_shape == (None, 3, 2, 2)

Permute
keras.layers.Permute(dims)

Permutes the dimensions of the input according to a given pattern.


Useful for e.g. connecting RNNs and convnets together.
Example
model = Sequential()
model.add(Permute((2, 1), input_shape=(10, 64)))
# now: model.output_shape == (None, 64, 10)
# note: `None` is the batch dimension

Arguments
• dims: Tuple of integers. Permutation pattern, does not include the samples dimension. Indexing
starts at 1. For instance, (2, 1) permutes the first and second dimension of the input.

Input shape
Arbitrary. Use the keyword argument input_shape (tuple of integers, does not include the samples
axis) when using this layer as the first layer in a model.
Output shape
Same as the input shape, but with the dimensions re-ordered according to the specified pattern.

RepeatVector
keras.layers.RepeatVector(n)

Repeats the input n times.


Example
model = Sequential()
model.add(Dense(32, input_dim=32))
# now: model.output_shape == (None, 32)
# note: `None` is the batch dimension

model.add(RepeatVector(3))
# now: model.output_shape == (None, 3, 32)

Arguments
• n: integer, repetition factor.
Input shape
2D tensor of shape (num_samples, features).

Output shape
3D tensor of shape (num_samples, n, features).

Lambda
keras.layers.Lambda(function, output_shape=None, mask=None, arguments=None)

Wraps an arbitrary expression as a Layer object.

Examples
# add a x -> x^2 layer
model.add(Lambda(lambda x: x ** 2))

# add a layer that returns the concatenation
# of the positive part of the input and
# the opposite of the negative part
def antirectifier(x):
    x -= K.mean(x, axis=1, keepdims=True)
    x = K.l2_normalize(x, axis=1)
    pos = K.relu(x)
    neg = K.relu(-x)
    return K.concatenate([pos, neg], axis=1)

def antirectifier_output_shape(input_shape):
    shape = list(input_shape)
    assert len(shape) == 2  # only valid for 2D tensors
    shape[-1] *= 2
    return tuple(shape)

model.add(Lambda(antirectifier,
                 output_shape=antirectifier_output_shape))

# add a layer that returns the hadamard product
# and sum of it from two input tensors
def hadamard_product_sum(tensors):
    out1 = tensors[0] * tensors[1]
    out2 = K.sum(out1, axis=-1)
    return [out1, out2]

def hadamard_product_sum_output_shape(input_shapes):
    shape1 = list(input_shapes[0])
    shape2 = list(input_shapes[1])
    assert shape1 == shape2  # else hadamard product isn't possible
    return [tuple(shape1), tuple(shape2[:-1])]

x1 = Dense(32)(input_1)
x2 = Dense(32)(input_2)
layer = Lambda(hadamard_product_sum, hadamard_product_sum_output_shape)
x_hadamard, x_sum = layer([x1, x2])

Arguments
• function: The function to be evaluated. Takes input tensor or list of tensors as first argument.
• output_shape: Expected output shape from function. Only relevant when using Theano. Can be
a tuple or function. If a tuple, it only specifies the first dimension onward; sample dimension is
assumed either the same as the input: output_shape = (input_shape[0], ) +
output_shape or, the input is None and the sample dimension is also None:
output_shape = (None, ) + output_shape If a function, it specifies the entire
shape as a function of the input shape: output_shape = f(input_shape)
• mask: Either None (indicating no masking) or a Tensor indicating the input mask for
Embedding.
• arguments: optional dictionary of keyword arguments to be passed to the function.
Input shape
Arbitrary. Use the keyword argument input_shape (tuple of integers, does not include the samples axis)
when using this layer as the first layer in a model.
Output shape
Specified by output_shape argument (or auto-inferred when using TensorFlow or CNTK).

ActivityRegularization
keras.layers.ActivityRegularization(l1=0.0, l2=0.0)

Layer that applies an update to the cost function based on input activity.
Arguments
• l1: L1 regularization factor (positive float).
• l2: L2 regularization factor (positive float).
Input shape
Arbitrary. Use the keyword argument input_shape (tuple of integers, does not include the samples
axis) when using this layer as the first layer in a model.
Output shape
Same shape as input.
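
A minimal usage sketch, adding an L1 penalty on the activations of the preceding layer:

from keras.models import Sequential
from keras.layers import Dense, ActivityRegularization

model = Sequential()
model.add(Dense(64, activation='relu', input_shape=(32,)))
model.add(ActivityRegularization(l1=1e-5))  # penalty is added to the loss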
Masking
keras.layers.Masking(mask_value=0.0)

Masks a sequence by using a mask value to skip timesteps.


If all features for a given sample timestep are equal to mask_value, then the sample timestep will be
masked (skipped) in all downstream layers (as long as they support masking).
If any downstream layer does not support masking yet receives such an input mask, an exception will
be raised.
Example
Consider a Numpy data array x of shape (samples, timesteps, features), to be fed to an
LSTM layer. You want to mask sample #0 at timestep #3, and sample #2 at timestep #5, because you
lack features for these sample timesteps. You can do:
• set x[0, 3, :] = 0. and x[2, 5, :] = 0.
• insert a Masking layer with mask_value=0. before the LSTM layer:
model = Sequential()
model.add(Masking(mask_value=0., input_shape=(timesteps, features)))
model.add(LSTM(32))

Arguments
• mask_value: Either None or the mask value to skip.

SpatialDropout1D
keras.layers.SpatialDropout1D(rate)

Spatial 1D version of Dropout.


This version performs the same function as Dropout, however it drops entire 1D feature maps instead
of individual elements. If adjacent frames within feature maps are strongly correlated (as is normally
the case in early convolution layers) then regular dropout will not regularize the activations and will
otherwise just result in an effective learning rate decrease. In this case, SpatialDropout1D will help
promote independence between feature maps and should be used instead.
Arguments
• rate: float between 0 and 1. Fraction of the input units to drop.
Input shape
3D tensor with shape: (samples, timesteps, channels)

Output shape
Same as input
References
• Efficient Object Localization Using Convolutional Networks
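
A minimal sketch placing SpatialDropout1D after an early 1D convolution, where adjacent frames are strongly correlated:

from keras.models import Sequential
from keras.layers import Conv1D, SpatialDropout1D

model = Sequential()
model.add(Conv1D(64, 3, activation='relu', input_shape=(100, 16)))
model.add(SpatialDropout1D(0.3))  # drops whole feature maps, not single units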

SpatialDropout2D
keras.layers.SpatialDropout2D(rate, data_format=None)

Spatial 2D version of Dropout.


This version performs the same function as Dropout, however it drops entire 2D feature maps instead
of individual elements. If adjacent pixels within feature maps are strongly correlated (as is normally the
case in early convolution layers) then regular dropout will not regularize the activations and will
otherwise just result in an effective learning rate decrease. In this case, SpatialDropout2D will help
promote independence between feature maps and should be used instead.
Arguments
• rate: float between 0 and 1. Fraction of the input units to drop.
• data_format: 'channels_first' or 'channels_last'. In 'channels_first' mode, the channels
dimension (the depth) is at index 1, in 'channels_last' mode it is at index 3. It defaults to the
image_data_format value found in your Keras config file at ~/.keras/keras.json.
If you never set it, then it will be "channels_last".
Input shape
4D tensor with shape: (samples, channels, rows, cols) if data_format='channels_first' or
4D tensor with shape: (samples, rows, cols, channels) if data_format='channels_last'.

Output shape
Same as input
References
• Efficient Object Localization Using Convolutional Networks

SpatialDropout3D
keras.layers.SpatialDropout3D(rate, data_format=None)

Spatial 3D version of Dropout.


This version performs the same function as Dropout, however it drops entire 3D feature maps instead
of individual elements. If adjacent voxels within feature maps are strongly correlated (as is normally
the case in early convolution layers) then regular dropout will not regularize the activations and will
otherwise just result in an effective learning rate decrease. In this case, SpatialDropout3D will help
promote independence between feature maps and should be used instead.
Arguments
• rate: float between 0 and 1. Fraction of the input units to drop.
• data_format: 'channels_first' or 'channels_last'. In 'channels_first' mode, the channels
dimension (the depth) is at index 1, in 'channels_last' mode it is at index 4. It defaults to the
image_data_format value found in your Keras config file at ~/.keras/keras.json.
If you never set it, then it will be "channels_last".
Input shape
5D tensor with shape: (samples, channels, dim1, dim2, dim3) if
data_format='channels_first' or 5D tensor with shape: (samples, dim1, dim2, dim3,
channels) if data_format='channels_last'.

Output shape
Same as input
References
• Efficient Object Localization Using Convolutional Networks

Conv1D
keras.layers.Conv1D(filters, kernel_size, strides=1, padding='valid',
data_format='channels_last', dilation_rate=1, activation=None, use_bias=True,
kernel_initializer='glorot_uniform', bias_initializer='zeros',
kernel_regularizer=None, bias_regularizer=None, activity_regularizer=None,
kernel_constraint=None, bias_constraint=None)

1D convolution layer (e.g. temporal convolution).


This layer creates a convolution kernel that is convolved with the layer input over a single spatial (or
temporal) dimension to produce a tensor of outputs. If use_bias is True, a bias vector is created and
added to the outputs. Finally, if activation is not None, it is applied to the outputs as well.

When using this layer as the first layer in a model, provide an input_shape argument (tuple of
integers or None, does not include the batch axis), e.g. input_shape=(10, 128) for time series
sequences of 10 time steps with 128 features per step in data_format="channels_last", or
(None, 128) for variable-length sequences with 128 features per step.

Arguments
• filters: Integer, the dimensionality of the output space (i.e. the number of output filters in the
convolution).
• kernel_size: An integer or tuple/list of a single integer, specifying the length of the 1D
convolution window.
• strides: An integer or tuple/list of a single integer, specifying the stride length of the
convolution. Specifying any stride value != 1 is incompatible with specifying any
dilation_rate value != 1.
• padding: One of "valid", "causal" or "same" (case-insensitive). "valid" means "no
padding". "same" results in padding the input such that the output has the same length as the
original input. "causal" results in causal (dilated) convolutions, e.g. output[t] does not
depend on input[t + 1:]. A zero padding is used such that the output has the same length
as the original input. Useful when modeling temporal data where the model should not violate
the temporal order. See WaveNet: A Generative Model for Raw Audio, section 2.1.
• data_format: A string, one of "channels_last" (default) or "channels_first". The
ordering of the dimensions in the inputs. "channels_last" corresponds to inputs with
shape (batch, steps, channels) (default format for temporal data in Keras) while
"channels_first" corresponds to inputs with shape (batch, channels, steps).
• dilation_rate: an integer or tuple/list of a single integer, specifying the dilation rate to use for
dilated convolution. Currently, specifying any dilation_rate value != 1 is incompatible
with specifying any strides value != 1.
• activation: Activation function to use (see activations). If you don't specify anything, no
activation is applied (ie. "linear" activation: a(x) = x).
• use_bias: Boolean, whether the layer uses a bias vector.
• kernel_initializer: Initializer for the kernel weights matrix (see initializers).
• bias_initializer: Initializer for the bias vector (see initializers).
• kernel_regularizer: Regularizer function applied to the kernel weights matrix (see
regularizer).
• bias_regularizer: Regularizer function applied to the bias vector (see regularizer).
• activity_regularizer: Regularizer function applied to the output of the layer (its "activation").
(see regularizer).
• kernel_constraint: Constraint function applied to the kernel matrix (see constraints).
• bias_constraint: Constraint function applied to the bias vector (see constraints).
Input shape
3D tensor with shape: (batch, steps, channels)

Output shape
3D tensor with shape: (batch, new_steps, filters). steps value might have changed due
to padding or strides.
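
A minimal sketch of a causal, dilated temporal convolution over the 10-step, 128-feature sequences described above:

from keras.models import Sequential
from keras.layers import Conv1D, GlobalMaxPooling1D, Dense

model = Sequential()
model.add(Conv1D(64, 3, padding='causal', dilation_rate=2,
                 input_shape=(10, 128)))
model.add(GlobalMaxPooling1D())
model.add(Dense(1, activation='sigmoid'))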
Conv2D
keras.layers.Conv2D(filters, kernel_size, strides=(1, 1), padding='valid',
data_format=None, dilation_rate=(1, 1), activation=None, use_bias=True,
kernel_initializer='glorot_uniform', bias_initializer='zeros',
kernel_regularizer=None, bias_regularizer=None, activity_regularizer=None,
kernel_constraint=None, bias_constraint=None)

2D convolution layer (e.g. spatial convolution over images).


This layer creates a convolution kernel that is convolved with the layer input to produce a tensor of
outputs. If use_bias is True, a bias vector is created and added to the outputs. Finally, if
activation is not None, it is applied to the outputs as well.

When using this layer as the first layer in a model, provide the keyword argument input_shape
(tuple of integers, does not include the batch axis), e.g. input_shape=(128, 128, 3) for
128x128 RGB pictures in data_format="channels_last".

Arguments
• filters: Integer, the dimensionality of the output space (i.e. the number of output filters in the
convolution).
• kernel_size: An integer or tuple/list of 2 integers, specifying the height and width of the 2D
convolution window. Can be a single integer to specify the same value for all spatial
dimensions.
• strides: An integer or tuple/list of 2 integers, specifying the strides of the convolution along the
height and width. Can be a single integer to specify the same value for all spatial dimensions.
Specifying any stride value != 1 is incompatible with specifying any dilation_rate
value != 1.
• padding: one of "valid" or "same" (case-insensitive). Note that "same" is slightly
inconsistent across backends with strides != 1, as described here
• data_format: A string, one of "channels_last" or "channels_first". The ordering
of the dimensions in the inputs. "channels_last" corresponds to inputs with shape
(batch, height, width, channels) while "channels_first" corresponds to
inputs with shape (batch, channels, height, width). It defaults to the
image_data_format value found in your Keras config file at ~/.keras/keras.json.
If you never set it, then it will be "channels_last".
• dilation_rate: an integer or tuple/list of 2 integers, specifying the dilation rate to use for dilated
convolution. Can be a single integer to specify the same value for all spatial dimensions.
Currently, specifying any dilation_rate value != 1 is incompatible with specifying any
stride value != 1.
• activation: Activation function to use (see activations). If you don't specify anything, no
activation is applied (i.e. "linear" activation: a(x) = x).
• use_bias: Boolean, whether the layer uses a bias vector.
• kernel_initializer: Initializer for the kernel weights matrix (see initializers).
• bias_initializer: Initializer for the bias vector (see initializers).
• kernel_regularizer: Regularizer function applied to the kernel weights matrix (see
regularizer).
• bias_regularizer: Regularizer function applied to the bias vector (see regularizer).
• activity_regularizer: Regularizer function applied to the output of the layer (its "activation").
(see regularizer).
• kernel_constraint: Constraint function applied to the kernel matrix (see constraints).
• bias_constraint: Constraint function applied to the bias vector (see constraints).
Input shape
4D tensor with shape: (batch, channels, rows, cols) if data_format is
"channels_first" or 4D tensor with shape: (batch, rows, cols, channels) if
data_format is "channels_last".

Output shape
4D tensor with shape: (batch, filters, new_rows, new_cols) if data_format is
"channels_first" or 4D tensor with shape: (batch, new_rows, new_cols, filters)
if data_format is "channels_last". rows and cols values might have changed due to
padding.
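
For illustration (a sketch not in the original reference; it assumes the usual Sequential
and Conv2D imports), stacking Conv2D layers and tracking the resulting shapes:
model = Sequential()
model.add(Conv2D(32, (3, 3), activation='relu',
                 input_shape=(128, 128, 3)))
# 'valid' padding: model.output_shape == (None, 126, 126, 32)
model.add(Conv2D(64, (3, 3), strides=(2, 2), padding='same'))
# 'same' padding with stride 2 halves the spatial dimensions,
# rounding up: model.output_shape == (None, 63, 63, 64)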

SeparableConv1D
keras.layers.SeparableConv1D(filters, kernel_size, strides=1, padding='valid',
data_format='channels_last', dilation_rate=1, depth_multiplier=1, activation=None,
use_bias=True, depthwise_initializer='glorot_uniform',
pointwise_initializer='glorot_uniform', bias_initializer='zeros',
depthwise_regularizer=None, pointwise_regularizer=None, bias_regularizer=None,
activity_regularizer=None, depthwise_constraint=None, pointwise_constraint=None,
bias_constraint=None)

Depthwise separable 1D convolution.


Separable convolutions consist of first performing a depthwise spatial convolution (which acts on each
input channel separately) followed by a pointwise convolution which mixes together the resulting
output channels. The depth_multiplier argument controls how many output channels are
generated per input channel in the depthwise step.
Intuitively, separable convolutions can be understood as a way to factorize a convolution kernel into
two smaller kernels, or as an extreme version of an Inception block.
Arguments
• filters: Integer, the dimensionality of the output space (i.e. the number of output filters in the
convolution).
• kernel_size: An integer or tuple/list of a single integer, specifying the length of the 1D
convolution window.
• strides: An integer or tuple/list of a single integer, specifying the stride length of the convolution.
Specifying any stride value != 1 is incompatible with specifying any dilation_rate
value != 1.
• padding: one of "valid" or "same" (case-insensitive).
• data_format: A string, one of "channels_last" or "channels_first". The ordering
of the dimensions in the inputs. "channels_last" corresponds to inputs with shape
(batch, steps, channels) while "channels_first" corresponds to inputs with
shape (batch, channels, steps).
• dilation_rate: An integer or tuple/list of a single integer, specifying the dilation rate to use for
dilated convolution. Currently, specifying any dilation_rate value != 1 is incompatible
with specifying any strides value != 1.
• depth_multiplier: The number of depthwise convolution output channels for each input
channel. The total number of depthwise convolution output channels will be equal to
filters_in * depth_multiplier.
• activation: Activation function to use (see activations). If you don't specify anything, no
activation is applied (i.e. "linear" activation: a(x) = x).
• use_bias: Boolean, whether the layer uses a bias vector.
• depthwise_initializer: Initializer for the depthwise kernel matrix (see initializers).
• pointwise_initializer: Initializer for the pointwise kernel matrix (see initializers).
• bias_initializer: Initializer for the bias vector (see initializers).
• depthwise_regularizer: Regularizer function applied to the depthwise kernel matrix (see
regularizer).
• pointwise_regularizer: Regularizer function applied to the pointwise kernel matrix (see
regularizer).
• bias_regularizer: Regularizer function applied to the bias vector (see regularizer).
• activity_regularizer: Regularizer function applied to the output of the layer (its "activation").
(see regularizer).
• depthwise_constraint: Constraint function applied to the depthwise kernel matrix (see
constraints).
• pointwise_constraint: Constraint function applied to the pointwise kernel matrix (see
constraints).
• bias_constraint: Constraint function applied to the bias vector (see constraints).
Input shape
3D tensor with shape: (batch, channels, steps) if data_format is
"channels_first" or 3D tensor with shape: (batch, steps, channels) if
data_format is "channels_last".
Output shape
3D tensor with shape: (batch, filters, new_steps) if data_format is
"channels_first" or 3D tensor with shape: (batch, new_steps, filters) if
data_format is "channels_last". new_steps values might have changed due to padding or
strides.

SeparableConv2D
keras.layers.SeparableConv2D(filters, kernel_size, strides=(1, 1), padding='valid',
data_format=None, dilation_rate=(1, 1), depth_multiplier=1, activation=None,
use_bias=True, depthwise_initializer='glorot_uniform',
pointwise_initializer='glorot_uniform', bias_initializer='zeros',
depthwise_regularizer=None, pointwise_regularizer=None, bias_regularizer=None,
activity_regularizer=None, depthwise_constraint=None, pointwise_constraint=None,
bias_constraint=None)

Depthwise separable 2D convolution.


Separable convolution first performs a depthwise spatial convolution (which acts on each input channel
separately) followed by a pointwise convolution which mixes together the resulting output channels.
The depth_multiplier argument controls how many output channels are generated per input
channel in the depthwise step.
Intuitively, separable convolutions can be understood as a way to factorize a convolution kernel into
two smaller kernels, or as an extreme version of an Inception block.
Arguments
• filters: Integer, the dimensionality of the output space (i.e. the number of output filters in the
convolution).
• kernel_size: An integer or tuple/list of 2 integers, specifying the height and width of the 2D
convolution window. Can be a single integer to specify the same value for all spatial
dimensions.
• strides: An integer or tuple/list of 2 integers, specifying the strides of the convolution along the
height and width. Can be a single integer to specify the same value for all spatial dimensions.
Specifying any stride value != 1 is incompatible with specifying any dilation_rate
value != 1.
• padding: one of "valid" or "same" (case-insensitive).
• data_format: A string, one of "channels_last" or "channels_first". The ordering
of the dimensions in the inputs. "channels_last" corresponds to inputs with shape
(batch, height, width, channels) while "channels_first" corresponds to
inputs with shape (batch, channels, height, width). It defaults to the
image_data_format value found in your Keras config file at ~/.keras/keras.json.
If you never set it, then it will be "channels_last".
• dilation_rate: An integer or tuple/list of 2 integers, specifying the dilation rate to use for dilated
convolution. Currently, specifying any dilation_rate value != 1 is incompatible with
specifying any strides value != 1.
• depth_multiplier: The number of depthwise convolution output channels for each input
channel. The total number of depthwise convolution output channels will be equal to
filters_in * depth_multiplier.
• activation: Activation function to use (see activations). If you don't specify anything, no
activation is applied (i.e. "linear" activation: a(x) = x).
• use_bias: Boolean, whether the layer uses a bias vector.
• depthwise_initializer: Initializer for the depthwise kernel matrix (see initializers).
• pointwise_initializer: Initializer for the pointwise kernel matrix (see initializers).
• bias_initializer: Initializer for the bias vector (see initializers).
• depthwise_regularizer: Regularizer function applied to the depthwise kernel matrix (see
regularizer).
• pointwise_regularizer: Regularizer function applied to the pointwise kernel matrix (see
regularizer).
• bias_regularizer: Regularizer function applied to the bias vector (see regularizer).
• activity_regularizer: Regularizer function applied to the output of the layer (its "activation").
(see regularizer).
• depthwise_constraint: Constraint function applied to the depthwise kernel matrix (see
constraints).
• pointwise_constraint: Constraint function applied to the pointwise kernel matrix (see
constraints).
• bias_constraint: Constraint function applied to the bias vector (see constraints).
Input shape
4D tensor with shape: (batch, channels, rows, cols) if data_format is
"channels_first" or 4D tensor with shape: (batch, rows, cols, channels) if
data_format is "channels_last".

Output shape
4D tensor with shape: (batch, filters, new_rows, new_cols) if data_format is
"channels_first" or 4D tensor with shape: (batch, new_rows, new_cols, filters)
if data_format is "channels_last". rows and cols values might have changed due to
padding.
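
To make the factorization concrete, a hedged sketch comparing weight counts (the figures
below are computed from the shapes and are not taken from the original text):
# a regular 3x3 convolution from 32 to 64 channels uses
# 3*3*32*64 = 18432 kernel weights; the separable version uses
# 3*3*32 = 288 (depthwise) + 1*1*32*64 = 2048 (pointwise) = 2336
model = Sequential()
model.add(SeparableConv2D(64, (3, 3), input_shape=(32, 32, 32)))
# model.output_shape == (None, 30, 30, 64)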

DepthwiseConv2D
keras.layers.DepthwiseConv2D(kernel_size, strides=(1, 1), padding='valid',
depth_multiplier=1, data_format=None, dilation_rate=(1, 1), activation=None,
use_bias=True, depthwise_initializer='glorot_uniform', bias_initializer='zeros',
depthwise_regularizer=None, bias_regularizer=None, activity_regularizer=None,
depthwise_constraint=None, bias_constraint=None)

Depthwise 2D convolution.
Depthwise convolution performs just the first step of a depthwise spatial convolution (which acts on
each input channel separately). The depth_multiplier argument controls how many output
channels are generated per input channel in the depthwise step.
Arguments
• kernel_size: An integer or tuple/list of 2 integers, specifying the height and width of the 2D
convolution window. Can be a single integer to specify the same value for all spatial
dimensions.
• strides: An integer or tuple/list of 2 integers, specifying the strides of the convolution along the
height and width. Can be a single integer to specify the same value for all spatial dimensions.
Specifying any stride value != 1 is incompatible with specifying any dilation_rate
value != 1.
• padding: one of "valid" or "same" (case-insensitive).
• depth_multiplier: The number of depthwise convolution output channels for each input
channel. The total number of depthwise convolution output channels will be equal to
filters_in * depth_multiplier.
• data_format: A string, one of "channels_last" or "channels_first". The ordering
of the dimensions in the inputs. "channels_last" corresponds to inputs with shape
(batch, height, width, channels) while "channels_first" corresponds to
inputs with shape (batch, channels, height, width). It defaults to the
image_data_format value found in your Keras config file at ~/.keras/keras.json.
If you never set it, then it will be 'channels_last'.
• dilation_rate: an integer or tuple/list of 2 integers, specifying the dilation rate to use for dilated
convolution. Can be a single integer to specify the same value for all spatial dimensions.
Currently, specifying any dilation_rate value != 1 is incompatible with specifying any
stride value != 1.
• activation: Activation function to use (see activations). If you don't specify anything, no
activation is applied (i.e. 'linear' activation: a(x) = x).
• use_bias: Boolean, whether the layer uses a bias vector.
• depthwise_initializer: Initializer for the depthwise kernel matrix (see initializers).
• bias_initializer: Initializer for the bias vector (see initializers).
• depthwise_regularizer: Regularizer function applied to the depthwise kernel matrix (see
regularizer).
• bias_regularizer: Regularizer function applied to the bias vector (see regularizer).
• activity_regularizer: Regularizer function applied to the output of the layer (its 'activation').
(see regularizer).
• depthwise_constraint: Constraint function applied to the depthwise kernel matrix (see
constraints).
• bias_constraint: Constraint function applied to the bias vector (see constraints).
Input shape
4D tensor with shape: (batch, channels, rows, cols) if data_format is
"channels_first" or 4D tensor with shape: (batch, rows, cols, channels) if
data_format is "channels_last".

Output shape
4D tensor with shape: (batch, channels * depth_multiplier, new_rows,
new_cols) if data_format is "channels_first" or 4D tensor with shape: (batch,
new_rows, new_cols, channels * depth_multiplier) if data_format is
"channels_last". rows and cols values might have changed due to padding.

Conv2DTranspose
keras.layers.Conv2DTranspose(filters, kernel_size, strides=(1, 1), padding='valid',
output_padding=None, data_format=None, dilation_rate=(1, 1), activation=None,
use_bias=True, kernel_initializer='glorot_uniform', bias_initializer='zeros',
kernel_regularizer=None, bias_regularizer=None, activity_regularizer=None,
kernel_constraint=None, bias_constraint=None)

Transposed convolution layer (sometimes called Deconvolution).


The need for transposed convolutions generally arises from the desire to use a transformation going in
the opposite direction of a normal convolution, i.e., from something that has the shape of the output of
some convolution to something that has the shape of its input while maintaining a connectivity pattern
that is compatible with said convolution.
When using this layer as the first layer in a model, provide the keyword argument input_shape
(tuple of integers, does not include the batch axis), e.g. input_shape=(128, 128, 3) for
128x128 RGB pictures in data_format="channels_last".

Arguments
• filters: Integer, the dimensionality of the output space (i.e. the number of output filters in the
convolution).
• kernel_size: An integer or tuple/list of 2 integers, specifying the height and width of the 2D
convolution window. Can be a single integer to specify the same value for all spatial
dimensions.
• strides: An integer or tuple/list of 2 integers, specifying the strides of the convolution along the
height and width. Can be a single integer to specify the same value for all spatial dimensions.
Specifying any stride value != 1 is incompatible with specifying any dilation_rate
value != 1.
• padding: one of "valid" or "same" (case-insensitive).
• output_padding: An integer or tuple/list of 2 integers, specifying the amount of padding along
the height and width of the output tensor. Can be a single integer to specify the same value for
all spatial dimensions. The amount of output padding along a given dimension must be lower
than the stride along that same dimension. If set to None (default), the output shape is inferred.
• data_format: A string, one of "channels_last" or "channels_first". The ordering
of the dimensions in the inputs. "channels_last" corresponds to inputs with shape
(batch, height, width, channels) while "channels_first" corresponds to
inputs with shape (batch, channels, height, width). It defaults to the
image_data_format value found in your Keras config file at ~/.keras/keras.json.
If you never set it, then it will be "channels_last".
• dilation_rate: an integer or tuple/list of 2 integers, specifying the dilation rate to use for dilated
convolution. Can be a single integer to specify the same value for all spatial dimensions.
Currently, specifying any dilation_rate value != 1 is incompatible with specifying any
stride value != 1.
• activation: Activation function to use (see activations). If you don't specify anything, no
activation is applied (i.e. "linear" activation: a(x) = x).
• use_bias: Boolean, whether the layer uses a bias vector.
• kernel_initializer: Initializer for the kernel weights matrix (see initializers).
• bias_initializer: Initializer for the bias vector (see initializers).
• kernel_regularizer: Regularizer function applied to the kernel weights matrix (see
regularizer).
• bias_regularizer: Regularizer function applied to the bias vector (see regularizer).
• activity_regularizer: Regularizer function applied to the output of the layer (its "activation").
(see regularizer).
• kernel_constraint: Constraint function applied to the kernel matrix (see constraints).
• bias_constraint: Constraint function applied to the bias vector (see constraints).
Input shape
4D tensor with shape: (batch, channels, rows, cols) if data_format is
"channels_first" or 4D tensor with shape: (batch, rows, cols, channels) if
data_format is "channels_last".

Output shape
4D tensor with shape: (batch, filters, new_rows, new_cols) if data_format is
"channels_first" or 4D tensor with shape: (batch, new_rows, new_cols, filters)
if data_format is "channels_last". rows and cols values might have changed due to
padding. If output_padding is specified:
new_rows = ((rows - 1) * strides[0] + kernel_size[0]
- 2 * padding[0] + output_padding[0])
new_cols = ((cols - 1) * strides[1] + kernel_size[1]
- 2 * padding[1] + output_padding[1])
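
As a sketch of the common upsampling use (not from the original text; the usual imports
are assumed), stride 2 with "same" padding doubles the spatial dimensions:
model = Sequential()
model.add(Conv2DTranspose(16, (3, 3), strides=(2, 2), padding='same',
                          input_shape=(32, 32, 64)))
# model.output_shape == (None, 64, 64, 16)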

References
• A guide to convolution arithmetic for deep learning
• Deconvolutional Networks

Conv3D
keras.layers.Conv3D(filters, kernel_size, strides=(1, 1, 1), padding='valid',
data_format=None, dilation_rate=(1, 1, 1), activation=None, use_bias=True,
kernel_initializer='glorot_uniform', bias_initializer='zeros',
kernel_regularizer=None, bias_regularizer=None, activity_regularizer=None,
kernel_constraint=None, bias_constraint=None)

3D convolution layer (e.g. spatial convolution over volumes).


This layer creates a convolution kernel that is convolved with the layer input to produce a tensor of
outputs. If use_bias is True, a bias vector is created and added to the outputs. Finally, if
activation is not None, it is applied to the outputs as well.

When using this layer as the first layer in a model, provide the keyword argument input_shape
(tuple of integers, does not include the batch axis), e.g. input_shape=(128, 128, 128, 1)
for 128x128x128 volumes with a single channel, in data_format="channels_last".

Arguments
• filters: Integer, the dimensionality of the output space (i.e. the number of output filters in the
convolution).
• kernel_size: An integer or tuple/list of 3 integers, specifying the depth, height and width of the
3D convolution window. Can be a single integer to specify the same value for all spatial
dimensions.
• strides: An integer or tuple/list of 3 integers, specifying the strides of the convolution along
each spatial dimension. Can be a single integer to specify the same value for all spatial
dimensions. Specifying any stride value != 1 is incompatible with specifying any
dilation_rate value != 1.
• padding: one of "valid" or "same" (case-insensitive).
• data_format: A string, one of "channels_last" or "channels_first". The ordering
of the dimensions in the inputs. "channels_last" corresponds to inputs with shape
(batch, spatial_dim1, spatial_dim2, spatial_dim3, channels) while
"channels_first" corresponds to inputs with shape (batch, channels,
spatial_dim1, spatial_dim2, spatial_dim3). It defaults to the
image_data_format value found in your Keras config file at ~/.keras/keras.json.
If you never set it, then it will be "channels_last".
• dilation_rate: an integer or tuple/list of 3 integers, specifying the dilation rate to use for dilated
convolution. Can be a single integer to specify the same value for all spatial dimensions.
Currently, specifying any dilation_rate value != 1 is incompatible with specifying any
stride value != 1.
• activation: Activation function to use (see activations). If you don't specify anything, no
activation is applied (i.e. "linear" activation: a(x) = x).
• use_bias: Boolean, whether the layer uses a bias vector.
• kernel_initializer: Initializer for the kernel weights matrix (see initializers).
• bias_initializer: Initializer for the bias vector (see initializers).
• kernel_regularizer: Regularizer function applied to the kernel weights matrix (see
regularizer).
• bias_regularizer: Regularizer function applied to the bias vector (see regularizer).
• activity_regularizer: Regularizer function applied to the output of the layer (its "activation").
(see regularizer).
• kernel_constraint: Constraint function applied to the kernel matrix (see constraints).
• bias_constraint: Constraint function applied to the bias vector (see constraints).
Input shape
5D tensor with shape: (batch, channels, conv_dim1, conv_dim2, conv_dim3) if
data_format is "channels_first" or 5D tensor with shape: (batch, conv_dim1,
conv_dim2, conv_dim3, channels) if data_format is "channels_last".

Output shape
5D tensor with shape: (batch, filters, new_conv_dim1, new_conv_dim2,
new_conv_dim3) if data_format is "channels_first" or 5D tensor with shape: (batch,
new_conv_dim1, new_conv_dim2, new_conv_dim3, filters) if data_format is
"channels_last". new_conv_dim1, new_conv_dim2 and new_conv_dim3 values might
have changed due to padding.
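
For illustration, a minimal sketch (with hypothetical shape values) of a Conv3D over
single-channel volumes:
model = Sequential()
model.add(Conv3D(16, (3, 3, 3), activation='relu',
                 input_shape=(16, 128, 128, 1)))
# 'valid' padding shrinks each spatial dimension by 2:
# model.output_shape == (None, 14, 126, 126, 16)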


Conv3DTranspose
keras.layers.Conv3DTranspose(filters, kernel_size, strides=(1, 1, 1),
padding='valid', output_padding=None, data_format=None, activation=None,
use_bias=True, kernel_initializer='glorot_uniform', bias_initializer='zeros',
kernel_regularizer=None, bias_regularizer=None, activity_regularizer=None,
kernel_constraint=None, bias_constraint=None)

Transposed convolution layer (sometimes called Deconvolution).


The need for transposed convolutions generally arises from the desire to use a transformation going in
the opposite direction of a normal convolution, i.e., from something that has the shape of the output of
some convolution to something that has the shape of its input while maintaining a connectivity pattern
that is compatible with said convolution.
When using this layer as the first layer in a model, provide the keyword argument input_shape
(tuple of integers, does not include the batch axis), e.g. input_shape=(128, 128, 128, 3)
for a 128x128x128 volume with 3 channels if data_format="channels_last".

Arguments
• filters: Integer, the dimensionality of the output space (i.e. the number of output filters in the
convolution).
• kernel_size: An integer or tuple/list of 3 integers, specifying the depth, height and width of the
3D convolution window. Can be a single integer to specify the same value for all spatial
dimensions.
• strides: An integer or tuple/list of 3 integers, specifying the strides of the convolution along the
depth, height and width. Can be a single integer to specify the same value for all spatial
dimensions. Specifying any stride value != 1 is incompatible with specifying any
dilation_rate value != 1.
• padding: one of "valid" or "same" (case-insensitive).
• output_padding: An integer or tuple/list of 3 integers, specifying the amount of padding along
the depth, height, and width. Can be a single integer to specify the same value for all spatial
dimensions. The amount of output padding along a given dimension must be lower than the
stride along that same dimension. If set to None (default), the output shape is inferred.
• data_format: A string, one of "channels_last" or "channels_first". The ordering
of the dimensions in the inputs. "channels_last" corresponds to inputs with shape
(batch, depth, height, width, channels) while "channels_first"
corresponds to inputs with shape (batch, channels, depth, height, width). It
defaults to the image_data_format value found in your Keras config file at
~/.keras/keras.json. If you never set it, then it will be "channels_last".
• dilation_rate: an integer or tuple/list of 3 integers, specifying the dilation rate to use for dilated
convolution. Can be a single integer to specify the same value for all spatial dimensions.
Currently, specifying any dilation_rate value != 1 is incompatible with specifying any
stride value != 1.
• activation: Activation function to use (see activations). If you don't specify anything, no
activation is applied (i.e. "linear" activation: a(x) = x).
• use_bias: Boolean, whether the layer uses a bias vector.
• kernel_initializer: Initializer for the kernel weights matrix (see initializers).
• bias_initializer: Initializer for the bias vector (see initializers).
• kernel_regularizer: Regularizer function applied to the kernel weights matrix (see
regularizer).
• bias_regularizer: Regularizer function applied to the bias vector (see regularizer).
• activity_regularizer: Regularizer function applied to the output of the layer (its "activation").
(see regularizer).
• kernel_constraint: Constraint function applied to the kernel matrix (see constraints).
• bias_constraint: Constraint function applied to the bias vector (see constraints).
Input shape
5D tensor with shape: (batch, channels, depth, rows, cols) if data_format is
"channels_first" or 5D tensor with shape: (batch, depth, rows, cols, channels)
if data_format is "channels_last".

Output shape
5D tensor with shape: (batch, filters, new_depth, new_rows, new_cols) if
data_format is "channels_first" or 5D tensor with shape: (batch, new_depth,
new_rows, new_cols, filters) if data_format is "channels_last". depth, rows
and cols values might have changed due to padding. If output_padding is specified:
new_depth = ((depth - 1) * strides[0] + kernel_size[0]
- 2 * padding[0] + output_padding[0])
new_rows = ((rows - 1) * strides[1] + kernel_size[1]
- 2 * padding[1] + output_padding[1])
new_cols = ((cols - 1) * strides[2] + kernel_size[2]
- 2 * padding[2] + output_padding[2])

References
• A guide to convolution arithmetic for deep learning
• Deconvolutional Networks


Cropping1D
keras.layers.Cropping1D(cropping=(1, 1))

Cropping layer for 1D input (e.g. temporal sequence).


It crops along the time dimension (axis 1).
Arguments
• cropping: int, or tuple of int (length 2). How many units should be trimmed off at the beginning
and end of the cropping dimension (axis 1). If a single int is provided, the same value will be
used for both.
Input shape
3D tensor with shape (batch, axis_to_crop, features)

Output shape
3D tensor with shape (batch, cropped_axis, features)
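
A one-layer sketch (assuming the usual imports) of asymmetric 1D cropping:
model = Sequential()
model.add(Cropping1D(cropping=(2, 3), input_shape=(100, 8)))
# 2 steps trimmed from the start, 3 from the end:
# model.output_shape == (None, 95, 8)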


Cropping2D
keras.layers.Cropping2D(cropping=((0, 0), (0, 0)), data_format=None)

Cropping layer for 2D input (e.g. picture).


It crops along spatial dimensions, i.e. height and width.
Arguments
• cropping: int, or tuple of 2 ints, or tuple of 2 tuples of 2 ints.
• If int: the same symmetric cropping is applied to height and width.
• If tuple of 2 ints: interpreted as two different symmetric cropping values for height and
width: (symmetric_height_crop, symmetric_width_crop).
• If tuple of 2 tuples of 2 ints: interpreted as ((top_crop, bottom_crop),
(left_crop, right_crop))
• data_format: A string, one of "channels_last" or "channels_first". The ordering
of the dimensions in the inputs. "channels_last" corresponds to inputs with shape
(batch, height, width, channels) while "channels_first" corresponds to
inputs with shape (batch, channels, height, width). It defaults to the
image_data_format value found in your Keras config file at ~/.keras/keras.json.
If you never set it, then it will be "channels_last".
Input shape
4D tensor with shape: - If data_format is "channels_last": (batch, rows, cols,
channels) - If data_format is "channels_first": (batch, channels, rows,
cols)

Output shape
4D tensor with shape: - If data_format is "channels_last": (batch, cropped_rows,
cropped_cols, channels) - If data_format is "channels_first": (batch,
channels, cropped_rows, cropped_cols)

Examples
# Crop the input 2D images or feature maps
model = Sequential()
model.add(Cropping2D(cropping=((2, 2), (4, 4)),
input_shape=(28, 28, 3)))
# now model.output_shape == (None, 24, 20, 3)
model.add(Conv2D(64, (3, 3), padding='same'))
model.add(Cropping2D(cropping=((2, 2), (2, 2))))
# now model.output_shape == (None, 20, 16, 64)


Cropping3D
keras.layers.Cropping3D(cropping=((1, 1), (1, 1), (1, 1)), data_format=None)

Cropping layer for 3D data (e.g. spatial or spatio-temporal).


Arguments
• cropping: int, or tuple of 3 ints, or tuple of 3 tuples of 2 ints.
• If int: the same symmetric cropping is applied to depth, height, and width.
• If tuple of 3 ints: interpreted as three different symmetric cropping values for depth,
height, and width: (symmetric_dim1_crop, symmetric_dim2_crop,
symmetric_dim3_crop).
• If tuple of 3 tuples of 2 ints: interpreted as ((left_dim1_crop,
right_dim1_crop), (left_dim2_crop, right_dim2_crop),
(left_dim3_crop, right_dim3_crop))
• data_format: A string, one of "channels_last" or "channels_first". The ordering
of the dimensions in the inputs. "channels_last" corresponds to inputs with shape
(batch, spatial_dim1, spatial_dim2, spatial_dim3, channels) while
"channels_first" corresponds to inputs with shape (batch, channels,
spatial_dim1, spatial_dim2, spatial_dim3). It defaults to the
image_data_format value found in your Keras config file at ~/.keras/keras.json.
If you never set it, then it will be "channels_last".
Input shape
5D tensor with shape: - If data_format is "channels_last": (batch,
first_axis_to_crop, second_axis_to_crop, third_axis_to_crop, depth) -
If data_format is "channels_first": (batch, depth, first_axis_to_crop,
second_axis_to_crop, third_axis_to_crop)

Output shape
5D tensor with shape: - If data_format is "channels_last": (batch,
first_cropped_axis, second_cropped_axis, third_cropped_axis, depth) -
If data_format is "channels_first": (batch, depth, first_cropped_axis,
second_cropped_axis, third_cropped_axis)


UpSampling1D
keras.layers.UpSampling1D(size=2)

Upsampling layer for 1D inputs.


Repeats each temporal step size times along the time axis.

Arguments
• size: integer. Upsampling factor.
Input shape
3D tensor with shape: (batch, steps, features).

Output shape
3D tensor with shape: (batch, upsampled_steps, features).


UpSampling2D
keras.layers.UpSampling2D(size=(2, 2), data_format=None, interpolation='nearest')

Upsampling layer for 2D inputs.


Repeats the rows and columns of the data by size[0] and size[1] respectively.
Arguments
• size: int, or tuple of 2 integers. The upsampling factors for rows and columns.
• data_format: A string, one of "channels_last" or "channels_first". The ordering
of the dimensions in the inputs. "channels_last" corresponds to inputs with shape
(batch, height, width, channels) while "channels_first" corresponds to
inputs with shape (batch, channels, height, width). It defaults to the
image_data_format value found in your Keras config file at ~/.keras/keras.json.
If you never set it, then it will be "channels_last".
• interpolation: A string, one of nearest or bilinear. Note that CNTK does not yet support
bilinear upscaling, and that with Theano only size=(2, 2) is possible.
Input shape
4D tensor with shape: - If data_format is "channels_last": (batch, rows, cols,
channels) - If data_format is "channels_first": (batch, channels, rows,
cols)

Output shape
4D tensor with shape: - If data_format is "channels_last": (batch,
upsampled_rows, upsampled_cols, channels) - If data_format is
"channels_first": (batch, channels, upsampled_rows, upsampled_cols)


UpSampling3D
keras.layers.UpSampling3D(size=(2, 2, 2), data_format=None)

Upsampling layer for 3D inputs.


Repeats the 1st, 2nd and 3rd dimensions of the data by size[0], size[1] and size[2] respectively.
Arguments
• size: int, or tuple of 3 integers. The upsampling factors for dim1, dim2 and dim3.
• data_format: A string, one of "channels_last" or "channels_first". The ordering
of the dimensions in the inputs. "channels_last" corresponds to inputs with shape
(batch, spatial_dim1, spatial_dim2, spatial_dim3, channels) while
"channels_first" corresponds to inputs with shape (batch, channels,
spatial_dim1, spatial_dim2, spatial_dim3). It defaults to the
image_data_format value found in your Keras config file at ~/.keras/keras.json.
If you never set it, then it will be "channels_last".
Input shape
5D tensor with shape: - If data_format is "channels_last": (batch, dim1, dim2,
dim3, channels) - If data_format is "channels_first": (batch, channels,
dim1, dim2, dim3)

Output shape
5D tensor with shape: - If data_format is "channels_last": (batch,
upsampled_dim1, upsampled_dim2, upsampled_dim3, channels) - If
data_format is "channels_first": (batch, channels, upsampled_dim1,
upsampled_dim2, upsampled_dim3)

ZeroPadding1D
keras.layers.ZeroPadding1D(padding=1)

Zero-padding layer for 1D input (e.g. temporal sequence).


Arguments
• padding: int, or tuple of int (length 2), or dictionary.
• If int: how many zeros to add at the beginning and end of the padding dimension (axis 1).
• If tuple of int (length 2): how many zeros to add at the beginning and at the end of the
padding dimension ((left_pad, right_pad)).

Input shape
3D tensor with shape (batch, axis_to_pad, features)

Output shape
3D tensor with shape (batch, padded_axis, features)


ZeroPadding2D
keras.layers.ZeroPadding2D(padding=(1, 1), data_format=None)

Zero-padding layer for 2D input (e.g. picture).


This layer can add rows and columns of zeros at the top, bottom, left and right side of an image tensor.
Arguments
• padding: int, or tuple of 2 ints, or tuple of 2 tuples of 2 ints.
• If int: the same symmetric padding is applied to height and width.
• If tuple of 2 ints: interpreted as two different symmetric padding values for height and
width: (symmetric_height_pad, symmetric_width_pad).
• If tuple of 2 tuples of 2 ints: interpreted as ((top_pad, bottom_pad),
(left_pad, right_pad))
• data_format: A string, one of "channels_last" or "channels_first". The ordering
of the dimensions in the inputs. "channels_last" corresponds to inputs with shape
(batch, height, width, channels) while "channels_first" corresponds to
inputs with shape (batch, channels, height, width). It defaults to the
image_data_format value found in your Keras config file at ~/.keras/keras.json.
If you never set it, then it will be "channels_last".
Input shape
4D tensor with shape: - If data_format is "channels_last": (batch, rows, cols,
channels) - If data_format is "channels_first": (batch, channels, rows,
cols)

Output shape
4D tensor with shape: - If data_format is "channels_last": (batch, padded_rows,
padded_cols, channels) - If data_format is "channels_first": (batch,
channels, padded_rows, padded_cols)
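
A sketch of the asymmetric form (the values are illustrative, not from the original text):
model = Sequential()
model.add(ZeroPadding2D(padding=((1, 2), (3, 4)),
                        input_shape=(28, 28, 3)))
# 1 row of zeros on top, 2 at the bottom, 3 columns on the left,
# 4 on the right: model.output_shape == (None, 31, 35, 3)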


ZeroPadding3D
keras.layers.ZeroPadding3D(padding=(1, 1, 1), data_format=None)

Zero-padding layer for 3D data (spatial or spatio-temporal).


Arguments
• padding: int, or tuple of 3 ints, or tuple of 3 tuples of 2 ints.
• If int: the same symmetric padding is applied to depth, height, and width.
• If tuple of 3 ints: interpreted as three different symmetric padding values for depth,
height, and width: (symmetric_dim1_pad, symmetric_dim2_pad,
symmetric_dim3_pad).
• If tuple of 3 tuples of 2 ints: interpreted as ((left_dim1_pad,
right_dim1_pad), (left_dim2_pad, right_dim2_pad),
(left_dim3_pad, right_dim3_pad))
• data_format: A string, one of "channels_last" or "channels_first". The ordering
of the dimensions in the inputs. "channels_last" corresponds to inputs with shape
(batch, spatial_dim1, spatial_dim2, spatial_dim3, channels) while
"channels_first" corresponds to inputs with shape (batch, channels,
spatial_dim1, spatial_dim2, spatial_dim3). It defaults to the
image_data_format value found in your Keras config file at ~/.keras/keras.json.
If you never set it, then it will be "channels_last".
Input shape
5D tensor with shape: - If data_format is "channels_last": (batch,
first_axis_to_pad, second_axis_to_pad, third_axis_to_pad, depth) - If
data_format is "channels_first": (batch, depth, first_axis_to_pad,
second_axis_to_pad, third_axis_to_pad)

Output shape
5D tensor with shape: - If data_format is "channels_last": (batch,
first_padded_axis, second_padded_axis, third_padded_axis, depth) - If
data_format is "channels_first": (batch, depth, first_padded_axis,
second_padded_axis, third_padded_axis)

MaxPooling1D
keras.layers.MaxPooling1D(pool_size=2, strides=None, padding='valid',
data_format='channels_last')

Max pooling operation for temporal data.


Arguments
• pool_size: Integer, size of the max pooling windows.
• strides: Integer, or None. Factor by which to downscale. E.g. 2 will halve the input. If None, it
will default to pool_size.
• padding: One of "valid" or "same" (case-insensitive).
• data_format: A string, one of channels_last (default) or channels_first. The
ordering of the dimensions in the inputs. channels_last corresponds to inputs with shape
(batch, steps, features) while channels_first corresponds to inputs with
shape (batch, features, steps).

Input shape
• If data_format='channels_last': 3D tensor with shape: (batch_size, steps,
features)
• If data_format='channels_first': 3D tensor with shape: (batch_size,
features, steps)

Output shape
• If data_format='channels_last': 3D tensor with shape: (batch_size,
downsampled_steps, features)
• If data_format='channels_first': 3D tensor with shape: (batch_size,
features, downsampled_steps)
MaxPooling2D
keras.layers.MaxPooling2D(pool_size=(2, 2), strides=None, padding='valid',
data_format=None)

Max pooling operation for spatial data.


Arguments
• pool_size: integer or tuple of 2 integers, factors by which to downscale (vertical, horizontal). (2,
2) will halve the input in both spatial dimensions. If only one integer is specified, the same
window length will be used for both dimensions.
• strides: Integer, tuple of 2 integers, or None. Strides values. If None, it will default to
pool_size.
• padding: One of "valid" or "same" (case-insensitive).
• data_format: A string, one of channels_last (default) or channels_first. The
ordering of the dimensions in the inputs. channels_last corresponds to inputs with shape
(batch, height, width, channels) while channels_first corresponds to
inputs with shape (batch, channels, height, width). It defaults to the
image_data_format value found in your Keras config file at ~/.keras/keras.json.
If you never set it, then it will be "channels_last".
Input shape
• If data_format='channels_last': 4D tensor with shape: (batch_size, rows,
cols, channels)
• If data_format='channels_first': 4D tensor with shape: (batch_size,
channels, rows, cols)

Output shape
• If data_format='channels_last': 4D tensor with shape: (batch_size,
pooled_rows, pooled_cols, channels)
• If data_format='channels_first': 4D tensor with shape: (batch_size,
channels, pooled_rows, pooled_cols)
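
A minimal sketch (assuming the usual imports); note that strides defaults to pool_size:
model = Sequential()
model.add(MaxPooling2D(pool_size=(2, 2), input_shape=(28, 28, 16)))
# both spatial dimensions are halved:
# model.output_shape == (None, 14, 14, 16)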

MaxPooling3D
keras.layers.MaxPooling3D(pool_size=(2, 2, 2), strides=None, padding='valid',
data_format=None)

Max pooling operation for 3D data (spatial or spatio-temporal).


Arguments
• pool_size: tuple of 3 integers, factors by which to downscale (dim1, dim2, dim3). (2, 2, 2) will
halve the size of the 3D input in each dimension.
• strides: tuple of 3 integers, or None. Strides values. If None, it will default to pool_size.
• padding: One of "valid" or "same" (case-insensitive).
• data_format: A string, one of channels_last (default) or channels_first. The
ordering of the dimensions in the inputs. channels_last corresponds to inputs with shape
(batch, spatial_dim1, spatial_dim2, spatial_dim3, channels) while
channels_first corresponds to inputs with shape (batch, channels,
spatial_dim1, spatial_dim2, spatial_dim3). It defaults to the
image_data_format value found in your Keras config file at ~/.keras/keras.json.
If you never set it, then it will be "channels_last".
Input shape
• If data_format='channels_last': 5D tensor with shape: (batch_size,
spatial_dim1, spatial_dim2, spatial_dim3, channels)
• If data_format='channels_first': 5D tensor with shape: (batch_size,
channels, spatial_dim1, spatial_dim2, spatial_dim3)

Output shape
• If data_format='channels_last': 5D tensor with shape: (batch_size,
pooled_dim1, pooled_dim2, pooled_dim3, channels)
• If data_format='channels_first': 5D tensor with shape: (batch_size,
channels, pooled_dim1, pooled_dim2, pooled_dim3)

AveragePooling1D
keras.layers.AveragePooling1D(pool_size=2, strides=None, padding='valid',
data_format='channels_last')

Average pooling for temporal data.


Arguments
• pool_size: Integer, size of the average pooling windows.
• strides: Integer, or None. Factor by which to downscale. E.g. 2 will halve the input. If None, it
will default to pool_size.
• padding: One of "valid" or "same" (case-insensitive).
• data_format: A string, one of channels_last (default) or channels_first. The
ordering of the dimensions in the inputs. channels_last corresponds to inputs with shape
(batch, steps, features) while channels_first corresponds to inputs with
shape (batch, features, steps).
Input shape
• If data_format='channels_last': 3D tensor with shape: (batch_size, steps,
features)
• If data_format='channels_first': 3D tensor with shape: (batch_size,
features, steps)

Output shape
• If data_format='channels_last': 3D tensor with shape: (batch_size,
downsampled_steps, features)
• If data_format='channels_first': 3D tensor with shape: (batch_size,
features, downsampled_steps)

AveragePooling2D
keras.layers.AveragePooling2D(pool_size=(2, 2), strides=None, padding='valid',
data_format=None)

Average pooling operation for spatial data.


Arguments
• pool_size: integer or tuple of 2 integers, factors by which to downscale (vertical, horizontal). (2,
2) will halve the input in both spatial dimensions. If only one integer is specified, the same
window length will be used for both dimensions.
• strides: Integer, tuple of 2 integers, or None. Strides values. If None, it will default to
pool_size.
• padding: One of "valid" or "same" (case-insensitive).
• data_format: A string, one of channels_last (default) or channels_first. The
ordering of the dimensions in the inputs. channels_last corresponds to inputs with shape
(batch, height, width, channels) while channels_first corresponds to
inputs with shape (batch, channels, height, width). It defaults to the
image_data_format value found in your Keras config file at ~/.keras/keras.json.
If you never set it, then it will be "channels_last".
Input shape
• If data_format='channels_last': 4D tensor with shape: (batch_size, rows,
cols, channels)
• If data_format='channels_first': 4D tensor with shape: (batch_size,
channels, rows, cols)

Output shape
• If data_format='channels_last': 4D tensor with shape: (batch_size,
pooled_rows, pooled_cols, channels)
• If data_format='channels_first': 4D tensor with shape: (batch_size,
channels, pooled_rows, pooled_cols)

AveragePooling3D
keras.layers.AveragePooling3D(pool_size=(2, 2, 2), strides=None, padding='valid',
data_format=None)

Average pooling operation for 3D data (spatial or spatio-temporal).


Arguments
• pool_size: tuple of 3 integers, factors by which to downscale (dim1, dim2, dim3). (2, 2, 2) will
halve the size of the 3D input in each dimension.
• strides: tuple of 3 integers, or None. Strides values. If None, it will default to pool_size.
• padding: One of "valid" or "same" (case-insensitive).
• data_format: A string, one of channels_last (default) or channels_first. The
ordering of the dimensions in the inputs. channels_last corresponds to inputs with shape
(batch, spatial_dim1, spatial_dim2, spatial_dim3, channels) while
channels_first corresponds to inputs with shape (batch, channels,
spatial_dim1, spatial_dim2, spatial_dim3). It defaults to the
image_data_format value found in your Keras config file at ~/.keras/keras.json.
If you never set it, then it will be "channels_last".
Input shape
• If data_format='channels_last': 5D tensor with shape: (batch_size,
spatial_dim1, spatial_dim2, spatial_dim3, channels)
• If data_format='channels_first': 5D tensor with shape: (batch_size,
channels, spatial_dim1, spatial_dim2, spatial_dim3)

Output shape
• If data_format='channels_last': 5D tensor with shape: (batch_size,
pooled_dim1, pooled_dim2, pooled_dim3, channels)
• If data_format='channels_first': 5D tensor with shape: (batch_size,
channels, pooled_dim1, pooled_dim2, pooled_dim3)

GlobalMaxPooling1D
keras.layers.GlobalMaxPooling1D(data_format='channels_last')
Global max pooling operation for temporal data.
Arguments
• data_format: A string, one of channels_last (default) or channels_first. The
ordering of the dimensions in the inputs. channels_last corresponds to inputs with shape
(batch, steps, features) while channels_first corresponds to inputs with
shape (batch, features, steps).

Input shape
• If data_format='channels_last': 3D tensor with shape: (batch_size, steps,
features)
• If data_format='channels_first': 3D tensor with shape: (batch_size,
features, steps)

Output shape
2D tensor with shape: (batch_size, features)

GlobalAveragePooling1D
keras.layers.GlobalAveragePooling1D(data_format='channels_last')

Global average pooling operation for temporal data.


Arguments
• data_format: A string, one of channels_last (default) or channels_first. The
ordering of the dimensions in the inputs. channels_last corresponds to inputs with shape
(batch, steps, features) while channels_first corresponds to inputs with
shape (batch, features, steps).

Input shape
• If data_format='channels_last': 3D tensor with shape: (batch_size, steps,
features)
• If data_format='channels_first': 3D tensor with shape: (batch_size,
features, steps)

Output shape
2D tensor with shape: (batch_size, features)
GlobalMaxPooling2D
keras.layers.GlobalMaxPooling2D(data_format=None)

Global max pooling operation for spatial data.


Arguments
• data_format: A string, one of channels_last (default) or channels_first. The
ordering of the dimensions in the inputs. channels_last corresponds to inputs with shape
(batch, height, width, channels) while channels_first corresponds to
inputs with shape (batch, channels, height, width). It defaults to the
image_data_format value found in your Keras config file at ~/.keras/keras.json.
If you never set it, then it will be "channels_last".
Input shape
• If data_format='channels_last': 4D tensor with shape: (batch_size, rows,
cols, channels)
• If data_format='channels_first': 4D tensor with shape: (batch_size,
channels, rows, cols)

Output shape
2D tensor with shape: (batch_size, channels)

GlobalAveragePooling2D
keras.layers.GlobalAveragePooling2D(data_format=None)

Global average pooling operation for spatial data.


Arguments
• data_format: A string, one of channels_last (default) or channels_first. The
ordering of the dimensions in the inputs. channels_last corresponds to inputs with shape
(batch, height, width, channels) while channels_first corresponds to
inputs with shape (batch, channels, height, width). It defaults to the
image_data_format value found in your Keras config file at ~/.keras/keras.json.
If you never set it, then it will be "channels_last".
Input shape
• If data_format='channels_last': 4D tensor with shape: (batch_size, rows,
cols, channels)
• If data_format='channels_first': 4D tensor with shape: (batch_size,
channels, rows, cols)

Output shape
2D tensor with shape: (batch_size, channels)
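
As an illustrative sketch (layer sizes are hypothetical), global average pooling is often
used in place of Flatten before the classifier head:
model = Sequential()
model.add(Conv2D(64, (3, 3), input_shape=(32, 32, 3)))
model.add(GlobalAveragePooling2D())
# each of the 64 feature maps is averaged to a single value:
# model.output_shape == (None, 64)
model.add(Dense(10, activation='softmax'))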

GlobalMaxPooling3D
keras.layers.GlobalMaxPooling3D(data_format=None)

Global max pooling operation for 3D data.


Arguments
• data_format: A string, one of channels_last (default) or channels_first. The
ordering of the dimensions in the inputs. channels_last corresponds to inputs with shape
(batch, spatial_dim1, spatial_dim2, spatial_dim3, channels) while
channels_first corresponds to inputs with shape (batch, channels,
spatial_dim1, spatial_dim2, spatial_dim3). It defaults to the
image_data_format value found in your Keras config file at ~/.keras/keras.json.
If you never set it, then it will be "channels_last".
Input shape
• If data_format='channels_last': 5D tensor with shape: (batch_size,
spatial_dim1, spatial_dim2, spatial_dim3, channels)
• If data_format='channels_first': 5D tensor with shape: (batch_size,
channels, spatial_dim1, spatial_dim2, spatial_dim3)

Output shape
2D tensor with shape: (batch_size, channels)

GlobalAveragePooling3D
keras.layers.GlobalAveragePooling3D(data_format=None)

Global average pooling operation for 3D data.


Arguments
• data_format: A string, one of channels_last (default) or channels_first. The
ordering of the dimensions in the inputs. channels_last corresponds to inputs with shape
(batch, spatial_dim1, spatial_dim2, spatial_dim3, channels) while
channels_first corresponds to inputs with shape (batch, channels,
spatial_dim1, spatial_dim2, spatial_dim3). It defaults to the
image_data_format value found in your Keras config file at ~/.keras/keras.json.
If you never set it, then it will be "channels_last".
Input shape
• If data_format='channels_last': 5D tensor with shape: (batch_size,
spatial_dim1, spatial_dim2, spatial_dim3, channels)
• If data_format='channels_first': 5D tensor with shape: (batch_size,
channels, spatial_dim1, spatial_dim2, spatial_dim3)

Output shape
2D tensor with shape: (batch_size, channels)

LocallyConnected1D
keras.layers.LocallyConnected1D(filters, kernel_size, strides=1, padding='valid',
data_format=None, activation=None, use_bias=True,
kernel_initializer='glorot_uniform', bias_initializer='zeros',
kernel_regularizer=None, bias_regularizer=None, activity_regularizer=None,
kernel_constraint=None, bias_constraint=None)

Locally-connected layer for 1D inputs.


The LocallyConnected1D layer works similarly to the Conv1D layer, except that weights are
unshared, that is, a different set of filters is applied at each different patch of the input.
Example
# apply an unshared-weights 1D convolution of length 3 to a sequence with
# 10 timesteps, with 64 output filters
model = Sequential()
model.add(LocallyConnected1D(64, 3, input_shape=(10, 32)))
# now model.output_shape == (None, 8, 64)
# add a new conv1d on top
model.add(LocallyConnected1D(32, 3))
# now model.output_shape == (None, 6, 32)

Arguments
• filters: Integer, the dimensionality of the output space (i.e. the number of output filters in the
convolution).
• kernel_size: An integer or tuple/list of a single integer, specifying the length of the 1D
convolution window.
• strides: An integer or tuple/list of a single integer, specifying the stride length of the
convolution. Specifying any stride value != 1 is incompatible with specifying any
dilation_rate value != 1.
• padding: Currently only supports "valid" (case-insensitive). "same" may be supported in
the future.
• data_format: String, one of channels_first, channels_last.
• activation: Activation function to use (see activations). If you don't specify anything, no
activation is applied (i.e. "linear" activation: a(x) = x).
• use_bias: Boolean, whether the layer uses a bias vector.
• kernel_initializer: Initializer for the kernel weights matrix (see initializers).
• bias_initializer: Initializer for the bias vector (see initializers).
• kernel_regularizer: Regularizer function applied to the kernel weights matrix (see
regularizer).
• bias_regularizer: Regularizer function applied to the bias vector (see regularizer).
• activity_regularizer: Regularizer function applied to the output of the layer (its "activation").
(see regularizer).
• kernel_constraint: Constraint function applied to the kernel matrix (see constraints).
• bias_constraint: Constraint function applied to the bias vector (see constraints).
Input shape
3D tensor with shape: (batch_size, steps, input_dim)

Output shape
3D tensor with shape: (batch_size, new_steps, filters). new_steps value might have
changed due to padding or strides.

LocallyConnected2D
keras.layers.LocallyConnected2D(filters, kernel_size, strides=(1, 1),
padding='valid', data_format=None, activation=None, use_bias=True,
kernel_initializer='glorot_uniform', bias_initializer='zeros',
kernel_regularizer=None, bias_regularizer=None, activity_regularizer=None,
kernel_constraint=None, bias_constraint=None)

Locally-connected layer for 2D inputs.


The LocallyConnected2D layer works similarly to the Conv2D layer, except that weights are
unshared, that is, a different set of filters is applied at each different patch of the input.
Examples
# apply a 3x3 unshared weights convolution with 64 output filters
# on a 32x32 image with `data_format="channels_last"`:
model = Sequential()
model.add(LocallyConnected2D(64, (3, 3), input_shape=(32, 32, 3)))
# now model.output_shape == (None, 30, 30, 64)
# notice that this layer will consume (30*30)*(3*3*3*64)
# + (30*30)*64 parameters
# add a 3x3 unshared weights convolution on top, with 32 output filters:
model.add(LocallyConnected2D(32, (3, 3)))
# now model.output_shape == (None, 28, 28, 32)

Arguments
• filters: Integer, the dimensionality of the output space (i.e. the number of output filters in the
convolution).
• kernel_size: An integer or tuple/list of 2 integers, specifying the width and height of the 2D
convolution window. Can be a single integer to specify the same value for all spatial
dimensions.
• strides: An integer or tuple/list of 2 integers, specifying the strides of the convolution along the
width and height. Can be a single integer to specify the same value for all spatial dimensions.
• padding: Currently only supports "valid" (case-insensitive). "same" will be supported in
the future.
• data_format: A string, one of channels_last (default) or channels_first. The
ordering of the dimensions in the inputs. channels_last corresponds to inputs with shape
(batch, height, width, channels) while channels_first corresponds to
inputs with shape (batch, channels, height, width). It defaults to the
image_data_format value found in your Keras config file at ~/.keras/keras.json.
If you never set it, then it will be "channels_last".
• activation: Activation function to use (see activations). If you don't specify anything, no
activation is applied (i.e. "linear" activation: a(x) = x).
• use_bias: Boolean, whether the layer uses a bias vector.
• kernel_initializer: Initializer for the kernel weights matrix (see initializers).
• bias_initializer: Initializer for the bias vector (see initializers).
• kernel_regularizer: Regularizer function applied to the kernel weights matrix (see
regularizer).
• bias_regularizer: Regularizer function applied to the bias vector (see regularizer).
• activity_regularizer: Regularizer function applied to the output of the layer (its "activation").
(see regularizer).
• kernel_constraint: Constraint function applied to the kernel matrix (see constraints).
• bias_constraint: Constraint function applied to the bias vector (see constraints).
Input shape
4D tensor with shape: (samples, channels, rows, cols) if data_format='channels_first' or
4D tensor with shape: (samples, rows, cols, channels) if data_format='channels_last'.

Output shape
4D tensor with shape: (samples, filters, new_rows, new_cols) if
data_format='channels_first' or 4D tensor with shape: (samples, new_rows, new_cols,
filters) if data_format='channels_last'. rows and cols values might have changed due to
padding.

RNN
keras.layers.RNN(cell, return_sequences=False, return_state=False,
go_backwards=False, stateful=False, unroll=False)

Base class for recurrent layers.


Arguments
• cell: An RNN cell instance. An RNN cell is a class that has:
• a call(input_at_t, states_at_t) method, returning (output_at_t,
states_at_t_plus_1). The call method of the cell can also take the optional
argument constants, see section "Note on passing external constants" below.
• a state_size attribute. This can be a single integer (single state) in which case it is
the size of the recurrent state (which should be the same as the size of the cell output).
This can also be a list/tuple of integers (one size per state).
• an output_size attribute. This can be a single integer or a TensorShape, which
represents the shape of the output. For backward compatibility, if this attribute is not
available for the cell, the value will be inferred from the first element of
state_size.

It is also possible for cell to be a list of RNN cell instances, in which case the cells get
stacked one after the other in the RNN, implementing an efficient stacked RNN.
• return_sequences: Boolean. Whether to return the last output in the output sequence, or the full
sequence.
• return_state: Boolean. Whether to return the last state in addition to the output.
• go_backwards: Boolean (default False). If True, process the input sequence backwards and
return the reversed sequence.
• stateful: Boolean (default False). If True, the last state for each sample at index i in a batch will
be used as initial state for the sample of index i in the following batch.
• unroll: Boolean (default False). If True, the network will be unrolled, else a symbolic loop will
be used. Unrolling can speed up an RNN, although it tends to be more memory-intensive.
Unrolling is only suitable for short sequences.
• input_dim: dimensionality of the input (integer). This argument (or alternatively, the keyword
argument input_shape) is required when using this layer as the first layer in a model.
• input_length: Length of input sequences, to be specified when it is constant. This argument is
required if you are going to connect Flatten then Dense layers upstream (without it, the
shape of the dense outputs cannot be computed). Note that if the recurrent layer is not the first
layer in your model, you would need to specify the input length at the level of the first layer
(e.g. via the input_shape argument).

Input shape
3D tensor with shape (batch_size, timesteps, input_dim).

Output shape
• if return_state: a list of tensors. The first tensor is the output. The remaining tensors are
the last states, each with shape (batch_size, units). For example, the number of state
tensors is 1 (for SimpleRNN and GRU) or 2 (for LSTM).
• if return_sequences: 3D tensor with shape (batch_size, timesteps, units).
• else, 2D tensor with shape (batch_size, units).

Masking
This layer supports masking for input data with a variable number of timesteps. To introduce masks to
your data, use an Embedding layer with the mask_zero parameter set to True.

Note on using statefulness in RNNs


You can set RNN layers to be 'stateful', which means that the states computed for the samples in one
batch will be reused as initial states for the samples in the next batch. This assumes a one-to-one
mapping between samples in different successive batches.
To enable statefulness:
• Specify stateful=True in the layer constructor.
• Specify a fixed batch size for your model: for a sequential model, pass
batch_input_shape=(...) to the first layer; for a functional model with one or more
Input layers, pass batch_shape=(...) to all the first layers. This is the expected shape of
your inputs including the batch size, as a tuple of integers, e.g. (32, 10, 100).
• Specify shuffle=False when calling fit().

To reset the states of your model, call .reset_states() on either a specific layer, or on your entire
model.
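For illustration, here is a minimal sketch of a stateful recurrent model (the batch size of 32 and all layer sizes below are arbitrary assumptions):
from keras.models import Sequential
from keras.layers import LSTM, Dense

# Fixed batch size of 32; sequences of 10 timesteps with 100 features each.
model = Sequential()
model.add(LSTM(16, batch_input_shape=(32, 10, 100), stateful=True))
model.add(Dense(1))
model.compile(optimizer='rmsprop', loss='mse')

# Train without shuffling, so that sample i of one batch follows
# sample i of the previous batch:
# model.fit(x, y, batch_size=32, shuffle=False)

# Clear the accumulated states between independent sequences:
model.reset_states()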
Note on specifying the initial state of RNNs
You can specify the initial state of RNN layers symbolically by calling them with the keyword
argument initial_state. The value of initial_state should be a tensor or list of tensors
representing the initial state of the RNN layer.
You can specify the initial state of RNN layers numerically by calling reset_states with the
keyword argument states. The value of states should be a numpy array or list of numpy arrays
representing the initial state of the RNN layer.
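A minimal sketch of the symbolic form (the encoder/decoder pairing below is illustrative, not part of the API):
from keras.layers import Input, LSTM

encoder_inputs = Input(shape=(None, 32))
# return_state=True makes the LSTM also return its final hidden
# and cell states.
encoder_outputs, state_h, state_c = LSTM(64, return_state=True)(encoder_inputs)

decoder_inputs = Input(shape=(None, 32))
# The decoder starts from the encoder's final states.
decoder_outputs = LSTM(64, return_sequences=True)(
    decoder_inputs, initial_state=[state_h, state_c])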
Note on passing external constants to RNNs
You can pass "external" constants to the cell using the constants keyword argument of
RNN.__call__ (as well as RNN.call) method. This requires that the cell.call method accepts
the same keyword argument constants. Such constants can be used to condition the cell
transformation on additional static inputs (not changing over time), a.k.a. an attention mechanism.
Examples
# First, let's define a RNN Cell, as a layer subclass.
import keras
from keras import backend as K
from keras.layers import RNN

class MinimalRNNCell(keras.layers.Layer):

    def __init__(self, units, **kwargs):
        self.units = units
        self.state_size = units
        super(MinimalRNNCell, self).__init__(**kwargs)

    def build(self, input_shape):
        self.kernel = self.add_weight(shape=(input_shape[-1], self.units),
                                      initializer='uniform',
                                      name='kernel')
        self.recurrent_kernel = self.add_weight(
            shape=(self.units, self.units),
            initializer='uniform',
            name='recurrent_kernel')
        self.built = True

    def call(self, inputs, states):
        prev_output = states[0]
        h = K.dot(inputs, self.kernel)
        output = h + K.dot(prev_output, self.recurrent_kernel)
        return output, [output]

# Let's use this cell in a RNN layer:
cell = MinimalRNNCell(32)
x = keras.Input((None, 5))
layer = RNN(cell)
y = layer(x)

# Here's how to use the cell to build a stacked RNN:
cells = [MinimalRNNCell(32), MinimalRNNCell(64)]
x = keras.Input((None, 5))
layer = RNN(cells)
y = layer(x)

SimpleRNN
keras.layers.SimpleRNN(units, activation='tanh', use_bias=True,
kernel_initializer='glorot_uniform', recurrent_initializer='orthogonal',
bias_initializer='zeros', kernel_regularizer=None, recurrent_regularizer=None,
bias_regularizer=None, activity_regularizer=None, kernel_constraint=None,
recurrent_constraint=None, bias_constraint=None, dropout=0.0,
recurrent_dropout=0.0, return_sequences=False, return_state=False,
go_backwards=False, stateful=False, unroll=False)
Fully-connected RNN where the output is to be fed back to input.
Arguments
• units: Positive integer, dimensionality of the output space.
• activation: Activation function to use (see activations). Default: hyperbolic tangent (tanh). If
you pass None, no activation is applied (ie. "linear" activation: a(x) = x).
• use_bias: Boolean, whether the layer uses a bias vector.
• kernel_initializer: Initializer for the kernel weights matrix, used for the linear transformation
of the inputs (see initializers).
• recurrent_initializer: Initializer for the recurrent_kernel weights matrix, used for the
linear transformation of the recurrent state (see initializers).
• bias_initializer: Initializer for the bias vector (see initializers).
• kernel_regularizer: Regularizer function applied to the kernel weights matrix (see
regularizer).
• recurrent_regularizer: Regularizer function applied to the recurrent_kernel weights
matrix (see regularizer).
• bias_regularizer: Regularizer function applied to the bias vector (see regularizer).
• activity_regularizer: Regularizer function applied to the output of the layer (its "activation").
(see regularizer).
• kernel_constraint: Constraint function applied to the kernel weights matrix (see
constraints).
• recurrent_constraint: Constraint function applied to the recurrent_kernel weights
matrix (see constraints).
• bias_constraint: Constraint function applied to the bias vector (see constraints).
• dropout: Float between 0 and 1. Fraction of the units to drop for the linear transformation of the
inputs.
• recurrent_dropout: Float between 0 and 1. Fraction of the units to drop for the linear
transformation of the recurrent state.
• return_sequences: Boolean. Whether to return the last output in the output sequence, or the full
sequence.
• return_state: Boolean. Whether to return the last state in addition to the output.
• go_backwards: Boolean (default False). If True, process the input sequence backwards and
return the reversed sequence.
• stateful: Boolean (default False). If True, the last state for each sample at index i in a batch will
be used as initial state for the sample of index i in the following batch.
• unroll: Boolean (default False). If True, the network will be unrolled, else a symbolic loop will
be used. Unrolling can speed-up a RNN, although it tends to be more memory-intensive.
Unrolling is only suitable for short sequences.
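A minimal usage sketch (the sequence length, feature count, and binary-classification head are illustrative assumptions):
from keras.models import Sequential
from keras.layers import SimpleRNN, Dense

model = Sequential()
# 20 timesteps with 8 features each -> one 32-dimensional output per
# sequence (return_sequences defaults to False).
model.add(SimpleRNN(32, input_shape=(20, 8)))
model.add(Dense(1, activation='sigmoid'))
model.compile(optimizer='rmsprop', loss='binary_crossentropy')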
GRU
keras.layers.GRU(units, activation='tanh', recurrent_activation='sigmoid',
use_bias=True, kernel_initializer='glorot_uniform',
recurrent_initializer='orthogonal', bias_initializer='zeros',
kernel_regularizer=None, recurrent_regularizer=None, bias_regularizer=None,
activity_regularizer=None, kernel_constraint=None, recurrent_constraint=None,
bias_constraint=None, dropout=0.0, recurrent_dropout=0.0, implementation=2,
return_sequences=False, return_state=False, go_backwards=False, stateful=False,
unroll=False, reset_after=False)

Gated Recurrent Unit - Cho et al. 2014.


There are two variants. The default one is based on arXiv 1406.1078v3 and applies the reset gate to the
hidden state before the matrix multiplication. The other one is based on the original 1406.1078v1 and
has the order reversed.
The second variant is compatible with CuDNNGRU (GPU-only) and allows inference on CPU; it
therefore has separate biases for kernel and recurrent_kernel. To use this variant, set
reset_after=True and recurrent_activation='sigmoid'.

Arguments
• units: Positive integer, dimensionality of the output space.
• activation: Activation function to use (see activations). Default: hyperbolic tangent (tanh). If
you pass None, no activation is applied (ie. "linear" activation: a(x) = x).
• recurrent_activation: Activation function to use for the recurrent step (see activations).
Default: sigmoid (sigmoid). If you pass None, no activation is applied (ie. "linear"
activation: a(x) = x).
• use_bias: Boolean, whether the layer uses a bias vector.
• kernel_initializer: Initializer for the kernel weights matrix, used for the linear transformation
of the inputs (see initializers).
• recurrent_initializer: Initializer for the recurrent_kernel weights matrix, used for the
linear transformation of the recurrent state (see initializers).
• bias_initializer: Initializer for the bias vector (see initializers).
• kernel_regularizer: Regularizer function applied to the kernel weights matrix (see
regularizer).
• recurrent_regularizer: Regularizer function applied to the recurrent_kernel weights
matrix (see regularizer).
• bias_regularizer: Regularizer function applied to the bias vector (see regularizer).
• activity_regularizer: Regularizer function applied to the output of the layer (its "activation").
(see regularizer).
• kernel_constraint: Constraint function applied to the kernel weights matrix (see
constraints).
• recurrent_constraint: Constraint function applied to the recurrent_kernel weights
matrix (see constraints).
• bias_constraint: Constraint function applied to the bias vector (see constraints).
• dropout: Float between 0 and 1. Fraction of the units to drop for the linear transformation of the
inputs.
• recurrent_dropout: Float between 0 and 1. Fraction of the units to drop for the linear
transformation of the recurrent state.
• implementation: Implementation mode, either 1 or 2. Mode 1 will structure its operations as a
larger number of smaller dot products and additions, whereas mode 2 will batch them into
fewer, larger operations. These modes will have different performance profiles on different
hardware and for different applications.
• return_sequences: Boolean. Whether to return the last output in the output sequence, or the full
sequence.
• return_state: Boolean. Whether to return the last state in addition to the output.
• go_backwards: Boolean (default False). If True, process the input sequence backwards and
return the reversed sequence.
• stateful: Boolean (default False). If True, the last state for each sample at index i in a batch will
be used as initial state for the sample of index i in the following batch.
• unroll: Boolean (default False). If True, the network will be unrolled, else a symbolic loop will
be used. Unrolling can speed-up a RNN, although it tends to be more memory-intensive.
Unrolling is only suitable for short sequences.
• reset_after: GRU convention (whether to apply reset gate after or before matrix multiplication).
False = "before" (default), True = "after" (CuDNN compatible).
References
• Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine
Translation
• On the Properties of Neural Machine Translation: Encoder-Decoder Approaches
• Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling
• A Theoretically Grounded Application of Dropout in Recurrent Neural Networks

LSTM
keras.layers.LSTM(units, activation='tanh', recurrent_activation='sigmoid',
use_bias=True, kernel_initializer='glorot_uniform',
recurrent_initializer='orthogonal', bias_initializer='zeros',
unit_forget_bias=True, kernel_regularizer=None, recurrent_regularizer=None,
bias_regularizer=None, activity_regularizer=None, kernel_constraint=None,
recurrent_constraint=None, bias_constraint=None, dropout=0.0,
recurrent_dropout=0.0, implementation=2, return_sequences=False,
return_state=False, go_backwards=False, stateful=False, unroll=False)

Long Short-Term Memory layer - Hochreiter 1997.


Arguments
• units: Positive integer, dimensionality of the output space.
• activation: Activation function to use (see activations). Default: hyperbolic tangent (tanh). If
you pass None, no activation is applied (ie. "linear" activation: a(x) = x).
• recurrent_activation: Activation function to use for the recurrent step (see activations).
Default: sigmoid (sigmoid). If you pass None, no activation is applied (ie. "linear"
activation: a(x) = x).
• use_bias: Boolean, whether the layer uses a bias vector.
• kernel_initializer: Initializer for the kernel weights matrix, used for the linear transformation
of the inputs. (see initializers).
• recurrent_initializer: Initializer for the recurrent_kernel weights matrix, used for the
linear transformation of the recurrent state. (see initializers).
• bias_initializer: Initializer for the bias vector (see initializers).
• unit_forget_bias: Boolean. If True, add 1 to the bias of the forget gate at initialization. Setting
it to true will also force bias_initializer="zeros". This is recommended in
Jozefowicz et al. (2015).
• kernel_regularizer: Regularizer function applied to the kernel weights matrix (see
regularizer).
• recurrent_regularizer: Regularizer function applied to the recurrent_kernel weights
matrix (see regularizer).
• bias_regularizer: Regularizer function applied to the bias vector (see regularizer).
• activity_regularizer: Regularizer function applied to the output of the layer (its "activation").
(see regularizer).
• kernel_constraint: Constraint function applied to the kernel weights matrix (see
constraints).
• recurrent_constraint: Constraint function applied to the recurrent_kernel weights
matrix (see constraints).
• bias_constraint: Constraint function applied to the bias vector (see constraints).
• dropout: Float between 0 and 1. Fraction of the units to drop for the linear transformation of the
inputs.
• recurrent_dropout: Float between 0 and 1. Fraction of the units to drop for the linear
transformation of the recurrent state.
• implementation: Implementation mode, either 1 or 2. Mode 1 will structure its operations as a
larger number of smaller dot products and additions, whereas mode 2 will batch them into
fewer, larger operations. These modes will have different performance profiles on different
hardware and for different applications.
• return_sequences: Boolean. Whether to return the last output in the output sequence, or the full
sequence.
• return_state: Boolean. Whether to return the last state in addition to the output. The returned
elements of the states list are the hidden state and the cell state, respectively.
• go_backwards: Boolean (default False). If True, process the input sequence backwards and
return the reversed sequence.
• stateful: Boolean (default False). If True, the last state for each sample at index i in a batch will
be used as initial state for the sample of index i in the following batch.
• unroll: Boolean (default False). If True, the network will be unrolled, else a symbolic loop will
be used. Unrolling can speed-up a RNN, although it tends to be more memory-intensive.
Unrolling is only suitable for short sequences.
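A short sketch of return_sequences and return_state in the functional API (shapes are illustrative):
from keras.layers import Input, LSTM

inputs = Input(shape=(None, 8))
# Returns the full output sequence plus the final hidden state and the
# final cell state, in that order.
outputs, state_h, state_c = LSTM(32, return_sequences=True,
                                 return_state=True)(inputs)
# outputs: (batch_size, timesteps, 32)
# state_h, state_c: (batch_size, 32)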
References
• Long short-term memory
• Learning to forget: Continual prediction with LSTM
• Supervised sequence labeling with recurrent neural networks
• A Theoretically Grounded Application of Dropout in Recurrent Neural Networks

ConvLSTM2D
keras.layers.ConvLSTM2D(filters, kernel_size, strides=(1, 1), padding='valid',
data_format=None, dilation_rate=(1, 1), activation='tanh',
recurrent_activation='hard_sigmoid', use_bias=True,
kernel_initializer='glorot_uniform', recurrent_initializer='orthogonal',
bias_initializer='zeros', unit_forget_bias=True, kernel_regularizer=None,
recurrent_regularizer=None, bias_regularizer=None, activity_regularizer=None,
kernel_constraint=None, recurrent_constraint=None, bias_constraint=None,
return_sequences=False, go_backwards=False, stateful=False, dropout=0.0,
recurrent_dropout=0.0)

Convolutional LSTM.
It is similar to an LSTM layer, but the input transformations and recurrent transformations are both
convolutional.
Arguments
• filters: Integer, the dimensionality of the output space (i.e. the number of output filters in the
convolution).
• kernel_size: An integer or tuple/list of n integers, specifying the dimensions of the convolution
window.
• strides: An integer or tuple/list of n integers, specifying the strides of the convolution.
Specifying any stride value != 1 is incompatible with specifying any dilation_rate value !
= 1.
• padding: One of "valid" or "same" (case-insensitive).
• data_format: A string, one of "channels_last" (default) or "channels_first". The
ordering of the dimensions in the inputs. "channels_last" corresponds to inputs with
shape (batch, time, ..., channels) while "channels_first" corresponds to
inputs with shape (batch, time, channels, ...). It defaults to the
image_data_format value found in your Keras config file at ~/.keras/keras.json.
If you never set it, then it will be "channels_last".
• dilation_rate: An integer or tuple/list of n integers, specifying the dilation rate to use for dilated
convolution. Currently, specifying any dilation_rate value != 1 is incompatible with
specifying any strides value != 1.
• activation: Activation function to use (see activations).
• recurrent_activation: Activation function to use for the recurrent step (see activations).
• use_bias: Boolean, whether the layer uses a bias vector.
• kernel_initializer: Initializer for the kernel weights matrix, used for the linear transformation
of the inputs. (see initializers).
• recurrent_initializer: Initializer for the recurrent_kernel weights matrix, used for the
linear transformation of the recurrent state. (see initializers).
• bias_initializer: Initializer for the bias vector (see initializers).
• unit_forget_bias: Boolean. If True, add 1 to the bias of the forget gate at initialization. Use in
combination with bias_initializer="zeros". This is recommended in Jozefowicz et
al. (2015).
• kernel_regularizer: Regularizer function applied to the kernel weights matrix (see
regularizer).
• recurrent_regularizer: Regularizer function applied to the recurrent_kernel weights
matrix (see regularizer).
• bias_regularizer: Regularizer function applied to the bias vector (see regularizer).
• activity_regularizer: Regularizer function applied to the output of the layer (its "activation").
(see regularizer).
• kernel_constraint: Constraint function applied to the kernel weights matrix (see
constraints).
• recurrent_constraint: Constraint function applied to the recurrent_kernel weights
matrix (see constraints).
• bias_constraint: Constraint function applied to the bias vector (see constraints).
• return_sequences: Boolean. Whether to return the last output in the output sequence, or the full
sequence.
• go_backwards: Boolean (default False). If True, process the input sequence backwards.
• stateful: Boolean (default False). If True, the last state for each sample at index i in a batch will
be used as initial state for the sample of index i in the following batch.
• dropout: Float between 0 and 1. Fraction of the units to drop for the linear transformation of the
inputs.
• recurrent_dropout: Float between 0 and 1. Fraction of the units to drop for the linear
transformation of the recurrent state.
Input shape
• if data_format='channels_first' 5D tensor with shape: (samples, time, channels,
rows, cols)
• if data_format='channels_last' 5D tensor with shape: (samples, time, rows, cols,
channels)

Output shape
• if return_sequences
• if data_format='channels_first' 5D tensor with shape: (samples, time,
filters, output_row, output_col)
• if data_format='channels_last' 5D tensor with shape: (samples, time,
output_row, output_col, filters)
• else
• if data_format='channels_first' 4D tensor with shape: (samples, filters,
output_row, output_col)
• if data_format='channels_last' 4D tensor with shape: (samples, output_row,
output_col, filters)

where output_row and output_col depend on the shape of the filter and the padding.
Raises
• ValueError: in case of invalid constructor arguments.
References
• Convolutional LSTM Network: A Machine Learning Approach for Precipitation Nowcasting
Note: the current implementation does not include the feedback loop on the cells' output.
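A minimal usage sketch (the frame size, channel count, and filter count below are arbitrary assumptions):
from keras.models import Sequential
from keras.layers import ConvLSTM2D, BatchNormalization

model = Sequential()
# Sequences of 10 frames of 40x40 pixels with 1 channel (channels_last).
model.add(ConvLSTM2D(filters=16, kernel_size=(3, 3), padding='same',
                     return_sequences=True,
                     input_shape=(10, 40, 40, 1)))
model.add(BatchNormalization())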

ConvLSTM2DCell
keras.layers.ConvLSTM2DCell(filters, kernel_size, strides=(1, 1), padding='valid',
data_format=None, dilation_rate=(1, 1), activation='tanh',
recurrent_activation='hard_sigmoid', use_bias=True,
kernel_initializer='glorot_uniform', recurrent_initializer='orthogonal',
bias_initializer='zeros', unit_forget_bias=True, kernel_regularizer=None,
recurrent_regularizer=None, bias_regularizer=None, kernel_constraint=None,
recurrent_constraint=None, bias_constraint=None, dropout=0.0,
recurrent_dropout=0.0)

Cell class for the ConvLSTM2D layer.


Arguments
• filters: Integer, the dimensionality of the output space (i.e. the number of output filters in the
convolution).
• kernel_size: An integer or tuple/list of n integers, specifying the dimensions of the convolution
window.
• strides: An integer or tuple/list of n integers, specifying the strides of the convolution.
Specifying any stride value != 1 is incompatible with specifying any dilation_rate value !
= 1.
• padding: One of "valid" or "same" (case-insensitive).
• data_format: A string, one of "channels_last" (default) or "channels_first". It
defaults to the image_data_format value found in your Keras config file at
~/.keras/keras.json. If you never set it, then it will be "channels_last".
• dilation_rate: An integer or tuple/list of n integers, specifying the dilation rate to use for dilated
convolution. Currently, specifying any dilation_rate value != 1 is incompatible with
specifying any strides value != 1.
• activation: Activation function to use (see activations).
• recurrent_activation: Activation function to use for the recurrent step (see activations).
• use_bias: Boolean, whether the layer uses a bias vector.
• kernel_initializer: Initializer for the kernel weights matrix, used for the linear transformation
of the inputs. (see initializers).
• recurrent_initializer: Initializer for the recurrent_kernel weights matrix, used for the
linear transformation of the recurrent state. (see initializers).
• bias_initializer: Initializer for the bias vector (see initializers).
• unit_forget_bias: Boolean. If True, add 1 to the bias of the forget gate at initialization. Use in
combination with bias_initializer="zeros". This is recommended in Jozefowicz et
al. (2015).
• kernel_regularizer: Regularizer function applied to the kernel weights matrix (see
regularizer).
• recurrent_regularizer: Regularizer function applied to the recurrent_kernel weights
matrix (see regularizer).
• bias_regularizer: Regularizer function applied to the bias vector (see regularizer).
• kernel_constraint: Constraint function applied to the kernel weights matrix (see
constraints).
• recurrent_constraint: Constraint function applied to the recurrent_kernel weights
matrix (see constraints).
• bias_constraint: Constraint function applied to the bias vector (see constraints).
• dropout: Float between 0 and 1. Fraction of the units to drop for the linear transformation of the
inputs.
• recurrent_dropout: Float between 0 and 1. Fraction of the units to drop for the linear
transformation of the recurrent state.
SimpleRNNCell
keras.layers.SimpleRNNCell(units, activation='tanh', use_bias=True,
kernel_initializer='glorot_uniform', recurrent_initializer='orthogonal',
bias_initializer='zeros', kernel_regularizer=None, recurrent_regularizer=None,
bias_regularizer=None, kernel_constraint=None, recurrent_constraint=None,
bias_constraint=None, dropout=0.0, recurrent_dropout=0.0)

Cell class for SimpleRNN.


Arguments
• units: Positive integer, dimensionality of the output space.
• activation: Activation function to use (see activations). Default: hyperbolic tangent (tanh). If
you pass None, no activation is applied (ie. "linear" activation: a(x) = x).
• use_bias: Boolean, whether the layer uses a bias vector.
• kernel_initializer: Initializer for the kernel weights matrix, used for the linear transformation
of the inputs (see initializers).
• recurrent_initializer: Initializer for the recurrent_kernel weights matrix, used for the
linear transformation of the recurrent state (see initializers).
• bias_initializer: Initializer for the bias vector (see initializers).
• kernel_regularizer: Regularizer function applied to the kernel weights matrix (see
regularizer).
• recurrent_regularizer: Regularizer function applied to the recurrent_kernel weights
matrix (see regularizer).
• bias_regularizer: Regularizer function applied to the bias vector (see regularizer).
• kernel_constraint: Constraint function applied to the kernel weights matrix (see
constraints).
• recurrent_constraint: Constraint function applied to the recurrent_kernel weights
matrix (see constraints).
• bias_constraint: Constraint function applied to the bias vector (see constraints).
• dropout: Float between 0 and 1. Fraction of the units to drop for the linear transformation of the
inputs.
• recurrent_dropout: Float between 0 and 1. Fraction of the units to drop for the linear
transformation of the recurrent state.

GRUCell
keras.layers.GRUCell(units, activation='tanh', recurrent_activation='sigmoid',
use_bias=True, kernel_initializer='glorot_uniform',
recurrent_initializer='orthogonal', bias_initializer='zeros',
kernel_regularizer=None, recurrent_regularizer=None, bias_regularizer=None,
kernel_constraint=None, recurrent_constraint=None, bias_constraint=None,
dropout=0.0, recurrent_dropout=0.0, implementation=2, reset_after=False)
Cell class for the GRU layer.
Arguments
• units: Positive integer, dimensionality of the output space.
• activation: Activation function to use (see activations). Default: hyperbolic tangent (tanh). If
you pass None, no activation is applied (ie. "linear" activation: a(x) = x).
• recurrent_activation: Activation function to use for the recurrent step (see activations).
Default: sigmoid (sigmoid). If you pass None, no activation is applied (ie. "linear"
activation: a(x) = x).
• use_bias: Boolean, whether the layer uses a bias vector.
• kernel_initializer: Initializer for the kernel weights matrix, used for the linear transformation
of the inputs (see initializers).
• recurrent_initializer: Initializer for the recurrent_kernel weights matrix, used for the
linear transformation of the recurrent state (see initializers).
• bias_initializer: Initializer for the bias vector (see initializers).
• kernel_regularizer: Regularizer function applied to the kernel weights matrix (see
regularizer).
• recurrent_regularizer: Regularizer function applied to the recurrent_kernel weights
matrix (see regularizer).
• bias_regularizer: Regularizer function applied to the bias vector (see regularizer).
• kernel_constraint: Constraint function applied to the kernel weights matrix (see
constraints).
• recurrent_constraint: Constraint function applied to the recurrent_kernel weights
matrix (see constraints).
• bias_constraint: Constraint function applied to the bias vector (see constraints).
• dropout: Float between 0 and 1. Fraction of the units to drop for the linear transformation of the
inputs.
• recurrent_dropout: Float between 0 and 1. Fraction of the units to drop for the linear
transformation of the recurrent state.
• implementation: Implementation mode, either 1 or 2. Mode 1 will structure its operations as a
larger number of smaller dot products and additions, whereas mode 2 will batch them into
fewer, larger operations. These modes will have different performance profiles on different
hardware and for different applications.
• reset_after: GRU convention (whether to apply reset gate after or before matrix multiplication).
False = "before" (default), True = "after" (CuDNN compatible).

LSTMCell
keras.layers.LSTMCell(units, activation='tanh', recurrent_activation='sigmoid',
use_bias=True, kernel_initializer='glorot_uniform',
recurrent_initializer='orthogonal', bias_initializer='zeros',
unit_forget_bias=True, kernel_regularizer=None, recurrent_regularizer=None,
bias_regularizer=None, kernel_constraint=None, recurrent_constraint=None,
bias_constraint=None, dropout=0.0, recurrent_dropout=0.0, implementation=2)

Cell class for the LSTM layer.


Arguments
• units: Positive integer, dimensionality of the output space.
• activation: Activation function to use (see activations). Default: hyperbolic tangent (tanh). If
you pass None, no activation is applied (ie. "linear" activation: a(x) = x).
• recurrent_activation: Activation function to use for the recurrent step (see activations).
Default: sigmoid (sigmoid). If you pass None, no activation is applied (ie. "linear"
activation: a(x) = x).
• use_bias: Boolean, whether the layer uses a bias vector.
• kernel_initializer: Initializer for the kernel weights matrix, used for the linear transformation
of the inputs (see initializers).
• recurrent_initializer: Initializer for the recurrent_kernel weights matrix, used for the
linear transformation of the recurrent state (see initializers).
• bias_initializer: Initializer for the bias vector (see initializers).
• unit_forget_bias: Boolean. If True, add 1 to the bias of the forget gate at initialization. Setting
it to true will also force bias_initializer="zeros". This is recommended in
Jozefowicz et al. (2015).
• kernel_regularizer: Regularizer function applied to the kernel weights matrix (see
regularizer).
• recurrent_regularizer: Regularizer function applied to the recurrent_kernel weights
matrix (see regularizer).
• bias_regularizer: Regularizer function applied to the bias vector (see regularizer).
• kernel_constraint: Constraint function applied to the kernel weights matrix (see
constraints).
• recurrent_constraint: Constraint function applied to the recurrent_kernel weights
matrix (see constraints).
• bias_constraint: Constraint function applied to the bias vector (see constraints).
• dropout: Float between 0 and 1. Fraction of the units to drop for the linear transformation of the
inputs.
• recurrent_dropout: Float between 0 and 1. Fraction of the units to drop for the linear
transformation of the recurrent state.
• implementation: Implementation mode, either 1 or 2. Mode 1 will structure its operations as a
larger number of smaller dot products and additions, whereas mode 2 will batch them into
fewer, larger operations. These modes will have different performance profiles on different
hardware and for different applications.
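Cell classes describe a single timestep rather than a full layer; they are meant to be wrapped in the RNN layer. A sketch (the sizes are arbitrary):
import keras
from keras.layers import RNN, LSTMCell

x = keras.Input((None, 8))
# Roughly equivalent to LSTM(32):
y = RNN(LSTMCell(32))(x)
# A list of cells yields an efficient stacked RNN:
y_stacked = RNN([LSTMCell(32), LSTMCell(64)])(x)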
CuDNNGRU
keras.layers.CuDNNGRU(units, kernel_initializer='glorot_uniform',
recurrent_initializer='orthogonal', bias_initializer='zeros',
kernel_regularizer=None, recurrent_regularizer=None, bias_regularizer=None,
activity_regularizer=None, kernel_constraint=None, recurrent_constraint=None,
bias_constraint=None, return_sequences=False, return_state=False, stateful=False)

Fast GRU implementation backed by CuDNN.


Can only be run on GPU, with the TensorFlow backend.
Arguments
• units: Positive integer, dimensionality of the output space.
• kernel_initializer: Initializer for the kernel weights matrix, used for the linear transformation
of the inputs. (see initializers).
• recurrent_initializer: Initializer for the recurrent_kernel weights matrix, used for the
linear transformation of the recurrent state. (see initializers).
• bias_initializer: Initializer for the bias vector (see initializers).
• kernel_regularizer: Regularizer function applied to the kernel weights matrix (see
regularizer).
• recurrent_regularizer: Regularizer function applied to the recurrent_kernel weights
matrix (see regularizer).
• bias_regularizer: Regularizer function applied to the bias vector (see regularizer).
• activity_regularizer: Regularizer function applied to the output of the layer (its "activation").
(see regularizer).
• kernel_constraint: Constraint function applied to the kernel weights matrix (see
constraints).
• recurrent_constraint: Constraint function applied to the recurrent_kernel weights
matrix (see constraints).
• bias_constraint: Constraint function applied to the bias vector (see constraints).
• return_sequences: Boolean. Whether to return the last output in the output sequence, or the
full sequence.
• return_state: Boolean. Whether to return the last state in addition to the output.
• stateful: Boolean (default False). If True, the last state for each sample at index i in a batch will
be used as initial state for the sample of index i in the following batch.

CuDNNLSTM
keras.layers.CuDNNLSTM(units, kernel_initializer='glorot_uniform',
recurrent_initializer='orthogonal', bias_initializer='zeros',
unit_forget_bias=True, kernel_regularizer=None, recurrent_regularizer=None,
bias_regularizer=None, activity_regularizer=None, kernel_constraint=None,
recurrent_constraint=None, bias_constraint=None, return_sequences=False,
return_state=False, stateful=False)

Fast LSTM implementation with CuDNN.


Can only be run on GPU, with the TensorFlow backend.
Arguments
• units: Positive integer, dimensionality of the output space.
• kernel_initializer: Initializer for the kernel weights matrix, used for the linear transformation
of the inputs. (see initializers).
• recurrent_initializer: Initializer for the recurrent_kernel weights matrix, used for the
linear transformation of the recurrent state. (see initializers).
• bias_initializer: Initializer for the bias vector (see initializers).
• unit_forget_bias: Boolean. If True, add 1 to the bias of the forget gate at initialization. Setting
it to true will also force bias_initializer="zeros". This is recommended in
Jozefowicz et al. (2015).
• kernel_regularizer: Regularizer function applied to the kernel weights matrix (see
regularizer).
• recurrent_regularizer: Regularizer function applied to the recurrent_kernel weights
matrix (see regularizer).
• bias_regularizer: Regularizer function applied to the bias vector (see regularizer).
• activity_regularizer: Regularizer function applied to the output of the layer (its "activation").
(see regularizer).
• kernel_constraint: Constraint function applied to the kernel weights matrix (see
constraints).
• recurrent_constraint: Constraint function applied to the recurrent_kernel weights
matrix (see constraints).
• bias_constraint: Constraint function applied to the bias vector (see constraints).
• return_sequences: Boolean. Whether to return the last output in the output sequence, or the
full sequence.
• return_state: Boolean. Whether to return the last state in addition to the output.
• stateful: Boolean (default False). If True, the last state for each sample at index i in a batch will
be used as initial state for the sample of index i in the following batch.
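A minimal sketch (this requires a CUDA-enabled GPU and the TensorFlow backend; the sequence length and sizes are arbitrary assumptions):
from keras.models import Sequential
from keras.layers import CuDNNLSTM, Dense

model = Sequential()
model.add(CuDNNLSTM(64, input_shape=(10, 16)))
model.add(Dense(10, activation='softmax'))
model.compile(optimizer='rmsprop', loss='categorical_crossentropy')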
Embedding
keras.layers.Embedding(input_dim, output_dim, embeddings_initializer='uniform',
embeddings_regularizer=None, activity_regularizer=None, embeddings_constraint=None,
mask_zero=False, input_length=None)

Turns positive integers (indexes) into dense vectors of fixed size, e.g. [[4], [20]] -> [[0.25, 0.1],
[0.6, -0.2]].
This layer can only be used as the first layer in a model.
Example
import numpy as np
from keras.models import Sequential
from keras.layers import Embedding

model = Sequential()
model.add(Embedding(1000, 64, input_length=10))
# The model will take as input an integer matrix of size (batch, input_length).
# The largest integer (i.e. word index) in the input should be
# no larger than 999 (vocabulary size).
# Now model.output_shape == (None, 10, 64), where None is the batch dimension.

input_array = np.random.randint(1000, size=(32, 10))

model.compile('rmsprop', 'mse')
output_array = model.predict(input_array)
assert output_array.shape == (32, 10, 64)

Arguments
• input_dim: int > 0. Size of the vocabulary, i.e. maximum integer index + 1.
• output_dim: int >= 0. Dimension of the dense embedding.
• embeddings_initializer: Initializer for the embeddings matrix (see initializers).
• embeddings_regularizer: Regularizer function applied to the embeddings matrix (see
regularizer).
• activity_regularizer: Regularizer function applied to the output of the layer (its "activation").
(see regularizer).
• embeddings_constraint: Constraint function applied to the embeddings matrix (see
constraints).
• mask_zero: Whether or not the input value 0 is a special "padding" value that should be masked
out. This is useful when using recurrent layers which may take variable length input. If this is
True then all subsequent layers in the model need to support masking or an exception will be
raised. If mask_zero is set to True, as a consequence, index 0 cannot be used in the vocabulary
(input_dim should equal size of vocabulary + 1).
• input_length: Length of input sequences, when it is constant. This argument is required if you
are going to connect Flatten then Dense layers upstream (without it, the shape of the dense
outputs cannot be computed).
Input shape
2D tensor with shape: (batch_size, sequence_length).

Output shape
3D tensor with shape: (batch_size, sequence_length, output_dim).
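A sketch of mask_zero with a downstream recurrent layer (the vocabulary size of 1000 is an assumption; note that input_dim becomes 1001 because index 0 is reserved for padding):
from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dense

model = Sequential()
# Index 0 is treated as padding and masked out of the LSTM.
model.add(Embedding(input_dim=1001, output_dim=64, mask_zero=True))
model.add(LSTM(32))
model.add(Dense(1, activation='sigmoid'))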

References
• A Theoretically Grounded Application of Dropout in Recurrent Neural Networks

Add
keras.layers.Add()
Layer that adds a list of inputs.
It takes as input a list of tensors, all of the same shape, and returns a single tensor (also of the same
shape).
Examples
import keras

input1 = keras.layers.Input(shape=(16,))
x1 = keras.layers.Dense(8, activation='relu')(input1)
input2 = keras.layers.Input(shape=(32,))
x2 = keras.layers.Dense(8, activation='relu')(input2)
# equivalent to added = keras.layers.add([x1, x2])
added = keras.layers.Add()([x1, x2])

out = keras.layers.Dense(4)(added)
model = keras.models.Model(inputs=[input1, input2], outputs=out)

Subtract
keras.layers.Subtract()

Layer that subtracts two inputs.


It takes as input a list of tensors of size 2, both of the same shape, and returns a single tensor, (inputs[0]
- inputs[1]), also of the same shape.
Examples
import keras

input1 = keras.layers.Input(shape=(16,))
x1 = keras.layers.Dense(8, activation='relu')(input1)
input2 = keras.layers.Input(shape=(32,))
x2 = keras.layers.Dense(8, activation='relu')(input2)
# Equivalent to subtracted = keras.layers.subtract([x1, x2])
subtracted = keras.layers.Subtract()([x1, x2])

out = keras.layers.Dense(4)(subtracted)
model = keras.models.Model(inputs=[input1, input2], outputs=out)

Multiply
keras.layers.Multiply()

Layer that multiplies (element-wise) a list of inputs.


It takes as input a list of tensors, all of the same shape, and returns a single tensor (also of the same
shape).
Average
keras.layers.Average()

Layer that averages a list of inputs.


It takes as input a list of tensors, all of the same shape, and returns a single tensor (also of the same
shape).

Maximum
keras.layers.Maximum()

Layer that computes the maximum (element-wise) of a list of inputs.


It takes as input a list of tensors, all of the same shape, and returns a single tensor (also of the same
shape).

Minimum
keras.layers.Minimum()

Layer that computes the minimum (element-wise) of a list of inputs.


It takes as input a list of tensors, all of the same shape, and returns a single tensor (also of the same
shape).

Concatenate
keras.layers.Concatenate(axis=-1)

Layer that concatenates a list of inputs.


It takes as input a list of tensors, all of the same shape except for the concatenation axis, and returns a
single tensor, the concatenation of all inputs.
Arguments
• axis: Axis along which to concatenate.
• **kwargs: standard layer keyword arguments.

Dot
keras.layers.Dot(axes, normalize=False)
Layer that computes a dot product between samples in two tensors.
E.g. if applied to a list of two tensors a and b of shape (batch_size, n), the output will be a
tensor of shape (batch_size, 1) where each entry i will be the dot product between a[i] and
b[i].

Arguments
• axes: Integer or tuple of integers, axis or axes along which to take the dot product.
• normalize: Whether to L2-normalize samples along the dot product axis before taking the dot
product. If set to True, then the output of the dot product is the cosine proximity between the
two samples.
• **kwargs: Standard layer keyword arguments.
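A short sketch (the feature size of 16 is arbitrary):
import keras

a = keras.layers.Input(shape=(16,))
b = keras.layers.Input(shape=(16,))
# Dot product over the feature axis; output shape is (batch_size, 1).
# With normalize=True this would be the cosine proximity instead.
d = keras.layers.Dot(axes=1)([a, b])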

add
keras.layers.add(inputs)

Functional interface to the Add layer.

Arguments
• inputs: A list of input tensors (at least 2).
• **kwargs: Standard layer keyword arguments.
Returns
A tensor, the sum of the inputs.
Examples
import keras

input1 = keras.layers.Input(shape=(16,))
x1 = keras.layers.Dense(8, activation='relu')(input1)
input2 = keras.layers.Input(shape=(32,))
x2 = keras.layers.Dense(8, activation='relu')(input2)
added = keras.layers.add([x1, x2])

out = keras.layers.Dense(4)(added)
model = keras.models.Model(inputs=[input1, input2], outputs=out)

subtract
keras.layers.subtract(inputs)

Functional interface to the Subtract layer.

Arguments
• inputs: A list of input tensors (exactly 2).
• **kwargs: Standard layer keyword arguments.
Returns
A tensor, the difference of the inputs.
Examples
import keras

input1 = keras.layers.Input(shape=(16,))
x1 = keras.layers.Dense(8, activation='relu')(input1)
input2 = keras.layers.Input(shape=(32,))
x2 = keras.layers.Dense(8, activation='relu')(input2)
subtracted = keras.layers.subtract([x1, x2])

out = keras.layers.Dense(4)(subtracted)
model = keras.models.Model(inputs=[input1, input2], outputs=out)

multiply
keras.layers.multiply(inputs)

Functional interface to the Multiply layer.

Arguments
• inputs: A list of input tensors (at least 2).
• **kwargs: Standard layer keyword arguments.
Returns
A tensor, the element-wise product of the inputs.

average
keras.layers.average(inputs)

Functional interface to the Average layer.

Arguments
• inputs: A list of input tensors (at least 2).
• **kwargs: Standard layer keyword arguments.
Returns
A tensor, the average of the inputs.
maximum
keras.layers.maximum(inputs)

Functional interface to the Maximum layer.

Arguments
• inputs: A list of input tensors (at least 2).
• **kwargs: Standard layer keyword arguments.
Returns
A tensor, the element-wise maximum of the inputs.

minimum
keras.layers.minimum(inputs)

Functional interface to the Minimum layer.

Arguments
• inputs: A list of input tensors (at least 2).
• **kwargs: Standard layer keyword arguments.
Returns
A tensor, the element-wise minimum of the inputs.

concatenate
keras.layers.concatenate(inputs, axis=-1)

Functional interface to the Concatenate layer.

Arguments
• inputs: A list of input tensors (at least 2).
• axis: Concatenation axis.
• **kwargs: Standard layer keyword arguments.
Returns
A tensor, the concatenation of the inputs alongside axis axis.
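For example (a sketch with arbitrary input shapes):
import keras

x1 = keras.layers.Input(shape=(4,))
x2 = keras.layers.Input(shape=(6,))
# Concatenation along the last axis gives shape (batch_size, 10).
merged = keras.layers.concatenate([x1, x2], axis=-1)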
dot
keras.layers.dot(inputs, axes, normalize=False)

Functional interface to the Dot layer.

Arguments
• inputs: A list of input tensors (at least 2).
• axes: Integer or tuple of integers, axis or axes along which to take the dot product.
• normalize: Whether to L2-normalize samples along the dot product axis before taking the dot
product. If set to True, then the output of the dot product is the cosine proximity between the
two samples.
• **kwargs: Standard layer keyword arguments.
Returns
A tensor, the dot product of the samples from the inputs.

LeakyReLU
keras.layers.LeakyReLU(alpha=0.3)

Leaky version of a Rectified Linear Unit.


It allows a small gradient when the unit is not active: f(x) = alpha * x for x < 0, f(x) =
x for x >= 0.

Input shape
Arbitrary. Use the keyword argument input_shape (tuple of integers, does not include the samples
axis) when using this layer as the first layer in a model.
Output shape
Same shape as the input.
Arguments
• alpha: float >= 0. Negative slope coefficient.
References
• Rectifier Nonlinearities Improve Neural Network Acoustic Models
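Advanced activations are layers rather than activation strings; a minimal sketch of the usual pattern (the sizes and alpha value are arbitrary):
from keras.models import Sequential
from keras.layers import Dense, LeakyReLU

model = Sequential()
# Leave the Dense layer linear and apply the activation as its own layer.
model.add(Dense(64, input_shape=(10,)))
model.add(LeakyReLU(alpha=0.1))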

PReLU
keras.layers.PReLU(alpha_initializer='zeros', alpha_regularizer=None,
alpha_constraint=None, shared_axes=None)

Parametric Rectified Linear Unit.


It follows: f(x) = alpha * x for x < 0, f(x) = x for x >= 0, where alpha is a
learned array with the same shape as x.
Input shape
Arbitrary. Use the keyword argument input_shape (tuple of integers, does not include the samples
axis) when using this layer as the first layer in a model.
Output shape
Same shape as the input.
Arguments
• alpha_initializer: initializer function for the weights.
• alpha_regularizer: regularizer for the weights.
• alpha_constraint: constraint for the weights.
• shared_axes: the axes along which to share learnable parameters for the activation function.
For example, if the incoming feature maps are from a 2D convolution with output shape
(batch, height, width, channels), and you wish to share parameters across space
so that each filter only has one set of parameters, set shared_axes=[1, 2].

References
• Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet
Classification

ELU
keras.layers.ELU(alpha=1.0)

Exponential Linear Unit.


It follows: f(x) = alpha * (exp(x) - 1.) for x < 0, f(x) = x for x >= 0.

Input shape
Arbitrary. Use the keyword argument input_shape (tuple of integers, does not include the samples
axis) when using this layer as the first layer in a model.
Output shape
Same shape as the input.
Arguments
• alpha: scale for the negative factor.
References
• Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs)

ThresholdedReLU
keras.layers.ThresholdedReLU(theta=1.0)

Thresholded Rectified Linear Unit.


It follows: f(x) = x for x > theta, f(x) = 0 otherwise.

Input shape
Arbitrary. Use the keyword argument input_shape (tuple of integers, does not include the samples
axis) when using this layer as the first layer in a model.
Output shape
Same shape as the input.
Arguments
• theta: float >= 0. Threshold location of activation.
References
• Zero-Bias Autoencoders and the Benefits of Co-Adapting Features

Softmax
keras.layers.Softmax(axis=-1)

Softmax activation function.


Input shape
Arbitrary. Use the keyword argument input_shape (tuple of integers, does not include the samples
axis) when using this layer as the first layer in a model.
Output shape
Same shape as the input.
Arguments
• axis: Integer, axis along which the softmax normalization is applied.

ReLU
keras.layers.ReLU(max_value=None, negative_slope=0.0, threshold=0.0)
Rectified Linear Unit activation function.
With default values, it returns element-wise max(x, 0).

Otherwise, it follows: f(x) = max_value for x >= max_value, f(x) = x for threshold
<= x < max_value, f(x) = negative_slope * (x - threshold) otherwise.

Input shape
Arbitrary. Use the keyword argument input_shape (tuple of integers, does not include the samples
axis) when using this layer as the first layer in a model.
Output shape
Same shape as the input.
Arguments
• max_value: float >= 0. Maximum activation value.
• negative_slope: float >= 0. Negative slope coefficient.
• threshold: float. Threshold value for thresholded activation.

BatchNormalization
keras.layers.BatchNormalization(axis=-1, momentum=0.99, epsilon=0.001, center=True,
scale=True, beta_initializer='zeros', gamma_initializer='ones',
moving_mean_initializer='zeros', moving_variance_initializer='ones',
beta_regularizer=None, gamma_regularizer=None, beta_constraint=None,
gamma_constraint=None)

Batch normalization layer (Ioffe and Szegedy, 2015).


Normalizes the activations of the previous layer at each batch, i.e. applies a transformation that
maintains the mean activation close to 0 and the activation standard deviation close to 1.
Arguments
• axis: Integer, the axis that should be normalized (typically the features axis). For instance, after
a Conv2D layer with data_format="channels_first", set axis=1 in
BatchNormalization.
• momentum: Momentum for the moving mean and the moving variance.
• epsilon: Small float added to variance to avoid dividing by zero.
• center: If True, add offset of beta to normalized tensor. If False, beta is ignored.
• scale: If True, multiply by gamma. If False, gamma is not used. When the next layer is linear
(also e.g. nn.relu), this can be disabled since the scaling will be done by the next layer.
• beta_initializer: Initializer for the beta weight.
• gamma_initializer: Initializer for the gamma weight.
• moving_mean_initializer: Initializer for the moving mean.
• moving_variance_initializer: Initializer for the moving variance.
• beta_regularizer: Optional regularizer for the beta weight.
• gamma_regularizer: Optional regularizer for the gamma weight.
• beta_constraint: Optional constraint for the beta weight.
• gamma_constraint: Optional constraint for the gamma weight.
Input shape
Arbitrary. Use the keyword argument input_shape (tuple of integers, does not include the samples
axis) when using this layer as the first layer in a model.
Output shape
Same shape as input.
References
• Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate
Shift
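A sketch of the axis argument with channels_first data (the shapes and use_bias choice are illustrative):
from keras.models import Sequential
from keras.layers import Conv2D, BatchNormalization, Activation

model = Sequential()
model.add(Conv2D(32, (3, 3), use_bias=False,
                 data_format='channels_first',
                 input_shape=(3, 32, 32)))
# With channels_first the features axis is axis 1.
model.add(BatchNormalization(axis=1))
model.add(Activation('relu'))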

GaussianNoise
keras.layers.GaussianNoise(stddev)

Apply additive zero-centered Gaussian noise.


This is useful to mitigate overfitting (you could see it as a form of random data augmentation).
Gaussian noise (GN) is a natural choice as a corruption process for real-valued inputs.
As it is a regularization layer, it is only active at training time.
Arguments
• stddev: float, standard deviation of the noise distribution.
Input shape
Arbitrary. Use the keyword argument input_shape (tuple of integers, does not include the samples
axis) when using this layer as the first layer in a model.
Output shape
Same shape as input.

GaussianDropout
keras.layers.GaussianDropout(rate)

Apply multiplicative 1-centered Gaussian noise.


As it is a regularization layer, it is only active at training time.
Arguments
• rate: float, drop probability (as with Dropout). The multiplicative noise will have standard
deviation sqrt(rate / (1 - rate)).

Input shape
Arbitrary. Use the keyword argument input_shape (tuple of integers, does not include the samples
axis) when using this layer as the first layer in a model.
Output shape
Same shape as input.
References
• Dropout: A Simple Way to Prevent Neural Networks from Overfitting

AlphaDropout
keras.layers.AlphaDropout(rate, noise_shape=None, seed=None)

Applies Alpha Dropout to the input.


Alpha Dropout is a Dropout that keeps mean and variance of inputs to their original values, in order
to ensure the self-normalizing property even after this dropout. Alpha Dropout fits well to Scaled
Exponential Linear Units by randomly setting activations to the negative saturation value.
Arguments
• rate: float, drop probability (as with Dropout). The multiplicative noise will have standard
deviation sqrt(rate / (1 - rate)).
• noise_shape: A 1-D Tensor of type int32, representing the shape for randomly generated
keep/drop flags.
• seed: A Python integer to use as random seed.
Input shape
Arbitrary. Use the keyword argument input_shape (tuple of integers, does not include the samples
axis) when using this layer as the first layer in a model.
Output shape
Same shape as input.
References
• Self-Normalizing Neural Networks
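A sketch of the self-normalizing setup that AlphaDropout is designed for (pairing it with the 'selu' activation and 'lecun_normal' initialization follows the reference above; the sizes are arbitrary):
from keras.models import Sequential
from keras.layers import Dense, AlphaDropout

model = Sequential()
model.add(Dense(64, activation='selu',
                kernel_initializer='lecun_normal', input_shape=(20,)))
model.add(AlphaDropout(0.1))
model.add(Dense(1, activation='sigmoid'))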
TimeDistributed
keras.layers.TimeDistributed(layer)

This wrapper applies a layer to every temporal slice of an input.


The input should be at least 3D, and the dimension of index one will be considered to be the temporal
dimension.
Consider a batch of 32 samples, where each sample is a sequence of 10 vectors of 16 dimensions. The
batch input shape of the layer is then (32, 10, 16), and the input_shape, not including the
samples dimension, is (10, 16).

You can then use TimeDistributed to apply a Dense layer to each of the 10 timesteps,
independently:
# as the first layer in a model
model = Sequential()
model.add(TimeDistributed(Dense(8), input_shape=(10, 16)))
# now model.output_shape == (None, 10, 8)

The output will then have shape (32, 10, 8).

In subsequent layers, there is no need for the input_shape:


model.add(TimeDistributed(Dense(32)))
# now model.output_shape == (None, 10, 32)

The output will then have shape (32, 10, 32).

TimeDistributed can be used with arbitrary layers, not just Dense, for instance with a Conv2D
layer:
model = Sequential()
model.add(TimeDistributed(Conv2D(64, (3, 3)),
                          input_shape=(10, 299, 299, 3)))

Arguments
• layer: a layer instance.

Bidirectional
keras.layers.Bidirectional(layer, merge_mode='concat', weights=None)

Bidirectional wrapper for RNNs.


Arguments
• layer: Recurrent instance.
• merge_mode: Mode by which outputs of the forward and backward RNNs will be combined.
One of {'sum', 'mul', 'concat', 'ave', None}. If None, the outputs will not be combined, they will
be returned as a list.
• weights: Initial weights to load in the Bidirectional model
Raises
• ValueError: In case of invalid merge_mode argument.

Examples
from keras.models import Sequential
from keras.layers import Bidirectional, LSTM, Dense, Activation

model = Sequential()
model.add(Bidirectional(LSTM(10, return_sequences=True),
                        input_shape=(5, 10)))
model.add(Bidirectional(LSTM(10)))
model.add(Dense(5))
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy', optimizer='rmsprop')

Writing your own Keras layers


For simple, stateless custom operations, you are probably better off using layers.core.Lambda
layers. But for any custom operation that has trainable weights, you should implement your own layer.
Here is the skeleton of a Keras layer, as of Keras 2.0 (if you have an older version, please upgrade).
There are only three methods you need to implement:
• build(input_shape): this is where you will define your weights. This method must set
self.built = True at the end, which can be done by calling super([Layer],
self).build().
• call(x): this is where the layer's logic lives. Unless you want your layer to support masking,
you only have to care about the first argument passed to call: the input tensor.
• compute_output_shape(input_shape): in case your layer modifies the shape of its
input, you should specify here the shape transformation logic. This allows Keras to do
automatic shape inference.
from keras import backend as K
from keras.layers import Layer

class MyLayer(Layer):

    def __init__(self, output_dim, **kwargs):
        self.output_dim = output_dim
        super(MyLayer, self).__init__(**kwargs)

    def build(self, input_shape):
        # Create a trainable weight variable for this layer.
        self.kernel = self.add_weight(name='kernel',
                                      shape=(input_shape[1], self.output_dim),
                                      initializer='uniform',
                                      trainable=True)
        super(MyLayer, self).build(input_shape)  # Be sure to call this at the end

    def call(self, x):
        return K.dot(x, self.kernel)

    def compute_output_shape(self, input_shape):
        return (input_shape[0], self.output_dim)

It is also possible to define Keras layers which have multiple input tensors and multiple output tensors.
To do this, you should assume that the inputs and outputs of the methods build(input_shape),
call(x) and compute_output_shape(input_shape) are lists. Here is an example, similar
to the one above:
from keras import backend as K
from keras.layers import Layer

class MyLayer(Layer):

    def __init__(self, output_dim, **kwargs):
        self.output_dim = output_dim
        super(MyLayer, self).__init__(**kwargs)

    def build(self, input_shape):
        assert isinstance(input_shape, list)
        # Create a trainable weight variable for this layer.
        self.kernel = self.add_weight(name='kernel',
                                      shape=(input_shape[0][1], self.output_dim),
                                      initializer='uniform',
                                      trainable=True)
        super(MyLayer, self).build(input_shape)  # Be sure to call this at the end

    def call(self, x):
        assert isinstance(x, list)
        a, b = x
        return [K.dot(a, self.kernel) + b, K.mean(b, axis=-1)]

    def compute_output_shape(self, input_shape):
        assert isinstance(input_shape, list)
        shape_a, shape_b = input_shape
        return [(shape_a[0], self.output_dim), shape_b[:-1]]

The existing Keras layers provide examples of how to implement almost anything. Never hesitate to
read the source code!

TimeseriesGenerator
keras.preprocessing.sequence.TimeseriesGenerator(data, targets, length,
sampling_rate=1, stride=1, start_index=0, end_index=None, shuffle=False,
reverse=False, batch_size=128)

Utility class for generating batches of temporal data.


This class takes in a sequence of data-points gathered at equal intervals, along with time series
parameters such as stride, length of history, etc., to produce batches for training/validation.
Arguments
• data: Indexable generator (such as a list or Numpy array) containing consecutive data points
(timesteps). The data should be at least 2D, and axis 0 is expected to be the time dimension.
• targets: Targets corresponding to timesteps in data. It should have same length as data.
• length: Length of the output sequences (in number of timesteps).
• sampling_rate: Period between successive individual timesteps within sequences. For rate r,
timesteps data[i], data[i-r], ... data[i - length] are used to create a sample
sequence.
• stride: Period between successive output sequences. For stride s, consecutive output samples
would be centered around data[i], data[i+s], data[i+2*s], etc.
• start_index: Data points earlier than start_index will not be used in the output sequences.
This is useful to reserve part of the data for test or validation.
• end_index: Data points later than end_index will not be used in the output sequences. This is
useful to reserve part of the data for test or validation.
• shuffle: Whether to shuffle output samples, or instead draw them in chronological order.
• reverse: Boolean: if true, timesteps in each output sample will be in reverse chronological
order.
• batch_size: Number of timeseries samples in each batch (except maybe the last one).
Returns
A Sequence instance.
Examples
from keras.preprocessing.sequence import TimeseriesGenerator
import numpy as np

data = np.array([[i] for i in range(50)])
targets = np.array([[i] for i in range(50)])

data_gen = TimeseriesGenerator(data, targets,
                               length=10, sampling_rate=2,
                               batch_size=2)
assert len(data_gen) == 20

batch_0 = data_gen[0]
x, y = batch_0
assert np.array_equal(x,
                      np.array([[[0], [2], [4], [6], [8]],
                                [[1], [3], [5], [7], [9]]]))
assert np.array_equal(y,
                      np.array([[10], [11]]))
pad_sequences
keras.preprocessing.sequence.pad_sequences(sequences, maxlen=None, dtype='int32',
padding='pre', truncating='pre', value=0.0)

Pads sequences to the same length.


This function transforms a list of num_samples sequences (lists of integers) into a 2D Numpy array
of shape (num_samples, num_timesteps). num_timesteps is either the maxlen argument
if provided, or the length of the longest sequence otherwise.
Sequences that are shorter than num_timesteps are padded with value until they are
num_timesteps long.

Sequences longer than num_timesteps are truncated so that they fit the desired length. The position
where padding or truncation happens is determined by the arguments padding and truncating,
respectively.
Pre-padding is the default.
Arguments
• sequences: List of lists, where each element is a sequence.
• maxlen: Int, maximum length of all sequences.
• dtype: Type of the output sequences. To pad sequences with variable length strings, you can use
object.
• padding: String, 'pre' or 'post': pad either before or after each sequence.
• truncating: String, 'pre' or 'post': remove values from sequences larger than maxlen, either at
the beginning or at the end of the sequences.
• value: Float or String, padding value.
Returns
• x: Numpy array with shape (len(sequences), maxlen)

Raises
• ValueError: In case of invalid values for truncating or padding, or in case of invalid
shape for a sequences entry.
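
For example, a minimal sketch showing both padding and truncation:
from keras.preprocessing.sequence import pad_sequences

sequences = [[1, 2, 3], [4, 5], [6]]

# Pre-padding (the default) aligns sequences at the end.
print(pad_sequences(sequences))
# [[1 2 3]
#  [0 4 5]
#  [0 0 6]]

# Post-padding and post-truncation instead keep the beginning.
print(pad_sequences(sequences, maxlen=2, padding='post', truncating='post'))
# [[1 2]
#  [4 5]
#  [6 0]]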

skipgrams
keras.preprocessing.sequence.skipgrams(sequence, vocabulary_size, window_size=4,
negative_samples=1.0, shuffle=True, categorical=False, sampling_table=None,
seed=None)

Generates skipgram word pairs.


This function transforms a sequence of word indexes (list of integers) into tuples of words of the form:
• (word, word in the same window), with label 1 (positive samples).
• (word, random word from the vocabulary), with label 0 (negative samples).
Read more about Skipgram in this gnomic paper by Mikolov et al.: Efficient Estimation of Word
Representations in Vector Space
Arguments
• sequence: A word sequence (sentence), encoded as a list of word indices (integers). If using a
sampling_table, word indices are expected to match the rank of the words in a reference
dataset (e.g. 10 would encode the 10-th most frequently occurring token). Note that index 0 is
expected to be a non-word and will be skipped.
• vocabulary_size: Int, maximum possible word index + 1
• window_size: Int, size of sampling windows (technically half-window). The window of a word
w_i will be [i - window_size, i + window_size+1].
• negative_samples: Float >= 0. 0 for no negative (i.e. random) samples. 1 for same number as
positive samples.
• shuffle: Whether to shuffle the word couples before returning them.
• categorical: bool. If False, labels will be integers (e.g. [0, 1, 1 .. ]), if True, labels will
be categorical, e.g. [[1,0],[0,1],[0,1] .. ].
• sampling_table: 1D array of size vocabulary_size where the entry i encodes the
probability to sample a word of rank i.
• seed: Random seed.
Returns
couples, labels: where couples are int pairs and labels are either 0 or 1.

Note
By convention, index 0 in the vocabulary is a non-word and will be skipped.
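
For example, a minimal sketch on a toy sequence (the word indices are made up, and the exact couples drawn vary with shuffling and negative sampling):
from keras.preprocessing.sequence import skipgrams

sequence = [1, 2, 3, 4, 5]  # hypothetical word indices; 0 is reserved
couples, labels = skipgrams(sequence, vocabulary_size=6, window_size=1)
for couple, label in zip(couples, labels):
    # e.g. [2, 3] 1 (words in the same window) or [2, 5] 0 (negative sample)
    print(couple, label)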

make_sampling_table
keras.preprocessing.sequence.make_sampling_table(size, sampling_factor=1e-05)

Generates a word rank-based probabilistic sampling table.


Used for generating the sampling_table argument for skipgrams. sampling_table[i] is
the probability of sampling the i-th most common word in a dataset (more common words should
be sampled less frequently, for balance).
The sampling probabilities are generated according to the sampling distribution used in word2vec:
p(word) = min(1, sqrt(word_frequency / sampling_factor) /
(word_frequency / sampling_factor))
We assume that the word frequencies follow Zipf's law (s=1) to derive a numerical approximation of
frequency(rank):
frequency(rank) ~ 1/(rank * (log(rank) + gamma) + 1/2 - 1/(12*rank))
where gamma is the Euler-Mascheroni constant.

Arguments
• size: Int, number of possible words to sample.
• sampling_factor: The sampling factor in the word2vec formula.
Returns
A 1D Numpy array of length size where the ith entry is the probability that a word of rank i should be
sampled.
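
For example, a minimal sketch of generating a table and passing it to skipgrams:
from keras.preprocessing.sequence import make_sampling_table, skipgrams

vocabulary_size = 10000
sampling_table = make_sampling_table(vocabulary_size)
# Low ranks (frequent words) get low sampling probabilities.
couples, labels = skipgrams([1, 2, 3, 4, 5], vocabulary_size,
                            sampling_table=sampling_table)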

Text Preprocessing

Tokenizer
keras.preprocessing.text.Tokenizer(num_words=None, filters='!"#$%&()*+,-./:;<=>?
@[\\]^_`{|}~\t\n', lower=True, split=' ', char_level=False, oov_token=None,
document_count=0)

Text tokenization utility class.


This class allows you to vectorize a text corpus, by turning each text into either a sequence of integers (each
integer being the index of a token in a dictionary) or into a vector where the coefficient for each token
could be binary, based on word count, based on tf-idf...
Arguments
• num_words: the maximum number of words to keep, based on word frequency. Only the most
common num_words-1 words will be kept.
• filters: a string where each element is a character that will be filtered from the texts. The default
is all punctuation, plus tabs and line breaks, minus the ' character.
• lower: boolean. Whether to convert the texts to lowercase.
• split: str. Separator for word splitting.
• char_level: if True, every character will be treated as a token.
• oov_token: if given, it will be added to word_index and used to replace out-of-vocabulary
words during texts_to_sequences calls
By default, all punctuation is removed, turning the texts into space-separated sequences of words
(words may include the ' character). These sequences are then split into lists of tokens. They will
then be indexed or vectorized.
0 is a reserved index that won't be assigned to any word.
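
For example, a minimal sketch (the toy texts are made up, and the exact indices depend on word frequency and order of appearance):
from keras.preprocessing.text import Tokenizer

texts = ['The cat sat on the mat.', 'The dog ate my homework.']
tokenizer = Tokenizer(num_words=100)
tokenizer.fit_on_texts(texts)

print(tokenizer.word_index)                 # e.g. {'the': 1, 'cat': 2, ...}
print(tokenizer.texts_to_sequences(texts))  # e.g. [[1, 2, 3, 4, 1, 5], ...]
print(tokenizer.texts_to_matrix(texts, mode='binary').shape)  # (2, 100)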
hashing_trick
keras.preprocessing.text.hashing_trick(text, n, hash_function=None, filters='!"#$
%&()*+,-./:;<=>?@[\\]^_`{|}~\t\n', lower=True, split=' ')

Converts a text to a sequence of indexes in a fixed-size hashing space.


Arguments
• text: Input text (string).
• n: Dimension of the hashing space.
• hash_function: defaults to the python hash function, can be 'md5' or any function that takes
as input a string and returns an int. Note that 'hash' is not a stable hashing function, so it is not
consistent across different runs, while 'md5' is a stable hashing function.
• filters: list (or concatenation) of characters to filter out, such as punctuation. Default: !"#$
%&()*+,-./:;<=>?@[\]^_`{|}~\t\n, includes basic punctuation, tabs, and newlines.
• lower: boolean. Whether to set the text to lowercase.
• split: str. Separator for word splitting.
Returns
A list of integer word indices (unicity non-guaranteed).
0 is a reserved index that won't be assigned to any word.

Two or more words may be assigned to the same index, due to possible collisions by the hashing
function. The probability of a collision is in relation to the dimension of the hashing space and the
number of distinct objects.
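
For example, a minimal sketch using the stable 'md5' hash function:
from keras.preprocessing.text import hashing_trick

text = 'The quick brown fox jumped over the lazy dog.'
# 'md5' keeps the word-to-index mapping stable across runs,
# unlike the default built-in hash.
print(hashing_trick(text, n=50, hash_function='md5'))
# a list of 9 integers, one per word; collisions are possible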

one_hot
keras.preprocessing.text.one_hot(text, n, filters='!"#$%&()*+,-./:;<=>?
@[\\]^_`{|}~\t\n', lower=True, split=' ')

One-hot encodes a text into a list of word indexes of size n.


This is a wrapper to the hashing_trick function using hash as the hashing function; unicity of
word to index mapping non-guaranteed.
Arguments
• text: Input text (string).
• n: int. Size of vocabulary.
• filters: list (or concatenation) of characters to filter out, such as punctuation. Default: !"#$
%&()*+,-./:;<=>?@[\]^_`{|}~\t\n, includes basic punctuation, tabs, and newlines.
• lower: boolean. Whether to set the text to lowercase.
• split: str. Separator for word splitting.
Returns
List of integers in [1, n]. Each integer encodes a word (unicity non-guaranteed).
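
For example, a minimal sketch (the exact indices vary between Python runs because the default hash function is not stable):
from keras.preprocessing.text import one_hot

print(one_hot('The cat sat on the mat', n=50))
# e.g. [27, 3, 41, 19, 27, 8]; both occurrences of 'the' share an index,
# but unrelated words may collide as well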

text_to_word_sequence
keras.preprocessing.text.text_to_word_sequence(text, filters='!"#$%&()*+,-./:;<=>?
@[\\]^_`{|}~\t\n', lower=True, split=' ')

Converts a text to a sequence of words (or tokens).


Arguments
• text: Input text (string).
• filters: list (or concatenation) of characters to filter out, such as punctuation. Default: !"#$
%&()*+,-./:;<=>?@[\]^_`{|}~\t\n, includes basic punctuation, tabs, and newlines.
• lower: boolean. Whether to convert the input to lowercase.
• split: str. Separator for word splitting.
Returns
A list of words (or tokens).
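
For example:
from keras.preprocessing.text import text_to_word_sequence

print(text_to_word_sequence("Don't eat the yellow snow!"))
# ["don't", 'eat', 'the', 'yellow', 'snow']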

Image Preprocessing
ImageDataGenerator class
keras.preprocessing.image.ImageDataGenerator(featurewise_center=False,
samplewise_center=False, featurewise_std_normalization=False,
samplewise_std_normalization=False, zca_whitening=False, zca_epsilon=1e-06,
rotation_range=0, width_shift_range=0.0, height_shift_range=0.0,
brightness_range=None, shear_range=0.0, zoom_range=0.0, channel_shift_range=0.0,
fill_mode='nearest', cval=0.0, horizontal_flip=False, vertical_flip=False,
rescale=None, preprocessing_function=None, data_format='channels_last',
validation_split=0.0, interpolation_order=1, dtype='float32')

Generate batches of tensor image data with real-time data augmentation. The data will be looped over
(in batches).
Arguments
• featurewise_center: Boolean. Set input mean to 0 over the dataset, feature-wise.
• samplewise_center: Boolean. Set each sample mean to 0.
• featurewise_std_normalization: Boolean. Divide inputs by std of the dataset, feature-wise.
• samplewise_std_normalization: Boolean. Divide each input by its std.
• zca_epsilon: epsilon for ZCA whitening. Default is 1e-6.
• zca_whitening: Boolean. Apply ZCA whitening.
• rotation_range: Int. Degree range for random rotations.
• width_shift_range: Float, 1-D array-like or int
• float: fraction of total width, if < 1, or pixels if >= 1.
• 1-D array-like: random elements from the array.
• int: integer number of pixels from interval (-width_shift_range,
+width_shift_range)
• With width_shift_range=2 possible values are integers [-1, 0, +1], same as
with width_shift_range=[-1, 0, +1], while with
width_shift_range=1.0 possible values are floats in the interval [-1.0, +1.0).
• height_shift_range: Float, 1-D array-like or int
• float: fraction of total height, if < 1, or pixels if >= 1.
• 1-D array-like: random elements from the array.
• int: integer number of pixels from interval (-height_shift_range,
+height_shift_range)
• With height_shift_range=2 possible values are integers [-1, 0, +1], same
as with height_shift_range=[-1, 0, +1], while with
height_shift_range=1.0 possible values are floats in the interval [-1.0, +1.0).
• brightness_range: Tuple or list of two floats. Range for picking a brightness shift value from.
• shear_range: Float. Shear Intensity (Shear angle in counter-clockwise direction in degrees)
• zoom_range: Float or [lower, upper]. Range for random zoom. If a float, [lower, upper]
= [1-zoom_range, 1+zoom_range].
• channel_shift_range: Float. Range for random channel shifts.
• fill_mode: One of {"constant", "nearest", "reflect" or "wrap"}. Default is 'nearest'. Points
outside the boundaries of the input are filled according to the given mode:
• 'constant': kkkkkkkk|abcd|kkkkkkkk (cval=k)
• 'nearest': aaaaaaaa|abcd|dddddddd
• 'reflect': abcddcba|abcd|dcbaabcd
• 'wrap': abcdabcd|abcd|abcdabcd
• cval: Float or Int. Value used for points outside the boundaries when fill_mode =
"constant".
• horizontal_flip: Boolean. Randomly flip inputs horizontally.
• vertical_flip: Boolean. Randomly flip inputs vertically.
• rescale: rescaling factor. Defaults to None. If None or 0, no rescaling is applied, otherwise we
multiply the data by the value provided (after applying all other transformations).
• preprocessing_function: function that will be applied on each input. The function will run after
the image is resized and augmented. The function should take one argument: one image
(Numpy tensor with rank 3), and should output a Numpy tensor with the same shape.
• data_format: Image data format, either "channels_first" or "channels_last". "channels_last"
mode means that the images should have shape (samples, height, width,
channels), "channels_first" mode means that the images should have shape (samples,
channels, height, width). It defaults to the image_data_format value found in
your Keras config file at ~/.keras/keras.json. If you never set it, then it will be
"channels_last".
• validation_split: Float. Fraction of images reserved for validation (strictly between 0 and 1).
• dtype: Dtype to use for the generated arrays.
Examples
Example of using .flow(x, y):
(x_train, y_train), (x_test, y_test) = cifar10.load_data()
y_train = np_utils.to_categorical(y_train, num_classes)
y_test = np_utils.to_categorical(y_test, num_classes)

datagen = ImageDataGenerator(
    featurewise_center=True,
    featurewise_std_normalization=True,
    rotation_range=20,
    width_shift_range=0.2,
    height_shift_range=0.2,
    horizontal_flip=True)

# compute quantities required for featurewise normalization
# (std, mean, and principal components if ZCA whitening is applied)
datagen.fit(x_train)

# fits the model on batches with real-time data augmentation:
model.fit_generator(datagen.flow(x_train, y_train, batch_size=32),
                    steps_per_epoch=len(x_train) / 32, epochs=epochs)

# here's a more "manual" example
for e in range(epochs):
    print('Epoch', e)
    batches = 0
    for x_batch, y_batch in datagen.flow(x_train, y_train, batch_size=32):
        model.fit(x_batch, y_batch)
        batches += 1
        if batches >= len(x_train) / 32:
            # we need to break the loop by hand because
            # the generator loops indefinitely
            break

Example of using .flow_from_directory(directory):

train_datagen = ImageDataGenerator(
    rescale=1./255,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True)

test_datagen = ImageDataGenerator(rescale=1./255)

train_generator = train_datagen.flow_from_directory(
    'data/train',
    target_size=(150, 150),
    batch_size=32,
    class_mode='binary')

validation_generator = test_datagen.flow_from_directory(
    'data/validation',
    target_size=(150, 150),
    batch_size=32,
    class_mode='binary')

model.fit_generator(
    train_generator,
    steps_per_epoch=2000,
    epochs=50,
    validation_data=validation_generator,
    validation_steps=800)

Example of transforming images and masks together:

# we create two instances with the same arguments
data_gen_args = dict(featurewise_center=True,
                     featurewise_std_normalization=True,
                     rotation_range=90,
                     width_shift_range=0.1,
                     height_shift_range=0.1,
                     zoom_range=0.2)
image_datagen = ImageDataGenerator(**data_gen_args)
mask_datagen = ImageDataGenerator(**data_gen_args)

# Provide the same seed and keyword arguments to the fit and flow methods
seed = 1
image_datagen.fit(images, augment=True, seed=seed)
mask_datagen.fit(masks, augment=True, seed=seed)

image_generator = image_datagen.flow_from_directory(
    'data/images',
    class_mode=None,
    seed=seed)

mask_generator = mask_datagen.flow_from_directory(
    'data/masks',
    class_mode=None,
    seed=seed)

# combine generators into one which yields image and masks
train_generator = zip(image_generator, mask_generator)

model.fit_generator(
    train_generator,
    steps_per_epoch=2000,
    epochs=50)

Example of using .flow_from_dataframe(dataframe, directory):

train_df = pandas.read_csv("./train.csv")
valid_df = pandas.read_csv("./valid.csv")

train_datagen = ImageDataGenerator(
    rescale=1./255,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True)

test_datagen = ImageDataGenerator(rescale=1./255)

train_generator = train_datagen.flow_from_dataframe(
    dataframe=train_df,
    directory='data/train',
    x_col="filename",
    y_col="class",
    target_size=(150, 150),
    batch_size=32,
    class_mode='binary')

validation_generator = test_datagen.flow_from_dataframe(
    dataframe=valid_df,
    directory='data/validation',
    x_col="filename",
    y_col="class",
    target_size=(150, 150),
    batch_size=32,
    class_mode='binary')

model.fit_generator(
    train_generator,
    steps_per_epoch=2000,
    epochs=50,
    validation_data=validation_generator,
    validation_steps=800)

ImageDataGenerator methods
apply_transform
apply_transform(x, transform_parameters)

Applies a transformation to an image according to given parameters.


Arguments
• x: 3D tensor, single image.
• transform_parameters: Dictionary with string - parameter pairs describing the transformation.
Currently, the following parameters from the dictionary are used:
• 'theta': Float. Rotation angle in degrees.
• 'tx': Float. Shift in the x direction.
• 'ty': Float. Shift in the y direction.
• 'shear': Float. Shear angle in degrees.
• 'zx': Float. Zoom in the x direction.
• 'zy': Float. Zoom in the y direction.
• 'flip_horizontal': Boolean. Horizontal flip.
• 'flip_vertical': Boolean. Vertical flip.
• 'channel_shift_intensity': Float. Channel shift intensity.
• 'brightness': Float. Brightness shift intensity.

Returns
A transformed version of the input (same shape).
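
For example, a minimal sketch on a single random image (hypothetical data):
import numpy as np
from keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator()
x = np.random.rand(64, 64, 3)  # one image, channels_last

# Rotate by 40 degrees, shift 10 pixels along x, and flip horizontally;
# parameters that are omitted keep their identity defaults.
x_t = datagen.apply_transform(x, {'theta': 40,
                                  'tx': 10,
                                  'flip_horizontal': True})
assert x_t.shape == x.shape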

fit
fit(x, augment=False, rounds=1, seed=None)

Fits the data generator to some sample data.


This computes the internal data stats related to the data-dependent transformations, based on an array of
sample data.
Only required if featurewise_center or featurewise_std_normalization or
zca_whitening are set to True.

Arguments
• x: Sample data. Should have rank 4. In case of grayscale data, the channels axis should have
value 1, in case of RGB data, it should have value 3, and in case of RGBA data, it should have
value 4.
• augment: Boolean (default: False). Whether to fit on randomly augmented samples.
• rounds: Int (default: 1). If using data augmentation (augment=True), this is how many
augmentation passes over the data to use.
• seed: Int (default: None). Random seed.

flow
flow(x, y=None, batch_size=32, shuffle=True, sample_weight=None, seed=None,
save_to_dir=None, save_prefix='', save_format='png', subset=None)

Takes data & label arrays, generates batches of augmented data.


Arguments
• x: Input data. Numpy array of rank 4 or a tuple. If tuple, the first element should contain the
images and the second element another numpy array or a list of numpy arrays that gets passed
to the output without any modifications. Can be used to feed the model miscellaneous data
along with the images. In case of grayscale data, the channels axis of the image array should
have value 1, in case of RGB data, it should have value 3, and in case of RGBA data, it should
have value 4.
• y: Labels.
• batch_size: Int (default: 32).
• shuffle: Boolean (default: True).
• sample_weight: Sample weights.
• seed: Int (default: None).
• save_to_dir: None or str (default: None). This allows you to optionally specify a directory to
which to save the augmented pictures being generated (useful for visualizing what you are
doing).
• save_prefix: Str (default: ''). Prefix to use for filenames of saved pictures (only relevant if
save_to_dir is set).
• save_format: one of "png", "jpeg" (only relevant if save_to_dir is set). Default: "png".
• subset: Subset of data ("training" or "validation") if validation_split is set in
ImageDataGenerator.

Returns
An Iterator yielding tuples of (x, y) where x is a numpy array of image data (in the case of a
single image input) or a list of numpy arrays (in the case with additional inputs) and y is a numpy array
of corresponding labels. If 'sample_weight' is not None, the yielded tuples are of the form (x, y,
sample_weight). If y is None, only the numpy array x is returned.

flow_from_dataframe
flow_from_dataframe(dataframe, directory=None, x_col='filename', y_col='class',
weight_col=None, target_size=(256, 256), color_mode='rgb', classes=None,
class_mode='categorical', batch_size=32, shuffle=True, seed=None, save_to_dir=None,
save_prefix='', save_format='png', subset=None, interpolation='nearest',
validate_filenames=True)

Takes the dataframe and the path to a directory and generates batches of augmented/normalized data.
A simple tutorial can be found here.
Arguments
• dataframe: Pandas dataframe containing the filepaths relative to directory (or absolute
paths if directory is None) of the images in a string column. It should include other column/
s depending on the class_mode:
• if class_mode is "categorical" (default value) it must include the y_col
column with the class/es of each image. Values in the column can be a string/list/tuple for a
single class, or a list/tuple for multiple classes.
• if class_mode is "binary" or "sparse" it must include the given y_col column
with class values as strings.
• if class_mode is "raw" or "multi_output" it should contain the columns
specified in y_col.
• if class_mode is "input" or None no extra column is needed.


• directory: string, path to the directory to read images from. If None, data in x_col
column should be absolute paths.
• x_col: string, column in dataframe that contains the filenames (or absolute paths if
directory is None).
• y_col: string or list, column/s in dataframe that has the target data.
• weight_col: string, column in dataframe that contains the sample weights. Default:
None.
• target_size: tuple of integers (height, width), default: (256, 256). The
dimensions to which all images found will be resized.
• color_mode: one of "grayscale", "rgb", "rgba". Default: "rgb". Whether the images will
be converted to have 1, 3, or 4 color channels.
• classes: optional list of classes (e.g. ['dogs', 'cats']). Default: None. If not
provided, the list of classes will be automatically inferred from the y_col (and the order
of the classes, which will map to the label indices, will be alphanumeric). The dictionary
containing the mapping from class names to class indices can be obtained via the attribute
class_indices.
• class_mode: one of "binary", "categorical", "input", "multi_output", "raw", "sparse" or
None. Default: "categorical". Mode for yielding the targets:
• "binary": 1D numpy array of binary labels,
• "categorical": 2D numpy array of one-hot encoded labels. Supports multi-label
output.
• "input": images identical to input images (mainly used to work with autoencoders),
• "multi_output": list with the values of the different columns,
• "raw": numpy array of values in y_col column(s),
• "sparse": 1D numpy array of integer labels,
• None, no targets are returned (the generator will only yield batches of image data, which
is useful to use in model.predict_generator()).
• batch_size: size of the batches of data (default: 32).
• shuffle: whether to shuffle the data (default: True)
• seed: optional random seed for shuffling and transformations.
• save_to_dir: None or str (default: None). This allows you to optionally specify a
directory to which to save the augmented pictures being generated (useful for visualizing
what you are doing).
• save_prefix: str. Prefix to use for filenames of saved pictures (only relevant if
save_to_dir is set).
• save_format: one of "png", "jpeg" (only relevant if save_to_dir is set). Default:
"png".
• subset: Subset of data ("training" or "validation") if validation_split
is set in ImageDataGenerator.
• interpolation: Interpolation method used to resample the image if the target size is
different from that of the loaded image. Supported methods are "nearest",
"bilinear", and "bicubic". If PIL version 1.1.3 or newer is installed,
"lanczos" is also supported. If PIL version 3.4.0 or newer is installed, "box" and
"hamming" are also supported. By default, "nearest" is used.
• validate_filenames: Boolean, whether to validate image filenames in x_col. If True,
invalid images will be ignored. Disabling this option can lead to speed-up in the
execution of this function. Default: True.

Returns
A DataFrameIterator yielding tuples of (x, y) where x is a numpy array containing a batch of
images with shape (batch_size, *target_size, channels) and y is a numpy array of
corresponding labels.

flow_from_directory
flow_from_directory(directory, target_size=(256, 256), color_mode='rgb',
classes=None, class_mode='categorical', batch_size=32, shuffle=True, seed=None,
save_to_dir=None, save_prefix='', save_format='png', follow_links=False,
subset=None, interpolation='nearest')

Takes the path to a directory & generates batches of augmented data.


Arguments
• directory: string, path to the target directory. It should contain one subdirectory per class. Any
PNG, JPG, BMP, PPM or TIF images inside each of the subdirectories directory tree will be
included in the generator. See this script for more details.
• target_size: Tuple of integers (height, width), default: (256, 256). The dimensions
to which all images found will be resized.
• color_mode: One of "grayscale", "rgb", "rgba". Default: "rgb". Whether the images will be
converted to have 1, 3, or 4 channels.
• classes: Optional list of class subdirectories (e.g. ['dogs', 'cats']). Default: None. If not
provided, the list of classes will be automatically inferred from the subdirectory names/structure
under directory, where each subdirectory will be treated as a different class (and the order
of the classes, which will map to the label indices, will be alphanumeric). The dictionary
containing the mapping from class names to class indices can be obtained via the attribute
class_indices.
• class_mode: One of "categorical", "binary", "sparse", "input", or None. Default: "categorical".
Determines the type of label arrays that are returned:
• "categorical" will be 2D one-hot encoded labels,
• "binary" will be 1D binary labels, "sparse" will be 1D integer labels,
• "input" will be images identical to input images (mainly used to work with
autoencoders).
• If None, no labels are returned (the generator will only yield batches of image data,
which is useful to use with model.predict_generator()). Please note that in
case of class_mode None, the data still needs to reside in a subdirectory of directory
for it to work correctly.
• batch_size: Size of the batches of data (default: 32).
• shuffle: Whether to shuffle the data (default: True) If set to False, sorts the data in alphanumeric
order.
• seed: Optional random seed for shuffling and transformations.
• save_to_dir: None or str (default: None). This allows you to optionally specify a directory to
which to save the augmented pictures being generated (useful for visualizing what you are
doing).
• save_prefix: Str. Prefix to use for filenames of saved pictures (only relevant if save_to_dir
is set).
• save_format: One of "png", "jpeg" (only relevant if save_to_dir is set). Default: "png".
• follow_links: Whether to follow symlinks inside class subdirectories (default: False).
• subset: Subset of data ("training" or "validation") if validation_split is set in
ImageDataGenerator.
• interpolation: Interpolation method used to resample the image if the target size is different
from that of the loaded image. Supported methods are "nearest", "bilinear", and
"bicubic". If PIL version 1.1.3 or newer is installed, "lanczos" is also supported. If PIL
version 3.4.0 or newer is installed, "box" and "hamming" are also supported. By default,
"nearest" is used.

Returns
A DirectoryIterator yielding tuples of (x, y) where x is a numpy array containing a batch of
images with shape (batch_size, *target_size, channels) and y is a numpy array of
corresponding labels.
get_random_transform
get_random_transform(img_shape, seed=None)

Generates random parameters for a transformation.


Arguments
• seed: Random seed.
• img_shape: Tuple of integers. Shape of the image that is transformed.
Returns
A dictionary containing randomly chosen parameters describing the transformation.

random_transform
random_transform(x, seed=None)

Applies a random transformation to an image.


Arguments
• x: 3D tensor, single image.
• seed: Random seed.
Returns
A randomly transformed version of the input (same shape).

standardize
standardize(x)

Applies the normalization configuration in-place to a batch of inputs.


x is changed in-place since the function is mainly used internally to standardize images and feed
them to your network. Creating a copy of x instead would incur a significant performance cost. If
you want to apply this method without changing the input in-place, you can call the method on a
copy:
standardize(np.copy(x))
Arguments
• x: Batch of inputs to be normalized.
Returns
The inputs, normalized.
Usage of loss functions
A loss function (or objective function, or optimization score function) is one of the two parameters
required to compile a model:
model.compile(loss='mean_squared_error', optimizer='sgd')

from keras import losses

model.compile(loss=losses.mean_squared_error, optimizer='sgd')

You can either pass the name of an existing loss function, or pass a TensorFlow/Theano symbolic
function that returns a scalar for each data-point and takes the following two arguments:
• y_true: True labels. TensorFlow/Theano tensor.
• y_pred: Predictions. TensorFlow/Theano tensor of the same shape as y_true.
The actual optimized objective is the mean of the output array across all datapoints.
For a few examples of such functions, check out the losses source.
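
As a sketch, any such function can serve as a custom loss; for instance a hypothetical mean quartic error (assuming model is an existing Keras model):
from keras import backend as K

def mean_quartic_error(y_true, y_pred):
    # One scalar loss value per data-point; Keras averages over the batch.
    return K.mean(K.square(K.square(y_pred - y_true)), axis=-1)

model.compile(loss=mean_quartic_error, optimizer='sgd')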

Available loss functions


mean_squared_error
keras.losses.mean_squared_error(y_true, y_pred)

mean_absolute_error
keras.losses.mean_absolute_error(y_true, y_pred)

mean_absolute_percentage_error
keras.losses.mean_absolute_percentage_error(y_true, y_pred)

mean_squared_logarithmic_error
keras.losses.mean_squared_logarithmic_error(y_true, y_pred)

squared_hinge
keras.losses.squared_hinge(y_true, y_pred)
hinge
keras.losses.hinge(y_true, y_pred)

categorical_hinge
keras.losses.categorical_hinge(y_true, y_pred)

logcosh
keras.losses.logcosh(y_true, y_pred)

Logarithm of the hyperbolic cosine of the prediction error.


log(cosh(x)) is approximately equal to (x ** 2) / 2 for small x and to abs(x) -
log(2) for large x. This means that 'logcosh' works mostly like the mean squared error, but will not
be so strongly affected by the occasional wildly incorrect prediction.
Arguments
• y_true: tensor of true targets.
• y_pred: tensor of predicted targets.
Returns
Tensor with one scalar loss entry per sample.

huber_loss
keras.losses.huber_loss(y_true, y_pred, delta=1.0)

categorical_crossentropy
keras.losses.categorical_crossentropy(y_true, y_pred, from_logits=False,
label_smoothing=0)

sparse_categorical_crossentropy
keras.losses.sparse_categorical_crossentropy(y_true, y_pred, from_logits=False,
axis=-1)
binary_crossentropy
keras.losses.binary_crossentropy(y_true, y_pred, from_logits=False,
label_smoothing=0)

kullback_leibler_divergence
keras.losses.kullback_leibler_divergence(y_true, y_pred)

poisson
keras.losses.poisson(y_true, y_pred)

cosine_proximity
keras.losses.cosine_proximity(y_true, y_pred, axis=-1)

is_categorical_crossentropy
keras.losses.is_categorical_crossentropy(loss)

Note: when using the categorical_crossentropy loss, your targets should be in categorical
format (e.g. if you have 10 classes, the target for each sample should be a 10-dimensional vector that is
all-zeros except for a 1 at the index corresponding to the class of the sample). In order to convert
integer targets into categorical targets, you can use the Keras utility to_categorical:
from keras.utils import to_categorical

categorical_labels = to_categorical(int_labels, num_classes=None)

When using the sparse_categorical_crossentropy loss, your targets should be integer
targets. If you have categorical targets, you should use categorical_crossentropy.

categorical_crossentropy is another term for multi-class log loss.

Usage of metrics
A metric is a function that is used to judge the performance of your model. Metric functions are to be
supplied in the metrics parameter when a model is compiled.
model.compile(loss='mean_squared_error',
              optimizer='sgd',
              metrics=['mae', 'acc'])

from keras import metrics

model.compile(loss='mean_squared_error',
              optimizer='sgd',
              metrics=[metrics.mae, metrics.categorical_accuracy])

A metric function is similar to a loss function, except that the results from evaluating a metric are not
used when training the model. You may use any of the loss functions as a metric function.
You can either pass the name of an existing metric, or pass a Theano/TensorFlow symbolic function
(see Custom metrics).

Arguments
• y_true: True labels. Theano/TensorFlow tensor.
• y_pred: Predictions. Theano/TensorFlow tensor of the same shape as y_true.

Returns
Single tensor value representing the mean of the output array across all datapoints.

Available metrics
accuracy
keras.metrics.accuracy(y_true, y_pred)

binary_accuracy
keras.metrics.binary_accuracy(y_true, y_pred, threshold=0.5)

categorical_accuracy
keras.metrics.categorical_accuracy(y_true, y_pred)

sparse_categorical_accuracy
keras.metrics.sparse_categorical_accuracy(y_true, y_pred)
top_k_categorical_accuracy
keras.metrics.top_k_categorical_accuracy(y_true, y_pred, k=5)

sparse_top_k_categorical_accuracy
keras.metrics.sparse_top_k_categorical_accuracy(y_true, y_pred, k=5)

cosine_proximity
keras.metrics.cosine_proximity(y_true, y_pred, axis=-1)

clone_metric
keras.metrics.clone_metric(metric)

Returns a clone of the metric if stateful, otherwise returns it as is.
clone_metrics
keras.metrics.clone_metrics(metrics)

Clones the given metric list/dict.


In addition to the metrics above, you may use any of the loss functions described in the loss function
page as metrics.

Custom metrics
Custom metrics can be passed at the compilation step. The function would need to take (y_true,
y_pred) as arguments and return a single tensor value.
import keras.backend as K

def mean_pred(y_true, y_pred):
    return K.mean(y_pred)

model.compile(optimizer='rmsprop',
              loss='binary_crossentropy',
              metrics=['accuracy', mean_pred])
Usage of optimizers
An optimizer is one of the two arguments required for compiling a Keras model:
from keras import optimizers

model = Sequential()
model.add(Dense(64, kernel_initializer='uniform', input_shape=(10,)))
model.add(Activation('softmax'))

sgd = optimizers.SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(loss='mean_squared_error', optimizer=sgd)

You can either instantiate an optimizer before passing it to model.compile() , as in the above
example, or you can call it by its name. In the latter case, the default parameters for the optimizer will
be used.
# pass optimizer by name: default parameters will be used
model.compile(loss='mean_squared_error', optimizer='sgd')

Parameters common to all Keras optimizers


The parameters clipnorm and clipvalue can be used with all optimizers to control gradient
clipping:
from keras import optimizers

# All parameter gradients will be clipped to
# a maximum norm of 1.
sgd = optimizers.SGD(lr=0.01, clipnorm=1.)

from keras import optimizers

# All parameter gradients will be clipped to
# a maximum value of 0.5 and
# a minimum value of -0.5.
sgd = optimizers.SGD(lr=0.01, clipvalue=0.5)

SGD
keras.optimizers.SGD(learning_rate=0.01, momentum=0.0, nesterov=False)

Stochastic gradient descent optimizer.


Includes support for momentum, learning rate decay, and Nesterov momentum.
Arguments
• learning_rate: float >= 0. Learning rate.
• momentum: float >= 0. Parameter that accelerates SGD in the relevant direction and dampens
oscillations.
• nesterov: boolean. Whether to apply Nesterov momentum.

RMSprop
keras.optimizers.RMSprop(learning_rate=0.001, rho=0.9)

RMSProp optimizer.
It is recommended to leave the parameters of this optimizer at their default values (except the learning
rate, which can be freely tuned).
Arguments
• learning_rate: float >= 0. Learning rate.
• rho: float >= 0.
References
• rmsprop: Divide the gradient by a running average of its recent magnitude

Adagrad
keras.optimizers.Adagrad(learning_rate=0.01)

Adagrad optimizer.
Adagrad is an optimizer with parameter-specific learning rates, which are adapted relative to how
frequently a parameter gets updated during training. The more updates a parameter receives, the
smaller the learning rate.
It is recommended to leave the parameters of this optimizer at their default values.
Arguments
• learning_rate: float >= 0. Initial learning rate.
References
• Adaptive Subgradient Methods for Online Learning and Stochastic Optimization

Adadelta
keras.optimizers.Adadelta(learning_rate=1.0, rho=0.95)

Adadelta optimizer.
Adadelta is a more robust extension of Adagrad that adapts learning rates based on a moving window
of gradient updates, instead of accumulating all past gradients. This way, Adadelta continues learning
even when many updates have been done. Compared to Adagrad, in the original version of Adadelta
you don't have to set an initial learning rate. In this version, initial learning rate and decay factor can be
set, as in most other Keras optimizers.
It is recommended to leave the parameters of this optimizer at their default values.
Arguments
• learning_rate: float >= 0. Initial learning rate, defaults to 1. It is recommended to leave it at the
default value.
• rho: float >= 0. Adadelta decay factor, corresponding to fraction of gradient to keep at each
time step.
References
• Adadelta - an adaptive learning rate method

Adam
keras.optimizers.Adam(learning_rate=0.001, beta_1=0.9, beta_2=0.999, amsgrad=False)

Adam optimizer.
Default parameters follow those provided in the original paper.
Arguments
• learning_rate: float >= 0. Learning rate.
• beta_1: float, 0 < beta < 1. Generally close to 1.
• beta_2: float, 0 < beta < 1. Generally close to 1.
• amsgrad: boolean. Whether to apply the AMSGrad variant of this algorithm from the paper
"On the Convergence of Adam and Beyond".
References
• Adam - A Method for Stochastic Optimization
• On the Convergence of Adam and Beyond

Adamax
keras.optimizers.Adamax(learning_rate=0.002, beta_1=0.9, beta_2=0.999)

Adamax optimizer from Adam paper's Section 7.


It is a variant of Adam based on the infinity norm. Default parameters follow those provided in the
paper.
Arguments
• learning_rate: float >= 0. Learning rate.
• beta_1: float, 0 < beta < 1. Generally close to 1.
• beta_2: float, 0 < beta < 1. Generally close to 1.
References
• Adam - A Method for Stochastic Optimization

Nadam
keras.optimizers.Nadam(learning_rate=0.002, beta_1=0.9, beta_2=0.999)

Nesterov Adam optimizer.


Much like Adam is essentially RMSprop with momentum, Nadam is RMSprop with Nesterov
momentum.
Default parameters follow those provided in the paper. It is recommended to leave the parameters of
this optimizer at their default values.
Arguments
• learning_rate: float >= 0. Learning rate.
• beta_1: float, 0 < beta < 1. Generally close to 1.
• beta_2: float, 0 < beta < 1. Generally close to 1.
References
• Nadam report
• On the importance of initialization and momentum in deep learning

Usage of activations
Activations can either be used through an Activation layer, or through the activation argument
supported by all forward layers:
from keras.layers import Activation, Dense

model.add(Dense(64))
model.add(Activation('tanh'))

This is equivalent to:
model.add(Dense(64, activation='tanh'))

You can also pass an element-wise TensorFlow/Theano/CNTK function as an activation:
from keras import backend as K

model.add(Dense(64, activation=K.tanh))

Available activations
elu
keras.activations.elu(x, alpha=1.0)

Exponential linear unit.


Arguments
• x: Input tensor.
• alpha: A scalar, slope of negative section.
Returns
The exponential linear activation: x if x > 0 and alpha * (exp(x)-1) if x < 0.

References
• Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs)

softmax
keras.activations.softmax(x, axis=-1)

Softmax activation function.


Arguments
• x: Input tensor.
• axis: Integer, axis along which the softmax normalization is applied.
Returns
Tensor, output of softmax transformation.
Raises
• ValueError: In case dim(x) == 1.
selu
keras.activations.selu(x)

Scaled Exponential Linear Unit (SELU).


SELU is equal to: scale * elu(x, alpha), where alpha and scale are predefined constants. The
values of alpha and scale are chosen so that the mean and variance of the inputs are preserved
between two consecutive layers as long as the weights are initialized correctly (see lecun_normal
initialization) and the number of inputs is "large enough" (see references for more information).
Arguments
• x: A tensor or variable to compute the activation function for.
Returns
The scaled exponential unit activation: scale * elu(x, alpha).

Note
• To be used together with the initialization "lecun_normal".
• To be used together with the dropout variant "AlphaDropout".
References
• Self-Normalizing Neural Networks

softplus
keras.activations.softplus(x)

Softplus activation function.


Arguments
• x: Input tensor.
Returns
The softplus activation: log(exp(x) + 1).

softsign
keras.activations.softsign(x)

Softsign activation function.


Arguments
• x: Input tensor.
Returns
The softsign activation: x / (abs(x) + 1).

relu
keras.activations.relu(x, alpha=0.0, max_value=None, threshold=0.0)

Rectified Linear Unit.


With default values, it returns element-wise max(x, 0).

Otherwise, it follows: f(x) = max_value for x >= max_value, f(x) = x for threshold
<= x < max_value, f(x) = alpha * (x - threshold) otherwise.

Arguments
• x: Input tensor.
• alpha: float. Slope of the negative part. Defaults to zero.
• max_value: float. Saturation threshold.
• threshold: float. Threshold value for thresholded activation.
Returns
A tensor.
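
A worked example of these cases (the values are chosen to hit each branch):
from keras import backend as K
from keras.activations import relu

x = K.constant([-3.0, -1.0, 0.5, 2.0, 10.0])
# alpha=0.1 gives a leaky negative slope; max_value=6.0 caps the output;
# values below threshold=0.0 take the alpha branch.
print(K.eval(relu(x, alpha=0.1, max_value=6.0, threshold=0.0)))
# [-0.3 -0.1  0.5  2.   6. ]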

tanh
keras.activations.tanh(x)

Hyperbolic tangent activation function.


Arguments
• x: Input tensor.
Returns
The hyperbolic activation: tanh(x) = (exp(x) - exp(-x)) / (exp(x) + exp(-x))

sigmoid
keras.activations.sigmoid(x)

Sigmoid activation function.


Arguments
• x: Input tensor.
Returns
The sigmoid activation: 1 / (1 + exp(-x)).

hard_sigmoid
keras.activations.hard_sigmoid(x)

Hard sigmoid activation function.


Faster to compute than sigmoid activation.
Arguments
• x: Input tensor.
Returns
Hard sigmoid activation:
• 0 if x < -2.5
• 1 if x > 2.5
• 0.2 * x + 0.5 if -2.5 <= x <= 2.5.

exponential
keras.activations.exponential(x)

Exponential (base e) activation function.


Arguments
• x: Input tensor.
Returns
Exponential activation: exp(x).

linear
keras.activations.linear(x)

Linear (i.e. identity) activation function.


Arguments
• x: Input tensor.
Returns
Input tensor, unchanged.

On "Advanced Activations"
Activations that are more complex than a simple TensorFlow/Theano/CNTK function (eg. learnable
activations, which maintain a state) are available as Advanced Activation layers, and can be found in
the module keras.layers.advanced_activations. These include PReLU and LeakyReLU.
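
For example, a minimal sketch of using one as a layer:
from keras.models import Sequential
from keras.layers import Dense
from keras.layers.advanced_activations import LeakyReLU

model = Sequential()
model.add(Dense(64, input_shape=(10,)))
# Added as a separate layer, since it is configurable (and, for PReLU,
# has learnable parameters) rather than being a plain function.
model.add(LeakyReLU(alpha=0.1))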

Usage of callbacks
A callback is a set of functions to be applied at given stages of the training procedure. You can use
callbacks to get a view on internal states and statistics of the model during training. You can pass a list
of callbacks (as the keyword argument callbacks) to the .fit() method of the Sequential or
Model classes. The relevant methods of the callbacks will then be called at each stage of the training.

Callback
keras.callbacks.callbacks.Callback()

Abstract base class used to build new callbacks.


Properties
• params: dict. Training parameters (eg. verbosity, batch size, number of epochs...).
• model: instance of keras.models.Model. Reference of the model being trained.

The logs dictionary that callback methods take as argument will contain keys for quantities relevant
to the current batch or epoch.
Currently, the .fit() method of the Sequential model class will include the following quantities
in the logs that it passes to its callbacks:

• on_epoch_end: logs include acc and loss, and optionally include val_loss (if
validation is enabled in fit) and val_acc (if validation and accuracy monitoring are
enabled).
• on_batch_begin: logs include size, the number of samples in the current batch.
• on_batch_end: logs include loss, and optionally acc (if accuracy monitoring is
enabled).
BaseLogger
keras.callbacks.callbacks.BaseLogger(stateful_metrics=None)

Callback that accumulates epoch averages of metrics.


This callback is automatically applied to every Keras model.
Arguments
• stateful_metrics: Iterable of string names of metrics that should not be averaged over an epoch.
Metrics in this list will be logged as-is in on_epoch_end. All others will be averaged in
on_epoch_end.

TerminateOnNaN
keras.callbacks.callbacks.TerminateOnNaN()

Callback that terminates training when a NaN loss is encountered.

ProgbarLogger
keras.callbacks.callbacks.ProgbarLogger(count_mode='samples',
stateful_metrics=None)

Callback that prints metrics to stdout.


Arguments
• count_mode: One of "steps" or "samples". Whether the progress bar should count samples seen
or steps (batches) seen.
• stateful_metrics: Iterable of string names of metrics that should not be averaged over an epoch.
Metrics in this list will be logged as-is. All others will be averaged over time (e.g. loss, etc).
Raises
• ValueError: In case of invalid count_mode.

History
keras.callbacks.callbacks.History()

Callback that records events into a History object.

This callback is automatically applied to every Keras model. The History object gets returned by the
fit method of models.
ModelCheckpoint
keras.callbacks.callbacks.ModelCheckpoint(filepath, monitor='val_loss', verbose=0,
save_best_only=False, save_weights_only=False, mode='auto', period=1)

Save the model after every epoch.


filepath can contain named formatting options, which will be filled with the values of epoch and
keys in logs (passed in on_epoch_end).
For example: if filepath is weights.{epoch:02d}-{val_loss:.2f}.hdf5, then the
model checkpoints will be saved with the epoch number and the validation loss in the filename.
Arguments
• filepath: string, path to save the model file.
• monitor: quantity to monitor.
• verbose: verbosity mode, 0 or 1.
• save_best_only: if save_best_only=True, the latest best model according to the quantity
monitored will not be overwritten.
• save_weights_only: if True, then only the model's weights will be saved
(model.save_weights(filepath)), else the full model is saved
(model.save(filepath)).
• mode: one of {auto, min, max}. If save_best_only=True, the decision to overwrite the
current save file is made based on either the maximization or the minimization of the monitored
quantity. For val_acc, this should be max, for val_loss this should be min, etc. In auto
mode, the direction is automatically inferred from the name of the monitored quantity.
• period: Interval (number of epochs) between checkpoints.

EarlyStopping
keras.callbacks.callbacks.EarlyStopping(monitor='val_loss', min_delta=0,
patience=0, verbose=0, mode='auto', baseline=None, restore_best_weights=False)

Stop training when a monitored quantity has stopped improving.


Arguments
• monitor: quantity to be monitored.
• min_delta: minimum change in the monitored quantity to qualify as an improvement, i.e. an
absolute change of less than min_delta will count as no improvement.
• patience: number of epochs with no improvement after which training will be stopped.
Validation quantities may not be produced for every epoch if the validation frequency
(model.fit(validation_freq=5)) is greater than one.
• verbose: verbosity mode.
• mode: one of {auto, min, max}. In min mode, training will stop when the quantity monitored
has stopped decreasing; in max mode it will stop when the quantity monitored has stopped
increasing; in auto mode, the direction is automatically inferred from the name of the
monitored quantity.
• baseline: Baseline value for the monitored quantity to reach. Training will stop if the model
doesn't show improvement over the baseline.
• restore_best_weights: whether to restore model weights from the epoch with the best value of
the monitored quantity. If False, the model weights obtained at the last step of training are used.
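
For example, a minimal sketch (model, x_train and y_train are assumed to exist):
from keras.callbacks import EarlyStopping

# Stop when val_loss has not improved by at least 0.001 for 3 epochs,
# then roll back to the best weights seen during training.
early_stopping = EarlyStopping(monitor='val_loss', min_delta=0.001,
                               patience=3, restore_best_weights=True)
model.fit(x_train, y_train, validation_split=0.2, epochs=100,
          callbacks=[early_stopping])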

RemoteMonitor
keras.callbacks.callbacks.RemoteMonitor(root='http://localhost:9000',
path='/publish/epoch/end/', field='data', headers=None, send_as_json=False)

Callback used to stream events to a server.


Requires the requests library. Events are sent to root + '/publish/epoch/end/' by
default. Calls are HTTP POST, with a data argument which is a JSON-encoded dictionary of event
data. If send_as_json is set to True, the content type of the request will be application/json.
Otherwise the serialized JSON will be sent within a form.
Arguments
• root: String; root url of the target server.
• path: String; path relative to root to which the events will be sent.
• field: String; JSON field under which the data will be stored. The field is used only if the
payload is sent within a form (i.e. send_as_json is set to False).
• headers: Dictionary; optional custom HTTP headers.
• send_as_json: Boolean; whether the request should be sent as application/json.

LearningRateScheduler
keras.callbacks.callbacks.LearningRateScheduler(schedule, verbose=0)

Learning rate scheduler.


Arguments
• schedule: a function that takes an epoch index as input (integer, indexed from 0) and current
learning rate and returns a new learning rate as output (float).
• verbose: int. 0: quiet, 1: update messages.
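
Example (a sketch of a simple step schedule; model and the training data are assumed to exist):
from keras.callbacks import LearningRateScheduler

def schedule(epoch, lr):
    # Halve the learning rate every 10 epochs.
    return lr * 0.5 if epoch > 0 and epoch % 10 == 0 else lr

model.fit(x_train, y_train, epochs=50,
          callbacks=[LearningRateScheduler(schedule, verbose=1)])
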
ReduceLROnPlateau
keras.callbacks.callbacks.ReduceLROnPlateau(monitor='val_loss', factor=0.1,
patience=10, verbose=0, mode='auto', min_delta=0.0001, cooldown=0, min_lr=0)

Reduce learning rate when a metric has stopped improving.


Models often benefit from reducing the learning rate by a factor of 2-10 once learning stagnates. This
callback monitors a quantity and if no improvement is seen for a 'patience' number of epochs, the
learning rate is reduced.
Example
reduce_lr = ReduceLROnPlateau(monitor='val_loss', factor=0.2,
                              patience=5, min_lr=0.001)
model.fit(X_train, Y_train, callbacks=[reduce_lr])

Arguments
• monitor: quantity to be monitored.
• factor: factor by which the learning rate will be reduced. new_lr = lr * factor
• patience: number of epochs with no improvement after which the learning rate will be
reduced. Validation quantities may not be produced for every epoch if the validation
frequency (model.fit(validation_freq=5)) is greater than one.
• verbose: int. 0: quiet, 1: update messages.
• mode: one of {auto, min, max}. In min mode, lr will be reduced when the quantity monitored
has stopped decreasing; in max mode it will be reduced when the quantity monitored has
stopped increasing; in auto mode, the direction is automatically inferred from the name of the
monitored quantity.
• min_delta: threshold for measuring the new optimum, to only focus on significant changes.
• cooldown: number of epochs to wait before resuming normal operation after lr has been
reduced.
• min_lr: lower bound on the learning rate.

CSVLogger
keras.callbacks.callbacks.CSVLogger(filename, separator=',', append=False)

Callback that streams epoch results to a csv file.


Supports all values that can be represented as a string, including 1D iterables such as np.ndarray.
Example
csv_logger = CSVLogger('training.log')
model.fit(X_train, Y_train, callbacks=[csv_logger])

Arguments
• filename: filename of the csv file, e.g. 'run/log.csv'.
• separator: string used to separate elements in the csv file.
• append: True: append if file exists (useful for continuing training). False: overwrite existing
file.

LambdaCallback
keras.callbacks.callbacks.LambdaCallback(on_epoch_begin=None, on_epoch_end=None,
on_batch_begin=None, on_batch_end=None, on_train_begin=None, on_train_end=None)

Callback for creating simple, custom callbacks on-the-fly.


This callback is constructed with anonymous functions that will be called at the appropriate time. Note
that the callback expects positional arguments, as follows:
• on_epoch_begin and on_epoch_end expect two positional arguments: epoch, logs
• on_batch_begin and on_batch_end expect two positional arguments: batch, logs
• on_train_begin and on_train_end expect one positional argument: logs

Arguments
• on_epoch_begin: called at the beginning of every epoch.
• on_epoch_end: called at the end of every epoch.
• on_batch_begin: called at the beginning of every batch.
• on_batch_end: called at the end of every batch.
• on_train_begin: called at the beginning of model training.
• on_train_end: called at the end of model training.
Example
# Print the batch number at the beginning of every batch.
batch_print_callback = LambdaCallback(
    on_batch_begin=lambda batch, logs: print(batch))

# Stream the epoch loss to a file in JSON format. The file content
# is not well-formed JSON but rather has a JSON object per line.
import json
json_log = open('loss_log.json', mode='wt', buffering=1)
json_logging_callback = LambdaCallback(
    on_epoch_end=lambda epoch, logs: json_log.write(
        json.dumps({'epoch': epoch, 'loss': logs['loss']}) + '\n'),
    on_train_end=lambda logs: json_log.close()
)

# Terminate some processes after having finished model training.
processes = ...
cleanup_callback = LambdaCallback(
    on_train_end=lambda logs: [
        p.terminate() for p in processes if p.is_alive()])

model.fit(...,
          callbacks=[batch_print_callback,
                     json_logging_callback,
                     cleanup_callback])

TensorBoard
keras.callbacks.tensorboard_v1.TensorBoard(log_dir='./logs', histogram_freq=0,
batch_size=32, write_graph=True, write_grads=False, write_images=False,
embeddings_freq=0, embeddings_layer_names=None, embeddings_metadata=None,
embeddings_data=None, update_freq='epoch')

TensorBoard basic visualizations.


TensorBoard is a visualization tool provided with TensorFlow.
This callback writes a log for TensorBoard, which allows you to visualize dynamic graphs of your
training and test metrics, as well as activation histograms for the different layers in your model.
If you have installed TensorFlow with pip, you should be able to launch TensorBoard from the
command line:
tensorboard --logdir=/full_path_to_your_logs

When using a backend other than TensorFlow, TensorBoard will still work (if you have TensorFlow
installed), but the only feature available will be the display of the losses and metrics plots.
Arguments
• log_dir: the path of the directory where to save the log files to be parsed by TensorBoard.
• histogram_freq: frequency (in epochs) at which to compute activation and weight histograms
for the layers of the model. If set to 0, histograms won't be computed. Validation data (or split)
must be specified for histogram visualizations.
• batch_size: size of batch of inputs to feed to the network for histograms computation.
• write_graph: whether to visualize the graph in TensorBoard. The log file can become quite
large when write_graph is set to True.
• write_grads: whether to visualize gradient histograms in TensorBoard. histogram_freq
must be greater than 0.
• write_images: whether to write model weights to visualize as image in TensorBoard.
• embeddings_freq: frequency (in epochs) at which selected embedding layers will be saved. If
set to 0, embeddings won't be computed. Data to be visualized in TensorBoard's Embedding tab
must be passed as embeddings_data.
• embeddings_layer_names: a list of names of layers to keep an eye on. If None or an empty
list, all the embedding layers will be watched.
• embeddings_metadata: a dictionary which maps layer name to a file name in which metadata
for this embedding layer is saved. See the details about the metadata file format. In case the
same metadata file is used for all embedding layers, a single string can be passed.
• embeddings_data: data to be embedded at layers specified in embeddings_layer_names.
Numpy array (if the model has a single input) or list of Numpy arrays (if the model has multiple
inputs). Learn more about embeddings.
• update_freq: 'batch' or 'epoch' or integer. When using 'batch', writes the losses and
metrics to TensorBoard after each batch. The same applies for 'epoch'. If using an integer,
let's say 10000, the callback will write the metrics and losses to TensorBoard every 10000
samples. Note that writing too frequently to TensorBoard can slow down your training.
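
For example, a minimal sketch (model and the validation data are assumed to exist; validation data is required for the histograms):
from keras.callbacks import TensorBoard

tensorboard = TensorBoard(log_dir='./logs', histogram_freq=1,
                          write_graph=True, update_freq='epoch')
model.fit(x_train, y_train, epochs=20,
          validation_data=(x_val, y_val),
          callbacks=[tensorboard])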

Create a callback
You can create a custom callback by extending the base class keras.callbacks.Callback. A
callback has access to its associated model through the class property self.model.

Here's a simple example saving a list of losses over each batch during training:
class LossHistory(keras.callbacks.Callback):
    def on_train_begin(self, logs={}):
        self.losses = []

    def on_batch_end(self, batch, logs={}):
        self.losses.append(logs.get('loss'))

Example: recording loss history

class LossHistory(keras.callbacks.Callback):
    def on_train_begin(self, logs={}):
        self.losses = []

    def on_batch_end(self, batch, logs={}):
        self.losses.append(logs.get('loss'))

model = Sequential()
model.add(Dense(10, input_dim=784, kernel_initializer='uniform'))
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy', optimizer='rmsprop')

history = LossHistory()
model.fit(x_train, y_train, batch_size=128, epochs=20, verbose=0,
          callbacks=[history])

print(history.losses)
# outputs
'''
[0.66047596406559383, 0.3547245744908703, ..., 0.25953155204159617,
0.25901699725311789]
'''
Example: model checkpoints
from keras.callbacks import ModelCheckpoint

model = Sequential()
model.add(Dense(10, input_dim=784, kernel_initializer='uniform'))
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy', optimizer='rmsprop')

'''
saves the model weights after each epoch if the validation loss decreased
'''
checkpointer = ModelCheckpoint(filepath='/tmp/weights.hdf5', verbose=1,
                               save_best_only=True)
model.fit(x_train, y_train, batch_size=128, epochs=20, verbose=0,
          validation_data=(x_test, y_test), callbacks=[checkpointer])

Datasets
CIFAR10 small image classification
Dataset of 50,000 32x32 color training images, labeled over 10 categories, and 10,000 test images.

Usage:
from keras.datasets import cifar10

(x_train, y_train), (x_test, y_test) = cifar10.load_data()

• Returns:
• 2 tuples:
• x_train, x_test: uint8 array of RGB image data with shape (num_samples, 3, 32,
32) or (num_samples, 32, 32, 3) based on the image_data_format backend
setting of either channels_first or channels_last respectively.
• y_train, y_test: uint8 array of category labels (integers in range 0-9) with shape
(num_samples, 1).
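
A sketch of typical follow-up preprocessing (the scaling and one-hot encoding below are conventional choices, not part of load_data itself):

from keras.datasets import cifar10
from keras.utils import to_categorical

(x_train, y_train), (x_test, y_test) = cifar10.load_data()

# Scale pixels to [0, 1] and one-hot encode the integer labels.
x_train = x_train.astype('float32') / 255.0
y_train = to_categorical(y_train, num_classes=10)  # shape: (50000, 10)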

CIFAR100 small image classification


Dataset of 50,000 32x32 color training images, labeled over 100 categories, and 10,000 test images.

Usage:
from keras.datasets import cifar100

(x_train, y_train), (x_test, y_test) = cifar100.load_data(label_mode='fine')

• Returns:
• 2 tuples:
• x_train, x_test: uint8 array of RGB image data with shape (num_samples, 3, 32,
32) or (num_samples, 32, 32, 3) based on the image_data_format backend
setting of either channels_first or channels_last respectively.
• y_train, y_test: uint8 array of category labels with shape (num_samples, 1).
• Arguments:
• label_mode: "fine" or "coarse".

IMDB Movie reviews sentiment classification


Dataset of 25,000 movies reviews from IMDB, labeled by sentiment (positive/negative). Reviews have
been preprocessed, and each review is encoded as a sequence of word indexes (integers). For
convenience, words are indexed by overall frequency in the dataset, so that for instance the integer "3"
encodes the 3rd most frequent word in the data. This allows for quick filtering operations such as: "only
consider the top 10,000 most common words, but eliminate the top 20 most common words".
As a convention, "0" does not stand for a specific word, but instead is used to encode any unknown
word.

Usage:
from keras.datasets import imdb

(x_train, y_train), (x_test, y_test) = imdb.load_data(path="imdb.npz",
                                                      num_words=None,
                                                      skip_top=0,
                                                      maxlen=None,
                                                      seed=113,
                                                      start_char=1,
                                                      oov_char=2,
                                                      index_from=3)

• Returns:
• 2 tuples:
• x_train, x_test: list of sequences, which are lists of indexes (integers). If the
num_words argument was specified, the maximum possible index value is
num_words-1. If the maxlen argument was specified, the largest possible
sequence length is maxlen.
• y_train, y_test: list of integer labels (1 or 0).
• Arguments:
• path: if you do not have the data locally (at '~/.keras/datasets/' + path), it
will be downloaded to this location.
• num_words: integer or None. Top most frequent words to consider. Any less frequent
word will appear as oov_char value in the sequence data.
• skip_top: integer. Top most frequent words to ignore (they will appear as oov_char
value in the sequence data).
• maxlen: int. Maximum sequence length. Any longer sequence will be truncated.
• seed: int. Seed for reproducible data shuffling.
• start_char: int. The start of a sequence will be marked with this character. Set to 1
because 0 is usually the padding character.
• oov_char: int. Words that were cut out because of the num_words or skip_top limit
will be replaced with this character.
• index_from: int. Index actual words with this index and higher.
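
To illustrate these conventions, here is a minimal sketch that decodes a review back into words; the '<pad>', '<start>' and '<oov>' placeholder strings are arbitrary labels chosen for this example:

from keras.datasets import imdb

(x_train, y_train), _ = imdb.load_data(num_words=10000)

# Indexes 0, 1 and 2 are reserved (padding, start_char, oov_char),
# so real words are shifted up by index_from = 3.
word_index = imdb.get_word_index()
reverse_index = {i + 3: w for w, i in word_index.items()}
reverse_index.update({0: '<pad>', 1: '<start>', 2: '<oov>'})

print(' '.join(reverse_index.get(i, '<oov>') for i in x_train[0]))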

Reuters newswire topics classification


Dataset of 11,228 newswires from Reuters, labeled over 46 topics. As with the IMDB dataset, each
wire is encoded as a sequence of word indexes (same conventions).

Usage:
from keras.datasets import reuters

(x_train, y_train), (x_test, y_test) = reuters.load_data(path="reuters.npz",
                                                         num_words=None,
                                                         skip_top=0,
                                                         maxlen=None,
                                                         test_split=0.2,
                                                         seed=113,
                                                         start_char=1,
                                                         oov_char=2,
                                                         index_from=3)

The specifications are the same as that of the IMDB dataset, with the addition of:
• test_split: float. Fraction of the dataset to be used as test data.
This dataset also makes available the word index used for encoding the sequences:
word_index = reuters.get_word_index(path="reuters_word_index.json")

• Returns: A dictionary where keys are words (str) and values are indexes (integer). E.g.
word_index["giraffe"] might return 1234.

• Arguments:
• path: if you do not have the index file locally (at '~/.keras/datasets/' +
path), it will be downloaded to this location.
MNIST database of handwritten digits
Dataset of 60,000 28x28 grayscale images of the 10 digits, along with a test set of 10,000 images.

Usage:
from keras.datasets import mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()

• Returns:
• 2 tuples:
• x_train, x_test: uint8 array of grayscale image data with shape (num_samples,
28, 28).
• y_train, y_test: uint8 array of digit labels (integers in range 0-9) with shape
(num_samples,).
• Arguments:
• path: if you do not have the index file locally (at '~/.keras/datasets/' +
path), it will be downloaded to this location.

Fashion-MNIST database of fashion articles


Dataset of 60,000 28x28 grayscale images of 10 fashion categories, along with a test set of 10,000
images. This dataset can be used as a drop-in replacement for MNIST. The class labels are:

Label Description
0 T-shirt/top
1 Trouser
2 Pullover
3 Dress
4 Coat
5 Sandal
6 Shirt
7 Sneaker
8 Bag
9 Ankle boot

Usage:
from keras.datasets import fashion_mnist

(x_train, y_train), (x_test, y_test) = fashion_mnist.load_data()

• Returns:
• 2 tuples:
• x_train, x_test: uint8 array of grayscale image data with shape (num_samples,
28, 28).
• y_train, y_test: uint8 array of labels (integers in range 0-9) with shape
(num_samples,).

Boston housing price regression dataset


Dataset taken from the StatLib library which is maintained at Carnegie Mellon University.
Samples contain 13 attributes of houses at different locations around the Boston suburbs in the late
1970s. Targets are the median values of the houses at a location (in k$).

Usage:
from keras.datasets import boston_housing

(x_train, y_train), (x_test, y_test) = boston_housing.load_data()

• Arguments:
• path: path where to cache the dataset locally (relative to ~/.keras/datasets).
• seed: Random seed for shuffling the data before computing the test split.
• test_split: fraction of the data to reserve as test set.
• Returns: Tuple of Numpy arrays: (x_train, y_train), (x_test, y_test).
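
A quick sketch of what comes back (the shapes shown correspond to the default 0.2 test split over the 506 samples):

from keras.datasets import boston_housing

(x_train, y_train), (x_test, y_test) = boston_housing.load_data()

print(x_train.shape, x_test.shape)  # (404, 13) (102, 13)
print(y_train[:3])                  # median house values in k$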

Applications
Keras Applications are deep learning models that are made available alongside pre-trained weights.
These models can be used for prediction, feature extraction, and fine-tuning.
Weights are downloaded automatically when instantiating a model. They are stored at
~/.keras/models/.

Available models
Models for image classification with weights trained on ImageNet:
• Xception
• VGG16
• VGG19
• ResNet, ResNetV2
• InceptionV3
• InceptionResNetV2
• MobileNet
• MobileNetV2
• DenseNet
• NASNet
All of these architectures are compatible with all the backends (TensorFlow, Theano, and CNTK), and
upon instantiation the models will be built according to the image data format set in your Keras
configuration file at ~/.keras/keras.json. For instance, if you have set
image_data_format=channels_last, then any model loaded from this repository will get
built according to the TensorFlow data format convention, "Height-Width-Depth".
Note that:
• For Keras < 2.2.0, the Xception model is only available for TensorFlow, due to its
reliance on SeparableConvolution layers.
• For Keras < 2.1.5, the MobileNet model is only available for TensorFlow, due to its
reliance on DepthwiseConvolution layers.

Usage examples for image classification models


Classify ImageNet classes with ResNet50
from keras.applications.resnet50 import ResNet50
from keras.preprocessing import image
from keras.applications.resnet50 import preprocess_input, decode_predictions
import numpy as np

model = ResNet50(weights='imagenet')

img_path = 'elephant.jpg'
img = image.load_img(img_path, target_size=(224, 224))
x = image.img_to_array(img)
x = np.expand_dims(x, axis=0)
x = preprocess_input(x)

preds = model.predict(x)
# decode the results into a list of tuples (class, description, probability)
# (one such list for each sample in the batch)
print('Predicted:', decode_predictions(preds, top=3)[0])
# Predicted: [(u'n02504013', u'Indian_elephant', 0.82658225), (u'n01871265',
#             u'tusker', 0.1122357), (u'n02504458', u'African_elephant', 0.061040461)]

Extract features with VGG16


from keras.applications.vgg16 import VGG16
from keras.preprocessing import image
from keras.applications.vgg16 import preprocess_input
import numpy as np

model = VGG16(weights='imagenet', include_top=False)


img_path = 'elephant.jpg'
img = image.load_img(img_path, target_size=(224, 224))
x = image.img_to_array(img)
x = np.expand_dims(x, axis=0)
x = preprocess_input(x)

features = model.predict(x)

Extract features from an arbitrary intermediate layer with VGG19


from keras.applications.vgg19 import VGG19
from keras.preprocessing import image
from keras.applications.vgg19 import preprocess_input
from keras.models import Model
import numpy as np

base_model = VGG19(weights='imagenet')
model = Model(inputs=base_model.input,
              outputs=base_model.get_layer('block4_pool').output)

img_path = 'elephant.jpg'
img = image.load_img(img_path, target_size=(224, 224))
x = image.img_to_array(img)
x = np.expand_dims(x, axis=0)
x = preprocess_input(x)

block4_pool_features = model.predict(x)

Fine-tune InceptionV3 on a new set of classes


from keras.applications.inception_v3 import InceptionV3
from keras.preprocessing import image
from keras.models import Model
from keras.layers import Dense, GlobalAveragePooling2D
from keras import backend as K

# create the base pre-trained model
base_model = InceptionV3(weights='imagenet', include_top=False)

# add a global spatial average pooling layer
x = base_model.output
x = GlobalAveragePooling2D()(x)
# let's add a fully-connected layer
x = Dense(1024, activation='relu')(x)
# and a logistic layer -- let's say we have 200 classes
predictions = Dense(200, activation='softmax')(x)

# this is the model we will train
model = Model(inputs=base_model.input, outputs=predictions)

# first: train only the top layers (which were randomly initialized)
# i.e. freeze all convolutional InceptionV3 layers
for layer in base_model.layers:
    layer.trainable = False

# compile the model (should be done *after* setting layers to non-trainable)
model.compile(optimizer='rmsprop', loss='categorical_crossentropy')

# train the model on the new data for a few epochs
model.fit_generator(...)

# at this point, the top layers are well trained and we can start fine-tuning
# convolutional layers from inception V3. We will freeze the bottom N layers
# and train the remaining top layers.

# let's visualize layer names and layer indices to see how many layers
# we should freeze:
for i, layer in enumerate(base_model.layers):
    print(i, layer.name)

# we chose to train the top 2 inception blocks, i.e. we will freeze
# the first 249 layers and unfreeze the rest:
for layer in model.layers[:249]:
    layer.trainable = False
for layer in model.layers[249:]:
    layer.trainable = True

# we need to recompile the model for these modifications to take effect
# we use SGD with a low learning rate
from keras.optimizers import SGD
model.compile(optimizer=SGD(lr=0.0001, momentum=0.9),
              loss='categorical_crossentropy')

# we train our model again (this time fine-tuning the top 2 inception blocks
# alongside the top Dense layers)
model.fit_generator(...)

Build InceptionV3 over a custom input tensor


from keras.applications.inception_v3 import InceptionV3
from keras.layers import Input

# this could also be the output of a different Keras model or layer
input_tensor = Input(shape=(224, 224, 3))  # this assumes K.image_data_format() == 'channels_last'

model = InceptionV3(input_tensor=input_tensor, weights='imagenet',
                    include_top=True)

Documentation for individual models


Model Size Top-1 Accuracy Top-5 Accuracy Parameters Depth
Xception 88 MB 0.790 0.945 22,910,480 126
VGG16 528 MB 0.713 0.901 138,357,544 23
VGG19 549 MB 0.713 0.900 143,667,240 26
ResNet50 98 MB 0.749 0.921 25,636,712 -
ResNet101 171 MB 0.764 0.928 44,707,176 -
ResNet152 232 MB 0.766 0.931 60,419,944 -
ResNet50V2 98 MB 0.760 0.930 25,613,800 -
ResNet101V2 171 MB 0.772 0.938 44,675,560 -
ResNet152V2 232 MB 0.780 0.942 60,380,648 -
InceptionV3 92 MB 0.779 0.937 23,851,784 159
InceptionResNetV2 215 MB 0.803 0.953 55,873,736 572
MobileNet 16 MB 0.704 0.895 4,253,864 88
MobileNetV2 14 MB 0.713 0.901 3,538,984 88
DenseNet121 33 MB 0.750 0.923 8,062,504 121
DenseNet169 57 MB 0.762 0.932 14,307,880 169
DenseNet201 80 MB 0.773 0.936 20,242,984 201
NASNetMobile 23 MB 0.744 0.919 5,326,716 -
NASNetLarge 343 MB 0.825 0.960 88,949,818 -
The top-1 and top-5 accuracy refers to the model's performance on the ImageNet validation dataset.
Depth refers to the topological depth of the network. This includes activation layers, batch
normalization layers etc.

Xception
keras.applications.xception.Xception(include_top=True, weights='imagenet',
input_tensor=None, input_shape=None, pooling=None, classes=1000)

Xception V1 model, with weights pre-trained on ImageNet.


On ImageNet, this model gets to a top-1 validation accuracy of 0.790 and a top-5 validation accuracy
of 0.945.
This model can be built with either the 'channels_first' data format (channels, height,
width) or the 'channels_last' data format (height, width, channels).

The default input size for this model is 299x299.

Arguments
• include_top: whether to include the fully-connected layer at the top of the network.
• weights: one of None (random initialization) or 'imagenet' (pre-training on ImageNet).
• input_tensor: optional Keras tensor (i.e. output of layers.Input()) to use as image input
for the model.
• input_shape: optional shape tuple, only to be specified if include_top is False (otherwise
the input shape has to be (299, 299, 3)). It should have exactly 3 input channels, and
width and height should be no smaller than 71. E.g. (150, 150, 3) would be one valid
value.
• pooling: Optional pooling mode for feature extraction when include_top is False.
• None means that the output of the model will be the 4D tensor output of the last
convolutional block.
• 'avg' means that global average pooling will be applied to the output of the last
convolutional block, and thus the output of the model will be a 2D tensor.
• 'max' means that global max pooling will be applied.
• classes: optional number of classes to classify images into, only to be specified if
include_top is True, and if no weights argument is specified.

Returns
A Keras Model instance.

References
• Xception: Deep Learning with Depthwise Separable Convolutions

License
These weights are trained by ourselves and are released under the MIT license.
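
As an illustration of the include_top, input_shape and pooling arguments above, a headless feature-extractor sketch (the 150x150 input size is just one valid choice):

from keras.applications.xception import Xception

# include_top=False drops the classifier; pooling='avg' collapses the
# final feature maps, so predict() returns a 2D (batch, channels) array.
model = Xception(include_top=False, weights='imagenet',
                 input_shape=(150, 150, 3), pooling='avg')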

VGG16
keras.applications.vgg16.VGG16(include_top=True, weights='imagenet',
input_tensor=None, input_shape=None, pooling=None, classes=1000)

VGG16 model, with weights pre-trained on ImageNet.


This model can be built both with 'channels_first' data format (channels, height, width) or
'channels_last' data format (height, width, channels).

The default input size for this model is 224x224.

Arguments
• include_top: whether to include the 3 fully-connected layers at the top of the network.
• weights: one of None (random initialization) or 'imagenet' (pre-training on ImageNet).
• input_tensor: optional Keras tensor (i.e. output of layers.Input()) to use as image input
for the model.
• input_shape: optional shape tuple, only to be specified if include_top is False (otherwise
the input shape has to be (224, 224, 3) (with 'channels_last' data format) or (3,
224, 224) (with 'channels_first' data format)). It should have exactly 3 input
channels, and width and height should be no smaller than 32. E.g. (200, 200, 3) would be
one valid value.
• pooling: Optional pooling mode for feature extraction when include_top is False.
• None means that the output of the model will be the 4D tensor output of the last
convolutional block.
• 'avg' means that global average pooling will be applied to the output of the last
convolutional block, and thus the output of the model will be a 2D tensor.
• 'max' means that global max pooling will be applied.
• classes: optional number of classes to classify images into, only to be specified if
include_top is True, and if no weights argument is specified.

Returns
A Keras Model instance.

References
• Very Deep Convolutional Networks for Large-Scale Image Recognition: please cite this paper if
you use the VGG models in your work.

License
These weights are ported from the ones released by VGG at Oxford under the Creative Commons
Attribution License.

VGG19
keras.applications.vgg19.VGG19(include_top=True, weights='imagenet',
input_tensor=None, input_shape=None, pooling=None, classes=1000)

VGG19 model, with weights pre-trained on ImageNet.


This model can be built both with 'channels_first' data format (channels, height, width) or
'channels_last' data format (height, width, channels).

The default input size for this model is 224x224.

Arguments
• include_top: whether to include the 3 fully-connected layers at the top of the network.
• weights: one of None (random initialization) or 'imagenet' (pre-training on ImageNet).
• input_tensor: optional Keras tensor (i.e. output of layers.Input()) to use as image input
for the model.
• input_shape: optional shape tuple, only to be specified if include_top is False (otherwise
the input shape has to be (224, 224, 3) (with 'channels_last' data format) or (3,
224, 224) (with 'channels_first' data format)). It should have exactly 3 input
channels, and width and height should be no smaller than 32. E.g. (200, 200, 3) would be
one valid value.
• pooling: Optional pooling mode for feature extraction when include_top is False.
• None means that the output of the model will be the 4D tensor output of the last
convolutional block.
• 'avg' means that global average pooling will be applied to the output of the last
convolutional block, and thus the output of the model will be a 2D tensor.
• 'max' means that global max pooling will be applied.
• classes: optional number of classes to classify images into, only to be specified if
include_top is True, and if no weights argument is specified.

Returns
A Keras Model instance.

References
• Very Deep Convolutional Networks for Large-Scale Image Recognition

License
These weights are ported from the ones released by VGG at Oxford under the Creative Commons
Attribution License.

ResNet
keras.applications.resnet.ResNet50(include_top=True, weights='imagenet',
input_tensor=None, input_shape=None, pooling=None, classes=1000)
keras.applications.resnet.ResNet101(include_top=True, weights='imagenet',
input_tensor=None, input_shape=None, pooling=None, classes=1000)
keras.applications.resnet.ResNet152(include_top=True, weights='imagenet',
input_tensor=None, input_shape=None, pooling=None, classes=1000)
keras.applications.resnet_v2.ResNet50V2(include_top=True, weights='imagenet',
input_tensor=None, input_shape=None, pooling=None, classes=1000)
keras.applications.resnet_v2.ResNet101V2(include_top=True, weights='imagenet',
input_tensor=None, input_shape=None, pooling=None, classes=1000)
keras.applications.resnet_v2.ResNet152V2(include_top=True, weights='imagenet',
input_tensor=None, input_shape=None, pooling=None, classes=1000)

ResNet, ResNetV2 models, with weights pre-trained on ImageNet.


These models can be built with either the 'channels_first' data format (channels, height,
width) or the 'channels_last' data format (height, width, channels).

The default input size for these models is 224x224.


Arguments
• include_top: whether to include the fully-connected layer at the top of the network.
• weights: one of None (random initialization) or 'imagenet' (pre-training on ImageNet).
• input_tensor: optional Keras tensor (i.e. output of layers.Input()) to use as image input
for the model.
• input_shape: optional shape tuple, only to be specified if include_top is False (otherwise
the input shape has to be (224, 224, 3) (with 'channels_last' data format) or (3,
224, 224) (with 'channels_first' data format)). It should have exactly 3 input
channels, and width and height should be no smaller than 32. E.g. (200, 200, 3) would be
one valid value.
• pooling: Optional pooling mode for feature extraction when include_top is False.
• None means that the output of the model will be the 4D tensor output of the last
convolutional block.
• 'avg' means that global average pooling will be applied to the output of the last
convolutional block, and thus the output of the model will be a 2D tensor.
• 'max' means that global max pooling will be applied.
• classes: optional number of classes to classify images into, only to be specified if
include_top is True, and if no weights argument is specified.

Returns
A Keras Model instance.

References
• ResNet: Deep Residual Learning for Image Recognition
• ResNetV2: Identity Mappings in Deep Residual Networks

License
These weights are ported from the following:
• ResNet: The original repository of Kaiming He under the MIT license.
• ResNetV2: Facebook under the BSD license.

InceptionV3
keras.applications.inception_v3.InceptionV3(include_top=True, weights='imagenet',
input_tensor=None, input_shape=None, pooling=None, classes=1000)

Inception V3 model, with weights pre-trained on ImageNet.


This model can be built with either the 'channels_first' data format (channels, height,
width) or the 'channels_last' data format (height, width, channels).

The default input size for this model is 299x299.

Arguments
• include_top: whether to include the fully-connected layer at the top of the network.
• weights: one of None (random initialization) or 'imagenet' (pre-training on ImageNet).
• input_tensor: optional Keras tensor (i.e. output of layers.Input()) to use as image input
for the model.
• input_shape: optional shape tuple, only to be specified if include_top is False (otherwise
the input shape has to be (299, 299, 3) (with 'channels_last' data format) or (3,
299, 299) (with 'channels_first' data format)). It should have exactly 3 input
channels, and width and height should be no smaller than 75. E.g. (150, 150, 3) would be
one valid value.
• pooling: Optional pooling mode for feature extraction when include_top is False.
• None means that the output of the model will be the 4D tensor output of the last
convolutional block.
• 'avg' means that global average pooling will be applied to the output of the last
convolutional block, and thus the output of the model will be a 2D tensor.
• 'max' means that global max pooling will be applied.
• classes: optional number of classes to classify images into, only to be specified if
include_top is True, and if no weights argument is specified.

Returns
A Keras Model instance.

References
• Rethinking the Inception Architecture for Computer Vision

License
These weights are released under the Apache License.

InceptionResNetV2
keras.applications.inception_resnet_v2.InceptionResNetV2(include_top=True,
weights='imagenet', input_tensor=None, input_shape=None, pooling=None,
classes=1000)

Inception-ResNet V2 model, with weights pre-trained on ImageNet.


This model can be built with either the 'channels_first' data format (channels, height,
width) or the 'channels_last' data format (height, width, channels).

The default input size for this model is 299x299.

Arguments
• include_top: whether to include the fully-connected layer at the top of the network.
• weights: one of None (random initialization) or 'imagenet' (pre-training on ImageNet).
• input_tensor: optional Keras tensor (i.e. output of layers.Input()) to use as image input
for the model.
• input_shape: optional shape tuple, only to be specified if include_top is False (otherwise
the input shape has to be (299, 299, 3) (with 'channels_last' data format) or (3,
299, 299) (with 'channels_first' data format)). It should have exactly 3 input
channels, and width and height should be no smaller than 75. E.g. (150, 150, 3) would be
one valid value.
• pooling: Optional pooling mode for feature extraction when include_top is False.
• None means that the output of the model will be the 4D tensor output of the last
convolutional block.
• 'avg' means that global average pooling will be applied to the output of the last
convolutional block, and thus the output of the model will be a 2D tensor.
• 'max' means that global max pooling will be applied.
• classes: optional number of classes to classify images into, only to be specified if
include_top is True, and if no weights argument is specified.

Returns
A Keras Model instance.

References
• Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning

License
These weights are released under the Apache License.

MobileNet
keras.applications.mobilenet.MobileNet(input_shape=None, alpha=1.0,
depth_multiplier=1, dropout=1e-3, include_top=True, weights='imagenet',
input_tensor=None, pooling=None, classes=1000)

MobileNet model, with weights pre-trained on ImageNet.


This model can be built with either the 'channels_first' data format (channels, height,
width) or the 'channels_last' data format (height, width, channels).

The default input size for this model is 224x224.

Arguments
• input_shape: optional shape tuple, only to be specified if include_top is False (otherwise
the input shape has to be (224, 224, 3) (with 'channels_last' data format) or (3,
224, 224) (with 'channels_first' data format)). It should have exactly 3 input
channels, and width and height should be no smaller than 32. E.g. (200, 200, 3) would be
one valid value.
• alpha: controls the width of the network.
• If alpha < 1.0, proportionally decreases the number of filters in each layer.
• If alpha > 1.0, proportionally increases the number of filters in each layer.
• If alpha = 1, default number of filters from the paper are used at each layer.
• depth_multiplier: depth multiplier for depthwise convolution (also called the resolution
multiplier).
• dropout: dropout rate.
• include_top: whether to include the fully-connected layer at the top of the network.
• weights: None (random initialization) or 'imagenet' (ImageNet weights)
• input_tensor: optional Keras tensor (i.e. output of layers.Input()) to use as image input
for the model.
• pooling: Optional pooling mode for feature extraction when include_top is False.
• None means that the output of the model will be the 4D tensor output of the last
convolutional block.
• 'avg' means that global average pooling will be applied to the output of the last
convolutional block, and thus the output of the model will be a 2D tensor.
• 'max' means that global max pooling will be applied.
• classes: optional number of classes to classify images into, only to be specified if
include_top is True, and if no weights argument is specified.

Returns
A Keras Model instance.

References
• MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications

License
These weights are released under the Apache License.
DenseNet
keras.applications.densenet.DenseNet121(include_top=True, weights='imagenet',
input_tensor=None, input_shape=None, pooling=None, classes=1000)
keras.applications.densenet.DenseNet169(include_top=True, weights='imagenet',
input_tensor=None, input_shape=None, pooling=None, classes=1000)
keras.applications.densenet.DenseNet201(include_top=True, weights='imagenet',
input_tensor=None, input_shape=None, pooling=None, classes=1000)

DenseNet models, with weights pre-trained on ImageNet.


These models can be built with either the 'channels_first' data format (channels, height,
width) or the 'channels_last' data format (height, width, channels).

The default input size for these models is 224x224.

Arguments
• blocks: numbers of building blocks for the four dense layers.
• include_top: whether to include the fully-connected layer at the top of the network.
• weights: one of None (random initialization), 'imagenet' (pre-training on ImageNet), or the path
to the weights file to be loaded.
• input_tensor: optional Keras tensor (i.e. output of layers.Input()) to use as image input
for the model.
• input_shape: optional shape tuple, only to be specified if include_top is False (otherwise
the input shape has to be (224, 224, 3) (with 'channels_last' data format) or (3,
224, 224) (with 'channels_first' data format). It should have exactly 3 inputs
channels, and width and height should be no smaller than 32. E.g. (200, 200, 3) would be
one valid value.
• pooling: optional pooling mode for feature extraction when include_top is False.
• None means that the output of the model will be the 4D tensor output of the last
convolutional block.
• 'avg' means that global average pooling will be applied to the output of the last
convolutional block, and thus the output of the model will be a 2D tensor.
• 'max' means that global max pooling will be applied.
• classes: optional number of classes to classify images into, only to be specified if
include_top is True, and if no weights argument is specified.

Returns
A Keras model instance.
References
• Densely Connected Convolutional Networks (CVPR 2017 Best Paper Award)

License
These weights are released under the BSD 3-clause License.

NASNet
keras.applications.nasnet.NASNetLarge(input_shape=None, include_top=True,
weights='imagenet', input_tensor=None, pooling=None, classes=1000)
keras.applications.nasnet.NASNetMobile(input_shape=None, include_top=True,
weights='imagenet', input_tensor=None, pooling=None, classes=1000)

Neural Architecture Search Network (NASNet) models, with weights pre-trained on ImageNet.
The default input size for the NASNetLarge model is 331x331 and for the NASNetMobile model is
224x224.

Arguments
• input_shape: optional shape tuple, only to be specified if include_top is False (otherwise
the input shape has to be (224, 224, 3) (with 'channels_last' data format) or (3,
224, 224) (with 'channels_first' data format) for NASNetMobile, or (331, 331,
3) (with 'channels_last' data format) or (3, 331, 331) (with
'channels_first' data format) for NASNetLarge). It should have exactly 3 input
channels, and width and height should be no smaller than 32. E.g. (200, 200, 3) would be
one valid value.
• include_top: whether to include the fully-connected layer at the top of the network.
• weights: None (random initialization) or 'imagenet' (ImageNet weights)
• input_tensor: optional Keras tensor (i.e. output of layers.Input()) to use as image input
for the model.
• pooling: Optional pooling mode for feature extraction when include_top is False.
• None means that the output of the model will be the 4D tensor output of the last
convolutional block.
• 'avg' means that global average pooling will be applied to the output of the last
convolutional block, and thus the output of the model will be a 2D tensor.
• 'max' means that global max pooling will be applied.
• classes: optional number of classes to classify images into, only to be specified if
include_top is True, and if no weights argument is specified.
Returns
A Keras Model instance.

References
• Learning Transferable Architectures for Scalable Image Recognition

License
These weights are released under the Apache License.

MobileNetV2
keras.applications.mobilenet_v2.MobileNetV2(input_shape=None, alpha=1.0,
include_top=True, weights='imagenet', input_tensor=None, pooling=None,
classes=1000)

MobileNetV2 model, with weights pre-trained on ImageNet.


This model can be built with either the 'channels_first' data format (channels, height,
width) or the 'channels_last' data format (height, width, channels).

The default input size for this model is 224x224.

Arguments
• input_shape: optional shape tuple, to be specified if you would like to use a model with an
input image resolution that is not (224, 224, 3). It should have exactly 3 input channels.
You can also omit this option if you would like to infer input_shape from an input_tensor.
If you choose to include both input_tensor and input_shape, then input_shape will be
used if they match; if the shapes do not match, an error will be thrown. E.g. (160, 160,
3) would be one valid value.
• alpha: controls the width of the network. This is known as the width multiplier in the
MobileNetV2 paper.
• If alpha < 1.0, proportionally decreases the number of filters in each layer.
• If alpha > 1.0, proportionally increases the number of filters in each layer.
• If alpha = 1, default number of filters from the paper are used at each layer.
• include_top: whether to include the fully-connected layer at the top of the network.
• weights: one of None (random initialization), 'imagenet' (pre-training on ImageNet), or the path
to the weights file to be loaded.
• input_tensor: optional Keras tensor (i.e. output of layers.Input()) to use as image input
for the model.
• pooling: Optional pooling mode for feature extraction when include_top is False.
• None means that the output of the model will be the 4D tensor output of the last
convolutional block.
• 'avg' means that global average pooling will be applied to the output of the last
convolutional block, and thus the output of the model will be a 2D tensor.
• 'max' means that global max pooling will be applied.
• classes: optional number of classes to classify images into, only to be specified if
include_top is True, and if no weights argument is specified.

Returns
A Keras model instance.

Raises
ValueError: in case of invalid argument for weights, or invalid input shape, alpha, or rows
when weights='imagenet'.

References
• MobileNetV2: Inverted Residuals and Linear Bottlenecks

License
These weights are released under the Apache License.

Keras backends
What is a "backend"?
Keras is a model-level library, providing high-level building blocks for developing deep learning
models. It does not handle low-level operations such as tensor products, convolutions and so on itself.
Instead, it relies on a specialized, well optimized tensor manipulation library to do so, serving as the
"backend engine" of Keras. Rather than picking one single tensor library and making the
implementation of Keras tied to that library, Keras handles the problem in a modular way, and several
different backend engines can be plugged seamlessly into Keras.
At this time, Keras has three backend implementations available: the TensorFlow backend, the Theano
backend, and the CNTK backend.
• TensorFlow is an open-source symbolic tensor manipulation framework developed by Google.
• Theano is an open-source symbolic tensor manipulation framework developed by LISA Lab at
Université de Montréal.
• CNTK is an open-source toolkit for deep learning developed by Microsoft.
In the future, we are likely to add more backend options.

Switching from one backend to another


If you have run Keras at least once, you will find the Keras configuration file at:
$HOME/.keras/keras.json

If it isn't there, you can create it.


NOTE for Windows Users: Please replace $HOME with %USERPROFILE%.

The default configuration file looks like this:


{
"image_data_format": "channels_last",
"epsilon": 1e-07,
"floatx": "float32",
"backend": "tensorflow"
}

Simply change the field backend to "theano", "tensorflow", or "cntk", and Keras will use
the new configuration next time you run any Keras code.
You can also define the environment variable KERAS_BACKEND; this will override what is defined
in your config file:
KERAS_BACKEND=tensorflow python -c "from keras import backend"
Using TensorFlow backend.

In Keras it is possible to load more backends than "tensorflow", "theano", and "cntk". Keras
can use external backends as well, and this can be performed by changing the keras.json
configuration file, and the "backend" setting. Suppose you have a Python module called
my_module that you wanted to use as your external backend. The keras.json configuration file
would be changed as follows:
{
"image_data_format": "channels_last",
"epsilon": 1e-07,
"floatx": "float32",
"backend": "my_package.my_module"
}

An external backend must be validated in order to be used; a valid backend must have the
following functions: placeholder, variable and function.

If an external backend is not valid due to a missing required entry, an error will be logged
notifying which entry/entries are missing.
keras.json details
The keras.json configuration file contains the following settings:
{
"image_data_format": "channels_last",
"epsilon": 1e-07,
"floatx": "float32",
"backend": "tensorflow"
}

You can change these settings by editing $HOME/.keras/keras.json.

• image_data_format: String, either "channels_last" or "channels_first". It
specifies which data format convention Keras will follow.
(keras.backend.image_data_format() returns it.)
• For 2D data (e.g. image), "channels_last" assumes (rows, cols, channels)
while "channels_first" assumes (channels, rows, cols).
• For 3D data, "channels_last" assumes (conv_dim1, conv_dim2, conv_dim3,
channels) while "channels_first" assumes (channels, conv_dim1,
conv_dim2, conv_dim3).
• epsilon: Float, a numeric fuzzing constant used to avoid dividing by zero in some operations.
• floatx: String, "float16", "float32", or "float64". Default float precision.
• backend: String, "tensorflow", "theano", or "cntk".
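
These settings can also be read programmatically through the backend module (a quick sketch; the values shown assume the default configuration):

from keras import backend as K

print(K.image_data_format())  # 'channels_last'
print(K.epsilon())            # 1e-07
print(K.floatx())             # 'float32'
print(K.backend())            # 'tensorflow'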

Using the abstract Keras backend to write new code


If you want the Keras modules you write to be compatible with both Theano (th) and TensorFlow
(tf), you have to write them via the abstract Keras backend API. Here's an intro.

You can import the backend module via:


from keras import backend as K

The code below instantiates an input placeholder. It's equivalent to tf.placeholder() or
th.tensor.matrix(), th.tensor.tensor3(), etc.

inputs = K.placeholder(shape=(2, 4, 5))
# also works:
inputs = K.placeholder(shape=(None, 4, 5))
# also works:
inputs = K.placeholder(ndim=3)

The code below instantiates a variable. It's equivalent to tf.Variable() or th.shared().


import numpy as np
val = np.random.random((3, 4, 5))
var = K.variable(value=val)

# all-zeros variable:
var = K.zeros(shape=(3, 4, 5))
# all-ones:
var = K.ones(shape=(3, 4, 5))

Most tensor operations you will need can be done as you would in TensorFlow or Theano:
# Initializing Tensors with Random Numbers
b = K.random_uniform_variable(shape=(3, 4), low=0, high=1) # Uniform distribution
c = K.random_normal_variable(shape=(3, 4), mean=0, scale=1) # Gaussian distribution
d = K.random_normal_variable(shape=(3, 4), mean=0, scale=1)

# Tensor Arithmetic
a = b + c * K.abs(d)
c = K.dot(a, K.transpose(b))
a = K.sum(b, axis=1)
a = K.softmax(b)
a = K.concatenate([b, c], axis=-1)
# etc...
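
To evaluate such symbolic expressions on concrete data, they can be compiled into a callable with K.function (a minimal sketch):

import numpy as np
from keras import backend as K

x = K.placeholder(shape=(None, 4))
y = K.softmax(x * 2)
f = K.function([x], [y])  # compiles inputs -> outputs into a callable

out = f([np.random.random((3, 4))])[0]
print(out.shape)  # (3, 4)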

Backend functions
backend
keras.backend.backend()

Returns the name of the current backend (e.g. "tensorflow").


Returns
String, the name of the backend Keras is currently using.
Example
>>> keras.backend.backend()
'tensorflow'

symbolic
keras.backend.symbolic(func)

Decorator used in TensorFlow 2.0 to enter the Keras graph.


Arguments
• func: Function to decorate.
Returns
Decorated function.

eager
keras.backend.eager(func)

Decorator used in TensorFlow 2.0 to exit the Keras graph.


Arguments
• func: Function to decorate.
Returns
Decorated function.

get_uid
keras.backend.get_uid(prefix='')

Provides a unique UID given a string prefix.


Arguments
• prefix: string.
Returns
An integer.
Example
>>> keras.backend.get_uid('dense')
1
>>> keras.backend.get_uid('dense')
2

manual_variable_initialization
keras.backend.manual_variable_initialization(value)

Sets the manual variable initialization flag.


This boolean flag determines whether variables should be initialized as they are instantiated (default),
or if the user should handle the initialization.
Arguments
• value: Python boolean.
epsilon
keras.backend.epsilon()

Returns the value of the fuzz factor used in numeric expressions.


Returns
A float.
Example
>>> keras.backend.epsilon()
1e-07

reset_uids
keras.backend.reset_uids()

Resets graph identifiers.


set_epsilon
keras.backend.set_epsilon(e)

Sets the value of the fuzz factor used in numeric expressions.


Arguments
• e: float. New value of epsilon.
Example
>>> from keras import backend as K
>>> K.epsilon()
1e-07
>>> K.set_epsilon(1e-05)
>>> K.epsilon()
1e-05

floatx
keras.backend.floatx()

Returns the default float type, as a string. (e.g. 'float16', 'float32', 'float64').
Returns
String, the current default float type.
Example
>>> keras.backend.floatx()
'float32'

set_floatx
keras.backend.set_floatx(floatx)

Sets the default float type.


Arguments
• floatx: String, 'float16', 'float32', or 'float64'.
Example
>>> from keras import backend as K
>>> K.floatx()
'float32'
>>> K.set_floatx('float16')
>>> K.floatx()
'float16'

cast_to_floatx
keras.backend.cast_to_floatx(x)

Cast a Numpy array to the default Keras float type.


Arguments
• x: Numpy array.
Returns
The same Numpy array, cast to its new type.
Example
>>> from keras import backend as K
>>> K.floatx()
'float32'
>>> arr = numpy.array([1.0, 2.0], dtype='float64')
>>> arr.dtype
dtype('float64')
>>> new_arr = K.cast_to_floatx(arr)
>>> new_arr
array([ 1., 2.], dtype=float32)
>>> new_arr.dtype
dtype('float32')
image_data_format
keras.backend.image_data_format()

Returns the default image data format convention.


Returns
A string, either 'channels_first' or 'channels_last'

Example
>>> keras.backend.image_data_format()
'channels_first'

set_image_data_format
keras.backend.set_image_data_format(data_format)

Sets the value of the data format convention.


Arguments
• data_format: string. 'channels_first' or 'channels_last'.

Example
>>> from keras import backend as K
>>> K.image_data_format()
'channels_first'
>>> K.set_image_data_format('channels_last')
>>> K.image_data_format()
'channels_last'

learning_phase
keras.backend.learning_phase()

Returns the learning phase flag.


The learning phase flag is a bool tensor (0 = test, 1 = train) to be passed as input to any Keras function
that uses a different behavior at train time and test time.
Returns
Learning phase (scalar integer tensor or Python integer).
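
For example, a sketch of feeding the learning phase into a backend function so that a Dropout layer switches behavior between train and test time:

import numpy as np
from keras import backend as K
from keras.layers import Input, Dropout

x = Input(shape=(4,))
y = Dropout(0.5)(x)
f = K.function([x, K.learning_phase()], [y])

data = np.ones((2, 4))
print(f([data, 0])[0])  # test phase (0): dropout disabled, input unchanged
print(f([data, 1])[0])  # train phase (1): about half the units zeroed (and rescaled)
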
set_learning_phase
keras.backend.set_learning_phase(value)

Sets the learning phase to a fixed value.


Arguments
• value: Learning phase value, either 0 or 1 (integers).
Raises
• ValueError: if value is neither 0 nor 1.

clear_session
keras.backend.clear_session()

Destroys the current Keras graph and creates a new one.


Useful to avoid clutter from old models / layers.
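
A small sketch of the effect on auto-generated layer names (the exact names depend on what was created before):

from keras import backend as K
from keras.layers import Dense

print(Dense(10).name)  # e.g. 'dense_1'
print(Dense(10).name)  # e.g. 'dense_2'
K.clear_session()      # drop the old graph and its naming state
print(Dense(10).name)  # back to 'dense_1'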

is_sparse
keras.backend.is_sparse(tensor)

Returns whether a tensor is a sparse tensor.


Arguments
• tensor: A tensor instance.
Returns
A boolean.
Example
>>> from keras import backend as K
>>> a = K.placeholder((2, 2), sparse=False)
>>> print(K.is_sparse(a))
False
>>> b = K.placeholder((2, 2), sparse=True)
>>> print(K.is_sparse(b))
True

to_dense
keras.backend.to_dense(tensor)
Converts a sparse tensor into a dense tensor and returns it.
Arguments
• tensor: A tensor instance (potentially sparse).
Returns
A dense tensor.
Examples
>>> from keras import backend as K
>>> b = K.placeholder((2, 2), sparse=True)
>>> print(K.is_sparse(b))
True
>>> c = K.to_dense(b)
>>> print(K.is_sparse(c))
False

variable
keras.backend.variable(value, dtype=None, name=None, constraint=None)

Instantiates a variable and returns it.


Arguments
• value: Numpy array, initial value of the tensor.
• dtype: Tensor type.
• name: Optional name string for the tensor.
• constraint: Optional projection function to be applied to the variable after an optimizer update.
Returns
A variable instance (with Keras metadata included).
Examples
>>> from keras import backend as K
>>> val = np.array([[1, 2], [3, 4]])
>>> kvar = K.variable(value=val, dtype='float64', name='example_var')
>>> K.dtype(kvar)
'float64'
>>> print(kvar)
example_var
>>> K.eval(kvar)
array([[ 1., 2.],
[ 3., 4.]])
is_variable
keras.backend.is_variable(x)

constant
keras.backend.constant(value, dtype=None, shape=None, name=None)

Creates a constant tensor.


Arguments
• value: A constant value (or list)
• dtype: The type of the elements of the resulting tensor.
• shape: Optional dimensions of resulting tensor.
• name: Optional name for the tensor.
Returns
A Constant Tensor.

is_keras_tensor
keras.backend.is_keras_tensor(x)

Returns whether x is a Keras tensor.

A "Keras tensor" is a tensor that was returned by a Keras layer, (Layer class) or by Input.

Arguments
• x: A candidate tensor.
Returns
A boolean: Whether the argument is a Keras tensor.
Raises
• ValueError: In case x is not a symbolic tensor.

Examples
>>> from keras import backend as K
>>> from keras.layers import Input, Dense
>>> np_var = numpy.array([1, 2])
>>> K.is_keras_tensor(np_var) # A numpy array is not a symbolic tensor.
ValueError
>>> k_var = tf.placeholder('float32', shape=(1,1))
>>> # A variable indirectly created outside of keras is not a Keras tensor.
>>> K.is_keras_tensor(k_var)
False
>>> keras_var = K.variable(np_var)
>>> # A variable created with the keras backend is not a Keras tensor.
>>> K.is_keras_tensor(keras_var)
False
>>> keras_placeholder = K.placeholder(shape=(2, 4, 5))
>>> # A placeholder is not a Keras tensor.
>>> K.is_keras_tensor(keras_placeholder)
False
>>> keras_input = Input([10])
>>> K.is_keras_tensor(keras_input) # An Input is a Keras tensor.
True
>>> keras_layer_output = Dense(10)(keras_input)
>>> # Any Keras layer output is a Keras tensor.
>>> K.is_keras_tensor(keras_layer_output)
True

is_tensor
keras.backend.is_tensor(x)

placeholder
keras.backend.placeholder(shape=None, ndim=None, dtype=None, sparse=False,
name=None)

Instantiates a placeholder tensor and returns it.


Arguments
• shape: Shape of the placeholder (integer tuple, may include None entries).
• ndim: Number of axes of the tensor. At least one of {shape, ndim} must be specified. If both
are specified, shape is used.
• dtype: Placeholder type.
• sparse: Boolean, whether the placeholder should have a sparse type.
• name: Optional name string for the placeholder.
Returns
Tensor instance (with Keras metadata included).
Examples
>>> from keras import backend as K
>>> input_ph = K.placeholder(shape=(2, 4, 5))
>>> input_ph._keras_shape
(2, 4, 5)
>>> input_ph
<tf.Tensor 'Placeholder_4:0' shape=(2, 4, 5) dtype=float32>
is_placeholder
keras.backend.is_placeholder(x)

Returns whether x is a placeholder.

Arguments
• x: A candidate placeholder.
Returns
Boolean.

shape
keras.backend.shape(x)

Returns the symbolic shape of a tensor or variable.


Arguments
• x: A tensor or variable.
Returns
A symbolic shape (which is itself a tensor).
Examples
# TensorFlow example
>>> from keras import backend as K
>>> tf_session = K.get_session()
>>> val = np.array([[1, 2], [3, 4]])
>>> kvar = K.variable(value=val)
>>> inputs = keras.backend.placeholder(shape=(2, 4, 5))
>>> K.shape(kvar)
<tf.Tensor 'Shape_8:0' shape=(2,) dtype=int32>
>>> K.shape(inputs)
<tf.Tensor 'Shape_9:0' shape=(3,) dtype=int32>
# To get integer shape (Instead, you can use K.int_shape(x))
>>> K.shape(kvar).eval(session=tf_session)
array([2, 2], dtype=int32)
>>> K.shape(inputs).eval(session=tf_session)
array([2, 4, 5], dtype=int32)

int_shape
keras.backend.int_shape(x)

Returns the shape of tensor or variable as a tuple of int or None entries.


Arguments
• x: Tensor or variable.
Returns
A tuple of integers (or None entries).
Examples
>>> from keras import backend as K
>>> inputs = K.placeholder(shape=(2, 4, 5))
>>> K.int_shape(inputs)
(2, 4, 5)
>>> val = np.array([[1, 2], [3, 4]])
>>> kvar = K.variable(value=val)
>>> K.int_shape(kvar)
(2, 2)

Numpy implementation
def int_shape(x):
    return x.shape

ndim
keras.backend.ndim(x)

Returns the number of axes in a tensor, as an integer.


Arguments
• x: Tensor or variable.
Returns
Integer (scalar), number of axes.
Examples
>>> from keras import backend as K
>>> inputs = K.placeholder(shape=(2, 4, 5))
>>> val = np.array([[1, 2], [3, 4]])
>>> kvar = K.variable(value=val)
>>> K.ndim(inputs)
3
>>> K.ndim(kvar)
2

Numpy implementation
def ndim(x):
    return x.ndim
size
keras.backend.size(x, name=None)

Returns the size of a tensor.


Arguments
• x: Tensor or variable.
• name: A name for the operation (optional).
Returns
Size of the tensor.
Examples
>>> from keras import backend as K
>>> val = np.array([[1, 2], [3, 4]])
>>> kvar = K.variable(value=val)
>>> K.size(kvar)
<tf.Tensor: id=9, shape=(), dtype=int32, numpy=4>

dtype
keras.backend.dtype(x)

Returns the dtype of a Keras tensor or variable, as a string.


Arguments
• x: Tensor or variable.
Returns
String, dtype of x.

Examples
>>> from keras import backend as K
>>> K.dtype(K.placeholder(shape=(2,4,5)))
'float32'
>>> K.dtype(K.placeholder(shape=(2,4,5), dtype='float32'))
'float32'
>>> K.dtype(K.placeholder(shape=(2,4,5), dtype='float64'))
'float64'
# Keras variable
>>> kvar = K.variable(np.array([[1, 2], [3, 4]]))
>>> K.dtype(kvar)
'float32_ref'
>>> kvar = K.variable(np.array([[1, 2], [3, 4]]), dtype='float32')
>>> K.dtype(kvar)
'float32_ref'
Numpy implementation
def dtype(x):
    return x.dtype.name

eval
keras.backend.eval(x)

Evaluates the value of a tensor.


Arguments
• x: A tensor.
Returns
A Numpy array.
Examples
>>> from keras import backend as K
>>> kvar = K.variable(np.array([[1, 2], [3, 4]]), dtype='float32')
>>> K.eval(kvar)
array([[ 1., 2.],
[ 3., 4.]], dtype=float32)

Numpy implementation
def eval(x):
    return x

zeros
keras.backend.zeros(shape, dtype=None, name=None)

Instantiates an all-zeros variable and returns it.


Arguments
• shape: Tuple of integers, shape of returned Keras variable
• dtype: String, data type of returned Keras variable
• name: String, name of returned Keras variable
Returns
A variable (including Keras metadata), filled with 0.0. Note that if shape was symbolic, we cannot
return a variable, and will return a dynamically-shaped tensor instead.
Example
>>> from keras import backend as K
>>> kvar = K.zeros((3,4))
>>> K.eval(kvar)
array([[ 0., 0., 0., 0.],
[ 0., 0., 0., 0.],
[ 0., 0., 0., 0.]], dtype=float32)

Numpy implementation
def zeros(shape, dtype=floatx(), name=None):
    return np.zeros(shape, dtype=dtype)

ones
keras.backend.ones(shape, dtype=None, name=None)

Instantiates an all-ones variable and returns it.


Arguments
• shape: Tuple of integers, shape of returned Keras variable.
• dtype: String, data type of returned Keras variable.
• name: String, name of returned Keras variable.
Returns
A Keras variable, filled with 1.0. Note that if shape was symbolic, we cannot return a variable, and
will return a dynamically-shaped tensor instead.
Example
>>> from keras import backend as K
>>> kvar = K.ones((3,4))
>>> K.eval(kvar)
array([[ 1., 1., 1., 1.],
[ 1., 1., 1., 1.],
[ 1., 1., 1., 1.]], dtype=float32)

Numpy implementation
def ones(shape, dtype=floatx(), name=None):
    return np.ones(shape, dtype=dtype)

eye
keras.backend.eye(size, dtype=None, name=None)

Instantiates an identity matrix and returns it.


Arguments
• size: Tuple, number of rows and columns. If Integer, number of rows.
• dtype: String, data type of returned Keras variable.
• name: String, name of returned Keras variable.
Returns
A Keras variable, an identity matrix.
Example
>>> from keras import backend as K
>>> K.eval(K.eye(3))
array([[ 1., 0., 0.],
[ 0., 1., 0.],
[ 0., 0., 1.]], dtype=float32)
>>> K.eval(K.eye((2, 3)))
array([[1., 0., 0.],
[0., 1., 0.]], dtype=float32)

Numpy implementation
def eye(size, dtype=None, name=None):
    if isinstance(size, (list, tuple)):
        n, m = size
    else:
        n, m = size, size
    return np.eye(n, m, dtype=dtype)

zeros_like
keras.backend.zeros_like(x, dtype=None, name=None)

Instantiates an all-zeros variable of the same shape as another tensor.


Arguments
• x: Keras variable or Keras tensor.
• dtype: String, dtype of returned Keras variable. None uses the dtype of x.
• name: String, name for the variable to create.
Returns
A Keras variable with the shape of x filled with zeros.
Example
>>> from keras import backend as K
>>> kvar = K.variable(np.random.random((2,3)))
>>> kvar_zeros = K.zeros_like(kvar)
>>> K.eval(kvar_zeros)
array([[ 0., 0., 0.],
[ 0., 0., 0.]], dtype=float32)
Numpy implementation
def zeros_like(x, dtype=floatx(), name=None):
    return np.zeros_like(x, dtype=dtype)

ones_like
keras.backend.ones_like(x, dtype=None, name=None)

Instantiates an all-ones variable of the same shape as another tensor.


Arguments
• x: Keras variable or tensor.
• dtype: String, dtype of returned Keras variable. None uses the dtype of x.
• name: String, name for the variable to create.
Returns
A Keras variable with the shape of x filled with ones.
Example
>>> from keras import backend as K
>>> kvar = K.variable(np.random.random((2,3)))
>>> kvar_ones = K.ones_like(kvar)
>>> K.eval(kvar_ones)
array([[ 1., 1., 1.],
[ 1., 1., 1.]], dtype=float32)

Numpy implementation
def ones_like(x, dtype=floatx(), name=None):
    return np.ones_like(x, dtype=dtype)

identity
keras.backend.identity(x, name=None)

Returns a tensor with the same content as the input tensor.


Arguments
• x: The input tensor.
• name: String, name for the variable to create.
Returns
A tensor of the same shape, type and content.
random_uniform_variable
keras.backend.random_uniform_variable(shape, low, high, dtype=None, name=None,
seed=None)

Instantiates a variable with values drawn from a uniform distribution.


Arguments
• shape: Tuple of integers, shape of returned Keras variable.
• low: Float, lower boundary of the output interval.
• high: Float, upper boundary of the output interval.
• dtype: String, dtype of returned Keras variable.
• name: String, name of returned Keras variable.
• seed: Integer, random seed.
Returns
A Keras variable, filled with drawn samples.
Example
# TensorFlow example
>>> kvar = K.random_uniform_variable((2,3), 0, 1)
>>> kvar
<tensorflow.python.ops.variables.Variable object at 0x10ab40b10>
>>> K.eval(kvar)
array([[ 0.10940075, 0.10047495, 0.476143 ],
[ 0.66137183, 0.00869417, 0.89220798]], dtype=float32)

Numpy implementation
def random_uniform_variable(shape, low, high, dtype=None, name=None, seed=None):
    return (high - low) * np.random.random(shape).astype(dtype) + low

random_normal_variable
keras.backend.random_normal_variable(shape, mean, scale, dtype=None, name=None,
seed=None)

Instantiates a variable with values drawn from a normal distribution.


Arguments
• shape: Tuple of integers, shape of returned Keras variable.
• mean: Float, mean of the normal distribution.
• scale: Float, standard deviation of the normal distribution.
• dtype: String, dtype of returned Keras variable.
• name: String, name of returned Keras variable.
• seed: Integer, random seed.
Returns
A Keras variable, filled with drawn samples.
Example
# TensorFlow example
>>> kvar = K.random_normal_variable((2,3), 0, 1)
>>> kvar
<tensorflow.python.ops.variables.Variable object at 0x10ab12dd0>
>>> K.eval(kvar)
array([[ 1.19591331, 0.68685907, -0.63814116],
[ 0.92629528, 0.28055015, 1.70484698]], dtype=float32)

Numpy implementation
def random_normal_variable(shape, mean, scale, dtype=None, name=None, seed=None):
    return scale * np.random.randn(*shape).astype(dtype) + mean

count_params
keras.backend.count_params(x)

Returns the static number of elements in a Keras variable or tensor.


Arguments
• x: Keras variable or tensor.
Returns
Integer, the number of elements in x, i.e., the product of the array's static dimensions.

Example
>>> kvar = K.zeros((2,3))
>>> K.count_params(kvar)
6
>>> K.eval(kvar)
array([[ 0., 0., 0.],
[ 0., 0., 0.]], dtype=float32)

Numpy implementation
def count_params(x):
    return x.size

cast
keras.backend.cast(x, dtype)

Casts a tensor to a different dtype and returns it.


You can cast a Keras variable but it still returns a Keras tensor.
Arguments
• x: Keras tensor (or variable).
• dtype: String, either 'float16', 'float32', or 'float64'.

Returns
Keras tensor with dtype dtype.

Example
>>> from keras import backend as K
>>> input = K.placeholder((2, 3), dtype='float32')
>>> input
<tf.Tensor 'Placeholder_2:0' shape=(2, 3) dtype=float32>
# It doesn't work in-place as below.
>>> K.cast(input, dtype='float16')
<tf.Tensor 'Cast_1:0' shape=(2, 3) dtype=float16>
>>> input
<tf.Tensor 'Placeholder_2:0' shape=(2, 3) dtype=float32>
# you need to assign it.
>>> input = K.cast(input, dtype='float16')
>>> input
<tf.Tensor 'Cast_2:0' shape=(2, 3) dtype=float16>

update
keras.backend.update(x, new_x)

Update the value of x to new_x.

Arguments
• x: A Variable.
• new_x: A tensor of same shape as x.

Returns
The variable x updated.

update_add
keras.backend.update_add(x, increment)

Update the value of x by adding increment.

Arguments
• x: A Variable.
• increment: A tensor of same shape as x.

Returns
The variable x updated.

update_sub
keras.backend.update_sub(x, decrement)

Update the value of x by subtracting decrement.

Arguments
• x: A Variable.
• decrement: A tensor of same shape as x.

Returns
The variable x updated.

moving_average_update
keras.backend.moving_average_update(x, value, momentum)

Compute the moving average of a variable.


Arguments
• x: A Variable.
• value: A tensor with the same shape as x.
• momentum: The moving average momentum.
Returns
An operation to update the variable.
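For intuition, the update is equivalent to the following NumPy sketch (the assumed standard
exponential-moving-average semantics, not the backend's own code):
import numpy as np

def moving_average_update(x, value, momentum):
    # Decay the running average and blend in the new value in place.
    x *= momentum
    x += (1. - momentum) * value
    return x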

dot
keras.backend.dot(x, y)

Multiplies 2 tensors (and/or variables) and returns a tensor.


When attempting to multiply an nD tensor with an nD tensor, it reproduces the Theano behavior
(e.g. (2, 3) * (4, 3, 5) -> (2, 4, 5)).

Arguments
• x: Tensor or variable.
• y: Tensor or variable.
Returns
A tensor, dot product of x and y.

Examples
# dot product between tensors
>>> x = K.placeholder(shape=(2, 3))
>>> y = K.placeholder(shape=(3, 4))
>>> xy = K.dot(x, y)
>>> xy
<tf.Tensor 'MatMul_9:0' shape=(2, 4) dtype=float32>

# dot product between tensors


>>> x = K.placeholder(shape=(32, 28, 3))
>>> y = K.placeholder(shape=(3, 4))
>>> xy = K.dot(x, y)
>>> xy
<tf.Tensor 'MatMul_9:0' shape=(32, 28, 4) dtype=float32>

# Theano-like behavior example


>>> x = K.random_uniform_variable(shape=(2, 3), low=0, high=1)
>>> y = K.ones((4, 3, 5))
>>> xy = K.dot(x, y)
>>> K.int_shape(xy)
(2, 4, 5)

Numpy implementation
def dot(x, y):
    return np.dot(x, y)

batch_dot
keras.backend.batch_dot(x, y, axes=None)

Batchwise dot product.


batch_dot is used to compute the dot product of x and y when x and y are data in batches, i.e. in a
shape of (batch_size, :). batch_dot results in a tensor or variable with fewer dimensions than
the input. If the number of dimensions is reduced to 1, we use expand_dims to make sure that ndim
is at least 2.
Arguments
• x: Keras tensor or variable with ndim >= 2.
• y: Keras tensor or variable with ndim >= 2.
• axes: int or tuple(int, int). Target dimensions to be reduced.
Returns
A tensor with shape equal to the concatenation of x's shape (less the dimension that was summed over)
and y's shape (less the batch dimension and the dimension that was summed over). If the final rank is
1, we reshape it to (batch_size, 1).

Examples
Assume x = [[1, 2], [3, 4]] and y = [[5, 6], [7, 8]]. Then batch_dot(x, y,
axes=1) = [[17], [53]], which is the main diagonal of x.dot(y.T), although we never
have to calculate the off-diagonal elements.
Pseudocode:
inner_products = []
for xi, yi in zip(x, y):
    inner_products.append(xi.dot(yi))
result = stack(inner_products)

Shape inference: Let x's shape be (100, 20) and y's shape be (100, 30, 20). If axes is (1, 2),
to find the output shape of resultant tensor, loop through each dimension in x's shape and y's shape:

• x.shape[0] : 100 : append to output shape


• x.shape[1] : 20 : do not append to output shape, dimension 1 of x has been summed over.
(dot_axes[0] = 1)
• y.shape[0] : 100 : do not append to output shape, always ignore first dimension of y
• y.shape[1] : 30 : append to output shape
• y.shape[2] : 20 : do not append to output shape, dimension 2 of y has been summed over.
(dot_axes[1] = 2)
output_shape = (100, 30)
>>> x_batch = K.ones(shape=(32, 20, 1))
>>> y_batch = K.ones(shape=(32, 30, 20))
>>> xy_batch_dot = K.batch_dot(x_batch, y_batch, axes=(1, 2))
>>> K.int_shape(xy_batch_dot)
(32, 1, 30)

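The NumPy reference implementation did not survive extraction here. Below is a minimal sketch
that follows the pseudocode and shape-inference rules above for the common batched 2D/3D cases;
it is an illustration, not the library's full implementation:
import numpy as np

def batch_dot(x, y, axes=None):
    # Minimal sketch: x and y are batched arrays; `axes` is an int or a
    # pair of ints referring to axes of the full (batched) tensors.
    if isinstance(axes, int):
        axes = (axes, axes)
    if axes is None:
        # Default: last axis of x, second-to-last of y (a batched matmul).
        axes = (x.ndim - 1, y.ndim - 2)
    results = []
    for xi, yi in zip(x, y):  # loop over the batch dimension
        # tensordot sums over the requested axes, shifted by the
        # batch dimension we just peeled off.
        results.append(np.tensordot(xi, yi, axes=(axes[0] - 1, axes[1] - 1)))
    result = np.stack(results)
    if result.ndim == 1:
        result = np.expand_dims(result, -1)  # keep ndim >= 2
    return result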

transpose
keras.backend.transpose(x)

Transposes a tensor and returns it.


Arguments
• x: Tensor or variable.
Returns
A tensor.
Examples
>>> var = K.variable([[1, 2, 3], [4, 5, 6]])
>>> K.eval(var)
array([[ 1., 2., 3.],
       [ 4., 5., 6.]], dtype=float32)
>>> var_transposed = K.transpose(var)
>>> K.eval(var_transposed)
array([[ 1., 4.],
       [ 2., 5.],
       [ 3., 6.]], dtype=float32)

>>> inputs = K.placeholder((2, 3))


>>> inputs
<tf.Tensor 'Placeholder_11:0' shape=(2, 3) dtype=float32>
>>> input_transposed = K.transpose(inputs)
>>> input_transposed
<tf.Tensor 'transpose_4:0' shape=(3, 2) dtype=float32>

Numpy implementation
def transpose(x):
    return np.transpose(x)

gather
keras.backend.gather(reference, indices)

Retrieves the elements of indices indices in the tensor reference.

Arguments
• reference: A tensor.
• indices: An integer tensor of indices.
Returns
A tensor of same type as reference.

Numpy implementation
def gather(reference, indices):
    return reference[indices]

max
keras.backend.max(x, axis=None, keepdims=False)
Maximum value in a tensor.
Arguments
• x: A tensor or variable.
• axis: An integer or list of integers in [-rank(x), rank(x)), the axes to find maximum values. If
None (default), finds the maximum over all dimensions.
• keepdims: A boolean, whether to keep the dimensions or not. If keepdims is False, the rank
of the tensor is reduced by 1. If keepdims is True, the reduced dimension is retained with
length 1.
Returns
A tensor with maximum values of x.

Numpy implementation
def max(x, axis=None, keepdims=False):
    if isinstance(axis, list):
        axis = tuple(axis)
    return np.max(x, axis=axis, keepdims=keepdims)

min
keras.backend.min(x, axis=None, keepdims=False)

Minimum value in a tensor.


Arguments
• x: A tensor or variable.
• axis: An integer or list of integers in [-rank(x), rank(x)), the axes to find minimum values. If
None (default), finds the minimum over all dimensions.
• keepdims: A boolean, whether to keep the dimensions or not. If keepdims is False, the rank
of the tensor is reduced by 1. If keepdims is True, the reduced dimension is retained with
length 1.
Returns
A tensor with minimum values of x.

Numpy implementation
def min(x, axis=None, keepdims=False):
    if isinstance(axis, list):
        axis = tuple(axis)
    return np.min(x, axis=axis, keepdims=keepdims)
sum
keras.backend.sum(x, axis=None, keepdims=False)

Sum of the values in a tensor, alongside the specified axis.


Arguments
• x: A tensor or variable.
• axis: An integer or list of integers in [-rank(x), rank(x)), the axes to sum over. If None (default),
sums over all dimensions.
• keepdims: A boolean, whether to keep the dimensions or not. If keepdims is False, the rank
of the tensor is reduced by 1. If keepdims is True, the reduced dimension is retained with
length 1.
Returns
A tensor with sum of x.

Numpy implementation
def sum(x, axis=None, keepdims=False):
    if isinstance(axis, list):
        axis = tuple(axis)
    return np.sum(x, axis=axis, keepdims=keepdims)

prod
keras.backend.prod(x, axis=None, keepdims=False)

Multiplies the values in a tensor, alongside the specified axis.


Arguments
• x: A tensor or variable.
• axis: An integer or list of integers in [-rank(x), rank(x)), the axes to compute the product. If
None (default), computes the product over all dimensions.
• keepdims: A boolean, whether to keep the dimensions or not. If keepdims is False, the rank
of the tensor is reduced by 1. If keepdims is True, the reduced dimension is retained with
length 1.
Returns
A tensor with the product of elements of x.

Numpy implementation
def prod(x, axis=None, keepdims=False):
    if isinstance(axis, list):
        axis = tuple(axis)
    return np.prod(x, axis=axis, keepdims=keepdims)

cumsum
keras.backend.cumsum(x, axis=0)

Cumulative sum of the values in a tensor, alongside the specified axis.


Arguments
• x: A tensor or variable.
• axis: An integer, the axis to compute the sum.
Returns
A tensor of the cumulative sum of values of x along axis.
Numpy implementation
def cumsum(x, axis=0):
    return np.cumsum(x, axis=axis)

cumprod
keras.backend.cumprod(x, axis=0)

Cumulative product of the values in a tensor, alongside the specified axis.


Arguments
• x: A tensor or variable.
• axis: An integer, the axis to compute the product.
Returns
A tensor of the cumulative product of values of x along axis.
Numpy implementation
def cumprod(x, axis=0):
    return np.cumprod(x, axis=axis)

var
keras.backend.var(x, axis=None, keepdims=False)

Variance of a tensor, alongside the specified axis.


Arguments
• x: A tensor or variable.
• axis: An integer or list of integers in [-rank(x), rank(x)), the axes to compute the variance. If
None (default), computes the variance over all dimensions.
• keepdims: A boolean, whether to keep the dimensions or not. If keepdims is False, the rank
of the tensor is reduced by 1. If keepdims is True, the reduced dimension is retained with
length 1.
Returns
A tensor with the variance of elements of x.
Numpy implementation
def var(x, axis=None, keepdims=False):
    if isinstance(axis, list):
        axis = tuple(axis)
    return np.var(x, axis=axis, keepdims=keepdims)

std
keras.backend.std(x, axis=None, keepdims=False)

Standard deviation of a tensor, alongside the specified axis.


Arguments
• x: A tensor or variable.
• axis: An integer or list of integers in [-rank(x), rank(x)), the axes to compute the standard
deviation. If None (default), computes the standard deviation over all dimensions.
• keepdims: A boolean, whether to keep the dimensions or not. If keepdims is False, the rank
of the tensor is reduced by 1. If keepdims is True, the reduced dimension is retained with
length 1.
Returns
A tensor with the standard deviation of elements of x.
Numpy implementation
def std(x, axis=None, keepdims=False):
    if isinstance(axis, list):
        axis = tuple(axis)
    return np.std(x, axis=axis, keepdims=keepdims)

mean
keras.backend.mean(x, axis=None, keepdims=False)

Mean of a tensor, alongside the specified axis.


Arguments
• x: A tensor or variable.
• axis: An integer or list of integers in [-rank(x), rank(x)), the axes to compute the mean. If None
(default), computes the mean over all dimensions.
• keepdims: A boolean, whether to keep the dimensions or not. If keepdims is False, the rank
of the tensor is reduced by 1 for each entry in axis. If keepdims is True, the reduced
dimensions are retained with length 1.
Returns
A tensor with the mean of elements of x.
Numpy implementation
def mean(x, axis=None, keepdims=False):
    if isinstance(axis, list):
        axis = tuple(axis)
    return np.mean(x, axis=axis, keepdims=keepdims)

any
keras.backend.any(x, axis=None, keepdims=False)

Bitwise reduction (logical OR).


Arguments
• x: Tensor or variable.
• axis: An integer or list of integers in [-rank(x), rank(x)), the axes to compute the logical or. If
None (default), computes the logical or over all dimensions.
• keepdims: whether to drop or broadcast the reduction axes.
Returns
A uint8 tensor (0s and 1s).
Numpy implementation
def any(x, axis=None, keepdims=False):
    if isinstance(axis, list):
        axis = tuple(axis)
    return np.any(x, axis=axis, keepdims=keepdims)

all
keras.backend.all(x, axis=None, keepdims=False)

Bitwise reduction (logical AND).


Arguments
• x: Tensor or variable.
• axis: An integer or list of integers in [-rank(x), rank(x)), the axes to compute the logical and. If
None (default), computes the logical and over all dimensions.
• keepdims: whether to drop or broadcast the reduction axes.
Returns
A uint8 tensor (0s and 1s).
Numpy implementation
def all(x, axis=None, keepdims=False):
    if isinstance(axis, list):
        axis = tuple(axis)
    return np.all(x, axis=axis, keepdims=keepdims)

argmax
keras.backend.argmax(x, axis=-1)

Returns the index of the maximum value along an axis.


Arguments
• x: Tensor or variable.
• axis: axis along which to perform the reduction.
Returns
A tensor.
Numpy implementation
def argmax(x, axis=-1):
    return np.argmax(x, axis=axis)

argmin
keras.backend.argmin(x, axis=-1)

Returns the index of the minimum value along an axis.


Arguments
• x: Tensor or variable.
• axis: axis along which to perform the reduction.
Returns
A tensor.
Numpy implementation
def argmin(x, axis=-1):
    return np.argmin(x, axis=axis)
square
keras.backend.square(x)

Element-wise square.
Arguments
• x: Tensor or variable.
Returns
A tensor.

abs
keras.backend.abs(x)

Element-wise absolute value.


Arguments
• x: Tensor or variable.
Returns
A tensor.

sqrt
keras.backend.sqrt(x)

Element-wise square root.


Arguments
• x: Tensor or variable.
Returns
A tensor.
Numpy implementation
def sqrt(x):
    y = np.sqrt(x)
    y[np.isnan(y)] = 0.
    return y

exp
keras.backend.exp(x)
Element-wise exponential.
Arguments
• x: Tensor or variable.
Returns
A tensor.

log
keras.backend.log(x)

Element-wise log.
Arguments
• x: Tensor or variable.
Returns
A tensor.

logsumexp
keras.backend.logsumexp(x, axis=None, keepdims=False)

Computes log(sum(exp(elements across dimensions of a tensor))).


This function is more numerically stable than log(sum(exp(x))). It avoids overflows caused by taking
the exp of large inputs and underflows caused by taking the log of small inputs.
Arguments
• x: A tensor or variable.
• axis: An integer or list of integers in [-rank(x), rank(x)), the axes to compute the
logsumexp. If None (default), computes the logsumexp over all dimensions.
• keepdims: A boolean, whether to keep the dimensions or not. If keepdims is False, the rank
of the tensor is reduced by 1. If keepdims is True, the reduced dimension is retained with
length 1.
Returns
The reduced tensor.
Numpy implementation
def logsumexp(x, axis=None, keepdims=False):
    if isinstance(axis, list):
        axis = tuple(axis)
    return sp.special.logsumexp(x, axis=axis, keepdims=keepdims)
round
keras.backend.round(x)

Element-wise rounding to the closest integer.


In case of tie, the rounding mode used is "half to even".
Arguments
• x: Tensor or variable.
Returns
A tensor.

sign
keras.backend.sign(x)

Element-wise sign.
Arguments
• x: Tensor or variable.
Returns
A tensor.

pow
keras.backend.pow(x, a)

Element-wise exponentiation.
Arguments
• x: Tensor or variable.
• a: Python integer.
Returns
A tensor.
Numpy implementation
def pow(x, a=1.):
    return np.power(x, a)
clip
keras.backend.clip(x, min_value, max_value)

Element-wise value clipping.


Arguments
• x: Tensor or variable.
• min_value: Python float, integer or tensor.
• max_value: Python float, integer or tensor.
Returns
A tensor.
Numpy implementation
def clip(x, min_value, max_value):
    return np.clip(x, min_value, max_value)

equal
keras.backend.equal(x, y)

Element-wise equality between two tensors.


Arguments
• x: Tensor or variable.
• y: Tensor or variable.
Returns
A bool tensor.
Numpy implementation
def equal(x, y):
    return x == y

not_equal
keras.backend.not_equal(x, y)

Element-wise inequality between two tensors.


Arguments
• x: Tensor or variable.
• y: Tensor or variable.
Returns
A bool tensor.
Numpy implementation
def not_equal(x, y):
    return x != y

greater
keras.backend.greater(x, y)

Element-wise truth value of (x > y).


Arguments
• x: Tensor or variable.
• y: Tensor or variable.
Returns
A bool tensor.
Numpy implementation
def greater(x, y):
    return x > y

greater_equal
keras.backend.greater_equal(x, y)

Element-wise truth value of (x >= y).


Arguments
• x: Tensor or variable.
• y: Tensor or variable.
Returns
A bool tensor.
Numpy implementation
def greater_equal(x, y):
    return x >= y
less
keras.backend.less(x, y)

Element-wise truth value of (x < y).


Arguments
• x: Tensor or variable.
• y: Tensor or variable.
Returns
A bool tensor.
Numpy implementation
def less(x, y):
    return x < y

less_equal
keras.backend.less_equal(x, y)

Element-wise truth value of (x <= y).


Arguments
• x: Tensor or variable.
• y: Tensor or variable.
Returns
A bool tensor.
Numpy implementation
def less_equal(x, y):
    return x <= y

maximum
keras.backend.maximum(x, y)

Element-wise maximum of two tensors.


Arguments
• x: Tensor or variable.
• y: Tensor or variable.
Returns
A tensor.
Numpy implementation
def maximum(x, y):
    return np.maximum(x, y)

minimum
keras.backend.minimum(x, y)

Element-wise minimum of two tensors.


Arguments
• x: Tensor or variable.
• y: Tensor or variable.
Returns
A tensor.
Numpy implementation
def minimum(x, y):
    return np.minimum(x, y)

sin
keras.backend.sin(x)

Computes sin of x element-wise.


Arguments
• x: Tensor or variable.
Returns
A tensor.

cos
keras.backend.cos(x)

Computes cos of x element-wise.


Arguments
• x: Tensor or variable.
Returns
A tensor.

normalize_batch_in_training
keras.backend.normalize_batch_in_training(x, gamma, beta, reduction_axes,
epsilon=0.001)

Computes mean and std for batch then apply batch_normalization on batch.
Arguments
• x: Input tensor or variable.
• gamma: Tensor by which to scale the input.
• beta: Tensor with which to center the input.
• reduction_axes: iterable of integers, axes over which to normalize.
• epsilon: Fuzz factor.
Returns
A tuple length of 3, (normalized_tensor, mean, variance).

batch_normalization
keras.backend.batch_normalization(x, mean, var, beta, gamma, axis=-1,
epsilon=0.001)

Applies batch normalization on x given mean, var, beta and gamma.


I.e. returns: output = (x - mean) / sqrt(var + epsilon) * gamma + beta

Arguments
• x: Input tensor or variable.
• mean: Mean of batch.
• var: Variance of batch.
• beta: Tensor with which to center the input.
• gamma: Tensor by which to scale the input.
• axis: Integer, the axis that should be normalized. (typically the features axis).
• epsilon: Fuzz factor.
Returns
A tensor.
Numpy implementation
def batch_normalization(x, mean, var, beta, gamma, axis=-1, epsilon=0.001):
    return ((x - mean) / sqrt(var + epsilon)) * gamma + beta

concatenate
keras.backend.concatenate(tensors, axis=-1)

Concatenates a list of tensors alongside the specified axis.


Arguments
• tensors: list of tensors to concatenate.
• axis: concatenation axis.
Returns
A tensor.

reshape
keras.backend.reshape(x, shape)

Reshapes a tensor to the specified shape.


Arguments
• x: Tensor or variable.
• shape: Target shape tuple.
Returns
A tensor.

permute_dimensions
keras.backend.permute_dimensions(x, pattern)

Permutes axes in a tensor.


Arguments
• x: Tensor or variable.
• pattern: A tuple of dimension indices, e.g. (0, 2, 1).
Returns
A tensor.

resize_images
keras.backend.resize_images(x, height_factor, width_factor, data_format,
interpolation='nearest')

Resizes the images contained in a 4D tensor.


Arguments
• x: Tensor or variable to resize.
• height_factor: Positive integer.
• width_factor: Positive integer.
• data_format: string, "channels_last" or "channels_first".
• interpolation: A string, one of nearest or bilinear.

Returns
A tensor.
Raises
• ValueError: if data_format is neither "channels_last" nor "channels_first".

resize_volumes
keras.backend.resize_volumes(x, depth_factor, height_factor, width_factor,
data_format)

Resizes the volume contained in a 5D tensor.


Arguments
• x: Tensor or variable to resize.
• depth_factor: Positive integer.
• height_factor: Positive integer.
• width_factor: Positive integer.
• data_format: string, "channels_last" or "channels_first".

Returns
A tensor.
Raises
• ValueError: if data_format is neither "channels_last" nor "channels_first".

repeat_elements
keras.backend.repeat_elements(x, rep, axis)

Repeats the elements of a tensor along an axis, like np.repeat.

If x has shape (s1, s2, s3) and axis is 1, the output will have shape (s1, s2 * rep, s3).

Arguments
• x: Tensor or variable.
• rep: Python integer, number of times to repeat.
• axis: Axis along which to repeat.
Returns
A tensor.

repeat
keras.backend.repeat(x, n)

Repeats a 2D tensor.
If x has shape (samples, dim) and n is 2, the output will have shape (samples, 2, dim).

Arguments
• x: Tensor or variable.
• n: Python integer, number of times to repeat.
Returns
A tensor.

arange
keras.backend.arange(start, stop=None, step=1, dtype='int32')

Creates a 1D tensor containing a sequence of integers.


The function arguments use the same convention as Theano's arange: if only one argument is provided,
it is in fact the "stop" argument and "start" is 0.
The default type of the returned tensor is 'int32' to match TensorFlow's default.

Arguments
• start: Start value.
• stop: Stop value.
• step: Difference between two successive values.
• dtype: Integer dtype to use.
Returns
An integer tensor.

tile
keras.backend.tile(x, n)

Creates a tensor by tiling x by n.

Arguments
• x: A tensor or variable
• n: A list of integers. The length must be the same as the number of dimensions in x.

Returns
A tiled tensor.
Example
>>> from keras import backend as K
>>> kvar = K.variable(np.random.random((2, 3)))
>>> kvar_tile = K.tile(K.eye(2), (2, 3))
>>> K.eval(kvar_tile)
array([[1., 0., 1., 0., 1., 0.],
       [0., 1., 0., 1., 0., 1.],
       [1., 0., 1., 0., 1., 0.],
       [0., 1., 0., 1., 0., 1.]], dtype=float32)

Numpy implementation
def tile(x, n):
    return np.tile(x, n)

flatten
keras.backend.flatten(x)
Flatten a tensor.
Arguments
• x: A tensor or variable.
Returns
A tensor, reshaped into 1-D.

batch_flatten
keras.backend.batch_flatten(x)

Turn a nD tensor into a 2D tensor with same 0th dimension.


In other words, it flattens each data sample of a batch.
Arguments
• x: A tensor or variable.
Returns
A tensor.

expand_dims
keras.backend.expand_dims(x, axis=-1)

Adds a 1-sized dimension at index "axis".


Arguments
• x: A tensor or variable.
• axis: Position where to add a new axis.
Returns
A tensor with expanded dimensions.

squeeze
keras.backend.squeeze(x, axis)

Removes a 1-dimension from the tensor at index "axis".


Arguments
• x: A tensor or variable.
• axis: Axis to drop.
Returns
A tensor with the same data as x but reduced dimensions.

temporal_padding
keras.backend.temporal_padding(x, padding=(1, 1))

Pads the middle dimension of a 3D tensor.


Arguments
• x: Tensor or variable.
• padding: Tuple of 2 integers, how many zeros to add at the start and end of dim 1.
Returns
A padded 3D tensor.

spatial_2d_padding
keras.backend.spatial_2d_padding(x, padding=((1, 1), (1, 1)), data_format=None)

Pads the 2nd and 3rd dimensions of a 4D tensor.


Arguments
• x: Tensor or variable.
• padding: Tuple of 2 tuples, padding pattern.
• data_format: string, "channels_last" or "channels_first".

Returns
A padded 4D tensor.
Raises
• ValueError: if data_format is neither "channels_last" nor "channels_first".

spatial_3d_padding
keras.backend.spatial_3d_padding(x, padding=((1, 1), (1, 1), (1, 1)),
data_format=None)
Pads 5D tensor with zeros along the depth, height, width dimensions.
Pads these dimensions with "padding[0]", "padding[1]" and "padding[2]" zeros left and right
respectively.
For 'channels_last' data_format, the 2nd, 3rd and 4th dimension will be padded. For 'channels_first'
data_format, the 3rd, 4th and 5th dimension will be padded.
Arguments
• x: Tensor or variable.
• padding: Tuple of 3 tuples, padding pattern.
• data_format: string, "channels_last" or "channels_first".

Returns
A padded 5D tensor.
Raises
• ValueError: if data_format is neither "channels_last" nor "channels_first".

stack
keras.backend.stack(x, axis=0)

Stacks a list of rank R tensors into a rank R+1 tensor.

Arguments
• x: List of tensors.
• axis: Axis along which to perform stacking.
Returns
A tensor.
Numpy implementation
def stack(x, axis=0):
    return np.stack(x, axis=axis)

one_hot
keras.backend.one_hot(indices, num_classes)

Computes the one-hot representation of an integer tensor.


Arguments
• indices: nD integer tensor of shape (batch_size, dim1, dim2, ... dim(n-1))
• num_classes: Integer, number of classes to consider.
Returns
(n + 1)D one hot representation of the input with shape (batch_size, dim1, dim2, ...
dim(n-1), num_classes)
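A small usage sketch (the session output shown is what is expected from the TensorFlow backend;
it is illustrative, not from the original page):
>>> from keras import backend as K
>>> indices = K.variable([0, 2], dtype='int32')
>>> K.eval(K.one_hot(indices, num_classes=3))
array([[1., 0., 0.],
       [0., 0., 1.]], dtype=float32)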

reverse
keras.backend.reverse(x, axes)

Reverses a tensor along the specified axes.


Arguments
• x: Tensor to reverse.
• axes: Integer or iterable of integers. Axes to reverse.
Returns
A tensor.
Numpy implementation
def reverse(x, axes):
    if isinstance(axes, list):
        axes = tuple(axes)
    return np.flip(x, axes)

slice
keras.backend.slice(x, start, size)

Extracts a slice from a tensor.


Arguments
• x: Input tensor.
• start: Integer list/tuple or tensor indicating the start indices of the slice along each axis.
• size: Integer list/tuple or tensor indicating how many elements to take along each axis.
Returns
A sliced tensor:
new_x = x[start[0]: start[0] + size[0], ..., start[-1]: start[-1] + size[-1]]
Raises
• ValueError: if the dimension and the size of indices mismatches.
Numpy implementation
def slice(x, start, size):
    slices = [py_slice(i, i + j) for i, j in zip(start, size)]
    return x[tuple(slices)]

get_value
keras.backend.get_value(x)

Returns the value of a variable.


Arguments
• x: input variable.
Returns
A Numpy array.

batch_get_value
keras.backend.batch_get_value(ops)

Returns the value of more than one tensor variable.


Arguments
• ops: list of ops to run.
Returns
A list of Numpy arrays.

set_value
keras.backend.set_value(x, value)

Sets the value of a variable, from a Numpy array.


Arguments
• x: Variable to set to a new value.
• value: Value to set the tensor to, as a Numpy array (of the same shape).
batch_set_value
keras.backend.batch_set_value(tuples)

Sets the values of many tensor variables at once.


Arguments
• tuples: a list of tuples (tensor, value). value should be a Numpy array.

print_tensor
keras.backend.print_tensor(x, message='')

Prints message and the tensor value when evaluated.

Note that print_tensor returns a new tensor identical to x which should be used in the following
code. Otherwise the print operation is not taken into account during evaluation.
Example
>>> x = K.print_tensor(x, message="x is: ")

Arguments
• x: Tensor to print.
• message: Message to print jointly with the tensor.
Returns
The same tensor x, unchanged.

function
keras.backend.function(inputs, outputs, updates=None)

Instantiates a Keras function.

Arguments
• inputs: List of placeholder tensors.
• outputs: List of output tensors.
• updates: List of update ops.
Returns
Output values as Numpy arrays.
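A minimal usage sketch (shapes and values are illustrative):
>>> import numpy as np
>>> from keras import backend as K
>>> x = K.placeholder(shape=(2,))
>>> double = K.function(inputs=[x], outputs=[x * 2])
>>> double([np.array([1., 2.])])
[array([2., 4.], dtype=float32)]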

gradients
keras.backend.gradients(loss, variables)

Returns the gradients of loss w.r.t. variables.

Arguments
• loss: Scalar tensor to minimize.
• variables: List of variables.
Returns
A gradients tensor.

stop_gradient
keras.backend.stop_gradient(variables)

Returns variables but with zero gradient w.r.t. every other variable.

Arguments
• variables: tensor or list of tensors to consider constant with respect to any other variable.
Returns
A single tensor or a list of tensors (depending on the passed argument) that has constant gradient with
respect to any other variable.

rnn
keras.backend.rnn(step_function, inputs, initial_states, go_backwards=False,
mask=None, constants=None, unroll=False, input_length=None)

Iterates over the time dimension of a tensor.


Arguments
• step_function: Parameters: inputs: Tensor with shape (samples, ...) (no time dimension),
representing input for the batch of samples at a certain time step. states: List of tensors. Returns:
outputs: Tensor with shape (samples, ...) (no time dimension). new_states: List of tensors, same
length and shapes as 'states'.
• inputs: Tensor of temporal data of shape (samples, time, ...) (at least 3D).
• initial_states: Tensor with shape (samples, ...) (no time dimension), containing the initial values
for the states used in the step function.
• go_backwards: Boolean. If True, do the iteration over the time dimension in reverse order and
return the reversed sequence.
• mask: Binary tensor with shape (samples, time), with a zero for every element that is masked.
• constants: A list of constant values passed at each step.
• unroll: Whether to unroll the RNN or to use a symbolic loop (while_loop or scan
depending on backend).
• input_length: Static number of timesteps in the input.
Returns
A tuple, (last_output, outputs, new_states).
last_output: The latest output of the rnn, of shape (samples, ...).
outputs: Tensor with shape (samples, time, ...) where each entry outputs[s, t] is the
output of the step function at time t for sample s.
new_states: List of tensors, latest states returned by the step function, of shape (samples, ...).

Raises
• ValueError: If input dimension is less than 3.
• ValueError: If unroll is True but input timestep is not a fixed number.
• ValueError: If mask is provided (not None) but states is not provided (len(states) == 0).

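The NumPy reference implementation was lost in extraction. A stripped-down sketch of the core
loop, ignoring mask, constants and unroll (an illustration, not the library's code):
import numpy as np

def rnn(step_function, inputs, initial_states, go_backwards=False):
    # inputs: (samples, time, ...); iterate over the time axis.
    time_steps = inputs.shape[1]
    order = reversed(range(time_steps)) if go_backwards else range(time_steps)
    states = initial_states
    outputs = []
    for t in order:
        output, states = step_function(inputs[:, t], states)
        outputs.append(output)
    outputs = np.stack(outputs, axis=1)  # (samples, time, ...)
    last_output = outputs[:, -1]         # output of the last processed step
    return last_output, outputs, states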

switch
keras.backend.switch(condition, then_expression, else_expression)

Switches between two operations depending on a scalar value.


Note that both then_expression and else_expression should be symbolic tensors of the
same shape.
Arguments
• condition: tensor (int or bool).
• then_expression: either a tensor, or a callable that returns a tensor.
• else_expression: either a tensor, or a callable that returns a tensor.
Returns
The selected tensor.
Raises
• ValueError: If rank of condition is greater than rank of expressions.

Numpy implementation
def switch(condition, then_expression, else_expression):
    cond_float = condition.astype(floatx())
    while cond_float.ndim < then_expression.ndim:
        cond_float = cond_float[..., np.newaxis]
    return cond_float * then_expression + (1 - cond_float) * else_expression

in_train_phase
keras.backend.in_train_phase(x, alt, training=None)
Selects x in train phase, and alt otherwise.

Note that alt should have the same shape as x.

Arguments
• x: What to return in train phase (tensor or callable that returns a tensor).
• alt: What to return otherwise (tensor or callable that returns a tensor).
• training: Optional scalar tensor (or Python boolean, or Python integer) specifying the learning
phase.
Returns
Either x or alt based on the training flag. The training flag defaults to
K.learning_phase().

in_test_phase
keras.backend.in_test_phase(x, alt, training=None)

Selects x in test phase, and alt otherwise.

Note that alt should have the same shape as x.

Arguments
• x: What to return in test phase (tensor or callable that returns a tensor).
• alt: What to return otherwise (tensor or callable that returns a tensor).
• training: Optional scalar tensor (or Python boolean, or Python integer) specifying the learning
phase.
Returns
Either x or alt based on K.learning_phase.

relu
keras.backend.relu(x, alpha=0.0, max_value=None, threshold=0.0)

Rectified linear unit.


With default values, it returns element-wise max(x, 0).

Otherwise, it follows: f(x) = max_value for x >= max_value, f(x) = x for threshold
<= x < max_value, f(x) = alpha * (x - threshold) otherwise.

Arguments
• x: A tensor or variable.
• alpha: A scalar, slope of negative section (default=0.).
• max_value: float. Saturation threshold.
• threshold: float. Threshold value for thresholded activation.
Returns
A tensor.
Numpy implementation
def relu(x, alpha=0., max_value=None, threshold=0.):
    if max_value is None:
        max_value = np.inf
    above_threshold = x * (x >= threshold)
    above_threshold = np.clip(above_threshold, 0.0, max_value)
    below_threshold = alpha * (x - threshold) * (x < threshold)
    return below_threshold + above_threshold

elu
keras.backend.elu(x, alpha=1.0)

Exponential linear unit.


Arguments
• x: A tensor or variable to compute the activation function for.
• alpha: A scalar, slope of negative section.
Returns
A tensor.
Numpy implementation
def elu(x, alpha=1.):
    return x * (x > 0) + alpha * (np.exp(x) - 1.) * (x < 0)

softmax
keras.backend.softmax(x, axis=-1)

Softmax of a tensor.
Arguments
• x: A tensor or variable.
• axis: The dimension softmax would be performed on. The default is -1 which indicates the last
dimension.
Returns
A tensor.
Numpy implementation
def softmax(x, axis=-1):
    y = np.exp(x - np.max(x, axis, keepdims=True))
    return y / np.sum(y, axis, keepdims=True)

softplus
keras.backend.softplus(x)

Softplus of a tensor.
Arguments
• x: A tensor or variable.
Returns
A tensor.
Numpy implementation
def softplus(x):
    return np.log(1. + np.exp(x))

softsign
keras.backend.softsign(x)

Softsign of a tensor.
Arguments
• x: A tensor or variable.
Returns
A tensor.
Numpy implementation
def softsign(x):
    return x / (1 + np.abs(x))
categorical_crossentropy
keras.backend.categorical_crossentropy(target, output, from_logits=False, axis=-1)

Categorical crossentropy between an output tensor and a target tensor.


Arguments
• target: A tensor of the same shape as output.
• output: A tensor resulting from a softmax (unless from_logits is True, in which case
output is expected to be the logits).
• from_logits: Boolean, whether output is the result of a softmax, or is a tensor of logits.
• axis: Int specifying the channels axis. axis=-1 corresponds to data format
channels_last, and axis=1 corresponds to data format channels_first.

Returns
Output tensor.
Raises
• ValueError: if axis is neither -1 nor one of the axes of output.
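No NumPy reference is given for this function in the source; below is a minimal sketch of the
probability case (from_logits=False), assuming target and output are NumPy arrays (an
illustration, not the library's code):
import numpy as np

def categorical_crossentropy(target, output, axis=-1):
    # Probabilities in: renormalize, then clip away exact zeros
    # (the backends guard with their own epsilon in the same way).
    output = output / np.sum(output, axis=axis, keepdims=True)
    output = np.clip(output, 1e-7, 1. - 1e-7)
    return -np.sum(target * np.log(output), axis=axis)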

sparse_categorical_crossentropy
keras.backend.sparse_categorical_crossentropy(target, output, from_logits=False,
axis=-1)

Categorical crossentropy with integer targets.


Arguments
• target: An integer tensor.
• output: A tensor resulting from a softmax (unless from_logits is True, in which case
output is expected to be the logits).
• from_logits: Boolean, whether output is the result of a softmax, or is a tensor of logits.
• axis: Int specifying the channels axis. axis=-1 corresponds to data format
channels_last, and axis=1 corresponds to data format channels_first.

Returns
Output tensor.
Raises
• ValueError: if axis is neither -1 nor one of the axes of output.
binary_crossentropy
keras.backend.binary_crossentropy(target, output, from_logits=False)

Binary crossentropy between an output tensor and a target tensor.


Arguments
• target: A tensor with the same shape as output.
• output: A tensor.
• from_logits: Whether output is expected to be a logits tensor. By default, we consider that
output encodes a probability distribution.

Returns
A tensor.

sigmoid
keras.backend.sigmoid(x)

Element-wise sigmoid.
Arguments
• x: A tensor or variable.
Returns
A tensor.
Numpy implementation
def sigmoid(x):
    return 1. / (1. + np.exp(-x))

hard_sigmoid
keras.backend.hard_sigmoid(x)

Segment-wise linear approximation of sigmoid.


Faster than sigmoid. Returns 0. if x < -2.5, 1. if x > 2.5. In -2.5 <= x <= 2.5, returns
0.2 * x + 0.5.

Arguments
• x: A tensor or variable.
Returns
A tensor.
Numpy implementation
def hard_sigmoid(x):
    y = 0.2 * x + 0.5
    return np.clip(y, 0, 1)

tanh
keras.backend.tanh(x)

Element-wise tanh.
Arguments
• x: A tensor or variable.
Returns
A tensor.
Numpy implementation
def tanh(x):
    return np.tanh(x)

dropout
keras.backend.dropout(x, level, noise_shape=None, seed=None)

Sets entries in x to zero at random, while scaling the entire tensor.

Arguments
• x: tensor
• level: fraction of the entries in the tensor that will be set to 0.
• noise_shape: shape for randomly generated keep/drop flags, must be broadcastable to the shape
of x
• seed: random seed to ensure determinism.
Returns
A tensor.
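The NumPy reference implementation was lost in extraction. A minimal inverted-dropout sketch
with the same arguments (an illustration, not the library's code):
import numpy as np

def dropout(x, level, noise_shape=None, seed=None):
    # Zero a `level` fraction of entries and rescale the survivors by
    # 1 / (1 - level) so the expected sum of the tensor is unchanged.
    rng = np.random.RandomState(seed)
    shape = noise_shape if noise_shape is not None else x.shape
    keep = rng.uniform(size=shape) >= level  # keep/drop flags
    return x * keep / (1. - level)           # broadcasts over noise_shape
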
l2_normalize
keras.backend.l2_normalize(x, axis=None)

Normalizes a tensor wrt the L2 norm alongside the specified axis.


Arguments
• x: Tensor or variable.
• axis: axis along which to perform normalization.
Returns
A tensor.
Numpy implementation
def l2_normalize(x, axis=-1):
    y = np.max(np.sum(x ** 2, axis, keepdims=True), axis, keepdims=True)
    return x / np.sqrt(y)

in_top_k
keras.backend.in_top_k(predictions, targets, k)

Returns whether the targets are in the top k predictions.

Arguments
• predictions: A tensor of shape (batch_size, classes) and type float32.
• targets: A 1D tensor of length batch_size and type int32 or int64.
• k: An int, number of top elements to consider.

Returns
A 1D tensor of length batch_size and type bool. output[i] is True if predictions[i,
targets[i]] is within top-k values of predictions[i].
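A NumPy sketch of the check described above (tie handling may differ from the actual backend op):
import numpy as np

def in_top_k(predictions, targets, k):
    # targets[i] is in the top k when fewer than k entries of
    # predictions[i] are strictly greater than its own score.
    target_scores = predictions[np.arange(len(targets)), targets]
    rank = np.sum(predictions > target_scores[:, None], axis=-1)
    return rank < k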

conv1d
keras.backend.conv1d(x, kernel, strides=1, padding='valid', data_format=None,
dilation_rate=1)

1D convolution.
Arguments
• x: Tensor or variable.
• kernel: kernel tensor.
• strides: stride integer.
• padding: string, "same", "causal" or "valid".
• data_format: string, "channels_last" or "channels_first".
• dilation_rate: integer dilate rate.
Returns
A tensor, result of 1D convolution.
Raises
• ValueError: If data_format is neither "channels_last" nor "channels_first".

conv2d
keras.backend.conv2d(x, kernel, strides=(1, 1), padding='valid', data_format=None,
dilation_rate=(1, 1))

2D convolution.
Arguments
• x: Tensor or variable.
• kernel: kernel tensor.
• strides: strides tuple.
• padding: string, "same" or "valid".
• data_format: string, "channels_last" or "channels_first". Whether to use Theano
or TensorFlow/CNTK data format for inputs/kernels/outputs.
• dilation_rate: tuple of 2 integers.
Returns
A tensor, result of 2D convolution.
Raises
• ValueError: If data_format is neither "channels_last" nor "channels_first".

conv2d_transpose
keras.backend.conv2d_transpose(x, kernel, output_shape, strides=(1, 1),
padding='valid', data_format=None, dilation_rate=(1, 1))

2D deconvolution (i.e. transposed convolution).


Arguments
• x: Tensor or variable.
• kernel: kernel tensor.
• output_shape: 1D int tensor for the output shape.
• strides: strides tuple.
• padding: string, "same" or "valid".
• data_format: string, "channels_last" or "channels_first". Whether to use Theano
or TensorFlow/CNTK data format for inputs/kernels/outputs.
• dilation_rate: tuple of 2 integers.
Returns
A tensor, result of transposed 2D convolution.
Raises
• ValueError: If data_format is neither "channels_last" nor "channels_first".

separable_conv1d
keras.backend.separable_conv1d(x, depthwise_kernel, pointwise_kernel, strides=1,
padding='valid', data_format=None, dilation_rate=1)

1D convolution with separable filters.


Arguments
• x: input tensor
• depthwise_kernel: convolution kernel for the depthwise convolution.
• pointwise_kernel: kernel for the 1x1 convolution.
• strides: stride integer.
• padding: string, "same" or "valid".
• data_format: string, "channels_last" or "channels_first".
• dilation_rate: integer dilation rate.
Returns
Output tensor.
Raises
• ValueError: If data_format is neither "channels_last" nor "channels_first".

separable_conv2d
keras.backend.separable_conv2d(x, depthwise_kernel, pointwise_kernel, strides=(1,
1), padding='valid', data_format=None, dilation_rate=(1, 1))

2D convolution with separable filters.


Arguments
• x: input tensor
• depthwise_kernel: convolution kernel for the depthwise convolution.
• pointwise_kernel: kernel for the 1x1 convolution.
• strides: strides tuple (length 2).
• padding: string, "same" or "valid".
• data_format: string, "channels_last" or "channels_first".
• dilation_rate: tuple of integers, dilation rates for the separable convolution.
Returns
Output tensor.
Raises
• ValueError: If data_format is neither "channels_last" nor "channels_first".

depthwise_conv2d
keras.backend.depthwise_conv2d(x, depthwise_kernel, strides=(1, 1),
padding='valid', data_format=None, dilation_rate=(1, 1))

2D depthwise convolution: each input channel is convolved with its own filter.


Arguments
• x: input tensor
• depthwise_kernel: convolution kernel for the depthwise convolution.
• strides: strides tuple (length 2).
• padding: string, "same" or "valid".
• data_format: string, "channels_last" or "channels_first".
• dilation_rate: tuple of integers, dilation rates for the separable convolution.
Returns
Output tensor.
Raises
• ValueError: If data_format is neither "channels_last" nor "channels_first".

conv3d
keras.backend.conv3d(x, kernel, strides=(1, 1, 1), padding='valid',
data_format=None, dilation_rate=(1, 1, 1))
3D convolution.
Arguments
• x: Tensor or variable.
• kernel: kernel tensor.
• strides: strides tuple.
• padding: string, "same" or "valid".
• data_format: string, "channels_last" or "channels_first". Whether to use Theano
or TensorFlow/CNTK data format for inputs/kernels/outputs.
• dilation_rate: tuple of 3 integers.
Returns
A tensor, result of 3D convolution.
Raises
• ValueError: If data_format is neither "channels_last" nor "channels_first".

conv3d_transpose
keras.backend.conv3d_transpose(x, kernel, output_shape, strides=(1, 1, 1),
padding='valid', data_format=None)

3D deconvolution (i.e. transposed convolution).


Arguments
• x: input tensor.
• kernel: kernel tensor.
• output_shape: 1D int tensor for the output shape.
• strides: strides tuple.
• padding: string, "same" or "valid".
• data_format: string, "channels_last" or "channels_first". Whether to use Theano
or TensorFlow/CNTK data format for inputs/kernels/outputs.
Returns
A tensor, result of transposed 3D convolution.
Raises
• ValueError: If data_format is neither "channels_last" nor "channels_first".
pool2d
keras.backend.pool2d(x, pool_size, strides=(1, 1), padding='valid',
data_format=None, pool_mode='max')

2D Pooling.
Arguments
• x: Tensor or variable.
• pool_size: tuple of 2 integers.
• strides: tuple of 2 integers.
• padding: string, "same" or "valid".
• data_format: string, "channels_last" or "channels_first".
• pool_mode: string, "max" or "avg".

Returns
A tensor, result of 2D pooling.
Raises
• ValueError: if data_format is neither "channels_last" nor "channels_first".
• ValueError: if pool_mode is neither "max" nor "avg".

pool3d
keras.backend.pool3d(x, pool_size, strides=(1, 1, 1), padding='valid',
data_format=None, pool_mode='max')

3D Pooling.
Arguments
• x: Tensor or variable.
• pool_size: tuple of 3 integers.
• strides: tuple of 3 integers.
• padding: string, "same" or "valid".
• data_format: string, "channels_last" or "channels_first".
• pool_mode: string, "max" or "avg".

Returns
A tensor, result of 3D pooling.
Raises
• ValueError: if data_format is neither "channels_last" nor "channels_first".
• ValueError: if pool_mode is neither "max" nor "avg".

local_conv1d
keras.backend.local_conv1d(inputs, kernel, kernel_size, strides, data_format=None)

Apply 1D conv with un-shared weights.


Arguments
• inputs: 3D tensor with shape: (batch_size, steps, input_dim)
• kernel: the unshared weight for convolution, with shape (output_length, feature_dim, filters)
• kernel_size: a tuple of a single integer, specifying the length of the 1D convolution window
• strides: a tuple of a single integer, specifying the stride length of the convolution
• data_format: the data format, channels_first or channels_last
Returns
the tensor after 1d conv with un-shared weights, with shape (batch_size, output_length, filters)
Raises
• ValueError: If data_format is neither "channels_last" nor "channels_first".

local_conv2d
keras.backend.local_conv2d(inputs, kernel, kernel_size, strides, output_shape,
data_format=None)

Apply 2D conv with un-shared weights.


Arguments
• inputs: 4D tensor with shape: (batch_size, filters, new_rows, new_cols) if
data_format='channels_first' or 4D tensor with shape: (batch_size, new_rows, new_cols, filters)
if data_format='channels_last'.
• kernel: the unshared weight for convolution, with shape (output_items, feature_dim, filters)
• kernel_size: a tuple of 2 integers, specifying the width and height of the 2D convolution
window.
• strides: a tuple of 2 integers, specifying the strides of the convolution along the width and
height.
• output_shape: a tuple with (output_row, output_col)
• data_format: the data format, channels_first or channels_last
Returns
A 4d tensor with shape: (batch_size, filters, new_rows, new_cols) if data_format='channels_first' or 4D
tensor with shape: (batch_size, new_rows, new_cols, filters) if data_format='channels_last'.
Raises
• ValueError: if data_format is neither channels_last nor channels_first.

bias_add
keras.backend.bias_add(x, bias, data_format=None)

Adds a bias vector to a tensor.


Arguments
• x: Tensor or variable.
• bias: Bias tensor to add.
• data_format: string, "channels_last" or "channels_first".

Returns
Output tensor.
Raises
ValueError: In one of the two cases below:
1. invalid data_format argument.
2. invalid bias shape: the bias should be either a vector or a tensor with ndim(x) - 1 dimensions.
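The NumPy reference implementation was lost in extraction. A sketch covering the common
1D-bias case (an illustration, not the library's code):
import numpy as np

def bias_add(x, bias, data_format='channels_last'):
    # Reshape the bias so it broadcasts along the channels axis,
    # which differs between the two data formats.
    if data_format == 'channels_first':
        shape = (1, bias.size) + (1,) * (x.ndim - 2)  # (1, C, 1, ...)
        return x + bias.reshape(shape)
    return x + bias  # channels_last: bias broadcasts over the last axis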

random_normal
keras.backend.random_normal(shape, mean=0.0, stddev=1.0, dtype=None, seed=None)

Returns a tensor with normal distribution of values.


Arguments
• shape: A tuple of integers, the shape of tensor to create.
• mean: A float, mean of the normal distribution to draw samples.
• stddev: A float, standard deviation of the normal distribution to draw samples.
• dtype: String, dtype of returned tensor.
• seed: Integer, random seed.
Returns
A tensor.

random_uniform
keras.backend.random_uniform(shape, minval=0.0, maxval=1.0, dtype=None, seed=None)

Returns a tensor with uniform distribution of values.


Arguments
• shape: A tuple of integers, the shape of tensor to create.
• minval: A float, lower boundary of the uniform distribution to draw samples.
• maxval: A float, upper boundary of the uniform distribution to draw samples.
• dtype: String, dtype of returned tensor.
• seed: Integer, random seed.
Returns
A tensor.

random_binomial
keras.backend.random_binomial(shape, p=0.0, dtype=None, seed=None)

Returns a tensor with random binomial distribution of values.


Arguments
• shape: A tuple of integers, the shape of tensor to create.
• p: A float, 0. <= p <= 1, probability of binomial distribution.
• dtype: String, dtype of returned tensor.
• seed: Integer, random seed.
Returns
A tensor.

truncated_normal
keras.backend.truncated_normal(shape, mean=0.0, stddev=1.0, dtype=None, seed=None)

Returns a tensor with truncated random normal distribution of values.


The generated values follow a normal distribution with specified mean and standard deviation, except
that values whose magnitude is more than two standard deviations from the mean are dropped and re-
picked.
Arguments
• shape: A tuple of integers, the shape of tensor to create.
• mean: Mean of the values.
• stddev: Standard deviation of the values.
• dtype: String, dtype of returned tensor.
• seed: Integer, random seed.
Returns
A tensor.

ctc_label_dense_to_sparse
keras.backend.ctc_label_dense_to_sparse(labels, label_lengths)

Converts CTC labels from dense to sparse.


Arguments
• labels: dense CTC labels.
• label_lengths: length of the labels.
Returns
A sparse tensor representation of the labels.

ctc_batch_cost
keras.backend.ctc_batch_cost(y_true, y_pred, input_length, label_length)

Runs CTC loss algorithm on each batch element.


Arguments
• y_true: tensor (samples, max_string_length) containing the truth labels.
• y_pred: tensor (samples, time_steps, num_categories) containing the
prediction, or output of the softmax.
• input_length: tensor (samples, 1) containing the sequence length for each batch item in
y_pred.
• label_length: tensor (samples, 1) containing the sequence length for each batch item in
y_true.

Returns
Tensor with shape (samples,1) containing the CTC loss of each element.
ctc_decode
keras.backend.ctc_decode(y_pred, input_length, greedy=True, beam_width=100,
top_paths=1, merge_repeated=False)

Decodes the output of a softmax.


Can use either greedy search (also known as best path) or a constrained dictionary search.
Arguments
• y_pred: tensor (samples, time_steps, num_categories) containing the
prediction, or output of the softmax.
• input_length: tensor (samples, ) containing the sequence length for each batch item in
y_pred.
• greedy: perform much faster best-path search if True. This does not use a dictionary.
• beam_width: if greedy is False: a beam search decoder will be used with a beam of this
width.
• top_paths: if greedy is False, how many of the most probable paths will be returned.
• merge_repeated: if greedy is False, merge repeated classes in the output beams.

Returns
A tuple:
• List: if greedy is True, returns a list of one element that contains the decoded
sequence. If False, returns the top_paths most probable decoded sequences. Important:
blank labels are returned as -1.
• Tensor (top_paths, ) that contains the log probability of each decoded sequence.

control_dependencies
keras.backend.control_dependencies(control_inputs)

A context manager that specifies control dependencies.


Arguments
• control_inputs: A list of Operation or Tensor objects which must be executed or computed
before running the operations defined in the context. Can also be None to clear the control
dependencies.
Returns
A context manager.
map_fn
keras.backend.map_fn(fn, elems, name=None, dtype=None)

Map the function fn over the elements elems and return the outputs.
Arguments
• fn: Callable that will be called upon each element in elems
• elems: tensor
• name: A string name for the map node in the graph
• dtype: Output data type.
Returns
Tensor with dtype dtype.

foldl
keras.backend.foldl(fn, elems, initializer=None, name=None)

Reduce elems using fn to combine them from left to right.


Arguments
• fn: Callable that will be called upon each element in elems and an accumulator, for instance
lambda acc, x: acc + x
• elems: tensor
• initializer: The first value used (elems[0] in case of None)
• name: A string name for the foldl node in the graph
Returns
Tensor with same type and shape as initializer.

foldr
keras.backend.foldr(fn, elems, initializer=None, name=None)

Reduce elems using fn to combine them from right to left.


Arguments
• fn: Callable that will be called upon each element in elems and an accumulator, for instance
lambda acc, x: acc + x
• elems: tensor
• initializer: The first value used (elems[-1] in case of None)
• name: A string name for the foldr node in the graph
Returns
Tensor with same type and shape as initializer.
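A small usage sketch for foldl, computing a running sum (foldr is symmetric; output value is
illustrative):
>>> import numpy as np
>>> from keras import backend as K
>>> elems = K.variable(np.array([1., 2., 3., 4.]))
>>> K.eval(K.foldl(lambda acc, x: acc + x, elems))
10.0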

Usage of initializers
Initializations define the way to set the initial random weights of Keras layers.
The keyword arguments used for passing initializers to layers will depend on the layer. Usually it is
simply kernel_initializer and bias_initializer:
model.add(Dense(64,
                kernel_initializer='random_uniform',
                bias_initializer='zeros'))

Available initializers
The following built-in initializers are available as part of the keras.initializers module:

Initializer
keras.initializers.Initializer()

Initializer base class: all initializers inherit from this class.

Zeros
keras.initializers.Zeros()

Initializer that generates tensors initialized to 0.

Ones
keras.initializers.Ones()

Initializer that generates tensors initialized to 1.

Constant
keras.initializers.Constant(value=0)

Initializer that generates tensors initialized to a constant value.


Arguments
• value: float; the value of the generator tensors.

RandomNormal
keras.initializers.RandomNormal(mean=0.0, stddev=0.05, seed=None)

Initializer that generates tensors with a normal distribution.


Arguments
• mean: a python scalar or a scalar tensor. Mean of the random values to generate.
• stddev: a python scalar or a scalar tensor. Standard deviation of the random values to generate.
• seed: A Python integer. Used to seed the random generator.

RandomUniform
keras.initializers.RandomUniform(minval=-0.05, maxval=0.05, seed=None)

Initializer that generates tensors with a uniform distribution.


Arguments
• minval: A python scalar or a scalar tensor. Lower bound of the range of random values to
generate.
• maxval: A python scalar or a scalar tensor. Upper bound of the range of random values to
generate. Defaults to 1 for float types.
• seed: A Python integer. Used to seed the random generator.

TruncatedNormal
keras.initializers.TruncatedNormal(mean=0.0, stddev=0.05, seed=None)

Initializer that generates a truncated normal distribution.


These values are similar to values from a RandomNormal except that values more than two standard
deviations from the mean are discarded and redrawn. This is the recommended initializer for neural
network weights and filters.
Arguments
• mean: a python scalar or a scalar tensor. Mean of the random values to generate.
• stddev: a python scalar or a scalar tensor. Standard deviation of the random values to generate.
• seed: A Python integer. Used to seed the random generator.
VarianceScaling
keras.initializers.VarianceScaling(scale=1.0, mode='fan_in', distribution='normal',
seed=None)

Initializer capable of adapting its scale to the shape of weights.


With distribution="normal", samples are drawn from a truncated normal distribution centered
on zero, with stddev = sqrt(scale / n) where n is:

• number of input units in the weight tensor, if mode = "fan_in"


• number of output units, if mode = "fan_out"
• average of the numbers of input and output units, if mode = "fan_avg"
With distribution="uniform", samples are drawn from a uniform distribution within [-limit,
limit], with limit = sqrt(3 * scale / n).

Arguments
• scale: Scaling factor (positive float).
• mode: One of "fan_in", "fan_out", "fan_avg".
• distribution: Random distribution to use. One of "normal", "uniform".
• seed: A Python integer. Used to seed the random generator.
Raises
• ValueError: In case of an invalid value for the "scale", "mode" or "distribution" arguments.
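A NumPy sketch of the draw described above for a 2D weight matrix (illustrative only; the real
"normal" mode also truncates at two standard deviations):
import numpy as np

def variance_scaling(shape, scale=1.0, mode='fan_in', distribution='normal'):
    # For a 2D kernel, fan_in/fan_out are simply the two dimensions.
    fan_in, fan_out = shape[0], shape[1]
    n = {'fan_in': fan_in, 'fan_out': fan_out,
         'fan_avg': (fan_in + fan_out) / 2.}[mode]
    if distribution == 'normal':
        return np.random.normal(0., np.sqrt(scale / n), size=shape)
    limit = np.sqrt(3. * scale / n)
    return np.random.uniform(-limit, limit, size=shape)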

Orthogonal
keras.initializers.Orthogonal(gain=1.0, seed=None)

Initializer that generates a random orthogonal matrix.


Arguments
• gain: Multiplicative factor to apply to the orthogonal matrix.
• seed: A Python integer. Used to seed the random generator.
References
• Exact solutions to the nonlinear dynamics of learning in deep linear neural networks

Identity
keras.initializers.Identity(gain=1.0)
Initializer that generates the identity matrix.
Only use for 2D matrices. If the desired matrix is not square, it gets padded with zeros for the
additional rows/columns.
Arguments
• gain: Multiplicative factor to apply to the identity matrix.

lecun_uniform
keras.initializers.lecun_uniform(seed=None)

LeCun uniform initializer.


It draws samples from a uniform distribution within [-limit, limit] where limit is sqrt(3 /
fan_in) where fan_in is the number of input units in the weight tensor.

Arguments
• seed: A Python integer. Used to seed the random generator.
Returns
An initializer.
References
• Efficient BackProp

glorot_normal
keras.initializers.glorot_normal(seed=None)

Glorot normal initializer, also called Xavier normal initializer.


It draws samples from a truncated normal distribution centered on 0 with stddev = sqrt(2 /
(fan_in + fan_out)) where fan_in is the number of input units in the weight tensor and
fan_out is the number of output units in the weight tensor.

Arguments
• seed: A Python integer. Used to seed the random generator.
Returns
An initializer.
References
• Understanding the difficulty of training deep feedforward neural networks

glorot_uniform
keras.initializers.glorot_uniform(seed=None)

Glorot uniform initializer, also called Xavier uniform initializer.


It draws samples from a uniform distribution within [-limit, limit] where limit is sqrt(6 /
(fan_in + fan_out)) where fan_in is the number of input units in the weight tensor and
fan_out is the number of output units in the weight tensor.

Arguments
• seed: A Python integer. Used to seed the random generator.
Returns
An initializer.
References
• Understanding the difficulty of training deep feedforward neural networks

he_normal
keras.initializers.he_normal(seed=None)

He normal initializer.
It draws samples from a truncated normal distribution centered on 0 with stddev = sqrt(2 /
fan_in) where fan_in is the number of input units in the weight tensor.

Arguments
• seed: A Python integer. Used to seed the random generator.
Returns
An initializer.
References
• Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet
Classification
lecun_normal
keras.initializers.lecun_normal(seed=None)

LeCun normal initializer.


It draws samples from a truncated normal distribution centered on 0 with stddev = sqrt(1 /
fan_in) where fan_in is the number of input units in the weight tensor.

Arguments
• seed: A Python integer. Used to seed the random generator.
Returns
An initializer.
References
• Self-Normalizing Neural Networks
• Efficient Backprop

he_uniform
keras.initializers.he_uniform(seed=None)

He uniform variance scaling initializer.


It draws samples from a uniform distribution within [-limit, limit] where limit is sqrt(6 /
fan_in) where fan_in is the number of input units in the weight tensor.

Arguments
• seed: A Python integer. Used to seed the random generator.
Returns
An initializer.
References
• Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet
Classification
An initializer may be passed as a string (must match one of the available initializers above), or as a
callable:
from keras import initializers

model.add(Dense(64, kernel_initializer=initializers.random_normal(stddev=0.01)))

# also works; will use the default parameters.
model.add(Dense(64, kernel_initializer='random_normal'))

Using custom initializers


If you pass a custom callable, it must take the arguments shape (the shape of the variable to initialize)
and dtype (the dtype of the generated values):
from keras import backend as K

def my_init(shape, dtype=None):


return K.random_normal(shape, dtype=dtype)

model.add(Dense(64, kernel_initializer=my_init))

Usage of regularizers
Regularizers allow you to apply penalties on layer parameters or layer activity during optimization. These
penalties are incorporated into the loss function that the network optimizes.
The penalties are applied on a per-layer basis. The exact API will depend on the layer, but the layers
Dense, Conv1D, Conv2D and Conv3D have a unified API.

These layers expose 3 keyword arguments:


• kernel_regularizer: instance of keras.regularizers.Regularizer
• bias_regularizer: instance of keras.regularizers.Regularizer
• activity_regularizer: instance of keras.regularizers.Regularizer

Example
from keras import regularizers
model.add(Dense(64, input_dim=64,
kernel_regularizer=regularizers.l2(0.01),
activity_regularizer=regularizers.l1(0.01)))

Available penalties
keras.regularizers.l1(0.)
keras.regularizers.l2(0.)
keras.regularizers.l1_l2(l1=0.01, l2=0.01)

Developing new regularizers


Any function that takes in a weight matrix and returns a loss contribution tensor can be used as a
regularizer, e.g.:
from keras import backend as K

def l1_reg(weight_matrix):
return 0.01 * K.sum(K.abs(weight_matrix))

model.add(Dense(64, input_dim=64,
kernel_regularizer=l1_reg))

Alternatively, you can write your regularizers in an object-oriented way; see the keras/regularizers.py
module for examples.
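
For illustration, here is a minimal sketch of what such an object-oriented regularizer might look like (the class name and get_config details are illustrative, not the exact contents of keras/regularizers.py):
from keras import backend as K

class MyL1Regularizer(object):
    """Object-oriented L1 regularizer (illustrative sketch)."""

    def __init__(self, l1=0.01):
        self.l1 = l1

    def __call__(self, weight_matrix):
        # Return the loss contribution for this weight matrix.
        return self.l1 * K.sum(K.abs(weight_matrix))

    def get_config(self):
        # Allows the regularizer to be serialized with the model.
        return {'l1': self.l1}

model.add(Dense(64, input_dim=64, kernel_regularizer=MyL1Regularizer(0.01)))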

Usage of constraints
Functions from the constraints module allow setting constraints (e.g. non-negativity) on network
parameters during optimization.
The constraints are applied on a per-layer basis. The exact API will depend on the layer, but the layers
Dense, Conv1D, Conv2D and Conv3D have a unified API.

These layers expose 2 keyword arguments:


• kernel_constraint for the main weights matrix
• bias_constraint for the bias.
from keras.constraints import max_norm
model.add(Dense(64, kernel_constraint=max_norm(2.)))

Available constraints
MaxNorm
keras.constraints.MaxNorm(max_value=2, axis=0)

MaxNorm weight constraint.


Constrains the weights incident to each hidden unit to have a norm less than or equal to a desired value.
Arguments
• max_value: the maximum norm for the incoming weights.
• axis: integer, axis along which to calculate weight norms. For instance, in a Dense layer the
weight matrix has shape (input_dim, output_dim), set axis to 0 to constrain each
weight vector of length (input_dim,). In a Conv2D layer with
data_format="channels_last", the weight tensor has shape (rows, cols,
input_depth, output_depth), set axis to [0, 1, 2] to constrain the weights of
each filter tensor of size (rows, cols, input_depth).

References
• Dropout: A Simple Way to Prevent Neural Networks from Overfitting

NonNeg
keras.constraints.NonNeg()

Constrains the weights to be non-negative.

UnitNorm
keras.constraints.UnitNorm(axis=0)

Constrains the weights incident to each hidden unit to have unit norm.
Arguments
• axis: integer, axis along which to calculate weight norms. For instance, in a Dense layer the
weight matrix has shape (input_dim, output_dim), set axis to 0 to constrain each
weight vector of length (input_dim,). In a Conv2D layer with
data_format="channels_last", the weight tensor has shape (rows, cols,
input_depth, output_depth), set axis to [0, 1, 2] to constrain the weights of
each filter tensor of size (rows, cols, input_depth).

MinMaxNorm
keras.constraints.MinMaxNorm(min_value=0.0, max_value=1.0, rate=1.0, axis=0)

MinMaxNorm weight constraint.


Constrains the weights incident to each hidden unit to have the norm between a lower bound and an
upper bound.
Arguments
• min_value: the minimum norm for the incoming weights.
• max_value: the maximum norm for the incoming weights.
• rate: rate for enforcing the constraint: weights will be rescaled to yield (1 - rate) *
norm + rate * norm.clip(min_value, max_value). Effectively, this means that
rate=1.0 stands for strict enforcement of the constraint, while rate<1.0 means that weights will
be rescaled at each step to slowly move towards a value inside the desired interval (see the usage
sketch after this list).
• axis: integer, axis along which to calculate weight norms. For instance, in a Dense layer the
weight matrix has shape (input_dim, output_dim), set axis to 0 to constrain each
weight vector of length (input_dim,). In a Conv2D layer with
data_format="channels_last", the weight tensor has shape (rows, cols,
input_depth, output_depth), set axis to [0, 1, 2] to constrain the weights of
each filter tensor of size (rows, cols, input_depth).
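
As a usage sketch (the values are illustrative and assume a model under construction, as in the earlier max_norm example), MinMaxNorm is attached to a layer just like any other constraint:
from keras.constraints import MinMaxNorm

# Keep each incoming weight vector's norm within [0.5, 2.0]; with rate=0.5,
# out-of-range norms are moved halfway toward the interval at each update.
model.add(Dense(64, kernel_constraint=MinMaxNorm(min_value=0.5,
                                                 max_value=2.0,
                                                 rate=0.5)))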

Model visualization
Keras provides utility functions to plot a Keras model (using graphviz).

This will plot a graph of the model and save it to a file:


from keras.utils import plot_model
plot_model(model, to_file='model.png')

plot_model takes four optional arguments:

• show_shapes (defaults to False) controls whether output shapes are shown in the graph.
• show_layer_names (defaults to True) controls whether layer names are shown in the graph.
• expand_nested (defaults to False) controls whether to expand nested models into clusters in
the graph.
• dpi (defaults to 96) controls image dpi.

You can also directly obtain the pydot.Graph object and render it yourself, for example to show it in
an IPython notebook:
from IPython.display import SVG
from keras.utils import model_to_dot

SVG(model_to_dot(model).create(prog='dot', format='svg'))

Training history visualization


The fit() method on a Keras Model returns a History object. The History.history
attribute is a dictionary recording training loss values and metrics values at successive epochs, as well
as validation loss values and validation metrics values (if applicable). Here is a simple example using
matplotlib to generate loss & accuracy plots for training & validation:
import matplotlib.pyplot as plt

history = model.fit(x, y, validation_split=0.25, epochs=50, batch_size=16,


verbose=1)
# Plot training & validation accuracy values
plt.plot(history.history['acc'])
plt.plot(history.history['val_acc'])
plt.title('Model accuracy')
plt.ylabel('Accuracy')
plt.xlabel('Epoch')
plt.legend(['Train', 'Test'], loc='upper left')
plt.show()

# Plot training & validation loss values


plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('Model loss')
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.legend(['Train', 'Test'], loc='upper left')
plt.show()

Wrappers for the Scikit-Learn API


You can use Sequential Keras models (single-input only) as part of your Scikit-Learn workflow via
the wrappers found at keras.wrappers.scikit_learn.py.

There are two wrappers available:


• keras.wrappers.scikit_learn.KerasClassifier(build_fn=None,
**sk_params), which implements the Scikit-Learn classifier interface,
• keras.wrappers.scikit_learn.KerasRegressor(build_fn=None,
**sk_params), which implements the Scikit-Learn regressor interface.

Arguments
• build_fn: callable function or class instance
• sk_params: model parameters & fitting parameters
build_fn should construct, compile and return a Keras model, which will then be used to fit/predict.
One of the following three values could be passed to build_fn:

1. A function
2. An instance of a class that implements the __call__ method
3. None. This means you implement a class that inherits from either KerasClassifier or
KerasRegressor. The __call__ method of the present class will then be treated as the
default build_fn.
sk_params takes both model parameters and fitting parameters. Legal model parameters are the
arguments of build_fn. Note that like all other estimators in scikit-learn, build_fn should provide
default values for its arguments, so that you could create the estimator without passing any values to
sk_params.

sk_params could also accept parameters for calling the fit, predict, predict_proba, and
score methods (e.g., epochs, batch_size). Fitting (predicting) parameters are selected in the
following order:
1. Values passed to the dictionary arguments of fit, predict, predict_proba, and score
methods
2. Values passed to sk_params
3. The default values of the keras.models.Sequential fit, predict,
predict_proba and score methods

When using scikit-learn's grid_search API, legal tunable parameters are those you could pass to
sk_params, including fitting parameters. In other words, you could use grid_search to search for
the best batch_size or epochs as well as the model parameters.
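
As a sketch of this workflow (the build function, its hidden_units parameter, and the random data are illustrative assumptions), a KerasClassifier can be passed straight to scikit-learn's GridSearchCV, tuning model and fitting parameters together:
import numpy as np
from keras.models import Sequential
from keras.layers import Dense
from keras.wrappers.scikit_learn import KerasClassifier
from sklearn.model_selection import GridSearchCV

def build_model(hidden_units=64):
    # build_fn: constructs, compiles and returns a Keras model.
    # Note the default value, so the estimator works without sk_params.
    model = Sequential()
    model.add(Dense(hidden_units, activation='relu', input_dim=20))
    model.add(Dense(1, activation='sigmoid'))
    model.compile(optimizer='rmsprop', loss='binary_crossentropy',
                  metrics=['accuracy'])
    return model

x = np.random.random((100, 20))
y = np.random.randint(2, size=(100,))

clf = KerasClassifier(build_fn=build_model, epochs=5, batch_size=16, verbose=0)
# Both a model parameter (hidden_units) and a fitting parameter (batch_size)
# are legal tunable parameters here.
grid = GridSearchCV(clf, param_grid={'hidden_units': [32, 64],
                                     'batch_size': [16, 32]},
                    cv=3)
grid.fit(x, y)
print(grid.best_params_)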

CustomObjectScope
keras.utils.CustomObjectScope()

Provides a scope that changes to _GLOBAL_CUSTOM_OBJECTS cannot escape.

Code within a with statement will be able to access custom objects by name. Changes to global
custom objects persist within the enclosing with statement. At the end of the with statement, global
custom objects are reverted to their state at the beginning of the with statement.

Example
Consider a custom object MyObject (e.g. a class):
with CustomObjectScope({'MyObject':MyObject}):
layer = Dense(..., kernel_regularizer='MyObject')
# save, load, etc. will recognize custom object by name

HDF5Matrix
keras.utils.HDF5Matrix(datapath, dataset, start=0, end=None, normalizer=None)

Representation of HDF5 dataset to be used instead of a Numpy array.


Example
x_data = HDF5Matrix('input/file.hdf5', 'data')
model.predict(x_data)

Providing start and end allows use of a slice of the dataset.

Optionally, a normalizer function (or lambda) can be given. This will be called on every slice of data
retrieved.
Arguments
• datapath: string, path to a HDF5 file
• dataset: string, name of the HDF5 dataset in the file specified in datapath
• start: int, start of desired slice of the specified dataset
• end: int, end of desired slice of the specified dataset
• normalizer: function to be called on data when retrieved
Returns
An array-like HDF5 dataset.
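
For instance (the file and dataset names are hypothetical), start, end, and normalizer can be combined to train on a rescaled slice; note that when fitting from an HDF5Matrix, shuffle should be set to 'batch' or False, since HDF5 requires indices in increasing order:
x_train = HDF5Matrix('input/file.hdf5', 'data',
                     start=0, end=1000,
                     normalizer=lambda batch: batch / 255.0)
model.fit(x_train, y_train, shuffle='batch')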

Sequence
keras.utils.Sequence()

Base object for fitting to a sequence of data, such as a dataset.


Every Sequence must implement the __getitem__ and the __len__ methods. If you want to
modify your dataset between epochs you may implement on_epoch_end. The method
__getitem__ should return a complete batch.

Notes
Sequence is a safer way to do multiprocessing. This structure guarantees that the network will only
train once on each sample per epoch, which is not the case with generators.
Examples
from skimage.io import imread
from skimage.transform import resize
import numpy as np

# Here, `x_set` is list of path to the images


# and `y_set` are the associated classes.

class CIFAR10Sequence(Sequence):

def __init__(self, x_set, y_set, batch_size):


self.x, self.y = x_set, y_set
self.batch_size = batch_size

def __len__(self):
return int(np.ceil(len(self.x) / float(self.batch_size)))

def __getitem__(self, idx):


batch_x = self.x[idx * self.batch_size:(idx + 1) * self.batch_size]
batch_y = self.y[idx * self.batch_size:(idx + 1) * self.batch_size]

return np.array([
resize(imread(file_name), (200, 200))
for file_name in batch_x]), np.array(batch_y)
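
A Sequence like the one above can then be passed directly to fit_generator; as a sketch (assuming the model and x_set/y_set are already defined), multiprocessing can be enabled safely:
seq = CIFAR10Sequence(x_set, y_set, batch_size=32)
model.fit_generator(seq, epochs=10, use_multiprocessing=True, workers=4)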

to_categorical
keras.utils.to_categorical(y, num_classes=None, dtype='float32')

Converts a class vector (integers) to binary class matrix.


E.g. for use with categorical_crossentropy.
Arguments
• y: class vector to be converted into a matrix (integers from 0 to num_classes).
• num_classes: total number of classes.
• dtype: The data type expected by the input, as a string (float32, float64, int32...)

Returns
A binary matrix representation of the input. The classes axis is placed last.
Example
# Consider an array of 5 labels out of a set of 3 classes {0, 1, 2}:
> labels
array([0, 2, 1, 2, 0])
# `to_categorical` converts this into a matrix with as many
# columns as there are classes. The number of rows
# stays the same.
> to_categorical(labels)
array([[ 1., 0., 0.],
[ 0., 0., 1.],
[ 0., 1., 0.],
[ 0., 0., 1.],
[ 1., 0., 0.]], dtype=float32)

normalize
keras.utils.normalize(x, axis=-1, order=2)

Normalizes a Numpy array.


Arguments
• x: Numpy array to normalize.
• axis: axis along which to normalize.
• order: Normalization order (e.g. 2 for L2 norm).
Returns
A normalized copy of the array.
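
For example, L2-normalizing a single row vector:
import numpy as np
from keras.utils import normalize

x = np.array([[3.0, 4.0]])
print(normalize(x, axis=-1, order=2))  # [[0.6, 0.8]], since ||(3, 4)|| = 5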

get_file
keras.utils.get_file(fname, origin, untar=False, md5_hash=None, file_hash=None,
cache_subdir='datasets', hash_algorithm='auto', extract=False,
archive_format='auto', cache_dir=None)

Downloads a file from a URL if it is not already in the cache.


By default the file at the url origin is downloaded to the cache_dir ~/.keras, placed in the
cache_subdir datasets, and given the filename fname. The final location of a file example.txt
would therefore be ~/.keras/datasets/example.txt.

Files in tar, tar.gz, tar.bz, and zip formats can also be extracted. Passing a hash will verify the file after
download. The command line programs shasum and sha256sum can compute the hash.

Arguments
• fname: Name of the file. If an absolute path /path/to/file.txt is specified the file will
be saved at that location.
• origin: Original URL of the file.
• untar: Deprecated in favor of 'extract'. boolean, whether the file should be decompressed
• md5_hash: Deprecated in favor of 'file_hash'. md5 hash of the file for verification
• file_hash: The expected hash string of the file after download. The sha256 and md5 hash
algorithms are both supported.
• cache_subdir: Subdirectory under the Keras cache dir where the file is saved. If an absolute
path /path/to/folder is specified the file will be saved at that location.
• hash_algorithm: Select the hash algorithm to verify the file. options are 'md5', 'sha256', and
'auto'. The default 'auto' detects the hash algorithm in use.
• extract: True tries extracting the file as an Archive, like tar or zip.
• archive_format: Archive format to try for extracting the file. Options are 'auto', 'tar', 'zip', and
None. 'tar' includes tar, tar.gz, and tar.bz files. The default 'auto' is ['tar', 'zip']. None or an empty
list will return no matches found.
• cache_dir: Location to store cached files; when None it defaults to the Keras cache directory (~/.keras).
Returns
Path to the downloaded file
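As a sketch (the URL shown is the archive used by keras.datasets.mnist; any URL works), a single call downloads the file, caches it, and returns the local path:
from keras.utils import get_file

path = get_file('mnist.npz',
                origin='https://s3.amazonaws.com/img-datasets/mnist.npz')
print(path)  # e.g. ~/.keras/datasets/mnist.npz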
print_summary
keras.utils.print_summary(model, line_length=None, positions=None, print_fn=None)

Prints a summary of a model.


Arguments
• model: Keras model instance.
• line_length: Total length of printed lines (e.g. set this to adapt the display to different terminal
window sizes).
• positions: Relative or absolute positions of log elements in each line. If not provided, defaults
to [.33, .55, .67, 1.].
• print_fn: Print function to use. It will be called on each line of the summary. You can set it to a
custom function in order to capture the string summary. It defaults to print (prints to stdout).
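
For example, to capture the summary as a string rather than printing it (assuming model is an existing Keras model):
lines = []
keras.utils.print_summary(model, print_fn=lines.append)
summary = '\n'.join(lines)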

plot_model
keras.utils.plot_model(model, to_file='model.png', show_shapes=False,
show_layer_names=True, rankdir='TB', expand_nested=False, dpi=96)

Converts a Keras model to dot format and saves it to a file.


Arguments
• model: A Keras model instance
• to_file: File name of the plot image.
• show_shapes: whether to display shape information.
• show_layer_names: whether to display layer names.
• rankdir: rankdir argument passed to PyDot, a string specifying the format of the plot: 'TB'
creates a vertical plot; 'LR' creates a horizontal plot.
• expand_nested: whether to expand nested models into clusters.
• dpi: dot DPI.
Returns
A Jupyter notebook Image object if Jupyter is installed. This enables in-line display of the model plots
in notebooks.

multi_gpu_model
keras.utils.multi_gpu_model(model, gpus=None, cpu_merge=True, cpu_relocation=False)

Replicates a model on different GPUs.


Specifically, this function implements single-machine multi-GPU data parallelism. It works in the
following way:
• Divide the model's input(s) into multiple sub-batches.
• Apply a model copy on each sub-batch. Every model copy is executed on a dedicated GPU.
• Concatenate the results (on CPU) into one big batch.
E.g. if your batch_size is 64 and you use gpus=2, then we will divide the input into 2 sub-batches
of 32 samples, process each sub-batch on one GPU, then return the full batch of 64 processed samples.
This induces quasi-linear speedup on up to 8 GPUs.
This function is only available with the TensorFlow backend for the time being.
Arguments
• model: A Keras model instance. To avoid OOM errors, this model could have been built on
CPU, for instance (see usage example below).
• gpus: Integer >= 2 or list of integers, number of GPUs or list of GPU IDs on which to create
model replicas.
• cpu_merge: A boolean value to identify whether to force merging model weights under the
scope of the CPU or not.
• cpu_relocation: A boolean value to identify whether to create the model's weights under the
scope of the CPU. If the model is not defined under any preceding device scope, you can still
rescue it by activating this option.
Returns
A Keras Model instance which can be used just like the initial model argument, but which distributes
its workload on multiple GPUs.
Examples
Example 1 - Training models with weights merge on CPU
import tensorflow as tf
from keras.applications import Xception
from keras.utils import multi_gpu_model
import numpy as np

num_samples = 1000
height = 224
width = 224
num_classes = 1000

# Instantiate the base model (or "template" model).


# We recommend doing this under a CPU device scope,
# so that the model's weights are hosted on CPU memory.
# Otherwise they may end up hosted on a GPU, which would
# complicate weight sharing.
with tf.device('/cpu:0'):
model = Xception(weights=None,
input_shape=(height, width, 3),
classes=num_classes)

# Replicates the model on 8 GPUs.


# This assumes that your machine has 8 available GPUs.
parallel_model = multi_gpu_model(model, gpus=8)
parallel_model.compile(loss='categorical_crossentropy',
optimizer='rmsprop')

# Generate dummy data.


x = np.random.random((num_samples, height, width, 3))
y = np.random.random((num_samples, num_classes))

# This `fit` call will be distributed on 8 GPUs.


# Since the batch size is 256, each GPU will process 32 samples.
parallel_model.fit(x, y, epochs=20, batch_size=256)

# Save model via the template model (which shares the same weights):
model.save('my_model.h5')

Example 2 - Training models with weights merge on CPU using cpu_relocation


..
# Not needed to change the device scope for model definition:
model = Xception(weights=None, ..)

try:
parallel_model = multi_gpu_model(model, cpu_relocation=True)
print("Training using multiple GPUs..")
except ValueError:
parallel_model = model
print("Training using single GPU or CPU..")
parallel_model.compile(..)
..

Example 3 - Training models with weights merge on GPU (recommended for NV-link)
..
# Not needed to change the device scope for model definition:
model = Xception(weights=None, ..)

try:
parallel_model = multi_gpu_model(model, cpu_merge=False)
print("Training using multiple GPUs..")
except ValueError:
parallel_model = model
print("Training using single GPU or CPU..")

parallel_model.compile(..)
..

On model saving
To save the multi-gpu model, use .save(fname) or .save_weights(fname) with the template
model (the argument you passed to multi_gpu_model), rather than the model returned by
multi_gpu_model.
On Github Issues and Pull Requests
Found a bug? Have a new feature to suggest? Want to contribute changes to the codebase? Make sure
to read this first.

Bug reporting
Your code doesn't work, and you have determined that the issue lies with Keras? Follow these steps to
report a bug.
1. Your bug may already be fixed. Make sure to update to the current Keras master branch, as well
as the latest Theano/TensorFlow/CNTK master branch. To easily update Theano: pip
install git+git://github.com/Theano/Theano.git --upgrade

2. Search for similar issues. Make sure to delete is:open on the issue search to find solved
tickets as well. It's possible somebody has encountered this bug already. Also remember to
check out Keras' FAQ. Still having a problem? Open an issue on Github to let us know.
3. Make sure you provide us with useful information about your configuration: what OS are you
using? What Keras backend are you using? Are you running on GPU? If so, what is your
version of Cuda, of cuDNN? What is your GPU?
4. Provide us with a script to reproduce the issue. This script should be runnable as-is and should
not require external data download (use randomly generated data if you need to run a model on
some test data). We recommend that you use Github Gists to post your code. Any issue that
cannot be reproduced is likely to be closed.
5. If possible, take a stab at fixing the bug yourself --if you can!
The more information you provide, the easier it is for us to validate that there is a bug and the faster
we'll be able to take action. If you want your issue to be resolved quickly, following the steps above is
crucial.

Requesting a Feature
You can also use TensorFlow Github issues to request features you would like to see in Keras, or
changes in the Keras API.
1. Provide a clear and detailed explanation of the feature you want and why it's important to add.
Keep in mind that we want features that will be useful to the majority of our users and not just a
small subset. If you're just targeting a minority of users, consider writing an add-on library for
Keras. It is crucial for Keras to avoid bloating the API and codebase.
2. Provide code snippets demonstrating the API you have in mind and illustrating the use cases of
your feature. Of course, you don't need to write any real code at this point!
3. After discussing the feature you may choose to attempt a Pull Request on tf.keras. If you're at
all able, start writing some code. We always have more work to do than time to do it. If you can
write some code then that will speed the process along.

Requests for Contributions


This is the board where we list current outstanding issues and features to be added. If you want to start
contributing to Keras, this is the place to start.

Pull Requests
Where should I submit my pull request?

Note:
We are no longer adding new features to multi-backend Keras (we only fix bugs), as we are refocusing
development efforts on tf.keras. If you are still interested in submitting a feature pull request, please
direct it to tf.keras in the TensorFlow repository instead.
1. Keras improvements and bugfixes go to the Keras master branch.
2. Experimental new features such as layers and datasets go to keras-contrib. Unless it is a new
feature listed in Requests for Contributions, in which case it belongs in core Keras. If you think
your feature belongs in core Keras, you can submit a design doc to explain your feature and
argue for it (see explanations below).
Please note that PRs that are primarily about code style (as opposed to fixing bugs, improving docs, or
adding new functionality) will likely be rejected.
Here's a quick guide to submitting your improvements:
1. If your PR introduces a change in functionality, make sure you start by writing a design doc and
sending it to the Keras mailing list to discuss whether the change should be made, and how to
handle it. This will save you from having your PR closed down the road! Of course, if your PR
is a simple bug fix, you don't need to do that. The process for writing and submitting design
docs is as follows:
• Start from this Google Doc template, and copy it to new Google doc.
• Fill in the content. Note that you will need to insert code examples. To insert code, use a
Google Doc extension such as CodePretty (there are several such extensions available).
• Set sharing settings to "everyone with the link is allowed to comment"
• Send the document to keras-users@googlegroups.com with a subject that starts
with [API DESIGN REVIEW] (all caps) so that we notice it.
• Wait for comments, and answer them as they come. Edit the proposal as necessary.
• The proposal will finally be approved or rejected. Once approved, you can send out Pull
Requests or ask others to write Pull Requests.
2. Write the code (or get others to write it). This is the hard part!
3. Make sure any new function or class you introduce has proper docstrings. Make sure any code
you touch still has up-to-date docstrings and documentation. Docstring style should be
respected. In particular, they should be formatted in MarkDown, and there should be sections
for Arguments, Returns, Raises (if applicable). Look at other docstrings in the codebase
for examples.
4. Write tests. Your code should have full unit test coverage. If you want to see your PR merged
promptly, this is crucial.
5. Run our test suite locally. It's easy: from the Keras folder, simply run: py.test tests/.

• You will need to install the test requirements as well: pip install -e .[tests].
6. Make sure all tests are passing:
• with the Theano backend, on Python 2.7 and Python 3.6. Make sure you have the
development version of Theano.
• with the TensorFlow backend, on Python 2.7 and Python 3.6. Make sure you have the
development version of TensorFlow.
• with the CNTK backend, on Python 2.7 and Python 3.6. Make sure you have the
development version of CNTK.
7. We use PEP8 syntax conventions, but we aren't dogmatic when it comes to line length. Make
sure your lines stay reasonably sized, though. To make your life easier, we recommend running
a PEP8 linter:
• Install PEP8 packages: pip install pep8 pytest-pep8 autopep8
• Run a standalone PEP8 check: py.test --pep8 -m pep8
• You can automatically fix some PEP8 errors by running: autopep8 -i --select
<errors> <FILENAME>, for example: autopep8 -i --select E128
tests/keras/backend/test_backends.py
8. When committing, use appropriate, descriptive commit messages.
9. Update the documentation. If introducing new functionality, make sure you include code
snippets demonstrating the usage of your new feature.
10. Submit your PR. If your changes have been approved in a previous discussion, and if you have
complete (and passing) unit tests as well as proper docstrings/documentation, your PR is likely
to be merged promptly.
Adding new examples
Even if you don't contribute to the Keras source code, if you have an application of Keras that is
concise and powerful, please consider adding it to our collection of examples. Existing examples show
idiomatic Keras code: make sure to keep your own script in the same spirit.

An implementation of sequence to sequence learning for performing addition
Input: "535+61" Output: "596" Padding is handled by using a repeated sentinel character (space)
Input may optionally be reversed, shown to increase performance in many tasks in: "Learning to
Execute" http://arxiv.org/abs/1410.4615 and "Sequence to Sequence Learning with Neural Networks"
http://papers.nips.cc/paper/5346-sequence-to-sequence-learning-with-neural-networks.pdf
Theoretically it introduces shorter term dependencies between source and target.
Two digits reversed: + One layer LSTM (128 HN), 5k training examples = 99% train/test accuracy in
55 epochs
Three digits reversed: + One layer LSTM (128 HN), 50k training examples = 99% train/test accuracy
in 100 epochs
Four digits reversed: + One layer LSTM (128 HN), 400k training examples = 99% train/test accuracy
in 20 epochs
Five digits reversed: + One layer LSTM (128 HN), 550k training examples = 99% train/test accuracy in
30 epochs
from __future__ import print_function
from keras.models import Sequential
from keras import layers
import numpy as np
from six.moves import range

class CharacterTable(object):
"""Given a set of characters:
+ Encode them to a one-hot integer representation
+ Decode the one-hot or integer representation to their character output
+ Decode a vector of probabilities to their character output
"""
def __init__(self, chars):
"""Initialize character table.

# Arguments
chars: Characters that can appear in the input.
"""
self.chars = sorted(set(chars))
self.char_indices = dict((c, i) for i, c in enumerate(self.chars))
self.indices_char = dict((i, c) for i, c in enumerate(self.chars))

def encode(self, C, num_rows):


"""One-hot encode given string C.

# Arguments
C: string, to be encoded.
num_rows: Number of rows in the returned one-hot encoding. This is
used to keep the # of rows for each data the same.
"""
x = np.zeros((num_rows, len(self.chars)))
for i, c in enumerate(C):
x[i, self.char_indices[c]] = 1
return x

def decode(self, x, calc_argmax=True):


"""Decode the given vector or 2D array to their character output.

# Arguments
x: A vector or a 2D array of probabilities or one-hot representations;
or a vector of character indices (used with `calc_argmax=False`).
calc_argmax: Whether to find the character index with maximum
probability, defaults to `True`.
"""
if calc_argmax:
x = x.argmax(axis=-1)
return ''.join(self.indices_char[x] for x in x)

class colors:
ok = '\033[92m'
fail = '\033[91m'
close = '\033[0m'

# Parameters for the model and dataset.


TRAINING_SIZE = 50000
DIGITS = 3
REVERSE = True

# Maximum length of input is 'int + int' (e.g., '345+678'). Maximum length of


# int is DIGITS.
MAXLEN = DIGITS + 1 + DIGITS

# All the numbers, plus sign and space for padding.


chars = '0123456789+ '
ctable = CharacterTable(chars)

questions = []
expected = []
seen = set()
print('Generating data...')
while len(questions) < TRAINING_SIZE:
f = lambda: int(''.join(np.random.choice(list('0123456789'))
for i in range(np.random.randint(1, DIGITS + 1))))
a, b = f(), f()
# Skip any addition questions we've already seen
# Also skip any such that x+Y == Y+x (hence the sorting).
key = tuple(sorted((a, b)))
if key in seen:
continue
seen.add(key)
# Pad the data with spaces such that it is always MAXLEN.
q = '{}+{}'.format(a, b)
query = q + ' ' * (MAXLEN - len(q))
ans = str(a + b)
# Answers can be of maximum size DIGITS + 1.
ans += ' ' * (DIGITS + 1 - len(ans))
if REVERSE:
# Reverse the query, e.g., '12+345 ' becomes ' 543+21'. (Note the
# space used for padding.)
query = query[::-1]
questions.append(query)
expected.append(ans)
print('Total addition questions:', len(questions))

print('Vectorization...')
x = np.zeros((len(questions), MAXLEN, len(chars)), dtype=np.bool)
y = np.zeros((len(questions), DIGITS + 1, len(chars)), dtype=np.bool)
for i, sentence in enumerate(questions):
x[i] = ctable.encode(sentence, MAXLEN)
for i, sentence in enumerate(expected):
y[i] = ctable.encode(sentence, DIGITS + 1)

# Shuffle (x, y) in unison as the later parts of x will almost all be larger
# digits.
indices = np.arange(len(y))
np.random.shuffle(indices)
x = x[indices]
y = y[indices]

# Explicitly set apart 10% for validation data that we never train over.
split_at = len(x) - len(x) // 10
(x_train, x_val) = x[:split_at], x[split_at:]
(y_train, y_val) = y[:split_at], y[split_at:]

print('Training Data:')
print(x_train.shape)
print(y_train.shape)

print('Validation Data:')
print(x_val.shape)
print(y_val.shape)

# Try replacing LSTM with GRU, or SimpleRNN.


RNN = layers.LSTM
HIDDEN_SIZE = 128
BATCH_SIZE = 128
LAYERS = 1

print('Build model...')
model = Sequential()
# "Encode" the input sequence using an RNN, producing an output of HIDDEN_SIZE.
# Note: In a situation where your input sequences have a variable length,
# use input_shape=(None, num_feature).
model.add(RNN(HIDDEN_SIZE, input_shape=(MAXLEN, len(chars))))
# As the decoder RNN's input, repeatedly provide the last output of the
# RNN for each time step. Repeat 'DIGITS + 1' times as that's the maximum
# length of output, e.g., when DIGITS=3, max output is 999+999=1998.
model.add(layers.RepeatVector(DIGITS + 1))
# The decoder RNN could be multiple layers stacked or a single layer.
for _ in range(LAYERS):
# By setting return_sequences to True, return not only the last output but
# all the outputs so far in the form of (num_samples, timesteps,
# output_dim). This is necessary as TimeDistributed in the below expects
# the first dimension to be the timesteps.
model.add(RNN(HIDDEN_SIZE, return_sequences=True))

# Apply a dense layer to every temporal slice of the input. For each step
# of the output sequence, decide which character should be chosen.
model.add(layers.TimeDistributed(layers.Dense(len(chars), activation='softmax')))
model.compile(loss='categorical_crossentropy',
optimizer='adam',
metrics=['accuracy'])
model.summary()

# Train the model each generation and show predictions against the validation
# dataset.
for iteration in range(1, 200):
print()
print('-' * 50)
print('Iteration', iteration)
model.fit(x_train, y_train,
batch_size=BATCH_SIZE,
epochs=1,
validation_data=(x_val, y_val))
# Select 10 samples from the validation set at random so we can visualize
# errors.
for i in range(10):
ind = np.random.randint(0, len(x_val))
rowx, rowy = x_val[np.array([ind])], y_val[np.array([ind])]
preds = model.predict_classes(rowx, verbose=0)
q = ctable.decode(rowx[0])
correct = ctable.decode(rowy[0])
guess = ctable.decode(preds[0], calc_argmax=False)
print('Q', q[::-1] if REVERSE else q, end=' ')
print('T', correct, end=' ')
if correct == guess:
print(colors.ok + '☑' + colors.close, end=' ')
else:
print(colors.fail + '☒' + colors.close, end=' ')
# Print the model's guess next to the check mark.
print(guess)

This example demonstrates how to write custom layers for Keras
We build a custom activation layer called 'Antirectifier', which modifies the shape of the tensor that
passes through it. We need to specify two methods: compute_output_shape and call.

Note that the same result can also be achieved via a Lambda layer (see the sketch after this example).
Because our custom layer is written with primitives from the Keras backend (K), our code can run both
on TensorFlow and Theano.
from __future__ import print_function
import keras
from keras.models import Sequential
from keras import layers
from keras.datasets import mnist
from keras import backend as K

class Antirectifier(layers.Layer):
'''This is the combination of a sample-wise
L2 normalization with the concatenation of the
positive part of the input with the negative part
of the input. The result is a tensor of samples that are
twice as large as the input samples.

It can be used in place of a ReLU.

# Input shape
2D tensor of shape (samples, n)

# Output shape
2D tensor of shape (samples, 2*n)

# Theoretical justification
When applying ReLU, assuming that the distribution
of the previous output is approximately centered around 0.,
you are discarding half of your input. This is inefficient.

Antirectifier allows returning all-positive outputs like ReLU,
without discarding any data.

Tests on MNIST show that Antirectifier makes it possible to train
networks with half as many parameters, yet with comparable
classification accuracy to an equivalent ReLU-based network.
'''

def compute_output_shape(self, input_shape):


shape = list(input_shape)
assert len(shape) == 2 # only valid for 2D tensors
shape[-1] *= 2
return tuple(shape)

def call(self, inputs):


inputs -= K.mean(inputs, axis=1, keepdims=True)
inputs = K.l2_normalize(inputs, axis=1)
pos = K.relu(inputs)
neg = K.relu(-inputs)
return K.concatenate([pos, neg], axis=1)

# global parameters
batch_size = 128
num_classes = 10
epochs = 40

# the data, split between train and test sets


(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train = x_train.reshape(60000, 784)
x_test = x_test.reshape(10000, 784)
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255
print(x_train.shape[0], 'train samples')
print(x_test.shape[0], 'test samples')

# convert class vectors to binary class matrices


y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)

# build the model


model = Sequential()
model.add(layers.Dense(256, input_shape=(784,)))
model.add(Antirectifier())
model.add(layers.Dropout(0.1))
model.add(layers.Dense(256))
model.add(Antirectifier())
model.add(layers.Dropout(0.1))
model.add(layers.Dense(num_classes))
model.add(layers.Activation('softmax'))

# compile the model


model.compile(loss='categorical_crossentropy',
optimizer='rmsprop',
metrics=['accuracy'])

# train the model


model.fit(x_train, y_train,
batch_size=batch_size,
epochs=epochs,
verbose=1,
validation_data=(x_test, y_test))

# next, compare with an equivalent network
# with 2x bigger Dense layers and ReLU
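
As noted above, the same Antirectifier behaviour can also be expressed with a Lambda layer; a minimal sketch (the function names are illustrative, and output_shape is supplied so that non-TensorFlow backends can infer the doubled feature dimension):
from keras import backend as K
from keras.models import Sequential
from keras import layers

def antirectifier(x):
    x -= K.mean(x, axis=1, keepdims=True)
    x = K.l2_normalize(x, axis=1)
    return K.concatenate([K.relu(x), K.relu(-x)], axis=1)

def antirectifier_output_shape(input_shape):
    shape = list(input_shape)
    shape[-1] *= 2  # concatenation doubles the feature dimension
    return tuple(shape)

model = Sequential()
model.add(layers.Dense(256, input_shape=(784,)))
model.add(layers.Lambda(antirectifier,
                        output_shape=antirectifier_output_shape))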

Trains two recurrent neural networks based upon a story and a question.
The resulting merged vector is then queried to answer a range of bAbI tasks.
The results are comparable to those for an LSTM model provided in Weston et al.: "Towards AI-
Complete Question Answering: A Set of Prerequisite Toy Tasks" http://arxiv.org/abs/1502.05698

Task Number                     FB LSTM Baseline   Keras QA
QA1 - Single Supporting Fact    50                 52.1
QA2 - Two Supporting Facts      20                 37.0
QA3 - Three Supporting Facts    20                 20.5
QA4 - Two Arg. Relations        61                 62.9
QA5 - Three Arg. Relations      70                 61.9
QA6 - Yes/No Questions          48                 50.7
QA7 - Counting                  49                 78.9
QA8 - Lists/Sets                45                 77.2
QA9 - Simple Negation           64                 64.0
QA10 - Indefinite Knowledge     44                 47.7
QA11 - Basic Coreference        72                 74.9
QA12 - Conjunction              74                 76.4
QA13 - Compound Coreference     94                 94.4
QA14 - Time Reasoning           27                 34.8
QA15 - Basic Deduction          21                 32.4
QA16 - Basic Induction          23                 50.6
QA17 - Positional Reasoning     51                 49.1
QA18 - Size Reasoning           52                 90.8
QA19 - Path Finding             8                  9.0
QA20 - Agent's Motivations      91                 90.7
For the resources related to the bAbI project, refer to:
https://research.facebook.com/researchers/1543934539189348

Notes
• With default word, sentence, and query vector sizes, the GRU model achieves:
• 52.1% test accuracy on QA1 in 20 epochs (2 seconds per epoch on CPU)
• 37.0% test accuracy on QA2 in 20 epochs (16 seconds per epoch on CPU)
• In comparison, the Facebook paper achieves 50% and 20% for the LSTM baseline.
• The task does not traditionally parse the question separately. This likely improves accuracy and
is a good example of merging two RNNs.
• The word vector embeddings are not shared between the story and question RNNs.
• See how the accuracy changes given 10,000 training samples (en-10k) instead of only 1000.
1000 was used in order to be comparable to the original paper.
• Experiment with GRU, LSTM, and JZS1-3 as they give subtly different results.
• The length and noise (i.e. 'useless' story components) impact the ability of LSTMs / GRUs to
provide the correct answer. Given only the supporting facts, these RNNs can achieve 100%
accuracy on many tasks. Memory networks and neural networks that use attentional processes
can efficiently search through this noise to find the relevant statements, improving performance
substantially. This becomes especially obvious on QA2 and QA3, both far longer than QA1.
from __future__ import print_function
from functools import reduce
import re
import tarfile

import numpy as np

from keras.utils.data_utils import get_file


from keras.layers.embeddings import Embedding
from keras import layers
from keras.layers import recurrent
from keras.models import Model
from keras.preprocessing.sequence import pad_sequences

def tokenize(sent):
'''Return the tokens of a sentence including punctuation.

>>> tokenize('Bob dropped the apple. Where is the apple?')


['Bob', 'dropped', 'the', 'apple', '.', 'Where', 'is', 'the', 'apple', '?']
'''
return [x.strip() for x in re.split(r'(\W+)', sent) if x.strip()]

def parse_stories(lines, only_supporting=False):


'''Parse stories provided in the bAbi tasks format

If only_supporting is true,
only the sentences that support the answer are kept.
'''
data = []
story = []
for line in lines:
line = line.decode('utf-8').strip()
nid, line = line.split(' ', 1)
nid = int(nid)
if nid == 1:
story = []
if '\t' in line:
q, a, supporting = line.split('\t')
q = tokenize(q)
if only_supporting:
# Only select the related substory
supporting = map(int, supporting.split())
substory = [story[i - 1] for i in supporting]
else:
# Provide all the substories
substory = [x for x in story if x]
data.append((substory, q, a))
story.append('')
else:
sent = tokenize(line)
story.append(sent)
return data

def get_stories(f, only_supporting=False, max_length=None):


'''Given a file name, read the file, retrieve the stories,
and then convert the sentences into a single story.
If max_length is supplied,
any stories longer than max_length tokens will be discarded.
'''
data = parse_stories(f.readlines(), only_supporting=only_supporting)
flatten = lambda data: reduce(lambda x, y: x + y, data)
data = [(flatten(story), q, answer) for story, q, answer in data
if not max_length or len(flatten(story)) < max_length]
return data

def vectorize_stories(data, word_idx, story_maxlen, query_maxlen):


xs = []
xqs = []
ys = []
for story, query, answer in data:
x = [word_idx[w] for w in story]
xq = [word_idx[w] for w in query]
# let's not forget that index 0 is reserved
y = np.zeros(len(word_idx) + 1)
y[word_idx[answer]] = 1
xs.append(x)
xqs.append(xq)
ys.append(y)
return (pad_sequences(xs, maxlen=story_maxlen),
pad_sequences(xqs, maxlen=query_maxlen), np.array(ys))

RNN = recurrent.LSTM
EMBED_HIDDEN_SIZE = 50
SENT_HIDDEN_SIZE = 100
QUERY_HIDDEN_SIZE = 100
BATCH_SIZE = 32
EPOCHS = 20
print('RNN / Embed / Sent / Query = {}, {}, {}, {}'.format(RNN,
EMBED_HIDDEN_SIZE,
SENT_HIDDEN_SIZE,
QUERY_HIDDEN_SIZE))

try:
path = get_file('babi-tasks-v1-2.tar.gz',
origin='https://s3.amazonaws.com/text-datasets/'
'babi_tasks_1-20_v1-2.tar.gz')
except:
print('Error downloading dataset, please download it manually:\n'
'$ wget http://www.thespermwhale.com/jaseweston/babi/tasks_1-20_v1-2'
'.tar.gz\n'
'$ mv tasks_1-20_v1-2.tar.gz ~/.keras/datasets/babi-tasks-v1-2.tar.gz')
raise

# Default QA1 with 1000 samples


# challenge = 'tasks_1-20_v1-2/en/qa1_single-supporting-fact_{}.txt'
# QA1 with 10,000 samples
# challenge = 'tasks_1-20_v1-2/en-10k/qa1_single-supporting-fact_{}.txt'
# QA2 with 1000 samples
challenge = 'tasks_1-20_v1-2/en/qa2_two-supporting-facts_{}.txt'
# QA2 with 10,000 samples
# challenge = 'tasks_1-20_v1-2/en-10k/qa2_two-supporting-facts_{}.txt'
with tarfile.open(path) as tar:
train = get_stories(tar.extractfile(challenge.format('train')))
test = get_stories(tar.extractfile(challenge.format('test')))
vocab = set()
for story, q, answer in train + test:
vocab |= set(story + q + [answer])
vocab = sorted(vocab)

# Reserve 0 for masking via pad_sequences


vocab_size = len(vocab) + 1
word_idx = dict((c, i + 1) for i, c in enumerate(vocab))
story_maxlen = max(map(len, (x for x, _, _ in train + test)))
query_maxlen = max(map(len, (x for _, x, _ in train + test)))

x, xq, y = vectorize_stories(train, word_idx, story_maxlen, query_maxlen)


tx, txq, ty = vectorize_stories(test, word_idx, story_maxlen, query_maxlen)

print('vocab = {}'.format(vocab))
print('x.shape = {}'.format(x.shape))
print('xq.shape = {}'.format(xq.shape))
print('y.shape = {}'.format(y.shape))
print('story_maxlen, query_maxlen = {}, {}'.format(story_maxlen, query_maxlen))

print('Build model...')

sentence = layers.Input(shape=(story_maxlen,), dtype='int32')


encoded_sentence = layers.Embedding(vocab_size, EMBED_HIDDEN_SIZE)(sentence)
encoded_sentence = RNN(SENT_HIDDEN_SIZE)(encoded_sentence)

question = layers.Input(shape=(query_maxlen,), dtype='int32')


encoded_question = layers.Embedding(vocab_size, EMBED_HIDDEN_SIZE)(question)
encoded_question = RNN(QUERY_HIDDEN_SIZE)(encoded_question)

merged = layers.concatenate([encoded_sentence, encoded_question])


preds = layers.Dense(vocab_size, activation='softmax')(merged)

model = Model([sentence, question], preds)


model.compile(optimizer='adam',
loss='categorical_crossentropy',
metrics=['accuracy'])

print('Training')
model.fit([x, xq], y,
batch_size=BATCH_SIZE,
epochs=EPOCHS,
validation_split=0.05)

print('Evaluation')
loss, acc = model.evaluate([tx, txq], ty,
batch_size=BATCH_SIZE)
print('Test loss / test accuracy = {:.4f} / {:.4f}'.format(loss, acc))

Trains a memory network on the bAbI dataset.


References:
• Jason Weston, Antoine Bordes, Sumit Chopra, Tomas Mikolov, Alexander M. Rush, "Towards
AI-Complete Question Answering: A Set of Prerequisite Toy Tasks"
• Sainbayar Sukhbaatar, Arthur Szlam, Jason Weston, Rob Fergus, "End-To-End Memory
Networks"
Reaches 98.6% accuracy on task 'single_supporting_fact_10k' after 120 epochs. Time per epoch: 3s on
CPU (core i7).
from __future__ import print_function

from keras.models import Sequential, Model


from keras.layers.embeddings import Embedding
from keras.layers import Input, Activation, Dense, Permute, Dropout
from keras.layers import add, dot, concatenate
from keras.layers import LSTM
from keras.utils.data_utils import get_file
from keras.preprocessing.sequence import pad_sequences
from functools import reduce
import tarfile
import numpy as np
import re

def tokenize(sent):
'''Return the tokens of a sentence including punctuation.

>>> tokenize('Bob dropped the apple. Where is the apple?')


['Bob', 'dropped', 'the', 'apple', '.', 'Where', 'is', 'the', 'apple', '?']
'''
return [x.strip() for x in re.split(r'(\W+)', sent) if x.strip()]

def parse_stories(lines, only_supporting=False):


'''Parse stories provided in the bAbi tasks format

If only_supporting is true, only the sentences


that support the answer are kept.
'''
data = []
story = []
for line in lines:
line = line.decode('utf-8').strip()
nid, line = line.split(' ', 1)
nid = int(nid)
if nid == 1:
story = []
if '\t' in line:
q, a, supporting = line.split('\t')
q = tokenize(q)
if only_supporting:
# Only select the related substory
supporting = map(int, supporting.split())
substory = [story[i - 1] for i in supporting]
else:
# Provide all the substories
substory = [x for x in story if x]
data.append((substory, q, a))
story.append('')
else:
sent = tokenize(line)
story.append(sent)
return data

def get_stories(f, only_supporting=False, max_length=None):


'''Given a file name, read the file,
retrieve the stories,
and then convert the sentences into a single story.

If max_length is supplied,
any stories longer than max_length tokens will be discarded.
'''
data = parse_stories(f.readlines(), only_supporting=only_supporting)
flatten = lambda data: reduce(lambda x, y: x + y, data)
data = [(flatten(story), q, answer) for story, q, answer in data
if not max_length or len(flatten(story)) < max_length]
return data

def vectorize_stories(data):
inputs, queries, answers = [], [], []
for story, query, answer in data:
inputs.append([word_idx[w] for w in story])
queries.append([word_idx[w] for w in query])
answers.append(word_idx[answer])
return (pad_sequences(inputs, maxlen=story_maxlen),
pad_sequences(queries, maxlen=query_maxlen),
np.array(answers))

try:
path = get_file('babi-tasks-v1-2.tar.gz',
origin='https://s3.amazonaws.com/text-datasets/'
'babi_tasks_1-20_v1-2.tar.gz')
except:
print('Error downloading dataset, please download it manually:\n'
'$ wget http://www.thespermwhale.com/jaseweston/babi/tasks_1-20_v1-2'
'.tar.gz\n'
'$ mv tasks_1-20_v1-2.tar.gz ~/.keras/datasets/babi-tasks-v1-2.tar.gz')
raise

challenges = {
# QA1 with 10,000 samples
'single_supporting_fact_10k': 'tasks_1-20_v1-2/en-10k/qa1_'
'single-supporting-fact_{}.txt',
# QA2 with 10,000 samples
'two_supporting_facts_10k': 'tasks_1-20_v1-2/en-10k/qa2_'
'two-supporting-facts_{}.txt',
}
challenge_type = 'single_supporting_fact_10k'
challenge = challenges[challenge_type]

print('Extracting stories for the challenge:', challenge_type)


with tarfile.open(path) as tar:
train_stories = get_stories(tar.extractfile(challenge.format('train')))
test_stories = get_stories(tar.extractfile(challenge.format('test')))
vocab = set()
for story, q, answer in train_stories + test_stories:
vocab |= set(story + q + [answer])
vocab = sorted(vocab)

# Reserve 0 for masking via pad_sequences


vocab_size = len(vocab) + 1
story_maxlen = max(map(len, (x for x, _, _ in train_stories + test_stories)))
query_maxlen = max(map(len, (x for _, x, _ in train_stories + test_stories)))

print('-')
print('Vocab size:', vocab_size, 'unique words')
print('Story max length:', story_maxlen, 'words')
print('Query max length:', query_maxlen, 'words')
print('Number of training stories:', len(train_stories))
print('Number of test stories:', len(test_stories))
print('-')
print('Here\'s what a "story" tuple looks like (input, query, answer):')
print(train_stories[0])
print('-')
print('Vectorizing the word sequences...')

word_idx = dict((c, i + 1) for i, c in enumerate(vocab))


inputs_train, queries_train, answers_train = vectorize_stories(train_stories)
inputs_test, queries_test, answers_test = vectorize_stories(test_stories)

print('-')
print('inputs: integer tensor of shape (samples, max_length)')
print('inputs_train shape:', inputs_train.shape)
print('inputs_test shape:', inputs_test.shape)
print('-')
print('queries: integer tensor of shape (samples, max_length)')
print('queries_train shape:', queries_train.shape)
print('queries_test shape:', queries_test.shape)
print('-')
print('answers: binary (1 or 0) tensor of shape (samples, vocab_size)')
print('answers_train shape:', answers_train.shape)
print('answers_test shape:', answers_test.shape)
print('-')
print('Compiling...')

# placeholders
input_sequence = Input((story_maxlen,))
question = Input((query_maxlen,))

# encoders
# embed the input sequence into a sequence of vectors
input_encoder_m = Sequential()
input_encoder_m.add(Embedding(input_dim=vocab_size,
output_dim=64))
input_encoder_m.add(Dropout(0.3))
# output: (samples, story_maxlen, embedding_dim)

# embed the input into a sequence of vectors of size query_maxlen


input_encoder_c = Sequential()
input_encoder_c.add(Embedding(input_dim=vocab_size,
output_dim=query_maxlen))
input_encoder_c.add(Dropout(0.3))
# output: (samples, story_maxlen, query_maxlen)

# embed the question into a sequence of vectors


question_encoder = Sequential()
question_encoder.add(Embedding(input_dim=vocab_size,
output_dim=64,
input_length=query_maxlen))
question_encoder.add(Dropout(0.3))
# output: (samples, query_maxlen, embedding_dim)

# encode input sequence and questions (which are indices)


# to sequences of dense vectors
input_encoded_m = input_encoder_m(input_sequence)
input_encoded_c = input_encoder_c(input_sequence)
question_encoded = question_encoder(question)

# compute a 'match' between the first input vector sequence


# and the question vector sequence
# shape: `(samples, story_maxlen, query_maxlen)`
match = dot([input_encoded_m, question_encoded], axes=(2, 2))
match = Activation('softmax')(match)

# add the match matrix with the second input vector sequence
response = add([match, input_encoded_c]) # (samples, story_maxlen, query_maxlen)
response = Permute((2, 1))(response) # (samples, query_maxlen, story_maxlen)

# concatenate the match matrix with the question vector sequence


answer = concatenate([response, question_encoded])

# the original paper uses a matrix multiplication for this reduction step.
# we choose to use a RNN instead.
answer = LSTM(32)(answer) # (samples, 32)

# one regularization layer -- more would probably be needed.


answer = Dropout(0.3)(answer)
answer = Dense(vocab_size)(answer) # (samples, vocab_size)
# we output a probability distribution over the vocabulary
answer = Activation('softmax')(answer)

# build the final model


model = Model([input_sequence, question], answer)
model.compile(optimizer='rmsprop', loss='sparse_categorical_crossentropy',
metrics=['accuracy'])

# train
model.fit([inputs_train, queries_train], answers_train,
batch_size=32,
epochs=120,
validation_data=([inputs_test, queries_test], answers_test))
Train a simple deep CNN on the CIFAR10 small
images dataset.
It gets to 75% validation accuracy in 25 epochs, and 79% after 50 epochs (though it is still underfitting
at that point).
from __future__ import print_function
import keras
from keras.datasets import cifar10
from keras.preprocessing.image import ImageDataGenerator
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Flatten
from keras.layers import Conv2D, MaxPooling2D
import os

batch_size = 32
num_classes = 10
epochs = 100
data_augmentation = True
num_predictions = 20
save_dir = os.path.join(os.getcwd(), 'saved_models')
model_name = 'keras_cifar10_trained_model.h5'

# The data, split between train and test sets:


(x_train, y_train), (x_test, y_test) = cifar10.load_data()
print('x_train shape:', x_train.shape)
print(x_train.shape[0], 'train samples')
print(x_test.shape[0], 'test samples')

# Convert class vectors to binary class matrices.


y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)

model = Sequential()
model.add(Conv2D(32, (3, 3), padding='same',
input_shape=x_train.shape[1:]))
model.add(Activation('relu'))
model.add(Conv2D(32, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))

model.add(Conv2D(64, (3, 3), padding='same'))


model.add(Activation('relu'))
model.add(Conv2D(64, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))

model.add(Flatten())
model.add(Dense(512))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(num_classes))
model.add(Activation('softmax'))
# initiate RMSprop optimizer
opt = keras.optimizers.RMSprop(learning_rate=0.0001, decay=1e-6)

# Let's train the model using RMSprop


model.compile(loss='categorical_crossentropy',
optimizer=opt,
metrics=['accuracy'])

x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255

if not data_augmentation:
print('Not using data augmentation.')
model.fit(x_train, y_train,
batch_size=batch_size,
epochs=epochs,
validation_data=(x_test, y_test),
shuffle=True)
else:
print('Using real-time data augmentation.')
# This will do preprocessing and realtime data augmentation:
datagen = ImageDataGenerator(
featurewise_center=False, # set input mean to 0 over the dataset
samplewise_center=False, # set each sample mean to 0
featurewise_std_normalization=False, # divide inputs by std of the dataset
samplewise_std_normalization=False, # divide each input by its std
zca_whitening=False, # apply ZCA whitening
zca_epsilon=1e-06, # epsilon for ZCA whitening
rotation_range=0,  # randomly rotate images in the range (degrees, 0 to 180)
# randomly shift images horizontally (fraction of total width)
width_shift_range=0.1,
# randomly shift images vertically (fraction of total height)
height_shift_range=0.1,
shear_range=0., # set range for random shear
zoom_range=0., # set range for random zoom
channel_shift_range=0., # set range for random channel shifts
# set mode for filling points outside the input boundaries
fill_mode='nearest',
cval=0., # value used for fill_mode = "constant"
horizontal_flip=True, # randomly flip images
vertical_flip=False, # randomly flip images
# set rescaling factor (applied before any other transformation)
rescale=None,
# set function that will be applied on each input
preprocessing_function=None,
# image data format, either "channels_first" or "channels_last"
data_format=None,
# fraction of images reserved for validation (strictly between 0 and 1)
validation_split=0.0)

# Compute quantities required for feature-wise normalization


# (std, mean, and principal components if ZCA whitening is applied).
datagen.fit(x_train)

# Fit the model on the batches generated by datagen.flow().


model.fit_generator(datagen.flow(x_train, y_train,
batch_size=batch_size),
epochs=epochs,
validation_data=(x_test, y_test),
workers=4)

# Save model and weights


if not os.path.isdir(save_dir):
os.makedirs(save_dir)
model_path = os.path.join(save_dir, model_name)
model.save(model_path)
print('Saved trained model at %s ' % model_path)

# Score trained model.


scores = model.evaluate(x_test, y_test, verbose=1)
print('Test loss:', scores[0])
print('Test accuracy:', scores[1])

Trains a ResNet on the CIFAR10 dataset.


ResNet v1: Deep Residual Learning for Image Recognition
ResNet v2: Identity Mappings in Deep Residual Networks

Model           n    200-epoch accuracy   Original paper accuracy   sec/epoch GTX1080Ti
ResNet20 v1     3    92.16 %              91.25 %                   35
ResNet32 v1     5    92.46 %              92.49 %                   50
ResNet44 v1     7    92.50 %              92.83 %                   70
ResNet56 v1     9    92.71 %              93.03 %                   90
ResNet110 v1    18   92.65 %              93.39+-.16 %              165
ResNet164 v1    27   - %                  94.07 %                   -
ResNet1001 v1   N/A  - %                  92.39 %                   -

Model           n    200-epoch accuracy   Original paper accuracy   sec/epoch GTX1080Ti
ResNet20 v2     2    - %                  - %                       -
ResNet32 v2     N/A  NA %                 NA %                      NA
ResNet44 v2     N/A  NA %                 NA %                      NA
ResNet56 v2     6    93.01 %              NA %                      100
ResNet110 v2    12   93.15 %              93.63 %                   180
ResNet164 v2    18   - %                  94.54 %                   -
ResNet1001 v2   111  - %                  95.08+-.14 %              -
from __future__ import print_function
import keras
from keras.layers import Dense, Conv2D, BatchNormalization, Activation
from keras.layers import AveragePooling2D, Input, Flatten
from keras.optimizers import Adam
from keras.callbacks import ModelCheckpoint, LearningRateScheduler
from keras.callbacks import ReduceLROnPlateau
from keras.preprocessing.image import ImageDataGenerator
from keras.regularizers import l2
from keras import backend as K
from keras.models import Model
from keras.datasets import cifar10
import numpy as np
import os

# Training parameters
batch_size = 32 # orig paper trained all networks with batch_size=128
epochs = 200
data_augmentation = True
num_classes = 10

# Subtracting pixel mean improves accuracy
subtract_pixel_mean = True

# Model parameter
# ----------------------------------------------------------------------------
# | | 200-epoch | Orig Paper| 200-epoch | Orig Paper| sec/epoch
# Model | n | ResNet v1 | ResNet v1 | ResNet v2 | ResNet v2 | GTX1080Ti
# |v1(v2)| %Accuracy | %Accuracy | %Accuracy | %Accuracy | v1 (v2)
# ----------------------------------------------------------------------------
# ResNet20 | 3 (2)| 92.16 | 91.25 | ----- | ----- | 35 (---)
# ResNet32 | 5(NA)| 92.46 | 92.49 | NA | NA | 50 ( NA)
# ResNet44 | 7(NA)| 92.50 | 92.83 | NA | NA | 70 ( NA)
# ResNet56 | 9 (6)| 92.71 | 93.03 | 93.01 | NA | 90 (100)
# ResNet110 |18(12)| 92.65 | 93.39+-.16| 93.15 | 93.63 | 165(180)
# ResNet164 |27(18)| ----- | 94.07 | ----- | 94.54 | ---(---)
# ResNet1001| (111)| ----- | 92.39 | ----- | 95.08+-.14| ---(---)
# ---------------------------------------------------------------------------
n = 3

# Model version
# Orig paper: version = 1 (ResNet v1), Improved ResNet: version = 2 (ResNet v2)
version = 1

# Computed depth from supplied model parameter n
if version == 1:
depth = n * 6 + 2
elif version == 2:
depth = n * 9 + 2

# Model name, depth and version
model_type = 'ResNet%dv%d' % (depth, version)

# Load the CIFAR10 data.
(x_train, y_train), (x_test, y_test) = cifar10.load_data()

# Input image dimensions.
input_shape = x_train.shape[1:]

# Normalize data.
x_train = x_train.astype('float32') / 255
x_test = x_test.astype('float32') / 255

# If subtract pixel mean is enabled
if subtract_pixel_mean:
x_train_mean = np.mean(x_train, axis=0)
x_train -= x_train_mean
x_test -= x_train_mean

print('x_train shape:', x_train.shape)
print(x_train.shape[0], 'train samples')
print(x_test.shape[0], 'test samples')
print('y_train shape:', y_train.shape)

# Convert class vectors to binary class matrices.
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)

def lr_schedule(epoch):
"""Learning Rate Schedule

Learning rate is scheduled to be reduced after 80, 120, 160, 180 epochs.
Called automatically every epoch as part of callbacks during training.

# Arguments
epoch (int): The number of epochs

# Returns
lr (float32): learning rate
"""
lr = 1e-3
if epoch > 180:
lr *= 0.5e-3
elif epoch > 160:
lr *= 1e-3
elif epoch > 120:
lr *= 1e-2
elif epoch > 80:
lr *= 1e-1
print('Learning rate: ', lr)
return lr
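
# For reference, with Keras epoch indices starting at 0 the schedule above
# yields: epochs 0-80: 1e-3, 81-120: 1e-4, 121-160: 1e-5, 161-180: 1e-6,
# and beyond 180: 5e-7.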

def resnet_layer(inputs,
num_filters=16,
kernel_size=3,
strides=1,
activation='relu',
batch_normalization=True,
conv_first=True):
"""2D Convolution-Batch Normalization-Activation stack builder

# Arguments
inputs (tensor): input tensor from input image or previous layer
num_filters (int): Conv2D number of filters
kernel_size (int): Conv2D square kernel dimensions
strides (int): Conv2D square stride dimensions
activation (string): activation name
batch_normalization (bool): whether to include batch normalization
conv_first (bool): conv-bn-activation (True) or
bn-activation-conv (False)

# Returns
x (tensor): tensor as input to the next layer
"""
conv = Conv2D(num_filters,
kernel_size=kernel_size,
strides=strides,
padding='same',
kernel_initializer='he_normal',
kernel_regularizer=l2(1e-4))

x = inputs
if conv_first:
x = conv(x)
if batch_normalization:
x = BatchNormalization()(x)
if activation is not None:
x = Activation(activation)(x)
else:
if batch_normalization:
x = BatchNormalization()(x)
if activation is not None:
x = Activation(activation)(x)
x = conv(x)
return x

def resnet_v1(input_shape, depth, num_classes=10):
"""ResNet Version 1 Model builder [a]

Stacks of 2 x (3 x 3) Conv2D-BN-ReLU
Last ReLU is after the shortcut connection.
At the beginning of each stage, the feature map size is halved (downsampled)
by a convolutional layer with strides=2, while the number of filters is
doubled. Within each stage, the layers have the same number of filters and
the same feature map sizes.
Feature map sizes:
stage 0: 32x32, 16
stage 1: 16x16, 32
stage 2: 8x8, 64
The number of parameters is approx the same as Table 6 of [a]:
ResNet20 0.27M
ResNet32 0.46M
ResNet44 0.66M
ResNet56 0.85M
ResNet110 1.7M

# Arguments
input_shape (tensor): shape of input image tensor
depth (int): number of core convolutional layers
num_classes (int): number of classes (CIFAR10 has 10)

# Returns
model (Model): Keras model instance
"""
if (depth - 2) % 6 != 0:
raise ValueError('depth should be 6n+2 (eg 20, 32, 44 in [a])')
# Start model definition.
num_filters = 16
num_res_blocks = int((depth - 2) / 6)
inputs = Input(shape=input_shape)
x = resnet_layer(inputs=inputs)
# Instantiate the stack of residual units
for stack in range(3):
for res_block in range(num_res_blocks):
strides = 1
if stack > 0 and res_block == 0: # first layer but not first stack
strides = 2 # downsample
y = resnet_layer(inputs=x,
num_filters=num_filters,
strides=strides)
y = resnet_layer(inputs=y,
num_filters=num_filters,
activation=None)
if stack > 0 and res_block == 0: # first layer but not first stack
# linear projection residual shortcut connection to match
# changed dims
x = resnet_layer(inputs=x,
num_filters=num_filters,
kernel_size=1,
strides=strides,
activation=None,
batch_normalization=False)
x = keras.layers.add([x, y])
x = Activation('relu')(x)
num_filters *= 2

# Add classifier on top.
# v1 does not use BN after last shortcut connection-ReLU
x = AveragePooling2D(pool_size=8)(x)
y = Flatten()(x)
outputs = Dense(num_classes,
activation='softmax',
kernel_initializer='he_normal')(y)

# Instantiate model.
model = Model(inputs=inputs, outputs=outputs)
return model
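
# As a quick sanity check (illustrative, not part of the original script):
# resnet_v1(input_shape=(32, 32, 3), depth=20).summary()
# should report roughly 0.27M parameters, matching the table in the
# docstring above.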

def resnet_v2(input_shape, depth, num_classes=10):
"""ResNet Version 2 Model builder [b]

Stacks of (1 x 1)-(3 x 3)-(1 x 1) BN-ReLU-Conv2D, also known as a
bottleneck layer.
First shortcut connection per layer is 1 x 1 Conv2D.
Second and onwards shortcut connection is identity.
At the beginning of each stage, the feature map size is halved (downsampled)
by a convolutional layer with strides=2, while the number of filter maps is
doubled. Within each stage, the layers have the same number of filters and
the same filter map sizes.
Feature map sizes:
conv1 : 32x32, 16
stage 0: 32x32, 64
stage 1: 16x16, 128
stage 2: 8x8, 256

# Arguments
input_shape (tensor): shape of input image tensor
depth (int): number of core convolutional layers
num_classes (int): number of classes (CIFAR10 has 10)

# Returns
model (Model): Keras model instance
"""
if (depth - 2) % 9 != 0:
raise ValueError('depth should be 9n+2 (eg 56 or 110 in [b])')
# Start model definition.
num_filters_in = 16
num_res_blocks = int((depth - 2) / 9)

inputs = Input(shape=input_shape)
# v2 performs Conv2D with BN-ReLU on input before splitting into 2 paths
x = resnet_layer(inputs=inputs,
num_filters=num_filters_in,
conv_first=True)

# Instantiate the stack of residual units
for stage in range(3):
for res_block in range(num_res_blocks):
activation = 'relu'
batch_normalization = True
strides = 1
if stage == 0:
num_filters_out = num_filters_in * 4
if res_block == 0: # first layer and first stage
activation = None
batch_normalization = False
else:
num_filters_out = num_filters_in * 2
if res_block == 0: # first layer but not first stage
strides = 2 # downsample

# bottleneck residual unit
y = resnet_layer(inputs=x,
num_filters=num_filters_in,
kernel_size=1,
strides=strides,
activation=activation,
batch_normalization=batch_normalization,
conv_first=False)
y = resnet_layer(inputs=y,
num_filters=num_filters_in,
conv_first=False)
y = resnet_layer(inputs=y,
num_filters=num_filters_out,
kernel_size=1,
conv_first=False)
if res_block == 0:
# linear projection residual shortcut connection to match
# changed dims
x = resnet_layer(inputs=x,
num_filters=num_filters_out,
kernel_size=1,
strides=strides,
activation=None,
batch_normalization=False)
x = keras.layers.add([x, y])
num_filters_in = num_filters_out

# Add classifier on top.
# v2 has BN-ReLU before Pooling
x = BatchNormalization()(x)
x = Activation('relu')(x)
x = AveragePooling2D(pool_size=8)(x)
y = Flatten()(x)
outputs = Dense(num_classes,
activation='softmax',
kernel_initializer='he_normal')(y)

# Instantiate model.
model = Model(inputs=inputs, outputs=outputs)
return model

if version == 2:
model = resnet_v2(input_shape=input_shape, depth=depth)
else:
model = resnet_v1(input_shape=input_shape, depth=depth)

model.compile(loss='categorical_crossentropy',
optimizer=Adam(learning_rate=lr_schedule(0)),
metrics=['accuracy'])
model.summary()
print(model_type)

# Prepare model saving directory.
save_dir = os.path.join(os.getcwd(), 'saved_models')
model_name = 'cifar10_%s_model.{epoch:03d}.h5' % model_type
if not os.path.isdir(save_dir):
os.makedirs(save_dir)
filepath = os.path.join(save_dir, model_name)

# Prepare callbacks for model saving and for learning rate adjustment.
checkpoint = ModelCheckpoint(filepath=filepath,
monitor='val_acc',
verbose=1,
save_best_only=True)

lr_scheduler = LearningRateScheduler(lr_schedule)

lr_reducer = ReduceLROnPlateau(factor=np.sqrt(0.1),
cooldown=0,
patience=5,
min_lr=0.5e-6)

callbacks = [checkpoint, lr_reducer, lr_scheduler]

# Run training, with or without data augmentation.
if not data_augmentation:
print('Not using data augmentation.')
model.fit(x_train, y_train,
batch_size=batch_size,
epochs=epochs,
validation_data=(x_test, y_test),
shuffle=True,
callbacks=callbacks)
else:
print('Using real-time data augmentation.')
# This will do preprocessing and realtime data augmentation:
datagen = ImageDataGenerator(
# set input mean to 0 over the dataset
featurewise_center=False,
# set each sample mean to 0
samplewise_center=False,
# divide inputs by std of dataset
featurewise_std_normalization=False,
# divide each input by its std
samplewise_std_normalization=False,
# apply ZCA whitening
zca_whitening=False,
# epsilon for ZCA whitening
zca_epsilon=1e-06,
# randomly rotate images in the range (deg 0 to 180)
rotation_range=0,
# randomly shift images horizontally
width_shift_range=0.1,
# randomly shift images vertically
height_shift_range=0.1,
# set range for random shear
shear_range=0.,
# set range for random zoom
zoom_range=0.,
# set range for random channel shifts
channel_shift_range=0.,
# set mode for filling points outside the input boundaries
fill_mode='nearest',
# value used for fill_mode = "constant"
cval=0.,
# randomly flip images
horizontal_flip=True,
# randomly flip images
vertical_flip=False,
# set rescaling factor (applied before any other transformation)
rescale=None,
# set function that will be applied on each input
preprocessing_function=None,
# image data format, either "channels_first" or "channels_last"
data_format=None,
# fraction of images reserved for validation (strictly between 0 and 1)
validation_split=0.0)

# Compute quantities required for featurewise normalization
# (std, mean, and principal components if ZCA whitening is applied).
datagen.fit(x_train)

# Fit the model on the batches generated by datagen.flow().
model.fit_generator(datagen.flow(x_train, y_train, batch_size=batch_size),
validation_data=(x_test, y_test),
epochs=epochs, verbose=1, workers=4,
callbacks=callbacks)

# Score trained model.
scores = model.evaluate(x_test, y_test, verbose=1)
print('Test loss:', scores[0])
print('Test accuracy:', scores[1])

Visualization of the filters of VGG16, via gradient ascent in input space.
This script can run on CPU in a few minutes.

from __future__ import print_function

import time
import numpy as np
from PIL import Image as pil_image
from keras.preprocessing.image import save_img
from keras import layers
from keras.applications import vgg16
from keras import backend as K

def normalize(x):
"""utility function to normalize a tensor.

# Arguments
x: An input tensor.

# Returns
The normalized input tensor.
"""
return x / (K.sqrt(K.mean(K.square(x))) + K.epsilon())

def deprocess_image(x):
"""utility function to convert a float array into a valid uint8 image.

# Arguments
x: A numpy-array representing the generated image.

# Returns
A processed numpy-array, which could be used in e.g. imshow.
"""
# normalize tensor: center on 0., ensure std is 0.25
x -= x.mean()
x /= (x.std() + K.epsilon())
x *= 0.25

# clip to [0, 1]
x += 0.5
x = np.clip(x, 0, 1)

# convert to RGB array
x *= 255
if K.image_data_format() == 'channels_first':
x = x.transpose((1, 2, 0))
x = np.clip(x, 0, 255).astype('uint8')
return x

def process_image(x, former):
"""utility function to convert a valid uint8 image back into a float array.
Reverses `deprocess_image`.

# Arguments
x: A numpy-array, which could be used in e.g. imshow.
former: The former numpy-array.
Need to determine the former mean and variance.

# Returns
A processed numpy-array representing the generated image.
"""
if K.image_data_format() == 'channels_first':
x = x.transpose((2, 0, 1))
return (x / 255 - 0.5) * 4 * former.std() + former.mean()

def visualize_layer(model,
layer_name,
step=1.,
epochs=15,
upscaling_steps=9,
upscaling_factor=1.2,
output_dim=(412, 412),
filter_range=(0, None)):
"""Visualizes the most relevant filters of one conv-layer in a certain model.

# Arguments
model: The model containing layer_name.
layer_name: The name of the layer to be visualized.
Has to be a part of model.
step: step size for gradient ascent.
epochs: Number of iterations for gradient ascent.
upscaling_steps: Number of upscaling steps.
Starting image is in this case (80, 80).
upscaling_factor: Factor to which to slowly upgrade
the image towards output_dim.
output_dim: [img_width, img_height] The output image dimensions.
filter_range: Tuple[lower, upper]
Determines the filter numbers to be computed.
If the second value is `None`,
the last filter will be inferred as the upper boundary.
"""

def _generate_filter_image(input_img,
layer_output,
filter_index):
"""Generates image for one particular filter.

# Arguments
input_img: The input-image Tensor.
layer_output: The output-image Tensor.
filter_index: The filter number to be processed.
Assumed to be valid.

# Returns
Either None, if no image could be generated,
or a tuple of the image (array) itself and the last loss.
"""
s_time = time.time()

# we build a loss function that maximizes the activation
# of the nth filter of the layer considered
if K.image_data_format() == 'channels_first':
loss = K.mean(layer_output[:, filter_index, :, :])
else:
loss = K.mean(layer_output[:, :, :, filter_index])

# we compute the gradient of the input picture wrt this loss
grads = K.gradients(loss, input_img)[0]

# normalization trick: we normalize the gradient
grads = normalize(grads)

# this function returns the loss and grads given the input picture
iterate = K.function([input_img], [loss, grads])

# we start from a gray image with some random noise
intermediate_dim = tuple(
int(x / (upscaling_factor ** upscaling_steps)) for x in output_dim)
if K.image_data_format() == 'channels_first':
input_img_data = np.random.random(
(1, 3, intermediate_dim[0], intermediate_dim[1]))
else:
input_img_data = np.random.random(
(1, intermediate_dim[0], intermediate_dim[1], 3))
input_img_data = (input_img_data - 0.5) * 20 + 128
# Slowly upscaling towards the original size prevents a dominating
# high-frequency pattern in the structure being visualized, as would
# occur if we directly computed the image at 412 x 412. It also gives
# a better starting point for each following dimension and therefore
# avoids poor local minima.
for up in reversed(range(upscaling_steps)):
# we run gradient ascent for e.g. 20 steps
for _ in range(epochs):
loss_value, grads_value = iterate([input_img_data])
input_img_data += grads_value * step

# some filters get stuck to 0, we can skip them
if loss_value <= K.epsilon():
return None

# Calculate upscaled dimension
intermediate_dim = tuple(
int(x / (upscaling_factor ** up)) for x in output_dim)
# Upscale
img = deprocess_image(input_img_data[0])
img = np.array(pil_image.fromarray(img).resize(intermediate_dim,
pil_image.BICUBIC))
input_img_data = np.expand_dims(
process_image(img, input_img_data[0]), 0)

# decode the resulting input image
img = deprocess_image(input_img_data[0])
e_time = time.time()
print('Costs of filter {:3}: {:5.0f} ( {:4.2f}s )'.format(filter_index,
loss_value,
e_time - s_time))
return img, loss_value

def _draw_filters(filters, n=None):
"""Draw the best filters in a n x n grid.

# Arguments
filters: A List of generated images and their corresponding losses
for each processed filter.
n: dimension of the grid.
If None, the largest possible square will be used.
"""
if n is None:
n = int(np.floor(np.sqrt(len(filters))))

# the filters that have the highest loss are assumed to be better-looking.
# we will only keep the top n*n filters.
filters.sort(key=lambda x: x[1], reverse=True)
filters = filters[:n * n]

# build a black picture with enough space for
# e.g. our 8 x 8 filters of size 412 x 412, with a 5px margin in between
MARGIN = 5
width = n * output_dim[0] + (n - 1) * MARGIN
height = n * output_dim[1] + (n - 1) * MARGIN
stitched_filters = np.zeros((width, height, 3), dtype='uint8')

# fill the picture with our saved filters
for i in range(n):
for j in range(n):
img, _ = filters[i * n + j]
width_margin = (output_dim[0] + MARGIN) * i
height_margin = (output_dim[1] + MARGIN) * j
stitched_filters[
width_margin: width_margin + output_dim[0],
height_margin: height_margin + output_dim[1], :] = img

# save the result to disk
save_img('vgg_{0:}_{1:}x{1:}.png'.format(layer_name, n), stitched_filters)

# this is the placeholder for the input images
assert len(model.inputs) == 1
input_img = model.inputs[0]

# get the symbolic outputs of each "key" layer (we gave them unique names).
layer_dict = dict([(layer.name, layer) for layer in model.layers[1:]])

output_layer = layer_dict[layer_name]
assert isinstance(output_layer, layers.Conv2D)

# Compute the range of filters to be processed
filter_lower = filter_range[0]
filter_upper = (filter_range[1]
if filter_range[1] is not None
else len(output_layer.get_weights()[1]))
assert(filter_lower >= 0
and filter_upper <= len(output_layer.get_weights()[1])
and filter_upper > filter_lower)
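# (get_weights()[1] is the layer's bias vector, so its length equals
# the number of filters in this conv layer)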
print('Compute filters {:} to {:}'.format(filter_lower, filter_upper))

# iterate through each filter and generate its corresponding image
processed_filters = []
for f in range(filter_lower, filter_upper):
img_loss = _generate_filter_image(input_img, output_layer.output, f)

if img_loss is not None:
processed_filters.append(img_loss)

print('{} filters processed.'.format(len(processed_filters)))

# Finally draw and store the best filters to disk
_draw_filters(processed_filters)

if __name__ == '__main__':
# the name of the layer we want to visualize
# (see model definition at keras/applications/vgg16.py)
LAYER_NAME = 'block5_conv1'

# build the VGG16 network with ImageNet weights
vgg = vgg16.VGG16(weights='imagenet', include_top=False)
print('Model loaded.')
vgg.summary()

# example function call
visualize_layer(vgg, LAYER_NAME)
This script demonstrates the use of a
convolutional LSTM network.
This network is used to predict the next frame of an artificially generated movie which contains moving
squares.
from keras.models import Sequential
from keras.layers.convolutional import Conv3D
from keras.layers.convolutional_recurrent import ConvLSTM2D
from keras.layers.normalization import BatchNormalization
import numpy as np
import pylab as plt

# We create a layer which takes as input movies of shape
# (n_frames, width, height, channels) and returns a movie
# of identical shape.

seq = Sequential()
seq.add(ConvLSTM2D(filters=40, kernel_size=(3, 3),
input_shape=(None, 40, 40, 1),
padding='same', return_sequences=True))
seq.add(BatchNormalization())

seq.add(ConvLSTM2D(filters=40, kernel_size=(3, 3),
padding='same', return_sequences=True))
seq.add(BatchNormalization())

seq.add(ConvLSTM2D(filters=40, kernel_size=(3, 3),
padding='same', return_sequences=True))
seq.add(BatchNormalization())

seq.add(ConvLSTM2D(filters=40, kernel_size=(3, 3),
padding='same', return_sequences=True))
seq.add(BatchNormalization())

seq.add(Conv3D(filters=1, kernel_size=(3, 3, 3),
activation='sigmoid',
padding='same', data_format='channels_last'))
seq.compile(loss='binary_crossentropy', optimizer='adadelta')

# Artificial data generation:
# Generate movies with 3 to 7 moving squares inside.
# The squares are of shape 1x1 or 2x2 pixels,
# which move linearly over time.
# For convenience we first create movies with bigger width and height (80x80)
# and at the end we select a 40x40 window.

def generate_movies(n_samples=1200, n_frames=15):
row = 80
col = 80
noisy_movies = np.zeros((n_samples, n_frames, row, col, 1), dtype=np.float32)
shifted_movies = np.zeros((n_samples, n_frames, row, col, 1),
dtype=np.float32)

for i in range(n_samples):
# Add 3 to 7 moving squares
n = np.random.randint(3, 8)

for j in range(n):
# Initial position
xstart = np.random.randint(20, 60)
ystart = np.random.randint(20, 60)
# Direction of motion
directionx = np.random.randint(0, 3) - 1
directiony = np.random.randint(0, 3) - 1

# Size of the square
w = np.random.randint(2, 4)

for t in range(n_frames):
x_shift = xstart + directionx * t
y_shift = ystart + directiony * t
noisy_movies[i, t, x_shift - w: x_shift + w,
y_shift - w: y_shift + w, 0] += 1

# Make it more robust by adding noise.
# The idea is that during inference the pixel values will not be
# exactly one, so we train the network to be robust and still
# consider such pixels as belonging to a square.
if np.random.randint(0, 2):
noise_f = (-1)**np.random.randint(0, 2)
noisy_movies[i, t,
x_shift - w - 1: x_shift + w + 1,
y_shift - w - 1: y_shift + w + 1,
0] += noise_f * 0.1

# Shift the ground truth by 1
x_shift = xstart + directionx * (t + 1)
y_shift = ystart + directiony * (t + 1)
shifted_movies[i, t, x_shift - w: x_shift + w,
y_shift - w: y_shift + w, 0] += 1

# Cut to a 40x40 window
noisy_movies = noisy_movies[::, ::, 20:60, 20:60, ::]
shifted_movies = shifted_movies[::, ::, 20:60, 20:60, ::]
noisy_movies[noisy_movies >= 1] = 1
shifted_movies[shifted_movies >= 1] = 1
return noisy_movies, shifted_movies

# Train the network
noisy_movies, shifted_movies = generate_movies(n_samples=1200)
seq.fit(noisy_movies[:1000], shifted_movies[:1000], batch_size=10,
epochs=300, validation_split=0.05)

# Testing the network on one movie
# feed it with the first 7 positions and then
# predict the new positions
which = 1004
track = noisy_movies[which][:7, ::, ::, ::]

for j in range(16):
new_pos = seq.predict(track[np.newaxis, ::, ::, ::, ::])
new = new_pos[::, -1, ::, ::, ::]
track = np.concatenate((track, new), axis=0)

# And then compare the predictions
# to the ground truth
track2 = noisy_movies[which][::, ::, ::, ::]
for i in range(15):
fig = plt.figure(figsize=(10, 5))

ax = fig.add_subplot(121)

if i >= 7:
ax.text(1, 3, 'Predictions !', fontsize=20, color='w')
else:
ax.text(1, 3, 'Initial trajectory', fontsize=20)

toplot = track[i, ::, ::, 0]

plt.imshow(toplot)
ax = fig.add_subplot(122)
plt.text(1, 3, 'Ground truth', fontsize=20)

toplot = track2[i, ::, ::, 0]

if i >= 2:
toplot = shifted_movies[which][i - 1, ::, ::, 0]

plt.imshow(toplot)
plt.savefig('%i_animate.png' % (i + 1))

Deep Dreaming in Keras.

Run the script with:
python deep_dream.py path_to_your_base_image.jpg prefix_for_results

e.g.:
python deep_dream.py img/mypic.jpg results/dream

from __future__ import print_function

from keras.preprocessing.image import load_img, save_img, img_to_array
import numpy as np
import scipy
import argparse

from keras.applications import inception_v3
from keras import backend as K

parser = argparse.ArgumentParser(description='Deep Dreams with Keras.')
parser.add_argument('base_image_path', metavar='base', type=str,
help='Path to the image to transform.')
parser.add_argument('result_prefix', metavar='res_prefix', type=str,
help='Prefix for the saved results.')
args = parser.parse_args()
base_image_path = args.base_image_path
result_prefix = args.result_prefix

# These are the names of the layers
# for which we try to maximize activation,
# as well as their weight in the final loss
# we try to maximize.
# You can tweak these settings to obtain new visual effects.
settings = {
'features': {
'mixed2': 0.2,
'mixed3': 0.5,
'mixed4': 2.,
'mixed5': 1.5,
},
}

def preprocess_image(image_path):
# Util function to open, resize and format pictures
# into appropriate tensors.
img = load_img(image_path)
img = img_to_array(img)
img = np.expand_dims(img, axis=0)
img = inception_v3.preprocess_input(img)
return img

def deprocess_image(x):
# Util function to convert a tensor into a valid image.
if K.image_data_format() == 'channels_first':
x = x.reshape((3, x.shape[2], x.shape[3]))
x = x.transpose((1, 2, 0))
else:
x = x.reshape((x.shape[1], x.shape[2], 3))
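# undo the InceptionV3 preprocessing, which scaled pixels to [-1, 1]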
x /= 2.
x += 0.5
x *= 255.
x = np.clip(x, 0, 255).astype('uint8')
return x

K.set_learning_phase(0)

# Build the InceptionV3 network with our placeholder.
# The model will be loaded with pre-trained ImageNet weights.
model = inception_v3.InceptionV3(weights='imagenet',
include_top=False)
dream = model.input
print('Model loaded.')

# Get the symbolic outputs of each "key" layer (we gave them unique names).
layer_dict = dict([(layer.name, layer) for layer in model.layers])

# Define the loss.
loss = K.variable(0.)
for layer_name in settings['features']:
# Add the L2 norm of the features of a layer to the loss.
if layer_name not in layer_dict:
raise ValueError('Layer ' + layer_name + ' not found in model.')
coeff = settings['features'][layer_name]
x = layer_dict[layer_name].output
# We avoid border artifacts by only involving non-border pixels in the loss.
scaling = K.prod(K.cast(K.shape(x), 'float32'))
if K.image_data_format() == 'channels_first':
loss = loss + coeff * K.sum(K.square(x[:, :, 2: -2, 2: -2])) / scaling
else:
loss = loss + coeff * K.sum(K.square(x[:, 2: -2, 2: -2, :])) / scaling

# Compute the gradients of the loss wrt the dream image.
grads = K.gradients(loss, dream)[0]
# Normalize gradients.
grads /= K.maximum(K.mean(K.abs(grads)), K.epsilon())

# Set up function to retrieve the value
# of the loss and gradients given an input image.
outputs = [loss, grads]
fetch_loss_and_grads = K.function([dream], outputs)

def eval_loss_and_grads(x):
outs = fetch_loss_and_grads([x])
loss_value = outs[0]
grad_values = outs[1]
return loss_value, grad_values

def resize_img(img, size):
img = np.copy(img)
if K.image_data_format() == 'channels_first':
factors = (1, 1,
float(size[0]) / img.shape[2],
float(size[1]) / img.shape[3])
else:
factors = (1,
float(size[0]) / img.shape[1],
float(size[1]) / img.shape[2],
1)
return scipy.ndimage.zoom(img, factors, order=1)

def gradient_ascent(x, iterations, step, max_loss=None):
for i in range(iterations):
loss_value, grad_values = eval_loss_and_grads(x)
if max_loss is not None and loss_value > max_loss:
break
print('..Loss value at', i, ':', loss_value)
x += step * grad_values
return x

"""Process:

- Load the original image.


- Define a number of processing scales (i.e. image shapes),
from smallest to largest.
- Resize the original image to the smallest scale.
- For every scale, starting with the smallest (i.e. current one):
- Run gradient ascent
- Upscale image to the next scale
- Reinject the detail that was lost at upscaling time
- Stop when we are back to the original size.

To obtain the detail lost during upscaling, we simply


take the original image, shrink it down, upscale it,
and compare the result to the (resized) original image.
"""

# Playing with these hyperparameters will also allow you to achieve new effects
step = 0.01 # Gradient ascent step size
num_octave = 3 # Number of scales at which to run gradient ascent
octave_scale = 1.4 # Size ratio between scales
iterations = 20 # Number of ascent steps per scale
max_loss = 10.

img = preprocess_image(base_image_path)
if K.image_data_format() == 'channels_first':
original_shape = img.shape[2:]
else:
original_shape = img.shape[1:3]
successive_shapes = [original_shape]
for i in range(1, num_octave):
shape = tuple([int(dim / (octave_scale ** i)) for dim in original_shape])
successive_shapes.append(shape)
successive_shapes = successive_shapes[::-1]
original_img = np.copy(img)
shrunk_original_img = resize_img(img, successive_shapes[0])

for shape in successive_shapes:
print('Processing image shape', shape)
img = resize_img(img, shape)
img = gradient_ascent(img,
iterations=iterations,
step=step,
max_loss=max_loss)
upscaled_shrunk_original_img = resize_img(shrunk_original_img, shape)
same_size_original = resize_img(original_img, shape)
lost_detail = same_size_original - upscaled_shrunk_original_img

img += lost_detail
shrunk_original_img = resize_img(original_img, shape)

save_img(result_prefix + '.png', deprocess_image(np.copy(img)))

Optical character recognition

This example uses a convolutional stack followed by a recurrent stack and a CTC logloss function to
perform optical character recognition of generated text images. I have no evidence of whether it
actually learns general shapes of text, or is just able to recognize all the different fonts thrown at
it... the purpose is more to demonstrate CTC inside of Keras. Note that the font list may need to be
updated for the particular OS in use.
This starts off with 4 letter words. For the first 12 epochs, the difficulty is gradually increased using the
TextImageGenerator class, which is both a generator class for test/train data and a Keras callback class.
After 20 epochs, longer sequences are thrown at it by recompiling the model to handle a wider image
and rebuilding the word list to include two words separated by a space.
The table below shows normalized edit distance values. Theano uses a slightly different CTC
implementation, hence the different results.

Epoch   TF      TH
10      0.027   0.064
15      0.038   0.035
20      0.043   0.045
25      0.014   0.019
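
For reference, the normalized edit distance shown above is the raw edit distance divided by the length of the ground-truth string, as computed later in show_edit_distance. A minimal, illustrative use of the editdistance package (the strings here are made up):

import editdistance
truth, pred = 'keras', 'kers'
dist = editdistance.eval(pred, truth) # 1: one character missing
norm = dist / len(truth) # 0.2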

Additional dependencies
This example requires the cairo and editdistance packages:

First, install the Cairo library: https://cairographics.org/
Then install the Python dependencies:
pip install cairocffi
pip install editdistance

Created by Mike Henry https://github.com/mbhenry/


import os
import itertools
import codecs
import re
import datetime
import cairocffi as cairo
import editdistance
import numpy as np
from scipy import ndimage
import pylab
from keras import backend as K
from keras.layers.convolutional import Conv2D, MaxPooling2D
from keras.layers import Input, Dense, Activation
from keras.layers import Reshape, Lambda
from keras.layers.merge import add, concatenate
from keras.models import Model
from keras.layers.recurrent import GRU
from keras.optimizers import SGD
from keras.utils.data_utils import get_file
from keras.preprocessing import image
import keras.callbacks

OUTPUT_DIR = 'image_ocr'

# character classes and matching regex filter
regex = r'^[a-z ]+$'
alphabet = u'abcdefghijklmnopqrstuvwxyz '

np.random.seed(55)

# this creates larger "blotches" of noise which look
# more realistic than just adding gaussian noise
# assumes greyscale with pixels ranging from 0 to 1

def speckle(img):
severity = np.random.uniform(0, 0.6)
blur = ndimage.gaussian_filter(np.random.randn(*img.shape) * severity, 1)
img_speck = (img + blur)
img_speck[img_speck > 1] = 1
img_speck[img_speck <= 0] = 0
return img_speck

# paints the string in a random location within the bounding box
# also uses a random font, a slight random rotation,
# and a random amount of speckle noise

def paint_text(text, w, h, rotate=False, ud=False, multi_fonts=False):
surface = cairo.ImageSurface(cairo.FORMAT_RGB24, w, h)
with cairo.Context(surface) as context:
context.set_source_rgb(1, 1, 1) # White
context.paint()
# this font list works in CentOS 7
if multi_fonts:
fonts = [
'Century Schoolbook', 'Courier', 'STIX',
'URW Chancery L', 'FreeMono']
context.select_font_face(
np.random.choice(fonts),
cairo.FONT_SLANT_NORMAL,
np.random.choice([cairo.FONT_WEIGHT_BOLD,
cairo.FONT_WEIGHT_NORMAL]))
else:
context.select_font_face('Courier',
cairo.FONT_SLANT_NORMAL,
cairo.FONT_WEIGHT_BOLD)
context.set_font_size(25)
box = context.text_extents(text)
border_w_h = (4, 4)
if box[2] > (w - 2 * border_w_h[1]) or box[3] > (h - 2 * border_w_h[0]):
raise IOError(('Could not fit string into image. '
'Max char count is too large for given image width.'))

# teach the RNN translational invariance by
# fitting text box randomly on canvas, with some room to rotate
max_shift_x = w - box[2] - border_w_h[0]
max_shift_y = h - box[3] - border_w_h[1]
top_left_x = np.random.randint(0, int(max_shift_x))
if ud:
top_left_y = np.random.randint(0, int(max_shift_y))
else:
top_left_y = h // 2
context.move_to(top_left_x - int(box[0]), top_left_y - int(box[1]))
context.set_source_rgb(0, 0, 0)
context.show_text(text)

buf = surface.get_data()
a = np.frombuffer(buf, np.uint8)
a.shape = (h, w, 4)
a = a[:, :, 0] # grab single channel
a = a.astype(np.float32) / 255
a = np.expand_dims(a, 0)
if rotate:
a = image.random_rotation(a, 3 * (w - top_left_x) / w + 1)
a = speckle(a)

return a

def shuffle_mats_or_lists(matrix_list, stop_ind=None):
ret = []
assert all([len(i) == len(matrix_list[0]) for i in matrix_list])
len_val = len(matrix_list[0])
if stop_ind is None:
stop_ind = len_val
assert stop_ind <= len_val

a = list(range(stop_ind))
np.random.shuffle(a)
a += list(range(stop_ind, len_val))
for mat in matrix_list:
if isinstance(mat, np.ndarray):
ret.append(mat[a])
elif isinstance(mat, list):
ret.append([mat[i] for i in a])
else:
raise TypeError('`shuffle_mats_or_lists` only supports '
'numpy.array and list objects.')
return ret

# Translation of characters to unique integer values
def text_to_labels(text):
ret = []
for char in text:
ret.append(alphabet.find(char))
return ret

# Reverse translation of numerical classes back to characters
def labels_to_text(labels):
ret = []
for c in labels:
if c == len(alphabet): # CTC Blank
ret.append("")
else:
ret.append(alphabet[c])
return "".join(ret)

# only a-z and space... probably not too difficult
# to expand to uppercase and symbols
def is_valid_str(in_str):
search = re.compile(regex, re.UNICODE).search
return bool(search(in_str))

# Uses generator functions to supply train/test with
# data. Image renderings and text are created on the fly
# each time with random perturbations

class TextImageGenerator(keras.callbacks.Callback):

def __init__(self, monogram_file, bigram_file, minibatch_size,
img_w, img_h, downsample_factor, val_split,
absolute_max_string_len=16):

self.minibatch_size = minibatch_size
self.img_w = img_w
self.img_h = img_h
self.monogram_file = monogram_file
self.bigram_file = bigram_file
self.downsample_factor = downsample_factor
self.val_split = val_split
self.blank_label = self.get_output_size() - 1
self.absolute_max_string_len = absolute_max_string_len

def get_output_size(self):
return len(alphabet) + 1

# num_words can be independent of the epoch size due to the use of generators
# as max_string_len grows, num_words can grow
def build_word_list(self, num_words, max_string_len=None, mono_fraction=0.5):
assert max_string_len <= self.absolute_max_string_len
assert num_words % self.minibatch_size == 0
assert (self.val_split * num_words) % self.minibatch_size == 0
self.num_words = num_words
self.string_list = [''] * self.num_words
tmp_string_list = []
self.max_string_len = max_string_len
self.Y_data = np.ones([self.num_words, self.absolute_max_string_len]) * -1
self.X_text = []
self.Y_len = [0] * self.num_words

def _is_length_of_word_valid(word):
return (max_string_len == -1 or
max_string_len is None or
len(word) <= max_string_len)

# monogram file is sorted by frequency in english speech
with codecs.open(self.monogram_file, mode='r', encoding='utf-8') as f:
for line in f:
if len(tmp_string_list) == int(self.num_words * mono_fraction):
break
word = line.rstrip()
if _is_length_of_word_valid(word):
tmp_string_list.append(word)

# bigram file contains common word pairings in english speech
with codecs.open(self.bigram_file, mode='r', encoding='utf-8') as f:
lines = f.readlines()
for line in lines:
if len(tmp_string_list) == self.num_words:
break
columns = line.lower().split()
word = columns[0] + ' ' + columns[1]
if is_valid_str(word) and _is_length_of_word_valid(word):
tmp_string_list.append(word)
if len(tmp_string_list) != self.num_words:
raise IOError('Could not pull enough words '
'from supplied monogram and bigram files.')
# interlace to mix up the easy and hard words
self.string_list[::2] = tmp_string_list[:self.num_words // 2]
self.string_list[1::2] = tmp_string_list[self.num_words // 2:]

for i, word in enumerate(self.string_list):
self.Y_len[i] = len(word)
self.Y_data[i, 0:len(word)] = text_to_labels(word)
self.X_text.append(word)
self.Y_len = np.expand_dims(np.array(self.Y_len), 1)

self.cur_val_index = self.val_split
self.cur_train_index = 0

# each time an image is requested from train/val/test, a new random
# painting of the text is performed
def get_batch(self, index, size, train):
# width and height are backwards from typical Keras convention
# because width is the time dimension when it gets fed into the RNN
if K.image_data_format() == 'channels_first':
X_data = np.ones([size, 1, self.img_w, self.img_h])
else:
X_data = np.ones([size, self.img_w, self.img_h, 1])

labels = np.ones([size, self.absolute_max_string_len])
input_length = np.zeros([size, 1])
label_length = np.zeros([size, 1])
source_str = []
for i in range(size):
# Mix in some blank inputs. This seems to be important for
# achieving translational invariance
if train and i > size - 4:
if K.image_data_format() == 'channels_first':
X_data[i, 0, 0:self.img_w, :] = self.paint_func('')[0, :, :].T
else:
X_data[i, 0:self.img_w, :, 0] = self.paint_func('',)[0, :, :].T
labels[i, 0] = self.blank_label
input_length[i] = self.img_w // self.downsample_factor - 2
label_length[i] = 1
source_str.append('')
else:
if K.image_data_format() == 'channels_first':
X_data[i, 0, 0:self.img_w, :] = (
self.paint_func(self.X_text[index + i])[0, :, :].T)
else:
X_data[i, 0:self.img_w, :, 0] = (
self.paint_func(self.X_text[index + i])[0, :, :].T)
labels[i, :] = self.Y_data[index + i]
input_length[i] = self.img_w // self.downsample_factor - 2
label_length[i] = self.Y_len[index + i]
source_str.append(self.X_text[index + i])
inputs = {'the_input': X_data,
'the_labels': labels,
'input_length': input_length,
'label_length': label_length,
'source_str': source_str # used for visualization only
}
outputs = {'ctc': np.zeros([size])} # dummy data for dummy loss function
return (inputs, outputs)

def next_train(self):
while 1:
ret = self.get_batch(self.cur_train_index,
self.minibatch_size, train=True)
self.cur_train_index += self.minibatch_size
if self.cur_train_index >= self.val_split:
self.cur_train_index = self.cur_train_index % 32
(self.X_text, self.Y_data, self.Y_len) = shuffle_mats_or_lists(
[self.X_text, self.Y_data, self.Y_len], self.val_split)
yield ret

def next_val(self):
while 1:
ret = self.get_batch(self.cur_val_index,
self.minibatch_size, train=False)
self.cur_val_index += self.minibatch_size
if self.cur_val_index >= self.num_words:
self.cur_val_index = self.val_split + self.cur_val_index % 32
yield ret

def on_train_begin(self, logs={}):
self.build_word_list(16000, 4, 1)
self.paint_func = lambda text: paint_text(
text, self.img_w, self.img_h,
rotate=False, ud=False, multi_fonts=False)

def on_epoch_begin(self, epoch, logs={}):
# rebind the paint function to implement curriculum learning
if 3 <= epoch < 6:
self.paint_func = lambda text: paint_text(
text, self.img_w, self.img_h,
rotate=False, ud=True, multi_fonts=False)
elif 6 <= epoch < 9:
self.paint_func = lambda text: paint_text(
text, self.img_w, self.img_h,
rotate=False, ud=True, multi_fonts=True)
elif epoch >= 9:
self.paint_func = lambda text: paint_text(
text, self.img_w, self.img_h,
rotate=True, ud=True, multi_fonts=True)
if epoch >= 21 and self.max_string_len < 12:
self.build_word_list(32000, 12, 0.5)

# the actual loss calc occurs here despite it not being
# an internal Keras loss function
def ctc_lambda_func(args):
y_pred, labels, input_length, label_length = args
# the 2 is critical here since the first couple outputs of the RNN
# tend to be garbage:
y_pred = y_pred[:, 2:, :]
return K.ctc_batch_cost(labels, y_pred, input_length, label_length)

# For a real OCR application, this should be beam search with a dictionary
# and language model. For this example, best path is sufficient.

def decode_batch(test_func, word_batch):
out = test_func([word_batch])[0]
ret = []
for j in range(out.shape[0]):
out_best = list(np.argmax(out[j, 2:], 1))
out_best = [k for k, g in itertools.groupby(out_best)]
outstr = labels_to_text(out_best)
ret.append(outstr)
return ret
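
As an illustrative sketch only (not part of the original script), the beam search mentioned above can be approximated with the backend's CTC decoder, K.ctc_decode, with greedy=False; a real OCR application would still want a dictionary and language model on top:

def decode_batch_beam(test_func, word_batch, beam_width=100):
    # same softmax output as decode_batch, skipping the first 2 garbage steps
    out = test_func([word_batch])[0][:, 2:, :]
    seq_lens = np.full(out.shape[0], out.shape[1])
    decoded, _ = K.ctc_decode(out, input_length=seq_lens,
                              greedy=False, beam_width=beam_width,
                              top_paths=1)
    rows = K.get_value(decoded[0])  # decoded label rows, padded with -1
    return [labels_to_text([c for c in row if c >= 0]) for row in rows]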

class VizCallback(keras.callbacks.Callback):

def __init__(self, run_name, test_func, text_img_gen, num_display_words=6):
self.test_func = test_func
self.output_dir = os.path.join(
OUTPUT_DIR, run_name)
self.text_img_gen = text_img_gen
self.num_display_words = num_display_words
if not os.path.exists(self.output_dir):
os.makedirs(self.output_dir)

def show_edit_distance(self, num):
num_left = num
mean_norm_ed = 0.0
mean_ed = 0.0
while num_left > 0:
word_batch = next(self.text_img_gen)[0]
num_proc = min(word_batch['the_input'].shape[0], num_left)
decoded_res = decode_batch(self.test_func,
word_batch['the_input'][0:num_proc])
for j in range(num_proc):
edit_dist = editdistance.eval(decoded_res[j],
word_batch['source_str'][j])
mean_ed += float(edit_dist)
mean_norm_ed += float(edit_dist) / len(word_batch['source_str'][j])
num_left -= num_proc
mean_norm_ed = mean_norm_ed / num
mean_ed = mean_ed / num
print('\nOut of %d samples: Mean edit distance: '
'%.3f Mean normalized edit distance: %0.3f'
% (num, mean_ed, mean_norm_ed))

def on_epoch_end(self, epoch, logs={}):
self.model.save_weights(
os.path.join(self.output_dir, 'weights%02d.h5' % (epoch)))
self.show_edit_distance(256)
word_batch = next(self.text_img_gen)[0]
res = decode_batch(self.test_func,
word_batch['the_input'][0:self.num_display_words])
if word_batch['the_input'][0].shape[0] < 256:
cols = 2
else:
cols = 1
for i in range(self.num_display_words):
pylab.subplot(self.num_display_words // cols, cols, i + 1)
if K.image_data_format() == 'channels_first':
the_input = word_batch['the_input'][i, 0, :, :]
else:
the_input = word_batch['the_input'][i, :, :, 0]
pylab.imshow(the_input.T, cmap='Greys_r')
pylab.xlabel(
'Truth = \'%s\'\nDecoded = \'%s\'' %
(word_batch['source_str'][i], res[i]))
fig = pylab.gcf()
fig.set_size_inches(10, 13)
pylab.savefig(os.path.join(self.output_dir, 'e%02d.png' % (epoch)))
pylab.close()

def train(run_name, start_epoch, stop_epoch, img_w):
# Input Parameters
img_h = 64
words_per_epoch = 16000
val_split = 0.2
val_words = int(words_per_epoch * (val_split))

# Network parameters
conv_filters = 16
kernel_size = (3, 3)
pool_size = 2
time_dense_size = 32
rnn_size = 512
minibatch_size = 32

if K.image_data_format() == 'channels_first':
input_shape = (1, img_w, img_h)
else:
input_shape = (img_w, img_h, 1)

fdir = os.path.dirname(
get_file('wordlists.tgz',
origin='http://www.mythic-ai.com/datasets/wordlists.tgz',
untar=True))

img_gen = TextImageGenerator(
monogram_file=os.path.join(fdir, 'wordlist_mono_clean.txt'),
bigram_file=os.path.join(fdir, 'wordlist_bi_clean.txt'),
minibatch_size=minibatch_size,
img_w=img_w,
img_h=img_h,
downsample_factor=(pool_size ** 2),
val_split=words_per_epoch - val_words)
act = 'relu'
input_data = Input(name='the_input', shape=input_shape, dtype='float32')
inner = Conv2D(conv_filters, kernel_size, padding='same',
activation=act, kernel_initializer='he_normal',
name='conv1')(input_data)
inner = MaxPooling2D(pool_size=(pool_size, pool_size), name='max1')(inner)
inner = Conv2D(conv_filters, kernel_size, padding='same',
activation=act, kernel_initializer='he_normal',
name='conv2')(inner)
inner = MaxPooling2D(pool_size=(pool_size, pool_size), name='max2')(inner)

conv_to_rnn_dims = (img_w // (pool_size ** 2),
(img_h // (pool_size ** 2)) * conv_filters)
inner = Reshape(target_shape=conv_to_rnn_dims, name='reshape')(inner)

# cuts down input size going into RNN:
inner = Dense(time_dense_size, activation=act, name='dense1')(inner)

# Two layers of bidirectional GRUs
# GRU seems to work as well as, if not better than, LSTM:
gru_1 = GRU(rnn_size, return_sequences=True,
kernel_initializer='he_normal', name='gru1')(inner)
gru_1b = GRU(rnn_size, return_sequences=True,
go_backwards=True, kernel_initializer='he_normal',
name='gru1_b')(inner)
gru1_merged = add([gru_1, gru_1b])
gru_2 = GRU(rnn_size, return_sequences=True,
kernel_initializer='he_normal', name='gru2')(gru1_merged)
gru_2b = GRU(rnn_size, return_sequences=True, go_backwards=True,
kernel_initializer='he_normal', name='gru2_b')(gru1_merged)

# transforms RNN output to character activations:
inner = Dense(img_gen.get_output_size(), kernel_initializer='he_normal',
name='dense2')(concatenate([gru_2, gru_2b]))
y_pred = Activation('softmax', name='softmax')(inner)
Model(inputs=input_data, outputs=y_pred).summary()

labels = Input(name='the_labels',
shape=[img_gen.absolute_max_string_len], dtype='float32')
input_length = Input(name='input_length', shape=[1], dtype='int64')
label_length = Input(name='label_length', shape=[1], dtype='int64')
# Keras doesn't currently support loss funcs with extra parameters
# so CTC loss is implemented in a lambda layer
loss_out = Lambda(
ctc_lambda_func, output_shape=(1,),
name='ctc')([y_pred, labels, input_length, label_length])

# clipnorm seems to speed up convergence
sgd = SGD(learning_rate=0.02,
decay=1e-6,
momentum=0.9,
nesterov=True,
clipnorm=5)

model = Model(inputs=[input_data, labels, input_length, label_length],
outputs=loss_out)

# the loss calc occurs elsewhere, so use a dummy lambda func for the loss
model.compile(loss={'ctc': lambda y_true, y_pred: y_pred}, optimizer=sgd)
if start_epoch > 0:
weight_file = os.path.join(
OUTPUT_DIR,
os.path.join(run_name, 'weights%02d.h5' % (start_epoch - 1)))
model.load_weights(weight_file)
# captures output of softmax so we can decode the output during visualization
test_func = K.function([input_data], [y_pred])

viz_cb = VizCallback(run_name, test_func, img_gen.next_val())

model.fit_generator(
generator=img_gen.next_train(),
steps_per_epoch=(words_per_epoch - val_words) // minibatch_size,
epochs=stop_epoch,
validation_data=img_gen.next_val(),
validation_steps=val_words // minibatch_size,
callbacks=[viz_cb, img_gen],
initial_epoch=start_epoch)

if __name__ == '__main__':
run_name = datetime.datetime.now().strftime('%Y:%m:%d:%H:%M:%S')
train(run_name, 0, 20, 128)
# increase to wider images and start at epoch 20.
# The learned weights are reloaded
train(run_name, 20, 25, 512)

Trains a Bidirectional LSTM on the IMDB sentiment classification task.
Output after 4 epochs on CPU: ~0.8146. Time per epoch on CPU (Core i7): ~150s.
from __future__ import print_function
import numpy as np

from keras.preprocessing import sequence
from keras.models import Sequential
from keras.layers import Dense, Dropout, Embedding, LSTM, Bidirectional
from keras.datasets import imdb

max_features = 20000
# cut texts after this number of words
# (among top max_features most common words)
maxlen = 100
batch_size = 32

print('Loading data...')
(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=max_features)
print(len(x_train), 'train sequences')
print(len(x_test), 'test sequences')

print('Pad sequences (samples x time)')
x_train = sequence.pad_sequences(x_train, maxlen=maxlen)
x_test = sequence.pad_sequences(x_test, maxlen=maxlen)
print('x_train shape:', x_train.shape)
print('x_test shape:', x_test.shape)
y_train = np.array(y_train)
y_test = np.array(y_test)
model = Sequential()
model.add(Embedding(max_features, 128, input_length=maxlen))
model.add(Bidirectional(LSTM(64)))
model.add(Dropout(0.5))
model.add(Dense(1, activation='sigmoid'))

# try using different optimizers and different optimizer configs
model.compile('adam', 'binary_crossentropy', metrics=['accuracy'])

print('Train...')
model.fit(x_train, y_train,
batch_size=batch_size,
epochs=4,
validation_data=(x_test, y_test))

This example demonstrates the use of Convolution1D for text classification.
Gets to 0.89 test accuracy after 2 epochs.
90s/epoch on Intel i5 2.4GHz CPU.
10s/epoch on Tesla K40 GPU.
from __future__ import print_function

from keras.preprocessing import sequence
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation
from keras.layers import Embedding
from keras.layers import Conv1D, GlobalMaxPooling1D
from keras.datasets import imdb

# set parameters:
max_features = 5000
maxlen = 400
batch_size = 32
embedding_dims = 50
filters = 250
kernel_size = 3
hidden_dims = 250
epochs = 2

print('Loading data...')
(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=max_features)
print(len(x_train), 'train sequences')
print(len(x_test), 'test sequences')

print('Pad sequences (samples x time)')
x_train = sequence.pad_sequences(x_train, maxlen=maxlen)
x_test = sequence.pad_sequences(x_test, maxlen=maxlen)
print('x_train shape:', x_train.shape)
print('x_test shape:', x_test.shape)
print('Build model...')
model = Sequential()

# we start off with an efficient embedding layer which maps
# our vocab indices into embedding_dims dimensions
model.add(Embedding(max_features,
embedding_dims,
input_length=maxlen))
model.add(Dropout(0.2))

# we add a Convolution1D, which will learn
# word group filters of size filter_length:
model.add(Conv1D(filters,
kernel_size,
padding='valid',
activation='relu',
strides=1))
# we use max pooling:
model.add(GlobalMaxPooling1D())

# We add a vanilla hidden layer:
model.add(Dense(hidden_dims))
model.add(Dropout(0.2))
model.add(Activation('relu'))

# We project onto a single unit output layer, and squash it with a sigmoid:
model.add(Dense(1))
model.add(Activation('sigmoid'))

model.compile(loss='binary_crossentropy',
optimizer='adam',
metrics=['accuracy'])
model.fit(x_train, y_train,
batch_size=batch_size,
epochs=epochs,
validation_data=(x_test, y_test))

Train a recurrent convolutional network on the IMDB sentiment classification task.
Gets to 0.8498 test accuracy after 2 epochs. 41 s/epoch on K520 GPU.
from __future__ import print_function

from keras.preprocessing import sequence
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation
from keras.layers import Embedding
from keras.layers import LSTM
from keras.layers import Conv1D, MaxPooling1D
from keras.datasets import imdb

# Embedding
max_features = 20000
maxlen = 100
embedding_size = 128

# Convolution
kernel_size = 5
filters = 64
pool_size = 4

# LSTM
lstm_output_size = 70

# Training
batch_size = 30
epochs = 2

'''
Note:
batch_size is highly sensitive.
Only 2 epochs are needed as the dataset is very small.
'''

print('Loading data...')
(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=max_features)
print(len(x_train), 'train sequences')
print(len(x_test), 'test sequences')

print('Pad sequences (samples x time)')
x_train = sequence.pad_sequences(x_train, maxlen=maxlen)
x_test = sequence.pad_sequences(x_test, maxlen=maxlen)
print('x_train shape:', x_train.shape)
print('x_test shape:', x_test.shape)

print('Build model...')

model = Sequential()
model.add(Embedding(max_features, embedding_size, input_length=maxlen))
model.add(Dropout(0.25))
model.add(Conv1D(filters,
kernel_size,
padding='valid',
activation='relu',
strides=1))
model.add(MaxPooling1D(pool_size=pool_size))
model.add(LSTM(lstm_output_size))
model.add(Dense(1))
model.add(Activation('sigmoid'))

model.compile(loss='binary_crossentropy',
optimizer='adam',
metrics=['accuracy'])

print('Train...')
model.fit(x_train, y_train,
batch_size=batch_size,
epochs=epochs,
validation_data=(x_test, y_test))
score, acc = model.evaluate(x_test, y_test, batch_size=batch_size)
print('Test score:', score)
print('Test accuracy:', acc)

This example demonstrates the use of fasttext for text classification.
Based on Joulin et al's paper:
Bags of Tricks for Efficient Text Classification
Results on IMDB datasets with uni and bi-gram embeddings:

Embedding   Accuracy, 5 epochs   Speed (s/epoch)   Hardware
Uni-gram    0.8813               8                 i7 CPU
Bi-gram     0.9056               2                 GTX 980M GPU
from __future__ import print_function
import numpy as np

from keras.preprocessing import sequence
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Embedding
from keras.layers import GlobalAveragePooling1D
from keras.datasets import imdb

def create_ngram_set(input_list, ngram_value=2):
"""
Extract a set of n-grams from a list of integers.

>>> create_ngram_set([1, 4, 9, 4, 1, 4], ngram_value=2)
{(4, 9), (4, 1), (1, 4), (9, 4)}

>>> create_ngram_set([1, 4, 9, 4, 1, 4], ngram_value=3)
{(1, 4, 9), (4, 9, 4), (9, 4, 1), (4, 1, 4)}
"""
return set(zip(*[input_list[i:] for i in range(ngram_value)]))

def add_ngram(sequences, token_indice, ngram_range=2):
"""
Augment the input list of list (sequences) by appending n-grams values.

Example: adding bi-gram
>>> sequences = [[1, 3, 4, 5], [1, 3, 7, 9, 2]]
>>> token_indice = {(1, 3): 1337, (9, 2): 42, (4, 5): 2017}
>>> add_ngram(sequences, token_indice, ngram_range=2)
[[1, 3, 4, 5, 1337, 2017], [1, 3, 7, 9, 2, 1337, 42]]
Example: adding tri-gram
>>> sequences = [[1, 3, 4, 5], [1, 3, 7, 9, 2]]
>>> token_indice = {(1, 3): 1337, (9, 2): 42, (4, 5): 2017, (7, 9, 2): 2018}
>>> add_ngram(sequences, token_indice, ngram_range=3)
[[1, 3, 4, 5, 1337, 2017], [1, 3, 7, 9, 2, 1337, 42, 2018]]
"""
new_sequences = []
for input_list in sequences:
new_list = input_list[:]
for ngram_value in range(2, ngram_range + 1):
for i in range(len(new_list) - ngram_value + 1):
ngram = tuple(new_list[i:i + ngram_value])
if ngram in token_indice:
new_list.append(token_indice[ngram])
new_sequences.append(new_list)

return new_sequences

# Set parameters:
# ngram_range = 2 will add bi-grams features
ngram_range = 1
max_features = 20000
maxlen = 400
batch_size = 32
embedding_dims = 50
epochs = 5

print('Loading data...')
(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=max_features)
print(len(x_train), 'train sequences')
print(len(x_test), 'test sequences')
print('Average train sequence length: {}'.format(
np.mean(list(map(len, x_train)), dtype=int)))
print('Average test sequence length: {}'.format(
np.mean(list(map(len, x_test)), dtype=int)))

if ngram_range > 1:
print('Adding {}-gram features'.format(ngram_range))
# Create set of unique n-gram from the training set.
ngram_set = set()
for input_list in x_train:
for i in range(2, ngram_range + 1):
set_of_ngram = create_ngram_set(input_list, ngram_value=i)
ngram_set.update(set_of_ngram)

# Dictionary mapping n-gram token to a unique integer.
# Integer values are greater than max_features in order
# to avoid collision with existing features.
start_index = max_features + 1
token_indice = {v: k + start_index for k, v in enumerate(ngram_set)}
indice_token = {token_indice[k]: k for k in token_indice}

# max_features is the highest integer that could be found in the dataset.
max_features = np.max(list(indice_token.keys())) + 1

# Augmenting x_train and x_test with n-grams features
x_train = add_ngram(x_train, token_indice, ngram_range)
x_test = add_ngram(x_test, token_indice, ngram_range)
print('Average train sequence length: {}'.format(
np.mean(list(map(len, x_train)), dtype=int)))
print('Average test sequence length: {}'.format(
np.mean(list(map(len, x_test)), dtype=int)))

print('Pad sequences (samples x time)')
x_train = sequence.pad_sequences(x_train, maxlen=maxlen)
x_test = sequence.pad_sequences(x_test, maxlen=maxlen)
print('x_train shape:', x_train.shape)
print('x_test shape:', x_test.shape)

print('Build model...')
model = Sequential()

# we start off with an efficient embedding layer which maps
# our vocab indices into embedding_dims dimensions
model.add(Embedding(max_features,
embedding_dims,
input_length=maxlen))

# we add a GlobalAveragePooling1D, which will average the embeddings
# of all words in the document
model.add(GlobalAveragePooling1D())

# We project onto a single unit output layer, and squash it with a sigmoid:
model.add(Dense(1, activation='sigmoid'))

model.compile(loss='binary_crossentropy',
optimizer='adam',
metrics=['accuracy'])

model.fit(x_train, y_train,
batch_size=batch_size,
epochs=epochs,
validation_data=(x_test, y_test))

Trains an LSTM model on the IMDB sentiment classification task.
The dataset is actually too small for LSTM to be of any advantage compared to simpler, much faster
methods such as TF-IDF + LogReg.
Notes
• RNNs are tricky. Choice of batch size is important, choice of loss and optimizer is critical, etc.
Some configurations won't converge.
• LSTM loss decrease patterns during training can be quite different from what you see with
CNNs/MLPs/etc.
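When a configuration refuses to converge, one knob worth trying is gradient clipping. The snippet
below is an illustration, not part of the original script; the clipnorm value of 1.0 is an arbitrary
starting point.

from keras.optimizers import Adam

# Clip the global gradient norm to 1.0 to tame exploding RNN gradients;
# pass the result to model.compile() in place of the 'adam' string.
clipped_adam = Adam(clipnorm=1.0)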
from __future__ import print_function
from keras.preprocessing import sequence
from keras.models import Sequential
from keras.layers import Dense, Embedding
from keras.layers import LSTM
from keras.datasets import imdb

max_features = 20000
# cut texts after this number of words (among top max_features most common words)
maxlen = 80
batch_size = 32

print('Loading data...')
(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=max_features)
print(len(x_train), 'train sequences')
print(len(x_test), 'test sequences')

print('Pad sequences (samples x time)')
x_train = sequence.pad_sequences(x_train, maxlen=maxlen)
x_test = sequence.pad_sequences(x_test, maxlen=maxlen)
print('x_train shape:', x_train.shape)
print('x_test shape:', x_test.shape)

print('Build model...')
model = Sequential()
model.add(Embedding(max_features, 128))
model.add(LSTM(128, dropout=0.2, recurrent_dropout=0.2))
model.add(Dense(1, activation='sigmoid'))

# try using different optimizers and different optimizer configs
model.compile(loss='binary_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])

print('Train...')
model.fit(x_train, y_train,
          batch_size=batch_size,
          epochs=15,
          validation_data=(x_test, y_test))
score, acc = model.evaluate(x_test, y_test,
                            batch_size=batch_size)
print('Test score:', score)
print('Test accuracy:', acc)

Sequence to sequence example in Keras (character-level).
This script demonstrates how to implement a basic character-level sequence-to-sequence model. We
apply it to translating short English sentences into short French sentences, character by character. Note
that it is fairly unusual to do character-level machine translation, as word-level models are more
common in this domain.
Summary of the algorithm
• We start with input sequences from a domain (e.g. English sentences) and corresponding target
sequences from another domain (e.g. French sentences).
• An encoder LSTM turns input sequences into 2 state vectors (we keep the last LSTM state and
discard the outputs).
• A decoder LSTM is trained to turn the target sequences into the same sequence but offset by one
timestep in the future, a training process called "teacher forcing" in this context. It uses as initial
state the state vectors from the encoder. Effectively, the decoder learns to generate
targets[t+1...] given targets[...t], conditioned on the input sequence.
• In inference mode, when we want to decode unknown input sequences, we:
• Encode the input sequence into state vectors
• Start with a target sequence of size 1 (just the start-of-sequence character)
• Feed the state vectors and 1-char target sequence to the decoder to produce predictions
for the next character
• Sample the next character using these predictions (we simply use argmax)
• Append the sampled character to the target sequence
• Repeat until we generate the end-of-sequence character or we hit the character limit
(this loop is condensed in the sketch below)
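Condensed, the inference loop just described looks like the sketch below. It paraphrases the
decode_sequence function defined later in this script; encoder_model, decoder_model and the token
indices are built further down, and one_hot_seed / one_hot are hypothetical helpers standing in for
the np.zeros-plus-index-assignment idiom used there.

states = encoder_model.predict(input_seq)        # encode input into [h, c]
target_seq = one_hot_seed('\t')                  # length-1 start char (hypothetical helper)
decoded = ''
while True:
    probs, h, c = decoder_model.predict([target_seq] + states)
    char = reverse_target_char_index[np.argmax(probs[0, -1, :])]
    decoded += char
    if char == '\n' or len(decoded) > max_decoder_seq_length:
        break                                    # stop character or length limit reached
    target_seq = one_hot(char)                   # feed the prediction back in (hypothetical helper)
    states = [h, c]                              # carry the decoder states forward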
Data download
English to French sentence pairs.
Lots of neat sentence pairs datasets.
References
• Sequence to Sequence Learning with Neural Networks
• Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine
Translation
from __future__ import print_function

from keras.models import Model
from keras.layers import Input, LSTM, Dense
import numpy as np

batch_size = 64  # Batch size for training.
epochs = 100  # Number of epochs to train for.
latent_dim = 256  # Latent dimensionality of the encoding space.
num_samples = 10000  # Number of samples to train on.
# Path to the data txt file on disk.
data_path = 'fra-eng/fra.txt'

# Vectorize the data.
input_texts = []
target_texts = []
input_characters = set()
target_characters = set()
with open(data_path, 'r', encoding='utf-8') as f:
    lines = f.read().split('\n')
for line in lines[: min(num_samples, len(lines) - 1)]:
    input_text, target_text = line.split('\t')
    # We use "tab" as the "start sequence" character
    # for the targets, and "\n" as "end sequence" character.
    target_text = '\t' + target_text + '\n'
    input_texts.append(input_text)
    target_texts.append(target_text)
    for char in input_text:
        if char not in input_characters:
            input_characters.add(char)
    for char in target_text:
        if char not in target_characters:
            target_characters.add(char)

input_characters = sorted(list(input_characters))
target_characters = sorted(list(target_characters))
num_encoder_tokens = len(input_characters)
num_decoder_tokens = len(target_characters)
max_encoder_seq_length = max([len(txt) for txt in input_texts])
max_decoder_seq_length = max([len(txt) for txt in target_texts])

print('Number of samples:', len(input_texts))
print('Number of unique input tokens:', num_encoder_tokens)
print('Number of unique output tokens:', num_decoder_tokens)
print('Max sequence length for inputs:', max_encoder_seq_length)
print('Max sequence length for outputs:', max_decoder_seq_length)

input_token_index = dict(
    [(char, i) for i, char in enumerate(input_characters)])
target_token_index = dict(
    [(char, i) for i, char in enumerate(target_characters)])

encoder_input_data = np.zeros(
    (len(input_texts), max_encoder_seq_length, num_encoder_tokens),
    dtype='float32')
decoder_input_data = np.zeros(
    (len(input_texts), max_decoder_seq_length, num_decoder_tokens),
    dtype='float32')
decoder_target_data = np.zeros(
    (len(input_texts), max_decoder_seq_length, num_decoder_tokens),
    dtype='float32')

for i, (input_text, target_text) in enumerate(zip(input_texts, target_texts)):
    for t, char in enumerate(input_text):
        encoder_input_data[i, t, input_token_index[char]] = 1.
    encoder_input_data[i, t + 1:, input_token_index[' ']] = 1.
    for t, char in enumerate(target_text):
        # decoder_target_data is ahead of decoder_input_data by one timestep
        decoder_input_data[i, t, target_token_index[char]] = 1.
        if t > 0:
            # decoder_target_data will be ahead by one timestep
            # and will not include the start character.
            decoder_target_data[i, t - 1, target_token_index[char]] = 1.
    decoder_input_data[i, t + 1:, target_token_index[' ']] = 1.
    decoder_target_data[i, t:, target_token_index[' ']] = 1.
# Define an input sequence and process it.
encoder_inputs = Input(shape=(None, num_encoder_tokens))
encoder = LSTM(latent_dim, return_state=True)
encoder_outputs, state_h, state_c = encoder(encoder_inputs)
# We discard `encoder_outputs` and only keep the states.
encoder_states = [state_h, state_c]

# Set up the decoder, using `encoder_states` as initial state.
decoder_inputs = Input(shape=(None, num_decoder_tokens))
# We set up our decoder to return full output sequences,
# and to return internal states as well. We don't use the
# return states in the training model, but we will use them in inference.
decoder_lstm = LSTM(latent_dim, return_sequences=True, return_state=True)
decoder_outputs, _, _ = decoder_lstm(decoder_inputs,
                                     initial_state=encoder_states)
decoder_dense = Dense(num_decoder_tokens, activation='softmax')
decoder_outputs = decoder_dense(decoder_outputs)

# Define the model that will turn
# `encoder_input_data` & `decoder_input_data` into `decoder_target_data`
model = Model([encoder_inputs, decoder_inputs], decoder_outputs)

# Run training
model.compile(optimizer='rmsprop', loss='categorical_crossentropy',
              metrics=['accuracy'])
model.fit([encoder_input_data, decoder_input_data], decoder_target_data,
          batch_size=batch_size,
          epochs=epochs,
          validation_split=0.2)
# Save model
model.save('s2s.h5')

# Next: inference mode (sampling).
# Here's the drill:
# 1) encode input and retrieve initial decoder state
# 2) run one step of decoder with this initial state
#    and a "start of sequence" token as target.
#    Output will be the next target token
# 3) Repeat with the current target token and current states

# Define sampling models
encoder_model = Model(encoder_inputs, encoder_states)

decoder_state_input_h = Input(shape=(latent_dim,))
decoder_state_input_c = Input(shape=(latent_dim,))
decoder_states_inputs = [decoder_state_input_h, decoder_state_input_c]
decoder_outputs, state_h, state_c = decoder_lstm(
    decoder_inputs, initial_state=decoder_states_inputs)
decoder_states = [state_h, state_c]
decoder_outputs = decoder_dense(decoder_outputs)
decoder_model = Model(
    [decoder_inputs] + decoder_states_inputs,
    [decoder_outputs] + decoder_states)

# Reverse-lookup token index to decode sequences back to
# something readable.
reverse_input_char_index = dict(
    (i, char) for char, i in input_token_index.items())
reverse_target_char_index = dict(
    (i, char) for char, i in target_token_index.items())

def decode_sequence(input_seq):
    # Encode the input as state vectors.
    states_value = encoder_model.predict(input_seq)

    # Generate empty target sequence of length 1.
    target_seq = np.zeros((1, 1, num_decoder_tokens))
    # Populate the first character of target sequence with the start character.
    target_seq[0, 0, target_token_index['\t']] = 1.

    # Sampling loop for a batch of sequences
    # (to simplify, here we assume a batch of size 1).
    stop_condition = False
    decoded_sentence = ''
    while not stop_condition:
        output_tokens, h, c = decoder_model.predict(
            [target_seq] + states_value)

        # Sample a token
        sampled_token_index = np.argmax(output_tokens[0, -1, :])
        sampled_char = reverse_target_char_index[sampled_token_index]
        decoded_sentence += sampled_char

        # Exit condition: either hit max length
        # or find stop character.
        if (sampled_char == '\n' or
           len(decoded_sentence) > max_decoder_seq_length):
            stop_condition = True

        # Update the target sequence (of length 1).
        target_seq = np.zeros((1, 1, num_decoder_tokens))
        target_seq[0, 0, sampled_token_index] = 1.

        # Update states
        states_value = [h, c]

    return decoded_sentence


for seq_index in range(100):
    # Take one sequence (part of the training set)
    # for trying out decoding.
    input_seq = encoder_input_data[seq_index: seq_index + 1]
    decoded_sentence = decode_sequence(input_seq)
    print('-')
    print('Input sentence:', input_texts[seq_index])
    print('Decoded sentence:', decoded_sentence)

Restore a character-level sequence to sequence model from disk and use it to generate predictions.
This script loads the s2s.h5 model saved by lstm_seq2seq.py and generates sequences from it. It
assumes that no changes have been made (for example: latent_dim is unchanged, and the input data
and model architecture are unchanged).
See lstm_seq2seq.py for more details on the model architecture and how it is trained.
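Because the inference graph below is rebuilt from fixed layer indices (model.layers[2] for the
encoder LSTM, model.layers[3] and model.layers[4] for the decoder LSTM and Dense), a quick sanity
check after loading is to print the restored topology. A minimal sketch, not part of the original
script:

from keras.models import load_model

model = load_model('s2s.h5')
model.summary()  # confirm the layer order matches the indices used below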
from __future__ import print_function

from keras.models import Model, load_model
from keras.layers import Input
import numpy as np

batch_size = 64  # Batch size for training.
epochs = 100  # Number of epochs to train for.
latent_dim = 256  # Latent dimensionality of the encoding space.
num_samples = 10000  # Number of samples to train on.
# Path to the data txt file on disk.
data_path = 'fra-eng/fra.txt'

# Vectorize the data. We use the same approach as the training script.
# NOTE: the data must be identical, in order for the character -> integer
# mappings to be consistent.
# We omit encoding target_texts since they are not needed.
input_texts = []
target_texts = []
input_characters = set()
target_characters = set()
with open(data_path, 'r', encoding='utf-8') as f:
    lines = f.read().split('\n')
for line in lines[: min(num_samples, len(lines) - 1)]:
    input_text, target_text = line.split('\t')
    # We use "tab" as the "start sequence" character
    # for the targets, and "\n" as "end sequence" character.
    target_text = '\t' + target_text + '\n'
    input_texts.append(input_text)
    target_texts.append(target_text)
    for char in input_text:
        if char not in input_characters:
            input_characters.add(char)
    for char in target_text:
        if char not in target_characters:
            target_characters.add(char)

input_characters = sorted(list(input_characters))
target_characters = sorted(list(target_characters))
num_encoder_tokens = len(input_characters)
num_decoder_tokens = len(target_characters)
max_encoder_seq_length = max([len(txt) for txt in input_texts])
max_decoder_seq_length = max([len(txt) for txt in target_texts])

print('Number of samples:', len(input_texts))
print('Number of unique input tokens:', num_encoder_tokens)
print('Number of unique output tokens:', num_decoder_tokens)
print('Max sequence length for inputs:', max_encoder_seq_length)
print('Max sequence length for outputs:', max_decoder_seq_length)

input_token_index = dict(
    [(char, i) for i, char in enumerate(input_characters)])
target_token_index = dict(
    [(char, i) for i, char in enumerate(target_characters)])

encoder_input_data = np.zeros(
    (len(input_texts), max_encoder_seq_length, num_encoder_tokens),
    dtype='float32')

for i, input_text in enumerate(input_texts):
    for t, char in enumerate(input_text):
        encoder_input_data[i, t, input_token_index[char]] = 1.

# Restore the model and construct the encoder and decoder.
model = load_model('s2s.h5')

encoder_inputs = model.input[0]   # input_1
encoder_outputs, state_h_enc, state_c_enc = model.layers[2].output   # lstm_1
encoder_states = [state_h_enc, state_c_enc]
encoder_model = Model(encoder_inputs, encoder_states)

decoder_inputs = model.input[1]   # input_2
decoder_state_input_h = Input(shape=(latent_dim,), name='input_3')
decoder_state_input_c = Input(shape=(latent_dim,), name='input_4')
decoder_states_inputs = [decoder_state_input_h, decoder_state_input_c]
decoder_lstm = model.layers[3]
decoder_outputs, state_h_dec, state_c_dec = decoder_lstm(
    decoder_inputs, initial_state=decoder_states_inputs)
decoder_states = [state_h_dec, state_c_dec]
decoder_dense = model.layers[4]
decoder_outputs = decoder_dense(decoder_outputs)
decoder_model = Model(
    [decoder_inputs] + decoder_states_inputs,
    [decoder_outputs] + decoder_states)

# Reverse-lookup token index to decode sequences back to
# something readable.
reverse_input_char_index = dict(
    (i, char) for char, i in input_token_index.items())
reverse_target_char_index = dict(
    (i, char) for char, i in target_token_index.items())

# Decodes an input sequence. Future work should support beam search.
def decode_sequence(input_seq):
    # Encode the input as state vectors.
    states_value = encoder_model.predict(input_seq)

    # Generate empty target sequence of length 1.
    target_seq = np.zeros((1, 1, num_decoder_tokens))
    # Populate the first character of target sequence with the start character.
    target_seq[0, 0, target_token_index['\t']] = 1.

    # Sampling loop for a batch of sequences
    # (to simplify, here we assume a batch of size 1).
    stop_condition = False
    decoded_sentence = ''
    while not stop_condition:
        output_tokens, h, c = decoder_model.predict(
            [target_seq] + states_value)

        # Sample a token
        sampled_token_index = np.argmax(output_tokens[0, -1, :])
        sampled_char = reverse_target_char_index[sampled_token_index]
        decoded_sentence += sampled_char

        # Exit condition: either hit max length
        # or find stop character.
        if (sampled_char == '\n' or
           len(decoded_sentence) > max_decoder_seq_length):
            stop_condition = True

        # Update the target sequence (of length 1).
        target_seq = np.zeros((1, 1, num_decoder_tokens))
        target_seq[0, 0, sampled_token_index] = 1.

        # Update states
        states_value = [h, c]

    return decoded_sentence


for seq_index in range(100):
    # Take one sequence (part of the training set)
    # for trying out decoding.
    input_seq = encoder_input_data[seq_index: seq_index + 1]
    decoded_sentence = decode_sequence(input_seq)
    print('-')
    print('Input sentence:', input_texts[seq_index])
    print('Decoded sentence:', decoded_sentence)

How to use a stateful LSTM model, stateful vs stateless LSTM performance comparison
More documentation about the Keras LSTM model
The models are trained on an input/output pair, where the input is a generated uniformly distributed
random sequence of length = input_len, and the output is a moving average of the input with
window length = tsteps. Both input_len and tsteps are defined in the "editable parameters"
section.
A larger tsteps value means that the LSTM will need more memory to figure out the input-output
relationship. This memory length is controlled by the lahead variable (more details below).

The rest of the parameters are:
• input_len: the length of the generated input sequence
• lahead: the input sequence length that the LSTM is trained on for each output point
• batch_size, epochs: same parameters as in the model.fit(...) function

When lahead > 1, the model input is preprocessed to a "rolling window view" of the data, with the
window length = lahead. This is similar to scikit-image's view_as_windows with window_shape
being a single number (a miniature of this preprocessing is sketched below).
When lahead < tsteps, only the stateful LSTM converges, because its statefulness allows it to see
beyond the capability that lahead gave it to fit the n-point average. The stateless LSTM does not have
this capability, and hence is limited by its lahead parameter, which is not sufficient to see the n-point
average.
When lahead >= tsteps, both the stateful and stateless LSTM converge.
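For concreteness, here is a miniature of the two preprocessing steps just described, using the same
pandas idioms as the script below; it is an illustration, not part of the original script:

import numpy as np
import pandas as pd

tsteps, lahead = 2, 3
x = pd.DataFrame(np.arange(1.0, 6.0))      # input series: [1, 2, 3, 4, 5]

# target = tsteps-point moving average: [NaN, 1.5, 2.5, 3.5, 4.5]
y = x.rolling(window=tsteps, center=False).mean()

# "rolling window view": repeat the column lahead times, then shift
# column i down by i steps, so row t holds [x[t], x[t-1], x[t-2]]
xw = pd.DataFrame(np.repeat(x.values, repeats=lahead, axis=1))
for i, c in enumerate(xw.columns):
    xw[c] = xw[c].shift(i)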
from __future__ import print_function
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from keras.models import Sequential
from keras.layers import Dense, LSTM

# ----------------------------------------------------------
# EDITABLE PARAMETERS
# Read the documentation in the script head for more details
# ----------------------------------------------------------

# length of input
input_len = 1000

# The window length of the moving average used to generate
# the output from the input in the input/output pair used
# to train the LSTM
# e.g. if tsteps=2 and input=[1, 2, 3, 4, 5],
#      then output=[1.5, 2.5, 3.5, 4.5]
tsteps = 2

# The input sequence length that the LSTM is trained on for each output point
lahead = 1

# training parameters passed to "model.fit(...)"
batch_size = 1
epochs = 10

# ------------
# MAIN PROGRAM
# ------------

print("*" * 33)
if lahead >= tsteps:
    print("STATELESS LSTM WILL ALSO CONVERGE")
else:
    print("STATELESS LSTM WILL NOT CONVERGE")
print("*" * 33)

np.random.seed(1986)

print('Generating Data...')


def gen_uniform_amp(amp=1, xn=10000):
    """Generates uniform random data between
    -amp and +amp
    and of length xn

    # Arguments
        amp: maximum/minimum range of uniform data
        xn: length of series
    """
    data_input = np.random.uniform(-1 * amp, +1 * amp, xn)
    data_input = pd.DataFrame(data_input)
    return data_input

# Since the output is a moving average of the input,
# the first few points of output will be NaN
# and will be dropped from the generated data
# before training the LSTM.
# Also, when lahead > 1,
# the preprocessing step later of "rolling window view"
# will also cause some points to be lost.
# For aesthetic reasons,
# in order to maintain generated data length = input_len after pre-processing,
# add a few points to account for the values that will be lost.
to_drop = max(tsteps - 1, lahead - 1)
data_input = gen_uniform_amp(amp=0.1, xn=input_len + to_drop)

# set the target to be a N-point average of the input
expected_output = data_input.rolling(window=tsteps, center=False).mean()

# when lahead > 1, need to convert the input to "rolling window view"
# https://docs.scipy.org/doc/numpy/reference/generated/numpy.repeat.html
if lahead > 1:
    data_input = np.repeat(data_input.values, repeats=lahead, axis=1)
    data_input = pd.DataFrame(data_input)
    for i, c in enumerate(data_input.columns):
        data_input[c] = data_input[c].shift(i)

# drop the nan
expected_output = expected_output[to_drop:]
data_input = data_input[to_drop:]

print('Input shape:', data_input.shape)
print('Output shape:', expected_output.shape)
print('Input head: ')
print(data_input.head())
print('Output head: ')
print(expected_output.head())
print('Input tail: ')
print(data_input.tail())
print('Output tail: ')
print(expected_output.tail())

print('Plotting input and expected output')
plt.plot(data_input[0][:10], '.')
plt.plot(expected_output[0][:10], '-')
plt.legend(['Input', 'Expected output'])
plt.title('Input')
plt.show()

def create_model(stateful):
    model = Sequential()
    model.add(LSTM(20,
                   input_shape=(lahead, 1),
                   batch_size=batch_size,
                   stateful=stateful))
    model.add(Dense(1))
    model.compile(loss='mse', optimizer='adam')
    return model


print('Creating Stateful Model...')
model_stateful = create_model(stateful=True)


# split train/test data
def split_data(x, y, ratio=0.8):
    to_train = int(input_len * ratio)
    # tweak to match with batch_size
    to_train -= to_train % batch_size

    x_train = x[:to_train]
    y_train = y[:to_train]
    x_test = x[to_train:]
    y_test = y[to_train:]

    # tweak to match with batch_size
    to_drop = x.shape[0] % batch_size
    if to_drop > 0:
        x_test = x_test[:-1 * to_drop]
        y_test = y_test[:-1 * to_drop]

    # some reshaping
    reshape_3 = lambda x: x.values.reshape((x.shape[0], x.shape[1], 1))
    x_train = reshape_3(x_train)
    x_test = reshape_3(x_test)

    reshape_2 = lambda x: x.values.reshape((x.shape[0], 1))
    y_train = reshape_2(y_train)
    y_test = reshape_2(y_test)

    return (x_train, y_train), (x_test, y_test)


(x_train, y_train), (x_test, y_test) = split_data(data_input, expected_output)
print('x_train.shape: ', x_train.shape)
print('y_train.shape: ', y_train.shape)
print('x_test.shape: ', x_test.shape)
print('y_test.shape: ', y_test.shape)

print('Training')
for i in range(epochs):
    print('Epoch', i + 1, '/', epochs)
    # Note that the last state for sample i in a batch will
    # be used as initial state for sample i in the next batch.
    # Thus we are simultaneously training on batch_size series with
    # lower resolution than the original series contained in data_input.
    # Each of these series are offset by one step and can be
    # extracted with data_input[i::batch_size].
    model_stateful.fit(x_train,
                       y_train,
                       batch_size=batch_size,
                       epochs=1,
                       verbose=1,
                       validation_data=(x_test, y_test),
                       shuffle=False)
    model_stateful.reset_states()

print('Predicting')
predicted_stateful = model_stateful.predict(x_test, batch_size=batch_size)

print('Creating Stateless Model...')
model_stateless = create_model(stateful=False)

print('Training')
model_stateless.fit(x_train,
                    y_train,
                    batch_size=batch_size,
                    epochs=epochs,
                    verbose=1,
                    validation_data=(x_test, y_test),
                    shuffle=False)

print('Predicting')
predicted_stateless = model_stateless.predict(x_test, batch_size=batch_size)

# ----------------------------

print('Plotting Results')
plt.subplot(3, 1, 1)
plt.plot(y_test)
plt.title('Expected')
plt.subplot(3, 1, 2)
# drop the first "tsteps-1" because it is not possible to predict them
# since the "previous" timesteps to use do not exist
plt.plot((y_test - predicted_stateful).flatten()[tsteps - 1:])
plt.title('Stateful: Expected - Predicted')
plt.subplot(3, 1, 3)
plt.plot((y_test - predicted_stateless).flatten())
plt.title('Stateless: Expected - Predicted')
plt.show()

Example script to generate text from Nietzsche's writings.
At least 20 epochs are required before the generated text starts sounding coherent.
It is recommended to run this script on GPU, as recurrent networks are quite computationally intensive.
If you try this script on new data, make sure your corpus has at least ~100k characters. ~1M is better.
from __future__ import print_function
from keras.callbacks import LambdaCallback
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
from keras.optimizers import RMSprop
from keras.utils.data_utils import get_file
import numpy as np
import random
import sys
import io

path = get_file(
    'nietzsche.txt',
    origin='https://s3.amazonaws.com/text-datasets/nietzsche.txt')
with io.open(path, encoding='utf-8') as f:
    text = f.read().lower()
print('corpus length:', len(text))

chars = sorted(list(set(text)))
print('total chars:', len(chars))
char_indices = dict((c, i) for i, c in enumerate(chars))
indices_char = dict((i, c) for i, c in enumerate(chars))

# cut the text in semi-redundant sequences of maxlen characters
maxlen = 40
step = 3
sentences = []
next_chars = []
for i in range(0, len(text) - maxlen, step):
    sentences.append(text[i: i + maxlen])
    next_chars.append(text[i + maxlen])
print('nb sequences:', len(sentences))

print('Vectorization...')
# use the plain bool dtype (the np.bool alias has been removed from numpy)
x = np.zeros((len(sentences), maxlen, len(chars)), dtype=bool)
y = np.zeros((len(sentences), len(chars)), dtype=bool)
for i, sentence in enumerate(sentences):
    for t, char in enumerate(sentence):
        x[i, t, char_indices[char]] = 1
    y[i, char_indices[next_chars[i]]] = 1

# build the model: a single LSTM
print('Build model...')
model = Sequential()
model.add(LSTM(128, input_shape=(maxlen, len(chars))))
model.add(Dense(len(chars), activation='softmax'))

optimizer = RMSprop(learning_rate=0.01)
model.compile(loss='categorical_crossentropy', optimizer=optimizer)

def sample(preds, temperature=1.0):
    # helper function to sample an index from a probability array;
    # temperature < 1 sharpens the distribution toward the argmax
    # (more conservative text), temperature > 1 flattens it toward
    # uniform (more surprising text).
    preds = np.asarray(preds).astype('float64')
    preds = np.log(preds) / temperature
    exp_preds = np.exp(preds)
    preds = exp_preds / np.sum(exp_preds)
    probas = np.random.multinomial(1, preds, 1)
    return np.argmax(probas)

def on_epoch_end(epoch, _):
    # Function invoked at end of each epoch. Prints generated text.
    print()
    print('----- Generating text after Epoch: %d' % epoch)

    start_index = random.randint(0, len(text) - maxlen - 1)
    for diversity in [0.2, 0.5, 1.0, 1.2]:
        print('----- diversity:', diversity)

        generated = ''
        sentence = text[start_index: start_index + maxlen]
        generated += sentence
        print('----- Generating with seed: "' + sentence + '"')
        sys.stdout.write(generated)

        for i in range(400):
            x_pred = np.zeros((1, maxlen, len(chars)))
            for t, char in enumerate(sentence):
                x_pred[0, t, char_indices[char]] = 1.

            preds = model.predict(x_pred, verbose=0)[0]
            next_index = sample(preds, diversity)
            next_char = indices_char[next_index]

            sentence = sentence[1:] + next_char

            sys.stdout.write(next_char)
            sys.stdout.flush()
        print()


print_callback = LambdaCallback(on_epoch_end=on_epoch_end)

model.fit(x, y,
          batch_size=128,
          epochs=60,
          callbacks=[print_callback])

Train an Auxiliary Classifier GAN (ACGAN) on the MNIST dataset.
More details on Auxiliary Classifier GANs.
You should start to see reasonable images after ~5 epochs, and good images by ~15 epochs. You should
use a GPU, as the convolution-heavy operations are very slow on the CPU. Prefer the TensorFlow
backend if you plan on iterating, as the compilation time can be a blocker using Theano.
Timings:

Hardware            Backend   Time / Epoch
CPU                 TF        3 hrs
Titan X (maxwell)   TF        4 min
Titan X (maxwell)   TH        7 min

Consult Auxiliary Classifier Generative Adversarial Networks in Keras for more information and
example output.
from __future__ import print_function

from collections import defaultdict

try:
    import cPickle as pickle
except ImportError:
    import pickle
from PIL import Image

from six.moves import range

from keras.datasets import mnist
from keras import layers
from keras.layers import Input, Dense, Reshape, Flatten, Embedding, Dropout
from keras.layers import BatchNormalization
from keras.layers.advanced_activations import LeakyReLU
from keras.layers.convolutional import Conv2DTranspose, Conv2D
from keras.models import Sequential, Model
from keras.optimizers import Adam
from keras.utils.generic_utils import Progbar
import numpy as np

np.random.seed(1337)
num_classes = 10

def build_generator(latent_size):
    # we will map a pair of (z, L), where z is a latent vector and L is a
    # label drawn from P_c, to image space (..., 28, 28, 1)
    cnn = Sequential()

    cnn.add(Dense(3 * 3 * 384, input_dim=latent_size, activation='relu'))
    cnn.add(Reshape((3, 3, 384)))

    # upsample to (7, 7, ...)
    cnn.add(Conv2DTranspose(192, 5, strides=1, padding='valid',
                            activation='relu',
                            kernel_initializer='glorot_normal'))
    cnn.add(BatchNormalization())

    # upsample to (14, 14, ...)
    cnn.add(Conv2DTranspose(96, 5, strides=2, padding='same',
                            activation='relu',
                            kernel_initializer='glorot_normal'))
    cnn.add(BatchNormalization())

    # upsample to (28, 28, ...)
    cnn.add(Conv2DTranspose(1, 5, strides=2, padding='same',
                            activation='tanh',
                            kernel_initializer='glorot_normal'))

    # this is the z space commonly referred to in GAN papers
    latent = Input(shape=(latent_size, ))
    # this will be our label
    image_class = Input(shape=(1,), dtype='int32')

    cls = Embedding(num_classes, latent_size,
                    embeddings_initializer='glorot_normal')(image_class)

    # hadamard product between z-space and a class conditional embedding
    h = layers.multiply([latent, cls])

    fake_image = cnn(h)

    return Model([latent, image_class], fake_image)

def build_discriminator():
    # build a relatively standard conv net, with LeakyReLUs as suggested in
    # the reference paper
    cnn = Sequential()

    cnn.add(Conv2D(32, 3, padding='same', strides=2,
                   input_shape=(28, 28, 1)))
    cnn.add(LeakyReLU(0.2))
    cnn.add(Dropout(0.3))

    cnn.add(Conv2D(64, 3, padding='same', strides=1))
    cnn.add(LeakyReLU(0.2))
    cnn.add(Dropout(0.3))

    cnn.add(Conv2D(128, 3, padding='same', strides=2))
    cnn.add(LeakyReLU(0.2))
    cnn.add(Dropout(0.3))

    cnn.add(Conv2D(256, 3, padding='same', strides=1))
    cnn.add(LeakyReLU(0.2))
    cnn.add(Dropout(0.3))

    cnn.add(Flatten())

    image = Input(shape=(28, 28, 1))

    features = cnn(image)

    # first output (name=generation) is whether or not the discriminator
    # thinks the image that is being shown is fake, and the second output
    # (name=auxiliary) is the class that the discriminator thinks the image
    # belongs to.
    fake = Dense(1, activation='sigmoid', name='generation')(features)
    aux = Dense(num_classes, activation='softmax', name='auxiliary')(features)

    return Model(image, [fake, aux])

if __name__ == '__main__':

    # batch and latent size taken from the paper
    epochs = 100
    batch_size = 100
    latent_size = 100

    # Adam parameters suggested in https://arxiv.org/abs/1511.06434
    adam_lr = 0.0002
    adam_beta_1 = 0.5

    # build the discriminator
    print('Discriminator model:')
    discriminator = build_discriminator()
    discriminator.compile(
        optimizer=Adam(learning_rate=adam_lr, beta_1=adam_beta_1),
        loss=['binary_crossentropy', 'sparse_categorical_crossentropy']
    )
    discriminator.summary()

    # build the generator
    generator = build_generator(latent_size)

    latent = Input(shape=(latent_size, ))
    image_class = Input(shape=(1,), dtype='int32')

    # get a fake image
    fake = generator([latent, image_class])

    # we only want to be able to train generation for the combined model
    discriminator.trainable = False
    fake, aux = discriminator(fake)
    combined = Model([latent, image_class], [fake, aux])

    print('Combined model:')
    combined.compile(
        optimizer=Adam(learning_rate=adam_lr, beta_1=adam_beta_1),
        loss=['binary_crossentropy', 'sparse_categorical_crossentropy']
    )
    combined.summary()

    # get our mnist data, and force it to be of shape (..., 28, 28, 1) with
    # range [-1, 1]
    (x_train, y_train), (x_test, y_test) = mnist.load_data()
    x_train = (x_train.astype(np.float32) - 127.5) / 127.5
    x_train = np.expand_dims(x_train, axis=-1)

    x_test = (x_test.astype(np.float32) - 127.5) / 127.5
    x_test = np.expand_dims(x_test, axis=-1)

    num_train, num_test = x_train.shape[0], x_test.shape[0]

    train_history = defaultdict(list)
    test_history = defaultdict(list)

    for epoch in range(1, epochs + 1):
        print('Epoch {}/{}'.format(epoch, epochs))

        num_batches = int(np.ceil(x_train.shape[0] / float(batch_size)))
        progress_bar = Progbar(target=num_batches)

        epoch_gen_loss = []
        epoch_disc_loss = []

        for index in range(num_batches):
            # get a batch of real images
            image_batch = x_train[index * batch_size:(index + 1) * batch_size]
            label_batch = y_train[index * batch_size:(index + 1) * batch_size]

            # generate a new batch of noise
            noise = np.random.uniform(-1, 1, (len(image_batch), latent_size))

            # sample some labels from p_c
            sampled_labels = np.random.randint(0, num_classes, len(image_batch))

            # generate a batch of fake images, using the generated labels as a
            # conditioner. We reshape the sampled labels to be
            # (len(image_batch), 1) so that we can feed them into the embedding
            # layer as a length one sequence
            generated_images = generator.predict(
                [noise, sampled_labels.reshape((-1, 1))], verbose=0)

            x = np.concatenate((image_batch, generated_images))

            # use one-sided soft real/fake labels
            # Salimans et al., 2016
            # https://arxiv.org/pdf/1606.03498.pdf (Section 3.4)
            soft_zero, soft_one = 0, 0.95
            y = np.array(
                [soft_one] * len(image_batch) + [soft_zero] * len(image_batch))
            aux_y = np.concatenate((label_batch, sampled_labels), axis=0)

            # we don't want the discriminator to also maximize the classification
            # accuracy of the auxiliary classifier on generated images, so we
            # don't train discriminator to produce class labels for generated
            # images (see https://openreview.net/forum?id=rJXTf9Bxg).
            # To preserve sum of sample weights for the auxiliary classifier,
            # we assign sample weight of 2 to the real images.
            disc_sample_weight = [np.ones(2 * len(image_batch)),
                                  np.concatenate((np.ones(len(image_batch)) * 2,
                                                  np.zeros(len(image_batch))))]

            # see if the discriminator can figure itself out...
            epoch_disc_loss.append(discriminator.train_on_batch(
                x, [y, aux_y], sample_weight=disc_sample_weight))

            # make new noise. we generate 2 * batch size here such that we have
            # the generator optimize over an identical number of images as the
            # discriminator
            noise = np.random.uniform(-1, 1, (2 * len(image_batch), latent_size))
            sampled_labels = np.random.randint(0, num_classes, 2 *
                                               len(image_batch))

            # we want to train the generator to trick the discriminator
            # For the generator, we want all the {fake, not-fake} labels to say
            # not-fake
            trick = np.ones(2 * len(image_batch)) * soft_one

            epoch_gen_loss.append(combined.train_on_batch(
                [noise, sampled_labels.reshape((-1, 1))],
                [trick, sampled_labels]))

            progress_bar.update(index + 1)

        print('Testing for epoch {}:'.format(epoch))

        # evaluate the testing loss here

        # generate a new batch of noise
        noise = np.random.uniform(-1, 1, (num_test, latent_size))

        # sample some labels from p_c and generate images from them
        sampled_labels = np.random.randint(0, num_classes, num_test)
        generated_images = generator.predict(
            [noise, sampled_labels.reshape((-1, 1))], verbose=False)

        x = np.concatenate((x_test, generated_images))
        y = np.array([1] * num_test + [0] * num_test)
        aux_y = np.concatenate((y_test, sampled_labels), axis=0)

        # see if the discriminator can figure itself out...
        discriminator_test_loss = discriminator.evaluate(
            x, [y, aux_y], verbose=False)

        discriminator_train_loss = np.mean(np.array(epoch_disc_loss), axis=0)

        # make new noise
        noise = np.random.uniform(-1, 1, (2 * num_test, latent_size))
        sampled_labels = np.random.randint(0, num_classes, 2 * num_test)

        trick = np.ones(2 * num_test)

        generator_test_loss = combined.evaluate(
            [noise, sampled_labels.reshape((-1, 1))],
            [trick, sampled_labels], verbose=False)

        generator_train_loss = np.mean(np.array(epoch_gen_loss), axis=0)

        # generate an epoch report on performance
        train_history['generator'].append(generator_train_loss)
        train_history['discriminator'].append(discriminator_train_loss)

        test_history['generator'].append(generator_test_loss)
        test_history['discriminator'].append(discriminator_test_loss)

        print('{0:<22s} | {1:4s} | {2:15s} | {3:5s}'.format(
            'component', *discriminator.metrics_names))
        print('-' * 65)

        ROW_FMT = '{0:<22s} | {1:<4.2f} | {2:<15.4f} | {3:<5.4f}'
        print(ROW_FMT.format('generator (train)',
                             *train_history['generator'][-1]))
        print(ROW_FMT.format('generator (test)',
                             *test_history['generator'][-1]))
        print(ROW_FMT.format('discriminator (train)',
                             *train_history['discriminator'][-1]))
        print(ROW_FMT.format('discriminator (test)',
                             *test_history['discriminator'][-1]))

        # save weights every epoch
        generator.save_weights(
            'params_generator_epoch_{0:03d}.hdf5'.format(epoch), True)
        discriminator.save_weights(
            'params_discriminator_epoch_{0:03d}.hdf5'.format(epoch), True)

        # generate some digits to display
        num_rows = 40
        noise = np.tile(np.random.uniform(-1, 1, (num_rows, latent_size)),
                        (num_classes, 1))

        sampled_labels = np.array([
            [i] * num_rows for i in range(num_classes)
        ]).reshape(-1, 1)

        # get a batch to display
        generated_images = generator.predict(
            [noise, sampled_labels], verbose=0)

        # prepare real images sorted by class label
        real_labels = y_train[(epoch - 1) * num_rows * num_classes:
                              epoch * num_rows * num_classes]
        indices = np.argsort(real_labels, axis=0)
        real_images = x_train[(epoch - 1) * num_rows * num_classes:
                              epoch * num_rows * num_classes][indices]

        # display generated images, white separator, real images
        img = np.concatenate(
            (generated_images,
             np.repeat(np.ones_like(x_train[:1]), num_rows, axis=0),
             real_images))

        # arrange them into a grid
        img = (np.concatenate([r.reshape(-1, 28)
                               for r in np.split(img, 2 * num_classes + 1)
                               ], axis=-1) * 127.5 + 127.5).astype(np.uint8)

        Image.fromarray(img).save(
            'plot_epoch_{0:03d}_generated.png'.format(epoch))

    with open('acgan-history.pkl', 'wb') as f:
        pickle.dump({'train': train_history, 'test': test_history}, f)
