
TFRecords vs HDF5

The dataset contains 250 images for each of 40 classes, i.e. 10,000 images in total.
The original size of each image was 1920 × 1200.
These 10,000 original images required 37.3 GB of memory.
Each image was resized to 224 × 224 before being stored.
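
The preprocessing code that performs this resize is part of the dataset package and is not shown in the notebook; a minimal sketch, assuming OpenCV is used for reading and resizing, might look like this:

import cv2

def load_and_resize(path, size=(224, 224)):
    # read the image from disk (uint8, BGR) and shrink it from 1920 x 1200 to 224 x 224
    image = cv2.imread(path)
    return cv2.resize(image, size)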

Importing Dependencies

In [1]: # importing the FileWriter and DatasetGenerator classes from the dataset package


from dataset.writer import FileWriter
from dataset.reader import DatasetGenerator
...

Storing Images
In [2]: # Instantiating file writer object
writer = FileWriter()

[INFO] Preparing writer..

[INFO] Reading dataset..


[INFO] One Hot encoding the labels..

In [3]: # creating tf records file


writer.create_tfrecord()

Creating tfrecord file: 100% |##################################| Time: 0:30:01
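
create_tfrecord() belongs to the author's FileWriter class and its source is not included here. As a rough sketch of what such a method typically does (the feature names and helper below are assumptions, not the actual implementation), each resized image and its one-hot label are serialized into a tf.train.Example and appended to the TFRecord file:

import tensorflow as tf

def write_tfrecord(images, labels, path="images.tfrecord"):
    # images: uint8 arrays of shape (224, 224, 3); labels: one-hot float vectors
    with tf.io.TFRecordWriter(path) as writer:
        for image, label in zip(images, labels):
            feature = {
                "image": tf.train.Feature(
                    bytes_list=tf.train.BytesList(value=[image.tobytes()])),
                "label": tf.train.Feature(
                    float_list=tf.train.FloatList(value=label.tolist())),
            }
            example = tf.train.Example(features=tf.train.Features(feature=feature))
            writer.write(example.SerializeToString())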

In [4]: # creating hdf5 file


writer.create_hdf5()

Creating hdf5 file: 100% |######################################| Time: 0:29:20
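
create_hdf5() is likewise part of the dataset package. A comparable sketch, assuming h5py and in-memory arrays (dataset names and layout are illustrative only), would store the whole dataset as two arrays in a single file:

import h5py
import numpy as np

def write_hdf5(images, labels, path="images.h5"):
    # images: uint8 array of shape (10000, 224, 224, 3); labels: float32 one-hot array of shape (10000, 40)
    with h5py.File(path, "w") as f:
        f.create_dataset("images", data=np.asarray(images, dtype=np.uint8))
        f.create_dataset("labels", data=np.asarray(labels, dtype=np.float32))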

Memory used

Reading Images

Reading the full dataset, i.e. completing one epoch


In [5]: # Instantiating data generator object
dataGen = DatasetGenerator()

[INFO] Preparing Dataset Generator

In [6]: # reading tf records file


dataGen.read_tfrecord()

WARNING:tensorflow:From D:\shubham\Research\1. Handling Mass Data\dataset\reader.py:120: DatasetV1.make_one_shot_iterator (from tensorflow.python.data.ops.dataset_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `for ... in dataset:` to iterate over a dataset. If using `tf.estimator`, return the `Dataset` object directly from your input function. As a last resort, you can use `tf.compat.v1.data.make_one_shot_iterator(dataset)`.

Reading tfrecord file: 100% |###################################| Time: 0:00:44
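
read_tfrecord() is also not shown. A hedged sketch of reading one full epoch from the TFRecord file, using the `for ... in dataset:` pattern recommended by the deprecation warning above (feature names, shapes, and batch size are assumptions), could look like this:

import tensorflow as tf

def read_tfrecord(path="images.tfrecord", batch_size=32):
    # feature spec matching the fields assumed in the writing sketch above
    feature_spec = {
        "image": tf.io.FixedLenFeature([], tf.string),
        "label": tf.io.FixedLenFeature([40], tf.float32),
    }

    def parse(serialized):
        parsed = tf.io.parse_single_example(serialized, feature_spec)
        image = tf.io.decode_raw(parsed["image"], tf.uint8)
        image = tf.reshape(image, (224, 224, 3))
        return image, parsed["label"]

    dataset = tf.data.TFRecordDataset(path).map(parse).batch(batch_size)
    for images, labels in dataset:   # one full pass = one epoch
        pass  # feed the batch to the model here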

In [7]: # reading hdf5 file


dataGen.read_hdf5()

Reading hdf5 file: 100% |#######################################| Time: 0:00:42
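
read_hdf5() follows the same idea for the HDF5 file; a minimal sketch, assuming h5py and the dataset names used in the writing sketch above:

import h5py

def read_hdf5(path="images.h5", batch_size=32):
    with h5py.File(path, "r") as f:
        images, labels = f["images"], f["labels"]
        for start in range(0, images.shape[0], batch_size):
            batch_x = images[start:start + batch_size]  # slicing reads only this batch from disk
            batch_y = labels[start:start + batch_size]
            # feed (batch_x, batch_y) to the model here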
