Deep Learning for Image Captioning

This capstone project proposes image and video captioning using deep learning. The student will implement a system that can automatically generate captions for images and videos by learning from sample data. This goes beyond existing models that only generate labels or basic captions, and will integrate image recognition with a model to predict emojis, stickers, and more descriptive captions. The project aims to fill research gaps in saving user time for captioning and providing more intelligence to automatic systems. Expected outcomes are automatic caption generation from images and videos, and predicting emojis and stickers.

Uploaded by

shivam5singh-25
Copyright
© All Rights Reserved

CAPSTONE PROJECT

• TOPIC :- Image Captioning Using Deep Learning

• DESCRIPTION :-
In this project, we implement an image and video captioning system that automatically
generates informative captions, based on what it learns from the sample images used to
train it.

In this project we use “image recognition” to extract image features and to classify them.
Previous applications of image recognition were only useful for generating labels or basic
captions. For example, for an image in which a dog is sitting on the floor, such a system
simply reports the object in the image and its position, which offers little insight of
value to people.
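
To make the idea of feature extraction concrete, here is a minimal sketch, using only NumPy, of the convolution, ReLU, and max-pooling steps that a CNN applies to an image. The toy 6x6 image, the hand-written edge filter, and the function names are illustrative assumptions and not part of the project; a real system would learn its filters from training data.

```python
import numpy as np


def conv2d(image, kernel):
    """Valid 2-D cross-correlation (what CNN layers call convolution)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out


def max_pool(fmap, size=2):
    """Non-overlapping max pooling; rows/columns that do not fit are dropped."""
    h = fmap.shape[0] // size * size
    w = fmap.shape[1] // size * size
    blocks = fmap[:h, :w].reshape(h // size, size, w // size, size)
    return blocks.max(axis=(1, 3))


def extract_features(image, kernel):
    """Convolution -> ReLU -> max pooling -> flatten into a feature vector."""
    fmap = np.maximum(conv2d(image, kernel), 0.0)
    return max_pool(fmap).ravel()


# Toy grayscale image: dark left half, bright right half (hypothetical data).
image = np.zeros((6, 6))
image[:, 3:] = 1.0

# Hand-written vertical-edge filter; a trained CNN would learn such filters.
kernel = np.array([[-1.0, 0.0, 1.0],
                   [-1.0, 0.0, 1.0],
                   [-1.0, 0.0, 1.0]])

features = extract_features(image, kernel)
```

The resulting vector responds strongly where the image changes from dark to bright. A classifier or caption model would take such vectors as its input.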

Our aim is to take these extracted features and integrate them with a model that predicts
and generates “emojis”, “stickers” and captions for the given input. This model can be
used as an application for social media platforms, where very large numbers of images are
uploaded.
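
As a rough illustration of how recognized labels could be turned into emoji suggestions and a basic caption, the sketch below uses a fixed lookup table and a sentence template. This is a simplification swapped in for illustration: the proposal describes a learned model, whereas the mapping `EMOJI_FOR_LABEL`, the score threshold, and both function names here are hypothetical.

```python
# Hypothetical lookup from recognized labels to emojis; a real system would
# learn or curate a much larger mapping.
EMOJI_FOR_LABEL = {
    "dog": "\U0001F436",
    "cat": "\U0001F431",
    "beach": "\U0001F3D6",
    "cake": "\U0001F382",
}


def rank_labels(labels, min_score=0.5):
    """Keep labels scoring at least `min_score`, most confident first."""
    ranked = sorted(labels, key=lambda pair: pair[1], reverse=True)
    return [name for name, score in ranked if score >= min_score]


def suggest_emojis(labels, min_score=0.5):
    """Return emojis for confident labels that have a known emoji."""
    return [EMOJI_FOR_LABEL[name]
            for name in rank_labels(labels, min_score)
            if name in EMOJI_FOR_LABEL]


def template_caption(labels, min_score=0.5):
    """Build a simple template-based caption from the confident labels."""
    names = rank_labels(labels, min_score)
    if not names:
        return "A photo."
    if len(names) == 1:
        return f"A photo of a {names[0]}."
    return "A photo of a " + ", a ".join(names[:-1]) + f" and a {names[-1]}."


# (label, score) pairs as an image classifier might produce them (made-up values).
predictions = [("dog", 0.92), ("beach", 0.71), ("car", 0.20)]
emojis = suggest_emojis(predictions)
caption = template_caption(predictions)
```

A learned model would replace both the table and the template, but the interface (labels in, emojis and caption text out) could stay the same.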

The datasets for this project come from “LabelMe” and “Google Open Images”, which
contain millions of annotated and labeled images. The wide range of data available
will make the data easier to analyze and will allow the model to be tested more
accurately. If required in future, further dataset sources such as “Kaggle” and other
repositories can be explored.

• NATURE :- General

• What novelty do you see in the proposed research/project work by the student?

1. Existing models on this topic generate only labels; this project adds an integrated
system that produces labels and automatically generates captions from those labels.
2. Real-time images from the camera can also be used for the above-mentioned
purposes.
3. All of these outputs will be produced in real time.

• Is it feasible to carry out the proposed work with the facilities available at home? If yes,
please mention how the project/research work shall be carried out.

Yes, the proposed work can readily be carried out with the facilities available at
home. Carrying out this project/research requires knowledge of the following
subjects.

1. Python :-
o OpenCV : used to analyze images.
o NumPy : used to handle image arrays.
o SciPy : used for mathematical operations.
o Matplotlib : used for plotting.
o TensorFlow : used to implement machine learning models.
2. Algorithms :-
o CNN (Convolutional Neural Network)
o RNN (Recurrent Neural Network)
o Classification-based algorithms
Programming Languages - Python, R
Platforms - OpenCV, TensorFlow, Jupyter Notebook, Anaconda, RStudio
Libraries - Keras, NumPy, SciPy
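
One common way to combine the CNN and RNN listed above is an encoder-decoder: the CNN turns the image into a feature vector, and the RNN emits the caption one word at a time. The following is a minimal NumPy sketch of the decoding loop only, assuming a plain tanh RNN with greedy word selection. The toy vocabulary, the weight names, and the random (untrained) weights are all assumptions for illustration, so the words it produces are meaningless until the weights are trained.

```python
import numpy as np

# Toy vocabulary (hypothetical); real vocabularies come from the training captions.
VOCAB = ["<start>", "<end>", "a", "dog", "sits", "on", "the", "floor"]


def init_params(feature_dim, hidden_dim, vocab_size, seed=0):
    """Random, untrained weights; training would replace these."""
    rng = np.random.default_rng(seed)
    scale = 0.1
    return {
        "W_init": rng.normal(0, scale, (hidden_dim, feature_dim)),  # features -> initial state
        "W_embed": rng.normal(0, scale, (vocab_size, hidden_dim)),  # word embeddings
        "W_hh": rng.normal(0, scale, (hidden_dim, hidden_dim)),     # recurrent weights
        "W_out": rng.normal(0, scale, (vocab_size, hidden_dim)),    # state -> word scores
    }


def greedy_caption(features, params, max_len=10):
    """Decode a caption word by word, always taking the top-scoring word."""
    h = np.tanh(params["W_init"] @ features)  # condition the RNN on the image
    word = VOCAB.index("<start>")
    caption = []
    for _ in range(max_len):
        h = np.tanh(params["W_hh"] @ h + params["W_embed"][word])  # one RNN step
        word = int(np.argmax(params["W_out"] @ h))                 # greedy choice
        if VOCAB[word] == "<end>":
            break
        caption.append(VOCAB[word])
    return caption


features = np.ones(16)  # stand-in for a CNN feature vector
params = init_params(feature_dim=16, hidden_dim=8, vocab_size=len(VOCAB))
words = greedy_caption(features, params)
```

In practice the same structure would more likely be built with Keras layers and trained on image-caption pairs; the loop above only shows how the image features and the word sequence interact.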

• Mention the research gap that the proposed project/research work intends to fill.

1. To save the user time when captioning images.
2. To provide intelligence to the system.
3. To integrate image processing and caption generation in a single system.

• What are the expected research/project outcomes from this proposal submitted by the
student?

1. To automatically generate captions for a given input image.
2. To automatically generate captions for a given input video.
3. To predict “emojis” and “stickers” for the given input.
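
The proposal does not say how video input would be handled. One simple possibility, sketched below under that assumption, is to sample frames at a fixed interval, caption each sampled frame with the image model, and keep the most frequent caption for the clip. The function names and the example captions are hypothetical.

```python
from collections import Counter


def sample_frame_indices(total_frames, fps, every_seconds=1.0):
    """Indices of the frames taken every `every_seconds` seconds of video."""
    if total_frames <= 0 or fps <= 0 or every_seconds <= 0:
        return []
    step = max(1, round(fps * every_seconds))
    return list(range(0, total_frames, step))


def summarize_captions(frame_captions):
    """Pick the most frequent per-frame caption as the caption for the clip."""
    if not frame_captions:
        return ""
    return Counter(frame_captions).most_common(1)[0][0]


# A 5-second clip at 30 frames per second, sampled once per second.
indices = sample_frame_indices(total_frames=150, fps=30, every_seconds=1.0)

# Hypothetical per-frame captions, as an image captioner might return them.
frame_captions = ["a dog runs", "a dog runs", "a dog jumps", "a dog runs", "a dog jumps"]
clip_caption = summarize_captions(frame_captions)
```

Sampling keeps the cost proportional to the clip length in seconds rather than in frames, at the price of missing events that fall between sampled frames.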
