
RNS INSTITUTE OF TECHNOLOGY

BENGALURU - 98
DEPARTMENT OF INFORMATION SCIENCE & ENGINEERING

Mini Project on Artificial Intelligence & Machine Learning (21CS54)


Presentation on
IMAGE CAPTIONING SYSTEM
Simran Taj (USN: 1RN21IS186)
Prerana DV (USN: 1RN21IS187)
Yashas R (USN: 1RN21IS179)

Faculty In-Charge
Ms. Aishwarya G
Assistant Professor
Dept. of ISE, RNSIT
AGENDA

• Abstract
• Introduction
• Datasets
• Algorithm Used
• Data Preparation
• Exploratory Data Analysis
• Feature Selection
• Model Generation
• Results
• Conclusion & Future Enhancement
• References
ABSTRACT
• This project introduces an advanced image captioning system that not only generates descriptive captions and identifies class
names for user-uploaded images but also enhances the user experience by presenting visually similar images from a dataset.
Leveraging the VGG16 convolutional neural network (CNN) pre-trained on a large dataset, our system achieves accurate
classification and caption generation for various fashion items, including shirts, t-shirts, pants, tops, crops, dresses, and more.

• Moreover, to enrich the user experience, our system offers a feature that retrieves visually similar images from a curated
dataset based on the content of the user's input image. Utilizing advanced image similarity algorithms and feature extraction
techniques, we present users with a selection of visually related images, enhancing their exploration and understanding of
fashion trends and styles.

• Keywords: Image captioning, fashion classification, VGG16 CNN, visual similarity, feature extraction, fashion trends, user
experience, image retrieval, dataset enrichment, fashion items.

Dept of ISE, RNSIT 2023-2024


INTRODUCTION

In the realm of artificial intelligence and computer vision, image classification stands as a fundamental task with wide-ranging applications. The ability to automatically recognize and categorize objects within images has immense potential, spanning from medical diagnosis to autonomous vehicles. In this project, we embark on a journey into the domain of image classification, leveraging advanced deep-learning techniques to build a robust model capable of accurately identifying various fashion items.
DATASET

1. Custom Dataset: Our dataset is tailored to our specific task of fashion classification. It is
organized into a directory structure where each folder represents a distinct fashion category,
such as shirts, t-shirts, dresses, tops, pants, skirts, jackets, etc.
2. Class Labels: Each folder in the dataset contains images belonging to a particular fashion
category. For instance, the "shirt" folder contains images of different types of shirts, while the
"tshirt" folder contains various t-shirts, and so on. This organization enables the model to learn
to differentiate between fashion items.
3. Data Augmentation: We use data augmentation techniques to enhance the variety of
training examples and improve the model's ability to generalize. Rescaling, shearing, zooming,
and horizontal flipping are applied to the images during training. This exposes the model to a
wider range of variations within each fashion category, making it more robust to different
styles, angles, and lighting conditions.
4. Evaluation: The model's performance is evaluated using metrics such as accuracy, which
measures how well it predicts the correct fashion category for new, unseen images.
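The folder-per-class layout described above can be sketched in a few lines. This is a minimal illustration (the folder names are examples, not the full dataset), mirroring how Keras' flow_from_directory infers class labels from sub-folder names:

```python
import os
import tempfile

def discover_classes(dataset_dir):
    """Return sorted class names, one per sub-folder of the dataset root."""
    return sorted(
        d for d in os.listdir(dataset_dir)
        if os.path.isdir(os.path.join(dataset_dir, d))
    )

# Throwaway directory mimicking the described layout.
root = tempfile.mkdtemp()
for cls in ["shirt", "tshirt", "dress", "pant"]:
    os.makedirs(os.path.join(root, cls))

classes = discover_classes(root)
print(classes)  # ['dress', 'pant', 'shirt', 'tshirt']
```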
ALGORITHM USED

1. Collect the dataset and organize it into different classes.
2. Load the pre-trained model and train a new model on top of it.
3. Receive the user's input image.
4. Preprocess the image.
5. Search for matching patterns in the dataset images and collect the matching images.
6. If a match is found, display the class name and similar images.
7. Else, report that no match was found in the dataset.
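Steps 5-7 above can be sketched as a small Python driver. The feature vectors, dataset contents, and threshold below are illustrative stand-ins; in the real system the vectors come from the VGG16 extractor:

```python
import math

def cosine_similarity(a, b):
    """Similarity in [-1, 1] between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def classify_and_match(image_features, dataset, threshold=0.8):
    """Steps 5-7: find the best-matching class for the preprocessed image,
    or report no match. `dataset` maps class name -> list of feature
    vectors for that class's images."""
    best_class, best_score, best_matches = None, 0.0, []
    for class_name, vectors in dataset.items():
        scores = [cosine_similarity(image_features, v) for v in vectors]
        top = max(scores)
        if top > best_score:
            best_class, best_score = class_name, top
            best_matches = [v for v, s in zip(vectors, scores) if s >= threshold]
    if best_score >= threshold:
        return best_class, best_matches  # step 6: match found
    return None, []                      # step 7: no match

# Toy 3-dimensional "features"; the real vectors are high-dimensional.
dataset = {
    "shirt": [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]],
    "dress": [[0.0, 1.0, 0.0]],
}
label, matches = classify_and_match([1.0, 0.05, 0.0], dataset)
print(label, len(matches))  # shirt 2
```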
DATA PREPARATION

• Dataset Collection: The dataset was collected from various online sources and organized into different classes
representing fashion item categories.

• Image Preprocessing: Images were resized to a uniform size of 224x224 pixels and normalized to standardize
pixel values within the range [0, 1].

• Data Augmentation: Augmentation techniques such as rotation, flipping, and zooming were applied to
increase the diversity and robustness of the dataset.
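A minimal sketch of the normalization and flip augmentation described above, applied to a tiny 2x2 stand-in image (in the real pipeline, Keras utilities resize the image to 224x224 before this step):

```python
def normalize(image):
    """Scale pixel values from [0, 255] to [0.0, 1.0]."""
    return [[px / 255.0 for px in row] for row in image]

def horizontal_flip(image):
    """One of the augmentations applied during training."""
    return [row[::-1] for row in image]

img = [[0, 255], [51, 102]]  # a 2x2 stand-in image
print(normalize(img))        # [[0.0, 1.0], [0.2, 0.4]]
print(horizontal_flip(img))  # [[255, 0], [102, 51]]
```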
EXPLORATORY ANALYSIS

• We conduct exploratory analysis on the dataset to gain insights into its characteristics:

1. Sample Image Visualization: Visualizations of sample images from each fashion item category are presented to
understand the diversity of the dataset.

2. Class Distribution Analysis: The distribution of images across different fashion item categories is analyzed to
identify any class imbalances or biases.

3. Data Quality Assessment: Potential challenges or issues in the dataset, such as noise or outliers, are examined to
ensure data quality.

4. Model Performance Evaluation: Graphs depicting error and accuracy metrics are generated to evaluate the
performance of models trained on the dataset. These metrics provide insights into the effectiveness of the models
and help identify areas for improvement.
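The class-distribution analysis in point 2 amounts to a per-category count with an imbalance check. The label counts below are illustrative, not the project's actual dataset:

```python
from collections import Counter

# Illustrative label list; real counts come from the dataset folders.
labels = ["shirt"] * 120 + ["dress"] * 80 + ["pant"] * 20

counts = Counter(labels)
total = sum(counts.values())
for cls, n in counts.most_common():
    print(f"{cls:>6}: {n:4d} ({100 * n / total:.1f}%)")

# Simple imbalance flag: largest class more than 3x the smallest.
imbalanced = max(counts.values()) > 3 * min(counts.values())
print("imbalanced:", imbalanced)  # imbalanced: True (120 vs 20)
```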
EXPLORATORY ANALYSIS

[Figure: class distribution of the "dress" category]
EXPLORATORY ANALYSIS
[Figure: training loss and training accuracy plots]
FEATURE SELECTION
Utilization of Pre-trained VGG16 Model
Feature Extraction:

• The VGG16 model, pre-trained on the ImageNet dataset, is utilized for feature extraction.

• Features are extracted from images using the convolutional layers of the VGG16 model.

Advantages:

• Leveraging pre-trained models reduces the need for manual feature engineering.

• The VGG16 architecture has shown effectiveness in capturing high-level features from images.

Transfer Learning:

• Transfer learning is employed to adapt the pre-trained model to the specific task of classifying fashion item images.

• By fine-tuning the model, it can learn task-specific features while retaining the general knowledge acquired during pre-training.

Feature Representation:

• Extracted features are represented as high-dimensional vectors, capturing essential visual patterns present in the images.
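A sketch of the feature-representation idea: vectors are L2-normalized so that dot products between two images' features act as cosine similarities. The tiny vector is a stand-in for the high-dimensional VGG16 output:

```python
import math

# In the real pipeline the vector comes from VGG16's conv layers, e.g.
#   base = VGG16(weights="imagenet", include_top=False,
#                input_shape=(224, 224, 3))
# with the conv output flattened; a tiny stand-in vector is used here.
def l2_normalize(v):
    """Unit-length feature vector, so that the dot product of two
    normalized vectors equals their cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

features = [3.0, 4.0]
unit = l2_normalize(features)
print(unit)  # [0.6, 0.8]
print(sum(x * x for x in unit))  # 1.0 (up to rounding)
```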
MODEL GENERATION

• Prediction Setup: The FashionClassifier's final layer is a Dense layer with softmax activation,
which outputs a probability for each fashion category: the likelihood that an image belongs to
each class, such as shirts, pants, or dresses.

• Compilation and Training: The model is compiled with the Adam optimizer, which adapts the
learning process to improve accuracy, and uses categorical cross-entropy as the loss function
measuring how well the model performs during training. Accuracy is monitored throughout to
ensure the model is learning effectively.

• ImageDataGenerator: This generator applies transformations such as resizing, rotating, and
flipping to create different versions of the same pictures. This variety helps the model learn to
recognize fashion items more robustly, even when they appear differently.
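The softmax output layer and categorical cross-entropy loss described above can be written out directly; the class scores below are illustrative:

```python
import math

def softmax(scores):
    """Turn raw class scores into probabilities (numerically stable)."""
    exps = [math.exp(s - max(scores)) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def categorical_cross_entropy(probs, one_hot):
    """Loss for one sample: -sum(y_true * log(y_pred))."""
    return -sum(y * math.log(p) for y, p in zip(one_hot, probs) if y)

probs = softmax([2.0, 1.0, 0.1])  # scores for e.g. shirt, pant, dress
loss = categorical_cross_entropy(probs, [1, 0, 0])  # true class: shirt
print([round(p, 3) for p in probs])  # [0.659, 0.242, 0.099]
print(round(loss, 3))
```

Minimizing this loss pushes the probability of the true class toward 1, which is exactly what the Adam optimizer does during training.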
RESULTS
We present the results obtained from applying the trained fashion item classification model to a test image. We
showcase the predicted class label along with the confidence level and display similar images from the training
dataset for visual comparison.
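Reporting a result as "class label plus confidence" amounts to taking the most probable class from the softmax output; the class names and probabilities below are illustrative:

```python
CLASS_NAMES = ["shirt", "tshirt", "dress", "pant"]  # illustrative order

def describe_prediction(probs):
    """Map the model's probability vector to (label, confidence)."""
    idx = max(range(len(probs)), key=probs.__getitem__)
    return CLASS_NAMES[idx], probs[idx]

label, confidence = describe_prediction([0.05, 0.10, 0.80, 0.05])
print(f"Predicted: {label} ({confidence:.0%} confidence)")
# Predicted: dress (80% confidence)
```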
CONCLUSIONS
• This project demonstrated a fashion image classification and captioning system built on a pre-trained
VGG16 network. The system identifies the class of a user-uploaded fashion item, generates a descriptive
caption, and retrieves visually similar images from the dataset, enriching the user's exploration of
fashion trends and styles. The results confirm that transfer learning from large-scale pre-trained models
is an effective basis for domain-specific classification tasks such as this one.
FUTURE ENHANCEMENTS
• Model Improvement: Fine-tune models, explore ensemble methods, and experiment with advanced
architectures.

• Data Augmentation: Implement advanced techniques and custom strategies to generate diverse training samples.

• User Experience: Develop interactive interfaces, personalize recommendations, and incorporate user feedback.

• Scalability and Deployment: Deploy on cloud or edge platforms, ensure continuous monitoring for
performance.

• Model Interpretability: Integrate explainable AI techniques and visualizations for transparency.

• Domain-Specific Enhancements: Incorporate multi-modal data, and analyze fashion trends for better insights.
REFERENCES

1. Stuart J. Russell and Peter Norvig, Artificial Intelligence: A Modern Approach, 3rd Edition, Pearson, 2015.

2. S. Sridhar and M. Vijayalakshmi, Machine Learning, Oxford University Press, 2021.

3. Amazon.in

4. ChatGPT (openai.com)

5. https://youtu.be/Gz_PsRRxrHM?si=PljSPl65CuLnTCQh
THANK YOU
