Group Members:
YUSUF SODAWALA: 2021300127
KRISHNAN SUBRAMANIAN: 2021300130
SAAD SURVE: 2021300131
Mentor: Prof. Jignesh Sisodia
INTRODUCTION
Objective: Develop image processing for chatbots with VQA.
Aim: Enhance chatbots' understanding of visual content.
Focus: Extracting meaning from images for natural language interaction.
Importance: Improves user engagement and interaction.
Components:
Theoretical underpinnings of VQA
Image processing techniques
Model development and evaluation
PROBLEM DEFINITION, SCOPE & OBJECTIVES OF THE PROJECT
Scope of the Project:
Utilization of Advanced Image Processing:
Leverage state-of-the-art algorithms and frameworks (e.g., TensorFlow, PyTorch).
Explore CNNs, attention mechanisms, and transfer learning for enhanced image understanding.
Focus on Accessibility and Inclusivity:
Prioritize VQA model development for accessibility.
Create multimodal interfaces for effective interaction by visually impaired users.
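To illustrate the attention mechanisms named in the scope, here is a minimal pure-Python sketch of question-guided attention over image regions. It is an illustration only, not the project's model: the function names, the tiny two-dimensional feature vectors, and the assumption that question and region features share one dimension are all hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Turn raw scores into weights that sum to 1 (shifted by the max for stability)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(question_vec: Sequence[float],
           region_feats: Sequence[Sequence[float]]) -> Tuple[List[float], List[float]]:
    """Scaled dot-product attention: weight each image region by its match to the question.

    Assumption: every region feature has the same length as question_vec.
    Returns (attention weights, weighted sum of the region features).
    """
    dim = len(question_vec)
    scores = [
        sum(q * r for q, r in zip(question_vec, feat)) / math.sqrt(dim)
        for feat in region_feats
    ]
    weights = softmax(scores)
    attended = [
        sum(w * feat[i] for w, feat in zip(weights, region_feats))
        for i in range(dim)
    ]
    return weights, attended


# Hypothetical toy features: region 0 points the same way as the question vector.
weights, attended = attend([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])
```

In a real model the region features would come from a CNN or patch embedding and the question vector from a text encoder; the weighting step itself stays the same.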
LITERATURE SURVEY
| Paper Title | Author(s) | Methodology | Inference |
| --- | --- | --- | --- |
| Learning content and context with language bias for visual question answering. | Chao Yang, Su Feng, Dongsheng Li, Huawei Shen, Guoqing Wang, and Bin Jiang. | Develops two architectural branches to deal with bias learning: Content and Context branches. | Improved model performance by addressing bias in content and context. |
| Beyond accuracy: A consolidated tool for visual question answering benchmarking. | Dirk Vath, Pascal Tilli, and Ngoc Thang Vu. | Introduces a novel digital framework to analyze, evaluate, and test VQA models and datasets. | Provides a structured approach for evaluating and testing VQA models and datasets. |
| Medical visual question answering via conditional reasoning. | Li-Ming Zhan, Bo Liu, Lu Fan, Jiaxin Chen, and Xiao Ming Wu. | Question conditioned reasoning module for medical VQA models. | Enhances reasoning capabilities in medical VQA tasks. |
| Debiased visual question answering from feature and sample perspectives. | Shiquan Wen, Guanghui Xu, Mingkui Tan, Qingyao Wu, and Qi Wu. | Technique to recognize and alleviate negative bias effects during training. | Addresses bias issues in VQA model training to improve performance. |
Main Block: Overview of the System
DESIGN AND METHODOLOGY
OCR
COLOR RECOGNITION
OBJECT RECOGNITION
SCENE RECOGNITION
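The slides name four modules but do not say how a question reaches the right one. The sketch below shows one possible approach, a simple keyword router. Everything in it is an assumption for illustration: the keyword lists, the module labels, and the choice of scene recognition as the fallback are hypothetical.

```python
import re
from typing import List, Tuple

# Hypothetical keyword lists; the first module with a matching keyword wins.
ROUTES: List[Tuple[str, Tuple[str, ...]]] = [
    ("ocr", ("text", "written", "read", "say", "says", "word", "sign")),
    ("color", ("color", "colour")),
    ("object", ("how many", "object", "count")),
]

# Assumption: questions that match nothing are treated as scene questions.
DEFAULT_MODULE = "scene"


def route_question(question: str) -> str:
    """Return the name of the module that should handle the question."""
    lowered = question.lower()
    for module, keywords in ROUTES:
        for keyword in keywords:
            # Word boundaries stop "say" from matching inside "essay", for example.
            if re.search(r"\b" + re.escape(keyword) + r"\b", lowered):
                return module
    return DEFAULT_MODULE


for q in ("What color is the car?", "What does the sign say?",
          "How many dogs are there?", "Where was this photo taken?"):
    print(q, "->", route_question(q))
```

A keyword table is easy to inspect and extend, but it is brittle. A learned question classifier, or sending every question to the VQA model, are alternatives.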
IMPLEMENTATION DETAILS
VILT MODEL FOR VQA:
VILT (VISION-AND-LANGUAGE TRANSFORMER) MODEL IMPLEMENTED VIA HUGGING FACE'S TRANSFORMERS LIBRARY.
INTEGRATES VISUAL AND TEXTUAL DATA FOR TASKS LIKE VISUAL QUESTION ANSWERING.
USES PRE-TRAINED TRANSFORMER WEIGHTS.
DURING INFERENCE, TAKES AN INPUT IMAGE AND A QUESTION ABOUT IT.
FUSES THE TWO MODALITIES BY PROCESSING IMAGE PATCH EMBEDDINGS AND QUESTION TOKEN EMBEDDINGS IN A SINGLE TRANSFORMER.
PREDICTS THE ANSWER FROM THIS JOINT IMAGE-QUESTION REPRESENTATION.
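The inference flow described above can be sketched with the Transformers library as follows. This is a minimal sketch, not the project's code. The slides do not name a checkpoint, so the public `dandelin/vilt-b32-finetuned-vqa` checkpoint is an assumption, and the helper names `pick_answer` and `answer_question` are hypothetical.

```python
from typing import Mapping, Sequence

# Assumption: the slides do not say which checkpoint was used; this public
# VQA-fine-tuned ViLT checkpoint is chosen only for illustration.
CHECKPOINT = "dandelin/vilt-b32-finetuned-vqa"


def pick_answer(scores: Sequence[float], id2label: Mapping[int, str]) -> str:
    """Return the answer label whose score is highest."""
    best = max(range(len(scores)), key=lambda i: scores[i])
    return id2label[best]


def answer_question(image_path: str, question: str) -> str:
    """Answer a question about an image with a pre-trained ViLT model."""
    # Heavy dependencies are imported lazily so pick_answer works without them.
    import torch
    from PIL import Image
    from transformers import ViltForQuestionAnswering, ViltProcessor

    processor = ViltProcessor.from_pretrained(CHECKPOINT)
    model = ViltForQuestionAnswering.from_pretrained(CHECKPOINT)

    image = Image.open(image_path).convert("RGB")
    # The processor prepares both the image and the question text as model inputs.
    inputs = processor(image, question, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits  # one score per candidate answer
    return pick_answer(logits[0].tolist(), model.config.id2label)


# Example call (hypothetical file name):
# answer_question("street.jpg", "What color is the car?")
```

Loading the processor and model on every call keeps the sketch short. A chatbot would normally load them once at startup and reuse them.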
OCR MODEL FOR TEXT EXTRACTION:
OCR MODEL IMPLEMENTED USING THE EASYOCR LIBRARY.
DESIGNED TO EXTRACT TEXT FROM IMAGES OR SCANNED DOCUMENTS.
LEVERAGES PRE-TRAINED DEEP LEARNING MODELS FOR TEXT DETECTION AND CHARACTER RECOGNITION.
CAPABLE OF HANDLING MULTIPLE LANGUAGES AND SCRIPTS.
PROVIDES FAST AND ACCURATE TEXT EXTRACTION CAPABILITIES.
SUPPORTS COMMON IMAGE FORMATS SUCH AS JPEG AND PNG; PDF PAGES NEED TO BE CONVERTED TO IMAGES FIRST.
OFFERS STRAIGHTFORWARD INTEGRATION INTO EXISTING APPLICATIONS OR WORKFLOWS.
SUITABLE FOR TASKS LIKE DOCUMENT DIGITIZATION AND TEXT EXTRACTION FROM IMAGES.
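The text extraction step can be sketched with EasyOCR as follows. This is a minimal sketch under stated assumptions, not the project's code: the helper names, the English-only default, the CPU-only setting, and the 0.3 confidence threshold are hypothetical choices.

```python
from typing import Iterable, Sequence, Tuple

# Each EasyOCR detection is a (bounding box, text, confidence) triple.
Detection = Tuple[object, str, float]


def join_confident_text(detections: Iterable[Detection],
                        min_confidence: float = 0.3) -> str:
    """Join the text of detections at or above the confidence threshold.

    Assumption: 0.3 is an illustrative threshold, not a tuned value.
    """
    return " ".join(
        text for _bbox, text, confidence in detections
        if confidence >= min_confidence
    )


def extract_text(image_path: str,
                 languages: Sequence[str] = ("en",),
                 min_confidence: float = 0.3) -> str:
    """Run EasyOCR on an image file and return the recognized text."""
    # Imported lazily so join_confident_text works without EasyOCR installed.
    import easyocr

    # gpu=False keeps the sketch runnable on CPU-only machines.
    reader = easyocr.Reader(list(languages), gpu=False)
    detections = reader.readtext(image_path)
    return join_confident_text(detections, min_confidence)


# Example call (hypothetical file name):
# extract_text("signboard.png")
```

Filtering by confidence is one simple way to keep spurious low-quality detections out of the chatbot's reply. The right threshold depends on the images involved.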
TECHNOLOGY STACK
PROJECT PLAN AND TIMELINE
Phase 0 - 8/1/24: Problem Identification
Phase 1.5 - 27/3/23: 35% completion
Phase 3 - and beyond: Next semester