Text Representation Enhancement using Multi-Modal Attention Mechanism for Text Visual Question Answering


Esther Oduntan, Ph.D. and Joseph Domguia
AI | Machine Learning | Data Science

ABSTRACT

Text Visual Question Answering is an integral part of communication. Researchers in Natural Language Processing and Computer Vision have used various architectures such as the Bottom-Up and Top-Down Visual Attention Mechanism, VisualBERT, and M4C, to mention a few. In this research, we looked into the application of a multi-modal attention mechanism to improve the question answering system. It was observed that the quality of the answers generated by the pre-trained M4C model with an improved OCR enhances the performance of a text visual question answering system.

METHODOLOGY

The method used in this study is a multi-modal technique that involves more than one input and output. The inputs are the questions, the images, and the optical character recognition (OCR) extracted features. The image features were extracted using the Faster R-CNN model, the contextual word embedding features for the questions were extracted using GloVe (Global Vectors), and the text on the images was extracted using the Google OCR app. All the input features were passed into the transformer, as sketched below. The dataset used is the TextVQA dataset.
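To make the fusion step concrete, the following is a minimal PyTorch sketch of how the three feature streams could be projected into a shared space and passed through a transformer encoder. The dimensions, module names, and the use of a generic nn.TransformerEncoder are illustrative assumptions; the actual M4C implementation additionally uses an iterative pointer-network decoder, which is not shown.

```python
# Minimal sketch (assumptions noted): question tokens arrive as GloVe
# vectors (300-d), image regions as Faster R-CNN features (2048-d), and
# OCR tokens as 300-d feature vectors. All are projected to a shared
# dimension and jointly encoded by a transformer.
import torch
import torch.nn as nn

class MultiModalFusionTransformer(nn.Module):
    def __init__(self, d_model=768, n_heads=12, n_layers=4):
        super().__init__()
        # Per-modality linear projections into a shared d_model space.
        self.q_proj = nn.Linear(300, d_model)     # GloVe question embeddings
        self.img_proj = nn.Linear(2048, d_model)  # Faster R-CNN region features
        self.ocr_proj = nn.Linear(300, d_model)   # OCR token features
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)

    def forward(self, q_feats, img_feats, ocr_feats):
        # q_feats: (B, n_q, 300), img_feats: (B, n_obj, 2048),
        # ocr_feats: (B, n_ocr, 300)
        tokens = torch.cat(
            [self.q_proj(q_feats), self.img_proj(img_feats),
             self.ocr_proj(ocr_feats)], dim=1)
        # Self-attention lets every question, object, and OCR token
        # attend to every other token across modalities.
        return self.encoder(tokens)

# Example with random tensors standing in for real extractor outputs.
model = MultiModalFusionTransformer()
out = model(torch.randn(2, 14, 300), torch.randn(2, 36, 2048),
            torch.randn(2, 10, 300))
print(out.shape)  # torch.Size([2, 60, 768])
```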
INTRODUCTION

A Visual Question Answering (VQA) system can be defined as an algorithm that takes as input an image and a natural language question about the image, and generates a natural language answer as the output. This is a computer vision task where a system is given a text-based question about an image and must infer the answer. The main idea in Text VQA is that the search and the reasoning must be performed over the content of an image. The system must be able to detect objects and classify a scene, and it needs world knowledge; commonsense reasoning and knowledge reasoning are also necessary.
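For clarity, the input/output contract described above can be stated as a small interface sketch; the class and method names here are illustrative assumptions, not an existing API.

```python
# Illustrative sketch only: the input/output contract of a VQA system as
# defined above. Names and types are assumptions for clarity.
from typing import Protocol

class VQASystem(Protocol):
    def answer(self, image: bytes, question: str) -> str:
        """Take an image and a natural language question about it,
        and return a natural language answer."""
        ...
```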

RELATED WORKS

P. Anderson, X. He, C. Buehler, D. Teney, M. Johnson, S. Gould, and L. Zhang (2018). "Bottom-up and top-down attention for image captioning and visual question answering."

L. Gómez, A. F. Biten, R. Tito, A. Mafla, and D. Karatzas (2020). "Multimodal grid features and cell pointers for scene text visual question answering."

RESULTS

[Figure: The Multimodal OCR-Transformer Based VQA Architecture]

CONCLUSION

This project implemented a multi-modal attention mechanism for a visual question answering system by integrating Google OCR and Rosetta OCR with a pre-trained M4C model. Of the three approaches intended for enhancing text representation, two were implemented, while the third, training from scratch and fine-tuning the pre-trained M4C model, will be examined in future work.
