Multimedia and Games: Project Proposal

Thanh-Hai Tran
Outline
 Context and Motivation
 Existing approaches
 Proposed solutions
 Topics
 Schedule
Context and Motivation
 Context:
 In Vietnam, there are about 2.6 million deaf people
 The number of people providing translation services is very limited (about 20)
 Deaf people have difficulty accessing information (medical, health, security, education, etc.)
 Motivation: build an automatic system that helps Vietnamese deaf people understand information.
 Ideal system: speech2sign = speech2text + text2sign
 In this work: text2sign
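The decomposition speech2sign = speech2text + text2sign can be read as a chain of two stages. A minimal sketch, assuming each stage is a plain Python callable; `compose_speech2sign`, `SignSequence`, and the two toy stages are hypothetical placeholders, not actual models from this project.

```python
from typing import Callable, List

# Hypothetical output type: a sign sequence is a list of sign labels.
SignSequence = List[str]


def compose_speech2sign(
    speech2text: Callable[[bytes], str],
    text2sign: Callable[[str], SignSequence],
) -> Callable[[bytes], SignSequence]:
    """Chain the two stages: speech2sign = speech2text + text2sign."""

    def speech2sign(audio: bytes) -> SignSequence:
        return text2sign(speech2text(audio))

    return speech2sign


# Toy stand-ins so the sketch runs; real stages would be trained models.
def toy_speech2text(audio: bytes) -> str:
    # Pretend the audio bytes are already UTF-8 text.
    return audio.decode("utf-8")


def toy_text2sign(text: str) -> SignSequence:
    # Pretend every word maps to one sign label.
    return [word.upper() for word in text.split()]


speech2sign = compose_speech2sign(toy_speech2text, toy_text2sign)
```

Because the stages are only coupled through plain text, the text2sign part studied in this work can be developed and tested on its own, and a speech2text stage can be plugged in later.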
An example
Existing approach using computer graphics
Existing approach
Existing approach using generative model
Text2Sign: Towards Sign Language Production Using Neural Machine Translation and Generative Adversarial Networks, International Journal of Computer Vision, January 2020
Generated results
Text2Sign: Towards Sign Language Production Using Neural Machine Translation and Generative Adversarial Networks, International Journal of Computer Vision, January 2020
Our proposed solution
T1: Overview of VSL
 Main tasks:
 Survey research on Vietnamese Sign Language (VSL)
 Main characteristics of VSL
 Set of common vocabulary in VSL
 Resources for VSL (video/text)
 List and contact people who provide translation services, for data collection
 Expected result:
 Report on the topics above
 All resources organized in a shared drive (e.g. Google Drive)
 Required skills:
 Reading research papers, summarizing, and reporting
 Good communication and organization
T2: Related work
 Main tasks:
 Study existing work on sign language production, covering the two main approaches: computer graphics based and computer vision based
 Methods, datasets
 Obtained results
 Code (if accessible)
 Share your knowledge with the other groups
 Expected result:
 Report on related work
 All resources (references, links, libraries, datasets) organized in a shared drive (e.g. Google Drive)
 Required skills:
 Reading research papers in English, summarizing, and reporting
 Good background in mathematics and probability
 Some knowledge of image processing and computer vision
T3: Building a dictionary for VSL
 Main tasks:
 Study existing datasets for American, Japanese, and German sign languages, together with their data collection methods and organization
 Collect a set of phrases and words for VSL in specific domains from different Internet sources (one subset per domain, for example: coronavirus outbreak, healthcare, daily news, weather)
 Organize and encode the data according to the format used in an existing dataset
 Work with a real translator to shorten the phrases (together with Group 1)
 Expected results:
 4 sets of common phrases in four domains: coronavirus outbreak, healthcare, daily news, weather
 These sets are organized and encoded according to a pre-defined format
 Required skills:
 Reading research papers in English, summarizing, and reporting
 Some knowledge of natural language processing
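The slides do not fix the encoding format, so the following is only a sketch of what one dictionary record could look like, assuming a JSON file with one entry per phrase. The class `PhraseEntry`, its field names, and the helper functions are hypothetical; the actual format should follow whichever existing dataset the group adopts.

```python
import json
from dataclasses import asdict, dataclass
from typing import List, Optional

# The four domains named in the expected results.
DOMAINS = ("coronavirus outbreak", "healthcare", "daily news", "weather")


@dataclass
class PhraseEntry:
    """One dictionary record (hypothetical layout)."""

    entry_id: str
    domain: str
    text: str                     # original Vietnamese phrase
    gloss: Optional[str] = None   # shortened phrase, filled in with a translator
    source: Optional[str] = None  # where the phrase was collected

    def __post_init__(self) -> None:
        if self.domain not in DOMAINS:
            raise ValueError(f"unknown domain: {self.domain!r}")


def dump_entries(entries: List[PhraseEntry]) -> str:
    """Serialize entries to JSON, keeping Vietnamese characters readable."""
    return json.dumps([asdict(e) for e in entries], ensure_ascii=False, indent=2)


def load_entries(payload: str) -> List[PhraseEntry]:
    """Rebuild entries from JSON; invalid domains raise ValueError."""
    return [PhraseEntry(**item) for item in json.loads(payload)]
```

Validating the domain at load time keeps the four subsets cleanly separated when several people contribute phrases.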
T4: Shorten phrases (text2gloss)
 Main tasks:
 Study existing methods for automatically shortening textual phrases (text2gloss)
 Select one method (e.g. an open-source one with good performance) and implement it (Python preferred)
 Evaluate the obtained results against the ground truth (produced by a real translator)
 Expected result:
 A module: input is a textual phrase, output is a shortened phrase
 Quantitative evaluation and report
 Required skills:
 Good Python implementation skills
 Some knowledge of natural language processing
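For the quantitative evaluation, one possible metric is the word error rate (WER) between the module output and the translator's shortened phrase. The slides do not name a metric, so WER is an assumption here, and the function names below are illustrative.

```python
from typing import List


def edit_distance(ref: List[str], hyp: List[str]) -> int:
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(
                min(
                    prev[j] + 1,         # reference token missing from hypothesis
                    curr[j - 1] + 1,     # extra token in hypothesis
                    prev[j - 1] + cost,  # match or substitution
                )
            )
        prev = curr
    return prev[-1]


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance divided by the number of reference tokens."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one token")
    return edit_distance(ref, hyp) / len(ref)
```

A WER of 0 means the output matches the ground truth exactly; values can exceed 1 when the output is much longer than the reference. Averaging over all phrases in a domain gives one number per subset for the report.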
T5: Video dataset collection
 Main tasks:
 Study existing datasets of Vietnamese Sign Language
 Invite people who know VSL to perform the collected subsets
 Expected result:
 Video
 Annotated video (video with corresponding text, annotated using the ELAN tool)
 Required skills:
 Ability to set up and synchronize cameras
 Some programming, OpenCV
 Ref: SMILE Swiss German Sign Language Dataset
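ELAN stores annotations in an XML file (.eaf) in which time slots hold millisecond values and each tier holds time-aligned annotations. A minimal sketch for reading one tier back as (start_ms, end_ms, text) segments; the tier name `gloss` and the sample content are assumptions made for illustration, not part of the project data.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple


def read_tier(eaf_xml: str, tier_id: str) -> List[Tuple[int, int, str]]:
    """Return (start_ms, end_ms, text) for the aligned annotations of one tier."""
    root = ET.fromstring(eaf_xml)
    # Map each time slot id to its value in milliseconds (unaligned slots skipped).
    slots = {
        slot.get("TIME_SLOT_ID"): int(slot.get("TIME_VALUE"))
        for slot in root.iter("TIME_SLOT")
        if slot.get("TIME_VALUE") is not None
    }
    segments = []
    for tier in root.iter("TIER"):
        if tier.get("TIER_ID") != tier_id:
            continue
        for ann in tier.iter("ALIGNABLE_ANNOTATION"):
            start = slots.get(ann.get("TIME_SLOT_REF1"))
            end = slots.get(ann.get("TIME_SLOT_REF2"))
            if start is None or end is None:
                continue  # no explicit time for this annotation
            text = ann.findtext("ANNOTATION_VALUE", default="")
            segments.append((start, end, text.strip()))
    return sorted(segments)


# Tiny hand-written sample; for a real file, read it with ET.parse(path) instead.
SAMPLE_EAF = """
<ANNOTATION_DOCUMENT>
  <TIME_ORDER>
    <TIME_SLOT TIME_SLOT_ID="ts1" TIME_VALUE="0"/>
    <TIME_SLOT TIME_SLOT_ID="ts2" TIME_VALUE="1200"/>
    <TIME_SLOT TIME_SLOT_ID="ts3" TIME_VALUE="2500"/>
  </TIME_ORDER>
  <TIER TIER_ID="gloss">
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a1" TIME_SLOT_REF1="ts1" TIME_SLOT_REF2="ts2">
        <ANNOTATION_VALUE>TODAY</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a2" TIME_SLOT_REF1="ts2" TIME_SLOT_REF2="ts3">
        <ANNOTATION_VALUE>RAIN</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
  </TIER>
</ANNOTATION_DOCUMENT>
"""

segments = read_tier(SAMPLE_EAF, "gloss")
```

The millisecond boundaries can then be converted to frame indices using the frame rate of the recorded video, which is what later tasks need to cut the video per sign.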
An example
T6: Biometric face recognition
 Main tasks:
 Study existing methods for biometric face recognition
 Select and implement a method
 Evaluate on a dataset
 Expected result:
 A biometric face recognition module
 Required skills:
 Programming in Python, OpenCV
 Some knowledge of image processing, computer vision, and deep learning
An example
T7: Human pose estimation
 Main tasks:
 Study existing methods for human pose estimation
 Use the OpenPose library to estimate human pose
 Evaluate the obtained results against the ground truth
 Expected result:
 A module: input is a video (sequence of frames), output is a video of poses
 Required skills:
 Good Python implementation skills
 Some knowledge of computer vision and deep learning
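To compare estimated poses with the ground truth, one commonly used measure is the percentage of correct keypoints (PCK): a keypoint counts as correct when it lies within a distance threshold of its annotated position. The slides do not prescribe a metric, so this is an assumed choice, and the function below is an illustrative sketch that works on 2D keypoints of a single frame.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def pck(
    predicted: Sequence[Point],
    ground_truth: Sequence[Point],
    threshold: float,
) -> float:
    """Fraction of keypoints whose error is within `threshold` (same units as the points)."""
    if not ground_truth or len(predicted) != len(ground_truth):
        raise ValueError("both inputs must be non-empty and of equal length")
    correct = sum(
        1
        for p, g in zip(predicted, ground_truth)
        if math.dist(p, g) <= threshold
    )
    return correct / len(ground_truth)
```

In practice the threshold is often set relative to the person's size in the image (for example a fraction of the torso length) so that scores are comparable across videos; here it is left as a plain parameter.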
An example
T8: Create a 3D human model with a skeleton using Blender
 Main tasks:
 Using Blender, create a 3D human model (MakeHuman)
 Write a program to move the human model in a pre-defined way (with the skeleton determined in the previous task)
 Expected result:
 A 3D model
 A module to move a human skeleton
 Required skills:
 Good computer graphics and Blender skills
T9: Create animation
T10: Gloss2Pose
 Main tasks:
 Study existing methods
 Select an open-source method and implement it
 Test on a dataset
 Expected result:
 A Python module
 Required skills:
 Good computer vision skills
T11: Pose2Video
 Main tasks:
An example
T12: text2action
 Main tasks:
An example
Hope you enjoy and succeed!