Multimedia and Games

Project proposal

Thanh-Hai Tran

Outline
• Context and Motivation
• Existing approaches
• Proposed solutions
• Topics
• Schedule

Context and Motivation
• Context:
  - Vietnam has about 2.6 million deaf people.
  - Very few people provide sign language translation services (about 20).
  - Deaf people therefore have difficulty accessing information (medical, health, security, education, etc.).
• Motivation: build an automatic system that helps Vietnamese deaf people understand information.
  - Ideal system: speech2sign = speech2text + text2sign
  - In this work: text2sign

An example

Existing approach using computer graphics

Existing approach

Existing approach using generative model

Text2Sign: Towards Sign Language Production Using Neural Machine Translation and Generative Adversarial Networks,
International Journal of Computer Vision, January 2020

Generated results

Text2Sign: Towards Sign Language Production Using Neural Machine Translation and Generative Adversarial Networks,
International Journal of Computer Vision, January 2020

Our proposed solution

T1: Overview of VSL
• Main tasks:
  - Survey existing research on Vietnamese Sign Language (VSL):
    - Main characteristics of VSL
    - Common vocabulary of VSL
    - Available VSL resources (video/text)
  - List and contact people who provide translation services, for data collection
• Expected results:
  - Report on the topics above
  - All resources organized in a shared drive (e.g. Google Drive)
• Required skills:
  - Reading, summarizing, and reporting on research papers
  - Good communication and organization skills

T2: Related works
• Main tasks:
  - Study existing work on sign language production, covering the two main approaches (computer graphics based and computer vision based):
    - Methods and datasets
    - Obtained results
    - Code (if accessible)
  - Share your knowledge with the other groups
• Expected results:
  - Report on related work
  - All resources (references, links, libraries, datasets) organized in a shared drive (e.g. Google Drive)
• Required skills:
  - Reading, summarizing, and reporting on research papers in English
  - Good background in mathematics and probability
  - Some knowledge of image processing and computer vision

T3: Building a dictionary for VSL
• Main tasks:
  - Study existing datasets for American, Japanese, and German sign languages, including their data collection methods and organization
  - Collect a set of VSL phrases and words in specific domains, one subset per domain (for example: coronavirus outbreak, healthcare, daily news, weather), from different sources on the Internet
  - Organize and encode the data according to the format used in existing datasets
  - Work with a human VSL translator to shorten the phrases (together with Group 1)
• Expected results:
  - Four sets of common phrases, one per domain: coronavirus outbreak, healthcare, daily news, weather
  - Each set organized and encoded according to a predefined format
• Required skills:
  - Reading, summarizing, and reporting on research papers in English
  - Some knowledge of natural language processing
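To illustrate what "organized and encoded according to a predefined format" could look like, here is a minimal sketch that stores dictionary entries as JSON Lines. The field names (domain, text, gloss, video) and the class name PhraseEntry are assumptions made for this example only; the real format should follow whichever existing dataset the group selects.

```python
import json
from dataclasses import dataclass, asdict
from typing import List, Optional

# The four domains named in the task description.
DOMAINS = {"coronavirus outbreak", "healthcare", "daily news", "weather"}


@dataclass
class PhraseEntry:
    """One dictionary entry. All field names are hypothetical."""
    domain: str                  # one of DOMAINS
    text: str                    # original written phrase
    gloss: List[str]             # shortened phrase as a list of glosses
    video: Optional[str] = None  # path to a recording, once one exists

    def __post_init__(self) -> None:
        if self.domain not in DOMAINS:
            raise ValueError(f"unknown domain: {self.domain!r}")


def to_jsonl(entries: List[PhraseEntry]) -> str:
    """Serialize entries as JSON Lines, one entry per line."""
    # ensure_ascii=False keeps Vietnamese diacritics readable in the file.
    return "\n".join(json.dumps(asdict(e), ensure_ascii=False) for e in entries)


def from_jsonl(data: str) -> List[PhraseEntry]:
    """Parse JSON Lines back into entries, skipping blank lines."""
    return [PhraseEntry(**json.loads(line))
            for line in data.splitlines() if line.strip()]
```

One entry per line keeps the four domain subsets easy to merge, diff, and review by hand.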

T4: Shorten phrases (text2gloss)
• Main tasks:
  - Study existing methods for automatically shortening textual phrases (text2gloss)
  - Select one method (e.g. an open-source one with good performance) and implement it, preferably in Python
  - Evaluate the results against the ground truth produced by a human VSL translator
• Expected results:
  - A module whose input is a textual phrase and whose output is a shortened phrase
  - Quantitative evaluation and report
• Required skills:
  - Good Python implementation skills
  - Some knowledge of natural language processing
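The module interface and its evaluation can be sketched as follows. This is a deliberately naive baseline, not a method from the literature and not a model of VSL grammar: it drops words from a hypothetical stop-word list and upper-cases the rest. Word error rate is used here as one possible quantitative measure (BLEU and similar scores are also common choices); the English example words are placeholders.

```python
from typing import List, Sequence

# Hypothetical stop-word list; a real one must come from VSL translators.
STOP_WORDS = {"a", "an", "the", "is", "are", "will", "to", "of"}


def text2gloss(phrase: str) -> List[str]:
    """Naive baseline: drop stop words and upper-case what remains."""
    tokens = phrase.lower().replace(",", " ").replace(".", " ").split()
    return [t.upper() for t in tokens if t not in STOP_WORDS]


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Word-level Levenshtein distance between reference and hypothesis."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # word missing from hyp
                            curr[j - 1] + 1,      # extra word in hyp
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def word_error_rate(ref: Sequence[str], hyp: Sequence[str]) -> float:
    """Edit distance normalized by the reference length."""
    if not ref:
        raise ValueError("reference gloss sequence must not be empty")
    return edit_distance(ref, hyp) / len(ref)
```

Here ref is the gloss sequence written by the translator and hyp is the module output; averaging word_error_rate over the test phrases gives a single score for the report.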

T5: Video dataset collection
• Main tasks:
  - Study existing Vietnamese Sign Language datasets
  - Invite people who know VSL to perform the collected subsets
• Expected results:
  - Video
  - Annotated video (video with corresponding text, using the ELAN tool)
  - Quantitative evaluation and report
• Required skills:
  - Ability to set up and synchronize cameras
  - Some programming experience, OpenCV
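Once videos are annotated in ELAN, the annotations can be read back by a script. The sketch below assumes the usual .eaf XML layout (a TIME_ORDER block of time slots and TIER elements holding time-aligned annotations) and ignores reference annotations. The tier name "gloss" and the sample content are made up for illustration.

```python
import xml.etree.ElementTree as ET
from typing import List, NamedTuple


class Segment(NamedTuple):
    start_ms: int
    end_ms: int
    text: str


def read_tier(eaf_xml: str, tier_id: str) -> List[Segment]:
    """Return the time-aligned annotations of one tier, sorted by start time."""
    root = ET.fromstring(eaf_xml)
    # Map time-slot ids to milliseconds; slots without a value are skipped.
    slots = {
        ts.get("TIME_SLOT_ID"): int(ts.get("TIME_VALUE"))
        for ts in root.iterfind("TIME_ORDER/TIME_SLOT")
        if ts.get("TIME_VALUE") is not None
    }
    segments = []
    for tier in root.iterfind("TIER"):
        if tier.get("TIER_ID") != tier_id:
            continue
        for ann in tier.iterfind("ANNOTATION/ALIGNABLE_ANNOTATION"):
            start = slots.get(ann.get("TIME_SLOT_REF1"))
            end = slots.get(ann.get("TIME_SLOT_REF2"))
            if start is None or end is None:
                continue  # unaligned annotation: ignored in this sketch
            text = (ann.findtext("ANNOTATION_VALUE") or "").strip()
            segments.append(Segment(start, end, text))
    return sorted(segments)


# Tiny made-up document in the assumed layout.
SAMPLE_EAF = """
<ANNOTATION_DOCUMENT>
  <TIME_ORDER>
    <TIME_SLOT TIME_SLOT_ID="ts1" TIME_VALUE="0"/>
    <TIME_SLOT TIME_SLOT_ID="ts2" TIME_VALUE="1200"/>
    <TIME_SLOT TIME_SLOT_ID="ts3" TIME_VALUE="2500"/>
  </TIME_ORDER>
  <TIER TIER_ID="gloss">
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a2" TIME_SLOT_REF1="ts2" TIME_SLOT_REF2="ts3">
        <ANNOTATION_VALUE>THANK-YOU</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a1" TIME_SLOT_REF1="ts1" TIME_SLOT_REF2="ts2">
        <ANNOTATION_VALUE>HELLO</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
  </TIER>
</ANNOTATION_DOCUMENT>
"""
```

The resulting (start, end, text) segments can then be used to cut each recording into per-phrase clips.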

Ref: SMILE Swiss German Sign Language Dataset

An example

T6: Biometric face recognition
• Main tasks:
  - Study existing methods for biometric face recognition
  - Select and implement a method
  - Evaluate it on a dataset
• Expected results:
  - A biometric face recognition module
  - Quantitative evaluation and report
• Required skills:
  - Programming in Python, OpenCV
  - Some knowledge of image processing, computer vision, and deep learning
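Many face recognition methods reduce each face image to a fixed-length embedding vector and then compare vectors. The sketch below covers only that matching step, assuming embeddings are already available from whichever method the group selects. The function names and the 0.6 threshold are illustrative assumptions; the threshold in particular must be tuned on the evaluation dataset.

```python
import math
from typing import Dict, Optional, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("zero-length embedding")
    return dot / (norm_a * norm_b)


def identify(probe: Sequence[float],
             gallery: Dict[str, Sequence[float]],
             threshold: float = 0.6) -> Tuple[Optional[str], float]:
    """Return the best-matching enrolled name, or None if below threshold."""
    best_name: Optional[str] = None
    best_score = -1.0
    for name, embedding in gallery.items():
        score = cosine_similarity(probe, embedding)
        if score > best_score:
            best_name, best_score = name, score
    if best_score < threshold:
        return None, best_score  # treated as an unknown person
    return best_name, best_score
```

Running identify over a labeled test set and counting correct, wrong, and rejected answers gives the figures needed for the quantitative evaluation.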

An example

T7: Human pose estimation
• Main tasks:
  - Study existing methods for human pose estimation
  - Use the OpenPose library to estimate human poses
  - Evaluate the results against the ground truth
• Expected results:
  - A module whose input is a video (a sequence of frames) and whose output is the corresponding sequence of poses
  - Quantitative evaluation and report
• Required skills:
  - Good Python implementation skills
  - Some knowledge of computer vision and deep learning
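The evaluation step can be sketched as below, assuming OpenPose was run with JSON output enabled, so that each frame yields a file with a "people" list whose "pose_keypoints_2d" field is a flat list of x, y, confidence triples. The metric is a simplified PCK (percentage of correct keypoints) with a fixed pixel threshold; the usual definition scales the threshold by body or head size, so treat this as an assumption to revisit.

```python
import json
import math
from typing import List, Sequence, Tuple

Keypoint = Tuple[float, float, float]  # x, y, confidence


def parse_openpose_frame(frame_json: str) -> List[List[Keypoint]]:
    """Return one list of (x, y, confidence) keypoints per detected person."""
    data = json.loads(frame_json)
    people = []
    for person in data.get("people", []):
        flat = person.get("pose_keypoints_2d", [])
        # Regroup the flat list into triples.
        people.append([(flat[i], flat[i + 1], flat[i + 2])
                       for i in range(0, len(flat) - 2, 3)])
    return people


def pck(pred: Sequence[Keypoint],
        truth: Sequence[Tuple[float, float]],
        threshold_px: float) -> float:
    """Fraction of joints predicted within threshold_px of the ground truth."""
    if not truth or len(pred) != len(truth):
        raise ValueError("pred and truth must be non-empty and of equal length")
    hits = 0
    for (px, py, _conf), (tx, ty) in zip(pred, truth):
        if math.hypot(px - tx, py - ty) <= threshold_px:
            hits += 1
    return hits / len(truth)
```

Averaging pck over all annotated frames gives one number per threshold for the report.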

An example

T8: Create a 3D human model with a skeleton using Blender
• Main tasks:
  - Using Blender, create a 3D human model (MakeHuman)
  - Write a program that moves the model according to a predefined skeleton pose (the skeleton determined in the previous task)
• Expected results:
  - A 3D model
  - A module to move a human skeleton
  - Quantitative evaluation and report
• Required skills:
  - Good knowledge of computer graphics and Blender
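Driving a rigged model from an estimated skeleton usually means converting joint positions into bone rotations. As a small, Blender-independent sketch of that conversion, the helper below computes the angle at a joint (for example, the elbow) from three joint positions. The function name and the coordinates used with it are illustrative assumptions; applying the angle to an armature bone is left to the Blender script.

```python
import math
from typing import Sequence


def joint_angle(parent: Sequence[float],
                joint: Sequence[float],
                child: Sequence[float]) -> float:
    """Angle in degrees at `joint` between the segments to parent and child."""
    u = [p - j for p, j in zip(parent, joint)]
    v = [c - j for c, j in zip(child, joint)]
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("joints must not coincide")
    cos_angle = sum(a * b for a, b in zip(u, v)) / (norm_u * norm_v)
    # Clamp to guard against rounding slightly outside [-1, 1].
    cos_angle = max(-1.0, min(1.0, cos_angle))
    return math.degrees(math.acos(cos_angle))
```

For a shoulder at (0, 0, 0), an elbow at (1, 0, 0), and a wrist at (1, 1, 0), the two segments are perpendicular, so the elbow angle is 90 degrees; a fully extended arm gives 180 degrees.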

T9: Create animation

T10: Gloss2Pose
• Main tasks:
  - Study existing methods
  - Select an open-source method and implement it
  - Test it on a dataset
• Expected results:
  - A Python module
  - Quantitative evaluation and report
• Required skills:
  - Good knowledge of computer vision
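Before adopting a learned model, a simple baseline helps fix the module interface: look up a stored pose clip for each gloss and join consecutive clips with linearly interpolated transition frames. This is a concatenative sketch, not the generative approach of the Text2Sign paper cited earlier. The lexicon structure, the function names, and the default of two transition frames are assumptions for illustration.

```python
from typing import Dict, List, Sequence, Tuple

Pose = Tuple[float, ...]  # flattened joint coordinates for one frame


def interpolate(a: Pose, b: Pose, steps: int) -> List[Pose]:
    """Return `steps` poses strictly between a and b, linearly interpolated."""
    return [tuple(x + (y - x) * k / (steps + 1) for x, y in zip(a, b))
            for k in range(1, steps + 1)]


def gloss2pose(glosses: Sequence[str],
               lexicon: Dict[str, List[Pose]],
               transition_frames: int = 2) -> List[Pose]:
    """Concatenate the pose clips of each gloss with smooth transitions."""
    frames: List[Pose] = []
    for gloss in glosses:
        if gloss not in lexicon:
            raise KeyError(f"no pose clip for gloss {gloss!r}")
        clip = lexicon[gloss]
        if frames and clip:
            # Bridge from the last emitted pose to the start of the next clip.
            frames.extend(interpolate(frames[-1], clip[0], transition_frames))
        frames.extend(clip)
    return frames
```

The output is one pose per frame, which is the form a Pose2Video step would take as input.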

T11: Pose2Video
• Main tasks:
  - Study existing methods
  - Select an open-source method and implement it
  - Test it on a dataset
• Expected results:
  - A Python module
  - Quantitative evaluation and report
• Required skills:
  - Good knowledge of computer vision

An example

T12: text2action
• Main tasks:
  - Study existing methods
  - Select an open-source method and implement it
  - Test it on a dataset
• Expected results:
  - A Python module
  - Quantitative evaluation and report
• Required skills:
  - Good knowledge of computer vision

An example

Hope you enjoy the project and succeed!