
UE20CS461A – Project Phase – 2

END SEMESTER ASSESSMENT


Outline

 Abstract
 Summary of Requirements and Design (Phase 1)
 Summary of Methodology / Approach (Phase 1)
 Design Description
 Modules and Implementation Details
 Project Demonstration and Walkthrough
 Test Plan and Strategy
 Results and Discussion
 Lessons Learnt
 Conclusion and Future Work
 References
Abstract

The problem addressed in the project:

• Generating 2D faces from textual descriptions is a challenging task that requires a combination of natural language processing and computer graphics techniques.
• In this project, we propose a novel approach using Generative Adversarial Networks (GANs) to generate 2D facial models from textual descriptions.
• We explore the effectiveness of different GAN architectures and evaluate our approach using quantitative and qualitative measures.
• Our results demonstrate that our approach can generate realistic and diverse 2D facial models from textual descriptions.
Abstract

Provide a basic introduction of the project and an overview of the scope it entails:

• The scope of criminal face generation using AI involves creating synthetic images or models of suspects' faces for use in law enforcement investigations.
• This technology utilizes machine learning algorithms and data from real faces to generate new, unique faces.
• These generated faces can then be used to create composites or identikits, which can aid in the identification of criminal suspects.
• The goal is to assist the police in their investigations and increase the accuracy and efficiency of criminal identification processes.
• However, the use of AI in criminal face generation also raises privacy and ethical concerns.
Abstract

Include objective and adopted approach:

• The objective of this project is to develop advanced face-sketching software tailored for criminal departments, aiming to overcome the inefficiencies and inaccuracies inherent in the current manual process. Leveraging state-of-the-art technologies such as Generative Adversarial Networks (GANs) and diffusion models, the software aims to generate accurate 2D facial images from textual descriptions.
• The approach involves augmenting existing facial datasets with labelled Indian faces to address regional inclusivity concerns. A user-friendly interface, coupled with a face-tuning feature, will enhance the usability and accuracy of the software for law enforcement agencies. The adoption of transcribers will enable the conversion of audio descriptions to text for seamless integration.
Abstract

Set the context:

• User Interface
• Audio to Text
• Image Generation
• Face Tuning
• Different Poses
• Custom Dataset
Summary of Requirements and Design
• Summary of Requirements (a bulleted list of the major requirements):

• Functional Requirements:
Process audio for face generation.
Generate 2D faces with variations.

• Non-Functional Requirements:


Performance
Safety
Security and Privacy
Usability

• Software Requirements:
Operating System
Programming language and libraries/Frameworks

• Hardware Requirements:
Processor
GPU
Summary of Requirements and Design

• A background review of the state of the art in the relevant field, showing strengths and weaknesses.

• The current state of AI-based face sketching software for criminal


departments benefits from advancements in technologies like GANs and
machine learning, ensuring realistic and accurate facial image generation.
Efforts to include diverse cultural datasets, such as Indian faces, showcase a
commitment to representation. User-friendly interfaces and features like
face tuning enhance usability. However, challenges include potential biases
in training datasets, ethical and legal concerns, limited style control features,
and acknowledged security risks. Addressing these weaknesses is crucial for
ensuring fairness, ethical use, and robust security in the deployment of
these tools for criminal investigations.
Summary of Requirements and Design

• Design Details (Research Projects)

• The AI-powered criminal face generation project employs a novel


approach centered on Generative Adversarial Networks (GANs) to
address dataset limitations and enhance inclusivity.
• The architecture involves a GAN model for diverse and realistic image
generation, utilizing meticulously collected Indian face data.
• The user interface facilitates input through audio files.
• Performance requirements emphasize accuracy, real-time generation,
and scalability, with security measures like data encryption.
• Dependencies on data quality and assumptions regarding GAN
effectiveness are acknowledged. The project aims to contribute a
cutting-edge tool for law enforcement by navigating the intersections
of technology, ethics, and societal impact.
Summary of Methodology / Approach

Research Project

• Proposed Methodology
a) Model Architecture
Summary of Methodology / Approach

b) Details of the approach: benefits/drawbacks (an end-to-end sketch of this pipeline follows the list below)


• MP3 to Text Conversion:
  Objective: Convert MP3 files containing textual descriptions of criminals into machine-readable text.

• Text Description to GAN Module:
  Objective: Feed the textual descriptions into a GAN for criminal face image generation.

• Image Generation:
  Objective: Generate facial images based on the input textual descriptions, focusing on realism and diversity.

• Attribute Extraction using MXNet:
  Objective: Extract specific attributes (e.g., age, gender, facial expressions) from the generated images.

• Enhancements for Realism and Variation:
  • Denoising Diffusion Model:
    Objective: Improve image quality by applying a denoising diffusion model.
  • Image Angle Variation:
    Objective: Generate images from different angles for a more comprehensive dataset.
  • AttnGAN Editing for Final Image:
    Objective: Refine and edit generated images for better realism and alignment with specific attributes.
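
To tie the stages together, here is a compact orchestration sketch. Every function name is a hypothetical placeholder, not the project's actual code; concrete, runnable sketches for each stage appear in the corresponding module slides later.

```python
# Hypothetical end-to-end orchestration of the pipeline above. Each stage
# function is a placeholder; concrete sketches appear in the module slides.

def transcribe_audio(mp3_path: str) -> str:
    """Stage 1: MP3 -> text via Whisper (see the Speech-to-Text module)."""
    raise NotImplementedError

def text_to_face(description: str):
    """Stages 2-3: text -> generated face via DF-GAN (see the DF-GAN modules)."""
    raise NotImplementedError

def extract_attributes(image) -> dict:
    """Stage 4: CelebA-style attributes via MXNet (see the MXNet module)."""
    raise NotImplementedError

def enhance(image, attributes):
    """Stages 5-7: denoising diffusion, pose variation, AttnGAN-style editing."""
    raise NotImplementedError

def pipeline(mp3_path: str):
    description = transcribe_audio(mp3_path)
    face = text_to_face(description)
    attributes = extract_attributes(face)
    return enhance(face, attributes)
```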
Design Description

Research Project

• GAN and its logical workflow.


Design Description

• The logical workflow:

• The generator produces a new data point, such as an image, from an input random
noise vector.

• The generator's goal is to produce data points that are comparable to the actual data points.

• The discriminator attempts to differentiate between the genuine data points and the
created data points from the generator.

• The discriminator's objective is to correctly identify the real data points and tell
them apart from the phoney ones produced by the generator.

• As the generator attempts to trick the discriminator by producing better and more realistic data points, the discriminator learns to better distinguish between the real and generated data points. This is the adversarial training process of the GAN.
Design Description

• The generator is updated in response to feedback from the discriminator, while the discriminator is updated using both the real and generated data points.

• The discriminator becomes better at telling the difference between actual and
produced data points, and the generator gets better at producing convincing data
points that deceive the discriminator.

• The generator should ultimately produce data points that are indistinguishable from the real data points, yielding a high-quality generative model.
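
A minimal sketch of this adversarial loop, written in PyTorch with toy MLP networks purely to illustrate the workflow above; it is not the project's DF-GAN code.

```python
# Minimal GAN training loop in PyTorch illustrating the adversarial workflow
# described above (toy MLPs on flat vectors, not the project's DF-GAN).
import torch
import torch.nn as nn

z_dim, x_dim = 64, 784   # noise size, data size (e.g. flattened 28x28 images)
G = nn.Sequential(nn.Linear(z_dim, 256), nn.ReLU(), nn.Linear(256, x_dim), nn.Tanh())
D = nn.Sequential(nn.Linear(x_dim, 256), nn.LeakyReLU(0.2), nn.Linear(256, 1))

opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

def train_step(real):                        # real: (batch, x_dim) tensor
    batch = real.size(0)
    ones, zeros = torch.ones(batch, 1), torch.zeros(batch, 1)

    # 1. Update D: label real data 1, generated data 0.
    fake = G(torch.randn(batch, z_dim)).detach()
    loss_d = bce(D(real), ones) + bce(D(fake), zeros)
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # 2. Update G: try to make D label fresh fakes as real.
    fake = G(torch.randn(batch, z_dim))
    loss_g = bce(D(fake), ones)
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
    return loss_d.item(), loss_g.item()

# Example: one step on random stand-in "real" data.
print(train_step(torch.randn(16, x_dim)))
```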
Modules and Implementation Details

 Enlist all the modules/features of the application.


 Data Collection Module
 Image Download Module
 Data Preprocessing Module
 Data Receiver Module
 Machine Learning Module
 User Interface Module
Modules and Implementation Details

 Module-wise implementation details that include

 Module name, Technology used, code explanation.


1. Data Collection Module:
 Module Name: Web Scraping and Image Retrieval
 Technology Used: Octoparse for web scraping, Tab Save Chrome extension
for efficient image downloading.

2. Image Download Module:


 Module Name: Image Retrieval with Tab Save Chrome Extension
 Technology Used: Leveraging the Tab Save Chrome extension for
downloading images to the local system using the scraped links obtained
from Octoparse.
Modules and Implementation Details
3. Data Preprocessing Module:
 Module Name: Attribute Annotation and Standardization
 Technology Used: An MXNet model trained on CelebA to classify attributes of the custom dataset.

4. Data Receiver Module:


 Module Name: Speech to Text Converter
 Technology Used: Implementation of OpenAI's Whisper speech recognition
model to transcribe input audio file to text.

5. Machine Learning Module:


 Module Name: Face Generation Algorithm
 Technology Used: Deep learning techniques, specifically a Generative Adversarial Network (DF-GAN), for realistic face synthesis.
Modules and Implementation Details
 Module Name: Face Poses Algorithm
 Technology Used: Implemented a diffusion model that reconstructs the face to generate it from different angles or points of view.
 Module Name: Face Editing Algorithm
 Technology Used: Implemented a machine learning algorithm to make minimal changes to a few facial features using GANs.

6. User Interface Module:


 Module Name: User-Friendly Interface
 Technology Used: Designed an intuitive interface for users with the web-based Gradio framework for accessibility.
Modules and Implementation Details

 Interpretation with Algorithms & Pseudocode used.


(applicable for Research projects)
Modules and Implementation Details

• Custom Dataset
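
A hedged sketch of arranging the scraped images into a DF-GAN-style dataset layout (images plus one caption text file per image). The paths and the attribute-to-caption rule below are illustrative assumptions, not the project's exact script.

```python
# Hypothetical arrangement of scraped images into a text-to-image dataset
# layout (images/ plus one caption .txt per image). Paths are illustrative.
import os
import shutil

SRC = "downloads"       # images saved by the Tab Save extension (assumed folder)
DST = "data/faces"      # dataset root consumed by training

os.makedirs(f"{DST}/images", exist_ok=True)
os.makedirs(f"{DST}/text", exist_ok=True)

def caption_from_attributes(attrs):
    """Turn predicted CelebA-style attributes into a plain-text description."""
    parts = ["a young person" if attrs.get("Young") else "a middle-aged person"]
    parts.append("he has" if attrs.get("Male") else "she has")
    if attrs.get("Black_Hair"):
        parts.append("black hair")
    if attrs.get("Eyeglasses"):
        parts.append("and wears eyeglasses")
    return " ".join(parts) + "."

for i, name in enumerate(sorted(os.listdir(SRC))):
    img_id = f"face_{i:05d}"
    shutil.copy(os.path.join(SRC, name), f"{DST}/images/{img_id}.jpg")
    attrs = {"Young": True, "Male": True, "Black_Hair": True}  # from the MXNet classifier
    with open(f"{DST}/text/{img_id}.txt", "w") as f:
        f.write(caption_from_attributes(attrs))
```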
Modules and Implementation Details

 Speech to text Converter
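
A minimal sketch of this step using OpenAI's open-source Whisper package (pip install openai-whisper); the input file name is illustrative.

```python
# Transcribe an audio description to text with Whisper.
import whisper

model = whisper.load_model("base")                  # small, CPU-friendly checkpoint
result = model.transcribe("suspect_description.mp3")
description = result["text"]
print(description)  # e.g. "The suspect is a young man with black hair ..."
```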


Modules and Implementation Details

 DF-GAN Training Parameters
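
The official DF-GAN implementation is configured through YAML files; the values below are typical hyperparameters shown as a plain dict and may differ from the project's exact settings.

```python
# Illustrative DF-GAN training hyperparameters (assumed typical values).
train_cfg = {
    "z_dim": 100,        # noise vector size
    "cond_dim": 256,     # sentence-embedding size fed to the generator
    "imsize": 256,       # output image resolution
    "batch_size": 32,
    "lr_g": 1e-4,        # generator learning rate (Adam)
    "lr_d": 4e-4,        # discriminator learning rate (Adam)
    "betas": (0.0, 0.9),
    "epochs": 600,
}
```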


Modules and Implementation Details

 DF-GAN Generator Architecture
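
A simplified PyTorch sketch of a DF-GAN-style generator: the noise vector is projected to a 4x4 feature map and repeatedly upsampled, with the sentence embedding fused into every block through learned affine (scale/shift) modulation. Channel sizes and depth are illustrative, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class Affine(nn.Module):
    """Predict per-channel scale and shift from the sentence embedding."""
    def __init__(self, cond_dim, channels):
        super().__init__()
        self.gamma = nn.Linear(cond_dim, channels)
        self.beta = nn.Linear(cond_dim, channels)

    def forward(self, x, sent):
        g = self.gamma(sent).unsqueeze(-1).unsqueeze(-1)
        b = self.beta(sent).unsqueeze(-1).unsqueeze(-1)
        return x * (1 + g) + b

class UpBlock(nn.Module):
    """Upsample 2x, then fuse the text twice (a DFBlock-like unit)."""
    def __init__(self, in_ch, out_ch, cond_dim):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, 3, padding=1)
        self.fuse1 = Affine(cond_dim, out_ch)
        self.fuse2 = Affine(cond_dim, out_ch)
        self.act = nn.LeakyReLU(0.2)

    def forward(self, x, sent):
        x = nn.functional.interpolate(x, scale_factor=2)
        x = self.act(self.conv(x))
        x = self.act(self.fuse1(x, sent))
        x = self.act(self.fuse2(x, sent))
        return x

class Generator(nn.Module):
    def __init__(self, z_dim=100, cond_dim=256, ngf=64):
        super().__init__()
        self.fc = nn.Linear(z_dim, ngf * 8 * 4 * 4)
        self.blocks = nn.ModuleList([
            UpBlock(ngf * 8, ngf * 8, cond_dim),   # 4 -> 8
            UpBlock(ngf * 8, ngf * 4, cond_dim),   # 8 -> 16
            UpBlock(ngf * 4, ngf * 2, cond_dim),   # 16 -> 32
            UpBlock(ngf * 2, ngf, cond_dim),       # 32 -> 64
        ])
        self.to_rgb = nn.Sequential(nn.Conv2d(ngf, 3, 3, padding=1), nn.Tanh())

    def forward(self, z, sent):
        x = self.fc(z).view(z.size(0), -1, 4, 4)
        for block in self.blocks:
            x = block(x, sent)
        return self.to_rgb(x)

G = Generator()
img = G(torch.randn(2, 100), torch.randn(2, 256))  # -> (2, 3, 64, 64)
```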


Modules and Implementation Details

• DF-GAN Discriminator Architecture
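
A matching-aware discriminator sketch in the DF-GAN style: the image is downsampled, then the spatially replicated sentence embedding is concatenated before the final decision layers. Channel sizes are illustrative.

```python
import torch
import torch.nn as nn

class Discriminator(nn.Module):
    def __init__(self, cond_dim=256, ndf=64):
        super().__init__()
        self.down = nn.Sequential(                       # 64 -> 4 spatial
            nn.Conv2d(3, ndf, 4, 2, 1), nn.LeakyReLU(0.2),
            nn.Conv2d(ndf, ndf * 2, 4, 2, 1), nn.LeakyReLU(0.2),
            nn.Conv2d(ndf * 2, ndf * 4, 4, 2, 1), nn.LeakyReLU(0.2),
            nn.Conv2d(ndf * 4, ndf * 8, 4, 2, 1), nn.LeakyReLU(0.2),
        )
        self.judge = nn.Sequential(                      # image + text -> logit
            nn.Conv2d(ndf * 8 + cond_dim, ndf * 2, 3, 1, 1), nn.LeakyReLU(0.2),
            nn.Conv2d(ndf * 2, 1, 4, 1, 0),
        )

    def forward(self, img, sent):
        feat = self.down(img)                            # (B, ndf*8, 4, 4)
        sent = sent.unsqueeze(-1).unsqueeze(-1).expand(-1, -1, 4, 4)
        return self.judge(torch.cat([feat, sent], dim=1)).view(-1)

D = Discriminator()
logit = D(torch.randn(2, 3, 64, 64), torch.randn(2, 256))  # -> (2,)
```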


Modules and Implementation Details

• DF-GAN Training Script
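
A sketch of one DF-GAN-style training step with hinge loss and matching-aware conditioning: the discriminator sees (real, matched text), (real, mismatched text), and (fake, matched text). It reuses the Generator and Discriminator sketches above; the project's actual script may also include the paper's matching-aware gradient penalty (MA-GP), omitted here.

```python
import torch

def train_step(G, D, opt_g, opt_d, real, sent):
    batch = real.size(0)
    z = torch.randn(batch, 100)
    mismatched = sent.roll(1, dims=0)            # wrong captions for this batch

    # Discriminator: hinge loss over the three pairings.
    fake = G(z, sent).detach()
    loss_d = (torch.relu(1.0 - D(real, sent)).mean()
              + 0.5 * torch.relu(1.0 + D(real, mismatched)).mean()
              + 0.5 * torch.relu(1.0 + D(fake, sent)).mean())
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # Generator: raise D's score on (fake, matched text).
    loss_g = -D(G(z, sent), sent).mean()
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
    return loss_d.item(), loss_g.item()

# Usage with the sketches from the previous slides.
G, D = Generator(), Discriminator()
opt_g = torch.optim.Adam(G.parameters(), 1e-4, betas=(0.0, 0.9))
opt_d = torch.optim.Adam(D.parameters(), 4e-4, betas=(0.0, 0.9))
print(train_step(G, D, opt_g, opt_d, torch.randn(4, 3, 64, 64), torch.randn(4, 256)))
```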


Modules and Implementation Details

• Editing Script
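
A simplified illustration of the editing idea: keep the noise vector fixed and regenerate with a modified description, so only text-driven features change. The project's actual editing uses an AttnGAN-style word-level mechanism; the encode_text helper below is a hypothetical stand-in for a real text encoder.

```python
import torch

def edit_face(G, encode_text, z, old_desc, new_desc):
    """Return (original, edited) images generated from the same noise z."""
    original = G(z, encode_text(old_desc))
    edited = G(z, encode_text(new_desc))     # same z -> minimal, targeted change
    return original, edited

# Usage with the Generator sketch above and a placeholder encoder.
G = Generator()
encode_text = lambda s: torch.randn(1, 256)   # placeholder, not a real encoder
z = torch.randn(1, 100)
orig, edited = edit_face(G, encode_text, z,
                         "black hair", "black hair, wears eyeglasses")
```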
Modules and Implementation Details

• Editing Script (continued)
Modules and Implementation Details

 Generating Poses Script
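
One hedged way to obtain pose variations with a diffusion model, using the open-source diffusers library in image-to-image mode; this illustrates the idea and is not necessarily the project's exact pose model.

```python
# Re-render a generated face at different poses via image-to-image diffusion.
import torch
from diffusers import StableDiffusionImg2ImgPipeline
from PIL import Image

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")                                   # assumes a CUDA GPU is available

face = Image.open("generated_face.png").convert("RGB").resize((512, 512))
poses = ["front view", "left three-quarter view", "right profile view"]

for i, pose in enumerate(poses):
    out = pipe(prompt=f"photo of the same person, {pose}",
               image=face, strength=0.5).images[0]   # lower strength stays closer to the input
    out.save(f"pose_{i}.png")
```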
Modules and Implementation Details

• MXNet Attribute Classifier
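
A sketch of CelebA-attribute inference with MXNet; the checkpoint file names, input size, and the sigmoid multi-label head are assumptions about the pretrained model.

```python
import mxnet as mx

# Load an exported model (face_attr-symbol.json / face_attr-0000.params are
# hypothetical file names for the CelebA-trained checkpoint).
net = mx.gluon.SymbolBlock.imports("face_attr-symbol.json", ["data"],
                                   "face_attr-0000.params", ctx=mx.cpu())

ATTRS = ["Male", "Young", "Black_Hair", "Eyeglasses"]  # first few of CelebA's 40

def classify(image_path):
    img = mx.image.imread(image_path)                   # HWC, uint8
    img = mx.image.imresize(img, 128, 128)
    x = mx.nd.transpose(img.astype("float32") / 255.0, axes=(2, 0, 1))
    logits = net(x.expand_dims(axis=0))
    probs = mx.nd.sigmoid(logits)[0].asnumpy()          # multi-label output
    return {a: bool(p > 0.5) for a, p in zip(ATTRS, probs)}

print(classify("generated_face.png"))
```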
Project Demonstration

 Exhibit the working demonstration of the complete project.
Project Demonstration

 Dataset creation


Walkthrough

Demo and product walkthrough


Test Plan and Strategy

Provide:
• Testing activities that are carried out, along with the timeline.
• What are the test methods followed, and why?
  (Example) Functional Testing (Unit, Integration, …)
  Non-Functional Testing (Performance, Security, …)
• What is the test environment? (Explain the role of each member in the team.)
• Benefits of this approach, and are there any drawbacks?
• Test tools used? Automated test tools? Open-source tools?

Note:
• Appropriate modifications can be made for Research Projects.
• Add as many slides as required.
Results and Discussion

Results and discussions on the experimentation conducted after testing.

Are the results the same as expected? Are they as per the initial estimates planned? If there is a deviation, give the reasons for the change.

Results obtained in comparison with other technologies/methodologies, including graphs/charts (if applicable).

Product-based projects can explain how the product meets the requirements. Clearly tie each test to the requirement (forward and backward traceability).
Schedule

• Discussion of how well the schedule was met.
• You can add a table of planned efforts (as per initial estimates) and actual efforts.
• If there is a deviation, what is the reason for the change?
Documentation

Show the evidence and status of the documents below:


 Project report submitted to the department?
 Current status of the paper in IEEE (or similar) format? Which conferences are you targeting?
 Video (2-3 minutes) of your project? Please play.
 Add the GitHub repository link.
 All artifacts of your project uploaded to the CSE Project repository?
Lessons Learnt

• Discuss the lessons learned and what you could have done differently, knowing what you now know.

• Give an overview of the issues that have been overcome in this project.


Conclusion and Future work

 Successfully created a custom dataset with a higher diversity of Indian faces.
 Successfully implemented a GAN pipeline that takes an audio file as input, generates a realistic image based on the textual description, and also provides slight modifications and alternative poses for the generated image.

 Further improvements to this project include increasing the size of the dataset and possibly adding annotation files for the images.
 Other future work includes improving the accuracy of the images generated by the GAN and streamlining the overall pipeline.
References

Tao Xu, Pengchuan Zhang, Qiuyuan Huang, Han Zhang, Zhe Gan, Xiaolei Huang, Xiaodong He, "AttnGAN: Fine-Grained Text to Image Generation with Attentional Generative Adversarial Networks," 2017.

Ming Tao, Hao Tang, Fei Wu, Xiaoyuan Jing, Bing-Kun Bao, Changsheng Xu, "DF-GAN: A Simple and Effective Baseline for Text-to-Image Synthesis," 2022.

Minfeng Zhu, Pingbo Pan, Wei Chen, Yi Yang, "DM-GAN: Dynamic Memory Generative Adversarial Networks for Text-to-Image Synthesis," 2019.
References

Jianxin Sun, Qiyao Deng, Qi Li, Muyi Sun, Min Ren, Zhenan Sun, "AnyFace: Free-style Text-to-Face Synthesis and Manipulation," 2022.

Harsh Jaykumar Jalan, Gautam Maurya, Canute Corda, "Suspect Face Generation," July 2020.

Osaid Rehman Nasir, Shailesh Kumar Jha, Manraj Singh Grover, Yi Yu, Ajit Kumar, Rajiv Ratn Shah, "Text2FaceGAN: Face Generation from Fine Grained Textual Descriptions," 2019.
References

Tingting Qiao, Jing Zhang, Duanqing Xu, Dacheng Tao, "MirrorGAN: Learning Text-to-Image Generation by Redescription," 2019.

Bowen Li, Xiaojuan Qi, Thomas Lukasiewicz, Philip H. S. Torr, "Controllable Text-to-Image Generation," 2019.

Muhammad Zeeshan Khan, Saira Jabeen, Muhammad Usman Ghani Khan, Tanzila Saba, Asim Rehmat, Amjad Rehman, Usman Tariq, "A Realistic Image Generation of Face From Text Description Using the Fully Trained Generative Adversarial Network," August 2020.
References

Jupiter Tamrakar, Bal Krishna Nyaupane, "Synthesizing Human Face Image from Textual Description of Facial Attributes Using Attentional Generative Adversarial Network," conference paper, October 2021.

Yuming Jiang, Shuai Yang, Haonan Qiu, "Text2Human: Text-Driven Controllable Human Image Generation," July 2022.
Any other information

Provide any other information you wish to add.

Note: Changes can be made in the template, with the consent of the guide, for the inclusion of any other information.
Thank You
