
DEPARTMENT OF ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING

AIP67: MINI PROJECT TERM: MARCH 2024 – JUNE 2024

Project Synopsis

Multi-Stage Deep Learning Framework for Reconstructing Obscured Faces in CCTV Footage with Multi-Angle Generation

Under the guidance of


Dr. Meeradevi A. K.

PROJECT TEAM MEMBERS

Sl. No. USN Name

1. 1MS21AI056 Tanishka Deep

2. 1MS21AI015 Chitransh Srivastava

3. 1MS21AI011 Siddharth Bhetariya

4. 1MS21AI054 Sujal Prakash Singh

M.S. RAMAIAH INSTITUTE OF TECHNOLOGY


(Autonomous Institute, Affiliated to VTU)
Multi-Stage Deep Learning Framework for Reconstructing Obscured
Faces in CCTV Footage with Multi-Angle Generation
Problem Statement:
Facial recognition systems often struggle to identify and analyse faces that are obscured by poor image quality, blurring, mask-like occlusions, or unfavourable viewing angles. This is a common problem for law enforcement agencies, security professionals, and anyone using facial recognition for surveillance: it leads to missed opportunities to apprehend criminals and identify suspicious activity, and makes video surveillance systems less effective overall. In this project, we propose a deep learning framework specifically designed to reconstruct faces that are obscured in CCTV footage. The goal is to produce a clearer, more accurate representation of each face, ultimately improving the security and identification capabilities of facial recognition systems.

Motivation:
Facial recognition technology needs clear images to operate correctly; even minor blur or distortion reduces recognition accuracy. Current systems struggle with suboptimal CCTV footage, such as blurred, obscured, or poorly angled faces, which undermines their utility for the law enforcement agencies and security personnel who rely on facial recognition for surveillance.

Situations like the Bengaluru coffee shop blast highlight the limitations of current facial recognition technology. Investigators used AI systems to analyse CCTV footage of masked suspects, but standard face recognition struggles with blurry, blocked, or angled faces captured by low-quality CCTV cameras. Our project develops a deep learning system to reconstruct faces from these challenging images, producing clearer facial images even when parts are obscured or captured at poor angles, and thereby improving facial recognition accuracy and the security and identification capabilities that depend on it.

Objectives & Scope of the Proposed Project:


● Researching and implementing techniques that handle the main types of obscuration: low image quality, blurring, mask-like occlusions, and poor viewing angles.
● Acquiring and preprocessing diverse CCTV footage datasets, including data augmentation and normalisation.
● Curating a varied high-resolution facial image dataset with diverse poses, ethnicities, and genders represented, and applying data augmentation (cropping, flipping, artificial occlusions) to improve the model's ability to generalise.
● Developing a deep learning framework, including architectural design, appropriate model selection for each stage (cascaded diffusion models, conditional GANs, generative flow-based models), and implementation of the training process.

● Generating clearer representations: the reconstructed faces should be more detailed and clearer than the original obscured footage, enabling further analysis and possible identification.
● Exploring multi-angle generation techniques that reconstruct facial features from different perspectives, compensating for suboptimal camera angles using a generative flow-based model.
● Assessing the performance of the framework through metrics such as landmark localization accuracy and user studies evaluating the realism and quality of the reconstructed faces.
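The landmark localization accuracy mentioned above is commonly reported as a normalized mean error (NME): the average pixel distance between predicted and ground-truth landmarks, divided by a reference distance such as the inter-ocular distance. A minimal NumPy sketch (the landmark values here are illustrative, not project data):

```python
import numpy as np

def normalized_mean_error(pred, gt, norm_dist):
    """Mean Euclidean error over landmarks, normalized by a
    reference distance (e.g. inter-ocular distance)."""
    errors = np.linalg.norm(pred - gt, axis=1)  # per-landmark pixel error
    return errors.mean() / norm_dist

# Toy example: 5 landmarks, predictions offset by 2 px, inter-ocular 50 px
gt = np.array([[30, 40], [70, 40], [50, 60], [40, 80], [60, 80]], dtype=float)
pred = gt + np.array([2.0, 0.0])      # uniform 2-px horizontal offset
nme = normalized_mean_error(pred, gt, norm_dist=50.0)
print(nme)  # 2 px / 50 px = 0.04
```

Lower NME means better localization; user studies would complement this with subjective realism ratings.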

Proposed Methodology:
Data Preparation:

● We gather a diverse dataset of high-resolution facial images and apply variations (cropping,
colour adjustments) to improve model generalizability.
● Techniques like blurring and masking are applied to the training images to simulate occlusions commonly found in CCTV footage.
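The augmentation steps above can be sketched in NumPy. This is a minimal illustration, assuming single-channel face crops; the rectangle used as the mask-like occlusion and the noise level are hypothetical choices, not the project's tuned values:

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_occlusion(img, rng):
    """Augment a face crop with the degradations described above:
    random horizontal flip, additive sensor noise, and a mask-like
    occlusion over the lower half of the face."""
    out = img.astype(np.float32)
    if rng.random() < 0.5:                     # random horizontal flip
        out = out[:, ::-1]
    out = out + rng.normal(0.0, 5.0, out.shape)  # mild Gaussian noise
    h, w = out.shape[:2]
    # Mask-like occlusion: black out a lower-central rectangle
    out[h // 2:, w // 4: 3 * w // 4] = 0.0
    return np.clip(out, 0, 255).astype(np.uint8)

face = rng.integers(0, 256, (64, 64), dtype=np.uint8)  # stand-in face crop
aug = simulate_occlusion(face, rng)
print(aug.shape)  # (64, 64): shape preserved, lower-central region zeroed
```

Pairing each clean image with its degraded copy gives the supervised (input, target) pairs the reconstruction stages train on.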

Stage 1: Denoising and Enhancement

● A cascaded diffusion model effectively removes noise often present in low-quality CCTV
videos.
● This stage significantly improves the overall image quality, revealing crucial facial details
obscured by noise.
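Diffusion models like the one in Stage 1 learn to invert a gradual noising process. A NumPy sketch of the standard forward process q(x_t | x_0) illustrates the idea; the linear beta schedule below is a common illustrative choice, not the project's tuned schedule:

```python
import numpy as np

# Linear beta schedule over T steps (illustrative values, not tuned)
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alpha_bar = np.cumprod(1.0 - betas)            # cumulative signal retention

def q_sample(x0, t, rng):
    """Sample x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * noise."""
    eps = rng.normal(size=x0.shape)
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps

rng = np.random.default_rng(0)
x0 = rng.normal(size=(64, 64))                  # stand-in image patch
xt_early = q_sample(x0, t=10, rng=rng)          # mostly signal
xt_late = q_sample(x0, t=999, rng=rng)          # mostly noise

# Correlation with the clean image decays as t grows
corr = lambda a, b: np.corrcoef(a.ravel(), b.ravel())[0, 1]
print(corr(x0, xt_early) > corr(x0, xt_late))  # True
```

The denoising network is trained to run this process in reverse; cascading several such models at increasing resolutions yields the enhancement described above.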

Stage 2: Facial Region and Feature Extraction

● Pre-trained facial detection models pinpoint the face location within the denoised image.
● Subsequently, techniques like those used in Dlib extract key facial features (eyes, nose,
mouth) for further processing.

Stage 3: Face Reconstruction

● A conditional GAN, specifically an AttnGAN, focuses on the facial region based on the
pre-processed image and extracted landmarks.
● This stage effectively reconstructs a realistic and detailed face that aligns with the
underlying details in the denoised image.
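The proposal's choice is an attention-based conditional GAN; as a much-simplified stand-in, a minimal conditional generator that maps noise plus the extracted landmarks to a face patch can be sketched in PyTorch. The class, layer sizes, and landmark encoding here are illustrative assumptions, not the project's architecture:

```python
import torch
import torch.nn as nn

class CondGenerator(nn.Module):
    """Minimal conditional generator: noise + landmark coordinates -> face
    patch. A simplified stand-in for the conditional GAN described above."""
    def __init__(self, z_dim=64, n_landmarks=68, size=32):
        super().__init__()
        self.size = size
        in_dim = z_dim + n_landmarks * 2        # noise + flattened (x, y) pairs
        self.net = nn.Sequential(
            nn.Linear(in_dim, 256), nn.ReLU(),
            nn.Linear(256, size * size), nn.Tanh(),  # pixels in [-1, 1]
        )

    def forward(self, z, landmarks):
        x = torch.cat([z, landmarks.flatten(1)], dim=1)
        return self.net(x).view(-1, 1, self.size, self.size)

g = CondGenerator()
z = torch.randn(4, 64)                          # batch of 4 noise vectors
lm = torch.rand(4, 68, 2)                       # normalized landmark coords
fake = g(z, lm)
print(fake.shape)  # torch.Size([4, 1, 32, 32])
```

Conditioning on landmarks anchors the generated face to the geometry recovered in Stage 2; the adversarial discriminator (omitted here) pushes the output toward realism.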

Stage 4: Multi-Angle Generation

● A generative flow-based model like StyleFlow allows manipulation of style vectors to generate the reconstructed face from various viewpoints.
● This offers a more comprehensive view for identification purposes.
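The style-vector manipulation above amounts to moving a latent code along a learned pose direction and re-rendering. A NumPy sketch of the editing step only; the unit "pose direction" here is a hypothetical stand-in for what the flow model would actually learn, and the renderer is omitted:

```python
import numpy as np

def pose_edits(w, pose_direction, angles):
    """Sketch of latent pose editing: shift a style vector w along a
    (hypothetical) learned pose direction to obtain latents for the same
    identity at several yaw angles."""
    return [w + a * pose_direction for a in angles]

rng = np.random.default_rng(0)
w = rng.normal(size=512)                  # style vector for one identity
pose_dir = rng.normal(size=512)
pose_dir /= np.linalg.norm(pose_dir)      # unit-length edit direction

views = pose_edits(w, pose_dir, angles=[-0.6, -0.3, 0.0, 0.3, 0.6])
print(len(views))  # 5 latents; angle 0.0 leaves the identity latent unchanged
```

Feeding each edited latent through the generator yields the multi-angle renderings used for identification.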

Hardware & Software to be used:


● Hardware: high-performance computing systems equipped with an NVIDIA RTX 30/40-series GPU (8 GB GDDR6, e.g. a GIGABYTE card with a modular SMPS) to accelerate deep learning computations.
● Software: Python for model development, the TensorFlow or PyTorch deep learning frameworks, and supporting libraries such as NumPy and OpenCV (for facial landmark detection functionality similar to Dlib/MediaPipe).

Outcome of the Proposed Project:

The expected outcome is improved facial reconstructions that enable deeper analysis, potentially including views from multiple angles. The broader objectives include enhancing identification and security procedures across a range of applications: more precise facial recognition can improve crime prevention, criminal apprehension, and the identification of suspicious activity. These anticipated results show the project's potential to advance facial recognition technology and its applications in surveillance and security.

What contribution to society would the project make?


● It could have a major positive impact on public safety by increasing the accuracy of facial
recognition systems in identifying people from CCTV footage.
● Criminal Apprehension: higher rates of criminal apprehension, an enhanced capacity to spot suspicious activity immediately, and more successful criminal investigations and prosecutions.
● Enhanced Security Measures: more dependable facial recognition can strengthen access control at sensitive locations, border protection and identification of people of interest, and loss prevention in retail settings.
● Missing Persons Investigations: when initial CCTV footage provides little information, reconstructing obscured faces could be essential to locating a missing person.
● Medical Applications (Potential): as this technology develops, it may be applied in medical settings to reconstruct the appearance of faces injured in accidents or altered by surgery.

References
[1] Goodfellow, I. J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., ... &
Bengio, Y. (2014). Generative adversarial nets. In Advances in neural information processing
systems (pp. 2672-2680).

[2] Kingma, D. P., & Welling, M. (2013). Auto-encoding variational bayes. arXiv preprint
arXiv:1312.6114.

[3] Radford, A., Metz, L., & Chintala, S. (2015). Unsupervised representation learning with deep
convolutional generative adversarial networks. arXiv preprint arXiv:1511.06434.

[4] Karras, T., Laine, S., & Aila, T. (2019). A style-based generator architecture for generative
adversarial networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and
Pattern Recognition (pp. 4401-4410).

[5] Ho, J., Jain, A., & Abbeel, P. (2020). Denoising diffusion probabilistic models. Advances in
Neural Information Processing Systems, 33, 6840-6851.

[6] Dollár, P., Welinder, P., & Perona, P. (2010). Cascaded pose regression. In Proceedings of the
IEEE Conference on Computer Vision and Pattern Recognition (pp. 1078-1085).

[7] Parkhi, O. M., Vedaldi, A., Zisserman, A., & Jawahar, C. (2015). Deep face recognition. In Proceedings of the British Machine Vision Conference.

[8] AbdAlmageed, W., Wu, Y., Rawls, S., Harel, S., Hassner, T., Masi, I., ... & Medioni, G.
(2016). Face recognition using deep multi-pose representations. In 2016 IEEE Winter Conference
on Applications of Computer Vision (WACV) (pp. 1-9). IEEE.

[9] Taigman, Y., Yang, M., Ranzato, M., & Wolf, L. (2014). DeepFace: Closing the gap to human-
level performance in face verification. In Proceedings of the IEEE Conference on Computer Vision
and Pattern Recognition (pp. 1701-1708).

[10] Zhang, K., Zhang, Z., Li, Z., & Qiao, Y. (2016). Joint face detection and alignment using
multitask cascaded convolutional networks. IEEE Signal Processing Letters, 23(10), 1499-1503.

Plagiarism Report:

Guide’s Comments:

Signature of the Guide with date

