Unified Model for Multiple Human Centric Vision Tasks
Vaibhav Sharma
Enrollment No: T23156
March 4, 2024
Figure: $f(X) = f_3(f_2(f_1(X)))$
Figure: $Y = f(X; W)$
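The two figure captions above describe a network as a composition of layers and as a function of its input X and weights W. A minimal sketch of that idea, assuming three small fully connected layers with ReLU; the layer sizes, the `layer` helper, and the random weights are illustrative assumptions, not taken from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)

def layer(W):
    """Return a function x -> relu(x @ W) for a fixed weight matrix W."""
    return lambda x: np.maximum(x @ W, 0.0)

# Hypothetical layer sizes: 8 -> 16 -> 16 -> 4
W1 = rng.normal(size=(8, 16))
W2 = rng.normal(size=(16, 16))
W3 = rng.normal(size=(16, 4))
f1, f2, f3 = layer(W1), layer(W2), layer(W3)

def f(X):
    """Full model f(X) = f3(f2(f1(X))); its parameters are W = (W1, W2, W3)."""
    return f3(f2(f1(X)))

X = rng.normal(size=(5, 8))   # batch of 5 inputs
Y = f(X)                      # Y = f(X; W)
print(Y.shape)                # (5, 4)
```

Training such a model means adjusting W1, W2, and W3 so that Y matches the targets; the composition itself stays the same.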
Vaibhav Sharma, T23156 Unified Model for Multiple Human Centric Vision Tasks 5
Unified model for multiple tasks
Paper 1
About the paper
Paper 2
Transformer
The Transformer architecture
Vision Transformer
▶ Uses the Transformer encoder; an image is fed in as a sequence
of 16 × 16 patches, analogous to a sequence of words in NLP.
▶ Works by pre-training on a large dataset and then
fine-tuning for a specific task.
▶ Outperforms convolution-based networks such as ResNet when
pre-trained on a large dataset.
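The patch step in the first bullet can be sketched as follows. This is a minimal illustration, assuming a 224 × 224 RGB input and a hypothetical embedding width `D`; the `patchify` helper and the random projection `E` are stand-ins for illustration, and the class token and position embeddings used by the full model are left out:

```python
import numpy as np

def patchify(image, patch=16):
    """Split an (H, W, C) image into a sequence of flattened patch vectors."""
    H, W, C = image.shape
    assert H % patch == 0 and W % patch == 0, "image size must be divisible by patch size"
    gh, gw = H // patch, W // patch
    x = image.reshape(gh, patch, gw, patch, C)
    x = x.transpose(0, 2, 1, 3, 4)            # (gh, gw, patch, patch, C)
    return x.reshape(gh * gw, patch * patch * C)

rng = np.random.default_rng(0)
image = rng.random((224, 224, 3))             # assumed input resolution
tokens = patchify(image)                      # 14 * 14 = 196 patches, 16 * 16 * 3 = 768 values each

# Hypothetical learned linear projection to the model width D
D = 64
E = rng.normal(size=(tokens.shape[1], D))
embedded = tokens @ E                         # one D-dimensional token per patch
print(tokens.shape, embedded.shape)           # (196, 768) (196, 64)
```

The resulting rows play the role that word embeddings play in an NLP Transformer: the encoder sees a sequence of 196 tokens in row-major patch order.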
UniHCP Architecture
Paper 4
Paper 5
Recent Advances in Unified Models
▶ Two notable papers released in 2023 take an approach similar
to UniHCP's to building unified models.
▶ The first paper addresses six tasks and introduces the
Projector Assisted Hierarchical Pretraining Method (PATH),
which uses a hierarchical architecture, in contrast to UniHCP's
plain Transformer and decoder approach.
▶ The second paper, released in December 2023, introduces
HQNet, a flexible model that learns a single shared "human
query". It consists of four key components: a backbone, a
Transformer encoder, a Transformer decoder, and task-specific
heads.
▶ This paper also contributes a dataset for human perception
tasks, COCO-UniHuman, created by adding annotations to the
COCO dataset.
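The idea of one shared query feeding several task-specific heads can be sketched roughly as below. This is not HQNet's actual implementation: the query width `D`, the head names `box` and `keypoints`, and the single linear map per head are all hypothetical choices made for illustration, and the random array stands in for the output of the Transformer decoder.

```python
import numpy as np

rng = np.random.default_rng(0)

D = 32            # hypothetical query width
num_people = 3    # hypothetical number of person instances

# Stand-in for the decoder output: one shared query vector per person.
human_queries = rng.normal(size=(num_people, D))

# Hypothetical task-specific heads; each is a single linear map here.
heads = {
    "box": rng.normal(size=(D, 4)),          # 4 box coordinates
    "keypoints": rng.normal(size=(D, 34)),   # 17 keypoints x (x, y)
}

# Every head reads the same per-person query; only the heads differ per task.
outputs = {name: human_queries @ W for name, W in heads.items()}
for name, out in outputs.items():
    print(name, out.shape)
```

Under this arrangement, supporting another task amounts to adding another entry to `heads`, while the shared representation is reused unchanged.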
Conclusion and Future work
The studies I reviewed not only demonstrate that a unified model
for multiple related tasks is feasible but also show that it
performs better than existing models, both with and without
fine-tuning for specific tasks.
Thank you
t23156@students.iitmandi.ac.in