
Virtual Fitting and Styling using AI

Mrs. Kiran Kumari1, Neelanjan De2, Bharat Dhedhia3, Tarash Budhran4, Parth Jain4


1 Assistant Prof., IT Dept., KJSCE, SVU, Mumbai, India
2,3,4Student, IT Dept., KJSCE, SVU, Mumbai, India

Abstract
Technology breakthroughs are driving a significant transition in the shopping sector. This paper addresses the simulation
of how clothing appears on individuals through 3D models, with particular emphasis on recent advances in image-based 3D
human shape estimation driven by deep neural networks. Due to memory limitations, previous approaches tend to take
low-resolution images as input to cover a large spatial context, and consequently produce less precise 3D estimates. To
overcome this limitation, we used PIFu, a multi-level architecture with two levels. The coarse level processes
the entire image at a lower resolution, emphasizing holistic reasoning for context, while the fine level refines the estimates
using higher-resolution images to capture intricate details. The model is conditioned on both the 3D representation and
clothing, enabling the creation of 3D models from images and facilitating the simulation of diverse outfits on different
human images. This paper aims to revolutionize the online clothing shopping experience. By offering a solution that
allows virtual try-ons, users can visualize how clothes look and fit before buying, potentially reducing return rates and
enhancing the overall shopping journey.

Keywords: Deep neural network, 3D models, Virtual fitting

1. Introduction
With the explosion of e-retail stores and online payment systems, more and more people have come to embrace online
shopping. In 2021 alone, the number of online shoppers rose to 2.14 billion, approximately 28% of the global
population.

The primary purpose of this paper is to develop an AI system capable of generating a person's 3D model from a
user-provided image and enabling the virtual trial of various outfits on this 3D representation. This 3D Virtual
Fitting and Styling model aims to offer a comprehensive, 360-degree visualization of how a product would appear and
fit on the individual. Users can select different clothing items and visualize a combined styled look. The project
goals include designing a fully customizable 3D human body model from the customer's image, ensuring customer
satisfaction with clothing choices.

The model used in the application is PIFu (Pixel-Aligned Implicit Function), a method for 3D human body
reconstruction from a single image. Single-level PIFu and multi-level PIFu refer to different variants or
improvements of the PIFu algorithm.

The application will help in increasing clothing sales and improving the shopping experience by satisfying every
customer with the best clothes and reliable services. The AI system computes a person’s 3D model from an image
provided by the user, and allows them to try different clothes on the 3D model representation in a 360-degree view.

The development of the Virtual Fitting and Styling system will revolutionize the fashion industry and bring a new
level of convenience to the online shopping experience. The 3D model is dressed with different clothes to provide a
360-degree view of how each product will look and fit on the individual, and users can further personalize the virtual
representation by modifying attributes such as hairstyle, hair color, and skin color, allowing them to experiment
with diverse looks and styles.
2. Literature Survey

Each entry below lists the paper, its authors, the venue and year of publication, the algorithm used, and the reported efficiency of that algorithm.

1. Unrestricted facial geometry reconstruction using image-to-image translation [1] — M. Sela, E. Richardson, and R. Kimmel; Proceedings of the IEEE International Conference on Computer Vision, 2017. Algorithm: an image-to-image translation network for facial geometry reconstruction. Efficiency: the network generalizes well but is still limited by the synthetic training data; the absolute error, given in percent of the ground-truth depth, is up to 30%.

2. Siclope: Silhouette-based clothed people [2] — R. Natsume et al.; IEEE Conference on Computer Vision and Pattern Recognition, 2019. Algorithm: a silhouette-based representation that produces 3D models of clothed human bodies from single-view images. Efficiency: for the deep visual hull algorithm, the paper's view-sampling strategy outperforms 69% of randomly selected views, while for the naive visual hull method it always outperforms random view selection.

3. Tex2shape: Detailed full human body geometry from a single image [3] — T. Alldieck, G. Pons-Moll, C. Theobalt, and M. Magnor; IEEE International Conference on Computer Vision (ICCV), 2019. Algorithm: Tex2Shape, an image-to-image translation approach that leverages UV mapping. Efficiency: the experiments demonstrate that Tex2Shape generalizes robustly to real-world footage; the absolute error, in percent of the ground-truth depth, is up to 37%.

4. Group normalization [4] — Y. Wu and K. He; European Conference on Computer Vision, 2018. Algorithm: Group Normalization (GN), presented as a simple and effective alternative to batch normalization that specifically addresses the challenges posed by smaller batch sizes. Efficiency: the 64-frame variant of GN reaches 74.5 / 91.7 accuracy, showing healthy gains over its BN counterpart and all BN variants.

5. 3D Human Body Reconstruction from a Single Image via Volumetric Regression [5] — S. Jackson, C. Manafas, and G. Tzimiropoulos; European Conference on Computer Vision Workshops, 2018. Algorithm: a novel approach to 3D human body reconstruction using volumetric regression networks. Efficiency: the method reconstructs with an IoU of 78%, likely due to the better spatial alignment between the training images and targets.

6. Moulding Humans: Non-parametric 3D Human Shape Estimation from Single Images [6] — V. Gabeur et al.; International Conference on Computer Vision, 2019. Algorithm: a non-parametric approach to 3D human shape estimation using a double-depth-map representation; it addresses the limitations in detail and resolution encountered by existing approaches and incorporates adversarial training to enhance the realism of the 3D reconstructions. Efficiency: quantitative improvements are not reported in the paper.

7. Image-Based Virtual Try-On Network [7] — B. Fele, A. Lampe, P. Peer, and V. Struc; IEEE Xplore, 2017. Algorithm: the paper presents SPD as a multi-task segmentation approach; it is evaluated on standard datasets, and ablation studies provide further insight into the effectiveness of the multi-task learning strategy. Efficiency: the absolute error, in percent of the ground-truth depth, is up to 40%.

8. Stacked hourglass networks for human pose estimation [8] — A. Newell, K. Yang, and J. Deng; European Conference on Computer Vision, 2016. Algorithm: an architecture for human pose estimation that leverages ConvNets and introduces a more symmetric topology. Efficiency: the approach outperforms existing methods on standard benchmarks, evaluated with the Percentage of Correct Keypoints (PCK) metric, which reports the percentage of detections falling within a normalized distance of the ground truth; the reported results reach 99% PCK@0.2 on the elbow and 97% on the wrist.

9. Multi-garment net: Learning to dress 3D people from images [9] — L. Bhatnagar et al.; IEEE International Conference on Computer Vision (ICCV), 2019. Algorithm: the Multi-Garment Network (MGN), a novel model that infers the human body and layered garments separately from images. Efficiency: the model was validated with a mean vertex-to-surface error of 5.78 mm.

10. SMPL: A skinned multi-person linear model [10] — M. Loper et al.; ACM Transactions on Graphics, 2015. Algorithm: the SMPL model, learned from a large dataset and compatible with existing graphics pipelines; it outperforms previous models in accuracy, and its extension to soft-tissue deformations enhances realism. Efficiency: the absolute error, in percent of the ground-truth depth, is up to 55%.
3. Methodology

The proposed application is built on the Pixel-Aligned Implicit Function (PIFu), a sophisticated model that predicts,
for points in three-dimensional space, whether they lie inside the surface of the person depicted in a 2D image. The
PIFu architecture is subdivided into two layers. The feature extraction layer is responsible for capturing relevant
information from the input 2D image. This layer plays a crucial role in encoding the contextual details of the image,
such as the pose, body contours, and overall appearance. By extracting features at different scales, the network gains
the ability to understand both global and local characteristics of the input image, facilitating a more comprehensive
understanding of the 3D structure.

The surface reconstruction layer follows the feature extraction layer and focuses on generating the 3D representation
from the extracted features. PIFu employs pixel-aligned sampling: for each pixel in the 2D image, the occupancy of the
corresponding points along that pixel's ray is estimated. The surface reconstruction layer refines the initial
predictions, aligning them more accurately with the observed 2D image. The combination of these two layers allows PIFu
to transform 2D images into detailed and realistic 3D human models, making it a valuable tool in computer vision and
graphics applications. This dual-layered approach allows for a comprehensive understanding of the spatial occupancy
of objects, contributing to the model's accuracy and robustness.
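The pixel-aligned sampling and implicit-function evaluation described above can be sketched as follows. This is a minimal NumPy illustration under toy assumptions (a random feature map, random MLP weights, and an orthographic camera), not the actual PIFu implementation:

```python
import numpy as np

def bilinear_sample(feature_map, xy):
    """Pixel-aligned sampling: bilinearly interpolate a C-channel
    feature map at continuous (x, y) pixel coordinates."""
    H, W, _ = feature_map.shape
    x = np.clip(xy[:, 0], 0, W - 1)
    y = np.clip(xy[:, 1], 0, H - 1)
    x0, y0 = np.floor(x).astype(int), np.floor(y).astype(int)
    x1, y1 = np.minimum(x0 + 1, W - 1), np.minimum(y0 + 1, H - 1)
    wx, wy = (x - x0)[:, None], (y - y0)[:, None]
    return (feature_map[y0, x0] * (1 - wx) * (1 - wy)
            + feature_map[y0, x1] * wx * (1 - wy)
            + feature_map[y1, x0] * (1 - wx) * wy
            + feature_map[y1, x1] * wx * wy)

def implicit_fn(features, z, W1, b1, W2, b2):
    """Tiny MLP f(F(x), z) -> occupancy probability in (0, 1)."""
    h = np.concatenate([features, z[:, None]], axis=1)
    h = np.maximum(h @ W1 + b1, 0.0)              # ReLU hidden layer
    return 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))   # sigmoid output

rng = np.random.default_rng(0)
feat = rng.standard_normal((64, 64, 8))   # toy H x W x C image feature map
pts = rng.uniform(0, 63, size=(5, 3))     # 3D query points
xy, z = pts[:, :2], pts[:, 2]             # orthographic projection: drop z
F = bilinear_sample(feat, xy)             # pixel-aligned features per point
W1, b1 = 0.1 * rng.standard_normal((9, 16)), np.zeros(16)
W2, b2 = 0.1 * rng.standard_normal((16, 1)), np.zeros(1)
occ = implicit_fn(F, z, W1, b1, W2, b2).ravel()
print(occ.shape)   # one inside/outside probability per query point
```

The key design point this sketch shows is that each 3D query is scored using the image feature under its own projected pixel, rather than a single global image embedding.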

The pixel-aligned nature of the representation allows us to seamlessly fuse the learned holistic embedding from coarse
reasoning with image features learned from the high-resolution input in a principled manner. Each level incrementally
incorporates additional information missing in the coarse levels, with the final determination of geometry made only
in the highest level.
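The coarse-to-fine fusion can be illustrated schematically. In this hedged sketch, the fine level receives high-resolution local features together with the coarse level's intermediate embedding (here named `omega`), and only the fine level emits the final occupancy; all tensors and weights are random stand-ins, not trained parameters:

```python
import numpy as np

rng = np.random.default_rng(1)

def mlp(x, sizes, rng):
    """Toy MLP with fixed random weights (illustration only)."""
    for i, (m, n) in enumerate(zip(sizes[:-1], sizes[1:])):
        W = 0.1 * rng.standard_normal((m, n))
        x = x @ W
        if i < len(sizes) - 2:
            x = np.maximum(x, 0.0)        # ReLU on hidden layers
    return x

# Coarse level: features from the low-resolution image plus point depth z.
coarse_feat = rng.standard_normal((5, 8))
z = rng.standard_normal((5, 1))
coarse_in = np.concatenate([coarse_feat, z], axis=1)
omega = mlp(coarse_in, [9, 16, 16], rng)  # intermediate holistic embedding

# Fine level: high-resolution local features fused with the coarse
# embedding; only this level determines the final geometry.
fine_feat = rng.standard_normal((5, 8))
fine_in = np.concatenate([fine_feat, omega], axis=1)
occupancy = 1.0 / (1.0 + np.exp(-mlp(fine_in, [24, 16, 1], rng)))
print(occupancy.shape)
```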

The implementation further involves a meticulous garment-fitting process. First, the clothing mesh undergoes
deformation to adapt its shape to that of the body mesh. Following this, the body pose is optimized to enhance its
conformity with the deformed clothing, refining the overall alignment. The methodology culminates in further
refinement of the clothing fit, achieved through additional optimization steps or smoothing techniques. This series
of steps ensures not only an accurate representation of clothing on the 3D body model but also a dynamic interplay
between the PIFu framework's predictive capabilities and the specific procedures for aligning, deforming, and
optimizing the clothing fit.
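The deformation-and-smoothing idea can be illustrated on a toy 2D contour. The nearest-vertex fitting, the `offset` margin, and the Laplacian smoothing below are illustrative stand-ins for the optimization steps described above, not the application's actual procedure:

```python
import numpy as np

def fit_to_body(cloth, body, offset=0.02):
    """Pull each clothing vertex toward its nearest body vertex,
    keeping a small offset so the garment sits on top of the skin."""
    fitted = cloth.copy()
    for i, c in enumerate(cloth):
        j = np.argmin(np.linalg.norm(body - c, axis=1))
        n = c - body[j]
        n /= np.linalg.norm(n) + 1e-8       # outward direction
        fitted[i] = body[j] + offset * n
    return fitted

def laplacian_smooth(verts, iterations=5, lam=0.5):
    """Laplacian smoothing on a closed vertex loop: each vertex
    moves toward the average of its two neighbors."""
    v = verts.copy()
    for _ in range(iterations):
        avg = 0.5 * (np.roll(v, 1, axis=0) + np.roll(v, -1, axis=0))
        v = v + lam * (avg - v)
    return v

# Toy data: a circular body contour and a noisy, oversized garment loop.
t = np.linspace(0, 2 * np.pi, 64, endpoint=False)
body = np.stack([np.cos(t), np.sin(t)], axis=1)
rng = np.random.default_rng(2)
cloth = 1.3 * body + 0.05 * rng.standard_normal(body.shape)

fitted = laplacian_smooth(fit_to_body(cloth, body))
gap = np.linalg.norm(fitted - body, axis=1)
print(gap.mean())   # average clothing-to-body distance after fitting
```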

Fig. 1. Diagram explaining two levels of PIFu


4. Implementation

Fig 2: Implementation for the proposed application

Implementing Pixel-Aligned Implicit Functions (PIFu) involves several key steps in the process of reconstructing
3D objects from 2D images. The steps included in the implementation of the application are:

• Human Image Dataset: We collected and pre-processed a dataset of 2D images of human subjects in various poses,
paired with the corresponding 3D ground truth.

• Model Architecture Selection: Next, we chose an appropriate architecture for converting 2D images to 3D models,
selecting a PIFu variant and building the architecture around the specific requirements of the application.

• Training Process: We then trained the PIFu model on the prepared dataset to learn the mapping from 2D images
to 3D representations, using a combination of supervised learning techniques and optimization algorithms.

• 3D Model Generation: We executed the trained PIFu model to generate 3D models from input 2D images: a new 2D
image is passed through the trained network to obtain a dense and detailed 3D reconstruction of the depicted
subject.

• Post-Processing: We applied post-processing techniques to refine the generated 3D models, enhancing details for
a visually appealing and accurate 3D representation.

• Evaluation Metrics: Finally, we evaluated the performance of the PIFu model using appropriate quantitative
metrics, such as accuracy and completeness.

These steps illustrate the sequential and iterative nature of implementing Pixel Aligned Implicit Functions,
emphasizing the training, inference, and evaluation processes. The specific details may vary based on the chosen PIFu
variant and the application requirements.
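The training step above can be sketched with a toy occupancy classifier trained under binary cross-entropy, the loss commonly used for inside/outside supervision. The logistic model here is a stand-in for the PIFu MLP, and all data are synthetic:

```python
import numpy as np

def bce_loss(pred, target, eps=1e-7):
    """Binary cross-entropy over inside/outside occupancy labels."""
    pred = np.clip(pred, eps, 1 - eps)
    return -np.mean(target * np.log(pred) + (1 - target) * np.log(1 - pred))

rng = np.random.default_rng(3)
X = rng.standard_normal((256, 8))        # per-point image features (synthetic)
true_w = rng.standard_normal(8)
y = (X @ true_w > 0).astype(float)       # synthetic inside/outside labels

w = np.zeros(8)                          # logistic stand-in for the PIFu MLP
lr = 0.5
for _ in range(200):
    pred = 1.0 / (1.0 + np.exp(-(X @ w)))
    w -= lr * (X.T @ (pred - y)) / len(y)   # gradient of BCE w.r.t. w

final_pred = 1.0 / (1.0 + np.exp(-(X @ w)))
print(bce_loss(final_pred, y))           # training loss after optimization
```

In the real pipeline, the gradient step would update the full network's parameters by backpropagation rather than a single weight vector.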

The proposed application involves the following functions for users:

1. Login to the shopping website: Users initiate the application by logging into the shopping website, establishing
a personalized connection to their account.

2. Uploading an image for the 3D model: After logging in, users upload an image that serves as the basis for
generating their 3D model. This image is the fundamental input for crafting the virtual representation.

3. 3D model generation: Leveraging the PIFu pipeline, the application transforms the uploaded image into a lifelike
3D model, capturing intricate details to ensure a realistic representation of the user.

4. Trying different clothing: With the 3D model in place, users can explore various clothing options virtually,
trying on different outfits for an immersive and personalized shopping experience.

5. Changing attributes (hairstyle, hair color, skin color): To enhance customization, users can modify attributes
such as hairstyle, hair color, and skin color. This adds a layer of personalization to the virtual representation,
allowing users to experiment with diverse looks and styles.

These sequential steps guide users through an interactive journey, from logging in to refining their virtual avatar
with personalized attributes, offering a unique and engaging online shopping experience.
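The user-facing flow can be sketched as plain Python stubs. The function names and the in-memory session store are hypothetical, purely to show the sequence of the five steps, not the application's actual API:

```python
# Hypothetical session store; a real deployment would use the site's backend.
sessions = {}

def login(user_id):
    """Step 1: establish a personalized session."""
    sessions[user_id] = {"avatar": None, "outfit": [], "attributes": {}}

def upload_image(user_id, image_name):
    """Step 2-3: the uploaded image would be fed to the PIFu model;
    here we just record a placeholder for the generated 3D model."""
    sessions[user_id]["avatar"] = f"3d-model-from-{image_name}"

def try_clothing(user_id, item):
    """Step 4: dress the 3D model with a chosen garment."""
    sessions[user_id]["outfit"].append(item)

def set_attribute(user_id, key, value):
    """Step 5: personalize hairstyle, hair color, or skin color."""
    sessions[user_id]["attributes"][key] = value

# Walk through the five steps described above.
login("alice")
upload_image("alice", "alice.jpg")
try_clothing("alice", "denim-jacket")
set_attribute("alice", "hair_color", "brown")
print(sessions["alice"]["avatar"])
```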

5. Results and Discussion

Fig 3: Shopping Website

Fig 4: Uploading Image for 3D model


Figure 5: Changing Attribute

6. Conclusion

The proposed Virtual Fitting and Styling system has the potential to revolutionize the fashion
industry, bringing about a new era of convenience and innovation in online shopping. By offering a tailored
and immersive experience that goes beyond traditional static images, the system bridges the gap between the
virtual and physical realms, providing users with a more accurate representation of how clothes would look on
them.

PIFu, being an implicit function, provides a reconstructed 3D surface directly from 2D images without the
need for intermediate representations. This leads to faster inference times compared with methods that rely on
volumetric or multi-stage approaches. PIFu captures fine details and realism in the reconstructed geometry,
which is crucial for applications such as facial and body reconstruction. The absolute error, given in percent
of the ground-truth depth, is 10%, the lowest among the surveyed algorithms.
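How an implicit function yields a surface can be sketched by evaluating occupancy on a dense grid and treating the 0.5 level set as the surface (extracted with marching cubes in practice). The spherical occupancy below is a toy stand-in for a trained network:

```python
import numpy as np

def occupancy(points):
    """Toy occupancy field: a unit sphere standing in for a trained PIFu."""
    return (np.linalg.norm(points, axis=-1) < 1.0).astype(float)

# Evaluate the field on a dense grid; the surface is the 0.5 level set.
n = 32
axis = np.linspace(-1.5, 1.5, n)
gx, gy, gz = np.meshgrid(axis, axis, axis, indexing="ij")
grid = np.stack([gx, gy, gz], axis=-1).reshape(-1, 3)
occ = occupancy(grid).reshape(n, n, n)

inside_fraction = occ.mean()   # fraction of grid points inside the surface
print(round(inside_fraction, 3))
```

Because the network can be queried at any point, the grid resolution (and hence surface detail) is chosen at inference time rather than baked into a fixed voxel volume.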

As technology continues to evolve, the integration of 3D modeling and customization based on user images
sets a new standard for online retail experiences. The convergence of fashion and cutting-edge technology
showcased in the research paper exemplifies the transformative power of innovation in shaping the future of
consumer interactions with online platforms.
7. References
[1] M. Sela, E. Richardson, and R. Kimmel, "Unrestricted facial geometry reconstruction using
image-to-image translation," in Proceedings of the IEEE International Conference on Computer Vision,
2017, pp. 1576–1585.

[2] R. Natsume et al., "Siclope: Silhouette-based clothed people," in IEEE Conference on Computer
Vision and Pattern Recognition, 2019, pp. 4480–4490.

[3] T. Alldieck, G. Pons-Moll, C. Theobalt, and M. Magnor, "Tex2shape: Detailed full human body
geometry from a single image," in IEEE International Conference on Computer Vision (ICCV), Oct.
2019.

[4] Y. Wu and K. He, "Group normalization," in European Conference on Computer Vision, 2018, pp.
3–19.

[5] S. Jackson, C. Manafas, and G. Tzimiropoulos, "3D Human Body Reconstruction from a Single
Image via Volumetric Regression," in ECCV Workshop Proceedings (PeopleCap), 2018.

[6] V. Gabeur et al., "Moulding Humans: Non-parametric 3D Human Shape Estimation from Single
Images," in International Conference on Computer Vision (ICCV), Oct. 2019, pp. 1–10.

[7] B. Fele, A. Lampe, P. Peer, and V. Struc, "Image-Based Virtual Try-On Network," IEEE Xplore.

[8] A. Newell, K. Yang, and J. Deng, "Stacked hourglass networks for human pose estimation," in
European Conference on Computer Vision, 2016, pp. 483–499.

[9] L. Bhatnagar et al., "Multi-garment net: Learning to dress 3D people from images," in IEEE
International Conference on Computer Vision (ICCV), Oct. 2019.

[10] M. Loper et al., "SMPL: A skinned multi-person linear model," ACM Transactions on Graphics,
2015.

[11] S. Lombardi et al., "Deep appearance models for face rendering," ACM Transactions on Graphics
(TOG), vol. 37, no. 4, p. 68, 2018.

[12] Z. Zheng et al., "DeepHuman: 3D human reconstruction from a single image," in IEEE
International Conference on Computer Vision (ICCV), Oct. 2019.

[13] M. Jaderberg et al., "Spatial transformer networks," in Advances in Neural Information Processing
Systems, 2015, pp. 2017–2025.

[14] R. Zhang et al., "The unreasonable effectiveness of deep features as a perceptual metric," in
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 586–595.

[15] J. Jiang et al., "ClothFormer: Virtual Try-on," IEEE Xplore.

[16] A. Raj et al., "SwapNet: Image Based Garment Transfer," in European Conference on Computer
Vision (ECCV), 2018.

[17] S. Saito, Z. Huang, R. Natsume, S. Morishima, A. Kanazawa, and H. Li, "PIFu: Pixel-Aligned
Implicit Function for High-Resolution Clothed Human Digitization," in IEEE International Conference
on Computer Vision (ICCV), 2019, pp. 2304–2314.

[18] K. S. Kumar, V. B. Semwal, S. Prasad, and R. C. Tripathi, "Generating 3D Model Using 2D Images
of an Object," International Journal of Engineering Science and Technology (IJEST), pp. 2204–2214.
