Multimedia Tools and Applications
https://doi.org/10.1007/s11042-020-09989-x

In-home application (App) for 3D virtual garment fitting dressing room

Chenxi Li 1 & Fernand Cohen 1

Received: 25 March 2020 / Revised: 21 July 2020 / Accepted: 24 September 2020
© Springer Science+Business Media, LLC, part of Springer Nature 2020

Abstract
This work introduces a novel method for the creation of an in-home virtual dressing room
for garment fitting using an integrated system consisting of personalized 3D body model
reconstruction and garment fitting simulation. Our method gives saliency to and estab-
lishes a relational interconnection between the massive scatter points on the 3D generic
model. Starting with a small set of anthropometric interconnected ordered intrinsic control
points residing on the silhouette of the projections of the generic model, corresponding
control points on two canonical images of the person are automatically found – hence
importing equivalent saliency between the two sets. Further equivalent saliencies between
the projected points from the generic model and the canonical images are established
through a loop subdivision process. Human shape mesh personalization is done through
morphing the points on the generic model to follow and be consistent with their
equivalent points on the canonical images. The 3D reconstruction yields sub-resolution
errors (high-level accuracy) when compared to the average resolution of the original
model using the CAESAR dataset. The reconstructed model is then fitted with garments
sized to the 3D personalized model given at least one frontal image of the garment, with
no requirement for a full 3D view of the garment. Our method can also be applied to
virtual fitting systems for online stores and/or for clothing design and personalized
garment simulation. The method is convenient, simple, efficient, and requires no intervention
from the user aside from taking two images with a camera or smartphone.

Keywords 3D reconstruction · Body model reconstruction · In-home dressing room · Virtual dressing

* Chenxi Li
cl982@drexel.edu

Fernand Cohen
fsc22@drexel.edu

1 Electrical and Computer Engineering, Drexel University, Philadelphia, PA 19104, USA

1 Introduction

Online virtual tailoring has gained traction recently as evidenced by the MTAILOR
(https://www.mtailor.com/) App that allows custom clothes to be made by measuring an
individual on their phone. With the popularization of online shopping and the advancement of
virtual techniques, there is a demand for a low-cost and highly accurate virtual in-home
fitting/dressing room system that goes beyond virtual tailoring measurement. The
creation of a virtual in-home dressing room for garment fitting is timely and is a needed online
shopping application that would limit order returns, allow garments to be tried on at home
before ordering, and can be updated as often as necessary to accommodate changes in the user's
physique, weight, age, etc.
In this paper, we introduce an integrated system that generates a 3D personalized
human body model from two images, then maps images of the garments onto 3D
personalized body. 3D body model generation is realized by morphing one generic
model based on two images (e.g. frontal and side images) of the person. This is done
by morphing anthropometric salient points on the 3D generic model and their corre-
sponding ones in the two images. Instead of manually selecting and relocating each
salient point on the generic model, which is a vast and tedious work, we firstly
reduce the size of salient point set selected from the generic model projections into a
reasonably small size (control point set). Then we further enlarge this small size into
a full-scale covering the entire surface through loop-subdivision which generates
additional salient points. The corresponding control point set in the two images are
roughly located (stage 1) with a pretrained Active Shape Model (ASM) [6] and then
refined with boundary regularization (stage 2). ASM is a point model that measures
variability and identifies shapes belonging the same class (i.e. objects with similar but
not identical shapes that vary in some characteristics). Stage 1 allows for an initial
location of the control points on the two canonical images. As the control points are
restricted to reside on the body boundary, these initial estimated points are regularized
to the boundary of the body in the frontal and side images in stage 2. These
two combined steps result in the selection of the corresponding control points on
the user's input images without any help from the App user.
In the second part of the paper, garments are mapped on the personalized 3D human body
model using images of the garments, by first segmenting them from the 2D images and then
mapping their appearance onto the 3D personalized model. Two different garment types are
considered - tight and loose types, with a different appearance mapping method adopted for
each type. For the tight type, garments are fitted to the 3D personalized body model in accordance
with the skintight property of the garment on the person trying it. For the loose garment type, the general
shape of the garment is factored in the appearance mapping. The overview of our integrated
system is shown in Fig. 1 below.
The paper makes the following contributions:

1) We propose a simple and novel virtual 3D fitting room application (App) that generates a
personalized body model reconstruction for an individual and allows for virtual garment
fitting.
2) The App requires minimal intervention from the user aside from taking two images
(frontal and profile views of themselves) using their camera or smartphone and specifying
the garment type (loose or fitted) they are fitting.

Fig. 1 Overall process for garment simulation system

3) The App neither measures body indexes nor uses 3D photography or scanning, but instead
uses two images taken from either the same camera or two cameras to synthesize the
person's 3D body.
4) The reconstruction method is convenient, efficient, and is a natural alternative to taking
precise body measurements that a tailor would perform.
5) Our 3D reconstruction method gives saliency and topological meaning to the massive
scatter points on the 3D generic model (mesh), as well as establishing an order and
relational interconnections among them.
6) The 3D reconstruction yields sub-resolution errors (0.3 of the resolution cell) when
compared to the average resolution of the original model using the large CAESAR
dataset. The error is about 0.315% of the average individual height of 160 cm for the
CAESAR dataset considered. This shows a high level of accuracy for our 3D reconstruc-
tion method.
7) Virtual garment display on the personalized 3D model of the individual does not utilize
precise patterns of garment patches, a method more commonly used in fashion design,
but rather reconstructs the 3D garment fitted to the person from images of the garment
itself. This allows the user to instantaneously see how the garment is fitting to the user’s
body shape from all 3D views by simple manipulation of the 3D fitted personalized
garment display.
8) Our algorithm operates under the condition that just one frontal image, or a small set of images, of
the garment is available. It allows for a 3D reconstruction of the garment in accor-
dance with the 3D reconstructed surface of the individual trying the garment, as well as
allowing for showing the garment fitted on the individual in 3D with the appearance
(texture) of the garment properly imported and mapped onto the individual 3D surface. It
also allows for loose garment types where the general shape of the garment is factored in
the appearance mapping.

The paper is organized as follows. Related works are listed and compared in the second
section. In section three, the methods we use for creating a personalized 3D human body model
and for garment simulation in 3D space are discussed. In section four, the
performance and implementation of our system are discussed. In the last section, conclusions
and future work are discussed.

2 Related works

2.1 Virtual try-on system

A virtual try-on system is an integrated system that cuts down on the time spent in retail shopping
and on unnecessary trips to the mall by allowing garments to be tried on virtually. Customers can try on
clothes without physically wearing them, from home or anywhere, using a computer or a smartphone.
Such a technique can be especially effective for shoppers who want to see how they look in
different garments but do not have enough time to go shopping.
There are many works aimed at building an ideal virtual try-on system. One of the methods
is to virtually display the user with garments on a screen. The work in [12, 13] introduces a
virtual try-on system that relies on real-time image rendering in a virtual fitting room with
cameras installed from all angles and pre-recorded worn garments. This elaborate setting of the
virtual fitting room also requires simultaneous garment simulation. This adds a level of
inconvenience to the user and limits the system's usefulness. In contrast, our proposed system
generates the 3D human body shape mesh and garment simulation simply with a handheld
device and requires only pictures of both the user and the target garments. Others prefer to visualize
garments on reconstructed 3D body models. [28] shows the 3D virtual garment simulation
with garment description and a scanned 3D body model. The system focuses on how virtual
garments appear in a real scene subject to a given illumination setting on a known 3D human
body model. [33] proposes a mixed reality try on system which is aided by a reference avatar
model. However, the personalized model is a rescaled avatar model with the user's body indexes
measured with an RGB-D sensor. Also, the garments are pre-loaded along with the avatar. This is
in sharp contrast to our reconstruction method, which uses 2D images for body reconstruction and
garment fitting. For our system, no prior knowledge of the human body shape is assumed. This
shape is simply generated using the user's images taken with a handheld device. Other garment fitting systems
build computer aided design (CAD) systems by fitting the 3D body model with 2D design
pattern sketches and well-defined 3D garments. This is commonly used in the fashion design
industry but is not accessible to regular users. [15] proposes an efficient and automatic method
for adjusting a 3D garment to a mannequin model using markers on both garment and
mannequin. Using predefined feature lines and landmarks, [8] shows an automatic sewing
over human system. [31] introduces a way to locate salient points from silhouettes in frontal
and lateral images by evenly segmenting the body silhouettes into seven sections with respect
to height and choosing only a few partition points on the boundary for body measurement. [14]
implements a magic mirror for real-time garment visualization based on the different colors and
figures on the clothes. Unlike all these CAD methods, our method does not require definite
patterns or predefined landmarks, which makes it more generic and simpler to use.

2.2 3D model reconstruction

For a robust virtual garment try-on system, 3D human model generation for different users is of
paramount importance. A considerable body of 3D reconstruction research exists and has
been used for accurate virtual simulation and exhibition. With the development of data-capture
techniques, laser scanners are no longer the only tools for 3D modeling. Methods based on depth
cameras and calibrated regular cameras also allow for accurate 3D models relying on images
taken from multiple views. Smartphone software like 123D Catch needs more than 20 images
from various horizontal and downward angles for model reconstruction. [10, 24] introduce

methods for human body reconstruction from a sequence of uncalibrated images taken from
a still video camera. [20] generates a 3D human shape model by projecting two depth images,
with one of them predicted from the other using a deep neural network. In contrast, our
proposed method generates and displays a high-resolution human body shape,
while requiring only two images taken from two canonical views easily
obtained by the user at home.
Model-based 3D reconstruction is one of the basic methods for 3D modeling. The well-
known SCAPE (Shape Completion and Animation of People) [2] model is widely used for
human shape and posture estimation. [9] uses a single image for human posture and shape
estimation by finding the articulated skeleton of the person. [35] parameterizes human body
model to generate customized human shape by adjusting different attributes (e.g. weight and
height) in accordance with a boundary fitting process. The human appearance is first
estimated using the SCAPE model with different shape and pose parameters. Then attributes like
weight and height are adjusted using shape parameters in a low-dimensional space. Others
operate on shape deformation in body space by performing PCA (Principal Component
Analysis). [3, 22, 32] utilize the PCA technique to train a mapping scheme for constructing 3D
models from 2D images. PCA extracts the main feature vectors and characterizes variations in
body space, but it needs a variety of examples for training. Boisvert’s method [3] builds a
transformation map between the parameters in 2D shape spaces and 3D shape spaces, where
the shape spaces are the lower dimensional spaces generated by PCA. In contrast, our method
establishes saliency point correspondence between a generic model and personal images and
reconstructs the body shape by morphing a generic model in accordance with these corre-
spondences. Given the prevalence and competitive performance of neural networks, they are
used in human shape and behavior analysis tasks [26, 29]. [7] uses neural networks to predict
the parameters of the SCAPE model from input silhouette images. These methods, which rely
on SCAPE models, need a vast number of training examples for parameter estimation,
especially for inferring a deep network structure and for finding the low-dimensional model
for a human shape. In contrast, the training cost for our proposed method is minimal, hence it can
operate in a relatively small-sample dataset environment.
Several large datasets have collected 3D human body models for body pose and motion
study. Most of the datasets like FLIC [27] and LSP [17] collect model information under
various poses for body motion and pose study. The scanned 3D body models in [11, 16] are
based on a limited population with small human body shape variations. Pishchulin et al. [23]
generated a statistical body model from scanned body models stored in the CAESAR dataset [25].
In this paper, we use the mean shape computed from a subset of the CAESAR dataset, with
each of its members given in triangular mesh representation, as our reference generic model.
This generic model along with the frontal and side images of an individual is used in the body
reconstruction of that individual. Moreover, the CAESAR dataset is also utilized for assessing
the performance of our 3D reconstruction method.

2.3 Garment simulation

For garment visualization, computer-aided design systems are widely used in garment dressing
design and virtual exhibition [18, 21, 34]. Professional systems based on the different materials of
clothes and on raw design drawings display the shape of clothes according to their mechanical
properties. Such systems help designers get a proper visual display of their work [30].
The work in [34] introduces a garment sewed on 3D body models with the design imported

from 2D images. With a predefined model for garments, one can also resize the body shape
with a feature curve net based on body indexes for different people [34]. The adjustment of the
model is flexible, but it needs a precise body measurement and a display on a generic model. In
[5], clothing appearance transfer is realized by mapping clothes from 2D images onto the
human body model by establishing a matching between the contour of clothing article in
the image and the corresponding part of a rendered human model. These methods either
need the precise pattern for each patch of the garments or need the preloaded 3D
garments for refinement and retexturing. The rendering of the garments in 3D for our
system does not require precise patterns or preloaded 3D garments, and hence is simpler,
more flexible, and easier for users to operate (they just import the image(s) of the
garment to be tried on).

3 Human body shape reconstruction and garment simulation system

3.1 3D human body model synthesis

Our customized integrated system starts with body model reconstruction. There are basically
three ways for 3D modeling. The first is using a 3D scanner, which is the most popular method
in all kinds of 3D model reconstructions. The second one is constructing 3D models using
images taken from various views. The third one starts with a generic model of the target
reconstruction type, then personalizes it by morphing salient points on the generic model so that it
aligns with the images of the subject we are trying to synthesize.
The first method is direct and yields a high-resolution model. The scanner, however, is
expensive and not easy to operate by a layman user. As for the second method, a series of
images from multiple views is required to be aligned for reconstruction. Because of the
complicated 3D structure of the human body, it is hard to get a precise model just based on
images. The method in this paper belongs to the third class. We use a statistical average model
from a dataset as our generic model and two images from orthogonal views of a specific
person for human body shape reconstruction. Starting with a small set of interconnected
ordered intrinsic salient points (control points) residing on the silhouette of the projections
of the generic model, we automatically find their corresponding ones on the two canonical
images, and generate through loop subdivision the saliency of all other points on the generic
model and their equivalent points on the canonical images. The personalization of the human
shape mesh into the specific individual is done through morphing the points on the generic
model to follow and be consistent with their equivalent points on the canonical images.
The generic model we use in the proposed method is the average statistical model
obtained from the MPII Human shape models [23] based on the CAESAR dataset.
The two canonical images are taken from front and profile (left side) views of an
individual. These two images are self-taken using a fixed camera or smartphone
whose image plane is perpendicular to the ground. For example, we can attach the
smartphone to a stand or rest it against a windowsill and take frontal and side
images by turning 90 degrees. To normalize changes in the distance from the camera caused by
movement during image taking, the user's outline in the images is normalized to the
same height after being segmented from the background. The user's height in pixels can
also be correlated with the real height of the user for the final real-scale 3D model reconstruction.
The overview process of body model reconstruction is shown in Fig. 2.
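For illustration only, a minimal sketch of this height normalization, assuming the body has already been segmented into a binary mask; the function name, the nearest-neighbor rescaling, and the use of scipy.ndimage.zoom are our choices rather than details given in the paper.

import numpy as np
from scipy.ndimage import zoom

def normalize_silhouette_height(mask, target_height):
    """Rescale a binary body mask so the silhouette spans target_height pixels.

    mask: 2D boolean array, True where the segmented body is.
    """
    rows = np.where(mask.any(axis=1))[0]          # rows that contain body pixels
    top, bottom = rows.min(), rows.max()
    scale = target_height / float(bottom - top + 1)
    # Nearest-neighbor rescaling keeps the mask binary after thresholding.
    return zoom(mask.astype(float), scale, order=0) > 0.5

Applying this to both the frontal and side masks places both outlines at the same pixel height, so the control points found on them are directly comparable.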

Fig. 2 Overview of 3D body model reconstruction

3.2 Personalization from a generic model and a pair of images of the person

We use the average statistical human shape model from the CAESAR dataset as our generic
model. The model is represented as a 3D triangle mesh consisting of 6449 vertices with 12,894
mesh faces. This has been augmented to a denser data set using loop subdivision [19] on the
original set, resulting in the smoother and denser point set V = {vi, i = 1, …, N} with N =
413,037 points. The loop subdivision algorithm subdivides each triangle into four by considering
the midpoints of the triangle sides as new points and joining them, resulting in four new
triangles. This loop subdivision process is discussed further in section 3.2.1. These points on
the generic model are evenly distributed over the surface of the human body model and control
the body shape.
The generic model is personalized to the individual body model by relocating salient
anthropometric points (the control point set) on the generic model in accordance with their
corresponding ones in the two personal images. There are two main issues to be addressed
here. The first is: given a small set of corresponding salient anthropometric points (control
points) on the generic model and the two images, how to morph the generic model into a
personalized model to obtain a high-resolution 3D model of the individual. This is addressed in
section 3.2.1. The second is how to automatically obtain these control point sets in the two
images and ensure that they are the same anthropometric points identified on the generic
model. This is addressed in sections 3.2.2 and 3.2.3. We finally discuss a refinement process in
section 3.2.4 to eliminate some local distortions and to fix resulting holes in our reconstructed
model.

3.2.1 Personalized model using the salient anthropometric points

The overall procedure of obtaining a personalized model from frontal and side images with a
generic model is shown in Fig. 3. Suppose we are given the small set of salient control points on the
generic model and their corresponding anthropometric ones on the frontal image (how these are found
on the frontal image is discussed in sections 3.2.2 and 3.2.3). The personalization of

Fig. 3 Personalization from generic model with 2 images

the generic model is performed by adjusting the x-y values of the projected (onto the X-Y
frontal image plane) generic control points to have their x-y values in accordance with the
corresponding control points in the frontal image of the individual. From this set of control
points, we augment the personalized model through loop subdivision.
Based on the control points, the input frontal body image is partitioned into a mesh of
triangular polygons. Similarly, the projection of the generic model into the X-Y plane is also
partitioned into a mesh from the pre-selected control points. Note that the control points are all
registered between the input image and the generic model. They have point-to-point and
triangle-to-triangle correspondences. Next, the point density on both the frontal image and the
projected generic model is increased by performing the loop subdivision algorithm in [19], which
subdivides each triangle into four by considering the midpoints of the triangle sides as new
points and joining them, resulting in four triangles. The subdivided triangles on the frontal
image and the X-Y projected generic model maintain correspondences (see Fig. 4).
This process leads to a very dense set of correspondences (a distance map) between the input image
mesh and the projected generic model landmark mesh. The personalized model is then formed

Fig. 4 Subdivision does not change the point-to-point correspondence (a) Frontal Image (b) Generic model
frontal projection

by combining the x and y values on the frontal image point with the z value imported from the
corresponding generic model.
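To make the correspondence argument concrete, the following sketch shows one midpoint-split pass (the split used above; full Loop subdivision [19] additionally repositions existing vertices, which is omitted here). Because the new points are created in an order driven only by the shared face indices, applying the routine to the frontal-image mesh and to the projected generic-model mesh with the same face array keeps their vertices in one-to-one correspondence. All names are illustrative.

import numpy as np

def subdivide_once(verts, faces):
    """One midpoint-subdivision pass: every triangle is split into four.

    verts: (N, 2) or (N, 3) array of points; faces: (M, 3) integer array.
    New vertices are appended in a deterministic, index-driven order, so two
    registered meshes sharing the same 'faces' stay in correspondence.
    """
    verts = [np.asarray(v, dtype=float) for v in verts]
    edge_mid = {}                     # (i, j), i < j  ->  index of the midpoint
    new_faces = []

    def midpoint(i, j):
        key = (min(i, j), max(i, j))
        if key not in edge_mid:
            edge_mid[key] = len(verts)
            verts.append(0.5 * (verts[i] + verts[j]))
        return edge_mid[key]

    for a, b, c in faces:
        ab, bc, ca = midpoint(a, b), midpoint(b, c), midpoint(c, a)
        new_faces += [(a, ab, ca), (ab, b, bc), (ca, bc, c), (ab, bc, ca)]
    return np.asarray(verts), np.asarray(new_faces)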
We enhance the reconstruction (personalization) with a 90-degree profile image. The z-
component, which is imported from the generic model, is now updated when considering the
triangle mesh generated from the profile image. The x-axis of the profile image is now aligned
with the z-axis of the generic model. In summary, in Personalization Step 1 (Fig. 3), the
personalized model has its (x, y) values associated with the 2D frontal image salient point
mesh, whereas it imports its z values, or the depth values, from the generic model. In
Personalization Step 2, the z values are updated in accordance with the x values from the
profile image. Figure 5 shows the reconstructed and textured 3D model given the frontal
and side projection images. The appearance of a point on the personalized 3D surface is
imported from its projected appearance on the frontal or profile image. Note that the back
image is also needed for a back-appearance display, but the lack of it will not affect the 3D
geometric reconstruction.
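A schematic of Personalization Steps 1 and 2 as we read them, with illustrative array names: (x, y) is taken from the frontal-image mesh, z is first imported from the generic model and then, when a profile image is available, updated from the profile-image mesh whose image x-axis is aligned with the model z-axis.

import numpy as np

def personalize(frontal_xy, generic_xyz, profile_xz=None):
    """Assemble the personalized vertex set from corresponding meshes.

    frontal_xy : (N, 2) subdivided points on the frontal image
    generic_xyz: (N, 3) corresponding points on the generic model
    profile_xz : optional (N, 2) corresponding points on the profile image,
                 whose first coordinate plays the role of depth (model z).
    """
    personalized = np.empty((len(frontal_xy), 3))
    personalized[:, :2] = frontal_xy            # Step 1: x, y from the frontal image
    personalized[:, 2] = generic_xyz[:, 2]      # Step 1: depth imported from the generic model
    if profile_xz is not None:
        personalized[:, 2] = profile_xz[:, 0]   # Step 2: depth updated from the profile image
    return personalized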

3.2.2 Salient control points selection and registration

In the previous section, we have shown the personalization process based on the user’s images
and generic model with known correspondences of their salient control points. In this section,
we discuss which control point set to consider, how to introduce saliency to points in
the high-resolution dense map of the frontal/side projected generic model, and how to import
that saliency to the corresponding points on the individual's frontal and side images. This is done
using an index vector map generated from the interconnected control point set using
loop subdivision.

Fig. 5 Reconstructed 3D model from different views



Intuitively, reconstruction of a personalized model is achieved once we identify and modify
each projected point on the generic model in accordance with its corresponding one on the
personal images in the X-Y and Y-Z planes. While some points are anthropometric and are
easily identifiable (e.g. the eye or elbow point), not all points are, which renders the
correspondence point identification extremely difficult. We tackle that problem by splitting it into
two parts. The first part selects a small set of anthropometric salient control points and goes
about identifying their equivalents in the frontal and side images; the second then uses indexing in the
loop subdivision process to track down the correspondences between the high-resolution projected
generic model and the personal images.
Since the control points should be common to all individuals, in addition to being intrinsic,
i.e. they should appear as salient points in the images of every individual, we select these
points to reside on the boundary of the generic projected model and the images of the
individual in X-Y and Y-Z planes. We describe how we find those in the frontal and side
images in section 3.2.3 next. These are obtained using a combination of a pretrained point
model and regularization to guarantee that the control points on the frontal and side images
reside on the individual boundary in the image space. The control salient points are chosen to
mark joint points on the human silhouette boundary with physical significance (e.g. head and
neck, shoulders, forearms, arm, waist, hips, legs, and feet) as well as points with topological
meanings (e.g. the highest point on an object or points with curvature extrema, inflection
points, etc.). This comes to 56 points on the frontal image and 26 on the side image. Seven
iterations of loop subdivision are considered to arrive at the high-resolution model.
Regarding indexing, we start with the set of control points on the boundary of the projected
model (see for example red points in Fig. 6c for frontal projection). From this set, the next
iteration generates a set of new indexed points (see the red points (Set 2) in Fig. 6b, which are

Fig. 6 Setting index map between original and subdivided salient point sets (a) Set 1: Salient points on frontal
projection (b) Set 2: Subdivided points generated from main salient points on boundary (c) Overlapping Set 1 and
Set 2 from generic model

generated by loop-subdivision of the pre-selected control point set {RG} = {1, 2, 3} on model’s
frontal boundary - green points (Set 1) in Fig. 6a).
We can easily find the closest points on the generic model projected onto the X-Y
plane and associate the indexes of the points subdivided at each iteration with their closest points on
the generic projections. For example, in the zoom-in window in Fig. 6c, black points are on the
generic model frontal projection while the intersections of blue lines are new generated
subdivision points from the control points at an advanced stage of the subdivision. The
indexes of the red-circled ones in the subdivision point set are recorded in I since they are
the closest points to the black points on the frontal projection.
We do the same thing for the side projection, hence creating an index map between the
subdivision points and the projection points on Y-Z plane. We repeat the same process starting
with the control point set {R} on the body boundary in the image frame on the personal
side view image. As mentioned in section 3.2.1, the subdivision of the two point sets
{R} and {RG} maintains point-to-point and triangle-to-triangle mapping as long as the
two sets {R} and {RG} correspond. This implies that the index maps following the
subdivision correspond as well.
This process introduces interconnected topological meaning to the massive point set on the
3D generic model from a small set of control points located on its projection boundary. With
the interconnections and saliency of each point on the generic model established, this saliency is easily
and efficiently imported to the corresponding points on the canonical images, which leads to
a higher-resolution personalized model.
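A sketch of the indexing idea under our reading: after subdivision, every projected generic-model point records the index of its nearest subdivided control point (the vector I above); because the subdivided image points correspond one-to-one with the subdivided model points, that same index vector retrieves their equivalents on the canonical image. The k-d tree is our choice of search structure, not something the paper prescribes.

import numpy as np
from scipy.spatial import cKDTree

def build_index_map(subdivided_model_pts, projected_model_pts):
    """For each projected generic-model point, find the index of its closest
    point in the subdivided control-point set (the index vector I)."""
    tree = cKDTree(subdivided_model_pts)
    _, index_map = tree.query(projected_model_pts)
    return index_map

# Since the subdivided image points and subdivided model points correspond
# one-to-one, the same indexes pull out the equivalent image positions:
#   personalized_xy = subdivided_image_pts[index_map]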

3.2.3 Finding the equivalent main salient points on the frontal and side images
of a given individual

In this section, we want to find the set of control points {R}, which is used in section 3.2.2, on the
body boundaries of an individual’s canonical images. We can select these points manually by
implementing a user interface, or use a Neural Network (NN) approach that uses a training set to
predict an initial placement of the control points on the frontal and side images for a given
individual, or we can use an Active Shape Model (ASM) [6] approach. In this paper, we select the
ASM approach and train two ASM models, one for frontal and one for side landmark auto-
selection, using 100 statistical models from the CAESAR dataset for training. In both training sets,
each example contains an image with a set of the ordered control points located on its boundary.
From the training set, we assess the degree of variability that exists amongst individuals and use
that variability in obtaining a reliable set of control points for new individuals.
We assume that the image frame for the frontal image of a new individual is aligned with
the frontal training set. Note that when that is not the case, we estimate the rigid transformation
map M(s, θ, t) applied to the pretrained point model that results in the smallest distance error
between the boundary points of the new individual and the pretrained frontal point model.
Let R(i), i = 1, 2, …, 100, be the control point set for the ith training individual, R̄ be the
average control point set computed from the training set, and [S] = (1/100) ∑_{i=1}^{100} (R(i) − R̄)(R(i) − R̄)^T
be the covariance matrix associated with the control point sets of the 100 individuals considered
for training. In a principal component analysis (PCA) space, R(i) is represented as R(i) = R̄ + [P] b(i),
where the variability matrix [P] is the eigenvector matrix associated with [S], and
b(i) is a weight vector describing the deviation from the mean. When searching for the control
point set R on a new image, the average point model R̄ from the training examples is set as the
initial R, i.e. R = R̄. Since the control points have to reside on the boundary, the position of
each salient point Rj in R (if not on the boundary) is moved to the boundary point with
the maximum image intensity gradient along the perpendicular direction of the line joining Rj−1
and Rj+1. This constitutes a boundary regularization step. After updating the locations dR of
all the salient control points in the image frame (frontal and side), we estimate the change db in the
PCA space using the variability model R̄ + [P](b + db) to best fit the position changes R + dR.
By using the new b = b + db, we update our point model R = R̄ + [P] b to keep the general shape
consistent with the training examples. We toggle between updating R and updating the
parameters b in the PCA space until the change in those salient points' positions falls
below a set threshold. This is shown in Fig. 7, where the control points in red on the new input
images are first selected using the ASM model in Stage 1. Most of the control points are
located on the body boundary. For those points that are not, the boundary regularization step
(Stage 2) is invoked, forcing those points to migrate to the boundary. Note that no
manual intervention by the user is needed here.
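The search can be summarized as an alternation between boundary regularization and a projection back into the trained PCA shape space. The sketch below assumes a trained mean shape and eigenvector matrix are available; snap_to_boundary stands in for the gradient-based regularization of Stage 2 and is a hypothetical callable, and the convergence test is our own simplification.

import numpy as np

def fit_asm(R_mean, P, snap_to_boundary, max_iter=50, tol=1e-3):
    """Alternate between boundary regularization and PCA-space projection.

    R_mean : (2K,) mean control-point shape from training (stacked x and y)
    P      : (2K, m) leading eigenvectors of the training covariance [S]
    snap_to_boundary : callable returning the boundary-regularized shape
                       (Stage 2); problem-specific and assumed given.
    """
    R = R_mean.copy()
    b = np.zeros(P.shape[1])
    for _ in range(max_iter):
        R_reg = snap_to_boundary(R)        # Stage 2: move points onto the body boundary
        b = P.T @ (R_reg - R_mean)         # least-squares update of the shape weights
        R_next = R_mean + P @ b            # re-impose the trained shape statistics
        if np.linalg.norm(R_next - R) < tol:
            R = R_next
            break
        R = R_next
    return R, b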

3.2.4 Model refinement

By auto-selecting the salient points over the frontal and side images on the body boundaries,
we reconstruct a 3D personalized model V based on the process in 3.2.1. This personalized
model describes the global shape of a person but not its details, meaning the surface of the
mesh model is rough and not as smooth as the skin surface of a real person, as shown in the
upper-left figure of Fig. 8. This is due to: 1) local stepped-like noise appearing on the surface;
and 2) incomplete or missing regions caused by the control point selection. The first flaw is
due to the fact that the projected boundary is a discontinuous piecewise step function (due to
the pixel-based (all integers) nature of the image frame) rather than a smooth continuous one.
That means that during the loop subdivision process, points near the projection boundary are
moved to the quantized boundary, with many points assuming the same x or y coordinate,
instead of mapping to unique points as they would if the boundary were continuous. The second flaw only occurs
when the main salient points are misplaced using ASM. These two local flaws do not affect the
overall shape of the reconstructed model; however, they result in a rough surface appearance.
In this section, we eliminate the rough mesh surface problem by refining the reconstructed
model in the PCA space. Note that unlike the case in section 3.2.3, the PCA approach is
applied to the entire 3D data set for each of the 3D human body shape in the training CAESAR
set instead of just the control point set on the boundary. In the PCA space [17], each 3D point
scanned model V (i) from the CAESAR dataset is represented as a shape mesh

Fig. 7 Main representative salient points auto-selection



V(i) = V̄ + [C] φ(i), where V̄ is the mean body shape model from the training set, and [C] contains the
first k principal components of the PCA decomposition used for variation measurement [17]. The parameter
vector φ(i) represents the deviation of each model V(i) from the mean model V̄ and
mainly controls the shape of the body mesh. By changing φ, a new human-like model with a
smooth surface appearance can be created, even when such smoothness is absent in the input images. We estimate
the parameter φ* that minimizes the point-to-point distance between the raw reconstructed
shape model V and the model represented in the PCA space, i.e. we find
φ* = argmin_φ ‖V̄ + [C] φ − V‖. This leads to φ* = [C]^+ (V − V̄), where [C]^+ = ([C]^T [C])^{−1} [C]^T is the
pseudo-inverse of the matrix [C]. The estimated refined body shape model is V* = V̄ + [C] φ*. This
model V* is smooth and has no holes. Figure 8 shows the refinement on the models from the
CAESAR dataset and for the reconstructed surface model in Fig. 5 (with no appearance).
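Numerically, this refinement is a linear least-squares projection of the raw reconstruction onto the trained shape space; a minimal sketch with illustrative names:

import numpy as np

def refine_in_pca_space(V_raw, V_mean, C):
    """Project the raw reconstructed shape onto the trained PCA shape space.

    V_raw, V_mean : (3N,) stacked vertex coordinates of the raw reconstruction
                    and of the mean training shape (in point correspondence).
    C             : (3N, k) matrix of the first k principal components.
    """
    phi_star = np.linalg.pinv(C) @ (V_raw - V_mean)   # phi* = [C]^+ (V - V_mean)
    V_refined = V_mean + C @ phi_star                 # smooth, hole-free estimate
    return V_refined, phi_star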

3.3 Mapping the garments on the personalized body

The second part of our system virtually maps the garments onto the 3D human body model from
given 2D garment image(s). The mapping differs depending on the nature of the clothes (tight

Fig. 8 Model refinement on different models



versus loose). Clothes like shirts, T-shirts, and pants stick to the human body when worn,
and hence require an appearance mapping that results in wrapping the cloth onto
the body. On the other hand, garments like dresses and skirts have their own shape and cannot
be directly wrapped onto the human body, hence requiring a different type of
wrapping. Note that before we proceed with either garment appearance mapping, the garments
are segmented from their 2D images.

3.3.1 Direct mapping to body model

For tight clothes, where the garment shape completely depends on human body shape, we
follow the same procedure for establishing main salient points correspondences between the
garment image and the main salient points on the personalized 3D model. This is followed by
the subdivision process that generates denser corresponding salient points between the garment
image and the 3D personalized model. We simply import the appearance of the garment salient
points to their corresponding salient points on the 3D model as shown in Fig. 9. As the
subdivision process proceeds, we obtain a denser point cloud of the 3D model attached with the
garment appearance from the corresponding pixels (vertex points) on the 2D garment images.
The denser the point cloud, the smoother and more seamless the appearance of the mapped garment
on the reconstructed individual surface. The 3D point clouds for the front and back of the
shirt are generated separately. For a better visualization, the 3D surface topology with points
and mesh is realized using the crust method [1]. Since the texture on the reconstructed
model is mapped from the frontal and back images of the garment, there will be some stretch
over the stitching line. Figure 10 shows the garment mapping based on frontal and back
images.
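A sketch of the appearance import for the tight-garment case, assuming the subdivision has already produced, for each covered model vertex, a corresponding pixel location in the segmented garment image; the nearest-pixel sampling and all names are our own choices.

import numpy as np

def import_appearance(model_vertices, image_uv, garment_image):
    """Attach a color to each 3D vertex from its corresponding garment pixel.

    model_vertices : (N, 3) personalized-model points covered by the garment
    image_uv       : (N, 2) corresponding (row, col) positions in the 2D image
    garment_image  : (H, W, 3) segmented garment image
    """
    rows = np.clip(np.round(image_uv[:, 0]).astype(int), 0, garment_image.shape[0] - 1)
    cols = np.clip(np.round(image_uv[:, 1]).astype(int), 0, garment_image.shape[1] - 1)
    colors = garment_image[rows, cols]              # nearest-pixel appearance transfer
    return np.hstack([model_vertices, colors])      # colored point cloud, shape (N, 6)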

Fig. 9 Constructing relations between salient points on 3D model and garment image (a) Patching clothing
layers on generic model (b) Texture mapping on frontal patch (c) Texture mapping on back patch

3.3.2 Indirect mapping to body model

For loose clothes like dresses and skirts, we adopt a different mapping from the direct one
described in section 3.3.1, one that preserves the inherent shape of the garment: we select
anchor points on the garment boundary and adopt a one-piece mapping procedure that maps
the whole garment onto the 3D body model after aligning it with a few anchor salient points at
the boundary of the 3D model. This has the effect of scaling and shifting the loose garment in
accordance with the 3D body model. Consider skirts, for example: the anchor salient points are
the two endpoints of the waist. Note that this mapping is a 2D planar map. For such a mapping
to look realistic, it should be rendered 3-dimensional, which necessitates
additional assumptions. Based on the rounded structure of the human body, a realistic
'elliptic' shape assumption for the garment is adopted. That is, if we horizontally slice
the garment, each cross-section appears as an ellipse when both the front and back pieces are
considered. Thus, we generate meshes with salient points selected from the edge of the skirt.
For each line on the mesh, we give a depth to the points so that they lie on an ellipse. For
each ellipse, the parameters are obtained from the points lying on the corresponding line. The length of the
major axis is decided by the leftmost and rightmost points, while the length of the minor axis
is half that of the major axis. Figure 11a shows the estimation of the parameters in 2D, while Fig.
11b and c show the geometric and appearance models after elliptic reconstruction. The 3D
garment reconstruction is then mapped onto the human body model based on the salient
points via translation and scaling.
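A sketch of the 'ellipse assumption' for a single horizontal slice, following the description above: the leftmost and rightmost garment points on the slice fix the major axis, the minor axis is taken as half of it, and every point on the slice receives a front or back depth from the ellipse equation. Names are ours.

import numpy as np

def elliptic_depth(slice_x, front=True):
    """Assign a depth to the 2D points of one horizontal garment slice.

    slice_x : (M,) x-coordinates of the garment points on this slice.
    Returns z values such that (x, z) lies on an ellipse whose major axis spans
    the slice and whose minor axis is half the major axis.
    """
    x_left, x_right = slice_x.min(), slice_x.max()
    cx = 0.5 * (x_left + x_right)          # ellipse centre
    a = 0.5 * (x_right - x_left)           # semi-major axis from the slice width
    b = 0.5 * a                            # minor axis is half the major axis
    t = np.clip((slice_x - cx) / a, -1.0, 1.0)
    z = b * np.sqrt(1.0 - t ** 2)          # elliptic cross-section
    return z if front else -z              # front and back halves of the slice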

Adding wrinkle curves for garment reconstruction To make the skirt look more realistic,
we add wrinkles along the hemline to the cone-like elliptic shape; these wrinkles
manifest themselves as shadows in the garment image. A darker area in the image
represents a relatively deeper region while a brighter area represents a shallower part. Thus, this
shadow information can be extracted for depth reconstruction. To get rid of the influence of
colors, we convert the RGB image into a gray image and apply an average filter horizontally to
eliminate sharp edge changes. On a brightness scale, deep points have

Fig. 10 3D Garment reconstruction from 2D images (a) Reconstructed 3D body model with and without
garment(T-shirt) simulation (b) Reconstructed T-shirt (c) Frontal and back images of T-shirts from Google
Image search

Fig. 11 Second type garment reconstruction with “ellipse assumption” (a) 2D garment image and “ellipse
assumption” parameters (b) Geometric garment model (c) Textured garment model

smaller intensity value while closer points (in depth) appear brighter. Instead of using the
intensity value directly, we calculate the gradient of each point as the brightness gradient map.
For a point (x, y) on image I, the gradient map at this point is G(x, y) = (I(x + Δx, y) − I(x, y)) / Δx, where only
the horizontal gradients are considered. For each row in the gradient map G, each point is
smoothed with an average filter of length 20 to avoid sharp increase/decrease. Then for
point (x, y, z) previously located on every ellipse ring of the model in section 3.3.2, we
use the gradient map G(x, y) to refine the depth information. The elliptic shape
assumption with and without adding shadow (depth) information are shown in Fig. 12a
and b, respectively. The addition of shadow information also changes the shape of the
skirt reconstruction as shown in Fig. 12c and d.
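A sketch of the shading-based depth modulation as we read it: the horizontal brightness gradient of the grey-level garment image is smoothed row by row with a length-20 average filter and then used to perturb the elliptic depth of each slice. The scale factor blending the gradient into the depth is our own assumption; the paper does not give an explicit formula.

import numpy as np

def shading_gradient_map(gray, filter_len=20):
    """Horizontal brightness gradient, smoothed row-wise with an average filter.

    gray : (H, W) grey-level garment image.
    """
    gray = np.asarray(gray, dtype=float)
    G = np.zeros_like(gray)
    G[:, :-1] = gray[:, 1:] - gray[:, :-1]             # G(x, y) = I(x+1, y) - I(x, y)
    kernel = np.ones(filter_len) / filter_len
    for r in range(G.shape[0]):                        # average filter along each row
        G[r] = np.convolve(G[r], kernel, mode='same')  # avoids sharp increases/decreases
    return G

def add_wrinkles(z_ellipse, G_row, strength=0.5):
    """Perturb the elliptic depth of one slice with the smoothed gradient;
    'strength' is an illustrative scale factor, not taken from the paper."""
    return z_ellipse + strength * G_row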

Fitting garment reconstruction on a 3D body model As indicated in section 3.3.2, the
reconstructed garments of the second type are mapped onto the human body model based on
the fixed salient points located on both the garment and the model. After aligning the fixed salient
points of the garment and body models, part of the garment intersects the model and
appears inside it instead of lying on its surface. To avoid this situation, we relocate the
offending points using the sectional view. As shown in Fig. 13a and b, the red points are
the original skirt layer while the gray points are from the body model. We take the sectional

Fig. 12 Adding wrinkle curves based on shadow information (a) Depth without shadow information (b) Depth
with shadow information (c) Garment without shadow information (d) Garment with shadow information

view of one slice from the combination of skirt layer and body layer. From Fig. 13a, the
relative positions of skirt points and human body points are obvious. Part of the garment cuts
through the body mesh, which is impossible. To address this problem, we find the center of the
body mesh points in the sectional view and move each point on the garment (red point) outside
the body mesh. As a result, the final position of each point on the skirt layer covers rather
than cuts through the human body shape mesh, as shown in Fig. 13b. The human body models in
Fig. 13a and b reflect the differences before and after this modification, especially around the
waist section of the body model.
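A sketch of this collision fix on one horizontal slice: garment points that fall inside the body cross-section are pushed radially outward, past the farthest body point in their direction from the slice centre. The small safety margin is our own assumption.

import numpy as np

def push_outside(garment_xy, body_xy, margin=1.02):
    """Move garment points of one slice so they cover, rather than cut through,
    the body cross-section.

    garment_xy, body_xy : (Ng, 2) and (Nb, 2) points of the skirt layer and the
    body layer in the sectional view of this slice.
    """
    center = body_xy.mean(axis=0)                     # centre of the body slice
    out = garment_xy.astype(float).copy()
    for i, p in enumerate(out):
        d = p - center
        r = np.linalg.norm(d)
        if r == 0:
            continue
        u = d / r
        body_r = np.max((body_xy - center) @ u)       # farthest body extent this way
        if r < body_r:                                # garment point is inside the body
            out[i] = center + u * body_r * margin     # push it just outside
    return out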

4 Evaluation

To validate our proposed system, we assess both the quality of the reconstruction as well as the
appearance of the virtual garment fitting as compared to the real fitting. For the former, we test
our proposed method on the CAESAR dataset [23] to assess the reconstruction accuracy
(reconstruction error evaluation over the entire data), as well as accuracy with regard to
specific body measurements. 700 CAESAR models are used for evaluation. We randomly
select 100 human body subjects from the dataset to train the ASM model to find the control
salient points on the canonical images, which are obtained by projecting the 3D data points on
an individual in the CAESAR dataset into the frontal and side image frames. The ground truth
base models are those scanned models from the CAESAR dataset, while the test models are
the reconstructed ones using our method and the canonical images. As for the appearance
evaluation, garments from different images are simulated on the individual model for which
images of a real fitted garment on a real person are available, and both the simulated and the
real images are visually compared.

4.1 Three-dimensional mesh comparisons

To assess the accuracy, we first compare the point difference over the triangle mesh between
the base and test models. Each corresponding point and triangle patch contributes to the geometric
error

Fig. 13 Fitting garment on 2D body model (a) Garment simulation before fitting on body model (b) Garment
simulation after fitting on body model

E_geo = (1/N) ∑_{i=1}^{N} ‖x_i − y(i)‖
where N is the number of points on the reconstructed model, x_i represents a point on the
reconstructed model, and y(i) is its nearest neighbor on the base model. The average geometric error
over 700 models was found to be 5.43 mm with standard deviation of 1.28 mm. The error is
quite low when compared to the average resolution (average length of the edges in the
triangular mesh over the 700 individuals) in the CAESAR dataset, which is found to be
17.82 mm with standard deviation of 5.04 mm. The 5.43 mm value is sub resolution (0.3 of the
resolution cell) when compared to the average resolution 17.82 mm of the original model. Also
note that the error is very low when considering that on average the height of a human body is
over 1600 mm (about 0.315%). This shows that our 3D reconstruction method achieves a very
high level of accuracy. In addition, larger geometric errors occur near the hands and feet and
smaller errors on the person's torso and limbs, leading to better body measurements for further
garment simulation.
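The reported error is a one-sided nearest-neighbour distance between the two meshes; a minimal sketch of how E_geo can be computed (the k-d tree is our choice of search structure):

import numpy as np
from scipy.spatial import cKDTree

def geometric_error(reconstructed_pts, base_pts):
    """E_geo = (1/N) * sum_i ||x_i - y(i)||, with y(i) the nearest neighbour of
    the reconstructed point x_i on the base (ground-truth) model."""
    dists, _ = cKDTree(base_pts).query(reconstructed_pts)
    return dists.mean()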
The reconstructed model is compared to the original model in the 2D canonical image space
and the 3D space as shown in Fig. 14. The projected silhouettes from the original base model are
shown in cyan while the silhouettes of the reconstructed model are shown in red. Their overlapping
area is shown in black in Fig. 14. The reconstructed model shows small errors in the parts invisible
in the side view (i.e. the two armpits, thighs, fists). This can also be observed from the
colored error bar shown on the reconstructed model in Fig. 14.

4.2 Body shape measurements

We also perform sixteen 3D human shape measurements on the reconstructed mesh to give
another evaluation dimension to our proposed method. These sixteen body shape measure-
ments shown in Fig. 15 are commonly used in garment fitting [3].
Measurements represented with straight lines are Euclidean distances between vertices of
the reconstructed models, while measurements represented by an ellipse are circumferences
measured on the body surface. Compared with state-of-the-art methods, our proposed

Fig. 14 Geometric error evaluation between base and test models



method achieves a lower error in most of the circumference measurements but is less accurate
in shoulder-blade and leg lengths. This is shown in Table 1 below.

4.3 Simulating different garments on the virtually reconstructed human body

Garment fitting simulation onto a virtually reconstructed human body model for the two kinds
of garments considered in this paper is shown in Fig. 16. This is to be contrasted with the real
image of a real fitted garment on the real person shown on the right-hand side of Fig. 16. As we
can see, a great degree of realism and similarity in appearance between the real and virtually
fitted cases is manifested.

5 Discussion and conclusion

This work introduces a novel method for the creation of an in-home dressing room for virtual
garment fitting using an integrated system consisting of a personalized 3D human body shape
reconstruction and garment fitting simulation using a generic model and two canonical images
(frontal and profile images) of a specific person. 3D body mesh generation is realized by morphing
one generic model based on the two images of the person. This is done by morphing salient points
on the 3D generic model and their corresponding ones in the two images. Instead of manually
selecting and relocating each point on the generic model, which is a vast and tedious task, we reduce
the set of salient points selected from the generic model projections and images to a reasonably
small size. Then we further enlarge this small set into a full-scale set covering the entire
surface through loop subdivision, which results in additional salient points. Garments
are mapped onto the personalized 3D human body model using images of the
garments, by first segmenting them from the 2D images and then mapping their appearance onto
the 3D personalized model. Two different garment types are considered - tight and
loose types, with a different appearance mapping method adopted for each type. For the

Fig. 15 Body measurement illustration



Table 1 Body measurements comparison with other state-of-the-art methods (Mean error ± Std. Dev. (mm))

Measurement                 Proposed Method   Boisvert et al. [3]   Chen et al. [4]
A. Head circumference       4 ± 7             10 ± 12               23 ± 27
B. Neck circumference       4 ± 5             11 ± 13               27 ± 34
C. Shoulder-blade           10 ± 6            4 ± 5                 52 ± 65
D. Chest circumference      7 ± 5             10 ± 12               18 ± 22
E. Waist circumference      3 ± 3             22 ± 23               37 ± 39
F. Pelvis circumference     4 ± 3             11 ± 12               15 ± 19
G. Wrist circumference      4 ± 3             9 ± 12                24 ± 30
H. Bicep circumference      9 ± 5             17 ± 22               59 ± 76
I. Forearm circumference    6 ± 4             16 ± 20               76 ± 100
J. Arm length               9 ± 7             15 ± 21               53 ± 73
K. Inside leg length        17 ± 9            6 ± 7                 9 ± 12
L. Thigh circumference      15 ± 7            9 ± 12                19 ± 25
M. Calf circumference       4 ± 5             6 ± 7                 16 ± 21
N. Ankle circumference      3 ± 3             14 ± 16               28 ± 35
O. Overall height           6 ± 8             9 ± 12                21 ± 27
P. Shoulder breadth         6 ± 5             6 ± 7                 12 ± 15

tight type, garments are fitted to the 3D personalized body model because of the
skintight property of the garment. For loose garments, the general shape of the given
garment is factored into the appearance mapping.
Our system is efficient and, once implemented on portable devices, convenient to use.
The creation of a virtual in-home dressing room for garment fitting is a
timely and needed application that would go hand in hand with online clothes
shopping, limiting online order returns, in addition to allowing garments to be tried on
at home before ordering them.
A limitation of the proposed method is that for human shape reconstruction, the input
canonical images should be in the same posture or articulation as the generic model. Large
changes in posture or large occlusions necessitate a much larger training set for the ASM
model.

Fig. 16 Garment simulation display



Authors’ contributions All authors contributed to the study conception and design. Method pipeline design,
data collection and experiment analysis, and the first draft of the manuscript were completed by Chenxi Li; method
optimization suggestions and draft editing were provided by Fernand Cohen. All authors read and approved the final
manuscript.

Data Availability The statistical human shape models of the CAESAR dataset were downloaded from http://humanshape.mpi-inf.
mpg.de/#download.

Compliance with ethical standards

Conflict of interest Not Applicable.

Code availability Not Applicable.

References

1. Amenta N, Bern M, Kamvysselis M (1998) A new Voronoi-based surface reconstruction algorithm.


Proceedings of the 25th annual conference on Computer graphics and interactive techniques pp 415–421.
2. Anguelov D, Srinivasan P, Koller D, Thrun S, Rodgers J, Davis J (2005) SCAPE: shape completion and
animation of people. ACM SIGGRAPH 24:408–416
3. Boisvert J, Shu C, Wuhrer S, Xi P (2013) Three-dimensional human shape inference from silhouettes:
reconstruction and validation. Mach Vis Appl 24(1):145–157
4. Chen Y, Kim T-K, Cipolla R (2010) Inferring 3D shapes and deformations from single views. European
Conference on Computer Vision pp 300–313
5. Chen W, Wang H, Li Y, Su H, Wang Z, Tu C, Lischinski D, Cohen-Or D, Chen B (2016) Synthesizing
training images for boosting human 3d pose estimation. 2016 Fourth International Conference on 3D Vision
(3DV) pp 479–488
6. Cootes TF, Taylor CJ, Cooper DH, Graham J (1995) Active shape models-their training and application.
Comput Vis Image Underst 61(1):38–59
7. Dibra E, Jain H, Öztireli C, Ziegler R, Gross M (2016) Hs-nets: estimating human body shape from
silhouettes with convolutional neural networks. 2016 fourth international conference on 3D vision (3DV) pp
108–117
8. Duan L, Yueqi Z, Ge W, Pengpeng H (2019) Automatic three-dimensional-scanned garment fitting based
on virtual tailoring and geometric sewing. J Eng Fibers Fabrics 14:1558925018825319
9. Guan P, Weiss A, Balan AO, Black MJ (2009) Estimating human shape and pose from a single image.
IEEE 12th International Conference on Computer Vision pp 1381–1388
10. Han X, Wong K-YK YY (2016) 3D human model reconstruction from sparse uncalibrated views. IEEE
Comput Graph Appl 36(6):46–56
11. Hasler N, Stoll C, Sunkel M, Rosenhahn B, Seidel HP (2009) A statistical model of human pose and body
shape. Comp Graphics Forum 28:337–346
12. Hauswiesner S, Straka M, Reitmayr G (2011) Free viewpoint virtual try-on with commodity depth cameras.
Proceedings of the 10th International Conference on Virtual Reality Continuum and Its Applications in
Industry pp 23–30
13. Hauswiesner S, Straka M, Reitmayr G (2013) Virtual try-on through image-based rendering. IEEE Trans
Vis Comput Graph 19(9):1552–1565
14. Hilsmann A, Eisert P (2009) Tracking and retexturing cloth for real-time virtual clothing applications.
International Conference on Computer Vision/Computer Graphics Collaboration Techniques and
Applications pp 94–105
15. Huang L, Yang R (2016) Automatic alignment for virtual fitting using 3D garment stretching and human
body relocation. Vis Comput 32(6–8):705–715
16. Jain A, Thormählen T, Seidel H-P, Theobalt C (2010) Moviereshape: tracking and reshaping of humans in
videos. In: ACM Transactions on Graphics (TOG) pp 1–10
17. Johnson S, Everingham M (2010) Clustered Pose and Nonlinear Appearance Models for Human Pose
Estimation. bmvc 4:5
18. Li J, Lu G, Liu Z, Liu J, Wang X (2013) Feature curve-net-based three-dimensional garment customization.
Text Res J 83(5):519–531

19. Loop CT (1987) Smooth subdivision surfaces based on triangles. Master's Thesis, University of Utah
20. Lunscher N, Zelek J (2018) Deep learning whole body point cloud scans from a single depth map.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops pp 1095–
1102
21. Magnenat-Thalmann N, Seo H, Cordier F (2004) Automatic modeling of virtual humans and body clothing.
J Comput Sci Technol 19(5):575–584
22. Michael N, Drakou M, Lanitis A (2017) Model-based generation of personalized full-body 3D avatars from
uncalibrated multi-view photographs. Multimed Tools Appl 76(12):14169–14195
23. Pishchulin L, Wuhrer S, Helten T, Theobalt C, Schiele B (2017) Building statistical shape spaces for 3d
human modeling. Pattern Recogn 67:276–286
24. Remondino F (2002) Human body reconstruction from image sequences. Joint Pattern Recognition
Symposium pp 50–57
25. Robinette KM, Daanen H, Paquet E (1999) The CAESAR project: a 3-D surface anthropometry survey.
Second International Conference on 3-D Digital Imaging and Modeling pp 380–386
26. Sajjad M, Zahir S, Ullah A, Akhtar Z, Muhammad K (2019) Human behavior understanding in big
multimedia data using CNN based facial expression recognition. Mobile networks and applications pp 1–11
27. Sapp B, Taskar B (2013) Modec: multimodal decomposable models for human pose estimation.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition pp 3674–3681
28. Spanlang B, Vassilev T, Buxton BF (2004) Compositing photographs with virtual clothes for design.
International Conference on Computer Systems and Technologies pp 1–6
29. Ul Haq I, Ullah A, Muhammad K, Lee MY, Baik SW (2019) Personalized movie summarization using deep
cnn-assisted facial expression recognition. Complexity 2019
30. Volino P, Magnenat-Thalmann N (2005) Accurate garment prototyping and simulation. Computer-Aided
Design Appl 2(5):645–654
31. Wang D, Sheng Y, Zhang G (2019) A new female body segmentation and feature localisation method for
image-based anthropometry. International Conference on Multimedia Modeling pp 567–577
32. Xi P, Lee W-S, Shu C (2007) A data-driven approach to human-body cloning using a segmented body
database. 15th Pacific Conference on Computer Graphics and Applications pp 139–147
33. Yuan M, Khan IR, Farbiz F, Yao S, Niswar A, Foo M-H (2013) A mixed reality virtual clothes try-on
system. IEEE Trans Multimed 15(8):1958–1968
34. Zhong Y, Xu B (2009) Three-dimensional garment dressing simulation. Text Res J 79(9):792–803
35. Zhou S, Fu H, Liu L, Cohen-Or D, Han X (2010) Parametric reshaping of human bodies in images. ACM
Trans Graphics (TOG) 29(4):126

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and
institutional affiliations.
