Nikhil Malik
nmalik1@andrew.cmu.edu
Kannan Srinivasan
kannans@cmu.edu
Dokyun Lee
dokyun@cmu.edu
1. Introduction
A rich stream of literature in marketing has studied the impact of the visual appeal of product aesthetics
as well as the impact of product endorsers on product demand (Petty 1983 JCR, Richins 1991 JCR, Reingen
1993 JCP, Hendershott-King 1997, Gentry 1997 JA, Cryder 2017 JMR, Claudia and Shu JCP 2010). The
effective use of attractive solicitors, models and endorsers in advertising and in the persuasion of
customers has been shown repeatedly. In the employment of people to work in their stores, Abercrombie
& Fitch made headlines for specifically trying to hire “young, attractive, mainstream athletic types and
cheerleaders who might be their girlfriends” (Edwards 2003, Bryant 2003).
One would expect a particularly strong impact of looks when the product and the endorser are somewhat
merged, such as in personal marketing. Willis and Todorov (2006) show that people infer personality traits, such as competence, trustworthiness and aggressiveness, from faces within seconds.1
1 Going forward, in this paper, we refer to individuals whose faces receive an above-average attractiveness rating from evaluators as “attractive”
and to others as “plain looking”. Beliefs or preferences based on this difference are referred to as an “Attractiveness Bias”, an “Attractiveness
Gap” or an “Attractiveness Premium”. Note that some of the prior literature uses phrases such as “beauty bias” and “beauty premium” for the
same concepts.
However, this body of research has mostly studied attractiveness bias in static, experimental situations
(Mobius and Rosenblat 2005).2 Individuals are asked to play hypothetical roles, such as a jury member
(Kulka and Kessler 1978), a manager (Watkins and Jones 2015), a likely convict or an able worker. In these
settings, individuals choose between attractive and plain-looking subjects, based on short interactions.
They are given no performance history or significant time to learn about the subjects. Additionally, the
evaluator does not expect to spend time with the subject after the evaluation. Experiments thus control
away entangled mechanisms in order to estimate the causal effect size of attractiveness bias. These
characteristics of experiments are in contrast with real world settings where a manager not only has more
information but may also expect to have an ongoing social (or even romantic or marital) relationship with
an employee. Experiments are thus limited in their external validity.
Further, short, static settings make it difficult to identify the underlying source of bias since different
sources may lead to the same patterns of a single-shot outcome (Bohren et al. 2018 AER, Fang and Moro
2011). Progression in professional roles depends on the continuous manager evaluation of the employees’
performance. When employees are starting out, there is limited performance history. A young graduate
with a 3.7 GPA has similar grades and the same classes as those of dozens of his or her peers. Similarly,
the investment banking interns’ first year is spent rotating across company divisions in training, producing
little of value to differentiate themselves. Discrimination may be in play if attractive individuals are
systematically more or less likely to be evaluated higher and to obtain better jobs or early promotions,
even in the absence of any objective performance signals. However, suppose that the individuals start
producing outputs of similar quality. Would the attractiveness bias persist or disappear? The answer to
this question depends critically upon the source of the bias.
We focus on two major sources of discrimination that are proposed in the literature, namely, preference
based (Becker 1957) and belief based (Fang and Moro 2011, Phelps 1972). The two sources would lead to
different dynamics of the attractiveness bias. In the preference-based bias, the evaluators have inherent
taste (or distaste) for employees who are attractive. This could simply be a result of the evaluators being
able to spend time around the attractive individuals. Although performance on the job provides
informative signals of an employee’s ability, it will have no impact on changing the preference of the
evaluator. In contrast, in belief-based bias, the evaluator at the outset perceives attractive individuals to
be on average more or less able than their plain-looking counterparts. Dion et al. (1972) suggest a positive
perception of social skills, mental health and personality from attractive looks. Observing performance on
the job will overcome such perceptions and reduce discrimination. This particular source of discrimination
is also known as statistical discrimination (Altonji and Pierret 2001). An evaluator has group-level priors
that diminish once the evaluator obtains objective signals of ability from the individual. Thus, if the source
of bias is preference based, the bias will persist; otherwise, the bias will diminish.
2 Without revealing the dynamics or sources of bias, other studies that examine long-term archival data (Hamermesh and Biddle 1994, Zebrowitz
and Donald 1991) show an attractiveness premium over a significant career length.
Overall, the small size of preference bias (0.64% gap per year) and the insignificant degree of belief bias
seem to suggest a marginal attractiveness premium. However, these yearly premiums add up to a 2.6% -
3.6% attractiveness gap in a 15-year career. In comparison, the gender-pay gap in the US stands at over
1% per year (Stanley and Stephen 1998) and adds up to an 18%-22% gap when averaged across all jobs.
The gender gap is estimated to be smaller at 1%-6% (Hay Group 2016) when the same set of jobs is
considered between men and women. Note that these numbers are not directly comparable: we
measure success as the quick attainment of a desired job, whereas most gender gap studies look at
average wages.
Our longitudinal sample presents major challenges in dealing with unstructured data in the shape of
images and text. We need to rate the attractiveness of a large number of subjects. Additionally, we only
observe a single instance of the subject’s self-posted profile picture. This creates three major challenges:
(1) the derivation of a measure of attractiveness from a high-dimensional (unstructured) image pixel-level
data, (2) the obfuscation of the underlying facial appearance by the subject’s choice of temporary
accessories, such as clothing, hair style and expressions, (3) a single current snapshot that must be used
to extrapolate the appearance of an individual at the time of the individual’s MBA program graduation up
to 25 years prior. The first is a relatively common obstacle that has been tackled in recent times by using
deep learning models supervised by labels from human raters. To ensure that our deep learning model’s
performance is not affected by extraneous features, we learn a model capable of disregarding temporary
features and focusing only on permanent ones. Finally, obtaining a rating of attractiveness for an
individual over a number of years from only one profile image poses a serious challenge. It has been
established in the literature that attractiveness evolves with age (Zhang et al. 2017). As a result, comparing
a 25-year-old individual with a 45-year-old individual in terms of attractiveness could be misleading. More
importantly, different individuals age differently. For example, consider a case of two individuals A and B.
Let us say at 25, A is more attractive than B. Now at 45, A will not necessarily continue to be more
attractive than B. Even if A continues to be more attractive than B, the gap between the two could change
significantly. Hence, to do a fair comparison over time, we need to model the evolution of attractiveness
for people who look like A and who look like B. We use a generative deep learning model (Conditional
Adversarial Auto Encoder, Zhang et al. 2017) to calculate the attractiveness of an individual over time.
In addition to building a quantitative measure out of the image data, we need to establish a measure of
success comparable between individuals. We observe all career milestones (job title, company, location
and timelines) that need to be ranked for each individual. We need to be able to order jobs such that a
higher-ranked job is more desired by the job market participants. We rely upon traditional labor
economics methods (Baker et al. 1994, Gayle et al. 2012) that establish a preference order between jobs.
Our paper makes three main contributions. First, we tease out two sources of attractiveness bias, namely,
preference based and belief based, by using the dynamics of bias measured over 15 years of post-MBA
careers. We measure the size and direction of these biases. We find a positive preference bias.
Additionally, we find that belief-based bias does not play a significant role in the careers of MBA
graduates. This is a significant finding because belief bias may be minimized by initially providing rich
performance-related information. However, preference-based biases are much harder to remove. The
media has highlighted sexual harassment across industries; such pure taste-based, perverse favoritism
cannot be eliminated by small policy changes alone. Rather, it requires social and cultural change.
Second, our findings are based on one of the largest longitudinal, archival data samples, which provides
external validity to the findings of prior experimental, small-sample studies. The study spans 25 years of
MBA graduates between 1990 and 2015, thus lending credence to a persistent phenomenon rather than
a quirk of a short-lived setting. Third, our computer vision-based deep learning approach provides a method
to extract the heterogeneous evolution of partially observed visual characteristics. Our attractiveness
ratings and job ranking procedures have wide applicability in any labor economics study on appearances.
To our knowledge, this is the first attempt in business research at visual image morphing driven by deep
learning.
The belief-based discrimination is also known as statistical discrimination. Belief-based bias occurs in an
environment of imperfect information where evaluators rely on group-level beliefs to judge individuals.
These beliefs and their subsequent updates are similar to the Bayesian learning approach frequently
applied in marketing literature to model how consumers learn about a new product (Erdem and Keane
1996, Erdem et al. 2008, Mehta et al. 2003, Huang et al. 2014). The evaluator starts with a prior based
on group-level beliefs and updates it as individual performance signals arrive.
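This belief-updating logic can be illustrated with a toy normal-normal Bayesian sketch. All numbers below (priors, variances, the number of signals) are hypothetical, chosen only to show how a group-level prior gap shrinks as performance signals accumulate:

```python
import numpy as np

def update_belief(prior_mu, prior_var, signals, noise_var):
    """Normal-normal Bayesian update of an evaluator's belief about ability."""
    n = len(signals)
    precision = 1.0 / prior_var + n / noise_var
    post_mean = (prior_mu / prior_var + np.sum(signals) / noise_var) / precision
    return post_mean, 1.0 / precision

# Group-level priors: attractive workers believed more able at the outset.
prior_attractive, prior_plain = 0.5, 0.0
true_ability = 0.2                      # identical for both workers
rng = np.random.default_rng(0)
signals = true_ability + rng.normal(0, 1.0, size=50)  # 50 performance signals

m_a, v_a = update_belief(prior_attractive, 1.0, signals, 1.0)
m_p, v_p = update_belief(prior_plain, 1.0, signals, 1.0)
print(round(abs(m_a - m_p), 3))   # tiny compared with the initial 0.5 prior gap
```

With unit variances the posterior gap between the two workers is the prior gap shrunk by a factor of 1/(1 + n), illustrating why belief-based bias should fade as performance history accumulates.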
A majority of the studies in the literature point to beliefs as the underlying source of bias. Dion et al.
(1972) suggest a positive perception of social skills, mental health and personality from attractive looks.
Olivola & Todorov (2010) show the perception of competence to be correlated with attractiveness. These
qualities align well with the requirements of the labor market. Not surprisingly, a majority of this research
finds positive premiums from attractiveness. Mobius and Rosenblat (2005) show that in an experimental
labor market, employers consider attractive workers as more able, even with equal performance
measures. Similar outcomes are shown in a public goods experimental setup (Andreoni and Petrie 2006,
Rezlescu et al. 2012). There do exist settings where belief bias is shown as a penalty (Heilman and
Some of the attractiveness bias literature also highlights preference-based sources. Luxen and Vijver
(2006) show a preference for attractive individuals, especially when there is high expectation of continued
contact post-experiment. In addition to the academic literature, numerous instances of sexual harassment
in the workplace have recently emerged in the media. While the presence of perverse favoritism for
attractive individuals was always understood, the sheer size of this problem has left society as a whole
looking for answers. These examples provide more than sufficient evidence for the existence of purely
taste-based decision-making by employers. Once again, a small handful of literature shows a penalty from
taste preferences as well. Agthe et al. (2010) show a penalty for attractive scholarship candidates when
judged by same sex evaluators. Ruffle and Shtudiner (2014) suggest envy and jealousy can engender a
penalty for attractive young women when evaluated by other young women. Once again, the overall
effect of preference bias on professional careers is thus unclear.
Most of these prior studies involved experimental setups that differ from an actual professional work
environment in the following three key ways: (1) the evaluator has little knowledge of past performance,
(2) the evaluator has limited time to learn about the subject on the job, and (3) (in most cases) the
evaluator does not expect to spend time with the employee after the end of the experiment. The first two
naturally mean that experimental setups focus on belief-based bias. The evaluator must decide based on
priors or group level beliefs for attractive individuals. The third contrast means that pure taste or
preference-based bias is minimized in experiments. The hypothetical manager in an experiment has no
expectation of spending time with the subject by hiring them or promoting them. These experiments
successfully control away numerous interfering factors in order to perform a causal inference of the
attractiveness bias. They are able to inform us of the impact and direction of bias in specific settings. In
actual workplaces, multiple factors operate simultaneously. While belief bias may dominate under
experimental control, preference bias may dominate in long-term interactions.
There are only a few observational data studies on attractiveness bias. Most establish a positive career
impact. This suggests that belief and preference biases combine to create a positive premium.
Unfortunately, the existing studies do not clarify the actual positive or negative role of both sources
independently. The observational literature is unable to answer this because it does not study the
dynamics in attractiveness bias. Biddle and Hamermesh (1998) do show a premium in 5-year and 15-year
careers, but they do not model dynamics in career progress or bias. Therefore, they do not compare the
size of early versus late bias or tease apart different sources. Our observational data and empirical method
allow us to exploit fine-grained career dynamics to identify the source of bias and provide external
validity through the use of a large sample.
We suggest that the dynamics of attractiveness bias can be used to distinguish the sources and direction
of the bias. The preference- or taste-based bias is unaffected by any information regarding the individual’s
underlying quality. In contrast, the belief-based bias disappears as evaluators receive information
regarding the individual’s underlying quality. Early career decisions resemble a scenario where the
employer has limited information and therefore may unwittingly absorb perceptions from appearance
cues. In professional careers, performance signals are of course not objective indicators. Nevertheless, we
expect a period of 4-8 years to be a more than sufficient time duration for prior beliefs to be overcome
by actual performance signals. Belief-based attractiveness rewards should vanish after this point. We use
this critical difference between preference and belief bias to independently measure their impact.
First, we measure attractiveness bias in the later part of one’s career. If a positive bias exists, it implies
the presence of a positive preference bias; if it does not, it implies the absence of one. Next, we measure the
attractiveness bias in the early part of one’s career. If the preference bias was absent, any early career
bias points to the existence of a belief bias. If a positive preference bias was present, determining the
belief bias is non-trivial. One of the following three possibilities exist: (1) positive belief bias, if the early
career bias is greater than the late career bias, (2) zero belief bias, if the early career bias is equal to the
late career bias, (3) small negative belief bias, if the early career bias is less than the late career bias. Thus,
we can use the relative size of the early and late career bias to answer the following three open questions:
(i) Is preference bias absent, positive or negative? (ii) Is belief bias absent, positive or negative? (iii) What
is the combined effect when preference and belief bias coexist early in professional careers? We ignore
here possibilities where one or both of the biases are significantly negative, resulting in an attractiveness
penalty over the entire career. This would go against all prior observational data research.
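The identification argument above amounts to a simple mapping from the measured early- and late-career biases to the two sources. A hypothetical sketch (the function name and tolerance are ours, not the paper's):

```python
def infer_bias_sources(early_bias, late_bias, tol=1e-9):
    """Map measured early/late-career attractiveness bias to its two sources.

    Hypothetical sketch of the identification argument: late-career bias
    identifies the preference component (beliefs have washed out by then);
    the early-minus-late difference identifies the belief component.
    """
    preference = late_bias              # persists after performance is observed
    belief = early_bias - late_bias     # vanishes once ability is learned

    def sign(x):
        return "positive" if x > tol else ("negative" if x < -tol else "absent")

    return {"preference": sign(preference), "belief": sign(belief)}

# Early bias equal to late bias: pure preference bias, no belief bias.
print(infer_bias_sources(0.64, 0.64))
```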
Prior research has also developed computer models of faces (Olivola et al., 2014; Olivola & Todorov, 2010;
Todorov et al., 2015) for use in fine-grained studies of appearances and perceptions. We take the deep
learning route to significantly improve accuracy. Amos et al. (2016) develop OpenFace as a general
purpose tool for face recognition by using Convolution Neural Networks (CNN). Radford et al. (2016) utilize
a Generative Adversarial Network (GAN, Goodfellow et al. 2014) for general image generation. Zhang et
al. (2017) perform image morphing by using a Conditional Adversarial Auto Encoder (CAAE). We take
elements from the OpenFace (Amos et al. 2016) and CAAE (Zhang et al. 2017) models to make a first
attempt at visual image morphing driven by a deep learning method used for business research. With
regards to unstructured profile data, we build on Page Rank (Page et al. 1999) to order unstructured, self-
reported, free-text job titles.
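A flavor of the PageRank-based ordering can be given with a toy power iteration on a hypothetical job-transition graph. The titles, counts and damping factor below are illustrative, not the paper's actual procedure:

```python
import numpy as np

# Hypothetical job-transition counts: entry (i, j) = workers moving from
# title i to title j, so frequently targeted titles should rank higher.
titles = ["Analyst", "Manager", "VP"]
moves = np.array([[0, 8, 1],
                  [1, 0, 6],
                  [0, 1, 0]], dtype=float)

P = moves.T / moves.sum(axis=1)        # column-stochastic transition matrix
d, n = 0.85, len(titles)
G = d * P + (1 - d) / n                # damped PageRank-style matrix

r = np.ones(n) / n
for _ in range(100):                   # power iteration to the stationary vector
    r = G @ r
r = r / r.sum()

ranking = [titles[i] for i in np.argsort(-r)]
print(ranking)
```

Titles that workers move into from many other titles accumulate rank mass, so they rise in the preference ordering.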
We use the search functionality offered by this professional social network platform. To collect profiles,
we use “MBA” as a search string along with filters by location (”United States”) and school name. The list
of university names is derived from the US News business school rankings. A typical MBA program
graduates anywhere between 200 and 1,000 students per year, or 5,000 to 25,000 graduates over the
last 25 years. The search functionality returns 100 pages of search results with 10 profiles
per page. To cover more profiles, we make use of a platform quirk. We perform
approximately 10 searches per school by using a different filter each time for industry. This theoretically
allows us to capture up to 10000 profiles per school search. The platform has over 200 industry
classifications overall, but we select the top 10 most frequent industries that are likely to cover more than
90% of all MBA graduates. Appendix A.9 provides a breakdown of the collected profiles by school. Figure
1 shows the number of profiles collected by MBA school rank.
Unfortunately, the online platform places a high cost on collection and places severe limits on the number
of profiles that can be viewed per day. Moreover, the search for profiles for a school often returns
individuals who were never MBA graduates. These individuals may have served in teaching, administrative
or other capacities in the specific MBA program or in a similar-sounding MBA program. The platform
search mechanism is relatively opaque. Due to these limitations, we are able to collect a total of 31,009
MBA graduate profiles. Out of these, because some of the profiles were incomplete, we are eventually
able to use 19,893 profiles. Table 2 provides details on the reasons for discarding certain profiles.
Some profiles will always be omitted from our collected sample for the following reasons: (a) some
graduates may have exited the job market prior to 2017 and thus never created, or have since deleted, their profiles,
(b) some may be in the job market but do not use this professional social network, (c) some use this social
network but keep their profiles private, and (d) others may have an active public profile but do not
mention their MBA degree at all. All such individuals are systematically left out of our sample. To mitigate
potential biases from sample selection, we also collect a secondary data set. We access the school
directory of a top 10 MBA program in the US. This gives us access to names and the graduating year for
all 15,640 students between 1990 and 2016. Of course, we do not find professional profiles for all these
15,640 graduates, but now we have visibility into the characteristics of the graduates that have been
selected out. In Appendix A.9, we describe this secondary data and the robustness tests that were
conducted by using this smaller dataset. We realize that the access to complete data is unrealistic in this
domain (on or outside this platform), as a large number of professionals choose to keep their profiles
completely or partially private. Beyond a certain limit, as researchers, we are cognizant of not using
intrusive technical methods to gather private data.
Table 3 shows sample values for the key profile fields we collected. We typically find 2 to 8 career steps,
with each step comprising the job title, employer name and industry. We also find 2 to 4 degrees (e.g.,
high school, undergraduate, Masters, MBA, PhD), each comprising a degree and university name. All of
these have a corresponding start year and end year. We identify the education degree corresponding to
the MBA by using the degree field. Figure 2 shows the number of profiles collected corresponding to each
MBA graduation year.
Since the profile details are self-reported in free-text fields, we perform a significant data cleanup process
to standardize the names of universities, employers, degrees and job titles. This cleanup ranges from
correcting misspellings to matching synonymous names. In addition to the features directly available on
the profiles, we also need to calculate new features, i.e., gender, age, university rankings, career success,
and attractiveness. These will be used in our primary analysis to compare the professional success of
attractive and plain-looking individuals while controlling for other heterogeneous characteristics. Table 4
provides a list of these features, their source (or calculation) and the resulting values. In the next section,
we describe in detail calculations for attractiveness and job rank. In addition to these two, the details on
other data-cleaning procedures and feature calculations are provided in Appendix A.8.
Table 4: Calculated features, their sources and values. More details in Appendix A.8.

Feature | Source (or Calculation) | Values
Gender | Microsoft Azure Cognitive Face API (from profile picture) | Male, Female
Ethnicity | http://www.textmap.com/ethnicity | European, Asian, Others
Age | (2017 - Year of Undergraduate Graduation) + 22 | 23-60
Pre-MBA Experience | (MBA Graduation Year - 2) - (Year of Undergraduate Graduation) | [0, 16]
Undergraduate Ranking | US News | 1-1000
MBA Ranking | US News | 1-100
Undergraduate Degree Area | Table 23 | Business, Arts, Science
Industry Category | Table 24 | Finance, Consulting, IT, Others
Job Type | Table 25 | Management, Technical, Others
Large Employer | Appendix A.8 | {0, 1}
Large Location | Appendix A.8 | {0, 1}
Attractiveness | Sections 4.1, 4.2 | {0, 1}
Job Rank | Section 4.3 | r ~ N(0, 1)
4. Measures
Our research goal rests on being able to compare professional success for individuals that differ in their
perceived attractiveness. We need to extract measures of attractiveness and career success from profiles.
4.1 Attractiveness
We focus only on one specific attractiveness aspect, which is measured by facial appearance. We measure
the attractiveness of the individual based on the profile picture of the individual. Our source platform
provides us a single image per person. Given the professional nature of the profiles, people often post
clear head shots. We detect and remove 6358 profiles from our sample for which the pictures either do
not capture a face or capture more than one face. Most studies in the literature measure attractiveness
as an average of ratings by a group of human raters (Biddle and Hamermesh 1998). Assuming at least 5
human raters per image, it would require approximately 100,000 ratings (19,893 profiles * 5 ratings) for
our dataset. As a solution to scale our analysis, we obtain only a subsample of images rated by humans.
We train a machine learning model to learn the relationship between low-level image features and the
attractiveness label in our training data. As a result, our machine learning model learns to mimic
evaluating attractiveness in a manner similar to that of a human rater. Next, we explain the different steps
used in building and testing our machine learning classifier.
Labeled dataset: We set up a small-scale experiment to enable raters to judge the attractiveness of a
random set of 659 out of 19,893 full-sample profile pictures. These ratings would act as training labels for
our machine learning model. This experiment was executed on the AMT (Amazon Mechanical Turk)
platform. The raters were selected to participate based on the following three criteria: 1) geographical
location in United States, 2) completion of at least 500 human intelligence tasks (HITs) in the past on the
AMT platform and 3) the receipt of approvals on 95% or more of all their completed HITs. These
conditions were applied to ensure that the raters understood our requirements in English, were
experienced in using the AMT platform and had demonstrated good quality of work in the past.
Additionally, limiting the raters to the United States is likely to help in mimicking how a subject’s
appearance would be judged in a typical US professional work environment. Our training data consists of
images labeled as attractive based on a 1 to 7 scale, where 1 represents the lowest value of attractiveness
and 7 represents the highest value of attractiveness. Table 5 provides the mean and standard deviation
of the attractiveness ratings, with women being rated on average higher than men. Each image is labeled
by 5 coders, and we use the average of the five ratings for an image as its true rating. Attractiveness is a
subjective measurement, and multiple raters are unlikely to exactly agree in their ratings. We calculate a
standardized Cronbach alpha (Santos 1999) value of 0.73, which is a typical psychometric standard used
for consensus between the raters. The relatively high value (0.7 typically being an acceptable standard)
gives us confidence that using an average would capture at least a coarse variance in the attractiveness
of the profile pictures.
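As a reference point, raw Cronbach's alpha can be computed directly from a ratings matrix. The sketch below uses simulated ratings (200 images, 5 raters, all numbers hypothetical) rather than our actual AMT data, and the raw variant shown is close to the standardized one when rater variances are similar:

```python
import numpy as np

def cronbach_alpha(ratings):
    """Raw Cronbach's alpha for a ratings matrix: rows = images, cols = raters."""
    k = ratings.shape[1]
    rater_vars = ratings.var(axis=0, ddof=1).sum()   # sum of per-rater variances
    total_var = ratings.sum(axis=1).var(ddof=1)      # variance of summed ratings
    return k / (k - 1) * (1 - rater_vars / total_var)

# Simulated data: 200 latent 1-7 scores, each judged by 5 noisy raters.
rng = np.random.default_rng(1)
true_score = rng.normal(4, 1, size=200)
ratings = true_score[:, None] + rng.normal(0, 0.8, size=(200, 5))
print(round(cronbach_alpha(ratings), 2))  # noisier raters push alpha below 1
```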
Image Features: We need to ensure that the machine learning model picks up unalterable appearance
differences instead of superficial differences that are based on how a specific photograph was taken. To
ensure this, we preprocess the profile pictures. This step involves removing or discounting extraneous
details, such as picture quality, picture background, clothing and hairstyle, from the images. We first
detect and crop the face in each picture and then correct its pose to obtain comparable front-facing images.
Once front-facing comparable face images are obtained, we compress a typical 400x400x3 (RGB) image
into a handful of key features that capture a majority of differences between appearances. We use a deep
neural network implementation by the OpenFace project (Amos et al. 2016). This architecture, while
trained for the face recognition task, provides a 128-dimensional intermediate layer. This layer represents a
low-dimensional embedding of any face image. Figure 3 lays out a summary of various transformations
performed on profile pictures to arrive at a set of 128 features for every face.
Figure 3: For two samples, the figure shows (left to right) the original picture, the bounded picture, the pose corrected
versions and the 128-dimensional feature representation.
Predictive Model: We train a support vector regression model that learns the relationship between 128
image features and the attractiveness label. The model achieves a high-prediction accuracy with a root
mean square prediction error of 0.92 on a 5-fold cross validation. During cross validation, some of the
labeled images are held back from model building. Then, the accuracy of the model is validated on the
held-out sample. A large number of faces are rated in the 3.0 – 6.0 region; as a result, in this region, our
model also does a relatively good job of mimicking human ratings.
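A minimal version of this prediction step might look as follows, using scikit-learn's SVR with synthetic stand-ins for the 128-D embeddings and human labels; the hyperparameters and data are illustrative, not the ones used in the paper:

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVR

# Hypothetical stand-ins: 659 labeled profiles, 128-D face embeddings,
# mean human ratings on the 1-7 attractiveness scale.
rng = np.random.default_rng(0)
X = rng.normal(size=(659, 128))
w = rng.normal(size=128) / np.sqrt(128)          # arbitrary "true" relationship
y = np.clip(4 + X @ w + rng.normal(0, 0.5, 659), 1, 7)

model = SVR(kernel="rbf", C=1.0, epsilon=0.1)    # illustrative hyperparameters
scores = cross_val_score(model, X, y, cv=5,
                         scoring="neg_root_mean_squared_error")
rmse = -scores.mean()                            # 5-fold cross-validated RMSE
print(round(rmse, 2))
```

During each of the 5 folds, a fifth of the labeled images is held out and the model is scored on that held-out fifth, mirroring the cross-validation procedure described above.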
We investigate other approaches from the literature on machine-rated facial attractiveness. Golden
ratios, neoclassical canons (Schmid et al. 2008) and Eigen faces (Turk and Pentland 1991) are alternatives
for identifying image features. The deep neural network method easily outperforms the other two
methods. Our approach has the following two steps: (i) use of the neural network to obtain 128-D dense
features from images and (ii) use of the SVM to make predictions. An end-to-end deep neural network
(Lawrence et al. 1997) is another alternative. Our two-step approach has a side benefit: the 128-D
features from step (i), trained to recognize the same face across different settings, clothing and picture
quality, learn to ignore superficial temporary characteristics, e.g., room lighting, facial hair and expressions.
The model above, $b_{i,a} = \lambda_0 + \lambda_1 a + \rho_{0,i}$, calculates the attractiveness $b_{i,a}$ of individual $i$ at age $a$. It assumes that
attractiveness evolves linearly over age. Later in this section, we reject a quadratic model. $\rho_{0,i}$ captures
individual-specific heterogeneity in attractiveness. This can be trivially estimated by using OLS with a
single observation per individual, i.e., evaluating attractiveness at the individual's current age. We would
obtain estimates of $(\hat{\lambda}_0, \hat{\lambda}_1)$ as well as the residual heterogeneity $\rho_{0,i}$. This can subsequently be used to
predict the attractiveness at a target age for individual $i$. This is a naive approach that assumes that
attractiveness for all individuals evolves similarly. Unfortunately, this naive assumption on a
homogeneous attractiveness evolution is refuted by past studies on the evolution of looks over age. Zhang
et al. (2017) suggest that looks evolve over a complex manifold. If person A is 1 rating point higher than
person B in 2017, it does not necessarily mean that this gap was preserved at their MBA graduation date.
Figure 4 depicts such a manifold where looks of different individuals take different evolution paths over
their age.
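The naive homogeneous baseline can be estimated in a few lines of OLS. The sketch below uses simulated cross-sectional data (one observation per person, coefficients chosen arbitrarily), not our actual profile sample:

```python
import numpy as np

# Simulated cross-section: one attractiveness score per person, each person
# observed at a different (current) age; the true coefficients are arbitrary.
rng = np.random.default_rng(2)
ages = rng.integers(25, 56, size=500).astype(float)
rho0 = rng.normal(0, 0.5, size=500)          # individual heterogeneity rho_{0,i}
beauty = 5.0 - 0.03 * ages + rho0            # b_{i,a} = lambda0 + lambda1*a + rho_{0,i}

# OLS on the single observation per individual.
A = np.column_stack([np.ones_like(ages), ages])
(l0_hat, l1_hat), *_ = np.linalg.lstsq(A, beauty, rcond=None)
rho_hat = beauty - (l0_hat + l1_hat * ages)  # residual heterogeneity estimates

# Naive extrapolation of everyone to a common target age (e.g., 28).
beauty_at_28 = l0_hat + l1_hat * 28 + rho_hat
```

The extrapolation in the last line is exactly the homogeneous assumption that the manifold argument above rejects: it shifts every individual along the same slope.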
We would like to estimate a heterogeneous model of attractiveness evolution, $b_{i,a} = \lambda_0 + \lambda_1 a + \rho_{0,i} + \rho_{1,i} a$,
where the term $\rho_{1,i} a$ captures different evolution paths across individuals. $\rho_{1,i}$ cannot be estimated with a single profile picture per
individual. Since a person is observed only once, an alternative could be to use a picture at a different age
for a different individual with similar looks. We do have individuals observed at different ages, e.g., a
person A at 40 and a person B at 28. However, we do not know if A looked like B when they were both 28.
If we did, we could potentially extrapolate how B will look at 40 or how A would have looked at 28. This
cannot be solved with our dataset. We need individuals (e.g., person C and D) whose pictures are available
over time. Between C and D, if C resembles A at 40 more closely than D does, we can extrapolate that
A would also have looked like C (and not D) at 28. Next, to solve this problem, we describe the Conditional
Adversarial Auto Encoder (CAAE) model.
Figure 4: The figure illustrates the heterogeneous evolution of faces over a complex manifold. This is a very simplified
example for illustration purposes only (Zhang et al. 2017).
The training data for the CAAE model is limited; it cannot learn a unique evolution path for every single
face. It must generalize. Therefore, it tries to learn evolution paths for a large number of representative
faces seen during training. For a new test face not seen during training, the model predicts an evolution
path based on evolution seen for training faces that were similar to the new test face. It does so by
creating a low-dimensional representation for an input face picture such that similar faces are close
together in this low-dimensional space. Thus, the model has two building blocks, i.e., one used to find a
low-dimensional latent representation and the second used to create pictures at the target age from this
latent representation.
The CAAE produces an output $X'$ (an entire image) for a profile picture $X$ and a target age $a$. A CAAE model
learns the probability distribution $p(X' \mid X, a)$. It does so by combining two blocks, $p(z \mid X)$ and $p(X' \mid z, a)$,
where $z$ is the latent low-dimensional representation. The first block creates a latent lower-dimensional
representation. The second block creates realistic morphed images $X'$ without using the full input image
$X$, relying instead on the latent representation. This latent representation ensures that the image
morphing generalizes to faces not seen during training. These two blocks are common in the Computer
Vision literature: Autoencoders (AE) and Generative Adversarial Networks (GAN, Goodfellow et al.
2014), respectively. We provide intuition behind these two building blocks and then describe how they are
combined in the CAAE.
GAN: The GAN learns to “hallucinate” photo-realistic face pictures of a certain age $a$. The hallucinated
images $X'$ are generated by using the age $a$ and a white-noise random shock $\epsilon$ as inputs, i.e., $p(X' \mid \epsilon, a)$. A
GAN has two components, namely, a generator (G) and a discriminator (D). The generator creates
pictures; the discriminator tells whether an image is “hallucinated” by the generator or is an actual image. The
discriminator’s job can be thought of as a person giving feedback on the quality of the generated pictures.
These models take millions of iterations: it is not feasible to use human feedback on millions of generated
pictures.
Initially, both components are terrible at their task. The generator creates pixel values that look
nothing like a face, but the discriminator cannot tell this garbage from actual face pictures. Over training
iterations, the discriminator becomes better at discerning “hallucinated” pictures from actual images. The
model’s objective function pits the generator against the discriminator. Therefore, the generator also
becomes truly proficient at producing near-real images for any age. This method derives its power from
the ability of the deep neural network layers of the generator and the discriminator to break down high-
dimensional complex distributions into simpler building blocks. The shallow layers of the generator learn
to create basic image pieces (e.g., edges, circles). The deeper layers combine these basic pieces
into more sophisticated forms (e.g., ears, hair, texture). We call pictures created by the GAN generator
“hallucinated” images.
AE: The AE block helps to create pictures targeted toward a specific individual. The AE learns $p(z \mid X)$,
where $z$ is a low-dimensional representation of the input image $X$. The encoder converts the input image
to the latent encoding ($X \rightarrow z$). The decoder reconstructs the image back from the encoding ($z \rightarrow X'$). The
AE is trained to ensure that this latent encoding $z$ is sufficient to reconstruct the original input image.
The AE objective function forces the decoder output to be close to the input ($|X' - X| \rightarrow 0$).
Figure 5: Independent Auto Encoder and GAN models. The Decoder on the autoencoder is merged with the Generator on the
GAN.
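The AE objective can be sketched in a few lines of numpy: a linear encoder and decoder trained by gradient descent so that the reconstruction error $|X' - X|$ shrinks. Everything here (dimensions, learning rate, synthetic data) is illustrative, not the paper's actual network:

```python
import numpy as np

# Minimal sketch of the AE objective |X' - X| -> 0: a linear encoder/decoder
# trained by gradient descent on synthetic low-rank "images". Dimensions,
# learning rate and data are illustrative, not the paper's architecture.
rng = np.random.default_rng(0)

n, d, k = 200, 20, 3                                   # samples, input dim, latent dim
X = rng.normal(size=(n, k)) @ rng.normal(size=(k, d))  # data with rank-k structure

W_enc = rng.normal(scale=0.1, size=(d, k))             # encoder: X -> z
W_dec = rng.normal(scale=0.1, size=(k, d))             # decoder: z -> X'

def recon_loss(X, W_enc, W_dec):
    return (((X @ W_enc) @ W_dec - X) ** 2).mean()

lr = 0.05
initial = recon_loss(X, W_enc, W_dec)
for _ in range(1000):
    Z = X @ W_enc                    # latent representation z
    X_rec = Z @ W_dec                # reconstruction X'
    G = 2 * (X_rec - X) / X.size     # gradient of the mean squared error
    grad_dec = Z.T @ G
    grad_enc = X.T @ (G @ W_dec.T)
    W_dec -= lr * grad_dec
    W_enc -= lr * grad_enc
final = recon_loss(X, W_enc, W_dec)  # reconstruction error shrinks with training
```

A real face autoencoder replaces the two matrices with deep convolutional encoder/decoder networks, but the objective being minimized is the same reconstruction error.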
The CAAE model combines the AE and GAN building blocks. More specifically, it forces the decoder ($z \rightarrow X'$)
of the AE module to also act as the generator of the GAN module. The latent representation $z$
replaces the random shock $\epsilon$. While the AE objective function forces the decoder output to be close to
the input ($|X' - X| \rightarrow 0$), that output need not be realistic. The additional GAN objective further forces the
decoder to generate realistic outputs that can fool the discriminator. This ensures that the CAAE model
balances the generation of images that are both realistic for an age $a$ and close to the input image $X$. In
Appendix A.12, we provide more details on how the GAN and AE modules are trained. Once the entire
CAAE network is trained, it can produce realistic images for a new face image conditioned on any
arbitrary age value.
Figure 6: (from left to right) The original face image, the aligned version, and the generated (reconstructed) images at the
current age, 10 years prior to the current age and 10 years past the current age. Note how facial hair is minimized even on
the current age picture.
At a very high level, our primary data is limited, as we observe a single image per person. The CAAE model
uses completely unrelated labeled data (the Morph and CACD datasets), where multiple images are
available per person at different ages.
In the equation below, we had initially set out to estimate the heterogeneous evolution $\rho_{1,i}$. The CAAE
model has given us a complex manifold of the heterogeneous evolution of looks. The attractiveness
prediction $beauty^{*}_{i,a}$ for the CAAE-generated images does not follow an interpretable functional form
such as the one below. We therefore tie the deep learning predictions back to an interpretable and
unbiased linear model. By doing so, we are trading off the higher accuracy of the complex high-
dimensional manifold learned by the CAAE for a simpler but unbiased model of heterogeneous evolution.
Thus, we combine the advantages offered by two paradigms: (1) the capacity of deep learning models to
learn high-dimensional pixel-level distributions and (2) the ability of linear econometric models to
estimate interpretable and unbiased heterogeneous characteristics.
$$ beauty^{*}_{i,a} = beauty_{i,a} + \eta_{i,a} $$
Note that originally, this model could not be uniquely identified, since we have a single observation per
individual. Now, we can utilize the outputs of the deep learning module to estimate the individual-specific
evolution over age ($\rho_{1,i}\,a$). We use a ridge regression for estimation. The ridge penalty on $\rho_{1,i}$ discourages
a large gap between the individual-specific attractiveness evolution and the population-level evolution.
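The ridge step can be sketched as follows for a single individual; the population trend, the ages, and the attractiveness scores are synthetic stand-ins for the CAAE-based predictions:

```python
import numpy as np

# Illustrative ridge step: regress CAAE-based attractiveness predictions
# beauty*_{i,a} on age, shrinking the individual deviations (rho_0i, rho_1i)
# from the population trend toward zero. All numbers are synthetic.
rng = np.random.default_rng(1)

lam0, lam1 = 0.0, 0.01                  # population-level intercept and age slope
ages = np.array([25.0, 30.0, 35.0, 40.0, 45.0])
true_rho1 = 0.004                       # individual's deviation in slope
beauty_star = lam0 + (lam1 + true_rho1) * ages + rng.normal(0, 0.02, ages.size)

# Work with deviations from the population trend, so the ridge penalty
# shrinks (rho_0i, rho_1i) toward zero deviation.
y = beauty_star - (lam0 + lam1 * ages)
A = np.column_stack([np.ones_like(ages), ages])   # columns: rho_0i, rho_1i

alpha = 1.0                                       # ridge penalty strength
rho = np.linalg.solve(A.T @ A + alpha * np.eye(2), A.T @ y)
rho0_i, rho1_i = rho                              # shrunken individual estimates
```

In the paper's setting the same penalized fit would be run per individual over the CAAE-generated images at different ages; here the closed-form ridge solution recovers the planted slope deviation.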
Zhang et al. (2017) perform a survey to validate CAAE-generated face images for which there is a greater
than 20-year age gap between the generated face and the available ground-truth pictures. A total of
48.4% of the respondents indicate that the generated face image depicts the same person as the ground
truth image, 29.6% of the respondents indicate that it does not, and the rest of the respondents are not
sure. Extrapolating a person’s looks 20 years out is an extremely demanding task. CAAE outperforms
existing methods, but it is not clear what level of accuracy should be considered sufficient.
In Appendix A.3, we show that our attractiveness-bias analysis is robust to different attractiveness models:
(i) heterogeneous evolution over a career ($\lambda_0 + \lambda_1 a + \rho_{0,i} + \rho_{1,i} a$) by using the CAAE, (ii) homogeneous
evolution over the career ($\lambda_0 + \lambda_1 a + \rho_{0,i}$) without using the CAAE, and (iii) a non-evolving measure ($\lambda_0 + \rho_{0,i}$)
assuming no change in attractiveness. Our results are robust to binary or continuous measures of
attractiveness. In spite of this robustness to various attractiveness measures, the calculations above are
critical to answering a major criticism of a long-duration (up to 25 years) study of characteristics that may
evolve heterogeneously.
We define a job as a combination of title, employer size and employer industry. This ensures that positions
such as “CEO at a very small internet company” and “CEO at a large finance company” are not ranked together
simply because the title “CEO” remains the same in otherwise very dissimilar jobs. The former is likely a
position at a 5-person startup company, while the latter may be a position leading thousands of
employees in a multi-billion-dollar company. Once we have a set of jobs that occur relatively frequently in
our dataset, we count all the directed transitions between any pair of jobs. The matrix $M$ represents
these transitions, normalized into the probability of each possible job switch.
The PageRank algorithm was originally used to determine a web page’s importance based on its in-links
and out-links. In our case, jobs are equivalent to web pages, and links are equivalent to job transitions. A
top position is unlikely to have the largest number of incoming transitions; instead, it receives incoming
transitions from relatively senior jobs (e.g., CFO to CEO, Principal to CEO). The algorithm iteratively
propagates rank along the directed edges, eventually transferring weight from junior positions
(intern, analyst) to top positions (CEO, board member). The rank ($r$) corresponds to the likelihood of
ending up in a given job if an individual spends an infinite time in the job market switching between
jobs, with the probability of each step drawn from the transition matrix ($M$).
Figure 7: (Left) Job switches observed between a set of four jobs (a, b, c, d). The number next to every edge represents the
number of job switches. (Right) The resulting matrix M, the initial ranks 𝑟𝑟0 and the final ranks r are shown.
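The ranking computation can be sketched with power iteration on a toy 4-job transition matrix in the spirit of Figure 7; the switch counts and the 0.85 damping factor are illustrative:

```python
import numpy as np

# Power-iteration sketch of the job ranking. Jobs play the role of web pages
# and observed job switches the role of links. The 4-job counts below are
# made up for illustration (cf. Figure 7), not the paper's data.
jobs = ["intern", "analyst", "cfo", "ceo"]
# counts[i, j] = number of observed switches from job i to job j
counts = np.array([
    [0, 8, 1, 0],   # intern -> mostly analyst
    [2, 0, 5, 1],   # analyst -> mostly cfo
    [0, 1, 0, 6],   # cfo -> mostly ceo
    [0, 0, 2, 0],   # ceo -> occasionally back to cfo
], dtype=float)

M = counts / counts.sum(axis=1, keepdims=True)  # row-stochastic transition matrix

d = 0.85                      # damping factor, as in the original PageRank
n = len(jobs)
r = np.full(n, 1.0 / n)       # initial ranks r0
for _ in range(100):
    r = (1 - d) / n + d * (M.T @ r)

# r approximates the long-run occupancy of each job under random switching
```

Because rank flows along directed edges, the senior jobs at the end of the intern-to-CEO chain accumulate most of the weight even though no single job has an overwhelming number of in-links.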
Next, we deal with the less frequent jobs. As an example, “principal consultant at a large consulting firm”
and “software developer at a large IT firm” are two common jobs that can be used to infer the ranking for
positions, such as the “principal consultant for the energy-oil industry at a large consulting firm” and the
“Agile software developer at a large IT firm”, respectively. We convert job strings into a tf-idf (Mihalcea
2006) representation. We chose 2000 as the size of the vocabulary, therefore describing every job as a
2000-dimensional vector. The ith entry of a job vector represents whether the ith vocabulary word occurs
(1) or does not occur (0) in this job. This choice of vocabulary size keeps words informative of rank, such as “vice”,
“associate”, and “chief”, while eliminating rare words, such as “agile”, “rockstar-coder”, and “energy-oil”.
As rare vocabulary words such as “agile” are ignored in the vector notation, the two jobs, “software
developer at large IT” and “Agile software developer at large IT”, are exactly superimposed in this tf-idf
vector space. More generally, the tf-idf representation allows us to calculate the distance between any
two jobs. We use weighted k-Nearest Neighbors (Cover et al. 1967, k = 10) to approximate the ranking
for the less frequent jobs, based on the 10 nearest-ranked jobs.
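The imputation step can be sketched as follows; the vocabulary, job titles, ranks, and the inverse-distance weighting scheme are toy choices for illustration, not the paper's exact configuration:

```python
import numpy as np

# Sketch of rank imputation for rare jobs: represent each job title as a
# binary bag-of-words vector over a fixed vocabulary, then average the ranks
# of the k nearest frequent jobs, weighted by closeness. All values are toy.
vocab = ["software", "developer", "senior", "consultant", "principal", "chief"]

def vectorize(title):
    words = title.lower().split()
    return np.array([1.0 if w in words else 0.0 for w in vocab])

# Frequent jobs whose PageRank-based log ranks are already computed
frequent = {
    "software developer": -8.3,
    "senior software developer": -8.0,
    "principal consultant": -7.8,
    "chief consultant": -7.5,
}

def knn_rank(title, k=2):
    v = vectorize(title)   # rare words like "agile" fall outside the vocabulary
    dists = sorted(
        (np.linalg.norm(v - vectorize(job)), rank) for job, rank in frequent.items()
    )[:k]
    w = np.array([1.0 / (d + 1e-6) for d, _ in dists])  # inverse-distance weights
    ranks = np.array([r for _, r in dists])
    return float(w @ ranks / w.sum())

# "agile" is out of vocabulary, so this job collapses onto "software developer"
rank = knn_rank("agile software developer")
```

The out-of-vocabulary word contributes nothing to the vector, so the rare job is superimposed on its frequent counterpart and inherits (almost exactly) its rank.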
We try multiple hyper-parameter settings in this job-rank calculation, e.g., k = 2, 10, 50 for k-NN;
N = 100, 1000, 5000 for the top-N frequent jobs; and tf-idf vocabulary sizes of 100, 500, 1000. These hyper-
parameters are chosen to maximize performance on a held-out validation dataset. Performance
is measured by how well the job ranks calculated from the training data explain the job switches in the
validation data. On average, we expect individuals to move to higher-ranked jobs over their career.
Therefore, calculated job ranks are good if a high proportion of job switches go from a lower-ranked job
to a higher-ranked job. If the job ranks are uninformative (or random), this proportion will be 0.5. We
select hyper-parameters that maximize this ratio. We avoid overfitting by the use of a held-out validation
dataset.
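The validation criterion itself is simple to state in code; the held-out switches below are made up for illustration:

```python
# Sketch of the validation criterion: the fraction of observed job switches
# that move from a lower-ranked to a higher-ranked job. Hyper-parameters
# maximizing this fraction on held-out switches are selected. The switches
# below are illustrative (rank_from, rank_to) pairs of log page ranks.
held_out_switches = [
    (-8.5, -8.1),   # upward move
    (-8.1, -7.9),   # upward move
    (-7.9, -8.3),   # downward move
    (-8.3, -7.6),   # upward move
]

def upward_fraction(switches):
    up = sum(1 for r_from, r_to in switches if r_to > r_from)
    return up / len(switches)

score = upward_fraction(held_out_switches)  # 0.5 would mean uninformative ranks
```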
Table 7: A small sample of jobs and the respective log of page ranks.

Title | Company Size | Industry Category | Log Rank
Investment Banking Summer Analyst | Large | IT | -8.581
Systems Analyst | Very Small | IT | -8.371
Marketing Coordinator | Small | IT | -8.265
Senior Research Analyst | Medium | Finance | -8.172
Product Director | Small | IT | -7.979
Regional Director | Small | Others | -7.847
VP Finance | Medium | Consulting | -7.611
Chief Revenue Officer | Very Small | Finance | -7.322
Table 7 reports a random sample of jobs and their respective rankings. For every individual, we find the
job performed $t$ years after MBA graduation and the corresponding job rank $rank_{i,t}$. Figure 8 plots
the $rank_{i,t}$ trend over time, averaged across profiles. In addition to the validation performance, this shows
that our job ranks grow almost linearly and overall appear to be sensible. Despite the almost linear trend,
there are no guarantees on the distribution of these page ranks $rank_{i,t}$. We calculate $r_{i,t}$ as a standardized
version of $rank_{i,t}$, with $\mu_t(r_{i,t}) = 0$ and $\sigma_t(r_{i,t}) = 1$ across all peers at $t$ years after graduation. This
measures how individual $i$ is placed at year $t$ relative to all their peers at the same stage of their career.
$r_{i,t} \geq 0$ means that the individual’s $rank_{i,t}$ is greater than the mean rank of all peers exactly $t$ years
after graduation. In Appendix A.4, we show that our results are robust to either choice, $rank_{i,t}$ or $r_{i,t}$. Our
primary results stand even when we use a coarse, binary success state for individuals, $s_{i,t} \in \{0,1\}$: $s_{i,t} = 1$
for the top half of the individual’s peers, and $s_{i,t} = 0$ for the bottom half.
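The within-cohort standardization can be sketched as follows, on synthetic page ranks:

```python
import numpy as np

# Sketch of standardizing page ranks within each career-year cohort:
# r_{i,t} = (rank_{i,t} - mu_t) / sigma_t across all peers observed exactly
# t years after graduation. The data below is synthetic.
rng = np.random.default_rng(2)

t_years = np.repeat([1, 5, 10], 50)                 # cohort label per observation
page_rank = rng.normal(-8.0, 0.5, t_years.size) + 0.05 * t_years

r = np.empty_like(page_rank)
for t in np.unique(t_years):
    mask = t_years == t
    mu_t = page_rank[mask].mean()
    sigma_t = page_rank[mask].std()
    r[mask] = (page_rank[mask] - mu_t) / sigma_t    # mean 0, sd 1 within cohort

# r >= 0 now means "above the cohort mean at year t", regardless of the
# drifting raw page-rank scale
```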
5. Empirical Strategy
We first present some model-free evidence and then describe our primary model.
5.1 Model-Free Evidence
We divide individuals in our dataset into attractive ($b_{i,0} = 1$) and plain-looking ($b_{i,0} = 0$) groups by using
their attractiveness at the time of MBA graduation ($t = 0$). For every individual, we find the job performed
$t$ years after MBA graduation, the corresponding page rank $rank_{i,t}$, and the standardized rank $r_{i,t}$.
The latter, described in Section 4.3, measures how individual $i$ is placed at year $t$ relative to all his or her
peers at the same stage of their career. Figure 8 presents the mean of the $rank_{i,t}$ and $r_{i,t}$ measures for
both groups over time.
Figure 8 shows that from the beginning, there appears to be a small gap between the attractive and plain-
looking groups. The gap grows over time. The effect of attractiveness $b_{i,t}$ after MBA graduation is our
primary focus. It is likely that attractiveness bias plays a role even before the start of the post-MBA career.
It may help individuals to gain better education and skills. The cumulative effect of bias in the 5-10 years
prior to MBA graduation will appear in $r_{i,0}$, i.e., the first job after MBA graduation. We therefore perform
matching on all pre-MBA graduation characteristics (dashed lines in Figure 8).
Figure 8: The dark and gray lines are the attractive and plain-looking groups, respectively. The groups start their career with a
gap in the first year after MBA graduation. The dashed lines are matched on all pre-MBA graduation characteristics. The left
figure represents raw job ranks (from page rank), while the right figure represents the proportion of individuals in each
group that are successful.
Figure 9: The dark and gray lines are the attractive and plain-looking groups, respectively. The groups start their career with a
gap in the first year after MBA graduation. The dashed lines are matched on all pre-MBA graduation characteristics as well
as the job rank attained 5 years after MBA graduation.
The gap appears to widen gradually in both the unmatched and matched cases, developing into a large gap
by the end of the career. This second observation can be reasoned in two ways: (i) a positive attractiveness
bias exists late in the career, or (ii) the small gap in the early career places attractive individuals in slightly
better roles, and these small early differences have far-reaching effects, as they are propelled
upward later in a career. This may happen because of high-quality on-the-job training, e.g., a McKinsey or
Bain Consulting junior associate who gets first-hand industry-wide experience. To test the latter, we match
the two groups on the job rank attained at the end of 5 years, in addition to pre-MBA graduation
characteristics (Figure 9).
The role of early career belief bias cannot directly be inferred here. A small difference that develops
early in a career can be explained by positive preference bias combined with (i) positive belief bias, (ii)
zero belief bias, or (iii) small, negative belief bias. Next, we develop a full model to precisely measure
these effects.
5.2 Model
The career success of individual $i$ in year $t$ is measured by $r_{i,t}$. The year $t$ denotes the duration of time
from MBA graduation; i.e., a job held by a 2005 graduate in 2009-2010 and a job held by a 2010 graduate in
2014-2015 are both $t = 5$ years from graduation. At year 1, senior associate may be a successful
state ($r_{i,1} > 0$), while intern is likely to be an unsuccessful state ($r_{i,1} < 0$). By year 10, senior associate
may be an unsuccessful state ($r_{i,10} < 0$) relative to the individual's peers, while managing
partner ($r_{i,10} > 1$) is likely to be successful. The job attained, and thus the individual’s state in year $t+1$,
depends on the individual’s state in the previous year, $r_{i,t}$ (Rubenstein and Weiss 2006). Figure 10
illustrates this with an example where $P(r_{i,t+1} > 0 \mid r_{i,t} > 0) = 0.85$ and $P(r_{i,t+1} > 0 \mid r_{i,t} < 0) = 0.15$
are unequal. This is because the state at year $t+1$ does not simply depend on performance or the bias
between years $t$ and $t+1$. Early success (say $r_{i,1}$) due to early performance or bias may have a far-reaching
effect. Over and above these average career progression dynamics ($\beta_0 r_{i,t-1} + \beta_1$), the individual’s ability
($\vec{\beta}_4 Z_i$), bias ($\beta_2 b_{i,t}$), heterogeneous industry-specific factors ($\vec{\beta}_3 X_{i,t}$), and career-jump-specific shocks $u_{i,t}$
play a key role in determining career growth.

$$ r_{i,t} = \beta_0 r_{i,t-1} + \beta_1 + \beta_2 b_{i,t} + \vec{\beta}_3 X_{i,t} + \vec{\beta}_4 Z_i + u_{i,t} $$
Figure 10: Success at year $t+1$ does not simply depend on performance or bias between years $t$ and $t+1$. Early success ($r_{i,1}$)
due to early performance or bias may have a far-reaching effect. This is because $E(r_{i,t+1} \mid r_{i,t} > 0) > E(r_{i,t+1} \mid r_{i,t} < 0)$.
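A small simulation illustrates why early success propagates through these dynamics; the persistence parameter $\beta_0 = 0.7$ and the shock scale are illustrative values, not our estimates:

```python
import numpy as np

# Illustration of state-dependent career dynamics: simulate the AR(1)-style
# process r_t = beta0 * r_{t-1} + u_t and check that success persists, i.e.,
# E(r_{t+1} | r_t > 0) > E(r_{t+1} | r_t < 0). Parameters are illustrative.
rng = np.random.default_rng(3)

beta0, T, n = 0.7, 15, 5000          # persistence, career length, individuals
r = np.zeros((n, T))
for t in range(1, T):
    r[:, t] = beta0 * r[:, t - 1] + rng.normal(0, 1, n)

prev, nxt = r[:, :-1].ravel(), r[:, 1:].ravel()
e_up = nxt[prev > 0].mean()          # E(r_{t+1} | r_t > 0)
e_down = nxt[prev < 0].mean()        # E(r_{t+1} | r_t < 0)
```

Even a purely mechanical persistence term produces a widening conditional gap, which is why the model must separate this carry-over channel ($\beta_0 r_{i,t-1}$) from the contemporaneous bias term ($\beta_2 b_{i,t}$).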
We cannot directly observe an individual’s ability; instead, we account for it via available signals. $Z_i$ captures
the individual’s undergraduate university rank, undergraduate area of study, the length of experience
between undergraduate graduation and attendance at an MBA program, and the MBA program rank. The
undergraduate area of study (Science, Arts, Business) is accommodated as a fixed effect. The differences
in undergraduate education create differences in the individuals’ skills, as well as their fit for specific roles.
We also expect higher-ranked MBA programs to typically have a positive impact on an individual’s career.
We also control for the year of graduation. For estimation, we pool together MBA graduates between
1990 and 2015. This potentially leads to the following two major concerns: (1) Typical career progression
in the 2000s may have changed significantly due to structural shifts in job markets. Employers may have
become less hierarchical and more accepting of young highly talented individuals taking on critical roles
early in their career. As a result, the 1990s graduates may have moved more slowly through their early career.
(2) Even if the structure of the job market is assumed to have remained static, the 1990s graduates are
less likely to have accurately recorded their early career progression on this professional social network. Thus,
there is a greater likelihood that early graduates’ profiles have missing career steps. Both (1) and (2) result
in a perception of slower (or at least different) career growth for early graduates (i.e., the 1990s) relative
to more recent graduates (i.e., the 2000s). Controlling for the year of graduation addresses these concerns.
While attractiveness is the primary focus of our study, gender and ethnic biases are well documented in prior
research. An individual’s gender and ethnicity are also captured as part of $Z_i$, while their attractiveness at
the time of graduation is denoted by $b_i$.
Individuals at the same stage of their career and with similar abilities may progress differently because of
the typical progression in their domain. $X_{i,t}$ captures the individual’s current industry, job type, location
size and employer size. Hypothetically, a Google software developer may start at $150,000, but growth
may be slower. In contrast, a Goldman Sachs investment banker may start off rotating across divisions,
producing little, and earning $70,000. However, he or she may have much faster executive growth
opportunities. These domain-specific idiosyncrasies are captured by $X_{i,t}$. Finally, individual career
outcome-specific shocks are modeled as $u_{i,t}$.
A similar model specification has been used in income and earning equations. The literature (Zhang and
Zhou 2017, Altonji et al. 2013, Browning and Ejrnaes 2013) on individual earning dynamics captures the
dynamics by using a time series ARMA process and time-invariant heterogeneity, such as schooling, race,
experience, employment duration and job tenure.
6. Results
We estimate our model by using 19,893 profiles. Table 8 provides the descriptive statistics for the
independent and dependent variables in our model. We observe a total of 190,031 career snapshots ($i,t$
combinations) for 19,893 individuals $i$. On average, we have 9.51 yearly snapshots per individual.
While we study 15 years of an individual’s career, this average is naturally smaller because of the
following: (i) individuals who graduated after 2002 do not have 15 years of experience after their MBA
graduation, and (ii) some individuals may leave gaps in their career history, e.g., an individual may have
worked in job 1 from 2002-2006 and in job 2 from 2010-2013. The rank is standardized to a mean of 0
and a standard deviation of 1.
We estimate our primary model (Table 9, Models 1 and 2) by using a static, persistent impact of attractiveness,
$\beta_2 b_i$, throughout the 15-year career period. We estimate interaction effects with gender but do not find
any statistically significant differences. The estimate $\beta_2 = 0.011$ is positive and significant at the
1% level. Since the ranks ($r_{i,t}$) are standard normal, this means that attractive individuals gain 0.011
standard deviations in a single year.

$$ r_{i,t} = \beta_0 r_{i,t-1} + \beta_1 + \beta_2 b_i + \vec{\beta}_3 X_{i,t} + \vec{\beta}_4 Z_i + u_{i,t} $$
Attractiveness Gap: Here, we define the attractiveness gap that will be used going forward. A gain of
0.011 standard deviations in a single year means that the job ranks of the two groups are normally
distributed as $N(0,1)$ and $N(0.011,1)$ at the end of the year. We can use these distributions to calculate
the probability that an attractive individual’s rank $r_A$ exceeds the rank $r_{np}$ of their plain-looking
counterpart. This probability comes out to 0.5044, or 50.44%. We call this a single-year attractiveness
gap of 0.44%. Going forward, we use this definition of the attractiveness gap, which compares
attractive and plain-looking individuals having similar jobs at the start of the year and a similar education
background. While statistically significant, this difference seems economically small. Later in this section,
we will look at the lifetime impact through the yearly accumulation of this premium.
$$ beauty\ Gap = P(r_A > r_{np}) - 0.5 = \frac{0.5}{\sqrt{2\pi}} \int \left| \exp\!\left(-\frac{(r-0.011)^2}{2}\right) - \exp\!\left(-\frac{r^2}{2}\right) \right| dr $$
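The integral can be checked numerically; writing the two normal densities with the usual $/2$ factor in the exponent reproduces the 0.44% figure:

```python
import numpy as np

# Numerical check of the single-year gap: 0.5/sqrt(2*pi) times the integrated
# absolute difference between the N(0.011, 1) and N(0, 1) densities.
delta = 0.011
r = np.linspace(-10, 10, 200001)
dr = r[1] - r[0]
diff = np.abs(np.exp(-(r - delta) ** 2 / 2) - np.exp(-r ** 2 / 2))
gap = 0.5 / np.sqrt(2 * np.pi) * diff.sum() * dr   # approx. 0.0044, i.e., 0.44%
```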
First, we modify the model to include two disparate impacts of attractiveness, $\beta_{2P}$ and $\beta_{2B}$, denoting the
preference effect and the belief effect, respectively. The preference effect is modeled to play a persistent
role throughout the career. The attractiveness belief effect is modeled to vanish after the first half of the
career, i.e., after 6 years.

$$ r_{i,t} = \beta_0 r_{i,t-1} + \beta_1 + \mathbf{1}(t \leq 6)\,\beta_{2B} b_{i,t} + \beta_{2P} b_{i,t} + \vec{\beta}_3 X_{i,t} + \vec{\beta}_4 Z_i + u_{i,t} $$
As discussed in Section 2.3, we expect 6 years to be a more than sufficient duration for prior beliefs to be
overcome by actual performance signals (Andreoni & Petrie 2003, Rezlescu et al. 2012). We expect that
after 6 years, employers, managers and any other evaluators will have a sufficiently long and informative
performance history for the subject. In Appendix A.2, we report the robustness of our results with respect
to this 6-year cutoff.
Table 9: (Model 1 and 2) The impact of attractiveness on career rank with and without gender interaction. (Model 3 and 4) The
impact of attractiveness preference and attractiveness belief on career rank with and without gender interaction. We skip some
covariates here and report the full table in Appendix A.14.
The preference bias is positive and significant (Table 9, Models 3 and 4). The preference bias in a single year
leads to a gain of 0.016 standard deviations. This is equivalent to a 50.64% probability that an attractive
individual exceeds a plain-looking counterpart, both individuals having started the year in a similar
position. This establishes the importance of preference or taste-based bias in determining career
success, i.e., a preference-bias attractiveness gap of 0.64% per year.
We highlight the impact of secondary characteristics. Both the undergraduate and MBA program ranks
have a significant negative impact. A 100th-ranked MBA program hurts success by 3.9% per year compared
to the 1st-ranked MBA program. Even though we are focusing on the Master of Business Administration
degree, a business undergraduate degree appears to hurt career success by 1.0% per year.
We answer related questions and provide support for our primary hypothesis by using alternative model
specifications. Prior empirical research (Hamermesh and Biddle 1994, Zebrowitz and Donald 1991) shows
an economically significant impact of the attractiveness bias over an entire career. Our primary model
shows a 0.44%-0.64% advantage for attractive individuals in a single year.

$$ r_{i,15} = \beta_1 + \beta_2 b_i + \vec{\beta}_3 X_i + \vec{\beta}_4 Z_i + u_i $$
$$ rank_{i,15} = \beta_1 + \beta_2 b_i + \vec{\beta}_3 X_i + \vec{\beta}_4 Z_i + u_i $$
We develop a simple model to measure the lifetime premium for an attractive individual over a 15-year
period (Table 11, Model 3). We use 8,828 individual profiles for which we observe at least 15 years of
career length. In this model, we replace the time-variant characteristics of the industry domain, $X_{i,t}$, with
time-invariant characteristics, $X_i$. For example, if an individual switches industries over his or her career,
we use only the single industry where the individual spends the maximum duration. Attractive individuals
gain a premium of 0.067 standard deviations over a 15-year career. For career advancement, this is
equivalent to a 52.67% probability that an attractive individual exceeds a plain-looking counterpart, both
having started their careers in a similar position. Note that this difference exists even after controlling for
undergraduate education, pre-MBA experience and the MBA school attended. Table 11, Model 1, reports
a premium of 0.09 standard deviations between attractive and plain-looking individuals over a 15-year
career when these controls are dropped. Thus, there is an overall attractiveness gap of 3.6% (a 53.6%
probability of better career performance). This likely indicates the presence of an attractiveness premium
even prior to MBA graduation.
Past attractiveness-bias studies show a wage gap of 2-8% (Hamermesh and Biddle 1994, Biddle and
Hamermesh 1998, Doorley and Sierminska 2015). Gender-bias studies have shown a wage gap of 16%-
22% (O’Brien 2015). Wage-gap calculations typically combine the effects of the different jobs attained by
the two groups and the different wages for the same job. We do not measure wages; instead, we focus only on
the ranks of the jobs attained.
We estimate another model to measure the difference between attractive and plain-looking individuals
for their first job within 1 year of MBA graduation (Table 11 Model 4). A large gap between the two groups
would indicate a role of belief bias in addition to that of preference bias.
We find a positive and significant difference between the two groups when not controlling for education
and pre-MBA experience (Table 11, Model 4). Attractive individuals have an advantage of 0.035 standard
deviations in their first job after MBA graduation. This is an extremely large gap within a year of MBA
graduation, compared to the yearly premium of 0.011-0.016 standard deviations that we found with our
primary model. It is also a one-time gap; no other 1-year duration along the career leads to such a large
gap. This could only point to a large belief bias in the 1st year alone. However, once we additionally control
for undergraduate education, pre-MBA job experience and the MBA program rank (Table 11, Model 5),
the first-job difference between the two groups vanishes. These two results suggest that the large gap
within 1 year of MBA graduation is not likely to be the result of bias in that 1 year alone. Instead, it is likely an
accumulation of bias prior to the MBA. The large gap in the first job, without controls, can have two
explanations, namely, (i) a preference-based bias before the MBA or (ii) a belief-based bias before the MBA.
Attractiveness bias prior to MBA graduation is not the focus of our study. Nevertheless, it is important to
highlight a common misinterpretation using these results. If a successful senior job (say, CEO) is dominated
by attractive individuals, it does not mean all the bias occurs when the board picks the CEO. Instead, it is
likely an outcome of gradual bias throughout the career. A greater likelihood of becoming a CEO alone
does not clarify whether an attractive individual receives the premium early or late in the career. Following from this
example, the first-job difference is not the result of a large overt bias in 1 year. This supports our primary
hypothesis that belief bias does not provide a significant premium to attractive individuals after MBA
graduation. Preference bias also does not amount to a large gap within a year; instead, it gradually adds
up over a 15-year career.
We had set out to answer the following three research questions: (i) Is preference bias absent, positive
or negative? (ii) Is belief bias absent, positive or negative? (iii) What is the combined effect when
preference and belief bias coexist early in professional careers? In response to the first question, we find
there is a positive (0.64% gap per year) preference bias. Regarding the second question, we find that
there is an insignificant belief bias in professional careers after MBA graduation. In response to the third
question, we find that there is a positive combined effect of belief and preference bias when they
coexist early in a career and that this positive combined effect is largely driven by the latter. In the next
section, we highlight alternative explanations and some concerns with our model. We also provide
robustness checks to address these limitations.
7. Robustness Checks
Our primary model relies heavily on attractiveness and job ranks calculated from unstructured data. Using
measures calculated from unstructured text (Lee et al. 2016) and images (Zhang et al. 2018) in
econometric models has become a common practice. However, employing machine learning methods to
construct such measures calls for careful validation and robustness checks.
In our primary model, the individual’s ability is controlled for via the undergraduate degree, its rank, pre-MBA
experience and the MBA rank. One could argue that this does not sufficiently account for an individual’s
ability. In Appendix A.5, we estimate our model with additional unobserved random effects for the
undergraduate school and the MBA school. This may still leave room for an unobserved ability $\xi_i$. If so,
our random career transition shocks ($u_{i,t} = \xi_i + \eta_{i,t}$) would be serially correlated. Estimating individual-
level random effects is not trivial, because $\xi_i$ is correlated with the explanatory variable $r_{i,t-1}$ (Dynamic
Panel Bias, Nickell 1981). The solution is to use ($r_{i,t-2}, r_{i,t-3}, \ldots$) as instrumental variables for $r_{i,t-1}$
(Anderson and Hsiao 1981). We show the robustness of our results by estimating this alternative model.
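The Anderson-Hsiao idea can be sketched on simulated data: first-difference out the unobserved ability $\xi_i$, then instrument the differenced lag with the level two periods back. The persistence value 0.6 and all other parameters are illustrative, not our estimates:

```python
import numpy as np

# Sketch of the Anderson-Hsiao fix for dynamic panel bias: first-difference
# out the persistent unobserved ability xi_i, then instrument delta r_{t-1}
# with the lagged level r_{t-2}. Simulated data; beta0 = 0.6 is illustrative.
rng = np.random.default_rng(4)

n, T, beta0 = 4000, 10, 0.6
xi = rng.normal(0, 1, n)                 # unobserved, persistent ability
r = np.zeros((n, T))
for t in range(1, T):
    r[:, t] = beta0 * r[:, t - 1] + xi + rng.normal(0, 1, n)

# Naive OLS on levels is biased upward, since xi correlates with r_{t-1}
beta_ols = (r[:, :-1] * r[:, 1:]).sum() / (r[:, :-1] ** 2).sum()

# First differences remove xi; instrument delta r_{t-1} with the level r_{t-2}
d_r = r[:, 3:] - r[:, 2:-1]              # delta r_t
d_r_lag = r[:, 2:-1] - r[:, 1:-2]        # delta r_{t-1} (endogenous regressor)
z = r[:, 1:-2]                            # instrument: r_{t-2}

beta_iv = (z * d_r).sum() / (z * d_r_lag).sum()   # close to the true beta0
```

The IV estimate recovers the true persistence, while the levels regression absorbs part of the ability effect into the lagged-rank coefficient.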
One could argue that unsuccessful individuals (small r_{i,t-1}) may be more likely to drop out of the job market if they have an outside option. Consider an individual who chooses whether to drop out of the job market based on a latent variable d*_{i,t}, driven by the job rank in the previous year r_{i,t-1} and a random shock ψ_{i,t}. At MBA graduation, all individuals remain in the job market (d*_{i,0} > 0 ∀ i). In year 1, individuals with d*_{i,1} < 0 drop out; the remaining individuals are those with d*_{i,1} > 0. The average rank of individuals remaining in the job market at year 1 (E′[r_{i,1}]) therefore differs from the uncensored average (E[r_{i,1}]), as shown below. Similarly, the average rank of individuals in the job market at year 2 (E′[r_{i,2}]) is further censored by the dropouts in years 1 and 2.

d*_{i,t} = δ_0 + δ_1 r_{i,t-1} + ψ_{i,t};  ψ_{i,t} ~ N(0, σ_ψ)

E′[r_{i,2}] = E[r_{i,2} | ψ_{i,1} > −(δ_0 + δ_1 r_{i,0}), ψ_{i,2} > −(δ_0 + δ_1 r_{i,1})]
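The one-period version of this censoring effect can be seen in a small simulation (parameter values hypothetical): when δ_1 > 0, the average rank among those who stay exceeds the uncensored average.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
delta0, delta1, sigma_psi = 0.0, 1.0, 1.0

r = rng.normal(0, 1, n)               # job rank in the previous year
psi = rng.normal(0, sigma_psi, n)     # random dropout shock
stay = delta0 + delta1 * r + psi > 0  # d* > 0: individual remains in the market

# With delta1 > 0, low-rank individuals drop out more often, so the
# censored mean exceeds the uncensored mean.
print(r.mean(), r[stay].mean())
```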
Consider profiles collected in 2017 for graduates from 1997. The average observed job rank E′[r_{i,5}] for these individuals in 2002 will differ from the actual E[r_{i,5}], because some of those graduates may have dropped out prior to 2002 (d*_{i,t} < 0 for some t ∈ [0,5]) or between 2002 and 2017 (d*_{i,t} < 0 for some t ∈ [5,20]).

E′_{grad=1997}[r_{i,5}] = E[r_{i,5} | d*_{i,t} > 0 for t ∈ [0,5] ∪ [5,20]]

E′_{grad=2012}[r_{i,5}] = E[r_{i,5} | d*_{i,t} > 0 for t ∈ [0,5]]

Now consider a different graduating class from 2012. Once again, the observed average is upwardly biased (E′[r_{i,5}] > E[r_{i,5}]). However, this upward bias is only the result of dropouts prior to 2017 (d*_{i,t} < 0 for some t ∈ [0,5]). We can therefore compare the average job rank for 1997 graduates in 2002 with that of 2012 graduates in 2017. The second sample also looks at job ranks 5 years from graduation but is censored by fewer years of dropouts. If unsuccessful individuals systematically drop out (δ_1 > 0), then E′_{grad=2012}[r_{i,5}] < E′_{grad=1997}[r_{i,5}]. The figure below provides this comparison and suggests no significant impact of job rank on dropping out. We also find the correlation between job rank (E′_g[r_{i,t}]) and graduation year (g) to be statistically insignificant (Appendix A.10). This means that dropout is not a systematic result of a lower (or higher) job rank.
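The cohort comparison can likewise be sketched in simulation (hypothetical parameters, with time-invariant ranks for simplicity): under δ_1 > 0, a cohort censored over 20 years of potential dropout shows a higher observed mean rank than one censored over only 5 years.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000
delta0, delta1 = 0.5, 2.0

# Hypothetical time-invariant job ranks for one cohort.
r = rng.normal(0.5, 0.1, n)

def surviving_mean(censor_years):
    """Mean rank among individuals with d*_{i,t} > 0 in every year observed."""
    stay = np.ones(n, dtype=bool)
    for _ in range(censor_years):
        psi = rng.normal(0, 1, n)
        stay &= delta0 + delta1 * r + psi > 0
    return r[stay].mean()

# Class of 1997 observed in 2017: censored over ~20 years of dropout.
# Class of 2012 observed in 2017: censored over only 5 years.
m_long, m_short = surviving_mean(20), surviving_mean(5)
print(m_long, m_short)  # with delta1 > 0 the longer window shows a higher mean
```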
Figure 11: Average job page ranks (and standard deviation) attained in 5 years for different graduating classes between
1990 and 2010. If lower-ranked individuals drop out, the earlier graduation year should appear to have higher job page
ranks. No evidence of dropouts of lower-ranked individuals was found.
d*_{i,t} = δ_0 + δ_1 r_{i,t-1} + δ_2 b_{i,t-1} + ψ_{i,t}
A second argument could be that attractive individuals are more (or less) likely to drop out (δ_2 ≠ 0). This may be the result of better (or worse) outside options for attractive individuals (e.g., financially secure marriage prospects). In principle, we could present a similar comparison between the average attractiveness 5 years after MBA graduation for different graduating classes. If attractive individuals drop out systematically, we would expect the average attractiveness of the observed sample of 1997 graduates to be smaller than that of the 2012 graduates. Unfortunately, our attractiveness measures are calculated (Section 4.2) under an implicit assumption that different graduating classes do not differ in their average attractiveness, so we do not have evidence to reject this possibility. This possibility interferes with our attractiveness bias analysis if attractive unsuccessful individuals are more likely to drop out: our data would then only observe the relatively successful attractive individuals, overstating any attractiveness premium.
In addition to rank and attractiveness, dropouts may also happen systematically based on gender and graduation era. This may be because (i) the outside option is better (or worse) for one gender than the other (e.g., homemaker or blue-collar work), or (ii) certain periods have unusually low employment, e.g., the 2008-2009 crisis, when unemployment temporarily went from 4% to 10%. We have access to a secondary dataset that provides visibility into the gender and graduating class of individuals whose profiles are selected out of our sample. In Appendix A.10, we estimate a Heckman self-selection model on this dataset and match our original findings. This secondary dataset uses a completely different profile search procedure, yet it matches our original attractiveness bias results. This provides confidence that using the platform's black-box search functionality to collect profiles does not bias our results.
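The two-step Heckman correction of the kind used in Appendix A.10 can be sketched on simulated data (all coefficients hypothetical): a probit selection equation yields an inverse Mills ratio that is added as a regressor to the outcome equation.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(3)
n = 20_000

# Simulated data: the outcome is observed only when a selection latent
# variable is positive; a shared unobservable links the two equations.
x = rng.normal(0, 1, n)                 # observed covariate
e = rng.normal(0, 1, n)                 # selection error
u = 0.5 * e + rng.normal(0, 1, n)       # outcome error, correlated with e
select = (0.5 + 1.0 * x + e) > 0        # profile observed in the sample
y = 1.0 + 0.3 * x + u                   # true outcome equation

# Step 1: probit for the selection equation.
def neg_loglik(g):
    p = np.clip(norm.cdf(g[0] + g[1] * x), 1e-10, 1 - 1e-10)
    return -(select * np.log(p) + (~select) * np.log(1 - p)).sum()

g = minimize(neg_loglik, np.zeros(2)).x
idx = g[0] + g[1] * x
imr = norm.pdf(idx) / norm.cdf(idx)     # inverse Mills ratio

# Step 2: OLS on the selected sample, with the IMR as an extra regressor.
ones = np.ones(select.sum())
Xs = np.column_stack([ones, x[select], imr[select]])
beta = np.linalg.lstsq(Xs, y[select], rcond=None)[0]

# Naive OLS that ignores selection understates the slope here.
naive = np.linalg.lstsq(np.column_stack([ones, x[select]]),
                        y[select], rcond=None)[0]
print(beta[1], naive[1])
```

The corrected slope recovers the true 0.3, while the naive regression on the selected sample is badly biased.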
8. Conclusion
Our research finds an attractiveness gap of 2.67% over a 15-year career after MBA graduation. This gap is an accumulation of a yearly 0.64% gap driven by preference bias; post-MBA belief bias neither adds to nor subtracts from it. This discrimination impacts individuals who otherwise have a similar undergraduate education, MBA education and pre-MBA experience and who work in similar industry domains. Readers should be mindful in interpreting our findings. We do not suggest that there is no belief bias in the job market; it may well play a role in specific settings, e.g., hiring an undergraduate with no work experience, accepting a sales pitch or voting for a candidate. We focus only on the professional job market after MBA graduation. Further, we only capture the average impact over tens of thousands of professionals and cannot comment on the degree of belief bias faced by any given individual. Similarly, we cannot answer whether slow or fast career progress for a given individual can be explained by instances of preference-based favoritism or penalty. On average, we find a 2.67% gap over a 15-year career; particular individuals may face a greater undue preference or none at all. A gap of 2.67% may also appear small in the context of the ongoing public discussion of workplace discrimination.
We are able to establish the role of a preference-based attractiveness premium by using a large archival dataset. Our findings are limited to specific white-collar professional work settings in the US. Future researchers could expand the scope to capture variation in the degree of employee-employer interaction and in prior information about employee performance. This could help identify what level of performance information is sufficient to alleviate belief bias and what level of future interaction shapes preferences during hiring and promotions. We also develop quantitative measures of evolving attractiveness ratings and job rankings that should be more generally usable for future research on bias, personal marketing, behavioral perception and labor markets.
This direction of research has significant implications for practitioners. Gender and race biases have received much attention in recent times. Google, like many other companies, provides its employees unconscious-bias training (Feloni 2016). Moreover, hiring procedures increasingly emphasize performance-related information rather than demographic information. Such training, awareness and emphasis on performance are targeted at the elimination of stereotypes and belief biases. The role of beliefs in creating bias may be avoided under the right settings, but preferences that engender bias are much harder to remove. The media has highlighted sexual harassment across industries. Such pure taste-based, perverse favoritism cannot be eliminated by small policy changes; it requires social and cultural change.
References
• Agthe, M., Spörrle, M., & Maner, J. K. (2010). Don't hate me because I'm beautiful: Anti-attractiveness bias in organizational evaluation and decision making. Journal of Experimental Social Psychology, 46(6), 1151-1154.
• Altonji, J. G., Smith Jr., A. A., & Vidangos, I. (2013). Modeling earnings dynamics. Econometrica, 81(4), 1395-1454.
• Amos, B., Ludwiczuk, B., & Satyanarayanan, M. (2016). OpenFace: A general-purpose face recognition library with mobile applications. Technical report CMU-CS-16-118, CMU School of Computer Science.
• Andreoni, J., & Petrie, R. (2008). Beauty, gender and stereotypes: Evidence from laboratory experiments. Journal of Economic Psychology, 29(1), 73-93.
• Aral, S., Brynjolfsson, E., & Van Alstyne, M. (2012). Information, technology, and information worker productivity. Information Systems Research, 23(3-part-2), 849-867.
• Baker, G., Gibbs, M., & Holmstrom, B. (1994). The internal economics of the firm: Evidence from personnel data. Quarterly Journal of Economics, 109(4), 881-919.
• Berggren, N., Jordahl, H., & Poutvaara, P. (2010). The looks of a winner: Beauty and electoral success. Journal of Public Economics, 94(1-2), 8-15.
• Biddle, J. E., & Hamermesh, D. S. (1998). Beauty, productivity, and discrimination: Lawyers' looks and lucre. Journal of Labor Economics, 16(1), 172-201.
• Bloch, P. H., Brunel, F. F., & Arnold, T. J. (2003). Individual differences in the centrality of visual product aesthetics: Concept and measurement. Journal of Consumer Research, 29(4), 551-565.
• Ruffle, B. J., & Shtudiner, Z. (2014). Are good-looking people more employable? Management Science, 1760-1776.
• Browning, M., & Ejrnæs, M. (2013). Heterogeneity in the dynamics of labor earnings. Annual Review of Economics, 5(1), 219-245.
• Bryant, A. (1999). Fashion's frat boy. Newsweek, September 13, 40.
• Buss, D. M., & Haselton, M. (2005). The evolution of jealousy. Trends in Cognitive Sciences, 9(11), 506-507.
• Olivola, C. Y., & Todorov, A. (2010). Elected in 100 milliseconds: Appearance-based trait inferences and voting. Journal of Nonverbal Behavior, 34, 83-110.
• Chung, D. J. (2015). How much is a win worth? An application to intercollegiate athletics. Management Science, 63(2), 548-565.
Appendix
r_{i,t} = β_0 r_{i,t-1} + β_1 + 1(t ≤ 6) β_{2B} b_{i,t} + β_{2P} b_{i,t} + β⃗_3 X_{i,t} + β⃗_4 Z_i + u_{i,t}
Table 12: (Model 1) Graduates between 2000 and 2015. (Model 2) Graduates between 1990 and 2010. We skip some covariates here.

Independent Variables    (1)         (2)
Belief Bias             -0.009      -0.005
                        (0.006)     (0.006)
Preference Bias          0.013***    0.011**
                        (0.004)     (0.004)
Rank(t-1)                0.775***    0.778***
                        (0.002)     (0.002)
                        (0.003)     (0.003)
Constant                 0.189***    0.221***
                        (0.011)     (0.012)
We chose 6 years after MBA graduation as the cutoff between early and late career. We estimate alternative models to test robustness with respect to this choice of cutoff. Alternative models with 4-year and 8-year cutoffs match our original result.
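The cutoff robustness exercise can be sketched on simulated data (the true effects below are hypothetical: no early-career belief effect and a 0.006 preference effect), re-estimating the model for each cutoff:

```python
import numpy as np

rng = np.random.default_rng(4)
n_obs = 50_000
t = rng.integers(1, 16, n_obs)            # years since MBA graduation (1-15)
b = rng.integers(0, 2, n_obs)             # attractiveness indicator
r_lag = rng.normal(0.5, 0.1, n_obs)       # previous-year rank
# Hypothetical truth: preference bias only, no extra early-career belief effect.
r = 0.2 + 0.775 * r_lag + 0.006 * b + rng.normal(0, 0.05, n_obs)

results = {}
for cutoff in (4, 6, 8):
    early = (t <= cutoff).astype(float)
    X = np.column_stack([np.ones(n_obs), r_lag, early * b, b])
    beta = np.linalg.lstsq(X, r, rcond=None)[0]
    results[cutoff] = (beta[2], beta[3])  # (belief estimate, preference estimate)
    print(cutoff, round(beta[2], 4), round(beta[3], 4))
```

Across all three cutoffs the belief term stays near zero and the preference term near its true value, mirroring the robustness pattern described above.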
r_{i,t} = β_0 r_{i,t-1} + β_1 + 1(t ≤ 4) β_{2B} b_{i,t} + β_{2P} b_{i,t} + β⃗_3 X_{i,t} + β⃗_4 Z_i + u_{i,t}

r_{i,t} = β_0 r_{i,t-1} + β_1 + 1(t ≤ 8) β_{2B} b_{i,t} + β_{2P} b_{i,t} + β⃗_3 X_{i,t} + β⃗_4 Z_i + u_{i,t}
Table 13: (Model 1) Cutoff at 6 years, (Model 2) 4 years and (Model 3) 8 years. We skip some covariates here.
We use an evolving measure of attractiveness b_{i,t} in our primary analysis. This measure was calculated in three steps: (i) a deep learning model was trained to learn the evolution of looks over a complex manifold, (ii) a prediction model was used to find the attractiveness of these evolving looks, and (iii) a linear heterogeneity model was estimated using these ML-predicted attractiveness ratings. This was necessary to answer potential criticisms, since we observe only a single profile picture for each individual. Our overall procedure involves several choices: (a) whether to model the evolution in attractiveness as heterogeneous or homogeneous, (b) whether to model attractiveness evolution at all or use a single time-invariant measurement, and (c) whether to perform binary classification or use the continuous values. We show below that our attractiveness bias analysis is robust to all these alternative choices.
Model 1 uses the original heterogeneously evolving binary measure b_{i,t} ∈ {0,1}. Model 2 uses the heterogeneously evolving continuous measure b′_{i,t}. Model 3 uses a homogeneously evolving measure b′′_{i,t}. Model 4 uses a time-invariant measure b′_{i,MBA}, the attractiveness at MBA graduation. Model 5 uses a time-invariant measure b_{i,2017}, the attractiveness of the current profile picture.
Table 14: (Model 1) Heterogeneous binary, (Model 2) heterogeneous continuous, (Model 3) homogeneous continuous, (Model 4) at MBA graduation, (Model 5) current picture in 2017. We skip some covariates here.
Table 15: (Model 1) Standardized r_{i,t} dependent variable, (Model 2) unstandardized page rank rank_{i,t} dependent variable, (Model 3) success s_{i,t} dependent variable, (Model 4) logit model with binary success s_{i,t} dependent variable.
We also construct an alternative model that does not rely on job rank calculations. Our profile data captures jobs that are self-reported by individuals in free-text form. This creates an extremely large and sparse list of jobs. While a handful of job titles (e.g., Software Developer, Associate Consultant) are reported frequently, most job titles are somewhat customized by individuals (e.g., Principal Consultant in Energy and Oil). One approach is to compare individuals on the time taken to achieve the same job. Attractiveness bias can be established if attractive individuals systematically take a shorter time. Here, θ_j represents job-specific effects, i.e., the average time to reach job j.
We observe a total of 59,854 distinct jobs among 233,142 observed career milestones. The number of jobs is extremely large, and the vast majority of jobs do not provide sufficient variation across individual characteristics (beauty, gender, etc.). We can focus only on frequently observed jobs, say jobs self-reported by at least 20 individuals. This leaves 209 jobs and 23,443 career milestones. This model finds beauty to have a significant and negative impact on time taken: a coefficient of -0.268 means that attractive individuals attain the same job approximately a quarter of a year faster than plain-looking individuals. Alternatively, we can model θ_j as unobserved random effects. As expected, we find (Model 2) beauty to have a significant and negative impact on time taken. This establishes that our attractiveness bias finding is robust with and without the page rank procedure.
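A minimal sketch of this time-to-job specification on simulated data (the −0.268 beauty effect is taken from the text; job counts and noise levels are hypothetical); within-job demeaning is equivalent to OLS with a full set of job dummies:

```python
import numpy as np

rng = np.random.default_rng(5)
n, n_jobs = 20_000, 200

job = rng.integers(0, n_jobs, n)          # which job each milestone reaches
theta = rng.uniform(3, 10, n_jobs)        # job-specific mean time to reach it
b = rng.integers(0, 2, n)                 # attractiveness indicator
# Hypothetical truth: attractive individuals reach the same job 0.268 years sooner.
time_to_job = theta[job] - 0.268 * b + rng.normal(0, 1, n)

def demean_by(group, v):
    """Subtract group means: the within (fixed effects) transformation."""
    means = np.bincount(group, weights=v) / np.bincount(group)
    return v - means[group]

bd = demean_by(job, b.astype(float))
td = demean_by(job, time_to_job)
beta2 = (bd @ td) / (bd @ bd)
print(beta2)  # close to -0.268
```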
We can theoretically extend this method to identify belief and preference bias. We can model the dependent variable Δt_{i,j1,j2} as the time taken by individual i to move from job j1 to job j2, and split β_2 into (β_{2B}, β_{2P}). Here, the belief bias β_{2B} plays a role if the transition (j1 → j2) happens during the first 6 years. The term β_{j1,j2} captures fixed effects specific to the j1-to-j2 transition. This would allow us to compare the time taken by different individuals for the same job transition.

Table 17: A small sample of extremely frequent job transitions in Management Consulting

Unfortunately, we observe 75,669 unique job transitions among 233,142 observed career milestones. Restricting to job transitions reported by at least 20 individuals leaves only 36 transitions and 1,082 career milestones. Not surprisingly, all estimates are insignificant. This model fails to utilize more than 90% of all job transitions, and the lack of generalization results in large standard errors for the key parameters of interest (β_2, β_{2B}, β_{2P}). We also unsuccessfully attempt a random effects model for β_{j1,j2}. This limitation is why we use the page rank procedure to create a ranked list of otherwise dissimilar jobs.
Further, we add controls for undergraduate (uS_i) and MBA (mS_i) schools. Our dataset has 2,362 distinct undergraduate schools and 59 distinct MBA schools; both are added as random effects. As an example, the two most frequent undergraduate schools are the University of California, Berkeley (889 profiles) and Cornell University (782 profiles). The two most frequent MBA schools are the Tepper School of Business and the Kellogg School of Management.

r_{i,t} = β_0 r_{i,t-1} + β_1 + 1(t ≤ 6) β_{2B} b_{i,t} + β_{2P} b_{i,t} + β⃗_3 X_{i,t} + β⃗_4 Z_i + uS_i + mS_i + u_{i,t}
Table 19:
r_{i,t} = β_0 r_{i,t-1} + β_1 + β_2 b_i + β⃗_3 X_{i,t} + β⃗_4 Z_i + u_{i,t}
There are some key challenges in estimating such a model specification. The LSDV (least squares dummy variable) estimator is the natural choice. The LSDV model subtracts individual means in the fixed effects transformation. The transformed explanatory variable (r_{i,t-1} − r̄_i) is then naturally correlated with the transformed error term (u_{i,t} − ū_i). This dynamic panel bias (Nickell 1981) arises because of the presence of u_{i,t-1} in ū_i. Anderson and Hsiao (1981) used lagged differences as valid instruments to resolve this endogeneity issue. Arellano and Bond (1991) and Blundell and Bond (1998) extend this concept, using GMM estimation to deal with the dynamic panel bias and to account for unobserved heterogeneity.
û_{i,t} = β_1 + β_2 b_i + β⃗_4 Z_i + ξ_i + ε_{i,t}
The GMM estimation is performed on the first-differenced equation, which eliminates the time-invariant observed characteristics b_i and Z_i and the unobserved characteristic ξ_i. For this first-differenced equation, all r_{i,t-2-j} or Δr_{i,t-2-j} ∀ j ≥ 0 serve as valid instruments for Δr_{i,t-1}. Using only the first of these possible instruments (j = 0) allows us to use the maximum possible number of time periods for the estimation. The remaining time-varying, job domain-specific characteristics are assumed to be exogenous. The time-invariant characteristics are recovered in a second step (Pesaran and Zhou 2017). Residuals û_{i,t} are obtained from the first-step GMM; the second step is to run an OLS with these residuals as the dependent variable and the time-invariant variables as explanatory variables (Zhang and Zhou 2017).
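The second-step recovery of a time-invariant coefficient can be sketched as follows (simulated data with hypothetical parameters; for brevity the true β_0 stands in for the first-step GMM estimate):

```python
import numpy as np

rng = np.random.default_rng(6)
N, T, beta0, beta2 = 5000, 12, 0.6, 0.01

b = rng.integers(0, 2, N)                 # time-invariant attractiveness
xi = rng.normal(0, 0.05, N)               # unobserved individual effect
r = np.zeros((N, T))
for t in range(1, T):
    r[:, t] = beta0 * r[:, t - 1] + beta2 * b + xi + rng.normal(0, 0.05, N)

# Step 1 (assumed done): beta0 from the first-differenced GMM.
# Step 2: average the residuals per individual and regress them on b_i.
u_bar = (r[:, 1:] - beta0 * r[:, :-1]).mean(axis=1)  # = beta2*b + xi + noise
X = np.column_stack([np.ones(N), b])
coef = np.linalg.lstsq(X, u_bar, rcond=None)[0]
print(coef[1])  # recovers beta2 = 0.01
```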
Model 1 provides an unobserved individual heterogeneity model without correcting for Nickell bias. Model 2 provides the first-step GMM estimates of the first-differenced equation. Model 3 provides the bias-corrected second-step OLS to get a single attractiveness bias estimate β̂_2 with a time-invariant measure of attractiveness b_i. Model 4 provides the bias-corrected second-step OLS to get a single attractiveness bias estimate β̂_2 for the time-varying measure b_{i,t}. Model 5 provides the bias-corrected second-step OLS to get belief and preference bias estimates (β̂_{2B}, β̂_{2P}).

Table 20: (Model 1) Single-step unobserved heterogeneity model, (Model 2) first-step GMM estimates, (Models 3, 4, 5) second-step OLS estimates with three different specifications for the attractiveness measure.
We use propensity score matching (PSM) for a model-free comparison between the career progression of the attractive and plain-looking groups. We match the two groups b_{i,0} ∈ {0,1} on pre-MBA-graduation characteristics: undergraduate area of study, undergraduate university ranking, year of undergraduate completion, length of pre-MBA experience, MBA school and MBA rank. Individuals are classified into the two groups using their attractiveness at the time of graduation (t = 0).

b_{i,0} = θ_0 + θ⃗_1 Z⃗_i + ξ_i

We use logistic regression estimation with nearest-neighbor matching. We set the caliper width at 0.01 and the matching ratio at 1, and we do not perform replacement. The dataset initially contained 23,638 individual profiles, which were reduced to 19,075 after matching. The tables below provide the distribution of characteristics Z⃗_i in the two groups before and after matching.
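The matching procedure can be sketched on a hypothetical two-covariate example (all data simulated): a logistic propensity model followed by greedy 1:1 nearest-neighbor matching with a 0.01 caliper and no replacement.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(7)
n = 4000

# Hypothetical pre-MBA covariates; group membership depends on them.
Z = rng.normal(0, 1, (n, 2))
p_true = 1 / (1 + np.exp(-(0.2 + 0.5 * Z[:, 0] - 0.3 * Z[:, 1])))
treated = rng.random(n) < p_true          # the b_{i,0} = 1 group

# Fit a logistic regression for the propensity score.
X = np.column_stack([np.ones(n), Z])
def nll(w):
    s = X @ w
    return np.logaddexp(0, s).sum() - s[treated].sum()
w = minimize(nll, np.zeros(3)).x
ps = 1 / (1 + np.exp(-(X @ w)))

# Greedy 1:1 nearest-neighbor matching, caliper 0.01, no replacement.
t_idx = np.where(treated)[0]
c_idx = list(np.where(~treated)[0])
pairs = []
for i in t_idx:
    if not c_idx:
        break
    d = np.abs(ps[c_idx] - ps[i])
    j = int(np.argmin(d))
    if d[j] <= 0.01:
        pairs.append((i, c_idx.pop(j)))

matched = np.array(pairs)
pre_diff = Z[treated, 0].mean() - Z[~treated, 0].mean()
post_diff = Z[matched[:, 0], 0].mean() - Z[matched[:, 1], 0].mean()
print(len(pairs), round(pre_diff, 3), round(post_diff, 3))
```

Covariate balance on the first covariate improves markedly after matching, which is the property the before/after tables check.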
Table 21: Propensity Score Matching on pre MBA characteristics for Male Profiles
Table 22: Propensity Score Matching on pre MBA characteristics for Female Profiles
Table 4 provided a list of features, their sources (or calculations) and the resulting values. Below are the details on how these features are calculated.
Gender: We use the Microsoft Azure Cognitive Face API to identify an individual's gender from their profile picture. This service uses deep learning models behind the scenes but does not reveal the exact algorithms used. We manually verify 100 gender tags provided by this service and find it to be accurate in 100 out of 100 cases. Note that we had already eliminated profiles where the profile picture does not clearly capture a single face.
Ethnicity: We use the ethnicity classifier built by Ambekar et al. (2009), which uses names to identify ethnicity. Its possible outputs are GreaterEuropean, EastAsian, SouthAsian, North African or West Asian, GreaterAfrican, and Others. The vast majority of profiles in our dataset belong to the first three classes. We further categorize these outputs into three possible values: European, Asian and Others.
Age: We use the year of undergraduate completion to calculate an individual's age, assuming that everyone completes their undergraduate studies at the age of 22. This gives the age at MBA graduation and at all subsequent career milestones.
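The age calculation is simple arithmetic; as a sketch (the years below are illustrative):

```python
# Age at any career milestone, assuming undergraduate completion at age 22.
def age_at(year, ug_completion_year, ug_completion_age=22):
    return ug_completion_age + (year - ug_completion_year)

print(age_at(1999, 1995))   # age at MBA graduation: 26
print(age_at(2017, 1995))   # age at profile collection: 44
```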
We do not have absolute ground truth to verify this assumption; it is possible that some individuals graduate earlier or later than 22. It is important to ensure that our age calculation does not systematically under- or over-estimate across different graduation years, for example, graduates in 1990-2000 being consistently assigned a lower-than-actual age while graduates in 2010-2015 are assigned a higher-than-actual age. We use a completely independent signal of age to assess whether our age calculations are systematically biased: the Microsoft Azure Cognitive Face API, which predicts age from a face picture. This Face API has a large mean absolute error of 7.62 years (Dehgan et al. 2017) but nevertheless serves as a sense check on our age calculation.
The two methods, the Face API and the self-reported undergraduate graduation year, differ on a case-by-case basis by 0-10 years (less than 3 years more often than not). We calculate the number of profiles for each undergraduate year using the self-reported graduation year. We also calculate the same numbers using the current age predicted by the Face API from pictures: if the current age in 2017 is 40, we presume the individual completed undergraduate studies in 1999 (= 2017 − 40 + 22). Comparing these two distributions, a simple qualitative (or quantitative) analysis suggests that the two methods do not have systematic bias.
University Ranking: We use US News (www.usnews.com) to retrieve rankings for undergraduate and MBA education. We use the National University ranking (up to ~1000) for an individual's undergraduate education and the Business Program ranking (up to 100) for an individual's MBA. As an example, the University of Pennsylvania is currently ranked 8th among National Universities, while its MBA program at the Wharton School is ranked 1st. New York University is currently ranked 30th among National Universities, while its MBA program at the Stern School is ranked 5th. These rankings are time-invariant, i.e., we do not collect the ranking of a university at the time of an individual's graduation.
Undergraduate Degree: We first identify the education milestone corresponding to the MBA degree by searching for one of the terms "MBA", "M.B.A" or "masters of business administration" in the individual's self-reported degree. Next, we classify the education milestone prior to the MBA into undergraduate Science, Arts or Business. The table below provides the search terms used to make this classification. These search-term lists are manually created to ensure that we can classify more than 90% of undergraduate degrees into one of these categories.
Location Size: We determine location size endogenously. We first find the complete list of locations self-reported in profile career milestones. For each location, we calculate the number of profiles in which the location is mentioned at least once. A large location (e.g., New York, San Francisco) is mentioned by more than 100 profiles; a small location (e.g., Cleveland) is mentioned by fewer. Our primary analysis uses a binary classification: large (1) and small (0).
Employer Size: We determine employer size endogenously. We first find the complete list of employers self-reported in profile career milestones. For each employer, we calculate the number of profiles in which the employer is mentioned at least once. A large employer (e.g., Google, IBM, McKinsey, Citibank) is mentioned by more than 100 profiles; a small employer is mentioned by fewer. In extreme cases, a startup may be mentioned by a single profile.
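This classification amounts to counting distinct profiles per employer and thresholding the count; a toy sketch (hypothetical employers, with the 100-profile threshold lowered to 2 so the example stays small):

```python
from collections import Counter

# Hypothetical flattened career milestones: (profile_id, employer).
milestones = [
    (1, "Google"), (1, "Acme Analytics"), (2, "Google"),
    (3, "IBM"), (3, "Google"), (4, "IBM"), (2, "Tiny Startup"),
]

# Count distinct profiles mentioning each employer at least once.
seen = {(p, e) for p, e in milestones}
counts = Counter(e for _, e in seen)

THRESHOLD = 2  # the paper uses 100 profiles; 2 keeps the toy example small
size = {e: int(c >= THRESHOLD) for e, c in counts.items()}
print(size["Google"], size["Tiny Startup"])  # 1 0
```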
We use employer size in two ways. First, our primary analysis uses the binary classification large (1) and small (0) as a control. Second, our job rank calculations using page rank use employer size. Recall that we defined a job as a combination of title, employer size and employer industry, e.g., "CEO at very small internet". Here, employer size is split into five categories: "Very Small", "Small", "Medium", "Large" and "Very Large". A fine-grained employer size (say, 10 categories) is more informative, but it makes page rank calculations difficult because of sparse job-switch observations. A coarse split (say, 2 categories) is less informative, but the resulting jobs have a large number of observed switches. The 5-way split was determined as an optimal hyperparameter setting; as described in Section 4.3, it was set to maximize performance on a held-out validation set.
Industry Category: The professional social network uses hundreds of industry names, and an individual can self-report their job as belonging to one of them. We manually3 create three industry categories: IT, Finance and Management. The table below shows the mapping from industry to industry category. These three industry categories cover more than 80% of all profile career milestones; all remaining industry names are mapped to Others. We use these industry categories in two ways: (i) in our primary attractiveness bias analysis, to control for career growth idiosyncrasies in specific industries, and (ii) in the definition of a "job" during the calculation of job ranks. We do this because the same title, "analyst", may mean something very different in Finance vs. IT vs. Management Consulting. As with employer size, a very sparse or very coarse categorization creates a problem; this 4-way split works relatively well. As described in Section 4.3, it was chosen to maximize the performance of job ranks on a held-out validation set.
3 This manual mapping is informed by hierarchical clustering. This ML algorithm clusters together industries with a large number of within-cluster job transitions and a small number of across-cluster job transitions.
Industry Category   Industries
IT/Computers   Market Research, Computer Software, Information Technology and Services, Internet, Telecommunications, Online Media, Computer Networking, Information Services, Computer & Network Security, Wireless, Computer Games, Computer Hardware, Consumer Electronics, Semiconductors, Electrical/Electronic Manufacturing
Finance   Financial Services, Venture Capital & Private Equity, Accounting, Investment Management, Real Estate, Banking, Investment Banking, Insurance, Capital Markets, Commercial Real Estate, International Trade and Development
Management   Management Consulting, Marketing and Advertising, Nonprofit Organization Management, Education Management, Human Resources, Executive Office, Staffing and Recruiting, Hospitality, Government Administration, Public Relations and Communications, Broadcast Media, Program Development, Philanthropy, Outsourcing/Offshoring, Public Policy, International Affairs, Government Relations, Think Tanks, Religious Institutions, Civic & Social Organization, Fund-Raising, Import and Export, Political Organization
Others   Hospital & Health Care, Consumer Goods, Retail, Pharmaceuticals, Higher Education, Entertainment, Medical Devices, Defense & Space, Research, Logistics and Supply Chain, Law Practice, Utilities, Health, Wellness and Fitness, Food & Beverages, Professional Training & Coaching, etc.
JobType: We manually classify self-reported titles into one of four categories: "Technical", "Consultant", "Management" and "Others". Note that our research does not attempt to answer whether attractiveness bias is greater or smaller in certain jobs; we only use these categories to capture idiosyncrasies in career progression across different job types.
Table 25: Strings used to search self-reported titles for the three job types.
Other Text Cleaning: We deal with numerous self-reported free-text fields: school name, degree name, employer name, etc. This creates two problems: (i) misspellings, because of the free-text nature, and (ii) usage of different terms with otherwise synonymous meanings. For example, the MBA school names "University of Pennsylvania", "Wharton School" and "Whrton School" should all be treated the same. We use numerous small tools for data cleaning; examples include Levenshtein distances to match employer and school name strings, and autocorrect software packages on degree and title strings.
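A minimal sketch of Levenshtein-based name matching (the helper names and the distance threshold are our own; the actual tooling may differ):

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

def canonicalize(name, canonical, max_dist=2):
    """Map a noisy school name to its closest canonical form within max_dist."""
    best = min(canonical, key=lambda c: levenshtein(name.lower(), c.lower()))
    return best if levenshtein(name.lower(), best.lower()) <= max_dist else name

schools = ["Wharton School", "Stern School", "Tepper School"]
print(canonicalize("Whrton School", schools))   # "Wharton School"
```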
Our primary dataset is collected using "MBA" as a search string along with filters on location ("United States") and school names. The list of MBA schools is derived from the US News business school ranking.
Table 26: Number of profiles collected for each MBA program. MBA programs with 100 or more profiles are listed individually. A large number of remaining schools have fewer than 100 profiles each, adding up to 639 profiles.
MBA Rank   MBA School   Profiles
8 Carnegie Mellon University - Tepper School of Business 1980
5 Northwestern University - Kellogg School of Management 1962
2 The University of Chicago Booth School of Business 1777
0 Harvard Business School 1667
15 University of North Carolina at Chapel Hill - Kenan-Flagler Business School 1492
9 Cornell University - S.C. Johnson Graduate School of Management 1486
12 University of Michigan - Stephen M. Ross School of Business 1261
53 UCLA Anderson School of Management 1235
3 University of California, Berkeley - Walter A. Haas School of Business 1191
4 Massachusetts Institute of Technology - Sloan School of Management 1109
14 University of California, Los Angeles - The Anderson School of Management 925
16 The University of Texas at Austin - The Red McCombs School of Business 845
7 Dartmouth College - The Tuck School of Business at Dartmouth 379
60 The University of Connecticut School of Business 363
20 Georgetown University - The McDonough School of Business 350
1 Stanford University Graduate School of Business 312
97 The University of New Mexico - Robert O. Anderson School of Management 258
19 New York University - Leonard N. Stern School of Business 198
55 University of Pittsburgh - Joseph M. Katz Graduate School of Business 196
32 University of Southern California - Marshall School of Business 154
39 University of Rochester - William E. Simon Graduate School of Business 114
University of Maryland - Robert H. Smith School of Business, Georgia State University - J. Mack Robinson College of Business, University of Washington - Michael G. Foster School of Business, Indiana University - Kelley School of Business, Purdue University - Krannert School of Management, etc.   639
To mitigate sample selection issues, we access the school directory of a top-10 MBA program in the U.S. This gives us names and graduating years for 15,640 students between 1990 and 2016. We use the Bing Search API to search for professional profiles of these graduates, relying on search terms such as "Name" + "School" + "MBA" + "professional platform web domain name" as well as numerous permutations of the same. We access all relevant professional profiles returned by such searches. Where Bing searches return more than one closely relevant profile (e.g., students with common names), we reject profiles whose self-reported graduation year does not match the information from the school directory. This secondary data does not rely on the platform's search functionality. Further, it provides visibility into exactly which graduates have been selected out during data collection. Figure 13 shows the size of the graduating classes from the school directory.
Figure 13: Size of graduating classes (from school directory) between 1990-2016 and the number of profiles collected (from professional social network) for each graduating class.
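The permuted search queries described above can be generated mechanically. A minimal sketch, where the name, school, and platform domain are hypothetical placeholders (the real inputs come from the school directory):

```python
from itertools import permutations

# hypothetical example inputs; actual names, school, and platform
# domain come from the school directory and are not reproduced here
name = "Jane Doe"
school = "Example School of Business"
terms = [name, school, "MBA", "professionalplatform.com"]

# every ordering of the four search terms, as in the permuted queries
queries = {" ".join(p) for p in permutations(terms)}
print(len(queries))  # 4! = 24 distinct orderings
```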
$$d^{*}_{i,t} = \delta_0 + \delta_1 r_{i,t-1} + \delta_2 b_{i,t-1} + \delta_3\,\text{gender}_i + \delta_4\,\text{gradyear}_i + \psi_{i,t}$$
Let us consider an individual's choice of whether to drop out of the job market ($d^{*}_{i,t} < 0$). We assume that all graduates choose to stay in the market in the year of MBA graduation. Subsequently, if they exit the job market in year $t$ (because $d^{*}_{i,t} < 0$), they stay out of the job market permanently ($d^{*}_{i,t'} < 0 \;\forall\; t' \geq t$). An individual can make the choice to exit based on various factors: (i) an unsuccessful individual (small $r_{i,t-1}$) may be more likely to drop out of the job market if they have an outside option; (ii) the outside option may be better (or worse) for one gender than the other (e.g., homemaking or blue-collar work); (iii) the outside option may be better (or worse) for attractive individuals (e.g., via marriage prospects); (iv) certain periods may have unusually low employment, e.g., the 2008-09 crisis, when unemployment temporarily rose from 4% to 10%. The dropout model above captures all these factors.
Our secondary dataset (2,506 profiles) provides visibility into two of these four factors, gender ($\delta_3\,\text{gender}_i$) and year of graduation ($\delta_4\,\text{gradyear}_i$), but not into job rank ($\delta_1 r_{i,t-1}$) and attractiveness ($\delta_2 b_{i,t-1}$). For individuals who drop out, their profile is naturally absent from our sample. Gender and graduation year are available from the MBA school directory, but job ranks and attractiveness are not.
Figure 14: Percentage of profiles missing from each graduation year for men and women.
Figure 14 shows the percentage of profiles missing from each graduation year. Missing profiles for year-2000 graduates can be interpreted as all dropouts within 1-17 years of the MBA. Similarly, missing profiles for 2010 graduates can be interpreted as all dropouts within 1-7 years of the MBA. Naturally, dropout is much greater for earlier graduation years than for recent ones. This is reasonable, as there may simply be retirements, as well as individuals who never joined professional social networking websites. We can also see that the dropout rate for women is much higher. This may be explained by a switch to homemaking or childcare.
We create an alternate specification of our attractiveness bias analysis following Heckman's sample selection model. We only observe the job rank ($r_{i,t} = r^{*}_{i,t}$) for individuals who choose to remain in the job market ($d^{*}_{i} > 0$). The dropout decision depends on gender ($\delta_3$) and the graduation year ($\delta_4$); the model-free evidence above suggests that both these factors play a big role in dropouts. Note that we model a single dropout decision $d^{*}_{i}$ per individual instead of a time-varying decision $d^{*}_{i,t}$. This is because we have already dropped the time-varying dropout factors ($r_{i,t-1}$, $b_{i,t-1}$), and we directly observe the accumulation of all dropouts between graduation and 2017.
$$r^{*}_{i,2017} = \beta_1 + \beta_2 b_i + \vec{\beta}_4 Z_i + u_i$$

$$r_{i,2017} = \begin{cases} r^{*}_{i,2017} & \text{if } d^{*}_{i} > 0 \\ \text{Not observed} & \text{else} \end{cases}$$
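The selection and outcome equations above can be estimated with Heckman's classic two-step procedure (probit on the selection equation, then OLS with the inverse Mills ratio). A minimal sketch on synthetic data, where all variable names and coefficient values are invented for illustration and do not come from the paper:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

# Synthetic data: staying in sample depends on gender and graduation
# year; the outcome depends on attractiveness b. Coefficients made up.
rng = np.random.default_rng(0)
n = 5000
gender = rng.integers(0, 2, n).astype(float)
gradyear = rng.uniform(0.0, 1.0, n)
b = rng.normal(size=n)                              # attractiveness
u = rng.normal(size=n)                              # outcome error
v = 0.5 * u + np.sqrt(0.75) * rng.normal(size=n)    # correlated selection error
stay = (0.5 - 0.8 * gender + 1.0 * gradyear + v) > 0  # d*_i > 0
r = 1.0 + 0.3 * b + u                               # latent r*_{i,2017}

# Step 1: probit of the selection equation by maximum likelihood
X = np.column_stack([np.ones(n), gender, gradyear])
def neg_loglik(beta):
    p = np.clip(norm.cdf(X @ beta), 1e-9, 1 - 1e-9)
    return -np.sum(stay * np.log(p) + (~stay) * np.log(1 - p))
probit = minimize(neg_loglik, np.zeros(3), method="BFGS")

# Step 2: OLS on the selected sample, adding the inverse Mills ratio
xb = X[stay] @ probit.x
mills = norm.pdf(xb) / norm.cdf(xb)
W = np.column_stack([np.ones(stay.sum()), b[stay], mills])
coef, *_ = np.linalg.lstsq(W, r[stay], rcond=None)
# coef[1] recovers the attractiveness effect (0.3 by construction)
```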
Table 27: (Model 1) Impact of beauty without the selection model; (Model 2) impact of beauty with the Heckman selection model; (Model 3) impact of belief and preference bias with the Heckman selection model.
Table 27 reports the Heckman estimation results. As expected, we find that the dropout decision depends significantly on both gender and graduation year. Our primary results, a persistent preference bias, an insignificant belief bias, and a lifetime premium, remain the same. As expected, the standard errors are much larger, given an eight-times smaller dataset (2,506 profiles in the secondary vs. 19,893 in the primary dataset). This establishes the robustness of our findings to self-selection based on gender and to periodic job market shocks that adversely impact specific graduating classes.
Further, let us consider that aspects such as health and stress do have a permanent effect on the face and thus bleed into our attractiveness measure. These factors take significant time (say 3-5 years) to have a permanent impact. This means that an MBA graduate from 2005 who had a successful career has better looks in 2017 than a counterpart who was unsuccessful. However, a 2015 MBA graduate with a successful first two years would hardly look different in 2017 from an unsuccessful counterpart: two years between 2015 and 2017 is too little time for the successful individual to improve their looks via better health, lower stress, etc. We construct multiple treatment (successful 2 years after MBA) and control (unsuccessful 2 years after MBA) groups corresponding to various graduation years. We also construct a variable $\Delta b_g = E_{i \in S,g}[b_{i,2}] - E_{i \in US,g}[b_{i,2}]$. Here $E_{i \in S,g}[b_{i,2}]$ is the average attractiveness of all individuals who are successful ($r_{i,2} > 0$) two years after graduating in year $g$, and $E_{i \in US,g}[b_{i,2}]$ is the same for unsuccessful individuals. If success leads to attractiveness, we should find this variable to be statistically different between graduation years, e.g., $|\Delta b_{2005} - \Delta b_{2015}| > 0$. We are able to reject this hypothesis, and therefore the explanation that success causes attractiveness.
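The construction of $\Delta b_g$ can be sketched directly. A minimal example on hypothetical cohorts (all data synthetic, sizes and seed arbitrary) in which looks are independent of early success, so the gap should not differ across graduation years:

```python
import numpy as np

def delta_b(b2, r2):
    """Attractiveness gap between successful (r2 > 0) and unsuccessful
    graduates, measured two years after graduation."""
    return b2[r2 > 0].mean() - b2[r2 <= 0].mean()

# hypothetical cohorts: attractiveness drawn independently of success
rng = np.random.default_rng(1)
b_2005, r_2005 = rng.normal(size=500), rng.normal(size=500)
b_2015, r_2015 = rng.normal(size=500), rng.normal(size=500)

# under the null of "success does not cause attractiveness",
# this absolute difference should be statistically indistinguishable from 0
gap = abs(delta_b(b_2005, r_2005) - delta_b(b_2015, r_2015))
```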
A.12 CAAE
GAN: A GAN relies on two deep neural network modules that compete against each other. The generator takes white noise as input; a single realization of white noise is a random shock $\epsilon$. It outputs an image $X'$. The discriminator takes an image as input and tries to distinguish whether the input is real (class $y = 1$) or fake ($y = 0$). A real input $X$ is an actual picture of an individual; a fake input $X'$ is an image generated by the generator. At the beginning, the generator parameters $\phi$ are initialized randomly and thus produce arbitrary outputs $X' = G_\phi(\epsilon)$ which look nothing like $X$. The discriminator parameters $\theta$ are then trained to classify real versus fake: the loss function pushes the discriminator to output $D_\theta(X) = 1$ for real and $D_\theta(X') = 0$ for fake inputs. Next, the generator parameters $\phi$ are trained to fool the discriminator into confusing fake samples for real: the generator loss function pushes $D_\theta(X')$ towards 1 for fake inputs. These discriminator and generator training steps are repeated numerous times. Eventually, the generator gets better at producing outputs that are hard for the discriminator to distinguish from real data; such outputs are indistinguishable from real data by the discriminator as well as by human judges.
$$J(\theta, \phi) = -\mathbb{E}_{x \sim p_{\text{real}}}[\log D_\theta(x)] - \mathbb{E}_{x' \sim G_\phi}[\log(1 - D_\theta(x'))]$$
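The discriminator and generator objectives can be evaluated directly on a batch of discriminator outputs. A minimal numpy sketch; the non-saturating generator form used here is a standard choice, not necessarily the exact form used in the paper:

```python
import numpy as np

def discriminator_loss(d_real, d_fake):
    # -E[log D(x)] - E[log(1 - D(x'))]: low when D is confident and right
    return -(np.log(d_real).mean() + np.log(1.0 - d_fake).mean())

def generator_loss(d_fake):
    # non-saturating generator objective: push D(x') towards 1
    return -np.log(d_fake).mean()

# a confident, correct discriminator incurs low loss ...
low = discriminator_loss(np.array([0.9, 0.95]), np.array([0.05, 0.1]))
# ... while a fooled one (outputs near 0.5) incurs higher loss
high = discriminator_loss(np.array([0.5, 0.5]), np.array([0.5, 0.5]))
```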
This method derives its power from the ability of the deep neural network layers (Figure 16) to break down high-dimensional complex distributions into simpler building blocks. The shallow layers of the generator learn to create basic image pieces (e.g., edges, circles). The deeper layers combine these basic pieces into more sophisticated forms (e.g., ears, hair, texture). Further, the conditional GAN used here allows the generator and discriminator to learn $G_\phi(\epsilon \mid a)$ and $D_\theta(X' \mid a)$, conditioned on age $a$, respectively. While the images $X' \sim G_\phi(\epsilon \mid a)$ are now realistic for an age $a$, the random shock $\epsilon$ means that pictures cannot be targeted to a specific individual. We need to replace these random shocks to be able to create more deterministic outputs.
Figure 16: Generative Adversarial Network (GAN) architecture with a two layer generator and two layer discriminator.
AE: AEs map a high-dimensional (e.g., 640x480) image $X$ onto a low-dimensional embedding $z$ (e.g., 128-D) through a series of neuron layers called the encoder. Next, this low-dimensional embedding $z$ is mapped back to an output $X'$ with the same dimensionality as the input through another series of convolutional filters called the decoder. AEs do not need labels; instead, they attempt to directly minimize the difference (or L2 loss) between their output $X'$ and input $X$. Intuitively, a low reconstruction loss means that the network has found encoder ($X \to z$) and decoder ($z \to X'$) parameters such that the low-dimensional representation $z$ captures the information necessary to reconstruct the original image.
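The encode-decode-reconstruct idea can be illustrated with a linear analogue of the AE on synthetic data; the paper's encoder and decoder are convolutional, but the optimal *linear* encoder/decoder under L2 loss is given by the top-$k$ singular vectors:

```python
import numpy as np

# synthetic "images": 200 samples of 64-D data with hidden 8-D structure
rng = np.random.default_rng(0)
Z_true = rng.normal(size=(200, 8))
X = Z_true @ rng.normal(size=(8, 64))

k = 8
mu = X.mean(axis=0)
_, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
encode = lambda A: (A - mu) @ Vt[:k].T      # X  -> z  (64-D -> 8-D)
decode = lambda z: z @ Vt[:k] + mu          # z  -> X' (8-D -> 64-D)

# data is exactly rank 8, so an 8-D embedding reconstructs it (near) perfectly
recon_loss = np.mean((X - decode(encode(X))) ** 2)
```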
Figure 17: AutoEncoder with a two layer encoder and a two layer decoder.
The CAAE model combines the AE and CGAN building blocks. More specifically, it forces the decoder ($z \to X'$) of the AE module to also act as the generator $X' \sim G_\phi(\epsilon \mid a)$ of the GAN module; the latent representation $z$ replaces the random shock $\epsilon$. While the AE objective function forces the decoder output to be close to the input ($|X' - X| \to 0$), the output need not be realistic. The additional CGAN loss further forces the decoder to generate realistic ($D_\theta(X') = 1$) outputs. Once the entire CAAE network is trained, it is able to produce realistic images of a new face conditioned on any arbitrary age value.
We describe our attractiveness prediction model in Section 4.1. This deep learning model is trained using a labelled dataset. We create this labelled dataset by acquiring attractiveness ratings from human raters on Amazon Mechanical Turk (AMT). These ratings are solicited on a small set of 659 images from the professional social network profile pictures. To be sure that our prediction model is robust, we also collect images and labels from four prior experiments: (i) the Chicago Face Dataset (CFD), (ii) the SCUT-FBP dataset, (iii) Todorov Lab original images, and (iv) Todorov Lab computer-generated images. We want to ensure that the predictive model can learn from any of these datasets and predict on another. This would ensure that the predictive model can learn and predict attractiveness for any subject pictures (not just professional social network profile pictures). If we fail to do so, it may mean that our predictions are specific to professional social network profile pictures.
Table: Each cell is attractiveness prediction accuracy, with 0.5 being random prediction. The model is trained on the dataset
in the rows and accuracy is tested on dataset in columns.
Train \ Test                  AMT    CFD    SCUT-FBP   Todorov A   Todorov B
AMT 0.78 0.70 0.71 0.78 0.69
CFD 0.76 0.73 0.66 0.97 0.57
SCUT-FBP 0.67 0.58 0.78 0.37 0.52
Todorov A (Comp Generated) 0.62 0.57 0.46 1.00 0.70
Todorov B 0.74 0.59 0.57 1.00 0.67
All of these independent datasets gather attractiveness ratings on different scales from human raters. We derive a simple attractive (1) versus plain-looking (0) binary classification for each dataset. The table above shows that the model trained on our AMT dataset is able to predict attractiveness correctly (well above random) on all the other datasets.
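The binarisation step can be sketched as follows, with hypothetical mean ratings per face; each source dataset is thresholded at its own average rating, matching the above-average convention used throughout the paper:

```python
import numpy as np

# hypothetical mean attractiveness ratings for five faces (any scale)
ratings = np.array([3.1, 4.5, 2.2, 3.9, 2.8])

# attractive (1) if above the dataset's own mean, plain looking (0) otherwise
labels = (ratings > ratings.mean()).astype(int)  # -> [0, 1, 0, 1, 0]
```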