AN EMPIRICAL STUDY OF PERFORMANCE APPRAISAL PRACTICES
IN JAPAN, KOREA, TAIWAN AND THE U.S.


*John F. Milliman, University of Colorado, Colorado Springs, 1420 Austin Bluffs Pkwy,
Colorado Springs, CO 80918
Stephen Nason, Hong Kong University of Science and Technology
Kevin Lowe, Florida International University
Nam-Hyeon Kim, Keimyung University
Paul Huo, Hong Kong University of Science and Technology
co-first authors
ABSTRACT

present study, we empirically assess five purposes of
performance appraisals, which include (1) employee
development, (2) documentation of performance, (3) allowing
employees to express their views, (4) the determination of pay,
and (5) the determination of promotions.

Survey data from Japan, South Korea, Taiwan, and the U.S. are
analyzed with LISREL to assess the relationship of performance
appraisal practices to performance appraisal effectiveness and
ultimately to job satisfaction and organizational effectiveness.
The results are interpreted in terms of cultural influences.

METHODS
Subjects
INTRODUCTION
Since the surveys were originally developed in English, the
Korean, Taiwanese, and Japanese surveys went through a
translation and back-translation process by native-born
professors from each country.

In an increasingly turbulent and highly competitive global
economy, effective human resource management (HRM)
practices are essential in developing a skilled workforce (Cascio
& Bailey, 1995) and organizational flexibility and fit (Butler,
Ferris, & Napier, 1991). Cross-cultural studies that empirically
assess the frequency and effectiveness of HRM practices should
enable organizations to develop better HRM policies for
employees in international operations as well as in
multicultural domestic settings (Von Glinow, 1993).

South Korea. The Korean sample consists of 237 managers.
The respondents include one to three managers from over fifty
companies in both the manufacturing and service sectors.
Taiwan. The Taiwanese sample consists of 241 employees
from thirty medium and large manufacturing companies. The
sample is 50 percent managers, 40 percent engineering
employees, and 10 percent other types of personnel.

The objective of this study is to investigate one important
aspect of IHRM practices, the purpose of performance
appraisal, in Japan, Taiwan, South Korea, and the U.S.
Empirical research comparing appraisal practices between these
countries is virtually non-existent. Hence, we seek to determine
the extent to which a number of purposes of performance
appraisal (1) are emphasized in each of these four countries, and
(2) are related to performance appraisal effectiveness, employee
job satisfaction, and organization effectiveness in these four
countries.

Japan. Data from Japan include 223 respondents from three
large manufacturing companies. About 60 percent of these
respondents are managers, 30 percent are engineers, and 10
percent are from other functional areas.
United States. The U.S. sample includes 144 managers. About
50 percent of the sample are managers from a diverse set of
organizations who participated in two executive business
education programs associated with a large southwestern
university. The other half of the sample is from a large defense
corporation also located in the southwestern U.S.

Purposes of Performance Appraisals


Performance appraisals are considered to be important
management tools in the U.S. (Gomez-Mejia, Balkin, & Cardy,
1995), Japan (Pucik & Hatvany, 1983), and Korea (Milliman,
Kim, & Von Glinow, 1993). The multi-faceted purposes of
performance appraisals are important because they strongly
influence the impact of appraisals on both managers and the
employees (Milliman, Nathan, & Mohrman, 1991). Since the
purpose of performance evaluations differs across countries and
cultures (Cascio & Bailey, 1995), examining appraisal purposes
is an important starting point for cross-cultural research. In the

Instruments
Appraisal purposes. The respondents were asked to what
extent the items describe the purposes of appraisal practices as
currently conducted. The employees responded on a Likert
type scale ranging from 1 (Not at all) to 5 (To a very great
extent). Eleven items on the purposes of appraisal were
initially included in the study. Six of the performance appraisal
items are from Milliman et al. (1991) and three of the items are
based on Cleveland et al. (1990). Two additional items were
developed specifically for this study. The factor structure of the
eleven items is based on Milliman et al. (1991), but to further
adapt the study from a U.S. to a cross-cultural study, we
proposed several measures that combine items from the
Milliman et al. and Cleveland et al. studies, as discussed
below. The five purposes of appraisals studied were pay (2
items), development (3 items), document (2 items), enabling
subordinates to express themselves (2 items), and promotion (1
item).

overall fit. The values for each of the models were less than or
close to .05, once again indicating an adequate fit.
The chi-square goodness of fit statistic is a third measure of the
overall fit of the model. Models whose chi-square has a
significance greater than .05 are considered not to be
significantly different from the actual data (note that here we
are looking for non-significance). However, the chi-square
statistic has several problems. At both large and small sample
sizes it is inaccurate (Bollen, 1990) and it is extremely sensitive
to minor departures from multivariate normality (Hayduk, 1989).
One way of correcting for sample size bias in the chi-square test
is to calculate the ratio of the chi-square to its degrees of freedom
(Joreskog & Sorbom, 1989). Models are considered to fit well
when this ratio is less than 5/1 (Hayduk, 1989). The ratio for
each of the four countries is well under the 5/1 criterion,
indicating an acceptable fit.

Appraisal and organizational effectiveness. Three items were
used to form a factor on the effectiveness of performance
appraisals. Job satisfaction comprises five items and
organization effectiveness comprises five items from Milliman,
Nason, Von Glinow, Huo, Lowe, and Kim (1994). Job
satisfaction is a composite measure of items on
employee satisfaction using the same Likert type scale as the
one for the pay practices. Organization effectiveness is another
composite measure, based to a large degree on the work of
Porter (1990). These items were measured on a 5-point Likert
scale from 1 = Very False to 5 = Very True.
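
The scoring of a composite measure can be illustrated with a minimal
sketch. It assumes that a composite is the simple average of a
respondent's item ratings; the text does not state whether items were
averaged or summed, and the function name and the responses below are
hypothetical.

```python
from statistics import mean

def composite_score(item_responses, low=1, high=5):
    """Average a respondent's item ratings into one composite score.

    Assumption: the composite is the unweighted mean of the items.
    The low/high bounds mirror the 5-point Likert scale in the text.
    """
    for value in item_responses:
        if not low <= value <= high:
            raise ValueError(f"rating {value} is outside {low}-{high}")
    return mean(item_responses)

# Hypothetical ratings from one respondent on a five-item scale.
example_responses = [4, 5, 3, 4, 4]
example_score = composite_score(example_responses)
print(example_score)
```

Averaging keeps the composite on the original 1 to 5 metric, which
makes scales with different numbers of items directly comparable.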

LISREL Path Model Analysis

Two primary means of analysis are employed. The first
involves an assessment of the mean scores of the appraisal
purpose measures to determine the extent to which these
purposes are currently utilized in each of the four countries. The
second involves LISREL path analysis of the relationship of the
five appraisal purposes to appraisal effectiveness (gammas) and
of appraisal effectiveness to employee job satisfaction and
organizational effectiveness (betas), as shown in Figure 1.

RESULTS
Confirmatory Factor Analysis
To test the proposed causal model of the 11 appraisal
purpose items and the three performance appraisal effectiveness
items, a covariance structural model was employed using
LISREL 7 (Joreskog & Sorbom, 1990). The items loaded as
anticipated on the proposed factors in all four countries with
two exceptions. First, one document item did not load cleanly
on the document factor in all four countries and thus was
discarded from the analysis. Second, in the U.S. one
development item loaded higher on the subordinate express
factor (.59) than on the development factor (.26) it was intended
for. To be consistent with the other three countries we placed
this item on the development factor. Finally, it should be noted
that the promotability item was not included in the confirmatory
factor analysis since it is a single item measure.

The global LISREL statistics, presented in Table 2, indicate
that the hypothesized model fits the actual data well.
All statistics meet or approach traditional acceptance criteria.
The GFI and AGFI are greater than .90 in all cases but one
(AGFI of .86 for the U.S.). The chi-square/df ratios range
from .88/1 to 2.38/1, and the overall magnitude of the residuals
is low (RMSR is from .018 to .041). The total coefficient of
determination ranges from .38 to .58, indicating that a good
deal of the variance in the dependent variables is explained by
the independent variables.
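
The acceptance criteria applied above can be checked mechanically.
The following is a minimal sketch, not part of the original analysis,
that recomputes the chi-square/df ratios from the path-model entries
printed in Table 2 and screens each country against the cutoffs named
in the text (ratio below 5/1, GFI and AGFI above .90). The RMSR cutoff
of .05 is an assumption drawn from the "less than or close to .05"
wording, and the chi-square values are the rounded figures shown in
the table.

```python
# Path-model statistics as printed in Table 2 (chi-square values are rounded there).
TABLE_2 = {
    "Japan":  {"chi2": 7.0,  "df": 8, "gfi": 0.993, "agfi": 0.967, "rmsr": 0.022},
    "Korea":  {"chi2": 19.0, "df": 8, "gfi": 0.981, "agfi": 0.914, "rmsr": 0.038},
    "Taiwan": {"chi2": 11.0, "df": 8, "gfi": 0.99,  "agfi": 0.95,  "rmsr": 0.018},
    "U.S.":   {"chi2": 19.0, "df": 8, "gfi": 0.97,  "agfi": 0.86,  "rmsr": 0.041},
}

def fit_summary(stats, ratio_max=5.0, index_min=0.90, rmsr_max=0.05):
    """Compare one model's global fit statistics with rule-of-thumb cutoffs.

    ratio_max and index_min follow the text; rmsr_max is an assumed cutoff.
    """
    ratio = stats["chi2"] / stats["df"]
    return {
        "chi2_df_ratio": ratio,
        "ratio_ok": ratio < ratio_max,
        "gfi_ok": stats["gfi"] > index_min,
        "agfi_ok": stats["agfi"] > index_min,
        "rmsr_ok": stats["rmsr"] <= rmsr_max,
    }

SUMMARY = {country: fit_summary(stats) for country, stats in TABLE_2.items()}
for country, result in SUMMARY.items():
    print(f"{country}: ratio = {result['chi2_df_ratio']:.2f}, checks = {result}")
```

Under these cutoffs the only failed check is the U.S. AGFI of .86,
which matches the single exception noted in the text.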
South Korea
The path model investigated the relationship from the five
appraisal purposes to appraisal effectiveness and appraisal
effectiveness to job satisfaction and organization effectiveness.
The path analysis estimates for all four countries are shown in
Figure 1. Only pathways with significant estimates (at p < .05)
are shown in this figure. The path analysis estimates (gammas)
for South Korea indicate that Promotion (.11), Document
(.34), and Development (.29) are significantly related to
appraisal effectiveness, while Subordinate Express and
Pay are not. In addition, appraisal effectiveness (beta) was
significantly related to organization effectiveness (.41) and
employee job satisfaction (.71), indicating that appraisals do
have a positive impact on organizations and employees in
Korea.
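
Because each purpose reaches the two outcomes only through appraisal
effectiveness, the implied indirect effect of a purpose on an outcome
can be sketched as the product of its gamma and the relevant beta.
This calculation is not reported in the study. It is a standard
path-tracing illustration that assumes the estimates are standardized
and that the model contains no direct paths from purposes to outcomes.
The values are the significant South Korean estimates quoted above.

```python
# Significant estimates reported in the text for South Korea.
GAMMAS = {"Promotion": 0.11, "Document": 0.34, "Development": 0.29}
BETAS = {"organization effectiveness": 0.41, "job satisfaction": 0.71}

def indirect_effects(gammas, betas):
    """Multiply coefficients along purpose -> appraisal effectiveness -> outcome.

    Assumption: no direct purpose -> outcome paths, so the product of the
    two coefficients is the whole implied effect.
    """
    return {
        (purpose, outcome): gamma * beta
        for purpose, gamma in gammas.items()
        for outcome, beta in betas.items()
    }

EFFECTS = indirect_effects(GAMMAS, BETAS)
for (purpose, outcome), value in EFFECTS.items():
    print(f"{purpose} -> {outcome}: {value:.3f}")
```

Read this way, the Document purpose has the largest implied link to
job satisfaction in the Korean sample (.34 x .71, about .24).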

LISREL confirmatory factor analysis also provides global
statistics that assess how well the overall factor structure fits the
data. These global statistics are presented in Table 1. The
goodness of fit index (GFI) and adjusted goodness of fit index
(AGFI) are two such statistics. A rigorous rule of thumb is that
the GFI and AGFI should be close together and greater than
.90, though several researchers (e.g., Gudykunst, Yang, &
Nishida, 1987) suggest that .80 is more practical. Each of the
four factor models has a GFI greater than .90. The AGFI for
each model exceeds or approaches the .90 level: South Korea =
.90, Taiwan = .91, Japan = .86, and the U.S. = .83.
The root mean square residual (RMSR) is another measure of

Japan

be further evidence that Taiwan is moving towards greater
professionalization of management practices. In contrast, the
finding in Korea that pay was not discussed widely in
appraisals (and not related to appraisal effectiveness) supports
the findings of Milliman et al. (1994) that seniority based pay
is important to Korean employee satisfaction.

Japan was the only country in which all five purposes were
found to be significantly related to appraisal effectiveness, with
gammas ranging from .11 to .22. In addition, the effectiveness
of the appraisal was significantly related to organization
effectiveness (.25) and job satisfaction (.53). These data appear
to indicate that appraisal systems are working effectively in
Japan for both employees and organizations.

There were some surprising results in Japan. First, it was the
only country in which all five appraisal purposes were
significantly related to appraisal effectiveness. Second, the
Japanese results were largely contrary to our predictions, as the
development and subordinates express purposes were not
currently practiced extensively, while documentation was found
to be among the most highly used purposes. Third, in contrast
to literature that indicates that pay is based primarily on
seniority, the pay purpose was found to be related to appraisal
effectiveness. This finding lends support to the contentions of
Wakabayashi and Graen (1989) that appraisals have a greater
impact on hierarchical advancement and pay than is
widely assumed in the West.

Taiwan
The path model results indicate that three of the five purposes,
Pay (.23), Development (.27), and Document (.24), are
significantly related to appraisal effectiveness. In turn,
performance evaluation effectiveness is significantly related to
organization effectiveness (.15) and job satisfaction (.12),
although at a much lower level than in Japan and South Korea.
United States
The path model data indicate some surprising results. First,
the Development and Pay measures were not significantly
related to appraisal effectiveness; only Document (.37),
Subordinate Express (.21), and Promotion (.17) had significant
gammas. Second, appraisal effectiveness was significantly and
positively related to job satisfaction (.51) and negatively related
to organization effectiveness (-.52). These results present the
perplexing picture that appraisal effectiveness is positively
related to employee job satisfaction, but has a negative impact
on organizational effectiveness.

The most surprising results were in the U.S. data. First, despite
the suggested widespread emphasis of appraisals on
development (Gomez-Mejia et al., 1995), this purpose was not
found to be related to performance appraisal effectiveness.
Second, in the U.S., appraisal effectiveness was found to be
significantly negatively related to organizational effectiveness,
but positively related to job satisfaction. It is not clear why
employees perceive appraisals to have a positive impact on
themselves, but a negative effect on the organization. Given the
widespread criticism of appraisal in the U.S., it may be possible
that these data support researchers such as Meyer (1991) who
contend that appraisals conducted in their traditional manner
often harm organizations.

DISCUSSION
One of the most important findings is that the same factor
structure surfaced in each of the four countries regarding the
purposes of performance appraisal. Finding a common factor
structure is unusual from a methodological perspective because
factor analyses are notoriously susceptible to cultural influences.
Even when a common factor across countries exists, cultural
differences influence interpretations to such a degree that
demonstrating such underlying factors is difficult. Given the
cultural diversity among these four countries this is a notable
finding.

Caution must be exercised in interpreting these data since the
data come from perceptual measures on a single survey
instrument. In addition, there are some significant differences
in the size and comparability of the samples from the four
countries. In particular, the U.S. data are somewhat limited by
the smaller sample size (144). Further, the reliabilities of a few
measures were below .70, raising questions about their
psychometric qualities. Finally, as is often the case in
cross-cultural studies, wording is problematic on a scale which was
originally developed in the U.S. Given the inherent complexity
of cross-cultural studies, we believe these limitations represent
a reasonable compromise given the new ground this study breaks
in the area of international performance appraisal in terms of
including data and common factor structures from four countries.
Some of the implications for research and practice provided by this
study are as follows.

Overall, there are a number of similar findings on appraisal in
Korea and Taiwan. The largest difference between the two
countries is that in Korea discussing promotion was currently
the most highly emphasized purpose and was significantly related to
appraisal effectiveness, while in Taiwan pay was the most
highly emphasized and was significantly related to appraisal
effectiveness. The finding that determining pay is an important
purpose in appraisals in Taiwan is particularly interesting given
that pay based on seniority was not found to be strongly related
to employee job satisfaction in a recent study on Taiwan. As
suggested by McEvoy and Cascio (1990), these findings may
First, the present study is one of the first cross-cultural
empirical investigations on the purpose of performance
appraisals using large scale data and sophisticated data analysis
techniques. Building an empirical data base on appraisal
practices is a critical first step towards the development of
effective IHRM practices. In addition, the confirmatory
analysis reveals that at least three of the multi-item scales may
be useful in future cross-cultural research, and that the basic
factor structure is appropriate in each of the different countries.

Joreskog, K.G. & Sorbom, D. (1990). SPSS LISREL VII and
PRELIS user's guide and reference. Chicago, IL:
SPSS Inc.
McEvoy, G.M. & Cascio, W.R. (1990). The U.S. and Taiwan:
Two different cultures look at performance appraisal.
In G.R. Ferris & K.M. Rowland (Eds.) Research in
Personnel and Human Resources Management (pp.
201-219), Supplement 2. Greenwich, CN: JAI Press.

Second, our study produced a number of counter-intuitive findings,
such as a lack of emphasis on development in appraisal in the
three Asian countries and the positive relationship of pay to
appraisal effectiveness in Japan and Taiwan. Due to the
frequent lack of systematic research, MNCs which rely on local
managers in their overseas business units do not know if these
managers are truly adopting best practices in the local context.
These surprising findings again indicate the need for more
empirical cross-cultural research.

Meyer, H. (1991). A solution to the performance appraisal
enigma. Academy of Management Executive. 5, 68-76.
Milliman, J.F., Nathan, B., & Mohrman, A.M. (1991).
Conflicting appraisal purposes of managers and
subordinates and their effect on performance and
satisfaction. Paper presented at the Academy of
Management Meeting. Miami, FL., August.

The nature of these findings suggests that replication is needed
with a new set of firms. Such future studies should consider
the use of different appraisal purposes, quantifiable measures of
performance, and organizational, industry, and cross-cultural
contextual variables. Such research is essential for
organizations to develop effective performance appraisal
practices in managing both a global labor force and a
multi-cultural workforce in a domestic setting.

Milliman, J.F., Kim, Y.M. & Von Glinow, M.A. (1993).
Hierarchical advancement in Korean chaebols: A
model and research agenda. Human Resource
Management Review. 3, 293-320.

REFERENCES

Milliman, J.F., Nason, S., Von Glinow, M.A., Huo, P., Nam,
K. & Lowe, K. (1994). Benchmarking best strategic
pay practices: An exploratory study of Japan, South
Korea, Taiwan, and the U.S. Paper presented at the
International Personnel and Human Resources
Management Conference. Gold Coast, Australia, July.

Bollen, K. (1990). Overall fit in covariance structure models:
Two types of sample size effects. Psychological
Bulletin. 107, 256-259.
Butler, J., Ferris, G., & Napier, N. (1991). Strategy and human
resources management. Cincinnati, OH:
South-Western Publishing Co.

Porter, M.E. (1990). The competitive advantage of nations.
N.Y.: The Free Press.

Cascio, W. & Bailey, E. (1995). International human resource
management: The state of research and practice. In
O. Shenkar (Ed.) Global Perspective of Human
Resource Management (pp. 16-36), Prentice Hall:
Englewood Cliffs, N.J.

Pucik, V. & Hatvany, N. (1983). Management practices in
Japan and their impact on business strategy.
Advances in Strategic Management. 1, 103-131.
Von Glinow, M.A. (1993). Diagnosing "best practice" in
human resource management practices. In B. Shaw,
Kirkbridge, G. Ferris, & K. Rowland (Eds.) Research
in Personnel and Human Resource Management,
Supplement 3. Greenwich, CN: JAI Press.

Cleveland, J., Murphy, K., & Williams, R. (1989). Multiple
uses of performance appraisal: Prevalence and
correlates. Journal of Applied Psychology. 74, #1,
130-135.

Wakabayashi, M. & Graen, F. (1989). Human resource
development of Japanese managers: Leadership and
career investment. In A. Nedd, G.R. Ferris, and K.M.
Rowland (Eds.) Research in Personnel and Human
Resources Management (pp. 235-255), Supplement 1.
Greenwich, CN: JAI Press.

Gomez-Mejia, L., Balkin, D., & Cardy, R. (1995). Managing
Human Resources. Prentice Hall: Englewood Cliffs,
N.J.
Gudykunst, W., Yang, S., & Nishida, T. (1987). Cultural
differences in self-consciousness and self-monitoring.
Communications Research. 14, 7-34.

Wheaton, B., Muthen, B., Alwin, D., & Summers, G. (1977).
Assessing reliability and stability in panel models. In
D. Heise (Ed.), Sociological Methodology. San
Francisco: Jossey-Bass.

Hayduk, L. (1987). Structural equation modeling with
LISREL: Essentials and applications. Baltimore: The
Johns Hopkins University Press.

FIGURE 1: PATH MODEL RESULTS
(Panels: South Korea, Japan, Taiwan, U.S.)

TABLE 1: GLOBAL LISREL CONFIRMATORY FACTOR ANALYSIS RESULTS

                                    Japan      Korea      Taiwan     U.S.
Chi-Square / df                     44/118     44/102     44/73      43/94
Goodness of Fit Index               .92        .94        .95        .91
Adjusted Goodness of Fit Index      .86        .90        .91        .83
Root Mean Square Residual           .057       .042       .029       .058

TABLE 2: GLOBAL LISREL PATH MODEL RESULTS

                                    Japan      Korea       Taiwan      U.S.
Total Coefficient of Determination  .371       .414        .576        .443
Chi-Square / df (Sig)               7/8 (.52)  19/8 (.01)  11/8 (.21)  19/8 (.02)
Goodness of Fit Index               .993       .981        .99         .97
Adjusted Goodness of Fit Index      .967       .914        .95         .86
Root Mean Square Residual           .022       .038        .018        .041