
Journal of Archaeological Science: Reports 4 (2015) 310–319

journal homepage: http://ees.elsevier.com/jasrep

Color in historical ceramic typologies: A test case in statistical analysis of replicable measurements
John M. Chenoweth a,⁎, Alan Farahani b
a University of Michigan-Dearborn, Dept. of Behavioral Sciences, 4012 CASL, 4901 Evergreen Rd., Dearborn, MI 48170, USA
b University of California-Los Angeles, Cotsen Institute of Archaeology, 308 Charles E Young Drive W, Los Angeles, CA 90024, USA

Article info

Article history:
Received 6 October 2014
Received in revised form 21 September 2015
Accepted 23 September 2015
Available online 2 October 2015

Keywords:
Ceramics
Color
Historical archeology
Reflectance spectrophotometry
Linear Discriminant Analysis (LDA)
Refined earthenwares
BLUE mean ceramic dating (BLUE MCD)

Abstract

Three types of white-bodied, non-vitreous earthenwares distinguished primarily (though not exclusively) by color and commonly known as creamware, pearlware, and whiteware are some of the most frequently encountered artifacts on eighteenth and nineteenth century archeological sites. However, problems exist both in the definition of these types and the interpretation of their meanings. This project has applied reflectance spectrophotometry to collect replicable, highly-precise color data on a large sample of this material in order to address the nature of the color differentiation and the reality of the types themselves. Linear Discriminant Analysis (LDA) was used to assess the degree to which the visual intuitions of the archeologists making ware identifications could be predicted by the repeatable spectrophotometer values. We do not suggest a method of proving identifications “correct” or “incorrect” but analyze the nature of this attribute of the typology itself. This analysis quantifies the level of uncertainty inherent in the types as they are currently used, addresses a longstanding question about illumination during identification, and discusses one possible concrete implication of these results. This article also discusses color measurement in general, its use in archeology, and evaluations of the Munsell system.

© 2015 Elsevier Ltd. All rights reserved.

⁎ Corresponding author.
E-mail addresses: jmchenow@umich.edu (J.M. Chenoweth), alanfarahani@ucla.edu (A. Farahani).

http://dx.doi.org/10.1016/j.jasrep.2015.09.015
2352-409X/© 2015 Elsevier Ltd. All rights reserved.

1. Introduction

This project applies the replicable measurement of color to the evaluation of three of the most well-known ceramic types in historical archeology: the non-vitreous, white-bodied earthenwares distinguished primarily by color and commonly known as creamware, pearlware, and whiteware. Almost ubiquitous on sites connected to worldwide trade routes from the mid-eighteenth to the mid-nineteenth centuries, these three wares are some of the most useful, most discussed, and possibly some of the most controversial in archeological analysis (Majewski and O'Brien, 1987; Miller, 1987; Miller, 1993; Miller and Hunter, 2001).

This project has sought to clarify some of these controversies through the application of replicable, highly precise color observations taken using a reflectance spectrophotometer. Multivariate statistical analyses were used to evaluate the resulting data set. Rather than suggesting that archeologists use this equipment to classify individual sherds or correct visual intuitions, the goal of this work has been to assess the nature of color as an attribute of these types. In effect, we assess the extent to which replicable, independent measurements conform to a priori groups made by researchers. This question has real implications for some of the most commonly applied techniques of historical archeology, such as mean ceramic dating.

In broader terms, the use of instruments designed to make precise, replicable observations of many kinds has been growing in archeology thanks to new instrumentation and the decreasing cost of technology. Better observations in archeological data can enable more effective communication and comparisons, though always ultimately to address broad anthropological questions. That is, while archeology is and always will be an interpretive endeavor, we can ensure that our interpretations are based on understandings, definitions, and observations on which we can agree or upon which our best data converge.

2. Creamware, pearlware, and whiteware: problems and questions

Ivor Noël Hume, with whom most discussions of historical artifacts begin, suggests that the “most important development of the eighteenth century” in the context of ceramics was the appearance of a “thin, hard-firing, pale-yellow or cream-colored earthenware” that is “now universally known as creamware” about 1762 (Noël Hume, 1969: 124). Creamware was largely supplanted by pearlware in the 1780s and 1790s, and their bodies are nearly identical but the latter has a small amount of cobalt in the glaze, giving a whiter, bluer, or greener appearance (Barker, 1991: 167; Majewski and O'Brien, 1987: 118). Large amounts of pearlwares were produced in England and exported from about 1775 through about 1830, when they were overtaken in popularity by whitewares, which have a more truly white appearance. This was a gradual process, involving the slow whitening of pearlware, as whiteware too sought to imitate porcelain which was growing whiter itself (Miller, 1993).

Because the history of the development of these wares is documented relatively precisely and they represent horizons, largely supplanting one another sequentially (although their production did overlap), these wares have been of substantial interest for dating, especially in techniques such as mean ceramic dating (South, 1977). Though made in England, these wares are connected to the rise of industrialism and world trade, and so can also be used to study these important developments, being found worldwide.

2.1. Differentiation of the types

Despite all the interest in these types, the distinctions between them have been difficult to define. Noël Hume identifies creamware's glaze as being “yellow or green in the crevices” where the glaze pools (Noël Hume, 1969: 130) and Price terms it “deep green or yellow-green” (Price, 1979: 10). Pearlware's glaze is “blue in crevices of footrings and around handles” such that it “can readily be distinguished from late creamware” (Noël Hume, 1969: 130). Lofstrum, Tordorf and George, however, suggest that the glaze of pearlware is “uniformly greenish” where evenly-spread (Lofstrum et al., 1982: 6) and Price calls it “an overall blue or blue-green cast” (Price, 1979: 14). Whitewares are characterized by the body color, since they may also have slight bluing in crevices, but appear to be completely colorless in body (Miller, 1993; Price, 1979: 13). However, these relatively simple distinctions become difficult to implement in practice. Miller (1980: 2) points out that personal opinions about the extent of the “bluing,” for instance, coupled with the inherent difficulty of describing and communicating color, lead different analysts to make different determinations.

Other attributes of ceramics are, of course, used in their classification, including decoration and form. Price underlines the importance of considering the color of underglaze decoration as well when defining typologies (Price, 1979: 15). Miller distinguishes between “pearlware” and a variant known as “China Glaze” in that while both have a blue tint to the glaze, the latter has decoration and forms in imitation of Chinese patterns (Miller, 1993). Ceramicist David Barker describes how some elements of decoration could be used to identify work of particular potters, although he also suggests problems with this procedure due to molds and patterns being shared among manufacturers (Barker, 1991: 169).

In this study we examine only the question of body or glaze color but do not mean to argue that it is the only factor used in classification. We focus on this attribute because in many archeological applications body color is used alone as it is often not possible to examine other attributes, since not every sherd will be decorated or identifiable as to form. In some distinctions, glaze color is the only distinguishing attribute; for instance, David Barker writes that “apart from the blue tint of the glaze, there is virtually nothing to distinguish [pearlware] from contemporary creamwares” (Barker, 1991: 167). For these reasons, we feel that investigating the attribute of color more or less in isolation is informative, even if it is not the only attribute relevant in practice.

2.2. Ware-based types and etic classifications

Miller (1980) and Majewski and O'Brien (1987) critique ware-based schemes such as the division of creamware, pearlware, and whiteware, because they omit the importance of decoration and because they are “unwieldy,” since the differences they describe are difficult to define consistently. The latter note that “critical terms either remain undefined or are applied inconsistently or incorrectly, thus making it difficult to use the data from such a study for comparative purposes” (Majewski and O'Brien, 1987: 105). It seems likely that color and its communication are among the critical terms Majewski and O'Brien consider here. The problematic result is a series of typologies which are mutually unintelligible, and a host of publications which use the same terms in different ways.

This project addresses the issue of consistency through instrumentation (discussed below), but the debate over these types is deeper than simply the ability to communicate color. Miller and Hunter (2001) have pointed out that the terms used by archeologists do not correspond to any group that would have been meaningful to contemporary potters, suggesting that merchants and customers distinguished between ceramic types more by decoration than what archeologists term “ware.” Miller argues that “there is no way of knowing if the archeological definition of pearlware is the same as that of 19th century potters and merchants” (Miller, 1980: 2). There are, in effect, two parallel typologies (Miller, 1993).

This discussion returns to the more than half-century old debate between James A. Ford and Albert Spaulding about archeological typologies in general. Spaulding (1953) saw types as being the result of decisions made by people in the past: their choice of temper in ceramic production was one “mode” while their choice of rim type was another. Collectively (normatively) a particular group would agree on a particular series of choices among modes, and therefore create similar pottery. That is, types were the outgrowth of culture, and would themselves have had meaning to past peoples. Ford (1954) held the opposite view: that types were imposed in the present by the archeologist, and that past peoples would not have necessarily recognized our types as important. While the debate did not remain a focus during the 1960s rise of processual theory, it is nonetheless possible to hold that both kinds of types exist: emic ones, having had meaning to past peoples, and etic ones, imposed from the present by the analyst for particular ends.

For certain questions, such as those of economic scaling, it is entirely appropriate to focus on the emic typologies of potters and customers, whose ideas of different groups of ceramics and their prices directly relate to buying habits. But in the analysis of archeologically-recovered materials, there is sometimes a mismatch between what is needed to assign a piece to an emic type and what information is available. For instance, Miller discusses the difference between what past potters called “China Glaze” and “Pearl White” as being one of decoration and form (Miller, 1993), but for undecorated fragments there would be no way to make a determination. Emic understandings can even run contrary to traditional archeological (etic) classifications: “We need to keep the intent of the potter in mind. If it was to produce a whiteware, then the vessel should not be classified as pearlware because of a small amount of cobalt used to achieve a white appearance” (Miller, 1993). We agree that the intent of the potter is an important area of inquiry, but it will not always be clear from archeologically-recovered materials. Certainly it would be pessimistic to assume that because archeologists are not always able to access such emic typologies, we therefore can say nothing about what is found based on its material properties.

Though etic types based on material observations, such as color, cannot always be connected to contemporary (emic) terms, they still speak to past practices and are worthy of consideration. The fact that contemporaries may not have been aware of some of these material distinctions does not make them meaningless. Trace element analysis is a thoroughly modern, etic analysis but it provides substantial new information relevant to reconstructing past lives and is valid even if those making artifacts were unaware of these elements' existence. Moreover, any disjunctions between etic and emic offer substantial ground for interpretative understanding, for if we recognize a physical distinction that was ignored in the past, or determine that a difference considered important emically was in fact not clearly distinguished, then we are understanding something about culture. But in order to recognize these distinctions, we need thorough studies of both the emic and etic. The present study should be classed among the latter, but should not be interpreted as a general argument against using the former.

3. The measurement of color

While color is one of the most obvious attributes of artifacts, and indeed of any object, it is also one of the more difficult to measure and communicate. This section provides only the barest introduction to this field; see Berger-Schunn (1994), Billmeyer and Saltzman (1981), Hunt (1991), and Hunter and Harold (1987) for more. While color is usually understood as being a result of the wavelengths of the light, the perception of color is more complex, involving a three-part relation of object, light, and observer.

Most visible light is composed of many different wavelengths in different proportions. Spectrophotometry is the analysis of light, in this case visible reflected light, through the creation of spectra describing the reflectance percentage of light at different wavelengths. The use of an instrument to make these observations systematizes two of the three parts of color perception, the light source (the “illuminant”) and observation, making measurements dependent only on the object and thus replicable.

Spectra, while perhaps the most accurate way to record the color of an object, are difficult to work with, and hard to relate to human perception. These can be converted, with some caveats, to “tristimulus” values: three-part measurements which approximate the amounts of primary colors which can be mixed to achieve a certain color (Billmeyer and Saltzman, 1981: 38). Tristimulus values are similar to the Munsell hue, chroma, and value measurements that are more familiar to archeologists but easier to use for statistical purposes. One of the most common systems for representing color in tristimulus values is known as the “CIE L*a*b*” system, which was recommended by the Commission Internationale de l'Eclairage (International Commission on Illumination) in 1976; it is generally abbreviated CIELAB. In this system, the “lightness” of a color is given by “L*” such that a perfect white has the value of 100 and a perfect black zero (Hunt, 1991: 72–3); “a*” represents the position of a color between red and green, while “b*” indicates the position between blue and yellow (Hunter and Harold, 1987: 199–200). This system is used here, since calculating differences between two colors in the CIELAB colorspace is the same as doing three-dimensional geometry.

4. Color measurement in archeology

4.1. Munsell

Munsell values need no introduction to most archeologists as they are widely in use to record soil colors in the field. The Munsell system describes color based on hue (“that quality of color which we describe by the words red, yellow, green, blue, etc.”), value (lightness or darkness), and chroma (difference from a gray of the same value) (Billmeyer and Saltzman, 1981: 28). Some studies have extended Munsell to other areas of archeological analysis, such as preserved fabric (Paul, 1990), and several have also applied it to ceramics. Most notably Prudence Rice's seminal work, Pottery Analysis, uses Munsell to characterize ceramic color (Rice, 1987: 339–43) and Anna Shepard suggested it for producing “reliable comparisons” in her classic Ceramics for the Archaeologist (Shepard, 1956: 107, 113). Nance reports using Munsell to classify pre-Columbian ceramics by paste color, and applies this to his analysis of shifts in ceramic color over time and inter-level mixing (Nance, 1976), while Hammond (1964) used Munsell to assess Nabataean pottery and outlines the Munsell values for particular types. In historical archeology, in addition to Lofstrum et al. (1982), discussed above, which cites Munsell colors for both bodies and decorations, Sussman used Munsell values in describing the blue decoration (not the bluish body) of pearlware, suggesting a change from a dark blue in the eighteenth century to a more “purple-toned” blue by the 1820s (Sussman, 1977: 108).

While it may be true that the “Munsell system is the most widely used approach by archeologists to analyze color in artifacts” (Giardino et al., 1998: 478), many authors have also pointed to substantial problems with the system and its use. Considering the importance of color to ceramic classification in general and the necessity that such classifications be repeatable, Giardino and colleagues suggest that differences in skill, lighting, and viewing geometry often result in inconsistent color measurements (Giardino et al., 1998: 477–8). Bishop and colleagues also point to variability in observation conditions as well as the problem of precision (capturing the many shades which fall between Munsell chips) and reproducibility (Bishop et al., 1988: 328). Hammond asserted that the use of Munsell on ceramics “is decidedly an inferior method” due to variability in human perception (Hammond, 1964: 259). Hammond actually suggests the use of a spectrophotometer but laments that one was not available for his study (1964: 265).

The most often cited problem with Munsell is inconsistency. Frankel had different people identify the same sherds repeatedly in different conditions, showing that not only do color determinations differ between people, but also that determinations made between one day and the next by the same person may vary, in the end suggesting that Munsell notations provide a “false sense of accuracy” (Frankel, 1980: 36). The reliability of human-generated color values was tested more recently by Giardino and colleagues, who compared human-generated color values of 52 Mississippian-period ceramics to those produced by a spectroradiometer (similar to a spectrophotometer) and a low-cost color sensor. They concluded that “there can be considerable variation in the perception of an object's color among observers,” especially in the determination of hue (Giardino et al., 1998: 480–1), possibly accounting for the difference of opinion noted above in section 2.1 as to whether pearlware is “bluish” or “greenish.”

It should be noted that none of the scholars cited above who have used Munsell are unaware of its problems, and most make allowances for these issues in the particular contexts of their own work. This paper is not intended as a wholesale critique of these studies, and several include suggestions to correct or account for some of these issues. Gerharz and colleagues note that the surface textures and the homogeneity of samples complicate color determination judgments and suggest the use of a “field of color,” comprised of several Munsell color chips, instead of the choice of a single Munsell color as a partial corrective (Gerharz et al., 1988: 91). Stanco and colleagues also note the problem of surface textures along with that of patina and body shape, all of which alter color perception of ceramics (Stanco et al., 2011: 338–9). They propose a system for determining Munsell color through digital images which they suggest will counter some of this inconsistency.

4.2. Mechanical measurement: spectrophotometers and colorimeters

Surprisingly, considering their relative accessibility and precision, spectrophotometers and colorimeters (a similar device producing only tristimulus values, often at a lower cost) have been little used in archeology. Thompson and Jakes (2002) used a spectrophotometer as part of a study of textile dyeing, exploring colorants suggested by archeological and ethnohistoric evidence, and Strudwick (1991) used a colorimeter to record colors as part of the documentation of Egyptian tomb paintings in the Valley of the Nobles.

Two earlier studies have employed color measurement equipment to analyze archeological ceramics, although both focus only on lower-fired, prehistoric wares. In an exploratory study, Giardino and colleagues analyzed 52 sherds of Mississippian ceramics and compared the results to human-generated color identifications. They also suggested that there are “exciting prospects for expanding this type of analysis into the nonvisible (i.e., infrared, thermal, and ultraviolet) portions of the electromagnetic spectrum” (Giardino et al., 1998: 478). Bishop and colleagues used a Minolta chromameter to assess firing variations through color on Hopi yellow-firing ceramics as part of an effort to create more comprehensive analytical groups (Bishop et al., 1988: 328). Because the examined samples had already been determined to have similar iron content through compositional analysis, the variations in color were thought to result from firing, and the recorded colors were used to argue that ancient potters had a greater degree of control over ceramic appearance through firing than previously thought.

Problems noted with this equipment generally center around its cost. This remains an issue, with the best equipment costing several thousand dollars. However, Giardino and colleagues' study just mentioned also evaluated an inexpensive, easily available color sensor and found “good agreement between the reflectance or color measured by the two instruments over the entire visible spectrum” including in the analysis of ceramics (Giardino et al., 1998: 480). This opens the possibility of using less-expensive equipment for measurements which are still far more replicable than the human eye. In addition, as discussed below, the present study does not suggest that it is possible or even desirable for every archeologically-recovered piece to be measured with this equipment. At the moment, the potential for this type of analysis lies in targeted studies aimed at clarifying specific questions and describing the nature of larger classes of material.

5. Materials and methods

To this end, this project utilized a Konica Minolta CM-700d spectrophotometer (loaned for the purpose by Konica Minolta Sensing US) to take color readings of several hundred archeologically-recovered ceramic sherds. Ceramics for the project were from the collections of Independence National Historical Park in Philadelphia and were recovered from late 18th to mid-19th century archeological contexts as part of the National Constitution Center excavations in 2000–2003. All measurements were taken on glazed portions of undecorated sherds or areas of decorated pieces not immediately adjacent to decorations. Although areas where glaze has pooled are often used in identification, these are not present on most sherds and the goal of this study was to evaluate the color of pieces overall, so areas of pooled glaze such as footrings and drips were avoided as well. Sherds severely discolored or weathered from their time in the ground or entirely covered in decoration were not analyzed.

Measurements were taken originally from 699 sherds, but as the shape of a ceramic surface may affect the readings collected by refracting reflected light (although our results suggest that the difference is negligible), the analyses described below use only readings from the slightly-convex sides of sherds, probably the exteriors of vessels. The final data set includes 477 different pieces of ceramic, 143 of which had been previously cataloged by project archeologists as whiteware, 137 as pearlware, and 197 as creamware. Those doing the original cataloging were either National Park Service staff or material culture specialists employed by URS Corporation or Kise, Straw, and Kolodner, contract firms hired by the Park Service. All contract firm personnel were vetted by NPS employees for relevant expertise and their catalog entries were supervised and checked by the Park Service. All Park Service employees relevant to this work held at least an MA degree with a specialization in historical archeology.

Although it has long been suggested that color for these vessels may vary across a single piece, within the confines of a two- or three-square-inch sherd, repeated measurements produced remarkably consistent readings. For instance, measuring different parts of the same side of one sherd of creamware three times produced readings with a variance of 0.28, and five measurements of another had a variance of 0.43, far lower than the variance for the types overall. Five repeated measurements on the same area of one sherd produced a variance of just 0.009. Issues with illuminants will be discussed below, but all results considered here (other than where specifically indicated) are calculated with CIE Standard Illuminant D65, which approximates natural daylight.

Spectra were converted into tristimulus values using Konica Minolta's SpectraMagic NX software, so that each reading consisted of three numbers which could be analyzed using three-dimensional geometry in the CIELAB colorspace. The readings for each ware can be given a centroid, a center-point to the three-dimensional cloud they compose, and a variance, the average Euclidean distance from that point; these are given in Table 1. All other statistical calculations were made in R 3.1.1 using the “lda” function found in the package “MASS,” and all visualizations were made using the package “ggplot2.” The R code used to iteratively repeat Linear Discriminant Analysis as well as many of the specifications and outputs of the model are provided as supplemental material.

Table 1
Centroids and variances for each analyzed type.

Ware         n     Centroid L*   Centroid a*   Centroid b*   Variance
Creamware    197   89.44         −0.84         13.31         2.49
Pearlware    137   86.12         −1.40         9.48          3.43
Whiteware    143   90.03         −0.31         8.99          2.13
All Sherds   477   88.66         −0.84         10.91         3.67

5.1. Linear discriminant analyses

To understand the nature of these types, data for these 477 sherds were analyzed using multiple iterations of Linear Discriminant Analysis (LDA), with the groups generated by prior archeologists as the categorical outcome variable (“ware”), and the 477 L*a*b* values as the predictors. LDA is a multivariate statistical technique used to maximize the separation of two or more a priori (or “given”) categorical groups using continuous multivariate predictors by maximizing between-group variance relative to within-group variance (Izenman, 2008: 237–238). The linear combinations of the predictor variables are called discriminant functions, and the relative contributions of these standardized variables, or coefficients, are used to calculate discriminant functions (Kovarovic et al., 2011: 3008). These discriminant functions can then be used to predict group membership based on the existing data, along with the posterior probabilities of group membership, and these results can be compared to the original classification in a classificatory table (or “confusion matrix”). With this method it is possible to assess whether the variables of three-dimensional color readings can be used to achieve separation between a priori groups, that is, groups whose membership has already been independently assigned, in this case the groups of creamware, pearlware, and whiteware assigned by archeologists.

A low proportion of overlap between assigned versus predicted group membership might suggest that there are no distinguishable groups for those variables or that the variables characterize other groups, if any. In the present study, a low proportion of overlap would suggest that the typology under consideration might not have independent empirical foundation using color as the main attribute, and that the within and between-group variation is too high for the ceramic groupings to be of analytic use: in other words, the mechanically-measured color of these sherds would have little to do with their classification based on archeologists' visual intuitions. A high proportion of overlap would suggest that the chosen variables are the most parsimonious to separate the a priori groupings, and thus that the data represent highly identifiable (i.e. separable) groups that are a function of their visible color. This does not imply, however, that this method can assess whether those making the identifications were “correct.” Instead, it provides a way to determine whether certain variables can provide a consistent classification for these types, i.e. the extent to which the observations of color (L*a*b* values) can be classified into these ware groups.

LDA has wide application as a data classification tool both within archeology (Baxter, 1994; Kovarovic et al., 2011) and in other disciplines (Hammer and Harper, 2006: 96–99; Legendre and Legendre, 2012: 616–631), and its use in archeology has risen considerably since 2002 (Kovarovic et al., 2011: 3007). LDA makes several assumptions concerning the underlying distributions of the variables, including multivariate normality, homogeneous within-group covariance matrices, independent and identically distributed variables, and little to no collinearity between variables. LDA also generally performs better when the number of observations exceeds the number of groups and variables.

5.2. Model training

Table 3
Confusion matrix using illuminant A. Rows give the ware assigned by the archeologist; columns give the ware predicted by LDA.

Assigned by Archeologist   Predicted by LDA
                           Cream   Pearl   White   % Mismatch
Cream                      177     1       19      10.2%
Pearl                      27      95      15      30.7%
White                      9       2       132     7.7%

Overall classification error 15.3%.
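
The per-ware mismatch percentages and the overall classification error reported in Table 3 follow directly from the counts in the confusion matrix: each ware's mismatch is the share of its sherds that LDA placed in a different ware, and the overall error is the share of all 477 sherds so placed. The following is only a minimal sketch of that arithmetic, written in Python rather than the R used for the analyses; the names `WARES`, `CONFUSION`, and `mismatch_rates` are illustrative assumptions and are not taken from the article or its supplemental code.

```python
# Counts transcribed from Table 3 (illuminant A).
# Each row is the ware assigned by the archeologist; the entries are the
# numbers of those sherds predicted by LDA as Cream, Pearl, White, in that order.
WARES = ["Cream", "Pearl", "White"]
CONFUSION = {
    "Cream": [177, 1, 19],
    "Pearl": [27, 95, 15],
    "White": [9, 2, 132],
}


def mismatch_rates(matrix, labels):
    """Return (per-ware mismatch fractions, overall mismatch fraction)."""
    per_ware = {}
    total = 0
    total_wrong = 0
    for i, ware in enumerate(labels):
        row = matrix[ware]
        n = sum(row)            # sherds assigned to this ware by the archeologist
        wrong = n - row[i]      # off-diagonal entries are disagreements with LDA
        per_ware[ware] = wrong / n
        total += n
        total_wrong += wrong
    return per_ware, total_wrong / total


per_ware, overall = mismatch_rates(CONFUSION, WARES)
for ware in WARES:
    print(f"{ware}: {per_ware[ware]:.1%}")
print(f"Overall: {overall:.1%}")
```

Under these assumptions the row totals (197, 137, and 143) agree with the sample sizes in Table 1, and the computed rates reproduce the published figures of 10.2%, 30.7%, 7.7%, and 15.3%.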

One of the more important recent critiques of the use of LDA in ar-
cheology is by Kovarovic et al. (2011), who, among other issues, point
to the necessity of cross-validation within a data set, in order to prevent 6.1. The reliability of types
model over-fitting (2011:3009). Model over-fitting occurs when a par-
ticular model cannot generalize to novel or unseen data because the Fig. 1 is a plot of the first two discriminant functions and shows that
original model fits the specific characteristics of the training data set L*a*b* values can be successfully used to separate each of the three ware
too closely. They note that there are two popular methods for groups and predict a priori group membership. In both (Fig. 1A) and
preventing model over-fit. The first is the “hold-one-out” test in which (Fig. 1B), the letters “C,” “P,” and “W” represent Creamware, Pearlware,
a variable portion of the data set is reserved for training and the remain- and Whiteware. In (Fig. 1A) the letters represent the predictions of the
der is used for prediction. “Training” is a process of calculating the dis- LDA model, while in (Fig. 1B) the letters represent the sherds as origi-
criminant functions based on a sample of the data and applying the nally classed by archeologists. The biplot of the two discriminant func-
results to the independent, withheld data to test the reliability of the tions illustrates two separate conclusions: first, that there is
functions. The second is the “leave-one-out” or “jackknife” approach discrimination of groups by instrumentally-measured color. Second,
where one observation is left out, the discriminant functions are then that there is overlap between these groups, especially at the edges of
re-calculated, and the process is repeated for every observation. The the group centroid means (Fig. 3). The discriminant functions derived
present study employs both methods in order to address the issue of from the “hold-one-out” training method closest to the permutation av-
model over-fitting. erage (see supplementary material) classified 83% of the wares to the
The “confusion matrices” (the comparison of the archeologist-classi- ware specified by the archeologist using the L*a*b* values as predictor.
fied to LDA-predicted wares) presented below (Tables 2 and 3) are That is to say, archeologists' visual intuitions were coincident with the
based on the jackknife method. In this case, the jackknife method is pre- groupings based on the spectrophotometer readings about 83% of the
ferred as it is “a more conservative approach with lower but more real- time. If these archeologists' classifications had not been consistent
istic success rates” (Kovarovic et al., 2011: 3009). The plots of the first with L*a*b* values, a much smaller percentage of wares would have
two discriminants presented below (Figs. 1–4) are based on the “hold- been predicted. In Fig. 2, the letters C, P, and W represent sherds
one-out” training method of LDA. In this case, 10% of sherds (or about where the LDA and archeologists agreed on a classification, while the re-
47 sherds) were withheld iteratively without replacement from a ran- maining numbers in bold show sherds predicted by LDA to be of one
dom sample of observations and used as training data. The 10% thresh- ware based on mechanical measurement but assigned by archeologists
old was chosen as it is conservative and attempts to predict the majority to another group.
of the data set classifications from a small subset of it. The remaining While the group centroids are separable, the multivariate 95% confi-
data (90%) were then used to test the accuracy of the LDA predictions dence ellipses, shown in Fig. 3, illustrate considerable overlap between
versus archeologists' predictions, resulting in 10 independent predic- groups. As a whole, ware groups are difficult to separate due to a large
tions on randomly drawn subsets of the same data. To ensure the reli- degree of internal variance, and as can be seen in Fig. 4, which shows nu-
ability of these model accuracy assessments, the random observations merous discrepancies between archeologists' classification and LDA
chosen for training were permuted 1000 times for each of the 10- prediction at the outer edges of overlap between the groups. The extent
prediction averages (see supplemental data). Both the jackknife method to which these outliers are classification cataloging errors versus being
and the “hold-one-out” method produce very similar results, reinforc- the result of actual variation present in ware groups may be one contrib-
ing the reliability of accuracy estimates between archeologists' ware uting factor here, and is indicated in cases where a particular sherd lies
classification and the classification of the model. well within the 95% confidence ellipse of one type only but has been
classified by archeologists to a different type. In this study, a visual in-
spection of these figures suggests that perhaps a dozen sherds might
6. Results be considered cataloging errors of the type which could be eliminated
by spectrophotometric classification, if such analysis was practical.
The collected data were used to address two main questions: the im- However what would be called cataloging errors are not the cause of
portance of the type of illumination used when making the distinctions, most of the disagreements between LDA and archeologists' classifica-
a point made central many discussions of of Munsell, and the nature of tions shown here. It is clear from these data that some sherds cannot
the groups themselves (the extent to which L*a*b* measurements of be classified as falling clearly into one or another of these groups. That
color correspond to the groups of creamware, pearlware, and is, the colors of some of these sherds fall between groups, blurring the
whiteware ceramics). groups at the edges (Fig. 4). It is important to note that this is not neces-
sarily a “mistake” by the original catalogers, and that these pieces have
not necessarily been “mistyped.” Rather, some sherds may not be clearly
Table 2
Confusion matrix using illuminant D65 based on the jackknife method. assignable to one group or another on the basis of color. While other at-
tributes can sometimes be used to make determinations, as discussed
Predicted by LDA
above, the identification of some undecorated sherds without a priori
Cream Pearl White % Mismatch information about dating or manufacturer and which fall between
Assigned by Archeologist Cream 177 1 19 10.2% these LDA groupings is fundamentally ambiguous. The ambiguity of
Pearl 26 96 15 30.1% these types has long been recognized by ceramicists and has led to the
White 7 3 133 6.9% calls to abandon type-based assessment of these materials discussed
Overall classification error 15.3%. above. In counterpoint, our results also suggest that types based on

Fig. 1. Plot of first two discriminant functions of the “hold-one-out” training model closest to the permutation average, where the symbols in (A) represent the ware classification predicted by the LDA model, and the symbols in (B) represent archeologists' ware classification. The x and y axes represent the first two linear discriminants.
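
The “leave-one-out” (jackknife) procedure used alongside the hold-out model can be sketched in a few lines. To keep the example dependency-free, a nearest-centroid rule stands in for LDA; this is a deliberate simplification and not the authors' model. The readings are invented for illustration only, and all function names are hypothetical.

```python
import math

def centroid(points):
    """Component-wise mean of a list of (L*, a*, b*) tuples."""
    n = len(points)
    return tuple(sum(p[k] for p in points) / n for k in range(3))

def nearest_centroid(train, x):
    """Predict the label whose training centroid lies closest to x."""
    groups = {}
    for features, label in train:
        groups.setdefault(label, []).append(features)
    return min(groups, key=lambda lab: math.dist(centroid(groups[lab]), x))

def jackknife_accuracy(data):
    """Leave each observation out once, refit on the rest, score the prediction."""
    hits = 0
    for i, (features, label) in enumerate(data):
        train = data[:i] + data[i + 1:]
        hits += nearest_centroid(train, features) == label
    return hits / len(data)

# Hypothetical readings, invented for illustration (not the study's data).
TOY = [
    ((89.5, -0.8, 13.4), "C"), ((89.1, -0.9, 13.0), "C"), ((89.8, -0.7, 13.6), "C"),
    ((86.0, -1.4, 9.5), "P"), ((86.4, -1.3, 9.3), "P"), ((85.9, -1.5, 9.8), "P"),
    ((90.1, -0.3, 9.0), "W"), ((89.9, -0.4, 8.8), "W"), ((90.3, -0.2, 9.1), "W"),
]
ACCURACY = jackknife_accuracy(TOY)
print(f"jackknife agreement: {ACCURACY:.0%}")
```

Because every observation is predicted by a model that never saw it, the agreement rate is not inflated by fitting to the very sherd being classified, which is the over-fitting concern the cross-validation is meant to address.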

Fig. 2. Distribution of discrepancies between LDA and analysts' categorizations, where C(reamware), P(earlware), and W(hiteware) represent the concordance between LDA and a priori ware assignation, and the remaining numbers represent, via the key, cases where the archeologist's classification differed from the predicted (by LDA) ware.
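
The per-ware mismatch percentages and the overall classification error reported in the confusion matrices follow from simple row arithmetic. The sketch below uses the counts printed in Table 3 (illuminant A); the function and variable names are hypothetical.

```python
# Rows: ware assigned by archeologists; columns: ware predicted by LDA.
# Counts copied from Table 3 (illuminant A).
WARES = ["Cream", "Pearl", "White"]
TABLE_3 = [
    [177, 1, 19],
    [27, 95, 15],
    [9, 2, 132],
]

def mismatch_rates(matrix):
    """Return per-row mismatch percentages and the overall error percentage."""
    per_row = []
    agree = 0
    total = 0
    for i, row in enumerate(matrix):
        # Off-diagonal cells are sherds where LDA and archeologist disagree.
        per_row.append(100 * (sum(row) - row[i]) / sum(row))
        agree += row[i]
        total += sum(row)
    return per_row, 100 * (total - agree) / total

PER_WARE, OVERALL = mismatch_rates(TABLE_3)
for ware, pct in zip(WARES, PER_WARE):
    print(f"{ware}: {pct:.1f}% mismatch")
print(f"Overall classification error: {OVERALL:.1f}%")
```

The diagonal holds the sherds on which the two classifications agree, so the overall error is one minus the diagonal share of all 477 sherds.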

glaze tinting are not complete fantasies either; real patterning does exist when the color of these ceramics is assessed with replicable measurements.
The data collected here allow us to put a number on this ambiguity. Using the L*a*b* values as predictor, the discriminant functions of the more conservative “leave-one-out” jackknife method classified 85% of the wares, or 406 out of 477, to the ware specified by the archeologist. That is to say, archeologists' visual intuitions can be explained as coincident with more repeatable methods about 85% of the time. In theory, approximately 15% of the ceramics known as cream, pearl, and whitewares should equally fall into this “blurry” category, which probably includes a small number of potential cataloging errors as well as sherds whose color falls between two or more groups.
Interestingly, this blurriness does not apply to all types equally. As shown in Table 2, only 7% of whiteware and 10% of creamware, respectively, were classified differently by LDA and the archeological technicians, showing that archeologists' identification and L*a*b* values are tightly matched. More problematic is that 30% of pearlware was not predicted by LDA as belonging to archeologists' initial ware classification. The variability is probably based on the fact that the colored glaze that is key to defining pearlware is created by the inclusion in the glaze of cobalt, which could be added in any amount, resulting in a greater or lesser degree of “bluing.” Some different categorizations might be a result of other factors besides glaze tinting being involved in identification of types by archeologists, particularly for pearlware. For instance, Miller argues that blue-tinted glazes should be distinguished into at least

Fig. 3. Ellipses of the confidence level of each group at the 95% level based on a multivariate t-distribution — the symbols represent the predicted associations based on the LDA model.

Fig. 4. Ellipses of the confidence level of each predicted group at the 95% level based on a multivariate t-distribution — the symbols represent the discrepancies between the LDA model and
analysts' prior categorization.

three (emic, potters') categories: pearlware, “China glaze” imitating Chinese forms and decorations, and even whitewares if the tint is minimal, the piece late, and the intention of the potter is to make a whiter ware, not to imitate porcelain (Miller, 1993). However, if this is so, it calls into question the overlap of these two attributes in the typology, color and decoration; this remains to be assessed in another study.
In addition, we suggest that the influence of other attributes on archeologists' categorizations, and thus possibly on disjunctions between the L*a*b* values and archeologists' classifications, is likely to be minimal for two reasons. The first is that this study used primarily (though not exclusively) undecorated individual sherds rather than complete vessels, where glaze color was the only attribute available for classification between the three types considered here. The questions of time and potters' intent were not available to either the catalogers or the LDA. Second is that when decoration or form were considered by archeologists, China Glaze was distinguished only as a variant of pearlware by the cataloging protocol in use by NPS. Though incommensurate with emic potters' typologies, this makes sense from an archeological perspective, as undecorated sherds could be considered part of either type. Therefore, the classifications by LDA and archeologists considered here should be largely commensurate, and the results given above indicative of color variation.

6.2. Metamerism and illuminants

Color scientists often remind us that the phenomenon of color is made up of three parts: light, object, and observer. Differences in perceived color can be related to any of these. “Metamerism” occurs when changes in illuminant (light source) or observer make two objects which appeared to have the same color no longer look the same. This occurs because color spectra are more complex than either tristimulus values such as L*a*b* numbers or human perception: “even though two objects may have different spectral reflectance curves, they match if their CIE tristimulus values are the same for a given illuminant and observer” (Billmeyer and Saltzman, 1981: 53). So two objects, say a ceramic sherd and a Munsell chip, may appear to be identical under one light and quite different under another.
As a result, lighting conditions for color comparisons are given substantial discussion in color science works (Hunter and Harold, 1987) and have been noted by archeologists concerned with color analysis (Bishop et al., 1988: 328, Giardino et al., 1998: 477–8, Shepard, 1956: 111). This project was in a position to analyze whether or not the illuminant used in making observations has a significant effect on the results of those observations in the case of these particular ceramic types. The “leave-one-out” jackknife method was used for cross-validation on the same 477 measurements using a different illuminant than the D65 one used in the above discussions: CIE Illuminant “A,” which approximates incandescent artificial light. Again, the model predicted the ware to which a given piece would be classified given its L*a*b* values, this time a set generated using illuminant A, and compared these to the actual classifications made by archeologists. The results are given in Table 3, and show a net change of only three sherds of 477 (0.6%) classified differently as a result of the different illuminant. Only 6 individual sherds in total were classified differently depending on the illuminant (1.2%). This is substantially smaller than the overall classification error of 15% inherent in these materials established in the last section. This suggests that illuminant may have far less of a role in the identification of these particular types than thought. Put simply, the additional precision of uniform lighting is outweighed by the inherent uncertainty of these particular types.

7. Implications and conclusions

It is quite clear that “wares are not static entities” (Majewski and O'Brien, 1987: 106) and that the differences between creamware, pearlware, and whiteware are the “result of an evolution of one type out of another” (Miller, 1980: 2). This project has clarified the extent and nature of the variation inherent in these types as a result of this process of change, showing that fully 85% of the time sherds were repeatedly classifiable based on color. As was noted above, this project has not attempted to determine if those making the initial identifications were “correct” but rather to assess the extent to which a consistent classification can be made at all for these types: the extent to which observations of color produce identifiable groups. Our results show that there is inherent ambiguity, but we have quantified the ambiguity of these types and this allows its implications to be considered.
One example is in mean ceramic dating (MCD), one of the most common uses of the typology considered here. This is the calculation of an average production age for a ceramic assemblage based on the frequency and mean production date of each type included (South, 1977). The MCD technique is, it should be noted, a rough measurement which has been much discussed in the field (see Chenoweth, in press). Under certain circumstances and with certain assumptions, this suggests a possible mean occupation date for a site. In other cases,

Table 4
Mean ceramic date calculations (totals based on percentages do not total precisely due to rounding to whole sherds, since fractional sherds are not meaningful units).

                         Archeologist's    LDA               Confident   Questionable   Possible      Possible
                         classifications   classifications   arch. IDs   arch. IDs      extreme old   extreme new
Creamware (1762–1820)    197               210               177         20 (10%)       250           177
Pearlware (1775–1830)    137               100               95          42 (30%)       95            95
Whiteware (1820–1900)    143               167               132         11 (7%)        132           205
Total                    477               477               404         73             477           477
MCD                      1815.7            1818.1                                       1812.88       1823.46
BLUE MCD                 1807.3            1808.9                                       1805.1        1813.2
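
The arithmetic behind the MCD and BLUE MCD rows of Table 4 can be sketched as follows. The reading of the symbols is an assumption: m_i is taken as the simple midpoint of each type's date range, p_i as the type's relative frequency, and s_i as the manufacturing span in years. The function names are hypothetical.

```python
# Date ranges and archeologists' sherd counts copied from Table 4.
TYPES = {
    "Creamware": {"start": 1762, "end": 1820, "count": 197},
    "Pearlware": {"start": 1775, "end": 1830, "count": 137},
    "Whiteware": {"start": 1820, "end": 1900, "count": 143},
}

def midpoint(t):
    # Assumption: the mean date m_i is the simple midpoint of the range.
    return (t["start"] + t["end"]) / 2

def mcd(types):
    """Traditional MCD: count-weighted average of mean manufacturing dates."""
    total = sum(t["count"] for t in types.values())
    return sum(midpoint(t) * t["count"] for t in types.values()) / total

def blue_mcd(types):
    """BLUE MCD: each type is additionally weighted by (span / 6) ** -2."""
    total = sum(t["count"] for t in types.values())
    num = den = 0.0
    for t in types.values():
        p = t["count"] / total        # p_i, relative frequency
        s = t["end"] - t["start"]     # s_i, manufacturing span in years
        w = p * (s / 6) ** -2         # long spans receive small weights
        num += midpoint(t) * w
        den += w
    return num / den

print(f"MCD: {mcd(TYPES):.1f}")
print(f"BLUE MCD: {blue_mcd(TYPES):.1f}")
```

With these inputs the BLUE figure comes out at about 1807.3, in line with the archeologists' column of Table 4. The plain MCD computed from simple midpoints need not match the table exactly, since the median dates actually used for that row may differ from the midpoints assumed here.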

mismatch between MCD and known occupation dates suggests something important about human behavior (Turnbaugh and Turnbaugh, 1977). Table 4 shows MCD calculations on the sample used in this study derived from archeologists' classifications and from the assignments made by LDA based on the spectrophotometric data. The dates are taken from South (1977) but that for pearlware is updated to reflect more recent work by Miller (1987) and Barker (1991).
We also incorporate a newer MCD technique based on finding the best linear unbiased estimator, producing what is called the “BLUE MCD.” This method was developed in archeological analysis by Fraser Neiman and Karen Smith (Neiman and Smith, 2005) and has been used to counteract the problem of skewing due to uneven manufacturing ranges in ceramic types by Galle (2010, 2011) and Arendt (2013). However, there is an error in the published version of the equation in all of these sources. The correct version of the equation is given below.

\[
\mathrm{MCD}_{\mathrm{BLUE}} = \frac{\sum_{i=1}^{t} m_i \, p_i \left(s_i \cdot \tfrac{1}{6}\right)^{-2}}{\sum_{i=1}^{t} p_i \left(s_i \cdot \tfrac{1}{6}\right)^{-2}}.
\]

It should be noted that, as far as we can determine, there are no problems with other aspects of the data or calculations in the cited studies, but simply that the equation was misprinted. The technique works by incorporating into the equation the overall range of the dates of production, so that those types with extremely long manufacturing spans such as tin-enameled wares (often called “delft”) play less of a factor than more tightly-dated materials.
The LDA model presented above was similar to the classifications of archeologists 85% of the time, but about 15% (mainly pearlware) disagreed. For this sample, based on the separately-measured ambiguities of each type (creamware 10.2%, pearlware 30.1% and whiteware 6.9%) and the archeologists' original identifications, we can expect about 404 sherds to be confidently identified, while about 73 will fall somewhere in between groups. If all of these uncertain sherds were considered to be creamware, the chronologically earliest type, the MCD would be about 1813 and the BLUE MCD 1805, while if these were all counted as whiteware, the latest type, the BLUE MCD would be closer to 1813 and the traditional MCD 1823.
This difference is minor, but it is not suggested here that the spectrophotometer data reveal a “correct” classification, and therefore that one of these dates is correct and the other an error. Rather, this study has provided a way to independently assess the ambiguity inherent in the attribute of color, and these MCD data suggest the implications of that ambiguity. In other words, the actual MCD for this assemblage could fall anywhere between these dates not because of errors in identification but because of the ambiguity of these types themselves.
Of course, this ceramic dating analysis is merely illustrative and not generalizable until it is considered in the context of actual test cases, since it is based on an artificially selected assemblage. For many studies, the potential variation in MCD or other analyses caused by this issue may be insignificant. Other studies will want to consider the possibility that their classifications of these materials have this ambiguity, which in turn produces variability in MCD and other calculations based on this typology. This result may also be another reason to move away from the traditional calculation of MCDs and rely instead on a range of dates, as has been occasionally suggested (e.g. Bartovics, 1981), or conduct multiple versions of the MCD analysis and report and compare all these results as a group (Chenoweth, in press).
Beyond the implications for studies of creamware, pearlware, and whiteware in particular, this project has aimed to contribute to a broader discussion of how archeological data are collected and analyzed in the twenty-first century, particularly for historical archeology. This analysis, ultimately, raises as many questions as it answers, as all scientific endeavors should. The question of how color varies across a single complete piece, how much it varies over time and the precise chronology of these changes (as with the lessening of the “creaminess” of creamware or the whitening of pearlware), the relationship of body color measured here with decoration and form, the relationship between emic and etic typologies, and a comparison of these results with other materials science techniques, such as those measuring hardness, porosity, and chemical composition: these are all areas for further analysis raised by what we have presented here.
While archeology is and always will be an interpretive endeavor, when addressing these questions and others we can ensure that our interpretations are based on data created by understandings, definitions, and observations which can be repeated and are clearly communicated. Instruments manufactured to conduct repeatable observations, here of color, can be used to create observations that are comparable across researchers.

Acknowledgments

The authors would like to thank Jed Levin and the staff of Independence National Historic Park for access to their collections. Thanks also to Douglas Mooney for information about cataloging. Funding for data collection was provided by fellow research funds from the Thinking Matters (formerly “Introduction to the Humanities” or IHUM) program of Stanford University. We would also like to thank Dan Foley and Konica Minolta Sensing, US, for the loan of equipment for the analysis, Dustin Stansbury for his statistical advice, and Hans Bernard for his thoughts on this manuscript. Thanks also to Jillian Galle and Alan Wiggins for discussions of the BLUE MCD technique.

References

Arendt, B., 2013. Return to Hopedale: excavations at Anniowaktook Island, Hopedale, Labrador. Canadian Journal of Archaeology/Journal Canadien d'Archéologie 37, 302–330.
Barker, D., 1991. William Greatbatch: A Staffordshire Potter. Jonathan Horne Publications, London.
Bartovics, A.F., 1981. The Archaeology of Daniels Village: An Experiment in Settlement Archaeology (Ph.D. dissertation). Department of Anthropology, Brown University, Providence, RI.
Baxter, M.J., 1994. Stepwise discriminant analysis in archaeometry: a critique. J. Archaeol. Sci. 21, 659–666.
Berger-Schunn, A., 1994. Practical Color Measurement: A Primer for the Beginner, a Reminder for the Expert. Wiley, New York.
Billmeyer Jr., F.W., Saltzman, M., 1981. Principles of Color Technology. Second ed. John Wiley & Sons, New York.
Bishop, R.L., Canouts, V., Atley, S.P.D., Qöyawayma, A., Aikins, C.W., 1988. The formation of ceramic analytical groups: Hopi Pottery Production and Exchange, a. C. 1300–1600. Journal of Field Archaeology 15, 317–337.

Chenoweth, J.M., 2015. Collecting, Updating, and Building on the Classics. In: Chenoweth, J.M. (Ed.), The Historical Archaeology Laboratory Handbook. Society for Historical Archaeology, Germantown, MD (in press).
Ford, J.A., 1954. On the concept of types. Am. Anthropol. 56.
Frankel, D., 1980. Munsell colour notation in ceramic description: an experiment. Aust. Archaeol. 33–37.
Galle, J.E., 2010. Costly signaling and gendered social strategies among slaves in the eighteenth-century Chesapeake: an archaeological perspective. Am. Antiq. 75, 19–43.
Galle, J.E., 2011. Assessing the Impacts of Time, Agricultural Cycles, and Demography on the Consumer Activities of Enslaved Men and Women in Eighteenth-Century Jamaica and Virginia. In: Delle, J.A., Hauser, M., Armstrong, D.V. (Eds.), Out of Many, One People: The Historical Archaeology of Colonial Jamaica. The University of Alabama Press, Tuscaloosa, pp. 211–242.
Gerharz, R.R., Lantermann, R., Spennemann, D.R., 1988. Munsell color charts: a necessity for archaeologists? Aust. J. Hist. Archaeol. 6, 88–95.
Giardino, M., Miller, R., Kuzio, R., Muirhead, D., 1998. Analysis of ceramic color by spectral reflectance. Am. Antiq. 63, 477–483.
Hammer, Ø., Harper, D.A.T., 2006. Paleontological Data Analysis. Blackwell Pub, Malden, MA.
Hammond, P.C., 1964. The physical nature of Nabataean pottery. Am. J. Archaeol. 68, 259–268.
Hunt, R.W.G., 1991. Measuring Colour, 2nd ed. E. Horwood, New York.
Hunter, R.S., Harold, R.W., 1987. The Measurement of Appearance. Second Edition. John Wiley and Sons, New York.
Izenman, A.J., 2008. Modern Multivariate Statistical Techniques: Regression, Classification, and Manifold Learning. Springer, New York.
Kovarovic, K., Aiello, L.C., Cardini, A., Lockwood, C.A., 2011. Discriminant function analyses in archaeology: are classification rates too good to be true? J. Archaeol. Sci. 38, 3006–3018.
Legendre, P., Legendre, L., 2012. Numerical Ecology. Elsevier Science, Burlington.
Lofstrum, E., Tordoff, J.P., George, D.C., 1982. The seriation of historic earthenwares in the Midwest, 1780–1870. Minnesota Archaeologist 41, 3–29.
Majewski, T., O'Brien, M.J., 1987. The use and misuse of nineteenth-century English and American ceramics in archaeological analysis. Advances in Archaeological Method and Theory 11, 97–209.
Miller, G.L., 1980. Classification and economic scaling of 19th-century ceramics. Hist. Archaeol. 14, 1–40.
Miller, G.L., 1987. Origins of Josiah Wedgwood's “pearlware”. Northeast Historical Archaeology 16, 83–95.
Miller, G.L., 1993. Thoughts Towards A User's Guide to Ceramic Assemblages, Part IV: Some Thoughts on Classification of White Earthenwares. Council for Northeast Historical Archaeology Newsletter 26, pp. 4–7.
Miller, G.L., Hunter, R., 2001. How creamware got the blues: the origins of China glaze and pearlware. Ceramics in America 2001, 135–161.
Nance, C.R., 1976. Artifact attribute covariation as the product of inter-level site mixing. Midcont. J. Archaeol. 1, 229–235.
Neiman, F.D., Smith, K., 2005. How Can Bayesian Smoothing and Correspondence Analysis Help Decipher the Occupational Histories of Late-eighteenth Century Slave Quarters at Monticello? Paper presented at the Society for American Archaeology conference, Salt Lake City, UT. [Available online in poster form: http://www.monticello.org/sites/default/files/media/temp/Bayesian%20Smoothing%20and%20Correspondence%20Analysis%20Poster_1.pdf].
Noël Hume, I., 1969. A Guide to Artifacts of Colonial America. Knopf, New York.
Paul, A., 1990. The use of color in Paracas Necropolis fabrics: what does it reveal about the organization of dyeing, designing, and society? Natl. Geogr. Res. 6, 7–21.
Price, C.R., 1979. 19th Century Ceramics in the Eastern Ozark Border Region. Center for Archaeological Research, Southwest Missouri State University, Springfield, MO.
Rice, P.M., 1987. Pottery Analysis. University of Chicago Press, Chicago.
Shepard, A.O., 1956. Ceramics for the Archaeologist. Carnegie Institution of Washington, Washington, DC.
South, S., 1977. Method and Theory in Historical Archaeology. Academic Press, New York.
Spaulding, A.C., 1953. Statistical techniques for the discovery of artifact types. Am. Antiq. 18, 305–313.
Stanco, F., Tanasi, D., Bruna, A., Maugeri, V., 2011. Automatic color detection of archaeological pottery with Munsell system. Proceedings of the International Conference on Image Analysis and Processing 16, 337–346.
Strudwick, N., 1991. An objective colour-measuring system for the recording of Egyptian tomb paintings. J. Egypt. Archaeol. 77, 43–56.
Sussman, L., 1977. Changes in pearlware dinnerware, 1780–1830. Hist. Archaeol. 11, 105–111.
Thompson, A.J., Jakes, K.A., 2002. Replication of textile dyeing with sumac and bedstraw. Southeast. Archaeol. 21, 252–256.
Turnbaugh, W., Turnbaugh, S.P., 1977. Alternative applications of the mean ceramic date concept for interpreting human behavior. Hist. Archaeol. 11, 90–104.
