IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 17, NO. 6, JUNE 2008
Abstract—Local learning methods, such as local linear regression and nearest neighbor classifiers, base estimates on nearby training samples, neighbors. Usually, the number of neighbors used in estimation is fixed to be a global "optimal" value, chosen by cross validation. This paper proposes adapting the number of neighbors used for estimation to the local geometry of the data, without need for cross validation. The term enclosing neighborhood is introduced to describe a set of neighbors whose convex hull contains the test point when possible. It is proven that enclosing neighborhoods yield bounded estimation variance under some assumptions. Three such enclosing neighborhood definitions are presented: natural neighbors, natural neighbors inclusive, and enclosing k-NN. The effectiveness of these neighborhood definitions with local linear regression is tested for estimating lookup tables for color management. Significant improvements in error metrics are shown, indicating that enclosing neighborhoods may be a promising adaptive neighborhood definition for other local learning tasks as well, depending on the density of training samples.

Index Terms—Color, color image processing, color management, convex hull, linear regression, natural neighbors, robust regression.

LOCAL learning, which includes nearest neighbor (NN) classifiers, linear interpolation, and local linear regression, has been shown to be an effective approach for many learning tasks [1]–[5], including color management [6]. Rather than fitting a complicated model to the entire set of observations, local learning fits a simple model to only a small subset of observations in a neighborhood local to each test point. An open issue in local learning is how to define an appropriate neighborhood to use for each test point. In this paper, we consider neighborhoods for local linear regression that automatically adapt to the geometry of the data, thus requiring no cross validation. The neighborhoods investigated, which we term enclosing neighborhoods, enclose a test point in the convex hull of the neighborhood when possible. We prove that if a test point is in the convex hull of the neighborhood, then the variance of the local linear regression estimate is bounded by the variance of the measurement noise.

We apply our proposed adaptive local linear regression to printer color management. Color management refers to the task of controlling color reproduction across devices. Many commercial industries require accurate color, for example, for the production of catalogs and the reproduction of artwork. In addition, the rising ubiquity of cheap color printers and the growing sources of digital images have recently led to increased consumer demand for accurate color reproduction.

Given a CIELAB color that one would like to reproduce, the color management problem is to determine what RGB color one must send the printer to minimize the error between the desired CIELAB color and the CIELAB color that is actually printed. When applied to printers, color management poses a particularly challenging problem. The output of a printer is a nonlinear function that depends on a variety of nontrivial factors, including printer hardware, the halftoning method, the ink or toner, paper type, humidity, and temperature [6]–[8]. We take the empirical characterization approach: regression on sample printed color patches that characterize the printer.

Other researchers have shown that local linear regression is a useful regression method for printer color management, producing the smallest reproduction errors when compared to other regression techniques, including neural nets, polynomial regression, and tetrahedral inversion [6, Section 5.10.5.1]. In that previous work, the local linear regression was performed over neighborhoods of k = 15 NNs, a heuristic known to produce good results [9].

This paper begins with a review of local linear regression in Section I. Then, neighborhoods for local learning are discussed in Section II, including our proposed adaptive neighborhood definitions. The color management problem and experimental setup are discussed in Section III and results are presented in Section IV. We consider the size of the different neighborhoods in Section V, both theoretically and experimentally. The paper concludes with a discussion about neighborhood definitions for local learning.
Manuscript received May 2, 2007; revised February 19, 2008. This work was supported in part by an Intel GEM fellowship, in part by the CRA-W DMP, and in part by the Office of Naval Research, Code 321, under Grant N00014-05-1-0843. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Reiner Eschbach.
M. R. Gupta and E. K. Garcia are with the Department of Electrical Engineering, University of Washington, Seattle, WA 98195 USA (e-mail: gupta@ee.washington.edu; eric.garcia@gmail.com).
E. Chin is with the Department of Electrical Engineering and Computer Science, University of California at Berkeley, Berkeley, CA 94720 USA (e-mail: erika.chin@gmail.com).
Digital Object Identifier 10.1109/TIP.2008.922429

I. LOCAL LINEAR REGRESSION

Linear regression is widely used in statistical estimation. The benefits of a linear model are its simplicity and ease of use, while its major drawback is its high model bias: if the underlying function is not well approximated by an affine function, then linear regression produces poor results. Local linear regression exploits the fact that, over a small enough subset of the domain, any sufficiently nice function can be well approximated by an affine function.
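To make the local fit concrete, here is a minimal Python sketch (ours, not the paper's code; the function name and the numpy-based implementation are our assumptions) of fitting the least-squares hyperplane of (1), with an optional ridge penalty as in (2), to the k nearest neighbors of a test point:

```python
import numpy as np

def local_linear_fit(X, y, g, k=15, ridge=0.0):
    """Fit an affine model y ~ beta^T x + beta0 to the k nearest
    neighbors of test point g, optionally adding a ridge penalty on
    the slope coefficients, and return the estimate at g."""
    X = np.asarray(X, dtype=float)   # N x d training inputs
    y = np.asarray(y, dtype=float)   # N training outputs
    # k nearest neighbors of g by Euclidean distance
    idx = np.argsort(np.linalg.norm(X - g, axis=1))[:k]
    Xn, yn = X[idx], y[idx]
    # Augment with a constant column for the offset beta0
    A = np.hstack([Xn, np.ones((len(idx), 1))])
    if ridge > 0.0:
        # Penalize only the slopes beta, not the offset beta0
        P = ridge * np.eye(A.shape[1])
        P[-1, -1] = 0.0
        coef = np.linalg.solve(A.T @ A + P, A.T @ yn)
    else:
        coef, *_ = np.linalg.lstsq(A, yn, rcond=None)
    beta, beta0 = coef[:-1], coef[-1]
    return float(beta @ np.asarray(g, dtype=float) + beta0)
```

Penalizing only the slope coefficients, and not the offset, mirrors the description above of ridge regression discouraging steep slopes.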
GUPTA et al.: ADAPTIVE LOCAL LINEAR REGRESSION WITH APPLICATION TO PRINTER COLOR MANAGEMENT 937
Suppose that, for an unknown function $f$, we are given a set of inputs $\mathcal{X} = \{x_1, \ldots, x_N\}$, where $x_i \in \mathbb{R}^d$, and outputs $\mathcal{Y} = \{y_1, \ldots, y_N\}$, where $y_i = f(x_i)$. The goal is to estimate the output $\hat{y}$ for an arbitrary test point $g \in \mathbb{R}^d$. To form this estimate, local linear regression fits the least-squares hyperplane to a local neighborhood $\mathcal{J}_g$ of the test point, $\hat{y} = \hat{\beta}^T g + \hat{\beta}_0$, where

$$(\hat{\beta}, \hat{\beta}_0) = \arg\min_{\beta, \beta_0} \sum_{j : x_j \in \mathcal{J}_g} \left(\beta^T x_j + \beta_0 - y_j\right)^2. \qquad (1)$$

The number of neighbors in $\mathcal{J}_g$ plays a significant role in the estimation result. Neighborhoods that include too many training points can result in regressions that oversmooth. Conversely, neighborhoods with too few points can result in regressions with incorrectly steep extrapolations. One approach to reducing the estimation variance incurred by small neighborhoods is to regularize the regression, for example by using ridge regression [5], [10]. Ridge regression forms a hyperplane fit as in (1), but the coefficients instead minimize a penalized least-squares criterion that discourages fits with steep slopes. Explicitly,

$$(\hat{\beta}, \hat{\beta}_0) = \arg\min_{\beta, \beta_0} \sum_{j : x_j \in \mathcal{J}_g} \left(\beta^T x_j + \beta_0 - y_j\right)^2 + \lambda \beta^T \beta \qquad (2)$$

where the parameter $\lambda \ge 0$ controls the trade-off between minimizing the error and penalizing the magnitude of the coefficients. Larger $\lambda$ results in lower estimation variance, but higher estimation bias. Although we found no literature using regularized local linear regression for color management, its success for other applications motivated its inclusion in our experiments.

II. ENCLOSING NEIGHBORHOODS

For any local learning problem, the user must define what is to be considered local to a test point. Two standard methods each specify a fixed constant: either in the form of the number of neighbors $k$, or the bandwidth of a symmetric distance-decaying kernel. For kernels such as the Gaussian, the term "neighborhood" is not quite as appropriate, since all training samples receive some weight. However, a smaller bandwidth does correspond to a more compact weighting of nearby training samples. Commonly, the neighborhood size $k$ or the kernel bandwidth is chosen by cross validation over training samples [5]. For many applications, including the printer color management problem considered in this paper, cross validation can be impractical. Consider that even if some data were set aside for cross validation, patches would have to be printed and measured for each possible value of $k$. This makes cross validation over more than a few specific values of $k$ highly impractical. Instead, it will be useful to define a neighborhood that locally adapts to the data, without need for cross validation.

Prior work in adaptive neighborhoods for k-NN has largely focused on locally adjusting the distance metric [11]–[20]. The rationale behind these adaptive metrics is that many feature spaces are not isotropic and the discriminability provided by each feature dimension is not constant throughout the space. However, we do not consider such adaptive metric techniques appropriate for the color management problem because the feature space is the CIELAB colorspace, which was painstakingly designed to be approximately perceptually uniform with three feature dimensions that are approximately perceptually orthogonal.

Other approaches to defining neighborhoods have been based on relationships between training points. In the symmetric k-NN rule, a neighborhood is defined by the test sample's k-NN plus those training samples for which the test sample is a k-NN [21]. Zhang et al. [22] called for an "intelligent selection of instances" for local regression. They proposed a method called k-surrounding neighbor (k-SN) with the ideal of selecting a preset number of training points that are close to the test point, but that are also "well-distributed" around the test point. Their k-SN algorithm selects NNs in pairs: first, the nearest NN not yet in the neighborhood is selected; then, the next NN that is farther from that point than it is from the test point is added to the neighborhood. Although this technique locally adapts to the spatial distribution of the training samples, it does not offer a method for adaptively choosing the neighborhood size $k$. Another spatially based approach uses the Gabriel neighbors of the test point as the neighborhood [23], [24, p. 90].

We present three neighborhood definitions that automatically specify $\mathcal{J}_g$ based on the geometry of the training samples and show how these neighborhoods provide a robust estimate in the presence of noise. Because each of the three neighborhood definitions attempts to "enclose" the test point in the convex hull of the neighborhood, we introduce the term enclosing neighborhood to refer to such neighborhoods. Given a set of training points $\mathcal{X}$ and test point $g$, a neighborhood $\mathcal{J} \subseteq \mathcal{X}$ is an enclosing neighborhood if and only if $g \in \operatorname{conv}(\mathcal{J})$ when $g \in \operatorname{conv}(\mathcal{X})$. Here, the convex hull of a set $\mathcal{J}$ with elements $x_1, \ldots, x_k$ is defined as $\operatorname{conv}(\mathcal{J}) = \{\sum_{j=1}^{k} \alpha_j x_j : \alpha_j \ge 0, \sum_{j=1}^{k} \alpha_j = 1\}$. The intuition behind regression on an enclosing neighborhood is that interpolation provides a more robust estimate than extrapolation. This intuition is formalized in the following theorem.

Theorem 1: Consider a test point $g$ and a neighborhood $\mathcal{J}_g$ of $k$ training points $\{x_1, \ldots, x_k\}$, where $x_j \in \mathbb{R}^d$. Suppose $g$ and each $x_j$ are drawn independently and identically from a sufficiently nice distribution, such that all points are in general position with probability one. Let $y_j = f(x_j) + w_j$, where $f$ is the function to be estimated, and let each component of the additive noise vector $w = [w_1, \ldots, w_k]^T$ be independent and identically distributed according to a distribution with finite mean and finite variance $\sigma^2$. Given the measurements $\{y_j\}$, consider the linear estimate $\hat{y} = \hat{\beta}^T g + \hat{\beta}_0$, where $(\hat{\beta}, \hat{\beta}_0)$ solve

$$(\hat{\beta}, \hat{\beta}_0) = \arg\min_{\beta, \beta_0} \sum_{j=1}^{k} \left(\beta^T x_j + \beta_0 - y_j\right)^2. \qquad (3)$$

Then, if $g \in \operatorname{conv}(\mathcal{J}_g)$, the estimation variance is bounded by the variance of the measurement noise: $\operatorname{Var}[\hat{y}] \le \sigma^2$.

The proof is given in Appendix A. Note that if $g \notin \operatorname{conv}(\mathcal{X})$ for training set $\mathcal{X}$, then the enclosing neighborhood cannot contain $g$ in its convex hull.
A. Building the LUTs

The three 1-D LUTs enact gray-balance calibration, linearizing each RGB channel independently. This enforces that input neutral RGB color values with R = G = B will print gray patches (as measured in CIELAB). That is, if one inputs the neutral RGB colors, the 1-D LUTs will output the RGB values that, when printed, correspond approximately to uniformly-spaced neutral gray steps in CIELAB space. Specifically, for a given neighborhood and regression method, the 918 sample Chromix RGB color chart is printed and measured to form the training pairs. Next, the L* axis of the CIELAB space is sampled with 256 evenly-spaced values to form incremental shades of gray. For each sampled gray step, a neighborhood is constructed and three regressions fit locally linear functions, one for each of the R, G, and B channels. Finally, the 1-D LUTs are constructed with these inputs and the regressed outputs, where the three output channels correspond to the three 1-D LUTs.

The effect of the 1-D LUTs on the training data must be taken into account before the 3-D LUT can be estimated. The training set is adjusted to find the RGB values that, when input to the 1-D LUTs, reproduce the original RGB training values. These adjusted training sample pairs are then used to estimate the 3-D LUT. (Note: In our process, all the LUTs are estimated from one printed test chart, as is done in many commercial ICC profile building services. More accurate results are possible by printing a second test chart once the 1-D LUTs have been estimated, where the second test chart is sent through the 1-D LUTs before being sent to the printer.)

The 3-D LUT has regularly spaced gridpoints. For the 3-D LUTs in our experiment, we used a regular grid that spans the CIELAB color space. Previous studies have shown that a finer sampling than this does not yield a noticeable improvement in accuracy [6]. For each gridpoint, its neighborhood is determined, and regression fits the locally linear functions for the R, G, and B output channels.

Once estimated, the LUTs can be stored in an ICC profile. This is a standardized color management format, developed by the International Color Consortium (ICC). Input CIELAB colors that are not a gridpoint of the 3-D LUT are interpolated. The interpolation technique is not specified in the standard; our experiments used trilinear interpolation [29], a 3-D version of the common bilinear interpolation. This interpolation technique is computationally fast, and optimal in that it weights the neighboring grid points as evenly as possible while still solving the linear interpolation equations, choosing the maximum entropy solution to those equations [3, Theorem 2, p. 776].

B. Experimental Setup

The different regression methods were tested on three printers: an Epson Stylus Photo 2200 (ink jet) with Epson Matte Heavyweight Paper and Epson inks, an Epson Stylus Photo R300 (ink jet) with Epson Matte Heavyweight Paper and third-party ink from Premium Imaging, and a Ricoh Aficio 1232C (laser engine) with generic laser copy paper. Color measurements of the printed patches were done with a GretagMacbeth Spectrolino spectrophotometer at a 2° observer angle with D50 illumination.

In our experiments, the calibration and characterization LUTs are estimated using local linear regression and local ridge regression over the enclosing neighborhood methods described in Section II and a baseline neighborhood of 15 NNs, which is a heuristic known to produce good results for this application [9]. All neighborhoods are computed by Euclidean distance in the CIELAB colorspace and the regression is made well posed by adding NNs if necessary to ensure a minimum of four neighbors. As analyzed in Section V, the enclosing k-NN neighborhood is expected to have roughly seven neighbors, where the word "roughly" is used to capture the fact that the assumptions of Theorem 2 (see Section V-A) do not hold in practice. The expected small size of enclosing k-NN neighborhoods led us to also implement a variation of the enclosing k-NN neighborhood which uses a minimum of 15 neighbors: this is achieved by adding NNs to the enclosing k-NN neighborhood if there are fewer than 15. Note that this variant is also an enclosing neighborhood, but ensures smoother regressions than the enclosing k-NN neighborhood.

The ridge parameter $\lambda$ in (2) was fixed at a single value for all the experiments. This parameter value was chosen based on a small preliminary experiment, which suggested that values of $\lambda$ across a broad range would produce similar results. Note that the effect of the regularization parameter is highly nonlinear, and that steeper slopes (larger coefficient magnitudes) are more strongly affected by the regularization. It is common wisdom that a small amount of regularization can be very helpful in reducing estimation variance, but larger amounts of regularization can cause unwanted bias, resulting in oversmoothing.

To compare the color management systems created by each neighborhood and regression method, 729 RGB test color values were drawn randomly and uniformly from the RGB colorspace, printed on each printer, and measured in CIELAB. These measured CIELAB values formed the test samples. This process guaranteed that the CIELAB test samples were in the gamut for each printer, but each printer had a slightly different set of CIELAB test samples. The test samples were then input as the "Desired CIELAB" values to test the accuracy of each estimated LUT, as shown in Fig. 2. Each estimated LUT produced estimated RGB values that, when sent to the printer, would ideally yield the test sample CIELAB values. The different estimated RGB values were sent to the printer, printed, measured in CIELAB, and the ΔE94 error was computed with respect to the test sample CIELAB values. The ΔE94 error metric is one standard way to measure color management error [6].

IV. RESULTS

Tables I–III show the average ΔE94 error and 95th percentile ΔE94 error for the three printers for each neighborhood definition, and each regression method. In addition, we discuss in this section which differences are statistically significantly different as judged at the .05 significance level by Student's matched-pair t-test.
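For reference, the ΔE94 figures reported in Tables I–III can be computed from pairs of CIELAB values with the standard CIE94 color-difference formula. The sketch below is our own implementation, assuming the graphic-arts parametric factors kL = kC = kH = 1:

```python
import math

def delta_e94(lab_ref, lab_sample):
    """CIE94 color difference between a reference CIELAB color and a
    sample color, with graphic-arts factors kL = kC = kH = 1."""
    L1, a1, b1 = lab_ref
    L2, a2, b2 = lab_sample
    dL = L1 - L2
    C1 = math.hypot(a1, b1)          # chroma of the reference color
    C2 = math.hypot(a2, b2)
    dC = C1 - C2
    # Hue term from the identity dE*ab^2 = dL^2 + dC^2 + dH^2,
    # clamped at zero to guard against floating-point round-off
    dH2 = max((a1 - a2) ** 2 + (b1 - b2) ** 2 - dC ** 2, 0.0)
    S_L = 1.0
    S_C = 1.0 + 0.045 * C1           # weights depend on reference chroma
    S_H = 1.0 + 0.015 * C1
    return math.sqrt((dL / S_L) ** 2 + (dC / S_C) ** 2 + dH2 / S_H ** 2)
```

Note that CIE94 is asymmetric: the reference color sets the chroma weights, so the desired CIELAB value should be passed first.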
TABLE I
ΔE94 ERRORS FROM THE RICOH AFICIO 1232C

TABLE II
ΔE94 ERRORS FROM THE EPSON STYLUS PHOTO 2200
These three metrics (average, 95th percentile, and statistical significance) summarize different aspects of the results, and are complementary in that good performance with respect to one of the three metrics does not necessarily imply good performance with respect to the other two metrics. The baseline is the 15 neighbors with local linear regression. Small errors may not be noticeable; though noticeability varies throughout the color space and between people, sufficiently small errors are generally not noticeable.

The Ricoh laser printer is the least linear of the three printers, likely due to the printing instabilities that are common with high-speed laser printers. For the Ricoh, all of the enclosing neighborhoods have lower average error and lower 95th percentile error than the baseline of 15 neighbors and linear regression. Further, all of the methods were statistically significantly better than the 15 neighbors baseline, except for enclosing k-NN (linear), which was not statistically significantly different. Changing to ridge regression for 15 neighbors eliminates over 10% of the 95th percentile error. Thus, the adaptive methods and the regularized regression make a clear difference for nonlinear color transformations.

For the Ricoh laser printer, the lowest average error and lowest 95th percentile error are produced by enclosing k-NN minimum 15 (ridge): the 95th percentile error is reduced by 21% over 15 neighbors (ridge), and the total error reduction is 31% over the baseline of 15 neighbors (linear). Enclosing k-NN minimum 15 (ridge) is statistically significantly better than all other methods for this printer. These results suggest that highly nonlinear color transforms can be effectively modeled by local regression using a lower bound on the number of neighbors (to keep estimation variance low) but allowing possibly more neighbors depending on their spatial distribution.

On the Epson 2200, all of the enclosing neighborhoods have lower average and 95th percentile error than the baseline of 15 neighbors (linear). However, only enclosing k-NN (ridge), enclosing k-NN minimum 15 (ridge), or natural neighbors (ridge)
TABLE III
ΔE94 ERRORS FROM THE EPSON STYLUS PHOTO R300
were statistically significantly better (the other methods were not statistically significantly different). The natural neighbors (ridge) is statistically significantly better than all of the other methods except for enclosing k-NN (linear and ridge), which are not statistically significantly different. Enclosing k-NN (ridge) is statistically significantly better than all of the other methods except for natural neighbors (ridge). These results are consistent with the Ricoh results in that enclosing neighborhoods coupled with ridge regression provide significant benefit.

The Epson R300 inkjet fits the locally linear model well, as evident in the low errors across the board and the small average and 95th percentile error differences between methods. Here, few methods are statistically significantly different, but the natural neighbors inclusive is statistically significantly worse than the other neighborhood methods, including the baseline. We hypothesize that because the natural neighbors inclusive creates in some instances very large neighborhoods, this increase in error may be caused by the bias of oversmoothing.

We have presented and discussed our results in terms of the ΔE94 error because it is considered a more accurate error function for color management than ΔE (Euclidean distance in CIELAB) [6]. In (1) and (2), we minimize the Euclidean error in CIELAB, because this leads to a tractable objective, whereas minimizing ΔE94 error does not. The ΔE errors were also calculated and compared to the ΔE94 errors. The results were very similar in terms of the rankings of the regression methods and the statistically significant differences.

In summary, the experiments show that using an enclosing neighborhood is an effective alternative to using a fixed neighborhood size. In particular, enclosing k-NN minimum 15 (ridge) achieved the lowest average and 95th percentile error rates for the most nonlinear printer (the laser printer), and was either the best or a top performer throughout. Also, ridge regression showed consistent performance gains over linear regression, especially with smaller neighborhoods. Importantly, the overall low error rates on the inkjet printers suggest that the locally linear model fits sufficiently well on these printers, resulting in less room for improvement over the baseline method.

V. SIZES OF ENCLOSING NEIGHBORHOODS

Enclosing neighborhoods adapt the size of the neighborhood to the local spatial distribution of the training and test samples. In this section, we consider the key question, "How many neighbors are in the neighborhood?" We consider analytic and experimental answers to this question, and how the neighborhood size will relate to the estimation bias and variance.

A. Analytic Size of Neighborhoods

Asymptotically, the expected number of natural neighbors is equal to the expected number of edges of a Delaunay triangulation [27]. A common stochastic spatial model for analyzing Delaunay triangulations is the Poisson point process, which assumes that points are drawn randomly and uniformly such that the average density is a fixed number of points per unit volume. Given the Poisson point process model, the expected number of natural neighbors is known for low dimensions: six neighbors for two dimensions, with larger known constants for three and four dimensions [27].

The following theorem establishes the expected number of neighbors in the enclosing k-NN neighborhood if the training samples are sampled from a uniform distribution over a hypersphere about the test sample.

Theorem 2 (Asymptotic Size of Enclosing k-NN): Suppose training samples are uniformly sampled from a distribution that is symmetric around a test sample in $\mathbb{R}^d$. Then, asymptotically as the number of training samples $N \to \infty$, the expected number of neighbors in the enclosing k-NN neighborhood is $2d + 1$.

The proof is given in Appendix C.

For both the natural neighbors and enclosing k-NN, these analytic results model the training samples as symmetrically distributed about the test point. This is a good model for the general asymptotic case where the number of training samples $N \to \infty$, because if the true distribution of the training samples is smooth then the random sampling of training samples local to the test sample will appear as though drawn from a uniform distribution.
Fig. 3. Histograms show the frequency of each neighborhood size when esti-
mating the gridpoints of the 3-D LUT for the Ricoh Aficio 1232C.
Note that $\operatorname{Cov}[w] = \sigma^2 I$, where $I$ denotes the identity matrix. Then the covariance matrix of the regression coefficients is

$$\operatorname{Cov}\!\left[(\hat{\beta}, \hat{\beta}_0)\right] = \sigma^2 \left(A^T A\right)^{-1}$$

where $A$ is the augmented data matrix whose $j$th row is $[\,x_j^T \;\; 1\,]$.

2) Re-order the set of training samples for regression by distance from the test point $g$, so that $x_{(j)}$ is the $j$th NN to $g$.
3) Add $x_{(1)}$ to the set $\mathcal{J}$.
4) Define the indicator function $h(\cdot)$, where $h(x) = 1$ if $x$ lies in the same half-space as the previously added neighbor with respect to the hyperplane that passes through $g$ and is normal to the vector connecting $g$ to that neighbor, and $h(x) = 0$ otherwise.
5) Add to the set $\mathcal{J}$ the training point nearest to $g$ in the opposite half-space. To simplify, change variables so that the test point $g$ is at the origin.

Then $\mathcal{J}$ encloses the origin if and only if the vectors to its points are not all contained in some hemisphere [31]. Let $H$ indicate the event that the vectors lie on the same hemisphere, and $H^c$ will denote the complement of $H$. Wendel [32] showed that for $n$ points chosen uniformly on a hypersphere in $\mathbb{R}^d$

$$P(H) = p_{n,d} = 2^{1-n} \sum_{i=0}^{d-1} \binom{n-1}{i}. \qquad (8)$$

Let $E_k$ be the event that the first $k$ ordered points enclose the origin, but the first $k-1$ ordered points do not enclose the origin. The probability of the event is

$$P(E_k) = p_{k-1,d} - p_{k,d}. \qquad (9)$$

Because one or zero points cannot complete a convex hull around the origin, $p_{0,d} = 1$ and $p_{1,d} = 1$. Combining (8) and (9), and using the recurrence relation of the binomial coefficient

$$\binom{n}{i} = \binom{n-1}{i} + \binom{n-1}{i-1} \qquad (10)$$

for all $n \ge i \ge 1$, the expected neighborhood size $\sum_k k\,P(E_k)$ can be expanded and summed, where the second-to-last line of the expansion follows from (10).