
THE ASTROPHYSICAL JOURNAL, 536:571-583, 2000 June 20

© 2000. The American Astronomical Society. All rights reserved. Printed in U.S.A.

BAYESIAN PHOTOMETRIC REDSHIFT ESTIMATION


NARCISO BENÍTEZ
Astronomy Department, University of California at Berkeley, 601 Campbell Hall, Berkeley, CA 94720-5030; benitezn@mars.berkeley.edu
Received 1998 November 11; accepted 2000 January 26

ABSTRACT
Photometric redshifts are quickly becoming an essential tool of observational cosmology, although their utilization is somewhat hindered by certain shortcomings of the existing methods, e.g., the unreliability of maximum-likelihood techniques or the limited application range of the "training-set" approach. The application of Bayesian inference to the problem of photometric redshift estimation effectively overcomes most of these problems. The use of prior probabilities and Bayesian marginalization facilitates the inclusion of relevant knowledge, such as the expected shape of the redshift distributions and the galaxy type fractions, which can be readily obtained from existing surveys but are often ignored by other methods. If this previous information is lacking or insufficient (for instance, because of the unprecedented depth of the observations), the corresponding prior distributions can be calibrated using even the data sample for which the photometric redshifts are being obtained. An important advantage of Bayesian statistics is that the accuracy of the redshift estimation can be characterized in a way that has no equivalents in other statistical approaches, enabling the selection of galaxy samples with extremely reliable photometric redshifts. In this way, it is possible to determine the properties of individual galaxies more accurately, and simultaneously estimate the statistical properties of a sample in an optimal fashion. Moreover, the Bayesian formalism described here can be easily generalized to deal with a wide range of problems that make use of photometric redshifts. There is excellent agreement between the $\approx 130$ Hubble Deep Field North (HDF-N) spectroscopic redshifts and the predictions of the method, with an rms error of $\Delta z \approx 0.06(1 + z_{\rm spec})$ up to $z < 6$ and no outliers or systematic biases. It should be remarked that since these results have not been reached following a training-set procedure, the above value of $\Delta z$ should be a fair estimate of the expected accuracy for any similar sample. The method is further tested by estimating redshifts in the HDF-N but restricting the color information to the $UBVI$ filters; the results are shown to be significantly more reliable than those obtained with maximum-likelihood techniques.
Subject headings: galaxies: distances and redshifts - galaxies: photometry - methods: statistical

1. INTRODUCTION

The advent of the new class of 10 m ground-based telescopes is having a strong impact on the study of galaxy evolution. Instruments such as LRIS at the Keck telescopes allow observers to regularly secure redshifts for dozens of $I \approx 24$ galaxies in several hours of exposure. Technical advances in the instrumentation, combined with the proliferation of 10 m class telescopes, guarantee a vast increase in the number of galaxies, bright and faint, for which spectroscopic redshifts will be obtained in the near future. In spite of the progress in the sheer numbers of available spectra, the $I \approx 24$ "barrier" (for reasonably complete samples) is likely to stand for a while yet, since there are no foreseeable dramatic improvements in the telescope area or detection techniques. This means that most of the galaxies detected in very deep exposures are in practice inaccessible to spectroscopic analysis. The best example is the Hubble Deep Field North (HDF-N; Williams et al. 1996): after several years of intensive efforts by the astronomical community, the spectroscopic sample only comprises $\approx 20\%$ of the $I < 27$ galaxies detected in that field. Very few areas of the sky will receive such a telescope barrage in the near future, so this can almost be considered as the limit of what is currently achievable by spectroscopy. In contrast, surprisingly accurate photometric redshifts were quickly obtained for most of the HDF-N galaxies (notably by Sawicki, Lin, & Yee 1997; see also Lanzetta, Yahil, & Fernandez-Soto 1996; Gwyn & Hartwick 1996), although due to the maximum-likelihood methodology employed by these authors, a significant fraction of redshift estimates presented large, "catastrophic" errors (Ellis 1997). Moreover, as will be shown below, using a Bayesian statistical approach it is possible to obtain fast, inexpensive and, more important, highly reliable photometric redshifts for $\approx 90\%$ of the $I < 27$ HDF-N galaxies.

In spite of the efforts of Thomas Loredo, who has written stimulating reviews on the subject (Loredo 1990, 1992), Bayesian inference is still far from becoming the standard approach in astrophysics, and is often used as just another tool in the available panoply of statistical methods. However, as any reader of the fundamental treatise by Jaynes (2000)^1 can learn, Bayesian probability theory represents a unified view of probability and statistics, which intends not to complement, but rather to fully replace the traditional "frequentist" statistical techniques (see also Bretthorst 1988, 1990; Sivia 1996; Gelman et al. 1998). The basic aim of this paper is to consider the problem of photometric redshift estimation from the point of view of Bayesian inference. Kodama, Bell, & Bower (1998) also developed a Bayesian classifier for photometric redshifts. There are, however, several differences between their approach and the one followed in this work, perhaps the most significant of which is the treatment of priors (see § 3.3).

^1 Jaynes (2000) is also available at http://bayes.wustl.edu/etj/prob.html.

The outline of the paper is as follows. Section 2 reviews the current methods of photometric redshift estimation,
pointing out their main sources of error. Section 3 describes in detail how to apply Bayesian probability to photometric redshift estimation. Section 4 compares the performance of traditional statistical techniques, such as maximum likelihood, with Bayesian photometric redshift (BPZ) estimation, by applying both methods to the HDF-N spectroscopic sample and to a simulated catalog. Section 5 briefly summarizes the main conclusions of the paper.

2. PHOTOMETRIC REDSHIFTS: TRAINING SET VERSUS SED-FITTING METHODS

There are two basic approaches to photometric redshift estimation; using the terminology of Yee (1998), they may be called "spectral energy distribution (SED)-fitting" and "empirical training set" methods. The first technique (Koo 1985; Lanzetta et al. 1996; Gwyn & Hartwick 1996; Pello et al. 1996; Sawicki et al. 1997, etc.) begins by compiling a library of template spectra, empirical or generated with population synthesis techniques. These templates, after being redshifted and corrected for intergalactic extinction, are compared with the galaxy colors to determine the redshift $z$ that best fits the observations. The training-set technique (Brunner et al. 1997; Connolly et al. 1995; Wang, Bahcall, & Turner 1998) starts with a multicolor galaxy sample with apparent magnitudes $m_0$ and colors $C$ that have been spectroscopically identified. Using this sample, a relationship of the kind $z = z(C, m_0)$ is determined using a multiparametric fit.

It should be said that these two methods are more similar than they are usually thought to be. To understand this, let us look in more detail at how they work. For simplicity, we forget about the magnitude dependence and assume that only two colors, $C = (C_1, C_2)$, are enough to estimate the photometric redshifts. Thus, given a set of spectroscopic redshifts $\{z_{\rm spec}\}$ and colors $\{C\}$, the training-set method will try to fit a surface $z = z(C)$ to the data. This is based on a very strong assumption: that the surface $z = z(C)$ is a function defined on the color space, where each value of $C$ corresponds to one and only one redshift. Visually, this means that the surface $z = z(C)$ does not "bend" over itself in the redshift direction. Although this functionality of the redshift/color relationship cannot be taken for granted in the general case, it seems to be a good approximation to the real picture at $z < 1$ redshifts and bright magnitudes (Brunner et al. 1997). A certain scatter around this surface is allowed: galaxies with the same value of $C$ may have slightly different redshifts, and it seems to be assumed implicitly that this scatter is the factor limiting the accuracy of the method.

The SED-fitting method is based on the color/redshift relationships generated by each of the library templates $T$, $C_T = C_T(z)$. A galaxy at the position $C$ is assigned the redshift corresponding to the closest point of any of the $C_T$ curves in the color space. If these $C_T$ functions are inverted, one ends up with the curves $z_T = z_T(C_T)$, which in general are not functions, since they may present self-crossings (and of course intersect each other as well). If we limit ourselves to the region in the color/redshift space in which the training-set method defines the surface $z = z(C)$, for a realistic template set, the curves $z_T = z_T(C_T)$ would be embedded in the surface $z = z(C)$, forming its "skeleton" and defining its main features.

The fact that the surface $z = z(C)$ is continuous, whereas the template-defined curves are sparsely distributed, does not make a great difference. The gaps may be filled by finely interpolating between the templates (Sawicki et al. 1997), but this is not strictly necessary: usually, the statistical procedure employed to search for the best redshift performs its own interpolation between templates. When the colors of a galaxy do not exactly coincide with one of the spectra, $\chi^2$ or the maximum-likelihood method will assign the redshift corresponding to the nearest template in the color space. This is equivalent to the curves $z_T = z_T(C_T)$ having "influence areas" around them, forming a sort of steplike surface that interpolates across the gaps and also extends beyond the region limited by them in the color space. Therefore, the SED-fitting method comes with a built-in interpolation (and extrapolation) procedure, and for this reason, the accuracy of the photometric redshifts is not dramatically improved by finely interpolating between sparse spectra (see § 4).

The intrinsic similarity of the two photometric redshift methods may explain their comparable performance, especially at $z \lesssim 1$ (Hogg et al. 1998). For magnitude ranges with a relatively simple color-redshift topology, the training-set method should perform better, if only because it avoids the possible systematics due to mismatches between the predicted template colors and the real ones, and also in part because it includes not only the colors of the galaxies but also their magnitudes, which helps to break the color/redshift degeneracies (see below). It should not be forgotten, however, that despite its apparent precision ($\delta z \approx 0.06$ for the HDF-N; Connolly et al. 1997), by its own nature there is not a strong guarantee that such an accuracy will be reached in future samples, even within the same magnitude and redshift ranges. Nevertheless, this could be a good approach for large, low- to moderate-redshift surveys with abundant calibration spectroscopy, such as the Sloan Digital Sky Survey (Gunn & Weinberg 1995).

However, in most cases the training-set approach is impractical or even unfeasible. A basic flaw of this method is the assumption of a single functional form for the color-redshift relationship, since it is obvious that as one goes to higher redshifts and/or fainter magnitudes, the topology of the color-redshift distribution $z = z(C, m_0)$ displays several nasty degeneracies, even if the near-IR information is included. This problem can be somewhat overcome by dividing the redshift/color space into several regions and performing piecemeal fittings within each of them (Wang et al. 1998). Another problem is that because of its empirical and ad hoc basis, it can only be reliably extended as far as the spectroscopic redshift limit. This means that it cannot be applied where it is most needed, to study faint galaxy samples beyond the reach of the spectrograph. Moreover, it is not straightforward to transfer the calibration obtained with a given filter set to a different one. Such an extrapolation can be done with the help of templates, but once the shape of these templates has been determined, it seems more convenient to switch to the more versatile SED-fitting method, especially in its Bayesian version, which will be developed below.

Although the SED-fitting method is not affected by some of these limitations, it also comes with its own set of problems. Several authors have analyzed in detail the main sources of errors affecting this method (Sawicki et al. 1997; Fernandez-Soto et al. 1999), which can be divided into two broad classes: color/redshift degeneracies and template incompleteness.
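
As a minimal sketch of the SED-fitting procedure described in this section (not the code used in this paper), the following Python fragment assigns a galaxy the redshift and type of the best-fitting template on a grid, using the $\chi^2$ of § 3.2 minimized analytically over the template normalization. The function names, the filter wavelengths, and the smooth toy templates returned by `toy_template_fluxes` are illustrative assumptions; a real application would use redshifted, extinction-corrected library spectra integrated through the filter curves.

```python
import numpy as np

def toy_template_fluxes(z_grid, lam_obs, betas, lam_break=0.4, width=0.05):
    """Hypothetical smooth templates: a power law times a soft spectral break.

    Returns an array of shape (n_z, n_T, n_filters). It only stands in for
    a real template library and has no physical calibration.
    """
    lam_rest = lam_obs[None, None, :] / (1.0 + z_grid[:, None, None])
    beta = betas[None, :, None]
    soft_break = 1.0 / (1.0 + np.exp(-(lam_rest - lam_break) / width))
    return lam_rest**beta * soft_break

def chi2_min_over_amplitude(f_obs, sigma, f_tmpl):
    """Chi^2 minimized analytically over the template normalization a.

    Uses chi2_min = F_OO - F_OT^2 / F_TT and a_m = F_OT / F_TT,
    evaluated for every (z, T) grid point at once.
    """
    w = 1.0 / sigma**2
    F_OO = np.sum(w * f_obs**2)
    F_OT = np.sum(w * f_obs * f_tmpl, axis=-1)
    F_TT = np.sum(w * f_tmpl**2, axis=-1)
    return F_OO - F_OT**2 / F_TT, F_OT / F_TT

def ml_redshift(f_obs, sigma, f_tmpl, z_grid):
    """Grid point (z, T) with the smallest chi^2, i.e., the ML estimate."""
    chi2, a_m = chi2_min_over_amplitude(f_obs, sigma, f_tmpl)
    iz, it = np.unravel_index(np.argmin(chi2), chi2.shape)
    return z_grid[iz], int(it), chi2[iz, it], a_m[iz, it]

# Noise-free usage example: a type-1 galaxy at z = 1 with normalization 3.
lam_obs = np.array([0.30, 0.45, 0.60, 0.80, 1.20])  # assumed filter wavelengths (microns)
z_grid = np.linspace(0.0, 3.0, 61)
f_tmpl = toy_template_fluxes(z_grid, lam_obs, np.array([-1.0, 0.0, 1.0]))
f_obs = 3.0 * f_tmpl[20, 1]          # z_grid[20] is z = 1
sigma = 0.05 * f_obs                 # assumed 5% photometric errors
z_ml, t_ml, chi2_ml, a_ml = ml_redshift(f_obs, sigma, f_tmpl, z_grid)
```

Because the normalization is fitted away, only the shape of the flux vector (that is, the colors) decides which template and redshift are selected, which is the sense in which the method picks the nearest template in color space.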

Figure 1 (left) shows $V - I$ versus $I - K$ for the morphological types employed in § 4 and $0 < z < 5$. The color/redshift degeneracies happen when the line corresponding to a single template self-intersects or when two lines cross each other at a point corresponding to different redshifts [these cases correspond to "bendings" in the redshift/color relationship $z = z(C)$]. It is obvious that the frequency of such crossings will rise with the extension of the considered redshift range and with the number of templates included. Moreover, the presence of color/redshift degeneracies is also increased by random photometric errors, which can be visualized as a blurring or thickening of the $C_T(z_T)$ relationship (Fig. 1, right): each point of the curves in the left panel of Figure 1 is expanded into a square of size $\delta C$, the error in the measured color. The first consequence of this is a "continuous" [$\delta z \approx (\partial C/\partial z)^{-1}\,\delta C$] increase in the rms of the "small-scale" errors in the redshift estimation. Worse still, the overlaps in the color-color space become more frequent, with a corresponding rise in the number of "catastrophic" redshift errors. Multicolor information may often be redundant, so increasing the number of filters does not necessarily break the degeneracies. For instance, by applying a simple principal component analysis (PCA) to the HDF-N photometric sample, it can be shown that the information contained in the seven $UBVIJHK$ filters for the HDF galaxies can be condensed using only three parameters, the coefficients of the principal components of the flux vectors (see also Connolly et al. 1995). Therefore, if the photometric errors are large, increasing the number of filters cannot totally eliminate the color/redshift degeneracies, which makes them almost unavoidable in faint galaxy samples. The training-set method somewhat alleviates this problem by introducing an additional parameter in the estimation: the magnitude, which in some cases breaks the degeneracy. However, color/redshift degeneracies may also affect galaxies with the same magnitude, and the training-set method does not even contemplate their possibility.

The SED-fitting method at least allows for the existence of this problem, although it is not very efficient in dealing with it, especially with noisy data. Its choice of redshift is exclusively based on the goodness of fit between the observed colors and the templates. In cases such as the one described above, where two or more redshift/morphological type combinations have nearly the same colors, the value of the likelihood $L$ would have two or more approximately equally high maxima at different redshifts (see Fig. 2). Depending on the random photometric error, one maximum would prevail over the others, and a small change in the flux could involve a catastrophic change in the estimated redshift (see Fig. 2). However, in many cases there is additional information, discarded by the maximum likelihood (ML) method, that could potentially help to

FIG. 1. Left: $V - I$ vs. $I - K$ for the templates used in § 4 in the interval $1 < z < 5$. The size of the filled squares grows with redshift, from $z = 1$ to $z = 5$. If these were the only colors used for the redshift estimation, every crossing of the lines would correspond to a color/redshift degeneracy. Right: The same color-color relationships, "thickened" by a 0.2 photometric error. The probability of color/redshift degeneracies increases greatly.
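
The thickening effect shown in the right panel can be illustrated numerically. The sketch below counts, for a single color track, the pairs of grid redshifts whose colors agree within $\delta C$ in both colors although the redshifts themselves are widely separated. The track, the separation threshold `dz_min`, and the function name are illustrative assumptions and are not derived from the templates of § 4.

```python
import numpy as np

def count_degenerate_pairs(z, colors, dC, dz_min=0.5):
    """Count grid pairs (z1, z2) with nearly equal colors but distant redshifts.

    z      : array (n,), redshift grid
    colors : array (n, 2), color track C_T(z) of a single template
    dC     : maximum allowed difference in each color (photometric error)
    dz_min : minimum redshift separation for a pair to count as degenerate
    """
    # Largest per-color difference between every pair of grid points.
    dcol = np.max(np.abs(colors[:, None, :] - colors[None, :, :]), axis=-1)
    dz = np.abs(z[:, None] - z[None, :])
    degenerate = (dcol < dC) & (dz > dz_min)
    # Keep the upper triangle so that each pair is counted once.
    return int(np.count_nonzero(np.triu(degenerate, k=1)))

# Hypothetical color track with one self-crossing; NOT a galaxy template.
z = np.linspace(0.0, 5.0, 501)
track = np.column_stack([np.cos(z), np.sin(2.0 * z)])

n_small = count_degenerate_pairs(z, track, dC=0.05)
n_large = count_degenerate_pairs(z, track, dC=0.20)
```

For this toy track the number of degenerate pairs grows with `dC`, in the same qualitative way as the overlaps in the right panel grow with the photometric error.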

FIG. 2. Example of the main probability distributions involved in BPZ for a galaxy at $z = 0.28$ with an Irr spectral type and $I \approx 26$, to which random photometric noise is added. From top to bottom: (a) Likelihood functions $p(C \mid z, T)$ for the different templates used in § 4. Based on ML, the redshift chosen for this galaxy would be $z_{\rm ML} = 2.685$, and its spectral type would correspond to a spiral. (b) Prior probabilities, $p(z, T \mid m_0)$, for each of the spectral types (see text). Note that the probability of finding a spiral spectral type with $z > 2.5$ and a magnitude $I = 26$ is almost negligible. (c) Probability distributions, $p(z, T \mid C, m_0) \propto p(z, T \mid m_0)\,p(C \mid z, T)$, that is, the likelihoods in the top plot multiplied by the priors. The high-redshift peak due to the spiral has disappeared, although there is still a small chance of the galaxy being at high redshift if it has an Irr spectrum, but the main concentration of probability is now at low redshift. (d) Final Bayesian probability, $p(z \mid C, m_0) = \sum_T p(z, T \mid C, m_0)$, which has its maximum at $z_b = 0.305$. The shaded area corresponds to the value of $p_{\Delta z}$, which estimates the reliability of $z_b$ and yields a value of $\approx 0.91$.
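
The sequence of panels in Figure 2 can be mimicked with a toy calculation. In the sketch below, the two likelihood curves, the equal type fractions, and the functional form of the redshift prior are invented for illustration (they are not the templates or priors of § 4); only the way they are combined, the product rule for the prior followed by a sum over types, follows the text.

```python
import numpy as np

z = np.linspace(0.01, 4.0, 400)
dz = z[1] - z[0]

def gaussian(x, mu, s):
    return np.exp(-0.5 * ((x - mu) / s) ** 2)

# Panel (a): assumed likelihoods p(C | z, T), one row per type.
# Type 0 ("spiral") has the highest peak, at high redshift;
# type 1 ("Irr") has a slightly lower peak at low redshift.
likelihood = np.vstack([1.0 * gaussian(z, 2.7, 0.1),
                        0.8 * gaussian(z, 0.3, 0.1)])

# Panel (b): prior p(z, T | m0) = p(T | m0) p(z | T, m0), with assumed forms.
p_T = np.array([0.5, 0.5])
p_z_given_T = np.vstack([z**2 * np.exp(-z / 0.3)] * 2)
p_z_given_T /= p_z_given_T.sum(axis=1, keepdims=True) * dz
prior = p_T[:, None] * p_z_given_T

# Panel (c): likelihoods weighted by the priors.
joint = prior * likelihood

# Panel (d): marginalize over types and normalize.
posterior = joint.sum(axis=0)
posterior /= posterior.sum() * dz

# ML keeps the single highest likelihood peak; the Bayesian estimate
# takes the maximum of the marginalized posterior.
t_ml, iz_ml = np.unravel_index(np.argmax(likelihood), likelihood.shape)
z_ml = z[iz_ml]
z_b = z[np.argmax(posterior)]
```

With these assumed inputs the ML choice falls on the high-redshift peak of type 0, whereas the posterior maximum lies at low redshift, because the prior assigns very little probability to the high-redshift solution.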

solve such conundrums. For instance, it may be known from previous experience that one of the possible redshift/type combinations is much more likely than any other, given the galaxy magnitude, angular size, shape, etc. In that case, and since the likelihoods are not informative enough, Bayesian probability states that the best option would be the one more likely a priori. This is plain common sense, but it is not easy to implement using ML; at best, one can modify the redshift of the problematic objects by hand or devise ad hoc solutions for each case. In contrast, Bayesian probability theory allows one to include this additional information in a rigorous and consistent way, effectively dealing with this kind of error (see § 3).

Although in some cases the spectrum of a galaxy has no close equivalents in the template library, it will be assigned by ML the redshift corresponding to the nearest template in the color/redshift space, no matter how distant it is from the observed color (and from the real redshift) in absolute terms. The solution to this problem seems obvious: to include more templates in the library until all the possible galaxy types are considered. However, since all the templates have equal status in ML, doing this increases the number of color/redshift degeneracies. Bayesian inference is much less affected by this problem, since it weights each template by its prior probability, and therefore templates corresponding to relatively uncommon types, such as AGNs, can be included without unduly disturbing the redshift estimation for normal galaxies.

As explained above, the SED-fitting techniques perform their own "automatic" interpolation and extrapolation, so once the main spectral types are included in the template library, the results are not greatly affected if one finely interpolates among the main spectra. The effects of using a correct but incomplete set of spectra are shown in § 4.

Both sources of errors described above are exacerbated for high-redshift galaxies, which are usually faint, and therefore have large photometric errors. Moreover, the color/redshift space has a very extended range in $z$, and thus degeneracies are more likely to appear; in addition, the template incompleteness is worse, since there are few or no empirical spectra with which to compare the template library.

The effectiveness of any photometric redshift method is established by contrasting its output with a sample of galaxies with spectroscopic redshifts. It should be kept in mind, however, that the results of this comparison may be misleading, since the available spectroscopic samples are almost by definition especially well suited to photometric redshift estimation, being relatively bright (and thus with small photometric errors) and often filling a privileged niche in the color-redshift space, far from degeneracies (e.g., Lyman-break galaxies). Thus, it is risky to extrapolate the accuracy reached by current methods as estimated from spectroscopic samples (this also applies to BPZ) to fainter magnitudes. This is especially true for the training-set methods, which deliberately minimize the difference between the spectroscopic and photometric redshifts.

3. BAYESIAN PHOTOMETRIC REDSHIFTS (BPZ)

Within the framework of Bayesian probability, the problem of photometric redshift estimation can be posed as finding the probability $p(z \mid D, I)$, i.e., the probability of a galaxy having redshift $z$ given the data $D = \{C, m_0\}$, and the prior information $I$, which includes any knowledge relevant to the hypothesis under consideration not already contained in the data $D$. Although some authors recommend that the term $\mid I$ should not be dropped from the expressions of probability, here the rule of simplifying the mathematical notation whenever there is no danger of confusion will be followed, and from now on $p(z)$ will stand for $p(z \mid I)$, $p(D \mid z)$ for $p(D \mid z, I)$, etc.

As a trivial example, let us consider just one template in our library. Applying Bayes' theorem,

$$ p(z \mid C, m_0) = \frac{p(z \mid m_0)\,p(C \mid z)}{p(C)} \propto p(z \mid m_0)\,p(C \mid z) . \tag{1} $$

Here the expression $p(C \mid z) \equiv L(z)$ is simply the redshift likelihood: the probability of observing the colors $C$ if the galaxy has redshift $z$. The probability $p(C)$ is a normalization constant, and usually there is no need to calculate it.

The first factor, the prior probability, $p(z \mid m_0)$, is the redshift distribution for galaxies with magnitude $m_0$. This function allows us to include information such as the existence of upper or lower limits on the galaxy redshifts, the presence of a cluster in the field, etc. The effect of the prior $p(z \mid m_0)$ on the estimation depends on how informative it is. It is obvious that for a constant prior (all redshifts equally likely a priori), the estimate obtained from equation (1) will exactly coincide with the ML result. This is also roughly true if the prior is "smooth" enough and does not present significant structure. However, in other cases, values of the redshifts that are considered very improbable from the prior information would be "discriminated"; i.e., they must fit the data much better than any other redshift in order to be selected.

Note that rigorously, one should write the prior in equation (1) as

$$ p(z \mid m_0) \propto \int d\hat{m}_0\, p(\hat{m}_0)\,p(m_0 \mid \hat{m}_0)\,p(z \mid \hat{m}_0) , \tag{2} $$

where $\hat{m}_0$ is the "true" value of the observed magnitude $m_0$, $p(\hat{m}_0)$ is proportional to the number counts as a function of the magnitude $m_0$, and $p(m_0 \mid \hat{m}_0) \propto \exp[-(m_0 - \hat{m}_0)^2/2\sigma_{m_0}^2]$, i.e., the probability of observing $m_0$ if the true magnitude is $\hat{m}_0$. The above convolution accounts for the uncertainty in the value of the magnitude $m_0$, which has the effect of slightly "blurring" and biasing the redshift distribution $p(z \mid m_0)$. To simplify our exposition, this effect will not be considered hereafter; just $p(z \mid m_0)$ and its equivalents will be used.

3.1. Bayesian Marginalization

It may seem from equation (1) (and it is unfortunately quite a common misconception) that the only difference between Bayesian and ML estimates is the introduction of a prior; in this case, $p(z \mid m_0)$. However, there is more to Bayesian probability than priors.

The galaxy under study may belong to different morphological types, represented by a set of $n_T$ templates. This set is considered to be exhaustive, i.e., including all possible types, and exclusive: the galaxy cannot belong to two types at the same time. In that case, using Bayesian marginalization, the probability $p(z \mid D)$ can be "expanded" into a basis formed by the hypotheses $p(z, T \mid D)$ (the probability of the galaxy redshift being $z$ and the galaxy type being $T$). The sum over all these "atomic" hypotheses will give the total probability, $p(z \mid D)$. That is,

$$ p(z \mid C, m_0) = \sum_T p(z, T \mid C, m_0) \propto \sum_T p(z, T \mid m_0)\,p(C \mid z, T) , \tag{3} $$

where we have applied Bayes' theorem in the second step. Here $p(C \mid z, T)$ is the probability of the data $C$ given $z$ and $T$ (where it is assumed that it does not depend on the magnitude $m_0$). The prior $p(z, T \mid m_0)$ can be developed using the product rule,

$$ p(z, T \mid m_0) = p(T \mid m_0)\,p(z \mid T, m_0) , \tag{4} $$

where $p(T \mid m_0)$ is the galaxy type fraction as a function of magnitude, and $p(z \mid T, m_0)$ is the redshift distribution for galaxies of a given spectral type and magnitude.

Equation (3) and Figure 2 clearly illustrate the main differences between the Bayesian and ML methods. ML would just pick the highest maximum over all the likelihoods $p(C \mid z, T)$ as the best redshift estimate, without looking at the plausibility of the corresponding values of $z$ or $T$. On the contrary, Bayesian probability averages over all the likelihoods after weighting them by their prior probabilities, $p(z, T \mid m_0)$. In this way, the estimation is not affected by spurious likelihood peaks caused by noise (Fig. 2; see also the results of § 4). Of course, in an ideal situation with noiseless observations and a nondegenerate color/redshift space, the results obtained with ML and Bayesian inference would be the same.

It is straightforward to extend equation (3) to a spectral library that depends on a set of continuous parameters, e.g., synthetic templates that depend on the metallicity $Z$, the dust content, the star formation history, etc., or a set of a few empirical spectra that are expanded using the principal component analysis (PCA) technique (Sodre & Cuevas 1997). In general, if the spectra are characterized by $n_S$ possible parameters $S = \{s_1 \ldots s_{n_S}\}$ (which may be physical characteristics of the models or just PCA coefficients), the probability of $z$ given $C$ and $m_0$ can be expressed as

$$ p(z \mid C, m_0) = \int dS\, p(z, S \mid C, m_0) \propto \int dS\, p(z, S \mid m_0)\,p(C \mid z, S) . \tag{5} $$

One situation in which the use of redshift priors most clearly reveals its advantages is the study of galaxy cluster
fields, especially when near-IR photometry is not available. Since the 4000 Å break in the spectra of intermediate-redshift cluster members is difficult to distinguish from the Lyman break of higher redshift galaxies, for many objects the redshift likelihood will present two or more peaks, making ML photometric redshift estimation unfeasible for a major fraction of the sample. Using a redshift prior modeled as a smooth background component with a "spike" at the cluster redshift strongly reduces the number of objects with undetermined redshifts even with limited color information (N. Benítez & T. Broadhurst, in preparation).

3.2. The Redshift Likelihood

The redshift likelihood was written above as $p(C \mid z, T)$, assuming that it only depends on $z$ and $T$. However, the redshift likelihood usually employed by ML photometric redshift techniques also depends on $a$, the template normalization factor:

$$ -\log(L) + {\rm const} \propto \chi^2(z, T, a) = \sum_\alpha \frac{(f_\alpha - a f_{T\alpha})^2}{\sigma_{f_\alpha}^2} , \tag{6} $$

where $\{f_\alpha\}$, with $\alpha = 0, n_c$, are the observed galaxy fluxes, and $f_{T\alpha}(z)$ are the fluxes of the set of $n_T$ templates. The expression for $\chi^2$ in the previous equation can be rewritten as

$$ \chi^2(z, T, a) = F_{OO} - \frac{F_{OT}^2}{F_{TT}} + (a - a_m)^2 F_{TT} , \tag{7} $$

where

$$ a_m = \frac{F_{OT}}{F_{TT}} \tag{8} $$

is the value of $a$ that minimizes equations (6) and (7), and

$$ F_{OO} = \sum_\alpha \frac{f_\alpha^2}{\sigma_{f_\alpha}^2} , \quad F_{TT} = \sum_\alpha \frac{f_{T\alpha}^2}{\sigma_{f_\alpha}^2} , \quad F_{OT} = \sum_\alpha \frac{f_\alpha f_{T\alpha}}{\sigma_{f_\alpha}^2} . \tag{9} $$

In Bayesian probability, the nuisance parameter $a$ should be introduced from the beginning in the full expression for the redshift probability:

$$ p(z \mid C, m_0) = \int da \sum_T p(z, T, a \mid C, m_0) \propto \sum_T p(z, T \mid m_0) \int da\, p(a \mid m_0)\,p(C \mid z, T, a) , \tag{10} $$

where we have assumed that $z$ and $T$ do not depend on $a$ once $m_0$ is known. It is obvious by comparison with equation (3) that under these assumptions,

$$ p(C \mid z, T) \propto \int da\, p(a \mid m_0)\,p(C \mid z, T, a) . \tag{11} $$

In the absence of information about the shape of $p(a \mid m_0)$, a safe approach in this particular case is to assume a flat prior, $p(a \mid m_0) = {\rm const}$. Integrating over $a$ the likelihood defined using equations (6) and (7), we find

$$ p(C \mid z, T) \propto F_{TT}(z)^{-1/2} \exp\left[ -\frac{\chi^2(z, T, a_m)}{2} \right] , \tag{12} $$

i.e., the same expression that would be reached using ML, except for the normalization factor $F_{TT}^{-1/2}(z)$.

Instead of fluxes, it may be more convenient to work with colors, normalizing the total fluxes in each band by the flux in a "base" filter, e.g., the one corresponding to the band in which the galaxy sample was selected and is considered to be complete. Here the colors, $C = \{c_i\}$, are defined as $c_i = f_i/f_0$ ($i = 1, n_c$), where $f_0$ is the base flux. The exact way in which the colors are defined is not relevant; other combinations of filters are equally valid. Introducing the following definitions,

$$ C_{OO} = \sum_i \frac{c_i^2}{\sigma_{c_i}^2} , \quad C_{OT} = \sum_i \frac{c_i c_{Ti}}{\sigma_{c_i}^2} , \quad C_{TT} = \sum_i \frac{c_{Ti}^2}{\sigma_{c_i}^2} , \tag{13} $$

where $\sigma_{c_i} = \sigma_{f_i}/f_0$ and $c_{Ti} = f_{Ti}/f_{T0}$, one has

$$ \chi^2(z, T, a_m) = \sigma_0^{-2} + C_{OO} - \frac{(\sigma_0^{-2} + C_{OT})^2}{\sigma_0^{-2} + C_{TT}} , \tag{14} $$

$$ F_{TT} = a_0^2\,(\sigma_0^{-2} + C_{TT}) , \tag{15} $$

where $\sigma_0 = \sigma_{f_0}/f_0$ and $a_0 = f_{T0}/f_0$. Equations (3) and (12)-(15) will be used below in all tests and practical applications.

3.3. Prior Calibration

In those cases where the a priori information is vague and does not allow us to choose a definite expression for the prior probability, Bayesian inference allows us to "calibrate" the prior, if necessary using the very sample under consideration.

Let us suppose that the distribution $p(z, T, m_0)$ is parametrized using $n_\lambda$ continuous parameters $\lambda$. These may be the coefficients of a polynomial fit, a wavelet expansion, etc. In that case, including $\lambda$ in equation (3), the probability can be written as

$$ p(z \mid C, m_0) = \int d\lambda \sum_T p(z, T, \lambda \mid C, m_0) \propto \int d\lambda\, p(\lambda) \sum_T p(z, T \mid m_0, \lambda)\,p(C \mid z, T) , \tag{16} $$

where $p(\lambda)$ is the prior probability of $\lambda$, and $p(z, T \mid m_0, \lambda)$ is the prior probability of $z$, $T$ and $m_0$ as a function of the parameters $\lambda$. The parameters $\lambda$ have not been included in the likelihood expression, since $C$ is completely determined once the values of $z$ and $T$ are known.

Now let us suppose that the galaxy belongs to a sample containing $n_g$ galaxies. Each $j$th galaxy has a base magnitude $m_{0j}$ and colors $C_j$. The sets $\mathbf{C} \equiv \{C_j\}$ and $\mathbf{m}_0 \equiv \{m_{0j}\}$ ($j = 1, n_g$) contain the colors and magnitudes, respectively, of all the galaxies in the sample. Then the probability of the $i$th galaxy having redshift $z_i$, given the full sample data $\mathbf{C}$ and $\mathbf{m}_0$, can be written as

$$ p(z_i \mid \mathbf{C}, \mathbf{m}_0) = \int d\lambda \sum_T p(z_i, T, \lambda \mid C_i, m_{0i}, \mathbf{C}', \mathbf{m}'_0) . \tag{17} $$

The sets $\mathbf{C}' \equiv \{C_j\}$ and $\mathbf{m}'_0 \equiv \{m_{0j}\}$, $j = 1, n_g$, $j \neq i$, are identical to $\mathbf{C}$ and $\mathbf{m}_0$, except for the exclusion of the data $C_i$ and $m_{0i}$. Applying Bayes' theorem, the product rule, and simplifying,

$$ p(z_i \mid \mathbf{C}, \mathbf{m}_0) \propto \int d\lambda\, p(\lambda \mid \mathbf{C}', \mathbf{m}'_0) \sum_T p(z_i, T \mid m_{0i}, \lambda)\,p(C_i \mid z_i, T) , \tag{18} $$
where, as before, the likelihood of $C_i$ only depends on $z_i$, $T$, and the probability of $z_i$ and $T$ only depends on $\mathbf{C}'$ and $\mathbf{m}_0'$ through $\lambda$. The expression we obtain is very similar to equation (16), only now the shape of the prior is estimated from the data $\mathbf{C}'$, $\mathbf{m}_0'$. This means that even if one starts with a very sketchy idea of the shape of the prior, the very galaxy sample under study can be used to determine the value of the parameters $\lambda$, and thus to provide a more accurate estimate of the individual galaxy characteristics. Assuming that the data $\mathbf{C}'$ (as well as $\mathbf{m}_0'$) are independent among themselves,

$$ p(\lambda \mid \mathbf{C}', \mathbf{m}_0') \propto p(\lambda)\, p(\mathbf{C}' \mid \mathbf{m}_0', \lambda) = p(\lambda) \prod_{j,\, j \neq i} p(C_j \mid m_{0j}, \lambda) , \tag{19} $$

where we have taken into account that the parameters $\lambda$ do not depend on the set $\mathbf{m}_0'$ [since they describe the redshift/type prior probability distribution of a galaxy given its magnitude $m_0$, but do not contain any information about the general number counts distribution, $n(m_0)$]:

$$ p(C_j \mid m_{0j}, \lambda) = \int dz_j \sum_{T_j} p(z_j, T_j, C_j \mid m_{0j}, \lambda) = \int dz_j \sum_{T_j} p(z_j, T_j \mid m_{0j}, \lambda)\, p(C_j \mid z_j, T_j) . \tag{20} $$

If the number of galaxies in our sample is large enough, it can be reasonably assumed that the prior probability $p(\lambda \mid \mathbf{C}', \mathbf{m}_0')$ will not change appreciably with the inclusion of the data $C_i$, $m_{0i}$ belonging to a single galaxy. In that case, a time-saving approximation is to use as a prior the probability $p(\lambda \mid \mathbf{C}, \mathbf{m}_0)$, calculated using the whole data set, instead of calculating $p(\lambda \mid \mathbf{C}', \mathbf{m}_0')$ for each galaxy. In addition, it should be noted that $p(\lambda \mid \mathbf{C}, \mathbf{m}_0)$ represents the Bayesian estimate of the parameters that define the shape of the redshift distribution.

3.3.1. Physical Priors: Galaxy Evolution and Redshift Clustering

In the next section, a semiempirical parameterization $\lambda$ is chosen to describe the redshift prior, $p(z, T \mid m_0)$. This is done for convenience, but nothing prevents the use of a more physical parameter set, particularly if we consider that the parameters $\lambda$ characterize the joint magnitude-redshift-morphological type galaxy distribution, which contains important information about galaxy evolution and the fundamental cosmological parameters. If it is assumed that all the galaxies in a sample can be classified as belonging to a few morphological types, the joint redshift-magnitude-type distribution would be

$$ n(z, m_0, T) \propto \frac{dV(z)}{dz}\, \phi_T(m_0) , \tag{21} $$

where $V(z)$ is the comoving volume as a function of redshift, which depends on the cosmological parameters $\Omega_0$, $\Lambda_0$, and $H_0$, and $\phi_T$ is the Schechter luminosity function for each morphological type $T$, with the absolute magnitude $M_0$ replaced by the apparent magnitude $m_0$ (a transformation that depends on the redshifts, cosmological parameters, and morphological type). Schechter's function also depends on $M^*$, $\alpha$, and $\phi^*$, and on the evolutionary parameters $\epsilon$, such as the merging rate, the luminosity evolution, etc., and therefore, the prior probability of $z$, $T$, and $m_0$ depends on the parameters $\lambda_C = \{\Omega_0, \Lambda_0, H_0\}$, $\lambda_* = \{M^*, \phi^*, \alpha\}$, and $\epsilon$. These parameters can therefore be directly estimated from a large enough multicolor sample using equation (19).

Another instance in which equation (19) can be used beyond its original purpose of aiding the redshift estimation is in the detection of galaxy clusters. Although the presence of a high-z cluster may only produce a negligible effect in the number counts normalization, it is usually signaled by a conspicuous spike in the redshift distribution. The latter can be parametrized as a combination of a smooth "field" component and a sharp Gaussian at the cluster redshift, $z_c$, which, together with the number of galaxies in the cluster, $n_c$, can be left as free parameters and determined using equation (19). This procedure, which has already been successfully applied to search for possible galaxy clusters in deep NICMOS/VLT observations (Benítez et al. 1999), will be developed and tested in detail in a forthcoming paper (N. Benítez, in preparation).

4. A PRACTICAL TEST FOR BPZ

The Hubble Deep Field North (HDF-N; Williams et al. 1996) has become the benchmark in the development of photometric redshift techniques. In this section, BPZ will be applied to the HDF-N and its performance contrasted with the results obtained with the standard "frequentist" (in the Bayesian terminology) approach, the procedure most frequently applied to the HDF-N (Gwyn & Hartwick 1996; Lanzetta et al. 1996; Sawicki et al. 1997, etc.). The photometry used for the HDF-N is that of Fernandez-Soto, Lanzetta, & Yahil (1999), which, in addition to magnitudes in the four Hubble Space Telescope (HST) UBVI filters, also includes JHK magnitudes from the observations of M. Dickinson et al. (in preparation). Here $I_{814}$ is chosen as the base magnitude, $m_0$. The colors are defined as described in § 3.2. Mark Dickinson has kindly provided a recent compilation of (~130) HDF-N spectroscopic redshifts drawn from Cohen et al. (1996), Steidel et al. (1996), Lowenthal et al. (1997), Dickinson (1998), Weymann et al. (1998), Spinrad et al. (1998), Hogg et al. (1998), and Barger et al. (1999), together with some unpublished redshifts from the Steidel and Spinrad groups. From this catalog, we excluded the stars and object 2-256.0, whose spectroscopic redshift is problematic (M. Dickinson 1999, private communication).

After a few tests with the HDF-N spectroscopic subsample, a template library similar to that of Sawicki et al. (1997) was chosen. It contains the four Coleman, Wu, & Weedman (1980) templates (E/S0, Sbc, Scd, and Irr), plus the spectra of two starbursting galaxies from Kinney et al. (1996; Sawicki et al. 1997 used two very blue SEDs from GISSEL). All the spectra were extended to the UV using a linear extrapolation and a cutoff at 912 Å, and to the near-IR using GISSEL synthetic templates. The spectra are corrected for intergalactic absorption following Madau (1995).

It might seem in principle that a synthetic template set that takes galaxy evolution into account (at least tentatively) is more appropriate than a low-redshift empirical library extrapolated to very high redshifts. However, simple tests (see also Yee 1998) show that the extended CWW set offers much better results than the GISSEL models (Bruzual & Charlot 1993). Since the synthetic models do not seem to work well even within the relatively bright magnitude range corresponding to the HDF-N spectroscopic sample, there are few reasons to suppose that their performance will improve at fainter magnitudes.

A crucial point is to ensure that the template library covers the main characteristics of all the spectral types present in the data. Figure 3 illustrates the effects of template incompleteness on the redshift estimation. The top plot displays the results of ML (§ 3.2) redshift estimation with only the four CWW templates (this plot is similar to the $z_{\rm phot}$-$z_{\rm spec}$ diagram shown in Fernandez-Soto et al. 1999). In the bottom panel of Figure 3, the results shown also use ML (no BPZ yet), but include two more templates, SB2 and SB3 from Kinney et al. (1996). It can be seen that the new templates have little effect in the low-redshift range, but the changes at $z > 2$ are quite dramatic; the sagging of the CWW-only diagram disappears, and the general scatter of the diagram decreases by 25%.

The next step in the application of BPZ is choosing the shape of the prior probabilities. Because of the depth of the HDF-N, there is practically no reliable information about the expected redshift distribution. This is therefore a good example of a situation in which the prior calibration procedure described in § 3 should be applied. It will be assumed that the early types (E/S0) and spirals (Sbc, Scd) have a spectral type prior (eq. [4]) of the form

$$ p(T \mid m_0) = f_t\, e^{-k_t (m_0 - 20)} , \tag{22} $$

with $t = 1$ for early types and $t = 2$ for spirals. The fraction of irregulars (the remaining three templates; $t = 3$) is automatically set to complete the galaxy mix, so there is no need to parametrize it. The spectral fractions at $m_0 = 20$ are E/S0 35%, spirals 50%, and Irr 15%. The results are very robust with respect to changes in these initial values, since the free parameters $k_1$ and $k_2$ provide enough leeway to set the correct fractions from the data at fainter magnitudes.
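As a worked illustration of equation (22), and assuming that the $k_t$ values quoted in Table 1 are used together with the $m_0 = 20$ fractions given above, the type mix at $m_0 = 24$ would be roughly

$$
\begin{aligned}
p(\mathrm{E/S0} \mid 24) &= 0.35\, e^{-0.147 \times 4} \approx 0.19 , \\
p(\mathrm{Sbc,\,Scd} \mid 24) &= 0.50\, e^{-0.450 \times 4} \approx 0.08 , \\
p(\mathrm{Irr} \mid 24) &= 1 - 0.19 - 0.08 \approx 0.72 ,
\end{aligned}
$$

so under these numbers the irregular and starburst templates dominate the prior at faint magnitudes.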
The shape chosen for the redshift prior is

$$ p(z \mid T, m_0) \propto z^{\alpha_t} \exp\left\{ -\left[ \frac{z}{z_{mt}(m_0)} \right]^{\alpha_t} \right\} . \tag{23} $$

This tentatively reproduces the exponential cutoff at high redshifts present in spectroscopical redshift surveys, and has the advantage of a great flexibility: depending on the value of the parameter $\alpha_t$, it can roughly approximate almost any reasonable unimodal redshift distribution, from very narrow, concentrated ones for $\alpha_t \gg 2$ to practically flat ones at $\alpha_t \ll 1$. In this way, the prior distribution to be estimated from the data is as little biased as possible by the functional shape chosen in equation (23). The "median" redshift, $z_m$, is chosen to have a simple, linear dependence on magnitude,

$$ z_{mt}(m_0) = z_{0t} + k_{mt} (m_0 - 20) . \tag{24} $$

In total, there are 11 free parameters ($\lambda = \{\alpha_t, z_{0t}, k_{mt}, k_t\}$) to be determined using the calibration procedure.
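The priors of equations (22) to (24) are simple enough to evaluate directly. The following is a minimal sketch, not the BPZ implementation: it plugs in the best-fit values quoted in Table 1 and the $m_0 = 20$ fractions from the text, normalizes each $p(z \mid T, m_0)$ numerically on a redshift grid, and sums over types. The function names, the dictionary layout, and the grid are assumptions made for illustration, and the sketch is only meant for $m_0 \geq 20$.

```python
import numpy as np

# Best-fit values quoted in Table 1; f is the type fraction at m0 = 20 from the text.
PRIOR = {
    "E/S0":    {"alpha": 2.46, "z0": 0.431, "kmt": 0.091,  "kt": 0.147, "f": 0.35},
    "Sbc/Scd": {"alpha": 1.81, "z0": 0.390, "kmt": 0.0636, "kt": 0.450, "f": 0.50},
    "Irr":     {"alpha": 0.91, "z0": 0.063, "kmt": 0.123,  "kt": None,  "f": 0.15},
}

def trapezoid(y, x):
    """Trapezoidal rule, written out to avoid depending on a NumPy version."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

def type_prior(m0):
    """p(T | m0) from eq. (22); the Irr fraction completes the mix."""
    p = {t: v["f"] * np.exp(-v["kt"] * (m0 - 20.0))
         for t, v in PRIOR.items() if v["kt"] is not None}
    p["Irr"] = 1.0 - sum(p.values())
    return p

def redshift_prior(z, m0, t):
    """p(z | T, m0) from eqs. (23)-(24), normalized numerically on the grid z."""
    v = PRIOR[t]
    z_m = v["z0"] + v["kmt"] * (m0 - 20.0)                      # eq. (24)
    p = z ** v["alpha"] * np.exp(-(z / z_m) ** v["alpha"])      # eq. (23)
    return p / trapezoid(p, z)

def full_prior(z, m0):
    """p(z | m0): the type-weighted sum of the per-type redshift priors."""
    p_t = type_prior(m0)
    return sum(p_t[t] * redshift_prior(z, m0, t) for t in PRIOR)

# Hypothetical usage: location of the prior maximum at a few magnitudes.
z = np.linspace(0.001, 8.0, 4000)
for m0 in (22.0, 24.0, 26.0):
    p = full_prior(z, m0)
    print(m0, "prior peaks near z =", round(float(z[np.argmax(p)]), 2))
```

Normalizing on a grid rather than analytically keeps the sketch short; a finer or wider grid may be needed if the parameters are changed.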
The HDF-N data set used for the prior calibration procedure is formed by 737 galaxies with $20 < I < 27$. For the ~130 objects with spectroscopic redshifts, the likelihood $p(C \mid z, T)$ was built as a delta function located at the galaxy redshift and type $T$. The calibration sample was augmented at bright magnitudes by including the CFRS catalog (Lilly et al. 1995; Crampton et al. 1995), 591 galaxies with $20 < I < 22.5$ that were spectrally classified using their $V - I$ colors. This ensures the presence of enough galaxies at all magnitude ranges for a meaningful determination of the parameters $\lambda$.
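The calibration of equations (19) and (20) amounts to maximizing, over $\lambda$, the product of the per-galaxy evidences. The sketch below is a deliberately reduced illustration rather than the procedure actually used for the HDF-N: it assumes a single spectral type (so $\lambda$ has only three components), a flat $p(\lambda)$, a fixed redshift grid, $m_0 \geq 20$, and likelihoods already tabulated on that grid. All function names are hypothetical. It uses SciPy's Powell direction-set minimizer (the `bounds` argument for this method requires SciPy 1.5 or later) and a finite-difference Hessian of the negative log posterior, whose inverse gives Gaussian error estimates.

```python
import numpy as np
from scipy.optimize import minimize

Z = np.linspace(0.01, 6.0, 600)      # redshift grid (assumed)
DZ = Z[1] - Z[0]

def prior_z(lam, m0):
    """Single-type p(z | m0, lambda) following eqs. (23)-(24); rows are galaxies."""
    alpha, z0, km = lam
    z_m = z0 + km * (np.asarray(m0, dtype=float)[:, None] - 20.0)
    p = Z[None, :] ** alpha * np.exp(-(Z[None, :] / z_m) ** alpha)
    return p / (p.sum(axis=1, keepdims=True) * DZ)

def neg_log_posterior(lam, like, m0):
    """-ln of eq. (19) for a flat p(lambda); like[j, k] = p(C_j | z_k)."""
    alpha, z0, km = lam
    if alpha <= 0.0 or z0 <= 0.0 or km < 0.0:
        return 1e30                   # outside the allowed region
    evidence = (prior_z(lam, m0) * like).sum(axis=1) * DZ      # eq. (20)
    return -float(np.sum(np.log(evidence + 1e-300)))

def calibrate(like, m0, start=(1.5, 0.3, 0.05)):
    """Direction-set (Powell) search for the most probable lambda."""
    bounds = [(0.1, 10.0), (0.01, 2.0), (0.0, 0.5)]
    res = minimize(neg_log_posterior, np.array(start), args=(like, m0),
                   method="Powell", bounds=bounds)
    return res.x

def numerical_hessian(f, x, h=1e-3):
    """Central-difference Hessian of f at x."""
    x = np.asarray(x, dtype=float)
    n = x.size
    hess = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            ei = np.zeros(n); ei[i] = h
            ej = np.zeros(n); ej[j] = h
            hess[i, j] = (f(x + ei + ej) - f(x + ei - ej)
                          - f(x - ei + ej) + f(x - ei - ej)) / (4.0 * h * h)
    return hess
```

Given a likelihood array `like` and magnitudes `m0`, one would call `best = calibrate(like, m0)`, then `H = numerical_hessian(lambda l: neg_log_posterior(l, like, m0), best)` and take `np.sqrt(np.diag(np.linalg.inv(H)))` as approximate 1σ errors. Objects with spectroscopic redshifts enter as rows of `like` that are zero except at the grid point nearest the measured redshift.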
Table 1 shows the best values, $\hat{\lambda}$, of the parameters in equations (23) and (24), found by maximizing the probability in equation (19) using the subroutine POWELL, a direction-set minimization method (Press et al. 1992). To estimate the errors on these parameters, the region around the maximum is approximated as a multidimensional Gaussian,

$$ p(\lambda \mid D) \approx p(\hat{\lambda} \mid D) \exp\left[ -\tfrac{1}{2} (\lambda - \hat{\lambda})\, I\, (\lambda - \hat{\lambda}) \right] . $$

The Fisher information matrix, $I$, is defined as

$$ I = \frac{\partial^2 \ln [p(\hat{\lambda} \mid D)]}{\partial^2 \lambda} , $$

and calculated by numerical differentiation. The 1σ errors of the parameters, which can be estimated by inverting $I$, are shown in Table 1.

TABLE 1
PARAMETERS OF THE PRIORS, $p(z \mid T, m_0)$

Spectral Type   $\alpha_t$     $z_{0t}$         $k_{mt}$           $k_t$
E/S0            2.46 ± 0.22    0.431 ± 0.030    0.091 ± 0.017      0.147 ± 0.013
Sbc, Scd        1.81 ± 0.10    0.390 ± 0.024    0.0636 ± 0.0090    0.450 ± 0.036
Irr             0.91 ± 0.05    0.063 ± 0.013    0.123 ± 0.012      ...

FIG. 3. Top: Photometric redshifts obtained by applying our ML algorithm to the HDF-N spectroscopic sample using a template library that contains only the four CWW main types: E/S0, Sbc, Scd, and Irr. These results show characteristics similar to those of Fernandez-Soto et al. (1999). Bottom: Shows the significant improvement (without using BPZ) obtained by just including two of the Kinney et al. (1996) spectra of starburst galaxies, SB2 and SB3, in the template set. The sagging or systematic offset between $1.5 < z < 3.5$ is eliminated, and the general scatter of the relationship decreases from $\Delta z/(1 + z_{\rm spec}) = 0.09$ to $\Delta z/(1 + z_{\rm spec}) = 0.07$.

Figure 4 shows the full prior in redshift $p(z \mid m_0)$, found by summing over $T$:

$$ p(z \mid m_0) = \sum_T p(T \mid m_0)\, p(z \mid T, m_0) . \tag{25} $$

FIG. 4. Prior in redshift, $p(z \mid m_0)$, estimated from the HDF-N data using the prior calibration procedure described in § 4, for different values of the magnitude, $m_0$ ($I_{814} = 21$-$28$).

Once the priors are determined, we can proceed to the redshift estimation using equation (16). The results for individual galaxies are very robust to variations in the prior parameters within the errors shown in Table 1, and thus the multiplication by the probability distribution $p(\lambda)$ and the integration over $d\lambda$ will be skipped; the additional computational effort of performing an 11-dimensional integral is not justified.

There are several options for converting the continuous probability, $p(z \mid C, m_0)$, to a point estimate of the "best" redshift, $z_b$. Here the mode of the final probability is chosen, although taking the median value of $z$, corresponding to 50% of the cumulative probability, or even the average, $\langle z \rangle \equiv \int dz\, z\, p(z \mid C, m_0)$, could also be valid.

Since Bayesian theory works with full probabilities, it offers a way to characterize the accuracy of the redshift estimation not available to ML estimates. For instance, a 1σ error can be defined using an interval that contains 66% of the integral of $p(z \mid C, m_0)$ around $z_b$, etc. The indicator of redshift reliability chosen here is the quantity $p_{\Delta z}$, the probability of $|z - z_b| < \Delta z$, where $z$ is the galaxy redshift. When the value of $p_{\Delta z}$ is low, we are warned that the redshift probability is spread over a large range in redshift, and therefore the prediction is likely to be unreliable. As demonstrated below, $p_{\Delta z}$ efficiently picks out the galaxies with catastrophic redshift errors, usually those with multimodal or very diluted redshift likelihoods.

FIG. 5. BPZ photometric redshifts, plotted against the spectroscopical ones for the HDF-N. Some interpolation (three intermediate spectra) is performed between the main template spectra mentioned in the text, which slightly reduces (by 10%) the small-scale scatter. One of two outliers in the bottom panel of Fig. 3 is assigned a correct redshift by BPZ. The other is the only object discarded when a $p_{\Delta z} < 0.95$ threshold is applied (see text). The rms scatter around the continuous line is $\Delta z_b/(1 + z_b) = 0.059$.

The photometric redshifts resulting from applying BPZ to the HDF-N spectroscopic sample are plotted in Figure 5. Some interpolation (three intermediate spectra between each pair of our template library) has been performed, which only has the effect of slightly (10%) reducing the small-scale scatter. A finer interpolation did not appreciably improve the scatter and considerably increased the computational burden. Only one galaxy is discarded after applying the $p_{\Delta z} < 0.95$, $\Delta z = 0.2 \times (1 + z)$ threshold, not by coincidence one of the two outliers appearing in Figure 3. The other outlier is assigned a correct redshift by BPZ, and in general it is evident from Figure 5 that the agreement is very good at all redshifts. The residuals, $\Delta z_b = z_b - z_{\rm spec}$, have $\langle \Delta z_b \rangle = 0.01$. The rms of the quantity $\Delta z_b/(1 + z_{\rm spec})$ is only 0.059, and there are no appreciable systematic effects. Comparing the bottom panel of Figure 3 with Figure 5, it may seem that, apart from the elimination of two outliers, there is little advantage in using BPZ rather than ML. This is not surprising in the particular case of the HDF-N spectroscopic sample, formed of relatively bright galaxies, which
often occupy privileged regions in the color space, and which consequently have sharp likelihood peaks, little affected by smooth prior probabilities. To better illustrate the effectiveness of BPZ, especially when working under worse than ideal conditions, the photometric redshifts for the spectroscopic sample are estimated again using ML and BPZ, but restricting the color information to the UBVI HST filters. The results are plotted in Figure 6. The ML redshift diagram displays six catastrophic errors ($\Delta z \gtrsim 1$). Note that these are the same kind of errors pointed out by Ellis (1997) in the first HDF-N photometric redshift estimations. BPZ assigns correct redshifts to four of these outliers, and setting a $p_{\Delta z} > 0.95$ threshold (which discards a total of three galaxies) eliminates the other two. This is a clear example of the capability of BPZ (combined with an adequate template set) to obtain reliable photometric redshift estimates. Note that the ML estimates shown in Figure 3 presented outliers, which shows that applying BPZ to UBVI-only data may yield results more reliable than those obtained with ML including near-IR information.

FIG. 6. Top: Results of applying ML to the HDF-N spectroscopic sample using only the four UBVI HST bands. Bottom: Effects of applying BPZ to the same sample. Four of the outliers are assigned correct redshifts, and setting a $p_{\Delta z} > 0.95$ threshold (which discards three galaxies) eliminates the other two. Compare with Fig. 3 (bottom), which also includes the near-IR photometry of Dickinson et al. (1998), in preparation. Even with fewer filters, BPZ is more reliable than ML.

The advantages of BPZ can also be illustrated with a simulated sample. Such a sample can be generated using the procedure described in Fernandez-Soto et al. (1999). Each galaxy in the HDF-N is assigned a redshift and type using BPZ, and a mock catalog is created containing the colors that correspond to the best-fitting redshifts and templates. To simulate the photometric errors, a random photometric noise of the same amplitude as the real photometric error is added to each object. The bottom panel of Figure 7 shows the ML estimated redshifts for the mock catalog ($I < 28$) against the "true" redshifts; although in general the agreement is not bad (as could be expected), there is a considerable fraction of outliers (7%), whose positions illustrate the main source of color/redshift degeneracies: high-z galaxies that are erroneously assigned $z \lesssim 1$ redshifts and vice versa. This shortcoming of the ML method is analyzed in detail in Fernandez-Soto et al. (1999). In contrast, the top panel of Figure 7 shows the results of applying BPZ with a threshold of $p_{\Delta z} > 0.95$, which eliminates 20% of the initial sample (almost one-third of which have catastrophically wrong redshifts), but reduces the number of outliers to a remarkable 1%.

FIG. 7. Top: BPZ photometric redshifts, $z_b$, for the $I < 28$ HDF-N mock catalog, plotted against the "true" redshifts, $z_t$ (see text). A threshold of $p_{\Delta z} > 0.95$ has been applied, which eliminates 20% of the objects. Bottom: Results obtained by applying ML to the same mock sample. The fraction of outliers is 7%.

It is not clear how to define a reliability estimator analogous to $p_{\Delta z} > 0.95$ within the ML framework. The obvious choice, $\chi^2$, is practically useless as a criterion to pick out the outliers. Although one would naively expect that low values of $\chi^2$ (and therefore better fits) would correspond to more reliable redshifts, the bottom panel of Figure 8 shows almost the opposite to be true. This figure plots the value of $\chi^2$ versus the ML redshift error for the mock catalog, with the shaded region representing the upper quartile (25%) in $\chi^2$; most of the outliers are above it, at lower $\chi^2$. This is not difficult to understand: these outliers are caused by color/redshift degeneracies (Fig. 1), which may produce excellent fits to the colors $C$, but catastrophically wrong redshifts. In stark contrast, the top panel of Figure 8 plots the errors in the BPZ redshifts versus the values of $p_{\Delta z}$. The shaded region representing the lower quartile contains practically all the outliers. Thus, by setting an appropriate threshold, one can virtually eliminate the catastrophic errors.

FIG. 8. Top: Probability, $p_{\Delta z}$, plotted against the absolute value of the difference between the "true" redshift ($z_t$) and the one estimated using BPZ ($z_b$) for the mock sample described in § 4. The higher the value of $p_{\Delta z}$, the more reliable the redshift should be. The shaded region shows the low quartile in the value of $p_{\Delta z}$. Most of the outliers are at low values of $p_{\Delta z}$, which allows us to eliminate them by setting a suitable threshold of $p_{\Delta z}$ (see text and Fig. 7, bottom). Bottom: Plot showing that it is not possible to do something similar using ML redshifts and $\chi^2$ as an estimator. The value of $\chi^2$ of the best ML fit is plotted against the error in the ML redshift estimation, $|z_t - z_{\rm ML}|$. The shaded region shows the high quartile in the values of $\chi^2$. One would expect that low values of $\chi^2$ (and therefore better fits) would correspond to more reliable redshifts, but this obviously is not the case. This is not surprising: the outliers in this figure are all due to color/redshift degeneracies, like the one displayed in Fig. 1, which may give an extremely good fit to the colors, $C$, but a totally wrong redshift.

Figure 9 shows the numbers of galaxies above a given $p_{\Delta z}$ threshold in the HDF-N as a function of magnitude and redshifts. It shows how the reliability of photometric redshifts quickly decreases for faint $I \gtrsim 27$ objects as the fraction of objects with possible catastrophic errors steadily grows with magnitude.

There is one caveat regarding the use of $p_{\Delta z}$ or similar quantities as a reliability estimator. They provide a safety check against the color/redshift degeneracies, since basically they tell us whether there are other probability peaks comparable to the highest one, but they do not insure against template incompleteness. If the template library does not contain any spectra similar to the one corresponding to the galaxy, there is no statistical indicator that can warn us about the unreliability of the prediction.

Finally, Figure 10 shows the redshift distributions for the HDF-N galaxies with $I < 27$. No objects have been removed on the basis of $p_{\Delta z}$, so the values of the histogram bins should be taken with caution. The overplotted continuous curves are the distributions used as priors (see text). The results obtained from the HDF-N and HDF-S will be analyzed in more detail using a revised photometry elsewhere.

5. CONCLUSIONS

Although spectroscopical techniques have experienced remarkable progress in the last few years, photometric redshifts are becoming increasingly important in many areas of observational cosmology. This is not surprising, as the example of the HDF-N (Williams et al. 1996) shows: after several years, the best efforts of the astronomical community have only provided spectroscopic redshifts for ≈20% of the $I < 27$ galaxy sample. In contrast, reasonably accurate photometric redshifts for most of its galaxies were obtained within a few months by several groups, mostly using the SED-fitting method with a ML statistical approach. However, the redshifts estimated following this procedure present a considerable number (~10%) of catastrophic errors ($\Delta z \gtrsim 1$), which excludes their use for many practical applications (Ellis 1997). The other approach to photometric redshift estimation, the training-set method, is apparently more reliable, but since it cannot be extended beyond the spectroscopical limit, it is of limited use for very faint samples, such as the HDF-N.

FIG. 9. Left: Histograms showing the number of galaxies over $p_{\Delta z}$ thresholds of 0.95 and 0.99 as a function of magnitude. It can be seen that the number of galaxies with reliable photometric redshifts quickly decreases with magnitude. Right: Same as left panel, but as a function of redshift.

FIG. 10. The $z_b$ redshift distributions for the $I < 27$ HDF-N galaxies, divided by spectral types. Solid lines represent the corresponding $p(z, T)$ distributions estimated with the HDF-N and CFRS samples using the prior calibration method described in the text.

The application of Bayesian inference to photometric redshift estimation effectively overcomes most of the drawbacks of the ML and training-set methods. The use of prior probabilities and Bayesian marginalization facilitates the inclusion of relevant knowledge, such as the expected shape of the redshift distributions and the galaxy type fractions, which can be readily obtained from existing surveys but are often ignored by other methods. If this previous information is lacking or insufficient (for instance, because of the unprecedented depth of the observations), the corresponding prior distributions can be calibrated using even the data sample for which the photometric redshifts are being obtained. An important advantage of Bayesian statistics is that the accuracy of the redshift estimation can be characterized in a way that has no equivalents in other statistical approaches, enabling the selection of galaxy samples with extremely reliable photometric redshifts. In this way, it is possible to determine more accurately the properties of individual galaxies and simultaneously estimate the statistical properties of a sample in an optimal fashion. Moreover, the Bayesian formalism described here can be easily generalized to deal with a wide range of problems that make use of photometric redshifts.

There is an excellent agreement between the ≈130 HDF-N spectroscopic redshifts and the predictions of the
method, with an rms error $\Delta z \approx 0.06(1 + z_{\rm spec})$ up to $z < 6$ and no outliers or systematic biases. It should be remarked that these results have not been reached following a training-set procedure; since the template library is empirical, the above value of $\Delta z$ should be a fair estimate of the expected accuracy for any similar sample. The method is further tested by estimating redshifts in the HDF-N but restricting the color information to the UBVI filters; the results are shown to be significantly more reliable than those obtained with maximum-likelihood techniques.

I would like to thank Tom Broadhurst and Rychard Bouwens for carefully reading the manuscript and making valuable comments, Alberto Fernandez-Soto and collaborators for kindly providing the HDF-N photometry and filter transmissions, Mark Dickinson for making available his HDF-N spectroscopic redshift compilation, and Brenda Frye for help with the intergalactic absorption correction. The referee, Prasenjit Saha, made valuable comments which helped to improve the paper, in particular the suggestion to use a flat prior in $a$ for the redshift likelihood. The author acknowledges a Basque Government postdoctoral fellowship and financial support from the NASA grant LTSA NAG-3280 and the Advanced Camera for Surveys project, NASA contract NAS5-32865.

REFERENCES
Barger, A. J., Cowie, L. L., Trentham, N., Fulton, E., Hu, E. M., Songaila, A., & Hall, D. 1999, AJ, 117, 102
Benítez, N., Broadhurst, T., Bouwens, R., Silk, J., & Rosati, P. 1999, ApJ, 515, L65
Bretthorst, L. 1988, Bayesian Spectrum Analysis and Parameter Estimation (Berlin: Springer)
Bretthorst, L. 1990, J. Magnetic Resonance, 88, 552
Brunner, R. J., Connolly, A. J., Szalay, A. S., & Bershady, M. A. 1997, ApJ, 482, L21
Bruzual A., G., & Charlot, S. 1993, ApJ, 405, 538
Cohen, J. G., Cowie, L. L., Hogg, D. W., Songaila, A., Blandford, R., Hu, E. M., & Shopbell, P. 1996, ApJ, 471, L5
Coleman, G. D., Wu, C.-C., & Weedman, D. W. 1980, ApJS, 43, 393
Connolly, A. J., Csabai, I., Szalay, A. S., Koo, D. C., Kron, R. G., & Munn, J. A. 1995, AJ, 110, 2655
Connolly, A. J., Szalay, A. S., Dickinson, M., Subbarao, M. U., & Brunner, R. J. 1997, ApJ, 486, L11
Crampton, D., Le Fevre, O., Lilly, S. J., & Hammer, F. 1995, ApJ, 455, 96
Dickinson, M. 1998, in The Hubble Deep Field, ed. M. Livio, S. M. Fall, & P. Madau (Cambridge: Cambridge Univ. Press), 219
Ellis, R. S. 1997, ARA&A, 35, 389
Fernandez-Soto, A., Lanzetta, K. M., & Yahil, A. 1999, ApJ, 513, 34
Gelman, A., Carlin, J. B., Stern, H., & Rubin, D. 1998, Bayesian Data Analysis (London: Chapman & Hall)
Gunn, J. E., & Weinberg, D. H. 1995, in Wide-Field Spectroscopy and the Distant Universe, ed. S. J. Maddox & A. Aragon-Salamanca (3d ed.; Singapore: World Scientific)
Gwyn, S. D. J., & Hartwick, F. D. A. 1996, ApJ, 468, L77
Hogg, D. W., et al. 1998, AJ, 115, 1418
Jaynes, E. T. 2000, Probability Theory: The Logic of Science (Cambridge: Cambridge Univ. Press), in press
Kinney, A. L., Calzetti, D., Bohlin, R. C., McQuade, K., Storchi-Bergmann, T., & Schmitt, H. R. 1996, ApJ, 467, 38
Kodama, T., Bell, E. F., & Bower, R. G. 1998, MNRAS, 302, 152
Koo, D. C. 1985, AJ, 90, 418
Lanzetta, K. M., Yahil, A., & Fernandez-Soto, A. 1996, Nature, 381, 759
Lilly, S. J., Le Fevre, O., Crampton, D., Hammer, F., & Tresse, L. 1995, ApJ, 455, 50
Loredo, T. 1990, in Maximum Entropy and Bayesian Methods, ed. P. Fougere (Dordrecht: Kluwer), 81
Loredo, T. 1992, in Statistical Challenges in Modern Astronomy, ed. E. D. Feigelson & G. J. Babu (New York: Springer), 275
Lowenthal, J. D., et al. 1997, ApJ, 481, 673
Madau, P. 1995, ApJ, 441, 18
Pello, R., Miralles, J. M., Le Borgne, J.-F., Picat, J.-P., Soucail, G., & Bruzual, G. 1996, A&A, 314, 73
Press, W. H., Teukolsky, S. A., Vetterling, W. T., & Flannery, B. P. 1992, Numerical Recipes in FORTRAN: The Art of Scientific Computing (Cambridge: Cambridge Univ. Press)
Sawicki, M. J., Lin, H., & Yee, H. K. C. 1997, AJ, 113, 1
Sivia, D. S. 1996, Data Analysis: A Bayesian Tutorial (Oxford: Oxford Univ. Press)
Sodre, L., & Cuevas, H. 1997, MNRAS, 287, 137
Spinrad, H., Stern, D., Bunker, A., Dey, A., Lanzetta, K. M., Yahil, A., Pascarelle, S., & Fernandez-Soto, A. 1998, AJ, 116, 2617
Steidel, C. C., Giavalisco, M., Dickinson, M., & Adelberger, K. L. 1996, AJ, 112, 352
Wang, Y., Bahcall, N., & Turner, E. L. 1998, AJ, 116, 2081
Weymann, R. J., Stern, D., Bunker, A., Spinrad, H., Chaffee, F. H., Thompson, R. I., & Storrie-Lombardi, L. J. 1998, ApJ, 505, L95
Williams, R. E., et al. 1996, AJ, 112, 1335
Yee, H. K. C. 1998, preprint (astro-ph/9809347)
