
Tikhonov regularization with oversmoothing penalty for linear statistical inverse learning problems


Abhishake Rastogi

Institute of Mathematics, University of Potsdam


Karl-Liebknecht-Strasse 24-25, 14476 Potsdam, Germany

Email address: abhishake@uni-potsdam.de

Abstract. In this paper, we consider the linear ill-posed inverse problem with noisy data in the statistical learning setting. The Tikhonov regularization scheme in Hilbert scales is considered in the reproducing kernel Hilbert space framework to reconstruct the estimator from the random noisy data. We discuss the rates of convergence for the regularized solution under the prior assumptions and a link condition. For regression functions with smoothness given in terms of source conditions, the error bound can be established explicitly.
Keywords: Statistical inverse problem; Tikhonov regularization; Hilbert scales; Reproducing kernel Hilbert space; Minimax convergence rates.
2010 Mathematics Subject Classification: Primary: 62G20; Secondary: 62G08, 65J15, 65J20, 65J22.

Introduction
We consider the linear ill-posed operator equation of the form A(f) = g with a linear forward operator A : H → H0 between the infinite-dimensional Hilbert spaces H and H0. Moreover, H0 is a space of functions g : X → Y for a Polish space X (the input space) and Y a subset of the real numbers R (the output space). Ill-posed inverse problems have important applications in science and technology (see, e.g., [4]). Here we consider the problem in the statistical learning setting, in which we observe the random noisy images y_i at the points x_i. The problem can be described as follows:

y_i = g(x_i) + ε_i,   g = A(f),   (1)

where ε_i is the random observational noise for 1 ≤ i ≤ m, and m is called the sample size.
Suppose the random observations are drawn independently and identically according to the joint probability measure ρ on the sample space Z = X × Y, and that ρ can be split as ρ(x, y) = ρ(y|x)ρ_X(x), where ρ(y|x) is the conditional probability distribution of y given x and ρ_X(x) is the marginal probability distribution.
The goodness of the estimator f can be measured through the expected risk:
E_ρ(f) = ∫_Z ‖A(f)(x) − y‖²_Y dρ(x, y).   (2)

Since the probability measure ρ is unknown, we consider the Tikhonov regularization in Hilbert scales, which consists of an error term measuring the data fit and an oversmoothing penalty. We introduce an unbounded, linear, self-adjoint, strictly positive operator L : D(L) ⊂ H → H with a dense domain of definition D(L) ⊂ H to treat the oversmoothing penalty in terms of a Hilbert scale.
We define the Tikhonov regularization scheme:

argmin_{f ∈ D(L)} { (1/m) Σ_{i=1}^{m} ‖A(f)(x_i) − y_i‖²_Y + λ‖Lf‖²_H }.   (3)

Here λ is a positive regularization parameter which controls the trade-off between the error term and the complexity of the solution. In many practical problems, the operator L, which influences the properties of the regularized approximation, is chosen to be a differential operator in some appropriate function space, e.g., an L²-space.
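To make the scheme (3) concrete, the following small numpy sketch (not part of the original analysis) discretizes a toy instance: the forward operator A is taken as cumulative integration on a grid, L as a discrete second-order differential operator (self-adjoint and strictly positive), and the minimizer of (3) is obtained from the regularized normal equations. All concrete choices (grid size, noise level, value of λ) are illustrative assumptions.

    import numpy as np

    rng = np.random.default_rng(0)

    n, m = 60, 200                     # discretization level and sample size (illustrative)
    t = np.linspace(0, 1, n)           # grid on which f is represented

    # Hypothetical smoothing forward operator A: cumulative integration on the grid
    A = np.tril(np.ones((n, n))) / n

    # Model (1): y_i = (A f)(x_i) + eps_i at randomly drawn grid points x_i
    f_true = np.cos(2 * np.pi * t)
    x_idx = rng.integers(0, n, size=m)
    y = (A @ f_true)[x_idx] + 0.02 * rng.standard_normal(m)

    # Oversmoothing penalty operator L: identity plus a discrete (Neumann) Laplacian,
    # a self-adjoint, strictly positive stand-in for a differential operator
    D = n * (np.eye(n - 1, n, k=1) - np.eye(n - 1, n))
    L = np.eye(n) + D.T @ D

    # Scheme (3): minimize (1/m) sum_i |(A f)(x_i) - y_i|^2 + lam * ||L f||^2
    A_s = A[x_idx, :]                  # rows of A at the sampled design points
    lam = 1e-6                         # hand-picked for this toy problem
    f_hat = np.linalg.solve(A_s.T @ A_s / m + lam * L.T @ L, A_s.T @ y / m)

    print("relative reconstruction error:",
          np.linalg.norm(f_hat - f_true) / np.linalg.norm(f_true))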

Regularization schemes in Hilbert scales are widely considered in classical inverse problems (with deterministic noise) [3, 6, 7]. Blanchard and Mücke [2] considered the linear inverse problem in the statistical learning setting and provided rates of convergence under Hölder-type source conditions. Here we consider the Tikhonov regularization scheme in Hilbert scales for the statistical inverse problem (with random noise). The paper is organized as follows. In the first section, we discuss the basic definitions and assumptions required in our analysis. In the second section, we discuss bounds on the reconstruction error in the learning setting.

Notation and Assumptions


We start with the concept of reproducing kernel Hilbert spaces. Such a space is a subspace of L²(X, ρ_X; Y) (the space of square-integrable functions from X to Y with respect to the probability distribution ρ_X) which can be characterized by a symmetric, positive semidefinite kernel K : X × X → R, and each of its functions satisfies the reproducing property [1]. We can construct a unique real-valued reproducing kernel Hilbert space (H, ⟨·,·⟩_H) of functions from X to R as follows:
(i) We define the function K_x : X → R : t ↦ K_x(t) = K(x, t).
(ii) The span of the set {K_x : x ∈ X} is dense in H with respect to the inner product ⟨K_t, K_x⟩_H = K(t, x).
(iii) Reproducing property: f(x) = ⟨f, K_x⟩_H, x ∈ X, ∀ f ∈ H; in other words, f(x) = K_x^* f.
The space H0 is assumed to be a reproducing kernel Hilbert space of functions f : X → R corresponding to a measurable and bounded kernel K : X × X → R, i.e., κ₀² := sup_{x∈X} K(x, x) < ∞.
Now we introduce some relevant operators used in the convergence analysis. We introduce the notation for the discrete ordered sets x = (x_1, …, x_m), y = (y_1, …, y_m), z = (z_1, …, z_m). The product Hilbert space Y^m is equipped with the inner product ⟨y, y′⟩_m = (1/m) Σ_{i=1}^{m} ⟨y_i, y_i′⟩_Y and the corresponding norm ‖y‖²_m = (1/m) Σ_{i=1}^{m} ‖y_i‖²_Y. We define the sampling operator S_x : H0 → Y^m : g ↦ (g(x_1), …, g(x_m)); then the adjoint S_x^* : Y^m → H0 is given by S_x^* c = (1/m) Σ_{i=1}^{m} K_{x_i} c_i, ∀ c = (c_1, …, c_m) ∈ Y^m.
Let I_ρ denote the canonical injection map H0 → L²(X, ρ_X; Y). Then we observe that both the operators S_x and I_ρ are bounded by κ₀.
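As a quick illustration of these definitions (under the assumption of a Gaussian kernel on X = [0, 1], which is not specified in the paper), the following numpy sketch represents a function g in the span of kernel sections, applies the sampling operator S_x, and checks the adjoint identity ⟨S_x g, c⟩_m = ⟨g, S_x^* c⟩_{H0} numerically via the reproducing property.

    import numpy as np

    rng = np.random.default_rng(1)

    def K(s, t, width=0.2):
        # Gaussian kernel on X = [0, 1] (illustrative choice)
        return np.exp(-(s[:, None] - t[None, :]) ** 2 / (2 * width ** 2))

    m = 50
    x = rng.uniform(0, 1, m)               # design points x_1, ..., x_m

    # A function g in the RKHS, represented as g = sum_j alpha_j K_{z_j}
    z = rng.uniform(0, 1, 10)
    alpha = rng.standard_normal(10)

    # Sampling operator: S_x g = (g(x_1), ..., g(x_m))
    Sxg = K(x, z) @ alpha

    # Adjoint: S_x^* c = (1/m) sum_i c_i K_{x_i}; its H0 inner product with g follows
    # from the reproducing property <K_{x_i}, g>_{H0} = g(x_i)
    c = rng.standard_normal(m)
    inner_Ym = np.mean(Sxg * c)            # <S_x g, c>_m
    inner_H0 = (c / m) @ (K(x, z) @ alpha) # <S_x^* c, g>_{H0}

    print(np.isclose(inner_Ym, inner_H0))  # the adjoint relation holds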
We denote the empirical versions of the operators by B_x = S_x A L^{-1} : H → Y^m, T_x = B_x^* B_x : H → H, and the population versions by B_ρ := I_ρ A L^{-1} : H → L²(X, ρ_X; Y), T_ρ := B_ρ^* B_ρ : H → H, and L_ρ = I_ρ^* I_ρ : H0 → H0, the corresponding covariance operator. The operators L_ρ, T_ρ, T_x are positive, self-adjoint and depend on both the kernel and the marginal probability measure. The operators B_x and B_ρ are bounded by κ := κ₀‖AL^{-1}‖_{H→H0}, i.e., ‖B_x‖_{H→Y^m} ≤ κ and ‖B_ρ‖_{H→L²(X,ρ_X;Y)} ≤ κ.
By spectral theory, the operator L^s : D(L^s) → H is well defined for s ∈ R, and the spaces H_s := D(L^s), s ≥ 0, equipped with the inner product ⟨f, g⟩_{H_s} = ⟨L^s f, L^s g⟩_H, f, g ∈ H_s, are Hilbert spaces. For s < 0, the space H_s is defined as the completion of H under the norm ‖x‖_s := ⟨x, x⟩_s^{1/2}. The family (H_s)_{s∈R} is called the Hilbert scale induced by L.
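In finite dimensions the Hilbert scale is easy to emulate. The following sketch (an illustrative analogue, with an arbitrarily chosen symmetric positive definite matrix standing in for L) computes L^s by the spectral theorem, evaluates the norms ‖f‖_s = ‖L^s f‖_H for several s, and checks the semigroup identity L^s L^t = L^{s+t}.

    import numpy as np

    rng = np.random.default_rng(3)

    # An arbitrary self-adjoint, strictly positive L on a 5-dimensional space
    M = rng.standard_normal((5, 5))
    L = M @ M.T + np.eye(5)

    # Spectral calculus: L = U diag(w) U^T with w > 0, hence L^s = U diag(w^s) U^T
    w, U = np.linalg.eigh(L)

    def L_power(s):
        return U @ np.diag(w ** s) @ U.T

    f = rng.standard_normal(5)
    for s in [-1.0, -0.5, 0.0, 0.5, 1.0]:
        print(f"s = {s:+.1f},  ||f||_s = {np.linalg.norm(L_power(s) @ f):.4f}")

    # Semigroup property of the scale: L^s L^t = L^(s+t)
    print(np.allclose(L_power(0.3) @ L_power(0.7), L_power(1.0)))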
Using the above notation, the Tikhonov regularization scheme can be re-expressed as follows:

f_{z,λ} = argmin_{f ∈ D(L)} { ‖S_x A(f) − y‖²_m + λ‖Lf‖²_H },

whose minimizer is given explicitly by

f_{z,λ} = L^{-1}(T_x + λI)^{-1} B_x^* y.   (4)
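The closed form (4) follows from the substitution h = Lf, which turns the scheme into an ordinary Tikhonov problem in h with solution h = (T_x + λI)^{-1}B_x^* y. The following numpy sketch (reusing the same kind of toy discretization as above, with all concrete choices being illustrative assumptions) checks that formula (4) and the direct normal equations of (3) yield the same minimizer.

    import numpy as np

    rng = np.random.default_rng(2)

    n, m, lam = 40, 150, 1e-6
    t = np.linspace(0, 1, n)
    A = np.tril(np.ones((n, n))) / n                   # illustrative forward operator
    D = n * (np.eye(n - 1, n, k=1) - np.eye(n - 1, n))
    L = np.eye(n) + D.T @ D                            # self-adjoint, strictly positive penalty operator
    x_idx = rng.integers(0, n, size=m)
    y = (A @ np.cos(2 * np.pi * t))[x_idx] + 0.02 * rng.standard_normal(m)
    A_s = A[x_idx, :]

    # Empirical operators in the substituted variable h = L f:
    # B_x = S_x A L^{-1}, T_x = B_x^* B_x, B_x^* y (the 1/m factor comes from <.,.>_m)
    Bx = np.linalg.solve(L, A_s.T).T                   # A_s @ inv(L), via a solve (L is symmetric)
    Tx = Bx.T @ Bx / m
    Bx_star_y = Bx.T @ y / m

    # Formula (4): f_{z,lambda} = L^{-1} (T_x + lam I)^{-1} B_x^* y
    f_closed = np.linalg.solve(L, np.linalg.solve(Tx + lam * np.eye(n), Bx_star_y))

    # Direct minimization of (3) via its normal equations, for comparison
    f_direct = np.linalg.solve(A_s.T @ A_s / m + lam * L.T @ L, A_s.T @ y / m)

    print(np.allclose(f_closed, f_direct))             # both routes give the same minimizer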
Definition 1 (Index function). A function φ : R⁺ → R⁺ is said to be an index function if it is continuous and strictly increasing with φ(0) = 0.
We consider that random observations {(x_i, y_i)}_{i=1}^{m} follow the model y = A(f)(x) + ε with the centered noise ε. We assume throughout the paper that the operator A is injective.
Assumption 1 (The true solution). The conditional expectation w.r.t. ρ of y given x exists (a.s.), and there exists f_ρ ∈ H such that

E_y[y|x] = ∫_Y y dρ(y|x) = A(f_ρ)(x), for all x ∈ X.

The element f_ρ is the true solution which we aim at estimating.

Assumption 2 (Noise condition). There exist constants M, Σ such that for almost all x ∈ X,

∫_Y ( e^{‖y − A(f_ρ)(x)‖_Y / M} − ‖y − A(f_ρ)(x)‖_Y / M − 1 ) dρ(y|x) ≤ Σ² / (2M²).

This assumption is usually referred to as a Bernstein-type assumption. The smoothness of the true solution is measured in terms of the bounded, linear, injective, self-adjoint operator L^{-1}. The Hölder source condition is a special case of the condition considered here.
Assumption 3 (General source condition). The true solution f_ρ belongs to the class Ω(φ, R†) with

Ω(φ, R†) := { f ∈ H : f = L^{-1} φ(L^{-1}) v and ‖v‖_H ≤ R† }.

The polynomial function φ(t) = t^r and the logarithmic function φ(t) = t^p log^{-ν}(1/t) are examples of index functions.
We consider the following link condition, which describes the interplay between the operator L^{-1}, measuring the smoothness of the source condition, and the operator A.
Assumption 4 (Link condition). There exist an index function ψ and constants α, β > 0 such that for all u ∈ H,

α‖ψ(L^{-1})u‖_H ≤ ‖I_ρ A u‖_{L²(X,ρ_X;Y)}.

Then, letting u := L^{-1}v, we find under Assumption 4, with the function Ψ(t) := αtψ(t), that

‖Ψ(L^{-1})v‖_H ≤ ‖B_ρ v‖_{L²(X,ρ_X;Y)} = ‖T_ρ^{1/2} v‖_H,   v ∈ D(B_ρ).   (5)
Now we introduce the effective dimension, which is an important ingredient in deriving the rates of convergence. The effective dimension is defined as

N(λ) := Tr((T_ρ + λI)^{-1} T_ρ), for λ > 0.

Assumption 5 (Polynomial decay condition). Assume that there exists some positive constant c > 0 such that

N(λ) ≤ cλ^{-b}, for b < 1.
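The effective dimension is easy to evaluate when the spectrum of T_ρ is known. The sketch below (a purely illustrative assumption: eigenvalues t_k = k^{-1/b} with b = 1/2, truncated to finitely many terms) computes N(λ) and shows that N(λ)λ^b stays roughly constant, which is the polynomial decay postulated in Assumption 5.

    import numpy as np

    # Illustrative spectrum of the covariance operator T_rho: t_k = k^{-1/b}, truncated
    b = 0.5
    t = np.arange(1, 100001, dtype=float) ** (-1.0 / b)

    def effective_dimension(lam):
        # N(lambda) = Tr((T_rho + lam I)^{-1} T_rho) = sum_k t_k / (t_k + lam)
        return np.sum(t / (t + lam))

    for lam in [1e-2, 1e-3, 1e-4]:
        N = effective_dimension(lam)
        print(f"lambda = {lam:.0e},  N(lambda) = {N:8.2f},  N(lambda) * lambda^b = {N * lam ** b:.3f}")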

Convergence analysis
Blanchard and Mücke [2] discussed the rates of convergence for Tikhonov regularization schemes for the statistical inverse problem under Hölder source conditions in the learning theory framework. Mathé and Tautenhahn [5] discussed error bounds for regularization schemes in Hilbert scales in the classical inverse problem setting. Here we study the convergence issues for Tikhonov regularization schemes in Hilbert scales based on the prior assumptions and the link condition.
For fixed η, we choose the regularization parameter λ and the sample size m appropriately, such that the following holds true:

64κ² N(λ) log²(4/η) ≤ mλ   and   λ ≤ ‖T_ρ‖_{L(H)}.   (6)
Theorem 2. Let z be i.i.d. samples drawn according to the probability measure ρ. Suppose that Assumptions 1–4 and the condition (6) hold true, and that the function t → φ(t)/√t is nondecreasing. Then for ϱ(t) = Ψ^{-1}(√t), ϕ(t) = φ(Ψ^{-1}(√t)), Ψ(t) := tψ(t) and for all 0 < η < 1, the following upper bound holds for the regularized solution f_{z,λ} (4) with confidence 1 − η:

‖f_ρ − f_{z,λ}‖_H ≤ C ϱ(λ) [ Rϕ(λ) + ( κM/(mλ) + √(Σ² N(λ)/(mλ)) ) ] log(4/η),

where C depends on B, D, c_g, κ, and under the a priori choice of the regularization parameter λ* = Φ_{N,ϕ}^{-1}(1/√m) for Φ_{N,ϕ}(λ) = ϕ(λ)λ^{1/2}/√(N(λ)), the following upper bound holds with confidence 1 − η:

‖f_{z,λ} − f_ρ‖_H ≤ C′ ϱ(λ*) ϕ(λ*) log(4/η),

where C′ depends on B, D, c_g, κ, Σ, M, R.

Here we observe that under the a priori choice of the regularization parameter λ* = Φ_{N,ϕ}^{-1}(1/√m) for Φ_{N,ϕ}(λ) = ϕ(λ)λ^{1/2}/√(N(λ)), the condition (6) reduces to the following: 8κϕ(λ) log(4/η) ≤ 1 and λ ≤ ‖T_ρ‖_{L(H)}, which holds true for sufficiently small λ given the confidence parameter η.
In particular, for the link condition with ψ(t) = t^a and for the Hölder source condition f_ρ ∈ Ω(φ, R̄), φ(t) = t^p, for some R̄ > 0, we get the representation ϕ(t) = t^{p/(2(a+1))} and the following bounds:
Corollary 3. Under the same assumptions as in Theorem 2 and Assumption 5 on the effective dimension N(λ), with the a priori choice of the regularization parameter λ* = m^{-1/(2r+b+1)} and r = p/(2(1+a)), p ≥ 1, for all 0 < η < 1 the following error estimate holds with confidence 1 − η:

‖f_{z,λ} − f_ρ‖_H = O( m^{-r/(2r+b+1)} log(4/η) ).
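As a small worked illustration of Corollary 3 (with illustrative parameter values a = 1, p = 2, b = 1/2 that are not taken from the paper), the sketch below evaluates the a priori choice λ* = m^{-1/(2r+b+1)} and the resulting rate exponent r/(2r+b+1) for a few sample sizes.

    # Illustrative smoothness/decay parameters (assumptions for this sketch only)
    a, p, b = 1.0, 2.0, 0.5
    r = p / (2 * (1 + a))                          # here r = 1/2

    for m in [10**2, 10**4, 10**6]:
        lam_star = m ** (-1.0 / (2 * r + b + 1))   # a priori choice lambda* = m^{-1/(2r+b+1)}
        rate = m ** (-r / (2 * r + b + 1))         # error bound O(m^{-r/(2r+b+1)}), up to the log factor
        print(f"m = {m:>7d},  lambda* = {lam_star:.3e},  rate ~ {rate:.3e}")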

Conclusion
In our analysis, we derive upper rates of convergence over a wide class of probability measures under a general source condition in the real-valued setting. The lower rates of convergence coincide with the upper rates for the optimal parameter choice based on the smoothness parameters b and φ.

Acknowledgment
This research has been partially funded by Deutsche Forschungsgemeinschaft (DFG) through grant CRC 1294
“Data Assimilation”, Project (A04) “Non-linear statistical inverse problems with random observations”. The author is
grateful to P. Mathé and G. Blanchard for useful discussions and suggestions.

REFERENCES
[1] Nachman Aronszajn. Theory of reproducing kernels. Trans. Am. Math. Soc., 68:337–404, 1950.
[2] Gilles Blanchard and Nicole Mücke. Optimal rates for regularization of statistical inverse learning problems.
Found. Comput. Math., 18(4):971–1013, 2018.
[3] Albrecht Böttcher, Bernd Hofmann, Ulrich Tautenhahn, and Masahiro Yamamoto. Convergence rates for
Tikhonov regularization from different kinds of smoothness conditions. Appl. Anal., 85(5):555–578, 2006.
[4] Heinz W. Engl, Martin Hanke, and Andreas Neubauer. Regularization of inverse problems, volume 375. Math.
Appl., Kluwer Academic Publishers Group, Dordrecht, The Netherlands, 1996.
[5] Peter Mathé and Ulrich Tautenhahn. Interpolation in variable Hilbert scales with application to inverse prob-
lems. Inverse Probl., 22(6):2271–2297, 2006.
[6] Peter Mathé and Ulrich Tautenhahn. Error bounds for regularization methods in Hilbert scales by using
operator monotonicity. Far East J. Math. Sci., 24(1):1, 2007.
[7] M. Thamban Nair, Sergei V. Pereverzev, and Ulrich Tautenhahn. Regularization in Hilbert scales under general
smoothing conditions. Inverse Probl., 21(6):1851–1869, 2005.

