
International Journal of Computer Vision

https://doi.org/10.1007/s11263-022-01660-2

Guided Hyperspectral Image Denoising with Realistic Data


Tao Zhang1 · Ying Fu1,2 · Jun Zhang2

Received: 31 January 2022 / Accepted: 21 July 2022


© The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2022

Abstract
Hyperspectral image (HSI) denoising is widely utilized to improve HSI quality. Recently, learning-based HSI denoising methods have shown their effectiveness, but most of them are trained on synthetic datasets and lack generalization capability on real testing HSIs. Moreover, there is still no public paired real HSI denoising dataset for learning HSI denoising networks and quantitatively evaluating HSI denoising methods. In this paper, we mainly focus on how to produce realistic data for learning and evaluating HSI denoising networks. On the one hand, we collect a paired real HSI denoising dataset, which consists of short-exposure noisy HSIs and the corresponding long-exposure clean HSIs. On the other hand, we propose an accurate HSI noise model which matches the distribution of real data well and can be employed to synthesize realistic datasets. On the basis of the noise model, we present an approach to calibrate the noise parameters of a given hyperspectral camera. Besides, based on the observation that the mean image of all spectral bands has a high signal-to-noise ratio, we propose a guided HSI denoising network with guided dynamic nonlocal attention, which calculates dynamic nonlocal correlation on the guidance information, i.e., the mean image of spectral bands, and adaptively aggregates spatial nonlocal features for all spectral bands. Extensive experimental results show that a network learned with only synthetic data generated by our noise model performs as well as one learned with paired real data, and that our guided HSI denoising network outperforms state-of-the-art methods in both quantitative metrics and visual quality.

Keywords Hyperspectral image denoising · Paired real dataset · Realistic synthetic data · Guided nonlocal denoising network

Communicated by Boxin Shi.

B Ying Fu
fuying@bit.edu.cn

1 School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China
2 Advanced Research Institute of Multidisciplinary Science, Beijing Institute of Technology, Beijing 100081, China

1 Introduction

Hyperspectral image (HSI) can provide much more spectral information than RGB image, and is beneficial to numerous applications, including remote sensing (Borengasser et al. 2007; Ojha et al. 2015), computer vision (Cao et al. 2018; Kwon and Nasrabadi 2007), medical diagnosis (Lu and Fei 2014; Bjorgan and Randeberg 2015) and more. Hyperspectral imaging captures the spectral information of each spatial position in a scene over massive wavebands, and commercial hyperspectral cameras often adopt a scanning design (Porter and Enmark 1987; Basedow et al. 1995). As a result, the photon count for each band is much lower than that in an RGB image, and various noises are easily introduced during the acquisition process. This kind of degradation negatively influences not only the visual appearance of the HSI but also the performance of all downstream HSI applications (Chen et al. 2014). Thus, HSI denoising is an essential step in the pipeline of HSI analysis and processing (Fig. 1).

To remove the imaging noise, the well-known model-based HSI denoising methods often iteratively solve an optimization problem with various hand-crafted priors, such as smoothness (Yuan et al. 2012), self-similarity (Fu et al. 2017; He et al. 2019) and so on. Nevertheless, the iterative optimization procedure is time-consuming, and the hand-crafted priors cannot sufficiently represent the variety of data in the real world. Instead of costly optimization and hand-crafted priors, learning-based methods (Wei et al. 2021; Chang et al. 2018; Yuan et al. 2018; Dong et al. 2019; Lin et al. 2019) automatically learn the mapping from noisy HSI to clean HSI with convolutional neural networks (CNNs). However, existing learning-based methods rarely consider the high signal-to-noise ratio (SNR) property of the mean image of all spectral bands, as shown in Fig. 2, and dynamic spatial nonlocal correlation. The high-SNR mean image of the spectral bands can yield more accurate spatial nonlocal correlation than the low-SNR individual bands, which can effectively assist HSI denoising through dynamic nonlocal feature aggregation.

Besides, existing learning-based methods generally rely on training datasets synthesized with a simple Gaussian noise model or complex noise models (Chen et al. 2017; Wei et al. 2021). Promising results on synthetic data notwithstanding, these methods still cannot work well or be evaluated on real data, due to the lack of realistic HSI data.

There are two ways to solve this problem. One is to capture paired real data for HSI denoising network learning and evaluation, like those for RGB image denoising (Plotz and Roth 2017; Abdelhamed et al. 2018; Chen et al. 2018, 2019; Jiang and Zheng 2019). But collecting abundant high-quality real data for learning an HSI denoising network is obviously expensive and requires a large amount of labor. The other is to generate realistic paired data. This is convenient and inexpensive, but its success depends on how accurate the noise formulation model of real HSI is. The heteroscedastic Gaussian noise model (Acito et al. 2011) approximates the noise occurring in HSI better than the commonly-used homoscedastic one. Nevertheless, it cannot delineate the full picture of sensor noise in the HSI. Actually, due to the physical characteristics of hyperspectral cameras, e.g., scanning the scene in the spatial or spectral domain (Porter and Enmark 1987; Basedow et al. 1995), the captured HSI contains more complex noise than heteroscedastic Gaussian noise, as shown in Fig. 3.

Fig. 1 A scene from our collected HSI dataset; we show the spectral band at 550 nm. a The input noisy image; b The output of a CNN trained with a synthetic dataset generated by the complex noise model (Chen et al. 2017); c The output of a CNN trained with our collected paired real dataset; d The output of a CNN trained with a synthetic dataset generated by our noise model, which is comparable with c, the result trained with paired real data, and obviously outperforms the synthetic dataset generated by the complex noise model

In this paper, we mainly focus on how to capture realistic data for learning HSI denoising networks, including a paired real dataset and high-quality synthetic data performing as well as real data, as shown in Fig. 1. To the best of our knowledge, there is no public dataset for training and testing HSI denoising methods with diverse real-world data and ground truth. Therefore, we first collect a real dataset of noisy HSIs captured with short exposure time, where each noisy HSI has a corresponding long-exposure clean HSI, which is beneficial to the follow-up noise model formulation and denoising method evaluation. Then, we propose an accurate noise model for HSI, which formulates the distribution of real data well. In addition, we calibrate the parameters of the formulated noise model, and the calibrated noise model can be utilized to synthesize a realistic HSI denoising dataset. Apart from the dataset, the denoising algorithm itself is vital to the performance of HSI denoising. Based on the observation of the high signal-to-noise ratio of the mean image of all spectral bands, and inspired by traditional nonlocal means denoising algorithms, we propose a guided HSI denoising network with guided dynamic nonlocal attention (GDNA), which calculates dynamic nonlocal correlation on the guidance information, i.e., the mean image of all spectral bands, and adaptively aggregates spatial nonlocal features of all spectral bands. Extensive experiments show that an HSI denoising network learned with only synthetic data generated with our noise model can reach the same denoising performance as one trained with real data, and that our guided HSI denoising network outperforms state-of-the-art methods in both quantitative metrics and visual quality. In summary, our main contributions are that we

– Collect the first real dataset with paired noisy and clean HSIs, which will be publicly released to facilitate further research;
– Formulate a noise model and calibrate it for a given hyperspectral camera, which can be utilized to synthesize realistic noisy HSIs;
– Propose a guided nonlocal HSI denoising network, which can dynamically calculate spatial nonlocal correlation on the cleaner mean image of spectral bands.

This paper is an extension of its preliminary version that appeared in ICCV 2021 (Zhang et al. 2021). In Sect. 2, we add the relevant works. In Sect. 3, we provide the setup and process of paired real data capturing. In Sect. 4, we add parameter estimation details and compare synthetic data generated by different noise models with real data. In Sect. 5, we design a guided HSI denoising network with GDNA. In Sect. 6, we give a more in-depth analysis of our noise model and denoising network.
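The high-SNR property of the band-mean image discussed above can be checked numerically: averaging Λ bands whose noise is independent and zero-mean reduces the noise standard deviation by roughly √Λ. The following sketch is illustrative only (synthetic flat scene, made-up noise level), not a measurement on our dataset:

```python
import numpy as np

rng = np.random.default_rng(0)
bands, H, W = 34, 64, 64                 # 34 bands, matching our dataset
clean = np.full((bands, H, W), 0.5)      # flat synthetic scene
noisy = clean + rng.normal(0.0, 0.1, clean.shape)  # i.i.d. noise per band

band_noise = (noisy[0] - clean[0]).std()            # noise std in one band
mean_noise = (noisy.mean(axis=0) - clean[0]).std()  # noise std in mean image

# the mean image is roughly sqrt(34) ~ 5.8x cleaner than a single band
ratio = band_noise / mean_noise
```

Note that real HSI noise is not fully independent across bands (e.g., stripe noise is spatially structured), so the actual gain is smaller, but the mean image remains markedly cleaner, as Fig. 2 shows.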


Fig. 2 High signal-to-noise ratio of the mean image of spectral bands. The mean image of spectral bands (a) is clearer than the original spectral bands (b–d)

Fig. 3 A typical band with obvious noise in the HSI

2 Related Work

In this section, we review the most relevant studies on HSI denoising, guided image restoration and nonlocal priors for image restoration.

2.1 HSI Denoising

Noise removal from a single HSI is a well-developed topic in computer vision (He et al. 2019) and remote sensing (Liu et al. 2012). Existing HSI denoising methods can be roughly classified into two categories: model-based methods and learning-based methods.

The model-based methods often iteratively solve an optimization problem with various hand-crafted priors, e.g., smoothness (Yuan et al. 2012), sparsity (Chen and Qian 2010; Zhang et al. 2018), nonlocal self-similarity (Fu et al. 2017) and low-rankness (He et al. 2019). Yuan et al. (2012) employed spatial-spectral total variation to exploit the smoothness prior for HSI denoising. Chen et al. (2010) proposed an HSI denoising method with a sparsity prior using wavelet shrinkage and principal component analysis. Maggioni et al. (2012) extended the BM3D filter (Dabov et al. 2007) to volumetric data; the resulting BM4D explores sparsity and nonlocal self-similarity priors. Peng et al. (2014) and Fu et al. (2017) proposed dictionary learning methods for HSI denoising with sparsity and nonlocal self-similarity priors. Xie et al. (2016a) proposed an HSI denoising method with tensor sparsity regularization. Zhang et al. (2013) restored noisy HSI with low-rank matrix recovery. To exploit the low-rankness prior for HSI denoising, more elaborately designed methods have been successively proposed (Chang et al. 2020, 2017; Dong et al. 2015; He et al. 2015, 2019; Wang et al. 2017; Xie et al. 2016b; He et al. 2020). However, the iterative optimization is time-consuming, and the hand-crafted priors only model the linear properties of HSI and thus cannot sufficiently exploit the nonlinearity of the variety of HSIs in the real world.

Recently, researchers have paid more attention to learning-based methods (Wei et al. 2021; Chang et al. 2018; Yuan et al. 2018; Dong et al. 2019; Lin et al. 2019; Shi et al. 2021), which infer in less time than model-based methods by leveraging graphics processing units (GPUs) and automatically learn a deep prior from the training dataset. Chang et al. (2018) proposed an HSI denoising network with residual learning and 2D convolution. Lin et al. (2019) combined matrix factorization with a deep prior for HSI denoising. Yuan et al. (2018) employed a residual network to recover HSI with a sliding-window strategy. Dong et al. (2019) proposed a 3D HSI denoising network with a U-net architecture (Ronneberger et al. 2015) to exploit spectral and spatial correlations. Wei et al. (2021) introduced a recurrent architecture into a 3D HSI denoising network to exploit global spectral correlation. Qian et al. (2021) integrated spatial and spectral attention into a 3D network for HSI denoising. Miao et al. (2021) proposed an unsupervised HSI denoising method with disentangled spatial-spectral deep priors. Zhao et al. (2022) employed spectral-spatial transform based sparse and low-rank representations for HSI denoising. Nevertheless, due to the lack of a paired real HSI denoising dataset, these powerful learning-based methods are often trained with synthetic data. The most widely-used approach applies additive white Gaussian noise to generate noisy HSI (Chang et al. 2018; Lin et al. 2019; Yuan et al. 2018; Dong et al. 2019). However, even the complex noise models (Chen et al. 2017; Wei et al. 2021) cannot effectively model the noise in realistic testing HSI, which leads to significant performance reduction according to our experiments.

The distribution difference between synthetic data and real images is a general problem in learning-based denoising methods. To alleviate this issue, there are mainly two solutions. On the one hand, some works (Abdelhamed et al. 2018; Chen et al. 2018, 2019; Jiang and Zheng 2019) collect paired real RGB images not only for evaluation but also for network learning. For instance, Chen et al. (2018) collected a paired real dataset, with short-exposure noisy images and the corresponding long-exposure clean images, for RGB image denoising. Nevertheless, capturing an abundant paired real dataset is obviously expensive and requires a great amount of labor. More importantly, to the best of our knowledge, there is no appropriate paired real dataset for training HSI denoising networks.

On the other hand, to evade the difficulties in capturing paired real data with a camera, some researches focus on improving the realism of synthetic datasets. By considering both photon and thermal noise, the work of Acito et al. (2011) utilized a signal-dependent heteroscedastic Gaussian model to characterize the noise properties in real HSI. More recently, Chen et al. (2017) and Wei et al. (2021) utilized a noise model considering Gaussian noise, stripe noise, deadline noise and impulse noise to simulate noisy HSI. However, these methods either oversimplify the noise ingredients caused by the sensor, or do not estimate the noise parameters for real noisy HSI.

In this work, we collect the first dataset with short-exposure noisy HSIs and corresponding long-exposure clean HSIs to support systematic reproducible research in HSI denoising. Based on the captured real HSI dataset, we propose an accurate noise formulation model and a corresponding noise parameter estimation method to synthesize a realistic dataset, and verify its effectiveness by comparing with the captured real HSI dataset.

2.2 Guided Image Restoration

Guided image restoration employs additional information to assist the image recovery. The famous guided filter (He et al. 2012) generates filter weights by utilizing an additional image as guidance, and achieves good performance in many image recovery tasks, such as image denoising (Monno et al. 2015). Recently, researchers (Hui et al. 2016; Li et al. 2016; Zheng et al. 2018; Zhou et al. 2021; Fu et al. 2019; Wang et al. 2018; Guo et al. 2021; Kawakami et al. 2013; Ma et al. 2014) have employed guidance information for various image restoration tasks, especially for super-resolution. Some methods (Hui et al. 2016; Li et al. 2016) employ an RGB image as the guidance information to super-resolve the depth image. Zhou et al. (2021) employed cross-scale stereo images as guidance information for image super-resolution. Fu et al. (2019) proposed a hyperspectral image super-resolution method with RGB image guidance. Wang et al. (2018) proposed a super-resolution method guided by semantic information, which can generate realistic textures.

Apart from these, some researchers pay attention to employing self-information as guidance. Gu et al. (2019) proposed a self-guidance network for image denoising, where the features in the low-resolution branch are employed to guide the features in high-resolution branches. Liu et al. (2020) considered the high sampling rate of the green channel and additionally introduced a green-channel guidance branch for joint denoising and demosaicing.

Fig. 4 Paired real data capture setup

In this work, we employ the mean image of spectral bands as guidance information, which is clearer than each original band and can be used to calculate more accurate spatial nonlocal correlation.

2.3 Nonlocal Prior for HSI Restoration

Nonlocal self-similarity is an effective prior and has been widely employed in HSI restoration tasks, e.g., HSI super-resolution (Fu et al. 2018) and coded HSI reconstruction (Wang et al. 2021). Some traditional HSI denoising methods (Fu et al. 2017; He et al. 2020) employ nonlocal self-similarity and achieve astonishing performance. Following the importance of nonlocal self-similarity, researchers have recently designed various nonlocal neural networks (Xiong et al. 2021; He et al. 2021; Shi et al. 2021) to utilize this prior. Besides, some transformer-based methods (Cai et al. 2021) were proposed to exploit this prior. The traditional nonlocal prior often employs only a few highly correlated tokens to aggregate features, while existing deep learning methods calculate correlation and aggregate features over all tokens in the search region, which ignores how similar they are to the query token.

In this work, we propose a dynamic nonlocal mechanism, which exploits the nonlocal self-similarity prior and dynamically groups the high-similarity tokens to aggregate features.

3 Real HSI Denoising Dataset

The existing learning-based methods (Wei et al. 2021; Chang et al. 2018; Yuan et al. 2018; Dong et al. 2019; Lin et al. 2019) are often trained and evaluated on synthetic data, and their denoising and generalization capabilities on real data are not considered, as there is no appropriate paired real HSI denoising dataset.

Fig. 5 Example images in our captured HSI denoising dataset. Outdoor images are in the top two rows, indoor images in the bottom two rows. Long-exposure clean images (ground truth) are shown on the left, and short-exposure noisy images on the right of each pair. The noisy images are captured with 1/50 of the exposure time of the reference image. The RGB is synthesized from the HSI bands at 482 nm, 539 nm and 607 nm

Algorithm 1: Data acquisition process
Input: Exposure factor s
Output: Paired real HSI denoising dataset
while not enough scenes do
    Adjust camera settings to maximize the quality of the reference image;
    Find an exposure time T;
    Capture a reference image with exposure time T;
    Capture a noisy image with exposure time T/s;
end

To support systematic research, we collect the first real dataset for training and benchmarking HSI denoising. We employ a SOC710-VP hyperspectral camera, manufactured by Surface Optics Corporation (SOC), USA. The SOC710-VP hyperspectral camera is equipped with a silicon-based charge-coupled device (CCD) and an integrated scanning system. With standard settings, the SOC710-VP can capture HSI with 696 × 520 pixels in spatial resolution and 256 spectral bands from 376.76 to 1075.80 nm at 2.7 nm intervals. The camera is mounted on a sturdy tripod, and the data capture setup is shown in Fig. 4. For each scene, we adjust camera settings such as aperture, focus and exposure time to maximize the quality of the reference image. Capturing a reference image often needs dozens of seconds. Then, we employ remote control software to deliberately decrease the exposure time by a factor s for the short-exposure image. In other words, we capture the noisy image within a fraction of a second to a few seconds. The flow chart of data capture is shown in Algorithm 1. Since we capture multiple images for one scene and the exposure time for the reference image is necessarily long, all scenes in the dataset are static. Some samples of reference images and corresponding noisy images are shown in Fig. 5. The dataset contains both indoor and outdoor scenes. To capture high-quality reference images, we light indoor scenes with an incandescent lamp. The outdoor images are generally captured on sunny days between around 9:00 and 17:00, when the scene is well-illuminated.

Following existing HSI datasets (Yasuma et al. 2010; Arad and Ben-Shahar 2016; Chakrabarti and Zickler 2011), we select 34 bands from around 400 to 700 nm in the visible spectral range. The dynamic range of the captured HSI is 12 bit, and the spectral value range is 0–4095. This dataset contains 62 long-exposure clean HSIs, each of which is paired with a corresponding noisy HSI captured with a short exposure time. We select the paired noisy and reference images of 17 scenes to form the testing set, and the rest form the training set. The captured real HSI denoising dataset can be used to evaluate the generalization capability of existing learning-based methods and to verify the effectiveness of the noise model proposed in the next section.

Fig. 6 Overview of hyperspectral camera imaging pipeline

4 Noise Modeling for HSI

Existing learning-based HSI denoising methods (Wei et al. 2021; Chang et al. 2018; Yuan et al. 2018; Dong et al. 2019; Lin et al. 2019) are always learned on synthetic datasets, which follow various noise models. However, there is no systematic and quantitative evaluation of these noise models on real testing data.

In this section, we focus on synthesizing a realistic HSI denoising dataset. We first formulate the noise model of HSI; the imaging overview is shown in Fig. 6. Then, we introduce a noise parameter estimation approach to calibrate the noise model. Finally, we synthesize a realistic dataset for learning an HSI denoising network based on the calibrated noise model.
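For concreteness, the capture loop of Algorithm 1 in Sect. 3 can be written as a short driver script. The `camera` object and its three methods below are hypothetical stand-ins for the vendor's remote-control software, not an actual SOC API; `s` is the exposure factor from Algorithm 1 (s = 50 in our dataset):

```python
def capture_paired_dataset(camera, scenes, s=50):
    """Collect (clean, noisy) HSI pairs following Algorithm 1: for each
    static scene, tune settings for a high-quality reference, choose a
    long exposure time T, capture the reference at T, and capture the
    noisy image at the reduced exposure T / s."""
    pairs = []
    for scene in scenes:
        camera.adjust_settings(scene)           # maximize reference quality
        T = camera.find_exposure_time(scene)    # often dozens of seconds
        clean = camera.capture(exposure=T)      # long-exposure reference
        noisy = camera.capture(exposure=T / s)  # short-exposure noisy image
        pairs.append((clean, noisy))
    return pairs
```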


4.1 Noise Formulation

Nowadays, most hyperspectral cameras utilize CCD sensors. Fortunately, some works have discussed the various noise components associated with CCD camera systems (Healey and Kondepudy 1994; Schott 2007; Holst 1998). Let X(m, n, λ) indicate the clean HSI, where 1 ≤ m ≤ M and 1 ≤ n ≤ N index the spatial coordinates and 1 ≤ λ ≤ Λ indexes the spectral coordinate. A linear model describing the relationship between the raw sensor output in digital numbers, i.e., the noisy HSI, and the photoelectrons integrated during exposure can be expressed as

Y(m, n, λ) = X(m, n, λ) + N(m, n, λ)
           = kL(m, n, λ) + N(m, n, λ),    (1)

where Y is the noisy HSI, N indicates the summation of all noises physically caused by light and camera, L represents the number of photoelectrons, which is proportional to the scene irradiation, and k denotes the system gain. Note that we assume the system gain of all elements in the captured HSI is identical, due to the single CCD sensor in the hyperspectral camera.

To systematically analyze the noise in HSI, we divide the noise into two components, i.e., signal-dependent noise (correlated with incident light) and signal-independent noise (uncorrelated with incident light).

4.1.1 Signal-Dependent Noise

During the exposure time, photons in the incident light hit the sensing area of the sensor. Leveraging photoelectric conversion, the sensor transforms photons into electrons. However, due to the quantum nature of light, the number of electrons collected by the sensor carries an ineluctable uncertainty, which can be formulated as a Poisson distribution

[L(m, n, λ) + Nsd(m, n, λ)] ∼ p(L(m, n, λ)),    (2)

where Nsd denotes the signal-dependent noise (i.e., shot noise) and p(·) represents the Poisson distribution. Generally, due to this characteristic of light, shot noise cannot be avoided by hyperspectral sensors. Overall, the signal-dependent noise is the shot noise caused by the photons-to-electrons stage.

4.1.2 Signal-Independent Noise

For CCD sensors in hyperspectral imaging, during the exposure time, thermal energy in silicon generates free electrons, known as dark current noise, which can be stored at collection sites and thereafter become indistinguishable from photoelectrons. Read noise is caused by the amplifier, reset, and other electronic noise sources during the electrons-to-voltage stage. During the voltage-to-digital stage, due to the dynamic range of the digital storage medium, the continuous analog voltage signal is quantized to discrete digital numbers, which causes quantization noise. These noises can be described as

Np(m, n, λ) = Nd(m, n, λ) + Nr(m, n, λ) + Nq(m, n, λ),    (3)

where Nd is the dark current noise, Nr is the read noise, and Nq is the quantization noise. As these noises are caused by the camera circuit and are not related to the incident light, they are signal-independent noises. Following the zero-mean noise assumption for all pixels, these signal-independent noises can be formulated as a Gaussian distribution

Np(m, n, λ) ∼ g(0, σp²(λ)),    (4)

where g(·) represents the Gaussian distribution and σp(λ) denotes the scale parameter for the λ-th band.

Apart from these signal-independent noises, due to the spatial scanning design, hyperspectral cameras often suffer from stripe pattern noise, as illustrated in Fig. 3. Analyzing the hyperspectral camera employed to capture our real HSI dataset, we find it contains both horizontal and vertical stripe pattern noise, as shown in Fig. 7. The reason is that each row is captured by the same CCD unit during horizontal scanning, which causes the horizontal stripe pattern noise, while each column is captured at a different time, which causes the vertical stripe pattern noise. Thus, the stripe pattern noise Nsp can be represented as

Nsp(m, n, λ) = Nsp^h(m, n, λ) + Nsp^v(m, n, λ),    (5)

where Nsp^h and Nsp^v denote horizontal and vertical stripe pattern noise, respectively. The stripe pattern noise is caused by the scanning camera design and is not related to the incident light, so it is signal-independent noise. Following the assumption of a zero-mean Gaussian distribution in each row or column, we respectively formulate the horizontal and vertical stripe pattern noise as

Nsp^h(m, λ) ∼ g(0, σh²(λ)),
Nsp^v(n, λ) ∼ g(0, σv²(λ)),    (6)

where σh(λ) and σv(λ) denote the scale parameters of horizontal and vertical stripe pattern noise for the λ-th band, respectively. Thus, the full signal-independent noise can be described as

Nsi = Np + Nsp.    (7)
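A minimal NumPy sketch of sampling this noise model may make the formulation concrete. The parameter values here are placeholders; the calibrated per-band scales come from the estimation procedure described in the next subsection:

```python
import numpy as np

def sample_noisy_hsi(L, k, sigma_p, sigma_h, sigma_v, rng=None):
    """Sample Y = k * Poisson(L) + N_p + N_sp for an HSI.

    L: expected photoelectron counts, shape (bands, H, W).
    k: system gain. sigma_p/h/v: per-band scales, each shape (bands,).
    Shot noise follows Eq. (2), pixel noise Eq. (4); stripe noise is
    drawn once per row (horizontal) or per column (vertical), Eq. (6).
    """
    rng = np.random.default_rng() if rng is None else rng
    bands, H, W = L.shape
    shot = rng.poisson(L).astype(np.float64)                      # L + N_sd
    n_p = rng.normal(0.0, sigma_p[:, None, None], L.shape)        # pixel noise
    n_h = rng.normal(0.0, sigma_h[:, None, None], (bands, H, 1))  # per-row
    n_v = rng.normal(0.0, sigma_v[:, None, None], (bands, 1, W))  # per-column
    return k * shot + n_p + n_h + n_v  # Y = k(L + N_sd) + N_si
```

Broadcasting replicates each row sample across columns (and vice versa), which is exactly what makes the samples appear as stripes.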


σh (λ)
System gain k
Scales σ p (λ)

Scales σv (λ)
Parameter

Scales g(0, σh2 (λ))


N p = Nd + Nr + Nq
[L(λ) + Nsd (λ)] ∼ p(L(λ))

N p (n, m, λ) ∼ g(0, σ p2 (λ))

Nsp (n, λ) ∼ g(0, σv (λ))


2

Distribution
Fig. 7 Analysis of stripe pattern noise. Performing discrete Fourier

h (m, λ)
transform on a bias frame, the highlighted vertical and horizontal pat-
terns in the centralized Fourier spectrum verify the existence of both

Nsp
v
horizontal and vertical stripe pattern noises, respectively

To summarize, our full noise model can be formulated as

N = kNsd + Nsi ,

Photons-to-electrons
Photons-to-electrons
(8)

Electrons-to-voltage
= kNsd + N p + Nsp ,

Scanning system
Voltage-to-digit
where k, Nsd and Nsi represents system gain, signal-
dependent noise and signal-independent noise, respectively. Stage

4.2 Noise Parameter Estimation

To calibrate the noise parameters of our proposed noise model


for the given hyperspectral camera, we introduce a noise

p(·) denotes Poisson distribution and g(·) is Gaussian distribution, respectively


Dark current noise Nd Read noise

Horizontal stripe pattern noise Nsp


h

parameter estimation method. According to Eqs. (2) (4) and Vertical stripe pattern noise Nsp
v

(6), Table 1 summarizes the four parameters in our noise


Nr Quantization noise Nq

model need to be calibrated, including the system gain k


for signal-dependent noise Nsd , the scale parameters σ p for
pixel noise N p , and scale parameters σh and σv for stripe
pattern noise Nsp . The system gain k is parameter of signal-
Noise name

dependent noise and the others scale parameters σ p , σh and


Shot noise

σv are parameters of signal-independent noise.

4.2.1 Estimating Parameter of Signal-Dependent Noise


Table 1 Summary of our noise model

According to Eqs. (1) and (8), a noisy HSI Y can be expressed


as
Signal-independent noise Nsi
Signal-dependent noise Nsd

Y = k(L + Nsd ) + Nsi . (9)

Following Eq. (9), the variance of noisy HSI can be repre-


Noise type

sented as

V ar (Y) = k 2 V ar (L + Nsd ) + V ar (Nsi ), (10)

123
International Journal of Computer Vision

Fig. 9 The distribution fitting of signal-independent noise (a), horizon-


Fig. 8 Estimation of system gain k. a The colorchecker employed to tal (b) and vertical (c) stripe noises. λ = 550 nm
estimate system gain k. b Regression result. The vertical axis denotes
the noisy signal variance and the horizontal axis is the underlying true
signal value, which satisfy a linear function. The slope of linear function approximating Gaussian distribution. Further, we subtract the
represents the estimated system gain k (Color figure online) estimated stripe pattern noise from bias image and estimate
the scale parameters σ p (λ) of pixel noise by maximizing the
log-likelihood of Gaussian distribution. Figure 9 shows the
where V ar (·) is the function to calculate the variance of a
probability plot. The goodness-of-fit can be evaluated by the
random variable. As (L + Nsd ) following a Poisson distri-
coefficient of determination R 2 (Morgan et al. 2011), and the
bution, the variance of it equals to its mean. Equation (10)
R 2 closer to 1 illustrates the fitness is better. We can see that
can be transfered to
Var(Y) = k^2 L + Var(N_{si}) = k(kL) + Var(N_{si}),    (11)

where kL is the underlying true signal, represented in digital numbers.

To estimate the system gain k, we need to fit a line leveraging a set of observations {Var(Y), kL}. The signal variance Var(Y) is easy to calculate, while to get the true signal kL we employ an image with uniform content and approximate it by the median statistics. Specifically, we capture a colorchecker under uniform light, as shown in Fig. 8(a). Each color block in the colorchecker is cropped to form a sequence of images captured with different intensities. Thus, only one image needs to be captured rather than multiple images with different luminous fluxes. Given the signal variance Var(Y) and the true signal kL, we calculate a linear least-squares regression to estimate the system gain k. As shown in Fig. 8(b), the point samples almost perfectly lie on a straight line, and its slope denotes the estimated system gain k. With the estimated system gain k, we can add signal-dependent noise to the HSI.

4.2.2 Estimating Parameter of Signal-Independent Noise

To estimate the scale parameters σ_p, σ_h and σ_v, the bias image is captured with the shortest exposure time under a lightless condition, i.e., in a dark room with the cap on the camera lens. The bias image delineates the noise independent of light, blended from the multiple noise sources mentioned above. Leveraging the zero-mean assumption of the pixel noise N_p, we first extract the mean values of each column or row of the λ-th band in the bias image to estimate the underlying intensities of the vertical or horizontal stripe pattern noise. Then, we can readily estimate the scale parameters σ_h(λ) or σ_v(λ). The R^2 of σ_p(λ), σ_h(λ) and σ_v(λ) are all close to 1, which means our method fits the empirical data well.

To make our calibrated noise model more robust, we estimate the scale parameters over a sequence of bias images and fit them with a Gaussian distribution in the logarithmic domain. For example, we first employ the λ-th band of a sequence of bias images to estimate a group of scale parameters σ_p(λ). Then, we fit these parameters σ_p(λ) with a Gaussian distribution and obtain the mean and variance parameters a_p(λ) and b_p^2(λ). In the same way, we can also obtain the mean parameters a_h(λ), a_v(λ) and variance parameters b_h^2(λ), b_v^2(λ) for the horizontal and vertical stripe pattern noises. When we employ our calibrated noise model, we can sample the noise parameters from

log(σ_p(λ)) ∼ g(a_p(λ), b_p^2(λ)),
log(σ_h(λ)) ∼ g(a_h(λ), b_h^2(λ)),    (12)
log(σ_v(λ)) ∼ g(a_v(λ), b_v^2(λ)),

where a(λ) and b^2(λ) denote the estimated mean and variance of the Gaussian distribution for the λ-th band, respectively.

4.3 Noisy Image Synthesis

Here, we describe how to synthesize a noisy image with the calibrated noise model. We first divide the clean HSI by a factor to match the intensity of the short-exposure noisy image, which follows the setting of our captured real HSI denoising dataset in Sect. 3. Then, we add the signal-dependent noise N_{sd} by converting the HSI X into the number of photoelectrons L, imposing a Poisson distribution on L, and reverting the signal to X with the estimated system gain k. Besides, we sample the noise parameters according to Eq. (12), and generate the pixel noise N_p and stripe pattern noise N_{sp}. These generated noises are added to the scaled clean HSI. Finally, we multiply the

International Journal of Computer Vision

noisy HSI by the same factor to match the intensity of the clean HSI.

Following these steps, we generate a realistic synthetic dataset with rich paired noisy and clean HSIs, which is beneficial for training an HSI denoising network and generalizing it to the evaluation on noisy HSIs in the real world.

4.4 Comparison of Real and Synthetic Data

To compare the synthetic data produced by our noise model with real data and with synthetic data produced by previous noise models, e.g., the homoscedastic Gaussian noise model, the heteroscedastic Gaussian noise model and the complex noise model (Chen et al. 2017), we summarize the advantages and disadvantages of the different data in Table 2. Collecting abundant high-quality real data is expensive and requires a large amount of labor. In contrast, synthesizing realistic paired data is convenient and inexpensive. Synthetic data generated with our noise model is accurate, and can make the HSI denoising network trained on it perform as well as one trained on real data. Besides, both real data and synthetic data generated with our noise model can be employed to benchmark HSI denoising algorithms.

5 Guided HSI Denoising Network

In this section, we first describe the motivation of our method. Then, we describe the overall network architecture of the guided HSI denoising network. Finally, we introduce the guided dynamic nonlocal attention (GDNA) module, which can adaptively guide the HSI denoising by the guidance information, i.e., the mean image of spectral bands.

5.1 Motivation

An HSI has dozens to hundreds of spectral bands, and each band often contains random noise, including signal-dependent noise and signal-independent noise. For natural image denoising, researchers often capture several images and average them to obtain a clean image (Abdelhamed et al. 2018). If we treat each spectral band as a gray image and average the bands to obtain a mean image, the signal-independent noise can be effectively removed. Therefore, the signal-to-noise ratio of the mean image of all spectral bands is higher than that of each original band, as shown in Fig. 2. Besides, an HSI often has similar blocks in different spatial positions. On the basis of this observation, traditional HSI denoising methods (Fu et al. 2017; He et al. 2020) widely employ the nonlocal self-similarity prior, which calculates nonlocal correlation with dynamic block groups. Inspired by that, recent deep learning methods (Shi et al. 2021) design nonlocal attention networks for HSI denoising, which, however, exploit nonlocal correlation only in fixed neighborhoods.

In this work, we propose a guided HSI denoising network, which can effectively employ the prior of the mean image of all spectral bands. Specifically, we design a guided dynamic nonlocal attention module, where the attention map is calculated from the guidance information and the related blocks are dynamic for each spatial position.

5.2 Network Architecture

Our network has two branches, i.e., a main denoising branch and a guidance information branch. Recent research (Long et al. 2015; Chen et al. 2018; Xu et al. 2015) has shown the effectiveness of CNNs in image-to-image tasks, especially U-Net (Ronneberger et al. 2015). Inspired by it, we employ a modified U-Net with 3D convolution as the main denoising branch to better exploit spectral correlation. The architecture of the network is illustrated in Fig. 10. The main denoising branch consists of 4 encoder stages and 4 corresponding decoder stages. At the end of each encoder stage, the feature maps are downsampled to 1/2× scale with a 1 × 4 × 4 kernel size and stride-2 convolution. Before each decoder stage, the feature maps are upsampled to 2× scale with a 1 × 2 × 2 kernel size and stride-2 transposed convolution. Skip connections pass large-scale low-level feature maps from each encoder stage to its corresponding decoder stage. We utilize the residual block as the fundamental block to build the encoder and decoder, which is composed of two 3 × 3 × 3 convolutions followed by the LeakyReLU activation function and a 1 × 1 × 1 convolution.

For the guidance information branch, we employ an encoder with several residual blocks and downsampling layers, and map the guidance information to the desired spatial resolution. Note that we only employ 2D convolution in this branch, as the mean image only has one channel and no spectral correlation needs to be exploited. Specifically, to balance the performance and computational cost, we plug the guided dynamic nonlocal attention module into the 1/2× spatial resolution features of the main denoising branch in both the encoder and decoder.

We employ the Charbonnier loss (Charbonnier et al. 1994) to optimize the proposed guided HSI denoising network, which can be formalized as

L(X̂, X) = √(||X̂ − X||^2 + ε^2),    (13)

where X̂ denotes the denoised HSI, and ε is a constant parameter which is empirically set to 0.001.

5.3 Guided Dynamic Nonlocal Attention

As mentioned in Sects. 2.2 and 2.3, self-guidance information has been employed in several image restoration tasks


Table 2 Comparison of real and synthetic data

                                          Inexpensive  Convenient  Accurate  Benchmarking
Real data                                                          ✓         ✓
Synthetic data with previous noise model  ✓            ✓
Synthetic data with our noise model       ✓            ✓           ✓         ✓*

Note that synthetic data with our noise model is more accurate than that with the previous models, and is realistic enough to benchmark HSI denoising algorithms, even if it is not real data
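The accuracy of the synthetic data rests on the calibration of Sect. 4.2.1: the photon-transfer fit, whose slope of variance against mean signal equals the system gain k. A minimal sketch on simulated data; the gain value, sample counts and the Gaussian approximation of Poisson shot noise are illustrative assumptions, not values from the paper:

```python
import random
import statistics

def estimate_gain(samples_per_level):
    """Least-squares slope through (mean, variance) pairs, one pair per
    intensity level: since Var(Y) = k * (kL), the slope estimates k."""
    points = [(statistics.fmean(s), statistics.pvariance(s))
              for s in samples_per_level]
    mx = statistics.fmean(m for m, _ in points)
    mv = statistics.fmean(v for _, v in points)
    num = sum((m - mx) * (v - mv) for m, v in points)
    den = sum((m - mx) ** 2 for m, _ in points)
    return num / den

rng = random.Random(0)
K_TRUE = 0.8  # hypothetical system gain (photoelectrons -> digital numbers)
levels = []
for L in (200, 500, 1000, 2000, 4000):  # photoelectron counts per color block
    # Poisson shot noise approximated by N(L, L), valid for large L
    levels.append([K_TRUE * (L + rng.gauss(0.0, L ** 0.5))
                   for _ in range(20000)])

k_hat = estimate_gain(levels)  # should recover roughly K_TRUE
```

Each list in `levels` plays the role of one cropped color block of the colorchecker; with real captures, the true signal would instead be approximated by the median statistics of the uniform blocks.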

Fig. 10 The overview of the proposed guided HSI denoising network architecture, which consists of a main denoising branch and a guidance information branch. The main denoising branch employs U-Net as the basic architecture and the residual block as its fundamental block. We employ an encoder to extract guidance information from the mean image, and guided dynamic nonlocal attention (GDNA) to guide the HSI denoising. Note that we employ 3D convolution in the main denoising branch and 2D convolution in the guidance branch, respectively

Fig. 11 Guided dynamic nonlocal attention (GDNA)
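The network of Fig. 10 is optimized with the Charbonnier loss of Eq. (13). A minimal sketch over a flat list of values; whether the norm is taken over the whole tensor or per pixel is an implementation choice not fixed by the text:

```python
import math

def charbonnier_loss(pred, target, eps=1e-3):
    """Charbonnier loss (Eq. 13): sqrt(||pred - target||^2 + eps^2),
    a smooth, everywhere-differentiable relative of the L1 error."""
    sq = sum((p - t) ** 2 for p, t in zip(pred, target))
    return math.sqrt(sq + eps * eps)

loss_zero = charbonnier_loss([0.1, 0.2, 0.3], [0.1, 0.2, 0.3])
# for a perfect prediction the loss falls back to eps (0.001 here)
```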

and existing deep nonlocal methods employ a fixed number of neighbor tokens for each query token. In this subsection, we introduce a guided dynamic nonlocal attention module, which can calculate dynamic nonlocal relationships according to the guidance information. It first employs the guidance information to calculate an initial correlation map. Then, a dynamic gate mechanism is utilized to adaptively filter out the low-similarity tokens, and the remaining high-correlation tokens constitute the attention map. Finally, the dynamic attention map is employed to aggregate features of the main denoising branch. The detailed structure of the GDNA module is shown in Fig. 11.

5.3.1 Initial Correlation Map

In our GDNA, we first employ two 1 × 1 convolutions g_φ and g_ψ to transform the guidance information G ∈ R^{C_g × H × W} into two


hidden representations, which can be expressed as

G_φ = g_φ(G), and G_ψ = g_ψ(G).    (14)

Then, we flatten G_φ and G_ψ into feature vectors, and take their dot product to calculate the initial correlation map M ∈ R^{N × N} (N = H × W). It can be represented as

M = θ(G_φ)^T ⊗ θ(G_ψ),    (15)

where θ is the flatten operation.

5.3.2 Dynamic Attention Map

To obtain the dynamic attention map, we design a dynamic gate mechanism, which can select high-similarity tokens with an adaptive threshold. Let us consider M_{i,:}, i.e., the i-th row of M, being the correlations between the i-th token and the other tokens. We first average M_{i,:} to obtain an initial threshold, as it is the average importance of the different tokens to the i-th token. To improve the adaptability, we further apply a token-specific linear transform to compute the adaptive threshold T_i, which can be formalized as

T_i = (τ_1(F'_{in,λ}) / N) Σ_{j=1}^{N} M_{i,j} + τ_2(F'_{in,λ})
    = (α / N) Σ_{j=1}^{N} M_{i,j} + β,    (16)

where F'_{in,λ} is the λ-th band feature map transformed from F_{in,λ} (F_{in} ∈ R^{Ω × C × H × W}) by f_{in}, and τ_1 and τ_2 are two convolution layers generating the specific linear transform parameters α and β for each token, respectively.

Then, we employ ReLU (Nair and Hinton 2010) as the truncation function, which is simple and differentiable. This process removes the negative part and maintains the positive part, and can be expressed as

A_{i,:} = r(M_{i,:} − T_i),    (17)

where r is the ReLU function and A ∈ R^{N × N} is the dynamic attention map. If G_i is highly correlated to G_j, A_{i,j} equals the similarity weight; otherwise it equals zero.

Furthermore, we normalize the non-zero weights of the attention map by the softmax function. It can be represented as

A_{i,j} = exp(A_{i,j}) / Σ_{k∈η_i} exp(A_{i,k}),  j ∈ η_i,    (18)

where η_i is the index group of high-similarity tokens for the i-th token.

5.3.3 Guided Feature Aggregation

Given the dynamic attention map, we attentively aggregate the features of each band in the main denoising branch, and the final output can be expressed as

F_{out,λ} = F_{in,λ} + f_{out}(Σ_{j∈η_i} A_{i,j} f_{in}(F_{in,λ})),    (19)

where f_{out} is the output transform function with a 1 × 1 convolution. As we calculate the attention separately for each spectral band, each spectral band can obtain a different dynamic attention map. Besides, it still maintains the flexibility of a 3D convolution network to process HSIs with different numbers of spectral bands.

6 Experimental Results

In this section, we first introduce the settings in our experiments, including the implementation details and the metrics for quantitative evaluation. Then, we discuss the effect of different noise models used to synthesize noisy data. In addition, our method is compared with several state-of-the-art methods on our realistic HSI denoising dataset. Finally, we discuss the effect of different network modules.

6.1 Settings

6.1.1 Implementation Details

To train our HSI denoising network, we crop overlapped 128 × 128 spatial regions from the realistic dataset synthesized with our noise model and augment them by random flipping and/or rotation. Besides, we also train networks with our real HSI denoising dataset and employ the same augmentation method.

Our implementation is based on PyTorch (Paszke et al. 2019). The models are trained with the Adam optimizer (Kingma and Ba 2014) (β_1 = 0.9 and β_2 = 0.999) for 200 epochs. The initial learning rate and mini-batch size are set to 2 × 10^{−4} and 1, respectively.

6.1.2 Evaluation Metrics

We employ four quantitative quality metrics to evaluate the performance of all methods, including peak signal-to-noise ratio (PSNR), structural similarity (SSIM) (Wang et al. 2004), spectral angle mapping (SAM) (Kruse et al. 1993) and relative dimensionless global error in synthesis (ERGAS) (Wald 2000). PSNR and SSIM show the spatial accuracy; they are calculated on each 2D spatial image and averaged over all spectral bands. SAM and ERGAS show the spectral fidelity.
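The gating of Eqs. (16)–(19) can be illustrated on a toy row of the correlation map. The token count, the values of α and β, and the omission of the residual connection and of the f_in/f_out transforms are simplifications for readability, not the actual implementation:

```python
import math

def gdna_row(M_i, alpha, beta):
    """One row of the dynamic attention map: adaptive threshold
    (Eq. 16), ReLU truncation (Eq. 17), softmax over the surviving
    tokens eta_i (Eq. 18). Assumes at least one token passes the gate."""
    N = len(M_i)
    T_i = alpha * sum(M_i) / N + beta           # token-specific threshold
    gated = [max(m - T_i, 0.0) for m in M_i]    # ReLU drops low-similarity tokens
    keep = [j for j, g in enumerate(gated) if g > 0.0]
    Z = sum(math.exp(gated[j]) for j in keep)
    A_i = [0.0] * N
    for j in keep:
        A_i[j] = math.exp(gated[j]) / Z         # normalized non-zero weights
    return A_i

def aggregate(A_i, feats):
    """Attentive aggregation of band features (the sum in Eq. 19)."""
    C = len(feats[0])
    return [sum(a * f[c] for a, f in zip(A_i, feats)) for c in range(C)]

M_i = [0.9, 0.1, 0.5, -0.2]  # correlations of token i with all N = 4 tokens
A_i = gdna_row(M_i, alpha=1.0, beta=0.0)
# tokens 1 and 3 fall below the adaptive threshold and receive zero weight
feats = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [2.0, 2.0]]
out = aggregate(A_i, feats)
```

Note how the number of aggregated neighbors adapts to the row content — here only two of the four tokens survive the gate — in contrast to fixed-neighborhood nonlocal attention.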


Table 3 KL distance between real noise and synthetic noise

               Complex   Homo.    Hetero.   Ours
KL distance    0.6177    0.0992   0.0966    0.0912

The complex noise model (Chen et al. 2017) consists of Gaussian, stripe, deadline and impulse noises, and is utilized in a previous learning-based method (Wei et al. 2021). 'Homo.' denotes the homoscedastic Gaussian noise model for the pixel noise (N_p) and 'Hetero.' represents the heteroscedastic Gaussian noise model for the signal-dependent and pixel noises (N_{sd} + N_p)
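The distances in Table 3 are KL divergences between noise histograms. A sketch of that comparison on synthetic 1D noise; the bin range, bin count and the stand-in Gaussian "noise models" are illustrative assumptions:

```python
import math
import random

def histogram(samples, lo=-4.0, hi=4.0, bins=80):
    """Normalized histogram of the samples falling inside [lo, hi)."""
    h = [0] * bins
    for x in samples:
        if lo <= x < hi:
            h[int((x - lo) / (hi - lo) * bins)] += 1
    n = sum(h)
    return [c / n for c in h]

def kl_distance(p, q, eps=1e-12):
    """Discrete KL divergence D(p || q) between two histograms."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

rng = random.Random(1)
real = histogram(rng.gauss(0.0, 1.0) for _ in range(50000))
close_model = histogram(rng.gauss(0.0, 1.05) for _ in range(50000))  # well calibrated
far_model = histogram(rng.gauss(0.0, 2.0) for _ in range(50000))     # mismatched

d_close = kl_distance(real, close_model)
d_far = kl_distance(real, far_model)  # the better model gives the smaller distance
```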

Table 4 Quantitative results of our network trained with the real dataset and with synthetic datasets generated by different noise models

Datasets     Noise models   PSNR     SSIM     SAM      ERGAS
Synthetic    Complex        28.337   0.8327   7.9049   28.831
Synthetic    Homo.          29.976   0.8446   5.8650   22.505
Synthetic    Hetero.        30.284   0.8946   4.4488   21.675
Synthetic    Ours           30.451   0.9004   4.1364   21.324
Real         –              30.583   0.8843   4.6816   20.816
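Training data like the "Synthetic / Ours" rows are produced by the pipeline of Sect. 4.3. A single-band sketch; the gain, the log-domain parameters, the exposure factor and the Gaussian approximation of the Poisson shot noise are assumed values, and the stripe pattern noise is omitted for brevity:

```python
import math
import random

rng = random.Random(7)

K = 0.8                          # assumed calibrated system gain
A_P, B_P = math.log(0.01), 0.1   # assumed log-domain pixel-noise parameters

def synthesize_noisy_band(clean, factor=10.0):
    """Sect. 4.3 for one band: scale down to short-exposure intensity,
    apply shot noise in the photoelectron domain (Gaussian approximation
    of the Poisson distribution), add pixel noise with a scale sampled
    from the calibrated log-Gaussian (Eq. 12), then scale back up."""
    sigma_p = math.exp(rng.gauss(A_P, B_P))        # sampled noise parameter
    noisy = []
    for x in clean:
        x_low = x / factor                          # match short-exposure intensity
        L = max(x_low / K, 0.0)                     # digital numbers -> photoelectrons
        L_noisy = L + rng.gauss(0.0, math.sqrt(L))  # shot noise, Var = L
        y = K * L_noisy + rng.gauss(0.0, sigma_p)   # back to DN, plus pixel noise
        noisy.append(y * factor)                    # re-scale to clean intensity
    return noisy

clean = [50.0] * 10000
noisy = synthesize_noisy_band(clean)
mean_noisy = sum(noisy) / len(noisy)  # unbiased: stays near the clean level
```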

SAM is calculated on each 1D spectral vector and averaged over all spatial points, while ERGAS calculates the average amount of spectral distortion normalized by the mean intensity of each band. Larger values of PSNR and SSIM indicate better reconstruction, and smaller values of SAM and ERGAS denote higher performance.

6.2 Evaluation of Noise Models

6.2.1 Distribution Distance

To verify the effectiveness of the proposed noise model for HSI, we first compare the distribution distance between real noise and the synthetic noise generated by the complex noise model (Chen et al. 2017), the homoscedastic Gaussian noise model, the heteroscedastic Gaussian noise model, and our noise model, measured by the KL distance (Kullback and Leibler 1951). The results are provided in Table 3. We can see that the calibrated homoscedastic and heteroscedastic Gaussian noise models and our noise model have a much lower KL distance than the uncalibrated complex noise model. Furthermore, our noise model matches the real noise distribution better than the homoscedastic and heteroscedastic Gaussian noise models. It reveals that our noise model can accurately model the real noise distribution.

Fig. 12 Visual quality comparison of our network trained with different noise models. We show the spectral band at 550 nm

6.2.2 Denoising Performance

In addition, we compare the performance of networks trained with the real dataset and with synthetic datasets generated by different noise models, including the complex noise model (Chen et al. 2017), the homoscedastic Gaussian noise model and the heteroscedastic Gaussian noise model (Acito et al. 2011). Note that we employ the basic HSI denoising network for all noise models. The results are provided in Table 4. The complex noise model is utilized by a previous learning-based method (Wei et al. 2021) to synthesize its dataset. However, the complex noise model performs worse than the latter three calibrated noise models, for it does not well calibrate and match the noise distribution of real HSI data. The homoscedastic and heteroscedastic Gaussian noise models perform worse than our full noise model, even though we have calibrated the noise parameters for them, which is consistent with the distribution comparison in Table 3. Besides, the HSI denoising network trained on synthetic data generated by our noise model performs as well as that trained on paired real data. The reason may be that the synthetic dataset generated by our noise model can provide more samples for the same number of scenes, compared with the real dataset. It further verifies the superiority of our accurate noise model.

A visual comparison of our noise model and other noise models is provided in Fig. 12. It can be seen that the result of our method is cleaner than those of the compared methods, which verifies


Table 5 Quantitative results of different methods on our real HSI denoising dataset

Metrics    Noisy     BM4D     ITSReg   LRTDTV   NMoG     LLRT     NGmeet   HSID-CNN   QRNN3D   3D-ADNet   Ours
PSNR       20.907    25.318   25.459   25.564   24.953   24.917   25.260   29.842     26.438   29.593     30.889
SSIM       0.3186    0.8156   0.8400   0.7859   0.7475   0.7399   0.7848   0.8825     0.7494   0.8641     0.9007
SAM        25.2992   6.3024   5.1425   6.4881   6.6469   7.4341   6.3595   5.1508     8.8176   6.1157     4.0924
ERGAS      60.758    35.078   34.618   34.429   37.163   36.698   35.037   22.399     48.289   23.009     20.516
Times(s)   –         850      2717     482      662      1710     634      1          2        31         8
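The PSNR and SAM columns of Table 5 follow standard definitions (Sect. 6.1.2). A per-band / per-pixel sketch; the full evaluation averages PSNR over all bands and SAM over all spatial points, and SSIM/ERGAS are omitted here:

```python
import math

def psnr(ref, rec, peak=1.0):
    """Peak signal-to-noise ratio of one 2D band (flattened); higher is better."""
    mse = sum((r - x) ** 2 for r, x in zip(ref, rec)) / len(ref)
    return 10.0 * math.log10(peak * peak / mse)

def sam_degrees(ref, rec):
    """Spectral angle between two spectral vectors, in degrees;
    0 means identical spectral direction, smaller is better."""
    dot = sum(r * x for r, x in zip(ref, rec))
    nr = math.sqrt(sum(r * r for r in ref))
    nx = math.sqrt(sum(x * x for x in rec))
    cos = min(max(dot / (nr * nx), -1.0), 1.0)
    return math.degrees(math.acos(cos))

p = psnr([0.5, 0.5, 0.5, 0.5], [0.4, 0.6, 0.5, 0.5])   # ~23.01 dB
s = sam_degrees([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])      # scaled spectrum -> angle ~0
```

SAM being invariant to a global scaling of the spectrum is why it complements PSNR: it isolates spectral-shape errors from intensity errors.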

Fig. 13 Visual quality comparison on three typical outdoor scenes for noise removal in our real HSI denoising dataset. The denoising results and the reference image are shown from left to right and from top to bottom

that our method can accurately formulate the noise in the HSI and can generate the most realistic dataset for CNN training.

6.3 Evaluation of HSI Denoising Methods

6.3.1 Compared Methods

We compare our method with nine state-of-the-art HSI denoising methods on our captured real HSI dataset, including six model-based methods, i.e., BM4D (Maggioni et al. 2012), ITSReg (Xie et al. 2016a), LRTDTV (Wang et al. 2017), NMoG (Chen et al. 2017), LLRT (Chang et al. 2017) and NGmeet (He et al. 2019), and three learning-based methods, i.e., HSID-CNN (Yuan et al. 2018), QRNN3D (Wei et al. 2021) and 3D-ADNet (Shi et al. 2021). We make a great effort to reproduce the best results for the competitive methods with the codes that are released publicly. Note that we train all networks with synthetic data generated by our noise model and test on our collected paired real dataset.


Fig. 14 Visual quality comparison on three typical indoor scenes for noise removal in our real HSI denoising dataset. The denoising results and the reference image are shown from left to right and from top to bottom

6.3.2 Numerical Results

Table 5 provides the averaged recovery results over all test images of our real HSI denoising dataset, with indoor and outdoor scenes, to quantitatively compare our method with BM4D, ITSReg, LRTDTV, NMoG, LLRT, NGmeet, HSID-CNN, QRNN3D and 3D-ADNet. The best results are highlighted in bold for each metric. It can be seen that our method always has better performance than all compared methods, which demonstrates the effectiveness of our method. Moreover, our method outperforms existing deep learning methods on all metrics. This reveals the advantages of the guidance from the mean image of spectral bands and of the dynamic nonlocal correlation exploitation. Besides, we compare the running time with the other methods. It can be seen that the proposed method is comparable with HSID-CNN and QRNN3D and is much faster than the other methods, which verifies the efficiency of the proposed method.

6.3.3 Spatial Quality

To visualize the experimental results, representative outdoor and indoor scenes are shown in Figs. 13 and 14. Compared with the other methods, the recovered results by our method are consistently more accurate for all scenes. Specifically, our method can produce visually pleasant results with fewer artifacts and sharper edges. It verifies that our method can provide higher spatial accuracy.

6.3.4 Spectral Fidelity

Figure 15 shows the recovered spectra of two points in the selected images, which are indicated in orange in Figs. 13 and 14. We can see that the spectra recovered by our method are much closer to the reference, which demonstrates that our method obtains higher spectral fidelity.


Table 6 Quantitative results of our network with different modules

Cases   Unet   Guidance   Nonlocal   Dynamic   PSNR     SSIM     SAM      ERGAS
1       ✓                                      30.451   0.9004   4.1364   21.324
2       ✓      ✓                               30.536   0.9005   4.1285   21.267
3       ✓      ✓          ✓                    30.807   0.9007   4.1043   4.0924
4       ✓      ✓          ✓          ✓         30.889   0.9007   4.0924   20.516

Fig. 15 Comparison of spectral fidelity. The points of the outdoor and indoor scenes are indicated in orange in Figs. 13 and 14. The spectra recovered by our method are much closer to the reference

6.4 Ablation Study

To verify the effectiveness of the guided dynamic nonlocal attention module, we conduct an ablation study on our realistic dataset. We train all networks with synthetic data generated by our noise model and test on our collected paired real dataset. The results are provided in Table 6. Note that case 2 denotes simply concatenating the guidance information onto the main denoising branch. It can be seen that all modules contribute to the performance improvement. Specifically, the guidance information improves the performance on all metrics, which verifies the effectiveness of guided HSI denoising. The nonlocal correlation exploitation further improves the spatial fidelity and spectral accuracy, which demonstrates the advantage of nonlocal attention. Finally, the dynamic attention further improves the denoising results, which reveals the superiority of dynamic correlation. Overall, it demonstrates the effectiveness of our guided HSI denoising network and the guided dynamic nonlocal attention module.

7 Conclusion

In this paper, we mainly focus on capturing realistic datasets and designing an effective denoising network for HSI denoising. On the one hand, we collect the first real HSI denoising dataset with paired short-exposure noisy images and long-exposure clean images, which is beneficial to follow-up research in this area. On the other hand, we propose an accurate noise model to comprehensively consider the noise occurring in the imaging process. We also present a noise parameter estimation approach to calibrate the proposed noise model for a given hyperspectral camera, and the calibrated noise model can be employed to synthesize realistic datasets. In addition, we propose a guided HSI denoising network with a guided dynamic nonlocal attention module, which can well employ the high signal-to-noise ratio mean image of spectral bands as guidance and calculate dynamic nonlocal correlation to aggregate the features of all spectral bands. Experimental results show that a network learned with only synthetic data generated with our noise model performs as well as that learned with real data, and our guided HSI denoising network outperforms state-of-the-art methods under both quantitative metrics and visual quality. We hope our work could provide foundations for further research in the field of HSI denoising with realistic data.

Acknowledgements This work was supported by the National Natural Science Foundation of China under Grants No. 62171038, No. 61827901 and No. 62088101.

References

Abdelhamed, A., Lin, S., & Brown, M. S. (2018). A high-quality denoising dataset for smartphone cameras. In: Proc. of Conference on Computer Vision and Pattern Recognition, pp. 1692–1700.
Acito, N., Diani, M., & Corsini, G. (2011). Signal-dependent noise modeling and model parameter estimation in hyperspectral images. IEEE Trans Geoscience and Remote Sensing, 49(8), 2957–2971.
Arad, B., & Ben-Shahar, O. (2016). Sparse recovery of hyperspectral signal from natural rgb images. In: Proc. of European Conference on Computer Vision, pp. 19–34.
Basedow, R. W., Carmer, D. C., & Anderson, M. E. (1995). Hydice system: Implementation and performance. In: Proc. of SPIE's Symposium on OE/Aerospace Sensing and Dual Use Photonics, pp. 258–267.
Bjorgan, A., & Randeberg, L. L. (2015). Towards real-time medical diagnostics using hyperspectral imaging technology. In: Proc. of Clinical and Biomedical Spectroscopy and Imaging IV, p. 953712.
Borengasser, M., Hungate, W. S., & Watkins, R. (2007). Hyperspectral Remote Sensing: Principles and Applications. Remote Sensing Applications Series: CRC Press.
Cai, Y., Lin, J., Hu, X., Wang, H., Yuan, X., Zhang, Y., Timofte, R., & Van Gool, L. (2021). Mask-guided spectral-wise transformer for efficient hyperspectral image reconstruction. arXiv preprint arXiv:2111.07910


Cao, X., Zhou, F., Xu, L., Meng, D., Xu, Z., & Paisley, J. (2018). Hyperspectral image classification with markov random fields and a convolutional neural network. IEEE Trans Image Processing, 27(5), 2354–2367.
Chakrabarti, A., & Zickler, T. E. (2011). Statistics of real-world hyperspectral images. In: Proc. of Conference on Computer Vision and Pattern Recognition, pp. 193–200.
Chang, Y., Yan, L., & Zhong, S. (2017). Hyper-laplacian regularized unidirectional low-rank tensor recovery for multispectral image denoising. In: Proc. of Conference on Computer Vision and Pattern Recognition, pp. 4260–4268.
Chang, Y., Yan, L., Fang, H., Zhong, S., & Liao, W. (2018). Hsi-denet: Hyperspectral image restoration via convolutional neural network. IEEE Trans Geoscience and Remote Sensing, 57(2), 667–682.
Chang, Y., Yan, L., Zhao, X. L., Fang, H., Zhang, Z., & Zhong, S. (2020). Weighted low-rank tensor recovery for hyperspectral image restoration. IEEE Trans Cybernetics, 50(11), 4558–4572.
Charbonnier, P., Blanc-Feraud, L., Aubert, G., & Barlaud, M. (1994). Two deterministic half-quadratic regularization algorithms for computed imaging. In: Proc. of International Conference on Image Processing, 2, 168–172.
Chen, C., Li, W., Tramel, E. W., Cui, M., Prasad, S., & Fowler, J. E. (2014). Spectral-spatial preprocessing using multihypothesis prediction for noise-robust hyperspectral image classification. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 7(4), 1047–1059.
Chen, C., Chen, Q., Xu, J., & Koltun, V. (2018). Learning to see in the dark. In: Proc. of Conference on Computer Vision and Pattern Recognition, pp. 3291–3300.
Chen, C., Chen, Q., Do, M. N., & Koltun, V. (2019). Seeing motion in the dark. In: Proc. of International Conference on Computer Vision, pp. 3185–3194.
Chen, G., & Qian, S. E. (2010). Denoising of hyperspectral imagery using principal component analysis and wavelet shrinkage. IEEE Trans Geoscience and Remote Sensing, 49(3), 973–980.
Chen, Y., Cao, X., Zhao, Q., Meng, D., & Xu, Z. (2017). Denoising hyperspectral image with non-iid noise structure. IEEE Trans Cybernetics, 48(3), 1054–1066.
Dabov, K., Foi, A., Katkovnik, V., & Egiazarian, K. (2007). Image denoising by sparse 3-d transform-domain collaborative filtering. IEEE Trans Image Processing, 16(8), 2080–2095.
Dong, W., Li, G., Shi, G., Li, X., & Ma, Y. (2015). Low-rank tensor approximation with laplacian scale mixture modeling for multi-frame image denoising. In: Proc. of International Conference on Computer Vision, pp. 442–449.
Dong, W., Wang, H., Wu, F., Shi, G., & Li, X. (2019). Deep spatial-spectral representation learning for hyperspectral image denoising. IEEE Trans Computational Imaging, 5(4), 635–648.
Fu, Y., Lam, A., Sato, I., & Sato, Y. (2017). Adaptive spatial-spectral dictionary learning for hyperspectral image restoration. International Journal of Computer Vision, 122(2), 228–245.
Fu, Y., Zheng, Y., Huang, H., Sato, I., & Sato, Y. (2018). Hyperspectral image super-resolution with a mosaic rgb image. IEEE Trans Image Processing, 27(11), 5539–5552.
Fu, Y., Zhang, T., Zheng, Y., Zhang, D., & Huang, H. (2019). Hyperspectral image super-resolution with optimized rgb guidance. In: Proc. of Conference on Computer Vision and Pattern Recognition, pp. 11661–11670.
Gu, S., Li, Y., Van Gool, L., & Timofte, R. (2019). Self-guided network for fast image denoising. In: Proc. of International Conference on Computer Vision, pp. 2511–2520.
Guo, S., Liang, Z., & Zhang, L. (2021). Joint denoising and demosaicking with green channel prior for real-world burst images. IEEE Trans Image Processing, 30, 6930–6942.
He, C., Sun, L., Huang, W., Zhang, J., Zheng, Y., & Jeon, B. (2021). Tslrln: Tensor subspace low-rank learning with non-local prior for hyperspectral image mixed denoising. Signal Processing, 184, 108060.
He, K., Sun, J., & Tang, X. (2012). Guided image filtering. IEEE Trans Pattern Analysis and Machine Intelligence, 35(6), 1397–1409.
He, W., Zhang, H., Zhang, L., & Shen, H. (2015). Total-variation-regularized low-rank matrix factorization for hyperspectral image restoration. IEEE Trans Geoscience and Remote Sensing, 54(1), 178–188.
He, W., Yao, Q., Li, C., Yokoya, N., & Zhao, Q. (2019). Non-local meets global: An integrated paradigm for hyperspectral denoising. In: Proc. of Conference on Computer Vision and Pattern Recognition, pp. 6868–6877.
He, W., Yao, Q., Li, C., Yokoya, N., Zhao, Q., Zhang, H., & Zhang, L. (2020). Non-local meets global: An integrated paradigm for hyperspectral image restoration. IEEE Trans Pattern Analysis and Machine Intelligence, Early Access.
Healey, G. E., & Kondepudy, R. (1994). Radiometric ccd camera calibration and noise estimation. IEEE Trans Pattern Analysis and Machine Intelligence, 16(3), 267–276.
Holst, G. C. (1998). CCD arrays, cameras, and displays.
Hui, T. W., Loy, C. C., & Tang, X. (2016). Depth map super-resolution by deep multi-scale guidance. In: Proc. of European Conference on Computer Vision, pp. 353–369.
Jiang, H., & Zheng, Y. (2019). Learning to see moving objects in the dark. In: Proc. of International Conference on Computer Vision, pp. 7324–7333.
Kawakami, R., Zhao, H., Tan, R. T., & Ikeuchi, K. (2013). Camera spectral sensitivity and white balance estimation from sky images. International Journal of Computer Vision, 105(3), 187–204.
Kingma, D. P., & Ba, J. (2014). Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980
Kruse, F. A., Lefkoff, A., Boardman, J., Heidebrecht, K., Shapiro, A., Barloon, P., & Goetz, A. (1993). The spectral image processing system (sips) interactive visualization and analysis of imaging spectrometer data. Remote Sensing of Environment, 44(2–3), 145–163.
Kullback, S., & Leibler, R. A. (1951). On information and sufficiency. The Annals of Mathematical Statistics, 22(1), 79–86.
Kwon, H., & Nasrabadi, N. M. (2007). Kernel spectral matched filter for hyperspectral imagery. International Journal of Computer Vision, 71(2), 127–141.
Li, Y., Huang, J. B., Ahuja, N., & Yang, M. H. (2016). Deep joint image filtering. In: Proc. of European Conference on Computer Vision, pp. 154–169.
Lin, B., Tao, X., & Lu, J. (2019). Hyperspectral image denoising via matrix factorization and deep prior regularization. IEEE Trans Image Processing, 29, 565–578.
Liu, L., Jia, X., Liu, J., & Tian, Q. (2020). Joint demosaicing and denoising with self guidance. In: Proc. of Conference on Computer Vision and Pattern Recognition, pp. 2240–2249.
Liu, X., Bourennane, S., & Fossati, C. (2012). Denoising of hyperspectral images using the parafac model and statistical performance analysis. IEEE Trans Geoscience and Remote Sensing, 50(10), 3717–3724.
Long, J., Shelhamer, E., & Darrell, T. (2015). Fully convolutional networks for semantic segmentation. In: Proc. of Conference on Computer Vision and Pattern Recognition, pp. 3431–3440.
Lu, G., & Fei, B. (2014). Medical hyperspectral imaging: A review. Journal of Biomedical Optics, 19(1), 010901.
Ma, C., Cao, X., Tong, X., Dai, Q., & Lin, S. (2014). Acquisition of high spatial and spectral resolution video with a hybrid camera system. International Journal of Computer Vision, 110(2), 141–155.
Maggioni, M., Katkovnik, V., Egiazarian, K., & Foi, A. (2012). Nonlocal transform-domain filter for volumetric data denoising and reconstruction. IEEE Trans Image Processing, 22(1), 119–133.


Miao, Y. C., Zhao, X. L., Fu, X., Wang, J. L., & Zheng, Y. B. (2021). Hyperspectral denoising using unsupervised disentangled spatiospectral deep priors. IEEE Trans Geoscience and Remote Sensing, 60, 1–16.
Monno, Y., Kiku, D., Tanaka, M., & Okutomi, M. (2015). Adaptive residual interpolation for color image demosaicking. In: Proc. of International Conference on Image Processing, pp. 3861–3865.
Morgan, E. C., Lackner, M., Vogel, R. M., & Baise, L. G. (2011). Probability distributions for offshore wind speeds. Energy Conversion and Management, 52(1), 15–26.
Nair, V., & Hinton, G. E. (2010). Rectified linear units improve restricted Boltzmann machines. In: Proc. of International Conference on Machine Learning.
Ojha, L., Wilhelm, M. B., Murchie, S. L., McEwen, A. S., Wray, J. J., Hanley, J., et al. (2015). Spectral evidence for hydrated salts in recurring slope lineae on Mars. Nature Geoscience, 8(11), 829–832.
Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., Killeen, T., Lin, Z., Gimelshein, N., Antiga, L., et al. (2019). Pytorch: An imperative style, high-performance deep learning library. arXiv preprint arXiv:1912.01703.
Peng, Y., Meng, D., Xu, Z., Gao, C., Yang, Y., & Zhang, B. (2014). Decomposable nonlocal tensor dictionary learning for multispectral image denoising. In: Proc. of Conference on Computer Vision and Pattern Recognition, pp. 2949–2956.
Plotz, T., & Roth, S. (2017). Benchmarking denoising algorithms with real photographs. In: Proc. of Conference on Computer Vision and Pattern Recognition, pp. 1586–1595.
Porter, W. M., & Enmark, H. T. (1987). A system overview of the airborne visible/infrared imaging spectrometer (AVIRIS). In: Proc. of Annual Technical Symposium, pp. 22–31.
Ronneberger, O., Fischer, P., & Brox, T. (2015). U-net: Convolutional networks for biomedical image segmentation. In: Proc. of International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 234–241.
Schott, J. R. (2007). Remote sensing: The image chain approach. Oxford University Press on Demand.
Shi, Q., Tang, X., Yang, T., Liu, R., & Zhang, L. (2021). Hyperspectral image denoising using a 3-d attention denoising network. IEEE Trans Geoscience and Remote Sensing, 59(12), 10348–10363.
Wald, L. (2000). Quality of high resolution synthesised images: Is there a simple criterion? In: Proc. of Conference on Fusion of Earth Data, pp. 99–103.
Wang, L., Zhang, S., & Huang, H. (2021). Adaptive dimension-discriminative low-rank tensor recovery for computational hyperspectral imaging. International Journal of Computer Vision, 129(10), 2907–2926.
Wang, X., Yu, K., Dong, C., & Loy, C. C. (2018). Recovering realistic texture in image super-resolution by deep spatial feature transform. In: Proc. of Conference on Computer Vision and Pattern Recognition, pp. 606–615.
Wang, Y., Peng, J., Zhao, Q., Leung, Y., Zhao, X. L., & Meng, D. (2017). Hyperspectral image restoration via total variation regularized low-rank tensor decomposition. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 11(4), 1227–1243.
Wang, Z., Bovik, A. C., Sheikh, H. R., & Simoncelli, E. P. (2004). Image quality assessment: From error visibility to structural similarity. IEEE Trans Image Processing, 13(4), 600–612.
Wei, K., Fu, Y., & Huang, H. (2021). 3-d quasi-recurrent neural network for hyperspectral image denoising. IEEE Trans Neural Networks and Learning Systems, 32(1), 363–375.
Xie, Q., Zhao, Q., Meng, D., Xu, Z., Gu, S., Zuo, W., & Zhang, L. (2016a). Multispectral images denoising by intrinsic tensor sparsity regularization. In: Proc. of Conference on Computer Vision and Pattern Recognition, pp. 1692–1700.
Xie, Y., Qu, Y., Tao, D., Wu, W., Yuan, Q., & Zhang, W. (2016). Hyperspectral image restoration via iteratively regularized weighted Schatten p-norm minimization. IEEE Trans Geoscience and Remote Sensing, 54(8), 4642–4659.
Xiong, F., Zhou, J., Zhao, Q., Lu, J., & Qian, Y. (2021). Mac-net: Model-aided nonlocal neural network for hyperspectral image denoising. IEEE Trans Geoscience and Remote Sensing.
Xu, L., Ren, J., Yan, Q., Liao, R., & Jia, J. (2015). Deep edge-aware filters. In: Proc. of International Conference on Machine Learning, pp. 1669–1678.
Yasuma, F., Mitsunaga, T., Iso, D., & Nayar, S. K. (2010). Generalized assorted pixel camera: Postcapture control of resolution, dynamic range and spectrum. IEEE Trans Image Processing, 19(9), 2241–2253.
Yuan, Q., Zhang, L., & Shen, H. (2012). Hyperspectral image denoising employing a spectral-spatial adaptive total variation model. IEEE Trans Geoscience and Remote Sensing, 50(10), 3660–3677.
Yuan, Q., Zhang, Q., Li, J., Shen, H., & Zhang, L. (2018). Hyperspectral image denoising employing a spatial-spectral deep residual convolutional neural network. IEEE Trans Geoscience and Remote Sensing, 57(2), 1205–1218.
Zhang, H., He, W., Zhang, L., Shen, H., & Yuan, Q. (2013). Hyperspectral image restoration using low-rank matrix recovery. IEEE Trans Geoscience and Remote Sensing, 52(8), 4729–4743.
Zhang, L., Wei, W., Zhang, Y., Shen, C., van den Hengel, A., & Shi, Q. (2018). Cluster sparsity field: An internal hyperspectral imagery prior for reconstruction. International Journal of Computer Vision, 126(8), 797–821.
Zhang, T., Fu, Y., & Li, C. (2021). Hyperspectral image denoising with realistic data. In: Proc. of International Conference on Computer Vision, pp. 2248–2257.
Zhao, B., Ulfarsson, M. O., Sveinsson, J. R., & Chanussot, J. (2022). Hyperspectral image denoising using spectral-spatial transform-based sparse and low-rank representations. IEEE Trans Geoscience and Remote Sensing, 60, 1–25.
Zheng, H., Ji, M., Wang, H., Liu, Y., & Fang, L. (2018). Crossnet: An end-to-end reference-based super resolution network using cross-scale warping. In: Proc. of European Conference on Computer Vision, pp. 88–104.
Zhou, Y., Wu, G., Fu, Y., Li, K., & Liu, Y. (2021). Cross-mpi: Cross-scale stereo for image super-resolution using multiplane images. In: Proc. of Conference on Computer Vision and Pattern Recognition, pp. 14842–14851.

Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.