
Received January 20, 2021, accepted February 11, 2021, date of publication February 15, 2021, date of current version February 26, 2021.


Digital Object Identifier 10.1109/ACCESS.2021.3059498

C-LIENet: A Multi-Context Low-Light Image Enhancement Network

PRAVEEN RAVIRATHINAM, DIVYAM GOEL, AND J. JENNIFER RANJANI
Department of Computer Science and Information Systems, Birla Institute of Technology and Science, Pilani 333 031, India
Corresponding author: J. Jennifer Ranjani (j.jenniferranjani@yahoo.co.in)
This work was supported by the Birla Institute of Technology and Science, Pilani, India.

ABSTRACT Enhancement of low-light images is a challenging task due to the impact of low brightness,
low contrast, and high noise. The inability to collect natural labeled data intensifies this problem further.
Many researchers have attempted to solve this problem using learning-based approaches; however, most
models ignore the impact of noise in low-lit images. In this paper, an encoder-decoder architecture, made
up of separable convolution layers that solve the issues encountered in low-light image enhancement, is
proposed. The architecture is trained end-to-end on a custom low-light image dataset (LID), comprising both
clean and noisy images. We introduce a unique multi-context feature extraction module (MC-FEM) where
the input first passes through a feature pyramid of dilated separable convolutions for hierarchical-context
feature extraction followed by separable convolutions for feature compression. The model is optimized
using a novel three-part loss function that focuses on high-level contextual features, structural similarity,
and patch-wise local information. We conducted several ablation studies to determine the optimal model
for low-light image enhancement under noisy and noiseless conditions. We have used performance metrics
like peak signal-to-noise ratio, structural similarity index measure, visual information fidelity, and average
brightness to demonstrate the superiority of the proposed work against the state-of-the-art algorithms.
Qualitative results presented in this paper prove the strength and suitability of our model for real-time
applications.

INDEX TERMS Encoder-decoder architecture, separable convolution, dilated convolution, ASPP, perceptual loss, low-light image enhancement.

The associate editor coordinating the review of this manuscript and approving it for publication was Varuna De Silva.

I. INTRODUCTION
Low-light image enhancement is an active area of research that enables the acquisition system to capture superior quality images even under low-light conditions. Its applications include autonomous driving, photography, military, object detection, and surveillance. Low-light image enhancement (LIE) algorithms consider several factors like color contrast, brightness, image resolution, and dynamic range. Several researchers use image processing techniques to enhance low-light images, such as histogram equalization (HE) [1], which tends to equalize the dynamic range of intensities. However, histogram equalization methods focus only on increasing the image contrast and fail to address the actual illumination issues. As mentioned in [2], linear and non-linear low-light image enhancement methods have simple yet fast implementations but do not consider the image distribution, leading to limited enhancement ability. Spatial filters and other image processing techniques have shown improvement in low-lit image quality, but these methods fail to work in a noisy context. Noise is typically amplified by such filters, as they rely on a small neighborhood. These reasons justify the need for deep learning in enhancing low-light images, as these methods can dynamically learn and handle noisy as well as clean images.

A. LIE ALGORITHMS USING RETINEX
One category of LIE algorithms utilizes the Retinex theory formulated by Edward H. Land, which accounts for color constancy and human perception [3], [4]. This theory assumes that the image intensity is a product of reflectance and illumination coefficients. Algorithms based on Retinex theory calculate the illumination map by removing the reflectance component and utilizing it for enhancing the image. Methods like single-scale and multi-scale Retinex (MSR) aim to estimate illumination from the image using a surround function [5], [6].


However, these methods have a trade-off between dynamic range compression and rendition factors. An adaptive weighting based MSR that combines color constancy with local enhancement to transform the images is suggested in [7]. Low-light image enhancement (LIME) [8] is another Retinex based enhancement technique that estimates the illumination of each pixel by finding the strongest response among the red, green, and blue channels. A structure prior is also imposed on the illumination map to refine the initial map into a well constructed, enhanced map. Another method [9], which supersedes LIME, uses a low-light enhancement component followed by noise removal. The enhancement component takes care of luminance estimation and image restoration, but it does not yield superior results, as noise in the image still prevails.

B. LIE ALGORITHMS USING DEEP LEARNING
Deep learning has yielded promising results in image processing tasks such as denoising, de-hazing, super-resolution, and other computer vision problems. One such deep learning network for low-light enhancement is LLNet, which enhances the images using a stacked auto-encoder without convolutional layers [10]. LLCNN avoids the vanishing gradient problem using a module to extract multi-scale feature maps [11]. In the multi-branch low-light enhancement network (MBLLEN) [13], enhancements applied at multiple subnets are fused to generate the desired output. A unique loss function involving structural, contextual, and regional information aided MBLLEN to produce superior results on noise-free images. GLADNet is another CNN based model that employs an encoder-decoder for extracting global prior information about the illumination and a CNN for detail reconstruction [14].

Attention-guided low-light image enhancement proposed in [15] uses two attention maps for low-light enhancement. The first map distinguishes the under-exposed regions from well-exposed ones, and the second map identifies real textures from noise. A novel four-part loss function, including attention and enhancement losses, is also introduced in this work, thus covering all factors required for enhancing the low-light images. However, its performance degrades on images with large black regions and on compressed images.

Apart from CNN based approaches, Generative Adversarial Networks (GANs) [16] have also become very popular in image processing [17]–[20]. EnlightenGAN [21] is a highly effective unsupervised GAN trained without low/regular-light image pairs. It enhances the image by employing a U-Net followed by two discriminators. The performance of this algorithm in terms of quantitative metrics is inferior, though its visual perception is superior.

C. LIE ALGORITHMS USING RETINEX AND DEEP LEARNING
Most techniques ignore the possibility of noise, and in some cases, they even amplify its presence. This gap led to the evolution of low-light image enhancement algorithms based on deep learning fused with the Retinex theory. In [22], a deep Retinex-Net, comprising a decomposition net and an enhancement net, is proposed. Here, a decomposition net decomposes the image into reflectance and illumination, followed by an enhancement network for adjusting the illumination component. However, this method fails to improve the color, contrast, and brightness of the image. Recently, a CNN based progressive Retinex model, addressing the noise and over-illumination issues, is proposed in [23]. Two point-wise convolutional networks are employed to determine the regularities behind light and noise. Though progressive Retinex facilitates noise suppression, it fails to capture the structural properties in the image. An approach for combining Retinex with GANs is proposed in [24]. Here, a regularization loss helps to prevent local-optimal solutions. A Retinex-based two-stage model for low-light enhancement, employing a separate stage for denoising, is proposed in [25]. In [26], a pseudo-Retinex based method, using a hybrid network to combine content and edge features in two paths, is proposed to learn the global context. This method also uses RNNs, thus making it a complex network.

D. SEPARABLE AND DILATED CONVOLUTION
Most models mentioned above share a common characteristic: they use convolutional layers. However, the computational complexity of complex architectures built from convolutional layers is quite expensive. In general, deep networks for low-light enhancement require more layers and are highly complex. For example, an architecture like the attention-guided network [15] outperforms simpler models like LLCNN [11]. Separable convolution instead of the traditional convolutional layers helps to achieve a trade-off between computational complexity and performance.

As observed in [12], separable convolutions can reduce the number of parameters required per layer. A simplified form of U-Net using convolutional layers can have up to 30 million parameters, whereas the same architecture with separable convolution requires only 5 million parameters. We have also observed that most of the existing deep architectures for low-light image enhancement are not context-sensitive. The inclusion of dilated convolutions using atrous spatial pyramid pooling (ASPP) for semantic segmentation has enabled networks to learn context-sensitive information [27]. For an input x, the output y of atrous convolution using the filter w is given by [27],

y(i, j) = \sum_{s,t,u}^{S,T,U} x[i + r \cdot s, j + r \cdot t, u] \, w[s, t, u]

Here r is the dilation rate, whereas the traditional convolution is given by

y(i, j) = \sum_{s,t,u}^{S,T,U} w[s, t, u] \cdot x(i + s, j + t, u)
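To make the role of the dilation rate r concrete, the following minimal PyTorch sketch (ours, not the authors' code; the channel counts are arbitrary) builds a 3 × 3 atrous convolution: the dilation argument spaces the filter taps r pixels apart, and padding equal to r keeps the spatial size unchanged.

import torch
import torch.nn as nn

# Dilated ("atrous") 3x3 convolution: `dilation` plays the role of the rate r
# in the equation above; padding = r keeps the output the same spatial size.
def atrous_conv3x3(in_ch: int, out_ch: int, rate: int) -> nn.Conv2d:
    return nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=rate, dilation=rate)

x = torch.randn(1, 64, 128, 128)           # dummy feature map (N, C, H, W)
y = atrous_conv3x3(64, 64, rate=3)(x)      # wider receptive field, same resolution
print(y.shape)                             # torch.Size([1, 64, 128, 128])

With r = 1, the dilated form reduces to the traditional convolution written above.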


FIGURE 1. Illustrative Diagram of the Context LIE-Net Architecture.

The mathematical formulation for separable convolution is as follows:

PointwiseConv(w, x)(i, j) = \sum_{u}^{U} w_u \cdot x(i, j, u)

DepthwiseConv(w, x)(i, j) = \sum_{s,t}^{S,T} w(s, t) \, x(i + s, j + t)

SepConv(w_p, w_d, x)(i, j) = PointwiseConv(i, j)(w_p, DepthwiseConv(i, j)(w_d, x))     (1)
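The sketch below (our illustration in PyTorch; the layer names are not from the paper) implements Eq. (1) directly: a depth-wise 3 × 3 convolution, obtained with groups=in_ch so that each input channel gets its own filter, followed by a point-wise 1 × 1 convolution that mixes channels.

import torch
import torch.nn as nn

class SepConv2d(nn.Module):
    """Depth-wise separable convolution as in Eq. (1)."""
    def __init__(self, in_ch: int, out_ch: int, dilation: int = 1):
        super().__init__()
        # DepthwiseConv: groups=in_ch applies a separate 3x3 filter to each channel
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size=3, padding=dilation,
                                   dilation=dilation, groups=in_ch)
        # PointwiseConv: a 1x1 convolution combines the per-channel responses
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.pointwise(self.depthwise(x))

print(SepConv2d(64, 128)(torch.randn(1, 64, 64, 64)).shape)   # torch.Size([1, 128, 64, 64])

For a 3 × 3 kernel this factorization reduces the parameter count from roughly 9·C_in·C_out to 9·C_in + C_in·C_out, which is the saving noted in [12]. Setting dilation > 1 in the depth-wise stage yields the dilated separable convolutions used later in the MC-FEM.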
In the proposed work, dilated separable convolutional layers using ASPP capture the context-sensitive information while simultaneously reducing the computational complexity of the network. We have also proposed a novel loss function that addresses the issues mentioned above and enhances the low-light images. The organization of the paper is as follows: Section II elaborates on the proposed context-LIE architecture and the proposed three-part loss function in detail. In Section III, we have discussed the motivation behind the dataset used in our experiments. The quantitative analysis of various experiments and visual observations is presented in Section IV. Section V sums up with the conclusion and further extensions.

II. PROPOSED METHOD
In this work, we have proposed an architecture based on the standard encoder-decoder that is a natural choice for low-light image enhancement. The encoder extracts enhancement related information such as brightness, contrast, or noise, whereas the decoder utilizes the features required to enhance the image. The success of U-Net [28] and U-Net-like architectures [13], [14], [21] for low-light image enhancement further validated our choice. In this section, we have presented a detailed description of the proposed context LIE-Net architecture along with the implementation details. Our proposed unique three-part loss function, specific to low-light image enhancement, is discussed, and its impact on the overall learning is highlighted through qualitative analysis.

A. CONTEXT LIE-NET ARCHITECTURE
In Fig. 1, an illustrative view of the proposed context LIE-Net architecture is presented. At the encoder, instead of the conventional convolution layers, we have used a multi-context feature extraction module (MC-FEM). At the decoder, we have utilized depth-wise separable convolution to improve the generalization ability of the network. Generalization is a vital aspect of applications like low-light image enhancement that lack naturally labeled data. Speed-up is achieved by performing convolution on the channels separately and then concatenating the intermediate outputs, thereby resulting in fewer parameters than the traditional convolution layers and making the network less prone to overfitting. Skip connections are employed from the encoder blocks to their corresponding decoder blocks to facilitate the reconstruction of features and image information otherwise lost during the encoding or down-sampling stages. Convolution layers at the beginning and at the end help the network to capture the complex mapping between the image and its features.


1) MULTI-CONTEXT FEATURE EXTRACTION MODULE (MC-FEM)
The proposed MC-FEM extracts multi-contextual features followed by feature compression. We have used parallel dilated depth-wise separable convolutions to extract the multi-context features using a feature pyramid, inspired by the Atrous Spatial Pyramid Pooling (ASPP) block. The feature pyramid is constructed by concatenating the outputs of all the parallel dilated depth-wise separable layers. These parallel layers have progressively wider contexts owing to the increasing dilation rates. The dilation rate is varied from 1 to 7 in increments of 2 with a fixed filter size of 3 × 3. These characteristics allow the network to learn multi-contextual or spatial-hierarchical features with a minimal increase in network complexity. Feature compression, achieved using a stack of separable convolution layers, takes the concatenated feature pyramid as input and discards redundant information accumulated in the feature pyramid. The combined effect of these two operations enables MC-FEM to extract richer and more relevant features from a spatially wide neighborhood.

Specifically, this module is useful in a noisy context due to its ability to capture a wide spatial neighborhood, using stacked dilated layers with increasing dilation rates, thus covering a larger area. Outputs from these modules are sent as skip connections to later parts of the model to enhance reconstruction. Particularly under noisy conditions, the skip connections prove to be very useful, as they combine the neighborhood information from the encoder module output with the corresponding decoder block.
sponding decoder block. two images in terms of luminance, contrast, and structure.
The structural information remains intact independent of the
B. LOSS FUNCTION changes in the luminance component or contrast. The SSIM
In this section, a three-part loss function proposed to include loss is thus included as a part of the proposed loss function
vital factors of low-light image enhancement, such as noise as the human visual system is more susceptible to capture the
removal, feature, and structure preservation is presented. Our structural information from images.
novel loss function, can be represented as: The SSIM loss component is given by
LTotal = wper Lper + wssim Lssim + wwpw Lwpw (2) (2µy µŷ + C1 ) + (2σyŷ + C2 )
Lssim = 1 − (4)
(µ2y + µ2ŷ + C1 )(σy2 + σŷ2 + C2 )
In Eq. (2), Lper , Lssim and Lwpw corresponds to percep-
tual loss, loss based on structural similarity and weighted where µy , µŷ denote pixel mean, σy , σŷ denote the variance,
patch-wise Euclidean loss, and wper , wssim , and wwpw corre- σyŷ refers to the covariance between the ground truth, Y and
spond to their respective weights. the predicted image, Ŷ respectively. Here, C1 and C2 are
constants to prevent divide by zero error.
1) PERCEPTUAL LOSS
Per-pixel based loss functions consider two similar images 3) WEIGHTED PATCH-WISE EUCLIDEAN LOSS
as different even when they differ by one-pixel intensity. Euclidean loss [23] and L1−norm [14] are the most com-
As a result, per-pixel based loss functions like Euclidean fail monly used loss functions in low-light image enhancement.
to capture the difference in high-level features between the MBLLEN [13] improved the Euclidean based loss function
ground-truth and the predicted image effectively [29]. The by providing more weightage to the lower intensity levels by
difference in high-level features, extracted using pre-trained utilizing the 25th percentile of the histogram as a threshold.
CNNs, could be minimized using a perceptual loss function. However, MBLLEN fails to capture the contextual informa-
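A hedged PyTorch sketch of Eq. (3) follows (assuming torchvision ≥ 0.13). It treats the raw images as the k = 0 term, takes the outputs of the first three VGG16 blocks for k = 1..3, weights the k-th mean squared difference by 2^k, and divides by 2^{n+1} − 1; the per-pixel averaging in mse_loss absorbs the 1/(MN) factor. The VGG16 slice indices and the omitted ImageNet normalization are our assumptions.

import torch
import torch.nn.functional as F
from torchvision.models import vgg16, VGG16_Weights

_vgg = vgg16(weights=VGG16_Weights.DEFAULT).features.eval()
for p in _vgg.parameters():
    p.requires_grad_(False)

# assumed block boundaries: outputs of relu1_2, relu2_2, and relu3_3
_blocks = [_vgg[:4], _vgg[4:9], _vgg[9:16]]

def perceptual_loss(pred: torch.Tensor, gt: torch.Tensor, n: int = 3) -> torch.Tensor:
    loss = F.mse_loss(pred, gt)                    # k = 0 term: the images themselves
    x, y = pred, gt
    for k in range(1, n + 1):
        x, y = _blocks[k - 1](x), _blocks[k - 1](y)
        loss = loss + (2 ** k) * F.mse_loss(x, y)  # later blocks weighted more
    return loss / (2 ** (n + 1) - 1)               # the 1 / (2^{n+1} - 1) scaling of Eq. (3)

print(perceptual_loss(torch.rand(1, 3, 256, 256), torch.rand(1, 3, 256, 256)))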
2) STRUCTURAL LOSS
From [30], we observed that illumination does not affect the structural information in a scene. The structural similarity index measure (SSIM) compares the similarity between two images in terms of luminance, contrast, and structure. The structural information remains intact independent of changes in the luminance component or contrast. The SSIM loss is thus included as a part of the proposed loss function, as the human visual system is more susceptible to capturing the structural information from images.

The SSIM loss component is given by

L_{ssim} = 1 - \frac{(2 \mu_y \mu_{\hat{y}} + C_1)(2 \sigma_{y\hat{y}} + C_2)}{(\mu_y^2 + \mu_{\hat{y}}^2 + C_1)(\sigma_y^2 + \sigma_{\hat{y}}^2 + C_2)}     (4)

where µ_y and µ_ŷ denote the pixel means, σ_y and σ_ŷ denote the variances, and σ_yŷ refers to the covariance between the ground truth Y and the predicted image Ŷ, respectively. Here, C_1 and C_2 are constants that prevent division by zero.
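For illustration, Eq. (4) can be evaluated from global image statistics as in the sketch below; practical SSIM implementations, including [30], use local windows, and the constants C_1 and C_2 shown here are the conventional values for intensities in [0, 1] rather than values taken from the paper.

import torch

def ssim_loss(pred: torch.Tensor, gt: torch.Tensor,
              c1: float = 0.01 ** 2, c2: float = 0.03 ** 2) -> torch.Tensor:
    mu_p, mu_g = pred.mean(), gt.mean()
    var_p, var_g = pred.var(unbiased=False), gt.var(unbiased=False)
    cov = ((pred - mu_p) * (gt - mu_g)).mean()        # covariance sigma_{y y_hat}
    ssim = (((2 * mu_p * mu_g + c1) * (2 * cov + c2))
            / ((mu_p ** 2 + mu_g ** 2 + c1) * (var_p + var_g + c2)))
    return 1.0 - ssim                                 # Eq. (4)

print(ssim_loss(torch.rand(1, 3, 256, 256), torch.rand(1, 3, 256, 256)))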
3) WEIGHTED PATCH-WISE EUCLIDEAN LOSS
The Euclidean loss [23] and the L1-norm [14] are the most commonly used loss functions in low-light image enhancement. MBLLEN [13] improved the Euclidean-based loss function by providing more weightage to the lower intensity levels, utilizing the 25th percentile of the histogram as a threshold. However, MBLLEN fails to capture the contextual information in the image.

We have proposed a weighted patch-wise (WPW) Euclidean loss function as a replacement for the traditional Euclidean loss. In this method, the Euclidean loss is computed by segmenting the predicted image into patches and assigning larger weights to those patches with lower average intensities.


Patches with a lower mean intensity in the predicted image signify that they are still low lit even after enhancement; the low-light image enhancer is accurate when these patches are brought closer to the ground truth. We have sorted the mean intensities of the patches in increasing order and took the nth percentile to determine the threshold T. The weighted patch-wise loss function is normalized to avoid its dominance in the total loss function in Eq. (2).

The weighted patch-wise loss function is given by:

L_{wpw} = \frac{1}{\lambda MN} \left( w \sum_{\forall \mu_{\hat{P}} \le T} \| \hat{P}(i, j) - P(i, j) \|_2^2 + \sum_{\forall \mu_{\hat{P}} > T} \| \hat{P}(i, j) - P(i, j) \|_2^2 \right)     (5)

Here, the weight w is used to assign more weightage to the patches with lower average intensities. Patches in the ground truth and the predicted image are represented as P(·) and P̂(·), respectively. µ_P̂ denotes the average intensity of a patch, and T denotes the average threshold intensity.

The normalization factor λ is given by

\lambda = p_x p_y [\alpha(w - 1) + 1] \cdot MN     (6)

where p_x and p_y denote the number of patches along the rows and columns, respectively, and α denotes the percentage of total patches considered for adding weights.
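The sketch below illustrates Eqs. (5) and (6) with the settings reported later (p_x = p_y = 16, w = 4, α = 0.25): the prediction is split into patches, patches whose mean intensity falls at or below the α-percentile threshold T are weighted by w, and the sum is normalized by λ. Because λ in Eq. (6) already contains the factor MN, the additional 1/(MN) prefactor written in Eq. (5) is applied only once here; treat the exact normalization as our reading of the formulas.

import torch

def wpw_loss(pred: torch.Tensor, gt: torch.Tensor, px: int = 16, py: int = 16,
             w: float = 4.0, alpha: float = 0.25) -> torch.Tensor:
    n, c, H, W = pred.shape
    ph, pw = H // px, W // py                              # patch height and width
    p = pred.reshape(n, c, px, ph, py, pw)                 # (batch, ch, patch_row, y, patch_col, x)
    g = gt.reshape(n, c, px, ph, py, pw)
    sq_err = ((p - g) ** 2).sum(dim=(1, 3, 5))             # squared error per patch
    patch_mean = p.mean(dim=(1, 3, 5))                     # mean intensity of each predicted patch
    T = torch.quantile(patch_mean.flatten(), alpha)        # alpha-percentile threshold
    weights = torch.where(patch_mean <= T,
                          torch.full_like(patch_mean, w),
                          torch.ones_like(patch_mean))     # weight w for the darkest patches
    lam = px * py * (alpha * (w - 1.0) + 1.0) * H * W      # Eq. (6)
    return (weights * sq_err).sum() / lam

print(wpw_loss(torch.rand(1, 3, 256, 256), torch.rand(1, 3, 256, 256)))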
C. IMPLEMENTATION DETAILS
The proposed encoder design has two convolutional layers and four MC-FEM blocks. The input to each MC-FEM block is down-sampled using a 2 × 2 max-pooling layer. The convolutional layers and the first MC-FEM block use 64 filters each. The number of filters in the subsequent blocks is doubled, giving the fourth MC-FEM block 512 filters. The decoder network has separable convolution blocks interleaved with up-sampling layers. The number of filters is halved after each up-sampling layer, leaving the last separable layer and the two subsequent convolutional layers with 64 filters. For an image of size 256 × 256, the various loss parameters are set as follows: α = 0.25, p_x = p_y = 16, w = 4, n = 3, w_per = 1, w_ssim = 1, and w_wpw = 0.1. Table 1 describes the different experimental parameters along with their minimum and maximum values, as well as the step size used to decide the optimal value for each.

TABLE 1. Optimal Parameter Setting Selection.
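Putting the pieces together, the structural sketch below follows the paragraph above: two 64-filter convolutions, four MC-FEM blocks with 64, 128, 256, and 512 filters (each preceded by 2 × 2 max-pooling), and a decoder of separable convolutions interleaved with up-sampling that halves the filter count, with skip connections from the encoder. Kernel sizes, activations, the output non-linearity, and the condensed MCFEM definition are our assumptions; the official implementation may differ.

import torch
import torch.nn as nn

def sep_conv(cin, cout, dilation=1):
    # depth-wise 3x3 followed by point-wise 1x1
    return nn.Sequential(
        nn.Conv2d(cin, cin, 3, padding=dilation, dilation=dilation, groups=cin),
        nn.Conv2d(cin, cout, 1),
        nn.ReLU(inplace=True),
    )

class MCFEM(nn.Module):
    def __init__(self, cin, cout, rates=(1, 3, 5, 7)):
        super().__init__()
        self.branches = nn.ModuleList(sep_conv(cin, cout, r) for r in rates)
        self.compress = nn.Sequential(sep_conv(cout * len(rates), cout),
                                      sep_conv(cout, cout))
    def forward(self, x):
        return self.compress(torch.cat([b(x) for b in self.branches], dim=1))

class CLIENet(nn.Module):
    def __init__(self):
        super().__init__()
        self.head = nn.Sequential(nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(True),
                                  nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(True))
        self.pool = nn.MaxPool2d(2)
        self.up = nn.Upsample(scale_factor=2, mode='bilinear', align_corners=False)
        self.e1, self.e2 = MCFEM(64, 64), MCFEM(64, 128)          # encoder MC-FEM blocks
        self.e3, self.e4 = MCFEM(128, 256), MCFEM(256, 512)
        # decoder: separable convolutions on up-sampled features plus encoder skips
        self.d3, self.d2 = sep_conv(512 + 256, 256), sep_conv(256 + 128, 128)
        self.d1, self.d0 = sep_conv(128 + 64, 64), sep_conv(64 + 64, 64)
        self.tail = nn.Sequential(nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(True),
                                  nn.Conv2d(64, 3, 3, padding=1), nn.Sigmoid())

    def forward(self, x):
        h = self.head(x)                                   # 64 channels, full resolution
        e1 = self.e1(self.pool(h))
        e2 = self.e2(self.pool(e1))
        e3 = self.e3(self.pool(e2))
        e4 = self.e4(self.pool(e3))
        d = self.d3(torch.cat([self.up(e4), e3], dim=1))   # skip connections
        d = self.d2(torch.cat([self.up(d), e2], dim=1))
        d = self.d1(torch.cat([self.up(d), e1], dim=1))
        d = self.d0(torch.cat([self.up(d), h], dim=1))
        return self.tail(d)

print(CLIENet()(torch.rand(1, 3, 256, 256)).shape)         # torch.Size([1, 3, 256, 256])

Trained with the three-part loss of Eq. (2) and the optimizer settings reported in Section IV, this skeleton reproduces the overall data flow described above.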
III. DATASET
Collecting natural low-light scenes along with their corresponding normal/bright image pairs is a challenging task. Several authors collected a set of raw images to generate a synthetic dataset [13]–[15], [22]. However, most of these datasets consider images taken in a carefully monitored environment, and therefore they do not ideally represent natural scenes. Moreover, it is challenging to generate a synthetic dataset as there is no consensus over the low-light image generation method. Some works have used software like Adobe Photoshop Lightroom [14], while others have devised more elaborate image selection and transformation techniques [15].

We utilized the MIT outdoor and indoor scenes dataset [31] to generate synthetic low-light images. Gamma correction, used in [13], is incorporated to simulate low-illumination effects. We have used 5000 images from the MIT dataset to generate a low-light image dataset (LID). LID comprises both clean and noisy images to simulate real-world scenarios. Using Matlab, Poisson-Gaussian noise [15], [32] is added to simulate real noise. Initially, random Gaussian noise, whose variance varied between 0 and 0.005, is added to the low-light image, followed by Poisson noise.
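As an illustration of this pipeline, the NumPy sketch below (the paper itself uses Matlab) darkens a normal-light image with gamma correction and then adds Gaussian noise with a variance drawn from [0, 0.005], followed by Poisson noise. The gamma range and the Poisson scaling constant are assumptions, since the paper does not specify them.

import numpy as np

def synthesize_low_light(img: np.ndarray, rng: np.random.Generator,
                         gamma_range=(2.0, 3.5), peak: float = 255.0) -> np.ndarray:
    """img: float array in [0, 1] of shape (H, W, 3); returns a noisy low-light version."""
    gamma = rng.uniform(*gamma_range)
    low = img ** gamma                                      # gamma correction darkens the image
    var = rng.uniform(0.0, 0.005)                           # Gaussian noise, variance in [0, 0.005]
    low = np.clip(low + rng.normal(0.0, np.sqrt(var), low.shape), 0.0, 1.0)
    low = rng.poisson(low * peak) / peak                    # Poisson (signal-dependent) noise
    return np.clip(low, 0.0, 1.0).astype(np.float32)

rng = np.random.default_rng(0)
clean = rng.random((256, 256, 3)).astype(np.float32)        # stand-in for an MIT-dataset image
noisy_low_light = synthesize_low_light(clean, rng)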
We have used 4500 and 500 image pairs for training and testing, respectively. Fig. 2 portrays a few sample pairs from LID. The top row consists of synthesized low-lit images, and the row beneath consists of their corresponding ground truth. The two images on the right are noisy, and the two images on the left are noiseless.

IV. EXPERIMENTAL RESULTS
This section summarizes the experiments conducted to verify the quantitative and qualitative performance on synthetic and real low-light images, respectively. We conducted separate experiments to evaluate the performance of the proposed Context LIE-Net on clean as well as noisy images from LID. As mentioned before, we have used 500 image pairs from the LID dataset for testing and comparative analysis. The performance is evaluated using commonly used metrics like Peak Signal to Noise Ratio (PSNR), Structural Similarity Index (SSIM) [30], Visual Information Fidelity (VIF) [34], and Average Brightness (AB) [33]. The model is trained with 4500 image pairs using a batch size of 8 and the Adam optimizer with a learning rate of 1e−4. The proposed model converges in 50 epochs under these specifications.

The experimental analysis is subdivided into four parts: an ablation on the architecture that demonstrates its efficiency under noisy and noiseless conditions, an ablation on the loss function that validates the three-part loss, a quantitative and qualitative comparison using synthetic LID, and a qualitative comparison using natural low-light images.

A. ABLATION STUDY ON ARCHITECTURE
The use of dilated convolutions in a pyramid structure alongside separable convolutions is the first of its kind for low-light image enhancement. We conducted experiments on architecture ablation to verify the impact of the dilated and separable convolution layers.


FIGURE 2. Clean and Noisy Images from LID along with their Ground Truth. First row: Synthesized Low-light Images, Second row: Ground Truth.

TABLE 2. Comparison of Performance Metrics for Architecture Ablation.
TABLE 3. Comparison of Performance Metrics for Loss Ablation.

As seen in Table 2, the significance of separable convolution in U-Net and in the proposed architecture, named C-LIENet, is depicted clearly. All these architectures are trained for the same number of epochs using the widely used loss function, the mean squared error [23].

From Table 2, we can see that the use of separable convolutions in U-Net improves the performance in terms of PSNR, SSIM, and AB, but VIF decreases slightly under both noisy and noise-less conditions. We can verify that the proposed C-LIENet with MC-FEM blocks and skip connections outperforms all the other variants.

B. ABLATION STUDY ON LOSS FUNCTION
This section establishes the significance of the three-part loss incorporated in C-LIENet. We trained C-LIENet with combinations of the perceptual, SSIM, and weighted patch-wise Euclidean (WPW Euclidean) losses. We can observe from Table 3 that the quantitative evaluations of these combinations are very similar; however, from Fig. 3, notable visual differences are discernible on both clean (rows 1 and 2) as well as noisy (rows 3 and 4) images.

Including the structural loss function without a per-pixel or perceptual loss would be ideal only for edge detection or segmentation applications, and hence it is ruled out from the ablation study. Including a per-pixel loss without the other components may suit only noise removal applications. Similarly, the perceptual loss alone is only for feature detection, and hence we have excluded these variants from the experimental analysis.

The first and last columns in Fig. 3 denote the input and the ground truth, respectively, whereas the fourth column portrays the results of C-LIENet with the proposed three-part loss function. The second and third columns display outcomes from combinations of different loss functions.

For each row in Fig. 3, a reference region is marked, which shows the efficacy of the three-part loss function.


FIGURE 3. Ablation Study on Loss Function using Visual Evaluation.

TABLE 4. Quantitative Comparison on Clean and Noisy Images.

The texture of the cloud and sky in the first image, the bush and the area around the license plate in the second, the light post and color contrast in the third (noisy) image, and the structural information of the building together with the color and smoothness of the sky in the fourth image are a few notable examples that demonstrate the advantages of the three-part loss function.

C. COMPARISON ON CLEAN AND NOISY LID IMAGES
In this section, we have compared the qualitative and quantitative performance of C-LIENet against other traditional algorithms in conjunction with methods based on deep learning.


Methods based on deep learning, such as Exposure [42], Deep Retinex [22], Distort [43], MBLLEN [13], and GLADNet [14], are trained on LID to the same extent as C-LIENet for a fair comparison. We trained these models for 50 epochs with the default settings mentioned by the respective authors.

FIGURE 4. Qualitative Analysis on Clean Images.



FIGURE 5. Qualitative Analysis on Noisy Images.


FIGURE 6. Qualitative Comparison on Real-time Images.

Official code repositories, mentioned in the literature, are used for the experimental analysis.

In Table 4, blue, green, and red colors denote the top three algorithms, respectively. For the evaluation metrics, the proposed C-LIENet ranks among the top two on both noisy and clean images. The contextual learning offered by the MC-FEM blocks contributes to the prominence of C-LIENet on noisy images. The high VIF of LIME [8] is due to the efficiency of its denoising algorithm on noisy images.

Visual comparisons of the proposed approach on noise-free and noisy images are shown in Fig. 4 and Fig. 5, respectively. The visual quality of the proposed C-LIENet is superior compared to the other benchmarking algorithms, especially in noisy contexts. We observe that methods like LIME [8] and Dong et al. [36] still yield dark images even after enhancement. Several deep learning methods show comparable results, but they fail to retain the sharpness and overall hue of the image, specifically in the noisy context. The usefulness of the proposed algorithm is due to the increased spatial coverage provided by the MC-FEM module, which enables improved learning of the neighborhood under noisy conditions.

D. COMPARISON ON NATURAL IMAGES
Lastly, we verified the performance of the algorithm on real low-light images. The regions marked in Fig. 6 confirm the superiority of the proposed approach. These results ensure the suitability of C-LIENet for real-time applications like autonomous navigation.

V. CONCLUSION
In this paper, we have proposed a novel architecture called Context-LIENet using dilated and separable convolution. We have also developed an innovative three-part loss function exploring contextual and structural information. C-LIENet consists of a unique multi-context feature extraction module in the encoder, depth-wise separable convolutions in the decoder, and skip connections from the encoder to the decoder. The multi-context feature extraction module enables the network to learn complex features such as brightness, contrast, and noise from a wider context. Quantitative and qualitative experimental analysis on simulated and natural low-light images proves the ability of C-LIENet to outperform the benchmarking algorithms. The proposed three-part loss function using perceptual, structural, and weighted patch-wise loss components yields an enhanced image with improved objective quality compared to the standard loss functions.


An extensive study on component-wise hyper-parameter tuning needs focus. Using a unified model for both noisy and clean image enhancement might lack generalisability across domains. We have left these limitations open to the researchers working on low-light image enhancement. Several ideas presented in this work can be further extended to other image enhancement applications like image denoising, demosaicing, deraining, etc.

VI. ACKNOWLEDGMENT
(Praveen Ravirathinam and Divyam Goel contributed equally to this work.)

REFERENCES
[1] M. Abdullah-Al-Wadud, M. H. Kabir, M. A. A. Dewan, and O. Chae, "A dynamic histogram equalization for image contrast enhancement," IEEE Trans. Consum. Electron., vol. 53, no. 2, pp. 593–600, May 2007.
[2] W. Wang, X. Wu, X. Yuan, and Z. Gao, "An experiment-based review of low-light image enhancement methods," IEEE Access, vol. 8, pp. 87884–87917, 2020.
[3] E. H. Land, "The Retinex," Amer. Sci., vol. 52, no. 2, pp. 247–264, 1964.
[4] E. H. Land, "The Retinex theory of color vision," Sci. Amer., vol. 237, no. 6, pp. 108–128, Dec. 1977.
[5] D. J. Jobson, Z. Rahman, and G. A. Woodell, "Properties and performance of a center/surround Retinex," IEEE Trans. Image Process., vol. 6, no. 3, pp. 451–462, Mar. 1997.
[6] D. J. Jobson, Z. Rahman, and G. A. Woodell, "A multiscale Retinex for bridging the gap between color images and the human observation of scenes," IEEE Trans. Image Process., vol. 6, no. 7, pp. 965–976, Jul. 1997.
[7] C.-H. Lee, J.-L. Shih, C.-C. Lien, and C.-C. Han, "Adaptive multiscale Retinex for image contrast enhancement," in Proc. Int. Conf. Signal-Image Technol. Internet-Based Syst., Dec. 2013, pp. 43–50.
[8] X. Guo, Y. Li, and H. Ling, "LIME: Low-light image enhancement via illumination map estimation," IEEE Trans. Image Process., vol. 26, no. 2, pp. 982–993, Feb. 2017.
[9] M. Yang, X. Nie, and R. W. Liu, "Coarse-to-fine luminance estimation for low-light image enhancement in maritime video surveillance," in Proc. IEEE Intell. Transp. Syst. Conf. (ITSC), Auckland, New Zealand, Oct. 2019, pp. 299–304.
[10] K. G. Lore, A. Akintayo, and S. Sarkar, "LLNet: A deep autoencoder approach to natural low-light image enhancement," Pattern Recognit., vol. 61, pp. 650–662, Jan. 2017.
[11] L. Tao, C. Zhu, G. Xiang, Y. Li, H. Jia, and X. Xie, "LLCNN: A convolutional neural network for low-light image enhancement," in Proc. IEEE Vis. Commun. Image Process. (VCIP), Dec. 2017, pp. 1–4.
[12] F. Chollet, "Xception: Deep learning with depthwise separable convolutions," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jul. 2017, pp. 1800–1807.
[13] F. Lv, F. Lu, J. Wu, and C. Lim, "MBLLEN: Low-light image/video enhancement using CNNs," in Proc. Brit. Mach. Vis. Conf., 2018, pp. 1–13.
[14] W. Wang, C. Wei, W. Yang, and J. Liu, "GLADNet: Low-light enhancement network with global awareness," in Proc. 13th IEEE Int. Conf. Autom. Face Gesture Recognit. (FG), May 2018, pp. 751–755.
[15] F. Lv, Y. Li, and F. Lu, "Attention-guided low-light image enhancement," 2019, arXiv:1908.00682. [Online]. Available: http://arxiv.org/abs/1908.00682
[16] I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, "Generative adversarial networks," 2014, arXiv:1406.2661. [Online]. Available: http://arxiv.org/abs/1406.2661
[17] C. Ledig, L. Theis, F. Huszar, J. Caballero, A. Cunningham, A. Acosta, A. Aitken, A. Tejani, J. Totz, Z. Wang, and W. Shi, "Photo-realistic single image super-resolution using a generative adversarial network," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jul. 2017, pp. 4681–4690.
[18] H. Zhang, V. Sindagi, and V. M. Patel, "Image de-raining using a conditional generative adversarial network," IEEE Trans. Circuits Syst. Video Technol., vol. 30, no. 11, pp. 3943–3956, Nov. 2020.
[19] R. Li, J. Pan, Z. Li, and J. Tang, "Single image dehazing via conditional generative adversarial network," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., Jun. 2018, pp. 8202–8211.
[20] Y.-S. Chen, Y.-C. Wang, M.-H. Kao, and Y.-Y. Chuang, "Deep photo enhancer: Unpaired learning for image enhancement from photographs with GANs," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., Jun. 2018, pp. 6306–6314.
[21] Y. Jiang, X. Gong, D. Liu, Y. Cheng, C. Fang, X. Shen, J. Yang, P. Zhou, and Z. Wang, "EnlightenGAN: Deep light enhancement without paired supervision," 2019, arXiv:1906.06972. [Online]. Available: http://arxiv.org/abs/1906.06972
[22] C. Wei, W. Wang, W. Yang, and J. Liu, "Deep Retinex decomposition for low-light enhancement," in Proc. Brit. Mach. Vis. Conf., 2018, pp. 127–136.
[23] Y. Wang, Y. Cao, Z.-J. Zha, J. Zhang, Z. Xiong, W. Zhang, and F. Wu, "Progressive Retinex: Mutually reinforced illumination-noise perception network for low-light image enhancement," in Proc. 27th ACM Int. Conf. Multimedia, Oct. 2019, pp. 2015–2023.
[24] Y. Shi, X. Wu, and M. Zhu, "Low-light image enhancement algorithm based on Retinex and generative adversarial network," 2019, arXiv:1906.06027. [Online]. Available: http://arxiv.org/abs/1906.06027
[25] Y. Guo, Y. Lu, R. W. Liu, M. Yang, and K. T. Chui, "Low-light image enhancement with regularized illumination optimization and deep noise suppression," IEEE Access, vol. 8, pp. 145297–145315, 2020.
[26] W. Ren, S. Liu, L. Ma, Q. Xu, X. Xu, X. Cao, J. Du, and M.-H. Yang, "Low-light image enhancement via a deep hybrid network," IEEE Trans. Image Process., vol. 28, no. 9, pp. 4364–4375, Sep. 2019.
[27] L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille, "DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs," IEEE Trans. Pattern Anal. Mach. Intell., vol. 40, no. 4, pp. 834–848, Apr. 2018.
[28] J. Long, E. Shelhamer, and T. Darrell, "Fully convolutional networks for semantic segmentation," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2015, pp. 3431–3440.
[29] J. Johnson, A. Alahi, and L. Fei-Fei, "Perceptual losses for real-time style transfer and super-resolution," in Proc. Eur. Conf. Comput. Vis., 2016, pp. 694–711.
[30] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, "Image quality assessment: From error visibility to structural similarity," IEEE Trans. Image Process., vol. 13, no. 4, pp. 600–612, Apr. 2004.
[31] A. Quattoni and A. Torralba, "Recognizing indoor scenes," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2009, pp. 413–420.
[32] S. Guo, Z. Yan, K. Zhang, W. Zuo, and L. Zhang, "Toward convolutional blind denoising of real photographs," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2019, pp. 1712–1722.
[33] Z. Chen, B. R. Abidi, D. L. Page, and M. A. Abidi, "Gray-level grouping (GLG): An automatic method for optimized image contrast enhancement, Part I: The basic method," IEEE Trans. Image Process., vol. 15, no. 8, pp. 2290–2302, Aug. 2006.
[34] H. R. Sheikh and A. C. Bovik, "Image information and visual quality," IEEE Trans. Image Process., vol. 15, no. 2, pp. 430–444, Feb. 2006.
[35] X. Fu, D. Zeng, Y. Huang, X.-P. Zhang, and X. Ding, "A weighted variational model for simultaneous reflectance and illumination estimation," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2016, pp. 2782–2790.
[36] X. Dong, G. Wang, Y. Pang, W. Li, J. Wen, W. Meng, and Y. Lu, "Fast efficient algorithm for enhancement of low-lighting video," in Proc. IEEE Int. Conf. Multimedia Expo, Jul. 2011, pp. 1–6.
[37] S. Wang, J. Zheng, H.-M. Hu, and B. Li, "Naturalness preserved enhancement algorithm for non-uniform illumination images," IEEE Trans. Image Process., vol. 22, no. 9, pp. 3538–3548, Sep. 2013.
[38] X. Fu, D. Zeng, Y. Huang, Y. Liao, X. Ding, and J. Paisley, "A fusion-based enhancing method for weakly illuminated images," Signal Process., vol. 129, pp. 82–96, Dec. 2016.
[39] Z. Ying, G. Li, and W. Gao, "A bio-inspired multi-exposure fusion framework for low-light image enhancement," 2017, arXiv:1711.00591. [Online]. Available: http://arxiv.org/abs/1711.00591
[40] S. Parthasarathy and P. Sankaran, "An automated multi scale Retinex with color restoration for image enhancement," in Proc. Nat. Conf. Commun. (NCC), Feb. 2012, pp. 1–5.
[41] A. B. Petro, C. Sbert, and J.-M. Morel, "Multi-scale Retinex," Image Process. Line, vol. 4, pp. 71–88, 2014. [Online]. Available: http://www.ipol.im/pub/art/2014/107/


[42] Y. Hu, H. He, C. Xu, B. Wang, and S. Lin, "Exposure: A white-box photo post-processing framework," ACM Trans. Graph., vol. 37, no. 2, p. 26, 2018.
[43] J. Park, J.-Y. Lee, D. Yoo, and I. S. Kweon, "Distort-and-recover: Color enhancement using deep reinforcement learning," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., Jun. 2018, pp. 5928–5936.
[44] N. Wang, S. Ma, J. Li, Y. Zhang, and L. Zhang, "Multistage attention network for image inpainting," Pattern Recognit., vol. 106, Oct. 2020, Art. no. 107448.

PRAVEEN RAVIRATHINAM is currently pursuing the bachelor's degree in computer science with the Birla Institute of Technology and Science, Pilani, Rajasthan, India. His research interests include deep learning, image processing, and remote sensing, in particular applications of deep learning in remote sensing.

DIVYAM GOEL is currently pursuing the degree in computer science, with a minor in data science, with the Birla Institute of Technology and Science, Pilani. His research interest includes computer vision with a focus on autonomous vehicles.

J. JENNIFER RANJANI received the B.E. degree in electronics and communication engineering from the Noorul Islam College of Engineering, Nagercoil, India, the M.Tech. degree in computer and information technology from Manonmaniam Sundaranar University, Tirunelveli, India, and the Ph.D. degree in information and communication engineering from Anna University, Chennai, India, in 2002, 2005, and 2011, respectively. From July 2005 to August 2012, she was with the Department of Information Technology, Thiagarajar College of Engineering, Madurai. Later, she joined the Vivekanandha College of Engineering for Women, Tiruchengode, as an Associate Professor. From June 2014 to November 2017, she was with the School of Computing, SASTRA University, Thanjavur, India. She is currently working with the Department of Computer Science and Information Systems, Birla Institute of Technology and Science, Pilani, Rajasthan, India. Her research interests include statistical image processing, multiresolution signal analysis, data hiding, and embedded systems. She is a reviewer of journals such as IEEE Transactions on Geoscience and Remote Sensing, IEEE Geoscience and Remote Sensing Letters, IET Image Processing, IET Radar, Sonar and Navigation, and Multimedia Tools and Applications.

