
2023 6th International Conference on Computer Network, Electronic and Automation (ICCNEA)

979-8-3503-0538-8/23/$31.00 ©2023 IEEE | DOI: 10.1109/ICCNEA60107.2023.00014

A New Approach for Iris Segmentation Based on U-Net

Yan Yang
School of Computer Science and Engineering
Xi'an Technological University
Xi'an, China
e-mail: yangyan_0919@163.com

Jianguo Wang*
Research Institute of Artificial Intelligence and Data Science
Xi'an Technological University
Xi'an, China
e-mail: wjg_xit@126.com
*Corresponding author

Abstract—Iris segmentation was introduced to increase the accuracy of iris recognition. Earlier recognition methods used the entire eye image directly for recognition and classification, which led to poor recognition results. To solve this problem, this paper proposes the ResU-Net (RU-Net) model, which guides the network to learn more features that distinguish iris pixels from non-iris pixels. First, starting from U-Net, the backbone network is replaced with ResNet50. This reduces the number of parameters and the network complexity while improving the learning capability. Second, to address the sample imbalance between the iris region and the background region, this paper introduces the Focal Loss function. Focal Loss effectively handles class imbalance and makes the network focus more on pixels that are difficult to classify. The proposed RU-Net model is evaluated experimentally on the CASIA-Iris-Thousand dataset. The experimental results demonstrate that RU-Net achieves significant improvements on NIR iris images, reaching 96.22% mIoU and 98.19% mPA. This indicates that RU-Net outperforms other representative iris segmentation methods and has better segmentation capability.

Keywords—Iris Segmentation, Focal Loss, ResNet, U-Net

I. INTRODUCTION
Traditional authentication techniques, such as token-based and password-based authentication, have several inherent security weaknesses, and they can easily lead to losses for individuals and organizations [1]. To address these issues, biometric technologies such as fingerprint, face, voice, and iris recognition use the user's physical characteristics as authentication information. Compared with the credentials used for traditional authentication, biometric features are less easily forgotten or reproduced and are readily available, which makes them a convenient and secure solution. Among biometric traits, the iris has significant advantages: it is highly distinctive between individuals and stable over time, so it can distinguish different individuals with high accuracy [2]. The iris is the annular region between the pupil and the sclera, and it has rich textural features. The iris pattern is formed early in life and does not change thereafter. Being an internal organ, the iris is well protected, and its structure is unique even among genetically identical individuals. This makes iris recognition a highly accurate and effective biometric technique [3]. Iris segmentation plays a vital role in iris recognition by identifying the pixels that belong to the iris region in an image. Accurate segmentation enables the extraction of discriminative features and improves recognition accuracy. However, image acquisition in practical scenarios is often imperfect, and challenges such as occlusion and illumination variation make iris segmentation difficult.

To solve this problem, this paper proposes an accurate and robust iris segmentation model called RU-Net (Res U-Net). The model is built on the original U-Net model, and its performance and segmentation accuracy are improved by replacing the U-Net backbone with ResNet50. In practical segmentation scenarios, background pixels usually far outnumber iris pixels, so the Focal Loss function is introduced in this paper to address the imbalance between positive and negative samples.

II. RELATED WORK
Iris segmentation was introduced to increase the accuracy of iris recognition. Daugman [4] proposed localizing the inner and outer boundaries of the iris with an integro-differential (calculus-based) operator. The operator integrates the image intensity along the circumference of a circle as its radius increases in the radial direction, differentiates this integral with respect to the radius, and takes the position where the derivative reaches its maximum as the boundary. Subsequently, the Hough transform approach proposed by Wildes [5] and the least squares approach proposed by Yunhong Wang et al. [6] localized the boundaries from binarized edge points. The Hough transform detects targets of known shape: edge points are first extracted from the eye image, each edge point then votes for the circles that could pass through it, and the circles with the most votes give the centers and radii of the inner and outer iris boundaries. The least squares method fits the boundaries to the binarized edge image, and it localizes accurately only when image quality is close to ideal. Building on these two main classes of classical iris localization methods, later researchers have continued to enrich and develop them. With the development of deep learning, related techniques have gradually been applied to iris localization. Arsalan et al. proposed a two-stage

2770-7695/23/$31.00 ©2023 IEEE 20


Authorized licensed use limited to: Universitas Airlangga. Downloaded on January 08,2024 at 16:43:30 UTC from IEEE Xplore. Restrictions apply.
convolutional neural network (CNN)-based iris localization scheme that accurately segments the iris in the complex and noisy environments encountered in iris recognition [7]. Wang et al. proposed IrisParseNet, an efficient deep learning-based iris segmentation method that jointly models the iris mask and the parameterized inner and outer iris boundaries and incorporates a carefully designed attention module to improve localization performance [8]. These works take full advantage of deep learning to provide higher accuracy and efficiency for iris localization and segmentation.

Traditional segmentation methods require high-quality eye images and a controlled imaging environment, so in non-ideal conditions their accuracy is lower and their error rate higher. Deep learning-based methods perform better than traditional methods and further improve the accuracy of iris segmentation, but there is still room for improvement. The U-Net-based iris segmentation model proposed in this paper improves segmentation performance over the original U-Net.

III. PROPOSED METHOD
The model proposed in this paper improves on the original U-Net [9] model. U-Net was originally designed for semantic segmentation of medical images, and the iris of the human eye is an important subject of medical research; U-Net is therefore well suited to the iris segmentation task. However, there is still room to improve U-Net's performance in practical segmentation tasks. To further enhance the segmentation capability of the model, this paper replaces the backbone network of the original U-Net with ResNet50 [10]. ResNet50 is a deep residual network with more layers and higher accuracy, and compared with the original U-Net backbone it can obtain better performance in the iris segmentation task. Note that the backbone of the original U-Net is similar to VGG-16 [11]. The backbone is replaced to increase the depth of the model so that it extracts richer features for accurate iris segmentation. Unlike VGG-16, ResNet-50 is built on residual learning. ResNet-50 is deeper than VGG-16, yet it has far fewer parameters (about 25.6 million versus about 138 million for the full classification networks), mainly because its residual blocks use 1 × 1 bottleneck convolutions and because global average pooling replaces large fully connected layers. In addition, its skip connections reduce information loss and let each residual block learn only a residual mapping instead of a complete transformation, which makes ResNet-50 easier to optimize and train than VGG-16. In summary, ResNet50 has fewer parameters and higher accuracy, so replacing the U-Net backbone with it is expected to improve segmentation performance. In the experiments, the background occupies a large proportion of each iris image, which causes an imbalance between positive and negative samples; the Focal Loss [12] function is therefore also introduced in this paper to solve this problem.

A. U-Net Network Model
U-Net is a segmentation architecture based on the FCN (Fully Convolutional Network) [13]. FCN consists of two main parts: a fully convolutional part and a deconvolutional part. The fully convolutional part uses a classical CNN (e.g., VGG or ResNet) to extract features, while the deconvolutional part recovers semantic segmentation maps at the original resolution through upsampling. As shown in Fig. 1, FCN is an end-to-end architecture for efficient image segmentation.

Fig. 1. FCN network structure

U-Net is a classic semantic segmentation network consisting of three parts: the backbone feature extraction part, the enhanced feature extraction part, and the prediction part.

The network processes an image roughly as follows: the backbone network produces five initial effective feature layers and passes them to the enhanced feature extraction network; there, the features are upsampled step by step, each upsampled result is fused with the corresponding incoming feature layer, and the final result is passed to the prediction part.

Finally, the prediction part uses a 1 × 1 convolutional layer to map the feature maps to the desired number of classes for pixel-level prediction. In summary, U-Net's symmetric encoder-decoder structure, with its contracting and expanding paths, enables effective feature extraction and integration for image segmentation.

The U-Net structure is widely used, with good results, in fields such as biomedical image segmentation. It captures detailed information while preserving contextual information, and it performs well on tasks such as iris segmentation. The U-Net network structure is shown in Fig. 2.

Fig. 2. U-Net network structure

B. ResNet50 Network Structure
ResNet-50 [10] consists of multiple residual blocks,

each of which contains three convolutional layers. The residual module of ResNet-50 is shown in Fig. 3.

Fig. 3. ResNet-50 residual block

The input of each residual block is added to the output of the block's convolutional layers via a skip connection, and the sum is then passed through a nonlinear activation function. This design lets the network learn residuals directly, so that some layers can effectively skip their nonlinear transformation, which helps the network propagate and capture features. ResNet-50 is relatively deep, with 50 weight layers in total (49 convolutional layers and one fully connected layer). It uses convolutional layers as its basic building blocks and groups its residual blocks into four stages (Conv2_x to Conv5_x in Table I) that operate at progressively lower resolutions, which enlarges the receptive field of the network; the first block of stages Conv3_x to Conv5_x performs the downsampling, and the remaining blocks in each stage keep the same resolution. Finally, the class predictions are output after global average pooling and a fully connected layer. Although ResNet-50 is a relatively deep model, it has a relatively small number of parameters because its residual blocks use the bottleneck design, and its skip connections make it easier to optimize and train than other deep networks. ResNet-50 has achieved remarkable results in image classification and has become a benchmark model for many computer vision tasks, offering high accuracy, strong feature representation capability, and good generalization ability. The ResNet-50 network structure is shown in Table I.

TABLE I. RESNET-50 NETWORK STRUCTURE
Layer Name | Output Size | ResNet-50
Conv1 | 112 × 112 | 7 × 7, 64, stride 2
Conv2_x | 56 × 56 | 3 × 3 max pool, stride 2; [1 × 1, 64; 3 × 3, 64; 1 × 1, 256] × 3
Conv3_x | 28 × 28 | [1 × 1, 128; 3 × 3, 128; 1 × 1, 512] × 4
Conv4_x | 14 × 14 | [1 × 1, 256; 3 × 3, 256; 1 × 1, 1024] × 6
Conv5_x | 7 × 7 | [1 × 1, 512; 3 × 3, 512; 1 × 1, 2048] × 3
 | 1 × 1 | Average pool, 1000-d fc, softmax

C. Focal Loss Function
In the traditional cross-entropy loss function, every sample contributes to the loss equally. Under class imbalance, however, some categories have far fewer samples than others, so the model tends to focus on the majority categories. Samples from the minority categories are then overshadowed by samples from the majority categories, which reduces the model's ability to learn the minority categories.

Focal Loss [12] solves this problem by introducing a decay factor that makes the model focus more on hard-to-classify samples. The core idea is to counter both class imbalance and the imbalance between easy and hard samples by reducing the weight of easy-to-classify samples and increasing the weight of hard-to-classify samples, as in Equation (1).

$$FL(p_t) = -\alpha_t (1 - p_t)^{\gamma} \log(p_t) \qquad (1)$$

where $p_t$ is the predicted probability of the true class, $\alpha_t$ is a weighting factor used to balance the importance of different categories, and $\gamma$ is a tunable parameter.

In this expression, $(1 - p_t)^{\gamma}$ is an attenuation factor: it stays close to 1 when a sample is misclassified ($p_t$ small) and approaches 0 when a sample is classified confidently, so misclassified samples receive relatively more weight and the model focuses on difficult samples. The attenuation factor thus reduces the influence of easily classified samples, which improves the model's ability to learn difficult samples.

The weighting factor $\alpha_t$ can be set according to class frequency or other task-related factors to balance the importance of different categories. Typically, samples from minority categories are given higher weights so that the model pays more attention to them.

By introducing Focal Loss, the model can better handle both class imbalance and the imbalance between easy and hard samples, improving its ability to learn minority categories and difficult samples and thus its performance in image segmentation tasks.

IV. EXPERIMENT
A. Data Set
In this paper, the CASIA iris database [14] is used, a free iris database provided by the Institute of Automation, Chinese Academy of Sciences (CASIA). The CASIA iris database has been released publicly in four versions, of which the most commonly used is CASIA-V4. CASIA-V4 includes six subsets, and this paper uses the CASIA-Iris-Thousand subset, which contains 20,000 iris images from 1,000 subjects; this amount of data meets the experimental requirements. The dataset contains iris images captured under different conditions, such as eyelash occlusion and varying indoor lighting. In this paper, 1,000 left-eye images are used, and they are split into training and test sets at a ratio of 9:1. Basic information about the CASIAv4-Iris-Thousand dataset is given in Table II.

TABLE II. BASIC INFORMATION OF THE DATA SET
Dataset | CASIAv4-Iris-Thousand
Sensor | IrisKing IKEMB-100
Environment | Indoor, with lamp on/off
Resolution | 640 × 480
Color | Gray-level
No. of training images | 900
No. of test images | 100

B. Evaluation Metrics
mIoU (Mean Intersection over Union) is often used as an

evaluation metric to measure the accuracy of image semantic segmentation. It calculates the Intersection over Union (IoU) between the predicted segmentation and the ground-truth labels for each category and then averages the IoU over all categories to obtain the mean IoU. mIoU is widely used for performance evaluation and model comparison in semantic segmentation because it combines the segmentation accuracy of every category into a single value that measures the overall performance of the model, as in Equation (2):

$$mIoU = \frac{1}{k+1} \sum_{i=0}^{k} \frac{TP}{FN + FP + TP} \qquad (2)$$

where k + 1 is the number of categories, TP and FN are the true positives and false negatives, and FP and TN are the false positives and true negatives of category i.

The mean pixel accuracy (MPA) is another commonly used segmentation evaluation metric. It calculates, for each class separately, the proportion of pixels that are correctly classified and then averages these proportions over all classes, as in Equation (3):

$$mPA = \frac{1}{k+1} \sum_{i=0}^{k} \frac{TP + TN}{TP + TN + FP + FN} \qquad (3)$$

where k + 1 is the number of categories, and TP, TN, FP, and FN denote the true positives, true negatives, false positives, and false negatives of category i.

C. Experimental Implementation
In this study, we used PyTorch to implement the RU-Net model and trained it with the Adam optimizer. The initial learning rate was set to 0.0001, and training ran for a total of 50 epochs. The encoder of RU-Net follows the structure of ResNet50, except that all fully connected layers are removed from the contracting path.

D. Experimental Results
In the first phase of the experiments, we recorded the loss curves of the proposed model and the other models on the training and validation sets of the CASIAv4-Iris-Thousand dataset. The training loss measures how well a model fits the training data. The validation loss evaluates model performance on unseen data and therefore indicates generalization ability; a lower validation loss indicates better performance and generalization.

First, we recorded the loss curves of the different networks on the training set. Second, we recorded the loss curves of the different networks on the validation set. Finally, the mean intersection over union and the mean pixel accuracy of the different networks were calculated separately.

By comparing these experimental results, the segmentation capabilities of the different network models are assessed comprehensively. The loss curves of the different networks on the training set are shown in Fig. 4.

Fig. 4. Loss variation of different networks on the training set

The loss curves of the different networks on the validation set are shown in Fig. 5.

Fig. 5. Loss variation of different networks on the validation set

Figs. 4 and 5 show that every loss decreases as the number of iterations increases and that the trend is essentially the same across the networks. However, the model proposed in this paper converges faster and shows smaller loss fluctuation on the validation set than the other models.

In the second part of the experiment, we compare RU-Net with the other networks on CASIAv4-Iris-Thousand using each evaluation metric.

Table III compares the mIoU of the different networks.

TABLE III. COMPARISON OF MIOU FOR DIFFERENT NETWORKS
Method | mIoU (%)
FCN | 95.16
U-Net | 95.26
RU-Net | 96.22

Table IV compares the mPA of the different networks. Together with the mIoU comparison in Table III, the mPA comparison provides a more comprehensive assessment of the network models.
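Tables III and IV report mIoU and mPA. As a minimal sketch (not the authors' evaluation code), both metrics can be computed from a per-pixel confusion matrix accumulated over the test masks. The function names and the toy masks below are hypothetical. Per-class IoU follows TP / (TP + FP + FN); per-class pixel accuracy is assumed to be TP / (TP + FN), i.e. the proportion of each class's pixels that are correctly classified, which is the common convention and matches the prose description of MPA, but differs from the (TP + TN) numerator printed in Equation (3).

```python
from typing import List, Sequence, Tuple


def confusion_matrix(label: Sequence[int], pred: Sequence[int],
                     num_classes: int) -> List[List[int]]:
    """cm[t][p] counts pixels whose true class is t and predicted class is p."""
    if len(label) != len(pred):
        raise ValueError("label and pred must have the same number of pixels")
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(label, pred):
        cm[t][p] += 1
    return cm


def miou_and_mpa(cm: List[List[int]]) -> Tuple[float, float]:
    """Average per-class IoU and per-class pixel accuracy over all classes.

    Assumption: a class absent from both label and prediction scores 0 here;
    some toolkits skip such classes instead.
    """
    n = len(cm)
    ious, accs = [], []
    for i in range(n):
        tp = cm[i][i]
        fn = sum(cm[i]) - tp                        # class-i pixels predicted as another class
        fp = sum(cm[r][i] for r in range(n)) - tp   # other pixels predicted as class i
        ious.append(tp / (tp + fp + fn) if tp + fp + fn else 0.0)
        accs.append(tp / (tp + fn) if tp + fn else 0.0)
    return sum(ious) / n, sum(accs) / n


if __name__ == "__main__":
    # Hypothetical flattened masks: 0 = background, 1 = iris.
    label = [0, 0, 0, 0, 1, 1, 1, 1, 0, 0]
    pred = [0, 0, 0, 1, 1, 1, 1, 0, 0, 0]
    cm = confusion_matrix(label, pred, num_classes=2)
    miou, mpa = miou_and_mpa(cm)
    print(cm, round(miou, 4), round(mpa, 4))
```

For the toy masks, the script prints `[[5, 1], [1, 3]] 0.6571 0.7917`. Accumulating one confusion matrix over all test images before averaging, rather than averaging per-image scores, is one common design choice; the paper does not state which variant it uses.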

TABLE IV. COMPARISON OF MPA FOR DIFFERENT NETWORKS
Method | mPA (%)
FCN | 97.21
U-Net | 97.46
RU-Net | 98.19

Table III compares three methods: the original FCN, U-Net with a VGG-16 backbone, and the proposed RU-Net (U-Net with a ResNet50 backbone and Focal Loss), using mIoU as the evaluation metric. RU-Net achieves an mIoU of 96.22%, outperforming both U-Net and FCN.

Table IV uses mPA as the evaluation metric for the same three methods. RU-Net reaches an mPA of 98.19%, again outperforming both U-Net and FCN. The comparison on both evaluation metrics indicates that the segmentation capability of the proposed model is superior to that of the other two models.

Finally, some of the experimental results are shown in Fig. 6.

Fig. 6. Graph of some experimental results

Fig. 6 shows that RU-Net handles environmental factors such as eyelash occlusion and lighting well and has good segmentation ability.

V. CONCLUSIONS
In this paper, a new iris segmentation network model called RU-Net is proposed to overcome the occlusion and illumination problems in iris segmentation. Traditional methods usually train on entire eye images, although the pixels in the non-iris regions of these images contribute little to the segmentation results. To solve this problem, this paper replaces the backbone network of U-Net with ResNet50 and introduces the Focal Loss function to handle the sample imbalance between the iris and background regions. The segmentation results are evaluated using mIoU and mPA, and they show that RU-Net converges quickly, has low loss fluctuation, fits well, and performs stably. RU-Net outperforms the representative models in accuracy, which indicates that replacing the U-Net backbone with ResNet50 and using the Focal Loss function can further improve performance. Future research will focus on optimizing the model so that it is more robust in different scenarios.

REFERENCES
[1] P. Hao and E. He, "Active authentication technology and its research progress," Communication Technology, vol. 48, no. 5, pp. 514-518, 2015.
[2] S. Prabhakar, S. Pankanti, and A. K. Jain, "Biometric recognition: Security and privacy concerns," IEEE Security & Privacy, vol. 1, pp. 33-42, 2003.
[3] A. K. Jain, "Biometric recognition: How do I know who you are?" in Proc. International Conference on Image Analysis and Processing, 2005, pp. 1-5.
[4] J. G. Daugman, "Biometric personal identification system based on iris analysis," U.S. Patent 5,291,560, Mar. 1, 1994.
[5] R. P. Wildes, "Iris recognition: An emerging biometric technology," Proceedings of the IEEE, vol. 85, no. 9, pp. 1348-1363, 1997.
[6] Y. Wang, Y. Zhu, and T. Tan, "Identity identification based on iris recognition," Journal of Automation, no. 1, pp. 1-10, 2002, doi: 10.16383/j.aas.2002.01.001.
[7] M. Arsalan, H. G. Hong, R. A. Naqvi, et al., "Deep learning-based iris segmentation for iris recognition in visible light environment," Symmetry, vol. 9, no. 11, p. 263, 2017.
[8] C. Wang, J. Muhammad, Y. Wang, et al., "Towards complete and accurate iris segmentation using deep multi-task attention network for non-cooperative iris recognition," IEEE Transactions on Information Forensics and Security, vol. 15, pp. 2944-2959, 2020.
[9] O. Ronneberger, P. Fischer, and T. Brox, "U-Net: Convolutional networks for biomedical image segmentation," in Proc. International Conference on Medical Image Computing and Computer-Assisted Intervention, Springer, 2015, pp. 234-241.
[10] K. He, X. Zhang, S. Ren, et al., "Deep residual learning for image recognition," in Proc. IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770-778.
[11] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," arXiv preprint arXiv:1409.1556, 2014.
[12] T.-Y. Lin, P. Goyal, R. Girshick, et al., "Focal loss for dense object detection," in Proc. IEEE International Conference on Computer Vision, 2017, pp. 2980-2988.
[13] J. Long, E. Shelhamer, and T. Darrell, "Fully convolutional networks for semantic segmentation," in Proc. IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 3431-3440.
[14] Institute of Automation, Chinese Academy of Sciences, "Shared iris library," 2017. [Online]. Available: http://www.cbsr.ia.ac.cn/china/Iris%20Databases%20CH.asp
