
IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT, VOL. 72, 2023, 3500713

Pig Face Recognition Based on Trapezoid Normalized Pixel Difference Feature and Trimmed Mean Attention Mechanism

Shuiqing Xu, Member, IEEE, Qihang He, Songbing Tao, Hongtian Chen, Member, IEEE, Yi Chai, and Weixing Zheng, Fellow, IEEE

Abstract— Pig face recognition has a wide range of applications in breeding farms, including precision feeding and disease surveillance. This article proposes a method to guarantee its performance in complex environments such as with dirty faces and in unconstrained outdoor conditions. First, inspired by the shape of the pig face, a trapezoid normalized pixel difference (T-NPD) feature is designed to achieve more accurate detection in unconstrained outdoor conditions. Subsequently, a trimmed mean attention mechanism (TMAM) is designed, which uses a trimmed mean-based squeeze method to assign more precise weights to feature channels; it is then fused into a 50-layer ResNet (ResNet50) backbone network to classify detected pig face images with high accuracy. In addition, the TMAM can be applied in numerous common networks due to its universality. Finally, comprehensive experiments conducted on the publicly available JD pig face dataset indicate that the proposed method has superior performance compared with other methods, with an overall accuracy of 95.06%.

Index Terms— Attention mechanism, pig face detection, pig face recognition, trapezoid normalized pixel difference (T-NPD) feature, trimmed mean.

Manuscript received 10 August 2022; revised 2 November 2022; accepted 3 December 2022. Date of publication 26 December 2022; date of current version 11 January 2023. This work was supported in part by the National Natural Science Foundation of China under Grant 62273128, Grant 61803140, and Grant U2034209; in part by the China Postdoctoral Science Foundation under Grant 2020M682474; and in part by the Nantong Basic Science Research Project under Grant JC12022068. The Associate Editor coordinating the review process was Dr. Min Xia. (Corresponding author: Hongtian Chen.)

Shuiqing Xu, Qihang He, and Songbing Tao are with the College of Electrical Engineering and Automation, Hefei University of Technology, Hefei 230009, China (e-mail: xsqanhui91@gmail.com; 1614672098@qq.com; taosongbing@gmail.com).

Hongtian Chen is with the Department of Chemical and Materials Engineering, University of Alberta, Edmonton, AB T6G 1H9, Canada (e-mail: chtbaylor@163.com).

Yi Chai is with the State Key Laboratory of Power Transmission Equipment and System Security and New Technology, Chongqing University, Chongqing 400044, China (e-mail: chaiyi@cqu.edu.cn).

Weixing Zheng is with the School of Computing, Engineering and Mathematics, University of Western Sydney, Penrith, NSW 2751, Australia (e-mail: w.zheng@westernsydney.edu.au).

Digital Object Identifier 10.1109/TIM.2022.3232093

I. INTRODUCTION

Pig face recognition has an essential role in pig identification and traceability, and fulfills a wide range of demands in breeding farms [1]. For example, it enables the rapid isolation of pigs suffering from swine fever virus, so as to limit its transmission [2]. Combined with behavior detection, it can enable precision feeding by identifying the health status of individual pigs [3]. Hence, pig face recognition has great significance and application value.

Several methods for individual pig identification and traceability have been proposed. One of the earliest is manual color marking, where each pig is painted with a different color, which is compared with a sample library to obtain a pig's identity number [4]. This is physically harmful to the pig and time-consuming for farmers. Radio frequency identification (RFID) [5] is achieved through an ear tag, but it has only achieved 88.6% accuracy, even at close range, and is costly, as each pig requires a tag [6].

Deep learning methods based on convolutional neural networks (CNNs) have enjoyed great success in the domain of image classification [7], [8], [9], and some related methods have been used for pig face recognition. For example, a shallow CNN consisting of six convolutional layers with alternating dropout and max-pooling layers was designed for pig face recognition [10]. However, this study required the pig face image to be painted to achieve manual features, which is time-consuming in practice. A Haar cascade classifier was used to detect the pig faces, and an improved shallow CNN to classify them [11]. This method only achieved 83% accuracy in a ten-category pig classification experiment, and was mainly suitable for frontal pig face recognition, without considering the profile. Faster R-CNN and a shallow CNN with a residual module were cascaded to achieve front and profile pig face recognition [12]. However, it was conducted in a constrained environment with few background disturbances. The above methods were mainly implemented for relatively clean pig faces in constrained indoor conditions. But there are many farms where pigs live in unconstrained outdoor scenes, where the environment is changeable and the pig's face is very dirty. In such circumstances, these methods may encounter difficulty in achieving excellent pig face recognition results. In addition to the application of the above CNNs, attention mechanisms have also received extensive attention in image classification and face recognition tasks [13], [14], [15]. For example, SENet [16] first introduced squeeze and excitation operations for assigning weights to different channels, which can improve the accuracy of image classification. A novel attention mechanism was designed to incorporate into MobileNet for cattle face recognition [17]. The global and part feature deep network

model (GPN) was combined with an attention mechanism to perform cattle face re-identification [18]. Nevertheless, these attention mechanisms are currently mainly applied to the recognition of human or cattle faces, and their application in pig face recognition needs further exploration.

Face recognition of other animals has attracted attention. Inception-V3 was adopted to extract cattle face features from a rear-view video dataset, and these features were used to train a long short-term memory model to identify each individual cattle [19]. Other cattle face recognition methods, including PnasNet-5 [20], VGG-16 [21], and CattleFaceNet [22], have been well studied. Sheep face recognition has also been studied. Faster R-CNN was applied to detect sheep faces, and a ResNet50V2 model with the ArcFace loss function was used to classify them [23]. SheepFaceNet [24] and YOLOv4 [25] were proposed to realize sheep face recognition, and researchers have also worked on dog [26], [27] and giant panda [28], [29] face recognition. Due to differences in facial features, it has been difficult to apply the above methods directly to pig face recognition.

The aforementioned studies indicate that it is necessary to present a highly accurate recognition method for unconstrained outdoor scenes, including variable environments and dirty pig faces. We propose a pig face recognition method to address these problems. With a wide forehead and narrow mouth area, the pig face is trapezoidal. Hence, we extract features based on trapezoidal pixel regions rather than pixel points. We develop a trimmed mean attention mechanism (TMAM) to assign weights for different channels, which helps the backbone network to achieve more accurate pig face classification. In addition, it should be noted that this article mainly considers the common breeding scenario of only one pig in an enclosure in the pig industry.

The main contributions of this article can be summarized as follows.

1) A pig face recognition method based on trapezoid normalized pixel difference (T-NPD) features and TMAM is proposed to realize highly accurate pig face detection and classification, which is practical when complex environments such as dirty faces and unconstrained outdoor conditions are considered. Particularly, it is able to achieve pig face detection and recognition from different angles.

2) A new image feature, T-NPD, is developed by analyzing the biological characteristics of the pig face, with its wide forehead and narrow mouth. It is more suitable for complex backgrounds because extracting features from trapezoidal pixel regions avoids the effects of dirty faces and facial variations. The experimental results show that the proposed pig face detector using T-NPD can achieve excellent performance in terms of accuracy, precision, recall, and F1.

3) An attention mechanism, TMAM, is proposed, which uses a trimmed mean squeeze operation instead of global average pooling (GAP) to eliminate the effect of edge values on channel descriptors, thus assigning more accurate weights to feature channels. Experimental results show that 50-layer ResNet (ResNet50) with TMAM (TMAM_ResNet50) can realize high-precision classification of pig faces, and improves the overall accuracy by 2.13% compared to ResNet50. The proposed attention mechanism has excellent portability and can help to enhance the performance of common networks such as ResNet18, ResNet34, ResNet50, ResNet101 [30], and ResNext50 [31].

The remainder of this article is organized as follows. Section II reviews related work on CNNs and attention mechanisms. In Section III, we present the proposed pig face recognition method. Section IV presents experimental results and discussion. Conclusions are drawn in Section V.

II. RELATED WORK

Many deep learning-based technologies have been proposed for pig face recognition and other image classification tasks. We briefly review these in the two main categories of CNNs and the attention mechanism.

A. CNNs

Because a CNN can automatically learn features from the input image without manual operators, it has tremendous potential in image classification. The association of CNN performance and network depth has been researched through the design of the VGG [32]. GoogLeNet was developed to boost the multiscale feature extraction ability of CNNs through parallel convolution modes [33]. DenseNet [34] connects each layer to each other layer in a feedforward manner. DLA [35] ensures that the network has higher accuracy and fewer parameters by combining layers in a tree structure. The study of ResNet [30] showed that the problem of gradient vanishing can be mitigated through a skip structure, and its bottleneck architecture improves computational capability. Consequently, networks such as Res2Net [36], ResNeXt [31], and ResNeSt [37] have been proposed. Although many CNNs have been proposed to improve performance from different perspectives, ResNet50 has obvious advantages. It has been shown to require fewer training parameters than other CNNs of the same depth, and is less costly to train than networks with similar residual structures [38]. More importantly, ResNet50 has shown high stability in image recognition tasks [9]. Hence, we use ResNet50 as the backbone network in this article.

B. Attention Mechanism

The attention mechanism is inspired by human perception, by which a person concentrates on important features and ignores others. The attention mechanism was first used in natural language processing, and has shown great potential when fused with CNNs for feature extraction and image classification [39], [40], [41].

To this end, the design of attention mechanisms has received extensive attention. For instance, GENet [42] introduced a spatial attention mechanism to focus on task-related regions, which is suitable for object detection. SENet [16] used a channel attention mechanism to perform GAP to squeeze feature channels, whose weights were calculated with three fully connected (FC) layers.


A squeeze operation with GAP only uses one special object feature to explore channel attention, which yields suboptimal channel descriptors. Therefore, CBAM [43] used GAP and global max pooling simultaneously for channel squeezing, which yielded a modest performance gain. Some other attention mechanisms, such as CAM [44], LAM [39], and SRM [45], also utilized GAP-based methods to improve squeeze channels. Most methods used to optimize the squeeze operation of GAP ignore the issue that the average value it obtains is susceptible to channel edge values, whose great deviation from other data seriously affects the representativeness of the obtained channel descriptor. To address this problem, we developed a channel squeeze operation, TMAM, adopting a trimmed mean, which can avoid the influence of channel edge values in the traditional attention mechanism, and is more effective than other attention mechanisms at the pig face classification task.

III. PROPOSED METHOD

We introduce the proposed pig face recognition method, which includes detection and classification. For detection, the T-NPD detector captures pig face images in unconstrained outdoor conditions. For classification, TMAM_ResNet50 is applied to classify the detected pig face images. Fig. 1 shows the framework of the method.

Fig. 1. Framework of pig face recognition.
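To make this framework concrete, the following is a minimal sketch of the two-stage pipeline, assuming a `detector` callable that returns face bounding boxes and a `classifier` that maps a cropped face to an identity; both names and the 224 × 224 input size are our assumptions, not details specified by the paper.

```python
import torch
from PIL import Image
import torchvision.transforms as T

# Hypothetical two-stage pipeline: face detection, then CNN classification.
preprocess = T.Compose([T.Resize((224, 224)), T.ToTensor()])

def recognize(image: Image.Image, detector, classifier):
    """Detect pig faces, then classify each detected crop to a pig identity."""
    results = []
    for box in detector(image):                          # (left, top, right, bottom)
        crop = preprocess(image.crop(box)).unsqueeze(0)  # add batch dimension
        with torch.no_grad():
            identity = classifier(crop).argmax(dim=1).item()
        results.append((box, identity))
    return results
```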
A. T-NPD Detector for Pig Face Detection

While CNNs have been successfully applied to pig face recognition [10], the acquisition of pig face images still depends on manual work, which is time-consuming and cannot meet the needs of large-scale farm automation. Therefore, it is first necessary to detect the face.

Fig. 2. Feature extraction methods in (a) NPD method, where features are calculated from pixels x, y, and (b) T-NPD method, where features are calculated from trapezoidal pixel regions x_i, y_i.

The NPD detector [46] is fast and accurate, and is widely used in human face detection. Fig. 2(a) shows its feature extraction method between two pixel points. However, NPD features are based on pixel points for feature extraction, which has two drawbacks when used for pig face detection. First, feature extraction based on pixel points is susceptible to noise; when applied to a dirty pig face, feature extraction becomes difficult, which affects the accuracy of detection. Second, the detection performance of the NPD detector depends on the richness of pig face views in the training database, but it is difficult to build a database with various pig face angles in practice. The T-NPD feature is designed for feature extraction with reference to the features of the pig face. Fig. 2(b) shows the T-NPD feature extraction method between two trapezoidal pixel regions in an image, which is formulated as

f(x_i, x_j) = (x_i − x_j) / (x_i + x_j)    (1)

where i and j represent the ith and jth trapezoidal pixel regions in the image, and x_i and x_j are their respective intensity values. f(0, 0) is defined as 0 when x_i = x_j = 0. For an image with width W and height H, x_i and x_j are calculated as

x_i = r_i + r_{i+1} + r_{i+2} + r_{i+3} + r_{i+W+1} + r_{i+W+2}    (2)

x_j = r_j + r_{j+1} + r_{j+2} + r_{j+3} + r_{j+W+1} + r_{j+W+2}    (3)


where r_i and r_j are the respective intensity values of pixels i and j.

The T-NPD feature is designed based on the biometrics of the pig face, comparing the values of two trapezoidal pixel regions. Hence, T-NPD can better adapt to pig face feature extraction and avoid the high requirements for pig face view richness, which is more conducive to practical application.
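As an illustration of eqs. (1)–(3), the sketch below computes the trapezoidal region sums and the T-NPD feature on a flattened grayscale image. Boundary handling and the choice of region pairs are omitted, and the row-major indexing is our assumption.

```python
import numpy as np

def region_sum(r: np.ndarray, i: int, W: int) -> float:
    """Intensity of the trapezoidal region anchored at flat index i:
    four pixels on one row and two on the row below (eqs. (2)-(3))."""
    r = np.asarray(r, dtype=np.float64)  # avoid uint8 overflow in the sum
    return (r[i] + r[i + 1] + r[i + 2] + r[i + 3]
            + r[i + W + 1] + r[i + W + 2])

def t_npd(r: np.ndarray, i: int, j: int, W: int) -> float:
    """Normalized difference between two trapezoidal regions (eq. (1))."""
    xi, xj = region_sum(r, i, W), region_sum(r, j, W)
    return 0.0 if xi + xj == 0 else (xi - xj) / (xi + xj)

# usage: r = image.flatten() for a grayscale image of width W (row-major)
```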
Based on the extracted T-NPD features, a deep quadratic tree (DQT) [46] is constructed to capture the interaction between the dimensions and high-level information of the features. Subsequently, the Gentle AdaBoost algorithm [47] is applied to train the T-NPD features based on DQT. To train more effectively and eliminate negative samples earlier, a soft cascade algorithm [48] is used to cascade the DQTs and obtain a T-NPD detector. In this way, a pig face can be accurately detected in a timely manner in unconstrained outdoor conditions.
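The early-rejection structure of the cascade can be pictured as follows, in the spirit of the soft cascade of [48]: stage scores accumulate, and a candidate window is rejected as soon as the running score drops below a stage threshold. The stage functions stand in for the boosted DQTs; this is a structural sketch, not the authors' implementation.

```python
def soft_cascade_accept(window, stages, thresholds) -> bool:
    """Accept a window only if the accumulated stage score stays above
    every per-stage rejection threshold."""
    score = 0.0
    for stage, theta in zip(stages, thresholds):
        score += stage(window)   # boosted DQT response on T-NPD features
        if score < theta:        # early rejection of negative windows
            return False
    return True
```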
B. TMAM_ResNet50 for Pig Face Classification

We design TMAM_ResNet50 to achieve pig face recognition. TMAM_ResNet50 uses a ResNet50 backbone network and automatically assigns weights to feature channels in combination with TMAM, for more refined feature extraction. Fig. 3 shows the TMAM_ResNet50 architecture.

Fig. 3. Architecture of TMAM_ResNet50.

1) Architecture: As shown in Fig. 3, TMAM_ResNet50 consists of a convolutional layer, a max pooling layer, four kinds of blocks (Blocks 1–4), a GAP layer, and a full connection layer. Block 1 includes three sequentially connected identical structures, each composed of three convolution layers and a TMAM; these are referred to as residual blocks. Blocks 2–4 are, respectively, composed of four, six, and three identical structures, each consisting of three convolution layers and a TMAM. Table I describes each layer and its parameters.

TABLE I
PARAMETERS AND VALUES USED IN THE TMAM_RESNET50

2) TMAM: The TMAM attention mechanism models the feature channel relationships and further improves classification accuracy by automatically assigning precise weights to feature channels. Fig. 4 shows the TMAM architecture, with trimmed mean squeeze, excitation, and rescale steps.

Fig. 4. TMAM.

a) Trimmed Mean Squeeze: In traditional attention mechanisms, such as CBAM [43], CAM [44], LAM [39], and SRM [45], GAP-based methods are used to squeeze channels. However, these methods ignore the problem that the obtained average value is susceptible to the channel edge values. The channel edge values usually have a large deviation from other data, which seriously affects the representativeness of the obtained channel descriptors. To solve this problem and make better use of global spatial information, the trimmed mean squeeze operation is adopted on feature map U = {U_k(p, q) | 1 ≤ k ≤ c, 1 ≤ p ≤ h, 1 ≤ q ≤ w} to generate channel descriptors, where c and k indicate the number and sequence number, respectively, of feature channels, and p and q are the sequence numbers of the spatial dimensions h × w. The steps of the trimmed mean squeeze operation are as follows. A flattening operation is performed on U_k(p, q) to obtain the channel flattening vector F = (f_1, f_2, ..., f_λ, ..., f_{h×w}), where λ = 1, 2, ..., h × w indicates the sequence number. An efficient trimmed mean aggregation strategy is then used, where the edge values are first removed from F, and the remaining channel flattening values are averaged. Under the trimmed mean squeeze operation, R = {μ_k | 1 ≤ k ≤ c} is obtained by shrinking the spatial dimensions h × w of feature map U, and the channel descriptor μ_k of the kth channel is computed as

μ_k = (1 / (λ_2 − λ_1)) Σ_{λ=λ_1}^{λ_2} sort(F)_λ    (4)


where sort(·) denotes the sorting of values in descending order, and sort(F) is the sorted vector, with λ_1 and λ_2 the sequence numbers bounding the retained values.

We select the values of λ_1 and λ_2 as (1/4) × h × w and (3/4) × h × w, respectively. That is, the values of the first and last quarter of the sorted vector are treated as edge values. Through the trimmed mean squeeze, the multidimensional feature map U on each feature channel is squeezed into a single number whose value is its descriptor.
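Assuming a standard (N, C, H, W) PyTorch layout, eq. (4) with the quarter bounds just described can be sketched as below; the function name is ours.

```python
import torch

def trimmed_mean_squeeze(U: torch.Tensor) -> torch.Tensor:
    """Squeeze an (N, C, H, W) feature map to (N, C) channel descriptors by
    averaging only the middle half of each channel's sorted values (eq. (4))."""
    N, C, H, W = U.shape
    F = U.flatten(2)                              # (N, C, H*W) flattening vectors
    F_sorted, _ = F.sort(dim=2, descending=True)  # sort(F) per channel
    lam1, lam2 = (H * W) // 4, (3 * H * W) // 4   # drop first/last quarters
    return F_sorted[:, :, lam1:lam2].mean(dim=2)  # mu_k for each channel
```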
b) Excitation: To fully utilize the channel descriptor information gained by the trimmed mean squeeze operation, the importance of different feature channels is modeled by excitation, and weights are assigned to them. Excitation adopts a gating mechanism consisting of two FC layers, which can better model the complex relationships between feature channels by adding nonlinearity, while enabling more adequate interaction between them. The adaptive recalibration value S ∈ R^{1×1×c} indicates the importance of the different feature channels, and is obtained as

S = σ(W_2 δ(W_1 R))    (5)

where δ is the ReLU activation function, used after the first FC layer to relieve gradient disappearance and improve the convergence speed; σ is the sigmoid activation function, used after the second FC layer to normalize the channel weight values between 0 and 1; W_1 ∈ R^{c×(c/r)} and W_2 ∈ R^{(c/r)×c} represent the first and second FC layers, respectively; and r is the reduction ratio of nodes between the two FC layers.
c) Rescale: The output of TMAM, O = {O_k(u, v) | 1 ≤ k ≤ c}, is obtained by rescaling the feature map U with the adaptive recalibration values S = {S_k | 1 ≤ k ≤ c}. Here, O_k(u, v) represents the kth reinforced feature map after TMAM

O_k(u, v) = S_k U_k(p, q)    (6)

where S_k is the weight assigned to the kth channel. With the rescale operation, feature channels can be given weights according to their roles. Consequently, the feature map U is enhanced due to the emphasis on important information and the neglect of inferior information.
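Putting the three steps together, a minimal PyTorch module might look as follows. It reuses the trimmed_mean_squeeze helper sketched above; the layer names and the default reduction ratio r = 16 are our assumptions, not values given in the paper.

```python
import torch
import torch.nn as nn

class TMAM(nn.Module):
    """Sketch of the trimmed mean attention mechanism: trimmed mean squeeze
    (eq. (4)), two-FC excitation with reduction ratio r (eq. (5)), and
    channel rescale (eq. (6))."""
    def __init__(self, channels: int, r: int = 16):
        super().__init__()
        self.fc1 = nn.Linear(channels, channels // r)
        self.fc2 = nn.Linear(channels // r, channels)

    def forward(self, U: torch.Tensor) -> torch.Tensor:
        mu = trimmed_mean_squeeze(U)                            # (N, C), eq. (4)
        S = torch.sigmoid(self.fc2(torch.relu(self.fc1(mu))))   # eq. (5)
        return U * S.view(S.size(0), S.size(1), 1, 1)           # eq. (6)
```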


IV. EXPERIMENTAL RESULTS AND DISCUSSION

In this section, we discuss our experiments and evaluate their results on T-NPD, TMAM, and TMAM_ResNet50.

A. Implementation

1) Environment Settings: The PyTorch framework and Anaconda development platform were employed to train and test the pig face recognition method. System specifications are shown in Table II.

TABLE II
SYSTEM SPECIFICATIONS

2) Network Training: In the experimental part of this article, most networks were trained in the following way: the Glorot uniform method was used to initialize the network, and the batch size was set to 8 for training. Adam optimization was applied to prevent local optima, with a learning rate of 0.0001. The number of training epochs was selected as 300. The error of the model was quantified by the categorical cross-entropy

L = −Σ_{n=0}^{ω−1} Σ_{d=0}^{d−1} log(p_{n,d})    (7)

where ω and d are the numbers of images and categories, respectively, and p_{n,d} is the dth value of the network output for the nth image after softmax normalization.
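Under these settings, the training procedure can be sketched as below; `model` and `train_loader` are assumed placeholders, and nn.CrossEntropyLoss realizes the categorical cross-entropy of eq. (7) up to the one-hot indicator.

```python
import torch
import torch.nn as nn

def init_weights(m: nn.Module) -> None:
    """Glorot (Xavier) uniform initialization for conv and linear layers."""
    if isinstance(m, (nn.Conv2d, nn.Linear)):
        nn.init.xavier_uniform_(m.weight)

def train(model: nn.Module, train_loader, epochs: int = 300) -> None:
    model.apply(init_weights)
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
    criterion = nn.CrossEntropyLoss()      # eq. (7) up to the one-hot indicator
    for _ in range(epochs):
        for images, labels in train_loader:  # batches of size 8 per the paper
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()
```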
3) Database Description: Negative and positive sample databases were constructed to train the T-NPD detector. The negative sample database consisted of non-pig face images, including templates E of 20 × 20 pixels and T of 544 × 411 pixels. Template E was applied to train the T-NPD detector. Template T was used to check the current T-NPD detector, and a falsely detected image could be added to the negative sample database as a new template E. Eventually, a total of 1000 samples of template E and 200 samples of template T were selected to establish the negative sample database.

A positive sample database was built from pig face images chosen from 30 publicly available videos of the JD pig face dataset. Among these, 20 videos were randomly selected to manually generate 500 pig face images to form the positive sample database. To show the effectiveness of T-NPD feature extraction, a test sample database was constructed with 250 non-pig face images and 250 pig face images, where the pig face images were manually selected from the remaining ten videos. Training and validation sets were built to establish TMAM_ResNet50, obtained by using the T-NPD detector to filter pig faces from 20 videos of the pig face dataset. Considering that pig faces show diverse poses in practice, the number of images in the training set was designed to be less than that in the verification set before data enhancement. Based on this, five face images were selected for each pig in the training set, and at least ten for each pig in the validation set. In the end, the original training set was made up of 100 pig face images, and the validation set included 466 pig face images. Fig. 5 shows an example of the settings of the original training and verification sets of a pig.

Fig. 5. (a) Training set. (b) Validation set.
4) Data Augmentation: Increasing the number of images in the training set is important to measure the performance of TMAM and TMAM_ResNet50, and it can prevent them from learning irrelevant features [49]. We applied 12 data augmentation methods to increase the number and diversity of images in the original training set: mirror symmetry, rotation by 10°, rotation by 20°, image sharpening, Gaussian blur, contrast increase, contrast reduction, brightness increase, brightness reduction, random clipping, color increase, and color reduction. After augmentation, each pig had 65 images for training, and more than ten for verification. Ultimately, 1300 pig face images were obtained for training, and 466 for verification. The process of image augmentation is illustrated in Fig. 6.
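One plausible torchvision realization of several of the listed operations is sketched below. Note that the paper applies the 12 operations deterministically to multiply the 100 training images to 1300, whereas this sketch uses random transforms; all magnitudes are our assumptions.

```python
import torchvision.transforms as T

# Approximate counterparts of mirror symmetry, small rotations, Gaussian
# blur, brightness/contrast/color jitter, and random clipping.
augment = T.Compose([
    T.RandomHorizontalFlip(p=0.5),                    # mirror symmetry
    T.RandomRotation(degrees=20),                     # rotations up to 20 degrees
    T.GaussianBlur(kernel_size=5),                    # Gaussian blur
    T.ColorJitter(brightness=0.3, contrast=0.3, saturation=0.3),
    T.RandomResizedCrop(size=224, scale=(0.8, 1.0)),  # random clipping
])
```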
B. Evaluation of Proposed T-NPD Detector

In the experiments, four indicators were used to evaluate the T-NPD detector [50]:

Recall = TP / (TP + FN)    (8)

Precision = TP / (TP + FP)    (9)

F1 = 2 · (Recall · Precision) / (Recall + Precision)    (10)

Accuracy = (TP + TN) / (TP + FP + FN + TN)    (11)

where TP and FN are the numbers of positive samples classified as pig face and non-pig face, respectively, and FP and TN are the numbers of negative samples identified as pig face and non-pig face, respectively.

Meanwhile, frames/s (FPS) was introduced as a metric to evaluate the detection time, formulated as [51]

FPS = FRA / T    (12)

where T and FRA indicate the time and the number of frames, respectively.
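From raw detection counts, the five indicators of eqs. (8)–(12) can be computed as in the sketch below (zero-division guards omitted).

```python
def detection_metrics(tp: int, fp: int, fn: int, tn: int,
                      frames: int, seconds: float) -> dict:
    """Recall, precision, F1, accuracy, and FPS per eqs. (8)-(12)."""
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    return {
        "recall": recall,
        "precision": precision,
        "f1": 2 * recall * precision / (recall + precision),
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "fps": frames / seconds,
    }
```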


Fig. 6. Augmentation of pig face images.

Fig. 7. Detection results. (Top) T-NPD. (Bottom) NPD.

Fig. 8. Pig faces detected by T-NPD detector in unconstrained outdoor conditions. Green boxes represent detected pig faces.

TABLE III
RESULTS OF PIG FACE DETECTORS

The proposed T-NPD detector was compared with the traditional NPD detector [46] on the test sample database, with the five indicators compared in Table III. The proposed T-NPD detector achieved a high accuracy of 97.40% on the test sample database and outperformed the NPD detector on the recall, precision, and F1 metrics, while its FPS was not significantly different from that of the NPD detector. Consequently, it can be confirmed that the proposed T-NPD detector is superior to the NPD detector, and has excellent practicability in pig face detection. Fig. 7 shows a visual comparison between T-NPD and NPD. It is obvious that T-NPD correctly obtained the location of a pig face that was incorrectly detected by NPD. Fig. 8 displays more examples, further illustrating that T-NPD can effectively detect pig faces from different angles in unconstrained outdoor conditions.


TABLE IV
COMPARISONS OF ARCHITECTURES WITH AND WITHOUT TMAM

C. Evaluation of Proposed TMAM

Two experiments were performed to evaluate the proposed TMAM. First, the generalizability of TMAM was investigated by incorporating it with commonly used backbone networks to classify pig face images. Then TMAM was compared with state-of-the-art attention mechanisms. In addition, gradient-weighted class activation mapping (Grad-CAM) was used to visualize the enhancement of network feature extraction capability by TMAM. In these experiments, the attention mechanism-based methods were trained on the training set, and classification performance was evaluated on the verification set.
1) Incorporation of TMAM in Commonly Used Networks: To investigate its generalizability, TMAM was incorporated in the ResNet18, ResNet34, ResNet50, ResNet101, and ResNext50 networks, and we compared their pig face classification performance with that of the original networks. We introduce four classification performance metrics:

Overall accuracy = α_t / (α_t + α_f)    (13)

Mean recall = (1/φ) Σ_{l=1}^{φ} Recall_l    (14)

Mean precision = (1/φ) Σ_{l=1}^{φ} Precision_l    (15)

Mean F1 = (1/φ) Σ_{l=1}^{φ} F1_l    (16)
where φ is the number of pigs to be recognized; l refers to the lth pig; and α_t and α_f, respectively, are the numbers of correctly and incorrectly classified pig face images in the validation set. Furthermore, it should be pointed out that the training parameters of these models remained consistent with those in Section IV-A, with the only difference being that the epochs here were chosen as 100, because we only need to know whether TMAM is generic.
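Given per-pig metric vectors, eqs. (13)–(16) reduce to simple averages, as in the sketch below; the function and argument names are ours.

```python
import numpy as np

def mean_metrics(recall: np.ndarray, precision: np.ndarray,
                 f1: np.ndarray, correct: int, wrong: int) -> dict:
    """Eqs. (13)-(16): per-pig metric vectors of length phi are averaged;
    overall accuracy uses the total correct/incorrect image counts."""
    return {
        "overall_accuracy": correct / (correct + wrong),  # eq. (13)
        "mean_recall": recall.mean(),                     # eq. (14)
        "mean_precision": precision.mean(),               # eq. (15)
        "mean_f1": f1.mean(),                             # eq. (16)
    }
```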
Table IV shows classification results with and without TMAM in commonly used networks. It can be observed that TMAM could enhance the classification performance of these networks on all indicators. Specifically, the improvements in overall accuracy, mean recall, mean precision, and mean F1 of ResNet50 were 1.93%, 1.28%, 1.71%, and 1.50%, respectively. Fig. 9 shows the overall accuracy curves of pig face classification for the different networks, from which it can be seen that the overall accuracy of these networks was higher after incorporating TMAM.

Fig. 9. Overall accuracy on validation set for different commonly used networks with and without TMAM.

To further illustrate the effect of TMAM, Fig. 10 shows the class activation maps of ten successfully classified pigs obtained by applying the Grad-CAM technique. It can be noticed that ResNet50 with TMAM focuses more on learning features from the pig face than from the background. This is why TMAM can enhance classification performance. The above results indicate that TMAM is generic, and it can efficiently improve the performance of commonly used networks.

2) Comparison With Other Attention Mechanisms: We compared some widely used attention mechanisms, such as SENet [16], FCA [52], and CBAM [43], with TMAM in the classification of pig faces. ResNet50 was chosen as the backbone network for each attention mechanism. Params, FLOPs, mean AUC, and overall accuracy were used to evaluate the space complexity, time complexity, generalization performance, and model accuracy, respectively. Specifically, the Params for evaluating the space complexity and the FLOPs for evaluating the time complexity are calculated as follows [18]:

Params = C_o × (k_e² × C_i + 1)    (17)

FLOPs = C_o × (k_e² × C_i + 1) × H_i × W_e    (18)

where C_o represents the number of output channels, C_i represents the number of input channels, k_e represents the size of the convolution kernel, and H_i and W_e represent the height and width of the output feature map.

Fig. 10. Class activation maps with and without the proposed attention mechanism for pig face images. (Top) Original pig face images. (Middle) Class
activation maps drawn by ResNet50. (Bottom) Class activation maps drawn by TMAM_ResNet50.

Fig. 11. Confusion matrices of incorporating different attention mechanisms into ResNet50 for pig face classification tasks. (a) ResNet50. (b) ResNet50+SE. (c) ResNet50+CBAM. (d) ResNet50+FCA. (e) ResNet50+TMAM.

Furthermore, it is worth noting that the training parameters of these models remained consistent with those in Section IV-A.
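As a worked example of eqs. (17) and (18), consider a hypothetical 3 × 3 convolution mapping 64 to 128 channels on a 56 × 56 output map.

```python
def conv_params(c_out: int, c_in: int, k: int) -> int:
    """Eq. (17): weights plus one bias per output channel."""
    return c_out * (k * k * c_in + 1)

def conv_flops(c_out: int, c_in: int, k: int, h_out: int, w_out: int) -> int:
    """Eq. (18): eq. (17) repeated at every output spatial location."""
    return conv_params(c_out, c_in, k) * h_out * w_out

# conv_params(128, 64, 3) = 73,856 parameters;
# conv_flops(128, 64, 3, 56, 56) multiplies this by 56 * 56.
```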
As can be observed from Table V, although the Params and FLOPs of TMAM are comparable to those of the other attention mechanisms, its mean AUC and overall accuracy are better. This demonstrates that the original model equipped with TMAM has a better ability to achieve pig face classification. Moreover, it should be noted that the experimental results of Table V were obtained by training the model several times and averaging the results.


TABLE V
ACCURACIES AND PARAMETERS FROM DIFFERENT ARCHITECTURES AND ATTENTION MECHANISMS

TABLE VI
CLASSIFICATION RESULTS OF DIFFERENT ARCHITECTURES

Fig. 12. Overall accuracy curves on validation set for different attention mechanisms.

Fig. 13. Overall accuracy curves on validation set for different classification methods.
Fig. 11 shows the confusion matrices of the different methods, where the diagonal and non-diagonal values indicate the proportions of correct and incorrect predictions, respectively. It is clear that the diagonal response of TMAM outperforms the other attention mechanisms, which means that it can improve classification performance. Fig. 12 presents the overall accuracy curves of pig face classification of the various attention modules over different epochs, which intuitively suggests that TMAM can be better equipped with the backbone network than the other attention mechanisms. Based on the above results, it is concluded that TMAM can better improve the performance of the backbone network compared to other attention mechanisms with the same Params and FLOPs.

D. Evaluation of Proposed TMAM_ResNet50

In the last experiment, to validate the proposed method of pig face classification, TMAM_ResNet50 was compared with six state-of-the-art classification algorithms: ResNet50 [30], MobileNetV2 [53], MobileNetV3 [54], RegNet [55], ShuffleNetV2 [56], and EfficientNetB0 [57]. The network training parameters and performance metrics remained consistent with those in Sections IV-A and IV-C, respectively. The classification results obtained by the different algorithms are shown in Table VI, from which it can be seen that the overall accuracy, mean recall, mean precision, and mean F1 of TMAM_ResNet50 reached 95.06%, 94.82%, 95.28%, and 95.05%, respectively, which were superior to those of the other methods in pig face classification tasks. In addition, it should be pointed out that the experimental results of Table VI were also obtained by training the model several times and averaging the results.

Fig. 13 presents the overall accuracy curves for different epochs and classification algorithms, which were usually higher for TMAM_ResNet50 than for the other algorithms. Fig. 14 displays some examples of recognition of pig face images in unconstrained outdoor conditions by combining T-NPD and TMAM_ResNet50, which demonstrates the effectiveness of TMAM_ResNet50.


Fig. 14. Examples of pig face recognition in unconstrained outdoor conditions by combining T-NPD and TMAM_ResNet50. Green boxes represent detected
pig faces, class represents identity of pig, and prob represents confidence.

Consequently, it can be confirmed that TMAM_ResNet50 can realize pig face classification in unconstrained outdoor conditions, and outperforms the other methods on all classification performance metrics.

V. CONCLUSION

In this article, we proposed a method for pig face recognition in unconstrained outdoor conditions. A detector using the T-NPD feature was developed to obtain pig face regions in an image. The TMAM attention mechanism was added to ResNet50 to recognize the detected pig face. Experimental results indicated that the T-NPD feature can better describe the biological characteristics of a pig face and improve the effectiveness of its detection. TMAM was shown to be generic when applied in common networks and more effective than other attention mechanisms. TMAM_ResNet50 also showed excellent performance, with an overall accuracy of 95.06% on the pig face recognition task. In the future, we plan to improve our method with model pruning, which could make it more lightweight and usable in practice. Moreover, we will also pursue some new research directions, such as the recognition of multiple pigs in one image, and the identification of pigs of different ages.

REFERENCES

[1] H. Liu et al., "Development of a face recognition system and its intelligent lighting compensation method for dark-field application," IEEE Trans. Instrum. Meas., vol. 70, pp. 1–16, 2021.
[2] S. A. Perdomo et al., "SenSARS: A low-cost portable electrochemical system for ultra-sensitive, near real-time, diagnostics of SARS-CoV-2 infections," IEEE Trans. Instrum. Meas., vol. 70, pp. 1–10, 2021.
[3] Z. Wang and T. Liu, "Two-stage method based on triplet margin loss for pig face recognition," Comput. Electron. Agricult., vol. 194, Mar. 2022, Art. no. 106737.
[4] Y. Gómez et al., "A systematic review on validated precision livestock farming technologies for pig production and its potential to assess animal welfare," Frontiers Veterinary Sci., vol. 8, May 2021, Art. no. 660565.
[5] E. Rigall, X. Wang, Q. Chen, S. Zhang, and J. Dong, "An RFID tag localization method based on hologram mask and discrete cosine transform," IEEE Trans. Instrum. Meas., vol. 71, pp. 1–12, 2022.
[6] J. Maselyne et al., "Validation of a high frequency radio frequency identification (HF RFID) system for registering feeding patterns of growing-finishing pigs," Comput. Electron. Agricult., vol. 102, pp. 10–18, Mar. 2014.
[7] G. Shi, Y. He, and C. Zhang, "Feature extraction and classification of cataluminescence images based on sparse coding convolutional neural networks," IEEE Trans. Instrum. Meas., vol. 70, pp. 1–11, 2021.
[8] J. Ni, K. Shen, Y. Chen, W. Cao, and S. X. Yang, "An improved deep network-based scene classification method for self-driving cars," IEEE Trans. Instrum. Meas., vol. 71, pp. 1–14, 2022.
[9] R. Xia, G. Li, Z. Huang, L. Wen, and Y. Pang, "Classify and localize threat items in X-ray imagery with multiple attention mechanism and high-resolution and high-semantic features," IEEE Trans. Instrum. Meas., vol. 70, pp. 1–10, 2021.
[10] M. F. Hansen et al., "Towards on-farm pig face recognition using convolutional neural networks," Comput. Ind., vol. 98, pp. 145–152, Jun. 2018.
[11] M. Marsot et al., "An adaptive pig face recognition approach using convolutional neural networks," Comput. Electron. Agricult., vol. 173, Jun. 2020, Art. no. 105386.
[12] R. Wang, Z. Shi, Q. Li, R. Gao, C. Zhao, and L. Feng, "Pig face recognition model based on a cascaded network," Appl. Eng. Agricult., vol. 37, no. 5, pp. 879–890, 2021.
[13] H. Uppal, A. Sepas-Moghaddam, M. Greenspan, and A. Etemad, "Depth as attention for face representation learning," IEEE Trans. Inf. Forensics Security, vol. 16, pp. 2461–2476, 2021.
[14] X. Wang, W. Fan, M. Hu, Y. Wang, and F. Ren, "A self-fusion network based on contrastive learning for group emotion recognition," IEEE Trans. Computat. Social Syst., early access, Sep. 12, 2022, doi: 10.1109/TIM.2022.3193711.
[15] C. Wang, J. Xue, K. Lu, and Y. Yan, "Light attention embedding for facial expression recognition," IEEE Trans. Circuits Syst. Video Technol., vol. 32, no. 4, pp. 1834–1847, Apr. 2022.
[16] J. Hu, L. Shen, and G. Sun, "Squeeze-and-excitation networks," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., Jun. 2018, pp. 7132–7141.
[17] X. Chen et al., "Holstein cattle face re-identification unifying global and part feature deep network with attention mechanism," Animals, vol. 12, no. 8, p. 1047, Apr. 2022.
[18] Z. Li and X. Lei, "Cattle face recognition under partial occlusion," J. Intell. Fuzzy Syst., vol. 43, no. 1, pp. 67–77, Jun. 2022.


[19] Y. Qiao, D. Su, H. Kong, S. Sukkarieh, S. Lomax, and C. Clark, "Individual cattle identification using a deep learning based framework," IFAC-PapersOnLine, vol. 52, no. 30, pp. 318–323, 2019.
[20] L. Yao, Z. Hu, C. Liu, H. Liu, Y. Kuang, and Y. Gao, "Cow face detection and recognition based on automatic feature extraction algorithm," in Proc. ACM Turing Celebration Conf. China, May 2019, pp. 1–5.
[21] H. Wang, J. Qin, Q. Hou, and S. Gong, "Cattle face recognition method based on parameter transfer and deep learning," J. Phys., Conf. Ser., vol. 1453, no. 1, Jan. 2020, Art. no. 012054.
[22] B. Xu et al., "CattleFaceNet: A cattle face identification approach based on RetinaFace and ArcFace loss," Comput. Electron. Agricult., vol. 193, Feb. 2022, Art. no. 106675.
[23] A. Hitelman, Y. Edan, A. Godo, R. Berenstein, J. Lepar, and I. Halachmi, "Biometric identification of sheep via a machine-vision system," Comput. Electron. Agricult., vol. 194, Mar. 2022, Art. no. 106713.
[24] H. Xue, J. Qin, C. Quan, W. Ren, T. Gao, and J. Zhao, "Open set sheep face recognition based on Euclidean space metric," Math. Problems Eng., vol. 2021, pp. 1–15, Nov. 2021.
[25] M. Billah, X. Wang, J. Yu, and Y. Jiang, "Real-time goat face recognition using convolutional neural network," Comput. Electron. Agricult., vol. 194, Mar. 2022, Art. no. 106730.
[26] G. Mougeot, D. Li, and S. Jia, "A deep learning approach for dog face verification and recognition," in Proc. Pacific Rim Int. Conf. Artif. Intell., 2019, pp. 418–430.
[27] B. Yoon, H. So, and J. Rhee, "A methodology for utilizing vector space to improve the performance of a dog face identification model," Appl. Sci., vol. 11, no. 5, p. 2074, Feb. 2021.
[28] L. Wang et al., "Giant panda identification," IEEE Trans. Image Process., vol. 30, pp. 2837–2849, 2021.
[29] P. Chen et al., "A study on giant panda recognition based on images of a large proportion of captive pandas," Ecology Evol., vol. 10, no. 7, pp. 3561–3573, Apr. 2020.
[30] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2016, pp. 770–778.
[31] S. Xie, R. Girshick, P. Dollar, Z. Tu, and K. He, "Aggregated residual transformations for deep neural networks," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jul. 2017, pp. 1492–1500.
[32] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," in Proc. Int. Conf. Learn. Represent. (ICLR), 2015, pp. 1–16.
[33] C. Szegedy et al., "Going deeper with convolutions," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2015, pp. 1–9.
[34] G. Huang, Z. Liu, L. Van Der Maaten, and K. Q. Weinberger, "Densely connected convolutional networks," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jul. 2017, pp. 4700–4708.
[35] F. Yu, D. Wang, E. Shelhamer, and T. Darrell, "Deep layer aggregation," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., Jun. 2018, pp. 2403–2412.
[36] S. H. Gao, M. M. Cheng, and K. Zhao, "Res2Net: A new multi-scale backbone architecture," IEEE Trans. Pattern Anal. Mach. Intell., vol. 43, no. 2, pp. 652–662, Feb. 2021.
[37] H. Zhang et al., "ResNeSt: Split-attention networks," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. Workshops (CVPRW), Jun. 2022, pp. 2736–2746.
[38] R. Chen, D. Cai, X. Hu, Z. Zhan, and S. Wang, "Defect detection method of aluminum profile surface using deep self-attention mechanism under hybrid noise conditions," IEEE Trans. Instrum. Meas., vol. 70, pp. 1–9, 2021.
[39] Y. Cui, Y. An, W. Sun, H. Hu, and X. Song, "Lightweight attention module for deep learning on classification and segmentation of 3-D point clouds," IEEE Trans. Instrum. Meas., vol. 70, pp. 1–12, 2021.
[40] X. Wang, W. Fan, M. Hu, Y. Wang, and F. Ren, "CFJLNet: Coarse and fine feature joint learning network for bone age assessment," IEEE Trans. Instrum. Meas., vol. 71, pp. 1–11, 2022, doi: 10.1109/TIM.2022.3193711.
[41] H. Li, X.-J. Wu, and T. Durrani, "NestFuse: An infrared and visible image fusion architecture based on nest connection and spatial/channel attention models," IEEE Trans. Instrum. Meas., vol. 69, no. 12, pp. 9645–9656, Dec. 2020.
[42] J. Hu, L. Shen, S. Albanie, G. Sun, and A. Vedaldi, "Gather-excite: Exploiting feature context in convolutional neural networks," Adv. Neural Inform. Process. Syst., vol. 31, pp. 9401–9411, 2018.
[43] S. Woo, J. Park, J. Y. Lee, and I. S. Kweon, "CBAM: Convolutional block attention module," in Proc. Eur. Conf. Comput. Vis. (ECCV), Sep. 2018, pp. 3–19.
[44] Q. Hou, D. Zhou, and J. Feng, "Coordinate attention for efficient mobile network design," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2021, pp. 13713–13722.
[45] H. Lee, H.-E. Kim, and H. Nam, "SRM: A style-based recalibration module for convolutional neural networks," in Proc. IEEE/CVF Int. Conf. Comput. Vis. (ICCV), Oct. 2019, pp. 1854–1862.
[46] S. Liao, A. K. Jain, and S. Z. Li, "A fast and accurate unconstrained face detector," IEEE Trans. Pattern Anal. Mach. Intell., vol. 38, no. 2, pp. 211–223, Feb. 2016.
[47] J. Friedman, T. Hastie, and R. Tibshirani, "Additive logistic regression: A statistical view of boosting," Ann. Statist., vol. 28, no. 2, pp. 337–407, 1998.
[48] L. Bourdev and J. Brandt, "Robust object detection via soft cascade," in Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit. (CVPR), vol. 2, Jun. 2005, pp. 236–243.
[49] K. Mohan, A. Seal, O. Krejcar, and A. Yazidi, "Facial expression recognition using local gravitational force descriptor-based deep convolution neural networks," IEEE Trans. Instrum. Meas., vol. 70, pp. 1–12, 2021.
[50] S. Liu, W. Jiang, L. Wu, H. Wen, M. Liu, and Y. Wang, "Real-time classification of rubber wood boards using an SSR-based CNN," IEEE Trans. Instrum. Meas., vol. 69, no. 11, pp. 8725–8734, Nov. 2020.
[51] S. Du, K. Gu, and T. Ikenaga, "Subpixel displacement measurement at 784 FPS: From algorithm to hardware system," IEEE Trans. Instrum. Meas., vol. 71, pp. 1–10, 2022.
[52] Z. Qin, P. Zhang, F. Wu, and X. Li, "FcaNet: Frequency channel attention networks," in Proc. IEEE/CVF Int. Conf. Comput. Vis. (ICCV), Oct. 2021, pp. 783–792.
[53] M. Sandler, A. Howard, M. Zhu, A. Zhmoginov, and L.-C. Chen, "MobileNetV2: Inverted residuals and linear bottlenecks," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., Jun. 2018, pp. 4510–4520.
[54] A. Howard et al., "Searching for MobileNetV3," in Proc. IEEE/CVF Int. Conf. Comput. Vis. (ICCV), Oct. 2019, pp. 1314–1324.
[55] I. Radosavovic, R. P. Kosaraju, R. Girshick, K. He, and P. Dollar, "Designing network design spaces," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2020, pp. 10428–10436.
[56] N. Ma, X. Zhang, H.-T. Zheng, and J. Sun, "ShuffleNet V2: Practical guidelines for efficient CNN architecture design," in Proc. Eur. Conf. Comput. Vis. (ECCV), Sep. 2018, pp. 116–131.
[57] M. Tan and Q. Le, "EfficientNet: Rethinking model scaling for convolutional neural networks," in Proc. ACM Int. Conf. Ser., 2019, pp. 6105–6114.

Shuiqing Xu (Member, IEEE) received the Ph.D. degree from the Department of Automation, Chongqing University, Chongqing, China, in 2017. Since 2017, he has been an Associate Professor with the College of Electrical Engineering and Automation, Hefei University of Technology, Hefei, China. His current research interests include image processing, computer vision, and deep learning.

Qihang He received the B.E. degree from the School of Electrical and Information Engineering, Anhui University of Technology, Ma'anshan, China, in 2020. He is currently pursuing the M.E. degree with the College of Electrical Engineering and Automation, Hefei University of Technology, Hefei, China. His research interests include machine learning and deep learning.

Songbing Tao received the Ph.D. degree from the Department of Automation, Chongqing University, Chongqing, China, in 2020. Since 2020, he has been a Post-Doctoral Researcher with the College of Electrical Engineering and Automation, Hefei University of Technology, Hefei, China. His research interests include deep learning and pattern recognition.


Hongtian Chen (Member, IEEE) received the B.S. and M.S. degrees from the School of Electrical and Automation Engineering, Nanjing Normal University, Nanjing, China, in 2012 and 2015, respectively, and the Ph.D. degree from the College of Automation Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing, in 2019. He was a Visiting Scholar with the Institute for Automatic Control and Complex Systems, University of Duisburg-Essen, Duisburg, Germany, in 2018. He is currently a Post-Doctoral Fellow with the Department of Chemical and Materials Engineering, University of Alberta, Edmonton, AB, Canada. His research interests include machine learning and pattern recognition. Dr. Chen was a recipient of the Grand Prize of Innovation Award of the Ministry of Industry and Information Technology of the People's Republic of China in 2019, the Excellent Ph.D. Thesis Award of Jiangsu Province in 2020, and the Excellent Doctoral Dissertation Award from the Chinese Association of Automation (CAA) in 2020. He currently serves as an Associate Editor and a Guest Editor for a number of scholarly journals such as IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT, IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, and IEEE TRANSACTIONS ON ARTIFICIAL INTELLIGENCE.

Yi Chai received the Ph.D. degree from Chongqing University, Chongqing, China, in 2001. He is currently a Professor with the College of Automation, Chongqing University. His research interests include computer vision, artificial intelligence, and machine learning.

Weixing Zheng (Fellow, IEEE) received the B.Sc. degree in applied mathematics and the M.Sc. and Ph.D. degrees in electrical engineering from Southeast University, Nanjing, China, in 1982, 1984, and 1989, respectively. Over the years, he has held various faculty/research/visiting positions at Southeast University; the Imperial College of Science, Technology and Medicine, London, U.K.; The University of Western Australia, Perth, WA, Australia; the Curtin University of Technology, Perth; the Munich University of Technology, Munich, Germany; the University of Virginia, Charlottesville, VA, USA; and the University of California at Davis, Davis, CA, USA. He is currently a University Distinguished Professor with Western Sydney University, Sydney, NSW, Australia. Dr. Zheng has served as an Associate Editor for IEEE TRANSACTIONS ON AUTOMATIC CONTROL, IEEE TRANSACTIONS ON FUZZY SYSTEMS, IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, IEEE TRANSACTIONS ON CYBERNETICS, IEEE TRANSACTIONS ON CONTROL OF NETWORK SYSTEMS, IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—I: REGULAR PAPERS, and several other flagship journals. He has been an IEEE Distinguished Lecturer of the IEEE Control Systems Society.
