Engineering Applications of Artificial Intelligence 130 (2024) 107706

journal homepage: www.elsevier.com/locate/engappai
Research paper
Anomaly detection of defect using energy of point pattern features within random finite set framework
Ammar Mansoor Kamoona a,b,∗, Amirali Khodadadian Gostar a, Xiaoying Wang a, Mark Easton a, Alireza Bab-Hadiashar a, Reza Hoseinnezhad a
a School of Engineering, RMIT University, 3000 VIC, Melbourne, Australia
b Department of Electrical Engineering, Faculty of Engineering, University of Kufa, Najaf, Iraq
ARTICLE INFO

Dataset link: https://github.com/AmmarKamoona/RFS-Energy-Anomaly-Detection-of-Defect

Keywords:
Defect detection
Anomaly detection
Random finite set
Point pattern features
Transfer learning

ABSTRACT

In this paper, we propose a lightweight approach for industrial defect detection that is modeled based on anomaly detection using point pattern data. Most recent works use global features for feature extraction to summarize image content. However, global features are not robust against lighting and viewpoint changes and do not describe images’ geometrical information which, thus, cannot be fully utilized in manufacturing industry. To the best of our knowledge, we are the first to propose using transfer learning of local/point pattern features to overcome these limitations and capture geometrical information of the interested image regions. We model these local/point pattern features as a random finite set (RFS). In addition, we propose RFS energy, in contrast to RFS likelihood as an anomaly score. The similarity distribution of point pattern features of the normal sample has been modeled as a multivariate Gaussian. Parameters learning of the proposed RFS energy does not require any heavy computation. We evaluate the proposed approach on the MVTec AD dataset, a multi-object defect detection dataset. Experimental results show the outstanding performance of our proposed approach compared to the state-of-the-art methods, and the proposed RFS energy outperforms the state-of-the-art in the few shot learning settings. Also, we evaluated the proposed approach on needle dataset, a real life application, to detect defective needle samples given three different views. The results show superior performance compared to DiffeNet.
1. Introduction

Automated visual inspection in the manufacturing process is crucial for the quality management of manufactured products. Poor quality products increase the cost of handling defects within the warranty period and damage market reputation. To avoid unnecessary losses and improve quality, automated visual inspection is a development trend in industrial intelligence (Zhang et al., 2022). In this context, defect detection is one of the critical challenges that need to be addressed, which paves the way for full industrial intelligence (Gao et al., 2021).

Regardless of the application, the early approach for defect detection was carried out manually by expert inspectors. However, its efficiency is too low to meet the needs of industrial intelligence, and recognition of defects is not stable because inspectors are prone to fatigue (Neogi et al., 2014).

With the development of vision-based sensors, defect detection based on machine vision techniques has attracted wide attention in industry and academia. These methods have been widely used to identify defects in many industrial products, such as steel (Tang et al., 2023), wood (Hacıefendioğlu et al., 2022), ceramic (Lu et al., 2022), fabric (Zhang et al., 2022), and architecture (Jiang et al., 2023).

Early approaches in defect detection using vision sensors involve employing various image processing techniques. Examples include the use of Haar filter for tile surface inspection (Liu and Ye, 2022), the use of local order binary pattern for fabric defect detection (Jing et al., 2013), and the use of SIFT features for printed circuit board (PCB) inspection (Lowe, 2004). Later, the combined use of image processing and machine learning algorithms has also shown satisfactory performance. A simple example of this is fabric defect detection via the use of the histogram of oriented gradients (HOG) features with a support vector machine (SVM) (Shumin et al., 2011).

Advancements in machine learning methods, particularly representation learning or feature learning, provide an alternative approach to overcome the drawbacks of using image processing techniques. The
∗ Correspondence to: RMIT University, Victoria, Australia.

E-mail addresses: ammar.kamoona@rmit.edu.au (A.M. Kamoona), amirali.khodadadian@rmit.edu.au (A.K. Gostar), xiaoying.wang@rmit.edu.au (X. Wang),
mark.easton@rmit.edu.au (M. Easton), alireza.bab-hadiashar@rmit.edu.au (A. Bab-Hadiashar), reza.hoseinnezhad@rmit.edu.au (R. Hoseinnezhad).
https://doi.org/10.1016/j.engappai.2023.107706
Received 17 October 2022; Received in revised form 25 October 2023; Accepted 11 December 2023
Available online 18 December 2023
0952-1976/© 2023 The Author(s). Published by Elsevier Ltd. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
significant problem with image processing techniques is that they use implicit engineering features that fail to address complex scenes. A recent study done by Kotsiopoulos et al. (2021) highlights the potential of using deep learning in smart manufacturing. In the light of defect detection, various deep learning techniques have been proposed for different surface defect detection (Dai et al., 2020; Yin et al., 2020). Yu et al. (2017) proposed to use a two-stage approach (segmentation and detection) using a fully convolutional network for surface defect inspection in industrial scenes, and Wu et al. (2017) proposed to use a CNN-based general defect detection method, in which a multiscale scheme was used, to obtain highly accurate identification. Wang et al. (2019) proposed a LEDNet network for LED chips defect detection based on encoder–decoder networks.

The effectiveness of deep representation learning, such as convolutional neural networks (CNNs), is constrained by the training samples’ availability, which raises two problems: the class imbalance between the normal and defected samples and the difficulty of data annotation. The above-mentioned problems are well-known in the literature and a subject of continuing research (Hwang et al., 2019; Oliver et al., 2018; Yoon et al., 2018). In the manufacturing environment, it is quite simple to obtain as many normal samples as required. In contrast, it is extremely difficult or even impossible to obtain a sufficient number of defective samples in a short period to train a robust classification model for defect detection. Consequently, defect detection is conducted using unsupervised-based methods such as anomaly detection or one-class classification using normal samples only.

Anomaly detection is defined as the process of identifying instances in the data that deviate from the predefined norm (Pang et al., 2021). Anomaly detection in visual data, especially images, aims to find ‘‘anomalous’’ images with irregularities. Correspondingly, image anomaly detection using computer vision techniques has various applications such as quality control (Bergmann et al., 2021) or medical images (Baur et al., 2021). The uniqueness of the anomaly detection problem compared to the supervised classification problem is twofold: (i) the lack of anomalous samples either labeled or unlabeled, which comes due to the nature of the problem, and (ii) the small difference between normal and anomalous images as shown in Fig. 5. Due to the lack of anomalous samples, the anomaly detection problem is addressed by building an anomaly detector using only normal samples.

Bergmann et al. (2021) proposed an encoder–decoder network for defect detection. In addition, to overcome the limitation of having multi-objects defect detection datasets, they proposed an MVTec-AD dataset, which includes different objects and textures with a wide variety of defects. Several works in this domain have been proposed (Bergmann et al., 2021; Rudolph et al., 2021), in which the authors concentrate on performing anomaly detection by looking at the whole image by extracting a single feature vector known as a global feature. The global feature is very effective at summarizing the content of an entire image (Cao et al., 2020). This is in stark contrast to local features which are more suitable for describing the geometrical information regarding a specific part of an image. Local features have been widely used in different computer vision applications, such as simultaneous localization and mapping (SLAM) (Tang et al., 2019) and Structure from Motion (SfM) (Schönberger et al., 2018). These features offer very memory efficient representation and, most importantly, are more robust under extreme viewpoint changes and lighting conditions. Using local features for anomaly detection offers several advantages over traditional methods that rely on global features or points. The following motivates the use of local features for anomaly detection instead of commonly used global features:

• Sensitivity to Local Anomalies: Local features allow us to capture anomalies that are specific to particular regions or segments of the data. Traditional global feature-based approaches may overlook these localized deviations, which could be crucial for detecting subtle but significant anomalies.
• Improved Precision: By focusing on local features, we can reduce false positives that may arise when anomalies in one part of the data set skew the global statistic.
• Scalability: As data sets grow in size and complexity, analyzing global features may become computationally intensive. Local feature analysis can often be more efficient and scalable, as it focuses on subsets of the data rather than the entire dataset.
• Evolving Anomaly Detection: With the increasing complexity of data and the emergence of new anomalies, local feature-based methods can be more adaptable and easier to fine-tune than global feature-based approaches.

Using handcrafted local features for anomaly detection has been addressed in the literature. The most closely related work is that of Vo et al. (2018), who model the local point pattern features (e.g., SIFT) using random finite sets via multiple instance learning. Their approach is based on modeling each local SIFT feature as an RFS sample and includes the introduction of a log-likelihood function based on the RFS density assumptions (e.g. Poisson IID clusters) as an anomaly score. The anomaly score function allows incorporating the number of extracted features (cardinality) information and feature density information into the anomaly detection solution. However, their approach uses handcrafted features and has only been applied for novelty detection, where there is a high intra-class variance between normal and abnormal samples. Fig. 1 shows a few image examples used in novelty detection performed by Vo et al. (2018), where they train the RFS approach using one of the classes (e.g., class T14) as the normal example, and in the test phase, they try to detect another class (e.g., T20) as novel.

The aim of defect detection is to detect different irregularities in an image. This implies that using local features to capture these variations is more intuitive and practical since they are more robust against illumination and viewpoint changes. To the best of our knowledge, no work has explored the use of transfer learning of local features or point pattern features for defect detection. The contribution of the paper is as follows:

• We propose using local features or point-pattern features instead of commonly used global feature-based methods and modeling these features as unordered sets within random finite set statistics.
• We propose to model the set of features using the energy-based model to explicitly represent the probability distribution function of the random finite set (RFS) features.
• We propose to learn the parameters of the RFS energy on the high dimension to avoid the decoupling problem due to dimension reduction.
• As opposed to other methods, the proposed approach is unique. The uniqueness of the approach comes from the fact that it is very computationally efficient and requires few samples to train, proven by few-shot experiments.

The rest of this paper is organized as follows: Section 2 summarizes the most current works on defect detection and point pattern feature extraction. Section 3 provides background on the energy-based model and RFS framework and how it can be employed for anomaly detection. Section 4 covers the proposed approach used in the paper. Section 5 presents experimental results and Section 6 concludes the paper.

2. Related works

In this section, we review the most recent works on defect detection using vision sensors. Most of the existing methods that treat defect detection as anomaly detection can be categorized into two approaches, generative-based models and transfer learning-based models (pre-trained networks). Note that the focus of this work is on anomaly detection rather than localizing the anomaly in the image. In addition, we review point-pattern-based feature extraction methods since this work is the first to use these features for defect detection.
Fig. 1. Examples of gray scale images used in novelty detection proposed by Vo et al. (2018). One class is used in the training phase as normal examples, while other classes are
used in the test phase as novel examples. The red circles are SIFT keypoints.
2.1. Defect detection related works

In this section, we review the most related works for defect detection.

2.1.1. Defect detection using generative models
Generative models (Goodfellow et al., 2020; Rudolph et al., 2019) are statistical models that learn the data distribution in an unsupervised manner. These models can generate samples from the training data manifold. Thus, anomaly detection occurs because anomalies cannot be generated since they do not belong to the training set. Examples of these generative models are an autoencoder (Qiu et al., 2019; Uzen et al., 2023; Üzen et al., 2022) and GAN (Goodfellow et al., 2020).

Autoencoder-based approaches encode the high dimension features to their latent variable and reconstruct it at the decoder end. Then, reconstruction error is calculated between the input and the output of the autoencoder. Thus, a high reconstruction error should indicate an anomaly. Bergmann et al. (2021) proposed to use the Structural Similarity Index (SSIM) as a loss function to train the autoencoder. The SSIM index is used as a reconstruction error to capture visual similarities among the training data. Tan et al. (2021) introduced a novel model called TrustMAE, which combines memory-augmented autoencoders with trust regions. This model incorporates a memory unit to facilitate connections between the encoder and decoder. During testing, this memory unit dynamically weighs the importance of features based on the weight parameters learned during training. As a result, TrustMAE enhances the accuracy and reliability of reconstruction modeling, improving the overall performance of the autoencoder in capturing and reproducing data features. The main problem of autoencoder-based approaches is that they can generalize too strongly; as a result, they can reconstruct anomalies as well as normal samples. Therefore, Gong et al. (2019) proposed tackling the generalization problem using a memory module to discretize the latent space. Zhai et al. (2016) propose to use a regularized autoencoder with an energy-based model for modeling the data distribution of normal samples. Samples with high energies are considered anomalies.

Dong et al. (2019) proposed a deep learning-based approach called PGA-Net for automated surface defect detection with pixel-wise accuracy. A Pyramid Feature Fusion (PFF) module is provided to fuse multilevel features from all stages of the backbone CNN into multiscale resolutions, then a Global Context Attention (GCA) module is designed to ensure efficient information transfer from low-resolution to high-resolution; in so doing, the introduction of boundary refinement and deep supervision allows the network to be optimized and convergence to be accelerated during the training process, thus obtaining the final prediction.

GAN-based anomaly detection methods are developed based on the assumption that the GAN model can only generate normal samples. Commonly, GAN models consist of two components, namely, a generator and a discriminator. Schlegl et al. (2019) propose a two-stage training approach for anomaly detection. Firstly, the GAN is trained, then an encoder is optimized as an inverse generator. The idea of using the generator as a decoder enables the calculation of reconstruction error. The reconstruction error is modeled by the difference between discriminator features of the input and reconstructed image output, which is used as an anomaly score. Akcay et al. (2018) exploit the advantage of adversarial training by making an autoencoder act as the generator of the GAN. The main goal of this method is to force the autoencoder to generate only the normal samples, which can be done by minimizing the reconstruction error between the embedding of the original and that of the reconstructed data.

2.1.2. Defect detection with pre-trained networks
These methods use the feature space of a pre-trained network to detect anomalies. Pre-trained networks use deep representation learning, particularly convolutional neural networks trained to discover multiple layers of representation in a supervised approach, such as classification tasks. The feature space of these networks is often generic enough to be transferred to dissimilar tasks and still achieve competitive results (Donahue et al., 2013). Anomaly detection using a pre-trained network is usually done using a simple machine learning approach. For example, Andrews et al. (2016) use a simple One-Class-SVM (OCSVM) on VGG (Simonyan and Zisserman, 2014) features. Nazare et al. (2018) evaluated different pre-trained networks with different normalization techniques and performed anomaly detection by applying a 1-Nearest-Neighbor classifier on the reduced PCA features. Sabokrou et al. (2018) model the normal features extracted from a pre-trained network using a unimodal distribution. Rudolph et al. (2021) propose to model the distribution of pre-trained features using a normalizing flow network in which anomaly detection is performed based on the likelihood, with a lower likelihood giving a higher anomaly score. Fan et al. (2023) use a pre-trained conditional autoencoder for visual inspection. The proposed approach shows the effectiveness of transfer learning of deep neural networks for visual inspection. However, this approach still uses global features to detect defects in high-quality images.

2.2. Point-pattern-based feature extraction

Point-pattern detection, commonly known as keypoint detection, is regarded as a local feature extraction method. Local feature extraction methods have many applications ranging from image retrieval (Teichmann et al., 2019), 3D reconstruction (Zaman et al., 2022), and camera pose estimation (Sarlin et al., 2021) to medical image applications (Busam et al., 2018). The aforementioned applications show the advantage of using sparse features over a direct network (tensor-based methods). Local feature extraction can be classified into handcrafted and learned-based methods.

Handcrafted detectors. Classical local feature detection or keypoints detection performs keypoints detection and descriptors computation independently. The localization of these keypoints is done by a hand-engineered algorithm that looks at the geometric structure in the images. SIFT (Lowe, 2004) extracts keypoints by finding blobs over multi-scale levels on images and uses a histogram of gradients as a descriptor. Wang et al. (2022) and Beaudet (1978) detectors use the first and second-order derivatives to locate corners or blobs
Fig. 2. Robustness of SuperPoint point pattern feature extraction against noise (DeTone et al., 2017).
in images. Later, multi-scale and affine-invariant variations of these detectors were proposed in Mikolajczyk and Schmid (2004) and Tuytelaars and Mikolajczyk (2008). An acceleration of the detection process via using the integral images as an approximation of the Hessian matrix has been proposed in Bay et al. (2008), which is known as SURF. A-KAZE (Alcantarilla and Solutions, 2011) keypoint detection uses a Hessian detector applied to non-linear diffusion scale space in contrast to the commonly used Gaussian pyramid. Finally, maximally stable extremal regions (MSER) (Burger and Burge, 2022) detect keypoints by segmenting the image and looking for stable regions.

Representation learning detectors. The success of representation learning in general object detection methods and feature descriptors inspires these methods. The earliest attempt to use machine learning for corner keypoints detection was made by the introduction of the Features from Accelerated Segment Test (FAST) method (Rosten and Drummond, 2006). Various works have been proposed to extend the FAST method by either optimizing it (Leutenegger et al., 2011), or adding a descriptor and orientation estimation (Rublee et al., 2011). The recent advances in convolutional neural networks (CNNs) for representation learning have also made an impact on keypoints detection. Verdie et al. (2015) introduced the temporally invariant learned detector (TILDE) that utilizes a CNN to detect keypoints that are robust under severe weather and illumination changes and is trained on multiple piece-wise linear regression models. Another approach to train a CNN for keypoints detection is by using covariant constraints (Lenc and Vedaldi, 2016). Barroso-Laguna et al. (2019) proposed KeyNet that uses handcrafted filters and learned filters together to detect keypoints. The above approaches mainly focus on keypoints detection only. DeTone et al. (2017) proposed two deep networks named MagicPoint and MagicWarp. MagicPoint extracts salient points and MagicWarp parameterizes a transformation between pairs of images. Later, a self-supervised deep joint point detector and descriptor named SuperPoint was proposed (DeTone et al., 2018). The SuperPoint network is trained on synthetic shapes (with rendered lines, cubes, stars, triangles, quadrilaterals, and checkerboards) to detect corners. SuperPoint can be utilized for detecting corners in noisy images, as shown in Fig. 2. Furthermore, feature detection and description are done in a self-supervised manner. The disadvantage of this method is that it can be used only on grayscale images; thus, a color-based defect cannot be detected. LIFT (Yi et al., 2016) proposed end-to-end learning of keypoints detection and description including orientation estimation of every feature. LF-net (Ono et al., 2018) estimates features’ scale, orientation, and position by joint learning of detector and descriptor. The LF-net trains the network by pair of images with a non-differential backpropagation for detection and the descriptor trained by triplet loss. Instead of using local maxima in detecting keypoints, R2D2 (Revaud et al., 2019) trains the network to detect reliable and repeatable keypoints that have uniform coverage of the image. R2D2 uses dilated convolution to preserve the spatial resolution. The above methods use the detect-then-describe approach in which the keypoints detection network is responsible for detecting low-level features such as corners or blobs under different scales, rotations, and viewpoints, while the descriptor network is responsible for high-level information via extracting patches around these keypoints. The problem with such an approach is the lack of repeatability of these keypoints under strong appearance changes due to the fact that the local detector considers a small image region and can be significantly affected by small changes in pixel intensities. Another possible approach is to use the describe-then-detect approach (Tian et al., 2020). D2-Net (Dusmanu et al., 2019) uses a single CNN network for joint detection and description using the describe-to-detect approach. The detection is based on the local maxima across the channels and spatial feature maps.

3. Background

This section outlines the background theory of energy-based models and random finite set theory.

3.1. Energy-based models

Energy minimization is a convenient way to represent the general process of reasoning and inference. Energy-based models (EBMs) are a family of statistical models that use an energy function $E(\mathbf{x};\theta)$ to represent a probability distribution through its unnormalized negative log probability. The density function for input $\mathbf{x}\in\mathbb{R}^{D}$ can be described as:

\[ p(\mathbf{x};\theta) = \frac{e^{-E(\mathbf{x};\theta)}}{\int_{\mathbf{x}} e^{-E(\mathbf{x};\theta)}\,d\mathbf{x}}, \tag{1} \]

where $\int_{\mathbf{x}} e^{-E(\mathbf{x};\theta)}\,d\mathbf{x}$ is the normalizing factor that ensures the density function integrates to 1, i.e. $\int p(\mathbf{x})\,d\mathbf{x}=1$. Parameter learning of EBM models can be done by assigning lower energy (hence higher probability) to the observed samples. Learning via the maximum likelihood estimator (MLE) is often not practical due to the intractability of the integral in the denominator of (1). A common way to calculate the likelihood function is to use Markov Chain Monte Carlo (MCMC) methods and approximate the integral (Zhai et al., 2016). The energy function for point patterns within a random finite set has not been explored in the literature yet.

3.2. Random finite set-based anomaly detection

Random Finite Set Statistics (FISST) is one of the stochastic geometric models, which is a well-established study area that dates back to the famous Buffon’s needle problem (Chiu et al., 2013). Random Finite Sets (RFSs) are set-valued random variables with an unknown number of elements that are themselves random. Within the scope of this paper, there is a set of keypoints and their corresponding descriptors for each particular image. Given an underlying space $\mathcal{X}$ as keypoints space, a random finite set draws instantiation from the hyperspace of all finite subsets $\mathcal{F}(\mathcal{X})$ of $\mathcal{X}$. Generally, a random finite set may contain a finite number of elements. Possible realizations of a random finite set $X\in\mathcal{F}(\mathcal{X})$ are

\[ X=\emptyset,\quad X=\{\mathbf{x}_1\},\quad X=\{\mathbf{x}_1,\mathbf{x}_2\},\quad \dots,\quad X=\{\mathbf{x}_1,\dots,\mathbf{x}_{|X|}\}, \tag{2} \]

where $\mathbf{x}_i,\mathbf{x}_j\in\mathcal{X}$ and $\mathbf{x}_i\neq\mathbf{x}_j$ for all $i\neq j$. Also, RFS refers to a random (spatial) point pattern, such as measurements displayed on a radar screen. An RFS differs from a random vector in that the number of elements in the set is random, and the elements themselves are random and unordered. In RFS-based visual anomaly detection, the measurements are modeled as an RFS. Hence, each extracted set of features $X=\{\mathbf{x}_i\}_{i=1}^{|X|}$ is treated as an RFS, where $\mathbf{x}_i\in\mathbb{R}^{D}$ is a high
dimension real continuous random variable. The density of RFS with respect to dominating measure $\nu$ is given by (Vo et al., 2018):

\[ p(X) = p(|X|)\,(|X|)!\,V^{|X|}\,p_{|X|}(\mathbf{x}_1,\dots,\mathbf{x}_{|X|}), \tag{3} \]

where $|X|$ is the cardinality of the set $X$, $p(|X|=n)=p(n)$ is the discrete cardinality distribution, $V$ is the unit hyper-volume, and $p_n(\mathbf{x}_1,\dots,\mathbf{x}_n)$ is the symmetric joint feature density for a given cardinality $|X|=n$.

In practice, different assumptions can be considered regarding the mathematical form of the RFS density $p(X)$. Examples include the Poisson RFS (Li, 2022), the Beta RFS (Kamoona et al., 2019), the Bernoulli RFS (Mahler, 2007), the multi-Bernoulli RFS (Li, 2022), and the generalized labeled multi-Bernoulli RFS (Papi et al., 2015). Most of the densities mentioned earlier are used to model a multi-object entity as an RFS. Another simplified assumption about the RFS form is an independent and identically distributed (IID) cluster RFS density as follows (Liu et al., 2023; Kamoona et al., 2019):

\[ p(X) = p(|X|)\,(|X|)!\,V^{|X|}\,[p(\cdot)]^{|X|}, \tag{4} \]

where $p(\cdot)$ is the single feature density and $[p(\cdot)]^{|X|} \triangleq \prod_{\mathbf{x}\in X} p(\mathbf{x})$ is a finite set exponential. Given that the discrete cardinality distribution $p(|X|)$ follows the Poisson distribution, the IID-cluster RFS turns into the Poisson RFS given by:

\[ p(X) = \rho^{|X|}\exp(-\rho)\,[U]^{X}\prod_{\mathbf{x}\in X} p(\mathbf{x}), \tag{5} \]

where $\rho>0$ is the Poisson intensity.

3.2.1. Parameters learning of IID-RFS cluster density
Parameter learning of the IID-RFS cluster density is similar to parameter learning of probabilistic generative models (Goodfellow et al., 2016), which is done by maximizing the log-likelihood of normal samples. Given an IID cluster RFS, the goal is to estimate the following parameters:

\[ \hat{\rho},\hat{\theta} = \operatorname*{argmax}_{\rho,\theta}\prod_{X\in\mathcal{D}} \mathcal{L}(\rho,\theta\mid X) = \operatorname*{argmax}_{\rho,\theta}\prod_{X\in\mathcal{D}} \mathcal{L}(X\mid\rho,\theta)\,p(\rho,\theta), \tag{6} \]

where $(\rho,\theta)$ are the parameters of the cardinality distribution $p_c(|X|;\rho)$ and of the multi-feature joint density $p(\mathbf{x}_1,\dots,\mathbf{x}_{|X|};\theta)$ respectively, $\mathcal{L}(X;\rho,\theta)$ is the likelihood function that measures the plausibility of $(\rho,\theta)$ given some observed set $X$, $\mathcal{D}$ is the ensemble of all training feature sets, and finally $p(\rho,\theta)$ is the prior distribution, which is assumed uniform in most applications. In light of this, Eq. (6) can be further simplified to the following:

\[ (\hat{\rho},\hat{\theta}) = \operatorname*{argmax}_{\rho,\theta}\prod_{X\in\mathcal{D}} \mathcal{L}(\rho,\theta\mid X). \tag{7} \]

Eq. (7) can be re-written for the Poisson RFS density as follows:

\[ (\hat{\rho},\hat{\theta}) = \operatorname*{argmax}_{\rho,\theta}\prod_{X\in\mathcal{D}} p(|X|;\rho)\,|X|!\,U^{|X|}\,[p(\cdot;\theta)]^{X}. \tag{8} \]

Vo et al. (2018) have proven that the aforementioned optimization can be turned into two separate optimizations, one for cardinality and one for feature density, as:

\[ \hat{\rho} = \operatorname*{argmax}_{\rho}\prod_{X\in\mathcal{D}} p(|X|;\rho), \tag{9} \]

\[ \hat{\theta} = \operatorname*{argmax}_{\theta}\prod_{X\in\mathcal{D}} [p(\cdot;\theta)]^{X}. \tag{10} \]

Substituting $p(|X|;\rho)$ with $\rho^{|X|}e^{-\rho}/|X|!$ in Eq. (9), the solution simply turns out to be the average cardinality of all training feature sets:

\[ \hat{\rho} = \frac{\sum_{X\in\mathcal{D}}|X|}{|\mathcal{D}|}. \tag{11} \]

Assuming that all point features within each feature set are IID vectors in $\mathbb{R}^{D}$ and distributed according to a Gaussian density with parameters $\theta=(\mu,\Sigma)$, Eq. (10) turns into:

\[ (\hat{\mu},\hat{\Sigma}) = \operatorname*{argmax}_{(\mu,\Sigma)}\prod_{X\in\mathcal{D}}\Big[\prod_{\mathbf{x}\in X}\mathcal{N}(\mathbf{x};\mu,\Sigma)\Big], \tag{12} \]

where $\mathcal{N}(\cdot;\mu,\Sigma)$ is the multivariate Gaussian density function with mean $\mu\in\mathbb{R}^{D}$ and covariance $\Sigma\in\mathbb{R}^{D\times D}$. The above optimization has a closed-form solution as follows:

\[ \hat{\mu} = \frac{\sum_{X\in\mathcal{D}}\sum_{\mathbf{x}\in X}\mathbf{x}}{\sum_{X\in\mathcal{D}}|X|}, \tag{13} \]

\[ \hat{\Sigma} = \frac{\sum_{X\in\mathcal{D}}\sum_{\mathbf{x}\in X}(\mathbf{x}-\hat{\mu})(\mathbf{x}-\hat{\mu})^{\top}}{\sum_{X\in\mathcal{D}}|X|}. \tag{14} \]

Eqs. (13) and (14) are used to estimate the mean and covariance of a single Gaussian component.

From Eqs. (13) and (14), it is clear that the estimated mean $\hat{\mu}$ and covariance $\hat{\Sigma}$ for the single feature density are simply the sample mean and covariance respectively. To avoid the singularity problem of the covariance estimate in the high dimension space $\hat{\Sigma}\in\mathbb{R}^{D\times D}$, Vo et al. (2018) use the PCA algorithm to reduce the dimension of the training features as a preprocessing step. However, the problem with this approach is that it suffers from decoupled model learning and the incapability of preserving essential information (An et al., 2022) due to its two-step nature: dimension reduction followed by parameter learning. We avoid this by performing parameter learning in the high dimension space as shown in Section 4.3, which does not require any training phase and can be estimated given few training samples as proven in few-shot learning experiments, see Section 5.9.

4. Proposed approach

In this paper, we propose using an energy-based RFS function as an anomaly score to detect defects, as illustrated in Fig. 3. The proposed approach consists of two parts: local feature extraction and RFS energy calculations. The proposed approach is memory and computationally efficient and does not require a heavy training computation. For feature extraction, a pre-trained CNN local features network is used to capture the geometrical information of the object. Then, a set of extracted point pattern features is modeled as a Random Finite Set (RFS). Finally, the energy of the RFS density is computed for the extracted point pattern features.

4.1. Feature extraction backbone

In anomaly detection, we are interested in detecting the irregularities in the image, and the variation of these irregularities could be very small. One approach to capture these irregularities is to extract and use global features. However, the problem with this approach is that it can be affected by a wide range of conditions, such as lighting and viewpoint changes. Alternatively, we propose using a local feature extraction method, namely D2-Net (Dusmanu et al., 2019). This network uses the describe-to-detect approach (see Fig. 4). Given an input image $I$, a D2-Net feature extractor network $\mathcal{F}$ produces two sets of output $P, X = \mathcal{F}(I)$, where $P=\{p_{i_1,j_1},\dots,p_{i_{|P|},j_{|P|}}\}$ is the set of keypoint locations, and $X=\{d_{i_1,j_1},\dots,d_{i_{|X|},j_{|X|}}\}$ is the set of their descriptor vectors, in which $d_{i,j}=F_{i,j}$, $d\in\mathbb{R}^{n}$. Here, $F$ is a 3D tensor $F\in\mathbb{R}^{h\times w\times n}$, where $h\times w$ is the spatial dimension of the feature map and $n$ is the number of channels. The 3D tensor $F$ can be seen as a set of 2D response maps $D$:

\[ D^{k} = F_{::k},\quad D^{k}\in\mathbb{R}^{h\times w}, \tag{15} \]

where $k=1,\dots,n$. The feature extraction network $\mathcal{F}$ can generate $n$ different response maps $D^{k}$. These response maps are analogous to their counterpart, the Difference-of-Gaussian response maps in SIFT. To detect the location of a keypoint and its descriptor, D2-Net defines a detection at point $(i,j)$ if it satisfies the following:

\[ (i,j)\ \text{is a detection} \iff D^{k}_{ij}\ \text{is a local max in } D^{k},\quad \text{with } k=\operatorname*{argmax}_{t} D^{t}_{ij}. \tag{16} \]
5
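The hard detection rule of Eq. (16) can be illustrated with a minimal NumPy sketch. This is not the D2-Net implementation: the function name `hard_detect`, the 3 × 3 neighbourhood, the strict-maximum tie handling, and the truncation of the neighbourhood at image borders are all assumptions made for illustration.

```python
import numpy as np

def hard_detect(F):
    """Sketch of Eq. (16) on a feature tensor F of shape (h, w, n).

    Returns keypoint locations P with shape (m, 2) and descriptors X with shape (m, n).
    """
    h, w, n = F.shape
    k = F.argmax(axis=2)      # k = argmax_t D^t_ij for every pixel
    best = F.max(axis=2)      # D^k_ij for that channel
    keypoints = []
    for i in range(h):
        for j in range(w):
            # 3x3 neighbourhood in channel k, truncated at the borders (assumption)
            i0, i1 = max(i - 1, 0), min(i + 2, h)
            j0, j1 = max(j - 1, 0), min(j + 2, w)
            patch = F[i0:i1, j0:j1, k[i, j]]
            # strict local maximum: only the pixel itself reaches its own value
            if np.sum(patch >= best[i, j]) == 1:
                keypoints.append((i, j))
    P = np.array(keypoints, dtype=int).reshape(-1, 2)
    X = F[P[:, 0], P[:, 1], :]   # descriptor d_ij = F_ij at every keypoint
    return P, X
```

Requiring a strict maximum keeps flat regions of the response map from being reported as keypoints; a production detector would typically vectorize the double loop.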
Fig. 3. Proposed energy-based RFS of point pattern features for anomaly detection.
Fig. 4. D2-Net feature extraction pipeline.
The keypoint at location $(i, j)$ is detected if and only if $(i, j)$ is a local maximum in channel $k$, where $k$ is the channel with the maximum value at location $(i, j)$ among all channels of the tensor $F$. This method is called the hard detection module, and it cannot be used to train the model with back-propagation. Thus, during the training phase, a soft detection module is used to detect keypoints. First, a soft local max is defined as follows (Dusmanu et al., 2019):

$$\alpha^k_{i,j} = \frac{\exp(D^k_{i,j})}{\sum_{(\hat{i},\hat{j}) \in \mathcal{N}(i,j)} \exp(D^k_{\hat{i},\hat{j}})}, \tag{17}$$

where $\mathcal{N}(i, j)$ is the set of 9 neighbors of pixel $(i, j)$. Second, soft channel selection is calculated as a ratio-to-max per descriptor, which emulates channel-wise non-maximum suppression:

$$\beta^k_{i,j} = \frac{D^k_{i,j}}{\max_t D^t_{i,j}}. \tag{18}$$

Then, a single score map is obtained by maximizing the product of both scores over all feature maps $k$:

$$\gamma_{i,j} = \max_k \left( \alpha^k_{i,j}\, \beta^k_{i,j} \right). \tag{19}$$

Finally, the soft detection score $s_{i,j}$ at location $(i, j)$ is obtained by image-level normalization:

$$s_{i,j} = \frac{\gamma_{i,j}}{\sum_{(\hat{i},\hat{j})} \gamma_{\hat{i},\hat{j}}}. \tag{20}$$

4.2. Point pattern RFS energy

The pre-trained feature extraction network returns a set of local point pattern features that can be modeled as an RFS. For RFS anomaly detection, the general approach for modeling the likelihood of a new measurement $X^*$ is via the RFS log-likelihood:

$$\mathcal{L}(X^*; \rho, \theta) = \log(p(|X^*|; \rho)) + \log(|X^*|!) + \log \prod_{\mathbf{x} \in X^*} p(\mathbf{x}; \theta), \tag{21}$$

where $\mathcal{L}(X^*; \rho, \theta)$ is the RFS log-likelihood, and $\rho$ and $\theta$ are the cardinality and joint feature density parameters, respectively. With a Poisson assumption for the cardinality and a multivariate Gaussian distribution $\mathcal{N}(\mathbf{x}; \mu, \Sigma)$ for the single-feature density, the RFS log-likelihood becomes:

$$\mathcal{L}(X^*; \rho, \mu, \Sigma) = \log\left( \frac{\rho^{|X^*|} \exp(-\rho)}{|X^*|!} \right) + \log(|X^*|!) + \log \prod_{\mathbf{x} \in X^*} \mathcal{N}(\mathbf{x}; \mu, \Sigma). \tag{22}$$

Due to unit inconsistency, the above log-likelihood cannot be used for ranking the point pattern features. To overcome this, Vo et al. (2018) proposed the RFS ranking function $r(X)$:

$$r(X) = p(|X|) \left[ \frac{p(\cdot)}{\|p(\cdot)\|_2^2} \right]^{X}, \tag{23}$$

where $\|p(\cdot)\|_2^2$ is the squared $L_2$-norm of $p(\cdot)$. This ranking function is based on the assumption that $r(X) \propto p(|X|)\, p(X)$. Calculating the $L_2$ norm in a high dimension is not a feasible
Fig. 5. Samples from the MVTec-AD dataset: defect-free samples (black box) and defective samples (red box).
solution due to the singularity problem. Also, the RFS ranking function cannot capture the small variation caused by defects because of the $L_2$ normalization.

Inspired by the recent success of energy models for anomaly detection (Zhai et al., 2016; Grathwohl et al., 2020; Liu et al., 2020), we propose a novel RFS energy function as an efficient approach for defect detection. The proposed RFS energy function for an IID RFS is based on the assumption that the RFS energy $E(X) \propto E(p(|X|)) \times E(p(X))$ is proportional to both the cardinality energy and the single-feature energy, as follows:

$$E(X; \rho, \mu, \Sigma) = -\log(\rho^{|X|}) + \log(|X|!) + \sum_{\mathbf{x} \in X} M(\mathbf{x}; \mu, \Sigma)^2, \tag{24}$$

where $E(\cdot)$ is the Poisson RFS energy and $M(\cdot)^2$ is the squared Mahalanobis distance:

$$M(\mathbf{x}; \mu, \Sigma)^2 = (\mathbf{x} - \mu)\, \Sigma^{-1} (\mathbf{x} - \mu)^T. \tag{25}$$

The Mahalanobis distance is a point-to-distribution distance introduced in 1936 (Mahalanobis, 1936), which is well known for modeling a sample's uncertainty. Furthermore, recent works show the effectiveness of using the Mahalanobis distance to model the pre-trained features of normal samples (Christiansen et al., 2016; Rippel et al., 2021). However, to the best of our knowledge, no work has used the sum of squared Mahalanobis distances to model the local point pattern features of normal samples within the RFS framework.

4.3. Parameters learning of RFS energy function

To learn the parameters of the RFS energy without encountering the problem of decoupled learning (An et al., 2022), parameter learning is performed in the high-dimensional space without further processing such as PCA feature reduction. In addition, inspired by the recent success of pre-trained models for anomaly detection (Lee et al., 2018; Rippel et al., 2021), this work uses a pre-trained network for extracting local features.

The IID cluster Poisson RFS energy (Eq. (24)) has three parameters to learn: the Poisson intensity $\rho$, and the mean $\mu$ and covariance $\Sigma$ of the multivariate Gaussian density. Following the approach described in Section 3.2.1, parameter learning is performed by treating the learning of the cardinality and of the feature density as two separate optimization problems; see Eqs. (11), (13) and (14).

The main difference from Section 3.2.1 is that parameter learning of the feature density is done in the high-dimensional space, $\mu \in \mathbb{R}^D$, $\Sigma \in \mathbb{R}^{D \times D}$. Since the true distribution of features is unknown, the mean and covariance need to be learned. However, covariance estimation using Eq. (14) requires the number of training feature samples $X = \mathbf{x}_1, \ldots, \mathbf{x}_N$, with $|X| = N$, to be significantly larger than the feature dimension, $N \gg D$. Accordingly, when there are few normal samples, as in the MVTec-AD dataset and in medical applications where it is expensive to obtain a large number of normal samples (a setting technically known as few-shot learning), the estimation becomes unstable and leads to the singularity problem. To overcome this problem, we use shrinkage covariance estimation (Ledoit and Wolf, 2004):

$$\hat{\Sigma}_{shrunk} = (1 - \alpha)\, \hat{\Sigma} + \alpha\, \frac{tr(\hat{\Sigma})}{D}\, I, \tag{26}$$

where $\hat{\Sigma}$ is the empirically estimated covariance, $\alpha$ is the shrinkage intensity, and $I$ is the identity matrix. Shrinkage estimation is a linear combination of the empirically estimated covariance $\hat{\Sigma}$ and the scaled identity matrix $I$, and $\alpha$ regulates the influence of each on the final matrix. Ledoit and Wolf (2004) derived a closed-form solution for the optimal $\alpha$ by minimizing the expected squared error $E[\|\hat{\Sigma}_{shrunk} - \Sigma\|^2]$, given the unstable covariance estimate $\hat{\Sigma}$.

4.4. Choosing the $top\ k$ squared Mahalanobis distances

In the defect detection problem, we are looking for small irregularities in the RFS energy caused by a defect in a small region of the image. Because the RFS energy sums over all features, the small variation caused by a defect becomes almost negligible in the total. This variation is reflected in the Mahalanobis distance from the normal feature distribution: the point pattern descriptors of a defect region are expected to have larger Mahalanobis distances than those of a normal region. For this reason, we propose to use only the features that are farthest from the normal distribution in our calculation. We implement this by ranking the squared Mahalanobis distances and keeping only the $top\ k$ percent in the RFS energy calculation.
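The parameter learning of Section 4.3 and the top-k energy of Section 4.4 can be sketched in a few lines of NumPy. This is an illustrative sketch, not the released code: the function names are hypothetical, the mean and covariance are taken to be the pooled sample mean and covariance (assumed here to be what Eqs. (13) and (14) give), and the shrinkage intensity `alpha` is passed in as a fixed value rather than computed with the Ledoit and Wolf closed form.

```python
import math
import numpy as np

def learn_parameters(feature_sets, alpha=0.1):
    """Closed-form estimates (rho, mu, Sigma_shrunk) from normal feature sets.

    feature_sets: list of arrays, each of shape (|X|, D).
    alpha: shrinkage intensity (assumption: supplied by the caller).
    """
    rho = float(np.mean([len(X) for X in feature_sets]))  # Eq. (11): mean cardinality
    pooled = np.vstack(feature_sets)                       # all descriptors, shape (N, D)
    mu = pooled.mean(axis=0)
    centered = pooled - mu
    sigma = centered.T @ centered / len(pooled)            # empirical covariance
    d = sigma.shape[0]
    sigma_shrunk = (1 - alpha) * sigma + alpha * np.trace(sigma) / d * np.eye(d)  # Eq. (26)
    return rho, mu, sigma_shrunk

def rfs_energy(X, rho, mu, sigma, top_k_percent=100.0):
    """Poisson RFS energy of Eq. (24), summing only the top-k percent of distances."""
    diff = X - mu
    # Squared Mahalanobis distance of every descriptor, Eq. (25)
    m2 = np.sum(diff * np.linalg.solve(sigma, diff.T).T, axis=1)
    keep = max(1, math.ceil(len(m2) * top_k_percent / 100.0))
    top = np.sort(m2)[::-1][:keep]                         # largest distances first
    n = len(X)
    # -log(rho^n) + log(n!) + sum of the retained squared distances
    return -n * math.log(rho) + math.lgamma(n + 1) + float(top.sum())
```

The cardinality terms use the full set size $|X|$, while only the Mahalanobis sum is restricted to the retained descriptors, matching the description in Section 4.4. Using `np.linalg.solve` avoids forming $\Sigma^{-1}$ explicitly, which is a numerical choice of this sketch.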
5. Experiments
5.1. Dataset
5.1.1. MVTec-AD dataset

MVTec-AD (Bergmann et al., 2021) is a real-world industrial image
dataset that has been developed as a comprehensive and challeng-
ing benchmark for defect detection. The dataset has a large-scale
collection of texture and object images. In general, it contains 5354
high-resolution color images of 15 different categories (ten objects and
five textures). Each object has normal samples and defective samples.
The training set contains only normal samples, and the test set contains
both normal and defective samples. Seventy different types of defects are present in this dataset; they differ in size, shape, and structure, and include scratches, dents, contamination, and various structural deformations. Fig. 5 shows a set of these samples. The first row shows image examples of defect-free samples, while the second row shows image examples of defective samples.

5.1.2. Needle dataset

The manufacture of medical In Vitro Fertilization (IVF) needles is subject to multidimensional variation of their tip geometry. Even with a low defect rate, the advanced manufacturing of IVF needles still requires a 100% inspection of every unit to remove all defects from production. This is to ensure stringent quality standards in safety-critical applications. The dataset covers images of N5521_RS1222 type IVF needle tips with echo tipping position defects (mainly; tip damage is also shown in some defective samples) and images of specification-conforming IVF needle tips. Images were taken by a vision system consisting of a Blackfly USB3 color camera, a TechSpec PlatinumTL 0.9X telecentric lens, a CCS top dome white light, and an LED-created white backlight. Three images, from the top, side and back perspectives, were taken for each needle tip.

Dimples (also called echo tipping) are ground on needle cannula walls for locating needle positions using ultrasound, to ensure that the needle tip is within the target. For N5521_RS1222 type IVF needles, dimples of good needle samples start at the heel of the bevel +2/−1 mm (the bevel can cut into echo tip dimples up to 1 mm), as shown in Fig. 6. Bevels which cut into the echo tip dimples by more than 1 mm (or where the heel is more than 2 mm away) are considered defective. The needle tips in the images were visually examined at a pre-determined scale (replicating that of a 15_Optical Microscope) to determine whether there was any damage to the tip (or at least any damage visible at this pre-determined scale). The acceptance criteria used to qualify the needles were based on the manufacturer's Standard Operating Procedure. There are 249 images in the dataset in total, which have been manually classified into 'Pass' and 'Fail' categories by a subject matter expert. Specifically, there are 84 images of 28 good samples (where dimples start at heel of bevel 2 mm in this group), 45 images of 15 defective samples of standard needle size, and 120 images of 40 defective samples of smaller needle size (see Figs. 7 and 8).

Fig. 6. Distance (d) between the start of dimples and the heel of the needle bevel.

5.2. Implementation details

In all experiments, we use a pre-trained CNN-based network for point pattern feature extraction. We use a D2-Net network for feature extraction, and we have also examined other networks; more details are given in the ablation study. D2-Net uses VGG-16 as its backbone feature extractor. We use a pre-trained D2-Net which was trained on the MegaDepth dataset (Li and Snavely, 2018). In our experiments, we use the last layer of the D2-Net network, which is conv-4, followed by a hard detection module. After the hard detection module, the network generates a set of keypoints and their descriptors; each descriptor is 512-D. Before passing the image to D2-Net, we resize all images to 256 × 256 and crop at the center, which yields an image size of 224 × 224, except for the needle dataset, where no cropping is involved. In addition, we use the multi-scale option of D2-Net in the feature extraction phase, which rescales the image to three scales (0.5, 1, and 2). Our model does not require any training phase, and parameter learning of the multivariate RFS energy has a closed form for the mean of the RFS features and the cardinality of the feature sets. The only computation performed is in estimating the covariance of the feature density using the shrinkage method (Ledoit and Wolf, 2004).

5.3. Evaluation protocol

We follow the same protocol defined in Rudolph et al. (2021), Akcay et al. (2018), and Rippel et al. (2021) to fully assess the capability of the proposed approach by reporting the Area Under the Receiver Operating Characteristic curve (AUC). The AUC measures the area under the true positive rate as a function of the false positive rate. The advantage of using AUC is that it is not sensitive to any threshold or to the percentage of anomalies present in the test set.

5.4. MVTec-AD experimental results and discussion

In this section, we explore the performance of the proposed approach against different deep-learning models that use global features for defect detection. We evaluate the proposed approach on the MVTec AD dataset and report the AUC. We compare the proposed approach with two transfer learning-based models: a One-Class SVM (OCSVM) (Andrews et al., 2016) trained on transferred CNN global features, and the distance to the nearest neighbor (1-NN) of the CNN features with z-score normalization (Nazare et al., 2018). We also compare our model with non-transfer learning methods such as GeoTrans (Golan and El-Yaniv, 2018). GeoTrans calculates the anomaly score based on the classification of applied geometrical transformations, which alleviates the need for a generative model. GANomaly (Akcay et al., 2018) uses adversarial learning (GAN) by exploiting the autoencoder as the generator of the GAN, which forces the decoder to generate only normal samples. DSEBM (Zhai et al., 2016) uses an energy model for anomaly detection of defects. Finally, we compare our approach with the recent normalizing flow approach known as DifferNet (Rudolph et al., 2021).

Table 1 shows the results for the MVTec AD dataset. Despite the fact that the proposed RFS energy-based approach does not require any training, it outperforms most existing models that use global features. On average, it is only 0.2 below the current state-of-the-art DifferNet (Rudolph et al., 2021). However, our approach outperforms all methods for 11 out of 15 objects and achieves
Fig. 7. Samples of defect free Needles. Columns reflect different views of the same needle, as view-1, view-2, and view-3, respectively.
Algorithm 1 RFS energy-based defect detection.

procedure Defect_Detection($I^*$, $\rho$, $\mu$, $\Sigma$, $top\ k$, $E_{th}$)  ⊳ $I^*$: input image; $\rho, \mu, \Sigma$: RFS energy parameters; $top\ k$: percentage of largest squared distances kept; $E_{th}$: energy threshold.
    if needles dataset then
        $I^* = \{I^*_{top}, I^*_{side}, I^*_{back}\}$  ⊳ $I^*$: needle images of different views
    end if
    $E_s \leftarrow 0$  ⊳ $E_s$: RFS energy of the entire sample
    for $I \in I^*$ do
        $(pts, X^*) \leftarrow$ Keypoint_Detector($I$)  ⊳ A set of keypoints $pts$ and their descriptors $X^*$ are extracted from $I$.
        $E \leftarrow$ RFS_Energy($X^*$; $\rho$, $\mu$, $\Sigma$, $top\ k$)  ⊳ RFS energy is calculated for each view.
        $E_s \leftarrow E_s + E$  ⊳ Sum the RFS energy of different views.
    end for
    $\eta \leftarrow 0$  ⊳ Sample is normal by default.
    if $E_s > E_{th}$ then $\eta \leftarrow 1$  ⊳ Defect sample detected.
    end if
    return $\eta$
end procedure

function RFS_Energy($X$; $\rho$, $\mu$, $\Sigma$, $top\ k$)
    $\mathcal{M} \leftarrow \emptyset$
    for $x \in X$ do
        $\mathcal{M} \leftarrow \mathcal{M} \cup \{M(x; \mu, \Sigma)^2\}$  ⊳ Squared Mahalanobis distance terms are computed.
    end for
    $M \leftarrow$ sum of the $top\ k$ ranked elements of $\mathcal{M}$  ⊳ Choose only the $top\ k$ squared distances.
    $E \leftarrow -\log(\rho^{|X|}) + \log(|X|!) + M$  ⊳ RFS energy is calculated from (24).
    return $E$
end function
the second-best performance for 3 of the remaining 4. The lowest performance of the proposed approach is on the screw object, where it achieves an AUC of only 70; we argue that the main reason for this is the feature extraction backbone. Better performance has been observed for this object using other point pattern feature extraction backbones; see the ablation study. The ROC curves of our proposed approach for 15
Fig. 8. Samples of defective Needles, defective samples of standard needle size in red box and green box shows defective samples of smaller needle size.
categories are shown in Figs. 9, 10, and 11. We can observe a high
true positive rate of our proposed approach for most objects. The proposed approach offers two notable advantages. First, it is memory efficient, as shown in Section 5.6. Second, it requires minimal training, as corroborated by the few-shot learning results in Section 5.9. However, the model also has limitations.
Specifically, its reliance on point pattern feature extraction networks
makes it susceptible to any shortcomings or inaccuracies within this
component. As a result, any failure in the feature extraction process
can ultimately lead to a failure in the overall defect detection process.
The source code of the paper is available at.1
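The AUC values reported in this section summarize how well an anomaly score separates defective from normal samples without fixing a threshold. A minimal sketch of one way to compute it from per-sample scores is given below; it uses the rank (Mann-Whitney) formulation of the area under the ROC curve and is not taken from the paper's code. The function name `roc_auc` and the label convention (1 for defective, 0 for normal) are assumptions.

```python
import numpy as np

def roc_auc(scores, labels):
    """Area under the ROC curve from anomaly scores.

    Equals the probability that a defective sample receives a higher
    score than a normal sample, with ties counted as one half.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    greater = (pos[:, None] > neg[None, :]).sum()   # correctly ordered pairs
    ties = (pos[:, None] == neg[None, :]).sum()     # tied pairs
    return (greater + 0.5 * ties) / (len(pos) * len(neg))
```

Because only the ordering of the scores matters, any monotonically increasing rescaling of the RFS energy leaves this value unchanged, which is why no decision threshold is needed for the comparison.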
5.5. Needle experimental results and discussion
In this section, the performance of our proposed RFS energy on the Needle dataset is evaluated. Each sample of the needle dataset has three different views: top, side, and back. All views of a defect-free needle are defect free. However, for a defective sample, more than one view might be defective. Besides this, the proposed method only requires a 'Pass' or 'Fail' annotation for needle samples, without further requirement for

Fig. 9. ROC curve for bottle, cable, capsule, hazelnut, and metalnut using D2-Net features.

1 https://github.com/AmmarKamoona/RFS-Energy-Anomaly-Detection-of-Defect
Table 1
AUC in % for detected anomalies of all categories of MVTec AD, grouped into objects and textures. The best results are in bold, second best are underlined.
Category     GeoTrans(a)  GANomaly(b)  DSEBM(c)  OCSVM(d)  1-NN(e)  DifferNet(f)  RFS energy (Ours)
Objects
Bottle       74.4         89.2         81.8      99.0      98.7     99.0          100
Cable        78.3         75.7         68.5      80.3      88.5     95.9          92.0
Capsule      67.0         73.2         59.4      54.4      71.1     86.9          89.4
Hazelnut     35.9         78.5         76.2      91.1      97.9     99.3          99.9
Metal Nut    81.3         70.0         67.9      61.1      76.7     96.1          98.2
Pill         63.0         74.3         80.6      72.9      83.7     88.8          94.5
Screw        50.0         74.6         99.9      74.7      67.0     96.3          70.0
Toothbrush   97.2         65.3         78.1      61.9      91.9     98.6          99.2
Transistor   86.9         79.2         74.1      56.7      75.6     91.1          91.9
Zipper       82.0         74.5         58.4      51.7      88.6     95.1          98.7
Textures
Carpet       43.7         69.9         41.3      62.7      81.1     92.9          98.4
Grid         61.9         70.8         71.7      41.0      55.7     84.0          89.64
Leather      84.1         84.2         41.6      88.0      90.3     97.1          100
Tile         41.7         79.4         69.0      87.6      96.9     99.4          96.9
Wood         61.1         83.4         95.2      95.3      93.4     99.8          98.1
Average      67.2         76.2         70.9      71.9      83.9     94.7          94.5
Median       –            –            –         –         –        96.1          98.1
a Golan and El-Yaniv (2018).
b Akcay et al. (2018).
c Zhai et al. (2016).
d Andrews et al. (2016).
e Andrews et al. (2016).
f Rudolph et al. (2021).
Table 2
AUC in % for detected anomalies of Needle dataset.
Method AUC
DifferNet (Rudolph et al., 2021) 84.59
Our 95.99
which image view shows the defect in the test set. For this reason, in the evaluation phase, we propose to use the sum of the RFS energy of all views as the sample RFS energy, as shown in Algorithm 1. For the sake of comparison, we have re-run DifferNet (Rudolph et al., 2021) on the Needle dataset and changed its evaluation criterion to match our proposed approach by summing the anomaly scores of all views into a sample anomaly score. We trained both algorithms using normal needle samples only. The same training procedure as in Rudolph et al. (2021) is used. Table 2 shows that our proposed approach outperforms DifferNet on the Needle dataset by an AUC margin of 11.4.

Fig. 10. ROC curve for pill, screw, toothbrush, transistor, and zipper using D2-Net features.

Fig. 11. ROC curve for carpet, grid, leather, tile, and wood using D2-Net features.

5.6. Memory efficiency

In this section, we show the memory advantage of the proposed approach, which is very memory efficient compared to other methods. Model size is crucial for a machine learning model to be deployed on a mobile system. Running a model on a mobile system has many advantages, such as better privacy, real-time processing, and less network bandwidth, which means less power consumption (Cai et al., 2022). From this point of view, we have compared our model to the current best global feature-based model (DifferNet; Rudolph et al., 2021). For a fair comparison, we have included the pre-trained network (D2-Net) weights in our model size. Table 3 shows the comparison between our proposed model and DifferNet. Our model is very lightweight
Table 3
Model weights in MB comparison.
Method Model weight in MB
DifferNet (Rudolph et al., 2021) 890.8
Our 31.76 ± 0.56
Table 4
AUC in % for MVTec AD: comparison between using RFS energy, the sum of Mahalanobis distances $AS$, and RFS log-likelihood as anomaly scores.
Category     RFS energy  AS     RFS log-likelihood
Bottle       100.0       100.0  0.1
Cable        92.0        86.2   30.8
Capsule      89.4        81.4   36.7
Hazelnut     99.9        77.5   3.3
Metalnut     98.2        74.1   17.4
Pill         94.5        85.9   48.8
Screw        70.0        4.6    0.6
Toothbrush   99.2        88.6   16.7
Transistor   91.9        80.8   9.3
Zipper       98.7        90.7   28.8
Carpet       98.4        87.6   1.9
Grid         89.6        58.7   40.5
Leather      100.0       100.0  0.0
Tile         96.9        89.5   8.0
Wood         98.1        93.2   4.3
Average      94.5        79.9   16.5

compared to DifferNet: our model weighs 96% less than the DifferNet model.

5.7. Study analysis

In this section, we further investigate the performance of our RFS energy compared to the following: the use of the RFS log-likelihood as anomaly score (Vo et al., 2018), as shown in Eq. (21), and the use of the sum of Mahalanobis distances only as anomaly score, as follows:

$$AS = \sum_{\mathbf{x} \in X} M(\mathbf{x}; \mu, \Sigma), \tag{27}$$

where $AS$ is the anomaly score of a set of point pattern features. We compare these anomaly scores with the proposed RFS energy, using the D2-Net features and the same learned parameters, $\mu \in \mathbb{R}^D$, $\Sigma \in \mathbb{R}^{D \times D}$. We report the AUC for the MVTec AD dataset in Table 4. Table 4 shows that the proposed RFS energy achieves the best (or tied best) AUC in all categories compared to using $AS$ or the RFS log-likelihood as an anomaly score, which highlights the significance of using RFS energy as an anomaly score. The RFS energy-based method is successful because it can detect even small variations in images. The reason for this success lies in the fact that our RFS energy combines two terms: the energy of the cardinality of the features and the energy of the features, i.e. the squared Mahalanobis distances.

5.8. Ablation study

In this section, we study the effect of different point pattern feature extraction methods on the performance of the proposed RFS energy-based defect detection. All the deep point pattern feature networks used in this experiment are pre-trained. The focus of this ablation study is on transfer learning of deep point pattern features. However, we also include one handcrafted feature extraction method, namely ORB (Rublee et al., 2011). General statistics of these methods/networks are shown in Table 5. Table 5 lists the name of the feature extraction method/network, the datasets that were used to train these models, the type of the input image (whether the network detects keypoints using color information or not), and the size of the output descriptor of the network. Table 6 shows the results of different point pattern feature extraction methods using our RFS energy-based model.

Fig. 12. AUC in % performance on the MVTec AD dataset for One Shot, Five Shot, and Ten Shot settings.

The results show that handcrafted ORB features perform very poorly compared to the transfer learning-based features. On the other hand, Superpoint features perform well but still do not match D2-Net features; one reason is that this network extracts corner points and thus does not take into account corners that result from color deformation. Another important point is that R2D2 achieves the best performance for the screw object compared with the others. Finally, we achieve the best mean performance when we take the best-performing features for each category using our RFS energy-based model.

5.9. Few-shots learning experimental results

In this section, we demonstrate the effectiveness of our proposed approach via few-shot learning. Few-shot learning studies the use of limited supervision for classification tasks and has been widely studied (Xie et al., 2022; Wang et al., 2020). Other work considers a limited number of samples from anomalous classes (Sheynin et al., 2021). The most closely related work, by Sheynin et al. (2021), proposes to use a limited number of normal samples for anomaly detection. We compare our approach with the baseline methods in Sheynin et al. (2021). The comparison includes DifferNet (Rudolph et al., 2021), GeoTrans (Golan and El-Yaniv, 2018), and GOAD (Bergman and Hoshen, 2019), a modified version of GeoTrans that modifies the anomaly score. DeepSVDD (Ruff et al., 2018) uses a deep version of the SVDD algorithm on pre-trained deep features. PatchSVDD (Defard et al., 2021) uses self-supervised learning and extends DeepSVDD to a patch-based method. The DROCC (Goyal et al., 2020) approach trains a classifier to distinguish between normal samples and their adversarially generated perturbed versions. Finally, we compare our method with HTDGM (Sheynin et al., 2021), which uses self-supervised learning of the multi-scale hierarchical transformation of the generative model. Experiments for the MVTec AD dataset are shown in Fig. 12. We follow the same evaluation protocol and use the same D2-Net for feature extraction. Fig. 12 shows that the proposed approach outperforms all other methods in all few-shot settings. Fig. 13 shows the mean AUC of our proposed model for the MVTec AD dataset over different numbers of shots. More details about the AUC for each category over the 16 shots are shown in Figs. 14 and 15.

6. Conclusion

This paper proposes the use of transfer learning of local features as an efficient alternative to global features for defect detection. The use of deep local features has been limited to SLAM or SFM applications and has not been used for anomaly/defect detection in the literature. The pre-trained local features network returns a set of point pattern/keypoints with their descriptors, and these features are
Table 5
General statistics of point pattern feature extraction methods used in the ablation study.
Feature extraction Training dataset Input image Descriptor size
ORB – gray 256-D
LF-Net (Ono et al., 2018)a ScanNet gray 256-D
KeyNetb synthetic training set from ImageNet & ILSVRC 2012 dataset gray 128-D
Superpoint (SP)c synthetic shapes & COCO 2014 gray 256-D
R2D2d Oxford and Paris retrieval dataset color 128-D
a https://github.com/vcg-uvic/lf-net-release
b https://github.com/axelBarroso/Key.Net
c https://github.com/rpautrat/SuperPoint
d https://github.com/naver/r2d2
Table 6
AUC in % for detected anomalies of all categories of MVTec AD using different point pattern feature extraction methods of our proposed RFS
energy. The last two columns show the comparison between the best performance of our RFS energy and DifferNet.
Category ORB R2D2 LF-Net KeyNet SP D2-Net Best performance (Max) DifferNet (Rudolph et al., 2021)
Bottle 63.9 50.0 93.7 94.9 99.4 100.0 100.0 99.0
Cable 44.1 42.0 82.8 63.0 83.2 92.0 92.0 95.9
Capsule 47.3 70.4 58.7 66.7 74.0 89.4 89.4 86.9
Hazelnut 83.2 79.9 86.0 85.3 94.9 99.9 99.9 99.3
Metalnut 30.3 0.0 54.1 77.5 71.2 98.2 98.2 96.1
Pill 70.0 69.1 75.7 78.5 75.7 94.5 94.5 88.8
Screw 59.1 79.0 60.9 78.5 65.4 70.0 79.0 96.3
Toothbrush 59.2 65.6 94.2 91.1 94.2 99.2 99.2 98.6
Transistor 52.0 50.5 77.3 59.0 79.7 91.9 91.9 91.1
Zipper 69.3 88.3 51.2 98.0 85.5 98.7 98.7 95.1
Carpet 36.4 62.4 69.1 48.0 90.4 98.4 98.4 92.9
Grid 35.1 57.6 56.6 38.5 65.5 89.6 89.6 84.0
Leather 47.4 63.5 66.9 70.7 99.7 100.0 100.0 97.1
Tile 47.8 81.4 63.9 88.7 82.2 96.9 96.9 99.4
Wood 92.5 88.7 97.9 78.4 94.4 98.1 98.1 99.8
Average 55.8 57.3 72.6 72.9 83.7 94.5 95.1 94.7
Fig. 13. The effect of increasing the number of samples on the average AUC in % of our proposed approach for the MVTec AD dataset.

Fig. 14. The effect of increasing the number of samples on AUC in % of our proposed approach for 10 objects of MVTec AD dataset.

well-known to be robust against viewpoint and lighting changes. We propose to use transfer learning of deep local feature detection/description, to model these descriptors as random finite sets, and to model the similarity of the normal distribution based on RFS statistics. We also propose to use RFS energy instead of RFS log-likelihood as an anomaly score. The proposed IID Poisson RFS energy includes learning the Mahalanobis distance parameters and the Poisson intensity of the cardinality distribution. To avoid further preprocessing of features and loss of information due to PCA, we choose to learn the parameters of the Mahalanobis distance in the high-dimensional space. Experimental results on the MVTec AD dataset and the needle dataset, both challenging surface defect detection datasets, show that our proposed approach has outstanding performance compared to the state-of-the-art approaches. The proposed approach outperforms other methods in 11 out of 15 objects/textures on the MVTec AD dataset. By using the different local feature extractors shown in the ablation study, we can outperform the state of the art on the MVTec AD dataset. Another experiment has been conducted on MVTec AD using a few-shot learning setting, in which only a few samples are used in the training phase. In the few-shot setting, the RFS energy approach outperforms the state-of-the-art methods in 1, 5, and 10 shots. Finally, the proposed approach shows outstanding performance on the needle dataset, a real-life application, where we proposed to use the sum of the RFS energy over views to detect defective needle samples. Fine-tuning the D2-Net network on the MVTec dataset can be considered as future work, which would involve generating ground truth data for images with various poses. Additionally, joint learning of RFS energy and feature detection/description can be explored as future work to enhance defect detection and anomaly detection.
Fig. 15. Effect of increasing the number of samples on the AUC (in %) of our proposed approach for 5 textures of the MVTec AD dataset.

CRediT authorship contribution statement
Ammar Mansoor Kamoona: Conceptualization, Methodology, Software, Investigation, Visualization, Writing – original draft. Amirali Khodadadian Gostar: Writing – review & editing, Supervision. Xiaoying Wang: Visualization, Review & editing. Mark Easton: Review & editing. Alireza Bab-Hadiashar: Writing – review & editing, Supervision. Reza Hoseinnezhad: Conceptualization, Visualization, Writing – review & editing, Supervision, Funding acquisition.

Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability
https://github.com/AmmarKamoona/RFS-Energy-Anomaly-Detection-of-Defect.

Acknowledgments
This work was supported by the Australian Research Council through Grant DE210101181.
Grathwohl, W., Wang, K.C., Jacobsen, J.H., Duvenaud, D., Norouzi, M., Swersky, K.,
References 2020. Your classifier is secretly an energy based model and you should treat it like
one.
Akcay, S., Atapour-Abarghouei, A., Breckon, T.P., 2018. Ganomaly: Semi-supervised Hacıefendioğlu, K., Ayas, S., Başağa, H.B., Toğan, V., Mostofi, F., Can, A., 2022. Wood
anomaly detection via adversarial training. In: Asian Conference on Computer construction damage detection and localization using deep convolutional neural
Vision. Springer, pp. 622–637. network with transfer learning. Eur. J. Wood Wood Prod. 80 (4), 791–804.
Alcantarilla, P.F., Solutions, T., 2011. Fast explicit diffusion for accelerated features in Hwang, U., Jung, D., Yoon, S., 2019. HexaGAN: Generative adversarial nets for real
nonlinear scale spaces. IEEE Trans. Pattern Anal. Mach. Intell. 34 (7), 1281–1298. world classification. In: International Conference on Machine Learning. PMLR, pp.
An, P., Wang, Z., Zhang, C., 2022. Ensemble unsupervised autoencoders and Gaussian 2921–2930.
mixture model for cyberattack detection. Inf. Process. Manage. 59 (2), 102844. Jiang, S., Wu, Y., Zhang, J., 2023. Bridge coating inspection based on two-stage
Andrews, J., Tanay, T., Morton, E., Griffin, L., 2016. Transfer Representation-Learning automatic method and collision-tolerant unmanned aerial system. Autom. Constr.
for Anomaly Detection. JMLR. 146, 104685.
Barroso-Laguna, A., Riba, E., Ponsa, D., Mikolajczyk, K., 2019. Key.Net: Keypoint Jing, J., Zhang, H., Wang, J., Li, P., Jia, J., 2013. Fabric defect detection using gabor
detection by handcrafted and learned CNN filters. In: Proceedings of the IEEE/CVF filters and defect classification based on LBP and tamura method. J. Text. Inst. 104
International Conference on Computer Vision. pp. 5836–5844. (1), 18–27.
Baur, C., Denner, S., Wiestler, B., Navab, N., Albarqouni, S., 2021. Autoencoders for Kamoona, A.M., Gostar, A.K., Bab-Hadiashar, A., Hoseinnezhad, R., 2019. Sparsity-
unsupervised anomaly segmentation in brain MR images: a comparative study. Med. based naive Bayes approach for anomaly detection in real surveillance videos. In:
Image Anal. 69, 101952. 2019 International Conference on Control, Automation and Information Sciences.
Bay, H., Ess, A., Tuytelaars, T., Van Gool, L., 2008. Speeded-up robust features (SURF). ICCAIS, pp. 1–6.
Comput. Vis. Image Underst. 110 (3), 346–359. Kamoona, A.M., Gostar, A.K., Tennakoon, R., Bab-Hadiashar, A., Accadia, D., Thorpe, J.,
Beaudet, P.R., 1978. Rotationally invariant image operators. In: Proc. 4th Int. Joint Hoseinnezhad, R., 2019. Random finite set-based anomaly detection for safety
Conf. Pattern Recog. Tokyo, Japan, 1978. monitoring in construction sites. IEEE Access 7, 105710–105720.
Bergman, L., Hoshen, Y., 2019. Assification-based anomaly detection for general data. Kotsiopoulos, T., Sarigiannidis, P., Ioannidis, D., Tzovaras, D., 2021. Machine learning
In: International Conference on Learning Representations. and deep learning in smart manufacturing: The smart grid paradigm. Comp. Sci.
Bergmann, P., Batzner, K., Fauser, M., Sattlegger, D., Steger, C., 2021. The MVTec Rev. 40, 100341.
anomaly detection dataset: a comprehensive real-world dataset for unsupervised Ledoit, O., Wolf, M., 2004. A well-conditioned estimator for large-dimensional
anomaly detection. Int. J. Comput. Vis. 129 (4), 1038–1059. covariance matrices. J. Multivariate Anal. 88 (2), 365–411.
14
Lee, K., Lee, K., Lee, H., Shin, J., 2018. A simple unified framework for detecting Schlegl, T., Seeböck, P., Waldstein, S.M., Langs, G., Schmidt-Erfurth, U., 2019.
out-of-distribution samples and adversarial attacks. Adv. Neural Inf. Process. Syst. f-AnoGAN: Fast unsupervised anomaly detection with generative adversarial
31. networks. Med. Image Anal. 54, 30–44.
Lenc, K., Vedaldi, A., 2016. Learning covariant feature detectors. In: European Schönberger, J.L., Pollefeys, M., Geiger, A., Sattler, T., 2018. Semantic visual local-
Conference on Computer Vision. Springer, pp. 100–117. ization. In: Proceedings of the IEEE Conference on Computer Vision and Pattern
Leutenegger, S., Chli, M., Siegwart, R.Y., 2011. BRISK: Binary robust invariant scalable Recognition. pp. 6896–6906.
keypoints. In: 2011 International Conference on Computer Vision. IEEE, pp. Sheynin, S., Benaim, S., Wolf, L., 2021. A hierarchical transformation-discriminating
2548–2555. generative model for few shot anomaly detection. arXiv preprint arXiv:2104.14535.
Li, G., 2022. An IGGM-based Poisson multi-Bernoulli filter and its application Shumin, D., Zhoufeng, L., Chunlei, L., 2011. Adaboost learning for fabric defect
to distributed multisensor fusion. IEEE Trans. Aerosp. Electron. Syst. 58 (4), detection based on HOG and SVM. In: 2011 International Conference on Multimedia
3666–3677. Technology. IEEE, pp. 2903–2906.
Li, Z., Snavely, N., 2018. Megadepth: Learning single-view depth prediction from Simonyan, K., Zisserman, A., 2014. Very deep convolutional networks for large-scale
internet photos. In: Proceedings of the IEEE Conference on Computer Vision and image recognition. arXiv preprint arXiv:1409.1556.
Pattern Recognition. pp. 2041–2050. Tan, D.S., Chen, Y.C., Chen, T.P.C., Chen, W.C., 2021. Trustmae: A noise-resilient
Liu, B., Tharmarasa, R., Jassemi, R., Brown, D., Kirubarajan, T., 2023. RFS-based defect classification framework using memory-augmented auto-encoders with trust
multiple extended target tracking with resolved multipath detections in clutter. regions. In: Proceedings of the IEEE/CVF Winter Conference on Applications of
IEEE Trans. Intell. Transp. Syst. Computer Vision. pp. 276–285.
Liu, W., Wang, X., Owens, J., Li, Y., 2020. Energy-based out-of-distribution detection. Tang, B., Chen, L., Sun, W., Lin, Z.-k., 2023. Review of surface defect detection of steel
Adv. Neural Inf. Process. Syst. 33. products based on machine vision. IET Image Process. 17 (2), 303–322.
Liu, T., Ye, W., 2022. A semi-supervised learning method for surface defect classification Tang, J., Ericson, L., Folkesson, J., Jensfelt, P., 2019. GCNv2: Efficient correspondence
of magnetic tiles. Mach. Vis. Appl. 33 (2), 35. prediction for real-time SLAM. IEEE Robot. Autom. Lett. 4 (4), 3505–3512.
Lowe, D.G., 2004. Distinctive image features from scale-invariant keypoints. Int. J. Teichmann, M., Araujo, A., Zhu, M., Sim, J., 2019. Detect-to-retrieve: Efficient regional
Comput. Vis. 60 (2), 91–110. aggregation for image search. In: Proceedings of the IEEE/CVF Conference on
Lu, Q., Lin, J., Luo, L., Zhang, Y., Zhu, W., 2022. A supervised approach for automated Computer Vision and Pattern Recognition. pp. 5109–5118.
surface defect detection in ceramic tile quality control. Adv. Eng. Inform. 53, Tian, Y., Balntas, V., Ng, T., Barroso-Laguna, A., Demiris, Y., Mikolajczyk, K., 2020.
101692. D2D: Keypoint extraction with describe to detect approach. In: Proceedings of the
Mahalanobis, P.C., 1936. On the Generalized Distance in Statistics. National Institute Asian Conference on Computer Vision.
of Science of India. Tuytelaars, T., Mikolajczyk, K., 2008. Local Invariant Feature Detectors: A Survey. Now
Mahler, R.P., 2007. Statistical Multisource-Multitarget Information Fusion, Vol. 685. Publishers Inc.
Artech House Norwood, MA. Uzen, H., Turkoglu, M., Hanbay, D., 2023. Multi-dimensional feature extraction-based
Mikolajczyk, K., Schmid, C., 2004. Scale & affine invariant interest point detectors. Int. deep encoder–decoder network for automatic surface defect detection. Neural
J. Comput. Vis. 60 (1), 63–86. Comput. Appl. 35 (4), 3263–3282.
Nazare, T., de Mello, R., Ponti, M., 2018. Are pre-trained CNNs good feature extractors Üzen, H., Türkoğlu, M., Yanikoglu, B., Hanbay, D., 2022. Swin-MFINet: Swin trans-
for anomaly detection in surveillance videos? arXiv preprint arXiv:1811.08495. former based multi-feature integration network for detection of pixel-level surface
Neogi, N., Mohanta, D.K., Dutta, P.K., 2014. Review of vision-based steel surface defects. Expert Syst. Appl. 209, 118269.
inspection systems. EURASIP J. Image Video Process. 2014 (1), 1–19. Verdie, Y., Yi, K., Fua, P., Lepetit, V., 2015. TILDE: A temporally invariant learned
Oliver, A., Odena, A., Raffel, C.A., Cubuk, E.D., Goodfellow, I., 2018. Realistic detector. In: Proceedings of the IEEE Conference on Computer Vision and Pattern
evaluation of deep semi-supervised learning algorithms. In: Advances in Neural Recognition. pp. 5279–5288.
Information Processing Systems. pp. 3235–3246. Vo, B.N., Dam, N., Phung, D., Tran, Q.N., Vo, B.T., 2018. Model-based learning for
Ono, Y., Trulls, E., Fua, P., Yi, K.M., 2018. LF-net: learning local features from images. point pattern data. Pattern Recognit. 84, 136–151. http://dx.doi.org/10.1016/J.
In: Advances in Neural Information Processing Systems. pp. 6237–6247. PATCOG.2018.07.008.
Pang, G., Shen, C., Cao, L., Hengel, A.V.D., 2021. Deep learning for anomaly detection: Wang, M., Sun, C., Sowmya, A., 2022. Efficient corner detection based on corner
A review. ACM Comput. Surv. 54 (2), 1–38. enhancement filters. Digit. Signal Process. 122, 103364.
Papi, F., Vo, B.N., Vo, B.T., Fantacci, C., Beard, M., 2015. Generalized labeled multi- Wang, Y., Yao, Q., Kwok, J.T., Ni, L.M., 2020. Generalizing from a few examples: A
Bernoulli approximation of multi-object densities. IEEE Trans. Signal Process. 63 survey on few-shot learning. ACM Comput. Surv. 53 (3), 1–34.
(20), 5487–5497. Wang, Y., Zhou, Q., Liu, J., Xiong, J., Gao, G., Wu, X., Latecki, L.J., 2019. LEDNet: A
Qiu, L., Wu, X., Yu, Z., 2019. A high-efficiency fully convolutional networks for lightweight encoder-decoder network for real-time semantic segmentation. In: 2019
pixel-wise surface defect detection. IEEE Access 7, 15884–15893. IEEE International Conference on Image Processing. ICIP, IEEE, pp. 1860–1864.
Revaud, J., Weinzaepfel, P., De Souza, C., Pion, N., Csurka, G., Cabon, Y., Humen- Wu, X., Cao, K., Gu, X., 2017. A surface defect detection based on convolutional neural
berger, M., 2019. R2d2: Repeatable and reliable detector and descriptor. In: 33rd network. In: International Conference on Computer Vision Systems. Springer, pp.
Conference on Neural Information Processing Systems. 185–194.
Rippel, O., Mertens, P., Merhof, D., 2021. Modeling the distribution of normal data Xie, J., Long, F., Lv, J., Wang, Q., Li, P., 2022. Joint distribution matters: Deep
in pre-trained deep features for anomaly detection. In: 2020 25th International brownian distance covariance for few-shot classification. In: Proceedings of the
Conference on Pattern Recognition. ICPR, IEEE, pp. 6726–6733. IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 7972–7981.
Rosten, E., Drummond, T., 2006. Machine learning for high-speed corner detection. In: Yi, K.M., Trulls, E., Lepetit, V., Fua, P., 2016. LIFT: Learned invariant feature transform.
European Conference on Computer Vision. Springer, pp. 430–443. In: European Conference on Computer Vision. Springer, pp. 467–483.
Rublee, E., Rabaud, V., Konolige, K., Bradski, G., 2011. ORB: An efficient alternative Yin, X., Chen, Y., Bouferguene, A., Zaman, H., Al-Hussein, M., Kurach, L., 2020. A deep
to SIFT or SURF. In: 2011 International Conference on Computer Vision. IEEE, pp. learning-based framework for an automated defect detection system for sewer pipes.
2564–2571. Autom. Constr. 109, 102967.
Rudolph, M., Wandt, B., Rosenhahn, B., 2019. Structuring autoencoders. In: Proceedings Yoon, J., Jordon, J., Van Der Schaar, M., 2018. GAIN: Missing data imputation using
of the IEEE/CVF International Conference on Computer Vision Workshops. generative adversarial nets. arXiv preprint arXiv:1806.02920.
Rudolph, M., Wandt, B., Rosenhahn, B., 2021. Same same but differnet: Semi-supervised Yu, Z., Wu, X., Gu, X., 2017. Fully convolutional networks for surface defect inspec-
defect detection with normalizing flows. In: Proceedings of the IEEE/CVF Winter tion in industrial environment. In: International Conference on Computer Vision
Conference on Applications of Computer Vision. pp. 1907–1916. Systems. Springer, pp. 417–426.
Ruff, L., Vandermeulen, R., Goernitz, N., Deecke, L., Siddiqui, S.A., Binder, A., Zaman, A., Yangyu, F., Irfan, M., Ayub, M.S., Guoyun, L., Shiya, L., 2022. LifelongGlue:
Müller, E., Kloft, M., 2018. Deep one-class classification. In: International Keypoint matching for 3D reconstruction with continual neural networks. Expert
Conference on Machine Learning. PMLR, pp. 4393–4402. Syst. Appl. 195, 116613.
Sabokrou, M., Fayyaz, M., Fathy, M., Moayed, Z., Klette, R., 2018. Deep-anomaly: Zhai, S., Cheng, Y., Lu, W., Zhang, Z., 2016. Deep structured energy based models for
Fully convolutional neural network for fast anomaly detection in crowded scenes. anomaly detection. In: International Conference on Machine Learning. PMLR, pp.
Comput. Vis. Image Underst. 172, 88–97. 1100–1109.
Sarlin, P.E., Unagar, A., Larsson, M., Germain, H., Toft, C., Larsson, V., Pollefeys, M., Zhang, J., Jing, J., Lu, P., Song, S., 2022. Improved MobileNetV2-SSDLite for automatic
Lepetit, V., Hammarstrand, L., Kahl, F., et al., 2021. Back to the feature: Learning fabric defect detection system based on cloud-edge computing. Measurement 201,
robust camera localization from pixels to pose. In: Proceedings of the IEEE/CVF 111665.
Conference on Computer Vision and Pattern Recognition. pp. 3247–3257.
15
