Invoice Classification Using Deep Features and Machine Learning Techniques

1st Ahmad S. Tarawneh
Dept. of Algorithms and Their Applications
Eötvös Loránd University
Budapest, Hungary

2nd Ahmad B. Hassanat
Dept. of Information Technology
Mutah University
Karak, Jordan

3rd Dmitry Chetverikov
Dept. of Algorithms and Their Applications
Eötvös Loránd University
Budapest, Hungary

4th Imre Lendak
Dept. of Data Science and Engineering
Eötvös Loránd University
Budapest, Hungary
Faculty of Technical Sciences
University of Novi Sad
Novi Sad, Serbia

5th Chaman Verma
Dept. of Media and Educational Informatics
Eötvös Loránd University
Budapest, Hungary

Abstract—Invoices are issued by companies, banks and different organizations in different forms, including handwritten and machine-printed ones; sometimes, receipts are included as a separate form of invoice. In current practice, classifying these types is normally done manually, since each needs a special kind of processing, such as preparation for optical character recognition (OCR) systems. Classifying invoices manually into different categories is a hard and time-consuming task. Therefore, we propose an automatic approach to classify invoices into three types: handwritten, machine-printed and receipts. The proposed method is based on extracting features using the deep convolutional neural network AlexNet. The features are classified using several machine learning algorithms, namely Random Forests, K-nearest neighbors (KNN), and Naive Bayes. Different cross-validation approaches are applied in the experiments to confirm the effectiveness of the proposed solution. The best classification result was 98.4% (total accuracy), achieved by KNN; such near-perfect performance allows the proposed method to be used in practice as a preprocessing step for OCR systems, or as a standalone application.

Index Terms—Invoice classification, Deep features, Machine learning, OCR

I. INTRODUCTION

Invoices are produced in most business environments, such as banks, companies and e-commerce [1], [2]. They are available in different formats/types: handwritten (HW), machine-printed (MP) and receipt (RT) [3]. Typically, invoices are captured using either phone cameras or optical scanners and archived in an image dataset for further use, processing and/or quality improvement [4]. One of the most important forms of processing applied to such documents is optical character recognition (OCR). OCR is an image processing and analysis technique for obtaining valuable information from image datasets. This information is important for handling invoices automatically and for indexing their contents to speed up the searching process [5]. Each of the aforementioned types of invoices has its own structure, so it is desirable to handle each type separately, since the accuracy of OCR can depend on knowing the type of the handled document. In other words, knowing the type of the document allows for selecting the best method to apply; for example, HW recognition needs a different OCR approach from that of MP. Classifying these types manually is a time-consuming process that requires a lot of effort and is expensive, particularly for the big invoice datasets produced by banks and large companies. In this paper, we propose a robust method for invoice classification using features extracted from the well-known pre-trained deep convolutional neural network (CNN) AlexNet [6], [7]. Pre-trained models are usually trained on large-scale datasets such as ImageNet [8] and are normally used for fast learning on other, relatively small datasets. Figure 1 shows the architecture of the AlexNet model.

Fig. 1. Typical architecture of AlexNet [6]
Some previous studies showed that the features extracted from fully connected layer number 6 (FC6) allow for better performance than those from fully connected layer number 7 (FC7) [7], [9]. Typically, for each image, FC6 yields a feature vector of 4096 dimensions; the feature vectors of all images are saved in a feature database. In this paper, we investigate the use of different classifiers trained on these features under various cross-validation approaches.

II. DATA AND FEATURE EXTRACTION

The data used in this paper is part of a huge dataset of 45,000 invoice documents. Since we do not have class ground truth for the data, we selected sample invoices belonging to the three classes and manually labeled them to allow for supervised learning. The numbers of samples in the selected subset are summarized in Table I.

TABLE I
THE CLASSES OF THE DOCUMENTS AND THE NUMBERS OF IMAGES PER CLASS OF THE SELECTED SAMPLE

Document Class    Number of images
HW                220
RT                90
MP                1070
Total             1380

Figure 2 shows images from the selected sample dataset for all classes.

Fig. 2. Sample invoices from the sample dataset: the first row contains HW documents, the second row contains MP documents and the third row contains RT documents.

The dataset contains images captured by phone cameras or scanned by digital scanners of different qualities. Some of the images are affected by shadows, geometric distortions (skew) and various complex backgrounds. CNNs are not invariant to rotation and scale [10]. Fortunately, however, pre-trained CNNs are trained on huge numbers of images with different rotation angles and scale ratios, which probably makes the resulting deep features invariant to rotation and scale variation [7], [11]–[13]. The features are extracted by feeding an image to the input layer of the CNN and computing the feedforward pass. Finally, the features are collected from FC6, before the ReLU activation layer.
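The pipeline thus uses AlexNet purely as a fixed feature extractor. As an illustration only (we do not state which framework was used in the paper), the following sketch shows how the 4096-dimensional FC6 activations can be collected before the ReLU from a pre-trained AlexNet, assuming a PyTorch/torchvision environment whose weights are a close variant of the original model:

import torch
from torchvision import models, transforms
from PIL import Image

# ImageNet pre-trained AlexNet, used as a fixed feature extractor (no fine-tuning).
model = models.alexnet(pretrained=True).eval()

# Standard ImageNet preprocessing for torchvision's AlexNet.
preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

def extract_fc6(image_path: str) -> torch.Tensor:
    """Return the 4096-D FC6 activation, taken before the ReLU."""
    x = preprocess(Image.open(image_path).convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        x = model.features(x)       # convolutional stages
        x = model.avgpool(x)
        x = torch.flatten(x, 1)
        x = model.classifier[0](x)  # dropout; identity in eval mode
        x = model.classifier[1](x)  # FC6 linear layer (pre-ReLU output)
    return x.squeeze(0)             # shape: (4096,)

Running extract_fc6 over the whole sample and stacking the results yields the feature database described above.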
III. EXPERIMENTS AND RESULTS

The experiments are conducted using the Weka tool, which is a flexible and easy-to-use environment that provides many classifiers and different cross-validation approaches [14]–[16]. All classifiers are used with their default parameters in Weka; e.g., KNN uses k=1, linear search, and the Euclidean distance as a similarity measure.
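Although all experiments in the paper were run inside Weka, the same setup can be approximated in script form. The sketch below uses scikit-learn stand-ins (an assumption on our part; scikit-learn's defaults are not identical to Weka's, and the feature files named here are hypothetical):

import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import StratifiedKFold, cross_val_score

X = np.load("fc6_features.npy")  # (1380, 4096) deep features, hypothetical file
y = np.load("labels.npy")        # integer-coded class labels (HW, RT, MP)

classifiers = {
    "KNN (k=1, Euclidean)": KNeighborsClassifier(n_neighbors=1, metric="euclidean"),
    "Random Forest": RandomForestClassifier(random_state=0),
    "Naive Bayes": GaussianNB(),
}

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
for name, clf in classifiers.items():
    scores = cross_val_score(clf, X, y, cv=cv)  # per-fold accuracy
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")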
Since the data is unbalanced, we need to report several performance metrics to provide a clear evaluation of the system's performance. In this context, we use the True Positive (TP) rate, False Positive (FP) rate and Precision for each class.
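All three of these per-class metrics can be read directly off the confusion matrix. A minimal sketch of that computation (not the exact code Weka runs internally):

import numpy as np

def per_class_metrics(cm: np.ndarray) -> dict:
    """cm[i, j] = number of instances of true class i predicted as class j."""
    total = cm.sum()
    metrics = {}
    for c in range(cm.shape[0]):
        tp = cm[c, c]
        fn = cm[c, :].sum() - tp
        fp = cm[:, c].sum() - tp
        tn = total - tp - fn - fp
        metrics[c] = {
            "tp_rate": tp / (tp + fn),    # recall for class c
            "fp_rate": fp / (fp + tn),    # false alarms among other classes
            "precision": tp / (tp + fp) if tp + fp else 0.0,
        }
    return metrics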
A. Full-size feature vectors

The first part of the experiments is done using the KNN classifier [17]. Table II illustrates the results of KNN with different types of validation. One can see that the best results were achieved using 10-fold cross-validation, where 98.4% (total accuracy) of the instances were classified correctly.

TABLE II
RESULTS OF INVOICE CLASSIFICATION USING KNN WITH DIFFERENT CROSS-VALIDATION APPROACHES [%]

           Class   TP Rate   FP Rate   Precision
5-fold     HW      95.8      0.60      96.7
           RT      94.1      0.30      95.2
           MP      99.3      3.40      99.1
10-fold    HW      95.3      0.70      96.2
           RT      92.9      0.50      92.9
           MP      99.1      4.00      98.9
66% Train  HW      95.8      1.30      93.2
           RT      90.3      0.50      93.3
           MP      98.3      4.90      98.6


The same validation was also done using the Random Forests (RF) classifier [18]. Table III shows the results of the classification process using the 5-fold, 10-fold and 66%/34% percentage-split approaches.

TABLE III
RESULTS OF INVOICE CLASSIFICATION USING RF WITH DIFFERENT CROSS-VALIDATION APPROACHES [%]

           Class   TP Rate   FP Rate   Precision
5-fold     HW      89.7      0.90      95.0
           RT      62.4      0.00      100
           MP      99.4      16.8      95.5
10-fold    HW      88.7      0.60      96.4
           RT      61.2      0.00      100
           MP      99.5      18.5      95.0
66% Train  HW      93.1      1.00      94.4
           RT      41.9      0.00      100
           MP      99.2      21.4      94.1

Using the deep features extracted from our dataset, KNN performed better than RF, which achieved 95.5% as its best result using 5-fold cross-validation.

The third classifier used in our study is Naive Bayes (NB) [19], [20]. Table IV shows the results of NB under the same criteria as before.

TABLE IV
RESULTS OF INVOICE CLASSIFICATION USING NB WITH DIFFERENT CROSS-VALIDATION APPROACHES [%]

           Class   TP Rate   FP Rate   Precision
5-fold     HW      85.0      11.3      58.4
           RT      88.2      3.70      61.5
           MP      84.9      8.40      97.3
10-fold    HW      84.5      11.7      57.5
           RT      85.9      3.30      63.5
           MP      85.2      8.70      97.2
66% Train  HW      87.5      10.6      60.6
           RT      77.4      1.90      75.0
           MP      87.4      11.7      96.3

As can be noted from Tables II, III and IV, the classification results are satisfactory; the KNN classifier achieved the best results compared to the other classifiers considered.

The Weka tool provides a t-test that gives the ability to compare models under the same criteria [21]. In this work, we compared the performance of the classifiers under the 5-fold cross-validation approach. Figures 3 and 4 present the Mean Absolute Error (MAE), the Root Mean Squared Error (RMSE), the Relative Absolute Error (RAE) and the Root Relative Squared Error (RRSE) [22], [23]. These measures are normally used for continuous variables; however, Weka calculates them for discrete classification problems using a quadratic loss function, which takes into account all class probability estimates for an instance. All of the considered error measures indicate that KNN has the lowest classification error, which means that this classifier is preferable for such a classification problem.
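To make this concrete, one way to obtain MAE and RMSE on a discrete problem is to compare the predicted class-probability vector of each instance against the one-hot encoding of its true label; we believe this matches Weka's convention, but the exact formula is our assumption:

import numpy as np

def classification_mae_rmse(probs: np.ndarray, y_true: np.ndarray):
    """probs: (N, K) class probability estimates; y_true: (N,) integer labels."""
    n = probs.shape[0]
    onehot = np.zeros_like(probs)
    onehot[np.arange(n), y_true] = 1.0
    err = probs - onehot                 # per-class probability errors
    mae = np.mean(np.abs(err))           # averaged over instances and classes
    rmse = np.sqrt(np.mean(err ** 2))
    return mae, rmse

The relative variants (RAE, RRSE) divide these quantities by the corresponding errors of a simple baseline that always predicts the class prior probabilities.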
Fig. 3. MAE and RMSE of the classification algorithms used for invoice classification.

Fig. 4. RAE and RRSE of the classification algorithms used for invoice classification.

Figure 5 compares the correct classification rates of the three classifiers used. One can see that KNN is the best, as its correct classification rate is around 98%, while RF and NB achieve 95.3% and 85%, respectively.

Fig. 5. Correct classification rates of KNN, RF and NB.

B. Reduced-size feature vectors

The curse of dimensionality appears when the number of samples is not high compared to the number of features (attributes). A higher number of features also increases the training time [24]. In this section, we run the experiments after reducing the dimensionality of the deep features using Principal Component Analysis (PCA).

PCA in Weka reduces the number of features from 4096 to 183, which is a significant reduction, while keeping the KNN model stable under several cross-validation approaches. Figure 6 shows the classification rate as a function of the number of folds. The rate is between 95% and 95.8% for all of the tested folds. Compared to the accuracy with the full-size feature vectors, PCA significantly reduces the number of features while practically maintaining the performance, without a significant decline.

Fig. 6. Performance of the KNN classifier for varying numbers of folds.
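For reference, a sketch of this reduction step: Weka's PCA filter keeps, by default, enough principal components to cover 95% of the variance, which we assume is how the 4096 dimensions became 183. A scikit-learn equivalent (with the same hypothetical feature file as before) would be:

import numpy as np
from sklearn.decomposition import PCA

X = np.load("fc6_features.npy")   # (1380, 4096) full-size deep features
pca = PCA(n_components=0.95)      # keep components covering 95% of the variance
X_reduced = pca.fit_transform(X)
print(X_reduced.shape)            # (1380, d); d = 183 on the paper's data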


Figure 7 presents the classification rate as a function of the training ratio. More data used for training generally leads to a more accurate model. However, the results shown in Figure 7 indicate that even when just 40% of the data is used for training, the model achieves a high classification rate of 94%; this indicates that the small sample we considered for our experiments might be enough to classify the remaining 45,000 documents.

Fig. 7. Performance of the KNN classifier for varying training ratios.
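The training-ratio sweep behind Figure 7 can be reproduced with repeated percentage splits; the sketch below is a scikit-learn stand-in for the Weka runs (file names hypothetical, split details assumed):

import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split

X = np.load("fc6_features.npy")
y = np.load("labels.npy")

for train_ratio in (0.4, 0.5, 0.6, 0.66, 0.7, 0.8):
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, train_size=train_ratio, stratify=y, random_state=0)
    acc = KNeighborsClassifier(n_neighbors=1).fit(X_tr, y_tr).score(X_te, y_te)
    print(f"train ratio {train_ratio:.2f}: accuracy {acc:.3f}")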
IV. CONCLUSION

In this paper, we propose a new approach for solving the invoice classification problem using deep features, investigating several machine learning techniques, namely k-Nearest Neighbours, Random Forests and Naive Bayes. Our experimental study indicates that AlexNet deep features allow for high classification rates, particularly when the KNN classifier is applied.

Since the number of deep features obtained by AlexNet is very large (4096), we applied PCA to significantly reduce the number of features and speed up the training process. Our experimental results show that PCA could reduce the dimensionality to only 183 while keeping performance high.

The proposed approach was tested and validated in terms of classification rate under several cross-validation approaches, using different measures: the Mean Absolute Error, the Root Mean Squared Error, the Relative Absolute Error, and the Root Relative Squared Error. The testing dataset contained images with different handwriting styles, rotations, complex backgrounds, illuminations, noise, etc. Our results on real invoice data show reasonable classification rates, with the best performance achieved by the KNN classifier. These results can be attributed to the deep features used, which proved to be efficient for this computer vision task; this conclusion is supported by many researchers, such as [10], [25], [26]. We can therefore recommend this approach for use in practice for invoice classification, either as a standalone application or as a preprocessing step for OCR systems. In our future work, we plan to improve the performance of the proposed method by applying image enhancement before feature extraction, and by using other feature extraction methods, such as [27]–[30].

ACKNOWLEDGMENT

This research was supported by the EIT DIGITAL 2018 Integral Agreement - Grant 2018-071 ELTE. The first author also wishes to acknowledge the sponsorship of the Tempus Public Foundation for his PhD study. Thanks also to Tomas Horvath, Gabor Szegedi and David Fonyo for sharing their knowledge to increase the quality of this work.

REFERENCES

[1] H. T. Ha, "Recognition of invoices from scanned documents," RASLAN 2017: Recent Advances in Slavonic Natural Language Processing, p. 71, 2017.
[2] C. Boström, J. Herelius, M. Hugosson, and S. Maleev, "Automatic reading and interpretation of paper invoices: Adc invoice," 2016.
[3] N. V. Rao, A. Sastry, A. Chakravarthy, and P. Kalyanchakravarthi, "Optical character recognition technique algorithms," Journal of Theoretical & Applied Information Technology, vol. 83, no. 2, 2016.
[4] R. J. Becker, R. Kandpal, P. Kothari, S. Porcina, and P. Malynin, "Image quality assessment and improvement for performing optical character recognition," May 3, 2018, US Patent App. 15/337,285.
[5] H. T. Ha, Z. Nevěřilová, A. Horák et al., "Recognition of OCR invoice metadata block types," in International Workshop on Temporal, Spatial, and Spatio-Temporal Data Mining. Springer, 2018, pp. 304–312.
[6] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "Imagenet classification with deep convolutional neural networks," in Advances in Neural Information Processing Systems, 2012, pp. 1097–1105.


[7] A. S. Tarawneh, C. Celik, A. B. Hassanat, and D. Chetverikov, "Detailed investigation of deep features with sparse representation and dimensionality reduction in CBIR: A comparative study," Intelligent Data Analysis, in press.
[8] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, “ImageNet:
A Large-Scale Hierarchical Image Database,” in CVPR09, 2009.
[9] A. S. Tarawneh, D. Chetverikov, and A. B. Hassanat, “Pilot comparative
study of different deep features for palmprint identification in low-
quality images,” arXiv preprint arXiv:1804.04602, 2018.
[10] I. Goodfellow, Y. Bengio, A. Courville, and Y. Bengio, Deep learning.
MIT press Cambridge, 2016, vol. 1.
[11] G. Cheng, P. Zhou, and J. Han, “Learning rotation-invariant convolu-
tional neural networks for object detection in vhr optical remote sensing
images,” IEEE Transactions on Geoscience and Remote Sensing, vol. 54,
no. 12, pp. 7405–7415, 2016.
[12] "Data Augmentation: How to use Deep Learning when you have Limited Data - Part 2," https://medium.com/nanonets/how-to-use-deep-learning-when-you-have-limited-data-part-2-data-augmentation-c26971dc8ced, accessed: 14-12-2018.
[13] "The 9 Deep Learning Papers You Need To Know About," https://adeshpande3.github.io/the-9-deep-learning-papers-you-need-to-know-about.html, accessed: 14-12-2018.
[14] M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann, and I. H.
Witten, “The weka data mining software: an update,” ACM SIGKDD
explorations newsletter, vol. 11, no. 1, pp. 10–18, 2009.
[15] I. H. Witten, E. Frank, and M. A. Hall, Data Mining: Practical Machine
Learning Tools and Techniques, 3rd ed. San Francisco, CA, USA:
Morgan Kaufmann Publishers Inc., 2011.
[16] T. C. Smith and E. Frank, “Introducing machine learning concepts with
weka,” in Statistical Genomics. Springer, 2016, pp. 353–378.
[17] J. M. Brown, “Predicting math test scores using k-nearest neighbor,”
in Integrated STEM Education Conference (ISEC), 2017 IEEE. IEEE,
2017, pp. 104–106.
[18] M. Belgiu and L. Drăguţ, “Random forest in remote sensing: A review
of applications and future directions,” ISPRS Journal of Photogrammetry
and Remote Sensing, vol. 114, pp. 24–31, 2016.
[19] L. Jiang, C. Li, S. Wang, and L. Zhang, “Deep feature weighting
for naive bayes and its application to text classification,” Engineering
Applications of Artificial Intelligence, vol. 52, pp. 26–39, 2016.
[20] K. Gull, S. Padhye, and D. S. Jain, “A comparative analysis of lexical/nlp
method with wekas bayes classifier,” International Journal on Recent
and Innovation Trends in Computing and Communication (IJRITCC),
vol. 5, no. 2, pp. 221–227, 2017.
[21] J. Alcala-Fdez, S. Garcia, A. Fernandez, J. Luengo, S. Gonzalez, J. A.
Saez, I. Triguero, J. Derrac, V. Lopez, L. Sanchez et al., “Comparison of
keel versus open source data mining tools: Knime and weka software,”
2016.
[22] C. J. Willmott and K. Matsuura, “Advantages of the mean absolute error
(mae) over the root mean square error (rmse) in assessing average model
performance,” Climate research, vol. 30, no. 1, pp. 79–82, 2005.
[23] J. Li and A. D. Heap, “A review of comparative studies of spatial in-
terpolation methods in environmental sciences: performance and impact
factors,” Ecological Informatics, vol. 6, no. 3-4, pp. 228–241, 2011.
[24] A. S. Tarawneh, D. Chetverikov, C. Verma, and A. B. Hassanat,
“Stability and reduction of statistical features for image classification
and retrieval: Preliminary results,” in Information and Communication
Systems (ICICS), 2018 9th International Conference on. IEEE, 2018,
pp. 117–121.
[25] S. Hoo-Chang, H. R. Roth, M. Gao, L. Lu, Z. Xu, I. Nogues, J. Yao,
D. Mollura, and R. M. Summers, “Deep convolutional neural networks
for computer-aided detection: Cnn architectures, dataset characteristics
and transfer learning,” IEEE transactions on medical imaging, vol. 35,
no. 5, p. 1285, 2016.
[26] Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” nature, vol. 521,
no. 7553, p. 436, 2015.
[27] A. Hassanat, V. S. Prasath, M. Al-kasassbeh, A. S. Tarawneh, and A. J.
Al-shamailh, “Magnetic energy-based feature extraction for low-quality
fingerprint images,” Signal, Image and Video Processing, pp. 1–8, 2018.
[28] A. Hassanat and A. S. Tarawneh, “Fusion of color and statistic fea-
tures for enhancing content-based image retrieval systems.” Journal of
Theoretical & Applied Information Technology, vol. 88, no. 3, 2016.
[29] A. B. Hassanat, V. S. Prasath, B. M. Al-Mahadeen, and S. M. M. Alhasanat, "Classification and gender recognition from veiled-faces," International Journal of Biometrics, vol. 9, no. 4, pp. 347–364, 2017.
[30] A. B. Hassanat, "On identifying terrorists using their victory signs," Data Science Journal, vol. 17, 2018.

