
A two-stage CNN-based hand-drawn electrical and electronic circuit component recognition system

Darshan S Shetty
1RN21EC040
Dept. of ECE
RNSIT

Abstract—Though handwriting recognition has been a well-explored research area for decades, there are a few sub-areas of this field that have still not received much attention from researchers. Some examples include the recognition of hand-drawn graphics components like circuit components and diagrams. Complete digitization of such handwritten documents is not possible without automatic conversion of the said circuit diagrams. Besides, to date, in most cases of commercial circuit design, the concerned people manually enter the components into simulating software like Cadence or Spice to analyze the circuit and judge its performance. In this work, an attempt is made to move one step towards automating this process by recognizing the hand-drawn circuit components, which is considered the most important step for this automation. The present endeavour is to design a two-stage convolutional neural network (CNN)-based model that recognizes the hand-drawn circuit components. In the first stage, all the similar-looking (i.e., similar shape and structure) circuit components are clustered into a single group using visual perception and input from the confusion matrix of a single-stage CNN-based classification, and in the later stage, the circuit components belonging to the same group are classified into their actual classes. The proposed model has been evaluated on a self-made database where 20 different classes of hand-drawn circuit components are considered. The experimental outcome shows that the proposed two-stage classification model provides an accuracy of 97.33%, which is much higher than the single-stage method, which provides an accuracy of 86.00%.

Index Terms—Circuit component recognition, Hand-drawn circuit, Analog and digital, CNN model

I. INTRODUCTION

An electrical or electronic circuit diagram is a graphical representation containing various symbols used to represent the circuit components, such as a resistor and a battery, which are connected by lines representing the wires in a real circuit. Engineers, scientists, as well as students and researchers, have to deal with circuit diagrams quite often, and they require deep analysis of the diagrams under scrutiny as well. This involves recognition or identification of the symbols as the first step. The modern scenario showcases an ardent inclination of researchers as well as tech giants towards the digitization of documents [1, 2]. Digitization is converting a document into a digital format, where the data are organized into bits. The biggest reform that digitization has brought about is the preservation of off-line data. It becomes especially important in large-scale applications where manufacturers have to deal with and keep track of huge data, typically millions of datasets. Since rough analysis for large-scale applications in the electrical or electronic domain is made from hand-drawn diagrams, and often in bulk, it becomes imperative to design a model for the detection and the recognition of the individual components, which are extracted from the digitized version of the raw (paper-based) images. Modern research works elaborate on the need for digitization of images and emphasize computer-based analysis.

Besides, concerned people can interpret the symbols of a hand-drawn circuit diagram using their knowledge and experience; however, this task thereafter requires manual intervention to enter the hand-drawn components into CAD tools like Multisim, CircuitMaker or Cadence, followed by related processes like simulation and analysis of the circuit parameters. The overall process is very cumbersome and time-consuming. Also, the effort put into this rises with the complexity of the diagrams under consideration. So it would be much simpler if a method were devised such that components are recognized directly from hand-drawn schematics, reducing both time and human resources. It is to be noted that, to recognize an entire hand-drawn circuit diagram, one has to extract the circuit components [3, 4] first from the circuit diagram, and only then can the circuit component recognition process be applied.

Keeping the above facts in mind, the present work deals with the classification of 20 different hand-drawn analog and digital circuit components (e.g., ammeter, voltmeter, inductor, resistor and different gates) using a convolutional neural network (CNN)-based approach, which involves grouping visually similar components using a CNN model at the first stage, and then designing CNN models for each group to classify the components of a group into their exact classes. It is noteworthy that the task of recognizing hand-drawn circuit components in offline mode possesses inherent challenges owing to variation in drawing style, with people following a random pen-up and pen-down fashion while drawing, often leading to erratic diagrams which may turn out to be difficult even for veterans to comprehend. Moreover, the low quality of paper and ink, natural noise, and the noise appearing while acquiring the images are the typical challenges in this domain. Mainly these issues motivate the design of a deep learning-based model for the said task, as conventional feature engineering-based methods may fail to yield the desired result.
II. RELATED WORK

Although hand-drawn circuit component recognition is not a well-explored research area, a literature survey reveals that some researchers have attempted it in the past; a few of those works are briefed here. Dewangan and Dhole [5] use a strategy utilizing a K-nearest neighbor (KNN) classifier to build a system that directly reads the electrical circuit components from a hand-drawn circuit image. The feature vector is prepared using a shape-based feature extraction process considering the geometric properties of the circuit components. Analog components, divided into 10 classes, are used to obtain an accuracy of around 90%. However, many major circuit components like digital gates and transformers are not taken into consideration, and the number of circuit components used in the dataset is not specified.

In another work, Feng et al. [6] rely on a two-dimensional dynamic programming (2D-DP) technique allowing symbol hypothesis generation, which can correctly segment and recognize interspersed symbols. Besides, as discriminative classifiers usually have limited capability to reject outliers, some domain-specific knowledge is included to circumvent errors due to untrained patterns corresponding to erroneous segmentation hypotheses. With a point-level online measurement, the experiment shows that the proposed approach can achieve an accuracy of more than 90%. However, very few components (only 9) are used and the components are drawn using a digital pen on a digital surface.

A system of offline circuit recognition and simulation using digital image processing is proposed by Angadi and Naika [7]. The model consists of four stages, namely pre-processing, segmentation, support vector machine (SVM) [8]-based circuit component classification, and simulation, i.e., the authors propose a complete system. In this work, different shape-based features like average component height, inclination, and entropy of the components are used. However, no data regarding the number of components used, accuracy, and size of the dataset are available.

Moetesum and Younus [9] present an effective technique for the segmentation and recognition of electronic components from hand-drawn circuit diagrams. Segmentation is carried out by using a series of morphological operations [10] on the binarized images of circuits and discriminating between three categories of components (closed shapes, components with connected lines, and disconnected components). Each segmented component is characterized by computing the Histogram of Oriented Gradients (HOG) descriptor [11], while classification is carried out using an SVM classifier [12]. A segmentation accuracy of 87.70% and a classification rate of 92.00% are realized, demonstrating the effectiveness of the proposed technique. However, the dataset is quite small, consisting of 35 components per class with 10 classes in total. Circuit components like digital gates and transistors are not taken into consideration.

In the work by Veena and Naik [13], the scanned image of a diagram is pre-processed to remove noise and converted to binary level, and morphological operations are applied to obtain a clean, connected representation using thinned lines. The diagram is comprised of nodes (a node is any point on a circuit where the terminals of two or more circuit elements meet), connections and components. Using an appropriate threshold on a spatially varying object pixel density, nodes and components of the image are segmented. By the use of shape-based features and an SVM classifier, components and nodes are classified. However, the accuracy of classification is not mentioned. Also, no information is available about the size of the dataset and the total number of classes used.

Patare and Joshi [14] propose a method to recognize components in a hand-drawn digital logic circuit diagram. This system uses a region-based segmentation method to segment the circuit sketch and classifies each component using an SVM that uses the Fourier descriptor as the feature vector. However, an average of only 83% circuit recognition accuracy is achieved.

Edwards and Chandran [15] propose a method in which the scanned image of a diagram is pre-processed to remove noise and then converted to a bi-level intensity image. Morphological operations are applied to obtain a clean, connected representation using thinned lines. The diagram comprises nodes (a node is any point on a circuit where the terminals of two or more circuit elements meet), connections, and components. Nodes and components are segmented using appropriate thresholds on a spatially varying object pixel density. Connection paths are traced using a pixel stack. Nodes are classified using syntactic analysis. Components are classified using a combination of invariant moments, scalar pixel-distribution features, and vector relationships between straight lines in polygonal representations. A node recognition accuracy of 92% and a component recognition accuracy of 86% are achieved on a database comprising 107 nodes and 449 components. However, only 9 different components are taken into consideration and the size of the dataset is quite small. Digital components are not taken into consideration in this work.

A topology-based segmentation method to segment the circuit sketch and classify each component using the Fourier descriptor as a feature vector for an SVM is presented by Liu and Xiao [16]. An accuracy rate of over 90% is achieved for each component. However, only 5 classes are considered and the dataset consists of only 55 components per class.

Dreijer [17] aims to create an alternative to the common schematic capture process through the use of an interactive pen-based interface to the capturing software. Sketches are interpreted through a process of vectorizing the user's strokes into primitive shapes, extracting information on intersections between primitives, and using a naïve Bayesian classifier to identify symbol components. Training data are generated by mutating a single definition of each symbol. The symbols are divided into 14 classes, each class consisting of 100 symbols. The overall accuracy obtained was average: the system confuses diodes with boxes and NOT gates with DC sources, thus having very low accuracies of 29% and 50%, respectively.

Naika et al. [18] try to recognize hand-drawn electronic
components (analog only) using HOG-based features and subsequently an SVM classifier. The authors have experimented on a dataset having 2000 isolated circuit component images, and the proposed method yields a 96.90% recognition rate on a 10-class problem. In their work, they have only used some electrical components while completely ignoring the digital part and important analog components like the ammeter and voltmeter.

An artificial neural network (ANN)-based model is used by Rabbani et al. [19] to make a system that can directly read the electrical symbols from a hand-drawn circuit image. The recognition process involves two steps: the first step is shape-based feature extraction, and the second one is classification using an ANN trained with the backpropagation algorithm. The ANN is trained and tested with different hand-drawn electrical circuit component images. The results show that their proposal is viable, but the accuracy obtained is much lower and the dataset used is very small in size.

Recently, Roy et al. [20] have proposed a method for the recognition of hand-drawn electrical and electronic circuit components, with both analog and digital components included. In this method, the pre-processed images of circuit components are used for training and testing a recognition model using a feature set consisting of a texture-based feature descriptor, called HOG, and shape-based features that include centroid distance, tangent angle, and chain code histogram [21]. Besides, the texture-based feature is optimized using a feature selection algorithm called ReliefF and then classified using a sequential minimal optimization (SMO) classifier [22]. It is to be noted that the current work is an extension of this work.

The present authors also came across some works which are aligned to the current work, including sketch symbol recognition (e.g., Deufemia et al. [23]) and identifying hand-drawn graphics components from the textual parts (e.g., Avola et al. [24, 25]) in an online handwritten document using machine learning-based approaches. The authors of the work [24] have used discriminative features like entropy, band ratio, X scan, intersection and projection, and an SVM as classifier. The use of features like curvature and linearity along with the 6 features used in the work [24], and an extreme learning machine (ELM) as classifier, could be found in the work [25] to perform classification of drawing symbols and texts. In another work, Deufemia et al. [23] propose a two-stage clustering-based approach for labelling different types of sketched symbols. In the first stage, the authors use a latent-dynamic conditional random field (LDCRF) to analyze the features of unsegmented stroke sequences based on spatiotemporal information of the strokes, and then select the strokes of symbol part(s) using the contextual information. In the later stage, they group the previously labelled strokes into symbol labels using a distance-based clustering technique.

III. MOTIVATION AND CONTRIBUTIONS

The above discussions reflect that, though a few attempts have been made by researchers in the past, circuit component recognition with substantial efficiency is still much needed. Some machine learning along with conventional feature engineering-based approaches (e.g., [5, 7, 15]) have also set foot to solve this problem and recorded promising outcomes. However, on the downside, these methods have compromised on the number of circuit component classes (e.g., 5 in [16], 9 in [15], and 10 in [9]) and the variety of hand-drawn circuit components (e.g., 55 samples per class in [16], a total of 449 samples in [15] and 35 samples per class in [9]). Additionally, while classifying the circuit components, these works do not take any measure for similar-looking circuit components (e.g., OR and NOR gate, PNP and NPN transistor, and ammeter and voltmeter) that might generate high inter-class misclassification. Therefore, framing a more generic model for the recognition of offline hand-drawn circuit components is indeed required.

In the work [18], the authors have recorded satisfactory recognition accuracy using a texture-based feature (i.e., HOG) for a 10-class problem. In the present work, texture representation is extracted using a customized CNN. In this context, it should be mentioned that there are many recently published works in which authors have claimed that a CNN can identify texture-based features with ease [26–28]. In these research articles, it has been mentioned that the convolutional layers in any CNN model generate different feature maps, and these feature maps can be thought of as filter banks of increasing complexity with depth. These feature maps are powerful tools to extract texture features [29] and have been widely used in texture analysis. Additionally, the authors of the work [30] have conducted different experiments to provide evidence that CNNs rely on object textures rather than global object shapes, as commonly assumed. Hence, those authors have introduced Shape-ResNet (a modified version of ResNet-50) to overcome the texture biases of commonly used CNN models. These works establish the fact that CNN-based deep learning models can be used to extract texture features from an object. More such claims can be found in the survey paper [31]. However, not every CNN architecture may work well for all types of image classification problems. Therefore, a customized CNN model is designed here.

Considering the facts mentioned above, a CNN-based model is designed to extract texture features and eventually recognize hand-drawn circuit components satisfactorily. Concisely, the highlights of the present work are as follows:
• Grouping of similar-looking circuit components using a single-stage CNN-based classification and visual perception.
• Designing CNN models for each group obtained in the first stage (i.e., for classification of circuit components within a group).
• Obtaining more than 10% higher recognition accuracy than when recognizing the circuit components with a single-stage classification using the CNN model.

IV. PRESENT WORK

The present work deals with the classification of hand-drawn electrical and electronic circuit components using a two-stage
framework where the CNN model is used as a backbone
architecture. In the first stage, the authors have performed
group-level classification (i.e., classifying a component as one
of the predefined groups) while in the second stage group-
specific classification (i.e., intra-group classification) using
the same CNN architecture is performed. It is to be noted
that the predefined set of groups for visually similar looking
circuit components (e.g., AND gate and NAND gate, or Ammeter and Voltmeter) is formed with the help of
the confusion matrix obtained by a single-stage classification
model and visual exploration. In this section, first the data
preparation technique is described and then the customized
CNN model designed here is presented. The entire process is summarized in Algorithm 1, while the associated steps are described in the following subsections.

A. Data preparation
To evaluate any recognition model, a suitable database is a
prerequisite. However, to the best of the authors' knowledge, no such dataset is available publicly for hand-drawn circuit component recognition. Therefore, an in-house dataset has been prepared containing circuit component images that are drawn by engineering students, research scholars, and faculty members. Components belonging to 20 different classes have been collected in a preformatted sheet, similar to the works [32, 33]. Also, there is no constraint on the ink colour used for drawing the samples. A sample of the filled-in datasheet is shown in Fig. 1. The class index for each circuit component, as defined, is shown in Table I. In this context, it is to be mentioned that the components are drawn by the contributors in a completely unconstrained fashion, thus bearing several drawing styles for the components, which has, in turn, helped the authors to establish the robustness of the proposed model. For each circuit component, 150 sample images are collected, which have been resized to 64 * 64 pixels.

Fig. 1. A sample filled-in datasheet containing hand-drawn electrical and electronic circuit component images
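For illustration, a minimal preprocessing sketch is given below. The paper only states that each sample image is binarized and resized to 64 * 64 pixels; the use of OpenCV and Otsu thresholding, as well as the function and path names, are assumptions made here for the example, not details reported by the authors.

```python
import cv2
import numpy as np

def preprocess_component(image_path: str, size: int = 64) -> np.ndarray:
    """Read a scanned component image, binarize it and resize it to size x size.

    Otsu thresholding is an assumed choice; the paper only says the input is binarized.
    """
    gray = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    if gray is None:
        raise FileNotFoundError(image_path)
    # Otsu's method picks the threshold automatically; THRESH_BINARY_INV keeps the
    # (dark) pen strokes as foreground on a zero-valued background.
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    binary = cv2.resize(binary, (size, size), interpolation=cv2.INTER_AREA)
    return (binary > 0).astype(np.float32)   # shape (64, 64), values in {0, 1}
```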
features from the input image. While extracting the features,
Algorithm 1: Classification of the circuit component images
Input: Images of the circuit components
Output: Predicted class for the input images

B. Customised CNN

It is already mentioned that the authors have classified 20 different hand-drawn electrical and electronic circuit components using a simple CNN-based model, which is hereafter called the customized CNN. In this section, the overall architecture of this customized CNN model is described. In this network, first, five convolution layers, each followed by a pooling layer, are used, and then the final feature map is flattened into a linear form, hereafter called the flatten vector. Finally, two fully connected (FC) layers are used, of which the last layer maps to the output classes. The CNN model is described in brief in the following subsections.

1) Convolution Layer: Convolutional layers are the fundamental building blocks of any CNN model [34, 35]. The parameters of each layer consist of a set of learnable filters. During the forward pass, each filter is convolved with the input image using a kernel. The output volume of a convolutional layer is obtained by stacking the feature maps generated by all filters along the depth dimension. This helps in extracting features from the input image. While extracting the features, it preserves the spatial relationship between the pixels by extracting image features using small squares of the input [36].

In this work, the original binarized input image of dimension M * N * 1, where M and N are the original image height and width, respectively, is fed to the initial convolutional layer having n1 filters. As a result, n1 feature maps are generated at the end of the first convolutional layer. Later, each convolutional layer is fed feature maps of dimension Mi * Ni * nj (i = 2, 4, 6, 8; j = 1, 2, 3, 4, as shown in Fig. 2), where Mi, Ni and nj denote the height and width of each feature map, and the number of feature maps, respectively. The values of Mi and Ni are suitably chosen depending upon the original image height (M) and width (N). A filter of dimension mj * mj * nj is used, where mj and nj (j = 1, 2, 3, 4, 5) represent the side length of a square-shaped filter and the number of filters in each convolutional layer. In the performed experiments, the authors set mj = 3 for all j while varying the value of nj.
TABLE I
VARIOUS CLASSES OF ANALOG AND DIGITAL CIRCUIT COMPONENTS THAT ARE COLLECTED UNDER THE SCOPE OF PRESENT WORK

Class Component name Class Component name Class Component name Class Component name
1 AC source 2 Ammeter 3 AND Gate 4 Capacitor
5 DC source 6 Ground 7 Inductor 8 NAND Gate
9 NOR Gate 10 NOT Gate 11 NPN Transistor 12 OR Gate
13 PN Junction Diode 14 PNP Transistor 15 Power Supply 16 Resistor
17 Switch 18 Transformer 19 Voltmeter 20 Zener diode

They also set M = N = 64 for the simplicity of the model. For the convenience of the common readers, the detailed architecture of the convolutional layers is shown in Table II.

2) Pooling Layer: The pooling layer is used for reducing the spatial dimension (i.e., height and width) of a feature map and, in doing so, helps in reducing the computational power required to process the data through dimensionality reduction. Besides, it is useful for generating rotation- and position-invariant dominant features from input images, thus lessening the chance of overfitting during training. The commonly used variants of the pooling strategy are max-pooling and avg-pooling [23]. In the proposed work, the avg-pooling technique is used with kernel dimension kj * kj (where j = 1, 2, ..., 5). The value of kj may be varied but, in the performed experiments, the commonly used value, i.e., kj = 2 for all j, is chosen.

3) Fully connected layer: At the end of the convolution layers, all the feature maps are flattened linearly as vectors and then fed to FC layers, which are similar to traditional multilayer perceptron models with an activation function [37] like SoftMax, sigmoid, ReLU or tanh in the output layer. In the FC layers, the MSE loss is minimized by the Adam optimizer with a gradient decay of 1e-4 and a learning rate of 0.0005. The output layer is another vector whose length equals the number of classes to be recognized by the underlying CNN model. During training, the batch size is set to 10, whereas the number of epochs is 150. Information on the FC layers of the proposed customized CNN model is provided in Table II.
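A minimal training-loop sketch following the settings reported here (MSE loss minimized with Adam at a learning rate of 0.0005, batch size 10, 150 epochs) is given below. The `model` and `train_dataset` objects are assumed to be defined elsewhere, and the one-hot encoding of the labels is an assumption made so that MSE loss can be applied to the SoftMax output; it is not a detail stated in the paper.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader

def train(model, train_dataset, num_classes, epochs=150, batch_size=10, lr=5e-4):
    """Sketch of the reported training setup: Adam (lr = 0.0005), MSE loss,
    batch size 10, 150 epochs. One-hot targets are an assumed implementation detail."""
    loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    criterion = nn.MSELoss()
    model.train()
    for epoch in range(epochs):
        running_loss = 0.0
        for images, labels in loader:          # images: (B, 1, 64, 64), labels: (B,)
            targets = torch.nn.functional.one_hot(labels, num_classes).float()
            optimizer.zero_grad()
            outputs = model(images)            # SoftMax probabilities, shape (B, C)
            loss = criterion(outputs, targets)
            loss.backward()
            optimizer.step()
            running_loss += loss.item()
        print(f"epoch {epoch + 1}: loss = {running_loss / len(loader):.4f}")
```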
4) Activation Function: In a neural network, an activation function decides whether a neuron should be activated or not by calculating a weighted sum and further adding a bias to it. A neural network without an activation function is essentially just a linear regression model. The activation function introduces non-linearity into the output of a neuron, which makes a neural network capable of learning and performing more complex tasks. Two such commonly used activation functions are described here.

SoftMax: The SoftMax function is often used in the final layer of a neural network-based classifier. SoftMax takes as input a vector of K real numbers, say z = (z1, z2, ..., zK) in R^K, and normalizes it into a probability distribution consisting of K probabilities σ(z)_i proportional to the exponentials of the input numbers, as shown in Eq. (1). Some components in the last fully connected layer could be negative or greater than one and might not sum to 1. However, after applying SoftMax, each component will lie in the interval [0, 1] and the components will add up to 1, so that they can be interpreted as probabilities. The graph of this function is shown in Fig. 3a.

σ(z)_i = e^(z_i) / Σ_{j=1}^{K} e^(z_j),  for i = 1, ..., K and z = (z1, z2, ..., zK)   (1)

ReLU: ReLU is one of the most popular activation functions used in deep learning. If the input to the function is negative, the output is 0, while for positive inputs the output remains the same as the input. The mathematical form of the function is shown in Eq. (2) and its graph in Fig. 3b. The main advantage of using ReLU in deep learning is that all the neurons in a neural network are not activated at the same time; those receiving negative input are deactivated by the ReLU function.

f(x) = max(0, x)   (2)

Fig. 2. Architecture of the customized CNN model used here. The variables Mi and Ni (i = 1, 2, ..., 10) represent the height and width of the feature maps, respectively, whereas nj (j = 1, 2, ..., 5) represents the number of feature maps (i.e., the number of filters) at that stage. M and N represent the height and width of the input image at the beginning, respectively. The variables kj and mj (j = 1, 2, ..., 5) represent the side length of the square-shaped pooling mask (at the jth pooling) and kernel (at the jth convolutional layer), respectively.

The value of C (i.e., the number of output classes) is varied as per requirement.
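The two activation functions of Eqs. (1) and (2) can be checked with a few lines of NumPy, as sketched below; the max-shift inside the softmax is a standard numerical-stability trick, not something specified in the text.

```python
import numpy as np

def softmax(z: np.ndarray) -> np.ndarray:
    """Eq. (1): sigma(z)_i = exp(z_i) / sum_j exp(z_j).
    Subtracting max(z) avoids overflow and does not change the result."""
    e = np.exp(z - np.max(z))
    return e / e.sum()

def relu(x: np.ndarray) -> np.ndarray:
    """Eq. (2): f(x) = max(0, x), applied element-wise."""
    return np.maximum(0.0, x)

z = np.array([2.0, -1.0, 0.5])
p = softmax(z)
print(p, p.sum())                          # probabilities in [0, 1] that sum to 1
print(relu(np.array([-3.0, 0.0, 4.2])))    # -> [0.  0.  4.2]
```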
TABLE II
ARCHITECTURE OF THE CUSTOMIZED CNN MODEL THAT IS USED THROUGHOUT THIS WORK

Layer   Input dimension   No. of filters (i.e., nj)   Output dimension
Conv1   64 * 64 * 1       8                           64 * 64 * 8
Pool1   64 * 64 * 8       -                           32 * 32 * 8
Conv2   32 * 32 * 8       16                          32 * 32 * 16
Pool2   32 * 32 * 16      -                           16 * 16 * 16
Conv3   16 * 16 * 16      32                          16 * 16 * 32
Pool3   16 * 16 * 32      -                           8 * 8 * 32
Conv4   8 * 8 * 32        64                          8 * 8 * 64
Pool4   8 * 8 * 64        -                           4 * 4 * 64
Conv5   4 * 4 * 64        128                         4 * 4 * 128
Pool5   4 * 4 * 128       -                           2 * 2 * 128
FC1     512               -                           64
FC2     64                -                           C
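A PyTorch sketch consistent with Table II and Fig. 2 is given below (five 3 * 3 convolution blocks with 8, 16, 32, 64 and 128 filters, each followed by 2 * 2 average pooling, then FC layers 512 -> 64 -> C with a SoftMax output). The padding of 1 is inferred from the unchanged 64 * 64 spatial size before pooling in Table II, and the ReLU activations between layers are an assumption, not a detail reported by the authors.

```python
import torch
import torch.nn as nn

class CustomizedCNN(nn.Module):
    """Sketch of the customized CNN of Table II; intermediate ReLUs are assumed."""

    def __init__(self, num_classes: int):
        super().__init__()
        channels = [1, 8, 16, 32, 64, 128]
        blocks = []
        for in_c, out_c in zip(channels[:-1], channels[1:]):
            blocks += [
                nn.Conv2d(in_c, out_c, kernel_size=3, padding=1),  # m_j = 3
                nn.ReLU(inplace=True),
                nn.AvgPool2d(kernel_size=2),                       # k_j = 2
            ]
        self.features = nn.Sequential(*blocks)            # 64x64x1 -> 2x2x128
        self.classifier = nn.Sequential(
            nn.Flatten(),                                  # 2 * 2 * 128 = 512
            nn.Linear(512, 64),
            nn.ReLU(inplace=True),
            nn.Linear(64, num_classes),                    # C output classes
            nn.Softmax(dim=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))

# Example: the single-stage, 20-class model
model = CustomizedCNN(num_classes=20)
probs = model(torch.rand(1, 1, 64, 64))   # shape (1, 20); each row sums to 1
```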

C. Grouping of circuit components

The customized CNN model (see Table II) is used to classify the hand-drawn circuit components into 20 different classes using the 7:3 train-test split of the entire dataset. A recognition accuracy of 86.25% is achieved at this phase. It is to be noted that the batch size and the number of epochs are set to 10 and 150, respectively, during this experiment. Next, using the confusion matrix shown in Fig. 4, along with visual understanding, the circuit components are clustered into four different groups, as shown in Table 3. To form the groups, the authors rely on inter-class misclassification. Let CM(i, j), for i, j = 1, 2, ..., 20, represent the confusion matrix. Then the condition CM(i, j) / C(i) > δ is checked for all i ≠ j, where δ ∈ [0, 1] is a predefined threshold value and C(i) is the number of test samples in the ith class. If the condition is true, then the ith and jth classes are put into the same group, i.e., an initial screening of CM(i, j) is performed. In the authors' case, δ = 0.1 is chosen, and the corresponding cells (i.e., inter-class misclassifications) that satisfy the condition are marked in Fig. 4. Later, through visual screening, the groups are finalized as shown in Table 3.
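The screening rule above can be sketched in a few lines of Python, as shown below. The union-find merging of implicated classes is an implementation choice made here for the example; the paper additionally refines the groups by visual screening before finalizing them.

```python
import numpy as np

def initial_groups(cm: np.ndarray, delta: float = 0.1):
    """Screen a 20x20 confusion matrix CM: whenever CM[i, j] / C(i) > delta for
    i != j (C(i) = number of test samples of class i, i.e. the row sum), classes
    i and j are put into the same group. Visual screening is still needed afterwards."""
    n = cm.shape[0]
    parent = list(range(n))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    def union(a, b):
        parent[find(a)] = find(b)

    row_totals = cm.sum(axis=1)
    for i in range(n):
        for j in range(n):
            if i != j and row_totals[i] > 0 and cm[i, j] / row_totals[i] > delta:
                union(i, j)

    groups = {}
    for cls in range(n):
        groups.setdefault(find(cls), []).append(cls)
    return list(groups.values())   # lists of class indices (Table I numbering minus 1)
```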
V. RESULT AND DISCUSSION

In this work, hand-drawn analog and digital circuit component classification is performed. To start with, the circuit components are grouped into 4 groups (see Sect. IV-C) using information from the confusion matrix (see Fig. 4) obtained from the customized CNN and visual perception. Next, the two-stage CNN-based classification is applied where, first, an input circuit component is classified into one of the specified groups using group-level classification, and then the same is further classified as one of the components of the group to which it has been assigned. In this section, the experimental setup, performance metrics, classification performances of the two-stage mechanism, performances obtained using varied CNN architectures, and an error case analysis are described one after another.

A. Experiment setup

Google Colab [38] has been used to run the code with the runtime type set to "GPU." Google provides 12.72 GB RAM and 68.40 GB disc space, and Python 3 is used as the interpreter. Experiments are performed using 70% and 30% of the entire dataset as training and test samples, i.e., 105 and 45 samples are used per circuit component as train and test samples, respectively. During the CNN models' training, a set of data augmentation techniques is applied to the train samples, namely rotation, dilation, erosion, and skeletonization. The batch size and the number of epochs are set to 10 and 200, respectively, for all the following experiments.

Fig. 3. Graphical representation of (a) SoftMax and (b) ReLU functions

Fig. 4. Confusion matrix of the circuit component classification using the customized CNN. Intra-group misclassifications (Group 1: yellow colour, Group 2: green colour, and Group 3: light blue colour) are marked therein (colour figure online)

B. Performance metrics

To evaluate the model performances on the present dataset, five popularly used performance metrics have been used: recall, precision, F1-score, accuracy and specificity [32, 39]. These metrics are defined using the following equations.

Recall = TP / (TP + FN)   (3)
Precision = TP / (TP + FP)   (4)
F1-score = (2 * Recall * Precision) / (Recall + Precision)   (5)
Accuracy = (TP + TN) / (TP + FP + TN + FN)   (6)
Specificity = TN / (TN + FP)   (7)

In these equations, TP, TN, FP and FN represent true positive, true negative, false positive and false negative, respectively. TP represents the number of circuit components of some class (say, C) classified as components of class C, while TN is the number of circuit components not belonging to class C that have been classified as components of a class other than C. Similarly, FN and FP indicate the count of circuit components of class C classified as components of a class other than C, and the number of circuit components misclassified as components of class C, respectively.
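The five metrics of Eqs. (3)-(7) can be computed per class directly from a confusion matrix using one-vs-rest counting, as sketched below. The example values come from the Group 3 confusion matrix reported later in Table V.

```python
import numpy as np

def per_class_metrics(cm: np.ndarray):
    """Recall, precision, F1-score, accuracy and specificity (Eqs. 3-7) for every
    class, from a confusion matrix cm where cm[i, j] = samples of class i predicted
    as class j (one-vs-rest counting of TP, TN, FP, FN)."""
    total = cm.sum()
    tp = np.diag(cm).astype(float)
    fn = cm.sum(axis=1) - tp          # class i samples predicted as something else
    fp = cm.sum(axis=0) - tp          # other classes predicted as class i
    tn = total - tp - fn - fp

    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    f1 = 2 * recall * precision / (recall + precision)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    specificity = tn / (tn + fp)
    return recall, precision, f1, accuracy, specificity

# Example with the Group 3 confusion matrix (Table V): AC source, Ammeter, Voltmeter
cm_group3 = np.array([[44, 0, 1],
                      [2, 43, 0],
                      [3, 4, 37]])
print(per_class_metrics(cm_group3))
```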
C. Performance on group-level classification

It is already mentioned that the components are grouped into 4 major groups using an initial-level classification and visual perception (refer to Sect. IV-C). A classification accuracy of 99.77% in the group-level classification is obtained using the CNN model (refer to Table II), and the corresponding confusion matrix is shown in Table III. During this classification, only a single component of Group 1 has been misclassified as a Group 4 component, and eventually it will be misclassified when it passes through the group-specific classifier of Model 4, as in that model there is no component class to which this sample belongs.

TABLE III
CONFUSION MATRIX OF GROUP-LEVEL CLASSIFICATION OF THE CIRCUIT COMPONENTS

           Group 1  Group 2  Group 3  Group 4
Group 1      179       0        1        0
Group 2        0      90        0        0
Group 3        1       0      134        0
Group 4        0       0        0      495

Fig. 5. Schematic representation of the two-stage CNN-based model used for hand-drawn electrical and electronic circuit component recognition. In this figure, CNN Models 1–4 represent the models used for the corresponding group-specific classification, while CNN Model 0 represents the model used for group-level classification. The Model Selector decides the CNN model (i.e., one of the 4 CNN Models) to which an input circuit component will be fed after it is assigned to a specific group.
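A minimal sketch of the inference pipeline of Fig. 5 is given below. The trained model objects and the lists mapping each group's local outputs back to the global class indices of Table I are assumed to be available; the function and argument names are illustrative only.

```python
import torch

def two_stage_predict(image, group_model, group_specific_models, group_class_ids):
    """Two-stage inference following Fig. 5.

    image                 : tensor of shape (1, 1, 64, 64)
    group_model           : CNN Model 0, trained for group-level classification
    group_specific_models : [Model 1, ..., Model 4], one per group
    group_class_ids       : per-group lists mapping local outputs to Table I indices
                            (assumed to be known from the grouping step)
    """
    with torch.no_grad():
        group_probs = group_model(image)              # stage 1: which group?
        group_id = int(group_probs.argmax(dim=1))
        specific = group_specific_models[group_id]    # the "Model Selector" step
        class_probs = specific(image)                 # stage 2: class within the group
        local_id = int(class_probs.argmax(dim=1))
    return group_class_ids[group_id][local_id]        # global class index
```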
D. Performances of group-specific CNN models

After obtaining the group-level information of a test sample, it passes through the group-specific CNN model to get its actual class label. In this section, the performance of each group-specific classifier is provided. In this context, it is to be mentioned that, while citing the performance of the group-specific CNN models (i.e., Model 1 to Model 4 in Fig. 5), any misclassification of the prior stage is not considered, i.e., all the accuracies shown here are obtained on the actual test set only.
1) Group 1: 4 circuit components belong to this specific group and 96.11% recognition accuracy is obtained here. The confusion matrix is given in Table IV.
2) Group 2: This group consists of 2 circuit components; 100% recognition accuracy is obtained in this case.
3) Group 3: It consists of 3 different circuit components and, in this case, 91.85% recognition accuracy is obtained. The confusion matrix is given in Table V.
4) Group 4: It contains 11 circuit components, and a recognition accuracy of 98.78% is obtained when they are recognized using the customized CNN model. The confusion matrix is provided in Table VI.

E. Performance of the two-stage classification model

In the first stage, an input circuit component is predicted into one of the specified groups, and then it is further classified into one of the target classes within the group. In the two-stage model, an overall recognition accuracy of 97.33% is obtained. The confusion matrix representing this result is shown in Fig. 6. If the confusion matrices shown in Figs. 4 and 6 (the highlighted cells) are compared, it can be seen that a significant improvement for intra-group classification is obtained by using the two-stage CNN model. The comparative recognition accuracies for all the circuit components using single- and two-stage classification scenarios are also provided in Fig. 7. Additionally, a comparison of different performance measures like recall, precision and F1-score of the single-stage and the present two-stage CNN model is provided in Fig. 8. The results are recorded in a component-wise manner. From these results, it can safely be commented that better recognition accuracy is obtained using the two-stage model than using a single-stage classification protocol for almost every circuit component considered here.

F. Performance on the augmented dataset

The present dataset contains only 150 samples per circuit component, and these samples are collected in a constrained condition. Therefore, to test the effectiveness of the proposed method in unconstrained conditions and with increased data samples, first more samples are created by applying some offline augmentation techniques on the actual data, and then
the proposed method is applied on the increased dataset. The augmented data are prepared by first rotating each actual circuit component image by angles of -5, -10, +5 and +10 degrees, respectively, and subsequently adding some amount of Gaussian noise to the actual images as well as to the circuit component images rotated by -5 and +5 degrees, respectively. Some examples of augmented samples along with their original versions are shown in Table 8. The increased dataset thus prepared contains 1050 samples per class (150 original, 150 * 4 = 600 rotated circuit images and 150 * 2 = 300 Gaussian-noise-added samples). The same experimental setup as mentioned before is followed on the augmented dataset. The recognition accuracy obtained is 99.34%. The confusion matrix is shown in Fig. 9, while comparative results for all the circuit components using single- and two-stage classification scenarios are shown in Fig. 10. The results in terms of recall, precision and F1-score for each circuit component while testing on the augmented dataset are shown in Fig. 11. The rise in recognition accuracy is due to the increased number of training samples per circuit component generated through data augmentation. These results clearly show that an increase in data samples helps the system to learn the input patterns more accurately.
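The offline augmentation described above can be sketched as follows: four rotated copies per original plus Gaussian noise added to the -5 and +5 degree rotations, giving the 150 + 600 + 300 = 1050 samples per class reported in the text. The noise standard deviation and the use of scipy for rotation are assumptions made for this example.

```python
import numpy as np
from scipy.ndimage import rotate

def augment_component(img: np.ndarray, noise_sigma: float = 0.05) -> list:
    """Offline augmentation sketch: rotations by -5, -10, +5, +10 degrees, plus
    Gaussian noise on the -5 and +5 degree rotations (6 extra samples per original).
    noise_sigma is an assumed value; the paper does not report the noise level."""
    augmented = []
    noisy_bases = []
    for angle in (-5, -10, +5, +10):
        rot = rotate(img, angle, reshape=False, mode="constant", cval=0.0)
        rot = np.clip(rot, 0.0, 1.0)
        augmented.append(rot)
        if angle in (-5, +5):
            noisy_bases.append(rot)
    for base in noisy_bases:
        noisy = base + np.random.normal(0.0, noise_sigma, size=base.shape)
        augmented.append(np.clip(noisy, 0.0, 1.0))
    return augmented   # 6 new 64 x 64 images for each original sample
```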

TABLE IV
CONFUSION MATRIX WHEN CIRCUIT COMPONENTS OF GROUP 1 ARE CLASSIFIED

            AND Gate  NAND Gate  NOR Gate  OR Gate
AND Gate       42         2         0        1
NAND Gate       0        45         0        0
NOR Gate        0         3        42        0
OR Gate         0         0         1       44

TABLE V
CONFUSION MATRIX OF THE CIRCUIT COMPONENTS BELONGING TO GROUP 3

            AC source  Ammeter  Voltmeter
AC source       44        0         1
Ammeter          2       43         0
Voltmeter        3        4        37

Fig. 6. Overall confusion matrix obtained by applying the present two-stage CNN-based model while classifying the 20 different hand-drawn circuit components. Intra-group misclassifications of the first three groups (Group 1: yellow colour, Group 2: green colour and Group 3: light blue colour) are marked therein (colour figure online)

Fig. 7. Comparative recognition accuracy for each circuit component using single-stage and two-stage classification models on the present dataset

G. Tuning of parameters

As no prior work in this particular domain uses a deep learning model, a comparative analysis with other deep learning models applied to the same problem cannot be performed directly. However, in this section, more experiments are performed by fitting different CNN architectures into the proposed work. In this context, it is to be mentioned that, considering the CNN model described in Table II as the base model, and by removing/adding convolutional layers from/to this model, many more models can be obtained. However, for simplicity, two such variations are tried, which are described in the following subsections. A parameterized sketch for generating such variants is given after these subsections.

1) Performance by changing the pooling type: In the customized CNN model, the avg-pool strategy is used. Therefore, in this experiment, avg-pool is substituted by the max-pool strategy [35]. The comparative results are shown in Fig. 12. From this figure, it is found that the network with the current pooling strategy (i.e., avg-pool) yields the highest accuracy in all of the cases for this dataset.

2) Performance by changing the number of convolutional layers: In this case, some models are considered that differ in the number of convolution layers. Conv5 of the base model (refer to Fig. 2) is removed, and one more convolutional layer after Conv5 is added, separately, to obtain two new CNN models having four and six convolutional layers, respectively. In all these models, the avg-pool strategy is used. The comparative results are shown in Fig. 13. From these results, it has been found that the network with 5 convolution layers (i.e., the current model) yields the highest accuracy in all of the cases for this dataset.
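As referenced above, the variants compared in these experiments (4, 5 or 6 convolution blocks, with average or max pooling) can be generated from a single parameterized constructor, as sketched below. The filter progression 8-16-32-64-128 follows Table II, but the 256 filters used for the optional sixth block are an assumption, as is the use of ReLU between layers.

```python
import torch.nn as nn

def make_variant(num_conv_layers: int = 5, pool: str = "avg", num_classes: int = 20):
    """Build a 'Conv X Y-pool' variant as used in the tuning experiments.
    Each pooling step halves the 64x64 input, so the flattened size depends on
    the number of blocks (e.g. 1024 for 4 blocks, 512 for 5, 256 for 6)."""
    filters = [8, 16, 32, 64, 128, 256][:num_conv_layers]
    Pool = nn.AvgPool2d if pool == "avg" else nn.MaxPool2d
    layers, in_c, size = [], 1, 64
    for out_c in filters:
        layers += [nn.Conv2d(in_c, out_c, 3, padding=1), nn.ReLU(inplace=True), Pool(2)]
        in_c, size = out_c, size // 2
    flat = in_c * size * size
    return nn.Sequential(*layers, nn.Flatten(),
                         nn.Linear(flat, 64), nn.ReLU(inplace=True),
                         nn.Linear(64, num_classes), nn.Softmax(dim=1))

# e.g. the six models compared in the epoch-variation experiments (Figs. 14-18)
variants = {f"Conv{n} {p}-pool": make_variant(n, p)
            for n in (4, 5, 6) for p in ("avg", "max")}
```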
TABLE VI
CONFUSION MATRIX OF THE CIRCUIT COMPONENTS BELONGING TO GROUP 4
(Rows and columns follow the order: Capacitor, DC source, Ground, Inductor, NOT Gate, PN junction diode, Power supply, Resistor, Switch, Transformer, Zener diode)

Capacitor           45  0  0  0  0  0  0  0  0  0  0
DC source            0 45  0  0  0  0  0  0  0  0  0
Ground               0  0 45  0  0  0  0  0  0  0  0
Inductor             0  0  0 45  0  0  0  0  0  0  0
NOT Gate             0  0  0  1 43  1  0  0  0  0  0
PN junction diode    0  0  0  0  0 44  0  0  1  0  0
Power supply         0  0  0  0  0  0 45  0  0  0  0
Resistor             0  0  0  0  0  0  0 45  0  0  0
Switch               0  0  0  0  2  1  0  0 45  0  0
Transformer          0  0  0  0  0  0  0  0  0 45  0
Zener diode          0  0  0  0  0  2  0  0  1  0 42

Fig. 8. Comparative recall, precision, specificity and F1-scores for each circuit component using single-stage and two-stage classification models

Fig. 9. Overall confusion matrix obtained by applying the present two-stage CNN-based classification model on the augmented dataset. Intra-group misclassifications of the first three groups (Group 1: yellow colour, Group 2: green colour and Group 3: light blue colour) are marked therein (colour figure online)

3) Performances with varying number of epochs: In the third experiment, the number of epochs is varied to check the characteristics of the models that can be constructed using the model descriptions of Sects. V-G1 and V-G2. In total, 6 different models have been constructed. The comparative test accuracies at different iterations of all these models are shown in Figs. 14, 15, 16, 17 and 18. In these figures, a model "Conv X Y-pool" indicates that it uses X convolutional layers and the Y (avg/max) type of pooling strategy. From Figs. 14, 15, 16, 17 and 18, it is observed that in the customized CNN model (i.e., 5 convolution layers with average pooling), the accuracy increases rapidly up to a certain number of epochs and then stabilizes to a constant value. Increasing the number of epochs further does not improve the results. So, from these results it can be commented that the customized CNN used here is more consistent than its alternatives. Besides, these results show that the network using 5 convolutional layers and an average pooling strategy provides the best result in all the cases.

H. Comparison with state-of-the-art methods

To compare the performance of the proposed model with state-of-the-art methods, more experiments are performed. For this reason, 12 different methods (nine existing methods, i.e., Dewangan and Dhole [5], Rabbani et al. [19], Naika et al. [18], Liu and Xiao [16], Angadi and Naika [7], Moetesum et al. [9], Veena and LakshmanNaik [13], Patare and Joshi [14], and Roy et al. [20], and three standard deep learning models, namely AlexNet [40], ResNet [41] and MobileNet [42]) have been implemented. These methods have been trained and tested on the same data (the train-test split is described in Sect. V-A) in all the cases. The comparative accuracy results are shown in Table VII. This table also includes the performance of the present customized CNN when it is applied following a single-stage approach (i.e., no group-level classification is performed), termed "single-stage CNN." Comparisons are also made in terms of average recall, precision, F1-score, accuracy and specificity. The same experiments have been repeated on the augmented dataset as well, and the comparative results are shown in Table VIII. Here also,
all the methods have been evaluated on the dataset division described in Sect. V-F. From these results, it can safely be stated that the proposed method (i.e., the two-stage classification approach) outperforms the state-of-the-art methods considered here for comparison.

Fig. 10. Comparative recognition accuracy for each circuit component using single-stage and two-stage classification models on the present augmented dataset
Fig. 11. Recall, precision, specificity and F1-score for each circuit component using the two-stage classification model for the augmented dataset
Fig. 12. Comparative results by varying the pooling type
Fig. 13. Comparative results by varying the number of convolutional layers
Fig. 14. Variation of accuracy for group-level classification
Fig. 15. Variation of accuracy for group-specific classification for Group 1
Fig. 16. Variation of accuracy for group-specific classification for Group 2
Fig. 17. Variation of accuracy for group-specific classification for Group 3
Fig. 18. Variation of accuracy for group-specific classification for Group 4

I. Testing time analysis

In Fig. 19, the variation of the time taken by a model to test a sample is shown. It is clearly seen that the testing time increases with the number of convolution layers. Also, between two models with the same number of convolution layers, the model with max-pooling takes more time to test a sample.

1) Error case analysis: Though the recognition accuracy of the proposed model is quite impressive, some misclassifications produced by this model (see Fig. 6) have been observed. Some of the notable cases of misclassification are described below.
• Most intra-group misclassification (8.14%) occurs among the circuit components belonging to Group 3. The reason behind this is their strong structural similarity. The shape inside the circle of an Ammeter is concave downward and that of a Voltmeter is concave upward, whereas the shape of an AC source is a combination of the two, as shown in Fig. 20. One such misclassified sample is shown in Table 11 (sample number 3).
• A considerable amount of misclassification (3.88%) has been observed among the circuit components of Group 1. Upon analysis of the confusion matrix for this group (refer to Table IV), it has been found that 2 AND Gate samples (out of 3 misclassified samples) are classified as NAND Gate. These two digital gates differ from each other only by a small shape dissimilarity, shown in Fig. 21. Sample 2 of Table 11 shows one such misclassified circuit component.
• Some errors have also been found when the samples are erroneously drawn (see sample 1 and sample 4 of Table 11). Sample 1 of Table 11 is misclassified during group-level classification, while sample 4 of this table is misclassified during group-specific classification.


VI. CONCLUSION

In this paper, a two-stage CNN-based model that recognizes various hand-drawn analog and digital circuit components from their optically scanned versions has been proposed. In the first stage, all the similar-looking (i.e., similar shape and structure) circuit components are grouped into a single group, and in the later stage, the circuit components belonging to the same group are further classified into their actual classes using group-specific CNN models. When evaluated on a self-made hand-drawn circuit component database, the proposed model provides a reasonably good recognition accuracy (97.33%) compared to the single-stage classification (86%), given the complexity of the hand-drawn circuit components. Also, on the augmented dataset, the above model obtains 99.34% accuracy, compared to 96.20% with single-stage classification. Although the present technique provides a high degree of recognition accuracy in most cases, a noticeable shortcoming of the proposed model is its inability to recognize very similar components (like ammeter and voltmeter) with higher accuracy, which is believed to be reducible by introducing novel approaches that capture the local features more accurately.

Thus, in the future, the proposed approach can be improved and used for industrial as well as academic purposes. It should be mentioned that the present system does not provide a 100% correct result, so it may not be applied in practical cases right now. However, more work on this topic can help to come up with a foolproof system in the future by using state-of-the-art deep learning-based object detection and recognition models.

Fig. 20. Structural similarity of AC source with voltmeter and ammeter
TABLE VII
COMPARISON OF THE PROPOSED METHOD WITH SOME STATE-OF-THE-ART METHODS AND THE SINGLE-STAGE CNN MODEL ON THE ORIGINAL DATASET

Method Average classification performances (in %)


Accuracy Recall Precision F1-score Specificity
Dewangan and Dhole [5] 81.92 84.68 85.29 84.76 99.21
Angadi and Naika [7] 84.64 84.64 84.98 84.62 99.27
Moetesum et al. [9] 84.90 85.01 85.29 85.15 99.23
Veena and LakshmanNaik [13] 84.72 84.83 85.13 84.91 99.24
Patare and Joshi [14] 85.43 85.51 85.74 85.56 99.22
Liu and Xiao [16] 80.21 84.53 84.70 84.50 99.29
Naika et al. [18] 83.41 85.54 85.80 85.50 99.30
Rabbani et al. [19] 77.40 83.90 84.13 84.02 99.26
Roy et al. [20] 93.63 94.78 94.84 94.77 99.71
AlexNet [40] 81.24 81.21 86.45 82.13 99.01
ResNet [41] 91.85 91.85 92.46 91.65 99.68
MobileNet [42] 73.76 73.76 83.11 73.22 98.68
Single-stage CNN 86.00 86.01 87.38 86.54 99.21
Proposed Work 97.33 97.33 97.43 97.32 99.96
Bold face scores indicate the best scores.

TABLE VIII
COMPARISON OF THE PROPOSED METHOD WITH SOME STATE-OF-THE-ART METHODS AND THE SINGLE-STAGE CNN MODEL ON THE AUGMENTED DATASET

Method Average classification performances (in %)


Accuracy Recall Precision F1-score Specificity
Dewangan and Dhole [5] 85.53 84.59 87.78 86.60 99.23
Angadi and Naika [7] 85.91 85.90 86.23 86.11 99.31
Moetesum et al. [9] 85.80 85.80 86.12 85.82 99.38
Veena and LakshmanNaik [13] 85.48 85.52 85.79 85.56 99.27
Patare and Joshi [14] 86.10 86.10 86.29 86.17 99.34
Liu and Xiao [16] 84.83 84.82 85.10 84.80 99.22
Naika et al. [18] 85.43 85.80 86.06 85.82 99.39
Rabbani et al. [19] 84.55 84.51 84.74 84.62 99.30
Roy et al. [20] 96.86 96.22 96.25 96.23 99.48
AlexNet [40] 94.77 94.73 95.26 94.91 99.79
ResNet [41] 98.61 98.64 98.67 98.65 99.81
MobileNet [42] 91.32 91.35 92.78 91.69 99.51
Single-stage CNN 96.20 96.16 96.35 96.71 99.37
Proposed Work 99.34 99.35 99.35 99.68 99.97
Bold face scores indicate the best scores.

Fig. 21. The region of dissimilarity between (a) and (b) is shown within the red bounding box

ACKNOWLEDGEMENT
We would like to thank the CMATER research laboratory of the Computer Science and Engineering Department, Jadavpur University, India, for providing us the infrastructural support.

VII. DECLARATIONS
Conflicts of interest: The authors declare that they have no conflict of interest.

REFERENCES
[1] Majumder S, Ghosh S, Malakar S et al (2021) A voting-based technique for word spotting in handwritten document images. Multimed Tools Appl. https://doi.org/10.1007/s11042-020-10363-0
[2] Malakar S, Ghosh M, Sarkar R, Nasipuri M (2019) Development of a two-stage segmentation-based word searching method for handwritten document images. J Intell Syst 29:719–735
[3] De Jesus EO, Lotufo RDA (1998) ECIR-an electronic circuit diagram image recognizer. In: Proceedings SIBGRAPI'98, International Symposium on Computer Graphics, Image Processing, and Vision (Cat. No. 98EX237). IEEE, pp 254–260
[4] Bhattacharya A, Roy S, Sarkar N, et al (2020) Circuit component detection in offline hand-drawn electrical/electronic circuit diagram. In: IEEE Calcutta Conference (CALCON 2020). IEEE, Kolkata
[5] Dewangan A, Dhole A (2018) KNN based hand drawn electrical circuit recognition. Int J Res Appl Sci Eng Technol 6:1–6
[6] Feng G, Viard-Gaudin C, Sun Z (2009) On-line hand-drawn electric circuit diagram recognition using 2D dynamic programming. Pattern Recognit 42:3215–3223
[7] Angadi M, Naika RL (2014) Handwritten circuit schematic detection and simulation using computer vision approach. Int J Comput Sci Mob Comput 3:754–761
[8] Wang L (2005) Support vector machines: theory and applications. Springer
[9] Moetesum M, Younus SW, Warsi MA, Siddiqi I (2017) Segmentation and recognition of electronic components in hand-drawn circuit diagrams. EAI Endorsed Trans Scalable Inf Syst 5:1–6
[10] Hasan MA (2018) Introduction to digital image processing. Prentice-Hall, Upper Saddle River
[11] Dalal N, Triggs B (2005) Histograms of oriented gradients for human detection. In: Computer Vision and Pattern Recognition (CVPR 2005), IEEE Computer Society Conference on. IEEE, pp 886–893
[12] Platt J (1998) Sequential minimal optimization: a fast algorithm for training support vector machines
[13] Veena E, LakshmanNaik R (2014) Hand written circuit schematic recognition. Sch J Eng Technol 2:681–684
[14] Patare MD, Joshi MS (2016) Hand-drawn digital logic circuit component recognition using SVM. Int J Comput Appl 143:24–28
[15] Edwards B, Chandran V (2000) Machine recognition of hand-drawn circuit diagrams. In: 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing, Proceedings (Cat. No. 00CH37100). IEEE, pp 3618–3621
[16] Liu Y, Xiao Y (2013) Circuit sketch recognition. Dept of Electrical Engineering, Stanford University, Stanford, CA
[17] Dreijer JF (2006) Interactive recognition of hand-drawn circuit diagrams. University of Stellenbosch, Stellenbosch
[18] LakshmanNaika R, Dinesh R, Prabhanjan S (2019) Handwritten electric circuit diagram recognition: an approach based on finite state machine. Int J Mach Learn Comput 9:374–380
[19] Rabbani M, Khoshkangini R, Nagendraswamy HS, Conti M (2016) Hand drawn optical circuit recognition. Procedia Comput Sci 84:41–48
[20] Roy S, Bhattacharya A, Sarkar N et al (2020) Offline hand-drawn circuit component recognition using texture and shape-based features. Multimed Tools Appl 79:1–21
[21] Sahoo S, Nandi SK, Barua S et al (2018) Handwritten Bangla word recognition using negative refraction based shape transformation. J Intell Fuzzy Syst. https://doi.org/10.3233/JIFS-169712
[22] Barua S, Malakar S, Bhowmik S et al (2017) Bangla handwritten city name recognition using gradient-based feature. In: 5th International Conference on Frontiers in Intelligent Computing: Theory and Applications. Springer, Singapore, pp 343–352
[23] Deufemia V, Risi M, Tortora G (2014) Sketched symbol recognition using latent-dynamic conditional random fields and distance-based clustering. Pattern Recognit 4
[24] Avola D, Bernardi M, Cinque L, et al (2017) A machine learning approach for the online separation of handwriting from freehand drawing. In: International Conference on Image Analysis and Processing. Springer, pp 223–232
[25] Avola D, Bernardi M, Cinque L et al (2020) Online separation of handwriting from freehand drawing using extreme learning machines. Multimed Tools Appl 79:4463–4481
[26] Tivive FHC, Bouzerdoum A (2006) Texture classification using convolutional neural networks. In: TENCON 2006–2006 IEEE Region 10 Conference. IEEE, pp 1–4
[27] Gatys L, Ecker AS, Bethge M (2015) Texture synthesis using convolutional neural networks. In: Advances in neural information processing systems, pp 262–270
[28] Lin T-Y, Maji S (2016) Visualizing and understanding deep texture representations. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 2791–2799
[29] Andrearczyk V, Whelan PF (2016) Using filter banks in convolutional neural networks for texture classification. Pattern Recognit Lett 84:63–69
[30] Geirhos R, Rubisch P, Michaelis C, et al (2019) ImageNet-trained CNNs are biased towards texture: increasing shape bias improves accuracy and robustness. In: International Conference on Learning Representations
[31] Liu L, Chen J, Fieguth P et al (2019) From BoW to CNN: two decades of texture representation for texture classification. Int J Comput Vis 127:74–109
[32] Malakar S, Sharma P, Singh PK et al (2017) A holistic approach for handwritten Hindi word recognition. Int J Comput Vis Image Process 7:59–78. https://doi.org/10.4018/IJCVIP.2017010104
[33] Bhowmik S, Malakar S, Sarkar R et al (2019) Off-line Bangla handwritten word recognition: a holistic approach. Neural Comput Appl 31:5783–5798
[34] Kundu S, Paul S, Singh PK et al (2020) Understanding NFC-Net: a deep learning approach to word-level handwritten Indic script recognition. Neural Comput Appl 32:1–17
[35] Malakar S, Paul S, Kundu S et al (2020) Handwritten word recognition using lottery ticket hypothesis based pruned CNN model: a new benchmark on CMATERdb2.1.2. Neural Comput Appl. https://doi.org/10.1007/s00521-020-04
[36] Use of convolutional neural network for image classification. https://www.apsl.net/blog/2017/11/20/use-convolutional-neural-network-image-classification/. Accessed 6 Jun 2020
[37] Liu W, Wang Z, Liu X et al (2017) A survey of deep neural network architectures and their applications. Neurocomputing 234:11–26
[38] Google Colab. https://colab.research.google.com/. Accessed 6 Jun 2020
[39] Malakar S, Sarkar R, Basu S et al (2020) An image database of handwritten Bangla words with automatic benchmarking facilities for character segmentation algorithms. Neural Comput Appl 33:449–468
[40] Krizhevsky A, Sutskever I, Hinton GE (2012) ImageNet classification with deep convolutional neural networks. In: Advances in neural information processing systems, pp 1097–1105
[41] He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 770–778
[42] Howard AG, Zhu M, Chen B, et al (2017) MobileNets: efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861

Publisher's Note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.