
DAYANANDA SAGAR COLLEGE OF ENGINEERING

(An Autonomous Institute affiliated to VTU, Belagavi - 590018)


Accredited by NBA & National Assessment & Accreditation Council (NAAC) with ‘A’ grade

Internship Report on

Data Analytics

Submitted in partial fulfillment for the award of degree of

Bachelor of Engineering
in
Electrical and Electronics Engineering
Submitted by
VARNIKA A P
1DS17EE108
Under the Guidance of
Prof. SREEVIDYA T R
Asst. Professor
Dept. of E&E Engg.
DSCE, Bengaluru

VISVESVARAYA TECHNOLOGICAL UNIVERSITY


JNANASANGAMA, BELAGAVI-590018
2020-2021
External Organization Certificate

ABSTRACT

The increasing demand for automobiles has led to an increase in their manufacturing. CEAT is a manufacturer of tyres, which are a basic necessity in building an automobile. Accuracy in the manufacturing process is essential. The text printed on the tyres during production must be correct, and the examination of these prints, which used to be done manually, needs to change in order to improve its accuracy. This is done using data analytics and deep learning.
In this internship, data analytics was used to improve the accuracy of the print examination that is carried out before the mass production phase.
I worked on a text extraction model that can detect and extract text information from various tyre images:
o Created the training dataset by manually annotating more than 4000 tyre images using the Windows label tool software.
o Identified regions of interest in an image and extracted the printed text.

ACKNOWLEDGEMENT

The satisfaction and euphoria that accompany the successful completion of the seminar would be incomplete without mention of the people who made it possible with their guidance and blessings, and whose constant encouragement crowned my effort with success.

I consider it my privilege to have studied at Dayananda Sagar College of Engineering. I express my gratitude to our Principal, Dr. C P S Prakash, for permitting us to use all the necessary facilities of the institution.

I owe a great sense of gratitude to our beloved Head of the Department, Dr. P Usha, Professor and HOD, Department of Electrical and Electronics, DSCE, for providing an excellent academic environment.

I would also like to convey my sincere gratitude to Mr. Amarendar Andhe, Manager, CEAT Ltd, for bridging the gap between academic skills and industry skills, which made this an amazing experience.

I express my indebtedness and deep sense of gratitude to my guide, Prof. Sreevidya T R, Assistant Professor, Department of Electrical and Electronics, DSCE, for the valuable guidance and motivation given to me throughout.

I express my sincere thanks to all the teaching and non-teaching staff of the Department of Electrical and Electronics Engineering for their kind and constant support throughout the academic journey.

Lastly, I would like to express my deep appreciation to my friends and family for providing me with constant moral support and encouragement.

VARNIKA A P

TABLE OF CONTENTS

ABSTRACT
CHAPTER 1.1: INTRODUCTION
CHAPTER 1.2: LITERATURE SURVEY
CHAPTER 1.3: OBJECTIVE OF THE WORK
CHAPTER 2: CONVENTIONAL METHOD DESCRIPTION
CHAPTER 3: DETAILING OF WORK CARRIED OUT
CHAPTER 4: TESTING, RESULTS AND ANALYSIS
CHAPTER 5: CONCLUSION
REFERENCES

LIST OF FIGURES

Fig 1.1.1: SUPERVISED LEARNING
Fig 1.1.2: UNSUPERVISED LEARNING
Fig 1.1.3: REINFORCEMENT LEARNING
Fig 1.1.4: COMPUTER VISION
Fig 1.1.5: PIXELATED BW IMAGE
Fig 1.1.6: PIXELATED COLOURED IMAGE
Fig 1.1.7: RGB PIXELS
Fig 2.1: LABELLED IMAGE
Fig 2.2: IMAGE INDICATING THE UNIQUE SIZE AND PATTERN (FUELSSMART) OF A 4-WHEELER
Fig 3.1: NUMBERED TYRE IMAGES
Fig 3.2: LABELLED TYRE IMAGES
Fig 3.3: TOOL USED FOR EXTRACTION OF TEXT
Fig 3.4: EXAMPLE OF EXTRACTION OF TEXT
Fig 4.1: FLOW CHART OF TRAINING AND TESTING

CO-PO MAPPING

DAYANANDA SAGAR COLLEGE OF ENGINEERING


(An Autonomous Institute affiliated to VTU, Approved by AICTE & ISO 9001:2008 Certified)
Accredited by NBA & National Assessment & Accreditation Council (NAAC) with ‘A’ grade

DEPARTMENT OF ELECTRICAL & ELECTRONICS ENGINEERING


DETAILS OF INTERNSHIP
COURSE OUTCOMES

CO1 Ability to demonstrate the application of knowledge and skill sets acquired from the course
and workplace in the assigned job functions.
CO2 Solve real life challenges in the workplace by analysing work environment and conditions, and
selecting appropriate skill sets acquired from the programme.
CO3 An opportunity to develop a right work attitude, self-confidence, interpersonal skills and
ability to work as a team in a real organisational setting.
CO4 Exhibit professional ethics by displaying positive disposition during the programme
CO5 Communicate and collaborate effectively with different professionals in the work environment
through written and oral means
CO6 Ability to recognize the need and engage in life-long learning for professional growth.

CO-PO / PSO Mapping


CO\PO/PSO  PO1  PO2  PO3  PO4  PO5  PO6  PO7  PO8  PO9  PO10  PO11  PO12  PSO1  PSO2  PSO3
CO1        3    2    1    1    2    2    2    1    2    3     3     2     2     1     3
CO2        3    3    2    -    2    3    2    1    2    1     1     2     2     2     2
CO3        -    -    2    -    -    1    3    3    3    2     1     3     -     1     -
CO4        -    -    -    2    -    3    2    2    2    3     -     3     -     1     -
CO5        2    2    -    1    3    2    -    2    2    1     3     2     -     2     -
CO6        -    1    -    -    2    3    1    2    1    2     -     3     2     2     -

CHAPTER 1
CHAPTER 1.1 - INTRODUCTION
CEAT is one of the leading manufacturers of tyres for automobiles, and the tyre is one of the most important components of an automobile. Using data analytics, the speed and accuracy of the production process can be increased, which also reduces wastage. The tyre brand, size, model, age and condition are critical information for many vehicle users, especially fleet operators, so automating their identification is essential.

DATA ANALYTICS:
Data analytics is one of the leading technologies used to analyse past and present data, helping organisations make sense of their data. Analysing the raw data and using the insights obtained from it is essential for the growth of an organisation.
The raw data must first be cleaned: duplicates and unwanted data are deleted, and outliers (exceptions) are removed using suitable tools. This is the first step before any analysis, and the whole process depends on it. If the data is wrong at this stage, all the work that follows is wasted.
The raw data can be in any format, such as images, CSV files or Excel sheets. Tools such as the Python programming language or MATLAB can be used to clean the data as required. Here we used image analytics to identify black letters on a black background and obtain clean data.
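The cleaning step described above (dropping duplicates, then removing outliers) can be sketched in Python. This is a minimal illustration on a hypothetical list of numeric readings, not the project's image data; the function name `clean` and the interquartile-range rule with factor 1.5 are assumptions chosen for the example.

```python
from statistics import median

def clean(values, k=1.5):
    """Drop duplicate readings, then drop outliers using the IQR rule."""
    # Remove duplicates while keeping the first occurrence of each value.
    unique = list(dict.fromkeys(values))
    if len(unique) < 4:
        return unique  # too few points to estimate quartiles
    ordered = sorted(unique)
    half = len(ordered) // 2
    # Quartiles taken as the medians of the lower and upper halves.
    q1 = median(ordered[:half])
    q3 = median(ordered[-half:])
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in unique if low <= v <= high]

# Hypothetical readings: 12 and 10 are repeated, 95 is an outlier.
readings = [10, 12, 12, 11, 13, 95, 10, 14]
cleaned = clean(readings)
print(cleaned)
```

The same idea carries over to tabular data, where a library such as pandas would normally be used instead of plain lists.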
Data Analytics can:
o Gather Hidden Insights – Hidden insights are gathered from data and then analysed with respect to business requirements.
o Generate Reports – Reports are generated from the data and passed on to the respective teams and individuals, who take further action to grow the business.
o Perform Market Analysis – Market analysis can be performed to understand the strengths and weaknesses of competitors.
o Improve Business Requirement – Analysing data helps a business better meet customer requirements and improve customer experience.
The cleaned datasets are then labelled or left unlabelled, depending on which type of machine learning or deep learning technique is used. In the project I worked on, machine learning and deep learning techniques were used to examine the prints on the tyres.

MACHINE LEARNING
There are different types of machine learning techniques:
o Supervised learning
o Unsupervised learning
o Reinforcement learning
• SUPERVISED LEARNING
Supervised learning is the machine learning task of learning a function that maps an input to an output based on example input-output pairs. The dataset has to be labelled in order to train the algorithm and obtain accurate predictions. The function is inferred from labelled training data consisting of a set of training examples, so the input data is provided to the model together with the expected output. Training continues until the model achieves the desired level of accuracy on the training data. The trained model is then verified on separate testing data.
For example: text classification problems, where the goal is to predict the class label of a given piece of text. One particularly popular task in text classification is predicting the sentiment of a piece of text, such as a tweet or a product review.

There are two subcategories of supervised learning:
o Classification
o Regression
• CLASSIFICATION
The machine is trained to assign an input to one of the required classes, e.g. either 1 or 0.
Examples: 1. Classifying whether a patient has a disease or not.
2. Classifying whether an email is spam or not.
• REGRESSION
The machine is trained to predict a real-valued quantity such as price, weight or height.
Examples: 1. Predicting house/property prices.
2. Predicting stock market prices.
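To make the idea of learning from labelled input-output pairs concrete, here is a minimal classification sketch in Python. It uses a nearest-neighbour rule, which is only one of many supervised methods and is not claimed to be the one used in this project; the data points and the helper name `nearest_label` are made up for illustration.

```python
def nearest_label(train, point):
    """Return the label of the training example closest to `point`."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    _, label = min(train, key=lambda pair: sq_dist(pair[0], point))
    return label

# Labelled training pairs: (features, class label 0 or 1).
train = [((1.0, 1.0), 0), ((1.5, 2.0), 0), ((5.0, 5.0), 1), ((6.0, 5.5), 1)]
# Held-out test pairs that the model has not seen during training.
test = [((1.2, 1.4), 0), ((5.5, 5.0), 1)]

predictions = [nearest_label(train, x) for x, _ in test]
accuracy = sum(p == y for p, (_, y) in zip(predictions, test)) / len(test)
print(predictions, accuracy)
```

A regression model follows the same pattern, except that the label is a real number and the error is measured as a distance rather than a match.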

Fig. 1.1.1 – Supervised Learning

• UNSUPERVISED MACHINE LEARNING
Unsupervised learning is a type of algorithm that learns patterns from untagged data, so the input data need not be labelled. There is no target or outcome variable to predict or estimate, and only input data is provided to the model. It is used for clustering a population into groups, and is widely used for segmenting customers into groups for specific interventions. The algorithm learns by grouping the input data according to the structure it finds. This is also done using training data until the required accuracy is achieved, after which the model is verified on testing data.
For example: a collection of molecules is given, some of which are drugs and some of which are not, but which is which is unknown. Unsupervised learning can be used to group the molecules so that the likely drugs can be separated from the rest.
There are two types of unsupervised learning:
o Clustering
o Association
• CLUSTERING
A clustering problem is one where the inherent groupings in the data are to be discovered.
Example: Grouping customers by their purchasing behaviour.
• ASSOCIATION
An association rule learning problem is one where the aim is to discover rules that describe large portions of the data.
Example: In a shop, people who buy milk also tend to buy biscuits.
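Clustering can be illustrated with a small k-means sketch in Python. The spending figures, the starting centres and the function name `kmeans_1d` are hypothetical; the point is only that no labels are supplied and the groups emerge from the data.

```python
def kmeans_1d(values, centers, rounds=10):
    """Group 1-D values around the given starting centers (k-means)."""
    for _ in range(rounds):
        groups = [[] for _ in centers]
        for v in values:
            # Assign each value to its nearest center.
            idx = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            groups[idx].append(v)
        # Move each center to the mean of its group (unchanged if the group is empty).
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers, groups

# Hypothetical monthly spend of six customers, with no labels attached.
spend = [5, 7, 6, 48, 52, 50]
centers, groups = kmeans_1d(spend, centers=[0.0, 100.0])
print(centers, groups)
```

Here the algorithm separates low spenders from high spenders without ever being told that those two groups exist.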

Fig. 1.1.2 – Unsupervised Learning

• REINFORCEMENT LEARNING
This type of learning resembles the way the human brain learns. The machine is trained to make specific decisions: it is exposed to an environment in which it trains itself continually through trial and error. It learns from past experience and tries to capture the best possible knowledge to make accurate decisions.
For example: playing chess, where the machine teaches itself by playing repeatedly and learning from the movement of the pieces.
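Trial-and-error learning can be sketched with a very small example in Python. This is a simplified "multi-armed bandit" with an epsilon-greedy rule, far simpler than chess and chosen only for illustration; the reward values are fixed and made up, and the function name `learn_by_trial` is an assumption.

```python
import random

def learn_by_trial(rewards, steps=200, epsilon=0.1, seed=0):
    """Estimate the value of each action from repeated trials."""
    rng = random.Random(seed)
    estimates = [0.0] * len(rewards)
    counts = [0] * len(rewards)
    for step in range(steps):
        if step < len(rewards):
            action = step  # try every action once first
        elif rng.random() < epsilon:
            action = rng.randrange(len(rewards))  # explore a random action
        else:
            # Exploit the action that currently looks best.
            action = max(range(len(rewards)), key=lambda a: estimates[a])
        reward = rewards[action]  # feedback from the environment
        counts[action] += 1
        # Running average of the rewards seen for this action.
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates

estimates = learn_by_trial([0.2, 0.8, 0.5])
best_action = max(range(len(estimates)), key=lambda a: estimates[a])
print(best_action)
```

The agent is never told which action is best; it discovers this from the rewards it receives.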

Fig. 1.1.3 – Reinforcement Learning

DEEP LEARNING
Deep learning is a subset of machine learning, which is itself a subset of artificial intelligence. To define each term:
o Artificial Intelligence is the broad mandate of creating machines that can think intelligently.
o Machine Learning is one way of doing that, by using algorithms to glean insights from data.
o Deep Learning is one way of doing machine learning, using a specific kind of algorithm called a neural network.
Deep learning works particularly well for prediction tasks. Machine learning has been used to classify images and text for decades, but it struggled to reach the baseline accuracy that algorithms need in order to work in business settings. Deep learning is enabling us to cross that threshold in places where it was not possible before.
Computer vision is a good example of a task that deep learning has made realistic for business applications. Using deep learning to classify and label images not only outperforms traditional algorithms: on some tasks it is starting to match or exceed human performance.
COMPUTER VISION
Computer vision is the process of using machines to understand and analyse imagery (both photos and videos). The techniques described above can also be applied to photos and videos as datasets, which are used to train similar models. Computer vision is used for any computation involving visual content: images, videos, icons, and anything else made of pixels.
For example: video motion analysis, which estimates the velocity of objects in a video or of the camera itself.

Fig. 1.1.4 – Computer Vision

WORKING:
The main aim of computer vision is to make a model behave like the human brain when identifying images or other pixel-based content. Machines interpret images very simply: as a series of pixels, each with its own set of colour values. Consider the simplified image below, and how grayscale values are converted into a simple array of numbers:

Fig. 1.1.5 – Pixelated BW image
In the above image, each square is marked with a number that represents the intensity of that pixel, in the range 0-255.
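The conversion of a grayscale image into an array of numbers can be sketched in Python as follows. The pixel values are made up for illustration and are not taken from the figure.

```python
# A 3x3 grayscale image: 0 is black, 255 is white.
image = [
    [0, 128, 255],
    [64, 128, 192],
    [255, 128, 0],
]

height, width = len(image), len(image[0])
# Flatten the rows into one list to inspect the pixel values.
flat = [pixel for row in image for pixel in row]
darkest, brightest = min(flat), max(flat)
mean_intensity = sum(flat) / len(flat)
print(height, width, darkest, brightest)
```

Every operation a model performs on the image, from cropping to text detection, is ultimately arithmetic on numbers like these.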

Fig. 1.1.6 – Pixelated Colored image

If the image is in colour, each pixel is stored as an RGB value (a series of three values) rather than a single number. As each colour value is stored in 8 bits, a colour image needs considerably more memory than a grayscale image of the same size, and the number of iterations needed to reach the required prediction accuracy also increases.
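The memory cost can be worked out directly: at 8 bits (1 byte) per value, an uncompressed image needs width × height × channels bytes. A short Python sketch, using a 1920 × 1080 image as an arbitrary example size:

```python
def image_bytes(width, height, channels):
    """Uncompressed size in bytes, at 8 bits (1 byte) per channel value."""
    return width * height * channels

gray = image_bytes(1920, 1080, channels=1)  # one value per pixel
rgb = image_bytes(1920, 1080, channels=3)   # red, green and blue per pixel
print(gray, rgb)
```

The colour version is exactly three times the size of the grayscale one, which is why large image datasets are often resized or converted to grayscale before training.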
Examples: 1. Facebook and other social media companies use computer vision to identify people in photos and act on that information, e.g. to tag the people in a particular image.
2. Much of medical diagnosis is image processing, such as reading X-rays, MRI scans and other diagnostic images.

Fig. 1.1.7 – RGB Pixels

CHAPTER 1.2 – LITERATURE SURVEY

1. Annina Simon, Mahima Singh Deo, S. Venkatesan and D. R. Ramesh Babu – “An Overview of Machine Learning and its Applications” – International Journal of Electrical Sciences & Engineering – Volume 1, Issue 1; 2015
This paper gives a brief introduction to machine learning techniques and their types, and shows many applications where these techniques are used.

2. Nada Elgendy and Ahmed Elragal – “Big Data Analytics: A Literature review” – Conference paper in Lecture Notes in Computer science – August 2014.
This paper gives an overview of how large volumes of data are stored and analysed using various techniques, along with many real-time applications.

3. https://algorithmia.com/blog/introduction-to-computer-vision
A brief introduction to computer vision, giving an overview of what computer vision is and how it works, along with examples.

4. Wajahat Kazmi, Ian Nabney, George Vogiatzis, Peter Rose and Alexander Codd – “Vehicle tyre detection and text recognition using deep learning” – 15th IEEE Conference on Automation Science and Engineering – 2015.
This paper explains how deep learning techniques are used for text recognition on vehicle tyres.

CHAPTER 1.3 - OBJECTIVE OF THE WORK

PROBLEM STATEMENT:
• Extracting text from tyre images.
CHALLENGES:
• Curved text on the tyres.
• Black-on-black (engraved) text on the tyre.
USE-CASES:
• Warranty and claim journey:
The user takes a picture of the tyre, and the algorithm extracts the text on the tyre and predicts the SKU name (product name) and serial number (manufacturing date). This makes the process seamless, because customers often do not understand the text written on the tyre.
• First tyre quality check:
In the plant, before mass production of any tyre, a first tyre check is done to examine whether the text printed on the tyre is correct. Previously this check was done manually; with this algorithm, only a picture needs to be taken and the algorithm reports what text is printed on the tyre.

CHAPTER 2 - CONVENTIONAL METHOD DESCRIPTION
The production phase is one of the major processes in the mass manufacturing of tyres. Before mass production begins, the tyre has to be inspected. The inspection consists of checking whether the lettering applied during manufacturing is correct. This is a very tedious process, as the letters and the background are the same colour (black on black). Conventionally, the inspection was done manually, leading to errors, low accuracy, and wasted money and time.
If the manual inspection missed an error in the SKU name (product name) or serial number (manufacturing date) on the tyres, it would cause a huge loss in mass production.

Fig. 2.1 – Labelled Image

Fig. 2.2 – Image indicating the unique size and pattern (FUELSSMART) of a 4-wheeler

CHAPTER 3 - DETAILING OF WORK CARRIED OUT
To make the inspection process accurate using machine learning and deep learning techniques, the data must first be labelled and extracted. This is the most important step, as model selection, training, testing and all further procedures depend on the dataset. Here, the dataset is divided into training and testing datasets.
• Training dataset -
The model is trained on the training dataset until the expected outcome is obtained. This is the data the model has experience with.
• Testing dataset -
After training, the model is tested on data it has not seen before. The outcome shows how well the trained model performs.
Usually 20% of the whole dataset is set aside as the testing dataset. The number of iterations needed to train the model depends on how accurately the data is obtained, cleaned and extracted. In this project, the images are captured by a camera and then cropped to the required size. As the tyre surfaces in the images are curved, there must be an extraction stage in which only the text on the tyres is extracted. This was carried out in the following steps.
• Numbering the images (data):
The images are numbered in ascending order, in batches, as the dataset is large.
• Referencing them in numerical order in an Excel spreadsheet:
This makes it easy to identify a particular image and to make any necessary corrections.
• Labelling the accuracy of the images:
For each image, the percentage of correctness is recorded, indicating whether the text seen in the image matches the text declared in the Excel sheet. This shows whether any further corrections are needed.
This leads to the creation of the training and testing datasets.
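The 80/20 division into training and testing datasets described above can be sketched in Python. The file names and the function name `split_dataset` are hypothetical, and 4000 is used only as a round figure for the number of annotated images.

```python
import random

def split_dataset(items, test_fraction=0.2, seed=42):
    """Shuffle the items and hold out a fraction of them for testing."""
    shuffled = list(items)
    # A fixed seed keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical numbered image files, as produced by the numbering step.
image_names = [f"tyre_{i:04d}.jpg" for i in range(1, 4001)]
train_set, test_set = split_dataset(image_names)
print(len(train_set), len(test_set))
```

Shuffling before splitting avoids a test set made up of only the last batch of images, which might all come from one tyre model.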

Fig. 3.1 – Numbered tyre images


Fig. 3.2 – Labelled tyre images

EXTRACTION OF TEXT FROM THE IMAGES:

Extracting the text from each image is necessary before the image can be processed using the Python programming language. The main objective is to convert the raw image, in which the embossed text and the background are the same colour, into one in which the text and the background have clearly different colours. This is very important for the model building process.
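One simple way to make low-contrast text stand out from its background is to stretch the contrast and then threshold the result. The report does not state which method was used, so the Python sketch below is only an assumed illustration of the idea on a tiny made-up image patch; real black-on-black tyre text is considerably harder and usually needs a trained model.

```python
def stretch_and_threshold(gray):
    """Stretch a low-contrast grayscale image to 0-255, then binarize it."""
    flat = [p for row in gray for p in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        return [[0 for _ in row] for row in gray]  # flat image: nothing to separate
    # Map the darkest pixel to 0 and the brightest to 255.
    stretched = [[(p - lo) * 255 // (hi - lo) for p in row] for row in gray]
    # Pixels in the brighter half become text (255), the rest background (0).
    return [[255 if p >= 128 else 0 for p in row] for row in stretched]

# Hypothetical dark patch: the raised letters are only slightly brighter than the rubber.
patch = [
    [20, 21, 20, 22],
    [21, 34, 35, 20],
    [20, 36, 33, 21],
    [22, 20, 21, 20],
]
mask = stretch_and_threshold(patch)
for row in mask:
    print(row)
```

In the output mask the text pixels and the background pixels have clearly different values, which is the property the model building step relies on.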

Fig. 3.3 – Tool used for extraction of text

OPERATION OF THE TOOL


Each lower-case character in quotes is the shortcut key of the corresponding button. Note that the shortcut keys are only available when using an English input method in lower-case mode.
o “b” and “f”: move to the previous or next image, respectively, saving the current state.
o “r”: change the labelling mode (rectangle or quadrangle).
o “d”: select an instance first (use “Ctrl” or “Shift” to select multiple instances), then use this button to delete.
o “Mouse right-click”: save the current state and cancel the current operation.
o “Mouse left-click”: label the text region. Rectangle: two clicks. Quadrangle: four clicks. Curved: explained below.
o “Double mouse left-click”: select the instance and label its content (“Esc” can be used to cancel).
o “Mouse wheel”: zoom the image in or out. “Ctrl”+“Mouse wheel” or “Shift”+“Ctrl”+“Mouse wheel” zooms faster. Because bilinear interpolation is used to scale the image, resizing too many times may affect the localization accuracy. In that case, simply re-entering the same image recovers the original resolution.
o “Arrow keys”: move the canvas. Only available when the image is very large.
o “t”: select the instance, then use this key to find the location of the instance. One can also use the coordinates of the first point and the colour information to find the box.
SYSTEM
Currently, the labelling tool can be used on the Windows and Ubuntu operating systems:
o The Windows label tool is for Windows systems.
o The Ubuntu label tool is available on Ubuntu 14.04 and 16.04. Remember to use the “chmod” command to change the permissions of the binary file.
RULES FOR EXTRACTION OF TEXT:
• Spaces must not be omitted, and the text must be entered exactly as it appears while labelling.
• Quotes must not be added inside the quoted text.
Example: Correct: “ZOOM”
Wrong: “ZOO”M”
• SKU name = Size + Pattern
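The rules above can be checked automatically before a label is saved. The Python sketch below is a hypothetical helper, not part of the label tool: the function names, the example size string, and the choice of a single space to join size and pattern are all assumptions.

```python
def is_valid_label(text):
    """Check a transcription against the labelling rules listed above."""
    # Straight or curly quote characters are not allowed inside the label text.
    has_inner_quote = any(q in text for q in '"“”')
    is_blank = text.strip() == ""
    return not has_inner_quote and not is_blank

def sku_name(size, pattern):
    """SKU name = Size + Pattern, joined here with a single space (assumed)."""
    return f"{size} {pattern}"

print(is_valid_label("ZOOM"))           # label text without quotes
print(is_valid_label('ZOO"M'))          # stray quote inside the text
print(sku_name("185/65 R15", "ZOOM"))   # hypothetical size and pattern
```

Running such a check over the spreadsheet of labels is a cheap way to catch annotation mistakes before training.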
PROPERTIES OF TEXT IN IMAGES:
Text usually varies in appearance due to changes in font, size, style, orientation, alignment, texture, colour, contrast and background. These variations make automatic text extraction complicated and difficult. Text in images varies in the following properties:
• Size: The size of text may vary a lot.
• Alignment: Scene text may be aligned in any direction and have geometric distortions, while caption text is usually aligned horizontally and may sometimes appear as non-planar text.
• Colour: The characters tend to have the same or similar colours, but low contrast between text and background makes text extraction difficult.
• Edge: Most caption and scene text is designed to be easily read, resulting in strong edges at the boundaries between text and background.
• Compression: Many images are recorded, transferred and processed in compressed format. A faster text extraction system can therefore be achieved if text can be extracted without decompression.
• Distortion: Due to changes in camera angle, some text may carry perspective distortions that affect extraction performance.

Fig. 3.4 – Example of Extraction of text

CHAPTER 4 - TESTING, RESULTS AND ANALYSIS


The training data and the testing data are labelled and annotated. Learning is done by repeatedly feeding the data to the model until the expected results are obtained. Supervised or unsupervised learning can be used. Using computer vision algorithms, the text is extracted and confirmed before mass production.
If the data fed to the model (algorithm) gives the expected results, training can be stopped and the model taken to the testing process, which confirms the accuracy of the trained model's output.
If the data fed to the model (algorithm) does not give the expected results, training continues, with data fed into the model until the expected results are obtained. The testing process is then carried out to confirm the accuracy of the trained model's output.
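Deciding whether the results are as expected needs a concrete measure. A minimal sketch in Python, assuming accuracy is measured as the fraction of images whose extracted text exactly matches the annotated text; the sample strings and the 95% target are made up for illustration and are not results from this project.

```python
def exact_match_accuracy(predicted, expected):
    """Fraction of images whose extracted text equals the annotated text."""
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    if not expected:
        return 0.0
    hits = sum(p == e for p, e in zip(predicted, expected))
    return hits / len(expected)

# Hypothetical annotated labels and model outputs for three test images.
expected = ["185/65 R15 ZOOM", "CEAT", "2120"]
predicted = ["185/65 R15 ZOOM", "CEAT", "2128"]

accuracy = exact_match_accuracy(predicted, expected)
target = 0.95  # hypothetical acceptance threshold
keep_training = accuracy < target
print(round(accuracy, 3), keep_training)
```

Exact matching is strict: a single wrong character counts as a miss, which suits serial numbers where every digit matters.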

Fig. 4.1 – Flow Chart of training and testing

ANALYSIS:
• The text on the tyres is successfully extracted before the mass production process, reducing errors in production.
• Warranty and claim processing becomes easier, as the customer can readily find the date of manufacturing and the product name.
CHAPTER 5 – CONCLUSION
CEAT Limited, one of the top companies in the manufacturing sector, gave me an opportunity to learn and work on a real-time application.
The first stage of training the model, the data cleaning and annotation tasks, gave me an overview of the learning process and the techniques used.
Using machine learning and deep learning techniques, the manual inspection process is converted to an automated one that is far less time-consuming and more accurate, leading to a drastic decrease in wastage and a decrease in the manpower required.

REFERENCES

1. Annina Simon, Mahima Singh Deo, S. Venkatesan and D. R. Ramesh Babu – “An Overview of Machine
Learning and its Applications” – International Journal of Electrical Sciences & Engineering – Volume 1, Issue 1;
2015

2. Nada Elgendy and Ahmed Elragal – “Big Data Analytics: A Literature review” – Conference paper in Lecture Notes in Computer science – August 2014.

3. https://algorithmia.com/blog/introduction-to-computer-vision

4. Wajahat Kazmi, Ian Nabney, George Vogiatzis, Peter Rose and Alexander Codd – “Vehicle tyre detection and text recognition using deep learning” – 15th IEEE Conference on Automation Science and Engineering – 2015.

5. https://www.edureka.co/blog/what-is-data-analytics/
