Undergone at
A TRAINING REPORT
ON
Submitted by
I would like to thank Mrs Mousita Dhar (Senior Technical Consultant) at Webtek Labs Pvt
Ltd. I would also like to thank Dr B. Amutha, HOD-CSE, who gave encouragement and
continued support throughout the training. Next, I would like to thank Mrs P. Visalakshi,
Academic Adviser, for the sustained interest she showed in my work.
I would also like to thank Mr M. Karthikeyan, Faculty Advisor, for the continued guidance and
support he provided.
RAHIL BHOWAL
(RA1711003010386)
CSE
INDEX
BONAFIDE CERTIFICATE
ACKNOWLEDGEMENT
2 TRAINING SCHEDULE
3 OBSERVATIONS/WORK DONE
4 PROJECT HANDLED
6 SUMMARY
CHAPTER-1
WebTek Labs Pvt. Ltd. is recognized as a leading IT solutions provider with a
dynamic and fast-growing team of diversely talented individuals. Incorporated in 2001 with the aim
of providing the best talent, the company initially started with Recruitment & Staffing Services. It
paralleled this by providing knowledge and skill development certification programmes.
The Webtek Certified Tester (WCT) Program, which aims to provide IT companies with trained software
testers, has gained wide recognition over the years.
Website: http://www.webteklabs.com
Founded: 2001
CHAPTER-2
TRAINING SCHEDULE
During my 24-day training period at WebTek Labs Pvt Ltd, we were introduced to machine
learning and its applications in various fields. The main purpose was first to learn
machine learning; I was then given several projects to handle in image processing as well
as machine learning. The detailed schedule of the training is as follows:
WEEK 1:
In the first week of my training I was taught the basics of Python and Machine Learning. The
difference between Artificial Intelligence (AI), Deep Learning and
Machine Learning was explained, followed by a thorough introduction to Python and its use in
Machine Learning. Based on my understanding I also made a game using the pygame module in
Python. The overview of the first week is as follows:
Introduction to Python
Uses of Python
Basics in Python
What is Machine Learning?
Machine Learning Methods
Supervised Learning
Unsupervised Learning
Machine Learning Packages in Python
i. Numpy
ii. Pandas
iii. Scikit-learn
iv. Matplotlib
v. Seaborn
WEEK 2:
In the second week of training, after the first week of basics, I started working on data
preprocessing and regression algorithms. Data preprocessing is a very important step before any
data processing or analysis. It involves handling missing values, categorical variables,
dummy variables, label encoding, scaling, etc. I also applied linear regression and multiple regression to
various datasets. The overview of week 2 is:
Data Preprocessing
Overfitting and the bias variance tradeoff
Combining models
Regularization
Linear Regression
Multiple Regression
WEEK 3:
The third week of training was really interesting. We were introduced to classification
algorithms such as K-Nearest Neighbours (KNN), Logistic Regression, SVM (Support Vector
Machine), etc. We were also introduced to image processing using OpenCV, on which I later based
my project.
Classification
i. KNN
ii. Logistic Regression
iii. SVM (Support Vector Machine)
OpenCV
Final Project
21-24 June 2019:
In the final days of my training I was asked to submit a project for completion of the internship. I
chose image processing as the field for my final project.
CHAPTER-3 OBSERVATIONS
MACHINE LEARNING:-
Machine Learning is the field of study that gives computers the ability to learn without being
explicitly programmed. It is an approach to developing artificial
intelligence and a method of data analysis that automates analytical model building.
Machine learning is an application of artificial intelligence (AI) that provides systems the
ability to automatically learn and improve from experience. The process of learning begins
with observations or data, such as examples,
direct experience, or instruction, in order to look for patterns in data and make better
decisions in the future based on the examples that we provide. The primary aim is to allow
computers to learn automatically without human intervention or assistance and to adjust
their actions accordingly.
1. Supervised Learning
This type of algorithm has a target / outcome variable (or dependent variable) which is to be
predicted from a given set of predictors (independent variables). Using this set of variables, we
generate a function that maps inputs to desired outputs. The training process continues until the
model achieves a desired level of accuracy on the training data. Examples of Supervised
Learning: Regression, Decision Tree, Random Forest, KNN, Logistic Regression, etc.
2. Unsupervised Learning
In this type of algorithm, we do not have any target or outcome variable to predict / estimate. It is used
for clustering a population into different groups, which is widely used for segmenting customers into
different groups for specific interventions. Examples of Unsupervised Learning: Apriori
algorithm, K-means.
3. Reinforcement Learning
In this type of algorithm, the machine is trained to make specific decisions. It works this way: the
machine is exposed to an environment where it trains itself continually using trial and error. The
machine learns from past experience and tries to capture the best possible knowledge to make
accurate business decisions. Example of Reinforcement Learning: self-driving cars.
SUPERVISED LEARNING:-
Regression
Classification
REGRESSION
CLASSIFICATION
This is a type of problem where we predict a categorical response value, where the data can
be separated into specific “classes”. The output variables are often called labels or categories.
The mapping function predicts the class or category for a given observation. For example, an
email can be classified as belonging to one of two classes, “spam” and “not spam”, and a
face can be classified as a human face or an animal face. A classification problem requires that examples be
classified into one of two or more classes. A classification problem can have real-valued or discrete
input variables. A problem with two classes is often called a two-class or binary classification
problem. A problem with more than two classes is often called a multi-class classification
problem. A problem where an example is assigned multiple classes is called a multi-label
classification problem.
UNSUPERVISED LEARNING
Unsupervised Learning is a class of Machine Learning techniques for finding patterns in data.
The program is given a set of data and must find patterns and relationships therein. The
training data does not include targets, so we do not tell the system where to go; the
system has to work out the structure itself from the data we give.
E.g. on an e-commerce site, all the products are grouped together by category. Similarly, to identify
bowlers and batsmen from a cricket dataset in which runs scored and wickets taken are given, we
need to group the players based on those values.
Clustering algorithms will run through your data and find these natural clusters if they exist.
For your customers, that might mean one cluster of 30-something artists and another of
dog-owning millennials. You can typically modify how many clusters your algorithm looks for,
which lets you adjust the granularity of these groups. There are a few different types of
clustering you can utilize:
K-Means Clustering – clustering your data points into a number (K) of mutually exclusive
clusters. A lot of the complexity surrounds how to pick the right number for K.
Hierarchical Clustering – clustering your data points into parent and child clusters. You
might split your customers between younger and older ages, and then split each of those
groups into their own individual clusters as well.
Probabilistic Clustering – clustering your data points into clusters on a probabilistic scale, so
that each point belongs to every cluster with some probability. K-Means is a special case where
the probabilities are always either 0 or 1. Probabilistic clustering is also
sometimes called Fuzzy K-Means.
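The K-Means idea described above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the implementation used during the training: the function name `k_means`, the toy points and the choice of the first K points as starting centroids are all assumptions made for the example.

```python
import numpy as np

def k_means(points, k, n_iters=10):
    """Plain K-Means: assign points to the nearest centroid, then move the centroids."""
    points = np.asarray(points, dtype=float)
    # Assumption for reproducibility: start from the first k points.
    centroids = points[:k].copy()
    for _ in range(n_iters):
        # Distance from every point to every centroid, shape (n_points, k).
        distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Move each centroid to the mean of the points assigned to it.
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:
                centroids[j] = members.mean(axis=0)
    return labels, centroids

# Made-up 2-D points forming two obvious groups.
points = [[1.0, 1.0], [8.0, 8.0], [1.2, 0.8], [8.2, 7.9], [0.9, 1.1], [7.8, 8.1]]
labels, centroids = k_means(points, k=2)
print(labels)      # cluster index of each point
print(centroids)   # final cluster centres
```

Each point ends up in exactly one cluster, which is the "probabilities are either 0 or 1" case mentioned above.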
SUPERVISED VS UNSUPERVISED
In supervised learning, the system tries to learn from the previous examples that are given.
In unsupervised learning, on the other hand, the system attempts to find the patterns directly
from the examples given. So if the dataset is labelled it is a supervised problem; if
the dataset is unlabelled it is an unsupervised problem.
Python has many modules for different jobs. Some modules are mainly used for
machine learning, and we are going to use those. Below are examples of those
packages or modules.
NUMPY
Example
import numpy as np
a = np.array([4, 6, 7, 8])
print(a)
PANDAS
Create Series:-
import pandas as pd
s = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])
print(s.iloc[0])    # first element, selected by position
print(s.iloc[:3])   # first three elements
Create DataFrame
Lists
dict
Series
Numpy ndarrays
df = pd.DataFrame(data)
df = pd.DataFrame(data, index=['rank1', 'rank2', 'rank3', 'rank4'])
MATPLOTLIB
import matplotlib.pyplot as plt
x = [5, 2, 7]
y = [2, 16, 4]
plt.plot(x, y)
plt.title('Info')
plt.ylabel('Y axis')
plt.xlabel('X axis')
plt.show()
Data are the building blocks of any machine learning algorithm. The data are divided into
two parts:
Training data
Test data
The training set contains a known output, and the model learns on this data in order to
generalize to other data later on. The test dataset (or subset) is used to
test our model's predictions.
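A random split into training and test data can be sketched as follows. This is only an illustration: the helper name `split_train_test`, the 30% test fraction and the toy arrays are assumptions, and in practice a library routine such as scikit-learn's `train_test_split` does the same job.

```python
import numpy as np

def split_train_test(features, targets, test_fraction=0.25, seed=0):
    """Shuffle the row indices and hold out a fraction of the rows as test data."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(features))
    n_test = int(round(len(features) * test_fraction))
    test_idx, train_idx = order[:n_test], order[n_test:]
    return features[train_idx], features[test_idx], targets[train_idx], targets[test_idx]

# Toy data: 10 samples with 2 features each, and one target per sample.
X = np.arange(20).reshape(10, 2)
y = np.arange(10)
X_train, X_test, y_train, y_test = split_train_test(X, y, test_fraction=0.3)
print(X_train.shape, X_test.shape)
```

Shuffling before splitting matters because many datasets are stored in a sorted order, and an unshuffled split would then give the model an unrepresentative test set.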
Data Preprocessing:-
Preprocessing refers to the transformations applied to our data before feeding it to the
algorithm.
Data preprocessing is a technique used to convert raw data into a clean data
set. In other words, whenever data is gathered from different sources it is collected
in a raw format that is not suitable for analysis.
To achieve better results from the applied model in Machine Learning projects, the
data has to be in a proper format. Some Machine Learning
models need information in a specified format. For example, many implementations of the Random
Forest algorithm do not accept null values, so to run a random forest the null values
have to be handled in the original raw data set.
The most commonly used preprocessing techniques include:
cleaning,
missing value imputation,
encoding categorical variables,
scaling, etc.
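The techniques listed above can be sketched with pandas. The table below is made-up illustrative data, and the particular choices (mean imputation, min-max scaling, one-hot encoding) are assumptions; other strategies are equally common.

```python
import pandas as pd

# Made-up raw data with missing values and one categorical column.
raw = pd.DataFrame({
    "age": [25.0, None, 40.0, 31.0],
    "city": ["Kolkata", "Chennai", "Kolkata", "Delhi"],
    "salary": [30000.0, 52000.0, None, 41000.0],
})

clean = raw.copy()
numeric_cols = ["age", "salary"]

# Missing value imputation: replace nulls with the column mean.
clean[numeric_cols] = clean[numeric_cols].fillna(clean[numeric_cols].mean())

# Scaling: min-max scaling brings every numeric column into the range [0, 1].
col_min = clean[numeric_cols].min()
col_max = clean[numeric_cols].max()
clean[numeric_cols] = (clean[numeric_cols] - col_min) / (col_max - col_min)

# Encoding categorical variables: one dummy (0/1) column per category.
clean = pd.get_dummies(clean, columns=["city"])
print(clean)
```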
Regression:-
Simple Linear Regression:-
Simple linear regression is a type of regression analysis where the number of independent
variables is one and there is a linear relationship between the independent (x) and
dependent (y) variable. The line drawn through the data points is referred to as the best-fit straight
line. Based on the given data points, we try to plot a line that models the points the best. The
line can be modelled by the linear equation
y = a_0 + a_1 x
This is a line where y is the output variable we want to predict, x is the input variable we
know and a_0 and a_1 are coefficients that we need to estimate that move the line around.
Technically, a_0 is called the intercept because it determines where the line intercepts the y-
axis. In machine learning we can call this the bias, because it is added to offset all predictions
that we make. The a_1 term is called the slope because it defines the slope of the line or how
x translates into a y value before we add our bias.
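The goal is to find the best estimates for the coefficients to minimize the errors in predicting
y from x. With the usual least-squares criterion, we can start off by estimating the value for a_1 as:
a_1 = sum((x_i - mean(x)) * (y_i - mean(y))) / sum((x_i - mean(x))^2)
The intercept then follows as a_0 = mean(y) - a_1 * mean(x).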
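The estimation of a_0 and a_1 can be sketched in NumPy, assuming the ordinary least-squares criterion (minimizing the sum of squared errors). The function name and the toy data are illustrative assumptions.

```python
import numpy as np

def fit_simple_linear_regression(x, y):
    """Least-squares estimates of the intercept a_0 and the slope a_1."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = x.mean(), y.mean()
    # Slope: covariance of x and y divided by the variance of x.
    a_1 = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    # Intercept: the fitted line passes through the point (mean(x), mean(y)).
    a_0 = y_mean - a_1 * x_mean
    return a_0, a_1

# Toy data lying exactly on the line y = 1 + 2x.
x = [1, 2, 3, 4, 5]
y = [3, 5, 7, 9, 11]
a_0, a_1 = fit_simple_linear_regression(x, y)
print(a_0, a_1)
```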
Multiple Linear Regression:-
The multiple linear regression explains the relationship between one dependent variable(y)
and two or more independent variables (x1, x2, x3… etc).
Forward Selection
Backward Elimination
Bidirectional Elimination
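A multiple linear regression fit with two independent variables can be sketched as below. This is a minimal illustration that assumes a least-squares fit via NumPy's `np.linalg.lstsq`; the function name and the toy data are assumptions, and no feature selection step is shown.

```python
import numpy as np

def fit_multiple_linear_regression(X, y):
    """Return [intercept, coefficient for x1, coefficient for x2, ...]."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so the first coefficient acts as the intercept.
    design = np.column_stack([np.ones(len(X)), X])
    coefficients, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coefficients

# Toy data generated from y = 1 + 2*x1 + 3*x2.
X = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 5]]
y = [9, 8, 19, 18, 26]
coefficients = fit_multiple_linear_regression(X, y)
print(coefficients)
```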
Classification is the process of predicting the class of given data points. Classes are
sometimes called targets, labels or categories. Classification predictive modeling is the
task of approximating a mapping function (f) from input variables (X) to discrete output
variables (y).
For example, spam detection by email service providers is a classification
problem. This is a binary classification since there are only 2 classes: spam and not spam. A
classifier uses training data to understand how the given input variables relate to the
class. In this case, known spam and non-spam emails have to be used as the training data.
Once the classifier is trained accurately, it can be used to classify an unknown email.
KNN can be used for both classification and regression predictive problems. However, it is
more widely used in classification problems in the industry. To evaluate any technique we
generally look at 3 important aspects:
KNN algorithm
Let's take a simple case to understand this algorithm. Consider a spread of red circles
(RC) and green squares (GS).
You intend to find out the class of the blue star (BS). BS can either be RC or GS and nothing
else. The “K” in the KNN algorithm is the number of nearest neighbors we wish to take a vote from. Let's say
K = 3. Hence, we will now draw a circle with BS as its center just big enough to enclose only three
data points on the plane. BS is then assigned the class held by the majority of those three neighbors.
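The voting step can be sketched in a few lines. This is a minimal illustration assuming Euclidean distance; the function name `knn_predict` and the coordinates of the red circles (RC) and green squares (GS) are made up for the example.

```python
from collections import Counter

import numpy as np

def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    train_points = np.asarray(train_points, dtype=float)
    query = np.asarray(query, dtype=float)
    # Euclidean distance from the query to every training point.
    distances = np.linalg.norm(train_points - query, axis=1)
    nearest = np.argsort(distances)[:k]
    votes = Counter(train_labels[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Made-up positions: three red circles and three green squares.
points = [[1, 1], [1, 2], [2, 1], [6, 6], [7, 6], [6, 7]]
labels = ["RC", "RC", "RC", "GS", "GS", "GS"]

blue_star = [2, 2]
print(knn_predict(points, labels, blue_star, k=3))  # prints "RC"
```

Choosing an odd K for a two-class problem avoids tied votes.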
Logistic Regression
Logistic Regression is a Machine Learning classification algorithm that is used to predict the
probability of a categorical dependent variable. In logistic regression, the dependent variable
is a binary variable that contains data coded as 1 (yes, success, etc.) or 0 (no, failure, etc.). In
other words, the logistic regression model predicts P(Y=1) as a function of X.
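How P(Y=1) is modelled as a function of X can be sketched with the sigmoid function and plain gradient descent. This is an illustrative sketch rather than a production implementation: the function names, learning rate, iteration count and the toy "hours studied vs. passed" data are all assumptions.

```python
import numpy as np

def sigmoid(z):
    """Map any real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic_regression(X, y, learning_rate=0.1, n_iters=5000):
    """Fit weights [bias, w1, ...] by gradient descent on the log-loss."""
    X = np.column_stack([np.ones(len(X)), np.asarray(X, dtype=float)])
    y = np.asarray(y, dtype=float)
    weights = np.zeros(X.shape[1])
    for _ in range(n_iters):
        predictions = sigmoid(X @ weights)
        # Gradient of the average log-loss with respect to the weights.
        gradient = X.T @ (predictions - y) / len(y)
        weights -= learning_rate * gradient
    return weights

def predict_proba(weights, X):
    """Return P(Y=1) for each row of X."""
    X = np.column_stack([np.ones(len(X)), np.asarray(X, dtype=float)])
    return sigmoid(X @ weights)

# Made-up data: hours studied and whether the student passed (1) or failed (0).
hours = [[1], [2], [3], [4], [5], [6]]
passed = [0, 0, 0, 1, 1, 1]
weights = fit_logistic_regression(hours, passed)
probabilities = predict_proba(weights, [[1], [6]])
print(probabilities)
```

A predicted probability above 0.5 is usually read as class 1 and below 0.5 as class 0.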
OPENCV
OpenCV (Open Source Computer Vision Library) is released under a BSD license and hence
it is free for both academic and commercial use. It has C++, Python and Java interfaces and
supports Windows, Linux, Mac OS, iOS and Android. The library has more than 2500
optimized algorithms, including a comprehensive set of both classic and state-of-the-art
computer vision and machine learning algorithms.
CHAPTER-4 PROJECT HANDLED
Title: Livefeed Multicolor Detection Using OpenCV
This project works with a live-feed camera. The aim of the project is to find
the required contours or colors in real time using OpenCV functions. The project
was successful in removing noise from the live feed and processing it quickly enough to detect
skin color, red and blue in a real environment. In the future, this project aims to become an interactive
environment that could serve as a backend to help the visually impaired
or the color blind differentiate between colors using just a single camera. The
project runs quickly in real time and can detect contours in all cases.
This is the second project I did with OpenCV during my training period. In this project I had to detect
the lane or road on which the vehicle is travelling in real time. After detecting the road, I
had to find the vehicle's position with respect to the road. This project can be further
developed into a driver-alertness measure that alerts the driver if the car is leaving its lane.
It can also be used in mobile robotics, where the vehicle can be
brought back to the center of the lane using PID control.
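The lane-centering idea can be sketched with a textbook PID controller. Everything here is hypothetical and for illustration only: the class and function names, the gains, the pixel positions of the lane lines, and the toy model in which each correction shifts the vehicle sideways by the same number of pixels. The lane detection itself is not shown.

```python
class PIDController:
    """Textbook PID controller: output = kp*error + ki*integral + kd*derivative."""

    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.previous_error = 0.0

    def update(self, error, dt):
        self.integral += error * dt
        derivative = (error - self.previous_error) / dt
        self.previous_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

def lane_offset(left_x, right_x, frame_width):
    """Signed offset (pixels) of the image centre from the lane centre.

    Assumes the camera is mounted on the vehicle's centre line.
    """
    lane_center = (left_x + right_x) / 2.0
    return frame_width / 2.0 - lane_center

# Hypothetical detected lane lines at x=200 and x=520 in a 640-pixel-wide frame.
offset = lane_offset(left_x=200, right_x=520, frame_width=640)
initial_offset = offset

# Toy simulation: each correction shifts the vehicle by that many pixels.
pid = PIDController(kp=0.5, ki=0.0, kd=0.1)
for _ in range(30):
    correction = pid.update(error=-offset, dt=1.0)
    offset += correction

print(initial_offset, offset)
```

In a driver-alert setting the same offset could simply be compared against a threshold instead of being fed to a controller.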
CHAPTER-5 LEARNING AFTER TRAINING
After the much-needed training in machine learning, I moved on to learning Deep
Learning (DL) and have developed a strong interest in Image Processing. After the training I
decided to apply different algorithms such as YOLO, SSD and MobileNet, and then to move on to
the broader field of AI (Artificial Intelligence). Artificial Intelligence is the broader concept of
machines being able to carry out tasks in a way that we would consider “smart”. Artificial
intelligence, sometimes called machine intelligence, is intelligence demonstrated by
machines, in contrast to the natural intelligence displayed by humans and other animals. In
the near future I would like to do research on driverless cars and underwater robotics.
I want to study the different concepts of Computer Vision further and make more projects
on this topic.
CHAPTER-6 SUMMARY
This project results in the acquisition of theoretical and practical knowledge of the
subject and of the technology used to solve problems related to data,
operations and system automation from a much wider perspective using Machine
Learning. The field of machine learning has changed considerably. Because of new
computing technologies, machine learning today is not like machine learning of the past. It
was born from pattern recognition and the theory that computers can learn without being
programmed to perform specific tasks; researchers interested in artificial intelligence wanted
to see if computers could learn from data. The iterative aspect of machine learning is
important because as models are exposed to new data, they are able to independently adapt.
They learn from previous computations to produce reliable, repeatable decisions and results.
It is a science that is not new, but one that has gained fresh momentum.
While many algorithms exist, there is still a need to process the large volumes of data produced
daily, and further research on this topic can lead to much development. We can see some
publicized examples of machine learning applications:
Hence the use of machine learning is clearly visible in our daily life, and it has made our
lives easier. The project “Live feed Multicolor detection” can serve as an interactive
product, as a learning tool for toddlers, or as an assistance tool for color-blind
and visually impaired people.