REPORT OF INDUSTRIAL TRAINING

Undergone at

A TRAINING REPORT
ON

MACHINE LEARNING USING PYTHON

Submitted by

NAME: RAHIL BHOWAL

REG NO. RA1711003010386

In partial fulfilment of requirements of

15CS390L – Industrial Training

Under the guidance of Mrs Mousita Dhar (Sr. Technical Consultant)

Department of Computer Science and Engineering


Faculty of Engineering and Technology
SRM Institute of Science and Technology,
Kattankulathur – 603203.
JUNE-2019.
ACKNOWLEDGMENT
It is a matter of great pleasure and privilege for me to present this report of industrial training.
Through this report, I would like to thank numerous people for their guidance and support in
making this report.

I would like to thank Mrs Mousita Dhar (Senior Technical Consultant) at Webtek Labs Pvt
Ltd. I would also like to thank Dr B. Amutha, HOD-CSE, who gave encouragement and
continued support throughout the training. Next, I would like to thank Mrs P. Visalakshi,
Academic Adviser, for the sustained interest she showed throughout.

I would also like to thank Mr M.Karthikeyan, Faculty Advisor for the continued guidance and
support he provided.

RAHIL BHOWAL

(RA1711003010386)

CSE

INDEX

CHAPTER NO. TITLE

BONAFIDE CERTIFICATE

ACKNOWLEDGMENT

1. INTRODUCTION ABOUT THE INDUSTRY

2. TRAINING SCHEDULE

3. OBSERVATIONS/WORK DONE

4. PROJECT HANDLED

5. LEARNING AFTER TRAINING

6. SUMMARY
CHAPTER-1 INTRODUCTION ABOUT THE INDUSTRY

WebTek Labs Pvt. Ltd. is recognized as a leading IT solution providing organization with a
dynamic and fast growing team of diversely talented individuals. Incorporated in 2001 with the
aim of providing the best talent, we initially started with Recruitment & Staffing Services. We
paralleled this by providing knowledge and skill development certification programmes. The
Webtek Certified Tester (WCT) Program, which aims to provide IT companies with trained
software testers, has gained wide recognition over the years.

WebTek Labs is a leading independent outsourced software testing service provider
headquartered in New Delhi, India. Our broad spectrum of testing services caters to the
current and upcoming needs of customers across diversified industry verticals such as
technology, healthcare, banking and finance, telecom, manufacturing and retail. Our key
strength lies in our continuous focus on setting a benchmark in delivery to customers
by using our dedicated pool of expert resources. The core line of activity at Webtek Labs is
to develop customized application software, covering the entire responsibility of performing
the initial system study, design, development, implementation and training.

Website: http://www.webteklabs.com

Industry: Computer Software

Company size: 51-200 employees

Type: Privately Held

Founded: 2001

Specialties: Testing Services, Recruitment, and Training

CHAPTER-2 TRAINING SCHEDULE
During my 24-day training period at WebTek Labs Pvt Ltd, we were introduced to machine
learning and its applications in various fields. The main purpose was first to learn
machine learning; I was then given several projects to handle in image processing as well
as machine learning. The detailed schedule of the training is as follows:

WEEK 1:

In the first week of my training I was taught the basics of Python and Machine Learning. The
difference between Artificial Intelligence (AI), Deep Learning and Machine Learning was
explained, followed by a thorough introduction to Python and its use in
Machine Learning. Based on my understanding I also made a game using the pygame module in
Python. The overview of the first week is as follows:

• Introduction to Python
• Uses of Python
• Basics in Python
• What is Machine Learning?
• Machine Learning Methods
• Supervised Learning
• Unsupervised Learning
• Machine Learning Packages in Python
i. NumPy
ii. Pandas
iii. Scikit-learn
iv. Matplotlib
v. Seaborn
WEEK 2:

In the second week of training, after the first week of basics, I started working on data
preprocessing and regression algorithms. Data preprocessing is a very important step before any
data processing or analysis. It involves handling missing values, categorical variables,
dummy variables, label encoding, scaling, etc. I also applied linear regression and multiple
regression to various datasets. The overview of week 2 is:

• Data Preprocessing
• Overfitting and the bias variance tradeoff
• Combining models
• Regularization
• Linear Regression
• Multiple Regression

WEEK 3:

The third week of training was really interesting. We were introduced to classification
algorithms such as K-Nearest Neighbours (KNN), Logistic Regression, SVM (Support Vector
Machine), etc. We were also introduced to image processing using OpenCV, on which I later
based my final project.

• Classification
i. KNN
ii. Logistic Regression
iii. SVM (Support Vector Machine)
• OpenCV
• Final Project

21-24 June 2019:

In the final days of my training I was asked to submit a project for the internship completion. I
chose image processing as the field for my final project.

CHAPTER-3 OBSERVATIONS
MACHINE LEARNING:-

Machine Learning is the field of study that gives computers the ability to learn without being
explicitly programmed. It is an approach to developing artificial
intelligence and a method of data analysis that automates analytical model building.
As an application of artificial intelligence (AI), it provides systems the
ability to automatically learn and improve from experience. The process of learning begins
with observations or data, such as examples,
direct experience, or instruction, in order to look for patterns in data and make better
decisions in the future based on the examples that we provide. The primary aim is to allow
computers to learn automatically without human intervention or assistance and adjust
actions accordingly.

Broadly, there are 3 types of Machine Learning algorithms:

1. Supervised Learning

A supervised algorithm has a target / outcome variable (or dependent variable) which is to be
predicted from a given set of predictors (independent variables). Using this set of variables, we
generate a function that maps inputs to desired outputs. The training process continues until the
model achieves a desired level of accuracy on the training data. Examples of Supervised
Learning: Regression, Decision Tree, Random Forest, KNN, Logistic Regression etc.

2. Unsupervised Learning

In unsupervised learning, we do not have any target or outcome variable to predict / estimate. It is used
for clustering a population into different groups, which is widely used for segmenting customers into
different groups for specific interventions. Examples of Unsupervised Learning: Apriori
algorithm, K-means.

3. Reinforcement Learning

In reinforcement learning, the machine is trained to make specific decisions. It works this way: the
machine is exposed to an environment where it trains itself continually using trial and error. The
machine learns from past experience and tries to capture the best possible knowledge to make
accurate business decisions. Example of Reinforcement Learning: Self Driving Cars
SUPERVISED LEARNING:-

• Regression
• Classification

REGRESSION

This is a type of problem where we need to predict a continuous response value. A
continuous output variable is a real value, such as an integer or floating point value. These
are often quantities, such as amounts and sizes. For example, what is the price of a house in a
specific city? What is the value of a stock? How many total runs will be scored in a
cricket game? A regression problem requires the prediction of a quantity. A regression can
have real-valued or discrete input variables. A problem with multiple input variables is often
called a multiple (or, loosely, multivariate) regression problem.

CLASSIFICATION

This is a type of problem where we predict a categorical response value, where the data can
be separated into specific “classes”. The output variables are often called labels or categories.
The mapping function predicts the class or category for a given observation. For example, an
email can be classified as belonging to one of two classes: “spam” and “not spam”. Similarly, a
face can be classified as a human face or an animal face. A classification problem requires that examples be
classified into one of two or more classes. A classification can have real-valued or discrete
input variables. A problem with two classes is often called a two-class or binary classification
problem. A problem with more than two classes is often called a multi-class classification
problem. A problem where an example is assigned multiple classes is called a multi-label
classification problem.

UNSUPERVISED LEARNING

Unsupervised Learning is a class of Machine Learning techniques used to find patterns in data.
The program is given a bunch of data and must find patterns and relationships therein. The
training data does not include targets, so we do not tell the system where to go; the
system has to work out the structure itself from the data we give.

E.g. on an e-commerce site, products are grouped together by category. Similarly, given a
cricket dataset containing runs scored and wickets taken, we can group the players into
bowlers and batsmen based on those values.

TYPES OF UNSUPERVISED LEARNING

Clustering algorithms will run through your data and find natural clusters if they exist.
For your customers, that might mean one cluster of 30-something artists and another of
dog-owning millennials. You can typically modify how many clusters your algorithm looks for,
which lets you adjust the granularity of these groups. There are a few different types of
clustering you can utilize:

K-Means Clustering – clustering your data points into a number (K) of mutually exclusive
clusters. A lot of the complexity surrounds how to pick the right number for K.

Hierarchical Clustering – clustering your data points into parent and child clusters. You
might split your customers between younger and older ages, and then split each of those
groups into their own individual clusters as well.

Probabilistic Clustering – clustering your data points into clusters on a probabilistic scale.
K-Means is a special case where the probabilities are always either 0 or 1. Probabilistic
clustering is also sometimes called Fuzzy K-Means.
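
The K-Means idea described above can be sketched in a few lines of plain Python. This is only a minimal illustration, not the implementation used during the training: the function name `kmeans`, the toy `points` and the fixed iteration count are assumptions made for the example. In practice a library routine such as scikit-learn's `sklearn.cluster.KMeans` would normally be used.

```python
import random

def squared_distance(p, q):
    """Squared Euclidean distance between two 2-D points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2

def kmeans(points, k, iterations=20, seed=0):
    """Cluster 2-D points into k mutually exclusive groups (hypothetical helper)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        for i, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                centroids[i] = (sum(m[0] for m in members) / len(members),
                                sum(m[1] for m in members) / len(members))
    return centroids, clusters

# Toy data: two visibly separate groups
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centroids, clusters = kmeans(points, k=2)
print(centroids)
print(clusters)
```

Changing `k` changes the granularity of the grouping, which is the "how to pick the right number for K" question mentioned above.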

SUPERVISED VS UNSUPERVISED

In supervised learning, the system tries to learn from the previous examples that are given.
In unsupervised learning, on the other hand, the system attempts to find patterns directly
from the examples given. So if the dataset is labelled it is a supervised problem; if
the dataset is unlabelled it is an unsupervised problem.

MACHINE LEARNING PACKAGES IN PYTHON:-

Python has many modules for different jobs. Some modules are mainly used for
machine learning, and those are the ones used here. Examples of these
packages or modules are listed below:

• NumPy (used for numeric calculation)
• Pandas (used for data analysis)
• Scikit-learn, imported as sklearn (used for machine learning algorithms)
• Matplotlib (used for plotting graphs)
• Seaborn (used for statistical data visualization, built on Matplotlib)

NUMPY

• NumPy’s main object is the homogeneous multidimensional array.
• It is a table of elements (usually numbers), all of the same type, indexed by a tuple of
non-negative integers.
• In NumPy dimensions are called axes.
• The number of axes is called the rank (available as the ndim attribute).

Example

# Import the package
import numpy as np

# Create the array
arr = np.array([4, 6, 7, 8])
print(arr)

# Print the number of array dimensions (axes)
print("No. of dimensions: ", arr.ndim)

# Print the shape of the array
print("Shape of array: ", arr.shape)

# Print the size (total number of elements) of the array
print("Size of array: ", arr.size)

# Print the type of the elements in the array
print("Array stores elements of type: ", arr.dtype)

PANDAS

Create Series:-

A pandas Series can be created with the constructor below, where data can take various
forms such as an ndarray, a list, or a constant:

pandas.Series(data, index, dtype, copy)

import pandas as pd

s = pd.Series([1,2,3,4,5],index = ['a','b','c','d','e'])

# retrieve the first element (by position)

print(s.iloc[0])

import pandas as pd

s = pd.Series([1,2,3,4,5],index = ['a','b','c','d','e'])

# retrieve the first three elements (by position)

print(s.iloc[:3])

Create DataFrame

A pandas DataFrame can be created using various inputs like –

• Lists
• dict
• Series
• NumPy ndarrays

# creating a DataFrame from a dictionary
import pandas as pd

data = {'Name': ['mou', 'abc', 'Sush', 'rishi'], 'Age': [45, 78, 23, 45]}

# default integer index
df = pd.DataFrame(data)

# same data with custom row labels
df = pd.DataFrame(data, index=['rank1', 'rank2', 'rank3', 'rank4'])

MATPLOTLIB

Matplotlib is a 2D plotting library which produces publication-quality figures in a variety of
hardcopy formats and interactive environments. It is used to make histograms, scatter plots,
bar plots, etc.

from matplotlib import pyplot as plt

x = [5,2,7]

y = [2,16,4]

plt.plot(x,y)

plt.title('Info')

plt.ylabel('Y axis')

plt.xlabel('X axis')

plt.show()

Data in Machine Learning:-

Data are the building blocks of any machine learning algorithm. The data are divided into
two parts:

• Training data
• Test data

The training set contains known outputs, and the model learns on this data so that it can
generalize to other data later on. The test dataset (or subset) is held back so that we can
check the model’s predictions on data it has not seen.
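
A minimal sketch of such a split in plain Python is given below. The helper name `train_test_split`, the 25% test fraction and the toy `data` are assumptions for illustration only; scikit-learn offers a ready-made function of the same name in `sklearn.model_selection`.

```python
import random

def train_test_split(rows, test_fraction=0.25, seed=42):
    """Shuffle the rows and hold back a fraction of them as test data."""
    rows = list(rows)                      # copy so the caller's list is untouched
    random.Random(seed).shuffle(rows)      # fixed seed keeps the split repeatable
    n_test = int(round(len(rows) * test_fraction))
    return rows[n_test:], rows[:n_test]    # (training data, test data)

# Toy dataset of (input, known output) pairs
data = [(x, 2 * x + 1) for x in range(20)]
train, test = train_test_split(data, test_fraction=0.25)
print(len(train), "training rows,", len(test), "test rows")
```

Shuffling before splitting matters when the rows are stored in some order, since otherwise the test set may not resemble the training set.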
Data Preprocessing:-

• Preprocessing refers to the transformations applied to our data before feeding it to the
algorithm.
• Data preprocessing is a technique used to convert raw data into a clean data
set. Whenever data is gathered from different sources, it is collected
in a raw format that is not suitable for analysis.
• To achieve better results from the applied model in Machine Learning projects, the
data has to be in a proper format. Some Machine Learning
models need information in a specified format. For example, many Random Forest
implementations do not support null values, so null values
have to be handled in the original raw data set before the algorithm is run.

The most commonly used preprocessing techniques include:

• cleaning,
• missing value imputation,
• encoding categorical variables,
• scaling, etc.
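
Three of the techniques listed above can be sketched in plain Python as follows. The helper names (`impute_mean`, `label_encode`, `min_max_scale`) and the sample values are hypothetical and chosen only for illustration; libraries such as pandas and scikit-learn provide more complete versions of each step.

```python
def impute_mean(values):
    """Replace missing entries (None) with the mean of the known values."""
    known = [v for v in values if v is not None]
    mean = sum(known) / len(known)
    return [mean if v is None else v for v in values]

def label_encode(categories):
    """Map each distinct category to an integer code, in sorted order."""
    codes = {c: i for i, c in enumerate(sorted(set(categories)))}
    return [codes[c] for c in categories], codes

def min_max_scale(values):
    """Rescale values linearly to the range [0, 1]; assumes max > min."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

ages = impute_mean([20, None, 40, 60])    # the missing age becomes 40.0
cities, mapping = label_encode(["Delhi", "Chennai", "Delhi", "Kolkata"])
scaled_ages = min_max_scale(ages)

print(ages)
print(cities, mapping)
print(scaled_ages)
```

Label encoding implies an order between categories, so for unordered categories a dummy (one-hot) encoding is often preferred.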

Regression:-

Regression is a method of modelling a target value based on independent predictors. This
method is mostly used for forecasting and finding out cause and effect relationships between
variables. Regression techniques mostly differ based on the number of independent variables
and the type of relationship between the independent and dependent variables. Two
types of regression are covered here:

• Simple Linear Regression
• Multiple Linear Regression

Simple Linear Regression:-

Simple linear regression is a type of regression analysis where the number of independent
variables is one and there is a linear relationship between the independent (x) and
dependent (y) variable. The straight line fitted through the data points (drawn in red on a
plot of the data) is referred to as the best fit straight
line. Based on the given data points, we try to plot a line that models the points the best. The
line can be modelled based on the linear equation shown below.

y = a_0 + a_1 * x ## Linear Equation

This is a line where y is the output variable we want to predict, x is the input variable we
know and a_0 and a_1 are coefficients that we need to estimate that move the line around.

Technically, a_0 is called the intercept because it determines where the line intercepts the y-
axis. In machine learning we can call this the bias, because it is added to offset all predictions
that we make. The a_1 term is called the slope because it defines the slope of the line or how
x translates into a y value before we add our bias.

The goal is to find the best estimates for the coefficients to minimize the errors in predicting
y from x. We can start off by estimating the value for a_1 as:

a_1 = sum((xi - mean(x)) * (yi - mean(y))) / sum((xi - mean(x))^2)

Once a_1 is known, the intercept follows as:

a_0 = mean(y) - a_1 * mean(x)
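
The estimate for a_1 can be checked with a short plain-Python sketch. The function name `fit_simple_linear_regression` and the toy data are assumptions for illustration; the intercept is computed with the standard least-squares result a_0 = mean(y) - a_1 * mean(x).

```python
def fit_simple_linear_regression(xs, ys):
    """Return (a_0, a_1) for the least-squares line y = a_0 + a_1 * x."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Slope: covariance of x and y divided by the variance of x
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    denominator = sum((x - mean_x) ** 2 for x in xs)
    a_1 = numerator / denominator
    # Intercept: the fitted line passes through (mean_x, mean_y)
    a_0 = mean_y - a_1 * mean_x
    return a_0, a_1

# Toy data lying exactly on y = 1 + 2x
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]
a_0, a_1 = fit_simple_linear_regression(xs, ys)
print("intercept:", a_0, "slope:", a_1)
```

Because the toy points lie exactly on a line, the fit recovers that line; with noisy data the same formulas give the line with the smallest sum of squared errors.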

Multiple Linear Regression:-

The multiple linear regression explains the relationship between one dependent variable(y)
and two or more independent variables (x1, x2, x3… etc).

Common ways of selecting which features to include are:

• Forward Selection
• Backward Elimination
• Bidirectional Elimination
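
As a rough sketch of multiple linear regression itself (feature selection is not shown), the coefficients can be estimated by least squares with NumPy. The toy data below is generated from an assumed relationship y = 1 + 2*x1 + 3*x2 purely for illustration.

```python
import numpy as np

# Toy data: two independent variables (x1, x2) per row
X = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [3.0, 4.0],
              [4.0, 3.0],
              [5.0, 5.0]])
# Dependent variable generated from the assumed relationship, without noise
y = 1.0 + 2.0 * X[:, 0] + 3.0 * X[:, 1]

# Prepend a column of ones so the first coefficient acts as the intercept
X_design = np.column_stack([np.ones(len(X)), X])

# Least-squares solution of X_design @ coeffs = y
coeffs, *_ = np.linalg.lstsq(X_design, y, rcond=None)
print("intercept, x1 coefficient, x2 coefficient:", coeffs)
```

Each coefficient describes how y changes with one variable while the others are held fixed, which is what the selection methods above use when deciding whether a feature is worth keeping.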

Machine learning classifiers

Classification is the process of predicting the class of given data points. Classes are
sometimes called targets, labels or categories. Classification predictive modeling is the
task of approximating a mapping function (f) from input variables (X) to discrete output
variables (y).

For example, spam detection in email service providers can be identified as a classification
problem. This is a binary classification since there are only 2 classes: spam and not spam. A
classifier utilizes some training data to understand how given input variables relate to the
class. In this case, known spam and non-spam emails have to be used as the training data.
Once the classifier is trained accurately, it can be used to classify an unseen email.

KNN (K Nearest Neighbour) classifier:-

KNN can be used for both classification and regression predictive problems. However, it is
more widely used in classification problems in the industry. To evaluate any technique we
generally look at 3 important aspects:

• Ease of interpreting the output
• Calculation time
• Predictive power

KNN algorithm

Let’s take a simple case to understand this algorithm. Consider a spread of red circles
(RC) and green squares (GS) on a plane.

[Figure: scatter of red circles (RC) and green squares (GS) with a blue star (BS)]

We intend to find out the class of the blue star (BS). BS can either be RC or GS and nothing
else. The “K” in the KNN algorithm is the number of nearest neighbors we wish to take a vote from. Let’s say
K = 3. Hence, we will now make a circle with BS as center, just big enough to enclose only three
data points on the plane. BS is then assigned the class held by the majority of these three neighbors.

[Figure: circle centred on BS enclosing its three nearest data points]
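
The voting step can be sketched in plain Python. The coordinates, the labels "RC" and "GS" and the helper name `knn_predict` are made up for this example; `math.dist` requires Python 3.8 or later.

```python
import math
from collections import Counter

def knn_predict(training_points, query, k=3):
    """Return the majority label among the k training points nearest to query."""
    # Sort the labelled points by Euclidean distance to the query point
    by_distance = sorted(training_points, key=lambda item: math.dist(item[0], query))
    # Count the labels of the k closest points and pick the most common one
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Hypothetical layout: red circles near the origin, green squares further out
training_points = [((1, 1), "RC"), ((2, 1), "RC"), ((1, 2), "RC"),
                   ((6, 6), "GS"), ((7, 6), "GS"), ((6, 7), "GS")]

blue_star = (2, 2)
prediction = knn_predict(training_points, blue_star, k=3)
print("Blue star is classified as:", prediction)
```

An odd K is a common choice for two-class problems because it avoids tied votes.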

Logistic Regression

Logistic Regression is a Machine Learning classification algorithm that is used to predict the
probability of a categorical dependent variable. In logistic regression, the dependent variable
is a binary variable that contains data coded as 1 (yes, success, etc.) or 0 (no, failure, etc.). In
other words, the logistic regression model predicts P(Y=1) as a function of X.
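
For a single input variable, P(Y=1) is usually modelled with the sigmoid (logistic) function applied to a linear score. The sketch below assumes that form; the coefficients `b_0` and `b_1` are hypothetical values chosen by hand, not fitted to any data.

```python
import math

def predict_probability(x, b_0, b_1):
    """P(Y=1 | x): the sigmoid of the linear score b_0 + b_1 * x."""
    return 1.0 / (1.0 + math.exp(-(b_0 + b_1 * x)))

def predict_class(x, b_0, b_1, threshold=0.5):
    """Turn the probability into a 0/1 label using a decision threshold."""
    return 1 if predict_probability(x, b_0, b_1) >= threshold else 0

# Hypothetical coefficients: the decision boundary sits at x = 4
b_0, b_1 = -4.0, 1.0

for x in (2, 4, 6):
    print(x, round(predict_probability(x, b_0, b_1), 3), predict_class(x, b_0, b_1))
```

The sigmoid keeps the output between 0 and 1, which is why it can be read as a probability, while the threshold converts it into the binary class.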

OPENCV

OpenCV (Open Source Computer Vision Library) is released under a BSD license and hence
it’s free for both academic and commercial use. It has C++, Python and Java interfaces and
supports Windows, Linux, Mac OS, iOS and Android. The library has more than 2500
optimized algorithms, which include a comprehensive set of both classic and state-of-the-art
computer vision and machine learning algorithms.

CHAPTER-4 PROJECT HANDLED
Title: Livefeed Multicolor Detection Using OpenCV

This project works with the help of a live-feed camera. The aim of the project is to find
the necessary contours or colors in real time using OpenCV functions. The project
was successful in removing noise from the real-time feed and processing it quickly enough to detect
skin color, red and blue in a real environment. In the future the project aims to become an interactive
environment that could serve as the backend for helping the visually impaired
or the color blind to differentiate between colors using just a single camera. In
real time the project is quick and can detect contours in all cases.
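
The core idea of color detection, thresholding each pixel in HSV space, can be illustrated without a camera. The sketch below is not the project's code: it uses the standard-library `colorsys` module instead of OpenCV, and the hue ranges, thresholds and helper names are assumptions for illustration. With OpenCV the same step is typically done with `cv2.cvtColor` and `cv2.inRange` on each frame.

```python
import colorsys

# Hypothetical hue ranges in degrees (0-360); red wraps around 0
HUE_RANGES = {"red": [(0, 15), (345, 360)], "blue": [(200, 260)]}

def classify_pixel(r, g, b, min_saturation=0.4, min_value=0.3):
    """Return the name of the matching color range, or None if nothing matches."""
    h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
    if s < min_saturation or v < min_value:
        return None  # too grey or too dark to be called a color
    hue_deg = h * 360
    for name, ranges in HUE_RANGES.items():
        if any(lo <= hue_deg <= hi for lo, hi in ranges):
            return name
    return None

def color_mask(frame, target):
    """Binary mask: 1 where a pixel matches the target color, else 0."""
    return [[1 if classify_pixel(*pixel) == target else 0 for pixel in row]
            for row in frame]

# Tiny synthetic 2x2 "frame" of RGB pixels standing in for a camera image
frame = [[(255, 0, 0), (0, 0, 255)],
         [(128, 128, 128), (10, 10, 10)]]
red_mask = color_mask(frame, "red")
blue_mask = color_mask(frame, "blue")
print(red_mask, blue_mask)
```

Working in HSV rather than RGB separates the hue from brightness, which makes the thresholds less sensitive to lighting changes.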

Title: Lane Detection Using OpenCV

This is the second project I did with OpenCV during my training period. In this project I had to detect
the lane or road on which the vehicle is travelling in real time. After detecting the road, I
had to find the vehicle's position with respect to the road. This project can be further
developed into a driver alertness measure, where the driver is alerted if the car drifts out of its lane.
It can also be used in mobile robotics, where the vehicle can be
brought back to the center of the lane using PID control.
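
How PID control could bring a vehicle back to the lane center can be sketched as below. This is a toy illustration only: the class name `PIDController`, the gains and the very simple motion model (steering directly changes the lateral offset) are all assumptions, not part of the project.

```python
class PIDController:
    """Textbook PID controller: output = kp*error + ki*integral + kd*derivative."""

    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.previous_error = 0.0

    def update(self, error, dt):
        self.integral += error * dt                       # accumulated error
        derivative = (error - self.previous_error) / dt   # rate of change of error
        self.previous_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

# Toy simulation: offset is the lateral distance from the lane center (metres)
pid = PIDController(kp=0.8, ki=0.0, kd=0.1)   # hypothetical gains
offset, dt = 1.0, 0.1
for _ in range(100):
    steering = pid.update(error=-offset, dt=dt)   # steer against the offset
    offset += steering * dt                        # simplistic motion model
print("remaining offset:", offset)
```

In a real system the offset would come from the detected lane position in each frame, and the gains would need tuning against the actual vehicle dynamics.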

CHAPTER-5 LEARNING AFTER TRAINING
After the much needed training in machine learning I moved on to learning Deep
Learning (DL) and have developed a strong interest in Image Processing. After the training I
decided to apply different algorithms such as YOLO, SSD and MobileNet, and then move on to
the broader field of AI (Artificial Intelligence). Artificial Intelligence is the broader concept of
machines being able to carry out tasks in a way that we would consider “smart”. Artificial
intelligence, sometimes called machine intelligence, is intelligence demonstrated by
machines, in contrast to the natural intelligence displayed by humans and other animals. In
the near future I would like to do research on driverless cars and underwater robotics.

I want to further study the different concepts of Computer Vision and make further projects
on this topic.

CHAPTER-6 SUMMARY
This training resulted in the acquisition of theoretical and practical knowledge about the
subject and about the technology, directed toward solving problems related to data and
operations or the automation of systems in a much wider perspective using Machine
Learning. The field of machine learning has changed considerably. Because of new
computing technologies, machine learning today is not like machine learning of the past. It
was born from pattern recognition and the theory that computers can learn without being
programmed to perform specific tasks; researchers interested in artificial intelligence wanted
to see if computers could learn from data. The iterative aspect of machine learning is
important because as models are exposed to new data, they are able to independently adapt.
They learn from previous computations to produce reliable, repeatable decisions and results.
It’s a science that’s not new, but one that has gained fresh momentum.

Although many algorithms exist, there is still a need to process the large volumes of data produced
daily, and further research on this topic can lead to much development. Some
well-publicized examples of machine learning applications are:

• Self Driving Car
• Recommendation System
• Email spam filtering, etc.

Hence the use of machine learning is clearly visible in our daily life, and it has made our
lives easier. The project “Live feed Multicolor detection” can serve as an interactive
product, as a learning tool for toddlers, or as an assistance tool for color blind
and visually impaired people.
