
Submitted to:

Submitted by: Mansi Bhati (15ESKCS737)


Contents
 About Company
 Python for machine learning
• Python libraries for machine learning
 Machine Learning
• Types of Machine Learning
 Regression Models for Machine Learning
 Classification Models for Machine Learning
 Project: Spam SMS Detection
• What is spam SMS detection
• Webpage
• Wordcloud & Graph
 Future Scope
About Company
Forsk Technologies works as an industry partner, providing an end-to-end
solution that covers lab sessions, content development, execution of lab
assignments in the form of sub-projects, and evaluation of candidates.
The execution is focused on project-based learning. It is a dynamic classroom
approach in which students actively explore real-world problems and
challenges and acquire deeper knowledge.
The mission of this model is to revolutionize education and improve
learning outcomes through practical learning backed by data and
technology.
Python for Machine Learning
 Python is a powerful high-level, object-oriented programming language
created by Guido van Rossum.
 Python has been one of the fastest-growing programming languages since 2015.

Why Python?
 Python is fast to develop in compared with languages like Java and C++,
although its programs usually run more slowly.
 Unlike Java and C++, a Python program does not need a separate compile step
before running.
 The Python interpreter handles compilation to bytecode in the background.
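The background compilation step can be made visible with a minimal sketch that uses only the standard library. The variable names and the one-line program are illustrative assumptions, not part of the original slides.

```python
import dis

# Source code held as a plain string; nothing has been compiled yet.
source = "total = 2 + 3"

# compile() turns the source into a code object that contains bytecode.
# The interpreter normally performs this step automatically.
code_obj = compile(source, "<example>", "exec")

# Run the compiled bytecode in an empty namespace and read the result back.
namespace = {}
exec(code_obj, namespace)
print(namespace["total"])

# dis prints the bytecode instructions that the interpreter executes.
dis.dis(code_obj)
```

The exact instructions printed by dis differ between Python versions, so they are not reproduced here.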
Python Libraries for Machine Learning
Some basic Python libraries commonly used in machine learning are:
• Pandas
• NumPy
• Matplotlib
• scikit-learn (sklearn)
• NLTK
• OpenCV
What is Machine Learning
 Machine learning is a subset of artificial intelligence in the field of
computer science that often uses statistical techniques to give computers
the ability to "learn" (i.e., progressively improve performance on a specific
task) with data, without being explicitly programmed.

 Machine Learning can be divided into two types:


1. Supervised Machine Learning
2. Unsupervised Machine Learning
Types of Machine Learning
 Supervised learning is the machine learning task of learning a function that
maps an input to an output based on example input-output pairs.[1] It infers a
function from labeled training data consisting of a set of training examples. In
supervised learning, each example is a pair consisting of an input object
(typically a vector) and a desired output value (also called the supervisory
signal). Some basic methods:
• Regression
• Classification
 Unsupervised machine learning is the machine learning task of inferring a
function that describes the structure of "unlabeled" data (i.e. data that has not
been classified or categorized). A common method is:
• Clustering
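As a rough illustration of clustering on unlabeled data, the sketch below implements a one-dimensional k-means loop in plain Python. The function name kmeans_1d, the sample values and the starting centers are assumptions made for this example; a real project would more likely use a library such as scikit-learn.

```python
def kmeans_1d(values, centers, iterations=10):
    """Group 1-D values around the given starting centers (k-means sketch)."""
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        # Assignment step: attach each value to its nearest center.
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Update step: move each center to the mean of its cluster.
        # An empty cluster keeps its previous center.
        centers = [
            sum(c) / len(c) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return centers, clusters


# Hypothetical unlabeled data: no category is attached to any value.
values = [1.0, 1.2, 0.8, 9.0, 9.5, 8.5]
centers, clusters = kmeans_1d(values, centers=[0.0, 5.0])
print(centers)
print(clusters)
```

No labels are supplied anywhere: the two groups emerge only from the distances between the values.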
Regression Models for Supervised Learning
 Regression is a statistical approach to finding the relationship
between variables. In machine learning, it is used to predict the
outcome of an event based on the relationships between variables obtained
from the dataset.
 Some Regression models:
• Linear Regression
• Multiple Regression
• Polynomial Regression
• Decision Tree
• Random Forest
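The simplest of these models, linear regression with one input variable, can be sketched with the ordinary least-squares formulas. The function name fit_line and the sample points are assumptions for illustration only.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums (the common 1/n factor cancels).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical training pairs that follow y = 2x + 1 exactly.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]

slope, intercept = fit_line(xs, ys)

# Predict the outcome for an input that was not in the training data.
prediction = slope * 6 + intercept
print(slope, intercept, prediction)
```

Multiple and polynomial regression extend the same idea to several input variables and to powers of the input.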
Classification Models for Supervised Learning
 In machine learning and statistics, classification is the problem of
identifying to which of a set of categories (sub-populations) a new
observation belongs, on the basis of a training set of data containing
observations (or instances) whose category membership is known.
 Some Classification Models:
• Logistic Regression
• KNN
• Decision Tree
• Random Forest
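As one example from this list, KNN (k-nearest neighbours) assigns a new observation to the category that is most common among its k closest training points. The sketch below is a plain-Python version; the function name knn_predict, the two-feature points and the labels "A" and "B" are illustrative assumptions.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Classify query by majority vote among its k nearest training points."""
    # Sort the labeled examples by Euclidean distance to the query point.
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    # Count the labels of the k closest examples and return the most common.
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Hypothetical training set: (features, known category).
train = [
    ((1.0, 1.0), "A"), ((1.5, 2.0), "A"), ((2.0, 1.5), "A"),
    ((6.0, 6.0), "B"), ((6.5, 7.0), "B"), ((7.0, 6.5), "B"),
]

print(knn_predict(train, (1.2, 1.4)))
print(knn_predict(train, (6.4, 6.6)))
```

math.dist requires Python 3.8 or later.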
Figure: Linear Regression in Supervised Machine Learning
Figure: Clustering in Unsupervised Machine Learning
Figure: Classification in Supervised Machine Learning
PROJECT

Connecting Hearts
What is Connecting Hearts
 The project analyzes data on the basic dating habits of individuals. The
project is 'data-centric', i.e. all of the analysis, results and conclusions
are based on the provided data.
 All the participants play an important role in revealing the basic ideas
they have in mind while searching for a mate and the odds of finding love
as desired.
 The project was created in three phases:
o Data pre-processing
o Analysis
o Visualisation
Dataset
 The project "CONNECTING HEARTS" is the product of an analysis of a
dataset compiled by Columbia Business School professors Ray Fisman and
Sheena Iyengar for their paper Gender Differences in Mate Selection:
Evidence from a Speed Dating Experiment.
The dataset gives an overview of what dating is all about, as it was
collected through a very large-scale survey in which all the participants
were asked to rate six attributes: Attractiveness, Sincerity, Intelligence,
Fun, Ambition and Shared Interests.
VISUALISATION:

What Are Participants Looking For in Their Matches


First, we would like to see what the participants in these speed dating
events look for in the opposite sex, and whether there is a difference
between male and female participants. At this point in time, the participants
have just signed up for the event and have not met anyone.
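A comparison of this kind can be sketched by averaging each attribute rating separately for male and female participants. The field names, the rating scale and the four rows below are made-up assumptions for illustration; they are not values from the actual dataset.

```python
from collections import defaultdict

# The six attributes named in the dataset description (key names are assumed).
ATTRIBUTES = [
    "attractiveness", "sincerity", "intelligence",
    "fun", "ambition", "shared_interests",
]


def mean_ratings_by_gender(rows):
    """Average every attribute rating separately for each gender."""
    totals = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    for row in rows:
        counts[row["gender"]] += 1
        for attr in ATTRIBUTES:
            totals[row["gender"]][attr] += row[attr]
    return {
        gender: {attr: totals[gender][attr] / counts[gender] for attr in ATTRIBUTES}
        for gender in counts
    }


# Made-up example rows, one per participant.
rows = [
    {"gender": "female", "attractiveness": 20, "sincerity": 20, "intelligence": 20,
     "fun": 15, "ambition": 15, "shared_interests": 10},
    {"gender": "female", "attractiveness": 10, "sincerity": 20, "intelligence": 30,
     "fun": 15, "ambition": 15, "shared_interests": 10},
    {"gender": "male", "attractiveness": 30, "sincerity": 15, "intelligence": 20,
     "fun": 20, "ambition": 5, "shared_interests": 10},
    {"gender": "male", "attractiveness": 40, "sincerity": 15, "intelligence": 20,
     "fun": 10, "ambition": 5, "shared_interests": 10},
]

means = mean_ratings_by_gender(rows)
print(means["female"]["attractiveness"], means["male"]["attractiveness"])
```

The resulting per-gender averages are the kind of numbers a grouped bar chart would then display.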
What Do Participants Think the Opposite Sex is
Looking For
Here we analyze what participants think the opposite sex
is looking for. We will be able to see if there is any difference
in the expectations of men and women with regard to the
speed dating event.
HOW OFTEN DO THEY GO OUT

Here we analysed how often the participants go out. Some people go out about
once a month, some go out several times a week, while others almost never go out.
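One simple way to summarise such answers is to count how often each category occurs. The answers in this sketch are invented for illustration and the category labels are assumptions based on the examples above.

```python
from collections import Counter

# Invented survey answers, one per participant.
responses = [
    "once a month", "several times a week", "almost never",
    "several times a week", "once a month", "several times a week",
]

# Count the occurrences of each answer.
counts = Counter(responses)

# Convert the counts into shares of all answers.
shares = {answer: n / len(responses) for answer, n in counts.items()}

for answer, n in counts.most_common():
    print(answer, n, shares[answer])
```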
IN WHICH ACTIVITIES THEY ARE INTERESTED

Here we analyzed which activities our participants are most interested in. Every participant was asked to rate
a list of activities out of 10. The ratings are then visualized.
Conclusion
 In the dataset, we found that the typical length of a spam message is 145 to 160
characters, although exceptions were found.
 The typical word count of an SMS was found to be 22 to 34 words, although
exceptions were found.
 Common keywords such as Free, Off, Call, Now, Win and many more were
found after applying natural language processing.
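The three signals mentioned here (message length, word count and keywords) can be extracted with a short sketch. It uses a plain regular expression for tokenising rather than NLTK, and the keyword set, the function name sms_features and the sample message are assumptions for illustration.

```python
import re

# Keywords taken from the list above; the real keyword list was longer.
SPAM_KEYWORDS = {"free", "off", "call", "now", "win"}


def sms_features(message):
    """Return the length, word count and matched keywords of one message."""
    # Lower-case the text and split it into simple word tokens.
    words = re.findall(r"[a-z0-9']+", message.lower())
    return {
        "length": len(message),
        "word_count": len(words),
        "keywords": sorted(SPAM_KEYWORDS.intersection(words)),
    }


# Invented example message.
features = sms_features("WIN a FREE prize now! Call 12345 to claim.")
print(features)
```

Features like these could then be passed to any of the classification models listed earlier.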

Figure: Confusion Matrix (By Length)
Figure: Confusion Matrix (By Word Count)
Figure: Confusion Matrix (NLTK)
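A confusion matrix compares the predicted labels with the actual ones, and accuracy is the share of messages on its diagonal. The sketch below shows the calculation for a two-class spam/ham problem; the label lists are invented and do not reproduce the project's results.

```python
def confusion_matrix(actual, predicted, positive="spam"):
    """Count true/false positives and negatives for a two-class problem."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if a == positive and p == positive:
            tp += 1          # spam correctly flagged
        elif a != positive and p == positive:
            fp += 1          # normal message wrongly flagged as spam
        elif a == positive and p != positive:
            fn += 1          # spam that was missed
        else:
            tn += 1          # normal message correctly passed
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn}


def accuracy(cm):
    """Share of all messages that were classified correctly."""
    return (cm["tp"] + cm["tn"]) / sum(cm.values())


# Invented labels for eight messages.
actual = ["spam", "ham", "spam", "ham", "ham", "spam", "ham", "ham"]
predicted = ["spam", "ham", "ham", "ham", "spam", "spam", "ham", "ham"]

cm = confusion_matrix(actual, predicted)
print(cm, accuracy(cm))
```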
Future Scope
 In future, we will try to connect it with the inboxes of mobile phones and other
messengers.
 We will try to improve the accuracy, which is currently 89.2%.