

Getting a job means standing out from the crowd. The corporate world demands a lot from
applicants, so applicants put in their best effort, which in turn raises the level of
difficulty for everyone. Everything is connected, and the way through is either to spend
years working toward a desired position or to come to Ducat. At Ducat we provide the
necessary computer training, which helps both newcomers and experienced workers achieve
better recognition in this competitive world.


Like other educational and training institutes, Ducat offers a variety of programs, but it
is the instructors who make the difference and make Ducat stand out from the others. We
have a range of skilled and trained trainers whose approach is different from what you will
see elsewhere. Ducat contributes a lot to the knowledge of its trainees, and we try our best
to increase our trainees' ability so that they stand out and whatever they contribute to the
corporate world becomes productive. We help not only freshers but also corporate employees
who struggle to keep up with new technology and software. We try our best to deliver our
services to every corner of the world with the help of customized education. Our motto is to
deliver the best services to you, and that is why we have taken the customized approach: we
do not want you to compromise on your education.
You do not have to leave your job in order to train with us. You can contact our experts at
any time that suits you and have your queries answered.


Ducat provides the best available programs, which help enhance technical skills and are
beneficial for all applicants.

Software Development: We provide the latest IT software training, which helps freshers and
corporate employees alike to understand current technologies and keep pace with them. This
helps not only the companies but also the individual's own ability to deal with the
necessary software.

Instructor-led campus: Ducat helps new instructors get the exposure they need to show their
talent in the right way.

Workshops and Placement Service: At Ducat, workshops are held to improve understanding,
because theory alone is not always enough and workshops provide the practical knowledge
that leads to better understanding. Everything ultimately leads to placement: an institute
that does not provide placement services lets its applicants down, so we provide placement
services and give our best to them.


• Python is an interpreted programming language designed to be easy to read and
simple to implement. It is open source, which means it is free to use, even for
commercial applications.
• Python can run on Mac, Windows, and Unix systems and has also been ported
to the Java and .NET virtual machines.
• Guido van Rossum is known as the creator of the Python programming language.
• Python is a multi-paradigm programming language. Object-oriented
programming and structured programming are fully supported, and many of its
features support functional programming and aspect-oriented programming.
• Python uses dynamic typing, and a combination of reference counting and a
cycle-detecting garbage collector for memory management. It also features
dynamic name resolution (late binding), which binds method and variable names
during program execution.
• Python's design offers some support for functional programming in
the Lisp tradition. It has filter(), map(), and reduce() functions, as well as list
comprehensions, dictionaries, and sets.
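
As a minimal sketch of the functional tools named above, the snippet below filters, maps,
and reduces a small list, then repeats the first two steps with a list comprehension. The
variable names are illustrative assumptions. In Python 3, reduce() lives in the functools
module, whereas in Python 2 it is a built-in.

```python
from functools import reduce  # built-in in Python 2, in functools in Python 3

numbers = [1, 2, 3, 4, 5, 6]

# filter(): keep only the even numbers
evens = list(filter(lambda n: n % 2 == 0, numbers))

# map(): square every remaining number
squares = list(map(lambda n: n * n, evens))

# reduce(): fold the squares into a single sum
total = reduce(lambda acc, n: acc + n, squares, 0)

# The same filter-and-map step written as a list comprehension
squares_lc = [n * n for n in numbers if n % 2 == 0]

print(evens, squares, total)
```

The list comprehension is usually considered the more readable form of the two.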

1. High Level
2. Simple
3. Open Source
4. GUI Programming
5. Large Standard library
6. Expressive Language
7. Object Oriented
8. Interpreted
9. Platform independent

• String:
A String can be formed by enclosing text in quotes. We can use either single or
double quotes for a String.
eg: "Aman", '12345'
• Numeric:
1. Int : Whole numbers that can be positive or negative. eg: 100
2. Long : Integers of unlimited size followed by a lowercase or uppercase L
(Python 2 only; in Python 3 the Int type itself has unlimited size). eg: 87032845L
3. Float : Real numbers with both an integer and a fractional part. eg: -26.2
4. Complex : In the form a+bj, where a is the real part and b is the
imaginary part of the complex number. eg: 3.14j
• Boolean : A Boolean datatype can have either of two values: True or False.
• Special literals : Python contains one special literal, None.
None is used to indicate that a value is absent or has not been set.
• Collections : Collections such as tuples, lists, and dictionaries are used in Python.
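
The data types listed above can be tried out directly. The following is a small sketch in
Python 3 syntax with made-up variable names; the trailing L of the Long type is left out
because Python 3 no longer accepts it.

```python
# One literal for each data type listed above (Python 3 syntax)
name = "Aman"            # String, double quotes
digits = '12345'         # String, single quotes
count = 100              # Int
big = 87032845 ** 5      # Python 3 ints grow as needed, no trailing L
ratio = -26.2            # Float
z = 2 + 3.14j            # Complex: real part 2, imaginary part 3.14
flag = True              # Boolean
missing = None           # Special literal
point = (1, 2)           # Tuple
items = [1, "two", 3.0]  # List
ages = {"Aman": 21}      # Dictionary

for value in (name, digits, count, big, ratio, z, flag, missing, point, items, ages):
    print(type(value).__name__, repr(value))
```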


Machine learning is an application of artificial intelligence (AI) that gives systems the
ability to automatically learn and improve from experience without being explicitly
programmed.
The process of learning begins with observations or data, such as direct experience or
instruction, in order to look for patterns in the data and make better decisions in the
future based on the examples that we provide. The primary aim is to allow computers to
learn automatically, without human intervention or assistance, and adjust their actions
accordingly.
Machine learning explores the study and construction of algorithms that can learn from and
make predictions on data.
Machine learning today is not like the machine learning of the past. It was born from
pattern recognition and the theory that computers can learn without being programmed to
perform specific tasks.

Applications of machine learning

• Image Recognition: One of the most common uses of machine learning is image
recognition.
For face detection, the categories might be face versus no face present. There
might be a separate category for each person in a database of several individuals.
For character recognition, we can segment a piece of writing into smaller images,
each containing a single character. The categories might consist of the 26 letters of
the English alphabet, the 10 digits, and some special characters.

• Speech Recognition: Speech recognition (SR) is the translation of spoken words
into text. It is also known as “automatic speech recognition” (ASR),
“computer speech recognition”, or “speech to text” (STT).

• Medical Diagnosis: ML provides methods, techniques, and tools that can help
solve diagnostic and prognostic problems in a variety of medical domains.
Examples: IBM, Genentech, HealthMap.

• Prediction:
In banking, consider computing the probability that a loan applicant will default on
the loan repayment. To compute the probability of default, the system first needs to
classify the available data into certain groups, which are described by a set of rules
prescribed by analysts.
In finance, statistical arbitrage refers to automated trading strategies that are
typically short term and involve a large number of securities. Example: Citi.

• Video Surveillance:
Imagine a single person monitoring multiple video cameras! This is why the idea
of training computers to do this job makes sense.
Video surveillance systems nowadays are powered by AI, which makes it possible
to detect crimes before they happen. They track unusual behaviour such as
standing motionless for a long time, stumbling, or napping on benches. The
system can then alert human attendants, which can ultimately help to
avoid mishaps. When such activities are reported and confirmed to be true, they
help to improve the surveillance service. This happens with machine learning
doing its job at the backend.

• Product Recommendations:
You shopped for a product online a few days ago, and now you keep receiving emails
with shopping suggestions. If not this, then you might have noticed that the shopping
website or app recommends items that somehow match your taste.

• Social Media Services: Social media platforms offer features that you are probably
noticing, using, and loving in your accounts without realizing that they are nothing
but applications of ML.
o People You May Know: Machine learning works on a simple concept:
understanding through experience. Facebook continuously notices the friends
that you connect with, the profiles that you visit often, your interests,
your workplace, or a group that you share with someone. On the basis of this
continuous learning, it suggests a list of Facebook users that you can
become friends with.
o Face Recognition: You upload a picture of yourself with a friend and Facebook
instantly recognizes that friend. Facebook checks the poses and projections
in the picture, notices the unique features, and then matches them with the
people in your friend list. The entire process at the backend is complicated
and takes care of the precision factor, but it appears to be a simple application
of ML at the front end.

• Search Engine Result Refining:
Google and other search engines use machine learning to improve the search results
for you. Every time you execute a search, the algorithms at the backend keep
watch on how you respond to the results. If you open the top results and stay on the
web page for a long time, the search engine assumes that the results it displayed
matched the query. Similarly, if you reach the second or third page of the
search results but do not open any of them, the search engine estimates that
the results served did not match the requirement. In this way, the algorithms working
at the backend improve the search results.


• Supervised Learning:
This type of algorithm has a target or outcome variable (the dependent variable) that
is to be predicted from a given set of predictors (independent variables). Using
this set of variables, we generate a function that maps inputs to desired
outputs. The training process continues until the model achieves a desired level of
accuracy on the training data. Examples of supervised learning:
Regression, Decision Tree, Random Forest, KNN, Logistic Regression, etc.

• Unsupervised Learning:
In this type of algorithm, we do not have any target or outcome variable to predict or
estimate. It is used for clustering a population into different groups, which is widely
used for segmenting customers into groups for specific interventions.
Examples of unsupervised learning: Apriori algorithm, K-means.

• Reinforcement Learning:
Using this type of algorithm, the machine is trained to make specific decisions. It works
this way: the machine is exposed to an environment where it trains itself continually
using trial and error. The machine learns from past experience and tries to capture
the best possible knowledge to make accurate business decisions. Example of
reinforcement learning: Markov Decision Process.
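
To make the unsupervised case concrete, here is a minimal sketch of K-means on
one-dimensional data, written in plain Python. The customer spend figures, the starting
centers, and the function name kmeans_1d are assumptions made for illustration; no labels
are given to the algorithm, which only groups similar values together.

```python
def kmeans_1d(values, centers, iterations=10):
    """Cluster 1-D values around the given starting centers (Lloyd's algorithm)."""
    centers = list(centers)
    for _ in range(iterations):
        # Assignment step: attach every value to its nearest center
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Update step: move each center to the mean of its cluster
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters

# Hypothetical monthly spend of eight customers; no labels are provided
spend = [12, 15, 14, 10, 95, 102, 99, 110]
centers, clusters = kmeans_1d(spend, centers=[0, 50])
print(centers)
print(clusters)
```

The two resulting groups could then be treated as customer segments for a specific
intervention.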


1. Regression:

Regression is used for predicting values. For example, this category of algorithms
answers questions such as “what will the temperature be tomorrow?” or “what will
the total score of a sporting match be?”. Regression algorithms
fall under supervised learning. Regression analysis is used to estimate the
relationship between two or more variables.
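
As an illustration of regression, the sketch below fits a straight line to a handful of
points with ordinary least squares and uses it to predict the next value. The temperature
readings and the function name fit_line are invented for the example.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one predictor: y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: day number versus noon temperature in degrees Celsius
days = [1, 2, 3, 4, 5]
temps = [20.0, 22.0, 24.0, 26.0, 28.0]

slope, intercept = fit_line(days, temps)
tomorrow = slope * 6 + intercept  # predict the value for day 6
print(slope, intercept, tomorrow)
```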

2. Classification:

Classification is used for predicting a category. For example, when an image needs
to be classified as a picture of either a human or a machine, the algorithm that does so
falls under this category. Classification algorithms are supervised. When there are
only two choices, the task is known as two-class or binomial
classification. When more than two categories are available, it is known as
multi-class classification. Questions such as “which car needs service most
urgently?” or “which flight will go from Delhi to Atlanta?” have multiple possible
answers, so they fall under multi-class classification.
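
A small multi-class example, assuming a k-nearest-neighbours (KNN) classifier and invented
car data, may help here. The features, labels, and the function name knn_predict are all
hypothetical; the point is only that the predicted value is one of several categories.

```python
from collections import Counter

def knn_predict(train, query, k=3):
    """Return the majority label among the k training points nearest to query."""
    # train is a list of ((feature1, feature2), label) pairs
    by_distance = sorted(
        train,
        key=lambda item: (item[0][0] - query[0]) ** 2 + (item[0][1] - query[1]) ** 2,
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Hypothetical data: (age in years, thousands of km driven) -> service urgency
cars = [
    ((1, 10), "low"), ((2, 15), "low"), ((1, 20), "low"),
    ((5, 60), "medium"), ((6, 70), "medium"), ((5, 75), "medium"),
    ((10, 150), "high"), ((12, 170), "high"), ((11, 160), "high"),
]

print(knn_predict(cars, (2, 12)))
print(knn_predict(cars, (6, 65)))
print(knn_predict(cars, (11, 155)))
```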


Discriminative models, also called conditional models, are a class of models used
in machine learning for modeling the dependence of unobserved (target) variables y on
observed variables x. Within a probabilistic framework, this is done by modeling
the conditional probability distribution P(y|x) , which can be used for predicting y from x.
Discriminative models, as opposed to generative models, do not allow one to generate
samples from the joint distribution of observed and target variables. However, for tasks
such as classification and regression that do not require the joint distribution,
discriminative models can yield superior performance (in part because they have fewer
variables to compute).[1][2][3] On the other hand, generative models are typically more
flexible than discriminative models in expressing dependencies in complex learning tasks.
In addition, most discriminative models are inherently supervised and cannot easily
support unsupervised learning. Application-specific details ultimately dictate the
suitability of selecting a discriminative versus generative model.

Examples of discriminative models used in machine learning include:

1. Logistic regression, a type of generalized linear regression used for
predicting binary or categorical outputs (also known as maximum entropy classifiers)
2. Support vector machines
3. Boosting (meta-algorithm)
4. Conditional random fields
5. Linear regression
6. Neural networks
7. Random forests
8. Perceptron


An alternative division defines these symmetrically: a generative model is a model of
the conditional probability of the observable X, given a target y,
symbolically P(X|Y=y), while a discriminative model is a model of the conditional
probability of the target Y, given an observation x, symbolically P(Y|X=x). Regardless of
precise definition, the terminology is because a generative model can be used to "generate"
random instances (outcomes), either of an observation and target (x, y), or of an
observation x given a target value y,[4] while a discriminative model or discriminative
classifier (without a model) can be used to "discriminate" the value of the target variable Y,
given an observation x.[5] The difference between "discriminate" (distinguish) and
"classify" is subtle, and these are not consistently distinguished, so the term "discriminative
classifier" becomes a pleonasm, meaning that it does nothing other than classify
(equivalently, "discriminate") inputs.


In application to classification, the observable X is frequently a continuous variable, the
target Y is generally a discrete variable consisting of a finite set of labels, and the
conditional probability P(Y|X) can also be interpreted as a (non-deterministic) target
function f: X -> Y, considering X as inputs and Y as outputs.
Given a finite set of labels, the two definitions of "generative model" are closely related. A
model of the conditional distribution P(X|Y=y) is a model of the distribution of each label,
and a model of the joint distribution is equivalent to a model of the distribution of label
values P(Y), together with the distribution of observations given a label, P(X|Y);
symbolically, P(X,Y) = P(X|Y)P(Y). Thus, while a model of the joint probability
distribution is more informative than a model of the distribution of each label (but without
their relative frequencies), it is a relatively small step, hence these are not always
distinguished.
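
The relation P(X,Y) = P(X|Y)P(Y) can be checked numerically. The sketch below uses made-up
probabilities for one binary observation and two labels, builds the joint distribution the
way a generative model would, and then recovers the conditional P(Y|X) that a
discriminative model estimates directly.

```python
# Hypothetical toy distributions over a label Y and one binary observation X
prior = {"spam": 0.4, "ham": 0.6}                      # P(Y)
likelihood = {                                          # P(X | Y)
    "spam": {"has_link": 0.75, "no_link": 0.25},
    "ham": {"has_link": 0.25, "no_link": 0.75},
}

# Generative view: the joint distribution P(X, Y) = P(X | Y) * P(Y)
joint = {(x, y): likelihood[y][x] * prior[y]
         for y in prior for x in likelihood[y]}

def posterior(x):
    """Discriminative quantity P(Y | X = x), recovered from the joint by Bayes' rule."""
    evidence = sum(joint[(x, y)] for y in prior)        # P(X = x)
    return {y: joint[(x, y)] / evidence for y in prior}

print(joint)
print(posterior("has_link"))
```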

Types of generative models are:

1. Gaussian mixture model (and other types of mixture model)
2. Hidden Markov model
3. Probabilistic context-free grammar
4. Naive Bayes
5. Averaged one-dependence estimators
6. Latent Dirichlet allocation
7. Restricted Boltzmann machine
8. Generative adversarial networks

If the observed data are truly sampled from the generative model, then fitting the
parameters of the generative model to maximize the data likelihood is a common method.
However, since most statistical models are only approximations to the true distribution, if
the model's application is to infer about a subset of variables conditional on known values
of others, then it can be argued that the approximation makes more assumptions than are
necessary to solve the problem at hand. In such cases, it can be more accurate to model the
conditional density functions directly using a discriminative model (see below), although
application-specific details will ultimately dictate which approach is most suitable in any
particular case.


Classification predictive modeling is the task of approximating a mapping function (f) from
input variables (X) to discrete output variables (y).
The output variables are often called labels or categories. The mapping function predicts
the class or category for a given observation.
For example, an email can be classified as belonging to one of two classes:
“spam” and “not spam”.
A classification problem can have real-valued or discrete input variables.
A problem with two classes is often called a two-class or binary classification problem.


After training a model, we evaluate its performance on test data.
We can use a confusion matrix and a classification report.
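
scikit-learn provides confusion_matrix and classification_report in its sklearn.metrics
module. To show what those reports contain, the sketch below computes the same quantities
by hand for a binary problem. The labels and predictions are invented, and the helper names
are assumptions for this example only.

```python
def confusion_matrix(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p != positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp, tn, fp, fn

def report(y_true, y_pred):
    """Accuracy, precision, recall and F1 for the positive class."""
    tp, tn, fp, fn = confusion_matrix(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / len(y_true)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Hypothetical test labels (1 = positive, 0 = negative) and model predictions
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 1]

print(confusion_matrix(y_true, y_pred))
print(report(y_true, y_pred))
```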


Being extremely interested in everything related to machine learning, I saw the
independent project as a great opportunity to take the time to learn and confirm my
interest in this field. The fact that we can make estimations and predictions and give
machines the ability to learn by themselves is powerful and opens up a very wide range of
applications. We can use machine learning in finance, medicine, and almost everywhere else.
That is why I decided to build my project around machine learning.

This project was motivated by my desire to investigate the sentiment analysis field of
machine learning, since it offers a way into natural language processing, which is
currently a very hot topic. Following my previous experience of classifying short pieces of
music according to their emotion, I applied the same idea to tweets and tried to figure out
which are positive and which are negative.

Because I truly think that sharing sources and knowledge helps others and also
ourselves, the sources of the project are available at the following link:

The Project

Sentiment analysis, also referred to as opinion mining, is a machine learning subtask in
which we want to determine the general sentiment of a given document.
Using machine learning techniques and natural language processing, we can extract the
subjective information from a document and try to classify it according to its polarity:
positive, neutral, or negative. It is a really useful analysis, since we could determine
the overall opinion about a product on sale, or predict the stock price of a given company:
if most people think positively about it, its stock price will possibly increase, and so
on. Sentiment analysis is far from solved, since language is very complex
(objectivity/subjectivity, negation, vocabulary, grammar, ...), but that is also why it is
very interesting to work on. In this project I chose to classify tweets from Twitter into
“positive” or “negative” sentiment by building a model based on probabilities. Twitter is a
microblogging website where people can share their feelings quickly and spontaneously
by sending tweets limited to 140 characters. You can address a tweet directly to someone
by adding the target sign “@”, or participate in a topic by adding a hashtag “#” to your
tweet. Because of the way Twitter is used, it is a perfect source of data for determining
the current overall opinion about anything.

To gather the data, many options are possible. In some previous research papers, the
authors built a program to automatically collect a corpus of tweets based on two classes,
“positive” and “negative”, by querying Twitter with two types of emoticons:
Happy emoticons, such as “:)”, “:P”, etc.
Sad emoticons, such as “:(”, “:’(”, “=(”.
Others make their own dataset of tweets by collecting and annotating them manually, which
is very long and tedious. In addition to finding a way of getting a corpus of tweets, we
need to take care to have a balanced data set, meaning we should have an equal number of
positive and negative tweets, but it also needs to be large enough. Indeed, the more data
we have, the more we can train our classifier and the better the accuracy will be.
After much research, I found a dataset of 1578612 tweets in English coming from two
sources: Kaggle and Sentiment140. It is composed of four columns: ItemID,
Sentiment, SentimentSource and SentimentText. We are only interested in the Sentiment
column, which holds our class label as a binary value (0 if the tweet is negative,
1 if the tweet is positive), and the SentimentText column, which contains
the tweets in raw format.
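
Reading the two columns of interest could look like the sketch below. It uses Python's
csv module on a small in-memory sample; the three rows are invented, and a real run would
open the downloaded file instead. Only the four column names come from the description
above.

```python
import csv
import io

# Hypothetical sample in the four-column layout described above;
# a real run would open the downloaded CSV file instead of this string.
sample = io.StringIO(
    "ItemID,Sentiment,SentimentSource,SentimentText\n"
    "1,0,Sentiment140,is so sad for my friend\n"
    "2,1,Sentiment140,I love this new song :)\n"
    "3,1,Kaggle,\"great day, great people\"\n"
)

labels, tweets = [], []
for row in csv.DictReader(sample):
    labels.append(int(row["Sentiment"]))   # 0 = negative, 1 = positive
    tweets.append(row["SentimentText"])    # raw tweet text

print(labels)
print(tweets)
```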


Once we have applied the different preprocessing steps, we can focus on
the machine learning part. There are three major models used in sentiment analysis to
classify a sentence as positive or negative: SVM, Naive Bayes and Language Models
(N-gram). SVM is known to be the model giving the best results, but in this project we
focus only on the probabilistic models, Naive Bayes and Language Models, which have been
widely used in this field. Let us first introduce the Naive Bayes model, which is well
known for its simplicity and efficiency in text classification.


Naive Bayes is a simple technique for constructing classifiers: models that assign class
labels to problem instances, represented as vectors of feature values, where the class labels
are drawn from some finite set. There is not a single algorithm for training such classifiers,
but a family of algorithms based on a common principle: all naive Bayes classifiers assume
that the value of a particular feature is independent of the value of any other feature, given
the class variable. For example, a fruit may be considered to be an apple if it is red, round,
and about 10 cm in diameter. A naive Bayes classifier considers each of these features to
contribute independently to the probability that this fruit is an apple, regardless of any
possible correlations between the color, roundness, and diameter features.
For some types of probability models, naive Bayes classifiers can be trained very
efficiently in a supervised learning setting. In many practical applications, parameter
estimation for naive Bayes models uses the method of maximum likelihood; in other
words, one can work with the naive Bayes model without accepting Bayesian
probability or using any Bayesian methods.
Despite their naive design and apparently oversimplified assumptions, naive Bayes
classifiers have worked quite well in many complex real-world situations. In 2004, an
analysis of the Bayesian classification problem showed that there are sound theoretical
reasons for the apparently implausible efficacy of naive Bayes classifiers.[5] Still, a
comprehensive comparison with other classification algorithms in 2006 showed that Bayes
classification is outperformed by other approaches, such as boosted trees or random
forests.
Applying Bayes Theorem we get:
$$p(C_k \mid x) = \frac{p(x \mid C_k)\,p(C_k)}{p(x)} = \frac{p(x_1, x_2, \ldots, x_n \mid C_k)\,p(C_k)}{p(x_1, x_2, \ldots, x_n)}$$

We need to use only the numerator, as the denominator remains constant for all the classes.
Assuming that each feature is conditionally independent of every other feature given the
class (hence the name Naive Bayes),

$$p(C_k \mid x) \propto p(C_k)\,p(x_1 \mid C_k)\,p(x_2 \mid C_k) \cdots p(x_n \mid C_k)$$

This is the naive Bayes probability model.

The naive Bayes classifier combines this model with a decision rule, which picks the
hypothesis with the maximum probability.
An advantage of naive Bayes is that it requires only a small amount of training data to
estimate the parameters necessary for classification.
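
The decision rule described above can be written out as follows. This is the usual maximum
a posteriori (MAP) form; the symbol \hat{y} for the predicted class and K for the number of
classes are notation assumed here, not taken from the text. The logarithmic form on the
second line is a common practical variant that avoids numerical underflow when many small
probabilities are multiplied.

```latex
\hat{y} = \arg\max_{k \in \{1, \ldots, K\}} \; p(C_k) \prod_{i=1}^{n} p(x_i \mid C_k)
        = \arg\max_{k \in \{1, \ldots, K\}} \; \Big[ \log p(C_k) + \sum_{i=1}^{n} \log p(x_i \mid C_k) \Big]
```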

There are several variants of Naive Bayes classifiers:
• The Multivariate Bernoulli Model: Also called the binomial model, useful if our feature
vectors are binary (e.g. 0s and 1s). An application is text classification with a bag
of words model, where the 0s and 1s mean "word does not occur in the document" and
"word occurs in the document" respectively.
• The Multinomial Model: Typically used for discrete counts. In text classification,
we extend the Bernoulli model by counting the number of times a word
$w_i$ appears over the total number of words, rather than recording 0 or 1 for whether
the word occurs.
• The Gaussian Model: We assume that features follow a normal distribution. Instead
of discrete counts, we have continuous features.
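
The difference between the Bernoulli and the multinomial representation is easiest to see
on one tweet. In this sketch the vocabulary and the tweet are invented for illustration.

```python
from collections import Counter

# Hypothetical vocabulary and tweet, chosen only for illustration
vocabulary = ["good", "bad", "movie", "very"]
tweet = "very very good movie"

counts = Counter(tweet.split())

# Multinomial features: how many times each vocabulary word occurs
multinomial = [counts[word] for word in vocabulary]

# Bernoulli features: 1 if the word occurs at all, otherwise 0
bernoulli = [1 if counts[word] > 0 else 0 for word in vocabulary]

print(multinomial)  # [1, 0, 1, 2]
print(bernoulli)    # [1, 0, 1, 1]
```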

In every machine learning task, it is always good to have what we call a baseline. It is
often a “quick and dirty” implementation of a basic model that performs a first
classification; based on its accuracy, we then try to improve it. We use Multinomial Naive
Bayes as the learning algorithm with Laplace smoothing, which is the classic way of doing
text classification. Since we need to extract features from our data set of tweets, we use
the bag of words model to represent it. The bag of words model is a simplifying
representation of a document, in which the document is represented as a bag of its words
without taking grammar or word order into consideration. In text classification, the count
(number of times) each word appears in a document is used as a feature for training the
classifier. First, we divide the data set into two parts, the training set and the test
set. To do this, we first shuffle the data set to get rid of any order applied to the data;
then, from the set of positive tweets and the set of negative tweets, we take 3/4 of the
tweets from each set and merge them together to make the training set. The rest is used to
make the test set. In the end, the training set contains 1183958 tweets and the test set
394654 tweets. Notice that they are balanced and follow the same distribution as the
initial data set. Once the training set and the test set are created, we actually need a
third set of data called the validation set. It is really useful because it will be used to
validate our model against unseen data and to tune the parameters of the learning
algorithm, for example to avoid underfitting and overfitting. We need this validation set
because our test set should be used only to verify how well the model will generalize. If
we used the test set rather than the validation set for tuning, our estimate of the model's
performance could be overly optimistic and distort the results.
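
A compact sketch of such a baseline follows: Multinomial Naive Bayes over bag-of-words
counts with Laplace (add-one) smoothing, written in plain Python rather than with
scikit-learn. The four training tweets, the function names train_nb and predict_nb, and the
whitespace tokenization are assumptions for this example and not the project's actual code.

```python
import math
from collections import Counter, defaultdict

def train_nb(tweets, labels, alpha=1.0):
    """Fit a multinomial Naive Bayes model on whitespace-tokenized tweets."""
    word_counts = defaultdict(Counter)   # label -> word -> count
    class_counts = Counter(labels)       # label -> number of tweets
    vocabulary = set()
    for text, label in zip(tweets, labels):
        words = text.lower().split()
        word_counts[label].update(words)
        vocabulary.update(words)
    return word_counts, class_counts, vocabulary, alpha

def predict_nb(model, text):
    """Return the label with the highest log posterior for one tweet."""
    word_counts, class_counts, vocabulary, alpha = model
    total_docs = sum(class_counts.values())
    scores = {}
    for label in class_counts:
        total_words = sum(word_counts[label].values())
        score = math.log(class_counts[label] / total_docs)   # log prior
        for word in text.lower().split():
            if word not in vocabulary:
                continue  # skip words never seen during training
            # Laplace smoothing: add alpha to every word count
            numerator = word_counts[label][word] + alpha
            denominator = total_words + alpha * len(vocabulary)
            score += math.log(numerator / denominator)
        scores[label] = score
    return max(scores, key=scores.get)

# Hypothetical training tweets (1 = positive, 0 = negative)
tweets = ["I love this phone", "what a great day", "I hate this traffic", "what a bad day"]
labels = [1, 1, 0, 0]

model = train_nb(tweets, labels)
print(predict_nb(model, "love this great phone"))
print(predict_nb(model, "hate this bad traffic"))
```

Without smoothing, a word seen in only one class would give the other class a probability
of zero, which is why the add-alpha term matters even in a baseline.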


To make the validation set, there are two main options:

1. Split the training set into two parts (60%, 20%) with a ratio 2:8 where each part
contains an equal distribution of example types. We train the classifier with the
largest part, and make prediction with the smaller one to validate the model. This
technique works well but has the disadvantage of our classifier not getting trained
and validated on all examples in the data set (without counting the test set).
2. K-fold cross-validation. We split the data set into k parts, hold out one, combine
the others and train on them, then validate against the held-out portion. We repeat
that process k times (once per fold), holding out a different portion each time. Then we
average the score measured for each fold to get a more accurate estimate of our
model's performance.
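
The second option can be sketched in a few lines. The functions below only produce the
index splits and average a score over them; score_fold and dummy_score are placeholders
assumed for this example, standing in for training and validating a real classifier. The
data are assumed to have been shuffled beforehand.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for each of the k folds."""
    indices = list(range(n_samples))
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size

def cross_validate(n_samples, k, score_fold):
    """Average the score returned by score_fold(train, validation) over k folds."""
    scores = [score_fold(train, val) for train, val in k_fold_indices(n_samples, k)]
    return sum(scores) / len(scores)

# Hypothetical scoring function: a real one would train the classifier on the
# train indices and return its accuracy on the validation indices.
def dummy_score(train, validation):
    return len(validation) / (len(train) + len(validation))

folds = list(k_fold_indices(10, 3))
print([val for _, val in folds])
print(cross_validate(10, 3, dummy_score))
```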



Nowadays, sentiment analysis or opinion mining is a hot topic in machine learning. We are
still far from detecting the sentiment of a corpus of texts very accurately, because of the
complexity of the English language, and even more so if we consider other languages such as
Chinese. In this project we tried to show a basic way of classifying tweets into positive
or negative categories using Naive Bayes as a baseline, and how language models are related
to Naive Bayes and can produce better results. We could further improve our classifier
by trying to extract more features from the tweets, trying different kinds of features,
tuning the parameters of the naive Bayes classifier, or trying another classifier
altogether.


Neutral tweets need to be classified in order to increase the utility of a classifier. Many
tweets lack any particular sentiment and are more focused on direct and unbiased
statements about facts or events. Part-of-speech tags can be used to interpret emotions
in a better way. E.g. ’over’ as a verb conveys a negative emotion, but as a noun it is
neutral (an over in cricket).[5] A larger data set can be used if better computing
facilities are available. As it took more than two hours on my system to train on the
current dataset and obtain the output, a bigger dataset could not be used even though it
was available. Bigrams need to be included, since negations otherwise lead the classifier
to unexpected results.
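
To illustrate why bigrams help with negation, the sketch below extracts unigrams and
bigrams from one invented tweet. With unigrams alone, “not” and “good” are separate
features; the bigram (“not”, “good”) keeps them together. The function name ngrams is an
assumption for this example.

```python
def ngrams(text, n):
    """Return the list of n-word tuples found in a whitespace-tokenized text."""
    words = text.lower().split()
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]

tweet = "this movie is not good"
print(ngrams(tweet, 1))
print(ngrams(tweet, 2))
```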


In order to facilitate the preprocessing of the data, we introduce five resources:
1. An emoticon dictionary regrouping 132 of the most used western emoticons with
their sentiment, negative or positive.
2. An acronym dictionary of 5465 acronyms with their translation.
3. A stop word dictionary corresponding to words which are filtered out before or after
processing of natural language data because they are not useful in our case.
4. Positive and negative word dictionaries giving the polarity (sentiment out of
context) of words.
5. A negative contractions and auxiliaries dictionary, which will be used to detect
negation in a given tweet, such as “don’t”, “can’t”, “cannot”, etc.
The introduction of these resources makes it possible to normalize tweets and remove some
of their complexity, for instance with the acronym dictionary, because a lot of acronyms
are used in tweets. The positive and negative word dictionaries could be useful to
increase (or not) the accuracy score of the classifier. The emoticon dictionary has
been built from Wikipedia, with each emoticon annotated manually. The stop word
dictionary contains 635 words such as “the”, “of”, “without”. Normally they should
not be useful for classifying tweets according to their sentiment, but it is possible
that they are.
We also use Python 2.7, a programming language widely used in data science, and
scikit-learn, a very complete and useful library for machine learning that contains every
technique and method we need and whose website is full of well-explained tutorials. With
Python, the libraries NumPy and pandas, for manipulating data easily and intuitively, are
essential.
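
How these resources might be applied to a tweet is sketched below. The four small
dictionaries are hypothetical stand-ins for four of the five resources (the real ones hold
hundreds or thousands of entries), and the function name preprocess and the "emo_" token
prefix are assumptions made for this example.

```python
# Tiny hypothetical stand-ins for four of the five resources; the real ones are far larger
EMOTICONS = {":)": "positive", ":(": "negative"}
ACRONYMS = {"gr8": "great", "lol": "laughing out loud"}
STOP_WORDS = {"the", "of", "a", "is"}
NEGATIONS = {"don't", "can't", "cannot", "not"}

def preprocess(tweet):
    """Normalize one tweet into a list of tokens using the dictionaries above."""
    tokens = []
    for word in tweet.lower().split():
        if word in EMOTICONS:
            tokens.append("emo_" + EMOTICONS[word])   # replace emoticon by its sentiment
        elif word in ACRONYMS:
            tokens.extend(ACRONYMS[word].split())     # expand acronym
        elif word in NEGATIONS:
            tokens.append("not")                      # unify every negation form
        elif word not in STOP_WORDS:
            tokens.append(word)
    return tokens

print(preprocess("The movie is gr8 :)"))
print(preprocess("I don't like the ending :("))
```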


Address: 1808/2 IInd Floor, Old DLF, Sector-14, Gurugram (Haryana)

Near Honda Showroom

Location: Gurgaon - Gurgaon City

Phone: 0124-4219095

Mobile: +91-9873477222

Toll Free: 1800 3000 9533



Postal Code: 122001