
Data Science Session 03 Document

Topics Covered

 Data Science Algorithms 03 To 10

Data Science Algorithms

1. Linear Regression

Linear Regression is a method of measuring the relationship between two continuous variables.
The two variables are –

 Independent Variable – “x”
 Dependent Variable – “y”
In the case of simple linear regression, there is only one independent variable, which acts as the predictor. The relationship between x and y can be described as:

y = mx + c
Where m is the slope and c is the intercept.

Based on the difference between the predicted output and the actual output, we compute the error and adjust the slope m and the intercept c so that this error is minimized.
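To make this concrete, here is a minimal sketch of fitting y = mx + c with scikit-learn; the data points below are made-up values, not from any real dataset.

import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data: y is roughly 2x + 1 with a little noise
x = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([3.1, 4.9, 7.2, 9.0, 11.1])

model = LinearRegression()
model.fit(x, y)

print("slope m:", model.coef_[0])        # estimated slope
print("intercept c:", model.intercept_)  # estimated intercept
print("prediction for x = 6:", model.predict([[6.0]])[0])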

2. Logistic Regression

Logistic Regression is used for binary classification of data-points. It performs categorical classification in which the output belongs to one of two classes (1 or 0). For example, predicting whether or not it will rain based on the weather conditions is a logistic regression problem.
The two important parts of Logistic Regression are the hypothesis and the sigmoid curve. Using the hypothesis, we derive the likelihood of an event. The output produced by the hypothesis is passed through the logistic function, which ultimately forms an S-shaped curve called the ‘sigmoid’. Based on this function, we are able to determine the class of the input.

The sigmoid is an S-shaped curve. We generate it with the help of the logistic function –

1 / (1 + e^-x)
Here, e represents the base of the natural logarithm, and we obtain an S-shaped curve with values between 0 and 1. The equation for logistic regression is written as:

y = e^(b0 + b1*x) / (1 + e^(b0 + b1*x))


Here, b0 is the intercept and b1 is the coefficient of the input x. These coefficients are estimated from the data through “maximum likelihood estimation”.
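As an illustration, here is a minimal sketch of the sigmoid function and a logistic regression fit with scikit-learn; the one-feature data below is invented for the example.

import numpy as np
from sklearn.linear_model import LogisticRegression

def sigmoid(z):
    # The logistic function 1 / (1 + e^-z), mapping any real value into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

print(sigmoid(0.0))  # 0.5, the midpoint of the S-shaped curve

# Hypothetical data: class 1 becomes more likely as x grows
x = np.array([[0.5], [1.0], [1.5], [3.0], [3.5], [4.0]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = LogisticRegression()
clf.fit(x, y)
# b1 (coefficient) and b0 (intercept) are estimated by maximum likelihood
print("b1:", clf.coef_[0][0], "b0:", clf.intercept_[0])
print("P(y = 1 | x = 2.0):", clf.predict_proba([[2.0]])[0, 1])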

3. K-Means Clustering

According to the formal definition, K-means clustering is an iterative algorithm that partitions a group of data containing n values into k subgroups. Each of the n values belongs to the cluster with the nearest mean. In other words, given a group of objects, we partition that group into several sub-groups. These sub-groups are formed on the basis of similarity: each data-point is placed in the sub-group whose centroid (mean) is nearest to it. K-means clustering is one of the most popular unsupervised learning algorithms, and it is easy to understand and implement.
The objective of K-means clustering is to minimize the Euclidean distance between each point and the centroid of its cluster. This is known as the intra-cluster variance and can be minimized using the following squared error function –

J = Σ (j = 1 to k) Σ (i = 1 to n) || x_i(j) - c_j ||^2

Where J is the objective function, k is the number of clusters, n is the number of data-points, c_j is the centroid of cluster j, and x_i(j) is data-point i assigned to cluster j, so that || x_i(j) - c_j || is the Euclidean distance from the point to its centroid. Let us have a look at the algorithm for K-means clustering –

 First, we randomly select k points. These k points are the initial means (centroids).
 We use the Euclidean distance to assign each data-point to the cluster whose centroid is closest to it.
 Then we calculate the mean of all the points in each cluster, which gives the new centroid.
 We repeat steps 2 and 3 iteratively until the cluster assignments no longer change (a minimal sketch of these steps is shown below).
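The following sketch uses scikit-learn's KMeans on made-up 2-D points that form two obvious groups; it is purely illustrative.

import numpy as np
from sklearn.cluster import KMeans

# Hypothetical points forming two clusters around x = 1 and x = 10
points = np.array([[1, 2], [1, 4], [1, 0],
                   [10, 2], [10, 4], [10, 0]])

# KMeans repeats the assign-and-update steps internally until convergence
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(points)

print("cluster labels:", labels)
print("centroids:", kmeans.cluster_centers_)
print("objective J (sum of squared distances):", kmeans.inertia_)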

4. Principal Component Analysis

One of the most important concepts in data science is dimensionality. A dataset can have many dimensions (features), and the number of dimensions is represented as n.

For example, suppose that as a data scientist working in a financial company, you have to deal with customer data that involves credit scores, personal details, salaries and hundreds of other parameters. In order to identify the features that contribute significantly to our model, we use dimensionality reduction. PCA is one such dimensionality reduction algorithm.
With the help of PCA, we can reduce the number of dimensions while keeping the most important ones in our model. There are as many principal components as there are dimensions, and each one is perpendicular (orthogonal) to the others: the dot product of any two principal components is 0.
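As a rough sketch, the following code reduces a hypothetical customer table from 4 features to 2 principal components with scikit-learn; the data is random stand-in values, not real customer records.

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
customers = rng.normal(size=(100, 4))  # 100 customers, 4 made-up numeric features

pca = PCA(n_components=2)
reduced = pca.fit_transform(customers)

print("reduced shape:", reduced.shape)                      # (100, 2)
print("explained variance ratio:", pca.explained_variance_ratio_)
# The principal components are orthogonal: their dot product is (numerically) 0
print("PC1 . PC2:", np.dot(pca.components_[0], pca.components_[1]))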

5. Support Vector Machines

Support Vector Machines are powerful classifiers for binary classification. They are also used in facial recognition and genetic classification. SVMs have a built-in regularization model that automatically minimizes the classification error. This helps to increase the geometric margin, which is an essential part of an SVM classifier.

Support Vector Machines map the input vectors to an n-dimensional space and build a maximum-separation hyperplane in that space. SVMs are based on structural risk minimization. Two further hyperplanes are constructed, one on either side of the initially constructed hyperplane, and we measure the distance from the central hyperplane to these two hyperplanes; this distance is the margin that the SVM tries to maximize.
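Here is a minimal sketch of a binary SVM classifier with scikit-learn; the parameter C controls the built-in regularization, and the points below are invented so the two classes separate cleanly.

import numpy as np
from sklearn.svm import SVC

# Hypothetical, linearly separable points for two classes
X = np.array([[1, 1], [2, 1], [1, 2],
              [6, 6], [7, 6], [6, 7]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = SVC(kernel="linear", C=1.0)  # maximum-margin hyperplane with regularization
clf.fit(X, y)

print("support vectors:", clf.support_vectors_)
print("prediction for [3, 3]:", clf.predict([[3, 3]])[0])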

6. Artificial Neural Networks

Neural Networks are modeled after the neurons in the human brain. They comprise many layers of neurons that are structured to transmit information from the input layer to the output layer. Between the input and the output layer there are hidden layers, which can be many or just one. The simplest neural network, consisting of a single layer of weights and no hidden layer, is known as a Perceptron.

In a simple neural network, there is an input layer that takes the input in the form of a vector. This input is passed to the hidden layer, which comprises various mathematical functions that perform computations on the given input. For example, given images of cats and dogs, the hidden layers perform various mathematical operations to find the probability of the class the input image falls in. This is an example of binary classification, where each image is assigned to its appropriate class, dog or cat.
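As an illustration of this structure, here is a minimal sketch of a feed-forward network with one hidden layer using scikit-learn's MLPClassifier; the data and labelling rule are made up as a stand-in for the cat/dog example.

import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # hypothetical labelling rule for two classes

# Input layer -> one hidden layer of 8 neurons -> output layer
net = MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000, random_state=0)
net.fit(X, y)

print("training accuracy:", net.score(X, y))
print("class probabilities for [0.5, 0.5]:", net.predict_proba([[0.5, 0.5]])[0])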

7. Decision Trees

With the help of decision trees, you can perform both prediction and classification. We use Decision Trees to make decisions from a given set of inputs. The following example illustrates how a decision tree works:

Suppose you are deciding whether to go to the market to buy a product. First, you assess whether you really need the product; you will go to the market only if you do not already have it. After that, you check whether it is raining. Only if the sky is clear will you go to the market; otherwise, you will not. We can represent this in the form of a decision tree.

Using the same principle, we build a hierarchical tree that reaches a result through a sequence of decisions. There are two steps to building a tree: induction and pruning. Induction is the process in which we build the tree, whereas in pruning we simplify the tree by removing unnecessary complexity.
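The market example can be sketched as a small decision tree in scikit-learn; the feature encoding (need_product, raining) and the tiny training table below are assumptions made purely for illustration.

from sklearn.tree import DecisionTreeClassifier, export_text

# Features: need_product (1 = yes), raining (1 = yes); label 1 = go to the market
X = [[1, 0],   # need the product, not raining  -> go
     [1, 1],   # need the product, raining      -> stay home
     [0, 0],   # do not need it, not raining    -> stay home
     [0, 1]]   # do not need it, raining        -> stay home
y = [1, 0, 0, 0]

tree = DecisionTreeClassifier()
tree.fit(X, y)
print(export_text(tree, feature_names=["need_product", "raining"]))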

8. Recurrent Neural Networks

Recurrent Neural Networks are used for learning from sequential information. Such sequential problems contain cycles over the underlying time-steps. Unlike plain feed-forward ANNs, an RNN keeps a memory (hidden state) that stores the data of the previous step, so it can work with data represented as a series of time-steps. This makes the RNN an ideal algorithm for solving problems related to text processing.

In the context of text processing, RNNs are useful for predicting future sequences of words. RNNs stacked on top of one another are referred to as Deep Recurrent Neural Networks. RNNs are used in generating text, composing music and time-series forecasting. Chatbots, recommendation systems and speech recognition systems use varying architectures of Recurrent Neural Networks.
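To show how the recurrent memory works, here is a minimal NumPy sketch of a single RNN forward pass; the weights are random placeholders rather than a trained model.

import numpy as np

rng = np.random.default_rng(0)
input_size, hidden_size, seq_len = 3, 4, 5

W_xh = rng.normal(scale=0.1, size=(hidden_size, input_size))   # input -> hidden
W_hh = rng.normal(scale=0.1, size=(hidden_size, hidden_size))  # hidden -> hidden (the recurrence)
b_h = np.zeros(hidden_size)

inputs = rng.normal(size=(seq_len, input_size))  # a made-up sequence of 5 time-steps
h = np.zeros(hidden_size)                        # initial memory (hidden state)

for t in range(seq_len):
    # Each step mixes the current input with the memory of all previous steps
    h = np.tanh(W_xh @ inputs[t] + W_hh @ h + b_h)
    print("step", t, "hidden state:", np.round(h, 3))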

9. Apriori

In 1994, R. Agrawal and R. Srikant developed the Apriori Algorithm. This algorithm is used for finding frequently occurring itemsets using Boolean association rules. It is called Apriori because it makes use of ‘prior’ knowledge of the properties of an itemset. The algorithm takes an iterative, level-wise approach in which frequent k-itemsets are used to find frequent (k+1)-itemsets.
Apriori makes the following assumptions –

 The subsets of a frequent itemset must also be frequent.
 The supersets of an infrequent itemset must also be infrequent.

The three significant components of an Apriori Algorithm are –

 Support
 Confidence
 Lift

Support is a measure of the default popularity (that is, the frequency) of an item ‘X’. Support is calculated by dividing the number of transactions in which X appears by the total number of transactions.

We can define the confidence of a rule X -> Y as the number of transactions involving both X and Y divided by the number of transactions involving X.

Lift is the increase in the ratio of the sale of Y when X is sold. It is used to measure the likelihood of Y being purchased when X has already been purchased, while taking into account the popularity of Y; in other words, lift is the confidence of the rule divided by the support of Y.
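To make these three measures concrete, here is a minimal sketch that computes support, confidence and lift for one rule on a handful of made-up transactions (it is not a full Apriori implementation).

# Hypothetical shopping transactions
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]
n = len(transactions)

def support(itemset):
    # Fraction of transactions containing every item in the itemset
    return sum(itemset <= t for t in transactions) / n

X, Y = {"bread"}, {"milk"}
supp_xy = support(X | Y)
confidence = supp_xy / support(X)   # P(Y | X)
lift = confidence / support(Y)      # how much X boosts the chance of Y

print("support(X and Y):", supp_xy)
print("confidence(X -> Y):", round(confidence, 3))
print("lift(X -> Y):", round(lift, 3))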

Session III

1) Explain the Linear Regression Algorithm?
2) Explain the Logistic Regression Algorithm?
3) Explain the K-means clustering Algorithm?
4) Explain the Principal Component Analysis Algorithm?
5) Explain the Support Vector machines Algorithm?
6) Explain the Artificial Neural Networks Algorithm?
7) Explain the Decision Trees Algorithm?
8) Explain the Recurrent Neural Networks Algorithm?
9) Explain the Apriori Algorithm?
