
1a.

HMM

• HMM stands for Hidden Markov Model


• An HMM is a probabilistic graphical model in which a set of unknown
(hidden) variables can be predicted from a set of observed variables
Basic assumptions
• There is no one-to-one correspondence between the hidden states and
the observed sets
• The existence of states is identified using stochastic processes
• State changes are made using a statistical process
3 main tasks
• Learning: determining the model parameters
• Likelihood: determining the likelihood of a set of observed values
• Decoding: determining the most likely sequence of hidden states
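The likelihood task can be illustrated with the forward algorithm. The following is a minimal sketch rather than a reference implementation; the two-state model, its probabilities, and the names `start`, `trans`, and `emit` are hypothetical values chosen only for illustration.

```python
def forward_likelihood(start, trans, emit, observations):
    """Return P(observations) under an HMM using the forward algorithm."""
    n_states = len(start)
    # alpha[i] = P(observations so far, current hidden state = i)
    alpha = [start[i] * emit[i][observations[0]] for i in range(n_states)]
    for obs in observations[1:]:
        alpha = [
            sum(alpha[j] * trans[j][i] for j in range(n_states)) * emit[i][obs]
            for i in range(n_states)
        ]
    # Summing over the final hidden state gives the sequence likelihood
    return sum(alpha)


# Hypothetical model: two hidden states, two observation symbols (0 and 1)
start = [0.6, 0.4]                 # initial state probabilities
trans = [[0.7, 0.3], [0.4, 0.6]]   # trans[j][i] = P(next state i | state j)
emit = [[0.9, 0.1], [0.2, 0.8]]    # emit[i][o] = P(symbol o | state i)

likelihood = forward_likelihood(start, trans, emit, [0, 1, 0])
print(likelihood)
```

Decoding would replace the sum over previous states with a maximum (the Viterbi algorithm), and learning would estimate `start`, `trans`, and `emit` from data.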

1b. K-means clustering uses a pre-specified number of clusters, k, to
categorize items into groups.
It works as follows:
1. Initialization of k points, known as means
2. Categorization of each item to its closest mean and update of that
mean's coordinates
3. Repetition of step 2 until the given number of iterations is reached
Hierarchical clustering initially treats every data point as a separate
cluster. It then works as follows:
1. Identification of the two closest clusters
2. Merging of these two most similar clusters
3. Repetition of these steps until all the clusters are merged
Pseudocode (K-means)
Initialize each of the k means to a random value
Iterate for a given number of times:
    Iterate through the items:
        Find the mean closest to the item
        Assign the item to that mean
        Update the value of the mean
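The K-means steps can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not a definitive implementation: it uses the common batch variant (Lloyd's algorithm), which recomputes each mean after all items have been assigned instead of updating after every item, it initializes the means by sampling k of the items, and it uses squared Euclidean distance. The function name `kmeans` and the sample points are chosen for illustration.

```python
import random


def kmeans(points, k, iterations=10, seed=0):
    """Cluster points into k groups; return the means and one label per point."""
    rng = random.Random(seed)
    means = rng.sample(points, k)  # assumption: start from k randomly chosen items
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach every item to its closest mean
        for idx, p in enumerate(points):
            labels[idx] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, means[c])),
            )
        # Update step: move each mean to the average of its assigned items
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # leave an empty cluster's mean unchanged
                means[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return means, labels


points = [(2, 2), (2, 3), (3, 3), (6, 7), (9, 10), (8, 7)]
means, labels = kmeans(points, k=2)
print(means, labels)
```

A fixed iteration count mirrors the stopping rule described above; stopping early once the labels no longer change is a common refinement.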

1c.
1d. Variance measures how far the values of a single variable spread
around their mean, whereas covariance measures how two variables vary
with respect to each other.
In univariate regression, the model describes the relationship between
one independent variable and one dependent variable using a straight line,
whereas in multivariate regression, the model describes the relationship
between more than one independent variable and more than one
dependent variable, which are linearly related.
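The two quantities connect directly in single-variable regression: the least-squares slope is the covariance of x and y divided by the variance of x. The sketch below uses sample (n-1) estimates and made-up data that lie exactly on y = 2x + 1; the helper names are chosen for illustration.

```python
def mean(values):
    return sum(values) / len(values)


def variance(xs):
    """Sample variance: average squared deviation from the mean (n - 1 divisor)."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def covariance(xs, ys):
    """Sample covariance: how xs and ys deviate from their means together."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


# Illustrative data lying exactly on the line y = 2x + 1
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

slope = covariance(xs, ys) / variance(xs)
intercept = mean(ys) - slope * mean(xs)
print(slope, intercept)
```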

Problem 2

Problem 4

ANSWER
The distance from each point to every other point is calculated using the
Manhattan distance, given by abs(x2-x1)+abs(y2-y1).
Let A = (2,2)
B = (2,3)
C = (3,3)
D = (6,7)
E = (9,10)
F = (8,7)
Distance from A to B = (2,2) to (2,3) = abs(2-2)+abs(2-3) = 1
Distance from A to C = (2,2) to (3,3) = abs(2-3)+abs(2-3) = 2
Distance from A to D = (2,2) to (6,7) = abs(2-6)+abs(2-7) = 9
Distance from A to E = (2,2) to (9,10) = abs(2-9)+abs(2-10) = 15
Distance from A to F = (2,2) to (8,7) = abs(2-8)+abs(2-7) = 11
Distance from B to C = (2,3) to (3,3) = abs(2-3)+abs(3-3) = 1
Distance from B to D = (2,3) to (6,7) = abs(2-6)+abs(3-7) = 8
Distance from B to E = (2,3) to (9,10) = abs(2-9)+abs(3-10) = 14
Distance from B to F = (2,3) to (8,7) = abs(2-8)+abs(3-7) = 10
Distance from C to D = (3,3) to (6,7) = abs(3-6)+abs(3-7) = 7
Distance from C to E = (3,3) to (9,10) = abs(3-9)+abs(3-10) = 13
Distance from C to F = (3,3) to (8,7) = abs(3-8)+abs(3-7) = 9
Distance from D to E = (6,7) to (9,10) = abs(6-9)+abs(7-10) = 6
Distance from D to F = (6,7) to (8,7) = abs(6-8)+abs(7-7) = 2
Distance from E to F = (9,10) to (8,7) = abs(9-8)+abs(10-7) = 4
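The pairwise distances listed above can be checked with a short script. This is only a verification sketch; the dictionary name `points` and the helper `manhattan` are chosen for illustration.

```python
from itertools import combinations

points = {
    "A": (2, 2), "B": (2, 3), "C": (3, 3),
    "D": (6, 7), "E": (9, 10), "F": (8, 7),
}


def manhattan(p, q):
    """Manhattan distance: abs(x2-x1) + abs(y2-y1)."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


# One entry per unordered pair of points
distances = {
    (a, b): manhattan(points[a], points[b])
    for a, b in combinations(points, 2)
}

for (a, b), d in distances.items():
    print(f"Distance from {a} to {b} = {d}")
```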
There are two pairs of points with the shortest distance of 1: (2,2) and
(2,3), and (2,3) and (3,3). We choose one at random. Let it be (2,2) and (2,3).
So the resulting clusters are AB, C, D, E, F.
The centroid of AB is calculated as (2,2.5).
Distance from AB to C = (2,2.5) to (3,3) = abs(2-3)+abs(2.5-3) = 1.5
Distance from AB to D = (2,2.5) to (6,7) = abs(2-6)+abs(2.5-7) = 8.5
Distance from AB to E = (2,2.5) to (9,10) = abs(2-9)+abs(2.5-10) = 14.5
Distance from AB to F = (2,2.5) to (8,7) = abs(2-8)+abs(2.5-7) = 10.5
Distance from C to D = (3,3) to (6,7) = abs(3-6)+abs(3-7) = 7
Distance from C to E = (3,3) to (9,10) = abs(3-9)+abs(3-10) = 13
Distance from C to F = (3,3) to (8,7) = abs(3-8)+abs(3-7) = 9
Distance from D to E = (6,7) to (9,10) = abs(6-9)+abs(7-10) = 6
Distance from D to F = (6,7) to (8,7) = abs(6-8)+abs(7-7) = 2
Distance from E to F = (9,10) to (8,7) = abs(9-8)+abs(10-7) = 4
The shortest distance (1.5) is between AB and C. Hence those two are
merged and the resulting clusters are ABC, D, E, F.
The centroid of ABC is calculated as (2.5,2.75), the midpoint of the AB
centroid (2,2.5) and C (3,3).
Distance from ABC to D = (2.5,2.75) to (6,7) =
abs(2.5-6)+abs(2.75-7) = 7.75
Distance from ABC to E = (2.5,2.75) to (9,10) =
abs(2.5-9)+abs(2.75-10) = 13.75
Distance from ABC to F = (2.5,2.75) to (8,7) =
abs(2.5-8)+abs(2.75-7) = 9.75
Distance from D to E = (6,7) to (9,10) = abs(6-9)+abs(7-10) = 6
Distance from D to F = (6,7) to (8,7) = abs(6-8)+abs(7-7) = 2
Distance from E to F = (9,10) to (8,7) = abs(9-8)+abs(10-7) = 4
The smallest distance (2) is between D and F. Hence they are merged
together and the resulting clusters are ABC, E, DF. The new cluster
centroid for DF is (7,7).
Distance from ABC to DF = (2.5,2.75) to (7,7) =
abs(2.5-7)+abs(2.75-7) = 8.75
Distance from ABC to E = (2.5,2.75) to (9,10) =
abs(2.5-9)+abs(2.75-10) = 13.75
Distance from DF to E = (7,7) to (9,10) = abs(7-9)+abs(7-10) = 5
The shortest distance (5) is between DF and E. Hence the resulting
clusters are ABC and DEF. In the next iteration, ABC and DEF are merged
into one single cluster.
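The merging procedure worked through above can be sketched as a small agglomerative loop. This sketch follows the conventions of the worked answer as assumptions: Manhattan distance between cluster representatives, the representative of a merged cluster taken as the midpoint of the two merged representatives, and ties broken by taking the first pair found. The function name `agglomerate` is chosen for illustration.

```python
def manhattan(p, q):
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def agglomerate(points):
    """Merge the two closest clusters until one remains; return (name, distance) per merge."""
    clusters = dict(points)  # cluster name -> representative point
    history = []
    while len(clusters) > 1:
        names = list(clusters)
        pairs = [
            (names[i], names[j])
            for i in range(len(names))
            for j in range(i + 1, len(names))
        ]
        # Closest pair of clusters; min() keeps the first pair on ties
        a, b = min(pairs, key=lambda pr: manhattan(clusters[pr[0]], clusters[pr[1]]))
        dist = manhattan(clusters[a], clusters[b])
        pa, pb = clusters.pop(a), clusters.pop(b)
        merged = "".join(sorted(a + b))
        # Assumption from the worked answer: new representative is the midpoint
        clusters[merged] = ((pa[0] + pb[0]) / 2, (pa[1] + pb[1]) / 2)
        history.append((merged, dist))
    return history


points = {
    "A": (2, 2), "B": (2, 3), "C": (3, 3),
    "D": (6, 7), "E": (9, 10), "F": (8, 7),
}
history = agglomerate(points)
for name, dist in history:
    print(name, dist)
```

The first four merges occur at distances 1, 1.5, 2 and 5, matching the steps in the answer. The recorded merge distances are the heights at which a dendrogram would join the clusters.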

The dendrogram for this clustering is plotted as shown below. The
cut-off line is marked where there is a sudden jump in the merge distance
from 2 to 5. The resultant clusters are those below the cut-off line.

Problem 5
