Parameter Smoothing
Introduction
Smoothing is designed to detect trends in the presence of noisy data in cases in which the
shape of the trend is unknown.
•We assume that the trend is smooth.
•In contrast, the noise, or deviation from the trend, is unpredictably wobbly.
IMPORTANT
Part of what we try to understand here is which assumptions permit us to
extract the trend from the noise.
We will focus first on a problem with just one predictor.
Specifically, we try to estimate the time trend in the 2008 US popular vote poll
margin (difference between Obama and McCain).
For the purposes of this example, do not think of it as a forecasting problem.
Instead, we are simply interested in learning the shape of the trend after the election
is over.
We assume that for any given day x, there is a true preference among the
electorate f(x), but due to the uncertainty introduced by the polling, each data point
comes with an error ε. A mathematical model for the observed poll margin Yi is:
Yi = f(xi) + εi
The trend f(x) we want to recover is the conditional expectation E(Y | x).
But since we don't know this conditional expectation, we have to estimate it.
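For concreteness, here is a minimal sketch that simulates this model and estimates E(Y | x) with a straight least-squares line. The trend function, noise level, and date range are invented stand-ins, not the actual 2008 poll data:

```python
import numpy as np

rng = np.random.default_rng(0)

x = np.arange(-155, 1)                     # days until the election (x = 0 is election day)
f = 2.0 + 3.0 * np.sin(x / 25.0)           # invented smooth trend f(x)
y = f + rng.normal(0, 2.5, size=x.size)    # observed margins: Y_i = f(x_i) + eps_i

# Estimate E(Y | x) with a straight line (simple least-squares regression).
slope, intercept = np.polyfit(x, y, deg=1)
y_hat = intercept + slope * x              # the fitted line discussed below
```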
The fitted regression line does not appear to describe the trend very well.
For example, on September 4 (day -62), the Republican Convention was held, and the
data suggest that it gave John McCain a boost in the polls.
However, the regression line does not capture this potential trend. To see the lack of
fit more clearly, note that points above the fitted line (blue) and those below (red)
are not evenly distributed across days.
Bin Smoothing
Kernels
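A bin smoother estimates f(x0) by averaging the Yi whose xi fall inside a window around x0; a kernel generalizes this by assigning each point a weight that can decay with distance. Here is a minimal sketch of both ideas on the simulated data from the previous block; the function name, window size h, and kernel choices are illustrative assumptions, not the course's code:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.arange(-155, 1)                          # days until the election
y = 2.0 + 3.0 * np.sin(x / 25.0) + rng.normal(0, 2.5, size=x.size)

def kernel_smooth(x, y, h=7.0, kernel="box"):
    """Estimate f at each point as a weighted local average of y."""
    estimates = np.empty(x.size)
    for i, x0 in enumerate(x):
        d = np.abs(x - x0)
        if kernel == "box":
            w = (d <= h).astype(float)          # boxcar: equal weight inside the bin
        else:
            w = np.exp(-0.5 * (d / h) ** 2)     # Gaussian: weights decay smoothly
        estimates[i] = np.sum(w * y) / np.sum(w)
    return estimates

smooth_box = kernel_smooth(x, y, h=7.0, kernel="box")
smooth_gauss = kernel_smooth(x, y, h=7.0, kernel="gauss")
```

The Gaussian kernel produces a smoother estimate than the boxcar because points do not abruptly enter and leave the window as x0 moves.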
Bayesian Belief Networks
Bayesian Belief Networks Definition
Nodes → Variables
Links → Dependencies
BN Formal Definition
•Directed links – a link from node X to node Y means that X has a direct
influence on Y.
•Each node has a CPT (conditional probability table) that quantifies the effect its
parents have on it.
If an arc is drawn from X to Y, then X is an immediate predecessor, or parent, of Y,
and Y is a descendant of X.
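In a Bayesian network the joint distribution factorizes as the product of each node's CPT given its parents: P(X1, …, Xn) = Π P(Xi | Parents(Xi)). Here is a minimal sketch on an invented two-parent network Rain → WetGrass ← Sprinkler (the variables and probabilities are illustrative assumptions, not from the slides):

```python
# CPTs stored as plain dicts; the joint is P(R) * P(S) * P(W | R, S).
P_rain = {True: 0.2, False: 0.8}
P_sprinkler = {True: 0.1, False: 0.9}
# CPT for WetGrass, indexed by (rain, sprinkler): P(wet=True | parents)
P_wet_given = {
    (True, True): 0.99, (True, False): 0.90,
    (False, True): 0.85, (False, False): 0.05,
}

def joint(rain: bool, sprinkler: bool, wet: bool) -> float:
    """Joint probability via the chain rule for Bayesian networks."""
    p_wet = P_wet_given[(rain, sprinkler)]
    return P_rain[rain] * P_sprinkler[sprinkler] * (p_wet if wet else 1 - p_wet)

print(joint(rain=True, sprinkler=False, wet=True))  # 0.2 * 0.9 * 0.90 = 0.162
```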
Dimensionality Reduction
Reason:
Sometimes the data can contain a huge number of features, some of which are not
even required.
What is dimensionality reduction?
Definition:
The process of dimensionality reduction essentially transforms data from a high-
dimensional feature space to a low-dimensional feature space.
CAUTION:
It is also important that meaningful properties present in the data are not lost during
the transformation.
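As a minimal sketch of this definition, here is principal component analysis (one common technique, covered below) applied with scikit-learn; the random 50-dimensional dataset and the choice of 2 components are illustrative assumptions:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 50))          # 100 samples in a 50-dim feature space

pca = PCA(n_components=2)               # target low-dimensional space
X_low = pca.fit_transform(X)            # shape (100, 2)

# How much of the data's variance the 2 components preserve:
print(pca.explained_variance_ratio_.sum())
```

The explained-variance ratio is one way to check the CAUTION above: it tells us how much meaningful structure survives the transformation.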
Curse of Dimensionality
Sparsity in data refers to features having a value of zero; this does not mean that
the value is missing.
If the data has many sparse features, then the space and computational complexity
increase, and a model trained on sparse data tends to perform poorly on the test dataset.
In other words, during training the model learns noise and is not able to
generalize well. Hence it overfits.
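A quick illustration of the sparsity measure described above, i.e., the fraction of feature values that are exactly zero (the matrix is an invented example):

```python
import numpy as np

X = np.array([[0, 1, 0, 0],
              [2, 0, 0, 0],
              [0, 0, 0, 3]])

sparsity = np.mean(X == 0)            # fraction of zero-valued entries
print(f"sparsity = {sparsity:.2f}")   # 0.75: three quarters of the entries are zero
```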
These are the main issues that arise with high-dimensional data.
Non-sparse, or dense, data on the other hand has non-zero features.
Apart from being non-zero, dense features also contain information that is both
meaningful and non-redundant.
Solution:
To tackle the curse of dimensionality, methods like dimensionality reduction are used.
Dimensionality reduction techniques are very useful for transforming sparse features into
dense features.
Furthermore, dimensionality reduction is also used for data cleaning and feature
extraction.
Techniques for Dimensionality Reduction
1. Decomposition algorithms
Principal Component Analysis
Kernel Principal Component Analysis
Non-Negative Matrix Factorization
Singular Value Decomposition
2. Manifold learning algorithms
t-Distributed Stochastic Neighbor Embedding
Spectral Embedding
Locally Linear Embedding
3. Discriminant Analysis
Linear Discriminant Analysis
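As a minimal sketch of how two of these families are applied, here is Singular Value Decomposition (a decomposition algorithm, via scikit-learn's TruncatedSVD) and t-SNE (a manifold learning algorithm) on the same data; the random dataset is an illustrative stand-in:

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 30))            # 200 samples, 30 features (stand-in data)

X_svd = TruncatedSVD(n_components=2).fit_transform(X)   # linear decomposition
X_tsne = TSNE(n_components=2).fit_transform(X)          # nonlinear manifold embedding
print(X_svd.shape, X_tsne.shape)          # (200, 2) (200, 2)
```

Decomposition methods give a linear projection that can transform new points; manifold methods like t-SNE learn a nonlinear embedding, typically for visualization.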
Components of Dimensionality Reduction
There are two main components of dimensionality reduction: feature selection (choosing a
subset of the original features) and feature extraction (deriving new features from
combinations of the original ones).
Collaborative Filtering Based Recommendation System
In collaborative filtering, we find similar users and recommend what those
similar users like.
In this type of recommendation system, we don't use the features of the item to
recommend it.
Rather, we group users into clusters of similar types and recommend items to each
user according to the preferences of its cluster.
Example: Movie Recommendation System
Method 1: Measuring Similarity
In this scenario, we can see that User 1 and User 2 give nearly similar ratings
to the movies they have both rated, so we can conclude that User 1 will find Movie 3
about average, while Movie 4 will be a good recommendation for User 2. We can also
see that some users have different tastes: User 1 and User 3 are opposites of each other.
Likewise, User 3 and User 4 have a common interest in the movies, and on that
basis we can say that Movie 4 is also going to be disliked by User 4. This is
collaborative filtering: we recommend to users the items liked by users with
similar interests.
Method 2: Cosine Distance
We can also use the cosine of the angle between two users' rating vectors to find
users with similar interests: a larger cosine implies a smaller angle between the two
users, and hence more similar interests.
We apply the cosine measure to pairs of users in the utility matrix, assigning a value
of zero to all unfilled entries to make the calculation easy. If the cosine is smaller,
the users are farther apart; if the cosine is larger, the angle between the users is
small, and we can recommend them similar things.
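A minimal sketch of this on a small, hypothetical utility matrix (rows are users, columns are movies, 0 marks an unrated entry; the ratings are invented for illustration). The cosine between users u and v is (u · v) / (‖u‖ ‖v‖):

```python
import numpy as np

ratings = np.array([
    [5, 4, 0, 1],   # User 1
    [4, 5, 3, 0],   # User 2
    [1, 0, 4, 5],   # User 3
])

def cosine_similarity(u, v):
    """Cosine of the angle between two rating vectors."""
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

print(cosine_similarity(ratings[0], ratings[1]))  # ~0.87: similar tastes
print(cosine_similarity(ratings[0], ratings[2]))  # ~0.24: very different tastes
```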
Method 3: Rounding the Data
In collaborative filtering, we can round off the data to compare it more easily: for
example, we can treat ratings below 3 as 0 and ratings of 3 and above as 1. This makes
the data easier to compare.
After rounding, we can see that User 1 and User 2 are more similar, and User 3 and
User 4 are more alike.
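A minimal sketch of this rounding step on a hypothetical utility matrix (treating a rating of 3 or more as "liked" is one reasonable reading of the rule above):

```python
import numpy as np

ratings = np.array([
    [5, 4, 2, 1],
    [4, 5, 3, 1],
    [1, 2, 4, 5],
])

liked = (ratings >= 3).astype(int)   # below 3 -> 0, 3 and above -> 1
print(liked)
# [[1 1 0 0]
#  [1 1 1 0]
#  [0 0 1 1]]
```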