
Web Search and Mining (CS 635) Midterm Exam

Computer Science and Engineering 2020-10-09 Saturday


Indian Institute of Technology Bombay 17:55 - 20:10 - Online

Instructions for online exam:


• If the question is multiple choice or short answer type, choose or type your answer into the
SAFE app.
• If the question requires a long answer, write your answer on a white sheet, take a clear
photo, and upload it into the SAFE app. Ensure that the uploaded pictures are clearly visible.
• Use the marks alongside each question for time management.
• You may not use any computing or communication device during the exam, apart from the
devices required to conduct the exam.
• You may use textbooks, class notes written by you, approved material downloaded prior to the
exam from the course Web page, course news group, or the Internet.
• You may not conduct Web search during the exam.

1. Suppose a coding scheme has a tuned parameter m. Given an input integer n to code, we
express n = qm + r with 0 ≤ r < m. We write q in unary code (1s terminated by 0), then
write r using the Gamma code. (An illustrative encoder sketch, not needed for your answers, appears after part 1.c.)

1.a With m = 5, write down the codes for gaps 11, 19, 27.
1.a /3
1.b Omitting constants and rounding, roughly how many bits are needed to encode a gap of
g in general, as a function of g and m?
1.b /1
1.c Suppose the gaps G are sampled from a geometric distribution given by Pr(G = g; p) =
(1 − p)^(g−1) p for g ≥ 1. What is the approximate best choice of m as a function of p?
1.c /2
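
For concreteness, here is a minimal sketch of the encoder described in question 1, assuming the usual IR-textbook convention for the Gamma code (unary code of the offset length, followed by the offset, i.e., the binary representation without its leading 1). Since the Gamma code is defined only for positive integers, the case r = 0 would need an extra convention that the question leaves open; the sketch does not handle it.

```python
def unary(q):
    # q ones terminated by a single zero, as stated in the question
    return "1" * q + "0"

def gamma(x):
    # Elias Gamma code for x >= 1: unary code of the offset length,
    # then the offset (binary representation of x with its leading 1 removed)
    offset = bin(x)[3:]
    return "1" * len(offset) + "0" + offset

def encode_gap(n, m):
    # n = q*m + r with 0 <= r < m; q in unary, then r in Gamma code
    q, r = divmod(n, m)
    return unary(q) + gamma(r)
```

For example, encode_gap(7, 3) returns 1100: here 7 = 2·3 + 1, so q = 2 gives 110 and r = 1 gives 0.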
2. Suppose queries and documents alike are represented in vector space with raw word counts
(no IDF) and scaled to unit L1 norm. E.g., the query [spicy pizza recipe] is represented
by the sparse vector {spicy → 1/3, pizza → 1/3, recipe → 1/3}. Similarity between query vector
q and document vector d is represented as a souped-up dot product with parameters aw ≥ 0
for each word w in the vocabulary, as follows: sim(q, d) = Σ_w qw aw dw. We are given a set of
queries Q, and, for each q ∈ Q, we are given sampled relevant documents Dq⊕ and sampled
irrelevant documents Dq⊖. Propose and justify a loss objective that you can minimize to
learn parameters a, which should assume the role of word IDFs and have small values for
non-informative words.
2. /3
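
As a reading aid for question 2 (it is not an answer to it), here is a minimal sketch of the parameterized similarity sim(q, d) = Σ_w qw aw dw over sparse L1-normalized count vectors; the toy vectors and weights below are purely hypothetical.

```python
def sim(q, d, a):
    # sim(q, d) = sum over words w of q_w * a_w * d_w, where q and d are
    # sparse L1-normalized count vectors and a holds per-word weights a_w >= 0
    return sum(q_w * a.get(w, 0.0) * d.get(w, 0.0) for w, q_w in q.items())

# Hypothetical example: the query [spicy pizza recipe] against one document
q = {"spicy": 1/3, "pizza": 1/3, "recipe": 1/3}
d = {"pizza": 0.5, "recipe": 0.25, "the": 0.25}
a = {"spicy": 2.0, "pizza": 1.5, "recipe": 1.0, "the": 0.01}
print(sim(q, d, a))  # contributions come only from the overlapping words
```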
3. For each distribution specified below, state if it is an instance of the exponential family. If it
is, then relate the standard generic parameters θ and functions f, h of the exponential family
with the parameters of the specific distribution. Recall that the exponential family is defined
as Pr(x; θ) = g(θ) f(x) exp(θ · h(x)). You do not need to specify g(θ). (A worked illustration using a different distribution appears after part 3.d.)
3.a The Gamma density, given by Gamma(x; a, b) = (b^a / Γ(a)) x^(a−1) e^(−bx) for x > 0. (Note, the
Gamma density is not the same as the Γ function.)
3.a /2
3.b The Beta density, given by Beta(x; m, n) = x^(m−1) (1 − x)^(n−1) / B(m, n) for x ∈ [0, 1]. (Note, the Beta
density is not the same as the B function.)
3.b /2

3.c The Dirichlet density, given by Dir(x; a) = (Γ(Σ_w aw) / Π_w Γ(aw)) Π_w xw^(aw−1), for x ∈ ∆W, which means
xw ≥ 0 and Σ_w xw = 1.
3.c /2
3.d The geometric distribution, given by Pr(x; p) = (1 − p)^x p for x = 0, 1, 2, . . . and p ∈ (0, 1).
3.d /2
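
For orientation, here is one worked illustration of casting a distribution into the generic form of question 3, using the Poisson distribution (which is not one of the parts above) purely as an example:

```latex
\begin{align*}
\Pr(x;\lambda) \;=\; \frac{\lambda^{x} e^{-\lambda}}{x!}
  \;=\; \underbrace{e^{-\lambda}}_{g(\theta)}
        \;\underbrace{\frac{1}{x!}}_{f(x)}
        \;\exp\bigl(\underbrace{\ln\lambda}_{\theta}\cdot\underbrace{x}_{h(x)}\bigr),
\end{align*}
```

so the Poisson distribution is in the exponential family with θ = ln λ, h(x) = x, and f(x) = 1/x!.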
4. Suppose, from N documents, we have estimated multinomial topic mixture parameters θn ∈ ∆K
(where n ∈ {1, . . . , N}), which means θn,k ≥ 0 and Σ_k θn,k = 1 for all n. We have modeled
all θn as sampled from a single global Dirichlet density Dir(θ; a) with a ∈ R^K. Recall that
Dir(θ; a) = (Γ(Σ_w aw) / Π_w Γ(aw)) Π_w θw^(aw−1). If needed, you may also use the digamma function
Ψ(a) = (d/da) ln Γ(a), which is available in math libraries.

• Either give a closed-form expression for a given the observations {θn : n ∈ [N ]},
• Or specify precisely a gradient update formula to estimate a numerically.

4. /3
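
As a pointer for question 4 (not a solution), the special functions mentioned above are available in standard math libraries; for instance, assuming SciPy is installed, Ψ and ln Γ can be evaluated as follows:

```python
import numpy as np
from scipy.special import digamma, gammaln  # Psi(a) and ln Gamma(a)

a = np.array([0.5, 1.0, 2.0, 5.0])
print(digamma(a))  # Psi(a) = d/da ln Gamma(a), elementwise
print(gammaln(a))  # ln Gamma(a), useful when evaluating Dirichlet log-likelihoods
```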
5. Recall the Gamma density, given by Gamma(λ; a, b) = (b^a / Γ(a)) λ^(a−1) e^(−bλ) for λ > 0. (Note, the
Gamma density is not the same as the Γ function. Here a is called the ‘shape’ and b the ‘rate’
of the density.) We will model the frequency xd,w of word w in a document d ∈ [D] as sampled
from a Poisson distribution with parameter λw > 0. We will assume that λw itself is sampled
from a Gamma(λw; aw, bw) density, with prior parameters aw, bw. (A small simulation sketch of
this generative model appears after part 5.b.)

5.a Identify the posterior density of λw , with its salient parameters, after observing word
frequencies xd,w in documents d = 1, . . . , D.
5.a /2
5.b Calculate the mean of the posterior density over λw and interpret your result using suitable
extreme values of a, b, D.
5.b /3
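
To make the generative story in question 5 concrete (this only simulates the stated model and is not a solution), here is a minimal sketch that draws λw from its Gamma prior and then word frequencies from the Poisson; note that NumPy's Gamma sampler takes scale = 1/rate, and the shape/rate values below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
a_w, b_w = 2.0, 0.5   # hypothetical shape a_w and rate b_w for one word w
D = 5                 # number of documents

# lambda_w ~ Gamma(shape=a_w, rate=b_w); NumPy parameterizes by scale = 1/rate
lam_w = rng.gamma(shape=a_w, scale=1.0 / b_w)

# x_{d,w} ~ Poisson(lambda_w), independently for each document d in [D]
x_w = rng.poisson(lam=lam_w, size=D)
print(lam_w, x_w)
```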

Total: 25
