
Web Search and Mining (CS 635) Midterm Exam

Computer Science and Engineering 2020-10-09 Saturday


Indian Institute of Technology Bombay 17:55 - 20:10 - Online

Instructions for online exam:


• If the question is multiple choice or short answer type, choose or type your answer into the
SAFE app.
• If the question requires a long answer, write your answer on a white sheet, take a clear
photo, and upload it into the SAFE app. Ensure that the uploaded pictures are clearly visible.
• Use the marks alongside each question for time management.
• You may not use any computing or communication device during the exam, apart from the
devices required to conduct the exam.
• You may use textbooks, class notes written by you, approved material downloaded prior to the
exam from the course Web page, course news group, or the Internet.
• You may not conduct Web search during the exam.

1. Suppose a coding scheme has a tuned parameter m. Given an input integer n to code, we
express n = qm + r with 0 ≤ r < m. We write q in unary code (1s terminated by 0), then
write r using the Gamma code. (An illustrative encoder sketch, not needed for your answers, appears after part 1.c.)

1.a With m = 5, write down the codes for gaps 11, 19, 27.
1.a /3
1.b Omitting constants and rounding, roughly how many bits are needed to encode a gap of
g in general, as a function of g and m?
1.b /1
1.c Suppose the gaps G are sampled from a geometric distribution given by Pr(G = g; p) =
(1 − p)^(g−1) p for g ≥ 1. What is the approximate best choice of m as a function of p?
1.c /2
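
For concreteness, here is a minimal sketch of the encoder described in question 1, assuming the usual IR-textbook convention for the Gamma code (unary code of the offset length, followed by the offset, i.e., the binary representation without its leading 1). Since the Gamma code is defined only for positive integers, the case r = 0 would need an extra convention that the question leaves open; the sketch does not handle it.

```python
def unary(q):
    # q ones terminated by a single zero, as stated in the question
    return "1" * q + "0"

def gamma(x):
    # Elias Gamma code for x >= 1: unary code of the offset length,
    # then the offset (binary representation of x with its leading 1 removed)
    offset = bin(x)[3:]
    return "1" * len(offset) + "0" + offset

def encode_gap(n, m):
    # n = q*m + r with 0 <= r < m; q in unary, then r in Gamma code
    q, r = divmod(n, m)
    return unary(q) + gamma(r)
```

For example, encode_gap(7, 3) returns 1100: here 7 = 2·3 + 1, so q = 2 gives 110 and r = 1 gives 0.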
2. Suppose queries and documents alike are represented in vector space with raw word counts
(no IDF) and scaled to unit L1 norm. E.g., the query [spicy pizza recipe] is represented
by the sparse vector {spicy → 1/3, pizza → 1/3, recipe → 1/3}. Similarity between query vector
q and document vector d is represented as a souped-up dot product with parameters aw ≥ 0
for each word w in the vocabulary, as follows: sim(q, d) = Σ_w qw aw dw. We are given a set of
queries Q, and, for each q ∈ Q, we are given sampled relevant documents Dq⊕ and sampled
irrelevant documents Dq⊖. Propose and justify a loss objective that you can minimize to
learn parameters a, which should assume the role of word IDFs and have small values for
non-informative words.
2. /3
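
As a reading aid for question 2 (it is not an answer to it), here is a minimal sketch of the parameterized similarity sim(q, d) = Σ_w qw aw dw over sparse L1-normalized count vectors; the toy vectors and weights below are purely hypothetical.

```python
def sim(q, d, a):
    # sim(q, d) = sum over words w of q_w * a_w * d_w, where q and d are
    # sparse L1-normalized count vectors and a holds per-word weights a_w >= 0
    return sum(q_w * a.get(w, 0.0) * d.get(w, 0.0) for w, q_w in q.items())

# Hypothetical example: the query [spicy pizza recipe] against one document
q = {"spicy": 1/3, "pizza": 1/3, "recipe": 1/3}
d = {"pizza": 0.5, "recipe": 0.25, "the": 0.25}
a = {"spicy": 2.0, "pizza": 1.5, "recipe": 1.0, "the": 0.01}
print(sim(q, d, a))  # contributions come only from the overlapping words
```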
3. For each distribution specified below, state if it is an instance of the exponential family. If it
is, then relate the standard generic parameters θ and functions f, h of the exponential family
with the parameters of the specific distribution. Recall that the exponential family is defined
as Pr(x; θ) = g(θ) f(x) exp(θ · h(x)). You do not need to specify g(θ). (A worked illustration using a different distribution appears after part 3.d.)
3.a The Gamma density, given by Gamma(x; a, b) = (b^a / Γ(a)) x^(a−1) e^(−bx) for x > 0. (Note, the
Gamma density is not the same as the Γ function.)
3.a /2
3.b The Beta density, given by Beta(x; m, n) = x^(m−1) (1 − x)^(n−1) / B(m, n) for x ∈ [0, 1]. (Note, the Beta
density is not the same as the B function.)
3.b /2

3.c The Dirichlet density, given by Dir(x; a) = (Γ(Σ_w aw) / Π_w Γ(aw)) Π_w xw^(aw−1), for x ∈ ∆W, which means
xw ≥ 0 and Σ_w xw = 1.
3.c /2
3.d The geometric distribution, given by Pr(x; p) = (1 − p)^x p for x = 0, 1, 2, . . . and p ∈ (0, 1).
3.d /2
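
For orientation, here is one worked illustration of casting a distribution into the generic form of question 3, using the Poisson distribution (which is not one of the parts above) purely as an example:

```latex
\begin{align*}
\Pr(x;\lambda) \;=\; \frac{\lambda^{x} e^{-\lambda}}{x!}
  \;=\; \underbrace{e^{-\lambda}}_{g(\theta)}
        \;\underbrace{\frac{1}{x!}}_{f(x)}
        \;\exp\bigl(\underbrace{\ln\lambda}_{\theta}\cdot\underbrace{x}_{h(x)}\bigr),
\end{align*}
```

so the Poisson distribution is in the exponential family with θ = ln λ, h(x) = x, and f(x) = 1/x!.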
4. Suppose, from N documents, we have estimated multinomial topic mixture parameters θn ∈ ∆K
(where n ∈ {1, . . . , N}), which means θn,k ≥ 0 and Σ_k θn,k = 1 for all n. We have modeled
all θn as sampled from a single global Dirichlet density Dir(θ; a) with a ∈ R^K. Recall that
Dir(θ; a) = (Γ(Σ_w aw) / Π_w Γ(aw)) Π_w θw^(aw−1). If needed, you may also use the digamma function
Ψ(a) = (d/da) ln Γ(a), which is available in math libraries.

• Either give a closed-form expression for a given the observations {θn : n ∈ [N ]},
• Or specify precisely a gradient update formula to estimate a numerically.

4. /3
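
As a pointer for question 4 (not a solution), the special functions mentioned above are available in standard math libraries; for instance, assuming SciPy is installed, Ψ and ln Γ can be evaluated as follows:

```python
import numpy as np
from scipy.special import digamma, gammaln  # Psi(a) and ln Gamma(a)

a = np.array([0.5, 1.0, 2.0, 5.0])
print(digamma(a))  # Psi(a) = d/da ln Gamma(a), elementwise
print(gammaln(a))  # ln Gamma(a), useful when evaluating Dirichlet log-likelihoods
```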
5. Recall the Gamma density, given by Gamma(λ; a, b) = (b^a / Γ(a)) λ^(a−1) e^(−bλ) for λ > 0. (Note, the
Gamma density is not the same as the Γ function. Here a is called the ‘shape’ and b the ‘rate’
of the density.) We will model the frequency xd,w of word w in a document d ∈ [D] as sampled
from a Poisson distribution with parameter λw > 0. We will assume that λw itself is sampled
from a Gamma(λw; aw, bw) density, with prior parameters aw, bw. (A small simulation sketch of
this generative model appears after part 5.b.)

5.a Identify the posterior density of λw , with its salient parameters, after observing word
frequencies xd,w in documents d = 1, . . . , D.
5.a /2
5.b Calculate the mean of the posterior density over λw and interpret your result using suitable
extreme values of a, b, D.
5.b /3
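
To make the generative story in question 5 concrete (this only simulates the stated model and is not a solution), here is a minimal sketch that draws λw from its Gamma prior and then word frequencies from the Poisson; note that NumPy's Gamma sampler takes scale = 1/rate, and the shape/rate values below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
a_w, b_w = 2.0, 0.5   # hypothetical shape a_w and rate b_w for one word w
D = 5                 # number of documents

# lambda_w ~ Gamma(shape=a_w, rate=b_w); NumPy parameterizes by scale = 1/rate
lam_w = rng.gamma(shape=a_w, scale=1.0 / b_w)

# x_{d,w} ~ Poisson(lambda_w), independently for each document d in [D]
x_w = rng.poisson(lam=lam_w, size=D)
print(lam_w, x_w)
```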

Total: 25
