
PRATHYUSHA ENGINEERING COLLEGE

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING


CS8080 – INFORMATION RETRIEVAL TECHNIQUES
QUESTION BANK

UNIT I
Part A (2 Marks)
1. Define information retrieval.
Information Retrieval (IR) is finding material (usually documents) of an unstructured
nature (usually text) that satisfies an information need from within large collections
(usually stored on computers).
2. Explain the difference between data retrieval and information retrieval.

Data retrieval looks for an exact match: it returns all items, such as database records, that satisfy a precisely structured query, and a single erroneous item among the results is a failure. Information retrieval looks for items that partially match a vague information need and selects the best-matching ones, so small errors and inaccuracies are tolerated.
3. List and explain components of IR block diagram.


o Input : Store only a representation of the document or query, which means that
the text of a document is lost once it has been processed to generate its
representation.
o A document representative could be a list of extracted words considered to be
significant.
o Processor : Performs the actual retrieval function, executing the search
strategy in response to a query.
o Feedback : Improves subsequent runs after a sample retrieval.
o Output : A set of document numbers.
4. What are objective terms and nonobjective terms ?
 Objective terms are extrinsic to semantic content, and there is generally no
disagreement about how to assign them. Examples include author name,
document URL, and date of publication.
Nonobjective terms are intended to reflect the information manifested in the
document, and there is no agreement about the choice or degree of applicability of
these terms. They are also known as content terms.
5. Explain the types of natural language technology used in information retrieval.

Two types of natural language technology can be useful in information retrieval :


 Natural language interfaces make the task of communicating with the information
source easier, allowing a system to respond to a range of inputs, possibly from
inexperienced users, and to produce more customized output.
 Natural language text processing allows a system to scan the source texts, either
to retrieve particular information or to derive knowledge structures that may be
used in accessing information from the texts.

6. What is a search engine ?


A search engine is a document retrieval system designed to help find information stored
in a computer system, such as on the WWW. The search engine allows one to ask for
content meeting specific criteria and retrieves a list of items that match those criteria.
7. What is conflation ?

Stemming is the process of reducing inflected words to their stem, base or root form,
generally a written word form. The process of stemming is often called conflation.
8. What is the invisible web ?

Many dynamically generated sites are not indexable by search engines; this phenomenon
is known as the invisible web.
9. Define Zipf’s law.

Zipf’s law is an empirical rule that describes the frequency of text words. It states that
the i-th most frequent word appears as many times as the most frequent one divided by
i^θ, for some θ > 1.
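The law can be sketched in a few lines of Python; the top frequency of 1000 and θ = 1 below are illustrative assumptions, not values from the syllabus:

```python
# Zipf's law sketch: the i-th most frequent word appears about as many
# times as the most frequent one divided by i^theta (theta = 1 here).
def zipf_frequency(top_frequency, rank, theta=1.0):
    """Predicted count of the word at the given frequency rank."""
    return top_frequency / (rank ** theta)

# If the most frequent word occurs 1000 times, ranks 1..4 give
# 1000, 500, 333.3..., 250
predictions = [zipf_frequency(1000, i) for i in range(1, 5)]
```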
10. What is supervised learning ?

In supervised learning, both the inputs and the outputs are provided. The network then
processes the inputs and compares its resulting outputs against the desired outputs. Errors
are then propagated back through the system, causing the system to adjust the weights
which control the network.
11. What is unsupervised learning ?
In unsupervised learning, the network adapts purely in response to its inputs. Such
networks can learn to pick out structure in their input.
12. What is text mining ?
Text mining is understood as a process of automatically extracting meaningful, useful,
previously unknown and ultimately comprehensible information from textual document
repositories. Text mining can be visualized as consisting of two phases : Text refining
that transforms free-form text documents into a chosen intermediate form, and knowledge
distillation that deduces patterns or knowledge from the intermediate form.
13. Specify the role of an IR system.
The role of an IR system is to retrieve all the documents that are relevant to a query
while retrieving as few non-relevant documents as possible. IR allows access to whole
documents, whereas search engines do not.
14. Outline the impact of the web on information retrieval.
The Web is a huge, widely distributed, highly heterogeneous, semi-structured information
source. With the rapid growth of the Internet, enormous amounts of information are
available on the Web, and Web information retrieval presents additional technical
challenges compared to classic information retrieval due to the heterogeneity and size of
the Web.
 Web information retrieval is unique due to its dynamism, the variety of languages
used, duplication, high linkage, ill-formed queries and the wide variance in the nature
of users. IR helps users find information that matches their information needs
expressed as queries. Historically, IR is about document retrieval, emphasizing the
document as the basic unit.
15. Compare information retrieval and web search.

In classic information retrieval, databases usually cover only one language, or documents
written in different languages are indexed with the same vocabulary. In web search,
documents are in many different languages. Search engines usually use full-text indexing
with no additional subject analysis.

PART B (13 Marks)

1. i) Summarize the history of IR. (7)


ii) Explain the purpose of Information Retrieval System.(6)

2. Describe the various components of an Information Retrieval System with a neat
diagram. (13)

3. i) Define Information Retrieval system and its features. (4)


ii) Formulate the working of Search Engine. (9)

4. i) Identify the various issues in IR system. (7)


ii) Examine the various impact of WEB on IR (6)

5. Demonstrate the framework of Open Source Search engine with necessary diagrams.
(13)
6. i) Compare in detail Information Retrieval and Web Search with examples. (8)
ii) Analyze the fundamental concepts involved in IR system. (5)

7. Develop the role of Artificial Intelligence in Information Retrieval Systems. (13)


8. i) Describe the various components of a Search Engine. (8)
ii) Summarize the functions and features of Information Retrieval Systems (5)

9. i) Describe the different stages of IR system. (8)


ii) Estimate the various Search Engines available in the current world. (5)

10. i) Demonstrate the working of IR architecture with a diagram. (6)


ii) Infer how designing Parsing and Scoring functions works in detail. (7)

11. i)Define Information Retrieval. (2)


ii) Describe in detail the IR system, Fundamental concepts, need and purpose of the
system.(4+4+3)
12. Explain how to characterize the web in detail. (13)
13. Explain the different types of computer software used in computer architecture.(13)
14. i) Demonstrate database and Information Retrieval with an example. (4)
ii) Generalize the process of a Search Engine in detail. (9)

PART-C ( 15 Marks )

1. Create an open source search engine like Google with suitable functionalities. (15)

2. Evaluate the best search engines other than Google and explain any five of them in
detail. (15)

3. Justify how AI impacts Search and Search Engine Optimization. (15)

4. Generalize the Deep Learning and Human Learning capabilities in Future of Search
Engine Optimization. (15)

UNIT II

PART A(2 Marks)

1. What do you mean by Information Retrieval Models ?

A retrieval model can be a description of either the computational process or the human
process of retrieval : the process of choosing documents for retrieval, or the process by
which information needs are first articulated and then refined.
2. What is cosine similarity ?

Cosine similarity is a metric frequently used to determine the similarity between two
documents. It measures the cosine of the angle between the documents' term vectors, so
it compares documents by the direction of their vectors rather than their length, which
makes it more useful than raw term-overlap counts when documents share many words.
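As an illustration (not part of the syllabus answer), a minimal sketch of the computation over two term-weight vectors:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length term-weight vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Vectors pointing the same way score 1.0 regardless of document length;
# vectors with no terms in common score 0.0.
same_direction = cosine_similarity([1, 2, 0], [2, 4, 0])
no_overlap = cosine_similarity([1, 0], [0, 1])
```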
3. What is language model based IR ?
A language model is a probabilistic mechanism for generating text. Language models
estimate the probability distribution of various natural language phenomena.
4. Define a unigram language model.

A unigram (1-gram) language model makes the strong independence assumption that
words are generated independently from a multinomial distribution.
5. What are the characteristics of relevance feedback ?
Characteristics of relevance feedback :
1. It shields the user from the details of the query reformulation process.
2. It breaks down the whole searching task into a sequence of small steps which
are easier to grasp.
3. It provides a controlled process designed to emphasize some terms (relevant
ones) and deemphasize others (non-relevant ones).
6. What are the assumptions of vector space model ?

Assumption of vector space model :


1. The degree of matching can be used to rank-order documents;
2. This rank-ordering corresponds to how well a document satisfies the user's
information needs.
7. What are the disadvantages of Boolean model ?

Disadvantages :
a. It is not simple to translate an information need into a Boolean expression
b. Exact matching may lead to retrieval of too few or too many documents.
c. The retrieved documents are not ranked.
d. The model does not use term weights.

8. Define term frequency

Term frequency (TF) is the frequency of occurrence of a query keyword in a document.
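TF is usually combined with inverse document frequency into the TF-IDF weight. A minimal sketch, assuming the common tf × log10(N/df) variant; the numbers mirror Part B question 6 of this unit:

```python
import math

def tf_idf(tf, df, n_docs):
    """TF-IDF weight: term frequency times log inverse document frequency."""
    return tf * math.log10(n_docs / df)

# Collection of 10,000 docs; term A with tf = 3 and df = 50:
# 3 * log10(10000 / 50) = 3 * log10(200), roughly 6.9
weight_a = tf_idf(3, 50, 10000)
```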

9. Explain Luhn's Ideas.

Luhn's basic idea to use various properties of texts, including statistical ones, was
critical in opening up the handling of input by computers for IR: automatic input joined
the already automated output.
10. What is stemming ? Give an example.

Conflation algorithms are used in information retrieval systems for matching the
morphological variants of terms, for efficient indexing and faster retrieval operations.
The conflation process can be done either manually or automatically; automatic
conflation is also called stemming. For example, "connected", "connecting" and
"connects" all conflate to the stem "connect".
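A naive suffix-stripping sketch of automatic conflation (illustrative only; a real stemmer such as Porter's uses a much richer rule set):

```python
# Illustrative suffix list; a real stemmer has many more rules.
SUFFIXES = ("ational", "ing", "ed", "es", "s")

def stem(word):
    """Strip the first matching suffix, keeping a stem of at least 3 letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

# "connected", "connecting" and "connects" conflate to "connect"
stems = [stem(w) for w in ("connected", "connecting", "connects")]
```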
11. What is Recall ?

Recall is the ratio of the number of relevant documents retrieved to the total number
of relevant documents in the collection.
12. What is precision ?

Precision is the ratio of the number of relevant documents retrieved to the total
number of documents retrieved.
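Both ratios can be computed directly from the retrieved and relevant sets; the document ids below are made up for illustration:

```python
def precision_recall(retrieved, relevant):
    """Precision and recall for sets of retrieved / relevant document ids."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    return hits / len(retrieved), hits / len(relevant)

# 3 of the 4 retrieved docs are relevant; 3 of the 6 relevant docs were found
p, r = precision_recall({1, 2, 3, 9}, {1, 2, 3, 4, 5, 6})
# p == 0.75, r == 0.5
```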
13. Explain Latent Semantic Indexing.

Latent Semantic Indexing is a technique that projects queries and documents into a
space with "latent" semantic dimensions. It is a statistical method for automatic
indexing and retrieval that attempts to solve the major problems of current
technology. It is intended to uncover the latent semantic structure hidden in the data.
It creates a semantic space wherein terms and documents that are associated are
placed near one another.
14. List the retrieval models.
Retrieval models include the Boolean model and the vector model. The Boolean model
is based on set theory and Boolean algebra. The vector model is used in information
filtering, information retrieval, indexing and relevancy ranking.
15. Define document preprocessing.
Document pre-processing is the process of incorporating a new document into an
information retrieval system. It is a complex process that leads to the representation
of each document by a select set of index terms.
16. Define an inverted index.
An inverted index is an index into a set of documents of the words in the documents.
The index is accessed by some search method. Each index entry gives the word and a
list of documents, possibly with locations within the documents, where the word
occurs. The inverted index data structure is a central component of a typical search
engine indexing Algorithm
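A minimal sketch of building such an index (word-level postings only; positions within documents are omitted, and the two sample documents are made up):

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each word to the sorted list of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return {word: sorted(ids) for word, ids in index.items()}

index = build_inverted_index({1: "new home sales", 2: "home prices rise"})
# index["home"] == [1, 2]; an AND query intersects the posting lists
```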
17. What is Zone index ?
A zone is a region of the document that can contain an arbitrary amount of text, e.g.,
Title, Abstract and References. Build inverted indexes on zones as well to permit
querying. Zones are similar to fields, except the contents of a zone can be arbitrary
free text.
18. State Bayes rule.

Bayes' theorem is a method to revise the probability of an event given additional
information; it calculates a conditional probability called a posterior or revised
probability. If A and B denote two events, P(A|B) denotes the conditional probability
of A occurring given that B occurs, and the two conditional probabilities P(A|B) and
P(B|A) are in general different. The theorem relates them as
P(A|B) = P(B|A) P(A) / P(B).
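A numeric sketch of the rule, with made-up probabilities:

```python
def posterior(p_b_given_a, p_a, p_b):
    """Bayes' theorem: revised probability P(A|B) = P(B|A) * P(A) / P(B)."""
    return p_b_given_a * p_a / p_b

# With P(B|A) = 0.8, P(A) = 0.3 and P(B) = 0.5, P(A|B) = 0.48
p_a_given_b = posterior(0.8, 0.3, 0.5)
```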

PART B ( 13 Marks)
1. i) Express what is Boolean retrieval model. (4)
ii) Describe the document preprocessing steps in detail (9)
2. Illustrate the Vector space retrieval model with example (13)
3. Describe about basic concepts of Cosine similarity. (13)
4. Develop an example to implement term weighting (min docs = 5). (13)
5. i) Tabulate the common preprocessing steps. (4)
ii) Discuss Boolean retrieval in detail with a diagram. (9)
6. i) Discuss in detail term frequency and Inverse Document Frequency. (7)
ii) Compute TF-IDF, given a document containing terms with the frequencies
A(3), B(2), C(1). Assume a collection of 10,000 documents and document frequencies
of these terms A(50), B(1300), C(250). (6)
7. i) Explain Latent Semantic Indexing and latent semantic space with an illustration. (9)
ii) Analyze the use of LSI in Information Retrieval. What is its role in handling
synonyms and semantic relatedness? (4)
8. i) Examine how to form a binary term-document incidence matrix. (7)
ii) Give an example for the above. (6)

9. Describe document preprocessing and its stages in detail. (13)


10. i) Discuss the structure of inverted indices and the basic Boolean Retrieval model (7)
ii)Discuss the searching process in inverted file (6)

11. i) Why do we need sparse vectors? (4)


ii) Estimate sparse vectors and their efficiency with a diagram. (9)
12. i) Analyze the language model based IR and its probabilistic representation. (7)
ii)Compare Language model vs Naive Bayes and Language model vs Vector space
model (6)

13. i) Explain in detail about binary independence model for Probability Ranking
Principle(PRP). (7)
ii) Analyze how the query generation probability for query likelihood model can be
estimated.(6)

14. i) Apply how Probabilistic approaches to Information Retrieval are done. (7)
ii) Illustrate the following
a) Probabilistic relevance feedback. (2)
b) Pseudo relevance feedback. (2)
c) Indirect relevance feedback (2)

PART C ( 15 Marks)

1. Compose the information Retrieval services of the internet with suitable design. (15)
2. Assess the best Language model to computational linguistics for investigating the use of
software to translate text or speech from one language to another. (15)
3. Contrast the uses of probabilistic IR in indexing the search in the internet. (15)
4. Create a Relevance feedback mechanism for your college website search in the internet.
(15)

UNIT III

PART A ( 2 Marks)

1. Mention types of classifier techniques.

Types of classifier techniques are back-propagation, support vector machines, and
k-nearest-neighbor classifiers.
2. What is called Bayesian classification ?

Bayesian classifiers are statistical classifiers. They can predict class membership
probabilities, such as the probability that a given tuple belongs to a particular class.
3. Define decision tree.

A decision tree is a tree where each node represents a feature (attribute), each link
(branch) represents a decision (rule) and each leaf represents an outcome (a categorical
or continuous value). A decision tree or classification tree is a tree in which each
internal node is labeled with an input feature. The arcs coming from a node labeled
with a feature are labeled with each of the possible values of that feature.
4. Define information gain.
 Entropy measures the impurity of a collection. Information Gain is defined in
terms of Entropy.
 Information gain tells us how important a given attribute of the feature vectors is.
 Information gain of attribute A is the reduction in entropy caused by partitioning
the set of examples S:

Gain(S, A) = Entropy(S) - Σ_{v ∈ Values(A)} (|Sv| / |S|) × Entropy(Sv)

where Values(A) is the set of all possible values for attribute A and Sv is the subset of S
for which attribute A has value v.
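The definition above can be sketched directly; the four-example data set below is made up for illustration:

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy (in bits) of a collection of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(labels, attribute_values):
    """Reduction in entropy from partitioning the examples by one attribute."""
    n = len(labels)
    partitions = {}
    for label, value in zip(labels, attribute_values):
        partitions.setdefault(value, []).append(label)
    remainder = sum(len(p) / n * entropy(p) for p in partitions.values())
    return entropy(labels) - remainder

# A perfectly predictive attribute removes all impurity: gain = 1 bit here
gain = information_gain(["+", "+", "-", "-"], ["a", "a", "b", "b"])
```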
5. Define pre pruning and post pruning.

 In prepruning, a tree is “pruned” by halting its construction early. Upon halting,
the node becomes a leaf. The leaf may hold the most frequent class among the
subset tuples or the probability distribution of those tuples.
 In postpruning, subtrees are removed from a “fully grown” tree. A subtree at a
given node is pruned by removing its branches and replacing it with a leaf. The
leaf is labeled with the most frequent class among the subtree being replaced.
6. Why tree pruning useful in decision tree induction ?

When a decision tree is built, many of the branches will reflect anomalies in the training
data due to noise or outliers. Tree pruning methods address this problem of overfitting the
data. Such methods typically use statistical measures to remove the least reliable
branches.
7. What is tree pruning ?

Tree pruning attempts to identify and remove such branches, with the goal of improving
classification accuracy on unseen data.
8. What are Bayesian classifiers ?
Bayesian classifiers are statistical classifiers. They can predict class membership
probabilities, such as the probability that a given tuple belongs to a particular class.
9. What is meant by naive Bayes classifier ?
A naive Bayes classifier is an algorithm that uses Bayes' theorem to classify objects.
Naive Bayes classifiers assume strong, or naive, independence between attributes of data
points.
10. What are the characteristics of k-nearest neighbors algorithm ?
Characteristics :
 The unknown tuple is assigned the most common class among its k nearest
neighbours.
 Nearest-neighbor classifiers use distance-based comparisons that intrinsically
assign equal weight to each attribute.
 Nearest-neighbor classifiers can be extremely slow when classifying test tuples.
 Distance metric is calculated by using Euclidean distance and Manhattan distance.
 It does not use model building.
 It relies on local information.
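A minimal sketch of the algorithm using Euclidean distance; the points and labels are made up for illustration:

```python
import math
from collections import Counter

def knn_classify(query, examples, k=3):
    """Assign the most common class among the k nearest labelled points."""
    def euclidean(p, q):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))
    nearest = sorted(examples, key=lambda e: euclidean(e[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

examples = [((0, 0), "A"), ((0, 1), "A"), ((5, 5), "B"), ((6, 5), "B")]
# the three nearest neighbours of (1, 0) are A, A, B, so the class is "A"
label = knn_classify((1, 0), examples, k=3)
```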
11. What is dimensionality reduction ?
In dimensionality reduction, data encoding or transformations are applied so as to obtain
a reduced or “compressed” representation of the original data. If the original data can be
reconstructed from the compressed data without any loss of information, the data
reduction is called lossless.
12. Define similarity.

The similarity between two objects is a numerical measure of the degree to which the two
objects are alike. Similarities are usually non-negative and often between 0 and 1. A
small distance indicates a high degree of similarity and a large distance indicates a low
degree of similarity.
13. Define an inverted index.

An inverted index is an index into a set of documents of the words in the documents. The
index is accessed by some search method. Each index entry gives the word and a list of
documents, possibly with locations within the documents, where the word occurs. The
inverted index data structure is a central component of a typical search engine indexing
algorithm.
14. What is zone index ?

A zone is a region of the document that can contain an arbitrary amount of text, e.g.,
Title, Abstract and References. Build inverted indexes on zones as well to permit
querying. Zones are similar to fields, except the contents of a zone can be arbitrary
free text.

PART B (13 Marks )

1. (i) Define Topic detection and tracking, Clustering in TDT. (4)


(ii) Examine in detail about Cluster Analysis in Text Clustering.(9)

2. Define Clustering in Metric space with application to Information Retrieval (13)


3. (i) Evaluate the Agglomerative Clustering and HAC in detail. (7)
(ii) Discuss the Types of data and evaluate it using any one clustering techniques. (6)

4. i) Summarize on Clustering Algorithms. (6)


ii) Evaluate on the various classification methods of Text. (7)

5. i) Analyze the working of Nearest Neighbor algorithm along with one representation.
(7)
ii) Analyze the K-Means Clustering method and the problems in it. (6)

6. Analyze about Decision Tree Algorithm with illustration. (13)


7. Examine Inverted index and Forward index (13)
8. i) Discuss in detail about Text Classification. (7)
ii) Explain B Tree Index in detail (6)

9. i) Apply Naïve Bayes Algorithm for an example. (7)


ii) Demonstrate its working in detail. (6)
10. Analyse single Dimension Index in detail (13)
11. Define Knuth Morris Pratt algorithm in detail (13)
12. i) Construct B+ Tree Index in detail (6)
ii) Summarize the significance of SVM classifier in detail. (7)

13. Examine Single dimensional and multi-dimensional index. (13)


14. Explain Sequential search in detail (13)

PART C ( 15 Marks)

1. (i). Rank the impacts of Categorization and clustering of text in the mining with the
suitable examples. (8)
(ii). Describe the KNN Classifier in detail. (7)

2. Design a Plan to overcome the gap in decision theoretic approach for evaluation in
text mining. (15)
3. Compare two types of Dimensional Index in detail with example (15)
4. Estimate R Tree index and R+ Tree index (15)

UNIT IV

PART A ( 2 Marks)

1. Define web server.

A web server is a computer connected to the Internet that runs a program responsible
for storing, retrieving and distributing some of the web files.
2. What is a web browser?

A web browser is a program used to communicate with web servers on the Internet,
which enables it to download and display web pages. Netscape Navigator and
Microsoft Internet Explorer are among the most popular browsers available in the
market.
3. Explain paid submission of search services.

In paid submission, users submit a website for review by a search service for a preset fee
with the expectation that the site will be accepted and included in that company's
search engine, provided it meets the stated guidelines for submission. Yahoo! is the
major search engine that accepts this type of submission. While paid submissions
guarantee a timely review of the submitted site and notice of acceptance or rejection,
you're not guaranteed inclusion or a particular placement order in the listings.
4. Explain paid inclusion programs of search services.

Paid inclusion programs allow you to submit your website for guaranteed inclusion in
a search engine's database of listings for a set period of time. While paid inclusion
guarantees indexing of submitted pages or sites in a search database, you're not
guaranteed that the pages will rank well for particular queries.
5. Define search engine optimization.

Search Engine Optimization (SEO) is the act of modifying a website to increase its
ranking in organic (vs. paid), crawler-based listings of search engines. There are
several ways to increase the visibility of your website through the major search
engines on the Internet today. The two most common forms of Internet marketing
are Paid (Sponsored) Placement and Natural Placement.
6. What is the purpose of web crawler ?

A web crawler is a program which browses the World Wide Web in a methodical,
automated manner. Web crawlers are mainly used to create a copy of all the visited
pages for later processing by a search engine that will index the downloaded pages to
provide fast searches.
7. Define focused crawler.

A focused crawler or topical crawler is a web crawler that attempts to download only
web pages that are relevant to a pre-defined topic or set of topics.
8. What is near-duplicate detection?

Near-duplicate detection is the task of identifying documents with almost identical
content. Near-duplicate web documents are abundant: two such documents may differ
from each other only in a very small portion that displays advertisements, for example.
Such differences are irrelevant for web search.
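One common way to make "almost identical" precise (shingling with the Jaccard coefficient, also the subject of Part B question 14) can be sketched as follows, with made-up sentences:

```python
def shingles(text, k=3):
    """Set of k-word shingles of a text."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a, b):
    """Resemblance of two shingle sets: |A & B| / |A | B|."""
    return len(a & b) / len(a | b)

s1 = shingles("a rose is a rose is a rose")
s2 = shingles("a rose is a flower which is a rose")
similarity = jaccard(s1, s2)  # identical documents would give 1.0
```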
9. Define web crawling.

A web crawler is a program which browses the World Wide Web in a methodical,
automated manner. Web crawlers are mainly used to create a copy of all the visited
pages for later processing by a search engine that will index the downloaded pages to
provide fast searches.
10. What are the politeness policies used in web crawling ?

Politeness policies are as follows :

a. To adjust the crawl frequency so that no single server is overloaded.
b. To revisit pages with the same frequency, ignoring the change rate of individual
pages.

11. What is snippet generation ?

Snippet generation is a special type of extractive document summarization, in which
sentences are selected for inclusion in the summary on the basis of the degree to which
they match the search query. This process was given the name of query-based
summarization.
12. What is PageRank ?

PageRank is a method for rating the importance of web pages objectively and
mechanically using the link structure of the web.
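A minimal power-iteration sketch of the method, assuming a tiny link graph with no dangling links and the usual 0.85 damping factor:

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank for a dict: page -> list of outgoing links."""
    n = len(links)
    rank = {page: 1 / n for page in links}
    for _ in range(iterations):
        new = {page: (1 - damping) / n for page in links}
        for page, outgoing in links.items():
            for target in outgoing:
                new[target] += damping * rank[page] / len(outgoing)
        rank = new
    return rank

# In a symmetric 3-page cycle every page ends up with rank 1/3
ranks = pagerank({"a": ["b"], "b": ["c"], "c": ["a"]})
```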
13. Define dangling link.
This occurs when a page contains a link such that the hypertext points to a page with
no outgoing links. Such a link is known as dangling link.
14. Define snippets.

Snippets are short fragments of text extracted from the document content or its
metadata. They may be static or query-based. A static snippet always shows, for
instance, the first 50 words of the document, or the content of its description metadata,
or a description taken from a directory site such as dmoz.org. A query-biased snippet
is one selectively extracted on the basis of its relation to the searcher's query.
15. Define hubs.

Hubs are index pages that provide many useful links to relevant content pages (topic
authorities). Hub pages for IR are included in the home page.
16. Define authorities.
Authorities are pages that are recognized as providing significant, trustworthy, and
useful information on a topic. In-degree (number of pointers to a page) is one simple
measure of authority. However, in-degree treats all links as equal.

PART-B ( 13 Marks)

1. Demonstrate Search Engine Optimization / SPAM in detail. (13)


2. i) Describe in detail about Vector space model for XML Retrieval. (9)
ii) What are Structured and Unstructured Retrieval? (4)
3. i) List the types of Search Engine and explain them. (7)
ii) Distinguish visual vs programmatic crawler. (6)
4. Design and develop a Web search Architecture and the components of search engine
and its issues.(13)
5. i) What is P4P? Elaborate on Paid Placement. (7)
ii) Describe the structure of WEB and its characteristics (6)
6. i) Summarize on the working of WEB CRAWLER with its diagram. (8)
ii) Explain the working of Search Engine. (5)

7. i) Differentiate meta crawler and focused crawler. (8)


ii) Analyze on URL normalization.(5)

8. Recommend the need for Near-Duplicate Detection by way of a fingerprint
algorithm. (13)
9. i) Examine the behavior of web crawler and the outcome of crawling policies. (5)
ii) Illustrate the following (8)
a) Focused Crawling
b) Deep web
c) Distributed crawling
d) Site map
10. i) Explain the overview of Web search. (8)
ii) What is the purpose of Web indexing? (5)

11. Summarize the process of index compression in detail.(13)


12. (i) Examine the need for Web Search Engine. (6)
(ii) List the challenges in data traversal by a search engine and how you will
overcome them. (7)

13. (i) Based on the Application of Search Engines, How will you categorize them and
what are the issues faced by them? (9)
(ii) Demonstrate about Search Engine Optimization. (4)
14. Describe the following with example.
i) Bag of Words and Shingling (7)
ii) Hashing, Min Hash and Sim Hash (6)

PART C (15 Marks)

1. Develop a web search structure for searching a newly hosted web domain by the naïve
user with step by step procedure. (15)
2. i) Grade the optimization techniques available for search engine and rank them by your
justification. (9)
ii) Explain Web Crawler Taxonomy in detail (6)

3. Estimate the web crawling methods and illustrate how the various nodes of a
distributed crawler communicate and share URLs. (15)
4. Formulate the application of Near Duplicate Document Detection techniques and also
Generalize the advantages in Plagiarism checking. (15)
UNIT V

PART A ( 2 Marks)

1. What is collaborative filtering ?

Collaborative filtering is a method of making automatic predictions (filtering) about the


interests of a single user by collecting preferences or taste information from many users
(collaborating). It uses given rating data by many users for many items as the basis for
predicting missing ratings and/or for creating a top-N recommendation list for a given
user, called the active user.
2. What do you mean by item-based collaborative filtering ?

Item-based CF is a model-based approach which produces recommendations based on the


relationship between items inferred from the rating matrix. The assumption behind this
approach is that users will prefer items that are similar to other items they like.
3. What are the problems of user-based CF ?

The two main problems of user-based CF are that the whole user database has to be kept
in memory and that expensive similarity computation between the active user and all
other users in the database has to be performed.
4. Define user based collaborative filtering.

User-based collaborative filtering algorithms work off the premise that if a user (A) has a
similar profile to another user (B), then A is more likely to prefer things that B prefers
when compared with a user chosen at random.
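The premise can be sketched with cosine similarity over co-rated items; the users and ratings below are made up for illustration:

```python
import math

def user_similarity(ratings_a, ratings_b):
    """Cosine similarity of two users over the items both have rated."""
    common = set(ratings_a) & set(ratings_b)
    if not common:
        return 0.0
    dot = sum(ratings_a[i] * ratings_b[i] for i in common)
    norm_a = math.sqrt(sum(ratings_a[i] ** 2 for i in common))
    norm_b = math.sqrt(sum(ratings_b[i] ** 2 for i in common))
    return dot / (norm_a * norm_b)

a = {"m1": 5, "m2": 3, "m3": 4}
b = {"m1": 5, "m2": 3, "m3": 4}   # same profile as A
c = {"m1": 1, "m2": 5}            # quite different taste
# A's predictions would borrow from B (similarity 1.0) before C
sim_ab, sim_ac = user_similarity(a, b), user_similarity(a, c)
```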
5. What are the characteristics of relevance feedback ?

Characteristics of relevance feedback :


1. It shields the user from the details of the query reformulation process.
2. It breaks down the whole searching task into a sequence of small steps which are
easier to grasp.
3. Provide a controlled process designed to emphasize some terms (relevant ones)
and deemphasize others (non-relevant ones)
6. Write goal of recommender system.

The goal of a recommender system is to generate meaningful recommendations to a


collection of users for items or products that might interest them.
7. Define recommender systems.

Recommender Systems are software tools and techniques providing suggestions for items
to be of use to a user. The suggestions relate to various decision-making processes, such
as what items to buy, what music to listen to, or what online news to read.
8. What is demographic based recommender system ?
This type of recommendation system categorizes users based on a set of demographic
classes. This algorithm requires market research data to fully implement. The main
benefit is that it doesn't need history of user ratings.
9. What is Singular Value Decomposition (SVD) ?

SVD is a matrix factorization technique that is usually used to reduce the number of
features of a data set by reducing space dimensions from N to K where K < N.
10. What is Content-based recommender ?
Content-based recommenders refer to approaches that provide recommendations by
comparing representations of content describing an item to representations of content
that interests the user. These approaches are sometimes also referred to as content-based
filtering.
11. What is matrix factorization model ?

Matrix factorization is a class of collaborative filtering algorithms used in recommender
systems. Matrix factorization algorithms work by decomposing the user-item interaction
matrix into the product of two lower-dimensionality rectangular matrices.
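A minimal stochastic-gradient sketch of this decomposition; the rating matrix, dimensionality k and learning parameters below are illustrative assumptions:

```python
import random

def factorize(ratings, k=2, steps=2000, lr=0.01, reg=0.02):
    """SGD matrix factorization: learn k-dim user and item vectors."""
    random.seed(0)
    P = {u: [random.random() for _ in range(k)] for u, _ in ratings}
    Q = {i: [random.random() for _ in range(k)] for _, i in ratings}
    for _ in range(steps):
        for (u, i), r in ratings.items():
            err = r - sum(pu * qi for pu, qi in zip(P[u], Q[i]))
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q

# Made-up 2-user, 2-item rating matrix
ratings = {("alice", "m1"): 5, ("alice", "m2"): 1,
           ("bob", "m1"): 4, ("bob", "m2"): 1}
P, Q = factorize(ratings)
# the reconstructed rating approximates the observed one
pred = sum(p * q for p, q in zip(P["alice"], Q["m1"]))
```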

PART B ( 13 Marks)

1. Define Recommendation based on User Ratings using appropriate example. (13)


2. i) Explain the Recommender system. (4)
ii) Explain the techniques of Matrix Factorization (9)
3. Explain the different types of recommendation system.
i) Hybrid Recommendation System (3)
ii) Content Based Recommendation System (3)
iii) Collaborative Recommendation System (3)
iv) Knowledge Based Recommendation System (4)

4. Estimate the Content based recommendation system (13)


5. Differentiate collaborative filtering and content based systems. (13)
6. i) Explain about High Level Architecture (6)
ii) Explain the significance of Collaborative Filtering in detail. (7)

7. Illustrate the advantages and disadvantages of Content based and collaborative filtering
recommendation system (13)
8. Describe Knowledge based recommendation system in detail (13)
9. i) Detail the rules of HLA. (7)
ii) Differentiate between Hybrid and Collaborative Recommendation. (6)

10. i) Describe common HLA terminologies. (3)


ii) Define the steps involved in Collaborative Filtering (10)

11. i) Describe web based recommendation system (7)


ii) When can Collaborative Filtering be used? (6)

12. Define Matrix factorization models in detail. (13)


13. Discuss Neighbouring model in detail (13)
14. i) Explain what Matrix Factorization is. (4)
ii) Discuss the approaches of recommender system. (9)

PART C ( 15 Marks)

1. Narrate in detail about a model for Recommendation system. (15)


2. Discuss in detail about High Level Architecture and also common Terminologies in
HLA. (15)
3. Classify Recommendation techniques with examples (15)
4. i) Design Matrix factorization model (8)
ii) Detail about Neighbouring models in detail (7)
