Running head: ASSIGNMENT

Assignment:

Student Name:

University:

Course Name:

Instructor:

Date:

1. Clustering is a technique for identifying objects that share some similarity and assigning
those objects to the same group. In the search engine context, clustering is a vital technique
that helps users find what they are searching for by grouping the search results by
topic or category. Through clustering, the search engine computes similarities
between documents and groups them based on the semantics of the retrieved
documents. To achieve this, when a query is submitted, the search engine
pre-processes the documents in the search results and extracts features from
each document. The search engine then applies a clustering algorithm to the resulting
similarity matrix to obtain the clusters. For example, if the user enters the keyword
“paintbrush”, the search engine should display results that contain the keyword
“paintbrush” as well as those containing related keywords such as “paint”, “brush”, and “canvas”.
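
The pipeline described above (extract features, compute similarities, group similar documents) can be illustrated with a minimal sketch. It assumes bag-of-words term counts as the features, cosine similarity as the similarity measure, and a simple greedy single-pass grouping rule; the sample documents, the `threshold` value, and the function names are hypothetical and chosen only for illustration. A real search engine would use far richer features and algorithms.

```python
import math
from collections import Counter

def vectorize(text):
    """Turn a document into a bag-of-words term-count vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def cluster(docs, threshold=0.3):
    """Greedy single-pass clustering (an assumed, simplified rule):
    a document joins the first cluster whose first member is similar
    enough; otherwise it starts a new cluster."""
    vectors = [vectorize(d) for d in docs]
    clusters = []  # each cluster is a list of document indices
    for i, vec in enumerate(vectors):
        for members in clusters:
            if cosine(vec, vectors[members[0]]) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters

# Hypothetical search results for illustration only.
results = [
    "paintbrush set for oil paint",
    "oil paint and canvas for artists",
    "hair brush and comb",
    "best hair brush for curly hair",
]
print(cluster(results))
```

Because this sketch matches whole words only, it does not relate “paintbrush” to “paint” or “brush”; capturing that kind of semantic link would need stemming, sub-word features, or learned embeddings.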

Classification is a technique for assigning each data element to a group or class
based on a set of predefined rules. Classification makes use of
different mechanisms such as decision trees, linear programming, statistics, and neural
networks. In a search engine setting, classification can help users find a
given piece of information more quickly by organizing the information in a meaningful manner.
For example, the search engine can display a list of documents that are related to a given
search keyword.

Anomaly detection is a technique for detecting surprising behavior hidden within
the data. In the search engine context, anomaly detection is used to filter search
engine spam. That is, when a search query is submitted,
the search engine assesses the relevance of each webpage’s content to the keywords
provided and presents the user with the most relevant information. Any webpage that is
not related to the keywords provided is treated as an anomaly and is not displayed to the
user.
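
The filtering idea above can be sketched as a simple relevance cutoff. This is only an illustration under stated assumptions: relevance is taken to be the fraction of query terms that appear in the page, and the `min_relevance` cutoff, the function names, and the sample pages are all hypothetical. It is a plain threshold rule rather than a statistical anomaly detector, and real spam filtering relies on many more signals.

```python
def relevance(query, page_text):
    """Fraction of query terms that appear in the page text."""
    terms = set(query.lower().split())
    words = set(page_text.lower().split())
    return len(terms & words) / len(terms) if terms else 0.0

def filter_anomalies(query, pages, min_relevance=0.5):
    """Split pages into (kept, flagged) using an assumed relevance cutoff."""
    kept, flagged = [], []
    for url, text in pages.items():
        if relevance(query, text) >= min_relevance:
            kept.append(url)
        else:
            flagged.append(url)  # treated as an anomaly and hidden
    return kept, flagged

# Hypothetical pages for illustration only.
pages = {
    "a.example/engines": "how a car engine works",
    "b.example/spam": "win a free prize today",
    "c.example/repair": "engine repair for your car",
}
kept, flagged = filter_anomalies("car engine", pages)
print(kept, flagged)
```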

Association rule mining is a technique for discovering association rules
that show the attribute-value conditions inherent within a given dataset. Based on the
keyword provided by the user, the search engine can therefore suggest related information. For example,
if the user searches for “car engine”, the search engine can also provide information
on related topics such as “engine fluid”.
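
The strength of a rule such as “car engine” → “engine fluid” is commonly measured by its support and confidence. The sketch below computes both over a small, hypothetical set of search sessions; the session data and the function names are assumptions made for illustration, not something taken from a real query log.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Estimated probability of the consequent given the antecedent."""
    base = support(transactions, antecedent)
    both = support(transactions, set(antecedent) | set(consequent))
    return both / base if base else 0.0

# Hypothetical search sessions: each set holds the queries of one user.
sessions = [
    {"car engine", "engine fluid"},
    {"car engine", "engine fluid", "spark plug"},
    {"car engine", "spark plug"},
    {"bike chain"},
]

rule_support = support(sessions, {"car engine", "engine fluid"})
rule_confidence = confidence(sessions, {"car engine"}, {"engine fluid"})
print(rule_support, rule_confidence)
```

In this toy data, two of the four sessions contain both queries, and two of the three sessions that contain “car engine” also contain “engine fluid”.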

2. Color offers numerous benefits in visualization. For example, the designer of a
visualization can use different colors to distinguish the variables being shown.
The presenter may also use color to emphasize a
given idea. This can be achieved by using eye-catching colors such as red or by
bolding text.

Color may also present several drawbacks when used for visualization. For example,
excessive use of color to represent the variables can result in a noisy visualization
that is hard to understand. This can lead to confusion and misinterpretation of the data by
the audience. Moreover, misuse of color, such as inconsistent or excessive use,
can be detrimental to the audience's understanding and may result in communicating the
wrong message.

3. In this scenario, it is possible for a data set to consist only of anomalous objects. This is
because, before deciding which anomaly detection algorithm to deploy, the data scientist should
first become familiar with the data and understand its nature; that is, the algorithm used to
detect anomalies should consider both the behavioral and contextual attributes of the
data. For example, the data scientist should understand the spatial attributes,
structured attributes, and behavioral characteristics of the objects so as to evaluate an
outlier in the context to which it belongs.

4. An instance in which considering only the presence of non-zero values might give a more
accurate view of the objects than considering the actual magnitudes of the values is market
basket analysis. In the market basket case, by considering only the presence of non-zero
values, one can focus on the associations among the small list of items actually purchased
instead of evaluating the entire store inventory list. In cluster analysis, considering only
the presence of non-zero values may
not be desirable, especially if the actual magnitudes of the values are to be taken into account.
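
A small numeric sketch can make the contrast concrete. It assumes Jaccard similarity over non-zero entries as the presence-based measure and Euclidean distance as the magnitude-based measure; the baskets and item columns are hypothetical.

```python
import math

def jaccard_presence(a, b):
    """Jaccard similarity using only which entries are non-zero."""
    in_a = {i for i, v in enumerate(a) if v != 0}
    in_b = {i for i, v in enumerate(b) if v != 0}
    union = in_a | in_b
    return len(in_a & in_b) / len(union) if union else 1.0

def euclidean(a, b):
    """Euclidean distance using the actual magnitudes."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical baskets. Columns: bread, milk, eggs, batteries, soap.
basket_1 = [1, 1, 0, 0, 0]
basket_2 = [6, 12, 0, 0, 0]  # same items as basket_1, bulk quantities
basket_3 = [0, 0, 1, 1, 1]   # entirely different items

print(jaccard_presence(basket_1, basket_2), jaccard_presence(basket_1, basket_3))
print(euclidean(basket_1, basket_2), euclidean(basket_1, basket_3))
```

Judged by presence alone, basket_1 and basket_2 are identical and basket_1 and basket_3 share nothing. Judged by magnitude, basket_1 ends up closer to basket_3 than to basket_2, even though it has no items in common with basket_3.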

5. The shape of a cluster that consists of all documents with a given cosine similarity to a
centroid would be almost, but not completely, circular. The data points would be
located at the edges of the cluster.
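
One property worth checking when reasoning about this shape is that cosine similarity depends only on the angle between two vectors, not on their lengths. Geometrically, the vectors whose cosine similarity to a centroid meets a threshold therefore form a cone around the centroid's direction, and if the documents are normalized to unit length, that cone cuts the unit sphere in a circular region. The sketch below assumes a threshold-based membership rule; the vectors, the threshold value, and the function names are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two dense vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def in_cluster(doc, centroid, threshold=0.9):
    """Assumed membership rule: similarity to the centroid meets a threshold."""
    return cosine(doc, centroid) >= threshold

# Hypothetical two-term document vectors.
centroid = [1.0, 1.0]
short_doc = [2.0, 1.0]
long_doc = [20.0, 10.0]   # same direction as short_doc, ten times longer
off_axis = [1.0, 0.0]     # 45 degrees away from the centroid

print(cosine(short_doc, centroid), cosine(long_doc, centroid))
print(in_cluster(short_doc, centroid), in_cluster(off_axis, centroid))
```

The short and long documents receive the same similarity because they point in the same direction, so both fall inside or outside the cluster together regardless of length.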
